commit 8aab2b4410a257349539c4b09ac9038f369094f5 Author: Greg Kroah-Hartman Date: Sun Jan 13 09:24:10 2019 +0100 Linux 4.20.2 commit 69acfe1758b8dbd420958505e65d63bd7fc4cd89 Author: Enric Balletbo i Serra Date: Sat Oct 13 12:56:54 2018 +0200 drm/rockchip: psr: do not dereference encoder before it is null checked. commit 4eda776c3cefcb1f01b2d85bd8753f67606282b5 upstream. 'encoder' is dereferenced before it is null sanity checked, hence we potentially have a null pointer dereference bug. Instead, initialise drm_drv from encoder->dev->dev_private after we are sure 'encoder' is not null. Fixes: 5182c1a556d7f ("drm/rockchip: add an common abstracted PSR driver") Cc: stable@vger.kernel.org Signed-off-by: Enric Balletbo i Serra Signed-off-by: Heiko Stuebner Link: https://patchwork.freedesktop.org/patch/msgid/20181013105654.11827-1-enric.balletbo@collabora.com Signed-off-by: Greg Kroah-Hartman commit 0ec775884e7d761fdfed56e86d286bc66948cffe Author: Boris Brezillon Date: Tue Oct 9 15:24:46 2018 +0200 drm/vc4: Set ->is_yuv to false when num_planes == 1 commit 2b02a05bdc3a62d36e0d0b015351897109e25991 upstream. When vc4_plane_state is duplicated ->is_yuv is left assigned to its previous value, and we never set it back to false when switching to a non-YUV format. Fix that by setting ->is_yuv to false in the 'num_planes == 1' branch of the vc4_plane_setup_clipping_and_scaling() function. Fixes: fc04023fafecf ("drm/vc4: Add support for YUV planes.") Cc: Signed-off-by: Boris Brezillon Reviewed-by: Eric Anholt Link: https://patchwork.freedesktop.org/patch/msgid/20181009132446.21960-1-boris.brezillon@bootlin.com Signed-off-by: Greg Kroah-Hartman commit 59ca55fec3a74ee85545ace346dea4c2d807083f Author: Lyude Paul Date: Wed Nov 14 20:39:51 2018 -0500 drm/nouveau/drm/nouveau: Check rc from drm_dp_mst_topology_mgr_resume() commit b89fdf7ae8500feae1100d8b283176a44d31d698 upstream. We need to actually make sure we check this on resume since otherwise we won't know whether or not the topology is still there once we've resumed, which will cause us to still think the topology is connected even after it's been removed if the removal happens mid-suspend. Signed-off-by: Lyude Paul Cc: stable@vger.kernel.org Signed-off-by: Ben Skeggs Signed-off-by: Ben Skeggs Signed-off-by: Greg Kroah-Hartman commit 243b1fc746f0b04d3f1947ab92153925a764d156 Author: Christophe Leroy Date: Mon Dec 10 08:08:28 2018 +0000 lib: fix build failure in CONFIG_DEBUG_VIRTUAL test commit 10fdf838e5f540beca466e9d1325999c072e5d3f upstream. On several arches, virt_to_phys() is in io.h Build fails without it: CC lib/test_debug_virtual.o lib/test_debug_virtual.c: In function 'test_debug_virtual_init': lib/test_debug_virtual.c:26:7: error: implicit declaration of function 'virt_to_phys' [-Werror=implicit-function-declaration] pa = virt_to_phys(va); ^ Fixes: e4dace361552 ("lib: add test module for CONFIG_DEBUG_VIRTUAL") CC: stable@vger.kernel.org Signed-off-by: Christophe Leroy Reviewed-by: Kees Cook Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit 90f97b5ce7d456ad217fec027a92534a12e1b8d1 Author: Frank Rowand Date: Tue Dec 18 11:40:03 2018 -0800 of: __of_detach_node() - remove node from phandle cache commit 5801169a2ed20003f771acecf3ac00574cf10a38 upstream. Non-overlay dynamic devicetree node removal may leave the node in the phandle cache. Subsequent calls to of_find_node_by_phandle() will incorrectly find the stale entry. Remove the node from the cache. Add paranoia checks in of_find_node_by_phandle() as a second level of defense (do not return cached node if detached, do not add node to cache if detached). Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()") Reported-by: Michael Bringmann Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Frank Rowand Signed-off-by: Rob Herring Signed-off-by: Greg Kroah-Hartman commit 8a6b25938742e2fecf2ba119c08892b4f06bc3fb Author: Frank Rowand Date: Tue Dec 18 11:40:02 2018 -0800 of: of_node_get()/of_node_put() nodes held in phandle cache commit b8a9ac1a5b99a2fcbed19fd29d2d59270c281a31 upstream. The phandle cache contains struct device_node pointers. The refcount of the pointers was not incremented while in the cache, allowing use after free error after kfree() of the node. Add the proper increment and decrement of the use count. Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of of_find_node_by_phandle()") Cc: stable@vger.kernel.org # v4.17+ Signed-off-by: Frank Rowand Signed-off-by: Rob Herring Signed-off-by: Greg Kroah-Hartman commit bf91a7117e1b9edc16fbe6ad1067ed9411565ab4 Author: Lubomir Rintel Date: Fri Nov 16 17:23:47 2018 +0100 power: supply: olpc_battery: correct the temperature units commit ed54ffbe554f0902689fd6d1712bbacbacd11376 upstream. According to [1] and [2], the temperature values are in tenths of degree Celsius. Exposing the Celsius value makes the battery appear on fire: $ upower -i /org/freedesktop/UPower/devices/battery_olpc_battery ... temperature: 236.9 degrees C Tested on OLPC XO-1 and OLPC XO-1.75 laptops. [1] include/linux/power_supply.h [2] Documentation/power/power_supply_class.txt Fixes: fb972873a767 ("[BATTERY] One Laptop Per Child power/battery driver") Cc: stable@vger.kernel.org Signed-off-by: Lubomir Rintel Acked-by: Pavel Machek Signed-off-by: Sebastian Reichel Signed-off-by: Greg Kroah-Hartman commit b7b14f082590ee05bb410e9188ef7498a00ffda5 Author: Alexander Shishkin Date: Wed Dec 19 17:19:22 2018 +0200 intel_th: msu: Fix an off-by-one in attribute store commit ec5b5ad6e272d8d6b92d1007f79574919862a2d2 upstream. The 'nr_pages' attribute of the 'msc' subdevices parses a comma-separated list of window sizes, passed from userspace. However, there is a bug in the string parsing logic wherein it doesn't exclude the comma character from the range of characters as it consumes them. This leads to an out-of-bounds access given a sufficiently long list. For example: > # echo 8,8,8,8 > /sys/bus/intel_th/devices/0-msc0/nr_pages > ================================================================== > BUG: KASAN: slab-out-of-bounds in memchr+0x1e/0x40 > Read of size 1 at addr ffff8803ffcebcd1 by task sh/825 > > CPU: 3 PID: 825 Comm: npktest.sh Tainted: G W 4.20.0-rc1+ > Call Trace: > dump_stack+0x7c/0xc0 > print_address_description+0x6c/0x23c > ? memchr+0x1e/0x40 > kasan_report.cold.5+0x241/0x308 > memchr+0x1e/0x40 > nr_pages_store+0x203/0xd00 [intel_th_msu] Fix this by accounting for the comma character. Signed-off-by: Alexander Shishkin Fixes: ba82664c134ef ("intel_th: Add Memory Storage Unit driver") Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: Greg Kroah-Hartman commit 1b756aeb6181c4f22ec1b0bfb4894e8029442b3f Author: Christian Borntraeger Date: Wed Dec 12 14:45:18 2018 +0100 genwqe: Fix size check commit fdd669684655c07dacbdb0d753fd13833de69a33 upstream. Calling the test program genwqe_cksum with the default buffer size of 2MB triggers the following kernel warning on s390: WARNING: CPU: 30 PID: 9311 at mm/page_alloc.c:3189 __alloc_pages_nodemask+0x45c/0xbe0 CPU: 30 PID: 9311 Comm: genwqe_cksum Kdump: loaded Not tainted 3.10.0-957.el7.s390x #1 task: 00000005e5d13980 ti: 00000005e7c6c000 task.ti: 00000005e7c6c000 Krnl PSW : 0704c00180000000 00000000002780ac (__alloc_pages_nodemask+0x45c/0xbe0) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3 Krnl GPRS: 00000000002932b8 0000000000b73d7c 0000000000000010 0000000000000009 0000000000000041 00000005e7c6f9b8 0000000000000001 00000000000080d0 0000000000000000 0000000000b70500 0000000000000001 0000000000000000 0000000000b70528 00000000007682c0 0000000000277df2 00000005e7c6f9a0 Krnl Code: 000000000027809e: de7195001000 ed 1280(114,%r9),0(%r1) 00000000002780a4: a774fead brc 7,277dfe #00000000002780a8: a7f40001 brc 15,2780aa >00000000002780ac: 92011000 mvi 0(%r1),1 00000000002780b0: a7f4fea7 brc 15,277dfe 00000000002780b4: 9101c6b6 tm 1718(%r12),1 00000000002780b8: a784ff3a brc 8,277f2c 00000000002780bc: a7f4fe2e brc 15,277d18 Call Trace: ([<0000000000277df2>] __alloc_pages_nodemask+0x1a2/0xbe0) [<000000000013afae>] s390_dma_alloc+0xfe/0x310 [<000003ff8065f362>] __genwqe_alloc_consistent+0xfa/0x148 [genwqe_card] [<000003ff80658f7a>] genwqe_mmap+0xca/0x248 [genwqe_card] [<00000000002b2712>] mmap_region+0x4e2/0x778 [<00000000002b2c54>] do_mmap+0x2ac/0x3e0 [<0000000000292d7e>] vm_mmap_pgoff+0xd6/0x118 [<00000000002b081c>] SyS_mmap_pgoff+0xdc/0x268 [<00000000002b0a34>] SyS_old_mmap+0x8c/0xb0 [<000000000074e518>] sysc_tracego+0x14/0x1e [<000003ffacf87dc6>] 0x3ffacf87dc6 turns out the check in __genwqe_alloc_consistent uses "> MAX_ORDER" while the mm code uses ">= MAX_ORDER". Fix genwqe. Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger Signed-off-by: Frank Haverkamp Signed-off-by: Greg Kroah-Hartman commit a05257f9ad48624cbaad747dc47920dafe4dda23 Author: Shuah Khan Date: Wed Dec 12 20:25:14 2018 -0700 selftests: Fix test errors related to lib.mk khdr target commit 211929fd3f7c8de4d541b1cc243b82830e5ea1e8 upstream. Commit b2d35fa5fc80 ("selftests: add headers_install to lib.mk") added khdr target to run headers_install target from the main Makefile. The logic uses KSFT_KHDR_INSTALL and top_srcdir as controls to initialize variables and include files to run headers_install from the top level Makefile. There are a few problems with this logic. 1. Exposes top_srcdir to all tests 2. Common logic impacts all tests 3. Uses KSFT_KHDR_INSTALL, top_srcdir, and khdr in an adhoc way. Tests add "khdr" dependency in their Makefiles to TEST_PROGS_EXTENDED in some cases, and STATIC_LIBS in other cases. This makes this framework confusing to use. The common logic that runs for all tests even when KSFT_KHDR_INSTALL isn't defined by the test. top_srcdir is initialized to a default value when test doesn't initialize it. It works for all tests without a sub-dir structure and tests with sub-dir structure fail to build. e.g: make -C sparc64/drivers/ or make -C drivers/dma-buf ../../lib.mk:20: ../../../../scripts/subarch.include: No such file or directory make: *** No rule to make target '../../../../scripts/subarch.include'. Stop. There is no reason to require all tests to define top_srcdir and there is no need to require tests to add khdr dependency using adhoc changes to TEST_* and other variables. Fix it with a consistent use of KSFT_KHDR_INSTALL and top_srcdir from tests that have the dependency on headers_install. Change common logic to include khdr target define and "all" target with dependency on khdr when KSFT_KHDR_INSTALL is defined. Only tests that have dependency on headers_install have to define just the KSFT_KHDR_INSTALL, and top_srcdir variables and there is no need to specify khdr dependency in the test Makefiles. Fixes: b2d35fa5fc80 ("selftests: add headers_install to lib.mk") Cc: stable@vger.kernel.org Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman commit 178538f8c0cb3b565932e96c357fbc4cff995149 Author: Christian Lamparter Date: Sat Dec 22 15:35:38 2018 +0100 powerpc/4xx/ocm: Fix compilation error due to PAGE_KERNEL usage commit d0757237d7b18b1ce74293be7c077d86f7a732e8 upstream. This patch fixes a recent compilation regression in ocm: ocm.c: In function ‘ocm_init_node’: ocm.c:182:18: error: invalid operands to binary | (have ‘int’ and ‘pgprot_t’ {aka ‘struct ’}) _PAGE_EXEC | PAGE_KERNEL_NCG); ^ ocm.c:197:17: error: invalid operands to binary | (have ‘int’ and ‘pgprot_t’ {aka ‘struct ’}) _PAGE_EXEC | PAGE_KERNEL); ^ Fixes: 56f3c1413f5c ("powerpc/mm: properly set PAGE_KERNEL flags in ioremap()") Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Christian Lamparter Reviewed-by: Christophe Leroy Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit bb9cc97bcbb03a2e8aa5f7401284c34d280e7e52 Author: Shaokun Zhang Date: Fri Jan 4 14:21:34 2019 +0800 drivers/perf: hisi: Fixup one DDRC PMU register offset commit eb4f5213251833567570df1a09803f895653274d upstream. For DDRC PMU, each PMU counter is fixed-purpose. There is a mismatch between perf list and driver definition on rw_chg event. # perf list | grep chg hisi_sccl1_ddrc0/rnk_chg/ [Kernel PMU event] hisi_sccl1_ddrc0/rw_chg/ [Kernel PMU event] But the register offset of rw_chg event is not defined in the driver, meanwhile bnk_chg register offset is mis-defined, let's fixup it. Fixes: 904dcf03f086 ("perf: hisi: Add support for HiSilicon SoC DDRC PMU driver") Cc: stable@vger.kernel.org Cc: John Garry Cc: Will Deacon Cc: Mark Rutland Reported-by: Weijian Huang Signed-off-by: Shaokun Zhang Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit d0e9298c5210100cba46e8f198741a296b055680 Author: YueHaibing Date: Thu Dec 20 19:13:08 2018 +0100 video: fbdev: pxafb: Fix "WARNING: invalid free of devm_ allocated data" commit 2607391882fca37463187e7f2a9c76dec286947e upstream. 'info->modes' got allocated with devm_kcalloc in of_get_pxafb_display. This gives this error message: ./drivers/video/fbdev/pxafb.c:2238:2-7: WARNING: invalid free of devm_ allocated data Fixes: c8f96304ec8b4 ("video: fbdev: pxafb: switch to devm_* API") Cc: stable@kernel.org [v4.19+] Signed-off-by: YueHaibing Reviewed-by: Daniel Mack Cc: Robert Jarzmik Signed-off-by: Bartlomiej Zolnierkiewicz Signed-off-by: Greg Kroah-Hartman commit 1b3083557a5dfddddc26a6d2fb77c55b92d69d98 Author: Yan, Zheng Date: Thu Nov 29 11:22:50 2018 +0800 ceph: don't update importing cap's mseq when handing cap export commit 3c1392d4c49962a31874af14ae9ff289cb2b3851 upstream. Updating mseq makes client think importer mds has accepted all prior cap messages and importer mds knows what caps client wants. Actually some cap messages may have been dropped because of mseq mismatch. If mseq is left untouched, importing cap's mds_wanted later will get reset by cap import message. Cc: stable@vger.kernel.org Signed-off-by: "Yan, Zheng" Signed-off-by: Ilya Dryomov Signed-off-by: Greg Kroah-Hartman commit 7a400b91868336b50ef75b0223f90a99ba5f7d51 Author: Linus Torvalds Date: Thu Dec 27 13:46:17 2018 -0800 sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f6544b9c commit c40f7d74c741a907cfaeb73a7697081881c497d0 upstream. Zhipeng Xie, Xie XiuQi and Sargun Dhillon reported lockups in the scheduler under high loads, starting at around the v4.18 time frame, and Zhipeng Xie tracked it down to bugs in the rq->leaf_cfs_rq_list manipulation. Do a (manual) revert of: a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path") It turns out that the list_del_leaf_cfs_rq() introduced by this commit is a surprising property that was not considered in followup commits such as: 9c2791f936ef ("sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list") As Vincent Guittot explains: "I think that there is a bigger problem with commit a9e7f6544b9c and cfs_rq throttling: Let take the example of the following topology TG2 --> TG1 --> root: 1) The 1st time a task is enqueued, we will add TG2 cfs_rq then TG1 cfs_rq to leaf_cfs_rq_list and we are sure to do the whole branch in one path because it has never been used and can't be throttled so tmp_alone_branch will point to leaf_cfs_rq_list at the end. 2) Then TG1 is throttled 3) and we add TG3 as a new child of TG1. 4) The 1st enqueue of a task on TG3 will add TG3 cfs_rq just before TG1 cfs_rq and tmp_alone_branch will stay on rq->leaf_cfs_rq_list. With commit a9e7f6544b9c, we can del a cfs_rq from rq->leaf_cfs_rq_list. So if the load of TG1 cfs_rq becomes NULL before step 2) above, TG1 cfs_rq is removed from the list. Then at step 4), TG3 cfs_rq is added at the beginning of rq->leaf_cfs_rq_list but tmp_alone_branch still points to TG3 cfs_rq because its throttled parent can't be enqueued when the lock is released. tmp_alone_branch doesn't point to rq->leaf_cfs_rq_list whereas it should. So if TG3 cfs_rq is removed or destroyed before tmp_alone_branch points on another TG cfs_rq, the next TG cfs_rq that will be added, will be linked outside rq->leaf_cfs_rq_list - which is bad. In addition, we can break the ordering of the cfs_rq in rq->leaf_cfs_rq_list but this ordering is used to update and propagate the update from leaf down to root." Instead of trying to work through all these cases and trying to reproduce the very high loads that produced the lockup to begin with, simplify the code temporarily by reverting a9e7f6544b9c - which change was clearly not thought through completely. This (hopefully) gives us a kernel that doesn't lock up so people can continue to enjoy their holidays without worrying about regressions. ;-) [ mingo: Wrote changelog, fixed weird spelling in code comment while at it. ] Analyzed-by: Xie XiuQi Analyzed-by: Vincent Guittot Reported-by: Zhipeng Xie Reported-by: Sargun Dhillon Reported-by: Xie XiuQi Tested-by: Zhipeng Xie Tested-by: Sargun Dhillon Signed-off-by: Linus Torvalds Acked-by: Vincent Guittot Cc: # v4.13+ Cc: Bin Li Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Tejun Heo Cc: Thomas Gleixner Fixes: a9e7f6544b9c ("sched/fair: Fix O(nr_cgroups) in load balance path") Link: http://lkml.kernel.org/r/1545879866-27809-1-git-send-email-xiexiuqi@huawei.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman commit 8c47bf0c17c9a95b613520bfdafc3fab457dcd4c Author: Sohil Mehta Date: Wed Nov 21 15:29:33 2018 -0800 iommu/vt-d: Handle domain agaw being less than iommu agaw commit 3569dd07aaad71920c5ea4da2d5cc9a167c1ffd4 upstream. The Intel IOMMU driver opportunistically skips a few top level page tables from the domain paging directory while programming the IOMMU context entry. However there is an implicit assumption in the code that domain's adjusted guest address width (agaw) would always be greater than IOMMU's agaw. The IOMMU capabilities in an upcoming platform cause the domain's agaw to be lower than IOMMU's agaw. The issue is seen when the IOMMU supports both 4-level and 5-level paging. The domain builds a 4-level page table based on agaw of 2. However the IOMMU's agaw is set as 3 (5-level). In this case the code incorrectly tries to skip page page table levels. This causes the IOMMU driver to avoid programming the context entry. The fix handles this case and programs the context entry accordingly. Fixes: de24e55395698 ("iommu/vt-d: Simplify domain_context_mapping_one") Cc: Cc: Ashok Raj Cc: Jacob Pan Cc: Lu Baolu Reviewed-by: Lu Baolu Reported-by: Ramos Falcon, Ernesto R Tested-by: Ricardo Neri Signed-off-by: Sohil Mehta Signed-off-by: Joerg Roedel Signed-off-by: Greg Kroah-Hartman commit de3b4f54c261d7169c9e9f02da388ed0cc62cd08 Author: Steve Wise Date: Thu Dec 20 14:00:11 2018 -0800 RDMA/iwcm: Don't copy past the end of dev_name() string commit d53ec8af56d5163f8a42e961ece3aeb5c560e79d upstream. We now use dev_name(&ib_device->dev) instead of ib_device->name in iwpm messages. The name field in struct device is a const char *, where as ib_device->name is a char array of size IB_DEVICE_NAME_MAX, and it is pre-initialized to zeros. Since iw_cm_map() was using memcpy() to copy in the device name, and copying IWPM_DEVNAME_SIZE bytes, it ends up copying past the end of the source device name string and copying random bytes. This results in iwpmd failing the REGISTER_PID request from iwcm. Thus port mapping is broken. Validate the device and if names, and use strncpy() to inialize the entire message field. Fixes: 896de0090a85 ("RDMA/core: Use dev_name instead of ibdev->name") Cc: stable@vger.kernel.org Signed-off-by: Steve Wise Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman commit e43876157f18a600a9ed75ce4dcb4129e1b35ccf Author: Bart Van Assche Date: Mon Dec 17 13:20:40 2018 -0800 RDMA/srpt: Fix a use-after-free in the channel release code commit ed041919f0d23c109d52cde8da6ddc211c52d67e upstream. This patch avoids that KASAN sporadically reports the following: BUG: KASAN: use-after-free in rxe_run_task+0x1e/0x60 [rdma_rxe] Read of size 1 at addr ffff88801c50d8f4 by task check/24830 CPU: 4 PID: 24830 Comm: check Not tainted 4.20.0-rc6-dbg+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x86/0xca print_address_description+0x71/0x239 kasan_report.cold.5+0x242/0x301 __asan_load1+0x47/0x50 rxe_run_task+0x1e/0x60 [rdma_rxe] rxe_post_send+0x4bd/0x8d0 [rdma_rxe] srpt_zerolength_write+0xe1/0x160 [ib_srpt] srpt_close_ch+0x8b/0xe0 [ib_srpt] srpt_set_enabled+0xe7/0x150 [ib_srpt] srpt_tpg_enable_store+0xc0/0x100 [ib_srpt] configfs_write_file+0x157/0x1d0 __vfs_write+0xd7/0x3d0 vfs_write+0x102/0x290 ksys_write+0xab/0x130 __x64_sys_write+0x43/0x50 do_syscall_64+0x71/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe Allocated by task 13856: save_stack+0x43/0xd0 kasan_kmalloc+0xc7/0xe0 kasan_slab_alloc+0x11/0x20 kmem_cache_alloc+0x105/0x320 rxe_alloc+0xff/0x1f0 [rdma_rxe] rxe_create_qp+0x9f/0x160 [rdma_rxe] ib_create_qp+0xf5/0x690 [ib_core] rdma_create_qp+0x6a/0x140 [rdma_cm] srpt_cm_req_recv.cold.59+0x1588/0x237b [ib_srpt] srpt_rdma_cm_req_recv.isra.35+0x1d5/0x220 [ib_srpt] srpt_rdma_cm_handler+0x6f/0x100 [ib_srpt] cma_listen_handler+0x59/0x60 [rdma_cm] cma_ib_req_handler+0xd5b/0x2570 [rdma_cm] cm_process_work+0x2e/0x110 [ib_cm] cm_work_handler+0x2aae/0x502b [ib_cm] process_one_work+0x481/0x9e0 worker_thread+0x67/0x5b0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 Freed by task 3440: save_stack+0x43/0xd0 __kasan_slab_free+0x139/0x190 kasan_slab_free+0xe/0x10 kmem_cache_free+0xbc/0x330 rxe_elem_release+0x66/0xe0 [rdma_rxe] rxe_destroy_qp+0x3f/0x50 [rdma_rxe] ib_destroy_qp+0x140/0x360 [ib_core] srpt_release_channel_work+0xdc/0x310 [ib_srpt] process_one_work+0x481/0x9e0 worker_thread+0x67/0x5b0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 Cc: Sergey Gorenko Cc: Max Gurtovoy Cc: Laurence Oberman Cc: Signed-off-by: Bart Van Assche Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit a64a09edaba6b426c61f8f48fa50f0a6d6c96028 Author: Alexander Shishkin Date: Wed Dec 19 17:19:20 2018 +0200 stm class: Fix a module refcount leak in policy creation error path commit c18614a1a11276837bdd44403d84d207c9951538 upstream. Commit c7fd62bc69d0 ("stm class: Introduce framing protocol drivers") adds a bug into the error path of policy creation, that would do a module_put() on a wrong module, if one tried to create a policy for an stm device which already has a policy, using a different protocol. IOW, | mkdir /config/stp-policy/dummy_stm.0:p_basic.test | mkdir /config/stp-policy/dummy_stm.0:p_sys-t.test # puts "p_basic" | mkdir /config/stp-policy/dummy_stm.0:p_sys-t.test # "p_basic" -> -1 throws: | general protection fault: 0000 [#1] SMP PTI | CPU: 3 PID: 2887 Comm: mkdir | RIP: 0010:module_put.part.31+0xe/0x90 | Call Trace: | module_put+0x13/0x20 | stm_put_protocol+0x11/0x20 [stm_core] | stp_policy_make+0xf1/0x210 [stm_core] | ? __kmalloc+0x183/0x220 | ? configfs_mkdir+0x10d/0x4c0 | configfs_mkdir+0x169/0x4c0 | vfs_mkdir+0x108/0x1c0 | do_mkdirat+0xe8/0x110 | __x64_sys_mkdir+0x1b/0x20 | do_syscall_64+0x5a/0x140 | entry_SYSCALL_64_after_hwframe+0x44/0xa9 Correct this sad mistake by calling calling 'put' on the correct reference, which happens to match another error path in the same function, so we consolidate the two at the same time. Signed-off-by: Alexander Shishkin Fixes: c7fd62bc69d0 ("stm class: Introduce framing protocol drivers") Reported-by: Ammy Yi Cc: stable Signed-off-by: Greg Kroah-Hartman commit 739f7f1b44f78eb3f4c2dd558acb0d526c011558 Author: Sagi Grimberg Date: Thu Oct 25 12:40:57 2018 -0700 rxe: fix error completion wr_id and qp_num commit e48d8ed9c6193502d849b35767fd18e20bbd7ba2 upstream. Error completions must still contain a valid wr_id and qp_num such that the consumer can rely on. Correctly fill these fields in receive error completions. Reported-by: Walker Benjamin Cc: stable@vger.kernel.org Signed-off-by: Sagi Grimberg Reviewed-by: Zhu Yanjun Tested-by: Zhu Yanjun Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman commit 7030ab2d41dd189a5de75a6addd6e178b20140c9 Author: Dominique Martinet Date: Mon Nov 5 09:52:48 2018 +0100 9p/net: put a lower bound on msize commit 574d356b7a02c7e1b01a1d9cba8a26b3c2888f45 upstream. If the requested msize is too small (either from command line argument or from the server version reply), we won't get any work done. If it's *really* too small, nothing will work, and this got caught by syzbot recently (on a new kmem_cache_create_usercopy() call) Just set a minimum msize to 4k in both code paths, until someone complains they have a use-case for a smaller msize. We need to check in both mount option and server reply individually because the msize for the first version request would be unchecked with just a global check on clnt->msize. Link: http://lkml.kernel.org/r/1541407968-31350-1-git-send-email-asmadeus@codewreck.org Reported-by: syzbot+0c1d61e4db7db94102ca@syzkaller.appspotmail.com Signed-off-by: Dominique Martinet Cc: Eric Van Hensbergen Cc: Latchesar Ionkov Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman commit 0fc78fa0e24961a5b84099d4721a2559ed56c3a7 Author: Mircea Caprioru Date: Thu Dec 6 15:53:15 2018 +0200 iio: dac: ad5686: fix bit shift read register commit 0e76df5c978338f3051e5126fc0c4245c57a307a upstream. This patch solves the register readback issue with the bit shift. When the dac resolution was lower than the register size (ex. 12 bits out of 16 bits) the readback value was not shifted with the difference in bits and the value was higher. Also a mask is applied on the read value in order to get the value relative to the actual bit size. Fixes: 0357e488b8 ("iio:dac:ad5686: Refactor the driver") Signed-off-by: Mircea Caprioru Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 30533049aadf3e25b79792c0da7eb8609c9707c5 Author: Evan Green Date: Tue Dec 4 11:14:19 2018 -0800 iio: adc: qcom-spmi-adc5: Initialize prescale properly commit db23d88756abd38e0995ea8449d0025b3de4b26b upstream. adc5_get_dt_data uses a local, prop, feeds it to adc5_get_dt_channel_data, and then puts the result into adc->chan_props. The problem is adc5_get_dt_channel_data may not initialize that structure fully, so a garbage value is used for prescale if the optional "qcom,pre-scaling" is not defined in DT. adc5_read_raw then uses this as an array index, generating a crash that looks like this: [ 6.683186] Unable to handle kernel paging request at virtual address ffffff90e78c7964 Call trace: qcom_vadc_scale_code_voltage_factor+0x74/0x104 qcom_vadc_scale_hw_calib_die_temp+0x20/0x60 qcom_adc5_hw_scale+0x78/0xa4 adc5_read_raw+0x3d0/0x65c iio_channel_read+0x240/0x30c iio_read_channel_processed+0x10c/0x150 qpnp_tm_get_temp+0xc0/0x40c of_thermal_get_temp+0x7c/0x98 thermal_zone_get_temp+0xac/0xd8 thermal_zone_device_update+0xc0/0x38c qpnp_tm_probe+0x624/0x81c platform_drv_probe+0xe4/0x11c really_probe+0x188/0x3fc driver_probe_device+0xb8/0x188 __device_attach_driver+0x114/0x180 bus_for_each_drv+0xd8/0x118 __device_attach+0x180/0x27c device_initial_probe+0x20/0x2c bus_probe_device+0x78/0x124 deferred_probe_work_func+0xfc/0x138 process_one_work+0x3d8/0x8b0 process_scheduled_works+0x48/0x6c worker_thread+0x488/0x7cc kthread+0x24c/0x264 ret_from_fork+0x10/0x18 Unfortunately, when I went to add the initializer for this and tried to boot it, my machine shut down immediately, complaining that it was hotter than the sun. It appears that adc5_chans_pmic and adc5_chans_rev2 were initializing prescale_index as if it were directly a divisor, rather than the index into adc5_prescale_ratios that it is. Fix the uninitialized value, and change the static initialization to use indices into adc5_prescale_ratios. Signed-off-by: Evan Green Reviewed-by: Matthias Kaehlcke Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 21e5f402c6c25c1b65fa29d9927a601e661d175a Author: Breno Leitao Date: Wed Nov 21 17:21:09 2018 -0200 powerpc/tm: Set MSR[TS] just prior to recheckpoint commit e1c3743e1a20647c53b719dbf28b48f45d23f2cd upstream. On a signal handler return, the user could set a context with MSR[TS] bits set, and these bits would be copied to task regs->msr. At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set, several __get_user() are called and then a recheckpoint is executed. This is a problem since a page fault (in kernel space) could happen when calling __get_user(). If it happens, the process MSR[TS] bits were already set, but recheckpoint was not executed, and SPRs are still invalid. The page fault can cause the current process to be de-scheduled, with MSR[TS] active and without tm_recheckpoint() being called. More importantly, without TEXASR[FS] bit set also. Since TEXASR might not have the FS bit set, and when the process is scheduled back, it will try to reclaim, which will be aborted because of the CPU is not in the suspended state, and, then, recheckpoint. This recheckpoint will restore thread->texasr into TEXASR SPR, which might be zero, hitting a BUG_ON(). kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! cpu 0xb: Vector: 700 (Program Check) at [c00000041f1576d0] pc: c000000000054550: restore_gprs+0xb0/0x180 lr: 0000000000000000 sp: c00000041f157950 msr: 8000000100021033 current = 0xc00000041f143000 paca = 0xc00000000fb86300 softe: 0 irq_happened: 0x01 pid = 1021, comm = kworker/11:1 kernel BUG at /build/linux-sf3Co9/linux-4.9.30/arch/powerpc/kernel/tm.S:434! Linux version 4.9.0-3-powerpc64le (debian-kernel@lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) enter ? for help [c00000041f157b30] c00000000001bc3c tm_recheckpoint.part.11+0x6c/0xa0 [c00000041f157b70] c00000000001d184 __switch_to+0x1e4/0x4c0 [c00000041f157bd0] c00000000082eeb8 __schedule+0x2f8/0x990 [c00000041f157cb0] c00000000082f598 schedule+0x48/0xc0 [c00000041f157ce0] c0000000000f0d28 worker_thread+0x148/0x610 [c00000041f157d80] c0000000000f96b0 kthread+0x120/0x140 [c00000041f157e30] c00000000000c0e0 ret_from_kernel_thread+0x5c/0x7c This patch simply delays the MSR[TS] set, so, if there is any page fault in the __get_user() section, it does not have regs->msr[TS] set, since the TM structures are still invalid, thus avoiding doing TM operations for in-kernel exceptions and possible process reschedule. With this patch, the MSR[TS] will only be set just before recheckpointing and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in invalid state. Other than that, if CONFIG_PREEMPT is set, there might be a preemption just after setting MSR[TS] and before tm_recheckpoint(), thus, this block must be atomic from a preemption perspective, thus, calling preempt_disable/enable() on this code. It is not possible to move tm_recheckpoint to happen earlier, because it is required to get the checkpointed registers from userspace, with __get_user(), thus, the only way to avoid this undesired behavior is delaying the MSR[TS] set. The 32-bits signal handler seems to be safe this current issue, but, it might be exposed to the preemption issue, thus, disabling preemption in this chunk of code. Changes from v2: * Run the critical section with preempt_disable. Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals") Cc: stable@vger.kernel.org (v3.9+) Signed-off-by: Breno Leitao Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman commit cecc892029273c6cc800d7ac40902673bc671e17 Author: Greg Kroah-Hartman Date: Fri Jan 11 08:05:32 2019 +0100 Revert "powerpc/tm: Unset MSR[TS] if not recheckpointing" This reverts commit d412deb85a4aada382352a8202beb7af8921cd53 which is commit 6f5b9f018f4c7686fd944d920209d1382d320e4e upstream. It breaks the powerpc build, so drop it from the tree until a fix goes upstream. Reported-by: Guenter Roeck Cc: Breno Leitao Cc: Michal Suchánek Cc: Michael Ellerman Cc: Christoph Biedl Signed-off-by: Greg Kroah-Hartman commit 0a1246ed50c020045b76f27f72cbe382539a375a Author: J. Bruce Fields Date: Thu Nov 15 11:21:40 2018 -0500 nfsd4: zero-length WRITE should succeed commit fdec6114ee1f0f43b1ad081ad8d46b23ba126d70 upstream. Zero-length writes are legal; from 5661 section 18.32.3: "If the count is zero, the WRITE will succeed and return a count of zero subject to permissions checking". This check is unnecessary and is causing zero-length reads to return EINVAL. Cc: stable@vger.kernel.org Fixes: 3fd9557aec91 "NFSD: Refactor the generic write vector fill helper" Cc: Chuck Lever Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit c7e10e59d1e5452104088ca057409d719c86354d Author: Chuck Lever Date: Wed Dec 19 10:58:13 2018 -0500 xprtrdma: Yet another double DMA-unmap commit e2f34e26710bfaa545a9d9cd0c70137406401467 upstream. While chasing yet another set of DMAR fault reports, I noticed that the frwr recycler conflates whether or not an MR has been DMA unmapped with frwr->fr_state. Actually the two have only an indirect relationship. It's in fact impossible to guess reliably whether the MR has been DMA unmapped based on its fr_state field, especially as the surrounding code and its assumptions have changed over time. A better approach is to track the DMA mapping status explicitly so that the recycler is less brittle to unexpected situations, and attempts to DMA-unmap a second time are prevented. Signed-off-by: Chuck Lever Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Anna Schumaker Signed-off-by: Greg Kroah-Hartman commit 2fd246ade5159753ebaa53820e32078a75797c68 Author: Benjamin Coddington Date: Thu Nov 1 13:39:49 2018 -0400 lockd: Show pid of lockd for remote locks commit b8eee0e90f9797b747113638bc75e739b192ad38 upstream. Commit 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks") specified that the l_pid returned for F_GETLK on a local file that has a remote lock should be the pid of the lock manager process. That commit, while updating other filesystems, failed to update lockd, such that locks created by lockd had their fl_pid set to that of the remote process holding the lock. Fix that here to be the pid of lockd. Also, fix the client case so that the returned lock pid is negative, which indicates a remote lock on a remote file. Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific...") Cc: stable@vger.kernel.org Signed-off-by: Benjamin Coddington Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit 39e1be324c2f9048b013aaa190acf91b3f23b1a8 Author: Jarkko Nikula Date: Tue Oct 23 14:45:52 2018 +0300 PCI / PM: Allow runtime PM without callback functions commit c5eb1190074cfb14c5d9cac692f1912eecf1a5e4 upstream. a9c8088c7988 ("i2c: i801: Don't restore config registers on runtime PM") nullified the runtime PM suspend/resume callback pointers while keeping the runtime PM enabled. This caused the SMBus PCI device to stay in D0 with /sys/devices/.../power/runtime_status showing "error" when the runtime PM framework attempted to autosuspend the device. This is due to PCI bus runtime PM, which checks for driver runtime PM callbacks and returns -ENOSYS if they are not set. Since i2c-i801.c doesn't need to do anything device-specific for runtime PM, Jean Delvare proposed this be fixed in the PCI core rather than adding dummy runtime PM callback functions in the PCI drivers. Change pci_pm_runtime_suspend()/pci_pm_runtime_resume() so they allow changing the PCI device power state during runtime PM transitions even if the driver supplies no runtime PM callbacks. This fixes the runtime PM regression on i2c-i801.c. It is not obvious why the code previously required the runtime PM callbacks. The test has been there since the code was introduced by 6cbf82148ff2 ("PCI PM: Run-time callbacks for PCI bus type"). On the other hand, a similar change was done to generic runtime PM callbacks in 05aa55dddb9e ("PM / Runtime: Lenient generic runtime pm callbacks"). Fixes: a9c8088c7988 ("i2c: i801: Don't restore config registers on runtime PM") Reported-by: Mika Westerberg Signed-off-by: Jarkko Nikula Signed-off-by: Bjorn Helgaas Reviewed-by: Jean Delvare Reviewed-by: Rafael J. Wysocki Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Greg Kroah-Hartman commit 33068413505670313cdc30cff68066a865548b81 Author: Ondrej Mosnacek Date: Tue Oct 23 09:02:17 2018 +0200 selinux: policydb - fix byte order and alignment issues commit 5df275cd4cf51c86d49009f1397132f284ba515e upstream. Do the LE conversions before doing the Infiniband-related range checks. The incorrect checks are otherwise causing a failure to load any policy with an ibendportcon rule on BE systems. This can be reproduced by running (on e.g. ppc64): cat >my_module.cil < Cc: Eli Cohen Cc: James Morris Cc: Doug Ledford Cc: # 4.13+ Fixes: a806f7a1616f ("selinux: Create policydb version for Infiniband support") Signed-off-by: Ondrej Mosnacek Acked-by: Stephen Smalley Signed-off-by: Paul Moore Signed-off-by: Greg Kroah-Hartman commit 047ecbc9fa4eb98afafbb2b67818a95b96c8ed0e Author: Larry Finger Date: Mon Nov 19 20:01:24 2018 +0200 b43: Fix error in cordic routine commit 8ea3819c0bbef57a51d8abe579e211033e861677 upstream. The cordic routine for calculating sines and cosines that was added in commit 6f98e62a9f1b ("b43: update cordic code to match current specs") contains an error whereby a quantity declared u32 can in fact go negative. This problem was detected by Priit Laes who is switching b43 to use the routine in the library functions of the kernel. Fixes: 986504540306 ("b43: make cordic common (LP-PHY and N-PHY need it)") Reported-by: Priit Laes Cc: Rafał Miłecki Cc: Stable # 2.6.34 Signed-off-by: Larry Finger Signed-off-by: Priit Laes Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman commit a62b07e9fe15e26e40a16dd132cf91b4a0f2d7ca Author: Andreas Gruenbacher Date: Tue Dec 4 15:06:27 2018 +0100 gfs2: Fix loop in gfs2_rbm_find commit 2d29f6b96d8f80322ed2dd895bca590491c38d34 upstream. Fix the resource group wrap-around logic in gfs2_rbm_find that commit e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the same bitmaps; there is a risk that future changes will turn this into an endless loop. Fixes: e579ed4f44 ("GFS2: Introduce rbm field bii") Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Andreas Gruenbacher Signed-off-by: Bob Peterson Signed-off-by: Greg Kroah-Hartman commit dfb1922adf7ad76e2e4fc1f77ed1fb5c27a13e01 Author: Andreas Gruenbacher Date: Mon Nov 26 18:45:35 2018 +0100 gfs2: Get rid of potential double-freeing in gfs2_create_inode commit 6ff9b09e00a441599f3aacdf577254455a048bc9 upstream. In gfs2_create_inode, after setting and releasing the acl / default_acl, the acl / default_acl pointers are not set to NULL as they should be. In that state, when the function reaches label fail_free_acls, gfs2_create_inode will try to release the same acls again. Fix that by setting the pointers to NULL after releasing the acls. Slightly simplify the logic. Also, posix_acl_release checks for NULL already, so there is no need to duplicate those checks here. Fixes: e01580bf9e4d ("gfs2: use generic posix ACL infrastructure") Reported-by: Pan Bian Cc: Christoph Hellwig Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Andreas Gruenbacher Signed-off-by: Bob Peterson Signed-off-by: Greg Kroah-Hartman commit d6d479985765e1c6d3bdbb25d717caf266094f1c Author: Vasily Averin Date: Thu Nov 15 13:18:56 2018 +0300 dlm: memory leaks on error path in dlm_user_request() commit d47b41aceeadc6b58abc9c7c6485bef7cfb75636 upstream. According to comment in dlm_user_request() ua should be freed in dlm_free_lkb() after successful attach to lkb. However ua is attached to lkb not in set_lock_args() but later, inside request_lock(). Fixes 597d0cae0f99 ("[DLM] dlm: user locks") Cc: stable@kernel.org # 2.6.19 Signed-off-by: Vasily Averin Signed-off-by: David Teigland Signed-off-by: Greg Kroah-Hartman commit b956f5bf6d296e318308542e590aa155fd8df9b2 Author: Vasily Averin Date: Thu Nov 15 13:18:24 2018 +0300 dlm: lost put_lkb on error path in receive_convert() and receive_unlock() commit c0174726c3976e67da8649ac62cae43220ae173a upstream. Fixes 6d40c4a708e0 ("dlm: improve error and debug messages") Cc: stable@kernel.org # 3.5 Signed-off-by: Vasily Averin Signed-off-by: David Teigland Signed-off-by: Greg Kroah-Hartman commit 1f00b0a6bb02ab6a3288e6de8dc97268c37c6be5 Author: Vasily Averin Date: Thu Nov 15 13:18:18 2018 +0300 dlm: possible memory leak on error path in create_lkb() commit 23851e978f31eda8b2d01bd410d3026659ca06c7 upstream. Fixes 3d6aa675fff9 ("dlm: keep lkbs in idr") Cc: stable@kernel.org # 3.1 Signed-off-by: Vasily Averin Signed-off-by: David Teigland Signed-off-by: Greg Kroah-Hartman commit 78460f37a784c5e0d299438adfa461d80121148a Author: Vasily Averin Date: Thu Nov 15 13:15:05 2018 +0300 dlm: fixed memory leaks after failed ls_remove_names allocation commit b982896cdb6e6a6b89d86dfb39df489d9df51e14 upstream. If allocation fails on last elements of array need to free already allocated elements. v2: just move existing out_rsbtbl label to right place Fixes 789924ba635f ("dlm: fix race between remove and lookup") Cc: stable@kernel.org # 3.6 Signed-off-by: Vasily Averin Signed-off-by: David Teigland Signed-off-by: Greg Kroah-Hartman commit 0a2fff2428f1e175932dc3cf115b68c868e3a839 Author: Jaegeuk Kim Date: Tue Dec 18 09:25:37 2018 -0800 dm: do not allow readahead to limit IO size commit c6d6e9b0f6b4201c77f2cea3964dd122697e3543 upstream. Update DM to set the bdi's io_pages. This fixes reads to be capped at the device's max request size (even if user's read IO exceeds the established readahead setting). Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting") Cc: stable@vger.kernel.org Reviewed-by: Jens Axboe Signed-off-by: Jaegeuk Kim Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit d902258a8997d9e1007c41da5205a831756858c7 Author: Damien Le Moal Date: Mon Dec 17 15:14:05 2018 +0900 block: mq-deadline: Fix write completion handling commit 7211aef86f79583e59b88a0aba0bc830566f7e8e upstream. For a zoned block device using mq-deadline, if a write request for a zone is received while another write was already dispatched for the same zone, dd_dispatch_request() will return NULL and the newly inserted write request is kept in the scheduler queue waiting for the ongoing zone write to complete. With this behavior, when no other request has been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty and blk_mq_sched_mark_restart_hctx() not called. This in turn leads to __blk_mq_free_request() call of blk_mq_sched_restart() to not run the queue when the already dispatched write request completes. The newly dispatched request stays stuck in the scheduler queue until eventually another request is submitted. This problem does not affect SCSI disk as the SCSI stack handles queue restart on request completion. However, this problem is can be triggered the nullblk driver with zoned mode enabled. Fix this by always requesting a queue restart in dd_dispatch_request() if no request was dispatched while WRITE requests are queued. Fixes: 5700f69178e9 ("mq-deadline: Introduce zone locking support") Cc: Signed-off-by: Damien Le Moal Signed-off-by: Greg Kroah-Hartman Add missing export of blk_mq_sched_restart() Signed-off-by: Jens Axboe commit 7571b18bcad56132570f86695b641b56f403991e Author: Ming Lei Date: Wed Dec 12 19:44:34 2018 +0800 block: deactivate blk_stat timer in wbt_disable_default() commit 544fbd16a461a318cd80537d1331c0df5c6cf930 upstream. rwb_enabled() can't be changed when there is any inflight IO. wbt_disable_default() may set rwb->wb_normal as zero, however the blk_stat timer may still be pending, and the timer function will update wrb->wb_normal again. This patch introduces blk_stat_deactivate() and applies it in wbt_disable_default(), then the following IO hang triggered when running parted & switching io scheduler can be fixed: [ 369.937806] INFO: task parted:3645 blocked for more than 120 seconds. [ 369.938941] Not tainted 4.20.0-rc6-00284-g906c801e5248 #498 [ 369.939797] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 369.940768] parted D 0 3645 3239 0x00000000 [ 369.941500] Call Trace: [ 369.941874] ? __schedule+0x6d9/0x74c [ 369.942392] ? wbt_done+0x5e/0x5e [ 369.942864] ? wbt_cleanup_cb+0x16/0x16 [ 369.943404] ? wbt_done+0x5e/0x5e [ 369.943874] schedule+0x67/0x78 [ 369.944298] io_schedule+0x12/0x33 [ 369.944771] rq_qos_wait+0xb5/0x119 [ 369.945193] ? karma_partition+0x1c2/0x1c2 [ 369.945691] ? wbt_cleanup_cb+0x16/0x16 [ 369.946151] wbt_wait+0x85/0xb6 [ 369.946540] __rq_qos_throttle+0x23/0x2f [ 369.947014] blk_mq_make_request+0xe6/0x40a [ 369.947518] generic_make_request+0x192/0x2fe [ 369.948042] ? submit_bio+0x103/0x11f [ 369.948486] ? __radix_tree_lookup+0x35/0xb5 [ 369.949011] submit_bio+0x103/0x11f [ 369.949436] ? blkg_lookup_slowpath+0x25/0x44 [ 369.949962] submit_bio_wait+0x53/0x7f [ 369.950469] blkdev_issue_flush+0x8a/0xae [ 369.951032] blkdev_fsync+0x2f/0x3a [ 369.951502] do_fsync+0x2e/0x47 [ 369.951887] __x64_sys_fsync+0x10/0x13 [ 369.952374] do_syscall_64+0x89/0x149 [ 369.952819] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 369.953492] RIP: 0033:0x7f95a1e729d4 [ 369.953996] Code: Bad RIP value. [ 369.954456] RSP: 002b:00007ffdb570dd48 EFLAGS: 00000246 ORIG_RAX: 000000000000004a [ 369.955506] RAX: ffffffffffffffda RBX: 000055c2139c6be0 RCX: 00007f95a1e729d4 [ 369.956389] RDX: 0000000000000001 RSI: 0000000000001261 RDI: 0000000000000004 [ 369.957325] RBP: 0000000000000002 R08: 0000000000000000 R09: 000055c2139c6ce0 [ 369.958199] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c2139c0380 [ 369.959143] R13: 0000000000000004 R14: 0000000000000100 R15: 0000000000000008 Cc: stable@vger.kernel.org Cc: Paolo Valente Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit db4570bb0d7b516c529a842802df972ad85a211d Author: Matthew Wilcox Date: Fri Dec 28 07:22:26 2018 -0800 Fix failure path in alloc_pid() commit 1a80dade010c7a7f4885a4c4c2a7ac22cc7b34df upstream. The failure path removes the allocated PIDs from the wrong namespace. This could lead to us inadvertently reusing PIDs in the leaf namespace and leaking PIDs in parent namespaces. Fixes: 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API") Cc: Signed-off-by: Matthew Wilcox Acked-by: "Eric W. Biederman" Reviewed-by: Oleg Nesterov Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 1fdd2859daca9819def080c87455e4ba377438af Author: Rafael J. Wysocki Date: Thu Dec 13 19:27:47 2018 +0100 driver core: Add missing dev->bus->need_parent_lock checks commit e121a833745b4708b660e3fe6776129c2956b041 upstream. __device_release_driver() has to check dev->bus->need_parent_lock before dropping the parent lock and acquiring it again as it may attempt to drop a lock that hasn't been acquired or lock a device that shouldn't be locked and create a lock imbalance. Fixes: 8c97a46af04b (driver core: hold dev's parent lock when needed) Signed-off-by: Rafael J. Wysocki Cc: stable Reviewed-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman commit a38adf5a8ced455db3cf9f0f2cd835dae224dab9 Author: Dennis Krein Date: Fri Oct 26 07:38:24 2018 -0700 srcu: Lock srcu_data structure in srcu_gp_start() commit eb4c2382272ae7ae5d81fdfa5b7a6c86146eaaa4 upstream. The srcu_gp_start() function is called with the srcu_struct structure's ->lock held, but not with the srcu_data structure's ->lock. This is problematic because this function accesses and updates the srcu_data structure's ->srcu_cblist, which is protected by that lock. Failing to hold this lock can result in corruption of the SRCU callback lists, which in turn can result in arbitrarily bad results. This commit therefore makes srcu_gp_start() acquire the srcu_data structure's ->lock across the calls to rcu_segcblist_advance() and rcu_segcblist_accelerate(), thus preventing this corruption. Reported-by: Bart Van Assche Reported-by: Christoph Hellwig Reported-by: Sebastian Kuzminsky Signed-off-by: Dennis Krein Signed-off-by: Paul E. McKenney Tested-by: Dennis Krein Cc: # 4.16.x Signed-off-by: Greg Kroah-Hartman commit 9dfe7ee5cd357c58c2447708f1901e9e0cd3d82d Author: Takashi Iwai Date: Wed Jan 2 17:12:21 2019 +0100 ALSA: usb-audio: Always check descriptor sizes in parser code commit 3e96d7280f16e2f787307f695a31296b9e4a1cd7 upstream. There are a few places where we access the data without checking the actual object size from the USB audio descriptor. This may result in OOB access, as recently reported. This patch addresses these missing checks. Most of added codes are simple bLength checks in the caller side. For the input and output terminal parsers, we put the length check in the parser functions. For the input terminal, a new argument is added to distinguish between UAC1 and the rest, as they treat different objects. Reported-by: Mathias Payer Reported-by: Hui Peng Tested-by: Hui Peng Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 0005a4680fd49297b63056156a70d388a2844471 Author: Hui Peng Date: Tue Dec 25 18:11:52 2018 -0500 ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks commit cbb2ebf70daf7f7d97d3811a2ff8e39655b8c184 upstream. In `create_composite_quirk`, the terminating condition of for loops is `quirk->ifnum < 0`. So any composite quirks should end with `struct snd_usb_audio_quirk` object with ifnum < 0. for (quirk = quirk_comp->data; quirk->ifnum >= 0; ++quirk) { ..... } the data field of Bower's & Wilkins PX headphones usb device device quirks do not end with {.ifnum = -1}, wihch may result in out-of-bound read. This Patch fix the bug by adding an ending quirk object. Fixes: 240a8af929c7 ("ALSA: usb-audio: Add a quirck for B&W PX headphones") Signed-off-by: Hui Peng Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit cd5564f4066372b5839d36e4bd752a9c51e0c7e3 Author: Takashi Iwai Date: Wed Dec 19 14:04:47 2018 +0100 ALSA: usb-audio: Check mixer unit descriptors more strictly commit 0bfe5e434e6665b3590575ec3c5e4f86a1ce51c9 upstream. We've had some sanity checks of the mixer unit descriptors but they are too loose and some corner cases are overlooked. Add more strict checks in uac_mixer_unit_get_channels() for avoiding possible OOB accesses by malformed descriptors. This also changes the semantics of uac_mixer_unit_get_channels() slightly. Now it returns zero for the cases where the descriptor lacks of bmControls instead of -EINVAL. Then the caller side skips the mixer creation for such unit while it keeps parsing it. This corresponds to the case like Maya44. Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 3d2a19f849453485c6527f395fe35f85414e4f23 Author: Takashi Iwai Date: Wed Dec 19 12:36:27 2018 +0100 ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit() commit f4351a199cc120ff9d59e06d02e8657d08e6cc46 upstream. The parser for the processing unit reads bNrInPins field before the bLength sanity check, which may lead to an out-of-bound access when a malformed descriptor is given. Fix it by assignment after the bLength check. Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit e189fc044135a38f0ab5ec62e465d0eef603459c Author: Dan Carpenter Date: Tue Jan 8 10:43:30 2019 +0300 ALSA: cs46xx: Potential NULL dereference in probe commit 1524f4e47f90b27a3ac84efbdd94c63172246a6f upstream. The "chip->dsp_spos_instance" can be NULL on some of the ealier error paths in snd_cs46xx_create(). Reported-by: "Yavuz, Tuba" Signed-off-by: Dan Carpenter Cc: Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit 0fde9064fea50354221c91fa63bcdd75a81b517d Author: Brad Love Date: Wed Dec 19 12:07:01 2018 -0500 media: cx23885: only reset DMA on problematic CPUs commit 4bd46aa0353e022c2401a258e93b107880a66533 upstream. It is reported that commit 95f408bbc4e4 ("media: cx23885: Ryzen DMA related RiSC engine stall fixes") caused regresssions with other CPUs. Ensure that the quirk will be applied only for the CPUs that are known to cause problems. A module option is added for explicit control of the behaviour. Fixes: 95f408bbc4e4 ("media: cx23885: Ryzen DMA related RiSC engine stall fixes") Signed-off-by: Brad Love Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit e9ef9dd3986a4d6294eef5ea3ce5e7dd32c2641d Author: Huang Ying Date: Fri Dec 28 00:39:53 2018 -0800 mm, swap: fix swapoff with KSM pages commit 7af7a8e19f0c5425ff639b0f0d2d244c2a647724 upstream. KSM pages may be mapped to the multiple VMAs that cannot be reached from one anon_vma. So during swapin, a new copy of the page need to be generated if a different anon_vma is needed, please refer to comments of ksm_might_need_to_copy() for details. During swapoff, unuse_vma() uses anon_vma (if available) to locate VMA and virtual address mapped to the page, so not all mappings to a swapped out KSM page could be found. So in try_to_unuse(), even if the swap count of a swap entry isn't zero, the page needs to be deleted from swap cache, so that, in the next round a new page could be allocated and swapin for the other mappings of the swapped out KSM page. But this contradicts with the THP swap support. Where the THP could be deleted from swap cache only after the swap count of every swap entry in the huge swap cluster backing the THP has reach 0. So try_to_unuse() is changed in commit e07098294adf ("mm, THP, swap: support to reclaim swap space for THP swapped out") to check that before delete a page from swap cache, but this has broken KSM swapoff too. Fortunately, KSM is for the normal pages only, so the original behavior for KSM pages could be restored easily via checking PageTransCompound(). That is how this patch works. The bug is introduced by e07098294adf ("mm, THP, swap: support to reclaim swap space for THP swapped out"), which is merged by v4.14-rc1. So I think we should backport the fix to from 4.14 on. But Hugh thinks it may be rare for the KSM pages being in the swap device when swapoff, so nobody reports the bug so far. Link: http://lkml.kernel.org/r/20181226051522.28442-1-ying.huang@intel.com Fixes: e07098294adf ("mm, THP, swap: support to reclaim swap space for THP swapped out") Signed-off-by: "Huang, Ying" Reported-by: Hugh Dickins Tested-by: Hugh Dickins Acked-by: Hugh Dickins Cc: Rik van Riel Cc: Johannes Weiner Cc: Minchan Kim Cc: Shaohua Li Cc: Daniel Jordan Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 8992f97a03be42cb89ee7238da56cdd8f49d4bbb Author: Dan Williams Date: Fri Dec 28 00:35:15 2018 -0800 mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL commit 02917e9f8676207a4c577d4d94eae12bf348e9d7 upstream. At Maintainer Summit, Greg brought up a topic I proposed around EXPORT_SYMBOL_GPL usage. The motivation was considerations for when EXPORT_SYMBOL_GPL is warranted and the criteria for taking the exceptional step of reclassifying an existing export. Specifically, I wanted to make the case that although the line is fuzzy and hard to specify in abstract terms, it is nonetheless clear that devm_memremap_pages() and HMM (Heterogeneous Memory Management) have crossed it. The devm_memremap_pages() facility should have been EXPORT_SYMBOL_GPL from the beginning, and HMM as a derivative of that functionality should have naturally picked up that designation as well. Contrary to typical rules, the HMM infrastructure was merged upstream with zero in-tree consumers. There was a promise at the time that those users would be merged "soon", but it has been over a year with no drivers arriving. While the Nouveau driver is about to belatedly make good on that promise it is clear that HMM was targeted first and foremost at an out-of-tree consumer. HMM is derived from devm_memremap_pages(), a facility Christoph and I spearheaded to support persistent memory. It combines a device lifetime model with a dynamically created 'struct page' / memmap array for any physical address range. It enables coordination and control of the many code paths in the kernel built to interact with memory via 'struct page' objects. With HMM the integration goes even deeper by allowing device drivers to hook and manipulate page fault and page free events. One interpretation of when EXPORT_SYMBOL is suitable is when it is exporting stable and generic leaf functionality. The devm_memremap_pages() facility continues to see expanding use cases, peer-to-peer DMA being the most recent, with no clear end date when it will stop attracting reworks and semantic changes. It is not suitable to export devm_memremap_pages() as a stable 3rd party driver API due to the fact that it is still changing and manipulates core behavior. Moreover, it is not in the best interest of the long term development of the core memory management subsystem to permit any external driver to effectively define its own system-wide memory management policies with no encouragement to engage with upstream. I am also concerned that HMM was designed in a way to minimize further engagement with the core-MM. That, with these hooks in place, device-drivers are free to implement their own policies without much consideration for whether and how the core-MM could grow to meet that need. Going forward not only should HMM be EXPORT_SYMBOL_GPL, but the core-MM should be allowed the opportunity and stimulus to change and address these new use cases as first class functionality. Original changelog: hmm_devmem_add(), and hmm_devmem_add_resource() duplicated devm_memremap_pages() and are now simple now wrappers around the core facility to inject a dev_pagemap instance into the global pgmap_radix and hook page-idle events. The devm_memremap_pages() interface is base infrastructure for HMM. HMM has more and deeper ties into the kernel memory management implementation than base ZONE_DEVICE which is itself a EXPORT_SYMBOL_GPL facility. Originally, the HMM page structure creation routines copied the devm_memremap_pages() code and reused ZONE_DEVICE. A cleanup to unify the implementations was discussed during the initial review: http://lkml.iu.edu/hypermail/linux/kernel/1701.2/00812.html Recent work to extend devm_memremap_pages() for the peer-to-peer-DMA facility enabled this cleanup to move forward. In addition to the integration with devm_memremap_pages() HMM depends on other GPL-only symbols: mmu_notifier_unregister_no_release percpu_ref region_intersects __class_create It goes further to consume / indirectly expose functionality that is not exported to any other driver: alloc_pages_vma walk_page_range HMM is derived from devm_memremap_pages(), and extends deep core-kernel fundamentals. Similar to devm_memremap_pages(), mark its entry points EXPORT_SYMBOL_GPL(). [logang@deltatee.com: PCI/P2PDMA: match interface changes to devm_memremap_pages()] Link: http://lkml.kernel.org/r/20181130225911.2900-1-logang@deltatee.com Link: http://lkml.kernel.org/r/154275560565.76910.15919297436557795278.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams Signed-off-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig Cc: Logan Gunthorpe Cc: "Jérôme Glisse" Cc: Balbir Singh , Cc: Michal Hocko Cc: Benjamin Herrenschmidt Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 665aaf56a6ac8a92029e434e48264165e21bc3da Author: Dan Williams Date: Fri Dec 28 00:35:11 2018 -0800 mm, hmm: replace hmm_devmem_pages_create() with devm_memremap_pages() commit bbecd94e6c514a1559fc1a7749a62715958137b1 upstream. Commit e8d513483300 ("memremap: change devm_memremap_pages interface to use struct dev_pagemap") refactored devm_memremap_pages() to allow a dev_pagemap instance to be supplied. Passing in a dev_pagemap interface simplifies the design of pgmap type drivers in that they can rely on container_of() to lookup any private data associated with the given dev_pagemap instance. In addition to the cleanups this also gives hmm users multi-order-radix improvements that arrived with commit ab1b597ee0e4 "mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups" As part of the conversion to the devm_memremap_pages() method of handling the percpu_ref relative to when pages are put, the percpu_ref completion needs to move to hmm_devmem_ref_exit(). See 71389703839e ("mm, zone_device: Replace {get, put}_zone_device_page...") for details. Link: http://lkml.kernel.org/r/154275560053.76910.10870962637383152392.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams Reviewed-by: Christoph Hellwig Reviewed-by: Jérôme Glisse Acked-by: Balbir Singh Cc: Logan Gunthorpe Cc: Michal Hocko Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit bb8067e09571c84f2872dd60ce4f6c1c235d40ea Author: Dan Williams Date: Fri Dec 28 00:35:07 2018 -0800 mm, hmm: use devm semantics for hmm_devmem_{add, remove} commit 58ef15b765af0d2cbe6799ec564f1dc485010ab8 upstream. devm semantics arrange for resources to be torn down when device-driver-probe fails or when device-driver-release completes. Similar to devm_memremap_pages() there is no need to support an explicit remove operation when the users properly adhere to devm semantics. Note that devm_kzalloc() automatically handles allocating node-local memory. Link: http://lkml.kernel.org/r/154275559545.76910.9186690723515469051.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams Reviewed-by: Christoph Hellwig Reviewed-by: Jérôme Glisse Cc: "Jérôme Glisse" Cc: Logan Gunthorpe Cc: Balbir Singh Cc: Michal Hocko Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 097605b3e7a0213a96f24ee92c4ca51369fa71ee Author: Dan Williams Date: Fri Dec 28 00:35:01 2018 -0800 mm, devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support commit 69324b8f48339de2f90fdf2f774687fc6c47629a upstream. In preparation for consolidating all ZONE_DEVICE enabling via devm_memremap_pages(), teach it how to handle the constraints of MEMORY_DEVICE_PRIVATE ranges. [jglisse@redhat.com: call move_pfn_range_to_zone for MEMORY_DEVICE_PRIVATE] Link: http://lkml.kernel.org/r/154275559036.76910.12434636179931292607.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams Reviewed-by: Jérôme Glisse Acked-by: Christoph Hellwig Reported-by: Logan Gunthorpe Reviewed-by: Logan Gunthorpe Cc: Balbir Singh Cc: Michal Hocko Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 4de731b3d9d1f58b7a2aeab63fa4ba70ed0d1f1a Author: Vasily Averin Date: Mon Dec 24 14:44:42 2018 +0300 sunrpc: use SVC_NET() in svcauth_gss_* functions commit b8be5674fa9a6f3677865ea93f7803c4212f3e10 upstream. Signed-off-by: Vasily Averin Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit 505c37984ddb9831cbcc48e0f3a1b4db0256ab59 Author: Vasily Averin Date: Wed Nov 28 11:45:57 2018 +0300 sunrpc: fix cache_head leak due to queued request commit 4ecd55ea074217473f94cfee21bb72864d39f8d7 upstream. After commit d202cce8963d, an expired cache_head can be removed from the cache_detail's hash. However, the expired cache_head may be waiting for a reply from a previously submitted request. Such a cache_head has an increased refcounter and therefore it won't be freed after cache_put(freeme). Because the cache_head was removed from the hash it cannot be found during cache_clean() and can be leaked forever, together with stalled cache_request and other taken resources. In our case we noticed it because an entry in the export cache was holding a reference on a filesystem. Fixes d202cce8963d ("sunrpc: never return expired entries in sunrpc_cache_lookup") Cc: Pavel Tikhomirov Cc: stable@kernel.org # 2.6.35 Signed-off-by: Vasily Averin Reviewed-by: NeilBrown Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman commit 6081e10fcd83eb8590c8da6956baec99a2f70e01 Author: Michal Hocko Date: Fri Dec 28 00:39:57 2018 -0800 memcg, oom: notify on oom killer invocation from the charge path commit 7056d3a37d2c6aaaab10c13e8e69adc67ec1fc65 upstream. Burt Holzman has noticed that memcg v1 doesn't notify about OOM events via eventfd anymore. The reason is that 29ef680ae7c2 ("memcg, oom: move out_of_memory back to the charge path") has moved the oom handling back to the charge path. While doing so the notification was left behind in mem_cgroup_oom_synchronize. Fix the issue by replicating the oom hierarchy locking and the notification. Link: http://lkml.kernel.org/r/20181224091107.18354-1-mhocko@kernel.org Fixes: 29ef680ae7c2 ("memcg, oom: move out_of_memory back to the charge path") Signed-off-by: Michal Hocko Reported-by: Burt Holzman Acked-by: Johannes Weiner Cc: Vladimir Davydov [4.19+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 6e6a8b24e4e20b59a83b0cc6368dab3ad97d8dde Author: Dan Williams Date: Fri Dec 28 00:34:57 2018 -0800 mm, devm_memremap_pages: fix shutdown handling commit a95c90f1e2c253b280385ecf3d4ebfe476926b28 upstream. The last step before devm_memremap_pages() returns success is to allocate a release action, devm_memremap_pages_release(), to tear the entire setup down. However, the result from devm_add_action() is not checked. Checking the error from devm_add_action() is not enough. The api currently relies on the fact that the percpu_ref it is using is killed by the time the devm_memremap_pages_release() is run. Rather than continue this awkward situation, offload the responsibility of killing the percpu_ref to devm_memremap_pages_release() directly. This allows devm_memremap_pages() to do the right thing relative to init failures and shutdown. Without this change we could fail to register the teardown of devm_memremap_pages(). The likelihood of hitting this failure is tiny as small memory allocations almost always succeed. However, the impact of the failure is large given any future reconfiguration, or disable/enable, of an nvdimm namespace will fail forever as subsequent calls to devm_memremap_pages() will fail to setup the pgmap_radix since there will be stale entries for the physical address range. An argument could be made to require that the ->kill() operation be set in the @pgmap arg rather than passed in separately. However, it helps code readability, tracking the lifetime of a given instance, to be able to grep the kill routine directly at the devm_memremap_pages() call site. Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...") Reviewed-by: "Jérôme Glisse" Reported-by: Logan Gunthorpe Reviewed-by: Logan Gunthorpe Reviewed-by: Christoph Hellwig Cc: Balbir Singh Cc: Michal Hocko Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 13ab61ae09ed9577cb4676fef4985f70400798f8 Author: Dan Williams Date: Fri Dec 28 00:34:54 2018 -0800 mm, devm_memremap_pages: kill mapping "System RAM" support commit 06489cfbd915ff36c8e36df27f1c2dc60f97ca56 upstream. Given the fact that devm_memremap_pages() requires a percpu_ref that is torn down by devm_memremap_pages_release() the current support for mapping RAM is broken. Support for remapping "System RAM" has been broken since the beginning and there is no existing user of this this code path, so just kill the support and make it an explicit error. This cleanup also simplifies a follow-on patch to fix the error path when setting a devm release action for devm_memremap_pages_release() fails. Link: http://lkml.kernel.org/r/154275557997.76910.14689813630968180480.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams Reviewed-by: "Jérôme Glisse" Reviewed-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Cc: Balbir Singh Cc: Michal Hocko Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 6765d93cb181b644498dd79af0495ecf45e41b38 Author: Dan Williams Date: Fri Dec 28 00:34:50 2018 -0800 mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL commit 808153e1187fa77ac7d7dad261ff476888dcf398 upstream. devm_memremap_pages() is a facility that can create struct page entries for any arbitrary range and give drivers the ability to subvert core aspects of page management. Specifically the facility is tightly integrated with the kernel's memory hotplug functionality. It injects an altmap argument deep into the architecture specific vmemmap implementation to allow allocating from specific reserved pages, and it has Linux specific assumptions about page structure reference counting relative to get_user_pages() and get_user_pages_fast(). It was an oversight and a mistake that this was not marked EXPORT_SYMBOL_GPL from the outset. Again, devm_memremap_pagex() exposes and relies upon core kernel internal assumptions and will continue to evolve along with 'struct page', memory hotplug, and support for new memory types / topologies. Only an in-kernel GPL-only driver is expected to keep up with this ongoing evolution. This interface, and functionality derived from this interface, is not suitable for kernel-external drivers. Link: http://lkml.kernel.org/r/154275557457.76910.16923571232582744134.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams Reviewed-by: Christoph Hellwig Acked-by: Michal Hocko Cc: "Jérôme Glisse" Cc: Balbir Singh Cc: Logan Gunthorpe Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit a2b977e3d9e4298d28ebe5cfff9e0859b74a7ac7 Author: Michal Hocko Date: Fri Dec 28 00:38:01 2018 -0800 hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined commit b15c87263a69272423771118c653e9a1d0672caa upstream. We have received a bug report that an injected MCE about faulty memory prevents memory offline to succeed on 4.4 base kernel. The underlying reason was that the HWPoison page has an elevated reference count and the migration keeps failing. There are two problems with that. First of all it is dubious to migrate the poisoned page because we know that accessing that memory is possible to fail. Secondly it doesn't make any sense to migrate a potentially broken content and preserve the memory corruption over to a new location. Oscar has found out that 4.4 and the current upstream kernels behave slightly differently with his simply testcase === int main(void) { int ret; int i; int fd; char *array = malloc(4096); char *array_locked = malloc(4096); fd = open("/tmp/data", O_RDONLY); read(fd, array, 4095); for (i = 0; i < 4096; i++) array_locked[i] = 'd'; ret = mlock((void *)PAGE_ALIGN((unsigned long)array_locked), sizeof(array_locked)); if (ret) perror("mlock"); sleep (20); ret = madvise((void *)PAGE_ALIGN((unsigned long)array_locked), 4096, MADV_HWPOISON); if (ret) perror("madvise"); for (i = 0; i < 4096; i++) array_locked[i] = 'd'; return 0; } === + offline this memory. In 4.4 kernels he saw the hwpoisoned page to be returned back to the LRU list kernel: [] dump_trace+0x59/0x340 kernel: [] show_stack_log_lvl+0xea/0x170 kernel: [] show_stack+0x21/0x40 kernel: [] dump_stack+0x5c/0x7c kernel: [] warn_slowpath_common+0x81/0xb0 kernel: [] __pagevec_lru_add_fn+0x14c/0x160 kernel: [] pagevec_lru_move_fn+0xad/0x100 kernel: [] __lru_cache_add+0x6c/0xb0 kernel: [] add_to_page_cache_lru+0x46/0x70 kernel: [] extent_readpages+0xc3/0x1a0 [btrfs] kernel: [] __do_page_cache_readahead+0x177/0x200 kernel: [] ondemand_readahead+0x168/0x2a0 kernel: [] generic_file_read_iter+0x41f/0x660 kernel: [] __vfs_read+0xcd/0x140 kernel: [] vfs_read+0x7a/0x120 kernel: [] kernel_read+0x3b/0x50 kernel: [] do_execveat_common.isra.29+0x490/0x6f0 kernel: [] do_execve+0x28/0x30 kernel: [] call_usermodehelper_exec_async+0xfb/0x130 kernel: [] ret_from_fork+0x55/0x80 And that latter confuses the hotremove path because an LRU page is attempted to be migrated and that fails due to an elevated reference count. It is quite possible that the reuse of the HWPoisoned page is some kind of fixed race condition but I am not really sure about that. With the upstream kernel the failure is slightly different. The page doesn't seem to have LRU bit set but isolate_movable_page simply fails and do_migrate_range simply puts all the isolated pages back to LRU and therefore no progress is made and scan_movable_pages finds same set of pages over and over again. Fix both cases by explicitly checking HWPoisoned pages before we even try to get reference on the page, try to unmap it if it is still mapped. As explained by Naoya: : Hwpoison code never unmapped those for no big reason because : Ksm pages never dominate memory, so we simply didn't have strong : motivation to save the pages. Also put WARN_ON(PageLRU) in case there is a race and we can hit LRU HWPoison pages which shouldn't happen but I couldn't convince myself about that. Naoya has noted the following: : Theoretically no such gurantee, because try_to_unmap() doesn't have a : guarantee of success and then memory_failure() returns immediately : when hwpoison_user_mappings fails. : Or the following code (comes after hwpoison_user_mappings block) also impli= : es : that the target page can still have PageLRU flag. : : /* : * Torn down by someone else? : */ : if (PageLRU(p) && !PageSwapCache(p) && p->mapping =3D=3D NULL) { : action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED); : res =3D -EBUSY; : goto out; : } : : So I think it's OK to keep "if (WARN_ON(PageLRU(page)))" block in : current version of your patch. Link: http://lkml.kernel.org/r/20181206120135.14079-1-mhocko@kernel.org Signed-off-by: Michal Hocko Reviewed-by: Oscar Salvador Debugged-by: Oscar Salvador Tested-by: Oscar Salvador Acked-by: David Hildenbrand Acked-by: Naoya Horiguchi Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 472cab0db95e0b428addd2ef84a69091695aec5c Author: Minchan Kim Date: Fri Dec 28 00:36:37 2018 -0800 zram: fix double free backing device commit 5547932dc67a48713eece4fa4703bfdf0cfcb818 upstream. If blkdev_get fails, we shouldn't do blkdev_put. Otherwise, kernel emits below log. This patch fixes it. WARNING: CPU: 0 PID: 1893 at fs/block_dev.c:1828 blkdev_put+0x105/0x120 Modules linked in: CPU: 0 PID: 1893 Comm: swapoff Not tainted 4.19.0+ #453 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 RIP: 0010:blkdev_put+0x105/0x120 Call Trace: __x64_sys_swapoff+0x46d/0x490 do_syscall_64+0x5a/0x190 entry_SYSCALL_64_after_hwframe+0x49/0xbe irq event stamp: 4466 hardirqs last enabled at (4465): __free_pages_ok+0x1e3/0x490 hardirqs last disabled at (4466): trace_hardirqs_off_thunk+0x1a/0x1c softirqs last enabled at (3420): __do_softirq+0x333/0x446 softirqs last disabled at (3407): irq_exit+0xd1/0xe0 Link: http://lkml.kernel.org/r/20181127055429.251614-3-minchan@kernel.org Signed-off-by: Minchan Kim Reviewed-by: Sergey Senozhatsky Reviewed-by: Joey Pabalinas Cc: [4.14+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 7434971300e22b1bd59daae911a22ba460717494 Author: David Herrmann Date: Tue Jan 8 13:58:52 2019 +0100 fork: record start_time late commit 7b55851367136b1efd84d98fea81ba57a98304cf upstream. This changes the fork(2) syscall to record the process start_time after initializing the basic task structure but still before making the new process visible to user-space. Technically, we could record the start_time anytime during fork(2). But this might lead to scenarios where a start_time is recorded long before a process becomes visible to user-space. For instance, with userfaultfd(2) and TLS, user-space can delay the execution of fork(2) for an indefinite amount of time (and will, if this causes network access, or similar). By recording the start_time late, it much closer reflects the point in time where the process becomes live and can be observed by other processes. Lastly, this makes it much harder for user-space to predict and control the start_time they get assigned. Previously, user-space could fork a process and stall it in copy_thread_tls() before its pid is allocated, but after its start_time is recorded. This can be misused to later-on cycle through PIDs and resume the stalled fork(2) yielding a process that has the same pid and start_time as a process that existed before. This can be used to circumvent security systems that identify processes by their pid+start_time combination. Even though user-space was always aware that start_time recording is flaky (but several projects are known to still rely on start_time-based identification), changing the start_time to be recorded late will help mitigate existing attacks and make it much harder for user-space to control the start_time a process gets assigned. Reported-by: Jann Horn Signed-off-by: Tom Gundersen Signed-off-by: David Herrmann Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit a88b5ff6cff9514bd8297ae4c4815c0410621bbb Author: Ewan D. Milne Date: Thu Dec 13 15:25:16 2018 -0500 scsi: lpfc: do not set queue->page_count to 0 if pc_sli4_params.wqpcnt is invalid commit 4e87eb2f46ea547d12a276b2e696ab934d16cfb6 upstream. Certain older adapters such as the OneConnect OCe10100 may not have a valid wqpcnt value. In this case, do not set queue->page_count to 0 in lpfc_sli4_queue_alloc() as this will prevent the driver from initializing. Fixes: 895427bd01 ("scsi: lpfc: NVME Initiator: Base modifications") Cc: stable@vger.kernel.org # 4.11+ Signed-off-by: Ewan D. Milne Reviewed-by: Laurence Oberman Tested-by: Laurence Oberman Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit eed234bc49619c961dfda7b1c818dd111d437c41 Author: Steffen Maier Date: Thu Dec 6 17:31:20 2018 +0100 scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown commit 60a161b7e5b2a252ff0d4c622266a7d8da1120ce upstream. Suppose adapter (open) recovery is between opened QDIO queues and before (the end of) initial posting of status read buffers (SRBs). This time window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING causing by design looping with exponential increase sleeps in the function performing exchange config data during recovery [zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up. Suppose an event occurs for which the FCP channel would send an unsolicited notification to zfcp by means of a previously posted SRB. We saw it with local cable pull (link down) in multi-initiator zoning with multiple NPIV-enabled subchannels of the same shared FCP channel. As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the initial status read buffers from within the adapter's ERP thread, the channel does send an unsolicited notification. Since v2.6.27 commit d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules adapter->stat_work to re-fill the just consumed SRB from a work item. Now the ERP thread and the work item post SRBs in parallel. Both contexts call the helper function zfcp_status_read_refill(). The tracking of missing (to be posted / re-filled) SRBs is not thread-safe due to separate atomic_read() and atomic_dec(), in order to depend on posting success. Hence, both contexts can see atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue (trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in zfcp_qdio_handler_error(). An obvious and seemingly clean fix would be to schedule stat_work from the ERP thread and wait for it to finish. This would serialize all SRB re-fills. However, we already have another work item wait on the ERP thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all the open port recoveries during zfcp auto port scan, but in fact it waits for any pending recovery including an adapter recovery. This approach leads to a deadlock. [see also v3.19 commit 18f87a67e6d6 ("zfcp: auto port scan resiliency"); v2.6.37 commit d3e1088d6873 ("[SCSI] zfcp: No ERP escalation on gpn_ft eval"); v2.6.28 commit fca55b6fb587 ("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP") fixing v2.6.27 commit c57a39a45a76 ("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port"); v2.6.27 commit cc8c282963bd ("[SCSI] zfcp: Automatically attach remote ports")] Instead make the accounting of missing SRBs atomic for parallel execution in both the ERP thread and adapter->stat_work. Signed-off-by: Steffen Maier Fixes: d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall") Cc: #2.6.27+ Reviewed-by: Jens Remus Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman