commit 343a5fbeef08baf2097b8cf4e26137cebe3cfef4 Author: Zefan Li Date: Wed Apr 27 18:55:30 2016 +0800 Linux 3.4.112 commit 97e0c9082179f7abe08d6921d3594749e2431542 Author: Andy Lutomirski Date: Wed Mar 16 14:14:21 2016 -0700 x86/iopl/64: Properly context-switch IOPL on Xen PV commit b7a584598aea7ca73140cb87b40319944dd3393f upstream. On Xen PV, regs->flags doesn't reliably reflect IOPL and the exit-to-userspace code doesn't change IOPL. We need to context switch it manually. I'm doing this without going through paravirt because this is specific to Xen PV. After the dust settles, we can merge this with the 32-bit code, tidy up the iopl syscall implementation, and remove the set_iopl pvop entirely. Fixes XSA-171. Reviewewd-by: Jan Beulich Signed-off-by: Andy Lutomirski Cc: Andrew Cooper Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Vrabel Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Jan Beulich Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/693c3bd7aeb4d3c27c92c622b7d0f554a458173c.1458162709.git.luto@kernel.org Signed-off-by: Ingo Molnar [ kamal: backport to 3.19-stable: no X86_FEATURE_XENPV so just call xen_pv_domain() directly ] Acked-by: Andy Lutomirski kernel.org> Signed-off-by: Kamal Mostafa canonical.com> Signed-off-by: Zefan Li commit 0765dbc54e7dcd07581edf6cc0fafc8277bbe331 Author: Christophe Leroy Date: Wed May 6 17:26:47 2015 +0200 splice: sendfile() at once fails for big files commit 0ff28d9f4674d781e492bcff6f32f0fe48cf0fed upstream. Using sendfile with below small program to get MD5 sums of some files, it appear that big files (over 64kbytes with 4k pages system) get a wrong MD5 sum while small files get the correct sum. This program uses sendfile() to send a file to an AF_ALG socket for hashing. /* md5sum2.c */ #include #include #include #include #include #include #include #include #include int main(int argc, char **argv) { int sk = socket(AF_ALG, SOCK_SEQPACKET, 0); struct stat st; struct sockaddr_alg sa = { .salg_family = AF_ALG, .salg_type = "hash", .salg_name = "md5", }; int n; bind(sk, (struct sockaddr*)&sa, sizeof(sa)); for (n = 1; n < argc; n++) { int size; int offset = 0; char buf[4096]; int fd; int sko; int i; fd = open(argv[n], O_RDONLY); sko = accept(sk, NULL, 0); fstat(fd, &st); size = st.st_size; sendfile(sko, fd, &offset, size); size = read(sko, buf, sizeof(buf)); for (i = 0; i < size; i++) printf("%2.2x", buf[i]); printf(" %s\n", argv[n]); close(fd); close(sko); } exit(0); } Test below is done using official linux patch files. First result is with a software based md5sum. Second result is with the program above. root@vgoip:~# ls -l patch-3.6.* -rw-r--r-- 1 root root 64011 Aug 24 12:01 patch-3.6.2.gz -rw-r--r-- 1 root root 94131 Aug 24 12:01 patch-3.6.3.gz root@vgoip:~# md5sum patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz root@vgoip:~# ./md5sum2 patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz 5fd77b24e68bb24dcc72d6e57c64790e patch-3.6.3.gz After investivation, it appears that sendfile() sends the files by blocks of 64kbytes (16 times PAGE_SIZE). The problem is that at the end of each block, the SPLICE_F_MORE flag is missing, therefore the hashing operation is reset as if it was the end of the file. This patch adds SPLICE_F_MORE to the flags when more data is pending. With the patch applied, we get the correct sums: root@vgoip:~# md5sum patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz root@vgoip:~# ./md5sum2 patch-3.6.* b3ffb9848196846f31b2ff133d2d6443 patch-3.6.2.gz c5e8f687878457db77cb7158c38a7e43 patch-3.6.3.gz Signed-off-by: Christophe Leroy Signed-off-by: Jens Axboe Cc: Ben Hutchings Signed-off-by: Zefan Li commit b381fbc509052d07ccf8641fd7560a25d46aaf1e Author: Ben Hutchings Date: Sat Feb 13 02:34:52 2016 +0000 pipe: Fix buffer offset after partially failed read Quoting the RHEL advisory: > It was found that the fix for CVE-2015-1805 incorrectly kept buffer > offset and buffer length in sync on a failed atomic read, potentially > resulting in a pipe buffer state corruption. A local, unprivileged user > could use this flaw to crash the system or leak kernel memory to user > space. (CVE-2016-0774, Moderate) The same flawed fix was applied to stable branches from 2.6.32.y to 3.14.y inclusive, and I was able to reproduce the issue on 3.2.y. We need to give pipe_iov_copy_to_user() a separate offset variable and only update the buffer offset if it succeeds. References: https://rhn.redhat.com/errata/RHSA-2016-0103.html Signed-off-by: Ben Hutchings Cc: Jeffrey Vander Stoep Signed-off-by: Zefan Li commit e08cc94c26fab53cf0d2c655ecdcaf39d31dd18a Author: Ben Hutchings Date: Wed Nov 18 02:01:21 2015 +0000 usb: Use the USB_SS_MULT() macro to decode burst multiplier for log message commit 5377adb092664d336ac212499961cac5e8728794 upstream. usb_parse_ss_endpoint_companion() now decodes the burst multiplier correctly in order to check that it's <= 3, but still uses the wrong expression if warning that it's > 3. Fixes: ff30cbc8da42 ("usb: Use the USB_SS_MULT() macro to get the ...") Signed-off-by: Ben Hutchings Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li commit 9237baa5c61ef9e11d8a71d02d73b53d8a2b7d01 Author: Nate Dailey Date: Mon Feb 29 10:43:58 2016 -0500 raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang commit ccfc7bf1f09d6190ef86693ddc761d5fe3fa47cb upstream. If raid1d is handling a mix of read and write errors, handle_read_error's call to freeze_array can get stuck. This can happen because, though the bio_end_io_list is initially drained, writes can be added to it via handle_write_finished as the retry_list is processed. These writes contribute to nr_pending but are not included in nr_queued. If a later entry on the retry_list triggers a call to handle_read_error, freeze array hangs waiting for nr_pending == nr_queued+extra. The writes on the bio_end_io_list aren't included in nr_queued so the condition will never be satisfied. To prevent the hang, include bio_end_io_list writes in nr_queued. There's probably a better way to handle decrementing nr_queued, but this seemed like the safest way to avoid breaking surrounding code. I'm happy to supply the script I used to repro this hang. Fixes: 55ce74d4bfe1b(md/raid1: ensure device failure recorded before write request returns.) Signed-off-by: Nate Dailey Signed-off-by: Shaohua Li Signed-off-by: Zefan Li commit 4b7e6b747c90c912340b5f3f3c876d81d60cf273 Author: Dāvis Mosāns Date: Fri Aug 21 07:29:22 2015 +0300 mvsas: Fix NULL pointer dereference in mvs_slot_task_free commit 2280521719e81919283b82902ac24058f87dfc1b upstream. When pci_pool_alloc fails in mvs_task_prep then task->lldd_task stays NULL but it's later used in mvs_abort_task as slot which is passed to mvs_slot_task_free causing NULL pointer dereference. Just return from mvs_slot_task_free when passed with NULL slot. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=101891 Signed-off-by: Dāvis Mosāns Reviewed-by: Tomas Henzl Reviewed-by: Johannes Thumshirn Signed-off-by: James Bottomley Signed-off-by: Zefan Li commit ede7386c3e8605a23c9ea89530894ce074397aa6 Author: Mike Snitzer Date: Thu Oct 22 10:56:40 2015 -0400 dm btree: fix leak of bufio-backed block in btree_split_beneath error path commit 4dcb8b57df3593dcb20481d9d6cf79d1dc1534be upstream. btree_split_beneath()'s error path had an outstanding FIXME that speaks directly to the potential for _not_ cleaning up a previously allocated bufio-backed block. Fix this by releasing the previously allocated bufio block using unlock_block(). Reported-by: Mikulas Patocka Signed-off-by: Mike Snitzer Acked-by: Joe Thornber Signed-off-by: Zefan Li commit e2c8a2c8819e0840e85aee5132f0925f92c047c8 Author: Jan Kara Date: Thu Oct 22 13:32:21 2015 -0700 mm: make sendfile(2) killable commit 296291cdd1629c308114504b850dc343eabc2782 upstream. Currently a simple program below issues a sendfile(2) system call which takes about 62 days to complete in my test KVM instance. int fd; off_t off = 0; fd = open("file", O_RDWR | O_TRUNC | O_SYNC | O_CREAT, 0644); ftruncate(fd, 2); lseek(fd, 0, SEEK_END); sendfile(fd, fd, &off, 0xfffffff); Now you should not ask kernel to do a stupid stuff like copying 256MB in 2-byte chunks and call fsync(2) after each chunk but if you do, sysadmin should have a way to stop you. We actually do have a check for fatal_signal_pending() in generic_perform_write() which triggers in this path however because we always succeed in writing something before the check is done, we return value > 0 from generic_perform_write() and thus the information about signal gets lost. Fix the problem by doing the signal check before writing anything. That way generic_perform_write() returns -EINTR, the error gets propagated up and the sendfile loop terminates early. Signed-off-by: Jan Kara Reported-by: Dmitry Vyukov Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Zefan Li commit b1d64b01ee7d043373a068c6b58b058876e56c7d Author: Ilia Mirkin Date: Tue Oct 20 01:15:39 2015 -0400 drm/nouveau/gem: return only valid domain when there's only one commit 2a6c521bb41ce862e43db46f52e7681d33e8d771 upstream. On nv50+, we restrict the valid domains to just the one where the buffer was originally created. However after the buffer is evicted to system memory, we might move it back to a different domain that was not originally valid. When sharing the buffer and retrieving its GEM_INFO data, we still want the domain that will be valid for this buffer in a pushbuf, not the one where it currently happens to be. This resolves fdo#92504 and several others. These are due to suspend evicting all buffers, making it more likely that they temporarily end up in the wrong place. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92504 Signed-off-by: Ilia Mirkin Signed-off-by: Ben Skeggs Signed-off-by: Zefan Li commit f9a3e62aa95e2f8c62e3d2c8188cbfe9fbfa1290 Author: Joerg Roedel Date: Tue Oct 20 14:59:36 2015 +0200 iommu/amd: Don't clear DTE flags when modifying it commit cbf3ccd09d683abf1cacd36e3640872ee912d99b upstream. During device assignment/deassignment the flags in the DTE get lost, which might cause spurious faults, for example when the device tries to access the system management range. Fix this by not clearing the flags with the rest of the DTE. Reported-by: G. Richard Bellamy Tested-by: G. Richard Bellamy Signed-off-by: Joerg Roedel Signed-off-by: Zefan Li commit b5fb46e134ce146fc20672e9b2a57da977c5f26f Author: Charles Keepax Date: Tue Oct 20 10:25:58 2015 +0100 ASoC: wm8904: Correct number of EQ registers commit 97aff2c03a1e4d343266adadb52313613efb027f upstream. There are 24 EQ registers not 25, I suspect this bug came about because the registers start at EQ1 not zero. The bug is relatively harmless as the extra register written is an unused one. Signed-off-by: Charles Keepax Signed-off-by: Mark Brown Signed-off-by: Zefan Li commit 394d806071beb3be46d7b3922af27039f283b57f Author: Herbert Xu Date: Mon Oct 19 18:23:57 2015 +0800 crypto: api - Only abort operations on fatal signal commit 3fc89adb9fa4beff31374a4bf50b3d099d88ae83 upstream. Currently a number of Crypto API operations may fail when a signal occurs. This causes nasty problems as the caller of those operations are often not in a good position to restart the operation. In fact there is currently no need for those operations to be interrupted by user signals at all. All we need is for them to be killable. This patch replaces the relevant calls of signal_pending with fatal_signal_pending, and wait_for_completion_interruptible with wait_for_completion_killable, respectively. Signed-off-by: Herbert Xu Signed-off-by: Zefan Li commit 4333b97f9ab15347378efda6370f9eea974ad94a Author: Laura Abbott Date: Mon Oct 12 11:30:13 2015 +0300 xhci: Add spurious wakeup quirk for LynxPoint-LP controllers commit fd7cd061adcf5f7503515ba52b6a724642a839c8 upstream. We received several reports of systems rebooting and powering on after an attempted shutdown. Testing showed that setting XHCI_SPURIOUS_WAKEUP quirk in addition to the XHCI_SPURIOUS_REBOOT quirk allowed the system to shutdown as expected for LynxPoint-LP xHCI controllers. Set the quirk back. Note that the quirk was originally introduced for LynxPoint and LynxPoint-LP just for this same reason. See: commit 638298dc66ea ("xhci: Fix spurious wakeups after S5 on Haswell") It was later limited to only concern HP machines as it caused regression on some machines, see both bug and commit: Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=66171 commit 6962d914f317 ("xhci: Limit the spurious wakeup fix only to HP machines") Later it was discovered that the powering on after shutdown was limited to LynxPoint-LP (Haswell-ULT) and that some non-LP HP machine suffered from spontaneous resume from S3 (which should not be related to the SPURIOUS_WAKEUP quirk at all). An attempt to fix this then removed the SPURIOUS_WAKEUP flag usage completely. commit b45abacde3d5 ("xhci: no switching back on non-ULT Haswell") Current understanding is that LynxPoint-LP (Haswell ULT) machines need the SPURIOUS_WAKEUP quirk, otherwise they will restart, and plain Lynxpoint (Haswell) machines may _not_ have the quirk set otherwise they again will restart. Signed-off-by: Laura Abbott Cc: Takashi Iwai Cc: Oliver Neukum [Added more history to commit message -Mathias] Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li commit 720083a9e3257eee6efa05eae310d3250f5270a5 Author: Mathias Nyman Date: Mon Oct 12 11:30:12 2015 +0300 xhci: handle no ping response error properly commit 3b4739b8951d650becbcd855d7d6f18ac98a9a85 upstream. If a host fails to wake up a isochronous SuperSpeed device from U1/U2 in time for a isoch transfer it will generate a "No ping response error" Host will then move to the next transfer descriptor. Handle this case in the same way as missed service errors, tag the current TD as skipped and handle it on the next transfer event. Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li commit d425239d4fa8ba4aa7ffa6d9e4a2584563bdad16 Author: Christian Zander Date: Wed Jun 10 09:41:45 2015 -0700 iommu/vt-d: fix range computation when making room for large pages commit ba2374fd2bf379f933773811fdb06cb6a5445f41 upstream. In preparation for the installation of a large page, any small page tables that may still exist in the target IOV address range are removed. However, if a scatter/gather list entry is large enough to fit more than one large page, the address space for any subsequent large pages is not cleared of conflicting small page tables. This can cause legitimate mapping requests to fail with errors of the form below, potentially followed by a series of IOMMU faults: ERROR: DMA PTE for vPFN 0xfde00 already set (to 7f83a4003 not 7e9e00083) In this example, a 4MiB scatter/gather list entry resulted in the successful installation of a large page @ vPFN 0xfdc00, followed by a failed attempt to install another large page @ vPFN 0xfde00, due to the presence of a pointer to a small page table @ 0x7f83a4000. To address this problem, compute the number of large pages that fit into a given scatter/gather list entry, and use it to derive the last vPFN covered by the large page(s). Signed-off-by: Christian Zander Signed-off-by: David Woodhouse [bwh: Backported to 3.2: - Add the lvl_pages variable, added by an earlier commit upstream - Also change arguments to dma_pte_clear_range(), which is called by dma_pte_free_pagetable() upstream] Signed-off-by: Ben Hutchings Signed-off-by: Zefan Li commit 2147886bdb36b55873f83ccd0322b2f390bfee4b Author: Russell King Date: Fri Oct 9 20:43:33 2015 +0100 crypto: ahash - ensure statesize is non-zero commit 8996eafdcbad149ac0f772fb1649fbb75c482a6a upstream. Unlike shash algorithms, ahash drivers must implement export and import as their descriptors may contain hardware state and cannot be exported as is. Unfortunately some ahash drivers did not provide them and end up causing crashes with algif_hash. This patch adds a check to prevent these drivers from registering ahash algorithms until they are fixed. Signed-off-by: Russell King Signed-off-by: Herbert Xu Signed-off-by: Zefan Li commit efb049430f508a269d6b9214e3a7138edec0fbfb Author: Cathy Avery Date: Fri Oct 2 09:35:01 2015 -0400 xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing) commit a54c8f0f2d7df525ff997e2afe71866a1a013064 upstream. xen-blkfront will crash if the check to talk_to_blkback() in blkback_changed()(XenbusStateInitWait) returns an error. The driver data is freed and info is set to NULL. Later during the close process via talk_to_blkback's call to xenbus_dev_fatal() the null pointer is passed to and dereference in blkfront_closing. Signed-off-by: Cathy Avery Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Zefan Li commit de7f6bfbcce6468304cb9c1c4bb2e5b7c6616778 Author: Takashi Iwai Date: Mon Oct 5 16:55:09 2015 +0200 ALSA: synth: Fix conflicting OSS device registration on AWE32 commit 225db5762dc1a35b26850477ffa06e5cd0097243 upstream. When OSS emulation is loaded on ISA SB AWE32 chip, we get now kernel warnings like: WARNING: CPU: 0 PID: 2791 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x51/0x80() sysfs: cannot create duplicate filename '/devices/isa/sbawe.0/sound/card0/seq-oss-0-0' It's because both emux synth and opl3 drivers try to register their OSS device object with the same static index number 0. This hasn't been a big problem until the recent rewrite of device management code (that exposes sysfs at the same time), but it's been an obvious bug. This patch works around it just by using a different index number of emux synth object. There can be a more elegant way to fix, but it's enough for now, as this code won't be touched so often, in anyway. Reported-and-tested-by: Michael Shell Signed-off-by: Takashi Iwai Signed-off-by: Zefan Li commit 3f258c664a0c84414c4ec17128fc877720866425 Author: Jann Horn Date: Sun Oct 4 19:29:12 2015 +0200 drivers/tty: require read access for controlling terminal commit 0c55627167870255158db1cde0d28366f91c8872 upstream. This is mostly a hardening fix, given that write-only access to other users' ttys is usually only given through setgid tty executables. Signed-off-by: Jann Horn Signed-off-by: Greg Kroah-Hartman [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li commit 3a0b4c1d2b134d8d33793b66de7cc91089c0e33a Author: Kosuke Tatsukawa Date: Fri Oct 2 08:27:05 2015 +0000 tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c commit e81107d4c6bd098878af9796b24edc8d4a9524fd upstream. My colleague ran into a program stall on a x86_64 server, where n_tty_read() was waiting for data even if there was data in the buffer in the pty. kernel stack for the stuck process looks like below. #0 [ffff88303d107b58] __schedule at ffffffff815c4b20 #1 [ffff88303d107bd0] schedule at ffffffff815c513e #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818 #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2 #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23 #5 [ffff88303d107dd0] tty_read at ffffffff81368013 #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704 #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57 #8 [ffff88303d107f00] sys_read at ffffffff811a4306 #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7 There seems to be two problems causing this issue. First, in drivers/tty/n_tty.c, __receive_buf() stores the data and updates ldata->commit_head using smp_store_release() and then checks the wait queue using waitqueue_active(). However, since there is no memory barrier, __receive_buf() could return without calling wake_up_interactive_poll(), and at the same time, n_tty_read() could start to wait in wait_woken() as in the following chart. __receive_buf() n_tty_read() ------------------------------------------------------------------------ if (waitqueue_active(&tty->read_wait)) /* Memory operations issued after the RELEASE may be completed before the RELEASE operation has completed */ add_wait_queue(&tty->read_wait, &wait); ... if (!input_available_p(tty, 0)) { smp_store_release(&ldata->commit_head, ldata->read_head); ... timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout); ------------------------------------------------------------------------ The second problem is that n_tty_read() also lacks a memory barrier call and could also cause __receive_buf() to return without calling wake_up_interactive_poll(), and n_tty_read() to wait in wait_woken() as in the chart below. __receive_buf() n_tty_read() ------------------------------------------------------------------------ spin_lock_irqsave(&q->lock, flags); /* from add_wait_queue() */ ... if (!input_available_p(tty, 0)) { /* Memory operations issued after the RELEASE may be completed before the RELEASE operation has completed */ smp_store_release(&ldata->commit_head, ldata->read_head); if (waitqueue_active(&tty->read_wait)) __add_wait_queue(q, wait); spin_unlock_irqrestore(&q->lock,flags); /* from add_wait_queue() */ ... timeout = wait_woken(&wait, TASK_INTERRUPTIBLE, timeout); ------------------------------------------------------------------------ There are also other places in drivers/tty/n_tty.c which have similar calls to waitqueue_active(), so instead of adding many memory barrier calls, this patch simply removes the call to waitqueue_active(), leaving just wake_up*() behind. This fixes both problems because, even though the memory access before or after the spinlocks in both wake_up*() and add_wait_queue() can sneak into the critical section, it cannot go past it and the critical section assures that they will be serialized (please see "INTER-CPU ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a better explanation). Moreover, the resulting code is much simpler. Latency measurement using a ping-pong test over a pty doesn't show any visible performance drop. Signed-off-by: Kosuke Tatsukawa Signed-off-by: Greg Kroah-Hartman [lizf: Backported to 3.4: - adjust context - s/wake_up_interruptible_poll/wake_up_interruptible/ - drop changes to __receive_buf() and n_tty_set_termios()] Signed-off-by: Zefan Li commit d6706f05d3d4246d94fc126a3632072a4c8daef6 Author: Vincent Palatin Date: Thu Oct 1 14:10:22 2015 -0700 usb: Add device quirk for Logitech PTZ cameras commit 72194739f54607bbf8cfded159627a2015381557 upstream. Add a device quirk for the Logitech PTZ Pro Camera and its sibling the ConferenceCam CC3000e Camera. This fixes the failed camera enumeration on some boot, particularly on machines with fast CPU. Tested by connecting a Logitech PTZ Pro Camera to a machine with a Haswell Core i7-4600U CPU @ 2.10GHz, and doing thousands of reboot cycles while recording the kernel logs and taking camera picture after each boot. Before the patch, more than 7% of the boots show some enumeration transfer failures and in a few of them, the kernel is giving up before actually enumerating the webcam. After the patch, the enumeration has been correct on every reboot. Signed-off-by: Vincent Palatin Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li commit 18316131ccdd4cba57f2f96239143e2a8754ece2 Author: Yao-Wen Mao Date: Mon Aug 31 14:24:09 2015 +0800 USB: Add reset-resume quirk for two Plantronics usb headphones. commit 8484bf2981b3d006426ac052a3642c9ce1d8d980 upstream. These two headphones need a reset-resume quirk to properly resume to original volume level. Signed-off-by: Yao-Wen Mao Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li commit b81cc21d0356a38838502ae2fae709382a324b75 Author: John Stultz Date: Mon Sep 14 18:05:20 2015 -0700 clocksource: Fix abs() usage w/ 64bit values commit 67dfae0cd72fec5cd158b6e5fb1647b7dbe0834c upstream. This patch fixes one cases where abs() was being used with 64-bit nanosecond values, where the result may be capped at 32-bits. This potentially could cause watchdog false negatives on 32-bit systems, so this patch addresses the issue by using abs64(). Signed-off-by: John Stultz Cc: Prarit Bhargava Cc: Richard Cochran Cc: Ingo Molnar Link: http://lkml.kernel.org/r/1442279124-7309-2-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li commit e9c235998ff71a2f837c778dbce6a8fc674bdca0 Author: Mel Gorman Date: Thu Oct 1 15:36:57 2015 -0700 mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault commit 2f84a8990ebbe235c59716896e017c6b2ca1200f upstream. SunDong reported the following on https://bugzilla.kernel.org/show_bug.cgi?id=103841 I think I find a linux bug, I have the test cases is constructed. I can stable recurring problems in fedora22(4.0.4) kernel version, arch for x86_64. I construct transparent huge page, when the parent and child process with MAP_SHARE, MAP_PRIVATE way to access the same huge page area, it has the opportunity to lead to huge page copy on write failure, and then it will munmap the child corresponding mmap area, but then the child mmap area with VM_MAYSHARE attributes, child process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags functions (vma - > vm_flags & VM_MAYSHARE). There were a number of problems with the report (e.g. it's hugetlbfs that triggers this, not transparent huge pages) but it was fundamentally correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that looks like this vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800 prot 8000000000000027 anon_vma (null) vm_ops ffffffff8182a7a0 pgoff 0 file ffff88106bdb9800 private_data (null) flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb) ------------ kernel BUG at mm/hugetlb.c:462! SMP Modules linked in: xt_pkttype xt_LOG xt_limit [..] CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012 set_vma_resv_flags+0x2d/0x30 The VM_BUG_ON is correct because private and shared mappings have different reservation accounting but the warning clearly shows that the VMA is shared. When a private COW fails to allocate a new page then only the process that created the VMA gets the page -- all the children unmap the page. If the children access that data in the future then they get killed. The problem is that the same file is mapped shared and private. During the COW, the allocation fails, the VMAs are traversed to unmap the other private pages but a shared VMA is found and the bug is triggered. This patch identifies such VMAs and skips them. Signed-off-by: Mel Gorman Reported-by: SunDong Reviewed-by: Michal Hocko Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: Naoya Horiguchi Cc: David Rientjes Reviewed-by: Naoya Horiguchi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Zefan Li commit a03288decdc919c95a23729e6f9c8934c5809510 Author: Ben Hutchings Date: Sat Sep 26 12:23:56 2015 +0100 genirq: Fix race in register_irq_proc() commit 95c2b17534654829db428f11bcf4297c059a2a7e upstream. Per-IRQ directories in procfs are created only when a handler is first added to the irqdesc, not when the irqdesc is created. In the case of a shared IRQ, multiple tasks can race to create a directory. This race condition seems to have been present forever, but is easier to hit with async probing. Signed-off-by: Ben Hutchings Link: http://lkml.kernel.org/r/1443266636.2004.2.camel@decadent.org.uk Signed-off-by: Thomas Gleixner Signed-off-by: Zefan Li commit 12bdf057277f6ef12aee6e62e04d920d558d197f Author: Thomas Gleixner Date: Wed Sep 30 08:38:22 2015 +0000 x86/process: Add proper bound checks in 64bit get_wchan() commit eddd3826a1a0190e5235703d1e666affa4d13b96 upstream. Dmitry Vyukov reported the following using trinity and the memory error detector AddressSanitizer (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel). [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on address ffff88002e280000 [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a) [ 124.578633] Accessed by thread T10915: [ 124.579295] inlined in describe_heap_address ./arch/x86/mm/asan/report.c:164 [ 124.579295] #0 ffffffff810dd277 in asan_report_error ./arch/x86/mm/asan/report.c:278 [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region ./arch/x86/mm/asan/asan.c:37 [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0 [ 124.581893] #3 ffffffff8107c093 in get_wchan ./arch/x86/kernel/process_64.c:444 The address checks in the 64bit implementation of get_wchan() are wrong in several ways: - The lower bound of the stack is not the start of the stack page. It's the start of the stack page plus sizeof (struct thread_info) - The upper bound must be: top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long). The 2 * sizeof(unsigned long) is required because the stack pointer points at the frame pointer. The layout on the stack is: ... IP FP ... IP FP. So we need to make sure that both IP and FP are in the bounds. Fix the bound checks and get rid of the mix of numeric constants, u64 and unsigned long. Making all unsigned long allows us to use the same function for 32bit as well. Use READ_ONCE() when accessing the stack. This does not prevent a concurrent wakeup of the task and the stack changing, but at least it avoids TOCTOU. Also check task state at the end of the loop. Again that does not prevent concurrent changes, but it avoids walking for nothing. Add proper comments while at it. Reported-by: Dmitry Vyukov Reported-by: Sasha Levin Based-on-patch-from: Wolfram Gloger Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Dmitry Vyukov Cc: Andrey Ryabinin Cc: Andy Lutomirski Cc: Andrey Konovalov Cc: Kostya Serebryany Cc: Alexander Potapenko Cc: kasan-dev Cc: Denys Vlasenko Cc: Andi Kleen Cc: Wolfram Gloger Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.de Signed-off-by: Thomas Gleixner [lizf: Backported to 3.4: - s/READ_ONCE/ACCESS_ONCE - remove TOP_OF_KERNEL_STACK_PADDING] Signed-off-by: Zefan Li commit 127cf7cb2ae3657e3ee4141216997b0a8a25df99 Author: shengyong Date: Mon Sep 28 17:57:19 2015 +0000 UBI: return ENOSPC if no enough space available commit 7c7feb2ebfc9c0552c51f0c050db1d1a004faac5 upstream. UBI: attaching mtd1 to ubi0 UBI: scanning is finished UBI error: init_volumes: not enough PEBs, required 706, available 686 UBI error: ubi_wl_init: no enough physical eraseblocks (-20, need 1) UBI error: ubi_attach_mtd_dev: failed to attach mtd1, error -12 <= NOT ENOMEM UBI error: ubi_init: cannot attach mtd1 If available PEBs are not enough when initializing volumes, return -ENOSPC directly. If available PEBs are not enough when initializing WL, return -ENOSPC instead of -ENOMEM. Signed-off-by: Sheng Yong Signed-off-by: Richard Weinberger Reviewed-by: David Gstir Signed-off-by: Zefan Li commit 15acb368e63874fbb7b34d424185921e9a8dc996 Author: Richard Weinberger Date: Tue Sep 22 23:58:07 2015 +0200 UBI: Validate data_size commit 281fda27673f833a01d516658a64d22a32c8e072 upstream. Make sure that data_size is less than LEB size. Otherwise a handcrafted UBI image is able to trigger an out of bounds memory access in ubi_compare_lebs(). Signed-off-by: Richard Weinberger Reviewed-by: David Gstir [lizf: Backported to 3.4: use dbg_err() instead of ubi_err()]; Signed-off-by: Zefan Li commit 9ed559d3d6fd06cb3a20e120f8b55be46c8d33db Author: Malcolm Crossley Date: Mon Sep 28 11:36:52 2015 +0100 x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map commit 64c98e7f49100b637cd20a6c63508caed6bbba7a upstream. Sanitizing the e820 map may produce extra E820 entries which would result in the topmost E820 entries being removed. The removed entries would typically include the top E820 usable RAM region and thus result in the domain having signicantly less RAM available to it. Fix by allowing sanitize_e820_map to use the full size of the allocated E820 array. Signed-off-by: Malcolm Crossley Reviewed-by: Boris Ostrovsky Signed-off-by: David Vrabel [lizf: Backported to 3.4: s/map/xen_e820_map] Signed-off-by: Zefan Li commit d311156acfbc6ec43df3898f7cd95313b6877174 Author: Andreas Schwab Date: Wed Sep 23 23:12:09 2015 +0200 m68k: Define asmlinkage_protect commit 8474ba74193d302e8340dddd1e16c85cc4b98caf upstream. Make sure the compiler does not modify arguments of syscall functions. This can happen if the compiler generates a tailcall to another function. For example, without asmlinkage_protect sys_openat is compiled into this function: sys_openat: clr.l %d0 move.w 18(%sp),%d0 move.l %d0,16(%sp) jbra do_sys_open Note how the fourth argument is modified in place, modifying the register %d4 that gets restored from this stack slot when the function returns to user-space. The caller may expect the register to be unmodified across system calls. Signed-off-by: Andreas Schwab Signed-off-by: Geert Uytterhoeven Signed-off-by: Zefan Li commit edb236d7d193cb7ae1dec654183be24b60334474 Author: Felix Fietkau Date: Thu Sep 24 16:59:46 2015 +0200 ath9k: declare required extra tx headroom commit 029cd0370241641eb70235d205aa0b90c84dce44 upstream. ath9k inserts padding between the 802.11 header and the data area (to align it). Since it didn't declare this extra required headroom, this led to some nasty issues like randomly dropped packets in some setups. Signed-off-by: Felix Fietkau Signed-off-by: Kalle Valo Signed-off-by: Zefan Li commit f5499bfc0b4645b44537cdbda9e58f5dfd0cb2af Author: Joseph Qi Date: Tue Sep 22 14:59:20 2015 -0700 ocfs2/dlm: fix deadlock when dispatch assert master commit 012572d4fc2e4ddd5c8ec8614d51414ec6cae02a upstream. The order of the following three spinlocks should be: dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock But dlm_dispatch_assert_master() is called while holding dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls dlm_grab() which will take dlm_domain_lock. Once another thread (for example, dlm_query_join_handler) has already taken dlm_domain_lock, and tries to take dlm_ctxt->spinlock deadlock happens. Signed-off-by: Joseph Qi Cc: Joel Becker Cc: Mark Fasheh Cc: "Junxiao Bi" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li commit 6fa2028d94598e9a98ef4420ef236301be6b3cfd Author: Peter Seiderer Date: Thu Sep 17 21:40:12 2015 +0200 cifs: use server timestamp for ntlmv2 authentication commit 98ce94c8df762d413b3ecb849e2b966b21606d04 upstream. Linux cifs mount with ntlmssp against an Mac OS X (Yosemite 10.10.5) share fails in case the clocks differ more than +/-2h: digest-service: digest-request: od failed with 2 proto=ntlmv2 digest-service: digest-request: kdc failed with -1561745592 proto=ntlmv2 Fix this by (re-)using the given server timestamp for the ntlmv2 authentication (as Windows 7 does). A related problem was also reported earlier by Namjae Jaen (see below): Windows machine has extended security feature which refuse to allow authentication when there is time difference between server time and client time when ntlmv2 negotiation is used. This problem is prevalent in embedded enviornment where system time is set to default 1970. Modern servers send the server timestamp in the TargetInfo Av_Pair structure in the challenge message [see MS-NLMP 2.2.2.1] In [MS-NLMP 3.1.5.1.2] it is explicitly mentioned that the client must use the server provided timestamp if present OR current time if it is not Reported-by: Namjae Jeon Signed-off-by: Peter Seiderer Signed-off-by: Steve French [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li commit 276a6c94ded4ad29a4f86381c5f80b840b8a7030 Author: Mathias Nyman Date: Mon Sep 21 17:46:16 2015 +0300 xhci: change xhci 1.0 only restrictions to support xhci 1.1 commit dca7794539eff04b786fb6907186989e5eaaa9c2 upstream. Some changes between xhci 0.96 and xhci 1.0 specifications forced us to check the hci version in code, some of these checks were implemented as hci_version == 1.0, which will not work with new xhci 1.1 controllers. xhci 1.1 behaves similar to xhci 1.0 in these cases, so change these checks to hci_version >= 1.0 Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li commit fa8600fa40e5ca152b8ece604579535b4296145f Author: Roger Quadros Date: Mon Sep 21 17:46:13 2015 +0300 usb: xhci: Clear XHCI_STATE_DYING on start commit e5bfeab0ad515b4f6df39fe716603e9dc6d3dfd0 upstream. For whatever reason if XHCI died in the previous instant then it will never recover on the next xhci_start unless we clear the DYING flag. Signed-off-by: Roger Quadros Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li commit 63a2bddd9e9d605149860e819cca40f79835eeed Author: Mathias Nyman Date: Mon Sep 21 17:46:10 2015 +0300 xhci: give command abortion one more chance before killing xhci commit a6809ffd1687b3a8c192960e69add559b9d32649 upstream. We want to give the command abortion an additional try to stop the command ring before we completely hose xhci. Tested-by: Vincent Pelletier Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman [lizf: Backported to 3.4: call handshake() instead of xhci_handshake()] Signed-off-by: Zefan Li commit 40dba0fdd43511ea6f24b432c39edb39479eff59 Author: Mathias Nyman Date: Mon Sep 21 17:46:09 2015 +0300 usb: Use the USB_SS_MULT() macro to get the burst multiplier. commit ff30cbc8da425754e8ab96904db1d295bd034f27 upstream. Bits 1:0 of the bmAttributes are used for the burst multiplier. The rest of the bits used to be reserved (zero), but USB3.1 takes bit 7 into use. Use the existing USB_SS_MULT() macro instead to make sure the mult value and hence max packet calculations are correct for USB3.1 devices. Note that burst multiplier in bmAttributes is zero based and that the USB_SS_MULT() macro adds one. Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li commit 7d148ce4451a12ca1b226373ccda950718154140 Author: Paolo Bonzini Date: Fri Sep 18 17:33:04 2015 +0200 KVM: x86: trap AMD MSRs for the TSeg base and mask commit 3afb1121800128aae9f5722e50097fcf1a9d4d88 upstream. These have roughly the same purpose as the SMRR, which we do not need to implement in KVM. However, Linux accesses MSR_K8_TSEG_ADDR at boot, which causes problems when running a Xen dom0 under KVM. Just return 0, meaning that processor protection of SMRAM is not in effect. Reported-by: M A Young Acked-by: Borislav Petkov Signed-off-by: Paolo Bonzini Signed-off-by: Zefan Li commit 42bffe1abdfe0543b54c0d29550e9a2bcd9edd34 Author: Mark Brown Date: Sat Sep 19 07:12:34 2015 -0700 regmap: debugfs: Don't bother actually printing when calculating max length commit 176fc2d5770a0990eebff903ba680d2edd32e718 upstream. The in kernel snprintf() will conveniently return the actual length of the printed string even if not given an output beffer at all so just do that rather than relying on the user to pass in a suitable buffer, ensuring that we don't need to worry if the buffer was truncated due to the size of the buffer passed in. Reported-by: Rasmus Villemoes Signed-off-by: Mark Brown Signed-off-by: Zefan Li commit f4524d728fa32ea451c6034a6dccba0159ecc153 Author: Mark Brown Date: Sat Sep 19 07:00:18 2015 -0700 regmap: debugfs: Ensure we don't underflow when printing access masks commit b763ec17ac762470eec5be8ebcc43e4f8b2c2b82 upstream. If a read is attempted which is smaller than the line length then we may underflow the subtraction we're doing with the unsigned size_t type so move some of the calculation to be additions on the right hand side instead in order to avoid this. Reported-by: Rasmus Villemoes Signed-off-by: Mark Brown Signed-off-by: Zefan Li commit 413e7340b4ceac04d90253c25472a221e2f529ff Author: Jeff Mahoney Date: Fri Sep 11 21:44:17 2015 -0400 btrfs: skip waiting on ordered range for special files commit a30e577c96f59b1e1678ea5462432b09bf7d5cbc upstream. In btrfs_evict_inode, we properly truncate the page cache for evicted inodes but then we call btrfs_wait_ordered_range for every inode as well. It's the right thing to do for regular files but results in incorrect behavior for device inodes for block devices. filemap_fdatawrite_range gets called with inode->i_mapping which gets resolved to the block device inode before getting passed to wbc_attach_fdatawrite_inode and ultimately to inode_to_bdi. What happens next depends on whether there's an open file handle associated with the inode. If there is, we write to the block device, which is unexpected behavior. If there isn't, we through normally and inode->i_data is used. We can also end up racing against open/close which can result in crashes when i_mapping points to a block device inode that has been closed. Since there can't be any page cache associated with special file inodes, it's safe to skip the btrfs_wait_ordered_range call entirely and avoid the problem. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100911 Tested-by: Christoph Biedl Signed-off-by: Jeff Mahoney Reviewed-by: Filipe Manana Signed-off-by: Zefan Li commit 557b53d8d4c7dbcc0615c2eeace00fd91bd2a17f Author: Guenter Roeck Date: Sun Sep 6 01:46:54 2015 +0300 spi: Fix documentation of spi_alloc_master() commit a394d635193b641f2c86ead5ada5b115d57c51f8 upstream. Actually, spi_master_put() after spi_alloc_master() must _not_ be followed by kfree(). The memory is already freed with the call to spi_master_put() through spi_master_class, which registers a release function. Calling both spi_master_put() and kfree() results in often nasty (and delayed) crashes elsewhere in the kernel, often in the networking stack. This reverts commit eb4af0f5349235df2e4a5057a72fc8962d00308a. Link to patch and concerns: https://lkml.org/lkml/2012/9/3/269 or http://lkml.iu.edu/hypermail/linux/kernel/1209.0/00790.html Alexey Klimov: This revert becomes valid after 94c69f765f1b4a658d96905ec59928e3e3e07e6a when spi-imx.c has been fixed and there is no need to call kfree() so comment for spi_alloc_master() should be fixed. Signed-off-by: Guenter Roeck Signed-off-by: Alexey Klimov Signed-off-by: Mark Brown Signed-off-by: Zefan Li commit 47d7e7e7c22274dfaba2e25d4a8baff86989757c Author: Tan, Jui Nee Date: Tue Sep 1 10:22:51 2015 +0800 spi: spi-pxa2xx: Check status register to determine if SSSR_TINT is disabled commit 02bc933ebb59208f42c2e6305b2c17fd306f695d upstream. On Intel Baytrail, there is case when interrupt handler get called, no SPI message is captured. The RX FIFO is indeed empty when RX timeout pending interrupt (SSSR_TINT) happens. Use the BIOS version where both HSUART and SPI are on the same IRQ. Both drivers are using IRQF_SHARED when calling the request_irq function. When running two separate and independent SPI and HSUART application that generate data traffic on both components, user will see messages like below on the console: pxa2xx-spi pxa2xx-spi.0: bad message state in interrupt handler This commit will fix this by first checking Receiver Time-out Interrupt, if it is disabled, ignore the request and return without servicing. Signed-off-by: Tan, Jui Nee Acked-by: Jarkko Nikula Signed-off-by: Mark Brown Signed-off-by: Zefan Li commit 2f5b9f275560942c8c1a56270bb1d7c8cbd885df Author: Dan Carpenter Date: Thu Oct 29 16:37:54 2015 +0300 drm: crtc: integer overflow in drm_property_create_blob() commit 9ac0934bbe52290e4e4c2a58ec41cab9b6ca8c96 upstream. The size here comes from the user via the ioctl, it is a number between 1-u32max so the addition here could overflow on 32 bit systems. Fixes: f453ba046074 ('DRM: add mode setting support') Signed-off-by: Dan Carpenter Reviewed-by: Daniel Stone Signed-off-by: Dave Airlie [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li commit 6126604d3fafa03231cadbdbef2d8cd5faa00085 Author: NeilBrown Date: Sat Oct 24 16:02:16 2015 +1100 md/raid1: don't clear bitmap bit when bad-block-list write fails. commit bd8688a199b864944bf62eebed0ca13b46249453 upstream. When a write fails and a bad-block-list is present, we can update the bad-block-list instead of writing the data. If this succeeds then it is OK clear the relevant bitmap-bit as no further 'sync' of the block is needed. However if writing the bad-block-list fails then we need to treat the write as failed and particularly must not clear the bitmap bit. Otherwise the device can be re-added (after any hardware connection issues are resolved) and because the relevant bit in the bitmap is clear, that block will not be resynced. This leads to data corruption. We already delay the final bio_endio() on the write until the bad-block-list is written so that when the write returns: either that data is safe, the bad-block record is safe, or the fact that the device is faulty is safe. However we *don't* delay the clearing of the bitmap, so the bitmap bit can be recorded as cleared before we know if the bad-block-list was written safely. So: delay that until the write really is safe. i.e. move the call to close_write() until just before calling bio_endio(), and recheck the 'is array degraded' status before making that call. This bug goes back to v3.1 when bad-block-lists were introduced, though it only affects arrays created with mdadm-3.3 or later as only those have bad-block lists. Backports will require at least Commit: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.") as well. I'll send that to 'stable' separately. Note that of the two tests of R1BIO_WriteError that this patch adds, the first is certain to fail and the second is certain to succeed. However doing it this way makes the patch more obviously correct. I will tidy the code up in a future merge window. Reported-and-tested-by: Nate Dailey Cc: Jes Sorensen Fixes: cd5ff9a16f08 ("md/raid1: Handle write errors by updating badblock log.") Signed-off-by: NeilBrown Signed-off-by: Zefan Li commit f6b1d7cb981875eb995795eb9987be6d7825099e Author: NeilBrown Date: Fri Aug 14 11:11:10 2015 +1000 md/raid1: ensure device failure recorded before write request returns. commit 55ce74d4bfe1b9444436264c637f39a152d1e5ac upstream. When a write to one of the legs of a RAID1 fails, the failure is recorded in the metadata of the other leg(s) so that after a restart the data on the failed drive wont be trusted even if that drive seems to be working again (maybe a cable was unplugged). Similarly when we record a bad-block in response to a write failure, we must not let the write complete until the bad-block update is safe. Currently there is no interlock between the write request completing and the metadata update. So it is possible that the write will complete, the app will confirm success in some way, and then the machine will crash before the metadata update completes. This is an extremely small hole for a racy to fit in, but it is theoretically possible and so should be closed. So: - set MD_CHANGE_PENDING when requesting a metadata update for a failed device, so we can know with certainty when it completes - queue requests that experienced an error on a new queue which is only processed after the metadata update completes - call raid_end_bio_io() on bios in that queue when the time comes. Signed-off-by: NeilBrown Signed-off-by: Zefan Li commit 0570dab32ddd0f0c7db5ccd025a1597fffb9e464 Author: NeilBrown Date: Sat Oct 24 16:23:48 2015 +1100 md/raid10: don't clear bitmap bit when bad-block-list write fails. commit c340702ca26a628832fade4f133d8160a55c29cc upstream. When a write fails and a bad-block-list is present, we can update the bad-block-list instead of writing the data. If this succeeds then it is OK clear the relevant bitmap-bit as no further 'sync' of the block is needed. However if writing the bad-block-list fails then we need to treat the write as failed and particularly must not clear the bitmap bit. Otherwise the device can be re-added (after any hardware connection issues are resolved) and because the relevant bit in the bitmap is clear, that block will not be resynced. This leads to data corruption. We already delay the final bio_endio() on the write until the bad-block-list is written so that when the write returns: either that data is safe, the bad-block record is safe, or the fact that the device is faulty is safe. However we *don't* delay the clearing of the bitmap, so the bitmap bit can be recorded as cleared before we know if the bad-block-list was written safely. So: delay that until the write really is safe. i.e. move the call to close_write() until just before calling bio_endio(), and recheck the 'is array degraded' status before making that call. This bug goes back to v3.1 when bad-block-lists were introduced, though it only affects arrays created with mdadm-3.3 or later as only those have bad-block lists. Backports will require at least Commit: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.") as well. I'll send that to 'stable' separately. Note that of the two tests of R10BIO_WriteError that this patch adds, the first is certain to fail and the second is certain to succeed. However doing it this way makes the patch more obviously correct. I will tidy the code up in a future merge window. Reported-by: Nate Dailey Fixes: bd870a16c594 ("md/raid10: Handle write errors by updating badblock log.") Signed-off-by: NeilBrown Signed-off-by: Zefan Li commit 43bf02bac43bd2697f9e20c0feee434ba6ebe9db Author: NeilBrown Date: Fri Aug 14 11:26:17 2015 +1000 md/raid10: ensure device failure recorded before write request returns. commit 95af587e95aacb9cfda4a9641069a5244a540dc8 upstream. When a write to one of the legs of a RAID10 fails, the failure is recorded in the metadata of the other legs so that after a restart the data on the failed drive wont be trusted even if that drive seems to be working again (maybe a cable was unplugged). Currently there is no interlock between the write request completing and the metadata update. So it is possible that the write will complete, the app will confirm success in some way, and then the machine will crash before the metadata update completes. This is an extremely small hole for a racy to fit in, but it is theoretically possible and so should be closed. So: - set MD_CHANGE_PENDING when requesting a metadata update for a failed device, so we can know with certainty when it completes - queue requests that experienced an error on a new queue which is only processed after the metadata update completes - call raid_end_bio_io() on bios in that queue when the time comes. Signed-off-by: NeilBrown [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li commit 894f53c9eceea3dd9af3197e1419538ce05b3ffa Author: Vasant Hegde Date: Fri Oct 16 15:53:29 2015 +0530 powerpc/rtas: Validate rtas.entry before calling enter_rtas() commit 8832317f662c06f5c06e638f57bfe89a71c9b266 upstream. Currently we do not validate rtas.entry before calling enter_rtas(). This leads to a kernel oops when user space calls rtas system call on a powernv platform (see below). This patch adds code to validate rtas.entry before making enter_rtas() call. Oops: Exception in kernel mode, sig: 4 [#1] SMP NR_CPUS=1024 NUMA PowerNV task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000 NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140 REGS: c0000007e1a7b920 TRAP: 0e40 Not tainted (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le) MSR: 1000000000081000 CR: 00000000 XER: 00000000 CFAR: c000000000009c0c SOFTE: 0 NIP [0000000000000000] (null) LR [0000000000009c14] 0x9c14 Call Trace: [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable) [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0 [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98 Fixes: 55190f88789a ("powerpc: Add skeleton PowerNV platform") Reported-by: NAGESWARA R. SASTRY Signed-off-by: Vasant Hegde [mpe: Reword change log, trim oops, and add stable + fixes] Signed-off-by: Michael Ellerman Signed-off-by: Zefan Li commit 7abd07f2a328030ae7d68c19f0facf59121bc647 Author: Doron Tsur Date: Sun Oct 11 15:58:17 2015 +0300 IB/cm: Fix rb-tree duplicate free and use-after-free commit 0ca81a2840f77855bbad1b9f172c545c4dc9e6a4 upstream. ib_send_cm_sidr_rep could sometimes erase the node from the sidr (depending on errors in the process). Since ib_send_cm_sidr_rep is called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv could be either erased from the rb_tree twice or not erased at all. Fixing that by making sure it's erased only once before freeing cm_id_priv. Fixes: a977049dacde ('[PATCH] IB: Add the kernel CM implementation') Signed-off-by: Doron Tsur Signed-off-by: Matan Barak Signed-off-by: Doug Ledford Signed-off-by: Zefan Li commit a12321d34f35fb8eacb3f39d1d53eb8d6e52fa8a Author: Peter Zijlstra Date: Tue Sep 29 14:45:09 2015 +0200 sched/core: Fix TASK_DEAD race in finish_task_switch() commit 95913d97914f44db2b81271c2e2ebd4d2ac2df83 upstream. So the problem this patch is trying to address is as follows: CPU0 CPU1 context_switch(A, B) ttwu(A) LOCK A->pi_lock A->on_cpu == 0 finish_task_switch(A) prev_state = A->state <-. WMB | A->on_cpu = 0; | UNLOCK rq0->lock | | context_switch(C, A) `-- A->state = TASK_DEAD prev_state == TASK_DEAD put_task_struct(A) context_switch(A, C) finish_task_switch(A) A->state == TASK_DEAD put_task_struct(A) The argument being that the WMB will allow the load of A->state on CPU0 to cross over and observe CPU1's store of A->state, which will then result in a double-drop and use-after-free. Now the comment states (and this was true once upon a long time ago) that we need to observe A->state while holding rq->lock because that will order us against the wakeup; however the wakeup will not in fact acquire (that) rq->lock; it takes A->pi_lock these days. We can obviously fix this by upgrading the WMB to an MB, but that is expensive, so we'd rather avoid that. The alternative this patch takes is: smp_store_release(&A->on_cpu, 0), which avoids the MB on some archs, but not important ones like ARM. Reported-by: Oleg Nesterov Signed-off-by: Peter Zijlstra (Intel) Acked-by: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Cc: manfred@colorfullife.com Cc: will.deacon@arm.com Fixes: e4a52bcb9a18 ("sched: Remove rq->lock from the first half of ttwu()") Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar [lizf: Backported to 3.4: use smb_mb() instead of smp_store_release(), which is not defined in 3.4.y] Signed-off-by: Zefan Li commit c2acc6aa8577494fe6e8830922b4cabe956fdd20 Author: Johannes Berg Date: Tue Sep 15 14:36:09 2015 +0200 iwlwifi: dvm: fix D3 firmware PN programming commit 5bd166872d8f99f156fac191299d24f828bb2348 upstream. The code to send the RX PN data (for each TID) to the firmware has a devastating bug: it overwrites the data for TID 0 with all the TID data, leaving the remaining TIDs zeroed. This will allow replays to actually be accepted by the firmware, which could allow waking up the system. Signed-off-by: Johannes Berg Signed-off-by: Luca Coelho [lizf: Backported to 3.4: adjust filename] Signed-off-by: Zefan Li commit 55555bf1c6f353b84a23e681f3b487152f730926 Author: NeilBrown Date: Thu Sep 24 15:47:47 2015 +1000 md/raid0: apply base queue limits *before* disk_stack_limits commit 66eefe5de11db1e0d8f2edc3880d50e7c36a9d43 upstream. Calling e.g. blk_queue_max_hw_sectors() after calls to disk_stack_limits() discards the settings determined by disk_stack_limits(). So we need to make those calls first. Fixes: 199dc6ed5179 ("md/raid0: update queue parameter in a safer location.") Reported-by: Jes Sorensen Signed-off-by: NeilBrown Signed-off-by: Zefan Li commit 65cee714454e3210a673ba4b33f22fac3af77021 Author: James Hogan Date: Fri Mar 27 08:33:43 2015 +0000 MIPS: dma-default: Fix 32-bit fall back to GFP_DMA commit 53960059d56ecef67d4ddd546731623641a3d2d1 upstream. If there is a DMA zone (usually 24bit = 16MB I believe), but no DMA32 zone, as is the case for some 32-bit kernels, then massage_gfp_flags() will cause DMA memory allocated for devices with a 32..63-bit coherent_dma_mask to fall back to using __GFP_DMA, even though there may only be 32-bits of physical address available anyway. Correct that case to compare against a mask the size of phys_addr_t instead of always using a 64-bit mask. Signed-off-by: James Hogan Fixes: a2e715a86c6d ("MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.") Cc: Ralf Baechle Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/9610/ Signed-off-by: Ralf Baechle Signed-off-by: Zefan Li commit f9c715eeb2a73e61005f0f12caed9cf3a0e9dacd Author: Robert Jarzmik Date: Tue Sep 15 20:51:31 2015 +0200 ASoC: fix broken pxa SoC support commit 3c8f7710c1c44fb650bc29b6ef78ed8b60cfaa28 upstream. The previous fix of pxa library support, which was introduced to fix the library dependency, broke the previous SoC behavior, where a machine code binding pxa2xx-ac97 with a coded relied on : - sound/soc/pxa/pxa2xx-ac97.c - sound/soc/codecs/XXX.c For example, the mioa701_wm9713.c machine code is currently broken. The "select ARM" statement wrongly selects the soc/arm/pxa2xx-ac97 for compilation, as per an unfortunate fate SND_PXA2XX_AC97 is both declared in sound/arm/Kconfig and sound/soc/pxa/Kconfig. Fix this by ensuring that SND_PXA2XX_SOC correctly triggers the correct pxa2xx-ac97 compilation. Fixes: 846172dfe33c ("ASoC: fix SND_PXA2XX_LIB Kconfig warning") Signed-off-by: Robert Jarzmik Signed-off-by: Mark Brown Signed-off-by: Zefan Li commit 244e0dbc02ee3af070bd100a840bd112b769e35a Author: Herbert Xu Date: Fri Sep 4 13:21:06 2015 +0800 ipv6: Fix IPsec pre-encap fragmentation check commit 93efac3f2e03321129de67a3c0ba53048bb53e31 upstream. The IPv6 IPsec pre-encap path performs fragmentation for tunnel-mode packets. That is, we perform fragmentation pre-encap rather than post-encap. A check was added later to ensure that proper MTU information is passed back for locally generated traffic. Unfortunately this check was performed on all IPsec packets, including transport-mode packets. What's more, the check failed to take GSO into account. The end result is that transport-mode GSO packets get dropped at the check. This patch fixes it by moving the tunnel mode check forward as well as adding the GSO check. Fixes: dd767856a36e ("xfrm6: Don't call icmpv6_send on local error") Signed-off-by: Herbert Xu Signed-off-by: Steffen Klassert [lizf: Backported to 3.4: - adjust context - s/ignore_df/local_df] Signed-off-by: Zefan Li commit d8776fffaeff0ae3ecf78b33515f1a8624f78ace Author: Peter Zijlstra Date: Thu Aug 20 10:34:59 2015 +0930 module: Fix locking in symbol_put_addr() commit 275d7d44d802ef271a42dc87ac091a495ba72fc5 upstream. Poma (on the way to another bug) reported an assertion triggering: [] module_assert_mutex_or_preempt+0x49/0x90 [] __module_address+0x32/0x150 [] __module_text_address+0x16/0x70 [] symbol_put_addr+0x29/0x40 [] dvb_frontend_detach+0x7d/0x90 [dvb_core] Laura Abbott produced a patch which lead us to inspect symbol_put_addr(). This function has a comment claiming it doesn't need to disable preemption around the module lookup because it holds a reference to the module it wants to find, which therefore cannot go away. This is wrong (and a false optimization too, preempt_disable() is really rather cheap, and I doubt any of this is on uber critical paths, otherwise it would've retained a pointer to the actual module anyway and avoided the second lookup). While its true that the module cannot go away while we hold a reference on it, the data structure we do the lookup in very much _CAN_ change while we do the lookup. Therefore fix the comment and add the required preempt_disable(). Reported-by: poma Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Rusty Russell Fixes: a6e6abd575fc ("module: remove module_text_address()") Signed-off-by: Zefan Li commit 2bba66d6ae0f8b4c6fd7b7010437d29c0ecfff0a Author: David Woodhouse Date: Wed Sep 16 14:10:03 2015 +0100 x86/platform: Fix Geode LX timekeeping in the generic x86 build commit 03da3ff1cfcd7774c8780d2547ba0d995f7dc03d upstream. In 2007, commit 07190a08eef36 ("Mark TSC on GeodeLX reliable") bypassed verification of the TSC on Geode LX. However, this code (now in the check_system_tsc_reliable() function in arch/x86/kernel/tsc.c) was only present if CONFIG_MGEODE_LX was set. OpenWRT has recently started building its generic Geode target for Geode GX, not LX, to include support for additional platforms. This broke the timekeeping on LX-based devices, because the TSC wasn't marked as reliable: https://dev.openwrt.org/ticket/20531 By adding a runtime check on is_geode_lx(), we can also include the fix if CONFIG_MGEODEGX1 or CONFIG_X86_GENERIC are set, thus fixing the problem. Signed-off-by: David Woodhouse Cc: Andres Salomon Cc: Linus Torvalds Cc: Marcelo Tosatti Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1442409003.131189.87.camel@infradead.org Signed-off-by: Ingo Molnar Signed-off-by: Zefan Li commit 31a526445157a00b43d5018b1fdc2b8aa6c84c78 Author: Russell King Date: Fri Sep 11 16:44:02 2015 +0100 ARM: fix Thumb2 signal handling when ARMv6 is enabled commit 9b55613f42e8d40d5c9ccb8970bde6af4764b2ab upstream. When a kernel is built covering ARMv6 to ARMv7, we omit to clear the IT state when entering a signal handler. This can cause the first few instructions to be conditionally executed depending on the parent context. In any case, the original test for >= ARMv7 is broken - ARMv6 can have Thumb-2 support as well, and an ARMv6T2 specific build would omit this code too. Relax the test back to ARMv6 or greater. This results in us always clearing the IT state bits in the PSR, even on CPUs where these bits are reserved. However, they're reserved for the IT state, so this should cause no harm. Fixes: d71e1352e240 ("Clear the IT state when invoking a Thumb-2 signal handler") Acked-by: Tony Lindgren Tested-by: H. Nikolaus Schaller Tested-by: Grazvydas Ignotas Signed-off-by: Russell King Signed-off-by: Zefan Li commit b2e30785526c7e1b7bba74d5c50784cfcfe0bc21 Author: T.J. Purtell Date: Wed Nov 6 18:38:05 2013 +0100 ARM: 7880/1: Clear the IT state independent of the Thumb-2 mode commit 6ecf830e5029598732e04067e325d946097519cb upstream. The ARM architecture reference specifies that the IT state bits in the PSR must be all zeros in ARM mode or behavior is unspecified. On the Qualcomm Snapdragon S4/Krait architecture CPUs the processor continues to consider the IT state bits while in ARM mode. This makes it so that some instructions are skipped by the CPU. Signed-off-by: T.J. Purtell [rmk+kernel@arm.linux.org.uk: fixed whitespace formatting in patch] Signed-off-by: Russell King Signed-off-by: Zefan Li commit 98e57bab3f696bf9642899b57ed733d122f3ed4e Author: Arnaldo Carvalho de Melo Date: Fri Sep 11 12:36:12 2015 -0300 perf header: Fixup reading of HEADER_NRCPUS feature commit caa470475d9b59eeff093ae650800d34612c4379 upstream. The original patch introducing this header wrote the number of CPUs available and online in one order and then swapped those values when reading, fix it. Before: # perf record usleep 1 # perf report --header-only | grep 'nrcpus \(online\|avail\)' # nrcpus online : 4 # nrcpus avail : 4 # echo 0 > /sys/devices/system/cpu/cpu2/online # perf record usleep 1 # perf report --header-only | grep 'nrcpus \(online\|avail\)' # nrcpus online : 4 # nrcpus avail : 3 # echo 0 > /sys/devices/system/cpu/cpu1/online # perf record usleep 1 # perf report --header-only | grep 'nrcpus \(online\|avail\)' # nrcpus online : 4 # nrcpus avail : 2 After the fix, bringing back the CPUs online: # perf report --header-only | grep 'nrcpus \(online\|avail\)' # nrcpus online : 2 # nrcpus avail : 4 # echo 1 > /sys/devices/system/cpu/cpu2/online # perf record usleep 1 # perf report --header-only | grep 'nrcpus \(online\|avail\)' # nrcpus online : 3 # nrcpus avail : 4 # echo 1 > /sys/devices/system/cpu/cpu1/online # perf record usleep 1 # perf report --header-only | grep 'nrcpus \(online\|avail\)' # nrcpus online : 4 # nrcpus avail : 4 Acked-by: Namhyung Kim Cc: Adrian Hunter Cc: Borislav Petkov Cc: David Ahern Cc: Frederic Weisbecker Cc: Jiri Olsa Cc: Kan Liang Cc: Stephane Eranian Cc: Wang Nan Fixes: fbe96f29ce4b ("perf tools: Make perf.data more self-descriptive (v8)") Link: http://lkml.kernel.org/r/20150911153323.GP23511@kernel.org Signed-off-by: Arnaldo Carvalho de Melo [lizf: Backported to 3.4: fix it by saving values in an array and then print it in reverse order] Signed-off-by: Zefan Li commit b834fc16378b11a3d753c6299dd2e30a9a52f550 Author: Paul Mackerras Date: Thu Sep 10 14:36:21 2015 +1000 powerpc/MSI: Fix race condition in tearing down MSI interrupts commit e297c939b745e420ef0b9dc989cb87bda617b399 upstream. This fixes a race which can result in the same virtual IRQ number being assigned to two different MSI interrupts. The most visible consequence of that is usually a warning and stack trace from the sysfs code about an attempt to create a duplicate entry in sysfs. The race happens when one CPU (say CPU 0) is disposing of an MSI while another CPU (say CPU 1) is setting up an MSI. CPU 0 calls (for example) pnv_teardown_msi_irqs(), which calls msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its hardware IRQ number) is no longer in use. Then, before CPU 0 gets to calling irq_dispose_mapping() to free up the virtal IRQ number, CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an MSI, and gets the same hardware IRQ number that CPU 0 just freed. CPU 1 then calls irq_create_mapping() to get a virtual IRQ number, which sees that there is currently a mapping for that hardware IRQ number and returns the corresponding virtual IRQ number (which is the same virtual IRQ number that CPU 0 was using). CPU 0 then calls irq_dispose_mapping() and frees that virtual IRQ number. Now, if another CPU comes along and calls irq_create_mapping(), it is likely to get the virtual IRQ number that was just freed, resulting in the same virtual IRQ number apparently being used for two different hardware interrupts. To fix this race, we just move the call to msi_bitmap_free_hwirqs() to after the call to irq_dispose_mapping(). Since virq_to_hw() doesn't work for the virtual IRQ number after irq_dispose_mapping() has been called, we need to call it before irq_dispose_mapping() and remember the result for the msi_bitmap_free_hwirqs() call. The pattern of calling msi_bitmap_free_hwirqs() before irq_dispose_mapping() appears in 5 places under arch/powerpc, and appears to have originated in commit 05af7bd2d75e ("[POWERPC] MPIC U3/U4 MSI backend") from 2007. Fixes: 05af7bd2d75e ("[POWERPC] MPIC U3/U4 MSI backend") Reported-by: Alexey Kardashevskiy Signed-off-by: Paul Mackerras Signed-off-by: Michael Ellerman [bwh: Backported to 3.2: - powernv uses a private functions instead of msi_bitmap_free_hwirqs() - Adjust filename, context] Signed-off-by: Ben Hutchings Signed-off-by: Zefan Li commit 59463bb2d1c003c1b6618c74ad319996f732d82d Author: Ard Biesheuvel Date: Thu Sep 3 13:24:40 2015 +0100 ARM: 8429/1: disable GCC SRA optimization commit a077224fd35b2f7fbc93f14cf67074fc792fbac2 upstream. While working on the 32-bit ARM port of UEFI, I noticed a strange corruption in the kernel log. The following snprintf() statement (in drivers/firmware/efi/efi.c:efi_md_typeattr_format()) snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]", was producing the following output in the log: | | | | | |WB|WT|WC|UC] | | | | | |WB|WT|WC|UC] | | | | | |WB|WT|WC|UC] |RUN| | | | |WB|WT|WC|UC]* |RUN| | | | |WB|WT|WC|UC]* | | | | | |WB|WT|WC|UC] |RUN| | | | |WB|WT|WC|UC]* | | | | | |WB|WT|WC|UC] |RUN| | | | | | | |UC] |RUN| | | | | | | |UC] As it turns out, this is caused by incorrect code being emitted for the string() function in lib/vsprintf.c. The following code if (!(spec.flags & LEFT)) { while (len < spec.field_width--) { if (buf < end) *buf = ' '; ++buf; } } for (i = 0; i < len; ++i) { if (buf < end) *buf = *s; ++buf; ++s; } while (len < spec.field_width--) { if (buf < end) *buf = ' '; ++buf; } when called with len == 0, triggers an issue in the GCC SRA optimization pass (Scalar Replacement of Aggregates), which handles promotion of signed struct members incorrectly. This is a known but as yet unresolved issue. (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular case, it is causing the second while loop to be executed erroneously a single time, causing the additional space characters to be printed. So disable the optimization by passing -fno-ipa-sra. Acked-by: Nicolas Pitre Signed-off-by: Ard Biesheuvel Signed-off-by: Russell King Signed-off-by: Zefan Li commit 3cd0ee55312d9e53ac6966d071c48e218c3a2a53 Author: Christoph Hellwig Date: Wed Sep 9 18:04:18 2015 +0200 scsi_dh: fix randconfig build error commit 294ab783ad98066b87296db1311c7ba2a60206a5 upstream. It looks like the Kconfig check that was meant to fix this (commit fe9233fb6914a0eb20166c967e3020f7f0fba2c9 [SCSI] scsi_dh: fix kconfig related build errors) was actually reversed, but no-one noticed until the new set of patches which separated DM and SCSI_DH). Fixes: fe9233fb6914a0eb20166c967e3020f7f0fba2c9 Signed-off-by: Christoph Hellwig Tested-by: Mike Snitzer Signed-off-by: James Bottomley Signed-off-by: Zefan Li commit 409172802372d568ab1d5460004e109f01abaa39 Author: Hin-Tak Leung Date: Wed Sep 9 15:38:07 2015 -0700 hfs: fix B-tree corruption after insertion at position 0 commit b4cc0efea4f0bfa2477c56af406cfcf3d3e58680 upstream. Fix B-tree corruption when a new record is inserted at position 0 in the node in hfs_brec_insert(). This is an identical change to the corresponding hfs b-tree code to Sergei Antonov's "hfsplus: fix B-tree corruption after insertion at position 0", to keep similar code paths in the hfs and hfsplus drivers in sync, where appropriate. Signed-off-by: Hin-Tak Leung Cc: Sergei Antonov Cc: Joe Perches Reviewed-by: Vyacheslav Dubeyko Cc: Anton Altaparmakov Cc: Al Viro Cc: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Zefan Li commit 6babbcb0fe14fc13033667281d1bb782f84cbd4d Author: Hin-Tak Leung Date: Wed Sep 9 15:38:04 2015 -0700 hfs,hfsplus: cache pages correctly between bnode_create and bnode_free commit 7cb74be6fd827e314f81df3c5889b87e4c87c569 upstream. Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and hfs_bnode_find() for finding or creating pages corresponding to an inode) are immediately kmap()'ed and used (both read and write) and kunmap()'ed, and should not be page_cache_release()'ed until hfs_bnode_free(). This patch fixes a problem I first saw in July 2012: merely running "du" on a large hfsplus-mounted directory a few times on a reasonably loaded system would get the hfsplus driver all confused and complaining about B-tree inconsistencies, and generates a "BUG: Bad page state". Most recently, I can generate this problem on up-to-date Fedora 22 with shipped kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller mounts) and "du /mnt" simultaneously on two windows, where /mnt is a lightly-used QEMU VM image of the full Mac OS X 10.9: $ df -i / /home /mnt Filesystem Inodes IUsed IFree IUse% Mounted on /dev/mapper/fedora-root 3276800 551665 2725135 17% / /dev/mapper/fedora-home 52879360 716221 52163139 2% /home /dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt After applying the patch, I was able to run "du /" (60+ times) and "du /mnt" (150+ times) continuously and simultaneously for 6+ hours. There are many reports of the hfsplus driver getting confused under load and generating "BUG: Bad page state" or other similar issues over the years. [1] The unpatched code [2] has always been wrong since it entered the kernel tree. The only reason why it gets away with it is that the kmap/memcpy/kunmap follow very quickly after the page_cache_release() so the kernel has not had a chance to reuse the memory for something else, most of the time. The current RW driver appears to have followed the design and development of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec 2001) had a B-tree node-centric approach to read_cache_page()/page_cache_release() per bnode_get()/bnode_put(), migrating towards version 0.2 (June 2002) of caching and releasing pages per inode extents. When the current RW code first entered the kernel [2] in 2005, there was an REF_PAGES conditional (and "//" commented out code) to switch between B-node centric paging to inode-centric paging. There was a mistake with the direction of one of the REF_PAGES conditionals in __hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were removed, but a page_cache_release() was mistakenly left in (propagating the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out page_cache_release() in bnode_release() (which should be spanned by !REF_PAGES) was never enabled. References: [1]: Michael Fox, Apr 2013 http://www.spinics.net/lists/linux-fsdevel/msg63807.html ("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'") Sasha Levin, Feb 2015 http://lkml.org/lkml/2015/2/20/85 ("use after free") https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887 https://bugzilla.kernel.org/show_bug.cgi?id=42342 https://bugzilla.kernel.org/show_bug.cgi?id=63841 https://bugzilla.kernel.org/show_bug.cgi?id=78761 [2]: http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db commit d1081202f1d0ee35ab0beb490da4b65d4bc763db Author: Andrew Morton Date: Wed Feb 25 16:17:36 2004 -0800 [PATCH] HFS rewrite http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\ fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd commit 91556682e0bf004d98a529bf829d339abb98bbbd Author: Andrew Morton Date: Wed Feb 25 16:17:48 2004 -0800 [PATCH] HFS+ support [3]: http://sourceforge.net/projects/linux-hfsplus/ http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/ http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/ http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\ fs/hfsplus/bnode.c?r1=1.4&r2=1.5 Date: Thu Jun 6 09:45:14 2002 +0000 Use buffer cache instead of page cache in bnode.c. Cache inode extents. [4]: http://git.kernel.org/cgit/linux/kernel/git/\ stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d commit a5e3985fa014029eb6795664c704953720cc7f7d Author: Roman Zippel Date: Tue Sep 6 15:18:47 2005 -0700 [PATCH] hfs: remove debug code Signed-off-by: Hin-Tak Leung Signed-off-by: Sergei Antonov Reviewed-by: Anton Altaparmakov Reported-by: Sasha Levin Cc: Al Viro Cc: Christoph Hellwig Cc: Vyacheslav Dubeyko Cc: Sougata Santra Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Zefan Li commit b0cce01be5f58ed399fdfc8e1b0fbcd827a35aef Author: Kees Cook Date: Fri Sep 4 15:44:57 2015 -0700 fs: create and use seq_show_option for escaping commit a068acf2ee77693e0bf39d6e07139ba704f461c3 upstream. Many file systems that implement the show_options hook fail to correctly escape their output which could lead to unescaped characters (e.g. new lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This could lead to confusion, spoofed entries (resulting in things like systemd issuing false d-bus "mount" notifications), and who knows what else. This looks like it would only be the root user stepping on themselves, but it's possible weird things could happen in containers or in other situations with delegated mount privileges. Here's an example using overlay with setuid fusermount trusting the contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of "sudo" is something more sneaky: $ BASE="ovl" $ MNT="$BASE/mnt" $ LOW="$BASE/lower" $ UP="$BASE/upper" $ WORK="$BASE/work/ 0 0 none /proc fuse.pwn user_id=1000" $ mkdir -p "$LOW" "$UP" "$WORK" $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt $ cat /proc/mounts none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0 none /proc fuse.pwn user_id=1000 0 0 $ fusermount -u /proc $ cat /proc/mounts cat: /proc/mounts: No such file or directory This fixes the problem by adding new seq_show_option and seq_show_option_n helpers, and updating the vulnerable show_option handlers to use them as needed. Some, like SELinux, need to be open coded due to unusual existing escape mechanisms. [akpm@linux-foundation.org: add lost chunk, per Kees] [keescook@chromium.org: seq_show_option should be using const parameters] Signed-off-by: Kees Cook Acked-by: Serge Hallyn Acked-by: Jan Kara Acked-by: Paul Moore Cc: J. R. Okajima Signed-off-by: Kees Cook Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds [lizf: Backported to 3.4: - adjust context - one more place in ceph needs to be changed - drop changes to overlayfs - drop showing vers in cifs] Signed-off-by: Zefan Li commit 7646c507f1ad8bd14a3196f84b0eabb229dafb75 Author: Andrey Ryabinin Date: Thu Sep 3 14:32:01 2015 +0300 crypto: ghash-clmulni: specify context size for ghash async algorithm commit 71c6da846be478a61556717ef1ee1cea91f5d6a8 upstream. Currently context size (cra_ctxsize) doesn't specified for ghash_async_alg. Which means it's zero. Thus crypto_create_tfm() doesn't allocate needed space for ghash_async_ctx, so any read/write to ctx (e.g. in ghash_async_init_tfm()) is not valid. Signed-off-by: Andrey Ryabinin Signed-off-by: Herbert Xu Signed-off-by: Zefan Li commit 7ac8dba7a19bd98ce6839f7d980464bbc6abd996 Author: Mikulas Patocka Date: Wed Sep 2 22:51:53 2015 +0200 hpfs: update ctime and mtime on directory modification commit f49a26e7718dd30b49e3541e3e25aecf5e7294e2 upstream. Update ctime and mtime when a directory is modified. (though OS/2 doesn't update them anyway) Signed-off-by: Mikulas Patocka Signed-off-by: Linus Torvalds Signed-off-by: Zefan Li commit d86e11470198db9ef87fad02a24968c7f9072e89 Author: Christoph Hellwig Date: Wed Aug 26 11:00:37 2015 +0200 IB/uverbs: reject invalid or unknown opcodes commit b632ffa7cee439ba5dce3b3bc4a5cbe2b3e20133 upstream. We have many WR opcodes that are only supported in kernel space and/or require optional information to be copied into the WR structure. Reject all those not explicitly handled so that we can't pass invalid information to drivers. Signed-off-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Sagi Grimberg Signed-off-by: Doug Ledford Signed-off-by: Zefan Li commit 6488ee2fb553a54c6679de09e0572d3932f00333 Author: Jeffery Miller Date: Tue Sep 1 11:23:02 2015 -0400 Add radeon suspend/resume quirk for HP Compaq dc5750. commit 09bfda10e6efd7b65bcc29237bee1765ed779657 upstream. With the radeon driver loaded the HP Compaq dc5750 Small Form Factor machine fails to resume from suspend. Adding a quirk similar to other devices avoids the problem and the system resumes properly. Signed-off-by: Jeffery Miller Signed-off-by: Alex Deucher Signed-off-by: Zefan Li commit 88ebf1a8b3cd478d73a9731dc2276cd6a670b3af Author: Yishai Hadas Date: Thu Aug 13 18:32:03 2015 +0300 IB/uverbs: Fix race between ib_uverbs_open and remove_one commit 35d4a0b63dc0c6d1177d4f532a9deae958f0662c upstream. Fixes: 2a72f212263701b927559f6850446421d5906c41 ("IB/uverbs: Remove dev_table") Before this commit there was a device look-up table that was protected by a spin_lock used by ib_uverbs_open and by ib_uverbs_remove_one. When it was dropped and container_of was used instead, it enabled the race with remove_one as dev might be freed just after: dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev) but before the kref_get. In addition, this buggy patch added some dead code as container_of(x,y,z) can never be NULL and so dev can never be NULL. As a result the comment above ib_uverbs_open saying "the open method will either immediately run -ENXIO" is wrong as it can never happen. The solution follows Jason Gunthorpe suggestion from below URL: https://www.mail-archive.com/linux-rdma@vger.kernel.org/msg25692.html cdev will hold a kref on the parent (the containing structure, ib_uverbs_device) and only when that kref is released it is guaranteed that open will never be called again. In addition, fixes the active count scheme to use an atomic not a kref to prevent WARN_ON as pointed by above comment from Jason. Signed-off-by: Yishai Hadas Signed-off-by: Shachar Raindel Reviewed-by: Jason Gunthorpe Signed-off-by: Doug Ledford Signed-off-by: Zefan Li commit e8dae252c50857f01f1916cf9296542c91133d73 Author: Noa Osherovich Date: Thu Jul 30 17:34:24 2015 +0300 IB/mlx4: Use correct SL on AH query under RoCE commit 5e99b139f1b68acd65e36515ca347b03856dfb5a upstream. The mlx4 IB driver implementation for ib_query_ah used a wrong offset (28 instead of 29) when link type is Ethernet. Fixed to use the correct one. Fixes: fa417f7b520e ('IB/mlx4: Add support for IBoE') Signed-off-by: Shani Michaeli Signed-off-by: Noa Osherovich Signed-off-by: Or Gerlitz Signed-off-by: Doug Ledford Signed-off-by: Zefan Li commit 84c2e63639a79e81cbe82e9f7c5a958cc39f0abd Author: Trond Myklebust Date: Sat Aug 29 13:36:30 2015 -0700 SUNRPC: xs_reset_transport must mark the connection as disconnected commit 0c78789e3a030615c6650fde89546cadf40ec2cc upstream. In case the reconnection attempt fails. Signed-off-by: Trond Myklebust [lizf: Backported to 3.4: add definition of variable xprt] Signed-off-by: Zefan Li commit 5b59369813efb1bb78fc360546a2b657cb957fe1 Author: Grant Likely Date: Sun Jun 7 15:20:11 2015 +0100 drivercore: Fix unregistration path of platform devices commit 7f5dcaf1fdf289767a126a0a5cc3ef39b5254b06 upstream. The unregister path of platform_device is broken. On registration, it will register all resources with either a parent already set, or type==IORESOURCE_{IO,MEM}. However, on unregister it will release everything with type==IORESOURCE_{IO,MEM}, but ignore the others. There are also cases where resources don't get registered in the first place, like with devices created by of_platform_populate()*. Fix the unregister path to be symmetrical with the register path by checking the parent pointer instead of the type field to decide which resources to unregister. This is safe because the upshot of the registration path algorithm is that registered resources have a parent pointer, and non-registered resources do not. * It can be argued that of_platform_populate() should be registering it's resources, and they argument has some merit. However, there are quite a few platforms that end up broken if we try to do that due to overlapping resources in the device tree. Until that is fixed, we need to solve the immediate problem. Cc: Pantelis Antoniou Cc: Wolfram Sang Cc: Rob Herring Cc: Greg Kroah-Hartman Cc: Ricardo Ribalda Delgado Signed-off-by: Grant Likely Tested-by: Ricardo Ribalda Delgado Tested-by: Wolfram Sang Signed-off-by: Rob Herring Signed-off-by: Zefan Li commit ec62ecdcf609f00b3de511492f305cff47b41f98 Author: David Daney Date: Wed Aug 19 13:17:47 2015 -0700 of/address: Don't loop forever in of_find_matching_node_by_address(). commit 3a496b00b6f90c41bd21a410871dfc97d4f3c7ab upstream. If the internal call to of_address_to_resource() fails, we end up looping forever in of_find_matching_node_by_address(). This can be caused by a defective device tree, or calling with an incorrect matches argument. Fix by calling of_find_matching_node() unconditionally at the end of the loop. Signed-off-by: David Daney Signed-off-by: Rob Herring Signed-off-by: Zefan Li commit 470940f18efa5a536385d8b85a6f9175a8e4198d Author: Stephen Chandler Paul Date: Fri Aug 21 14:16:12 2015 -0400 DRM - radeon: Don't link train DisplayPort on HPD until we get the dpcd commit 924f92bf12bfbef3662619e3ed24a1cea7c1cbcd upstream. Most of the time this isn't an issue since hotplugging an adaptor will trigger a crtc mode change which in turn, causes the driver to probe every DisplayPort for a dpcd. However, in cases where hotplugging doesn't cause a mode change (specifically when one unplugs a monitor from a DisplayPort connector, then plugs that same monitor back in seconds later on the same port without any other monitors connected), we never probe for the dpcd before starting the initial link training. What happens from there looks like this: - GPU has only one monitor connected. It's connected via DisplayPort, and does not go through an adaptor of any sort. - User unplugs DisplayPort connector from GPU. - Change in HPD is detected by the driver, we probe every DisplayPort for a possible connection. - Probe the port the user originally had the monitor connected on for it's dpcd. This fails, and we clear the first (and only the first) byte of the dpcd to indicate we no longer have a dpcd for this port. - User plugs the previously disconnected monitor back into the same DisplayPort. - radeon_connector_hotplug() is called before everyone else, and tries to handle the link training. Since only the first byte of the dpcd is zeroed, the driver is able to complete link training but does so against the wrong dpcd, causing it to initialize the link with the wrong settings. - Display stays blank (usually), dpcd is probed after the initial link training, and the driver prints no obvious messages to the log. In theory, since only one byte of the dpcd is chopped off (specifically, the byte that contains the revision information for DisplayPort), it's not entirely impossible that this bug may not show on certain monitors. For instance, the only reason this bug was visible on my ASUS PB238 monitor was due to the fact that this monitor using the enhanced framing symbol sequence, the flag for which is ignored if the radeon driver thinks that the DisplayPort version is below 1.1. Signed-off-by: Stephen Chandler Paul Reviewed-by: Jerome Glisse Signed-off-by: Alex Deucher Signed-off-by: Zefan Li commit e770b6a84b5a61a9ad0b592d795ff7ef68464736 Author: Tyler Hicks Date: Wed Aug 5 11:26:36 2015 -0500 eCryptfs: Invalidate dcache entries when lower i_nlink is zero commit 5556e7e6d30e8e9b5ee51b0e5edd526ee80e5e36 upstream. Consider eCryptfs dcache entries to be stale when the corresponding lower inode's i_nlink count is zero. This solves a problem caused by the lower inode being directly modified, without going through the eCryptfs mount, leaving stale eCryptfs dentries cached and the eCryptfs inode's i_nlink count not being cleared. Signed-off-by: Tyler Hicks Reported-by: Richard Weinberger [bwh: Backported to 3.2: - Test d_revalidate pointer directly rather than a DCACHE_OP flag - Open-code d_inode() - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Zefan Li commit f150995233209f9bf6364196f3e21cbe3dae454e Author: Matthijs Kooijman Date: Tue Aug 18 10:33:56 2015 +0200 USB: ftdi_sio: Added custom PID for CustomWare products commit 1fb8dc36384ae1140ee6ccc470de74397606a9d5 upstream. CustomWare uses the FTDI VID with custom PIDs for their ShipModul MiniPlex products. Signed-off-by: Matthijs Kooijman Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li commit c2ea2fd87815e0d47e3c029ba6559da45d013559 Author: Peter Chen Date: Mon Aug 17 10:23:03 2015 +0800 usb: host: ehci-sys: delete useless bus_to_hcd conversion commit 0521cfd06e1ebcd575e7ae36aab068b38df23850 upstream. The ehci platform device's drvdata is the pointer of struct usb_hcd already, so we doesn't need to call bus_to_hcd conversion again. Signed-off-by: Peter Chen Acked-by: Alan Stern Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li commit f11d9b18aa5b2da19d2493be4c5195765dc693dc Author: NeilBrown Date: Thu Jul 30 13:00:56 2015 +1000 NFSv4: don't set SETATTR for O_RDONLY|O_EXCL commit efcbc04e16dfa95fef76309f89710dd1d99a5453 upstream. It is unusual to combine the open flags O_RDONLY and O_EXCL, but it appears that libre-office does just that. [pid 3250] stat("/home/USER/.config", {st_mode=S_IFDIR|0700, st_size=8192, ...}) = 0 [pid 3250] open("/home/USER/.config/libreoffice/4-suse/user/extensions/buildid", O_RDONLY|O_EXCL NFSv4 takes O_EXCL as a sign that a setattr command should be sent, probably to reset the timestamps. When it was an O_RDONLY open, the SETATTR command does not identify any actual attributes to change. If no delegation was provided to the open, the SETATTR uses the all-zeros stateid and the request is accepted (at least by the Linux NFS server - no harm, no foul). If a read-delegation was provided, this is used in the SETATTR request, and a Netapp filer will justifiably claim NFS4ERR_BAD_STATEID, which the Linux client takes as a sign to retry - indefinitely. So only treat O_EXCL specially if O_CREAT was also given. Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li commit 28709fcfc6ed7b6e38d5bb47316fcaf6c33667f7 Author: Paul Bolle Date: Fri Jul 31 14:08:58 2015 +0200 windfarm: decrement client count when unregistering commit fe2b592173ff0274e70dc44d1d28c19bb995aa7c upstream. wf_unregister_client() increments the client count when a client unregisters. That is obviously incorrect. Decrement that client count instead. Fixes: 75722d3992f5 ("[PATCH] ppc64: Thermal control for SMU based machines") Signed-off-by: Paul Bolle Signed-off-by: Michael Ellerman Signed-off-by: Zefan Li commit 7680c810fcae078c3ba33a04c548c4003882a82d Author: Masahiro Yamada Date: Wed Jul 15 10:29:00 2015 +0900 devres: fix devres_get() commit 64526370d11ce8868ca495723d595b61e8697fbf upstream. Currently, devres_get() passes devres_free() the pointer to devres, but devres_free() should be given with the pointer to resource data. Fixes: 9ac7849e35f7 ("devres: device resource management") Signed-off-by: Masahiro Yamada Acked-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li commit d4f8c54e0b6ef550a5932ad90743ddac2a7b37b1 Author: Sudip Mukherjee Date: Mon Jul 20 17:27:21 2015 +0530 auxdisplay: ks0108: fix refcount commit bab383de3b84e584b0f09227151020b2a43dc34c upstream. parport_find_base() will implicitly do parport_get_port() which increases the refcount. Then parport_register_device() will again increment the refcount. But while unloading the module we are only doing parport_unregister_device() decrementing the refcount only once. We add an parport_put_port() to neutralize the effect of parport_get_port(). Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman Signed-off-by: Zefan Li commit ec2890a5fd0febb6e27e4893f5f6ffbd4dfaf735 Author: NeilBrown Date: Mon Aug 3 13:11:47 2015 +1000 md/raid0: update queue parameter in a safer location. commit 199dc6ed5179251fa6158a461499c24bdd99c836 upstream. When a (e.g.) RAID5 array is reshaped to RAID0, the updating of queue parameters (e.g. max number of sectors per bio) is done in the wrong place. It should be part of ->run, but it is actually part of ->takeover. This means it happens before level_store() calls: blk_set_stacking_limits(&mddev->queue->limits); and so it ineffective. This can lead to errors from underlying devices. So move all the relevant settings out of create_stripe_zones() and into raid0_run(). As this can lead to a bug-on it is suitable for any -stable kernel which supports reshape to RAID0. So 2.6.35 or later. As the bug has been present for five years there is no urgency, so no need to rush into -stable. Fixes: 9af204cf720c ("md: Add support for Raid5->Raid0 and Raid10->Raid0 takeover") Reported-by: Yi Zhang Signed-off-by: NeilBrown [lizf: Backported to 3.4: - adjust context - remove changes to discard and write-same features] Signed-off-by: Zefan Li commit 0f9ae878be5d28f53b0e3aad9b217efc65b6fca1 Author: Chuck Lever Date: Thu Jul 9 16:45:18 2015 -0400 svcrdma: Fix send_reply() scatter/gather set-up commit 9d11b51ce7c150a69e761e30518f294fc73d55ff upstream. The Linux NFS server returns garbage in the data payload of inline NFS/RDMA READ replies. These are READs of under 1000 bytes or so where the client has not provided either a reply chunk or a write list. The NFS server delivers the data payload for an NFS READ reply to the transport in an xdr_buf page list. If the NFS client did not provide a reply chunk or a write list, send_reply() is supposed to set up a separate sge for the page containing the READ data, and another sge for XDR padding if needed, then post all of the sges via a single SEND Work Request. The problem is send_reply() does not advance through the xdr_buf when setting up scatter/gather entries for SEND WR. It always calls dma_map_xdr with xdr_off set to zero. When there's more than one sge, dma_map_xdr() sets up the SEND sge's so they all point to the xdr_buf's head. The current Linux NFS/RDMA client always provides a reply chunk or a write list when performing an NFS READ over RDMA. Therefore, it does not exercise this particular case. The Linux server has never had to use more than one extra sge for building RPC/RDMA replies with a Linux client. However, an NFS/RDMA client _is_ allowed to send small NFS READs without setting up a write list or reply chunk. The NFS READ reply fits entirely within the inline reply buffer in this case. This is perhaps a more efficient way of performing NFS READs that the Linux NFS/RDMA client may some day adopt. Fixes: b432e6b3d9c1 ('svcrdma: Change DMA mapping logic to . . .') BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=285 Signed-off-by: Chuck Lever Signed-off-by: J. Bruce Fields [lizf: Backported to 3.4: adjust context] Signed-off-by: Zefan Li commit 1f4cd884c13085f41e34ffe809cfafcc99f8f2b1 Author: Thomas Huth Date: Fri Jul 17 12:46:58 2015 +0200 powerpc/rtas: Introduce rtas_get_sensor_fast() for IRQ handlers commit 1c2cb594441d02815d304cccec9742ff5c707495 upstream. The EPOW interrupt handler uses rtas_get_sensor(), which in turn uses rtas_busy_delay() to wait for RTAS becoming ready in case it is necessary. But rtas_busy_delay() is annotated with might_sleep() and thus may not be used by interrupts handlers like the EPOW handler! This leads to the following BUG when CONFIG_DEBUG_ATOMIC_SLEEP is enabled: BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:496 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.2.0-rc2-thuth #6 Call Trace: [c00000007ffe7b90] [c000000000807670] dump_stack+0xa0/0xdc (unreliable) [c00000007ffe7bc0] [c0000000000e1f14] ___might_sleep+0x134/0x180 [c00000007ffe7c20] [c00000000002aec0] rtas_busy_delay+0x30/0xd0 [c00000007ffe7c50] [c00000000002bde4] rtas_get_sensor+0x74/0xe0 [c00000007ffe7ce0] [c000000000083264] ras_epow_interrupt+0x44/0x450 [c00000007ffe7d90] [c000000000120260] handle_irq_event_percpu+0xa0/0x300 [c00000007ffe7e70] [c000000000120524] handle_irq_event+0x64/0xc0 [c00000007ffe7eb0] [c000000000124dbc] handle_fasteoi_irq+0xec/0x260 [c00000007ffe7ef0] [c00000000011f4f0] generic_handle_irq+0x50/0x80 [c00000007ffe7f20] [c000000000010f3c] __do_irq+0x8c/0x200 [c00000007ffe7f90] [c0000000000236cc] call_do_irq+0x14/0x24 [c00000007e6f39e0] [c000000000011144] do_IRQ+0x94/0x110 [c00000007e6f3a30] [c000000000002594] hardware_interrupt_common+0x114/0x180 Fix this issue by introducing a new rtas_get_sensor_fast() function that does not use rtas_busy_delay() - and thus can only be used for sensors that do not cause a BUSY condition - known as "fast" sensors. The EPOW sensor is defined to be "fast" in sPAPR - mpe. Fixes: 587f83e8dd50 ("powerpc/pseries: Use rtas_get_sensor in RAS code") Signed-off-by: Thomas Huth Reviewed-by: Nathan Fontenot Signed-off-by: Michael Ellerman Signed-off-by: Zefan Li commit f5289af2c594adccb54d09258a8517c50d6b2d84 Author: Mark Rustad Date: Mon Jul 13 11:40:07 2015 -0700 PCI: Add VPD function 0 quirk for Intel Ethernet devices commit 7aa6ca4d39edf01f997b9e02cf6d2fdeb224f351 upstream. Set the PCI_DEV_FLAGS_VPD_REF_F0 flag on all Intel Ethernet device functions other than function 0, so that on multi-function devices, we will always read VPD from function 0 instead of from the other functions. [bhelgaas: changelog] Signed-off-by: Mark Rustad Signed-off-by: Bjorn Helgaas Acked-by: Alexander Duyck [bwh: Backported to 3.2: - Put the class check in the new function as there is no DECLARE_PCI_FIXUP_CLASS_EARLY( - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Zefan Li commit dc7e2fb68c6c0c268bfb264846d54dc8bf273a56 Author: Mark Rustad Date: Mon Jul 13 11:40:02 2015 -0700 PCI: Add dev_flags bit to access VPD through function 0 commit 932c435caba8a2ce473a91753bad0173269ef334 upstream. Add a dev_flags bit, PCI_DEV_FLAGS_VPD_REF_F0, to access VPD through function 0 to provide VPD access on other functions. This is for hardware devices that provide copies of the same VPD capability registers in multiple functions. Because the kernel expects that each function has its own registers, both the locking and the state tracking are affected by VPD accesses to different functions. On such devices for example, if a VPD write is performed on function 0, *any* later attempt to read VPD from any other function of that device will hang. This has to do with how the kernel tracks the expected value of the F bit per function. Concurrent accesses to different functions of the same device can not only hang but also corrupt both read and write VPD data. When hangs occur, typically the error message: vpd r/w failed. This is likely a firmware bug on this device. will be seen. Never set this bit on function 0 or there will be an infinite recursion. Signed-off-by: Mark Rustad Signed-off-by: Bjorn Helgaas Acked-by: Alexander Duyck Signed-off-by: Zefan Li commit c02b085fafa2f8a6a9c447c4ac472552fabf3140 Author: Bob Copeland Date: Sat Jun 13 10:16:31 2015 -0400 mac80211: enable assoc check for mesh interfaces commit 3633ebebab2bbe88124388b7620442315c968e8f upstream. We already set a station to be associated when peering completes, both in user space and in the kernel. Thus we should always have an associated sta before sending data frames to that station. Failure to check assoc state can cause crashes in the lower-level driver due to transmitting unicast data frames before driver sta structures (e.g. ampdu state in ath9k) are initialized. This occurred when forwarding in the presence of fixed mesh paths: frames were transmitted to stations with whom we hadn't yet completed peering. Reported-by: Alexis Green Tested-by: Jesse Jones Signed-off-by: Bob Copeland Signed-off-by: Johannes Berg Signed-off-by: Zefan Li commit b9ed7f2f1893f30e42d2f7745a5d4ddec430eea4 Author: Bjorn Helgaas Date: Fri Jun 19 15:58:24 2015 -0500 PCI: Fix TI816X class code quirk commit d1541dc977d376406f4584d8eb055488655c98ec upstream. In fixup_ti816x_class(), we assigned "class = PCI_CLASS_MULTIMEDIA_VIDEO". But PCI_CLASS_MULTIMEDIA_VIDEO is only the two-byte base class/sub-class and needs to be shifted to make space for the low-order interface byte. Shift PCI_CLASS_MULTIMEDIA_VIDEO to set the correct class code. Fixes: 63c4408074cb ("PCI: Add quirk for setting valid class for TI816X Endpoint") Signed-off-by: Bjorn Helgaas CC: Hemant Pedanekar Signed-off-by: Zefan Li commit c2b5a66d7651c4ae1fa7ad30fc1ceff02a50d37d Author: David Härdeman Date: Tue May 19 19:03:12 2015 -0300 rc-core: fix remove uevent generation commit a66b0c41ad277ae62a3ae6ac430a71882f899557 upstream. The input_dev is already gone when the rc device is being unregistered so checking for its presence only means that no remove uevent will be generated. Signed-off-by: David Härdeman Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Zefan Li