commit aa4cd140bba57b7064b4c7a7141bebd336d32087 Author: Greg Kroah-Hartman Date: Mon Sep 30 16:23:56 2024 +0200 Linux 6.1.112 Link: https://lore.kernel.org/r/20240927121719.897851549@linuxfoundation.org Tested-by: Peter Schneider Tested-by: Allen Pais Tested-by: Jon Hunter Tested-by: Florian Fainelli Tested-by: Salvatore Bonaccorso Tested-by: Linux Kernel Functional Testing Tested-by: Shuah Khan Tested-by: Ron Economos Tested-by: kernelci.org bot Tested-by: Pavel Machek (CIP) Signed-off-by: Greg Kroah-Hartman commit ba6269e187aa1b1f20faf3c458831a0d6350304b Author: Edward Adam Davis Date: Sun Sep 8 17:17:41 2024 +0800 USB: usbtmc: prevent kernel-usb-infoleak commit 625fa77151f00c1bd00d34d60d6f2e710b3f9aad upstream. The syzbot reported a kernel-usb-infoleak in usbtmc_write, we need to clear the structure before filling fields. Fixes: 4ddc645f40e9 ("usb: usbtmc: Add ioctl for vendor specific write") Reported-and-tested-by: syzbot+9d34f80f841e948c3fdb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9d34f80f841e948c3fdb Signed-off-by: Edward Adam Davis Cc: stable Link: https://lore.kernel.org/r/tencent_9649AA6EC56EDECCA8A7D106C792D1C66B06@qq.com Signed-off-by: Greg Kroah-Hartman commit c74796ff4fbc501b305600a4430b8f7a55edee40 Author: Junhao Xie Date: Tue Sep 3 23:06:38 2024 +0800 USB: serial: pl2303: add device id for Macrosilicon MS3020 commit 7d47d22444bb7dc1b6d768904a22070ef35e1fc0 upstream. Add the device id for the Macrosilicon MS3020 which is a PL2303HXN based device. Signed-off-by: Junhao Xie Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman commit a20eea14a6ad9a1b9b607f7cabcd9981df72f4f0 Author: Tony Luck Date: Wed Apr 24 11:15:18 2024 -0700 x86/mm: Switch to new Intel CPU model defines commit 2eda374e883ad297bd9fe575a16c1dc850346075 upstream. New CPU #defines encode vendor and family as well as model. [ dhansen: vertically align 0's in invlpg_miss_ids[] ] Signed-off-by: Tony Luck Signed-off-by: Dave Hansen Signed-off-by: Borislav Petkov (AMD) Link: https://lore.kernel.org/all/20240424181518.41946-1-tony.luck%40intel.com [ Ricardo: I used the old match macro X86_MATCH_INTEL_FAM6_MODEL() instead of X86_MATCH_VFM() as in the upstream commit. I also kept the ALDERLAKE_N name instead of ATOM_GRACEMONT. Both refer to the same CPU model. ] Signed-off-by: Ricardo Neri Reviewed-by: Pawan Gupta Signed-off-by: Greg Kroah-Hartman commit ee8adcb4c0f5c947a5b1ec0ef3b2bd84ecbbd6d7 Author: Sumeet Pawnikar Date: Thu Jun 8 08:00:06 2023 +0530 powercap: RAPL: fix invalid initialization for pl4_supported field commit d05b5e0baf424c8c4b4709ac11f66ab726c8deaf upstream. The current initialization of the struct x86_cpu_id via pl4_support_ids[] is partial and wrong. It is initializing "stepping" field with "X86_FEATURE_ANY" instead of "feature" field. Use X86_MATCH_INTEL_FAM6_MODEL macro instead of initializing each field of the struct x86_cpu_id for pl4_supported list of CPUs. This X86_MATCH_INTEL_FAM6_MODEL macro internally uses another macro X86_MATCH_VENDOR_FAM_MODEL_FEATURE for X86 based CPU matching with appropriate initialized values. Reported-by: Dave Hansen Link: https://lore.kernel.org/lkml/28ead36b-2d9e-1a36-6f4e-04684e420260@intel.com Fixes: eb52bc2ae5b8 ("powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC") Fixes: b08b95cf30f5 ("powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P") Fixes: 515755906921 ("powercap: RAPL: Add Power Limit4 support for RaptorLake") Fixes: 1cc5b9a411e4 ("powercap: Add Power Limit4 support for Alder Lake SoC") Fixes: 8365a898fe53 ("powercap: Add Power Limit4 support") Signed-off-by: Sumeet Pawnikar Signed-off-by: Rafael J. Wysocki Reviewed-by: Pawan Gupta [ Ricardo: I removed METEORLAKE and METEORLAKE_L from pl4_support_ids as they are not included in v6.1. ] Signed-off-by: Ricardo Neri Signed-off-by: Greg Kroah-Hartman commit 563df8b411dea5adcf22955fb189c9d152dcf4cc Author: Filipe Manana Date: Tue Mar 21 11:13:59 2023 +0000 btrfs: calculate the right space for delayed refs when updating global reserve commit f8f210dc84709804c9f952297f2bfafa6ea6b4bd upstream. When updating the global block reserve, we account for the 6 items needed by an unlink operation and the 6 delayed references for each one of those items. However the calculation for the delayed references is not correct in case we have the free space tree enabled, as in that case we need to touch the free space tree as well and therefore need twice the number of bytes. So use the btrfs_calc_delayed_ref_bytes() helper to calculate the number of bytes need for the delayed references at btrfs_update_global_block_rsv(). Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Reviewed-by: David Sterba Signed-off-by: David Sterba [Diogo: this patch has been cherry-picked from the original commit; conflicts included lack of a define (picked from commit 5630e2bcfe223) and lack of btrfs_calc_delayed_ref_bytes (picked from commit 0e55a54502b97) - changed const struct -> struct for compatibility.] Signed-off-by: Diogo Jahchan Koike Signed-off-by: Greg Kroah-Hartman commit 2626cbee1f5d0ffd551be48766ad9f86388f528f Author: Matthieu Baerts (NGI0) Date: Tue Sep 10 21:06:36 2024 +0200 selftests: mptcp: join: restrict fullmesh endp on 1st sf commit 49ac6f05ace5bb0070c68a0193aa05d3c25d4c83 upstream. A new endpoint using the IP of the initial subflow has been recently added to increase the code coverage. But it breaks the test when using old kernels not having commit 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk"), e.g. on v5.15. Similar to commit d4c81bbb8600 ("selftests: mptcp: join: support local endpoint being tracked or not"), it is possible to add the new endpoint conditionally, by checking if "mptcp_pm_subflow_check_next" is present in kallsyms: this is not directly linked to the commit introducing this symbol but for the parent one which is linked anyway. So we can know in advance what will be the expected behaviour, and add the new endpoint only when it makes sense to do so. Fixes: 4878f9f8421f ("selftests: mptcp: join: validate fullmesh endp on 1st sf") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) Link: https://patch.msgid.link/20240910-net-selftests-mptcp-fix-install-v1-1-8f124aa9156d@kernel.org Signed-off-by: Jakub Kicinski [ Conflicts in mptcp_join.sh, because the 'run_tests' helper has been modified in multiple commits that are not in this version, e.g. commit e571fb09c893 ("selftests: mptcp: add speed env var"). The conflict was in the context, the new lines can still be added at the same place. ] Signed-off-by: Matthieu Baerts (NGI0) Signed-off-by: Greg Kroah-Hartman commit 0ba8b599c30ca3a9a71eee36b89d3e6e96ad94c3 Author: Marc Kleine-Budde Date: Wed Jan 11 12:10:04 2023 +0100 can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop() commit a7801540f325d104de5065850a003f1d9bdc6ad3 upstream. The mcp251xfd wakes up from Low Power or Sleep Mode when SPI activity is detected. To avoid this, make sure that the timestamp worker is stopped before shutting down the chip. Split the starting of the timestamp worker out of mcp251xfd_timestamp_init() into the separate function mcp251xfd_timestamp_start(). Call mcp251xfd_timestamp_init() before mcp251xfd_chip_start(), move mcp251xfd_timestamp_start() to mcp251xfd_chip_start(). In this way, mcp251xfd_timestamp_stop() can be called unconditionally by mcp251xfd_chip_stop(). Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit 88047c4b2d3e1831e375204ed7e8712bac4a42b5 Author: Marc Kleine-Budde Date: Thu Apr 25 10:14:45 2024 +0200 can: mcp251xfd: properly indent labels commit 51b2a721612236335ddec4f3fb5f59e72a204f3a upstream. To fix the coding style, remove the whitespace in front of labels. Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman commit 672c19165fc96dfad531a5458e0b3cdab414aae4 Author: Hagar Hemdan Date: Thu May 23 08:53:32 2024 +0000 gpio: prevent potential speculation leaks in gpio_device_get_desc() commit d795848ecce24a75dfd46481aee066ae6fe39775 upstream. Userspace may trigger a speculative read of an address outside the gpio descriptor array. Users can do that by calling gpio_ioctl() with an offset out of range. Offset is copied from user and then used as an array index to get the gpio descriptor without sanitization in gpio_device_get_desc(). This change ensures that the offset is sanitized by using array_index_nospec() to mitigate any possibility of speculative information leaks. This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc. Signed-off-by: Hagar Hemdan Link: https://lore.kernel.org/r/20240523085332.1801-1-hagarhem@amazon.com Signed-off-by: Bartosz Golaszewski Signed-off-by: Hugo SIMELIERE Signed-off-by: Greg Kroah-Hartman commit 5c3a421c1f3be1ba84b4c0fdfeb6d9cf52b096e1 Author: Kent Gibson Date: Wed Jun 26 13:29:23 2024 +0800 gpiolib: cdev: Ignore reconfiguration without direction commit b440396387418fe2feaacd41ca16080e7a8bc9ad upstream. linereq_set_config() behaves badly when direction is not set. The configuration validation is borrowed from linereq_create(), where, to verify the intent of the user, the direction must be set to in order to effect a change to the electrical configuration of a line. But, when applied to reconfiguration, that validation does not allow for the unset direction case, making it possible to clear flags set previously without specifying the line direction. Adding to the inconsistency, those changes are not immediately applied by linereq_set_config(), but will take effect when the line value is next get or set. For example, by requesting a configuration with no flags set, an output line with GPIO_V2_LINE_FLAG_ACTIVE_LOW and GPIO_V2_LINE_FLAG_OPEN_DRAIN set could have those flags cleared, inverting the sense of the line and changing the line drive to push-pull on the next line value set. Skip the reconfiguration of lines for which the direction is not set, and only reconfigure the lines for which direction is set. Fixes: a54756cb24ea ("gpiolib: cdev: support GPIO_V2_LINE_SET_CONFIG_IOCTL") Signed-off-by: Kent Gibson Link: https://lore.kernel.org/r/20240626052925.174272-3-warthog618@gmail.com Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman commit e388656a850ec168ded1ec0d47a4bda76f9ce5c0 Author: Ping-Ke Shih Date: Thu Sep 26 08:30:17 2024 +0800 Revert "wifi: cfg80211: check wiphy mutex is held for wdev mutex" This reverts commit 19d13ec00a8b1d60c5cc06bd0006b91d5bd8d46f which is commmit 1474bc87fe57deac726cc10203f73daa6c3212f7 upstream. The reverted commit is based on implementation of wiphy locking that isn't planned to redo on a stable kernel, so revert it to avoid warning: WARNING: CPU: 0 PID: 9 at net/wireless/core.h:231 disconnect_work+0xb8/0x144 [cfg80211] CPU: 0 PID: 9 Comm: kworker/0:1 Not tainted 6.6.51-00141-ga1649b6f8ed6 #7 Hardware name: Freescale i.MX6 SoloX (Device Tree) Workqueue: events disconnect_work [cfg80211] unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from __warn+0x70/0x1c0 __warn from warn_slowpath_fmt+0x16c/0x294 warn_slowpath_fmt from disconnect_work+0xb8/0x144 [cfg80211] disconnect_work [cfg80211] from process_one_work+0x204/0x620 process_one_work from worker_thread+0x1b0/0x474 worker_thread from kthread+0x10c/0x12c kthread from ret_from_fork+0x14/0x24 Reported-by: petter@technux.se Closes: https://lore.kernel.org/linux-wireless/9e98937d781c990615ef27ee0c858ff9@technux.se/T/#t Cc: Johannes Berg Signed-off-by: Ping-Ke Shih Signed-off-by: Greg Kroah-Hartman commit ddeead4761c6c943f7616d12e160b281ef542a75 Author: Pablo Neira Ayuso Date: Tue Sep 17 22:25:04 2024 +0200 netfilter: nf_tables: missing iterator type in lookup walk commit efefd4f00c967d00ad7abe092554ffbb70c1a793 upstream. Add missing decorator type to lookup expression and tighten WARN_ON_ONCE check in pipapo to spot earlier that this is unset. Fixes: 29b359cf6d95 ("netfilter: nft_set_pipapo: walk over current view on netlink dump") Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 52735a010f37580b3a569a996f878fdd87425650 Author: Pablo Neira Ayuso Date: Tue Sep 17 22:25:03 2024 +0200 netfilter: nft_set_pipapo: walk over current view on netlink dump commit 29b359cf6d95fd60730533f7f10464e95bd17c73 upstream. The generation mask can be updated while netlink dump is in progress. The pipapo set backend walk iterator cannot rely on it to infer what view of the datastructure is to be used. Add notation to specify if user wants to read/update the set. Based on patch from Florian Westphal. Fixes: 2b84e215f874 ("netfilter: nft_set_pipapo: .walk does not deal with generations") Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 8a64f87e746044e0016cc3321ae0e87a1d2b593b Author: Dan Carpenter Date: Sat Sep 14 12:56:51 2024 +0300 netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level() commit 7052622fccb1efb850c6b55de477f65d03525a30 upstream. The cgroup_get_from_path() function never returns NULL, it returns error pointers. Update the error handling to match. Fixes: 7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces") Signed-off-by: Dan Carpenter Acked-by: Florian Westphal Acked-by: Pablo Neira Ayuso Link: https://patch.msgid.link/bbc0c4e0-05cc-4f44-8797-2f4b3920a820@stanley.mountain Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit ace0db36b4a1db07a48517c4f04488d1cd05e5f5 Author: Florian Westphal Date: Sat Sep 7 16:07:49 2024 +0200 netfilter: nft_socket: make cgroupsv2 matching work with namespaces commit 7f3287db654395f9c5ddd246325ff7889f550286 upstream. When running in container environmment, /sys/fs/cgroup/ might not be the real root node of the sk-attached cgroup. Example: In container: % stat /sys//fs/cgroup/ Device: 0,21 Inode: 2214 .. % stat /sys/fs/cgroup/foo Device: 0,21 Inode: 2264 .. The expectation would be for: nft add rule .. socket cgroupv2 level 1 "foo" counter to match traffic from a process that got added to "foo" via "echo $pid > /sys/fs/cgroup/foo/cgroup.procs". However, 'level 3' is needed to make this work. Seen from initial namespace, the complete hierarchy is: % stat /sys/fs/cgroup/system.slice/docker-.../foo Device: 0,21 Inode: 2264 .. i.e. hierarchy is 0 1 2 3 / -> system.slice -> docker-1... -> foo ... but the container doesn't know that its "/" is the "docker-1.." cgroup. Current code will retrieve the 'system.slice' cgroup node and store its kn->id in the destination register, so compare with 2264 ("foo" cgroup id) will not match. Fetch "/" cgroup from ->init() and add its level to the level we try to extract. cgroup root-level is 0 for the init-namespace or the level of the ancestor that is exposed as the cgroup root inside the container. In the above case, cgrp->level of "/" resolved in the container is 2 (docker-1...scope/) and request for 'level 1' will get adjusted to fetch the actual level (3). v2: use CONFIG_SOCK_CGROUP_DATA, eval function depends on it. (kernel test robot) Cc: cgroups@vger.kernel.org Fixes: e0bb96db96f8 ("netfilter: nft_socket: add support for cgroupsv2") Reported-by: Nadia Pinaeva Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 5899daf1d88a8932a7ea9b32544aa28acefe551f Author: Dave Chinner Date: Wed Sep 25 17:57:05 2024 -0700 xfs: journal geometry is not properly bounds checked [ Upstream commit f1e1765aad7de7a8b8102044fc6a44684bc36180 ] If the journal geometry results in a sector or log stripe unit validation problem, it indicates that we cannot set the log up to safely write to the the journal. In these cases, we must abort the mount because the corruption needs external intervention to resolve. Similarly, a journal that is too large cannot be written to safely, either, so we shouldn't allow those geometries to mount, either. If the log is too small, we risk having transaction reservations overruning the available log space and the system hanging waiting for space it can never provide. This is purely a runtime hang issue, not a corruption issue as per the first cases listed above. We abort mounts of the log is too small for V5 filesystems, but we must allow v4 filesystems to mount because, historically, there was no log size validity checking and so some systems may still be out there with undersized logs. The problem is that on V4 filesystems, when we discover a log geometry problem, we skip all the remaining checks and then allow the log to continue mounting. This mean that if one of the log size checks fails, we skip the log stripe unit check. i.e. we allow the mount because a "non-fatal" geometry is violated, and then fail to check the hard fail geometries that should fail the mount. Move all these fatal checks to the superblock verifier, and add a new check for the two log sector size geometry variables having the same values. This will prevent any attempt to mount a log that has invalid or inconsistent geometries long before we attempt to mount the log. However, for the minimum log size checks, we can only do that once we've setup up the log and calculated all the iclog sizes and roundoffs. Hence this needs to remain in the log mount code after the log has been initialised. It is also the only case where we should allow a v4 filesystem to continue running, so leave that handling in place, too. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Signed-off-by: Greg Kroah-Hartman commit 68e6efe0d4f1558e3bba98e9149510730a5f8b77 Author: Darrick J. Wong Date: Tue Sep 24 11:38:51 2024 -0700 xfs: set bnobt/cntbt numrecs correctly when formatting new AGs [ Upstream commit 8e698ee72c4ecbbf18264568eb310875839fd601 ] Through generic/300, I discovered that mkfs.xfs creates corrupt filesystems when given these parameters: Filesystems formatted with --unsupported are not supported!! meta-data=/dev/sda isize=512 agcount=8, agsize=16352 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=1 = reflink=1 bigtime=1 inobtcount=1 nrext64=1 data = bsize=4096 blocks=130816, imaxpct=25 = sunit=32 swidth=128 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=8192, version=2 = sectsz=512 sunit=32 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 = rgcount=0 rgsize=0 blks Discarding blocks...Done. Phase 1 - find and verify superblock... - reporting progress in intervals of 15 minutes Phase 2 - using internal log - zero log... - 16:30:50: zeroing log - 16320 of 16320 blocks done - scan filesystem freespace and inode maps... agf_freeblks 25, counted 0 in ag 4 sb_fdblocks 8823, counted 8798 The root cause of this problem is the numrecs handling in xfs_freesp_init_recs, which is used to initialize a new AG. Prior to calling the function, we set up the new bnobt block with numrecs == 1 and rely on _freesp_init_recs to format that new record. If the last record created has a blockcount of zero, then it sets numrecs = 0. That last bit isn't correct if the AG contains the log, the start of the log is not immediately after the initial blocks due to stripe alignment, and the end of the log is perfectly aligned with the end of the AG. For this case, we actually formatted a single bnobt record to handle the free space before the start of the (stripe aligned) log, and incremented arec to try to format a second record. That second record turned out to be unnecessary, so what we really want is to leave numrecs at 1. The numrecs handling itself is overly complicated because a different function sets numrecs == 1. Change the bnobt creation code to start with numrecs set to zero and only increment it after successfully formatting a free space extent into the btree block. Fixes: f327a00745ff ("xfs: account for log space when formatting new AGs") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit af871df651586cde8ef790cc9edbb7edaa2f57a3 Author: Darrick J. Wong Date: Tue Sep 24 11:38:50 2024 -0700 xfs: fix reloading entire unlinked bucket lists [ Upstream commit 537c013b140d373d1ffe6290b841dc00e67effaa ] During review of the patcheset that provided reloading of the incore iunlink list, Dave made a few suggestions, and I updated the copy in my dev tree. Unfortunately, I then got distracted by ... who even knows what ... and forgot to backport those changes from my dev tree to my release candidate branch. I then sent multiple pull requests with stale patches, and that's what was merged into -rc3. So. This patch re-adds the use of an unlocked iunlink list check to determine if we want to allocate the resources to recreate the incore list. Since lost iunlinked inodes are supposed to be rare, this change helps us avoid paying the transaction and AGF locking costs every time we open any inode. This also re-adds the shutdowns on failure, and re-applies the restructuring of the inner loop in xfs_inode_reload_unlinked_bucket, and re-adds a requested comment about the quotachecking code. Retain the original RVB tag from Dave since there's no code change from the last submission. Fixes: 68b957f64fca1 ("xfs: load uncached unlinked inodes into memory on demand") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 62ca59104567391044ff13e9f85b318332ef476c Author: Darrick J. Wong Date: Tue Sep 24 11:38:49 2024 -0700 xfs: make inode unlinked bucket recovery work with quotacheck [ Upstream commit 49813a21ed57895b73ec4ed3b99d4beec931496f ] Teach quotacheck to reload the unlinked inode lists when walking the inode table. This requires extra state handling, since it's possible that a reloaded inode will get inactivated before quotacheck tries to scan it; in this case, we need to ensure that the reloaded inode does not have dquots attached when it is freed. Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit e9d1551f806920785e6978024d437727a617b954 Author: Darrick J. Wong Date: Tue Sep 24 11:38:48 2024 -0700 xfs: reload entire unlinked bucket lists [ Upstream commit 83771c50e42b92de6740a63e152c96c052d37736 ] The previous patch to reload unrecovered unlinked inodes when adding a newly created inode to the unlinked list is missing a key piece of functionality. It doesn't handle the case that someone calls xfs_iget on an inode that is not the last item in the incore list. For example, if at mount time the ondisk iunlink bucket looks like this: AGI -> 7 -> 22 -> 3 -> NULL None of these three inodes are cached in memory. Now let's say that someone tries to open inode 3 by handle. We need to walk the list to make sure that inodes 7 and 22 get loaded cold, and that the i_prev_unlinked of inode 3 gets set to 22. Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 8ffd3ae7a00828c1c8ca7d3f5eaff9a01c4fd957 Author: Darrick J. Wong Date: Tue Sep 24 11:38:47 2024 -0700 xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list [ Upstream commit f12b96683d6976a3a07fdf3323277c79dbe8f6ab ] Alter the definition of i_prev_unlinked slightly to make it more obvious when an inode with 0 link count is not part of the iunlink bucket lists rooted in the AGI. This distinction is necessary because it is not sufficient to check inode.i_nlink to decide if an inode is on the unlinked list. Updates to i_nlink can happen while holding only ILOCK_EXCL, but updates to an inode's position in the AGI unlinked list (which happen after the nlink update) requires both ILOCK_EXCL and the AGI buffer lock. The next few patches will make it possible to reload an entire unlinked bucket list when we're walking the inode table or performing handle operations and need more than the ability to iget the last inode in the chain. The upcoming directory repair code also needs to be able to make this distinction to decide if a zero link count directory should be moved to the orphanage or allowed to inactivate. An upcoming enhancement to the online AGI fsck code will need this distinction to check and rebuild the AGI unlinked buckets. Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 8e2147f37f470e556285c2de359a2d3dd340b0bb Author: Shiyang Ruan Date: Tue Sep 24 11:38:46 2024 -0700 xfs: correct calculation for agend and blockcount [ Upstream commit 3c90c01e49342b166e5c90ec2c85b220be15a20e ] The agend should be "start + length - 1", then, blockcount should be "end + 1 - start". Correct 2 calculation mistakes. Also, rename "agend" to "range_agend" because it's not the end of the AG per se; it's the end of the dead region within an AG's agblock space. Fixes: 5cf32f63b0f4 ("xfs: fix the calculation for "end" and "length"") Signed-off-by: Shiyang Ruan Reviewed-by: "Darrick J. Wong" Signed-off-by: Chandan Babu R Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit d931b6c6a9858e9bb229419128454f9538d4f140 Author: Dave Chinner Date: Tue Sep 24 11:38:45 2024 -0700 xfs: fix unlink vs cluster buffer instantiation race [ Upstream commit 348a1983cf4cf5099fc398438a968443af4c9f65 ] Luis has been reporting an assert failure when freeing an inode cluster during inode inactivation for a while. The assert looks like: XFS: Assertion failed: bp->b_flags & XBF_DONE, file: fs/xfs/xfs_trans_buf.c, line: 241 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:102! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 4 PID: 73 Comm: kworker/4:1 Not tainted 6.10.0-rc1 #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 Workqueue: xfs-inodegc/loop5 xfs_inodegc_worker [xfs] RIP: 0010:assfail (fs/xfs/xfs_message.c:102) xfs RSP: 0018:ffff88810188f7f0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88816e748250 RCX: 1ffffffff844b0e7 RDX: 0000000000000004 RSI: ffff88810188f558 RDI: ffffffffc2431fa0 RBP: 1ffff11020311f01 R08: 0000000042431f9f R09: ffffed1020311e9b R10: ffff88810188f4df R11: ffffffffac725d70 R12: ffff88817a3f4000 R13: ffff88812182f000 R14: ffff88810188f998 R15: ffffffffc2423f80 FS: 0000000000000000(0000) GS:ffff8881c8400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055fe9d0f109c CR3: 000000014426c002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:241 (discriminator 1)) xfs xfs_imap_to_bp (fs/xfs/xfs_trans.h:210 fs/xfs/libxfs/xfs_inode_buf.c:138) xfs xfs_inode_item_precommit (fs/xfs/xfs_inode_item.c:145) xfs xfs_trans_run_precommits (fs/xfs/xfs_trans.c:931) xfs __xfs_trans_commit (fs/xfs/xfs_trans.c:966) xfs xfs_inactive_ifree (fs/xfs/xfs_inode.c:1811) xfs xfs_inactive (fs/xfs/xfs_inode.c:2013) xfs xfs_inodegc_worker (fs/xfs/xfs_icache.c:1841 fs/xfs/xfs_icache.c:1886) xfs process_one_work (kernel/workqueue.c:3231) worker_thread (kernel/workqueue.c:3306 (discriminator 2) kernel/workqueue.c:3393 (discriminator 2)) kthread (kernel/kthread.c:389) ret_from_fork (arch/x86/kernel/process.c:147) ret_from_fork_asm (arch/x86/entry/entry_64.S:257) And occurs when the the inode precommit handlers is attempt to look up the inode cluster buffer to attach the inode for writeback. The trail of logic that I can reconstruct is as follows. 1. the inode is clean when inodegc runs, so it is not attached to a cluster buffer when precommit runs. 2. #1 implies the inode cluster buffer may be clean and not pinned by dirty inodes when inodegc runs. 3. #2 implies that the inode cluster buffer can be reclaimed by memory pressure at any time. 4. The assert failure implies that the cluster buffer was attached to the transaction, but not marked done. It had been accessed earlier in the transaction, but not marked done. 5. #4 implies the cluster buffer has been invalidated (i.e. marked stale). 6. #5 implies that the inode cluster buffer was instantiated uninitialised in the transaction in xfs_ifree_cluster(), which only instantiates the buffers to invalidate them and never marks them as done. Given factors 1-3, this issue is highly dependent on timing and environmental factors. Hence the issue can be very difficult to reproduce in some situations, but highly reliable in others. Luis has an environment where it can be reproduced easily by g/531 but, OTOH, I've reproduced it only once in ~2000 cycles of g/531. I think the fix is to have xfs_ifree_cluster() set the XBF_DONE flag on the cluster buffers, even though they may not be initialised. The reasons why I think this is safe are: 1. A buffer cache lookup hit on a XBF_STALE buffer will clear the XBF_DONE flag. Hence all future users of the buffer know they have to re-initialise the contents before use and mark it done themselves. 2. xfs_trans_binval() sets the XFS_BLI_STALE flag, which means the buffer remains locked until the journal commit completes and the buffer is unpinned. Hence once marked XBF_STALE/XFS_BLI_STALE by xfs_ifree_cluster(), the only context that can access the freed buffer is the currently running transaction. 3. #2 implies that future buffer lookups in the currently running transaction will hit the transaction match code and not the buffer cache. Hence XBF_STALE and XFS_BLI_STALE will not be cleared unless the transaction initialises and logs the buffer with valid contents again. At which point, the buffer will be marked marked XBF_DONE again, so having XBF_DONE already set on the stale buffer is a moot point. 4. #2 also implies that any concurrent access to that cluster buffer will block waiting on the buffer lock until the inode cluster has been fully freed and is no longer an active inode cluster buffer. 5. #4 + #1 means that any future user of the disk range of that buffer will always see the range of disk blocks covered by the cluster buffer as not done, and hence must initialise the contents themselves. 6. Setting XBF_DONE in xfs_ifree_cluster() then means the unlinked inode precommit code will see a XBF_DONE buffer from the transaction match as it expects. It can then attach the stale but newly dirtied inode to the stale but newly dirtied cluster buffer without unexpected failures. The stale buffer will then sail through the journal and do the right thing with the attached stale inode during unpin. Hence the fix is just one line of extra code. The explanation of why we have to set XBF_DONE in xfs_ifree_cluster, OTOH, is long and complex.... Fixes: 82842fee6e59 ("xfs: fix AGF vs inode cluster buffer deadlock") Signed-off-by: Dave Chinner Tested-by: Luis Chamberlain Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Chandan Babu R Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 1486aeb788370260f8ed6c53abe233cc2d74dd53 Author: Darrick J. Wong Date: Tue Sep 24 11:38:44 2024 -0700 xfs: fix negative array access in xfs_getbmap [ Upstream commit 1bba82fe1afac69c85c1f5ea137c8e73de3c8032 ] In commit 8ee81ed581ff, Ye Bin complained about an ASSERT in the bmapx code that trips if we encounter a delalloc extent after flushing the pagecache to disk. The ioctl code does not hold MMAPLOCK so it's entirely possible that a racing write page fault can create a delalloc extent after the file has been flushed. The proposed solution was to replace the assertion with an early return that avoids filling out the bmap recordset with a delalloc entry if the caller didn't ask for it. At the time, I recall thinking that the forward logic sounded ok, but felt hesitant because I suspected that changing this code would cause something /else/ to burst loose due to some other subtlety. syzbot of course found that subtlety. If all the extent mappings found after the flush are delalloc mappings, we'll reach the end of the data fork without ever incrementing bmv->bmv_entries. This is new, since before we'd have emitted the delalloc mappings even though the caller didn't ask for them. Once we reach the end, we'll try to set BMV_OF_LAST on the -1st entry (because bmv_entries is zero) and go corrupt something else in memory. Yay. I really dislike all these stupid patches that fiddle around with debug code and break things that otherwise worked well enough. Nobody was complaining that calling XFS_IOC_BMAPX without BMV_IF_DELALLOC would return BMV_OF_DELALLOC records, and now we've gone from "weird behavior that nobody cared about" to "bad behavior that must be addressed immediately". Maybe I'll just ignore anything from Huawei from now on for my own sake. Reported-by: syzbot+c103d3808a0de5faaf80@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-xfs/20230412024907.GP360889@frogsfrogsfrogs/ Fixes: 8ee81ed581ff ("xfs: fix BUG_ON in xfs_getbmap()") Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 4790c167cc662250114dc4ecdc5d9cedbae1fe01 Author: Darrick J. Wong Date: Tue Sep 24 11:38:43 2024 -0700 xfs: load uncached unlinked inodes into memory on demand [ Upstream commit 68b957f64fca1930164bfc6d6d379acdccd547d7 ] shrikanth hegde reports that filesystems fail shortly after mount with the following failure: WARNING: CPU: 56 PID: 12450 at fs/xfs/xfs_inode.c:1839 xfs_iunlink_lookup+0x58/0x80 [xfs] This of course is the WARN_ON_ONCE in xfs_iunlink_lookup: ip = radix_tree_lookup(&pag->pag_ici_root, agino); if (WARN_ON_ONCE(!ip || !ip->i_ino)) { ... } >From diagnostic data collected by the bug reporters, it would appear that we cleanly mounted a filesystem that contained unlinked inodes. Unlinked inodes are only processed as a final step of log recovery, which means that clean mounts do not process the unlinked list at all. Prior to the introduction of the incore unlinked lists, this wasn't a problem because the unlink code would (very expensively) traverse the entire ondisk metadata iunlink chain to keep things up to date. However, the incore unlinked list code complains when it realizes that it is out of sync with the ondisk metadata and shuts down the fs, which is bad. Ritesh proposed to solve this problem by unconditionally parsing the unlinked lists at mount time, but this imposes a mount time cost for every filesystem to catch something that should be very infrequent. Instead, let's target the places where we can encounter a next_unlinked pointer that refers to an inode that is not in cache, and load it into cache. Note: This patch does not address the problem of iget loading an inode from the middle of the iunlink list and needing to set i_prev_unlinked correctly. Reported-by: shrikanth hegde Triaged-by: Ritesh Harjani Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 0cc1922687896e30cefcad767959da29f379c302 Author: Shiyang Ruan Date: Tue Sep 24 11:38:42 2024 -0700 xfs: fix the calculation for "end" and "length" [ Upstream commit 5cf32f63b0f4c520460c1a5dd915dc4f09085f29 ] The value of "end" should be "start + length - 1". Signed-off-by: Shiyang Ruan Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 4427e3d36228f691b9a0e77b1f045ffdfa046ea2 Author: Dave Chinner Date: Tue Sep 24 11:38:41 2024 -0700 xfs: remove WARN when dquot cache insertion fails [ Upstream commit 4b827b3f305d1fcf837265f1e12acc22ee84327c ] It just creates unnecessary bot noise these days. Reported-by: syzbot+6ae213503fb12e87934f@syzkaller.appspotmail.com Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit e8c6533404243b8e32f227b980c8539a7d42ede0 Author: Long Li Date: Tue Sep 24 11:38:40 2024 -0700 xfs: fix ag count overflow during growfs [ Upstream commit c3b880acadc95d6e019eae5d669e072afda24f1b ] I found a corruption during growfs: XFS (loop0): Internal error agbno >= mp->m_sb.sb_agblocks at line 3661 of file fs/xfs/libxfs/xfs_alloc.c. Caller __xfs_free_extent+0x28e/0x3c0 CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257 Call Trace: dump_stack_lvl+0x50/0x70 xfs_corruption_error+0x134/0x150 __xfs_free_extent+0x2c1/0x3c0 xfs_ag_extend_space+0x291/0x3e0 xfs_growfs_data+0xd72/0xe90 xfs_file_ioctl+0x5f9/0x14a0 __x64_sys_ioctl+0x13e/0x1c0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd XFS (loop0): Corruption detected. Unmount and run xfs_repair XFS (loop0): Internal error xfs_trans_cancel at line 1097 of file fs/xfs/xfs_trans.c. Caller xfs_growfs_data+0x691/0xe90 CPU: 0 PID: 573 Comm: xfs_growfs Not tainted 6.3.0-rc7-next-20230420-00001-gda8c95746257 Call Trace: dump_stack_lvl+0x50/0x70 xfs_error_report+0x93/0xc0 xfs_trans_cancel+0x2c0/0x350 xfs_growfs_data+0x691/0xe90 xfs_file_ioctl+0x5f9/0x14a0 __x64_sys_ioctl+0x13e/0x1c0 do_syscall_64+0x39/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f2d86706577 The bug can be reproduced with the following sequence: # truncate -s 1073741824 xfs_test.img # mkfs.xfs -f -b size=1024 -d agcount=4 xfs_test.img # truncate -s 2305843009213693952 xfs_test.img # mount -o loop xfs_test.img /mnt/test # xfs_growfs -D 1125899907891200 /mnt/test The root cause is that during growfs, user space passed in a large value of newblcoks to xfs_growfs_data_private(), due to current sb_agblocks is too small, new AG count will exceed UINT_MAX. Because of AG number type is unsigned int and it would overflow, that caused nagcount much smaller than the actual value. During AG extent space, delta blocks in xfs_resizefs_init_new_ags() will much larger than the actual value due to incorrect nagcount, even exceed UINT_MAX. This will cause corruption and be detected in __xfs_free_extent. Fix it by growing the filesystem to up to the maximally allowed AGs and not return EINVAL when new AG count overflow. Signed-off-by: Long Li Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 02f44e7ff670e258bd09074cc4232bbf3b7bc421 Author: Dave Chinner Date: Tue Sep 24 11:38:39 2024 -0700 xfs: collect errors from inodegc for unlinked inode recovery [ Upstream commit d4d12c02bf5f768f1b423c7ae2909c5afdfe0d5f ] Unlinked list recovery requires errors removing the inode the from the unlinked list get fed back to the main recovery loop. Now that we offload the unlinking to the inodegc work, we don't get errors being fed back when we trip over a corruption that prevents the inode from being removed from the unlinked list. This means we never clear the corrupt unlinked list bucket, resulting in runtime operations eventually tripping over it and shutting down. Fix this by collecting inodegc worker errors and feed them back to the flush caller. This is largely best effort - the only context that really cares is log recovery, and it only flushes a single inode at a time so we don't need complex synchronised handling. Essentially the inodegc workers will capture the first error that occurs and the next flush will gather them and clear them. The flush itself will only report the first gathered error. In the cases where callers can return errors, propagate the collected inodegc flush error up the error handling chain. In the case of inode unlinked list recovery, there are several superfluous calls to flush queued unlinked inodes - xlog_recover_iunlink_bucket() guarantees that it has flushed the inodegc and collected errors before it returns. Hence nothing in the calling path needs to run a flush, even when an error is returned. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 65fc94fc8774146ab96d7c20f138cfab1a4db6f2 Author: Dave Chinner Date: Tue Sep 24 11:38:38 2024 -0700 xfs: fix AGF vs inode cluster buffer deadlock [ Upstream commit 82842fee6e5979ca7e2bf4d839ef890c22ffb7aa ] Lock order in XFS is AGI -> AGF, hence for operations involving inode unlinked list operations we always lock the AGI first. Inode unlinked list operations operate on the inode cluster buffer, so the lock order there is AGI -> inode cluster buffer. For O_TMPFILE operations, this now means the lock order set down in xfs_rename and xfs_link is AGI -> inode cluster buffer -> AGF as the unlinked ops are done before the directory modifications that may allocate space and lock the AGF. Unfortunately, we also now lock the inode cluster buffer when logging an inode so that we can attach the inode to the cluster buffer and pin it in memory. This creates a lock order of AGF -> inode cluster buffer in directory operations as we have to log the inode after we've allocated new space for it. This creates a lock inversion between the AGF and the inode cluster buffer. Because the inode cluster buffer is shared across multiple inodes, the inversion is not specific to individual inodes but can occur when inodes in the same cluster buffer are accessed in different orders. To fix this we need move all the inode log item cluster buffer interactions to the end of the current transaction. Unfortunately, xfs_trans_log_inode() calls are littered throughout the transactions with no thought to ordering against other items or locking. This makes it difficult to do anything that involves changing the call sites of xfs_trans_log_inode() to change locking orders. However, we do now have a mechanism that allows is to postpone dirty item processing to just before we commit the transaction: the ->iop_precommit method. This will be called after all the modifications are done and high level objects like AGI and AGF buffers have been locked and modified, thereby providing a mechanism that guarantees we don't lock the inode cluster buffer before those high level objects are locked. This change is largely moving the guts of xfs_trans_log_inode() to xfs_inode_item_precommit() and providing an extra flag context in the inode log item to track the dirty state of the inode in the current transaction. This also means we do a lot less repeated work in xfs_trans_log_inode() by only doing it once per transaction when all the work is done. Fixes: 298f7bec503f ("xfs: pin inode backing buffer to the inode log item") Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit b4aea9f9e0b2de5e1a2cefcaceea9f982f6c6edd Author: Dave Chinner Date: Tue Sep 24 11:38:37 2024 -0700 xfs: defered work could create precommits [ Upstream commit cb042117488dbf0b3b38b05771639890fada9a52 ] To fix a AGI-AGF-inode cluster buffer deadlock, we need to move inode cluster buffer operations to the ->iop_precommit() method. However, this means that deferred operations can require precommits to be run on the final transaction that the deferred ops pass back to xfs_trans_commit() context. This will be exposed by attribute handling, in that the last changes to the inode in the attr set state machine "disappear" because the precommit operation is not run. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 81274891036602355e72eabbb5184490f354b506 Author: Dave Chinner Date: Tue Sep 24 11:38:36 2024 -0700 xfs: buffer pins need to hold a buffer reference [ Upstream commit 89a4bf0dc3857569a77061d3d5ea2ac85f7e13c6 ] When a buffer is unpinned by xfs_buf_item_unpin(), we need to access the buffer after we've dropped the buffer log item reference count. This opens a window where we can have two racing unpins for the buffer item (e.g. shutdown checkpoint context callback processing racing with journal IO iclog completion processing) and both attempt to access the buffer after dropping the BLI reference count. If we are unlucky, the "BLI freed" context wins the race and frees the buffer before the "BLI still active" case checks the buffer pin count. This results in a use after free that can only be triggered in active filesystem shutdown situations. To fix this, we need to ensure that buffer existence extends beyond the BLI reference count checks and until the unpin processing is complete. This implies that a buffer pin operation must also take a buffer reference to ensure that the buffer cannot be freed until the buffer unpin processing is complete. Reported-by: yangerkun Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit cbf91ddb888d490c09cf90c51edbc9b1b0c263d9 Author: Ye Bin Date: Tue Sep 24 11:38:35 2024 -0700 xfs: fix BUG_ON in xfs_getbmap() [ Upstream commit 8ee81ed581ff35882b006a5205100db0b57bf070 ] There's issue as follows: XFS: Assertion failed: (bmv->bmv_iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c, line: 329 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:102! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 14612 Comm: xfs_io Not tainted 6.3.0-rc2-next-20230315-00006-g2729d23ddb3b-dirty #422 RIP: 0010:assfail+0x96/0xa0 RSP: 0018:ffffc9000fa178c0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff888179a18000 RDX: 0000000000000000 RSI: ffff888179a18000 RDI: 0000000000000002 RBP: 0000000000000000 R08: ffffffff8321aab6 R09: 0000000000000000 R10: 0000000000000001 R11: ffffed1105f85139 R12: ffffffff8aacc4c0 R13: 0000000000000149 R14: ffff888269f58000 R15: 000000000000000c FS: 00007f42f27a4740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000b92388 CR3: 000000024f006000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: xfs_getbmap+0x1a5b/0x1e40 xfs_ioc_getbmap+0x1fd/0x5b0 xfs_file_ioctl+0x2cb/0x1d50 __x64_sys_ioctl+0x197/0x210 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Above issue may happen as follows: ThreadA ThreadB do_shared_fault __do_fault xfs_filemap_fault __xfs_filemap_fault filemap_fault xfs_ioc_getbmap -> Without BMV_IF_DELALLOC flag xfs_getbmap xfs_ilock(ip, XFS_IOLOCK_SHARED); filemap_write_and_wait do_page_mkwrite xfs_filemap_page_mkwrite __xfs_filemap_fault xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); iomap_page_mkwrite ... xfs_buffered_write_iomap_begin xfs_bmapi_reserve_delalloc -> Allocate delay extent xfs_ilock_data_map_shared(ip) xfs_getbmap_report_one ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0) -> trigger BUG_ON As xfs_filemap_page_mkwrite() only hold XFS_MMAPLOCK_SHARED lock, there's small window mkwrite can produce delay extent after file write in xfs_getbmap(). To solve above issue, just skip delalloc extents. Signed-off-by: Ye Bin Reviewed-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit fcd6ff906db9132dead3e744419c97e448b0b8a3 Author: Dave Chinner Date: Tue Sep 24 11:38:34 2024 -0700 xfs: quotacheck failure can race with background inode inactivation [ Upstream commit 0c7273e494dd5121e20e160cb2f047a593ee14a8 ] The background inode inactivation can attached dquots to inodes, but this can race with a foreground quotacheck failure that leads to disabling quotas and freeing the mp->m_quotainfo structure. The background inode inactivation then tries to allocate a quota, tries to dereference mp->m_quotainfo, and crashes like so: XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas. xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff) BUG: kernel NULL pointer dereference, address: 00000000000002a8 .... CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker RIP: 0010:xfs_dquot_alloc+0x95/0x1e0 .... Call Trace: xfs_qm_dqread+0x46/0x440 xfs_qm_dqget_inode+0x154/0x500 xfs_qm_dqattach_one+0x142/0x3c0 xfs_qm_dqattach_locked+0x14a/0x170 xfs_qm_dqattach+0x52/0x80 xfs_inactive+0x186/0x340 xfs_inodegc_worker+0xd3/0x430 process_one_work+0x3b1/0x960 worker_thread+0x52/0x660 kthread+0x161/0x1a0 ret_from_fork+0x29/0x50 .... Prevent this race by flushing all the queued background inode inactivations pending before purging all the cached dquots when quotacheck fails. Reported-by: Pengfei Xu Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 120108df9236f276e8547bd09ecdacc26d484780 Author: Darrick J. Wong Date: Tue Sep 24 11:38:33 2024 -0700 xfs: fix uninitialized variable access [ Upstream commit 60b730a40c43fbcc034970d3e77eb0f25b8cc1cf ] If the end position of a GETFSMAP query overlaps an allocated space and we're using the free space info to generate fsmap info, the akeys information gets fed into the fsmap formatter with bad results. Zero-init the space. Reported-by: syzbot+090ae72d552e6bd93cfe@syzkaller.appspotmail.com Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit ce563912b084c6734a4fd9809fe1a0a462d2561c Author: Dave Chinner Date: Tue Sep 24 11:38:32 2024 -0700 xfs: block reservation too large for minleft allocation [ Upstream commit d5753847b216db0e553e8065aa825cfe497ad143 ] When we enter xfs_bmbt_alloc_block() without having first allocated a data extent (i.e. tp->t_firstblock == NULLFSBLOCK) because we are doing something like unwritten extent conversion, the transaction block reservation is used as the minleft value. This works for operations like unwritten extent conversion, but it assumes that the block reservation is only for a BMBT split. THis is not always true, and sometimes results in larger than necessary minleft values being set. We only actually need enough space for a btree split, something we already handle correctly in xfs_bmapi_write() via the xfs_bmapi_minleft() calculation. We should use xfs_bmapi_minleft() in xfs_bmbt_alloc_block() to calculate the number of blocks a BMBT split on this inode is going to require, not use the transaction block reservation that contains the maximum number of blocks this transaction may consume in it... Signed-off-by: Dave Chinner Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 0e3c9d69501782dabc7b29c708e7ba90c8b7285a Author: Dave Chinner Date: Tue Sep 24 11:38:31 2024 -0700 xfs: prefer free inodes at ENOSPC over chunk allocation [ Upstream commit f08f984c63e9980614ae3a0a574b31eaaef284b2 ] When an XFS filesystem has free inodes in chunks already allocated on disk, it will still allocate new inode chunks if the target AG has no free inodes in it. Normally, this is a good idea as it preserves locality of all the inodes in a given directory. However, at ENOSPC this can lead to using the last few remaining free filesystem blocks to allocate a new chunk when there are many, many free inodes that could be allocated without consuming free space. This results in speeding up the consumption of the last few blocks and inode create operations then returning ENOSPC when there free inodes available because we don't have enough block left in the filesystem for directory creation reservations to proceed. Hence when we are near ENOSPC, we should be attempting to preserve the remaining blocks for directory block allocation rather than using them for unnecessary inode chunk creation. This particular behaviour is exposed by xfs/294, when it drives to ENOSPC on empty file creation whilst there are still thousands of free inodes available for allocation in other AGs in the filesystem. Hence, when we are within 1% of ENOSPC, change the inode allocation behaviour to prefer to use existing free inodes over allocating new inode chunks, even though it results is poorer locality of the data set. It is more important for the allocations to be space efficient near ENOSPC than to have optimal locality for performance, so lets modify the inode AG selection code to reflect that fact. This allows generic/294 to not only pass with this allocator rework patchset, but to increase the number of post-ENOSPC empty inode allocations to from ~600 to ~9080 before we hit ENOSPC on the directory create transaction reservation. Signed-off-by: Dave Chinner Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit bb798c9128874cce32ad4b2eba1f403d7d800e2d Author: Dave Chinner Date: Tue Sep 24 11:38:30 2024 -0700 xfs: fix low space alloc deadlock [ Upstream commit 1dd0510f6d4b85616a36aabb9be38389467122d9 ] I've recently encountered an ABBA deadlock with g/476. The upcoming changes seem to make this much easier to hit, but the underlying problem is a pre-existing one. Essentially, if we select an AG for allocation, then lock the AGF and then fail to allocate for some reason (e.g. minimum length requirements cannot be satisfied), then we drop out of the allocation with the AGF still locked. The caller then modifies the allocation constraints - usually loosening them up - and tries again. This can result in trying to access AGFs that are lower than the AGF we already have locked from the failed attempt. e.g. the failed attempt skipped several AGs before failing, so we have locks an AG higher than the start AG. Retrying the allocation from the start AG then causes us to violate AGF lock ordering and this can lead to deadlocks. The deadlock exists even if allocation succeeds - we can do a followup allocations in the same transaction for BMBT blocks that aren't guaranteed to be in the same AG as the original, and can move into higher AGs. Hence we really need to move the tp->t_firstblock tracking down into xfs_alloc_vextent() where it can be set when we exit with a locked AG. xfs_alloc_vextent() can also check there if the requested allocation falls within the allow range of AGs set by tp->t_firstblock. If we can't allocate within the range set, we have to fail the allocation. If we are allowed to to non-blocking AGF locking, we can ignore the AG locking order limitations as we can use try-locks for the first iteration over requested AG range. This invalidates a set of post allocation asserts that check that the allocation is always above tp->t_firstblock if it is set. Because we can use try-locks to avoid the deadlock in some circumstances, having a pre-existing locked AGF doesn't always prevent allocation from lower order AGFs. Hence those ASSERTs need to be removed. Signed-off-by: Dave Chinner Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit cdbc02da9f9c713615d4f425aa3442f3b5915eea Author: Dave Chinner Date: Tue Sep 24 11:38:29 2024 -0700 xfs: don't use BMBT btree split workers for IO completion [ Upstream commit c85007e2e3942da1f9361e4b5a9388ea3a8dcc5b ] When we split a BMBT due to record insertion, we offload it to a worker thread because we can be deep in the stack when we try to allocate a new block for the BMBT. Allocation can use several kilobytes of stack (full memory reclaim, swap and/or IO path can end up on the stack during allocation) and we can already be several kilobytes deep in the stack when we need to split the BMBT. A recent workload demonstrated a deadlock in this BMBT split offload. It requires several things to happen at once: 1. two inodes need a BMBT split at the same time, one must be unwritten extent conversion from IO completion, the other must be from extent allocation. 2. there must be a no available xfs_alloc_wq worker threads available in the worker pool. 3. There must be sustained severe memory shortages such that new kworker threads cannot be allocated to the xfs_alloc_wq pool for both threads that need split work to be run 4. The split work from the unwritten extent conversion must run first. 5. when the BMBT block allocation runs from the split work, it must loop over all AGs and not be able to either trylock an AGF successfully, or each AGF is is able to lock has no space available for a single block allocation. 6. The BMBT allocation must then attempt to lock the AGF that the second task queued to the rescuer thread already has locked before it finds an AGF it can allocate from. At this point, we have an ABBA deadlock between tasks queued on the xfs_alloc_wq rescuer thread and a locked AGF. i.e. The queued task holding the AGF lock can't be run by the rescuer thread until the task the rescuer thread is runing gets the AGF lock.... This is a highly improbably series of events, but there it is. There's a couple of ways to fix this, but the easiest way to ensure that we only punt tasks with a locked AGF that holds enough space for the BMBT block allocations to the worker thread. This works for unwritten extent conversion in IO completion (which doesn't have a locked AGF and space reservations) because we have tight control over the IO completion stack. It is typically only 6 functions deep when xfs_btree_split() is called because we've already offloaded the IO completion work to a worker thread and hence we don't need to worry about stack overruns here. The other place we can be called for a BMBT split without a preceeding allocation is __xfs_bunmapi() when punching out the center of an existing extent. We don't remove extents in the IO path, so these operations don't tend to be called with a lot of stack consumed. Hence we don't really need to ship the split off to a worker thread in these cases, either. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit 98b8fd60b338d5d35b1092e011d996e58e1820ca Author: Wengang Wang Date: Tue Sep 24 11:38:28 2024 -0700 xfs: fix extent busy updating [ Upstream commit 601a27ea09a317d0fe2895df7d875381fb393041 ] In xfs_extent_busy_update_extent() case 6 and 7, whenever bno is modified on extent busy, the relavent length has to be modified accordingly. Signed-off-by: Wengang Wang Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit b36c2ae02aa7c6559e32e342e05b5b26301b30ad Author: Wu Guanghao Date: Tue Sep 24 11:38:27 2024 -0700 xfs: Fix deadlock on xfs_inodegc_worker [ Upstream commit 4da112513c01d7d0acf1025b8764349d46e177d6 ] We are doing a test about deleting a large number of files when memory is low. A deadlock problem was found. [ 1240.279183] -> #1 (fs_reclaim){+.+.}-{0:0}: [ 1240.280450] lock_acquire+0x197/0x460 [ 1240.281548] fs_reclaim_acquire.part.0+0x20/0x30 [ 1240.282625] kmem_cache_alloc+0x2b/0x940 [ 1240.283816] xfs_trans_alloc+0x8a/0x8b0 [ 1240.284757] xfs_inactive_ifree+0xe4/0x4e0 [ 1240.285935] xfs_inactive+0x4e9/0x8a0 [ 1240.286836] xfs_inodegc_worker+0x160/0x5e0 [ 1240.287969] process_one_work+0xa19/0x16b0 [ 1240.289030] worker_thread+0x9e/0x1050 [ 1240.290131] kthread+0x34f/0x460 [ 1240.290999] ret_from_fork+0x22/0x30 [ 1240.291905] [ 1240.291905] -> #0 ((work_completion)(&gc->work)){+.+.}-{0:0}: [ 1240.293569] check_prev_add+0x160/0x2490 [ 1240.294473] __lock_acquire+0x2c4d/0x5160 [ 1240.295544] lock_acquire+0x197/0x460 [ 1240.296403] __flush_work+0x6bc/0xa20 [ 1240.297522] xfs_inode_mark_reclaimable+0x6f0/0xdc0 [ 1240.298649] destroy_inode+0xc6/0x1b0 [ 1240.299677] dispose_list+0xe1/0x1d0 [ 1240.300567] prune_icache_sb+0xec/0x150 [ 1240.301794] super_cache_scan+0x2c9/0x480 [ 1240.302776] do_shrink_slab+0x3f0/0xaa0 [ 1240.303671] shrink_slab+0x170/0x660 [ 1240.304601] shrink_node+0x7f7/0x1df0 [ 1240.305515] balance_pgdat+0x766/0xf50 [ 1240.306657] kswapd+0x5bd/0xd20 [ 1240.307551] kthread+0x34f/0x460 [ 1240.308346] ret_from_fork+0x22/0x30 [ 1240.309247] [ 1240.309247] other info that might help us debug this: [ 1240.309247] [ 1240.310944] Possible unsafe locking scenario: [ 1240.310944] [ 1240.312379] CPU0 CPU1 [ 1240.313363] ---- ---- [ 1240.314433] lock(fs_reclaim); [ 1240.315107] lock((work_completion)(&gc->work)); [ 1240.316828] lock(fs_reclaim); [ 1240.318088] lock((work_completion)(&gc->work)); [ 1240.319203] [ 1240.319203] *** DEADLOCK *** ... [ 2438.431081] Workqueue: xfs-inodegc/sda xfs_inodegc_worker [ 2438.432089] Call Trace: [ 2438.432562] __schedule+0xa94/0x1d20 [ 2438.435787] schedule+0xbf/0x270 [ 2438.436397] schedule_timeout+0x6f8/0x8b0 [ 2438.445126] wait_for_completion+0x163/0x260 [ 2438.448610] __flush_work+0x4c4/0xa40 [ 2438.455011] xfs_inode_mark_reclaimable+0x6ef/0xda0 [ 2438.456695] destroy_inode+0xc6/0x1b0 [ 2438.457375] dispose_list+0xe1/0x1d0 [ 2438.458834] prune_icache_sb+0xe8/0x150 [ 2438.461181] super_cache_scan+0x2b3/0x470 [ 2438.461950] do_shrink_slab+0x3cf/0xa50 [ 2438.462687] shrink_slab+0x17d/0x660 [ 2438.466392] shrink_node+0x87e/0x1d40 [ 2438.467894] do_try_to_free_pages+0x364/0x1300 [ 2438.471188] try_to_free_pages+0x26c/0x5b0 [ 2438.473567] __alloc_pages_slowpath.constprop.136+0x7aa/0x2100 [ 2438.482577] __alloc_pages+0x5db/0x710 [ 2438.485231] alloc_pages+0x100/0x200 [ 2438.485923] allocate_slab+0x2c0/0x380 [ 2438.486623] ___slab_alloc+0x41f/0x690 [ 2438.490254] __slab_alloc+0x54/0x70 [ 2438.491692] kmem_cache_alloc+0x23e/0x270 [ 2438.492437] xfs_trans_alloc+0x88/0x880 [ 2438.493168] xfs_inactive_ifree+0xe2/0x4e0 [ 2438.496419] xfs_inactive+0x4eb/0x8b0 [ 2438.497123] xfs_inodegc_worker+0x16b/0x5e0 [ 2438.497918] process_one_work+0xbf7/0x1a20 [ 2438.500316] worker_thread+0x8c/0x1060 [ 2438.504938] ret_from_fork+0x22/0x30 When the memory is insufficient, xfs_inonodegc_worker will trigger memory reclamation when memory is allocated, then flush_work() may be called to wait for the work to complete. This causes a deadlock. So use memalloc_nofs_save() to avoid triggering memory reclamation in xfs_inodegc_worker. Signed-off-by: Wu Guanghao Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit d2b4752119ca629b175d599d9a68b90ae010abfb Author: Dave Chinner Date: Tue Sep 24 11:38:26 2024 -0700 xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING [ Upstream commit 52f31ed228212ba572c44e15e818a3a5c74122c0 ] Resulting in a UAF if the shrinker races with some other dquot freeing mechanism that sets XFS_DQFLAG_FREEING before the dquot is removed from the LRU. This can occur if a dquot purge races with drop_caches. Reported-by: syzbot+912776840162c13db1a3@syzkaller.appspotmail.com Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Leah Rumancik Acked-by: Chandan Babu R Signed-off-by: Greg Kroah-Hartman commit cfb926051fab19b10d1e65976211f364aa820180 Author: Ferry Meng Date: Mon May 20 10:40:24 2024 +0800 ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry() [ Upstream commit af77c4fc1871847b528d58b7fdafb4aa1f6a9262 ] xattr in ocfs2 maybe 'non-indexed', which saved with additional space requested. It's better to check if the memory is out of bound before memcmp, although this possibility mainly comes from crafted poisonous images. Link: https://lkml.kernel.org/r/20240520024024.1976129-2-joseph.qi@linux.alibaba.com Signed-off-by: Ferry Meng Signed-off-by: Joseph Qi Reported-by: lei lu Reviewed-by: Joseph Qi Cc: Changwei Ge Cc: Gang He Cc: Joel Becker Cc: Jun Piao Cc: Junxiao Bi Cc: Mark Fasheh Signed-off-by: Andrew Morton Signed-off-by: Sasha Levin commit 9b32539590a8e6400ac2f6e7cf9cbb8e08711a2f Author: Ferry Meng Date: Mon May 20 10:40:23 2024 +0800 ocfs2: add bounds checking to ocfs2_xattr_find_entry() [ Upstream commit 9e3041fecdc8f78a5900c3aa51d3d756e73264d6 ] Add a paranoia check to make sure it doesn't stray beyond valid memory region containing ocfs2 xattr entries when scanning for a match. It will prevent out-of-bound access in case of crafted images. Link: https://lkml.kernel.org/r/20240520024024.1976129-1-joseph.qi@linux.alibaba.com Signed-off-by: Ferry Meng Signed-off-by: Joseph Qi Reported-by: lei lu Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Signed-off-by: Andrew Morton Stable-dep-of: af77c4fc1871 ("ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry()") Signed-off-by: Sasha Levin commit 8220c3e2ab77b6daa6778f587b9a87446143e514 Author: Geert Uytterhoeven Date: Tue Sep 3 14:32:27 2024 +0200 spi: spidev: Add missing spi_device_id for jg10309-01 [ Upstream commit 5478a4f7b94414def7b56d2f18bc2ed9b0f3f1f2 ] When the of_device_id entry for "elgin,jg10309-01" was added, the corresponding spi_device_id was forgotten, causing a warning message during boot-up: SPI driver spidev has no spi_device_id for elgin,jg10309-01 Fix module autoloading and shut up the warning by adding the missing entry. Fixes: 5f3eee1eef5d0edd ("spi: spidev: Add an entry for elgin,jg10309-01") Signed-off-by: Geert Uytterhoeven Link: https://patch.msgid.link/54bbb9d8a8db7e52d13e266f2d4a9bcd8b42a98a.1725366625.git.geert+renesas@glider.be Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit 892a5d4f1c492c522289c1e760e0e686b053dff7 Author: Hongyu Jin Date: Tue Jan 30 15:26:34 2024 -0500 block: Fix where bio IO priority gets set [ Upstream commit f3c89983cb4fc00be64eb0d5cbcfcdf2cacb965e ] Commit 82b74cac2849 ("blk-ioprio: Convert from rqos policy to direct call") pushed setting bio I/O priority down into blk_mq_submit_bio() -- which is too low within block core's submit_bio() because it skips setting I/O priority for block drivers that implement fops->submit_bio() (e.g. DM, MD, etc). Fix this by moving bio_set_ioprio() up from blk-mq.c to blk-core.c and call it from submit_bio(). This ensures all block drivers call bio_set_ioprio() during initial bio submission. Fixes: a78418e6a04c ("block: Always initialize bio IO priority on submit") Co-developed-by: Yibin Ding Signed-off-by: Yibin Ding Signed-off-by: Hongyu Jin Reviewed-by: Eric Biggers Reviewed-by: Mikulas Patocka [snitzer: revised commit header] Signed-off-by: Mike Snitzer Reviewed-by: Ming Lei Link: https://lore.kernel.org/r/20240130202638.62600-2-snitzer@kernel.org Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin commit ff913aff000122ebab15e7179a176b8a127f2391 Author: zhang jiao Date: Mon Sep 2 12:21:03 2024 +0800 tools: hv: rm .*.cmd when make clean [ Upstream commit 5e5cc1eb65256e6017e3deec04f9806f2f317853 ] rm .*.cmd when make clean Signed-off-by: zhang jiao Reviewed-by: Saurabh Sengar Link: https://lore.kernel.org/r/20240902042103.5867-1-zhangjiao2@cmss.chinamobile.com Signed-off-by: Wei Liu Message-ID: <20240902042103.5867-1-zhangjiao2@cmss.chinamobile.com> Signed-off-by: Sasha Levin commit 0b78afa66d08e1134c245762c2adad04e134fdec Author: Michael Kelley Date: Wed Jun 5 19:55:59 2024 -0700 x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency [ Upstream commit 8fcc514809de41153b43ccbe1a0cdf7f72b78e7e ] A Linux guest on Hyper-V gets the TSC frequency from a synthetic MSR, if available. In this case, set X86_FEATURE_TSC_KNOWN_FREQ so that Linux doesn't unnecessarily do refined TSC calibration when setting up the TSC clocksource. With this change, a message such as this is no longer output during boot when the TSC is used as the clocksource: [ 1.115141] tsc: Refined TSC clocksource calibration: 2918.408 MHz Furthermore, the guest and host will have exactly the same view of the TSC frequency, which is important for features such as the TSC deadline timer that are emulated by the Hyper-V host. Signed-off-by: Michael Kelley Reviewed-by: Roman Kisel Link: https://lore.kernel.org/r/20240606025559.1631-1-mhklinux@outlook.com Signed-off-by: Wei Liu Message-ID: <20240606025559.1631-1-mhklinux@outlook.com> Signed-off-by: Sasha Levin commit 123c2d18f8f4386962c466e7acf6693235bc2aea Author: Paulo Alcantara Date: Sat Aug 31 21:40:28 2024 -0300 smb: client: fix hang in wait_for_response() for negproto [ Upstream commit 7ccc1465465d78e6411b7bd730d06e7435802b5c ] Call cifs_reconnect() to wake up processes waiting on negotiate protocol to handle the case where server abruptly shut down and had no chance to properly close the socket. Simple reproducer: ssh 192.168.2.100 pkill -STOP smbd mount.cifs //192.168.2.100/test /mnt -o ... [never returns] Cc: Rickard Andersson Signed-off-by: Paulo Alcantara (Red Hat) Signed-off-by: Steve French Signed-off-by: Sasha Levin commit a4a5a153df5df2eefe07af16f2e4e52fd5bb92b9 Author: Liao Chen Date: Sat Aug 31 09:42:31 2024 +0000 spi: bcm63xx: Enable module autoloading [ Upstream commit 709df70a20e990d262c473ad9899314039e8ec82 ] Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based on the alias from of_device_id table. Signed-off-by: Liao Chen Link: https://patch.msgid.link/20240831094231.795024-1-liaochen4@huawei.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit 74968adcecda8ff1c0ac3de05106605be33e17a2 Author: hongchi.peng Date: Mon Aug 26 10:45:17 2024 +0800 drm: komeda: Fix an issue related to normalized zpos [ Upstream commit 258905cb9a6414be5c9ca4aa20ef855f8dc894d4 ] We use komeda_crtc_normalize_zpos to normalize zpos of affected planes to their blending zorder in CU. If there's only one slave plane in affected planes and its layer_split property is enabled, order++ for its split layer, so that when calculating the normalized_zpos of master planes, the split layer of the slave plane is included, but the max_slave_zorder does not include the split layer and keep zero because there's only one slave plane in affacted planes, although we actually use two slave layers in this commit. In most cases, this bug does not result in a commit failure, but assume the following situation: slave_layer 0: zpos = 0, layer split enabled, normalized_zpos = 0;(use slave_layer 2 as its split layer) master_layer 0: zpos = 2, layer_split enabled, normalized_zpos = 2;(use master_layer 2 as its split layer) master_layer 1: zpos = 4, normalized_zpos = 4; master_layer 3: zpos = 5, normalized_zpos = 5; kcrtc_st->max_slave_zorder = 0; When we use master_layer 3 as a input of CU in function komeda_compiz_set_input and check it with function komeda_component_check_input, the parameter idx is equal to normailzed_zpos minus max_slave_zorder, the value of idx is 5 and is euqal to CU's max_active_inputs, so that komeda_component_check_input returns a -EINVAL value. To fix the bug described above, when calculating the max_slave_zorder with the layer_split enabled, count the split layer in this calculation directly. Signed-off-by: hongchi.peng Acked-by: Liviu Dudau Signed-off-by: Liviu Dudau Link: https://patchwork.freedesktop.org/patch/msgid/20240826024517.3739-1-hongchi.peng@siengine.com Signed-off-by: Sasha Levin commit fbef47f59032a2d258ee1560d22ad1d2a43de047 Author: Fabio Estevam Date: Wed Aug 28 15:00:56 2024 -0300 spi: spidev: Add an entry for elgin,jg10309-01 [ Upstream commit 5f3eee1eef5d0edd23d8ac0974f56283649a1512 ] The rv1108-elgin-r1 board has an LCD controlled via SPI in userspace. The marking on the LCD is JG10309-01. Add the "elgin,jg10309-01" compatible string. Signed-off-by: Fabio Estevam Reviewed-by: Heiko Stuebner Link: https://patch.msgid.link/20240828180057.3167190-2-festevam@gmail.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit d404252ae77e1750cbaaf353146e7312e52568d9 Author: Liao Chen Date: Mon Aug 26 08:49:23 2024 +0000 ASoC: tda7419: fix module autoloading [ Upstream commit 934b44589da9aa300201a00fe139c5c54f421563 ] Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based on the alias from of_device_id table. Signed-off-by: Liao Chen Link: https://patch.msgid.link/20240826084924.368387-4-liaochen4@huawei.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit b013a1e77094b5dbb6a8515b9d09b7734f9206a9 Author: Liao Chen Date: Mon Aug 26 08:49:21 2024 +0000 ASoC: intel: fix module autoloading [ Upstream commit ae61a3391088d29aa8605c9f2db84295ab993a49 ] Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based on the alias from of_device_id table. Signed-off-by: Liao Chen Link: https://patch.msgid.link/20240826084924.368387-2-liaochen4@huawei.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit 62386a1614f5e84bcf39a11453e521c0d2cf06b8 Author: Hans de Goede Date: Fri Aug 23 09:43:05 2024 +0200 ASoC: Intel: soc-acpi-cht: Make Lenovo Yoga Tab 3 X90F DMI match less strict [ Upstream commit 839a4ec06f75cec8fec2cc5fc14e921d0c3f7369 ] There are 2G and 4G RAM versions of the Lenovo Yoga Tab 3 X90F and it turns out that the 2G version has a DMI product name of "CHERRYVIEW D1 PLATFORM" where as the 4G version has "CHERRYVIEW C0 PLATFORM". The sys-vendor + product-version check are unique enough that the product-name check is not necessary. Drop the product-name check so that the existing DMI match for the 4G RAM version also matches the 2G RAM version. Signed-off-by: Hans de Goede Reviewed-by: Pierre-Louis Bossart Link: https://patch.msgid.link/20240823074305.16873-1-hdegoede@redhat.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit f1e32334e96d9cbbfc9083bb7b1197497127d07e Author: Marc Kleine-Budde Date: Fri Jul 5 17:24:42 2024 +0200 can: mcp251xfd: mcp251xfd_ring_init(): check TX-coalescing configuration [ Upstream commit ac2b81eb8b2d104033560daea886ee84531e3d0a ] When changing the interface from CAN-CC to CAN-FD mode the old coalescing parameters are re-used. This might cause problem, as the configured parameters are too big for CAN-FD mode. During testing an invalid TX coalescing configuration has been seen. The problem should be been fixed in the previous patch, but add a safeguard here to ensure that the number of TEF coalescing buffers (if configured) is exactly the half of all TEF buffers. Link: https://lore.kernel.org/all/20240805-mcp251xfd-fix-ringconfig-v1-2-72086f0ca5ee@pengutronix.de Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin commit fef6432edcd8a9d6d2babd29af92d9acb9e9748a Author: Emmanuel Grumbach Date: Sun Aug 25 19:17:01 2024 +0300 wifi: iwlwifi: clear trans->state earlier upon error [ Upstream commit 094513f8a2fbddee51b055d8035f995551f98fce ] When the firmware crashes, we first told the op_mode and only then, changed the transport's state. This is a problem if the op_mode's nic_error() handler needs to send a host command: it'll see that the transport's state still reflects that the firmware is alive. Today, this has no consequences since we set the STATUS_FW_ERROR bit and that will prevent sending host commands. iwl_fw_dbg_stop_restart_recording looks at this bit to know not to send a host command for example. To fix the hibernation, we needed to reset the firmware without having an error and checking STATUS_FW_ERROR to see whether the firmware is alive will no longer hold, so this change is necessary as well. Change the flow a bit. Change trans->state before calling the op_mode's nic_error() method and check trans->state instead of STATUS_FW_ERROR. This will keep the current behavior of iwl_fw_dbg_stop_restart_recording upon firmware error, and it'll allow us to call iwl_fw_dbg_stop_restart_recording safely even if STATUS_FW_ERROR is clear, but yet, the firmware is not alive. Signed-off-by: Emmanuel Grumbach Signed-off-by: Miri Korenblit Link: https://patch.msgid.link/20240825191257.9d7427fbdfd7.Ia056ca57029a382c921d6f7b6a6b28fc480f2f22@changeid [I missed this was a dependency for the hibernation fix, changed the commit message a bit accordingly] Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit a8c48e7b8340e28d432ac87d341364435ed85b64 Author: Dmitry Antipov Date: Mon Aug 5 17:20:35 2024 +0300 wifi: mac80211: free skb on error path in ieee80211_beacon_get_ap() [ Upstream commit 786c5be9ac29a39b6f37f1fdd2ea59d0fe35d525 ] In 'ieee80211_beacon_get_ap()', free allocated skb in case of error returned by 'ieee80211_beacon_protect()'. Compile tested only. Signed-off-by: Dmitry Antipov Link: https://patch.msgid.link/20240805142035.227847-1-dmantipov@yandex.ru Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 1b0cd832c9607f41f84053b818e0b7908510a3b9 Author: Emmanuel Grumbach Date: Sun Aug 25 19:17:04 2024 +0300 wifi: iwlwifi: mvm: don't wait for tx queues if firmware is dead [ Upstream commit 3a84454f5204718ca5b4ad2c1f0bf2031e2403d1 ] There is a WARNING in iwl_trans_wait_tx_queues_empty() (that was recently converted from just a message), that can be hit if we wait for TX queues to become empty after firmware died. Clearly, we can't expect anything from the firmware after it's declared dead. Don't call iwl_trans_wait_tx_queues_empty() in this case. While it could be a good idea to stop the flow earlier, the flush functions do some maintenance work that is not related to the firmware, so keep that part of the code running even when the firmware is not running. Signed-off-by: Emmanuel Grumbach Signed-off-by: Miri Korenblit Link: https://patch.msgid.link/20240825191257.a7cbd794cee9.I44a739fbd4ffcc46b83844dd1c7b2eb0c7b270f6@changeid [edit commit message] Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 5948a191906b54e10f02f6b7a7670243a39f99f4 Author: Emmanuel Grumbach Date: Sun Aug 25 19:17:10 2024 +0300 wifi: iwlwifi: mvm: pause TCM when the firmware is stopped [ Upstream commit 0668ebc8c2282ca1e7eb96092a347baefffb5fe7 ] Not doing so will make us send a host command to the transport while the firmware is not alive, which will trigger a WARNING. bad state = 0 WARNING: CPU: 2 PID: 17434 at drivers/net/wireless/intel/iwlwifi/iwl-trans.c:115 iwl_trans_send_cmd+0x1cb/0x1e0 [iwlwifi] RIP: 0010:iwl_trans_send_cmd+0x1cb/0x1e0 [iwlwifi] Call Trace: iwl_mvm_send_cmd+0x40/0xc0 [iwlmvm] iwl_mvm_config_scan+0x198/0x260 [iwlmvm] iwl_mvm_recalc_tcm+0x730/0x11d0 [iwlmvm] iwl_mvm_tcm_work+0x1d/0x30 [iwlmvm] process_one_work+0x29e/0x640 worker_thread+0x2df/0x690 ? rescuer_thread+0x540/0x540 kthread+0x192/0x1e0 ? set_kthread_struct+0x90/0x90 ret_from_fork+0x22/0x30 Signed-off-by: Emmanuel Grumbach Signed-off-by: Miri Korenblit Link: https://patch.msgid.link/20240825191257.5abe71ca1b6b.I97a968cb8be1f24f94652d9b110ecbf6af73f89e@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 051e6cce7a8e31d70f572029d64d73ccfd0596a2 Author: Daniel Gabay Date: Sun Aug 25 19:17:05 2024 +0300 wifi: iwlwifi: mvm: fix iwl_mvm_scan_fits() calculation [ Upstream commit d44162280899c3fc2c6700e21e491e71c3c96e3d ] The calculation should consider also the 6GHz IE's len, fix that. In addition, in iwl_mvm_sched_scan_start() the scan_fits helper is called only in case non_psc_incldued is true, but it should be called regardless, fix that as well. Signed-off-by: Daniel Gabay Reviewed-by: Ilan Peer Signed-off-by: Miri Korenblit Link: https://patch.msgid.link/20240825191257.7db825442fd2.I99f4d6587709de02072fd57957ec7472331c6b1d@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit ba94a887d262d878c994fd25b01c79f500fa6367 Author: Benjamin Berg Date: Sun Aug 25 19:17:13 2024 +0300 wifi: iwlwifi: lower message level for FW buffer destination [ Upstream commit f8a129c1e10256c785164ed5efa5d17d45fbd81b ] An invalid buffer destination is not a problem for the driver and it does not make sense to report it with the KERN_ERR message level. As such, change the message to use IWL_DEBUG_FW. Reported-by: Len Brown Closes: https://lore.kernel.org/r/CAJvTdKkcxJss=DM2sxgv_MR5BeZ4_OC-3ad6tA40TYH2yqHCWw@mail.gmail.com Signed-off-by: Benjamin Berg Signed-off-by: Miri Korenblit Link: https://patch.msgid.link/20240825191257.20abf78f05bc.Ifbcecc2ae9fb40b9698302507dcba8b922c8d856@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit f0eb100965cb67d1744c42336d8b11dbaf161889 Author: Huacai Chen Date: Mon Aug 26 23:11:32 2024 +0800 LoongArch: Define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE [ Upstream commit 274ea3563e5ab9f468c15bfb9d2492803a66d9be ] Currently we call irq_set_noprobe() in a loop for all IRQs, but indeed it only works for IRQs below NR_IRQS_LEGACY because at init_IRQ() only legacy interrupts have been allocated. Instead, we can define ARCH_IRQ_INIT_FLAGS as IRQ_NOPROBE in asm/hwirq.h and the core will automatically set the flag for all interrupts. Reviewed-by: Thomas Gleixner Signed-off-by: Huacai Chen Signed-off-by: Tianyang Zhang Signed-off-by: Sasha Levin commit bb06a6e78e88e4e69618a23c8a15913b6e1c9358 Author: Jacky Chou Date: Thu Aug 22 15:30:06 2024 +0800 net: ftgmac100: Ensure tx descriptor updates are visible [ Upstream commit 4186c8d9e6af57bab0687b299df10ebd47534a0a ] The driver must ensure TX descriptor updates are visible before updating TX pointer and TX clear pointer. This resolves TX hangs observed on AST2600 when running iperf3. Signed-off-by: Jacky Chou Signed-off-by: David S. Miller Signed-off-by: Sasha Levin commit 2e767997252c12df6945a5ca1dd753c60effc193 Author: Mike Rapoport Date: Mon Jul 29 08:33:27 2024 +0300 microblaze: don't treat zero reserved memory regions as error [ Upstream commit 0075df288dd8a7abfe03b3766176c393063591dd ] Before commit 721f4a6526da ("mm/memblock: remove empty dummy entry") the check for non-zero of memblock.reserved.cnt in mmu_init() would always be true either because memblock.reserved.cnt is initialized to 1 or because there were memory reservations earlier. The removal of dummy empty entry in memblock caused this check to fail because now memblock.reserved.cnt is initialized to 0. Remove the check for non-zero of memblock.reserved.cnt because it's perfectly fine to have an empty memblock.reserved array that early in boot. Reported-by: Guenter Roeck Signed-off-by: Mike Rapoport Reviewed-by: Wei Yang Tested-by: Guenter Roeck Link: https://lore.kernel.org/r/20240729053327.4091459-1-rppt@kernel.org Signed-off-by: Guenter Roeck Signed-off-by: Sasha Levin commit 0bc618a68edf60a769d9373507d412547767d54e Author: Ross Brown Date: Tue Jul 30 08:21:42 2024 +0200 hwmon: (asus-ec-sensors) remove VRM temp X570-E GAMING [ Upstream commit 9efaebc0072b8e95505544bf385c20ee8a29d799 ] X570-E GAMING does not have VRM temperature sensor. Signed-off-by: Ross Brown Signed-off-by: Eugene Shalygin Link: https://lore.kernel.org/r/20240730062320.5188-2-eugene.shalygin@gmail.com Signed-off-by: Guenter Roeck Signed-off-by: Sasha Levin commit 2fcf56f51356a8864a3f9758a2cc0a3c67153b02 Author: Thomas Blocher Date: Wed Jul 31 01:16:26 2024 +0200 pinctrl: at91: make it work with current gpiolib [ Upstream commit 752f387faaae0ae2e84d3f496922524785e77d60 ] pinctrl-at91 currently does not support the gpio-groups devicetree property and has no pin-range. Because of this at91 gpios stopped working since patch commit 2ab73c6d8323fa1e ("gpio: Support GPIO controllers without pin-ranges") This was discussed in the patches commit fc328a7d1fcce263 ("gpio: Revert regression in sysfs-gpio (gpiolib.c)") commit 56e337f2cf132632 ("Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)"") As a workaround manually set pin-range via gpiochip_add_pin_range() until a) pinctrl-at91 is reworked to support devicetree gpio-groups b) another solution as mentioned in commit 56e337f2cf132632 ("Revert "gpio: Revert regression in sysfs-gpio (gpiolib.c)"") is found Signed-off-by: Thomas Blocher Link: https://lore.kernel.org/5b992862-355d-f0de-cd3d-ff99e67a4ff1@ek-dev.de Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin commit 5e2a30d6e9613b95c79205c0cf8110e2eade8546 Author: Sherry Yang Date: Tue Aug 20 23:51:31 2024 -0700 scsi: lpfc: Fix overflow build issue [ Upstream commit 3417c9574e368f0330637505f00d3814ca8854d2 ] Build failed while enabling "CONFIG_GCOV_KERNEL=y" and "CONFIG_GCOV_PROFILE_ALL=y" with following error: BUILDSTDERR: drivers/scsi/lpfc/lpfc_bsg.c: In function 'lpfc_get_cgnbuf_info': BUILDSTDERR: ./include/linux/fortify-string.h:114:33: error: '__builtin_memcpy' accessing 18446744073709551615 bytes at offsets 0 and 0 overlaps 9223372036854775807 bytes at offset -9223372036854775808 [-Werror=restrict] BUILDSTDERR: 114 | #define __underlying_memcpy __builtin_memcpy BUILDSTDERR: | ^ BUILDSTDERR: ./include/linux/fortify-string.h:637:9: note: in expansion of macro '__underlying_memcpy' BUILDSTDERR: 637 | __underlying_##op(p, q, __fortify_size); \ BUILDSTDERR: | ^~~~~~~~~~~~~ BUILDSTDERR: ./include/linux/fortify-string.h:682:26: note: in expansion of macro '__fortify_memcpy_chk' BUILDSTDERR: 682 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ BUILDSTDERR: | ^~~~~~~~~~~~~~~~~~~~ BUILDSTDERR: drivers/scsi/lpfc/lpfc_bsg.c:5468:9: note: in expansion of macro 'memcpy' BUILDSTDERR: 5468 | memcpy(cgn_buff, cp, cinfosz); BUILDSTDERR: | ^~~~~~ This happens from the commit 06bb7fc0feee ("kbuild: turn on -Wrestrict by default"). Address this issue by using size_t type. Signed-off-by: Sherry Yang Link: https://lore.kernel.org/r/20240821065131.1180791-1-sherry.yang@oracle.com Reviewed-by: Justin Tee Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin commit fce0cddf7e3b7f83ea634b54d9bc3adad1bc2f44 Author: Kailang Yang Date: Thu Aug 22 16:46:56 2024 +0800 ALSA: hda/realtek - FIxed ALC285 headphone no sound [ Upstream commit 1fa7b099d60ad64f559bd3b8e3f0d94b2e015514 ] Dell platform with ALC215 ALC285 ALC289 ALC225 ALC295 ALC299, plug headphone or headset. It had a chance to get no sound from headphone. Replace depop procedure will solve this issue. Signed-off-by: Kailang Yang Link: https://lore.kernel.org/d0de1b03fd174520945dde216d765223@realtek.com Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin commit 85ba7682ee3ed30ef029cefaa3a612d2bb2043ba Author: Kailang Yang Date: Thu Aug 22 10:54:19 2024 +0800 ALSA: hda/realtek - Fixed ALC256 headphone no sound [ Upstream commit 9b82ff1362f50914c8292902e07be98a9f59d33d ] Dell platform, plug headphone or headset, it had a chance to get no sound from headphone. Replace depop procedure will solve this issue. Signed-off-by: Kailang Yang Link: https://lore.kernel.org/bb8e2de30d294dc287944efa0667685a@realtek.com Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin commit 555e0d606aba1bfc17f63cae32f3a5f64019c82a Author: Hongbo Li Date: Wed Aug 21 14:19:55 2024 +0800 ASoC: allow module autoloading for table board_ids [ Upstream commit 5f7c98b7519a3a847d9182bd99d57ea250032ca1 ] Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based on the alias from platform_device_id table. Signed-off-by: Hongbo Li Link: https://patch.msgid.link/20240821061955.2273782-3-lihongbo22@huawei.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit 9434465f3a0d56821c4f358d18fe9834e3ea9a04 Author: Hongbo Li Date: Wed Aug 21 14:19:54 2024 +0800 ASoC: allow module autoloading for table db1200_pids [ Upstream commit 0e9fdab1e8df490354562187cdbb8dec643eae2c ] Add MODULE_DEVICE_TABLE(), so modules could be properly autoloaded based on the alias from platform_device_id table. Signed-off-by: Hongbo Li Link: https://patch.msgid.link/20240821061955.2273782-2-lihongbo22@huawei.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin commit 39d4d5a285af8fd2f4cd7530dbd5a850e1d12bb8 Author: Albert Jakieła Date: Fri Aug 9 13:56:27 2024 +0000 ASoC: SOF: mediatek: Add missing board compatible [ Upstream commit c0196faaa927321a63e680427e075734ee656e42 ] Add Google Dojo compatible. Signed-off-by: Albert Jakieła Reviewed-by: Chen-Yu Tsai Link: https://patch.msgid.link/20240809135627.544429-1-jakiela@google.com Signed-off-by: Mark Brown Signed-off-by: Sasha Levin