commit e46dc0adfe39724bcf52cea47b8f9c9aed86a394
Author: Greg Kroah-Hartman
Date:   Sat Jul 4 13:44:22 2026 +0200

    Linux 6.18.38

    Link: https://lore.kernel.org/r/20260702155112.110058792@linuxfoundation.org
    Tested-by: Brett A C Sheffield
    Tested-by: Peter Schneider
    Tested-by: Miguel Ojeda
    Tested-by: Shung-Hsi Yu
    Link: https://lore.kernel.org/r/20260703072816.644513463@linuxfoundation.org
    Tested-by: Brett A C Sheffield
    Tested-by: Mark Brown
    Tested-by: Ron Economos
    Tested-by: Wentao Guan
    Tested-by: Miguel Ojeda
    Signed-off-by: Greg Kroah-Hartman

commit 92c63a5ef3c7a2136ee71324c28f1799769206a2
Author: John Johansen
Date:   Mon Jun 22 16:34:13 2026 -0700

    apparmor: advertise the tcp fast open fix is applied

    commit 2f6701a5ce6257ae7a64ddc6d89d0a08d2a034f8 upstream.

    The fix for tcp-fast-open ensures that the connect permission is being mediated correctly but it didn't add an artifact to the feature set to advertise the fix is available.

    Add an artifact so that the test suite can identify if the fix has not been properly applied or a new unexpected regression has occurred.

    Fixes: 4d587cd8a7215 ("apparmor: mediate the implicit connect of TCP fast open sendmsg")
    Signed-off-by: John Johansen
    Signed-off-by: Greg Kroah-Hartman

commit e77fbefd1269b5c123e7c651a1ebdce1b87d19a0
Author: HanQuan
Date:   Tue Jun 23 01:52:08 2026 +0000

    net/tcp-ao: fix use-after-free of key in del_async path

    commit 5ba9950bc9078e19b69cca1e56d1553b125c6857 upstream.

    In tcp_ao_delete_key(), the del_async path skips the current_key and rnext_key validity checks present in the synchronous path, assuming these pointers are always NULL on LISTEN sockets.

    However, if a key was added with set_current=1/set_rnext=1 while the socket was in CLOSE state, current_key and rnext_key will be non-NULL after listen() transitions the socket to LISTEN. When such a key is deleted with del_async=1, hlist_del_rcu() and call_rcu() free the key without clearing the dangling pointers.
    After the RCU grace period, getsockopt(TCP_AO_INFO) dereferences current_key->sndid and rnext_key->rcvid from freed slab memory.

    Clear current_key and rnext_key in the del_async path when they reference the key being deleted.

    Fixes: d6732b95b6fb ("net/tcp: Allow asynchronous delete for TCP-AO keys (MKTs)")
    Signed-off-by: HanQuan
    Reviewed-by: Eric Dumazet
    Link: https://patch.msgid.link/20260623015208.1191687-1-eilaimemedsnaimel@gmail.com
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

commit 3d205fe80f2181f0109150ad1fa06ee5bc046935
Author: Stepan Ionichev
Date:   Thu May 14 19:37:45 2026 +0500

    serial: 8250_dw: unregister 8250 port if clk_notifier_register() fails

    commit 10fc708b4de7f86002d2d735a2dbf3b5b7f65692 upstream.

    dw8250_probe() registers the 8250 port via serial8250_register_8250_port() and then, if the device has a clock, registers a clock notifier. If clk_notifier_register() fails, probe returns the error but leaves the 8250 port registered.

    The matching serial8250_unregister_port() lives in dw8250_remove(), which is not called when probe fails, so the port slot stays occupied until the device is rebound or the system is rebooted. The devm-allocated driver data is freed while the port still references it (via the saved private_data and serial_in/serial_out callbacks), so any access to that port slot before a rebind is a use-after-free hazard.

    Unregister the port on the clk_notifier_register() error path.

    Fixes: cc816969d7b5 ("serial: 8250_dw: Fix common clocks usage race condition")
    Cc: stable@vger.kernel.org
    Signed-off-by: Stepan Ionichev
    Reviewed-by: Andy Shevchenko
    Link: https://patch.msgid.link/20260514143746.23671-2-sozdayvek@gmail.com
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

commit 7627ff8c4f9919f14de562b0160ab4ec9d80b1f7
Author: Hem Parekh
Date:   Tue Jun 2 16:56:46 2026 -0700

    ksmbd: fix out-of-bounds read in smb_check_perm_dacl()

    commit 1ef06004ed4bd6d3ed8c840d9d1a376b66d4935b upstream.

    The permission-check ACE walk in smb_check_perm_dacl() validates the ACE header size and caps sid.num_subauth at SID_MAX_SUB_AUTHORITIES, but it never checks that ace->size is actually large enough to contain num_subauth sub-authorities before compare_sids() dereferences them.

    CIFS_SID_BASE_SIZE covers the SID header up to but excluding the sub_auth[] array, and offsetof(struct smb_ace, sid) is the ACE header, so the existing guards only guarantee the 8-byte SID base, i.e. zero sub-authorities.

    compare_sids() then reads ace->sid.sub_auth[i] for i < min(local_sid->num_subauth, ace->sid.num_subauth). The local comparison SIDs (sid_everyone, sid_unix_NFS_mode, and the id_to_sid() result) always have at least one sub-authority, and an attacker controls the ACE revision and authority bytes (which lie within the in-bounds SID base), so they can match one of those SIDs and force the sub_auth read.

    A crafted ACE with size == 16 and num_subauth >= 1 placed at the tail of the security descriptor therefore causes a heap out-of-bounds read of up to SID_MAX_SUB_AUTHORITIES * sizeof(__le32) bytes past the pntsd allocation. The security descriptor is loaded by ksmbd_vfs_get_sd_xattr() into a buffer sized exactly to the on-disk data (kzalloc(sd_size) in ndr_decode_v4_ntacl()), so the read lands past the allocation.

    The malformed descriptor can be stored verbatim via SMB2_SET_INFO (the DACL is not normalised before being written to the security.NTACL xattr) and the read fires on a subsequent SMB2_CREATE access check, making this reachable by an authenticated client on a share that uses ACL xattrs.

    Add the missing num_subauth-versus-ace_size check, mirroring the identical guards already present in the sibling parsers parse_dacl() and smb_inherit_dacl().
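The size relationship described above can be modelled in userspace. The following is a minimal sketch, not the ksmbd code: the `sketch_*` structures, the helper name `ace_sid_in_bounds()` and its `bytes_left` parameter are hypothetical, fields are kept in host byte order rather than `__le16`/`__le32`, and the value 15 for `SID_MAX_SUB_AUTHORITIES` is an assumption. It only illustrates why a 16-byte ACE that claims one sub-authority has to be rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values; the 8-byte SID base is stated in the changelog above. */
#define SID_MAX_SUB_AUTHORITIES 15
#define CIFS_SID_BASE_SIZE      8

/* Hypothetical, simplified stand-ins for the on-wire structures. */
struct sketch_sid {
	uint8_t  revision;
	uint8_t  num_subauth;
	uint8_t  authority[6];
	uint32_t sub_auth[SID_MAX_SUB_AUTHORITIES];
};

struct sketch_ace {
	uint8_t  type;
	uint8_t  flags;
	uint16_t size;          /* total ACE size claimed by the sender */
	uint32_t access_req;
	struct sketch_sid sid;
};

/*
 * Returns 1 when sid.sub_auth[0 .. num_subauth) lies inside the ACE,
 * 0 otherwise. bytes_left is the space remaining in the descriptor.
 */
int ace_sid_in_bounds(const struct sketch_ace *ace, size_t bytes_left)
{
	const size_t base = offsetof(struct sketch_ace, sid) + CIFS_SID_BASE_SIZE;
	size_t ace_size;

	/* Pre-existing style of guard: header plus SID base must fit. */
	if (bytes_left < base)
		return 0;
	ace_size = ace->size;
	if (ace_size < base || ace_size > bytes_left)
		return 0;
	if (ace->sid.num_subauth > SID_MAX_SUB_AUTHORITIES)
		return 0;

	/* The guard that was missing: room for every claimed sub-authority. */
	if (ace_size < base + sizeof(uint32_t) * ace->sid.num_subauth)
		return 0;

	return 1;
}
```

With this model, an ACE of size 16 and num_subauth = 1 fails the last check, while the same ACE with size 20 passes.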

    Fixes: d07b26f39246 ("ksmbd: require minimum ACE size in smb_check_perm_dacl()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Hem Parekh
    Acked-by: Namjae Jeon
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

commit 62c26720121bfcb565f136d99bc40a7378a66fa0
Author: Markus Elfring
Date:   Sun Jun 14 09:56:35 2026 +0200

    NFS: Prevent resource leak in nfs_alloc_server()

    commit d189f224308c8ac3feeea8e442c99922bd18f1b2 upstream.

    It was overlooked to call ida_free() after a failed nfs_alloc_iostats() call. Thus add the missed function call in an if branch.

    Fixes: 1c7251187dc067a6d460cf33ca67da9c1dd87807 ("NFS: add superblock sysfs entries")
    Cc: stable@vger.kernel.org
    Reported-by: Christophe Jaillet
    Closes: https://lore.kernel.org/linux-nfs/1c8e10c9-def7-4f0d-8aa1-23c8035a38c8@wanadoo.fr/
    Signed-off-by: Markus Elfring
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

commit 6919eb549e8f3caaef16d918749bf2b68d18532d
Author: Igor Raits
Date:   Wed Apr 29 12:49:38 2026 +0200

    NFSv4: clear exception state on successful mkdir retry

    commit 238e9b51aa29f48b6243212a3b75c8e48d6b96fd upstream.

    After a server returns NFS4ERR_DELAY for an NFSv4 CREATE issued by mkdir(2), the client correctly waits and retries. When the retry succeeds, however, mkdir(2) can still surface -EEXIST to userspace even though the directory was just created on the server.

    Reproducer (random 16-hex names so collisions are not the cause) against an in-kernel Linux nfsd; reproduces under both NFSv4.0 and NFSv4.2:

      N=2000000; base=/var/gdc/export
      for ((i=1; i<=N; i++)); do
        d=$base/$(openssl rand -hex 8)
        mkdir "$d" 2>/dev/null || echo "$(date +%T) failed loop=$i $d"
        rmdir "$d" 2>/dev/null
      done

    Failures cluster at the cadence at which the server-side auth/export cache refresh path causes nfsd to return NFS4ERR_DELAY for CREATE.

    A wire trace of one failure (the three CREATE RPCs all come from a single mkdir(2), generated by the do-while in nfs4_proc_mkdir()):

      client -> server  CREATE name=...   -> NFS4ERR_DELAY
      ~100 ms later
      client -> server  CREATE name=...   -> NFS4_OK         (dir created)
      ~80 us later
      client -> server  CREATE name=...   -> NFS4ERR_EXIST   (correct)

    Since commit dd862da61e91 ("nfs: fix incorrect handling of large-number NFS errors in nfs4_do_mkdir()"), nfs4_handle_exception() is called only when _nfs4_proc_mkdir() returned an error. That gate breaks retry-state hygiene: nfs4_do_handle_exception() resets exception.{delay,recovering,retry} to 0 on entry, so calling it on success is what previously cleared the retry flag set by the preceding NFS4ERR_DELAY iteration. With the gate in place, exception.retry stays at 1 after the successful retry, the loop runs once more, and the resulting CREATE for an already-created name yields NFS4ERR_EXIST -> -EEXIST to userspace.

    Drop the conditional and call nfs4_handle_exception() unconditionally, matching every other do-while in fs/nfs/nfs4proc.c (nfs4_proc_symlink(), nfs4_proc_link(), etc.). The dentry/status separation introduced by that commit is preserved.

    Fixes: dd862da61e91 ("nfs: fix incorrect handling of large-number NFS errors in nfs4_do_mkdir()")
    Reported-and-tested-by: Jan Čípa
    Closes: https://lore.kernel.org/linux-nfs/CA+9S74hSp_tJu2Ffe2BPNC2T25gfkhgjjDkdgSsF5c2rnJq_wA@mail.gmail.com/
    Reviewed-by: NeilBrown
    Cc: stable@vger.kernel.org
    Signed-off-by: Igor Raits
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

commit 012d37a568bfbb2c9686f03ade75560bc7139956
Author: Michael Bommarito
Date:   Wed May 27 12:30:35 2026 -0400

    NFSv4/pNFS: reject zero-length r_addr in nfs4_decode_mp_ds_addr

    commit 41fe0f7b84f0cb822ae10ab08592996a592b2a25 upstream.

    nfs4_decode_mp_ds_addr() decodes the r_netid and r_addr opaques of a netaddr4 from a GETDEVICEINFO multipath-DS body, then immediately calls strrchr(buf, '.') to locate the port separator. Both decodes use xdr_stream_decode_string_dup(), and the current code checks only "nlen < 0" / "rlen < 0" before dereferencing the returned string.

    When the on-wire opaque has length zero, xdr_stream_decode_opaque_inline() returns 0 and xdr_stream_decode_string_dup() falls through to its "*str = NULL; return ret" tail, leaving buf NULL with a return value of 0. The "< 0" check does not catch this, and the next line is strrchr(NULL, '.'), a kernel NULL pointer dereference reachable from any pNFS-flexfile client mounted against a malicious or compromised metadata server.

    Reject the zero-length cases explicitly so the decoder fails with -EBADMSG (treated as a malformed GETDEVICEINFO body) instead of panicking the client.

    Cc: stable@vger.kernel.org
    Fixes: 6b7f3cf96364 ("nfs41: pull decode_ds_addr from file layout to generic pnfs")
    Assisted-by: Claude:claude-opus-4-7
    Signed-off-by: Michael Bommarito
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

commit d8c90c7cc061265d5f2813a1f5c82ef2f4707e67
Author: Michael Bommarito
Date:   Wed May 13 12:26:56 2026 -0400

    NFSv4/flexfiles: reject zero filehandle version count

    commit 2c6bb3c40bc24f6aa8dfbe6fe98c3ad6389203f2 upstream.

    ff_layout_alloc_lseg() decodes the filehandle-version array count from the flexfiles layout body. The value is used as the count for kzalloc_objs(), and the current code only rejects NULL. A zero count yields ZERO_SIZE_PTR, which can be stored in dss_info->fh_versions even though later flexfiles paths assume that at least one filehandle version exists.

    Reject fh_count == 0 before the allocation, matching the existing zero version_count validation in the flexfiles GETDEVICEINFO parser.

    A QEMU/KASAN run with a malformed flexfiles layout hit:

      KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
      RIP: 0010:ff_layout_encode_ff_layoutupdate.isra.0+0x15f/0x750
      ff_layout_encode_layoutreturn+0x683/0x970
      nfs4_xdr_enc_layoutreturn+0x278/0x3a0
      Kernel panic - not syncing: Fatal exception

    The patched kernel rejects the malformed layout without KASAN/oops/panic, and a valid fh_count=1 regression still opens, reads, and unmounts cleanly.

    Cc: stable@vger.kernel.org
    Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver")
    Assisted-by: Claude:claude-opus-4-7
    Signed-off-by: Michael Bommarito
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

commit 4367afc119c51e17a616f6908772b7e2c2c4013f
Author: Jeff Layton
Date:   Fri May 22 12:44:19 2026 -0400

    nfsd: reset write verifier on deferred writeback errors

    commit 2090b05803faab8a9fa62fbff871007862cac1b7 upstream.

    nfsd_vfs_write() and nfsd_commit() both call filemap_check_wb_err() to detect deferred writeback errors, but neither rotates the server's write verifier (nn->writeverf) when this check fails. Every other durable-storage-failure path in these functions calls commit_reset_write_verifier() before returning an error.

    The missing rotation means clients holding UNSTABLE write data under the current verifier will COMMIT, receive the unchanged verifier back, and conclude their data is durable -- silently dropping data that failed writeback. This violates the UNSTABLE+COMMIT durability contract (RFC 1813 §3.3.7, RFC 8881 §18.32).

    Add commit_reset_write_verifier() calls at both filemap_check_wb_err() error sites, matching the pattern used by adjacent error paths in the same functions. The helper already filters -EAGAIN and -ESTALE internally, so the calls are unconditionally safe.

    Reported-by: Chris Mason
    Fixes: 555dbf1a9aac ("nfsd: Replace use of rwsem with errseq_t")
    Cc: stable@vger.kernel.org
    Assisted-by: kres:claude-opus-4-6
    Signed-off-by: Jeff Layton
    Signed-off-by: Chuck Lever
    Signed-off-by: Greg Kroah-Hartman

commit 017a6150106b054cc84d1b0582d97bd3a74d4281
Author: Jeff Layton
Date:   Fri May 22 10:36:14 2026 -0400

    nfsd: avoid leaking pre-allocated openowner on unconfirmed retry race

    commit 57aee7a35bb12753057c5b65d72d1f46c0e95b07 upstream.

    When find_or_alloc_open_stateowner() encounters an unconfirmed owner, it calls release_openowner() and sets oo = NULL.
    Control then falls through past the `if (oo)` guard -- which would have freed any pre-allocated `new` -- and unconditionally executes `new = alloc_stateowner(...)`. If `new` was already allocated on a prior iteration, the pointer is silently overwritten and the previous allocation (slab object + owner name buffer) is leaked.

    This requires a race: two NFSv4.0 OPEN threads with the same owner string, where a concurrent thread inserts a new unconfirmed owner into the hash between retry iterations. The window is narrow but repeatable under adversarial conditions.

    Fix by adding `goto retry` after `oo = NULL` so the already-allocated `new` is reused on the next iteration rather than overwritten.

    Reported-by: Chris Mason
    Fixes: 23df17788c62 ("nfsd: perform all find_openstateowner_str calls in the one place.")
    Cc: stable@vger.kernel.org
    Assisted-by: kres:claude-opus-4-6
    Signed-off-by: Jeff Layton
    Signed-off-by: Chuck Lever
    Signed-off-by: Greg Kroah-Hartman

commit 0f28337f54cfb7f9777bbc2018be52b7c76c620e
Author: Dominik Woźniak
Date:   Thu May 21 17:46:56 2026 +0200

    nfsd: check get_user() return when reading princhashlen

    commit e186fa1c057f5eccb22afb1e83e34c0627085868 upstream.

    In __cld_pipe_inprogress_downcall(), the get_user() that reads princhashlen from the userspace cld_msg_v2 buffer does not check its return value. A failing copy leaves princhashlen with uninitialised stack contents, which are then used to drive memdup_user() and stored as princhash.len on the resulting reclaim record.

    The other get_user() calls in this function all check the return; only this one is missed, which is most likely a copy-paste oversight from when v2 upcalls were introduced.

    Mirror the existing pattern used a few lines above for namelen. namecopy is declared with __free(kfree) so the early return cleans up the already-allocated buffer automatically.

    Fixes: 6ee95d1c8991 ("nfsd: add support for upcall version 2")
    Cc: stable@vger.kernel.org
    Signed-off-by: Dominik Woźniak
    Reviewed-by: Jeff Layton
    Signed-off-by: Chuck Lever
    Signed-off-by: Greg Kroah-Hartman

commit dba7da4835de7b17259fa23797d7dbfd36304bf0
Author: Jeff Layton
Date:   Thu May 21 09:25:40 2026 -0400

    nfsd: fix inverted cp_ttl check in async copy reaper

    commit 0150459b05490b88b7e7378a31550a9e07b5517c upstream.

    nfsd4_async_copy_reaper() is supposed to keep completed async copy state around for NFSD_COPY_INITIAL_TTL (10) laundromat ticks so that OFFLOAD_STATUS can report the result, then reap the state once the countdown expires.

    The TTL predicate is inverted: `if (--copy->cp_ttl)` is true while ticks remain and false when the counter reaches zero. This causes the copy to be reaped on the very first tick (cp_ttl goes from 10 to 9, which is non-zero) instead of after all 10 ticks elapse. Once reaped, OFFLOAD_STATUS returns NFS4ERR_BAD_STATEID because the copy state has already been freed.

    Fix by negating the test so that cleanup runs when the TTL expires.

    Fixes: aa0ebd21df9c ("NFSD: Add nfsd4_copy time-to-live")
    Cc: stable@vger.kernel.org
    Reported-by: Chris Mason
    Assisted-by: kres:claude-opus-4-6
    Signed-off-by: Jeff Layton
    Signed-off-by: Chuck Lever
    Signed-off-by: Greg Kroah-Hartman

commit 136b416593f1349cf6f72c8e3d18f0f204ee8545
Author: Jeff Layton
Date:   Thu May 21 13:51:43 2026 -0400

    nfsd: fix posix_acl leak on SETACL decode failure

    commit 0853ac544c590880d797b04daa33fcb72b6be0e1 upstream.

    nfsaclsvc_decode_setaclargs() and nfs3svc_decode_setaclargs() each call nfs_stream_decode_acl() twice, first for NFS_ACL and then for NFS_DFACL. Each successful call transfers ownership of a freshly allocated posix_acl into argp->acl_access or argp->acl_default. If the first call succeeds but the second fails, the decoder returns false and argp->acl_access is left dangling.

    ACLPROC2_SETACL.pc_release was wired to nfssvc_release_attrstat and ACLPROC3_SETACL.pc_release was wired to nfs3svc_release_fhandle. Both only call fh_put() and have no knowledge of the ACL fields on argp. The posix_acl_release() pairs sat at the out: labels inside nfsacld_proc_setacl() and nfsd3_proc_setacl(), but svc_process() skips pc_func when pc_decode returns false, so that cleanup is unreachable on decode failure:

      svc_process_common()
        pc_decode()    /* decode_setaclargs: false */
        /* pc_func skipped */
        pc_release()   /* fh_put only -- ACLs leaked */

    The orphaned posix_acl is leaked for the lifetime of the server.

    Fix by adding nfsaclsvc_release_setacl() and nfs3svc_release_setacl(), which release both argp->acl_access and argp->acl_default in addition to fh_put(), and wiring them as pc_release for their respective SETACL procedures. pc_release runs on every path svc_process() takes after decode, including decode failure, so the posix_acl_release() pairs are removed from the proc functions' out: labels to keep ownership in one place. This matches the existing release_getacl() pattern used by the sibling GETACL procedures.

    Fixes: a257cdd0e217 ("[PATCH] NFSD: Add server support for NFSv3 ACLs.")
    Cc: stable@vger.kernel.org
    Assisted-by: kres:claude-opus-4-7
    Signed-off-by: Jeff Layton
    Signed-off-by: Chuck Lever
    Signed-off-by: Greg Kroah-Hartman

commit c8a24effd96d4779e2ad779654682304491c55a5
Author: Guannan Wang
Date:   Thu May 21 16:03:32 2026 +0800

    NFSD: Fix SECINFO_NO_NAME decode error cleanup

    commit 9e18e83b8846a5c3fe13fc8a464b4865d33996c6 upstream.

    nfsd4_decode_secinfo_no_name() currently initializes sin_exp after decoding sin_style. If the XDR stream is truncated, the decoder returns nfserr_bad_xdr before sin_exp is initialized.

    Since commit 3fdc54646234 ("NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing"), the inline iops array is not cleared between RPC calls.
    A failed SECINFO_NO_NAME decode can therefore leave sin_exp holding stale union contents from a previous operation. The error response path still invokes nfsd4_secinfo_no_name_release(), which calls exp_put() on a non-NULL sin_exp.

    Initialize sin_exp before the first failable decode step, matching nfsd4_decode_secinfo().

    Fixes: 3fdc54646234 ("NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing")
    Cc: stable@vger.kernel.org
    Signed-off-by: Guannan Wang
    Signed-off-by: Chuck Lever
    Signed-off-by: Greg Kroah-Hartman

commit 6a946038f2a5a8c29048c6af369d4e391448a5c5
Author: Johan Hovold
Date:   Mon May 11 16:37:12 2026 +0200

    i2c: core: fix adapter registration race

    commit ba14d7cf2fe7284610a29854bdff22b2537d3ce6 upstream.

    Adapters can be looked up based on their id using i2c_get_adapter() which takes a reference to the embedded struct device.

    Make sure that the adapter (including its struct device) has been initialised before adding it to the IDR to avoid accessing uninitialised data which could, for example, lead to NULL-pointer dereferences or use-after-free.

    Note that the i2c-dev chardev, which is registered from a bus notifier, currently uses i2c_get_adapter() so the adapter needs to be added to the IDR before registration.

    Fixes: 6e13e6418418 ("i2c: Add i2c_add_numbered_adapter()")
    Cc: stable@vger.kernel.org # 2.6.22
    Signed-off-by: Johan Hovold
    Signed-off-by: Wolfram Sang
    Signed-off-by: Greg Kroah-Hartman

commit fc6aa9bdbae60cdd8ae2358c7ee87c6fdb1132a4
Author: Steffen Persvold
Date:   Fri Jun 12 18:40:41 2026 +0200

    fbdev: modedb: Fix misaligned fields in the 1920x1080-60 mode

    commit d894c48a57d78206e4df9c90d4acfaf39394806a upstream.

    The 1920x1080@60 modedb entry has one too many initializers before its sync field: a stray "0" occupies the sync slot, which shifts the remaining values by one field. The entry therefore decodes as sync = 0, vmode = FB_SYNC_HOR_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT (0x3, i.e. FB_VMODE_INTERLACED | FB_VMODE_DOUBLE), and flag = FB_VMODE_NONINTERLACED, instead of the intended sync = positive H/V, vmode = non-interlaced.

    fb_find_mode() then returns a 1920x1080 mode flagged as interlaced + doublescan with active-low syncs. Drivers that honour var->vmode and var->sync when programming display timing enable doublescan and the wrong sync polarity, corrupting the output.

    Drop the stray initializer so sync and vmode hold their intended values (positive H/V sync, non-interlaced), matching the adjacent 1920x1200 entry.

    Fixes: c8902258b2b8 ("fbdev: modedb: Add 1920x1080 at 60 Hz video mode")
    Cc: stable@vger.kernel.org
    Signed-off-by: Steffen Persvold
    Signed-off-by: Helge Deller
    Signed-off-by: Greg Kroah-Hartman

commit 4d418cf8daf57e454b4d855bf9b2419fd8e6a540
Author: Tuo Li
Date:   Wed Jun 10 10:50:14 2026 +0800

    fbdev: modedb: fix a possible UAF in fb_find_mode()

    commit 85b6256469cebdac395e7447147e06b2e151014f upstream.

    If mode_option is NULL, it is assigned from mode_option_buf:

      if (!mode_option) {
        fb_get_options(NULL, &mode_option_buf);
        mode_option = mode_option_buf;
      }

    Later, name is assigned from mode_option:

      const char *name = mode_option;

    However, mode_option_buf is freed before name is no longer used:

      kfree(mode_option_buf);

    while name is still accessed by:

      if ((name_matches(db[i], name, namelen) ||

    Since name aliases mode_option_buf, this may result in a use-after-free.

    Fix this by extending the lifetime of mode_option_buf until the end of the function by using scope-based resource management for cleanup.

    Signed-off-by: Tuo Li
    Cc: stable@vger.kernel.org # v6.5+
    Signed-off-by: Helge Deller
    Signed-off-by: Greg Kroah-Hartman

commit eea16b6f805c0b1fb2f72f0f771088ea45356956
Author: Ian Bridges
Date:   Wed Jun 24 23:13:12 2026 -0500

    fbdev: Fix fb_new_modelist to prevent null-ptr-deref in fb_videomode_to_var

    commit 7f08fc10fa3d3366dc3af723970bd03d7d6d10e3 upstream.

    info->var, a framebuffer's current mode, is expected to have a matching entry in info->modelist. var_to_display() relies on this and treats a failed fb_match_mode() as "This should not happen". fb_set_var() keeps it true by adding the mode to the list on every change, and do_register_framebuffer() does the same at registration.

    store_modes() replaces the modelist from userspace. fb_new_modelist() validates the new modes but does not check that info->var still has a match. It relies on fbcon_new_modelist() to re-point consoles, but that only handles consoles mapped to the framebuffer. With fbcon unbound there are none, so info->var is left describing a mode that is no longer in the list.

    A later console takeover runs var_to_display(), where fb_match_mode() returns NULL and leaves fb_display[i].mode NULL. fbcon_switch() passes it to display_to_var(), and fb_videomode_to_var() dereferences the NULL mode.

    Keep the current mode in the list in fb_new_modelist(), the same way fb_set_var() does.

    Cc: stable@vger.kernel.org
    Assisted-by: Claude:claude-opus-4-8
    Signed-off-by: Ian Bridges
    Signed-off-by: Helge Deller
    Signed-off-by: Greg Kroah-Hartman

commit 7643e5622994f9b207445f9cd89a02223ce6a99f
Author: Vivian Wang
Date:   Tue Mar 3 13:29:46 2026 +0800

    riscv: kfence: Call mark_new_valid_map() for kfence_unprotect()

    commit 8d6c8c40e733b3fcaf92fed0a078bba2f6941a3b upstream.

    In kfence_protect_page(), which kfence_unprotect() calls, we cannot send IPIs to other CPUs to ask them to flush TLB. This may lead to those CPUs spuriously faulting on a recently allocated kfence object despite it being valid, leading to false positive use-after-free reports.

    Fix this by calling mark_new_valid_map() so that the page fault handling code path notices the spurious fault and flushes TLB then retries the access.

    Update the comment in handle_exception to indicate that new_valid_map_cpus_check also handles kfence_unprotect() spurious faults.

    Note that kfence_protect() has the same stale TLB entries problem, but that leads to false negatives, which is fine with kfence.

    Cc: stable@vger.kernel.org
    Reported-by: Yanko Kaneti
    Fixes: b3431a8bb336 ("riscv: Fix IPIs usage in kfence_protect_page()")
    Signed-off-by: Vivian Wang
    Link: https://patch.msgid.link/20260303-handle-kfence-protect-spurious-fault-v2-2-f80d8354d79d@iscas.ac.cn
    Signed-off-by: Paul Walmsley
    Signed-off-by: Greg Kroah-Hartman

commit 3b33dbb43e21a8d1248dbe906d9e27f640a6001a
Author: Vivian Wang
Date:   Tue Mar 3 13:29:45 2026 +0800

    riscv: mm: Extract helper mark_new_valid_map()

    commit 9ee25d0a70ff4494b4e1d266b962d0a574ef318a upstream.

    In preparation of a future patch using the same mechanism for non-vmalloc addresses, extract the mark_new_valid_map() helper from flush_cache_vmap().

    No functional change intended.

    Cc: stable@vger.kernel.org
    Signed-off-by: Vivian Wang
    Link: https://patch.msgid.link/20260303-handle-kfence-protect-spurious-fault-v2-1-f80d8354d79d@iscas.ac.cn
    Signed-off-by: Paul Walmsley
    Signed-off-by: Greg Kroah-Hartman

commit 2205275be9be981e70ff29610b0117d8853fac70
Author: Wentao Liang
Date:   Tue Apr 7 07:30:25 2026 +0000

    power: reset: linkstation-poweroff: fix use-after-free in the linkstation_poweroff_init()

    commit 8eec545cde69e46e9a1d2b7d915ce4f5df85b3bd upstream.

    Move of_node_put(dn) after the of_match_node() call, which still needs the node pointer. The node reference is correctly released after use.

    Fixes: e2f471efe1d6 ("power: reset: linkstation-poweroff: prepare for new devices")
    Cc: stable@vger.kernel.org
    Signed-off-by: Wentao Liang
    Link: https://patch.msgid.link/20260407073025.271865-1-vulab@iscas.ac.cn
    Signed-off-by: Sebastian Reichel
    Signed-off-by: Greg Kroah-Hartman

commit 720949ed666f34ff28ffdfe1471a5861d1e41fdf
Author: Ashutosh Desai
Date:   Fri May 1 13:35:32 2026 -0700

    KVM: SVM: Fix page overflow in sev_dbg_crypt() for ENCRYPT path

    commit 78ee2d50185a037b3d2452a97f3dad69c3f7f389 upstream.

    In sev_dbg_crypt(), the per-iteration transfer length is bounded by the source page offset (PAGE_SIZE - s_off) but not by the destination page offset (PAGE_SIZE - d_off). When d_off > s_off, the encrypt path (__sev_dbg_encrypt_user) performs a read-modify-write using a single-page intermediate buffer (dst_tpage):

      1. __sev_dbg_decrypt() expands the size to round_up(len + (d_off & 15), 16) before issuing the PSP command. If len + (d_off & 15) > PAGE_SIZE, the PSP writes beyond the end of the 4096-byte dst_tpage allocation.

      2. The subsequent memcpy()/copy_from_user() into page_address(dst_tpage) + (d_off & 15) of 'len' bytes overflows by up to 15 bytes under the same condition.

    Trigger example: s_off = 0, d_off = 1, debug.len = PAGE_SIZE - the PSP is instructed to write round_up(4097, 16) = 4112 bytes to a 4096-byte buffer.

    Fix by also bounding len by (PAGE_SIZE - d_off), the same check that sev_send_update_data() already performs for its single-page guest region.

    ==================================================================
    BUG: KASAN: slab-use-after-free in sev_dbg_crypt+0x993/0xd10 [kvm_amd]
    Write of size 4095 at addr ff110062293bb009 by task sev_dbg_test/228214

    CPU: 96 UID: 0 PID: 228214 Comm: sev_dbg_test Tainted: G U W 7.0.0-smp--5ce9b0c48211-dbg #156 PREEMPTLAZY
    Tainted: [U]=USER, [W]=WARN
    Hardware name: Google Astoria/astoria, BIOS 0.20250817.1-0 08/25/2025
    Call Trace:
     dump_stack_lvl+0x54/0x70
     print_report+0xbc/0x260
     kasan_report+0xa2/0xd0
     kasan_check_range+0x25f/0x2c0
     __asan_memcpy+0x40/0x70
     sev_dbg_crypt+0x993/0xd10 [kvm_amd]
     sev_mem_enc_ioctl+0x33c/0x450 [kvm_amd]
     kvm_vm_ioctl+0x65d/0x6d0 [kvm]
     __se_sys_ioctl+0xb2/0x100
     do_syscall_64+0xe8/0x870
     entry_SYSCALL_64_after_hwframe+0x4b/0x53

    The buggy address belongs to the physical page:
    page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x7fe72b6a0 pfn:0x62293bb
    memcg:ff11000112827d82
    flags: 0x1400000000000000(node=1|zone=1)
    raw: 1400000000000000 0000000000000000 dead000000000122 0000000000000000
    raw: 00000007fe72b6a0 0000000000000000 00000001ffffffff ff11000112827d82
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
     ff110062293bbf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     ff110062293bbf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    >ff110062293bc000: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                       ^
     ff110062293bc080: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
     ff110062293bc100: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
    ==================================================================
    Disabling lock debugging due to kernel taint

    Fixes: 24f41fb23a39 ("KVM: SVM: Add support for SEV DEBUG_DECRYPT command")
    Fixes: 7d1594f5d94b ("KVM: SVM: Add support for SEV DEBUG_ENCRYPT command")
    Cc: stable@vger.kernel.org
    Signed-off-by: Ashutosh Desai
    [sean: add sample KASAN splat, Fixes, and stable@]
    Link: https://patch.msgid.link/20260501203537.2120074-2-seanjc@google.com
    Signed-off-by: Sean Christopherson
    Signed-off-by: Greg Kroah-Hartman

commit e36095d8d922bb26ce860231aacf0cd14edea07c
Author: Hyunwoo Kim
Date:   Sat Jun 6 23:44:52 2026 +0900

    KVM: x86: hyper-v: Bound the bank index when querying sparse banks

    commit 4721f8160f17554b003e8928bb61e6c9b2fe92a3 upstream.

    When checking if a VP ID is included in a sparse bank set, explicitly check that the ID can actually be contained in a sparse bank (the TLFS allows for a maximum of 64 banks of 64 vCPUs each). When handling a paravirtual TLB flush for L2, the VP ID is copied verbatim from the enlightened VMCS, without any bounds check, i.e. isn't guaranteed to be under the limit of 4096.

    Failure to check the bounds of the VP ID leads to an out-of-bounds read when testing the sparse bank, and super strictly speaking could lead to KVM performing an unnecessary TLB flush for an L2 vCPU.

    ==================================================================
    BUG: KASAN: use-after-free in hv_is_vp_in_sparse_set+0x85/0x100 [kvm]
    Read of size 8 at addr ffff88811ba5f598 by task hyperv_evmcs/2802

    CPU: 12 UID: 1000 PID: 2802 Comm: hyperv_evmcs Not tainted 7.1.0-rc2 #7 PREEMPT
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
    Call Trace:
     dump_stack_lvl+0x51/0x60
     print_report+0xcb/0x5d0
     kasan_report+0xb4/0xe0
     kasan_check_range+0x35/0x1b0
     hv_is_vp_in_sparse_set+0x85/0x100 [kvm]
     kvm_hv_flush_tlb+0xe9e/0x16c0 [kvm]
     kvm_hv_hypercall+0xe6b/0x1e60 [kvm]
     vmx_handle_exit+0x485/0x1b60 [kvm_intel]
     kvm_arch_vcpu_ioctl_run+0x22e3/0x5070 [kvm]
     kvm_vcpu_ioctl+0x5d0/0x10c0 [kvm]
     __x64_sys_ioctl+0x129/0x1a0
     do_syscall_64+0xb9/0xcf0
     entry_SYSCALL_64_after_hwframe+0x4b/0x53
    RIP: 0033:0x7f0e62d1a9bf

    The buggy address belongs to the physical page:
    page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffffffffffffffff pfn:0x11ba5f
    flags: 0x4000000000000000(zone=1)
    raw: 4000000000000000 0000000000000000 00000000ffffffff 0000000000000000
    raw: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
     ffff88811ba5f480: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ffff88811ba5f500: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    >ffff88811ba5f580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                ^
     ffff88811ba5f600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
     ffff88811ba5f680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
    ==================================================================
    Disabling lock debugging due to kernel taint

    Opportunistically add a compile time assertion to ensure the maximum number of sparse banks exactly matches the number of possible bits in the passed in mask.
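The bound and the compile-time assertion described above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the KVM implementation: the `SKETCH_*` constants, `sketch_popcount64()` and `vp_in_sparse_set()` are hypothetical names, and the compressed bank layout (one stored bank per set bit of the mask) is an assumption of the model. Only the limits of 64 banks and 64 vCPUs per bank come from the changelog.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits taken from the changelog: 64 banks of 64 vCPUs each (4096 VP IDs). */
#define SKETCH_VCPUS_PER_BANK   64
#define SKETCH_MAX_SPARSE_BANKS 64

/* Compile-time check: exactly one bit in the 64-bit mask per sparse bank. */
static_assert(SKETCH_MAX_SPARSE_BANKS == sizeof(uint64_t) * CHAR_BIT,
	      "valid_bank_mask must have one bit per sparse bank");

/* Portable population count (hypothetical helper). */
unsigned int sketch_popcount64(uint64_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * sparse_banks[] holds one 64-bit word per bit set in valid_bank_mask,
 * in ascending bank order (assumed layout for this model).
 */
bool vp_in_sparse_set(uint32_t vp_id, uint64_t valid_bank_mask,
		      const uint64_t *sparse_banks)
{
	uint32_t bank = vp_id / SKETCH_VCPUS_PER_BANK;
	uint32_t bit = vp_id % SKETCH_VCPUS_PER_BANK;
	unsigned int idx;

	/* The bound: a bank index of 64 or more cannot be tested in the mask. */
	if (bank >= SKETCH_MAX_SPARSE_BANKS)
		return false;
	if (!(valid_bank_mask & (1ULL << bank)))
		return false;

	/* Index of this bank among the banks that are actually present. */
	idx = sketch_popcount64(valid_bank_mask & ((1ULL << bank) - 1));
	return (sparse_banks[idx] >> bit) & 1;
}
```

In this model a VP ID of 4096 or above is rejected before the mask is touched, which is the case the unbounded lookup got wrong.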
    Cc: stable@vger.kernel.org
    Fixes: c58a318f6090 ("KVM: x86: hyper-v: L2 TLB flush")
    Signed-off-by: Hyunwoo Kim
    Reviewed-by: Vitaly Kuznetsov
    Link: https://patch.msgid.link/aiQyZIJtO-2Aj_xN@v4bel
    [sean: add KASAN splat, drop comment, add assert, massage changelog]
    Signed-off-by: Sean Christopherson
    Signed-off-by: Greg Kroah-Hartman

commit f9b57a0015c241274651f4b36627f56b1b5a8651
Author: Jonas Jelonek
Date:   Mon Jun 8 09:37:29 2026 +0000

    MIPS: smp: report dying CPU to RCU in stop_this_cpu()

    commit 9f3f3bdc6d9dac1a5a8262ee7ad0f2ff1527a7e7 upstream.

    smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The
    function marks the CPU offline for the scheduler via
    set_cpu_online(false) but never informs RCU, so RCU keeps expecting a
    quiescent state from CPUs that are now spinning forever with
    interrupts disabled.

    As long as nothing waits for an RCU grace period after smp_send_stop()
    this is harmless, which is why it went unnoticed. Since commit
    91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on
    PREEMPT_RT") however, irq_work_sync() calls synchronize_rcu() on
    architectures without an irq_work self-IPI, i.e. where
    arch_irq_work_has_interrupt() returns false. That is the asm-generic
    default used by MIPS. Any irq_work_sync() issued in the
    reboot/shutdown path after smp_send_stop() then blocks on a grace
    period that can never complete, hanging the reboot:

      WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
      ...
      rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
      rcu:     Offline CPU 1 blocking current GP.
      rcu:     Offline CPU 2 blocking current GP.
      rcu:     Offline CPU 3 blocking current GP.

    This issue was noticed on several Realtek MIPS switch SoCs (MIPS
    interAptiv) and came up during kernel bump downstream in OpenWrt from
    6.18.33 to 6.18.34, after the backport of the patch to the 6.18 stable
    branch. The patch also has been backported all the way back to 6.1.
    Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring
    the generic CPU-hotplug offline path, so RCU stops waiting on the
    parked CPUs and grace periods can still complete. MIPS shuts down all
    CPUs here without going through the CPU-hotplug mechanism, so this
    report is not otherwise issued.

    Reporting a dying CPU to RCU outside the regular hotplug offline path
    is not unprecedented: arm64 does the same in cpu_die_early(). There it
    is an exception for a CPU that was coming online and is aborting
    bringup, rather than the default shutdown action as on MIPS.

    Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
    CC: stable@vger.kernel.org
    Signed-off-by: Jonas Jelonek
    Signed-off-by: Thomas Bogendoerfer
    Signed-off-by: Greg Kroah-Hartman

commit 6dbe9443d9f5f7fb6d319a7b77108853ae6c6bea
Author: Yizhou Zhao
Date:   Thu May 28 13:39:16 2026 +0800

    9p: avoid putting oldfid in p9_client_walk() error path

    commit 1a3860d46e3eb47dbd60339783cdad7904486b9f upstream.

    When p9_client_walk() is called with clone set to false, fid aliases
    oldfid. If the walk subsequently fails after the request has been
    sent, the error path jumps to clunk_fid, which currently calls
    p9_fid_put(fid) unconditionally.

    This drops a reference to oldfid even though ownership of oldfid
    remains with the caller. If this is the last reference, oldfid can be
    clunked and destroyed while the caller still expects it to be valid. A
    later use or put of oldfid can then trigger a use-after-free or
    refcount underflow.

    Fix this by only putting fid in the clunk_fid error path when it does
    not alias oldfid, matching the existing guard in the error path below.

    This can be triggered when a multi-component walk is split into
    multiple p9_client_walk() calls and a later non-cloning walk fails. A
    reproducer and refcount warning logs are available on request.
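
The aliasing guard described for the 9p walk error path can be illustrated with a toy reference count. This is only a sketch under stated assumptions: `toy_fid`, `toy_fid_put` and `toy_walk` are hypothetical names, and the real function's request handling and clunk logic are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a fid: only the reference count is modelled. */
struct toy_fid {
    int refs;
};

static void toy_fid_put(struct toy_fid *fid)
{
    assert(fid->refs > 0);   /* an underflow here models the reported bug */
    fid->refs--;
}

/*
 * Hypothetical model of a walk. With clone == false the result aliases
 * oldfid; with clone == true it is a separate fid owned by this function
 * until it is returned. walk_fails simulates an error from the server.
 */
static struct toy_fid *toy_walk(struct toy_fid *oldfid, struct toy_fid *newfid,
                                bool clone, bool walk_fails)
{
    struct toy_fid *fid = clone ? newfid : oldfid;

    if (walk_fails) {
        /* Only drop a reference this function owns. */
        if (fid != oldfid)
            toy_fid_put(fid);
        return NULL;
    }
    return fid;
}
```

In this model a failed non-cloning walk leaves the caller's reference on `oldfid` untouched, while a failed cloning walk still releases the new fid.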
    Fixes: b48dbb998d70 ("9p fid refcount: add p9_fid_get/put wrappers")
    Cc: stable@vger.kernel.org
    Reported-by: Yuxiang Yang
    Reported-by: Ao Wang
    Reported-by: Xuewei Feng
    Reported-by: Qi Li
    Reported-by: Ke Xu
    Assisted-by: GLM 5.1
    Signed-off-by: Yizhou Zhao
    Message-ID: <20260528053918.53550-1-zhaoyz24@mails.tsinghua.edu.cn>
    Signed-off-by: Dominique Martinet
    Signed-off-by: Greg Kroah-Hartman

commit 4cd57ebee395041099fcdfcabb00749ce38d8b27
Author: Zhang Cen
Date:   Sun May 24 19:12:48 2026 +0800

    ocfs2: reject oversized group bitmap descriptors

    commit 9bd541e09dffff27e5bec0f9f45b0228173a5375 upstream.

    ocfs2_validate_gd_parent() only bounds bg_bits against the parent
    allocator's chain geometry. A malicious descriptor can still claim a
    bg_size/bg_bits pair that exceeds the bitmap bytes that physically fit
    in the group descriptor block, so later bitmap scans and bit updates
    can run past bg_bitmap.

    Add a physical-cap check based on ocfs2_group_bitmap_size() for the
    parent allocator type and reject descriptors whose bg_size or bg_bits
    exceed that capacity. Keep the existing chain geometry check so both
    the on-disk bitmap layout and the allocator metadata must agree before
    the descriptor is used.

    Validation reproduced this kernel report:

    KASAN use-after-free in _find_next_bit+0x7f/0xc0
    Read of size 8
    Call trace:
     dump_stack_lvl+0x66/0xa0 (?:?)
     print_report+0xd0/0x630 (?:?)
     _find_next_bit+0x7f/0xc0 (?:?)
     srso_alias_return_thunk+0x5/0xfbef5 (?:?)
     __virt_addr_valid+0x188/0x2f0 (?:?)
     kasan_report+0xe4/0x120 (?:?)
     ocfs2_find_max_contig_free_bits+0x35/0x70 (fs/ocfs2/suballoc.c:1375)
     ocfs2_block_group_set_bits+0x472/0x4b0 (fs/ocfs2/suballoc.c:1457)
     ocfs2_cluster_group_search+0x16b/0x440 (fs/ocfs2/suballoc.c:86)
     ocfs2_bg_discontig_fix_result+0x1ef/0x230 (fs/ocfs2/suballoc.c:1786)
     ocfs2_search_chain+0x8f8/0x10a0 (fs/ocfs2/suballoc.c:1886)
     get_page_from_freelist+0x70e/0x2370 (?:?)
     lock_release+0xc6/0x290 (?:?)
     do_raw_spin_unlock+0x9a/0x100 (?:?)
     kasan_unpoison+0x27/0x60 (?:?)
     __bfs+0x147/0x240 (?:?)
     get_page_from_freelist+0x83d/0x2370 (?:?)
     ocfs2_claim_suballoc_bits+0x38c/0xe70 (fs/ocfs2/suballoc.c:96)
     sched_domains_numa_masks_clear+0x70/0xd0 (?:?)
     check_irq_usage+0xe8/0xb70 (?:?)
     __ocfs2_claim_clusters+0x18d/0x4c0 (fs/ocfs2/suballoc.c:2497)
     check_path+0x24/0x50 (?:?)
     rcu_is_watching+0x20/0x50 (?:?)
     check_prev_add+0xfd/0xd00 (?:?)
     ocfs2_add_clusters_in_btree+0x17d/0x810 (fs/ocfs2/suballoc.c:?)
     __folio_batch_add_and_move+0x1f5/0x3d0 (?:?)
     ocfs2_add_inode_data+0xd9/0x120 (fs/ocfs2/suballoc.c:?)
     filemap_add_folio+0x105/0x1f0 (?:?)
     ocfs2_write_begin_nolock+0x29f7/0x2f80 (fs/ocfs2/suballoc.c:3043)
     ocfs2_read_inode_block+0xb5/0x110 (fs/ocfs2/suballoc.c:?)
     down_write+0xf5/0x180 (?:?)
     ocfs2_write_begin+0x180/0x240 (fs/ocfs2/suballoc.c:?)
     __mark_inode_dirty+0x758/0x9a0 (?:?)
     inode_to_bdi+0x41/0x90 (?:?)
     balance_dirty_pages_ratelimited_flags+0xf8/0x1d0 (?:?)
     generic_perform_write+0x252/0x440 (?:?)
     mnt_put_write_access_file+0x16/0x70 (?:?)
     file_update_time_flags+0xe4/0x200 (?:?)
     ocfs2_file_write_iter+0x80a/0x1320 (fs/ocfs2/suballoc.c:?)
     lock_acquire+0x184/0x2f0 (?:?)
     ksys_write+0xd2/0x170 (?:?)
     apparmor_file_permission+0xf5/0x310 (?:?)
     read_zero+0x8d/0x140 (?:?)
     lock_is_held_type+0x8f/0x100 (?:?)

    Link: https://lore.kernel.org/20260524111248.1429884-1-rollkingzzc@gmail.com
    Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
    Assisted-by: Codex:gpt-5.5
    Signed-off-by: Zhang Cen
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Cc: Changwei Ge
    Cc: Jun Piao
    Cc: Heming Zhao
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

commit 104d100212396801f1d9d388282f746e23e2bfd6
Author: Yuho Choi
Date:   Mon Jun 1 14:32:47 2026 -0400

    rpmsg: char: Fix use-after-free on probe error path

    commit 1ff3f528e67d20e2b1483dcaba899dc7832b2e6b upstream.

    rpmsg_chrdev_probe() stores the newly allocated eptdev in the default
    endpoint's priv pointer before calling rpmsg_chrdev_eptdev_add().
    If rpmsg_chrdev_eptdev_add() then fails, its error path frees eptdev
    while the default endpoint may still dispatch callbacks with the stale
    priv pointer.

    Avoid publishing eptdev through the default endpoint until
    rpmsg_chrdev_eptdev_add() succeeds.

    Messages received before the priv pointer is published should be
    ignored by rpmsg_ept_cb(). Flow-control updates can hit
    rpmsg_ept_flow_cb() in the same window, so make both callbacks return
    success when priv is NULL.

    Fixes: bc69d1066569 ("rpmsg: char: Introduce the "rpmsg-raw" channel")
    Signed-off-by: Yuho Choi
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20260601183247.1962010-1-dbgh9129@gmail.com
    Signed-off-by: Mathieu Poirier
    Signed-off-by: Greg Kroah-Hartman

commit 369496d885b4cf6e8647cf4dc5cf3ac68fdf37a1
Author: Wentao Liang
Date:   Wed Apr 8 15:45:34 2026 +0000

    fpga: region: fix use-after-free in child_regions_with_firmware()

    commit 54f3c5643ec523a04b6ec0e7c19eb10f5ebebdd3 upstream.

    Move of_node_put(child_region) after the error print to avoid
    accessing freed memory when pr_err() references child_region.

    Fixes: 0fa20cdfcc1f ("fpga: fpga-region: device tree control for FPGA")
    Cc: stable@vger.kernel.org
    Signed-off-by: Wentao Liang
    [ Yilun: Fix the Fixes tag ]
    Reviewed-by: Xu Yilun
    Link: https://lore.kernel.org/r/20260408154534.404327-1-vulab@iscas.ac.cn
    Signed-off-by: Xu Yilun
    Signed-off-by: Greg Kroah-Hartman

commit b3a3831b2eb884641906fc5e46207b205b6aea13
Author: Qingshuang Fu
Date:   Thu Jun 18 10:13:52 2026 +0800

    irqchip/imgpdc: Fix resource leak, add missing chained handler cleanup on remove

    commit 37738fdf2ab1e504d1c63ce5bc0aeb6452d8f057 upstream.

    The driver allocates domain generic chips using
    irq_alloc_domain_generic_chips() during probe and sets up chained
    handlers using irq_set_chained_handler_and_data(). However, on driver
    removal, the generic chips are not freed and the chained handlers are
    not removed.
    The generic chips remain on the global gc_list and may later be
    accessed by generic interrupt chip suspend, resume, or shutdown
    callbacks after the driver has been removed, potentially resulting in
    a use-after-free and kernel crash.

    The chained handlers that were installed in probe for peripheral and
    syswake interrupts are also left dangling, which can lead to spurious
    interrupts accessing freed memory.

    Fix these issues by:

      - Setting IRQ_DOMAIN_FLAG_DESTROY_GC flag in domain->flags, so the
        core code automatically removes generic chips when
        irq_domain_remove() is called

      - Clearing all chained handlers with NULL in pdc_intc_remove()

    Fixes: b6ef9161e43a ("irq-imgpdc: add ImgTec PDC irqchip driver")
    Signed-off-by: Qingshuang Fu
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org
    Link: https://patch.msgid.link/20260618021352.661773-1-fffsqian@163.com
    Signed-off-by: Greg Kroah-Hartman

commit 200e7637f4d6a1342987045eea72641524f909dc
Author: Wentao Liang
Date:   Mon May 18 13:10:36 2026 +0000

    pNFS: Fix use-after-free in pnfs_update_layout()

    commit 13e198a90ca4050f4bee8a3f23680389a6563ccc upstream.

    When hitting the NFS_LAYOUT_RETURN branch in pnfs_update_layout(), the
    code calls pnfs_prepare_to_retry_layoutget(lo). If it succeeds,
    pnfs_put_layout_hdr(lo) is called before trace_pnfs_update_layout(),
    which still references 'lo'. This results in a use-after-free when the
    tracepoint accesses lo's fields.

    Fix this by moving the tracepoint call before pnfs_put_layout_hdr(lo).

    Fixes: 2c8d5fc37fe2 ("pNFS: Stricter ordering of layoutget and layoutreturn")
    Cc: stable@vger.kernel.org
    Signed-off-by: Wentao Liang
    Signed-off-by: Anna Schumaker
    Signed-off-by: Greg Kroah-Hartman

commit 90e254f18b8c224460082329dd5c42fd30995c2f
Author: Huacai Chen
Date:   Thu Jun 25 13:03:49 2026 +0800

    LoongArch: Report dying CPU to RCU in stop_this_cpu()

    commit f2539c56c74691e7a88af6372ba2b48c06ed2fe4 upstream.
    This is a port of MIPS commit 9f3f3bdc6d9dac1 ("MIPS: smp: report
    dying CPU to RCU in stop_this_cpu()").

    smp_send_stop() parks all secondary CPUs in stop_this_cpu(). And the
    function marks the CPU offline for the scheduler via
    set_cpu_online(false) but never informs RCU, so RCU keeps expecting a
    quiescent state from CPUs that are now spinning forever with
    interrupts disabled.

    As long as nothing waits for an RCU grace period after smp_send_stop()
    this is harmless, which is why it went unnoticed. However, since
    commit 91840be8f710370 ("irq_work: Fix use-after-free in
    irq_work_single() on PREEMPT_RT"), irq_work_sync() calls
    synchronize_rcu() on architectures without an irq_work self-IPI, i.e.
    where arch_irq_work_has_interrupt() returns false. Any irq_work_sync()
    issued in the reboot/shutdown/halt path after smp_send_stop() then
    blocks on a grace period that can never complete, hanging the reboot:

      WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
      ...
      rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
      rcu:     Offline CPU 1 blocking current GP.
      rcu:     Offline CPU 2 blocking current GP.
      rcu:     Offline CPU 3 blocking current GP.

    This issue needs some hacks to reproduce, and it was not noticed on
    LoongArch because arch_irq_work_has_interrupt() usually returns true.

    Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring
    the generic CPU-hotplug offline path, so RCU stops waiting on the
    parked CPUs and grace periods can still complete. LoongArch shuts down
    all CPUs here without going through the CPU-hotplug mechanism, so this
    report is not otherwise issued.
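
The reason a parked CPU stalls the grace period can be shown with a deliberately tiny model. This is a sketch, assuming a bitmask view of the problem: `toy_rcu`, `toy_gp_can_complete` and `toy_report_cpu_dead` are hypothetical names and do not reflect how the kernel's RCU implementation tracks CPUs internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: bit N set means "CPU N". Not the kernel's real RCU state. */
struct toy_rcu {
    unsigned int watched;    /* CPUs still expected to report a quiescent state */
    unsigned int quiescent;  /* CPUs that reported one in this grace period */
};

/* A grace period can end once every watched CPU has reported. */
static bool toy_gp_can_complete(const struct toy_rcu *rcu)
{
    return (rcu->watched & ~rcu->quiescent) == 0;
}

/* Stand-in for reporting a dying CPU: stop waiting on it. */
static void toy_report_cpu_dead(struct toy_rcu *rcu, unsigned int cpu)
{
    rcu->watched &= ~(1u << cpu);
}
```

With CPUs 1 to 3 parked and never reporting, `toy_gp_can_complete()` stays false until each of them is reported dead, which mirrors the "Offline CPU N blocking current GP" stall in the log above.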
    Cc:
    Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
    Reviewed-by: Guo Ren
    Signed-off-by: Huacai Chen
    Signed-off-by: Greg Kroah-Hartman

commit e18769616fd5a90ec1e12aabbba544c488284292
Author: Doruk Tan Ozturk
Date:   Wed Jun 17 09:58:18 2026 +0200

    tipc: fix slab-use-after-free Read in tipc_aead_decrypt_done

    commit bda3348872a2ef0d19f2df6aa8cb5025adce2f20 upstream.

    tipc_aead_decrypt() goes straight from tipc_bearer_hold(b) to
    crypto_aead_decrypt(req) without taking a reference on the netns,
    unlike the encrypt path. When crypto_aead_decrypt() is offloaded
    asynchronously (e.g. the SIMD aead wrapper queuing to cryptd), the
    cryptd worker runs tipc_aead_decrypt_done() later. If the bearer's
    netns is torn down in the meantime, cleanup_net() -> tipc_exit_net()
    -> tipc_crypto_stop() frees the per-netns tipc_crypto, and the
    completion then reads it: tipc_aead_decrypt_done() dereferences
    aead->crypto->stats and aead->crypto->net, and
    tipc_crypto_rcv_complete() dereferences aead->crypto->aead[] and the
    node table -- reading freed memory.
    Decoded KASAN splat (v7.1-rc7, CONFIG_KASAN_INLINE + TIPC + TIPC_CRYPTO):

      BUG: KASAN: slab-use-after-free in tipc_aead_decrypt_done (net/tipc/crypto.c:999)
      Read of size 8 at addr ffff8881056258a8 by task kworker/u16:2/51
      Workqueue: events_unbound
      Call Trace:
       tipc_aead_decrypt_done (net/tipc/crypto.c:999)
       process_one_work (kernel/workqueue.c:3314)
       worker_thread (kernel/workqueue.c:3397 kernel/workqueue.c:3478)
       kthread (kernel/kthread.c:436)
       ret_from_fork (arch/x86/kernel/process.c:158)
       ret_from_fork_asm (arch/x86/entry/entry_64.S:245)
      Allocated by task 169:
       __kasan_kmalloc (mm/kasan/common.c:398 mm/kasan/common.c:415)
       tipc_crypto_start (net/tipc/crypto.c:1502)
       tipc_init_net (net/tipc/core.c:72)
       ops_init (net/core/net_namespace.c:137)
       setup_net (net/core/net_namespace.c:446)
       copy_net_ns (net/core/net_namespace.c:579)
       create_new_namespaces (kernel/nsproxy.c:132)
       __x64_sys_unshare (kernel/fork.c:3316)
       do_syscall_64 (arch/x86/entry/syscall_64.c:63)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
      Freed by task 8:
       kfree (mm/slub.c:6566)
       tipc_exit_net (net/tipc/core.c:119)
       cleanup_net (net/core/net_namespace.c:704)
       process_one_work (kernel/workqueue.c:3314)
       kthread (kernel/kthread.c:436)

    This is the same class of bug that commit e279024617134 ("net/tipc:
    fix slab-use-after-free Read in tipc_aead_encrypt_done") fixed for the
    encrypt side. The encrypt path takes maybe_get_net(aead->crypto->net)
    before crypto_aead_encrypt() and drops it with put_net() on the
    synchronous return paths and in tipc_aead_encrypt_done(); the
    -EINPROGRESS/-EBUSY return keeps the reference for the async callback
    to release. The decrypt path was left without the equivalent guard.
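
The reference handling described above for the encrypt path (take a reference before submitting, drop it on synchronous return, let the callback drop it on asynchronous return) can be sketched as a toy model. Everything here is an assumption made for illustration: `toy_net`, `toy_maybe_get_net`, `toy_put_net`, `toy_submit` and `toy_done` are hypothetical names, and `crypto_rc` merely stands in for the return value of the real crypto request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a network namespace: only the refcount is modelled. */
struct toy_net {
    int refs;
};

/* Models a "maybe get": fails once the count has already hit zero. */
static bool toy_maybe_get_net(struct toy_net *net)
{
    if (net->refs == 0)
        return false;
    net->refs++;
    return true;
}

static void toy_put_net(struct toy_net *net)
{
    assert(net->refs > 0);
    net->refs--;
}

/*
 * Hypothetical submit path. crypto_rc stands in for the return value of
 * the crypto request: 0 or an error means it finished synchronously,
 * -EINPROGRESS/-EBUSY mean the completion callback will run later.
 */
static int toy_submit(struct toy_net *net, int crypto_rc)
{
    if (!toy_maybe_get_net(net))
        return -ENODEV;

    if (crypto_rc == -EINPROGRESS || crypto_rc == -EBUSY)
        return crypto_rc;        /* the callback now owns the reference */

    toy_put_net(net);            /* synchronous return: drop it here */
    return crypto_rc;
}

/* Hypothetical async completion: use the netns state, then drop the ref. */
static void toy_done(struct toy_net *net)
{
    toy_put_net(net);
}
```

The point of the pattern is that the namespace cannot reach a zero count while a completion is still pending, so the callback never reads freed per-namespace state.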
    Mirror the encrypt-side fix on the decrypt path: take a net reference
    before crypto_aead_decrypt() (failing with -ENODEV and the matching
    bearer put if it cannot be acquired), keep it across the
    -EINPROGRESS/-EBUSY async return, and drop it with put_net() on the
    synchronous success/error return and at the end of
    tipc_aead_decrypt_done().

    Reproduced under KASAN on v7.1-rc7: a UDP bearer with a cluster key is
    flooded with crafted encrypted frames from an unknown peer (driving
    the cluster-key decrypt path) while the bearer's netns is repeatedly
    torn down. The completion must run asynchronously to outlive
    tipc_crypto_stop(); on x86 the stock aesni gcm(aes) now decrypts
    synchronously, so the async path was exercised via cryptd offload. The
    unguarded aead->crypto dereference in tipc_aead_decrypt_done() is the
    unpatched upstream path; tipc_aead_decrypt() still lacks
    maybe_get_net(aead->crypto->net), so the completion can outlive the
    free on any config where crypto_aead_decrypt() goes async.

    Found by 0sec automated security-research tooling (https://0sec.ai).

    Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication")
    Cc: stable@vger.kernel.org
    Signed-off-by: Doruk Tan Ozturk
    Reviewed-by: Alexander Lobakin
    Reviewed-by: Tung Nguyen
    Reviewed-by: Simon Horman
    Link: https://patch.msgid.link/20260617075818.37431-1-doruk@0sec.ai
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

commit 5e5b7f2ef854936e95dceb6a2fdfefcb7152d2c6
Author: Michal Koutný
Date:   Thu Feb 5 23:54:23 2026 +0800

    blk-cgroup: fix UAF in __blkcg_rstat_flush()

    commit 0ab5ee5a1badb58cbb2242617cb01a4972b1f2a2 upstream.

    When multiple blkgs in the same blkcg are released concurrently, a
    use-after-free can occur. The race happens when one blkg's
    __blkcg_rstat_flush() removes another blkg's iostat entries via
    llist_del_all(). The second blkg sees an empty list and proceeds to
    free itself while the first is still iterating over its entries.
    Move the flush from __blkg_release() (RCU callback) to blkg_release()
    (before call_rcu). This ensures the RCU grace period waits for any
    concurrent flush's rcu_read_lock() section to complete before freeing.

    Cc: stable@vger.kernel.org
    Cc: Jay Shin
    Cc: Tejun Heo
    Cc: Waiman Long
    Fixes: 20cb1c2fb756 ("blk-cgroup: Flush stats before releasing blkcg_gq")
    Reported-by: coregee2000@gmail.com
    Closes: https://lore.kernel.org/linux-block/CAHPqNmwT9oRpem3J3erS_W0uSQND47LGGSBsNxP8E6uSUish1w@mail.gmail.com/
    Signed-off-by: Ming Lei
    Tested-by: Jose Fernandez (Anthropic)
    Link: https://patch.msgid.link/20260205155425.342084-1-ming.lei@redhat.com
    Signed-off-by: Jens Axboe
    Signed-off-by: Greg Kroah-Hartman

commit 5a84398101bf9f11e84b176343e4e3ba83e668c0
Author: Fan Wu
Date:   Wed Jun 17 02:05:18 2026 +0000

    hdlc_ppp: sync per-proto timers before freeing hdlc state

    commit c78a4e41ab5ead6193ad8a2dd92e8906bae659fa upstream.

    Each PPP control protocol (LCP/IPCP/IPV6CP) embedded in struct ppp
    registers a timer via timer_setup(). That struct ppp is the
    hdlc->state allocation, which detach_hdlc_protocol() frees with
    kfree() in both teardown paths: unregister_hdlc_device() and the
    re-attach inside attach_hdlc_protocol().

    The ppp proto never registered a .detach callback, so
    detach_hdlc_protocol() performs no timer synchronization before the
    kfree(). The only cancel, timer_delete(&proto->timer) in
    ppp_cp_event(), is partial (it does not wait for a running callback)
    and only runs on the ->CLOSED transition; ppp_stop()/ppp_close() do
    not sync either. A ppp_timer callback already executing (blocked on
    ppp->lock) survives the kfree and then dereferences proto->state /
    ppp->lock in freed memory, leading to a use-after-free.

    Fix this by adding a .detach helper that calls timer_shutdown_sync()
    on every per-proto timer. detach_hdlc_protocol() invokes
    proto->detach(dev) before kfree(hdlc->state), so timer_shutdown_sync()
    now runs on both free paths.
    timer_shutdown_sync() is used instead of timer_delete_sync() because
    the keepalive path re-arms the timer through add_timer()/mod_timer()
    and shutdown blocks any re-activation during teardown.

    Initialize the per-protocol timers in ppp_ioctl() when the protocol is
    attached, and remove the now-redundant timer_setup() from ppp_start(),
    so that the timers are initialized exactly once at attach time and
    ppp_timer_release() never operates on uninitialized timer_list
    structures. attach_hdlc_protocol() uses kmalloc() (not kzalloc), so
    struct ppp's protos[i].timer is uninitialized garbage until the first
    timer_setup(); without this init-at-attach, attaching the PPP protocol
    without ever bringing the device up would leave timer_shutdown_sync()
    operating on uninitialized memory in .detach. Moving the init out of
    ppp_start() (which only runs on NETDEV_UP) into the attach path makes
    the initialization unconditional and avoids initializing the same
    timer_list twice.

    This bug was found by static analysis.

    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Cc: stable@vger.kernel.org
    Signed-off-by: Fan Wu
    Link: https://patch.msgid.link/20260617020518.116319-1-fanwu01@zju.edu.cn
    Signed-off-by: Jakub Kicinski
    Signed-off-by: Greg Kroah-Hartman

commit e91df6d273445c03f5aa302bfe147eda33d45794
Author: Wentao Liang
Date:   Tue Jun 16 15:10:49 2026 +0000

    pwrseq: core: fix use-after-free in pwrseq_debugfs_seq_next()

    commit 257595adf9dac15ae1edd9d07753fbc576a7583d upstream.

    pwrseq_debugfs_seq_next() declares 'next' with __free(put_device),
    which causes put_device() to be called on the returned pointer when
    the variable goes out of scope. This results in a use-after-free since
    the seq_file framework receives a pointer whose reference has already
    been dropped.

    Simply removing __free(put_device) would fix the UAF but would leak
    the reference acquired by bus_find_next_device(), as stop() only calls
    up_read(&pwrseq_sem) and never releases the device reference.
    Fix this by making the reference counting consistent across all
    seq_file callbacks, matching the standard pattern used by PCI and
    SCSI:

      - start(): use get_device() so it returns a referenced pointer.
      - next(): explicitly put_device(curr) to release the previous
        device's reference (no NULL check needed - the seq_file framework
        only calls next() while the previous return was non-NULL).
      - stop(): put_device(data) to release the last iterated device's
        reference, with a NULL guard since stop() may be called with NULL
        when start() returned NULL or next() reached end-of-sequence.

    Cc: stable@vger.kernel.org
    Fixes: 249ebf3f65f8 ("power: sequencing: implement the pwrseq core")
    Signed-off-by: Wentao Liang
    Link: https://patch.msgid.link/20260616151049.1705503-1-vulab@iscas.ac.cn
    Signed-off-by: Bartosz Golaszewski
    Signed-off-by: Greg Kroah-Hartman

commit b85ef03f726b15047a6fa6d11b639bdf6c0ee4f0
Author: Tristan Madani
Date:   Fri May 1 11:02:03 2026 +0000

    gfs2: fix use-after-free in gfs2_qd_dealloc

    commit f9c9ec2c319f843b70ecdf939d48b52d189bc081 upstream.

    gfs2_qd_dealloc(), called as an RCU callback from gfs2_qd_dispose(),
    accesses the superblock object sdp through qd->qd_sbd after freeing
    qd. It does so to decrement sd_quota_count and wake up sd_kill_wait.
    However, by the time the RCU callback runs, gfs2_put_super() may have
    already freed sdp via free_sbd().

    This can happen when gfs2_quota_cleanup() is called during unmount: it
    disposes of quota objects via call_rcu() and then waits on
    sd_kill_wait with a 60-second timeout. If the timeout expires, or if
    gfs2_gl_hash_clear() triggers additional qd_put() calls that schedule
    more RCU callbacks after the wait completes, gfs2_put_super() will
    proceed to free the superblock while RCU callbacks referencing it are
    still pending.

    Add an rcu_barrier() before free_sbd() in gfs2_put_super() to ensure
    all pending RCU callbacks (including gfs2_qd_dealloc) have completed
    before the superblock is freed.
    Fixes: a475c5dd16e5 ("gfs2: Free quota data objects synchronously")
    Reported-by: syzbot+42a37bf8045847d8f9d2@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=42a37bf8045847d8f9d2
    Tested-by: syzbot+42a37bf8045847d8f9d2@syzkaller.appspotmail.com
    Cc: stable@vger.kernel.org
    Signed-off-by: Tristan Madani
    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Greg Kroah-Hartman

commit 8d8507a457667f23477a15496b91908a5b5b7cf3
Author: Sam James
Date:   Mon May 25 08:56:19 2026 +0100

    crypto: nx - fix nx_crypto_ctx_exit argument

    commit 4e67f504ee9ded15e256b64f4fde150e917381d7 upstream.

    nx_crypto_ctx_shash_exit calls nx_crypto_ctx_exit with
    crypto_shash_ctx(...) but crypto_shash_ctx gives a nx_crypto_ctx *,
    not a crypto_tfm *. Fix the type in nx_crypto_ctx_exit and drop the
    bogus crypto_tfm_ctx call.

    This fixes the following oops:

      BUG: Unable to handle kernel data access at 0xc0403effffffffc8
      Faulting instruction address: 0xc000000000396cb4
      Oops: Kernel access of bad area, sig: 11 [#15]
      Call Trace:
       nx_crypto_ctx_shash_exit+0x24/0x60
       crypto_shash_exit_tfm+0x28/0x40
       crypto_destroy_tfm+0x98/0x140
       crypto_exit_ahash_using_shash+0x20/0x40
       crypto_destroy_tfm+0x98/0x140
       hash_release+0x1c/0x30
       alg_sock_destruct+0x38/0x60
       __sk_destruct+0x48/0x2b0
       af_alg_release+0x58/0xb0
       __sock_release+0x68/0x150
       sock_close+0x20/0x40
       __fput+0x110/0x3a0
       sys_close+0x48/0xa0
       system_call_exception+0x140/0x2d0
       system_call_common+0xf4/0x258

    .. which came from hardlink(1) opportunistically using AF_ALG.

    The same problem exists with nx_crypto_ctx_skcipher_exit getting a
    context it wasn't expecting, but apparently nobody hit that for years.
    Cc: Eric Biggers
    Cc: stable@vger.kernel.org
    Fixes: bfd9efddf990 ("crypto: nx - convert AES-ECB to skcipher API")
    Fixes: 9420e628e7d8 ("crypto: nx - Use API partial block handling")
    Acked-by: Breno Leitao
    Reviewed-by: Eric Biggers
    Reported-by: Calvin Buckley
    Tested-by: Calvin Buckley
    Suggested-by: Brad Spengler
    Signed-off-by: Sam James
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

commit 5da9b1a87ec7cc3489c27016313524769f12d9e0
Author: Sean Christopherson
Date:   Fri Jun 12 15:52:41 2026 -0700

    KVM: Replace guest-triggerable BUG_ON() in ioeventfd datamatch with get_unaligned()

    commit f1edbed787ba67988ed34e0132ca128b052b6ce8 upstream.

    Drop a BUG_ON() that has been reachable since it was first added, way
    back in 2009, and instead use get_unaligned() to perform
    potentially-unaligned accesses.

    For a given store, KVM x86's emulator tracks the entire value in the
    destination operand, x86_emulate_ctxt.dst. If the destination is
    memory, and the target splits multiple pages and/or is emulated MMIO,
    then KVM handles each fragment independently. E.g. on a page split
    starting at page offset 0xffc, KVM writes 4 bytes to the first page,
    then the remaining bytes to the second page, using ctxt->dst as the
    source for both (with appropriate offsets).

    If the destination splits a page *and* hits emulated MMIO on the
    second page, then KVM will complete the write to the first page, then
    emulate the MMIO access to the second page. If there is a
    datamatch-enabled ioeventfd at offset 0 of the second page, then KVM
    will process the remainder of the store as a potential ioeventfd
    signal.

    Putting it all together, if the guest emits a store that splits a page
    starting at page offset N, and the second page has a datamatch-enabled
    ioeventfd at offset 0, then KVM will check for datamatch using
    &dst.valptr[N] as the source. Due to dst (and thus dst.valptr) being
    32-byte aligned, if N is not aligned to @len, the BUG_ON() fires. E.g.
    with a 16-byte store at page offset 0xffc, to an ioeventfd of len 8,
    all initial checks in ioeventfd_in_range() will succeed, and the
    BUG_ON() fires due to @val being 4-byte aligned, but not 8-byte
    aligned.

      ------------[ cut here ]------------
      kernel BUG at arch/x86/kvm/../../../virt/kvm/eventfd.c:783!
      Oops: invalid opcode: 0000 [#1] SMP
      CPU: 0 UID: 1000 PID: 615 Comm: repro Not tainted 7.1.0-rc2-ff238429d1ea #365 PREEMPT
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:ioeventfd_write+0x6c/0x70 [kvm]
      Call Trace:
       __kvm_io_bus_write+0x85/0xb0 [kvm]
       kvm_io_bus_write+0x53/0x80 [kvm]
       vcpu_mmio_write+0x66/0xf0 [kvm]
       emulator_read_write_onepage+0x12a/0x540 [kvm]
       emulator_read_write+0x109/0x2b0 [kvm]
       x86_emulate_insn+0x4f8/0xfb0 [kvm]
       x86_emulate_instruction+0x181/0x790 [kvm]
       kvm_mmu_page_fault+0x313/0x630 [kvm]
       vmx_handle_exit+0x18a/0x590 [kvm_intel]
       kvm_arch_vcpu_ioctl_run+0xc81/0x1c90 [kvm]
       kvm_vcpu_ioctl+0x2d5/0x970 [kvm]
       __x64_sys_ioctl+0x8a/0xd0
       do_syscall_64+0xb7/0x890
       entry_SYSCALL_64_after_hwframe+0x4b/0x53
      RIP: 0033:0x7f19c931a9bf
      Modules linked in: kvm_intel kvm irqbypass
      ---[ end trace 0000000000000000 ]---

    In a perfect world, the fix would be to simply delete the BUG_ON(), as
    KVM x86 doesn't perform alignment checks on "normal" memory accesses
    at CPL0. Sadly, C99 ruins all the fun; while the x86 architecture
    plays nice, dereferencing an unaligned pointer directly is undefined
    behavior in C, e.g. triggers splats when running with
    CONFIG_UBSAN_ALIGNMENT=y.
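
The alignment-safe comparison described above can be sketched in portable userspace C. This is a minimal illustration that uses memcpy() where the kernel uses get_unaligned(); the function name `toy_datamatch` and its exact signature are assumptions made for this sketch, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare "len" bytes at val against expected without ever dereferencing
 * val as a wider type. memcpy() has no alignment requirement, so val may
 * point at any byte offset.
 */
static bool toy_datamatch(const void *val, int len, uint64_t expected)
{
    uint8_t v8;
    uint16_t v16;
    uint32_t v32;
    uint64_t v64;

    switch (len) {
    case 1:
        memcpy(&v8, val, sizeof(v8));
        return v8 == expected;
    case 2:
        memcpy(&v16, val, sizeof(v16));
        return v16 == expected;
    case 4:
        memcpy(&v32, val, sizeof(v32));
        return v32 == expected;
    case 8:
        memcpy(&v64, val, sizeof(v64));
        return v64 == expected;
    default:
        return false;            /* unsupported width: never a match */
    }
}
```

An 8-byte value read from a buffer offset of 4 is handled the same way as an aligned one, which is the case that used to trip the BUG_ON().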
    Fixes: d34e6b175e61 ("KVM: add ioeventfd support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson
    Message-ID: <20260612225241.678509-1-seanjc@google.com>
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

commit 18587f9831612e24cd8f24be1ec15478feff7abc
Author: Sean Christopherson
Date:   Wed Apr 29 09:34:01 2026 -0700

    KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level

    commit ef057cbf825e03b63f6edf5980f96abf3c53089d upstream.

    When recovering hugepages in the shadow MMU, verify that the base gfn
    of the shadow page is actually contained within the target memslot,
    *before* querying the max mapping level given the shadow page's gfn.
    Failure to pre-check the validity of the gfn can lead to an
    out-of-bounds access to the slot's lpage_info (which typically
    manifests as a host #PF because the lpage_info is vmalloc'd) if the
    guest creates a hugepage mapping (in its PTEs) that extends "below"
    the bounds of a memslot.

    When faulting in memory for a guest, and the size of the guest mapping
    is greater than KVM's (current) max mapping, then KVM will create a
    "direct" shadow page (direct in that there are no gPTEs to shadow, and
    so the target gfn is a direct calculation given the base gfn of the
    shadow page). The hugepage recovery flow looks for such direct shadow
    pages, as forcing 4KiB mappings when dirty logging generates the guest
    > host mapping size case. When the 4KiB restriction is lifted, then
    KVM can replace the shadow page with a hugepage.

    But if KVM originally used a smaller mapping than the guest because
    the range of memory covered by the guest hugepage exceeds the bounds
    of a memslot, then KVM will link a direct shadow page with a gfn that
    is outside the bounds of the memslot being used to fault in memory.
    The rmap entry added for the leaf mapping is correct and within
    bounds, but the gfn of the leaf SPTE's parent shadow page will be out
    of bounds.
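
The "is the gfn inside the slot" pre-check can be illustrated with a simplified model. This is a sketch under stated assumptions: `toy_memslot`, `toy_gfn_in_slot` and `toy_page_info` are hypothetical names, and the one-entry-per-page metadata array is a simplification rather than the layout of the real lpage_info.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy memslot: covers guest frame numbers [base_gfn, base_gfn + npages). */
struct toy_memslot {
    uint64_t base_gfn;
    uint64_t npages;
    const uint8_t *page_info;   /* one entry per page; simplified layout */
};

static bool toy_gfn_in_slot(const struct toy_memslot *slot, uint64_t gfn)
{
    return gfn >= slot->base_gfn && gfn - slot->base_gfn < slot->npages;
}

/*
 * Look up per-page metadata. Without the range check, a gfn below base_gfn
 * would make (gfn - base_gfn) wrap around to a huge unsigned index.
 */
static bool toy_page_info(const struct toy_memslot *slot, uint64_t gfn,
                          uint8_t *out)
{
    if (!toy_gfn_in_slot(slot, gfn))
        return false;

    *out = slot->page_info[gfn - slot->base_gfn];
    return true;
}
```

A gfn of 0x200 against a slot starting at 0x300 is rejected before any indexing happens, which corresponds to the hugepage that extends "below" the memslot in the description above.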
      BUG: unable to handle page fault for address: ffffc90000806ffc
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 100000067 P4D 100000067 PUD 1002a7067 PMD 10612f067 PTE 0
      Oops: Oops: 0000 [#1] SMP
      CPU: 13 UID: 1000 PID: 757 Comm: mmu_stress_test Not tainted 7.1.0-rc1-48ce1e26eace-x86_pir_to_irr_comments-vm #341 PREEMPT
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kvm_mmu_max_mapping_level+0x79/0x2b0 [kvm]
      Call Trace:
       kvm_mmu_recover_huge_pages+0x21b/0x320 [kvm]
       kvm_set_memslot+0x1ee/0x590 [kvm]
       kvm_set_memory_region.part.0+0x3a1/0x4d0 [kvm]
       kvm_vm_ioctl+0x9bf/0x15d0 [kvm]
       __x64_sys_ioctl+0x8a/0xd0
       do_syscall_64+0xb7/0xbb0
       entry_SYSCALL_64_after_hwframe+0x4b/0x53
      RIP: 0033:0x7f21c0f1a9bf

    Don't bother pre-checking the bounds of the potential hugepage, i.e.
    don't check that e.g. sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level + 1)
    is also within the memslot, as the checks performed by
    kvm_mmu_max_mapping_level() are a superset of the basic bounds checks.
    I.e. pre-checking the full range would be a dubious
    micro-optimization.

    Fixes: 9eba50f8d7fc ("KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs")
    Cc: stable@vger.kernel.org
    Cc: David Matlack
    Cc: James Houghton
    Cc: Alexander Bulekov
    Cc: Fred Griffoul
    Cc: Alexander Graf
    Cc: David Woodhouse
    Cc: Filippo Sironi
    Cc: Ivan Orlov
    Signed-off-by: Sean Christopherson
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Greg Kroah-Hartman

commit adfacfbaeae2cb760f492357cc36b41f84ef7f86
Author: Michael Bommarito
Date:   Wed Apr 22 11:58:44 2026 -0400

    exfat: fix potential use-after-free in exfat_find_dir_entry()

    commit 3f5f8ee9917cc2b9076ac533492d8a200edcabb8 upstream.

    In exfat_find_dir_entry(), the buffer_head obtained from
    exfat_get_dentry() is released with brelse(bh) before the fall-through
    TYPE_EXTEND branch reads the directory entry through ep (which points
    into bh->b_data):

        brelse(bh);
        if (entry_type == TYPE_EXTEND) {
            ...
len = exfat_extract_uni_name(ep, entry_uniname); ... } After brelse() drops our reference, nothing guarantees that the underlying page backing bh->b_data remains valid for the subsequent exfat_extract_uni_name() read. This is the same pattern fixed in commit fc961522ddbd ("exfat: Fix potential use after free in exfat_load_upcase_table()"). Move brelse(bh) so it runs after ep is no longer dereferenced on each branch. Confirmed on QEMU x86_64 with CONFIG_KASAN=y + CONFIG_DEBUG_PAGEALLOC=y + CONFIG_PAGE_POISONING=y on linux-next, using a crafted exFAT image (long filename with same-hash collisions forcing the TYPE_EXTEND path). With a debug-only invalidate_bdev() inserted between brelse(bh) and the ep read to make the stale-deref window deterministic, the unpatched kernel faults: BUG: KASAN: use-after-free in exfat_find_dir_entry+0x133b/0x15a0 BUG: unable to handle page fault for address: ffff88801a5fa0c2 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI RIP: 0010:exfat_find_dir_entry+0x1188/0x15a0 With this patch applied, the same instrumented harness completes cleanly under the same sanitizer stack. I have not reproduced a crash on an uninstrumented kernel under ordinary reclaim; the instrumented A/B establishes the lifetime violation and that the patch closes it, not an unaided triggerability claim. Fixes: ca06197382bd ("exfat: add directory operations") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito Signed-off-by: Namjae Jeon Signed-off-by: Greg Kroah-Hartman commit 6e61fc2e06e44b6d30248cc5bc47a58e75c2b43e Author: Maciej W. Rozycki Date: Wed May 6 23:42:27 2026 +0100 MIPS: DEC: Prevent initial console buffer from landing in XKPHYS commit 7fb13fd35110ebe95eb053faf79d018f51144d85 upstream. 
In 64-bit configurations calling the initial console output handler from a kernel thread other than the initial one will result in a situation where the stack has been placed in the XKPHYS 64-bit memory segment and consequently so has been the buffer allocated there that is used as the argument corresponding to the `%s' output conversion specifier for the firmware's printf() entry point. This 64-bit address will then be truncated by 32-bit firmware, resulting in an attempt to access the wrong memory location, which in turn will cause all kinds of unpredictable behaviour, such as a kernel crash: Console: colour dummy device 160x64 Calibrating delay loop... 49.36 BogoMIPS (lpj=192512) pid_max: default: 32768 minimum: 301 CPU 0 Unable to handle kernel paging request at virtual address 000000000203bd00, epc == ffffffffbfc08364, ra == ffffffffbfc08800 Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-rc2-00254-gfb649bda6f56-dirty #121 $ 0 : 0000000000000000 0000000000000001 0000000000000023 ffffffff80684ba0 $ 4 : 000000000203bd00 ffffffffbfc0f3b4 ffffffffffffffff 0000000000000073 $ 8 : 0a303d7469000000 0000000000000000 0000000000000073 ffffffffbfc0f473 $12 : 0000000000000002 0000000000000000 ffffffff80684c1c 0000000000000000 $16 : 0000000000000000 ffffffff80596dc9 0000000000000000 ffffffffbfc09240 $20 : ffffffff80684c40 ffffffffbfc0f400 000000000000002d 000000000000002b $24 : ffffffffffffffbf 000000000203bd00 $28 : ffffffff805f0000 ffffffff80684b58 0000000000000030 ffffffffbfc08800 Hi : 0000000000000000 Lo : 0000000000000aa8 epc : ffffffffbfc08364 0xffffffffbfc08364 ra : ffffffffbfc08800 0xffffffffbfc08800 Status: 140120e2 KX SX UX KERNEL EXL Cause : 00000008 (ExcCode 02) BadVA : 000000000203bd00 PrId : 00000430 (R4000SC) Modules linked in: Process swapper (pid: 0, threadinfo=(____ptrval____), task=(____ptrval____), tls=0000000000000000) Stack : 0000000000000000 0000000000000000 0000000000000000 0000004d0000004d 80684cc0806a2a40 80596dc80000004d 8061000000000000 
bfc0850c80684c38 0000000000000000 000000000203bd00 0000000000000000 0000000000000000 0000000000000000 00000000bfc0f3b4 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000002500000000 0000000000000000 0000000000000000 802c1a7400000000 0203bd0080596dc8 0203bd4d69000000 6c61632000000018 5f746567646e6172 6c616320625f6d6f 5f736e5f6d6f7266 206361323778302b 303d74696e726320 806a0a38806b0000 806a0a38806b0000 00000000806b0000 80683c58806b0000 ... Call Trace: Code: a082ffff 03e00008 00601021 <80820000> 00001821 10400005 24840001 80820000 24630001 ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception in interrupt KN04 V2.1k (PC: 0xa0026768, SP: 0x806848e8) >> In this case the pointer in $4 was truncated from 0x980000000203bd00 to 0x000000000203bd00. This may happen when no final console driver has been enabled in the configuration and consequently the initial console continues being used late into bootstrap or with an upcoming change that will switch the zs driver to use a platform device, which in turn will make the console handover happen only after other kernel threads have already been started. Fix the issue by making the buffer static and initdata, and therefore placed in the CKSEG0 32-bit compatibility segment, observing that the console output handler is called with the console lock held, implying no need for this code to be reentrant. Add an assertion to verify the buffer actually has been placed in a compatibility segment. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Maciej W. Rozycki Cc: stable@vger.kernel.org # v2.6.12+ Signed-off-by: Thomas Bogendoerfer Signed-off-by: Greg Kroah-Hartman commit 65bd0c0afb0e1bf3287458e342429b069624f7d4 Author: Dawei Feng Date: Wed Jun 3 18:53:16 2026 +0800 bpf: use kvfree() for replaced sysctl write buffer commit 4c21b5927d4364bfe7365f2700da5fea0ed0d004 upstream. 
proc_sys_call_handler() allocates its temporary sysctl buffer with kvzalloc() and passes it to __cgroup_bpf_run_filter_sysctl(). Since kvzalloc() may fall back to vmalloc() for large allocations, freeing that buffer with kfree() is wrong and can corrupt memory. Use kvfree() to safely handle both kmalloc and kvzalloc()/vmalloc allocations. The bug was first flagged by an experimental analysis tool we are developing for kernel memory-management bugs while analyzing v6.13-rc1. The tool is still under development and is not yet publicly available. Manual inspection confirms that the bug is still present in v7.1-rc5. Reproduced the bug based on v7.1-rc4 in a QEMU x86_64 guest booted with KASAN and CONFIG_FAILSLAB enabled. To exercise the replacement path, the test tree also included the accompanying fix for the stale ret == 1 check in __cgroup_bpf_run_filter_sysctl(). The reproducer confines failslab injections to the proc_sys_call_handler() range, uses stacktrace-depth=32, and injects fail-nth=1 while writing 8191 bytes to /proc/sys/kernel/domainname from a task in the target cgroup. Under that setup, fail-nth=1 triggered the fault: BUG: unable to handle page fault for address: ffffeb0200024d48 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 SMP KASAN NOPTI CPU: 2 UID: 0 PID: 209 Comm: repro_proc_sys_ Not tainted 7.1.0-rc4-00686-g97625979a5d4 PREEMPT(lazy) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:kfree+0x6e/0x510 ... Call Trace: ? __cgroup_bpf_run_filter_sysctl+0x626/0xc30 __cgroup_bpf_run_filter_sysctl+0x74d/0xc30 ? __pfx___cgroup_bpf_run_filter_sysctl+0x10/0x10 ? srso_return_thunk+0x5/0x5f ? __kvmalloc_node_noprof+0x345/0x870 ? proc_sys_call_handler+0x250/0x480 ? srso_return_thunk+0x5/0x5f proc_sys_call_handler+0x3a2/0x480 ? __pfx_proc_sys_call_handler+0x10/0x10 ? srso_return_thunk+0x5/0x5f ? selinux_file_permission+0x39f/0x500 ? 
srso_return_thunk+0x5/0x5f ? lock_is_held_type+0x9e/0x120 vfs_write+0x98e/0x1000 ... With this fix applied on top of the same test setup, rerunning the reproducer with fail-nth=1 yields no corresponding Oops reports. Fixes: 4508943794ef ("proc: use kvzalloc for our kernel buffer") Cc: stable@vger.kernel.org Reviewed-by: Emil Tsalapatis Reviewed-by: Jiayuan Chen Acked-by: Yonghong Song Signed-off-by: Zilin Guan Signed-off-by: Dawei Feng Link: https://lore.kernel.org/r/20260603105317.944304-3-dawei.feng@seu.edu.cn Signed-off-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit 3804e6de30ae7b053d53341d9d6944356cf23b40 Author: Denis Arefev Date: Thu May 21 10:28:56 2026 +0300 block: Avoid mounting the bdev pseudo-filesystem in userspace commit f73aa66dffcb8e61e78f01b56163ec16a15d06d2 upstream. The bdev pseudo-filesystem is an internal kernel filesystem with which userspace should not interfere. Unregister it so that userspace cannot even attempt to mount it. This fixes a bug [1] that occurs when attempting to access files, because the system call move_mount() uses pointers declared in the inode_operations structure, which for the bdev pseudo-filesystem are always equal to 0. 
`inode->i_op = &empty_iops;` [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0010) - not-present page PGD 23380067 P4D 23380067 PUD 23381067 PMD 0 Oops: 0010 [#1] PREEMPT SMP KASAN NOPTI CPU: 2 PID: 17125 Comm: syz-executor.0 Not tainted 6.1.155-syzkaller-00350-g84221fde2681 #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:0x0 Call Trace: lookup_open.isra.0+0x700/0x1180 fs/namei.c:3460 open_last_lookups fs/namei.c:3550 [inline] path_openat+0x953/0x2700 fs/namei.c:3780 do_filp_open+0x1c5/0x410 fs/namei.c:3810 do_sys_openat2+0x171/0x4d0 fs/open.c:1318 do_sys_open fs/open.c:1334 [inline] __do_sys_openat fs/open.c:1350 [inline] __se_sys_openat fs/open.c:1345 [inline] __x64_sys_openat+0x13c/0x1f0 fs/open.c:1345 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Link: https://lore.kernel.org/all/20131010004732.GJ13318@ZenIV.linux.org.uk/T/# Cc: stable@vger.kernel.org Signed-off-by: Denis Arefev Reviewed-by: Christoph Hellwig Link: https://patch.msgid.link/20260521072857.5078-1-arefev@swemel.ru Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit db2c5b9fb908715cab9976ee5966c8493ead34bd Author: Wenjie Qi Date: Wed May 27 20:06:28 2026 +0800 f2fs: keep atomic write retry from zeroing original data commit 6d874b65aadce56ac78f76129dbcfc2599b638f8 upstream. A partial atomic write reserves a block in the COW inode before reading the original data page for the untouched bytes in that page. If that read fails, write_begin returns an error but leaves the COW inode entry as NEW_ADDR. 
A retry of the same partial write then finds the COW entry, treats it as existing COW data, and f2fs_write_begin() zeroes the whole folio because blkaddr is NEW_ADDR. If the retry is committed, the bytes outside the retried write range are committed as zeroes instead of preserving the original file contents. Only use the COW inode as the read source when it already has a real data block. If the COW entry is still NEW_ADDR, treat it as a reservation to reuse: keep reading the old data from the original inode and avoid reserving or accounting the same atomic block again. Cc: stable@kernel.org Fixes: 3db1de0e582c ("f2fs: change the current atomic write way") Signed-off-by: Wenjie Qi Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman commit 20190e498057997532c7f186d081011f18e0a462 Author: Yongpeng Yang Date: Mon Apr 27 21:10:51 2026 +0800 f2fs: fix incorrect FI_NO_EXTENT handling in __destroy_extent_node() commit 1f70ddb28a3c71df124da5fa4040c808116d6bb9 upstream. When __destroy_extent_node() sets the inode flag FI_NO_EXTENT, it neither resets the length of the largest extent to 0 nor updates the inode folio. Since modifications to the extent tree are disallowed afterward, the cached largest extent may become stale. This can trigger the following error in xfstests generic/388: F2FS-fs (dm-0): sanity_check_extent_cache: inode (ino=1761) extent info [220057, 57, 6] is incorrect, run fsck to fix In the f2fs_drop_inode path, __destroy_extent_node() does not need to guarantee that et->node_cnt is 0, because concurrency with writeback is expected in this path, and writeback may update the extent cache. This patch reverts commit ed78aeebef05 ("f2fs: fix node_cnt race between extent node destroy and writeback") and removes the unnecessary zero check of et->node_cnt.
Fixes: ed78aeebef05 ("f2fs: fix node_cnt race between extent node destroy and writeback") Cc: stable@vger.kernel.org Reported-by: Chao Yu Suggested-by: Chao Yu Signed-off-by: Yongpeng Yang Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman commit ff83de56882cb8466184d322abece2589258ca56 Author: Zhang Cen Date: Mon Jun 15 15:19:54 2026 +0800 f2fs: validate ACL entry sizes in f2fs_acl_from_disk() commit c4810ada31e80cbe4011467c4f3b1e93f94134f3 upstream. f2fs_acl_count() only validates the aggregate ACL xattr length. A malformed ACL can still place ACL_USER or ACL_GROUP in a slot that only contains struct f2fs_acl_entry_short bytes, and f2fs_acl_from_disk() then reads entry->e_id before verifying that a full entry fits. Require a short entry before reading e_tag and e_perm, and require a full entry before reading e_id for ACL_USER and ACL_GROUP. Return -EFSCORRUPTED from these new truncated-entry checks, while keeping the pre-existing -EINVAL paths unchanged. Validation reproduced this kernel report: KASAN slab-out-of-bounds in __f2fs_get_acl+0x6fb/0x7e0 RIP: 0033:0x7f4b835ea7aa The buggy address belongs to the object at ffff888114589960 which belongs to the cache kmalloc-8 of size 8 The buggy address is located 0 bytes to the right of allocated 8-byte region [ffff888114589960, ffff888114589968) Read of size 4 Call trace: dump_stack_lvl+0x66/0xa0 (?:?) print_report+0xce/0x630 (?:?) __f2fs_get_acl+0x6fb/0x7e0 (fs/f2fs/acl.c:169) srso_alias_return_thunk+0x5/0xfbef5 (?:?) __virt_addr_valid+0x224/0x430 (?:?) kasan_report+0xe0/0x110 (?:?) __f2fs_get_acl+0x5/0x7e0 (fs/f2fs/acl.c:169) __get_acl+0x281/0x380 (?:?) vfs_get_acl+0x10b/0x190 (?:?) do_get_acl+0x2a/0x410 (?:?) do_get_acl+0x9/0x410 (?:?) do_getxattr+0xe8/0x260 (?:?) filename_getxattr+0xd1/0x140 (?:?) do_getname+0x2d/0x2d0 (?:?) path_getxattrat+0x16c/0x200 (?:?) lock_release+0xc8/0x290 (?:?) cgroup_update_frozen+0x9d/0x320 (?:?) lockdep_hardirqs_on_prepare+0xea/0x1a0 (?:?) 
trace_hardirqs_on+0x1a/0x170 (?:?) _raw_spin_unlock_irq+0x28/0x50 (?:?) do_syscall_64+0x115/0x6a0 (arch/x86/entry/syscall_64.c:87) entry_SYSCALL_64_after_hwframe+0x77/0x7f (?:?) Cc: stable@kernel.org Fixes: af48b85b8cd3 ("f2fs: add xattr and acl functionalities") Assisted-by: Codex:gpt-5.5 Signed-off-by: Zhang Cen Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman commit 888d94cc9afbf1b81b76be1c783c3d9f8f338904 Author: Sunmin Jeong Date: Mon Jun 22 14:28:17 2026 +0900 f2fs: fix to round down start offset of fallocate for pin file commit 4275b59673eb60b02eec3997816c83f1f4b909c4 upstream. Currently, the length of fallocate for pin file is section-aligned to keep allocated sections from being selected as victims of GC. However, for the case that the start offset of fallocate is not aligned in section, the allocated sections can't be fully utilized. It's because a new section is allocated by f2fs_allocate_pinning_section() after using blks_per_sec blocks regardless of the start offset. As a result, several unexpected dirty segments may be created, including blocks assigned to the pinned file. To address this issue, let's round down the start offset of fallocate to the length of section. The reproducing scenario is as below chunk=$(((2<<20)+4096)) # 2MB + 4KB touch test f2fs_io pinfile set test f2fs_io fallocate 0 0 $chunk test f2fs_io fallocate 0 $chunk $chunk test f2fs_io fallocate 0 $((chunk*2)) $chunk test f2fs_io fiemap 0 $((chunk*3)) test Fiemap: offset = 0 len = 12288 logical addr. physical addr. 
length flags 0 0000000000000000 000000068c600000 0000000000400000 00001088 1 0000000000400000 000000003d400000 0000000000001000 00001088 2 0000000000401000 00000003eb200000 0000000000200000 00001088 3 0000000000601000 00000005e4200000 0000000000001000 00001088 4 0000000000602000 0000000605400000 0000000000200000 00001089 Cc: stable@vger.kernel.org Fixes: f5a53edcf01e ("f2fs: support aligned pinned file") Reviewed-by: Yunji Kang Reviewed-by: Yeongjin Gil Reviewed-by: Sungjong Seo Signed-off-by: Sunmin Jeong Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman commit 77f216ff9ce5cde8eed9f6d12707e906dffdc9f7 Author: Wenjie Qi Date: Thu May 21 11:16:18 2026 +0800 f2fs: validate compress cache inode only when enabled commit 5073c66a96a9c23c0c2533ed4ed06e42f9021208 upstream. F2FS_COMPRESS_INO() uses NM_I(sbi)->max_nid as the synthetic inode number for the compressed page cache inode. That inode only exists when the compress_cache mount option is enabled. When compress_cache is disabled, max_nid is outside the valid inode range. A corrupted directory entry that points to ino == max_nid should therefore be rejected by f2fs_check_nid_range(). However, is_meta_ino() currently treats F2FS_COMPRESS_INO() as a meta inode unconditionally, so f2fs_iget() bypasses do_read_inode() and its nid range check, and instantiates a fake internal inode instead. Gate the compressed cache inode case on COMPRESS_CACHE, matching f2fs_init_compress_inode(). With compress_cache disabled, ino == max_nid now follows the normal inode path and is rejected as an out-of-range nid. 
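The gating described above can be sketched as a small standalone model. The struct and function below are hypothetical reductions for illustration (the real helper takes the f2fs superblock info and tests the mount option); the only point shown is that max_nid counts as an internal inode number only when the compress cache is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of the superblock info used by this sketch. */
struct sb_model {
	uint32_t node_ino;
	uint32_t meta_ino;
	uint32_t max_nid;     /* doubles as the compress cache inode number */
	bool compress_cache;  /* models the compress_cache mount option */
};

/* Return true if ino names an internal (meta) inode in this model. */
static bool is_meta_ino_model(const struct sb_model *sb, uint32_t ino)
{
	assert(sb != NULL);
	if (ino == sb->node_ino || ino == sb->meta_ino)
		return true;
	/* Only treat max_nid as internal when the cache inode can exist. */
	return sb->compress_cache && ino == sb->max_nid;
}
```

With `compress_cache` false, a lookup for `max_nid` returns false here, which in the real code means the inode number falls through to the normal path and its nid range check.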
Cc: stable@kernel.org Fixes: 6ce19aff0b8c ("f2fs: compress: add compress_inode to cache compressed blocks") Signed-off-by: Wenjie Qi Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman commit 8aad54746c251f2c2370118df766c0c82e2d2091 Author: Wenjie Qi Date: Tue May 26 13:35:57 2026 +0800 f2fs: validate orphan inode entry count commit 846c499a65816d13f1186e3090e825e8bb8bcb8b upstream. f2fs_recover_orphan_inodes() trusts the orphan block entry_count when replaying orphan inodes from the checkpoint pack. A corrupted entry_count larger than F2FS_ORPHANS_PER_BLOCK makes the recovery loop read past the ino[] array and interpret footer or following data as inode numbers. On a crafted image, mounting an unpatched kernel can drive orphan recovery into f2fs_bug_on() and panic the kernel. Validate entry_count before consuming entries so corrupted checkpoint data fails the mount with -EFSCORRUPTED and requests fsck instead. Set ERROR_INCONSISTENT_ORPHAN as well, so the corruption reason can be recorded in the superblock s_errors[] field. This gives fsck a persistent hint even though mount-time orphan recovery failure may leave no chance to persist SBI_NEED_FSCK through a checkpoint. Cc: stable@kernel.org Fixes: 127e670abfa7 ("f2fs: add checkpoint operations") Signed-off-by: Wenjie Qi Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman commit 1e48fefac682c5ee133add84d1f06d458fedf635 Author: Wenjie Qi Date: Wed May 20 20:07:05 2026 +0800 f2fs: pass correct iostat type for single node writes commit fcb05c26c2a67953b420739b85f49386efc9b6c0 upstream. f2fs_write_single_node_folio() takes an io_type argument, but still passes FS_GC_NODE_IO to __write_node_folio() unconditionally. This was harmless while the helper was only used by f2fs_move_node_folio(), whose caller passes FS_GC_NODE_IO. 
However, commit fe9b8b30b971 ("f2fs: fix inline data not being written to disk in writeback path") made f2fs_inline_data_fiemap() call the helper with FS_NODE_IO for FIEMAP_FLAG_SYNC. Honor the caller supplied io_type so inline-data FIEMAP sync writeback is accounted as normal node IO instead of GC node IO, while the GC path continues to pass FS_GC_NODE_IO explicitly. Cc: stable@kernel.org Fixes: fe9b8b30b971 ("f2fs: fix inline data not being written to disk in writeback path") Signed-off-by: Wenjie Qi Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Greg Kroah-Hartman commit 1de92789ce31e46fa7e7d8e89c90b19cdb1c103b Author: Junrui Luo Date: Thu Apr 2 14:48:07 2026 +0800 wifi: iwlwifi: mld: validate sta_mask before ffs() in BA session handlers commit f056fc2b927448d37eca6b6cacc3d1b0f67b20d2 upstream. Three BA session handlers use ffs(ba_data->sta_mask) - 1 to derive a station ID without checking that sta_mask is non-zero. When sta_mask is zero, ffs() returns 0 and the subtraction wraps to 0xFFFFFFFF, causing an out-of-bounds access on fw_id_to_link_sta[]. Add WARN_ON_ONCE(!ba_data->sta_mask) guards before each ffs() call, consistent with the existing check in iwl_mld_ampdu_rx_start(). Reported-by: Yuhao Jiang Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo Link: https://patch.msgid.link/SYBPR01MB788115C6CE873271A9A15A25AF51A@SYBPR01MB7881.ausprd01.prod.outlook.com Signed-off-by: Miri Korenblit Signed-off-by: Greg Kroah-Hartman commit b0b07e04f0c7219bd1a3eb15e22bdf9109f0d393 Author: Junjie Cao Date: Thu Feb 12 20:50:35 2026 +0800 wifi: iwlwifi: mld: fix race condition in PTP removal commit e1fc08598aa34b28359831e768076f56632720c1 upstream. iwl_mld_ptp_remove() calls cancel_delayed_work_sync() only after ptp_clock_unregister() and clearing ptp_data state (ptp_clock, last_gp2, wrap_counter). 
This creates a race where the delayed work iwl_mld_ptp_work() can execute between ptp_clock_unregister() and cancel_delayed_work_sync(), observing partially cleared PTP state. Move cancel_delayed_work_sync() before ptp_clock_unregister() to ensure the delayed work is fully stopped before any PTP cleanup begins. Cc: stable@vger.kernel.org Reviewed-by: Simon Horman Reviewed-by: Vadim Fedorenko Signed-off-by: Junjie Cao Link: https://patch.msgid.link/20260212125035.1345718-2-junjie.cao@intel.com Signed-off-by: Miri Korenblit Signed-off-by: Greg Kroah-Hartman commit df626f284cb90b2566bd296d04e134a55c8d3012 Author: Junjie Cao Date: Thu Feb 12 20:50:34 2026 +0800 wifi: iwlwifi: mvm: fix race condition in PTP removal commit 65150c9cc3e06ab54bc4e8134a47f6f5d095a4e3 upstream. iwl_mvm_ptp_remove() calls cancel_delayed_work_sync() only after ptp_clock_unregister() and clearing ptp_data state (ptp_clock, ptp_clock_info, last_gp2). This creates a race where the delayed work iwl_mvm_ptp_work() can execute between ptp_clock_unregister() and cancel_delayed_work_sync(), observing partially cleared PTP state. Move cancel_delayed_work_sync() before ptp_clock_unregister() to ensure the delayed work is fully stopped before any PTP cleanup begins. Cc: stable@vger.kernel.org Reviewed-by: Simon Horman Reviewed-by: Vadim Fedorenko Signed-off-by: Junjie Cao Link: https://patch.msgid.link/20260212125035.1345718-1-junjie.cao@intel.com Signed-off-by: Miri Korenblit Signed-off-by: Greg Kroah-Hartman commit 200d58c851b8f63f77a05570072dd20f79bc3681 Author: Luka Gejak Date: Mon May 18 16:23:11 2026 +0200 wifi: rtw88: usb: fix memory leaks on USB write failures commit 6b964941bbfe6e0f18b1a5e008486dbb62df440a upstream. When rtw_usb_write_port() fails to submit a USB Request Block (URB) (e.g., due to device disconnect or ENOMEM), the completion callback is never executed. Currently, the driver ignores the return value of rtw_usb_write_port() in rtw_usb_write_data() and rtw_usb_tx_agg_skb(). 
Because these functions rely on the completion callback to free the socket buffers (skbs) and the transaction control block (txcb), a submission failure results in: 1. A memory leak of the allocated skb in rtw_usb_write_data(). 2. A memory leak of the txcb structure and all aggregated skbs in rtw_usb_tx_agg_skb(). Fix this by checking the return value of rtw_usb_write_port(). If it fails, explicitly free the skb in rtw_usb_write_data(), and properly purge the tx_ack_queue and free the txcb in rtw_usb_tx_agg_skb(). The issue was discovered in practice during device disconnect/reconnect scenarios and memory pressure conditions. Tested by verifying normal TX operation continues after the fix without regressions. Fixes: a82dfd33d123 ("wifi: rtw88: Add common USB chip support") Cc: stable@vger.kernel.org Acked-by: Ping-Ke Shih Tested-by: Luka Gejak Signed-off-by: Luka Gejak Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20260518142311.10328-2-luka.gejak@linux.dev Signed-off-by: Greg Kroah-Hartman commit 73d427d271f7af6a738feab3368460331821ab6d Author: Luka Gejak Date: Mon May 18 16:23:10 2026 +0200 wifi: rtw88: increase TX report timeout to fix race condition commit c80788f7c5aed8d420366b821f867a8a353d83a5 upstream. The driver expects the firmware to report TX status within 500ms. However, a timeout can be triggered when the hardware performs background scans while under TX load. During these scans, the firmware stays off-channel for periods exceeding 500ms, delaying the delivery of TX reports back to the driver. When this occurs, the purge timer fires prematurely and drops the tracking skbs from the queue. This results in the host stack interpreting the missing status as packet loss, leading to TCP window collapse. In testing with iperf3, this causes throughput to drop from ~90 Mbps to near-zero for approximately 2 seconds until the connection recovers. Increase RTW_TX_PROBE_TIMEOUT to 2500ms for RTL8723DU. 
This duration is sufficient to accommodate off-channel dwell time during full background scans, ensuring the purge timer only trips during genuine firmware lockups and preventing unnecessary TCP retransmission cycles. Fixes: a82dfd33d123 ("wifi: rtw88: Add common USB chip support") Cc: stable@vger.kernel.org Acked-by: Ping-Ke Shih Tested-by: Luka Gejak Signed-off-by: Luka Gejak Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/20260518142311.10328-1-luka.gejak@linux.dev Signed-off-by: Greg Kroah-Hartman commit 0aeb4d3ff6ced464b342bca5c772e7502f3cad32 Author: Bitterblue Smith Date: Sat Apr 25 22:32:58 2026 +0300 wifi: rtlwifi: rtl8821ae: Fix C2H bit location in RX descriptor commit 83d38df6929118c3f996b9e3351c2d5014073d87 upstream. Bit 28 of double word 2 in the RX descriptor indicates whether the packet is a normal 802.11 frame or a message from the wifi firmware to the driver (Card 2 Host). Commit f5678bfe1cdc ("rtlwifi: rtl8821ae: Replace local bit manipulation macros") mistakenly made the driver look for this bit in double word 1, causing packet loss and Bluetooth coexistence problems. Fixes: f5678bfe1cdc ("rtlwifi: rtl8821ae: Replace local bit manipulation macros") Cc: Signed-off-by: Bitterblue Smith Acked-by: Ping-Ke Shih Signed-off-by: Ping-Ke Shih Link: https://patch.msgid.link/04da7398-cedb-425a-a810-5772ab10139d@gmail.com Signed-off-by: Greg Kroah-Hartman commit 40aa3c2b0cb8e34e0576fc94cc70e4e33db03c0a Author: Jose Ignacio Tornos Martinez Date: Mon Apr 20 13:01:29 2026 +0200 wifi: ath11k: fix warning when unbinding commit 8b7a26b6681922a38cd5a7829ace61f8e54df9b7 upstream. If there is an error during some initialization related to firmware, the buffers dp->tx_ring[i].tx_status are released. However, they are released again when the device is unbound (ath11k_pci), and we get: WARNING: CPU: 0 PID: 6231 at mm/slub.c:4368 free_large_kmalloc+0x57/0x90 Call Trace: free_large_kmalloc ath11k_dp_free ath11k_core_deinit ath11k_pci_remove ...
The issue is always reproducible from a VM because the MSI addressing initialization is failing. In order to fix the issue, just set the buffers to NULL after releasing in order to avoid the double free. Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") Cc: stable@vger.kernel.org Signed-off-by: Jose Ignacio Tornos Martinez Reviewed-by: Baochen Qiang Reviewed-by: Rameshkumar Sundaram Link: https://patch.msgid.link/20260420110130.509670-1-jtornosm@redhat.com Signed-off-by: Jeff Johnson Signed-off-by: Greg Kroah-Hartman commit a7cdc384c9c57506df62e7ff04058eff61bd1d0f Author: ElXreno Date: Wed May 6 04:39:16 2026 +0300 wifi: mt76: mt7925: don't disable AP BSS when removing TDLS peer commit 37d65384aa6f9cbe45f4052b13b378af1aab3e95 upstream. On a STATION vif, removing a TDLS peer takes the mt7925_mac_sta_remove -> mt7925_mac_sta_remove_links path. The first loop in that function calls mt7925_mcu_add_bss_info(..., enable=false) for every link of the station being removed. For a non-MLO STATION vif there is exactly one link, link 0, whose bss_conf is the AP's. TDLS peers do not have their own bss_conf - they share the AP's BSS. The result is that every TDLS peer teardown sends a BSS_INFO_UPDATE with enable=0 for the AP's BSS to the firmware, which wipes the AP-side rate-control context. The connection stays associated and TX from the host still works at the negotiated rate, but the AP's downlink to us collapses to the lowest mandatory OFDM rate (HE-MCS 0 / 6 Mbit/s OFDM) and only slowly recovers as rate adaptation re-learns under sustained traffic. With brief or bursty traffic the link can stay at 6-72 Mbit/s indefinitely, requiring a manual reconnect. mt7925_mac_link_sta_remove() already guards its own mt7925_mcu_add_bss_info(..., false) call with "vif->type == NL80211_IFTYPE_STATION && !link_sta->sta->tdls". 
Add the equivalent guard at the top of the cleanup loop in mt7925_mac_sta_remove_links(), above the link_sta / link_conf / mlink / mconf lookups, so TDLS peer teardown skips the loop body entirely without doing the per-link work that would just be thrown away. Verified on mt7925e by triggering Samsung-S938B auto-TDLS via iperf3 and watching iw rx bitrate after teardown: Before: rx bitrate collapses to 6.0-72.0 Mbit/s, oscillates 17/72/ 137/288/432 Mbit/s for 30+ seconds, no full recovery without a manual reassoc. After: rx bitrate stays at 1200.9 Mbit/s HE-MCS 11 NSS 2 80 MHz across the entire TDLS lifecycle. bpftrace confirms a single mt7925_mcu_add_bss_info(enable=0) call per teardown before the fix; zero such calls after. Fixes: 3878b4333602 ("wifi: mt76: mt7925: update mt7925_mac_link_sta_[add, assoc, remove] for MLO") Cc: stable@vger.kernel.org Signed-off-by: ElXreno Assisted-by: Claude:claude-opus-4-7 bpftrace Link: https://patch.msgid.link/20260506-mt7925-tdls-fixes-v2-2-46aa826ba8bb@gmail.com Signed-off-by: Felix Fietkau Signed-off-by: Greg Kroah-Hartman commit 7e25b5e22c1f43f0d47ff487360e6aebd55d033a Author: Zenm Chen Date: Tue Apr 7 23:44:30 2026 +0800 wifi: mt76: mt76x2u: Add support for ELECOM WDC-867SU3S commit f4ce0664e9f0387873b181777891741c33e19465 upstream. Add the ID 056e:400a to the table to support an additional MT7612U adapter: ELECOM WDC-867SU3S. Compile tested only. Cc: stable@vger.kernel.org # 5.10.x Signed-off-by: Zenm Chen Acked-by: Lorenzo Bianconi Link: https://patch.msgid.link/20260407154430.9184-1-zenmchen@gmail.com Signed-off-by: Felix Fietkau Signed-off-by: Greg Kroah-Hartman commit ec1c9e8962555b1766d059dc02760645c2ff62ff Author: Mike Rapoport (Microsoft) Date: Wed May 13 11:14:16 2026 +0300 userfaultfd: ensure mremap_userfaultfd_fail() releases mmap_changing commit 0496a59745b0723ea74274db16fd5c8b1379b9a9 upstream. 
Sashiko says: mremap_userfaultfd_prep() increments ctx->mmap_changing to stall concurrent operations, but mremap_userfaultfd_fail() does not decrement it before dropping the context reference. If an mremap operation fails, ctx->mmap_changing remains elevated. This causes subsequent userfaultfd operations such as UFFDIO_COPY to fail with -EAGAIN. Decrement ctx->mmap_changing in mremap_userfaultfd_fail(). Link: https://sashiko.dev/#/patchset/20260430113512.115938-1-rppt@kernel.org Link: https://lore.kernel.org/20260513081416.495963-1-rppt@kernel.org Fixes: df2cc96e7701 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races") Signed-off-by: Mike Rapoport (Microsoft) Reviewed-by: David Hildenbrand (Arm) Cc: Al Viro Cc: Christian Brauner Cc: Jan Kara Cc: Peter Xu Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit 7216ce8cb12fee44e309503955bb83806b106129 Author: Shaomin Chen Date: Wed Jun 10 13:10:05 2026 +0300 keys: Pin request_key_auth payload in instantiate paths commit fd15b457a86939c38aa12116adabd8ff686c5e51 upstream.
    A: request_key()          B: KEYCTL_INSTANTIATE_IOV
    ================          =========================
    create auth key
    store rka in auth key
    wait for helper
                              get auth key
                              load rka from auth key
                              copy user payload
                              sleep on #PF
    helper completed
    detach and free rka
    destroy auth key
                              wake up
                              use rka->target_key
                              **USE-AFTER-FREE**
Give request_key_auth payloads a refcount. Take a payload reference while authkey->sem stabilizes the payload and revocation state. Hold that reference across the instantiate and reject paths. Drop the auth key owning reference from revoke and destroy. [jarkko: Replaced the first two paragraphs of text with an actual concurrency scenario.]
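The pinning idea above can be sketched with a plain reference count. This is a userspace model under stated assumptions, not the keys code: `struct payload`, `payload_new()`, `payload_get()` and `payload_put()` are hypothetical names, C11 atomics stand in for the kernel's refcount primitives, and the locking that the commit relies on while taking the reference is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the request_key_auth payload. */
struct payload {
	atomic_int refs;
	int target_key;
};

/* Allocate a payload holding one owning reference. */
static struct payload *payload_new(int target_key)
{
	struct payload *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	atomic_init(&p->refs, 1);
	p->target_key = target_key;
	return p;
}

/* Pin the payload so it outlives the owner's release. */
static struct payload *payload_get(struct payload *p)
{
	assert(p != NULL);
	atomic_fetch_add(&p->refs, 1);
	return p;
}

/* Drop one reference; returns true if this call freed the payload. */
static bool payload_put(struct payload *p)
{
	assert(p != NULL);
	if (atomic_fetch_sub(&p->refs, 1) == 1) {
		free(p);
		return true;
	}
	return false;
}
```

In terms of the scenario above, B would call `payload_get()` before it can sleep, so A's release of the owning reference no longer frees the memory that B reads after waking up.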
Cc: stable@vger.kernel.org # v5.10+ Fixes: b5f545c880a2 ("[PATCH] keys: Permit running process to instantiate keys") Reported-by: Shaomin Chen Closes: https://lore.kernel.org/r/20260519144403.436694-1-eeesssooo020@gmail.com Signed-off-by: Shaomin Chen Signed-off-by: Jarkko Sakkinen Signed-off-by: Greg Kroah-Hartman commit b11c1fa32667692a2c0566e10163758e786e430c Author: Jarkko Sakkinen Date: Mon Jun 1 23:11:54 2026 +0300 KEYS: fix overflow in keyctl_pkey_params_get_2() commit cb481e59ea6cae3b7796ac1d7a22b6b24c3f3c0b upstream. The length for the internal output buffer is calculated incorrectly, which can result in an overflow when a too-small buffer is provided. Fix the bug by allocating the internal output buffer with the maximum length of the cryptographic primitive instead of the caller-provided size. Link: https://lore.kernel.org/keyrings/20260531024914.3712130-1-jarkko@kernel.org/ Cc: stable@vger.kernel.org # v4.20+ Fixes: 00d60fd3b932 ("KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]") Reported-by: Alessandro Groppo Tested-by: Alessandro Groppo Signed-off-by: Jarkko Sakkinen Signed-off-by: Greg Kroah-Hartman commit 49d893b9cbcfc5802a32e53a64c6c6956670d65b Author: Konstantin Khorenko Date: Mon May 11 12:50:52 2026 +0200 gcov: use atomic counter updates to fix concurrent access crashes commit 56cb9b7d96b28a1173a510ab25354b6599ad3a33 upstream. GCC's GCOV instrumentation can merge global branch counters with loop induction variables as an optimization. In inflate_fast(), the inner copy loops get transformed so that the GCOV counter value is loaded multiple times to compute the loop base address, start index, and end bound. Since GCOV counters are global (not per-CPU), concurrent execution on different CPUs causes the counter to change between loads, producing inconsistent values and out-of-bounds memory writes.
The crash manifests during IPComp (IP Payload Compression) processing when inflate_fast() runs concurrently on multiple CPUs: BUG: unable to handle page fault for address: ffffd0a3c0902ffa RIP: inflate_fast+1431 Call Trace: zlib_inflate __deflate_decompress crypto_comp_decompress ipcomp_decompress [xfrm_ipcomp] ipcomp_input [xfrm_ipcomp] xfrm_input At the crash point, the compiler generated three loads from the same global GCOV counter (__gcov0.inflate_fast+216) to compute base, start, and end for an indexed loop. Another CPU modified the counter between loads, making the values inconsistent - the write went 3.4 MB past a 65 KB buffer. Add -fprofile-update=prefer-atomic to CFLAGS_GCOV at the global level in the top-level Makefile, guarded by a try-run compile test. The test compiles a minimal program with and without -fprofile-update=prefer-atomic using the full KBUILD_CFLAGS, then compares undefined symbols in the resulting object files. If prefer-atomic introduces new undefined references (such as __atomic_fetch_add_8 on i386 or __aarch64_ldadd8_relax on arm64 with outline-atomics), the flag is not added -- the kernel does not link against libatomic. On architectures where GCC inlines 64-bit atomic counter updates (x86_64, s390, ...) the test passes and the flag is enabled, preventing the compiler from merging counters with loop induction variables and fixing the observed concurrent-access crash. On architectures where the flag would introduce libatomic dependencies, it is silently omitted and behaviour is no worse than before this patch. Move the CFLAGS_GCOV block from its original position (before the arch Makefile include) to after the core KBUILD_CFLAGS assignments but before the scripts/Makefile.gcc-plugins include. This placement ensures the try-run test sees arch-specific flags (-m32, -march=, -mno-outline-atomics) while avoiding GCC plugin flags (-fplugin=) that would break the test on clean builds when plugin shared objects do not yet exist. 
Link: https://lore.kernel.org/20260511105052.417187-2-khorenko@virtuozzo.com Signed-off-by: Konstantin Khorenko Tested-by: Arnd Bergmann Tested-by: Peter Oberparleiter Reviewed-by: Peter Oberparleiter Cc: Masahiro Yamada Cc: Miguel Ojeda Cc: Mikhail Zaslonko Cc: Nathan Chancellor Cc: Pavel Tikhomirov Cc: Thomas Weißschuh Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit 2b7ec72786094a4f4abd9bade1170021f026c5ff Author: Arnd Bergmann Date: Tue May 26 12:18:41 2026 +0200 err.h: use __always_inline on all error pointer helpers commit 94bfc7f3b0c7c33331ba4ff6cc64ff309dfcbce8 upstream. While testing randconfig builds on s390, I came across a link failure with CONFIG_DMA_SHARED_BUFFER disabled: ERROR: modpost: "dma_buf_put" [drivers/iommu/iommufd/iommufd.ko] undefined! The problem here is that IS_ERR() is not inlined and dead code elimination fails as a consequence. The err.h helpers all turn into a trivial assignment of a bit mask and should never result in a function call, so force them to always be inline. This should generally result in better object code aside from avoiding the link failure above. Link: https://lore.kernel.org/20260526101851.2495110-1-arnd@kernel.org Signed-off-by: Arnd Bergmann Reviewed-by: Alexander Lobakin Reviewed-by: Nathan Chancellor Tested-by: Tamir Duberstein Cc: Alexander Gordeev Cc: Andriy Shevchenko Cc: Ansuel Smith Cc: Bjorn Andersson Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit 1fcca1260c6e74e2279661511cdaa0aa232e4f7e Author: Ard Biesheuvel Date: Thu Jun 4 17:11:56 2026 +0200 KVM: arm64: Omit tag sync on stage-2 mappings of the zero page commit 2986a625740599fe6e7635b0586fed2a95bcd1f7 upstream. Commit f620d66af316 ("arm64: mte: Do not flag the zero page as PG_mte_tagged") removed the PG_mte_tagged flag from the zero page, but missed a KVM code path that may set this flag on the zero page when it is used in a stage-2 CoW mapping of anonymous memory. 
So disregard the zero page explicitly in sanitise_mte_tags(). Fixes: f620d66af316 ("arm64: mte: Do not flag the zero page as PG_mte_tagged") Cc: stable@vger.kernel.org # 5.10.x Suggested-by: Catalin Marinas Signed-off-by: Ard Biesheuvel Reviewed-by: Catalin Marinas Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit 97e1044e79c5d6bdbc435e33980f52e6e1f5d65f Author: Usama Arif Date: Tue Jun 16 07:15:18 2026 -0700 block: invalidate cached plug timestamp after task switch commit fad156c2af227f42ca796cbb20ddc354a6dd9932 upstream. blk_time_get_ns() caches ktime_get_ns() in current->plug->cur_ktime and marks the task with PF_BLOCK_TS. That cache is only valid while the task keeps running; if the task is switched out, wall-clock time advances and the cached value must not be reused when the task runs again. The existing invalidation covers explicit plug flushes through __blk_flush_plug(), and the schedule() / rtmutex paths through sched_update_worker(). It does not cover in-kernel preemption paths such as preempt_schedule(), preempt_schedule_notrace(), and preempt_schedule_irq(), which enter __schedule(SM_PREEMPT) directly and return without calling sched_update_worker(). As a result, a task preempted while holding a plug with PF_BLOCK_TS set can reuse a stale plug->cur_ktime after it is scheduled back in. blk-iocost then consumes that stale timestamp through ioc_now(), producing stale vnow values for throttle decisions, and through ioc_rqos_done(), inflating on-queue time and feeding false missed-QoS samples into vrate adjustment. Move the schedule-side invalidation to finish_task_switch(), which runs for the scheduled-in task after every actual context switch regardless of which schedule entry point was used. Keep __blk_flush_plug() as the explicit flush/finish-plug invalidation path, and remove only the PF_BLOCK_TS handling from sched_update_worker(). 
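The caching rule described in the block commit above (a cached timestamp is valid only until the task is switched out) can be sketched as a small model. This is a toy Python illustration, not the kernel implementation: `Task`, `cached_ns` and `block_ts` are hypothetical stand-ins for the task, plug->cur_ktime and PF_BLOCK_TS, and the two functions only borrow the kernel names for readability.

```python
class Task:
    """Toy task carrying a cached timestamp and a validity flag."""

    def __init__(self):
        self.cached_ns = None      # models plug->cur_ktime
        self.block_ts = False      # models PF_BLOCK_TS


def blk_time_get_ns(task, clock):
    """Return the cached time if valid, otherwise sample the clock and cache it."""
    if not task.block_ts:
        task.cached_ns = clock()
        task.block_ts = True
    return task.cached_ns


def finish_task_switch(task):
    """Model of the hook that runs after every context switch: drop the cache."""
    task.cached_ns = None
    task.block_ts = False
```

Because the invalidation sits in the one place every schedule entry point passes through, the model does not need to distinguish voluntary scheduling from preemption.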
Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption") Cc: stable@vger.kernel.org Signed-off-by: Usama Arif Link: https://patch.msgid.link/20260616141604.328820-3-usama.arif@linux.dev Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 99e6c712cc300883b8cbf03347d5359ec1a4d6dd Author: Usama Arif Date: Tue Jun 16 07:15:17 2026 -0700 kernel/fork: clear PF_BLOCK_TS in copy_process() commit fd38b75c4b43295b10d69772a46d1c74dbd6fc81 upstream. PF_BLOCK_TS is only set in blk_time_get_ns() when current->plug is non-NULL, and blk_finish_plug() clears it via __blk_flush_plug() before NULLing the plug pointer. copy_process() breaks the invariant by inheriting PF_BLOCK_TS from the parent while resetting the child's plug to NULL. Clear PF_BLOCK_TS alongside that assignment so callers can rely on "PF_BLOCK_TS set implies current->plug != NULL" and dereference current->plug unguarded. Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption") Cc: stable@vger.kernel.org Signed-off-by: Usama Arif Link: https://patch.msgid.link/20260616141604.328820-2-usama.arif@linux.dev Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 0d35f9f194a858567a21017d69318a51e3a822b9 Author: Ian Bridges Date: Thu Jun 25 23:50:48 2026 -0500 fbdev: fix use-after-free in store_modes() commit 2c1c805c65fb7dc7524e20376d6987721e73a0b1 upstream. store_modes() replaces a framebuffer's modelist with modes from userspace. On success it frees the old modelist with fb_destroy_modelist(). Two fields still point into that freed list. One pointer is fb_display[i].mode, the mode a console is using. fbcon_new_modelist() moves these pointers to the new list. It only does so for consoles still mapped to the framebuffer. An unmapped console is skipped and keeps its stale pointer. Unbinding fbcon, for example, sets con2fb_map[i] to -1 but leaves fb_display[i].mode set. 
An FBIOPUT_VSCREENINFO ioctl with FB_ACTIVATE_INV_MODE later reaches fbcon_mode_deleted(). That function reads the stale fb_display[i].mode through fb_mode_is_equal(). The read is a use-after-free. The other pointer is fb_info->mode, the current mode. It is set through the mode sysfs attribute. store_modes() does not update fb_info->mode, so it is left pointing into the freed list. show_mode(), the attribute's read handler, dereferences the stale fb_info->mode through mode_string(). The read is a use-after-free. Clear both pointers before freeing the list. Commit a1f305893074 ("fbcon: Set fb_display[i]->mode to NULL when the mode is released") added the helper fbcon_delete_modelist(). It clears every fb_display[i].mode that points into a given list. So far it is called only from the unregister path. Call it from store_modes() too, and set fb_info->mode to NULL. Reported-by: syzbot+81c7c6b52649fd07299d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=81c7c6b52649fd07299d Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/ajjoDhAi2y4ArSlz@dev/ Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Ian Bridges Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman commit 81371dbd23601f67f01372817fdbab42c5601e43 Author: Koichiro Den Date: Wed Mar 4 11:05:27 2026 +0900 NTB: epf: Avoid pci_iounmap() with offset when PEER_SPAD and CONFIG share BAR commit d876153680e3d721d385e554def919bce3d18c74 upstream. When BAR_PEER_SPAD and BAR_CONFIG share one PCI BAR, the module teardown path ends up calling pci_iounmap() on the same iomem with some offset, which is unnecessary and triggers a kernel warning like the following: Trying to vunmap() nonexistent vm area (0000000069a5ffe8) WARNING: mm/vmalloc.c:3470 at vunmap+0x58/0x68, CPU#5: modprobe/2937 [...] 
Call trace: vunmap+0x58/0x68 (P) iounmap+0x34/0x48 pci_iounmap+0x2c/0x40 ntb_epf_pci_remove+0x44/0x80 [ntb_hw_epf] pci_device_remove+0x48/0xf8 device_remove+0x50/0x88 device_release_driver_internal+0x1c8/0x228 driver_detach+0x50/0xb0 bus_remove_driver+0x74/0x100 driver_unregister+0x34/0x68 pci_unregister_driver+0x34/0xa0 ntb_epf_pci_driver_exit+0x14/0xfe0 [ntb_hw_epf] [...] Fix it by unmapping only when PEER_SPAD and CONFIG use different BARs. Cc: stable@vger.kernel.org Fixes: e75d5ae8ab88 ("NTB: epf: Allow more flexibility in the memory BAR map method") Reviewed-by: Frank Li Signed-off-by: Koichiro Den Reviewed-by: Dave Jiang Signed-off-by: Jon Mason Signed-off-by: Greg Kroah-Hartman commit c3ca2631073b2cef06824fd2bfc452ff7a1023de Author: Ruslan Valiyev Date: Tue May 26 00:04:46 2026 +0200 apparmor: fix use-after-free in rawdata dedup loop commit 6f060496d03e4dc560a40f73770bd08335cb7a27 upstream. aa_replace_profiles() walks ns->rawdata_list to dedup the incoming policy blob against entries already attached to existing profiles. Per the kernel-doc on struct aa_loaddata, list membership does not hold a reference: profiles hold pcount, and when the last pcount drops, do_ploaddata_rmfs() is queued on a workqueue that takes ns->lock and removes the entry. Between dropping the last pcount and the workqueue running, an entry remains on the list with pcount == 0. aa_get_profile_loaddata() is an unconditional kref_get() on pcount, so when the dedup loop hits such an entry, refcount hardening reports refcount_t: addition on 0; use-after-free. inside aa_replace_profiles(), and the poisoned counter then trips "saturated" and "underflow" warnings on the subsequent uses of the same loaddata. Before commit a0b7091c4de4 ("apparmor: fix race on rawdata dereference") the dedup path used a get_unless_zero-style helper on a single counter, so the existing "if (tmp)" guard was meaningful.
The split-refcount refactor introduced aa_get_profile_loaddata(), which has plain kref_get() semantics, and the guard quietly became a no-op. Introduce aa_get_profile_loaddata_not0(), matching the existing _not0 convention used by aa_get_profile_not0(), and use it for the rawdata_list dedup lookup so dying entries are skipped. Reproduced on x86_64 with v7.1-rc5 in QEMU+KVM running Ubuntu 24.04 + stress-ng 0.17.06: stress-ng --apparmor 1 --klog-check --timeout 60s Without this patch the three refcount_t warnings fire within a few seconds. With it the same 60 s run is clean. Coverage is a smoke-test only; a longer soak with CONFIG_KASAN, CONFIG_KCSAN and CONFIG_PROVE_LOCKING would be welcome from anyone with the cycles. Fixes: a0b7091c4de4 ("apparmor: fix race on rawdata dereference") Reported-by: Colin Ian King Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221513 Cc: stable@vger.kernel.org Signed-off-by: Ruslan Valiyev Signed-off-by: John Johansen Signed-off-by: Greg Kroah-Hartman commit 4a69b83045d3195d5b9a9b053ad840ddb2998b4e Author: Bryam Vargas Date: Mon Jun 22 15:57:38 2026 -0500 apparmor: mediate the implicit connect of TCP fast open sendmsg commit 4d587cd8a72155089a627130bbd4716ec0856e21 upstream. sendmsg()/sendto() with MSG_FASTOPEN is a combination of connect(2) and write(2): it opens the connection in the SYN. apparmor_socket_sendmsg() only checks AA_MAY_SEND, so a profile that grants send but denies connect lets a confined task open an outbound TCP/MPTCP connection that connect(2) would have refused, bypassing connect mediation. Mediate the implicit connect when MSG_FASTOPEN is set and a destination is supplied. Add it to apparmor_socket_sendmsg() (not the shared aa_sock_msg_perm() helper, which recvmsg also uses) and call aa_sk_perm() directly, mirroring the selinux and tomoyo fixes. sk_is_tcp() does not cover MPTCP fast open, so the SOCK_STREAM/IPPROTO_MPTCP arm is explicit. 
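The mediation rule in the TCP fast open commit above reduces to a small decision function. The following is a toy Python sketch, not AppArmor code: `sendmsg_allowed` and the permission strings are hypothetical, the `MSG_FASTOPEN` constant is defined locally for illustration, and the TCP/MPTCP protocol check from the commit is deliberately left out.

```python
MSG_FASTOPEN = 0x20000000   # defined locally so the sketch is self-contained


def sendmsg_allowed(perms, flags, has_dest):
    """Decide whether a confined task may perform this sendmsg().

    perms:    set of permission names granted by the profile (hypothetical)
    flags:    sendmsg() flags
    has_dest: True when a destination address was supplied
    """
    if "send" not in perms:
        return False
    # Fast open with a destination performs an implicit connect,
    # so the connect permission has to be checked as well.
    if (flags & MSG_FASTOPEN) and has_dest:
        return "connect" in perms
    return True
```

A profile granting only send still passes for an ordinary sendmsg(), but is refused once the call would open a connection.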
Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)") Cc: stable@vger.kernel.org Signed-off-by: Bryam Vargas Signed-off-by: John Johansen Signed-off-by: Greg Kroah-Hartman commit 1697957eb0971d420dde42862b88eb43506a1105 Author: Maoyi Xie Date: Fri Jun 12 16:59:35 2026 +0800 net: ip_gre: require CAP_NET_ADMIN in the device netns for changelink commit 8165f7ff57d9667d2bb477ef6af83ede7fed4ad7 upstream. A tunnel changelink() operates on at most two netns, dev_net(dev) and the tunnel link netns t->net. They differ once the device is created in or moved to a netns other than the one the request runs in. The rtnl changelink path checks CAP_NET_ADMIN only against dev_net(dev), so a caller privileged there but not in t->net can rewrite a tunnel that lives in t->net. Add rtnl_dev_link_net_capable() next to rtnl_get_net_ns_capable() in net/core/rtnetlink.c. It requires CAP_NET_ADMIN in the link netns and is skipped when the link netns is dev_net(dev), where the rtnl path already checked it. The other patches in this series use the same helper. Gate ipgre_changelink() and erspan_changelink() with it, at the top of the op before any attribute is parsed, because the parsers update live tunnel fields first. ipgre_netlink_parms() sets t->collect_md before ip_tunnel_changelink() runs. Commit 8b484efd5cb4 ("ip6: vti: Use ip6_tnl.net in vti6_siocdevprivate().") added the same check on the ioctl path. This adds it on RTM_NEWLINK. 
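The capability rule for changelink described above can be written down as a short predicate. This is a toy Python sketch with hypothetical names (`dev_link_net_capable`, `admin_netns`); it only models the decision, assuming the caller has already passed the usual check against dev_net(dev).

```python
def dev_link_net_capable(admin_netns, dev_net, link_net):
    """Return True if changelink may proceed.

    admin_netns: set of netns names in which the caller holds CAP_NET_ADMIN
    dev_net:     netns the device lives in (already checked by the rtnl path)
    link_net:    netns the tunnel link lives in
    """
    if link_net == dev_net:
        # Same netns: the existing rtnl check already covered it.
        return True
    # Different netns: the caller must be privileged there too.
    return link_net in admin_netns
```

The interesting case is a caller that is privileged in the device netns only: the predicate returns False, so the operation is refused before any tunnel attribute is parsed.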
Reported-by: Xiao Liang Closes: https://lore.kernel.org/netdev/CABAhCOSzP1vaThGV35_VnsRCb=87_CPjPVsTHbq905k8A+BuUg@mail.gmail.com/ Fixes: b57708add314 ("gre: add x-netns support") Cc: stable@vger.kernel.org Signed-off-by: Maoyi Xie Reviewed-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20260612085941.3158249-2-maoyixie.tju@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 1acdd14c0990dd1cd4b6534f00366d2e6dfce05f Author: Yiming Qian Date: Wed Jun 10 06:21:36 2026 +0000 net: skmsg: preserve sg.copy across SG transforms commit 406e8a651a7b854c41fecd5117bb282b3a6c2c6b upstream. The sk_msg sg.copy bitmap is part of the scatterlist entry ownership state. A set bit tells sk_msg_compute_data_pointers() not to expose the entry through writable BPF ctx->data. This protects entries backed by pages that are not private to the sk_msg, such as splice-backed file page-cache pages. Several sk_msg transform paths move, copy, split, or compact msg->sg.data[] entries without moving the matching sg.copy bit. This can make an externally backed entry arrive at a new slot with a clear copy bit. A later SK_MSG verdict can then expose sg_virt(sge) as writable ctx->data and BPF stores can modify the original page cache. Keep sg.copy synchronized with sg.data[] whenever entries are transferred, shifted, split, or copied into a new sk_msg. Clear the bit when an entry is replaced by a newly allocated private page or freed. This covers the BPF pull/push/pop helpers, sk_msg_shift_left/right(), sk_msg_xfer(), and tls_split_open_record(), including the partial tail entry created during TLS open-record splitting. 
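Keeping a per-entry bitmap in step with its array, as the sk_msg commit above requires, can be illustrated with a toy model. This Python sketch is not the kernel code: `remove_entry` and `writable` are hypothetical helpers, and a plain integer stands in for the sg.copy bitmap.

```python
def remove_entry(entries, copy_bits, i):
    """Remove entry i and shift later entries left, moving their copy bits too.

    entries:   list of scatterlist entries (any objects)
    copy_bits: integer bitmap, bit k set means entry k is not private
    """
    low = copy_bits & ((1 << i) - 1)      # bits below i keep their position
    high = copy_bits >> (i + 1)           # bits above i move down by one
    return entries[:i] + entries[i + 1:], low | (high << i)


def writable(copy_bits, i):
    """An entry may be exposed as writable data only if its copy bit is clear."""
    return not (copy_bits >> i) & 1
```

If only the entries were shifted and the bitmap left alone, the externally backed entry would arrive in a slot whose bit is clear, which is the failure the commit describes.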
Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") Cc: stable@vger.kernel.org Reported-by: Yiming Qian Reported-by: Keenan Dong Signed-off-by: Yiming Qian Link: https://patch.msgid.link/20260610062137.49075-1-yimingqian591@gmail.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit bd968bdd568beacfdf98ec537a87527e85f1d0cf Author: Doruk Tan Ozturk Date: Tue May 26 20:37:26 2026 +0200 mac802154: llsec: add skb_cow_data() before in-place crypto commit 84a04eb5b210643bd67aab81ff805d32f62aa865 upstream. llsec_do_encrypt_unauth(), llsec_do_encrypt_auth(), llsec_do_decrypt_unauth(), and llsec_do_decrypt_auth() all perform in-place cryptographic transformations on skb data. They build a scatterlist with sg_init_one() pointing into the skb's linear data area and then pass the same scatterlist as both src and dst to the crypto API (e.g. crypto_skcipher_encrypt/decrypt, crypto_aead_encrypt/decrypt). On the RX path, __ieee802154_rx_handle_packet() clones the received skb before handing it to each subscriber via ieee802154_subif_frame(). The cloned skb shares the same underlying data buffer via reference counting. When llsec_do_decrypt() subsequently modifies this shared buffer in place, it corrupts data that other clones -- potentially belonging to other sockets or subsystems -- still reference. On the TX path, similar data sharing can occur when an skb's head has been cloned (skb_cloned() returns true). The fix is to call skb_cow_data() before performing any in-place crypto operation. skb_cow_data() ensures that the skb's data area is not shared: if the skb head is cloned or the data spans multiple fragments, it copies the data into a private buffer that can be safely modified in place. 
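Why in-place crypto on a shared buffer is harmful, and how a private copy avoids it, can be shown with a toy model. This Python sketch is only an analogy: `Skb`, `clone`, `cow` and `transform_in_place` are hypothetical names, the XOR stands in for a real cipher, and unlike skb_cow_data() the model always copies instead of copying only when the data is shared.

```python
class Skb:
    """Toy packet whose clones share one underlying buffer."""

    def __init__(self, buf):
        self.buf = buf                     # bytearray, possibly shared

    def clone(self):
        return Skb(self.buf)               # new header, same data

    def cow(self):
        self.buf = bytearray(self.buf)     # take a private copy of the data


def transform_in_place(skb, key, cow=True):
    """XOR every byte in place, optionally making the buffer private first."""
    if cow:
        skb.cow()
    for i in range(len(skb.buf)):
        skb.buf[i] ^= key
```

With `cow=False` the transformation is visible through every clone, which corresponds to the corruption of other subscribers' data described above.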
This is the same pattern used by: - ESP (net/ipv4/esp4.c, net/ipv6/esp6.c) - MACsec (drivers/net/macsec.c) - WireGuard (drivers/net/wireguard/receive.c) - TIPC (net/tipc/crypto.c) Without this guard, in-place crypto on shared skb data leads to: - Silent data corruption of other skb clones - Use-after-free when the crypto API scatterwalk writes through a page that has already been freed by another clone's kfree_skb() - Kernel crashes under concurrent 802.15.4 traffic with security enabled (KASAN/KMSAN reports slab-use-after-free) Found by 0sec (https://0sec.ai) using automated source analysis. Fixes: 4c14a2fb5d14 ("mac802154: add llsec decryption method") Fixes: 03556e4d0dbb ("mac802154: add llsec encryption method") Cc: stable@vger.kernel.org Reported-by: Doruk Tan Ozturk Closes: https://lore.kernel.org/linux-wpan/20260525161806.96158-1-doruk@0sec.ai/ Reviewed-by: Alexander Lobakin Signed-off-by: Doruk Tan Ozturk Closes: Link: https://lore.kernel.org/20260526183726.56100-1-doruk@0sec.ai Signed-off-by: Stefan Schmidt Signed-off-by: Greg Kroah-Hartman commit 0cfa78c050662784fc8e3ab26dbfd1dc632b2082 Author: Kuniyuki Iwashima Date: Wed Jul 1 09:53:06 2026 +0300 af_unix: Set gc_in_progress to true in unix_gc(). [ Upstream commit d82ba05263c69fa2437fe93e4e561cc40f4c03af ] Igor Ushakov reported that unix_gc() could run with gc_in_progress being false if the work is scheduled while running: Thread 1 Thread 2 Thread 3 -------- -------- -------- unix_schedule_gc() unix_schedule_gc() `- if (!gc_in_progress) `- if (!gc_in_progress) |- gc_in_progress = true | `- queue_work() | unix_gc() <----------------/ | | |- gc_in_progress = true ... `- queue_work() | | `- gc_in_progress = false | | unix_gc() <---------------------------------------------' | ... /* gc_in_progress == false */ | `- gc_in_progress = false unix_peek_fpl() relies on gc_in_progress not to confuse GC by MSG_PEEK. Let's set gc_in_progress to true in unix_gc(). 
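The interleaving in the af_unix diagram above can be replayed sequentially. This is a toy Python sketch with a hypothetical function name (`replay_two_gc_runs`); it is not the kernel code and only records which flag value each worker run observes.

```python
def replay_two_gc_runs(set_flag_in_worker):
    """Replay the diagram and return the flag value seen by each worker run."""
    observed = []

    # Thread 1 schedules GC: the flag is set and the work is queued.
    in_progress = True
    # Thread 2 schedules GC while the worker is already running: the flag
    # is already true, but the work item is queued a second time.

    for _ in range(2):                 # the worker therefore runs twice
        if set_flag_in_worker:
            in_progress = True         # the fix: set the flag in the worker
        observed.append(in_progress)
        in_progress = False            # each run clears the flag when done

    return observed
```

Without the fix the second run observes a false flag; with it both runs observe true.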
Fixes: 8b90a9f819dc ("af_unix: Run GC on only one CPU.") Reported-by: Igor Ushakov Signed-off-by: Kuniyuki Iwashima Link: https://patch.msgid.link/20260501073945.1884564-1-kuniyu@google.com Signed-off-by: Jakub Kicinski [ Add setting gc_in_progress in __unix_gc(). Keep the existing set in unix_gc() for wait_for_unix_gc() over-limit throttling. ] Signed-off-by: Igor Ushakov Signed-off-by: Sasha Levin commit 3c499851753a24d2e148d4e9ca51764c0c51554e Author: Jiajia Liu Date: Thu May 28 11:38:14 2026 +0800 wifi: mt76: add wcid publish check in mt76_sta_add commit 20b126920a259df4d7dcae19fcfe2c57a74d6b2e upstream. Since mt7925_mac_sta_add publishes wcid, add publish check in mt76_sta_add to avoid reinitializing the wcid->poll_list. Found dev->sta_poll_list corruption when using mt7925 and 7.1-rc4. According to the corruption information, prev->next was changed to itself. wlan0: disconnect from AP 90:fb:5d:94:8b:e3 for new auth to 90:fb:5d:94:8b:e2 wlan0: authenticate with 90:fb:5d:94:8b:e2 (local address=84:9e:56:9c:7e:6b) wlan0: send auth to 90:fb:5d:94:8b:e2 (try 1/3) slab kmalloc-8k start ffff8c80958a6000 pointer offset 4160 size 8192 list_add corruption. prev->next should be next (ffff8c808a7488f8), but was ffff8c80958a7040. (prev=ffff8c80958a7040). mt76_wcid_add_poll+0x95/0xd0 [mt76] mt7925_mac_add_txs.part.0+0xa5/0xe0 [mt7925_common] mt7925_rx_check+0xa7/0xc0 [mt7925_common] mt76_dma_rx_poll+0x50d/0x790 [mt76] mt792x_poll_rx+0x52/0xe0 [mt792x_lib] Signed-off-by: Jiajia Liu Link: https://patch.msgid.link/20260528033814.46418-1-liujiajia@kylinos.cn Signed-off-by: Felix Fietkau Signed-off-by: Greg Kroah-Hartman commit 5e658b9245a52d838ef93729a7bc07de8e19deb7 Author: Konstantin Komarov Date: Wed Jun 10 12:31:01 2026 +0200 ntfs3: reject direct userspace writes to reserved $LX* xattrs commit 5b08dccecf825cbf905f348bc6ccb497507e28e2 upstream. 
NTFS3 uses $LXUID, $LXGID, $LXMOD and $LXDEV as internal WSL permission metadata and reloads them into i_uid, i_gid and i_mode from ntfs_get_wsl_perm(). Because the empty-prefix xattr handler also lets file owners call setxattr() on these names directly, an unprivileged writer on a writable ntfs3 mount can plant root ownership and S_ISUID on their own file and gain euid 0 after inode reload. Reject direct userspace writes to the reserved $LX* names. Internal ntfs3 metadata updates are unchanged because ntfs_save_wsl_perm() writes them via ntfs_set_ea() directly. Signed-off-by: Zhen Yan [almaz.alexandrovich@paragon-software.com: added an additional check for non privileged users] Signed-off-by: Konstantin Komarov Signed-off-by: Greg Kroah-Hartman commit 77798d7be6ef71e72fb6fc8a2901bf74ebc9706f Author: Wongi Lee Date: Tue Jun 16 22:38:29 2026 +0900 ipv4: account for fraggap on the paged allocation path [ Upstream commit eca856950f7cb1a221e02b99d758409f2c5cec42 ] In __ip_append_data(), when the paged-allocation branch is taken, alloclen and pagedlen are computed as alloclen = fragheaderlen + transhdrlen; pagedlen = datalen - transhdrlen; datalen already includes fraggap, but the fraggap bytes carried over from the previous skb are copied into the new skb's linear area at offset transhdrlen by the subsequent skb_copy_and_csum_bits(). The linear area is therefore undersized by fraggap bytes while pagedlen is overstated by the same amount. The non-paged branch sets alloclen to fraglen, which already accounts for fraggap because datalen does. Bring the paged branch in line by adding fraggap to alloclen and subtracting it from pagedlen. After this adjustment, copy no longer collapses to -fraggap on the paged path, so remove the stale comment describing that old arithmetic. 
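The length arithmetic in the ipv4 commit above is easy to check with concrete numbers. The following is a toy Python sketch: the variable names follow the commit message, while the function names and the `fixed` parameter are hypothetical.

```python
def paged_lengths(fragheaderlen, transhdrlen, datalen, fraggap, fixed=True):
    """Compute (alloclen, pagedlen) for the paged-allocation branch."""
    alloclen = fragheaderlen + transhdrlen
    pagedlen = datalen - transhdrlen
    if fixed:
        # fraggap bytes are copied into the linear area, so account for
        # them there and not in the paged part.
        alloclen += fraggap
        pagedlen -= fraggap
    return alloclen, pagedlen


def linear_bytes_needed(fragheaderlen, transhdrlen, fraggap):
    """Bytes actually written to the linear area of the new skb."""
    return fragheaderlen + transhdrlen + fraggap
```

For example, with fragheaderlen = 20, transhdrlen = 0, fraggap = 8 and datalen = 1008, the old arithmetic yields a 20-byte linear area for 28 bytes of linear data, while the adjusted one yields 28 and moves the 8 bytes out of pagedlen. The sum of both lengths is unchanged.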
Fixes: 8eb77cc73977 ("ipv4: avoid partial copy for zc") Signed-off-by: Jungwoo Lee Signed-off-by: Wongi Lee Reviewed-by: Ido Schimmel Link: https://patch.msgid.link/ajFR1eLAIs42TN3g@DESKTOP-19IMU7U.localdomain Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 6374fb9edf72c67a118a2c214a0dddd04c921e0a Author: Wongi Lee Date: Tue Jun 16 22:46:17 2026 +0900 ipv6: account for fraggap on the paged allocation path commit 736b380e28d0480c7bc3e022f1950f31fe53a7c5 upstream. In __ip6_append_data(), when the paged-allocation branch is taken (MSG_MORE / NETIF_F_SG / large fraglen), alloclen and pagedlen are computed as alloclen = fragheaderlen + transhdrlen; pagedlen = datalen - transhdrlen; datalen already includes fraggap (datalen = length + fraggap). When fraggap is non-zero, this is not the first skb and transhdrlen is zero. The fraggap bytes carried over from the previous skb are copied just past the fragment headers in the new skb's linear area. The linear area is therefore undersized by fraggap bytes while pagedlen is overstated by the same amount, and the copy writes past skb->end into the trailing skb_shared_info. An unprivileged user can trigger this via a UDPv6 socket using MSG_MORE together with MSG_SPLICE_PAGES. The bad accounting was introduced by commit 773ba4fe9104 ("ipv6: avoid partial copy for zc"). Before commit ce650a166335 ("udp6: Fix __ip6_append_data()'s handling of MSG_SPLICE_PAGES"), the negative copy value caused -EINVAL to be returned. That later commit allowed MSG_SPLICE_PAGES to proceed in this case, making the corruption triggerable. The non-paged branch sets alloclen to fraglen, which already accounts for fraggap because datalen does. Bring the paged branch in line by adding fraggap to alloclen and subtracting it from pagedlen. After this adjustment, copy no longer collapses to -fraggap on the paged path, so remove the stale comment describing that old arithmetic. 
Since a negative copy is no longer expected for a valid MSG_SPLICE_PAGES case, remove the MSG_SPLICE_PAGES exception from the negative copy check. Fixes: 773ba4fe9104 ("ipv6: avoid partial copy for zc") Signed-off-by: Jungwoo Lee Signed-off-by: Wongi Lee Reviewed-by: Ido Schimmel Link: https://patch.msgid.link/ajFTqRljatR17fFy@DESKTOP-19IMU7U.localdomain Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 565ab66005b14e4d40f2ef7d36cc6baaf9725fb2 Author: Sven Eckelmann Date: Fri Jun 26 18:12:10 2026 +0200 batman-adv: tvlv: avoid race of cifsnotfound handler state commit edb557b2ba38fea2c5eb710cf366c797e187218c upstream. TVLV handlers can have the flag BATADV_TVLV_HANDLER_OGM_CIFNOTFND set to signal that the OGM handler should be called (with NULL for data) when the specific TVLV container was not found in the OGM. This is used by: * DAT * GW * Multicast (OGM + Tracker) The state whether the handler was executed was stored in the struct batadv_tvlv_handler. But the TVLV processing is started without any lock. Multiple parallel contexts processing TVLVs would therefore overwrite each others BATADV_TVLV_HANDLER_OGM_CALLED flag in the shared batadv_tvlv_handler. Drop the shared BATADV_TVLV_HANDLER_OGM_CALLED flag and instead determine, per TVLV buffer, whether a matching container was present by scanning the packet's buffer. Cc: stable@kernel.org Fixes: ef26157747d4 ("batman-adv: tvlv - basic infrastructure") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 4cc9f7711bb898f7af34eef26f615e9465ae250c Author: Sven Eckelmann Date: Fri Jun 26 18:12:09 2026 +0200 batman-adv: tvlv: enforce 2-byte alignment commit 32a6799255525d6ea4da0f7e9e0e521ad9560a46 upstream. The fields of an aggregated OGM(v2) are accessed assuming (at least) 2-byte alignment, so a following OGM must start at an even offset. As the header length is even, an odd tvlv_len would misalign it and trigger unaligned accesses on strict-alignment architectures. 
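The alignment argument above (even header length, so an odd tvlv_len misaligns the next OGM) can be sketched as a check over an aggregate. This is a toy Python model, not batman-adv code: `ogm_offsets` is a hypothetical helper, and the header length is passed in as a parameter, not taken from the real packet definition.

```python
def ogm_offsets(header_len, tvlv_lens):
    """Return the start offset of each aggregated OGM.

    Raises ValueError for an odd tvlv_len, since the following OGM
    would then start at an odd offset.
    """
    if header_len % 2:
        raise ValueError("model assumes an even header length")
    offsets = []
    offset = 0
    for tvlv_len in tvlv_lens:
        if tvlv_len % 2:
            raise ValueError("odd tvlv_len would misalign the next OGM")
        offsets.append(offset)
        offset += header_len + tvlv_len
    return offsets
```

With an even header length and only even TVLV lengths accepted, every returned offset is even by construction.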
Such a misaligned TVLV/OGM/OGMv2 is not created by a normal participant in the mesh. Therefore, reject such malformed packets. Cc: stable@kernel.org Fixes: ef26157747d4 ("batman-adv: tvlv - basic infrastructure") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 04e1a6557fbf8ad9385563ae79c5437cb5bf94c3 Author: Sven Eckelmann Date: Fri Jun 26 18:12:08 2026 +0200 batman-adv: dat: prevent false sharing between VLANs commit 20d7658b74169f86d4ac01b9185b3eadddf71f28 upstream. The local hash of DAT entries is supposed to be VLAN (VID) aware. But the adding to the hash and the search in the hash were not checking the VID information of the hash entries. The entries would therefore only be correctly separated when batadv_hash_dat() didn't select the same buckets for different VIDs. Cc: stable@kernel.org Fixes: be1db4f6615b ("batman-adv: make the Distributed ARP Table vlan aware") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 3f82fc92cf523a007a423c21b04613acf32aeaa3 Author: Sven Eckelmann Date: Fri Jun 26 18:12:07 2026 +0200 batman-adv: tt: track roam count per VID commit 12407d5f61c2653a64f2ff4b22f3c267f8420ef1 upstream. batadv_tt_check_roam_count() is supposed to track roaming of a TT entry. But TT entries are for a MAC + VID. The VID was completely missed and thus leads to incorrect detection of ROAM counts when a client MAC exists in multiple VLANs. Cc: stable@kernel.org Fixes: c018ad3de61a ("batman-adv: add the VLAN ID attribute to the TT entry") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 3470d583fc652c40db2bfd350e981e0e4716e189 Author: Sven Eckelmann Date: Fri Jun 26 18:12:06 2026 +0200 batman-adv: tt: don't merge change entries with different VIDs commit f08e06c2d5c3e2434e7c773f2213f4a7dce6bc1e upstream. batadv_tt_local_event() merges/cancels events for the same client which would conflict or be duplicates. The matching of the queued events only compares the MAC address - the VLAN ID stored in each event is ignored. 
If a MAC now appears on multiple VIDs, the two ADD change events (for VID 1 and VID 2) are merged into a single event for one VID. The remote can therefore not calculate the correct TT table and desyncs. A full translation table exchange is required to recover from this state. A check of VID is therefore necessary to avoid such wrong merges/cancels. Cc: stable@kernel.org Fixes: c018ad3de61a ("batman-adv: add the VLAN ID attribute to the TT entry") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit af5a069805f67957340cde27e349374be676e6d4 Author: Sven Eckelmann Date: Fri Jun 26 18:12:05 2026 +0200 batman-adv: tp_meter: handle overlapping packets commit cbde75c38b21f022891525078622587ad557b7c1 upstream. If the packet size changes during the transmission, retries of packets can overlap. In this case, precise comparisons of sequence numbers by the receiver would be wrong. It is then necessary to check whether the range from the start sequence number to the end sequence number ("seqno + length") contains a new range. If this is the case then this is enough to accept this packet. In all other cases, the packet still has to be dropped (and not acked). Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") [ Switch to pre-splitted tp_vars structure names ] Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit d511c72a83dd55adf90f318f4471285e605264f8 Author: Sven Eckelmann Date: Fri Jun 26 18:12:04 2026 +0200 batman-adv: tp_meter: prevent parallel modifications of last_recv commit 6dde0cfcb36e4d5b3de35b75696937478441eed4 upstream.
When last_recv is updated to store the last received sequence number, it is assumed that nothing modifies it in parallel while:
* the check for outdated packets is done
* the out-of-order check is performed (and packets are stored in the out-of-order queue)
* the out-of-order queue is searched for closed gaps
* the sequence number for the next ack is calculated
None of that was actually protected. It could therefore happen that last_recv was updated multiple times in parallel and the final sequence number was calculated with deltas which had no connection to the sequence number they were added to. Lock this whole region with the same lock which was already used to protect the unacked (out-of-order) list. Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") [ Switch to pre-splitted tp_vars structure names ] Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 1dafdd0794be1a63d2c45f9a32e0a9cd89f68478 Author: Sven Eckelmann Date: Fri Jun 26 18:12:03 2026 +0200 batman-adv: tp_meter: annotate last_recv_time access with READ/WRITE_ONCE commit d67c728f07fca2ee6ffdc6dd4421cf2e8691f4d1 upstream. The last_recv_time field for batadv_tp_receiver tracks the jiffies value of the most recent activity and is used to detect timeouts. These accesses are not consistently protected by a lock, so READ_ONCE/WRITE_ONCE must be used to prevent data races caused by compiler optimizations. Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 2233787658db859f0a9b83cb397cf783bb8be865 Author: Sven Eckelmann Date: Fri Jun 26 18:12:02 2026 +0200 batman-adv: tp_meter: restrict number of unacked list entries commit e7c775110e1858e5a7471a23a9c9658c0af9df89 upstream. When the unacked_list is unbounded, an attacker could send messages with small lengths and appropriate seqno + gaps to force the receiver to allocate more and more unacked_list entries.
In the end, this either causes an out-of-memory situation or increases the management overhead for the (large) list so much that significant portions of CPU cycles are wasted searching through the list. When limiting the list to a specific number, it is important to still correctly add a new entry to the list. But if the list became larger than the limit, the last entry of the list (with the highest seqno) must be dropped to still allow the earlier seqnos to finish and therefore to continue the process. Otherwise, the process might get stuck with too high seqnos which are not handled by batadv_tp_ack_unordered(). Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") [ Switch to pre-splitted tp_vars structure names ] Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 3d4548c96d6f21ac1a9b06c5f82f3ef439c87023 Author: Sven Eckelmann Date: Fri Jun 26 18:12:01 2026 +0200 batman-adv: v: prevent OGM aggregation on disabled hardif commit d11c00b95b2a3b3934007fc003dccc6fdcc061ad upstream. When an interface gets disabled, the worker is correctly disabled by batadv_hardif_disable_interface() -> ... -> batadv_v_ogm_iface_disable(). In this process, the skb aggr_list is also freed. But batadv_v_ogm_send_meshif() can still queue new skbs (via batadv_v_ogm_queue_on_if()) to the aggr_list. This will only stop after all cores can no longer find the RCU protected list of hard interfaces. These queued skbs will never be freed or consumed by batadv_v_ogm_aggr_work. The batadv_v_ogm_iface_disable() function must block batadv_v_ogm_queue_on_if() to avoid leaking skbs. Cc: stable@kernel.org Fixes: f89255a02f1d ("batman-adv: BATMAN_V: introduce per hard-iface OGMv2 queues") [ Context ] Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 44ae137a2aceff08cd226718ef077adca2638012 Author: Sven Eckelmann Date: Fri Jun 26 18:12:00 2026 +0200 batman-adv: frag: avoid underflow of TTL commit 493d9d2528e1a09b090e4b37f0f553def7bd5ce9 upstream.
Packets with a TTL are using it to limit the number of times the packet can be forwarded. But for batadv_frag_packet, the TTL was always only reduced but it was never evaluated. It could even underflow without any effect. Check the TTL in batadv_frag_skb_fwd() before attempting to prepare it for forwarding. This keeps it in sync with the non-fragmented unicast packet. Cc: stable@kernel.org Fixes: 610bfc6bc99b ("batman-adv: Receive fragmented packets and merge") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 116e94025f0f48cbcfc167b6dc925551fad1b2bd Author: Sven Eckelmann Date: Fri Jun 26 18:11:59 2026 +0200 batman-adv: frag: ensure fragment is writable before modifying TTL commit b7293c6e8c15b2db77809b25cf8389e35331b27a upstream. Before batman-adv is allowed to write to an skb, it either has to have its own copy of the skb or use skb_cow() to ensure that the data part is not shared. But batadv_frag_skb_fwd() modifies the TTL even when it is shared. Adding a skb_cow() right before this operation avoids this and can at the same time prepare it for the modifications required to forward the fragment. Cc: stable@kernel.org Fixes: 610bfc6bc99b ("batman-adv: Receive fragmented packets and merge") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 0473ae882624ae309399c3fff10cedc6d0cffbfd Author: Sven Eckelmann Date: Fri Jun 26 18:11:58 2026 +0200 batman-adv: fix (m|b)cast csum after decrementing TTL commit e728bbdf32660c8f32b8f5e8d09427a2c131ad60 upstream. The broadcast and multicast packets can be received at the same time by the local system and forwarded to other nodes. Both are simply decrementing the TTL at the beginning of the receive path - independent of the chosen path (receive/forward). But such a modification of the data conflicts with the hw csum. This is not a problem when the packet is directly forwarded but can cause errors in the local receive path. Such a problem can then trigger a "hw csum failure".
The receiver path must therefore ensure that the csum is fixed for each modification of the payload before batadv_interface_rx() is reached. Since all batman-adv packet types with a TTL have it as u8 at offset 2, a helper can be used for all of them. But it is only used at the moment for batadv_bcast_packet and batadv_mcast_packet because they are the only ones which deliver the packet locally but unconditionally modify the TTL. Cc: stable@kernel.org Fixes: 3f69339068f9 ("batman-adv: bcast: queue per interface, if needed") Fixes: 07afe1ba288c ("batman-adv: mcast: implement multicast packet reception and forwarding") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 49bf27fcd7ee4cf643d766fc91fffbf2b0134363 Author: Sven Eckelmann Date: Fri Jun 26 18:11:57 2026 +0200 batman-adv: ensure bcast is writable before modifying TTL commit 4cd6d3a4b96a8576f1fed8f9f9f17c2dc2978e0c upstream. Before batman-adv is allowed to write to an skb, it either has to have its own copy of the skb or use skb_cow() to ensure that the data part is not shared. The old implementation used a shared queue and created copies before attempting to write to it. But with the new implementation, the broadcast packet is already modified when it gets received, potentially writing to shared buffers in this process. Adding a skb_cow() right before this operation avoids this and can at the same time prepare it for the modifications required to rebroadcast the packet. Cc: stable@kernel.org Fixes: 3f69339068f9 ("batman-adv: bcast: queue per interface, if needed") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 646b68639c06bc44593216fd7e9433d48d5e96eb Author: Sven Eckelmann Date: Fri Jun 26 18:11:56 2026 +0200 batman-adv: gw: don't deselect gateway with active hardif commit df97a7107b16375a10a36d7a63e9b4291a8ac680 upstream. The batadv_hardif_cnt() was previously checking if there is a batadv_hard_iface->mesh_iface which has the same mesh_iface.
And since batadv_hardif_disable_interface() was resetting the batadv_hard_iface->mesh_iface after this check, it had to verify whether *1* interface was still part of the mesh_iface before it started the gateway deselection. But now that batadv_hardif_cnt() checks the lower interfaces of mesh_iface and batadv_hardif_disable_interface() has already removed the interface via netdev_upper_dev_unlink() earlier in this function, the check must make sure that *0* interfaces can be found by batadv_hardif_cnt() before the selected gateway is deselected. Otherwise the deselection would already happen one batadv_hard_iface too early. Because a 0 hardif count from batadv_hardif_cnt() is equal to an empty list, it is possible to replace the counting with a simple list_empty(). Cc: stable@kernel.org Fixes: 7dc284702bcd ("batman-adv: store hard_iface as iflink private data") Reviewed-by: Nora Schiffer Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 95a061f587b76a519ee17e3d406a37bc4eb63a50 Author: Sven Eckelmann Date: Fri Jun 26 18:11:55 2026 +0200 batman-adv: tp_meter: initialize last_recv_time during init commit 811cb00fa8cdc3f0a7f6eefc000a6888367c8c8f upstream. The last_recv_time is the most important indicator for a receiver session to figure out whether a session timed out or not. But this information was only initialized after the session was added to the tp_receiver_list and after the timer was started. In the worst case, the timer (function) could have tried to access this information before the actual initialization was reached. Like the rest of the variables of the tp_meter receiver session, this field has to be filled out before any other (parallel running) context has the chance to access it.
Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") [ Context ] Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 75612c100a9e2fe9df718056b03f544ec5b0623e Author: Sven Eckelmann Date: Fri Jun 26 18:11:54 2026 +0200 batman-adv: prevent ELP transmission interval underflow commit 5e50d4b8ae3ea622122d3c6a38d7f6fe68dfddca upstream. batadv_v_elp_start_timer() enqueues a delayed work. The time when it starts is randomly chosen between (elp_interval - BATADV_JITTER) and (elp_interval + BATADV_JITTER). The configured elp_interval must therefore be larger than or equal to BATADV_JITTER to avoid an underflow of the unsigned integer. If this happened, a "fast" ELP interval would turn into a "day long" delay. At the same time, it must not be larger than the maximum value the variable can store. Cc: stable@kernel.org Fixes: a10800829040 ("batman-adv: Add elp_interval hardif genl configuration") [ Context ] Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 43733e5b525fbee8e60903b57d1a8cf6498e762d Author: Sven Eckelmann Date: Fri Jun 26 18:11:53 2026 +0200 batman-adv: bla: annotate lasttime access with READ/WRITE_ONCE commit 98b0fb191c878a64cbaebfe231d96d57576acf8c upstream. The lasttime field for claim, backbone_gw, and loopdetect tracks the jiffies value of the most recent activity and is used to detect timeouts. These accesses are not consistently protected by a lock, so READ_ONCE/WRITE_ONCE must be used to prevent data races caused by compiler optimizations. Cc: stable@kernel.org Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 23d085bd63086457ae82ab92e9e2d8ec895e2774 Author: Sven Eckelmann Date: Fri Jun 26 18:11:52 2026 +0200 batman-adv: tp_meter: add only finished tp_vars to lists commit 15ccbf685222274f5add1387af58c2a41a95f81e upstream.
When the receiver variables (aka "session") are initialized, then they are added to the list of sessions before the timer is set up. A RCU protected reader could therefore find the entry and run mod_setup before batadv_tp_init_recv() finished the timer initialization. The same is true for batadv_tp_start(), which must first initialize the finish_work and the test_length to avoid a similar problem. Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit b8bf8400e50cbab595843241f63be3355deb1ca8 Author: Sven Eckelmann Date: Fri Jun 26 18:11:51 2026 +0200 batman-adv: tp_meter: handle seqno wrap-around for fast recovery detection commit f54c85ed42a1b27a516cf2a4728f5a612b799e07 upstream. The recover variable and the last_sent sequence number are initialized on purpose as a really high value which will wrap-around after the first 2000 bytes. The fast recovery precondition must therefore not use simple integer comparisons but use helpers which are aware of the sequence number wrap-arounds. Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 1db02f3e315da800720e2e14b4a9c8ffe14e8cbd Author: Sven Eckelmann Date: Fri Jun 26 18:11:50 2026 +0200 batman-adv: tp_meter: fix fast recovery precondition commit 2b0d08f08ed3b2174f05c43089ec65f3543a025b upstream. The fast recovery precondition checks if the recover (initialized to BATADV_TP_FIRST_SEQ) is bigger than the received ack. But since recover is only updated when this check is successful, it will never enter the fast recovery mode. 
According to RFC6582 Section 3.2 step 2, the check should actually be different:
> When the third duplicate ACK is received, the TCP sender first checks the value of recover to see if the Cumulative Acknowledgment field covers more than recover
The precondition must therefore check if recover is smaller than the received ack - basically swapping the operands of the current check. Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 7d2a44bc6bbe39aed03c68864aa0e54e04a50278 Author: Sven Eckelmann Date: Fri Jun 26 18:11:49 2026 +0200 batman-adv: tp_meter: avoid divide-by-zero for dec_cwnd commit 33ccd52f3cc9ed46ce395199f89aa3234dc83314 upstream. The cwnd is always MSS <= cwnd <= 0x20000000. But the calculation in batadv_tp_update_cwnd() assumes unsigned 32 bit arithmetic: ((mss * 8) ** 2) / (cwnd * 8) In case cwnd is actually 0x20000000, it will be shifted left by 3 bits and end up at 0x100000000 or U32_MAX + 1. It will therefore wrap around and be 0 - resulting in: ((mss * 8) ** 2) / 0 This is of course invalid and cannot be calculated. The calculation must be simplified to avoid this overflow: (mss ** 2) * 8 / cwnd It will keep the precision enhancement from the scaling (by 8) but avoid the overflow in the divisor. In theory, there could still be an overflow in the dividend. The mss is at the moment fixed to BATADV_TP_PLEN in batadv_tp_recv_ack() - so it is not an imminent problem. But allowing it to use the whole u32 range would mean that the dividend can still use up to 67 bits. To keep this calculation safe for 32 bit arithmetic, mss must never use more than floor((32 - 3) / 2) bits - or in other words: must never be larger than 16383.
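The wrap-around described above can be reproduced in user space. The following is only a sketch of the arithmetic, not the kernel code: dec_cwnd_old() and dec_cwnd_new() are hypothetical helper names, and the mss value of 1400 in the usage note is an arbitrary illustrative number rather than the real BATADV_TP_PLEN.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the old formula ((mss * 8) ** 2) / (cwnd * 8)
 * in 32 bit unsigned arithmetic. Returns -1 when the divisor wrapped to 0,
 * which is the case the kernel would have hit as a divide-by-zero.
 */
static int dec_cwnd_old(uint32_t mss, uint32_t cwnd, uint32_t *out)
{
	uint32_t divisor = cwnd << 3; /* 0x20000000 << 3 wraps to 0 */

	if (divisor == 0)
		return -1;

	*out = ((mss << 3) * (mss << 3)) / divisor;
	return 0;
}

/* Hypothetical helper for the simplified formula (mss ** 2) * 8 / cwnd.
 * The divisor is no longer scaled, so it cannot wrap. The dividend stays
 * within 32 bits as long as mss fits in 14 bits (mss <= 16383).
 */
static uint32_t dec_cwnd_new(uint32_t mss, uint32_t cwnd)
{
	assert(cwnd != 0);
	assert(mss <= 16383u);

	return (mss * mss * 8u) / cwnd;
}
```

With mss = 1400 and cwnd = 1400, both helpers yield 11200. With cwnd = 0x20000000, dec_cwnd_old() reports the wrapped divisor while dec_cwnd_new() simply returns 0.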
Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 8e77fe0414f5c7c956ea82615975d43eab018c25 Author: Sven Eckelmann Date: Fri Jun 26 18:11:48 2026 +0200 batman-adv: tp_meter: avoid window underflow commit 765947b81fb54b6ebb0bc1cfe55c0fa399e002b8 upstream. In batadv_tp_avail(), win_left is calculated with 32-bit unsigned arithmetic: win_left = win_limit - tp_vars->last_sent; During Fast Recovery, cwnd is inflated and last_sent advances rapidly. When Fast Recovery ends, cwnd drops abruptly back to ss_threshold. If the newly shrunk win_limit is less than last_sent, the unsigned subtraction will underflow, wrapping to a massive positive value. Instead of returning that the window is full (unavailable), it returns that the sender can continue sending. To handle this situation, the window's end sequence number (win_limit) has to be compared with the last sent sequence number. If it is before the last sent sequence number, then more acks are needed before the transmission can be started again. Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 7cb88d91d5f9f9a2842e0ec0622d01bdfaa0511a Author: Sven Eckelmann Date: Fri Jun 26 18:11:47 2026 +0200 batman-adv: tp_meter: initialize dec_cwnd explicitly commit febfb1b86224489535312296ecfa3d4bf467f339 upstream. When batadv_tp_update_cwnd() is called, dec_cwnd is increased. But dec_cwnd is only initialized (to 0) when a duplicate Ack was received or when cwnd is below the ss_threshold. Just initialize dec_cwnd during the initialization to avoid any potential access of uninitialized data.
Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 696c4cae872cca59f51b1b5a0f8888d22ccb47a2 Author: Sven Eckelmann Date: Fri Jun 26 18:11:46 2026 +0200 batman-adv: tp_meter: initialize dup_acks explicitly commit b2b68b32a715e0328662801576974aa37b942b00 upstream. When an ack with a sequence number equal to the last_acked is received, the dup_acks counter is increased to decide whether fast retransmit should be performed. Only when the sequence numbers are not equal is dup_acks set to the initial value (0). But if the initial packet would have the sequence number BATADV_TP_FIRST_SEQ, dup_acks would not be initialized and atomic_inc would operate on an undefined starting value. It is therefore required to have it explicitly initialized during the start of the sender session. Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit 1c5a1268418e8a2e5ab652a3adafac6c1e5e96b8 Author: Sven Eckelmann Date: Fri Jun 26 18:11:45 2026 +0200 batman-adv: tp_meter: keep unacked list in ascending ordered commit 5aa8651527ea0b610e7a09fb3b8204c1398b9525 upstream. When batadv_tp_handle_out_of_order inserts a new entry in the list of unacked (out of order) packets, it searches from the entry with the newest sequence number towards the oldest sequence number. If an entry is found which is older than the new entry, the new entry has to be added after the found one to keep the ascending order. But list_add_tail() was used for this operation, and this function adds an entry _before_ another one. As a result, the list would contain a lot of swapped sequence numbers. The consumer of this list (batadv_tp_ack_unordered()) would then fail to correctly ack packets.
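The difference between inserting before and after the found entry can be sketched in user space. This is a simplified illustration, not the batman-adv code: the list helpers only imitate the semantics of the kernel's list_add() and list_add_tail(), while struct unacked, insert_sorted(), seqno_at() and the buggy flag are hypothetical names, and sequence number wrap-around is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void list_insert(struct list_head *new, struct list_head *prev,
			struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

/* Same semantics as the kernel helper: insert new right AFTER pos */
static void list_add(struct list_head *new, struct list_head *pos)
{
	list_insert(new, pos, pos->next);
}

/* Same semantics as the kernel helper: insert new right BEFORE pos */
static void list_add_tail(struct list_head *new, struct list_head *pos)
{
	list_insert(new, pos->prev, pos);
}

struct unacked { struct list_head list; uint32_t seqno; };

static struct unacked *to_unacked(struct list_head *pos)
{
	return (struct unacked *)((char *)pos - offsetof(struct unacked, list));
}

/* Walk from the newest (tail) towards the oldest entry and stop at the
 * first entry older than the new one. pos is the list head when no older
 * entry exists. buggy selects the insert-before behaviour.
 */
static void insert_sorted(struct list_head *head, struct unacked *new, int buggy)
{
	struct list_head *pos;

	for (pos = head->prev; pos != head; pos = pos->prev) {
		if (to_unacked(pos)->seqno < new->seqno)
			break;
	}

	if (buggy)
		list_add_tail(&new->list, pos);
	else
		list_add(&new->list, pos);
}

/* Return the seqno of the n-th entry counted from the front of the list */
static uint32_t seqno_at(struct list_head *head, unsigned int n)
{
	struct list_head *pos = head->next;

	while (n--)
		pos = pos->next;

	assert(pos != head);
	return to_unacked(pos)->seqno;
}
```

Inserting the sequence numbers 10, 30, 20 with buggy set to 0 leaves the list as 10, 20, 30. With buggy set to 1 the same input ends up as 30, 20, 10, which is the kind of swapped order the commit message describes.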
Cc: stable@kernel.org Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation") Signed-off-by: Sven Eckelmann Signed-off-by: Sasha Levin commit e055e74b80eb8858f98c736aa565173a917dcab5 Author: NeilBrown Date: Fri Jun 26 10:31:22 2026 -0400 lockd: fix TEST handling when not all permissions are available. [ Upstream commit 0b474240327cebeff08ad429e8ed3cfc6c8ee816 ] The F_GETLK fcntl can work with either read access or write access or both. It can query F_RDLCK and F_WRLCK locks in either case. However lockd currently treats F_GETLK similar to F_SETLK in that read access is required to query an F_RDLCK lock and write access is required to query an F_WRLCK lock. This is wrong and can cause problems - e.g. when qemu accesses a read-only (e.g. iso) filesystem image over NFS (though why it queries if it can get a write lock - I don't know. But it does, and this works with local filesystems). So we need TEST requests to be handled differently. To do this:
- change nlm_do_fopen() to accept O_RDWR as a mode and in that case succeed if either an O_RDONLY or O_WRONLY file can be opened.
- change nlm_lookup_file() to accept a mode argument from the caller, instead of deducing it based on lock type, and pass that on to nlm_do_fopen()
- change nlm4svc_retrieve_args() and nlmsvc_retrieve_args() to detect TEST requests and pass O_RDWR as a mode to nlm_lookup_file, passing the same mode as before for other requests. Also set lock->fl.c.flc_file to whichever file is available for TEST requests.
- change nlmsvc_testlock() to also not calculate the mode, but to use whatever was stored in lock->fl.c.flc_file.
This behaviour of lockd - requesting O_WRONLY access to TEST for exclusive locks - has been present at least since git history began. However it was hidden until recently because knfsd ignored the access requested by lockd and required only READ access for all locking requests (unless the underlying filesystem provided an f_op->open function which checked access permissions).
The commit mentioned in Fixes: below changed nfsd_permission() to NOT override the access request for LOCK requests and this exposed the bug that we are now fixing. Note that there is another issue that this patch does not address. The flock(.., LOCK_EX) call is permitted on a read-only file descriptor. Linux NFS maps this to NLM locking as whole-file byte-range locks. nfsd will see this as though it were fcntl( F_SETLK (F_WRLCK)) and will now require write access, which it might not be able to get. It is not clear if this is a problem in practice, or what the best solution might be. So no attempt is made to address it. Reported-by: Tj Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1128861 Fixes: 4cc9b9f2bf4d ("nfsd: refine and rename NFSD_MAY_LOCK") Reviewed-by: Jeff Layton Signed-off-by: NeilBrown Signed-off-by: Chuck Lever Signed-off-by: Sasha Levin commit 671ec2eabb874fcb593297c4dd885fc3dae54f32 Author: Sasha Levin Date: Sat Jun 27 10:53:39 2026 -0400 Revert "PCI: qcom: Advertise Hotplug Slot Capability with no Command Completion support" This reverts commit f176c47683bf6365e2f6d580d557fae49169a703. Signed-off-by: Sasha Levin commit d844702198395d3f80222777030f69db6be6b709 Author: Paul Moore Date: Fri Jun 26 15:50:35 2026 +0800 selinux: fix overlayfs mmap() and mprotect() access checks [ Upstream commit 82544d36b1729153c8aeb179e84750f0c085d3b1 ] The existing SELinux security model for overlayfs is to allow access if the current task is able to access the top level file (the "user" file) and the mounter's credentials are sufficient to access the lower level file (the "backing" file). Unfortunately, the current code does not properly enforce these access controls for both mmap() and mprotect() operations on overlayfs filesystems. 
This patch makes use of the newly created security_mmap_backing_file() LSM hook to provide the missing backing file enforcement for mmap() operations, and leverages the backing file API and new LSM blob to provide the necessary information to properly enforce the mprotect() access controls. Cc: stable@vger.kernel.org Acked-by: Amir Goldstein Signed-off-by: Paul Moore Signed-off-by: Cai Xinchen Signed-off-by: Sasha Levin commit 5dfcb15974e7d0f96aca278dd9f1b85df91523ef Author: Paul Moore Date: Fri Jun 26 15:50:34 2026 +0800 lsm: add backing_file LSM hooks [ Upstream commit 6af36aeb147a06dea47c49859cd6ca5659aeb987 ] Stacked filesystems such as overlayfs do not currently provide the necessary mechanisms for LSMs to properly enforce access controls on the mmap() and mprotect() operations. In order to resolve this gap, an LSM security blob is being added to the backing_file struct and the following new LSM hooks are being created:
  security_backing_file_alloc()
  security_backing_file_free()
  security_mmap_backing_file()
The first two hooks are to manage the lifecycle of the LSM security blob in the backing_file struct, while the third provides a new mmap() access control point for the underlying backing file. It is also expected that LSMs will likely want to update their security_file_mprotect() callback to address issues with their mprotect() controls, but that does not require a change to the security_file_mprotect() LSM hook. There are three other small changes to support these new LSM hooks:
* Pass the user file associated with a backing file down to alloc_empty_backing_file() so it can be included in the security_backing_file_alloc() hook.
* Add getter and setter functions for the backing_file struct LSM blob as the backing_file struct remains private to fs/file_table.c.
* Constify the file struct field in the LSM common_audit_data struct to better support LSMs that need to pass a const file struct pointer into the common LSM audit code.
Thanks to Arnd Bergmann for identifying the missing EXPORT_SYMBOL_GPL() and supplying a fixup. Cc: stable@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: linux-unionfs@vger.kernel.org Cc: linux-erofs@lists.ozlabs.org Reviewed-by: Amir Goldstein Reviewed-by: Serge Hallyn Reviewed-by: Christian Brauner Signed-off-by: Paul Moore [Mainline declares lsm_backing_file_cache in security/lsm.h. Linux 6.18.y does not have security/lsm_init.c or security/lsm.h; the cache variable is defined locally as static struct kmem_cache *lsm_backing_file_cache in security/security.c.] Signed-off-by: Cai Xinchen Signed-off-by: Sasha Levin commit 5e470998a23e4c3d89ed24e8172cb22747e61efa Author: Paolo Bonzini Date: Fri Jun 26 13:23:15 2026 +0200 KVM: x86: Fix shadow paging use-after-free due to unexpected role commit 81ccda30b4e83d8f5cc4fd50503c44e3a33abfeb upstream. Commit 0cb2af2ea66ad ("KVM: x86: Fix shadow paging use-after-free due to unexpected GFN") fixed a shadow paging mismatch between stored and computed GFNs; the bug could be triggered by changing a PDE mapping from outside the guest, and then deleting a memslot. The rmap_remove() call would miss entries created after the PDE change because the GFN of the leaf SPTE does not match the GFN of the struct kvm_mmu_page. A similar hole however remains if the modified PDE points to a non-leaf page. In this case the gfn can be made to match, but the role does not match: the original large 2MB page creates a kvm_mmu_page with direct=1, while the new 4KB needs a kvm_mmu_page with direct=0. However, kvm_mmu_get_child_sp() does not compare the role, and therefore reuses the page. The next step is installing a leaf (4KB) SPTE on the new path which records an rmap entry under the gfn resolved by the walk. But when that child is zapped its parent kvm_mmu_page has direct=1 and kvm_mmu_page_get_gfn() computes the gfn for the 4KB page as sp->gfn + index instead of using sp->shadowed_translation[] (or sp->gfns[] in older kernels). 
It therefore fails to remove the recorded entry. When the memslot is dropped the shadow page is freed but the rmap entry survives, as in the scenario that was already fixed. Code that later walks that gfn (dirty logging, MMU notifier invalidation, and so on) dereferences an sptep that lies in the freed page, causing the use-after-free. Fixes: 2032a93d66fa ("KVM: MMU: Don't allocate gfns page for direct mmu pages") Reported-by: Hyunwoo Kim Signed-off-by: Paolo Bonzini Signed-off-by: Sasha Levin