Will Deacon [Fri, 8 Mar 2024 15:28:29 +0000 (15:28 +0000)]
swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE
For swiotlb allocations >= PAGE_SIZE, the slab search historically
adjusted the stride to avoid checking unaligned slots. This had the
side-effect of aligning large mapping requests to PAGE_SIZE, but that
was broken by
0eee5ae10256 ("swiotlb: fix slot alignment checks").
Since this alignment could be relied upon drivers, reinstate PAGE_SIZE
alignment for swiotlb mappings >= PAGE_SIZE.
Reported-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Nicolin Chen [Fri, 8 Mar 2024 15:28:28 +0000 (15:28 +0000)]
iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
The swiotlb does not support a mapping size > swiotlb_max_mapping_size().
On the other hand, with a 64KB PAGE_SIZE configuration, it's observed that
an NVME device can map a size between 300KB~512KB, which certainly failed
the swiotlb mappings, though the default pool of swiotlb has many slots:
systemd[1]: Started Journal Service.
=> nvme 0000:00:01.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 32 (slots)
note: journal-offline[392] exited with irqs disabled
note: journal-offline[392] exited with preempt_count 1
Call trace:
[ 3.099918] swiotlb_tbl_map_single+0x214/0x240
[ 3.099921] iommu_dma_map_page+0x218/0x328
[ 3.099928] dma_map_page_attrs+0x2e8/0x3a0
[ 3.101985] nvme_prep_rq.part.0+0x408/0x878 [nvme]
[ 3.102308] nvme_queue_rqs+0xc0/0x300 [nvme]
[ 3.102313] blk_mq_flush_plug_list.part.0+0x57c/0x600
[ 3.102321] blk_add_rq_to_plug+0x180/0x2a0
[ 3.102323] blk_mq_submit_bio+0x4c8/0x6b8
[ 3.103463] __submit_bio+0x44/0x220
[ 3.103468] submit_bio_noacct_nocheck+0x2b8/0x360
[ 3.103470] submit_bio_noacct+0x180/0x6c8
[ 3.103471] submit_bio+0x34/0x130
[ 3.103473] ext4_bio_write_folio+0x5a4/0x8c8
[ 3.104766] mpage_submit_folio+0xa0/0x100
[ 3.104769] mpage_map_and_submit_buffers+0x1a4/0x400
[ 3.104771] ext4_do_writepages+0x6a0/0xd78
[ 3.105615] ext4_writepages+0x80/0x118
[ 3.105616] do_writepages+0x90/0x1e8
[ 3.105619] filemap_fdatawrite_wbc+0x94/0xe0
[ 3.105622] __filemap_fdatawrite_range+0x68/0xb8
[ 3.106656] file_write_and_wait_range+0x84/0x120
[ 3.106658] ext4_sync_file+0x7c/0x4c0
[ 3.106660] vfs_fsync_range+0x3c/0xa8
[ 3.106663] do_fsync+0x44/0xc0
Since untrusted devices might go down the swiotlb pathway with dma-iommu,
these devices should not map a size larger than swiotlb_max_mapping_size.
To fix this bug, add iommu_dma_max_mapping_size() for untrusted devices to
take into account swiotlb_max_mapping_size() v.s. iova_rcache_range() from
the iommu_dma_opt_mapping_size().
Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Link: https://lore.kernel.org/r/ee51a3a5c32cf885b18f6416171802669f4a718a.1707851466.git.nicolinc@nvidia.com
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
[will: Drop redundant is_swiotlb_active(dev) check]
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Will Deacon [Fri, 8 Mar 2024 15:28:27 +0000 (15:28 +0000)]
swiotlb: Fix alignment checks when both allocation and DMA masks are present
Nicolin reports that swiotlb buffer allocations fail for an NVME device
behind an IOMMU using 64KiB pages. This is because we end up with a
minimum allocation alignment of 64KiB (for the IOMMU to map the buffer
safely) but a minimum DMA alignment mask corresponding to a 4KiB NVME
page (i.e. preserving the 4KiB page offset from the original allocation).
If the original address is not 4KiB-aligned, the allocation will fail
because swiotlb_search_pool_area() erroneously compares these unmasked
bits with the 64KiB-aligned candidate allocation.
Tweak swiotlb_search_pool_area() so that the DMA alignment mask is
reduced based on the required alignment of the allocation.
Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Link: https://lore.kernel.org/r/cover.1707851466.git.nicolinc@nvidia.com
Reported-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Will Deacon [Fri, 8 Mar 2024 15:28:26 +0000 (15:28 +0000)]
swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
core-api/dma-api-howto.rst states the following properties of
dma_alloc_coherent():
| The CPU virtual address and the DMA address are both guaranteed to
| be aligned to the smallest PAGE_SIZE order which is greater than or
| equal to the requested size.
However, swiotlb_alloc() passes zero for the 'alloc_align_mask'
parameter of swiotlb_find_slots() and so this property is not upheld.
Instead, allocations larger than a page are aligned to PAGE_SIZE,
Calculate the mask corresponding to the page order suitable for holding
the allocation and pass that to swiotlb_find_slots().
Fixes: e81e99bacc9f ("swiotlb: Support aligned swiotlb buffers")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Will Deacon [Fri, 8 Mar 2024 15:28:25 +0000 (15:28 +0000)]
swiotlb: Enforce page alignment in swiotlb_alloc()
When allocating pages from a restricted DMA pool in swiotlb_alloc(),
the buffer address is blindly converted to a 'struct page *' that is
returned to the caller. In the unlikely event of an allocation bug,
page-unaligned addresses are not detected and slots can silently be
double-allocated.
Add a simple check of the buffer alignment in swiotlb_alloc() to make
debugging a little easier if something has gone wonky.
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Will Deacon [Fri, 8 Mar 2024 15:28:24 +0000 (15:28 +0000)]
swiotlb: Fix double-allocation of slots due to broken alignment handling
Commit
bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix"),
which was a fix for commit
0eee5ae10256 ("swiotlb: fix slot alignment
checks"), causes a functional regression with vsock in a virtual machine
using bouncing via a restricted DMA SWIOTLB pool.
When virtio allocates the virtqueues for the vsock device using
dma_alloc_coherent(), the SWIOTLB search can return page-unaligned
allocations if 'area->index' was left unaligned by a previous allocation
from the buffer:
# Final address in brackets is the SWIOTLB address returned to the caller
| virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1645-1649/7168 (0x98326800)
| virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1649-1653/7168 (0x98328800)
| virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1653-1657/7168 (0x9832a800)
This ends badly (typically buffer corruption and/or a hang) because
swiotlb_alloc() is expecting a page-aligned allocation and so blindly
returns a pointer to the 'struct page' corresponding to the allocation,
therefore double-allocating the first half (2KiB slot) of the 4KiB page.
Fix the problem by treating the allocation alignment separately to any
additional alignment requirements from the device, using the maximum
of the two as the stride to search the buffer slots and taking care
to ensure a minimum of page-alignment for buffers larger than a page.
This also resolves swiotlb allocation failures occuring due to the
inclusion of ~PAGE_MASK in 'iotlb_align_mask' for large allocations and
resulting in alignment requirements exceeding swiotlb_max_mapping_size().
Fixes: bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix")
Fixes: 0eee5ae10256 ("swiotlb: fix slot alignment checks")
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Petr Tesarik <petr.tesarik1@huawei-partners.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Rick Edgecombe [Thu, 22 Feb 2024 00:17:21 +0000 (16:17 -0800)]
dma-direct: Leak pages on dma_set_decrypted() failure
On TDX it is possible for the untrusted host to cause
set_memory_encrypted() or set_memory_decrypted() to fail such that an
error is returned and the resulting memory is shared. Callers need to
take care to handle these errors to avoid returning decrypted (shared)
memory to the page allocator, which could lead to functional or security
issues.
DMA could free decrypted/shared pages if dma_set_decrypted() fails. This
should be a rare case. Just leak the pages in this case instead of
freeing them.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
ZhangPeng [Tue, 9 Jan 2024 07:04:56 +0000 (15:04 +0800)]
swiotlb: add debugfs to track swiotlb transient pool usage
Introduce a new debugfs interface io_tlb_transient_nslabs. The device
driver can create a new swiotlb transient memory pool once default
memory pool is full. To export the swiotlb transient memory pool usage
via debugfs would help the user estimate the size of transient swiotlb
memory pool or analyze device driver memory leak issue.
Signed-off-by: ZhangPeng <zhangpeng362@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Linus Torvalds [Wed, 28 Feb 2024 01:00:10 +0000 (17:00 -0800)]
Merge tag 'lsm-pr-
20240227' of git://git./linux/kernel/git/pcmoore/lsm
Pull lsm fixes from Paul Moore:
"Two small patches, one for AppArmor and one for SELinux, to fix
potential uninitialized variable problems in the new LSM syscalls we
added during the v6.8 merge window.
We haven't been able to get a response from John on the AppArmor
patch, but considering both the importance of the patch and it's
rather simple nature it seems like a good idea to get this merged
sooner rather than later.
I'm sure John is just taking some much needed vacation; if we need to
revise this when he gets back to his email we can"
* tag 'lsm-pr-
20240227' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
apparmor: fix lsm_get_self_attr()
selinux: fix lsm_get_self_attr()
Linus Torvalds [Wed, 28 Feb 2024 00:44:15 +0000 (16:44 -0800)]
Merge tag 'mm-hotfixes-stable-2024-02-27-14-52' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Six hotfixes. Three are cc:stable and the remainder address post-6.7
issues or aren't considered appropriate for backporting"
* tag 'mm-hotfixes-stable-2024-02-27-14-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/debug_vm_pgtable: fix BUG_ON with pud advanced test
mm: cachestat: fix folio read-after-free in cache walk
MAINTAINERS: add memory mapping entry with reviewers
mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index
kasan: revert eviction of stack traces in generic mode
stackdepot: use variable size records for non-evictable entries
Linus Torvalds [Mon, 26 Feb 2024 19:06:30 +0000 (11:06 -0800)]
Merge tag 'mtd/fixes-for-6.8-rc7' of git://git./linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"Many NAND page layouts have been added to the Marvell NAND controller
but could not be used in practice so they are being removed.
Regarding the SPI-NAND area, Gigadevice chips were not using the right
buffer for an ECC status check operation.
Aside from these driver fixes, there is also a refcount fix in the MTD
core nodes parsing logic"
* tag 'mtd/fixes-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: marvell: fix layouts
mtd: Fix possible refcounting issue when going through partition nodes
mtd: spinand: gigadevice: Fix the get ecc status issue
Linus Torvalds [Mon, 26 Feb 2024 19:00:54 +0000 (11:00 -0800)]
Merge tag 'for-6.8-rc6-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A more fixes for recently reported or discovered problems:
- fix corner case of send that would generate potentially large
stream of zeros if there's a hole at the end of the file
- fix chunk validation in zoned mode on conventional zones, it was
possible to create chunks that would not be allowed on sequential
zones
- fix validation of dev-replace ioctl filenames
- fix KCSAN warnings about access to block reserve struct members"
* tag 'for-6.8-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix data race at btrfs_use_block_rsv() when accessing block reserve
btrfs: fix data races when accessing the reserved amount of block reserves
btrfs: send: don't issue unnecessary zero writes for trailing hole
btrfs: dev-replace: properly validate device names
btrfs: zoned: don't skip block group profile checks on conventional zones
Mark O'Donovan [Wed, 21 Feb 2024 10:43:58 +0000 (10:43 +0000)]
fs/ntfs3: fix build without CONFIG_NTFS3_LZX_XPRESS
When CONFIG_NTFS3_LZX_XPRESS is not set then we get the following build
error:
fs/ntfs3/frecord.c:2460:16: error: unused variable ‘i_size’
Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Fixes: 4fd6c08a16d7 ("fs/ntfs3: Use i_size_read and i_size_write")
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 25 Feb 2024 23:46:06 +0000 (15:46 -0800)]
Linux 6.8-rc6
Linus Torvalds [Sun, 25 Feb 2024 23:31:57 +0000 (15:31 -0800)]
Merge tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs
Pull bcachefs fixes from Kent Overstreet:
"Some more mostly boring fixes, but some not
User reported ones:
- the BTREE_ITER_FILTER_SNAPSHOTS one fixes a really nasty
performance bug; user reported an untar initially taking two
seconds and then ~2 minutes
- kill a __GFP_NOFAIL in the buffered read path; this was a leftover
from the trickier fix to kill __GFP_NOFAIL in readahead, where we
can't return errors (and have to silently truncate the read
ourselves).
bcachefs can't use GFP_NOFAIL for folio state unlike iomap based
filesystems because our folio state is just barely too big, 2MB
hugepages cause us to exceed the 2 page threshhold for GFP_NOFAIL.
additionally, the flags argument was just buggy, we weren't
supplying GFP_KERNEL previously (!)"
* tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs:
bcachefs: fix bch2_save_backtrace()
bcachefs: Fix check_snapshot() memcpy
bcachefs: Fix bch2_journal_flush_device_pins()
bcachefs: fix iov_iter count underflow on sub-block dio read
bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
bcachefs: Kill __GFP_NOFAIL in buffered read path
bcachefs: fix backpointer_to_text() when dev does not exist
Kent Overstreet [Sun, 25 Feb 2024 20:45:34 +0000 (15:45 -0500)]
bcachefs: fix bch2_save_backtrace()
Missed a call in the previous fix.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Linus Torvalds [Sun, 25 Feb 2024 18:58:12 +0000 (10:58 -0800)]
Merge tag 'docs-6.8-fixes3' of git://git.lwn.net/linux
Pull two documentation build fixes from Jonathan Corbet:
- The XFS online fsck documentation uses incredibly deeply nested
subsection and list nesting; that broke the PDF docs build. Tweak a
parameter to tell LaTeX to allow the deeper nesting.
- Fix a 6.8 PDF-build regression
* tag 'docs-6.8-fixes3' of git://git.lwn.net/linux:
docs: translations: use attribute to store current language
docs: Instruct LaTeX to cope with deeper nesting
Linus Torvalds [Sun, 25 Feb 2024 18:41:57 +0000 (10:41 -0800)]
Merge tag 'usb-6.8-rc6' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 6.8-rc6 to resolve some reported
problems. These include:
- regression fixes with typec tpcm code as reported by many
- cdnsp and cdns3 driver fixes
- usb role setting code bugfixes
- build fix for uhci driver
- ncm gadget driver bugfix
- MAINTAINERS entry update
All of these have been in linux-next all week with no reported issues
and there is at least one fix in here that is in Thorsten's regression
list that is being tracked"
* tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: tpcm: Fix issues with power being removed during reset
MAINTAINERS: Drop myself as maintainer of TYPEC port controller drivers
usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
Revert "usb: typec: tcpm: reset counter when enter into unattached state after try role"
usb: gadget: omap_udc: fix USB gadget regression on Palm TE
usb: dwc3: gadget: Don't disconnect if not started
usb: cdns3: fix memory double free when handle zero packet
usb: cdns3: fixed memory use after free at cdns3_gadget_ep_disable()
usb: roles: don't get/set_role() when usb_role_switch is unregistered
usb: roles: fix NULL pointer issue when put module's reference
usb: cdnsp: fixed issue with incorrect detecting CDNSP family controllers
usb: cdnsp: blocked some cdns3 specific code
usb: uhci-grlib: Explicitly include linux/platform_device.h
Linus Torvalds [Sun, 25 Feb 2024 18:35:41 +0000 (10:35 -0800)]
Merge tag 'tty-6.8-rc6' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are three small serial/tty driver fixes for 6.8-rc6 that resolve
the following reported errors:
- riscv hvc console driver fix that was reported by many
- amba-pl011 serial driver fix for RS485 mode
- stm32 serial driver fix for RS485 mode
All of these have been in linux-next all week with no reported
problems"
* tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: amba-pl011: Fix DMA transmission in RS485 mode
serial: stm32: do not always set SER_RS485_RX_DURING_TX if RS485 is enabled
tty: hvc: Don't enable the RISC-V SBI console by default
Linus Torvalds [Sun, 25 Feb 2024 18:22:21 +0000 (10:22 -0800)]
Merge tag 'x86_urgent_for_v6.8_rc6' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure clearing CPU buffers using VERW happens at the latest
possible point in the return-to-userspace path, otherwise memory
accesses after the VERW execution could cause data to land in CPU
buffers again
* tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
KVM/VMX: Move VERW closer to VMentry for MDS mitigation
KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
x86/entry_32: Add VERW just before userspace transition
x86/entry_64: Add VERW just before userspace transition
x86/bugs: Add asm helpers for executing VERW
Linus Torvalds [Sun, 25 Feb 2024 18:14:12 +0000 (10:14 -0800)]
Merge tag 'irq_urgent_for_v6.8_rc6' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Make sure GICv4 always gets initialized to prevent a kexec-ed kernel
from silently failing to set it up
- Do not call bus_get_dev_root() for the mbigen irqchip as it always
returns NULL - use NULL directly
- Fix hardware interrupt number truncation when assigning MSI
interrupts
- Correct sending end-of-interrupt messages to disabled interrupts
lines on RISC-V PLIC
* tag 'irq_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Do not assume vPE tables are preallocated
irqchip/mbigen: Don't use bus_get_dev_root() to find the parent
PCI/MSI: Prevent MSI hardware interrupt number truncation
irqchip/sifive-plic: Enable interrupt if needed before EOI
Linus Torvalds [Sun, 25 Feb 2024 17:53:13 +0000 (09:53 -0800)]
Merge tag 'erofs-for-6.8-rc6-fixes' of git://git./linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
- Fix page refcount leak when looking up specific inodes
introduced by metabuf reworking
* tag 'erofs-for-6.8-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix refcount on the metabuf used for inode lookup
Linus Torvalds [Sun, 25 Feb 2024 17:29:05 +0000 (09:29 -0800)]
Merge tag 'pull-fixes.pathwalk-rcu-2' of git://git./linux/kernel/git/viro/vfs
Pull RCU pathwalk fixes from Al Viro:
"We still have some races in filesystem methods when exposed to RCU
pathwalk. This series is a result of code audit (the second round of
it) and it should deal with most of that stuff.
Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate().
Up to maintainers (a note for NTFS folks - when documentation says
that a method may not block, it *does* imply that blocking allocations
are to be avoided. Really)"
[ More explanations for people who aren't familiar with the vagaries of
RCU path walking: most of it is hidden from filesystems, but if a
filesystem actively participates in the low-level path walking it
needs to make sure the fields involved in that walk are RCU-safe.
That "actively participate in low-level path walking" includes things
like having its own ->d_hash()/->d_compare() routines, or by having
its own directory permission function that doesn't just use the common
helpers. Having a ->d_revalidate() function will also have this issue.
Note that instead of making everything RCU safe you can also choose to
abort the RCU pathwalk if your operation cannot be done safely under
RCU, but that obviously comes with a performance penalty. One common
pattern is to allow the simple cases under RCU, and abort only if you
need to do something more complicated.
So not everything needs to be RCU-safe, and things like the inode etc
that the VFS itself maintains obviously already are. But these fixes
tend to be about properly RCU-delaying things like ->s_fs_info that
are maintained by the filesystem and that got potentially released too
early. - Linus ]
* tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ext4_get_link(): fix breakage in RCU mode
cifs_get_link(): bail out in unsafe case
fuse: fix UAF in rcu pathwalks
procfs: make freeing proc_fs_info rcu-delayed
procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
nfs: fix UAF on pathwalk running into umount
nfs: make nfs_set_verifier() safe for use in RCU pathwalk
afs: fix __afs_break_callback() / afs_drop_open_mmap() race
hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
affs: free affs_sb_info with kfree_rcu()
rcu pathwalk: prevent bogus hard errors from may_lookup()
fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
Linus Torvalds [Sun, 25 Feb 2024 17:17:15 +0000 (09:17 -0800)]
Merge tag 'pull-fixes' of git://git./linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"A couple of fixes - revert of regression from this cycle and a fix for
erofs failure exit breakage (had been there since way back)"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
erofs: fix handling kern_mount() failure
Revert "get rid of DCACHE_GENOCIDE"
Al Viro [Sat, 3 Feb 2024 06:17:34 +0000 (01:17 -0500)]
ext4_get_link(): fix breakage in RCU mode
1) errors from ext4_getblk() should not be propagated to caller
unless we are really sure that we would've gotten the same error
in non-RCU pathwalk.
2) we leak buffer_heads if ext4_getblk() is successful, but bh is
not uptodate.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 20 Sep 2023 02:28:16 +0000 (22:28 -0400)]
cifs_get_link(): bail out in unsafe case
->d_revalidate() bails out there, anyway. It's not enough
to prevent getting into ->get_link() in RCU mode, but that
could happen only in a very contrieved setup. Not worth
trying to do anything fancy here unless ->d_revalidate()
stops kicking out of RCU mode at least in some cases.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 28 Sep 2023 04:19:39 +0000 (00:19 -0400)]
fuse: fix UAF in rcu pathwalks
->permission(), ->get_link() and ->inode_get_acl() might dereference
->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns
as well) when called from rcu pathwalk.
Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info
and dropping ->user_ns rcu-delayed too.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 20 Sep 2023 04:12:00 +0000 (00:12 -0400)]
procfs: make freeing proc_fs_info rcu-delayed
makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns()
is still synchronous, but that's not a problem - it does
rcu-delay everything that needs to be)
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 20 Sep 2023 03:52:58 +0000 (23:52 -0400)]
procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
that keeps both around until struct inode is freed, making access
to them safe from rcu-pathwalk
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 28 Sep 2023 02:11:26 +0000 (22:11 -0400)]
nfs: fix UAF on pathwalk running into umount
NFS ->d_revalidate(), ->permission() and ->get_link() need to access
some parts of nfs_server when called in RCU mode:
server->flags
server->caps
*(server->io_stats)
and, worst of all, call
server->nfs_client->rpc_ops->have_delegation
(the last one - as NFS_PROTO(inode)->have_delegation()). We really
don't want to RCU-delay the entire nfs_free_server() (it would have
to be done with schedule_work() from RCU callback, since it can't
be made to run from interrupt context), but actual freeing of
nfs_server and ->io_stats can be done via call_rcu() just fine.
nfs_client part is handled simply by making nfs_free_client() use
kfree_rcu().
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Thu, 28 Sep 2023 01:50:25 +0000 (21:50 -0400)]
nfs: make nfs_set_verifier() safe for use in RCU pathwalk
nfs_set_verifier() relies upon dentry being pinned; if that's
the case, grabbing ->d_lock stabilizes ->d_parent and guarantees
that ->d_parent points to a positive dentry. For something
we'd run into in RCU mode that is *not* true - dentry might've
been through dentry_kill() just as we grabbed ->d_lock, with
its parent going through the same just as we get to into
nfs_set_verifier_locked(). It might get to detaching inode
(and zeroing ->d_inode) before nfs_set_verifier_locked() gets
to fetching that; we get an oops as the result.
That can happen in nfs{,4} ->d_revalidate(); the call chain in
question is nfs_set_verifier_locked() <- nfs_set_verifier() <-
nfs_lookup_revalidate_delegated() <- nfs{,4}_do_lookup_revalidate().
We have checked that the parent had been positive, but that's
done before we get to nfs_set_verifier() and it's possible for
memory pressure to pick our dentry as eviction candidate by that
time. If that happens, back-to-back attempts to kill dentry and
its parent are quite normal. Sure, in case of eviction we'll
fail the ->d_seq check in the caller, but we need to survive
until we return there...
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 30 Sep 2023 00:24:34 +0000 (20:24 -0400)]
afs: fix __afs_break_callback() / afs_drop_open_mmap() race
In __afs_break_callback() we might check ->cb_nr_mmap and if it's non-zero
do queue_work(&vnode->cb_work). In afs_drop_open_mmap() we decrement
->cb_nr_mmap and do flush_work(&vnode->cb_work) if it reaches zero.
The trouble is, there's nothing to prevent __afs_break_callback() from
seeing ->cb_nr_mmap before the decrement and do queue_work() after both
the decrement and flush_work(). If that happens, we might be in trouble -
vnode might get freed before the queued work runs.
__afs_break_callback() is always done under ->cb_lock, so let's make
sure that ->cb_nr_mmap can change from non-zero to zero while holding
->cb_lock (the spinlock component of it - it's a seqlock and we don't
need to mess with the counter).
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 20 Sep 2023 00:18:59 +0000 (20:18 -0400)]
hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
->d_hash() and ->d_compare() use those, so we need to delay freeing
them.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 19 Sep 2023 19:53:32 +0000 (15:53 -0400)]
exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have
a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem
that is in process of getting shut down.
Besides, having nls and upcase table cleanup moved from ->put_super() towards
the place where sbi is freed makes for simpler failure exits.
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Tue, 19 Sep 2023 23:36:07 +0000 (19:36 -0400)]
affs: free affs_sb_info with kfree_rcu()
one of the flags in it is used by ->d_hash()/->d_compare()
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 30 Sep 2023 01:11:41 +0000 (21:11 -0400)]
rcu pathwalk: prevent bogus hard errors from may_lookup()
If lazy call of ->permission() returns a hard error, check that
try_to_unlazy() succeeds before returning it. That both makes
life easier for ->permission() instances and closes the race
in ENOTDIR handling - it is possible that positive d_can_lookup()
seen in link_path_walk() applies to the state *after* unlink() +
mkdir(), while nd->inode matches the state prior to that.
Normally seeing e.g. EACCES from permission check in rcu pathwalk
means that with some timings non-rcu pathwalk would've run into
the same; however, running into a non-executable regular file
in the middle of a pathname would not get to permission check -
it would fail with ENOTDIR instead.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 2 Feb 2024 02:10:01 +0000 (21:10 -0500)]
fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
Avoids fun races in RCU pathwalk... Same goes for freeing LSM shite
hanging off super_block's arse.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Kent Overstreet [Sat, 24 Feb 2024 06:18:45 +0000 (01:18 -0500)]
bcachefs: Fix check_snapshot() memcpy
check_snapshot() copies the bch_snapshot to a temporary to easily handle
older versions that don't have all the fields of the current version,
but it lacked a min() to correctly handle keys newer and larger than the
current version.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 18 Feb 2024 01:38:47 +0000 (20:38 -0500)]
bcachefs: Fix bch2_journal_flush_device_pins()
If a journal write errored, the list of devices it was written to could
be empty - we're not supposed to mark an empty replicas list.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Brian Foster [Thu, 15 Feb 2024 17:16:05 +0000 (12:16 -0500)]
bcachefs: fix iov_iter count underflow on sub-block dio read
bch2_direct_IO_read() checks the request offset and size for sector
alignment and then falls through to a couple calculations to shrink
the size of the request based on the inode size. The problem is that
these checks round up to the fs block size, which runs the risk of
underflowing iter->count if the block size happens to be large
enough. This is triggered by fstest generic/361 with a 4k block
size, which subsequently leads to a crash. To avoid this crash,
check that the shorten length doesn't exceed the overall length of
the iter.
Fixes:
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Su Yue <glass.su@suse.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 25 Feb 2024 00:14:36 +0000 (19:14 -0500)]
bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
If we're in FILTER_SNAPSHOTS mode and we start scanning a range of the
keyspace where no keys are visible in the current snapshot, we have a
problem - we'll scan for a very long time before scanning terminates.
Awhile back, this was fixed for most cases with peek_upto() (and
assertions that enforce that it's being used).
But the fix missed the fact that the inodes btree is different - every
key offset is in a different snapshot tree, not just the inode field.
Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 23 Feb 2024 02:39:13 +0000 (21:39 -0500)]
bcachefs: Kill __GFP_NOFAIL in buffered read path
Recently, we fixed our __GFP_NOFAIL usage in the readahead path, but the
easy one in read_single_folio() (where wa can return an error) was
missed - oops.
Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 21 Feb 2024 03:16:00 +0000 (22:16 -0500)]
bcachefs: fix backpointer_to_text() when dev does not exist
Fixes:
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Linus Torvalds [Sun, 25 Feb 2024 00:49:51 +0000 (16:49 -0800)]
Merge tag 'powerpc-6.8-4' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix a crash when hot adding a PCI device to an LPAR since
recent changes
- Fix nested KVM level-2 guest reboot failure due to empty
'arch_compat'
Thanks to Amit Machhiwal, Aneesh Kumar K.V (IBM), Brian King, Gaurav
Batra, and Vaibhav Jain.
* tag 'powerpc-6.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'
powerpc/pseries/iommu: DLPAR add doesn't completely initialize pci_controller
Linus Torvalds [Sat, 24 Feb 2024 23:59:26 +0000 (15:59 -0800)]
Merge tag 'iommu-fixes-v6.8-rc5' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Intel VT-d fixes for nested domain handling:
- Cache invalidation for changes in a parent domain
- Dirty tracking setting for parent and nested domains
- Fix a constant-out-of-range warning
- ARM SMMU fixes:
- Fix CD allocation from atomic context when using SVA with SMMUv3
- Revert the conversion of SMMUv2 to domain_alloc_paging(), as it
breaks the boot for Qualcomm MSM8996 devices
- Restore SVA handle sharing in core code as it turned out there are
still drivers relying on it
* tag 'iommu-fixes-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/sva: Restore SVA handle sharing
iommu/arm-smmu-v3: Do not use GFP_KERNEL under as spinlock
iommu/vt-d: Fix constant-out-of-range warning
iommu/vt-d: Set SSADE when attaching to a parent with dirty tracking
iommu/vt-d: Add missing dirty tracking set for parent domain
iommu/vt-d: Wrap the dirty tracking loop to be a helper
iommu/vt-d: Remove domain parameter for intel_pasid_setup_dirty_tracking()
iommu/vt-d: Add missing device iotlb flush for parent domain
iommu/vt-d: Update iotlb in nested domain attach
iommu/vt-d: Add missing iotlb flush for parent domain
iommu/vt-d: Add __iommu_flush_iotlb_psi()
iommu/vt-d: Track nested domains in parent
Revert "iommu/arm-smmu: Convert to domain_alloc_paging()"
Linus Torvalds [Sat, 24 Feb 2024 23:53:40 +0000 (15:53 -0800)]
Merge tag 'cxl-fixes-6.8-rc6' of git://git./linux/kernel/git/cxl/cxl
Pull cxl fixes from Dan Williams:
"A collection of significant fixes for the CXL subsystem.
The largest change in this set, that bordered on "new development", is
the fix for the fact that the location of the new qos_class attribute
did not match the Documentation. The fix ends up deleting more code
than it added, and it has a new unit test to backstop basic errors in
this interface going forward. So the "red-diff" and unit test saved
the "rip it out and try again" response.
In contrast, the new notification path for firmware reported CXL
errors (CXL CPER notifications) has a locking context bug that can not
be fixed with a red-diff. Given where the release cycle stands, it is
not comfortable to squeeze in that fix in these waning days. So, that
receives the "back it out and try again later" treatment.
There is a regression fix in the code that establishes memory NUMA
nodes for platform CXL regions. That has an ack from x86 folks. There
are a couple more fixups for Linux to understand (reassemble) CXL
regions instantiated by platform firmware. The policy around platforms
that do not match host-physical-address with system-physical-address
(i.e. systems that have an address translation mechanism between the
address range reported in the ACPI CEDT.CFMWS and endpoint decoders)
has been softened to abort driver load rather than teardown the memory
range (can cause system hangs). Lastly, there is a robustness /
regression fix for cases where the driver would previously continue in
the face of error, and a fixup for PCI error notification handling.
Summary:
- Fix NUMA initialization from ACPI CEDT.CFMWS
- Fix region assembly failures due to async init order
- Fix / simplify export of qos_class information
- Fix cxl_acpi initialization vs single-window-init failures
- Fix handling of repeated 'pci_channel_io_frozen' notifications
- Workaround platforms that violate host-physical-address ==
system-physical address assumptions
- Defer CXL CPER notification handling to v6.9"
* tag 'cxl-fixes-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/acpi: Fix load failures due to single window creation failure
acpi/ghes: Remove CXL CPER notifications
cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window
cxl/test: Add support for qos_class checking
cxl: Fix sysfs export of qos_class for memdev
cxl: Remove unnecessary type cast in cxl_qos_class_verify()
cxl: Change 'struct cxl_memdev_state' *_perf_list to single 'struct cxl_dpa_perf'
cxl/region: Allow out of order assembly of autodiscovered regions
cxl/region: Handle endpoint decoders in cxl_region_find_decoder()
x86/numa: Fix the sort compare func used in numa_fill_memblks()
x86/numa: Fix the address overlap check in numa_fill_memblks()
cxl/pci: Skip to handle RAS errors if CXL.mem device is detached
Linus Torvalds [Sat, 24 Feb 2024 17:55:29 +0000 (09:55 -0800)]
Merge tag 'for-6.8/dm-fix-3' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fix from Mike Snitzer:
- Fix DM integrity and verity targets to not use excessive stack when
they recheck in the error path.
* tag 'for-6.8/dm-fix-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-integrity, dm-verity: reduce stack usage for recheck
Linus Torvalds [Sat, 24 Feb 2024 17:49:16 +0000 (09:49 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six fixes: the four driver ones are pretty trivial.
The larger two core changes are to try to fix various USB attached
devices which have somewhat eccentric ways of handling the VPD and
other mode pages which necessitate multiple revalidates (that were
removed in the interests of efficiency) and updating the heuristic for
supported VPD pages"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: jazz_esp: Only build if SCSI core is builtin
scsi: smartpqi: Fix disable_managed_interrupts
scsi: ufs: Uninitialized variable in ufshcd_devfreq_target()
scsi: target: pscsi: Fix bio_put() for error case
scsi: core: Consult supported VPD page list prior to fetching page
scsi: sd: usb_storage: uas: Access media prior to querying device properties
Linus Torvalds [Sat, 24 Feb 2024 17:46:05 +0000 (09:46 -0800)]
Merge tag 'i2c-for-6.8-rc6' of git://git./linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"A bugfix for host drivers"
* tag 'i2c-for-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: when being a target, mark the last read as processed
Linus Torvalds [Sat, 24 Feb 2024 17:36:35 +0000 (09:36 -0800)]
Merge tag 'loongarch-fixes-6.8-3' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Fix two cpu-hotplug issues, fix the init sequence about FDT system,
fix the coding style of dts, and fix the wrong CPUCFG ID handling of
KVM"
* tag 'loongarch-fixes-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Streamline kvm_check_cpucfg() and improve comments
LoongArch: KVM: Rename _kvm_get_cpucfg() to _kvm_get_cpucfg_mask()
LoongArch: KVM: Fix input validation of _kvm_get_cpucfg() & kvm_check_cpucfg()
LoongArch: dts: Minor whitespace cleanup
LoongArch: Call early_init_fdt_scan_reserved_mem() earlier
LoongArch: Update cpu_sibling_map when disabling nonboot CPUs
LoongArch: Disable IRQ before init_fn() for nonboot CPUs
Arnd Bergmann [Sat, 24 Feb 2024 13:48:03 +0000 (14:48 +0100)]
dm-integrity, dm-verity: reduce stack usage for recheck
The newly added integrity_recheck() function has another larger stack
allocation, just like its caller integrity_metadata(). When it gets
inlined, the combination of the two exceeds the warning limit for 32-bit
architectures and possibly risks an overflow when this is called from
a deep call chain through a file system:
drivers/md/dm-integrity.c:1767:13: error: stack frame size (1048) exceeds limit (1024) in 'integrity_metadata' [-Werror,-Wframe-larger-than]
1767 | static void integrity_metadata(struct work_struct *w)
Since the caller at this point is done using its checksum buffer,
just reuse the same buffer in the new function to avoid the double
allocation.
[Mikulas: add "noinline" to integrity_recheck and verity_recheck.
These functions are only called on error, so they shouldn't bloat the
stack frame or code size of the caller.]
Fixes: c88f5e553fe3 ("dm-integrity: recheck the integrity tag after a failure")
Fixes: 9177f3c0dea6 ("dm-verity: recheck the hash after a failure")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Aneesh Kumar K.V (IBM) [Mon, 29 Jan 2024 06:00:22 +0000 (11:30 +0530)]
mm/debug_vm_pgtable: fix BUG_ON with pud advanced test
Architectures like powerpc add debug checks to ensure we find only devmap
PUD pte entries. These debug checks are only done with CONFIG_DEBUG_VM.
This patch marks the ptes used for PUD advanced test devmap pte entries so
that we don't hit on debug checks on architecture like ppc64 as below.
WARNING: CPU: 2 PID: 1 at arch/powerpc/mm/book3s64/radix_pgtable.c:1382 radix__pud_hugepage_update+0x38/0x138
....
NIP [
c0000000000a7004] radix__pud_hugepage_update+0x38/0x138
LR [
c0000000000a77a8] radix__pudp_huge_get_and_clear+0x28/0x60
Call Trace:
[
c000000004a2f950] [
c000000004a2f9a0] 0xc000000004a2f9a0 (unreliable)
[
c000000004a2f980] [
000d34c100000000] 0xd34c100000000
[
c000000004a2f9a0] [
c00000000206ba98] pud_advanced_tests+0x118/0x334
[
c000000004a2fa40] [
c00000000206db34] debug_vm_pgtable+0xcbc/0x1c48
[
c000000004a2fc10] [
c00000000000fd28] do_one_initcall+0x60/0x388
Also
kernel BUG at arch/powerpc/mm/book3s64/pgtable.c:202!
....
NIP [
c000000000096510] pudp_huge_get_and_clear_full+0x98/0x174
LR [
c00000000206bb34] pud_advanced_tests+0x1b4/0x334
Call Trace:
[
c000000004a2f950] [
000d34c100000000] 0xd34c100000000 (unreliable)
[
c000000004a2f9a0] [
c00000000206bb34] pud_advanced_tests+0x1b4/0x334
[
c000000004a2fa40] [
c00000000206db34] debug_vm_pgtable+0xcbc/0x1c48
[
c000000004a2fc10] [
c00000000000fd28] do_one_initcall+0x60/0x388
Link: https://lkml.kernel.org/r/20240129060022.68044-1-aneesh.kumar@kernel.org
Fixes: 27af67f35631 ("powerpc/book3s64/mm: enable transparent pud hugepage")
Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Nhat Pham [Tue, 20 Feb 2024 03:01:21 +0000 (19:01 -0800)]
mm: cachestat: fix folio read-after-free in cache walk
In cachestat, we access the folio from the page cache's xarray to compute
its page offset, and check for its dirty and writeback flags. However, we
do not hold a reference to the folio before performing these actions,
which means the folio can concurrently be released and reused as another
folio/page/slab.
Get around this altogether by just using xarray's existing machinery for
the folio page offsets and dirty/writeback states.
This changes behavior for tmpfs files to now always report zeroes in their
dirty and writeback counters. This is okay as tmpfs doesn't follow
conventional writeback cache behavior: its pages get "cleaned" during
swapout, after which they're no longer resident etc.
Link: https://lkml.kernel.org/r/20240220153409.GA216065@cmpxchg.org
Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org> [6.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lorenzo Stoakes [Tue, 20 Feb 2024 06:44:10 +0000 (06:44 +0000)]
MAINTAINERS: add memory mapping entry with reviewers
Recently there have been a number of patches which have affected various
aspects of the memory mapping logic as implemented in mm/mmap.c where it
would have been useful for regular contributors to have been notified.
Add an entry for this part of mm in particular with regular contributors
tagged as reviewers.
Link: https://lkml.kernel.org/r/20240220064410.4639-1-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Byungchul Park [Fri, 16 Feb 2024 11:15:02 +0000 (20:15 +0900)]
mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index
With numa balancing on, when a numa system is running where a numa node
doesn't have its local memory so it has no managed zones, the following
oops has been observed. It's because wakeup_kswapd() is called with a
wrong zone index, -1. Fixed it by checking the index before calling
wakeup_kswapd().
> BUG: unable to handle page fault for address:
00000000000033f3
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 2 PID: 895 Comm: masim Not tainted 6.6.0-dirty #255
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>
rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
> RIP: 0010:wakeup_kswapd (./linux/mm/vmscan.c:7812)
> Code: (omitted)
> RSP: 0000:
ffffc90004257d58 EFLAGS:
00010286
> RAX:
ffffffffffffffff RBX:
ffff88883fff0480 RCX:
0000000000000003
> RDX:
0000000000000000 RSI:
0000000000000000 RDI:
ffff88883fff0480
> RBP:
ffffffffffffffff R08:
ff0003ffffffffff R09:
ffffffffffffffff
> R10:
ffff888106c95540 R11:
0000000055555554 R12:
0000000000000003
> R13:
0000000000000000 R14:
0000000000000000 R15:
ffff88883fff0940
> FS:
00007fc4b8124740(0000) GS:
ffff888827c00000(0000) knlGS:
0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
> CR2:
00000000000033f3 CR3:
000000026cc08004 CR4:
0000000000770ee0
> DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
> DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
> PKRU:
55555554
> Call Trace:
> <TASK>
> ? __die
> ? page_fault_oops
> ? __pte_offset_map_lock
> ? exc_page_fault
> ? asm_exc_page_fault
> ? wakeup_kswapd
> migrate_misplaced_page
> __handle_mm_fault
> handle_mm_fault
> do_user_addr_fault
> exc_page_fault
> asm_exc_page_fault
> RIP: 0033:0x55b897ba0808
> Code: (omitted)
> RSP: 002b:
00007ffeefa821a0 EFLAGS:
00010287
> RAX:
000055b89983acd0 RBX:
00007ffeefa823f8 RCX:
000055b89983acd0
> RDX:
00007fc2f8122010 RSI:
0000000000020000 RDI:
000055b89983acd0
> RBP:
00007ffeefa821a0 R08:
0000000000000037 R09:
0000000000000075
> R10:
0000000000000000 R11:
0000000000000202 R12:
0000000000000000
> R13:
00007ffeefa82410 R14:
000055b897ba5dd8 R15:
00007fc4b8340000
> </TASK>
Link: https://lkml.kernel.org/r/20240216111502.79759-1-byungchul@sk.com
Signed-off-by: Byungchul Park <byungchul@sk.com>
Reported-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
Fixes: c574bbe917036 ("NUMA balancing: optimize page placement for memory tiering system")
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Marco Elver [Mon, 29 Jan 2024 10:07:02 +0000 (11:07 +0100)]
kasan: revert eviction of stack traces in generic mode
This partially reverts commits
cc478e0b6bdf,
63b85ac56a64,
08d7c94d9635,
a414d4286f34, and
773688a6cb24 to make use of variable-sized stack depot
records, since eviction of stack entries from stack depot forces fixed-
sized stack records. Care was taken to retain the code cleanups by the
above commits.
Eviction was added to generic KASAN as a response to alleviating the
additional memory usage from fixed-sized stack records, but this still
uses more memory than previously.
With the re-introduction of variable-sized records for stack depot, we can
just switch back to non-evictable stack records again, and return back to
the previous performance and memory usage baseline.
Before (observed after a KASAN kernel boot):
pools: 597
refcounted_allocations: 17547
refcounted_frees: 6477
refcounted_in_use: 11070
freelist_size: 3497
persistent_count: 12163
persistent_bytes:
1717008
After:
pools: 319
refcounted_allocations: 0
refcounted_frees: 0
refcounted_in_use: 0
freelist_size: 0
persistent_count: 29397
persistent_bytes:
5183536
As can be seen from the counters, with a generic KASAN config, refcounted
allocations and evictions are no longer used. Due to using variable-sized
records, I observe a reduction of 278 stack depot pools (saving 4448 KiB)
with my test setup.
Link: https://lkml.kernel.org/r/20240129100708.39460-2-elver@google.com
Fixes: cc478e0b6bdf ("kasan: avoid resetting aux_lock")
Fixes: 63b85ac56a64 ("kasan: stop leaking stack trace handles")
Fixes: 08d7c94d9635 ("kasan: memset free track in qlink_free")
Fixes: a414d4286f34 ("kasan: handle concurrent kasan_record_aux_stack calls")
Fixes: 773688a6cb24 ("kasan: use stack_depot_put for Generic mode")
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Marco Elver [Mon, 29 Jan 2024 10:07:01 +0000 (11:07 +0100)]
stackdepot: use variable size records for non-evictable entries
With the introduction of stack depot evictions, each stack record is now
fixed size, so that future reuse after an eviction can safely store
differently sized stack traces. In all cases that do not make use of
evictions, this wastes lots of space.
Fix it by re-introducing variable size stack records (up to the max
allowed size) for entries that will never be evicted. We know if an entry
will never be evicted if the flag STACK_DEPOT_FLAG_GET is not provided,
since a later stack_depot_put() attempt is undefined behavior.
With my current kernel config that enables KASAN and also SLUB owner
tracking, I observe (after a kernel boot) a whopping reduction of 296
stack depot pools, which translates into 4736 KiB saved. The savings here
are from SLUB owner tracking only, because KASAN generic mode still uses
refcounting.
Before:
pools: 893
allocations: 29841
frees: 6524
in_use: 23317
freelist_size: 3454
After:
pools: 597
refcounted_allocations: 17547
refcounted_frees: 6477
refcounted_in_use: 11070
freelist_size: 3497
persistent_count: 12163
persistent_bytes:
1717008
[elver@google.com: fix -Wstringop-overflow warning]
Link: https://lore.kernel.org/all/20240201135747.18eca98e@canb.auug.org.au/
Link: https://lkml.kernel.org/r/20240201090434.1762340-1-elver@google.com
Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/
Link: https://lkml.kernel.org/r/20240129100708.39460-1-elver@google.com
Link: https://lore.kernel.org/all/CABXGCsOzpRPZGg23QqJAzKnqkZPKzvieeg=W7sgjgi3q0pBo0g@mail.gmail.com/
Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces")
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Corey Minyard [Wed, 21 Feb 2024 19:27:13 +0000 (20:27 +0100)]
i2c: imx: when being a target, mark the last read as processed
When being a target, NAK from the controller means that all bytes have
been transferred. So, the last byte needs also to be marked as
'processed'. Otherwise index registers of backends may not increase.
Fixes: f7414cd6923f ("i2c: imx: support slave mode for imx I2C driver")
Signed-off-by: Corey Minyard <minyard@acm.org>
Tested-by: Andrew Manley <andrew.manley@sealingtech.com>
Reviewed-by: Andrew Manley <andrew.manley@sealingtech.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
[wsa: fixed comment and commit message to properly describe the case]
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
Mickaël Salaün [Fri, 23 Feb 2024 19:05:46 +0000 (20:05 +0100)]
apparmor: fix lsm_get_self_attr()
In apparmor_getselfattr() when an invalid AppArmor attribute is
requested, or a value hasn't been explicitly set for the requested
attribute, the label passed to aa_put_label() is not properly
initialized which can cause problems when the pointer value is non-NULL
and AppArmor attempts to drop a reference on the bogus label object.
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: John Johansen <john.johansen@canonical.com>
Fixes: 223981db9baf ("AppArmor: Add selfattr hooks")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Reviewed-by: Paul Moore <paul@paul-moore.com>
[PM: description changes as discussed with MS]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Mickaël Salaün [Fri, 23 Feb 2024 19:05:45 +0000 (20:05 +0100)]
selinux: fix lsm_get_self_attr()
selinux_getselfattr() doesn't properly initialize the string pointer
it passes to selinux_lsm_getattr() which can cause a problem when an
attribute hasn't been explicitly set; selinux_lsm_getattr() returns
0/success, but does not set or initialize the string label/attribute.
Failure to properly initialize the string causes problems later in
selinux_getselfattr() when the function attempts to kfree() the
string.
Cc: Casey Schaufler <casey@schaufler-ca.com>
Fixes: 762c934317e6 ("SELinux: Add selfattr hooks")
Suggested-by: Paul Moore <paul@paul-moore.com>
[PM: description changes as discussed in the thread]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Linus Torvalds [Fri, 23 Feb 2024 18:40:20 +0000 (10:40 -0800)]
Merge tag 'parisc-for-6.8-rc6' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
"Fixes CPU hotplug, the parisc stack unwinder and two possible build
errors in kprobes and ftrace area:
- Fix CPU hotplug
- Fix unaligned accesses and faults in stack unwinder
- Fix potential build errors by always including asm-generic/kprobes.h
- Fix build bug by add missing CONFIG_DYNAMIC_FTRACE check"
* tag 'parisc-for-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix stack unwinder
parisc/kprobes: always include asm-generic/kprobes.h
parisc/ftrace: add missing CONFIG_DYNAMIC_FTRACE check
Revert "parisc: Only list existing CPUs in cpu_possible_mask"
Linus Torvalds [Fri, 23 Feb 2024 18:31:28 +0000 (10:31 -0800)]
Merge tag 'arm-fixes-6.8-2' of git://git./linux/kernel/git/soc/soc
Pull arm and RISC-V SoC fixes from Arnd Bergmann:
"The Rockchip and IMX8 platforms get a number of fixes for dts files in
order to address some misconfigurations, including a regression for
USB-C support on some boards.
The other dts fixes are part of a series by Rob Herring to clean up
another class of dtc compiler warnings across all platforms, with a
few others helping out as well. With this, we can enable the warning
for the coming merge window without introducing regressions.
Conor Dooley has collected fixes for RISC-V platforms, both for the
dts files and for platofrm specific drivers.
The ep93xx platform gets a regression for for its gpio descriptors"
* tag 'arm-fixes-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (28 commits)
ARM: dts: renesas: rcar-gen2: Add missing #interrupt-cells to DA9063 nodes
cache: ax45mp_cache: Align end size to cache boundary in ax45mp_dma_cache_wback()
arm64: dts: qcom: Fix interrupt-map cell sizes
arm: dts: Fix dtc interrupt_map warnings
arm64: dts: Fix dtc interrupt_provider warnings
arm: dts: Fix dtc interrupt_provider warnings
arm64: dts: freescale: Disable interrupt_map check
ARM: ep93xx: Add terminator to gpiod_lookup_table
riscv: dts: sifive: add missing #interrupt-cells to pmic
arm64: dts: rockchip: Correct Indiedroid Nova GPIO Names
arm64: dts: rockchip: Drop interrupts property from rk3328 pwm-rockchip node
arm64: dts: rockchip: set num-cs property for spi on px30
arm64: dts: rockchip: minor rk3588 whitespace cleanup
riscv: dts: starfive: replace underscores in node names
bus: imx-weim: fix valid range check
Revert "arm64: dts: imx8mn-var-som-symphony: Describe the USB-C connector"
Revert "arm64: dts: imx8mp-dhcom-pdk3: Describe the USB-C connector"
arm64: dts: tqma8mpql: fix audio codec iov-supply
arm64: dts: rockchip: drop unneeded status from rk3588-jaguar gpio-leds
ARM: dts: rockchip: Drop interrupts property from pwm-rockchip nodes
...
Linus Torvalds [Fri, 23 Feb 2024 18:26:43 +0000 (10:26 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"A simple fix to a definition in the CXL PMU driver, a couple of
patches to restore SME control registers on the resume path (since
Arm's fast model now clears them) and a revert for our jump label asm
constraints after Geert noticed they broke the build with GCC 5.5.
There was then the ensuing discussion about raising the minimum GCC
(and corresponding binutils) versions at [1], but for now we'll keep
things working as they were until that goes ahead.
- Revert fix to jump label asm constraints, as it regresses the build
with some GCC 5.5 toolchains.
- Restore SME control registers when resuming from suspend
- Fix incorrect filter definition in CXL PMU driver"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/sme: Restore SMCR_EL1.EZT0 on exit from suspend
arm64/sme: Restore SME registers on exit from suspend
Revert "arm64: jump_label: use constraints "Si" instead of "i""
perf: CXL: fix CPMU filter value mask length
Linus Torvalds [Fri, 23 Feb 2024 17:54:13 +0000 (09:54 -0800)]
Merge tag 's390-6.8-3' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Fix invalid -EBUSY on ccw_device_start() which can lead to failing
device initialization
- Add missing multiplication by 8 in __iowrite64_copy() to get the
correct byte length before calling zpci_memcpy_toio()
- Various config updates
* tag 's390-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cio: fix invalid -EBUSY on ccw_device_start
s390: use the correct count for __iowrite64_copy()
s390/configs: update default configurations
s390/configs: enable INIT_STACK_ALL_ZERO in all configurations
s390/configs: provide compat topic configuration target
Linus Torvalds [Fri, 23 Feb 2024 17:43:21 +0000 (09:43 -0800)]
Merge tag 'mm-hotfixes-stable-2024-02-22-15-02' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"A batch of MM (and one non-MM) hotfixes.
Ten are cc:stable and the remainder address post-6.7 issues or aren't
considered appropriate for backporting"
* tag 'mm-hotfixes-stable-2024-02-22-15-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
kasan: guard release_free_meta() shadow access with kasan_arch_is_ready()
mm/damon/lru_sort: fix quota status loss due to online tunings
mm/damon/reclaim: fix quota stauts loss due to online tunings
MAINTAINERS: mailmap: update Shakeel's email address
mm/damon/sysfs-schemes: handle schemes sysfs dir removal before commit_schemes_quota_goals
mm: memcontrol: clarify swapaccount=0 deprecation warning
mm/memblock: add MEMBLOCK_RSRV_NOINIT into flagname[] array
mm/zswap: invalidate duplicate entry when !zswap_enabled
lib/Kconfig.debug: TEST_IOV_ITER depends on MMU
mm/swap: fix race when skipping swapcache
mm/swap_state: update zswap LRU's protection range with the folio locked
selftests/mm: uffd-unit-test check if huge page size is 0
mm/damon/core: check apply interval in damon_do_apply_schemes()
mm: zswap: fix missing folio cleanup in writeback race path
Linus Torvalds [Fri, 23 Feb 2024 17:23:54 +0000 (09:23 -0800)]
Merge tag 'for-6.8/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Stable fixes for 3 DM targets (integrity, verity and crypt) to
address systemic failure that can occur if user provided pages map to
the same block.
- Fix DM crypt to not allow modifying data that being encrypted for
authenticated encryption.
- Fix DM crypt and verity targets to align their respective bvec_iter
struct members to avoid the need for byte level access (due to
__packed attribute) that is costly on some arches (like RISC).
* tag 'for-6.8/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-crypt, dm-integrity, dm-verity: bump target version
dm-verity, dm-crypt: align "struct bvec_iter" correctly
dm-crypt: recheck the integrity tag after a failure
dm-crypt: don't modify the data when using authenticated encryption
dm-verity: recheck the hash after a failure
dm-integrity: recheck the integrity tag after a failure
Linus Torvalds [Fri, 23 Feb 2024 17:17:47 +0000 (09:17 -0800)]
Merge tag 'drm-fixes-2024-02-23' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"This is the weekly drm fixes. Non-drivers there is a fbdev/sparc fix,
syncobj, ttm and buddy fixes.
On the driver side, ivpu, meson, i915 have a small fix each. Then
amdgpu and xe have a bunch. Nouveau has some minor uapi additions to
give userspace some useful info along with a Kconfig change to allow
the new GSP firmware paths to be used by default on the GPUs it
supports.
Seems about the usual amount for this time of release cycle.
fbdev:
- fix sparc undefined reference
syncobj:
- fix sync obj fence waiting
- handle NULL fence in syncobj eventfd code
ttm:
- fix invalid free
buddy:
- fix list handling
- fix 32-bit build
meson:
- don't remove bridges from other drivers
nouveau:
- fix build warnings
- add two minor info parameters
- add a Kconfig to allow GSP by default on some GPUs
ivpu:
- allow fw to do initial tile config
i915:
- fix TV mode
amdgpu:
- Suspend/resume fixes
- Backlight error fix
- DCN 3.5 fixes
- Misc fixes
xe:
- Remove support for persistent exec_queues
- Drop a reduntant sysfs newline printout
- A three-patch fix for a VM_BIND rebind optimization path
- Fix a modpost warning on an xe KUNIT module"
* tag 'drm-fixes-2024-02-23' of git://anongit.freedesktop.org/drm/drm: (27 commits)
nouveau: add an ioctl to report vram usage
nouveau: add an ioctl to return vram bar size.
nouveau/gsp: add kconfig option to enable GSP paths by default
drm/amdgpu: Fix the runtime resume failure issue
drm/amd/display: fix null-pointer dereference on edid reading
drm/amd/display: Fix memory leak in dm_sw_fini()
drm/amd/display: fix input states translation error for dcn35 & dcn351
drm/amd/display: Fix potential null pointer dereference in dc_dmub_srv
drm/amd/display: Only allow dig mapping to pwrseq in new asic
drm/amd/display: adjust few initialization order in dm
drm/syncobj: handle NULL fence in syncobj_eventfd_entry_func
drm/syncobj: call drm_syncobj_fence_add_wait when WAIT_AVAILABLE flag is set
drm/ttm: Fix an invalid freeing on already freed page in error path
sparc: Fix undefined reference to fb_is_primary_device
drm/xe: Fix modpost warning on xe_mocs kunit module
drm/xe/xe_gt_idle: Drop redundant newline in name
drm/xe: Return 2MB page size for compact 64k PTEs
drm/xe: Add XE_VMA_PTE_64K VMA flag
drm/xe: Fix xe_vma_set_pte_size
drm/xe/uapi: Remove support for persistent exec_queues
...
Linus Torvalds [Fri, 23 Feb 2024 17:05:56 +0000 (09:05 -0800)]
Merge tag 'ata-6.8-rc6' of git://git./linux/kernel/git/libata/linux
Pull ata fixes from Niklas Cassel:
- Do not try to set a sleeping device to standby. Sleep is a deeper
sleep state than standby, and needs a reset to wake up the drive. A
system resume will reset the port. Sending a command other than reset
to a sleeping device is not wise, as the command will timeout (Damien
Le Moal)
- Do not try to put a device to standby twice during system shutdown.
ata_dev_power_set_standby() is currently called twice during
shutdown, once after the scsi device is removed, and another when
ata_pci_shutdown_one() executes. Modify ata_dev_power_set_standby()
to do nothing if the device is already in standby (Damien Le Moal)
- Add a quirk for ASM1064 to fixup the number of implemented ports. We
probe all ports that the hardware reports to be implemented. Probing
ports that are not implemented causes significantly increased boot
time (Andrey Jr. Melnikov)
- Fix error handling for the ahci_ceva driver. Ensure that the
ahci_ceva driver does a proper cleanup of its resources in the error
path (Radhey Shyam Pandey)
* tag 'ata-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: libata-core: Do not call ata_dev_power_set_standby() twice
ata: ahci_ceva: fix error handling for Xilinx GT PHY support
ahci: asm1064: correct count of reported ports
ata: libata-core: Do not try to set sleeping devices to standby
Linus Torvalds [Fri, 23 Feb 2024 17:01:35 +0000 (09:01 -0800)]
Merge tag 'gpio-fixes-for-v6.8-rc6' of git://git./linux/kernel/git/brgl/linux
Pull gpio fix from Bartosz Golaszewski:
- fix a use-case where no pins are mapped to GPIOs in
gpiochip_generic_config()
* tag 'gpio-fixes-for-v6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Handle no pin_ranges in gpiochip_generic_config()
Linus Torvalds [Fri, 23 Feb 2024 16:58:47 +0000 (08:58 -0800)]
Merge tag 'hwmon-for-v6.8-rc6' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Fix a global-out-of-bounds bug in nct6775 driver"
* tag 'hwmon-for-v6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (nct6775) Fix access to temperature configuration registers
Jason Gunthorpe [Thu, 22 Feb 2024 14:07:41 +0000 (10:07 -0400)]
iommu/sva: Restore SVA handle sharing
Prior to commit
092edaddb660 ("iommu: Support mm PASID 1:n with sva
domains") the code allowed a SVA handle to be bound multiple times to the
same (mm, device) pair. This was alluded to in the kdoc comment, but we
had understood this to be more a remark about allowing multiple devices,
not a literal same-driver re-opening the same SVA.
It turns out uacce and idxd were both relying on the core code to handle
reference counting for same-device same-mm scenarios. As this looks hard
to resolve in the drivers bring it back to the core code.
The new design has changed the meaning of the domain->users refcount to
refer to the number of devices that are sharing that domain for the same
mm. This is part of the design to lift the SVA domain de-duplication out
of the drivers.
Return the old behavior by explicitly de-duplicating the struct iommu_sva
handle. The same (mm, device) will return the same handle pointer and the
core code will handle tracking this. The last unbind of the handle will
destroy it.
Fixes: 092edaddb660 ("iommu: Support mm PASID 1:n with sva domains")
Reported-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Closes: https://lore.kernel.org/all/20240221110658.529-1-zhangfei.gao@linaro.org/
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/0-v1-9455fc497a6f+3b4-iommu_sva_sharing_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Joerg Roedel [Fri, 23 Feb 2024 15:43:53 +0000 (16:43 +0100)]
Merge tag 'arm-smmu-fixes' of git://git./linux/kernel/git/will/linux into iommu/fixes
Arm SMMU fixes for 6.8
- Fix CD allocation from atomic context when using SVA with SMMUv3
- Revert the conversion of SMMUv2 to domain_alloc_paging(), as it
breaks the boot for Qualcomm MSM8996 devices
Arnd Bergmann [Fri, 23 Feb 2024 12:54:36 +0000 (13:54 +0100)]
Merge tag 'renesas-fixes-for-v6.8-tag1' of git://git./linux/kernel/git/geert/renesas-devel into arm/fixes
Renesas fixes for v6.8
- Add missing #interrupt-cells to DA9063 nodes.
* tag 'renesas-fixes-for-v6.8-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel:
ARM: dts: renesas: rcar-gen2: Add missing #interrupt-cells to DA9063 nodes
Link: https://lore.kernel.org/r/cover.1708597150.git.geert+renesas@glider.be
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Fri, 23 Feb 2024 12:54:07 +0000 (13:54 +0100)]
Merge tag 'riscv-dt-fixes-for-v6.8-rc6' of https://git./linux/kernel/git/conor/linux into arm/fixes
RISC-V Devicetree fixes for v6.8-rc6
Two fixes for W=2 issues in devicetrees, which should constitute fixes
for all reasonable-to-fix W=2 problems on RISC-V. The others are caused
by standard USB and MMC property names containing underscores that are
not likely to ever change.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-dt-fixes-for-v6.8-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: sifive: add missing #interrupt-cells to pmic
riscv: dts: starfive: replace underscores in node names
Link: https://lore.kernel.org/r/20240221-foil-glade-09dbf1aa3fe2@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Fri, 23 Feb 2024 12:53:54 +0000 (13:53 +0100)]
Merge tag 'riscv-soc-drivers-fixes-for-v6.8-rc6' of https://git./linux/kernel/git/conor/linux into arm/fixes
RISC-V SoC driver fixes for v6.8-rc6
A fix for a kconfig symbol whose help text has been unhelpful since its
introduction.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-soc-drivers-fixes-for-v6.8-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
soc: microchip: Fix POLARFIRE_SOC_SYS_CTRL input prompt
Link: https://lore.kernel.org/r/20240221-irate-outrage-cf7f96f83074@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Fri, 23 Feb 2024 12:53:43 +0000 (13:53 +0100)]
Merge tag 'riscv-firmware-fixes-for-v6.8-rc6' of https://git./linux/kernel/git/conor/linux into arm/fixes
Microchip firmware driver fixes for v6.8-rc6
A single fix for me incorrectly using sizeof().
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-firmware-fixes-for-v6.8-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
firmware: microchip: fix wrong sizeof argument
Link: https://lore.kernel.org/r/20240221-recognize-dust-4bb575f4e67b@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Fri, 23 Feb 2024 12:53:30 +0000 (13:53 +0100)]
Merge tag 'riscv-cache-fixes-for-v6.8-rc6' of https://git./linux/kernel/git/conor/linux into arm/fixes
RISC-V Cache driver fixes for v6.8-rc6
A single fix for an inconsistency reported during CIP review by Pavel in
the newly added ax45mp cache driver.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
* tag 'riscv-cache-fixes-for-v6.8-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux:
cache: ax45mp_cache: Align end size to cache boundary in ax45mp_dma_cache_wback()
Link: https://lore.kernel.org/r/20240221-keenness-handheld-b930aaa77708@spud
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
WANG Xuerui [Fri, 23 Feb 2024 06:36:31 +0000 (14:36 +0800)]
LoongArch: KVM: Streamline kvm_check_cpucfg() and improve comments
All the checks currently done in kvm_check_cpucfg can be realized with
early returns, so just do that to avoid extra cognitive burden related
to the return value handling.
While at it, clean up comments of _kvm_get_cpucfg_mask() and
kvm_check_cpucfg(), by removing comments that are merely restatement of
the code nearby, and paraphrasing the rest so they read more natural for
English speakers (that likely are not familiar with the actual Chinese-
influenced grammar).
No functional changes intended.
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Fri, 23 Feb 2024 06:36:31 +0000 (14:36 +0800)]
LoongArch: KVM: Rename _kvm_get_cpucfg() to _kvm_get_cpucfg_mask()
The function is not actually a getter of guest CPUCFG, but rather
validation of the input CPUCFG ID plus information about the supported
bit flags of that CPUCFG leaf. So rename it to avoid confusion.
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Fri, 23 Feb 2024 06:36:31 +0000 (14:36 +0800)]
LoongArch: KVM: Fix input validation of _kvm_get_cpucfg() & kvm_check_cpucfg()
The range check for the CPUCFG ID is wrong (should have been a ||
instead of &&) and useless in effect, so fix the obvious mistake.
Furthermore, the juggling of the temp return value is unnecessary,
because it is semantically equivalent and more readable to just
return at every switch case's end. This is done too to avoid potential
bugs in the future related to the unwanted complexity.
Also, the return value of _kvm_get_cpucfg is meant to be checked, but
this was not done, so bad CPUCFG IDs wrongly fall back to the default
case and 0 is incorrectly returned; check the return value to fix the
UAPI behavior.
While at it, also remove the redundant range check in kvm_check_cpucfg,
because out-of-range CPUCFG IDs are already rejected by the -EINVAL
as returned by _kvm_get_cpucfg().
Fixes: db1ecca22edf ("LoongArch: KVM: Add LSX (128bit SIMD) support")
Fixes: 118e10cd893d ("LoongArch: KVM: Add LASX (256bit SIMD) support")
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Krzysztof Kozlowski [Fri, 23 Feb 2024 06:36:31 +0000 (14:36 +0800)]
LoongArch: dts: Minor whitespace cleanup
The DTS code coding style expects exactly one space before '{'
character.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Fri, 23 Feb 2024 06:36:31 +0000 (14:36 +0800)]
LoongArch: Call early_init_fdt_scan_reserved_mem() earlier
The unflatten_and_copy_device_tree() function contains a call to
memblock_alloc(). This means that memblock is allocating memory before
any of the reserved memory regions are set aside in the arch_mem_init()
function which calls early_init_fdt_scan_reserved_mem(). Therefore,
there is a possibility for memblock to allocate from any of the
reserved memory regions.
Hence, move the call to early_init_fdt_scan_reserved_mem() to be earlier
in the init sequence, so that the reserved memory regions are set aside
before any allocations are done using memblock.
Cc: stable@vger.kernel.org
Fixes: 88d4d957edc707e ("LoongArch: Add FDT booting support from efi system table")
Signed-off-by: Oreoluwa Babatunde <quic_obabatun@quicinc.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Fri, 23 Feb 2024 06:36:31 +0000 (14:36 +0800)]
Huacai Chen [Fri, 23 Feb 2024 06:36:31 +0000 (14:36 +0800)]
Dave Airlie [Wed, 24 Jan 2024 04:24:25 +0000 (14:24 +1000)]
nouveau: add an ioctl to report vram usage
This reports the currently used vram allocations.
userspace using this has been proposed for nvk, but
it's a rather trivial uapi addition.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 24 Jan 2024 03:50:58 +0000 (13:50 +1000)]
nouveau: add an ioctl to return vram bar size.
This returns the BAR resources size so userspace can make
decisions based on rebar support.
userspace using this has been proposed for nvk, but
it's a rather trivial uapi addition.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 14 Feb 2024 04:06:32 +0000 (14:06 +1000)]
nouveau/gsp: add kconfig option to enable GSP paths by default
Turing and Ampere will continue to use the old paths by default,
but we should allow distros to decide what the policy is.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240214040632.661069-1-airlied@gmail.com
Dave Airlie [Thu, 22 Feb 2024 23:44:44 +0000 (09:44 +1000)]
Merge tag 'drm-xe-fixes-2024-02-22' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
UAPI Changes:
- Remove support for persistent exec_queues
- Drop a reduntant sysfs newline printout
Cross-subsystem Changes:
Core Changes:
Driver Changes:
- A three-patch fix for a VM_BIND rebind optimization path
- Fix a modpost warning on an xe KUNIT module
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZdcsNrxdWMMM417v@fedora
Dave Airlie [Thu, 22 Feb 2024 22:36:46 +0000 (08:36 +1000)]
Merge tag 'amd-drm-fixes-6.8-2024-02-22' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.8-2024-02-22:
amdgpu:
- Suspend/resume fixes
- Backlight error fix
- DCN 3.5 fixes
- Misc fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240222195338.5809-1-alexander.deucher@amd.com
Dave Airlie [Thu, 22 Feb 2024 22:30:20 +0000 (08:30 +1000)]
Merge tag 'drm-intel-fixes-2024-02-22' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fixup for TV mode
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZdcwT9kltvEgJZZE@jlahtine-mobl.ger.corp.intel.com
Dave Airlie [Thu, 22 Feb 2024 22:09:45 +0000 (08:09 +1000)]
Merge tag 'drm-misc-fixes-2024-02-22' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A list handling fix and 64bit division on 32bit platform fix for the
drm/buddy allocator, a cast warning and an initialization fix for
nouveau, a bridge handling fix for meson, an initialisation fix for
ivpu, a SPARC build fix for fbdev, a double-free fix for ttm, and two
fence handling fixes for syncobj.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/gl2antuifidtzn3dfm426p7xwh5fxj23behagwh26owfnosh2w@gqoa7vj5prnh
Linus Torvalds [Thu, 22 Feb 2024 19:57:30 +0000 (11:57 -0800)]
Merge tag 'block-6.8-2024-02-22' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Mostly just fixlets for md, but also a sed-opal parsing fix"
* tag 'block-6.8-2024-02-22' of git://git.kernel.dk/linux:
block: sed-opal: handle empty atoms when parsing response
md: Don't suspend the array for interrupted reshape
md: Don't register sync_thread for reshape directly
md: Make sure md_do_sync() will set MD_RECOVERY_DONE
md: Don't ignore read-only array in md_check_recovery()
md: Don't ignore suspended array in md_check_recovery()
md: Fix missing release of 'active_io' for flush
Linus Torvalds [Thu, 22 Feb 2024 19:53:09 +0000 (11:53 -0800)]
Merge tag 'for-linus-iommufd' of git://git./linux/kernel/git/jgg/iommufd
Pull iommufd fixes from Jason Gunthorpe:
- Fix dirty tracking bitmap collection when using reporting bitmaps
that are not neatly aligned to u64's or match the IO page table radix
tree layout.
- Add self tests to cover the cases that were found to be broken.
- Add missing enforcement of invalidation type in the uapi.
- Fix selftest config generation
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd:
selftests/iommu: fix the config fragment
iommufd: Reject non-zero data_type if no data_len is provided
iommufd/iova_bitmap: Consider page offset for the pages to be pinned
iommufd/selftest: Add mock IO hugepages tests
iommufd/selftest: Hugepage mock domain support
iommufd/selftest: Refactor mock_domain_read_and_clear_dirty()
iommufd/selftest: Refactor dirty bitmap tests
iommufd/iova_bitmap: Handle recording beyond the mapped pages
iommufd/selftest: Test u64 unaligned bitmaps
iommufd/iova_bitmap: Switch iova_bitmap::bitmap to an u8 array
iommufd/iova_bitmap: Bounds check mapped::pages access
Linus Torvalds [Thu, 22 Feb 2024 19:47:07 +0000 (11:47 -0800)]
Merge tag 'platform-drivers-x86-v6.8-3' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Regression fixes:
- Fix INT0002 vGPIO events no longer working after 6.8 ACPI SCI
changes
- AMD-PMF: Fix laptops (e.g. Framework 13 AMD) hanging on suspend
- x86-android-tablets: Fix touchscreen no longer working on Lenovo
Yogabook
- x86-android-tablets: Fix serdev instantiation regression
- intel-vbtn: Fix ThinkPad X1 Tablet Gen2 no longer suspending
Bug fixes:
- think-lmi: Fix changing BIOS settings on Lenovo workstations
- touchscreen_dmi: Fix Hi8 Air touchscreen data sometimes missing
- AMD-PMF: Fix Smart PC support not working after suspend/resume
Other misc small fixes"
* tag 'platform-drivers-x86-v6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: thinkpad_acpi: Only update profile if successfully converted
platform/x86: intel-vbtn: Stop calling "VBDL" from notify_handler
platform/x86: x86-android-tablets: Fix acer_b1_750_goodix_gpios name
platform/x86: x86-android-tablets: Fix serdev instantiation no longer working
platform/x86: Add new get_serdev_controller() helper
platform/x86: x86-android-tablets: Fix keyboard touchscreen on Lenovo Yogabook1 X90
platform/x86/amd/pmf: Fix a potential race with policy binary sideload
platform/x86/amd/pmf: Fixup error handling for amd_pmf_init_smart_pc()
platform/x86/amd/pmf: Add debugging message for missing policy data
platform/x86/amd/pmf: Fix a suspend hang on Framework 13
platform/x86/amd/pmf: Fix TEE enact command failure after suspend and resume
platform/x86/amd/pmf: Remove smart_pc_status enum
platform/x86: touchscreen_dmi: Consolidate Goodix upside-down touchscreen data
platform/x86: touchscreen_dmi: Allow partial (prefix) matches for ACPI names
platform/x86: intel: int0002_vgpio: Pass IRQF_ONESHOT to request_irq()
platform/x86: think-lmi: Fix password opcode ordering for workstations
Linus Torvalds [Thu, 22 Feb 2024 19:44:20 +0000 (11:44 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Here are some Samsung clk driver fixes I've been sitting on for far
too long.
They fix the bindings and clk driver for the Google GS101 SoC"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: samsung: clk-gs101: comply with the new dt cmu_misc clock names
dt-bindings: clock: gs101: rename cmu_misc clock-names
Linus Torvalds [Thu, 22 Feb 2024 18:06:29 +0000 (10:06 -0800)]
Merge tag 'vfs-6.8-rc6.fixes' of git://git./linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix a memory leak in cachefiles
- Restrict aio cancellations to I/O submitted through the aio
interfaces as this is otherwise causing issues for I/O submitted
via io_uring
- Increase buffer for afs volume status to avoid overflow
- Fix a missing zero-length check in unbuffered writes in the
netfs library. If generic_write_checks() returns zero make
netfs_unbuffered_write_iter() return right away
- Prevent a leak in i_dio_count caused by netfs_begin_read() operating
past i_size. It will return early and leave i_dio_count incremented
- Account for ipv4 addresses as well as ipv6 addresses when processing
incoming callbacks in afs
* tag 'vfs-6.8-rc6.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaio
afs: Increase buffer size in afs_update_volume_status()
afs: Fix ignored callbacks over ipv4
cachefiles: fix memory leak in cachefiles_add_cache()
netfs: Fix missing zero-length check in unbuffered write
netfs: Fix i_dio_count leak on DIO read past i_size
Linus Torvalds [Thu, 22 Feb 2024 17:57:58 +0000 (09:57 -0800)]
Merge tag 'net-6.8.0-rc6' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf and netfilter.
Current release - regressions:
- af_unix: fix another unix GC hangup
Previous releases - regressions:
- core: fix a possible AF_UNIX deadlock
- bpf: fix NULL pointer dereference in sk_psock_verdict_data_ready()
- netfilter: nft_flow_offload: release dst in case direct xmit path
is used
- bridge: switchdev: ensure MDB events are delivered exactly once
- l2tp: pass correct message length to ip6_append_data
- dccp/tcp: unhash sk from ehash for tb2 alloc failure after
check_estalblished()
- tls: fixes for record type handling with PEEK
- devlink: fix possible use-after-free and memory leaks in
devlink_init()
Previous releases - always broken:
- bpf: fix an oops when attempting to read the vsyscall page through
bpf_probe_read_kernel
- sched: act_mirred: use the backlog for mirred ingress
- netfilter: nft_flow_offload: fix dst refcount underflow
- ipv6: sr: fix possible use-after-free and null-ptr-deref
- mptcp: fix several data races
- phonet: take correct lock to peek at the RX queue
Misc:
- handful of fixes and reliability improvements for selftests"
* tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
l2tp: pass correct message length to ip6_append_data
net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY
selftests: ioam: refactoring to align with the fix
Fix write to cloned skb in ipv6_hop_ioam()
phonet/pep: fix racy skb_queue_empty() use
phonet: take correct lock to peek at the RX queue
net: sparx5: Add spinlock for frame transmission from CPU
net/sched: flower: Add lock protection when remove filter handle
devlink: fix port dump cmd type
net: stmmac: Fix EST offset for dwmac 5.10
tools: ynl: don't leak mcast_groups on init error
tools: ynl: make sure we always pass yarg to mnl_cb_run
net: mctp: put sock on tag allocation failure
netfilter: nf_tables: use kzalloc for hook allocation
netfilter: nf_tables: register hooks last when adding new chain/flowtable
netfilter: nft_flow_offload: release dst in case direct xmit path is used
netfilter: nft_flow_offload: reset dst in route object after setting up flow
netfilter: nf_tables: set dormant flag on hook register failure
selftests: tls: add test for peeking past a record of a different type
selftests: tls: add test for merging of same-type control messages
...
Ma Jun [Wed, 21 Feb 2024 09:16:49 +0000 (17:16 +0800)]
drm/amdgpu: Fix the runtime resume failure issue
Don't set power state flag when system enter runtime suspend,
or it may cause runtime resume failure issue.
Fixes: 3a9626c816db ("drm/amd: Stop evicting resources on APUs in suspend")
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Melissa Wen [Fri, 16 Feb 2024 12:23:19 +0000 (09:23 -0300)]
drm/amd/display: fix null-pointer dereference on edid reading
Use i2c adapter when there isn't aux_mode in dc_link to fix a
null-pointer derefence that happens when running
igt@kms_force_connector_basic in a system with DCN2.1 and HDMI connector
detected as below:
[ +0.178146] BUG: kernel NULL pointer dereference, address:
00000000000004c0
[ +0.000010] #PF: supervisor read access in kernel mode
[ +0.000005] #PF: error_code(0x0000) - not-present page
[ +0.000004] PGD 0 P4D 0
[ +0.000006] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000006] CPU: 15 PID: 2368 Comm: kms_force_conne Not tainted 6.5.0-asdn+ #152
[ +0.000005] Hardware name: HP HP ENVY x360 Convertible 13-ay1xxx/8929, BIOS F.01 07/14/2021
[ +0.000004] RIP: 0010:i2c_transfer+0xd/0x100
[ +0.000011] Code: ea fc ff ff 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 54 55 53 <48> 8b 47 10 48 89 fb 48 83 38 00 0f 84 b3 00 00 00 83 3d 2f 80 16
[ +0.000004] RSP: 0018:
ffff9c4f89c0fad0 EFLAGS:
00010246
[ +0.000005] RAX:
0000000000000000 RBX:
0000000000000005 RCX:
0000000000000080
[ +0.000003] RDX:
0000000000000002 RSI:
ffff9c4f89c0fb20 RDI:
00000000000004b0
[ +0.000003] RBP:
ffff9c4f89c0fb80 R08:
0000000000000080 R09:
ffff8d8e0b15b980
[ +0.000003] R10:
00000000000380e0 R11:
0000000000000000 R12:
0000000000000080
[ +0.000002] R13:
0000000000000002 R14:
ffff9c4f89c0fb0e R15:
ffff9c4f89c0fb0f
[ +0.000004] FS:
00007f9ad2176c40(0000) GS:
ffff8d90fe9c0000(0000) knlGS:
0000000000000000
[ +0.000003] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ +0.000004] CR2:
00000000000004c0 CR3:
0000000121bc4000 CR4:
0000000000750ee0
[ +0.000003] PKRU:
55555554
[ +0.000003] Call Trace:
[ +0.000006] <TASK>
[ +0.000006] ? __die+0x23/0x70
[ +0.000011] ? page_fault_oops+0x17d/0x4c0
[ +0.000008] ? preempt_count_add+0x6e/0xa0
[ +0.000008] ? srso_alias_return_thunk+0x5/0x7f
[ +0.000011] ? exc_page_fault+0x7f/0x180
[ +0.000009] ? asm_exc_page_fault+0x26/0x30
[ +0.000013] ? i2c_transfer+0xd/0x100
[ +0.000010] drm_do_probe_ddc_edid+0xc2/0x140 [drm]
[ +0.000067] ? srso_alias_return_thunk+0x5/0x7f
[ +0.000006] ? _drm_do_get_edid+0x97/0x3c0 [drm]
[ +0.000043] ? __pfx_drm_do_probe_ddc_edid+0x10/0x10 [drm]
[ +0.000042] edid_block_read+0x3b/0xd0 [drm]
[ +0.000043] _drm_do_get_edid+0xb6/0x3c0 [drm]
[ +0.000041] ? __pfx_drm_do_probe_ddc_edid+0x10/0x10 [drm]
[ +0.000043] drm_edid_read_custom+0x37/0xd0 [drm]
[ +0.000044] amdgpu_dm_connector_mode_valid+0x129/0x1d0 [amdgpu]
[ +0.000153] drm_connector_mode_valid+0x3b/0x60 [drm_kms_helper]
[ +0.000000] __drm_helper_update_and_validate+0xfe/0x3c0 [drm_kms_helper]
[ +0.000000] ? amdgpu_dm_connector_get_modes+0xb6/0x520 [amdgpu]
[ +0.000000] ? srso_alias_return_thunk+0x5/0x7f
[ +0.000000] drm_helper_probe_single_connector_modes+0x2ab/0x540 [drm_kms_helper]
[ +0.000000] status_store+0xb2/0x1f0 [drm]
[ +0.000000] kernfs_fop_write_iter+0x136/0x1d0
[ +0.000000] vfs_write+0x24d/0x440
[ +0.000000] ksys_write+0x6f/0xf0
[ +0.000000] do_syscall_64+0x60/0xc0
[ +0.000000] ? srso_alias_return_thunk+0x5/0x7f
[ +0.000000] ? syscall_exit_to_user_mode+0x2b/0x40
[ +0.000000] ? srso_alias_return_thunk+0x5/0x7f
[ +0.000000] ? do_syscall_64+0x6c/0xc0
[ +0.000000] ? do_syscall_64+0x6c/0xc0
[ +0.000000] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ +0.000000] RIP: 0033:0x7f9ad46b4b00
[ +0.000000] Code: 40 00 48 8b 15 19 b3 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d e1 3a 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
[ +0.000000] RSP: 002b:
00007ffcbd3bd6d8 EFLAGS:
00000202 ORIG_RAX:
0000000000000001
[ +0.000000] RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007f9ad46b4b00
[ +0.000000] RDX:
0000000000000002 RSI:
00007f9ad48a7417 RDI:
0000000000000009
[ +0.000000] RBP:
0000000000000002 R08:
0000000000000064 R09:
0000000000000000
[ +0.000000] R10:
0000000000000000 R11:
0000000000000202 R12:
00007f9ad48a7417
[ +0.000000] R13:
0000000000000009 R14:
00007ffcbd3bd760 R15:
0000000000000001
[ +0.000000] </TASK>
[ +0.000000] Modules linked in: ctr ccm rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device cmac algif_hash algif_skcipher af_alg bnep btusb btrtl btbcm btintel btmtk bluetooth uvcvideo videobuf2_vmalloc sha3_generic videobuf2_memops uvc jitterentropy_rng videobuf2_v4l2 videodev drbg videobuf2_common ansi_cprng mc ecdh_generic ecc qrtr binfmt_misc hid_sensor_accel_3d hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio snd_ctl_led joydev hid_sensor_iio_common rtw89_8852ae rtw89_8852a rtw89_pci snd_hda_codec_realtek rtw89_core snd_hda_codec_generic intel_rapl_msr ledtrig_audio intel_rapl_common snd_hda_codec_hdmi mac80211 snd_hda_intel snd_intel_dspcfg kvm_amd snd_hda_codec snd_soc_dmic snd_acp3x_rn snd_acp3x_pdm_dma libarc4 snd_hwdep snd_soc_core kvm snd_hda_core cfg80211 snd_pci_acp6x snd_pcm nls_ascii snd_timer hp_wmi snd_pci_acp5x nls_cp437 snd_rn_pci_acp3x ucsi_acpi sparse_keymap ccp snd platform_profile snd_acp_config typec_ucsi irqbypass vfat sp5100_tco
[ +0.000000] snd_soc_acpi fat rapl pcspkr wmi_bmof roles rfkill rng_core snd_pci_acp3x soundcore k10temp watchdog typec battery ac amd_pmc acpi_tad button hid_sensor_hub hid_multitouch evdev serio_raw msr parport_pc ppdev lp parport fuse loop efi_pstore configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic dm_crypt dm_mod efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c crc32c_generic xor raid6_pq raid1 raid0 multipath linear md_mod amdgpu amdxcp i2c_algo_bit drm_ttm_helper ttm crc32_pclmul crc32c_intel drm_exec gpu_sched drm_suballoc_helper nvme ghash_clmulni_intel drm_buddy drm_display_helper sha512_ssse3 nvme_core ahci xhci_pci sha512_generic hid_generic xhci_hcd libahci rtsx_pci_sdmmc t10_pi i2c_hid_acpi drm_kms_helper i2c_hid mmc_core libata aesni_intel crc64_rocksoft_generic crypto_simd amd_sfh crc64_rocksoft scsi_mod usbcore cryptd crc_t10dif cec drm crct10dif_generic hid rtsx_pci crct10dif_pclmul scsi_common rc_core crc64 i2c_piix4
[ +0.000000] usb_common crct10dif_common video wmi
[ +0.000000] CR2:
00000000000004c0
[ +0.000000] ---[ end trace
0000000000000000 ]---
Fixes: 0e859faf8670 ("drm/amd/display: Remove unwanted drm edid references")
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Armin Wolf [Tue, 13 Feb 2024 00:50:50 +0000 (01:50 +0100)]
drm/amd/display: Fix memory leak in dm_sw_fini()
After destroying dmub_srv, the memory associated with it is
not freed, causing a memory leak:
unreferenced object 0xffff896302b45800 (size 1024):
comm "(udev-worker)", pid 222, jiffies
4294894636
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc
6265fd77):
[<
ffffffff993495ed>] kmalloc_trace+0x29d/0x340
[<
ffffffffc0ea4a94>] dm_dmub_sw_init+0xb4/0x450 [amdgpu]
[<
ffffffffc0ea4e55>] dm_sw_init+0x15/0x2b0 [amdgpu]
[<
ffffffffc0ba8557>] amdgpu_device_init+0x1417/0x24e0 [amdgpu]
[<
ffffffffc0bab285>] amdgpu_driver_load_kms+0x15/0x190 [amdgpu]
[<
ffffffffc0ba09c7>] amdgpu_pci_probe+0x187/0x4e0 [amdgpu]
[<
ffffffff9968fd1e>] local_pci_probe+0x3e/0x90
[<
ffffffff996918a3>] pci_device_probe+0xc3/0x230
[<
ffffffff99805872>] really_probe+0xe2/0x480
[<
ffffffff99805c98>] __driver_probe_device+0x78/0x160
[<
ffffffff99805daf>] driver_probe_device+0x1f/0x90
[<
ffffffff9980601e>] __driver_attach+0xce/0x1c0
[<
ffffffff99803170>] bus_for_each_dev+0x70/0xc0
[<
ffffffff99804822>] bus_add_driver+0x112/0x210
[<
ffffffff99807245>] driver_register+0x55/0x100
[<
ffffffff990012d1>] do_one_initcall+0x41/0x300
Fix this by freeing dmub_srv after destroying it.
Fixes: 743b9786b14a ("drm/amd/display: Hook up the DMUB service in DM")
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>