Alexander Gordeev [Sun, 5 May 2024 10:47:10 +0000 (12:47 +0200)]
Revert "s390: Relocate vmlinux ELF data to virtual address space"
This reverts commit
9ecaa2e94e602a3cbcbfe182535f6297f7630b98.
In case CONFIG_MODULES kernel option is not defined the build fails
with the following linker error:
block/partitions/ibm.o: in function `ibm_partition':
ibm.c:(.text+0x8bc): relocation truncated to fit: R_390_PLT32DBL against undefined symbol `dasd_biodasdinfo'
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Thu, 2 May 2024 20:02:25 +0000 (22:02 +0200)]
Merge branch 'shared-zeropage' into features
David Hildenbrand says:
===================
This series fixes one issue with uffd + shared zeropages on s390x and
fixes that "ordinary" KVM guests can make use of shared zeropages again.
userfaultfd could currently end up mapping shared zeropages into processes
that forbid shared zeropages. This only apples to s390x, relevant for
handling PV guests and guests that use storage kets correctly. Fix it
by placing a zeroed folio instead of the shared zeropage during
UFFDIO_ZEROPAGE instead.
I stumbled over this issue while looking into a customer scenario that
is using:
(1) Memory ballooning for dynamic resizing. Start a VM with, say, 100 GiB
and inflate the balloon during boot to 60 GiB. The VM has ~40 GiB
available and additional memory can be "fake hotplugged" to the VM
later on demand by deflating the balloon. Actual memory overcommit is
not desired, so physical memory would only be moved between VMs.
(2) Live migration of VMs between sites to evacuate servers in case of
emergency.
Without the shared zeropage, during (2), the VM would suddenly consume
100 GiB on the migration source and destination. On the migration source,
where we don't excpect memory overcommit, we could easilt end up crashing
the VM during migration.
Independent of that, memory handed back to the hypervisor using "free page
reporting" would end up consuming actual memory after the migration on the
destination, not getting freed up until reused+freed again.
While there might be ways to optimize parts of this in QEMU, we really
should just support the shared zeropage again for ordinary VMs.
We only expect legcy guests to make use of storage keys, so let's handle
zeropages again when enabling storage keys or when enabling PV. To not
break userfaultfd like we did in the past, don't zap the shared zeropages,
but instead trigger unsharing faults, just like we do for unsharing
KSM pages in break_ksm().
Unsharing faults will simply replace the shared zeropage by a zeroed
anonymous folio. We can already trigger the same fault path using GUP,
when trying to long-term pin a shared zeropage, but also when unmerging
a KSM-placed zeropages, so this is nothing new.
Patch #1 tested on 86-64 by forcing mm_forbids_zeropage() to be 1, and
running the uffd selftests.
Patch #2 tested on s390x: the live migration scenario now works as
expected, and kvm-unit-tests that trigger usage of skeys work well, whereby
I can see detection and unsharing of shared zeropages.
Further (as broken in v2), I tested that the shared zeropage is no
longer populated after skeys are used -- that mm_forbids_zeropage() works
as expected:
./s390x-run s390x/skey.elf \
-no-shutdown \
-chardev socket,id=monitor,path=/var/tmp/mon,server,nowait \
-mon chardev=monitor,mode=readline
Then, in another shell:
# cat /proc/`pgrep qemu`/smaps_rollup | grep Rss
Rss: 31484 kB
# echo "dump-guest-memory tmp" | sudo nc -U /var/tmp/mon
...
# cat /proc/`pgrep qemu`/smaps_rollup | grep Rss
Rss: 160452 kB
-> Reading guest memory does not populate the shared zeropage
Doing the same with selftest.elf (no skeys)
# cat /proc/`pgrep qemu`/smaps_rollup | grep Rss
Rss: 30900 kB
# echo "dump-guest-memory tmp" | sudo nc -U /var/tmp/mon
...
# cat /proc/`pgrep qemu`/smaps_rollup | grep Rsstmp/mon
Rss: 30924 kB
-> Reading guest memory does populate the shared zeropage
===================
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Nina Schoetterl-Glausch [Mon, 29 Apr 2024 17:15:12 +0000 (19:15 +0200)]
KVM: s390: vsie: Use virt_to_phys for crypto control block
The address of the crypto control block in the (shadow) SIE block is
absolute/physical.
Convert from virtual to physical when shadowing the guest's control
block during VSIE.
Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20240429171512.879215-1-nsg@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Tue, 30 Apr 2024 15:58:51 +0000 (17:58 +0200)]
s390: Relocate vmlinux ELF data to virtual address space
Currently kernel image relocation tables and other ELF
data are set to base zero. Since kernel virtual and
physical address spaces are uncoupled the kernel is
mapped at the top of the virtual address space, hence
making the information contained in vmlinux ELF tables
inconsistent.
That does not pose any issue with regard to the kernel
booting and operation, but makes it difficult to use a
generated vmlinux with some debugging tools (e.g. gdb).
Relocate vmlinux image base address from zero to a base
address in the virtual address space. It is the address
that kernel is mapped to in cases KASLR is disabled.
The vmlinux ELF header before and after this change looks
like this:
Elf file type is EXEC (Executable file)
Entry point 0x100000
There are 3 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000001000 0x0000000000100000 0x0000000000100000
0x0000000001323378 0x0000000001323378 R E 0x1000
LOAD 0x0000000001325000 0x0000000001424000 0x0000000001424000
0x00000000003a4200 0x000000000048fdb8 RWE 0x1000
NOTE 0x00000000012a33b0 0x00000000013a23b0 0x00000000013a23b0
0x0000000000000054 0x0000000000000054 0x4
Elf file type is EXEC (Executable file)
Entry point 0x3ffe0000000
There are 3 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000001000 0x000003ffe0000000 0x000003ffe0000000
0x0000000001323378 0x0000000001323378 R E 0x1000
LOAD 0x0000000001325000 0x000003ffe1324000 0x000003ffe1324000
0x00000000003a4200 0x000000000048fdb8 RWE 0x1000
NOTE 0x00000000012a33b0 0x000003ffe12a23b0 0x000003ffe12a23b0
0x0000000000000054 0x0000000000000054 0x4
Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Sumanth Korikkar [Thu, 25 Apr 2024 14:59:31 +0000 (16:59 +0200)]
s390: Compile kernel with -fPIC and link with -no-pie
When the kernel is built with CONFIG_PIE_BUILD option enabled it
uses dynamic symbols, for which the linker does not allow more
than 64K number of entries. This can break features like kpatch.
Hence, whenever possible the kernel is built with CONFIG_PIE_BUILD
option disabled. For that support of unaligned symbols generated by
linker scripts in the compiler is necessary.
However, older compilers might lack such support. In that case the
build process resorts to CONFIG_PIE_BUILD option-enabled build.
Compile object files with -fPIC option and then link the kernel
binary with -no-pie linker option.
As result, the dynamic symbols are not generated and not only kpatch
feature succeeds, but also the whole CONFIG_PIE_BUILD option-enabled
code could be dropped.
[ agordeev: Reworded the commit message ]
Suggested-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Sumanth Korikkar [Thu, 25 Apr 2024 14:59:30 +0000 (16:59 +0200)]
s390: vmlinux.lds.S: Drop .hash and .gnu.hash for !CONFIG_PIE_BUILD
Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD
option is enabled. Drop these for the case CONFIG_PIE_BUILD is disabled.
[ agordeev: Reworded the commit message ]
Fixes: 778666df60f0 ("s390: compile relocatable kernel without -fPIE")
Suggested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Sven Schnelle [Fri, 26 Apr 2024 10:02:15 +0000 (12:02 +0200)]
s390/ftrace: Use unwinder instead of __builtin_return_address()
Using __builtin_return_address(n) might return undefined values
when used with values of n outside of the stack. This was noticed
when __builtin_return_address() was called in ftrace on top level
functions like the interrupt handlers.
As this behaviour cannot be fixed, use the s390 stack unwinder and
remove the ftrace compilation flags for unwind_bc.c and stacktrace.c
to prevent the unwinding function polluting function traces.
Another advantage is that this also works with clang.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Jean Delvare [Tue, 23 Apr 2024 14:27:24 +0000 (16:27 +0200)]
s390/pci: Drop unneeded reference to CONFIG_DMI
The S/390 architecture doesn't support SMBIOS, so CONFIG_DMI will
never be defined there. So we can simply omit these preprocessing
directives and speed up the build a bit.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com>
Link: https://lore.kernel.org/r/20240423162724.3966265a@endymion.delvare
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Sven Schnelle [Fri, 26 Apr 2024 06:02:06 +0000 (08:02 +0200)]
s390/os_info: Fix array size in struct os_info
gcc's -Warray-bounds warned about an out-of-bounds access to
the entry array contained in struct os_info. This doesn't trigger
a bug right now because there's a large reserved space after the
array. Nevertheless fix this, and also add a BUILD_BUG_ON to make
sure struct os_info is always exactly on page in size.
Fixes: f4cac27dc0d6 ("s390/crash: Use old os_info to create PT_LOAD headers")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Egorenkov [Tue, 23 Apr 2024 09:42:05 +0000 (11:42 +0200)]
s390/os_info: Initialize old os_info in standalone dump kernel
The commit
be42660d0c13 ("s390/crash: use old os_info to create PT_LOAD headers")
introduced use of the old os_info into standalone dump kernel.
Before this change os_info_old_init() expected to be called only from
a regular kdump kernel although the function itself is able to work
in standalone dump kernels as well (because copy_oldmem_kernel() is able
to handle both use cases). Therefore, fix the expectation of os_info_old_init()
and enable it to be called from a standalone dump kernel.
Fixes: f4cac27dc0d6 ("s390/crash: Use old os_info to create PT_LOAD headers")
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Jason J. Herne [Mon, 15 Apr 2024 15:25:55 +0000 (11:25 -0400)]
docs: Update s390 vfio-ap doc for ap_config sysfs attribute
A new sysfs attribute, ap_config, for the vfio_ap driver is
documented.
Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20240415152555.13152-6-jjherne@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Jason J. Herne [Mon, 15 Apr 2024 15:25:54 +0000 (11:25 -0400)]
s390/vfio-ap: Add write support to sysfs attr ap_config
Allow writing a complete set of masks to ap_config. Doing so will
cause the vfio-ap driver to replace the vfio-ap mediated device's
matrix masks with the given set of masks. If the given state cannot
be set, then no changes are made to the vfio-ap mediated device.
The format of the data written to ap_config is as follows:
{amask},{dmask},{cmask}\n
\n is a newline character.
amask, dmask, and cmask are masks identifying which adapters, domains,
and control domains should be assigned to the mediated device.
The format of a mask is as follows:
0xNN..NN
Where NN..NN is 64 hexadecimal characters representing a 256-bit value.
The leftmost (highest order) bit represents adapter/domain 0.
For an example set of masks that represent your mdev's current
configuration, simply cat ap_config.
This attribute is intended to be used by an mdevctl callout script
supporting the mdev type vfio_ap-passthrough to atomically update a
vfio-ap mediated device's state.
Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20240415152555.13152-5-jjherne@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Jason J. Herne [Mon, 15 Apr 2024 15:25:53 +0000 (11:25 -0400)]
s390/vfio-ap: Ignore duplicate link requests in vfio_ap_mdev_link_queue
vfio_ap_mdev_link_queue is changed to detect if a matrix_mdev has
already linked the given queue. If so, it bails out.
Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20240415152555.13152-4-jjherne@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Jason J. Herne [Mon, 15 Apr 2024 15:25:52 +0000 (11:25 -0400)]
s390/vfio-ap: Add sysfs attr, ap_config, to export mdev state
Add ap_config sysfs attribute. This will provide the means for
setting or displaying the adapters, domains and control domains assigned
to the vfio-ap mediated device in a single operation. This sysfs
attribute is comprised of three masks: One for adapters, one for domains,
and one for control domains.
This attribute is intended to be used by mdevctl to query a vfio-ap
mediated device's state.
Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/20240415152555.13152-3-jjherne@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Jason J. Herne [Mon, 15 Apr 2024 15:25:51 +0000 (11:25 -0400)]
s390/ap: Externalize AP bus specific bitmap reading function
Rename hex2bitmap() to ap_hex2bitmap() and export it for external
use. This function will be used by the implementation of the vfio-ap
ap_config sysfs attribute.
Signed-off-by: "Jason J. Herne" <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20240415152555.13152-2-jjherne@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
David Hildenbrand [Thu, 11 Apr 2024 16:14:41 +0000 (18:14 +0200)]
s390/mm: Re-enable the shared zeropage for !PV and !skeys KVM guests
commit
fa41ba0d08de ("s390/mm: avoid empty zero pages for KVM guests to
avoid postcopy hangs") introduced an undesired side effect when combined
with memory ballooning and VM migration: memory part of the inflated
memory balloon will consume memory.
Assuming we have a 100GiB VM and inflated the balloon to 40GiB. Our VM
will consume ~60GiB of memory. If we now trigger a VM migration,
hypervisors like QEMU will read all VM memory. As s390x does not support
the shared zeropage, we'll end up allocating for all previously-inflated
memory part of the memory balloon: 50 GiB. So we might easily
(unexpectedly) crash the VM on the migration source.
Even worse, hypervisors like QEMU optimize for zeropage migration to not
consume memory on the migration destination: when migrating a
"page full of zeroes", on the migration destination they check whether the
target memory is already zero (by reading the destination memory) and avoid
writing to the memory to not allocate memory: however, s390x will also
allocate memory here, implying that also on the migration destination, we
will end up allocating all previously-inflated memory part of the memory
balloon.
This is especially bad if actual memory overcommit was not desired, when
memory ballooning is used for dynamic VM memory resizing, setting aside
some memory during boot that can be added later on demand. Alternatives
like virtio-mem that would avoid this issue are not yet available on
s390x.
There could be ways to optimize some cases in user space: before reading
memory in an anonymous private mapping on the migration source, check via
/proc/self/pagemap if anything is already populated. Similarly check on
the migration destination before reading. While that would avoid
populating tables full of shared zeropages on all architectures, it's
harder to get right and performant, and requires user space changes.
Further, with posctopy live migration we must place a page, so there,
"avoid touching memory to avoid allocating memory" is not really
possible. (Note that a previously we would have falsely inserted
shared zeropages into processes using UFFDIO_ZEROPAGE where
mm_forbids_zeropage() would have actually forbidden it)
PV is currently incompatible with memory ballooning, and in the common
case, KVM guests don't make use of storage keys. Instead of zapping
zeropages when enabling storage keys / PV, that turned out to be
problematic in the past, let's do exactly the same we do with KSM pages:
trigger unsharing faults to replace the shared zeropages by proper
anonymous folios.
What about added latency when enabling storage kes? Having a lot of
zeropages in applicable environments (PV, legacy guests, unittests) is
unexpected. Further, KSM could today already unshare the zeropages
and unmerging KSM pages when enabling storage kets would unshare the
KSM-placed zeropages in the same way, resulting in the same latency.
[ agordeev: Fixed sparse and checkpatch complaints and error handling ]
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Fixes: fa41ba0d08de ("s390/mm: avoid empty zero pages for KVM guests to avoid postcopy hangs")
Signed-off-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20240411161441.910170-3-david@redhat.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
David Hildenbrand [Thu, 11 Apr 2024 16:14:40 +0000 (18:14 +0200)]
mm/userfaultfd: Do not place zeropages when zeropages are disallowed
s390x must disable shared zeropages for processes running VMs, because
the VMs could end up making use of "storage keys" or protected
virtualization, which are incompatible with shared zeropages.
Yet, with userfaultfd it is possible to insert shared zeropages into
such processes. Let's fallback to simply allocating a fresh zeroed
anonymous folio and insert that instead.
mm_forbids_zeropage() was introduced in commit
593befa6ab74 ("mm: introduce
mm_forbids_zeropage function"), briefly before userfaultfd went
upstream.
Note that we don't want to fail the UFFDIO_ZEROPAGE request like we do
for hugetlb, it would be rather unexpected. Further, we also
cannot really indicated "not supported" to user space ahead of time: it
could be that the MM disallows zeropages after userfaultfd was already
registered.
[ agordeev: Fixed checkpatch complaints ]
Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240411161441.910170-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Vasily Gorbik [Wed, 17 Jan 2024 10:50:49 +0000 (11:50 +0100)]
s390/expoline: Make modules use kernel expolines
Currently, kernel modules contain their own set of expoline thunks. In
the case of EXPOLINE_EXTERN, this involves postlinking of precompiled
expoline.o. expoline.o is also necessary for out-of-source tree module
builds.
Now that the kernel modules area is less than 4 GB away from
kernel expoline thunks, make modules use kernel expolines. Also make
EXPOLINE_EXTERN the default if the compiler supports it. This simplifies
build and aligns with the approach adopted by other architectures.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Vasily Gorbik [Wed, 17 Jan 2024 10:50:46 +0000 (11:50 +0100)]
s390/nospec: Correct modules thunk offset calculation
Fix offset calculation when branch target is more then 2Gb away.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Mon, 26 Feb 2024 10:47:40 +0000 (11:47 +0100)]
s390/boot: Do not rescue .vmlinux.relocs section
The .vmlinux.relocs section is moved in front of the compressed
kernel. The interim section rescue step is avoided as result.
Suggested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Fri, 22 Mar 2024 13:39:57 +0000 (14:39 +0100)]
s390/boot: Rework deployment of the kernel image
Rework deployment of kernel image for both compressed and
uncompressed variants as defined by CONFIG_KERNEL_UNCOMPRESSED
kernel configuration variable.
In case CONFIG_KERNEL_UNCOMPRESSED is disabled avoid uncompressing
the kernel to a temporary buffer and copying it to the target
address. Instead, uncompress it directly to the target destination.
In case CONFIG_KERNEL_UNCOMPRESSED is enabled avoid moving the
kernel to default 0x100000 location when KASLR is disabled or
failed. Instead, use the uncompressed kernel image directly.
In case KASLR is disabled or failed .amode31 section location in
memory is not randomized and precedes the kernel image. In case
CONFIG_KERNEL_UNCOMPRESSED is disabled that location overlaps the
area used by the decompression algorithm. That is fine, since that
area is not used after the decompression finished and the size of
.amode31 section is not expected to exceed BOOT_HEAP_SIZE ever.
There is no decompression in case CONFIG_KERNEL_UNCOMPRESSED is
enabled. Therefore, rename decompress_kernel() to deploy_kernel(),
which better describes both uncompressed and compressed cases.
Introduce AMODE31_SIZE macro to avoid immediate value of 0x3000
(the size of .amode31 section) in the decompressor linker script.
Modify the vmlinux linker script to force the size of .amode31
section to AMODE31_SIZE (the value of (_eamode31 - _samode31)
could otherwise differ as result of compiler options used).
Introduce __START_KERNEL macro that defines the kernel ELF image
entry point and set it to the currrent value of 0x100000.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Tue, 26 Sep 2023 13:58:51 +0000 (15:58 +0200)]
s390: Map kernel at fixed location when KASLR is disabled
Since kernel virtual and physical address spaces are
uncoupled the kernel is mapped at the top of the virtual
address space in case KASLR is disabled.
That does not pose any issue with regard to the kernel
booting and operation, but makes it difficult to use a
generated vmlinux with some debugging tools (e.g. gdb),
because the exact location of the kernel image in virtual
memory is unknown. Make that location known and introduce
CONFIG_KERNEL_IMAGE_BASE configuration option.
A custom CONFIG_KERNEL_IMAGE_BASE value that would break
the virtual memory layout leads to a build error.
The kernel image size is defined by KERNEL_IMAGE_SIZE
macro and set to 512 MB, by analogy with x86.
Suggested-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Fri, 1 Mar 2024 06:15:22 +0000 (07:15 +0100)]
s390/mm: Uncouple physical vs virtual address spaces
The uncoupling physical vs virtual address spaces brings
the following benefits to s390:
- virtual memory layout flexibility;
- closes the address gap between kernel and modules, it
caused s390-only problems in the past (e.g. 'perf' bugs);
- allows getting rid of trampolines used for module calls
into kernel;
- allows simplifying BPF trampoline;
- minor performance improvement in branch prediction;
- kernel randomization entropy is magnitude bigger, as it is
derived from the amount of available virtual, not physical
memory;
The whole change could be described in two pictures below:
before and after the change.
Some aspects of the virtual memory layout setup are not
clarified (number of page levels, alignment, DMA memory),
since these are not a part of this change or secondary
with regard to how the uncoupling itself is implemented.
The focus of the pictures is to explain why __va() and __pa()
macros are implemented the way they are.
Memory layout in V==R mode:
| Physical | Virtual |
+- 0 --------------+- 0 --------------+ identity mapping start
| | S390_lowcore | Low-address memory
| +- 8 KB -----------+
| | |
| | identity | phys == virt
| | mapping | virt == phys
| | |
+- AMODE31_START --+- AMODE31_START --+ .amode31 rand. phys/virt start
|.amode31 text/data|.amode31 text/data|
+- AMODE31_END ----+- AMODE31_END ----+ .amode31 rand. phys/virt start
| | |
| | |
+- __kaslr_offset, __kaslr_offset_phys| kernel rand. phys/virt start
| | |
| kernel text/data | kernel text/data | phys == kvirt
| | |
+------------------+------------------+ kernel phys/virt end
| | |
| | |
| | |
| | |
+- ident_map_size -+- ident_map_size -+ identity mapping end
| |
| ... unused gap |
| |
+---- vmemmap -----+ 'struct page' array start
| |
| virtually mapped |
| memory map |
| |
+- __abs_lowcore --+
| |
| Absolute Lowcore |
| |
+- __memcpy_real_area
| |
| Real Memory Copy|
| |
+- VMALLOC_START --+ vmalloc area start
| |
| vmalloc area |
| |
+- MODULES_VADDR --+ modules area start
| |
| modules area |
| |
+------------------+ UltraVisor Secure Storage limit
| |
| ... unused gap |
| |
+KASAN_SHADOW_START+ KASAN shadow memory start
| |
| KASAN shadow |
| |
+------------------+ ASCE limit
Memory layout in V!=R mode:
| Physical | Virtual |
+- 0 --------------+- 0 --------------+
| | S390_lowcore | Low-address memory
| +- 8 KB -----------+
| | |
| | |
| | ... unused gap |
| | |
+- AMODE31_START --+- AMODE31_START --+ .amode31 rand. phys/virt start
|.amode31 text/data|.amode31 text/data|
+- AMODE31_END ----+- AMODE31_END ----+ .amode31 rand. phys/virt end (<2GB)
| | |
| | |
+- __kaslr_offset_phys | kernel rand. phys start
| | |
| kernel text/data | |
| | |
+------------------+ | kernel phys end
| | |
| | |
| | |
| | |
+- ident_map_size -+ |
| |
| ... unused gap |
| |
+- __identity_base + identity mapping start (>= 2GB)
| |
| identity | phys == virt - __identity_base
| mapping | virt == phys + __identity_base
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
+---- vmemmap -----+ 'struct page' array start
| |
| virtually mapped |
| memory map |
| |
+- __abs_lowcore --+
| |
| Absolute Lowcore |
| |
+- __memcpy_real_area
| |
| Real Memory Copy|
| |
+- VMALLOC_START --+ vmalloc area start
| |
| vmalloc area |
| |
+- MODULES_VADDR --+ modules area start
| |
| modules area |
| |
+- __kaslr_offset -+ kernel rand. virt start
| |
| kernel text/data | phys == (kvirt - __kaslr_offset) +
| | __kaslr_offset_phys
+- kernel .bss end + kernel rand. virt end
| |
| ... unused gap |
| |
+------------------+ UltraVisor Secure Storage limit
| |
| ... unused gap |
| |
+KASAN_SHADOW_START+ KASAN shadow memory start
| |
| KASAN shadow |
| |
+------------------+ ASCE limit
Unused gaps in the virtual memory layout could be present
or not - depending on how partucular system is configured.
No page tables are created for the unused gaps.
The relative order of vmalloc, modules and kernel image in
virtual memory is defined by following considerations:
- start of the modules area and end of the kernel should reside
within 4GB to accommodate relative 32-bit jumps. The best way
to achieve that is to place kernel next to modules;
- vmalloc and module areas should locate next to each other
to prevent failures and extra reworks in user level tools
(makedumpfile, crash, etc.) which treat vmalloc and module
addresses similarily;
- kernel needs to be the last area in the virtual memory
layout to easily distinguish between kernel and non-kernel
virtual addresses. That is needed to (again) simplify
handling of addresses in user level tools and make __pa()
macro faster (see below);
Concluding the above, the relative order of the considered
virtual areas in memory is: vmalloc - modules - kernel.
Therefore, the only change to the current memory layout is
moving kernel to the end of virtual address space.
With that approach the implementation of __pa() macro is
straightforward - all linear virtual addresses less than
kernel base are considered identity mapping:
phys == virt - __identity_base
All addresses greater than kernel base are kernel ones:
phys == (kvirt - __kaslr_offset) + __kaslr_offset_phys
By contrast, __va() macro deals only with identity mapping
addresses:
virt == phys + __identity_base
.amode31 section is mapped separately and is not covered by
__pa() macro. In fact, it could have been handled easily by
checking whether a virtual address is within the section or
not, but there is no need for that. Thus, let __pa() code
do as little machine cycles as possible.
The KASAN shadow memory is located at the very end of the
virtual memory layout, at addresses higher than the kernel.
However, that is not a linear mapping and no code other than
KASAN instrumentation or API is expected to access it.
When KASLR mode is enabled the kernel base address randomized
within a memory window that spans whole unused virtual address
space. The size of that window depends from the amount of
physical memory available to the system, the limit imposed by
UltraVisor (if present) and the vmalloc area size as provided
by vmalloc= kernel command line parameter.
In case the virtual memory is exhausted the minimum size of
the randomization window is forcefully set to 2GB, which
amounts to in 15 bits of entropy if KASAN is enabled or 17
bits of entropy in default configuration.
The default kernel offset 0x100000 is used as a magic value
both in the decompressor code and vmlinux linker script, but
it will be removed with a follow-up change.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Sat, 30 Sep 2023 08:37:49 +0000 (10:37 +0200)]
s390/crash: Use old os_info to create PT_LOAD headers
This is a preparatory rework to allow uncoupling virtual
and physical addresses spaces.
The vmcore ELF program headers describe virtual memory
regions of a crashed kernel. User level tools use that
information for the kernel text and data analysis (e.g
vmcore-dmesg extracts the kernel log).
Currently the kernel image is covered by program headers
describing the identity mapping regions. But in the future
the kernel image will be mapped into separate region outside
of the identity mapping. Create the additional ELF program
header that covers kernel image only, so that vmcore tools
could locate kernel text and data.
Further, the identity mapping in crashed and capture kernels
will have different base address. Due to that __va() macro
can not be used in the capture kernel. Instead, read crashed
kernel identity mapping base address from os_info and use
it for PT_LOAD type program headers creation.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Fri, 11 Aug 2023 08:10:53 +0000 (10:10 +0200)]
s390/vmcoreinfo: Store virtual memory layout
This is a preparatory rework to allow uncoupling virtual
and physical addresses spaces.
The virtual memory layout is needed for address translation
by crash tool when /proc/kcore device is used as the memory
image.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Fri, 11 Aug 2023 08:07:31 +0000 (10:07 +0200)]
s390/os_info: Store virtual memory layout
This is a preparatory rework to allow uncoupling virtual
and physical addresses spaces.
The virtual memory layout will be read out by makedumpfile,
crash and other user tools for virtual address translation.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Thu, 27 Jan 2022 13:28:49 +0000 (14:28 +0100)]
s390/os_info: Introduce value entries
Introduce entries that do not reference any data in memory,
but rather provide values. Set the size of such entries to
zero and do not compute checksum for them, since there is no
data which integrity needs to be checked. The integrity of
the value entries itself is still covered by the os_info
checksum.
Reserve the lowest unused entry index OS_INFO_RESERVED for
future use - presumably for the number of entries present.
That could later be used by user level tools. The existing
tools would not notice any difference.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Fri, 11 Aug 2023 07:49:27 +0000 (09:49 +0200)]
s390/boot: Make .amode31 section address range explicit
This is a preparatory rework to allow uncoupling virtual
and physical addresses spaces.
Introduce .amode31 section address range AMODE31_START
and AMODE31_END macros for later use.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Thu, 10 Aug 2023 19:40:19 +0000 (21:40 +0200)]
s390/boot: Make identity mapping base address explicit
This is a preparatory rework to allow uncoupling virtual
and physical addresses spaces.
Currently the identity mapping base address is implicit
and is always set to zero. Make it explicit by putting
into __identity_base persistent boot variable and use it
in proper context - which is the value of PAGE_OFFSET.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Tue, 20 Feb 2024 13:35:43 +0000 (14:35 +0100)]
s390/boot: Uncouple virtual and physical kernel offsets
This is a preparatory rework to allow uncoupling virtual
and physical addresses spaces.
Currently __kaslr_offset is the kernel offset in both
physical memory on boot and in virtual memory after DAT
mode is enabled.
Uncouple these offsets and rename the physical address
space variant to __kaslr_offset_phys while keep the name
__kaslr_offset for the offset in virtual address space.
Do not use __kaslr_offset_phys after DAT mode is enabled
just yet, but still make it a persistent boot variable
for later use.
Use __kaslr_offset and __kaslr_offset_phys offsets in
proper contexts and alter handle_relocs() function to
distinguish between the two.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Sat, 2 Dec 2023 09:57:15 +0000 (10:57 +0100)]
s390/mm: Create virtual memory layout structure
This is a preparatory rework to allow uncoupling virtual
and physical addresses spaces.
Put virtual memory layout information into a structure
to improve code generation when accessing the structure
members, which are currently only ident_map_size and
__kaslr_offset.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Sat, 2 Dec 2023 07:50:45 +0000 (08:50 +0100)]
s390/mm: Move KASLR related to <asm/page.h>
Move everyting KASLR related to <asm/page.h>,
similarly to many other architectures.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Fri, 1 Mar 2024 06:05:39 +0000 (07:05 +0100)]
s390/boot: Swap vmalloc and Lowcore/Real Memory Copy areas
This is a preparatory rework to allow uncoupling virtual
and physical addresses spaces.
Currently the order of virtual memory areas is (the lowcore
and .amode31 section are skipped, as it is irrelevant):
identity mapping (the kernel is contained within)
vmemmap
vmalloc
modules
Absolute Lowcore
Real Memory Copy
In the future the kernel will be mapped separately and placed
to the end of the virtual address space, so the layout would
turn like this:
identity mapping
vmemmap
vmalloc
modules
Absolute Lowcore
Real Memory Copy
kernel
However, the distance between kernel and modules needs to be as
little as possible, ideally - none. Thus, the Absolute Lowcore
and Real Memory Copy areas would stay in the way and therefore
need to be moved as well:
identity mapping
vmemmap
Absolute Lowcore
Real Memory Copy
vmalloc
modules
kernel
To facilitate such layout swap the vmalloc and Absolute Lowcore
together with Real Memory Copy areas. As result, the current
layout turns into:
identity mapping (the kernel is contained within)
vmemmap
Absolute Lowcore
Real Memory Copy
vmalloc
modules
This will allow to locate the kernel directly next to the
modules once it gets mapped separately.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Tue, 11 Jul 2023 05:58:24 +0000 (07:58 +0200)]
s390/boot: Reduce size of identity mapping on overlap
In case vmemmap array could overlap with vmalloc area on
virtual memory layout setup, the size of vmalloc area
is decreased. That could result in less memory than user
requested with vmalloc= kernel command line parameter.
Instead, reduce the size of identity mapping (and the
size of vmemmap array as result) to avoid such overlap.
Further, currently the virtual memmory allocation "rolls"
from top to bottom and it is only VMALLOC_START that could
get increased due to the overlap. Change that to decrease-
only, which makes the whole allocation algorithm more easy
to comprehend.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Fri, 1 Mar 2024 06:03:49 +0000 (07:03 +0100)]
s390/boot: Consider DCSS segments on memory layout setup
The maximum mappable physical address (as returned by
arch_get_mappable_range() callback) is limited by the
value of (1UL << MAX_PHYSMEM_BITS).
The maximum physical address available to a DCSS segment
is 512GB.
In case the available online or offline memory size is less
than the DCSS limit arch_get_mappable_range() would include
never used [512GB..(1UL << MAX_PHYSMEM_BITS)] range.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Alexander Gordeev [Tue, 11 Jul 2023 10:21:39 +0000 (12:21 +0200)]
s390/boot: Do not force vmemmap to start at MAX_PHYSMEM_BITS
vmemmap is forcefully set to start at MAX_PHYSMEM_BITS at most.
That could be needed in the past to limit ident_map_size to
MAX_PHYSMEM_BITS. However since commit
75eba6ec0de1 ("s390:
unify identity mapping limits handling") ident_map_size is
limited in setup_ident_map_size() function, which is called
earlier.
Another reason to limit vmemmap start to MAX_PHYSMEM_BITS is
because it was returned by arch_get_mappable_range() as the
maximum mappable physical address. Since commit
f641679dfe55
("s390/mm: rework arch_get_mappable_range() callback") that
is not required anymore.
As result, there is no neccessity to limit vmemmap starting
address with MAX_PHYSMEM_BITS.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Nina Schoetterl-Glausch [Tue, 19 Mar 2024 16:44:20 +0000 (17:44 +0100)]
KVM: s390: vsie: Use virt_to_phys for facility control block
In order for SIE to interpretively execute STFLE, it requires the real
or absolute address of a facility-list control block.
Before writing the location into the shadow SIE control block, convert
it from a virtual address.
We currently do not run into this bug because the lower 31 bits are the
same for virtual and physical addresses.
Signed-off-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Link: https://lore.kernel.org/r/20240319164420.4053380-3-nsg@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-Id: <
20240319164420.
4053380-3-nsg@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Peter Oberparleiter [Tue, 26 Mar 2024 16:04:56 +0000 (17:04 +0100)]
s390/cio: fix tracepoint subchannel type field
The subchannel-type field "st" of s390_cio_stsch and s390_cio_msch
tracepoints is incorrectly filled with the subchannel-enabled SCHIB
value "ena". Fix this by assigning the correct value.
Fixes: d1de8633d96a ("s390 cio: Rewrite trace point class s390_class_schib")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Peter Oberparleiter [Tue, 26 Mar 2024 16:03:24 +0000 (17:03 +0100)]
s390/cio: export CHPID operating speed
Add a per-CHPID sysfs attribute named "speed_bps" that provides the
operating speed of the associated channel path in bits per second,
or 0 if the operating speed is not available.
Example:
$ cat /sys/devices/css0/chp0.32/speed_bps
32G
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Peter Oberparleiter [Tue, 26 Mar 2024 16:03:23 +0000 (17:03 +0100)]
s390/cio: export measurement data for all CMGs
A channel-path's channel-measurement-group value (CMG) determines the
format of associated measurement data and characteristics blocks. Both
blocks are of fixed size and contain a generic and CMG-dependent part.
Currently CIO exports these data blocks via sysfs only for a specific
list of CMGs even though the kernel itself does not interpret
CMG-dependent data.
Change CIO to export measurement data and characteristics for all CMGs.
This enables supporting new CMG data formats in userspace without the
need for kernel changes.
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Peter Oberparleiter [Tue, 26 Mar 2024 16:03:22 +0000 (17:03 +0100)]
s390/cio: export extended channel-path-measurement data
Add a per-CHPID binary sysfs attribute named "ext_measurement" that
provides access to extended channel-path-measurement data for the
associated channel path.
Note that while not all channel-paths provide extended measurement data
this attribute is created unconditionally for all channel paths because
channel-path measurement capabilities might change during run-time.
Reading from the attribute will only return data for channel-paths that
support extended measurement data.
Example:
$ echo 1 > /sys/devices/css0/cm_enable
$ xxd /sys/devices/css0/chp0.32/ext_measurement
00000000: 53e0 8002 0000 0095 0000 0000 59cc e034 S...........Y..4
00000010: 38b8 cc45 0000 0000 0000 0000 3e24 fe94 8..E........>$..
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Tested-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Peter Oberparleiter [Tue, 26 Mar 2024 16:03:21 +0000 (17:03 +0100)]
s390/cio: simplify measurement attribute registration
Use attribute groups to simplify registration, removal and extension of
measurement related sysfs attributes.
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Peter Oberparleiter [Tue, 26 Mar 2024 16:03:20 +0000 (17:03 +0100)]
s390/cio: rework channel-utilization-block handling
Convert channel-utilization-block (CUB) address variables from separate
named fields to arrays of addresses. Also simplify error handling and
introduce named constants. This is done in preparation of introducing
additional CUBs.
Note: With this change the __packed annotation of secm_area is required
to prevent an alignment hole that would otherwise occur due to the
switch from u32 to dma64_t.
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Matthew Wilcox (Oracle) [Fri, 22 Mar 2024 16:11:47 +0000 (16:11 +0000)]
s390/mm: Convert gmap_make_secure to use a folio
Remove uses of deprecated page APIs, and move the check for large
folios to here to avoid taking the folio lock if the folio is too large.
We could do better here by attempting to split the large folio, but I'll
leave that improvement for someone who can test it.
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20240322161149.2327518-3-willy@infradead.org
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Matthew Wilcox (Oracle) [Fri, 22 Mar 2024 16:11:46 +0000 (16:11 +0000)]
s390/mm: Convert make_page_secure to use a folio
These page APIs are deprecated, so convert the incoming page to a folio
and use the folio APIs instead. The ultravisor API cannot handle large
folios, so return -EINVAL if one has slipped through.
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Link: https://lore.kernel.org/r/20240322161149.2327518-2-willy@infradead.org
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Thomas Richter [Wed, 27 Mar 2024 08:22:43 +0000 (09:22 +0100)]
s390/cpum_cf: make crypto counters upward compatible across machine types
The CPU Measurement facility crypto counter set functionality
is defined by the Second Counter Version Number. This number
varies between machine types, but is upward compatible.
Lessen the checks to reflect this behavior.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Heiko Carstens [Tue, 26 Mar 2024 10:52:23 +0000 (11:52 +0100)]
s390: adjust indentation of RELOCS command build step out
Common pattern in non-verbose build output for quiet commands is that the
shorthand of a command including whitespace contains at least eight
characters. Adjust this for the RELOCS command, which comes only with seven
characters.
Before:
SORTTAB vmlinux
CC arch/s390/boot/version.o
RELOCS arch/s390/boot/relocs.S
OBJCOPY arch/s390/boot/info.bin
After:
SORTTAB vmlinux
CC arch/s390/boot/version.o
RELOCS arch/s390/boot/relocs.S
OBJCOPY arch/s390/boot/info.bin
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Li Zhijian [Thu, 14 Mar 2024 09:52:09 +0000 (17:52 +0800)]
s390/cio: convert sprintf()/snprintf() to sysfs_emit()
Per filesystems/sysfs.rst, show() should only use sysfs_emit()
or sysfs_emit_at() when formatting the value to be returned to user space.
coccinelle complains that there are still a couple of functions that use
snprintf(). Convert them to sysfs_emit().
Generally, this patch is generated by
make coccicheck M=<path/to/file> MODE=patch \
COCCI=scripts/coccinelle/api/device_attr_show.cocci
No functional change intended.
Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://lore.kernel.org/r/20240314095209.1325229-1-lizhijian@fujitsu.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Holger Dengler [Tue, 27 Feb 2024 15:49:33 +0000 (16:49 +0100)]
s390/ap: rename ap debug configuration option
The configuration option ZCRYPT_DEBUG is used only in ap queue code,
so rename it to AP_DEBUG. It also no longer depends on ZCRYPT but on
AP. While at it, also update the help text.
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Holger Dengler [Mon, 19 Feb 2024 17:10:19 +0000 (18:10 +0100)]
s390/ap: modularize ap bus
There is no hard requirement to have the ap bus statically in the
kernel, so add an option to compile it as module.
Cc: Tony Krowiak <akrowiak@linux.ibm.com>
Cc: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Holger Dengler [Mon, 19 Feb 2024 16:32:54 +0000 (17:32 +0100)]
s390/chsc: use notifier for AP configuration changes
The direct dependency of chsc and the AP bus prevents the
modularization of ap bus. Introduce a notifier interface for AP
changes, which decouples the producer of the change events (chsc) from
the consumer (ap_bus).
Remove the ap_cfg_chg() interface and replace it with the notifier
invocation. The ap bus module registers a notification handler, which
triggers the AP bus scan.
Cc: Vineeth Vijayan <vneethv@linux.ibm.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Holger Dengler [Thu, 15 Feb 2024 07:59:45 +0000 (08:59 +0100)]
s390/uv: export prot_virt_guest symbol in uv
The inline function is_prot_virt_guest() in asm/uv.h makes use of the
prot_virt_guest symbol. As this inline function can be called by other
parts of the kernel (modules and built-in), the symbol should be
exported, similar to the prot_virt_host symbol.
One consumer of is_prot_virt_guest() will be the ap bus code.
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Holger Dengler [Fri, 8 Mar 2024 16:23:25 +0000 (17:23 +0100)]
s390/ap: swap IRQ and bus/device registration
The IRQ handler may rely on the bus or the root device. Register the
adapter IRQ after setting up the bus and the root device to avoid any
race conditions.
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Holger Dengler [Fri, 8 Mar 2024 16:13:48 +0000 (17:13 +0100)]
s390/ap: rework ap initialization
Rework the ap initialization and add missing cleanups to the error path.
Errors during the registration of IRQ handler is now also detected.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Holger Dengler [Fri, 8 Mar 2024 14:54:43 +0000 (15:54 +0100)]
s390/ap: use static qci information
Since qci is available on most of the current machines, move away from
the dynamic buffers for qci information and store it instead in a
statically defined buffer.
The new flags member in struct ap_config_info is now used as an
indicator, if qci is available in the system (at least one of these
bits is set).
Suggested-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Linus Torvalds [Sun, 31 Mar 2024 21:32:39 +0000 (14:32 -0700)]
Linux 6.9-rc2
Linus Torvalds [Sun, 31 Mar 2024 18:23:51 +0000 (11:23 -0700)]
Merge tag 'kbuild-fixes-v6.9' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Deduplicate Kconfig entries for CONFIG_CXL_PMU
- Fix unselectable choice entry in MIPS Kconfig, and forbid this
structure
- Remove unused include/asm-generic/export.h
- Fix a NULL pointer dereference bug in modpost
- Enable -Woverride-init warning consistently with W=1
- Drop KCSAN flags from *.mod.c files
* tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: Fix typo HEIGTH to HEIGHT
Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer
kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
kbuild: make -Woverride-init warnings more consistent
modpost: do not make find_tosym() return NULL
export.h: remove include/asm-generic/export.h
kconfig: do not reparent the menu inside a choice block
MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice
cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
Linus Torvalds [Sun, 31 Mar 2024 18:15:32 +0000 (11:15 -0700)]
Merge tag 'edac_urgent_for_v6.9_rc2' of git://git./linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Fix more issues in the AMD FMPM driver
* tag 'edac_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
RAS: Avoid build errors when CONFIG_DEBUG_FS=n
RAS/AMD/FMPM: Safely handle saved records of various sizes
RAS/AMD/FMPM: Avoid NULL ptr deref in get_saved_records()
Linus Torvalds [Sun, 31 Mar 2024 18:04:51 +0000 (11:04 -0700)]
Merge tag 'irq_urgent_for_v6.9_rc2' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix an unused function warning on irqchip/irq-armada-370-xp
- Fix the IRQ sharing with pinctrl-amd and ACPI OSL
* tag 'irq_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/armada-370-xp: Suppress unused-function warning
genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amd
Linus Torvalds [Sun, 31 Mar 2024 17:43:11 +0000 (10:43 -0700)]
Merge tag 'perf_urgent_for_v6.9_rc2' of git://git./linux/kernel/git/tip/tip
Pull x86 perf fixes from Borislav Petkov:
- Define the correct set of default hw events on AMD Zen4
- Use the correct stalled cycles PMCs on AMD Zen2 and newer
- Fix detection of the LBR freeze feature on AMD
* tag 'perf_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and later
perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later
perf/x86/amd/lbr: Use freeze based on availability
x86/cpufeatures: Add new word for scattered features
Linus Torvalds [Sun, 31 Mar 2024 17:34:49 +0000 (10:34 -0700)]
Merge tag 'timers_urgent_for_v6.9_rc2' of git://git./linux/kernel/git/tip/tip
Pull timers update from Borislav Petkov:
- Volunteer in Anna-Maria and Frederic as timers co-maintainers so that
tglx can relax more :-P
* tag 'timers_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add co-maintainers for time[rs]
Linus Torvalds [Sun, 31 Mar 2024 17:30:06 +0000 (10:30 -0700)]
Merge tag 'objtool_urgent_for_v6.9_rc2' of git://git./linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
- Fix a format specifier build error in objtool during an x32 build
* tag 'objtool_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix compile failure when using the x32 compiler
Linus Torvalds [Sun, 31 Mar 2024 17:16:34 +0000 (10:16 -0700)]
Merge tag 'x86_urgent_for_v6.9_rc2' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure single object builds in arch/x86/virt/ ala
make ... arch/x86/virt/vmx/tdx/seamcall.o
work again
- Do not do ROM range scans and memory validation when the kernel is
running as a SEV-SNP guest as those can get problematic and, before
that, are not really needed in such a guest
- Exclude the build-time generated vdso-image-x32.o object from objtool
validation and in particular the return sites in there due to a
warning which fires when an unpatched return thunk is being used
- Improve the NMI CPUs stall message to show additional information
about the state of each CPU wrt the NMI handler
- Enable gcc named address spaces support only on !KCSAN configs due to
compiler options incompatibility
- Revert a change which was trying to use GB pages for mapping regions
only when the regions would be large enough but that change lead to
kexec failing
- A documentation fixlet
* tag 'x86_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Use obj-y to descend into arch/x86/virt/
x86/sev: Skip ROM range scans and validation for SEV-SNP guests
x86/vdso: Fix rethunk patching for vdso-image-x32.o too
x86/nmi: Upgrade NMI backtrace stall checks & messages
x86/percpu: Disable named address spaces for KCSAN
Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
Documentation/x86: Fix title underline length
Isak Ellmer [Sat, 30 Mar 2024 15:19:45 +0000 (16:19 +0100)]
kconfig: Fix typo HEIGTH to HEIGHT
Fixed a typo in some variables where height was misspelled as heigth.
Signed-off-by: Isak Ellmer <isak01@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Nathan Chancellor [Wed, 27 Mar 2024 17:20:36 +0000 (10:20 -0700)]
Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer
As of the first s390 pull request during the 6.9 merge window,
commit
691632f0e869 ("Merge tag 's390-6.9-1' of
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux"), s390 can be
built with LLVM=1 when using LLVM 18.1.0, which is the first version
that has SystemZ support implemented in ld.lld and llvm-objcopy.
Update the supported architectures table in the Kbuild LLVM
documentation to note this explicitly to make it more discoverable by
users and other developers. Additionally, this brings s390 in line with
the rest of the architectures in the table, which all support LLVM=1.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Borislav Petkov (AMD) [Tue, 26 Mar 2024 20:25:48 +0000 (21:25 +0100)]
kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
When KCSAN and CONSTRUCTORS are enabled, one can trigger the
"Unpatched return thunk in use. This should not happen!"
catch-all warning.
Usually, when objtool runs on the .o objects, it does generate a section
.return_sites which contains all offsets in the objects to the return
thunks of the functions present there. Those return thunks then get
patched at runtime by the alternatives.
KCSAN and CONSTRUCTORS add this to the object file's .text.startup
section:
-------------------
Disassembly of section .text.startup:
...
0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <__UNIQUE_ID___addressable_cryptd_alloc_aead349+0x6>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------
which, if it is built as a module goes through the intermediary stage of
creating a <module>.mod.c file which, when translated, receives a second
constructor:
-------------------
Disassembly of section .text.startup:
0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <_sub_I_00099_0+0xe>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
...
0000000000000030 <_sub_I_00099_0>:
30: f3 0f 1e fa endbr64
34: e8 00 00 00 00 call 39 <_sub_I_00099_0+0x9>
35: R_X86_64_PLT32 __tsan_init-0x4
39: e9 00 00 00 00 jmp 3e <__ksymtab_cryptd_alloc_ahash+0x2>
3a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------
in the .ko file.
Objtool has run already so that second constructor's return thunk cannot
be added to the .return_sites section and thus the return thunk remains
unpatched and the warning rightfully fires.
Drop KCSAN flags from the mod.c generation stage as those constructors
do not contain data races one would be interested about.
Debugged together with David Kaplan <David.Kaplan@amd.com> and Nikolay
Borisov <nik.borisov@suse.com>.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/r/0851a207-7143-417e-be31-8bf2b3afb57d@molgen.mpg.de
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Arnd Bergmann [Tue, 26 Mar 2024 14:47:16 +0000 (15:47 +0100)]
kbuild: make -Woverride-init warnings more consistent
The -Woverride-init warn about code that may be intentional or not,
but the inintentional ones tend to be real bugs, so there is a bit of
disagreement on whether this warning option should be enabled by default
and we have multiple settings in scripts/Makefile.extrawarn as well as
individual subsystems.
Older versions of clang only supported -Wno-initializer-overrides with
the same meaning as gcc's -Woverride-init, though all supported versions
now work with both. Because of this difference, an earlier cleanup of
mine accidentally turned the clang warning off for W=1 builds and only
left it on for W=2, while it's still enabled for gcc with W=1.
There is also one driver that only turns the warning off for newer
versions of gcc but not other compilers, and some but not all the
Makefiles still use a cc-disable-warning conditional that is no
longer needed with supported compilers here.
Address all of the above by removing the special cases for clang
and always turning the warning off unconditionally where it got
in the way, using the syntax that is supported by both compilers.
Fixes: 2cd3271b7a31 ("kbuild: avoid duplicate warning options")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Mikulas Patocka [Sat, 30 Mar 2024 19:23:08 +0000 (20:23 +0100)]
objtool: Fix compile failure when using the x32 compiler
When compiling the v6.9-rc1 kernel with the x32 compiler, the following
errors are reported. The reason is that we take an "unsigned long"
variable and print it using "PRIx64" format string.
In file included from check.c:16:
check.c: In function ‘add_dead_ends’:
/usr/src/git/linux-2.6/tools/objtool/include/objtool/warn.h:46:17: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘long unsigned int’ [-Werror=format=]
46 | "%s: warning: objtool: " format "\n", \
| ^~~~~~~~~~~~~~~~~~~~~~~~
check.c:613:33: note: in expansion of macro ‘WARN’
613 | WARN("can't find unreachable insn at %s+0x%" PRIx64,
| ^~~~
...
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: linux-kernel@vger.kernel.org
Linus Torvalds [Sat, 30 Mar 2024 20:51:58 +0000 (13:51 -0700)]
Merge tag 'xfs-6.9-fixes-1' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Allow stripe unit/width value passed via mount option to be written
over existing values in the super block
- Do not set current->journal_info to avoid its value from being miused
by another filesystem context
* tag 'xfs-6.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't use current->journal_info
xfs: allow sunit mount option to repair bad primary sb stripe values
Linus Torvalds [Sat, 30 Mar 2024 20:44:52 +0000 (13:44 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes and updates from James Bottomley:
"Fully half this pull is updates to lpfc and qla2xxx which got
committed just as the merge window opened. A sizeable fraction of the
driver updates are simple bug fixes (and lock reworks for bug fixes in
the case of lpfc), so rather than splitting the few actual
enhancements out, we're just adding the drivers to the -rc1 pull.
The enhancements for lpfc are log message removals, copyright updates
and three patches redefining types. For qla2xxx it's just removing a
debug message on module removal and the manufacturer detail update.
The two major fixes are the sg teardown race and a core error leg
problem with the procfs directory not being removed if we destroy a
created host that never got to the running state. The rest are minor
fixes and constifications"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (41 commits)
scsi: bnx2fc: Remove spin_lock_bh while releasing resources after upload
scsi: core: Fix unremoved procfs host directory regression
scsi: mpi3mr: Avoid memcpy field-spanning write WARNING
scsi: sd: Fix TCG OPAL unlock on system resume
scsi: sg: Avoid sg device teardown race
scsi: lpfc: Copyright updates for 14.4.0.1 patches
scsi: lpfc: Update lpfc version to 14.4.0.1
scsi: lpfc: Define types in a union for generic void *context3 ptr
scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptr
scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptr
scsi: lpfc: Use a dedicated lock for ras_fwlog state
scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up()
scsi: lpfc: Replace hbalock with ndlp lock in lpfc_nvme_unregister_port()
scsi: lpfc: Update lpfc_ramp_down_queue_handler() logic
scsi: lpfc: Remove IRQF_ONESHOT flag from threaded IRQ handling
scsi: lpfc: Move NPIV's transport unregistration to after resource clean up
scsi: lpfc: Remove unnecessary log message in queuecommand path
scsi: qla2xxx: Update version to 10.02.09.200-k
scsi: qla2xxx: Delay I/O Abort on PCI error
scsi: qla2xxx: Change debug message during driver unload
...
Linus Torvalds [Sat, 30 Mar 2024 20:16:21 +0000 (13:16 -0700)]
Merge tag 'i2c-for-6.9-rc2' of git://git./linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"A fix from Andi for I2C host drivers"
* tag 'i2c-for-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i801: Fix a refactoring that broke a touchpad on Lenovo P1
Linus Torvalds [Sat, 30 Mar 2024 20:11:42 +0000 (13:11 -0700)]
Merge tag 'usb-6.9-rc2' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a bunch of small USB fixes for reported problems and
regressions for 6.9-rc2. Included in here are:
- deadlock fixes for long-suffering issues
- USB phy driver revert for reported problem
- typec fixes for reported problems
- duplicate id in dwc3 dropped
- dwc2 driver fixes
- udc driver warning fix
- cdc-wdm race bugfix
- other tiny USB bugfixes
All of these have been in linux-next this past week with no reported
issues"
* tag 'usb-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
USB: core: Fix deadlock in port "disable" sysfs attribute
USB: core: Add hub_get() and hub_put() routines
usb: typec: ucsi: Check capabilities before cable and identity discovery
usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
usb: typec: ucsi_acpi: Refactor and fix DELL quirk
usb: typec: ucsi: Ack unsupported commands
usb: typec: ucsi: Check for notifications after init
usb: typec: ucsi: Clear EVENT_PENDING under PPM lock
usb: typec: Return size of buffer if pd_set operation succeeds
usb: udc: remove warning when queue disabled ep
usb: dwc3: pci: Drop duplicate ID
usb: dwc3: Properly set system wakeup
Revert "usb: phy: generic: Get the vbus supply"
usb: cdc-wdm: close race between read and workqueue
usb: dwc2: gadget: LPM flow fix
usb: dwc2: gadget: Fix exiting from clock gating
usb: dwc2: host: Fix ISOC flow in DDMA mode
usb: dwc2: host: Fix remote wakeup from hibernation
usb: dwc2: host: Fix hibernation flow
USB: core: Fix deadlock in usb_deauthorize_interface()
...
Linus Torvalds [Sat, 30 Mar 2024 19:59:00 +0000 (12:59 -0700)]
Merge tag 'staging-6.9-rc2' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are two small staging driver fixes for the vc04_services driver
that resolve reported problems:
- strncpy fix for information leak
- another information leak discovered by the previous strncpy fix
Both of these have been in linux-next all this past week with no
reported issues"
* tag 'staging-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vc04_services: fix information leak in create_component()
staging: vc04_services: changen strncpy() to strscpy_pad()
Wolfram Sang [Sat, 30 Mar 2024 14:37:54 +0000 (15:37 +0100)]
Merge tag 'i2c-host-fixes-6.9-rc2' of git://git./linux/kernel/git/andi.shyti/linux into i2c/for-current
One fix in the i801 driver where a bug caused touchpad
malfunctions on some Lenovo P1 models by incorrectly overwriting
a status variable during successful SMBUS transactions.
Masahiro Yamada [Sat, 30 Mar 2024 06:05:54 +0000 (15:05 +0900)]
x86/build: Use obj-y to descend into arch/x86/virt/
Commit
c33621b4c5ad ("x86/virt/tdx: Wire up basic SEAMCALL functions")
introduced a new instance of core-y instead of the standardized obj-y
syntax.
X86 Makefiles descend into subdirectories of arch/x86/virt inconsistently;
into arch/x86/virt/ via core-y defined in arch/x86/Makefile, but into
arch/x86/virt/svm/ via obj-y defined in arch/x86/Kbuild.
This is problematic when you build a single object in parallel because
multiple threads attempt to build the same file.
$ make -j$(nproc) arch/x86/virt/vmx/tdx/seamcall.o
[ snip ]
AS arch/x86/virt/vmx/tdx/seamcall.o
AS arch/x86/virt/vmx/tdx/seamcall.o
fixdep: error opening file: arch/x86/virt/vmx/tdx/.seamcall.o.d: No such file or directory
make[4]: *** [scripts/Makefile.build:362: arch/x86/virt/vmx/tdx/seamcall.o] Error 2
Use the obj-y syntax, as it works correctly.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240330060554.18524-1-masahiroy@kernel.org
Linus Torvalds [Fri, 29 Mar 2024 22:51:15 +0000 (15:51 -0700)]
Merge tag 'drm-fixes-2024-03-30' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Regular fixes for rc2, quite a few i915/amdgpu as usual, some xe, and
then mostly scattered around. rc3 might be quieter with the holidays
but we shall see.
bridge:
- select DRM_KMS_HELPER
dma-buf:
- fix NULL-pointer deref
dp:
- fix div-by-zero in DP MST unplug code
fbdev:
- select FB_IOMEM_FOPS for SBus
sched:
- fix NULL-pointer deref
xe:
- Fix build on mips
- Fix wrong bound checks
- Fix use of msec rather than jiffies
- Remove dead code
amdgpu:
- SMU 14.0.1 updates
- DCN 3.5.x updates
- VPE fix
- eDP panel flickering fix
- Suspend fix
- PSR fix
- DCN 3.0+ fix
- VCN 4.0.6 updates
- debugfs fix
amdkfd:
- DMA-Buf fix
- GFX 9.4.2 TLB flush fix
- CP interrupt fix
i915:
- Fix for BUG_ON/BUILD_BUG_ON IN I915_memcpy.c
- Update a MTL workaround
- Fix locking inversion in hwmon's sysfs
- Remove a bogus error message around PXP
- Fix UAF on VMA
- Reset queue_priority_hint on parking
- Display Fixes:
- Remove duplicated audio enable/disable on SDVO and DP
- Disable AuxCCS for Xe driver
- Revert init order of MIPI DSI
- DRRS debugfs fix with an extra refactor patch
- VRR related fixes
- Fix a JSL eDP corruption
- Fix the cursor physical dma address
- BIOS VBT related fix
nouveau:
- dmem: handle kcalloc() allocation failures
qxl:
- remove unused variables
rockchip:
- vop2: remove support for AR30 and AB30 formats
vmwgfx:
- debugfs: create ttm_resource_manager entry only if needed"
* tag 'drm-fixes-2024-03-30' of https://gitlab.freedesktop.org/drm/kernel: (55 commits)
drm/i915/bios: Tolerate devdata==NULL in intel_bios_encoder_supports_dp_dual_mode()
drm/i915: Pre-populate the cursor physical dma address
drm/i915/gt: Reset queue_priority_hint on parking
drm/i915/vma: Fix UAF on destroy against retire race
drm/i915: Do not print 'pxp init failed with 0' when it succeed
drm/i915: Do not match JSL in ehl_combo_pll_div_frac_wa_needed()
drm/i915/hwmon: Fix locking inversion in sysfs getter
drm/i915/dsb: Fix DSB vblank waits when using VRR
drm/i915/vrr: Generate VRR "safe window" for DSB
drm/i915/display/debugfs: Fix duplicate checks in i915_drrs_status
drm/i915/drrs: Refactor CPU transcoder DRRS check
drm/i915/mtl: Update workaround
14018575942
drm/i915/dsi: Go back to the previous INIT_OTP/DISPLAY_ON order, mostly
drm/i915/display: Disable AuxCCS framebuffers if built for Xe
drm/i915: Stop doing double audio enable/disable on SDVO and g4x+ DP
drm/i915: Add includes for BUG_ON/BUILD_BUG_ON in i915_memcpy.c
drm/qxl: remove unused variable from `qxl_process_single_command()`
drm/qxl: remove unused `count` variable from `qxl_surface_id_alloc()`
drm/i915: add bug.h include to i915_memcpy.c
drm/vmwgfx: Create debugfs ttm_resource_manager entry only if needed
...
Linus Torvalds [Fri, 29 Mar 2024 22:38:29 +0000 (15:38 -0700)]
Merge tag 'linux_kselftest-fixes-6.9-rc2' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"Fixes to seccomp and ftrace tests and a change to add config file for
dmabuf-heap test to increase coverage"
* tag 'linux_kselftest-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: dmabuf-heap: add config file for the test
selftests/seccomp: Try to fit runtime of benchmark into timeout
selftests/ftrace: Fix event filter target_func selection
Linus Torvalds [Fri, 29 Mar 2024 22:35:12 +0000 (15:35 -0700)]
Merge tag 'linux_kselftest-kunit-fixes-6.9-rc2' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull KUnit fixes from Shuah Khan:
"One urgent fix for --alltests build failure related to renaming of
CONFIG_DAMON_DBGFS to DAMON_DBGFS_DEPRECATED to the missing config
option"
* tag 'linux_kselftest-kunit-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests
Muhammad Usama Anjum [Tue, 5 Mar 2024 06:08:47 +0000 (11:08 +0500)]
selftests: dmabuf-heap: add config file for the test
The config fragment enlists all the config options needed for the test.
This config is merged into the kernel's config on which this test is
run.
Fixed whitespace errors during commit:
Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Mark Brown [Mon, 25 Mar 2024 16:57:59 +0000 (16:57 +0000)]
selftests/seccomp: Try to fit runtime of benchmark into timeout
The seccomp benchmark runs five scenarios, one calibration run with no
seccomp filters enabled then four further runs each adding a filter. The
calibration run times itself for 15s and then each additional run executes
for the same number of times.
Currently the seccomp tests, including the benchmark, run with an extended
120s timeout but this is not sufficient to robustly run the tests on a lot
of platforms. Sample timings from some recent runs:
Platform Run 1 Run 2 Run 3 Run 4
--------- ----- ----- ----- -----
PowerEdge R200 16.6s 16.6s 31.6s 37.4s
BBB (arm) 20.4s 20.4s 54.5s
Synquacer (arm64) 20.7s 23.7s 40.3s
The x86 runs from the PowerEdge are quite marginal and routinely fail, for
the successful run reported here the timed portions of the run are at
117.2s leaving less than 3s of margin which is frequently breached. The
added overhead of adding filters on the other platforms is such that there
is no prospect of their runs fitting into the 120s timeout, especially
on 32 bit arm where there is no BPF JIT.
While we could lower the time we calibrate for I'm also already seeing the
currently completing runs reporting issues with the per filter overheads
not matching expectations:
Let's instead raise the timeout to 180s which is only a 50% increase on the
current timeout which is itself not *too* large given that there's only two
tests in this suite.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Mark Rutland [Wed, 20 Mar 2024 14:18:44 +0000 (14:18 +0000)]
selftests/ftrace: Fix event filter target_func selection
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=
000000000f4d22f4 name=names_cache
... and so when we try to extact the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extrace the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Dave Airlie [Fri, 29 Mar 2024 19:33:22 +0000 (05:33 +1000)]
Merge tag 'drm-intel-fixes-2024-03-28' of https://anongit.freedesktop.org/git/drm/drm-intel into drm-fixes
Core/GT Fixes:
- Fix for BUG_ON/BUILD_BUG_ON IN I915_memcpy.c (Joonas)
- Update a MTL workaround (Tejas)
- Fix locking inversion in hwmon's sysfs (Janusz)
- Remove a bogus error message around PXP (Jose)
- Fix UAF on VMA (Janusz)
- Reset queue_priority_hint on parking (Chris)
Display Fixes:
- Remove duplicated audio enable/disable on SDVO and DP (Ville)
- Disable AuxCCS for Xe driver (Juha-Pekka)
- Revert init order of MIPI DSI (Ville)
- DRRS debugfs fix with an extra refactor patch (Bhanuprakash)
- VRR related fixes (Ville)
- Fix a JSL eDP corruption (Jonathon)
- Fix the cursor physical dma address (Ville)
- BIOS VBT related fix (Ville)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZgYaIVgjIs30mIvS@intel.com
Borislav Petkov (AMD) [Thu, 28 Mar 2024 12:59:05 +0000 (13:59 +0100)]
x86/bugs: Fix the SRSO mitigation on Zen3/4
The original version of the mitigation would patch in the calls to the
untraining routines directly. That is, the alternative() in UNTRAIN_RET
will patch in the CALL to srso_alias_untrain_ret() directly.
However, even if commit
e7c25c441e9e ("x86/cpu: Cleanup the untrain
mess") meant well in trying to clean up the situation, due to micro-
architectural reasons, the untraining routine srso_alias_untrain_ret()
must be the target of a CALL instruction and not of a JMP instruction as
it is done now.
Reshuffle the alternative macros to accomplish that.
Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 29 Mar 2024 19:06:09 +0000 (12:06 -0700)]
Merge tag '6.9-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Add missing trace point (noticed when debugging the recent mknod LSM
regression)
- fscache fix
* tag '6.9-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix duplicate fscache cookie warnings
smb3: add trace event for mknod
Linus Torvalds [Fri, 29 Mar 2024 18:50:38 +0000 (11:50 -0700)]
Merge tag 'thermal-6.9-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These revert a problematic optimization commit and address a devfreq
cooling device issue.
Specifics:
- Revert thermal core optimization that introduced a functional issue
causing a critical trip point to be crossed in some cases (Daniel
Lezcano)
- Add missing conversion between different state ranges to the
devfreq cooling device driver (Ye Zhang)"
* tag 'thermal-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: devfreq_cooling: Fix perf state when calculate dfc res_util
Revert "thermal: core: Don't update trip points inside the hysteresis range"
Linus Torvalds [Fri, 29 Mar 2024 18:37:12 +0000 (11:37 -0700)]
Merge tag 'acpi-6.9-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix two issues that may lead to attempts to use memory that has
been freed already.
Specifics:
- Drop __exit annotation from einj_remove() in the ACPI APEI code
because this function can be called during runtime (Arnd Bergmann)
- Make acpi_db_walk_for_fields() check acpi_evaluate_object() return
value to avoid accessing memory that has been freed (Nikita
Kiryushin)"
* tag 'acpi-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields()
ACPI: APEI: EINJ: mark remove callback as non-__exit
Linus Torvalds [Fri, 29 Mar 2024 18:06:13 +0000 (11:06 -0700)]
mm: clean up populate_vma_page_range() FOLL_* flag handling
The code wasn't exactly wrong, but it was very odd, and it used
FOLL_FORCE together with FOLL_WRITE when it really didn't need to (it
only set FOLL_WRITE for writable mappings, so then the FOLL_FORCE was
pointless).
It also pointlessly called __get_user_pages() even when it knew it
wouldn't populate anything because the vma wasn't accessible and it
explicitly tested for and did *not* set FOLL_FORCE for inaccessible
vma's.
This code does need to use FOLL_FORCE, because we want to do fault in
writable shared mappings, but then the mapping may not actually be
readable. And we don't want to use FOLL_WRITE (which would match the
permission of the vma), because that would also dirty the pages, which
we don't want to do.
For very similar reasons, FOLL_FORCE populates a executable-only mapping
with no read permissions. We don't have a FOLL_EXEC flag.
Yes, it would probably be cleaner to split FOLL_WRITE into two bits (for
separate permission and dirty bit handling), and add a FOLL_EXEC flag
for the "GUP executable page" case. That would allow us to avoid
FOLL_FORCE entirely here.
But that's not how our FOLL_xyz bits have traditionally worked, and that
would be a much bigger patch.
So this at least avoids the FOLL_FORCE | FOLL_WRITE combination that
made one of my experimental validation patches trigger a warning. That
warning was a false positive (and my experimental patch was incomplete
anyway), but it all made me look at this and decide to clean at least
this small case up.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Fri, 29 Mar 2024 18:00:09 +0000 (19:00 +0100)]
Merge branch 'acpica'
* acpica:
ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields()
Linus Torvalds [Fri, 29 Mar 2024 16:51:04 +0000 (09:51 -0700)]
Merge tag 'efi-fixes-for-v6.9-3' of git://git./linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
"These address all the outstanding EFI/x86 boot related regressions:
- Revert to the old initrd memory allocation soft limit of INT_MAX,
which was dropped inadvertently
- Ensure that startup_32() is entered with a valid boot_params
pointer when using the new EFI mixed mode protocol
- Fix a compiler warning introduced by a fix from the previous pull"
* tag 'efi-fixes-for-v6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
x86/efistub: Reinstate soft limit for initrd loading
efi/libstub: Cast away type warning in use of max()
x86/efistub: Add missing boot_params for mixed mode compat entry
Linus Torvalds [Fri, 29 Mar 2024 16:40:22 +0000 (09:40 -0700)]
Merge tag 'block-6.9-
20240329' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Small round of minor fixes or cleanups for the 6.9-rc2 kernel, one
fixing an issue introduced in 6.8"
* tag 'block-6.9-
20240329' of git://git.kernel.dk/linux:
block: Do not force full zone append completion in req_bio_endio()
block: don't reject too large max_user_sectors in blk_validate_limits
block: Make blk_rq_set_mixed_merge() static
Linus Torvalds [Fri, 29 Mar 2024 16:33:05 +0000 (09:33 -0700)]
Merge tag 'for-6.9/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix MAINTAINERS to not include M: dm-devel for DM entries.
- Fix DM vdo's murmurhash to use proper byteswapping methods.
- Fix DM integrity clang warning about comparison out-of-range.
* tag 'for-6.9/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm integrity: fix out-of-range warning
dm vdo murmurhash3: use kernel byteswapping routines instead of GCC ones
MAINTAINERS: Remove incorrect M: tag for dm-devel@lists.linux.dev
Linus Torvalds [Fri, 29 Mar 2024 16:26:34 +0000 (09:26 -0700)]
Merge tag 'gpio-fixes-for-v6.9-rc2' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix a procfs failure when requesting an interrupt with a label
containing the '/' character
- add missing stubs for GPIO lookup functions for !GPIOLIB
- fix debug messages that would print "(null)" for NULL strings
* tag 'gpio-fixes-for-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Fix debug messaging in gpiod_find_and_request()
gpiolib: Add stubs for GPIO lookup functions
gpio: cdev: sanitize the label before requesting the interrupt
Arnd Bergmann [Thu, 28 Mar 2024 14:30:39 +0000 (15:30 +0100)]
dm integrity: fix out-of-range warning
Depending on the value of CONFIG_HZ, clang complains about a pointless
comparison:
drivers/md/dm-integrity.c:4085:12: error: result of comparison of
constant
42949672950 with expression of type
'unsigned int' is always false
[-Werror,-Wtautological-constant-out-of-range-compare]
if (val >= (uint64_t)UINT_MAX * 1000 / HZ) {
As the check remains useful for other configurations, shut up the
warning by adding a second type cast to uint64_t.
Fixes: 468dfca38b1a ("dm integrity: add a bitmap mode")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Ken Raeburn [Mon, 25 Mar 2024 19:22:45 +0000 (15:22 -0400)]
dm vdo murmurhash3: use kernel byteswapping routines instead of GCC ones
Also open-code the calls.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Kuan-Wei Chiu [Tue, 19 Mar 2024 18:18:42 +0000 (02:18 +0800)]
MAINTAINERS: Remove incorrect M: tag for dm-devel@lists.linux.dev
The dm-devel@lists.linux.dev mailing list should only be listed under
the L: (List) tag in the MAINTAINERS file. However, it was incorrectly
listed under both L: and M: (Maintainers) tags, which is not accurate.
Remove the M: tag for dm-devel@lists.linux.dev in the MAINTAINERS file
to reflect the correct categorization.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Linus Torvalds [Fri, 29 Mar 2024 00:15:33 +0000 (17:15 -0700)]
Merge tag 'mmc-v6.9-rc1' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix regression for the mmc ioctl
MMC host:
- sdhci-of-dwcmshc: Fixup PM support in ->remove_new()
- sdhci-omap: Re-tune when device became runtime suspended"
* tag 'mmc-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
sdhci-of-dwcmshc: disable PM runtime in dwcmshc_remove()
mmc: sdhci-omap: re-tuning is needed after a pm transition to support emmc HS200 mode
mmc: core: Avoid negative index with array access
mmc: core: Initialize mmc_blk_ioc_data
Damien Le Moal [Thu, 28 Mar 2024 00:43:40 +0000 (09:43 +0900)]
block: Do not force full zone append completion in req_bio_endio()
This reverts commit
748dc0b65ec2b4b7b3dbd7befcc4a54fdcac7988.
Partial zone append completions cannot be supported as there is no
guarantees that the fragmented data will be written sequentially in the
same manner as with a full command. Commit
748dc0b65ec2 ("block: fix
partial zone append completion handling in req_bio_endio()") changed
req_bio_endio() to always advance a partially failed BIO by its full
length, but this can lead to incorrect accounting. So revert this
change and let low level device drivers handle this case by always
failing completely zone append operations. With this revert, users will
still see an IO error for a partially completed zone append BIO.
Fixes: 748dc0b65ec2 ("block: fix partial zone append completion handling in req_bio_endio()")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240328004409.594888-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 28 Mar 2024 21:54:49 +0000 (14:54 -0700)]
Merge tag 'sound-6.9-rc2' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of device-specific small fixes: a series of fixes for
TAS2781 HD-audio codec, ASoC SOF, Cirrus CS35L56 and a couple of
legacy drivers"
* tag 'sound-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/tas2781: remove useless dev_dbg from playback_hook
ALSA: hda/tas2781: add debug statements to kcontrols
ALSA: hda/tas2781: add locks to kcontrols
ALSA: hda/tas2781: remove digital gain kcontrol
ALSA: aoa: avoid false-positive format truncation warning
ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs
ALSA: hda: cs35l56: Set the init_done flag before component_add()
ALSA: hda: cs35l56: Raise device name message log level
ASoC: SOF: ipc4-topology: support NHLT device type
ALSA: hda: intel-nhlt: add intel_nhlt_ssp_device_type() function
Linus Torvalds [Thu, 28 Mar 2024 21:40:46 +0000 (14:40 -0700)]
Merge tag 'iommu-fixes-v6.9-rc1' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"ARM SMMU fixes:
- Fix swabbing of the STE fields in the unlikely event of running on
a big-endian machine
- Fix setting of STE.SHCFG on hardware that doesn't implement support
for attribute overrides
IOMMU core:
- PASID validation fix in device attach path"
* tag 'iommu-fixes-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Validate the PASID in iommu_attach_device_pasid()
iommu/arm-smmu-v3: Fix access for STE.SHCFG
iommu/arm-smmu-v3: Add cpu_to_le64() around STRTAB_STE_0_V
Linus Torvalds [Thu, 28 Mar 2024 21:35:32 +0000 (14:35 -0700)]
Merge tag 'nfsd-6.9-1' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Address three recently introduced regressions
* tag 'nfsd-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: CREATE_SESSION must never cache NFS4ERR_DELAY replies
SUNRPC: Revert
561141dd494382217bace4d1a51d08168420eace
nfsd: Fix error cleanup path in nfsd_rename()