linux.git
17 months agos390/mm,fault: move VM_FAULT_ERROR handling to do_exception()
Heiko Carstens [Thu, 12 Oct 2023 07:40:52 +0000 (09:40 +0200)]
s390/mm,fault: move VM_FAULT_ERROR handling to do_exception()

Get rid of do_fault_error() and move its contents to do_exception(),
which makes do_exception(). With removing do_fault_error() it is also
possible to get rid of the handle_fault_error_nolock() wrapper.
Instead rename do_no_context() to handle_fault_error_nolock().

In result the whole fault handling looks much more like on other
architectures.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: remove VM_FAULT_BADMAP and VM_FAULT_BADACCESS
Heiko Carstens [Thu, 12 Oct 2023 07:40:51 +0000 (09:40 +0200)]
s390/mm,fault: remove VM_FAULT_BADMAP and VM_FAULT_BADACCESS

Remove the last two private vm_fault reasons: VM_FAULT_BADMAP and
VM_FAULT_BADACCESS.

In order to achieve this add an si_code parameter to do_no_context()
and it's wrappers and directly call the wrappers instead of relying on
do_fault_error() handling.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: remove VM_FAULT_SIGNAL
Heiko Carstens [Thu, 12 Oct 2023 07:40:50 +0000 (09:40 +0200)]
s390/mm,fault: remove VM_FAULT_SIGNAL

Remove VM_FAULT_SIGNAL and open-code it at the only two locations
where it is used.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: remove VM_FAULT_BADCONTEXT
Heiko Carstens [Thu, 12 Oct 2023 07:40:49 +0000 (09:40 +0200)]
s390/mm,fault: remove VM_FAULT_BADCONTEXT

Remove VM_FAULT_BADCONTEXT and instead call do_no_context() via
wrappers. This adds two new wrappers similar to what x86 has:

handle_fault_error() and handle_fault_error_nolock(). Both of them
simply call do_no_context(), while handle_fault_error() also unlocks
mmap lock, which avoids adding lots of mmap_read_unlock() calls with
this and subsequent patches.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: simplify kfence fault handling
Heiko Carstens [Thu, 12 Oct 2023 07:40:48 +0000 (09:40 +0200)]
s390/mm,fault: simplify kfence fault handling

do_no_context() can be simplified by removing its fault parameter,
which is only used to decide if kfence_handle_page_fault() should be
called.

If the fault happened within the kernel space it is ok to always check
if this happened on a page which was unmapped because of the kfence
feature. Limiting the check to the VM_FAULT_BADCONTEXT case doesn't
add any value.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: call do_fault_error() only from do_exception()
Heiko Carstens [Thu, 12 Oct 2023 07:40:47 +0000 (09:40 +0200)]
s390/mm,fault: call do_fault_error() only from do_exception()

Remove duplicated fault error handling and handle it only once within
do_exception().

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: get rid of do_low_address()
Heiko Carstens [Thu, 12 Oct 2023 07:40:46 +0000 (09:40 +0200)]
s390/mm,fault: get rid of do_low_address()

There is only one caller of do_low_address(). Given that this code is
quite special just get rid of do_low_address, and add it to
do_protection_exception() in order to make the code a bit more
readable.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: remove VM_FAULT_PFAULT
Heiko Carstens [Thu, 12 Oct 2023 07:40:45 +0000 (09:40 +0200)]
s390/mm,fault: remove VM_FAULT_PFAULT

Handling of VM_FAULT_PFAULT and VM_FAULT_BADCONTEXT is nearly identical;
the only difference is within do_no_context() where however the fault_type
(KERNEL_FAULT vs GMAP_FAULT) makes sure that both types will be handled
differently.

Therefore it is possible to get rid of VM_FAULT_PFAULT and use
VM_FAULT_BADCONTEXT instead.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: use get_kernel_nofault() to dereference in dump_pagetable()
Heiko Carstens [Thu, 12 Oct 2023 07:40:44 +0000 (09:40 +0200)]
s390/mm,fault: use get_kernel_nofault() to dereference in dump_pagetable()

The page table dumper uses get_kernel_nofault() to test if dereferencing
page table entries is possible. Use the result, which is the required page
table entry, instead of throwing it away and dereferencing a second time
without any safe guard.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: improve readability by using teid union
Heiko Carstens [Thu, 12 Oct 2023 07:40:43 +0000 (09:40 +0200)]
s390/mm,fault: improve readability by using teid union

Get rid of some magic numbers, and use the teid union and also some
ptrace PSW defines to improve readability.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm: move translation-exception identification structure to fault.h
Heiko Carstens [Thu, 12 Oct 2023 07:40:42 +0000 (09:40 +0200)]
s390/mm: move translation-exception identification structure to fault.h

Move translation-exception identification structure to new fault.h
header file, change it to a union, and change existing kvm code
accordingly. The new union will be used by subsequent patches.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: use static key for store indication
Heiko Carstens [Thu, 12 Oct 2023 07:40:41 +0000 (09:40 +0200)]
s390/mm,fault: use static key for store indication

Generate slightly better code by using a static key to implement store
indication. This allows to get rid of a memory access on the hot path.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: use get_fault_address() everywhere
Heiko Carstens [Thu, 12 Oct 2023 07:40:40 +0000 (09:40 +0200)]
s390/mm,fault: use get_fault_address() everywhere

Use the get_fault_address() helper function instead of open-coding it
at many locations.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: replace WARN_ON_ONCE() with unreachable()
Heiko Carstens [Thu, 12 Oct 2023 07:40:39 +0000 (09:40 +0200)]
s390/mm,fault: replace WARN_ON_ONCE() with unreachable()

do_secure_storage_access() contains a switch statements which handles
all possible return values from get_fault_type(). Therefore remove the
pointless default case error handling and replace it with unreachable().

Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: remove noinline attribute from all functions
Heiko Carstens [Thu, 12 Oct 2023 07:40:38 +0000 (09:40 +0200)]
s390/mm,fault: remove noinline attribute from all functions

Remove all noinline attribute from all functions and leave the
inlining decisions up to the compiler.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: remove line break
Heiko Carstens [Thu, 12 Oct 2023 07:40:37 +0000 (09:40 +0200)]
s390/mm,fault: remove line break

chechpatch reports:

 CHECK: Alignment should match open parenthesis
 +               if (IS_ENABLED(CONFIG_PGSTE) && gmap &&
 +                       (flags & FAULT_FLAG_RETRY_NOWAIT)) {

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: include linux/mmu_context.h
Heiko Carstens [Thu, 12 Oct 2023 07:40:36 +0000 (09:40 +0200)]
s390/mm,fault: include linux/mmu_context.h

Include linux/mmu_context.h instead asm/mmu_context.h.

checkpatch reports:

CHECK: Consider using #include <linux/mmu_context.h> instead of <asm/mmu_context.h>
+#include <asm/mmu_context.h>

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: have balanced braces, remove unnecessary blanks
Heiko Carstens [Thu, 12 Oct 2023 07:40:35 +0000 (09:40 +0200)]
s390/mm,fault: have balanced braces, remove unnecessary blanks

Remove unnecessary braces and also blanks after casts.
Add braces to have balanced braces where missing.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: use pr_warn(), pr_cont(), ... instead of open-coding
Heiko Carstens [Thu, 12 Oct 2023 07:40:34 +0000 (09:40 +0200)]
s390/mm,fault: use pr_warn(), pr_cont(), ... instead of open-coding

Use pr_warn() and friends instead of open-coding with printk().

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: use pr_warn_ratelimited()
Heiko Carstens [Thu, 12 Oct 2023 07:40:33 +0000 (09:40 +0200)]
s390/mm,fault: use pr_warn_ratelimited()

Use pr_warn_ratelimited() instead of printk_ratelimited().

checkpatch reports:

WARNING: Prefer ... pr_warn_ratelimited(...  to printk_ratelimited(KERN_WARNING ...
+       printk_ratelimited(KERN_WARNING

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: use __ratelimit() instead of printk_ratelimit()
Heiko Carstens [Thu, 12 Oct 2023 07:40:32 +0000 (09:40 +0200)]
s390/mm,fault: use __ratelimit() instead of printk_ratelimit()

Just like other architectures make use __ratelimit() instead of
printk_ratelimit().

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: reverse x-mas tree coding style
Heiko Carstens [Thu, 12 Oct 2023 07:40:31 +0000 (09:40 +0200)]
s390/mm,fault: reverse x-mas tree coding style

Have reverse x-mas tree coding style for variables everywhere.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm,fault: remove and improve comments, adjust whitespace
Heiko Carstens [Thu, 12 Oct 2023 07:40:30 +0000 (09:40 +0200)]
s390/mm,fault: remove and improve comments, adjust whitespace

Remove wrong, outdated, and pointless comments. Adjust wording for
some comments, and adjust whitespace at some places.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/pai_crypto: dynamically allocate percpu pai crypto map data structure
Thomas Richter [Mon, 21 Aug 2023 14:49:05 +0000 (16:49 +0200)]
s390/pai_crypto: dynamically allocate percpu pai crypto map data structure

Struct paicrypt_map is a data structure and is statically defined
for each possible CPU. Rework this and replace it by dynamically
allocated data structures created when a perf_event_open() system call
is invoked.

It is replaced by an array of pointers to all possible CPUs and
reference counting. The array of pointers is allocated when the first
event is created. For each online CPU an event is installed on, a struct
paicrypt_map is allocated and a pointer to struct cpu_cf_events is
stored in the array:

                        CPU   0   1   2   3    ...  N
                              +---+---+---+---+---+---+
     paicrypt_root::mapptr--> | * |   |   |   |...|   |
                              +-|-+---+---+---+---+---+
                                |
                                |
                               \|/
                        +--------------+
                        | paicrypt_map |
                        +--------------+

With this approach the large data structure is only allocated when
an event is actually installed and used.
Also implement proper reference counting for allocation and removal.

PAI crypto counter events can not be created when a CPU hot plug
add is processed. This means a CPU hot plug add does not get
the necessary PAI event to record PAI cryptography counter increments
on the newly added CPU. There is no possibility to notify user space
of a new CPU and the necessary event infrastructure assoiciated with
the file descriptor returned by perf_event_open() system call.
However system call perf_event_open() can use the newly added CPU
when issued after the CPU hot plug add.

Kernel CPU hot plug remove deletes the CPU and stops the PAI counters on
that CPU. When the process closes the file descriptor associated
with that event, the event's destroy() function removes any
allocated data structures and adjusts the reference counts.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm: make vmemmap_free() only for CONFIG_MEMORY_HOTPLUG available
Heiko Carstens [Mon, 16 Oct 2023 10:17:59 +0000 (12:17 +0200)]
s390/mm: make vmemmap_free() only for CONFIG_MEMORY_HOTPLUG available

Get rid of this W=1 compile warning:

arch/s390/mm/vmem.c:502:6: warning: no previous prototype for ‘vmemmap_free’ [-Wmissing-prototypes]
  502 | void vmemmap_free(unsigned long start, unsigned long end,
      |      ^~~~~~~~~~~~

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
17 months agos390/mm: remove __GFP_HIGHMEM masking
Heiko Carstens [Mon, 16 Oct 2023 10:10:04 +0000 (12:10 +0200)]
s390/mm: remove __GFP_HIGHMEM masking

Remove unnecessary __GFP_HIGHMEM masking, which was introduced with
commit 6326c26c1514 ("s390: convert various pgalloc functions to use
ptdescs"). Also remove a whitespace change which was introduced with
the same commit.

Link: https://lore.kernel.org/all/CAOzc2px-SFSnmjcPriiB3cm1fNj3+YC8S0VSp4t1QvDR0f4E2A@mail.gmail.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/vmem: remove unused variable
Vasily Gorbik [Thu, 12 Oct 2023 09:39:29 +0000 (11:39 +0200)]
s390/vmem: remove unused variable

Fix the follow warning reported by sparse:

arch/s390/boot/vmem.c:170:15: warning: unused variable ‘entry’ [-Wunused-variable]
  170 |         pte_t entry;
      |               ^~~~~

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390: add support for DCACHE_WORD_ACCESS
Heiko Carstens [Fri, 6 Oct 2023 13:42:42 +0000 (15:42 +0200)]
s390: add support for DCACHE_WORD_ACCESS

Implement load_unaligned_zeropad() and enable DCACHE_WORD_ACCESS to
speed up string operations in fs/dcache.c and fs/namei.c.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390: provide word-at-a-time implementation
Heiko Carstens [Fri, 6 Oct 2023 13:42:41 +0000 (15:42 +0200)]
s390: provide word-at-a-time implementation

Provide an s390 specific word-at-a-time implementation. Compared to the
generic variant the generated code for has_zero() is slightly
better. However find_zero() is much simpler since it reuses the result
of __fls() aka flogr() and now comes without any conditional branches,
while the generic variant has three of them.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/extable: reduce number of extable macros
Heiko Carstens [Fri, 6 Oct 2023 13:42:40 +0000 (15:42 +0200)]
s390/extable: reduce number of extable macros

Get rid of __EX_TABLE() macro, rename __EX_TABLE_UA() to __EX_TABLE()
and convert users of old __EX_TABLE() macro so they pass more parameters
to the changed __EX_TABLE() semantics.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/crash: fix virtual vs physical address confusion
Alexander Gordeev [Sun, 1 Oct 2023 08:01:31 +0000 (10:01 +0200)]
s390/crash: fix virtual vs physical address confusion

Fix virtual vs physical address confusion (which currently are the same).

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/crash: remove unused parameter
Alexander Gordeev [Sun, 1 Oct 2023 08:09:34 +0000 (10:09 +0200)]
s390/crash: remove unused parameter

Funciton loads_init() does not use loads_offset parameter.

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ap: show APFS value on error reply 0x8B
Harald Freudenberger [Tue, 12 Sep 2023 07:02:07 +0000 (09:02 +0200)]
s390/ap: show APFS value on error reply 0x8B

With Secure Execution the error reply RX 0x8B now carries
an APFS value indicating why the request has been filtered
by a lower layer of the firmware. So display this value
as a warning line in the s390 debug feature trace.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/zcrypt: introduce new internal AP queue se_bound attribute
Harald Freudenberger [Tue, 12 Sep 2023 08:08:51 +0000 (10:08 +0200)]
s390/zcrypt: introduce new internal AP queue se_bound attribute

This patch introduces a new AP queue internal attribute
se_bound which reflects the bound state of an APQN within
a Secure Execution environment.

With introduction of Secure Execution guests now an
AP firmware queue needs to be bound to the guest before
usage. This patch introduces a new internal attribute
reflecting this bound state and some glue code to handle
this new field during lifetime of an AP queue device.

Together with that now the zcrypt scheduler considers
the state of the AP queues when a message is about to be
distributed among the existing queues. There is a new
function ap_queue_usable() which returns true only when
all conditions for using this AP queue device are fulfilled.
In details this means: the AP queue needs to be configured,
not checkstopped and within an SE environment it needs
to be bound. So the new function gives and indication
if the AP queue device is ready to serve requests or not.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ap: re-init AP queues on config on
Harald Freudenberger [Tue, 12 Sep 2023 07:54:25 +0000 (09:54 +0200)]
s390/ap: re-init AP queues on config on

On a state toggle from config off to config on and on the
state toggle from checkstop to not checkstop the queue's
internal states was set but the state machine was not
nudged. This did not care as on the first enqueue of a
request the state machine kick ran.

However, within an Secure Execution guest a queue is
only chosen by the scheduler when it has been bound.
But to bind a queue, it needs to run through the initial
states (reset, enable interrupts, ...). So this is like
a chicken-and-egg problem and the result was in fact
that a queue was unusable after a config off/on toggle.

With some slight rework of the handling of these states
now the new function _ap_queue_init_state() is called
which is the core of the ap_queue_init_state() function
but without locking handling. This has the benefit that
it can be called on all the places where a (re-)init
of the AP queue's state machine is needed.

Fixes: 2d72eaf036d2 ("s390/ap: implement SE AP bind, unbind and associate")
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390: use control register bit defines
Heiko Carstens [Mon, 11 Sep 2023 19:40:13 +0000 (21:40 +0200)]
s390: use control register bit defines

Use control register bit defines instead of plain numbers where
possible.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: add control register bits
Heiko Carstens [Mon, 11 Sep 2023 19:40:12 +0000 (21:40 +0200)]
s390/ctlreg: add control register bits

Instead of having only masks for specific bit locations within control
registers, also define the bit numbers and use them to define the
masks.

The bit defines can be used to convert plain numbers to defines.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/irq: use CR0 defines to define CR0_IRQ_SUBCLASS_MASK
Heiko Carstens [Mon, 11 Sep 2023 19:40:11 +0000 (21:40 +0200)]
s390/irq: use CR0 defines to define CR0_IRQ_SUBCLASS_MASK

Use existing CR0 defines to define CR0_IRQ_SUBCLASS_MASK instead of
open-coding the defines again.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: add missing defines
Heiko Carstens [Mon, 11 Sep 2023 19:40:10 +0000 (21:40 +0200)]
s390/ctlreg: add missing defines

Add a couple of missing control register defines which otherwise would
prevent to convert other open-coded usages.

Acked-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/setup: make use of system_ctl_load()
Heiko Carstens [Mon, 11 Sep 2023 19:40:09 +0000 (21:40 +0200)]
s390/setup: make use of system_ctl_load()

Use system_ctl_load() instead of local_ctl_load() to reflect that
control register changes are supposed to be global.

Even though setup_cr() was ok, note that the usage of local_ctl_load()
would have been wrong, if it would have happened after the control
register save area was initialized: only local control register contents
would have been changed, but wouldn't be used for new CPUs.

With using system_ctl_load() the caller doesn't need to take care.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: add system_ctl_load()
Heiko Carstens [Mon, 11 Sep 2023 19:40:08 +0000 (21:40 +0200)]
s390/ctlreg: add system_ctl_load()

Add system_ctl_load() which can be used to load a value to a control
register system wide.

Refactor system_ctl_set_clear_bit() so it can handle all different
request types, and also rename it to system_ctlreg_modify()

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/early: use system_ctl_set_bit() instead of local_ctl_set_bit()
Heiko Carstens [Mon, 11 Sep 2023 19:40:07 +0000 (21:40 +0200)]
s390/early: use system_ctl_set_bit() instead of local_ctl_set_bit()

Use system_ctl_set_bit() instead of local_ctl_set_bit() to reflect
that the control register changes are supposed to be global. This
change is just for documentation purposes, since it still results only
in local control register contents being changed.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: allow to call system_ctl_set/clear_bit() early
Heiko Carstens [Mon, 11 Sep 2023 19:40:06 +0000 (21:40 +0200)]
s390/ctlreg: allow to call system_ctl_set/clear_bit() early

Allow to call system_ctl_set_bit() and system_clt_clear_bit() early, so
that users do not have to take care when the control register save area
has been initialized. Users are supposed to use system_ctl_set_bit() and
system:clt_clear_bit() for all control register changes which are supposed
to be seen globally.

Depending on the system state such calls will change:

- local control register contents
- save area and local control register contents
- save area and global control register contents

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctltreg: make initialization of control register save area explicit
Heiko Carstens [Mon, 11 Sep 2023 19:40:05 +0000 (21:40 +0200)]
s390/ctltreg: make initialization of control register save area explicit

Commit e1b9c2749af0 ("s390/smp: ensure global control register contents
are in sync") made the control register save area contained within the
lowcore at absolute address zero a resource which is used when
initializing CPUs. However this is anything but obvious from the code.

Add an ctlreg_init_save_area() function in order to make this explicit.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: add struct ctlreg
Heiko Carstens [Mon, 11 Sep 2023 19:40:04 +0000 (21:40 +0200)]
s390/ctlreg: add struct ctlreg

Add struct ctlreg to enforce strict type checking / usage for control
register functions.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: add type checking to __local_ctl_load() and __local_ctl_store()
Heiko Carstens [Mon, 11 Sep 2023 19:40:03 +0000 (21:40 +0200)]
s390/ctlreg: add type checking to __local_ctl_load() and __local_ctl_store()

Add type checking to __local_ctl_load() and __local_ctl_store(). For
both functions enforce to pass an array consisting of unsigned longs.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/kprobes,ptrace: open code struct per_reg
Heiko Carstens [Mon, 11 Sep 2023 19:40:02 +0000 (21:40 +0200)]
s390/kprobes,ptrace: open code struct per_reg

Open code struct per_regs within kprobes and ptrace code, since at both
locations a struct per_regs is passed to __local_ctl_load() and
__local_ctl_store() which prevents to implement type checking for both
functions.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: change parameters of __local_ctl_load() and __local_ctl_store()
Heiko Carstens [Mon, 11 Sep 2023 19:40:01 +0000 (21:40 +0200)]
s390/ctlreg: change parameters of __local_ctl_load() and __local_ctl_store()

Change __local_ctl_load() and __local_ctl_store(), so that control
register parameters come first.

This way all control handling functions consistently have control
register(s) parameter first.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: use local_ctl_load() and local_ctl_store() where possible
Heiko Carstens [Mon, 11 Sep 2023 19:40:00 +0000 (21:40 +0200)]
s390/ctlreg: use local_ctl_load() and local_ctl_store() where possible

Convert all single control register usages of __local_ctl_load() and
__local_ctl_store() to local_ctl_load() and local_ctl_store().

This also requires to change the type of some struct lowcore members
from __u64 to unsigned long.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: add local_ctl_load() and local_ctl_store()
Heiko Carstens [Mon, 11 Sep 2023 19:39:59 +0000 (21:39 +0200)]
s390/ctlreg: add local_ctl_load() and local_ctl_store()

Add local_ctl_load() and local_ctl_store() which load and store contents
for only a single control register.

This allows for easier to read code, but also better type checking,
since __local_ctl_load() and __local_ctl_store() do not come with any
type checking at all (which will be changed with subsequent patches).

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: add local and system prefix to some functions
Heiko Carstens [Mon, 11 Sep 2023 19:39:58 +0000 (21:39 +0200)]
s390/ctlreg: add local and system prefix to some functions

Add local and system prefix to some functions to clarify they change
control register contents on either the local CPU or the on all CPUs.

This results in the following API:

Two defines which load and save multiple control registers.
The defines correlate with the following C prototypes:

void __local_ctl_load(unsigned long *, unsigned int cr_low, unsigned int cr_high);
void __local_ctl_store(unsigned long *, unsigned int cr_low, unsigned int cr_high);

Two functions which locally set or clear one bit for a specified
control register:

void local_ctl_set_bit(unsigned int cr, unsigned int bit);
void local_ctl_clear_bit(unsigned int cr, unsigned int bit);

Two functions which set or clear one bit for a specified control
register on all CPUs:

void system_ctl_set_bit(unsigned int cr, unsigned int bit);
void system_ctl_clear_bit(unsigend int cr, unsigned int bit);

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: cleanup inline assemblies
Heiko Carstens [Mon, 11 Sep 2023 19:39:57 +0000 (21:39 +0200)]
s390/ctlreg: cleanup inline assemblies

Use symbolic names for operands, remove typedefs, and slightly
refactor the code.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: rename ctl_reg.h to ctlreg.h
Heiko Carstens [Mon, 11 Sep 2023 19:39:56 +0000 (21:39 +0200)]
s390/ctlreg: rename ctl_reg.h to ctlreg.h

Rename ctl_reg.h to ctlreg.h so it matches not only ctlreg.c but also
other control register related function, union, and structure names,
which all come with a ctlreg prefix.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/ctlreg: move control register code to separate file
Heiko Carstens [Mon, 11 Sep 2023 19:39:55 +0000 (21:39 +0200)]
s390/ctlreg: move control register code to separate file

Control register handling has nothing to do with low level SMP code.
Move it to a separate file.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/setup: use strlcat() instead of strcat()
Heiko Carstens [Mon, 11 Sep 2023 11:03:00 +0000 (13:03 +0200)]
s390/setup: use strlcat() instead of strcat()

Use strlcat() instead of strcat() in order to get rid of this W=1
warning:

In function ‘strlcat’,
    inlined from ‘strcat’ at ./include/linux/fortify-string.h:432:6,
    inlined from ‘setup_zfcpdump’ at arch/s390/kernel/setup.c:308:2,
    inlined from ‘setup_arch’ at arch/s390/kernel/setup.c:1002:2:
./include/linux/fortify-string.h:406:19: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
  406 |         p[actual] = '\0';
      |         ~~~~~~~~~~^~~~~~

As stated in fortify-string.h strcat() should not be used, since
FORTIFY_SOURCE cannot figure out the size of the destination buffer.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/smp: keep the original lowcore for CPU 0
Ilya Leoshkevich [Mon, 31 Jul 2023 15:07:11 +0000 (17:07 +0200)]
s390/smp: keep the original lowcore for CPU 0

Now that CPU 0 is not hotpluggable, it is not necessary to support
freeing its stacks. Delete all the code that migrates it to new stacks
and a new lowcore.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/smp: disallow CPU hotplug of CPU 0
Tobias Huschle [Wed, 16 Aug 2023 07:28:20 +0000 (09:28 +0200)]
s390/smp: disallow CPU hotplug of CPU 0

On s390, CPU 0 has special properties in comparison to other CPUs, as it
cannot be deconfigured for example. Therefore, allowing to hotplug CPU 0
introduces additional complexity when handling these properties.
Disallowing to hotplug CPU 0 allows to remove such complexities.

This follows x86 which also prevents offlining of CPU0 since commit
e59e74dc48a3 ("x86/topology: Remove CPU0 hotplug option").

[hca@linux.ibm.com: changed commit message]
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Tobias Huschle <huschle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/smp,mcck: fix early IPI handling
Heiko Carstens [Tue, 5 Sep 2023 13:49:37 +0000 (15:49 +0200)]
s390/smp,mcck: fix early IPI handling

Both the external call as well as the emergency signal submask bits in
control register 0 are set before any interrupt handler is registered.

Change the order and first register the interrupt handler and only then
enable the interrupts by setting the corresponding bits in control
register 0.

This prevents that the second part of the machine check handler for
early machine check handling is not executed: the machine check handler
sends an IPI to the CPU it runs on. If the corresponding interrupts are
enabled, but no interrupt handler is present, the interrupt is ignored.

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
18 months agos390/zcrypt: update list of EP11 operation modes
Ingo Franzki [Thu, 31 Aug 2023 15:36:43 +0000 (17:36 +0200)]
s390/zcrypt: update list of EP11 operation modes

Add additional operation mode strings into the EP11 operation mode table.
These strings are returned by sysfs entries /sys/devices/ap/cardxx/op_modes
and /sys/devices/ap/cardxx/xx.yyyy/op_modes.

Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
19 months agoLinux 6.6-rc2
Linus Torvalds [Sun, 17 Sep 2023 21:40:24 +0000 (14:40 -0700)]
Linux 6.6-rc2

19 months agoMerge tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 17 Sep 2023 18:13:37 +0000 (11:13 -0700)]
Merge tag 'x86-urgent-2023-09-17' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - Fix an UV boot crash

   - Skip spurious ENDBR generation on _THIS_IP_

   - Fix ENDBR use in putuser() asm methods

   - Fix corner case boot crashes on 5-level paging

   - and fix a false positive WARNING on LTO kernels"

* tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/purgatory: Remove LTO flags
  x86/boot/compressed: Reserve more memory for page tables
  x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*()
  x86/ibt: Suppress spurious ENDBR
  x86/platform/uv: Use alternate source for socket to node data

19 months agoMerge tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 17 Sep 2023 18:10:23 +0000 (11:10 -0700)]
Merge tag 'sched-urgent-2023-09-17' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
 "Fix a performance regression on large SMT systems, an Intel SMT4
  balancing bug, and a topology setup bug on (Intel) hybrid processors"

* tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain
  sched/fair: Fix SMT4 group_smt_balance handling
  sched/fair: Optimize should_we_balance() for large SMT systems

19 months agoMerge tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 17 Sep 2023 17:59:37 +0000 (10:59 -0700)]
Merge tag 'objtool-urgent-2023-09-17' of git://git./linux/kernel/git/tip/tip

Pull objtool fix from Ingo Molnar:
 "Fix a cold functions related false-positive objtool warning that
  triggers on Clang"

* tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix _THIS_IP_ detection for cold functions

19 months agoMerge tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 17 Sep 2023 17:55:35 +0000 (10:55 -0700)]
Merge tag 'core-urgent-2023-09-17' of git://git./linux/kernel/git/tip/tip

Pull WARN fix from Ingo Molnar:
 "Fix a missing preempt-enable in the WARN() slowpath"

* tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  panic: Reenable preemption in WARN slowpath

19 months agostat: remove no-longer-used helper macros
Linus Torvalds [Sun, 3 Sep 2023 18:09:56 +0000 (11:09 -0700)]
stat: remove no-longer-used helper macros

The choose_32_64() macros were added to deal with an odd inconsistency
between the 32-bit and 64-bit layout of 'struct stat' way back when in
commit a52dd971f947 ("vfs: de-crapify "cp_new_stat()" function").

Then a decade later Mikulas noticed that said inconsistency had been a
mistake in the early x86-64 port, and shouldn't have existed in the
first place.  So commit 932aba1e1690 ("stat: fix inconsistency between
struct stat and struct compat_stat") removed the uses of the helpers.

But the helpers remained around, unused.

Get rid of them.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
19 months agoMerge tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 17 Sep 2023 17:41:42 +0000 (10:41 -0700)]
Merge tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
 "Three small SMB3 client fixes, one to improve a null check and two
  minor cleanups"

* tag '6.6-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: fix some minor typos and repeated words
  smb3: correct places where ENOTSUPP is used instead of preferred EOPNOTSUPP
  smb3: move server check earlier when setting channel sequence number

19 months agoMerge tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd
Linus Torvalds [Sun, 17 Sep 2023 17:38:01 +0000 (10:38 -0700)]
Merge tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
 "Two ksmbd server fixes"

* tag '6.6-rc1-ksmbd' of git://git.samba.org/ksmbd:
  ksmbd: fix passing freed memory 'aux_payload_buf'
  ksmbd: remove unneeded mark_inode_dirty in set_info_sec()

19 months agoMerge tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 17 Sep 2023 17:33:53 +0000 (10:33 -0700)]
Merge tag 'ext4_for_linus-6.6-rc2' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Regression and bug fixes for ext4"

* tag 'ext4_for_linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix rec_len verify error
  ext4: do not let fstrim block system suspend
  ext4: move setting of trimmed bit into ext4_try_to_trim_range()
  jbd2: Fix memory leak in journal_init_common()
  jbd2: Remove page size assumptions
  buffer: Make bh_offset() work for compound pages

19 months agox86/purgatory: Remove LTO flags
Song Liu [Thu, 14 Sep 2023 17:01:38 +0000 (10:01 -0700)]
x86/purgatory: Remove LTO flags

-flto* implies -ffunction-sections. With LTO enabled, ld.lld generates
multiple .text sections for purgatory.ro:

  $ readelf -S purgatory.ro  | grep " .text"
    [ 1] .text             PROGBITS         0000000000000000  00000040
    [ 7] .text.purgatory   PROGBITS         0000000000000000  000020e0
    [ 9] .text.warn        PROGBITS         0000000000000000  000021c0
    [13] .text.sha256_upda PROGBITS         0000000000000000  000022f0
    [15] .text.sha224_upda PROGBITS         0000000000000000  00002be0
    [17] .text.sha256_fina PROGBITS         0000000000000000  00002bf0
    [19] .text.sha224_fina PROGBITS         0000000000000000  00002cc0

This causes WARNING from kexec_purgatory_setup_sechdrs():

  WARNING: CPU: 26 PID: 110894 at kernel/kexec_file.c:919
  kexec_load_purgatory+0x37f/0x390

Fix this by disabling LTO for purgatory.

[ AFAICT, x86 is the only arch that supports LTO and purgatory. ]

We could also fix this with an explicit linker script to rejoin .text.*
sections back into .text. However, given the benefit of LTOing purgatory
is small, simply disable the production of more .text.* sections for now.

Fixes: b33fff07e3e3 ("x86, build: allow LTO to be selected")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230914170138.995606-1-song@kernel.org
19 months agox86/boot/compressed: Reserve more memory for page tables
Kirill A. Shutemov [Fri, 15 Sep 2023 07:02:21 +0000 (10:02 +0300)]
x86/boot/compressed: Reserve more memory for page tables

The decompressor has a hard limit on the number of page tables it can
allocate. This limit is defined at compile-time and will cause boot
failure if it is reached.

The kernel is very strict and calculates the limit precisely for the
worst-case scenario based on the current configuration. However, it is
easy to forget to adjust the limit when a new use-case arises. The
worst-case scenario is rarely encountered during sanity checks.

In the case of enabling 5-level paging, a use-case was overlooked. The
limit needs to be increased by one to accommodate the additional level.
This oversight went unnoticed until Aaron attempted to run the kernel
via kexec with 5-level paging and unaccepted memory enabled.

Update wost-case calculations to include 5-level paging.

To address this issue, let's allocate some extra space for page tables.
128K should be sufficient for any use-case. The logic can be simplified
by using a single value for all kernel configurations.

[ Also add a warning, should this memory run low - by Dave Hansen. ]

Fixes: 34bbb0009f3b ("x86/boot/compressed: Enable 5-level paging during decompression stage")
Reported-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com
19 months agoMerge tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahi...
Linus Torvalds [Sat, 16 Sep 2023 22:27:00 +0000 (15:27 -0700)]
Merge tag 'kbuild-fixes-v6.6' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Fix kernel-devel RPM and linux-headers Deb package

 - Fix too long argument list error in 'make modules_install'

* tag 'kbuild-fixes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: avoid long argument lists in make modules_install
  kbuild: fix kernel-devel RPM package and linux-headers Deb package

19 months agovm: fix move_vma() memory accounting being off
Linus Torvalds [Sat, 16 Sep 2023 19:31:42 +0000 (12:31 -0700)]
vm: fix move_vma() memory accounting being off

Commit 408579cd627a ("mm: Update do_vmi_align_munmap() return
semantics") seems to have updated one of the callers of do_vmi_munmap()
incorrectly: it used to check for the error case (which didn't
change: negative means error).

That commit changed the check to the success case (which did change:
before that commit, 0 was success, and 1 was "success and lock
downgraded".  After the change, it's always 0 for success, and the lock
will have been released if requested).

This didn't change any actual VM behavior _except_ for memory accounting
when 'VM_ACCOUNT' was set on the vma.  Which made the wrong return value
test fairly subtle, since everything continues to work.

Or rather - it continues to work but the "Committed memory" accounting
goes all wonky (Committed_AS value in /proc/meminfo), and depending on
settings that then causes problems much much later as the VM relies on
bogus statistics for its heuristics.

Revert that one line of the change back to the original logic.

Fixes: 408579cd627a ("mm: Update do_vmi_align_munmap() return semantics")
Reported-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Reported-bisected-and-tested-by: Michael Labiuk <michael.labiuk@virtuozzo.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/all/1694366957@msgid.manchmal.in-ulm.de/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
19 months agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 16 Sep 2023 18:54:48 +0000 (11:54 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "16 small(ish) fixes all in drivers.

  The major fixes are in pm8001 (fixes MSI-X issue going back to its
  origin), the qla2xxx endianness fix, which fixes a bug on big endian
  and the lpfc ones which can cause an oops on module removal without
  them"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: lpfc: Prevent use-after-free during rmmod with mapped NVMe rports
  scsi: lpfc: Early return after marking final NLP_DROPPED flag in dev_loss_tmo
  scsi: lpfc: Fix the NULL vs IS_ERR() bug for debugfs_create_file()
  scsi: target: core: Fix target_cmd_counter leak
  scsi: pm8001: Setup IRQs on resume
  scsi: pm80xx: Avoid leaking tags when processing OPC_INB_SET_CONTROLLER_CONFIG command
  scsi: pm80xx: Use phy-specific SAS address when sending PHY_START command
  scsi: ufs: core: Poll HCS.UCRDY before issuing a UIC command
  scsi: ufs: core: Move __ufshcd_send_uic_cmd() outside host_lock
  scsi: qedf: Add synchronization between I/O completions and abort
  scsi: target: Replace strlcpy() with strscpy()
  scsi: qla2xxx: Fix NULL vs IS_ERR() bug for debugfs_create_dir()
  scsi: qla2xxx: Use raw_smp_processor_id() instead of smp_processor_id()
  scsi: qla2xxx: Correct endianness for rqstlen and rsplen
  scsi: ppa: Fix accidentally reversed conditions for 16-bit and 32-bit EPP
  scsi: megaraid_sas: Fix deadlock on firmware crashdump

19 months agoMerge tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Sat, 16 Sep 2023 18:49:57 +0000 (11:49 -0700)]
Merge tag 'ata-6.6-rc2' of git://git./linux/kernel/git/dlemoal/libata

Pull ata fixes from Damien Le Moal:

 - Fix link power management transitions to disallow unsupported states
   (Niklas)

 - A small string handling fix for the sata_mv driver (Christophe)

 - Clear port pending interrupts before reset, as per AHCI
   specifications (Szuying).

   Followup fixes for this one are to not clear ATA_PFLAG_EH_PENDING in
   ata_eh_reset() to allow EH to continue on with other actions recorded
   with error interrupts triggered before EH completes. And an
   additional fix to avoid thawing a port twice in EH (Niklas)

 - Small code style fixes in the pata_parport driver to silence the
   build bot as it keeps complaining about bad indentation (me)

 - A fix for the recent CDL code to avoid fetching sense data for
   successful commands when not necessary for correct operation (Niklas)

* tag 'ata-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: libata-core: fetch sense data for successful commands iff CDL enabled
  ata: libata-eh: do not thaw the port twice in ata_eh_reset()
  ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
  ata: pata_parport: Fix code style issues
  ata: libahci: clear pending interrupt status
  ata: sata_mv: Fix incorrect string length computation in mv_dump_mem()
  ata: libata: disallow dev-initiated LPM transitions to unsupported states

19 months agoMerge tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sat, 16 Sep 2023 18:37:11 +0000 (11:37 -0700)]
Merge tag 'usb-6.6-rc2' of git://git./linux/kernel/git/gregkh/usb

Pull USB fix from Greg KH:
 "Here is a single USB fix for a much-reported regression for 6.6-rc1.

  It resolves a crash in the typec debugfs code for many systems. It's
  been in linux-next with no reported issues, and many people have
  reported it resolving their problem with 6.6-rc1"

* tag 'usb-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: ucsi: Fix NULL pointer dereference

19 months agoMerge tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 16 Sep 2023 18:26:52 +0000 (11:26 -0700)]
Merge tag 'driver-core-6.6-rc2' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here is a single driver core fix for a much-reported-by-sysbot issue
  that showed up in 6.6-rc1. It's been submitted by many people, all in
  the same way, so it obviously fixes things for them all.

  Also in here is a single documentation update adding riscv to the
  embargoed hardware document in case there are any future issues with
  that processor family.

  Both of these have been in linux-next with no reported problems"

* tag 'driver-core-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  Documentation: embargoed-hardware-issues.rst: Add myself for RISC-V
  driver core: return an error when dev_set_name() hasn't happened

19 months agoMerge tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sat, 16 Sep 2023 18:17:19 +0000 (11:17 -0700)]
Merge tag 'char-misc-6.6-rc2' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc fix from Greg KH:
 "Here is a single patch for 6.6-rc2 that reverts a 6.5 change for the
  comedi subsystem that has ended up being incorrect and caused drivers
  that were working for people to be unable to be able to be selected to
  build at all.

  To fix this, the Kconfig change needs to be reverted and a future set
  of fixes for the ioport dependancies will show up in 6.7-rc1 (there's
  no rush for them.)

  This has been in linux-next with no reported issues"

* tag 'char-misc-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Revert "comedi: add HAS_IOPORT dependencies"

19 months agoMerge tag 'i2c-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 16 Sep 2023 18:09:18 +0000 (11:09 -0700)]
Merge tag 'i2c-for-6.6-rc2' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "The main thing is the removal of 'probe_new' because all i2c client
  drivers are converted now. Thanks Uwe, this marks the end of a long
  conversion process.

  Other than that, we have a few Kconfig updates and driver bugfixes"

* tag 'i2c-for-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: cadence: Fix the kernel-doc warnings
  i2c: aspeed: Reset the i2c controller when timeout occurs
  i2c: I2C_MLXCPLD on ARM64 should depend on ACPI
  i2c: Make I2C_ATR invisible
  i2c: Drop legacy callback .probe_new()
  w1: ds2482: Switch back to use struct i2c_driver's .probe()

19 months agoata: libata-core: fetch sense data for successful commands iff CDL enabled
Niklas Cassel [Wed, 13 Sep 2023 15:04:43 +0000 (17:04 +0200)]
ata: libata-core: fetch sense data for successful commands iff CDL enabled

Currently, we fetch sense data for a _successful_ command if either:
1) Command was NCQ and ATA_DFLAG_CDL_ENABLED flag set (flag
   ATA_DFLAG_CDL_ENABLED will only be set if the Successful NCQ command
   sense data supported bit is set); or
2) Command was non-NCQ and regular sense data reporting is enabled.

This means that case 2) will trigger for a non-NCQ command which has
ATA_SENSE bit set, regardless if CDL is enabled or not.

This decision was by design. If the device reports that it has sense data
available, it makes sense to fetch that sense data, since the sk/asc/ascq
could be important information regardless if CDL is enabled or not.

However, the fetching of sense data for a successful command is done via
ATA EH. Considering how intricate the ATA EH is, we really do not want to
invoke ATA EH unless absolutely needed.

Before commit 18bd7718b5c4 ("scsi: ata: libata: Handle completion of CDL
commands using policy 0xD") we never fetched sense data for successful
commands.

In order to not invoke the ATA EH unless absolutely necessary, even if the
device claims support for sense data reporting, only fetch sense data for
successful (NCQ and non-NCQ commands) commands that are using CDL.

[Damien] Modified the check to test the qc flag ATA_QCFLAG_HAS_CDL
instead of the device support for CDL, which is implied for commands
using CDL.

Fixes: 3ac873c76d79 ("ata: libata-core: fix when to fetch sense data for successful commands")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
19 months agoata: libata-eh: do not thaw the port twice in ata_eh_reset()
Niklas Cassel [Wed, 13 Sep 2023 22:19:17 +0000 (00:19 +0200)]
ata: libata-eh: do not thaw the port twice in ata_eh_reset()

commit 1e641060c4b5 ("libata: clear eh_info on reset completion") added
a workaround that broke the retry mechanism in ATA EH.

Tejun himself suggested to remove this workaround when it was identified
to cause additional problems:
https://lore.kernel.org/linux-ide/20110426135027.GI878@htj.dyndns.org/

He even said:
"Hmm... it seems I wasn't thinking straight when I added that work around."
https://lore.kernel.org/linux-ide/20110426155229.GM878@htj.dyndns.org/

While removing the workaround solved the issue, however, the workaround was
kept to avoid "spurious hotplug events during reset", and instead another
workaround was added on top of the existing workaround in commit
8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").

Because these IRQs happened when the port was frozen, we know that they
were actually a side effect of PxIS and IS.IPS(x) not being cleared before
the COMRESET. This is now done in commit 94152042eaa9 ("ata: libahci: clear
pending interrupt status"), so these workarounds can now be removed.

Since commit 1e641060c4b5 ("libata: clear eh_info on reset completion") has
now been reverted, the ATA EH retry mechanism is functional again, so there
is once again no need to thaw the port more than once in ata_eh_reset().

This reverts "the workaround on top of the workaround" introduced in commit
8c56cacc724c ("libata: fix unexpectedly frozen port after ata_eh_reset()").

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
19 months agoata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()
Niklas Cassel [Wed, 13 Sep 2023 22:19:16 +0000 (00:19 +0200)]
ata: libata-eh: do not clear ATA_PFLAG_EH_PENDING in ata_eh_reset()

ata_scsi_port_error_handler() starts off by clearing ATA_PFLAG_EH_PENDING,
before calling ap->ops->error_handler() (without holding the ap->lock).

If an error IRQ is received while ap->ops->error_handler() is running,
the irq handler will set ATA_PFLAG_EH_PENDING.

Once ap->ops->error_handler() returns, ata_scsi_port_error_handler()
checks if ATA_PFLAG_EH_PENDING is set, and if it is, another iteration
of ATA EH is performed.

The problem is that ATA_PFLAG_EH_PENDING is not only cleared by
ata_scsi_port_error_handler(), it is also cleared by ata_eh_reset().

ata_eh_reset() is called by ap->ops->error_handler(). This additional
clearing done by ata_eh_reset() breaks the whole retry logic in
ata_scsi_port_error_handler(). Thus, if an error IRQ is received while
ap->ops->error_handler() is running, the port will currently remain
frozen and will never get re-enabled.

The additional clearing in ata_eh_reset() was introduced in commit
1e641060c4b5 ("libata: clear eh_info on reset completion").

Looking at the original error report:
https://marc.info/?l=linux-ide&m=124765325828495&w=2

We can see the following happening:
[    1.074659] ata3: XXX port freeze
[    1.074700] ata3: XXX hardresetting link, stopping engine
[    1.074746] ata3: XXX flipping SControl

[    1.411471] ata3: XXX irq_stat=400040 CONN|PHY
[    1.411475] ata3: XXX port freeze

[    1.420049] ata3: XXX starting engine
[    1.420096] ata3: XXX rc=0, class=1
[    1.420142] ata3: XXX clearing IRQs for thawing
[    1.420188] ata3: XXX port thawed
[    1.420234] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

We are not supposed to be able to receive an error IRQ while the port is
frozen (PxIE is set to 0, i.e. all IRQs for the port are disabled).

AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) states:
"Each bit location can be thought of as reporting a '1' if the virtual
"interrupt line" for that port is indicating it wishes to generate an
interrupt. That is, if a port has one or more interrupt status bit set,
and the enables for those status bits are set, then this bit shall be set."

Additionally, AHCI state P:ComInit clearly shows that the state machine
will only jump to P:ComInitSetIS (which sets IS.IPS(x) to '1'), if PxIE.PCE
is set to '1'. In our case, PxIE is set to 0, so IS.IPS(x) won't get set.

So IS.IPS(x) only gets set if PxIS and PxIE is set.

AHCI 1.3.1 section 10.7.1.1 First Tier (IS Register) also states:
"The bits in this register are read/write clear. It is set by the level of
the virtual interrupt line being a set, and cleared by a write of '1' from
the software."

So if IS.IPS(x) is set, you need to explicitly clear it by writing a 1 to
IS.IPS(x) for that port.

Since PxIE is cleared, the only way to get an interrupt while the port is
frozen, is if IS.IPS(x) is set, and the only way IS.IPS(x) can be set when
the port is frozen, is if it was set before the port was frozen.

However, since commit 737dd811a3db ("ata: libahci: clear pending interrupt
status"), we clear both PxIS and IS.IPS(x) after freezing the port, but
before the COMRESET, so the problem that commit 1e641060c4b5 ("libata:
clear eh_info on reset completion") fixed can no longer happen.

Thus, revert commit 1e641060c4b5 ("libata: clear eh_info on reset
completion"), so that the retry logic in ata_scsi_port_error_handler()
works once again. (The retry logic is still needed, since we can still
get an error IRQ _after_ the port has been thawed, but before
ata_scsi_port_error_handler() takes the ap->lock in order to check
if ATA_PFLAG_EH_PENDING is set.)

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
19 months agoMerge tag 'linux-kselftest-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Sat, 16 Sep 2023 02:22:20 +0000 (19:22 -0700)]
Merge tag 'linux-kselftest-fixes-6.6-rc2' of git://git./linux/kernel/git/shuah/linux-kselftest

Pull more kselftest fixes from Shuah Khan
 "Fixes to user_events test and ftrace test.

  The user_events test was enabled by default in Linux 6.6-rc1. The
  following fixes are for bugs found since then:

   - add checks for dependencies and skip the test if they aren't met.

     The user_events test requires root access, and tracefs and
     user_events enabled. It leaves tracefs mounted and a fix is in
     progress for that missing piece.

   - create user_events test-specific Kconfig fragments

  ftrace test fixes:

   - unmount tracefs for recovering environment. Fix identified during
     the above mentioned user_events dependencies fix.

   - adds softlink to latest log directory improving usage"

* tag 'linux-kselftest-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: tracing: Fix to unmount tracefs for recovering environment
  selftests: user_events: create test-specific Kconfig fragments
  ftrace/selftests: Add softlink to latest log directory
  selftests/user_events: Fix failures when user_events is not installed

19 months agoMerge tag 'nfsd-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Linus Torvalds [Fri, 15 Sep 2023 23:48:44 +0000 (16:48 -0700)]
Merge tag 'nfsd-6.6-1' of git://git./linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Use correct order when encoding NFSv4 RENAME change_info

 - Fix a potential oops during NFSD shutdown

* tag 'nfsd-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: fix possible oops when nfsd/pool_stats is closed.
  nfsd: fix change_info in NFSv4 RENAME replies

19 months agoMerge tag 'pm-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 15 Sep 2023 22:11:53 +0000 (15:11 -0700)]
Merge tag 'pm-6.6-rc2' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "Fix the handling of block devices in the test_resume mode of
  hibernation (Chen Yu)"

* tag 'pm-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: hibernate: Fix the exclusive get block device in test_resume mode
  PM: hibernate: Rename function parameter from snapshot_test to exclusive

19 months agoMerge tag 'thermal-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 15 Sep 2023 21:52:59 +0000 (14:52 -0700)]
Merge tag 'thermal-6.6-rc2' of git://git./linux/kernel/git/rafael/linux-pm

Pull thermal control fixes from Rafael Wysocki:
 "These fix a thermal core breakage introduced by one of the recent
  changes, amend those changes by adding 'const' to a new callback
  argument and fix two memory leaks.

  Specifics:

   - Unbreak disabled trip point check in handle_thermal_trip() that may
     cause it to skip enabled trip points (Rafael Wysocki)

   - Add missing of_node_put() to of_find_trip_id() and
     thermal_of_for_each_cooling_maps() that each break out of a
     for_each_child_of_node() loop without dropping the reference to the
     child object (Julia Lawall)

   - Constify the recently added trip argument of the .get_trend()
     thermal zone callback (Rafael Wysocki)"

* tag 'thermal-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: core: Fix disabled trip point check in handle_thermal_trip()
  thermal: Constify the trip argument of the .get_trend() zone callback
  thermal/of: add missing of_node_put()

19 months agoMerge tag 'for-6.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Fri, 15 Sep 2023 21:30:54 +0000 (14:30 -0700)]
Merge tag 'for-6.6/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM core retrieve_deps() UAF race due to missing locking of a DM
   table's list of devices that is managed using dm_{get,put}_device.

 - Revert DM core's half-baked RCU optimization if IO submitter has set
   REQ_NOWAIT. Can be revisited, and properly justified, after
   comprehensively auditing all of DM to also pass GFP_NOWAIT for any
   allocations if REQ_NOWAIT used.

* tag 'for-6.6/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: don't attempt to queue IO under RCU protection
  dm: fix a race condition in retrieve_deps

19 months agoMerge tag 'block-6.6-2023-09-15' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 15 Sep 2023 21:05:58 +0000 (14:05 -0700)]
Merge tag 'block-6.6-2023-09-15' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull via Keith:
      - nvme-tcp iov len fix (Varun)
      - nvme-hwmon const qualifier for safety (Krzysztof)
      - nvme-fc null pointer checks (Nigel)
      - nvme-pci no numa node fix (Pratyush)
      - nvme timeout fix for non-compliant controllers (Keith)

 - MD pull via Song fixing regressions with both 6.5 and 6.6

 - Fix a use-after-free regression in resizing blk-mq tags (Chengming)

* tag 'block-6.6-2023-09-15' of git://git.kernel.dk/linux:
  nvme: avoid bogus CRTO values
  md: Put the right device in md_seq_next
  nvme-pci: do not set the NUMA node of device if it has none
  blk-mq: fix tags UAF when shrinking q->nr_hw_queues
  md/raid1: fix error: ISO C90 forbids mixed declarations
  md: fix warning for holder mismatch from export_rdev()
  md: don't dereference mddev after export_rdev()
  nvme-fc: Prevent null pointer dereference in nvme_fc_io_getuuid()
  nvme: host: hwmon: constify pointers to hwmon_channel_info
  nvmet-tcp: pass iov_len instead of sg->length to bvec_set_page()

19 months agoMerge tag 'io_uring-6.6-2023-09-15' of git://git.kernel.dk/linux
Linus Torvalds [Fri, 15 Sep 2023 20:55:29 +0000 (13:55 -0700)]
Merge tag 'io_uring-6.6-2023-09-15' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
 "Just a single fix, fixing a regression with poll first, recvmsg, and
  using a provided buffer"

* tag 'io_uring-6.6-2023-09-15' of git://git.kernel.dk/linux:
  io_uring/net: fix iter retargeting for selected buf

19 months agoMerge tag 'firewire-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 15 Sep 2023 20:51:01 +0000 (13:51 -0700)]
Merge tag 'firewire-fixes-6.6-rc2' of git://git./linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Takashi Sakamoto:
 "A change applied to v6.5 kernel brings an issue that usual GFP
  allocation is done in atomic context under acquired spin-lock. Let us
  revert it"

* tag 'firewire-fixes-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  Revert "firewire: core: obsolete usage of GFP_ATOMIC at building node tree"

19 months agoMerge tag 'drm-fixes-2023-09-15' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 15 Sep 2023 20:25:52 +0000 (13:25 -0700)]
Merge tag 'drm-fixes-2023-09-15' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Regular rc2 fixes pull, mostly made up of amdgpu stuff, one i915, and
  a bunch of others, one vkms locking violation is reverted.

  connector:
   - doc fix

  exec:
   - workaround lockdep issue

  tests:
   - fix a UAF

  vkms:
   - revert hrtimer fix

  fbdev:
   - g364fb: fix build failure with mips

  i915:
   - Only check eDP HPD when AUX CH is shared.

  amdgpu:
   - GC 9.4.3 fixes
   - Fix white screen issues with S/G display on system with >= 64G of ram
   - Replay fixes
   - SMU 13.0.6 fixes
   - AUX backlight fix
   - NBIO 4.3 SR-IOV fixes for HDP
   - RAS fixes
   - DP MST resume fix
   - Fix segfault on systems with no vbios
   - DPIA fixes

  amdkfd:
   - CWSR grace period fix
   - Unaligned doorbell fix
   - CRIU fix for GFX11
   - Add missing TLB flush on gfx10 and newer

  radeon:
   - make fence wait in suballocator uninterrruptable

  gm12u320:
   - Fix the timeout usage for usb_bulk_msg()"

* tag 'drm-fixes-2023-09-15' of git://anongit.freedesktop.org/drm/drm: (29 commits)
  drm/tests: helpers: Avoid a driver uaf
  Revert "drm/vkms: Fix race-condition between the hrtimer and the atomic commit"
  drm/amdkfd: Insert missing TLB flush on GFX10 and later
  drm/i915: Only check eDP HPD when AUX CH is shared
  drm/amd/display: Fix 2nd DPIA encoder Assignment
  drm/amd/display: Add DPIA Link Encoder Assignment Fix
  drm/amd/display: fix replay_mode kernel-doc warning
  drm/amdgpu: Handle null atom context in VBIOS info ioctl
  drm/amdkfd: Checkpoint and restore queues on GFX11
  drm/amd/display: Adjust the MST resume flow
  drm/amdgpu: fallback to old RAS error message for aqua_vanjaram
  drm/amdgpu/nbio4.3: set proper rmmio_remap.reg_offset for SR-IOV
  drm/amdgpu/soc21: don't remap HDP registers for SR-IOV
  drm/amd/display: Don't check registers, if using AUX BL control
  drm/amdgpu: fix retry loop test
  drm/amd/display: Add dirty rect support for Replay
  Revert "drm/amd: Disable S/G for APUs when 64GB or more host memory"
  drm/amd/display: fix the white screen issue when >= 64GB DRAM
  drm/amdkfd: Update CU masking for GFX 9.4.3
  drm/amdkfd: Update cache info reporting for GFX v9.4.3
  ...

19 months agoMerge tag 'efi-fixes-for-v6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 15 Sep 2023 19:42:48 +0000 (12:42 -0700)]
Merge tag 'efi-fixes-for-v6.6-1' of git://git./linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

 - Missing x86 patch for the runtime cleanup that was merged in -rc1

 - Kconfig tweak for kexec on x86 so EFI support does not get disabled
   inadvertently

 - Use the right EFI memory type for the unaccepted memory table so
   kexec/kdump exposes it to the crash kernel as well

 - Work around EFI implementations which do not implement
   QueryVariableInfo, which is now called by statfs() on efivarfs

* tag 'efi-fixes-for-v6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efivarfs: fix statfs() on efivarfs
  efi/unaccepted: Use ACPI reclaim memory for unaccepted memory table
  efi/x86: Ensure that EFI_RUNTIME_MAP is enabled for kexec
  efi/x86: Move EFI runtime call setup/teardown helpers out of line

19 months agodm: don't attempt to queue IO under RCU protection
Jens Axboe [Fri, 15 Sep 2023 19:14:23 +0000 (13:14 -0600)]
dm: don't attempt to queue IO under RCU protection

dm looks up the table for IO based on the request type, with an
assumption that if the request is marked REQ_NOWAIT, it's fine to
attempt to submit that IO while under RCU read lock protection. This
is not OK, as REQ_NOWAIT just means that we should not be sleeping
waiting on other IO, it does not mean that we can't potentially
schedule.

A simple test case demonstrates this quite nicely:

int main(int argc, char *argv[])
{
        struct iovec iov;
        int fd;

        fd = open("/dev/dm-0", O_RDONLY | O_DIRECT);
        posix_memalign(&iov.iov_base, 4096, 4096);
        iov.iov_len = 4096;
        preadv2(fd, &iov, 1, 0, RWF_NOWAIT);
        return 0;
}

which will instantly spew:

BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 5580, name: dm-nowait
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
INFO: lockdep is turned off.
CPU: 7 PID: 5580 Comm: dm-nowait Not tainted 6.6.0-rc1-g39956d2dcd81 #132
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x11d/0x1b0
 __might_resched+0x3c3/0x5e0
 ? preempt_count_sub+0x150/0x150
 mempool_alloc+0x1e2/0x390
 ? mempool_resize+0x7d0/0x7d0
 ? lock_sync+0x190/0x190
 ? lock_release+0x4b7/0x670
 ? internal_get_user_pages_fast+0x868/0x2d40
 bio_alloc_bioset+0x417/0x8c0
 ? bvec_alloc+0x200/0x200
 ? internal_get_user_pages_fast+0xb8c/0x2d40
 bio_alloc_clone+0x53/0x100
 dm_submit_bio+0x27f/0x1a20
 ? lock_release+0x4b7/0x670
 ? blk_try_enter_queue+0x1a0/0x4d0
 ? dm_dax_direct_access+0x260/0x260
 ? rcu_is_watching+0x12/0xb0
 ? blk_try_enter_queue+0x1cc/0x4d0
 __submit_bio+0x239/0x310
 ? __bio_queue_enter+0x700/0x700
 ? kvm_clock_get_cycles+0x40/0x60
 ? ktime_get+0x285/0x470
 submit_bio_noacct_nocheck+0x4d9/0xb80
 ? should_fail_request+0x80/0x80
 ? preempt_count_sub+0x150/0x150
 ? lock_release+0x4b7/0x670
 ? __bio_add_page+0x143/0x2d0
 ? iov_iter_revert+0x27/0x360
 submit_bio_noacct+0x53e/0x1b30
 submit_bio_wait+0x10a/0x230
 ? submit_bio_wait_endio+0x40/0x40
 __blkdev_direct_IO_simple+0x4f8/0x780
 ? blkdev_bio_end_io+0x4c0/0x4c0
 ? stack_trace_save+0x90/0xc0
 ? __bio_clone+0x3c0/0x3c0
 ? lock_release+0x4b7/0x670
 ? lock_sync+0x190/0x190
 ? atime_needs_update+0x3bf/0x7e0
 ? timestamp_truncate+0x21b/0x2d0
 ? inode_owner_or_capable+0x240/0x240
 blkdev_direct_IO.part.0+0x84a/0x1810
 ? rcu_is_watching+0x12/0xb0
 ? lock_release+0x4b7/0x670
 ? blkdev_read_iter+0x40d/0x530
 ? reacquire_held_locks+0x4e0/0x4e0
 ? __blkdev_direct_IO_simple+0x780/0x780
 ? rcu_is_watching+0x12/0xb0
 ? __mark_inode_dirty+0x297/0xd50
 ? preempt_count_add+0x72/0x140
 blkdev_read_iter+0x2a4/0x530
 do_iter_readv_writev+0x2f2/0x3c0
 ? generic_copy_file_range+0x1d0/0x1d0
 ? fsnotify_perm.part.0+0x25d/0x630
 ? security_file_permission+0xd8/0x100
 do_iter_read+0x31b/0x880
 ? import_iovec+0x10b/0x140
 vfs_readv+0x12d/0x1a0
 ? vfs_iter_read+0xb0/0xb0
 ? rcu_is_watching+0x12/0xb0
 ? rcu_is_watching+0x12/0xb0
 ? lock_release+0x4b7/0x670
 do_preadv+0x1b3/0x260
 ? do_readv+0x370/0x370
 __x64_sys_preadv2+0xef/0x150
 do_syscall_64+0x39/0xb0
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5af41ad806
Code: 41 54 41 89 fc 55 44 89 c5 53 48 89 cb 48 83 ec 18 80 3d e4 dd 0d 00 00 74 7a 45 89 c1 49 89 ca 45 31 c0 b8 47 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 be 00 00 00 48 85 c0 79 4a 48 8b 0d da 55
RSP: 002b:00007ffd3145c7f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000147
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5af41ad806
RDX: 0000000000000001 RSI: 00007ffd3145c850 RDI: 0000000000000003
RBP: 0000000000000008 R08: 0000000000000000 R09: 0000000000000008
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
R13: 00007ffd3145c850 R14: 000055f5f0431dd8 R15: 0000000000000001
 </TASK>

where in fact it is dm itself that attempts to allocate a bio clone with
GFP_NOIO under the rcu read lock, regardless of the request type.

Fix this by getting rid of the special casing for REQ_NOWAIT, and just
use the normal SRCU protected table lookup. Get rid of the bio based
table locking helpers at the same time, as they are now unused.

Cc: stable@vger.kernel.org
Fixes: 563a225c9fd2 ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
19 months agoMerge tag 'selinux-pr-20230914' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 15 Sep 2023 19:38:44 +0000 (12:38 -0700)]
Merge tag 'selinux-pr-20230914' of git://git./linux/kernel/git/pcmoore/selinux

Pull selinux fix from Paul Moore:
 "A relatively small SELinux patch to fix an issue with a
  vfs/LSM/SELinux patch that went upstream during the recent merge
  window.

  The short version is that the original patch changed how we
  initialized mount options to resolve a NFS issue and we inadvertently
  broke a use case due to the changed behavior.

  The fix restores this behavior for the cases that require it while
  keeping the original NFS fix in place"

* tag 'selinux-pr-20230914' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: fix handling of empty opts in selinux_fs_context_submount()

19 months agoMerge tag 'riscv-for-linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 15 Sep 2023 19:33:01 +0000 (12:33 -0700)]
Merge tag 'riscv-for-linus-6.6-rc2' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A fix to align kexec'd kernels to PMD boundries

 - The T-Head dcache.cva encoding was incorrect, it has been fixed to
   invalidate all caches (as opposed to just the L1)

* tag 'riscv-for-linus-6.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: errata: fix T-Head dcache.cva encoding
  riscv: kexec: Align the kexeced kernel entry

19 months agoRevert "firewire: core: obsolete usage of GFP_ATOMIC at building node tree"
Takashi Sakamoto [Fri, 15 Sep 2023 09:33:59 +0000 (18:33 +0900)]
Revert "firewire: core: obsolete usage of GFP_ATOMIC at building node tree"

This reverts commit 06f45435d985d60d7d2fe2424fbb9909d177a63d.

John Ogness reports the case that the allocation is in atomic context under
acquired spin-lock.

[   12.555784] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:306
[   12.555808] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 70, name: kworker/1:2
[   12.555814] preempt_count: 1, expected: 0
[   12.555820] INFO: lockdep is turned off.
[   12.555824] irq event stamp: 208
[   12.555828] hardirqs last  enabled at (207): [<c00000000111e414>] ._raw_spin_unlock_irq+0x44/0x80
[   12.555850] hardirqs last disabled at (208): [<c00000000110ff94>] .__schedule+0x854/0xfe0
[   12.555859] softirqs last  enabled at (188): [<c000000000f73504>] .addrconf_verify_rtnl+0x2c4/0xb70
[   12.555872] softirqs last disabled at (182): [<c000000000f732b0>] .addrconf_verify_rtnl+0x70/0xb70
[   12.555884] CPU: 1 PID: 70 Comm: kworker/1:2 Tainted: G S                 6.6.0-rc1 #1
[   12.555893] Hardware name: PowerMac7,2 PPC970 0x390202 PowerMac
[   12.555898] Workqueue: firewire_ohci .bus_reset_work [firewire_ohci]
[   12.555939] Call Trace:
[   12.555944] [c000000009677830] [c0000000010d83c0] .dump_stack_lvl+0x8c/0xd0 (unreliable)
[   12.555963] [c0000000096778b0] [c000000000140270] .__might_resched+0x320/0x340
[   12.555978] [c000000009677940] [c000000000497600] .__kmem_cache_alloc_node+0x390/0x460
[   12.555993] [c000000009677a10] [c0000000003fe620] .__kmalloc+0x70/0x310
[   12.556007] [c000000009677ac0] [c0003d00004e2268] .fw_core_handle_bus_reset+0x2c8/0xba0 [firewire_core]
[   12.556060] [c000000009677c20] [c0003d0000491190] .bus_reset_work+0x330/0x9b0 [firewire_ohci]
[   12.556079] [c000000009677d10] [c00000000011d0d0] .process_one_work+0x280/0x6f0
[   12.556094] [c000000009677e10] [c00000000011d8a0] .worker_thread+0x360/0x500
[   12.556107] [c000000009677ef0] [c00000000012e3b4] .kthread+0x154/0x160
[   12.556120] [c000000009677f90] [c00000000000bfa8] .start_kernel_thread+0x10/0x14

Cc: stable@kernel.org
Reported-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/lkml/87jzsuv1xk.fsf@jogness.linutronix.de/raw
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
19 months agopanic: Reenable preemption in WARN slowpath
Lukas Wunner [Fri, 15 Sep 2023 07:55:39 +0000 (09:55 +0200)]
panic: Reenable preemption in WARN slowpath

Commit:

  5a5d7e9badd2 ("cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG")

amended warn_slowpath_fmt() to disable preemption until the WARN splat
has been emitted.

However the commit neglected to reenable preemption in the !fmt codepath,
i.e. when a WARN splat is emitted without additional format string.

One consequence is that users may see more splats than intended.  E.g. a
WARN splat emitted in a work item results in at least two extra splats:

  BUG: workqueue leaked lock or atomic
  (emitted by process_one_work())

  BUG: scheduling while atomic
  (emitted by worker_thread() -> schedule())

Ironically the point of the commit was to *avoid* extra splats. ;)

Fix it.

Fixes: 5a5d7e9badd2 ("cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/3ec48fde01e4ee6505f77908ba351bad200ae3d1.1694763684.git.lukas@wunner.de
19 months agosmb3: fix some minor typos and repeated words
Steve French [Fri, 15 Sep 2023 06:37:33 +0000 (01:37 -0500)]
smb3: fix some minor typos and repeated words

Minor cleanup pointed out by checkpatch (repeated words, missing blank
lines) in smb2pdu.c and old header location referred to in transport.c

Signed-off-by: Steve French <stfrench@microsoft.com>
19 months agosmb3: correct places where ENOTSUPP is used instead of preferred EOPNOTSUPP
Steve French [Fri, 15 Sep 2023 06:10:40 +0000 (01:10 -0500)]
smb3: correct places where ENOTSUPP is used instead of preferred EOPNOTSUPP

checkpatch flagged a few places with:
     WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP
Also fixed minor typo

Signed-off-by: Steve French <stfrench@microsoft.com>
19 months agoata: pata_parport: Fix code style issues
Damien Le Moal [Fri, 15 Sep 2023 02:33:12 +0000 (11:33 +0900)]
ata: pata_parport: Fix code style issues

Fix indentation and other code style issues in the comm.c file.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309150646.n3iBvbPj-lkp@intel.com/
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
19 months agoata: libahci: clear pending interrupt status
Szuying Chen [Thu, 7 Sep 2023 08:17:10 +0000 (16:17 +0800)]
ata: libahci: clear pending interrupt status

When a CRC error occurs, the HBA asserts an interrupt to indicate an
interface fatal error (PxIS.IFS). The ISR clears PxIE and PxIS, then
does error recovery. If the adapter receives another SDB FIS
with an error (PxIS.TFES) from the device before the start of the EH
recovery process, the interrupt signaling the new SDB cannot be
serviced as PxIE was cleared already. This in turn results in the HBA
inability to issue any command during the error recovery process after
setting PxCMD.ST to 1 because PxIS.TFES is still set.

According to AHCI 1.3.1 specifications section 6.2.2, fatal errors
notified by setting PxIS.HBFS, PxIS.HBDS, PxIS.IFS or PxIS.TFES will
cause the HBA to enter the ERR:Fatal state. In this state, the HBA
shall not issue any new commands.

To avoid this situation, introduce the function
ahci_port_clear_pending_irq() to clear pending interrupts before
executing a COMRESET. This follows the AHCI 1.3.1 - section 6.2.2.2
specification.

Signed-off-by: Szuying Chen <Chloe_Chen@asmedia.com.tw>
Fixes: e0bfd149973d ("[PATCH] ahci: stop engine during hard reset")
Cc: stable@vger.kernel.org
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>