Inochi Amaoto [Thu, 11 Apr 2024 12:12:40 +0000 (20:12 +0800)]
riscv: defconfig: Enable CONFIG_CLK_SOPHGO_CV1800
CONFIG_CLK_SOPHGO_CV1800 is required when booting the minimum
system for CV1800 series board.
Signed-off-by: Inochi Amaoto <inochiama@outlook.com>
Link: https://lore.kernel.org/r/IA1PR20MB49537E8B2D1FAAA7D5B8BDA2BB052@IA1PR20MB4953.namprd20.prod.outlook.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Mon, 29 Apr 2024 17:49:39 +0000 (10:49 -0700)]
Merge patch series "riscv: ASID-related and UP-related TLB flush enhancements"
Samuel Holland <samuel.holland@sifive.com> says:
This series converts uniprocessor kernel builds to use the same TLB
flushing code as SMP builds, to take advantage of batching and existing
range- and ASID-based TLB flush optimizations. It optimizes out IPIs and
SBI calls based on the online CPU count, which also covers the scenario
where SMP was enabled at build time but only one CPU is present/online.
A final optimization is to use single-ASID flushes wherever possible, to
avoid unnecessary TLB misses for kernel mappings.
This series has a semantic conflict with the AIA patches that are in
linux-next due to the removal of the third parameter of
riscv_ipi_set_virq_range(), which is called from imsic_ipi_domain_init()
in drivers/irqchip/irq-riscv-imsic-early.c. The resolution is to remove
the extra argument from the call site.
Here are some numbers from D1 which show the performance impact:
v6.9-rc1:
System Benchmarks Partial Index BASELINE RESULT INDEX
Execl Throughput 43.0 198.5 46.2
File Copy 1024 bufsize 2000 maxblocks 3960.0 73934.4 186.7
File Copy 256 bufsize 500 maxblocks 1655.0 20242.6 122.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 197706.4 340.9
Pipe Throughput 12440.0 176974.2 142.3
Pipe-based Context Switching 4000.0 23626.8 59.1
Process Creation 126.0 449.9 35.7
Shell Scripts (1 concurrent) 42.4 544.4 128.4
Shell Scripts (16 concurrent) --- 35.3 ---
Shell Scripts (8 concurrent) 6.0 71.6 119.3
System Call Overhead 15000.0 248072.6 165.4
========
System Benchmarks Index Score (Partial Only) 110.6
v6.9-rc1 + this patch series:
System Benchmarks Partial Index BASELINE RESULT INDEX
Execl Throughput 43.0 196.8 45.8
File Copy 1024 bufsize 2000 maxblocks 3960.0 71782.2 181.3
File Copy 256 bufsize 500 maxblocks 1655.0 21269.4 128.5
File Copy 4096 bufsize 8000 maxblocks 5800.0 199424.0 343.8
Pipe Throughput 12440.0 196468.6 157.9
Pipe-based Context Switching 4000.0 24261.8 60.7
Process Creation 126.0 459.0 36.4
Shell Scripts (1 concurrent) 42.4 543.8 128.2
Shell Scripts (16 concurrent) --- 35.5 ---
Shell Scripts (8 concurrent) 6.0 71.7 119.6
System Call Overhead 15000.0 259415.2 172.9
========
System Benchmarks Index Score (Partial Only) 113.0
* b4-shazam-lts:
riscv: mm: Always use an ASID to flush mm contexts
riscv: mm: Preserve global TLB entries when switching contexts
riscv: mm: Make asid_bits a local variable
riscv: mm: Use a fixed layout for the MM context ID
riscv: mm: Introduce cntx2asid/cntx2version helper macros
riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma
riscv: mm: Combine the SMP and UP TLB flush code
riscv: Only send remote fences when some other CPU is online
riscv: mm: Broadcast kernel TLB flushes only when needed
riscv: Use IPIs for remote cache/TLB flushes by default
riscv: Factor out page table TLB synchronization
riscv: Flush the instruction cache during SMP bringup
Link: https://lore.kernel.org/r/20240327045035.368512-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jisheng Zhang [Mon, 25 Mar 2024 10:58:23 +0000 (18:58 +0800)]
riscv: select ARCH_HAS_FAST_MULTIPLIER
Currently, riscv linux requires at least IMA, so all platforms have a
multiplier. And I assume the 'mul' efficiency is comparable or better
than a sequence of five or so register-dependent arithmetic
instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer
codegen. Refer to commit
f9b4192923fa ("[PATCH] bitops: hweight()
speedup") for more details.
In a simple benchmark test calling hweight64() in a loop, it got:
about 14% performance improvement on JH7110, tested on Milkv Mars.
about 23% performance improvement on TH1520 and SG2042, tested on
Sipeed LPI4A and SG2042 platform.
a slight performance drop on CV1800B, tested on milkv duo. Among all
riscv platforms in my hands, this is the only one which sees a slight
performance drop. It means the 'mul' isn't quick enough. However, the
situation exists on x86 too, for example, P4 doesn't have fast
integer multiplies as said in the above commit, x86 also selects
ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER
which can benefit almost riscv platforms.
Samuel also provided some performance numbers:
On Unmatched: 20% speedup for __sw_hweight32 and 30% speedup for
__sw_hweight64.
On D1: 8% speedup for __sw_hweight32 and 8% slowdown for
__sw_hweight64.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240325105823.1483-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Wed, 24 Apr 2024 19:57:51 +0000 (12:57 -0700)]
Merge patch series "riscv: enable lockless lockref implementation"
Jisheng Zhang <jszhang@kernel.org> says:
This series selects ARCH_USE_CMPXCHG_LOCKREF to enable the
cmpxchg-based lockless lockref implementation for riscv. Then,
implement arch_cmpxchg64_{relaxed|acquire|release}.
After patch1:
Using Linus' test case[1] on TH1520 platform, I see a 11.2% improvement.
On JH7110 platform, I see 12.0% improvement.
After patch2:
on both TH1520 and JH7110 platforms, I didn't see obvious
performance improvement with Linus' test case [1]. IMHO, this may
be related with the fence and lr.d/sc.d hw implementations. In theory,
lr/sc without fence could give performance improvement over lr/sc plus
fence, so add the code here to leave performance improvement room on
newer HW platforms.
* b4-shazam-merge:
riscv: cmpxchg: implement arch_cmpxchg64_{relaxed|acquire|release}
riscv: select ARCH_USE_CMPXCHG_LOCKREF
Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4
Link: https://lore.kernel.org/r/20240325111038.1700-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jisheng Zhang [Mon, 25 Mar 2024 11:00:36 +0000 (19:00 +0800)]
riscv: mm: still create swiotlb buffer for kmalloc() bouncing if required
After commit
f51f7a0fc2f4 ("riscv: enable DMA_BOUNCE_UNALIGNED_KMALLOC
for !dma_coherent"), for non-coherent platforms with less than 4GB
memory, we rely on users to pass "swiotlb=mmnn,force" kernel parameters
to enable DMA bouncing for unaligned kmalloc() buffers. Now let's go
further: If no bouncing needed for ZONE_DMA, let kernel automatically
allocate 1MB swiotlb buffer per 1GB of RAM for kmalloc() bouncing on
non-coherent platforms, so that no need to pass "swiotlb=mmnn,force"
any more.
The math of "1MB swiotlb buffer per 1GB of RAM for kmalloc() bouncing"
is taken from arm64. Users can still force smaller swiotlb buffer by
passing "swiotlb=mmnn".
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240325110036.1564-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Dawei Li [Wed, 20 Mar 2024 06:47:12 +0000 (14:47 +0800)]
riscv: Annotate pgtable_l{4,5}_enabled with __ro_after_init
pgtable_l{4,5}_enabled are read only after initialization, make explicit
annotation of __ro_after_init on them.
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240320064712.442579-3-dawei.li@shingroup.cn
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Dawei Li [Wed, 20 Mar 2024 06:47:11 +0000 (14:47 +0800)]
riscv: Remove redundant CONFIG_64BIT from pgtable_l{4,5}_enabled
IS_ENABLED(CONFIG_64BIT) in initialization of pgtable_l{4,5}_enabled is
redundant, remove it.
Signed-off-by: Dawei Li <dawei.li@shingroup.cn>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240320064712.442579-2-dawei.li@shingroup.cn
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Wed, 17 Apr 2024 01:50:58 +0000 (18:50 -0700)]
Merge patch series "riscv: Create and document PR_RISCV_SET_ICACHE_FLUSH_CTX prctl"
Charlie Jenkins <charlie@rivosinc.com> says:
Improve the performance of icache flushing by creating a new prctl flag
PR_RISCV_SET_ICACHE_FLUSH_CTX. The interface is left generic to allow
for future expansions such as with the proposed J extension [1].
Documentation is also provided to explain the use case.
Patch sent to add PR_RISCV_SET_ICACHE_FLUSH_CTX to man-pages [2].
[1] https://github.com/riscv/riscv-j-extension
[2] https://lore.kernel.org/linux-man/
20240124-fencei_prctl-v1-1-
0bddafcef331@rivosinc.com
* b4-shazam-merge:
cpumask: Add assign cpu
documentation: Document PR_RISCV_SET_ICACHE_FLUSH_CTX prctl
riscv: Include riscv_set_icache_flush_ctx prctl
riscv: Remove unnecessary irqflags processor.h include
Link: https://lore.kernel.org/r/20240312-fencei-v13-0-4b6bdc2bbf32@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Wed, 17 Apr 2024 01:27:50 +0000 (18:27 -0700)]
Merge patch series "riscv: fix patching with IPI"
Alexandre Ghiti <alexghiti@rivosinc.com> says:
patch 1 removes a useless memory barrier and patch 2 actually fixes the
issue with IPI in the patching code.
* b4-shazam-merge:
riscv: Fix text patching when IPI are used
riscv: Remove superfluous smp_mb()
Link: https://lore.kernel.org/r/20240229121056.203419-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:54 +0000 (21:49 -0700)]
riscv: mm: Always use an ASID to flush mm contexts
Even if multiple ASIDs are not supported, using the single-ASID variant
of the sfence.vma instruction preserves TLB entries for global (kernel)
pages. So it is always more efficient to use the single-ASID code path.
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240327045035.368512-14-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:53 +0000 (21:49 -0700)]
riscv: mm: Preserve global TLB entries when switching contexts
If the CPU does not support multiple ASIDs, all MM contexts use ASID 0.
In this case, it is still beneficial to flush the TLB by ASID, as the
single-ASID variant of the sfence.vma instruction preserves TLB entries
for global (kernel) pages.
This optimization is recommended by the RISC-V privileged specification:
If the implementation does not provide ASIDs, or software chooses
to always use ASID 0, then after every satp write, software should
execute SFENCE.VMA with rs1=x0. In the common case that no global
translations have been modified, rs2 should be set to a register
other than x0 but which contains the value zero, so that global
translations are not flushed.
It is not possible to apply this optimization when using the ASID
allocator, because that code must flush the TLB for all ASIDs at once
when incrementing the version number.
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240327045035.368512-13-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:52 +0000 (21:49 -0700)]
riscv: mm: Make asid_bits a local variable
This variable is only used inside asids_init().
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240327045035.368512-12-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:51 +0000 (21:49 -0700)]
riscv: mm: Use a fixed layout for the MM context ID
Currently, the size of the ASID field in the MM context ID dynamically
depends on the number of hardware-supported ASID bits. This requires
reading a global variable to extract either field from the context ID.
Instead, allocate the maximum possible number of bits to the ASID field,
so the layout of the context ID is known at compile-time.
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240327045035.368512-11-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:50 +0000 (21:49 -0700)]
riscv: mm: Introduce cntx2asid/cntx2version helper macros
When using the ASID allocator, the MM context ID contains two values:
the ASID in the lower bits, and the allocator version number in the
remaining bits. Use macros to make this separation more obvious.
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240327045035.368512-10-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:49 +0000 (21:49 -0700)]
riscv: Avoid TLB flush loops when affected by SiFive CIP-1200
Implementations affected by SiFive errata CIP-1200 have a bug which
forces the kernel to always use the global variant of the sfence.vma
instruction. When affected by this errata, do not attempt to flush a
range of addresses; each iteration of the loop would actually flush the
whole TLB instead. Instead, minimize the overall number of sfence.vma
instructions.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com>
Link: https://lore.kernel.org/r/20240327045035.368512-9-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:48 +0000 (21:49 -0700)]
riscv: Apply SiFive CIP-1200 workaround to single-ASID sfence.vma
commit
3f1e782998cd ("riscv: add ASID-based tlbflushing methods") added
calls to the sfence.vma instruction with rs2 != x0. These single-ASID
instruction variants are also affected by SiFive errata CIP-1200.
Until now, the errata workaround was not needed for the single-ASID
sfence.vma variants, because they were only used when the ASID allocator
was enabled, and the affected SiFive platforms do not support multiple
ASIDs. However, we are going to start using those sfence.vma variants
regardless of ASID support, so now we need alternatives covering them.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240327045035.368512-8-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:47 +0000 (21:49 -0700)]
riscv: mm: Combine the SMP and UP TLB flush code
In SMP configurations, all TLB flushing narrower than flush_tlb_all()
goes through __flush_tlb_range(). Do the same in UP configurations.
This allows UP configurations to take advantage of recent improvements
to the code in tlbflush.c, such as support for huge pages and flushing
multiple-page ranges.
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Yunhui Cui <cuiyunhui@bytedance.com>
Link: https://lore.kernel.org/r/20240327045035.368512-7-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:46 +0000 (21:49 -0700)]
riscv: Only send remote fences when some other CPU is online
If no other CPU is online, a local cache or TLB flush is sufficient.
These checks can be constant-folded when SMP is disabled.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240327045035.368512-6-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:45 +0000 (21:49 -0700)]
riscv: mm: Broadcast kernel TLB flushes only when needed
__flush_tlb_range() avoids broadcasting TLB flushes when an mm context
is only active on the local CPU. Apply this same optimization to TLB
flushes of kernel memory when only one CPU is online. This check can be
constant-folded when SMP is disabled.
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240327045035.368512-5-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:44 +0000 (21:49 -0700)]
riscv: Use IPIs for remote cache/TLB flushes by default
An IPI backend is always required in an SMP configuration, but an SBI
implementation is not. For example, SBI will be unavailable when the
kernel runs in M mode. For this reason, consider IPI delivery of cache
and TLB flushes to be the base case, and any other implementation (such
as the SBI remote fence extension) to be an optimization.
Generally, if IPIs can be delivered without firmware assistance, they
are assumed to be faster than SBI calls due to the SBI context switch
overhead. However, when SBI is used as the IPI backend, then the context
switch cost must be paid anyway, and performing the cache/TLB flush
directly in the SBI implementation is more efficient than injecting an
interrupt to S-mode. This is the only existing scenario where
riscv_ipi_set_virq_range() is called with use_for_rfence set to false.
sbi_ipi_init() already checks riscv_ipi_have_virq_range(), so it only
calls riscv_ipi_set_virq_range() when no other IPI device is available.
This allows moving the static key and dropping the use_for_rfence
parameter. This decouples the static key from the irqchip driver probe
order.
Furthermore, the static branch only makes sense when CONFIG_RISCV_SBI is
enabled. Optherwise, IPIs must be used. Add a fallback definition of
riscv_use_sbi_for_rfence() which handles this case and removes the need
to check CONFIG_RISCV_SBI elsewhere, such as in cacheflush.c.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240327045035.368512-4-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:43 +0000 (21:49 -0700)]
riscv: Factor out page table TLB synchronization
The logic is the same for all page table levels. See commit
69be3fb111e7
("riscv: enable MMU_GATHER_RCU_TABLE_FREE for SMP && MMU").
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240327045035.368512-3-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Wed, 27 Mar 2024 04:49:42 +0000 (21:49 -0700)]
riscv: Flush the instruction cache during SMP bringup
Instruction cache flush IPIs are sent only to CPUs in cpu_online_mask,
so they will not target a CPU until it calls set_cpu_online() earlier in
smp_callin(). As a result, if instruction memory is modified between the
CPU coming out of reset and that point, then its instruction cache may
contain stale data. Therefore, the instruction cache must be flushed
after the set_cpu_online() synchronization point.
Fixes: 08f051eda33b ("RISC-V: Flush I$ when making a dirty page executable")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240327045035.368512-2-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Clément Léger [Wed, 21 Feb 2024 08:31:06 +0000 (09:31 +0100)]
riscv: hwprobe: export Zihintpause ISA extension
Export the Zihintpause ISA extension through hwprobe which allows using
"pause" instructions. Some userspace applications (OpenJDK for
instance) uses this to handle some locking back-off.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240221083108.1235311-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Clément Léger [Tue, 6 Feb 2024 15:40:59 +0000 (16:40 +0100)]
riscv: misaligned: remove CONFIG_RISCV_M_MODE specific code
While reworking code to fix sparse errors, it appears that the
RISCV_M_MODE specific could actually be removed and use the one for
normal mode. Even though RISCV_M_MODE can do direct user memory access,
using the user uaccess helpers is also going to work. Since there is no
need anymore for specific accessors (load_u8()/store_u8()), we can
directly use memcpy()/copy_{to/from}_user() and get rid of the copy
loop entirely. __read_insn() is also fixed to use an unsigned long
instead of a pointer which was cast in __user address space. The
insn_addr parameter is now cast from unsigned lnog to the correct
address space directly.
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20240206154104.896809-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Tue, 12 Mar 2024 19:56:38 +0000 (12:56 -0700)]
riscv: Do not save the scratch CSR during suspend
While the processor is executing kernel code, the value of the scratch
CSR is always zero, so there is no need to save the value. Continue to
write the CSR during the resume flow, so we do not rely on firmware to
initialize it.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20240312195641.1830521-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Tue, 9 Apr 2024 18:39:46 +0000 (11:39 -0700)]
Merge patch series "riscv: 64-bit NOMMU fixes and enhancements"
Samuel Holland <samuel.holland@sifive.com> says:
This series aims to improve support for NOMMU, specifically by making it
easier to test NOMMU kernels in QEMU and on various widely-available
hardware (errata permitting). After all, everything supports Svbare...
After applying this series, a NOMMU kernel based on defconfig (changing
only the three options below*) boots to userspace on QEMU when passed as
-kernel.
# CONFIG_RISCV_M_MODE is not set
# CONFIG_MMU is not set
CONFIG_NONPORTABLE=y
*if you are using LLD, you must also disable BPF_SYSCALL and KALLSYMS,
because LLD bails on out-of-range references to undefined weak symbols.
* b4-shazam-merge:
riscv: Allow NOMMU kernels to run in S-mode
riscv: Remove MMU dependency from Zbb and Zicboz
riscv: Fix loading 64-bit NOMMU kernels past the start of RAM
riscv: Fix TASK_SIZE on 64-bit NOMMU
Link: https://lore.kernel.org/r/20240227003630.3634533-1-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Miguel Ojeda [Tue, 9 Apr 2024 17:25:16 +0000 (18:25 +0100)]
RISC-V: enable building 64-bit kernels with rust support
The rust modules work on 64-bit RISC-V, with no twiddling required.
Select HAVE_RUST and provide the required flags to kbuild so that the
modules can be used. The Makefile and Kconfig changes are lifted from
work done by Miguel in the Rust-for-Linux tree, hence his authorship.
Following the rabbit hole, the Makefile changes originated in a script,
created based on config files originally added by Gary, hence his
co-authorship.
32-bit is broken in core rust code, so support is limited to 64-bit:
ld.lld: error: undefined symbol: __udivdi3
As 64-bit RISC-V is now supported, add it to the arch support table.
Co-developed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Co-developed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240409-silencer-book-ce1320f06aab@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Mon, 8 Apr 2024 17:55:03 +0000 (10:55 -0700)]
Merge patch series "Rework & improve riscv cmpxchg.h and atomic.h"
Leonardo Bras <leobras@redhat.com> says:
While studying riscv's cmpxchg.h file, I got really interested in
understanding how RISCV asm implemented the different versions of
{cmp,}xchg.
When I understood the pattern, it made sense for me to remove the
duplications and create macros to make it easier to understand what exactly
changes between the versions: Instruction sufixes & barriers.
Also, did the same kind of work on atomic.c.
After that, I noted both cmpxchg and xchg only accept variables of
size 4 and 8, compared to x86 and arm64 which do 1,2,4,8.
Now that deduplication is done, it is quite direct to implement them
for variable sizes 1 and 2, so I did it. Then Guo Ren already presented
me some possible users :)
I did compare the generated asm on a test.c that contained usage for every
changed function, and could not detect any change on patches 1 + 2 + 3
compared with upstream.
Pathes 4 & 5 were compiled-tested, merged with guoren/qspinlock_v11 and
booted just fine with qemu -machine virt -append "qspinlock".
(tree: https://gitlab.com/LeoBras/linux/-/commits/guo_qspinlock_v11)
Latest tests happened based on this tree:
https://github.com/guoren83/linux/tree/qspinlock_v12
* b4-shazam-lts:
riscv/cmpxchg: Implement xchg for variables of size 1 and 2
riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2
riscv/atomic.h : Deduplicate arch_atomic.*
riscv/cmpxchg: Deduplicate cmpxchg() asm and macros
riscv/cmpxchg: Deduplicate xchg() asm functions
Link: https://lore.kernel.org/r/20240103163203.72768-2-leobras@redhat.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jisheng Zhang [Mon, 25 Mar 2024 11:10:38 +0000 (19:10 +0800)]
riscv: cmpxchg: implement arch_cmpxchg64_{relaxed|acquire|release}
After selecting ARCH_USE_CMPXCHG_LOCKREF, one straight futher
optimization is implementing the arch_cmpxchg64_relaxed() because the
lockref code does not need the cmpxchg to have barrier semantics. At
the same time, implement arch_cmpxchg64_acquire and
arch_cmpxchg64_release as well.
However, on both TH1520 and JH7110 platforms, I didn't see obvious
performance improvement with Linus' test case [1]. IMHO, this may
be related with the fence and lr.d/sc.d hw implementations. In theory,
lr/sc without fence could give performance improvement over lr/sc plus
fence, so add the code here to leave performance improvement room on
newer HW platforms.
Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20240325111038.1700-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Jisheng Zhang [Mon, 25 Mar 2024 11:10:37 +0000 (19:10 +0800)]
riscv: select ARCH_USE_CMPXCHG_LOCKREF
Select ARCH_USE_CMPXCHG_LOCKREF to enable the cmpxchg-based lockless
lockref implementation for riscv.
Using Linus' test case[1] on TH1520 platform, I see a 11.2% improvement.
On JH7110 platform, I see 12.0% improvement.
Link: http://marc.info/?l=linux-fsdevel&m=137782380714721&w=4
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20240325111038.1700-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Charlie Jenkins [Tue, 12 Mar 2024 23:53:43 +0000 (16:53 -0700)]
cpumask: Add assign cpu
Standardize an assign_cpu function for cpumasks.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20240312-fencei-v13-4-4b6bdc2bbf32@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Charlie Jenkins [Tue, 12 Mar 2024 23:53:42 +0000 (16:53 -0700)]
documentation: Document PR_RISCV_SET_ICACHE_FLUSH_CTX prctl
Provide documentation that explains how to properly do CMODX in riscv.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240312-fencei-v13-3-4b6bdc2bbf32@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Charlie Jenkins [Tue, 12 Mar 2024 23:53:41 +0000 (16:53 -0700)]
riscv: Include riscv_set_icache_flush_ctx prctl
Support new prctl with key PR_RISCV_SET_ICACHE_FLUSH_CTX to enable
optimization of cross modifying code. This prctl enables userspace code
to use icache flushing instructions such as fence.i with the guarantee
that the icache will continue to be clean after thread migration.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240312-fencei-v13-2-4b6bdc2bbf32@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Charlie Jenkins [Tue, 12 Mar 2024 23:53:40 +0000 (16:53 -0700)]
riscv: Remove unnecessary irqflags processor.h include
This include is not used and can lead to circular dependencies.
Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240312-fencei-v13-1-4b6bdc2bbf32@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Thu, 29 Feb 2024 12:10:56 +0000 (13:10 +0100)]
riscv: Fix text patching when IPI are used
For now, we use stop_machine() to patch the text and when we use IPIs for
remote icache flushes (which is emitted in patch_text_nosync()), the system
hangs.
So instead, make sure every CPU executes the stop_machine() patching
function and emit a local icache flush there.
Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20240229121056.203419-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Alexandre Ghiti [Thu, 29 Feb 2024 12:10:55 +0000 (13:10 +0100)]
riscv: Remove superfluous smp_mb()
This memory barrier is not needed and not documented so simply remove
it.
Suggested-by: Andrea Parri <andrea@rivosinc.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20240229121056.203419-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Tue, 27 Feb 2024 00:34:49 +0000 (16:34 -0800)]
riscv: Allow NOMMU kernels to run in S-mode
For ease of testing, it is convenient to run NOMMU kernels in supervisor
mode. The only required change is to offset the kernel load address,
since the beginning of RAM is usually reserved for M-mode firmware.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240227003630.3634533-5-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Tue, 27 Feb 2024 00:34:48 +0000 (16:34 -0800)]
riscv: Remove MMU dependency from Zbb and Zicboz
The Zbb and Zicboz ISA extensions have no dependency on an MMU and are
equally useful on NOMMU kernels.
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20240227003630.3634533-4-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Tue, 27 Feb 2024 00:34:47 +0000 (16:34 -0800)]
riscv: Fix loading 64-bit NOMMU kernels past the start of RAM
commit
3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear
mapping") added logic to allow using RAM below the kernel load address.
However, this does not work for NOMMU, where PAGE_OFFSET is fixed to the
kernel load address. Since that range of memory corresponds to PFNs
below ARCH_PFN_OFFSET, mm initialization runs off the beginning of
mem_map and corrupts adjacent kernel memory. Fix this by restoring the
previous behavior for NOMMU kernels.
Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Link: https://lore.kernel.org/r/20240227003630.3634533-3-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Samuel Holland [Tue, 27 Feb 2024 00:34:46 +0000 (16:34 -0800)]
riscv: Fix TASK_SIZE on 64-bit NOMMU
On NOMMU, userspace memory can come from anywhere in physical RAM. The
current definition of TASK_SIZE is wrong if any RAM exists above 4G,
causing spurious failures in the userspace access routines.
Fixes: 6bd33e1ece52 ("riscv: add nommu support")
Fixes: c3f896dcf1e4 ("mm: switch the test_vmalloc module to use __vmalloc_node")
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Bo Gan <ganboing@gmail.com>
Link: https://lore.kernel.org/r/20240227003630.3634533-2-samuel.holland@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Leonardo Bras [Wed, 3 Jan 2024 16:32:03 +0000 (13:32 -0300)]
riscv/cmpxchg: Implement xchg for variables of size 1 and 2
xchg for variables of size 1-byte and 2-bytes is not yet available for
riscv, even though its present in other architectures such as arm64 and
x86. This could lead to not being able to implement some locking mechanisms
or requiring some rework to make it work properly.
Implement 1-byte and 2-bytes xchg in order to achieve parity with other
architectures.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Tested-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240103163203.72768-7-leobras@redhat.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Leonardo Bras [Wed, 3 Jan 2024 16:32:02 +0000 (13:32 -0300)]
riscv/cmpxchg: Implement cmpxchg for variables of size 1 and 2
cmpxchg for variables of size 1-byte and 2-bytes is not yet available for
riscv, even though its present in other architectures such as arm64 and
x86. This could lead to not being able to implement some locking mechanisms
or requiring some rework to make it work properly.
Implement 1-byte and 2-bytes cmpxchg in order to achieve parity with other
architectures.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Tested-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240103163203.72768-6-leobras@redhat.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Leonardo Bras [Wed, 3 Jan 2024 16:32:01 +0000 (13:32 -0300)]
riscv/atomic.h : Deduplicate arch_atomic.*
Some functions use mostly the same asm for 32-bit and 64-bit versions.
Make a macro that is generic enough and avoid code duplication.
(This did not cause any change in generated asm)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Tested-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240103163203.72768-5-leobras@redhat.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Leonardo Bras [Wed, 3 Jan 2024 16:32:00 +0000 (13:32 -0300)]
riscv/cmpxchg: Deduplicate cmpxchg() asm and macros
In this header every cmpxchg define (_relaxed, _acquire, _release,
vanilla) contain it's own asm file, both for 4-byte variables an 8-byte
variables, on a total of 8 versions of mostly the same asm.
This is usually bad, as it means any change may be done in up to 8
different places.
Unify those versions by creating a new define with enough parameters to
generate any version of the previous 8.
Then unify the result under a more general define, and simplify
arch_cmpxchg* generation
(This did not cause any change in generated asm)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Tested-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240103163203.72768-4-leobras@redhat.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Leonardo Bras [Wed, 3 Jan 2024 16:31:59 +0000 (13:31 -0300)]
riscv/cmpxchg: Deduplicate xchg() asm functions
In this header every xchg define (_relaxed, _acquire, _release, vanilla)
contain it's own asm file, both for 4-byte variables an 8-byte variables,
on a total of 8 versions of mostly the same asm.
This is usually bad, as it means any change may be done in up to 8
different places.
Unify those versions by creating a new define with enough parameters to
generate any version of the previous 8.
Then unify the result under a more general define, and simplify
arch_xchg* generation.
(This did not cause any change in generated asm)
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
Tested-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240103163203.72768-3-leobras@redhat.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Yangyu Chen [Tue, 9 Jan 2024 18:48:59 +0000 (02:48 +0800)]
RISC-V: only flush icache when it has VM_EXEC set
As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores
in the system is very costly, limiting flush_icache_mm to be called only
when vma->vm_flags has VM_EXEC can help minimize the frequency of these
operations. It improves performance and reduces disturbances when
copy_from_user_page is needed such as profiling with perf.
For I-D coherence concerns, it will not fail if such a page adds VM_EXEC
flags in the future since we have checked it in the __set_pte_at function.
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/tencent_6D851035F6F2FD0B5A69FB391AE39AC6300A@qq.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
tanzirh@google.com [Tue, 12 Dec 2023 00:20:14 +0000 (00:20 +0000)]
riscv: remove unused header
arch/riscv/kernel/sys_riscv.c builds without using anything
from asm-generic/mman-common.h. There seems to be no reason
to include this file.
Signed-off-by: Tanzir Hasan <tanzirh@google.com>
Fixes: 9e2e6042a7ec ("riscv: Allow PROT_WRITE-only mmap()")
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20231212-removeasmgeneric-v2-1-a0e6d7df34a7@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Masahiro Yamada [Sat, 23 Mar 2024 09:06:15 +0000 (18:06 +0900)]
export.h: remove include/asm-generic/export.h
Commit
3a6dd5f614a1 ("riscv: remove unneeded #include
<asm-generic/export.h>") removed the last use of
include/asm-generic/export.h.
This deprecated header can go away.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240323090615.1244904-1-masahiroy@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Masahiro Yamada [Sat, 23 Mar 2024 11:35:00 +0000 (20:35 +0900)]
riscv: merge two if-blocks for KBUILD_IMAGE
In arch/riscv/Makefile, KBUILD_IMAGE is assigned in two separate
if-blocks.
When CONFIG_XIP_KERNEL is disabled, the decision made by the first
if-block is overwritten by the second one, which is redundant and
unreadable.
Merge the two if-blocks.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240323113500.1249272-1-masahiroy@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Stafford Horne [Sun, 10 Mar 2024 11:21:26 +0000 (11:21 +0000)]
riscv: Remove unused asm/signal.h file
When riscv moved to common entry the definition and usage of
do_work_pending was removed. This unused header file remains.
Remove the header file as it is not used.
I have tested compiling the kernel with this patch applied and saw no
issues. Noticed when auditing how different ports handle signals
related to saving FPU state.
Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
Signed-off-by: Stafford Horne <shorne@gmail.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20240310112129.376134-1-shorne@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Linus Torvalds [Sun, 24 Mar 2024 21:10:05 +0000 (14:10 -0700)]
Linux 6.9-rc1
Linus Torvalds [Sun, 24 Mar 2024 20:54:06 +0000 (13:54 -0700)]
Merge tag 'efi-fixes-for-v6.9-2' of git://git./linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
- Fix logic that is supposed to prevent placement of the kernel image
below LOAD_PHYSICAL_ADDR
- Use the firmware stack in the EFI stub when running in mixed mode
- Clear BSS only once when using mixed mode
- Check efi.get_variable() function pointer for NULL before trying to
call it
* tag 'efi-fixes-for-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: fix panic in kdump kernel
x86/efistub: Don't clear BSS twice in mixed mode
x86/efistub: Call mixed mode boot services on the firmware's stack
efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
Linus Torvalds [Sun, 24 Mar 2024 18:13:56 +0000 (11:13 -0700)]
Merge tag 'x86-urgent-2024-03-24' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Ensure that the encryption mask at boot is properly propagated on
5-level page tables, otherwise the PGD entry is incorrectly set to
non-encrypted, which causes system crashes during boot.
- Undo the deferred 5-level page table setup as it cannot work with
memory encryption enabled.
- Prevent inconsistent XFD state on CPU hotplug, where the MSR is reset
to the default value but the cached variable is not, so subsequent
comparisons might yield the wrong result and as a consequence the
result prevents updating the MSR.
- Register the local APIC address only once in the MPPARSE enumeration
to prevent triggering the related WARN_ONs() in the APIC and topology
code.
- Handle the case where no APIC is found gracefully by registering a
fake APIC in the topology code. That makes all related topology
functions work correctly and does not affect the actual APIC driver
code at all.
- Don't evaluate logical IDs during early boot as the local APIC IDs
are not yet enumerated and the invoked function returns an error
code. Nothing requires the logical IDs before the final CPUID
enumeration takes place, which happens after the enumeration.
- Cure the fallout of the per CPU rework on UP which misplaced the
copying of boot_cpu_data to per CPU data so that the final update to
boot_cpu_data got lost which caused inconsistent state and boot
crashes.
- Use copy_from_kernel_nofault() in the kprobes setup as there is no
guarantee that the address can be safely accessed.
- Reorder struct members in struct saved_context to work around another
kmemleak false positive
- Remove the buggy code which tries to update the E820 kexec table for
setup_data as that is never passed to the kexec kernel.
- Update the resource control documentation to use the proper units.
- Fix a Kconfig warning observed with tinyconfig
* tag 'x86-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/64: Move 5-level paging global variable assignments back
x86/boot/64: Apply encryption mask to 5-level pagetable update
x86/cpu: Add model number for another Intel Arrow Lake mobile processor
x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
Documentation/x86: Document that resctrl bandwidth control units are MiB
x86/mpparse: Register APIC address only once
x86/topology: Handle the !APIC case gracefully
x86/topology: Don't evaluate logical IDs during early boot
x86/cpu: Ensure that CPU info updates are propagated on UP
kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address
x86/pm: Work around false positive kmemleak report in msr_build_context()
x86/kexec: Do not update E820 kexec table for setup_data
x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'
Linus Torvalds [Sun, 24 Mar 2024 18:11:05 +0000 (11:11 -0700)]
Merge tag 'sched-urgent-2024-03-24' of git://git./linux/kernel/git/tip/tip
Pull scheduler doc clarification from Thomas Gleixner:
"A single update for the documentation of the base_slice_ns tunable to
clarify that any value which is less than the tick slice has no effect
because the scheduler tick is not guaranteed to happen within the set
time slice"
* tag 'sched-urgent-2024-03-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation
Linus Torvalds [Sun, 24 Mar 2024 17:45:31 +0000 (10:45 -0700)]
Merge tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
"This has a set of swiotlb alignment fixes for sometimes very long
standing bugs from Will. We've been discussion them for a while and
they should be solid now"
* tag 'dma-mapping-6.9-2024-03-24' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: Reinstate page-alignment for mappings >= PAGE_SIZE
iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
swiotlb: Fix alignment checks when both allocation and DMA masks are present
swiotlb: Honour dma_alloc_coherent() alignment in swiotlb_alloc()
swiotlb: Enforce page alignment in swiotlb_alloc()
swiotlb: Fix double-allocation of slots due to broken alignment handling
Oleksandr Tymoshenko [Sat, 23 Mar 2024 06:33:33 +0000 (06:33 +0000)]
efi: fix panic in kdump kernel
Check if get_next_variable() is actually valid pointer before
calling it. In kdump kernel this method is set to NULL that causes
panic during the kexec-ed kernel boot.
Tested with QEMU and OVMF firmware.
Fixes: bad267f9e18f ("efi: verify that variable services are supported")
Signed-off-by: Oleksandr Tymoshenko <ovt@google.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Ard Biesheuvel [Fri, 22 Mar 2024 16:01:45 +0000 (17:01 +0100)]
x86/efistub: Don't clear BSS twice in mixed mode
Clearing BSS should only be done once, at the very beginning.
efi_pe_entry() is the entrypoint from the firmware, which may not clear
BSS and so it is done explicitly. However, efi_pe_entry() is also used
as an entrypoint by the mixed mode startup code, in which case BSS will
already have been cleared, and doing it again at this point will corrupt
global variables holding the firmware's GDT/IDT and segment selectors.
So make the memset() conditional on whether the EFI stub is running in
native mode.
Fixes: b3810c5a2cc4a666 ("x86/efistub: Clear decompressor BSS in native EFI entrypoint")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Ard Biesheuvel [Fri, 22 Mar 2024 15:03:58 +0000 (17:03 +0200)]
x86/efistub: Call mixed mode boot services on the firmware's stack
Normally, the EFI stub calls into the EFI boot services using the stack
that was live when the stub was entered. According to the UEFI spec,
this stack needs to be at least 128k in size - this might seem large but
all asynchronous processing and event handling in EFI runs from the same
stack and so quite a lot of space may be used in practice.
In mixed mode, the situation is a bit different: the bootloader calls
the 32-bit EFI stub entry point, which calls the decompressor's 32-bit
entry point, where the boot stack is set up, using a fixed allocation
of 16k. This stack is still in use when the EFI stub is started in
64-bit mode, and so all calls back into the EFI firmware will be using
the decompressor's limited boot stack.
Due to the placement of the boot stack right after the boot heap, any
stack overruns have gone unnoticed. However, commit
5c4feadb0011983b ("x86/decompressor: Move global symbol references to C code")
moved the definition of the boot heap into C code, and now the boot
stack is placed right at the base of BSS, where any overruns will
corrupt the end of the .data section.
While it would be possible to work around this by increasing the size of
the boot stack, doing so would affect all x86 systems, and mixed mode
systems are a tiny (and shrinking) fraction of the x86 installed base.
So instead, record the firmware stack pointer value when entering from
the 32-bit firmware, and switch to this stack every time a EFI boot
service call is made.
Cc: <stable@kernel.org> # v6.1+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tom Lendacky [Fri, 22 Mar 2024 15:41:07 +0000 (10:41 -0500)]
x86/boot/64: Move 5-level paging global variable assignments back
Commit
63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging
global variables") moved assignment of 5-level global variables to later
in the boot in order to avoid having to use RIP relative addressing in
order to set them. However, when running with 5-level paging and SME
active (mem_encrypt=on), the variables are needed as part of the page
table setup needed to encrypt the kernel (using pgd_none(), p4d_offset(),
etc.). Since the variables haven't been set, the page table manipulation
is done as if 4-level paging is active, causing the system to crash on
boot.
While only a subset of the assignments that were moved need to be set
early, move all of the assignments back into check_la57_support() so that
these assignments aren't spread between two locations. Instead of just
reverting the fix, this uses the new RIP_REL_REF() macro when assigning
the variables.
Fixes: 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging global variables")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/2ca419f4d0de719926fd82353f6751f717590a86.1711122067.git.thomas.lendacky@amd.com
Tom Lendacky [Fri, 22 Mar 2024 15:41:06 +0000 (10:41 -0500)]
x86/boot/64: Apply encryption mask to 5-level pagetable update
When running with 5-level page tables, the kernel mapping PGD entry is
updated to point to the P4D table. The assignment uses _PAGE_TABLE_NOENC,
which, when SME is active (mem_encrypt=on), results in a page table
entry without the encryption mask set, causing the system to crash on
boot.
Change the assignment to use _PAGE_TABLE instead of _PAGE_TABLE_NOENC so
that the encryption mask is set for the PGD entry.
Fixes: 533568e06b15 ("x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/8f20345cda7dbba2cf748b286e1bc00816fe649a.1711122067.git.thomas.lendacky@amd.com
Tony Luck [Fri, 22 Mar 2024 16:17:25 +0000 (09:17 -0700)]
x86/cpu: Add model number for another Intel Arrow Lake mobile processor
This one is the regular laptop CPU.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240322161725.195614-1-tony.luck@intel.com
Adamos Ttofari [Fri, 22 Mar 2024 23:04:39 +0000 (16:04 -0700)]
x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD
Commit
672365477ae8 ("x86/fpu: Update XFD state where required") and
commit
8bf26758ca96 ("x86/fpu: Add XFD state to fpstate") introduced a
per CPU variable xfd_state to keep the MSR_IA32_XFD value cached, in
order to avoid unnecessary writes to the MSR.
On CPU hotplug MSR_IA32_XFD is reset to the init_fpstate.xfd, which
wipes out any stale state. But the per CPU cached xfd value is not
reset, which brings them out of sync.
As a consequence a subsequent xfd_update_state() might fail to update
the MSR which in turn can result in XRSTOR raising a #NM in kernel
space, which crashes the kernel.
To fix this, introduce xfd_set_state() to write xfd_state together
with MSR_IA32_XFD, and use it in all places that set MSR_IA32_XFD.
Fixes: 672365477ae8 ("x86/fpu: Update XFD state where required")
Signed-off-by: Adamos Ttofari <attofari@amazon.de>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240322230439.456571-1-chang.seok.bae@intel.com
Closes: https://lore.kernel.org/lkml/20230511152818.13839-1-attofari@amazon.de
Tony Luck [Fri, 22 Mar 2024 18:20:15 +0000 (11:20 -0700)]
Documentation/x86: Document that resctrl bandwidth control units are MiB
The memory bandwidth software controller uses 2^20 units rather than
10^6. See mbm_bw_count() which computes bandwidth using the "SZ_1M"
Linux define for 0x00100000.
Update the documentation to use MiB when describing this feature.
It's too late to fix the mount option "mba_MBps" as that is now an
established user interface.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240322182016.196544-1-tony.luck@intel.com
Linus Torvalds [Sat, 23 Mar 2024 21:49:25 +0000 (14:49 -0700)]
Merge tag 'timers-urgent-2024-03-23' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two regression fixes for the timer and timer migration code:
- Prevent endless timer requeuing which is caused by two CPUs racing
out of idle. This happens when the last CPU goes idle and therefore
has to ensure to expire the pending global timers and some other
CPU come out of idle at the same time and the other CPU wins the
race and expires the global queue. This causes the last CPU to
chase ghost timers forever and reprogramming it's clockevent device
endlessly.
Cure this by re-evaluating the wakeup time unconditionally.
- The split into local (pinned) and global timers in the timer wheel
caused a regression for NOHZ full as it broke the idle tracking of
global timers. On NOHZ full this prevents an self IPI being sent
which in turn causes the timer to be not programmed and not being
expired on time.
Restore the idle tracking for the global timer base so that the
self IPI condition for NOHZ full is working correctly again"
* tag 'timers-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Fix removed self-IPI on global timer's enqueue in nohz_full
timers/migration: Fix endless timer requeue after idle interrupts
Linus Torvalds [Sat, 23 Mar 2024 21:42:45 +0000 (14:42 -0700)]
Merge tag 'timers-core-2024-03-23' of git://git./linux/kernel/git/tip/tip
Pull more clocksource updates from Thomas Gleixner:
"A set of updates for clocksource and clockevent drivers:
- A fix for the prescaler of the ARM global timer where the prescaler
mask define only covered 4 bits while it is actully 8 bits wide.
This obviously restricted the possible range of prescaler
adjustments
- A fix for the RISC-V timer which prevents a timer interrupt being
raised while the timer is initialized
- A set of device tree updates to support new system on chips in
various drivers
- Kernel-doc and other cleanups all over the place"
* tag 'timers-core-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/timer-riscv: Clear timer interrupt on timer initialization
dt-bindings: timer: Add support for cadence TTC PWM
clocksource/drivers/arm_global_timer: Simplify prescaler register access
clocksource/drivers/arm_global_timer: Guard against division by zero
clocksource/drivers/arm_global_timer: Make gt_target_rate unsigned long
dt-bindings: timer: add Ralink SoCs system tick counter
clocksource: arm_global_timer: fix non-kernel-doc comment
clocksource/drivers/arm_global_timer: Remove stray tab
clocksource/drivers/arm_global_timer: Fix maximum prescaler value
clocksource/drivers/imx-sysctr: Add i.MX95 support
clocksource/drivers/imx-sysctr: Drop use global variables
dt-bindings: timer: nxp,sysctr-timer: support i.MX95
dt-bindings: timer: renesas: ostm: Document RZ/Five SoC
dt-bindings: timer: renesas,tmu: Document input capture interrupt
clocksource/drivers/ti-32K: Fix misuse of "/**" comment
clocksource/drivers/stm32: Fix all kernel-doc warnings
dt-bindings: timer: exynos4210-mct: Add google,gs101-mct compatible
clocksource/drivers/imx: Fix -Wunused-but-set-variable warning
Linus Torvalds [Sat, 23 Mar 2024 21:30:38 +0000 (14:30 -0700)]
Merge tag 'irq-urgent-2024-03-23' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A series of fixes for the Renesas RZG21 interrupt chip driver to
prevent spurious and misrouted interrupts.
- Ensure that posted writes are flushed in the eoi() callback
- Ensure that interrupts are masked at the chip level when the
trigger type is changed
- Clear the interrupt status register when setting up edge type
trigger modes
- Ensure that the trigger type and routing information is set before
the interrupt is enabled"
* tag 'irq-urgent-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/renesas-rzg2l: Do not set TIEN and TINT source at the same time
irqchip/renesas-rzg2l: Prevent spurious interrupts when setting trigger type
irqchip/renesas-rzg2l: Rename rzg2l_irq_eoi()
irqchip/renesas-rzg2l: Rename rzg2l_tint_eoi()
irqchip/renesas-rzg2l: Flush posted write in irq_eoi()
Linus Torvalds [Sat, 23 Mar 2024 21:17:37 +0000 (14:17 -0700)]
Merge tag 'core-entry-2024-03-23' of git://git./linux/kernel/git/tip/tip
Pull core entry fix from Thomas Gleixner:
"A single fix for the generic entry code:
The trace_sys_enter() tracepoint can modify the syscall number via
kprobes or BPF in pt_regs, but that requires that the syscall number
is re-evaluted from pt_regs after the tracepoint.
A seccomp fix in that area removed the re-evaluation so the change
does not take effect as the code just uses the locally cached number.
Restore the original behaviour by re-evaluating the syscall number
after the tracepoint"
* tag 'core-entry-2024-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Respect changes to system call number by trace_sys_enter()
Linus Torvalds [Sat, 23 Mar 2024 16:21:26 +0000 (09:21 -0700)]
Merge tag 'powerpc-6.9-2' of git://git./linux/kernel/git/powerpc/linux
Pull more powerpc updates from Michael Ellerman:
- Handle errors in mark_rodata_ro() and mark_initmem_nx()
- Make struct crash_mem available without CONFIG_CRASH_DUMP
Thanks to Christophe Leroy and Hari Bathini.
* tag 'powerpc-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency
powerpc/kexec: split CONFIG_KEXEC_FILE and CONFIG_CRASH_DUMP
kexec/kdump: make struct crash_mem available without CONFIG_CRASH_DUMP
powerpc: Handle error in mark_rodata_ro() and mark_initmem_nx()
Linus Torvalds [Sat, 23 Mar 2024 16:17:03 +0000 (09:17 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
- remove a misuse of kernel-doc comment
- use "Call trace:" for backtraces like other architectures
- implement copy_from_kernel_nofault_allowed() to fix a LKDTM test
- add a "cut here" line for prefetch aborts
- remove unnecessary Kconfing entry for FRAME_POINTER
- remove iwmmxy support for PJ4/PJ4B cores
- use bitfield helpers in ptrace to improve readabililty
- check if folio is reserved before flushing
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9359/1: flush: check if the folio is reserved for no-mapping addresses
ARM: 9354/1: ptrace: Use bitfield helpers
ARM: 9352/1: iwmmxt: Remove support for PJ4/PJ4B cores
ARM: 9353/1: remove unneeded entry for CONFIG_FRAME_POINTER
ARM: 9351/1: fault: Add "cut here" line for prefetch aborts
ARM: 9350/1: fault: Implement copy_from_kernel_nofault_allowed()
ARM: 9349/1: unwind: Add missing "Call trace:" line
ARM: 9334/1: mm: init: remove misuse of kernel-doc comment
Linus Torvalds [Sat, 23 Mar 2024 15:43:21 +0000 (08:43 -0700)]
Merge tag 'hardening-v6.9-rc1-fixes' of git://git./linux/kernel/git/kees/linux
Pull more hardening updates from Kees Cook:
- CONFIG_MEMCPY_SLOW_KUNIT_TEST is no longer needed (Guenter Roeck)
- Fix needless UTF-8 character in arch/Kconfig (Liu Song)
- Improve __counted_by warning message in LKDTM (Nathan Chancellor)
- Refactor DEFINE_FLEX() for default use of __counted_by
- Disable signed integer overflow sanitizer on GCC < 8
* tag 'hardening-v6.9-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lkdtm/bugs: Improve warning message for compilers without counted_by support
overflow: Change DEFINE_FLEX to take __counted_by member
Revert "kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST"
arch/Kconfig: eliminate needless UTF-8 character in Kconfig help
ubsan: Disable signed integer overflow sanitizer on GCC < 8
Thomas Gleixner [Fri, 22 Mar 2024 18:56:39 +0000 (19:56 +0100)]
x86/mpparse: Register APIC address only once
The APIC address is registered twice. First during the early detection and
afterwards when actually scanning the table for APIC IDs. The APIC and
topology core warn about the second attempt.
Restrict it to the early detection call.
Fixes: 81287ad65da5 ("x86/apic: Sanitize APIC address setup")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.297774848@linutronix.de
Thomas Gleixner [Fri, 22 Mar 2024 18:56:38 +0000 (19:56 +0100)]
x86/topology: Handle the !APIC case gracefully
If there is no local APIC enumerated and registered then the topology
bitmaps are empty. Therefore, topology_init_possible_cpus() will die with
a division by zero exception.
Prevent this by registering a fake APIC id to populate the topology
bitmap. This also allows to use all topology query interfaces
unconditionally. It does not affect the actual APIC code because either
the local APIC address was not registered or no local APIC could be
detected.
Fixes: f1f758a80516 ("x86/topology: Add a mechanism to track topology via APIC IDs")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.242709302@linutronix.de
Thomas Gleixner [Fri, 22 Mar 2024 18:56:36 +0000 (19:56 +0100)]
x86/topology: Don't evaluate logical IDs during early boot
The local APICs have not yet been enumerated so the logical ID evaluation
from the topology bitmaps does not work and would return an error code.
Skip the evaluation during the early boot CPUID evaluation and only apply
it on the final run.
Fixes: 380414be78bf ("x86/cpu/topology: Use topology logical mapping mechanism")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.186943142@linutronix.de
Thomas Gleixner [Fri, 22 Mar 2024 18:56:35 +0000 (19:56 +0100)]
x86/cpu: Ensure that CPU info updates are propagated on UP
The boot sequence evaluates CPUID information twice:
1) During early boot
2) When finalizing the early setup right before
mitigations are selected and alternatives are patched.
In both cases the evaluation is stored in boot_cpu_data, but on UP the
copying of boot_cpu_data to the per CPU info of the boot CPU happens
between #1 and #2. So any update which happens in #2 is never propagated to
the per CPU info instance.
Consolidate the whole logic and copy boot_cpu_data right before applying
alternatives as that's the point where boot_cpu_data is in it's final
state and not supposed to change anymore.
This also removes the voodoo mb() from smp_prepare_cpus_common() which
had absolutely no purpose.
Fixes: 71eb4893cfaf ("x86/percpu: Cure per CPU madness on UP")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.127642785@linutronix.de
Nathan Chancellor [Thu, 21 Mar 2024 20:18:17 +0000 (13:18 -0700)]
lkdtm/bugs: Improve warning message for compilers without counted_by support
The current message for telling the user that their compiler does not
support the counted_by attribute in the FAM_BOUNDS test does not make
much sense either grammatically or semantically. Fix it to make it
correct in both aspects.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20240321-lkdtm-improve-lack-of-counted_by-msg-v1-1-0fbf7481a29c@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Kees Cook [Wed, 6 Mar 2024 23:51:36 +0000 (15:51 -0800)]
overflow: Change DEFINE_FLEX to take __counted_by member
The norm should be flexible array structures with __counted_by
annotations, so DEFINE_FLEX() is updated to expect that. Rename
the non-annotated version to DEFINE_RAW_FLEX(), and update the
few existing users. Additionally add selftests for the macros.
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20240306235128.it.933-kees@kernel.org
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Linus Torvalds [Fri, 22 Mar 2024 20:31:07 +0000 (13:31 -0700)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"The vfs has long had a write lifetime hint mechanism that gives the
expected longevity on storage of the data being written. f2fs was the
original consumer of this and used the hint for flash data placement
(mostly to avoid write amplification by placing objects with similar
lifetimes in the same erase block).
More recently the SCSI based UFS (Universal Flash Storage) drivers
have wanted to take advantage of this as well, for the same reasons as
f2fs, necessitating plumbing the write hints through the block layer
and then adding it to the SCSI core.
The vfs write_hints already taken plumbs this as far as block and this
completes the SCSI core enabling based on a recently agreed reuse of
the old write command group number. The additions to the scsi_debug
driver are for emulating this property so we can run tests on it in
the absence of an actual UFS device"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: scsi_debug: Maintain write statistics per group number
scsi: scsi_debug: Implement GET STREAM STATUS
scsi: scsi_debug: Implement the IO Advice Hints Grouping mode page
scsi: scsi_debug: Allocate the MODE SENSE response from the heap
scsi: scsi_debug: Rework subpage code error handling
scsi: scsi_debug: Rework page code error handling
scsi: scsi_debug: Support the block limits extension VPD page
scsi: scsi_debug: Reduce code duplication
scsi: sd: Translate data lifetime information
scsi: scsi_proto: Add structures and constants related to I/O groups and streams
scsi: core: Query the Block Limits Extension VPD page
Linus Torvalds [Fri, 22 Mar 2024 19:46:07 +0000 (12:46 -0700)]
Merge tag 'block-6.9-
20240322' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:
- NVMe pull request via Keith:
- Make an informative message less ominous (Keith)
- Enhanced trace decoding (Guixin)
- TCP updates (Hannes, Li)
- Fabrics connect deadlock fix (Chunguang)
- Platform API migration update (Uwe)
- A new device quirk (Jiawei)
- Remove dead assignment in fd (Yufeng)
* tag 'block-6.9-
20240322' of git://git.kernel.dk/linux:
nvmet-rdma: remove NVMET_RDMA_REQ_INVALIDATE_RKEY flag
nvme: remove redundant BUILD_BUG_ON check
floppy: remove duplicated code in redo_fd_request()
nvme/tcp: Add wq_unbound modparam for nvme_tcp_wq
nvme-tcp: Export the nvme_tcp_wq to sysfs
drivers/nvme: Add quirks for device 126f:2262
nvme: parse format command's lbafu when tracing
nvme: add tracing of reservation commands
nvme: parse zns command's zsa and zrasf to string
nvme: use nvme_disk_is_ns_head helper
nvme: fix reconnection fail due to reserved tag allocation
nvmet: add tracing of zns commands
nvmet: add tracing of authentication commands
nvme-apple: Convert to platform remove callback returning void
nvmet-tcp: do not continue for invalid icreq
nvme: change shutdown timeout setting message
Linus Torvalds [Fri, 22 Mar 2024 19:42:55 +0000 (12:42 -0700)]
Merge tag 'io_uring-6.9-
20240322' of git://git.kernel.dk/linux
Pull more io_uring updates from Jens Axboe:
"One patch just missed the initial pull, the rest are either fixes or
small cleanups that make our life easier for the next kernel:
- Fix a potential leak in error handling of pinned pages, and clean
it up (Gabriel, Pavel)
- Fix an issue with how read multishot returns retry (me)
- Fix a problem with waitid/futex removals, if we hit the case of
needing to remove all of them at exit time (me)
- Fix for a regression introduced in this merge window, where we
don't always have sr->done_io initialized if the ->prep_async()
path is used (me)
- Fix for SQPOLL setup error handling (me)
- Fix for a poll removal request being delayed (Pavel)
- Rename of a struct member which had a confusing name (Pavel)"
* tag 'io_uring-6.9-
20240322' of git://git.kernel.dk/linux:
io_uring/sqpoll: early exit thread if task_context wasn't allocated
io_uring: clear opcode specific data for an early failure
io_uring/net: ensure async prep handlers always initialize ->done_io
io_uring/waitid: always remove waitid entry for cancel all
io_uring/futex: always remove futex entry for cancel all
io_uring: fix poll_remove stalled req completion
io_uring: Fix release of pinned pages when __io_uaddr_map fails
io_uring/kbuf: rename is_mapped
io_uring: simplify io_pages_free
io_uring: clean rings on NO_MMAP alloc fail
io_uring/rw: return IOU_ISSUE_SKIP_COMPLETE for multishot retry
io_uring: don't save/restore iowait state
Linus Torvalds [Fri, 22 Mar 2024 19:34:26 +0000 (12:34 -0700)]
Merge tag 'for-6.9/dm-fixes' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix a memory leak in DM integrity recheck code that was added during
the 6.9 merge. Also fix the recheck code to ensure it issues bios
with proper alignment.
- Fix DM snapshot's dm_exception_table_exit() to schedule while
handling an large exception table during snapshot device shutdown.
* tag 'for-6.9/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm-integrity: align the outgoing bio in integrity_recheck
dm snapshot: fix lockup in dm_exception_table_exit
dm-integrity: fix a memory leak when rechecking the data
Linus Torvalds [Fri, 22 Mar 2024 18:15:45 +0000 (11:15 -0700)]
Merge tag 'ceph-for-6.9-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"A patch to minimize blockage when processing very large batches of
dirty caps and two fixes to better handle EOF in the face of multiple
clients performing reads and size-extending writes at the same time"
* tag 'ceph-for-6.9-rc1' of https://github.com/ceph/ceph-client:
ceph: set correct cap mask for getattr request for read
ceph: stop copying to iter at EOF on sync reads
ceph: remove SLAB_MEM_SPREAD flag usage
ceph: break the check delayed cap loop every 5s
Linus Torvalds [Fri, 22 Mar 2024 18:12:21 +0000 (11:12 -0700)]
Merge tag 'xfs-6.9-merge-9' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Fix invalid pointer dereference by initializing xmbuf before
tracepoint function is invoked
- Use memalloc_nofs_save() when inserting into quota radix tree
* tag 'xfs-6.9-merge-9' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: quota radix tree allocations need to be NOFS on insert
xfs: fix dev_t usage in xmbuf tracepoints
Linus Torvalds [Fri, 22 Mar 2024 17:41:13 +0000 (10:41 -0700)]
Merge tag 'riscv-for-linus-6.9-mw2' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for various vector-accelerated crypto routines
- Hibernation is now enabled for portable kernel builds
- mmap_rnd_bits_max is larger on systems with larger VAs
- Support for fast GUP
- Support for membarrier-based instruction cache synchronization
- Support for the Andes hart-level interrupt controller and PMU
- Some cleanups around unaligned access speed probing and Kconfig
settings
- Support for ACPI LPI and CPPC
- Various cleanus related to barriers
- A handful of fixes
* tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (66 commits)
riscv: Fix syscall wrapper for >word-size arguments
crypto: riscv - add vector crypto accelerated AES-CBC-CTS
crypto: riscv - parallelize AES-CBC decryption
riscv: Only flush the mm icache when setting an exec pte
riscv: Use kcalloc() instead of kzalloc()
riscv/barrier: Add missing space after ','
riscv/barrier: Consolidate fence definitions
riscv/barrier: Define RISCV_FULL_BARRIER
riscv/barrier: Define __{mb,rmb,wmb}
RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ
cpufreq: Move CPPC configs to common Kconfig and add RISC-V
ACPI: RISC-V: Add CPPC driver
ACPI: Enable ACPI_PROCESSOR for RISC-V
ACPI: RISC-V: Add LPI driver
cpuidle: RISC-V: Move few functions to arch/riscv
riscv: Introduce set_compat_task() in asm/compat.h
riscv: Introduce is_compat_thread() into compat.h
riscv: add compile-time test into is_compat_task()
riscv: Replace direct thread flag check with is_compat_task()
riscv: Improve arch_get_mmap_end() macro
...
Linus Torvalds [Fri, 22 Mar 2024 17:22:45 +0000 (10:22 -0700)]
Merge tag 'loongarch-6.9' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Add objtool support for LoongArch
- Add ORC stack unwinder support for LoongArch
- Add kernel livepatching support for LoongArch
- Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig
- Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig
- Some bug fixes and other small changes
* tag 'loongarch-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch/crypto: Clean up useless assignment operations
LoongArch: Define the __io_aw() hook as mmiowb()
LoongArch: Remove superfluous flush_dcache_page() definition
LoongArch: Move {dmw,tlb}_virt_to_page() definition to page.h
LoongArch: Change __my_cpu_offset definition to avoid mis-optimization
LoongArch: Select HAVE_ARCH_USERFAULTFD_MINOR in Kconfig
LoongArch: Select ARCH_HAS_CURRENT_STACK_POINTER in Kconfig
LoongArch: Add kernel livepatching support
LoongArch: Add ORC stack unwinder support
objtool: Check local label in read_unwind_hints()
objtool: Check local label in add_dead_ends()
objtool/LoongArch: Enable orc to be built
objtool/x86: Separate arch-specific and generic parts
objtool/LoongArch: Implement instruction decoder
objtool/LoongArch: Enable objtool to be built
Linus Torvalds [Fri, 22 Mar 2024 17:09:08 +0000 (10:09 -0700)]
Merge tag 'fbdev-for-6.9-rc1' of git://git./linux/kernel/git/deller/linux-fbdev
Pull fbdev updates from Helge Deller:
- Allow console fonts up to 64x128 pixels (Samuel Thibault)
- Prevent division-by-zero in fb monitor code (Roman Smirnov)
- Drop Renesas ARM platforms from Mobile LCDC framebuffer driver (Geert
Uytterhoeven)
- Various code cleanups in viafb, uveafb and mb862xxfb drivers by
Aleksandr Burakov, Li Zhijian and Michael Ellerman
* tag 'fbdev-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
fbdev: panel-tpo-td043mtea1: Convert sprintf() to sysfs_emit()
fbmon: prevent division by zero in fb_videomode_from_videomode()
fbcon: Increase maximum font width x height to 64 x 128
fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2
fbdev: mb862xxfb: Fix defined but not used error
fbdev: uvesafb: Convert sprintf/snprintf to sysfs_emit
fbdev: Restrict FB_SH_MOBILE_LCDC to SuperH
Linus Torvalds [Fri, 22 Mar 2024 16:57:00 +0000 (09:57 -0700)]
Merge tag 'spi-fix-v6.9-merge-window' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A small collection of fixes that came in since the merge window. Most
of it is relatively minor driver specific fixes, there's also fixes
for error handling with SPI flash devices and a fix restoring delay
control functionality for non-GPIO chip selects managed by the core"
* tag 'spi-fix-v6.9-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-mt65xx: Fix NULL pointer access in interrupt handler
spi: docs: spidev: fix echo command format
spi: spi-imx: fix off-by-one in mx51 CPU mode burst length
spi: lm70llp: fix links in doc and comments
spi: Fix error code checking in spi_mem_exec_op()
spi: Restore delays for non-GPIO chip select
spi: lpspi: Avoid potential use-after-free in probe()
Linus Torvalds [Fri, 22 Mar 2024 16:52:37 +0000 (09:52 -0700)]
Merge tag 'regulator-fix-v6.9-merge-window' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One fix that came in during the merge window, fixing a problem with
bootstrapping the state of exclusive regulators which have a parent
regulator"
* tag 'regulator-fix-v6.9-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: Propagate the regulator state in case of exclusive get
Linus Torvalds [Fri, 22 Mar 2024 16:44:19 +0000 (09:44 -0700)]
Merge tag 'sound-fix2-6.9-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull more sound fixes from Takashi Iwai:
"The remaining fixes for 6.9-rc1 that have been gathered in this week.
More about ASoC at this time (one long-standing fix for compress
offload, SOF, AMD ACP, Rockchip, Cirrus and tlv320 stuff) while
another regression fix in ALSA core and a couple of HD-audio quirks as
usual are included"
* tag 'sound-fix2-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: control: Fix unannotated kfree() cleanup
ALSA: hda/realtek: Add quirks for some Clevo laptops
ALSA: hda/realtek: Add quirk for HP Spectre x360 14 eu0000
ALSA: hda/realtek: fix the hp playback volume issue for LG machines
ASoC: soc-compress: Fix and add DPCM locking
ASoC: SOF: amd: Skip IRAM/DRAM size modification for Steam Deck OLED
ASoC: SOF: amd: Move signed_fw_image to struct acp_quirk_entry
ASoC: amd: yc: Revert "add new YC platform variant (0x63) support"
ASoC: amd: yc: Revert "Fix non-functional mic on Lenovo 21J2"
ASoC: soc-core.c: Skip dummy codec when adding platforms
ASoC: rockchip: i2s-tdm: Fix inaccurate sampling rates
ASoC: dt-bindings: cirrus,cs42l43: Fix 'gpio-ranges' schema
ASoC: amd: yc: Fix non-functional mic on ASUS M7600RE
ASoC: tlv320adc3xxx: Don't strip remove function when driver is builtin
Linus Torvalds [Fri, 22 Mar 2024 16:39:11 +0000 (09:39 -0700)]
Merge tag 'i2c-for-6.9-rc1-part2' of git://git./linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
"Some more I2C updates after the dependencies have been merged now.
Plus a DT binding fix"
* tag 'i2c-for-6.9-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
dt-bindings: i2c: qcom,i2c-cci: Fix OV7251 'data-lanes' entries
i2c: muxes: pca954x: Allow sharing reset GPIO
i2c: nomadik: sort includes
i2c: nomadik: support Mobileye EyeQ5 I2C controller
i2c: nomadik: fetch i2c-transfer-timeout-us property from devicetree
i2c: nomadik: replace jiffies by ktime for FIFO flushing timeout
i2c: nomadik: support short xfer timeouts using waitqueue & hrtimer
i2c: nomadik: use bitops helpers
i2c: nomadik: simplify IRQ masking logic
i2c: nomadik: rename private struct pointers from dev to priv
dt-bindings: i2c: nomadik: add mobileye,eyeq5-i2c bindings and example
KONDO KAZUMA(近藤 和真) [Fri, 22 Mar 2024 10:47:02 +0000 (10:47 +0000)]
efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or higher address
Following warning is sometimes observed while booting my servers:
[ 3.594838] DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations
[ 3.602918] swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0-1
...
[ 3.851862] DMA: preallocated 1024 KiB GFP_KERNEL|GFP_DMA pool for atomic allocation
If 'nokaslr' boot option is set, the warning always happens.
On x86, ZONE_DMA is small zone at the first 16MB of physical address
space. When this problem happens, most of that space seems to be used by
decompressed kernel. Thereby, there is not enough space at DMA_ZONE to
meet the request of DMA pool allocation.
The commit
2f77465b05b1 ("x86/efistub: Avoid placing the kernel below
LOAD_PHYSICAL_ADDR") tried to fix this problem by introducing lower
bound of allocation.
But the fix is not complete.
efi_random_alloc() allocates pages by following steps.
1. Count total available slots ('total_slots')
2. Select a slot ('target_slot') to allocate randomly
3. Calculate a starting address ('target') to be included target_slot
4. Allocate pages, which starting address is 'target'
In step 1, 'alloc_min' is used to offset the starting address of memory
chunk. But in step 3 'alloc_min' is not considered at all. As the
result, 'target' can be miscalculated and become lower than 'alloc_min'.
When KASLR is disabled, 'target_slot' is always 0 and the problem
happens everytime if the EFI memory map of the system meets the
condition.
Fix this problem by calculating 'target' considering 'alloc_min'.
Cc: linux-efi@vger.kernel.org
Cc: Tom Englund <tomenglund26@gmail.com>
Cc: linux-kernel@vger.kernel.org
Fixes: 2f77465b05b1 ("x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR")
Signed-off-by: Kazuma Kondo <kazuma-kondo@nec.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Masami Hiramatsu (Google) [Thu, 14 Mar 2024 15:17:30 +0000 (00:17 +0900)]
kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address
Read from an unsafe address with copy_from_kernel_nofault() in
arch_adjust_kprobe_addr() because this function is used before checking
the address is in text or not. Syzcaller bot found a bug and reported
the case if user specifies inaccessible data area,
arch_adjust_kprobe_addr() will cause a kernel panic.
[ mingo: Clarified the comment. ]
Fixes: cc66bb914578 ("x86/ibt,kprobes: Cure sym+0 equals fentry woes")
Reported-by: Qiang Zhang <zzqq0103.hey@gmail.com>
Tested-by: Jinghao Jia <jinghao7@illinois.edu>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/171042945004.154897.2221804961882915806.stgit@devnote2
Anton Altaparmakov [Thu, 14 Mar 2024 14:26:56 +0000 (14:26 +0000)]
x86/pm: Work around false positive kmemleak report in msr_build_context()
Since:
7ee18d677989 ("x86/power: Make restore_processor_context() sane")
kmemleak reports this issue:
unreferenced object 0xf68241e0 (size 32):
comm "swapper/0", pid 1, jiffies
4294668610 (age 68.432s)
hex dump (first 32 bytes):
00 cc cc cc 29 10 01 c0 00 00 00 00 00 00 00 00 ....)...........
00 42 82 f6 cc cc cc cc cc cc cc cc cc cc cc cc .B..............
backtrace:
[<
461c1d50>] __kmem_cache_alloc_node+0x106/0x260
[<
ea65e13b>] __kmalloc+0x54/0x160
[<
c3858cd2>] msr_build_context.constprop.0+0x35/0x100
[<
46635aff>] pm_check_save_msr+0x63/0x80
[<
6b6bb938>] do_one_initcall+0x41/0x1f0
[<
3f3add60>] kernel_init_freeable+0x199/0x1e8
[<
3b538fde>] kernel_init+0x1a/0x110
[<
938ae2b2>] ret_from_fork+0x1c/0x28
Which is a false positive.
Reproducer:
- Run rsync of whole kernel tree (multiple times if needed).
- start a kmemleak scan
- Note this is just an example: a lot of our internal tests hit these.
The root cause is similar to the fix in:
b0b592cf0836 x86/pm: Fix false positive kmemleak report in msr_build_context()
ie. the alignment within the packed struct saved_context
which has everything unaligned as there is only "u16 gs;" at start of
struct where in the past there were four u16 there thus aligning
everything afterwards. The issue is with the fact that Kmemleak only
searches for pointers that are aligned (see how pointers are scanned in
kmemleak.c) so when the struct members are not aligned it doesn't see
them.
Testing:
We run a lot of tests with our CI, and after applying this fix we do not
see any kmemleak issues any more whilst without it we see hundreds of
the above report. From a single, simple test run consisting of 416 individual test
cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this,
which is quite a lot. With this fix applied we get zero kmemleak related failures.
Fixes: 7ee18d677989 ("x86/power: Make restore_processor_context() sane")
Signed-off-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: stable@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240314142656.17699-1-anton@tuxera.com
Dave Young [Fri, 22 Mar 2024 05:15:08 +0000 (13:15 +0800)]
x86/kexec: Do not update E820 kexec table for setup_data
crashkernel reservation failed on a Thinkpad t440s laptop recently.
Actually the memblock reservation succeeded, but later insert_resource()
failed.
Test steps:
kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */
kexec reboot ->
dmesg|grep "crashkernel reserved";
crashkernel memory range like below reserved successfully:
0x00000000d0000000 - 0x00000000da000000
But no such "Crash kernel" region in /proc/iomem
The background story:
Currently the E820 code reserves setup_data regions for both the current
kernel and the kexec kernel, and it inserts them into the resources list.
Before the kexec kernel reboots nobody passes the old setup_data, and
kexec only passes fresh SETUP_EFI/SETUP_IMA/SETUP_RNG_SEED if needed.
Thus the old setup data memory is not used at all.
Due to old kernel updates the kexec e820 table as well so kexec kernel
sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data
regions are inserted into resources list in the kexec kernel by
e820__reserve_resources().
Note, due to no setup_data is passed in for those old regions they are not
early reserved (by function early_reserve_memory), and the crashkernel
memblock reservation will just treat them as usable memory and it could
reserve the crashkernel region which overlaps with the old setup_data
regions. And just like the bug I noticed here, kdump insert_resource
failed because e820__reserve_resources has added the overlapped chunks
in /proc/iomem already.
Finally, looking at the code, the old setup_data regions are not used
at all as no setup_data is passed in by the kexec boot loader. Although
something like SETUP_PCI etc could be needed, kexec should pass
the info as new setup_data so that kexec kernel can take care of them.
This should be taken care of in other separate patches if needed.
Thus drop the useless buggy code here.
Signed-off-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Bohac <jbohac@suse.cz>
Cc: Eric DeVolder <eric.devolder@oracle.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/r/Zf0T3HCG-790K-pZ@darkstar.users.ipa.redhat.com
Linus Torvalds [Fri, 22 Mar 2024 02:14:28 +0000 (19:14 -0700)]
Merge tag '6.9-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Various get_inode_info_fixes
- Fix for querying xattrs of cached dirs
- Four minor cleanup fixes (including adding some header corrections
and a missing flag)
- Performance improvement for deferred close
- Two query interface fixes
* tag '6.9-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
smb311: additional compression flag defined in updated protocol spec
smb311: correct incorrect offset field in compression header
cifs: Move some extern decls from .c files to .h
cifs: remove redundant variable assignment
cifs: fixes for get_inode_info
cifs: open_cached_dir(): add FILE_READ_EA to desired access
cifs: reduce warning log level for server not advertising interfaces
cifs: make sure server interfaces are requested only for SMB3+
cifs: defer close file handles having RH lease
Linus Torvalds [Fri, 22 Mar 2024 02:04:31 +0000 (19:04 -0700)]
Merge tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Fixes from the last week (or 3 weeks in amdgpu case), after amdgpu,
it's xe and nouveau then a few scattered core fixes.
core:
- fix rounding in drm_fixp2int_round()
bridge:
- fix documentation for DRM_BRIDGE_OP_EDID
sun4i:
- fix 64-bit division on 32-bit architectures
tests:
- fix dependency on DRM_KMS_HELPER
probe-helper:
- never return negative values from .get_modes() plus driver fixes
xe:
- invalidate userptr vma on page pin fault
- fail early on sysfs file creation error
- skip VMA pinning on xe_exec if no batches
nouveau:
- clear bo resource bus after eviction
- documentation fixes
- don't check devinit disable on GSP
amdgpu:
- Freesync fixes
- UAF IOCTL fixes
- Fix mmhub client ID mapping
- IH 7.0 fix
- DML2 fixes
- VCN 4.0.6 fix
- GART bind fix
- GPU reset fix
- SR-IOV fix
- OD table handling fixes
- Fix TA handling on boards without display hardware
- DML1 fix
- ABM fix
- eDP panel fix
- DPPCLK fix
- HDCP fix
- Revert incorrect error case handling in ioremap
- VPE fix
- HDMI fixes
- SDMA 4.4.2 fix
- Other misc fixes
amdkfd:
- Fix duplicate BO handling in process restore"
* tag 'drm-next-2024-03-22' of https://gitlab.freedesktop.org/drm/kernel: (50 commits)
drm/amdgpu/pm: Don't use OD table on Arcturus
drm/amdgpu: drop setting buffer funcs in sdma442
drm/amd/display: Fix noise issue on HDMI AV mute
drm/amd/display: Revert Remove pixle rate limit for subvp
Revert "drm/amdgpu/vpe: don't emit cond exec command under collaborate mode"
Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in amdgpu_device_init()"
drm/amd/display: Add a dc_state NULL check in dc_state_release
drm/amd/display: Return the correct HDCP error code
drm/amd/display: Implement wait_for_odm_update_pending_complete
drm/amd/display: Lock all enabled otg pipes even with no planes
drm/amd/display: Amend coasting vtotal for replay low hz
drm/amd/display: Fix idle check for shared firmware state
drm/amd/display: Update odm when ODM combine is changed on an otg master pipe with no plane
drm/amd/display: Init DPPCLK from SMU on dcn32
drm/amd/display: Add monitor patch for specific eDP
drm/amd/display: Allow dirty rects to be sent to dmub when abm is active
drm/amd/display: Override min required DCFCLK in dml1_validate
drm/amdgpu: Bypass display ta if display hw is not available
drm/amdgpu: correct the KGQ fallback message
drm/amdgpu/pm: Check the validity of overdiver power limit
...
Dave Airlie [Fri, 22 Mar 2024 00:33:27 +0000 (10:33 +1000)]
Merge tag 'amd-drm-fixes-6.9-2024-03-21' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-fixes-6.9-2024-03-21:
amdgpu:
- Freesync fixes
- UAF IOCTL fixes
- Fix mmhub client ID mapping
- IH 7.0 fix
- DML2 fixes
- VCN 4.0.6 fix
- GART bind fix
- GPU reset fix
- SR-IOV fix
- OD table handling fixes
- Fix TA handling on boards without display hardware
- DML1 fix
- ABM fix
- eDP panel fix
- DPPCLK fix
- HDCP fix
- Revert incorrect error case handling in ioremap
- VPE fix
- HDMI fixes
- SDMA 4.4.2 fix
- Other misc fixes
amdkfd:
- Fix duplicate BO handling in process restore
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240321220514.1418288-1-alexander.deucher@amd.com
Linus Torvalds [Fri, 22 Mar 2024 00:21:41 +0000 (17:21 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Re-instate the CPUMASK_OFFSTACK option for arm64 when NR_CPUS > 256.
The bug that led to the initial revert was the cpufreq-dt code not
using zalloc_cpumask_var().
- Make the STARFIVE_STARLINK_PMU config option depend on 64BIT to
prevent compile-test failures on 32-bit architectures due to missing
writeq().
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
perf: starfive: fix 64-bit only COMPILE_TEST condition
ARM64: Dynamically allocate cpumasks and increase supported CPUs to 512
Linus Torvalds [Fri, 22 Mar 2024 00:16:46 +0000 (17:16 -0700)]
Merge tag 'rtc-6.9' of git://git./linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Subsytem:
- rtc_class is now const
Drivers:
- ds1511: cleanup, set date and time range and alarm offset limit
- max31335: fix interrupt handler
- pcf8523: improve suspend support"
* tag 'rtc-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (28 commits)
MAINTAINER: Include linux-arm-msm for Qualcomm RTC patches
dt-bindings: rtc: zynqmp: Add support for Versal/Versal NET SoCs
rtc: class: make rtc_class constant
dt-bindings: rtc: abx80x: Improve checks on trickle charger constraints
MAINTAINERS: adjust file entry in ARM/Mediatek RTC DRIVER
rtc: nct3018y: fix possible NULL dereference
rtc: max31335: fix interrupt status reg
rtc: mt6397: select IRQ_DOMAIN instead of depending on it
dt-bindings: rtc: abx80x: convert to yaml
rtc: m41t80: Use the unified property API get the wakeup-source property
dt-bindings: at91rm9260-rtt: add sam9x7 compatible
dt-bindings: rtc: convert MT7622 RTC to the json-schema
dt-bindings: rtc: convert MT2717 RTC to the json-schema
rtc: pcf8523: add suspend handlers for alarm IRQ
rtc: ds1511: set alarm offset limit
rtc: ds1511: set range
rtc: ds1511: drop inline/noinline hints
rtc: ds1511: rename pdata
rtc: ds1511: implement ds1511_rtc_read_alarm properly
rtc: ds1511: remove partial alarm support
...
Dave Airlie [Thu, 21 Mar 2024 23:57:22 +0000 (09:57 +1000)]
Merge tag 'drm-misc-next-fixes-2024-03-21' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
Short summary of fixes pull:
core:
- fix rounding in drm_fixp2int_round()
bridge:
- fix documentation for DRM_BRIDGE_OP_EDID
nouveau:
- don't check devinit disable on GSP
sun4i:
- fix 64-bit division on 32-bit architectures
tests:
- fix dependency on DRM_KMS_HELPER
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240321161948.GA30430@linux.fritz.box
Linus Torvalds [Thu, 21 Mar 2024 22:18:18 +0000 (15:18 -0700)]
Merge tag 'siox/for-6.9-rc1' of git://git./linux/kernel/git/ukleinek/linux
Pull siox updates from Uwe Kleine-König:
"This reworks how siox device registration works yielding a saner API.
This allows us to simplify the gpio bus driver using two new devm
functions"
* tag 'siox/for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
siox: bus-gpio: Simplify using devm_siox_* functions
siox: Provide a devm variant of siox_master_register()
siox: Provide a devm variant of siox_master_alloc()
siox: Don't pass the reference on a master in siox_master_register()