Uladzislau Rezki (Sony) [Tue, 16 Apr 2024 09:18:01 +0000 (11:18 +0200)]
Merge branches 'fixes.2024.04.15a', 'misc.2024.04.12a', 'rcu-sync-normal-improve.2024.04.15a', 'rcu-tasks.2024.04.15a' and 'rcutorture.2024.04.15a' into rcu-merge.2024.04.15a
fixes.2024.04.15a: RCU fixes
misc.2024.04.12a: Miscellaneous fixes
rcu-sync-normal-improve.2024.04.15a: Improving synchronize_rcu() call
rcu-tasks.2024.04.15a: Tasks RCU updates
rcutorture.2024.04.15a: Torture-test updates
Zqiang [Fri, 29 Mar 2024 04:52:45 +0000 (12:52 +0800)]
rcutorture: Use rcu_gp_slow_register/unregister() only for rcutype test
The rcu_gp_slow_register/unregister() is only useful in tests where
torture_type=rcu, so this commit therefore generates ->gp_slow_register()
and ->gp_slow_unregister() function pointers in the rcu_torture_ops
structure, and slows grace periods only when these function pointers
exist.
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Fri, 29 Mar 2024 02:51:34 +0000 (19:51 -0700)]
torture: Scale --do-kvfree test time
Currently, the torture.sh --do-kvfree testing is hard-coded to ten
minutes, ignoring the --duration argument. This commit therefore scales
this test duration the same as for the rcutorture tests.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Zqiang [Mon, 25 Mar 2024 07:52:19 +0000 (15:52 +0800)]
rcutorture: Fix invalid context warning when enable srcu barrier testing
When the torture_type is set srcu or srcud and cb_barrier is
non-zero, running the rcutorture test will trigger the
following warning:
[ 163.910989][ C1] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[ 163.910994][ C1] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
[ 163.910999][ C1] preempt_count: 10001, expected: 0
[ 163.911002][ C1] RCU nest depth: 0, expected: 0
[ 163.911005][ C1] INFO: lockdep is turned off.
[ 163.911007][ C1] irq event stamp: 30964
[ 163.911010][ C1] hardirqs last enabled at (30963): [<
ffffffffabc7df52>] do_idle+0x362/0x500
[ 163.911018][ C1] hardirqs last disabled at (30964): [<
ffffffffae616eff>] sysvec_call_function_single+0xf/0xd0
[ 163.911025][ C1] softirqs last enabled at (0): [<
ffffffffabb6475f>] copy_process+0x16ff/0x6580
[ 163.911033][ C1] softirqs last disabled at (0): [<
0000000000000000>] 0x0
[ 163.911038][ C1] Preemption disabled at:
[ 163.911039][ C1] [<
ffffffffacf1964b>] stack_depot_save_flags+0x24b/0x6c0
[ 163.911063][ C1] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 6.8.0-rc4-rt4-yocto-preempt-rt+ #3
1e39aa9a737dd024a3275c4f835a872f673a7d3a
[ 163.911071][ C1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 163.911075][ C1] Call Trace:
[ 163.911078][ C1] <IRQ>
[ 163.911080][ C1] dump_stack_lvl+0x88/0xd0
[ 163.911089][ C1] dump_stack+0x10/0x20
[ 163.911095][ C1] __might_resched+0x36f/0x530
[ 163.911105][ C1] rt_spin_lock+0x82/0x1c0
[ 163.911112][ C1] spin_lock_irqsave_ssp_contention+0xb8/0x100
[ 163.911121][ C1] srcu_gp_start_if_needed+0x782/0xf00
[ 163.911128][ C1] ? _raw_spin_unlock_irqrestore+0x46/0x70
[ 163.911136][ C1] ? debug_object_active_state+0x336/0x470
[ 163.911148][ C1] ? __pfx_srcu_gp_start_if_needed+0x10/0x10
[ 163.911156][ C1] ? __pfx_lock_release+0x10/0x10
[ 163.911165][ C1] ? __pfx_rcu_torture_barrier_cbf+0x10/0x10
[ 163.911188][ C1] __call_srcu+0x9f/0xe0
[ 163.911196][ C1] call_srcu+0x13/0x20
[ 163.911201][ C1] srcu_torture_call+0x1b/0x30
[ 163.911224][ C1] rcu_torture_barrier1cb+0x4a/0x60
[ 163.911247][ C1] __flush_smp_call_function_queue+0x267/0xca0
[ 163.911256][ C1] ? __pfx_rcu_torture_barrier1cb+0x10/0x10
[ 163.911281][ C1] generic_smp_call_function_single_interrupt+0x13/0x20
[ 163.911288][ C1] __sysvec_call_function_single+0x7d/0x280
[ 163.911295][ C1] sysvec_call_function_single+0x93/0xd0
[ 163.911302][ C1] </IRQ>
[ 163.911304][ C1] <TASK>
[ 163.911308][ C1] asm_sysvec_call_function_single+0x1b/0x20
[ 163.911313][ C1] RIP: 0010:default_idle+0x17/0x20
[ 163.911326][ C1] RSP: 0018:
ffff888001997dc8 EFLAGS:
00000246
[ 163.911333][ C1] RAX:
0000000000000000 RBX:
dffffc0000000000 RCX:
ffffffffae618b51
[ 163.911337][ C1] RDX:
0000000000000000 RSI:
ffffffffaea80920 RDI:
ffffffffaec2de80
[ 163.911342][ C1] RBP:
ffff888001997dc8 R08:
0000000000000001 R09:
ffffed100d740cad
[ 163.911346][ C1] R10:
ffffed100d740cac R11:
ffff88806ba06563 R12:
0000000000000001
[ 163.911350][ C1] R13:
ffffffffafe460c0 R14:
ffffffffafe460c0 R15:
0000000000000000
[ 163.911358][ C1] ? ct_kernel_exit.constprop.3+0x121/0x160
[ 163.911369][ C1] ? lockdep_hardirqs_on+0xc4/0x150
[ 163.911376][ C1] arch_cpu_idle+0x9/0x10
[ 163.911383][ C1] default_idle_call+0x7a/0xb0
[ 163.911390][ C1] do_idle+0x362/0x500
[ 163.911398][ C1] ? __pfx_do_idle+0x10/0x10
[ 163.911404][ C1] ? complete_with_flags+0x8b/0xb0
[ 163.911416][ C1] cpu_startup_entry+0x58/0x70
[ 163.911423][ C1] start_secondary+0x221/0x280
[ 163.911430][ C1] ? __pfx_start_secondary+0x10/0x10
[ 163.911440][ C1] secondary_startup_64_no_verify+0x17f/0x18b
[ 163.911455][ C1] </TASK>
This commit therefore use smp_call_on_cpu() instead of
smp_call_function_single(), make rcu_torture_barrier1cb() invoked
happens on task-context.
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Zqiang [Thu, 21 Mar 2024 08:28:50 +0000 (16:28 +0800)]
rcutorture: Make stall-tasks directly exit when rcutorture tests end
When the rcutorture tests start to exit, the rcu_torture_cleanup() is
invoked to stop kthreads and release resources, if the stall-task
kthreads exist, cpu-stall has started and the rcutorture.stall_cpu
is set to a larger value, the rcu_torture_cleanup() will be blocked
for a long time and the hung-task may occur, this commit therefore
add kthread_should_stop() to the loop of cpu-stall operation, when
rcutorture tests ends, no need to wait for cpu-stall to end, exit
directly.
Use the following command to test:
insmod rcutorture.ko torture_type=srcu fwd_progress=0 stat_interval=4
stall_cpu_block=1 stall_cpu=200 stall_cpu_holdoff=10 read_exit_burst=0
object_debug=1
rmmod rcutorture
[15361.918610] INFO: task rmmod:878 blocked for more than 122 seconds.
[15361.918613] Tainted: G W
6.8.0-rc2-yoctodev-standard+ #25
[15361.918615] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[15361.918616] task:rmmod state:D stack:0 pid:878
tgid:878 ppid:773 flags:0x00004002
[15361.918621] Call Trace:
[15361.918623] <TASK>
[15361.918626] __schedule+0xc0d/0x28f0
[15361.918631] ? __pfx___schedule+0x10/0x10
[15361.918635] ? rcu_is_watching+0x19/0xb0
[15361.918638] ? schedule+0x1f6/0x290
[15361.918642] ? __pfx_lock_release+0x10/0x10
[15361.918645] ? schedule+0xc9/0x290
[15361.918648] ? schedule+0xc9/0x290
[15361.918653] ? trace_preempt_off+0x54/0x100
[15361.918657] ? schedule+0xc9/0x290
[15361.918661] schedule+0xd0/0x290
[15361.918665] schedule_timeout+0x56d/0x7d0
[15361.918669] ? debug_smp_processor_id+0x1b/0x30
[15361.918672] ? rcu_is_watching+0x19/0xb0
[15361.918676] ? __pfx_schedule_timeout+0x10/0x10
[15361.918679] ? debug_smp_processor_id+0x1b/0x30
[15361.918683] ? rcu_is_watching+0x19/0xb0
[15361.918686] ? wait_for_completion+0x179/0x4c0
[15361.918690] ? __pfx_lock_release+0x10/0x10
[15361.918693] ? __kasan_check_write+0x18/0x20
[15361.918696] ? wait_for_completion+0x9d/0x4c0
[15361.918700] ? _raw_spin_unlock_irq+0x36/0x50
[15361.918703] ? wait_for_completion+0x179/0x4c0
[15361.918707] ? _raw_spin_unlock_irq+0x36/0x50
[15361.918710] ? wait_for_completion+0x179/0x4c0
[15361.918714] ? trace_preempt_on+0x54/0x100
[15361.918718] ? wait_for_completion+0x179/0x4c0
[15361.918723] wait_for_completion+0x181/0x4c0
[15361.918728] ? __pfx_wait_for_completion+0x10/0x10
[15361.918738] kthread_stop+0x152/0x470
[15361.918742] _torture_stop_kthread+0x44/0xc0 [torture
7af7f9cbba28271a10503b653f9e05d518fbc8c3]
[15361.918752] rcu_torture_cleanup+0x2ac/0xe90 [rcutorture
f2cb1f556ee7956270927183c4c2c7749a336529]
[15361.918766] ? __pfx_rcu_torture_cleanup+0x10/0x10 [rcutorture
f2cb1f556ee7956270927183c4c2c7749a336529]
[15361.918777] ? __kasan_check_write+0x18/0x20
[15361.918781] ? __mutex_unlock_slowpath+0x17c/0x670
[15361.918789] ? __might_fault+0xcd/0x180
[15361.918793] ? find_module_all+0x104/0x1d0
[15361.918799] __x64_sys_delete_module+0x2a4/0x3f0
[15361.918803] ? __pfx___x64_sys_delete_module+0x10/0x10
[15361.918807] ? syscall_exit_to_user_mode+0x149/0x280
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Zqiang [Mon, 18 Mar 2024 09:34:12 +0000 (17:34 +0800)]
rcutorture: Removing redundant function pointer initialization
For these rcu_torture_ops structure's objects defined by using static,
if the value of the function pointer in its member is not set, the default
value will be NULL, this commit therefore remove the pre-existing
initialization of function pointers to NULL.
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Zqiang [Mon, 18 Mar 2024 09:34:11 +0000 (17:34 +0800)]
rcutorture: Make rcutorture support print rcu-tasks gp state
This commit make rcu-tasks related rcutorture test support rcu-tasks
gp state printing when the writer stall occurs or the at the end of
rcutorture test, and generate rcu_ops->get_gp_data() operation to
simplify the acquisition of gp state for different types of rcutorture
tests.
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Zqiang [Fri, 15 Mar 2024 07:17:10 +0000 (15:17 +0800)]
rcutorture: Use the gp_kthread_dbg operation specified by cur_ops
Despite there being a cur_ops->gp_kthread_dbg(), rcu_torture_writer()
unconditionally invokes vanilla RCU's show_rcu_gp_kthreads(). This is not
at all helpful when some other flavor of RCU is being tested. This commit
therefore makes rcu_torture_writer() invoke cur_ops->gp_kthread_dbg()
for RCU implementations providing this function.
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
linke li [Thu, 7 Mar 2024 03:51:10 +0000 (19:51 -0800)]
rcutorture: Re-use value stored to ->rtort_pipe_count instead of re-reading
Currently, the rcu_torture_pipe_update_one() writes the value (i + 1)
to rp->rtort_pipe_count, then immediately re-reads it in order to compare
it to RCU_TORTURE_PIPE_LEN. This re-read is pointless because no other
update to rp->rtort_pipe_count can occur at this point. This commit
therefore instead re-uses the (i + 1) value stored in the comparison
instead of re-reading rp->rtort_pipe_count.
Signed-off-by: linke li <lilinke99@qq.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Thu, 7 Mar 2024 03:21:47 +0000 (19:21 -0800)]
rcutorture: Fix rcu_torture_one_read() pipe_count overflow comment
The "pipe_count > RCU_TORTURE_PIPE_LEN" check has a comment saying "Should
not happen, but...". This is only true when testing an RCU whose grace
periods are always long enough. This commit therefore fixes this comment.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/lkml/CAHk-=wi7rJ-eGq+xaxVfzFEgbL9tdf6Kc8Z89rCpfcQOKm74Tw@mail.gmail.com/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Thu, 7 Mar 2024 03:14:46 +0000 (19:14 -0800)]
rcutorture: Remove extraneous rcu_torture_pipe_update_one() READ_ONCE()
The rcu_torture_pipe_update_one() cannot run concurrently with any updates
of ->rtort_pipe_count, so this commit removes the extraneous READ_ONCE()
from the read from this field.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/lkml/CAHk-=wiX_zF5Mpt8kUm_LFQpYY-mshrXJPOe+wKNwiVhEUcU9g@mail.gmail.com/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Uladzislau Rezki (Sony) [Fri, 8 Mar 2024 17:34:09 +0000 (18:34 +0100)]
rcu: Allocate WQ with WQ_MEM_RECLAIM bit set
synchronize_rcu() users have to be processed regardless
of memory pressure so our private WQ needs to have at least
one execution context what WQ_MEM_RECLAIM flag guarantees.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Uladzislau Rezki (Sony) [Fri, 8 Mar 2024 17:34:07 +0000 (18:34 +0100)]
rcu: Support direct wake-up of synchronize_rcu() users
This patch introduces a small enhancement which allows to do a
direct wake-up of synchronize_rcu() callers. It occurs after a
completion of grace period, thus by the gp-kthread.
Number of clients is limited by the hard-coded maximum allowed
threshold. The remaining part, if still exists is deferred to
a main worker.
Link: https://lore.kernel.org/lkml/Zd0ZtNu+Rt0qXkfS@lothringen/
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Uladzislau Rezki (Sony) [Fri, 8 Mar 2024 17:34:06 +0000 (18:34 +0100)]
rcu: Add a trace event for synchronize_rcu_normal()
Add an rcu_sr_normal() trace event. It takes three arguments
first one is the name of RCU flavour, second one is a user id
which triggeres synchronize_rcu_normal() and last one is an
event.
There are two traces in the synchronize_rcu_normal(). On entry,
when a new request is registered and on exit point when request
is completed.
Please note, CONFIG_RCU_TRACE=y is required to activate traces.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Uladzislau Rezki (Sony) [Fri, 8 Mar 2024 17:34:05 +0000 (18:34 +0100)]
rcu: Reduce synchronize_rcu() latency
A call to a synchronize_rcu() can be optimized from a latency
point of view. Workloads which depend on this can benefit of it.
The delay of wakeme_after_rcu() callback, which unblocks a waiter,
depends on several factors:
- how fast a process of offloading is started. Combination of:
- !CONFIG_RCU_NOCB_CPU/CONFIG_RCU_NOCB_CPU;
- !CONFIG_RCU_LAZY/CONFIG_RCU_LAZY;
- other.
- when started, invoking path is interrupted due to:
- time limit;
- need_resched();
- if limit is reached.
- where in a nocb list it is located;
- how fast previous callbacks completed;
Example:
1. On our embedded devices i can easily trigger the scenario when
it is a last in the list out of ~3600 callbacks:
<snip>
<...>-29 [001] d..1. 21950.145313: rcu_batch_start: rcu_preempt CBs=3613 bl=28
...
<...>-29 [001] ..... 21950.152578: rcu_invoke_callback: rcu_preempt rhp=
00000000b2d6dee8 func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152579: rcu_invoke_callback: rcu_preempt rhp=
00000000a446f607 func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152580: rcu_invoke_callback: rcu_preempt rhp=
00000000a5cab03b func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152581: rcu_invoke_callback: rcu_preempt rhp=
0000000013b7e5ee func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152582: rcu_invoke_callback: rcu_preempt rhp=
000000000a8ca6f9 func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152583: rcu_invoke_callback: rcu_preempt rhp=
000000008f162ca8 func=wakeme_after_rcu.cfi_jt
<...>-29 [001] d..1. 21950.152625: rcu_batch_end: rcu_preempt CBs-invoked=3612 idle=....
<snip>
2. We use cpuset/cgroup to classify tasks and assign them into
different cgroups. For example "backgrond" group which binds tasks
only to little CPUs or "foreground" which makes use of all CPUs.
Tasks can be migrated between groups by a request if an acceleration
is needed.
See below an example how "surfaceflinger" task gets migrated.
Initially it is located in the "system-background" cgroup which
allows to run only on little cores. In order to speed it up it
can be temporary moved into "foreground" cgroup which allows
to use big/all CPUs:
cgroup_attach_task():
-> cgroup_migrate_execute()
-> cpuset_can_attach()
-> percpu_down_write()
-> rcu_sync_enter()
-> synchronize_rcu()
-> now move tasks to the new cgroup.
-> cgroup_migrate_finish()
<snip>
rcuop/1-29 [000] ..... 7030.528570: rcu_invoke_callback: rcu_preempt rhp=
00000000461605e0 func=wakeme_after_rcu.cfi_jt
PERFD-SERVER-1855 [000] d..1. 7030.530293: cgroup_attach_task: dst_root=3 dst_id=22 dst_level=1 dst_path=/foreground pid=1900 comm=surfaceflinger
TimerDispatch-2768 [002] d..5. 7030.537542: sched_migrate_task: comm=surfaceflinger pid=1900 prio=98 orig_cpu=0 dest_cpu=4
<snip>
"Boosting a task" depends on synchronize_rcu() latency:
- first trace shows a completion of synchronize_rcu();
- second shows attaching a task to a new group;
- last shows a final step when migration occurs.
3. To address this drawback, maintain a separate track that consists
of synchronize_rcu() callers only. After completion of a grace period
users are deferred to a dedicated worker to process requests.
4. This patch reduces the latency of synchronize_rcu() approximately
by ~30-40% on synthetic tests. The real test case, camera launch time,
shows(time is in milliseconds):
1-run 542 vs 489 improvement 9%
2-run 540 vs 466 improvement 13%
3-run 518 vs 468 improvement 9%
4-run 531 vs 457 improvement 13%
5-run 548 vs 475 improvement 13%
6-run 509 vs 484 improvement 4%
Synthetic test(no "noise" from other callbacks):
Hardware: x86_64 64 CPUs, 64GB of memory
Linux-6.6
- 10K tasks(simultaneous);
- each task does(1000 loops)
synchronize_rcu();
kfree(p);
default: CONFIG_RCU_NOCB_CPU: takes 54 seconds to complete all users;
patch: CONFIG_RCU_NOCB_CPU: takes 35 seconds to complete all users.
Running 60K gives approximately same results on my setup. Please note
it is without any interaction with another type of callbacks, otherwise
it will impact a lot a default case.
5. By default it is disabled. To enable this perform one of the
below sequence:
echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
or pass a boot parameter "rcutree.rcu_normal_wake_from_gp=1"
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Co-developed-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Nikita Kiryushin [Mon, 1 Apr 2024 19:43:15 +0000 (22:43 +0300)]
rcu: Fix buffer overflow in print_cpu_stall_info()
The rcuc-starvation output from print_cpu_stall_info() might overflow the
buffer if there is a huge difference in jiffies difference. The situation
might seem improbable, but computers sometimes get very confused about
time, which can result in full-sized integers, and, in this case,
buffer overflow.
Also, the unsigned jiffies difference is printed using %ld, which is
normally for signed integers. This is intentional for debugging purposes,
but it is not obvious from the code.
This commit therefore changes sprintf() to snprintf() and adds a
clarifying comment about intention of %ld format.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 245a62982502 ("rcu: Dump rcuc kthread status for CPUs not reporting quiescent state")
Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Johannes Berg [Mon, 25 Mar 2024 17:39:08 +0000 (18:39 +0100)]
rcu: Mollify sparse with RCU guard
When using "guard(rcu)();" sparse will complain, because even
though it now understands the cleanup attribute, it doesn't
evaluate the calls from it at function exit, and thus doesn't
count the context correctly.
Given that there's a conditional in the resulting code:
static inline void class_rcu_destructor(class_rcu_t *_T)
{
if (_T->lock) {
rcu_read_unlock();
}
}
it seems that even trying to teach sparse to evalulate the
cleanup attribute function it'd still be difficult to really
make it understand the full context here.
Suppress the sparse warning by just releasing the context in
the acquisition part of the function, after all we know it's
safe with the guard, that's the whole point of it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Nikita Kiryushin [Wed, 27 Mar 2024 17:47:47 +0000 (20:47 +0300)]
rcu-tasks: Fix show_rcu_tasks_trace_gp_kthread buffer overflow
There is a possibility of buffer overflow in
show_rcu_tasks_trace_gp_kthread() if counters, passed
to sprintf() are huge. Counter numbers, needed for this
are unrealistically high, but buffer overflow is still
possible.
Use snprintf() with buffer size instead of sprintf().
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: edf3775f0ad6 ("rcu-tasks: Add count for idle tasks on offline CPUs")
Signed-off-by: Nikita Kiryushin <kiryushin@ancud.ru>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Zqiang [Mon, 26 Feb 2024 03:24:39 +0000 (11:24 +0800)]
rcu-tasks: Fix the comments for tasks_rcu_exit_srcu_stall_timer
The synchronize_srcu() has been removed by commit("rcu-tasks: Eliminate
deadlocks involving do_exit() and RCU tasks") in rcu_tasks_postscan.
This commit therefore fixes the tasks_rcu_exit_srcu_stall_timer comment.
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Sat, 24 Feb 2024 01:07:24 +0000 (17:07 -0800)]
rcu-tasks: Replace exit_tasks_rcu_start() initialization with WARN_ON_ONCE()
Because the Tasks RCU ->rtp_exit_list is initialized at rcu_init()
time while there is only one CPU running with interrupts disabled, it
is not possible for an exiting task to encounter an uninitialized list.
This commit therefore replaces the conditional initialization with
a WARN_ON_ONCE().
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Closes: https://lore.kernel.org/all/ZdiNXmO3wRvmzPsr@lothringen/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Sat, 16 Mar 2024 06:29:59 +0000 (23:29 -0700)]
rcu: Remove redundant CONFIG_PROVE_RCU #if condition
The #if condition controlling the rcu_preempt_sleep_check() definition
has a redundant check for CONFIG_PREEMPT_RCU, which is already checked
for by an enclosing #ifndef. This commit therefore removes this redundant
condition from the inner #if.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Fri, 8 Mar 2024 21:26:26 +0000 (13:26 -0800)]
rcu: Inform KCSAN of one-byte cmpxchg() in rcu_trc_cmpxchg_need_qs()
Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists.
Therefore, rcu_trc_cmpxchg_need_qs() emulates one using field substitution
and a four-byte cmpxchg(), such that the other three bytes are always
atomically updated to their old values. This works, but results in
false-positive KCSAN failures because as far as KCSAN knows, this
cmpxchg() operation is updating all four bytes.
This commit therefore encloses the cmpxchg() in a data_race() and adds
a single-byte instrument_atomic_read_write(), thus telling KCSAN exactly
what is going on so as to avoid the false positives.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Fri, 8 Mar 2024 19:15:01 +0000 (11:15 -0800)]
rcu: Make hotplug operations track GP state, not flags
Currently, there are rcu_data structure fields named ->rcu_onl_gp_seq
and ->rcu_ofl_gp_seq that track the rcu_state.gp_flags field at the
time of the corresponding CPU's last online or offline operation,
respectively. However, this information is not particularly useful.
It would be better to instead track the grace period state kept
in rcu_state.gp_state. This would also be consistent with the
initialization in rcu_boot_init_percpu_data(), which is to RCU_GP_CLEANED
(an rcu_state.gp_state value), and also with the diagnostics in
rcu_implicit_dynticks_qs(), whose format is consistent with an integer,
not a bitmask.
This commit therefore makes this change and changes the names to
->rcu_onl_gp_flags and ->rcu_ofl_gp_flags, respectively.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Fri, 8 Mar 2024 04:14:55 +0000 (20:14 -0800)]
rcu: Mark loads from rcu_state.n_online_cpus
The rcu_state.n_online_cpus value is only ever updated by CPU-hotplug
operations, which are serialized. However, this value is read locklessly.
This commit therefore marks those reads. While in the area, it also
adds ASSERT_EXCLUSIVE_WRITER() calls just in case parallel CPU hotplug
becomes a thing.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Fri, 8 Mar 2024 04:11:21 +0000 (20:11 -0800)]
rcu: Mark writes to rcu_sync ->gp_count field
The rcu_sync structure's ->gp_count field is updated under the protection
of ->rss_lock, but read locklessly, and KCSAN noted the data race.
This commit therefore uses WRITE_ONCE() to do this update to clearly
document its racy nature.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Thu, 7 Mar 2024 23:06:19 +0000 (15:06 -0800)]
rcu: Bring diagnostic read of rcu_state.gp_flags into alignment
This commit adds READ_ONCE() to a lockless diagnostic read from
rcu_state.gp_flags to avoid giving the compiler any chance whatsoever
of confusing the diagnostic state printed.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Thu, 7 Mar 2024 23:01:54 +0000 (15:01 -0800)]
rcu: Remove redundant READ_ONCE() of rcu_state.gp_flags in tree.c
Although it is functionally OK to do READ_ONCE() of a variable that
cannot change, it is confusing and at best an accident waiting to happen.
This commit therefore removes a number of READ_ONCE(rcu_state.gp_flags)
instances from kernel/rcu/tree.c that are not needed due to updates
to this field being excluded by virtue of holding the root rcu_node
structure's ->lock.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes: https://lore.kernel.org/lkml/4857c5ef-bd8f-4670-87ac-0600a1699d05@paulmck-laptop/T/#mccb23c2a4902da4d3c750165329f8de056903c58
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/lkml/4857c5ef-bd8f-4670-87ac-0600a1699d05@paulmck-laptop/T/#md1b5c026584f9c3c7b0fbc9240dd7de584597b73
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Mon, 4 Mar 2024 23:33:33 +0000 (15:33 -0800)]
rcu: Make Tiny RCU explicitly disable preemption
Because Tiny RCU is used only in kernels built with either
CONFIG_PREEMPT_NONE=y or CONFIG_PREEMPT_VOLUNTARY=y, there has not been
any need for TINY RCU to explicitly disable preemption. However, the
prospect of lazy preemption changes that, and preemption means that
the non-atomic increment in synchronize_rcu() can be preempted, with
the possibility that one of the increments is lost. This could cause
failures for users of the APIs that poll RCU grace periods.
This commit therefore adds the needed preempt_disable() and
preempt_enable() call to Tiny RCU.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Sat, 2 Mar 2024 01:16:53 +0000 (17:16 -0800)]
rcu: Remove redundant BH disabling in TINY_RCU
The TINY_RCU rcu_process_callbacks() function is only ever invoked from
a softirq handler, which means that BH is already disabled. This commit
therefore removes the redundant local_bh_disable() and local_bh_ennable()
from this function.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Thu, 22 Feb 2024 18:09:19 +0000 (10:09 -0800)]
rcu: Create NEED_TASKS_RCU to factor out enablement logic
Currently, if a Kconfig option depends on TASKS_RCU, it conditionally does
"select TASKS_RCU if PREEMPTION". This works, but requires any change in
this enablement logic to be replicated across all such "select" clauses.
This commit therefore creates a new NEED_TASKS_RCU Kconfig option so
that the default value of TASKS_RCU can depend on a combination of this
new option and any needed enablement logic, so that this logic is in
one place.
While in the area, also anticipate a likely future change by adding
PREEMPT_AUTO to that logic.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Thu, 15 Feb 2024 17:04:30 +0000 (09:04 -0800)]
srcu: Make Tiny SRCU explicitly disable preemption
Because Tiny SRCU is used only in kernels built with either
CONFIG_PREEMPT_NONE=y or CONFIG_PREEMPT_VOLUNTARY=y, there has not
been any need for TINY SRCU to explicitly disable preemption. However,
the prospect of lazy preemption changes that, and the lazy-preemption
patches do result in rcutorture runs finding both too-short grace periods
and grace-period hangs for Tiny SRCU.
This commit therefore adds the needed preempt_disable() and
preempt_enable() calls to Tiny SRCU.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Wed, 14 Feb 2024 23:33:55 +0000 (15:33 -0800)]
rcu: Make TINY_RCU depend on !PREEMPT_RCU rather than !PREEMPTION
Right now, TINY_RCU depends on (!PREEMPTION && !SMP), which has served the
kernel well for many years due to the fact that PREEMPT_RCU is normally
a synonym for PREEMPTION. But with the advent of lazy preemption,
it will be possible to have non-preemptible RCU in a preemptible kernel,
so that kernels could be built with PREEMPT_RCU=n and PREEMPTION=y.
This commit therefore makes TINY_RCU depend on (!PREEMPT_RCU && !SMP),
thus allowing for a non-preemptible RCU in preemptible kernels.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Uladzislau Rezki (Sony) [Wed, 14 Feb 2024 22:45:53 +0000 (14:45 -0800)]
rcu: Update lockdep while in RCU read-side critical section
With Ankur's lazy-/auto-preemption patches applied and with a
lazy-preemptible kernel in combination with a non-preemptible RCU,
lockdep sometimes complains about context switches within RCU read-side
critical sections. This is a false positive due to rcu_read_unlock()
updating lockdep state too late:
__release(RCU);
__rcu_read_unlock();
// Context switch here results in lockdep false positive!!!
rcu_lock_release(&rcu_lock_map); /* Keep acq info for rls diags. */
Although this complaint could also happen with preemptible RCU
in a preemptible kernel, the odds of that happening aer quite low.
In constrast, with non-preemptible RCU, a long critical section has a
high probability of performing a context switch from the preempt_enable()
in __rcu_read_unlock().
The fix is straightforward, just move the rcu_lock_release()
within rcu_read_unlock() to obtain the reverse order from that of
rcu_read_lock():
rcu_lock_release(&rcu_lock_map); /* Keep acq info for rls diags. */
__release(RCU);
__rcu_read_unlock();
This commit makes this change.
Co-developed-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Co-developed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Paul E. McKenney [Wed, 28 Feb 2024 19:30:43 +0000 (11:30 -0800)]
ftrace: Choose RCU Tasks based on TASKS_RCU rather than PREEMPTION
The advent of CONFIG_PREEMPT_AUTO, AKA lazy preemption, will mean that
even kernels built with CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY
might see the occasional preemption, and that this preemption just might
happen within a trampoline.
Therefore, update ftrace_shutdown() to invoke synchronize_rcu_tasks()
based on CONFIG_TASKS_RCU instead of CONFIG_PREEMPTION.
[ paulmck: Apply Steven Rostedt feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <linux-trace-kernel@vger.kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Wed, 28 Feb 2024 19:17:28 +0000 (11:17 -0800)]
bpf: Choose RCU Tasks based on TASKS_RCU rather than PREEMPTION
The advent of CONFIG_PREEMPT_AUTO, AKA lazy preemption, will mean that
even kernels built with CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY
might see the occasional preemption, and that this preemption just might
happen within a trampoline.
Therefore, update bpf_tramp_image_put() to choose call_rcu_tasks()
based on CONFIG_TASKS_RCU instead of CONFIG_PREEMPTION.
This change might enable further simplifications, but the goal of this
effort is to make the code safe, not necessarily optimal.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <bpf@vger.kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Thu, 22 Feb 2024 18:25:44 +0000 (10:25 -0800)]
tracing: Select new NEED_TASKS_RCU Kconfig option
Currently, if a Kconfig option depends on TASKS_RCU, it conditionally does
"select TASKS_RCU if PREEMPTION". This works, but requires any change in
this enablement logic to be replicated across all such "select" clauses.
A new NEED_TASKS_RCU Kconfig option has been created to allow this
enablement logic to be in one place in kernel/rcu/Kconfig.
Therefore, select the new NEED_TASKS_RCU Kconfig option instead of the
old TASKS_RCU option.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: <linux-trace-kernel@vger.kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Uladzislau Rezki (Sony) [Fri, 8 Mar 2024 17:34:04 +0000 (18:34 +0100)]
rcu: Add data structures for synchronize_rcu()
The synchronize_rcu() call is going to be reworked, thus
this patch adds dedicated fields into the rcu_state structure.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Thu, 22 Feb 2024 18:22:27 +0000 (10:22 -0800)]
arch: Select new NEED_TASKS_RCU Kconfig option
Currently, if a Kconfig option depends on TASKS_RCU, it conditionally does
"select TASKS_RCU if PREEMPTION". This works, but requires any change in
this enablement logic to be replicated across all such "select" clauses.
A new NEED_TASKS_RCU Kconfig option has been created to allow this
enablement logic to be in one place in kernel/rcu/Kconfig.
Therefore, select the new NEED_TASKS_RCU Kconfig option instead of the
old TASKS_RCU option.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Thu, 22 Feb 2024 18:16:17 +0000 (10:16 -0800)]
bpf: Select new NEED_TASKS_RCU Kconfig option
Currently, if a Kconfig option depends on TASKS_RCU, it conditionally does
"select TASKS_RCU if PREEMPTION". This works, but requires any change in
this enablement logic to be replicated across all such "select" clauses.
A new NEED_TASKS_RCU Kconfig option has been created to allow this
enablement logic to be in one place in kernel/rcu/Kconfig.
Therefore, make BPF select the new NEED_TASKS_RCU Kconfig option.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: <bpf@vger.kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Neeraj Upadhyay [Sat, 24 Feb 2024 04:57:30 +0000 (10:27 +0530)]
MAINTAINERS: Update Neeraj's email address
Update my email-address in MAINTAINERS and .mailmap entries to my
kernel.org account.
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
Reviewed-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Zenghui Yu [Fri, 16 Feb 2024 15:44:55 +0000 (23:44 +0800)]
doc: Remove references to arrayRCU.rst
arrayRCU.rst has been removed since commit
ef2555cf68c3 ("doc: Remove
arrayRCU.rst") but is still referenced by whatisRCU.rst. Update it to
reflect the current state of the documentation.
Signed-off-by: Zenghui Yu <zenghui.yu@linux.dev>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Fri, 23 Feb 2024 23:16:20 +0000 (15:16 -0800)]
rcu-tasks: Make Tasks RCU wait idly for grace-period delays
Currently, all waits for grace periods sleep at TASK_UNINTERRUPTIBLE,
regardless of RCU flavor. This has worked well, but there have been
cases where a longer-than-average Tasks RCU grace period has triggered
softlockup splats, many of them, before the Tasks RCU CPU stall warning
appears. These softlockup splats unnecessarily consume console bandwidth
and complicate diagnosis of the underlying problem. Plus a long but not
pathologically long Tasks RCU grace period might trigger a few softlockup
splats before completing normally, which generates noise for no good
reason.
This commit therefore causes Tasks RCU grace periods to sleep at TASK_IDLE
priority. If there really is a persistent problem, the eventual Tasks
RCU CPU stall warning will flag it, and without the extra noise.
Reported-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Thu, 7 Mar 2024 01:00:39 +0000 (17:00 -0800)]
rcutorture: ASSERT_EXCLUSIVE_WRITER() for ->rtort_pipe_count updates
It turns out that only one CPU at a time will ever invoke
rcu_torture_pipe_update_one() on a given rcu_torture structure.
This commit therefore adds three ASSERT_EXCLUSIVE_WRITER() calls
to enlist KCSAN's aid in checking this.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Sun, 18 Feb 2024 17:11:08 +0000 (09:11 -0800)]
rcutorture: Dump GP kthread state on insufficient cb-flood laundering
If a callback flood prevents grace period from completing, rcutorture
does a WARN_ON(). Avoiding this WARN_ON() currently requires that at
least three grace periods elapse during an eight-second callback-flood
interval. Unfortunately, the current debug information does not include
anything about the grace-period state. This commit therefore adds a
call to cur_ops->gp_kthread_dbg(), if this function pointer is non-NULL.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Tue, 20 Feb 2024 01:00:31 +0000 (17:00 -0800)]
rcutorture: Dump # online CPUs on insufficient cb-flood laundering
This commit adds the number of online CPUs to the state dump following
an unsuccesful callback-flood test.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Sun, 18 Feb 2024 01:57:35 +0000 (17:57 -0800)]
rcutorture: Enable RCU priority boosting for TREE09
The TREE09 rcutorture scenario exhausts memory from time to time, and
this is due to a reader being preempted and blocking grace periods,
thus preventing recycling of the memory used in callback-flooding tests.
This commit therefore enables RCU priority boosting and sets the boosting
delay to 100 milliseconds after grace-period start.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Tue, 27 Feb 2024 23:07:25 +0000 (15:07 -0800)]
rcu: Add lockdep checks and kernel-doc header to rcu_softirq_qs()
There is some indications that rcu_softirq_qs() might be more generally
used than anticipated. This commit therefore adds some lockdep assertions
and some cautionary tales in a new kernel-doc header.
Link: https://lore.kernel.org/all/Zd4DXTyCf17lcTfq@debian.debian/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Yan Zhai <yan@cloudflare.com>
Cc: <netdev@vger.kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Fri, 9 Feb 2024 11:11:25 +0000 (03:11 -0800)]
rcutorture: Disable tracing to permit Tasks Rude RCU testing
Now that the KPROBES, TRACING, BLK_DEV_IO_TRACE, and UPROBE_EVENTS
Kconfig options select the TASKS_TRACE_RCU option, the torture.sh tests
of enabling exactly one of the RCU Tasks flavors fail. This commit
therefore disables these options to allow this testing to succeed.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Paul E. McKenney [Wed, 31 Jan 2024 12:15:42 +0000 (04:15 -0800)]
scftorture: Increase memory provided to guest OS
The tradition, extending back almost a full year, has been 2GB plus an
additional number of GBs equal to the number of CPUs divided by sixteen.
This tradition has served scftorture well, even the CONFIG_PREEMPT=y
version running KASAN within guest OSes having 40 CPUs. However, this
test recently started OOMing on larger systems, and this commit therefore
gives this test an additional GB of memory.
It is quite possible that further testing on larger systems will show
a need to decrease the divisor from 16 to (say) 8, but that is a change
to make once it has been demonstrated to be required.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Linus Torvalds [Sun, 31 Mar 2024 21:32:39 +0000 (14:32 -0700)]
Linux 6.9-rc2
Linus Torvalds [Sun, 31 Mar 2024 18:23:51 +0000 (11:23 -0700)]
Merge tag 'kbuild-fixes-v6.9' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Deduplicate Kconfig entries for CONFIG_CXL_PMU
- Fix unselectable choice entry in MIPS Kconfig, and forbid this
structure
- Remove unused include/asm-generic/export.h
- Fix a NULL pointer dereference bug in modpost
- Enable -Woverride-init warning consistently with W=1
- Drop KCSAN flags from *.mod.c files
* tag 'kbuild-fixes-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: Fix typo HEIGTH to HEIGHT
Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer
kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
kbuild: make -Woverride-init warnings more consistent
modpost: do not make find_tosym() return NULL
export.h: remove include/asm-generic/export.h
kconfig: do not reparent the menu inside a choice block
MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choice
cxl: remove CONFIG_CXL_PMU entry in drivers/cxl/Kconfig
Linus Torvalds [Sun, 31 Mar 2024 18:15:32 +0000 (11:15 -0700)]
Merge tag 'edac_urgent_for_v6.9_rc2' of git://git./linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Fix more issues in the AMD FMPM driver
* tag 'edac_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
RAS: Avoid build errors when CONFIG_DEBUG_FS=n
RAS/AMD/FMPM: Safely handle saved records of various sizes
RAS/AMD/FMPM: Avoid NULL ptr deref in get_saved_records()
Linus Torvalds [Sun, 31 Mar 2024 18:04:51 +0000 (11:04 -0700)]
Merge tag 'irq_urgent_for_v6.9_rc2' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix an unused function warning on irqchip/irq-armada-370-xp
- Fix the IRQ sharing with pinctrl-amd and ACPI OSL
* tag 'irq_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/armada-370-xp: Suppress unused-function warning
genirq: Introduce IRQF_COND_ONESHOT and use it in pinctrl-amd
Linus Torvalds [Sun, 31 Mar 2024 17:43:11 +0000 (10:43 -0700)]
Merge tag 'perf_urgent_for_v6.9_rc2' of git://git./linux/kernel/git/tip/tip
Pull x86 perf fixes from Borislav Petkov:
- Define the correct set of default hw events on AMD Zen4
- Use the correct stalled cycles PMCs on AMD Zen2 and newer
- Fix detection of the LBR freeze feature on AMD
* tag 'perf_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/core: Define a proper ref-cycles event for Zen 4 and later
perf/x86/amd/core: Update and fix stalled-cycles-* events for Zen 2 and later
perf/x86/amd/lbr: Use freeze based on availability
x86/cpufeatures: Add new word for scattered features
Linus Torvalds [Sun, 31 Mar 2024 17:34:49 +0000 (10:34 -0700)]
Merge tag 'timers_urgent_for_v6.9_rc2' of git://git./linux/kernel/git/tip/tip
Pull timers update from Borislav Petkov:
- Volunteer in Anna-Maria and Frederic as timers co-maintainers so that
tglx can relax more :-P
* tag 'timers_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add co-maintainers for time[rs]
Linus Torvalds [Sun, 31 Mar 2024 17:30:06 +0000 (10:30 -0700)]
Merge tag 'objtool_urgent_for_v6.9_rc2' of git://git./linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
- Fix a format specifier build error in objtool during an x32 build
* tag 'objtool_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix compile failure when using the x32 compiler
Linus Torvalds [Sun, 31 Mar 2024 17:16:34 +0000 (10:16 -0700)]
Merge tag 'x86_urgent_for_v6.9_rc2' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Make sure single object builds in arch/x86/virt/ ala
make ... arch/x86/virt/vmx/tdx/seamcall.o
work again
- Do not do ROM range scans and memory validation when the kernel is
running as a SEV-SNP guest as those can get problematic and, before
that, are not really needed in such a guest
- Exclude the build-time generated vdso-image-x32.o object from objtool
validation and in particular the return sites in there due to a
warning which fires when an unpatched return thunk is being used
- Improve the NMI CPUs stall message to show additional information
about the state of each CPU wrt the NMI handler
- Enable gcc named address spaces support only on !KCSAN configs due to
compiler options incompatibility
- Revert a change which was trying to use GB pages for mapping regions
only when the regions would be large enough but that change lead to
kexec failing
- A documentation fixlet
* tag 'x86_urgent_for_v6.9_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Use obj-y to descend into arch/x86/virt/
x86/sev: Skip ROM range scans and validation for SEV-SNP guests
x86/vdso: Fix rethunk patching for vdso-image-x32.o too
x86/nmi: Upgrade NMI backtrace stall checks & messages
x86/percpu: Disable named address spaces for KCSAN
Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
Documentation/x86: Fix title underline length
Isak Ellmer [Sat, 30 Mar 2024 15:19:45 +0000 (16:19 +0100)]
kconfig: Fix typo HEIGTH to HEIGHT
Fixed a typo in some variables where height was misspelled as heigth.
Signed-off-by: Isak Ellmer <isak01@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Nathan Chancellor [Wed, 27 Mar 2024 17:20:36 +0000 (10:20 -0700)]
Documentation/llvm: Note s390 LLVM=1 support with LLVM 18.1.0 and newer
As of the first s390 pull request during the 6.9 merge window,
commit
691632f0e869 ("Merge tag 's390-6.9-1' of
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux"), s390 can be
built with LLVM=1 when using LLVM 18.1.0, which is the first version
that has SystemZ support implemented in ld.lld and llvm-objcopy.
Update the supported architectures table in the Kbuild LLVM
documentation to note this explicitly to make it more discoverable by
users and other developers. Additionally, this brings s390 in line with
the rest of the architectures in the table, which all support LLVM=1.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Borislav Petkov (AMD) [Tue, 26 Mar 2024 20:25:48 +0000 (21:25 +0100)]
kbuild: Disable KCSAN for autogenerated *.mod.c intermediaries
When KCSAN and CONSTRUCTORS are enabled, one can trigger the
"Unpatched return thunk in use. This should not happen!"
catch-all warning.
Usually, when objtool runs on the .o objects, it does generate a section
.return_sites which contains all offsets in the objects to the return
thunks of the functions present there. Those return thunks then get
patched at runtime by the alternatives.
KCSAN and CONSTRUCTORS add this to the object file's .text.startup
section:
-------------------
Disassembly of section .text.startup:
...
0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <__UNIQUE_ID___addressable_cryptd_alloc_aead349+0x6>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------
which, if it is built as a module goes through the intermediary stage of
creating a <module>.mod.c file which, when translated, receives a second
constructor:
-------------------
Disassembly of section .text.startup:
0000000000000010 <_sub_I_00099_0>:
10: f3 0f 1e fa endbr64
14: e8 00 00 00 00 call 19 <_sub_I_00099_0+0x9>
15: R_X86_64_PLT32 __tsan_init-0x4
19: e9 00 00 00 00 jmp 1e <_sub_I_00099_0+0xe>
1a: R_X86_64_PLT32 __x86_return_thunk-0x4
...
0000000000000030 <_sub_I_00099_0>:
30: f3 0f 1e fa endbr64
34: e8 00 00 00 00 call 39 <_sub_I_00099_0+0x9>
35: R_X86_64_PLT32 __tsan_init-0x4
39: e9 00 00 00 00 jmp 3e <__ksymtab_cryptd_alloc_ahash+0x2>
3a: R_X86_64_PLT32 __x86_return_thunk-0x4
-------------------
in the .ko file.
Objtool has run already so that second constructor's return thunk cannot
be added to the .return_sites section and thus the return thunk remains
unpatched and the warning rightfully fires.
Drop KCSAN flags from the mod.c generation stage as those constructors
do not contain data races one would be interested about.
Debugged together with David Kaplan <David.Kaplan@amd.com> and Nikolay
Borisov <nik.borisov@suse.com>.
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/r/0851a207-7143-417e-be31-8bf2b3afb57d@molgen.mpg.de
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> # Dell XPS 13
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Marco Elver <elver@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Arnd Bergmann [Tue, 26 Mar 2024 14:47:16 +0000 (15:47 +0100)]
kbuild: make -Woverride-init warnings more consistent
The -Woverride-init warn about code that may be intentional or not,
but the inintentional ones tend to be real bugs, so there is a bit of
disagreement on whether this warning option should be enabled by default
and we have multiple settings in scripts/Makefile.extrawarn as well as
individual subsystems.
Older versions of clang only supported -Wno-initializer-overrides with
the same meaning as gcc's -Woverride-init, though all supported versions
now work with both. Because of this difference, an earlier cleanup of
mine accidentally turned the clang warning off for W=1 builds and only
left it on for W=2, while it's still enabled for gcc with W=1.
There is also one driver that only turns the warning off for newer
versions of gcc but not other compilers, and some but not all the
Makefiles still use a cc-disable-warning conditional that is no
longer needed with supported compilers here.
Address all of the above by removing the special cases for clang
and always turning the warning off unconditionally where it got
in the way, using the syntax that is supported by both compilers.
Fixes: 2cd3271b7a31 ("kbuild: avoid duplicate warning options")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Mikulas Patocka [Sat, 30 Mar 2024 19:23:08 +0000 (20:23 +0100)]
objtool: Fix compile failure when using the x32 compiler
When compiling the v6.9-rc1 kernel with the x32 compiler, the following
errors are reported. The reason is that we take an "unsigned long"
variable and print it using "PRIx64" format string.
In file included from check.c:16:
check.c: In function ‘add_dead_ends’:
/usr/src/git/linux-2.6/tools/objtool/include/objtool/warn.h:46:17: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 5 has type ‘long unsigned int’ [-Werror=format=]
46 | "%s: warning: objtool: " format "\n", \
| ^~~~~~~~~~~~~~~~~~~~~~~~
check.c:613:33: note: in expansion of macro ‘WARN’
613 | WARN("can't find unreachable insn at %s+0x%" PRIx64,
| ^~~~
...
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: linux-kernel@vger.kernel.org
Linus Torvalds [Sat, 30 Mar 2024 20:51:58 +0000 (13:51 -0700)]
Merge tag 'xfs-6.9-fixes-1' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Chandan Babu:
- Allow stripe unit/width value passed via mount option to be written
over existing values in the super block
- Do not set current->journal_info to avoid its value from being miused
by another filesystem context
* tag 'xfs-6.9-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: don't use current->journal_info
xfs: allow sunit mount option to repair bad primary sb stripe values
Linus Torvalds [Sat, 30 Mar 2024 20:44:52 +0000 (13:44 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes and updates from James Bottomley:
"Fully half this pull is updates to lpfc and qla2xxx which got
committed just as the merge window opened. A sizeable fraction of the
driver updates are simple bug fixes (and lock reworks for bug fixes in
the case of lpfc), so rather than splitting the few actual
enhancements out, we're just adding the drivers to the -rc1 pull.
The enhancements for lpfc are log message removals, copyright updates
and three patches redefining types. For qla2xxx it's just removing a
debug message on module removal and the manufacturer detail update.
The two major fixes are the sg teardown race and a core error leg
problem with the procfs directory not being removed if we destroy a
created host that never got to the running state. The rest are minor
fixes and constifications"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (41 commits)
scsi: bnx2fc: Remove spin_lock_bh while releasing resources after upload
scsi: core: Fix unremoved procfs host directory regression
scsi: mpi3mr: Avoid memcpy field-spanning write WARNING
scsi: sd: Fix TCG OPAL unlock on system resume
scsi: sg: Avoid sg device teardown race
scsi: lpfc: Copyright updates for 14.4.0.1 patches
scsi: lpfc: Update lpfc version to 14.4.0.1
scsi: lpfc: Define types in a union for generic void *context3 ptr
scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptr
scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptr
scsi: lpfc: Use a dedicated lock for ras_fwlog state
scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up()
scsi: lpfc: Replace hbalock with ndlp lock in lpfc_nvme_unregister_port()
scsi: lpfc: Update lpfc_ramp_down_queue_handler() logic
scsi: lpfc: Remove IRQF_ONESHOT flag from threaded IRQ handling
scsi: lpfc: Move NPIV's transport unregistration to after resource clean up
scsi: lpfc: Remove unnecessary log message in queuecommand path
scsi: qla2xxx: Update version to 10.02.09.200-k
scsi: qla2xxx: Delay I/O Abort on PCI error
scsi: qla2xxx: Change debug message during driver unload
...
Linus Torvalds [Sat, 30 Mar 2024 20:16:21 +0000 (13:16 -0700)]
Merge tag 'i2c-for-6.9-rc2' of git://git./linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"A fix from Andi for I2C host drivers"
* tag 'i2c-for-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: i801: Fix a refactoring that broke a touchpad on Lenovo P1
Linus Torvalds [Sat, 30 Mar 2024 20:11:42 +0000 (13:11 -0700)]
Merge tag 'usb-6.9-rc2' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a bunch of small USB fixes for reported problems and
regressions for 6.9-rc2. Included in here are:
- deadlock fixes for long-suffering issues
- USB phy driver revert for reported problem
- typec fixes for reported problems
- duplicate id in dwc3 dropped
- dwc2 driver fixes
- udc driver warning fix
- cdc-wdm race bugfix
- other tiny USB bugfixes
All of these have been in linux-next this past week with no reported
issues"
* tag 'usb-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (26 commits)
USB: core: Fix deadlock in port "disable" sysfs attribute
USB: core: Add hub_get() and hub_put() routines
usb: typec: ucsi: Check capabilities before cable and identity discovery
usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
usb: typec: ucsi_acpi: Refactor and fix DELL quirk
usb: typec: ucsi: Ack unsupported commands
usb: typec: ucsi: Check for notifications after init
usb: typec: ucsi: Clear EVENT_PENDING under PPM lock
usb: typec: Return size of buffer if pd_set operation succeeds
usb: udc: remove warning when queue disabled ep
usb: dwc3: pci: Drop duplicate ID
usb: dwc3: Properly set system wakeup
Revert "usb: phy: generic: Get the vbus supply"
usb: cdc-wdm: close race between read and workqueue
usb: dwc2: gadget: LPM flow fix
usb: dwc2: gadget: Fix exiting from clock gating
usb: dwc2: host: Fix ISOC flow in DDMA mode
usb: dwc2: host: Fix remote wakeup from hibernation
usb: dwc2: host: Fix hibernation flow
USB: core: Fix deadlock in usb_deauthorize_interface()
...
Linus Torvalds [Sat, 30 Mar 2024 19:59:00 +0000 (12:59 -0700)]
Merge tag 'staging-6.9-rc2' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are two small staging driver fixes for the vc04_services driver
that resolve reported problems:
- strncpy fix for information leak
- another information leak discovered by the previous strncpy fix
Both of these have been in linux-next all this past week with no
reported issues"
* tag 'staging-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vc04_services: fix information leak in create_component()
staging: vc04_services: changen strncpy() to strscpy_pad()
Wolfram Sang [Sat, 30 Mar 2024 14:37:54 +0000 (15:37 +0100)]
Merge tag 'i2c-host-fixes-6.9-rc2' of git://git./linux/kernel/git/andi.shyti/linux into i2c/for-current
One fix in the i801 driver where a bug caused touchpad
malfunctions on some Lenovo P1 models by incorrectly overwriting
a status variable during successful SMBUS transactions.
Masahiro Yamada [Sat, 30 Mar 2024 06:05:54 +0000 (15:05 +0900)]
x86/build: Use obj-y to descend into arch/x86/virt/
Commit
c33621b4c5ad ("x86/virt/tdx: Wire up basic SEAMCALL functions")
introduced a new instance of core-y instead of the standardized obj-y
syntax.
X86 Makefiles descend into subdirectories of arch/x86/virt inconsistently;
into arch/x86/virt/ via core-y defined in arch/x86/Makefile, but into
arch/x86/virt/svm/ via obj-y defined in arch/x86/Kbuild.
This is problematic when you build a single object in parallel because
multiple threads attempt to build the same file.
$ make -j$(nproc) arch/x86/virt/vmx/tdx/seamcall.o
[ snip ]
AS arch/x86/virt/vmx/tdx/seamcall.o
AS arch/x86/virt/vmx/tdx/seamcall.o
fixdep: error opening file: arch/x86/virt/vmx/tdx/.seamcall.o.d: No such file or directory
make[4]: *** [scripts/Makefile.build:362: arch/x86/virt/vmx/tdx/seamcall.o] Error 2
Use the obj-y syntax, as it works correctly.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20240330060554.18524-1-masahiroy@kernel.org
Linus Torvalds [Fri, 29 Mar 2024 22:51:15 +0000 (15:51 -0700)]
Merge tag 'drm-fixes-2024-03-30' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Regular fixes for rc2, quite a few i915/amdgpu as usual, some xe, and
then mostly scattered around. rc3 might be quieter with the holidays
but we shall see.
bridge:
- select DRM_KMS_HELPER
dma-buf:
- fix NULL-pointer deref
dp:
- fix div-by-zero in DP MST unplug code
fbdev:
- select FB_IOMEM_FOPS for SBus
sched:
- fix NULL-pointer deref
xe:
- Fix build on mips
- Fix wrong bound checks
- Fix use of msec rather than jiffies
- Remove dead code
amdgpu:
- SMU 14.0.1 updates
- DCN 3.5.x updates
- VPE fix
- eDP panel flickering fix
- Suspend fix
- PSR fix
- DCN 3.0+ fix
- VCN 4.0.6 updates
- debugfs fix
amdkfd:
- DMA-Buf fix
- GFX 9.4.2 TLB flush fix
- CP interrupt fix
i915:
- Fix for BUG_ON/BUILD_BUG_ON IN I915_memcpy.c
- Update a MTL workaround
- Fix locking inversion in hwmon's sysfs
- Remove a bogus error message around PXP
- Fix UAF on VMA
- Reset queue_priority_hint on parking
- Display Fixes:
- Remove duplicated audio enable/disable on SDVO and DP
- Disable AuxCCS for Xe driver
- Revert init order of MIPI DSI
- DRRS debugfs fix with an extra refactor patch
- VRR related fixes
- Fix a JSL eDP corruption
- Fix the cursor physical dma address
- BIOS VBT related fix
nouveau:
- dmem: handle kcalloc() allocation failures
qxl:
- remove unused variables
rockchip:
- vop2: remove support for AR30 and AB30 formats
vmwgfx:
- debugfs: create ttm_resource_manager entry only if needed"
* tag 'drm-fixes-2024-03-30' of https://gitlab.freedesktop.org/drm/kernel: (55 commits)
drm/i915/bios: Tolerate devdata==NULL in intel_bios_encoder_supports_dp_dual_mode()
drm/i915: Pre-populate the cursor physical dma address
drm/i915/gt: Reset queue_priority_hint on parking
drm/i915/vma: Fix UAF on destroy against retire race
drm/i915: Do not print 'pxp init failed with 0' when it succeed
drm/i915: Do not match JSL in ehl_combo_pll_div_frac_wa_needed()
drm/i915/hwmon: Fix locking inversion in sysfs getter
drm/i915/dsb: Fix DSB vblank waits when using VRR
drm/i915/vrr: Generate VRR "safe window" for DSB
drm/i915/display/debugfs: Fix duplicate checks in i915_drrs_status
drm/i915/drrs: Refactor CPU transcoder DRRS check
drm/i915/mtl: Update workaround
14018575942
drm/i915/dsi: Go back to the previous INIT_OTP/DISPLAY_ON order, mostly
drm/i915/display: Disable AuxCCS framebuffers if built for Xe
drm/i915: Stop doing double audio enable/disable on SDVO and g4x+ DP
drm/i915: Add includes for BUG_ON/BUILD_BUG_ON in i915_memcpy.c
drm/qxl: remove unused variable from `qxl_process_single_command()`
drm/qxl: remove unused `count` variable from `qxl_surface_id_alloc()`
drm/i915: add bug.h include to i915_memcpy.c
drm/vmwgfx: Create debugfs ttm_resource_manager entry only if needed
...
Linus Torvalds [Fri, 29 Mar 2024 22:38:29 +0000 (15:38 -0700)]
Merge tag 'linux_kselftest-fixes-6.9-rc2' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"Fixes to seccomp and ftrace tests and a change to add config file for
dmabuf-heap test to increase coverage"
* tag 'linux_kselftest-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: dmabuf-heap: add config file for the test
selftests/seccomp: Try to fit runtime of benchmark into timeout
selftests/ftrace: Fix event filter target_func selection
Linus Torvalds [Fri, 29 Mar 2024 22:35:12 +0000 (15:35 -0700)]
Merge tag 'linux_kselftest-kunit-fixes-6.9-rc2' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull KUnit fixes from Shuah Khan:
"One urgent fix for --alltests build failure related to renaming of
CONFIG_DAMON_DBGFS to DAMON_DBGFS_DEPRECATED to the missing config
option"
* tag 'linux_kselftest-kunit-fixes-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests
Muhammad Usama Anjum [Tue, 5 Mar 2024 06:08:47 +0000 (11:08 +0500)]
selftests: dmabuf-heap: add config file for the test
The config fragment enlists all the config options needed for the test.
This config is merged into the kernel's config on which this test is
run.
Fixed whitespace errors during commit:
Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Mark Brown [Mon, 25 Mar 2024 16:57:59 +0000 (16:57 +0000)]
selftests/seccomp: Try to fit runtime of benchmark into timeout
The seccomp benchmark runs five scenarios, one calibration run with no
seccomp filters enabled then four further runs each adding a filter. The
calibration run times itself for 15s and then each additional run executes
for the same number of times.
Currently the seccomp tests, including the benchmark, run with an extended
120s timeout but this is not sufficient to robustly run the tests on a lot
of platforms. Sample timings from some recent runs:
Platform Run 1 Run 2 Run 3 Run 4
--------- ----- ----- ----- -----
PowerEdge R200 16.6s 16.6s 31.6s 37.4s
BBB (arm) 20.4s 20.4s 54.5s
Synquacer (arm64) 20.7s 23.7s 40.3s
The x86 runs from the PowerEdge are quite marginal and routinely fail, for
the successful run reported here the timed portions of the run are at
117.2s leaving less than 3s of margin which is frequently breached. The
added overhead of adding filters on the other platforms is such that there
is no prospect of their runs fitting into the 120s timeout, especially
on 32 bit arm where there is no BPF JIT.
While we could lower the time we calibrate for I'm also already seeing the
currently completing runs reporting issues with the per filter overheads
not matching expectations:
Let's instead raise the timeout to 180s which is only a 50% increase on the
current timeout which is itself not *too* large given that there's only two
tests in this suite.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Mark Rutland [Wed, 20 Mar 2024 14:18:44 +0000 (14:18 +0000)]
selftests/ftrace: Fix event filter target_func selection
The event filter function test has been failing in our internal test
farm:
| # not ok 33 event filter function - test event filtering on functions
Running the test in verbose mode indicates that this is because the test
erroneously determines that kmem_cache_free() is the most common caller
of kmem_cache_free():
# # + cut -d: -f3 trace
# # + sed s/call_site=([^+]*)+0x.*/1/
# # + sort
# # + uniq -c
# # + sort
# # + tail -n 1
# # + sed s/^[ 0-9]*//
# # + target_func=kmem_cache_free
... and as kmem_cache_free() doesn't call itself, setting this as the
filter function for kmem_cache_free() results in no hits, and
consequently the test fails:
# # + grep kmem_cache_free trace
# # + grep kmem_cache_free
# # + wc -l
# # + hitcnt=0
# # + grep kmem_cache_free trace
# # + grep -v kmem_cache_free
# # + wc -l
# # + misscnt=0
# # + [ 0 -eq 0 ]
# # + exit_fail
This seems to be because the system in question has tasks with ':' in
their name (which a number of kernel worker threads have). These show up
in the trace, e.g.
test:.sh-1299 [004] ..... 2886.040608: kmem_cache_free: call_site=putname+0xa4/0xc8 ptr=
000000000f4d22f4 name=names_cache
... and so when we try to extact the call_site with:
cut -d: -f3 trace | sed 's/call_site=\([^+]*\)+0x.*/\1/'
... the 'cut' command will extrace the column containing
'kmem_cache_free' rather than the column containing 'call_site=...', and
the 'sed' command will leave this unchanged. Consequently, the test will
decide to use 'kmem_cache_free' as the filter function, resulting in the
failure seen above.
Fix this by matching the 'call_site=<func>' part specifically to extract
the function name.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-trace-kernel@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Dave Airlie [Fri, 29 Mar 2024 19:33:22 +0000 (05:33 +1000)]
Merge tag 'drm-intel-fixes-2024-03-28' of https://anongit.freedesktop.org/git/drm/drm-intel into drm-fixes
Core/GT Fixes:
- Fix for BUG_ON/BUILD_BUG_ON IN I915_memcpy.c (Joonas)
- Update a MTL workaround (Tejas)
- Fix locking inversion in hwmon's sysfs (Janusz)
- Remove a bogus error message around PXP (Jose)
- Fix UAF on VMA (Janusz)
- Reset queue_priority_hint on parking (Chris)
Display Fixes:
- Remove duplicated audio enable/disable on SDVO and DP (Ville)
- Disable AuxCCS for Xe driver (Juha-Pekka)
- Revert init order of MIPI DSI (Ville)
- DRRS debugfs fix with an extra refactor patch (Bhanuprakash)
- VRR related fixes (Ville)
- Fix a JSL eDP corruption (Jonathon)
- Fix the cursor physical dma address (Ville)
- BIOS VBT related fix (Ville)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZgYaIVgjIs30mIvS@intel.com
Borislav Petkov (AMD) [Thu, 28 Mar 2024 12:59:05 +0000 (13:59 +0100)]
x86/bugs: Fix the SRSO mitigation on Zen3/4
The original version of the mitigation would patch in the calls to the
untraining routines directly. That is, the alternative() in UNTRAIN_RET
will patch in the CALL to srso_alias_untrain_ret() directly.
However, even if commit
e7c25c441e9e ("x86/cpu: Cleanup the untrain
mess") meant well in trying to clean up the situation, due to micro-
architectural reasons, the untraining routine srso_alias_untrain_ret()
must be the target of a CALL instruction and not of a JMP instruction as
it is done now.
Reshuffle the alternative macros to accomplish that.
Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 29 Mar 2024 19:06:09 +0000 (12:06 -0700)]
Merge tag '6.9-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Add missing trace point (noticed when debugging the recent mknod LSM
regression)
- fscache fix
* tag '6.9-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix duplicate fscache cookie warnings
smb3: add trace event for mknod
Linus Torvalds [Fri, 29 Mar 2024 18:50:38 +0000 (11:50 -0700)]
Merge tag 'thermal-6.9-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These revert a problematic optimization commit and address a devfreq
cooling device issue.
Specifics:
- Revert thermal core optimization that introduced a functional issue
causing a critical trip point to be crossed in some cases (Daniel
Lezcano)
- Add missing conversion between different state ranges to the
devfreq cooling device driver (Ye Zhang)"
* tag 'thermal-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: devfreq_cooling: Fix perf state when calculate dfc res_util
Revert "thermal: core: Don't update trip points inside the hysteresis range"
Linus Torvalds [Fri, 29 Mar 2024 18:37:12 +0000 (11:37 -0700)]
Merge tag 'acpi-6.9-rc2' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix two issues that may lead to attempts to use memory that has
been freed already.
Specifics:
- Drop __exit annotation from einj_remove() in the ACPI APEI code
because this function can be called during runtime (Arnd Bergmann)
- Make acpi_db_walk_for_fields() check acpi_evaluate_object() return
value to avoid accessing memory that has been freed (Nikita
Kiryushin)"
* tag 'acpi-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields()
ACPI: APEI: EINJ: mark remove callback as non-__exit
Linus Torvalds [Fri, 29 Mar 2024 18:06:13 +0000 (11:06 -0700)]
mm: clean up populate_vma_page_range() FOLL_* flag handling
The code wasn't exactly wrong, but it was very odd, and it used
FOLL_FORCE together with FOLL_WRITE when it really didn't need to (it
only set FOLL_WRITE for writable mappings, so then the FOLL_FORCE was
pointless).
It also pointlessly called __get_user_pages() even when it knew it
wouldn't populate anything because the vma wasn't accessible and it
explicitly tested for and did *not* set FOLL_FORCE for inaccessible
vma's.
This code does need to use FOLL_FORCE, because we want to do fault in
writable shared mappings, but then the mapping may not actually be
readable. And we don't want to use FOLL_WRITE (which would match the
permission of the vma), because that would also dirty the pages, which
we don't want to do.
For very similar reasons, FOLL_FORCE populates a executable-only mapping
with no read permissions. We don't have a FOLL_EXEC flag.
Yes, it would probably be cleaner to split FOLL_WRITE into two bits (for
separate permission and dirty bit handling), and add a FOLL_EXEC flag
for the "GUP executable page" case. That would allow us to avoid
FOLL_FORCE entirely here.
But that's not how our FOLL_xyz bits have traditionally worked, and that
would be a much bigger patch.
So this at least avoids the FOLL_FORCE | FOLL_WRITE combination that
made one of my experimental validation patches trigger a warning. That
warning was a false positive (and my experimental patch was incomplete
anyway), but it all made me look at this and decide to clean at least
this small case up.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rafael J. Wysocki [Fri, 29 Mar 2024 18:00:09 +0000 (19:00 +0100)]
Merge branch 'acpica'
* acpica:
ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields()
Linus Torvalds [Fri, 29 Mar 2024 16:51:04 +0000 (09:51 -0700)]
Merge tag 'efi-fixes-for-v6.9-3' of git://git./linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
"These address all the outstanding EFI/x86 boot related regressions:
- Revert to the old initrd memory allocation soft limit of INT_MAX,
which was dropped inadvertently
- Ensure that startup_32() is entered with a valid boot_params
pointer when using the new EFI mixed mode protocol
- Fix a compiler warning introduced by a fix from the previous pull"
* tag 'efi-fixes-for-v6.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
x86/efistub: Reinstate soft limit for initrd loading
efi/libstub: Cast away type warning in use of max()
x86/efistub: Add missing boot_params for mixed mode compat entry
Linus Torvalds [Fri, 29 Mar 2024 16:40:22 +0000 (09:40 -0700)]
Merge tag 'block-6.9-
20240329' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
"Small round of minor fixes or cleanups for the 6.9-rc2 kernel, one
fixing an issue introduced in 6.8"
* tag 'block-6.9-
20240329' of git://git.kernel.dk/linux:
block: Do not force full zone append completion in req_bio_endio()
block: don't reject too large max_user_sectors in blk_validate_limits
block: Make blk_rq_set_mixed_merge() static
Linus Torvalds [Fri, 29 Mar 2024 16:33:05 +0000 (09:33 -0700)]
Merge tag 'for-6.9/dm-fixes-2' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix MAINTAINERS to not include M: dm-devel for DM entries.
- Fix DM vdo's murmurhash to use proper byteswapping methods.
- Fix DM integrity clang warning about comparison out-of-range.
* tag 'for-6.9/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm integrity: fix out-of-range warning
dm vdo murmurhash3: use kernel byteswapping routines instead of GCC ones
MAINTAINERS: Remove incorrect M: tag for dm-devel@lists.linux.dev
Linus Torvalds [Fri, 29 Mar 2024 16:26:34 +0000 (09:26 -0700)]
Merge tag 'gpio-fixes-for-v6.9-rc2' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix a procfs failure when requesting an interrupt with a label
containing the '/' character
- add missing stubs for GPIO lookup functions for !GPIOLIB
- fix debug messages that would print "(null)" for NULL strings
* tag 'gpio-fixes-for-v6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Fix debug messaging in gpiod_find_and_request()
gpiolib: Add stubs for GPIO lookup functions
gpio: cdev: sanitize the label before requesting the interrupt
Arnd Bergmann [Thu, 28 Mar 2024 14:30:39 +0000 (15:30 +0100)]
dm integrity: fix out-of-range warning
Depending on the value of CONFIG_HZ, clang complains about a pointless
comparison:
drivers/md/dm-integrity.c:4085:12: error: result of comparison of
constant
42949672950 with expression of type
'unsigned int' is always false
[-Werror,-Wtautological-constant-out-of-range-compare]
if (val >= (uint64_t)UINT_MAX * 1000 / HZ) {
As the check remains useful for other configurations, shut up the
warning by adding a second type cast to uint64_t.
Fixes: 468dfca38b1a ("dm integrity: add a bitmap mode")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Ken Raeburn [Mon, 25 Mar 2024 19:22:45 +0000 (15:22 -0400)]
dm vdo murmurhash3: use kernel byteswapping routines instead of GCC ones
Also open-code the calls.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Ken Raeburn <raeburn@redhat.com>
Signed-off-by: Matthew Sakai <msakai@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Kuan-Wei Chiu [Tue, 19 Mar 2024 18:18:42 +0000 (02:18 +0800)]
MAINTAINERS: Remove incorrect M: tag for dm-devel@lists.linux.dev
The dm-devel@lists.linux.dev mailing list should only be listed under
the L: (List) tag in the MAINTAINERS file. However, it was incorrectly
listed under both L: and M: (Maintainers) tags, which is not accurate.
Remove the M: tag for dm-devel@lists.linux.dev in the MAINTAINERS file
to reflect the correct categorization.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Linus Torvalds [Fri, 29 Mar 2024 00:15:33 +0000 (17:15 -0700)]
Merge tag 'mmc-v6.9-rc1' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix regression for the mmc ioctl
MMC host:
- sdhci-of-dwcmshc: Fixup PM support in ->remove_new()
- sdhci-omap: Re-tune when device became runtime suspended"
* tag 'mmc-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
sdhci-of-dwcmshc: disable PM runtime in dwcmshc_remove()
mmc: sdhci-omap: re-tuning is needed after a pm transition to support emmc HS200 mode
mmc: core: Avoid negative index with array access
mmc: core: Initialize mmc_blk_ioc_data
Damien Le Moal [Thu, 28 Mar 2024 00:43:40 +0000 (09:43 +0900)]
block: Do not force full zone append completion in req_bio_endio()
This reverts commit
748dc0b65ec2b4b7b3dbd7befcc4a54fdcac7988.
Partial zone append completions cannot be supported as there is no
guarantees that the fragmented data will be written sequentially in the
same manner as with a full command. Commit
748dc0b65ec2 ("block: fix
partial zone append completion handling in req_bio_endio()") changed
req_bio_endio() to always advance a partially failed BIO by its full
length, but this can lead to incorrect accounting. So revert this
change and let low level device drivers handle this case by always
failing completely zone append operations. With this revert, users will
still see an IO error for a partially completed zone append BIO.
Fixes: 748dc0b65ec2 ("block: fix partial zone append completion handling in req_bio_endio()")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20240328004409.594888-2-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 28 Mar 2024 21:54:49 +0000 (14:54 -0700)]
Merge tag 'sound-6.9-rc2' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of device-specific small fixes: a series of fixes for
TAS2781 HD-audio codec, ASoC SOF, Cirrus CS35L56 and a couple of
legacy drivers"
* tag 'sound-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/tas2781: remove useless dev_dbg from playback_hook
ALSA: hda/tas2781: add debug statements to kcontrols
ALSA: hda/tas2781: add locks to kcontrols
ALSA: hda/tas2781: remove digital gain kcontrol
ALSA: aoa: avoid false-positive format truncation warning
ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs
ALSA: hda: cs35l56: Set the init_done flag before component_add()
ALSA: hda: cs35l56: Raise device name message log level
ASoC: SOF: ipc4-topology: support NHLT device type
ALSA: hda: intel-nhlt: add intel_nhlt_ssp_device_type() function
Linus Torvalds [Thu, 28 Mar 2024 21:40:46 +0000 (14:40 -0700)]
Merge tag 'iommu-fixes-v6.9-rc1' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"ARM SMMU fixes:
- Fix swabbing of the STE fields in the unlikely event of running on
a big-endian machine
- Fix setting of STE.SHCFG on hardware that doesn't implement support
for attribute overrides
IOMMU core:
- PASID validation fix in device attach path"
* tag 'iommu-fixes-v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Validate the PASID in iommu_attach_device_pasid()
iommu/arm-smmu-v3: Fix access for STE.SHCFG
iommu/arm-smmu-v3: Add cpu_to_le64() around STRTAB_STE_0_V
Linus Torvalds [Thu, 28 Mar 2024 21:35:32 +0000 (14:35 -0700)]
Merge tag 'nfsd-6.9-1' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Address three recently introduced regressions
* tag 'nfsd-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: CREATE_SESSION must never cache NFS4ERR_DELAY replies
SUNRPC: Revert
561141dd494382217bace4d1a51d08168420eace
nfsd: Fix error cleanup path in nfsd_rename()
Linus Torvalds [Thu, 28 Mar 2024 20:09:37 +0000 (13:09 -0700)]
Merge tag 'net-6.9-rc2' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf, WiFi and netfilter.
Current release - regressions:
- ipv6: fix address dump when IPv6 is disabled on an interface
Current release - new code bugs:
- bpf: temporarily disable atomic operations in BPF arena
- nexthop: fix uninitialized variable in nla_put_nh_group_stats()
Previous releases - regressions:
- bpf: protect against int overflow for stack access size
- hsr: fix the promiscuous mode in offload mode
- wifi: don't always use FW dump trig
- tls: adjust recv return with async crypto and failed copy to
userspace
- tcp: properly terminate timers for kernel sockets
- ice: fix memory corruption bug with suspend and rebuild
- at803x: fix kernel panic with at8031_probe
- qeth: handle deferred cc1
Previous releases - always broken:
- bpf: fix bug in BPF_LDX_MEMSX
- netfilter: reject table flag and netdev basechain updates
- inet_defrag: prevent sk release while still in use
- wifi: pick the version of SESSION_PROTECTION_NOTIF
- wwan: t7xx: split 64bit accesses to fix alignment issues
- mlxbf_gige: call request_irq() after NAPI initialized
- hns3: fix kernel crash when devlink reload during pf
initialization"
* tag 'net-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
inet: inet_defrag: prevent sk release while still in use
Octeontx2-af: fix pause frame configuration in GMP mode
net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips
net: bcmasp: Remove phy_{suspend/resume}
net: bcmasp: Bring up unimac after PHY link up
net: phy: qcom: at803x: fix kernel panic with at8031_probe
netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c
netfilter: nf_tables: skip netdev hook unregistration if table is dormant
netfilter: nf_tables: reject table flag and netdev basechain updates
netfilter: nf_tables: reject destroy command to remove basechain hooks
bpf: update BPF LSM designated reviewer list
bpf: Protect against int overflow for stack access size
bpf: Check bloom filter map value size
bpf: fix warning for crash_kexec
selftests: netdevsim: set test timeout to 10 minutes
net: wan: framer: Add missing static inline qualifiers
mlxbf_gige: call request_irq() after NAPI initialized
tls: get psock ref after taking rxlock to avoid leak
selftests: tls: add test with a partially invalid iov
tls: adjust recv return with async crypto and failed copy to userspace
...
Dave Airlie [Thu, 28 Mar 2024 19:00:13 +0000 (05:00 +1000)]
Merge tag 'drm-misc-fixes-2024-03-28' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
bridge:
- select DRM_KMS_HELPER
dma-buf:
- fix NULL-pointer deref
dp:
- fix div-by-zero in DP MST unplug code
fbdev:
- select FB_IOMEM_FOPS for SBus
nouveau:
- dmem: handle kcalloc() allocation failures
qxl:
- remove unused variables
rockchip:
- vop2: remove support for AR30 and AB30 formats
sched:
- fix NULL-pointer deref
vmwgfx:
- debugfs: create ttm_resource_manager entry only if needed
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240328134417.GA8673@localhost.localdomain
David Gow [Tue, 26 Mar 2024 10:07:38 +0000 (18:07 +0800)]
kunit: configs: Enable CONFIG_DAMON_DBGFS_DEPRECATED for --alltests
This is required, as CONFIG_DAMON_DEBUGFS is enabled, and --alltests UML
builds will fail due to the missing config option otherwise.
Fixes: f4cba4bf6777 ("mm/damon: rename CONFIG_DAMON_DBGFS to DAMON_DBGFS_DEPRECATED")
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Ville Syrjälä [Tue, 19 Mar 2024 09:24:42 +0000 (11:24 +0200)]
drm/i915/bios: Tolerate devdata==NULL in intel_bios_encoder_supports_dp_dual_mode()
If we have no VBT, or the VBT didn't declare the encoder
in question, we won't have the 'devdata' for the encoder.
Instead of oopsing just bail early.
We won't be able to tell whether the port is DP++ or not,
but so be it.
Cc: stable@vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10464
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240319092443.15769-1-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit
26410896206342c8a80d2b027923e9ee7d33b733)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Ville Syrjälä [Mon, 25 Mar 2024 17:57:38 +0000 (19:57 +0200)]
drm/i915: Pre-populate the cursor physical dma address
Calling i915_gem_object_get_dma_address() from the vblank
evade critical section triggers might_sleep().
While we know that we've already pinned the framebuffer
and thus i915_gem_object_get_dma_address() will in fact
not sleep in this case, it seems reasonable to keep the
unconditional might_sleep() for maximum coverage.
So let's instead pre-populate the dma address during
fb pinning, which all happens before we enter the
vblank evade critical section.
We can use u32 for the dma address as this class of
hardware doesn't support >32bit addresses.
Cc: stable@vger.kernel.org
Fixes: 0225a90981c8 ("drm/i915: Make cursor plane registers unlocked")
Reported-by: Borislav Petkov <bp@alien8.de>
Closes: https://lore.kernel.org/intel-gfx/20240227100342.GAZd2zfmYcPS_SndtO@fat_crate.local/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240325175738.3440-1-ville.syrjala@linux.intel.com
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
(cherry picked from commit
c1289a5c3594cf04caa94ebf0edeb50c62009f1f)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Chris Wilson [Mon, 18 Mar 2024 13:58:47 +0000 (14:58 +0100)]
drm/i915/gt: Reset queue_priority_hint on parking
Originally, with strict in order execution, we could complete execution
only when the queue was empty. Preempt-to-busy allows replacement of an
active request that may complete before the preemption is processed by
HW. If that happens, the request is retired from the queue, but the
queue_priority_hint remains set, preventing direct submission until
after the next CS interrupt is processed.
This preempt-to-busy race can be triggered by the heartbeat, which will
also act as the power-management barrier and upon completion allow us to
idle the HW. We may process the completion of the heartbeat, and begin
parking the engine before the CS event that restores the
queue_priority_hint, causing us to fail the assertion that it is MIN.
<3>[ 166.210729] __engine_park:283 GEM_BUG_ON(engine->sched_engine->queue_priority_hint != (-((int)(~0U >> 1)) - 1))
<0>[ 166.210781] Dumping ftrace buffer:
<0>[ 166.210795] ---------------------------------
...
<0>[ 167.302811] drm_fdin-1097 2..s1. 165741070us : trace_ports: 0000:00:02.0 rcs0: promote { ccid:20 1217:2 prio 0 }
<0>[ 167.302861] drm_fdin-1097 2d.s2. 165741072us : execlists_submission_tasklet: 0000:00:02.0 rcs0: preempting last=1217:2, prio=0, hint=
2147483646
<0>[ 167.302928] drm_fdin-1097 2d.s2. 165741072us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence 1217:2, current 0
<0>[ 167.302992] drm_fdin-1097 2d.s2. 165741073us : __i915_request_submit: 0000:00:02.0 rcs0: fence 3:4660, current 4659
<0>[ 167.303044] drm_fdin-1097 2d.s1. 165741076us : execlists_submission_tasklet: 0000:00:02.0 rcs0: context:3 schedule-in, ccid:40
<0>[ 167.303095] drm_fdin-1097 2d.s1. 165741077us : trace_ports: 0000:00:02.0 rcs0: submit { ccid:40 3:4660* prio
2147483646 }
<0>[ 167.303159] kworker/-89 11..... 165741139us : i915_request_retire.part.0: 0000:00:02.0 rcs0: fence c90:2, current 2
<0>[ 167.303208] kworker/-89 11..... 165741148us : __intel_context_do_unpin: 0000:00:02.0 rcs0: context:c90 unpin
<0>[ 167.303272] kworker/-89 11..... 165741159us : i915_request_retire.part.0: 0000:00:02.0 rcs0: fence 1217:2, current 2
<0>[ 167.303321] kworker/-89 11..... 165741166us : __intel_context_do_unpin: 0000:00:02.0 rcs0: context:1217 unpin
<0>[ 167.303384] kworker/-89 11..... 165741170us : i915_request_retire.part.0: 0000:00:02.0 rcs0: fence 3:4660, current 4660
<0>[ 167.303434] kworker/-89 11d..1. 165741172us : __intel_context_retire: 0000:00:02.0 rcs0: context:1216 retire runtime: { total:56028ns, avg:56028ns }
<0>[ 167.303484] kworker/-89 11..... 165741198us : __engine_park: 0000:00:02.0 rcs0: parked
<0>[ 167.303534] <idle>-0 5d.H3. 165741207us : execlists_irq_handler: 0000:00:02.0 rcs0: semaphore yield:
00000040
<0>[ 167.303583] kworker/-89 11..... 165741397us : __intel_context_retire: 0000:00:02.0 rcs0: context:1217 retire runtime: { total:325575ns, avg:0ns }
<0>[ 167.303756] kworker/-89 11..... 165741777us : __intel_context_retire: 0000:00:02.0 rcs0: context:c90 retire runtime: { total:0ns, avg:0ns }
<0>[ 167.303806] kworker/-89 11..... 165742017us : __engine_park: __engine_park:283 GEM_BUG_ON(engine->sched_engine->queue_priority_hint != (-((int)(~0U >> 1)) - 1))
<0>[ 167.303811] ---------------------------------
<4>[ 167.304722] ------------[ cut here ]------------
<2>[ 167.304725] kernel BUG at drivers/gpu/drm/i915/gt/intel_engine_pm.c:283!
<4>[ 167.304731] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<4>[ 167.304734] CPU: 11 PID: 89 Comm: kworker/11:1 Tainted: G W 6.8.0-rc2-CI_DRM_14193-gc655e0fd2804+ #1
<4>[ 167.304736] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.
2204210138 04/21/2022
<4>[ 167.304738] Workqueue: i915-unordered retire_work_handler [i915]
<4>[ 167.304839] RIP: 0010:__engine_park+0x3fd/0x680 [i915]
<4>[ 167.304937] Code: 00 48 c7 c2 b0 e5 86 a0 48 8d 3d 00 00 00 00 e8 79 48 d4 e0 bf 01 00 00 00 e8 ef 0a d4 e0 31 f6 bf 09 00 00 00 e8 03 49 c0 e0 <0f> 0b 0f 0b be 01 00 00 00 e8 f5 61 fd ff 31 c0 e9 34 fd ff ff 48
<4>[ 167.304940] RSP: 0018:
ffffc9000059fce0 EFLAGS:
00010246
<4>[ 167.304942] RAX:
0000000000000200 RBX:
0000000000000000 RCX:
0000000000000006
<4>[ 167.304944] RDX:
0000000000000000 RSI:
0000000000000000 RDI:
0000000000000009
<4>[ 167.304946] RBP:
ffff8881330ca1b0 R08:
0000000000000001 R09:
0000000000000001
<4>[ 167.304947] R10:
0000000000000001 R11:
0000000000000001 R12:
ffff8881330ca000
<4>[ 167.304948] R13:
ffff888110f02aa0 R14:
ffff88812d1d0205 R15:
ffff88811277d4f0
<4>[ 167.304950] FS:
0000000000000000(0000) GS:
ffff88844f780000(0000) knlGS:
0000000000000000
<4>[ 167.304952] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
<4>[ 167.304953] CR2:
00007fc362200c40 CR3:
000000013306e003 CR4:
0000000000770ef0
<4>[ 167.304955] PKRU:
55555554
<4>[ 167.304957] Call Trace:
<4>[ 167.304958] <TASK>
<4>[ 167.305573] ____intel_wakeref_put_last+0x1d/0x80 [i915]
<4>[ 167.305685] i915_request_retire.part.0+0x34f/0x600 [i915]
<4>[ 167.305800] retire_requests+0x51/0x80 [i915]
<4>[ 167.305892] intel_gt_retire_requests_timeout+0x27f/0x700 [i915]
<4>[ 167.305985] process_scheduled_works+0x2db/0x530
<4>[ 167.305990] worker_thread+0x18c/0x350
<4>[ 167.305993] kthread+0xfe/0x130
<4>[ 167.305997] ret_from_fork+0x2c/0x50
<4>[ 167.306001] ret_from_fork_asm+0x1b/0x30
<4>[ 167.306004] </TASK>
It is necessary for the queue_priority_hint to be lower than the next
request submission upon waking up, as we rely on the hint to decide when
to kick the tasklet to submit that first request.
Fixes: 22b7a426bbe1 ("drm/i915/execlists: Preempt-to-busy")
Closes: https://gitlab.freedesktop.org/drm/intel/issues/10154
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.4+
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240318135906.716055-2-janusz.krzysztofik@linux.intel.com
(cherry picked from commit
98850e96cf811dc2d0a7d0af491caff9f5d49c1e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>