Paul E. McKenney [Tue, 7 Jun 2022 22:23:52 +0000 (15:23 -0700)]
rcu-tasks: Be more patient for RCU Tasks boot-time testing
The RCU-Tasks family of grace-period primitives can take some time to
complete, and the amount of time can depend on the exact hardware and
software configuration. Some configurations boot up fast enough that the
RCU-Tasks verification process gets false-positive failures. This commit
therefore allows up to 30 seconds for the grace periods to complete, with
this value adjustable downwards using the rcupdate.rcu_task_stall_timeout
kernel boot parameter.
Reported-by: Matthew Wilcox <willy@infradead.org>
Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Paul E. McKenney [Tue, 7 Jun 2022 04:30:38 +0000 (21:30 -0700)]
rcu-tasks: Update comments
This commit updates comments to reflect the changes in the series
of commits that eliminated the full task-list scan.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Wed, 8 Jun 2022 00:25:03 +0000 (17:25 -0700)]
rcu-tasks: Disable and enable CPU hotplug in same function
The rcu_tasks_trace_pregp_step() function invokes cpus_read_lock() to
disable CPU hotplug, and a later call to the rcu_tasks_trace_postscan()
function invokes cpus_read_unlock() to re-enable it. This was absolutely
necessary in the past in order to protect the intervening scan of the full
tasks list, but there is no longer such a scan. This commit therefore
improves readability by moving the cpus_read_unlock() call to the end
of the rcu_tasks_trace_pregp_step() function. This commit is a pure
code-motion commit without any (intended) change in functionality.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Fri, 3 Jun 2022 00:30:01 +0000 (17:30 -0700)]
rcu-tasks: Eliminate RCU Tasks Trace IPIs to online CPUs
Currently, the RCU Tasks Trace grace-period kthread IPIs each online CPU
using smp_call_function_single() in order to track any tasks currently in
RCU Tasks Trace read-side critical sections during which the corresponding
task has neither blocked nor been preempted. These IPIs are annoying
and are also not strictly necessary because any task that blocks or is
preempted within its current RCU Tasks Trace read-side critical section
will be tracked on one of the per-CPU rcu_tasks_percpu structure's
->rtp_blkd_tasks list. So the only time that this is a problem is if
one of the CPUs runs through a long-duration RCU Tasks Trace read-side
critical section without a context switch.
Note that the task_call_func() function cannot help here because there is
no safe way to identify the target task. Of course, the task_call_func()
function will be very useful later, when processing the list of tasks,
but it needs to know the task.
This commit therefore creates a cpu_curr_snapshot() function that returns
a pointer the task_struct structure of some task that happened to be
running on the specified CPU more or less during the time that the
cpu_curr_snapshot() function was executing. If there was no context
switch during this time, this function will return a pointer to the
task_struct structure of the task that was running throughout. If there
was a context switch, then the outgoing task will be taken care of by
RCU's context-switch hook, and the incoming task was either already taken
care during some previous context switch, or it is not currently within an
RCU Tasks Trace read-side critical section. And in this latter case, the
grace period already started, so there is no need to wait on this task.
This new cpu_curr_snapshot() function is invoked on each CPU early in
the RCU Tasks Trace grace-period processing, and the resulting tasks
are queued for later quiescent-state inspection.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Thu, 2 Jun 2022 04:26:57 +0000 (21:26 -0700)]
rcu-tasks: Maintain a count of tasks blocking RCU Tasks Trace grace period
This commit maintains a new n_trc_holdouts counter that tracks the number
of tasks blocking the RCU Tasks grace period. This counter is useful
for debugging, and its value has been added to a diagostic message.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Fri, 20 May 2022 17:21:00 +0000 (10:21 -0700)]
rcu-tasks: Stop RCU Tasks Trace from scanning full tasks list
This commit takes off the training wheels and relies only on scanning
currently running tasks and tasks that have blocked or been preempted
within their current RCU Tasks Trace read-side critical section.
Before this commit, the time complexity of an RCU Tasks Trace grace
period is O(T), where T is the number of tasks. After this commit,
this time complexity is O(C+B), where C is the number of CPUs and B
is the number of tasks that have blocked (or been preempted) at least
once during their current RCU Tasks Trace read-side critical sections.
Of course, if all tasks have blocked (or been preempted) at least once
during their current RCU Tasks Trace read-side critical sections, this is
still O(T), but current expectations are that RCU Tasks Trace read-side
critical section will be short and that there will normally not be large
numbers of tasks blocked within such a critical section.
Dave Marchevsky kindly measured the effects of this commit on the RCU
Tasks Trace grace-period latency and the rcu_tasks_trace_kthread task's
CPU consumption per RCU Tasks Trace grace period over the course of a
fixed test, all in milliseconds:
Before After
GP latency 22.3 ms stddev > 0.1 17.0 ms stddev < 0.1
GP CPU 2.3 ms stddev 0.3 1.1 ms stddev 0.2
This was on a system with 15,000 tasks, so it is reasonable to expect
much larger savings on the systems on which this issue was first noted,
given that they sport well in excess of 100,000 tasks. CPU consumption
was measured using profiling techniques.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Tested-by: Dave Marchevsky <davemarchevsky@fb.com>
Paul E. McKenney [Thu, 19 May 2022 00:19:27 +0000 (17:19 -0700)]
rcu-tasks: Stop RCU Tasks Trace from scanning idle tasks
Now that RCU scans both running tasks and tasks that have blocked within
their current RCU Tasks Trace read-side critical section, there is no
need for it to scan the idle tasks. After all, an idle loop should not
be remain within an RCU Tasks Trace read-side critical section across
exit from idle, and from a BPF viewpoint, functions invoked from the
idle loop should not sleep. So only running idle tasks can be within
RCU Tasks Trace read-side critical sections.
This commit therefore removes the scan of the idle tasks from the
rcu_tasks_trace_postscan() function.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Thu, 19 May 2022 00:19:27 +0000 (17:19 -0700)]
rcu-tasks: Pull in tasks blocked within RCU Tasks Trace readers
This commit scans each CPU's ->rtp_blkd_tasks list, adding them to
the list of holdout tasks. This will cause the current RCU Tasks Trace
grace period to wait until these tasks exit their RCU Tasks Trace
read-side critical sections. This commit will enable later work
omitting the scan of the full task list.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Wed, 18 May 2022 23:06:55 +0000 (16:06 -0700)]
rcu-tasks: Scan running tasks for RCU Tasks Trace readers
A running task might be within an RCU Tasks Trace read-side critical
section for any length of time, but will not be placed on any of the
per-CPU rcu_tasks_percpu structure's ->rtp_blkd_tasks lists. Therefore
any RCU Tasks Trace grace-period processing that does not scan the full
task list must interact with the running tasks.
This commit therefore causes the rcu_tasks_trace_pregp_step() function
to IPI each CPU in order to place the corresponding task on the holdouts
list and to record whether or not it was in an RCU Tasks Trace read-side
critical section. Yes, it is possible to avoid adding it to that list
if it is not a reader, but that would prevent the system from remembering
that this task was in a quiescent state. Which is why the running tasks
are unconditionally added to the holdout list.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Wed, 18 May 2022 00:38:02 +0000 (17:38 -0700)]
rcu-tasks: Avoid rcu_tasks_trace_pertask() duplicate list additions
This commit adds checks within rcu_tasks_trace_pertask() to avoid
duplicate (and destructive) additions to the holdouts list. These checks
will be required later due to the possibility of a given task having
blocked while in an RCU Tasks Trace read-side critical section, but now
running on a CPU.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Wed, 18 May 2022 00:11:37 +0000 (17:11 -0700)]
rcu-tasks: Move rcu_tasks_trace_pertask() before rcu_tasks_trace_pregp_step()
This is a code-motion-only commit that moves rcu_tasks_trace_pertask()
to precede rcu_tasks_trace_pregp_step(), so that the latter will be
able to invoke the other without forward references.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Tue, 17 May 2022 23:47:40 +0000 (16:47 -0700)]
rcu-tasks: Add blocked-task indicator to RCU Tasks Trace stall warnings
This commit adds a "B" indicator to the RCU Tasks Trace CPU stall warning
when the task has blocked within its current read-side critical section.
This serves as a debugging aid.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Tue, 17 May 2022 22:01:14 +0000 (15:01 -0700)]
rcu-tasks: Untrack blocked RCU Tasks Trace at reader end
This commit causes rcu_read_unlock_trace() to check for the current
task being on a per-CPU list within the rcu_tasks_percpu structure,
and removes it from that list if so. This has the effect of curtailing
tracking of a task that blocked within an RCU Tasks Trace read-side
critical section once it exits that critical section.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Tue, 17 May 2022 18:30:32 +0000 (11:30 -0700)]
rcu-tasks: Track blocked RCU Tasks Trace readers
This commit places any task that has ever blocked within its current
RCU Tasks Trace read-side critical section on a per-CPU list within the
rcu_tasks_percpu structure. Tasks are removed from this list when they
exit by the exit_tasks_rcu_finish_trace() function. The purpose of this
commit is to provide the information needed to eliminate the current
scan of the full task list.
This commit offsets the INT_MIN value for ->trc_reader_nesting with the
new nesting level in order to avoid queueing tasks that are exiting
their read-side critical sections.
[ paulmck: Apply kernel test robot feedback. ]
[ paulmck: Apply feedback from syzbot+
9bb26e7c5e8e4fa7e641@syzkaller.appspotmail.com ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: syzbot <syzbot+9bb26e7c5e8e4fa7e641@syzkaller.appspotmail.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Tue, 17 May 2022 00:56:16 +0000 (17:56 -0700)]
rcu-tasks: Add data structures for lightweight grace periods
This commit adds fields to task_struct and to rcu_tasks_percpu that will
be used to avoid the task-list scan for RCU Tasks Trace grace periods,
and also initializes these fields.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Wed, 25 May 2022 16:49:26 +0000 (09:49 -0700)]
rcu-tasks: Make RCU Tasks Trace stall warning handle idle offline tasks
When a CPU is offline, its idle task can appear to be running, but it
cannot be doing anything while CPU-hotplug operations are excluded.
This commit takes advantage of that fact by making trc_check_slow_task()
check for task_curr(t) && cpu_online(task_cpu(t)), and recording
full information in that case.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Wed, 25 May 2022 16:20:59 +0000 (09:20 -0700)]
rcu-tasks: Make RCU Tasks Trace stall warnings print full .b.need_qs field
Currently, the RCU Tasks Trace CPU stall warning simply indicates
whether or not the .b.need_qs field is zero. This commit shows the
three permitted values and flags other values with either "!" or "?".
This is a debugging aid.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Wed, 25 May 2022 03:36:08 +0000 (20:36 -0700)]
rcu-tasks: Flag offline CPUs in RCU Tasks Trace stall warnings
This commit tags offline CPUs with "(offline)" in RCU Tasks Trace CPU
stall warnings. This is a debugging aid.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Tue, 24 May 2022 23:05:15 +0000 (16:05 -0700)]
rcu-tasks: Add slow-IPI indicator to RCU Tasks Trace stall warnings
This commit adds a "I" indicator to the RCU Tasks Trace CPU stall
warning when an IPI directed to a task has thus far failed to arrive.
This serves as a debugging aid.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Wed, 1 Jun 2022 04:38:15 +0000 (21:38 -0700)]
rcu-tasks: Simplify trc_inspect_reader() QS logic
Currently, trc_inspect_reader() does one check for nesting less than
or equal to zero, then sorts out the distinctions within this single
"if" statement. This commit simplifies the logic by providing one
"if" statement for quiescent states (nesting of zero) and another "if"
statement for transitioning from one nesting level to another or the
outermost rcu_read_unlock_trace() (negative nesting).
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Wed, 25 May 2022 03:33:17 +0000 (20:33 -0700)]
rcu-tasks: Make rcu_note_context_switch() unconditionally call rcu_tasks_qs()
This commit makes rcu_note_context_switch() unconditionally invoke the
rcu_tasks_qs() function, as opposed to doing so only when RCU (as opposed
to RCU Tasks Trace) urgently needs a grace period to end.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Wed, 25 May 2022 01:19:17 +0000 (18:19 -0700)]
rcu-tasks: RCU Tasks Trace grace-period kthread has implicit QS
Because the task driving the grace-period kthread is in quiescent state
throughout, this commit excludes it from the list of tasks from which
a quiescent state is needed.
This does mean that attaching a sleepable BPF program to function in
kernel/rcu/tasks.h is a bad idea, by the way.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Wed, 25 May 2022 01:02:40 +0000 (18:02 -0700)]
rcu-tasks: Handle idle tasks for recently offlined CPUs
This commit identifies idle tasks for recently offlined CPUs as residing
in a quiescent state. This is safe only because CPU-hotplug operations
are excluded during these checks.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Tue, 24 May 2022 23:59:52 +0000 (16:59 -0700)]
rcu-tasks: Idle tasks on offline CPUs are in quiescent states
Any idle task corresponding to an offline CPU is in an RCU Tasks Trace
quiescent state. This commit causes rcu_tasks_trace_postscan() to ignore
idle tasks for offline CPUs, which it can do safely due to CPU-hotplug
operations being disabled.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Thu, 26 May 2022 23:12:51 +0000 (16:12 -0700)]
rcu-tasks: Make trc_read_check_handler() fetch ->trc_reader_nesting only once
This commit replaces the pair of READ_ONCE(t->trc_reader_nesting) calls
with a single such call and a local variable. This makes the code's
intent more clear.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Wed, 25 May 2022 20:37:36 +0000 (13:37 -0700)]
rcu-tasks: Remove rcu_tasks_trace_postgp() wait for counter
Now that tasks are not removed from the list until they have responded to
any needed request for a quiescent state, it is no longer necessary to
wait for the trc_n_readers_need_end counter to go to zero. This commit
therefore removes that waiting code.
It is therefore also no longer necessary for rcu_tasks_trace_postgp() to
do the final decrement of this counter, so that code is also removed.
This in turn means that trc_n_readers_need_end counter itself can
be removed, as can the rcu_tasks_trace_iw irq_work structure and the
rcu_read_unlock_iw() function.
[ paulmck: Apply feedback from Zqiang. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Tue, 24 May 2022 03:50:11 +0000 (20:50 -0700)]
rcu-tasks: Merge state into .b.need_qs and atomically update
This commit gets rid of the task_struct structure's ->trc_reader_checked
field, making it instead be a bit within the task_struct structure's
existing ->trc_reader_special.b.need_qs field. This commit also
atomically loads, stores, and checks the resulting combination of the
reader-checked and need-quiescent state flags. This will in turn allow
significant simplification of the rcu_tasks_trace_postgp() function
as well as elimination of the trc_n_readers_need_end counter in later
commits. These changes will in turn simplify later elimination of the
RCU Tasks Trace scan of the task list, which will make RCU Tasks Trace
grace periods less CPU-intensive.
[ paulmck: Apply kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: KP Singh <kpsingh@kernel.org>
Paul E. McKenney [Tue, 19 Apr 2022 22:41:38 +0000 (15:41 -0700)]
rcu-tasks: Drive synchronous grace periods from calling task
This commit causes synchronous grace periods to be driven from the task
invoking synchronize_rcu_*(), allowing these functions to be invoked from
the mid-boot dead zone extending from when the scheduler was initialized
to to point that the various RCU tasks grace-period kthreads are spawned.
This change will allow the self-tests to run in a consistent manner.
Reported-by: Matthew Wilcox <willy@infradead.org>
Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 19 Apr 2022 18:06:03 +0000 (11:06 -0700)]
rcu-tasks: Move synchronize_rcu_tasks_generic() down
This is strictly a code-motion commit that moves the
synchronize_rcu_tasks_generic() down to where it can invoke
rcu_tasks_one_gp() without the need for a forward declaration.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 19 Apr 2022 17:47:28 +0000 (10:47 -0700)]
rcu-tasks: Split rcu_tasks_one_gp() from rcu_tasks_kthread()
This commit abstracts most of the rcu_tasks_kthread() function's loop
body into a new rcu_tasks_one_gp() function. It also introduces
a new ->tasks_gp_mutex to synchronize concurrent calls to this new
rcu_tasks_one_gp() function. This commit is preparation for allowing
RCU tasks grace periods to be driven by the calling task during the
mid-boot dead zone.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Paul E. McKenney [Tue, 7 Dec 2021 00:19:40 +0000 (16:19 -0800)]
rcu-tasks: Check for abandoned callbacks
This commit adds a debugging scan for callbacks that got lost during a
callback-queueing transition.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Linus Torvalds [Sun, 19 Jun 2022 20:06:47 +0000 (15:06 -0500)]
Linux 5.19-rc3
Linus Torvalds [Sun, 19 Jun 2022 14:58:28 +0000 (09:58 -0500)]
Merge tag 'x86-urgent-2022-06-19' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- Make RESERVE_BRK() work again with older binutils. The recent
'simplification' broke that.
- Make early #VE handling increment RIP when successful.
- Make the #VE code consistent vs. the RIP adjustments and add
comments.
- Handle load_unaligned_zeropad() across page boundaries correctly in
#VE when the second page is shared.
* tag 'x86-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
x86/tdx: Clarify RIP adjustments in #VE handler
x86/tdx: Fix early #VE handling
x86/mm: Fix RESERVE_BRK() for older binutils
Linus Torvalds [Sun, 19 Jun 2022 14:54:16 +0000 (09:54 -0500)]
Merge tag 'objtool-urgent-2022-06-19' of git://git./linux/kernel/git/tip/tip
Pull build tooling updates from Thomas Gleixner:
- Remove obsolete CONFIG_X86_SMAP reference from objtool
- Fix overlapping text section failures in faddr2line for real
- Remove OBJECT_FILES_NON_STANDARD usage from x86 ftrace and replace it
with finegrained annotations so objtool can validate that code
correctly.
* tag 'objtool-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ftrace: Remove OBJECT_FILES_NON_STANDARD usage
faddr2line: Fix overlapping text section failures, the sequel
objtool: Fix obsolete reference to CONFIG_X86_SMAP
Linus Torvalds [Sun, 19 Jun 2022 14:51:00 +0000 (09:51 -0500)]
Merge tag 'sched-urgent-2022-06-19' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"A single scheduler fix plugging a race between sched_setscheduler()
and balance_push().
sched_setscheduler() spliced the balance callbacks accross a lock
break which makes it possible for an interleaving schedule() to
observe an empty list"
* tag 'sched-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix balance_push() vs __sched_setscheduler()
Linus Torvalds [Sun, 19 Jun 2022 14:47:41 +0000 (09:47 -0500)]
Merge tag 'locking-urgent-2022-06-19' of git://git./linux/kernel/git/tip/tip
Pull lockdep fix from Thomas Gleixner:
"A RT fix for lockdep.
lockdep invokes prandom_u32() to create cookies. This worked until
prandom_u32() was switched to the real random generator, which takes a
spinlock for extraction, which does not work on RT when invoked from
atomic contexts.
lockdep has no requirement for real random numbers and it turns out
sched_clock() is good enough to create the cookie. That works
everywhere and is faster"
* tag 'locking-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Use sched_clock() for random numbers
Linus Torvalds [Sun, 19 Jun 2022 14:45:16 +0000 (09:45 -0500)]
Merge tag 'irq-urgent-2022-06-19' of git://git./linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of interrupt subsystem updates:
Core:
- Ensure runtime power management for chained interrupts
Drivers:
- A collection of OF node refcount fixes
- Unbreak MIPS uniprocessor builds
- Fix xilinx interrupt controller Kconfig dependencies
- Add a missing compatible string to the Uniphier driver"
* tag 'irq-urgent-2022-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/loongson-liointc: Use architecture register to get coreid
irqchip/uniphier-aidet: Add compatible string for NX1 SoC
dt-bindings: interrupt-controller/uniphier-aidet: Add bindings for NX1 SoC
irqchip/realtek-rtl: Fix refcount leak in map_interrupts
irqchip/gic-v3: Fix refcount leak in gic_populate_ppi_partitions
irqchip/gic-v3: Fix error handling in gic_populate_ppi_partitions
irqchip/apple-aic: Fix refcount leak in aic_of_ic_init
irqchip/apple-aic: Fix refcount leak in build_fiq_affinity
irqchip/gic/realview: Fix refcount leak in realview_gic_of_init
irqchip/xilinx: Remove microblaze+zynq dependency
genirq: PM: Use runtime PM for chained interrupts
Linus Torvalds [Sun, 19 Jun 2022 14:37:29 +0000 (09:37 -0500)]
Merge tag 'char-misc-5.19-rc3-take2' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes for real from Greg KH:
"Let's tag the proper branch this time...
Here are some small char/misc driver fixes for 5.19-rc3 that resolve
some reported issues.
They include:
- mei driver fixes
- comedi driver fix
- rtsx build warning fix
- fsl-mc-bus driver fix
All of these have been in linux-next for a while with no reported
issues"
This is what the merge in commit
f0ec9c65a8d6 _should_ have merged, but
Greg fat-fingered the pull request and I got some small changes from
linux-next instead there. Credit to Nathan Chancellor for eagle-eyes.
Link: https://lore.kernel.org/all/Yqywy+Md2AfGDu8v@dev-arch.thelio-3990X/
* tag 'char-misc-5.19-rc3-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
bus: fsl-mc-bus: fix KASAN use-after-free in fsl_mc_bus_remove()
mei: me: add raptor lake point S DID
mei: hbm: drop capability response on early shutdown
mei: me: set internal pg flag to off on hardware reset
misc: rtsx: Fix clang -Wsometimes-uninitialized in rts5261_init_from_hw()
comedi: vmk80xx: fix expression for tx buffer size
Linus Torvalds [Sun, 19 Jun 2022 14:35:09 +0000 (09:35 -0500)]
Merge tag 'i2c-for-5.19-rc3' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"MAINTAINERS rectifications and a few minor driver fixes"
* tag 'i2c-for-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mediatek: Fix an error handling path in mtk_i2c_probe()
i2c: designware: Use standard optional ref clock implementation
MAINTAINERS: core DT include belongs to core
MAINTAINERS: add include/dt-bindings/i2c to I2C SUBSYSTEM HOST DRIVERS
i2c: npcm7xx: Add check for platform_driver_register
MAINTAINERS: Update Synopsys DesignWare I2C to Supported
Linus Torvalds [Sun, 19 Jun 2022 14:24:49 +0000 (09:24 -0500)]
Merge tag 'xfs-5.19-fixes-1' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"There's not a whole lot this time around (I'm still on vacation) but
here are some important fixes for new features merged in -rc1:
- Fix a bug where inode flag changes would accidentally drop nrext64
- Fix a race condition when toggling LARP mode"
* tag 'xfs-5.19-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: preserve DIFLAG2_NREXT64 when setting other inode attributes
xfs: fix variable state usage
xfs: fix TOCTOU race involving the new logged xattrs control knob
Linus Torvalds [Sun, 19 Jun 2022 02:51:12 +0000 (21:51 -0500)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a variety of bugs, many of which were found by folks using fuzzing
or error injection.
Also fix up how test_dummy_encryption mount option is handled for the
new mount API.
Finally, fix/cleanup a number of comments and ext4 Documentation
files"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix a doubled word "need" in a comment
ext4: add reserved GDT blocks check
ext4: make variable "count" signed
ext4: correct the judgment of BUG in ext4_mb_normalize_request
ext4: fix bug_on ext4_mb_use_inode_pa
ext4: fix up test_dummy_encryption handling for new mount API
ext4: use kmemdup() to replace kmalloc + memcpy
ext4: fix super block checksum incorrect after mount
ext4: improve write performance with disabled delalloc
ext4: fix warning when submitting superblock in ext4_commit_super()
ext4, doc: remove unnecessary escaping
ext4: fix incorrect comment in ext4_bio_write_page()
fs: fix jbd2_journal_try_to_free_buffers() kernel-doc comment
Linus Torvalds [Sun, 19 Jun 2022 02:44:44 +0000 (21:44 -0500)]
Merge tag '5.19-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs client fixes from Steve French:
"Two cifs debugging improvements - one found to deal with debugging a
multichannel problem and one for a recent fallocate issue
This does include the two larger multichannel reconnect (dynamically
adjusting interfaces on reconnect) patches, because we recently found
an additional problem with multichannel to one server type that I want
to include at the same time"
* tag '5.19-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: when a channel is not found for server, log its connection id
smb3: add trace point for SMB2_set_eof
Xiang wangx [Sun, 5 Jun 2022 09:15:03 +0000 (17:15 +0800)]
ext4: fix a doubled word "need" in a comment
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Link: https://lore.kernel.org/r/20220605091503.12513-1-wangxiang@cdjrlc.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Zhang Yi [Wed, 1 Jun 2022 09:27:17 +0000 (17:27 +0800)]
ext4: add reserved GDT blocks check
We capture a NULL pointer issue when resizing a corrupt ext4 image which
is freshly clear resize_inode feature (not run e2fsck). It could be
simply reproduced by following steps. The problem is because of the
resize_inode feature was cleared, and it will convert the filesystem to
meta_bg mode in ext4_resize_fs(), but the es->s_reserved_gdt_blocks was
not reduced to zero, so could we mistakenly call reserve_backup_gdb()
and passing an uninitialized resize_inode to it when adding new group
descriptors.
mkfs.ext4 /dev/sda 3G
tune2fs -O ^resize_inode /dev/sda #forget to run requested e2fsck
mount /dev/sda /mnt
resize2fs /dev/sda 8G
========
BUG: kernel NULL pointer dereference, address:
0000000000000028
CPU: 19 PID: 3243 Comm: resize2fs Not tainted
5.18.0-rc7-00001-gfde086c5ebfd #748
...
RIP: 0010:ext4_flex_group_add+0xe08/0x2570
...
Call Trace:
<TASK>
ext4_resize_fs+0xbec/0x1660
__ext4_ioctl+0x1749/0x24e0
ext4_ioctl+0x12/0x20
__x64_sys_ioctl+0xa6/0x110
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f2dd739617b
========
The fix is simple, add a check in ext4_resize_begin() to make sure that
the es->s_reserved_gdt_blocks is zero when the resize_inode feature is
disabled.
Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220601092717.763694-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ding Xiang [Mon, 30 May 2022 10:00:47 +0000 (18:00 +0800)]
ext4: make variable "count" signed
Since dx_make_map() may return -EFSCORRUPTED now, so change "count" to
be a signed integer so we can correctly check for an error code returned
by dx_make_map().
Fixes: 46c116b920eb ("ext4: verify dir block before splitting it")
Cc: stable@kernel.org
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20220530100047.537598-1-dingxiang@cmss.chinamobile.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Baokun Li [Sat, 28 May 2022 11:00:16 +0000 (19:00 +0800)]
ext4: correct the judgment of BUG in ext4_mb_normalize_request
ext4_mb_normalize_request() can move logical start of allocated blocks
to reduce fragmentation and better utilize preallocation. However logical
block requested as a start of allocation (ac->ac_o_ex.fe_logical) should
always be covered by allocated blocks so we should check that by
modifying and to or in the assertion.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220528110017.354175-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Baokun Li [Sat, 28 May 2022 11:00:15 +0000 (19:00 +0800)]
ext4: fix bug_on ext4_mb_use_inode_pa
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/mballoc.c:3211!
[...]
RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
[...]
Call Trace:
ext4_mb_new_blocks+0x9df/0x5d30
ext4_ext_map_blocks+0x1803/0x4d80
ext4_map_blocks+0x3a4/0x1a10
ext4_writepages+0x126d/0x2c30
do_writepages+0x7f/0x1b0
__filemap_fdatawrite_range+0x285/0x3b0
file_write_and_wait_range+0xb1/0x140
ext4_sync_file+0x1aa/0xca0
vfs_fsync_range+0xfb/0x260
do_fsync+0x48/0xa0
[...]
==================================================================
Above issue may happen as follows:
-------------------------------------
do_fsync
vfs_fsync_range
ext4_sync_file
file_write_and_wait_range
__filemap_fdatawrite_range
do_writepages
ext4_writepages
mpage_map_and_submit_extent
mpage_map_one_extent
ext4_map_blocks
ext4_mb_new_blocks
ext4_mb_normalize_request
>>> start + size <= ac->ac_o_ex.fe_logical
ext4_mb_regular_allocator
ext4_mb_simple_scan_group
ext4_mb_use_best_found
ext4_mb_new_preallocation
ext4_mb_new_inode_pa
ext4_mb_use_inode_pa
>>> set ac->ac_b_ex.fe_len <= 0
ext4_mb_mark_diskspace_used
>>> BUG_ON(ac->ac_b_ex.fe_len <= 0);
we can easily reproduce this problem with the following commands:
`fallocate -l100M disk`
`mkfs.ext4 -b 1024 -g 256 disk`
`mount disk /mnt`
`fsstress -d /mnt -l 0 -n 1000 -p 1`
The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
when the size is truncated. So start should be the start position of
the group where ac_o_ex.fe_logical is located after alignment.
In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
is very large, the value calculated by start_off is more accurate.
Cc: stable@kernel.org
Fixes: cd648b8a8fd5 ("ext4: trim allocation requests to group size")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220528110017.354175-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Eric Biggers [Thu, 26 May 2022 04:04:12 +0000 (21:04 -0700)]
ext4: fix up test_dummy_encryption handling for new mount API
Since ext4 was converted to the new mount API, the test_dummy_encryption
mount option isn't being handled entirely correctly, because the needed
fscrypt_set_test_dummy_encryption() helper function combines
parsing/checking/applying into one function. That doesn't work well
with the new mount API, which split these into separate steps.
This was sort of okay anyway, due to the parsing logic that was copied
from fscrypt_set_test_dummy_encryption() into ext4_parse_param(),
combined with an additional check in ext4_check_test_dummy_encryption().
However, these overlooked the case of changing the value of
test_dummy_encryption on remount, which isn't allowed but ext4 wasn't
detecting until ext4_apply_options() when it's too late to fail.
Another bug is that if test_dummy_encryption was specified multiple
times with an argument, memory was leaked.
Fix this up properly by using the new helper functions that allow
splitting up the parse/check/apply steps for test_dummy_encryption.
Fixes: cebe85d570cf ("ext4: switch to the new mount api")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220526040412.173025-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Shuqi Zhang [Wed, 25 May 2022 03:01:20 +0000 (11:01 +0800)]
ext4: use kmemdup() to replace kmalloc + memcpy
Replace kmalloc + memcpy with kmemdup()
Signed-off-by: Shuqi Zhang <zhangshuqi3@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220525030120.803330-1-zhangshuqi3@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Ye Bin [Wed, 25 May 2022 01:29:04 +0000 (09:29 +0800)]
ext4: fix super block checksum incorrect after mount
We got issue as follows:
[home]# mount /dev/sda test
EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
[home]# dmesg
EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
EXT4-fs (sda): Errors on filesystem, clearing orphan list.
EXT4-fs (sda): recovery complete
EXT4-fs (sda): mounted filesystem with ordered data mode. Quota mode: none.
[home]# debugfs /dev/sda
debugfs 1.46.5 (30-Dec-2021)
Checksum errors in superblock! Retrying...
Reason is ext4_orphan_cleanup will reset ‘s_last_orphan’ but not update
super block checksum.
To solve above issue, defer update super block checksum after
ext4_orphan_cleanup.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Cc: stable@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220525012904.1604737-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Shyam Prasad N [Tue, 14 Jun 2022 11:47:24 +0000 (11:47 +0000)]
cifs: when a channel is not found for server, log its connection id
cifs_ses_get_chan_index gets the index for a given server pointer.
When a match is not found, we warn about a possible bug.
However, printing details about the non-matching server could be
more useful to debug here.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Kirill A. Shutemov [Tue, 14 Jun 2022 12:01:35 +0000 (15:01 +0300)]
x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page
load_unaligned_zeropad() can lead to unwanted loads across page boundaries.
The unwanted loads are typically harmless. But, they might be made to
totally unrelated or even unmapped memory. load_unaligned_zeropad()
relies on exception fixup (#PF, #GP and now #VE) to recover from these
unwanted loads.
In TDX guests, the second page can be shared page and a VMM may configure
it to trigger #VE.
The kernel assumes that #VE on a shared page is an MMIO access and tries to
decode instruction to handle it. In case of load_unaligned_zeropad() it
may result in confusion as it is not MMIO access.
Fix it by detecting split page MMIO accesses and failing them.
load_unaligned_zeropad() will recover using exception fixups.
The issue was discovered by analysis and reproduced artificially. It was
not triggered during testing.
[ dhansen: fix up changelogs and comments for grammar and clarity,
plus incorporate Kirill's off-by-one fix]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220614120135.14812-4-kirill.shutemov@linux.intel.com
Linus Torvalds [Fri, 17 Jun 2022 20:17:57 +0000 (15:17 -0500)]
Merge tag 'nfs-for-5.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
- Add FMODE_CAN_ODIRECT support to NFSv4 so opens don't fail
- Fix trunking detection & cl_max_connect setting
- Avoid pnfs_update_layout() livelocks
- Don't keep retrying pNFS if the server replies with NFS4ERR_UNAVAILABLE
* tag 'nfs-for-5.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4: Add FMODE_CAN_ODIRECT after successful open of a NFS4.x file
sunrpc: set cl_max_connect when cloning an rpc_clnt
pNFS: Avoid a live lock condition in pnfs_update_layout()
pNFS: Don't keep retrying if the server replied NFS4ERR_LAYOUTUNAVAILABLE
Linus Torvalds [Fri, 17 Jun 2022 20:12:20 +0000 (15:12 -0500)]
Merge tag 'pci-v5.19-fixes-2' of git://git./linux/kernel/git/helgaas/pci
Pull pci fix from Bjorn Helgaas:
"Revert clipping of PCI host bridge windows to avoid E820 regions,
which broke several machines by forcing unnecessary BAR reassignments
(Hans de Goede)"
* tag 'pci-v5.19-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"
Linus Torvalds [Fri, 17 Jun 2022 19:57:42 +0000 (14:57 -0500)]
Merge tag 'printk-for-5.19-rc3' of git://git./linux/kernel/git/printk/linux
Pull printk fixes from Petr Mladek:
"Make the global console_sem available for CPU that is handling panic()
or shutdown.
This is an old problem when an existing console lock owner might block
console output, but it became more visible with the kthreads"
* tag 'printk-for-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: Wait for the global console lock when the system is going down
printk: Block console kthreads when direct printing will be required
Hans de Goede [Sun, 12 Jun 2022 14:43:25 +0000 (16:43 +0200)]
x86/PCI: Revert "x86/PCI: Clip only host bridge windows for E820 regions"
This reverts commit
4c5e242d3e93.
Prior to
4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820
regions"), E820 regions did not affect PCI host bridge windows. We only
looked at E820 regions and avoided them when allocating new MMIO space.
If firmware PCI bridge window and BAR assignments used E820 regions, we
left them alone.
After
4c5e242d3e93, we removed E820 regions from the PCI host bridge
windows before looking at BARs, so firmware assignments in E820 regions
looked like errors, and we moved things around to fit in the space left
(if any) after removing the E820 regions. This unnecessary BAR
reassignment broke several machines.
Guilherme reported that Steam Deck fails to boot after
4c5e242d3e93. We
clipped the window that contained most 32-bit BARs:
BIOS-e820: [mem 0x00000000a0000000-0x00000000a00fffff] reserved
acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] for e820 entry [mem 0xa0000000-0xa00fffff]
which forced us to reassign all those BARs, for example, this NVMe BAR:
pci 0000:00:01.2: PCI bridge to [bus 01]
pci 0000:00:01.2: bridge window [mem 0x80600000-0x806fffff]
pci 0000:01:00.0: BAR 0: [mem 0x80600000-0x80603fff 64bit]
pci 0000:00:01.2: can't claim window [mem 0x80600000-0x806fffff]: no compatible bridge window
pci 0000:01:00.0: can't claim BAR 0 [mem 0x80600000-0x80603fff 64bit]: no compatible bridge window
pci 0000:00:01.2: bridge window: assigned [mem 0xa0100000-0xa01fffff]
pci 0000:01:00.0: BAR 0: assigned [mem 0xa0100000-0xa0103fff 64bit]
All the reassignments were successful, so the devices should have been
functional at the new addresses, but some were not.
Andy reported a similar failure on an Intel MID platform. Benjamin
reported a similar failure on a VMWare Fusion VM.
Note: this is not a clean revert; this revert keeps the later change to
make the clipping dependent on a new pci_use_e820 bool, moving the checking
of this bool to arch_remove_reservations().
[bhelgaas: commit log, add more reporters and testers]
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109
Reported-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Reported-by: Jongman Heo <jongman.heo@gmail.com>
Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions")
Link: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Linus Torvalds [Fri, 17 Jun 2022 18:55:19 +0000 (13:55 -0500)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Revert the moving of the jump labels initialisation before
setup_machine_fdt(). The bug was fixed in drivers/char/random.c.
- Ftrace fixes: branch range check and consistent handling of PLTs.
- Clean rather than invalidate FROM_DEVICE buffers at start of DMA
transfer (safer if such buffer is mapped in user space). A cache
invalidation is done already at the end of the transfer.
- A couple of clean-ups (unexport symbol, remove unused label).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer
arm64/cpufeature: Unexport set_cpu_feature()
arm64: ftrace: remove redundant label
arm64: ftrace: consistently handle PLTs.
arm64: ftrace: fix branch range checks
Revert "arm64: Initialize jump labels before setup_machine_fdt()"
Linus Torvalds [Fri, 17 Jun 2022 18:50:24 +0000 (13:50 -0500)]
Merge tag 'loongarch-fixes-5.19-2' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Add missing ELF_DETAILS in vmlinux.lds.S and fix document rendering"
* tag 'loongarch-fixes-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
docs/zh_CN/LoongArch: Fix notes rendering by using reST directives
docs/LoongArch: Fix notes rendering by using reST directives
LoongArch: vmlinux.lds.S: Add missing ELF_DETAILS
Linus Torvalds [Fri, 17 Jun 2022 18:45:47 +0000 (13:45 -0500)]
Merge tag 'riscv-for-linus-5.19-rc3' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix for the PolarFire SOC's device tree
- A handful of fixes for the recently added Svpmbt support
- An improvement to the Kconfig text for Svpbmt
* tag 'riscv-for-linus-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Improve description for RISCV_ISA_SVPBMT Kconfig symbol
riscv: drop cpufeature_apply_feature tracking variable
riscv: fix dependency for t-head errata
riscv: dts: microchip: re-add pdma to mpfs device tree
Linus Torvalds [Fri, 17 Jun 2022 18:39:12 +0000 (13:39 -0500)]
Merge tag 'hyperv-fixes-signed-
20220617' of git://git./linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Fix hv_init_clocksource annotation (Masahiro Yamada)
- Two bug fixes for vmbus driver (Saurabh Sengar)
- Fix SEV negotiation (Tianyu Lan)
- Fix comments in code (Xiang Wang)
- One minor fix to HID driver (Michael Kelley)
* tag 'hyperv-fixes-signed-
20220617' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/Hyper-V: Add SEV negotiate protocol support in Isolation VM
Drivers: hv: vmbus: Release cpu lock in error case
HID: hyperv: Correctly access fields declared as __le16
clocksource: hyper-v: unexport __init-annotated hv_init_clocksource()
Drivers: hv: Fix syntax errors in comments
Drivers: hv: vmbus: Don't assign VMbus channel interrupts to isolated CPUs
Linus Torvalds [Fri, 17 Jun 2022 18:22:58 +0000 (11:22 -0700)]
Merge tag 'block-5.19-2022-06-16' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request from Christoph
- Quirks, quirks, quirks to work around buggy consumer grade
devices (Keith Bush, Ning Wang, Stefan Reiter, Rasheed Hsueh)
- Better kernel messages for devices that need quirking (Keith
Bush)
- Make a kernel message more useful (Thomas Weißschuh)
- MD pull request from Song, with a few fixes
- blk-mq sysfs locking fixes (Ming)
- BFQ stats fix (Bart)
- blk-mq offline queue fix (Bart)
- blk-mq flush request tag fix (Ming)
* tag 'block-5.19-2022-06-16' of git://git.kernel.dk/linux-block:
block/bfq: Enable I/O statistics
blk-mq: don't clear flush_rq from tags->rqs[]
blk-mq: avoid to touch q->elevator without any protection
blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none
block: Fix handling of offline queues in blk_mq_alloc_request_hctx()
md/raid5-ppl: Fix argument order in bio_alloc_bioset()
Revert "md: don't unregister sync_thread with reconfig_mutex held"
nvme-pci: disable write zeros support on UMIC and Samsung SSDs
nvme-pci: avoid the deepest sleep state on ZHITAI TiPro7000 SSDs
nvme-pci: sk hynix p31 has bogus namespace ids
nvme-pci: smi has bogus namespace ids
nvme-pci: phison e12 has bogus namespace ids
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S50
nvme-pci: add trouble shooting steps for timeouts
nvme: add bug report info for global duplicate id
nvme: add device name to warning in uuid_show()
Linus Torvalds [Fri, 17 Jun 2022 18:14:07 +0000 (11:14 -0700)]
Merge tag 'io_uring-5.19-2022-06-16' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Bigger than usual at this time, both because we missed -rc2, but also
because of some reverts that we chose to do. In detail:
- Adjust mapped buffer API while we still can (Dylan)
- Mapped buffer fixes (Dylan, Hao, Pavel, me)
- Fix for uring_cmd wrong API usage for task_work (Dylan)
- Fix for bug introduced in fixed file closing (Hao)
- Fix race in buffer/file resource handling (Pavel)
- Revert the NOP support for CQE32 and buffer selection that was
brought up during the merge window (Pavel)
- Remove IORING_CLOSE_FD_AND_FILE_SLOT introduced in this merge
window. The API needs further refining, so just yank it for now and
we'll revisit for a later kernel.
- Series cleaning up the CQE32 support added in this merge window,
making it more integrated rather than sitting on the side (Pavel)"
* tag 'io_uring-5.19-2022-06-16' of git://git.kernel.dk/linux-block: (21 commits)
io_uring: recycle provided buffer if we punt to io-wq
io_uring: do not use prio task_work_add in uring_cmd
io_uring: commit non-pollable provided mapped buffers upfront
io_uring: make io_fill_cqe_aux honour CQE32
io_uring: remove __io_fill_cqe() helper
io_uring: fix ->extra{1,2} misuse
io_uring: fill extra big cqe fields from req
io_uring: unite fill_cqe and the 32B version
io_uring: get rid of __io_fill_cqe{32}_req()
io_uring: remove IORING_CLOSE_FD_AND_FILE_SLOT
Revert "io_uring: add buffer selection support to IORING_OP_NOP"
Revert "io_uring: support CQE32 for nop operation"
io_uring: limit size of provided buffer ring
io_uring: fix types in provided buffer ring
io_uring: fix index calculation
io_uring: fix double unlock for pbuf select
io_uring: kbuf: fix bug of not consuming ring buffer in partial io case
io_uring: openclose: fix bug of closing wrong fixed file
io_uring: fix not locked access to fixed buf table
io_uring: fix races with buffer table unregister
...
Will Deacon [Fri, 10 Jun 2022 15:12:27 +0000 (16:12 +0100)]
arm64: mm: Don't invalidate FROM_DEVICE buffers at start of DMA transfer
Invalidating the buffer memory in arch_sync_dma_for_device() for
FROM_DEVICE transfers
When using the streaming DMA API to map a buffer prior to inbound
non-coherent DMA (i.e. DMA_FROM_DEVICE), we invalidate any dirty CPU
cachelines so that they will not be written back during the transfer and
corrupt the buffer contents written by the DMA. This, however, poses two
potential problems:
(1) If the DMA transfer does not write to every byte in the buffer,
then the unwritten bytes will contain stale data once the transfer
has completed.
(2) If the buffer has a virtual alias in userspace, then stale data
may be visible via this alias during the period between performing
the cache invalidation and the DMA writes landing in memory.
Address both of these issues by cleaning (aka writing-back) the dirty
lines in arch_sync_dma_for_device(DMA_FROM_DEVICE) instead of discarding
them using invalidation.
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220606152150.GA31568@willie-the-truck
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220610151228.4562-2-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Fri, 17 Jun 2022 17:09:24 +0000 (10:09 -0700)]
Merge tag 'fs_for_v5.19-rc3' of git://git./linux/kernel/git/jack/linux-fs
Pull writeback and ext2 fixes from Jan Kara:
"A fix for writeback bug which prevented machines with kdevtmpfs from
booting and also one small ext2 bugfix in IO error handling"
* tag 'fs_for_v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
init: Initialize noop_backing_dev_info early
ext2: fix fs corruption when trying to remove a non-empty directory with IO error
Linus Torvalds [Fri, 17 Jun 2022 17:03:53 +0000 (10:03 -0700)]
Merge tag 'for-5.19/dm-fixes-3' of git://git./linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix a race in DM core's dm_start_io_acct that could result in double
accounting for abnormal IO (e.g. discards, write zeroes, etc).
- Fix a use-after-free in DM core's dm_put_live_table_bio.
- Fix a race for REQ_NOWAIT bios being issued despite no support from
underlying DM targets (due to DM table reload at an "unlucky" time)
- Fix access beyond allocated bitmap in DM mirror's log.
* tag 'for-5.19/dm-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm mirror log: round up region bitmap size to BITS_PER_LONG
dm: fix narrow race for REQ_NOWAIT bios being issued despite no support
dm: fix use-after-free in dm_put_live_table_bio
dm: fix race in dm_start_io_acct
Linus Torvalds [Fri, 17 Jun 2022 17:02:26 +0000 (10:02 -0700)]
Merge tag 'hwmon-for-v5.19-rc3' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Add missing lock protection in occ driver
- Add missing comma in board name list in asus-ec-sensors driver
- Fix devicetree bindings for ti,tmp401
* tag 'hwmon-for-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (asus-ec-sensors) add missing comma in board name list.
hwmon: (occ) Lock mutex in shutdown to prevent race with occ_active
dt-bindings: hwmon: ti,tmp401: Drop 'items' from 'ti,n-factor' property
Linus Torvalds [Fri, 17 Jun 2022 17:00:25 +0000 (10:00 -0700)]
Merge tag 'linux-watchdog-5.19-rc3' of git://linux-watchdog.org/linux-watchdog
Pull watchdog fix from Wim Van Sebroeck:
"Add missing MODULE_LICENSE in gxp driver"
* tag 'linux-watchdog-5.19-rc3' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: gxp: Add missing MODULE_LICENSE
Linus Torvalds [Fri, 17 Jun 2022 15:27:27 +0000 (08:27 -0700)]
Merge tag 'v5.19-p2' of git://git./linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a potential build failure when CRYPTO=m"
* tag 'v5.19-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: memneq - move into lib/
Linus Torvalds [Fri, 17 Jun 2022 14:58:39 +0000 (07:58 -0700)]
Merge tag 'char-misc-5.19-rc3' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 5.19-rc3 that resolve
some reported issues.
They include:
- mei driver fixes
- comedi driver fix
- rtsx build warning fix
- fsl-mc-bus driver fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
eeprom: at25: Split reads into chunks and cap write size
misc: atmel-ssc: Fix IRQ check in ssc_probe
char: lp: remove redundant initialization of err
Linus Torvalds [Fri, 17 Jun 2022 14:55:24 +0000 (07:55 -0700)]
Merge tag 'staging-5.19-rc3' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for 5.19-rc3 that resolve
reported issues:
- remove visorbus.h which was forgotten in the -rc1 merge where the
code that used it was removed
- olpc_dcon: mark as broken to allow the DRM developers to evolve the
fbdev api properly without having to deal with this obsolete
driver. It will be removed soon if no one steps up to adopt it and
fix the issues with it.
- rtl8723bs driver fix
- r8188eu driver fix to resolve many reports of the driver being
broken with -rc1.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: Also remove the Unisys visorbus.h
staging: rtl8723bs: Allocate full pwep structure
staging: olpc_dcon: mark driver as broken
staging: r8188eu: Fix warning of array overflow in ioctl_linux.c
staging: r8188eu: fix rtw_alloc_hwxmits error detection for now
Linus Torvalds [Fri, 17 Jun 2022 14:52:43 +0000 (07:52 -0700)]
Merge tag 'tty-5.19-rc3' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 5.19-rc3 to
resolve some reported problems:
- 8250 lsr read bugfix
- n_gsm line discipline allocation fix
- qcom serial driver fix for reported lockups that happened in -rc1
- goldfish tty driver fix
All have been in linux-next for a while now with no reported issues"
* tag 'tty-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250: Store to lsr_save_flags after lsr read
tty: goldfish: Fix free_irq() on remove
tty: serial: qcom-geni-serial: Implement start_rx callback
serial: core: Introduce callback for start_rx and do stop_rx in suspend only if this callback implementation is present.
tty: n_gsm: Debug output allocation must use GFP_ATOMIC
Linus Torvalds [Fri, 17 Jun 2022 14:50:41 +0000 (07:50 -0700)]
Merge tag 'usb-5.19-rc3' of git://git./linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes and new device ids for 5.19-rc3
They include:
- new usb-serial driver device ids
- usb gadget driver fixes for reported problems
- cdnsp driver fix
- dwc3 driver fixes for reported problems
- dwc3 driver fix for merge problem that I caused in 5.18
- xhci driver fixes
- dwc2 memory leak fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: gadget: f_fs: change ep->ep safe in ffs_epfile_io()
usb: gadget: f_fs: change ep->status safe in ffs_epfile_io()
xhci: Fix null pointer dereference in resume if xhci has only one roothub
USB: fixup for merge issue with "usb: dwc3: Don't switch OTG -> peripheral if extcon is present"
usb: cdnsp: Fixed setting last_trb incorrectly
usb: gadget: u_ether: fix regression in setting fixed MAC address
usb: gadget: lpc32xx_udc: Fix refcount leak in lpc32xx_udc_probe
usb: dwc2: Fix memory leak in dwc2_hcd_init
usb: dwc3: pci: Restore line lost in merge conflict resolution
usb: dwc3: gadget: Fix IN endpoint max packet size allocation
USB: serial: option: add support for Cinterion MV31 with new baseline
USB: serial: io_ti: add Agilent E5805A support
Petr Mladek [Fri, 17 Jun 2022 14:36:48 +0000 (16:36 +0200)]
Merge branch 'rework/kthreads' into for-linus
Yanteng Si [Fri, 17 Jun 2022 12:47:55 +0000 (20:47 +0800)]
docs/zh_CN/LoongArch: Fix notes rendering by using reST directives
Notes are better expressed with reST admonitions.
Fixes: f23b22599f8e ("Documentation/zh_CN: Add basic LoongArch documentations")
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Yanteng Si [Fri, 17 Jun 2022 12:47:54 +0000 (20:47 +0800)]
docs/LoongArch: Fix notes rendering by using reST directives
Notes are better expressed with reST admonitions.
Fixes: 0ea8ce61cb2c ("Documentation: LoongArch: Add basic documentations")
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Youling Tang [Mon, 13 Jun 2022 10:54:12 +0000 (18:54 +0800)]
LoongArch: vmlinux.lds.S: Add missing ELF_DETAILS
Commit
c604abc3f6e ("vmlinux.lds.h: Split ELF_DETAILS from STABS_DEBUG")
splits ELF_DETAILS from STABS_DEBUG, resulting in missing ELF_DETAILS
information in LoongArch architecture, so add it.
Fixes: c604abc3f6e ("vmlinux.lds.h: Split ELF_DETAILS from STABS_DEBUG")
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Jens Axboe [Fri, 17 Jun 2022 12:24:26 +0000 (06:24 -0600)]
io_uring: recycle provided buffer if we punt to io-wq
io_arm_poll_handler() will recycle the buffer appropriately if we end
up arming poll (or if we're ready to retry), but not for the io-wq case
if we have attempted poll first.
Explicitly recycle the buffer to avoid both hanging on to it too long,
but also to avoid multiple reads grabbing the same one. This can happen
for ring mapped buffers, since it hasn't necessarily been committed.
Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers")
Link: https://github.com/axboe/liburing/issues/605
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 17 Jun 2022 04:39:51 +0000 (21:39 -0700)]
Merge tag 'drm-fixes-2022-06-17' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular drm fixes for rc3. Nothing too serious, i915, amdgpu and
exynos all have a few small driver fixes, and two ttm fixes, and one
compiler warning.
atomic:
- fix spurious compiler warning
ttm:
- add NULL ptr check in swapout code
- fix bulk move handling
i915:
- Fix page fault on error state read
- Fix memory leaks in per-gt sysfs
- Fix multiple fence handling
- Remove accidental static from a local variable
amdgpu:
- Fix regression in GTT size reporting
- OLED backlight fix
exynos:
- Check a null pointer instead of IS_ERR()
- Rework initialization code of Exynos MIC driver"
* tag 'drm-fixes-2022-06-17' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Cap OLED brightness per max frame-average luminance
drm/amdgpu: Fix GTT size reporting in amdgpu_ioctl
drm/exynos: mic: Rework initialization
drm/exynos: fix IS_ERR() vs NULL check in probe
drm/ttm: fix bulk move handling v2
drm/i915/uc: remove accidental static from a local variable
drm/i915: Individualize fences before adding to dma_resv obj
drm/i915/gt: Fix memory leaks in per-gt sysfs
drm/i915/reset: Fix error_state_read ptr + offset use
drm/ttm: fix missing NULL check in ttm_device_swapout
drm/atomic: fix warning of unused variable
Dave Airlie [Fri, 17 Jun 2022 01:32:22 +0000 (11:32 +1000)]
Merge tag 'exynos-drm-fixes-v5.19-rc3' of git://git./linux/kernel/git/daeinki/drm-exynos into drm-fixes
two regression fixups
- Check a null pointer instead of IS_ERR().
- Rework initialization code of Exynos MIC driver.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Inki Dae <inki.dae@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220614141336.88614-1-inki.dae@samsung.com
Dave Airlie [Fri, 17 Jun 2022 01:17:37 +0000 (11:17 +1000)]
Merge tag 'amd-drm-fixes-5.19-2022-06-15' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.19-2022-06-15:
amdgpu:
- Fix regression in GTT size reporting
- OLED backlight fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220615205609.28763-1-alexander.deucher@amd.com
Dave Airlie [Fri, 17 Jun 2022 00:24:42 +0000 (10:24 +1000)]
Merge tag 'drm-intel-fixes-2022-06-16' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.19-rc3:
- Fix page fault on error state read
- Fix memory leaks in per-gt sysfs
- Fix multiple fence handling
- Remove accidental static from a local variable
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/8735g5xd25.fsf@intel.com
Mikulas Patocka [Thu, 16 Jun 2022 17:28:57 +0000 (13:28 -0400)]
dm mirror log: round up region bitmap size to BITS_PER_LONG
The code in dm-log rounds up bitset_size to 32 bits. It then uses
find_next_zero_bit_le on the allocated region. find_next_zero_bit_le
accesses the bitmap using unsigned long pointers. So, on 64-bit
architectures, it may access 4 bytes beyond the allocated size.
Fix this bug by rounding up bitset_size to BITS_PER_LONG.
This bug was found by running the lvm2 testsuite with kasan.
Fixes: 29121bd0b00e ("[PATCH] dm mirror log: bitset_size fix")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mikulas Patocka [Thu, 16 Jun 2022 18:14:39 +0000 (14:14 -0400)]
dm: fix narrow race for REQ_NOWAIT bios being issued despite no support
Starting with the commit
63a225c9fd20, device mapper has an optimization
that it will take cheaper table lock (dm_get_live_table_fast instead of
dm_get_live_table) if the bio has REQ_NOWAIT. The bios with REQ_NOWAIT
must not block in the target request routine, if they did, we would be
blocking while holding rcu_read_lock, which is prohibited.
The targets that are suitable for REQ_NOWAIT optimization (and that don't
block in the map routine) have the flag DM_TARGET_NOWAIT set. Device
mapper will test if all the targets and all the devices in a table
support nowait (see the function dm_table_supports_nowait) and it will set
or clear the QUEUE_FLAG_NOWAIT flag on its request queue according to
this check.
There's a test in submit_bio_noacct: "if ((bio->bi_opf & REQ_NOWAIT) &&
!blk_queue_nowait(q)) goto not_supported" - this will make sure that
REQ_NOWAIT bios can't enter a request queue that doesn't support them.
This mechanism works to prevent REQ_NOWAIT bios from reaching dm targets
that don't support the REQ_NOWAIT flag (and that may block in the map
routine) - except that there is a small race condition:
submit_bio_noacct checks if the queue has the QUEUE_FLAG_NOWAIT without
holding any locks. Immediatelly after this check, the device mapper table
may be reloaded with a table that doesn't support REQ_NOWAIT (for example,
if we start moving the logical volume or if we activate a snapshot).
However the REQ_NOWAIT bio that already passed the check in
submit_bio_noacct would be sent to device mapper, where it could be
redirected to a dm target that doesn't support REQ_NOWAIT - the result is
sleeping while we hold rcu_read_lock.
In order to fix this race, we double-check if the target supports
REQ_NOWAIT while we hold the table lock (so that the table can't change
under us).
Fixes: 563a225c9fd2 ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mikulas Patocka [Thu, 16 Jun 2022 17:21:27 +0000 (13:21 -0400)]
dm: fix use-after-free in dm_put_live_table_bio
dm_put_live_table_bio is called from the end of dm_submit_bio.
However, at this point, the bio may be already finished and the caller
may have freed the bio. Consequently, dm_put_live_table_bio accesses
the stale "bio" pointer.
Fix this bug by loading the bi_opf value and passing it to
dm_get_live_table_bio and dm_put_live_table_bio instead of the bio.
This bug was found by running the lvm2 testsuite with kasan.
Fixes: 563a225c9fd2 ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Dave Airlie [Thu, 16 Jun 2022 23:31:22 +0000 (09:31 +1000)]
Merge tag 'drm-misc-fixes-2022-06-16' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Two fixes for TTM, one for a NULL pointer dereference and one to make sure
the buffer is pinned prior to a bulk move, and a fix for a spurious
compiler warning.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220616072519.qwrsefsemejefowu@houat
Steve French [Thu, 16 Jun 2022 03:40:23 +0000 (22:40 -0500)]
smb3: add trace point for SMB2_set_eof
In order to debug problems with file size being reported incorrectly
temporarily (in this case xfstest generic/584 intermittent failure)
we need to add trace point for the non-compounded code path where
we set the file size (SMB2_set_eof). The new trace point is:
"smb3_set_eof"
Here is sample output from the tracepoint:
TASK-PID CPU# ||||| TIMESTAMP FUNCTION
| | | ||||| | |
xfs_io-75403 [002] ..... 95219.189835: smb3_set_eof: xid=221 sid=0xeef1cbd2 tid=0x27079ee6 fid=0x52edb58c offset=0x100000
aio-dio-append--75418 [010] ..... 95219.242402: smb3_set_eof: xid=226 sid=0xeef1cbd2 tid=0x27079ee6 fid=0xae89852d offset=0x0
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Bart Van Assche [Mon, 13 Jun 2022 16:32:34 +0000 (09:32 -0700)]
block/bfq: Enable I/O statistics
BFQ uses io_start_time_ns. That member variable is only set if I/O
statistics are enabled. Hence this patch that enables I/O statistics
at the time BFQ is associated with a request queue.
Compile-tested only.
Reported-by: Cixi Geng <cixi.geng1@unisoc.com>
Cc: Cixi Geng <cixi.geng1@unisoc.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 16 Jun 2022 22:53:38 +0000 (15:53 -0700)]
Merge tag 'audit-pr-
20220616' of git://git./linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
"A single audit patch to fix a problem where we were not properly
freeing memory allocated when recording information related to a
module load"
* tag 'audit-pr-
20220616' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: free module name
Linus Torvalds [Thu, 16 Jun 2022 22:50:36 +0000 (15:50 -0700)]
Merge tag 'selinux-pr-
20220616' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"A single SELinux patch to fix memory leaks when mounting filesystems
with SELinux mount options"
* tag 'selinux-pr-
20220616' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: free contexts previously transferred in selinux_add_opt()
Palmer Dabbelt [Thu, 16 Jun 2022 22:48:39 +0000 (15:48 -0700)]
RISC-V: Some Svpbmt fixes
Some additionals comments and notes from autobuilders received after the
series got applied, warranted some changes.
* commit '
924cbb8cbe3460ea192e6243017ceb0ceb255b1b':
riscv: Improve description for RISCV_ISA_SVPBMT Kconfig symbol
riscv: drop cpufeature_apply_feature tracking variable
riscv: fix dependency for t-head errata
Heiko Stuebner [Thu, 26 May 2022 20:56:43 +0000 (22:56 +0200)]
riscv: Improve description for RISCV_ISA_SVPBMT Kconfig symbol
This improves the symbol's description to make it easier for
people to understand what it is about.
Suggested-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20220526205646.258337-3-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Heiko Stuebner [Thu, 26 May 2022 20:56:42 +0000 (22:56 +0200)]
riscv: drop cpufeature_apply_feature tracking variable
The variable was tracking which feature patches got applied
but that information was never actually used - and thus resulted
in a warning as well.
Drop the variable.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20220526205646.258337-2-heiko@sntech.de
Fixes: ff689fd21cb1 ("riscv: add RISC-V Svpbmt extension support")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Heiko Stuebner [Thu, 26 May 2022 20:56:45 +0000 (22:56 +0200)]
riscv: fix dependency for t-head errata
alternatives only work correctly on non-xip-kernels and while the
selected alternative-symbol has the correct dependency the symbol
selecting it also needs that dependency.
So add the missing dependency to the T-Head errata Kconfig symbol.
Reported-by: kernel test robot <yujie.liu@intel.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20220526205646.258337-5-heiko@sntech.de
Fixes: a35707c3d850 ("riscv: add memory-type errata for T-Head")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Palmer Dabbelt [Thu, 16 Jun 2022 22:13:10 +0000 (15:13 -0700)]
Merge tag 'dt-fixes-for-palmer-5.19-rc3' of ssh://gitolite./linux/kernel/git/conor/linux into fixes
Microchip RISC-V devicetree fixes for 5.19-rc3
A single fix for mpfs.dtsi:
- The sifive pdma entry fell through the cracks between versions of my
dt patches & I gave Zong the wrong conflict resolution, so it is
added back.
* tag 'dt-fixes-for-palmer-5.19-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/conor/linux:
riscv: dts: microchip: re-add pdma to mpfs device tree
Ming Lei [Thu, 16 Jun 2022 01:44:01 +0000 (09:44 +0800)]
blk-mq: don't clear flush_rq from tags->rqs[]
commit
364b61818f65 ("blk-mq: clearing flush request reference in
tags->rqs[]") is added to clear the to-be-free flush request from
tags->rqs[] for avoiding use-after-free on the flush rq.
Yu Kuai reported that blk_mq_clear_flush_rq_mapping() slows down boot time
by ~8s because running scsi probe which may create and remove lots of
unpresent LUNs on megaraid-sas which uses BLK_MQ_F_TAG_HCTX_SHARED and
each request queue has lots of hw queues.
Improve the situation by not running blk_mq_clear_flush_rq_mapping if
disk isn't added when there can't be any flush request issued.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220616014401.817001-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Thu, 16 Jun 2022 01:44:00 +0000 (09:44 +0800)]
blk-mq: avoid to touch q->elevator without any protection
q->elevator is referred in blk_mq_has_sqsched() without any protection,
no .q_usage_counter is held, no queue srcu and rcu read lock is held,
so potential use-after-free may be triggered.
Fix the issue by adding one queue flag for checking if the elevator
uses single queue style dispatch. Meantime the elevator feature flag
of ELEVATOR_F_MQ_AWARE isn't needed any more.
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220616014401.817001-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Thu, 16 Jun 2022 01:43:59 +0000 (09:43 +0800)]
blk-mq: protect q->elevator by ->sysfs_lock in blk_mq_elv_switch_none
elevator can be tore down by sysfs switch interface or disk release, so
hold ->sysfs_lock before referring to q->elevator, then potential
use-after-free can be avoided.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220616014401.817001-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Bart Van Assche [Wed, 15 Jun 2022 21:00:04 +0000 (14:00 -0700)]
block: Fix handling of offline queues in blk_mq_alloc_request_hctx()
This patch prevents that test nvme/004 triggers the following:
UBSAN: array-index-out-of-bounds in block/blk-mq.h:135:9
index 512 is out of range for type 'long unsigned int [512]'
Call Trace:
show_stack+0x52/0x58
dump_stack_lvl+0x49/0x5e
dump_stack+0x10/0x12
ubsan_epilogue+0x9/0x3b
__ubsan_handle_out_of_bounds.cold+0x44/0x49
blk_mq_alloc_request_hctx+0x304/0x310
__nvme_submit_sync_cmd+0x70/0x200 [nvme_core]
nvmf_connect_io_queue+0x23e/0x2a0 [nvme_fabrics]
nvme_loop_connect_io_queues+0x8d/0xb0 [nvme_loop]
nvme_loop_create_ctrl+0x58e/0x7d0 [nvme_loop]
nvmf_create_ctrl+0x1d7/0x4d0 [nvme_fabrics]
nvmf_dev_write+0xae/0x111 [nvme_fabrics]
vfs_write+0x144/0x560
ksys_write+0xb7/0x140
__x64_sys_write+0x42/0x50
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Fixes: 20e4d8139319 ("blk-mq: simplify queue mapping & schedule with each possisble CPU")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220615210004.1031820-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 16 Jun 2022 18:51:32 +0000 (11:51 -0700)]
Merge tag 'net-5.19-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Mostly driver fixes.
Current release - regressions:
- Revert "net: Add a second bind table hashed by port and address",
needs more work
- amd-xgbe: use platform_irq_count(), static setup of IRQ resources
had been removed from DT core
- dts: at91: ksz9477_evb: add phy-mode to fix port/phy validation
Current release - new code bugs:
- hns3: modify the ring param print info
Previous releases - always broken:
- axienet: make the 64b addressable DMA depends on 64b architectures
- iavf: fix issue with MAC address of VF shown as zero
- ice: fix PTP TX timestamp offset calculation
- usb: ax88179_178a needs FLAG_SEND_ZLP
Misc:
- document some net.sctp.* sysctls"
* tag 'net-5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (31 commits)
net: axienet: add missing error return code in axienet_probe()
Revert "net: Add a second bind table hashed by port and address"
net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg
net: usb: ax88179_178a needs FLAG_SEND_ZLP
MAINTAINERS: add include/dt-bindings/net to NETWORKING DRIVERS
ARM: dts: at91: ksz9477_evb: fix port/phy validation
net: bgmac: Fix an erroneous kfree() in bgmac_remove()
ice: Fix memory corruption in VF driver
ice: Fix queue config fail handling
ice: Sync VLAN filtering features for DVM
ice: Fix PTP TX timestamp offset calculation
mlxsw: spectrum_cnt: Reorder counter pools
docs: networking: phy: Fix a typo
amd-xgbe: Use platform_irq_count()
octeontx2-vf: Add support for adaptive interrupt coalescing
xilinx: Fix build on x86.
net: axienet: Use iowrite64 to write all 64b descriptor pointers
net: axienet: make the 64b addresable DMA depends on 64b archectures
net: hns3: fix tm port shapping of fibre port is incorrect after driver initialization
net: hns3: fix PF rss size initialization bug
...
Yang Yingliang [Thu, 16 Jun 2022 06:29:17 +0000 (14:29 +0800)]
net: axienet: add missing error return code in axienet_probe()
It should return error code in error path in axienet_probe().
Fixes: 00be43a74ca2 ("net: axienet: make the 64b addresable DMA depends on 64b archectures")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20220616062917.3601-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>