linux.git
2 years agoMerge branches 'doc.2022.08.31b', 'fixes.2022.08.31b', 'kvfree.2022.08.31b', 'nocb...
Paul E. McKenney [Thu, 1 Sep 2022 17:55:57 +0000 (10:55 -0700)]
Merge branches 'doc.2022.08.31b', 'fixes.2022.08.31b', 'kvfree.2022.08.31b', 'nocb.2022.09.01a', 'poll.2022.08.31b', 'poll-srcu.2022.08.31b' and 'tasks.2022.08.31b' into HEAD

doc.2022.08.31b: Documentation updates
fixes.2022.08.31b: Miscellaneous fixes
kvfree.2022.08.31b: kvfree_rcu() updates
nocb.2022.09.01a: NOCB CPU updates
poll.2022.08.31b: Full-oldstate RCU polling grace-period API
poll-srcu.2022.08.31b: Polled SRCU grace-period updates
tasks.2022.08.31b: Tasks RCU updates

2 years agorcutorture: Use the barrier operation specified by cur_ops
Zqiang [Sun, 31 Jul 2022 10:53:56 +0000 (18:53 +0800)]
rcutorture: Use the barrier operation specified by cur_ops

The rcutorture_oom_notify() function unconditionally invokes
rcu_barrier(), which is OK when the rcutorture.torture_type value is
"rcu", but unhelpful otherwise.  The purpose of these barrier calls is to
wait for all outstanding callback-flooding callbacks to be invoked before
cleaning up their data.  Using the wrong barrier function therefore
risks arbitrary memory corruption.  Thus, this commit changes these
rcu_barrier() calls into cur_ops->cb_barrier() to make things work when
torturing non-vanilla flavors of RCU.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu-tasks: Make RCU Tasks Trace check for userspace execution
Zqiang [Tue, 19 Jul 2022 04:39:00 +0000 (12:39 +0800)]
rcu-tasks: Make RCU Tasks Trace check for userspace execution

Userspace execution is a valid quiescent state for RCU Tasks Trace,
but the scheduling-clock interrupt does not currently report such
quiescent states.

Of course, the scheduling-clock interrupt is not strictly speaking
userspace execution.  However, the only way that this code is not
in a quiescent state is if something invoked rcu_read_lock_trace(),
and that would be reflected in the ->trc_reader_nesting field in
the task_struct structure.  Furthermore, this field is checked by
rcu_tasks_trace_qs(), which is invoked by rcu_tasks_qs() which is in
turn invoked by rcu_note_voluntary_context_switch() in kernels building
at least one of the RCU Tasks flavors.  It is therefore safe to invoke
rcu_tasks_trace_qs() from the rcu_sched_clock_irq().

But rcu_tasks_qs() also invokes rcu_tasks_classic_qs() for RCU
Tasks, which lacks the read-side markers provided by RCU Tasks Trace.
This raises the possibility that an RCU Tasks grace period could start
after the interrupt from userspace execution, but before the call to
rcu_sched_clock_irq().  However, it turns out that this is safe because
the RCU Tasks grace period waits for an RCU grace period, which will
wait for the entire scheduling-clock interrupt handler, including any
RCU Tasks read-side critical section that this handler might contain.

This commit therefore updates the rcu_sched_clock_irq() function's
check for usermode execution and its call to rcu_tasks_classic_qs()
to instead check for both usermode execution and interrupt from idle,
and to instead call rcu_note_voluntary_context_switch().  This
consolidates code and provides more faster RCU Tasks Trace
reporting of quiescent states in kernels that do scheduling-clock
interrupts for userspace execution.

[ paulmck: Consolidate checks into rcu_sched_clock_irq(). ]

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu-tasks: Ensure RCU Tasks Trace loops have quiescent states
Paul E. McKenney [Mon, 18 Jul 2022 17:57:26 +0000 (10:57 -0700)]
rcu-tasks: Ensure RCU Tasks Trace loops have quiescent states

The RCU Tasks Trace grace-period kthread loops across all CPUs, and
there can be quite a few CPUs, with some commercially available systems
sporting well over a thousand of them.  Some of these loops can feature
IPIs, which can take some time.  This commit therefore places a call to
cond_resched_tasks_rcu_qs() in each such loop.

Link: https://docs.google.com/document/d/1V0YnG1HTWMt9WHJjroiJL9lf-hMrud4v8Fn3fhyY0cI/edit?usp=sharing
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu-tasks: Convert RCU_LOCKDEP_WARN() to WARN_ONCE()
Zqiang [Tue, 12 Jul 2022 08:26:05 +0000 (16:26 +0800)]
rcu-tasks: Convert RCU_LOCKDEP_WARN() to WARN_ONCE()

Kernels built with CONFIG_PROVE_RCU=y and CONFIG_DEBUG_LOCK_ALLOC=y
attempt to emit a warning when the synchronize_rcu_tasks_generic()
function is called during early boot while the rcu_scheduler_active
variable is RCU_SCHEDULER_INACTIVE.  However the warnings is not
actually be printed because the debug_lockdep_rcu_enabled() returns
false, exactly because the rcu_scheduler_active variable is still equal
to RCU_SCHEDULER_INACTIVE.

This commit therefore replaces RCU_LOCKDEP_WARN() with WARN_ONCE()
to force these warnings to actually be printed.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agosrcu: Make Tiny SRCU use full-sized grace-period counters
Paul E. McKenney [Tue, 2 Aug 2022 22:32:47 +0000 (15:32 -0700)]
srcu: Make Tiny SRCU use full-sized grace-period counters

This commit makes Tiny SRCU use full-sized grace-period counters to
further avoid counter-wrap issues when using polled grace-period APIs.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agosrcu: Make Tiny SRCU poll_state_synchronize_srcu() more precise
Paul E. McKenney [Tue, 2 Aug 2022 18:59:49 +0000 (11:59 -0700)]
srcu: Make Tiny SRCU poll_state_synchronize_srcu() more precise

This commit applies the more-precise grace-period-state check used by
rcu_seq_done_exact() to poll_state_synchronize_srcu().  This is important
because Tiny SRCU uses a 16-bit counter, which can wrap quite quickly.
If counter wrap continues to be a problem, then expanding ->srcu_idx
and ->srcu_idx_max to 32 bits might be warranted.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agosrcu: Add GP and maximum requested GP to Tiny SRCU rcutorture output
Paul E. McKenney [Tue, 2 Aug 2022 17:18:23 +0000 (10:18 -0700)]
srcu: Add GP and maximum requested GP to Tiny SRCU rcutorture output

This commit adds the ->srcu_idx and ->srcu_max_idx fields to the Tiny
SRCU rcutorture output for additional diagnostics.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcutorture: Make "srcud" option also test polled grace-period API
Paul E. McKenney [Wed, 27 Jul 2022 23:16:41 +0000 (16:16 -0700)]
rcutorture: Make "srcud" option also test polled grace-period API

This commit brings the "srcud" (dynamically allocated) SRCU test in line
with the "srcu" (statically allocated) test, so that both test the full
SRCU polled grace-period API.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcutorture: Limit read-side polling-API testing
Paul E. McKenney [Thu, 25 Aug 2022 01:57:45 +0000 (18:57 -0700)]
rcutorture: Limit read-side polling-API testing

RCU's polled grace-period API is reasonably lightweight, but still
contains heavyweight memory barriers.  This commit therefore limits
testing of this API from rcutorture's readers in order to avoid the
false negatives that these heavyweight operations could provoke.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Add functions to compare grace-period state values
Paul E. McKenney [Wed, 24 Aug 2022 22:39:09 +0000 (15:39 -0700)]
rcu: Add functions to compare grace-period state values

This commit adds same_state_synchronize_rcu() and
same_state_synchronize_rcu_full() functions to compare grace-period state
values, for example, those obtained from get_state_synchronize_rcu()
and get_state_synchronize_rcu_full().  These functions allow small
structures to omit these state values by placing them in list headers for
lists containing structures with the same token value.  Presumably the
per-structure list pointers are the same ones used to link the structures
into whatever reader-accessible data structure was used.

This commit also adds both NUM_ACTIVE_RCU_POLL_OLDSTATE and
NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE, which define the maximum number of
distinct unsigned long values and rcu_gp_oldstate values, respectively,
corresponding to not-yet-completed grace periods.  These values can be
used to size arrays of the list headers described above.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcutorture: Expand rcu_torture_write_types() first "if" statement
Paul E. McKenney [Tue, 23 Aug 2022 19:11:12 +0000 (12:11 -0700)]
rcutorture: Expand rcu_torture_write_types() first "if" statement

This commit expands the rcu_torture_write_types() function's first "if"
condition and body, placing one element per line, in order to make the
compiler's error messages more helpful.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcutorture: Use 1-suffixed variable in rcu_torture_write_types() check
Paul E. McKenney [Mon, 22 Aug 2022 17:02:56 +0000 (10:02 -0700)]
rcutorture: Use 1-suffixed variable in rcu_torture_write_types() check

This commit changes the use of gp_poll_exp to gp_poll_exp1 in the first
check in rcu_torture_write_types().  No functional effect, but consistency
is a good thing.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Make synchronize_rcu() fastpath update only boot-CPU counters
Paul E. McKenney [Fri, 5 Aug 2022 22:42:25 +0000 (15:42 -0700)]
rcu: Make synchronize_rcu() fastpath update only boot-CPU counters

Large systems can have hundreds of rcu_node structures, and updating
counters in each of them might slow down booting.  This commit therefore
updates only the counters in those rcu_node structures corresponding
to the boot CPU, up to and including the root rcu_node structure.

The counters for the remaining rcu_node structures are updated by the
rcu_scheduler_starting() function, which executes just before the first
non-boot kthread is spawned.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcutorture: Adjust rcu_poll_need_2gp() for rcu_gp_oldstate field removal
Paul E. McKenney [Fri, 5 Aug 2022 01:05:46 +0000 (18:05 -0700)]
rcutorture: Adjust rcu_poll_need_2gp() for rcu_gp_oldstate field removal

Now that rcu_gp_oldstate can accurately track both normal and
expedited grace periods regardless of system state, rcutorture's
rcu_poll_need_2gp() function need only call for a second grace period
for the old single-unsigned-long grace-period polling APIs
This commit therefore adjusts rcu_poll_need_2gp() accordingly.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Remove ->rgos_polled field from rcu_gp_oldstate structure
Paul E. McKenney [Fri, 5 Aug 2022 00:54:53 +0000 (17:54 -0700)]
rcu: Remove ->rgos_polled field from rcu_gp_oldstate structure

Because both normal and expedited grace periods increment their respective
counters on their pre-scheduler early boot fastpaths, the rcu_gp_oldstate
structure no longer needs its ->rgos_polled field.  This commit therefore
removes this field, shrinking this structure so that it is the same size
as an rcu_head structure.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Make synchronize_rcu_expedited() fast path update .expedited_sequence
Paul E. McKenney [Fri, 5 Aug 2022 00:43:53 +0000 (17:43 -0700)]
rcu: Make synchronize_rcu_expedited() fast path update .expedited_sequence

This commit causes the early boot single-CPU synchronize_rcu_expedited()
fastpath to update the rcu_state structure's ->expedited_sequence
counter.  This will allow the full-state polled grace-period APIs to
detect all expedited grace periods without the need to track the special
combined polling-only counter, which is another step towards removing
the ->rgos_polled field from the rcu_gp_oldstate, thereby reducing its
size by one third.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Remove expedited grace-period fast-path forward-progress helper
Paul E. McKenney [Fri, 5 Aug 2022 00:34:30 +0000 (17:34 -0700)]
rcu: Remove expedited grace-period fast-path forward-progress helper

Now that the expedited grace-period fast path can only happen during
the pre-scheduler portion of early boot, this fast path can no longer
block run-time RCU Trace grace periods.  This commit therefore removes
the conditional cond_resched() invocation.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Make synchronize_rcu() fast path update ->gp_seq counters
Paul E. McKenney [Fri, 5 Aug 2022 00:28:01 +0000 (17:28 -0700)]
rcu: Make synchronize_rcu() fast path update ->gp_seq counters

This commit causes the early boot single-CPU synchronize_rcu() fastpath to
update the rcu_state and rcu_node structures' ->gp_seq and ->gp_seq_needed
counters.  This will allow the full-state polled grace-period APIs to
detect all normal grace periods without the need to track the special
combined polling-only counter, which is a step towards removing the
->rgos_polled field from the rcu_gp_oldstate, thereby reducing its size
by one third.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu-tasks: Remove grace-period fast-path rcu-tasks helper
Paul E. McKenney [Fri, 5 Aug 2022 00:16:24 +0000 (17:16 -0700)]
rcu-tasks: Remove grace-period fast-path rcu-tasks helper

Now that the grace-period fast path can only happen during the
pre-scheduler portion of early boot, this fast path can no longer block
run-time RCU Tasks and RCU Tasks Trace grace periods.  This commit
therefore removes the conditional cond_resched_tasks_rcu_qs() invocation.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Set rcu_data structures' initial ->gpwrap value to true
Paul E. McKenney [Fri, 5 Aug 2022 00:01:55 +0000 (17:01 -0700)]
rcu: Set rcu_data structures' initial ->gpwrap value to true

It would be good do reduce the size of the rcu_gp_oldstate structure
from three unsigned long instances to two, but this requires that the
boot-time optimized grace periods update the various ->gp_seq fields.
Updating these fields in the rcu_state structure and in all of the
rcu_node structures is at least semi-reasonable, but updating them in
all of the rcu_data structures is a bridge too far.  This means that if
there are too many early boot-time grace periods, the ->gp_seq field in
the rcu_data structure cannot be trusted.  This commit therefore sets
each rcu_data structure's ->gpwrap field to provide the necessary impetus
for a suitable level of distrust.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Disable run-time single-CPU grace-period optimization
Paul E. McKenney [Thu, 4 Aug 2022 23:07:04 +0000 (16:07 -0700)]
rcu: Disable run-time single-CPU grace-period optimization

The run-time single-CPU grace-period optimization applies only to
kernels built with CONFIG_SMP=y && CONFIG_PREEMPTION=y that are running
on a single-CPU system.  But a kernel intended for a single-CPU system
should instead be built with CONFIG_SMP=n, and in any case, single-CPU
systems running Linux no longer appear to be the common case.  Plus this
optimization results in the rcu_gp_oldstate structure being half again
larger than it needs to be.

This commit therefore disables the run-time single-CPU grace-period
optimization, so that this optimization applies only during the
pre-scheduler portion of the boot sequence.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Add full-sized polling for cond_sync_exp_full()
Paul E. McKenney [Thu, 4 Aug 2022 22:23:26 +0000 (15:23 -0700)]
rcu: Add full-sized polling for cond_sync_exp_full()

The cond_synchronize_rcu_expedited() API compresses the combined expedited and
normal grace-period states into a single unsigned long, which conserves
storage, but can miss grace periods in certain cases involving overlapping
normal and expedited grace periods.  Missing the occasional grace period
is usually not a problem, but there are use cases that care about each
and every grace period.

This commit therefore adds yet another member of the full-state RCU
grace-period polling API, which is the cond_synchronize_rcu_exp_full()
function.  This uses up to three times the storage (rcu_gp_oldstate
structure instead of unsigned long), but is guaranteed not to miss
grace periods.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Add full-sized polling for cond_sync_full()
Paul E. McKenney [Thu, 4 Aug 2022 20:46:05 +0000 (13:46 -0700)]
rcu: Add full-sized polling for cond_sync_full()

The cond_synchronize_rcu() API compresses the combined expedited and
normal grace-period states into a single unsigned long, which conserves
storage, but can miss grace periods in certain cases involving overlapping
normal and expedited grace periods.  Missing the occasional grace period
is usually not a problem, but there are use cases that care about each
and every grace period.

This commit therefore adds yet another member of the full-state RCU
grace-period polling API, which is the cond_synchronize_rcu_full()
function.  This uses up to three times the storage (rcu_gp_oldstate
structure instead of unsigned long), but is guaranteed not to miss
grace periods.

[ paulmck: Apply feedback from kernel test robot and Julia Lawall. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Remove blank line from poll_state_synchronize_rcu() docbook header
Paul E. McKenney [Wed, 3 Aug 2022 23:57:47 +0000 (16:57 -0700)]
rcu: Remove blank line from poll_state_synchronize_rcu() docbook header

This commit removes the blank line preceding the oldstate parameter to
the docbook header for the poll_state_synchronize_rcu() function and
marks uses of this parameter later in that header.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Add full-sized polling for start_poll_expedited()
Paul E. McKenney [Wed, 3 Aug 2022 19:38:51 +0000 (12:38 -0700)]
rcu: Add full-sized polling for start_poll_expedited()

The start_poll_synchronize_rcu_expedited() API compresses the combined
expedited and normal grace-period states into a single unsigned long,
which conserves storage, but can miss grace periods in certain cases
involving overlapping normal and expedited grace periods.  Missing the
occasional grace period is usually not a problem, but there are use
cases that care about each and every grace period.

This commit therefore adds yet another member of the
full-state RCU grace-period polling API, which is the
start_poll_synchronize_rcu_expedited_full() function.  This uses up to
three times the storage (rcu_gp_oldstate structure instead of unsigned
long), but is guaranteed not to miss grace periods.

[ paulmck: Apply feedback from kernel test robot and Julia Lawall. ]

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Add full-sized polling for start_poll()
Paul E. McKenney [Wed, 3 Aug 2022 00:04:54 +0000 (17:04 -0700)]
rcu: Add full-sized polling for start_poll()

The start_poll_synchronize_rcu() API compresses the combined expedited and
normal grace-period states into a single unsigned long, which conserves
storage, but can miss grace periods in certain cases involving overlapping
normal and expedited grace periods.  Missing the occasional grace period
is usually not a problem, but there are use cases that care about each
and every grace period.

This commit therefore adds the next member of the full-state RCU
grace-period polling API, namely the start_poll_synchronize_rcu_full()
function.  This uses up to three times the storage (rcu_gp_oldstate
structure instead of unsigned long), but is guaranteed not to miss
grace periods.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcutorture: Verify long-running reader prevents full polling from completing
Paul E. McKenney [Wed, 3 Aug 2022 23:40:48 +0000 (16:40 -0700)]
rcutorture: Verify long-running reader prevents full polling from completing

This commit adds full-state polling checks to accompany the old-style
polling checks in the rcu_torture_one_read() function.  If a polling
cycle within an RCU reader completes, a WARN_ONCE() is triggered.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcutorture: Remove redundant RTWS_DEF_FREE check
Paul E. McKenney [Tue, 2 Aug 2022 17:22:12 +0000 (10:22 -0700)]
rcutorture: Remove redundant RTWS_DEF_FREE check

This check does nothing because the state at this point in the code
because the rcu_torture_writer_state value is guaranteed to instead
be RTWS_REPLACE.  This commit therefore removes this check.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcutorture: Verify RCU reader prevents full polling from completing
Paul E. McKenney [Tue, 2 Aug 2022 00:33:24 +0000 (17:33 -0700)]
rcutorture: Verify RCU reader prevents full polling from completing

This commit adds a test to rcu_torture_writer() that verifies that a
->get_gp_state_full() and ->poll_gp_state_full() polled grace-period
sequence does not claim that a grace period elapsed within the confines
of the corresponding read-side critical section.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcutorture: Allow per-RCU-flavor polled double-GP check
Paul E. McKenney [Mon, 1 Aug 2022 23:57:17 +0000 (16:57 -0700)]
rcutorture: Allow per-RCU-flavor polled double-GP check

Only vanilla RCU needs a double grace period for its compressed
polled grace-period old-state cookie.  This commit therefore adds an
rcu_torture_ops per-flavor function ->poll_need_2gp to allow this check
to be adapted to the RCU flavor under test.  A NULL pointer for this
function says that doubled grace periods are never needed.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcutorture: Abstract synchronous and polled API testing
Paul E. McKenney [Sat, 30 Jul 2022 05:05:17 +0000 (22:05 -0700)]
rcutorture: Abstract synchronous and polled API testing

This commit abstracts a do_rtws_sync() function that does synchronous
grace-period testing, but also testing the polled API 25% of the time
each for the normal and full-state variants of the polled API.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Add full-sized polling for get_state()
Paul E. McKenney [Fri, 29 Jul 2022 02:58:13 +0000 (19:58 -0700)]
rcu: Add full-sized polling for get_state()

The get_state_synchronize_rcu() API compresses the combined expedited and
normal grace-period states into a single unsigned long, which conserves
storage, but can miss grace periods in certain cases involving overlapping
normal and expedited grace periods.  Missing the occasional grace period
is usually not a problem, but there are use cases that care about each
and every grace period.

This commit therefore adds the next member of the full-state RCU
grace-period polling API, namely the get_state_synchronize_rcu_full()
function.  This uses up to three times the storage (rcu_gp_oldstate
structure instead of unsigned long), but is guaranteed not to miss
grace periods.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Add full-sized polling for get_completed*() and poll_state*()
Paul E. McKenney [Thu, 28 Jul 2022 22:37:05 +0000 (15:37 -0700)]
rcu: Add full-sized polling for get_completed*() and poll_state*()

The get_completed_synchronize_rcu() and poll_state_synchronize_rcu()
APIs compress the combined expedited and normal grace-period states into a
single unsigned long, which conserves storage, but can miss grace periods
in certain cases involving overlapping normal and expedited grace periods.
Missing the occasional grace period is usually not a problem, but there
are use cases that care about each and every grace period.

This commit therefore adds the first members of the full-state RCU
grace-period polling API, namely the get_completed_synchronize_rcu_full()
and poll_state_synchronize_rcu_full() functions.  These use up to three
times the storage (rcu_gp_oldstate structure instead of unsigned long),
but which are guaranteed not to miss grace periods, at least in situations
where the single-CPU grace-period optimization does not apply.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu/nocb: Add CPU number to CPU-{,de}offload failure messages
Paul E. McKenney [Wed, 3 Aug 2022 00:39:28 +0000 (17:39 -0700)]
rcu/nocb: Add CPU number to CPU-{,de}offload failure messages

Offline CPUs cannot be offloaded or deoffloaded.  Any attempt to offload
or deoffload an offline CPU causes a message to be printed on the console,
which is good, but this message does not contain the CPU number, which
is bad.  Such a CPU number can be helpful when debugging, as it gives a
clear indication that the CPU in question is in fact offline.  This commit
therefore adds the CPU number to the CPU-{,de}offload failure messages.

Cc: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu/nocb: Choose the right rcuog/rcuop kthreads to output
Zqiang [Fri, 17 Jun 2022 14:15:19 +0000 (22:15 +0800)]
rcu/nocb: Choose the right rcuog/rcuop kthreads to output

The show_rcu_nocb_gp_state() function is supposed to dump out the rcuog
kthread and the show_rcu_nocb_state() function is supposed to dump out
the rcuo[ps] kthread.  Currently, both do a mixture, which is not optimal
for debugging, even though it does not affect functionality.

This commit therefore adjusts these two functions to focus on their
respective kthreads.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu/kvfree: Update KFREE_DRAIN_JIFFIES interval
Uladzislau Rezki (Sony) [Thu, 30 Jun 2022 16:33:35 +0000 (18:33 +0200)]
rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval

Currently the monitor work is scheduled with a fixed interval of HZ/20,
which is roughly 50 milliseconds. The drawback of this approach is
low utilization of the 512 page slots in scenarios with infrequence
kvfree_rcu() calls.  For example on an Android system:

<snip>
  kworker/3:3-507     [003] ....   470.286305: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=6
  kworker/6:1-76      [006] ....   470.416613: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000ea0d6556 nr_records=1
  kworker/6:1-76      [006] ....   470.416625: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003e025849 nr_records=9
  kworker/3:3-507     [003] ....   471.390000: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=48
  kworker/1:1-73      [001] ....   471.725785: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000fda9bf20 nr_records=3
  kworker/1:1-73      [001] ....   471.725833: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000a425b67b nr_records=76
  kworker/0:4-1411    [000] ....   472.085673: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007996be9d nr_records=1
  kworker/0:4-1411    [000] ....   472.085728: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=5
  kworker/6:1-76      [006] ....   472.260340: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000065630ee4 nr_records=102
<snip>

In many cases, out of 512 slots, fewer than 10 were actually used.
In order to improve batching and make utilization more efficient this
commit sets a drain interval to a fixed 5-seconds interval. Floods are
detected when a page fills quickly, and in that case, the reclaim work
is re-scheduled for the next scheduling-clock tick (jiffy).

After this change:

<snip>
  kworker/7:1-371     [007] ....  5630.725708: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000005ab0ffb3 nr_records=121
  kworker/7:1-371     [007] ....  5630.989702: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000060c84761 nr_records=47
  kworker/7:1-371     [007] ....  5630.989714: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000000babf308 nr_records=510
  kworker/7:1-371     [007] ....  5631.553790: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000bb7bd0ef nr_records=169
  kworker/7:1-371     [007] ....  5631.553808: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044c78753 nr_records=510
  kworker/5:6-9428    [005] ....  5631.746102: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d98519aa nr_records=123
  kworker/4:7-9434    [004] ....  5632.001758: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000526c9d44 nr_records=322
  kworker/4:7-9434    [004] ....  5632.002073: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000002c6a8afa nr_records=185
  kworker/7:1-371     [007] ....  5632.277515: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007f4a962f nr_records=510
<snip>

Here, all but one of the cases, more than one hundreds slots were used,
representing an order-of-magnitude improvement.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu/kfree: Fix kfree_rcu_shrink_count() return value
Joel Fernandes (Google) [Wed, 22 Jun 2022 22:51:02 +0000 (22:51 +0000)]
rcu/kfree: Fix kfree_rcu_shrink_count() return value

As per the comments in include/linux/shrinker.h, .count_objects callback
should return the number of freeable items, but if there are no objects
to free, SHRINK_EMPTY should be returned. The only time 0 is returned
should be when we are unable to determine the number of objects, or the
cache should be skipped for another reason.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Back off upon fill_page_cache_func() allocation failure
Michal Hocko [Wed, 22 Jun 2022 11:47:11 +0000 (13:47 +0200)]
rcu: Back off upon fill_page_cache_func() allocation failure

The fill_page_cache_func() function allocates couple of pages to store
kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY)
allocation which can fail under memory pressure. The function will,
however keep retrying even when the previous attempt has failed.

This retrying is in theory correct, but in practice the allocation is
invoked from workqueue context, which means that if the memory reclaim
gets stuck, these retries can hog the worker for quite some time.
Although the workqueues subsystem automatically adjusts concurrency, such
adjustment is not guaranteed to happen until the worker context sleeps.
And the fill_page_cache_func() function's retry loop is not guaranteed
to sleep (see the should_reclaim_retry() function).

And we have seen this function cause workqueue lockups:

kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s!
[...]
kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146
kernel:   pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5
kernel:     in-flight: 1917:fill_page_cache_func
kernel:     pending: dbs_work_handler, free_work, kfree_rcu_monitor

Originally, we thought that the root cause of this lockup was several
retries with direct reclaim, but this is not yet confirmed.  Furthermore,
we have seen similar lockups without any heavy memory pressure.  This
suggests that there are other factors contributing to these lockups.
However, it is not really clear that endless retries are desireable.

So let's make the fill_page_cache_func() function back off after
allocation failure.

Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Exclude outgoing CPU when it is the last to leave
Paul E. McKenney [Wed, 24 Aug 2022 21:46:56 +0000 (14:46 -0700)]
rcu: Exclude outgoing CPU when it is the last to leave

The rcu_boost_kthread_setaffinity() function removes the outgoing CPU
from the set_cpus_allowed() mask for the corresponding leaf rcu_node
structure's rcub priority-boosting kthread.  Except that if the outgoing
CPU will leave that structure without any online CPUs, the mask is set
to the housekeeping CPU mask from housekeeping_cpumask().  Which is fine
unless the outgoing CPU happens to be a housekeeping CPU.

This commit therefore removes the outgoing CPU from the housekeeping mask.
This would of course be problematic if the outgoing CPU was the last
online housekeeping CPU, but in that case you are in a world of hurt
anyway.  If someone comes up with a valid use case for a system needing
all the housekeeping CPUs to be offline, further adjustments can be made.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Avoid triggering strict-GP irq-work when RCU is idle
Zqiang [Mon, 8 Aug 2022 02:26:26 +0000 (10:26 +0800)]
rcu: Avoid triggering strict-GP irq-work when RCU is idle

Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger
irq-work from rcu_read_unlock(), and the resulting irq-work handler
invokes rcu_preempt_deferred_qs_handle().  The point of this triggering
is to force grace periods to end quickly in order to give tools like KASAN
a better chance of detecting RCU usage bugs such as leaking RCU-protected
pointers out of an RCU read-side critical section.

However, this irq-work triggering is unconditional.  This works, but
there is no point in doing this irq-work unless the current grace period
is waiting on the running CPU or task, which is not the common case.
After all, in the common case there are many rcu_read_unlock() calls
per CPU per grace period.

This commit therefore triggers the irq-work only when the current grace
period is waiting on the running CPU or task.

This change was tested as follows on a four-CPU system:

echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter
echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
insmod rcutorture.ko
sleep 20
rmmod rcutorture.ko
echo 0 > /sys/kernel/debug/tracing/function_profile_enabled
echo > /sys/kernel/debug/tracing/set_ftrace_filter

This procedure produces results in this per-CPU set of files:

/sys/kernel/debug/tracing/trace_stat/function*

Sample output from one of these files is as follows:

  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  rcu_preempt_deferred_qs_handle      838746    182650.3 us     0.217 us        0.004 us

The baseline sum of the "Hit" values (the number of calls to this
function) was 3,319,015.  With this commit, that sum was 1,140,359,
for a 2.9x reduction.  The worst-case variance across the CPUs was less
than 25%, so this large effect size is statistically significant.

The raw data is available in the Link: URL.

Link: https://lore.kernel.org/all/20220808022626.12825-1-qiang1.zhang@intel.com/
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agosched/debug: Show the registers of 'current' in dump_cpu_task()
Zhen Lei [Thu, 4 Aug 2022 02:34:20 +0000 (10:34 +0800)]
sched/debug: Show the registers of 'current' in dump_cpu_task()

The dump_cpu_task() function does not print registers on architectures
that do not support NMIs.  However, registers can be useful for
debugging.  Fortunately, in the case where dump_cpu_task() is invoked
from an interrupt handler and is dumping the current CPU's stack, the
get_irq_regs() function can be used to get the registers.

Therefore, this commit makes dump_cpu_task() check to see if it is being
asked to dump the current CPU's stack from within an interrupt handler,
and, if so, it uses the get_irq_regs() function to obtain the registers.
On systems that do support NMIs, this commit has the further advantage
of avoiding a self-NMI in this case.

This is an example of rcu self-detected stall on arm64, which does not
support NMIs:
[   27.501721] rcu: INFO: rcu_preempt self-detected stall on CPU
[   27.502238] rcu:     0-....: (1250 ticks this GP) idle=4f7/1/0x4000000000000000 softirq=2594/2594 fqs=619
[   27.502632]  (t=1251 jiffies g=2989 q=29 ncpus=4)
[   27.503845] CPU: 0 PID: 306 Comm: test0 Not tainted 5.19.0-rc7-00009-g1c1a6c29ff99-dirty #46
[   27.504732] Hardware name: linux,dummy-virt (DT)
[   27.504947] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   27.504998] pc : arch_counter_read+0x18/0x24
[   27.505301] lr : arch_counter_read+0x18/0x24
[   27.505328] sp : ffff80000b29bdf0
[   27.505345] x29: ffff80000b29bdf0 x28: 0000000000000000 x27: 0000000000000000
[   27.505475] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[   27.505553] x23: 0000000000001f40 x22: ffff800009849c48 x21: 000000065f871ae0
[   27.505627] x20: 00000000000025ec x19: ffff80000a6eb300 x18: ffffffffffffffff
[   27.505654] x17: 0000000000000001 x16: 0000000000000000 x15: ffff80000a6d0296
[   27.505681] x14: ffffffffffffffff x13: ffff80000a29bc18 x12: 0000000000000426
[   27.505709] x11: 0000000000000162 x10: ffff80000a2f3c18 x9 : ffff80000a29bc18
[   27.505736] x8 : 00000000ffffefff x7 : ffff80000a2f3c18 x6 : 00000000759bd013
[   27.505761] x5 : 01ffffffffffffff x4 : 0002dc6c00000000 x3 : 0000000000000017
[   27.505787] x2 : 00000000000025ec x1 : ffff80000b29bdf0 x0 : 0000000075a30653
[   27.505937] Call trace:
[   27.506002]  arch_counter_read+0x18/0x24
[   27.506171]  ktime_get+0x48/0xa0
[   27.506207]  test_task+0x70/0xf0
[   27.506227]  kthread+0x10c/0x110
[   27.506243]  ret_from_fork+0x10/0x20

This is a marked improvement over the old output:
[   27.944550] rcu: INFO: rcu_preempt self-detected stall on CPU
[   27.944980] rcu:     0-....: (1249 ticks this GP) idle=cbb/1/0x4000000000000000 softirq=2610/2610 fqs=614
[   27.945407]  (t=1251 jiffies g=2681 q=28 ncpus=4)
[   27.945731] Task dump for CPU 0:
[   27.945844] task:test0           state:R  running task     stack:    0 pid:  306 ppid:     2 flags:0x0000000a
[   27.946073] Call trace:
[   27.946151]  dump_backtrace.part.0+0xc8/0xd4
[   27.946378]  show_stack+0x18/0x70
[   27.946405]  sched_show_task+0x150/0x180
[   27.946427]  dump_cpu_task+0x44/0x54
[   27.947193]  rcu_dump_cpu_stacks+0xec/0x130
[   27.947212]  rcu_sched_clock_irq+0xb18/0xef0
[   27.947231]  update_process_times+0x68/0xac
[   27.947248]  tick_sched_handle+0x34/0x60
[   27.947266]  tick_sched_timer+0x4c/0xa4
[   27.947281]  __hrtimer_run_queues+0x178/0x360
[   27.947295]  hrtimer_interrupt+0xe8/0x244
[   27.947309]  arch_timer_handler_virt+0x38/0x4c
[   27.947326]  handle_percpu_devid_irq+0x88/0x230
[   27.947342]  generic_handle_domain_irq+0x2c/0x44
[   27.947357]  gic_handle_irq+0x44/0xc4
[   27.947376]  call_on_irq_stack+0x2c/0x54
[   27.947415]  do_interrupt_handler+0x80/0x94
[   27.947431]  el1_interrupt+0x34/0x70
[   27.947447]  el1h_64_irq_handler+0x18/0x24
[   27.947462]  el1h_64_irq+0x64/0x68                       <--- the above backtrace is worthless
[   27.947474]  arch_counter_read+0x18/0x24
[   27.947487]  ktime_get+0x48/0xa0
[   27.947501]  test_task+0x70/0xf0
[   27.947520]  kthread+0x10c/0x110
[   27.947538]  ret_from_fork+0x10/0x20

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
2 years agosched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task()
Zhen Lei [Thu, 4 Aug 2022 02:34:19 +0000 (10:34 +0800)]
sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task()

The trigger_all_cpu_backtrace() function attempts to send an NMI to the
target CPU, which usually provides much better stack traces than the
dump_cpu_task() function's approach of dumping that stack from some other
CPU.  So much so that most calls to dump_cpu_task() only happen after
a call to trigger_all_cpu_backtrace() has failed.  And the exception to
this rule really should attempt to use trigger_all_cpu_backtrace() first.

Therefore, move the trigger_all_cpu_backtrace() invocation into
dump_cpu_task().

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
2 years agorcu: Update rcu_access_pointer() header for rcu_dereference_protected()
Paul E. McKenney [Wed, 3 Aug 2022 17:36:32 +0000 (10:36 -0700)]
rcu: Update rcu_access_pointer() header for rcu_dereference_protected()

The rcu_access_pointer() docbook header correctly notes that it may be
used during post-grace-period teardown.  However, it is usually better to
use rcu_dereference_protected() for this purpose.  This commit therefore
calls out this preferred usage.

Reported-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Document reason for rcu_all_qs() call to preempt_disable()
Paul E. McKenney [Wed, 3 Aug 2022 15:48:12 +0000 (08:48 -0700)]
rcu: Document reason for rcu_all_qs() call to preempt_disable()

Given that rcu_all_qs() is in non-preemptible kernels, why on earth should
it invoke preempt_disable()?  This commit adds the reason, which is to
work nicely with debugging enabled in CONFIG_PREEMPT_COUNT=y kernels.

Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Make tiny RCU support leak callbacks for debug-object errors
Zqiang [Fri, 1 Jul 2022 02:44:04 +0000 (10:44 +0800)]
rcu: Make tiny RCU support leak callbacks for debug-object errors

Currently, only Tree RCU leaks callbacks setting when it detects a
duplicate call_rcu().  This commit causes Tiny RCU to also leak
callbacks in this situation.

Because this is Tiny RCU, kernel size is important:

1. CONFIG_TINY_RCU=y and CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
   (Production kernel)

    Original:
    text      data      bss       dec       hex     filename
    26290663  20159823  15212544  61663030  3ace736 vmlinux

    With this commit:
    text      data      bss       dec       hex     filename
    26290663  20159823  15212544  61663030  3ace736 vmlinux

2. CONFIG_TINY_RCU=y and CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
   (Debugging kernel)

    Original:
    text      data      bss       dec       hex     filename
    26291319  20160143  15212544  61664006  3aceb06 vmlinux

    With this commit:
    text      data      bss       dec       hex     filename
    26291319  20160431  15212544  61664294  3acec26 vmlinux

These results show that the kernel size is unchanged for production
kernels, as desired.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Add QS check in rcu_exp_handler() for non-preemptible kernels
Zqiang [Tue, 5 Jul 2022 19:09:51 +0000 (12:09 -0700)]
rcu: Add QS check in rcu_exp_handler() for non-preemptible kernels

Kernels built with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y maintain
preempt_count() state.  Because such kernels map __rcu_read_lock()
and __rcu_read_unlock() to preempt_disable() and preempt_enable(),
respectively, this allows the expedited grace period's !CONFIG_PREEMPT_RCU
version of the rcu_exp_handler() IPI handler function to use
preempt_count() to detect quiescent states.

This preempt_count() usage might seem to risk failures due to
use of implicit RCU readers in portions of the kernel under #ifndef
CONFIG_PREEMPTION, except that rcu_core() already disallows such implicit
RCU readers.  The moral of this story is that you must use explicit
read-side markings such as rcu_read_lock() or preempt_disable() even if
the code knows that this kernel does not support preemption.

This commit therefore adds a preempt_count()-based check for a quiescent
state in the !CONFIG_PREEMPT_RCU version of the rcu_exp_handler()
function for kernels built with CONFIG_PREEMPT_COUNT=y, reporting an
immediate quiescent state when the interrupted code had both preemption
and softirqs enabled.

This change results in about a 2% reduction in expedited grace-period
latency in kernels built with both CONFIG_PREEMPT_RCU=n and
CONFIG_PREEMPT_COUNT=y.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/all/20220622103549.2840087-1-qiang1.zhang@intel.com/
2 years agorcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels
Zqiang [Mon, 20 Jun 2022 06:42:24 +0000 (14:42 +0800)]
rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels

In non-premptible kernels, tasks never do context switches within
RCU read-side critical sections.  Therefore, in such kernels, each
leaf rcu_node structure's ->blkd_tasks list will always be empty.
The comment on the non-preemptible version of rcu_preempt_deferred_qs()
confuses this point, so this commit therefore fixes it.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agorcu: Fix rcu_read_unlock_strict() strict QS reporting
Zqiang [Thu, 16 Jun 2022 13:53:47 +0000 (21:53 +0800)]
rcu: Fix rcu_read_unlock_strict() strict QS reporting

Kernels built with CONFIG_PREEMPT=n and CONFIG_RCU_STRICT_GRACE_PERIOD=y
report the quiescent state directly from the outermost rcu_read_unlock().
However, the current CPU's rcu_data structure's ->cpu_no_qs.b.norm
might still be set, in which case rcu_report_qs_rdp() will exit early,
thus failing to report quiescent state.

This commit therefore causes rcu_read_unlock_strict() to clear
CPU's rcu_data structure's ->cpu_no_qs.b.norm field before invoking
rcu_report_qs_rdp().

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agodoc/rcu: Update LWN article URLs and add 2019 article
Shao-Tse Hung [Sat, 20 Aug 2022 08:32:44 +0000 (16:32 +0800)]
doc/rcu: Update LWN article URLs and add 2019 article

This patch adds LWN articles about RCU APIs which were released in 2019.
Also, HTTP URLs are replaced by HTTPS.

Signed-off-by: Shao-Tse Hung <ccs100203@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agodoc: SLAB_TYPESAFE_BY_RCU uses cannot rely on spinlocks
Paul E. McKenney [Tue, 9 Aug 2022 17:24:36 +0000 (10:24 -0700)]
doc: SLAB_TYPESAFE_BY_RCU uses cannot rely on spinlocks

Because the SLAB_TYPESAFE_BY_RCU code does not zero pages that are
to be broken up into slabs, the memory returned by kmem_cache_alloc()
must be fully initialized, including any spinlocks included in the newly
allocated structure.  This means that readers attempting to look up an
SLAB_TYPESAFE_BY_RCU object must use a reference-counting approach.
A spinlock may be acquired only after a reference is obtained, which
prevents that object from being passed to kmem_struct_free(), but only
while that reference continues to be held.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agodoc: Update rcu_access_pointer() advice in rcu_dereference.rst
Paul E. McKenney [Thu, 4 Aug 2022 18:23:19 +0000 (11:23 -0700)]
doc: Update rcu_access_pointer() advice in rcu_dereference.rst

This commit updates the rcu_access_pointer() advice, noting that its
return value should not be assigned to a local variable, and also noting
that there is little point in using rcu_access_pointer() within an RCU
read-side critical section.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agodoc: Fix list: rcu_access_pointer() is not lockdep-checked
Paul E. McKenney [Wed, 3 Aug 2022 17:23:42 +0000 (10:23 -0700)]
doc: Fix list: rcu_access_pointer() is not lockdep-checked

The rcu_access_pointer() macro does not consult lockdep by design because
it is intended to be used outside of RCU read-side critical sections.
This commit therefore makes a separate list for it in whatisRCU.rst.

Similarly, RCU_LOCKDEP_WARN(), rcu_sleep_check(), and RCU_NONIDLE()
do not do anything with pointer access.  This commit therefore creates
a separate utility-API list for them.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agodoc: Use rcu_barrier() to rate-limit RCU callbacks
Paul E. McKenney [Thu, 28 Jul 2022 19:04:00 +0000 (12:04 -0700)]
doc: Use rcu_barrier() to rate-limit RCU callbacks

The checklist.rst document advises periodic synchronize_rcu() invocations
to prevent callback flooding.  However, rcu_barrier() is often a better
choice.  This commit therefore adds words to this effect.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agodoc: Call out queue_rcu_work() for blocking RCU callbacks
Paul E. McKenney [Thu, 28 Jul 2022 19:00:50 +0000 (12:00 -0700)]
doc: Call out queue_rcu_work() for blocking RCU callbacks

The current checklist.rst file correctly notes that RCU callbacks execute
in BH context, and cannot block.  This commit adds words advising people
needing callbacks to block to use workqueues, for example, by replacing
call_rcu() with queue_rcu_work().

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agodoc: Emphasize the need for explicit RCU read-side markers
Paul E. McKenney [Tue, 5 Jul 2022 19:15:35 +0000 (12:15 -0700)]
doc: Emphasize the need for explicit RCU read-side markers

This commit updates checklist.rst to emphasize the need for explicit
markers for RCU read-side critical sections.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2 years agoLinux 6.0-rc1
Linus Torvalds [Sun, 14 Aug 2022 22:50:18 +0000 (15:50 -0700)]
Linux 6.0-rc1

2 years agoradix-tree: replace gfp.h inclusion with gfp_types.h
Yury Norov [Fri, 12 Aug 2022 05:34:25 +0000 (22:34 -0700)]
radix-tree: replace gfp.h inclusion with gfp_types.h

Radix tree header includes gfp.h for __GFP_BITS_SHIFT only. Now we
have gfp_types.h for this.

Fixes powerpc allmodconfig build:

   In file included from include/linux/nodemask.h:97,
                    from include/linux/mmzone.h:17,
                    from include/linux/gfp.h:7,
                    from include/linux/radix-tree.h:12,
                    from include/linux/idr.h:15,
                    from include/linux/kernfs.h:12,
                    from include/linux/sysfs.h:16,
                    from include/linux/kobject.h:20,
                    from include/linux/pci.h:35,
                    from arch/powerpc/kernel/prom_init.c:24:
   include/linux/random.h: In function 'add_latent_entropy':
>> include/linux/random.h:25:46: error: 'latent_entropy' undeclared (first use in this function); did you mean 'add_latent_entropy'?
      25 |         add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy));
         |                                              ^~~~~~~~~~~~~~
         |                                              add_latent_entropy
   include/linux/random.h:25:46: note: each undeclared identifier is reported only once for each function it appears in

Reported-by: kernel test robot <lkp@intel.com>
CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 14 Aug 2022 20:03:53 +0000 (13:03 -0700)]
Merge tag 'pull-fixes' of git://git./linux/kernel/git/viro/vfs

Pull vfs lseek fix from Al Viro:
 "Fix proc_reg_llseek() breakage. Always had been possible if somebody
  left NULL ->proc_lseek, became a practical issue now"

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  take care to handle NULL ->proc_lseek()

2 years agotake care to handle NULL ->proc_lseek()
Al Viro [Sun, 14 Aug 2022 19:16:18 +0000 (15:16 -0400)]
take care to handle NULL ->proc_lseek()

Easily done now, just by clearing FMODE_LSEEK in ->f_mode
during proc_reg_open() for such entries.

Fixes: 868941b14441 "fs: remove no_llseek"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2 years agoMerge tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 14 Aug 2022 16:28:54 +0000 (09:28 -0700)]
Merge tag 'for-linus-6.0-rc1b-tag' of git://git./linux/kernel/git/xen/tip

Pull more xen updates from Juergen Gross:

 - fix the handling of the "persistent grants" feature negotiation
   between Xen blkfront and Xen blkback drivers

 - a cleanup of xen.config and adding xen.config to Xen section in
   MAINTAINERS

 - support HVMOP_set_evtchn_upcall_vector, which is more compliant to
   "normal" interrupt handling than the global callback used up to now

 - further small cleanups

* tag 'for-linus-6.0-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  MAINTAINERS: add xen config fragments to XEN HYPERVISOR sections
  xen: remove XEN_SCRUB_PAGES in xen.config
  xen/pciback: Fix comment typo
  xen/xenbus: fix return type in xenbus_file_read()
  xen-blkfront: Apply 'feature_persistent' parameter when connect
  xen-blkback: Apply 'feature_persistent' parameter when connect
  xen-blkback: fix persistent grants negotiation
  x86/xen: Add support for HVMOP_set_evtchn_upcall_vector

2 years agoMerge tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm...
Linus Torvalds [Sun, 14 Aug 2022 16:22:11 +0000 (09:22 -0700)]
Merge tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git./linux/kernel/git/acme/linux

Pull more perf tool updates from Arnaldo Carvalho de Melo:

 - 'perf c2c' now supports ARM64, adjust its output to cope with
   differences with what is in x86_64. Now go find false sharing on
   ARM64 (at least Neoverse) as well!

 - Refactor the JSON processing, making the output more compact and thus
   reducing the size of the resulting perf binary

 - Improvements for 'perf offcpu' profiling, including tracking child
   processes

 - Update Intel JSON metrics and events files for broadwellde,
   broadwellx, cascadelakex, haswellx, icelakex, ivytown, jaketown,
   knightslanding, sapphirerapids, skylakex and snowridgex

 - Add 'perf stat' JSON output and a 'perf test' entry for it

 - Ignore memfd and anonymous mmap events if jitdump present

 - Refactor 'perf test' shell tests allowing subdirs

 - Fix an error handling path in 'parse_perf_probe_command()'

 - Fixes for the guest Intel PT tracing patchkit in the 1st batch of
   this merge window

 - Print debuginfod queries if -v option is used, to explain delays in
   processing when debuginfo servers are enabled to fetch DSOs with
   richer symbol tables

 - Improve error message for 'perf record -p not_existing_pid'

 - Fix openssl and libbpf feature detection

 - Add PMU pai_crypto event description for IBM z16 on 'perf list'

 - Fix typos and duplicated words on comments in various places

* tag 'perf-tools-fixes-for-v6.0-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (81 commits)
  perf test: Refactor shell tests allowing subdirs
  perf vendor events: Update events for snowridgex
  perf vendor events: Update events and metrics for skylakex
  perf vendor events: Update metrics for sapphirerapids
  perf vendor events: Update events for knightslanding
  perf vendor events: Update metrics for jaketown
  perf vendor events: Update metrics for ivytown
  perf vendor events: Update events and metrics for icelakex
  perf vendor events: Update events and metrics for haswellx
  perf vendor events: Update events and metrics for cascadelakex
  perf vendor events: Update events and metrics for broadwellx
  perf vendor events: Update metrics for broadwellde
  perf jevents: Fold strings optimization
  perf jevents: Compress the pmu_events_table
  perf metrics: Copy entire pmu_event in find metric
  perf pmu-events: Hide the pmu_events
  perf pmu-events: Don't assume pmu_event is an array
  perf pmu-events: Move test events/metrics to JSON
  perf test: Use full metric resolution
  perf pmu-events: Hide pmu_events_map
  ...

2 years agoMerge tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 14 Aug 2022 15:48:13 +0000 (08:48 -0700)]
Merge tag 'powerpc-6.0-2' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Ensure we never emit lwarx with EH=1 on 32-bit, because some 32-bit
   CPUs trap on it rather than ignoring it as they should.

 - Fix ftrace when building with clang, which was broken by some
   refactoring.

 - A couple of other minor fixes.

Thanks to Christophe Leroy, Naveen N.  Rao, Nick Desaulniers, Ondrej
Mosnacek, Pali Rohár, Russell Currey, and Segher Boessenkool.

* tag 'powerpc-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/kexec: Fix build failure from uninitialised variable
  powerpc/ppc-opcode: Fix PPC_RAW_TW()
  powerpc64/ftrace: Fix ftrace for clang builds
  powerpc: Make eh value more explicit when using lwarx
  powerpc: Don't hide eh field of lwarx behind a macro
  powerpc: Fix eh field when calling lwarx on PPC32

2 years agoMerge tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 14 Aug 2022 00:35:58 +0000 (17:35 -0700)]
Merge tag 'pull-work.misc' of git://git./linux/kernel/git/viro/vfs

Pull /proc/mounts fix from Al Viro:
 "Fix for /proc/mounts escaping - escape the '#' character too"

* tag 'pull-work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: escape hash as well

2 years agoMerge tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 14 Aug 2022 00:31:18 +0000 (17:31 -0700)]
Merge tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull more cifs updates from Steve French:

 - two fixes for stable, one for a lock length miscalculation, and
   another fixes a lease break timeout bug

 - improvement to handle leases, allows the close timeout to be
   configured more safely

 - five restructuring/cleanup patches

* tag '5.20-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Do not access tcon->cfids->cfid directly from is_path_accessible
  cifs: Add constructor/destructors for tcon->cfid
  SMB3: fix lease break timeout when multiple deferred close handles for the same file.
  smb3: allow deferred close timeout to be configurable
  cifs: Do not use tcon->cfid directly, use the cfid we get from open_cached_dir
  cifs: Move cached-dir functions into a separate file
  cifs: Remove {cifs,nfs}_fscache_release_page()
  cifs: fix lock length calculation

2 years agoafs: Enable multipage folio support
David Howells [Wed, 10 Aug 2022 17:52:47 +0000 (18:52 +0100)]
afs: Enable multipage folio support

Enable multipage folio support for the afs filesystem.

Support has already been implemented in netfslib, fscache and cachefiles
and in most of afs, but I've waited for Matthew Wilcox's latest folio
changes.

Note that it does require a change to afs_write_begin() to return the
correct subpage.  This is a "temporary" change as we're working on
getting rid of the need for ->write_begin() and ->write_end()
completely, at least as far as network filesystems are concerned - but
it doesn't prevent afs from making use of the capability.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: kafs-testing@auristor.com
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/lkml/2274528.1645833226@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2 years agoMerge tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 13 Aug 2022 21:38:22 +0000 (14:38 -0700)]
Merge tag 'timers-urgent-2022-08-13' of git://git./linux/kernel/git/tip/tip

Pull timer fixes from Ingo Molnar:
 "Misc timer fixes:

   - fix a potential use-after-free bug in posix timers

   - correct a prototype

   - address a build warning"

* tag 'timers-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Cleanup CPU timers before freeing them during exec
  time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64
  posix-timers: Make do_clock_gettime() static

2 years agoMerge tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 13 Aug 2022 21:24:12 +0000 (14:24 -0700)]
Merge tag 'x86-urgent-2022-08-13' of git://git./linux/kernel/git/tip/tip

Pull x86 fix from Ingo Molnar:
 "Fix the 'IBPB mitigated RETBleed' mode of operation on AMD CPUs (not
  turned on by default), which also need STIBP enabled (if available) to
  be '100% safe' on even the shortest speculation windows"

* tag 'x86-urgent-2022-08-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Enable STIBP for IBPB mitigated RETBleed

2 years agoMerge tag 'i2c-for-5.20-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 13 Aug 2022 21:06:08 +0000 (14:06 -0700)]
Merge tag 'i2c-for-5.20-part2' of git://git./linux/kernel/git/wsa/linux

Pull more i2c updates from Wolfram Sang:

 - two driver fixes for issues introduced this cycle

 - one trivial driver improvement regarding ACPI

 - more DTS conversion and additions

 - documentation updates

 - subsystem-wide move from strlcpy to strscpy

* tag 'i2c-for-5.20-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  docs: i2c: i2c-sysfs: fix hyperlinks
  docs: i2c: i2c-sysfs: improve wording
  docs: i2c: instantiating-devices: add syntax coloring to dts and C blocks
  docs: i2c: smbus-protocol: improve DataLow/DataHigh definition
  docs: i2c: i2c-protocol: remove unused legend items
  docs: i2c: i2c-protocol,smbus-protocol: remove nonsense words
  docs: i2c: i2c-protocol: update introductory paragraph
  i2c: move core from strlcpy to strscpy
  i2c: move drivers from strlcpy to strscpy
  i2c: kempld: Support ACPI I2C device declaration
  i2c: mediatek: add i2c compatible for MT8188
  dt-bindings: i2c: update bindings for mt8188 soc
  i2c: microchip-corei2c: fix erroneous late ack send
  dt-bindings: i2c: qcom,i2c-cci: convert to dtschema
  i2c: qcom-geni: Fix GPI DMA buffer sync-back

2 years agoMerge tag 'ntb-5.20' of https://github.com/jonmason/ntb
Linus Torvalds [Sat, 13 Aug 2022 21:00:45 +0000 (14:00 -0700)]
Merge tag 'ntb-5.20' of https://github.com/jonmason/ntb

Pull NTB updates from Jon Mason:
 "Non-Transparent Bridge updates.

  Fix of heap data and clang warnings, support for a new Intel NTB
  device, and NTB EndPoint Function (EPF) support and the various fixes
  for that"

* tag 'ntb-5.20' of https://github.com/jonmason/ntb:
  MAINTAINERS: add PCI Endpoint NTB drivers to NTB files
  NTB: EPF: Tidy up some bounds checks
  NTB: EPF: Fix error code in epf_ntb_bind()
  PCI: endpoint: pci-epf-vntb: reduce several globals to statics
  PCI: endpoint: pci-epf-vntb: fix error handle in epf_ntb_mw_bar_init()
  PCI: endpoint: Fix Kconfig dependency
  NTB: EPF: set pointer addr to null using NULL rather than 0
  Documentation: PCI: extend subheading underline for "lspci output" section
  Documentation: PCI: Use code-block block for scratchpad registers diagram
  Documentation: PCI: Add specification for the PCI vNTB function device
  PCI: endpoint: Support NTB transfer between RC and EP
  NTB: epf: Allow more flexibility in the memory BAR map method
  PCI: designware-ep: Allow pci_epc_set_bar() update inbound map address
  ntb: intel: add GNR support for Intel PCIe gen5 NTB
  NTB: ntb_tool: uninitialized heap data in tool_fn_write()
  ntb: idt: fix clang -Wformat warnings

2 years agoMerge tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 13 Aug 2022 20:50:11 +0000 (13:50 -0700)]
Merge tag 'xfs-5.20-merge-8' of git://git./fs/xfs/xfs-linux

Pull more xfs updates from Darrick Wong:
 "There's not a lot this time around, just the usual bug fixes and
  corrections for missing error returns.

   - Return error codes from block device flushes to userspace

   - Fix a deadlock between reclaim and mount time quotacheck

   - Fix an unnecessary ENOSPC return when doing COW on a filesystem
     with severe free space fragmentation

   - Fix a miscalculation in the transaction reservation computations
     for file removal operations"

* tag 'xfs-5.20-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix inode reservation space for removing transaction
  xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork
  xfs: fix intermittent hang during quotacheck
  xfs: check return codes when flushing block devices

2 years agoMerge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 13 Aug 2022 20:41:48 +0000 (13:41 -0700)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
 "Mostly small bug fixes and trivial updates.

  The major new core update is a change to the way device, target and
  host reference counting is done to try to make it more robust (this
  change has soaked for a while to try to winkle out any bugs)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: pm8001: Fix typo 'the the' in comment
  scsi: megaraid_sas: Remove redundant variable cmd_type
  scsi: FlashPoint: Remove redundant variable bm_int_st
  scsi: zfcp: Fix missing auto port scan and thus missing target ports
  scsi: core: Call blk_mq_free_tag_set() earlier
  scsi: core: Simplify LLD module reference counting
  scsi: core: Make sure that hosts outlive targets
  scsi: core: Make sure that targets outlive devices
  scsi: ufs: ufs-pci: Correct check for RESET DSM
  scsi: target: core: De-RCU of se_lun and se_lun acl
  scsi: target: core: Fix race during ACL removal
  scsi: ufs: core: Correct ufshcd_shutdown() flow
  scsi: ufs: core: Increase the maximum data buffer size
  scsi: lpfc: Check the return value of alloc_workqueue()

2 years agoMerge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 13 Aug 2022 20:37:36 +0000 (13:37 -0700)]
Merge tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request
     - print nvme connect Linux error codes properly (Amit Engel)
     - fix the fc_appid_store return value (Christoph Hellwig)
     - fix a typo in an error message (Christophe JAILLET)
     - add another non-unique identifier quirk (Dennis P. Kliem)
     - check if the queue is allocated before stopping it in nvme-tcp
       (Maurizio Lombardi)
     - restart admin queue if the caller needs to restart queue in
       nvme-fc (Ming Lei)
     - use kmemdup instead of kmalloc + memcpy in nvme-auth (Zhang
       Xiaoxu)

 - __alloc_disk_node() error handling fix (Rafael)

* tag 'block-6.0-2022-08-12' of git://git.kernel.dk/linux-block:
  block: Do not call blk_put_queue() if gendisk allocation fails
  nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S70
  nvme-tcp: check if the queue is allocated before stopping it
  nvme-fabrics: Fix a typo in an error message
  nvme-fabrics: parse nvme connect Linux error codes
  nvmet-auth: use kmemdup instead of kmalloc + memcpy
  nvme-fc: fix the fc_appid_store return value
  nvme-fc: restart admin queue if the caller needs to restart queue

2 years agoMerge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 13 Aug 2022 20:28:54 +0000 (13:28 -0700)]
Merge tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Regression fix for this merge window, fixing a wrong order of
   arguments for io_req_set_res() for passthru (Dylan)

 - Fix for the audit code leaking context memory (Peilin)

 - Ensure that provided buffers are memcg accounted (Pavel)

 - Correctly handle short zero-copy sends (Pavel)

 - Sparse warning fixes for the recvmsg multishot command (Dylan)

 - Error handling fix for passthru (Anuj)

 - Remove randomization of struct kiocb fields, to avoid it growing in
   size if re-arranged in such a fashion that it grows more holes or
   padding (Keith, Linus)

 - Small series improving type safety of the sqe fields (Stefan)

* tag 'io_uring-6.0-2022-08-13' of git://git.kernel.dk/linux-block:
  io_uring: add missing BUILD_BUG_ON() checks for new io_uring_sqe fields
  io_uring: make io_kiocb_to_cmd() typesafe
  fs: don't randomize struct kiocb fields
  io_uring: consistently make use of io_notif_to_data()
  io_uring: fix error handling for io_uring_cmd
  io_uring: fix io_recvmsg_prep_multishot sparse warnings
  io_uring/net: send retry for zerocopy
  io_uring: mem-account pbuf buckets
  audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker()
  io_uring: pass correct parameters to io_req_set_res

2 years agoperf test: Refactor shell tests allowing subdirs
Carsten Haitzler [Fri, 12 Aug 2022 12:16:28 +0000 (13:16 +0100)]
perf test: Refactor shell tests allowing subdirs

This is a prelude to adding more tests to shell tests and in order to
support putting those tests into subdirectories, I need to change the
test code that scans/finds and runs them.

To support subdirs I have to recurse so it's time to refactor the code
to allow this and centralize the shell script finding into one location
and only one single scan that builds a list of all the found tests in
memory instead of it being duplicated in 3 places.

This code also optimizes things like knowing the max width of desciption
strings (as we can do that while we scan instead of a whole new pass of
opening files).

It also more cleanly filters scripts to see only *.sh files thus
skipping random other files in directories like *~ backup files, other
random junk/data files that may appear and the scripts must be
executable to make the cut (this ensures the script lib dir is not seen
as scripts to run).

This avoids perf test running previous older versions of test scripts
that are editor backup files as well as skipping perf.data files that
may appear and so on.

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Carsten Haitzler <carsten.haitzler@arm.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Link: https://lore.kernel.org/r/20220812121641.336465-2-carsten.haitzler@foss.arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf vendor events: Update events for snowridgex
Zhengjun Xing [Fri, 12 Aug 2022 08:52:39 +0000 (16:52 +0800)]
perf vendor events: Update events for snowridgex

Update the events to v1.20, update events for snowridgex by the latest
event converter tools.

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the snowridgex files into perf.

Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220812085239.3089231-12-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf vendor events: Update events and metrics for skylakex
Zhengjun Xing [Fri, 12 Aug 2022 08:52:38 +0000 (16:52 +0800)]
perf vendor events: Update events and metrics for skylakex

Update the events to v1.28, the metrics are based on TMA 4.4 full, update
events and metrics for skylakex by the latest event converter tools.

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the skylakex files into perf.

Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220812085239.3089231-11-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf vendor events: Update metrics for sapphirerapids
Zhengjun Xing [Fri, 12 Aug 2022 08:52:37 +0000 (16:52 +0800)]
perf vendor events: Update metrics for sapphirerapids

The metrics are based on TMA 4.4 full, add new metrics “UNCORE_FREQ” for
sapphirerapids.

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the sapphirerapids files into perf.

Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220812085239.3089231-10-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf vendor events: Update events for knightslanding
Zhengjun Xing [Fri, 12 Aug 2022 08:52:36 +0000 (16:52 +0800)]
perf vendor events: Update events for knightslanding

Update the events to v9, update events for knightslanding by the latest
event converter tools.

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the knightslanding files into perf.

Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220812085239.3089231-9-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf vendor events: Update metrics for jaketown
Zhengjun Xing [Fri, 12 Aug 2022 08:52:35 +0000 (16:52 +0800)]
perf vendor events: Update metrics for jaketown

The metrics are based on TMA 4.4 full, add new metrics “UNCORE_FREQ” for
jaketown.

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the jaketown files into perf.

Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220812085239.3089231-8-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf vendor events: Update metrics for ivytown
Zhengjun Xing [Fri, 12 Aug 2022 08:52:34 +0000 (16:52 +0800)]
perf vendor events: Update metrics for ivytown

The metrics are based on TMA 4.4 full, add new metrics “UNCORE_FREQ” for
ivytown.

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the ivytown files into perf.

Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220812085239.3089231-7-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf vendor events: Update events and metrics for icelakex
Zhengjun Xing [Fri, 12 Aug 2022 08:52:33 +0000 (16:52 +0800)]
perf vendor events: Update events and metrics for icelakex

Update the events to v1.15, the metrics are based on TMA 4.4 full, update
events and metrics for icelakex by the latest event converter tools.

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the icelakex files into perf.

Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220812085239.3089231-6-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf vendor events: Update events and metrics for haswellx
Zhengjun Xing [Fri, 12 Aug 2022 08:52:32 +0000 (16:52 +0800)]
perf vendor events: Update events and metrics for haswellx

Update the events to v25, the metrics are based on TMA 4.4 full, update
events and metrics for haswellx by the latest event converter tools.

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the haswellx files into perf.

Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220812085239.3089231-5-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf vendor events: Update events and metrics for cascadelakex
Zhengjun Xing [Fri, 12 Aug 2022 08:52:31 +0000 (16:52 +0800)]
perf vendor events: Update events and metrics for cascadelakex

Update to v16, the metrics are based on TMA 4.4 full, update events and add
new metrics “UNCORE_FREQ” for cascadelakex.

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the cascadelakex files into perf.

Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220812085239.3089231-4-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf vendor events: Update events and metrics for broadwellx
Zhengjun Xing [Fri, 12 Aug 2022 08:52:30 +0000 (16:52 +0800)]
perf vendor events: Update events and metrics for broadwellx

Update to v19, the metrics are based on TMA 4.4 full, update events and add
new metrics “UNCORE_FREQ” for broadwellx.

Use script at:
https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the broadwellx files into perf.

Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220812085239.3089231-3-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf vendor events: Update metrics for broadwellde
Zhengjun Xing [Fri, 12 Aug 2022 08:52:29 +0000 (16:52 +0800)]
perf vendor events: Update metrics for broadwellde

The metrics are based on TMA 4.4 full, add new metrics “UNCORE_FREQ” for
broadwellde.

Use script at:

  https://github.com/intel/event-converter-for-linux-perf/blob/master/download_and_gen.py

to download and generate the latest events and metrics. Manually copy
the broadwellde files into perf.

Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Tested-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220812085239.3089231-2-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf jevents: Fold strings optimization
Ian Rogers [Fri, 12 Aug 2022 23:09:49 +0000 (16:09 -0700)]
perf jevents: Fold strings optimization

If a shorter string ends a longer string then the shorter string may
reuse the longer string at an offset. For example, on x86 the event
arith.cycles_div_busy and cycles_div_busy can be folded, even though
they have difference names the strings are identical after 6
characters. cycles_div_busy can reuse the arith.cycles_div_busy string
at an offset of 6.

In pmu-events.c this looks like the following where the 'also:' lists
folded strings:

/* offset=177541 */ "arith.cycles_div_busy\000\000pipeline\000Cycles the divider is busy\000\000\000event=0x14,period=2000000,umask=0x1\000\000\000\000\000\000\000\000\000" /* also: cycles_div_busy\000\000pipeline\000Cycles the divider is busy\000\000\000event=0x14,period=2000000,umask=0x1\000\000\000\000\000\000\000\000\000 */

As jevents.py combines multiple strings for an event into a larger
string, the amount of folding is minimal as all parts of the event must
align. Other organizations can benefit more from folding, but lose space
by say recording more offsets.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-15-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf jevents: Compress the pmu_events_table
Ian Rogers [Fri, 12 Aug 2022 23:09:48 +0000 (16:09 -0700)]
perf jevents: Compress the pmu_events_table

The pmu_events array requires 15 pointers per entry which in position
independent code need relocating. Change the array to be an array of
offsets within a big C string. Only the offset of the first variable is
required, subsequent variables are stored in order after the \0
terminator (requiring a byte per variable rather than 4 bytes per
offset).

The file size savings are:

no jevents - the same 19,788,464bytes
x86 jevents - ~16.7% file size saving 23,744,288bytes vs 28,502,632bytes
all jevents - ~19.5% file size saving 24,469,056bytes vs 30,379,920bytes
default build options plus NO_LIBBFD=1.

For example, the x86 build savings come from .rela.dyn and
.data.rel.ro becoming smaller by 3,157,032bytes and 3,042,016bytes
respectively. .rodata increases by 1,432,448bytes, giving an overall
4,766,600bytes saving.

To make metric strings more shareable, the topic is changed from say
'skx metrics' to just 'metrics'.

To try to help with the memory layout the pmu_events are ordered as used
by perf qsort comparator functions.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-14-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf metrics: Copy entire pmu_event in find metric
Ian Rogers [Fri, 12 Aug 2022 23:09:47 +0000 (16:09 -0700)]
perf metrics: Copy entire pmu_event in find metric

The pmu_event passed to the pmu_events_table_for_each_event is invalid
after the loop. Copy the entire struct in metricgroup__find_metric.
Reduce the scope of this function to static.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-13-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf pmu-events: Hide the pmu_events
Ian Rogers [Fri, 12 Aug 2022 23:09:46 +0000 (16:09 -0700)]
perf pmu-events: Hide the pmu_events

Hide that the pmu_event structs are an array with a new wrapper struct.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-12-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf pmu-events: Don't assume pmu_event is an array
Ian Rogers [Fri, 12 Aug 2022 23:09:45 +0000 (16:09 -0700)]
perf pmu-events: Don't assume pmu_event is an array

The current code assumes that a struct pmu_event can be iterated over
forward until a NULL pmu_event is encountered.

This makes it difficult to refactor pmu_event.

Add a loop function taking a callback function that's passed the struct
pmu_event.

This way the pmu_event is only needed for one element and not an entire
array.

Switch existing code iterating over the pmu_event arrays to use the new
loop function pmu_events_table_for_each_event.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-11-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf pmu-events: Move test events/metrics to JSON
Ian Rogers [Fri, 12 Aug 2022 23:09:44 +0000 (16:09 -0700)]
perf pmu-events: Move test events/metrics to JSON

Move arrays of pmu_events into the JSON code so that it may be
regenerated and modified by the jevents.py script.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-10-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf test: Use full metric resolution
Ian Rogers [Fri, 12 Aug 2022 23:09:43 +0000 (16:09 -0700)]
perf test: Use full metric resolution

The simple metric resolution doesn't handle recursion properly, switch
to use the full resolution as with the parse-metric tests which also
increases coverage. Don't set the values for the metric backward as
failures to generate a result are ignored.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-9-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf pmu-events: Hide pmu_events_map
Ian Rogers [Fri, 12 Aug 2022 23:09:42 +0000 (16:09 -0700)]
perf pmu-events: Hide pmu_events_map

Move usage of the table to pmu-events.c so it may be hidden. By
abstracting the table the implementation can later be changed.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf pmu-events: Avoid passing pmu_events_map
Ian Rogers [Fri, 12 Aug 2022 23:09:41 +0000 (16:09 -0700)]
perf pmu-events: Avoid passing pmu_events_map

Preparation for hiding pmu_events_map as an implementation detail. While
the map is passed, the table of events is all that is normally wanted.

While modifying the function's types, rename pmu_events_map__find to
pmu_events_table__find to match later encapsulation. Similarly rename
pmu_add_cpu_aliases_map to pmu_add_cpu_aliases_table.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf pmu-events: Hide pmu_sys_event_tables
Ian Rogers [Fri, 12 Aug 2022 23:09:40 +0000 (16:09 -0700)]
perf pmu-events: Hide pmu_sys_event_tables

Move usage of the table to pmu-events.c so it may be hidden. By
abstracting the table the implementation can later be changed.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf jevents: Sort JSON files entries
Ian Rogers [Fri, 12 Aug 2022 23:09:39 +0000 (16:09 -0700)]
perf jevents: Sort JSON files entries

Sort the JSON files entries on conversion to C. The sort order tries to
replicated cmp_sevent from pmu.c so that the input there is already
sorted except for sysfs events. Specifically, the sort order is given by
the tuple:

(not j.desc is None, fix_none(j.topic), fix_none(j.name), fix_none(j.pmu), fix_none(j.metric_name))

which is putting events with descriptions and topics before those
without, then sorting by name, then pmu and finally metric_name

Add the topic to JsonEvent on reading to simplify. Remove an unnecessary
lambda in the JSON reading.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf jevents: Provide path to JSON file on error
Ian Rogers [Fri, 12 Aug 2022 23:09:38 +0000 (16:09 -0700)]
perf jevents: Provide path to JSON file on error

If a JSONDecoderError or similar is raised then it is useful to know the
path. Print this and then raise the exception agan.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf jevents: Remove the type/version variables
Ian Rogers [Fri, 12 Aug 2022 23:09:37 +0000 (16:09 -0700)]
perf jevents: Remove the type/version variables

pmu_events_map has a type variable that is always initialized to "core"
and a version variable that is never read. Remove these from the API as
it is straightforward to add them back when necessary.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2 years agoperf jevent: Add an 'all' architecture argument
Ian Rogers [Fri, 12 Aug 2022 23:09:36 +0000 (16:09 -0700)]
perf jevent: Add an 'all' architecture argument

When 'all' is passed as the architecture generate a mapping table for
all architectures. This simplifies testing. To identify the table for an
architecture add an arch variable to the pmu_events_map.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220812230949.683239-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>