docs: RCU: Convert checklist.txt to ReST

author Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Tue, 21 Apr 2020 17:04:02 +0000 (19:04 +0200)

committer Paul E. McKenney <paulmck@kernel.org>

Mon, 29 Jun 2020 18:58:10 +0000 (11:58 -0700)
diff --git a/Documentation/RCU/checklist.rst b/Documentation/RCU/checklist.rst

new file mode 100644

index 0000000..2efed99
--- /dev/null
+++ b/Documentation/RCU/checklist.rst
@@ -0,0 +1,465 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
+Review Checklist for RCU Patches
+================================
+
+
+This document contains a checklist for producing and reviewing patches
+that make use of RCU.  Violating any of the rules listed below will
+result in the same sorts of problems that leaving out a locking primitive
+would cause.  This list is based on experiences reviewing such patches
+over a rather long period of time, but improvements are always welcome!
+
+0.     Is RCU being applied to a read-mostly situation?  If the data
+       structure is updated more than about 10% of the time, then you
+       should strongly consider some other approach, unless detailed
+       performance measurements show that RCU is nonetheless the right
+       tool for the job.  Yes, RCU does reduce read-side overhead by
+       increasing write-side overhead, which is exactly why normal uses
+       of RCU will do much more reading than updating.
+
+       Another exception is where performance is not an issue, and RCU
+       provides a simpler implementation.  An example of this situation
+       is the dynamic NMI code in the Linux 2.6 kernel, at least on
+       architectures where NMIs are rare.
+
+       Yet another exception is where the low real-time latency of RCU's
+       read-side primitives is critically important.
+
+       One final exception is where RCU readers are used to prevent
+       the ABA problem (https://en.wikipedia.org/wiki/ABA_problem)
+       for lockless updates.  This does result in the mildly
+       counter-intuitive situation where rcu_read_lock() and
+       rcu_read_unlock() are used to protect updates.  However, this
+       approach provides the same potential simplifications that garbage
+       collectors do.
+
+1.     Does the update code have proper mutual exclusion?
+
+       RCU does allow -readers- to run (almost) naked, but -writers- must
+       still use some sort of mutual exclusion, such as:
+
+       a.      locking,
+       b.      atomic operations, or
+       c.      restricting updates to a single task.
+
+       If you choose #b, be prepared to describe how you have handled
+       memory barriers on weakly ordered machines (pretty much all of
+       them -- even x86 allows later loads to be reordered to precede
+       earlier stores), and be prepared to explain why this added
+       complexity is worthwhile.  If you choose #c, be prepared to
+       explain how this single task does not become a major bottleneck on
+       big multiprocessor machines (for example, if the task is updating
+       information relating to itself that other tasks can read, there
+       by definition can be no bottleneck).  Note that the definition
+       of "large" has changed significantly:  Eight CPUs was "large"
+       in the year 2000, but a hundred CPUs was unremarkable in 2017.
+
+2.     Do the RCU read-side critical sections make proper use of
+       rcu_read_lock() and friends?  These primitives are needed
+       to prevent grace periods from ending prematurely, which
+       could result in data being unceremoniously freed out from
+       under your read-side code, which can greatly increase the
+       actuarial risk of your kernel.
+
+       As a rough rule of thumb, any dereference of an RCU-protected
+       pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
+       rcu_read_lock_sched(), or by the appropriate update-side lock.
+       Disabling of preemption can serve as rcu_read_lock_sched(), but
+       is less readable and prevents lockdep from detecting locking issues.
+
+       Letting RCU-protected pointers "leak" out of an RCU read-side
+       critical section is every bit as bad as letting them leak out
+       from under a lock.  Unless, of course, you have arranged some
+       other means of protection, such as a lock or a reference count
+       -before- letting them out of the RCU read-side critical section.
+
+3.     Does the update code tolerate concurrent accesses?
+
+       The whole point of RCU is to permit readers to run without
+       any locks or atomic operations.  This means that readers will
+       be running while updates are in progress.  There are a number
+       of ways to handle this concurrency, depending on the situation:
+
+       a.      Use the RCU variants of the list and hlist update
+               primitives to add, remove, and replace elements on
+               an RCU-protected list.  Alternatively, use the other
+               RCU-protected data structures that have been added to
+               the Linux kernel.
+
+               This is almost always the best approach.
+
+       b.      Proceed as in (a) above, but also maintain per-element
+               locks (that are acquired by both readers and writers)
+               that guard per-element state.  Of course, fields that
+               the readers refrain from accessing can be guarded by
+               some other lock acquired only by updaters, if desired.
+
+               This works quite well, also.
+
+       c.      Make updates appear atomic to readers.  For example,
+               pointer updates to properly aligned fields will
+               appear atomic, as will individual atomic primitives.
+               Sequences of operations performed under a lock will -not-
+               appear to be atomic to RCU readers, nor will sequences
+               of multiple atomic primitives.
+
+               This can work, but is starting to get a bit tricky.
+
+       d.      Carefully order the updates and the reads so that
+               readers see valid data at all phases of the update.
+               This is often more difficult than it sounds, especially
+               given modern CPUs' tendency to reorder memory references.
+               One must usually liberally sprinkle memory barriers
+               (smp_wmb(), smp_rmb(), smp_mb()) through the code,
+               making it difficult to understand and to test.
+
+               It is usually better to group the changing data into
+               a separate structure, so that the change may be made
+               to appear atomic by updating a pointer to reference
+               a new structure containing updated values.
+
+4.     Weakly ordered CPUs pose special challenges.  Almost all CPUs
+       are weakly ordered -- even x86 CPUs allow later loads to be
+       reordered to precede earlier stores.  RCU code must take all of
+       the following measures to prevent memory-corruption problems:
+
+       a.      Readers must maintain proper ordering of their memory
+               accesses.  The rcu_dereference() primitive ensures that
+               the CPU picks up the pointer before it picks up the data
+               that the pointer points to.  This really is necessary
+               on Alpha CPUs.  If you don't believe me, see:
+
+                       http://www.openvms.compaq.com/wizard/wiz_2637.html
+
+               The rcu_dereference() primitive is also an excellent
+               documentation aid, letting the person reading the
+               code know exactly which pointers are protected by RCU.
+               Please note that compilers can also reorder code, and
+               they are becoming increasingly aggressive about doing
+               just that.  The rcu_dereference() primitive therefore also
+               prevents destructive compiler optimizations.  However,
+               with a bit of devious creativity, it is possible to
+               mishandle the return value from rcu_dereference().
+               Please see rcu_dereference.txt in this directory for
+               more information.
+
+               The rcu_dereference() primitive is used by the
+               various "_rcu()" list-traversal primitives, such
+               as the list_for_each_entry_rcu().  Note that it is
+               perfectly legal (if redundant) for update-side code to
+               use rcu_dereference() and the "_rcu()" list-traversal
+               primitives.  This is particularly useful in code that
+               is common to readers and updaters.  However, lockdep
+               will complain if you access rcu_dereference() outside
+               of an RCU read-side critical section.  See lockdep.txt
+               to learn what to do about this.
+
+               Of course, neither rcu_dereference() nor the "_rcu()"
+               list-traversal primitives can substitute for a good
+               concurrency design coordinating among multiple updaters.
+
+       b.      If the list macros are being used, the list_add_tail_rcu()
+               and list_add_rcu() primitives must be used in order
+               to prevent weakly ordered machines from misordering
+               structure initialization and pointer planting.
+               Similarly, if the hlist macros are being used, the
+               hlist_add_head_rcu() primitive is required.
+
+       c.      If the list macros are being used, the list_del_rcu()
+               primitive must be used to keep list_del()'s pointer
+               poisoning from inflicting toxic effects on concurrent
+               readers.  Similarly, if the hlist macros are being used,
+               the hlist_del_rcu() primitive is required.
+
+               The list_replace_rcu() and hlist_replace_rcu() primitives
+               may be used to replace an old structure with a new one
+               in their respective types of RCU-protected lists.
+
+       d.      Rules similar to (4b) and (4c) apply to the "hlist_nulls"
+               type of RCU-protected linked lists.
+
+       e.      Updates must ensure that initialization of a given
+               structure happens before pointers to that structure are
+               publicized.  Use the rcu_assign_pointer() primitive
+               when publicizing a pointer to a structure that can
+               be traversed by an RCU read-side critical section.
+
+5.     If call_rcu() or call_srcu() is used, the callback function will
+       be called from softirq context.  In particular, it cannot block.
+
+6.     Since synchronize_rcu() can block, it cannot be called
+       from any sort of irq context.  The same rule applies
+       for synchronize_srcu(), synchronize_rcu_expedited(), and
+       synchronize_srcu_expedited().
+
+       The expedited forms of these primitives have the same semantics
+       as the non-expedited forms, but expediting is both expensive and
+       (with the exception of synchronize_srcu_expedited()) unfriendly
+       to real-time workloads.  Use of the expedited primitives should
+       be restricted to rare configuration-change operations that would
+       not normally be undertaken while a real-time workload is running.
+       However, real-time workloads can use the rcupdate.rcu_normal kernel
+       boot parameter to completely disable expedited grace periods,
+       though this might have performance implications.
+
+       In particular, if you find yourself invoking one of the expedited
+       primitives repeatedly in a loop, please do everyone a favor:
+       Restructure your code so that it batches the updates, allowing
+       a single non-expedited primitive to cover the entire batch.
+       This will very likely be faster than the loop containing the
+       expedited primitive, and will be much much easier on the rest
+       of the system, especially to real-time workloads running on
+       the rest of the system.
+
+7.     As of v4.20, a given kernel implements only one RCU flavor,
+       which is RCU-sched for PREEMPT=n and RCU-preempt for PREEMPT=y.
+       If the updater uses call_rcu() or synchronize_rcu(),
+       then the corresponding readers may use rcu_read_lock() and
+       rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(),
+       or any pair of primitives that disables and re-enables preemption,
+       for example, rcu_read_lock_sched() and rcu_read_unlock_sched().
+       If the updater uses synchronize_srcu() or call_srcu(),
+       then the corresponding readers must use srcu_read_lock() and
+       srcu_read_unlock(), and with the same srcu_struct.  The rules for
+       the expedited primitives are the same as for their non-expedited
+       counterparts.  Mixing things up will result in confusion and
+       broken kernels, and has even resulted in an exploitable security
+       issue.
+
+       One exception to this rule: rcu_read_lock() and rcu_read_unlock()
+       may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
+       in cases where local bottom halves are already known to be
+       disabled, for example, in irq or softirq context.  Commenting
+       such cases is a must, of course!  And the jury is still out on
+       whether the increased speed is worth it.
+
+8.     Although synchronize_rcu() is slower than is call_rcu(), it
+       usually results in simpler code.  So, unless update performance is
+       critically important, the updaters cannot block, or the latency of
+       synchronize_rcu() is visible from userspace, synchronize_rcu()
+       should be used in preference to call_rcu().  Furthermore,
+       kfree_rcu() usually results in even simpler code than does
+       synchronize_rcu() without synchronize_rcu()'s multi-millisecond
+       latency.  So please take advantage of kfree_rcu()'s "fire and
+       forget" memory-freeing capabilities where it applies.
+
+       An especially important property of the synchronize_rcu()
+       primitive is that it automatically self-limits: if grace periods
+       are delayed for whatever reason, then the synchronize_rcu()
+       primitive will correspondingly delay updates.  In contrast,
+       code using call_rcu() should explicitly limit update rate in
+       cases where grace periods are delayed, as failing to do so can
+       result in excessive realtime latencies or even OOM conditions.
+
+       Ways of gaining this self-limiting property when using call_rcu()
+       include:
+
+       a.      Keeping a count of the number of data-structure elements
+               used by the RCU-protected data structure, including
+               those waiting for a grace period to elapse.  Enforce a
+               limit on this number, stalling updates as needed to allow
+               previously deferred frees to complete.  Alternatively,
+               limit only the number awaiting deferred free rather than
+               the total number of elements.
+
+               One way to stall the updates is to acquire the update-side
+               mutex.  (Don't try this with a spinlock -- other CPUs
+               spinning on the lock could prevent the grace period
+               from ever ending.)  Another way to stall the updates
+               is for the updates to use a wrapper function around
+               the memory allocator, so that this wrapper function
+               simulates OOM when there is too much memory awaiting an
+               RCU grace period.  There are of course many other
+               variations on this theme.
+
+       b.      Limiting update rate.  For example, if updates occur only
+               once per hour, then no explicit rate limiting is
+               required, unless your system is already badly broken.
+               Older versions of the dcache subsystem take this approach,
+               guarding updates with a global lock, limiting their rate.
+
+       c.      Trusted update -- if updates can only be done manually by
+               superuser or some other trusted user, then it might not
+               be necessary to automatically limit them.  The theory
+               here is that superuser already has lots of ways to crash
+               the machine.
+
+       d.      Periodically invoke synchronize_rcu(), permitting a limited
+               number of updates per grace period.
+
+       The same cautions apply to call_srcu() and kfree_rcu().
+
+       Note that although these primitives do take action to avoid memory
+       exhaustion when any given CPU has too many callbacks, a determined
+       user could still exhaust memory.  This is especially the case
+       if a system with a large number of CPUs has been configured to
+       offload all of its RCU callbacks onto a single CPU, or if the
+       system has relatively little free memory.
+
+9.     All RCU list-traversal primitives, which include
+       rcu_dereference(), list_for_each_entry_rcu(), and
+       list_for_each_safe_rcu(), must be either within an RCU read-side
+       critical section or must be protected by appropriate update-side
+       locks.  RCU read-side critical sections are delimited by
+       rcu_read_lock() and rcu_read_unlock(), or by similar primitives
+       such as rcu_read_lock_bh() and rcu_read_unlock_bh(), in which
+       case the matching rcu_dereference() primitive must be used in
+       order to keep lockdep happy, in this case, rcu_dereference_bh().
+
+       The reason that it is permissible to use RCU list-traversal
+       primitives when the update-side lock is held is that doing so
+       can be quite helpful in reducing code bloat when common code is
+       shared between readers and updaters.  Additional primitives
+       are provided for this case, as discussed in lockdep.txt.
+
+10.    Conversely, if you are in an RCU read-side critical section,
+       and you don't hold the appropriate update-side lock, you -must-
+       use the "_rcu()" variants of the list macros.  Failing to do so
+       will break Alpha, cause aggressive compilers to generate bad code,
+       and confuse people trying to read your code.
+
+11.    Any lock acquired by an RCU callback must be acquired elsewhere
+       with softirq disabled, e.g., via spin_lock_irqsave(),
+       spin_lock_bh(), etc.  Failing to disable softirq on a given
+       acquisition of that lock will result in deadlock as soon as
+       the RCU softirq handler happens to run your RCU callback while
+       interrupting that acquisition's critical section.
+
+12.    RCU callbacks can be and are executed in parallel.  In many cases,
+       the callback code is simply a wrapper around kfree(), so that this
+       is not an issue (or, more accurately, to the extent that it is
+       an issue, the memory-allocator locking handles it).  However,
+       if the callbacks do manipulate a shared data structure, they
+       must use whatever locking or other synchronization is required
+       to safely access and/or modify that data structure.
+
+       Do not assume that RCU callbacks will be executed on the same
+       CPU that executed the corresponding call_rcu() or call_srcu().
+       For example, if a given CPU goes offline while having an RCU
+       callback pending, then that RCU callback will execute on some
+       surviving CPU.  (If this was not the case, a self-spawning RCU
+       callback would prevent the victim CPU from ever going offline.)
+       Furthermore, CPUs designated by rcu_nocbs= might well -always-
+       have their RCU callbacks executed on some other CPUs.  In fact,
+       for some real-time workloads, this is the whole point of using
+       the rcu_nocbs= kernel boot parameter.
+
+13.    Unlike other forms of RCU, it -is- permissible to block in an
+       SRCU read-side critical section (demarked by srcu_read_lock()
+       and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
+       Please note that if you don't need to sleep in read-side critical
+       sections, you should be using RCU rather than SRCU, because RCU
+       is almost always faster and easier to use than is SRCU.
+
+       Also unlike other forms of RCU, explicit initialization and
+       cleanup are required either at build time via DEFINE_SRCU()
+       or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct()
+       and cleanup_srcu_struct().  These last two are passed a
+       "struct srcu_struct" that defines the scope of a given
+       SRCU domain.  Once initialized, the srcu_struct is passed
+       to srcu_read_lock(), srcu_read_unlock(), synchronize_srcu(),
+       synchronize_srcu_expedited(), and call_srcu().  A given
+       synchronize_srcu() waits only for SRCU read-side critical
+       sections governed by srcu_read_lock() and srcu_read_unlock()
+       calls that have been passed the same srcu_struct.  This property
+       is what makes sleeping read-side critical sections tolerable --
+       a given subsystem delays only its own updates, not those of other
+       subsystems using SRCU.  Therefore, SRCU is less prone to OOM the
+       system than RCU would be if RCU's read-side critical sections
+       were permitted to sleep.
+
+       The ability to sleep in read-side critical sections does not
+       come for free.  First, corresponding srcu_read_lock() and
+       srcu_read_unlock() calls must be passed the same srcu_struct.
+       Second, grace-period-detection overhead is amortized only
+       over those updates sharing a given srcu_struct, rather than
+       being globally amortized as it is for other forms of RCU.
+       Therefore, SRCU should be used in preference to rw_semaphore
+       only in extremely read-intensive situations, or in situations
+       requiring SRCU's read-side deadlock immunity or low read-side
+       realtime latency.  You should also consider percpu_rw_semaphore
+       when you need lightweight readers.
+
+       SRCU's expedited primitive (synchronize_srcu_expedited())
+       never sends IPIs to other CPUs, so it is easier on
+       real-time workloads than is synchronize_rcu_expedited().
+
+       Note that rcu_assign_pointer() relates to SRCU just as it does to
+       other forms of RCU, but instead of rcu_dereference() you should
+       use srcu_dereference() in order to avoid lockdep splats.
+
+14.    The whole point of call_rcu(), synchronize_rcu(), and friends
+       is to wait until all pre-existing readers have finished before
+       carrying out some otherwise-destructive operation.  It is
+       therefore critically important to -first- remove any path
+       that readers can follow that could be affected by the
+       destructive operation, and -only- -then- invoke call_rcu(),
+       synchronize_rcu(), or friends.
+
+       Because these primitives only wait for pre-existing readers, it
+       is the caller's responsibility to guarantee that any subsequent
+       readers will execute safely.
+
+15.    The various RCU read-side primitives do -not- necessarily contain
+       memory barriers.  You should therefore plan for the CPU
+       and the compiler to freely reorder code into and out of RCU
+       read-side critical sections.  It is the responsibility of the
+       RCU update-side primitives to deal with this.
+
+       For SRCU readers, you can use smp_mb__after_srcu_read_unlock()
+       immediately after an srcu_read_unlock() to get a full barrier.
+
+16.    Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
+       __rcu sparse checks to validate your RCU code.  These can help
+       find problems as follows:
+
+       CONFIG_PROVE_LOCKING:
+               check that accesses to RCU-protected data
+               structures are carried out under the proper RCU
+               read-side critical section, while holding the right
+               combination of locks, or whatever other conditions
+               are appropriate.
+
+       CONFIG_DEBUG_OBJECTS_RCU_HEAD:
+               check that you don't pass the
+               same object to call_rcu() (or friends) before an RCU
+               grace period has elapsed since the last time that you
+               passed that same object to call_rcu() (or friends).
+
+       __rcu sparse checks:
+               tag the pointer to the RCU-protected data
+               structure with __rcu, and sparse will warn you if you
+               access that pointer without the services of one of the
+               variants of rcu_dereference().
+
+       These debugging aids can help you find problems that are
+       otherwise extremely difficult to spot.
+
+17.    If you register a callback using call_rcu() or call_srcu(), and
+       pass in a function defined within a loadable module, then it is
+       necessary to wait for all pending callbacks to be invoked after
+       the last invocation and before unloading that module.  Note that
+       it is absolutely -not- sufficient to wait for a grace period!
+       The current (say) synchronize_rcu() implementation is -not-
+       guaranteed to wait for callbacks registered on other CPUs.
+       Or even on the current CPU if that CPU recently went offline
+       and came back online.
+
+       You instead need to use one of the barrier functions:
+
+       -       call_rcu() -> rcu_barrier()
+       -       call_srcu() -> srcu_barrier()
+
+       However, these barrier functions are absolutely -not- guaranteed
+       to wait for a grace period.  In fact, if there are no call_rcu()
+       callbacks waiting anywhere in the system, rcu_barrier() is within
+       its rights to return immediately.
+
+       So if you need to wait for both an RCU grace period and for
+       all pre-existing call_rcu() callbacks, you will need to execute
+       both rcu_barrier() and synchronize_rcu(), if necessary, using
+       something like workqueues to execute them concurrently.
+
+       See rcubarrier.txt for more information.
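
Illustrative note (not part of the patch): items 3(d) and 4(e) of the checklist above recommend grouping changing data into a separate structure, initializing it fully, and only then publishing a pointer to it. The sketch below is a userspace C11 analogy of that pattern, not kernel code. The names struct cfg, cur_cfg, publish(), subscribe(), update_cfg() and read_sum() are hypothetical. A release store stands in for rcu_assign_pointer(), and an acquire load is used as a conservative substitute for rcu_dereference(). The sketch assumes updaters are serialized by the caller (item 1) and does not model grace periods at all, so the old structure is merely handed back to the caller rather than freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical record: readers must always see a matching a/b pair. */
struct cfg {
	int a;
	int b;
};

/* Pointer that readers follow; plays the role of an RCU-protected pointer. */
static _Atomic(struct cfg *) cur_cfg;

/* Stand-in for rcu_assign_pointer(): the release store orders the
 * initialization of *newp before the pointer becomes visible. */
static void publish(struct cfg *newp)
{
	atomic_store_explicit(&cur_cfg, newp, memory_order_release);
}

/* Stand-in for rcu_dereference(): an acquire load, which is stronger
 * than the dependency ordering the kernel primitive relies on. */
static struct cfg *subscribe(void)
{
	return atomic_load_explicit(&cur_cfg, memory_order_acquire);
}

/* Updater: build a complete new structure, then swing the pointer.
 * Returns 0 on success, -1 on allocation failure.  On success *oldp
 * receives the previous structure, which must not be freed until all
 * pre-existing readers are done (item 14); that wait is not modeled. */
static int update_cfg(int a, int b, struct cfg **oldp)
{
	struct cfg *newp;

	assert(oldp != NULL);
	newp = malloc(sizeof(*newp));
	if (!newp)
		return -1;
	newp->a = a;		/* initialize every field first ... */
	newp->b = b;
	*oldp = atomic_load_explicit(&cur_cfg, memory_order_relaxed);
	publish(newp);		/* ... and only then make it reachable */
	return 0;
}

/* Reader: fetch the pointer once and use only that snapshot. */
static int read_sum(void)
{
	struct cfg *p = subscribe();

	return p ? p->a + p->b : 0;
}
```

Because a reader fetches the pointer once and then works only on that snapshot, it sees either the old pair or the new pair, never a mixture. In kernel code the reader would additionally be bracketed by rcu_read_lock() and rcu_read_unlock(), and the updater would pass the old structure to synchronize_rcu() followed by kfree(), or to kfree_rcu().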
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt

deleted file mode 100644 (file)

index e98ff26..0000000
--- a/Documentation/RCU/checklist.txt
+++ /dev/null
@@ -1,458 +0,0 @@
-Review Checklist for RCU Patches
-
-
-This document contains a checklist for producing and reviewing patches
-that make use of RCU.  Violating any of the rules listed below will
-result in the same sorts of problems that leaving out a locking primitive
-would cause.  This list is based on experiences reviewing such patches
-over a rather long period of time, but improvements are always welcome!
-
-0.     Is RCU being applied to a read-mostly situation?  If the data
-       structure is updated more than about 10% of the time, then you
-       should strongly consider some other approach, unless detailed
-       performance measurements show that RCU is nonetheless the right
-       tool for the job.  Yes, RCU does reduce read-side overhead by
-       increasing write-side overhead, which is exactly why normal uses
-       of RCU will do much more reading than updating.
-
-       Another exception is where performance is not an issue, and RCU
-       provides a simpler implementation.  An example of this situation
-       is the dynamic NMI code in the Linux 2.6 kernel, at least on
-       architectures where NMIs are rare.
-
-       Yet another exception is where the low real-time latency of RCU's
-       read-side primitives is critically important.
-
-       One final exception is where RCU readers are used to prevent
-       the ABA problem (https://en.wikipedia.org/wiki/ABA_problem)
-       for lockless updates.  This does result in the mildly
-       counter-intuitive situation where rcu_read_lock() and
-       rcu_read_unlock() are used to protect updates, however, this
-       approach provides the same potential simplifications that garbage
-       collectors do.
-
-1.     Does the update code have proper mutual exclusion?
-
-       RCU does allow -readers- to run (almost) naked, but -writers- must
-       still use some sort of mutual exclusion, such as:
-
-       a.      locking,
-       b.      atomic operations, or
-       c.      restricting updates to a single task.
-
-       If you choose #b, be prepared to describe how you have handled
-       memory barriers on weakly ordered machines (pretty much all of
-       them -- even x86 allows later loads to be reordered to precede
-       earlier stores), and be prepared to explain why this added
-       complexity is worthwhile.  If you choose #c, be prepared to
-       explain how this single task does not become a major bottleneck on
-       big multiprocessor machines (for example, if the task is updating
-       information relating to itself that other tasks can read, there
-       by definition can be no bottleneck).  Note that the definition
-       of "large" has changed significantly:  Eight CPUs was "large"
-       in the year 2000, but a hundred CPUs was unremarkable in 2017.
-
-2.     Do the RCU read-side critical sections make proper use of
-       rcu_read_lock() and friends?  These primitives are needed
-       to prevent grace periods from ending prematurely, which
-       could result in data being unceremoniously freed out from
-       under your read-side code, which can greatly increase the
-       actuarial risk of your kernel.
-
-       As a rough rule of thumb, any dereference of an RCU-protected
-       pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
-       rcu_read_lock_sched(), or by the appropriate update-side lock.
-       Disabling of preemption can serve as rcu_read_lock_sched(), but
-       is less readable and prevents lockdep from detecting locking issues.
-
-       Letting RCU-protected pointers "leak" out of an RCU read-side
-       critical section is every bid as bad as letting them leak out
-       from under a lock.  Unless, of course, you have arranged some
-       other means of protection, such as a lock or a reference count
-       -before- letting them out of the RCU read-side critical section.
-
-3.     Does the update code tolerate concurrent accesses?
-
-       The whole point of RCU is to permit readers to run without
-       any locks or atomic operations.  This means that readers will
-       be running while updates are in progress.  There are a number
-       of ways to handle this concurrency, depending on the situation:
-
-       a.      Use the RCU variants of the list and hlist update
-               primitives to add, remove, and replace elements on
-               an RCU-protected list.  Alternatively, use the other
-               RCU-protected data structures that have been added to
-               the Linux kernel.
-
-               This is almost always the best approach.
-
-       b.      Proceed as in (a) above, but also maintain per-element
-               locks (that are acquired by both readers and writers)
-               that guard per-element state.  Of course, fields that
-               the readers refrain from accessing can be guarded by
-               some other lock acquired only by updaters, if desired.
-
-               This works quite well, also.
-
-       c.      Make updates appear atomic to readers.  For example,
-               pointer updates to properly aligned fields will
-               appear atomic, as will individual atomic primitives.
-               Sequences of operations performed under a lock will -not-
-               appear to be atomic to RCU readers, nor will sequences
-               of multiple atomic primitives.
-
-               This can work, but is starting to get a bit tricky.
-
-       d.      Carefully order the updates and the reads so that
-               readers see valid data at all phases of the update.
-               This is often more difficult than it sounds, especially
-               given modern CPUs' tendency to reorder memory references.
-               One must usually liberally sprinkle memory barriers
-               (smp_wmb(), smp_rmb(), smp_mb()) through the code,
-               making it difficult to understand and to test.
-
-               It is usually better to group the changing data into
-               a separate structure, so that the change may be made
-               to appear atomic by updating a pointer to reference
-               a new structure containing updated values.
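
The pointer-swap approach recommended in (3d) can be sketched in userspace
with C11 atomics.  This is only an analogue, not kernel code: the acquire
load stands in for rcu_dereference(), the exchange stands in for
rcu_assign_pointer(), and struct config, cur_config, config_budget_ms(),
and config_update() are hypothetical names invented for illustration.
Updaters are assumed to be serialized by some update-side lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical configuration whose two fields must change together. */
struct config {
	int timeout_ms;
	int retries;
};

/* Stand-in for an RCU-protected pointer (assumption: userspace analogue). */
static _Atomic(struct config *) cur_config;

/* Reader: a single load yields a snapshot that is all-old or all-new. */
static int config_budget_ms(void)
{
	struct config *c = atomic_load_explicit(&cur_config,
						memory_order_acquire);

	return c ? c->timeout_ms * c->retries : -1;
}

/*
 * Updater: fill in the new structure completely, then publish it with
 * one pointer store.  The previous structure is handed back through
 * *oldp; a real RCU updater would free it only after a grace period.
 * Returns 0 on success, -1 if allocation fails (nothing is published).
 */
static int config_update(int timeout_ms, int retries, struct config **oldp)
{
	struct config *newc = malloc(sizeof(*newc));

	assert(oldp != NULL);
	if (!newc)
		return -1;
	newc->timeout_ms = timeout_ms;
	newc->retries = retries;
	*oldp = atomic_exchange_explicit(&cur_config, newc,
					 memory_order_acq_rel);
	return 0;
}
```

Because readers only ever see the old pointer or the new one, they can
never observe the new timeout_ms paired with the old retries, which is
the property that (3c) and (3d) struggle to provide field by field.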
-
-4.     Weakly ordered CPUs pose special challenges.  Almost all CPUs
-       are weakly ordered -- even x86 CPUs allow later loads to be
-       reordered to precede earlier stores.  RCU code must take all of
-       the following measures to prevent memory-corruption problems:
-
-       a.      Readers must maintain proper ordering of their memory
-               accesses.  The rcu_dereference() primitive ensures that
-               the CPU picks up the pointer before it picks up the data
-               that the pointer points to.  This really is necessary
-               on Alpha CPUs.  If you don't believe me, see:
-
-                       http://www.openvms.compaq.com/wizard/wiz_2637.html
-
-               The rcu_dereference() primitive is also an excellent
-               documentation aid, letting the person reading the
-               code know exactly which pointers are protected by RCU.
-               Please note that compilers can also reorder code, and
-               they are becoming increasingly aggressive about doing
-               just that.  The rcu_dereference() primitive therefore also
-               prevents destructive compiler optimizations.  However,
-               with a bit of devious creativity, it is possible to
-               mishandle the return value from rcu_dereference().
-               Please see rcu_dereference.txt in this directory for
-               more information.
-
-               The rcu_dereference() primitive is used by the
-               various "_rcu()" list-traversal primitives, such
-               as list_for_each_entry_rcu().  Note that it is
-               perfectly legal (if redundant) for update-side code to
-               use rcu_dereference() and the "_rcu()" list-traversal
-               primitives.  This is particularly useful in code that
-               is common to readers and updaters.  However, lockdep
-               will complain if you invoke rcu_dereference() outside
-               of an RCU read-side critical section.  See lockdep.txt
-               to learn what to do about this.
-
-               Of course, neither rcu_dereference() nor the "_rcu()"
-               list-traversal primitives can substitute for a good
-               concurrency design coordinating among multiple updaters.
-
-       b.      If the list macros are being used, the list_add_tail_rcu()
-               and list_add_rcu() primitives must be used in order
-               to prevent weakly ordered machines from misordering
-               structure initialization and pointer planting.
-               Similarly, if the hlist macros are being used, the
-               hlist_add_head_rcu() primitive is required.
-
-       c.      If the list macros are being used, the list_del_rcu()
-               primitive must be used to keep list_del()'s pointer
-               poisoning from inflicting toxic effects on concurrent
-               readers.  Similarly, if the hlist macros are being used,
-               the hlist_del_rcu() primitive is required.
-
-               The list_replace_rcu() and hlist_replace_rcu() primitives
-               may be used to replace an old structure with a new one
-               in their respective types of RCU-protected lists.
-
-       d.      Rules similar to (4b) and (4c) apply to the "hlist_nulls"
-               type of RCU-protected linked lists.
-
-       e.      Updates must ensure that initialization of a given
-               structure happens before pointers to that structure are
-               publicized.  Use the rcu_assign_pointer() primitive
-               when publicizing a pointer to a structure that can
-               be traversed by an RCU read-side critical section.
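
The ordering rules in (4a) through (4e) can be illustrated with a
userspace sketch built on C11 atomics.  It is an analogue under stated
assumptions, not the kernel's list implementation: struct node,
node_add_head(), node_del(), and node_next() are hypothetical names,
release stores stand in for rcu_assign_pointer(), and acquire loads stand
in for rcu_dereference().  Updaters are assumed to hold an update-side
lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical singly linked list element. */
struct node {
	int key;
	_Atomic(struct node *) next;
};

/* (4b)/(4e): initialize the node fully, and only then plant the pointer. */
static void node_add_head(_Atomic(struct node *) *head, struct node *n,
			  int key)
{
	assert(head != NULL && n != NULL);
	n->key = key;
	atomic_init(&n->next,
		    atomic_load_explicit(head, memory_order_relaxed));
	atomic_store_explicit(head, n, memory_order_release);
}

/*
 * (4c): unlink n but leave n->next untouched, so that a reader that is
 * already standing on n can still walk to the rest of the list.
 * Returns 0 if n was found and unlinked, -1 otherwise.
 */
static int node_del(_Atomic(struct node *) *head, struct node *n)
{
	_Atomic(struct node *) *pp = head;
	struct node *p;

	while ((p = atomic_load_explicit(pp, memory_order_relaxed)) != NULL) {
		if (p == n) {
			atomic_store_explicit(pp,
				atomic_load_explicit(&n->next,
						     memory_order_relaxed),
				memory_order_release);
			return 0;
		}
		pp = &p->next;
	}
	return -1;
}

/* (4a): reader step; fetch the pointer before the data it points to. */
static struct node *node_next(struct node *n)
{
	return atomic_load_explicit(&n->next, memory_order_acquire);
}
```

A plain list_del()-style removal would instead overwrite the removed
element's forward pointer with a poison value, sending any reader still
referencing that element off into the weeds.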
-
-5.     If call_rcu() or call_srcu() is used, the callback function will
-       be called from softirq context.  In particular, it cannot block.
-
-6.     Since synchronize_rcu() can block, it cannot be called
-       from any sort of irq context.  The same rule applies
-       for synchronize_srcu(), synchronize_rcu_expedited(), and
-       synchronize_srcu_expedited().
-
-       The expedited forms of these primitives have the same semantics
-       as the non-expedited forms, but expediting is both expensive and
-       (with the exception of synchronize_srcu_expedited()) unfriendly
-       to real-time workloads.  Use of the expedited primitives should
-       be restricted to rare configuration-change operations that would
-       not normally be undertaken while a real-time workload is running.
-       However, real-time workloads can use the rcupdate.rcu_normal
-       kernel boot parameter to completely disable expedited grace
-       periods, though this might have performance implications.
-
-       In particular, if you find yourself invoking one of the expedited
-       primitives repeatedly in a loop, please do everyone a favor:
-       Restructure your code so that it batches the updates, allowing
-       a single non-expedited primitive to cover the entire batch.
-       This will very likely be faster than the loop containing the
-       expedited primitive, and will be much easier on the rest of
-       the system, especially on any real-time workloads running
-       there.
-
-7.     As of v4.20, a given kernel implements only one RCU flavor,
-       which is RCU-sched for PREEMPT=n and RCU-preempt for PREEMPT=y.
-       If the updater uses call_rcu() or synchronize_rcu(),
-       then the corresponding readers may use rcu_read_lock() and
-       rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(),
-       or any pair of primitives that disables and re-enables preemption,
-       for example, rcu_read_lock_sched() and rcu_read_unlock_sched().
-       If the updater uses synchronize_srcu() or call_srcu(),
-       then the corresponding readers must use srcu_read_lock() and
-       srcu_read_unlock(), and with the same srcu_struct.  The rules for
-       the expedited primitives are the same as for their non-expedited
-       counterparts.  Mixing things up will result in confusion and
-       broken kernels, and has even resulted in an exploitable security
-       issue.
-
-       One exception to this rule: rcu_read_lock() and rcu_read_unlock()
-       may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
-       in cases where local bottom halves are already known to be
-       disabled, for example, in irq or softirq context.  Commenting
-       such cases is a must, of course!  And the jury is still out on
-       whether the increased speed is worth it.
-
-8.     Although synchronize_rcu() is slower than is call_rcu(), it
-       usually results in simpler code.  So, unless update performance is
-       critically important, the updaters cannot block, or the latency of
-       synchronize_rcu() is visible from userspace, synchronize_rcu()
-       should be used in preference to call_rcu().  Furthermore,
-       kfree_rcu() usually results in even simpler code than does
-       synchronize_rcu() without synchronize_rcu()'s multi-millisecond
-       latency.  So please take advantage of kfree_rcu()'s "fire and
-       forget" memory-freeing capabilities where it applies.
-
-       An especially important property of the synchronize_rcu()
-       primitive is that it automatically self-limits: if grace periods
-       are delayed for whatever reason, then the synchronize_rcu()
-       primitive will correspondingly delay updates.  In contrast,
-       code using call_rcu() should explicitly limit update rate in
-       cases where grace periods are delayed, as failing to do so can
-       result in excessive realtime latencies or even OOM conditions.
-
-       Ways of gaining this self-limiting property when using call_rcu()
-       include:
-
-       a.      Keeping a count of the number of data-structure elements
-               used by the RCU-protected data structure, including
-               those waiting for a grace period to elapse.  Enforce a
-               limit on this number, stalling updates as needed to allow
-               previously deferred frees to complete.  Alternatively,
-               limit only the number awaiting deferred free rather than
-               the total number of elements.
-
-               One way to stall the updates is to acquire the update-side
-               mutex.  (Don't try this with a spinlock -- other CPUs
-               spinning on the lock could prevent the grace period
-               from ever ending.)  Another way to stall the updates
-               is for the updates to use a wrapper function around
-               the memory allocator, so that this wrapper function
-               simulates OOM when there is too much memory awaiting an
-               RCU grace period.  There are of course many other
-               variations on this theme.
-
-       b.      Limiting update rate.  For example, if updates occur only
-               once per hour, then no explicit rate limiting is
-               required, unless your system is already badly broken.
-               Older versions of the dcache subsystem take this approach,
-               guarding updates with a global lock, limiting their rate.
-
-       c.      Trusted update -- if updates can only be done manually by
-               superuser or some other trusted user, then it might not
-               be necessary to automatically limit them.  The theory
-               here is that superuser already has lots of ways to crash
-               the machine.
-
-       d.      Periodically invoke synchronize_rcu(), permitting a limited
-               number of updates per grace period.
-
-       The same cautions apply to call_srcu() and kfree_rcu().
-
-       Note that although these primitives do take action to avoid memory
-       exhaustion when any given CPU has too many callbacks, a determined
-       user could still exhaust memory.  This is especially the case
-       if a system with a large number of CPUs has been configured to
-       offload all of its RCU callbacks onto a single CPU, or if the
-       system has relatively little free memory.
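
Approach (8a) can be sketched in userspace C as follows.  This is a
minimal illustration under stated assumptions rather than kernel code:
MAX_PENDING_FREES, pending_frees, limited_alloc(), defer_free_begin(),
and defer_free_done() are hypothetical names, and the point at which a
real updater would invoke call_rcu() is marked only by a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_PENDING_FREES 4	/* hypothetical limit, tune per workload */

/* Number of elements handed to a deferred free that has not yet run. */
static atomic_int pending_frees;

/*
 * Allocation wrapper: simulate OOM while too much memory is awaiting
 * a grace period, which stalls updaters until callbacks catch up.
 */
static void *limited_alloc(size_t size)
{
	if (atomic_load(&pending_frees) >= MAX_PENDING_FREES)
		return NULL;
	return malloc(size);
}

/* Updater side: call this where the real code would invoke call_rcu(). */
static void defer_free_begin(void)
{
	atomic_fetch_add(&pending_frees, 1);
}

/* Callback side: free the element and release its slot in the budget. */
static void defer_free_done(void *p)
{
	int prev = atomic_fetch_sub(&pending_frees, 1);

	assert(prev > 0);
	free(p);
}
```

An updater that gets NULL back from limited_alloc() handles it exactly
as it would handle a genuine allocation failure, so no separate
throttling path is needed.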
-
-9.     All RCU list-traversal primitives, which include
-       rcu_dereference(), list_for_each_entry_rcu(), and
-       list_for_each_safe_rcu(), must be either within an RCU read-side
-       critical section or must be protected by appropriate update-side
-       locks.  RCU read-side critical sections are delimited by
-       rcu_read_lock() and rcu_read_unlock(), or by similar primitives
-       such as rcu_read_lock_bh() and rcu_read_unlock_bh(), in which
-       case the matching rcu_dereference() primitive must be used in
-       order to keep lockdep happy, in this case, rcu_dereference_bh().
-
-       The reason that it is permissible to use RCU list-traversal
-       primitives when the update-side lock is held is that doing so
-       can be quite helpful in reducing code bloat when common code is
-       shared between readers and updaters.  Additional primitives
-       are provided for this case, as discussed in lockdep.txt.
-
-10.    Conversely, if you are in an RCU read-side critical section,
-       and you don't hold the appropriate update-side lock, you -must-
-       use the "_rcu()" variants of the list macros.  Failing to do so
-       will break Alpha, cause aggressive compilers to generate bad code,
-       and confuse people trying to read your code.
-
-11.    Any lock acquired by an RCU callback must be acquired elsewhere
-       with softirq disabled, e.g., via spin_lock_irqsave(),
-       spin_lock_bh(), etc.  Failing to disable softirq on a given
-       acquisition of that lock will result in deadlock as soon as
-       the RCU softirq handler happens to run your RCU callback while
-       interrupting that acquisition's critical section.
-
-12.    RCU callbacks can be and are executed in parallel.  In many cases,
-       the callback code is simply a wrapper around kfree(), so that this
-       is not an issue (or, more accurately, to the extent that it is
-       an issue, the memory-allocator locking handles it).  However,
-       if the callbacks do manipulate a shared data structure, they
-       must use whatever locking or other synchronization is required
-       to safely access and/or modify that data structure.
-
-       Do not assume that RCU callbacks will be executed on the same
-       CPU that executed the corresponding call_rcu() or call_srcu().
-       For example, if a given CPU goes offline while having an RCU
-       callback pending, then that RCU callback will execute on some
-       surviving CPU.  (If this were not the case, a self-spawning RCU
-       callback would prevent the victim CPU from ever going offline.)
-       Furthermore, CPUs designated by rcu_nocbs= might well -always-
-       have their RCU callbacks executed on some other CPUs.  In fact,
-       for some real-time workloads, this is the whole point of using
-       the rcu_nocbs= kernel boot parameter.
-
-13.    Unlike other forms of RCU, it -is- permissible to block in an
-       SRCU read-side critical section (demarked by srcu_read_lock()
-       and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
-       Please note that if you don't need to sleep in read-side critical
-       sections, you should be using RCU rather than SRCU, because RCU
-       is almost always faster and easier to use than is SRCU.
-
-       Also unlike other forms of RCU, explicit initialization and
-       cleanup are required either at build time via DEFINE_SRCU()
-       or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct()
-       and cleanup_srcu_struct().  These last two are passed a
-       "struct srcu_struct" that defines the scope of a given
-       SRCU domain.  Once initialized, the srcu_struct is passed
-       to srcu_read_lock(), srcu_read_unlock(), synchronize_srcu(),
-       synchronize_srcu_expedited(), and call_srcu().  A given
-       synchronize_srcu() waits only for SRCU read-side critical
-       sections governed by srcu_read_lock() and srcu_read_unlock()
-       calls that have been passed the same srcu_struct.  This property
-       is what makes sleeping read-side critical sections tolerable --
-       a given subsystem delays only its own updates, not those of other
-       subsystems using SRCU.  Therefore, SRCU is less prone to OOM the
-       system than RCU would be if RCU's read-side critical sections
-       were permitted to sleep.
-
-       The ability to sleep in read-side critical sections does not
-       come for free.  First, corresponding srcu_read_lock() and
-       srcu_read_unlock() calls must be passed the same srcu_struct.
-       Second, grace-period-detection overhead is amortized only
-       over those updates sharing a given srcu_struct, rather than
-       being globally amortized as it is for other forms of RCU.
-       Therefore, SRCU should be used in preference to rw_semaphore
-       only in extremely read-intensive situations, or in situations
-       requiring SRCU's read-side deadlock immunity or low read-side
-       realtime latency.  You should also consider percpu_rw_semaphore
-       when you need lightweight readers.
-
-       SRCU's expedited primitive (synchronize_srcu_expedited())
-       never sends IPIs to other CPUs, so it is easier on
-       real-time workloads than is synchronize_rcu_expedited().
-
-       Note that rcu_assign_pointer() relates to SRCU just as it does to
-       other forms of RCU, but instead of rcu_dereference() you should
-       use srcu_dereference() in order to avoid lockdep splats.
-
-14.    The whole point of call_rcu(), synchronize_rcu(), and friends
-       is to wait until all pre-existing readers have finished before
-       carrying out some otherwise-destructive operation.  It is
-       therefore critically important to -first- remove any path
-       that readers can follow that could be affected by the
-       destructive operation, and -only- -then- invoke call_rcu(),
-       synchronize_rcu(), or friends.
-
-       Because these primitives only wait for pre-existing readers, it
-       is the caller's responsibility to guarantee that any subsequent
-       readers will execute safely.
-
-15.    The various RCU read-side primitives do -not- necessarily contain
-       memory barriers.  You should therefore plan for the CPU
-       and the compiler to freely reorder code into and out of RCU
-       read-side critical sections.  It is the responsibility of the
-       RCU update-side primitives to deal with this.
-
-       For SRCU readers, you can use smp_mb__after_srcu_read_unlock()
-       immediately after an srcu_read_unlock() to get a full barrier.
-
-16.    Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
-       __rcu sparse checks to validate your RCU code.  These can help
-       find problems as follows:
-
-       CONFIG_PROVE_LOCKING: check that accesses to RCU-protected data
-               structures are carried out under the proper RCU
-               read-side critical section, while holding the right
-               combination of locks, or whatever other conditions
-               are appropriate.
-
-       CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
-               same object to call_rcu() (or friends) before an RCU
-               grace period has elapsed since the last time that you
-               passed that same object to call_rcu() (or friends).
-
-       __rcu sparse checks: tag the pointer to the RCU-protected data
-               structure with __rcu, and sparse will warn you if you
-               access that pointer without the services of one of the
-               variants of rcu_dereference().
-
-       These debugging aids can help you find problems that are
-       otherwise extremely difficult to spot.
-
-17.    If you register a callback using call_rcu() or call_srcu(), and
-       pass in a function defined within a loadable module, then it is
-       necessary to wait for all pending callbacks to be invoked after
-       the last invocation and before unloading that module.  Note that
-       it is absolutely -not- sufficient to wait for a grace period!
-       The current (say) synchronize_rcu() implementation is -not-
-       guaranteed to wait for callbacks registered on other CPUs.
-       Or even on the current CPU if that CPU recently went offline
-       and came back online.
-
-       You instead need to use one of the barrier functions:
-
-       o       call_rcu() -> rcu_barrier()
-       o       call_srcu() -> srcu_barrier()
-
-       However, these barrier functions are absolutely -not- guaranteed
-       to wait for a grace period.  In fact, if there are no call_rcu()
-       callbacks waiting anywhere in the system, rcu_barrier() is within
-       its rights to return immediately.
-
-       So if you need to wait for both an RCU grace period and for
-       all pre-existing call_rcu() callbacks, you will need to execute
-       both rcu_barrier() and synchronize_rcu(), if necessary, using
-       something like workqueues to execute them concurrently.
-
-       See rcubarrier.txt for more information.
diff --git a/Documentation/RCU/index.rst b/Documentation/RCU/index.rst

index 81a0a1e5f767e870c3daaeffbd67ca2bbc958071..c1ba4d130bb0f4fda7f91ce4fbcf7db927b26492 100644 (file)
--- a/Documentation/RCU/index.rst
+++ b/Documentation/RCU/index.rst
@@ -1,3 +1,5 @@
+.. SPDX-License-Identifier: GPL-2.0
+
  .. _rcu_concepts:
  
  ============
@@ -8,6 +10,7 @@ RCU concepts
     :maxdepth: 3
  
     arrayRCU
+   checklist
     rcubarrier
     rcu_dereference
     whatisRCU