--- /dev/null
+Locked Counters (aka ``QemuLockCnt``)
+=====================================
+
+QEMU often uses reference counts to track data structures that are being
+accessed and should not be freed. For example, a loop that invoke
+callbacks like this is not safe::
+
+ QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
+ if (ioh->revents & G_IO_OUT) {
+ ioh->fd_write(ioh->opaque);
+ }
+ }
+
+``QLIST_FOREACH_SAFE`` protects against deletion of the current node (``ioh``)
+by stashing away its ``next`` pointer. However, ``ioh->fd_write`` could
+actually delete the next node from the list. The simplest way to
+avoid this is to mark the node as deleted, and remove it from the
+list in the above loop::
+
+ QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
+ if (ioh->deleted) {
+ QLIST_REMOVE(ioh, next);
+ g_free(ioh);
+ } else {
+ if (ioh->revents & G_IO_OUT) {
+ ioh->fd_write(ioh->opaque);
+ }
+ }
+ }
+
+If however this loop must also be reentrant, i.e. it is possible that
+``ioh->fd_write`` invokes the loop again, some kind of counting is needed::
+
+ walking_handlers++;
+ QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
+ if (ioh->deleted) {
+ if (walking_handlers == 1) {
+ QLIST_REMOVE(ioh, next);
+ g_free(ioh);
+ }
+ } else {
+ if (ioh->revents & G_IO_OUT) {
+ ioh->fd_write(ioh->opaque);
+ }
+ }
+ }
+ walking_handlers--;
+
+One may think of using the RCU primitives, ``rcu_read_lock()`` and
+``rcu_read_unlock()``; effectively, the RCU nesting count would take
+the place of the walking_handlers global variable. Indeed,
+reference counting and RCU have similar purposes, but their usage in
+general is complementary:
+
+- reference counting is fine-grained and limited to a single data
+ structure; RCU delays reclamation of *all* RCU-protected data
+ structures;
+
+- reference counting works even in the presence of code that keeps
+ a reference for a long time; RCU critical sections in principle
+ should be kept short;
+
+- reference counting is often applied to code that is not thread-safe
+ but is reentrant; in fact, usage of reference counting in QEMU predates
+ the introduction of threads by many years. RCU is generally used to
+ protect readers from other threads freeing memory after concurrent
+ modifications to a data structure.
+
+- reclaiming data can be done by a separate thread in the case of RCU;
+ this can improve performance, but also delay reclamation undesirably.
+ With reference counting, reclamation is deterministic.
+
+This file documents ``QemuLockCnt``, an abstraction for using reference
+counting in code that has to be both thread-safe and reentrant.
+
+
+``QemuLockCnt`` concepts
+------------------------
+
+A ``QemuLockCnt`` comprises both a counter and a mutex; it has primitives
+to increment and decrement the counter, and to take and release the
+mutex. The counter notes how many visits to the data structures are
+taking place (the visits could be from different threads, or there could
+be multiple reentrant visits from the same thread). The basic rules
+governing the counter/mutex pair then are the following:
+
+- Data protected by the QemuLockCnt must not be freed unless the
+ counter is zero and the mutex is taken.
+
+- A new visit cannot be started while the counter is zero and the
+ mutex is taken.
+
+Most of the time, the mutex protects all writes to the data structure,
+not just frees, though there could be cases where this is not necessary.
+
+Reads, instead, can be done without taking the mutex, as long as the
+readers and writers use the same macros that are used for RCU, for
+example ``qatomic_rcu_read``, ``qatomic_rcu_set``, ``QLIST_FOREACH_RCU``,
+etc. This is because the reads are done outside a lock and a set
+or ``QLIST_INSERT_HEAD``
+can happen concurrently with the read. The RCU API ensures that the
+processor and the compiler see all required memory barriers.
+
+This could be implemented simply by protecting the counter with the
+mutex, for example::
+
+ // (1)
+ qemu_mutex_lock(&walking_handlers_mutex);
+ walking_handlers++;
+ qemu_mutex_unlock(&walking_handlers_mutex);
+
+ ...
+
+ // (2)
+ qemu_mutex_lock(&walking_handlers_mutex);
+ if (--walking_handlers == 0) {
+ QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
+ if (ioh->deleted) {
+ QLIST_REMOVE(ioh, next);
+ g_free(ioh);
+ }
+ }
+ }
+ qemu_mutex_unlock(&walking_handlers_mutex);
+
+Here, no frees can happen in the code represented by the ellipsis.
+If another thread is executing critical section (2), that part of
+the code cannot be entered, because the thread will not be able
+to increment the ``walking_handlers`` variable. And of course
+during the visit any other thread will see a nonzero value for
+``walking_handlers``, as in the single-threaded code.
+
+Note that it is possible for multiple concurrent accesses to delay
+the cleanup arbitrarily; in other words, for the ``walking_handlers``
+counter to never become zero. For this reason, this technique is
+more easily applicable if concurrent access to the structure is rare.
+
+However, critical sections are easy to forget since you have to do
+them for each modification of the counter. ``QemuLockCnt`` ensures that
+all modifications of the counter take the lock appropriately, and it
+can also be more efficient in two ways:
+
+- it avoids taking the lock for many operations (for example
+ incrementing the counter while it is non-zero);
+
+- on some platforms, one can implement ``QemuLockCnt`` to hold the lock
+ and the mutex in a single word, making the fast path no more expensive
+ than simply managing a counter using atomic operations (see
+ :doc:`atomics`). This can be very helpful if concurrent access to
+ the data structure is expected to be rare.
+
+
+Using the same mutex for frees and writes can still incur some small
+inefficiencies; for example, a visit can never start if the counter is
+zero and the mutex is taken -- even if the mutex is taken by a write,
+which in principle need not block a visit of the data structure.
+However, these are usually not a problem if any of the following
+assumptions are valid:
+
+- concurrent access is possible but rare
+
+- writes are rare
+
+- writes are frequent, but this kind of write (e.g. appending to a
+ list) has a very small critical section.
+
+For example, QEMU uses ``QemuLockCnt`` to manage an ``AioContext``'s list of
+bottom halves and file descriptor handlers. Modifications to the list
+of file descriptor handlers are rare. Creation of a new bottom half is
+frequent and can happen on a fast path; however: 1) it is almost never
+concurrent with a visit to the list of bottom halves; 2) it only has
+three instructions in the critical path, two assignments and a ``smp_wmb()``.
+
+
+``QemuLockCnt`` API
+-------------------
+
+The ``QemuLockCnt`` API is described in ``include/qemu/thread.h``.
+
+
+``QemuLockCnt`` usage
+---------------------
+
+This section explains the typical usage patterns for ``QemuLockCnt`` functions.
+
+Setting a variable to a non-NULL value can be done between
+``qemu_lockcnt_lock`` and ``qemu_lockcnt_unlock``::
+
+ qemu_lockcnt_lock(&xyz_lockcnt);
+ if (!xyz) {
+ new_xyz = g_new(XYZ, 1);
+ ...
+ qatomic_rcu_set(&xyz, new_xyz);
+ }
+ qemu_lockcnt_unlock(&xyz_lockcnt);
+
+Accessing the value can be done between ``qemu_lockcnt_inc`` and
+``qemu_lockcnt_dec``::
+
+ qemu_lockcnt_inc(&xyz_lockcnt);
+ if (xyz) {
+ XYZ *p = qatomic_rcu_read(&xyz);
+ ...
+ /* Accesses can now be done through "p". */
+ }
+ qemu_lockcnt_dec(&xyz_lockcnt);
+
+Freeing the object can similarly use ``qemu_lockcnt_lock`` and
+``qemu_lockcnt_unlock``, but you also need to ensure that the count
+is zero (i.e. there is no concurrent visit). Because ``qemu_lockcnt_inc``
+takes the ``QemuLockCnt``'s lock, the count cannot become non-zero while
+the object is being freed. Freeing an object looks like this::
+
+ qemu_lockcnt_lock(&xyz_lockcnt);
+ if (!qemu_lockcnt_count(&xyz_lockcnt)) {
+ g_free(xyz);
+ xyz = NULL;
+ }
+ qemu_lockcnt_unlock(&xyz_lockcnt);
+
+If an object has to be freed right after a visit, you can combine
+the decrement, the locking and the check on count as follows::
+
+ qemu_lockcnt_inc(&xyz_lockcnt);
+ if (xyz) {
+ XYZ *p = qatomic_rcu_read(&xyz);
+ ...
+ /* Accesses can now be done through "p". */
+ }
+ if (qemu_lockcnt_dec_and_lock(&xyz_lockcnt)) {
+ g_free(xyz);
+ xyz = NULL;
+ qemu_lockcnt_unlock(&xyz_lockcnt);
+ }
+
+``QemuLockCnt`` can also be used to access a list as follows::
+
+ qemu_lockcnt_inc(&io_handlers_lockcnt);
+ QLIST_FOREACH_RCU(ioh, &io_handlers, pioh) {
+ if (ioh->revents & G_IO_OUT) {
+ ioh->fd_write(ioh->opaque);
+ }
+ }
+
+ if (qemu_lockcnt_dec_and_lock(&io_handlers_lockcnt)) {
+ QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
+ if (ioh->deleted) {
+ QLIST_REMOVE(ioh, next);
+ g_free(ioh);
+ }
+ }
+ qemu_lockcnt_unlock(&io_handlers_lockcnt);
+ }
+
+Again, the RCU primitives are used because new items can be added to the
+list during the walk. ``QLIST_FOREACH_RCU`` ensures that the processor and
+the compiler see the appropriate memory barriers.
+
+An alternative pattern uses ``qemu_lockcnt_dec_if_lock``::
+
+ qemu_lockcnt_inc(&io_handlers_lockcnt);
+ QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
+ if (ioh->deleted) {
+ if (qemu_lockcnt_dec_if_lock(&io_handlers_lockcnt)) {
+ QLIST_REMOVE(ioh, next);
+ g_free(ioh);
+ qemu_lockcnt_inc_and_unlock(&io_handlers_lockcnt);
+ }
+ } else {
+ if (ioh->revents & G_IO_OUT) {
+ ioh->fd_write(ioh->opaque);
+ }
+ }
+ }
+ qemu_lockcnt_dec(&io_handlers_lockcnt);
+
+Here you can use ``qemu_lockcnt_dec`` instead of ``qemu_lockcnt_dec_and_lock``,
+because there is no special task to do if the count goes from 1 to 0.
+++ /dev/null
-DOCUMENTATION FOR LOCKED COUNTERS (aka QemuLockCnt)
-===================================================
-
-QEMU often uses reference counts to track data structures that are being
-accessed and should not be freed. For example, a loop that invoke
-callbacks like this is not safe:
-
- QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
- if (ioh->revents & G_IO_OUT) {
- ioh->fd_write(ioh->opaque);
- }
- }
-
-QLIST_FOREACH_SAFE protects against deletion of the current node (ioh)
-by stashing away its "next" pointer. However, ioh->fd_write could
-actually delete the next node from the list. The simplest way to
-avoid this is to mark the node as deleted, and remove it from the
-list in the above loop:
-
- QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
- if (ioh->deleted) {
- QLIST_REMOVE(ioh, next);
- g_free(ioh);
- } else {
- if (ioh->revents & G_IO_OUT) {
- ioh->fd_write(ioh->opaque);
- }
- }
- }
-
-If however this loop must also be reentrant, i.e. it is possible that
-ioh->fd_write invokes the loop again, some kind of counting is needed:
-
- walking_handlers++;
- QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
- if (ioh->deleted) {
- if (walking_handlers == 1) {
- QLIST_REMOVE(ioh, next);
- g_free(ioh);
- }
- } else {
- if (ioh->revents & G_IO_OUT) {
- ioh->fd_write(ioh->opaque);
- }
- }
- }
- walking_handlers--;
-
-One may think of using the RCU primitives, rcu_read_lock() and
-rcu_read_unlock(); effectively, the RCU nesting count would take
-the place of the walking_handlers global variable. Indeed,
-reference counting and RCU have similar purposes, but their usage in
-general is complementary:
-
-- reference counting is fine-grained and limited to a single data
- structure; RCU delays reclamation of *all* RCU-protected data
- structures;
-
-- reference counting works even in the presence of code that keeps
- a reference for a long time; RCU critical sections in principle
- should be kept short;
-
-- reference counting is often applied to code that is not thread-safe
- but is reentrant; in fact, usage of reference counting in QEMU predates
- the introduction of threads by many years. RCU is generally used to
- protect readers from other threads freeing memory after concurrent
- modifications to a data structure.
-
-- reclaiming data can be done by a separate thread in the case of RCU;
- this can improve performance, but also delay reclamation undesirably.
- With reference counting, reclamation is deterministic.
-
-This file documents QemuLockCnt, an abstraction for using reference
-counting in code that has to be both thread-safe and reentrant.
-
-
-QemuLockCnt concepts
---------------------
-
-A QemuLockCnt comprises both a counter and a mutex; it has primitives
-to increment and decrement the counter, and to take and release the
-mutex. The counter notes how many visits to the data structures are
-taking place (the visits could be from different threads, or there could
-be multiple reentrant visits from the same thread). The basic rules
-governing the counter/mutex pair then are the following:
-
-- Data protected by the QemuLockCnt must not be freed unless the
- counter is zero and the mutex is taken.
-
-- A new visit cannot be started while the counter is zero and the
- mutex is taken.
-
-Most of the time, the mutex protects all writes to the data structure,
-not just frees, though there could be cases where this is not necessary.
-
-Reads, instead, can be done without taking the mutex, as long as the
-readers and writers use the same macros that are used for RCU, for
-example qatomic_rcu_read, qatomic_rcu_set, QLIST_FOREACH_RCU, etc. This is
-because the reads are done outside a lock and a set or QLIST_INSERT_HEAD
-can happen concurrently with the read. The RCU API ensures that the
-processor and the compiler see all required memory barriers.
-
-This could be implemented simply by protecting the counter with the
-mutex, for example:
-
- // (1)
- qemu_mutex_lock(&walking_handlers_mutex);
- walking_handlers++;
- qemu_mutex_unlock(&walking_handlers_mutex);
-
- ...
-
- // (2)
- qemu_mutex_lock(&walking_handlers_mutex);
- if (--walking_handlers == 0) {
- QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
- if (ioh->deleted) {
- QLIST_REMOVE(ioh, next);
- g_free(ioh);
- }
- }
- }
- qemu_mutex_unlock(&walking_handlers_mutex);
-
-Here, no frees can happen in the code represented by the ellipsis.
-If another thread is executing critical section (2), that part of
-the code cannot be entered, because the thread will not be able
-to increment the walking_handlers variable. And of course
-during the visit any other thread will see a nonzero value for
-walking_handlers, as in the single-threaded code.
-
-Note that it is possible for multiple concurrent accesses to delay
-the cleanup arbitrarily; in other words, for the walking_handlers
-counter to never become zero. For this reason, this technique is
-more easily applicable if concurrent access to the structure is rare.
-
-However, critical sections are easy to forget since you have to do
-them for each modification of the counter. QemuLockCnt ensures that
-all modifications of the counter take the lock appropriately, and it
-can also be more efficient in two ways:
-
-- it avoids taking the lock for many operations (for example
- incrementing the counter while it is non-zero);
-
-- on some platforms, one can implement QemuLockCnt to hold the lock
- and the mutex in a single word, making the fast path no more expensive
- than simply managing a counter using atomic operations (see
- docs/devel/atomics.rst). This can be very helpful if concurrent access to
- the data structure is expected to be rare.
-
-
-Using the same mutex for frees and writes can still incur some small
-inefficiencies; for example, a visit can never start if the counter is
-zero and the mutex is taken---even if the mutex is taken by a write,
-which in principle need not block a visit of the data structure.
-However, these are usually not a problem if any of the following
-assumptions are valid:
-
-- concurrent access is possible but rare
-
-- writes are rare
-
-- writes are frequent, but this kind of write (e.g. appending to a
- list) has a very small critical section.
-
-For example, QEMU uses QemuLockCnt to manage an AioContext's list of
-bottom halves and file descriptor handlers. Modifications to the list
-of file descriptor handlers are rare. Creation of a new bottom half is
-frequent and can happen on a fast path; however: 1) it is almost never
-concurrent with a visit to the list of bottom halves; 2) it only has
-three instructions in the critical path, two assignments and a smp_wmb().
-
-
-QemuLockCnt API
----------------
-
-The QemuLockCnt API is described in include/qemu/thread.h.
-
-
-QemuLockCnt usage
------------------
-
-This section explains the typical usage patterns for QemuLockCnt functions.
-
-Setting a variable to a non-NULL value can be done between
-qemu_lockcnt_lock and qemu_lockcnt_unlock:
-
- qemu_lockcnt_lock(&xyz_lockcnt);
- if (!xyz) {
- new_xyz = g_new(XYZ, 1);
- ...
- qatomic_rcu_set(&xyz, new_xyz);
- }
- qemu_lockcnt_unlock(&xyz_lockcnt);
-
-Accessing the value can be done between qemu_lockcnt_inc and
-qemu_lockcnt_dec:
-
- qemu_lockcnt_inc(&xyz_lockcnt);
- if (xyz) {
- XYZ *p = qatomic_rcu_read(&xyz);
- ...
- /* Accesses can now be done through "p". */
- }
- qemu_lockcnt_dec(&xyz_lockcnt);
-
-Freeing the object can similarly use qemu_lockcnt_lock and
-qemu_lockcnt_unlock, but you also need to ensure that the count
-is zero (i.e. there is no concurrent visit). Because qemu_lockcnt_inc
-takes the QemuLockCnt's lock, the count cannot become non-zero while
-the object is being freed. Freeing an object looks like this:
-
- qemu_lockcnt_lock(&xyz_lockcnt);
- if (!qemu_lockcnt_count(&xyz_lockcnt)) {
- g_free(xyz);
- xyz = NULL;
- }
- qemu_lockcnt_unlock(&xyz_lockcnt);
-
-If an object has to be freed right after a visit, you can combine
-the decrement, the locking and the check on count as follows:
-
- qemu_lockcnt_inc(&xyz_lockcnt);
- if (xyz) {
- XYZ *p = qatomic_rcu_read(&xyz);
- ...
- /* Accesses can now be done through "p". */
- }
- if (qemu_lockcnt_dec_and_lock(&xyz_lockcnt)) {
- g_free(xyz);
- xyz = NULL;
- qemu_lockcnt_unlock(&xyz_lockcnt);
- }
-
-QemuLockCnt can also be used to access a list as follows:
-
- qemu_lockcnt_inc(&io_handlers_lockcnt);
- QLIST_FOREACH_RCU(ioh, &io_handlers, pioh) {
- if (ioh->revents & G_IO_OUT) {
- ioh->fd_write(ioh->opaque);
- }
- }
-
- if (qemu_lockcnt_dec_and_lock(&io_handlers_lockcnt)) {
- QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
- if (ioh->deleted) {
- QLIST_REMOVE(ioh, next);
- g_free(ioh);
- }
- }
- qemu_lockcnt_unlock(&io_handlers_lockcnt);
- }
-
-Again, the RCU primitives are used because new items can be added to the
-list during the walk. QLIST_FOREACH_RCU ensures that the processor and
-the compiler see the appropriate memory barriers.
-
-An alternative pattern uses qemu_lockcnt_dec_if_lock:
-
- qemu_lockcnt_inc(&io_handlers_lockcnt);
- QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
- if (ioh->deleted) {
- if (qemu_lockcnt_dec_if_lock(&io_handlers_lockcnt)) {
- QLIST_REMOVE(ioh, next);
- g_free(ioh);
- qemu_lockcnt_inc_and_unlock(&io_handlers_lockcnt);
- }
- } else {
- if (ioh->revents & G_IO_OUT) {
- ioh->fd_write(ioh->opaque);
- }
- }
- }
- qemu_lockcnt_dec(&io_handlers_lockcnt);
-
-Here you can use qemu_lockcnt_dec instead of qemu_lockcnt_dec_and_lock,
-because there is no special task to do if the count goes from 1 to 0.