blk-cgroup: fix list corruption from reorder of WRITE ->lqueued
author Ming Lei <ming.lei@redhat.com>
Wed, 15 May 2024 01:31:57 +0000 (09:31 +0800)
committer Jens Axboe <axboe@kernel.dk>
Thu, 16 May 2024 02:14:20 +0000 (20:14 -0600)
__blkcg_rstat_flush() can run at any time, including while
blk_cgroup_bio_start() is executing.

If the WRITE of `->lqueued` is reordered with the READ of
`bisc->lnode.next` in the loop of __blkcg_rstat_flush(), `next_bisc` can
be assigned a stat instance that is being added in
blk_cgroup_bio_start(), and the local list in __blkcg_rstat_flush() can
then be corrupted.

Fix the issue by adding a full memory barrier (smp_mb()) before clearing
`->lqueued`.
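
The required ordering can be illustrated with a small userspace sketch in
C11 atomics. This is not the kernel code: `stat_node`, `stat_list`,
`stat_queue()` and `stat_flush()` are hypothetical names, the seq_cst
fence stands in for smp_mb(), the seq_cst compare-and-swap stands in for
the ordering implied by llist_add(), and the acquire load of `lqueued` is
an assumption made so that the sketch is well defined under the C11
memory model.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU stat instance (not the kernel struct). */
struct stat_node {
	struct stat_node *next;  /* plays the role of bisc->lnode.next */
	atomic_bool lqueued;     /* true while the node is considered queued */
};

/* Hypothetical stand-in for the per-CPU lockless list head. */
struct stat_list {
	_Atomic(struct stat_node *) first;
};

/* Update side, loosely modelled on blk_cgroup_bio_start(). */
static void stat_queue(struct stat_list *list, struct stat_node *node)
{
	/* Assumption: acquire pairs with the flusher's fence + store. */
	if (atomic_load_explicit(&node->lqueued, memory_order_acquire))
		return; /* already queued, nothing to add */

	/* Lock-free push; overwrites node->next, which is why ordering matters. */
	struct stat_node *old =
		atomic_load_explicit(&list->first, memory_order_relaxed);
	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak_explicit(
			&list->first, &old, node,
			memory_order_seq_cst, memory_order_relaxed));

	atomic_store_explicit(&node->lqueued, true, memory_order_relaxed);
}

/* Flush side, loosely modelled on the loop in __blkcg_rstat_flush(). */
static int stat_flush(struct stat_list *list)
{
	int flushed = 0;

	/* Detach the whole list in one step. */
	struct stat_node *node = atomic_exchange_explicit(
			&list->first, NULL, memory_order_seq_cst);

	while (node) {
		/* Read the successor before the node can be queued again. */
		struct stat_node *next = node->next;

		/*
		 * Full fence: the read of node->next above must not be
		 * reordered after the store that clears lqueued below.
		 */
		atomic_thread_fence(memory_order_seq_cst);
		atomic_store_explicit(&node->lqueued, false,
				      memory_order_relaxed);

		flushed++;
		node = next;
	}
	return flushed;
}
```

Without the fence, the store that clears `lqueued` could become visible
before `node->next` is read, so a concurrent stat_queue() could push the
node again and the flusher would follow a pointer into the new list.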

Cc: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240515013157.443672-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
block/blk-cgroup.c

index 8699f193cf315dbe668794a2aba0c77a6696a675..52367a4501d0339708282fe5c72681665adf0c68 100644
@@ -1035,6 +1035,16 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
                struct blkg_iostat cur;
                unsigned int seq;
 
+               /*
+                * Order assignment of `next_bisc` from `bisc->lnode.next` in
+                * llist_for_each_entry_safe and clearing `bisc->lqueued` for
+                * avoiding to assign `next_bisc` with new next pointer added
+                * in blk_cgroup_bio_start() in case of re-ordering.
+                *
+                * The pair barrier is implied in llist_add() in blk_cgroup_bio_start().
+                */
+               smp_mb();
+
                WRITE_ONCE(bisc->lqueued, false);
 
                /* fetch the current per-cpu values */