xfs: ratelimit unmount time per-buffer I/O error alert
author Brian Foster <bfoster@redhat.com>
Wed, 6 May 2020 20:25:21 +0000 (13:25 -0700)
committer Darrick J. Wong <darrick.wong@oracle.com>
Thu, 7 May 2020 15:27:46 +0000 (08:27 -0700)
At unmount time, XFS emits an alert for every in-core buffer that
might have undergone a write error. In practice this behavior is
probably reasonable given that the filesystem is likely short-lived
once I/O errors begin to occur consistently. Under certain test or
otherwise expected error conditions, however, this can spam the logs
and slow down the unmount.

Now that we have a ratelimit mechanism specifically for buffer
alerts, reuse it for the per-buffer alerts in xfs_wait_buftarg().
Also lift the final repair message out of the loop so it is printed
once, regardless of ratelimiting, whenever a failed buffer was found,
and assert that the metadata error handling code has shut down the fs.
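
The pattern described above can be sketched in userspace C. This is a minimal illustration, not the kernel code: `struct alert_ratelimit`, `alert_allowed()`, `struct demo_buf` and `drain_buffers()` are hypothetical names, and the limiter is a simple count-based burst limit, whereas the kernel's ratelimit state is also interval-based.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, count-only stand-in for a ratelimit state: allow at most
 * `burst` messages, then suppress and count the rest. */
struct alert_ratelimit {
	int burst;
	int printed;
	int suppressed;
};

static bool alert_allowed(struct alert_ratelimit *rl)
{
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->suppressed++;
	return false;
}

/* Hypothetical stand-in for an in-core buffer. */
struct demo_buf {
	unsigned long long daddr;
	bool write_fail;
};

/*
 * Walk the buffers, emit a ratelimited alert for each failed one, and
 * remember that a failure was seen. The repair hint is printed once,
 * after the loop, so ratelimiting can never suppress it.
 * Returns true if any buffer had a write failure.
 */
static bool drain_buffers(const struct demo_buf *bufs, size_t n,
			  struct alert_ratelimit *rl, bool fs_shutdown)
{
	bool write_fail = false;

	for (size_t i = 0; i < n; i++) {
		if (!bufs[i].write_fail)
			continue;
		write_fail = true;
		if (alert_allowed(rl))
			fprintf(stderr,
				"Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!\n",
				bufs[i].daddr);
	}

	if (write_fail) {
		/* Failed buffers imply the fs was already shut down. */
		assert(fs_shutdown);
		fprintf(stderr,
			"Please run xfs_repair to determine the extent of the problem.\n");
	}
	return write_fail;
}
```

Called with a burst of 2 over four failed buffers, the sketch prints two per-buffer alerts, suppresses two, and still prints the repair hint exactly once.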

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
fs/xfs/xfs_buf.c

index 594d5e1df6f869608328f91e4802adaf36c1d38f..3918270f4eab7b6e39203847a7fc5f0ac442c3f4 100644
@@ -1657,7 +1657,8 @@ xfs_wait_buftarg(
        struct xfs_buftarg      *btp)
 {
        LIST_HEAD(dispose);
-       int loop = 0;
+       int                     loop = 0;
+       bool                    write_fail = false;
 
        /*
         * First wait on the buftarg I/O count for all in-flight buffers to be
@@ -1685,17 +1686,29 @@ xfs_wait_buftarg(
                        bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
                        list_del_init(&bp->b_lru);
                        if (bp->b_flags & XBF_WRITE_FAIL) {
-                               xfs_alert(btp->bt_mount,
+                               write_fail = true;
+                               xfs_buf_alert_ratelimited(bp,
+                                       "XFS: Corruption Alert",
 "Corruption Alert: Buffer at daddr 0x%llx had permanent write failures!",
                                        (long long)bp->b_bn);
-                               xfs_alert(btp->bt_mount,
-"Please run xfs_repair to determine the extent of the problem.");
                        }
                        xfs_buf_rele(bp);
                }
                if (loop++ != 0)
                        delay(100);
        }
+
+       /*
+        * If one or more failed buffers were freed, that means dirty metadata
+        * was thrown away. This should only ever happen after I/O completion
+        * handling has elevated I/O error(s) to permanent failures and shuts
+        * down the fs.
+        */
+       if (write_fail) {
+               ASSERT(XFS_FORCED_SHUTDOWN(btp->bt_mount));
+               xfs_alert(btp->bt_mount,
+             "Please run xfs_repair to determine the extent of the problem.");
+       }
 }
 
 static enum lru_status