linux.git
13 months agoxfs: consolidate btree block verification
Christoph Hellwig [Thu, 22 Feb 2024 20:40:57 +0000 (12:40 -0800)]
xfs: consolidate btree block verification

Add a __xfs_btree_check_block helper that can be called by the scrub code
to validate a btree block of any form, and move the duplicate error
handling code from xfs_btree_check_sblock and xfs_btree_check_lblock into
xfs_btree_check_block and thus remove these two helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: tighten up validation of root block in inode forks
Christoph Hellwig [Thu, 22 Feb 2024 20:40:57 +0000 (12:40 -0800)]
xfs: tighten up validation of root block in inode forks

Check that root blocks that sit in the inode fork and thus have a NULL
bp don't have siblings.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: remove the crc variable in __xfs_btree_check_lblock
Christoph Hellwig [Thu, 22 Feb 2024 20:40:56 +0000 (12:40 -0800)]
xfs: remove the crc variable in __xfs_btree_check_lblock

crc is only used once, just use the xfs_has_crc check directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: misc cleanups for __xfs_btree_check_sblock
Christoph Hellwig [Thu, 22 Feb 2024 20:40:55 +0000 (12:40 -0800)]
xfs: misc cleanups for __xfs_btree_check_sblock

Remove the local crc variable that is only used once and remove the bp
NULL checking as it can't ever be NULL for short form blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: consolidate btree ptr checking
Christoph Hellwig [Thu, 22 Feb 2024 20:40:54 +0000 (12:40 -0800)]
xfs: consolidate btree ptr checking

Merge xfs_btree_check_sptr and xfs_btree_check_lptr into a single
__xfs_btree_check_ptr that can be shared between xfs_btree_check_ptr
and the scrub code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: open code xfs_btree_check_lptr in xfs_bmap_btree_to_extents
Christoph Hellwig [Thu, 22 Feb 2024 20:40:53 +0000 (12:40 -0800)]
xfs: open code xfs_btree_check_lptr in xfs_bmap_btree_to_extents

xfs_bmap_btree_to_extents always passes a level of 1 to
xfs_btree_check_lptr, thus making the level check redundant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: simplify xfs_btree_check_lblock_siblings
Christoph Hellwig [Thu, 22 Feb 2024 20:40:53 +0000 (12:40 -0800)]
xfs: simplify xfs_btree_check_lblock_siblings

Stop using xfs_btree_check_lptr in xfs_btree_check_lblock_siblings,
as it only duplicates the xfs_verify_fsbno call in the other leg of
if / else besides adding a tautological level check.

With this the cur and level arguments can be removed as they are
now unused.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: simplify xfs_btree_check_sblock_siblings
Christoph Hellwig [Thu, 22 Feb 2024 20:40:52 +0000 (12:40 -0800)]
xfs: simplify xfs_btree_check_sblock_siblings

Stop using xfs_btree_check_sptr in xfs_btree_check_sblock_siblings,
as it only duplicates the xfs_verify_agbno call in the other leg of
if / else besides adding a tautological level check.

With this the cur and level arguments can be removed as they are
now unused.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: remove xfs_btnum_t
Christoph Hellwig [Thu, 22 Feb 2024 20:40:51 +0000 (12:40 -0800)]
xfs: remove xfs_btnum_t

The last checks for bc_btnum can be replaced with helpers that check
the btree ops.  This allows adding new btrees to XFS without having
to update a global enum.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: complete the ops predicates]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
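
As a rough illustration of the idea in the entry above (this is a model, not the kernel code), a btree can be identified by comparing the cursor's ops reference rather than by consulting a global enum. Every name in this Python sketch (`BtreeOps`, `Cursor`, `INOBT_OPS`, `is_inobt`, and so on) is a hypothetical stand-in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BtreeOps:
    """Hypothetical stand-in for a per-btree operations table."""
    name: str


@dataclass
class Cursor:
    """Hypothetical stand-in for a btree cursor that carries its ops."""
    ops: BtreeOps


# One ops object per btree type; there is no central enum to extend.
INOBT_OPS = BtreeOps("ino")
FINOBT_OPS = BtreeOps("fino")


def is_inobt(cur: Cursor) -> bool:
    # An identity check against the ops object replaces a btnum comparison.
    return cur.ops is INOBT_OPS


def is_finobt(cur: Cursor) -> bool:
    return cur.ops is FINOBT_OPS
```

Under this scheme, adding a new btree type only means defining another ops object and, if needed, another predicate.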
13 months agoxfs: pass a 'bool is_finobt' to xfs_inobt_insert
Christoph Hellwig [Thu, 22 Feb 2024 20:40:50 +0000 (12:40 -0800)]
xfs: pass a 'bool is_finobt' to xfs_inobt_insert

This is one of the last users of xfs_btnum_t and can only designate
either the inobt or finobt.  Replace it with a simple bool.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: split xfs_inobt_init_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:40:49 +0000 (12:40 -0800)]
xfs: split xfs_inobt_init_cursor

Split xfs_inobt_init_cursor into separate routines for the inobt and
finobt to prepare for the removal of the xfs_btnum global enumeration
of btree types.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: split xfs_inobt_insert_sprec
Christoph Hellwig [Thu, 22 Feb 2024 20:40:48 +0000 (12:40 -0800)]
xfs: split xfs_inobt_insert_sprec

Split the finobt version that never merges and uses a different cursor
out of xfs_inobt_insert_sprec to prepare for removing xfs_btnum_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: remove the which variable in xchk_iallocbt
Christoph Hellwig [Thu, 22 Feb 2024 20:40:48 +0000 (12:40 -0800)]
xfs: remove the which variable in xchk_iallocbt

The which variable that holds a btree number is passed to two functions
that ignore it and used in a single check that can check the sm_type
as well.  Remove it to unclutter the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: remove the btnum argument to xfs_inobt_count_blocks
Christoph Hellwig [Thu, 22 Feb 2024 20:40:47 +0000 (12:40 -0800)]
xfs: remove the btnum argument to xfs_inobt_count_blocks

xfs_inobt_count_blocks is only used for the finobt.  Hardcode the btnum
argument and rename the function to match that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: remove xfs_inobt_cur
Christoph Hellwig [Thu, 22 Feb 2024 20:40:46 +0000 (12:40 -0800)]
xfs: remove xfs_inobt_cur

This helper provides no real advantage over just open coding its two
calls in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: split xfs_allocbt_init_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:40:12 +0000 (12:40 -0800)]
xfs: split xfs_allocbt_init_cursor

Split xfs_allocbt_init_cursor into separate routines for the by-bno
and by-cnt btrees to prepare for the removal of the xfs_btnum global
enumeration of btree types.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: refactor the btree cursor allocation logic in xchk_ag_btcur_init
Christoph Hellwig [Thu, 22 Feb 2024 20:39:48 +0000 (12:39 -0800)]
xfs: refactor the btree cursor allocation logic in xchk_ag_btcur_init

Change xchk_ag_btcur_init to allocate all cursors first and only then
check if we should delete them again because the btree is too damaged.

This allows reusing the sick_mask in struct xfs_btree_ops and simplifies
the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: add a sick_mask to struct xfs_btree_ops
Christoph Hellwig [Thu, 22 Feb 2024 20:39:47 +0000 (12:39 -0800)]
xfs: add a sick_mask to struct xfs_btree_ops

Clean up xfs_btree_mark_sick by adding a sick_mask to the btree-ops
for all AG-root btrees.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: add a name field to struct xfs_btree_ops
Christoph Hellwig [Thu, 22 Feb 2024 20:39:47 +0000 (12:39 -0800)]
xfs: add a name field to struct xfs_btree_ops

The btnum in struct xfs_btree_ops is often used for printing a symbolic
name for the btree.  Add a name field to the ops structure and use that
directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
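
The name and sick_mask fields described in the two entries above can be pictured with a small Python model. This is only a sketch: the flag names and bit values, the `AgHealth` class, and the `mark_sick` and `describe` helpers are all assumptions made for illustration, not the kernel definitions.

```python
from dataclasses import dataclass

# Hypothetical health bits; the real flag names and values are not in this log.
SICK_BNOBT = 1 << 0
SICK_CNTBT = 1 << 1


@dataclass(frozen=True)
class BtreeOps:
    name: str        # symbolic name used when printing messages
    sick_mask: int   # health bit to set when this btree is found corrupt


@dataclass
class AgHealth:
    sick: int = 0


BNOBT_OPS = BtreeOps(name="bno", sick_mask=SICK_BNOBT)
CNTBT_OPS = BtreeOps(name="cnt", sick_mask=SICK_CNTBT)


def mark_sick(health: AgHealth, ops: BtreeOps) -> None:
    # No per-btree switch statement: the ops structure carries the mask.
    health.sick |= ops.sick_mask


def describe(ops: BtreeOps) -> str:
    # The name comes straight from the ops structure, not a btnum lookup table.
    return f"{ops.name}bt"
```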
13 months agoxfs: split the agf_roots and agf_levels arrays
Christoph Hellwig [Thu, 22 Feb 2024 20:39:46 +0000 (12:39 -0800)]
xfs: split the agf_roots and agf_levels arrays

Using arrays of largely unrelated fields that use the btree number
as index is not very robust.  Split the arrays into three separate
fields instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
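
To make the robustness argument above concrete, here is a hedged Python sketch contrasting index-based arrays with separate named fields. The class and field names (`AgfArrays`, `AgfFields`, `bno_root`, and so on) are hypothetical and only mirror the description in the commit message.

```python
from dataclasses import dataclass

# Old shape: the btree number doubles as an array index.
BNO, CNT, RMAP = 0, 1, 2


@dataclass
class AgfArrays:
    roots: list
    levels: list


# New shape: one named field per btree, so no index can be mixed up.
@dataclass
class AgfFields:
    bno_root: int
    cnt_root: int
    rmap_root: int
    bno_level: int
    cnt_level: int
    rmap_level: int


def split(agf: AgfArrays) -> AgfFields:
    """Convert the array layout into the named-field layout."""
    return AgfFields(
        bno_root=agf.roots[BNO],
        cnt_root=agf.roots[CNT],
        rmap_root=agf.roots[RMAP],
        bno_level=agf.levels[BNO],
        cnt_level=agf.levels[CNT],
        rmap_level=agf.levels[RMAP],
    )
```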
13 months agoxfs: remove xfs_bmbt_stage_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:39:45 +0000 (12:39 -0800)]
xfs: remove xfs_bmbt_stage_cursor

Just open code the two calls in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: fold xfs_bmbt_init_common into xfs_bmbt_init_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:39:44 +0000 (12:39 -0800)]
xfs: fold xfs_bmbt_init_common into xfs_bmbt_init_cursor

Make the levels initialization in xfs_bmbt_init_cursor conditional
and merge the two helpers.

This requires the fakeroot case to now pass a -1 whichfork directly
into xfs_bmbt_init_cursor, and some special casing for that, but
at least this scheme to deal with the fake btree root is handled and
documented in one place now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: tidy up a multiline ternary]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: make staging file forks explicit
Darrick J. Wong [Thu, 22 Feb 2024 20:39:43 +0000 (12:39 -0800)]
xfs: make staging file forks explicit

Don't open-code "-1" for whichfork when we're creating a staging btree
for a repair; let's define an actual symbol to make grepping and
understanding easier.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: make full use of xfs_btree_stage_ifakeroot in xfs_bmbt_stage_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:39:43 +0000 (12:39 -0800)]
xfs: make full use of xfs_btree_stage_ifakeroot in xfs_bmbt_stage_cursor

Remove the duplicate cur->bc_nlevels assignment in xfs_bmbt_stage_cursor,
and move the cur->bc_ino.forksize assignment into
xfs_btree_stage_ifakeroot as it is part of setting up the fake btree
root.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: remove xfs_rmapbt_stage_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:39:42 +0000 (12:39 -0800)]
xfs: remove xfs_rmapbt_stage_cursor

xfs_rmapbt_stage_cursor is currently unused, but future callers can
trivially open code the two calls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: fold xfs_rmapbt_init_common into xfs_rmapbt_init_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:39:41 +0000 (12:39 -0800)]
xfs: fold xfs_rmapbt_init_common into xfs_rmapbt_init_cursor

Make the levels initialization in xfs_rmapbt_init_cursor conditional
and merge the two helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: remove xfs_refcountbt_stage_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:39:40 +0000 (12:39 -0800)]
xfs: remove xfs_refcountbt_stage_cursor

Just open code the two calls in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: fold xfs_refcountbt_init_common into xfs_refcountbt_init_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:39:39 +0000 (12:39 -0800)]
xfs: fold xfs_refcountbt_init_common into xfs_refcountbt_init_cursor

Make the levels initialization in xfs_refcountbt_init_cursor conditional
and merge the two helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: remove xfs_inobt_stage_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:39:39 +0000 (12:39 -0800)]
xfs: remove xfs_inobt_stage_cursor

Just open code the two calls in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: fold xfs_inobt_init_common into xfs_inobt_init_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:39:38 +0000 (12:39 -0800)]
xfs: fold xfs_inobt_init_common into xfs_inobt_init_cursor

Make the levels initialization in xfs_inobt_init_cursor conditional
and merge the two helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: remove xfs_allocbt_stage_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:39:37 +0000 (12:39 -0800)]
xfs: remove xfs_allocbt_stage_cursor

Just open code the two calls in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: fold xfs_allocbt_init_common into xfs_allocbt_init_cursor
Christoph Hellwig [Thu, 22 Feb 2024 20:39:36 +0000 (12:39 -0800)]
xfs: fold xfs_allocbt_init_common into xfs_allocbt_init_cursor

Make the levels initialization in xfs_allocbt_init_cursor conditional
and merge the two helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: don't override bc_ops for staging btrees
Christoph Hellwig [Thu, 22 Feb 2024 20:37:35 +0000 (12:37 -0800)]
xfs: don't override bc_ops for staging btrees

Add a few conditionals for staging btrees to the core btree code instead
of overloading the bc_ops vector.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: add a xfs_btree_init_ptr_from_cur
Christoph Hellwig [Thu, 22 Feb 2024 20:37:26 +0000 (12:37 -0800)]
xfs: add a xfs_btree_init_ptr_from_cur

Inode-rooted btrees don't need to initialize the root pointer in the
->init_ptr_from_cur method as the root is found by the
xfs_btree_get_iroot method later.  Make ->init_ptr_from_cur optional
for inode rooted btrees by providing a helper that does the right
thing for the given btree type and also documents the semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: move comment about two 2 keys per pointer in the rmap btree
Christoph Hellwig [Thu, 22 Feb 2024 20:37:25 +0000 (12:37 -0800)]
xfs: move comment about two 2 keys per pointer in the rmap btree

Move it to the relevant initialization of the ops structure instead
of a place that has nothing to do with the key size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: create predicate to determine if cursor is at inode root level
Darrick J. Wong [Thu, 22 Feb 2024 20:37:24 +0000 (12:37 -0800)]
xfs: create predicate to determine if cursor is at inode root level

Create a predicate to decide if the given cursor and level point to the
root block in the inode immediate area instead of a disk block, and get
rid of the open-coded logic everywhere.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
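
A minimal Python sketch of such a predicate follows. It assumes that levels are numbered from 0 at the leaves, so the root sits at level nlevels - 1; the `Cursor` class and the `at_inode_root` name are hypothetical stand-ins rather than the kernel's definitions.

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    root_in_inode: bool  # True for btrees whose root lives in the inode fork
    nlevels: int         # current height of the btree


def at_inode_root(cur: Cursor, level: int) -> bool:
    # Only the topmost level of an inode-rooted btree lives in the inode
    # immediate area; every other block is an ordinary disk block.
    return cur.root_in_inode and level == cur.nlevels - 1
```

Callers can then ask one question instead of repeating the two-part condition at every site.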
13 months agoxfs: split the per-btree union in struct xfs_btree_cur
Christoph Hellwig [Thu, 22 Feb 2024 20:37:03 +0000 (12:37 -0800)]
xfs: split the per-btree union in struct xfs_btree_cur

Split up the union that encodes btree-specific fields in struct
xfs_btree_cur.  Most fields in there are specific to the btree type
encoded in xfs_btree_ops.type, and we can use the obviously named union
for that.  But one field is specific to the bmapbt and two are shared by
the refcount and rtrefcountbt.  Move those to a separate union to make
the usage clear and not need a separate struct for the refcount-related
fields.

This will also make unnecessary some very awkward btree cursor
refc/rtrefc switching logic in the rtrefcount patchset.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: split out a btree type from the btree ops geometry flags
Christoph Hellwig [Thu, 22 Feb 2024 20:36:17 +0000 (12:36 -0800)]
xfs: split out a btree type from the btree ops geometry flags

Two of the btree cursor flags are always used together and encode
the fundamental btree type.  There currently are two such types:

 1) an on-disk AG-rooted btree with 32-bit pointers
 2) an on-disk inode-rooted btree with 64-bit pointers

and we're about to add:

 3) an in-memory btree with 64-bit pointers

Introduce a new enum and a new type field in struct xfs_btree_geom
to encode this type directly instead of using flags and change most
code to switch on this enum.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: make the pointer lengths explicit]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
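
The three types listed above, together with the explicit pointer length from the following entry, might be modelled as below. This is a sketch under stated assumptions: the enum members, the `ptr_len` field (in bytes), and the `root_location` helper are hypothetical names chosen for this Python illustration.

```python
from dataclasses import dataclass
from enum import Enum


class BtreeType(Enum):
    AG = "ag"        # on-disk, AG-rooted
    INODE = "inode"  # on-disk, inode-rooted
    MEM = "mem"      # in-memory


@dataclass(frozen=True)
class BtreeOps:
    type: BtreeType
    ptr_len: int     # pointer length in bytes, stored explicitly


AG_OPS = BtreeOps(BtreeType.AG, ptr_len=4)        # 32-bit pointers
INODE_OPS = BtreeOps(BtreeType.INODE, ptr_len=8)  # 64-bit pointers
MEM_OPS = BtreeOps(BtreeType.MEM, ptr_len=8)      # 64-bit pointers


def root_location(ops: BtreeOps) -> str:
    # Code switches on one enum instead of testing a pair of flags.
    if ops.type is BtreeType.AG:
        return "AG-rooted, on disk"
    if ops.type is BtreeType.INODE:
        return "inode-rooted, on disk"
    return "in memory"
```

Because the pointer length is its own field, nothing has to infer it from the type.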
13 months agoxfs: store the btree pointer length in struct xfs_btree_ops
Darrick J. Wong [Thu, 22 Feb 2024 20:35:36 +0000 (12:35 -0800)]
xfs: store the btree pointer length in struct xfs_btree_ops

Make the pointer length an explicit field in the btree operations
structure so that the next patch (which introduces an explicit btree
type enum) doesn't have to play a bunch of awkward games with inferring
the pointer length from the enumeration.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: factor out a btree block owner check
Darrick J. Wong [Thu, 22 Feb 2024 20:35:23 +0000 (12:35 -0800)]
xfs: factor out a btree block owner check

Hoist the btree block owner check into a separate helper so that we
don't have an ugly multiline if statement.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: factor out a xfs_btree_owner helper
Darrick J. Wong [Thu, 22 Feb 2024 20:35:22 +0000 (12:35 -0800)]
xfs: factor out a xfs_btree_owner helper

Split out a helper to calculate the owner for a given btree instead of
duplicating the logic in two places.  While we're at it, make the
bc_ag/bc_ino switch logic depend on the correct geometry flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: break this up into two patches for the owner check]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: move the btree stats offset into struct btree_ops
Christoph Hellwig [Thu, 22 Feb 2024 20:35:21 +0000 (12:35 -0800)]
xfs: move the btree stats offset into struct btree_ops

The statistics offset is completely static, move it into the btree_ops
structure instead of the cursor.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: move lru refs to the btree ops structure
Darrick J. Wong [Thu, 22 Feb 2024 20:35:20 +0000 (12:35 -0800)]
xfs: move lru refs to the btree ops structure

Move the btree buffer LRU refcount to the btree ops structure so that we
can eliminate the last bc_btnum switch in the generic btree code.  We're
about to create repair-specific btree types, and we don't want that
stuff cluttering up libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: set btree block buffer ops in _init_buf
Darrick J. Wong [Thu, 22 Feb 2024 20:35:19 +0000 (12:35 -0800)]
xfs: set btree block buffer ops in _init_buf

Set the btree block buffer ops in xfs_btree_init_buf since we already
have access to that information through the btree ops.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: remove the unnecessary daddr paramter to _init_block
Darrick J. Wong [Thu, 22 Feb 2024 20:35:19 +0000 (12:35 -0800)]
xfs: remove the unnecessary daddr paramter to _init_block

Now that all of the callers pass XFS_BUF_DADDR_NULL as the daddr
parameter, we can elide that too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: btree convert xfs_btree_init_block to xfs_btree_init_buf calls
Darrick J. Wong [Thu, 22 Feb 2024 20:35:18 +0000 (12:35 -0800)]
xfs: btree convert xfs_btree_init_block to xfs_btree_init_buf calls

Convert any place we call xfs_btree_init_block with a buffer to use the
_init_buf function.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: rename btree block/buffer init functions
Darrick J. Wong [Thu, 22 Feb 2024 20:35:17 +0000 (12:35 -0800)]
xfs: rename btree block/buffer init functions

Rename xfs_btree_init_block_int to xfs_btree_init_block, and
xfs_btree_init_block to xfs_btree_init_buf so that the name suggests the
type that callers are supposed to pass in.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: initialize btree blocks using btree_ops structure
Darrick J. Wong [Thu, 22 Feb 2024 20:35:16 +0000 (12:35 -0800)]
xfs: initialize btree blocks using btree_ops structure

Notice now that the btree ops structure encodes btree geometry flags and
the magic number through the buffer ops.  Refactor the btree block
initialization functions to use the btree ops so that we no longer have
to open code all that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: extern some btree ops structures
Darrick J. Wong [Thu, 22 Feb 2024 20:35:15 +0000 (12:35 -0800)]
xfs: extern some btree ops structures

Expose these static btree ops structures so that we can reference them
in the AG initialization code in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: turn the allocbt cursor active field into a btree flag
Christoph Hellwig [Thu, 22 Feb 2024 20:35:15 +0000 (12:35 -0800)]
xfs: turn the allocbt cursor active field into a btree flag

Add a new XFS_BTREE_ALLOCBT_ACTIVE flag to replace the active field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: consolidate the xfs_alloc_lookup_* helpers
Christoph Hellwig [Thu, 22 Feb 2024 20:35:14 +0000 (12:35 -0800)]
xfs: consolidate the xfs_alloc_lookup_* helpers

Add a single xfs_alloc_lookup helper to sort out the argument passing and
setting of the active flag instead of duplicating the logic three times.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
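
The consolidation above can be pictured as one helper that performs the lookup and keeps an "active" flag in sync with the result. The Python sketch below is a loose model: a sorted list stands in for the btree, and the flag value, the `Lookup` enum, and `alloc_lookup` are hypothetical names. It also assumes that "active" simply mirrors whether the lookup found a record.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from enum import Enum

ALLOCBT_ACTIVE = 1 << 0  # hypothetical bit value for the "active" state flag


class Lookup(Enum):
    EQ = "eq"
    LE = "le"
    GE = "ge"


@dataclass
class Cursor:
    keys: list      # sorted record keys, standing in for the btree
    flags: int = 0


def alloc_lookup(cur: Cursor, direction: Lookup, key: int) -> bool:
    """Single lookup path: find a record, then update the active flag."""
    if direction is Lookup.EQ:
        i = bisect_left(cur.keys, key)
        found = i < len(cur.keys) and cur.keys[i] == key
    elif direction is Lookup.LE:
        found = bisect_right(cur.keys, key) > 0
    else:
        found = bisect_left(cur.keys, key) < len(cur.keys)

    # The flag handling lives here once instead of in three near-identical helpers.
    if found:
        cur.flags |= ALLOCBT_ACTIVE
    else:
        cur.flags &= ~ALLOCBT_ACTIVE
    return found
```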
13 months agoxfs: remove bc_ino.flags
Christoph Hellwig [Thu, 22 Feb 2024 20:35:13 +0000 (12:35 -0800)]
xfs: remove bc_ino.flags

Just move the two flags into bc_flags where there is plenty of space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
13 months agoxfs: encode the btree geometry flags in the btree ops structure
Darrick J. Wong [Thu, 22 Feb 2024 20:34:29 +0000 (12:34 -0800)]
xfs: encode the btree geometry flags in the btree ops structure

Certain btree flags never change for the life of a btree cursor because
they describe the geometry of the btree itself.  Encode these in the
btree ops structure and reduce the amount of code required in each btree
type's init_cursor functions.  This also frees up most of the bits in
bc_flags.

A previous version of this patch also converted the open-coded flags
logic to helpers.  This was removed due to the pending refactoring (that
follows this patch) to eliminate most of the state flags.

Conversion script:

sed \
 -e 's/XFS_BTREE_LONG_PTRS/XFS_BTGEO_LONG_PTRS/g' \
 -e 's/XFS_BTREE_ROOT_IN_INODE/XFS_BTGEO_ROOT_IN_INODE/g' \
 -e 's/XFS_BTREE_LASTREC_UPDATE/XFS_BTGEO_LASTREC_UPDATE/g' \
 -e 's/XFS_BTREE_OVERLAPPING/XFS_BTGEO_OVERLAPPING/g' \
 -e 's/cur->bc_flags & XFS_BTGEO_/cur->bc_ops->geom_flags \& XFS_BTGEO_/g' \
 -i $(git ls-files fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] fs/xfs/scrub/*.[ch])

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
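
For readers who want to see what the conversion script above does to a single line, the following Python sketch applies the same five substitutions in the same order. The `convert` function and the sample input lines are illustrative assumptions; note that sed's `\&` in the last rule is just a literal ampersand.

```python
# The same literal substitutions as the sed script, applied in order.
RENAMES = [
    ("XFS_BTREE_LONG_PTRS", "XFS_BTGEO_LONG_PTRS"),
    ("XFS_BTREE_ROOT_IN_INODE", "XFS_BTGEO_ROOT_IN_INODE"),
    ("XFS_BTREE_LASTREC_UPDATE", "XFS_BTGEO_LASTREC_UPDATE"),
    ("XFS_BTREE_OVERLAPPING", "XFS_BTGEO_OVERLAPPING"),
    # Runs last, so it matches the already renamed XFS_BTGEO_ flags.
    ("cur->bc_flags & XFS_BTGEO_", "cur->bc_ops->geom_flags & XFS_BTGEO_"),
]


def convert(line: str) -> str:
    """Rewrite one source line the way the sed script would."""
    for old, new in RENAMES:
        line = line.replace(old, new)
    return line


print(convert("if (cur->bc_flags & XFS_BTREE_LONG_PTRS)"))
```

The ordering matters: the flag renames must run before the final rule, which looks for the new `XFS_BTGEO_` prefix.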
13 months agoxfs: fix imprecise logic in xchk_btree_check_block_owner
Darrick J. Wong [Thu, 22 Feb 2024 20:34:13 +0000 (12:34 -0800)]
xfs: fix imprecise logic in xchk_btree_check_block_owner

A reviewer was confused by the init_sa logic in this function.  Upon
checking the logic, I discovered that the code is imprecise.  What we
want to do here is check that there is an ownership record in the rmap
btree for the AG that contains a btree block.

For an inode-rooted btree (e.g. the bmbt) the per-AG btree cursors have
not been initialized because inode btrees can span multiple AGs.
Therefore, we must initialize the per-AG btree cursors in sc->sa before
proceeding.  That is what init_sa controls, and hence the logic should
be gated on XFS_BTREE_ROOT_IN_INODE, not XFS_BTREE_LONG_PTRS.

In practice, ROOT_IN_INODE and LONG_PTRS are coincident so this hasn't
mattered.  However, we're about to refactor both of those flags into
separate btree_ops fields so we want the logic to make sense
afterwards.

Fixes: 858333dcf021a ("xfs: check btree block ownership with bnobt/rmapbt when scrubbing btree")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
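
The point about coincident flags can be shown with a toy Python model. The `Geometry` class and both gate functions are hypothetical; the third geometry is an invented case (long pointers without an inode root) used only to show where the two conditions would stop agreeing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    root_in_inode: bool
    long_ptrs: bool


def old_gate(geom: Geometry) -> bool:
    # Imprecise: keys the per-AG cursor setup off the pointer width.
    return geom.long_ptrs


def new_gate(geom: Geometry) -> bool:
    # Precise: per-AG cursors are missing because the root is in an inode.
    return geom.root_in_inode


AG_ROOTED = Geometry(root_in_inode=False, long_ptrs=False)
INODE_ROOTED = Geometry(root_in_inode=True, long_ptrs=True)
# Invented geometry, not one of the two types that exist at this commit.
LONG_PTRS_ONLY = Geometry(root_in_inode=False, long_ptrs=True)
```

For the two existing geometries both gates return the same answer, which is why the imprecision went unnoticed; they differ only for the invented third case.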
13 months agoxfs: drop XFS_BTREE_CRC_BLOCKS
Darrick J. Wong [Thu, 22 Feb 2024 20:34:12 +0000 (12:34 -0800)]
xfs: drop XFS_BTREE_CRC_BLOCKS

All existing btree types set XFS_BTREE_CRC_BLOCKS when running against a
V5 filesystem.  All currently proposed btree types are V5 only and use
the richer XFS_BTREE_CRC_BLOCKS format.  Therefore, we can drop this
flag and change the conditional to xfs_has_crc.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: set the btree cursor bc_ops in xfs_btree_alloc_cursor
Darrick J. Wong [Thu, 22 Feb 2024 20:33:18 +0000 (12:33 -0800)]
xfs: set the btree cursor bc_ops in xfs_btree_alloc_cursor

This is a precursor to putting more static data in the btree ops structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: consolidate btree block allocation tracepoints
Darrick J. Wong [Thu, 22 Feb 2024 20:33:07 +0000 (12:33 -0800)]
xfs: consolidate btree block allocation tracepoints

Don't waste tracepoint segment memory on per-btree block allocation
tracepoints when we can do it from the generic btree code.

With this patch applied, two tracepoints are collapsed into one
tracepoint, with the following effects on objdump -hx xfs.ko output:

Before:

 10 __tracepoints_ptrs 00000b38  0000000000000000  0000000000000000  001412f0  2**2
 14 __tracepoints_strings 00005433  0000000000000000  0000000000000000  001689a0  2**5
 29 __tracepoints 00010d30  0000000000000000  0000000000000000  0023fe00  2**5

After:

 10 __tracepoints_ptrs 00000b34  0000000000000000  0000000000000000  001417b0  2**2
 14 __tracepoints_strings 00005413  0000000000000000  0000000000000000  00168e80  2**5
 29 __tracepoints 00010cd0  0000000000000000  0000000000000000  00240760  2**5

Column 3 is the section size in bytes; removing these two tracepoints
reduces the size of the ELF segments by 132 bytes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
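
The 132-byte figure in the entry above can be rechecked from the objdump section sizes (column 3, hexadecimal). The short Python sketch below does that arithmetic; the dictionary and function names are illustrative only.

```python
# Section sizes copied from the objdump output quoted above (hex, bytes).
BEFORE = {
    "__tracepoints_ptrs": 0x00000B38,
    "__tracepoints_strings": 0x00005433,
    "__tracepoints": 0x00010D30,
}
AFTER = {
    "__tracepoints_ptrs": 0x00000B34,
    "__tracepoints_strings": 0x00005413,
    "__tracepoints": 0x00010CD0,
}


def savings(before: dict, after: dict) -> dict:
    """Bytes saved per section."""
    return {name: before[name] - after[name] for name in before}


per_section = savings(BEFORE, AFTER)
total = sum(per_section.values())
print(per_section)  # {'__tracepoints_ptrs': 4, '__tracepoints_strings': 32, '__tracepoints': 96}
print(total)        # 132
```

The same subtraction applied to the numbers in the block-freeing tracepoint entry that follows gives the same per-section savings.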
13 months agoxfs: consolidate btree block freeing tracepoints
Darrick J. Wong [Thu, 22 Feb 2024 20:33:06 +0000 (12:33 -0800)]
xfs: consolidate btree block freeing tracepoints

Don't waste memory on extra per-btree block freeing tracepoints when we
can do it from the generic btree code.

With this patch applied, two tracepoints are collapsed into one
tracepoint, with the following effects on objdump -hx xfs.ko output:

Before:

 10 __tracepoints_ptrs 00000b3c  0000000000000000  0000000000000000  00140eb0  2**2
 14 __tracepoints_strings 00005453  0000000000000000  0000000000000000  00168540  2**5
 29 __tracepoints 00010d90  0000000000000000  0000000000000000  0023f5e0  2**5

After:

 10 __tracepoints_ptrs 00000b38  0000000000000000  0000000000000000  001412f0  2**2
 14 __tracepoints_strings 00005433  0000000000000000  0000000000000000  001689a0  2**5
 29 __tracepoints 00010d30  0000000000000000  0000000000000000  0023fe00  2**5

Column 3 is the section size in bytes; removing these two tracepoints
reduces the size of the ELF segments by 132 bytes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: repair summary counters
Darrick J. Wong [Thu, 22 Feb 2024 20:33:05 +0000 (12:33 -0800)]
xfs: repair summary counters

Use the same summary counter calculation infrastructure to generate new
values for the in-core summary counters.   The difference between the
scrubber and the repairer is that the repairer will freeze the fs during
setup, which means that the values should match exactly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: update health status if we get a clean bill of health
Darrick J. Wong [Thu, 22 Feb 2024 20:33:04 +0000 (12:33 -0800)]
xfs: update health status if we get a clean bill of health

If scrub finds that everything is ok with the filesystem, we need a way
to tell the health tracking that it can let go of indirect health flags,
since indirect flags only mean that at some point in the past we lost
some context.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: remember sick inodes that get inactivated
Darrick J. Wong [Thu, 22 Feb 2024 20:33:03 +0000 (12:33 -0800)]
xfs: remember sick inodes that get inactivated

If an unhealthy inode gets inactivated, remember this fact in the
per-fs health summary.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: add secondary and indirect classes to the health tracking system
Darrick J. Wong [Thu, 22 Feb 2024 20:33:03 +0000 (12:33 -0800)]
xfs: add secondary and indirect classes to the health tracking system

Establish two more classes of health tracking bits:

 * Indirect problems, which suggest problems in other health domains
   that we weren't able to preserve; and

 * Secondary problems, which track state that's related to primary
   evidence of health problems.

The first class we'll use in an upcoming patch to record in the AG
health status the fact that we ran out of memory and had to inactivate
an inode with defective metadata.  The second class we use to indicate
that repair knows that an inode is bad and we need to fix it later.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: report XFS_IS_CORRUPT errors to the health system
Darrick J. Wong [Thu, 22 Feb 2024 20:32:55 +0000 (12:32 -0800)]
xfs: report XFS_IS_CORRUPT errors to the health system

Whenever we encounter XFS_IS_CORRUPT failures, we should report that to
the health monitoring system for later reporting.

I started with this semantic patch and massaged everything until it
built:

@@
expression mp, test;
@@

- if (XFS_IS_CORRUPT(mp, test)) return -EFSCORRUPTED;
+ if (XFS_IS_CORRUPT(mp, test)) { xfs_btree_mark_sick(cur); return -EFSCORRUPTED; }

@@
expression mp, test;
identifier label, error;
@@

- if (XFS_IS_CORRUPT(mp, test)) { error = -EFSCORRUPTED; goto label; }
+ if (XFS_IS_CORRUPT(mp, test)) { xfs_btree_mark_sick(cur); error = -EFSCORRUPTED; goto label; }

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: report realtime metadata corruption errors to the health system
Darrick J. Wong [Thu, 22 Feb 2024 20:32:44 +0000 (12:32 -0800)]
xfs: report realtime metadata corruption errors to the health system

Whenever we encounter corrupt realtime metadata blocks, we should report
that to the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: report quota block corruption errors to the health system
Darrick J. Wong [Thu, 22 Feb 2024 20:32:44 +0000 (12:32 -0800)]
xfs: report quota block corruption errors to the health system

Whenever we encounter corrupt quota blocks, we should report that to the
health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: report inode corruption errors to the health system
Darrick J. Wong [Thu, 22 Feb 2024 20:32:43 +0000 (12:32 -0800)]
xfs: report inode corruption errors to the health system

Whenever we encounter corrupt inode records, we should report that to
the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: report symlink block corruption errors to the health system
Darrick J. Wong [Thu, 22 Feb 2024 20:32:42 +0000 (12:32 -0800)]
xfs: report symlink block corruption errors to the health system

Whenever we encounter corrupt symbolic link blocks, we should report
that to the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: report dir/attr block corruption errors to the health system
Darrick J. Wong [Thu, 22 Feb 2024 20:32:18 +0000 (12:32 -0800)]
xfs: report dir/attr block corruption errors to the health system

Whenever we encounter corrupt directory or extended attribute blocks, we
should report that to the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: report btree block corruption errors to the health system
Darrick J. Wong [Thu, 22 Feb 2024 20:32:09 +0000 (12:32 -0800)]
xfs: report btree block corruption errors to the health system

Whenever we encounter corrupt btree blocks, we should report that to the
health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: report block map corruption errors to the health tracking system
Darrick J. Wong [Thu, 22 Feb 2024 20:31:51 +0000 (12:31 -0800)]
xfs: report block map corruption errors to the health tracking system

Whenever we encounter a corrupt block mapping, we should report that to
the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: report ag header corruption errors to the health tracking system
Darrick J. Wong [Thu, 22 Feb 2024 20:31:03 +0000 (12:31 -0800)]
xfs: report ag header corruption errors to the health tracking system

Whenever we encounter a corrupt AG header, we should report that to the
health monitoring system for later reporting.  Buffer readers that don't
respond to corruption events with a _mark_sick call can be detected with
the following script:

#!/bin/bash

# Detect missing calls to xfs_*_mark_sick

filter=cat
tty -s && filter=less

git grep -A10 -E '( = xfs_trans_read_buf| = xfs_buf_read\()' fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] | awk '
BEGIN {
	ignore = 0;
	lineno = 0;
	delete lines;
}
{
	if ($0 == "--") {
		if (!ignore) {
			for (i = 0; i < lineno; i++) {
				print(lines[i]);
			}
			printf("--\n");
		}
		delete lines;
		lineno = 0;
		ignore = 0;
	} else if ($0 ~ /mark_sick/) {
		ignore = 1;
	} else {
		lines[lineno++] = $0;
	}
}
' | $filter

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: report fs corruption errors to the health tracking system
Darrick J. Wong [Thu, 22 Feb 2024 20:31:02 +0000 (12:31 -0800)]
xfs: report fs corruption errors to the health tracking system

Whenever we encounter corrupt fs metadata, we should report that to the
health monitoring system for later reporting.  A convenient program for
identifying places to insert xfs_*_mark_sick calls is as follows:

#!/bin/bash

# Detect missing calls to xfs_*_mark_sick

filter=cat
tty -s && filter=less

git grep -B3 EFSCORRUPTED fs/xfs/*.[ch] fs/xfs/libxfs/*.[ch] fs/xfs/scrub/*.[ch] | awk '
BEGIN {
	ignore = 0;
	lineno = 0;
	delete lines;
}
{
	if ($0 == "--") {
		if (!ignore) {
			for (i = 0; i < lineno; i++) {
				print(lines[i]);
			}
			printf("--\n");
		}
		delete lines;
		lineno = 0;
		ignore = 0;
	} else if ($0 ~ /mark_sick/) {
		ignore = 1;
	} else if ($0 ~ /if .fa/) {
		ignore = 1;
	} else if ($0 ~ /failaddr/) {
		ignore = 1;
	} else if ($0 ~ /_verifier_error/) {
		ignore = 1;
	} else if ($0 ~ /^ \* .*EFSCORRUPTED/) {
		ignore = 1;
	} else if ($0 ~ /== -EFSCORRUPTED/) {
		ignore = 1;
	} else if ($0 ~ /!= -EFSCORRUPTED/) {
		ignore = 1;
	} else {
		lines[lineno++] = $0;
	}
}
' | $filter

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: separate the marking of sick and checked metadata
Darrick J. Wong [Thu, 22 Feb 2024 20:31:01 +0000 (12:31 -0800)]
xfs: separate the marking of sick and checked metadata

Split the setting of the sick and checked masks into separate functions
as part of preparing to add the ability for regular runtime fs code
(i.e. not scrub) to mark metadata structures sick when corruptions are
found.  Improve the documentation of libxfs' requirements for helper
behavior.
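
As a rough userspace illustration of what keeping the two masks separate
could look like, consider the sketch below.  The struct and the
health_mark_* helpers are hypothetical stand-ins invented for this
example, not the kernel's actual functions.

```c
#include <stdint.h>

/* Hypothetical per-object health state: two independent bitmasks. */
struct health_state {
	uint32_t sick;		/* metadata known to be bad */
	uint32_t checked;	/* metadata that a checker has examined */
};

/* Runtime code tripped over corruption: record it, but claim no check. */
static void health_mark_sick(struct health_state *hs, uint32_t mask)
{
	hs->sick |= mask;
}

/* A checker examined the metadata and found it corrupt. */
static void health_mark_corrupt(struct health_state *hs, uint32_t mask)
{
	hs->sick |= mask;
	hs->checked |= mask;
}

/* A checker examined the metadata and found it clean. */
static void health_mark_healthy(struct health_state *hs, uint32_t mask)
{
	hs->sick &= ~mask;
	hs->checked |= mask;
}
```

In this model, only the checker paths touch the checked mask, so a
report can distinguish "sick and verified by scrub" from "sick as seen
by ordinary runtime code".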

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: teach repair to fix file nlinks
Darrick J. Wong [Thu, 22 Feb 2024 20:31:00 +0000 (12:31 -0800)]
xfs: teach repair to fix file nlinks

Fix the file link counts since we just computed the correct ones.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: track directory entry updates during live nlinks fsck
Darrick J. Wong [Thu, 22 Feb 2024 20:30:59 +0000 (12:30 -0800)]
xfs: track directory entry updates during live nlinks fsck

Create the necessary hooks in the directory operations
(create/link/unlink/rename) code so that our live nlink scrub code can
stay up to date with link count updates in the rest of the filesystem.
This will be the means to keep our shadow link count information up to
date while the scan runs in real time.

In online fsck part 2, we'll use these same hooks to handle repairs
to directories and parent pointer information.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: teach scrub to check file nlinks
Darrick J. Wong [Thu, 22 Feb 2024 20:30:58 +0000 (12:30 -0800)]
xfs: teach scrub to check file nlinks

Create the necessary scrub code to walk the filesystem's directory tree
so that we can compute file link counts.  Similar to quotacheck, we
create an incore shadow array of link count information and then we walk
the filesystem a second time to compare the link counts.  We need live
updates to keep the information up to date during the lengthy scan, so
this scrubber remains disabled until the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: report health of inode link counts
Darrick J. Wong [Thu, 22 Feb 2024 20:30:58 +0000 (12:30 -0800)]
xfs: report health of inode link counts

Report on the health of the inode link counts.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: repair dquots based on live quotacheck results
Darrick J. Wong [Thu, 22 Feb 2024 20:30:57 +0000 (12:30 -0800)]
xfs: repair dquots based on live quotacheck results

Use the shadow quota counters that live quotacheck creates to reset the
incore dquot counters.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: repair cannot update the summary counters when logging quota flags
Darrick J. Wong [Thu, 22 Feb 2024 20:30:56 +0000 (12:30 -0800)]
xfs: repair cannot update the summary counters when logging quota flags

While running xfs/804 (quota repairs racing with fsstress), I observed a
filesystem shutdown in the primary sb write verifier:

run fstests xfs/804 at 2022-05-23 18:43:48
XFS (sda4): Mounting V5 Filesystem
XFS (sda4): Ending clean mount
XFS (sda4): Quotacheck needed: Please wait.
XFS (sda4): Quotacheck: Done.
XFS (sda4): EXPERIMENTAL online scrub feature in use. Use at your own risk!
XFS (sda4): SB ifree sanity check failed 0xb5 > 0x80
XFS (sda4): Metadata corruption detected at xfs_sb_write_verify+0x5e/0x100 [xfs], xfs_sb block 0x0
XFS (sda4): Unmount and run xfs_repair

The "SB ifree sanity check failed" message was a debugging printk that I
added to the kernel; observe that 0xb5 - 0x80 = 53, which is less than
one inode chunk.
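
The invariant behind that message, and the arithmetic, can be sketched
in userspace C.  The helper names and INODES_PER_CHUNK are made up for
this example; the sketch assumes the usual 64 inodes per inode chunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: one inode chunk holds 64 inodes. */
#define INODES_PER_CHUNK 64

/* The free inode count can never exceed the allocated inode count. */
static bool sb_ifree_sane(uint64_t icount, uint64_t ifree)
{
	return ifree <= icount;
}

/* How far ifree overshoots icount; 0 when the counters are consistent. */
static uint64_t sb_ifree_excess(uint64_t icount, uint64_t ifree)
{
	return ifree > icount ? ifree - icount : 0;
}
```

With the values from the log, sb_ifree_sane(0x80, 0xb5) is false and
sb_ifree_excess(0x80, 0xb5) is 53, which is below INODES_PER_CHUNK.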

I traced this to the xfs_log_sb calls from the online quota repair code,
which tries to clear the CHKD flags from the superblock to force a
mount-time quotacheck if the repair fails.  On a V5 filesystem,
xfs_log_sb updates the ondisk sb summary counters with the current
contents of the percpu counters.  This is done without quiescing other
writer threads, which means it could be racing with a thread that has
updated icount and is about to update ifree.

If the other writer thread had incremented ifree before updating icount,
the repair thread will write ifree > icount into the logged update.  If
the AIL writes the logged superblock back to disk before anyone else
fixes this situation, this will lead to a write verifier failure, which
causes a filesystem shutdown.

Resolve this problem by updating the quota flags and calling
xfs_sb_to_disk directly, which does not touch the percpu counters.
While we're at it, we can elide the entire update if the selected qflags
aren't set.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: track quota updates during live quotacheck
Darrick J. Wong [Thu, 22 Feb 2024 20:30:55 +0000 (12:30 -0800)]
xfs: track quota updates during live quotacheck

Create a shadow dqtrx system in the quotacheck code that hooks the
regular dquot counter update code.  This will be the means to keep our
copy of the dquot counters up to date while the scan runs in real time.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: implement live quotacheck inode scan
Darrick J. Wong [Thu, 22 Feb 2024 20:30:54 +0000 (12:30 -0800)]
xfs: implement live quotacheck inode scan

Create a new trio of scrub functions to check quota counters.  While the
dquots themselves are filesystem metadata and should be checked early,
the dquot counter values are computed from other metadata and are
therefore summary counters.  We don't plug these into the scrub dispatch
just yet, because we still need to be able to watch quota updates while
doing our scan.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: create a sparse load xfarray function
Darrick J. Wong [Thu, 22 Feb 2024 20:30:54 +0000 (12:30 -0800)]
xfs: create a sparse load xfarray function

Create a new method to load an xfarray element from the xfile, but with
a twist.  If we've never stored to the array index, zero the caller's
buffer.  This will facilitate RMW updates of records in a sparse array
without fuss, since the sparse xfarray convention is that uninitialized
array elements default to zeroes.
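
The convention can be modelled in a few lines of userspace C.  This is
only a toy: struct sparse_array, SPARSE_NR and the sparse_* helpers are
hypothetical names for this sketch, and a real xfarray is backed by an
xfile rather than a fixed in-memory array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPARSE_NR 16	/* hypothetical fixed capacity for the sketch */

/* Toy sparse array of 64-bit records; not the kernel xfarray. */
struct sparse_array {
	uint64_t recs[SPARSE_NR];
	bool stored[SPARSE_NR];	/* has this index ever been written? */
};

static int sparse_store(struct sparse_array *sa, size_t idx,
			const uint64_t *rec)
{
	if (idx >= SPARSE_NR)
		return -1;
	sa->recs[idx] = *rec;
	sa->stored[idx] = true;
	return 0;
}

/* Load a record, zeroing the caller's buffer if idx was never stored. */
static int sparse_load(const struct sparse_array *sa, size_t idx,
		       uint64_t *rec)
{
	if (idx >= SPARSE_NR)
		return -1;
	if (sa->stored[idx])
		*rec = sa->recs[idx];
	else
		memset(rec, 0, sizeof(*rec));
	return 0;
}

/* Read-modify-write with no separate "does it exist yet?" check. */
static int sparse_add(struct sparse_array *sa, size_t idx, uint64_t delta)
{
	uint64_t rec;

	if (sparse_load(sa, idx, &rec))
		return -1;
	rec += delta;
	return sparse_store(sa, idx, &rec);
}
```

The point of the sparse load is visible in sparse_add: the caller can
accumulate into an index it has never touched without special-casing
the first update.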

This is a separate patch to reduce the size of the upcoming quotacheck
patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: create a helper to count per-device inode block usage
Darrick J. Wong [Thu, 22 Feb 2024 20:30:53 +0000 (12:30 -0800)]
xfs: create a helper to count per-device inode block usage

Create a helper to compute the number of blocks that a file has
allocated from the data and realtime volumes.  This patch was
split out to reduce the size of the upcoming quotacheck patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: create a xchk_trans_alloc_empty helper for scrub
Darrick J. Wong [Thu, 22 Feb 2024 20:30:52 +0000 (12:30 -0800)]
xfs: create a xchk_trans_alloc_empty helper for scrub

Create a helper to initialize empty transactions on behalf of a scrub
operation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: report the health of quota counts
Darrick J. Wong [Thu, 22 Feb 2024 20:30:51 +0000 (12:30 -0800)]
xfs: report the health of quota counts

Report the health of quota counts.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: repair file modes by scanning for a dirent pointing to us
Darrick J. Wong [Thu, 22 Feb 2024 20:30:51 +0000 (12:30 -0800)]
xfs: repair file modes by scanning for a dirent pointing to us

Repair might encounter an inode with a totally garbage i_mode.  To fix
this problem, we have to figure out if the file was a regular file, a
directory, or a special file.  One way to figure this out is to check if
there are any directories with entries pointing down to the busted file.

This patch recovers the file mode by scanning every directory entry on
the filesystem to see if there are any that point to the busted file.
If the ftypes of all such dirents are consistent, the mode is recovered
from the ftype.  If no dirents are found, the file becomes a regular
file.  In all cases, ACLs are canceled and the file is made accessible
only by root.

A previous patch attempted to guess the mode by reading the beginning of
the file data.  This was rejected by Christoph on the grounds that we
cannot trust user-controlled data blocks.  Users do not have direct
control over the ondisk contents of directory entries, so this method
should be much safer.

If all the dirents have the same ftype, then we can translate that back
into an S_IFMT flag and fix the file.  If not, reset the mode to
S_IFREG.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: create a macro for decoding ftypes in tracepoints
Darrick J. Wong [Thu, 22 Feb 2024 20:30:50 +0000 (12:30 -0800)]
xfs: create a macro for decoding ftypes in tracepoints

Create the XFS_DIR3_FTYPE_STR macro so that we can report ftype as
strings instead of numbers in tracepoints.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: create a predicate to determine if two xfs_names are the same
Darrick J. Wong [Thu, 22 Feb 2024 20:30:49 +0000 (12:30 -0800)]
xfs: create a predicate to determine if two xfs_names are the same

Create a simple predicate to determine if two xfs_names are the same
objects or have the exact same name.  The comparison is always case
sensitive.
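
A predicate of this shape might look like the userspace sketch below.
struct name and names_same are illustrative names for this example, not
the kernel definitions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical counted-name type: bytes plus an explicit length. */
struct name {
	const unsigned char *name;
	int len;
};

/* True if both point at the same object or hold identical bytes. */
static bool names_same(const struct name *n1, const struct name *n2)
{
	if (n1 == n2)
		return true;
	if (n1->len != n2->len)
		return false;
	/* memcmp compares raw bytes, so the test is case sensitive. */
	return memcmp(n1->name, n2->name, n1->len) == 0;
}
```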

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: create a static name for the dot entry too
Darrick J. Wong [Thu, 22 Feb 2024 20:30:48 +0000 (12:30 -0800)]
xfs: create a static name for the dot entry too

Create an xfs_name_dot object so that upcoming scrub code can compare
against that.  Offline repair already has such an object, so we're
really just hoisting it to the kernel.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: iscan batching should handle unallocated inodes too
Darrick J. Wong [Thu, 22 Feb 2024 20:30:48 +0000 (12:30 -0800)]
xfs: iscan batching should handle unallocated inodes too

The inode scanner tries to reduce contention on the AGI header buffer
lock by grabbing references to consecutive allocated inodes.  Batching
stops as soon as we encounter an unallocated inode.  This is unfortunate
because in the worst case performance collapses to the old "one at a
time" behavior if every other inode is free.

This is correct behavior, but we could do better.  Unallocated inodes by
definition have nothing to scan, which means the iscan can ignore them
as long as someone ensures that the scan data will reflect another
thread allocating the inode and adding interesting metadata to that
inode.  That mechanism is, of course, the live update hooks.

Therefore, extend the batching mechanism to track unallocated inodes
adjacent to the scan cursor.  The _want_live_update predicate can tell
the caller's live update hook to incorporate all live updates to what
the scanner thinks is an unallocated inode if (after dropping the AGI)
some other thread allocates one of those inodes and begins using it.

Note that we cannot just copy the ir_free bitmap into the scan cursor
because the batching stops if iget says the inode is in an intermediate
state (e.g. on the inactivation list) and cannot be igrabbed.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: cache a bunch of inodes for repair scans
Darrick J. Wong [Thu, 22 Feb 2024 20:30:47 +0000 (12:30 -0800)]
xfs: cache a bunch of inodes for repair scans

After observing xfs_scrub taking forever to rebuild parent pointers on a
pptrs enabled filesystem, I decided to profile what the system was
doing.  It turns out that when there are a lot of threads trying to scan
the filesystem, most of our time is spent contending on AGI buffer
locks.  Given that we're walking the inobt records anyway, we can often
tell ahead of time when there's a bunch of (up to 64) consecutive inodes
that we could grab all at once.

Do this to amortize the cost of taking the AGI lock across as many
inodes as we possibly can.  On the author's system this seems to improve
parallel throughput from barely one and a half cores to slightly
sublinear scaling.  The obvious antipattern here of course is where the
freemask has every other bit set (e.g. all 0xA's).
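
To make the antipattern concrete, here is a small userspace sketch that
counts how many consecutive allocated inodes could be batched from a
given starting bit.  allocated_run is a hypothetical helper for this
example; it assumes a 64-bit free mask in which a set bit means the
inode is free.

```c
#include <stdint.h>

/*
 * Count how many consecutive allocated inodes start at bit `start` of a
 * 64-inode chunk.  A set bit in free_mask means "inode is free".
 */
static unsigned int allocated_run(uint64_t free_mask, unsigned int start)
{
	unsigned int n = 0;

	while (start + n < 64 &&
	       !(free_mask & (UINT64_C(1) << (start + n))))
		n++;
	return n;
}
```

A fully allocated chunk yields a run of 64 from bit 0, whereas a mask of
all 0xA's never yields a run longer than one inode.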

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: stagger the starting AG of scrub iscans to reduce contention
Darrick J. Wong [Thu, 22 Feb 2024 20:30:46 +0000 (12:30 -0800)]
xfs: stagger the starting AG of scrub iscans to reduce contention

Online directory and parent repairs on parent-pointer equipped
filesystems have shown that starting a large number of parallel iscans
causes a lot of AGI buffer contention.  Try to reduce this by making it
so that iscans wrap around the end of the filesystem, and using a
rotor to stagger where each scanner begins.  Surprisingly, this boosts
CPU utilization (on the author's test machines) from effectively
single-threaded to 160%.  Not great, but see the next patch.
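
The rotor idea can be illustrated with a short userspace sketch.  The
names scan_rotor, pick_start_ag and scan_ag_at are invented for this
example, and the real patch may well track its starting point
differently.

```c
#include <stdatomic.h>

/* Hypothetical global rotor shared by all scanners. */
static atomic_uint scan_rotor;

/* Hand each new scanner a different starting AG. */
static unsigned int pick_start_ag(unsigned int agcount)
{
	return atomic_fetch_add(&scan_rotor, 1) % agcount;
}

/*
 * Return the AG visited at step `step` (0-based) of a scan that began
 * at start_ag, wrapping around the end of the filesystem.
 */
static unsigned int scan_ag_at(unsigned int start_ag, unsigned int step,
			       unsigned int agcount)
{
	return (start_ag + step) % agcount;
}
```

With four AGs, successive scanners would start at AGs 0, 1, 2, 3 and
then 0 again, and a scanner that starts at AG 2 visits 2, 3, 0, 1.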

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: allow scrub to hook metadata updates in other writers
Darrick J. Wong [Thu, 22 Feb 2024 20:30:45 +0000 (12:30 -0800)]
xfs: allow scrub to hook metadata updates in other writers

Certain types of filesystem metadata can only be checked by scanning
every file in the entire filesystem.  Specific examples of this include
quota counts, file link counts, and reverse mappings of file extents.
Directory and parent pointer reconstruction may also fall into this
category.  File scanning is much trickier than scanning AG metadata
because we have to take inode locks in the same order as the rest of
[VX]FS, we can't be holding buffer locks when we do that, and scanning
the whole filesystem takes time.

Earlier versions of the online repair patchset relied heavily on
fsfreeze as a means to quiesce the filesystem so that we could take
locks in the proper order without worrying about concurrent updates from
other writers.  Reviewers of those patches opined that freezing the
entire fs to check and repair something was not sufficiently better than
unmounting to run fsck offline.  I don't agree with that 100%, but the
message was clear: find a way to repair things that minimizes the
quiet period where nobody can write to the filesystem.

Generally, building btree indexes online can be split into two phases: a
collection phase where we compute the records that will be put into the
new btree; and a construction phase, where we construct the physical
btree blocks and persist them.  While it's simple to hold resource locks
for the entirety of the two phases to ensure that the new index is
consistent with the rest of the system, we don't need to hold resource
locks during the collection phase if we have a means to receive live
updates of other work going on elsewhere in the system.

The goal of this patch, then, is to enable online fsck to learn about
metadata updates going on in other threads while it constructs a shadow
copy of the metadata records to verify or correct the real metadata.  To
minimize the overhead when online fsck isn't running, we use srcu
notifiers because they prioritize fast access to the notifier call chain
(particularly when the chain is empty) at a cost to configuring
notifiers.  Online fsck should be relatively infrequent, so this is
acceptable.

The intended usage model is fairly simple.  Code that modifies a
metadata structure of interest should declare a xfs_hook_chain structure
in some well defined place, and call xfs_hook_call whenever an update
happens.  Online fsck code should define a struct notifier_block and use
xfs_hook_add to attach the block to the chain, along with a function to
be called.  This function should synchronize with the fsck scanner to
update whatever in-memory data the scanner is collecting.  When
finished, xfs_hook_del removes the notifier from the list and waits for
them all to complete.

Originally, I selected srcu notifiers over blocking notifiers to
implement live hooks because they seemed to have fewer impacts to
scalability.  The per-call cost of srcu_notifier_call_chain is higher
(19ns) than blocking_notifier_ (4ns) in the single threaded case, but
blocking notifiers use an rwsem to stabilize the list.  Cacheline
bouncing for that rwsem is costly to runtime code when there are a lot
of CPUs running regular filesystem operations.  If there are no hooks
installed, this is a total waste of CPU time.

Therefore, I stuck with srcu notifiers, despite trading off single
threaded performance for multithreaded performance.  I also wasn't
thrilled with the very high teardown time for srcu notifiers, since the
caller has to wait for the next rcu grace period.  This can take a long
time if there are a lot of CPUs.

Then I discovered the jump label implementation of static keys.

Jump labels use kernel code patching to replace a branch with a nop sled
when the key is disabled.  IOWs, they can eliminate the overhead of
_call_chain when there are no hooks enabled.  This makes blocking
notifiers competitive again -- scrub runs faster because teardown of the
chain is a lot cheaper, and runtime code only pays the rwsem locking
overhead when scrub is actually running.

With jump labels enabled, calls to empty notifier chains are elided from
the call sites when there are no hooks registered, which means that the
overhead is 0.36ns when fsck is not running.  This is perfect for most
of the architectures that XFS is expected to run on (e.g. x86, powerpc,
arm64, s390x, riscv).

For architectures that don't support jump labels (e.g. m68k) the runtime
overhead of checking the static key is an atomic counter read.  This
isn't great, but it's still cheaper than taking a shared rwsem.
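
The usage model can be mimicked in single-threaded userspace C.  This is
a sketch only: struct hook, struct hook_chain and the hook_* functions
are hypothetical, there is no locking, and a plain integer test stands
in for the static key (closer to the fallback described above for
architectures without jump labels than to real code patching).

```c
#include <stddef.h>

struct hook;
typedef int (*hook_fn)(struct hook *hook, unsigned long action, void *data);

/* One registered callback; a stand-in for a notifier block. */
struct hook {
	hook_fn fn;
	struct hook *next;
};

/* A chain plus a cheap "anything registered?" switch. */
struct hook_chain {
	struct hook *head;
	int enabled;	/* number of registered hooks */
};

static void hook_add(struct hook_chain *chain, struct hook *hook, hook_fn fn)
{
	hook->fn = fn;
	hook->next = chain->head;
	chain->head = hook;
	chain->enabled++;
}

static void hook_del(struct hook_chain *chain, struct hook *hook)
{
	struct hook **pp;

	for (pp = &chain->head; *pp; pp = &(*pp)->next) {
		if (*pp == hook) {
			*pp = hook->next;
			chain->enabled--;
			return;
		}
	}
}

/* Updaters call this on every change; returns how many hooks ran. */
static int hook_call(struct hook_chain *chain, unsigned long action,
		     void *data)
{
	struct hook *h;
	int calls = 0;

	if (!chain->enabled)	/* fast path: no scanner is listening */
		return 0;
	for (h = chain->head; h; h = h->next) {
		h->fn(h, action, data);
		calls++;
	}
	return calls;
}

/* Example callback: add the action value to a counter passed as data. */
static int count_hook(struct hook *hook, unsigned long action, void *data)
{
	(void)hook;
	*(unsigned long *)data += action;
	return 0;
}
```

The early return in hook_call is the part that the static key makes
nearly free in the kernel; everything after it is only paid for while a
scanner has a hook installed.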

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: implement live inode scan for scrub
Darrick J. Wong [Thu, 22 Feb 2024 20:30:45 +0000 (12:30 -0800)]
xfs: implement live inode scan for scrub

This patch implements a live file scanner for online fsck functions that
require the ability to walk a filesystem to gather metadata records and
stay informed about metadata changes to files that have already been
visited.

The iscan structure consists of two inode number cursors: one to track
which inode we want to visit next, and a second one to track which
inodes have already been visited.  This second cursor is key to
capturing live updates to files previously scanned while the main thread
continues scanning -- any inode greater than this value hasn't been
scanned and can go on its way; any other update must be incorporated
into the collected data.  It is critical for the scanning thread to hold
exclusive access on the inode until after marking the inode visited.
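
A deliberately simplified model of the two cursors, in userspace C, is
shown below.  struct iscan and the iscan_* helpers are hypothetical; the
sketch assumes inode numbers start at 1 and scan in ascending order, and
it ignores locking, wraparound and batching.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy scan state; visited_ino == 0 means nothing has been visited yet. */
struct iscan {
	uint64_t cursor_ino;	/* next inode the scanner wants to visit */
	uint64_t visited_ino;	/* highest inode already marked visited */
};

/* Called by the scanner while it still holds the inode locked. */
static void iscan_mark_visited(struct iscan *is, uint64_t ino)
{
	is->visited_ino = ino;
	if (is->cursor_ino <= ino)
		is->cursor_ino = ino + 1;
}

/* Should a live update to `ino` be folded into the collected data? */
static bool iscan_want_live_update(const struct iscan *is, uint64_t ino)
{
	return ino <= is->visited_ino;
}
```

An update to an inode beyond visited_ino can be ignored because the
scanner will see the new state when it gets there; an update at or below
it would otherwise be lost.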

This new code is a separate patch from the patchsets adding callers for
the sake of enabling the author to move patches around his tree with
ease.  The intended usage model for this code is roughly:

	xchk_iscan_start(iscan, 0, 0);
	while ((error = xchk_iscan_iter(sc, iscan, &ip)) == 1) {
		xfs_ilock(ip, ...);
		/* capture inode metadata */
		xchk_iscan_mark_visited(iscan, ip);
		xfs_iunlock(ip, ...);

		xfs_irele(ip);
	}
	xchk_iscan_stop(iscan);
	if (error)
		return error;

Hook functions for live updates can then do:

	if (xchk_iscan_want_live_update(...))
		/* update the captured inode metadata */

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: speed up xfs_iwalk_adjust_start a little bit
Darrick J. Wong [Thu, 22 Feb 2024 20:30:44 +0000 (12:30 -0800)]
xfs: speed up xfs_iwalk_adjust_start a little bit

Replace the open-coded loop that recomputes freecount with a single call
to a bit weight function.
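
The general idea, shown in userspace C: both functions below count the
set bits in a 64-bit free mask.  The function names are made up for this
sketch, and __builtin_popcountll (a GCC/Clang builtin) stands in for
whatever bit weight helper the kernel code actually calls.

```c
#include <stdint.h>

/* Open-coded version: test each of the 64 bits in turn. */
static unsigned int freecount_loop(uint64_t free_mask)
{
	unsigned int count = 0;
	int i;

	for (i = 0; i < 64; i++)
		if (free_mask & (UINT64_C(1) << i))
			count++;
	return count;
}

/* Bit weight version: a single population count. */
static unsigned int freecount_weight(uint64_t free_mask)
{
	return (unsigned int)__builtin_popcountll(free_mask);
}
```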

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
13 months agoxfs: fix SEEK_HOLE/DATA for regions with active COW extents
Dave Chinner [Tue, 20 Feb 2024 22:49:28 +0000 (09:49 +1100)]
xfs: fix SEEK_HOLE/DATA for regions with active COW extents

A data corruption problem was reported by CoreOS image builders
when using reflink based disk image copies and then converting
them to qcow2 images. The converted images failed the conversion
verification step, and it was isolated down to the fact that
qemu-img uses SEEK_HOLE/SEEK_DATA to find the data it is supposed to
copy.

The reproducer allowed me to isolate the issue down to a region of
the file that had overlapping data and COW fork extents, and the
problem was that the COW fork extent was being reported in its
entirety by xfs_seek_iomap_begin() and so skipping over the real
data fork extents in that range.

This was somewhat hidden by the fact that 'xfs_bmap -vvp' reported
all the extents correctly, and reading the file completely (i.e. not
using seek to skip holes) would map the file correctly and all the
correct data extents are read. Hence the problem is isolated to just
the xfs_seek_iomap_begin() implementation.

Instrumentation with trace_printk made the problem obvious: we are
passing the wrong length to xfs_trim_extent() in
xfs_seek_iomap_begin(). We are passing the end_fsb, not the
maximum length of the extent we want to trim the map to. Hence the
COW extent map never gets trimmed to the start of the next data fork
extent, and so the seek code treats the entire COW fork extent as
unwritten and skips entirely over the data fork extents in that
range.
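
The class of bug is easy to reproduce in a userspace sketch.  Here
struct extent and trim_extent are hypothetical stand-ins that follow the
same (start, length) calling convention described above; the block
numbers in the usage note are invented.

```c
#include <stdint.h>

/* Toy extent mapping: file offset and length, both in blocks. */
struct extent {
	uint64_t startoff;
	uint64_t blockcount;
};

/* Trim ext so that it lies within [bno, bno + len). */
static void trim_extent(struct extent *ext, uint64_t bno, uint64_t len)
{
	uint64_t end = bno + len;
	uint64_t ext_end = ext->startoff + ext->blockcount;

	if (ext_end <= bno || ext->startoff >= end) {
		ext->blockcount = 0;	/* no overlap with the window */
		return;
	}
	if (ext->startoff < bno)
		ext->startoff = bno;
	if (ext_end > end)
		ext_end = end;
	ext->blockcount = ext_end - ext->startoff;
}
```

Suppose a COW extent covers blocks 100 to 199 and the next data fork
extent starts at block 120.  Calling trim_extent(&cow, 100, 120 - 100)
leaves 20 blocks, whereas passing the end block as the length,
trim_extent(&cow, 100, 120), leaves all 100 blocks untrimmed.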

Link: https://github.com/coreos/coreos-assembler/issues/3728
Fixes: 60271ab79d40 ("xfs: fix SEEK_DATA for speculative COW fork preallocation")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
13 months agoxfs: remove xfile_{get,put}_page
Darrick J. Wong [Mon, 19 Feb 2024 06:27:30 +0000 (07:27 +0100)]
xfs: remove xfile_{get,put}_page

These functions aren't used anymore, so get rid of them.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
13 months agoxfs: convert xfarray_pagesort to deal with large folios
Darrick J. Wong [Mon, 19 Feb 2024 06:27:29 +0000 (07:27 +0100)]
xfs: convert xfarray_pagesort to deal with large folios

Convert xfarray_pagesort to handle large folios by introducing a new
xfile_get_folio routine that can return a folio of arbitrary size, and
using heapsort on the full folio.  This also corrects an off-by-one bug
in the calculation of len in xfarray_pagesort that was papered over by
xfarray_want_pagesort.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
13 months agoxfs: fix a comment in xfarray.c
Christoph Hellwig [Mon, 19 Feb 2024 06:27:28 +0000 (07:27 +0100)]
xfs: fix a comment in xfarray.c

xfiles are shmem files, not memfds.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
13 months agoxfs: remove xfarray_sortinfo.page_kaddr
Christoph Hellwig [Mon, 19 Feb 2024 06:27:27 +0000 (07:27 +0100)]
xfs: remove xfarray_sortinfo.page_kaddr

Now that xfile pages don't need kmapping, there is no need to cache
the kernel virtual address for them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>