linux.git
17 months ago bcachefs: Drop all btree locks when submitting btree node reads
Kent Overstreet [Fri, 9 Apr 2021 02:26:53 +0000 (22:26 -0400)]
bcachefs: Drop all btree locks when submitting btree node reads

As a rule we don't want to be holding btree locks while submitting IO -
this will improve overall filesystem latency.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: More topology repair code
Kent Overstreet [Mon, 7 Jun 2021 17:28:50 +0000 (13:28 -0400)]
bcachefs: More topology repair code

This improves the handling of overlapping btree nodes; now, we handle
the case where one btree node completely overwrites another.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fix a buffer overrun
Kent Overstreet [Thu, 10 Jun 2021 17:21:39 +0000 (13:21 -0400)]
bcachefs: Fix a buffer overrun

In make_extent_indirect(), we were allocating too small of a buffer for
the new indirect extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Don't mark superblocks past end of usable space
Kent Overstreet [Wed, 9 Jun 2021 02:50:30 +0000 (22:50 -0400)]
bcachefs: Don't mark superblocks past end of usable space

bcachefs-tools recently started putting a backup superblock at the end
of the device. This causes a problem if the bucket size doesn't divide
the device size - but we can fix it by just skipping marking that part.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fix a spurious debug mode assertion
Kent Overstreet [Tue, 8 Jun 2021 20:29:24 +0000 (16:29 -0400)]
bcachefs: Fix a spurious debug mode assertion

When we switched to using bch2_btree_bset_insert_key() for extents it
turned out it started leaving invalid keys around - of type deleted but
nonzero size - but this is fine (if ugly) because they're never written
out.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fix unitialized use of a value
Brett Holman [Sun, 6 Jun 2021 15:29:42 +0000 (09:29 -0600)]
bcachefs: Fix unitialized use of a value

Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: do not compile acl mod on minimal config
Dan Robertson [Sat, 5 Jun 2021 23:03:16 +0000 (19:03 -0400)]
bcachefs: do not compile acl mod on minimal config

Do not compile the acl.o target if BCACHEFS_POSIX_ACL is not enabled.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: btree_iter->should_be_locked
Kent Overstreet [Fri, 4 Jun 2021 21:17:45 +0000 (17:17 -0400)]
bcachefs: btree_iter->should_be_locked

Add a field to struct btree_iter for tracking whether it should be
locked - this fixes spurious transaction restarts in
bch2_trans_relock().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Improve btree iterator tracepoints
Kent Overstreet [Fri, 4 Jun 2021 19:18:10 +0000 (15:18 -0400)]
bcachefs: Improve btree iterator tracepoints

This patch adds some new tracepoints to the btree iterator code, and
adds new fields to the existing tracepoints - primarily for the iterator
position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Preallocate transaction mem
Kent Overstreet [Thu, 3 Jun 2021 03:31:42 +0000 (23:31 -0400)]
bcachefs: Preallocate transaction mem

This helps avoid transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Check for errors from bch2_trans_update()
Kent Overstreet [Wed, 2 Jun 2021 04:15:07 +0000 (00:15 -0400)]
bcachefs: Check for errors from bch2_trans_update()

Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs; Check for allocator thread shutdown
Kent Overstreet [Tue, 1 Jun 2021 00:52:39 +0000 (20:52 -0400)]
bcachefs; Check for allocator thread shutdown

We were missing a kthread_should_stop() check in the loop in
bch2_invalidate_buckets(), very occasionally leading to us getting stuck
while shutting down.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Journal space calculation fix
Kent Overstreet [Mon, 31 May 2021 04:13:39 +0000 (00:13 -0400)]
bcachefs: Journal space calculation fix

When devices have different bucket sizes, we may accumulate a journal
write that doesn't fit on some of our devices - previously, we'd
underflow when calculating space on that device and then everything
would get weird.
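
The underflow is the usual hazard of subtracting unsigned quantities. A minimal sketch of a clamped calculation follows; space_remaining() and its parameters are hypothetical names for illustration, not code from this patch:

```c
#include <stdint.h>

/*
 * Hypothetical helper: sectors left on a device after reserving room
 * for a pending journal write. With unsigned arithmetic, a plain
 * `total - needed` wraps around to a huge value when needed > total,
 * so clamp the result to zero instead.
 */
static uint64_t space_remaining(uint64_t total, uint64_t needed)
{
	return needed > total ? 0 : total - needed;
}
```

With the clamp, a device that is too small for the pending write reports zero space rather than a wrapped-around value.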

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Don't fragment extents when making them indirect
Kent Overstreet [Sun, 21 Mar 2021 02:14:10 +0000 (22:14 -0400)]
bcachefs: Don't fragment extents when making them indirect

This fixes a "disk usage increased without a reservation" bug, when
reflinking compressed extents. Also, there's no good reason for reflink
to be fragmenting extents anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fsck for reflink refcounts
Kent Overstreet [Sun, 23 May 2021 06:31:33 +0000 (02:31 -0400)]
bcachefs: Fsck for reflink refcounts

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Assorted endianness fixes
Kent Overstreet [Sun, 23 May 2021 21:04:13 +0000 (17:04 -0400)]
bcachefs: Assorted endianness fixes

Found by sparse

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fix a deadlock
Kent Overstreet [Mon, 11 Sep 2023 03:33:08 +0000 (23:33 -0400)]
bcachefs: Fix a deadlock

Waiting on a btree node write with btree locks held can deadlock, if the
write errors: the write error path has to do a btree update to drop
the pointer to the replica that errored.

The interior update path has to wait on in flight btree writes before
freeing nodes on disk. Previously, this was done in
bch2_btree_interior_update_will_free_node(), and could deadlock; now, we
just stash a pointer to the node and do it in
btree_update_nodes_written(), just prior to the transactional part of
the update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Split out btree_error_wq
Kent Overstreet [Fri, 28 May 2021 01:38:00 +0000 (21:38 -0400)]
bcachefs: Split out btree_error_wq

We can't use btree_update_wq because btree updates may be waiting on
btree writes to complete.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix pathalogical behaviour with inode sharding by cpu ID
Kent Overstreet [Fri, 28 May 2021 09:06:18 +0000 (05:06 -0400)]
bcachefs: Fix pathalogical behaviour with inode sharding by cpu ID

If the transaction restarts on a different CPU, it could end up needing
to read in a different btree node, which makes another transaction
restart more likely...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fix journal write error path
Kent Overstreet [Fri, 28 May 2021 03:16:25 +0000 (23:16 -0400)]
bcachefs: Fix journal write error path

Journal write errors were racing with the submission path - potentially
causing writes to other replicas to not get submitted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Reflink refcount fix
Kent Overstreet [Fri, 28 May 2021 01:16:50 +0000 (21:16 -0400)]
bcachefs: Reflink refcount fix

__bch2_trans_mark_reflink_p wasn't always correctly returning the number
of sectors processed - the new logic is a bit more straightforward
overall too.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Add an option to control sharding new inode numbers
Kent Overstreet [Fri, 28 May 2021 00:20:20 +0000 (20:20 -0400)]
bcachefs: Add an option to control sharding new inode numbers

We're seeing a bug where inode creates end up spinning in
bch2_inode_create - disabling sharding will simplify what we're testing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Don't use bch_write_op->cl for delivering completions
Kent Overstreet [Sat, 29 Oct 2022 06:47:33 +0000 (02:47 -0400)]
bcachefs: Don't use bch_write_op->cl for delivering completions

We already had op->end_io as an alternative mechanism to op->cl.parent
for delivering write completions; this switches all code paths to using
op->end_io.

Two reasons:
 - op->end_io is more efficient, due to fewer atomic ops; this completes
   the conversion that was originally only done for the direct IO path.
 - We'll be restructuring the write path to use a different mechanism for
   punting to process context; refactoring to not use op->cl will make
   that easier.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Kill bch_write_op.index_update_fn
Kent Overstreet [Sat, 29 Oct 2022 03:57:01 +0000 (23:57 -0400)]
bcachefs: Kill bch_write_op.index_update_fn

This deletes bch_write_op.index_update_fn: indirect function calls have
gotten considerably more expensive post spectre/meltdown, and we only
have two different index_update_fns - this patch adds a flag to specify
which one to use (normal vs. data move path).
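
The general shape of that change can be sketched as follows. Everything here (struct write_op, the is_move flag, and the two stand-in update functions) is hypothetical and only illustrates replacing a function pointer with a flag; it is not the actual bch_write_op layout:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two index update paths. */
static int index_update_normal(int v) { return v + 1; }
static int index_update_move(int v)   { return v * 2; }

struct write_op {
	bool is_move;	/* hypothetical flag replacing a function pointer */
	int  value;
};

static int do_index_update(const struct write_op *op)
{
	/* Direct calls selected by a flag: no indirect branch is needed. */
	return op->is_move
		? index_update_move(op->value)
		: index_update_normal(op->value);
}
```

With only two possible targets, a conditional direct call gives the compiler and the branch predictor more to work with than an indirect call does.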

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Inline fastpath of bch2_disk_reservation_add()
Kent Overstreet [Tue, 1 Nov 2022 02:28:09 +0000 (22:28 -0400)]
bcachefs: Inline fastpath of bch2_disk_reservation_add()

The fastpath now doesn't even disable preemption - instead we use a (non
locked) cmpxchg.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Don't use uuid in tracepoints
Kent Overstreet [Thu, 27 May 2021 23:15:44 +0000 (19:15 -0400)]
bcachefs: Don't use uuid in tracepoints

%pU for printing out pointers to uuids doesn't work in perf trace

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Add a tracepoint for copygc waiting
Kent Overstreet [Wed, 26 May 2021 05:03:35 +0000 (01:03 -0400)]
bcachefs: Add a tracepoint for copygc waiting

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Add a cond_resched call to the copygc main loop
Kent Overstreet [Tue, 25 May 2021 22:42:05 +0000 (18:42 -0400)]
bcachefs: Add a cond_resched call to the copygc main loop

We seem to have a bug where the copygc thread ends up spinning and
making the system unusable - this will at least prevent it from locking
up the machine, and it's a good thing to have anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fix a null ptr deref
Kent Overstreet [Sun, 23 May 2021 22:42:51 +0000 (18:42 -0400)]
bcachefs: Fix a null ptr deref

bch2_btree_iter_peek() won't always return a key - whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fix an issue with inconsistent btree writes after unclean shutdown
Kent Overstreet [Sun, 23 May 2021 01:43:20 +0000 (21:43 -0400)]
bcachefs: Fix an issue with inconsistent btree writes after unclean shutdown

After unclean shutdown, btree writes may have completed on one device
and not others - and this inconsistency could lead us to writing new
bsets with a gap in our btree node in one of our replicas.

Fortunately, this is only an issue with bsets that are newer than the
most recent journal flush, and we already have a mechanism for detecting
and blacklisting those. We just need to make sure to start new btree
writes after the most recent _non_ blacklisted bset.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Improve FS_IOC_GOINGDOWN ioctl
Kent Overstreet [Sun, 23 May 2021 01:13:17 +0000 (21:13 -0400)]
bcachefs: Improve FS_IOC_GOINGDOWN ioctl

We weren't interpreting the flags argument at all.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Add a workqueue for btree io completions
Kent Overstreet [Sat, 22 May 2021 21:37:25 +0000 (17:37 -0400)]
bcachefs: Add a workqueue for btree io completions

Also, clean up workqueue usage - we shouldn't be using system
workqueues, pretty much everything we do needs to be on our own
WQ_MEM_RECLAIM workqueues.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: rewrote prefetch asm in gas syntax for clang compatibility
Brett Holman [Fri, 21 May 2021 22:45:38 +0000 (16:45 -0600)]
bcachefs: rewrote prefetch asm in gas syntax for clang compatibility

Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Add a debug mode that always reads from every btree replica
Kent Overstreet [Sat, 22 May 2021 03:57:37 +0000 (23:57 -0400)]
bcachefs: Add a debug mode that always reads from every btree replica

There's a new module parameter, verify_all_btree_replicas, that enables
reading from every btree replica when reading in btree nodes and
comparing them against each other. We've been seeing some strange btree
corruption - this will hopefully aid in tracking it down and catching it
more often.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Don't repair btree nodes until after interior journal replay is done
Kent Overstreet [Fri, 21 May 2021 20:06:54 +0000 (16:06 -0400)]
bcachefs: Don't repair btree nodes until after interior journal replay is done

We need the btree to be in a consistent state before we can rewrite
btree nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fix an uninitialized var
Kent Overstreet [Fri, 21 May 2021 00:47:27 +0000 (20:47 -0400)]
bcachefs: Fix an uninitialized var

This fixes a valgrind complaint.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fix for buffered writes getting -ENOSPC
Kent Overstreet [Thu, 20 May 2021 19:49:23 +0000 (15:49 -0400)]
bcachefs: Fix for buffered writes getting -ENOSPC

Buffered writes may have to increase their disk reservation at btree
update time, due to compression and erasure coding being unpredictable:
O_DIRECT writes should be checking for -ENOSPC, but buffered writes have
already been accepted and should not.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fix inode backpointers in RENAME_OVERWRITE
Kent Overstreet [Thu, 20 May 2021 04:09:47 +0000 (00:09 -0400)]
bcachefs: Fix inode backpointers in RENAME_OVERWRITE

When we delete the dirent an inode points to, we need to zero out the
backpointer fields - this was missed in the RENAME_OVERWRITE case.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Make bch2_remap_range respect O_SYNC
Kent Overstreet [Thu, 20 May 2021 01:21:49 +0000 (21:21 -0400)]
bcachefs: Make bch2_remap_range respect O_SYNC

Caught by xfstest generic/628

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Split extents if necessary in bch2_trans_update()
Kent Overstreet [Wed, 19 May 2021 03:17:03 +0000 (23:17 -0400)]
bcachefs: Split extents if necessary in bch2_trans_update()

Currently, we handle multiple overlapping extents in the same
transaction commit by doing fixups in bch2_trans_update() - this patch
extends that to split updates when necessary. The next patch that
changes the reflink code to not fragment extents when making them
indirect will require this.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Ratelimiting for writeback IOs
Kent Overstreet [Wed, 19 May 2021 03:53:43 +0000 (23:53 -0400)]
bcachefs: Ratelimiting for writeback IOs

Writeback throttling is a kernel config option and not always enabled.
When it's not enabled we need a fallback, to avoid unbounded memory
pinning and work item backlogs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: statfs resports incorrect avail blocks
Dan Robertson [Wed, 19 May 2021 00:36:20 +0000 (20:36 -0400)]
bcachefs: statfs resports incorrect avail blocks

The current implementation of bch_statfs does not scale the number of
available blocks provided in f_bavail by the reserve factor. This causes
an allocation of a file of this size to fail.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix for bch2_bkey_pack_pos() not initializing len/version fields
Kent Overstreet [Mon, 17 May 2021 20:43:30 +0000 (16:43 -0400)]
bcachefs: Fix for bch2_bkey_pack_pos() not initializing len/version fields

This bug led to push_whiteout() generating whiteouts that failed
bch2_bkey_invalid() due to nonzero length fields - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fix a memcpy call
Kent Overstreet [Mon, 17 May 2021 20:10:06 +0000 (16:10 -0400)]
bcachefs: Fix a memcpy call

Not supposed to pass a null ptr to memcpy (even if the size is 0).
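
One common way to satisfy that rule is to skip the call when there is nothing to copy. A minimal sketch, assuming a hypothetical wrapper named memcpy_maybe_empty() (the commit itself does not say how the call was fixed):

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper: standards such as C11 require valid pointers
 * for memcpy() even when the length is zero, so skip the call entirely
 * in that case.
 */
static void *memcpy_maybe_empty(void *dst, const void *src, size_t len)
{
	if (len)
		memcpy(dst, src, len);
	return dst;
}
```

Callers that may legitimately hold a null source with a zero length can then use the wrapper without tripping sanitizers.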

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Fix bch2_extent_can_insert() call
Kent Overstreet [Mon, 17 May 2021 04:28:50 +0000 (00:28 -0400)]
bcachefs: Fix bch2_extent_can_insert() call

It was being skipped when hole punching, leading to problems when
splitting compressed extents.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: Make sure to pass a disk reservation to bch2_extent_update()
Kent Overstreet [Mon, 17 May 2021 04:08:06 +0000 (00:08 -0400)]
bcachefs: Make sure to pass a disk reservation to bch2_extent_update()

It's needed when we split an existing compressed extent - we get a null
ptr deref without it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
17 months ago bcachefs: made changes to support clang, fixed a couple bugs
Brett Holman [Mon, 17 May 2021 03:53:55 +0000 (21:53 -0600)]
bcachefs: made changes to support clang, fixed a couple bugs

fs/bcachefs/bset.c              edited prefetch macro to add clang support
fs/bcachefs/btree_iter.c        bugfix: initialize iter->real_pos in bch2_btree_iter_init for later use
fs/bcachefs/io.c                bugfix: eliminated undefined behavior (negative bitshift)
fs/bcachefs/buckets.c           bugfix: invert sign to handle 64bit abs()
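
As background for the io.c item: shifting by a negative count is undefined behaviour in C. A minimal sketch of one way to avoid it is below; shift_signed() is a hypothetical helper and is not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: scale v by 2^shift, where shift may be negative.
 * `v << shift` with a negative shift count is undefined behaviour, so
 * branch on the sign and shift right by the magnitude instead.
 */
static uint64_t shift_signed(uint64_t v, int shift)
{
	/* Shift counts of 64 or more are also undefined for a 64-bit type. */
	assert(shift > -64 && shift < 64);

	return shift >= 0 ? v << shift : v >> -shift;
}
```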

Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix locking in __bch2_set_nr_journal_buckets()
Kent Overstreet [Mon, 17 May 2021 03:46:08 +0000 (23:46 -0400)]
bcachefs: Fix locking in __bch2_set_nr_journal_buckets()

We weren't holding mark_lock correctly - it's needed for the new_fs
path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: properly initialize used values
Dan Robertson [Sat, 15 May 2021 00:02:44 +0000 (20:02 -0400)]
bcachefs: properly initialize used values

 - Ensure the second key value in bch_hash_info is initialized to zero
   if the info type is of type BCH_STR_HASH_SIPHASH.

 - Initialize the possibly returned value in bch2_inode_create. Assuming
   bch2_btree_iter_peek returns bkey_s_c_null, the uninitialized value
   of ret could be returned to the user as an error pointer.

 - Fix compiler warning in initialization of bkey_s_c_stripe

fs/bcachefs/buckets.c:1646:35: warning: suggest braces around initialization
of subobject [-Wmissing-braces]
        struct bkey_s_c_stripe new_s = { NULL };
                                         ^~~~

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Repair code for multiple types of data in same bucket
Kent Overstreet [Sat, 15 May 2021 01:28:37 +0000 (21:28 -0400)]
bcachefs: Repair code for multiple types of data in same bucket

bch2_check_fix_ptrs() is awkward, we need to find a way to improve it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix out of bounds read in fs usage ioctl
Dan Robertson [Wed, 5 May 2021 11:09:43 +0000 (07:09 -0400)]
bcachefs: Fix out of bounds read in fs usage ioctl

Fix a possible read out of bounds if bch2_ioctl_fs_usage is called when
replica_entries_bytes is set to a value that is smaller than the size
of bch_replicas_usage.
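
The kind of check involved can be sketched as below. struct usage_entry and check_entries_bytes() are hypothetical stand-ins for illustration; the real structure and the real check in bch2_ioctl_fs_usage may look different:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size header that precedes variable-length data. */
struct usage_entry {
	uint64_t sectors;
	uint8_t  nr_devs;
};

/*
 * Reject a caller-supplied size that cannot hold even one header,
 * before any code reads header fields out of the buffer.
 */
static int check_entries_bytes(size_t user_bytes)
{
	if (user_bytes < sizeof(struct usage_entry))
		return -EINVAL;
	return 0;
}
```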

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix null deref in bch2_ioctl_read_super
Dan Robertson [Thu, 13 May 2021 00:54:37 +0000 (20:54 -0400)]
bcachefs: Fix null deref in bch2_ioctl_read_super

Do not attempt to cleanup the returned value of bch2_device_lookup if
the returned value was an error pointer. We currently check to see if
the returned value is null and run the cleanup otherwise. As a result,
we attempt to run the cleanup on an error pointer.
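
The distinction between a null pointer and an error pointer can be shown with a userspace imitation of the kernel's ERR_PTR() convention. The helpers below (err_ptr(), is_err_or_null(), put_device()) are illustrative re-implementations under that assumption, not the kernel's own definitions:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitation: the top 4095 addresses encode negative errnos. */
#define MAX_ERRNO 4095

static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static bool is_err_or_null(const void *p)
{
	return !p || (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static int cleanups;	/* counts cleanup calls, for illustration only */

/* Hypothetical cleanup path: only release a pointer that is really valid. */
static void put_device(void *dev)
{
	if (is_err_or_null(dev))
		return;
	cleanups++;
}
```

A check for null alone lets an error pointer through to the cleanup, which is the failure mode the commit message describes.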

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix possible null deref on mount
Dan Robertson [Wed, 12 May 2021 18:07:57 +0000 (14:07 -0400)]
bcachefs: Fix possible null deref on mount

Ensure that the block device pointer in a superblock handle is not
null before dereferencing it in bch2_dev_to_fs. The block device pointer
may be null when mounting a new bcachefs filesystem if another mounted
bcachefs filesystem exists that has at least one device that is offline.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix error in parsing of mount options
Dan Robertson [Sun, 9 May 2021 22:52:23 +0000 (18:52 -0400)]
bcachefs: Fix error in parsing of mount options

When parsing the mount options, duplicate the given options. This is
required as the options are parsed twice and strsep is used in parsing.
The options will be modified into a possibly invalid options set for the
second round of parsing if the options are not duplicated before
parsing.
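
The hazard can be reproduced in userspace. The sketch below uses ISO C strtok() as a stand-in for strsep(), since both overwrite separators in the string they scan; count_opts() and the option strings are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser: count comma-separated options. The tokenizer
 * writes NUL bytes into the string it scans, so work on a private copy
 * and leave the caller's string intact for a second parsing pass.
 */
static int count_opts(const char *opts)
{
	size_t len = strlen(opts) + 1;
	char *copy = malloc(len);
	int n = 0;

	if (!copy)
		return -1;
	memcpy(copy, opts, len);

	for (char *tok = strtok(copy, ","); tok; tok = strtok(NULL, ","))
		n++;

	free(copy);
	return n;
}
```

Because only the copy is modified, a second call on the same string sees the same options as the first.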

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: avoid out-of-bounds in split_devs
Stijn Tintel [Thu, 13 May 2021 20:08:47 +0000 (23:08 +0300)]
bcachefs: avoid out-of-bounds in split_devs

Calling mount with an empty source string causes an out-of-bounds error
in split_devs. Check the length of the source string to avoid this.

Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Make sure to use BTREE_ITER_PREFETCH in fsck
Kent Overstreet [Fri, 14 May 2021 20:56:26 +0000 (16:56 -0400)]
bcachefs: Make sure to use BTREE_ITER_PREFETCH in fsck

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix bch2_btree_iter_peek_with_updates()
Kent Overstreet [Fri, 30 Apr 2021 01:44:05 +0000 (21:44 -0400)]
bcachefs: Fix bch2_btree_iter_peek_with_updates()

By not re-fetching the next update we were going into an infinite loop.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix reflink trigger
Kent Overstreet [Tue, 4 May 2021 00:31:27 +0000 (20:31 -0400)]
bcachefs: Fix reflink trigger

The trigger for reflink pointers wasn't always incrementing/decrementing
the refcounts correctly - this patch fixes that logic.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix some refcounting bugs
Kent Overstreet [Sat, 8 May 2021 00:43:43 +0000 (20:43 -0400)]
bcachefs: Fix some refcounting bugs

We really need debug mode assertions that ca->ref and ca->io_ref are
used correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix oob write in __bch2_btree_node_write
Dan Robertson [Sat, 8 May 2021 02:29:02 +0000 (22:29 -0400)]
bcachefs: Fix oob write in __bch2_btree_node_write

Fix a possible out of bounds write in __bch2_btree_node_write when
the data buffer padding is cleared up to the block size. The out of
bounds write is possible if the data buffer's size is not a multiple
of the block size.
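
One way to guard against this class of bug is to clamp the padding range to the buffer. A minimal sketch, assuming a hypothetical helper clear_padding() (the actual fix in __bch2_btree_node_write may be structured differently):

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: zero the padding after `used` bytes, up to the
 * next multiple of block_size, but never beyond the end of the buffer.
 * block_size must be nonzero. Returns the number of bytes cleared.
 */
static size_t clear_padding(unsigned char *buf, size_t buf_size,
			    size_t used, size_t block_size)
{
	size_t end = (used + block_size - 1) / block_size * block_size;

	if (end > buf_size)
		end = buf_size;	/* buffer is not a multiple of block_size */
	if (used >= end)
		return 0;

	memset(buf + used, 0, end - used);
	return end - used;
}
```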

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix usage of last_seq + encryption
Kent Overstreet [Sat, 8 May 2021 03:32:26 +0000 (23:32 -0400)]
bcachefs: Fix usage of last_seq + encryption

jset->last_seq is in the region that's encrypted - on journal write
completion, we were using it and getting garbage. This patch shadows it
to fix.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Clean up bch2_btree_and_journal_walk()
Kent Overstreet [Thu, 29 Apr 2021 19:37:47 +0000 (15:37 -0400)]
bcachefs: Clean up bch2_btree_and_journal_walk()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Mark newly allocated btree nodes as accessed
Kent Overstreet [Thu, 29 Apr 2021 20:55:26 +0000 (16:55 -0400)]
bcachefs: Mark newly allocated btree nodes as accessed

This was a major oversight - this means under memory pressure we can end
up reading in a btree node, then having it evicted before we get to use
it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix time handling
Kent Overstreet [Thu, 29 Apr 2021 02:51:42 +0000 (22:51 -0400)]
bcachefs: Fix time handling

There were some overflows in the time conversion functions - fix this by
converting tv_sec and tv_nsec separately. Also, set sb->time_min and
sb->time_max.
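
Why converting the two fields separately helps can be sketched as follows. to_fs_time() and its nsec_per_unit parameter are hypothetical, and the sketch assumes the unit divides one second evenly; it is not the conversion function from the patch:

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical conversion of a (seconds, nanoseconds) pair into
 * filesystem time units of `nsec_per_unit` nanoseconds each.
 * Scaling the two fields separately avoids forming the intermediate
 * value sec * NSEC_PER_SEC, which overflows int64_t once sec exceeds
 * roughly 9.2 billion.
 */
static int64_t to_fs_time(int64_t sec, int64_t nsec, int64_t nsec_per_unit)
{
	int64_t units_per_sec = NSEC_PER_SEC / nsec_per_unit;

	return sec * units_per_sec + nsec / nsec_per_unit;
}
```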

Fixes xfstest generic/258.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Add a tracepoint for when we block on journal reclaim
Kent Overstreet [Thu, 29 Apr 2021 04:21:54 +0000 (00:21 -0400)]
bcachefs: Add a tracepoint for when we block on journal reclaim

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Make sure to initialize j->last_flushed
Kent Overstreet [Thu, 29 Apr 2021 02:12:07 +0000 (22:12 -0400)]
bcachefs: Make sure to initialize j->last_flushed

If the journal reclaim thread makes it to the timeout without ever
initializing j->last_flushed, we could end up sleeping for a very long
time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Ensure that fpunch updates inode timestamps
Kent Overstreet [Wed, 28 Apr 2021 23:36:12 +0000 (19:36 -0400)]
bcachefs: Ensure that fpunch updates inode timestamps

Fixes xfstests generic/059

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Change copygc wait amount to be min of per device waits
Kent Overstreet [Tue, 27 Apr 2021 18:03:13 +0000 (14:03 -0400)]
bcachefs: Change copygc wait amount to be min of per device waits

We're seeing a filesystem get stuck when all devices but one have no
more reclaimable buckets - because the copygc wait amount is currently
filesystem wide.

This patch should fix that, possibly at the expense of running too
much when only one or a few devices are full and the rebalance thread
needs to move data around.
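
The idea of taking the minimum across devices can be sketched as below; min_wait() and its per-device array are hypothetical names, not the copygc code itself:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given how long copygc could wait on each device,
 * return the shortest wait, so a single full device is enough to wake
 * copygc even when the other devices have plenty of room.
 */
static uint64_t min_wait(const uint64_t *per_dev_wait, size_t nr_devs)
{
	uint64_t wait = UINT64_MAX;

	for (size_t i = 0; i < nr_devs; i++)
		if (per_dev_wait[i] < wait)
			wait = per_dev_wait[i];
	return wait;
}
```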

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Change bch2_btree_key_cache_count() to exclude dirty keys
Kent Overstreet [Tue, 27 Apr 2021 18:02:00 +0000 (14:02 -0400)]
bcachefs: Change bch2_btree_key_cache_count() to exclude dirty keys

We're seeing livelocks that appear to be due to
bch2_btree_key_cache_scan repeatedly scanning and blocking other tasks
from using the key cache lock - we probably shouldn't be reporting
objects that can't actually be freed yet.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Call bch2_inconsistent_error() on missing stripe/indirect extent
Kent Overstreet [Fri, 30 Apr 2021 02:32:44 +0000 (22:32 -0400)]
bcachefs: Call bch2_inconsistent_error() on missing stripe/indirect extent

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: New tracepoint for bch2_trans_get_iter()
Kent Overstreet [Thu, 29 Apr 2021 20:56:17 +0000 (16:56 -0400)]
bcachefs: New tracepoint for bch2_trans_get_iter()

Trying to debug an issue where after traverse_all() we shouldn't have to
traverse any iterators... yet we are

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix __bch2_trans_get_iter()
Kent Overstreet [Tue, 27 Apr 2021 15:12:17 +0000 (11:12 -0400)]
bcachefs: Fix __bch2_trans_get_iter()

We need to also set iter->uptodate to indicate it needs to be traversed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Evict btree nodes we're deleting
Kent Overstreet [Sun, 25 Apr 2021 20:24:03 +0000 (16:24 -0400)]
bcachefs: Evict btree nodes we're deleting

There was a bug that led to duplicate btree node pointers being inserted
at the wrong level. The new topology repair code can fix that, except
that the btree cache code gets confused when we read in a btree node
from the pointer that was at the wrong level. This patch also evicts
nodes that we're deleting, which nicely solves the problem.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: New check_nlinks algorithm for snapshots
Kent Overstreet [Thu, 22 Apr 2021 01:08:49 +0000 (21:08 -0400)]
bcachefs: New check_nlinks algorithm for snapshots

With snapshots, using a radix tree for the table of link counts won't
work anymore because we also need to distinguish between inodes with
different snapshot IDs. Instead, this patch builds up a sorted array of
inodes that have hardlinks that we can binary search on - taking
advantage of the fact that with inode backpointers, the check_nlinks()
pass _only_ needs to concern itself with inodes that have hardlinks now.
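
The data structure described above can be sketched in userspace with qsort() and bsearch(). struct nlink_entry, nlink_cmp() and nlink_find() are hypothetical names for illustration, not the fsck code:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table entry: one inode in one snapshot, plus a link count. */
struct nlink_entry {
	uint64_t inum;
	uint32_t snapshot;
	uint32_t count;
};

/* Order by inode number, then by snapshot ID. */
static int nlink_cmp(const void *a, const void *b)
{
	const struct nlink_entry *l = a, *r = b;

	if (l->inum != r->inum)
		return l->inum < r->inum ? -1 : 1;
	if (l->snapshot != r->snapshot)
		return l->snapshot < r->snapshot ? -1 : 1;
	return 0;
}

/* Binary search a table previously sorted with nlink_cmp(). */
static struct nlink_entry *nlink_find(struct nlink_entry *tbl, size_t nr,
				      uint64_t inum, uint32_t snapshot)
{
	struct nlink_entry key = { .inum = inum, .snapshot = snapshot };

	return bsearch(&key, tbl, nr, sizeof(*tbl), nlink_cmp);
}
```

A caller would fill the array with the inodes that have hardlinks, sort it once with qsort() and nlink_cmp(), then bump `count` on the entry returned by nlink_find() for each dirent that references it.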

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: Fix a null ptr deref
Kent Overstreet [Sun, 25 Apr 2021 02:33:25 +0000 (22:33 -0400)]
bcachefs: Fix a null ptr deref

Fix a few memory safety issues, found by asan in userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months ago bcachefs: New and improved topology repair code
Kent Overstreet [Sat, 24 Apr 2021 20:32:35 +0000 (16:32 -0400)]
bcachefs: New and improved topology repair code

This splits out btree topology repair into a separate pass, and makes
some improvements:
 - When we have to pick which of two overlapping nodes to drop keys
   from, we use the btree node header sequence number to preserve the
   newer node

 - the gc code has been changed so that it doesn't bail out if we're
   continuing/ignoring on fsck error - this way the dump tool can skip
   running the repair pass but still walk all reachable metadata

 - add a new superblock flag indicating when a filesystem is known to
   have btree topology issues, and the topology repair pass should be
   run

 - changing the start/end of a node might mean keys in that node have to
   be deleted: this patch handles that better by splitting it out into a
   separate function and running it explicitly in the topology repair
   code; previously those keys were only being dropped when the btree
   node was read in.
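
The first point in the list above (keep the newer of two overlapping nodes) might be sketched as follows. This is an illustration under assumptions, not the patch's code: struct node_range and resolve_overlap() are hypothetical, keys are modelled as plain integers, and only partial overlap is handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a btree node's key range and header sequence number. */
struct node_range {
	uint64_t	min;
	uint64_t	max;
	uint64_t	seq;
};

/*
 * a sorts before b and the two ranges partially overlap.
 * The node with the higher sequence number keeps its full range;
 * the older node's range is trimmed so the two no longer overlap.
 */
static void resolve_overlap(struct node_range *a, struct node_range *b)
{
	assert(a->min < b->min);	/* a starts first */
	assert(a->max >= b->min);	/* ranges overlap */
	assert(a->max < b->max);	/* neither node fully covers the other */

	if (a->seq >= b->seq)
		b->min = a->max + 1;	/* a is newer: trim the start of b */
	else
		a->max = b->min - 1;	/* b is newer: trim the end of a */
}
```

After a range is trimmed, any keys in the older node that now fall outside it have to be dropped, which is the step the last point in the list refers to.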

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Fix key cache assertion
Kent Overstreet [Sat, 24 Apr 2021 22:02:59 +0000 (18:02 -0400)]
bcachefs: Fix key cache assertion

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: New helper __bch2_btree_insert_keys_interior()
Kent Overstreet [Fri, 23 Apr 2021 23:25:27 +0000 (19:25 -0400)]
bcachefs: New helper __bch2_btree_insert_keys_interior()

Consolidate common parts of bch2_btree_insert_keys_interior() and
btree_split_insert_keys() - prep work for adding some new topology
assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Rewrite btree nodes with errors
Kent Overstreet [Sat, 24 Apr 2021 06:47:41 +0000 (02:47 -0400)]
bcachefs: Rewrite btree nodes with errors

This patch adds self healing functionality for btree nodes - if we
notice a problem when reading a btree node, we just rewrite it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Fix bch2_verify_keylist_sorted
Kent Overstreet [Sat, 24 Apr 2021 04:59:29 +0000 (00:59 -0400)]
bcachefs: Fix bch2_verify_keylist_sorted

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Fix an out of bounds read
Kent Overstreet [Sat, 24 Apr 2021 04:42:02 +0000 (00:42 -0400)]
bcachefs: Fix an out of bounds read

bch2_varint_decode() can read up to 7 bytes past the end of the buffer,
which means we need to allocate slightly larger key cache buffers.
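
To illustrate why a decoder can read past the encoded value, here is a toy sketch. It does not reproduce the bcachefs varint format: the encoding (a length byte followed by little-endian payload bytes), the toy_* names and DECODE_OVERREAD are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Worst-case bytes the decoder touches beyond the encoded value. */
#define DECODE_OVERREAD	7

/* Toy format: length byte n (1..8), then n little-endian payload bytes. */
static size_t toy_encode(uint8_t *out, uint64_t v)
{
	size_t n = 1;

	while (n < 8 && (v >> (8 * n)))
		n++;
	out[0] = (uint8_t)n;
	for (size_t i = 0; i < n; i++)
		out[1 + i] = (uint8_t)(v >> (8 * i));
	return 1 + n;
}

/*
 * Models a decoder that does one fixed 8-byte load: it always touches
 * in[1..8], even when the value only occupies n < 8 of those bytes.
 */
static size_t toy_decode(const uint8_t *in, uint64_t *v)
{
	size_t n = in[0];
	uint64_t raw = 0;

	assert(n >= 1 && n <= 8);
	for (size_t i = 0; i < 8; i++)
		raw |= (uint64_t)in[1 + i] << (8 * i);

	/* Mask off the bytes that were not part of the encoded value. */
	*v = n == 8 ? raw : raw & ((UINT64_C(1) << (8 * n)) - 1);
	return 1 + n;
}

/* Pad the allocation so the wide load never leaves the buffer. */
static uint8_t *toy_buf_alloc(size_t encoded_bytes)
{
	return calloc(1, encoded_bytes + DECODE_OVERREAD);
}
```

With a one-byte payload the encoded value is 2 bytes long but the decoder reads 9, so the buffer needs 7 bytes of slack.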

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Use mmap() instead of vmalloc_exec() in userspace
Kent Overstreet [Sat, 24 Apr 2021 04:38:16 +0000 (00:38 -0400)]
bcachefs: Use mmap() instead of vmalloc_exec() in userspace

Calling mmap() directly is much better than malloc() followed by
mprotect(): we end up with much less address space fragmentation.
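
A minimal sketch of allocating executable memory with mmap() in userspace, assuming a POSIX system that permits writable and executable anonymous mappings. The helper names exec_alloc() and exec_free() are hypothetical and are not the names used in the patch.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Ask the kernel for a fresh anonymous mapping that is already
 * executable, instead of changing the protection of heap memory.
 * Returns NULL on failure.
 */
static void *exec_alloc(size_t size)
{
	void *p;

	assert(size > 0);
	p = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Release a mapping obtained from exec_alloc(); returns 0 on success. */
static int exec_free(void *p, size_t size)
{
	return munmap(p, size);
}
```

Because the mapping has its own protection from the start, no existing heap region has to be split by a later mprotect() call.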

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Don't BUG_ON() btree topology error
Kent Overstreet [Fri, 23 Apr 2021 20:05:49 +0000 (16:05 -0400)]
bcachefs: Don't BUG_ON() btree topology error

This replaces an assertion in the btree merge path with a
bch2_inconsistent_error() - fsck will fix it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Fix repair leading to replicas not marked
Kent Overstreet [Fri, 23 Apr 2021 20:18:43 +0000 (16:18 -0400)]
bcachefs: Fix repair leading to replicas not marked

bch2_check_fix_ptrs() was being called after checking if the replicas
set was marked - but repair could change which replicas set needed to be
marked. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Lookup/create lost+found lazily
Kent Overstreet [Tue, 20 Apr 2021 02:19:18 +0000 (22:19 -0400)]
bcachefs: Lookup/create lost+found lazily

This is prep work for subvolumes - each subvolume will have its own
lost+found.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Don't BUG() in update_replicas
Kent Overstreet [Wed, 21 Apr 2021 22:08:39 +0000 (18:08 -0400)]
bcachefs: Don't BUG() in update_replicas

Apparently, we have a bug where in mark and sweep while accounting for a
key, a replicas entry isn't found. Change the code to print out the key
we couldn't mark and halt instead of a BUG_ON().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Fix a deadlock on journal reclaim
Kent Overstreet [Tue, 20 Apr 2021 21:09:25 +0000 (17:09 -0400)]
bcachefs: Fix a deadlock on journal reclaim

Flushing the btree key cache needs to use allocation reserves - journal
reclaim depends on flushing the btree key cache for making forward
progress, and the allocator and copygc depend on journal reclaim making
forward progress.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Update bch2_btree_verify()
Kent Overstreet [Wed, 21 Apr 2021 00:21:12 +0000 (20:21 -0400)]
bcachefs: Update bch2_btree_verify()

bch2_btree_verify() verifies that the btree node on disk matches what we
have in memory. This patch changes it to verify every replica, and also
fixes it for interior btree nodes - there's a mem_ptr field which is
used as a scratch space and needs to be zeroed out for comparing with
what's on disk.
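
The scratch-field point can be sketched as below. The struct toy_ptr layout and toy_ptr_matches_disk() are hypothetical; only the mem_ptr field name comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pointer key: mem_ptr is in-memory scratch space only. */
struct toy_ptr {
	uint64_t	offset;
	uint64_t	seq;
	uint64_t	mem_ptr;
};

/*
 * Compare the in-memory and on-disk versions on copies with the
 * scratch field cleared, so a stale mem_ptr cannot cause a mismatch.
 */
static bool toy_ptr_matches_disk(const struct toy_ptr *in_mem,
				 const struct toy_ptr *on_disk)
{
	struct toy_ptr a = *in_mem, b = *on_disk;

	a.mem_ptr = 0;
	b.mem_ptr = 0;
	return !memcmp(&a, &b, sizeof(a));
}
```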

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Fix two btree iterator leaks
Kent Overstreet [Wed, 21 Apr 2021 00:21:39 +0000 (20:21 -0400)]
bcachefs: Fix two btree iterator leaks

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Punt btree writes to workqueue to submit
Kent Overstreet [Tue, 6 Apr 2021 19:28:34 +0000 (15:28 -0400)]
bcachefs: Punt btree writes to workqueue to submit

We don't want to be submitting IO with btree locks held, and btree
writes usually aren't latency sensitive.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Fix a use after free
Kent Overstreet [Mon, 19 Apr 2021 21:17:34 +0000 (17:17 -0400)]
bcachefs: Fix a use after free

Turns out, we weren't waiting on in flight btree writes when freeing
existing btree nodes. This led to stray btree writes overwriting newly
allocated buckets, but only started showing itself with some of the
recent allocator work and another patch to move submitting of btree
writes to workqueues.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Fix for btree_gc repairing interior btree ptrs
Kent Overstreet [Mon, 19 Apr 2021 21:07:20 +0000 (17:07 -0400)]
bcachefs: Fix for btree_gc repairing interior btree ptrs

Using the normal transaction commit path to insert and journal updates
to interior nodes hadn't been done before this repair code was written,
so it's not surprising that there was a bug.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Preallocate trans mem in bch2_migrate_index_update()
Kent Overstreet [Mon, 19 Apr 2021 04:33:05 +0000 (00:33 -0400)]
bcachefs: Preallocate trans mem in bch2_migrate_index_update()

This will help avoid transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Allocator refactoring
Kent Overstreet [Sun, 18 Apr 2021 00:37:04 +0000 (20:37 -0400)]
bcachefs: Allocator refactoring

This uses the kthread_wait_freezable() macro to simplify a lot of the
allocator thread code, along with cleaning up bch2_invalidate_bucket2().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Always check for invalid bkeys in trans commit path
Kent Overstreet [Sun, 18 Apr 2021 21:44:35 +0000 (17:44 -0400)]
bcachefs: Always check for invalid bkeys in trans commit path

We check for this prior to metadata being written, but we're seeing some
strange bugs lately, and this will help catch those closer to where they
occur.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Check that keys are in the correct btrees
Kent Overstreet [Sun, 18 Apr 2021 03:18:17 +0000 (23:18 -0400)]
bcachefs: Check that keys are in the correct btrees

We've started seeing bug reports of pointers to btree nodes being
detected in leaf nodes. This should catch that before it happens, and
it's something we should've been checking anyway.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Handle errors in bch2_trans_mark_update()
Kent Overstreet [Sun, 18 Apr 2021 21:26:34 +0000 (17:26 -0400)]
bcachefs: Handle errors in bch2_trans_mark_update()

It's not actually the case that iterators are always checked here -
__bch2_trans_commit() checks for that after running triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Allocator thread doesn't need gc_lock anymore
Kent Overstreet [Sat, 17 Apr 2021 01:53:23 +0000 (21:53 -0400)]
bcachefs: Allocator thread doesn't need gc_lock anymore

Even with runtime gc (which currently isn't supported), runtime gc no
longer clears/recalculates the main set of bucket marks - it allocates
and calculates another set, updating the primary at the end.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: gc shouldn't care about owned_by_allocator
Kent Overstreet [Sat, 17 Apr 2021 01:34:00 +0000 (21:34 -0400)]
bcachefs: gc shouldn't care about owned_by_allocator

The owned_by_allocator field is a purely in-memory thing; even if/when
we bring back GC at runtime there's no need for it to be recalculating
this field. This is prep work for pulling it out of struct bucket, and
eventually getting rid of the bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
17 months agobcachefs: Refactor bchfs_fallocate() to not nest btree_trans on stack
Kent Overstreet [Sat, 17 Apr 2021 00:35:20 +0000 (20:35 -0400)]
bcachefs: Refactor bchfs_fallocate() to not nest btree_trans on stack

Upcoming patch is going to disallow multiple btree_trans on the stack.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>