linux.git
Kent Overstreet [Tue, 19 Dec 2023 21:27:38 +0000 (16:27 -0500)]
bcachefs: drop extra semicolon

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Gustavo A. R. Silva [Tue, 19 Dec 2023 00:24:53 +0000 (18:24 -0600)]
bcachefs: Replace zero-length array with flex-array member and use __counted_by

Fake flexible arrays (zero-length and one-element arrays) are
deprecated, and should be replaced by flexible-array members.
So, replace zero-length array with a flexible-array member in
`struct bch_ioctl_fsck_offline`.

Also annotate array `devs` with `__counted_by()` to prepare for the
coming implementation by GCC and Clang of the `__counted_by` attribute.
Flexible array members annotated with `__counted_by` can have their
accesses bounds-checked at run-time via `CONFIG_UBSAN_BOUNDS` (for
array indexing) and `CONFIG_FORTIFY_SOURCE` (for strcpy/memcpy-family
functions).

This fixes the following -Warray-bounds warnings:
fs/bcachefs/chardev.c: In function 'bch2_ioctl_fsck_offline':
fs/bcachefs/chardev.c:363:34: warning: array subscript 0 is outside array bounds of '__u64[0]' {aka 'long long unsigned int[]'} [-Warray-bounds=]
  363 |         if (copy_from_user(devs, &user_arg->devs[0], sizeof(user_arg->devs[0]) * arg.nr_devs)) {
      |                                  ^~~~~~~~~~~~~~~~~~
In file included from fs/bcachefs/chardev.c:5:
fs/bcachefs/bcachefs_ioctl.h:400:33: note: while referencing 'devs'
  400 |         __u64                   devs[0];

This results in no differences in binary output.
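A minimal userspace sketch of the flexible-array pattern follows. It assumes a hypothetical `struct fsck_args` (a header followed by `nr_devs` 64-bit entries, not the exact kernel layout) and a hypothetical `counted_by()` wrapper that expands to the GCC/Clang `__counted_by__` attribute when the compiler supports it and to nothing otherwise.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: use the attribute when available, else expand to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(__counted_by__)
#    define counted_by(member) __attribute__((__counted_by__(member)))
#  endif
#endif
#ifndef counted_by
#  define counted_by(member)
#endif

/* Hypothetical stand-in for an ioctl argument with a trailing device list. */
struct fsck_args {
	uint64_t	flags;
	uint64_t	nr_devs;
	uint64_t	devs[] counted_by(nr_devs);	/* flexible-array member, not devs[0] */
};

/* Allocate a header plus nr trailing entries; returns NULL on overflow or OOM. */
static struct fsck_args *fsck_args_alloc(const uint64_t *devs, size_t nr)
{
	struct fsck_args *a;

	/* Refuse sizes whose byte count would not fit in size_t. */
	if (nr > (SIZE_MAX - sizeof(*a)) / sizeof(a->devs[0]))
		return NULL;

	a = malloc(sizeof(*a) + nr * sizeof(a->devs[0]));
	if (!a)
		return NULL;

	a->flags = 0;
	a->nr_devs = nr;	/* set the count before writing devs[] */
	if (nr)
		memcpy(a->devs, devs, nr * sizeof(a->devs[0]));
	return a;
}
```

`sizeof(struct fsck_args)` covers only the header, so the allocation adds the array length explicitly, and the count is stored before `devs[]` is written so that a bounds-checking build sees the correct size.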

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Gustavo A. R. Silva [Tue, 19 Dec 2023 00:26:26 +0000 (18:26 -0600)]
bcachefs: Use array_size() in call to copy_from_user()

Use array_size() helper, instead of the open-coded version in
call to copy_from_user().
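The point of `array_size()` is overflow safety: the open-coded `sizeof(user_arg->devs[0]) * arg.nr_devs` can wrap around for a huge `nr_devs`. Below is a userspace sketch of the same idea, using a hypothetical `array_size_sat()` built on the GCC/Clang `__builtin_mul_overflow()` builtin.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace analog of array_size(n, size): multiply, but
 * saturate to SIZE_MAX on overflow instead of silently wrapping.
 */
static inline size_t array_size_sat(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return SIZE_MAX;
	return bytes;
}
```

A saturated `SIZE_MAX` length is then expected to make a subsequent copy or allocation of that size fail, rather than succeed with a truncated size.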

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 17 Dec 2023 02:16:34 +0000 (21:16 -0500)]
bcachefs: qstr_eq()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 17 Dec 2023 03:43:41 +0000 (22:43 -0500)]
bcachefs: bch_err_(fn|msg) check if should print

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 16 Dec 2023 03:16:51 +0000 (22:16 -0500)]
bcachefs: fix userspace build errors

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 11 Dec 2023 07:13:33 +0000 (02:13 -0500)]
bcachefs: Drop journal entry compaction

Previously, we dropped empty journal entries and coalesced entries where
possible, but it's not worth the overhead; we very rarely leave unused
journal entries after getting a journal reservation.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 12 Nov 2023 02:43:47 +0000 (21:43 -0500)]
bcachefs: kill btree_trans->wb_updates

the btree write buffer path now creates a journal entry directly

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 11 Dec 2023 03:51:16 +0000 (22:51 -0500)]
bcachefs: check_root() can now be run online

check_root() is simple enough to run as a single transaction, so it is
trivial to run online.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 4 Nov 2023 04:06:56 +0000 (00:06 -0400)]
bcachefs: Inline btree write buffer sort

The sort in the btree write buffer flush path is a very hot path, and
it's particularly performance sensitive since it's single threaded and
can block every other thread on a multithreaded write workload.

It's well worth doing a sort with inlined cmp and swap functions.
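The kernel's generic `sort()` takes its comparison and swap callbacks as function pointers, so every comparison is an indirect call. The sketch below shows the alternative. It assumes a hypothetical `struct wb_key_ref` (btree ID, position, buffer index) and uses a heapsort whose `wb_key_cmp()` and `wb_key_swap()` are `static inline`, so the compiler can inline them into the sort loop. This is not the bcachefs implementation. The index tie-break is an assumption that keeps the unstable heapsort deterministic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference to a buffered key: which btree, where, and its slot. */
struct wb_key_ref {
	uint32_t	btree;
	uint64_t	pos;
	uint32_t	idx;
};

/* Order by btree, then position, then buffer index. */
static inline int wb_key_cmp(const struct wb_key_ref *l, const struct wb_key_ref *r)
{
	if (l->btree != r->btree)
		return l->btree < r->btree ? -1 : 1;
	if (l->pos != r->pos)
		return l->pos < r->pos ? -1 : 1;
	return (l->idx > r->idx) - (l->idx < r->idx);
}

static inline void wb_key_swap(struct wb_key_ref *a, struct wb_key_ref *b)
{
	struct wb_key_ref tmp = *a;

	*a = *b;
	*b = tmp;
}

/* Restore the max-heap property below root, within v[0..n). */
static void wb_sift_down(struct wb_key_ref *v, size_t root, size_t n)
{
	for (;;) {
		size_t child = 2 * root + 1;

		if (child >= n)
			return;
		if (child + 1 < n && wb_key_cmp(&v[child], &v[child + 1]) < 0)
			child++;
		if (wb_key_cmp(&v[root], &v[child]) >= 0)
			return;
		wb_key_swap(&v[root], &v[child]);
		root = child;
	}
}

/* Heapsort into ascending order; no calls through function pointers. */
static void wb_sort(struct wb_key_ref *v, size_t n)
{
	if (n < 2)
		return;
	for (size_t i = n / 2; i-- > 0;)
		wb_sift_down(v, i, n);
	for (size_t end = n - 1; end > 0; end--) {
		wb_key_swap(&v[0], &v[end]);
		wb_sift_down(v, 0, end);
	}
}
```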

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 2 Nov 2023 22:57:19 +0000 (18:57 -0400)]
bcachefs: btree write buffer now slurps keys from journal

Previously, the transaction commit path would have to add keys to the
btree write buffer as a separate operation, requiring additional global
synchronization.

This patch introduces a new journal entry type, which indicates that the
keys need to be copied into the btree write buffer prior to being
written out. We switch the journal entry type back to
JSET_ENTRY_btree_keys prior to write, so this is not an on disk format
change.
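The entry-type switch can be sketched as follows. The enum values, `struct entry`, `struct write_buffer`, and `prep_entries_for_write()` are hypothetical simplifications. Keys in write-buffer entries are copied into the buffer, and each such entry is then relabelled as an ordinary btree-keys entry, so nothing new reaches the disk.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry types; only ENTRY_btree_keys is ever written to disk. */
enum entry_type {
	ENTRY_btree_keys,
	ENTRY_write_buffer_keys,	/* in-memory only */
};

struct entry {
	enum entry_type	type;
	size_t		nr_keys;
	const uint64_t	*keys;
};

struct write_buffer {
	uint64_t	*keys;
	size_t		nr;
	size_t		size;
};

/*
 * Before a journal write: copy write-buffer keys into the buffer and relabel
 * those entries as plain btree keys. Returns -1 if the buffer is too small.
 */
static int prep_entries_for_write(struct entry *e, size_t nr, struct write_buffer *wb)
{
	for (size_t i = 0; i < nr; i++) {
		if (e[i].type != ENTRY_write_buffer_keys)
			continue;
		if (wb->size - wb->nr < e[i].nr_keys)
			return -1;	/* caller would grow wb and retry */
		for (size_t j = 0; j < e[i].nr_keys; j++)
			wb->keys[wb->nr++] = e[i].keys[j];
		e[i].type = ENTRY_btree_keys;
	}
	return 0;
}
```

Returning -1 when the preallocated buffer is too small corresponds to the reallocation concern in this message. On a retry, entries already converted are skipped because their type has changed.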

Flushing the btree write buffer may require pulling keys out of journal
entries yet to be written, and quiescing outstanding journal
reservations; we previously added journal->buf_lock for synchronization
with the journal write path.

We also can't put strict bounds on the number of keys in the journal
destined for the write buffer, which means we might overflow the size of
the preallocated buffer and have to reallocate - this introduces a
potentially fatal memory allocation failure. This is something we'll
have to watch for; if it becomes an issue in practice, we can add
further mitigation.

The transaction commit path no longer has to explicitly check if the
write buffer is full and wait on flushing; this is another performance
optimization. Instead, when the btree write buffer is close to full we
change the journal watermark, so that only reservations for journal
reclaim are allowed.
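Below is a sketch of the watermark idea. It assumes hypothetical `journal_watermark` values and a 3/4-full threshold; the actual threshold is not stated here. Once the write buffer is close to full, ordinary reservations are refused and only reclaim reservations proceed, so individual commits no longer need to check fullness themselves.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical watermarks, ordered so that a higher value is more privileged. */
enum journal_watermark {
	WATERMARK_normal,
	WATERMARK_reclaim,
};

struct wb_fill {
	size_t	nr;	/* keys currently buffered */
	size_t	size;	/* capacity of the preallocated buffer */
};

/* Assumed threshold: treat the buffer as "close to full" at 3/4 capacity. */
static enum journal_watermark wb_journal_watermark(const struct wb_fill *wb)
{
	return wb->nr >= wb->size - wb->size / 4 ? WATERMARK_reclaim : WATERMARK_normal;
}

/* A reservation is granted only if its watermark meets the current minimum. */
static bool journal_res_allowed(enum journal_watermark required,
				enum journal_watermark res)
{
	return res >= required;
}
```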

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 3 Nov 2023 01:06:52 +0000 (21:06 -0400)]
bcachefs: journal->buf_lock

Add a new lock for synchronizing between the journal IO path and the
btree write buffer flush.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 7 Nov 2023 23:08:38 +0000 (18:08 -0500)]
bcachefs: Unwritten journal buffers are always dirty

Ensure that journal bufs that haven't been written can't be reclaimed
from the journal pin fifo, and can thus have new pins taken.

Prep work for changing the btree write buffer to pull keys from the
journal directly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 10 Dec 2023 22:44:04 +0000 (17:44 -0500)]
bcachefs: bch2_trans_node_add no longer uses trans_for_each_path()

In the future we'll be making trans->paths resizable and potentially
having _many_ more paths (for fsck); we need to start fixing algorithms
that walk each path in a transaction where possible.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 10 Dec 2023 21:48:22 +0000 (16:48 -0500)]
bcachefs: Improve trans->extra_journal_entries

Instead of using a darray, we now allocate journal entries for the
transaction commit path with our normal bump allocator - with an inlined
fastpath, and using btree_transaction_stats to remember how much to
initially allocate so as to avoid transaction restarts.

This is prep work for converting write buffer updates to use this
mechanism.
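Below is a minimal sketch of a bump allocator with an inline fastpath and a remembered high-water mark. It assumes hypothetical `struct bump` and `struct trans_stats` types. In this sketch, the slowpath records the size that was needed and returns NULL to request a restart, because growing the buffer could move earlier allocations. The next `bump_begin()` preallocates that much, so the restart does not repeat.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-transaction bump allocator. */
struct bump {
	char	*buf;
	size_t	used;
	size_t	size;
};

/* Hypothetical per-callsite stats, remembered across transactions. */
struct trans_stats {
	size_t	max_mem;
};

/* Slowpath: remember how much was really needed, then request a restart. */
static void *bump_alloc_slowpath(struct bump *b, struct trans_stats *s, size_t n)
{
	if (b->used + n > s->max_mem)
		s->max_mem = b->used + n;
	return NULL;	/* caller treats NULL as "restart the transaction" */
}

/* Inlined fastpath: round up to 8 bytes, then carve n bytes off the buffer. */
static inline void *bump_alloc(struct bump *b, struct trans_stats *s, size_t n)
{
	n = (n + 7) & ~(size_t)7;
	if (b->size - b->used >= n) {
		void *p = b->buf + b->used;

		b->used += n;
		return p;
	}
	return bump_alloc_slowpath(b, s, n);
}

/* At transaction begin, preallocate what this callsite needed last time. */
static int bump_begin(struct bump *b, const struct trans_stats *s)
{
	size_t want = s->max_mem ? s->max_mem : 64;

	if (b->size < want) {
		char *p = realloc(b->buf, want);

		if (!p)
			return -1;
		b->buf = p;
		b->size = want;
	}
	b->used = 0;
	return 0;
}
```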

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 10 Dec 2023 22:52:58 +0000 (17:52 -0500)]
bcachefs: kill bch2_btree_key_cache_flush()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 10 Dec 2023 21:12:24 +0000 (16:12 -0500)]
bcachefs: kill btree_path->(alloc_seq|downgrade_seq)

These were for extra info in tracepoints for debugging a specialized
issue - we do not want to bloat btree_path for this, at least in release
builds.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 10 Dec 2023 17:42:49 +0000 (12:42 -0500)]
bcachefs: Fix snapshot.c assertion for online fsck

c->curr_recovery_pass can go backwards; this adds a non-rewinding
version, c->recovery_pass_done.
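The sketch below shows the two counters, using the field names from this message on a hypothetical struct. `curr_recovery_pass` tracks whichever pass is running, including rewinds, while `recovery_pass_done` only moves forward. The start/finish/query helpers are assumptions for illustration.

```c
/* Hypothetical subset of struct bch_fs. */
struct recovery_state {
	unsigned	curr_recovery_pass;	/* may be rewound to rerun a pass */
	unsigned	recovery_pass_done;	/* highest pass finished; never decreases */
};

static void recovery_pass_start(struct recovery_state *c, unsigned pass)
{
	c->curr_recovery_pass = pass;
}

static void recovery_pass_finish(struct recovery_state *c, unsigned pass)
{
	if (pass > c->recovery_pass_done)
		c->recovery_pass_done = pass;
}

/* Assertions that must survive a rewind should test the monotonic field. */
static int pass_has_run(const struct recovery_state *c, unsigned pass)
{
	return c->recovery_pass_done >= pass;
}
```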

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Randy Dunlap [Sun, 10 Dec 2023 06:06:44 +0000 (22:06 -0800)]
bcachefs: six lock: fix typos

Fix a few typos in the six.h header file.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Brian Foster <bfoster@redhat.com>
Cc: linux-bcachefs@vger.kernel.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 7 Dec 2023 18:11:44 +0000 (13:11 -0500)]
bcachefs: reserve path idx 0 for sentinel

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 8 Dec 2023 04:33:11 +0000 (23:33 -0500)]
bcachefs: Rename for_each_btree_key2() -> for_each_btree_key()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 8 Dec 2023 04:28:26 +0000 (23:28 -0500)]
bcachefs: Kill for_each_btree_key()

for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.

The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.
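The per-iteration transaction behaviour can be sketched with a toy loop. `struct trans`, `trans_begin()`, and `TRANS_RESTART` are hypothetical stand-ins for the btree transaction API, not its real signatures. Because a fresh transaction begins on every iteration, nothing one iteration takes (in the real code, btree locks and the SRCU read lock) is held across the whole loop, and a restart retries only the current key.

```c
#include <stddef.h>

#define TRANS_RESTART	(-1)	/* hypothetical restart error code */

/* Hypothetical transaction: counts how often it was (re)started. */
struct trans {
	unsigned	begins;
};

/* Stand-in for bch2_trans_begin(): in the real code, drops locks and resets state. */
static void trans_begin(struct trans *t)
{
	t->begins++;
}

/* Each key is processed in its own transaction; a restart retries that key. */
static int for_each_key(struct trans *t, const int *keys, size_t nr,
			int (*fn)(struct trans *, int key, void *arg), void *arg)
{
	size_t i = 0;

	while (i < nr) {
		int ret;

		trans_begin(t);
		ret = fn(t, keys[i], arg);
		if (ret == TRANS_RESTART)
			continue;	/* retry the same key */
		if (ret)
			return ret;
		i++;
	}
	return 0;
}
```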

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 8 Dec 2023 05:10:25 +0000 (00:10 -0500)]
bcachefs: continue now works in for_each_btree_key2()

continue now works as in any other loop

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 8 Dec 2023 04:50:38 +0000 (23:50 -0500)]
bcachefs: Fix bch2_read_btree()

In the debugfs code, we had an incorrect use of drop_locks_do(); on
transaction restart we don't want to restart the current loop iteration,
since we've already emitted the current key to the buffer for userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 6 Dec 2023 22:53:59 +0000 (17:53 -0500)]
bcachefs: Fix open coded set_btree_iter_dontneed()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 4 Dec 2023 18:45:33 +0000 (13:45 -0500)]
bcachefs: BCH_IOCTL_FSCK_ONLINE

This adds a new ioctl for running fsck on a mounted, in use filesystem.

This reuses the fsck_thread code from the previous patch for running
fsck on an offline, unmounted filesystem, so that log messages for the
fsck thread are redirected to userspace.

Only one running fsck instance is allowed at a time; a new semaphore
(since the lock will be taken by one thread and released by another) is
added for this.
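The kernel code uses a semaphore here. As a userspace analog, an ownerless C11 `atomic_flag` demonstrates the same property. `fsck_try_start()` and `fsck_finish()` are hypothetical names. The ioctl thread tries to acquire the flag, and the fsck thread releases it when it exits, which a mutex's owner rule would not allow.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Ownerless "one fsck at a time" guard; any thread may clear it. */
static atomic_flag fsck_running = ATOMIC_FLAG_INIT;

/* Returns true if the caller may start fsck (compare down_trylock() succeeding). */
static bool fsck_try_start(void)
{
	return !atomic_flag_test_and_set(&fsck_running);
}

/* Called by the fsck thread when it exits (compare up()). */
static void fsck_finish(void)
{
	atomic_flag_clear(&fsck_running);
}
```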

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 12 Jul 2023 03:23:40 +0000 (23:23 -0400)]
bcachefs: BCH_IOCTL_FSCK_OFFLINE

This adds a new ioctl for running fsck on a list of devices.

Normally, if we wish to use the kernel's implementation of fsck we'd run
it at mount time with -o fsck. This ioctl lets us run fsck without
mounting, so that userspace bcachefs-tools can transparently switch to
the kernel's implementation of fsck when appropriate - primarily if the
kernel version of bcachefs better matches the filesystem on disk.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 6 Dec 2023 19:36:18 +0000 (14:36 -0500)]
bcachefs: bch2_run_online_recovery_passes()

Add a new helper for running online recovery passes - i.e. online fsck.
This is a subset of our normal recovery passes, and does not - for now -
use or follow c->curr_recovery_pass.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 6 Dec 2023 19:24:26 +0000 (14:24 -0500)]
bcachefs: Mark recovery passes that are safe to run online

Online fsck is coming, and many of our recovery/fsck passes are already
safe to run while the filesystem is in use - mark which ones.
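Marking passes and running only the online subset can be sketched with one flag bit per pass. `PASS_FSCK`, `PASS_ONLINE`, `struct recovery_pass`, and `run_online_passes()` are hypothetical. `check_root` is marked online here only because the check_root() commit in this log says it can run online. The other entries are placeholders.

```c
#include <stddef.h>

#define PASS_FSCK	(1U << 0)
#define PASS_ONLINE	(1U << 1)	/* safe while the fs is mounted and in use */

struct recovery_pass {
	const char	*name;
	unsigned	flags;
	int		(*fn)(void);
};

/* Placeholder pass body that always succeeds. */
static int pass_ok(void)
{
	return 0;
}

static const struct recovery_pass passes[] = {
	{ "example_offline_pass_a",	PASS_FSCK,			pass_ok },
	{ "check_root",			PASS_FSCK | PASS_ONLINE,	pass_ok },
	{ "example_offline_pass_b",	PASS_FSCK,			pass_ok },
};

/* Run only passes marked online; returns how many ran, or the first error. */
static int run_online_passes(const struct recovery_pass *p, size_t nr)
{
	int ran = 0;

	for (size_t i = 0; i < nr; i++) {
		int ret;

		if (!(p[i].flags & PASS_ONLINE))
			continue;
		ret = p[i].fn();
		if (ret)
			return ret;
		ran++;
	}
	return ran;
}
```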

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 5 Dec 2023 01:15:23 +0000 (20:15 -0500)]
bcachefs: Add ability to redirect log output

Upcoming patches are going to add two new ioctls for running fsck in the
kernel, but pretending that we're running our normal userspace fsck.

This patch adds some plumbing for redirecting our normal log messages
away from the dmesg log to a thread_with_file file descriptor - via a
struct log_output, which will be consumed by the fsck f_op's read method.

The new ioctls will allow for running fsck in the kernel against an
offline filesystem (without mounting it), and an online filesystem. For
an offline filesystem we need a way to pass in a pointer to the
log_output, which is done via a new hidden opts.h option.

For online fsck, we can set c->output directly, but only want to
redirect log messages from the thread running fsck - hence the new
c->output_filter method.
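Below is a sketch of the online-fsck filtering. It assumes a hypothetical subset of `struct bch_fs` with `output` and `output_filter` fields, and uses an integer thread ID in place of a real task pointer. Messages from the fsck thread go to the `log_output` buffer; everything else falls through to the normal log, with stderr standing in for dmesg.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical buffer that the fsck file's read method would drain. */
struct log_output {
	char	buf[256];
	size_t	len;
};

/* Hypothetical subset of struct bch_fs. */
struct fs {
	struct log_output	*output;
	bool			(*output_filter)(struct fs *, int tid);
	int			fsck_tid;
};

/* Online fsck: redirect only messages emitted by the fsck thread. */
static bool only_fsck_thread(struct fs *c, int tid)
{
	return tid == c->fsck_tid;
}

static void fs_log(struct fs *c, int tid, const char *msg)
{
	struct log_output *o = c->output;

	if (o && (!c->output_filter || c->output_filter(c, tid))) {
		size_t room = sizeof(o->buf) - o->len;
		int n = snprintf(o->buf + o->len, room, "%s\n", msg);

		/* On truncation, advance only to the terminating NUL. */
		if (n > 0)
			o->len += (size_t)n < room ? (size_t)n : room - 1;
	} else {
		fprintf(stderr, "%s\n", msg);	/* stand-in for the dmesg log */
	}
}
```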

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 12 Jul 2023 04:20:22 +0000 (00:20 -0400)]
bcachefs: thread_with_file

Abstract out a new helper from the data job code, for connecting a
kthread to a file descriptor.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 6 Dec 2023 21:26:18 +0000 (16:26 -0500)]
bcachefs: c->ro_ref

Add a new refcount for async ops that don't necessarily need the fs to
be RW, with similar lifetime/rules otherwise as c->writes.

To be used by online fsck.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 5 Dec 2023 20:22:25 +0000 (15:22 -0500)]
bcachefs: Improve error message when finding wrong btree node

single_device.merge_torture_flakey is, very rarely, finding a btree node
that doesn't match the key that points to it: this patch improves the
error message to print out more fields from the btree node header, so
that we can see what else does or does not match the key.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Brian Foster [Tue, 5 Dec 2023 13:24:39 +0000 (08:24 -0500)]
bcachefs: return from fsync on writeback error to avoid early shutdown

When investigating transient failures of generic/441 on bcachefs, it
was determined that the cause of the failure was a combination of
unconditional emergency shutdown and racing between background
journal activity and the test switchover from a working device
mapper table to an error injecting table.

Part of the reason for this sequence of events is that bcachefs
aggressively flushes as much as possible during fsync(), regardless
of errors. While this is reasonable behavior, it is technically
unnecessary because once an error is returned from fsync(), the
caller cannot make any assumptions about the resilience of data.

Tweak the bch2_fsync() logic to return an error on failure of any of
the steps involved in the flush. Note that this change alone does
not prevent generic/441 failure, but in combination with a test
tweak to avoid racing during the dm-error table switchover it avoids
the unnecessary shutdowns and allows the test to pass reliably on
bcachefs.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 4 Dec 2023 18:03:24 +0000 (13:03 -0500)]
bcachefs: BCH_ERR_opt_parse_error

Continuing the project of replacing generic error codes with more
specific ones.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 4 Dec 2023 05:20:42 +0000 (00:20 -0500)]
bcachefs: Refactor trans->paths_allocated to be standard bitmap

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 3 Dec 2023 20:54:45 +0000 (15:54 -0500)]
bcachefs: Move reflink_p triggers into reflink.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Richard Davies [Sun, 3 Dec 2023 14:10:27 +0000 (14:10 +0000)]
bcachefs: Remove obsolete comment about zstd

Remove obsolete comment about zstd, since the approach changed during
development of commit bbc3a46065d08f9ab3412b1f26bbfa778c444833

Signed-off-by: Richard Davies <richard@arachsys.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 2 Dec 2023 08:36:27 +0000 (03:36 -0500)]
bcachefs: Include btree_trans in more tracepoints

This gives us more context information - e.g. which codepath is invoking
btree node reads.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Brian Foster [Thu, 30 Nov 2023 19:17:11 +0000 (14:17 -0500)]
bcachefs: remove sb lock and flags update on explicit shutdown

bcachefs grabs s_umount and sets SB_RDONLY when the fs is shut down
via the ioctl() interface. This has a couple of issues related to
interactions between shutdown and freeze:

1. The flags == FSOP_GOING_FLAGS_DEFAULT case is a deadlock vector
   because freeze_bdev() calls into freeze_super(), which also
   acquires s_umount.

2. If an explicit shutdown occurs while the sb is frozen, SB_RDONLY
   alters the thaw path as if the sb was read-only at freeze time.
   This effectively leaks the frozen state and leaves the sb frozen
   indefinitely.

The usage of SB_RDONLY here goes back to the initial bcachefs commit
and AFAICT is simply historical behavior. This behavior is unique to
bcachefs relative to the handful of other filesystems that support
the shutdown ioctl(). Typically, SB_RDONLY is reserved for the
proper remount path, which itself is restricted from modifying
frozen superblocks in reconfigure_super(). Drop the unnecessary sb
lock and flags update in bch2_ioc_goingdown() to address both of these
issues.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 30 Nov 2023 07:16:19 +0000 (02:16 -0500)]
bcachefs: Make backpointer fsck wb flush check more rigorous

backpointers fsck now always runs in rw mode - the btree is being
modified while it runs, by e.g. copygc, rebalance, the discard worker,
the invalidate worker.

We could find a missing backpointer, flush the btree write buffer, and
then on the next iteration find a new key at the exact same position -
which will most likely need another write buffer flush.

Hence, we have to check for an exact match on last_flushed, not just the
pos.
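The difference between matching on position alone and matching on the whole key can be sketched as below. `struct key_sketch` and `should_flush_wb()` are hypothetical simplifications of a bkey and the fsck helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified key: position plus the fields that make it unique. */
struct key_sketch {
	uint64_t	pos;
	uint64_t	version;
	uint32_t	val;
};

static bool key_eq(const struct key_sketch *a, const struct key_sketch *b)
{
	/* Compare field by field; memcmp() could see uninitialized padding. */
	return a->pos == b->pos && a->version == b->version && a->val == b->val;
}

/*
 * Flush the write buffer unless we already flushed for this exact key.
 * Matching on pos alone would skip the flush when a different key
 * appeared at the same position after the previous flush.
 */
static bool should_flush_wb(const struct key_sketch *k,
			    struct key_sketch *last_flushed)
{
	if (key_eq(k, last_flushed))
		return false;
	*last_flushed = *k;
	return true;
}
```

When `should_flush_wb()` returns false, the write buffer has already been flushed for this exact key. A backpointer that is still missing is therefore genuinely missing and can be repaired.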

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 30 Nov 2023 07:11:15 +0000 (02:11 -0500)]
bcachefs: On missing backpointer to interior node, flush interior updates

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Daniel Hill [Mon, 27 Nov 2023 08:52:33 +0000 (21:52 +1300)]
bcachefs: remove redundant condition from data_update_index_update

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Daniel Hill [Mon, 27 Nov 2023 10:37:44 +0000 (23:37 +1300)]
bcachefs: copygc shouldn't try moving buckets on error

Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 28 Nov 2023 21:36:54 +0000 (16:36 -0500)]
bcachefs: Explicitly go RW for fsck

This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less
error prone.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Daniel Hill [Tue, 28 Nov 2023 06:24:47 +0000 (19:24 +1300)]
bcachefs: copygc should wakeup on shutdown if disabled

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Daniel Hill [Sun, 26 Nov 2023 06:33:31 +0000 (19:33 +1300)]
bcachefs: rebalance should wakeup on shutdown if disabled

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Daniel Hill [Sun, 26 Nov 2023 07:26:07 +0000 (20:26 +1300)]
bcachefs: remove dead bch2_evacuate_bucket()

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Gustavo A. R. Silva [Tue, 28 Nov 2023 18:22:55 +0000 (12:22 -0600)]
bcachefs: Replace zero-length arrays with flexible-array members

Fake flexible arrays (zero-length and one-element arrays) are
deprecated, and should be replaced by flexible-array members.

So, replace zero-length arrays with flexible-array members
in multiple structures.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 27 Nov 2023 03:06:48 +0000 (22:06 -0500)]
bcachefs: more write buffer refactoring

prep work for big rewrite - no functional changes in this patch.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 27 Nov 2023 02:58:11 +0000 (21:58 -0500)]
bcachefs: wb_flush_one_slowpath()

A bit of refactoring for better inlining in the main btree write buffer
flush path.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 29 Nov 2023 00:47:26 +0000 (19:47 -0500)]
bcachefs: ONLY_SPECIFIED_DEVS doesn't mean ignore durability anymore

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 28 Nov 2023 21:30:45 +0000 (16:30 -0500)]
bcachefs: Don't open code bch2_dev_exists2()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 26 May 2023 20:59:07 +0000 (16:59 -0400)]
bcachefs: Improve trace_trans_restart_would_deadlock

In the CI, we're seeing tests failing due to excessive would_deadlock
transaction restarts - the tracepoint now includes the lock cycle that
occurred.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 26 Nov 2023 22:02:06 +0000 (17:02 -0500)]
bcachefs: Improve trace_trans_restart_too_many_iters()

We now include the list of paths in use.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 28 Nov 2023 03:37:27 +0000 (22:37 -0500)]
bcachefs: count_event()

Small helper for event counters.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 3 Nov 2023 00:36:00 +0000 (20:36 -0400)]
bcachefs: bch2_btree_write_buffer_flush() -> bch2_btree_write_buffer_tryflush()

More accurate naming.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 3 Nov 2023 00:32:19 +0000 (20:32 -0400)]
bcachefs: bch2_btree_write_buffer_flush_locked()

Minor refactoring - improved naming, and move the responsibility for
flush_lock to the caller instead of having it be shared.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 2 Nov 2023 23:37:15 +0000 (19:37 -0400)]
bcachefs: Clean up btree write buffer write ref handling

__bch2_btree_write_buffer_flush() now assumes a write ref is already
held (as called by the transaction commit path); and the wrappers
bch2_write_buffer_flush() and flush_sync() take an explicit write ref.

This means internally the write buffer code can always use
BTREE_INSERT_NOCHECK_RW, instead of in the previous code passing flags
around and hoping the NOCHECK_RW flag was always carried around
correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 27 Nov 2023 06:46:18 +0000 (01:46 -0500)]
bcachefs: delete useless commit_do()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 27 Nov 2023 06:42:42 +0000 (01:42 -0500)]
bcachefs: kill journal->preres_wait

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 3 Nov 2023 02:31:16 +0000 (22:31 -0400)]
bcachefs: Improve btree write buffer tracepoints

 - add a tracepoint for write_buffer_flush_sync; this is expensive
 - fix the write_buffer_flush_slowpath tracepoint

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 27 Nov 2023 01:18:16 +0000 (20:18 -0500)]
bcachefs: No need to allocate keys for write buffer

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 26 Nov 2023 22:05:02 +0000 (17:05 -0500)]
bcachefs: convert bch_fs_flags to x-macro

Now we can print out filesystem flags in sysfs, useful for debugging
various "what's my filesystem doing" issues.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 26 Nov 2023 03:39:21 +0000 (22:39 -0500)]
bcachefs: Kill journal_seq/gc args to bch2_dev_usage_update_m()

This is only used by gc (fsck).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 25 Nov 2023 20:46:02 +0000 (15:46 -0500)]
bcachefs: Refactor bch2_check_alloc_to_lru_ref()

This code was somewhat convoluted - because originally bch2_lru_set()
could modify the LRU index if there was a collision.

That's no longer the case, so the "create LRU entry" path has no reason
to update the alloc key, so we can separate the handling of the two fsck
errors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 25 Nov 2023 02:52:17 +0000 (21:52 -0500)]
bcachefs: Add rebalance, data_update tracepoints

Add a tracepoint for rebalance, printing out
 - the target option
 - the compression option
 - the key being rebalanced

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 25 Nov 2023 05:05:30 +0000 (00:05 -0500)]
bcachefs: Print durability in member_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 25 Nov 2023 04:40:08 +0000 (23:40 -0500)]
bcachefs: Improve sysfs compression_stats

Break it out by compression type, and include average extent size.

Also, format into a nice table.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 23 Nov 2023 23:43:23 +0000 (18:43 -0500)]
bcachefs: Kill dev_usage->buckets_ec

This counter is redundant; it's simply the sum of BCH_DATA_stripe and
BCH_DATA_parity buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 23 Nov 2023 23:25:31 +0000 (18:25 -0500)]
bcachefs: bch2_dev_usage_to_text()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 23 Nov 2023 23:05:18 +0000 (18:05 -0500)]
bcachefs: New bucket sector count helpers

This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(),
prep work for separately accounting stripe sectors.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 24 Nov 2023 00:26:27 +0000 (19:26 -0500)]
bcachefs: BCH_IOCTL_DEV_USAGE_V2

BCH_IOCTL_DEV_USAGE mistakenly put the per-data-type array in struct
bch_ioctl_dev_usage; since ioctl numbers encode the size of the arg,
that means adding new data types breaks the ioctl.

This adds a new version that includes the number of data types as a
parameter: the old version is fixed at 10 so as to not break when adding
new types.
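Why growing the array breaks the ioctl can be shown with the asm-generic ioctl number layout (direction, size, type, number). `ioc_encode()` re-implements that layout by hand, and the structs are hypothetical illustrations rather than the real `bch_ioctl_dev_usage` definitions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * asm-generic layout (used by x86 and arm64, among others): nr in bits 0-7,
 * type in bits 8-15, size in bits 16-29, direction in bits 30-31.
 * Some other architectures use different field widths.
 */
static uint32_t ioc_encode(uint32_t dir, uint32_t type, uint32_t nr, size_t size)
{
	return (dir << 30) | (((uint32_t)size & 0x3fff) << 16) |
	       ((type & 0xff) << 8) | (nr & 0xff);
}

/* Hypothetical v1-style argument: a fixed-size per-data-type array. */
struct usage_v1 {
	uint64_t	state;
	uint64_t	buckets[10];
};

/* The same struct after adding one data type. */
struct usage_v1_grown {
	uint64_t	state;
	uint64_t	buckets[11];
};

/* Hypothetical v2-style argument: the count is passed in, not baked into the type. */
struct usage_v2 {
	uint64_t	state;
	uint32_t	nr_data_types;
	uint32_t	pad;
	uint64_t	buckets[];	/* nr_data_types entries follow */
};
```

Because the v2-style struct ends in a flexible array, its `sizeof`, and therefore its ioctl number, stays the same regardless of the `nr_data_types` the caller passes.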

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 23 Nov 2023 22:17:38 +0000 (17:17 -0500)]
bcachefs: Simplify check_bucket_ref()

We only need the sector count being modified.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 31 Oct 2023 01:03:32 +0000 (21:03 -0400)]
bcachefs: six locks: Simplify optimistic spinning

The osq lock maintainers don't want it to be used outside of
kernel/locking/, but we can do better.

Since we have lock handoff signalled via waitlist entries, there's no
reason for optimistic spinning to have to look at the lock at all -
aside from checking lock-owner; we can just spin looking at our waitlist
entry.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 13 Sep 2023 23:59:03 +0000 (19:59 -0400)]
powerpc: Export kvm_guest static key, for bcachefs six locks

bcachefs's six locks need kvm_guest, via
 owner_on_cpu() -> vcpu_is_preempted() -> is_kvm_guest()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Kent Overstreet [Tue, 21 Nov 2023 00:12:40 +0000 (19:12 -0500)]
bcachefs: BCH_DATA_OP_drop_extra_replicas

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 20 Nov 2023 23:52:33 +0000 (18:52 -0500)]
bcachefs: Convert bch2_move_btree() to bbpos

Minor cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 20 Nov 2023 23:43:48 +0000 (18:43 -0500)]
bcachefs: x-macro-ify bch_data_ops enum

This will let us add an enum -> string table for a to_text() fn.
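
A sketch of the x-macro pattern: one list generates both the enum and
the enum -> string table, so the two cannot drift apart. The DATA_OP_
prefix and the three op names are an illustrative subset chosen for this
example. The real list and naming live in the bcachefs headers and may
differ. data_op_from_str() is a hypothetical reverse lookup added to show
the table in use.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative subset of data ops: x(name, value). */
#define DATA_OPS()			\
	x(scrub,		0)	\
	x(rereplicate,		1)	\
	x(migrate,		2)

enum data_op {
#define x(t, n) DATA_OP_##t = n,
	DATA_OPS()
#undef x
	DATA_OP_NR
};

/* Same list, expanded a second time into a name table. */
static const char * const data_op_strs[] = {
#define x(t, n) [n] = #t,
	DATA_OPS()
#undef x
	NULL
};

/* What a to_text() function would print for an op. */
static const char *data_op_to_str(enum data_op op)
{
	return (unsigned) op < DATA_OP_NR ? data_op_strs[op] : "(invalid)";
}

/* Hypothetical reverse lookup, e.g. for parsing a command-line argument. */
static int data_op_from_str(const char *s)
{
	for (size_t i = 0; i < DATA_OP_NR; i++)
		if (!strcmp(s, data_op_strs[i]))
			return (int) i;
	return -1;
}
```

Adding an op then means adding one x() line. The enum value, its name
string and the table size all follow automatically.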

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Yang Li [Tue, 21 Nov 2023 01:05:15 +0000 (09:05 +0800)]
bcachefs: clean up one inconsistent indenting

fs/bcachefs/journal_io.c:1843 bch2_journal_write_pick_flush() warn: inconsistent indenting

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7585
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Daniel Hill [Sun, 19 Nov 2023 20:53:36 +0000 (09:53 +1300)]
bcachefs: add a quieter bch2_read_super

If we're iteratively searching devices for bcachefs superblocks, we don't
want to see this error.

The new function logs at KERN_INFO instead of KERN_ERR when no bcachefs
superblock is found, while other errors are still reported at KERN_ERR.

Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Nov 2023 22:40:45 +0000 (17:40 -0500)]
bcachefs: Don't use update_cached_sectors() in bch2_mark_alloc()

bch2_update_cached_sectors_list(), called from trans_mark(), is closer to
how the new disk space accounting works.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 9 Nov 2023 18:52:35 +0000 (13:52 -0500)]
bcachefs: Rename bch_replicas_entry -> bch_replicas_entry_v1

Prep work for introducing bch_replicas_entry_v2

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 17 Nov 2023 23:38:09 +0000 (18:38 -0500)]
bcachefs: Kill btree_iter->journal_pos

For BTREE_ITER_WITH_JOURNAL, we memoize lookups in the journal keys, to
avoid the binary search overhead.

Previously we stashed the pos of the last key returned from the journal,
in order to force the lookup to be redone when rewinding.

Now bch2_journal_keys_peek_upto() handles rewinding itself when
necessary - so we can slim down btree_iter.
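
A minimal sketch of a memoized lookup that detects rewinding on its own,
so the caller does not need to stash a separate position. Journal keys
are modeled as a sorted array of integers. struct jkeys, struct jcursor
and jkeys_peek() are hypothetical names, and this is not the actual
bch2_journal_keys_peek_upto().

```c
#include <stddef.h>

/* Hypothetical flat model: journal key positions, sorted ascending. */
struct jkeys {
	const unsigned long	*pos;
	size_t			nr;
};

/* Hypothetical cursor memoizing the index of the previous lookup. */
struct jcursor {
	size_t idx;
};

/* Binary search: index of the first key >= search. */
static size_t jkeys_lower_bound(const struct jkeys *k, unsigned long search)
{
	size_t l = 0, r = k->nr;

	while (l < r) {
		size_t m = l + (r - l) / 2;

		if (k->pos[m] < search)
			l = m + 1;
		else
			r = m;
	}
	return l;
}

/*
 * Index of the first key >= search, or k->nr if there is none.
 * Moving forward walks from the memoized index, which is cheap for
 * sequential iteration. If the previous key is already >= search, the
 * caller rewound, so we redo the binary search.
 */
static size_t jkeys_peek(const struct jkeys *k, struct jcursor *c,
			 unsigned long search)
{
	if (c->idx > k->nr ||
	    (c->idx > 0 && k->pos[c->idx - 1] >= search))
		c->idx = jkeys_lower_bound(k, search);	/* rewind */

	while (c->idx < k->nr && k->pos[c->idx] < search)
		c->idx++;
	return c->idx;
}
```

The rewind test relies on the invariant that every key before the
memoized index is less than the last search position. If that no longer
holds for the new search, the memo is stale and is recomputed.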

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 17 Nov 2023 03:35:29 +0000 (22:35 -0500)]
bcachefs: Kill memset() in bch2_btree_iter_init()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 17 Nov 2023 01:41:10 +0000 (20:41 -0500)]
bcachefs: Add a tracepoint for journal entry close

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 10 Nov 2023 04:43:35 +0000 (23:43 -0500)]
bcachefs: Don't flush journal after replay

The flush_all_pins() after journal replay was unnecessary, and trying to
completely flush the journal while RW is not a great idea: it's not
guaranteed to terminate if other threads keep adding things to the
journal.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 14 Nov 2023 02:12:35 +0000 (21:12 -0500)]
bcachefs: Don't rejournal keys in key cache flush

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 14 Nov 2023 00:55:09 +0000 (19:55 -0500)]
bcachefs: Fix userspace bch2_prt_datetime()

ctime_r() outputs a newline, which we don't want.
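
A small userspace sketch of the fix: format with POSIX ctime_r(), then
cut the string at the newline it appends. datetime_no_newline() is a
hypothetical helper, not the actual bch2_prt_datetime().

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <time.h>

/* Format t like ctime_r(), minus the trailing '\n' that ctime_r() adds.
 * buf must hold at least 26 bytes. Returns buf, or NULL on failure. */
static char *datetime_no_newline(time_t t, char *buf)
{
	if (!ctime_r(&t, buf))
		return NULL;
	buf[strcspn(buf, "\n")] = '\0';	/* truncate at the newline */
	return buf;
}
```

strcspn() returns the string length if no newline is present, so the
truncation is harmless either way.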

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 13 Nov 2023 01:35:51 +0000 (20:35 -0500)]
bcachefs: Kill BTREE_ITER_ALL_LEVELS

As discussed in the previous patch, BTREE_ITER_ALL_LEVELS appears to be
racy with concurrent interior node updates. It may be fixable, but that
would be tricky and is unnecessary.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 13 Nov 2023 01:20:35 +0000 (20:20 -0500)]
bcachefs: backpointers fsck no longer uses BTREE_ITER_ALL_LEVELS

It appears that BTREE_ITER_ALL_LEVELS is racy with concurrent interior
node btree updates. This is unfortunate but not terribly surprising: it's
a difficult problem, and it was the original reason for gc_lock.

BTREE_ITER_ALL_LEVELS will probably be deleted in a subsequent patch;
this patch changes backpointers fsck to instead walk keys one level of
the btree at a time.

This fixes the tiering_drop_alloc test, which stopped working with the
patch to not flush the journal after journal replay.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 13 Nov 2023 02:47:15 +0000 (21:47 -0500)]
bcachefs: Improve btree_path_downgrade tracepoint

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Nov 2023 21:31:50 +0000 (16:31 -0500)]
bcachefs: Rename BTREE_INSERT flags

BTREE_INSERT flags are actually transaction commit flags - rename them
for clarity.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Nov 2023 21:31:50 +0000 (16:31 -0500)]
bcachefs: bch_str_hash_flags_t

Create a separate enum for str_hash flags - instead of abusing the
btree_insert_flags enum - and create a __bitwise typedef for sparse
typechecking.
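
A sketch of a __bitwise flags typedef in the usual kernel style. Under
sparse (__CHECKER__) the typedef becomes a restricted type, so mixing it
with other flag types produces a warning. A normal compiler just sees
unsigned int. The flag names, str_hash_flags_t and str_hash_must_create()
are hypothetical, not the definitions from this patch. The fallback
__bitwise/__force macros mirror what kernel headers provide.

```c
#include <stdbool.h>

/* Fallbacks mirroring the kernel's definitions, for a standalone build. */
#ifndef __bitwise
# ifdef __CHECKER__
#  define __bitwise	__attribute__((bitwise))
# else
#  define __bitwise
# endif
#endif
#ifndef __force
# ifdef __CHECKER__
#  define __force	__attribute__((force))
# else
#  define __force
# endif
#endif

/* Hypothetical flag bits, for illustration only. */
enum str_hash_flag_bits {
	__STR_HASH_must_create,
	__STR_HASH_must_replace,
};

typedef unsigned __bitwise str_hash_flags_t;

/* __force marks these casts into the restricted type as intentional. */
#define STR_HASH_must_create	((__force str_hash_flags_t) (1U << __STR_HASH_must_create))
#define STR_HASH_must_replace	((__force str_hash_flags_t) (1U << __STR_HASH_must_replace))

/* Because this takes str_hash_flags_t, sparse warns if a caller passes
 * some other kind of flags, such as transaction commit flags. */
static bool str_hash_must_create(str_hash_flags_t flags)
{
	return (flags & STR_HASH_must_create) != 0;
}
```

Running sparse over a caller that passes a plain int or a different
__bitwise type to str_hash_must_create() would report a
restricted-type mismatch. gcc and clang compile it without complaint.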

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Nov 2023 21:20:58 +0000 (16:20 -0500)]
bcachefs: Kill dead BTREE_INSERT flags

BTREE_INSERT_NOWAIT and BTREE_INSERT_GC_LOCK_HELD are no longer used,
and can be deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Nov 2023 21:02:15 +0000 (16:02 -0500)]
bcachefs: Fix redundant variable initialization

path->level was being read, but never used.

Reported-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 7 Nov 2023 15:42:53 +0000 (10:42 -0500)]
bcachefs: Avoid dropping/retaking write locks in bch2_btree_write_buffer_flush_one()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 10 Nov 2023 02:02:58 +0000 (21:02 -0500)]
bcachefs: Make journal replay more efficient

Journal replay now first attempts to replay keys in sorted order,
similar to how the btree write buffer flush path works.

Any keys that cannot be replayed due to journal deadlock are then left
for later and replayed in journal order, unpinning journal entries as we
go.
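
A simplified two-pass model of the replay order described above. Pass 1
tries every key in btree-position order. Pass 2 replays the leftovers
sorted by journal sequence number. struct jkey, replay() and the
demo_try_insert() predicate are hypothetical, and journal pinning is not
modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical journal key: btree position plus the sequence number of
 * the journal entry it came from. */
struct jkey {
	unsigned long	pos;
	unsigned long	seq;
	bool		replayed;
};

static int cmp_pos(const void *a, const void *b)
{
	const struct jkey *l = a, *r = b;

	return (l->pos > r->pos) - (l->pos < r->pos);
}

static int cmp_seq(const void *a, const void *b)
{
	const struct jkey *l = a, *r = b;

	return (l->seq > r->seq) - (l->seq < r->seq);
}

/* try_insert() returns false if inserting the key now would deadlock on
 * the journal; it is assumed to succeed when in_journal_order is true.
 * order[] (nr entries) records positions in the order they were replayed.
 * Returns the number of keys replayed. */
static size_t replay(struct jkey *keys, size_t nr,
		     bool (*try_insert)(const struct jkey *, bool in_journal_order),
		     unsigned long *order)
{
	size_t n = 0;

	/* Pass 1: sorted by btree position, like a write buffer flush. */
	qsort(keys, nr, sizeof(*keys), cmp_pos);
	for (size_t i = 0; i < nr; i++)
		if (try_insert(&keys[i], false)) {
			keys[i].replayed = true;
			order[n++] = keys[i].pos;
		}

	/* Pass 2: leftovers in journal order; per the commit message this
	 * lets journal entries be unpinned as we go (not modeled here). */
	qsort(keys, nr, sizeof(*keys), cmp_seq);
	for (size_t i = 0; i < nr; i++)
		if (!keys[i].replayed && try_insert(&keys[i], true)) {
			keys[i].replayed = true;
			order[n++] = keys[i].pos;
		}
	return n;
}

/* Example predicate for trying the sketch: pretend keys from odd journal
 * sequence numbers would deadlock unless replayed in journal order. */
static bool demo_try_insert(const struct jkey *k, bool in_journal_order)
{
	return in_journal_order || k->seq % 2 == 0;
}
```

With keys at positions 30, 10, 20, 40 carrying sequence numbers 1, 2, 3,
4, pass 1 replays 10 and 40 in sorted order. Pass 2 then replays 30
(seq 1) and 20 (seq 3) in journal order.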

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 10 Nov 2023 01:41:58 +0000 (20:41 -0500)]
bcachefs: Go rw before journal replay

This gets us slightly nicer log messages.

Also, this slightly clarifies synchronization of c->journal_keys: once
we go RW, it's in use by multiple threads (so that the btree iterator
code can overlay keys from the journal), so it has to be prepared before
that point.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 9 Nov 2023 03:04:29 +0000 (22:04 -0500)]
bcachefs: Kill BTREE_UPDATE_PREJOURNAL

With the previous patch that reworks BTREE_INSERT_JOURNAL_REPLAY, we can
now switch the btree write buffer to use it for flushing.

This has the advantage that transaction commits don't need to take a
journal reservation at all.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>