Kent Overstreet [Mon, 15 Jun 2020 23:53:46 +0000 (19:53 -0400)]
bcachefs: Fix lock ordering with new btree cache code
The code that checks lock ordering was recently changed to go off of the
pos of the btree node, rather than the iterator, but the btree cache
code wasn't updated to handle iterators that point to cached bkeys. Oops.
Also, update various debug code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 15 Jun 2020 21:59:09 +0000 (17:59 -0400)]
bcachefs: delete a slightly faulty assertion
state lock isn't held at startup
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 15 Jun 2020 21:38:26 +0000 (17:38 -0400)]
bcachefs: Increase size of btree node reserve
Also tweak the allocator to be more aggressive about keeping it full.
The recent changes to make updates to interior nodes transactional (and
thus generate updates to the alloc btree) all put more stress on the
btree node reserves.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 15 Jun 2020 20:59:36 +0000 (16:59 -0400)]
bcachefs: Give bkey_cached_key same attributes as bpos
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 5 Oct 2019 16:54:53 +0000 (12:54 -0400)]
bcachefs: Use cached iterators for alloc btree
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 8 Mar 2019 00:46:10 +0000 (19:46 -0500)]
bcachefs: Btree key cache
This introduces a new kind of btree iterator, cached iterators, which
point to keys cached in a hash table. The cache also acts as a write
cache - in the update path, we journal the update but defer updating the
btree until the cached entry is flushed by journal reclaim.
Cache coherency is for now up to the users to handle, which isn't ideal
but should be good enough for now.
These new iterators will be used for updating inodes and alloc info (the
alloc and stripes btrees).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
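The write-cache behaviour described in the "Btree key cache" entry can be modelled in userspace roughly as follows. This is a minimal sketch under stated assumptions, not the bcachefs implementation: every name (toy_cache, toy_cache_update, toy_cache_flush), the fixed-size probed table, and the array standing in for the btree are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 8	/* hypothetical: size of the cache hash table */
#define TOY_KEYS  16	/* hypothetical: size of the stand-in "btree" */

struct toy_entry {
	bool     used;
	bool     dirty;		/* journalled, not yet written to the btree */
	uint64_t key;
	uint64_t val;
	uint64_t journal_seq;	/* oldest journal entry this entry pins */
};

struct toy_cache {
	struct toy_entry slots[TOY_SLOTS];
	uint64_t next_seq;		/* next journal sequence number */
	uint64_t btree[TOY_KEYS];	/* stand-in for the on-disk btree */
};

/* Find the entry for key with linear probing, optionally creating it. */
static struct toy_entry *toy_cache_find(struct toy_cache *c, uint64_t key,
					bool create)
{
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		struct toy_entry *e = &c->slots[(key + i) % TOY_SLOTS];

		if (e->used && e->key == key)
			return e;
		if (!e->used) {
			if (!create)
				return NULL;
			e->used = true;
			e->dirty = false;
			e->key = key;
			return e;
		}
	}
	return NULL;	/* table full */
}

/* Update path: journal the change and update the cache; defer the btree. */
static bool toy_cache_update(struct toy_cache *c, uint64_t key, uint64_t val)
{
	struct toy_entry *e;
	uint64_t seq;

	assert(key < TOY_KEYS);
	e = toy_cache_find(c, key, true);
	if (!e)
		return false;

	seq = c->next_seq++;		/* "journal" the update */
	if (!e->dirty) {
		e->dirty = true;
		e->journal_seq = seq;	/* pin the oldest unflushed update */
	}
	e->val = val;
	return true;
}

/* Journal reclaim: write back every dirty entry pinning seq <= up_to_seq. */
static size_t toy_cache_flush(struct toy_cache *c, uint64_t up_to_seq)
{
	size_t flushed = 0;

	for (size_t i = 0; i < TOY_SLOTS; i++) {
		struct toy_entry *e = &c->slots[i];

		if (e->used && e->dirty && e->journal_seq <= up_to_seq) {
			c->btree[e->key] = e->val;
			e->dirty = false;
			flushed++;
		}
	}
	return flushed;
}
```

Two updates to the same key cost two journal entries but only one btree write once reclaim flushes the entry, which is the point of using the cache as a write cache.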
Kent Overstreet [Mon, 15 Jun 2020 19:10:54 +0000 (15:10 -0400)]
bcachefs: Implement a new gc that only recalcs oldest gen
Full mark and sweep gc doesn't (yet?) work with the new btree key cache
code, but it also blocks updates to interior btree nodes for the
duration and isn't really necessary in practice; we aren't currently
attempting to repair errors in allocation info at runtime.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 15 Jun 2020 18:58:47 +0000 (14:58 -0400)]
bcachefs: Turn c->state_lock into an rwsem
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 13 Jun 2020 22:43:14 +0000 (18:43 -0400)]
bcachefs: Add an internal option for reading entire journal
To be used by the debug tool that dumps the contents of the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 13 Jun 2020 02:29:48 +0000 (22:29 -0400)]
bcachefs: Don't deadlock when btree node reuse changes lock ordering
Btree node lock ordering is based on the logical key. However, 'struct
btree' may be reused for a different btree node under memory pressure.
This patch uses the new six lock callback to check if a btree node is no
longer the node we wanted to lock before blocking.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
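The idea of checking a node's identity before blocking can be sketched as below. This is an illustration only: the callback type, the return codes and all toy_* names are hypothetical and are not the six lock API. The toy lock never actually sleeps; it only reports what a real lock would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_pos  { uint64_t inode, offset; };
struct toy_node { struct toy_pos key; bool locked; };

/* Hypothetical return codes for the toy lock. */
enum toy_lock_ret { TOY_LOCKED = 0, TOY_WOULD_BLOCK = 1, TOY_RETRY = -1 };

/* Called before blocking; nonzero aborts the lock attempt. */
typedef int (*toy_should_sleep_fn)(const struct toy_node *b, void *arg);

static int toy_pos_cmp(struct toy_pos l, struct toy_pos r)
{
	if (l.inode != r.inode)
		return l.inode < r.inode ? -1 : 1;
	if (l.offset != r.offset)
		return l.offset < r.offset ? -1 : 1;
	return 0;
}

/* The node may have been reused for a different key since we looked it up. */
static int toy_node_still_wanted(const struct toy_node *b, void *arg)
{
	const struct toy_pos *want = arg;

	return toy_pos_cmp(b->key, *want) ? TOY_RETRY : 0;
}

static int toy_node_lock(struct toy_node *b, toy_should_sleep_fn should_sleep,
			 void *arg)
{
	if (!b->locked) {
		b->locked = true;
		return TOY_LOCKED;
	}

	/* Contended: ask the caller whether blocking is still correct. */
	if (should_sleep) {
		int ret = should_sleep(b, arg);

		if (ret)
			return ret;
	}

	/* A real lock would sleep here. */
	return TOY_WOULD_BLOCK;
}
```

Because lock ordering is derived from the key, blocking on a node whose key has changed could wait in the wrong order; bailing out and retrying the lookup avoids that.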
Kent Overstreet [Fri, 12 Jun 2020 18:58:07 +0000 (14:58 -0400)]
bcachefs: Fix a deadlock
__bch2_btree_node_lock() was incorrectly using iter->pos as a proxy for
btree node lock ordering; this caused an off-by-one error that was
triggered by bch2_btree_node_get_sibling() getting the previous node.
This refactors the code to compare against btree node keys directly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 10 Jun 2020 01:00:29 +0000 (21:00 -0400)]
bcachefs: Refactor btree insert path
This splits out the journalling code from the btree update code; prep
work for the btree key cache.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 10 Jun 2020 00:54:36 +0000 (20:54 -0400)]
bcachefs: Always give out journal pre-res if we already have one
This is better than skipping the journal pre-reservation if we already
have one - we should still account for the journal reservation we're
going to have to get.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 9 Jun 2020 19:44:03 +0000 (15:44 -0400)]
bcachefs: More open buckets
We need a larger open bucket reserve now that the btree interior update
path holds onto open bucket references; filesystems with many high
throughput devices may need more open buckets now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 9 Jun 2020 21:49:24 +0000 (17:49 -0400)]
bcachefs: Don't allocate memory under the btree cache lock
The btree cache lock is needed for reclaiming from the btree node cache,
and memory allocation can potentially spin and sleep (for 100 ms at a
time), so.. don't do that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
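The pattern behind "Don't allocate memory under the btree cache lock" is to allocate first and take the lock only to publish the result. A minimal userspace sketch, assuming a boolean stands in for the lock; all toy_* names are hypothetical and the assertions exist only to make the rule visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_node { struct toy_node *next; };

struct toy_node_cache {
	bool             lock_held;	/* stand-in for the cache lock */
	struct toy_node *list;
	size_t           nr;
};

static void toy_lock(struct toy_node_cache *c)
{
	assert(!c->lock_held);
	c->lock_held = true;
}

static void toy_unlock(struct toy_node_cache *c)
{
	assert(c->lock_held);
	c->lock_held = false;
}

/* Allocation may sleep, so it must never run with the lock held. */
static struct toy_node *toy_node_alloc(const struct toy_node_cache *c)
{
	assert(!c->lock_held);
	return calloc(1, sizeof(struct toy_node));
}

static bool toy_cache_add_node(struct toy_node_cache *c)
{
	struct toy_node *n = toy_node_alloc(c);	/* before locking */

	if (!n)
		return false;

	toy_lock(c);		/* short critical section: list update only */
	n->next = c->list;
	c->list = n;
	c->nr++;
	toy_unlock(c);
	return true;
}

static void toy_cache_free_all(struct toy_node_cache *c)
{
	toy_lock(c);
	while (c->list) {
		struct toy_node *n = c->list;

		c->list = n->next;
		free(n);
	}
	c->nr = 0;
	toy_unlock(c);
}
```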
Kent Overstreet [Tue, 9 Jun 2020 20:25:07 +0000 (16:25 -0400)]
bcachefs: Fix a linked list bug
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 9 Jun 2020 19:46:22 +0000 (15:46 -0400)]
bcachefs: Make open bucket reserves more conservative
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 9 Jun 2020 19:59:03 +0000 (15:59 -0400)]
bcachefs: btree_update_nodes_written() requires alloc reserve
Also, in the btree_update_start() path, if we already have a journal
pre-reservation we don't want to take another - that's a deadlock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 5 Jun 2020 13:01:23 +0000 (09:01 -0400)]
bcachefs: Check gfp_flags correctly in bch2_btree_cache_scan()
bch2_btree_node_mem_alloc() uses memalloc_nofs_save()/GFP_NOFS, but
GFP_NOFS does include __GFP_IO - oops. We used to use GFP_NOIO, but as
we're a filesystem now, GFP_NOFS makes more sense and is looser.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
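Why a check on __GFP_IO stops working once callers switch from GFP_NOIO to GFP_NOFS can be shown with a small model. The bit values below are illustrative and are not the kernel's; only the relationships between the masks (NOIO allows neither IO nor FS, NOFS allows IO but not FS, KERNEL allows both) are meant to match. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real values differ. */
#define TOY_GFP_RECLAIM	0x1u
#define TOY_GFP_IO	0x2u
#define TOY_GFP_FS	0x4u

#define TOY_GFP_NOIO	(TOY_GFP_RECLAIM)
#define TOY_GFP_NOFS	(TOY_GFP_RECLAIM | TOY_GFP_IO)
#define TOY_GFP_KERNEL	(TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS)

/* Old-style check: treats "IO allowed" as "safe to take filesystem locks". */
static bool toy_scan_may_block_buggy(unsigned gfp)
{
	return (gfp & TOY_GFP_IO) != 0;
}

/* Check that matches GFP_NOFS callers: only block if FS recursion is OK. */
static bool toy_scan_may_block(unsigned gfp)
{
	return (gfp & TOY_GFP_FS) != 0;
}
```

With GFP_NOIO callers both checks agree; with GFP_NOFS callers only the second one refuses to block.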
Kent Overstreet [Mon, 8 Jun 2020 18:28:16 +0000 (14:28 -0400)]
bcachefs: Call bch2_btree_iter_traverse() if necessary in commit path
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 8 Jun 2020 17:26:48 +0000 (13:26 -0400)]
bcachefs: bch2_trans_downgrade()
bch2_btree_iter_downgrade() was looping over all iterators in a
transaction; bch2_trans_downgrade() should be doing that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 4 Jun 2020 03:47:50 +0000 (23:47 -0400)]
bcachefs: Improve warning for copygc failing to move data
This will help narrow down which code is at fault when this happens.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 4 Jun 2020 03:46:15 +0000 (23:46 -0400)]
bcachefs: Always increment bucket gen on bucket reuse
Not doing so confuses copygc
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 4 Jun 2020 02:11:10 +0000 (22:11 -0400)]
bcachefs: Kill old allocator startup code
It's not needed anymore since we can now write to buckets before
updating the alloc btree.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 3 Jun 2020 22:27:07 +0000 (18:27 -0400)]
bcachefs: Improve assorted error messages
This also consolidates the various checks in bch2_mark_pointer() and
bch2_trans_mark_pointer().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 2 Jun 2020 23:41:47 +0000 (19:41 -0400)]
bcachefs: Fix a deadlock in bch2_btree_node_get_sibling()
There was a bad interaction with bch2_btree_iter_set_pos_same_leaf(),
which can leave a btree node locked that is just outside iter->pos,
breaking the lock ordering checks in __bch2_btree_node_lock(). Ideally
we should get rid of this corner case, but for now fix it locally with
verbose comments.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 2 Jun 2020 20:36:11 +0000 (16:36 -0400)]
bcachefs: Add debug code to print btree transactions
Intended to help debug deadlocks, since we can't use lockdep to check
btree node lock ordering.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 3 Jun 2020 20:20:22 +0000 (16:20 -0400)]
bcachefs: Set filesystem features earlier in fs init path
Before we were setting features after allocating btree nodes, which
meant we were using the old btree pointer format.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 2 Jun 2020 20:30:54 +0000 (16:30 -0400)]
bcachefs: Add an option to disable reflink support
Reflink might be buggy, so we're adding an option so users can help
bisect what's going on.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 28 May 2020 20:06:13 +0000 (16:06 -0400)]
bcachefs: Fixes for going RO
Now that interior btree updates are fully transactional, we don't need
to write out alloc info in a loop. However, interior btree updates do
put more things in the journal, so we still need a loop in the RO
sequence.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 28 May 2020 19:51:50 +0000 (15:51 -0400)]
bcachefs: Don't require alloc btree to be updated before buckets are used
This is to break a circular dependency in the shutdown path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 28 May 2020 21:15:41 +0000 (17:15 -0400)]
bcachefs: fsck_error_lock requires GFP_NOFS
This fixes a lockdep splat.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 May 2020 18:57:06 +0000 (14:57 -0400)]
bcachefs: Interior btree updates are now fully transactional
We now update the alloc info (bucket sector counts) atomically with
journalling the update to the interior btree nodes, and we also set new
btree roots atomically with the journalled part of the btree update.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 26 May 2020 00:35:53 +0000 (20:35 -0400)]
bcachefs: Factor out bch2_fs_btree_interior_update_init()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 May 2020 23:29:48 +0000 (19:29 -0400)]
bcachefs: Add a mechanism for passing extra journal entries to bch2_trans_commit()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 24 May 2020 18:06:10 +0000 (14:06 -0400)]
bcachefs: Fix reading of alloc info after unclean shutdown
When updates to interior nodes started being journalled, that meant that
after an unclean shutdown, until journal replay is done we can't walk
the btree without overlaying the updates from the journal.
The initial btree gc was changed to walk the btree overlaying keys from
the journal - but bch2_alloc_read() and bch2_stripes_read() were missed.
Major whoops...
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
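Walking the btree "overlaying keys from the journal" amounts to a merge of two sorted key streams in which the journal wins on ties. A minimal sketch under stated assumptions: integer positions stand in for real keys, the deleted flag is an assumed way of representing a journalled deletion, and toy_overlay is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_kv {
	uint64_t pos;
	int      val;
	bool     deleted;	/* assumption: journal entry that removes a key */
};

/*
 * Merge sorted btree keys with sorted journal keys into out[].
 * A journal key at the same position replaces the btree key; a deleted
 * journal key hides it. Returns the number of keys written (at most cap).
 */
static size_t toy_overlay(const struct toy_kv *btree, size_t nb,
			  const struct toy_kv *journal, size_t nj,
			  struct toy_kv *out, size_t cap)
{
	size_t b = 0, j = 0, n = 0;

	while (b < nb || j < nj) {
		const struct toy_kv *k;

		if (j < nj && (b == nb || journal[j].pos <= btree[b].pos)) {
			if (b < nb && btree[b].pos == journal[j].pos)
				b++;	/* journal overrides the btree key */
			k = &journal[j++];
		} else {
			k = &btree[b++];
		}

		if (k->deleted)
			continue;
		if (n == cap)
			break;
		out[n++] = *k;
	}
	return n;
}
```

A reader that skips the overlay would see only the btree array, which is the stale view the entry above describes for bch2_alloc_read() and bch2_stripes_read().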
Kent Overstreet [Wed, 27 May 2020 18:10:27 +0000 (14:10 -0400)]
bcachefs: fix memalloc_nofs_restore() usage
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 24 May 2020 18:20:00 +0000 (14:20 -0400)]
bcachefs: Better error messages on bucket sector count overflows
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 24 May 2020 17:37:44 +0000 (13:37 -0400)]
bcachefs: Be more rigorous about marking the filesystem clean
Previously, there was at least one error path where we could mark the
filesystem clean when we hadn't successfully written out alloc info.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 26 May 2020 01:25:31 +0000 (21:25 -0400)]
bcachefs: Handle printing of null bkeys
This fixes a null ptr deref.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 May 2020 22:47:21 +0000 (18:47 -0400)]
bcachefs: Add vmalloc fallback for decompress workspace
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 23 May 2020 15:44:12 +0000 (11:44 -0400)]
bcachefs: Print out d_type in dirent_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Yuxuan Shui [Fri, 22 May 2020 14:50:05 +0000 (15:50 +0100)]
bcachefs: fix stack corruption
When a bkey_on_stack is passed to bch_read_indirect_extent, there is no
guarantee that it will be big enough to hold the bkey, and
bch_read_indirect_extent does not know it was handed a bkey_on_stack, so
it cannot call realloc on it. This causes stack corruption.
This commit makes bch_read_indirect_extent aware of bkey_on_stack so it
can call realloc when appropriate.
Tested-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
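The general shape of an on-stack key buffer that can grow is sketched below. This is not the bkey_on_stack code: the struct layout, the inline size and all toy_* names are hypothetical, and malloc stands in for whatever allocator the kernel code uses. The point is that any callee writing a key of unknown size has to go through the grow step first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_INLINE_U64S 4	/* hypothetical inline capacity */

struct toy_key_buf {
	uint64_t *k;		/* points at onstack[] or a heap buffer */
	size_t    cap;		/* capacity in u64s */
	uint64_t  onstack[TOY_INLINE_U64S];
};

static void toy_key_buf_init(struct toy_key_buf *b)
{
	memset(b->onstack, 0, sizeof(b->onstack));
	b->k   = b->onstack;
	b->cap = TOY_INLINE_U64S;
}

/* Ensure room for u64s words, moving to the heap if the inline part is too small. */
static bool toy_key_buf_realloc(struct toy_key_buf *b, size_t u64s)
{
	uint64_t *n;

	if (u64s <= b->cap)
		return true;

	n = malloc(u64s * sizeof(*n));
	if (!n)
		return false;

	memcpy(n, b->k, b->cap * sizeof(*n));
	if (b->k != b->onstack)
		free(b->k);
	b->k   = n;
	b->cap = u64s;
	return true;
}

/* Callee side: grow first, then copy; never assume the caller's buffer fits. */
static bool toy_key_buf_store(struct toy_key_buf *b, const uint64_t *src,
			      size_t u64s)
{
	if (!toy_key_buf_realloc(b, u64s))
		return false;
	memcpy(b->k, src, u64s * sizeof(*src));
	return true;
}

static void toy_key_buf_exit(struct toy_key_buf *b)
{
	if (b->k != b->onstack)
		free(b->k);
	b->k = NULL;
}
```

Note that this toy struct must not be copied by value, since k may point into the struct itself.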
Kent Overstreet [Thu, 21 May 2020 21:23:40 +0000 (17:23 -0400)]
bcachefs: Wrap vmap() in memalloc_nofs_save()/restore()
vmalloc() and vmap() don't take GFP_NOFS - this should be pushed further
up the IO path, but for now just doing the simple fix.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
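The scoped save/restore pattern used for calls that take no gfp argument can be modelled in userspace as below. This is a sketch, not kernel code: a global variable stands in for the per-task flags, and the toy_* functions only mimic the nesting behaviour of memalloc_nofs_save()/memalloc_nofs_restore(), where restore is handed the value that the matching save returned.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PF_NOFS 0x1u

static unsigned toy_task_flags;	/* stand-in for per-task flags */

/* Enter a no-FS-recursion scope; returns the previous state of the bit. */
static unsigned toy_nofs_save(void)
{
	unsigned old = toy_task_flags & TOY_PF_NOFS;

	toy_task_flags |= TOY_PF_NOFS;
	return old;
}

/* Leave the scope, restoring exactly what the matching save returned. */
static void toy_nofs_restore(unsigned old)
{
	toy_task_flags = (toy_task_flags & ~TOY_PF_NOFS) | old;
}

static bool toy_alloc_may_enter_fs(void)
{
	return !(toy_task_flags & TOY_PF_NOFS);
}

/*
 * Stand-in for wrapping a vmap()-like call: reports whether an allocation
 * made inside the wrapped region could have recursed into the filesystem.
 */
static bool toy_map_may_enter_fs(void)
{
	unsigned flags = toy_nofs_save();
	bool may_enter = toy_alloc_may_enter_fs();

	toy_nofs_restore(flags);
	return may_enter;
}
```

Passing the saved value back, rather than clearing the bit unconditionally, is what keeps an inner scope from ending an outer one early.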
Kent Overstreet [Fri, 15 May 2020 01:45:08 +0000 (21:45 -0400)]
bcachefs: Fix another iterator counting bug
We were marking the end of where we could insert incorrectly for
indirect extents.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 13 May 2020 21:53:33 +0000 (17:53 -0400)]
bcachefs: Fix setquota
We were returning -EINTR because we were failing to retry the btree
transaction.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
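The missing retry described in "Fix setquota" has this general shape. A minimal sketch, assuming a transaction body that signals "restart me" with -EINTR; toy_trans_run and toy_flaky_op are hypothetical names, and a real caller would also reset transaction state before each retry.

```c
#include <assert.h>
#include <errno.h>

typedef int (*toy_trans_fn)(void *ctx);

/*
 * Run fn until it stops asking for a restart. Returns fn's final result
 * and, optionally, how many restarts were needed.
 */
static int toy_trans_run(toy_trans_fn fn, void *ctx, unsigned *restarts)
{
	unsigned n = 0;
	int ret;

	while ((ret = fn(ctx)) == -EINTR)
		n++;	/* a real caller would reset the transaction here */

	if (restarts)
		*restarts = n;
	return ret;
}

/* Hypothetical operation: asks for a restart fail_left times, then succeeds. */
struct toy_op { unsigned fail_left; };

static int toy_flaky_op(void *ctx)
{
	struct toy_op *op = ctx;

	if (op->fail_left) {
		op->fail_left--;
		return -EINTR;
	}
	return 0;
}
```

Without the loop, the first -EINTR would be returned to the caller as if it were a real error.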
Kent Overstreet [Wed, 13 May 2020 04:15:28 +0000 (00:15 -0400)]
bcachefs: Fix a workqueue deadlock
writes running out of a workqueue (via dio path) could block and prevent
other writes from calling bch2_write_index() and completing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 12 May 2020 22:34:16 +0000 (18:34 -0400)]
bcachefs: Validate that we read the correct btree node
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 12 May 2020 00:01:07 +0000 (20:01 -0400)]
bcachefs: Fixes for startup on very full filesystems
- Always pass BTREE_INSERT_USE_RESERVE when writing alloc btree keys
- Don't strand buckets on the copygc freelist until after recovery is
done and we're starting copygc.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 9 May 2020 03:15:42 +0000 (23:15 -0400)]
bcachefs: Fix initialization of bounce mempools
When they were converted to kvpmalloc pools they weren't converted to
pass the actual size of the allocation. Oops.
Also, validate the real length in the zstd decompression path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 6 May 2020 19:37:04 +0000 (15:37 -0400)]
bcachefs: Some compression improvements
In __bio_map_or_bounce(), the check for if the bio is physically
contiguous is improved; it's now more readable and handles multi page
but contiguous bios.
Also when decompressing, we were doing a redundant memcpy in the case
where we were able to use vmap to map a bio contiguously.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 2 May 2020 20:21:35 +0000 (16:21 -0400)]
bcachefs: Fix two more deadlocks
Deadlock on shutdown:
btree_update_nodes_written() unblocks btree nodes from being written;
after doing so, it has to check if they were marked as needing to be
written and if so kick off those writes - if that doesn't happen, we'll
never release journal pins and shutdown will get stuck when flushing the
journal.
There was an error path where this didn't happen, because in the error
path we don't actually want those btree node writes to happen; however,
we still have to kick off the write path so the journal pins get
released. The btree write path checks if we're in a journal error state
and doesn't do the actual write if we are.
Also - there was another deadlock because btree_update_nodes_written()
was taking the btree update off of the unwritten_list too soon - before
getting a journal reservation, which could fail and have to be retried.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 1 May 2020 23:56:31 +0000 (19:56 -0400)]
bcachefs: Fix another deadlock in btree_update_nodes_written()
We also can't be blocking on btree node write locks while holding
btree_interior_update_lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 29 Apr 2020 16:57:04 +0000 (12:57 -0400)]
bcachefs: Add some printks for error paths
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 29 Apr 2020 19:28:25 +0000 (15:28 -0400)]
bcachefs: Don't issue writes that are more than 1 MB
The bcachefs io path in io.c can't bounce writes larger than that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 24 Apr 2020 21:57:59 +0000 (17:57 -0400)]
bcachefs: More fixes for counting extent update iterators
This is unfortunately really fragile - hopefully we'll be able to think
of a new approach at some point.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 24 Apr 2020 22:25:11 +0000 (18:25 -0400)]
bcachefs: Fix a deadlock
btree_node_lock_increment() was incorrectly skipping over the current
iter when checking if we should increment a node we already have locked.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 24 Apr 2020 18:08:56 +0000 (14:08 -0400)]
bcachefs: Handle -EINTR in bch2_migrate_index_update()
peek_slot() shouldn't return -EINTR when there's only a single live
iterator, but that's tricky to guarantee - we seem to be returning
-EINTR when we shouldn't, but it's easy enough to handle in the caller.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 24 Apr 2020 18:08:18 +0000 (14:08 -0400)]
bcachefs: Fix for the bkey compat path
In the write path, we were calling bch2_bkey_ops.compat() in the wrong
place.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Apr 2020 16:32:27 +0000 (12:32 -0400)]
bcachefs: Add a few tracepoints
Transaction restart tracing should probably be overhauled at some
point.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Apr 2020 16:31:16 +0000 (12:31 -0400)]
bcachefs: Slightly reduce btree split threshold
2/3rds performs a lot better than 3/4ths on the tested workload, leading
to significantly fewer btree node compactions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Apr 2020 16:30:30 +0000 (12:30 -0400)]
bcachefs: Improve lockdep annotation in journalling code
bch2_journal_res_get() in nonblocking mode is equivalent to a trylock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Apr 2020 16:29:32 +0000 (12:29 -0400)]
bcachefs: Fix a locking bug in bch2_journal_pin_copy()
There was a race where the src pin would be flushed - releasing the last
pin on that sequence number - before adding the new journal pin. Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 7 Apr 2020 21:27:12 +0000 (17:27 -0400)]
bcachefs: Fix another deadlock in the btree interior update path
Can't take read locks on btree nodes while holding
btree_interior_update_lock. Also, fix a bug where we were leaking
journal prereservations.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 7 Apr 2020 21:31:38 +0000 (17:31 -0400)]
bcachefs: Fix a locking bug in bch2_btree_ptr_debugcheck()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 7 Apr 2020 17:49:14 +0000 (13:49 -0400)]
bcachefs: Account for ioclock slop when throttling rebalance thread
This should fix an issue where the rebalance thread was spinning.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 6 Apr 2020 01:49:17 +0000 (21:49 -0400)]
bcachefs: Fix a deadlock on starting an interior btree update
Not legal to block on a journal prereservation with btree locks held.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 4 Apr 2020 20:47:59 +0000 (16:47 -0400)]
bcachefs: Fix a debug mode assertion
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 4 Apr 2020 19:49:42 +0000 (15:49 -0400)]
bcachefs: Fix a debug assertion
This assertion was passing the wrong btree node type when inserting into
interior nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 4 Apr 2020 19:45:06 +0000 (15:45 -0400)]
bcachefs: Fix another error path locking bug
btree_update_nodes_written() was leaking a btree node lock on failure to
get a journal reservation.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 4 Apr 2020 17:54:19 +0000 (13:54 -0400)]
bcachefs: Fix a null ptr deref during journal replay
We were calling bch2_extent_can_insert() incorrectly; it should only be
called when the extents-to-keys pass is running because that's when we
could be splitting a compressed extent. Calling bch2_extent_can_insert()
without passing in a disk reservation was causing a null ptr deref.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 1 Apr 2020 21:28:39 +0000 (17:28 -0400)]
bcachefs: Add another missing bch2_trans_iter_put() call
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 1 Apr 2020 21:14:14 +0000 (17:14 -0400)]
bcachefs: Trace where btree iterators are allocated
This will help with iterator overflow bugs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 1 Apr 2020 20:07:57 +0000 (16:07 -0400)]
bcachefs: Fix fallocate FL_INSERT_RANGE
This was another bug because of bch2_btree_iter_set_pos() invalidating
iterators.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 31 Mar 2020 20:25:30 +0000 (16:25 -0400)]
bcachefs: Add print method for bch2_btree_ptr_v2
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 31 Mar 2020 20:23:43 +0000 (16:23 -0400)]
bcachefs: Fix journalling of interior node updates
We weren't journalling updates done while splitting/compacting nodes -
oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 30 Mar 2020 22:11:13 +0000 (18:11 -0400)]
bcachefs: Fix iterating of journal keys within a btree node
Extent btrees no longer have weird special behaviour for min_key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 30 Mar 2020 21:43:21 +0000 (17:43 -0400)]
bcachefs: Fix a locking bug
Dropping the wrong kind of lock can't lead to anything good...
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 30 Mar 2020 18:29:06 +0000 (14:29 -0400)]
bcachefs: Fix inodes pass in fsck
It wasn't updated for the patch that switched inodes to using the offset
field of struct bkey.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 30 Mar 2020 18:05:05 +0000 (14:05 -0400)]
bcachefs: Fix ec_stripe_update_ptrs()
bch2_btree_iter_set_pos() invalidates the key returned by peek().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 29 Mar 2020 20:48:53 +0000 (16:48 -0400)]
bcachefs: Check btree topology at startup
When initial btree gc was changed to overlay journal keys as it walks
the btree, it also stopped checking btree topology.
Previously, checking btree topology was a fairly complicated affair -
but it's much easier now that btree_ptr_v2 has min_key in the pointer.
This rewrites the old range_checks code and uses it in both runtime and
initial gc.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
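With min_key stored in each pointer, a topology check reduces to comparing adjacent ranges. The sketch below is an illustration, not the rewritten range_checks code: integer keys stand in for struct bpos, toy_check_topology is a hypothetical name, and the rule that each node's min_key is the successor of the previous node's max_key is taken from the "Kill bkey_type_successor" entry in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_child { uint64_t min_key, max_key; };

/*
 * Check that children tile [parent_min, parent_max] with no gaps or
 * overlaps. Returns -1 if consistent, otherwise the index of the first
 * offending child (0 for an empty child list).
 */
static int toy_check_topology(uint64_t parent_min, uint64_t parent_max,
			      const struct toy_child *c, size_t nr)
{
	uint64_t expected_min = parent_min;

	if (!nr)
		return 0;

	for (size_t i = 0; i < nr; i++) {
		if (c[i].min_key != expected_min)
			return (int) i;		/* gap or overlap */
		if (c[i].max_key < c[i].min_key || c[i].max_key > parent_max)
			return (int) i;		/* empty or out of range */
		if (i + 1 == nr)
			return c[i].max_key == parent_max ? -1 : (int) i;
		if (c[i].max_key == parent_max)
			return (int) (i + 1);	/* children past the parent's end */
		expected_min = c[i].max_key + 1;	/* successor of max_key */
	}
	return -1;
}
```

Since the same function needs nothing but the ranges themselves, it can serve both a startup pass and a runtime pass.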
Kent Overstreet [Mon, 30 Mar 2020 16:33:30 +0000 (12:33 -0400)]
bcachefs: Don't allocate memory while holding journal reservation
This fixes a lockdep splat - allocating memory can call
bch2_clear_page_bits() which takes mark_lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 29 Mar 2020 21:01:05 +0000 (17:01 -0400)]
bcachefs: Reduce max nr of btree iters when lockdep is on
This is so we don't overflow MAX_LOCK_DEPTH.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 7 Jan 2020 18:29:32 +0000 (13:29 -0500)]
bcachefs: Kill bkey_type_successor
Previously, BTREE_ID_INODES was special - inodes were indexed by the
inode field, which meant the offset field of struct bpos wasn't used,
which led to special cases in e.g. the btree iterator code.
Now, inodes in the inodes btree are indexed by the offset field.
Also: previously min_key was special for extents btrees: min_key for
extents would equal max_key for the previous node. Now, min_key =
bkey_successor() of the previous node's max_key, same as non-extent btrees.
This means we can completely get rid of
btree_type_successor/predecessor.
Also make some improvements to the metadata IO validate/compat code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
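A successor function over a compound position, as used for the min_key rule in "Kill bkey_type_successor", can be sketched as follows. This is a simplified model: toy_bpos is reduced to two fields, and the names and the bool return for overflow are hypothetical rather than the bkey_successor() signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified position: ordered by inode first, then offset. */
struct toy_bpos { uint64_t inode, offset; };

static int toy_bpos_cmp(struct toy_bpos l, struct toy_bpos r)
{
	if (l.inode != r.inode)
		return l.inode < r.inode ? -1 : 1;
	if (l.offset != r.offset)
		return l.offset < r.offset ? -1 : 1;
	return 0;
}

/*
 * Compute the next position after p. The offset is incremented first and
 * carries into inode on wraparound. Returns false if p was already the
 * largest representable position.
 */
static bool toy_bpos_successor(struct toy_bpos p, struct toy_bpos *out)
{
	if (++p.offset == 0 && ++p.inode == 0)
		return false;

	*out = p;
	return true;
}
```

Given the previous node's max_key, the next node's min_key under the new rule would be the result of toy_bpos_successor() on it, with no per-btree special cases.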
Kent Overstreet [Sun, 29 Mar 2020 18:21:44 +0000 (14:21 -0400)]
bcachefs: Switch a BUG_ON() to a warning
This has popped and thus needs to be debugged, but the assertion firing
isn't necessarily fatal so switch it to a warning.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 29 Mar 2020 16:33:41 +0000 (12:33 -0400)]
bcachefs: Use kvpmalloc mempools for compression bounce
This fixes an issue where mounting would fail because of memory
fragmentation - previously the compression bounce buffers were using
get_free_pages().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 28 Mar 2020 22:26:01 +0000 (18:26 -0400)]
bcachefs: Read journal when keep_journal on
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 28 Mar 2020 23:17:23 +0000 (19:17 -0400)]
bcachefs: Various fixes for interior update path
The locking was wrong, and we could get a use after free in the error
path where we weren't taking the entry being freed off the unwritten
list.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 27 Mar 2020 21:38:51 +0000 (17:38 -0400)]
bcachefs: Use memalloc_nofs_save()
vmalloc allocations don't always obey GFP_NOFS - memalloc_nofs_save() is
the preferred approach for the future.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 25 Mar 2020 20:13:00 +0000 (16:13 -0400)]
bcachefs: Improve error message in fsck
Seeing the extents that were overlapping is highly useful for figuring
out what went wrong.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 25 Mar 2020 20:12:33 +0000 (16:12 -0400)]
bcachefs: Add an option for keeping journal entries after startup
This will be used by the userspace debug tools.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 25 Mar 2020 21:57:29 +0000 (17:57 -0400)]
bcachefs: Fix an assertion when nothing to replay
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 9 Feb 2020 00:06:31 +0000 (19:06 -0500)]
bcachefs: Journal updates to interior nodes
Previously, the btree has always been self contained and internally
consistent on disk without anything from the journal - the journal just
contained pointers to the btree roots.
However, this meant that btree node split or compact operations - i.e.
anything that changes btree node topology and involves updates to
interior nodes - would require that interior btree node to be written
immediately, which means emitting a btree node write that's mostly empty
(using 4k of space on disk if the filesystem blocksize is 4k to only
write perhaps ~100 bytes of new keys).
More importantly, this meant most btree node writes had to be FUA, and
consumer drives have a history of slow and/or buggy FUA support - other
filesystems have been bitten by this.
This patch changes the interior btree update path to journal updates to
interior nodes, after the writes for the new btree nodes have completed.
Best of all, it turns out to simplify the interior node update path
somewhat.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 16 Mar 2020 02:32:03 +0000 (22:32 -0400)]
bcachefs: Replay interior node keys
This slightly modifies the journal replay code so that it can replay
updates to interior nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 16 Mar 2020 03:29:43 +0000 (23:29 -0400)]
bcachefs: trans_commit() path can now insert to interior nodes
This will be needed for the upcoming patches to journal updates to
interior btree nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 24 Mar 2020 21:00:48 +0000 (17:00 -0400)]
bcachefs: Disable extent merging
Extent merging is currently broken, and will be reimplemented
differently soon - right now it only happens when btree nodes are being
compacted, which makes it difficult to test.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 21 Mar 2020 18:47:00 +0000 (14:47 -0400)]
bcachefs: Fix a locking bug in fsck
This works around a btree locking issue - we can't be holding read locks
while taking write locks, which currently means we can't have live
iterators holding read locks at commit time.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 21 Mar 2020 18:08:01 +0000 (14:08 -0400)]
bcachefs: Fix count_iters_for_insert()
This fixes a transaction iterator overflow.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 18 Mar 2020 17:40:28 +0000 (13:40 -0400)]
bcachefs: Fix an iterator bug
We were incorrectly not restarting the transaction when re-traversing
iterators.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 18 Mar 2020 15:46:46 +0000 (11:46 -0400)]
bcachefs: Shut down quicker
Internal writes (i.e. copygc/rebalance operations) shouldn't be blocking
on the allocator when we're going RO.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>