Kent Overstreet [Wed, 30 Jun 2021 19:44:11 +0000 (15:44 -0400)]
bcachefs: Use memalloc_nofs_save() in bch2_read_endio()
This solves a problematic memory allocation in bch2_bio_uncompress() ->
vmap().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Wed, 23 Jun 2021 01:51:17 +0000 (21:51 -0400)]
bcachefs: Fix btree_node_read_all_replicas() error handling
We weren't checking bch2_btree_node_read_done() for errors, oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Wed, 23 Jun 2021 00:44:54 +0000 (20:44 -0400)]
bcachefs: Don't loop into topology repair
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Mon, 21 Jun 2021 20:28:43 +0000 (16:28 -0400)]
bcachefs: Don't ratelimit certain fsck errors
It's unhelpful if we see "Halting mark and sweep to start topology
repair" but we don't see the error that triggered it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Dan Robertson [Thu, 17 Jun 2021 03:21:23 +0000 (23:21 -0400)]
bcachefs: ensure iter->should_be_locked is set
Ensure that iter->should_be_locked value is set to true before we
call bch2_trans_update in ec_stripe_update_ptrs.
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 11 Jun 2021 03:34:02 +0000 (23:34 -0400)]
bcachefs: Don't disable preemption unnecessarily
Small improvements to some percpu utility code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Fri, 11 Jun 2021 01:44:27 +0000 (21:44 -0400)]
bcachefs: Extensive triggers cleanups
- We no longer mark subsets of extents, they're marked like regular
keys now - which means we can drop the offset & sectors arguments
to trigger functions
- Drop other arguments that are no longer needed in various
places - fs_usage
- Drop the logic for handling extents in bch2_mark_update() that isn't
needed anymore, to match bch2_trans_mark_update()
- Better logic for handling the BTREE_ITER_CACHED_NOFILL case, where we
don't have an old key to mark
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Tue, 15 Jun 2021 02:29:54 +0000 (22:29 -0400)]
bcachefs: fix truncate with ATTR_MODE
After the v5.12 rebase, we started oopsing when truncate was passed
ATTR_MODE, due to not passing mnt_userns to setattr_copy(). This
refactors things so that truncate/extend finish by using
bch2_setattr_nonsize(), which solves the problem.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Mon, 14 Jun 2021 22:16:10 +0000 (18:16 -0400)]
bcachefs: Improve iter->should_be_locked
Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.
This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Mon, 14 Jun 2021 20:35:03 +0000 (16:35 -0400)]
bcachefs: Kill __btree_delete_at()
With trans->updates2 gone, we can now drop this helper and use
bch2_btree_delete_at() instead.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Mon, 14 Jun 2021 20:32:44 +0000 (16:32 -0400)]
bcachefs: Make sure bch2_trans_mark_update uses correct iter flags
Now that bch2_btree_iter_peek_with_updates() has been removed in favor
of BTREE_ITER_WITH_UPDATES, we need to make sure it's not used where we
don't want it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Mon, 14 Jun 2021 18:47:26 +0000 (14:47 -0400)]
bcachefs: Fix a memory leak in dio write path
Commit
c42bca92be928ce7dece5fc04cf68d0e37ee6718 "bio: don't copy bvec
for direct IO" changed bio_iov_iter_get_pages() to point bio->bi_iovec
at the incoming biovec, meaning if we already allocated one, it'll be
leaked.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Janpieter Sollie [Sun, 13 Jun 2021 20:01:08 +0000 (22:01 +0200)]
bcachefs: fix a possible bcachefs checksum mapping error opt-checksum enum to type-checksum enum
This fixes some rare cases where the metadata checksum option specified
may map to the wrong actual checksum type.
Signed-off-by: Janpieter Sollie <janpieter.sollie@edpnet.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 13 Jun 2021 02:33:53 +0000 (22:33 -0400)]
bcachefs: Clear iter->should_be_locked in bch2_trans_reset
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Fri, 11 Jun 2021 03:33:27 +0000 (23:33 -0400)]
bcachefs: Don't underflow c->sectors_available
This rarely used error path should've been checking for underflow -
oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
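For illustration, the kind of underflow check described in "Don't underflow c->sectors_available" can be sketched in plain userspace C. This is a minimal sketch under stated assumptions, not the actual patch: sub_clamped() is a hypothetical helper, and a plain uint64_t stands in for the real counter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the bcachefs source): subtract `used` from
 * an unsigned counter, clamping at zero instead of wrapping around to a
 * huge value.
 */
uint64_t sub_clamped(uint64_t available, uint64_t used)
{
	return used > available ? 0 : available - used;
}
```

With unsigned arithmetic, a bare `available - used` would wrap when `used` is the larger value; the explicit comparison avoids that.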
Kent Overstreet [Fri, 11 Jun 2021 00:15:50 +0000 (20:15 -0400)]
bcachefs: Kill bch2_btree_iter_peek_cached()
It's now been rolled into bch2_btree_iter_peek_slot()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Sat, 12 Jun 2021 21:20:02 +0000 (17:20 -0400)]
bcachefs: Allow shorter JSET_ENTRY_dev_usage entries
If the last entry(ies) would be all zeros, there's no need to write them
out - the read path already handles that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
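The idea in "Allow shorter JSET_ENTRY_dev_usage entries" (skip trailing entries that are all zeros) could look roughly like the sketch below. The function name and the flat uint64_t array are assumptions made for this example; the real entry layout is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given `nr` counters, return how many need to be
 * written so that every omitted trailing counter is zero.
 */
size_t nr_entries_to_write(const uint64_t *counters, size_t nr)
{
	while (nr && counters[nr - 1] == 0)
		nr--;
	return nr;
}
```

A reader that zero-fills missing entries then reconstructs the same values from the shorter record.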
Dan Robertson [Thu, 10 Jun 2021 11:52:42 +0000 (07:52 -0400)]
bcachefs: mount: fix null deref with null devname
- Fix null deref on mount when given a null device name.
- Move the dev_name checks to return EINVAL when it is invalid.
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 12 Jun 2021 19:45:56 +0000 (15:45 -0400)]
bcachefs: Fix null ptr deref when splitting compressed extents
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Fri, 11 Jun 2021 03:51:09 +0000 (23:51 -0400)]
bcachefs: Fix overflow in journal_replay_entry_early
If the filesystem on disk was used by a version with a larger BCH_DATA_NR
than the currently running version, we don't want this to cause a buffer
overrun.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
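A sketch of the overrun scenario in "Fix overflow in journal_replay_entry_early": an on-disk record carries more per-type entries than the running version has room for. The names below (LOCAL_DATA_NR, struct dev_usage, load_dev_usage()) are hypothetical stand-ins chosen for this example, not the real bcachefs definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOCAL_DATA_NR 4	/* hypothetical stand-in for BCH_DATA_NR */

struct dev_usage {
	uint64_t sectors[LOCAL_DATA_NR];
};

/*
 * Copy per-type counters read from disk into a fixed-size in-memory array.
 * Entries beyond what this build understands are ignored; entries missing
 * on disk stay zero. Returns the number of entries copied.
 */
size_t load_dev_usage(struct dev_usage *dst, const uint64_t *on_disk,
		      size_t nr_on_disk)
{
	size_t nr = nr_on_disk < LOCAL_DATA_NR ? nr_on_disk : LOCAL_DATA_NR;

	memset(dst, 0, sizeof(*dst));
	for (size_t i = 0; i < nr; i++)
		dst->sectors[i] = on_disk[i];
	return nr;
}
```

Clamping the loop bound to the local array size is what keeps a newer on-disk format from writing past the end of the destination.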
Kent Overstreet [Mon, 7 Jun 2021 20:50:30 +0000 (16:50 -0400)]
bcachefs: Always zero memory from bch2_trans_kmalloc()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Sat, 15 May 2021 19:04:08 +0000 (15:04 -0400)]
bcachefs: Merging for indirect extents
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Sat, 15 May 2021 04:37:37 +0000 (00:37 -0400)]
bcachefs: Improved extent merging
Previously, checksummed extents could only be merged when the checksum
covered only the currently live data.
xfstest generic/064 creates a test file, then uses finsert calls to
split the extent, then collapse calls to see if they get merged. But
without any reads to trigger the narrow_crcs path, each of the split
extents will still have a checksum for the entire original extent.
This patch improves the extent merge path so that if either of the
extents we're attempting to merge has a checksum that covers the entire
merged extent, we just use that checksum.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Thu, 29 Apr 2021 03:52:19 +0000 (23:52 -0400)]
bcachefs: Re-implement extent merging in transaction commit path
We haven't had extent merging in quite some time. It used to be done by
the btree code when sorting btree nodes, but that was eliminated as part
of the work to separate extent handling from core btree code.
This patch re-implements extent merging in the transaction commit path.
We don't currently have the ability to merge reflink pointers, we need
to do some work on the triggers code to be able to do that without
ending up with incorrect refcounts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Thu, 29 Apr 2021 03:52:19 +0000 (23:52 -0400)]
bcachefs: Refactor extent_handle_overwrites()
Prep work for extent merging
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 29 Apr 2021 03:49:30 +0000 (23:49 -0400)]
bcachefs: Clean up key merging
This patch simplifies the key merging code by getting rid of partial
merges - it's simpler and saner if we just don't merge extents when
they'd overflow k->size.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Mon, 7 Jun 2021 18:54:56 +0000 (14:54 -0400)]
bcachefs: Kill trans->updates2
Now that extent handling has been lifted to bch2_trans_update(), we
don't need to keep two different lists of updates.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Mon, 7 Jun 2021 17:39:21 +0000 (13:39 -0400)]
bcachefs: Simplify reflink trigger
Now that we only mark entire extents, we can ditch the
"reflink_p_frag_references" code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Wed, 2 Jun 2021 04:18:34 +0000 (00:18 -0400)]
bcachefs: Move extent_handle_overwrites() to bch2_trans_update()
This lifts handling of overlapping extents out of __bch2_trans_commit()
and moves it to where we first do the update - which means that
BTREE_ITER_WITH_UPDATES can now work correctly in extents mode.
Also, this patch reworks how extent triggers work: previously, on
partial extent overwrite we would pass this information to the trigger,
telling it what part of the extent was being overwritten. But, this
approach has had too many subtle corner cases - now, we only mark whole
extents, meaning on partial extent overwrite we unmark the old extent
and mark the new extent.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Sat, 31 Dec 2022 00:15:53 +0000 (19:15 -0500)]
bcachefs: bch2_btree_iter_peek_slot() now saves initial position when searching
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 31 Dec 2022 00:15:53 +0000 (19:15 -0500)]
bcachefs: Kill __bch2_btree_iter_peek_slot_extents()
This codepath won't just be for extents in the future, it'll also be for
BTREE_ITER_FILTER_SNAPSHOTS mode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 31 Dec 2022 03:41:38 +0000 (22:41 -0500)]
bcachefs: bch2_btree_iter_peek_slot() now supports BTREE_ITER_WITH_UPDATES
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 4 Jun 2021 04:29:49 +0000 (00:29 -0400)]
bcachefs: BTREE_ITER_WITH_UPDATES
This drops bch2_btree_iter_peek_with_updates() and replaces it with a
new flag, BTREE_ITER_WITH_UPDATES, and also reworks
bch2_btree_iter_peek_slot() to respect it too.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 20 Mar 2021 19:12:05 +0000 (15:12 -0400)]
bcachefs: Child btree iterators
This adds the ability for btree iterators to own child iterators - to be
used by an upcoming rework of bch2_btree_iter_peek_slot(), so we can
scan forwards while maintaining our current position.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Fri, 9 Apr 2021 02:26:53 +0000 (22:26 -0400)]
bcachefs: Drop all btree locks when submitting btree node reads
As a rule we don't want to be holding btree locks while submitting IO -
this will improve overall filesystem latency.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Mon, 7 Jun 2021 17:28:50 +0000 (13:28 -0400)]
bcachefs: More topology repair code
This improves the handling of overlapping btree nodes; now, we handle
the case where one btree node completely overwrites another.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Thu, 10 Jun 2021 17:21:39 +0000 (13:21 -0400)]
bcachefs: Fix a buffer overrun
In make_extent_indirect(), we were allocating too small of a buffer for
the new indirect extent.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Wed, 9 Jun 2021 02:50:30 +0000 (22:50 -0400)]
bcachefs: Don't mark superblocks past end of usable space
bcachefs-tools recently started putting a backup superblock at the end
of the device. This causes a problem if the bucket size doesn't divide
the device size - but we can fix it by just skipping marking that part.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Tue, 8 Jun 2021 20:29:24 +0000 (16:29 -0400)]
bcachefs: Fix a spurious debug mode assertion
When we switched to using bch2_btree_bset_insert_key() for extents it
turned out it started leaving invalid keys around - of type deleted but
nonzero size - but this is fine (if ugly) because they're never written
out.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Brett Holman [Sun, 6 Jun 2021 15:29:42 +0000 (09:29 -0600)]
bcachefs: Fix uninitialized use of a value
Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Dan Robertson [Sat, 5 Jun 2021 23:03:16 +0000 (19:03 -0400)]
bcachefs: do not compile acl mod on minimal config
Do not compile the acl.o target if BCACHEFS_POSIX_ACL is not enabled.
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 4 Jun 2021 21:17:45 +0000 (17:17 -0400)]
bcachefs: btree_iter->should_be_locked
Add a field to struct btree_iter for tracking whether it should be
locked - this fixes spurious transaction restarts in
bch2_trans_relock().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Fri, 4 Jun 2021 19:18:10 +0000 (15:18 -0400)]
bcachefs: Improve btree iterator tracepoints
This patch adds some new tracepoints to the btree iterator code, and
adds new fields to the existing tracepoints - primarily for the iterator
position.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Thu, 3 Jun 2021 03:31:42 +0000 (23:31 -0400)]
bcachefs: Preallocate transaction mem
This helps avoid transaction restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Wed, 2 Jun 2021 04:15:07 +0000 (00:15 -0400)]
bcachefs: Check for errors from bch2_trans_update()
Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Tue, 1 Jun 2021 00:52:39 +0000 (20:52 -0400)]
bcachefs: Check for allocator thread shutdown
We were missing a kthread_should_stop() check in the loop in
bch2_invalidate_buckets(), very occasionally leading to us getting stuck
while shutting down.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Mon, 31 May 2021 04:13:39 +0000 (00:13 -0400)]
bcachefs: Journal space calculation fix
When devices have different bucket sizes, we may accumulate a journal
write that doesn't fit on some of our devices - previously, we'd
underflow when calculating space on that device and then everything
would get weird.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Sun, 21 Mar 2021 02:14:10 +0000 (22:14 -0400)]
bcachefs: Don't fragment extents when making them indirect
This fixes a "disk usage increased without a reservation" bug, when
reflinking compressed extents. Also, there's no good reason for reflink
to be fragmenting extents anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Sun, 23 May 2021 06:31:33 +0000 (02:31 -0400)]
bcachefs: Fsck for reflink refcounts
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Sun, 23 May 2021 21:04:13 +0000 (17:04 -0400)]
bcachefs: Assorted endianness fixes
Found by sparse
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Mon, 11 Sep 2023 03:33:08 +0000 (23:33 -0400)]
bcachefs: Fix a deadlock
Waiting on a btree node write with btree locks held can deadlock, if the
write errors: the write error path has to do a btree update to drop
the pointer to the replica that errored.
The interior update path has to wait on in flight btree writes before
freeing nodes on disk. Previously, this was done in
bch2_btree_interior_update_will_free_node(), and could deadlock; now, we
just stash a pointer to the node and do it in
btree_update_nodes_written(), just prior to the transactional part of
the update.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 28 May 2021 01:38:00 +0000 (21:38 -0400)]
bcachefs: Split out btree_error_wq
We can't use btree_update_wq because btree updates may be waiting on
btree writes to complete.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 28 May 2021 09:06:18 +0000 (05:06 -0400)]
bcachefs: Fix pathological behaviour with inode sharding by cpu ID
If the transaction restarts on a different CPU, it could end up needing
to read in a different btree node, which makes another transaction
restart more likely...
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Fri, 28 May 2021 03:16:25 +0000 (23:16 -0400)]
bcachefs: Fix journal write error path
Journal write errors were racing with the submission path - potentially
causing writes to other replicas to not get submitted.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Fri, 28 May 2021 01:16:50 +0000 (21:16 -0400)]
bcachefs: Reflink refcount fix
__bch2_trans_mark_reflink_p wasn't always correctly returning the number
of sectors processed - the new logic is a bit more straightforward
overall too.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Fri, 28 May 2021 00:20:20 +0000 (20:20 -0400)]
bcachefs: Add an option to control sharding new inode numbers
We're seeing a bug where inode creates end up spinning in
bch2_inode_create - disabling sharding will simplify what we're testing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Sat, 29 Oct 2022 06:47:33 +0000 (02:47 -0400)]
bcachefs: Don't use bch_write_op->cl for delivering completions
We already had op->end_io as an alternative mechanism to op->cl.parent
for delivering write completions; this switches all code paths to using
op->end_io.
Two reasons:
- op->end_io is more efficient, due to fewer atomic ops; this completes
the conversion that was originally only done for the direct IO path.
- We'll be restructuring the write path to use a different mechanism for
punting to process context; refactoring to not use op->cl will make
that easier.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 29 Oct 2022 03:57:01 +0000 (23:57 -0400)]
bcachefs: Kill bch_write_op.index_update_fn
This deletes bch_write_op.index_update_fn: indirect function calls have
gotten considerably more expensive post spectre/meltdown, and we only
have two different index_update_fns - this patch adds a flag to specify
which one to use (normal vs. data move path).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 1 Nov 2022 02:28:09 +0000 (22:28 -0400)]
bcachefs: Inline fastpath of bch2_disk_reservation_add()
The fastpath now doesn't even disable preemption - instead we use a (non
locked) cmpxchg.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
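The compare-and-exchange pattern mentioned in "Inline fastpath of bch2_disk_reservation_add()" can be sketched in userspace with C11 atomics. Note the assumptions: the commit describes a non-locked cmpxchg (on the kernel side), whereas this sketch uses an ordinary C11 atomic as a stand-in, and try_reserve() is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical fastpath: take `sectors` out of a cached counter without
 * holding a lock. Returns false if the counter is too small, in which case
 * a caller would fall back to a slower path.
 */
bool try_reserve(_Atomic uint64_t *available, uint64_t sectors)
{
	uint64_t old = atomic_load(available);

	do {
		if (old < sectors)
			return false;
		/* On failure, `old` is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(available, &old, old - sectors));

	return true;
}
```

The loop retries only when another thread changed the counter between the load and the exchange, so the uncontended case is a single read and a single compare-and-exchange.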
Kent Overstreet [Thu, 27 May 2021 23:15:44 +0000 (19:15 -0400)]
bcachefs: Don't use uuid in tracepoints
%pU for printing out pointers to uuids doesn't work in perf trace
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Wed, 26 May 2021 05:03:35 +0000 (01:03 -0400)]
bcachefs: Add a tracepoint for copygc waiting
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Tue, 25 May 2021 22:42:05 +0000 (18:42 -0400)]
bcachefs: Add a cond_resched call to the copygc main loop
We seem to have a bug where the copygc thread ends up spinning and
making the system unusable - this will at least prevent it from locking
up the machine, and it's a good thing to have anyways.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Sun, 23 May 2021 22:42:51 +0000 (18:42 -0400)]
bcachefs: Fix a null ptr deref
bch2_btree_iter_peek() won't always return a key - whoops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Sun, 23 May 2021 01:43:20 +0000 (21:43 -0400)]
bcachefs: Fix an issue with inconsistent btree writes after unclean shutdown
After unclean shutdown, btree writes may have completed on one device
and not others - and this inconsistency could lead us to writing new
bsets with a gap in our btree node in one of our replicas.
Fortunately, this is only an issue with bsets that are newer than the
most recent journal flush, and we already have a mechanism for detecting
and blacklisting those. We just need to make sure to start new btree
writes after the most recent _non_ blacklisted bset.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Sun, 23 May 2021 01:13:17 +0000 (21:13 -0400)]
bcachefs: Improve FS_IOC_GOINGDOWN ioctl
We weren't interpreting the flags argument at all.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Sat, 22 May 2021 21:37:25 +0000 (17:37 -0400)]
bcachefs: Add a workqueue for btree io completions
Also, clean up workqueue usage - we shouldn't be using system
workqueues, pretty much everything we do needs to be on our own
WQ_MEM_RECLAIM workqueues.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Brett Holman [Fri, 21 May 2021 22:45:38 +0000 (16:45 -0600)]
bcachefs: rewrote prefetch asm in gas syntax for clang compatibility
Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 22 May 2021 03:57:37 +0000 (23:57 -0400)]
bcachefs: Add a debug mode that always reads from every btree replica
There's a new module parameter, verify_all_btree_replicas, that enables
reading from every btree replica when reading in btree nodes and
comparing them against each other. We've been seeing some strange btree
corruption - this will hopefully aid in tracking it down and catching it
more often.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Fri, 21 May 2021 20:06:54 +0000 (16:06 -0400)]
bcachefs: Don't repair btree nodes until after interior journal replay is done
We need the btree to be in a consistent state before we can rewrite
btree nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Fri, 21 May 2021 00:47:27 +0000 (20:47 -0400)]
bcachefs: Fix an uninitialized var
This fixes a valgrind complaint.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Thu, 20 May 2021 19:49:23 +0000 (15:49 -0400)]
bcachefs: Fix for buffered writes getting -ENOSPC
Buffered writes may have to increase their disk reservation at btree
update time, due to compression and erasure coding being unpredictable:
O_DIRECT writes should be checking for -ENOSPC, but buffered writes have
already been accepted and should not.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Thu, 20 May 2021 04:09:47 +0000 (00:09 -0400)]
bcachefs: Fix inode backpointers in RENAME_OVERWRITE
When we delete the dirent an inode points to, we need to zero out the
backpointer fields - this was missed in the RENAME_OVERWRITE case.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Thu, 20 May 2021 01:21:49 +0000 (21:21 -0400)]
bcachefs: Make bch2_remap_range respect O_SYNC
Caught by xfstest generic/628
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Wed, 19 May 2021 03:17:03 +0000 (23:17 -0400)]
bcachefs: Split extents if necessary in bch2_trans_update()
Currently, we handle multiple overlapping extents in the same
transaction commit by doing fixups in bch2_trans_update() - this patch
extends that to split updates when necessary. The next patch that
changes the reflink code to not fragment extents when making them
indirect will require this.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Wed, 19 May 2021 03:53:43 +0000 (23:53 -0400)]
bcachefs: Ratelimiting for writeback IOs
Writeback throttling is a kernel config option and not always enabled.
When it's not enabled we need a fallback, to avoid unbounded memory
pinning and work item backlogs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Dan Robertson [Wed, 19 May 2021 00:36:20 +0000 (20:36 -0400)]
bcachefs: statfs reports incorrect avail blocks
The current implementation of bch_statfs does not scale the number of
available blocks provided in f_bavail by the reserve factor. This causes
an allocation of a file of this size to fail.
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
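To make the f_bavail scaling in the statfs fix concrete, here is a small arithmetic sketch. It assumes the reserve is 1/64 of the requested size (RESERVE_SHIFT of 6); that value and both function names are assumptions for this example and are not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define RESERVE_SHIFT 6	/* assumption: reserve is 1/64 of the request */

/* Space actually consumed when `r` units are requested, reserve included. */
uint64_t with_reserve(uint64_t r)
{
	uint64_t mask = (1ULL << RESERVE_SHIFT) - 1;
	uint64_t rounded = (r + mask) & ~mask;

	return r + (rounded >> RESERVE_SHIFT);
}

/* Largest request that still fits in `free_units` once reserve is added. */
uint64_t avail_after_reserve(uint64_t free_units)
{
	return (free_units << RESERVE_SHIFT) / ((1ULL << RESERVE_SHIFT) + 1);
}
```

Reporting avail_after_reserve(free) rather than free itself is what lets an allocation of exactly the reported size succeed.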
Kent Overstreet [Mon, 17 May 2021 20:43:30 +0000 (16:43 -0400)]
bcachefs: Fix for bch2_bkey_pack_pos() not initializing len/version fields
This bug led to push_whiteout() generating whiteouts that failed
bch2_bkey_invalid() due to nonzero length fields - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Mon, 17 May 2021 20:10:06 +0000 (16:10 -0400)]
bcachefs: Fix a memcpy call
Not supposed to pass a null ptr to memcpy (even if the size is 0).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
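The rule behind "Fix a memcpy call" (a null pointer must not be passed to memcpy even with a size of zero) suggests a guard like the one below. copy_bytes() is a hypothetical wrapper written for this example, not something from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper: skip the memcpy() call entirely when there is
 * nothing to copy, so a NULL source with n == 0 never reaches memcpy().
 */
void *copy_bytes(void *dst, const void *src, size_t n)
{
	if (n)
		memcpy(dst, src, n);
	return dst;
}
```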
Kent Overstreet [Mon, 17 May 2021 04:28:50 +0000 (00:28 -0400)]
bcachefs: Fix bch2_extent_can_insert() call
It was being skipped when hole punching, leading to problems when
splitting compressed extents.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Kent Overstreet [Mon, 17 May 2021 04:08:06 +0000 (00:08 -0400)]
bcachefs: Make sure to pass a disk reservation to bch2_extent_update()
It's needed when we split an existing compressed extent - we get a null
ptr deref without it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Brett Holman [Mon, 17 May 2021 03:53:55 +0000 (21:53 -0600)]
bcachefs: made changes to support clang, fixed a couple bugs
fs/bcachefs/bset.c edited prefetch macro to add clang support
fs/bcachefs/btree_iter.c bugfix: initialize iter->real_pos in bch2_btree_iter_init for later use
fs/bcachefs/io.c bugfix: eliminated undefined behavior (negative bitshift)
fs/bcachefs/buckets.c bugfix: invert sign to handle 64bit abs()
Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 17 May 2021 03:46:08 +0000 (23:46 -0400)]
bcachefs: Fix locking in __bch2_set_nr_journal_buckets()
We weren't holding mark_lock correctly - it's needed for the new_fs
path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Dan Robertson [Sat, 15 May 2021 00:02:44 +0000 (20:02 -0400)]
bcachefs: properly initialize used values
- Ensure the second key value in bch_hash_info is initialized to zero
if the info type is of type BCH_STR_HASH_SIPHASH.
- Initialize the possibly returned value in bch2_inode_create. Assuming
bch2_btree_iter_peek returns bkey_s_c_null, the uninitialized value
of ret could be returned to the user as an error pointer.
- Fix compiler warning in initialization of bkey_s_c_stripe
fs/bcachefs/buckets.c:1646:35: warning: suggest braces around initialization
of subobject [-Wmissing-braces]
struct bkey_s_c_stripe new_s = { NULL };
^~~~
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 15 May 2021 01:28:37 +0000 (21:28 -0400)]
bcachefs: Repair code for multiple types of data in same bucket
bch2_check_fix_ptrs() is awkward, we need to find a way to improve it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Dan Robertson [Wed, 5 May 2021 11:09:43 +0000 (07:09 -0400)]
bcachefs: Fix out of bounds read in fs usage ioctl
Fix a possible read out of bounds if bch2_ioctl_fs_usage is called when
replica_entries_bytes is set to a value that is smaller than the size
of bch_replicas_usage.
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Dan Robertson [Thu, 13 May 2021 00:54:37 +0000 (20:54 -0400)]
bcachefs: Fix null deref in bch2_ioctl_read_super
Do not attempt to cleanup the returned value of bch2_device_lookup if
the returned value was an error pointer. We currently check to see if
the returned value is null and run the cleanup otherwise. As a result,
we attempt to run the cleanup on an error pointer.
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
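The bug class in "Fix null deref in bch2_ioctl_read_super" (checking a returned pointer against NULL when failures are reported as error pointers) can be sketched in userspace. The helpers below imitate the kernel's error-pointer convention but are re-implemented here under lowercase names; lookup_dev(), put_dev() and read_super_sketch() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
void *err_ptr(long err) { return (void *)(intptr_t)err; }
long ptr_err(const void *p) { return (long)(intptr_t)p; }
int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }

struct dev { int refs; };
struct dev dev0;

/* Hypothetical lookup: only index 0 exists; failures are error pointers. */
struct dev *lookup_dev(int idx)
{
	if (idx != 0)
		return err_ptr(-ENODEV);
	dev0.refs++;
	return &dev0;
}

void put_dev(struct dev *d) { d->refs--; }

int read_super_sketch(int idx)
{
	struct dev *d = lookup_dev(idx);

	/* A NULL check alone would let the error pointer through to put_dev(). */
	if (is_err(d))
		return (int)ptr_err(d);

	/* ... use the device here ... */
	put_dev(d);
	return 0;
}
```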
Dan Robertson [Wed, 12 May 2021 18:07:57 +0000 (14:07 -0400)]
bcachefs: Fix possible null deref on mount
Ensure that the block device pointer in a superblock handle is not
null before dereferencing it in bch2_dev_to_fs. The block device pointer
may be null when mounting a new bcachefs filesystem given another mounted
bcachefs filesystem exists that has at least one device that is offline.
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Dan Robertson [Sun, 9 May 2021 22:52:23 +0000 (18:52 -0400)]
bcachefs: Fix error in parsing of mount options
When parsing the mount options duplicate the given options. This is
required as the options are parsed twice and strsep is used in parsing.
The options will be modified into a possibly invalid options set for the
second round of parsing if the options are not duplicated before
parsing.
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
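Why the options must be duplicated before a destructive parse can be shown with a short sketch. The tokenizer below overwrites separators in place in the same way strsep() does, but it is hand-written for portability; both function names and the comma separator are assumptions for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Destructive tokenizer: overwrites each ',' with '\0', as strsep() would. */
int count_opts_destructive(char *opts)
{
	int nr = 0;
	char *p = opts;

	while (p && *p) {
		char *comma = strchr(p, ',');

		if (comma)
			*comma++ = '\0';
		nr++;
		p = comma;
	}
	return nr;
}

/* Parse a private copy so the caller's string survives for a second pass. */
int count_opts_preserving(const char *opts)
{
	size_t len = strlen(opts) + 1;
	char *copy = malloc(len);
	int nr;

	if (!copy)
		return -1;
	memcpy(copy, opts, len);
	nr = count_opts_destructive(copy);
	free(copy);
	return nr;
}
```

After one destructive pass the original buffer holds only the first option, so a second parse of the same buffer would see a truncated option set; parsing a copy avoids that.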
Stijn Tintel [Thu, 13 May 2021 20:08:47 +0000 (23:08 +0300)]
bcachefs: avoid out-of-bounds in split_devs
Calling mount with an empty source string causes an out-of-bounds error
in split_devs. Check the length of the source string to avoid this.
Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
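A length check of the kind described in "avoid out-of-bounds in split_devs" might look like the following sketch. count_devs() is a hypothetical function, and the ':' separator between device names is an assumption of this example; the NULL check is included only for completeness.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical validation: reject a missing or empty source string before
 * any code indexes into it, then count ':'-separated device names.
 */
int count_devs(const char *source)
{
	int nr = 1;

	if (!source || !*source)
		return -EINVAL;

	for (const char *p = source; *p; p++)
		if (*p == ':')
			nr++;
	return nr;
}
```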
Kent Overstreet [Fri, 14 May 2021 20:56:26 +0000 (16:56 -0400)]
bcachefs: Make sure to use BTREE_ITER_PREFETCH in fsck
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 30 Apr 2021 01:44:05 +0000 (21:44 -0400)]
bcachefs: Fix bch2_btree_iter_peek_with_updates()
By not re-fetching the next update we were going into an infinite loop.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 4 May 2021 00:31:27 +0000 (20:31 -0400)]
bcachefs: Fix reflink trigger
The trigger for reflink pointers wasn't always incrementing/decrementing
the refcounts correctly - this patch fixes that logic.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 8 May 2021 00:43:43 +0000 (20:43 -0400)]
bcachefs: Fix some refcounting bugs
We really need debug mode assertions that ca->ref and ca->io_ref are
used correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Dan Robertson [Sat, 8 May 2021 02:29:02 +0000 (22:29 -0400)]
bcachefs: Fix oob write in __bch2_btree_node_write
Fix a possible out of bounds write in __bch2_btree_node_write when
the data buffer padding is cleared up to the block size. The out of
bounds write is possible if the data buffer's size is not a multiple
of the block size.
Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
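The out-of-bounds condition in "Fix oob write in __bch2_btree_node_write" comes from rounding the used length up to a block boundary that lies beyond the buffer. The sketch below shows one way to bound the padding; zero_pad_to_block() and its parameters are hypothetical and do not reproduce the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: zero the padding after `used` bytes up to the next
 * multiple of `block`, but never past `buf_size`. Returns the padded end.
 */
size_t zero_pad_to_block(unsigned char *buf, size_t buf_size,
			 size_t used, size_t block)
{
	size_t end;

	assert(block && used <= buf_size);

	end = (used + block - 1) / block * block;
	if (end > buf_size)
		end = buf_size;	/* buffer size is not a multiple of block */

	memset(buf + used, 0, end - used);
	return end;
}
```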
Kent Overstreet [Sat, 8 May 2021 03:32:26 +0000 (23:32 -0400)]
bcachefs: Fix usage of last_seq + encryption
jset->last_seq is in the region that's encrypted - on journal write
completion, we were using it and getting garbage. This patch shadows it
to fix.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 29 Apr 2021 19:37:47 +0000 (15:37 -0400)]
bcachefs: Clean up bch2_btree_and_journal_walk()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 29 Apr 2021 20:55:26 +0000 (16:55 -0400)]
bcachefs: Mark newly allocated btree nodes as accessed
This was a major oversight - this means under memory pressure we can end
up reading in a btree node, then having it evicted before we get to use
it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 29 Apr 2021 02:51:42 +0000 (22:51 -0400)]
bcachefs: Fix time handling
There were some overflows in the time conversion functions - fix this by
converting tv_sec and tv_nsec separately. Also, set sb->time_min and
sb->time_max.
Fixes xfstest generic/258.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
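Converting tv_sec and tv_nsec separately, as "Fix time handling" describes, can be sketched as follows. One way an overflow can arise is by first folding both fields into a single nanosecond count, since seconds multiplied by 10^9 exceeds a signed 64-bit range for large timestamps. The struct, the function names, and the units_per_sec parameter are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

struct fs_timespec {	/* hypothetical stand-in for a kernel timespec */
	int64_t sec;
	int64_t nsec;
};

/* Seconds and nanoseconds are scaled independently, then added. */
int64_t timespec_to_units(struct fs_timespec ts, int64_t units_per_sec)
{
	int64_t nsec_per_unit = NSEC_PER_SEC / units_per_sec;

	return ts.sec * units_per_sec + ts.nsec / nsec_per_unit;
}

struct fs_timespec units_to_timespec(int64_t units, int64_t units_per_sec)
{
	int64_t nsec_per_unit = NSEC_PER_SEC / units_per_sec;
	int64_t rem = units % units_per_sec;
	struct fs_timespec ts = { .sec = units / units_per_sec };

	/* C division truncates toward zero; keep nsec non-negative. */
	if (rem < 0) {
		rem += units_per_sec;
		ts.sec--;
	}
	ts.nsec = rem * nsec_per_unit;
	return ts;
}
```

With millisecond units (units_per_sec of 1000), a timestamp of five billion seconds converts without the intermediate value ever approaching the 64-bit limit.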
Kent Overstreet [Thu, 29 Apr 2021 04:21:54 +0000 (00:21 -0400)]
bcachefs: Add a tracepoint for when we block on journal reclaim
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 29 Apr 2021 02:12:07 +0000 (22:12 -0400)]
bcachefs: Make sure to initialize j->last_flushed
If the journal reclaim thread makes it to the timeout without ever
initializing j->last_flushed, we could end up sleeping for a very long
time.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>