Kent Overstreet [Thu, 19 Nov 2020 16:53:38 +0000 (11:53 -0500)]
bcachefs: More debug code improvements
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 18 Nov 2020 19:09:33 +0000 (14:09 -0500)]
bcachefs: Add a kmem_cache for btree_key_cache objects
We allocate a lot of these, and we're seeing sporading OOMs - this will
help with tracking those down.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 18 Nov 2020 18:21:59 +0000 (13:21 -0500)]
bcachefs: Be more precise with journal error reporting
We were incorrectly detecting a journal deadlock - the journal filling
up - when only the journal pin fifo had filled up; if the journal pin
fifo is full that just means we need to wait on reclaim.
This plumbs through better error reporting so we can better discriminate
in the journal_res_get path what's going on.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 20 Nov 2020 01:13:30 +0000 (20:13 -0500)]
bcachefs: Add btree cache stats to sysfs
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 16 Nov 2020 19:23:06 +0000 (14:23 -0500)]
bcachefs: Add an ioctl for resizing journal on a device
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 16 Nov 2020 19:16:42 +0000 (14:16 -0500)]
bcachefs: Add more debug checks
tracking down a bug where we see a btree node pointer in the wrong node
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 16 Nov 2020 23:21:55 +0000 (18:21 -0500)]
bcachefs: Dump journal state when the journal deadlocks
Currently tracking down one of these bugs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 16 Nov 2020 23:20:50 +0000 (18:20 -0500)]
bcachefs: Dont' use percpu btree_iter buf in userspace
bcachefs-tools doesn't have a real percpu (per thread) implementation
yet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 16 Nov 2020 01:52:55 +0000 (20:52 -0500)]
bcachefs: Set preallocated transaction mem to avoid restarts
this will reduce transaction restarts, from observation of tracepoints.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 16 Nov 2020 18:06:28 +0000 (13:06 -0500)]
bcachefs: Convert tracepoints to use %ps, not %pf
Symbol decoding was changed from %pf to %ps
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 16 Nov 2020 17:22:30 +0000 (12:22 -0500)]
bcachefs: Fix journal entry repair code
When we detect bad keys in the journal that have to be dropped, the flow
control was wrong - we ended up not checking the next key in that entry.
Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 12 Nov 2020 22:19:47 +0000 (17:19 -0500)]
bcachefs: Add a shrinker for the btree key cache
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 15 Nov 2020 21:30:22 +0000 (16:30 -0500)]
bcachefs: Take a SRCU lock in btree transactions
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 15 Nov 2020 21:31:58 +0000 (16:31 -0500)]
bcachefs: Check for errors from register_shrinker()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 14 Nov 2020 21:04:30 +0000 (16:04 -0500)]
bcachefs: Assorted journal refactoring
Improved the way we track various state by adding j->err_seq, which
records the first journal sequence number that encountered an error
being written, and j->last_empty_seq, which records the most recent
journal entry that was completely empty.
Also, use the low bits of the journal sequence number to index the
corresponding journal_buf.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 14 Nov 2020 18:12:50 +0000 (13:12 -0500)]
bcachefs: Delete dead journalling code
Usage of the journal has gotten somewhat simpler over time - neat.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 13 Nov 2020 21:19:24 +0000 (16:19 -0500)]
bcachefs: Improve journal error messages
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 13 Nov 2020 20:03:34 +0000 (15:03 -0500)]
bcachefs: Be more careful in bch2_bkey_to_text()
This is used to print keys that failed bch2_bkey_invalid(), so be more
careful with k->type.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 13 Nov 2020 21:51:02 +0000 (16:51 -0500)]
bcachefs: Inode delete doesn't need to flush key cache anymore
Inode create checks to make sure the slot doesn't exist in the btree key
cache.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 13 Nov 2020 23:30:53 +0000 (18:30 -0500)]
bcachefs: Fix a btree transaction iter overflow
extent_replay_key dates from before putting iterators was required -
fixed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 13 Nov 2020 19:49:57 +0000 (14:49 -0500)]
bcachefs: Fix a 64 bit divide
this fixes builds on 32 bit.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 13 Nov 2020 19:39:43 +0000 (14:39 -0500)]
bcachefs: Improve journal entry validate code
Previously, the journal entry read code was changed so that if we got a
journal entry that failed validation, we'd try to use it, preferring to
use a good version from another device if available.
But this left a bug where if an earlier validation check (say, checksum)
failed, the later checks (for last_seq) wouldn't run and we'd end up
using a journal entry with a garbage last_seq field. This fixes that so
that the later validation checks run and if necessary change those
fields to something sensible.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 11 Nov 2020 17:33:12 +0000 (12:33 -0500)]
bcachefs: Deadlock prevention for ei_pagecache_lock
In the dio write path, when get_user_pages() invokes the fault handler
we have a recursive locking situation - we have to handle the lock
ordering ourselves or we have a deadlock: this patch addresses that by
checking for locking ordering violations and doing the unlock/relock
dance if necessary.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 11 Nov 2020 17:42:54 +0000 (12:42 -0500)]
bcachefs: Hack around bch2_varint_decode invalid reads
bch2_varint_decode can do reads up to 7 bytes past the end ptr, for the
sake of performance - these extra bytes are always masked off.
This won't be a problem in practice if we make sure to burn 8 bytes in
any buffer that has bkeys in it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 11 Nov 2020 23:59:41 +0000 (18:59 -0500)]
bcachefs: Fix missing memalloc_nofs_restore()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 11 Nov 2020 22:47:39 +0000 (17:47 -0500)]
bcachefs: Fix btree key cache shutdown
On emergency shutdown, we might still have dirty keys in the btree key
cache that need to be cleaned up properly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 9 Nov 2020 18:01:52 +0000 (13:01 -0500)]
bcachefs: Add accounting for dirty btree nodes/keys
This lets us improve journal reclaim, so that it now tries to make sure
no more than 3/4s of the btree node cache and btree key cache are dirty
- ensuring the shrinkers can free memory.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 7 Nov 2020 21:55:57 +0000 (16:55 -0500)]
bcachefs: Fix btree iterator leak
this fixes an occasonial btree transaction iterators overflow.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 7 Nov 2020 21:16:52 +0000 (16:16 -0500)]
bcachefs: Inline make_bfloat() into __build_ro_aux_tree()
This is a fast path - also, lift out the checks/init for min/max key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 7 Nov 2020 18:03:24 +0000 (13:03 -0500)]
bcachefs: use a radix tree for inum bitmap in fsck
The change to use the cpu nr for the high bits of new inode numbers
means that inode numbers are very space - we see -ENOMEM during fsck
without this.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 6 Nov 2020 04:39:33 +0000 (23:39 -0500)]
bcachefs: New varints
Previous varint implementation used by the inode code was not nearly as
fast as it could have been; partly because it was attempting to encode
integers up to 96 bits (for timestamps) but this meant that encoding and
decoding the length required a table lookup.
Instead, we'll just encode timestamps greater than 64 bits as two
separate varints; this will make decoding/encoding of inodes
significantly faster overall.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 7 Nov 2020 17:43:48 +0000 (12:43 -0500)]
bcachefs: Fix build warning when CONFIG_BCACHEFS_DEBUG=n
this function is only used by debug code, but we'd like to always build
it so we know that it does build.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 7 Nov 2020 17:31:20 +0000 (12:31 -0500)]
bcachefs: Drop typechecking from bkey_cmp_packed()
This only did anything in two places, and those can just be replaced
wiht bkey_cmp_left_packed()).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 6 Nov 2020 06:34:41 +0000 (01:34 -0500)]
bcachefs: More inlinining in the btree key cache code
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 6 Nov 2020 01:49:08 +0000 (20:49 -0500)]
bcachefs: Fix spurious transaction restarts
The checks for lock ordering violations weren't quite right.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 6 Nov 2020 01:02:01 +0000 (20:02 -0500)]
bcachefs: Add a single slot percpu buf for btree iters
Allocating our array of btree iters is a big enough allocation that it
hits the buddy allocator, and we're seeing lots of lock contention.
Sticking a single element buffer in front of it should help.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Matthew Wilcox (Oracle) [Thu, 5 Nov 2020 15:58:38 +0000 (15:58 +0000)]
bcachefs: Use attach_page_private and detach_page_private
These recently added helpers simplify the code.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Matthew Wilcox (Oracle) [Thu, 5 Nov 2020 15:58:37 +0000 (15:58 +0000)]
bcachefs: Remove page_state_init_for_read
This is dead code; delete the function.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 5 Nov 2020 17:16:05 +0000 (12:16 -0500)]
bcachefs: Build fixes for 32bit x86
PAGE_SIZE and size_t are not unsigned longs on 32 bit, annoying...
also switch to atomic64_cmpxchg instead of cmpxchg() for
journal_seq_copy, as atomic64_cmpxchg has a fallback that uses spinlocks
for when it's not supported.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 3 Nov 2020 04:51:33 +0000 (23:51 -0500)]
bcachefs: Improved inode create optimization
This shards new inodes into different btree nodes by using the processor
ID for the high bits of the new inode number. Much faster than the
previous inode create optimization - this also helps with sharding in
the other btrees that index by inode number.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 3 Nov 2020 00:49:23 +0000 (19:49 -0500)]
bcachefs: Report inode counts via statfs
Took awhile to figure out exactly what statfs wanted...
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 3 Nov 2020 00:15:18 +0000 (19:15 -0500)]
bcachefs: add const annotations to bset.c
perhaps a bit silly, but some debug assertions we want to add need const
propagated a bit more.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 2 Nov 2020 23:54:33 +0000 (18:54 -0500)]
bcachefs: Don't embed btree iters in btree_trans
These haven't been in used since reallocing iterators has been disabled,
and saves us a lot of stack if we get rid of it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 2 Nov 2020 23:36:08 +0000 (18:36 -0500)]
bcachefs: Split out debug_check_btree_accounting
This check is very expensive
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 2 Nov 2020 23:20:44 +0000 (18:20 -0500)]
bcachefs: Drop sysfs interface to debug parameters
It's not used much anymore, the module paramter interface is better.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 2 Nov 2020 22:51:38 +0000 (17:51 -0500)]
bcachefs: Minor journal reclaim improvement
With the btree key cache code, journal reclaim now has a lot more work
to do. It could be the case that after journal reclaim has finished one
iteration there's already more work to do, so put it in a loop to check
for that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 27 Oct 2020 22:56:21 +0000 (18:56 -0400)]
bcachefs: Inode create optimization
On workloads that do a lot of multithreaded creates all at once, lock
contention on the inodes btree turns out to still be an issue.
This patch adds a small buffer of inode numbers that are known to be
free, so that we can avoid touching the btree on every create. Also,
this changes inode creates to update via the btree key cache for the
initial create.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 30 Oct 2020 21:29:38 +0000 (17:29 -0400)]
bcachefs: Improve check for when bios are physically contiguous
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 Oct 2020 18:18:18 +0000 (14:18 -0400)]
bcachefs: Fix spurious transaction restarts
The check for whether locking a btree node would deadlock was wrong - we
have to check that interior nodes are locked before descendents, but
this check was wrong when consider cached vs. non cached iterators.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 28 Oct 2020 18:17:46 +0000 (14:17 -0400)]
bcachefs: Improve tracing for transaction restarts
We have a bug where we can get stuck with a process spinning in
transaction restarts - need more information.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 27 Oct 2020 18:10:52 +0000 (14:10 -0400)]
bcachefs: Fix stack corruption
A bkey_on_stack_realloc() call was in the wrong place, and broken for
indirect extents
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 22 Sep 2019 23:10:21 +0000 (19:10 -0400)]
bcachefs: Use cached iterators for inode updates
This switches inode updates to use cached btree iterators - which should
be a nice performance boost, since lock contention on the inodes btree
can be a bottleneck on multithreaded workloads.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 Oct 2020 21:03:28 +0000 (17:03 -0400)]
bcachefs: fiemap fixes
- fiemap didn't know about inline extents, fixed
- advancing to the next extent after we'd chased a pointer to the
reflink btree was wrong, fixed
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 Oct 2020 18:45:20 +0000 (14:45 -0400)]
bcachefs: Fix btree updates when mixing cached and non cached iterators
There was a bug where bch2_trans_update() would incorrectly delete a
pending update where the new update did not actually overwrite the
existing update, because we were incorrectly using BTREE_ITER_TYPE when
sorting pending btree updates.
This affects the pending patch to use cached iterators for inode
updates.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 26 Oct 2020 18:54:55 +0000 (14:54 -0400)]
bcachefs: Add mode to bch2_inode_to_text
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 25 Oct 2020 05:08:28 +0000 (01:08 -0400)]
bcachefs: Always write a journal entry when stopping journal
This is to fix a (harmless) bug where the read clock hand in the
superblock doesn't match the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 25 Oct 2020 01:20:16 +0000 (21:20 -0400)]
bcachefs: Drop alloc keys from journal when -o reconstruct_alloc
This fixes a bug where we'd pop an assertion due to replaying a key for
an interior btree node when that node no longer exists.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 24 Oct 2020 23:51:34 +0000 (19:51 -0400)]
bcachefs: Indirect inline data extents
When inline data extents were added, reflink was forgotten about - we
need indirect inline data extents for reflink + inline data to work
correctly.
This patch adds them, and a new feature bit that's flipped when they're
used.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 25 Oct 2020 00:56:47 +0000 (20:56 -0400)]
bcachefs: Fix rare use after free in read path
If the bkey_on_stack_reassemble() call in __bch2_read_indirect_extent()
reallocates the buffer, k in bch2_read - which we pointed at the
bkey_on_stack buffer - will now point to a stale buffer. Whoops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 24 Oct 2020 20:37:17 +0000 (16:37 -0400)]
bcachefs: Improve some error messages
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 24 Oct 2020 01:07:17 +0000 (21:07 -0400)]
bcachefs: Fix for passing target= opts as mount opts
Some options can't be parsed until the filesystem initialized;
previously, passing these options to mount or remount would cause mount
to fail.
This changes the mount path so that we parse the options passed in
twice, and just ignore any options that can't be parsed the first time.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 23 Oct 2020 22:40:30 +0000 (18:40 -0400)]
bcachefs: Fix bch2_mark_stripe()
There's no reason not to always recalculate these fields
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 23 Jul 2020 03:11:48 +0000 (23:11 -0400)]
bcachefs: Don't drop replicas when copygcing ec data
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 9 Jul 2020 22:31:51 +0000 (18:31 -0400)]
bcachefs: Account for stripe parity sectors separately
Instead of trying to charge EC parity to the data within the stripe
(which is subject to rounding errors), let's charge it to the stripe
itself. It should also make -ENOSPC issues easier to deal with if we
charge for parity blocks up front, and means we can also make more fine
grained accounting available to the user.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 20 Oct 2020 02:36:24 +0000 (22:36 -0400)]
bcachefs: Fix for bad stripe pointers
The allocator usually doesn't increment bucket gens right away on
buckets that it's about to hand out (for reasons that need to be
documented), instead deferring that to whatever extent update first
references that bucket.
But stripe pointers reference buckets without changing bucket sector
counts, meaning we could end up with a pointer in a stripe with a gen
newer than the bucket it points to.
Fix this by adding a transactional trigger for KEY_TYPE_stripe that just
writes out the keys in the alloc btree for the buckets it points to.
Also - consolidate the code that checks pointer validity.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 17 Oct 2020 20:44:27 +0000 (16:44 -0400)]
bcachefs: Start/stop io clock hands in read/write paths
This fixes a bug where the clock hands in the journal and superblock
didn't match, because we were still incrementing the read clock hand
while read-only.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 17 Oct 2020 01:36:26 +0000 (21:36 -0400)]
bcachefs: Improvements to writing alloc info
Now that we've got transactional alloc info updates (and have for
awhile), we don't need to write it out on shutdown, and we don't need to
write it out on startup except when GC found errors - this is a big
improvement to mount/unmount performance.
This patch also fixes a few bugs where we weren't writing out alloc
info (on new filesystems, and new devices) and should have been.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 16 Dec 2020 18:35:16 +0000 (13:35 -0500)]
bcachefs: Fix assertion popping in transaction commit path
We can't be holding read locks on btree nodes when we go to take write
locks: this would deadlock if another thread is holding an intent lock
on the node we have a read lock on, and it tries to commit and upgrade
to a write lock.
But instead of triggering an assertion, if this happens we can just
upgrade the read lock to an intent lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 17 Oct 2020 01:32:02 +0000 (21:32 -0400)]
bcachefs: Perf improvements for bch_alloc_read()
On large filesystems reading in the alloc info takes a significant
amount of time. But we don't need to be calling into the fully general
bch2_mark_key() path, just open code what we need in
bch2_alloc_read_fn().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 16 Oct 2020 02:50:48 +0000 (22:50 -0400)]
bcachefs: Fix copygc dying on startup
The copygc threads errors out and makes the filesystem go RO if it ever
tries to run and discovers it has no reserve allocated - which is a
problem if it races with the allocator thread and its reserve hasn't
been filled yet.
The allocator thread doesn't start filling the copygc reserve until
after BCH_FS_STARTED has been set, so make sure to wake up the allocator
threads after setting that and before starting copygc.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 16 Oct 2020 02:23:02 +0000 (22:23 -0400)]
bcachefs: Fix copygc of compressed data
The check for when we need to get a disk reservation was wrong.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 16 Oct 2020 01:48:58 +0000 (21:48 -0400)]
bcachefs: Fix another lockdep splat
vfree() can allocate memory, so we need to call memalloc_nofs_save().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 15 Oct 2020 19:58:36 +0000 (15:58 -0400)]
bcachefs: Fix errors early in the fs init process
At some point bch2_fs_alloc() was changed to always call bch2_fs_free()
in the error path, which means we need c->cl to always be initialized.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 10 Jul 2020 23:49:34 +0000 (19:49 -0400)]
bcachefs: Copy ptr->cached when migrating data
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 13 Oct 2020 07:58:50 +0000 (03:58 -0400)]
bcachefs: Fix gc of stale ptr gens
Awhile back, gcing of stale pointers was split out from full
mark-and-sweep gc - but, the bit to actually drop those stale pointers
wasn't implemnted. Whoops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 13 Oct 2020 04:06:36 +0000 (00:06 -0400)]
bcachefs: Fix off-by-one error in ptr gen check
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 11 Oct 2020 20:33:49 +0000 (16:33 -0400)]
bcachefs: Fix a lockdep splat
We can't allocate memory with GFP_FS while holding the btree cache lock,
and vfree() can allocate memory.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 9 Oct 2020 04:09:20 +0000 (00:09 -0400)]
bcachefs: Fix __bch2_truncate_page()
__bch2_truncate_page() will mark some of the blocks in a page as
unallocated. But, if the page is mmapped (and writable), every block in
the page needs to be marked dirty, else those blocks won't be written by
__bch2_writepage().
The solution is to change those userspace mappings to RO, so that we
force bch2_page_mkwrite() to be called again.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 7 Oct 2020 02:18:21 +0000 (22:18 -0400)]
bcachefs: Fix journal_seq_copy()
We also need to update the journal's bloom filter of inode numbers that
each journal write has upudates for - in case the inode gets evicted
before it gets fsynced.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 8 Sep 2020 22:30:32 +0000 (18:30 -0400)]
bcachefs: Fix unmount path
There was a long standing race in the mount/unmount code - the VFS
intends for mount/unmount synchronizatino to be handled by the list of
superblocks, but we were still holding devices open after tearing down
our superblock in the unmount path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 7 Sep 2020 02:58:28 +0000 (22:58 -0400)]
bcachefs: Don't fail mount if device has been removed
Also - make sure to show the devices we actually have open in /proc
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 24 Aug 2020 19:58:26 +0000 (15:58 -0400)]
bcachefs: Improvements to the journal read error paths
- Print out more information in error messages
- On checksum error, keep the journal entry but mark it bad so that we
can prefer entries from other devices that don't have bad checksums
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 24 Aug 2020 19:16:32 +0000 (15:16 -0400)]
bcachefs: Make sure to go rw if lazy in fsck
The paths where we delete or truncate inodes don't pass commit flags for
BTREE_INSERT_LAZY_RW, so just go rw if necessary in the fsck code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 24 Aug 2020 18:57:48 +0000 (14:57 -0400)]
bcachefs: Some project id fixes
Inode options that are accessible via the xattr interface are stored
with a +1 bias, so that a value of 0 means unset. We weren't handling
this consistently.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 16 Aug 2020 02:41:35 +0000 (22:41 -0400)]
bcachefs: Don't report inodes to statfs
We don't have a limit on the number of inodes in a filesystem, so this
is apparently the right way to report that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 12 Aug 2020 19:08:17 +0000 (15:08 -0400)]
bcachefs: Add a cond_resched() to bch2_alloc_write()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 6 Aug 2020 19:22:24 +0000 (15:22 -0400)]
bcachefs: Fix a couple null ptr derefs when no disk groups exist
Normally successfully parsing a target means disk groups should exist,
but we don't want a BUG() or null ptr deref if we end up with an invalid
target.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 12 Aug 2020 19:00:08 +0000 (15:00 -0400)]
bcachefs: Fix disk groups not being updated when set via sysfs
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 12 Aug 2020 17:49:09 +0000 (13:49 -0400)]
bcachefs: Change copygc to consider bucket fragmentation
When devices have different sized buckets this is more correct.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 12 Aug 2020 17:48:02 +0000 (13:48 -0400)]
bcachefs: Don't block on allocations when only writing to specific device
Since the copygc thread is now global and not per device, we're not
freeing up space on any one device in bounded time - and indeed we never
really were, since rebalance wasn't moving data around between devices
with that objective.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 5 Aug 2020 03:10:08 +0000 (23:10 -0400)]
bcachefs: Fix a bug with the journal_seq_blacklist mechanism
Previously, we would start doing btree updates before writing the first
journal entry; if this was after an unclean shutdown, this could cause
those btree updates to not be blacklisted.
Also, move some code to headers for userspace debug tools.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 5 Aug 2020 03:12:49 +0000 (23:12 -0400)]
bcachefs: Fix bch2_new_stripes_to_text()
painful looking typo, fortunately difficult to hit.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 3 Aug 2020 17:58:36 +0000 (13:58 -0400)]
bcachefs: Don't disallow btree writes to RO devices
There's an inherent race with setting devices RO when they have dirty
btree nodes on them. We already check if a btree node is on an RO device
before we dirty it, so this patch just allows those writes so that we
don't have errors forcing the entire filesystem read only when trying to
remove a device.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 3 Aug 2020 17:37:11 +0000 (13:37 -0400)]
bcachefs: Fix maximum btree node size
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 25 Jul 2020 21:06:11 +0000 (17:06 -0400)]
bcachefs: Convert various code to printbuf
printbufs know how big the buffer is that was allocated, so we can get
rid of the random PAGE_SIZEs all over the place.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 25 Jul 2020 19:07:37 +0000 (15:07 -0400)]
bcachefs: Remove some uses of PAGE_SIZE in the btree code
For portability to userspace, we should try to avoid working in kernel
pages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 25 Jul 2020 19:37:14 +0000 (15:37 -0400)]
bcachefs: Ensure we wake up threads locking node when reusing it
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 25 Jul 2020 18:19:37 +0000 (14:19 -0400)]
bcachefs: Fix bch2_btree_node_insert_fits()
It should be checking for the recently added flag
btree_node_needs_rewrite.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 23 Jul 2020 15:31:01 +0000 (11:31 -0400)]
bcachefs: Ensure we only allocate one EC bucket per writepoint
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 23 Jul 2020 02:40:32 +0000 (22:40 -0400)]
bcachefs: Fix a race with BCH_WRITE_SKIP_CLOSURE_PUT
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>