Kent Overstreet [Mon, 5 Nov 2018 04:10:09 +0000 (23:10 -0500)]
bcachefs: Disk usage in compressed sectors, not uncompressed
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 5 Nov 2018 03:09:51 +0000 (22:09 -0500)]
bcachefs: Assorted fixes for running on very small devices
It's now possible to create and use a filesystem on a 512k device with
4k buckets (though at that size we still waste almost half to internal
reserves)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 5 Nov 2018 02:55:35 +0000 (21:55 -0500)]
bcachefs: Scale down number of writepoints when low on space
this means we don't have to reserve space for them when calculating
filesystem capacity
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 4 Nov 2018 02:00:50 +0000 (22:00 -0400)]
bcachefs: Fix an assertion when rebuilding replicas
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 4 Nov 2018 01:52:52 +0000 (21:52 -0400)]
bcachefs: Rename nofsck opt to fsck
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 4 Nov 2018 01:51:31 +0000 (21:51 -0400)]
bcachefs: Fix journal replay when replicas sb section missing
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 4 Nov 2018 00:19:04 +0000 (20:19 -0400)]
bcachefs: fix bounds checks in bch2_bio_map()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 4 Nov 2018 00:04:54 +0000 (20:04 -0400)]
bcachefs: Some fixes for building in userspace
userspace allocators don't align allocations as nicely as kernel
allocators, which meant that in some cases we weren't allocating big
enough bvec arrays - just make the calculations more rigorous and
explicit to fix it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 1 Nov 2018 20:02:02 +0000 (16:02 -0400)]
bcachefs: fix bch2_bkey_print_bfloat
was popping an assertion in the eytzinger code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 1 Nov 2018 19:28:45 +0000 (15:28 -0400)]
bcachefs: new avoid mechanism for io retries
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 1 Nov 2018 19:21:48 +0000 (15:21 -0400)]
bcachefs: more key marking refactoring
prep work for erasure coding
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 30 Oct 2018 18:32:21 +0000 (14:32 -0400)]
bcachefs: replicas: prep work for stripes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 30 Oct 2018 18:14:19 +0000 (14:14 -0400)]
bcachefs: kill struct bch_replicas_cpu_entry
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 21 Oct 2018 20:32:51 +0000 (16:32 -0400)]
bcachefs: add functionality for heaps to update backpointers
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 21 Oct 2018 14:56:11 +0000 (10:56 -0400)]
bcachefs: btree gc refactoring
prep work for erasure coding
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 30 Sep 2018 22:39:20 +0000 (18:39 -0400)]
bcachefs: BCH_EXTENT_ENTRY_TYPES()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 28 Sep 2018 01:08:39 +0000 (21:08 -0400)]
bcachefs: bch2_extent_ptr_decoded_append()
This new helper for the move path avoids creating a new CRC entry when
we already have one that matches the pointer being added.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 30 Sep 2018 22:28:23 +0000 (18:28 -0400)]
bcachefs: bch2_extent_drop_ptrs()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 28 Sep 2018 01:08:39 +0000 (21:08 -0400)]
bcachefs: extent_for_each_ptr_decode()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 2 Oct 2018 20:40:12 +0000 (16:40 -0400)]
bcachefs: kill bch_extent_crc_type
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 2 Oct 2018 15:03:39 +0000 (11:03 -0400)]
bcachefs: extent_ptr_decoded
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 12 Oct 2018 18:57:57 +0000 (14:57 -0400)]
bcachefs: fix missing include
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 12 Oct 2018 18:53:25 +0000 (14:53 -0400)]
bcachefs: fix a spurious gcc warning
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 6 Oct 2018 08:12:42 +0000 (04:12 -0400)]
bcachefs: Allocation code refactoring
bch2_alloc_sectors_start() was a nightmare to work with - it's got some
tricky stuff to do, since it wants to use the buckets the writepoint
already has, unless they're not in the target it wants to write to,
unless it can't allocate from any other devices in which case it will
use those buckets if it has to - et cetera.
This restructures the code to start with a new empty list of open
buckets we're going to use for the new allocation, pulling buckets from
the write point's list as we decide that we really are going to use
them - making the code somewhat more functional and drastically easier
to understand.
Also fixes a bug where we could end up waiting on c->freelist_wait
(because allocating from one device failed) but return success from
bch2_bucket_alloc(), because allocating from a different device
succeeded.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 6 Oct 2018 04:46:55 +0000 (00:46 -0400)]
bcachefs: Split out alloc_background.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 1 Oct 2018 04:33:42 +0000 (00:33 -0400)]
bcachefs: Fix failure to suspend
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 26 Sep 2018 03:27:57 +0000 (23:27 -0400)]
bcachefs: Fix suspend when moving data faster than ratelimit
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 21 Sep 2018 21:37:13 +0000 (17:37 -0400)]
bcachefs: fix bch2_acl_chmod()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 15 Sep 2018 21:57:22 +0000 (17:57 -0400)]
bcachefs: Fix a deadlock
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 6 Sep 2018 21:09:07 +0000 (17:09 -0400)]
bcachefs: fix a divide
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 28 Aug 2018 22:54:42 +0000 (18:54 -0400)]
bcachefs: make fsck spew less
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 21 Aug 2018 23:42:00 +0000 (19:42 -0400)]
bcachefs: Dirent repair code
There was a bug for a while in previous kernels where we weren't
computing dirent name lengths correctly and we weren't zeroing out
padding at the end of dirents (due to struct bch_dirent changing size by
adding __attribute__((aligned)), and not updating other code to use
offsetof).
This patch fixes dirents with junk at the end, by going off of the
dirent's hash.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
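The sizeof/offsetof pitfall described in the dirent repair commit above can be illustrated with a minimal sketch. The struct and helpers below are hypothetical stand-ins (toy_dirent is not the real struct bch_dirent, and the field layout is an assumption); they only show why an aligned attribute makes sizeof() the wrong way to find where a trailing name begins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an on-disk dirent; not the real struct bch_dirent. */
struct toy_dirent {
	uint64_t d_inum;   /* target inode number */
	uint8_t  d_type;   /* file type */
	uint8_t  d_name[]; /* name bytes, zero padded to the end of the value */
} __attribute__((packed, aligned(8)));

/* Wrong once the struct is aligned: sizeof() includes trailing padding. */
size_t toy_name_bytes_sizeof(size_t val_bytes)
{
	return val_bytes - sizeof(struct toy_dirent);
}

/* Right: the name starts at offsetof(d_name), whatever the alignment is. */
size_t toy_name_bytes_offsetof(size_t val_bytes)
{
	return val_bytes - offsetof(struct toy_dirent, d_name);
}
```

With this layout the header is 9 bytes but sizeof() reports 16, so a length computed from sizeof() silently skips the first 7 bytes of space available to the name.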
Kent Overstreet [Tue, 21 Aug 2018 21:38:41 +0000 (17:38 -0400)]
bcachefs: Fix a btree iter bug when iter pos == POS_MAX
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 21 Aug 2018 20:30:14 +0000 (16:30 -0400)]
bcachefs: Comparison function cleanups
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 21 Aug 2018 19:19:33 +0000 (15:19 -0400)]
bcachefs: Prioritize fragmentation in bucket allocator
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Aug 2018 23:12:05 +0000 (19:12 -0400)]
bcachefs: Pass around bset_tree less
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 8 Aug 2018 23:53:30 +0000 (19:53 -0400)]
bcachefs: kill extent_insert_hook
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 5 Aug 2018 21:48:00 +0000 (17:48 -0400)]
bcachefs: kill i_sectors_hook
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 11 Aug 2018 21:26:11 +0000 (17:26 -0400)]
bcachefs: convert fcollapse to bch2_extent_update()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 9 Aug 2018 01:11:43 +0000 (21:11 -0400)]
bcachefs: convert fpunch to bch2_extent_update()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 9 Aug 2018 01:09:31 +0000 (21:09 -0400)]
bcachefs: convert truncate to bch2_extent_update()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 8 Aug 2018 22:42:04 +0000 (18:42 -0400)]
bcachefs: convert bchfs_write_index_update() to bch2_extent_update()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 5 Aug 2018 21:46:41 +0000 (17:46 -0400)]
bcachefs: bch2_extent_trim_atomic()
Prep work for extents insert hook removal
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 9 Aug 2018 01:22:46 +0000 (21:22 -0400)]
bcachefs: mempoolify btree_trans
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 5 Aug 2018 19:21:52 +0000 (15:21 -0400)]
bcachefs: BTREE_INSERT_JOURNAL_RES_FULL is no longer possible
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 5 Aug 2018 19:28:29 +0000 (15:28 -0400)]
bcachefs: extent_squash() can no longer fail
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 5 Aug 2018 18:41:29 +0000 (14:41 -0400)]
bcachefs: make struct btree_iter a bit smaller
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 22 Jul 2016 03:05:06 +0000 (19:05 -0800)]
bcachefs: lift ordering restriction on 0 size extents
This lifts the restriction that 0 size extents must not overlap with
other extents, which means we can now sort extents and non extents the
same way, and will let us simplify a bunch of other stuff as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 2 Aug 2018 03:03:41 +0000 (23:03 -0400)]
bcachefs: extent unit tests
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 6 Aug 2018 02:23:44 +0000 (22:23 -0400)]
bcachefs: bkey_written()
also cleanups of btree node offsets
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 6 Aug 2018 02:34:03 +0000 (22:34 -0400)]
bcachefs: improved rw_aux_tree_bsearch()
shouldn't be any reason for an actual binary search here
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 3 Aug 2018 23:41:44 +0000 (19:41 -0400)]
bcachefs: Factor out btree_key_can_insert()
working on getting rid of all the reasons bch2_insert_fixup_extent() can
fail/stop partway, which is needed for other refactorings.
One of the reasons we could have to bail out is if we're splitting a
compressed extent we might need to add to our disk reservation - but we
can check that before actually starting the insert.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 24 Jul 2018 18:55:05 +0000 (14:55 -0400)]
bcachefs: BCH_SB_RESERVE_BYTES
Add an option, gc_reserve_bytes, to set the copygc reserve as a size
instead of a percent
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 1 Aug 2018 18:26:55 +0000 (14:26 -0400)]
bcachefs: Better calculation of copygc threshold
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 24 Jul 2018 20:42:49 +0000 (16:42 -0400)]
bcachefs: Change how replicated data is accounted
Due to compression, the different replicas of a replicated extent don't
necessarily have to take up the same amount of space - so replicated
data sector counts shouldn't be stored divided by the number of
replicas.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
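A small arithmetic sketch of the accounting change above, under assumed numbers: if one replica of an extent is stored compressed in 3 sectors and another uncompressed in 8, a count divided by the number of replicas cannot reproduce the real on-disk total. The function names are hypothetical and are not taken from bcachefs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the sectors each replica actually occupies on disk. */
uint64_t toy_replicated_sectors(const uint32_t *replica_sectors, size_t nr)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nr; i++)
		total += replica_sectors[i];
	return total;
}

/* The older style of accounting: store total / nr, multiply back later. */
uint64_t toy_divided_then_scaled(const uint32_t *replica_sectors, size_t nr)
{
	uint64_t per_replica = toy_replicated_sectors(replica_sectors, nr) / nr;

	return per_replica * nr; /* loses the remainder when sizes differ */
}
```

For replica sizes {8, 3} the direct sum is 11 sectors, while dividing by two and scaling back gives 10.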
Kent Overstreet [Tue, 24 Jul 2018 18:54:39 +0000 (14:54 -0400)]
bcachefs: Account for internal fragmentation better
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 24 Jul 2018 17:33:07 +0000 (13:33 -0400)]
bcachefs: kill s_alloc, use bch_data_type
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 24 Jul 2018 16:59:13 +0000 (12:59 -0400)]
bcachefs: bch2_mark_key() now takes bch_data_type
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 24 Jul 2018 20:42:27 +0000 (16:42 -0400)]
bcachefs: Fix an assertion in the btree node merge path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 24 Jul 2018 23:45:22 +0000 (19:45 -0400)]
bcachefs: Fix locking in allocator thread
gc lock must be held while invalidating buckets - fixes
"1f7a95698e bcachefs: Invalidate buckets when writing to alloc btree"
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 23 Jul 2018 13:13:07 +0000 (09:13 -0400)]
bcachefs: fix bch2_val_to_text()
was returning wrong value
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 23 Jul 2018 11:53:29 +0000 (07:53 -0400)]
bcachefs: minor fsync fix
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 23 Jul 2018 11:52:00 +0000 (07:52 -0400)]
bcachefs: Assorted journal refactoring
Also improve error reporting - only return an error from
bch2_journal_flush_seq() if we had an error writing that entry (i.e. not
if there was an error with a newer entry).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 23 Jul 2018 11:38:06 +0000 (07:38 -0400)]
bcachefs: fix last_seq_ondisk
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 23 Jul 2018 09:48:53 +0000 (05:48 -0400)]
bcachefs: fix mtime/ctime update on truncate
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 23 Jul 2018 09:48:35 +0000 (05:48 -0400)]
bcachefs: fix fsync after create
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 23 Jul 2018 09:28:40 +0000 (05:28 -0400)]
bcachefs: fix nbuckets usage on device resize
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 22 Jul 2018 14:43:01 +0000 (10:43 -0400)]
bcachefs: Invalidate buckets when writing to alloc btree
Prep work for persistent alloc information. Refactoring also lets us
make free_inc much smaller, which means a lot fewer buckets stranded on
freelists.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 22 Jul 2018 10:10:52 +0000 (06:10 -0400)]
bcachefs: kill bucket mark sector count saturation
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 22 Jul 2018 02:57:20 +0000 (22:57 -0400)]
bcachefs: don't call bch2_bucket_seq_cleanup from journal_buf_switch
journal_buf_switch is called from the foreground when getting a journal
reservation and thus is somewhat latency sensitive;
bch2_bucket_seq_cleanup has to run infrequently but is a bit expensive
when it does run.
Call it from the journal write path instead, and punt the journal write
to workqueue context.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 22 Jul 2018 17:04:00 +0000 (13:04 -0400)]
bcachefs: Fix an assertion
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 22 Jul 2018 17:15:51 +0000 (13:15 -0400)]
bcachefs: fix rename + fsync
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 21 Jul 2018 02:23:42 +0000 (22:23 -0400)]
bcachefs: Use ei_update_lock consistently
This is prep work for using deferred btree updates for inode updates -
the way inodes are done now we're relying on btree locking for ei_inode
and ei_update_lock could probably be removed, but it'll actually be
needed when we switch to deferred updates.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 17 Jul 2018 19:28:11 +0000 (15:28 -0400)]
bcachefs: bch2_trans_update() now takes struct btree_insert_entry
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 17 Jul 2018 18:12:42 +0000 (14:12 -0400)]
bcachefs: Fix mtime/ctime updates
Also make inode flags consistent with how the rest of the inode is
updated
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 17 Jul 2018 18:03:47 +0000 (14:03 -0400)]
bcachefs: Simplify bch2_write_inode_trans, fix lockdep splat
ei_update_lock isn't currently needed for write inode (but it will be
needed again when deferred btree updates are used for inode updates)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 21 Jul 2018 07:56:57 +0000 (03:56 -0400)]
bcachefs: add bch_verbose() statements for shutdown
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 17 Jul 2018 16:19:14 +0000 (12:19 -0400)]
bcachefs: Fix a use after free in the journal code
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 21 Jul 2018 02:08:17 +0000 (22:08 -0400)]
bcachefs: Fix device add
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 13 Jul 2018 03:30:45 +0000 (23:30 -0400)]
bcachefs: trace transaction restarts
exceptionally crappy "tracing", but it's a start at documenting the
places restarts can be triggered
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Thu, 12 Jul 2018 23:19:41 +0000 (19:19 -0400)]
bcachefs: Convert raw uses of bch2_btree_iter_link() to new transactions
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 15 Jul 2018 01:06:51 +0000 (21:06 -0400)]
bcachefs: Only check inode i_nlink during full fsck
Now that all filesystem operations that manipulate the filesystem
hierarchy and i_nlink are fully atomic, we can add a feature bit to
indicate i_nlink doesn't need to be checked.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Fri, 17 Mar 2017 06:18:50 +0000 (22:18 -0800)]
bcachefs: Initial commit
Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.
Website: https://bcachefs.org
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Wed, 26 Apr 2023 16:27:51 +0000 (12:27 -0400)]
MAINTAINERS: Add entry for bcachefs
bcachefs is a new copy-on-write filesystem; add a MAINTAINERS entry for
it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 10 Sep 2023 00:56:00 +0000 (20:56 -0400)]
objtool: Add bcachefs noreturns
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 12 Sep 2023 05:17:22 +0000 (01:17 -0400)]
lib/generic-radix-tree.c: Add peek_prev()
This patch adds genradix_peek_prev(), genradix_iter_rewind(), and
genradix_for_each_reverse(), for iterating backwards over a generic
radix tree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 13 Feb 2021 01:11:25 +0000 (20:11 -0500)]
lib/generic-radix-tree.c: Don't overflow in peek()
When we started spreading new inode numbers throughout most of the 64
bit inode space, that triggered some corner case bugs, in particular
some integer overflows related to the radix tree code. Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
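The commit above does not spell out the fix, so the following is only a generic sketch of the kind of guard that avoids such overflows when offsets approach the top of the integer range. All names and constants (toy_depth_size, toy_iter_advance, the 12 and 9 bit shifts) are assumptions for illustration and are not the genradix implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumed leaf size: 4096 bytes */
#define TOY_ARY_SHIFT   9 /* assumed fan-out: 512 children per interior node */

/* Bytes addressable by a tree of the given depth, saturating at SIZE_MAX. */
size_t toy_depth_size(unsigned depth)
{
	unsigned shift = TOY_PAGE_SHIFT + TOY_ARY_SHIFT * depth;

	/* Shifting by the type width or more is undefined, so clamp instead. */
	if (shift >= sizeof(size_t) * CHAR_BIT)
		return SIZE_MAX;
	return (size_t)1 << shift;
}

/* Advance an iterator position without wrapping past SIZE_MAX. */
size_t toy_iter_advance(size_t pos, size_t step)
{
	return pos > SIZE_MAX - step ? SIZE_MAX : pos + step;
}
```

Saturating rather than wrapping means a caller comparing a position against the tree size sees "past the end" instead of a small, apparently valid offset.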
Kent Overstreet [Wed, 26 Apr 2023 16:27:51 +0000 (12:27 -0400)]
MAINTAINERS: Add entry for generic-radix-tree
lib/generic-radix-tree.c is a simple radix tree that supports storing
arbitrary types. Add a maintainers entry for it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sun, 5 Mar 2023 03:45:27 +0000 (22:45 -0500)]
closures: Add a missing include
Fixes building in userspace.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 4 Mar 2023 07:39:39 +0000 (02:39 -0500)]
closures: closure_nr_remaining()
Factor out a new helper, which returns the number of events outstanding.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 9 Dec 2017 17:42:44 +0000 (12:42 -0500)]
closures: closure_wait_event()
Like wait_event() - except, because it uses closures and closure
waitlists it doesn't have the restriction on modifying task state inside
the condition check, like wait_event() does.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Acked-by: Coly Li <colyli@suse.de>
Kent Overstreet [Wed, 26 Apr 2023 16:27:51 +0000 (12:27 -0400)]
MAINTAINERS: Add entry for closures
closures, from bcache, are async widgets with a variety of uses.
bcachefs also uses them, so they're being moved to lib/; mark them as
maintained.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Sat, 18 Mar 2017 00:35:23 +0000 (16:35 -0800)]
bcache: move closures to lib/
Prep work for bcachefs - being a fork of bcache it also uses closures
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Brian Foster [Mon, 14 Aug 2023 13:04:50 +0000 (09:04 -0400)]
locking: export contention tracepoints for bcachefs six locks
The bcachefs implementation of six locks is intended to land in
generic locking code in the long term, but has been pulled into the
bcachefs subsystem for internal use for the time being. This code
lift breaks the bcachefs module build as six locks depend on a couple
of the generic locking tracepoints. Export these tracepoint symbols
for bcachefs.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 25 Apr 2023 18:45:28 +0000 (14:45 -0400)]
lib: Export errname
errname() returns the name of an errcode; this functionality is
otherwise only available for error pointers via %pE - bcachefs uses this
for better error messages.
Signed-off-by: Christopher James Halse Rogers <raof@ubuntu.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Mon, 25 Apr 2022 19:26:28 +0000 (15:26 -0400)]
lib/string_helpers: string_get_size() now returns characters wrote
printbuf now needs to know the number of characters that would have been
written if the buffer was too small, like snprintf(); this changes
string_get_size() to return the return value of snprintf().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
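The snprintf() behaviour that the string_get_size() commit above relies on can be shown with a short sketch. toy_format_size is a hypothetical helper, not the kernel's string_get_size(); only the standard C snprintf() semantics are assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a byte count and pass snprintf()'s return
 * value straight through to the caller.
 */
int toy_format_size(unsigned long long bytes, char *buf, size_t len)
{
	/*
	 * snprintf() returns the length the full string would have had,
	 * even when the output was truncated to fit in len bytes.
	 */
	return snprintf(buf, len, "%llu B", bytes);
}
```

A caller that gets back a value greater than or equal to the buffer length knows the output was truncated and how large a buffer it would need, which is the information a growable buffer needs in order to resize and retry.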
Christopher James Halse Rogers [Mon, 27 Jun 2022 00:45:12 +0000 (10:45 +1000)]
stacktrace: Export stack_trace_save_tsk
The bcachefs module wants it, and there doesn't seem to be any
reason it shouldn't be exported like the other functions.
Signed-off-by: Christopher James Halse Rogers <raof@ubuntu.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Kent Overstreet [Tue, 10 Jul 2018 03:27:33 +0000 (23:27 -0400)]
fs: factor out d_mark_tmpfile()
New helper for bcachefs - bcachefs doesn't want the
inode_dec_link_count() call that d_tmpfile does, it handles i_nlink on
its own atomically with other btree updates
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Kent Overstreet [Wed, 16 Oct 2019 19:03:50 +0000 (15:03 -0400)]
sched: Add task_struct->faults_disabled_mapping
There has been a long standing page cache coherence bug with direct IO.
This provides part of a mechanism to fix it, currently just used by
bcachefs but potentially worth promoting to the VFS.
Direct IO evicts the range of the pagecache being read or written to.
For reads, we need dirty pages to be written to disk, so that the read
doesn't return stale data. For writes, we need to evict that range of
the pagecache so that it's not stale after the write completes.
However, without a locking mechanism to prevent those pages from being
re-added to the pagecache - by a buffered read or page fault - page
cache inconsistency is still possible.
This isn't necessarily just an issue for userspace when they're playing
games; filesystems may hang arbitrary state off the pagecache, and so
page cache inconsistency may cause real filesystem bugs, depending on
the filesystem. This is less of an issue for iomap based filesystems,
but e.g. buffer heads cache disk block mappings (!) and attach them
to the pagecache, and bcachefs attaches disk reservations to pagecache
pages.
This issue has been hard to fix, because
- we need to add a lock (henceforth called pagecache_add_lock), which
would be held for the duration of the direct IO
- page faults add pages to the page cache, thus need to take the same
lock
- dio -> gup -> page fault thus can deadlock
And we cannot enforce a lock ordering with this lock, since userspace
will be controlling the lock ordering (via the fd and buffer arguments
to direct IOs), so we need a different method of deadlock avoidance.
We need to tell the page fault handler that we're already holding a
pagecache_add_lock, and since plumbing it through the entire gup() path
would be highly impractical this adds a field to task_struct.
Then the full method is:
- in the dio path, when we first take the pagecache_add_lock, note the
mapping in the current task_struct
- in the page fault handler, if faults_disabled_mapping is set, we
check if it's the same mapping as the one we're taking a page fault
for, and if so return an error.
Then we check lock ordering: if there's a lock ordering violation and
trylock fails, we'll have to cycle the locks and return an error that
tells the DIO path to retry: faults_disabled_mapping is also used for
signalling "locks were dropped, please retry".
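The first half of the method above can be modelled in userspace as a rough sketch. Everything here is a toy: toy_mapping, toy_handle_fault and toy_dio are hypothetical names, a thread-local variable stands in for the task_struct field, and the lock ordering and retry step is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for an address_space; only its identity matters here. */
struct toy_mapping { int id; };

/* Stands in for task_struct->faults_disabled_mapping (one per thread). */
static _Thread_local struct toy_mapping *faults_disabled_mapping;

/* Fault handler: refuse to fault in pages of the mapping under direct IO. */
int toy_handle_fault(struct toy_mapping *faulting)
{
	if (faults_disabled_mapping && faults_disabled_mapping == faulting)
		return -EFAULT; /* would deadlock on pagecache_add_lock */
	return 0; /* different mapping: the real code checks lock order here */
}

/* Direct IO on 'file' with a user buffer that is backed by 'buf_backing'. */
int toy_dio(struct toy_mapping *file, struct toy_mapping *buf_backing)
{
	int ret;

	faults_disabled_mapping = file;      /* noted after taking the lock */
	ret = toy_handle_fault(buf_backing); /* stands in for gup() faulting */
	faults_disabled_mapping = NULL;      /* cleared when the lock is dropped */
	return ret;
}
```

In this model a direct IO whose buffer is a mapping of the same file gets an error back instead of recursing into its own lock, while a buffer backed by a different file proceeds.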
Also relevant to this patch: mapping->invalidate_lock.
mapping->invalidate_lock provides most of the required semantics - it's
used by truncate/fallocate to block pages being added to the pagecache.
However, since it's a rwsem, direct IOs would need to take the write
side in order to block page cache adds, and would then be exclusive with
each other - we'll need a new type of lock to pair with this approach.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Jan Kara <jack@suse.cz>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Andreas Grünbacher <andreas.gruenbacher@gmail.com>
Linus Torvalds [Sun, 10 Sep 2023 23:28:41 +0000 (16:28 -0700)]
Linux 6.6-rc1