linux.git
18 months agobcachefs: new avoid mechanism for io retries
Kent Overstreet [Thu, 1 Nov 2018 19:28:45 +0000 (15:28 -0400)]
bcachefs: new avoid mechanism for io retries

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: more key marking refactoring
Kent Overstreet [Thu, 1 Nov 2018 19:21:48 +0000 (15:21 -0400)]
bcachefs: more key marking refactoring

prep work for erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: replicas: prep work for stripes
Kent Overstreet [Tue, 30 Oct 2018 18:32:21 +0000 (14:32 -0400)]
bcachefs: replicas: prep work for stripes

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: kill struct bch_replicas_cpu_entry
Kent Overstreet [Tue, 30 Oct 2018 18:14:19 +0000 (14:14 -0400)]
bcachefs: kill struct bch_replicas_cpu_entry

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: add functionality for heaps to update backpointers
Kent Overstreet [Sun, 21 Oct 2018 20:32:51 +0000 (16:32 -0400)]
bcachefs: add functionality for heaps to update backpointers

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: btree gc refactoring
Kent Overstreet [Sun, 21 Oct 2018 14:56:11 +0000 (10:56 -0400)]
bcachefs: btree gc refactoring

prep work for erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: BCH_EXTENT_ENTRY_TYPES()
Kent Overstreet [Sun, 30 Sep 2018 22:39:20 +0000 (18:39 -0400)]
bcachefs: BCH_EXTENT_ENTRY_TYPES()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: bch2_extent_ptr_decoded_append()
Kent Overstreet [Fri, 28 Sep 2018 01:08:39 +0000 (21:08 -0400)]
bcachefs: bch2_extent_ptr_decoded_append()

This new helper for the move path avoids creating a new CRC entry when
we already have one that matches the pointer being added.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
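
A minimal sketch of the idea in the message above, under simplifying
assumptions: an extent is modelled as a flat array of entries, and a
pointer is covered by the closest CRC entry before it. All demo_ names
and the layout are hypothetical, and only the CRC entry in effect at the
end of the extent is checked; this is not the actual
bch2_extent_ptr_decoded_append() implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_ENTRIES 16

enum demo_type { DEMO_CRC, DEMO_PTR };

/* val: checksum descriptor id for DEMO_CRC, device index for DEMO_PTR */
struct demo_entry {
	enum demo_type type;
	unsigned val;
};

struct demo_extent {
	struct demo_entry e[DEMO_MAX_ENTRIES];
	size_t nr;
};

/*
 * Append a pointer that needs checksum descriptor `crc`. A new CRC entry
 * is emitted only when the CRC entry currently in effect does not match.
 * Returns false if the extent has no room left.
 */
static bool demo_ptr_decoded_append(struct demo_extent *x,
				    unsigned crc, unsigned dev)
{
	bool have_match = false;

	/* Walk backwards to the last CRC entry, if there is one. */
	for (size_t i = x->nr; i-- > 0;) {
		if (x->e[i].type == DEMO_CRC) {
			have_match = x->e[i].val == crc;
			break;
		}
	}

	if (x->nr + (have_match ? 1 : 2) > DEMO_MAX_ENTRIES)
		return false;

	if (!have_match)
		x->e[x->nr++] = (struct demo_entry){ DEMO_CRC, crc };
	x->e[x->nr++] = (struct demo_entry){ DEMO_PTR, dev };
	return true;
}
```

Appending two pointers with the same descriptor grows the extent by
three entries rather than four, which is the saving the message refers
to.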
18 months agobcachefs: bch2_extent_drop_ptrs()
Kent Overstreet [Sun, 30 Sep 2018 22:28:23 +0000 (18:28 -0400)]
bcachefs: bch2_extent_drop_ptrs()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: extent_for_each_ptr_decode()
Kent Overstreet [Fri, 28 Sep 2018 01:08:39 +0000 (21:08 -0400)]
bcachefs: extent_for_each_ptr_decode()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: kill bch_extent_crc_type
Kent Overstreet [Tue, 2 Oct 2018 20:40:12 +0000 (16:40 -0400)]
bcachefs: kill bch_extent_crc_type

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: extent_ptr_decoded
Kent Overstreet [Tue, 2 Oct 2018 15:03:39 +0000 (11:03 -0400)]
bcachefs: extent_ptr_decoded

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: fix missing include
Kent Overstreet [Fri, 12 Oct 2018 18:57:57 +0000 (14:57 -0400)]
bcachefs: fix missing include

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: fix a spurious gcc warning
Kent Overstreet [Fri, 12 Oct 2018 18:53:25 +0000 (14:53 -0400)]
bcachefs: fix a spurious gcc warning

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Allocation code refactoring
Kent Overstreet [Sat, 6 Oct 2018 08:12:42 +0000 (04:12 -0400)]
bcachefs: Allocation code refactoring

bch2_alloc_sectors_start() was a nightmare to work with - it's got some
tricky stuff to do, since it wants to use the buckets the writepoint
already has, unless they're not in the target it wants to write to,
unless it can't allocate from any other devices in which case it will
use those buckets if it has to - et cetera.

This restructures the code to start with a new empty list of open
buckets we're going to use for the new allocation, pulling buckets from
the write point's list as we decide that we really are going to use
them - making the code somewhat more functional and drastically easier
to understand.

Also fixes a bug where we could end up waiting on c->freelist_wait
(because allocating from one device failed) but return success from
bch2_bucket_alloc(), because allocating from a different device
succeeded.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
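
The restructuring described above can be sketched roughly as follows.
This is an illustration only: the demo_ types, the device bitmask and
the two-pass fallback are assumptions made for the example, not the
real bch2_alloc_sectors_start() code.

```c
#include <stddef.h>

#define DEMO_MAX 8

struct demo_bucket {
	unsigned dev;		/* device index, assumed < 32 */
};

struct demo_list {
	struct demo_bucket b[DEMO_MAX];
	size_t nr;
};

/*
 * Move buckets from the write point's list into `picked` (the list for
 * the new allocation) when their device is in `dev_mask`, until `want`
 * buckets have been picked. Buckets not taken stay on the write point.
 */
static void demo_pick_from_wp(struct demo_list *wp, struct demo_list *picked,
			      unsigned dev_mask, size_t want)
{
	size_t keep = 0;

	for (size_t i = 0; i < wp->nr; i++) {
		struct demo_bucket ob = wp->b[i];

		if (picked->nr < want && picked->nr < DEMO_MAX &&
		    (dev_mask & (1u << ob.dev)))
			picked->b[picked->nr++] = ob;
		else
			wp->b[keep++] = ob;	/* compact in place */
	}
	wp->nr = keep;
}
```

A caller would first pass the target's device mask and, only if it still
has too few buckets and nothing else can be allocated, call it again
with a mask covering every device.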
18 months agobcachefs: Split out alloc_background.c
Kent Overstreet [Sat, 6 Oct 2018 04:46:55 +0000 (00:46 -0400)]
bcachefs: Split out alloc_background.c

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Fix failure to suspend
Kent Overstreet [Mon, 1 Oct 2018 04:33:42 +0000 (00:33 -0400)]
bcachefs: Fix failure to suspend

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Fix suspend when moving data faster than ratelimit
Kent Overstreet [Wed, 26 Sep 2018 03:27:57 +0000 (23:27 -0400)]
bcachefs: Fix suspend when moving data faster than ratelimit

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: fix bch2_acl_chmod()
Kent Overstreet [Fri, 21 Sep 2018 21:37:13 +0000 (17:37 -0400)]
bcachefs: fix bch2_acl_chmod()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Fix a deadlock
Kent Overstreet [Sat, 15 Sep 2018 21:57:22 +0000 (17:57 -0400)]
bcachefs: Fix a deadlock

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: fix a divide
Kent Overstreet [Thu, 6 Sep 2018 21:09:07 +0000 (17:09 -0400)]
bcachefs: fix a divide

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: make fsck spew less
Kent Overstreet [Tue, 28 Aug 2018 22:54:42 +0000 (18:54 -0400)]
bcachefs: make fsck spew less

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Dirent repair code
Kent Overstreet [Tue, 21 Aug 2018 23:42:00 +0000 (19:42 -0400)]
bcachefs: Dirent repair code

There was a bug for a while in previous kernels where we weren't
computing dirent name lengths correctly and we weren't zeroing out
padding at the end of dirents (due to struct bch_dirent changing size by
adding __attribute__((aligned)), and not updating other code to use
offsetof).

This patch fixes dirents with junk at the end, by going off of the
dirent's hash.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
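
To make the sizeof/offsetof problem and the hash-based repair concrete,
here is a small sketch. The struct is a hypothetical stand-in rather
than the real struct bch_dirent, and FNV-1a stands in for the
filesystem's own string hash; both are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk dirent; NOT the real struct bch_dirent. */
struct demo_dirent {
	uint64_t inum;
	uint8_t  type;
	char     name[];
} __attribute__((packed, aligned(8)));

/* Stand-in hash (FNV-1a), used only for this example. */
static uint64_t demo_hash(const char *s, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)s[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/*
 * Bytes available for the name. With the aligned attribute, sizeof()
 * includes trailing padding, so the name must be measured from
 * offsetof(), not from sizeof().
 */
static size_t demo_name_capacity(size_t val_bytes)
{
	return val_bytes - offsetof(struct demo_dirent, name);
}

/*
 * Recover the name length of a dirent whose tail holds junk: try each
 * candidate length and keep the one whose hash matches the hash the
 * dirent was stored under. Returns 0 if no length matches.
 */
static size_t demo_repair_name_len(const struct demo_dirent *d,
				   size_t val_bytes, uint64_t stored_hash)
{
	for (size_t len = demo_name_capacity(val_bytes); len > 0; len--)
		if (demo_hash(d->name, len) == stored_hash)
			return len;
	return 0;
}
```

Once the length is known, a repair pass could zero the bytes after it so
the padding is clean again.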
18 months agobcachefs: Fix a btree iter bug when iter pos == POS_MAX
Kent Overstreet [Tue, 21 Aug 2018 21:38:41 +0000 (17:38 -0400)]
bcachefs: Fix a btree iter bug when iter pos == POS_MAX

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Comparison function cleanups
Kent Overstreet [Tue, 21 Aug 2018 20:30:14 +0000 (16:30 -0400)]
bcachefs: Comparison function cleanups

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Prioritize fragmentation in bucket allocator
Kent Overstreet [Tue, 21 Aug 2018 19:19:33 +0000 (15:19 -0400)]
bcachefs: Prioritize fragmentation in bucket allocator

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Pass around bset_tree less
Kent Overstreet [Sat, 11 Aug 2018 23:12:05 +0000 (19:12 -0400)]
bcachefs: Pass around bset_tree less

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: kill extent_insert_hook
Kent Overstreet [Wed, 8 Aug 2018 23:53:30 +0000 (19:53 -0400)]
bcachefs: kill extent_insert_hook

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: kill i_sectors_hook
Kent Overstreet [Sun, 5 Aug 2018 21:48:00 +0000 (17:48 -0400)]
bcachefs: kill i_sectors_hook

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: convert fcollapse to bch2_extent_update()
Kent Overstreet [Sat, 11 Aug 2018 21:26:11 +0000 (17:26 -0400)]
bcachefs: convert fcollapse to bch2_extent_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: convert fpunch to bch2_extent_update()
Kent Overstreet [Thu, 9 Aug 2018 01:11:43 +0000 (21:11 -0400)]
bcachefs: convert fpunch to bch2_extent_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: convert truncate to bch2_extent_update()
Kent Overstreet [Thu, 9 Aug 2018 01:09:31 +0000 (21:09 -0400)]
bcachefs: convert truncate to bch2_extent_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: convert bchfs_write_index_update() to bch2_extent_update()
Kent Overstreet [Wed, 8 Aug 2018 22:42:04 +0000 (18:42 -0400)]
bcachefs: convert bchfs_write_index_update() to bch2_extent_update()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: bch2_extent_trim_atomic()
Kent Overstreet [Sun, 5 Aug 2018 21:46:41 +0000 (17:46 -0400)]
bcachefs: bch2_extent_trim_atomic()

Prep work for extents insert hook removal

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: mempoolify btree_trans
Kent Overstreet [Thu, 9 Aug 2018 01:22:46 +0000 (21:22 -0400)]
bcachefs: mempoolify btree_trans

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: BTREE_INSERT_JOURNAL_RES_FULL is no longer possible
Kent Overstreet [Sun, 5 Aug 2018 19:21:52 +0000 (15:21 -0400)]
bcachefs: BTREE_INSERT_JOURNAL_RES_FULL is no longer possible

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: extent_squash() can no longer fail
Kent Overstreet [Sun, 5 Aug 2018 19:28:29 +0000 (15:28 -0400)]
bcachefs: extent_squash() can no longer fail

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: make struct btree_iter a bit smaller
Kent Overstreet [Sun, 5 Aug 2018 18:41:29 +0000 (14:41 -0400)]
bcachefs: make struct btree_iter a bit smaller

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: lift ordering restriction on 0 size extents
Kent Overstreet [Fri, 22 Jul 2016 03:05:06 +0000 (19:05 -0800)]
bcachefs: lift ordering restriction on 0 size extents

This lifts the restriction that 0 size extents must not overlap with
other extents, which means we can now sort extents and non extents the
same way, and will let us simplify a bunch of other stuff as well.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: extent unit tests
Kent Overstreet [Thu, 2 Aug 2018 03:03:41 +0000 (23:03 -0400)]
bcachefs: extent unit tests

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: bkey_written()
Kent Overstreet [Mon, 6 Aug 2018 02:23:44 +0000 (22:23 -0400)]
bcachefs: bkey_written()

also cleanups of btree node offsets

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: improved rw_aux_tree_bsearch()
Kent Overstreet [Mon, 6 Aug 2018 02:34:03 +0000 (22:34 -0400)]
bcachefs: improved rw_aux_tree_bsearch()

shouldn't be any reason for an actual binary search here

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Factor out btree_key_can_insert()
Kent Overstreet [Fri, 3 Aug 2018 23:41:44 +0000 (19:41 -0400)]
bcachefs: Factor out btree_key_can_insert()

working on getting rid of all the reasons bch2_insert_fixup_extent() can
fail/stop partway, which is needed for other refactorings.

One of the reasons we could have to bail out is if we're splitting a
compressed extent we might need to add to our disk reservation - but we
can check that before actually starting the insert.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: BCH_SB_RESERVE_BYTES
Kent Overstreet [Tue, 24 Jul 2018 18:55:05 +0000 (14:55 -0400)]
bcachefs: BCH_SB_RESERVE_BYTES

Add an option, gc_reserve_bytes, to set the copygc reserve as a size
instead of a percent

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
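
A rough sketch of how a byte-sized reserve could sit next to the
existing percentage. The helper name, the 512-byte sector unit and the
rule that a non-zero byte value takes precedence are assumptions for
illustration, not taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns the copygc reserve in 512-byte sectors.
 * A non-zero reserve_bytes is assumed to override the percentage.
 */
static uint64_t demo_gc_reserve_sectors(uint64_t capacity_sectors,
					uint64_t reserve_bytes,
					unsigned reserve_percent)
{
	if (reserve_bytes)
		return reserve_bytes >> 9;

	/* Fine for realistic capacities; may overflow near 2^64 sectors. */
	return capacity_sectors * reserve_percent / 100;
}
```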
18 months agobcachefs: Better calculation of copygc threshold
Kent Overstreet [Wed, 1 Aug 2018 18:26:55 +0000 (14:26 -0400)]
bcachefs: Better calculation of copygc threshold

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Change how replicated data is accounted
Kent Overstreet [Tue, 24 Jul 2018 20:42:49 +0000 (16:42 -0400)]
bcachefs: Change how replicated data is accounted

Due to compression, the different replicas of a replicated extent don't
necessarily have to take up the same amount of space - so replicated
data sector counts shouldn't be stored divided by the number of
replicas.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
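
A small worked example of why dividing by the replica count loses
information. The types and helpers are hypothetical; the point is only
the arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-replica record: sectors this copy occupies on disk. */
struct demo_replica {
	uint32_t disk_sectors;
};

/* Account replicated data as the sum over all replicas. */
static uint64_t demo_replicated_sectors(const struct demo_replica *r,
					size_t nr)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nr; i++)
		total += r[i].disk_sectors;
	return total;
}

/* What is recovered if the total is stored divided by the replica count. */
static uint64_t demo_divided_roundtrip(uint64_t total, size_t nr)
{
	return nr ? (total / nr) * nr : 0;
}
```

With one replica taking 8 sectors and a better-compressed one taking 5,
the sum is 13, but storing 13 / 2 and multiplying back yields 12.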
18 months agobcachefs: Account for internal fragmentation better
Kent Overstreet [Tue, 24 Jul 2018 18:54:39 +0000 (14:54 -0400)]
bcachefs: Account for internal fragmentation better

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: kill s_alloc, use bch_data_type
Kent Overstreet [Tue, 24 Jul 2018 17:33:07 +0000 (13:33 -0400)]
bcachefs: kill s_alloc, use bch_data_type

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: bch2_mark_key() now takes bch_data_type
Kent Overstreet [Tue, 24 Jul 2018 16:59:13 +0000 (12:59 -0400)]
bcachefs: bch2_mark_key() now takes bch_data_type

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Fix an assertion in the btree node merge path
Kent Overstreet [Tue, 24 Jul 2018 20:42:27 +0000 (16:42 -0400)]
bcachefs: Fix an assertion in the btree node merge path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Fix locking in allocator thread
Kent Overstreet [Tue, 24 Jul 2018 23:45:22 +0000 (19:45 -0400)]
bcachefs: Fix locking in allocator thread

gc lock must be held while invalidating buckets - fixes
"1f7a95698e bcachefs: Invalidate buckets when writing to alloc btree"

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: fix bch2_val_to_text()
Kent Overstreet [Mon, 23 Jul 2018 13:13:07 +0000 (09:13 -0400)]
bcachefs: fix bch2_val_to_text()

was returning wrong value

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: minor fsync fix
Kent Overstreet [Mon, 23 Jul 2018 11:53:29 +0000 (07:53 -0400)]
bcachefs: minor fsync fix

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Assorted journal refactoring
Kent Overstreet [Mon, 23 Jul 2018 11:52:00 +0000 (07:52 -0400)]
bcachefs: Assorted journal refactoring

Also improve error reporting - only return an error from
bch2_journal_flush_seq() if we had an error writing that entry (i.e. not
if there was an error with a newer entry).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: fix last_seq_ondisk
Kent Overstreet [Mon, 23 Jul 2018 11:38:06 +0000 (07:38 -0400)]
bcachefs: fix last_seq_ondisk

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: fix mtime/ctime update on truncate
Kent Overstreet [Mon, 23 Jul 2018 09:48:53 +0000 (05:48 -0400)]
bcachefs: fix mtime/ctime update on truncate

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: fix fsync after create
Kent Overstreet [Mon, 23 Jul 2018 09:48:35 +0000 (05:48 -0400)]
bcachefs: fix fsync after create

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: fix nbuckets usage on device resize
Kent Overstreet [Mon, 23 Jul 2018 09:28:40 +0000 (05:28 -0400)]
bcachefs: fix nbuckets usage on device resize

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Invalidate buckets when writing to alloc btree
Kent Overstreet [Sun, 22 Jul 2018 14:43:01 +0000 (10:43 -0400)]
bcachefs: Invalidate buckets when writing to alloc btree

Prep work for persistent alloc information. Refactoring also lets us
make free_inc much smaller, which means a lot fewer buckets stranded on
freelists.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: kill bucket mark sector count saturation
Kent Overstreet [Sun, 22 Jul 2018 10:10:52 +0000 (06:10 -0400)]
bcachefs: kill bucket mark sector count saturation

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: don't call bch2_bucket_seq_cleanup from journal_buf_switch
Kent Overstreet [Sun, 22 Jul 2018 02:57:20 +0000 (22:57 -0400)]
bcachefs: don't call bch2_bucket_seq_cleanup from journal_buf_switch

journal_buf_switch is called from the foreground when getting a journal
reservation and thus is somewhat latency sensitive;
bch2_bucket_seq_cleanup has to run infrequently but is a bit expensive
when it does run.

Call it from the journal write path instead, and punt the journal write
to workqueue context.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Fix an assertion
Kent Overstreet [Sun, 22 Jul 2018 17:04:00 +0000 (13:04 -0400)]
bcachefs: Fix an assertion

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: fix rename + fsync
Kent Overstreet [Sun, 22 Jul 2018 17:15:51 +0000 (13:15 -0400)]
bcachefs: fix rename + fsync

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Use ei_update_lock consistently
Kent Overstreet [Sat, 21 Jul 2018 02:23:42 +0000 (22:23 -0400)]
bcachefs: Use ei_update_lock consistently

This is prep work for using deferred btree updates for inode updates -
the way inodes are done now we're relying on btree locking for ei_inode
and ei_update_lock could probably be removed, but it'll actually be
needed when we switch to deferred updates.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: bch2_trans_update() now takes struct btree_insert_entry
Kent Overstreet [Tue, 17 Jul 2018 19:28:11 +0000 (15:28 -0400)]
bcachefs: bch2_trans_update() now takes struct btree_insert_entry

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Fix mtime/ctime updates
Kent Overstreet [Tue, 17 Jul 2018 18:12:42 +0000 (14:12 -0400)]
bcachefs: Fix mtime/ctime updates

Also make inode flags consistent with how the rest of the inode is
updated

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Simplify bch2_write_inode_trans, fix lockdep splat
Kent Overstreet [Tue, 17 Jul 2018 18:03:47 +0000 (14:03 -0400)]
bcachefs: Simplify bch2_write_inode_trans, fix lockdep splat

ei_update_lock isn't currently needed for write inode (but it will be
needed again when deferred btree updates are used for inode updates)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: add bch_verbose() statements for shutdown
Kent Overstreet [Sat, 21 Jul 2018 07:56:57 +0000 (03:56 -0400)]
bcachefs: add bch_verbose() statements for shutdown

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Fix a use after free in the journal code
Kent Overstreet [Tue, 17 Jul 2018 16:19:14 +0000 (12:19 -0400)]
bcachefs: Fix a use after free in the journal code

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Fix device add
Kent Overstreet [Sat, 21 Jul 2018 02:08:17 +0000 (22:08 -0400)]
bcachefs: Fix device add

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: trace transaction restarts
Kent Overstreet [Fri, 13 Jul 2018 03:30:45 +0000 (23:30 -0400)]
bcachefs: trace transaction restarts

exceptionally crappy "tracing", but it's a start at documenting the
places restarts can be triggered

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Convert raw uses of bch2_btree_iter_link() to new transactions
Kent Overstreet [Thu, 12 Jul 2018 23:19:41 +0000 (19:19 -0400)]
bcachefs: Convert raw uses of bch2_btree_iter_link() to new transactions

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Only check inode i_nlink during full fsck
Kent Overstreet [Sun, 15 Jul 2018 01:06:51 +0000 (21:06 -0400)]
bcachefs: Only check inode i_nlink during full fsck

Now that all filesystem operations that manipulate the filesystem
hierarchy and i_nlink are fully atomic, we can add a feature bit to
indicate i_nlink doesn't need to be checked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcachefs: Initial commit
Kent Overstreet [Fri, 17 Mar 2017 06:18:50 +0000 (22:18 -0800)]
bcachefs: Initial commit

Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write
filesystem with every feature you could possibly want.

Website: https://bcachefs.org

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agoMAINTAINERS: Add entry for bcachefs
Kent Overstreet [Wed, 26 Apr 2023 16:27:51 +0000 (12:27 -0400)]
MAINTAINERS: Add entry for bcachefs

bcachefs is a new copy-on-write filesystem; add a MAINTAINERS entry for
it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agoobjtool: Add bcachefs noreturns
Kent Overstreet [Sun, 10 Sep 2023 00:56:00 +0000 (20:56 -0400)]
objtool: Add bcachefs noreturns

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agolib/generic-radix-tree.c: Add peek_prev()
Kent Overstreet [Tue, 12 Sep 2023 05:17:22 +0000 (01:17 -0400)]
lib/generic-radix-tree.c: Add peek_prev()

This patch adds genradix_peek_prev(), genradix_iter_rewind(), and
genradix_for_each_reverse(), for iterating backwards over a generic
radix tree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agolib/generic-radix-tree.c: Don't overflow in peek()
Kent Overstreet [Sat, 13 Feb 2021 01:11:25 +0000 (20:11 -0500)]
lib/generic-radix-tree.c: Don't overflow in peek()

When we started spreading new inode numbers throughout most of the 64
bit inode space, that triggered some corner case bugs, in particular
some integer overflows related to the radix tree code. Oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
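
The class of bug described above, an index near the top of the 64-bit
space overflowing when turned into a byte offset, can be illustrated
with a generic checked conversion. This is a sketch of the pattern with
a hypothetical helper, not the change made to the radix tree code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert an object index into a byte offset, refusing any result that
 * would not fit in size_t instead of silently wrapping around.
 */
static bool demo_idx_to_offset(uint64_t idx, size_t obj_size, size_t *offset)
{
	if (!obj_size || idx > SIZE_MAX / obj_size)
		return false;

	*offset = (size_t)idx * obj_size;
	return true;
}
```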
18 months agoMAINTAINERS: Add entry for generic-radix-tree
Kent Overstreet [Wed, 26 Apr 2023 16:27:51 +0000 (12:27 -0400)]
MAINTAINERS: Add entry for generic-radix-tree

lib/generic-radix-tree.c is a simple radix tree that supports storing
arbitrary types. Add a maintainers entry for it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agoclosures: Add a missing include
Kent Overstreet [Sun, 5 Mar 2023 03:45:27 +0000 (22:45 -0500)]
closures: Add a missing include

Fixes building in userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agoclosures: closure_nr_remaining()
Kent Overstreet [Sat, 4 Mar 2023 07:39:39 +0000 (02:39 -0500)]
closures: closure_nr_remaining()

Factor out a new helper, which returns the number of events outstanding.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
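
A userspace sketch of what such a helper might look like, using C11
atomics. The assumption, modelled loosely on bcache's closures, is that
the low bits of the counter hold the number of outstanding events and
the high bits hold state flags; the exact bit position used here is
illustrative.

```c
#include <stdatomic.h>

/* Assumed layout: flags start at bit 26, the count lives below that. */
#define DEMO_CLOSURE_BITS_START		(1U << 26)
#define DEMO_CLOSURE_REMAINING_MASK	(DEMO_CLOSURE_BITS_START - 1)

struct demo_closure {
	atomic_uint remaining;
};

/* Number of events outstanding, with the flag bits masked off. */
static unsigned demo_closure_nr_remaining(struct demo_closure *cl)
{
	return atomic_load(&cl->remaining) & DEMO_CLOSURE_REMAINING_MASK;
}
```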
18 months agoclosures: closure_wait_event()
Kent Overstreet [Sat, 9 Dec 2017 17:42:44 +0000 (12:42 -0500)]
closures: closure_wait_event()

Like wait_event() - except, because it uses closures and closure
waitlists it doesn't have the restriction on modifying task state inside
the condition check, like wait_event() does.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Acked-by: Coly Li <colyli@suse.de>
18 months agoMAINTAINERS: Add entry for closures
Kent Overstreet [Wed, 26 Apr 2023 16:27:51 +0000 (12:27 -0400)]
MAINTAINERS: Add entry for closures

closures, from bcache, are async widgets with a variety of uses.
bcachefs also uses them, so they're being moved to lib/; mark them as
maintained.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
18 months agobcache: move closures to lib/
Kent Overstreet [Sat, 18 Mar 2017 00:35:23 +0000 (16:35 -0800)]
bcache: move closures to lib/

Prep work for bcachefs - being a fork of bcache it also uses closures

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
18 months agolocking: export contention tracepoints for bcachefs six locks
Brian Foster [Mon, 14 Aug 2023 13:04:50 +0000 (09:04 -0400)]
locking: export contention tracepoints for bcachefs six locks

The bcachefs implementation of six locks is intended to land in
generic locking code in the long term, but has been pulled into the
bcachefs subsystem for internal use for the time being. This code
lift breaks the bcachefs module build as six locks depend on a couple
of the generic locking tracepoints. Export these tracepoint symbols
for bcachefs.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
19 months agolib: Export errname
Kent Overstreet [Tue, 25 Apr 2023 18:45:28 +0000 (14:45 -0400)]
lib: Export errname

errname() returns the name of an errcode; this functionality is
otherwise only available for error pointers via %pE - bcachefs uses this
for better error messages.

Signed-off-by: Christopher James Halse Rogers <raof@ubuntu.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
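
For readers unfamiliar with the helper, a tiny userspace imitation
follows. The table is deliberately incomplete, and the sign handling
(positive codes give "EIO", negative codes give "-EIO", unknown codes
give NULL) is an assumption about the convention, so treat it as a
sketch rather than a description of lib/errname.c.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Names are stored with a leading '-' so both signs share one string. */
static const struct {
	int err;
	const char *name;
} demo_names[] = {
	{ ENOENT, "-ENOENT" },
	{ EIO,    "-EIO" },
	{ ENOMEM, "-ENOMEM" },
};

static const char *demo_errname(int err)
{
	int code = abs(err);

	for (size_t i = 0; i < sizeof(demo_names) / sizeof(demo_names[0]); i++)
		if (demo_names[i].err == code)
			return err > 0 ? demo_names[i].name + 1
				       : demo_names[i].name;
	return NULL;
}
```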
19 months agolib/string_helpers: string_get_size() now returns characters wrote
Kent Overstreet [Mon, 25 Apr 2022 19:26:28 +0000 (15:26 -0400)]
lib/string_helpers: string_get_size() now returns characters wrote

printbuf now needs to know the number of characters that would have been
written if the buffer was too small, like snprintf(); this changes
string_get_size() to return the return value of snprintf().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
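
The snprintf() convention the message relies on: the return value is the
length the full output would have had, even when the buffer is too
small. The formatter below is a hypothetical stand-in, not
string_get_size() itself.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a byte count with a binary unit suffix. Returns what
 * snprintf() returns: the number of characters the complete string
 * needs, excluding the terminating NUL, regardless of truncation.
 */
static int demo_format_size(unsigned long long bytes, char *buf, size_t len)
{
	static const char *const units[] = { "B", "KiB", "MiB", "GiB", "TiB" };
	unsigned i = 0;

	while (bytes >= 1024 && i < 4) {
		bytes /= 1024;
		i++;
	}
	return snprintf(buf, len, "%llu %s", bytes, units[i]);
}
```

A caller that appends into a growable buffer can compare the return
value with the space it had, grow the buffer, and retry.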
19 months agostacktrace: Export stack_trace_save_tsk
Christopher James Halse Rogers [Mon, 27 Jun 2022 00:45:12 +0000 (10:45 +1000)]
stacktrace: Export stack_trace_save_tsk

The bcachefs module wants it, and there doesn't seem to be any
reason it shouldn't be exported like the other functions.

Signed-off-by: Christopher James Halse Rogers <raof@ubuntu.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
19 months agofs: factor out d_mark_tmpfile()
Kent Overstreet [Tue, 10 Jul 2018 03:27:33 +0000 (23:27 -0400)]
fs: factor out d_mark_tmpfile()

New helper for bcachefs - bcachefs doesn't want the
inode_dec_link_count() call that d_tmpfile does, it handles i_nlink on
its own atomically with other btree updates

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christian Brauner <brauner@kernel.org>
19 months agosched: Add task_struct->faults_disabled_mapping
Kent Overstreet [Wed, 16 Oct 2019 19:03:50 +0000 (15:03 -0400)]
sched: Add task_struct->faults_disabled_mapping

There has been a long standing page cache coherence bug with direct IO.
This provides part of a mechanism to fix it, currently just used by
bcachefs but potentially worth promoting to the VFS.

Direct IO evicts the range of the pagecache being read or written to.

For reads, we need dirty pages to be written to disk, so that the read
doesn't return stale data. For writes, we need to evict that range of
the pagecache so that it's not stale after the write completes.

However, without a locking mechanism to prevent those pages from being
re-added to the pagecache - by a buffered read or page fault - page
cache inconsistency is still possible.

This isn't necessarily just an issue for userspace when they're playing
games; filesystems may hang arbitrary state off the pagecache, and so
page cache inconsistency may cause real filesystem bugs, depending on
the filesystem. This is less of an issue for iomap based filesystems,
but e.g. buffer heads cache disk block mappings (!) and attach them
to the pagecache, and bcachefs attaches disk reservations to pagecache
pages.

This issue has been hard to fix, because
 - we need to add a lock (henceforth called pagecache_add_lock), which
   would be held for the duration of the direct IO
 - page faults add pages to the page cache, thus need to take the same
   lock
 - dio -> gup -> page fault thus can deadlock

And we cannot enforce a lock ordering with this lock, since userspace
will be controlling the lock ordering (via the fd and buffer arguments
to direct IOs), so we need a different method of deadlock avoidance.

We need to tell the page fault handler that we're already holding a
pagecache_add_lock, and since plumbing it through the entire gup() path
would be highly impractical this adds a field to task_struct.

Then the full method is:
 - in the dio path, when we first take the pagecache_add_lock, note the
   mapping in the current task_struct
 - in the page fault handler, if faults_disabled_mapping is set, we
   check if it's the same mapping as the one we're taking a page fault
   for, and if so return an error.

   Then we check lock ordering: if there's a lock ordering violation and
   trylock fails, we'll have to cycle the locks and return an error that
   tells the DIO path to retry: faults_disabled_mapping is also used for
   signalling "locks were dropped, please retry".

Also relevant to this patch: mapping->invalidate_lock.
mapping->invalidate_lock provides most of the required semantics - it's
used by truncate/fallocate to block pages being added to the pagecache.
However, since it's a rwsem, direct IOs would need to take the write
side in order to block page cache adds, and would then be exclusive with
each other - we'll need a new type of lock to pair with this approach.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Jan Kara <jack@suse.cz>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Andreas Grünbacher <andreas.gruenbacher@gmail.com>
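
The method outlined in the message can be modelled in a few lines of
userspace C. Everything here is a hypothetical simplification: the lock
is a plain flag, lock ordering is assumed to be by increasing address,
and the "locks were dropped" signal is assumed to be carried in the low
bit of the stored pointer. It shows the control flow only, not the
kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a mapping with a pagecache_add_lock. The int member keeps
 * the struct aligned so the low pointer bit is free for tagging. */
struct demo_mapping {
	int add_lock_held;
};

/* Pointer to the mapping locked by the DIO; low bit = "locks dropped". */
struct demo_task {
	uintptr_t faults_disabled_mapping;
};

enum demo_fault {
	DEMO_FAULT_PROCEED,	/* fault may add pages to the page cache */
	DEMO_FAULT_ERROR,	/* fault on the mapping the DIO has locked */
	DEMO_FAULT_RETRY_DIO,	/* locks were cycled; DIO path must retry */
};

/* DIO path: take the lock and note the mapping in the task. */
static void demo_dio_begin(struct demo_task *t, struct demo_mapping *m)
{
	m->add_lock_held = 1;
	t->faults_disabled_mapping = (uintptr_t)m;
}

static bool demo_trylock(struct demo_mapping *m)
{
	if (m->add_lock_held)
		return false;
	m->add_lock_held = 1;
	return true;
}

static bool demo_locks_dropped(const struct demo_task *t)
{
	return t->faults_disabled_mapping & 1;
}

/* Page fault path: decide what to do for a fault on `faulting`. */
static enum demo_fault demo_page_fault(struct demo_task *t,
				       struct demo_mapping *faulting)
{
	struct demo_mapping *fdm = (struct demo_mapping *)
		(t->faults_disabled_mapping & ~(uintptr_t)1);

	if (!fdm)
		return DEMO_FAULT_PROCEED;	/* not inside a DIO */

	if (fdm == faulting)
		return DEMO_FAULT_ERROR;

	/* In order: a real handler could block on the lock safely here. */
	if ((uintptr_t)faulting > (uintptr_t)fdm)
		return DEMO_FAULT_PROCEED;

	/* Out of order: only a trylock is safe. */
	if (demo_trylock(faulting))
		return DEMO_FAULT_PROCEED;

	/* Ordering violation and trylock failed: cycle locks, ask for retry. */
	fdm->add_lock_held = 0;
	t->faults_disabled_mapping |= 1;
	return DEMO_FAULT_RETRY_DIO;
}
```

After a DEMO_FAULT_RETRY_DIO result the DIO path would see
demo_locks_dropped() return true, retake its lock and restart the
operation.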
19 months agoLinux 6.6-rc1
Linus Torvalds [Sun, 10 Sep 2023 23:28:41 +0000 (16:28 -0700)]
Linux 6.6-rc1

19 months agoMerge tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Sun, 10 Sep 2023 18:55:26 +0000 (11:55 -0700)]
Merge tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm

Pull drm ci scripts from Dave Airlie:
 "This is a bunch of ci integration for the freedesktop gitlab instance
  where we currently do upstream userspace testing on diverse sets of
  GPU hardware. From my perspective I think it's an experiment worth
  going with and seeing how the benefits/noise play out keeping these
  files useful.

  Ideally I'd like to get this so we can do pre-merge testing on PRs
  eventually.

  Below is some info from danvet on why we've ended up making the
  decision and how we can roll it back if we decide it was a bad plan.

  Why in upstream?

   - like documentation, testcases, tools CI integration is one of these
     things where you can waste endless amounts of time if you
     accidentally have a version that doesn't match your source code

   - but also like the above, there's a balance, this is the initial cut
     of what we think makes sense to keep in sync vs out-of-tree,
     probably needs adjustment

   - gitlab supports out-of-repo gitlab integration and that's what's
     been used for the kernel in drm, but it results in per-driver
     fragmentation and lots of duplicated effort. the simple act of
     smashing an arbitrary winner into a topic branch already started
     surfacing patches on dri-devel and sparking good cross driver team
     discussions

  Why gitlab?

   - it's not any more shit than any of the other CI

   - drm userspace uses it extensively for everything in userspace, we
     have a lot of people and experience with this, including
     integration of hw testing labs

   - media userspace like gstreamer is also on gitlab.fd.o, and there's
     discussion to extend this to the media subsystem in some fashion

  Can this be shared?

   - there's definitely a pile of code that could move to scripts/ if
     other subsystems adopt ci integration in upstream kernel git. other
     bits are more drm/gpu specific like the igt-gpu-tests/tools
     integration

   - docker images can be run locally or in other CI runners

  Will we regret this?

   - it's all in one directory, intentionally, for easy deletion

   - probably 1-2 years in upstream to see whether this is worth it or a
     Big Mistake. that's roughly what it took to _really_ roll out solid
     CI in the bigger userspace projects we have on gitlab.fd.o like
     mesa3d"

* tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm:
  drm: ci: docs: fix build warning - add missing escape
  drm: Add initial ci/ subdirectory

19 months ago Merge tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 10 Sep 2023 17:39:31 +0000 (10:39 -0700)]
Merge tag 'x86-urgent-2023-09-10' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Fix preemption delays in the SGX code, remove unnecessarily
  UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and
  make the x86 SMP init code a bit more conservative to fix kexec()
  lockups"

* tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Break up long non-preemptible delays in sgx_vepc_release()
  x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI
  x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld
  x86/smp: Don't send INIT to non-present and non-booted CPUs

19 months ago Merge tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 10 Sep 2023 17:34:46 +0000 (10:34 -0700)]
Merge tag 'perf-urgent-2023-09-10' of git://git./linux/kernel/git/tip/tip

Pull x86 perf event fix from Ingo Molnar:
 "Work around a firmware bug in the uncore PMU driver, affecting certain
  Intel systems"

* tag 'perf-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Correct the number of CHAs on EMR

19 months ago Merge tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 10 Sep 2023 03:06:17 +0000 (20:06 -0700)]
Merge tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git./linux/kernel/git/perf/perf-tools

Pull perf tools updates from Arnaldo Carvalho de Melo:
 "perf tools maintainership:

   - Add git information for perf-tools and perf-tools-next trees and
     branches to the MAINTAINERS file. That is where development now
     takes place; Namhyung Kim and I have write access, with more
     people to come as we emulate other maintainer groups.

  perf record:

   - Record kernel data maps when 'perf record --data' is used, so that
     global variables can be resolved and used in tools that do data
     profiling.

  perf trace:

   - Remove the old, experimental support for BPF events in which a .c
     file was passed as an event: "perf trace -e hello.c" to then get
     compiled and loaded.

     The only known usage for that, that shipped with the kernel as an
     example for such events, augmented the raw_syscalls tracepoints and
     was converted to a libbpf skeleton, reusing all the user space
     components and the BPF code connected to the syscalls.

     In the end just the way to glue the BPF part and the user space
     type beautifiers changed, now being performed by libbpf skeletons.

     The next step is to use BTF to do pretty printing of all syscall
     types, as discussed with Alan Maguire and others.

     Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all
     path/filenames/strings, some of the networking data structures,
     perf_event_attr, etc. For example, system-wide tracing of nanosleep
     calls and perf_event_open syscalls while 'perf stat' runs 'sleep'
     for 5 seconds:

      # perf trace -a -e *nanosleep,perf* perf stat -e cycles,instructions sleep 5
         0.000 (   9.034 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
         9.039 (   0.006 ms): perf/327641 perf_event_open(attr_uptr: { type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x1 (PERF_COUNT_HW_INSTRUCTIONS), sample_type: IDENTIFIER, read_format: TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING, disabled: 1, inherit: 1, enable_on_exec: 1, exclude_guest: 1 }, pid: 327642 (perf-exec), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
             ? (           ): gpm/991  ... [continued]: clock_nanosleep())               = 0
        10.133 (           ): sleep/327642 clock_nanosleep(rqtp: { .tv_sec: 5, .tv_nsec: 0 }, rmtp: 0x7ffd36f83ed0) ...
             ? (           ): pool-gsd-smart/3051  ... [continued]: clock_nanosleep())   = 0
        30.276 (           ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
       223.215 (1000.430 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
        30.276 (2000.394 ms): gpm/991  ... [continued]: clock_nanosleep())               = 0
      1230.814 (           ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
      1230.814 (1000.404 ms): pool-gsd-smart/3051  ... [continued]: clock_nanosleep())   = 0
      2030.886 (           ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
      2237.709 (1000.153 ms): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) = 0
             ? (           ): crond/1172  ... [continued]: clock_nanosleep())            = 0
      3242.699 (           ): pool-gsd-smart/3051 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 0 }, rmtp: 0x7f6e7fffec90) ...
      2030.886 (2000.385 ms): gpm/991  ... [continued]: clock_nanosleep())               = 0
      3728.078 (           ): crond/1172 clock_nanosleep(rqtp: { .tv_sec: 60, .tv_nsec: 0 }, rmtp: 0x7ffe0971dcf0) ...
      3242.699 (1000.158 ms): pool-gsd-smart/3051  ... [continued]: clock_nanosleep())   = 0
      4031.409 (           ): gpm/991 clock_nanosleep(rqtp: { .tv_sec: 2, .tv_nsec: 0 }, rmtp: 0x7ffcc6f73710) ...
        10.133 (5000.375 ms): sleep/327642  ... [continued]: clock_nanosleep())          = 0

      Performance counter stats for 'sleep 5':

             2,617,347      cycles
             1,855,997      instructions                     #    0.71  insn per cycle

           5.002282128 seconds time elapsed

           0.000855000 seconds user
           0.000852000 seconds sys

  perf annotate:

   - Building with binutils' libopcode now is opt-in (BUILD_NONDISTRO=1)
     for licensing reasons, and we missed a build test on
     tools/perf/tests makefile.

     Since we now default to NDEBUG=1, we ended up segfaulting when
     building with BUILD_NONDISTRO=1 because a needed initialization
     routine was being "error checked" via an assert.

     Fix it by explicitly checking the result and aborting instead if it
     fails.

     It would be better to propagate the error back, but at least 'perf
     annotate' on samples collected for a BPF program works again when
     perf is built with BUILD_NONDISTRO=1.
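
The assert problem described above can be sketched in plain C. This is only an illustration: init_disassembler(), setup_buggy(), setup_fixed() and the CHECK() macro are hypothetical names, not taken from the perf sources. CHECK() stands in for what assert() becomes when NDEBUG is defined, so the call wrapped in it never runs.

```c
#include <stdbool.h>

/* Stand-in for assert() under NDEBUG: the expression is never evaluated. */
#define CHECK(expr) ((void)0)

static bool initialized;

/* Hypothetical initialization routine whose side effect is required. */
int init_disassembler(void)
{
	initialized = true;
	return 0;
}

/* Buggy pattern: the call lives inside the assertion and is compiled out. */
bool setup_buggy(void)
{
	initialized = false;
	CHECK(init_disassembler() == 0);
	return initialized;
}

/* Fixed pattern: always make the call, then check the result explicitly. */
int setup_fixed(void)
{
	initialized = false;
	if (init_disassembler() != 0)
		return -1;
	return initialized ? 0 : -1;
}
```

The fix described above aborts on failure; this sketch returns an error code instead, which is closer to propagating the error back to the caller.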

  perf report/top:

   - Add back TUI hierarchy mode header, that is seen when using 'perf
     report/top --hierarchy'.

   - Fix the number of entries for 'e' key in the TUI that was
     preventing navigation of lines when expanding an entry.

  perf report/script:

   - Support cross platform register handling, allowing a perf.data file
     collected on one architecture to have registers sampled correctly
     displayed when analysis tools such as 'perf report' and 'perf
     script' are used on a different architecture.

   - Fix handling of event attributes in pipe mode, i.e. when no
     perf.data file is used, as in:

       perf record -o - | perf report -i -

   - Handle files generated via pipe mode with a version of perf and
     then read also via pipe mode with a different version of perf,
     where the event attr record may have changed, use the record size
     field to properly support this version mismatch.
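
How a size field lets a reader cope with records from a different version can be sketched as follows. This is a generic illustration and not the perf implementation: struct attr_rec, its fields and decode_attr_rec() are hypothetical. The reader copies only as many bytes as both sides know about and zero-fills the rest.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: newer writers append fields at the end. */
struct attr_rec {
	uint32_t size;   /* bytes written by the producer, including this field */
	uint32_t type;
	uint64_t config;
	uint64_t extra;  /* field added in a later version */
};

/*
 * Decode one record from buf into out. Returns the number of bytes the
 * producer wrote (how far to advance in the stream), or 0 if malformed.
 */
size_t decode_attr_rec(const unsigned char *buf, size_t len,
		       struct attr_rec *out)
{
	uint32_t size;
	size_t n;

	if (len < sizeof(size))
		return 0;
	memcpy(&size, buf, sizeof(size));
	if (size < sizeof(size) || size > len)
		return 0;

	/* Copy what both versions share; unknown fields stay zero. */
	n = size < sizeof(*out) ? size : sizeof(*out);
	memset(out, 0, sizeof(*out));
	memcpy(out, buf, n);
	return size;
}
```

An older, shorter record decodes with the newer fields zeroed, and a newer, longer record is skipped correctly because the stream advances by the producer's size rather than by sizeof(struct attr_rec).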

  perf probe:

   - Accessing global variables from uprobes isn't supported, make the
     error message state that instead of stating that some minimal
     kernel version is needed to have that feature. This seems just a
     tool limitation, the kernel probably has all that is needed.

  perf tests:

   - Fix a reference count related leak in the dlfilter v0 API where the
     result of a thread__find_symbol_fb() is not matched with an
     addr_location__exit() to drop the reference counts of the resolved
     components (machine, thread, map, symbol, etc). Add a dlfilter test
     to make sure that doesn't regress.

   - Lots of fixes for the 'perf test' written in shell script related
     to problems found with the shellcheck utility.

   - Fixes for 'perf test' shell scripts testing features enabled when
     perf is built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf
     counters.

   - Add perf record sample filtering test, things like the following
     example, that gets implemented as a BPF filter attached to the
     event:

       # perf record -e task-clock -c 10000 --filter 'ip < 0xffffffff00000000'

   - Improve the way the task_analyzer test checks if libtraceevent is
     linked, using 'perf version --build-options' instead of the more
     expensive 'perf record -e "sched:sched_switch"'.

   - Add support for riscv in the mmap-basic test. (This went as well
     via the RiscV tree, same contents).

  libperf:

   - Implement riscv mmap support (This went as well via the RiscV tree,
     same contents).

  perf script:

   - New tool that converts perf.data files to the firefox profiler
     format so that one can use the visualizer at
     https://profiler.firefox.com/. Done by Anup Sharma as part of this
     year's Google Summer of Code.

     One can generate the output and upload it to the web interface but
     Anup also automated everything:

       perf script gecko -F 99 -a sleep 60

   - Support syscall name parsing on arm64.

   - Print "cgroup" field on the same line as "comm".

  perf bench:

   - Add new 'uprobe' benchmark to measure the overhead of uprobes
     with/without BPF programs attached to it.

   - breakpoints are not available on power9, skip that test.

  perf stat:

   - Add #num_cpus_online literal to be used in 'perf stat' metrics, and
     add this extra 'perf test' check that exemplifies its purpose:

       TEST_ASSERT_VAL("#num_cpus_online",
                       expr__parse(&num_cpus_online, ctx, "#num_cpus_online") == 0);
       TEST_ASSERT_VAL("#num_cpus", expr__parse(&num_cpus, ctx, "#num_cpus") == 0);
       TEST_ASSERT_VAL("#num_cpus >= #num_cpus_online", num_cpus >= num_cpus_online);

  Miscellaneous:

   - Improve tool startup time by lazily reading PMU, JSON, sysfs data.

   - Improve error reporting in the parsing of events, passing YYLTYPE
     to error routines, so that the output can show where the parsing
     error was found.

   - Add 'perf test' entries to check the parsing of events
     improvements.

   - Fix various leaks detected by -fsanitize=address, mostly
     things that would be freed at tool exit, including:

       - Free evsel->filter on the destructor.

       - Allow tools to register a thread->priv destructor and use it in
         'perf trace'.

       - Free evsel->priv in 'perf trace'.

       - Free string returned by synthesize_perf_probe_point() when the
         caller fails to do all it needs.

   - Adjust various compiler options to not treat some warnings as
     errors when building with broken headers found in things like
     python, flex, bison, as we otherwise build with -Werror. Some for
     gcc, some for clang, some for some specific version of those, some
     for some specific version of flex or bison, or some specific
     combination of these components, bah.

   - Allow customization of clang options for BPF target, this helps
     building on gentoo where there are other oddities where BPF targets
     get passed some compiler options intended for the native build, so
     building with WERROR=0 helps while these oddities are fixed.

   - Don't pass ERR_PTR() values to perf_session__delete() in 'perf top'
     and 'perf lock', fixing some segfaults when handling some odd
     failures.

   - Add LTO build option.

   - Fix format of unordered lists in the perf docs
     (tools/perf/Documentation)

   - Overhaul the bison files, using constructs such as YYNOMEM.

   - Remove unused tokens from the bison .y files.

   - Add more comments to various structs.

   - A few LoongArch enablement patches.
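
The ERR_PTR() item above comes down to checking for an encoded error before handing a pointer to a destructor. A minimal userspace sketch, assuming simplified re-implementations of the kernel-style ERR_PTR()/IS_ERR()/PTR_ERR() helpers; struct session, session_new(), session_delete() and run_tool() are hypothetical and are not the perf functions:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified stand-ins for the kernel-style helpers. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error values occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct session { int fd; };

/* Hypothetical constructor: returns an encoded errno on failure. */
struct session *session_new(int fail)
{
	struct session *s;

	if (fail)
		return ERR_PTR(-EINVAL);
	s = calloc(1, sizeof(*s));
	return s ? s : ERR_PTR(-ENOMEM);
}

/* Hypothetical destructor: must only ever see a real pointer. */
void session_delete(struct session *s)
{
	free(s);
}

int run_tool(int fail)
{
	struct session *s = session_new(fail);

	if (IS_ERR(s))
		return (int)PTR_ERR(s);	/* never pass this to session_delete() */

	session_delete(s);
	return 0;
}
```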

  Vendor events (JSON):

   - Add JSON metrics for Yitian 710 DDR (aarch64). Things like:

       EventName, BriefDescription
       visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.",
       visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.",
       op_is_dqsosc_mpc, "A DQS Oscillator MPC command to DRAM.",
       op_is_dqsosc_mrr, "A DQS Oscillator MRR command to DRAM.",
       op_is_tcr_mrr, "A Temperature Compensated Refresh(TCR) MRR command to DRAM.",

   - Add AmpereOne metrics (aarch64).

   - Update N2 and V2 metrics (aarch64) and events using Arm telemetry
     repo.

   - Update scale units and descriptions of common topdown metrics on
     aarch64. Things like:
       - "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
       - "BriefDescription": "Frontend bound L1 topdown metric",
       + "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))",
       + "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",

   - Update events for intel: meteorlake to 1.04, sapphirerapids to
     1.15, Icelake+ metric constraints.

   - Update files for the power10 platform"

* tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (217 commits)
  perf parse-events: Fix driver config term
  perf parse-events: Fixes relating to no_value terms
  perf parse-events: Fix propagation of term's no_value when cloning
  perf parse-events: Name the two term enums
  perf list: Don't print Unit for "default_core"
  perf vendor events intel: Fix modifier in tma_info_system_mem_parallel_reads for skylake
  perf dlfilter: Avoid leak in v0 API test use of resolve_address()
  perf metric: Add #num_cpus_online literal
  perf pmu: Remove str from perf_pmu_alias
  perf parse-events: Make common term list to strbuf helper
  perf parse-events: Minor help message improvements
  perf pmu: Avoid uninitialized use of alias->str
  perf jevents: Use "default_core" for events with no Unit
  perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test
  perf test shell stat_bpf_counters: Fix test on Intel
  perf test shell record_bpf_filter: Skip 6.2 kernel
  libperf: Get rid of attr.id field
  perf tools: Convert to perf_record_header_attr_id()
  libperf: Add perf_record_header_attr_id()
  perf tools: Handle old data in PERF_RECORD_ATTR
  ...

19 months ago Merge tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 10 Sep 2023 02:56:23 +0000 (19:56 -0700)]
Merge tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - six smb3 client fixes including ones to allow controlling smb3
   directory caching timeout and limits, and one debugging improvement

 - one fix for nls Kconfig (don't need to expose NLS_UCS2_UTILS option)

 - one minor spnego registry update

* tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  spnego: add missing OID to oid registry
  smb3: fix minor typo in SMB2_GLOBAL_CAP_LARGE_MTU
  cifs: update internal module version number for cifs.ko
  smb3: allow controlling maximum number of cached directories
  smb3: add trace point for queryfs (statfs)
  nls: Hide new NLS_UCS2_UTILS
  smb3: allow controlling length of time directory entries are cached with dir leases
  smb: propagate error code of extract_sharename()

19 months ago iov_iter: Kunit tests for page extraction
David Howells [Fri, 8 Sep 2023 16:03:22 +0000 (17:03 +0100)]
iov_iter: Kunit tests for page extraction

Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and
ITER_XARRAY type iterators.  ITER_UBUF and ITER_IOVEC aren't dealt with
as they require userspace VM interaction.  ITER_DISCARD isn't dealt with
either as that can't be extracted.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

19 months ago iov_iter: Kunit tests for copying to/from an iterator
David Howells [Fri, 8 Sep 2023 16:03:21 +0000 (17:03 +0100)]
iov_iter: Kunit tests for copying to/from an iterator

Add some kunit tests for copying to/from ITER_BVEC, ITER_KVEC and
ITER_XARRAY type iterators.  ITER_UBUF and ITER_IOVEC aren't dealt with
as they require userspace VM interaction.  ITER_DISCARD isn't dealt with
either as that does nothing.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

19 months ago iov_iter: Fix iov_iter_extract_pages() with zero-sized entries
David Howells [Fri, 8 Sep 2023 16:03:20 +0000 (17:03 +0100)]
iov_iter: Fix iov_iter_extract_pages() with zero-sized entries

iov_iter_extract_pages() doesn't correctly handle skipping over initial
zero-length entries in ITER_KVEC and ITER_BVEC-type iterators.

The problem is that it accidentally reduces maxsize to 0 when
skipping, and thus runs to the end of the array and returns 0.

Fix this by sticking the calculated size-to-copy in a new variable
rather than back in maxsize.
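
The bug and the fix can be sketched outside the kernel. This is a simplified illustration, not the iov_iter code: struct seg, min_sz(), first_chunk_buggy() and first_chunk_fixed() are hypothetical names, and only the skip-over-empty-entries loop is modelled.

```c
#include <stddef.h>

/* Hypothetical segment: only the length matters for this sketch. */
struct seg { size_t len; };

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Buggy: clamps maxsize itself, so an empty entry turns it into 0 for good. */
size_t first_chunk_buggy(const struct seg *segs, size_t nr, size_t maxsize)
{
	for (size_t i = 0; i < nr; i++) {
		maxsize = min_sz(maxsize, segs[i].len);
		if (maxsize)
			return maxsize;
	}
	return 0;
}

/* Fixed: keep the per-entry size-to-copy in its own variable. */
size_t first_chunk_fixed(const struct seg *segs, size_t nr, size_t maxsize)
{
	for (size_t i = 0; i < nr; i++) {
		size_t size = min_sz(maxsize, segs[i].len);

		if (size)
			return size;
	}
	return 0;
}
```

With segments of length 0, 0 and 4096 and a maxsize of 8192, the buggy variant runs off the end and returns 0, while the fixed variant returns 4096.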

Fixes: 7d58fe731028 ("iov_iter: Add a function to extract a page list from an iterator")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

19 months ago Merge tag 'sh-for-v6.6-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubit...
Linus Torvalds [Sat, 9 Sep 2023 21:46:57 +0000 (14:46 -0700)]
Merge tag 'sh-for-v6.6-tag1' of git://git./linux/kernel/git/glaubitz/sh-linux

Pull sh updates from Adrian Glaubitz:

 - Fix a use-after-free bug in the push-switch driver (Duoming Zhou)

 - Fix calls to dma_declare_coherent_memory() that incorrectly passed
   the buffer end address instead of the buffer size as the size
   parameter
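
The end-address-versus-size mix-up in the second item can be illustrated with a small sketch. It does not call dma_declare_coherent_memory(); struct region, region_size() and claimed_last_byte() are hypothetical, and the end address is assumed to be inclusive, as in the kernel's struct resource.

```c
#include <stdint.h>

/* Hypothetical region descriptor; 'end' is the last valid byte (inclusive). */
struct region {
	uint64_t start;
	uint64_t end;
};

/* Size of an inclusive [start, end] range. */
uint64_t region_size(const struct region *r)
{
	return r->end - r->start + 1;
}

/*
 * Hypothetical stand-in for an API taking (base, size): returns the last
 * byte such a call would claim.
 */
uint64_t claimed_last_byte(uint64_t base, uint64_t size)
{
	return base + size - 1;
}
```

Passing the end address where a size is expected makes the claimed range extend far past the reserved buffer; passing region_size() makes it stop at the buffer's last byte.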

* tag 'sh-for-v6.6-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
  sh: push-switch: Reorder cleanup operations to avoid use-after-free bug
  sh: boards: Fix CEU buffer size passed to dma_declare_coherent_memory()