Ming Lei [Tue, 12 Apr 2022 08:56:13 +0000 (16:56 +0800)]
dm: improve bio splitting and associated IO accounting
The current DM code (ab)uses late assignment of dm_io->orig_bio (after
__map_bio() returns and any bio splitting is complete) to indicate the
FS bio has been processed and can be accounted. This results in
awkward waiting until ->orig_bio is set in dm_submit_bio_remap().
Also the bio splitting was implemented using bio_split()+bio_chain()
-- a well-worn pattern but it requires bio cloning purely for the
benefit of more natural IO accounting. The bio_split() result was
stored in ->orig_bio to represent the mapped part of the original FS
bio.
DM has switched to the bdev based IO accounting interface. DM's IO
accounting can be implemented in terms of the original FS bio (now
stored early in ->orig_bio) via access to its sectors/bio_op. And
if/when splitting is needed, set a new DM_IO_WAS_SPLIT flag and use
new dm_io fields of .sector_offset & .sectors to allow IO accounting
for split bios _without_ needing to clone a new bio to store in
->orig_bio.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Ming Lei [Tue, 12 Apr 2022 08:56:12 +0000 (16:56 +0800)]
dm: switch to bdev based IO accounting interfaces
DM splits flush with data into empty flush followed by bio with data
payload, switch dm_io_acct() to use bdev_{start,end}_io_acct() to do
this accoiunting more naturally (rather than temporarily changing the
bio's bi_size).
This will allow DM to more easily account bios that are split (in
following commit).
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Ming Lei [Tue, 12 Apr 2022 08:56:11 +0000 (16:56 +0800)]
dm: pass dm_io instance to dm_io_acct directly
All the other 4 parameters are retrieved from the 'dm_io' instance, so
it's not necessary to pass all four to dm_io_acct().
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Ming Lei [Tue, 12 Apr 2022 08:56:10 +0000 (16:56 +0800)]
dm: don't pass bio to __dm_start_io_acct and dm_end_io_acct
dm->orig_bio is always passed to __dm_start_io_acct and dm_end_io_acct,
so it isn't necessary to take one bio parameter for the two helpers.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Fri, 15 Apr 2022 07:04:59 +0000 (03:04 -0400)]
dm: use bio_sectors in dm_aceept_partial_bio
Rename 'bi_size' to 'bio_sectors' given bi_size is being stored in
sectors. Also, use bio_sectors() rather than open-coding it.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Wed, 30 Mar 2022 17:52:10 +0000 (13:52 -0400)]
dm: simplify basic targets
Remove needless factoring and remap bi_sector regardless of
bio_sectors() being non-zero.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Sat, 26 Mar 2022 18:14:00 +0000 (14:14 -0400)]
dm: conditionally enable branching for less used features
Use jump_labels to further reduce cost of unlikely branches for zoned
block devices, dm-stats and swap_bios throttling.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Sun, 27 Mar 2022 01:08:36 +0000 (21:08 -0400)]
dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio
If a bio is marked REQ_NOWAIT optimize dm_submit_bio()'s dm_table RCU
usage to dm_{get,put}_live_table_fast.
DM core offers protection against blocking (via suspend) if REQ_NOWAIT.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Sat, 26 Mar 2022 18:38:07 +0000 (14:38 -0400)]
dm: move hot dm_io members to same cacheline as dm_target_io
Just saves some cacheline bouncing for members accessed during cloned
bio submission and completion.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Sat, 26 Mar 2022 17:46:06 +0000 (13:46 -0400)]
dm: add local variables to clone_endio and __map_bio
Avoid redundant dereferences in both functions.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Fri, 25 Mar 2022 21:20:45 +0000 (17:20 -0400)]
dm: mark various branches unlikely
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Fri, 25 Mar 2022 18:12:47 +0000 (14:12 -0400)]
dm: simplify dm_start_io_acct
Pull common DM_IO_ACCOUNTED check out to beginning of dm_start_io_acct.
Also, use dm_tio_is_normal (and move it to dm-core.h).
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Fri, 25 Mar 2022 17:53:23 +0000 (13:53 -0400)]
dm: simplify dm_io access in dm_split_and_process_bio
Use local variable instead of redudant access using ci.io
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Thu, 17 Mar 2022 17:52:06 +0000 (13:52 -0400)]
dm: factor out dm_io_set_error and __dm_io_dec_pending
Also eliminate need to use errno_to_blk_status().
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Thu, 24 Mar 2022 18:36:47 +0000 (14:36 -0400)]
dm: conditionally enable BIOSET_PERCPU_CACHE for dm_io bioset
A bioset's per-cpu alloc cache may have broader utility in the future
but for now constrain it to being tightly coupled to QUEUE_FLAG_POLL.
Also change dm_io_complete() to use bio_clear_polled() so that it
properly clears all associated bio state on requeue.
This commit improves DM's hipri bio polling (REQ_POLLED) perf by
7 - 20% depending on the system.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Christoph Hellwig [Wed, 4 May 2022 14:33:55 +0000 (07:33 -0700)]
block: improve the error message from bio_check_eod
Print the start sector and length separately instead of the combined
value to help with debugging.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20220504143355.568660-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 4 May 2022 14:29:50 +0000 (07:29 -0700)]
block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone
Device mapper wants to allocate a bio before knowing the device it
gets send to, so add explicit support for that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20220504142950.567582-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 4 May 2022 14:29:49 +0000 (07:29 -0700)]
block: remove superfluous calls to blkcg_bio_issue_init
blkcg_bio_issue_init is called in submit_bio. There is no need to have
extra calls that just get overriden in __bio_clone and the two places
that copy and pasted from it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20220504142950.567582-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:23 +0000 (06:27 +0200)]
kthread: unexport kthread_blkcg
kthread_blkcg is only used by the built-in blk-cgroup code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:22 +0000 (06:27 +0200)]
blk-cgroup: cleanup blkcg_maybe_throttle_current
Use blkcg_css instead of opencoding it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:21 +0000 (06:27 +0200)]
blk-cgroup: cleanup blk_cgroup_congested
Use blkcg_css instead of open coding it, and switch to a slightly
more natural for loop.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:20 +0000 (06:27 +0200)]
blk-cgroup: move blkcg_css to blk-cgroup.c
blkcg_css is only used in blk-cgroup.c, so move it there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:19 +0000 (06:27 +0200)]
blk-cgroup: remove unneeded includes from <linux/blk-cgroup.h>
Remove all the includes that aren't actually needed from
<linux/blk-cgroup.h> and push them to the actual source files where
needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:18 +0000 (06:27 +0200)]
blk-cgroup: remove pointless CONFIG_BLOCK ifdefs
No need to make BLK_CGROUP stubs conditional on CONFIG_BLOCK as they
can't be used without that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:17 +0000 (06:27 +0200)]
blk-cgroup: replace bio_blkcg with bio_blkcg_css
All callers of bio_blkcg actually want the CSS, so replace it with an
interface that does return the CSS. This now allows to move
struct blkcg_gq to block/blk-cgroup.h instead of exposing it in a
public header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:16 +0000 (06:27 +0200)]
blktrace: cleanup the __trace_note_message interface
Pass the cgroup_subsys_state instead of a the blkg so that blktrace
doesn't need to poke into blk-cgroup internals, and give the name a
blk prefix as the current name is way too generic for a public
interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:15 +0000 (06:27 +0200)]
blk-cgroup: move struct blkcg to block/blk-cgroup.h
There is no real need to expose the blkcg structure to the whole kernel.
Move it to the private header an expose a helper to let the writeback
code access the cgwb_list member.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:14 +0000 (06:27 +0200)]
blk-cgroup: move blkcg_{pin,unpin}_online out of line
Move these two functions out of line as there is no good reason
to inline them. Also switch to passing a cgroup_subsys_state
instead of doing the conversion in the caller to prepare for making
the blkcg structure private to blk-cgroup.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:13 +0000 (06:27 +0200)]
blk-cgroup: move blk_cgroup_congested out line
There is no urgent need to inline this function, so move it out of line.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:12 +0000 (06:27 +0200)]
blk-cgroup: move blkcg_{get,set}_fc_appid out of line
No need to have these helpers inline. Also remove the stubs and just use
an IS_ENABLED for the get side (the set side already is only built
conditionlly).
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:11 +0000 (06:27 +0200)]
nvme-fc: fold t fc_update_appid into fc_appid_store
No need for this wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:10 +0000 (06:27 +0200)]
nvme-fc: don't support the appid attribute without CONFIG_BLK_CGROUP_FC_APPID
nvme-fc appid support needs CONFIG_BLK_CGROUP_FC_APPID to work, so disable
the whole code if the option is not set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 20 Apr 2022 04:27:09 +0000 (06:27 +0200)]
blk-cgroup: remove __bio_blkcg
Remove the unused and deprecated __bio_blkcg helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Wed, 20 Apr 2022 14:31:10 +0000 (22:31 +0800)]
block: ignore RWF_HIPRI hint for sync dio
So far bio is marked as REQ_POLLED if RWF_HIPRI/IOCB_HIPRI is passed
from userspace sync io interface, then block layer tries to poll until
the bio is completed. But the current implementation calls
blk_io_schedule() if bio_poll() returns 0, and this way causes io hang or
timeout easily.
But looks no one reports this kind of issue, which should have been
triggered in normal io poll sanity test or blktests block/007 as
observed by Changhui, that means it is very likely that no one uses it
or no one cares it.
Also after io_uring is invented, io poll for sync dio becomes legacy
interface.
So ignore RWF_HIPRI hint for sync dio.
CC: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Reported-by: Changhui Zhong <czhong@redhat.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Changhui Zhong <czhong@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220420143110.2679002-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Michal Orzel [Sat, 23 Apr 2022 11:38:11 +0000 (13:38 +0200)]
block/partitions/ldm: Remove redundant assignments
Get rid of the following redundant assignments:
- to a variable r_cols from function ldm_parse_cmp3
- to variables r_id1 and r_id2 from functions ldm_parse_dgr3 and ldm_parse_dgr4
- to a variable r_index from function ldm_parse_prt3
that end up in values not being read until the end of function.
Reported by clang-tidy [deadcode.DeadStores]
Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com>
Link: https://lore.kernel.org/r/20220423113811.13335-5-michalorzel.eng@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Michal Orzel [Sat, 23 Apr 2022 11:38:10 +0000 (13:38 +0200)]
block/partitions/atari: Remove redundant assignment
Get rid of redundant assignment to a variable part_fmt from function
atari_partition. It is being assigned a value that is never read until
the end of function.
Reported by clang-tidy [deadcode.DeadStores]
Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com>
Link: https://lore.kernel.org/r/20220423113811.13335-4-michalorzel.eng@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Michal Orzel [Sat, 23 Apr 2022 11:38:09 +0000 (13:38 +0200)]
block/partitions/acorn: Remove redundant assignments
Get rid of redundant assignments to a variable slot from function
adfspart_check_ADFS. It is being assigned a value that is never read
as we are breaking out from the switch case and the function ends.
Reported by clang-tidy [deadcode.DeadStores]
Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com>
Link: https://lore.kernel.org/r/20220423113811.13335-3-michalorzel.eng@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Michal Orzel [Sat, 23 Apr 2022 11:38:08 +0000 (13:38 +0200)]
block/blk-map: Remove redundant assignment
Get rid of redundant assignment to a variable ret from function
bio_map_user_iov as it is being assigned a value that is never read.
It is being re-assigned in the first instruction after the while loop
Reported by clang-tidy [deadcode.DeadStores]
Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com>
Link: https://lore.kernel.org/r/20220423113811.13335-2-michalorzel.eng@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Michal Orzel [Sat, 23 Apr 2022 11:38:07 +0000 (13:38 +0200)]
block/badblocks: Remove redundant assignments
Get rid of redundant assignments to a variable sectors from functions
badblocks_check and badblocks_clear. This variable, that is a function
parameter, is being assigned a value that is never read until the end of
function.
Reported by clang-tidy [deadcode.DeadStores]
Signed-off-by: Michal Orzel <michalorzel.eng@gmail.com>
Link: https://lore.kernel.org/r/20220423113811.13335-1-michalorzel.eng@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Mon, 18 Apr 2022 02:27:13 +0000 (22:27 -0400)]
block: change exported IO accounting interface from gendisk to bdev
Export IO accounting interfaces in terms of block_device now that
gendisk has become more internal to block core.
Rename __part_{start,end}_io_acct's first argument from part to bdev.
Rename __part_{start,end}_io_acct to bdev_{start,end}_io_acct and
export them. Remove disk_{start,end}_io_acct and update caller (zram)
to use bdev_{start,end}_io_acct.
DM can now be updated to use bdev_{start,end}_io_acct.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20220418022733.56168-2-snitzer@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:58 +0000 (06:52 +0200)]
direct-io: remove random prefetches
Randomly poking into block device internals for manual prefetches isn't
exactly a very maintainable thing to do. And none of the performance
critical direct I/O implementations still use this library function
anyway, so just drop it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20220415045258.199825-28-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:57 +0000 (06:52 +0200)]
block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD
Secure erase is a very different operation from discard in that it is
a data integrity operation vs hint. Fully split the limits and helper
infrastructure to make the separation more clear.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nifs2]
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Acked-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:56 +0000 (06:52 +0200)]
block: add a bdev_discard_granularity helper
Abstract away implementation details from file systems by providing a
block_device based helper to retrieve the discard granularity.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Link: https://lore.kernel.org/r/20220415045258.199825-26-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:55 +0000 (06:52 +0200)]
block: remove QUEUE_FLAG_DISCARD
Just use a non-zero max_discard_sectors as an indicator for discard
support, similar to what is done for write zeroes.
The only places where needs special attention is the RAID5 driver,
which must clear discard support for security reasons by default,
even if the default stacking rules would allow for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:54 +0000 (06:52 +0200)]
block: add a bdev_max_discard_sectors helper
Add a helper to query the number of sectors support per each discard bio
based on the block device and use this helper to stop various places from
poking into the request_queue to see if discard is supported and if so how
much. This mirrors what is done e.g. for write zeroes as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-24-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:53 +0000 (06:52 +0200)]
block: refactor discard bio size limiting
Move all the logic to limit the discard bio size into a common helper
so that it is better documented.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-23-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:52 +0000 (06:52 +0200)]
block: move {bdev,queue_limit}_discard_alignment out of line
No need to inline these fairly larger helpers. Also fix the return value
to be unsigned, just like the field in struct queue_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220415045258.199825-22-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:51 +0000 (06:52 +0200)]
block: use bdev_discard_alignment in part_discard_alignment_show
Use the bdev based alignment helper instead of open coding it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220415045258.199825-21-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:50 +0000 (06:52 +0200)]
block: remove queue_discard_alignment
Just use bdev_alignment_offset in disk_discard_alignment_show instead.
That helpers is the same except for an always false branch that doesn't
matter in this slow path.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220415045258.199825-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:49 +0000 (06:52 +0200)]
block: move bdev_alignment_offset and queue_limit_alignment_offset out of line
No need to inline these fairly larger helpers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220415045258.199825-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:48 +0000 (06:52 +0200)]
block: use bdev_alignment_offset in disk_alignment_offset_show
This does the same as the open coded variant except for an extra branch,
and allows to remove queue_alignment_offset entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220415045258.199825-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:47 +0000 (06:52 +0200)]
block: use bdev_alignment_offset in part_alignment_offset_show
Replace the open coded offset calculation with the proper helper.
This is an ABI change in that the -1 for a misaligned partition is
properly propagated, which can be considered a bug fix and matches
what is done on the whole device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:46 +0000 (06:52 +0200)]
block: add a bdev_max_zone_append_sectors helper
Add a helper to check the max supported sectors for zone append based on
the block_device instead of having to poke into the block layer internal
request_queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:45 +0000 (06:52 +0200)]
block: add a bdev_stable_writes helper
Add a helper to check the stable writes flag based on the block_device
instead of having to poke into the block layer internal request_queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:44 +0000 (06:52 +0200)]
block: add a bdev_fua helper
Add a helper to check the FUA flag based on the block_device instead of
having to poke into the block layer internal request_queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:43 +0000 (06:52 +0200)]
block: add a bdev_write_cache helper
Add a helper to check the write cache flag based on the block_device
instead of having to poke into the block layer internal request_queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:42 +0000 (06:52 +0200)]
block: add a bdev_nonrot helper
Add a helper to check the nonrot flag based on the block_device instead
of having to poke into the block layer internal request_queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:41 +0000 (06:52 +0200)]
mm: use bdev_is_zoned in claim_swapfile
Use the bdev based helper instead of poking into the queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:40 +0000 (06:52 +0200)]
ntfs3: use bdev_logical_block_size instead of open coding it
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220415045258.199825-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:39 +0000 (06:52 +0200)]
btrfs: use bdev_max_active_zones instead of open coding it
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Acked-by: David Sterba <dsterba@suse.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:38 +0000 (06:52 +0200)]
drbd: cleanup decide_on_discard_support
Sanitize the calling conventions and use a goto label to cleanup the
code flow.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220415045258.199825-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:37 +0000 (06:52 +0200)]
drbd: use bdev_alignment_offset instead of queue_alignment_offset
The bdev version does the right thing for partitions, so use that.
Fixes: 9104d31a759f ("drbd: introduce WRITE_SAME support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220415045258.199825-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:36 +0000 (06:52 +0200)]
drbd: use bdev based limit helpers in drbd_send_sizes
Use the bdev based limits helpers where they exist.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220415045258.199825-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:35 +0000 (06:52 +0200)]
drbd: remove assign_p_sizes_qlim
Fold each branch into its only caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220415045258.199825-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:34 +0000 (06:52 +0200)]
target: fix discard alignment on partitions
Use the proper bdev_discard_alignment helper that accounts for partition
offsets.
Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:33 +0000 (06:52 +0200)]
target: pass a block_device to target_configure_unmap_from_queue
The SCSI target drivers is a consumer of the block layer and shoul
d generally work on struct block_device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Fri, 15 Apr 2022 04:52:32 +0000 (06:52 +0200)]
target: remove an incorrect unmap zeroes data deduction
For block devices, the SCSI target drivers implements UNMAP as calls to
blkdev_issue_discard, which does not guarantee zeroing just because
Write Zeroes is supported.
Note that this does not affect the file backed path which uses
fallocate to punch holes.
Fixes: 2237498f0b5c ("target/iblock: Convert WRITE_SAME to blkdev_issue_zeroout")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Fri, 1 Apr 2022 10:27:50 +0000 (12:27 +0200)]
bfq: Make sure bfqg for which we are queueing requests is online
Bios queued into BFQ IO scheduler can be associated with a cgroup that
was already offlined. This may then cause insertion of this bfq_group
into a service tree. But this bfq_group will get freed as soon as last
bio associated with it is completed leading to use after free issues for
service tree users. Fix the problem by making sure we always operate on
online bfq_group. If the bfq_group associated with the bio is not
online, we pick the first online parent.
CC: stable@vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-9-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Fri, 1 Apr 2022 10:27:49 +0000 (12:27 +0200)]
bfq: Get rid of __bio_blkcg() usage
BFQ usage of __bio_blkcg() is a relict from the past. Furthermore if bio
would not be associated with any blkcg, the usage of __bio_blkcg() in
BFQ is prone to races with the task being migrated between cgroups as
__bio_blkcg() calls at different places could return different blkcgs.
Convert BFQ to the new situation where bio->bi_blkg is initialized in
bio_set_dev() and thus practically always valid. This allows us to save
blkcg_gq lookup and noticeably simplify the code.
CC: stable@vger.kernel.org
Fixes: 0fe061b9f03c ("blkcg: fix ref count issue with bio_blkcg() using task_css")
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-8-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Fri, 1 Apr 2022 10:27:48 +0000 (12:27 +0200)]
bfq: Track whether bfq_group is still online
Track whether bfq_group is still online. We cannot rely on
blkcg_gq->online because that gets cleared only after all policies are
offlined and we need something that gets updated already under
bfqd->lock when we are cleaning up our bfq_group to be able to guarantee
that when we see online bfq_group, it will stay online while we are
holding bfqd->lock lock.
CC: stable@vger.kernel.org
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-7-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Fri, 1 Apr 2022 10:27:47 +0000 (12:27 +0200)]
bfq: Remove pointless bfq_init_rq() calls
We call bfq_init_rq() from request merging functions where requests we
get should have already gone through bfq_init_rq() during insert and
anyway we want to do anything only if the request is already tracked by
BFQ. So replace calls to bfq_init_rq() with RQ_BFQQ() instead to simply
skip requests untracked by BFQ. We move bfq_init_rq() call in
bfq_insert_request() a bit earlier to cover request merging and thus
can transfer FIFO position in case of a merge.
CC: stable@vger.kernel.org
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-6-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Fri, 1 Apr 2022 10:27:46 +0000 (12:27 +0200)]
bfq: Drop pointless unlock-lock pair
In bfq_insert_request() we unlock bfqd->lock only to call
trace_block_rq_insert() and then lock bfqd->lock again. This is really
pointless since tracing is disabled if we really care about performance
and even if the tracepoint is enabled, it is a quick call.
CC: stable@vger.kernel.org
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-5-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Fri, 1 Apr 2022 10:27:45 +0000 (12:27 +0200)]
bfq: Update cgroup information before merging bio
When the process is migrated to a different cgroup (or in case of
writeback just starts submitting bios associated with a different
cgroup) bfq_merge_bio() can operate with stale cgroup information in
bic. Thus the bio can be merged to a request from a different cgroup or
it can result in merging of bfqqs for different cgroups or bfqqs of
already dead cgroups and causing possible use-after-free issues. Fix the
problem by updating cgroup information in bfq_merge_bio().
CC: stable@vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-4-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Fri, 1 Apr 2022 10:27:44 +0000 (12:27 +0200)]
bfq: Split shared queues on move between cgroups
When bfqq is shared by multiple processes it can happen that one of the
processes gets moved to a different cgroup (or just starts submitting IO
for different cgroup). In case that happens we need to split the merged
bfqq as otherwise we will have IO for multiple cgroups in one bfqq and
we will just account IO time to wrong entities etc.
Similarly if the bfqq is scheduled to merge with another bfqq but the
merge didn't happen yet, cancel the merge as it need not be valid
anymore.
CC: stable@vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-3-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Fri, 1 Apr 2022 10:27:43 +0000 (12:27 +0200)]
bfq: Avoid merging queues with different parents
It can happen that the parent of a bfqq changes between the moment we
decide two queues are worth to merge (and set bic->stable_merge_bfqq)
and the moment bfq_setup_merge() is called. This can happen e.g. because
the process submitted IO for a different cgroup and thus bfqq got
reparented. It can even happen that the bfqq we are merging with has
parent cgroup that is already offline and going to be destroyed in which
case the merge can lead to use-after-free issues such as:
BUG: KASAN: use-after-free in __bfq_deactivate_entity+0x9cb/0xa50
Read of size 8 at addr
ffff88800693c0c0 by task runc:[2:INIT]/10544
CPU: 0 PID: 10544 Comm: runc:[2:INIT] Tainted: G E 5.15.2-0.g5fb85fd-default #1 openSUSE Tumbleweed (unreleased)
f1f3b891c72369aebecd2e43e4641a6358867c70
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0x46/0x5a
print_address_description.constprop.0+0x1f/0x140
? __bfq_deactivate_entity+0x9cb/0xa50
kasan_report.cold+0x7f/0x11b
? __bfq_deactivate_entity+0x9cb/0xa50
__bfq_deactivate_entity+0x9cb/0xa50
? update_curr+0x32f/0x5d0
bfq_deactivate_entity+0xa0/0x1d0
bfq_del_bfqq_busy+0x28a/0x420
? resched_curr+0x116/0x1d0
? bfq_requeue_bfqq+0x70/0x70
? check_preempt_wakeup+0x52b/0xbc0
__bfq_bfqq_expire+0x1a2/0x270
bfq_bfqq_expire+0xd16/0x2160
? try_to_wake_up+0x4ee/0x1260
? bfq_end_wr_async_queues+0xe0/0xe0
? _raw_write_unlock_bh+0x60/0x60
? _raw_spin_lock_irq+0x81/0xe0
bfq_idle_slice_timer+0x109/0x280
? bfq_dispatch_request+0x4870/0x4870
__hrtimer_run_queues+0x37d/0x700
? enqueue_hrtimer+0x1b0/0x1b0
? kvm_clock_get_cycles+0xd/0x10
? ktime_get_update_offsets_now+0x6f/0x280
hrtimer_interrupt+0x2c8/0x740
Fix the problem by checking that the parent of the two bfqqs we are
merging in bfq_setup_merge() is the same.
Link: https://lore.kernel.org/linux-block/20211125172809.GC19572@quack2.suse.cz/
CC: stable@vger.kernel.org
Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-2-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jan Kara [Fri, 1 Apr 2022 10:27:42 +0000 (12:27 +0200)]
bfq: Avoid false marking of bic as stably merged
bfq_setup_cooperator() can mark bic as stably merged even though it
decides to not merge its bfqqs (when bfq_setup_merge() returns NULL).
Make sure to mark bic as stably merged only if we are really going to
merge bfqqs.
CC: stable@vger.kernel.org
Tested-by: "yukuai (C)" <yukuai3@huawei.com>
Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 6 Apr 2022 06:12:28 +0000 (08:12 +0200)]
pktcdvd: stop using bio_reset
Just initialize the bios on-demand.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220406061228.410163-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 6 Apr 2022 06:12:27 +0000 (08:12 +0200)]
block: turn bio_kmalloc into a simple kmalloc wrapper
Remove the magic autofree semantics and require the callers to explicitly
call bio_init to initialize the bio.
This allows bio_free to catch accidental bio_put calls on bio_init()ed
bios as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Coly Li <colyli@suse.de>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20220406061228.410163-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 6 Apr 2022 06:12:26 +0000 (08:12 +0200)]
target/pscsi: remove pscsi_get_bio
Remove pscsi_get_bio and simplify the code flow in the only caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220406061228.410163-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 6 Apr 2022 06:12:25 +0000 (08:12 +0200)]
squashfs: always use bio_kmalloc in squashfs_bio_read
If a plain kmalloc that is not backed by a mempool is safe here for a
large read (and the actual page allocations), it must also be for a
small one, so simplify the code a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220406061228.410163-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Christoph Hellwig [Wed, 6 Apr 2022 06:12:24 +0000 (08:12 +0200)]
btrfs: simplify ->flush_bio handling
Use and embedded bios that is initialized when used instead of
bio_kmalloc plus bio_reset.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220406061228.410163-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mike Snitzer [Thu, 24 Mar 2022 20:35:25 +0000 (16:35 -0400)]
block: allow use of per-cpu bio alloc cache by block drivers
Refine per-cpu bio alloc cache interfaces so that DM and other block
drivers can properly create and use the cache:
DM uses bioset_init_from_src() to do its final bioset initialization,
so must update bioset_init_from_src() to set BIOSET_PERCPU_CACHE if
%src bioset has a cache.
Also move bio_clear_polled() to include/linux/bio.h to allow users
outside of block core.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220324203526.62306-3-snitzer@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mike Snitzer [Thu, 24 Mar 2022 20:35:24 +0000 (16:35 -0400)]
block: allow using the per-cpu bio cache from bio_alloc_bioset
Replace the BIO_PERCPU_CACHE bio-internal flag with a REQ_ALLOC_CACHE
one that can be passed to bio_alloc / bio_alloc_bioset, and implement
the percpu cache allocation logic in a helper called from
bio_alloc_bioset. This allows any bio_alloc_bioset user to use the
percpu caches instead of having the functionality tied to struct kiocb.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
[hch: refactored a bit]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220324203526.62306-2-snitzer@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sun, 17 Apr 2022 20:57:31 +0000 (13:57 -0700)]
Linux 5.18-rc3
Linus Torvalds [Sun, 17 Apr 2022 17:29:10 +0000 (10:29 -0700)]
Merge tag 'for-linus-5.18-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixlet from Juergen Gross:
"A single cleanup patch for the Xen balloon driver"
* tag 'for-linus-5.18-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: don't use PV mode extra memory for zone device allocations
Linus Torvalds [Sun, 17 Apr 2022 16:55:59 +0000 (09:55 -0700)]
Merge tag 'x86-urgent-2022-04-17' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two x86 fixes related to TSX:
- Use either MSR_TSX_FORCE_ABORT or MSR_IA32_TSX_CTRL to disable TSX
to cover all CPUs which allow to disable it.
- Disable TSX development mode at boot so that a microcode update
which provides TSX development mode does not suddenly make the
system vulnerable to TSX Asynchronous Abort"
* tag 'x86-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsx: Disable TSX development mode at boot
x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
Linus Torvalds [Sun, 17 Apr 2022 16:53:01 +0000 (09:53 -0700)]
Merge tag 'timers-urgent-2022-04-17' of git://git./linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A small set of fixes for the timers core:
- Fix the warning condition in __run_timers() which does not take
into account that a CPU base (especially the deferrable base) never
has a timer armed on it and therefore the next_expiry value can
become stale.
- Replace a WARN_ON() in the NOHZ code with a WARN_ON_ONCE() to
prevent endless spam in dmesg.
- Remove the double star from a comment which is not meant to be in
kernel-doc format"
* tag 'timers-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/sched: Fix non-kernel-doc comment
tick/nohz: Use WARN_ON_ONCE() to prevent console saturation
timers: Fix warning condition in __run_timers()
Linus Torvalds [Sun, 17 Apr 2022 16:46:15 +0000 (09:46 -0700)]
Merge tag 'smp-urgent-2022-04-17' of git://git./linux/kernel/git/tip/tip
Pull SMP fixes from Thomas Gleixner:
"Two fixes for the SMP core:
- Make the warning condition in flush_smp_call_function_queue()
correct, which checked a just emptied list head for being empty
instead of validating that there was no pending entry on the
offlined CPU at all.
- The @cpu member of struct cpuhp_cpu_state is initialized when the
CPU hotplug thread for the upcoming CPU is created. That's too late
because the creation of the thread can fail and then the following
rollback operates on CPU0. Get rid of the CPU member and hand the
CPU number to the involved functions directly"
* tag 'smp-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Remove the 'cpu' member of cpuhp_cpu_state
smp: Fix offline cpu check in flush_smp_call_function_queue()
Linus Torvalds [Sun, 17 Apr 2022 16:42:03 +0000 (09:42 -0700)]
Merge tag 'irq-urgent-2022-04-17' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
"A single fix for the interrupt affinity spreading logic to take into
account that there can be an imbalance between present and possible
CPUs, which causes already assigned bits to be overwritten"
* tag 'irq-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/affinity: Consider that CPUs on nodes can be unbalanced
Linus Torvalds [Sun, 17 Apr 2022 16:36:27 +0000 (09:36 -0700)]
Merge tag 'for-v5.18-rc' of git://git./linux/kernel/git/sre/linux-power-supply
Pull power supply fixes from Sebastian Reichel:
- Fix a regression with battery data failing to load from DT
* tag 'for-v5.18-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
power: supply: Reset err after not finding static battery
power: supply: samsung-sdi-battery: Add missing charge restart voltages
Linus Torvalds [Sun, 17 Apr 2022 16:31:27 +0000 (09:31 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Regular set of fixes for drivers and the dev-interface"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: ismt: Fix undefined behavior due to shift overflowing the constant
i2c: dev: Force case user pointers in compat_i2cdev_ioctl()
i2c: dev: check return value when calling dev_set_name()
i2c: qcom-geni: Use dev_err_probe() for GPI DMA error
i2c: imx: Implement errata ERR007805 or e7805 bus frequency limit
i2c: pasemi: Wait for write xfers to finish
Linus Torvalds [Sun, 17 Apr 2022 00:07:50 +0000 (17:07 -0700)]
Merge tag 'devicetree-fixes-for-5.18-2' of git://git./linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Fix scalar property schemas with array constraints
- Fix 'enum' lists with duplicate entries
- Fix incomplete if/then/else schemas
- Add Renesas RZ/V2L SoC support to Mali Bifrost binding
- Maintainers update for Marvell irqchip
* tag 'devicetree-fixes-for-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: display: panel-timing: Define a single type for properties
dt-bindings: Fix array constraints on scalar properties
dt-bindings: gpu: mali-bifrost: Document RZ/V2L SoC
dt-bindings: net: snps: remove duplicate name
dt-bindings: Fix 'enum' lists with duplicate entries
dt-bindings: irqchip: mrvl,intc: refresh maintainers
dt-bindings: Fix incomplete if/then/else schemas
dt-bindings: power: renesas,apmu: Fix cpus property limits
dt-bindings: extcon: maxim,max77843: fix ports type
Linus Torvalds [Sun, 17 Apr 2022 00:01:43 +0000 (17:01 -0700)]
Merge tag 'gpio-fixes-for-v5.18-rc3' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"A single fix for gpio-sim and two patches for GPIO ACPI pulled from
Andy:
- fix the set/get_multiple() callbacks in gpio-sim
- use correct format characters in gpiolib-acpi
- use an unsigned type for pins in gpiolib-acpi"
* tag 'gpio-fixes-for-v5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sim: fix setting and getting multiple lines
gpiolib: acpi: Convert type for pin to be unsigned
gpiolib: acpi: use correct format characters
Linus Torvalds [Sat, 16 Apr 2022 23:51:39 +0000 (16:51 -0700)]
Merge tag 'soc-fixes-5.18-2' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"There are a number of SoC bugfixes that came in since the merge
window, and more of them are already pending.
This batch includes:
- A boot time regression fix for davinci that triggered on
multi_v5_defconfig when booting any platform
- Defconfig updates to address removed features, changed symbol names
or dependencies, for gemini, ux500, and pxa
- Email address changes for Krzysztof Kozlowski
- Build warning fixes for ep93xx and iop32x
- Devicetree warning fixes across many platforms
- Minor bugfixes for the reset controller, memory controller and SCMI
firmware subsystems plus the versatile-express board"
* tag 'soc-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (34 commits)
ARM: config: Update Gemini defconfig
arm64: dts: qcom/sdm845-shift-axolotl: Fix boolean properties with values
ARM: dts: align SPI NOR node name with dtschema
ARM: dts: Fix more boolean properties with values
arm/arm64: dts: qcom: Fix boolean properties with values
arm64: dts: imx: Fix imx8*-var-som touchscreen property sizes
arm: dts: imx: Fix boolean properties with values
arm64: dts: tegra: Fix boolean properties with values
arm: dts: at91: Fix boolean properties with values
arm: configs: imote2: Drop defconfig as board support dropped.
ep93xx: clock: Don't use plain integer as NULL pointer
ep93xx: clock: Fix UAF in ep93xx_clk_register_gate()
ARM: vexpress/spc: Fix all the kernel-doc build warnings
ARM: vexpress/spc: Fix kernel-doc build warning for ve_spc_cpu_in_wfi
ARM: config: u8500: Re-enable AB8500 battery charging
ARM: config: u8500: Add some common hardware
memory: fsl_ifc: populate child nodes of buses and mfd devices
ARM: config: Refresh U8500 defconfig
firmware: arm_scmi: Fix sparse warnings in OPTEE transport driver
firmware: arm_scmi: Replace zero-length array with flexible-array member
...
Linus Torvalds [Sat, 16 Apr 2022 23:42:53 +0000 (16:42 -0700)]
Merge tag 'random-5.18-rc3-for-linus' of git://git./linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
- Per your suggestion, random reads now won't fail if there's a page
fault after some non-zero amount of data has been read, which makes
the behavior consistent with all other reads in the kernel.
- Rather than an inconsistent mix of random_get_entropy() returning an
unsigned long or a cycles_t, now it just returns an unsigned long.
- A memcpy() was replaced with an memmove(), because the addresses are
sometimes overlapping. In practice the destination is always before
the source, so not really an issue, but better to be correct than
not.
* tag 'random-5.18-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: use memmove instead of memcpy for remaining 32 bytes
random: make random_get_entropy() return an unsigned long
random: allow partial reads if later user copies fail
Linus Torvalds [Sat, 16 Apr 2022 20:38:26 +0000 (13:38 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"13 fixes, all in drivers.
The most extensive changes are in the iscsi series (affecting drivers
qedi, cxgbi and bnx2i), the next most is scsi_debug, but that's just a
simple revert and then minor updates to pm80xx"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: iscsi: MAINTAINERS: Add Mike Christie as co-maintainer
scsi: qedi: Fix failed disconnect handling
scsi: iscsi: Fix NOP handling during conn recovery
scsi: iscsi: Merge suspend fields
scsi: iscsi: Fix unbound endpoint error handling
scsi: iscsi: Fix conn cleanup and stop race during iscsid restart
scsi: iscsi: Fix endpoint reuse regression
scsi: iscsi: Release endpoint ID when its freed
scsi: iscsi: Fix offload conn cleanup when iscsid restarts
scsi: iscsi: Move iscsi_ep_disconnect()
scsi: pm80xx: Enable upper inbound, outbound queues
scsi: pm80xx: Mask and unmask upper interrupt vectors 32-63
Revert "scsi: scsi_debug: Address races following module load"
Bartosz Golaszewski [Sat, 16 Apr 2022 19:57:00 +0000 (21:57 +0200)]
Merge tag 'intel-gpio-v5.18-2' of gitolite.pub/scm/linux/kernel/git/andy/linux-gpio-intel into gpio/for-current
intel-gpio for v5.18-2
* Couple of fixes related to handling unsigned value of the pin from ACPI
gpiolib:
- acpi: Convert type for pin to be unsigned
- acpi: use correct format characters
Linus Torvalds [Sat, 16 Apr 2022 18:20:21 +0000 (11:20 -0700)]
Merge tag 'dma-mapping-5.18-2' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
- avoid a double memory copy for swiotlb (Chao Gao)
* tag 'dma-mapping-5.18-2' of git://git.infradead.org/users/hch/dma-mapping:
dma-direct: avoid redundant memory sync for swiotlb
Jason A. Donenfeld [Wed, 13 Apr 2022 23:50:38 +0000 (01:50 +0200)]
random: use memmove instead of memcpy for remaining 32 bytes
In order to immediately overwrite the old key on the stack, before
servicing a userspace request for bytes, we use the remaining 32 bytes
of block 0 as the key. This means moving indices 8,9,a,b,c,d,e,f ->
4,5,6,7,8,9,a,b. Since 4 < 8, for the kernel implementations of
memcpy(), this doesn't actually appear to be a problem in practice. But
relying on that characteristic seems a bit brittle. So let's change that
to a proper memmove(), which is the by-the-books way of handling
overlapping memory copies.
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Linus Torvalds [Fri, 15 Apr 2022 22:57:18 +0000 (15:57 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"14 patches.
Subsystems affected by this patch series: MAINTAINERS, binfmt, and
mm (tmpfs, secretmem, kasan, kfence, pagealloc, zram, compaction,
hugetlb, vmalloc, and kmemleak)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: kmemleak: take a full lowmem check in kmemleak_*_phys()
mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE"
revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders"
hugetlb: do not demote poisoned hugetlb pages
mm: compaction: fix compiler warning when CONFIG_COMPACTION=n
mm: fix unexpected zeroed page mapping with zram swap
mm, page_alloc: fix build_zonerefs_node()
mm, kfence: support kmem_dump_obj() for KFENCE objects
kasan: fix hw tags enablement when KUNIT tests are disabled
irq_work: use kasan_record_aux_stack_noalloc() record callstack
mm/secretmem: fix panic when growing a memfd_secret
tmpfs: fix regressions from wider use of ZERO_PAGE
MAINTAINERS: Broadcom internal lists aren't maintainers