block: handle BLK_OPEN_RESTRICT_WRITES correctly
authorChristian Brauner <brauner@kernel.org>
Sat, 23 Mar 2024 16:11:19 +0000 (17:11 +0100)
committerChristian Brauner <brauner@kernel.org>
Wed, 27 Mar 2024 08:31:41 +0000 (09:31 +0100)
Last kernel release we introduce CONFIG_BLK_DEV_WRITE_MOUNTED. By
default this option is set. When it is set the long-standing behavior
of being able to write to mounted block devices is enabled.

But in order to guard against unintended corruption by writing to the
block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
off. In that case it isn't possible to write to mounted block devices
anymore.

A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES
which disallows concurrent BLK_OPEN_WRITE access. When we still had the
bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
the mode was passed around. Since we managed to get rid of the bdev
handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
on whether the file was opened writable and writes to that block device
are blocked. That logic doesn't work because we do allow
BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.

Fix the detection logic and use an FMODE_* bit. We could've also abused
O_EXCL as an indicator that BLK_OPEN_RESTRICT_WRITES has been requested.
For userspace open paths O_EXCL will never be retained but for internal
opens where we open files that are never installed into a file
descriptor table this is fine. But it would be a gamble that this
doesn't cause bugs. Note that BLK_OPEN_RESTRICT_WRITES is an internal
only flag that cannot directly be raised by userspace. It is implicitly
raised during mounting.

Passes xftests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
unset.

Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@casper.infradead.org
Link: https://lore.kernel.org/r/20240323-zielbereich-mittragen-6fdf14876c3e@brauner
Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
block/bdev.c
include/linux/fs.h

index 7a5f611c3d2e3e83eb00be12b9131f49d8348f5e..ab9cae0e05f1404163dc04a8780f0215ba63a780 100644 (file)
@@ -821,13 +821,11 @@ static void bdev_yield_write_access(struct file *bdev_file)
                return;
 
        bdev = file_bdev(bdev_file);
-       /* Yield exclusive or shared write access. */
-       if (bdev_file->f_mode & FMODE_WRITE) {
-               if (bdev_writes_blocked(bdev))
-                       bdev_unblock_writes(bdev);
-               else
-                       bdev->bd_writers--;
-       }
+
+       if (bdev_file->f_mode & FMODE_WRITE_RESTRICTED)
+               bdev_unblock_writes(bdev);
+       else if (bdev_file->f_mode & FMODE_WRITE)
+               bdev->bd_writers--;
 }
 
 /**
@@ -907,6 +905,8 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
        bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
        if (bdev_nowait(bdev))
                bdev_file->f_mode |= FMODE_NOWAIT;
+       if (mode & BLK_OPEN_RESTRICT_WRITES)
+               bdev_file->f_mode |= FMODE_WRITE_RESTRICTED;
        bdev_file->f_mapping = bdev->bd_inode->i_mapping;
        bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
        bdev_file->private_data = holder;
index 00fc429b0af0fb9bbab2382a9e347fdbac383981..8dfd53b52744a4dfffb8ccb350364972658f00eb 100644 (file)
@@ -121,6 +121,8 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 #define FMODE_PWRITE           ((__force fmode_t)0x10)
 /* File is opened for execution with sys_execve / sys_uselib */
 #define FMODE_EXEC             ((__force fmode_t)0x20)
+/* File writes are restricted (block device specific) */
+#define FMODE_WRITE_RESTRICTED  ((__force fmode_t)0x40)
 /* 32bit hashes as llseek() offset (for directories) */
 #define FMODE_32BITHASH         ((__force fmode_t)0x200)
 /* 64bit hashes as llseek() offset (for directories) */