From: Linus Torvalds
Date: Mon, 30 Mar 2020 19:45:23 +0000 (-0700)
Subject: Merge tag 'docs-5.7' of git://git.lwn.net/linux
X-Git-Url: http://git.maquefel.me/?a=commitdiff_plain;h=481ed297d900af0ce395f6ca8975903b76a5a59e;p=linux.git

Merge tag 'docs-5.7' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "This has been a busy cycle for documentation work. Highlights include:

  - Lots of RST conversion work by Mauro, Daniel Almeida, and others.
    Maybe someday we'll get to the end of this stuff...maybe...

  - Some organizational work to bring some order to the core-api manual.

  - Various new docs and additions to the existing documentation.

  - Typo fixes, warning fixes, ..."

* tag 'docs-5.7' of git://git.lwn.net/linux: (123 commits)
  Documentation: x86: exception-tables: document CONFIG_BUILDTIME_TABLE_SORT
  MAINTAINERS: adjust to filesystem doc ReST conversion
  docs: deprecated.rst: Add BUG()-family
  doc: zh_CN: add translation for virtiofs
  doc: zh_CN: index files in filesystems subdirectory
  docs: locking: Drop :c:func: throughout
  docs: locking: Add 'need' to hardirq section
  docs: conf.py: avoid thousands of duplicate label warning on Sphinx
  docs: prevent warnings due to autosectionlabel
  docs: fix reference to core-api/namespaces.rst
  docs: fix pointers to io-mapping.rst and io_ordering.rst files
  Documentation: Better document the softlockup_panic sysctl
  docs: hw-vuln: tsx_async_abort.rst: get rid of an unused ref
  docs: perf: imx-ddr.rst: get rid of a warning
  docs: filesystems: fuse.rst: supress a Sphinx warning
  docs: translations: it: avoid duplicate refs at programming-language.rst
  docs: driver.rst: supress two ReSt warnings
  docs: trace: events.rst: convert some new stuff to ReST format
  Documentation: Add io_ordering.rst to driver-api manual
  Documentation: Add io-mapping.rst to driver-api manual
  ...
---

481ed297d900af0ce395f6ca8975903b76a5a59e
diff --cc Documentation/filesystems/debugfs.rst
index 0000000000000,c89d2d335dfbd..80f332b8eb68a
mode 000000,100644..100644
--- a/Documentation/filesystems/debugfs.rst
+++ b/Documentation/filesystems/debugfs.rst
@@@ -1,0 -1,247 +1,247 @@@
+ .. SPDX-License-Identifier: GPL-2.0
+ .. include:: <isonum.txt>
+
+ =======
+ DebugFS
+ =======
+
+ Copyright |copy| 2009 Jonathan Corbet
+
+ Debugfs exists as a simple way for kernel developers to make information
+ available to user space. Unlike /proc, which is only meant for information
+ about a process, or sysfs, which has strict one-value-per-file rules,
+ debugfs has no rules at all. Developers can put any information they want
+ there. The debugfs filesystem is also intended to not serve as a stable
+ ABI to user space; in theory, there are no stability constraints placed on
+ files exported there. The real world is not always so simple, though [1]_;
+ even debugfs interfaces are best designed with the idea that they will need
+ to be maintained forever.
+
+ Debugfs is typically mounted with a command like::
+
+     mount -t debugfs none /sys/kernel/debug
+
+ (Or an equivalent /etc/fstab line).
+ The debugfs root directory is accessible only to the root user by
+ default. To change access to the tree, the "uid", "gid" and "mode" mount
+ options can be used.
+
+ Note that the debugfs API is exported GPL-only to modules.
+
+ Code using debugfs should include <linux/debugfs.h>.
Then, the first order + of business will be to create at least one directory to hold a set of + debugfs files:: + + struct dentry *debugfs_create_dir(const char *name, struct dentry *parent); + + This call, if successful, will make a directory called name underneath the + indicated parent directory. If parent is NULL, the directory will be + created in the debugfs root. On success, the return value is a struct + dentry pointer which can be used to create files in the directory (and to + clean it up at the end). An ERR_PTR(-ERROR) return value indicates that + something went wrong. If ERR_PTR(-ENODEV) is returned, that is an + indication that the kernel has been built without debugfs support and none + of the functions described below will work. + + The most general way to create a file within a debugfs directory is with:: + + struct dentry *debugfs_create_file(const char *name, umode_t mode, + struct dentry *parent, void *data, + const struct file_operations *fops); + + Here, name is the name of the file to create, mode describes the access + permissions the file should have, parent indicates the directory which + should hold the file, data will be stored in the i_private field of the + resulting inode structure, and fops is a set of file operations which + implement the file's behavior. At a minimum, the read() and/or write() + operations should be provided; others can be included as needed. Again, + the return value will be a dentry pointer to the created file, + ERR_PTR(-ERROR) on error, or ERR_PTR(-ENODEV) if debugfs support is + missing. + + Create a file with an initial size, the following function can be used + instead:: + + struct dentry *debugfs_create_file_size(const char *name, umode_t mode, + struct dentry *parent, void *data, + const struct file_operations *fops, + loff_t file_size); + + file_size is the initial file size. The other parameters are the same + as the function debugfs_create_file. + + In a number of cases, the creation of a set of file operations is not + actually necessary; the debugfs code provides a number of helper functions + for simple situations. Files containing a single integer value can be + created with any of:: + + void debugfs_create_u8(const char *name, umode_t mode, + struct dentry *parent, u8 *value); + void debugfs_create_u16(const char *name, umode_t mode, + struct dentry *parent, u16 *value); + struct dentry *debugfs_create_u32(const char *name, umode_t mode, + struct dentry *parent, u32 *value); + void debugfs_create_u64(const char *name, umode_t mode, + struct dentry *parent, u64 *value); + + These files support both reading and writing the given value; if a specific + file should not be written to, simply set the mode bits accordingly. The + values in these files are in decimal; if hexadecimal is more appropriate, + the following functions can be used instead:: + + void debugfs_create_x8(const char *name, umode_t mode, + struct dentry *parent, u8 *value); + void debugfs_create_x16(const char *name, umode_t mode, + struct dentry *parent, u16 *value); + void debugfs_create_x32(const char *name, umode_t mode, + struct dentry *parent, u32 *value); + void debugfs_create_x64(const char *name, umode_t mode, + struct dentry *parent, u64 *value); + + These functions are useful as long as the developer knows the size of the + value to be exported. Some types can have different widths on different + architectures, though, complicating the situation somewhat. 
There are
+ functions meant to help out in such special cases::
+
+     void debugfs_create_size_t(const char *name, umode_t mode,
+                                struct dentry *parent, size_t *value);
+
+ As might be expected, this function will create a debugfs file to represent
+ a variable of type size_t.
+
+ Similarly, there are helpers for variables of type unsigned long, in decimal
+ and hexadecimal::
+
+     struct dentry *debugfs_create_ulong(const char *name, umode_t mode,
+                                         struct dentry *parent,
+                                         unsigned long *value);
+     void debugfs_create_xul(const char *name, umode_t mode,
+                             struct dentry *parent, unsigned long *value);
+
+ Boolean values can be placed in debugfs with::
+
+     struct dentry *debugfs_create_bool(const char *name, umode_t mode,
+                                        struct dentry *parent, bool *value);
+
+ A read on the resulting file will yield either Y (for non-zero values) or
+ N, followed by a newline. If written to, it will accept either upper- or
+ lower-case values, or 1 or 0. Any other input will be silently ignored.
+
+ Also, atomic_t values can be placed in debugfs with::
+
+     void debugfs_create_atomic_t(const char *name, umode_t mode,
+                                  struct dentry *parent, atomic_t *value);
+
+ A read of this file will return the current atomic_t value; a write will
+ set it.
+
+ Another option is exporting a block of arbitrary binary data, with
+ this structure and function::
+
+     struct debugfs_blob_wrapper {
+         void *data;
+         unsigned long size;
+     };
+
+     struct dentry *debugfs_create_blob(const char *name, umode_t mode,
+                                        struct dentry *parent,
+                                        struct debugfs_blob_wrapper *blob);
+
+ A read of this file will return the data pointed to by the
+ debugfs_blob_wrapper structure. Some drivers use "blobs" as a simple way
+ to return several lines of (static) formatted text output. This function
+ can be used to export binary information, but there does not appear to be
+ any code which does so in the mainline. Note that all files created with
+ debugfs_create_blob() are read-only.
+
+ If you want to dump a block of registers (something that happens quite
+ often during development, even if little such code reaches mainline),
+ debugfs offers two functions: one to make a registers-only file, and
+ another to insert a register block in the middle of another sequential
+ file::
+
+     struct debugfs_reg32 {
+         char *name;
+         unsigned long offset;
+     };
+
+     struct debugfs_regset32 {
+         struct debugfs_reg32 *regs;
+         int nregs;
+         void __iomem *base;
+     };
+
- struct dentry *debugfs_create_regset32(const char *name, umode_t mode,
-                                        struct dentry *parent,
-                                        struct debugfs_regset32 *regset);
++ debugfs_create_regset32(const char *name, umode_t mode,
++                         struct dentry *parent,
++                         struct debugfs_regset32 *regset);
+
+     void debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs,
+                               int nregs, void __iomem *base, char *prefix);
+
+ The "base" argument may be 0, but you may want to build the reg32 array
+ using __stringify, since a number of register names (macros) are actually
+ byte offsets over a base for the register block.
+
+ If you want to dump a u32 array in debugfs, you can create a file with::
+
+     void debugfs_create_u32_array(const char *name, umode_t mode,
+                                   struct dentry *parent,
+                                   u32 *array, u32 elements);
+
+ The "array" argument provides data, and the "elements" argument is
+ the number of elements in the array. Note: once the array is created, its
+ size cannot be changed.
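+
+ Putting these calls together, a minimal sketch (the "myfoo" driver name and
+ the file and variable names below are invented purely for illustration)
+ might look like::
+
+     #include <linux/debugfs.h>
+     #include <linux/err.h>
+
+     static struct dentry *myfoo_dir;    /* kept around for later cleanup */
+     static u32 myfoo_counter;           /* exported in decimal */
+     static bool myfoo_enabled = true;   /* Y/N file */
+
+     static int myfoo_debugfs_init(void)
+     {
+             /* Create /sys/kernel/debug/myfoo (NULL parent = debugfs root) */
+             myfoo_dir = debugfs_create_dir("myfoo", NULL);
+             if (IS_ERR(myfoo_dir))
+                     return PTR_ERR(myfoo_dir); /* e.g. -ENODEV, no debugfs */
+
+             /* World-readable counter, root-writable enable flag */
+             debugfs_create_u32("counter", 0444, myfoo_dir, &myfoo_counter);
+             debugfs_create_bool("enabled", 0644, myfoo_dir, &myfoo_enabled);
+             return 0;
+     }
+
+     static void myfoo_debugfs_exit(void)
+     {
+             /* Recursive removal is described at the end of this document */
+             debugfs_remove_recursive(myfoo_dir);
+     }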
+ + There is a helper function to create device related seq_file:: + + struct dentry *debugfs_create_devm_seqfile(struct device *dev, + const char *name, + struct dentry *parent, + int (*read_fn)(struct seq_file *s, + void *data)); + + The "dev" argument is the device related to this debugfs file, and + the "read_fn" is a function pointer which to be called to print the + seq_file content. + + There are a couple of other directory-oriented helper functions:: + + struct dentry *debugfs_rename(struct dentry *old_dir, + struct dentry *old_dentry, + struct dentry *new_dir, + const char *new_name); + + struct dentry *debugfs_create_symlink(const char *name, + struct dentry *parent, + const char *target); + + A call to debugfs_rename() will give a new name to an existing debugfs + file, possibly in a different directory. The new_name must not exist prior + to the call; the return value is old_dentry with updated information. + Symbolic links can be created with debugfs_create_symlink(). + + There is one important thing that all debugfs users must take into account: + there is no automatic cleanup of any directories created in debugfs. If a + module is unloaded without explicitly removing debugfs entries, the result + will be a lot of stale pointers and no end of highly antisocial behavior. + So all debugfs users - at least those which can be built as modules - must + be prepared to remove all files and directories they create there. A file + can be removed with:: + + void debugfs_remove(struct dentry *dentry); + + The dentry value can be NULL or an error value, in which case nothing will + be removed. + + Once upon a time, debugfs users were required to remember the dentry + pointer for every debugfs file they created so that all files could be + cleaned up. We live in more civilized times now, though, and debugfs users + can call:: + + void debugfs_remove_recursive(struct dentry *dentry); + + If this function is passed a pointer for the dentry corresponding to the + top-level directory, the entire hierarchy below that directory will be + removed. + + .. [1] http://lwn.net/Articles/309298/ diff --cc Documentation/filesystems/zonefs.rst index 0000000000000,7e733e751e98e..71d845c6a700a mode 000000,100644..100644 --- a/Documentation/filesystems/zonefs.rst +++ b/Documentation/filesystems/zonefs.rst @@@ -1,0 -1,412 +1,420 @@@ + .. SPDX-License-Identifier: GPL-2.0 + + ================================================ + ZoneFS - Zone filesystem for Zoned block devices + ================================================ + + Introduction + ============ + + zonefs is a very simple file system exposing each zone of a zoned block device + as a file. Unlike a regular POSIX-compliant file system with native zoned block + device support (e.g. f2fs), zonefs does not hide the sequential write + constraint of zoned block devices to the user. Files representing sequential + write zones of the device must be written sequentially starting from the end + of the file (append only writes). + + As such, zonefs is in essence closer to a raw block device access interface + than to a full-featured POSIX file system. The goal of zonefs is to simplify + the implementation of zoned block device support in applications by replacing + raw block device file accesses with a richer file API, avoiding relying on + direct block device file ioctls which may be more obscure to developers. 
One + example of this approach is the implementation of LSM (log-structured merge) + tree structures (such as used in RocksDB and LevelDB) on zoned block devices + by allowing SSTables to be stored in a zone file similarly to a regular file + system rather than as a range of sectors of the entire disk. The introduction + of the higher level construct "one file is one zone" can help reducing the + amount of changes needed in the application as well as introducing support for + different application programming languages. + + Zoned block devices + ------------------- + + Zoned storage devices belong to a class of storage devices with an address + space that is divided into zones. A zone is a group of consecutive LBAs and all + zones are contiguous (there are no LBA gaps). Zones may have different types. + + * Conventional zones: there are no access constraints to LBAs belonging to + conventional zones. Any read or write access can be executed, similarly to a + regular block device. + * Sequential zones: these zones accept random reads but must be written + sequentially. Each sequential zone has a write pointer maintained by the + device that keeps track of the mandatory start LBA position of the next write + to the device. As a result of this write constraint, LBAs in a sequential zone + cannot be overwritten. Sequential zones must first be erased using a special + command (zone reset) before rewriting. + + Zoned storage devices can be implemented using various recording and media + technologies. The most common form of zoned storage today uses the SCSI Zoned + Block Commands (ZBC) and Zoned ATA Commands (ZAC) interfaces on Shingled + Magnetic Recording (SMR) HDDs. + + Solid State Disks (SSD) storage devices can also implement a zoned interface + to, for instance, reduce internal write amplification due to garbage collection. + The NVMe Zoned NameSpace (ZNS) is a technical proposal of the NVMe standard + committee aiming at adding a zoned storage interface to the NVMe protocol. + + Zonefs Overview + =============== + + Zonefs exposes the zones of a zoned block device as files. The files + representing zones are grouped by zone type, which are themselves represented + by sub-directories. This file structure is built entirely using zone information + provided by the device and so does not require any complex on-disk metadata + structure. + + On-disk metadata + ---------------- + + zonefs on-disk metadata is reduced to an immutable super block which + persistently stores a magic number and optional feature flags and values. On + mount, zonefs uses blkdev_report_zones() to obtain the device zone configuration + and populates the mount point with a static file tree solely based on this + information. File sizes come from the device zone type and write pointer + position managed by the device itself. + + The super block is always written on disk at sector 0. The first zone of the + device storing the super block is never exposed as a zone file by zonefs. If + the zone containing the super block is a sequential zone, the mkzonefs format + tool always "finishes" the zone, that is, it transitions the zone to a full + state to make it read-only, preventing any data write. + + Zone type sub-directories + ------------------------- + + Files representing zones of the same type are grouped together under the same + sub-directory automatically created on mount. + + For conventional zones, the sub-directory "cnv" is used. 
This directory is + however created if and only if the device has usable conventional zones. If + the device only has a single conventional zone at sector 0, the zone will not + be exposed as a file as it will be used to store the zonefs super block. For + such devices, the "cnv" sub-directory will not be created. + + For sequential write zones, the sub-directory "seq" is used. + + These two directories are the only directories that exist in zonefs. Users + cannot create other directories and cannot rename nor delete the "cnv" and + "seq" sub-directories. + + The size of the directories indicated by the st_size field of struct stat, + obtained with the stat() or fstat() system calls, indicates the number of files + existing under the directory. + + Zone files + ---------- + + Zone files are named using the number of the zone they represent within the set + of zones of a particular type. That is, both the "cnv" and "seq" directories + contain files named "0", "1", "2", ... The file numbers also represent + increasing zone start sector on the device. + + All read and write operations to zone files are not allowed beyond the file + maximum size, that is, beyond the zone size. Any access exceeding the zone + size is failed with the -EFBIG error. + + Creating, deleting, renaming or modifying any attribute of files and + sub-directories is not allowed. + + The number of blocks of a file as reported by stat() and fstat() indicates the + size of the file zone, or in other words, the maximum file size. + + Conventional zone files + ----------------------- + + The size of conventional zone files is fixed to the size of the zone they + represent. Conventional zone files cannot be truncated. + + These files can be randomly read and written using any type of I/O operation: + buffered I/Os, direct I/Os, memory mapped I/Os (mmap), etc. There are no I/O + constraint for these files beyond the file size limit mentioned above. + + Sequential zone files + --------------------- + + The size of sequential zone files grouped in the "seq" sub-directory represents + the file's zone write pointer position relative to the zone start sector. + + Sequential zone files can only be written sequentially, starting from the file + end, that is, write operations can only be append writes. Zonefs makes no + attempt at accepting random writes and will fail any write request that has a + start offset not corresponding to the end of the file, or to the end of the last -write issued and still in-flight (for asynchrnous I/O operations). ++write issued and still in-flight (for asynchronous I/O operations). + + Since dirty page writeback by the page cache does not guarantee a sequential + write pattern, zonefs prevents buffered writes and writeable shared mappings + on sequential files. Only direct I/O writes are accepted for these files. + zonefs relies on the sequential delivery of write I/O requests to the device + implemented by the block layer elevator. An elevator implementing the sequential + write feature for zoned block device (ELEVATOR_F_ZBD_SEQ_WRITE elevator feature) -must be used. This type of elevator (e.g. mq-deadline) is the set by default ++must be used. This type of elevator (e.g. mq-deadline) is set by default + for zoned block devices on device initialization. + + There are no restrictions on the type of I/O used for read operations in + sequential zone files. Buffered I/Os, direct I/Os and shared read mappings are + all accepted. 
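+
+ As a rough user-space sketch of these constraints (the /mnt/seq/0 path and
+ the 4096-byte block size are only examples; real code would match the
+ device logical block size), an append write with direct I/O could look
+ like::
+
+     #define _GNU_SOURCE             /* for O_DIRECT */
+     #include <fcntl.h>
+     #include <stdlib.h>
+     #include <string.h>
+     #include <sys/stat.h>
+     #include <unistd.h>
+
+     int main(void)
+     {
+             struct stat st;
+             void *buf;
+             int fd;
+
+             /* Direct I/O needs a suitably aligned buffer */
+             if (posix_memalign(&buf, 4096, 4096))
+                     return 1;
+             memset(buf, 0, 4096);
+
+             /* Only direct I/O writes are accepted on sequential files */
+             fd = open("/mnt/seq/0", O_WRONLY | O_DIRECT);
+             if (fd < 0 || fstat(fd, &st) < 0)
+                     return 1;
+
+             /* The write must start exactly at the current end of the file,
+              * which mirrors the zone write pointer; any other offset is
+              * failed by zonefs. */
+             if (pwrite(fd, buf, 4096, st.st_size) != 4096)
+                     return 1;
+
+             close(fd);
+             free(buf);
+             return 0;
+     }
+
+ The dd invocation shown in the examples at the end of this document
+ (oflag=direct) does the same thing from the command line.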
+ + Truncating sequential zone files is allowed only down to 0, in which case, the + zone is reset to rewind the file zone write pointer position to the start of + the zone, or up to the zone size, in which case the file's zone is transitioned + to the FULL state (finish zone operation). + + Format options + -------------- + + Several optional features of zonefs can be enabled at format time. + + * Conventional zone aggregation: ranges of contiguous conventional zones can be + aggregated into a single larger file instead of the default one file per zone. + * File ownership: The owner UID and GID of zone files is by default 0 (root) + but can be changed to any valid UID/GID. + * File access permissions: the default 640 access permissions can be changed. + + IO error handling + ----------------- + + Zoned block devices may fail I/O requests for reasons similar to regular block + devices, e.g. due to bad sectors. However, in addition to such known I/O + failure pattern, the standards governing zoned block devices behavior define + additional conditions that result in I/O errors. + + * A zone may transition to the read-only condition (BLK_ZONE_COND_READONLY): + While the data already written in the zone is still readable, the zone can + no longer be written. No user action on the zone (zone management command or + read/write access) can change the zone condition back to a normal read/write + state. While the reasons for the device to transition a zone to read-only + state are not defined by the standards, a typical cause for such transition + would be a defective write head on an HDD (all zones under this head are + changed to read-only). + + * A zone may transition to the offline condition (BLK_ZONE_COND_OFFLINE): + An offline zone cannot be read nor written. No user action can transition an + offline zone back to an operational good state. Similarly to zone read-only + transitions, the reasons for a drive to transition a zone to the offline + condition are undefined. A typical cause would be a defective read-write head + on an HDD causing all zones on the platter under the broken head to be + inaccessible. + + * Unaligned write errors: These errors result from the host issuing write + requests with a start sector that does not correspond to a zone write pointer + position when the write request is executed by the device. Even though zonefs + enforces sequential file write for sequential zones, unaligned write errors + may still happen in the case of a partial failure of a very large direct I/O + operation split into multiple BIOs/requests or asynchronous I/O operations. + If one of the write request within the set of sequential write requests - issued to the device fails, all write requests after queued after it will ++ issued to the device fails, all write requests queued after it will + become unaligned and fail. + + * Delayed write errors: similarly to regular block devices, if the device side + write cache is enabled, write errors may occur in ranges of previously + completed writes when the device write cache is flushed, e.g. on fsync(). + Similarly to the previous immediate unaligned write error case, delayed write + errors can propagate through a stream of cached sequential data for a zone + causing all data to be dropped after the sector that caused the error. + + All I/O errors detected by zonefs are notified to the user with an error code -return for the system call that trigered or detected the error. The recovery ++return for the system call that triggered or detected the error. 
The recovery + actions taken by zonefs in response to I/O errors depend on the I/O type (read + vs write) and on the reason for the error (bad sector, unaligned writes or zone + condition change). + + * For read I/O errors, zonefs does not execute any particular recovery action, + but only if the file zone is still in a good condition and there is no + inconsistency between the file inode size and its zone write pointer position. + If a problem is detected, I/O error recovery is executed (see below table). + + * For write I/O errors, zonefs I/O error recovery is always executed. + + * A zone condition change to read-only or offline also always triggers zonefs + I/O error recovery. + -Zonefs minimal I/O error recovery may change a file size and a file access ++Zonefs minimal I/O error recovery may change a file size and file access + permissions. + + * File size changes: + Immediate or delayed write errors in a sequential zone file may cause the file + inode size to be inconsistent with the amount of data successfully written in + the file zone. For instance, the partial failure of a multi-BIO large write + operation will cause the zone write pointer to advance partially, even though + the entire write operation will be reported as failed to the user. In such + case, the file inode size must be advanced to reflect the zone write pointer + change and eventually allow the user to restart writing at the end of the + file. + A file size may also be reduced to reflect a delayed write error detected on + fsync(): in this case, the amount of data effectively written in the zone may + be less than originally indicated by the file inode size. After such I/O - error, zonefs always fixes a file inode size to reflect the amount of data ++ error, zonefs always fixes the file inode size to reflect the amount of data + persistently stored in the file zone. + + * Access permission changes: + A zone condition change to read-only is indicated with a change in the file + access permissions to render the file read-only. This disables changes to the + file attributes and data modification. For offline zones, all permissions + (read and write) to the file are disabled. + + Further action taken by zonefs I/O error recovery can be controlled by the user + with the "errors=xxx" mount option. 
The table below summarizes the result of + zonefs I/O error processing depending on the mount option and on the zone + conditions:: + + +--------------+-----------+-----------------------------------------+ + | | | Post error state | + | "errors=xxx" | device | access permissions | + | mount | zone | file file device zone | + | option | condition | size read write read write | + +--------------+-----------+-----------------------------------------+ + | | good | fixed yes no yes yes | - | remount-ro | read-only | fixed yes no yes no | ++ | remount-ro | read-only | as is yes no yes no | + | (default) | offline | 0 no no no no | + +--------------+-----------+-----------------------------------------+ + | | good | fixed yes no yes yes | - | zone-ro | read-only | fixed yes no yes no | ++ | zone-ro | read-only | as is yes no yes no | + | | offline | 0 no no no no | + +--------------+-----------+-----------------------------------------+ + | | good | 0 no no yes yes | + | zone-offline | read-only | 0 no no yes no | + | | offline | 0 no no no no | + +--------------+-----------+-----------------------------------------+ + | | good | fixed yes yes yes yes | - | repair | read-only | fixed yes no yes no | ++ | repair | read-only | as is yes no yes no | + | | offline | 0 no no no no | + +--------------+-----------+-----------------------------------------+ + + Further notes: + + * The "errors=remount-ro" mount option is the default behavior of zonefs I/O + error processing if no errors mount option is specified. + * With the "errors=remount-ro" mount option, the change of the file access + permissions to read-only applies to all files. The file system is remounted + read-only. + * Access permission and file size changes due to the device transitioning zones - to the offline condition are permanent. Remounting or reformating the device ++ to the offline condition are permanent. Remounting or reformatting the device + with mkfs.zonefs (mkzonefs) will not change back offline zone files to a good + state. + * File access permission changes to read-only due to the device transitioning - zones to the read-only condition are permanent. Remounting or reformating ++ zones to the read-only condition are permanent. Remounting or reformatting + the device will not re-enable file write access. + * File access permission changes implied by the remount-ro, zone-ro and + zone-offline mount options are temporary for zones in a good condition. + Unmounting and remounting the file system will restore the previous default + (format time values) access rights to the files affected. + * The repair mount option triggers only the minimal set of I/O error recovery + actions, that is, file size fixes for zones in a good condition. Zones + indicated as being read-only or offline by the device still imply changes to + the zone file access permissions as noted in the table above. + + Mount options + ------------- + + zonefs define the "errors=" mount option to allow the user to specify + zonefs behavior in response to I/O errors, inode size inconsistencies or zone -condition chages. The defined behaviors are as follow: ++condition changes. The defined behaviors are as follow: + + * remount-ro (default) + * zone-ro + * zone-offline + * repair + -The I/O error actions defined for each behavior is detailed in the previous -section. ++The run-time I/O error actions defined for each behavior are detailed in the ++previous section. Mount time I/O errors will cause the mount operation to fail. 
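+
+ For illustration (the device name and mount point below are placeholders),
+ selecting one of these behaviors from a program rather than from the
+ command line is simply a matter of passing the option string to mount(2)::
+
+     #include <stdio.h>
+     #include <sys/mount.h>
+
+     int main(void)
+     {
+             /* Equivalent to: mount -t zonefs -o errors=zone-ro /dev/sdX /mnt */
+             if (mount("/dev/sdX", "/mnt", "zonefs", 0, "errors=zone-ro")) {
+                     perror("mount");
+                     return 1;
+             }
+             return 0;
+     }
+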
++The handling of read-only zones also differs between mount-time and run-time.
++If a read-only zone is found at mount time, the zone is always treated in the
++same manner as offline zones, that is, all accesses are disabled and the zone
++file size set to 0. This is necessary as the write pointer of read-only zones
++is defined as invalid by the ZBC and ZAC standards, making it impossible to
++discover the amount of data that has been written to the zone. In the case of a
++read-only zone discovered at run-time, as indicated in the previous section,
++the size of the zone file is left unchanged from its last updated value.
+
+ Zonefs User Space Tools
+ =======================
+
+ The mkzonefs tool is used to format zoned block devices for use with zonefs.
+ This tool is available on GitHub at:
+
+ https://github.com/damien-lemoal/zonefs-tools
+
+ zonefs-tools also includes a test suite which can be run against any zoned
+ block device, including a null_blk block device created with zoned mode.
+
+ Examples
+ --------
+
+ The following formats a 15TB host-managed SMR HDD with 256 MB zones
+ with the conventional zones aggregation feature enabled::
+
+     # mkzonefs -o aggr_cnv /dev/sdX
+     # mount -t zonefs /dev/sdX /mnt
+     # ls -l /mnt/
+     total 0
+     dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
+     dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
+
+ The size of the zone file sub-directories indicates the number of files
+ existing for each zone type. In this example, there is only one
+ conventional zone file (all conventional zones are aggregated under a single
+ file)::
+
+     # ls -l /mnt/cnv
+     total 137101312
+     -rw-r----- 1 root root 140391743488 Nov 25 13:23 0
+
+ This aggregated conventional zone file can be used as a regular file::
+
+     # mkfs.ext4 /mnt/cnv/0
+     # mount -o loop /mnt/cnv/0 /data
+
+ The "seq" sub-directory grouping files for sequential write zones has in this
+ example 55356 zones::
+
+ # ls -lv /mnt/seq
+ total 14511243264
+ -rw-r----- 1 root root 0 Nov 25 13:23 0
+ -rw-r----- 1 root root 0 Nov 25 13:23 1
+ -rw-r----- 1 root root 0 Nov 25 13:23 2
+ ...
+ -rw-r----- 1 root root 0 Nov 25 13:23 55354 + -rw-r----- 1 root root 0 Nov 25 13:23 55355 + + For sequential write zone files, the file size changes as data is appended at + the end of the file, similarly to any regular file system:: + + # dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct + 1+0 records in + 1+0 records out + 4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00044121 s, 9.3 MB/s + + # ls -l /mnt/seq/0 + -rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0 + + The written file can be truncated to the zone size, preventing any further + write operation:: + + # truncate -s 268435456 /mnt/seq/0 + # ls -l /mnt/seq/0 + -rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0 + + Truncation to 0 size allows freeing the file zone storage space and restart + append-writes to the file:: + + # truncate -s 0 /mnt/seq/0 + # ls -l /mnt/seq/0 + -rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0 + + Since files are statically mapped to zones on the disk, the number of blocks of + a file as reported by stat() and fstat() indicates the size of the file zone:: + + # stat /mnt/seq/0 + File: /mnt/seq/0 + Size: 0 Blocks: 524288 IO Block: 4096 regular empty file + Device: 870h/2160d Inode: 50431 Links: 1 + Access: (0640/-rw-r-----) Uid: ( 0/ root) Gid: ( 0/ root) + Access: 2019-11-25 13:23:57.048971997 +0900 + Modify: 2019-11-25 13:52:25.553805765 +0900 + Change: 2019-11-25 13:52:25.553805765 +0900 + Birth: - + + The number of blocks of the file ("Blocks") in units of 512B blocks gives the + maximum file size of 524288 * 512 B = 256 MB, corresponding to the device zone + size in this example. Of note is that the "IO block" field always indicates the + minimum I/O size for writes and corresponds to the device physical sector size. diff --cc MAINTAINERS index 8b6e2d85dd47d,38f58b85eb063..953478c9a2eb3 --- a/MAINTAINERS +++ b/MAINTAINERS @@@ -3907,10 -3906,10 +3907,10 @@@ W: http://ceph.com T: git git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git T: git git://github.com/ceph/ceph-client.git S: Supported - F: Documentation/filesystems/ceph.txt + F: Documentation/filesystems/ceph.rst F: fs/ceph/ -CERTIFICATE HANDLING: +CERTIFICATE HANDLING M: David Howells M: David Woodhouse L: keyrings@vger.kernel.org @@@ -5937,8 -5937,8 +5937,8 @@@ L: ecryptfs@vger.kernel.or W: http://ecryptfs.org W: https://launchpad.net/ecryptfs T: git git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs.git -S: Supported +S: Odd Fixes - F: Documentation/filesystems/ecryptfs.txt + F: Documentation/filesystems/ecryptfs.rst F: fs/ecryptfs/ EDAC-AMD64