btrfs-progs: docs: enhance manual page for mkfs

Signed-off-by: David Sterba <dsterba@suse.com>
author: David Sterba <dsterba@suse.com> 2015-10-30 19:16:41 +0100
committer: David Sterba <dsterba@suse.com> 2015-11-02 15:10:15 +0100
commit: a2d66e0962016b8d096f3aecac4c83d911ebd3a9 (patch)
tree: 3662a2ca9ba7dbab2733a21017073576aed30a28 /Documentation/mkfs.btrfs.asciidoc
parent: ce059fc9d770379e10f88bb062daf2affbdb855e (diff)
1 files changed, 229 insertions, 95 deletions
diff --git a/Documentation/mkfs.btrfs.asciidoc b/Documentation/mkfs.btrfs.asciidoc
index 5789762a..12d88400 100644
--- a/Documentation/mkfs.btrfs.asciidoc
+++ b/Documentation/mkfs.btrfs.asciidoc
@@ -11,135 +11,267 @@ SYNOPSIS
 $$[-A|--alloc-start <alloc-start>]$$
 $$[-b|--byte-count <byte-count>]$$
 $$[-d|--data <data-profile>]$$
-$$[-f|--force]$$
-$$[-n|--nodesize <nodesize>]$$
-$$[-l|--leafsize <leafsize>]$$
-$$[-L|--label <label>]$$
 $$[-m|--metadata <metadata profile>]$$
 $$[-M|--mixed]$$
-$$[-q|--quiet]$$
+$$[-l|--leafsize <leafsize>]$$
+$$[-n|--nodesize <nodesize>]$$
 $$[-s|--sectorsize <sectorsize>]$$
-$$[-r|--rootdir <rootdir>]$$
+$$[-L|--label <label>]$$
 $$[-K|--nodiscard]$$
+$$[-r|--rootdir <rootdir>]$$
 $$[-O|--features <feature1>[,<feature2>...]]$$
 $$[-U|--uuid <UUID>]$$
-$$[-h]$$
+$$[-f|--force]$$
+$$[-q|--quiet]$$
+$$[--help]$$
 $$[-V|--version]$$
 $$<device> [<device>...]$$
 
 DESCRIPTION
 -----------
-*mkfs.btrfs*
-is used to create a btrfs filesystem (usually in a disk partition, or an array
-of disk partitions).
+*mkfs.btrfs* is used to create a btrfs filesystem on a single device or on
+multiple devices.  <device> is typically a block device but can be a file-backed
+image as well. Multiple devices are grouped by UUID of the filesystem.
 
-<device>
-is the special file corresponding to the device (e.g /dev/sdXX ).
-If multiple devices are specified, btrfs is created
-spanning across the specified  devices.
+Before mounting such a filesystem, the kernel module must know all the devices,
+either via a preceding execution of *btrfs device scan* or via the *device*
+mount option. See section *MULTIPLE DEVICES* for more details.
 
 OPTIONS
 -------
--A|--alloc-start <offset>::
-Specify the offset from the start of the device at which to start allocations
-in this btrfs filesystem. The default value is zero, or the start of the
-device.  This option is intended only for debugging filesystem resize
-operations.
-
--b|--byte-count <size>::
-Specify the size of the resultant filesystem. If this option is not used,
-mkfs.btrfs uses all the available storage for the filesystem.
-
--d|--data <type>::
-Specify how the data must be spanned across the devices specified. Valid
-values are 'raid0', 'raid1', 'raid5', 'raid6', 'raid10' or 'single'.
-
--f|--force::
-Force overwrite when an existing filesystem is detected on the device.
-By default, mkfs.btrfs will not write to the device if it suspects that
-there is a filesystem or partition table on the device already.
-
--l|--leafsize <size>::
-Alias for --nodesize. Deprecated.
+*-A|--alloc-start <offset>*::
+(An option to help debug the chunk allocator.)
+Specify the (physical) offset from the start of the device at which allocations
+start.  The default value is zero.
+
+*-b|--byte-count <size>*::
+Specify the size of the filesystem. If this option is not used,
+mkfs.btrfs uses the entire device space for the filesystem.
 
--n|--nodesize <size>::
-Specify the nodesize, the tree block size in which btrfs stores
-data. The default value is 16KB (16384) or the page size, whichever is
-bigger. Must be a multiple of the sectorsize, but not larger than 65536.
-Leafsize always equals nodesize and the options are aliases.
+*-d|--data <profile>*::
+Specify the profile for the data block groups.  Valid values are 'raid0',
+'raid1', 'raid5', 'raid6', 'raid10' or 'single' (case does not matter).
 
--L|--label <name>::
-Specify a label for the filesystem.
+*-m|--metadata <profile>*::
+Specify the profile for the metadata block groups.
+Valid values are 'raid0', 'raid1', 'raid5', 'raid6', 'raid10', 'single' or
+'dup' (case does not matter).
 +
-NOTE: <name> should be less than 256 characters.
+A single device filesystem will default to 'DUP', unless an SSD is detected, in
+which case it will default to 'single'. The detection is based on the value of
+`/sys/block/DEV/queue/rotational`, where 'DEV' is the short name of the device.
+This is because SSDs can remap the blocks internally to a single copy, thus
+deduplicating them, which negates the purpose of increased metadata redundancy
+and just wastes space.
++
+Note that the rotational status can be arbitrarily set by the underlying block
+device driver and may not reflect the true status (network block device, memory-backed
+SCSI devices etc). Use the options '--data/--metadata' to avoid confusion.
 
--m|--metadata <profile>::
-Specify how metadata must be spanned across the devices specified. Valid
-values are 'raid0', 'raid1', 'raid5', 'raid6', 'raid10', 'single' or 'dup'.
+*-M|--mixed*::
+Normally the data and metadata block groups are isolated. The 'mixed' mode
+will remove the isolation and store both types in the same block group type.
+This helps to utilize the free space regardless of the purpose and is suitable
+for small devices. With separate block groups, space reserved for one block
+group type is not available for allocating the other type, which can lead to
+an ENOSPC state.
 +
-Single device
-will have dup set by default except in the case of SSDs which will default to
-single. This is because SSDs can remap blocks internally so duplicate blocks
-could end up in the same erase block which negates the benefits of doing
-metadata duplication.
-
--M|--mixed::
-Mix data and metadata chunks together for more efficient space
-utilization.  This feature incurs a performance penalty in
-larger filesystems.  It is recommended for use with filesystems
-of 1 GiB or smaller.
-
--q|--quiet::
-Print only error or warning messages. Options --features or --help are unaffected.
+Mixed mode is recommended for filesystems smaller than 1GiB. The soft
+recommendation is to use it for filesystems smaller than 5GiB. The mixed
+mode may lead to degraded performance on larger filesystems, but is otherwise
+usable, even on multiple devices.
++
+The 'nodesize' and 'sectorsize' must be equal, and the block group types must
+match.
++
+NOTE: versions up to 4.2.x forced the mixed mode for devices smaller than 1GiB.
+This has been removed in 4.3+ as it caused some usability issues.
 
--s|--sectorsize <size>::
-Specify the sectorsize, the minimum data block allocation unit.
+*-l|--leafsize <size>*::
+Alias for --nodesize. Deprecated.
+
+*-n|--nodesize <size>*::
+Specify the nodesize, the tree block size in which btrfs stores metadata. The
+default value is 16KiB (16384) or the page size, whichever is bigger. Must be a
+multiple of the sectorsize, but not larger than 64KiB (65536).  Leafsize always
+equals nodesize and the options are aliases.
++
+Smaller node size increases fragmentation but leads to taller b-trees which in
+turn leads to lower locking contention. Higher node sizes give better packing
+and less fragmentation at the cost of more expensive memory operations while
+updating the metadata blocks.
 +
-The default
-value is the page size. If the sectorsize differs from the page size, the
-created filesystem may not be mountable by current kernel. Therefore it is not
-recommended to use this option unless you are going to mount it on a system
-with the appropriate page size.
-
--r|--rootdir <rootdir>::
-Specify a directory to copy into the newly created btrfs filesystem.
+NOTE: versions up to 3.11 set the nodesize to 4k.
+
+*-s|--sectorsize <size>*::
+Specify the sectorsize, the minimum data block allocation unit.
 +
-NOTE: '-r' option is done completely in userland, and don't need root
-privilege to mount the filesystem.
+The default value is the page size and is autodetected. If the sectorsize
+differs from the page size, the created filesystem may not be mountable by the
+kernel. Therefore it is not recommended to use this option unless you are going
+to mount it on a system with the appropriate page size.
 
--K|--nodiscard::
-Do not perform whole device TRIM operation by default.
+*-L|--label <string>*::
+Specify a label for the filesystem. The 'string' should be less than 256
+bytes and must not contain newline characters.
 
--O|--features <feature1>[,<feature2>...]::
+*-K|--nodiscard*::
+Do not perform whole device TRIM operation on devices that are capable of that.
+
+*-r|--rootdir <rootdir>*::
+Populate the toplevel subvolume with files from 'rootdir'.  This does not
+require root permissions and does not mount the filesystem.
+
+*-O|--features <feature1>[,<feature2>...]*::
 A list of filesystem features turned on at mkfs time. Not all features are
 supported by old kernels. To disable a feature, prefix it with '^'.
 +
-To see all features run:
+See section *FILESYSTEM FEATURES* for more details.  To see all available
+features that mkfs.btrfs supports run:
 +
 +mkfs.btrfs -O list-all+
 
--U|--uuid <UUID>::
-Create the filesystem with the specified UUID, which must not already exist on
-the system.
+*-f|--force*::
+Forcibly overwrite the block devices when an existing filesystem is detected.
+By default, mkfs.btrfs will utilize 'libblkid' to check for any known
+filesystem on the devices. Alternatively you can use the `wipefs` utility
+to clear the devices.
+
+*-q|--quiet*::
+Print only error or warning messages. Options --features or --help are unaffected.
+
+*-U|--uuid <UUID>*::
+Create the filesystem with the given 'UUID'. The UUID must not exist on any
+filesystem currently present.
 
--V|--version::
+*-V|--version*::
 Print the *mkfs.btrfs* version and exit.
 
--h::
+*--help*::
 Print help.
 
-UNIT
-----
-As default the unit is the byte, however it is possible to append a suffix
-to the arguments like 'k' for KBytes, 'm' for MBytes...
+SIZE UNITS
+----------
+The default unit is 'byte'. All size parameters accept suffixes in the 1024
+base. The recognized suffixes are: 'k', 'm', 'g', 't', 'e', both uppercase and
+lowercase.
+
+MULTIPLE DEVICES
+----------------
+
+Before mounting a multiple device filesystem, the kernel module must know the
+association of the block devices that are attached to the filesystem UUID.
+
+There is typically no action needed from the user.  On a system that utilizes a
+udev-like daemon, any new block device is automatically registered. The rules
+call *btrfs device scan*.
+
+The same command can be used to trigger the device scanning if the btrfs kernel
+module is reloaded (naturally all previous information about the device
+registration is lost).
+
+Another possibility is to use the mount option *device* to specify the list of
+devices to scan at the time of mount.
+
+ # mount -o device=/dev/sdb,device=/dev/sdc /dev/sda /mnt
+
+NOTE: this means only scanning; if the devices do not exist in the system,
+mount will fail anyway. This can happen on systems without initramfs/initrd and
+root partition created with RAID1/10/5/6 profiles. The mount action can happen
+before all block devices are discovered. The waiting is usually done on the
+initramfs/initrd systems.
+
+FILESYSTEM FEATURES
+-------------------
+
+*mixed-bg*::
+mixed data and metadata block groups, also set by option '--mixed'
+
+*extref*::
+(default since btrfs-progs 3.12, kernel support since 3.7)
++
+increased hardlink limit per file in a directory to 65536, older kernels
+supported a varying number of hardlinks depending on the sum of all file name
+sizes that can be stored into one metadata block
+
+*raid56*::
+extended format for RAID5/6, also enabled if raid5 or raid6 block groups
+are selected
+
+*skinny-metadata*::
+(default since btrfs-progs 3.18, kernel support since 3.10)
++
+reduced-size metadata for extent references, saves a few percent of metadata
+
+*no-holes*::
+improved representation of file extents where holes are not explicitly
+stored as an extent, saves a few percent of metadata if sparse files are used
+
+BLOCK GROUPS, CHUNKS, RAID
+--------------------------
+
+The high-level organizational units of a filesystem are block groups of three types:
+data, metadata and system.
+
+*DATA*::
+store data blocks and nothing else
+
+*METADATA*::
+store internal metadata in b-trees, can store file data if they fit into the
+inline limit
+
+*SYSTEM*::
+store structures that describe the mapping between the physical devices and the
+linear logical space representing the filesystem
+
+Other terms commonly used:
+
+*block group*::
+*chunk*::
+a logical range of space of a given profile, stores data, metadata or both;
+sometimes the terms are used interchangeably
++
+A typical size of metadata block group is 256MiB (filesystem smaller than
+50GiB) and 1GiB (larger than 50GiB), for data it's 1GiB. The system block group
+size is a few megabytes.
+
+*RAID*::
+a block group profile type that utilizes RAID-like features on multiple
+devices: striping, mirroring, parity
+
+*profile*::
+when used in connection with block groups refers to the allocation strategy
+and constraints, see the section 'PROFILES' for more details
+
+PROFILES
+--------
+
+The following block group profiles are available:
+
+[ width="60%",options="header" ]
+|=============================================================
+| Profile | Redundancy          | Striping   | Min/max devices
+| single  | 1 copy              | n/a        | 1/any
+| DUP     | 2 copies / 1 device | n/a        | 1/1
+| RAID0   | n/a                 | 1 to N     | 2/any
+| RAID10  | 2 copies            | 1 to N     | 4/any
+| RAID5   | 1 copy + 1 parity   | 3 to N - 1 | 2/any
+| RAID6   | 1 copy + 2 parity   | 3 to N - 2 | 3/any
+|=============================================================
+
+KNOWN ISSUES
+------------
+
+**SMALL FILESYSTEMS AND LARGE NODESIZE**
+
+The combination of small filesystem size and large nodesize is not recommended
+in general and can lead to various ENOSPC-related issues during mount time or runtime.
 
-NOTES
------
 Since mixed block group creation is optional, we allow small
-filesystem instances with differing values for sectorsize and nodesize
-to be created and could end up in the following situation,
+filesystem instances with differing values for 'sectorsize' and 'nodesize'
+to be created, which could end up in the following situation:
 
-  [root@localhost ~]# mkfs.btrfs -f -n 65536 /dev/loop0
+  # mkfs.btrfs -f -n 65536 /dev/loop0
   btrfs-progs v3.19-rc2-405-g976307c
   See http://btrfs.wiki.kernel.org for more information.
 
@@ -159,12 +291,14 @@ to be created and could end up in the following situation,
   Devices:
     ID        SIZE  PATH
      1   512.00MiB  /dev/loop0
-  [root@localhost ~]# mount /dev/loop0 /mnt/
+
+  # mount /dev/loop0 /mnt/
   mount: mount /dev/loop0 on /mnt failed: No space left on device
 
-The ENOSPC occurs during the creation of the UUID tree. This is
-because of things like large metadata block size, DUP mode used for
-metadata and global reservation consuming space.
+The ENOSPC occurs during the creation of the UUID tree. This is caused
+by large metadata blocks and a space reservation strategy that allocates more
+than can fit into the filesystem.
+
 
 AVAILABILITY
 ------------
@@ -174,4 +308,4 @@ further details.
 
 SEE ALSO
 --------
-`btrfs`(8)
+`btrfs`(8), `wipefs`(8)
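
Reader's note on the SIZE UNITS and --nodesize text in the patch above: the rules described there (base-1024 suffixes, nodesize defaulting to 16KiB or the page size, whichever is bigger, a multiple of the sectorsize, and at most 64KiB) can be illustrated with a small Python sketch. This is not code from btrfs-progs; the names `parse_size`, `default_nodesize` and `check_nodesize` are hypothetical, and only the suffixes listed in the manual page text are handled.

```python
# Illustrative sketch only; not taken from btrfs-progs.
# Suffixes are limited to the ones named in the SIZE UNITS section.
SUFFIX_POWERS = {"k": 1, "m": 2, "g": 3, "t": 4, "e": 6}


def parse_size(text):
    """Convert a string such as '16k' or '65536' to a byte count (base 1024)."""
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1].lower()  # suffixes are accepted in either case
    if suffix in SUFFIX_POWERS:
        number, power = text[:-1], SUFFIX_POWERS[suffix]
    else:
        number, power = text, 0
    if not (number.isascii() and number.isdigit()):
        raise ValueError(f"invalid size: {text!r}")
    return int(number) * 1024 ** power


def default_nodesize(page_size):
    """Default described in the text: 16KiB or the page size, whichever is bigger."""
    return max(16 * 1024, page_size)


def check_nodesize(nodesize, sectorsize):
    """Return a list of violated rules; an empty list means both rules hold."""
    problems = []
    if nodesize % sectorsize != 0:
        problems.append("nodesize is not a multiple of sectorsize")
    if nodesize > 64 * 1024:
        problems.append("nodesize is larger than 64KiB")
    return problems
```

For instance, `check_nodesize(parse_size("64k"), 4096)` returns an empty list, which matches the `-n 65536` invocation in the KNOWN ISSUES example: that nodesize is accepted by these rules even though it is a poor fit for a 512MiB device.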