1 files changed, 229 insertions, 95 deletions
diff --git a/Documentation/mkfs.btrfs.asciidoc b/Documentation/mkfs.btrfs.asciidoc
index 5789762a..12d88400 100644
@@ -11,135 +11,267 @@ SYNOPSIS
$$[-m|--metadata <metadata profile>]$$
-is used to create a btrfs filesystem (usually in a disk partition, or an array
-of disk partitions).
+*mkfs.btrfs* is used to create the btrfs filesystem on a single or multiple
+devices. <device> is typically a block device but can be a file-backed image
+as well. Multiple devices are grouped by UUID of the filesystem.
-is the special file corresponding to the device (e.g /dev/sdXX ).
-If multiple devices are specified, btrfs is created
-spanning across the specified devices.
+Before mounting such a filesystem, the kernel module must know all the devices
+either via preceding execution of *btrfs device scan* or using the *device*
+mount option. See section *MULTIPLE DEVICES* for more details.
-Specify the offset from the start of the device at which to start allocations
-in this btrfs filesystem. The default value is zero, or the start of the
-device. This option is intended only for debugging filesystem resize
-Specify the size of the resultant filesystem. If this option is not used,
-mkfs.btrfs uses all the available storage for the filesystem.
-Specify how the data must be spanned across the devices specified. Valid
-values are 'raid0', 'raid1', 'raid5', 'raid6', 'raid10' or 'single'.
-Force overwrite when an existing filesystem is detected on the device.
-By default, mkfs.btrfs will not write to the device if it suspects that
-there is a filesystem or partition table on the device already.
-Alias for --nodesize. Deprecated.
+(An option to help debug the chunk allocator.)
+Specify the (physical) offset from the start of the device at which allocations
+start. The default value is zero.
+Specify the size of the filesystem. If this option is not used,
+mkfs.btrfs uses the entire device space for the filesystem.
-Specify the nodesize, the tree block size in which btrfs stores
-data. The default value is 16KB (16384) or the page size, whichever is
-bigger. Must be a multiple of the sectorsize, but not larger than 65536.
-Leafsize always equals nodesize and the options are aliases.
+Specify the profile for the data block groups. Valid values are 'raid0',
+'raid1', 'raid5', 'raid6', 'raid10' or 'single', (case does not matter).
-Specify a label for the filesystem.
+Specify the profile for the metadata block groups.
+Valid values are 'raid0', 'raid1', 'raid5', 'raid6', 'raid10', 'single' or
+'dup', (case does not matter).
-NOTE: <name> should be less than 256 characters.
+A single device filesystem will default to 'DUP', unless an SSD is detected, in
+which case it will default to 'single'. The detection is based on the value of
+`/sys/block/DEV/queue/rotational`, where 'DEV' is the short name of the device.
+This is because SSDs can remap the blocks internally to a single copy, thus
+deduplicating them, which negates the purpose of increased metadata redundancy
+and just wastes space.
+Note that the rotational status can be arbitrarily set by the underlying block
+device driver and may not reflect the true status (network block device, memory-backed
+SCSI devices etc.). Use the options '--data/--metadata' to avoid confusion.
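The detection rule described above can be sketched roughly as follows. This is an illustration only, not the mkfs.btrfs implementation: the function name `default_metadata_profile`, the `sysfs_root` parameter, and the fallback to 'dup' when the attribute cannot be read are all assumptions made for the example.

```python
from pathlib import Path


def default_metadata_profile(dev: str, sysfs_root: str = "/sys") -> str:
    """Guess the default metadata profile for a single-device filesystem.

    Hypothetical helper: reads <sysfs_root>/block/<dev>/queue/rotational
    and returns 'single' when the device reports itself as non-rotational
    (value "0"), otherwise 'dup'.
    """
    flag = Path(sysfs_root) / "block" / dev / "queue" / "rotational"
    try:
        rotational = flag.read_text().strip()
    except OSError:
        # Assumption for this sketch: if the attribute is missing or
        # unreadable, treat the device as rotational and keep 'dup'.
        return "dup"
    return "single" if rotational == "0" else "dup"
```

Because the driver can set the flag arbitrarily, a result from a check like this is only a hint; passing '--metadata' explicitly avoids relying on it.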
-Specify how metadata must be spanned across the devices specified. Valid
-values are 'raid0', 'raid1', 'raid5', 'raid6', 'raid10', 'single' or 'dup'.
+Normally the data and metadata block groups are isolated. The 'mixed' mode
+will remove the isolation and store both types in the same block group type.
+This helps to utilize the free space regardless of the purpose and is suitable
+for small devices. With separate block groups, space reserved for one block
+group type is not available for allocation by the other type, which can lead
+to an ENOSPC state.
-will have dup set by default except in the case of SSDs which will default to
-single. This is because SSDs can remap blocks internally so duplicate blocks
-could end up in the same erase block which negates the benefits of doing
-Mix data and metadata chunks together for more efficient space
-utilization. This feature incurs a performance penalty in
-larger filesystems. It is recommended for use with filesystems
-of 1 GiB or smaller.
-Print only error or warning messages. Options --features or --help are unaffected.
+The mixed mode is recommended for filesystems smaller than 1GiB. The
+soft recommendation is to use it for filesystems smaller than 5GiB. The mixed
+mode may lead to degraded performance on larger filesystems, but is otherwise
+usable, even on multiple devices.
+The 'nodesize' and 'sectorsize' must be equal, and the block group types must
+NOTE: versions up to 4.2.x forced the mixed mode for devices smaller than 1GiB.
+This has been removed in 4.3+ as it caused some usability issues.
-Specify the sectorsize, the minimum data block allocation unit.
+Alias for --nodesize. Deprecated.
+Specify the nodesize, the tree block size in which btrfs stores metadata. The
+default value is 16KiB (16384) or the page size, whichever is bigger. Must be a
+multiple of the sectorsize, but not larger than 64KiB (65536). Leafsize always
+equals nodesize and the options are aliases.
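The nodesize constraints stated above can be expressed as a small check. This is a sketch under stated assumptions, not code from btrfs-progs: the names `default_nodesize` and `check_nodesize` are hypothetical, and the rejection of non-positive values is an extra guard added for the example.

```python
import os

MAX_NODESIZE = 65536       # 64KiB upper bound from the text above
BASE_NODESIZE = 16384      # 16KiB base default from the text above


def default_nodesize(page_size: int) -> int:
    """Default is 16KiB or the page size, whichever is bigger."""
    return max(BASE_NODESIZE, page_size)


def check_nodesize(nodesize: int, sectorsize: int) -> None:
    """Raise ValueError if nodesize violates the documented constraints."""
    if nodesize <= 0 or sectorsize <= 0:
        # Extra guard assumed for this sketch; not stated in the text.
        raise ValueError("sizes must be positive")
    if nodesize % sectorsize != 0:
        raise ValueError("nodesize must be a multiple of the sectorsize")
    if nodesize > MAX_NODESIZE:
        raise ValueError("nodesize must not be larger than 64KiB")


if __name__ == "__main__":
    # The sectorsize defaults to the page size, so use it for both here.
    page = os.sysconf("SC_PAGE_SIZE")
    size = default_nodesize(page)
    check_nodesize(size, page)
    print(f"page size {page}, default nodesize {size}")
```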
+A smaller node size increases fragmentation but leads to taller b-trees, which
+in turn leads to lower locking contention. Larger node sizes give better packing
+and less fragmentation at the cost of more expensive memory operations while
+updating the metadata blocks.
-value is the page size. If the sectorsize differs from the page size, the
-created filesystem may not be mountable by current kernel. Therefore it is not
-recommended to use this option unless you are going to mount it on a system
-with the appropriate page size.
-Specify a directory to copy into the newly created btrfs filesystem.
+NOTE: versions up to 3.11 set the nodesize to 4k.
+Specify the sectorsize, the minimum data block allocation unit.
-NOTE: '-r' option is done completely in userland, and don't need root
-privilege to mount the filesystem.
+The default value is the page size and is autodetected. If the sectorsize
+differs from the page size, the created filesystem may not be mountable by the
+kernel. Therefore it is not recommended to use this option unless you are going
+to mount it on a system with the appropriate page size.
-Do not perform whole device TRIM operation by default.
+Specify a label for the filesystem. The 'string' should be less than 256
+bytes and must not contain newline characters.
+Do not perform whole device TRIM operation on devices that are capable of that.
+Populate the toplevel subvolume with files from 'rootdir'. This does not
+require root permissions and does not mount the filesystem.
A list of filesystem features turned on at mkfs time. Not all features are
supported by old kernels. To disable a feature, prefix it with '^'.
-To see all features run:
+See section *FILESYSTEM FEATURES* for more details. To see all available
+features that mkfs.btrfs supports run:
+mkfs.btrfs -O list-all+
-Create the filesystem with the specified UUID, which must not already exist on
+Forcibly overwrite the block devices when an existing filesystem is detected.
+By default, mkfs.btrfs will utilize 'libblkid' to check for any known
+filesystem on the devices. Alternatively you can use the `wipefs` utility
+to clear the devices.
+Print only error or warning messages. Options --features or --help are unaffected.
+Create the filesystem with the given 'UUID'. The UUID must not exist on any
+filesystem currently present.
Print the *mkfs.btrfs* version and exit.
-As default the unit is the byte, however it is possible to append a suffix
-to the arguments like 'k' for KBytes, 'm' for MBytes...
+The default unit is 'byte'. All size parameters accept suffixes in the 1024
+base. The recognized suffixes are: 'k', 'm', 'g', 't', 'e', both uppercase and
+Before mounting a multiple device filesystem, the kernel module must know the
+association of the block devices that are attached to the filesystem UUID.
+There is typically no action needed from the user. On a system that utilizes a
+udev-like daemon, any new block device is automatically registered. The rules
+call *btrfs device scan*.
+The same command can be used to trigger the device scanning if the btrfs kernel
+module is reloaded (naturally all previous information about the device
+registration is lost).
+Another possibility is to use the mount option *device* to specify the list of
+devices to scan at the time of mount.
+ # mount -o device=/dev/sdb,device=/dev/sdc /dev/sda /mnt
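When the list of member devices is known to a script, the option string in the command above can be assembled programmatically. A minimal sketch, assuming hypothetical helper names `device_mount_options` and `mount_command`; it only builds the argument list and does not run mount.

```python
from typing import List, Sequence


def device_mount_options(devices: Sequence[str]) -> str:
    """Join devices into a 'device=A,device=B' mount option string."""
    return ",".join(f"device={dev}" for dev in devices)


def mount_command(primary: str, extra: Sequence[str], mountpoint: str) -> List[str]:
    """Build the mount argv; the -o option is omitted if no extra devices."""
    cmd = ["mount"]
    if extra:
        cmd += ["-o", device_mount_options(extra)]
    cmd += [primary, mountpoint]
    return cmd


if __name__ == "__main__":
    print(" ".join(mount_command("/dev/sda", ["/dev/sdb", "/dev/sdc"], "/mnt")))
```

The resulting list could be handed to `subprocess.run` by a caller with sufficient privileges.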
+NOTE: this means only scanning; if the devices do not exist in the system,
+mount will fail anyway. This can happen on systems without initramfs/initrd and
+with a root partition created with RAID1/10/5/6 profiles. The mount action can happen
+before all block devices are discovered. The waiting is usually done on the
+mixed data and metadata block groups, also set by option '--mixed'
+(default since btrfs-progs 3.12, kernel support since 3.7)
+increased hardlink limit per file in a directory to 65536, older kernels
+supported a varying number of hardlinks depending on the sum of all file name
+sizes that can be stored into one metadata block
+extended format for RAID5/6, also enabled if raid5 or raid6 block groups
+(default since btrfs-progs 3.18, kernel support since 3.10)
+reduced-size metadata for extent references, saves a few percent of metadata
+improved representation of file extents where holes are not explicitly
+stored as an extent, saves a few percent of metadata if sparse files are used
+BLOCK GROUPS, CHUNKS, RAID
+The high-level organizational units of a filesystem are block groups of three types:
+data, metadata and system.
+store data blocks and nothing else
+store internal metadata in b-trees, can store file data if they fit into the
+store structures that describe the mapping between the physical devices and the
+linear logical space representing the filesystem
+Other terms commonly used:
+a logical range of space of a given profile, stores data, metadata or both;
+sometimes the terms are used interchangeably
+A typical size of a metadata block group is 256MiB (filesystems smaller than
+50GiB) or 1GiB (larger than 50GiB); for data it's 1GiB. The system block group
+size is a few megabytes.
+a block group profile type that utilizes RAID-like features on multiple
+devices: striping, mirroring, parity
+when used in connection with block groups refers to the allocation strategy
+and constraints, see the section 'PROFILES' for more details
+The following block group profiles are available:
+[ width="60%",options="header" ]
+| Profile | Redundancy | Striping | Min/max devices
+| single | 1 copy | n/a | 1/any
+| DUP | 2 copies / 1 device | n/a | 1/1
+| RAID0 | n/a | 1 to N | 2/any
+| RAID10 | 2 copies | 1 to N | 4/any
+| RAID5 | 2 copies | 3 to N - 1 | 2/any
+| RAID6 | 3 copies | 3 to N - 2 | 3/any
+**SMALL FILESYSTEMS AND LARGE NODESIZE**
+The combination of small filesystem size and large nodesize is not recommended
+in general and can lead to various ENOSPC-related issues during mount time or runtime.
Since mixed block group creation is optional, we allow small
-filesystem instances with differing values for sectorsize and nodesize
-to be created and could end up in the following situation,
+filesystem instances with differing values for 'sectorsize' and 'nodesize'
+to be created and could end up in the following situation:
- [root@localhost ~]# mkfs.btrfs -f -n 65536 /dev/loop0
+ # mkfs.btrfs -f -n 65536 /dev/loop0
See http://btrfs.wiki.kernel.org for more information.
@@ -159,12 +291,14 @@ to be created and could end up in the following situation,
ID SIZE PATH
1 512.00MiB /dev/loop0
- [root@localhost ~]# mount /dev/loop0 /mnt/
+ # mount /dev/loop0 /mnt/
mount: mount /dev/loop0 on /mnt failed: No space left on device
-The ENOSPC occurs during the creation of the UUID tree. This is
-because of things like large metadata block size, DUP mode used for
-metadata and global reservation consuming space.
+The ENOSPC occurs during the creation of the UUID tree. This is caused
+by large metadata blocks and a space reservation strategy that allocates more
+than can fit into the filesystem.
@@ -174,4 +308,4 @@ further details.