summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--Documentation/mkfs.btrfs.asciidoc324
1 files changed, 229 insertions, 95 deletions
diff --git a/Documentation/mkfs.btrfs.asciidoc b/Documentation/mkfs.btrfs.asciidoc
index 5789762a..12d88400 100644
--- a/Documentation/mkfs.btrfs.asciidoc
+++ b/Documentation/mkfs.btrfs.asciidoc
@@ -11,135 +11,267 @@ SYNOPSIS
$$[-A|--alloc-start <alloc-start>]$$
$$[-b|--byte-count <byte-count>]$$
$$[-d|--data <data-profile>]$$
-$$[-f|--force]$$
-$$[-n|--nodesize <nodesize>]$$
-$$[-l|--leafsize <leafsize>]$$
-$$[-L|--label <label>]$$
$$[-m|--metadata <metadata profile>]$$
$$[-M|--mixed]$$
-$$[-q|--quiet]$$
+$$[-l|--leafsize <leafsize>]$$
+$$[-n|--nodesize <nodesize>]$$
$$[-s|--sectorsize <sectorsize>]$$
-$$[-r|--rootdir <rootdir>]$$
+$$[-L|--label <label>]$$
$$[-K|--nodiscard]$$
+$$[-r|--rootdir <rootdir>]$$
$$[-O|--features <feature1>[,<feature2>...]]$$
$$[-U|--uuid <UUID>]$$
-$$[-h]$$
+$$[-f|--force]$$
+$$[-q|--quiet]$$
+$$[--help]$$
$$[-V|--version]$$
$$<device> [<device>...]$$
DESCRIPTION
-----------
-*mkfs.btrfs*
-is used to create a btrfs filesystem (usually in a disk partition, or an array
-of disk partitions).
+*mkfs.btrfs* is used to create the btrfs filesystem on a single or multiple
+devices. <device> is typically a block device but can be a file-backed image
+as well. Multiple devices are grouped by UUID of the filesystem.
-<device>
-is the special file corresponding to the device (e.g /dev/sdXX ).
-If multiple devices are specified, btrfs is created
-spanning across the specified devices.
+Before mounting such filesystem, the kernel module must know all the devices
+either via preceding execution of *btrfs device scan* or using the *device*
+mount option. See section *MULTIPLE DEVICES* for more details.
OPTIONS
-------
--A|--alloc-start <offset>::
-Specify the offset from the start of the device at which to start allocations
-in this btrfs filesystem. The default value is zero, or the start of the
-device. This option is intended only for debugging filesystem resize
-operations.
-
--b|--byte-count <size>::
-Specify the size of the resultant filesystem. If this option is not used,
-mkfs.btrfs uses all the available storage for the filesystem.
-
--d|--data <type>::
-Specify how the data must be spanned across the devices specified. Valid
-values are 'raid0', 'raid1', 'raid5', 'raid6', 'raid10' or 'single'.
-
--f|--force::
-Force overwrite when an existing filesystem is detected on the device.
-By default, mkfs.btrfs will not write to the device if it suspects that
-there is a filesystem or partition table on the device already.
-
--l|--leafsize <size>::
-Alias for --nodesize. Deprecated.
+*-A|--alloc-start <offset>*::
+(An option to help debugging chunk allocator.)
+Specify the (physical) offset from the start of the device at which allocations
+start. The default value is zero.
+
+*-b|--byte-count <size>*::
+Specify the size of the filesystem. If this option is not used,
+mkfs.btrfs uses the entire device space for the filesystem.
--n|--nodesize <size>::
-Specify the nodesize, the tree block size in which btrfs stores
-data. The default value is 16KB (16384) or the page size, whichever is
-bigger. Must be a multiple of the sectorsize, but not larger than 65536.
-Leafsize always equals nodesize and the options are aliases.
+*-d|--data <profile>*::
+Specify the profile for the data block groups. Valid values are 'raid0',
+'raid1', 'raid5', 'raid6', 'raid10' or 'single', (case does not matter).
--L|--label <name>::
-Specify a label for the filesystem.
+*-m|--metadata <profile>*::
+Specify the profile for the metadata block groups.
+Valid values are 'raid0', 'raid1', 'raid5', 'raid6', 'raid10', 'single' or
+'dup', (case does not matter).
+
-NOTE: <name> should be less than 256 characters.
+A single device filesystem will default to 'DUP', unless a SSD is detected. Then
+it will default to 'single'. The detection is based on the value of
+`/sys/block/DEV/queue/rotational`, where 'DEV' is the short name of the device.
+This is because SSDs can remap the blocks internally to a single copy thus
+deduplicating them which negates the purpose of increased metadata redunancy
+and just wastes space.
++
+Note that the rotational status can be arbitrarily set by the underlying block
+device driver and may not reflect the true status (network block device, memory-backed
+SCSI devices etc). Use the options '--data/--metadata' to avoid confusion.
--m|--metadata <profile>::
-Specify how metadata must be spanned across the devices specified. Valid
-values are 'raid0', 'raid1', 'raid5', 'raid6', 'raid10', 'single' or 'dup'.
+*-M|--mixed*::
+Normally the data and metadata block groups are isolated. The 'mixed' mode
+will remove the isolation and store both types in the same block group type.
+This helps to utilize the free space regardless of the purpose and is suitable
+for small devices. The separate allocation of block groups leads to a situation
+where the space is reserved for the other block group type, is not available for
+allocation and can lead to ENOSPC state.
+
-Single device
-will have dup set by default except in the case of SSDs which will default to
-single. This is because SSDs can remap blocks internally so duplicate blocks
-could end up in the same erase block which negates the benefits of doing
-metadata duplication.
-
--M|--mixed::
-Mix data and metadata chunks together for more efficient space
-utilization. This feature incurs a performance penalty in
-larger filesystems. It is recommended for use with filesystems
-of 1 GiB or smaller.
-
--q|--quiet::
-Print only error or warning messages. Options --features or --help are unaffected.
+The recommended size for the mixed mode is for filesystems less than 1GiB. The
+soft recommendation is to use it for filesystems smaller than 5GiB. Thie mixed
+mode may lead to degraded performance on larger filesystems, but is otherwise
+usable, even on multiple devices.
++
+The 'nodesize' and 'sectorsize' must be equal, and the block group types must
+match.
++
+NOTE: versions up to 4.2.x forced the mixed mode for devices smaller than 1GiB.
+This has been removed in 4.3+ as it caused some usability issues.
--s|--sectorsize <size>::
-Specify the sectorsize, the minimum data block allocation unit.
+*-l|--leafsize <size>*::
+Alias for --nodesize. Deprecated.
+
+*-n|--nodesize <size>*::
+Specify the nodesize, the tree block size in which btrfs stores metadata. The
+default value is 16KiB (16384) or the page size, whichever is bigger. Must be a
+multiple of the sectorsize, but not larger than 64KiB (65536). Leafsize always
+equals nodesize and the options are aliases.
++
+Smaller node size increases fragmentation but lead to higher b-trees which in
+turn leads to lower locking contention. Higher node sizes give better packing
+and less fragmentation at the cost of more expensive memory operations while
+updating the metadata blocks.
+
-The default
-value is the page size. If the sectorsize differs from the page size, the
-created filesystem may not be mountable by current kernel. Therefore it is not
-recommended to use this option unless you are going to mount it on a system
-with the appropriate page size.
-
--r|--rootdir <rootdir>::
-Specify a directory to copy into the newly created btrfs filesystem.
+NOTE: versions up to 3.11 set the nodesize to 4k.
+
+*-s|--sectorsize <size>*::
+Specify the sectorsize, the minimum data block allocation unit.
+
-NOTE: '-r' option is done completely in userland, and don't need root
-privilege to mount the filesystem.
+The default value is the page size and is autodetected. If the sectorsize
+differs from the page size, the created filesystem may not be mountable by the
+kernel. Therefore it is not recommended to use this option unless you are going
+to mount it on a system with the appropriate page size.
--K|--nodiscard::
-Do not perform whole device TRIM operation by default.
+*-L|--label <string>*::
+Specify a label for the filesystem. The 'string' should be less than 256
+bytes and must not contain newline characters.
--O|--features <feature1>[,<feature2>...]::
+*-K|--nodiscard*::
+Do not perform whole device TRIM operation on devices that are capable of that.
+
+*-r|--rootdir <rootdir>*::
+Populate the toplevel subvolume with files from 'rootdir'. This does not
+require root permissions and does not mount the filesystem.
+
+*-O|--features <feature1>[,<feature2>...]*::
A list of filesystem features turned on at mkfs time. Not all features are
supported by old kernels. To disable a feature, prefix it with '^'.
+
-To see all features run:
+See section *FILESYSTEM FEATURES* for more details. To see all available
+features that mkfs.btrfs supports run:
+
+mkfs.btrfs -O list-all+
--U|--uuid <UUID>::
-Create the filesystem with the specified UUID, which must not already exist on
-the system.
+*-f|--force*::
+Forcibly overwrite the block devices when an existing filesystem is detected.
+By default, mkfs.btrfs will utilize 'libblkid' to check for any known
+filesystem on the devices. Alternatively you can use the `wipefs` utility
+to clear the devices.
+
+*-q|--quiet*::
+Print only error or warning messages. Options --features or --help are unaffected.
+
+*-U|--uuid <UUID>*::
+Create the filesystem with the given 'UUID'. The UUID must not exist on any
+filesystem currently present.
--V|--version::
+*-V|--version*::
Print the *mkfs.btrfs* version and exit.
--h::
+*--help*::
Print help.
-UNIT
-----
-As default the unit is the byte, however it is possible to append a suffix
-to the arguments like 'k' for KBytes, 'm' for MBytes...
+SIZE UNITS
+----------
+The default unit is 'byte'. All size parameters accept suffixes in the 1024
+base. The recognized suffixes are: 'k', 'm', 'g', 't', 'e', both uppercase and
+lowercase.
+
+MULTIPLE DEVICES
+----------------
+
+Before mounting a multiple device filesystem, the kernel module must know the
+association of the block devices that are attached to the filesystem UUID.
+
+There is typically no action needed from the user. On a system that utilizes a
+udev-like daemon, any new block device is automatically registered. The rules
+call *btrfs device scan*.
+
+The same command can be used to trigger the device scanning if the btrfs kernel
+module is reloaded (naturally all previous information about the device
+registration is lost).
+
+Another possibility is to use the mount options *device* to specify the list of
+devices to scan at the time of mount.
+
+ # mount -o device=/dev/sdb,device=/dev/sdc /dev/sda /mnt
+
+NOTE: that this means only scanning, if the devices do not exist in the system,
+mount will fail anyway. This can happen on systems without initramfs/initrd and
+root partition created with RAID1/10/5/6 profiles. The mount action can happen
+before all block devices are discovered. The waiting is usually done on the
+initramfs/initrd systems.
+
+FILESYSTEM FEATURES
+-------------------
+
+*mixed-bg*::
+mixed data and metadata block groups, also set by option '--mixed'
+
+*extref*::
+(default since btrfs-progs 3.12, kernel support since 3.7)
++
+increased hardlink limit per file in a directory to 65536, older kernels
+supported a varying number of hardlinks depending on the sum of all file name
+sizes that can be stored into one metadata block
+
+*raid56*::
+extended format for RAID5/6, also enabled if raid5 or raid6 block groups
+are selected
+
+*skinny-metadata*::
+(default since btrfs-progs 3.18, kernel support since 3.10)
++
+reduced-size metadata for extent references, saves a few percent of metadata
+
+*no-holes*::
+improved representation of file extents where holes are not explicitly
+stored as an extent, saves a few percent of metadata if sparse files are used
+
+BLOCK GROUPS, CHUNKS, RAID
+--------------------------
+
+The highlevel organizational units of a filesystem are block groups of three types:
+data, metadata and system.
+
+*DATA*::
+store data blocks and nothing else
+
+*METADATA*::
+store internal metadata in b-trees, can store file data if they fit into the
+inline limit
+
+*SYSTEM*::
+store structures that describe the mapping between the physical devices and the
+linear logical space representing the filesystem
+
+Other terms commonly used:
+
+*block group*::
+*chunk*::
+a logical range of space of a given profile, stores data, metadata or both;
+sometimes the terms are used interchangably
++
+A typical size of metadata block group is 256MiB (filesystem smaller than
+50GiB) and 1GiB (larger than 50GiB), for data it's 1GiB. The system block group
+size is a few megabytes.
+
+*RAID*::
+a block group profile type that utilizes RAID-like features on multiple
+devices: striping, mirroring, parity
+
+*profile*::
+when used in connection with block groups refers to the allocation strategy
+and constraints, see the section 'PROFILES' for more details
+
+PROFILES
+--------
+
+There are the following block group types available:
+
+[ width="60%",options="header" ]
+|=============================================================
+| Profile | Redundancy | Striping | Min/max devices
+| single | 1 copy | n/a | 1/any
+| DUP | 2 copies / 1 device | n/a | 1/1
+| RAID0 | n/a | 1 to N | 2/any
+| RAID10 | 2 copies | 1 to N | 4/any
+| RAID5 | 2 copies | 3 to N - 1 | 2/any
+| RAID6 | 3 copies | 3 to N - 2 | 3/any
+|=============================================================
+
+KNOWN ISSUES
+------------
+
+**SMALL FILESYSTEMS AND LARGE NODESIZE**
+
+The combination of small filesystem size and large nodesize is not recommended
+in general and can lead to various ENOSPC-related issues during mount time or runtime.
-NOTES
------
Since mixed block group creation is optional, we allow small
-filesystem instances with differing values for sectorsize and nodesize
-to be created and could end up in the following situation,
+filesystem instances with differing values for 'sectorsize' and 'nodesize'
+to be created and could end up in the following situation:
- [root@localhost ~]# mkfs.btrfs -f -n 65536 /dev/loop0
+ # mkfs.btrfs -f -n 65536 /dev/loop0
btrfs-progs v3.19-rc2-405-g976307c
See http://btrfs.wiki.kernel.org for more information.
@@ -159,12 +291,14 @@ to be created and could end up in the following situation,
Devices:
ID SIZE PATH
1 512.00MiB /dev/loop0
- [root@localhost ~]# mount /dev/loop0 /mnt/
+
+ # mount /dev/loop0 /mnt/
mount: mount /dev/loop0 on /mnt failed: No space left on device
-The ENOSPC occurs during the creation of the UUID tree. This is
-because of things like large metadata block size, DUP mode used for
-metadata and global reservation consuming space.
+The ENOSPC occurs during the creation of the UUID tree. This is caused
+by large metadata blocks and space reservation strategy that allocates more
+than can fit into the filesystem.
+
AVAILABILITY
------------
@@ -174,4 +308,4 @@ further details.
SEE ALSO
--------
-`btrfs`(8)
+`btrfs`(8), `wipefs`(8)