|author||David Sterba <firstname.lastname@example.org>||2016-06-21 02:48:39 +0200|
|committer||David Sterba <email@example.com>||2016-06-21 02:50:27 +0200|
btrfs-progs: docs: update 'btrfs-device' manual page
Signed-off-by: David Sterba <firstname.lastname@example.org>
Diffstat (limited to 'Documentation')
1 files changed, 152 insertions, 53 deletions
diff --git a/Documentation/btrfs-device.asciidoc b/Documentation/btrfs-device.asciidoc
index edd9b98e..d05fc457 100644
@@ -3,7 +3,7 @@ btrfs-device(8)
-btrfs-device - control btrfs devices
+btrfs-device - manage devices of btrfs filesystems
@@ -11,95 +11,102 @@ SYNOPSIS
-*btrfs device* is used to control the btrfs devices, since btrfs can be used
-across several devices, *btrfs device* is used for multiple device management.
+The *btrfs device* command group is used to manage devices of the btrfs filesystems.
-Btrfs filesystem is capable to manage multiple devices.
+A btrfs filesystem can be created on top of single or multiple block devices.
+Data and metadata are organized in allocation profiles with various redundancy
+policies. There's some similarity with traditional RAID levels, but this could
+be confusing to users familiar with the traditional meaning. Due to the
+similarity, the RAID terminology is widely used in the documentation. See
+`mkfs.btrfs`(8) for more details and the exact profile capabilities and
-Btrfs filesystem uses different profiles to manage different RAID level, and
-use balance to rebuild chunks, also devices can be added/removed/replace
+The device management works on a mounted filesystem. Devices can be added,
+removed or replaced by commands provided by *btrfs device* and *btrfs replace*.
+The profiles can also be changed, provided there's enough workspace to do the
+conversion, using the *btrfs balance* command with the 'convert' filter.
-Btrfs filesystem uses data/metadata profiles to manage allocation/duplication
-Profiles like RAID level can be assigned to data and metadata separately.
-See `mkfs.btrfs`(8) for more details.
+A profile describes an allocation policy based on the redundancy/replication
+constraints in connection with the number of devices. The profile applies to
+data and metadata block groups separately.
-Btrfs filesystem supports most of the standard RAID level: 0/1/5/6/10. +
-RAID levels can be assigned at mkfs time or online.
-See `mkfs.btrfs`(8) for mkfs time RAID level assign and `btrfs-balance`(8) for
-online RAID level assign.
-NOTE: Since btrfs is under heavy development especially the RAID5/6 support,
-it is *highly* recommended to read the follow btrfs wiki page to get more
-updated details on RAID5/6: +
-`btrfs-balance`(8) subcommand can be used to balance or rebuild chunks to the
-Due to the fact that balance can rebuild/recovery chunks according to its RAID
-duplication if possible, so when using RAID1/5/6/10 with some devices failed
-and you just added a new device to btrfs using `btrfs-device`(8), you should
-run `btrfs-balance`(8) to rebuild the chunks.
-See `btrfs-balance`(8) for more details.
+Where applicable, the level refers to a profile that matches constraints of the
+standard RAID levels. At the moment the supported ones are: RAID0, RAID1,
+RAID10, RAID5 and RAID6.
-Device can be added/removed using `btrfs-device`(8) subcommand and replaced
-When device is removed or replaced, btrfs will do the chunk rebuild if needed.
-See `btrfs-replace`(8) man page for more details on device replace.
+See the section *TYPICAL USECASES* for some examples.
*add* [-Kf] <dev> [<dev>...] <path>::
Add device(s) to the filesystem identified by <path>.
-If applicable, a whole device discard (TRIM) operation is performed.
+If applicable, a whole device discard (TRIM) operation is performed prior to
+adding the device. A device with existing filesystem detected by `blkid`(8)
+will prevent device addition and has to be forced. Alternatively the filesystem
+can be wiped from the device using e.g. the `wipefs`(8) tool.
+The operation is instant and does not affect existing data. The operation merely
+adds the device to the filesystem structures and creates some block groups
-do not perform discard by default
+do not perform discard (TRIM) by default
force overwrite of existing filesystem on the given disk(s)
*remove* <dev>|<devid> [<dev>|<devid>...] <path>::
-Remove device(s) from a filesystem identified by <path>.
+Remove device(s) from a filesystem identified by <path>.
+Device removal must satisfy the profile constraints, otherwise the command
+fails. The filesystem must be converted to profile(s) that would allow the
+removal. This can typically happen when going down from 2 devices to 1 and
+using the RAID1 profile. See the example section below.
+The operation can take a long time as it needs to move all data from the device.
+NOTE: It is not possible to delete the device that was used to mount the
+filesystem. This is a limitation given by the VFS.
*delete* <dev>|<devid> [<dev>|<devid>...] <path>::
Alias of remove kept for backward compatibility
-Check device to see if it has all of it's devices in cache for mounting.
+Wait until all devices of a multiple-device filesystem are scanned and registered
+within the kernel module.
*scan* [(--all-devices|-d)|<device> [<device>...]]::
-Scan devices for a btrfs filesystem.
+Scan devices for a btrfs filesystem and register them with the kernel module.
+This allows mounting multiple-device filesystem by specifying just one from the
+If no devices are passed, all block devices that blkid reports to contain btrfs
-If one or more devices are passed, these are scanned for a btrfs filesystem.
-If no devices are passed, btrfs uses block devices containing btrfs
-filesystem as listed by blkid.
-Finally, '--all-devices' or '-d' is the deprecated option. If it is passed,
-its behavior is the same as if no devices are passed.
+The options '--all-devices' or '-d' are deprecated and kept for backward compatibility.
+If used, behavior is the same as if no devices are passed.
+The command can be run repeatedly. Devices that have already been registered
+remain as such. Reloading the kernel module will drop this information. There's
+an alternative way of mounting a multiple-device filesystem without the need
+for prior scanning. See the mount option 'device'.
*stats* [-z] <path>|<device>::
-Read and print the device IO stats for all mounted devices of the filesystem
-identified by <path> or for a single <device>.
+Read and print the device IO error statistics for all devices of the given
+filesystem identified by <path> or for a single <device>. See section *DEVICE
+STATS* for more information.
-Reset stats to zero after reading them.
+Print the stats and reset the values to zero afterwards.
*usage* [options] <path> [<path>...]::
Show detailed information about internal allocations in devices.
@@ -127,6 +134,98 @@ show sizes in TiB, or TB with --si
If conflicting options are passed, the last one takes precedence.
+STARTING WITH A SINGLE-DEVICE FILESYSTEM
+Assume we've created a filesystem on a block device '/dev/sda' with profile
+'single/single' (data/metadata), the device size is 50GiB and we've used the
+whole device for the filesystem. The mount point is '/mnt'.
+The amount of data stored is 16GiB, and the metadata have 2GiB allocated.
+==== ADD NEW DEVICE ====
+We want to increase the total size of the filesystem and keep the profiles. The
+size of the new device '/dev/sdb' is 100GiB.
+ $ btrfs device add /dev/sdb /mnt
+The amount of free data space increases by less than 100GiB, because some space
+is allocated for metadata.
+==== CONVERT TO RAID1 ====
+Now we want to increase the redundancy level of both data and metadata, but
+we'll do that in steps. Note that the device sizes are not equal and we'll use
+that to show the capabilities of split data/metadata and independent profiles.
+The constraint for RAID1 gives us at most 50GiB of usable space and exactly 2
+copies will be stored on the devices.
+First we'll convert the metadata. As the metadata occupy less than 50GiB and
+there's enough workspace for the conversion process, we can do:
+ $ btrfs balance start -mconvert=raid1 /mnt
+This operation can take a while as the metadata have to be moved and all block
+pointers updated. Depending on the physical locations of the old and new
+blocks, the disk seeking is the key factor affecting performance.
+You'll note that the system block group has also been converted to RAID1; this
+normally happens as the system block group also holds metadata (the physical to
+* available data space decreased by 3GiB, usable roughly (50 - 3) + (100 - 3) = 144 GiB
+* metadata redundancy increased
+In other words, the unequal device sizes allow for combined space for data yet
+improved redundancy for metadata. If we decide to increase redundancy of data
+as well, we're going to lose 50GiB of the second device, because RAID1 needs
+both copies on different devices and the first device holds only 50GiB.
+ $ btrfs balance start -dconvert=raid1 /mnt
+The balance process needs some workspace (i.e. free device space without any
+data or metadata block groups) so the command could fail if there's too much
+data or the block groups occupy the whole first device.
+The device size of '/dev/sdb' as seen by the filesystem remains unchanged, but
+the logical space from 50-100GiB will be unused.
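The capacity reasoning in this walkthrough can be sketched in a few lines of Python. This is only a rough model, not how btrfs itself accounts for space: the function names are hypothetical, and the model ignores metadata, system block groups and chunk granularity. It assumes that RAID1 keeps exactly two copies, each on a different device, as described above.

```python
def single_usable(sizes_gib):
    """Raw capacity with the 'single' profile: every byte is stored once."""
    return sum(sizes_gib)


def raid1_usable(sizes_gib):
    """Rough RAID1 capacity: two copies, each on a different device.

    The result is limited either by half of the total space or by how
    much the remaining devices can mirror from the largest one.
    """
    total = sum(sizes_gib)
    largest = max(sizes_gib)
    return min(total / 2, total - largest)


devices = [50, 100]  # /dev/sda and /dev/sdb from the example, in GiB
print("single:", single_usable(devices), "GiB")
print("raid1: ", raid1_usable(devices), "GiB")
```

For the two devices in the example, this model gives 150GiB for 'single' and 50GiB for RAID1, which matches the 50GiB limit stated in the text.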
+DEVICE STATS
+------------
+The device stats keep a persistent record of several error classes related to
+doing IO. The current values are printed at mount time and updated during
+filesystem lifetime or from a scrub run.
+ $ btrfs device stats /dev/sda3
+ [/dev/sda3].write_io_errs 0
+ [/dev/sda3].read_io_errs 0
+ [/dev/sda3].flush_io_errs 0
+ [/dev/sda3].corruption_errs 0
+ [/dev/sda3].generation_errs 0
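A monitoring script might want to read these counters programmatically. The following is a minimal sketch in Python, assuming the output keeps the `[device].counter value` layout shown above; the function names `parse_device_stats` and `has_errors` are hypothetical helpers, not part of btrfs-progs.

```python
import re

# Matches lines such as "[/dev/sda3].write_io_errs 0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<name>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Return {device: {counter_name: value}} from 'btrfs device stats' output."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        stats.setdefault(match["dev"], {})[match["name"]] = int(match["value"])
    return stats


def has_errors(stats):
    """True if any counter on any device is non-zero."""
    return any(v != 0 for counters in stats.values() for v in counters.values())


# Sample text copied from the listing above.
sample = """\
[/dev/sda3].write_io_errs 0
[/dev/sda3].read_io_errs 0
[/dev/sda3].flush_io_errs 0
[/dev/sda3].corruption_errs 0
[/dev/sda3].generation_errs 0
"""

parsed = parse_device_stats(sample)
print(parsed)
print("errors found:", has_errors(parsed))
```

In practice the text would come from running `btrfs device stats <path>` rather than from an embedded string.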
+write_io_errs::
+Failed writes to the block devices, meaning that the layers beneath the
+filesystem were not able to satisfy the write request.
+read_io_errs::
+The read request analogue of write_io_errs.
+flush_io_errs::
+Number of failed writes with the 'FLUSH' flag set. Flushing is a method of
+forcing a particular order between write requests and is crucial for
+implementing crash consistency. In case of btrfs, all the metadata blocks must
+be permanently stored on the block device before the superblock is written.
+corruption_errs::
+A block checksum mismatched or a corrupted metadata header was found.
+generation_errs::
+The block generation does not match the expected value (eg. stored in the
*btrfs device* returns a zero exit status if it succeeds. Non zero is