Frequently asked questions -- Debian mdadm
==========================================

Also see /usr/share/doc/mdadm/README.recipes.gz

0. What does MD stand for?
~~~~~~~~~~~~~~~~~~~~~~~~~~
  MD is an abbreviation for "multiple device" (also often called "multi-
  disk"). The Linux MD driver implements various strategies for combining
  multiple physical devices into single logical ones. The most common use
  case is known as "Software RAID". Linux supports RAID levels 1, 4, 5, 6,
  and 10, as well as the "pseudo-redundant" RAID level 0. In addition, the
  MD implementation covers linear and multipath configurations.

  Most people refer to MD as RAID. Since the original name of the RAID
  configuration software is "md"adm, I chose to use MD consistently instead.

1. How do I overwrite ("zero") the superblock?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  mdadm --zero-superblock /dev/sdXY

  Note that this is a destructive operation. It does not actually delete any
  data, but the device will have lost its "authority". You cannot assemble the
  array with it anymore, and if you add the device to another array, the
  synchronisation process *will* *overwrite* all data on the device.

  Nevertheless, sometimes it is necessary to zero the superblock:

  - If you are reusing a disk that has been part of an array with a different
    superblock version and/or location. In this case, zero the superblock
    before you assemble the array or add the device to an array.

  - If you are trying to prevent a device from being recognised as part of an
    array. Say, for instance, you are changing an array spanning sd[ab]1 to
    sd[bc]1 (maybe because sda is failing or too slow); automatic (scan)
    assembly will still recognise sda1 as a valid device. You can limit the
    devices to scan with the DEVICE keyword in the configuration file, but
    this may not be what you want. Instead, zeroing the superblock (see the
    example below) will (permanently) prevent the device from being
    considered part of an array.
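
  For the second case, a minimal sketch (the device name sda1 is just an
  example; double-check it before running this, as the operation cannot be
  undone):

    mdadm --zero-superblock /dev/sda1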

2. How do I change the preferred minor of an MD array (RAID)?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  See item 12 in /usr/share/doc/mdadm/README.recipes.gz and read the mdadm
  manpage (search for 'preferred').

3. How does mdadm determine which /dev/mdX or /dev/md/X to use?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  The device node name that mdadm prints in its --examine output (which is
  used to generate mdadm.conf) depends on several factors. Here is how mdadm
  determines it:

  It first checks the superblock version of a given array (or each array in
  turn when iterating all of them). Run

    mdadm --detail /dev/mdX | sed -ne 's,.*Version : ,,p'

  to determine the superblock version of a running array, or
  
    mdadm --examine /dev/sdXY | sed -ne 's,.*Version : ,,p'

  to determine the superblock version from a component device of an array.
  
  Version 0 superblocks (00.90.XX)
  ''''''''''''''''''''''''''''''''
    You need to know the preferred minor number stored in the superblock,
    so run either of

      mdadm --detail /dev/mdX | sed -ne 's,.*Preferred Minor : ,,p'
      mdadm --examine /dev/sdXY | sed -ne 's,.*Preferred Minor : ,,p'

    Let's call the resulting number MINOR. Also see FAQ 1 further up.

    Given MINOR, mdadm will output /dev/md<MINOR> if the device node
    /dev/md<MINOR> exists; otherwise, it outputs /dev/md/<MINOR>.

  Version 1 superblocks (01.XX.XX)
  ''''''''''''''''''''''''''''''''
    Version 1 superblocks actually seem to ignore preferred minors and instead
    use the value of the name field in the superblock. Unless specified
    explicitly during creation (-N|--name) the name is determined from the
    device name used, using the following regexp: 's,/dev/md/?(.*),$1,', thus:

      /dev/md0     -> 0
      /dev/md/0    -> 0
      /dev/md_d0   -> _d0
      /dev/md/d0   -> d0
      /dev/md/name -> name
      (/dev/name does not seem to work)

    mdadm will append the name to '/dev/md/', so it will always output device
    names under the /dev/md/ directory.
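
    To see which name is stored in a version-1 superblock, a sketch along
    the lines of the commands above (the exact label in the --examine output
    may vary between mdadm versions):

      mdadm --examine /dev/sdXY | sed -ne 's,.*Name : ,,p'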

    If you want to change the name, you can do so during assembly:

      mdadm -A -U name -N newname /dev/mdX /dev/sd[abc]X

    I know this all sounds inconsistent and upstream has some work to do.
    We're on it.

4. Which RAID level should I use?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Please read /usr/share/doc/mdadm/RAID5_versus_RAID10.txt.gz .

  Many people seem to prefer RAID4/5/6 because they make more efficient use of
  space. If you have disks of size X, then in order to get 2X of usable space,
  you need e.g. 3 disks with RAID5, but 4 if you use RAID10 or RAID1+0.

  This gain in usable space comes at a price: performance. RAID1/10 can be up
  to four times faster than RAID4/5/6.

  At the same time, however, RAID4/5/6 provide somewhat better redundancy in
  the event of two failing disks. In a RAID10 configuration with one disk
  already dead, the array survives the failure of either disk in the other
  RAID1 pair, but not the failure of the second disk in the already-degraded
  pair. A RAID6 across four disks can cope with any two disks failing.

  If you can afford the extra disks (storage *is* cheap these days), I suggest
  RAID1/10 over RAID4/5/6. If you don't care about performance but need as
  much space as possible, go with RAID4/5/6, but make sure to have backups.
  Heck, make sure to have backups whatever you do.

  Let it be said, however, that I thoroughly regret putting my primary
  workstation on RAID5. Anything disk-intensive brings the system to its
  knees; I will have to migrate to RAID10 at some point.

5. How to convert RAID5 to RAID10?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  You have me convinced, I want to convert my RAID5 to a RAID10. I have three
  disks in the RAID and a spare, so I thought I'd just remove the spare and
  one of the three disks, create a degraded RAID10 on these two, copy the
  data, then add the other two disks to the new RAID10. However, mdadm
  refuses to create a RAID10 with 50% of its devices missing when I ask it
  like this:

    mdadm --create -l 10 -n4 -pn2 /dev/md1 /dev/sd[cd] missing missing

  For some reason, mdadm actually cares about the order of devices you give
  it. If you intersperse the missing keywords with the physical drives, it
  should work:

    mdadm --create -l 10 -n4 -pn2 /dev/md1 /dev/sdc missing /dev/sdd missing

  See: http://marc.theaimsgroup.com/?l=linux-raid&m=116004333406395&w=2
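
  A hedged sketch of the remaining steps (device and array names are only
  examples; adapt them to your setup and make sure you have backups):

    # assumption: the RAID5 is /dev/md0 on sda, sdb, sdc, with sdd as spare
    mdadm /dev/md0 --remove /dev/sdd                 # pull the spare
    mdadm /dev/md0 --fail /dev/sdc --remove /dev/sdc

    # create the degraded RAID10 (as above), make a filesystem, copy the data
    mdadm --create -l 10 -n4 -pn2 /dev/md1 /dev/sdc missing /dev/sdd missing

    # once the data is copied and verified: dissolve the RAID5, zero the old
    # superblocks (see FAQ 1), and complete the RAID10
    mdadm --stop /dev/md0
    mdadm --zero-superblock /dev/sda /dev/sdb
    mdadm /dev/md1 --add /dev/sda /dev/sdb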

6. What is the difference between RAID1+0 and RAID10?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  RAID1+0 is a form of RAID in which a RAID0 stripe is laid across two RAID1
  arrays. To set it up, you create two RAID1 arrays and then create a RAID0
  array on top of the two md devices.

  The Linux kernel provides the RAID10 level to do pretty much exactly the
  same for you, but with greater flexibility (and somewhat improved
  performance). While RAID1+0 makes sense with 4 disks, RAID10 can be
  configured to work with only 3 disks. Also, RAID10 has a little less
  overhead than RAID1+0, which has data pass the md layer twice.

  I prefer RAID10 over RAID1+0.
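
  To illustrate the difference, a sketch with hypothetical device names:

    # classic RAID1+0: two RAID1 mirrors, combined by a RAID0 stripe
    mdadm --create /dev/md1 -l 1 -n 2 /dev/sda1 /dev/sdb1
    mdadm --create /dev/md2 -l 1 -n 2 /dev/sdc1 /dev/sdd1
    mdadm --create /dev/md3 -l 0 -n 2 /dev/md1 /dev/md2

    # the RAID10 personality does the equivalent in a single array
    mdadm --create /dev/md1 -l 10 -n 4 /dev/sd[abcd]1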

7. Which RAID10 layout scheme should I use?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  RAID10 gives you the choice between three ways of laying out the blocks on
  the disk. Assuming a simple 4 drive setup with 2 copies of each block, then
  if A,B,C are data blocks, a,b their parts, and 1,2 denote their copies, the
  following would be a classic RAID1+0, where hdd1,hdd2 and hdd3,hdd4 are
  RAID1 pairs combined into a RAID0:

  near=2 would be (this is the classic RAID1+0)

    hdd1  Aa1 Ba1 Ca1
    hdd2  Aa2 Ba2 Ca2
    hdd3  Ab1 Bb1 Cb1
    hdd4  Ab2 Bb2 Cb2

  offset=2 would be

    hdd1  Aa1 Bb2 Ca1 Db2
    hdd2  Ab1 Aa2 Cb1 Ca2
    hdd3  Ba1 Ab2 Da1 Cb2
    hdd4  Bb1 Ba2 Db1 Da2

  far=2 would be

    hdd1  Aa1 Ca1  .... Bb2 Db2
    hdd2  Ab1 Cb1  .... Aa2 Ca2
    hdd3  Ba1 Da1  .... Ab2 Cb2
    hdd4  Bb1 Db1  .... Ba2 Da2

  Here the second set of copies starts half-way through the drives.
  
  The advantage of far= is that you can easily spread a long sequential read
  across the drives. The cost is more seeking for writes. offset= can
  possibly get similar benefits with a large enough chunk size. Neither
  upstream nor the Debian maintainer has tried to understand all the
  implications of that layout; it was added simply because it is a supported
  layout in DDF, and DDF support is a goal.
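
  To select a layout at creation time, a sketch (device names are examples):

    # --layout/-p takes n<copies>, o<copies>, or f<copies>, e.g. far with
    # two copies:
    mdadm --create /dev/md1 -l 10 -n 4 -p f2 /dev/sd[abcd]1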

8. (One of) my RAID arrays is busy and cannot be stopped. What gives?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  It is perfectly normal for mdadm to report the array holding the root
  filesystem as busy on shutdown. The reason is that the root filesystem must
  be mounted in order to stop the array (or else /sbin/mdadm does not exist),
  but to stop the array, the root filesystem must not be mounted: a classic
  catch-22. The kernel actually stops the array just before halting, so all
  is well.

  If mdadm cannot stop other arrays on your system, check that these arrays
  are not in use anymore. Common causes for busy/locked arrays are (see the
  example checks after this list):

    * The array contains a mounted filesystem (check the `mount' output)
    * The array is used as a swap backend (check /proc/swaps)
    * The array is used by the device-mapper (check with `dmsetup')
      * LVM
      * dm-crypt
      * EVMS
    * The array is used by a process (check with `lsof')
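
  A sketch of the corresponding checks, assuming the busy array is /dev/md0
  (adjust the device name):

    mount | grep md0        # mounted filesystem?
    grep md0 /proc/swaps    # used as swap?
    dmsetup deps            # used by the device-mapper (LVM/dm-crypt/EVMS)?
    lsof /dev/md0           # still open by a process?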
  
9. Should I use RAID0 (or linear)?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  No.

9b. Why not?
~~~~~~~~~~~~
  RAID0 has zero redundancy. If you stripe a RAID0 across X disks, you
  increase the likelihood of complete loss of the filesystem by a factor of X.

  The same applies to LVM by the way.

  If you want or must use LVM or RAID0, put it on top of RAID1 arrays
  (RAID10/RAID1+0, or LVM on RAID1), as sketched below.
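
  A minimal sketch of LVM on RAID1 (hypothetical device and volume group
  names):

    mdadm --create /dev/md0 -l 1 -n 2 /dev/sda1 /dev/sdb1
    pvcreate /dev/md0
    vgcreate vg0 /dev/md0
    lvcreate -L 10G -n data vg0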

10. Can I cancel a running array check (checkarray)?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  See the -x option in the `checkarray --help` output.

11. mdadm warns about duplicate/similar superblocks; what gives?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In certain configurations, especially if your last partition extends all the
  way to the end of the disk, mdadm may display a warning like:
   
    mdadm: WARNING /dev/hdc3 and /dev/hdc appear to have very similar
    superblocks. If they are really different, please --zero the superblock on
    one. If they are the same or overlap, please remove one from the DEVICE
    list in mdadm.conf.

  There are two ways to solve this:

  (a) recreate the arrays with version-1 superblocks, which is not always an
      option -- you cannot yet upgrade version-0 to version-1 superblocks for
      existing arrays.

  (b) instead of 'DEVICE partitions', list exactly those devices that are
      components of MD arrays on your system. So in the above example:

        - DEVICE partitions
        + DEVICE /dev/hd[ab]* /dev/hdc[123]

12. mdadm -E / mkconf report different arrays with the same device
    name / minor number. What gives?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In almost all cases, mdadm updates the super-minor field in an array's
  superblock when assembling the array. It does *not* do this for RAID0
  arrays. Thus, you may end up seeing something like this when you run 
  mdadm -E or mkconf:

    ARRAY /dev/md0 level=raid0 num-devices=2 UUID=abcd...
    ARRAY /dev/md0 level=raid1 num-devices=2 UUID=dcba...

  Note how the two arrays have different UUIDs but both appear as /dev/md0.

  The solution in this case is to explicitly tell mdadm to update the
  superblock of the RAID0 array. Assuming that the RAID0 array in the above
  example should really be /dev/md1:

    mdadm --stop /dev/md1
    mdadm --assemble --update=super-minor --uuid=abcd... /dev/md1

  See also http://bugs.debian.org/386315 and recipe #12 in README.recipes .

13. Can a MD array be partitioned?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  For an MD array to be able to hold partitions, it must be created as
  a "partitionable array", either by passing auto=part on the command line or
  in the configuration file, or by using the standard naming scheme
  (md_d* or md/d*) for partitionable arrays:

    mdadm --create --auto=yes ... /dev/md_d0 ...
    # see mdadm(8) manpage about the values of the --auto keyword
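
  Once the array exists, it can be partitioned like any other block device;
  a sketch (partitioning tool and filesystem choice are just examples):

    fdisk /dev/md_d0
    mkfs.ext3 /dev/md_d0p1    # partitions appear as /dev/md_d0pN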

14. When would I use partitionable arrays?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  This answer by Doug Ledford is shamelessly adapted from [0] (with
  permission):

    First, not all MD types make sense to be split up, e.g. multipath. For
    those types, when a disk fails, the *entire* disk is considered to have
    failed, but with different arrays you won't switch over to the next path
    until each MD array has attempted to access the bad path. This can have
    obvious bad consequences for certain array types that do automatic
    failover from one port to another (you can end up getting the array in
    a loop of switching ports repeatedly to satisfy the fact that one array
    failed over during a path down, then the path came back up, and another
    array stayed on the old path because it didn't send any commands during
    the path down time period).

    Second, convenience. Assume you have a 6 disk RAID5 array. If a disk
    fails and you are using a partitioned MD array, then all the partitions on
    the disk will already be handled without using that disk. No need to
    manually fail any still active array members from other arrays.

    Third, safety. Again with the raid5 array. If you use multiple arrays on
    a single disk, and that disk fails, but it only failed on one array, then
    you now need to manually fail that disk from the other arrays before
    shutting down or hot swapping the disk. Generally speaking, that's not
    a big deal, but people do occasionally have fat finger syndrome and this
    is a good opportunity for someone to accidentally fail the wrong disk, and
    when you then go to remove the disk you create a two disk failure instead
    of one and now you are in real trouble.

    Fourth, to respond to what you wrote about the arrays being independent
    of each other -- part of the reason why you partition: I would argue
    that's not true. If
    your goal is to salvage as much use from a failing disk as possible, then
    OK. But, generally speaking, people that have something of value on their
    disks don't want to salvage any part of a failing disk, they want that
    disk gone and replaced immediately. There simply is little to no value in
    an already malfunctioning disk. They're too cheap and the data stored on
    them too valuable to risk losing something in an effort to further
    utilize broken hardware. This of course is written with the understanding
    that the latest MD RAID code will do read error rewrites to compensate for
    minor disk issues, so anything that will throw a disk out of an array is
    more than just a minor sector glitch.

  0. http://marc.theaimsgroup.com/?l=linux-raid&m=116117813315590&w=2

15. How can I start a dirty degraded array?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  A degraded array (e.g. a RAID5 with only two disks) that has not been
  properly stopped cannot be assembled just like that; mdadm will refuse and
  complain about a "dirty degraded array".

  The solution is to force-assemble it, and then to start it. Please see
  recipes 4 and 4b of /usr/share/doc/mdadm/README.recipes.gz .
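
  A hedged sketch (array and component names are examples; see the recipes
  for the authoritative procedure):

    mdadm --assemble --force /dev/md0 /dev/sd[ab]1
    mdadm --run /dev/md0    # only needed if the array did not start above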

 -- martin f. krafft <madduck@debian.org>  Wed, 18 Oct 2006 15:56:32 +0200

$Id$