path: root/managemon.c
Commit message, Author, Age
* mdmon: allow prepare_update to report failure.NeilBrown2014-07-10
| | | | | | | | | If 'prepare_update' fails for some reason there is little point continuing on to 'process_update'. For now only malloc failures are caught, but other failures will be considered in future. Signed-off-by: NeilBrown <neilb@suse.de>
* managemon: fix a dprintk.NeilBrown2013-09-10
| | | | | | | There is no guarantee that 'inst' is a number, and even if there were, there is no point converting it str->int and then int->str again. Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: make sure we set safe_mode on SIGTERM.NeilBrown2013-09-02
| | | | | | | | | | | Without this, array may not go clean and mdmon will then not exit. A safe_mode of '0' (which is the only one that is handled differently by this patch) means "never switch to 'active_idle'". We don't want that when mdmon is stopping. Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: don't use 'ghost' values from an inactive array.NeilBrown2013-08-05
| | | | | | | | | | | | It is possible for mdmon to see (in /proc/mdstat) an array in 'inactive' state after "mdadm -S" has written "inactive" to "array_state". In this state values such as "raid_disk" are not meaningful and so should be ignored by manage_member(). Reported-by: "Dorau, Lukasz" <lukasz.dorau@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* managemon: fix typo affecting incremental assembly.NeilBrown2013-08-05
| | | | | | | | This clearly should be 'st2'. As it is the 'raid_disk' value being tested is completely meaningless in the context of the new device. Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: always get layout from sysfsmwilck@arcor.de2013-08-05
| | | | | | | commit 71d68ff62 uses the array layout. It needs to be initialized. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: clear safe_mode_delay on shutdownNeilBrown2013-08-01
| | | | | | | When we receive a signal, set the safemode delay to a very small value so that we can get clean arrays and exit quickly. Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: manage_member: fix race condition during slow meta data writesMartin Wilck2013-07-31
| | | | | | | | | | | | | | | | | | | | | | | | In order to track kernel state changes, the monitor needs to notice changes in sysfs. If the changes are transient, and the monitor is busy writing meta data, it can happen that the changes are missed. This will cause the meta data to be inconsistent with the real state of the array. I can reproduce this in a test scenario with a DDF container and two subarrays, where I set a disk to "failed" and then add a global hot-spare. On a typical MD test setup with loop devices, I can reliably reproduce a failure where the metadata show degraded members although the kernel finished the recovery successfully. This patch fixes this problem by applying two changes. First, when a metadata update is queued, wait until it is certain that the monitor actually applied these meta data (the for loop is actually needed to avoid failures completely in my test case). Second, after triggering the recovery, set prev_state of the changed array to "recover", in case the monitor misses the transient "recover" state. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: manage_member: debug messages for array stateMartin Wilck2013-07-31
| | | | | | | Add debug messages to watch the manager's steps. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
* Remove lots of unnecessary white space.NeilBrown2013-06-19
| | | | | | | Now that I am using white-space mode in Emacs I can see all of this, and I don't like it :-) Signed-off-by: NeilBrown <neilb@suse.de>
* pr_err for mdmon.NeilBrown2013-05-21
| | | | Signed-off-by: NeilBrown <neilb@suse.de>
* Add updating component_size to manager thread of mdmonPawel Baldysiak2013-04-08
| | | | | | | | | | | | | | | | | | Mdmon does not update component_size now. It is wrong because in case of size's expansion component_size is changed by mdadm but mdmon does not reread its new value and uses a wrong, old one. As a result the metadata is incorrect during size's expansion. It contains no information that resync is in progress (there is no checkpoint too). The metadata is as if resync has already been finished but it has not. Component_size will be set to match information in sysfs. This value will be updated by manager thread in manage_member() function. Now mdmon uses the correct, current value of component_size and the correct metadata (containing information about resync and checkpoint) is written. Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* Discard devnum in favour of devnmNeilBrown2013-02-21
| | | | | | | | | | | | | | We widely use a "devnum" which is 0 or +ve for md%d devices and -ve for md_d%d devices. But I want to be able to use md_%s device names. So get rid of devnum (a number) and use devnm (a 32char string). eg. md0 md_d2 md_home Signed-off-by: NeilBrown <neilb@suse.de>
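A minimal sketch of the mapping this commit describes, from the old numeric devnum to a 32-character devnm string. The function name `devnum_to_devnm` is hypothetical and not part of mdadm, and the rule that a negative devnum n names `md_d(-1-n)` is an assumption about the old encoding, which the commit message does not spell out.

```c
#include <stdio.h>

/* Hypothetical helper (not the mdadm API): write the device name for a
 * legacy numeric devnum into a caller-supplied 32-byte buffer.
 * Assumption: a negative devnum n denotes the partitionable device
 * md_d(-1-n); the commit message only says such numbers are negative. */
const char *devnum_to_devnm(int devnum, char devnm[32])
{
	if (devnum >= 0)
		snprintf(devnm, 32, "md%d", devnum);
	else
		snprintf(devnm, 32, "md_d%d", -1 - devnum);
	return devnm;
}
```

A name such as md_home has no numeric equivalent at all, which is the case the string form exists to cover.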
* Allow data-offset to be specified per-device for createNeilBrown2012-10-04
| | | | | | | | | mdadm --create /dev/md0 .... /dev/sda1:1024 /dev/sdb1:2048 ... The size is in K unless a suffix: K M G is given. The suffix 's' means sectors. Signed-off-by: NeilBrown <neilb@suse.de>
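The suffix rule above can be sketched as a small parser for the part after the colon. This is only an illustration: `parse_data_offset` is a hypothetical name, the choice to return 512-byte sectors is an assumption, and overflow of very large values is not checked.

```c
#include <stdlib.h>

/* Hypothetical parser (not mdadm's own code): convert a per-device
 * data-offset string such as "1024", "2048s", "1M" or "1G" into
 * 512-byte sectors. Returns -1 for malformed input. */
long long parse_data_offset(const char *spec)
{
	char *end;
	long long n;

	if (spec == NULL || *spec == '\0')
		return -1;
	n = strtoll(spec, &end, 10);
	if (end == spec || n < 0)
		return -1;

	switch (*end) {
	case '\0':                    /* no suffix: value is in K */
	case 'K':
		n *= 2;               /* 1 K = 2 sectors */
		break;
	case 'M':
		n *= 2 * 1024;
		break;
	case 'G':
		n *= 2 * 1024 * 1024;
		break;
	case 's':                     /* already in sectors */
		break;
	default:
		return -1;
	}
	/* Reject anything following a one-character suffix. */
	if (*end != '\0' && end[1] != '\0')
		return -1;
	return n;
}
```

With this sketch, "1024" and "1M" both come out as 2048 sectors, while "2048s" is taken as 2048 sectors directly.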
* Remove scattered checks for malloc success.NeilBrown2012-07-09
| | | | | | | | | | | | | | malloc should never fail, and if it does it is unlikely that anything else useful can be done. Best approach is to abort and let some super-daemon restart. So define xmalloc, xcalloc, xrealloc, xstrdup which don't fail but just print a message and exit. Then use those removing all the tests for failure. Also replace all "malloc;memset" sequences with 'xcalloc'. Signed-off-by: NeilBrown <neilb@suse.de>
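A rough sketch of the wrapper pattern this commit describes. The four function names come from the commit message; the helper `check_alloc`, the wording of the message and the exit status are assumptions, and this is not the code that went into mdadm.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Shared failure path: print a message and exit rather than return NULL.
 * The message text and exit status 4 are illustrative choices. */
static void *check_alloc(void *p)
{
	if (p == NULL) {
		fprintf(stderr, "memory allocation failure - aborting\n");
		exit(4);
	}
	return p;
}

void *xmalloc(size_t len)
{
	/* Ask for at least one byte so a zero-length request cannot
	 * legitimately return NULL. */
	return check_alloc(malloc(len ? len : 1));
}

void *xcalloc(size_t num, size_t size)
{
	/* Zeroed memory: replaces a "malloc; memset" sequence. */
	return check_alloc(calloc(num ? num : 1, size ? size : 1));
}

void *xrealloc(void *ptr, size_t len)
{
	return check_alloc(realloc(ptr, len ? len : 1));
}

char *xstrdup(const char *str)
{
	size_t len = strlen(str) + 1;
	return memcpy(xmalloc(len), str, len);
}
```

Callers can then use the returned pointer directly, with no failure test at each call site.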
* Fix: Sometimes mdmon throws core dump during reshapeAdam Kwolek2012-02-09
| | | | | | | | | | | | | | | | | | | | Problem was found during reshaping 2 volumes /raid0 and raid5/ in container. Sometimes mdmon throws core dump due to NULL pointer exception. Problem occurs in scenario: - managemon: is about spare activation (degraded raid4 volume == raid0 under takeover) - managemon: detect level change and signals monitor (manage_member() calls replace_array()) - monitor: detects transition raid4/5->raid0 and sets a->container to NULL to indicate array deactivation - managemon : continues his work and tries to activate spare (a->check_degraded is set). NULL pointer is passed to metadata handler activate_spare() Core dump is generated. To resolve this situation managemon (after monitor kick) checks again a->container pointer to learn if current array is not to be deactivated. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* close_aa(): Verify file descriptors are valid before trying to close themJes Sorensen2011-11-03
| | | | | Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
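The check named in the title of this commit might look roughly like the following. The struct and its field names are a hypothetical stand-in for the per-array descriptors that mdmon keeps, and the return values exist only to make the behaviour easy to observe.

```c
#include <unistd.h>

/* Hypothetical subset of the per-array sysfs descriptors. */
struct array_fds {
	int action_fd;
	int state_fd;
	int resync_start_fd;
};

/* Close *fd only if it holds a valid descriptor, then mark it closed.
 * Returns 1 if a descriptor was closed, 0 if there was nothing to do. */
int close_if_valid(int *fd)
{
	if (*fd < 0)
		return 0;
	close(*fd);
	*fd = -1;
	return 1;
}

/* Returns how many descriptors were actually closed. */
int close_array_fds(struct array_fds *a)
{
	return close_if_valid(&a->action_fd)
	     + close_if_valid(&a->state_fd)
	     + close_if_valid(&a->resync_start_fd);
}
```

Resetting each field to -1 also makes a second call harmless, which matters if teardown can be reached by more than one path.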
* disk_init_and_add(): Fail if opening sysfs file descriptors failJes Sorensen2011-11-03
| | | | | Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
* FIX: Mdmon crashes after changing RAID level from 1 to 0Lukasz Dorau2011-09-06
| | | | | | | | | | | | | | | | | | | | | | | | | Description of the bug: Sometimes mdmon crashes after changing RAID level from 1 to 0 (takeover). Cause of the bug: The managemon marks an active_array for removal from monitoring by assigning a->container to NULL value (in the "manage_member" function). Sometimes (during stress test) it happens right when the monitor is in the "read_and_act" function and a->container pointer is in use. This causes the monitor to crash. Solution: The active array has to be marked for removal in another way than setting NULL pointer when it can be in use. A new field "to_remove" was added to the "active_array" structure. It is used in the managemon to mark a container to remove (instead of the old assignment: a->container = NULL) and monitor checks it to determine if the array should be removed. The field "to_remove" should be checked in some other places to avoid managing of the array which is going to be removed. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: fix, close spare activation raceDan Williams2011-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The following test fails when the md_check_recovery() event triggered by the ro->rw transition causes remove_and_add_spares() to run while mdmon is attempting spare activation. Result is that the kernel races to set the slot immediately after sysfs_add_disk() writes new_dev. mdmon thinks the spare activation failed and declines to send the monitor a new active_array. We show degraded after the wait because the monitor cannot notify the metadata that all disks are in_sync.
    #!/bin/bash
    i=0
    false
    while [ $? == 1 ]
    do
        i=$((i+1))
        mdadm -Ss
        mdadm -CR /dev/md0 /dev/loop[0-2] -n 3 -e imsm
        mdadm -CR /dev/md1 /dev/loop[01] missing -n 3 -l 5
        mdadm --wait /dev/md1
        mdadm -E /dev/loop2 | grep -i degraded
    done
    echo "failed: $i"
    Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* FIX: After discarding array give chance monitor to remove itAdam Kwolek2011-04-05
| | | | | | | | | | | | | | | | When raid0 expansion occurs, a takeover operation is used. After the backward takeover the monitor remains in memory. This happens because the just-removed active array remains in mdmon's structures. If there are no other monitored arrays, mdmon has to finish its work. The problem was introduced in the patch (2011.03.22): mdmon: Stop keeping track of RAID0 (and LINEAR) arrays. Prior to that patch the monitor was kicked via replace_array(), where wakeup_monitor() was called. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: Stop keeping track of RAID0 (and LINEAR) arrays.NeilBrown2011-03-22
| | | | | | | | | | | | Tracking RAID0 arrays doesn't really work. There is no need, and there are some sysfs files which won't exist when the array appears and then won't be opened when the level is changed. So simply ignore RAID0 and LINEAR arrays - don't add them when they appear and if an array we are monitoring turns into one of these, discard it promptly. Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: allow manage_member to cope with ->container becoming NULL.NeilBrown2011-03-22
| | | | | | | | | As monitor() can set ->container to NULL, we need to be careful about dereferencing it. So take a copy in manage_member, return if it is NULL, and only use the copy. Signed-off-by: NeilBrown <neilb@suse.de>
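The "take a copy, return if NULL, use only the copy" rule from this commit can be sketched as follows. The struct layouts and the function name are simplified stand-ins rather than mdmon's real definitions, and the sketch is single-threaded: real concurrent code would also need the read of the shared pointer to be a single atomic load.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the real structures. */
struct supertype {
	int updates_pending;
};

struct active_array {
	struct supertype *container;  /* another thread may set this to NULL */
};

/* Returns -1 if the array is being deactivated, otherwise a value read
 * through the local copy of the container pointer. */
int manage_member_sketch(struct active_array *a)
{
	/* Read the shared pointer exactly once. */
	struct supertype *container = a->container;

	if (container == NULL)
		return -1;

	/* From here on, only the local copy is dereferenced. */
	return container->updates_pending;
}
```

The point of the copy is that a later change to `a->container` cannot turn an already-checked pointer into a NULL dereference partway through the function.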
* Merge branch 'master' into devel-3.2NeilBrown2011-03-15
|\ | | | | | | | | | | | | | | Conflicts: Manage.c managemon.c super-ddf.c super-intel.c
| * ddf: implement remove_from_superNeilBrown2011-03-15
| | | | | | | | | | | | | | | | | | | | This is needed to remove devices from mdmon's knowledge when the device is removed from the md container. Now that ddf has a remove_from_super we don't need the code that allows some personalities not to implement this. Signed-off-by: NeilBrown <neilb@suse.de>
| * IMSM: Fix problem in mdmon monitor of using removed disk in imsm container.Labun, Marcin2011-03-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Manager thread shall pass the information to monitor thread (mdmon) that some devices are removed from container. Otherwise, monitor (mdmon) might use such devices (spares) to rebuild the array that has gone degraded. This problem happens for imsm containers, since a list of the container disks is maintained in intel_super structure. When array goes degraded, the list is searched to find a spare disk to start rebuild. Without this fix the rebuild could be started on the spare device that was a member of the container, but has been removed from it. New super type function handler has been introduced to prepare metadata format specific information about removed devices. int (*remove_from_super)(struct supertype *st, mdu_disk_info_t *dinfo) The message prepared in remove_from_super is later processed by process_update handler in monitor thread. Signed-off-by: Marcin Labun <marcin.labun@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
| * managemon: Don't do spare assignment while any updates are pending.NeilBrown2011-03-15
| | | | | | | | | | | | | | | | Spare assignment requires full knowledge of array state. A pending update might modify that state (such as a pending spare assignment) so don't try while there are updates pending. Signed-off-by: NeilBrown <neilb@suse.de>
| * mdmon: don't copy an invalid chunk_sizeNeilBrown2011-03-10
| | | | | | | | | | | | | | | | | | As chunk_size in mdstat_ent is never set, we shouldn't copy it into a->info.array. In fact, it is safest to get rid of the field altogether. Reported-by: "Kwolek, Adam" <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | ddf: implement remove_from_superNeilBrown2011-03-14
| | | | | | | | | | | | | | | | | | | | This is needed to remove devices from mdmon's knowledge when the device is removed from the md container. Now that ddf has a remove_from_super we don't need the code that allows some personalities not to implement this. Signed-off-by: NeilBrown <neilb@suse.de>
* | FIX: Last_checkpoint has to be initialized in per disk unitsAdam Kwolek2011-03-14
| | | | | | | | | | | | | | | | | | last_checkpoint is variable that tracks sync_complete sysfs entry. sync_complete is per disk counter, so initializing during starting from checkpoint has to have this in mind and convert reshape position properly. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | FIX: Last checkpoint is not initialized on reshape restartAdam Kwolek2011-03-14
| | | | | | | | | | | | | | | | | | | | | | | | When reshape is restarted and active array in mdmon is being initialized, mdmon has to know last checkpoint, otherwise reshape will be restarted from '0' position. mdadm when reshaped array is assembled stores reshape_position in sysfs and runs mdmon. Initialize last_checkpoint in active array structure to value present in sysfs for reshaped array start. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | imsm: FIX: array size is wrongAdam Kwolek2011-02-03
| | | | | | | | | | | | | | | | Calculation of size is almost ok, except concept of blocks. Size for setting in md has to be divided by 2 to be correct. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | managemon: don't try to add spares when resync/recovery is happening.NeilBrown2011-02-01
| | | | | | | | | | | | | | kernel should reject this anyway, and we really should not be trying as it can only lead to confusion. Signed-off-by: NeilBrown <neilb@suse.de>
* | Detect level changeAdam Kwolek2011-01-06
| | | | | | | | | | | | | | | | | | | | For level migration support it is necessary to allow mdmon to react to level changes. It has to be able to change the configuration of an active array and, when the array level changes to raid0, finish monitoring the array. Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | Handle checkpointing during reshapeNeilBrown2010-12-16
| | | | | | | | | | | | | | | | | | We need to allow metadata to handle progress of reshape, completion, and abort-before-start. Include all those in ->set_array_state() Signed-off-by: NeilBrown <neilb@suse.de>
* | Allow a metadata update to have a linked list of allocated spaces.NeilBrown2010-12-16
| | | | | | | | | | | | | | | | | | | | | | | | Sometimes one metadata update will require allocating several larger data structures. As 'monitor' cannot allocate, 'manager' must, so it must be able to attach a list of allocations to the update, and importantly it must be able to easily free them. So add a 'space_list' element to metadata updates where each element on the list starts with a pointer to the next. Signed-off-by: NeilBrown <neilb@suse.de>
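One way to picture a list where "each element on the list starts with a pointer to the next" is the sketch below. The helper names `space_list_push` and `space_list_free` are hypothetical, and the use of `void **` as the list type is an assumption made for illustration.

```c
#include <stdlib.h>

/* Hypothetical helpers: each block is laid out as [next pointer][payload].
 * The manager side would build the list before handing it over. */

/* Allocate a block with room for `payload` bytes after the link and put
 * it at the head of the list. Returns the new head, or NULL on failure
 * (in which case the old list is left untouched). */
void **space_list_push(void **list, size_t payload)
{
	void **blk = malloc(sizeof(void *) + payload);

	if (blk == NULL)
		return NULL;
	*blk = list;          /* first word of the block links to the rest */
	return blk;
}

/* Free every block on the list; returns how many were freed. */
size_t space_list_free(void **list)
{
	size_t n = 0;

	while (list != NULL) {
		void **next = *list;  /* read the link before freeing */
		free(list);
		list = next;
		n++;
	}
	return n;
}
```

Because the link sits at the start of every block, the freeing loop needs no knowledge of what each block holds, which is what makes the list easy to release in one pass.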
* | mdmon: when a reshape is detected, add any newly added devices to the array.NeilBrown2010-12-16
| | | | | | | | | | | | | | | | | | When mdadm starts a reshape, it might add some devices to the array first. mdmon needs to notice the reshape starting and check for any new devices. If there are any they need to be provided to be monitored. Signed-off-by: NeilBrown <neilb@suse.de>
* | IMSM: Fix problem in mdmon monitor of using removed disk in imsm container.Labun, Marcin2010-12-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Manager thread shall pass the information to monitor thread (mdmon) that some devices are removed from container. Otherwise, monitor (mdmon) might use such devices (spares) to rebuild the array that has gone degraded. This problem happens for imsm containers, since a list of the container disks is maintained in intel_super structure. When array goes degraded, the list is searched to find a spare disk to start rebuild. Without this fix the rebuild could be started on the spare device that was a member of the container, but has been removed from it. New super type function handler has been introduced to prepare metadata format specific information about removed devices. int (*remove_from_super)(struct supertype *st, mdu_disk_info_t *dinfo) The message prepared in remove_from_super is later processed by process_update handler in monitor thread. Signed-off-by: Marcin Labun <marcin.labun@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | FIX: sync_completed_fd handler has to be closedAdam Kwolek2010-12-03
| | | | | | | | | | | | | | | | sync_completed_fd handler has to be closed when array is closing. This is in pair to open handler code. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | mdmon: don't copy an invalid chunk_sizeNeilBrown2010-11-30
| | | | | | | | | | | | | | | | | | As chunk_size in mdstat_ent is never set, we shouldn't copy it into a->info.array. In fact, it is safest to get rid of the field altogether. Reported-by: "Kwolek, Adam" <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | block monitor: freeze spare assignment for external arraysDan Williams2010-11-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to support reshape and atomic removal of spares from containers we need to prevent mdmon from activating spares. In the reshape case we additionally need to freeze sync_action while the reshape transaction is initiated with the kernel and recorded in the metadata. When reshaping a raid0 array we need to freeze the array *before* it is transitioned to a redundant raid level. Since sync_action does not exist at this point we extend the '-' prefix of a subarray string to flag mdmon not to activate spares. Mdadm needs to be reasonably certain that the version of mdmon in the system honors this 'freeze' indication. If mdmon is not already active then we assume the version that gets started is the same as the mdadm version. Otherwise, we check the version of mdmon as returned by the extended ping_monitor() operation. This is to catch cases where mdadm is upgraded in the filesystem, but mdmon started in the initramfs is from a previous release. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | Provide a mdstat_ent to subarray helperDan Williams2010-11-23
| | | | | | | | | | | | | | ...before introducing another open coded instance of this conversion. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | get_info_super: report which other devices are thought to be working/failed.NeilBrown2010-11-22
|/ | | | | | | | | | | | | | To accurately detect when an array has been split and is now being recombined, we need to track which other devices each thinks is working. We should never include a device in an array if it thinks that the primary device has failed. This patch just allows get_info_super to return a list of devices and whether they are thought to be working or not. Signed-off-by: NeilBrown <neilb@suse.de>
* Merge branch 'fixes' into for-neilDan Williams2010-07-01
|\
| * Always assume SKIP_GONE_DEVS behaviour and kill the flagDan Williams2010-06-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ...i.e. GET_DEVS == (GET_DEVS|SKIP_GONE_DEVS) A null pointer dereference in Incremental.c can be triggered by replugging a disk while the old name is in use. When mdadm -I is called on the new disk we fail the call to sysfs_read(). I audited all the locations that use GET_DEVS and it appears they can tolerate missing a drive. So just make SKIP_GONE_DEVS the default behaviour. Also fix up remaining unchecked usages of the sysfs_read() return value. Reported-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | mdmon: periodically checkpoint recoveryDan Williams2010-05-14
|/ | | | | | | | | | | | The kernel updates and notifies md/sync_completed when it is time to take a checkpoint. When this occurs (at 1/16 array size intervals) write 'idle' to md/sync_action to have the current recovery position updated in recovery_start and resync_start. Requires the metadata handler to reset ->last_checkpoint when it has determined that recovery has ended. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: fix missing open of md/<dev>/recovery_startDan Williams2010-04-29
| | | | | | | | | When activating a spare we neglect to open recovery_start and as such do not see checkpoint events. Move disk initialization to common routine to mitigate recurrence. Reported-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: insist on creating .pid file at startup.NeilBrown2010-02-08
| | | | | | | | | | | | | | | | Now that we don't "mdadm --takeover" until /var/run is writable there is no need to continually try to create files in there. So only create these files at startup and fail if they cannot be made. This means that to start an array with externally managed metadata, either /var/run or ALT_RUN (e.g. /lib/init/rw) must be writable. To 'takeover' from a previous mdmon instance, /var/run must be writable. This means we don't need to worry about SIGHUP (which was once used to tell us it was time to create .pid) and SIGALRM. Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: don't monitor /proc/mounts to decide when to create .pid file.NeilBrown2010-02-08
| | | | | | | | | | | Monitoring /proc/mounts and creating a .pid file as soon as /var/run is writable is racy. Most distros clean all non-directories from /var/run early in boot and if mdmon races with this it could lose the files as soon as they are created. Instead require that "mdmon --takeover" be run after /var is writable. Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: allow pid to be stored in different directory.NeilBrown2010-02-04
| | | | | | | | /var/run probably doesn't persist from early boot. So if necessary, store it in /lib/init/rw or somewhere else that does persist. Signed-off-by: NeilBrown <neilb@suse.de>