path: root/monitor.c
Commit message (Author, Age)
* Monitor: write meta data in readonly state, sometimes (mwilck@arcor.de, 2013-10-16)
|   This patch reverts 24a216bf: "Monitor: Don't write metadata in inactive array state". While it's true that writing meta data is usually not necessary in readonly state, there is one important exception: if a disk goes faulty, we want to record that, even if the array is inactive. We might as well just revert 24a216bf, because with the recently submitted patch "Monitor: don't set arrays dirty after transition to read-only", the meta data writes that were really annoying (for a clean, readonly, healthy array during startup) are gone anyway. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: don't set arrays dirty after transition to read-only (mwilck@arcor.de, 2013-10-16)
|   This patch reverts commit 4867e068. Setting arrays dirty after transition from inactive to anything else causes unnecessary meta data writes and may wreak trouble unnecessarily when a disk was missing during assembly but the array was never written to. The reason for 4867e068 was a special situation during reshape from RAID0 to RAID4. I ran all IMSM test cases with it reverted and found no regressions, so I believe the reshape logic for IMSM works fine in mdadm 3.3 also without this. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: wait_and_act: fix debug message for SIGUSR1 (Martin Wilck, 2013-07-31)
|   Correctly print out wake reason if it was a signal. Previous code would print misleading select events (pselect(2) man page says the fdsets become undefined in case of error). Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
* monitor: read_and_act: log status when called (Martin Wilck, 2013-07-31)
|   read_and_act() currently prints a debug message only very late. Print the status seen by mdmon right away, to track mdmon's actions more closely. Add a time stamp to observe long delays between read_and_act calls, e.g. caused by meta data writes. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: Don't write metadata in inactive array state (mwilck@arcor.de, 2013-07-09)
|   The kernel docs state that meta data is never written in states clear, inactive, suspended, readonly, and read_auto. Why should this be different for containers? We need to write metadata when the array is disabled, though. Tested with the DDF (10*) and IMSM (9*) tests, works. Signed-off-by: NeilBrown <neilb@suse.de>
* monitor: treat unreadable array_state as clean (mwilck@arcor.de, 2013-04-23)
|   Failure to read array_state can only mean the array has been deleted by the kernel; it is not an indication that the array is dirty. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
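A minimal sketch of the rule in "treat unreadable array_state as clean", assuming a helper that receives NULL when reading md/array_state failed. The enum and function names are hypothetical and cover only a few of the kernel's state strings; this is not mdmon's parser.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of md array states, for illustration only. */
enum sketch_array_state {
    ST_CLEAR, ST_INACTIVE, ST_READONLY, ST_CLEAN, ST_ACTIVE, ST_UNKNOWN
};

/* Classify the text read from md/array_state. buf == NULL means the read
 * failed; following the commit, that only happens once the kernel has
 * deleted the array, so it is reported as clean, never as dirty. */
static enum sketch_array_state sketch_classify_state(const char *buf)
{
    static const struct {
        const char *name;
        enum sketch_array_state st;
    } map[] = {
        { "clear", ST_CLEAR },       { "inactive", ST_INACTIVE },
        { "readonly", ST_READONLY }, { "clean", ST_CLEAN },
        { "active", ST_ACTIVE },
    };
    size_t i;

    if (buf == NULL)
        return ST_CLEAN;
    for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        size_t n = strlen(map[i].name);

        /* sysfs values end in '\n'; require a whole-word match */
        if (strncmp(buf, map[i].name, n) == 0 &&
            (buf[n] == '\0' || buf[n] == '\n'))
            return map[i].st;
    }
    return ST_UNKNOWN;
}
```

States not listed in the table (such as "active-idle") fall through to ST_UNKNOWN in this sketch.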
* monitor: read_and_act: handle race conditions for resync_start (mwilck@arcor.de, 2013-04-23)
|   When arrays are stopped, sysfs attributes may be deleted by the kernel, and attempts to read these attributes will fail. Setting resync_start to 0 is wrong in this case, because it may make is_resync_complete() erroneously return FALSE for a clean array. It is better to leave resync_start untouched (the previously read value for this array). Otherwise set_array_state() will pass the wrong state information to the metadata handler, which will write it to disk, and at the next restart an unnecessary recovery is started for the array. It is also possible that resync_start is actually *not* deleted yet when read_and_act is running, and an apparently valid value of "0" is read from it, with the same effect as described above. This happens if the kernel has already called md_clean() on the array (setting recovery_cp = 0), but the delayed removal of "resync_start" hasn't happened yet. Therefore, in "clear" state, "resync_start" shouldn't be read at all. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
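The two resync_start rules in this commit (skip the read in "clear" state, keep the previous value when the read fails) can be sketched as one decision function. sketch_choose_resync_start() and its parameters are hypothetical, and the inputs are assumed to be the raw sysfs text.

```c
#include <stdio.h>
#include <string.h>

typedef unsigned long long sketch_sector_t;

/* Hypothetical helper: pick the resync_start value to use for this pass.
 *   state       - text of md/array_state, e.g. "clear\n" or "active\n"
 *   sysfs_value - text of md/resync_start, or NULL if the read failed
 *   previous    - value remembered from the last successful read
 * Both failure modes keep the previous value instead of substituting 0,
 * which would make a clean array look as if it still needed a resync. */
static sketch_sector_t sketch_choose_resync_start(const char *state,
                                                  const char *sysfs_value,
                                                  sketch_sector_t previous)
{
    sketch_sector_t v;

    /* In "clear" state the kernel may already have reset the value to 0
     * before removing the attribute, so do not trust it at all. */
    if (strncmp(state, "clear", 5) == 0)
        return previous;
    /* The attribute was deleted while the array was being stopped. */
    if (sysfs_value == NULL)
        return previous;
    /* Simplification: anything non-numeric also keeps the old value. */
    if (sscanf(sysfs_value, "%llu", &v) != 1)
        return previous;
    return v;
}
```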
* monitor: don't call pselect() on deleted sysfs files (mwilck@arcor.de, 2013-04-23)
|   It makes no sense to listen for events on files that have been deleted. This happens when arrays are stopped and the kernel removes the associated sysfs structures. Calling pselect() on the deleted attributes may cause a storm of wake events. Signed-off-by: Martin Wilck <mwilck@arcor.de> Signed-off-by: NeilBrown <neilb@suse.de>
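One way to express "don't call pselect() on deleted sysfs files" is to skip such descriptors when building the fd set. This sketch assumes, hypothetically, that an fd slot is set to -1 once its attribute has been found deleted and closed; sketch_build_watch_set() is not an mdmon function.

```c
#include <sys/select.h>

/* Build the exception set from the sysfs fds still worth watching.
 * Slots holding -1 stand for attributes that were closed after the kernel
 * deleted them; watching those would only produce spurious wakeups.
 * Returns the highest fd added (for the nfds argument), or -1 if none. */
static int sketch_build_watch_set(const int *fds, int count, fd_set *set)
{
    int i, maxfd = -1;

    FD_ZERO(set);
    for (i = 0; i < count; i++) {
        if (fds[i] < 0)
            continue;           /* deleted attribute: do not watch */
        FD_SET(fds[i], set);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    return maxfd;
}
```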
* Discard devnum in favour of devnm (NeilBrown, 2013-02-21)
|   We widely use a "devnum" which is 0 or +ve for md%d devices and -ve for md_d%d devices. But I want to be able to use md_%s device names. So get rid of devnum (a number) and use devnm (a 32char string), e.g. md0, md_d2, md_home. Signed-off-by: NeilBrown <neilb@suse.de>
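For illustration, a conversion from an old devnum to the new devnm string. The mapping of negative numbers (-1 - N for md_dN) is an assumption, not something the commit message states, and sketch_devnum_to_devnm() is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical conversion from the old numeric devnum to a devnm string.
 * Assumption: a negative devnum encoded md_d device N as -1 - N, so -1 is
 * md_d0 and -3 is md_d2. The 32-byte buffer mirrors the "32char string"
 * from the commit message. */
static void sketch_devnum_to_devnm(int devnum, char devnm[32])
{
    if (devnum >= 0)
        snprintf(devnm, 32, "md%d", devnum);
    else
        snprintf(devnm, 32, "md_d%d", -1 - devnum);
}
```

Names such as md_home have no numeric form at all, which is the reason the commit moves to a string.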
* monitor: ensure we retry soon when 'remove' fails. (NeilBrown, 2012-01-03)
|   If a 'remove' fails there is no certainty that another event will happen soon, so make sure we retry soon anyway. Reported-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* monitor: make return from read_and_act more symbolic. (NeilBrown, 2012-01-03)
|   Rather than just a number, use a named flag. This makes the code easier to understand and allows room for returning more flags later. Signed-off-by: NeilBrown <neilb@suse.de>
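A sketch of named return flags from a read_and_act()-style function, combined with the "retry soon when 'remove' fails" behaviour from the newer commit above. The flag names, their values, and the 20 ms retry delay are assumptions chosen for illustration.

```c
/* Hypothetical named result flags, replacing a bare numeric return. */
#define SKETCH_ARRAY_DIRTY 1  /* array still has unrecorded changes     */
#define SKETCH_ARRAY_BUSY  2  /* an action (e.g. 'remove') failed; retry */

/* Choose how long the monitor loop should sleep before looking again.
 * A failed 'remove' may not be followed by another kernel event, so the
 * BUSY flag forces a short timeout instead of an indefinite wait. */
static int sketch_next_timeout_ms(int flags)
{
    if (flags & SKETCH_ARRAY_BUSY)
        return 20;      /* retry soon (arbitrary illustrative delay) */
    return -1;          /* -1: wait for the next event, no timeout */
}
```

Because the result is a bit mask, new conditions can be added later without changing existing callers, which is the extensibility the commit mentions.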
* monitor: don't unblock a device that isn't blocked. (NeilBrown, 2011-12-07)
|   When we see a failed device, we both unblock and remove it (after updating the metadata). However it might not be blocked as there can be a delay between unblocking and the device being free to be removed. If this happens the clearing of 'blocked' succeeds so md sends a sysfs notification and mdmon checks again and tries to clear 'blocked' again. Thus it enters a busy-loop until the 'remove' succeeds. To avoid this, only try to unblock if the device was blocked. Signed-off-by: NeilBrown <neilb@suse.de>
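The "only unblock if blocked" check amounts to looking for the blocked flag in the device's state before clearing it. A minimal sketch, assuming the state text is a comma-separated flag list as read from sysfs; sketch_dev_is_blocked() is hypothetical.

```c
#include <string.h>

/* Returns 1 only when "blocked" is one of the comma-separated flags in a
 * device state string such as "faulty,blocked\n". The caller would clear
 * 'blocked' only in that case, so no needless sysfs notification (and no
 * busy loop) is triggered for a device that is no longer blocked. */
static int sketch_dev_is_blocked(const char *state)
{
    const char *p = state;
    size_t n = strlen("blocked");

    while (*p) {
        if (strncmp(p, "blocked", n) == 0 &&
            (p[n] == '\0' || p[n] == ',' || p[n] == '\n'))
            return 1;
        p = strchr(p, ',');   /* advance to the next flag, if any */
        if (p == NULL)
            break;
        p++;
    }
    return 0;
}
```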
* FIX: Mdmon crashes after changing RAID level from 1 to 0 (Lukasz Dorau, 2011-09-06)
|   Description of the bug: Sometimes mdmon crashes after changing RAID level from 1 to 0 (takeover). Cause of the bug: The managemon marks an active_array for removal from monitoring by setting a->container to NULL (in the "manage_member" function). Sometimes (during stress test) this happens right when the monitor is in the "read_and_act" function and the a->container pointer is in use. This causes the monitor to crash. Solution: The active array has to be marked for removal in some way other than setting a pointer to NULL while it can be in use. A new field "to_remove" was added to the "active_array" structure. The managemon uses it to mark a container for removal (instead of the old assignment: a->container = NULL), and the monitor checks it to determine whether the array should be removed. The field "to_remove" should be checked in some other places to avoid managing an array that is going to be removed. Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
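A reduced sketch of the to_remove handoff described above: the manager thread sets a flag and the monitor thread unlinks the array at a point where it is not using it. Struct and function names are hypothetical, and the sketch deliberately leaves out the memory-ordering and freeing details a real two-thread implementation needs.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for mdmon's structures. */
struct sketch_container { int dummy; };

struct sketch_active_array {
    struct sketch_container *container; /* never NULLed while in use     */
    int to_remove;                      /* set by managemon, read by monitor */
    struct sketch_active_array *next;
};

/* managemon side: mark the array instead of clearing a->container, so a
 * monitor that is in the middle of read_and_act() never sees the pointer
 * disappear underneath it. */
static void sketch_mark_for_removal(struct sketch_active_array *a)
{
    a->to_remove = 1;
}

/* monitor side: unlink marked arrays at a safe point and return how many
 * were dropped. The caller remains responsible for freeing them. */
static int sketch_reap_removed(struct sketch_active_array **list)
{
    int dropped = 0;

    while (*list) {
        if ((*list)->to_remove) {
            *list = (*list)->next;
            dropped++;
        } else {
            list = &(*list)->next;
        }
    }
    return dropped;
}
```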
* FIX: imsm: Rebuild does not start on second failed disk (Krzysztof Wojcik, 2011-03-24)
|   Problem: If we have an array with two failed disks and the array is in degraded state (now it is possible only for raid10 with 2 degraded mirrors) and we have two spare devices in the container, the recovery process should be triggered on both failed disks. It is not. Recovery is triggered only for the first failed disk. The second failed disk remains unchanged although the spare drive exists in the container and is ready for recovery. Root cause: mdmon does not check if the array is degraded after recovery of the first drive is completed. Resolution: Check if the current number of disks in the array equals the target number of disks. If not, trigger a degradation check and then the recovery process. Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
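The resolution boils down to comparing the current member count with the target after each rebuild. A hypothetical sketch; the struct and field names are not mdmon's.

```c
/* Hypothetical reduced view of the counts the commit compares. */
struct sketch_array_counts {
    int working_disks;  /* members currently present and in sync */
    int raid_disks;     /* target number of members */
};

/* Called after a rebuild finishes. Returns 1 if the counts still differ,
 * meaning the degradation check (and spare activation for the next failed
 * slot) should run again instead of stopping after the first disk. */
static int sketch_needs_another_recovery(const struct sketch_array_counts *c)
{
    return c->working_disks != c->raid_disks;
}
```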
* mdmon: Stop keeping track of RAID0 (and LINEAR) arrays. (NeilBrown, 2011-03-22)
|   Tracking RAID0 arrays doesn't really work. There is no need, and there are some sysfs files which won't exist when the array appears and then won't be opened when the level is changed. So simply ignore RAID0 and LINEAR arrays - don't add them when they appear and if an array we are monitoring turns into one of these, discard it promptly. Signed-off-by: NeilBrown <neilb@suse.de>
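A sketch of the level filter, assuming integer level codes with 0 for RAID0 and -1 for LINEAR; the constants and the function are hypothetical and defined locally.

```c
/* Assumed level encoding, defined here for illustration only. */
#define SKETCH_LEVEL_LINEAR (-1)
#define SKETCH_LEVEL_RAID0  0

/* Returns 1 if an array at this level should be monitored. RAID0 and
 * LINEAR arrays are not added when they appear, and an array that changes
 * to one of these levels should be dropped from monitoring. */
static int sketch_level_is_monitored(int level)
{
    return level != SKETCH_LEVEL_LINEAR && level != SKETCH_LEVEL_RAID0;
}
```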
* mdmon: don't wait for O_EXCL when shutting down. (NeilBrown, 2011-03-22)
|   If mdmon is shutting down because there are no devices left to look at, then don't wait 5 seconds for an O_EXCL open, as that can block progress of --grow. Only wait for O_EXCL if we received a signal. Signed-off-by: NeilBrown <neilb@suse.de>
* Merge branch 'master' into devel-3.2 (NeilBrown, 2011-03-14)
|\
| * monitor: close recovery_fd when closing state_Fd (NeilBrown, 2011-03-14)
| |   These should be open or closed together. Signed-off-by: NeilBrown <neilb@suse.de>
| * FIX: Reset disk state if disk is missing (Krzysztof Wojcik, 2011-03-10)
| |   If we can't read the actual disk state, it should be initialized to 0. Otherwise it may hold an out-of-date value, resulting in a false action later in the code (e.g. setting the disk to an improper state). Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | FIX: Last checkpoint is not set (Adam Kwolek, 2011-02-03)
| |   When a reshape is finished, the monitor has to set the last checkpoint to the array end to allow the metadata to finalize the reshape. The metadata has to know whether the reshape finished or was broken. When the reshape finishes, metadata finalization is required. When the reshape is broken, the metadata must remain as is to allow the reshape to restart from the checkpoint. This can be resolved based on the reshape_position sysfs entry. When it is equal to 'none', md has finished its work; in that situation, move the checkpoint to the end of the array. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | FIX: Reset disk state if disk is missing (Krzysztof Wojcik, 2011-01-26)
| |   If we can't read the actual disk state, it should be initialized to 0. Otherwise it may hold an out-of-date value, resulting in a false action later in the code (e.g. setting the disk to an improper state). Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | FIX: sync_completed == 0 causes reshape cancellation in metadata (Adam Kwolek, 2011-01-17)
| |   md signals reshape completion (of the whole area or of parts) by setting sync_completed to 0. This causes set_array_state() to roll back metadata changes (super-intel.c:4977). To avoid this, do not allow last_checkpoint to be set to 0 if the reshape is finished. This was also the root cause of my previous fix for reshape finalization, which I agreed earlier is not necessary. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | Raid0: detect reshape on array start (Adam Kwolek, 2011-01-06)
| |   When a raid0 array is taken over to raid4 for reshape, it should be possible to detect that the array being reshaped is now monitored, so that its metadata can be updated. Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | Detect level change (Adam Kwolek, 2011-01-06)
| |   For level migration support, mdmon must be able to react to level changes. It has to be able to change the configuration of an active array and, when the array level changes to raid0, finish monitoring the array. Signed-off-by: Maciej Trela <maciej.trela@intel.com> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | Handle checkpointing during reshape (NeilBrown, 2010-12-16)
| |   We need to allow metadata to handle progress of reshape, completion, and abort-before-start. Include all those in ->set_array_state() Signed-off-by: NeilBrown <neilb@suse.de>
* | mdmon: when a reshape is detected, add any newly added devices to the array. (NeilBrown, 2010-12-16)
| |   When mdadm starts a reshape, it might add some devices to the array first. mdmon needs to notice the reshape starting and check for any new devices. If there are any they need to be provided to be monitored. Signed-off-by: NeilBrown <neilb@suse.de>
* | fix: mdadm -Ss for external metadata don't stop container (Przemyslaw Hawrylewicz Czarnowski, 2010-12-07)
|/    Sometimes (~50%) mdadm -Ss cannot stop the container because mdmon opens its device and does not close it before exit(). The period between open and release of the handle is too long, and md is not able to stop the device. Releasing the handle before exit does not block md. Signed-off-by: Przemyslaw Czarnowski <przemyslaw.hawrylewicz.czarnowski@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
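The two reshape checkpoint fixes in the devel-3.2 branch ("FIX: Last checkpoint is not set" and "FIX: sync_completed == 0 causes reshape cancellation in metadata") can be combined into one decision. This is a sketch under the assumption that sync_completed has already been parsed into a sector count; all names are hypothetical.

```c
#include <string.h>

typedef unsigned long long sketch_sector_t;

/* Hypothetical checkpoint update during reshape.
 *   reshape_position - text of md/reshape_position ("none" once md is done)
 *   sync_completed   - progress already parsed from md/sync_completed
 *   array_end        - size of the array in sectors
 *   last_checkpoint  - previously recorded checkpoint
 * Returns the checkpoint to hand to the metadata handler. */
static sketch_sector_t sketch_reshape_checkpoint(const char *reshape_position,
                                                 sketch_sector_t sync_completed,
                                                 sketch_sector_t array_end,
                                                 sketch_sector_t last_checkpoint)
{
    /* md reports "none" once the reshape is finished: move the checkpoint
     * to the end of the array so the metadata can finalize the reshape. */
    if (strncmp(reshape_position, "none", 4) == 0)
        return array_end;

    /* sync_completed may read 0 when a reshape (or part of it) completes;
     * never let that roll the checkpoint back to 0, which the metadata
     * handler could take as a cancelled reshape. */
    if (sync_completed == 0)
        return last_checkpoint;

    return sync_completed;
}
```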
* mdmon: record sync_completed directly to the metadata (Dan Williams, 2010-06-15)
|   When sync_action is idle mdmon takes the latest value of md/resync_start or md/<dev>/recovery_start to record the resync/rebuild checkpoint in the metadata. However, now that mdmon is reading sync_completed there is no longer a need to wait for, or force an idle event to take a checkpoint. Simply update the forward progress of ->last_checkpoint at every wakeup event and force it to be recorded at least every 1/16th array-size interval. It may be recorded more frequently if a ->set_array_state() event occurs. This also cleans up some confusion in handling the dual-rebuild case. If more than one spare has been activated the kernel starts the rebuild at the lowest recovery offset, so we do not need to worry about min_recovery_start(). Signed-off-by: Dan Williams <dan.j.williams@intel.com>
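A sketch of the "record at least every 1/16th of the array" rule. It assumes progress is measured as distance from the last recorded checkpoint; the actual code may compare against fixed interval boundaries instead, and the function name is hypothetical.

```c
typedef unsigned long long sketch_sector_t;

/* Progress is tracked at every wakeup, but this returns 1 (write the
 * metadata now) only once progress has advanced by at least 1/16 of the
 * array since the last recorded checkpoint. */
static int sketch_should_record(sketch_sector_t recorded,
                                sketch_sector_t progress,
                                sketch_sector_t array_size)
{
    sketch_sector_t interval = array_size / 16;

    if (interval == 0)
        interval = 1;        /* tiny arrays: any forward progress counts */
    return progress > recorded && progress - recorded >= interval;
}
```

A ->set_array_state() event could still write a checkpoint earlier; this helper only covers the forced write.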
* mdmon: periodically checkpoint recovery (Dan Williams, 2010-05-14)
|   The kernel updates and notifies md/sync_completed when it is time to take a checkpoint. When this occurs (at 1/16 array size intervals) write 'idle' to md/sync_action to have the current recovery position updated in recovery_start and resync_start. Requires the metadata handler to reset ->last_checkpoint when it has determined that recovery has ended. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: insist on creating .pid file at startup. (NeilBrown, 2010-02-08)
|   Now that we don't "mdadm --takeover" until /var/run is writable there is no need to continually try to create files in there. So only create these files at startup and fail if they cannot be made. This means that to start an array with externally managed metadata, either /var/run or ALT_RUN (e.g. /lib/init/rw) must be writable. To 'takeover' from a previous mdmon instance, /var/run must be writable. This means we don't need to worry about SIGHUP (which was once used to tell us it was time to create .pid) and SIGALRM. Signed-off-by: NeilBrown <neilb@suse.de>
* Introduce MaxSector (Dan Williams, 2009-12-21)
|   Replace occurrences of ~0ULL to make it clear we are talking about maximal resync/recovery position. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
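MaxSector as the commit describes it, a named alias for ~0ULL, with a hypothetical helper showing the kind of comparison the name makes readable.

```c
/* Named alias for the maximal resync/recovery position. */
#define MaxSector (~0ULL)

/* Hypothetical helper: a resync position of MaxSector means there is
 * nothing left to resync. */
static int sketch_is_resync_complete(unsigned long long resync_start)
{
    return resync_start == MaxSector;
}
```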
* Add scaffolding for handling md/dev-XXX/recovery_start (Dan Williams, 2009-12-21)
|   Prepare the code to handle saving a recovery checkpoint. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: cleanup resync_start (Dan Williams, 2009-12-14)
|   We don't need to sprinkle reads of this attribute all over the place, just once at the entry of read_and_act(). Also, the mdinfo structure for the array already has a 'resync_start' member, so just reuse that. Finally, rename get_resync_start() to read_resync_start to make it consistent with the other sysfs accessors in monitor.c. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Update copyright dates and remove references to @cse.unsw.edu.au (NeilBrown, 2009-06-02)
|   Also removed 'paper' addresses. Signed-off-by: NeilBrown <neilb@suse.de>
* Wait for POLLPRI on /proc or /sys files. (NeilBrown, 2009-04-14)
|   From 2.6.30, /proc/mounts and various /sys files will probably always return 'readable' to select, so we will need to wait on POLLPRI to get the 'new data is available' signal. When using select, this corresponds to an 'exception', so adjust calls to select accordingly. In one case we sometimes wait on a socket and sometimes on /proc/mounts, so we need to test which. Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: fix resync completion detection (Dan Williams, 2009-04-12)
|   Starting with 2.6.30 the md/resync_start attribute will no longer return a nonsensical number when resync is complete; instead, it now returns 'none'. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
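A hypothetical parser showing how a caller could accept both forms of md/resync_start: a decimal sector number, or "none" meaning resync is complete (mapped here to ~0ULL). The function name and return convention are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 0 and stores the position in *out on success, or -1 if the text
 * is neither "none" nor a decimal sector number. */
static int sketch_parse_resync_start(const char *buf, unsigned long long *out)
{
    char *end;
    unsigned long long v;

    if (strncmp(buf, "none", 4) == 0) {
        *out = ~0ULL;           /* resync complete: maximal position */
        return 0;
    }
    v = strtoull(buf, &end, 10);
    if (end == buf || (*end != '\0' && *end != '\n'))
        return -1;
    *out = v;
    return 0;
}
```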
* mdmon: fix missed 'clean' event (Dan Williams, 2009-02-24)
|   mdmon may miss events because it re-reads state after read_and_act. The additional read is used to determine dirty status before allowing a sigterm to proceed. Since read_and_act is in the best position to determine 'dirty' status and its return value is not used, modify it to return true if the array is dirty. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: pass symbolic name to mdmon instead of device name. (NeilBrown, 2008-11-20)
|   Now that names in /dev are usually created (eventually) by udev, it isn't really safe to rely on finding a name in /dev to pass to mdmon to identify which array to monitor. And it isn't really necessary to have a name in /dev. So just pass the symbolic name, e.g. md127 or md123. Change util.c to pass that name, and change mdmon to process the name sensibly. Signed-off-by: NeilBrown <neilb@suse.de>
* update copyright headers (Dan Williams, 2008-10-28)
|   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: terminate clean (Dan Williams, 2008-10-15)
|   We generally don't want mdmon to be terminated, but if a SIGTERM gets through, try to leave the monitored arrays in a clean state, block attempts to mark the array dirty, and stop servicing the socket. When we are killed by SIGTERM, don't remove the pidfile; let that be cleaned up by the next monitor. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* trivial warn_unused_result squashing (Dan Williams, 2008-10-15)
|   Made the mistake of recompiling the F9 mdadm rpm which has a patch to remove -Werror and add "-Wp,-D_FORTIFY_SOURCE -O2" which turns on lots of errors:
|
|     config.c:568: warning: ignoring return value of asprintf
|     Assemble.c:411: warning: ignoring return value of asprintf
|     Assemble.c:413: warning: ignoring return value of asprintf
|     super0.c:549: warning: ignoring return value of posix_memalign
|     super0.c:742: warning: ignoring return value of posix_memalign
|     super0.c:812: warning: ignoring return value of posix_memalign
|     super1.c:692: warning: ignoring return value of posix_memalign
|     super1.c:1039: warning: ignoring return value of posix_memalign
|     super1.c:1155: warning: ignoring return value of posix_memalign
|     super-ddf.c:508: warning: ignoring return value of posix_memalign
|     super-ddf.c:645: warning: ignoring return value of posix_memalign
|     super-ddf.c:696: warning: ignoring return value of posix_memalign
|     super-ddf.c:715: warning: ignoring return value of posix_memalign
|     super-ddf.c:1476: warning: ignoring return value of posix_memalign
|     super-ddf.c:1603: warning: ignoring return value of posix_memalign
|     super-ddf.c:1614: warning: ignoring return value of posix_memalign
|     super-ddf.c:1842: warning: ignoring return value of posix_memalign
|     super-ddf.c:2013: warning: ignoring return value of posix_memalign
|     super-ddf.c:2140: warning: ignoring return value of write
|     super-ddf.c:2143: warning: ignoring return value of write
|     super-ddf.c:2147: warning: ignoring return value of write
|     super-ddf.c:2150: warning: ignoring return value of write
|     super-ddf.c:2162: warning: ignoring return value of write
|     super-ddf.c:2169: warning: ignoring return value of write
|     super-ddf.c:2172: warning: ignoring return value of write
|     super-ddf.c:2176: warning: ignoring return value of write
|     super-ddf.c:2181: warning: ignoring return value of write
|     super-ddf.c:2686: warning: ignoring return value of posix_memalign
|     super-ddf.c:2690: warning: ignoring return value of write
|     super-ddf.c:3070: warning: ignoring return value of posix_memalign
|     super-ddf.c:3254: warning: ignoring return value of posix_memalign
|     bitmap.c:128: warning: ignoring return value of posix_memalign
|     mdmon.c:94: warning: ignoring return value of write
|     mdmon.c:221: warning: ignoring return value of pipe
|     mdmon.c:327: warning: ignoring return value of write
|     mdmon.c:330: warning: ignoring return value of chdir
|     mdmon.c:335: warning: ignoring return value of dup
|     monitor.c:415: warning: rv may be used uninitialized in this function
|
|   ...some of these like the write() ones are not so trivial so save those fixes for the next patch. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* monitor: clean up some debug messages (Dan Williams, 2008-09-15)
|   Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* 'mdadm --wait-clean' wait for array to be marked clean (Dan Williams, 2008-09-15)
|   For use in distro shutdown scripts with a RAID root file system. Returns immediately if the array is 'readonly', or not an externally managed array. It is up to the distro's scripts to make sure no new writes hit the device after this returns 'true'. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* monitor: don't mark dirty on resync complete (Dan Williams, 2008-09-15)
|   ...instead look at array state to determine if the array is consistent Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* monitor: mark clean on active-idle (Dan Williams, 2008-09-15)
|   This also handles the case where 'clean' is set directly. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* Allow an externally managed array to be marked readonly (NeilBrown, 2008-08-19)
|   If the metadata_version is -mdXXX/whatever rather than /mdXXX/whatever then the array is readonly and should be left alone by mdmon. Signed-off-by: NeilBrown <neilb@suse.de>
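A sketch of the readonly test from this commit, assuming the caller has already stripped the "external:" prefix from md/metadata_version so the remainder looks like /md127/0 or -md127/0. The function is hypothetical; it also extracts the container name for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Parse "/md127/0" (writable) or "-md127/0" (readonly, to be left alone).
 * On success, copies the container name into container[] (len bytes),
 * sets *readonly, and returns 0; returns -1 on malformed input. */
static int sketch_parse_member_version(const char *sub, char *container,
                                       size_t len, int *readonly)
{
    const char *slash;
    size_t n;

    if (sub[0] != '/' && sub[0] != '-')
        return -1;
    *readonly = (sub[0] == '-');
    sub++;                         /* skip the '/' or '-' marker */
    slash = strchr(sub, '/');
    if (slash == NULL)
        return -1;
    n = (size_t)(slash - sub);
    if (n == 0 || n >= len)
        return -1;
    memcpy(container, sub, n);
    container[n] = '\0';
    return 0;
}
```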
* Extra option for set_array_state: you choose dirty or clean. (NeilBrown, 2008-08-19)
|   When we first start an array, it might be good to start recovery straight away. That requires setting the array to 'dirty', but only the metadata handler can know if that is required or not. So have a third possible 'consistent' option to set_array_state. Either 'no' or 'yes' or 'you choose'. Return value indicates what was chosen. '1' (no) should be chosen unless there is a good reason. Signed-off-by: NeilBrown <neilb@suse.de>
* mdmon: handle failures versus readauto arrays (Dan Williams, 2008-08-15)
|   Transition readauto arrays to active before failing drives. Hmm... why do we keep reblocking / renotifying in the readonly case? Need to bottom out on this, but not right now. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* mdmon: use activate spare for re-add (Dan Williams, 2008-08-12)
|   Disks that are not in-sync or failed are not assembled into member arrays by mdadm. Teach mdmon to resolve this situation by checking for spares at start. imsm_activate_spare() is updated to prefer devices that can be re-added versus new spares. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* imsm: handle degraded->normal transitions in set_disk (Dan Williams, 2008-07-24)
|   Removes the need for the call to ->set_array_state when sync_action transitions from 'recover' to 'idle'. Signed-off-by: Dan Williams <dan.j.williams@intel.com>