path: root/Monitor.c
Commit message | Author | Age
* Monitor: Stop monitoring devices that have disappeared. | NeilBrown | 2014-08-14
  If we are only monitoring a device because we found it in /proc/mdstat, and it has been gone for 5 checks, forget about it completely.
  Signed-off-by: NeilBrown <neilb@suse.de>
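A minimal sketch of the bookkeeping this change implies. It assumes a per-array record that notes whether the array is tracked only because it appeared in /proc/mdstat, plus a counter of consecutive polls in which the array was absent. `struct tracked_array`, its fields and `update_presence()` are hypothetical names for illustration, not mdadm's actual data structures.

```c
#include <stdbool.h>

/* Hypothetical per-array record; field names are illustrative only. */
struct tracked_array {
	const char *devnm;
	bool from_mdstat_only;  /* found via /proc/mdstat, not requested explicitly */
	int missing_checks;     /* consecutive polls in which the array was absent */
};

#define FORGET_AFTER_CHECKS 5  /* the "5 checks" from the commit message */

/* Update one record after a poll. Returns true if the caller should
 * forget the array completely. */
static bool update_presence(struct tracked_array *a, bool present)
{
	if (present) {
		a->missing_checks = 0;  /* seen again: reset the counter */
		return false;
	}
	a->missing_checks++;
	/* Only arrays discovered through /proc/mdstat are ever dropped. */
	return a->from_mdstat_only && a->missing_checks >= FORGET_AFTER_CHECKS;
}
```

Arrays that were named explicitly (for example in mdadm.conf) are never dropped by this rule. That matches the commit's restriction to devices monitored "only" because of /proc/mdstat.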
* New function: sysfs_wait | NeilBrown | 2013-07-01
  We have several places that wait for activity on a sysfs file. Combine most of these into a single 'sysfs_wait' function.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Remove lots of unnecessary white space. | NeilBrown | 2013-06-19
  Now that I am using white-space mode in Emacs I can see all of this, and I don't like it :-)
  Signed-off-by: NeilBrown <neilb@suse.de>
* Wait: also wait if an action is about to start. | NeilBrown | 2013-05-01
  If a sync/recover action is about to start but hasn't actually begun yet, /proc/mdstat won't show it, but md/sync_action will (it checks MD_RECOVERY_NEEDED). So when /proc/mdstat seems to say nothing is happening, double-check with md/sync_action.
  Signed-off-by: NeilBrown <neilb@suse.de>
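The wait decision can be sketched as a small predicate. The sketch combines the sync_action double-check above with the delayed-resync rule from the 2012-11-22 entry. Several details are assumptions: the value of `RESYNC_NONE`, the name `still_busy()`, and the choice to treat only the string "idle" in md/sync_action as "nothing happening".

```c
#include <stdbool.h>
#include <string.h>

/* Assumed sentinel for "/proc/mdstat shows no resync"; value illustrative. */
#define RESYNC_NONE (-1)

/* Should --wait keep waiting?
 * percent:     progress parsed from /proc/mdstat (RESYNC_NONE if nothing shown)
 * sync_action: contents of md/sync_action, possibly with a trailing newline */
static bool still_busy(int percent, const char *sync_action)
{
	/* Any other value, including negative "delayed"/"pending" codes,
	 * means something is in progress or queued. */
	if (percent != RESYNC_NONE)
		return true;
	/* mdstat shows nothing: double-check sync_action, which can already
	 * report an action that is about to start. */
	size_t len = strcspn(sync_action, "\n");
	return !(len == 4 && strncmp(sync_action, "idle", 4) == 0);
}
```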
* Discard devnum in favour of devnm | NeilBrown | 2013-02-21
  We widely use a "devnum" which is 0 or positive for md%d devices and negative for md_d%d devices. But I want to be able to use md_%s device names. So get rid of devnum (a number) and use devnm (a 32-char string), e.g. md0, md_d2, md_home.
  Signed-off-by: NeilBrown <neilb@suse.de>
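To make the contrast concrete, here is a hypothetical helper that renders an old-style devnum as a devnm string. The 32-byte size comes from the commit message. The encoding of negative devnums as `-1 - N` (so -3 becomes md_d2) is an assumption for illustration and may not match mdadm's real mapping.

```c
#include <stdio.h>

/* devnm is a fixed 32-char buffer, per the commit message. */
typedef char devnm_t[32];

/* Hypothetical conversion from the old numeric convention.
 * ASSUMPTION: negative devnums encode md_d devices as -1 - N. */
static void devnum_to_devnm(int devnum, devnm_t out)
{
	if (devnum >= 0)
		snprintf(out, sizeof(devnm_t), "md%d", devnum);
	else
		snprintf(out, sizeof(devnm_t), "md_d%d", -1 - devnum);
}
```

A name such as md_home has no numeric form at all. That is the motivation for switching the internal key to a string.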
* Allow --wait to wait for delayed resync. | NeilBrown | 2012-11-22
  If a resync is delayed, then e->percent will be negative but not RESYNC_NONE. In that case we still want to wait.
  Reported-by: Ross Boylan <ross@biostat.ucsf.edu>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: don't complain about non-monitorable arrays in mdadm.conf | NeilBrown | 2012-10-24
  If we are asked to monitor a RAID0 or Linear array, which cannot be monitored, we complain with "Device Disappeared .... Wrong-Level". However, if the RAID0 or Linear array is being requested because it is in mdadm.conf, then the message is inappropriate and confusing. So track which arrays are added from the config file, and suppress that message in that case.
  Reported-by: "Johnson Yan" <johnson_yan@usish.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Change Monitor to take a struct context | NeilBrown | 2012-07-09
  Signed-off-by: NeilBrown <neilb@suse.de>
* Remove scattered checks for malloc success. | NeilBrown | 2012-07-09
  malloc should never fail, and if it does it is unlikely that anything else useful can be done. The best approach is to abort and let some super-daemon restart us. So define xmalloc, xcalloc, xrealloc and xstrdup, which don't fail but just print a message and exit. Then use those, removing all the tests for failure. Also replace all "malloc; memset" sequences with xcalloc.
  Signed-off-by: NeilBrown <neilb@suse.de>
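A minimal sketch of "allocate or exit" wrappers in the spirit of this change. The message text and the use of `EXIT_FAILURE` are assumptions; mdadm's own versions may differ. Zero-size requests are passed through because `malloc(0)` may legitimately return NULL. `xcalloc` also covers the former "malloc; memset" pattern, since calloc returns zeroed memory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print a message and exit; message and exit status are assumptions. */
static void die_oom(void)
{
	fprintf(stderr, "mdadm: memory allocation failure - aborting\n");
	exit(EXIT_FAILURE);
}

static void *xmalloc(size_t len)
{
	void *p = malloc(len);
	if (!p && len)  /* malloc(0) may return NULL without an error */
		die_oom();
	return p;
}

/* Replaces "malloc; memset(0)" sequences: calloc returns zeroed memory. */
static void *xcalloc(size_t num, size_t size)
{
	void *p = calloc(num, size);
	if (!p && num && size)
		die_oom();
	return p;
}

static void *xrealloc(void *ptr, size_t len)
{
	void *p = realloc(ptr, len);
	if (!p && len)
		die_oom();
	return p;
}

static char *xstrdup(const char *str)
{
	size_t len = strlen(str) + 1;  /* include the terminating NUL */
	char *s = xmalloc(len);
	memcpy(s, str, len);
	return s;
}
```

Callers can then drop their own NULL checks entirely, which is the point of the commit.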
* Introduce pr_err for printing error messages. | NeilBrown | 2012-07-09
  'pr_err("' is a lot shorter than 'fprintf(stderr, Name ": '. cont_err() is also available.
  Signed-off-by: NeilBrown <neilb@suse.de>
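One way to write such macros with standard C99 variadic macros is sketched below. It relies on compile-time concatenation of adjacent string literals, so the format argument must be a literal. The value of `Name` and the seven-space continuation indent for `cont_err` are assumptions, not necessarily mdadm's definitions.

```c
#include <stdio.h>

#define Name "mdadm"  /* assumed program-name prefix */

/* The format string literal is concatenated with the prefix at compile
 * time, so pr_err("...") must be called with a literal format. */
#define pr_err(...)   fprintf(stderr, Name ": " __VA_ARGS__)
/* Continuation line, indented to the width of "mdadm: " (assumed). */
#define cont_err(...) fprintf(stderr, "       " __VA_ARGS__)
```

For example, `pr_err("cannot open %s\n", dev);` prints `mdadm: cannot open ...` on stderr. Like `fprintf`, it evaluates to the number of characters written.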
* Monitor: fix reporting for Fail vs FailSpare etc. | NeilBrown | 2012-06-04
  The tests here were specific to 0.90 metadata and didn't work properly for 1.x metadata, where a device's "number" doesn't change. By checking whether this is a new array we can avoid some corner cases. Then we test mostly based on state and not based on 'number' at all.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: Report NewArray when an array the disappeared, reappears. | NeilBrown | 2012-06-04
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: fix inconsistencies in values for ->percent | NeilBrown | 2012-06-04
  ->percent sometimes stores negative values recording states like 'pending' or 'delayed'. The value '-2' means both 'delayed' and, in Monitor, 'unknown'. Also, '-1' has a meaning but no #define. So change the #defines to be prefixed with "RESYNC_" instead of "PROCESS_", add new "_NONE" and "_UNKNOWN" values, and use the correct value in each location.
  Signed-off-by: NeilBrown <neilb@suse.de>
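The resulting encoding can be sketched as a set of distinct sentinels plus a helper that renders them. The commit fixes only the `RESYNC_` prefix and the new `_NONE`/`_UNKNOWN` names. The numeric values below, the comments on their meaning, and `describe_percent()` itself are illustrative assumptions.

```c
#include <stdio.h>

/* Assumed values; the point is that each state gets its own sentinel. */
#define RESYNC_NONE     (-1)  /* no resync/recovery in progress */
#define RESYNC_DELAYED  (-2)  /* resync=DELAYED in /proc/mdstat */
#define RESYNC_PENDING  (-3)  /* resync=PENDING in /proc/mdstat */
#define RESYNC_UNKNOWN  (-4)  /* Monitor has not determined the state yet */

/* Render ->percent for a log line; non-negative values are progress. */
static const char *describe_percent(int percent, char *buf, size_t len)
{
	switch (percent) {
	case RESYNC_NONE:    return "idle";
	case RESYNC_DELAYED: return "delayed";
	case RESYNC_PENDING: return "pending";
	case RESYNC_UNKNOWN: return "unknown";
	default:
		if (percent < 0)
			return "invalid";
		snprintf(buf, len, "%d%%", percent);
		return buf;
	}
}
```

With one value per state, "delayed" can no longer be confused with "unknown", which was the ambiguity the commit removes.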
* Monitor: Allow correct monitoring of more member devices. | NeilBrown | 2012-06-04
  Having "MaxDisks == 384" is not good. Discard it in favour of MAX_DISKS, which is 4096.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Add --prefer option for --detail and --monitor | NeilBrown | 2012-04-18
  Both --detail and --monitor can report the names of member devices of an array, and do so by searching /dev and finding the shortest name that matches. If --prefer=foo is given, they will instead prefer a name that contains /foo/. So
      mdadm --detail /dev/md0 --prefer=by-path
  will list the component devices via their /dev/disk/by-path/xxx names.
  Signed-off-by: NeilBrown <neilb@suse.de>
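The selection rule described above can be sketched as follows. `choose_display_name()` is a hypothetical helper, and picking the shortest name among several preferred matches is an assumption. The commit only says a name containing /foo/ is preferred and that the shortest name is used otherwise.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: choose which /dev name to show for a member device.
 * Names containing "/<prefer>/" win (shortest such name, by assumption);
 * otherwise, or when prefer is NULL, the shortest name overall is used.
 * Returns an index into names, or -1 if n == 0. */
static int choose_display_name(const char *const *names, int n, const char *prefer)
{
	char pat[256];
	int best = -1, best_pref = -1;

	if (prefer)
		snprintf(pat, sizeof pat, "/%s/", prefer);
	for (int i = 0; i < n; i++) {
		size_t len = strlen(names[i]);
		if (best < 0 || len < strlen(names[best]))
			best = i;
		if (prefer && strstr(names[i], pat) &&
		    (best_pref < 0 || len < strlen(names[best_pref])))
			best_pref = i;
	}
	return best_pref >= 0 ? best_pref : best;
}
```

With `prefer` set to "by-path", a /dev/disk/by-path/... name is chosen over a shorter /dev/sdXN name. If no candidate matches, the shortest name is used, as before.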
* Use MDMON_DIR for pid files created in Monitor.c | Jes Sorensen | 2012-02-23
  Other parts of mdadm/mdmon place .pid/.sock files in MDMON_DIR. This makes Monitor.c consistent with the rest.
  Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* fix: Monitor sometimes crashes | Lukasz Dorau | 2012-01-12
  The "char cnt[40]" buffer is sometimes too small to hold the whole message, and in that case Monitor crashes. The buffer must be larger so that it can hold the whole message.
  Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Report raid level type to syslog on RebuildFinished event. | Sergey B Kirpichev | 2011-12-07
  Thus, for RAID1/RAID10 this can be filtered out in logcheck.
  Relates-to: Debian bug 599821
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor(): free allocated memory on exit | Jes Sorensen | 2011-11-02
  Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Check all member devices in enough_fd | NeilBrown | 2011-05-23
  The loop over all member devices in enough_fd could easily stop before it had found all devices. This would cause --re-add to fail incorrectly. So change the loop to be based on the number of devices the array reports, with a safeguard limit of 1024. Change some other loops to be more careful too.
  Reported-by: "Schmidt, Annemarie" <Annemarie.Schmidt@stratus.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: avoid NULL dereference with 0.90 metadata | Jonathan Liu | 2011-04-13
  0.90 arrays do not report the metadata type in /proc/mdstat, so we cannot assume that mse->metadata_version is non-NULL. So add an appropriate check. This adds an additional check missed by commit eb28e119b03fd5149886ed516fa4bb006ad3602e.
  Signed-off-by: NeilBrown <neilb@suse.de>
* mdadm: respect --syslog in monitor mode | Mike Frysinger | 2011-04-11
  A few places don't accept syslog as a monitor mode, so fix that.
  Signed-off-by: Mike Frysinger <vapier@gentoo.org>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: avoid NULL dereference with 0.90 metadata | NeilBrown | 2011-04-05
  0.90 arrays do not report the metadata type in /proc/mdstat, so we cannot assume that mse->metadata_version is non-NULL. So add an appropriate check.
  Reported-by: Eugene <hdejin@yahoo.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
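The guard amounts to testing the pointer before inspecting the string. The sketch below assumes that an externally managed array's version string starts with "external:" (as in "external:imsm"). `metadata_is_external()` is a hypothetical helper, not the code from either commit.

```c
#include <stdbool.h>
#include <string.h>

/* With 0.90 metadata /proc/mdstat shows no metadata type, so the parsed
 * version string may be NULL and must be checked before use.
 * ASSUMPTION: external metadata is recorded as "external:<type>". */
static bool metadata_is_external(const char *metadata_version)
{
	return metadata_version != NULL &&
	       strncmp(metadata_version, "external:", 9) == 0;
}
```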
* Move WaitClean from sysfs to Monitor.c | NeilBrown | 2011-04-05
  It might not really belong in Monitor, but it really doesn't belong in sysfs.c, and it fits well with Wait().
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: handle v.quick removal of devices better. | NeilBrown | 2011-03-22
  If a device fails and is then removed before Monitor sees the failure, GET_DISK_INFO returns nothing, so Monitor relies on mdstat info, where '_' is incorrectly interpreted as 'a spare'. We should treat '_' as 'removed', which is safer. Without this, a very quick fail+remove gets reported as 'Failed' then 'SpareActive'.
  Signed-off-by: NeilBrown <neilb@suse.de>
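The safer reading of the /proc/mdstat slot string (e.g. "[UU_U]") is sketched below: 'U' counts as an in-sync device and '_' as removed, never as a spare. `struct slot_counts` and `count_slots()` are hypothetical names for illustration.

```c
/* Hypothetical tally of the per-slot status string from /proc/mdstat. */
struct slot_counts {
	int in_sync;  /* 'U' */
	int removed;  /* '_' : treated as removed, not as a spare */
};

static struct slot_counts count_slots(const char *status)
{
	struct slot_counts c = { 0, 0 };
	for (const char *p = status; *p; p++) {
		if (*p == 'U')
			c.in_sync++;
		else if (*p == '_')
			c.removed++;
		/* brackets and any other characters are ignored */
	}
	return c;
}
```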
* FIX: ping_monitor() usage causes memory leaks | Adam Kwolek | 2011-03-18
  When devnum2devname() is used to produce the argument for ping_monitor(), the returned string pointer should be passed to free() to release its memory. This was not done in several places. This use case should have its own function to avoid the memory leak.
  Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Various compile fixes. | NeilBrown | 2011-02-01
  Make "make everything" succeed. This fixed some real bugs.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Allow domain_test to report that no domains were found. | NeilBrown | 2011-02-01
  Sometimes we will need to know the difference between "no domains found" and "domains didn't match". So allow domain_test to return different values, and fix up all callers to maintain the current behaviour.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: do not move partitions to external container | Czarnowska, Anna | 2011-02-01
  Arrays on partitions are not supported for external metadata, so do not take such a spare from a native array.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: avoid adding too many spares to container | Czarnowska, Anna | 2011-01-28
  Tests revealed that sometimes more spares are still taken than needed. The reason is a race: after one spare has been added to a container with a degraded subarray, mdmon may activate that spare between the ioctl in the main loop and load_container in try_spare_migration. We then see active < raid but find no spares in the parent container, and so add an extra spare. To prevent this, we count the active disks in the list returned by getinfo_super_disks and compare the count with subarray->active. If the number has increased, a new spare was added and activated, so there is no need for more.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
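The check described above can be sketched as a comparison between a fresh count of active disks and the count recorded earlier. `struct disk_entry` is a hypothetical stand-in for the list that getinfo_super_disks returns (its real layout is not shown in this log). `recorded_active` plays the role of subarray->active, and `need_another_spare()` is an illustrative name.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one entry of the getinfo_super_disks list;
 * only the field needed here is modelled. */
struct disk_entry {
	bool active;
};

static int count_active(const struct disk_entry *disks, int n)
{
	int active = 0;
	for (int i = 0; i < n; i++)
		if (disks[i].active)
			active++;
	return active;
}

/* Should another spare be moved into the container?
 * recorded_active: active count seen earlier (subarray->active in the commit)
 * raid_disks:      number of devices the subarray needs */
static bool need_another_spare(const struct disk_entry *disks, int n,
			       int recorded_active, int raid_disks)
{
	if (count_active(disks, n) > recorded_active)
		return false;  /* mdmon already activated a spare we added */
	return recorded_active < raid_disks;  /* still degraded */
}
```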
* fix: Monitor: min_size must be set to 0 | Czarnowska, Anna | 2011-01-17
  Otherwise a random value will be used in a later comparison for native and ddf metadata (until min_acceptable_spare_size is defined).
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* fix: segfault if subarray is monitored but container is not | Czarnowska, Anna | 2011-01-17
  In this situation to->parent is NULL, so "to" doesn't change to the parent container and to->metadata is still NULL. This results in a segmentation fault when checking to->metadata->ss->external. We should just skip this array, as the container is needed to move spares to.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: skip array if error getting size | Anna Czarnowska | 2011-01-12
  load_super tries to load the container first anyway, but if that fails (e.g. after a disk has been physically removed) it then tries to read metadata from the container device. This will always fail and print confusing errors. So use load_container instead of load_super on the container. On failure to read metadata we should skip this array; it will be dealt with the next time round.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* move_spare function modified and moved to Manage.c | Anna Czarnowska | 2011-01-05
  It will also be needed for Incremental.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Use one function chosing spares from container | Anna Czarnowska | 2011-01-05
  container_chose_spares in Monitor.c and get_spares_for_grow in super-intel.c do the same thing: search for spares in a container. Another version will also be needed for Incremental, so a more general solution is presented here and applied in the two previous contexts. Normally domlist == NULL would lead to an empty list, but this is typically checked earlier, so here it is interpreted as "do not test domains".
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: Check destination array domain early. | Marcin Labun | 2010-12-21
  Destination arrays that do not have any domains are excluded from spare sharing. We can check this early, without searching for donor arrays.
  Signed-off-by: Marcin Labun <marcin.labun@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* fix: Monitor doesn't return after starting daemon | Anna Czarnowska | 2010-12-15
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Allow --update=devicesize with --re-add | NeilBrown | 2010-12-09
  This is useful with 1.1 and 1.2 metadata to update the metadata if the device size has changed. The same functionality can be achieved by writing to the device size in sysfs after re-adding normally, but in some cases this might be easier.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: don't add more spares than needed | Anna Czarnowska | 2010-12-03
  When we add a spare to a container, it takes a while before mdmon notices it and recovery starts. During this time the array remains degraded, but we don't want to add any more spares to this container. Therefore we must check whether the container with the degraded array already has a suitable spare. container_choose_spare is reused with from=to. A domain check is not needed in this situation. A ping_manager call after moving a disk is needed so that the newly added disk is visible in the container when we come back through the loop.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: only get min_size once | Anna Czarnowska | 2010-12-03
  We may call chose_spare several times before we find a suitable spare, so it is better to get the size beforehand.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: pass statelist reference when adding new arrays | Anna Czarnowska | 2010-12-03
  Otherwise it will not get updated.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: array that has disappeared doesn't need spares | Anna Czarnowska | 2010-11-29
  If a degraded array disappears, we still have it in statelist with active < raid, but it is pointless to look for spares for it.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: fix writing autorebuild.pid | Anna Czarnowska | 2010-11-29
  If /var/run/mdadm doesn't exist we can never succeed in writing the file, so we should try to create the directory first. Once we have made sure it is there, we write the pid file as before.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
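A minimal POSIX sketch of "create the directory, then write the pid file". The directory and file name are parameters; /var/run/mdadm and autorebuild.pid come from the commit message. The 0700 mode and the name `write_pidfile()` are assumptions. mkdir() creates only the last path component, so the parent directory must already exist.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ensure dir exists, then write our pid to dir/name.
 * Returns 0 on success, -1 on failure. */
static int write_pidfile(const char *dir, const char *name)
{
	char path[4096];
	FILE *f;

	/* An already-existing directory is fine; any other error is not. */
	if (mkdir(dir, 0700) != 0 && errno != EEXIST)
		return -1;
	if (snprintf(path, sizeof path, "%s/%s", dir, name) >= (int)sizeof path)
		return -1;  /* path would be truncated */
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", (int)getpid());
	return fclose(f) == 0 ? 0 : -1;
}
```

A call such as `write_pidfile("/var/run/mdadm", "autorebuild.pid")` succeeds whether or not the directory already existed.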
* Monitor: reset dev when size too small | Anna Czarnowska | 2010-11-29
  Otherwise the spare will be considered good anyway.
  Cc: linux-raid@vger.kernel.org, Williams, Dan J <dan.j.williams@intel.com>, Ciechanowski, Ed <ed.ciechanowski@intel.com>
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: devid should be dev_t | Anna Czarnowska | 2010-11-29
  For consistency with makedev(); int is not sufficient.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: few bug fixes for spare migration | Anna Czarnowska | 2010-11-29
  1. If the array has not changed, we should still report it if it is degraded: another array may have a new spare that we can move.
  2. An array with err=1 can't give a spare.
  3. We look for spares in "from", not "st", which is the supertype and has devname=NULL.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: choose spare correctly for external metadata. | NeilBrown | 2010-11-25
  When metadata is managed externally (probably as a container), we need to examine that metadata to see which devices are spares. So use the getinfo_super_disk message and use the info returned.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: separate 'choose_spare' out from 'move_spare' | NeilBrown | 2010-11-25
  Choosing a spare from a container is more complicated than choosing one from a native array, so separate out choose_spare to make it easier to use an alternate implementation.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: check spare group is non-NULL before adding to domain list | NeilBrown | 2010-11-23
  ... otherwise we crash.
  Reported-by: "Labun, Marcin" <Marcin.Labun@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: Allow metadata to set minimum size for spare to migrate in. | Anna Czarnowska | 2010-11-22
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>