mismatch_cnt != 0, member content mismatch, but md says the mirror is good
Michael Bilow
mikebw at colossus.bilow.com
Tue Feb 23 14:01:16 EST 2010
Commanding "check" to the md device is ordinarily a read-only
operation, despite the terminology in the log that says "resyncing".
During the md check operation, the array is "clean" (not degraded)
and you can see that explicitly with the "[UU]" status report; if
the array were degraded the failed device would be marked with an
underscore (that is, array status would be "[U_]" or "[_U]").
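You can confirm that a scan in progress is a check rather than a
real resync, and see what it found, by reading the md sysfs
attributes directly. A minimal sketch, using the standard md paths
(the meaning of a nonzero count varies by RAID level):

    # reports "check" while the scan runs, "idle" when it is done
    cat /sys/block/md0/md/sync_action

    # sectors the last check found to be inconsistent
    cat /sys/block/md0/md/mismatch_cnt

On RAID1 a nonzero mismatch_cnt does not necessarily mean
corruption; it is commonly caused by in-flight writes to areas that
nothing will ever read back, swap being the classic example.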
It is not a "scrub" because it does not attempt to repair anything.
In ancient days, it was necessary to refresh data periodically by
reading it and rewriting it to make sure it was not decaying due to
changes in temperature, head position, dimensional stability, and so
forth. The term comes from Middle English, where "scrub" means to
remove impurities and is etymologically related to "scrape"; the
original use of the term in computing was for core memory, from which
it was later applied to dynamic RAM and eventually to disks.
If a hardware read error is encountered during the check, the md
driver handles this in the same way as a hardware read error that is
encountered at any other time. Depending upon the RAID mode, it may
attempt to reconstruct the failed sector and write it, possibly
triggering the physical drive to reallocate a spare sector. More
commonly, the md device will mark the physical drive as failed and
degrade the array. Detecting and reporting "soft failure" incidents
such as reallocations of spare sectors is the job of something like
smartmontools, which can and should be configured to look past the
md device and monitor the physical drives that are its components.
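As a minimal sketch, assuming the component drives are /dev/sda and
/dev/sdb as in the session quoted below, /etc/smartd.conf entries
along these lines point smartd at the physical drives:

    # watch the component drives themselves, not /dev/md0;
    # -a enables the standard checks, -m mails warnings to root
    /dev/sda -a -m root
    /dev/sdb -a -m root

smartd will then report reallocated-sector counts and other SMART
attributes that the md layer does not surface.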
This consistency check is not strictly guaranteed to be read-only,
because it can cause the array to drop to degraded mode depending
upon what is encountered; as far as I know, though, that only happens
when there is some underlying hardware problem beyond merely
different data. If the array is configured with a hot-spare device
on-line, such a degradation incident can itself trigger write
operations, as sketched below.
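For illustration only (the partition name /dev/sdc5 is hypothetical),
a spare can be put on-line ahead of time with mdadm; on an array that
is not degraded, an added device becomes a hot spare:

    mdadm /dev/md0 --add /dev/sdc5

If a component later fails, md starts rebuilding onto the spare
automatically, which is exactly the write activity described above.
Note also that writing "repair" rather than "check" to sync_action
asks md to rewrite any inconsistencies it finds, so that form is
deliberately not read-only.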
-- Mike
On 2010-02-23 at 10:12 -0500, Michael ODonnell wrote:
>
>
> I executed commands as they would have been during the cron.weekly run and
> I can now see why our simple monitor script would conclude the RAID had
> a problem based on the resultant contents of /proc/mdstat. During the
> "check" operation the RAID state is described as "clean, resyncing"
> by mdadm and I don't know whether the RAID should be regarded as being
> fault-tolerant in that state, though Mr. Bilow indicated that it should
> and I see no screaming evidence to the contrary.
>
> Before the "check" operation:
>
> ### ROOT ### cbc1:~ 545---> cat /sys/block/md0/md/array_state
> clean
>
> ### ROOT ### cbc1:~ 546---> cat /sys/block/md0/md/sync_action
> idle
>
> ### ROOT ### cbc1:~ 547---> cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 sdb5[1] sda5[0]
> 951409344 blocks [2/2] [UU]
>
> unused devices: <none>
>
> Trigger the check:
>
> ### ROOT ### cbc1:~ 548---> echo check > /sys/block/md0/md/sync_action
>
> After the check:
>
> ### ROOT ### cbc1:~ 549---> cat /sys/block/md0/md/array_state
> clean
>
> ### ROOT ### cbc1:~ 550---> cat /sys/block/md0/md/sync_action
> check
>
> ### ROOT ### cbc1:~ 551---> cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 sdb5[1] sda5[0]
> 951409344 blocks [2/2] [UU]
> [>....................] resync = 0.1% (958592/951409344) finish=132.1min speed=119824K/sec
>
> unused devices: <none>
>
> ### ROOT ### cbc1:~ 552---> mdadm --query --detail /dev/md0
> /dev/md0:
> Version : 0.90
> Creation Time : Fri Jan 22 11:08:38 2010
> Raid Level : raid1
> Array Size : 951409344 (907.33 GiB 974.24 GB)
> Used Dev Size : 951409344 (907.33 GiB 974.24 GB)
> Raid Devices : 2
> Total Devices : 2
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Update Time : Tue Feb 23 09:42:14 2010
> State : clean, resyncing
> Active Devices : 2
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 0
>
> Rebuild Status : 0% complete
>
> UUID : daf8dd0b:00087a40:d5caa7ee:ae05b3aa
> Events : 0.56
>
> Number Major Minor RaidDevice State
> 0 8 5 0 active sync /dev/sda5
> 1 8 21 1 active sync /dev/sdb5
>