mismatch_cnt != 0, member content mismatch, but md says the mirror is good
Michael Bilow
mikebw at colossus.bilow.com
Tue Feb 23 14:01:16 EST 2010
Commanding "check" to the md device is ordinarily a read-only
operation, despite the terminology in the log that says "resyncing".
During the md check operation, the array is "clean" (not degraded)
and you can see that explicitly with the "[UU]" status report; if
the array were degraded the failed device would be marked with an
underscore (that is, array status would be "[U_]" or "[_U]").
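You can confirm that a scan in progress is a check rather than a
real resync, and see what it found, by reading the md sysfs
attributes directly. A minimal sketch, using the standard md paths
(the meaning of a nonzero count varies by RAID level):

    # reports "check" while the scan runs, "idle" when it is done
    cat /sys/block/md0/md/sync_action

    # sectors the last check found to be inconsistent
    cat /sys/block/md0/md/mismatch_cnt

On RAID1 a nonzero mismatch_cnt does not necessarily mean
corruption; it is commonly caused by in-flight writes to areas that
nothing will ever read back, swap being the classic example.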
It is not a "scrub" because it does not attempt to repair anything.
In ancient days, it was necessary to refresh data periodically by
reading it and rewriting it to make sure it was not decaying due to
changes in temperature, head position, dimensional stability, and so
forth. The term comes from Middle English, where "scrub" means to
remove impurities and is etymologically related to "scrape"; the
original use of the term in computing was for core memory, from which
it was later applied to dynamic RAM and eventually to disks.
If a hardware read error is encountered during the check, the md
driver handles this in the same way as a hardware read error that is
encountered at any other time. Depending upon the RAID mode, it may
attempt to reconstruct the failed sector and write it, possibly
triggering the physical drive to reallocate a spare sector. More
commonly, the md device will mark the physical drive as failed and
degrade the array. Detecting and reporting "soft failure" incidents
such as reallocations of spare sectors is the job of something like
smartmontools, which can and should be configured to look past the
md device and monitor the physical drives that are its components.
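As a minimal sketch, assuming the component drives are /dev/sda and
/dev/sdb as in the session quoted below, /etc/smartd.conf entries
along these lines point smartd at the physical drives:

    # watch the component drives themselves, not /dev/md0;
    # -a enables the standard checks, -m mails warnings to root
    /dev/sda -a -m root
    /dev/sdb -a -m root

smartd will then report reallocated-sector counts and other SMART
attributes that the md layer does not surface.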
This consistency check is not strictly guaranteed to be read-only,
because it can cause the array to drop to degraded mode depending
upon what is encountered; as far as I know, though, that only happens
when there is some underlying hardware problem beyond merely
different data. If the array is configured with a hot-spare device
on-line, such a degradation incident can itself trigger write
operations, as sketched below.
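For illustration only (the partition name /dev/sdc5 is hypothetical),
a spare can be put on-line ahead of time with mdadm; on an array that
is not degraded, an added device becomes a hot spare:

    mdadm /dev/md0 --add /dev/sdc5

If a component later fails, md starts rebuilding onto the spare
automatically, which is exactly the write activity described above.
Note also that writing "repair" rather than "check" to sync_action
asks md to rewrite any inconsistencies it finds, so that form is
deliberately not read-only.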
-- Mike
On 2010-02-23 at 10:12 -0500, Michael ODonnell wrote:
>
>
> I executed commands as they would have been during the cron.weekly run and
> I can now see why our simple monitor script would conclude the RAID had
> a problem based on the resultant contents of /proc/mdstat. During the
> "check" operation the RAID state is described as "clean, resyncing"
> by mdadm and I don't know whether the RAID should be regarded as being
> fault-tolerant in that state, though Mr. Bilow indicated that it should
> and I see no screaming evidence to the contrary.
>
> Before the "check" operation:
>
> ### ROOT ### cbc1:~ 545---> cat /sys/block/md0/md/array_state
> clean
>
> ### ROOT ### cbc1:~ 546---> cat /sys/block/md0/md/sync_action
> idle
>
> ### ROOT ### cbc1:~ 547---> cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 sdb5[1] sda5[0]
> 951409344 blocks [2/2] [UU]
>
> unused devices: <none>
>
> Trigger the check:
>
> ### ROOT ### cbc1:~ 548---> echo check > /sys/block/md0/md/sync_action
>
> After the check:
>
> ### ROOT ### cbc1:~ 549---> cat /sys/block/md0/md/array_state
> clean
>
> ### ROOT ### cbc1:~ 550---> cat /sys/block/md0/md/sync_action
> check
>
> ### ROOT ### cbc1:~ 551---> cat /proc/mdstat
> Personalities : [raid1]
> md0 : active raid1 sdb5[1] sda5[0]
> 951409344 blocks [2/2] [UU]
> [>....................] resync = 0.1% (958592/951409344) finish=132.1min speed=119824K/sec
>
> unused devices: <none>
>
> ### ROOT ### cbc1:~ 552---> mdadm --query --detail /dev/md0
> /dev/md0:
> Version : 0.90
> Creation Time : Fri Jan 22 11:08:38 2010
> Raid Level : raid1
> Array Size : 951409344 (907.33 GiB 974.24 GB)
> Used Dev Size : 951409344 (907.33 GiB 974.24 GB)
> Raid Devices : 2
> Total Devices : 2
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Update Time : Tue Feb 23 09:42:14 2010
> State : clean, resyncing
> Active Devices : 2
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 0
>
> Rebuild Status : 0% complete
>
> UUID : daf8dd0b:00087a40:d5caa7ee:ae05b3aa
> Events : 0.56
>
> Number Major Minor RaidDevice State
> 0 8 5 0 active sync /dev/sda5
> 1 8 21 1 active sync /dev/sdb5
>