mismatch_cnt != 0, member content mismatch, but md says the mirror is good

Benjamin Scott dragonhawk at gmail.com
Mon Feb 22 15:14:49 EST 2010


On Mon, Feb 22, 2010 at 1:39 PM, Michael ODonnell
<michael.odonnell at comcast.net> wrote:
> So far, then, it's looking like every Sunday at 4:22 all the RAIDs
> (all types or just RAID1?) in standard x86_64 CentOS5.4 (and RHAT?)
> boxes are broken and then resync'd.

  All types (as I interpret the script source).

  If the documentation is to be believed, they are not being broken;
they are being checked for consistency.  Not the same thing.  Breaking
and rebuilding leaves the array vulnerable during the rebuild, as you
note.  A consistency check just compares the supposedly identical
members to confirm they really *are* identical, and warns you if
they are not.
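
  As a toy illustration of that distinction (not how the md driver
is actually implemented), a consistency check amounts to something
like the Python sketch below.  The function name count_mismatches and
the 4096-byte block size are invented for the example; the "members"
here are just byte strings held in memory.

```python
def count_mismatches(member_a: bytes, member_b: bytes, block_size: int = 4096) -> int:
    """Count the blocks that differ between two supposedly identical members.

    Nothing is written to either member; the check only reads and compares.
    """
    if len(member_a) != len(member_b):
        raise ValueError("members must be the same size")

    mismatches = 0
    for offset in range(0, len(member_a), block_size):
        block_a = member_a[offset:offset + block_size]
        block_b = member_b[offset:offset + block_size]
        if block_a != block_b:
            mismatches += 1
    return mismatches
```

  The point is that both members stay in service the whole time, so
the array keeps its redundancy while the comparison runs.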

  What I find interesting is that I'm not getting log messages from
the kernel about liberty's "md1" device -- only "md0".  I can think of
two possible reasons for that: (R1) The kernel only logs the message
if a mismatch is discovered.  (R2) The check is not being run on "md1"
on liberty for some reason.

  If R1 is the case, that implies your system has mismatches across
several arrays, which I would think is a bad sign.

  If R2 is the case, I'd like to know why, and fix it so it works.

> Interactive responsiveness is usually significantly reduced, as well ...

  With a good RAID implementation, I/O for patrol reads is done when
the array is idle.  (Kind of like "nice 19" for I/O.)  I don't know if
Linux does this or not.

> We'll probably disable that "helpful" weekly script on our machines
> until we have a better handle on this (or a fix).

  You may want to determine if you've got mismatches or not before
disabling the script.  It could be it just alerted you to trouble
before it became a disaster.
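
  One way to look, sketched in Python: the md driver exposes a
per-array counter at /sys/block/mdN/md/mismatch_cnt (the same
mismatch_cnt as in the subject line).  The helper below just reads
that file for each array it finds.  The function name
read_mismatch_counts and its sysfs_block parameter are made up for
this example, and it assumes the counter holds a single integer.

```python
from pathlib import Path


def read_mismatch_counts(sysfs_block="/sys/block"):
    """Return {array name: mismatch_cnt} for every md array found.

    sysfs_block is a parameter only so the function can be pointed at a
    different directory tree; on a real system the default is what you want.
    """
    counts = {}
    for md_dir in sorted(Path(sysfs_block).glob("md*/md")):
        counter = md_dir / "mismatch_cnt"
        if counter.is_file():
            # The directory above "md" is named after the array, e.g. "md0".
            counts[md_dir.parent.name] = int(counter.read_text().strip())
    return counts
```

  Calling read_mismatch_counts() after the weekly check has finished
and printing the result shows which arrays, if any, report a nonzero
count.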

-- Ben

