Software RAID issues (was Re: Suggestions solicited, server bring up)

Ben Scott dragonhawk at gmail.com
Fri Nov 20 16:43:36 EST 2009


On Fri, Nov 20, 2009 at 2:42 PM, Bill McGonigle <bill at bfccomputing.com> wrote:
>> Software-based solutions -- which don't kick in
>> until the OS is running -- sometimes get caught up trying to boot from
>> a failed disk.
>
> "Please don't use RAID-5".
> A healthy, properly configured (and tested) RAID-1 will boot nicely.

  It's not an issue with RAID 1 vs 5.  The issue is that non-RAID
controllers are not intelligent; they know nothing about the mirror.
Here's the scenario:

  System powers on.  Disks spin up, servo heads, come online.  BIOS
sees both disks as reporting online.  BIOS reads the MBR from disk 0,
finds a valid signature, tries to boot from it.  The bootstrap in the
MBR proceeds to try to load additional stages.  One of those stages
includes a bad block.  Loader aborts with an I/O error.  System sits
there like a dumb shit forever.  Disk 1 is fine and would work, but
the BIOS doesn't know that.
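
  To make that sequence concrete, here is a rough Python model of it.
This is only a sketch: the Disk class, bios_pick_boot_disk(), boot(),
and the block numbers are all made up for illustration and do not
correspond to any real BIOS or boot loader code.  The one real detail
is the 0x55 0xAA signature at the end of a valid MBR.

```python
from dataclasses import dataclass, field

# A valid MBR ends with the two-byte boot signature 0x55 0xAA.
MBR_SIGNATURE = b"\x55\xaa"


class DiskReadError(Exception):
    """Raised when a read lands on a bad block."""


@dataclass
class Disk:
    # Hypothetical disk model: just a name, a status, and a bad-block list.
    name: str
    online: bool = True
    signature: bytes = MBR_SIGNATURE
    bad_blocks: set = field(default_factory=set)

    def read(self, block: int) -> str:
        if block in self.bad_blocks:
            raise DiskReadError(f"{self.name}: I/O error at block {block}")
        return f"{self.name}:{block}"


def bios_pick_boot_disk(disks):
    """Pick the first online disk with a valid MBR signature.

    That is all the modeled BIOS checks; it never looks past the MBR.
    """
    for disk in disks:
        if disk.online and disk.signature == MBR_SIGNATURE:
            return disk
    return None


def boot(disks, loader_blocks):
    """Simulate one boot attempt and return a short status string."""
    disk = bios_pick_boot_disk(disks)
    if disk is None:
        return "no boot device"
    try:
        for block in loader_blocks:
            disk.read(block)  # later loader stages come from the same disk
    except DiskReadError:
        # Nothing falls back to the other disk at this point.
        return f"hung on {disk.name}"
    return f"booted from {disk.name}"


disk0 = Disk("disk0", bad_blocks={40})  # one bad block under the loader
disk1 = Disk("disk1")                   # healthy mirror half
print(boot([disk0, disk1], range(1, 64)))  # hung on disk0
print(boot([disk1], range(1, 64)))         # booted from disk1
```

  The point of the model is that the choice of boot disk is made once,
on the strength of the MBR signature alone, and nothing later in the
chain revisits it.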

  An alternate scenario is the system just hangs or aborts trying to
read the MBR from disk 0.

  Optionally, add a watchdog that reboots the system when it hangs.
Now the system sits in a boot loop forever, since every reboot tries
disk 0 again.

  If this hasn't happened to you yet, lucky you.  I sincerely hope
your luck continues.  My luck is not as good as yours.

  With a hardware RAID controller, the first time disk 0 has a bad
block, the controller will fail that disk out of the RAID set, and use
disk 1 for everything.  The BIOS is never even aware there is a
problem.
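
  A toy version of that controller behavior, again in Python and again
only a sketch.  MirrorController and its fail-on-first-error policy
are assumptions made for illustration; real controllers have
vendor-specific retry and remapping policies that this ignores.

```python
class MirrorController:
    """Toy model of a hardware RAID-1 controller with read failover."""

    def __init__(self, members):
        # members: dict mapping disk name -> set of bad block numbers
        self.members = dict(members)
        self.failed = set()

    def read(self, block):
        """Return the name of the member that served the read."""
        for name, bad_blocks in self.members.items():
            if name in self.failed:
                continue
            if block in bad_blocks:
                # First read error: drop this member from the set for good.
                self.failed.add(name)
                continue
            return name
        raise IOError("all mirror members have failed")


ctrl = MirrorController({"disk0": {40}, "disk1": set()})
served = [ctrl.read(block) for block in (1, 40, 1)]
print(served)       # ['disk0', 'disk1', 'disk1']
print(ctrl.failed)  # {'disk0'}
```

  Whoever issues the reads (the BIOS, the loader, the OS) only ever
sees a successful read; the failover happens below them.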

  The disks themselves are behaving correctly here; I don't want disk
drives to self-destruct and report "Not Ready" (and thus be totally
unusable) just because they have a non-zero number of bad blocks.

  You could argue that the alternate scenario above is the fault of
the BIOS or disk controller, that it should be able to recover from an
I/O error on disk 0 and move on and try disk 1.  You're prolly right.
But this is the pee sea platform we're talking about here.  I don't
really need to explain the environment to you, do I?  ;-)  And the
BIOS is still helpless once the MBR bootstrap takes over.

-- Ben


More information about the gnhlug-discuss mailing list