Software RAID issues (was Re: Suggestions solicited, server bring up)
Ben Scott
dragonhawk at gmail.com
Sat Nov 21 18:31:39 EST 2009
On Sat, Nov 21, 2009 at 6:02 PM, Bill McGonigle <bill at bfccomputing.com> wrote:
> RAID-5 itself has a problem known as the "RAID-5 write hole" where data
> loss can be guaranteed in certain situations.
I've seen several different claims about what the "RAID-5 write hole" actually is.
One claim I see is that if there's an undetected read failure, the
system can calculate incorrect parity. Which I suppose is true. But
if there's an undetected read failure on a mirror, you also get bad
data. It just happens when you read the disk during normal I/O,
rather than when you rebuild the array. I'm also not sure how common
undetected read failures really are; this might be an imaginary
problem.
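To make the first claim concrete, here is a toy Python sketch. It assumes three data disks plus XOR parity and a reconstruct-write, where the other data chunks are read back to recompute parity. The 4-byte chunks and the helper name xor_blocks are made up for illustration. A single undetected bad read during the write leaves wrong parity on disk, and the damage only shows up when a later rebuild uses it.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks, i.e. RAID-5 style parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy stripe: three data chunks (hypothetical contents) plus parity.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# Rewrite d0. Parity is recomputed from the new d0 and the *other* data
# chunks as read back from disk; assume the read of d1 silently returns
# corrupted bytes (an undetected read error).
new_d0 = b"ZZZZ"
d1_as_read = b"BxBB"
parity = xor_blocks(new_d0, d1_as_read, d2)  # wrong parity is now on disk

# Later, disk 2 fails and is rebuilt from the survivors plus parity.
rebuilt_d2 = xor_blocks(new_d0, d1, parity)
print(rebuilt_d2 == d2)  # False: the rebuild silently produces bad data
```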
Another claim I've seen is about what happens if there is a failure in
the storage subsystem during a write operation, where the system has
updated some disks in a stripe but not all of them. Updating multiple
disks is not atomic. Which I suppose is true on especially poor RAID-5
implementations. Better ones don't report the write as successful to
the OS until every disk in the stripe is updated (write-through), or
use battery-backed cache.
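The second claim is the classic write hole. Here is a toy Python sketch of it, assuming a stripe of three data chunks plus XOR parity with no journal or battery-backed cache. The data chunk and the parity chunk are separate writes to separate disks, so a crash between them leaves the stripe inconsistent. A later disk failure then turns that into bad data on a chunk the interrupted write never touched. The chunk contents and the helper name are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks, i.e. RAID-5 style parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy on-disk stripe: three data chunks and their parity.
disks = {"d0": b"AAAA", "d1": b"BBBB", "d2": b"CCCC"}
disks["p"] = xor_blocks(disks["d0"], disks["d1"], disks["d2"])

# Update d0. The new data reaches its disk...
disks["d0"] = b"ZZZZ"
# ...but the machine crashes before the new parity is written, so
# disks["p"] still matches the *old* d0.

stripe_consistent = xor_blocks(disks["d0"], disks["d1"], disks["d2"]) == disks["p"]

# After reboot, the disk holding d1 fails. Rebuilding d1 from the other
# data chunks plus the stale parity yields garbage, even though d1 was
# never part of the interrupted write.
rebuilt_d1 = xor_blocks(disks["d0"], disks["d2"], disks["p"])
print(stripe_consistent, rebuilt_d1 == b"BBBB")  # False False
```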
> RAID-6 is a patch to prevent this.
I recall RAID-6 being defined as "like RAID-5, but with two parity
stripes", i.e., now you can sustain two disk failures. But how does
that help the case of incomplete stripe writes?
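Here is a minimal Python sketch of the "two parity" idea. It assumes the P+Q scheme commonly used for RAID-6 (Linux md uses it, for example): P is the same XOR parity as RAID-5, and Q weights data disk i by g^i in GF(2^8), with field polynomial 0x11d and g = 2. P and Q give two independent equations per byte, so any two lost data disks can be solved for. Disk contents and function names are hypothetical. The sketch only covers the encoding and recovery math; it says nothing about how the P and Q writes are ordered or made atomic.

```python
# Toy RAID-6 (P+Q) over GF(2^8). Assumed conventions: field polynomial
# 0x11d and generator g = 2, as commonly used for RAID-6 (e.g. Linux md).


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # The nonzero elements form a group of order 255, so a^254 == a^-1.
    return gf_pow(a, 254)


def compute_pq(data: list[bytes]) -> tuple[bytes, bytes]:
    """P is plain XOR parity (as in RAID-5); Q weights disk i by g^i."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)
        for k in range(size):
            p[k] ^= block[k]
            q[k] ^= gf_mul(coeff, block[k])
    return bytes(p), bytes(q)


def recover_two(data, x, y, p, q):
    """Rebuild lost data disks x and y (entries may be None) from P and Q."""
    size = len(p)
    pxy, qxy = bytearray(p), bytearray(q)
    for i, block in enumerate(data):
        if i in (x, y):
            continue
        coeff = gf_pow(2, i)
        for k in range(size):
            pxy[k] ^= block[k]
            qxy[k] ^= gf_mul(coeff, block[k])
    # Now pxy = Dx ^ Dy and qxy = g^x*Dx ^ g^y*Dy; solve for Dx, then Dy.
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(size), bytearray(size)
    for k in range(size):
        dx[k] = gf_mul(qxy[k] ^ gf_mul(gy, pxy[k]), denom_inv)
        dy[k] = pxy[k] ^ dx[k]
    return bytes(dx), bytes(dy)


disks = [b"ABCD", b"EFGH", b"IJKL", b"MNOP"]  # four data disks
p, q = compute_pq(disks)

damaged = list(disks)
damaged[1] = damaged[3] = None  # two disks lost
d1, d3 = recover_two(damaged, 1, 3, p, q)
print(d1 == disks[1], d3 == disks[3])  # True True
```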
> But RAID-5/6 also come with complexity, and software is buggy.
I guess the real problem is that there's too much crappy hardware out
there. I've used RAID-5 on LSI/AMI MegaRAID for something like a
decade and never had any problems with a bad implementation. This
probably goes back to the pee sea nature again. But it appears that, to
get a decent storage subsystem in the pee sea, you have to buy a
decent RAID controller. :)
> And for a boot disk, having only one surviving drive
> is sufficient to get your machine running again.
I find that statement misleading. With a two-disk mirror, "only one
surviving disk" also means "only one failed disk". All the RAID
levels except RAID 0 (striping without parity) give you that. :-)
> I still wouldn't use it for a boot disk since a boot disk
> set doesn't need to be big enough to justify any cost
> savings.
Agreed, but with disks so large these days, aren't we also beyond
the era of having a disk that's just a "boot disk"?
I have a feeling our next server at work is just going to have a
couple of 2 TB disks in it, for everything -- OS, apps, and data, all
on one mirror.
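For the mirror-versus-parity trade-off discussed here (what each level survives, and what it costs in usable space), here is a rough Python sketch. It assumes equal-size disks and ignores hot spares, metadata overhead, and RAID-10. The minimum disk counts are conventional values, not hard limits of any particular implementation, and the function name is hypothetical.

```python
# Conventional minimum disk counts (assumed; some implementations differ).
MIN_DISKS = {"raid0": 2, "raid1": 2, "raid5": 3, "raid6": 4}


def raid_summary(level: str, disks: int, size_tb: float) -> tuple[float, int]:
    """Return (usable TB, disk failures always survived) for equal-size disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unknown level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"{level} needs at least {MIN_DISKS[level]} disks")
    if level == "raid0":  # striping, no redundancy
        return disks * size_tb, 0
    if level == "raid1":  # n-way mirror: every disk holds a full copy
        return size_tb, disks - 1
    if level == "raid5":  # one disk's worth of parity
        return (disks - 1) * size_tb, 1
    return (disks - 2) * size_tb, 2  # raid6: two disks' worth of parity


for level, n in [("raid1", 2), ("raid5", 3), ("raid6", 4), ("raid0", 2)]:
    usable, survives = raid_summary(level, n, 2.0)
    print(f"{level}, {n} x 2 TB: {usable:g} TB usable, survives {survives} failure(s)")
```

Parity levels only use raw space more efficiently as the number of disks grows; with just two disks, mirroring is the only redundant option in this sketch.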
-- Ben
More information about the gnhlug-discuss mailing list