recovering FC3 from a bad superblock

Mon May 16 17:26:00 EDT 2005

On 5/16/05, Greg Rundlett <greg.rundlett at gmail.com> wrote:
> The battery ran out, and it seems like the cache
> got dumped onto the superblock 

  Odd.  That usually doesn't happen with modern hardware.  Of course,
there's a world of difference between "usually doesn't" and "never". 
:-(

> e2fsck ran for a long time, complaining about every inode on the disk,
> and seemingly moved everything around.

  That is a very bad sign.  Generally speaking, I've found that if
e2fsck thinks it has to do that kind of major repair, either (1) the
filesystem is scrambled beyond reasonable hope of recovery, and/or (2)
there is a hardware or software problem causing e2fsck to go insane.  In
the latter case, letting e2fsck run to completion will usually cause
the former case to come into play.

> This seems to indicate that the Windows partition is the boot
> partition.  However, the system normally boots into Linux without any
> interaction.

  This is typical.  GRUB is often installed in the MBR (Master Boot
Record, or "boot block"), replacing the boot code that was there.
Normally, the code in the MBR is responsible for interpreting the
partition table and giving meaning to the "active" flag.  When GRUB
replaces that code, though, it substitutes its own behavior, which
mostly ignores the active flag.  So while your partition table may
indicate that the Windows partition is the boot partition, GRUB has
its own ideas.
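
  The active flag is just one byte in each partition-table entry, and
whether anything acts on it is up to the boot code sitting in front of
the table.  Here is a minimal Python sketch, assuming the classic DOS
MBR layout (boot code, four 16-byte entries starting at offset 446, and
the 0x55AA signature at offset 510), that lists the four primary
entries from a saved copy of sector 0 and shows which one is flagged
active.  parse_mbr is a hypothetical helper name, not part of any
library.

```python
import struct
import sys

MBR_SIZE = 512
TABLE_OFFSET = 446       # four 16-byte partition entries start here
ENTRY_SIZE = 16
SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR


def parse_mbr(sector):
    """Return the four primary partition entries of a 512-byte MBR."""
    if len(sector) < MBR_SIZE:
        raise ValueError("need the full 512-byte sector")
    if sector[510:512] != SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    entries = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # Bytes 8-15: starting LBA and length in sectors, little-endian u32s.
        lba_start, num_sectors = struct.unpack_from("<II", entry, 8)
        entries.append({
            "number": i + 1,
            "active": entry[0] == 0x80,  # byte 0: boot ("active") flag
            "type": entry[4],            # byte 4: partition type ID
            "lba_start": lba_start,
            "sectors": num_sectors,
        })
    return entries


if __name__ == "__main__":
    # Usage: python mbr_active.py mbr.bin   (a saved copy of sector 0)
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as f:
            for p in parse_mbr(f.read(MBR_SIZE)):
                print("partition %d active=%s type=0x%02x start=%d sectors=%d" % (
                    p["number"], p["active"], p["type"],
                    p["lba_start"], p["sectors"]))
```

  A copy of the sector can be taken without modifying the disk with
"dd if=/dev/hda of=mbr.bin bs=512 count=1".  On a dual-boot box like
the one described, the Windows entry may well be the one marked active
even though GRUB boots Linux.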

> e2fsck -f -n /dev/hda2
> tells me there are errors

  Any particular errors?  Or is this one of those "All of them, I
think" situations?

> Using the so-called backup superblocks [block-size (8192 *n) +1], it
> reports a 'bad magic number'
> e2fsck -b 16384 -n /dev/hda2
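
  "Bad magic number" means e2fsck went looking for a superblock
signature and didn't find one.  This usually means you either got the
superblock number wrong, or you're not working with an EXT2 filesystem
(anymore).  Note that the backup locations depend on the block size:
per the e2fsck man page, the first backup is at block 8193 for 1K
blocks, 16384 for 2K blocks, and 32768 for 4K blocks.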
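
  Backup superblocks live at the start of block groups, and with the
sparse_super feature only group 1 and groups that are powers of 3, 5
and 7 carry one.  Here is a minimal Python sketch, assuming the default
layout of 8 times the block size in blocks per group (a filesystem made
with a non-default mke2fs -g will not match), that lists the candidate
locations and checks each one for the 0xEF53 magic number without ever
writing to the device.  The helper names (backup_superblocks, probe)
are hypothetical.

```python
import os
import struct
import sys

EXT2_SUPER_MAGIC = 0xEF53  # shared by ext2 and ext3
MAGIC_OFFSET = 56          # byte offset of s_magic (little-endian u16)
SUPERBLOCK_SIZE = 1024


def layout(block_size):
    """(first data block, blocks per group), assuming the default layout."""
    first_data_block = 1 if block_size == 1024 else 0
    return first_data_block, 8 * block_size


def _is_power_of(n, base):
    while n % base == 0:
        n //= base
    return n == 1


def has_backup(group, sparse=True):
    """True if block group `group` holds a superblock copy."""
    if not sparse or group in (0, 1):
        return True
    return any(_is_power_of(group, b) for b in (3, 5, 7))


def backup_superblocks(block_size, group_count, sparse=True):
    """Block numbers of the backup superblocks (group 0 holds the primary)."""
    first, per_group = layout(block_size)
    return [first + g * per_group
            for g in range(1, group_count) if has_backup(g, sparse)]


def probe(device, block_size, sparse=True):
    """Candidate backup blocks on `device` that carry the ext2 magic number."""
    found = []
    with open(device, "rb") as dev:                 # opened read-only
        size = dev.seek(0, os.SEEK_END)             # works for files and block devices
        first, per_group = layout(block_size)
        blocks = size // block_size
        groups = -(-(blocks - first) // per_group)  # ceiling division
        for block in backup_superblocks(block_size, groups, sparse):
            dev.seek(block * block_size)            # a backup starts its block
            sb = dev.read(SUPERBLOCK_SIZE)
            if (len(sb) == SUPERBLOCK_SIZE and
                    struct.unpack_from("<H", sb, MAGIC_OFFSET)[0] == EXT2_SUPER_MAGIC):
                found.append(block)
    return found


if __name__ == "__main__":
    # Usage (as root): python find_sb.py /dev/hda2 4096
    if len(sys.argv) == 3:
        device, bs = sys.argv[1], int(sys.argv[2])
        for block in probe(device, bs):
            print("e2fsck -n -b %d -B %d %s" % (block, bs, device))
```

  Saved as, say, find_sb.py and run against the unmounted partition, it
prints read-only e2fsck invocations (-n) for each block that checks out.
If nothing turns up at 1024, 2048 or 4096, that points toward the "not
an EXT2 filesystem anymore" case.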

> How should I go about fixing the 'errors' while the filesystem is not
> mounted? 

  The correct course of action depends on the errors you are seeing
and how good your backups are.

  As Derek says, you're prolly best off just mounting the
filesystem(s) read-only, copying off any data you can and want, and
then remaking the filesystems.  Then you either restore from backup or
reinstall the OS.
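
  When copying data off a damaged filesystem, some files will fail with
I/O errors, and it helps to keep going and keep a list of what was
lost.  Here is a minimal Python sketch, assuming the damaged filesystem
is already mounted read-only (for example with "mount -o ro /dev/hda2
/mnt/rescue", where /mnt/rescue is a hypothetical mount point) and that
the destination is on a different, healthy disk.  salvage is a
hypothetical helper; it copies regular files and symlinks and skips
device nodes, FIFOs and sockets.

```python
import os
import shutil
import stat
import sys


def salvage(src_root, dst_root):
    """Copy whatever is readable under src_root into dst_root.

    Returns a list of (path, reason) pairs for everything not copied.
    """
    failures = []

    def record(err):
        # os.walk calls this when a directory cannot be listed
        failures.append((err.filename, str(err)))

    for dirpath, dirnames, filenames in os.walk(src_root, onerror=record):
        target_dir = os.path.join(dst_root, os.path.relpath(dirpath, src_root))
        os.makedirs(target_dir, exist_ok=True)
        # os.walk lists symlinks to directories in dirnames but does not
        # descend into them, so copy those links along with the files.
        dir_links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
        for name in filenames + dir_links:
            src = os.path.join(dirpath, name)
            try:
                mode = os.lstat(src).st_mode
                if not (stat.S_ISREG(mode) or stat.S_ISLNK(mode)):
                    failures.append((src, "skipped: not a regular file or symlink"))
                    continue
                # follow_symlinks=False recreates a symlink instead of copying its target
                shutil.copy2(src, os.path.join(target_dir, name), follow_symlinks=False)
            except OSError as err:  # e.g. EIO from a damaged block
                failures.append((src, str(err)))
    return failures


if __name__ == "__main__":
    # Usage: python salvage.py /mnt/rescue /mnt/spare/rescued
    if len(sys.argv) == 3:
        for path, reason in salvage(sys.argv[1], sys.argv[2]):
            print("NOT COPIED: %s (%s)" % (path, reason))
```

  The returned (path, reason) pairs tell you which files to pull from
backup once the filesystem has been remade.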

> Or, how should I mount the filesystem properly so that it can be fixed,
> but not 'used' during the fix?

  In the case of EXT2/EXT3 (and, indeed, most filesystems), you
*NEVER* want the filesystem mounted during a fix/repair operation.
Always unmount first.  (In the case of the root filesystem, which
cannot be unmounted while the system is running, mount it read-only,
run fsck, and then reboot.)  Running fsck on an "active" filesystem
will generally result in a scrambled filesystem.
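
  A cheap guard against fscking a mounted filesystem is to look at
/proc/mounts first.  Here is a minimal Python sketch, assuming the Linux
/proc/mounts format (device, mount point, type, options, ...), that
refuses a device mounted read-write and tolerates a read-only mount,
matching the root-filesystem case above.  It compares device names
literally, so a device listed under another name (such as /dev/root)
will be missed; e2fsck also does its own check and warns before
touching a mounted filesystem.  mount_state and safe_to_fsck are
hypothetical names.

```python
import sys


def mount_state(device, mounts_text):
    """Return None if `device` is not listed, else "rw" or "ro".

    `mounts_text` uses the /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    state = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != device:
            continue
        if "rw" in fields[3].split(","):
            return "rw"  # any read-write mount is disqualifying
        state = "ro"
    return state


def safe_to_fsck(device, mounts_path="/proc/mounts"):
    """True unless `device` is mounted read-write.

    A read-only mount is tolerated for the root filesystem case
    (mount read-only, fsck, then reboot).
    """
    with open(mounts_path) as f:
        return mount_state(device, f.read()) != "rw"


if __name__ == "__main__":
    # Usage: python fsck_guard.py /dev/hda2 && e2fsck -f /dev/hda2
    if len(sys.argv) == 2:
        if not safe_to_fsck(sys.argv[1]):
            sys.exit("%s is mounted read-write; unmount it first" % sys.argv[1])
```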

  "There are two kinds of people in this world: Those who have good
backups, and those who will end up wishing they did."