GNHLUG server liberty reboot failure

Ben Scott dragonhawk at gmail.com
Thu Jul 31 10:52:34 EDT 2008


On Thu, Jul 31, 2008 at 8:02 AM, Bruce Dawson <jbd at codemeta.com> wrote:
> Ben: Check the size of the boot partition to see if another kernel can
> be placed in there ...

  Good thought.  But, that's not it.  The /boot partition is 100 MB,
27 MB used, 68 MB free.  All the kernel and kernel-smp packages pass
an RPM verify.

  FWIW, none of the other LVs are that close to full, either.  The
worst by far is the /var LV, which is 63% full with 362 MB free.
/var/log and /var/spool are on separate LVs.  The mailman archives are
using the bulk of the allocated space on /var; they grow very slowly
so I'm not in a panic (yet).

  But: I just discovered that the running kernel is 2.6.9-55, while
the latest kernel (the one I was rebooting for) is 2.6.9-67.0.22.  The
grub.conf from the mounted /boot says it should have booted -67...
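
  For anyone who wants to repeat that check, here is a minimal sketch
in Python.  It assumes a GRUB legacy grub.conf (a "default=N" line
followed by "title" stanzas, each with a "kernel" line).  The function
name and the sample file contents are made up for illustration; only
the two kernel versions come from this box.

```python
import re

def default_kernel_version(grub_conf_text):
    """Return the kernel version GRUB legacy would boot by default."""
    default_index = 0
    kernels = []  # one slot per "title" stanza, in file order
    for raw in grub_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = re.match(r"default\s*=?\s*(\d+)", line)
        if m:
            default_index = int(m.group(1))
        elif line.startswith("title"):
            kernels.append(None)
        elif line.startswith("kernel") and kernels:
            # "kernel /vmlinuz-<version> ro ..." -> "<version>"
            path = line.split()[1]
            kernels[-1] = path.rsplit("/", 1)[-1].replace("vmlinuz-", "", 1)
    return kernels[default_index]

# Hypothetical grub.conf contents, for illustration only.
SAMPLE_GRUB_CONF = """\
default=0
timeout=5
title Linux (2.6.9-67.0.22)
    root (hd0,0)
    kernel /vmlinuz-2.6.9-67.0.22 ro
title Linux (2.6.9-55)
    root (hd0,0)
    kernel /vmlinuz-2.6.9-55 ro
"""

print(default_kernel_version(SAMPLE_GRUB_CONF))
```

  Compare the result against platform.release() (or "uname -r"),
allowing for any suffix the distribution adds to the release string.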

  Oh, crap.  I bet I know what happened.  Say one of the disks got
tossed out of the mirror set.  That would mean when RPM installed the
-67 kernel packages, the changes would only be written to the
remaining mirror member.  If the system then booted off the glitched
disk, GRUB would read a stale copy of /boot with the older grub.conf,
and boot the -55 kernel.
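
  A quick way to spot a mirror running on one leg is to look at
/proc/mdstat.  The sketch below assumes Linux md software RAID, which
I have not stated above, and the array and device names in the sample
are hypothetical.  It reports arrays whose "[n/m]" status shows fewer
active members than expected.

```python
import re

def degraded_arrays(mdstat_text):
    """Map array name -> (expected, active) for arrays missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status line looks like "... blocks [2/1] [_U]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (expected, active)
    return degraded

# Hypothetical /proc/mdstat contents, for illustration only.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1]
      104320 blocks [2/1] [_U]

md1 : active raid1 sda2[0] sdb2[1]
      8281408 blocks [2/2] [UU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE_MDSTAT))
```

  On a live system, pass in open("/proc/mdstat").read() instead of
the sample text.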

  And, sure enough, a read-only mount of /dev/sda1 on /mnt/tmp
reveals that the -67 kernel files are not present, and the grub.conf
there does not mention them.  I saved a directory listing and a copy
of that grub.conf under /adm/bootfail/ for future reference, then
unmounted /dev/sda1 again.
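
  The comparison itself is simple enough to script.  This is only a
sketch: the function name is made up, and it compares top-level file
names only, not file contents.

```python
import os

def missing_from_stale(live_boot, stale_boot):
    """Return names present in live_boot but absent from stale_boot."""
    return sorted(set(os.listdir(live_boot)) - set(os.listdir(stale_boot)))
```

  With the stale member mounted read-only as above, calling
missing_from_stale("/boot", "/mnt/tmp") would list the files that
never made it onto that disk.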

  Of course, that does not explain the root cause of the disk being
tossed out of the mirror set.  It could be a random glitch, or it
could be the disk is starting to fail.

-- Ben

