Out of memory while booting? update

Ben Scott dragonhawk at gmail.com
Mon Apr 6 19:48:10 EDT 2009


On Mon, Apr 6, 2009 at 3:13 PM, Charles G Montgomery
<cgm at physics.utoledo.edu> wrote:
> Booting single-user fails in the same way as a regular boot.
> I can get a shell with "init=/bin/sh".  From the shell I can fsck
> the file systems and mount and unmount.  I can also run almost all
> the scripts in /etc/rc2.d with no problems.
[...]
> From the shell (which has pid 1) I
> can "exec /sbin/init" and init starts as during a normal boot, but
> encounters the "Out of memory" disaster.

  The above is very significant.  It indicates the system is mostly
working, and that the main kernel image doesn't cause the crash.  And
as Jim Kuzdrall pointed out, if this were some kind of intermittent
hardware fault, you wouldn't be able to reproduce it so consistently.
A memory diagnostic that finds nothing wrong in one pass doesn't
strictly rule out bad hardware, but combined with the above it makes
bad hardware unlikely.  So that means it's something that happens
after the kernel boots and after init starts.  That narrows it down
considerably, and should also make trouble-shooting much easier
(trouble-shooting issues that occur before init starts is much more
difficult).

> From the shell, shutdown doesn't work because /dev/initctl doesn't
> exist.

  The shutdown(8) command expects to tell init(8) to actually do the
work of shutting the system down.  Since you told the kernel to start
a shell as the initial program, the regular /sbin/init isn't running
for shutdown to talk to.

> The problem seems to come during the loading of modules.

  That was going to be my next guess.  :)  What's almost certainly
happening is that a module is loading a kernel device driver which is
malfunctioning.  A malfunctioning driver can wreak all sorts of havoc.
 It might be corrupting kernel data structures, or getting the
hardware into a bad state, or just allocating way too much kernel
memory.

> One that it tries is a touchscreen module, which takes a long time and ends
> in trouble.

  That's a pretty strong sign.  I'd start testing there.  Even if the
system appears to honor your [CTRL]+[C], I'm betting that module has
already trashed the system at that point.

> Is there any way to stop init partway through, so I could at least
> see if unreasonable amounts of memory seem to have been used, or
> look for other information?

  My knowledge of Debian init scripts is a little rusty, but I think
booting to single-user mode might defer loading of modules.  If so,
you should be able to manually try loading the modules, one at a time,
and testing to see which one causes the system to become unstable.
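
  As for seeing how much memory has been eaten: once /proc is mounted
(it should be in single-user mode), /proc/meminfo has the numbers.
Here's a minimal sketch of a helper that pulls out MemFree so you can
compare before and after each module load.  The function name and the
optional file argument are my own invention, not a standard tool.

```shell
# Print the MemFree value (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo; the optional argument is only a
# convenience for trying the function on a saved copy.
mem_free_kb() {
    while read -r key value unit; do
        if [ "$key" = "MemFree:" ]; then
            echo "$value"
            return 0
        fi
    done < "${1:-/proc/meminfo}"
    return 1    # no MemFree line found
}
```

  Run "mem_free_kb" before and after each modprobe.  A sudden large
drop points at the module you just loaded.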

  Booting to single-user mode does basic initialization (the
/etc/rc.d/rc.sysinit file on Red Hat derived systems; I think Debian
uses /etc/rcS.d/* or something like that) and then stops.  It's
somewhat analogous to the "Safe Mode -- Command Prompt Only" of
Microsoft Windows.  A normal Linux boot executes various service
scripts for the default runlevel in addition to basic initialization.

  To boot single-user, one typically appends the word "single" to the
kernel command line.  You're probably using GRUB, so interrupt the
countdown for the default boot, select the boot entry you want, then
hit [E] to edit it.  Select the line that says something about
"kernel" or "linux" or "linuz", and hit [E] to edit that.  At the end
of the command line, tack on a space and "single" (without the
quotes).  Hit [ENTER] to accept your change.  Then hit [B] to boot
with the modified kernel command line.  The system should go through a
minimal initialization, and then give you a root shell prompt.
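
  If you want to confirm the edit actually took, the kernel exposes
the command line it was booted with in /proc/cmdline.  A small sketch,
assuming /proc is mounted; has_boot_flag is a made-up helper name, and
the second argument is only there so you can point it at a test file.

```shell
# Succeed if the given word appears on the kernel command line.
# $1 = word to look for, $2 = file to read (defaults to /proc/cmdline).
has_boot_flag() {
    read -r cmdline < "${2:-/proc/cmdline}" || true
    # Pad with spaces so only whole words match.
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

  For example: has_boot_flag single && echo "booted single-user"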

  To load a module, type "modprobe foo" at a root prompt, where "foo"
is the module name.
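
  If the box tends to lock up, it helps to announce each module and
flush the disks before loading it, so you know which one was last.  A
rough sketch follows.  try_module and the MODPROBE override are
hypothetical names of mine; the override just lets you dry-run the
function with something harmless like "true".

```shell
# Load one module, saying what is about to happen first.
# MODPROBE is a hypothetical override hook; it defaults to the
# real modprobe command.
try_module() {
    echo "loading $1"
    sync    # flush disk buffers in case the load hangs the machine
    if ${MODPROBE:-modprobe} "$1"; then
        echo "ok $1"
    else
        echo "FAILED $1"
        return 1
    fi
}
```

  Call it once per module (try_module foo), and check that the system
still behaves before moving on to the next one.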

  Getting a list of module names for your particular system is
trickier.  Investigate the files /etc/modules and /etc/modprobe.conf,
and/or the directory /etc/modprobe.d/ (depending on what flavors of
the module stuff your Debian is using).
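
  On Debian, /etc/modules lists one module per line, optionally
followed by module arguments, with "#" starting a comment.  Assuming
that format, here's a minimal sketch that prints just the module
names.  list_modules is my own name for it, not something Debian
ships.

```shell
# Print the module names from an /etc/modules-style file,
# skipping blank lines and "#" comments and dropping any arguments.
# $1 = file to read (defaults to /etc/modules).
list_modules() {
    while read -r name args || [ -n "$name" ]; do
        case "$name" in
            ''|'#'*) continue ;;
        esac
        echo "$name"
    done < "${1:-/etc/modules}"
}
```

  Feed each name it prints to modprobe by hand, one at a time.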

> Is there some interaction with hardware that occurs during
> module loading but not during other activities?

  Absolutely.  Most kernel modules load device drivers, and device
drivers have full privileges to muck about with hardware and the
kernel, and indeed, must do so as part of their intended function.

> Is there a way to get init to not load modules?

  As "VirginSnow" said, it's not init that loads modules.  init
basically just shepherds the system.  init invokes a veritable horde
of scripts to do everything from initializing the basic system to
starting services and other tasks.  These are called "init scripts".
On Debian they live under /etc/init.d/ (Red Hat derived systems use
/etc/rc.d/init.d/).  You'll also find directories of the form
/etc/rc?.d/ (where ? is a digit) which link to those scripts, and
select which ones get run under which conditions.
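
  To see what a given runlevel will start, and in what order, look at
the links whose names begin with "S" (the "K" ones are for stopping
things).  A small sketch, assuming the usual SysV layout;
list_start_scripts is a hypothetical helper, and /etc/rc2.d is only
the default because that's the directory you mentioned.

```shell
# Print the start links in a runlevel directory, in the order the
# shell sorts them (which is the order they are run).
# $1 = directory to inspect (defaults to /etc/rc2.d).
list_start_scripts() {
    for link in "${1:-/etc/rc2.d}"/S*; do
        [ -e "$link" ] || continue    # nothing matched the glob
        echo "${link##*/}"
    done
}
```

  Running those by hand in the listed order, one at a time, is another
way to find the one that triggers the trouble.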

  I already addressed some things you can do to maybe stop modules
from being loaded.  You may also be able to just rename the
files/directories I mentioned.  The idea of renaming /sbin/modprobe
and /sbin/insmod isn't a bad one, either, but that may cause other
malfunctions (I'm not sure).
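
  If you do go the renaming route, do it in a way that's trivial to
undo.  A minimal sketch; disable_file and restore_file are names I
made up, and the ".disabled" suffix is arbitrary.

```shell
# Move a file or directory out of the way by adding ".disabled".
disable_file() {
    if [ -e "$1.disabled" ]; then
        echo "refusing: $1.disabled already exists" >&2
        return 1
    fi
    mv "$1" "$1.disabled"
}

# Put it back under its original name.
restore_file() {
    mv "$1.disabled" "$1"
}
```

  For example, "disable_file /etc/modules" before a test boot and
"restore_file /etc/modules" afterward.  If root is still mounted
read-only (as it may be under init=/bin/sh), you'll need
"mount -o remount,rw /" first.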

-- Ben


