moving linux installs

Ben Scott dragonhawk at gmail.com
Mon Apr 21 23:10:13 EDT 2008


  Warning: Giant message ahead.  If you're of the "TL;DR" persuasion,
hit delete now!  :-)

On Mon, Apr 21, 2008 at 4:10 PM, Bill McGonigle <bill at bfccomputing.com> wrote:
>  I thought those were similar enough that grub would be copacetic ...

  I would have thought so, too.   :(

> So, I'm not even sure why that didn't work at this point.

  What did you end up doing to fix the problem?

>> /boot/grub/device.map
>
>  On both these machines, that looks like ...
>  should be compatible, right?  No.

  Well, from what you're saying, it sounds like you weren't doing
anything that would require manually mapping devices.

  You would manually map devices if you were doing something like
installing GRUB to another disk, from an already running system.
Maybe the disk you're targeting is (hd1) when you run the GRUB shell.
But when GRUB actually boots that disk, you will have moved it to
(hd0).  So you have to tell GRUB the disk is going to be in a
different spot when it actually runs at boot.
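
  For example, say the disk you're writing GRUB to shows up as
/dev/sdb today, but will be the first BIOS disk when it boots on its
own.  A hand-edited device.map for that job might look like this (the
device name is just an example -- use whatever yours really is):

    # /boot/grub/device.map -- target disk will be (hd0) at boot time
    (hd0)   /dev/sdb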

  That doesn't seem to be your situation.

>  Yeah, as a function of outwards complexity this is a simple thing that's
> hard and the hard things were simple, relatively.

  Uhhh.... Yah.  Right.  0.o

> at some point the BIOS picks a boot medium, and it loads the
> bootloader off of disk and puts it in memory and sets the instruction
> pointer to its address and JMP's to it, right?  So then grub is running...

  Not exactly.  Fasten your seatbelt, because now things get hairy and
arcane.  The first two rows may get wet.

  The BIOS loads the first sector of the disk into RAM and looks for a
boot signature.  If it finds the signature, it assumes it found a boot
loader, and executes it.  If it doesn't find the boot signature, it
executes ROM BASIC^W^W^W coughs and dies.  If the signature lied about
being a boot loader, the machine generally crashes/hangs/reboots.
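
  (If you're curious, you can look for the signature yourself; the
last two bytes of the boot sector should be 55 aa.  Something like
this, as root, assuming /dev/sda is the disk in question:

    dd if=/dev/sda bs=512 count=1 2>/dev/null | xxd | tail -1

The last two bytes on that line should read "55aa".)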

  (What devices the BIOS checks for the signature is
implementation-dependent.  In the classic BIOS, it was the first
floppy, then the first fixed disk, and that was all.  Modern BIOSes
usually support checking multiple fixed disks for a signature, or
picking one from a menu.  But if a boot sig is found on what is
normally a secondary disk, the BIOS will generally re-arrange things
so that the selected boot disk appears as (hd0) for that boot.  This
is done because "stuff" assumes the first fixed disk in the system
will be the boot disk.)

  So anyway, our boot block is loaded and running.  Yay!  But this
only gives us 446 bytes of code.  Boo!  (The boot sector is 512
bytes; the partition table and boot signature take up the rest.)
That's not enough to do anything
more complicated than load a contiguous run of blocks from disk.  So
the only thing the GRUB boot block -- called "stage1" -- does is load
the rest of GRUB (called "stage2").  GRUB stage2 contains the
filesystem drivers, menu routines, kernel loader, and the like.
Sometimes there's also an intermediate "stage1.5" involved.
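
  (On a typical GRUB-legacy install you can see all these pieces
sitting in /boot/grub.  The exact list varies by distro, but it's
something like:

    $ ls /boot/grub
    device.map     menu.lst  reiserfs_stage1_5
    e2fs_stage1_5  stage1    stage2  ...

The *_stage1_5 files are the per-filesystem loaders I get to below.)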

  How does stage1 find the next stage?  Well, the GRUB install routine
finds a contiguous run on disk for the next stage.  It also modifies
the stage1 code in the MBR to "know" where the start of said location
is for that particular system.  When the stage1 code runs, it loads
that location and attempts to execute it.  If the location in the MBR
has become invalid since it was written, that's generally when you get
the "it just prints 'GRUB' and hangs" failure mode -- GRUB loads and
executes something that isn't a (complete, working) GRUB second stage.
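
  (The usual fix for that failure mode, by the way, is to re-run the
GRUB setup so the location baked into stage1 gets rewritten.  From
the GRUB shell, assuming /boot lives on the first partition of the
first disk:

    grub> root (hd0,0)
    grub> setup (hd0)
    grub> quit

Adjust the (hdX,Y) bits for your actual layout.)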

  How does GRUB store the second stage?

  GRUB can't use a filesystem for this, because filesystems generally
don't guarantee the stage2 file will be stored as a contiguous run of
blocks.  (Remember, that's all stage1 can handle.)  Plus, filesystems
tend to let you do things like modify the files in them.  A software
update or defrag might move the stage2 file.  (Remember how LILO would
fall apart if you breathed on the boot files?  This is why.)

  Sometimes, the GRUB installer can find a safe location that is big
enough for all of stage2, and writes a copy there for stage1 to load.
There is usually some unpartitioned space immediately after the MBR
which can be used for this purpose.  Some filesystems also let you
reserve space inside them for this kind of thing.
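
  (You can see that gap with fdisk.  On most disks partitioned with
the old DOS-style defaults, the first partition starts at sector 63,
leaving sectors 1-62 free for GRUB to embed things in:

    $ fdisk -lu /dev/sda
    ...
       Device Boot    Start       End    Blocks  Id  System
    /dev/sda1   *        63    208844    104391  83  Linux

Again, /dev/sda is just an example device name.)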

  If the GRUB installer can't find enough room for stage2, that's when
it uses a stage1.5 loader.  stage2 contains all of GRUB; it's fairly
big by boot loader standards.  Each stage1.5 image contains a single
filesystem driver (there is a stage1.5 image for each of several
filesystems).  So the installer can usually find a place to write a
copy of a stage1.5 loader.  (If not, it aborts the install with an
error message.  Hopefully you notice this.)

  In this scenario, the stage1 from the MBR loads stage1.5.  The
stage1.5 code uses the filesystem driver (rather than "raw" disk block
numbers) to find stage2.  That lets GRUB load a stage2 stored as a
file in a filesystem.

  Once stage2 is running, things get much easier.  stage2 is smart.
It knows about all the various filesystems, and can just load files
like an OS does.  It does this to load the menu.lst config file, and
then it uses the directives in that file -- or directives you type --
to load the kernel file, any initrd file, and any boot module files.
Once everything is loaded into memory, it executes them with the
"boot" command.

  That's when the kernel actually starts running.  The kernel does its
own init, including switching to protected mode.  The kernel pretty
much ignores the BIOS.  It has native drivers for the disk controller,
RAID, the filesystem, I/O controllers, etc., so it can find and mount
the root filesystem.  (If it turns out the kernel *doesn't* have
those, it panic()'s and dies.)

  Up until the kernel is started, the only way to do disk I/O is using
the BIOS.  Specifically, INT13 (software interrupt vector 0x13).  GRUB
is entirely dependent on the BIOS for all of its disk I/O.  The later
stages may have filesystem drivers, which let GRUB see the disk as
something more sophisticated than an array of blocks, but it still
needs the BIOS to read the blocks containing the filesystem.

> So, why doesn't grub just ignore the BIOS like linux?

  Because to do that, GRUB would have to contain disk controller
drivers for every type of hardware out there.  And it would have to
fit them all into the 446 bytes of code available in the MBR.
Besides, if you're going to depend on the BIOS to load the disk
controller drivers from elsewhere on the disk, why not just depend on
the BIOS to load the OS files (like it does now) and save yourself the
trouble of writing and maintaining a bunch of disk controller drivers?

>> To tell the truth, I'm still not sure why one thing worked, but who
>> am I to argue with success?
>
>  Yeah, at some point one feels like, "I know pushing these knobs will make
> it work, but why?"

  Heh, this was even better.  This was one of those, "It worked, and
I hadn't pushed the knobs yet.  How did that work?  That shouldn't
have worked."  :)

>  Gar!  It looks like EFI borrowed a lot from OpenFirmware too.   Hmm, I
> wonder if anybody besides Apple is making EFI x86 boards.

  I also wonder if EFI is really the silver bullet Intel thinks it
will be.  In theory, all this stuff should work with the "legacy
BIOS".  One of the biggest barriers is that there is an awful lot of
buggy BIOS/firmware code out there.  I don't see EFI fixing that.  I
guess the theory is that since EFI is less of a "steaming pile of
elegance" than the BIOS, it should be easier to get right.  But I've
seen that argument used before for other things, and people usually
still find a way to frell everything up.  "If you try and make it
idiot-proof, they'll build a better idiot."

>  Ah, yeah.  I think I've only done this with single-disk setups on Windows.
> Reading some more on this, Windows seems to lean more on BIOS than the
> unices do ...

  It depends on which "Windows" you're talking about.  Classic Windows
(Win 3.x, 9x, ME) booted and ran on top of DOS.  They supported
protected mode drivers (which made a *huge* performance difference),
but they could actually fall back on 16-bit, real mode, MS-DOS and
BIOS support for I/O if needed.

  Windows NT (2000/XP/...) works a lot more like GRUB.  The NTLDR file
is kind of like GRUB's stage2.  I'm not sure how the MBR boot block
gets around to loading it.  But NTLDR uses the BIOS for I/O, reads
BOOT.INI for config, and then loads several dozen different files to
get NT running.  That includes reading the registry for information on
which drivers to load.  Gulp.
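
  (BOOT.INI is a dead-simple text file by comparison.  A typical
single-OS one looks about like this:

    [boot loader]
    timeout=30
    default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
    [operating systems]
    multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows XP" /fastdetect

That ARC-style path is NT's way of saying "first disk, first
partition" -- its own flavor of GRUB's (hd0,0).)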

> [In Windows] it is possible, once booted, to then use your different sound,
> graphics, etc. drivers with hardware profiles.

  Right, but since Windows is all plug-and-play and everything,
doesn't that mean all we're really accomplishing is avoiding a bunch
of yellow-bang icons in Device Manager (for the hardware that suddenly
disappeared)?  I mean, yah, sure, it makes things cleaner, but it's
not exactly critical.  No?

> AFAIK, Linux (-based distros) doesn't offer such a facility.

  Well, that depends on what you're looking for.  There's no "Hardware
Profiles" GUI dialog box thingy (and $DEITY willing, there never will
be).  But if you attempt to load modules for all the hardware you
*might* have, the kernel will only keep the ones for hardware it
actually finds.  So if your NIC might be a 3Com or might be an Intel,
try to load both modules, and you'll end up with a single eth0 for
whichever you have.  (Or use the MAC address matching some distros
have to keep the interface name consistent.)
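
  In other words, something like this in your module-loading setup
(the module names here are just examples for those two vendors):

    # Load drivers for both NICs this box *might* have; only the one
    # whose hardware is actually present ends up driving eth0.
    modprobe 3c59x    # 3Com
    modprobe e1000    # Intel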

> I think the current xorg
> tree at least does autodetection reliably for video, or so I've heard.  This
> has the potential to shorten LUG meetings by 15 minutes or more. :)

  Careful.  Other problems will spontaneously generate to compensate.
At last week's SLUG meeting, the projector worked fine on the first
try.  So five minutes into the presentation, it went "whhiiiNE
whhiiiNE whhiiiNE POP!", and then the only light coming from it was
from the fault light on top.

>  I haven't counted, but does Linux support more than a hundred-ish types of
> disk controllers these days?  If not, I'd rather see them all available all
> the time ...

  Hmmmm.  Interesting idea.

  It isn't even a new one.  Back in the early days of Linux, before
the kernel was modularized (versions 1.x and prior), it was common for
distributions to ship with a "generic", "fat" kernel which tried to
build in support for as many different disk controllers as possible.
Once you got the system installed, you were expected to recompile your
own custom kernel with drivers for just your hardware.

  However, I recall reading about limits in the size of the kernel and
initrd image files in the past.  They may still exist.

>  Putting on my naive hat: GRUB can read ext2, so why can't it read a map of
> PCI ID's and drivers off of disk and do the right thing without initrd?

  I guess, in theory, it might be possible to do something like that,
but I believe it would require non-trivial changes to the kernel, or
some very elaborate changes to GRUB.

  When the kernel first starts running, the only drivers available are
those which were statically compiled into the main kernel image file.
This used to be the only way; you had to make sure you compiled in the
driver for the disk controller and filesystem for your root
filesystem.

  Then along comes initrd (initial RAM disk).  You build a generic
kernel which has a filesystem driver for the initrd.  When the kernel
starts, it finds the initrd already in RAM, mounts it, and runs the
/linuxrc script contained within.  That script loads modules needed
for the system it was built for -- disk controller, filesystem, RAID,
LVM, etc.
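
  A stripped-down linuxrc amounts to little more than this (module
names and paths are illustrative -- a real one is generated by your
distro's mkinitrd/initramfs tooling):

    #!/bin/sh
    # Load just enough to reach the root filesystem.
    insmod /lib/ahci.ko      # disk controller driver
    insmod /lib/raid1.ko     # software RAID, if the root fs is on it
    insmod /lib/ext3.ko      # root filesystem driver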

  To have GRUB load individual drivers for the kernel, there would
have to be some interface for loading kernel modules before the kernel
starts.  I imagine that's technically feasible, but I don't think
there's anything like it right now.

  Alternatively, you could have GRUB build an initrd image "on the
fly", but that would require *a lot* more sophistication than what
GRUB has now.  In particular, the filesystem drivers in GRUB are
read-only.  Write support needs much more complicated code.  Or so I'm
told.

>  If I'm understanding, that's roughly what I did with the LiveCD, but I
> think introducing a new disk into the mix (booted from) would affect grub's
> maps.

  It would.  See above (way above) about that situation.  :-)  First
you get the system installed on the temp disk.  Then you use that
running system to prepare proper GRUB map and menu files to re-install
GRUB on the "real" disks.
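
  From that running system, re-installing is usually just a matter of
something like this (again, the device name is an example):

    # Re-probe the disks, rewrite device.map, and put GRUB back on
    # the MBR of the real boot disk.
    grub-install --recheck /dev/sda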

-- Ben

