Server issues baffling me...
Neil Joseph Schelly
neil at jenandneil.com
Wed Oct 22 19:18:06 EDT 2008
I'm hoping someone can say they recognize this and that if I press 'Alt-p' or
something, it will all go away. I am not that optimistic, but I figured it's
worth a try.
I have a server that I can turn on in our office, plugged into the wall, and
it will work fine for days, weeks, whatever. I have never had a problem
there. When I bring it to the datacenter, it won't finish booting before it
starts to get hda DMA timeouts. These are the errors I typically associate with a
failed drive. Without fail, the machine does it every time it's booted up in
the datacenter. And without exception, it works fine in the office.
To replicate the error in the office, I've tried switching the IDE cables,
running badblocks or other disk-thrashing sorts of programs like dd
if=/dev/hda of=/dev/null many times. It's run for over a week without any
issue. I've tried it with the network interface at full and half duplex. I
tried running the machine in a closed room that probably got up to about
75-80 degrees or so in temperature.
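
A minimal sketch of the kind of repeated full-disk read pass described above, roughly what looping dd if=/dev/hda of=/dev/null does. The function names read_pass and soak, the 1 MiB chunk size, and the default of three passes are illustrative assumptions, not anything from the original test runs. It only reports I/O errors surfaced to the reading process; kernel DMA timeout messages would still need to be checked in the system log.

```python
def read_pass(path, chunk_size=1024 * 1024):
    """Read `path` sequentially from start to end once.

    Returns (bytes_read, error), where error is None on a clean pass
    or the OSError that interrupted the read.
    """
    total = 0
    try:
        # buffering=0 gives raw, unbuffered reads of the device or file.
        with open(path, "rb", buffering=0) as dev:
            while True:
                chunk = dev.read(chunk_size)
                if not chunk:  # empty read means end of device/file
                    break
                total += len(chunk)
    except OSError as exc:
        return total, exc
    return total, None


def soak(path, passes=3):
    """Run up to `passes` read passes, stopping at the first I/O error.

    Returns a list of (pass_number, bytes_read, error) tuples.
    """
    results = []
    for n in range(1, passes + 1):
        total, err = read_pass(path)
        results.append((n, total, err))
        if err is not None:
            break
    return results
```

Run as root against the raw device, for example soak("/dev/hda", passes=10), and inspect the returned list: a non-None error on any pass gives the byte offset reached before the read failed.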
To prevent the error in the datacenter, I've tried booting it with different
kernels. I've disconnected the network cable so that it's just power and a
serial console. I also did just power and a monitor/keyboard. No matter
what I try, it never gets to finish the booting process, not even to
single-user mode, before the timeouts start filling the screen.
Has anyone seen any behavior like this? At this point, I don't even know
where to look. I can't imagine that there's actually an element of our
office that provides a better environment for machines and the office power
surely can't be any better than what's at the datacenter either. No other
machines are exhibiting this behavior. The server in question had been
running fine in the datacenter for months until this apparent disk failure
occurred. I replaced the disk and it worked for another month. I replaced
that disk under warranty and the new one never booted up right. I don't
believe I've actually got 3 hard drive failures in a month's time, but I
don't know what else to look at.
Help...
-N