Server issues baffling me...

Neil Joseph Schelly neil at jenandneil.com
Wed Oct 22 19:18:06 EDT 2008


I'm hoping someone can say they recognize this and that if I press 'Alt-p' or 
something, it will all go away.  I am not that optimistic, but I figured it's 
worth a try.

I have a server that I can turn on in our office, plugged into the wall, and 
it will work fine for days, weeks, whatever.  I have never had a problem 
there.  When I bring it to the datacenter, it starts throwing hda DMA 
timeouts before it finishes booting.  These are the errors I typically 
associate with a failed drive.  The machine does this every time it's booted 
in the datacenter, and without exception it works fine in the office.

To replicate the error in the office, I've tried switching the IDE cables and 
running badblocks and other disk-thrashing programs, such as dd 
if=/dev/hda of=/dev/null, many times over.  It has run for over a week without any 
issue.  I've tried it with the network interface at full and half duplex.  I 
tried running the machine in a closed room that probably got up to about 
75-80 degrees or so in temperature.
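
For anyone who wants to repeat that kind of read test, here is a minimal 
sketch of it, not a record of the exact commands that were run.  The function 
names, the 64k block size, and the grep pattern are all assumptions made for 
illustration; the wording of DMA timeout messages varies between kernel 
versions, so the pattern may need adjusting.

```sh
#!/bin/sh
# Hypothetical helper: read an entire device (or file) a given number of
# times, discarding the data.  Returns non-zero as soon as one pass fails.
read_passes() {
    dev=$1
    passes=$2
    i=0
    while [ "$i" -lt "$passes" ]; do
        dd if="$dev" of=/dev/null bs=64k 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# Hypothetical helper: count the lines in a saved kernel log that look like
# DMA timeouts.  The pattern is an assumption, not an exhaustive match.
count_dma_timeouts() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    grep -ci 'dma[ _]time' "$1" || true
}
```

One possible way to use it: run read_passes /dev/hda 10, save the kernel log 
with dmesg > /tmp/kern.txt, then run count_dma_timeouts /tmp/kern.txt and 
compare the count between the office and the datacenter.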

To prevent the error in the datacenter, I've tried booting it with different 
kernels. I've disconnected the network cable so that it's just power and a 
serial console.  I also did just power and a monitor/keyboard.  No matter 
what I try, it never gets to finish the booting process, not even to 
single-user mode, before the timeouts start filling the screen.

Has anyone seen any behavior like this?  At this point, I don't even know 
where to look.  I can't imagine that there's actually an element of our 
office that provides a better environment for machines and the office power 
surely can't be any better than what's at the datacenter either.  No other 
machines are exhibiting this behavior.  The server in question had been 
running fine in the datacenter for months until this apparent disk failure 
occurred.  I replaced the disk and it worked for another month.  I replaced 
that disk under warranty and the new one never booted up right.  I don't 
believe I've actually got 3 hard drive failures in a month's time, but I 
don't know what else to look at.

Help...
-N


More information about the gnhlug-discuss mailing list