Help! Is this kernel or hardware problem? (fwd)

Stephen Ryan stephen.p.ryan at dartmouth.edu
Thu Nov 24 21:39:01 EST 2005


On Wed, 2005-11-23 at 22:31 -0500, Steven W. Orr wrote:
> My system has been behaving weirdly. (What the hell does that mean?)
> It has rebooted a couple of times and just now I had a ~20 second lockup.
> 
> I just grabbed syslog for the event but I don't know what it means. Any help? 
> Please?

...

> Nov 23 22:24:53 saturn kernel: (scsi0:A:0:0): Device is disconnected, re-queuing SCB
> Nov 23 22:24:53 saturn kernel: Recovery code sleeping
> Nov 23 22:24:53 saturn kernel: (scsi0:A:0:0): Abort Tag Message Sent
> Nov 23 22:24:53 saturn kernel: Recovery code awake
> Nov 23 22:24:53 saturn kernel: Timer Expired
> Nov 23 22:24:53 saturn kernel: aic7xxx_abort returns 0x2003
> Nov 23 22:24:53 saturn kernel: scsi0:0:0:0: Attempting to queue a TARGET RESET message
> Nov 23 22:24:53 saturn kernel: CDB: 0x2a 0x0 0x3 0x6 0xcc 0x48 0x0 0x4 0x0 0x0
> Nov 23 22:24:53 saturn kernel: aic7xxx_dev_reset returns 0x2003
> Nov 23 22:24:53 saturn kernel: Recovery SCB completes

I've seen a whole bunch of those; they increased in frequency (along
with those nasty 20-second hangs) until I had six hard drives fail in
the space of two weeks.  After replacing all those drives, I no longer
get those messages or the 20-second hangs :-)  I had it fairly easy;
those machines all tend to have uptimes measured in months, so when
I suddenly started getting this kind of failure message in the logs, it
was pretty obviously not a bad kernel or unsupported hardware.  The
drives themselves were dying.
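
If you want to see whether the messages are getting more frequent on
your box, something like the following rough Python sketch would count
them per day.  It assumes the classic syslog timestamp format shown in
the excerpt above and treats each "Recovery code sleeping" line as one
recovery event; the function names and the log path are my own
placeholders, not anything standard.

```python
import re
from collections import Counter

# Marker taken from the log excerpt above; assumed to appear once per recovery.
MARKER = "Recovery code sleeping"

# Classic syslog prefix, e.g. "Nov 23 22:24:53 saturn kernel: ..."
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\s")


def count_recoveries_per_day(lines):
    """Return a Counter mapping (month, day) to the number of recovery events."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = STAMP.match(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def report(path):
    """Print one line per day, in the order the days appear in the log."""
    # The path is whatever your syslog writes to; it varies by distribution.
    with open(path, errors="replace") as log:
        for (month, day), n in count_recoveries_per_day(log).items():
            print(f"{month} {day:2d}: {n}")
```

Calling report() on your system log (for example /var/log/messages, if
that is where your distribution puts kernel messages) gives a quick
day-by-day tally; a steadily climbing count is the pattern I saw before
the drives died.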

Try the SMART monitoring tools from
http://smartmontools.sourceforge.net/ to see if your drives are still
happy.
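
For what it's worth, here is a minimal Python sketch of a health check
built on smartctl from that package.  It only uses the -H (overall
health) option; the helper names are made up, and the two strings it
looks for are what I'd expect smartctl to print for SCSI and ATA drives
respectively, so treat the parsing as an assumption that may need
adjusting for your version.

```python
import subprocess


def smart_health(device):
    """Run `smartctl -H` on a device and return its text output."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes to flag problems
    )
    return result.stdout


def looks_healthy(report):
    """Return True/False for a health verdict, or None if none was found."""
    for line in report.splitlines():
        # Assumed wording for SCSI drives.
        if "SMART Health Status:" in line:
            return line.split(":", 1)[1].strip() == "OK"
        # Assumed wording for ATA drives.
        if "overall-health self-assessment test result:" in line:
            return line.split(":", 1)[1].strip() == "PASSED"
    return None  # no verdict (SMART unsupported, or insufficient privileges)
```

Something like looks_healthy(smart_health("/dev/sda")) would be the
usual call, run as root; /dev/sda is just an example device name.  A
passing verdict doesn't prove the drive is fine, so it's worth reading
the full smartctl output as well.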



