Hanging server

paul_cour at verizon.net
Wed Feb 22 13:48:01 EST 2006


Hi

Great Post.
Enjoyed reading it!

paulc

>From: Michael ODonnell <michael.odonnell at comcast.net>
>Date: Wed Feb 22 09:11:35 CST 2006
>To: gnhlug-discuss at mail.gnhlug.org
>Subject: Re: Hanging server

>
>
>Here's a summary that's both superficial and obvious, but
>at least it's based on a tortured analogy:
>
>A) If the patient is conscious and communicative, you can
>   sometimes get them to tell you where it hurts, or at least
>   to report as much as they can about how they feel and what
>   they know, and have them just generally cooperate in your
>   analysis as much as possible.
>
>B) If the patient is comatose/unresponsive it's sometimes
>   still possible to perform exploratory surgery/vivisection.
>
>C) If the patient is dead, an autopsy is your only option.
>
>D) If you can persuade the patient to ride the Wheel Of Life
>   and repeatedly relive the same experience during each
>   successive reincarnation (à la Groundhog Day) you may be
>   able to adjust their Karma from cycle to cycle until they
>   (or you) reach Enlightenment.
>
>Case A involves tools like ps, netstat, lsof, strace, gdb, top,
>crash, etc.  You can monitor the behavior of various processes
>with strace, rummage around in their /proc data, attach to them
>with gdb, stare at system logs, use crash to see who's blocked
>on what, etc.
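For the "rummage around in /proc" part, here is a minimal Python sketch (Linux only, and only a rough approximation of what crash shows) that lists tasks in uninterruptible sleep, state "D", together with the kernel function each one is waiting in as reported by /proc/<pid>/wchan. The helper names parse_stat and blocked_processes are made up for illustration; the stat field layout follows proc(5), and on some kernels wchan reads as 0 or is hidden from unprivileged users.

```python
import os


def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    comm sits inside parentheses and may itself contain spaces or ')',
    so slice between the first '(' and the *last* ')' instead of
    splitting the whole line on whitespace.
    """
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc="/proc"):
    """Yield (pid, comm, wchan) for tasks in uninterruptible sleep ('D')."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            try:
                # wchan names the kernel function the task is sleeping in;
                # it may be "0" or unreadable depending on kernel/permissions.
                with open(os.path.join(proc, entry, "wchan")) as f:
                    wchan = f.read().strip() or "?"
            except OSError:
                wchan = "?"
            yield pid, comm, wchan
        except (OSError, ValueError):
            # The process exited while we were looking, or we lack access.
            continue


if __name__ == "__main__":
    for pid, comm, wchan in blocked_processes():
        print(f"{pid:>7} {comm:<16} {wchan}")
```

Running it a few times while the server is "hung" shows whether the same tasks stay stuck in the same place, which is a useful hint about where to point strace, gdb or crash next.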
>
>Case B is difficult unless you had the foresight to rig up kGDB,
>and is a bit of a Black Art.  With kGDB you can set breakpoints
>in various code paths, see who's blocked on what, whether/which
>interrupts are being serviced, etc.  The only other means of
>gathering any info is (as Ben mentioned) probably the various
>SysRq key combos, though IIRC you also must have enabled them
>in advance.  BTW, neither of these works if the corresponding
>interrupts are being ignored.
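Since the SysRq keys only help if they were enabled beforehand, it is worth checking the setting before the box wedges. The sketch below reads /proc/sys/kernel/sysrq and decodes it, assuming a kernel whose sysrq documentation describes the bitmask form (0 = disabled, 1 = everything enabled, larger values select individual functions). The names SYSRQ_BITS, describe_sysrq and read_sysrq are hypothetical.

```python
# Bit values as listed in the kernel's sysrq documentation
# (assumes a kernel that supports the bitmask form of the setting).
SYSRQ_BITS = {
    2: "console logging level control",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def describe_sysrq(value):
    """Translate the numeric sysrq setting into a list of enabled functions."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    return [name for bit, name in sorted(SYSRQ_BITS.items()) if value & bit]


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current setting; raises OSError if SysRq isn't available."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        value = read_sysrq()
    except OSError as exc:
        print(f"cannot read sysrq setting: {exc}")
    else:
        print(value, "->", ", ".join(describe_sysrq(value)))
```

On such kernels root can also fire the same actions without a keyboard by writing a command character to /proc/sysrq-trigger (for example "t" to dump the task list), which helps over ssh as long as the machine still responds.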
>
>Case C is basically limited to crashdump analysis, a static
>variation of B.  Generating a dump can sometimes be a challenge.
>See the Linux Kernel Crash Dump (LKCD) project.
>
>Case D requires that you can reliably reproduce the symptoms so
>you can change conditions from cycle to cycle, observe if/how
>the failure mode changes, and then reason backwards from there.
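A minimal sketch of that Groundhog Day loop, assuming the hang can be triggered by some reproducer command and that the condition you vary between cycles can be passed in as environment variables. run_trials, the condition labels and HANG_MODE are hypothetical; a run that is still going after the timeout is killed and counted as a hang.

```python
import os
import subprocess
import sys
from collections import Counter


def run_trials(cmd, conditions, trials=5, timeout=10.0):
    """Run `cmd` `trials` times under each condition and tally outcomes.

    `conditions` maps a label to extra environment variables; this is a
    hypothetical stand-in for whatever knob you actually turn between
    cycles. A run still going after `timeout` seconds is killed by
    subprocess.run and counted as a "hang".
    """
    results = {}
    for label, extra_env in conditions.items():
        env = dict(os.environ, **extra_env)
        tally = Counter()
        for _ in range(trials):
            try:
                proc = subprocess.run(cmd, env=env, timeout=timeout,
                                      stdout=subprocess.DEVNULL,
                                      stderr=subprocess.DEVNULL)
                tally["ok" if proc.returncode == 0 else "fail"] += 1
            except subprocess.TimeoutExpired:
                tally["hang"] += 1
        results[label] = tally
    return results


if __name__ == "__main__":
    # Toy reproducer: "hangs" (sleeps) only when HANG_MODE=1.
    toy = [sys.executable, "-c",
           "import os, time\n"
           "if os.environ.get('HANG_MODE') == '1': time.sleep(60)"]
    outcome = run_trials(toy, {"baseline": {"HANG_MODE": "0"},
                               "suspect": {"HANG_MODE": "1"}},
                         trials=2, timeout=1.0)
    for label, tally in outcome.items():
        print(label, dict(tally))
```

Counting outcomes over several runs per condition, rather than trusting a single run, matters when the hang is intermittent: a condition only looks "fixed" once it stays clean across repeated cycles.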
> 
>_______________________________________________
>gnhlug-discuss mailing list
>gnhlug-discuss at mail.gnhlug.org
>http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
