Hanging server
paul_cour at verizon.net
Wed Feb 22 13:48:01 EST 2006
>From: Michael ODonnell <michael.odonnell at comcast.net>
>Date: Wed Feb 22 09:11:35 CST 2006
>To: gnhlug-discuss at mail.gnhlug.org
Hi
Great Post.
Enjoyed reading it!
paulc
>Subject: Re: Hanging server
>
>
>Here's a summary that's both superficial and obvious, but
>at least it's based on a tortured analogy:
>
>A) If the patient is conscious and communicative, you can
> sometimes get them to tell you where it hurts, or at least
> to report as much as they can about how they feel and what
> they know, and have them just generally cooperate in your
> analysis as much as possible.
>
>B) If the patient is comatose/unresponsive it's sometimes
> still possible to perform exploratory surgery/vivisection.
>
>C) If the patient is dead, an autopsy is your only option.
>
>D) If you can persuade the patient to ride the Wheel Of Life
> and repeatedly relive the same experience during each
> successive reincarnation (a la Groundhog Day) you may be
> able to adjust their Karma from cycle to cycle until they
> (or you) reach Enlightenment.
>
>Case A involves tools like ps, netstat, lsof, strace, gdb, top,
>crash, etc. You can monitor the behavior of various processes
>with strace, rummage around in their /proc data, attach to them
>with gdb, stare at system logs, use crash to see who's blocked
>on what, etc.
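
A minimal Python sketch of the "rummage around in their /proc data" step from Case A, for boxes that are still responsive. It lists processes in uninterruptible sleep (state D) together with the kernel symbol they are waiting in. The function names parse_stat and blocked_processes are made up for illustration; the only things assumed about the system are the /proc/<pid>/stat and /proc/<pid>/wchan files, and wchan may read as 0 without sufficient privileges.

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def blocked_processes(proc="/proc"):
    """Yield (pid, comm, wchan) for processes in uninterruptible sleep (D)."""
    try:
        entries = os.listdir(proc)
    except OSError:
        return  # no procfs here (not Linux, or not mounted)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
            if state != "D":
                continue
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited mid-scan, or the file was unreadable
        yield pid, comm, wchan


if __name__ == "__main__":
    for pid, comm, wchan in blocked_processes():
        print(pid, comm, wchan)
```

Several processes stuck in the same wchan are usually a better lead than any single one, and strace or gdb can then be pointed at the interesting PIDs.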
>
>Case B is difficult unless you had the foresight to rig up kGDB,
>and is a bit of a Black Art. With kGDB you can set breakpoints
>in various code paths, see who's blocked on what, whether/which
>interrupts are being serviced, etc. The only other means of
>gathering any info is (as Ben mentioned) probably the various
>SysRq key combos, though IIRC you also must have enabled them
>in advance. BTW, neither of these works if the corresponding
>interrupts are being ignored.
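
On the "must have enabled them in advance" point, a small Python sketch for checking what the magic SysRq setting permits before trouble strikes. It assumes the value lives in /proc/sys/kernel/sysrq and that it follows the bitmask layout described in current kernel documentation (0 = off, 1 = everything, otherwise a sum of the bits below); older kernels such as the ones discussed here may treat it as a plain on/off switch. The helper name enabled_sysrq_functions is hypothetical.

```python
# Bit meanings as given in current kernel documentation; treat them as an
# assumption on older kernels.
SYSRQ_BITS = {
    2: "console log level control",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all real-time tasks",
}


def enabled_sysrq_functions(value):
    """Return the SysRq function groups allowed by a kernel.sysrq value."""
    if value == 0:
        return []  # SysRq disabled entirely
    if value == 1:
        return list(SYSRQ_BITS.values())  # everything enabled
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/sysrq") as f:
            current = int(f.read().strip())
    except (OSError, ValueError):
        print("could not read /proc/sys/kernel/sysrq")
    else:
        print(current, enabled_sysrq_functions(current))
```

If "debugging dumps of processes" is missing from the list, the task-dump key combos will not help on a hung box, so that is the one worth checking ahead of time.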
>
>Case C is basically limited to crashdump analysis, a static
>variation of B. Generating a dump can sometimes be a challenge.
>See the Linux Kernel Crash Dump (LKCD) project.
>
>Case D requires that you can reliably reproduce the symptoms so
>you can change conditions from cycle to cycle, observing if/how
>the failure mode changes and then reason backwards from there.
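
Case D can be sketched as a small harness that reruns a reproducer while varying one condition per cycle and records how each run ends. This is only an illustration in Python: run_trial and sweep are made-up names, and the reproducer command is whatever triggers the symptom on your system (the demo below uses a sleeping Python child as a stand-in). A run counts as "hung" when it exceeds the timeout.

```python
import subprocess
import sys


def run_trial(cmd, timeout):
    """Run cmd once; classify the outcome as 'ok', 'failed' or 'hung'."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "hung"  # subprocess.run has already killed the child
    return "ok" if result.returncode == 0 else "failed"


def sweep(make_cmd, values, timeout, repeats=3):
    """Vary one condition across values, repeating each to spot flakiness."""
    return {
        value: [run_trial(make_cmd(value), timeout) for _ in range(repeats)]
        for value in values
    }


if __name__ == "__main__":
    # Stand-in reproducer: a child that sleeps for the given number of seconds.
    def make_cmd(seconds):
        return [sys.executable, "-c", f"import time; time.sleep({seconds})"]

    print(sweep(make_cmd, [0.1, 2], timeout=1, repeats=1))
```

Changing exactly one variable per sweep keeps the "reason backwards" step tractable; if two things change at once, a shift in the failure mode cannot be pinned on either.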
>
>_______________________________________________
>gnhlug-discuss mailing list
>gnhlug-discuss at mail.gnhlug.org
>http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss