Unkillable processes?
Paul Lussier
p.lussier at comcast.net
Sat Feb 18 21:27:01 EST 2006
Ben Scott <dragonhawk at gmail.com> writes:
> On 2/17/06, Dan Coutu <coutu at snowy-owl.com> wrote:
>> On a Red Hat 9 system I've encountered a situation where there are two
>> processes that I cannot kill when using kill -9 (or any other value, for
>> that matter.)
>
> Do a "ps aux" and note their status. It's "D", right? That means
> they're in "uninterruptible sleep" -- waiting for a system call to
> finish something that cannot be interrupted. The "D" stood for
> "driver" or "disk" originally. Bad hardware or buggy device drivers
> are the most common cause of a process stuck in this state.
Or, an NFS server from which this system mounted a file system has
gone off the net. I've seen this quite often. You've got some system
which exports a file system via NFS, it goes down, and on all clients
which NFS mount from this server, any process which stats all mounted
file systems (think df, ls, etc.) hangs and can't be killed.
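If the process list is long, the "D" check described above can be scripted. What follows is only a minimal sketch, assuming a procps-style ps that accepts "-eo pid,stat,comm"; the function names parse_dstate and list_dstate are hypothetical, picked for illustration.

```python
import subprocess


def parse_dstate(ps_output):
    """Return (pid, stat, command) for every row whose STAT starts with 'D'.

    Expects text shaped like the output of `ps -eo pid,stat,comm`:
    one header line, then one process per line.
    """
    stuck = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)          # pid, stat, rest-of-line
        if len(parts) < 3:
            continue                         # blank or malformed row
        pid, stat, comm = parts
        if stat.startswith("D"):             # D = uninterruptible sleep
            stuck.append((int(pid), stat, comm.strip()))
    return stuck


def list_dstate():
    """Run ps (assumed procps-style) and report processes in D state."""
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_dstate(out)
```

Calling list_dstate() on the wedged client should return the processes you could not kill. A process can pass through "D" briefly during normal I/O, so only entries that stay there across repeated runs are worth worrying about.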
> The only thing you can do is wait or reboot the system.
This isn't entirely true, *especially* if the cause is as I just
described. If it is in fact a down NFS server from which the client
didn't properly unmount the file system before it went away, there
is a cure. That cure is to bring the NFS server back online.
However, it may be that you can't bring *that* NFS server back online
for some reason, at least not any time soon. In that case, find some
other system, configure it as an NFS server, and export *something* as
the same name as the file system which is wedged. Once that's done,
configure the NIC to use the exact same IP as the NFS server which
suddenly disappeared. This won't fool NFS entirely, but just enough
that the client will get a 'stale NFS file handle' error, which
lets you umount the file system.
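To set up the stand-in you need to know which server and which export name the client thinks it has mounted. Here is a minimal sketch for pulling that out of the client's mount table, assuming the usual Linux /proc/mounts layout (device, mount point, type, options, ...); the function names are hypothetical. It reads the kernel's mount table rather than stat-ing the mount points, so it should not add one more hung process to the pile.

```python
def nfs_mounts(mounts_text):
    """Return (server, export_path, mount_point) for each NFS entry.

    mounts_text is text in /proc/mounts format, one mount per line.
    Mount points containing spaces appear with octal escapes (\\040)
    and are returned as-is; this sketch does not decode them.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[2] not in ("nfs", "nfs4"):
            continue                      # not an NFS mount
        # The device field looks like "server:/export/path". Splitting
        # on ":/" keeps bracketed IPv6 server addresses in one piece.
        server, sep, rest = fields[0].partition(":/")
        if not sep:
            continue                      # unexpected device format
        found.append((server, "/" + rest, fields[1]))
    return found


def read_nfs_mounts(path="/proc/mounts"):
    """Read the mount table from the given path and list its NFS mounts."""
    with open(path) as fh:
        return nfs_mounts(fh.read())
```

Each tuple gives the server to impersonate, the path to export from the stand-in, and the mount point to umount once the client has seen the stale file handle.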
I recently used this trick in exactly this way. If this isn't enough
detail, let me know, and I'll send out the postmortem I mailed to my
dev group (which isn't all that much more detailed than this e-mail :)
> If the syscalls ever complete, the kernel will immediately process
> the kill signals you sent, so those processes are dead, they just
> don't know it yet. :)
And that's what this NFS spoofing trick does. It essentially allows
the processes enough room to complete and die :)
--
Seeya,
Paul
More information about the gnhlug-discuss mailing list