Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!husc6!uwvax!oddjob!gargoyle!ihnp4!cbosgd!osu-cis!tut!lvc
From: lvc@tut.cis.ohio-state.edu (Lawrence V. Cipriani)
Newsgroups: comp.unix.xenix,comp.unix.questions
Subject: Re: Need help with SCO: the process that would not die.
Message-ID: <2345@tut.cis.ohio-state.edu>
Date: Fri, 27-Nov-87 17:01:52 EST
Article-I.D.: tut.2345
Posted: Fri Nov 27 17:01:52 1987
Date-Received: Mon, 30-Nov-87 00:42:22 EST
References: <116@citcom.UUCP> <911@csun.UUCP>
Organization: Ohio State Computer & Info Science
Lines: 30
Keywords: ps kill process SCO xenix
Summary: thats the attnix way
Xref: mnetor comp.unix.xenix:1225 comp.unix.questions:5127

In article <911@csun.UUCP>, abcscnge@csun.UUCP (Scott Neugroschl) writes:
> 
> I realize this isn't a Xenix question (from me), but we have a similar
> problem with our Zilog S8000 running ZEUS 3.2 (Zilog's version of SYS III)
> at work (not CSUN).  It appears to be related to signal processing.   Our
> in-house guru tells us that the process is "locked on I/O", implying that
> the signal really screwed up the kernel data.  Recommend you look at the
> signal handling logic if possible, and ask the people causing the lockup
> if they have done an interrupt (ctrl-c or DEL) just before it locked...
> 
> Any wizards out there know of such bugs in either kernel (xenix or zilog)?
>
> Scott "The Pseudo-Hacker" Neugroschl

It's not a bug.  This is the way UNIX and all derivatives (that I know
of) are designed.  Whether this is a good design is another question.
If the operating system is performing certain I/O operations on behalf of
your program (e.g. a close), and the operation does not complete (for whatever
reason - usually a hardware problem), your program won't die, and can't be
killed with a signal, not even SIGKILL.  The process is asleep inside the
kernel at a priority that signals cannot interrupt, and a signal is only
acted on once the process wakes up.  You might adb the OS and fiddle some
bits, but I don't recommend it.  A reboot is the only sure way to make it
go away, though other tricks sometimes work depending on the circumstances.
The wchan (the WCHAN column of ps -l) is an address that can be used to
identify the offending hardware: a tty structure, a tape, or a network
device, for example.  A local guru should be able to tell you what device
corresponds to the address.  If he or she can't, they aren't much of a guru.
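
For anyone who wants to spot such processes mechanically, here is a rough
sketch, not a definitive tool.  It assumes ps output in four columns (PID,
state, WCHAN, command), roughly what a procps-style "ps -eo pid,stat,wchan,comm"
prints, and it assumes a state beginning with "D" means an uninterruptible
wait.  Older System V and Xenix ps -l use different columns and state
letters, so adjust to taste.  The function name, the sample listing, and the
wchan name "ttyout" in it are all made up for illustration.

```python
def find_uninterruptible(ps_text):
    """Return (pid, wchan, command) for each process whose state starts with D.

    Assumes four whitespace-separated columns: PID, state, WCHAN, command,
    with a single header line on top (an assumption about the ps format).
    """
    stuck = []
    lines = ps_text.strip().splitlines()
    for line in lines[1:]:              # skip the header line
        fields = line.split(None, 3)    # command may contain spaces
        if len(fields) < 4:
            continue                    # ignore short or malformed lines
        pid, state, wchan, command = fields
        if state.startswith("D"):       # assumed marker for uninterruptible wait
            stuck.append((int(pid), wchan, command))
    return stuck


# Hypothetical listing, invented for this example.
SAMPLE = """\
  PID STAT WCHAN  COMMAND
    1 Ss   -      init
  412 D    ttyout cat
  530 S    wait   sh
"""

if __name__ == "__main__":
    for pid, wchan, command in find_uninterruptible(SAMPLE):
        print(pid, wchan, command)
```

The wchan it reports is only a starting point.  Mapping it back to a device
still takes someone who knows the kernel on that machine.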

This is one area where UNIX is particularly weak.  Hardware failures
ought to be handled more robustly, most certainly when they involve
noncritical devices.  I don't see any hope soon for a better strategy.