Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!ICAEN.UIOWA.EDU!dbfunk
From: dbfunk@ICAEN.UIOWA.EDU (David B. Funk)
Newsgroups: comp.sys.apollo
Subject: Hanging CSH
Message-ID: <8808190051.AA27050@umaxc.weeg.uiowa.edu>
Date: 19 Aug 88 00:47:29 GMT
Sender: daemon@ucbvax.BERKELEY.EDU
Organization: The Internet
Lines: 30

> Since we went to SR9.7, we've had a recurring problem, but one that we
> have not been able to repeat on demand.  Occasionally, one types a Unix
> command and either gets a "segmentation violation" or the Cshell
> abruptly terminates.  Then the Cshell and all other Cshells active
> hang.  New CSH's also hang, as do new BSH's.  The Aegis shell still
> runs, and can be used to run at least some unix commands.  A "ps ax"
> so derived indicates that the CSH's are in state "S".

    This sounds like an "/etc/passwd.map" problem.
Many Domain/IX utilities, including the shells, need to be able to
read the /etc/passwd.map file.
This file provides a mapping between Aegis UIDs (PPO files) and
Unix user IDs. If this file is unavailable (due to a node crash
or network problem) or is out of sync (not updated with /etc/crpasswd),
it can cause the problems described above.
When started, a Unix shell (bsh, csh) places a read lock on the
passwd.map file. If the node with the real "/etc" directory crashes,
the lock may be lost. If the passwd files (/etc/groups, /etc/passwd,
& /etc/passwd.map) are then updated with crpasswd, the stream
to the old passwd.map file may be lost. This can happen easily
with users who "never" log out (i.e., log in on Monday and leave
the same shells active all week).
    To verify that this is your problem, do a "ps agu" in an
Aegis shell and see if the user names show up in column 1.
The ps listing will have blank spaces for the user names if ps
can't "see" /etc/passwd.
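
    If you want to check a saved ps listing mechanically rather than
by eye, something like the following rough sketch would do it. It is
only an illustration: the function name and the sample listing are
made up, and it assumes the listing starts with one header line and
that a missing user name shows up as leading blanks on the line. The
real "ps agu" layout on your node may differ, so adjust to taste.

```python
def lines_missing_user(ps_output):
    """Return the process lines whose first column (user name) is blank.

    Assumes (hypothetically) that the first line is a header and that a
    missing user name leaves whitespace at the start of the line.
    """
    missing = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue  # ignore completely empty lines
        if line[0].isspace():
            missing.append(line)  # column 1 is blank: no user name shown
    return missing


# Invented sample listing, for illustration only.
SAMPLE = (
    "USER       PID COMMAND\n"
    "dbfunk     101 csh\n"
    "           102 csh\n"
)
```

Calling lines_missing_user(SAMPLE) returns only the line for PID 102,
the one with no user name in column 1. Any such lines in a real listing
point at the problem described above.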
    It is possible to have multiple copies of "/etc" on a
system to increase availability, but this adds
sys_admin overhead. As the links pointing to it are static,
this will still not help active shells when a copy goes away.
Again, this is fixed at SR10.