Path: utzoo!utgpu!water!watmath!clyde!cbosgd!ncr-sd!matt
From: matt@ncr-sd.SanDiego.NCR.COM (Matt Costello)
Newsgroups: comp.unix.wizards
Subject: Re: Wait, Select, and a SIGCHLD Race Condition
Message-ID: <1944@ncr-sd.SanDiego.NCR.COM>
Date: 12 Dec 87 05:53:50 GMT
References: <5105@sol.ARPA>
Reply-To: matt@ncr-sd.SanDiego.NCR.COM (Matt Costello)
Organization: NCR Corporation, Rancho Bernardo
Lines: 39

In article <5105@sol.ARPA> stuart@cs.rochester.edu writes:
>I need advice (or sympathy) for handling a race condition in 4.3BSD
>flavored UNIX.  Briefly, I want to use wait3 to reap all the dead or
>stopped children of a process, then use select to wait for the first
>new IO or child activity.

I use two methods to get around race conditions in signals.
They are:

1.  If you are not using SIGALRM for something else, have your timeout
    routine re-arm SIGALRM at 1-second intervals until the outer-level
    code turns it off.  If the original signal hits the timing hole,
    then the second (or third) won't.
    The beauty of this is that it is usable in any version of UNIX, since
    it uses no features specific to BSD or USG.

    To avoid missing any child deaths reported by SIGCHLD:

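	void onedied(int sig)	/* handles both SIGCHLD and SIGALRM */
	{
		signal(SIGCHLD, SIG_DFL); /* will infinite loop otherwise */
		signal(SIGALRM, onedied); /* keep interrupting once a second... */
		alarm(1);		  /* ...until the outer code calls alarm(0) */
	}

		signal(SIGCHLD, onedied);
		/* race condition is here... */
		numfds = select();	  /* args omitted; or read(), or msgrcv() */
		alarm(0);		  /* back in the outer code: stop the alarms */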
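
    A fuller sketch of the fragment above, for illustration only.  The
    names arm_child_watch(), wait_for_activity() and child_seen are made
    up here, and select() is called with no descriptors just to have
    something to block in; a real program would pass its own fd sets.

```c
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag: set once the handler has run at least once. */
static volatile sig_atomic_t child_seen;

/* Runs for the first SIGCHLD, then again for every SIGALRM it schedules. */
static void onedied(int sig)
{
    (void)sig;
    child_seen = 1;
    signal(SIGCHLD, SIG_DFL);   /* stop catching SIGCHLD from here on */
    signal(SIGALRM, onedied);   /* re-arm: interrupt again in one second */
    alarm(1);
}

/* Call before starting children whose death must not be missed. */
void arm_child_watch(void)
{
    child_seen = 0;
    signal(SIGCHLD, onedied);
}

/* Blocks in select(); returns 1 if a signal from onedied() interrupted it. */
int wait_for_activity(void)
{
    int n, interrupted;

    /* The race window is here: a SIGCHLD arriving now does not interrupt
     * select(), but the SIGALRM it scheduled will, one second later. */
    n = select(0, NULL, NULL, NULL, NULL);
    interrupted = (n < 0 && errno == EINTR);
    alarm(0);                   /* outer level: turn the repeating alarm off */
    return interrupted && child_seen;
}
```

    Whether the SIGCHLD lands before or during the select(), the call
    comes back with EINTR within about a second.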


2.  For select() or any operation where the process is waiting on incoming
    IO, you can have the signal routine send a dummy message that will
    cause the select() to return immediately.  Rather than aborting the
    operation, find some way to make it terminate normally.  This works
    wonderfully for SYSV message queues, since it is perfectly legal to
    send a zero-length message.
-- 
Matt Costello	
+1 619 485 2926	
		{sdcsvax,cbosgd,pyramid,nosc.ARPA}!ncr-sd!matt