Path: utzoo!utgpu!water!watmath!clyde!rutgers!rochester!daemon
From: stuart@cs.rochester.edu
Newsgroups: comp.unix.wizards
Subject: Wait, Select, and a SIGCHLD Race Condition
Message-ID: <5105@sol.ARPA>
Date: 11 Dec 87 05:43:40 GMT
Sender: daemon@cs.rochester.edu
Lines: 53

I need advice (or sympathy) for handling a race condition in 4.3BSD-flavored UNIX. Briefly, I want to use wait3 to reap all the dead or stopped children of a process, then use select to wait for the first new I/O or child activity. The sketch is something like this:

    while (0 < (pid = wait3(..., WNOHANG, ...))) {
        /* do something with child */
    }
    /* XXX Race condition is here */
    numfds = select(...);
    if (numfds < 0) {
        if (errno == EINTR) {
            /* caught a signal, what kind was it, etc */
        }
    }

There is a race condition between reaping children and starting the select. A child can change status and its SIGCHLD can be delivered *before* I enter select; I don't notice it, enter select, and hang forever. Even if I have a handler for SIGCHLD that sets a flag and I check that flag immediately before calling select, there is still a (small) window of vulnerability between the check and the start of the select.

Ideally, I would like to set the signal mask to block SIGCHLD and have select release the signal *after* starting to wait. That would ensure that *all* dead children are noticed. However, as far as I can tell, select does not release any signals. Berkeley truly improved the signal handling features going to 4.3, but the (improved) features don't seem to let me write this code safely. (In particular, the sigblock, signal, sigpause, signal, sigsetmask idiom is of no help here, since sigpause can wait only for a signal, not for I/O.)

I would appreciate advice on how to safely avoid this race condition given 4.3BSD features. I suspect that it's not possible, but would be delighted to learn otherwise (see below for an equivocation on "not possible").
It's not essential that the skeleton code look like that given above; all that's needed is that I/O and child activity are processed as soon as *either* is available. Neither kind of activity is guaranteed to happen, and some events may already have happened and must not be ignored.

There *is* a kludge that I can fall back on, but I would really like to avoid it: put a maximum on the timeout given to select, and check for more children when select times out. Even if I miss a SIGCHLD, I would still reap the child. This is doable, but a pain, because I am managing timer requests in addition to I/O and child requests in the same package; keeping the real timeouts straight from the kludge timeouts (which might coincide!) is really ugly. The whole point of this package is to multiplex lots of requests and AVOID POLLING. The kludge is, of course, nothing but polling.

Stu Friedberg
{ames,cmcl2,rutgers}!rochester!stuart
stuart@cs.rochester.edu
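The capped-timeout kludge described above might look roughly like the following minimal sketch, assuming a POSIX system: waitpid() stands in for wait3(), clock_gettime() is used for the deadline, and MAX_POLL_USEC, enum wake and wait_io_or_child() are hypothetical names. Keeping an absolute deadline is one way to keep the caller's real timeout apart from the short kludge wakeups, but the loop is still polling.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical cap on any single select() sleep: the "maximum timeout". */
#define MAX_POLL_USEC 200000L

enum wake { WAKE_IO, WAKE_CHILD, WAKE_TIMEOUT, WAKE_ERROR };

/* Collect every child that has exited or stopped, without blocking. */
static int reap_children(void)
{
    int status, n = 0;

    while (waitpid(-1, &status, WNOHANG | WUNTRACED) > 0)
        n++;
    return n;
}

/* Wait up to timeout_usec for input on *readfds or for child activity.
   WAKE_TIMEOUT is returned only for the caller's real timeout; the short
   kludge timeouts are absorbed inside the loop. */
static enum wake wait_io_or_child(int nfds, fd_set *readfds, long timeout_usec,
                                  int *reaped)
{
    struct timespec now, deadline;
    fd_set want = *readfds;          /* select() overwrites its argument */

    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += timeout_usec / 1000000L;
    deadline.tv_nsec += (timeout_usec % 1000000L) * 1000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    for (;;) {
        struct timeval tv;
        long left;
        int n;

        /* Poll for children on every pass, whether or not SIGCHLD was seen. */
        if ((*reaped = reap_children()) > 0)
            return WAKE_CHILD;

        clock_gettime(CLOCK_MONOTONIC, &now);
        left = (long)(deadline.tv_sec - now.tv_sec) * 1000000L
             + (deadline.tv_nsec - now.tv_nsec) / 1000L;
        if (left <= 0)
            return WAKE_TIMEOUT;     /* the caller's real timeout */
        if (left > MAX_POLL_USEC)
            left = MAX_POLL_USEC;    /* the kludge timeout */
        tv.tv_sec = left / 1000000L;
        tv.tv_usec = left % 1000000L;

        *readfds = want;
        n = select(nfds, readfds, NULL, NULL, &tv);
        if (n > 0)
            return WAKE_IO;
        if (n < 0 && errno != EINTR)
            return WAKE_ERROR;
        /* Timeout (kludge or real) or EINTR: go round and check again. */
    }
}
```

The worst-case delay in noticing a missed SIGCHLD is MAX_POLL_USEC, and every idle process wakes that often, which is exactly the cost the post wants to avoid.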