Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site rti-sel.UUCP
Path: utzoo!linus!philabs!cmcl2!seismo!harvard!talcott!panda!genrad!decvax!mcnc!rti-sel!trt
From: trt@rti-sel.UUCP (Tom Truscott)
Newsgroups: net.unix-wizards
Subject: Re: fork timing hole and runaway children
Message-ID: <356@rti-sel.UUCP>
Date: Mon, 19-Aug-85 13:39:52 EDT
Article-I.D.: rti-sel.356
Posted: Mon Aug 19 13:39:52 1985
Date-Received: Fri, 23-Aug-85 20:44:09 EDT
References: <541@unisoft.UUCP> <671@cyb-eng.UUCP> <546@unisoft.UUCP>
Distribution: net
Organization: Research Triangle Institute, NC
Lines: 44

By the way, the 'fork() loses signals' problem hit our site
about a month ago, due to a rampantly forking program.
Someone here created a program which forked infinitely:

    while ((pid = fork()) != THEONEIWANT)
        ;
    /* Okay, now I have the pid I want ... */

This is one of those classic annoying UNIX problems
that seems to lack a general, simple, portable solution.
If I am wrong, someone please post it along with a man page.
Sorry this article is so long, think of it as either
(a) Verbose evidence that fork() has a bug
(b) A suggestion that this subject be reopened

I was busy reading news at the time, so we let the problem go
until people started complaining that troff was slow
(the load average was over 25).
Then we came up with the following incorrect solution:

    killpg(getpgrp(atoi(argv[1])), SIGKILL); /* 'killpgrp'? 'getpg'? */

We used 'ps' (well, actually 'top') to get a pid and gave it
to the above program in an attempt to kill all of the rampant monsters.
There were several flaws in the above program:
0)  In general the monsters are in different process groups
    (we were lucky).
1)  Since fork() shields a partial child (fetus?) from SIGKILL,
    we could not kill them all, and the ones that were left
    immediately regenerated. This *really* slowed things down.
2)  After a few rounds of this we fed a bad pid to the above program,
    so getpgrp returned -1, and we did a killpg(-1, SIGKILL).
    I am not sure why, but we found a reboot quite necessary.
    (It solved the problem!)

Over lunch Mike Shaddock decided that if we had done a SIGSTOP
rather than a SIGKILL we might have avoided the embarrassment.
But Tim Seaver (mcnc -- Microelectronics Center of North Carolina)
had a more general solution.
Run (as root) a program which gobbles MAXUPRC process slots:

    for (i = 0; i < 25; i++)
        if (fork() == 0) {
            setuid(rampantuid);
            sleep(5*60);    /* you have five minutes to clean up */
            exit(0);        /* child must not fall back into the loop */
        }

These methods require manual zapping of the monsters,
but at least they (probably) work.

    Tom Truscott
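
A minimal sketch of how the kill program above could guard against
flaw 2 and leave room for the SIGSTOP suggestion. It assumes a modern
POSIX system, where getpgid(pid) takes the place of the 4.2BSD
getpgrp(pid) call used above. The helper name signal_group_of is
hypothetical, and refusing to signal the caller's own process group is
an added assumption, not something proposed in the discussion.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send `sig` to the process group that `pid`
 * belongs to, but only if the group lookup succeeded.
 * Returns 0 on success, -1 with errno set on failure.
 */
int signal_group_of(pid_t pid, int sig)
{
    pid_t pgrp;

    if (pid <= 0) {             /* e.g. atoi() applied to a mistyped argument */
        errno = EINVAL;
        return -1;
    }

    pgrp = getpgid(pid);        /* assumption: POSIX getpgid, not 4.2BSD getpgrp(pid) */
    if (pgrp == (pid_t)-1)
        return -1;              /* no such process: never pass -1 on to killpg() */

    if (pgrp <= 1 || pgrp == getpgrp()) {
        errno = EPERM;          /* assumption: never hit init's group or our own */
        return -1;
    }

    return killpg(pgrp, sig);
}
```

Called as signal_group_of(atoi(argv[1]), SIGSTOP), this would freeze
the group rather than kill it, so stopped processes cannot fork
replacements while they are cleaned up by hand. It does nothing about
flaw 0: monsters in other process groups are still missed.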