Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!husc6!mit-eddie!mit-trillian!jis
From: jis@mit-trillian.MIT.EDU (Jeffrey I. Schiller)
Newsgroups: comp.bugs.4bsd,comp.unix.wizards
Subject: Re: longjmp botches in sendmail on 4.3+NFS
Message-ID: <1562@mit-trillian.MIT.EDU>
Date: Wed, 17-Dec-86 23:09:26 EST
Article-I.D.: mit-tril.1562
Posted: Wed Dec 17 23:09:26 1986
Date-Received: Thu, 18-Dec-86 06:56:03 EST
References: <1351@megaron.UUCP>
Reply-To: jis@trillian.UUCP (Jeffrey I. Schiller)
Organization: MIT Project Athena
Lines: 53
Keywords: sendmail bugs longjump botch
Xref: mnetor comp.bugs.4bsd:93 comp.unix.wizards:389

The problem is caused by the two nested setjmps. Basically what
happens is that smtpinit() sets up a timer to go off after five minutes
(if it doesn't get a greeting). It then calls reply(), which ultimately
calls sfgets(). sfgets() sets up its own timer (usually 2 hours) to go
off if no data is received (i.e. you are in a collect and no data comes
in for 2 hours).

The code in sfgets() does a setjmp, sets a timer (which will do a
longjmp), and does the read. If the read completes, the timer is
removed... HOWEVER, if the 5 minute timer goes off in smtpinit(), the
stack frame of sfgets() is abandoned with its timer still active. Now if
the same sendmail process is still around when that timer goes off
(i.e. two hours later), which will typically only happen on LARGE
mailing lists, you get a longjmp botch.

I found this bug a few weeks ago (with a mailing list of about 250
recipients). I fixed it by changing the code in smtpinit() to NOT SET A
TIMER, but instead to change the value of "ReadTimeout" (the global
variable that sfgets() uses to determine how long to wait) to 5 minutes
and then restore it later. Here is the comment in my code:

	/*
	**  Get the greeting message.
	**	This should appear spontaneously.  Give it five minutes to
	**	happen.
	**
	**	JIS: We change the global variable ReadTimeout to be 5
	**	minutes.  This variable is used by the low-level routine
	**	sfgets to determine how long to wait for input.
	**	When we get our greeting we return ReadTimeout to its
	**	previous state.  IMPORTANT: The older code I replaced
	**	used a separate timeout (via a setjmp and longjmp).
	**	This LOSES REAL BIG if the 5 minute timeout goes off,
	**	for then sfgets gets its stack unwound and leaves
	**	a lingering event that will eventually cause a longjmp
	**	to some ancient stack history; sendmail then dies horribly.
	**	This usually happens only when dealing with large mailing
	**	lists ("xpert" in this case, > 200 recipients), which is
	**	the LAST place you want to dump core, for then the queue
	**	files are out of date and LOTS of people get a duplicate
	**	copy of the message that was in progress.
	*/

Btw. Another unrelated bug, just discovered yesterday, is that if you
have a LARGE number of recipients at one destination (like wiscvm or
seismo), then syslog() may get called with a line greater than 1024
characters... and blamo! core dump. This bug is really in the syslog(3)
routine, not sendmail itself...

			-Jeff
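
The save-and-restore approach described in the comment above can be sketched roughly as follows. This is only an illustration, not the actual sendmail code: `fake_sfgets()`, `get_greeting()`, `last_timeout_used`, the greeting text, and the type and default value of `ReadTimeout` are all assumptions made for the example. Only the name `ReadTimeout` and the 5 minute / 2 hour figures come from the post.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Global read timeout in seconds. The name is from the post; the type
 * and the 2 hour default are assumptions for this sketch. */
static time_t ReadTimeout = 2 * 60 * 60;

/* Records the timeout the reader would have armed (illustration only). */
static time_t last_timeout_used;

/* Hypothetical stand-in for sfgets(): a real reader would arm ONE timer
 * for ReadTimeout seconds, do the read, and clear the timer before
 * returning. Here we only record the timeout and fake a greeting. */
static int fake_sfgets(char *buf, size_t size)
{
    last_timeout_used = ReadTimeout;
    snprintf(buf, size, "220 hypothetical.host ready");
    return 0;
}

/* Read the greeting with a 5 minute limit WITHOUT setting a second,
 * outer timer: shorten the reader's own timeout, then restore it. */
static int get_greeting(char *buf, size_t size)
{
    time_t saved = ReadTimeout;   /* remember the normal timeout */
    int rc;

    ReadTimeout = 5 * 60;         /* five minutes for the greeting */
    rc = fake_sfgets(buf, size);
    ReadTimeout = saved;          /* put the old value back */
    return rc;
}
```

Calling `get_greeting(buf, sizeof buf)` leaves `ReadTimeout` at its previous value afterwards. Because only one timer is ever armed, and it is armed and cleared inside the reader, there is no outer longjmp that can abandon the reader's stack frame with a timer still pending.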