Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!nbires!hao!noao!arizona!megaron!whm
From: whm@megaron.UUCP
Newsgroups: comp.bugs.4bsd,comp.unix.wizards
Subject: longjmp botches in sendmail on 4.3+NFS
Message-ID: <1351@megaron.UUCP>
Date: Fri, 12-Dec-86 03:55:32 EST
Article-I.D.: megaron.1351
Posted: Fri Dec 12 03:55:32 1986
Date-Received: Mon, 15-Dec-86 19:41:20 EST
Distribution: net
Organization: Dept of CS, U of Arizona, Tucson
Lines: 37
Xref: mnetor comp.bugs.4bsd:80 comp.unix.wizards:349

I had occasionally noticed a core file in sendmail's queue directory, but had never thought much of it; I'd just remove it. I then got to wondering about how often sendmail core dumped and found that it happens more than one might think (hope?). In particular, submissions to a large mailing list here (~100 non-local addresses) often produce several such core dumps. Examination of the dumps usually reveals that sendmail got a longjmp botch.

According to my count, there are four longjmps in sendmail. Two of them are longjmp(TopFrame) and these are only done if QuickAbort is != 0. The core files show that QuickAbort is 0, so that seems to eliminate those two longjmps as candidates for the botches in question. The third longjmp is invoked from the smtpinit routine -- if the SMTP greeting isn't seen within five minutes of getting a connection, an event goes off and the longjmp is performed. The fourth longjmp is called due to read timeout in sfgets -- the fgets-like routine that takes steps to not get hung.

In most of the core files, SmtpState is SMTP_OPEN, which implies that the longjmp that's blowing up is the fourth one, in sfgets. Popular values for SmtpPhase are "user open", "greeting wait", and "DATA wait". The sites involved on the far end are typically Arpanet hosts that we exchange mail with on a regular basis.
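
For anyone who hasn't looked at that kind of code, the read-timeout pattern described above for sfgets goes roughly like the sketch below. This is not sendmail's source. The names timed_read, on_alarm, read_timeout_env and env_valid are made up for illustration, it uses read() on a descriptor instead of fgets, and it assumes the POSIX sigsetjmp/siglongjmp calls rather than plain setjmp/longjmp.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <unistd.h>

static sigjmp_buf read_timeout_env;          /* hypothetical name */
static volatile sig_atomic_t env_valid = 0;  /* nonzero only while timed_read is live */

static void on_alarm(int sig)
{
    (void)sig;
    if (env_valid)                  /* never jump into a frame that has returned */
        siglongjmp(read_timeout_env, 1);
}

/* Read up to size bytes from fd, giving up after secs seconds.
 * Returns whatever read() returns, or -2 if the timeout fired. */
ssize_t timed_read(int fd, char *buf, size_t size, unsigned secs)
{
    struct sigaction sa, old;
    ssize_t n;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, &old);

    if (sigsetjmp(read_timeout_env, 1) != 0) {
        /* We got here by siglongjmp from the alarm handler. */
        env_valid = 0;
        sigaction(SIGALRM, &old, NULL);
        return -2;
    }

    env_valid = 1;
    alarm(secs);                    /* arm the timeout */
    n = read(fd, buf, size);        /* may block until the alarm goes off */
    alarm(0);                       /* cancel the timeout before leaving the frame */
    env_valid = 0;
    sigaction(SIGALRM, &old, NULL);
    return n;
}
```

A caller would do something like timed_read(fd, buf, sizeof buf, 300) and treat -2 as a timeout. The two things to notice are the alarm(0) before the return and the env_valid flag: between them, a late SIGALRM can't jump into a frame that no longer exists.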

I'm fuzzy on this, but I think the usual reason for a botch in longjmp is that the routine that did the associated setjmp has already returned. For both of the likely longjmps mentioned, it seems unlikely that the routine with the setjmp call could return before the longjmp is done.

If anyone has any suggestions on what the problem might be, I'd like to hear them. I think my next step is to put some debugging stuff in a version of longjmp in order to try to narrow down the problem some more.

					Bill Mitchell
					whm@arizona.edu
					{allegra,cmcl2,ihnp4,noao}!arizona!whm