Path: utzoo!mnetor!uunet!husc6!hao!ames!aurora!labrea!decwrl!hplabs!hp-sdd!ucsdhub!jack!crash!bblue
From: bblue@crash.cts.com (Bill Blue)
Newsgroups: news.software.b
Subject: Re: Strange Core Dumps
Message-ID: <2132@crash.cts.com>
Date: 12 Dec 87 08:06:06 GMT
References: <2122@crash.cts.com> <7961@princeton.Princeton.EDU>
Reply-To: bblue@crash.CTS.COM (Bill Blue)
Organization: Crash TS, El Cajon, CA
Lines: 67

In article <7961@princeton.Princeton.EDU> pep@Princeton.EDU (Pat Parseghian) writes:
>Your problem has a familiar ring . . .
>Our mail/news gateway occasionally runs out of swap space and we haven't been
>lucky enough to track it down.  It doesn't panic, it limps along - but it's
>hard to get evidence when "ps" dumps core (Bus error) and utilities like "top"
>and "systat" won't run (Not enough memory).
>
>We're running 4.3BSD+NFS (Mt. Xinu) on a Microvax II.  We're running 2.11,
>patch 14 (but we've seen this with patch 8).  We went from patch 8 to patch 14
>a week ago; whether we saw the problem prior to patch 8, I don't recall.  We
>have 16 MB of memory and about the same amount of swap space.

I too saw this behavior at patch level #8.

>What does your core dump suggest???

That things were very scrambled.  The stack trace showed only one
frame, 'pad()+02f'.  I grep'd through all of the code for any routine
called pad() and found none, so I assumed the stack display itself was
erroneous.

>I'd like to supplement your observations:
>- The offending articles are the only ones in my history file with a "%" in a
>  Message-ID.
>- One of the articles () has a References line
>  that is not a valid Message-ID (to the best of my understanding).
>- That particular article is still on my system as /usr/spool/news/.ar006603,
>  which is linked to /usr/spool/news/comp/lang/modula2/576.

I missed that link completely.  The three .ar files that I have do in fact
have correctly posted counterparts (originally links) in comp.lang.modula2
here.
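
One way to check which spool articles are still hard links to a leftover
.ar file is to compare device and inode numbers.  The following is only a
rough sketch in Python; the function name and both directory arguments are
illustrative and are not part of the news software.  It finds only
counterparts that are still linked, so a copy that was once a link will
not show up.

```python
import os

def find_linked_articles(spool_root, group_dir):
    """Map each leftover .ar* file in spool_root to the files in group_dir
    that share its device and inode, i.e. that are hard links to it."""
    leftovers = {}
    for name in os.listdir(spool_root):
        path = os.path.join(spool_root, name)
        if name.startswith(".ar") and os.path.isfile(path):
            st = os.stat(path)
            leftovers[(st.st_dev, st.st_ino)] = path

    # Start with an empty list so unlinked leftovers are reported too.
    links = {path: [] for path in leftovers.values()}
    for name in sorted(os.listdir(group_dir)):
        path = os.path.join(group_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            key = (st.st_dev, st.st_ino)
            if key in leftovers:
                links[leftovers[key]].append(path)
    return links
```

Called with the spool root and a newsgroup directory (for example
/usr/spool/news and /usr/spool/news/comp/lang/modula2), it returns a
dictionary keyed by .ar path; an empty list means no remaining link was
found in that group.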

I checked all the messages in my spool directory for comp.lang.modula2.
The three messages for which I have saved .ar files are the only files in
the directory that use that format (a "%") in the Message-ID line.  Others
may have it in a References line, though.
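
A check like this can be automated by scanning only the header block of
each article for a "%" in the Message-ID or References line.  This is a
minimal sketch, assuming one article per file with headers ended by the
first blank line; the function name is made up for illustration, and
folded (continued) header lines are not handled.

```python
import os

def headers_with_percent(group_dir, fields=("message-id", "references")):
    """Return (filename, header name, value) for every listed header
    whose value contains a '%', looking only at the header block."""
    hits = []
    for name in sorted(os.listdir(group_dir)):
        path = os.path.join(group_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "r", errors="replace") as fh:
            for line in fh:
                if line in ("\n", "\r\n"):
                    break  # blank line: end of headers, skip the body
                field, sep, value = line.partition(":")
                if sep and field.strip().lower() in fields and "%" in value:
                    hits.append((name, field.strip(), value.strip()))
    return hits
```

Separating Message-ID hits from References hits in the result makes it
easy to see whether the saved .ar articles really are the only ones with
the character in the Message-ID itself.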

>This last observation was most interesting to me, because:
>- The article is in my history file.

Yep.

>- We sent the article successfully to our netnews neighbors (I found it in a
>  spool directory on one of our neighboring machines, with a path indicating
>  it came through us).

Same here.

>- So why didn't inews unlink /usr/spool/news/.ar006603?  Probably because it
>  got hung before calling xxit().  The article arrived on our system around
>  21:43 on 12/8; at 9:49 on 12/9 we ran out of memory/swap space.  At 10:42 I
>  rebooted.  I'll bet that inews was running, but I can't confirm that.

The timestamp on each of my .ar files is within three minutes of a runaway
or crash that I experienced.
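
The same comparison can be made mechanically if the crash times are
written down.  The sketch below assumes the crash or runaway times are
supplied by hand as seconds since the epoch (nothing in the news software
records them), and the function name is hypothetical.

```python
import os

def events_near_files(paths, event_times, window_seconds=180):
    """For each file, list the event times that fall within
    window_seconds of the file's modification time."""
    result = {}
    for path in paths:
        mtime = os.stat(path).st_mtime
        result[path] = [t for t in event_times
                        if abs(mtime - t) <= window_seconds]
    return result
```

A file whose list comes back empty was not modified near any of the
recorded crashes, which would count against the correlation.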

I originally reported that rerunning the same batch did not produce the
same results.  Thinking back, I'm no longer positive that is so, because
I'm not sure that the batch running at the time of the crash was actually
interrupted in the middle and left there for the next run.  So it looks
as if something about the processing of the Message-ID line for those
particular articles is corrupting code in another location, and the
damage doesn't show up until a later time -- like maybe when that batch
ends.

--Bill