Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!gatech!ncsuvx!mcnc!decvax!ima!esegue!compilers-sender
From: eachus@mbunix.mitre.org (Robert Eachus)
Newsgroups: comp.compilers
Subject: Re: Error reporting in LR parsers
Message-ID: <1989Aug15.151121.3390@esegue.uucp>
Date: 15 Aug 89 15:11:21 GMT
Sender: compilers-sender@esegue.uucp (John R. Levine)
Reply-To: eachus@mbunix.mitre.org (Robert Eachus)
Organization: The MITRE Corporation, Bedford, Mass.
Lines: 66
Approved: compilers@esegue.segue.bos.ma.us

In article <1989Aug12.201931.4857@esegue.uucp> Mark Grand writes:
>Generating a list of acceptable tokens before allowing YACC to perform a
>default reduction is expensive.  A cheaper way (assuming a fast
>implementation of memcpy) is to take a snapshot of YACC's state stack every
>time it gets a new token.  That way you can generate a list of the expected
>tokens from the snapshot and only have to do it when actually needed.

     There is an even faster way.  Pat Prange and I used it in a
parser generator and driver {LALR} which is available on Multics.  If
the next action to be taken by the parser is a reduction (other than
one into an accepting state), note it in a list of pending states but
do nothing.  If it is the first read (shift) following an accepting
state, hold it to one side as well.  When a succeeding legal token
has been read and is ready to shift onto the stack (or an accepting
state is reached), do the pending read (if any), then do all pending
reductions up to the current read.

     With this scheme, error correction can start from the point just
before the last legal token was read.  This allows all sorts of
error recovery strategies to be tried.  We tried twelve combinations,
including deleting the previously read token and swapping the
previous and current tokens.  All this was "free" in that it required
a few bytes of storage in the driver's stack and about 3 or 4 extra
instructions per successful shift or reduction.  (Failures are
another story; we allowed ourselves about a tenth of a CPU second per
correction, but it was worth it, since the compilers could often
continue through several difficult syntax errors and do a meaningful
error scan through an entire program.)

     Two of my favorite corrections came from the Ada Compiler
Validation Suite:

     legal-statement;
     IFB THEN ...

     was corrected to:

     legal-statement;
     if IFB then ...

     and

     X: INTEGER ran ge 1..10;

     was corrected to:

     X: INTEGER range 1..10;

     All without special error tables, and without looking at the
spellings of non-terminals!  In fact, the parser tables for the full
Ada grammar were about 3500 36-bit words, including 36 words for
error recovery (closures for constructs to be deleted if panic mode
was used to throw away a sequence of invalid tokens).

     As far as I know this tool is still used to maintain the parsers
for several Honeywell compilers (especially for the DPS6 and DPS6+,
but not the Ada compiler from DDC), and the source (highly
Multics-specific PL/1) is on every existing machine running Multics.
(There just aren't too many of them any more...sigh!)

					Robert I. Eachus
					eachus@mbunix.mitre.org
-- 
Send compilers articles to compilers@ima.isc.com or, perhaps, Levine@YALE.EDU
{ decvax | harvard | yale | bbn }!ima.  Meta-mail to ima!compilers-request.
Please send responses to the author of the message, not the poster.