Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!purdue!gatech!ncsuvx!mcnc!decvax!ima!esegue!compilers-sender
From: eachus@mbunix.mitre.org (Robert Eachus)
Newsgroups: comp.compilers
Subject: Re: Error reporting in LR parsers
Message-ID: <1989Aug15.151121.3390@esegue.uucp>
Date: 15 Aug 89 15:11:21 GMT
Sender: compilers-sender@esegue.uucp (John R. Levine)
Reply-To: eachus@mbunix.mitre.org (Robert Eachus)
Organization: The MITRE Corporation, Bedford, Mass.
Lines: 66
Approved: compilers@esegue.segue.bos.ma.us

In article <1989Aug12.201931.4857@esegue.uucp> Mark Grand writes:
>Generating a list of acceptable tokens before allowing YACC to perform a
>default reduction is expensive. A cheaper way (assuming a fast
>implementation of memcpy) is to take a snapshot of YACC's state stack every
>time it gets a new token. That way you can generate a list of the expected
>tokens from the snapshot and only have to do it when actually needed.

There is an even faster way. Pat Prange and I used it in a parser
generator and driver (LALR) which is available on Multics. If the next
action to be taken by the compiler is a reduction (other than into an
accepting state), keep a list of states but do nothing. If it is the
first read (shift) following an accepting state, hold it to one side as
well. When a succeeding legal token has been read and is ready to shift
onto the stack (or an accepting state is reached), do the pending read
(if any), then do all pending reductions up to the current read.

With this scheme, error correction can start from the point just before
the last legal token was read. This allows all sorts of error recovery
strategies to be tried; we tried twelve combinations, including deleting
the previously read token and swapping the previous and current tokens.
All this was "free" in that it required a few bytes of storage in the
driver's stack, and about three or four extra instructions per
successful shift or reduction.
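The deferred-commit idea can be sketched as follows. This is an illustrative Python reconstruction, not code from the Multics LALR tool: the tables are a hand-built SLR(1) set for a toy grammar E -> E '+' 'n' | 'n', and for simplicity the pending reductions and the held shift are folded into one speculative stack copy (costing a copy per token, closer to Mark Grand's snapshot scheme than to the few-instruction overhead of the original).

```python
# Illustrative sketch (not the Multics LALR tool): a deferred-commit LR
# driver for the toy grammar  E -> E '+' 'n' | 'n'  with hand-built
# SLR(1) tables.  States: 0 start, 1 after E, 2 after 'n', 3 after '+',
# 4 after E '+' 'n'.
ACTION = {
    (0, 'n'): ('shift', 2),
    (1, '+'): ('shift', 3),
    (1, '$'): ('accept',),
    (2, '+'): ('reduce', 'E', 1),
    (2, '$'): ('reduce', 'E', 1),
    (3, 'n'): ('shift', 4),
    (4, '+'): ('reduce', 'E', 3),
    (4, '$'): ('reduce', 'E', 3),
}
GOTO = {(0, 'E'): 1}

def simulate(stack, tok):
    """Apply the reductions forced by lookahead `tok` to a copy of
    `stack`, then shift `tok`.  None means `tok` is illegal here."""
    stack = list(stack)
    while True:
        act = ACTION.get((stack[-1], tok))
        if act is None:
            return None
        if act[0] == 'shift':
            return stack + [act[1]]
        if act[0] == 'accept':
            return stack
        _, lhs, rhs_len = act
        del stack[len(stack) - rhs_len:]
        stack.append(GOTO[(stack[-1], lhs)])

def replay(stack, toks):
    """Run a candidate repair token sequence from a saved stack."""
    for t in toks:
        if stack is None:
            return None
        stack = simulate(stack, t)
    return stack

def parse(tokens):
    """Shifts and their forced reductions stay speculative until the
    *next* token proves legal, so on error the committed stack still
    predates the previously read token."""
    committed = [0]    # stack from just before the last legal token
    pending = None     # speculative stack with that token shifted
    prev_tok = None
    for tok in tokens + ['$']:
        nxt = simulate(committed if pending is None else pending, tok)
        if nxt is None:
            # Two of the repair strategies mentioned in the article,
            # both starting from before prev_tok was read:
            candidates = {
                'delete previous token': [tok],
                'swap previous and current': [tok, prev_tok],
            }
            viable = [name for name, seq in candidates.items()
                      if replay(committed, seq) is not None]
            return False, viable
        if pending is not None:
            committed = pending    # commit the held shift
        pending, prev_tok = nxt, tok
    return True, []
```

For input "n n" the driver rejects the second 'n', but because the committed stack still predates the first 'n', it can verify that deleting the previous token yields a legal continuation, exactly the rollback point the deferred scheme buys.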
(Failures are another story; we allowed ourselves about a tenth of a CPU
second per correction, but it was worth it, since the compilers could
often continue through several difficult syntax errors and do a
meaningful error scan through an entire program.)

Two of my favorite corrections came from the Ada Compiler Validation
Suite:

    legal-statement; IFB THEN ...

was corrected to:

    legal-statement; if IFB then ...

and

    X: INTEGER ran ge 1..10;

was corrected to:

    X: INTEGER range 1..10;

All without special error tables, and without looking at the spellings
of non-terminals! In fact, the parser tables for the Ada grammar above
were about 3500 36-bit words, including 36 words for error recovery
(closures for constructs to be deleted if panic mode was used to throw
away a sequence of invalid tokens).

As far as I know this tool is still used to maintain the parsers for
several Honeywell compilers (especially for the DPS6 and DPS6+, but not
the Ada compiler from DDC), and the source (highly Multics-specific
PL/1) is on every existing machine running Multics. (There just aren't
too many of them any more...sigh!)

					Robert I. Eachus
					eachus@mbunix.mitre.org
--
Send compilers articles to compilers@ima.isc.com or, perhaps,
Levine@YALE.EDU  { decvax | harvard | yale | bbn }!ima.
Meta-mail to ima!compilers-request.
Please send responses to the author of the message, not the poster.