Path: utzoo!attcan!utgpu!watmath!att!bellcore!decvax!ima!esegue!compilers-sender From: heirich@cs.ucsd.edu (Alan Heirich) Newsgroups: comp.compilers Subject: Re: Error reporting in LR parsers Message-ID: <1989Aug8.130922.1019@esegue.uucp> Date: 8 Aug 89 13:09:22 GMT References: <1989Aug4.032053.753@esegue.uucp> <1989Aug6.024931.10014@esegue.uucp> Sender: compilers-sender@esegue.uucp Reply-To: heirich@cs.ucsd.edu (Alan Heirich) Organization: EE/CS Dept. U.C. San Diego Lines: 86 Approved: compilers@esegue.segue.bos.ma.us This is the first of two postings. This posting describes a workaround for the default reduction problem with yacc. The second posting describes code modifications to DECUS yacc which provides meaningful diagnostic messages both for describing a parse (debugging a grammar) and for an end-user at run time. In article <1989Aug6.024931.10014@esegue.uucp> djones@megatest.uucp (Dave Jones) writes: >Mr. Worley has generously taken the time to post a long article about getting >yacc to give up a list of token-categories which would be legal when a syntax >error is found. The idea of grouping tokens into categories is a good one. >But the technique in no way solves the "default reduction" problem, as >he claims it does. Mr. Jones is correct. However, the default reduction problem is not hard to work around. The problem: yacc violates correct shift-reduce parsing rules by assigning "default reductions". For the following item: A --> B C . the corresponding rule should be reduced only when the lookahead token is in the follow set of A. A given automaton state might contain many such items, and also many items on which a shift is anticipated, e.g. Q --> R S . T (in which case a shift would be expected on when the lookahead token is in the first set of T). The yacc parser should report an error whenever the lookahead token is neither in a follow set of a rule like "A" above, nor in a first set for a rule like "Q" above. But yacc builds its automaton in such a way that this is not the case. Instead, a rule like "A" would get reduced whenever the lookahead token can't be handled in any other way. In other words, a reduction happens when an error should be reported. These "default reductions" continue until the parser is in a state with no reductions, at which point it reports an error. Any diagnostic message based on the state where the error is reported are thus potentially incorrect, since the lookahead set might be different from the lookahead set of the state where the error was actually encountered. Consequences of the problem: there are three types of states: shift only; reduce only; and combined shift & reduce states. It turns out that this "problem" in irrelevant to two of the kinds of states; and is easily worked around for the case of combined shift & reduce states. case 1: a state contains only shift items. Then there is no problem, because the error is reported instantly. case 2: a state contains only reduce items. The default reductions will change the state before the error gets reported. But this state will have the same lookahead set as the state where the error was encountered. This is because the reduction will return the parser to a state where the lookahead set contains the first set of the reduced nonterminal. The reductions don't change the lookahead set. case 3: combined shift & reduce items. This is the potentially problematic state, which Mr. Jones illustrated in his posting. As he explained, the default reductions move the parser to a state in which the lookahead set is not the same as when the error was encountered. Specifically, the new lookahead set will contain the union of the first sets of the reduced nonterminal(s), but will not contain the tokens which could have been shifted in the state where the error was encountered. Case 3 has an easy fix. Modify the yacc parser to save the current state *after* it obtains a new token from yylex. (This occurs near the beginning of the parse code, right after the debug code that prints out the state number). Then, when an error is encountered, yyerror traps the current state and compares it to this saved state. The set of expected tokens is the union of the lookahead sets of these two states. (In cases 1 and 2 above, these will be the same sets of tokens). The next posting describes a set of mods I have made to DECUS yacc to implement automatic diagnostics based on this approach. ------------------------- Alan Heirich Comp. Sci. & Eng., Cognitive Science C-014 University of California, San Diego 92093 heirich@cs.ucsd.edu aheirich@ucsd.bitnet -- -- Send compilers articles to compilers@ima.isc.com or, perhaps, Levine@YALE.EDU { decvax | harvard | yale | bbn }!ima. Meta-mail to ima!compilers-request. Please send responses to the author of the message, not the poster.