Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!gatech!mcnc!decvax!ima!esegue!compilers-sender
From: djones@megatest.com (Dave Jones)
Newsgroups: comp.compilers
Subject: Re: Error reporting in LR parsers
Message-ID: <1989Aug11.140942.12335@esegue.uucp>
Date: 11 Aug 89 14:09:42 GMT
References: <1989Aug8.131112.1081@esegue.uucp>
Sender: compilers-sender@esegue.uucp
Reply-To: djones@megatest.com (Dave Jones)
Organization: Megatest Corporation, San Jose, Ca
Lines: 62
Approved: compilers@esegue.segue.bos.ma.us

From article <1989Aug8.131112.1081@esegue.uucp>, by heirich@cs.ucsd.edu (Alan Heirich):
> This posting describes modifications to
> DECUS yacc to permit automatic diagnostic generation.
...
>
> The changes are nearly all in the routine "output".  This routine writes out
> the parser description to the description file.  You will want to modify it
> to write out five pieces of information:
>
>     (1) a set of strings containing token names
>     (2) a set of strings containing nonterminal names
>     (3) a set of states containing items sets
>     (4) a set of states containing lookahead sets
>     (5) a set of states containing goto sets
>

All this info is in the y.output file from standard yacc.  There's no
need to (ahem) hack yacc.

If there's a demand, and I can find the time, I'll package up and post
some nawk (new awk) scripts and so forth that use the y.output file to
generate a procedural, rather than table-based, parser.  They could be
modified easily enough to write the above info in any format you might
want.

But be warned: however you obtain this info, you still have to calculate
the legal-token-sets dynamically.  There is not enough info in the
LALR(1) item-sets to calculate them from the state-number alone.  You
have to keep track of the default-reduction states as they are popped.
I hope I have said this often enough and loudly enough now.

I did the scripts partly as an exercise in learning nawk, partly to
get a faster parser, but mostly to aid in debugging compilers.
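
To make the point about legal-token-sets concrete, here is a minimal
sketch in C.  Everything in it is an assumption made up for
illustration: the toy grammar (S : '(' A ')' | '[' A ']' ; A : 'x'),
the table layout, and the names token_is_legal and legal_tokens.  It is
not the output of yacc or of the nawk scripts.  It also assumes every
state either shifts or has a single default reduction.  State 4
(A : x _) is reached after both "( x" and "[ x", so its state number
alone cannot say whether ')' or ']' is legal; the sketch answers the
question by replaying the default reductions on a copy of the stack.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy tables; rule 1: S : ( A ), rule 2: S : [ A ], rule 3: A : x */
enum { T_LP, T_RP, T_LB, T_RB, T_X, T_END, NTOKENS };
enum { NT_S, NT_A, NNONTERMS };
enum { NSTATES = 9, MAXSTACK = 32, E = -1, ACC = -2 };

/* shift_tab[state][token]: state to shift to, ACC to accept, E for error */
static const int shift_tab[NSTATES][NTOKENS] = {
    /*        (  )  [  ]  x  $  */
    /* 0 */ { 2, E, 3, E, E, E   },
    /* 1 */ { E, E, E, E, E, ACC },
    /* 2 */ { E, E, E, E, 4, E   },
    /* 3 */ { E, E, E, E, 4, E   },
    /* 4 */ { E, E, E, E, E, E   },
    /* 5 */ { E, 7, E, E, E, E   },
    /* 6 */ { E, E, E, 8, E, E   },
    /* 7 */ { E, E, E, E, E, E   },
    /* 8 */ { E, E, E, E, E, E   },
};
/* default_rule[state]: rule reduced on any lookahead, or E if none */
static const int default_rule[NSTATES] = { E, E, E, E, 3, E, E, 1, 2 };
static const int rule_len[] = { 1, 3, 3, 1 };           /* rule 0 is unused */
static const int rule_lhs[] = { NT_S, NT_S, NT_S, NT_A };
/* goto_tab[state][nonterminal]: state entered after a reduction, or E */
static const int goto_tab[NSTATES][NNONTERMS] = {
    { 1, E }, { E, E }, { E, 5 }, { E, 6 }, { E, E },
    { E, E }, { E, E }, { E, E }, { E, E },
};

/* Return 1 if `token` would be shifted or accepted from this stack.
 * Default reductions are replayed on a private copy of the stack, so
 * the caller's parser stack is left untouched. */
int token_is_legal(const int *stack, int depth, int token)
{
    int copy[MAXSTACK];
    assert(depth > 0 && depth <= MAXSTACK);
    memcpy(copy, stack, (size_t)depth * sizeof copy[0]);
    for (;;) {
        int state = copy[depth - 1];
        int rule = default_rule[state];
        if (rule == E)                      /* no default: consult shifts */
            return shift_tab[state][token] != E;
        depth -= rule_len[rule];            /* pop the rule's right side */
        assert(depth > 0 && depth < MAXSTACK);
        int next = goto_tab[copy[depth - 1]][rule_lhs[rule]];
        assert(next != E);
        copy[depth++] = next;               /* push the goto state */
    }
}

/* Bitmask of all legal tokens: bit t is set when token t is legal. */
unsigned legal_tokens(const int *stack, int depth)
{
    unsigned set = 0;
    for (int t = 0; t < NTOKENS; t++)
        if (token_is_legal(stack, depth, t))
            set |= 1u << t;
    return set;
}
```

With the state stack {0, 2, 4} (after "( x") legal_tokens reports only
')', and with {0, 3, 4} (after "[ x") only ']', even though both stacks
have state 4 on top.
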
It's impossible to pick through the coded tables in a debugger and make
any kind of sense, but it is easy to single step through code that looks
like the following, which is cut and pasted from a compiler I'm in the
process of writing:

    switch(state) {
    ...
    case 3:switch(lookahead){
      /*
       * file : $$2 _ declarations
       * declaration_list : _    (28)
       */
      case EOF: YREDUCE(28,0,NT_declaration_list);
      case error: YSHIFT(7);
      case CONST: YREDUCE(28,0,NT_declaration_list);
      case INSERT: YREDUCE(28,0,NT_declaration_list);
      case FUNCTION: YREDUCE(28,0,NT_declaration_list);
      case LABEL: YREDUCE(28,0,NT_declaration_list);
      case PROCEDURE: YREDUCE(28,0,NT_declaration_list);
      case TYPE: YREDUCE(28,0,NT_declaration_list);
      case VAR: YREDUCE(28,0,NT_declaration_list);
      case WITH: YREDUCE(28,0,NT_declaration_list);
      default: YERROR();
      };

The scripts "wrote" the above code from inspection of the y.output file.
[From djones@megatest.com (Dave Jones)]
--
Send compilers articles to compilers@ima.isc.com or, perhaps, Levine@YALE.EDU
{ decvax | harvard | yale | bbn }!ima.  Meta-mail to ima!compilers-request.
Please send responses to the author of the message, not the poster.