Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!decvax!ima!johnl
From: johnl@ima.UUCP
Newsgroups: mod.compilers
Subject: Re: advice needed re: parsing C decl syntax
Message-ID: <284@ima.UUCP>
Date: Mon, 8-Dec-86 23:08:04 EST
Article-I.D.: ima.284
Posted: Mon Dec 8 23:08:04 1986
Date-Received: Wed, 10-Dec-86 02:22:46 EST
Reply-To: SKY
Organization: U of Rochester, CS Dept, Rochester, NY
Lines: 33
Approved:
In-Reply-To: <282@ima.UUCP>
Uucp: ..!{allegra,decvax,seismo}!rochester!ken
ARPA: ken@rochester.arpa
Snail: CS Dept., U. of Roch., NY 14627.
Voice: Ken!

LR parsing implies bottom up parsing. The standard trick in LL (recursive
descent) parsing is to delay doing something about a token if it could be
one of several things. Basically you are rewriting your grammar so that it
is LL(1) instead of LL(k), k > 1, by left factoring. For example:

    statement -> identifier := expression
              -> identifier ( paramlist )

is not LL(1). But when you factor it out thus:

    statement -> identifier stmttail
    stmttail  -> := expression
              -> ( paramlist )

Now one token of lookahead suffices.

	Ken

[This is true, but in real recursive descent compilers that I have seen,
it's as likely that you parse these things by kludgery as by adding
productions. Most recursive descent compilers are written by hand, so you'd
parse the first syntax by eating the identifier, remembering it, eating the
next token, and then going ahead with the appropriate construction. But I'd
rather let yacc do the work for me. Yacc can parse anything with enough
help -- I have written a yacc parser with a moderate number of kludges that
correctly parses Fortran 77, a language in which a statement can't be
tokenized until you know what kind of statement it is.
-John]
--
Send compilers mail to ima!compilers or, in a pinch to Levine@YALE.EDU
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbncca}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request
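
For readers who want to see the factored grammar in action, here is a
minimal recursive descent sketch in C. It is not taken from either poster's
code. Every name in it (Parser, advance, match, parse_statement, the token
and statement enums) is hypothetical, and two simplifications are assumed
purely to keep it short: an expression is a single identifier, and a
paramlist is a possibly empty, comma-separated list of identifiers. The
point to notice is that stmttail() chooses its production by looking at
exactly one token.

```c
#include <ctype.h>

/* Token and statement kinds for the toy grammar (hypothetical names). */
typedef enum { T_IDENT, T_ASSIGN, T_LPAREN, T_RPAREN, T_COMMA, T_END, T_ERROR } Token;
typedef enum { STMT_ASSIGN, STMT_CALL, STMT_ERROR } StmtKind;

typedef struct {
    const char *p;  /* next unread character */
    Token tok;      /* the single token of lookahead */
} Parser;

/* Scan the next token from the input into ps->tok. */
static void advance(Parser *ps) {
    while (isspace((unsigned char)*ps->p)) ps->p++;
    char c = *ps->p;
    if (c == '\0') { ps->tok = T_END; return; }
    if (isalpha((unsigned char)c)) {
        while (isalnum((unsigned char)*ps->p)) ps->p++;
        ps->tok = T_IDENT;
    } else if (c == ':' && ps->p[1] == '=') {
        ps->p += 2; ps->tok = T_ASSIGN;
    } else if (c == '(') { ps->p++; ps->tok = T_LPAREN; }
    else if (c == ')') { ps->p++; ps->tok = T_RPAREN; }
    else if (c == ',') { ps->p++; ps->tok = T_COMMA; }
    else { ps->tok = T_ERROR; }  /* unknown character: parsing will fail */
}

/* Consume the lookahead if it is t; return 1 on success, 0 otherwise. */
static int match(Parser *ps, Token t) {
    if (ps->tok != t) return 0;
    advance(ps);
    return 1;
}

/* ASSUMPTION: expression -> identifier (simplified for the sketch). */
static int expression(Parser *ps) { return match(ps, T_IDENT); }

/* ASSUMPTION: paramlist -> empty | identifier { , identifier } */
static int paramlist(Parser *ps) {
    if (ps->tok == T_RPAREN) return 1;  /* empty list */
    if (!match(ps, T_IDENT)) return 0;
    while (match(ps, T_COMMA))
        if (!match(ps, T_IDENT)) return 0;
    return 1;
}

/* stmttail -> := expression | ( paramlist )
   One token of lookahead selects the production. */
static StmtKind stmttail(Parser *ps) {
    switch (ps->tok) {
    case T_ASSIGN:
        advance(ps);
        return expression(ps) ? STMT_ASSIGN : STMT_ERROR;
    case T_LPAREN:
        advance(ps);
        return (paramlist(ps) && match(ps, T_RPAREN)) ? STMT_CALL : STMT_ERROR;
    default:
        return STMT_ERROR;
    }
}

/* statement -> identifier stmttail; the whole input must be consumed. */
StmtKind parse_statement(const char *src) {
    Parser ps = { src, T_END };
    advance(&ps);
    if (!match(&ps, T_IDENT)) return STMT_ERROR;
    StmtKind kind = stmttail(&ps);
    if (kind != STMT_ERROR && ps.tok != T_END) return STMT_ERROR;
    return kind;
}
```

Under those assumptions, parse_statement("x := y") classifies the input as
an assignment and parse_statement("f(a, b)") as a call. A hand-written
parser of the kind the moderator describes would do much the same thing,
except that it would also remember the identifier it ate before looking at
the next token.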