Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!ncar!tank!uxc!uxc.cso.uiuc.edu!a.cs.uiuc.edu!m.cs.uiuc.edu!liberte
From: liberte@m.cs.uiuc.edu
Newsgroups: comp.lang.misc
Subject: Re: Dumb Lexical Analyzers are Smart
Message-ID: <5200027@m.cs.uiuc.edu>
Date: 20 Sep 88 06:11:00 GMT
References: <5200026@m.cs.uiuc.edu>
Lines: 49
Nf-ID: #R:m.cs.uiuc.edu:5200026:m.cs.uiuc.edu:5200027:000:2117
Nf-From: m.cs.uiuc.edu!liberte    Sep 20 01:11:00 1988


My interpretation of Bill Smith's argument is that languages should
distinguish tokens lexically if they are used in different ways
syntactically.  As part of that argument, he says:

> C is not such
> a language because of the typedef construct.  A typedef changes
> the lexical class of the new type's identifier to avoid
> horrendous ambiguity in the language.

But a typedef would not have to change the lexical class of
identifiers if typedef identifiers were only used in the syntax in
unambiguous ways.  However, they are used ambiguously and the only
way to disambiguate is to use the (semantic) fact that an
identifier was declared as a typedef.  

For example, when you see "foo * bar;" in C, you can't tell whether
it declares "bar" using the typedef identifier "foo" or is a
multiplication expression, not without checking whether the
surrounding lines are declarations and/or whether "foo" is a
typedef id.  But a better syntax could make that locally unambiguous.
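
To make the ambiguity concrete, here is a minimal sketch in C.  The
function names are my own, picked only for illustration.  The same
three tokens are a declaration in one function and a multiplication
in the other, and the only thing that differs is what "foo" was
declared as in the enclosing scope.

```c
#include <assert.h>

typedef int foo;              /* file scope: foo names a type */

/* Here foo is the typedef, so "foo * bar;" is a declaration. */
int as_declaration(void) {
    foo value = 7;
    foo * bar;                /* declares bar as a pointer to foo */
    bar = &value;
    return *bar;
}

/* Here a local variable hides the typedef, so the same tokens multiply. */
int as_expression(void) {
    int foo = 6;              /* hides the typedef name in this scope */
    int bar = 7;
    foo * bar;                /* expression statement; value is discarded */
    return foo * bar;
}

/* Exercises both readings; aborts if either one behaves unexpectedly. */
void check_both_readings(void) {
    assert(as_declaration() == 7);
    assert(as_expression() == 42);
}
```

A compiler may warn that the bare "foo * bar;" in as_expression has
no effect, which is itself a hint that it was parsed as an expression.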

Pascal also supports user defined type identifiers, but they may
only be used, like all type identifiers, in unambiguous ways. 
Actually, some Pascals support type casting that looks identical to
function calls, so this is ambiguous in some sense, but it doesn't
matter at the syntactic level.  (A similar example is the ambiguity
between a parameterless function call and a variable reference.)

So, while you see it as a problem to be solved at the lexical
level, I would prefer to solve it at the syntax level.
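
As a rough sketch of the difference, compare two ways a parser might
decide whether a statement is a declaration.  Everything here is
hypothetical: the function names, the enum, and the keyword-led
syntax (loosely modelled on Pascal's "var" sections) are assumptions
for illustration, and the C-style check ignores built-in type
keywords to stay short.

```c
#include <assert.h>
#include <string.h>

enum kind { DECLARATION, EXPRESSION };

/* C-style: the first token decides only after consulting the set of
   typedef names declared so far, which is semantic information. */
enum kind classify_c_style(const char *first_token,
                           const char *const *typedef_names, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(first_token, typedef_names[i]) == 0)
            return DECLARATION;
    return EXPRESSION;
}

/* Keyword-led (hypothetical syntax): every declaration starts with
   "var", so the first token alone decides and no table is needed. */
enum kind classify_keyword_led(const char *first_token) {
    return strcmp(first_token, "var") == 0 ? DECLARATION : EXPRESSION;
}

/* Shows that the C-style answer for "foo" depends on the table. */
void check_classifiers(void) {
    const char *const names[] = { "foo" };
    assert(classify_c_style("foo", names, 1) == DECLARATION);
    assert(classify_c_style("foo", names, 0) == EXPRESSION);
    assert(classify_keyword_led("var") == DECLARATION);
    assert(classify_keyword_led("foo") == EXPRESSION);
}
```

The second classifier needs nothing but the token in front of it,
which is the sense in which the fix lives in the syntax rather than
in the lexer.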

> Advantages I can see:
> 
> 	1.  A person familiar with the lexical rules of the language can 
> 		more easily understand a routine without consulting all of the
> 		declarations involved.

Granted.  But if the syntax doesn't permit ambiguous use of tokens,
then a reader of a program should be able to tell pretty quickly
what is meant from the local context.

The other advantages you list relate to splitting up lexical and
syntax processing and not requiring semantic info.  But the same
advantages obtain with a syntactic solution.

Dan LaLiberte
uiucdcs!liberte
liberte@cs.uiuc.edu
liberte%a.cs.uiuc.edu@uiucvmd.bitnet