Path: utzoo!attcan!uunet!ncrlnk!ncrcae!ece-csc!ncsuvx!gatech!ukma!tut.cis.ohio-state.edu!osu-cis!att!lzaz!bds
From: bds@lzaz.ATT.COM (B.SZABLAK)
Newsgroups: comp.lang.c++
Subject: Re: Perfect hash function for g++ reserved words
Message-ID: <237@lzaz.ATT.COM>
Date: 21 Sep 88 13:12:35 GMT
References: <8809192248.aa03503@PARIS.ICS.UCI.EDU>
Organization: AT&T ISL Lincroft NJ USA
Lines: 28

In article <8809192248.aa03503@PARIS.ICS.UCI.EDU>, schmidt@siam.ics.uci.EDU ("Douglas C. Schmidt") writes:
...
> p.s. Suggestions for improvement of the basic approach are always welcome.
...
> + /* Contains the offset into the word_table for each of
> +    the reserved words.  This table is indexed by the
> +    hash function value.  Some entries are dummies, needed
> +    since the function is perfect, but not minimum. */

An old trick when using this technique is to use a sparse table (i.e. far
from minimum) in order to reduce hash collisions of variable names with
keyword tokens. This will result in executing the following loop a fewer
number of times:

+             while (str_start < str_end) 
+               {
+                 if (*str++ != word_table[str_start++]) 
+                   {
+                     return (NULL); /* bail if no match */
+                   }
+               }

A perfect minimal table would always require a check
to insure that you have the keyword and not a variable name, but
by having a sparse table many of those checks will be avoided as variable
names will likely hash to an unused table entry.

You'll have to experiment to find the optimal table size.