Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!bionet!apple!apple.com!dowdy
From: dowdy@apple.com (Tom Dowdy)
Newsgroups: comp.sys.mac.programmer
Subject: Re: Hashing....
Message-ID: <4434@internal.Apple.COM>
Date: 28 Sep 89 16:00:06 GMT
Sender: usenet@Apple.COM
Distribution: na
Organization: Apple Computer, Inc.
Lines: 57
References:<4345@internal.Apple.COM> <1728@neoucom.UUCP> <9690@chinet.chi.il.us>

In article <9690@chinet.chi.il.us> henry@chinet.chi.il.us (Henry C. 
Schmitt) writes:
> Since I am currently in a graduate course in algorithms and we are
> currently doing (you guessed it) string searching, I recommend the
> Boyer-Moore algorithm.  It has the interesting property of working
> faster the longer the substring is!

> [ Interesting Runge-Kutta Hashing (it's a *joke*) algorithm deleted ]

Comment on the algorithm:  it sure seems to call the len() function quite
a bit, which is a rather expensive operation on C strings (each call has
to scan to the terminating null), especially when searching long blocks.
(This analysis may not be correct.)
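
To make the cost concrete, here is a minimal sketch (not the deleted
algorithm itself) of a naive search that re-measures both strings on every
pass, next to the same search with the lengths measured once.  The names
find_slow and find_hoisted are made up for illustration, and strlen()
stands in for the len() call mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive search: strlen() is called again on every loop test and on every
   comparison, so each pass rescans both strings from the start.
   Returns the byte offset of the first match, or -1 if there is none. */
long find_slow(const char *text, const char *pat)
{
    size_t i;

    assert(text != NULL && pat != NULL);
    for (i = 0; i + strlen(pat) <= strlen(text); i++) {
        if (memcmp(text + i, pat, strlen(pat)) == 0)
            return (long)i;
    }
    return -1;
}

/* Same search, but both lengths are measured once before the loop. */
long find_hoisted(const char *text, const char *pat)
{
    size_t n, m, i;

    assert(text != NULL && pat != NULL);
    n = strlen(text);
    m = strlen(pat);
    for (i = 0; i + m <= n; i++) {
        if (memcmp(text + i, pat, m) == 0)
            return (long)i;
    }
    return -1;
}
```

Both versions return the same offsets; only the number of scans differs.
A compiler may hoist the calls on its own, so measure before assuming the
first form is actually slower in your build.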

> [Henry then recommends a good book for us to read ]

I'd like to add here that when adapting algorithms such as this from CS
books, Knuth or what have you, one should use the Script Manager where
applicable.   Also, be wary of "clever" algorithms that assume ASCII, or
whose cleverness introduces overhead when combined with the Script
Manager.  Let us examine this algorithm in this light:

For example, string search and other similar operations should always use
ParseTable() or CharByte() when comparing and dividing characters.  It's
not a good idea to assume one byte per char.  In fact, one can't even
assume that a run of bytes taken from the middle of a block of bytes is
aligned on a "character" boundary without calling CharByte() on the first
byte to make sure.
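
As a rough illustration of searching by characters rather than bytes, the
sketch below steps through the text one whole character at a time using a
256-entry lookup table.  Everything here is an assumption made for the
example: toy_fill_table() is a hypothetical stand-in loosely modeled on the
idea behind ParseTable(), not the Script Manager routine, and the rule that
bytes 0x81..0x9F begin a two-byte character is a toy encoding, not any real
script system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: extra[b] is the number of bytes that follow byte b
   within one character (0 or 1 in this toy encoding). */
void toy_fill_table(unsigned char extra[256])
{
    int b;

    for (b = 0; b < 256; b++)
        extra[b] = (unsigned char)((b >= 0x81 && b <= 0x9F) ? 1 : 0);
}

/* Search by whole characters.  The text is assumed to start on a character
   boundary, and i only ever advances by one full character, so a match can
   never begin on the second byte of a two-byte character.
   Returns the byte offset of the first match, or -1 if there is none. */
long find_by_chars(const unsigned char *text, size_t n,
                   const unsigned char *pat, size_t m)
{
    unsigned char extra[256];
    size_t i = 0;

    assert(text != NULL && pat != NULL);
    if (m == 0)
        return -1;
    toy_fill_table(extra);
    while (i + m <= n) {
        if (memcmp(text + i, pat, m) == 0)
            return (long)i;
        i += 1 + (size_t)extra[text[i]];   /* skip the whole character */
    }
    return -1;
}
```

With the four bytes 0x88 'A' 'A' 'B', a search for 'A' skips the 'A' at
offset 1 (it is the second half of a two-byte character) and reports
offset 2.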

I believe a final sanity check with a call to CharByte(), made after you
"think" you've found the substring you want, is the only addition needed
for this particular algorithm.  Of course, one would need to implement it
in such a way that if the sanity check failed, the search would continue.
I'll add that since CharByte() can end up doing some scanning of its own
(depending on which Script System is active), you might wish to just use
ParseTable() or CharByte() from the start and convert to a more "boring"
algorithm.  Left as an exercise for the reader.
(Gosh, I've always wanted to say that!)
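
A minimal sketch of the sanity-check idea, under loudly stated assumptions:
toy_char_byte() is a hypothetical stand-in for CharByte(), its return
values are invented for this example, and the encoding (bytes 0x81..0x9F
begin a two-byte character) is a toy.  A plain byte comparison stands in
for whatever fast search produced the candidate match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Invented result codes; these are NOT the Script Manager's values. */
enum { TOY_SINGLE = 0, TOY_FIRST = 1, TOY_SECOND = 2 };

/* Toy rule (an assumption): bytes 0x81..0x9F begin a two-byte character. */
static int toy_is_lead(unsigned char b)
{
    return b >= 0x81 && b <= 0x9F;
}

/* Hypothetical stand-in for CharByte(): classify buf[offset] by walking
   forward from the start of the buffer, which is assumed to be a
   character boundary.  Note that this scan is itself linear in offset. */
int toy_char_byte(const unsigned char *buf, size_t len, size_t offset)
{
    size_t i = 0;

    assert(buf != NULL && offset < len);
    while (i < len) {
        if (toy_is_lead(buf[i]) && i + 1 < len) {
            if (offset == i)
                return TOY_FIRST;
            if (offset == i + 1)
                return TOY_SECOND;
            i += 2;
        } else {
            if (offset == i)
                return TOY_SINGLE;
            i += 1;
        }
    }
    return TOY_SINGLE;   /* not reached while offset < len */
}

/* Byte-level search followed by the sanity check.  A candidate that starts
   on the second byte of a character is rejected and the search continues.
   The pattern is assumed to consist of whole characters.
   Returns the byte offset of the first valid match, or -1. */
long find_checked(const unsigned char *text, size_t n,
                  const unsigned char *pat, size_t m)
{
    size_t i;

    if (m == 0 || m > n)
        return -1;
    for (i = 0; i + m <= n; i++) {
        if (memcmp(text + i, pat, m) != 0)
            continue;
        if (toy_char_byte(text, n, i) == TOY_SECOND)
            continue;               /* false hit: keep searching */
        return (long)i;
    }
    return -1;
}
```

Because the stand-in rescans from the start of the buffer on every check, a
text with many false hits pays for that scan repeatedly, which is the
overhead that makes the "boring" character-stepping approach attractive.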

I'd recommend that people pick up the latest Script Manager chapter, which
I believe was distributed with the Tech Notes some months back and is
probably available from APDA.  (Mark?  Is this true?)  It describes more
interesting and useful routines.

Also, for all of you people who've got it in your head to make the next
great compiler or scripting language or what have you, be sure to take a
look at the IntlTokenize() routine, which does much of the hard work for
you, in a Script Manager compatible way.  A fun routine to call; everyone
should run out and call it today.

p.s.  No, I'm not in the International Group here.

 Tom Dowdy                 Internet:  dowdy@apple.COM
 Apple Computer MS:27AJ    UUCP:      {sun,voder,amdahl,decwrl}!apple!dowdy
 20525 Mariani Ave         AppleLink: DOWDY1
 Cupertino, CA 95014       
 "The 'Ooh-Ah' Bird is so called because it lays square eggs."