Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!uwm.edu!bionet!apple!apple.com!dowdy
From: dowdy@apple.com (Tom Dowdy)
Newsgroups: comp.sys.mac.programmer
Subject: Re: Hashing....
Message-ID: <4434@internal.Apple.COM>
Date: 28 Sep 89 16:00:06 GMT
Sender: usenet@Apple.COM
Distribution: na
Organization: Apple Computer, Inc.
Lines: 57
References: <4345@internal.Apple.COM> <1728@neoucom.UUCP> <9690@chinet.chi.il.us>

In article <9690@chinet.chi.il.us> henry@chinet.chi.il.us (Henry C. Schmitt) writes:
> Since I am currently in a graduate course in algorithms and we are
> currently doing (you guessed it) string searching, I recommend the
> Boyer-Moore algorithm. It has the interesting property of working
> faster the longer the substring is!
>
> [ Interesting Runge-Kutta Hashing (it's a *joke*) algorithm deleted ]

Comment on the algorithm: it sure seems to call the len() function
quite a bit, which is a rather expensive operation on C strings (each
call has to walk the string to its terminator), especially when
searching long blocks. (This analysis may not be correct.)

> [ Henry then recommends a good book for us to read ]

I'd like to add in here that when adapting algorithms such as this
from CS books, Knuth or what have you, one should use the Script
Manager where applicable. Also, be wary of "clever" algorithms that
take advantage of assuming ASCII, or those that by their cleverness
introduce overhead when combined with the Script Manager.

Let us examine this algorithm in this light:

For example, string search and other similar operations should always
use ParseTable() or CharByte() when comparing and dividing characters.
It's not a good idea to assume one byte per char. In fact, one can't
even assume that a random run of bytes in the middle of a block is
aligned on "character" boundaries without calling CharByte() on the
first one to make sure.
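
To make the boundary concern concrete, here is a minimal sketch in C.
It is not Script Manager code: is_lead_byte(), on_char_boundary() and
find_on_boundary() are hypothetical names, and is_lead_byte() only
stands in for the question one would put to CharByte(), assuming a
simplified Shift-JIS-like encoding. A plain byte-wise scan takes the
place of Boyer-Moore to keep it short, and the lengths are measured
once up front rather than inside the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for asking the Script Manager about a byte.
   ASSUMPTION: a simplified Shift-JIS-like encoding in which bytes
   0x81-0x9F and 0xE0-0xFC begin a two-byte character. */
static int is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Returns 1 if `offset` is the start of a character (or the end of the
   text), 0 if it falls in the middle of a two-byte character. Walks
   from the start of the buffer, because a trail byte on its own can
   look like plain ASCII. */
static int on_char_boundary(const unsigned char *text, size_t len,
                            size_t offset)
{
    size_t i = 0;

    while (i < offset && i < len)
        i += (is_lead_byte(text[i]) && i + 1 < len) ? 2 : 1;
    return i == offset;
}

/* Byte-wise substring search that rejects a candidate match starting
   mid-character and keeps searching. Returns the byte offset of the
   first acceptable match, or -1 if there is none. */
static long find_on_boundary(const char *text, const char *pat)
{
    size_t tlen, plen, i;

    assert(text != NULL && pat != NULL);
    tlen = strlen(text);    /* measured once, not on every iteration */
    plen = strlen(pat);
    if (plen == 0 || plen > tlen)
        return -1;

    for (i = 0; i + plen <= tlen; i++) {
        if (memcmp(text + i, pat, plen) != 0)
            continue;       /* bytes differ, no candidate here */
        if (on_char_boundary((const unsigned char *)text, tlen, i))
            return (long)i; /* candidate starts on a real character */
        /* otherwise the "match" began on a trail byte: keep going */
    }
    return -1;
}
```

With the text bytes 0x83 0x41 0x41 (one two-byte character followed by
an "A"), searching for "A" skips the byte-level hit at offset 1, which
is really a trail byte, and returns 2. Rescanning from the start for
every candidate is wasteful; a real version would track the boundary
as it goes.
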
I believe a final sanity check with a call to CharByte(), made after
you "think" you've found the substring you want, is the only addition
needed for this particular algorithm. Of course, one would need to
implement it in such a way that if the sanity check failed, one would
continue the search.

I'll add that since CharByte() can end up doing some scanning of its
own (depending on which Script System is active), you might wish to
just use ParseTable() or CharByte() from the start and convert to a
more "boring" algorithm. Left as an exercise for the reader. (Gosh,
I've always wanted to say that!)

I'd recommend that people pick up the latest Script Manager chapter,
which I believe was distributed with the Tech Notes some months back
and is probably available from APDA. (Mark? Is this true?) It has
more interesting and useful routines.

Also, for all of you people who've got it in your head to make the
next great compiler or scripting language or what have you, be sure to
take a look at the IntlTokenize() routine, which does much of the hard
work for you, in a Script Manager compatible way. A fun routine to
call; everyone should run out and call it today.

p.s. No, I'm not in the International Group here.

  Tom Dowdy                 Internet:  dowdy@apple.COM
  Apple Computer MS:27AJ    UUCP:      {sun,voder,amdahl,decwrl}!apple!dowdy
  20525 Mariani Ave         AppleLink: DOWDY1
  Cupertino, CA 95014
  "The 'Ooh-Ah' Bird is so called because it lays square eggs."