Path: utzoo!utgpu!water!watmath!clyde!att!alberta!ubc-cs!uw-beaver!uw-june!uw-entropy!dataio!pilchuck!ssc!happym!polari!rlb From: rlb@polari.UUCP (rlb) Newsgroups: comp.lang.c Subject: Re: hash algorithm Summary: rehash of current discussion Message-ID: <813@polari.UUCP> Date: 19 Aug 88 18:20:46 GMT References: <2550078@hpisod2.HP.COM> <723@goofy.megatest.UUCP> Organization: Polarserv, Seattle WA Lines: 42 In article <723@goofy.megatest.UUCP>, djones@megatest.UUCP (Dave Jones) writes: > From article <2550078@hpisod2.HP.COM>, by decot@hpisod2.HP.COM (Dave Decot): > >> could anyone recommend a hashing algorithm to store a list of 10 > >> digit phone numbers (area code plus 7 digits). > > ... > > Treat the ten digits as an integer, n, and compute the index > > as (n % HASH_SIZE), where HASH_SIZE is the size of your hash table. > > At this point, no one has pointed out that modulo is a better hash function if the table size is a prime number. > > ... > > I have devised a method for hashing which has not been published previously. > It is very very fast. Here is the WORLD PREMIERE!! > > First let's settle the business of how to hash integers into the > range 0..M-1. I suggest that you use hash(I) = (I*32821) % M > > My hashing technique uses a table-size M which is always a power of two. > This makes the modulo-function (%) very fast on a two's compliment machine, Well, yes, but you're not counting the cost of multiplying by 32821 elsewhere, so, in reality, your way seems to be a wee bit slower than (key % prime). > ... long discussion/demonstration of rehashing technique ... > To summarize: You've created a hashing scheme. Forming the hash itself isn't faster than the simple (key % prime) algorithm. Re-hashing and deletions require more code than the simple "put it in the next empty slot" method. It requires more storage than the simple algorithm (I'm assuming it "breaks" if you let the table get more than half full, I didn't actually figure it out). The principal advantage seems to be that access time is flatter (not, flat, but flatter, and probably not much flatter) than the simple "put it in the next empty slot" method. If you're not going to let the table get more than half full, then (assuming a fairly successful hash) there's only about a 50% chance that you'll have to rehash. So, even with a bit of clumping, doesn't it seem likely that the simplest rehashing scheme compares quite favorably with the one you've created? Agree? Disagree?