Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!cornell!batcomputer!itsgw!steinmetz!uunet!mcvax!philmds!leo
From: leo@philmds.UUCP (Leo de Wit)
Newsgroups: comp.sources.wanted
Subject: Re: SOUNDEX routines wanted
Keywords: soundex
Message-ID: <538@philmds.UUCP>
Date: 29 Jun 88 11:00:46 GMT
References: <250@iconsys.UUCP>
Reply-To: leo@philmds.UUCP (L.J.M. de Wit)
Organization: Philips I&E DTS Eindhoven
Lines: 74

In article <250@iconsys.UUCP> ron@iconsys.UUCP (Ron Holt) writes:
>
>I am considering writing an interactive spell checker/corrector for
>Unix similar to that implemented in WordPerfect. I would like to try
>using Soundex for the spell corrector portion. Does any one know
>where I can get source code to any Soundex routines?

Soundex is in fact so easy you should write it yourself. Here's what I
read in an old Pascal exercise book (in Dutch, so I translated for you):

---------------- S T A R T   Q U O T A T I O N ------------------
All characters belong to a group, as follows (ignoring case)
    0: a,e,i,o,u,h,w,y
    1: b,f,p,v
    2: c,g,j,k,q,s,x,z
    3: d,t
    4: l
    5: m,n
    6: r

1) First replace each character from the string to be encoded by
   its group. So 'This is a testcase' becomes '300200200030232020'.
2) Then replace all repetitions by one occurrence. So the example
   becomes '30202030232020'.
3) Finally remove the '0's. So the example becomes '32232322'.
---------------- E N D   Q U O T A T I O N ------------------

Because there are only 7 groups (and group 0 never appears in the
result), a nibble (4 bits) is enough to encode a group number. If your
strcmp() does not ignore bit 7, you can use it for comparing encoded
soundex strings, otherwise use memcmp().

Below I put an implementation that should work (haven't tested it).
The class[] char array should contain a 0 in most places; only

    class['b'] = class['f'] = class['p'] = class['v'] =
    class['B'] = class['F'] = class['P'] = class['V'] = 1;

etc. for the other non-0 classes.
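As an illustration of the table and of the three quoted steps, here is a minimal sketch that fills the table at run time instead of with 256 initializers. The names class_of, set_group, init_classes and soundex_digits are made up for this example, and it writes one ASCII digit per group rather than packed nibbles so the result can be checked against the quotation. It assumes dest has room for strlen(src) + 1 bytes.

```c
#include <ctype.h>

/* Hypothetical table name; static storage starts out all zero,
   so every character is in group 0 until set_group() says otherwise. */
static char class_of[256];

/* Put each letter in `letters` (given in lower case) into `group`,
   for both its lower-case and upper-case form. */
static void set_group(const char *letters, char group)
{
    for (; *letters != '\0'; letters++) {
        class_of[(unsigned char)*letters] = group;
        class_of[toupper((unsigned char)*letters)] = group;
    }
}

/* Groups 1..6 exactly as listed in the quotation. */
static void init_classes(void)
{
    set_group("bfpv", 1);
    set_group("cgjkqsxz", 2);
    set_group("dt", 3);
    set_group("l", 4);
    set_group("mn", 5);
    set_group("r", 6);
}

/* Steps 1-3 in a single pass, one ASCII digit per surviving group. */
static void soundex_digits(const char *src, char *dest)
{
    char last = 0;
    for (; *src != '\0'; src++) {
        char cur = class_of[(unsigned char)*src];   /* step 1: look up group */
        if (cur != last) {                          /* step 2: skip repetitions */
            last = cur;
            if (cur != 0)                           /* step 3: drop group 0 */
                *dest++ = (char)('0' + cur);
        }
    }
    *dest = '\0';
}
```

After init_classes(), calling soundex_digits("This is a testcase", buf) should leave "32232322" in buf, the same result as step 3 of the quotation.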
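The encoded string is null-terminated to allow standard str... functions.

static char class[256] = {
    0   /* put correct (256) initializers here */
};

/* dest must have room for strlen(src)/2 + 2 bytes */
void soundex(const char *src, char *dest)
{
    char lastclass = 0, newclass;
    int even = 1;       /* 1: next group goes into the high nibble */

    for ( ; *src != '\0'; src++) {
        /* index as unsigned char so 8-bit characters cannot go negative */
        if ((newclass = class[(unsigned char)*src]) != lastclass) {
            lastclass = newclass;       /* a 0 in between ends a repetition */
            if (newclass != 0) {
                if (even) {
                    *dest = newclass << 4;      /* start a new byte */
                } else {
                    *dest++ |= newclass;        /* fill low nibble, advance */
                }
                even = !even;
            }
        }
    }
    if (!even) dest++;  /* keep a half-filled last byte */
    *dest = '\0';
}

Hope this will satisfy your need --- success!

     Leo.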
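To make the nibble packing concrete, here is a small sketch, separate from the routine above, that packs a string of group digits two per byte in the same order soundex() uses. The names pack_nibbles and same_code are made up for illustration; the digits are assumed to be '1' to '6' only, and dest is assumed to have room for strlen(digits)/2 + 2 bytes.

```c
#include <string.h>

/* Hypothetical helper: pack group digits ('1'..'6') two per byte,
   high nibble first, and null-terminate the result. */
static void pack_nibbles(const char *digits, char *dest)
{
    int even = 1;                       /* next digit goes in the high nibble */

    for (; *digits != '\0'; digits++) {
        char group = (char)(*digits - '0');
        if (even)
            *dest = (char)(group << 4); /* start a new byte */
        else
            *dest++ |= group;           /* fill the low nibble, advance */
        even = !even;
    }
    if (!even)                          /* odd count: keep the half-filled byte */
        dest++;
    *dest = '\0';
}

/* Hypothetical helper: two packed codes match when strcmp() says so. */
static int same_code(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

For the example from the quotation, "32232322" packs into the bytes 0x32 0x23 0x23 0x22 followed by the terminator. Every filled byte has a high nibble between 1 and 6, so no byte before the terminator is zero and bit 7 is never set.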