Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!cornell!batcomputer!itsgw!steinmetz!uunet!mcvax!philmds!leo
From: leo@philmds.UUCP (Leo de Wit)
Newsgroups: comp.sources.wanted
Subject: Re: SOUNDEX routines wanted
Keywords: soundex
Message-ID: <538@philmds.UUCP>
Date: 29 Jun 88 11:00:46 GMT
References: <250@iconsys.UUCP>
Reply-To: leo@philmds.UUCP (L.J.M. de Wit)
Organization: Philips I&E DTS Eindhoven
Lines: 74

In article <250@iconsys.UUCP> ron@iconsys.UUCP (Ron Holt) writes:
>
>I am considering writing an interactive spell checker/corrector for
>Unix similar to that implemented in WordPerfect.  I would like to try
>using Soundex for the spell corrector portion.  Does any one know
>where I can get source code to any Soundex routines?

Soundex is in fact so easy you should write it yourself. Here's what I
read in an old Pascal exercise book (in Dutch, so I translated for you):

---------------- S T A R T    Q U O T A T I O N ------------------

All characters belong to a group, as follows (ignoring case)

0:  a,e,i,o,u,h,w,y
1:  b,f,p,v
2:  c,g,j,k,q,s,x,z
3:  d,t
4:  l
5:  m,n
6:  r

1) First replace each character of the string to be encoded by its
group number. So 'This is a testcase' becomes '300200200030232020'.

2) Then collapse each run of repeated digits into one occurrence. So the
example becomes '30202030232020'.

3) Finally remove the '0's. So the example becomes '32232322'.

---------------- E N D   Q U O T A T I O N ------------------
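
The three steps can be folded into a single pass. What follows is only a
sketch, not taken from the book: the names group_of() and soundex_digits()
are made up for illustration, and it assumes that anything not listed in the
table (spaces, digits, punctuation) falls in group 0, as the example above
implies for spaces.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Group number of one character, following the table above.
   Assumption: characters not listed there belong to group 0. */
static int group_of(int c)
{
    c = tolower((unsigned char)c);
    if (c == '\0')                 return 0;
    if (strchr("bfpv", c))         return 1;
    if (strchr("cgjkqsxz", c))     return 2;
    if (strchr("dt", c))           return 3;
    if (c == 'l')                  return 4;
    if (strchr("mn", c))           return 5;
    if (c == 'r')                  return 6;
    return 0;
}

/* Hypothetical helper: writes the digit code of src into dest.
   dest must have room for strlen(src) + 1 bytes. */
void soundex_digits(const char *src, char *dest)
{
    int last = 0;   /* group of the previous character */

    assert(src != NULL && dest != NULL);
    for (; *src != '\0'; src++) {
        int g = group_of(*src);
        /* step 2: skip a repeat of the previous group;
           step 3: never emit group 0 */
        if (g != last && g != 0)
            *dest++ = (char)('0' + g);
        last = g;   /* a 0 in between resets the repeat check */
    }
    *dest = '\0';
}
```

Calling soundex_digits("This is a testcase", buf) with a large enough buf
leaves "32232322" in buf, the same result as the worked example.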

Because there are 7 groups (with only 6 of them surviving in the result) you
can use a nibble (4 bits) to encode the group number, packing two groups per
byte. Since no group number exceeds 6, bit 7 of a packed byte is never set and
no packed byte is zero, so strcmp() can be used for comparing encoded soundex
strings. Below I put an implementation that should work (haven't tested
it). The class[] char array should contain a 0 in most places, only
    class['b'] = class['f'] = class['p'] = class['v'] = 
    class['B'] = class['F'] = class['P'] = class['V'] = 1;
etc. for the other non-0 classes.
The encoded string is null-terminated to allow standard str...  functions.
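
If typing 256 initializers by hand is unappealing, the table could also be
filled at run time. A minimal sketch, assuming the table is sized 256 and left
zeroed by static initialization; class_table, set_group() and
init_class_table() are illustrative names, not part of the routine below.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for class[]; statics start out as all zeros. */
static char class_table[256];

/* Assign group g to every letter in letters, in both cases. */
static void set_group(const char *letters, char g)
{
    assert(letters != NULL);
    for (; *letters != '\0'; letters++) {
        unsigned char lower = (unsigned char)*letters;
        unsigned char upper = (unsigned char)toupper(lower);

        class_table[lower] = g;
        class_table[upper] = g;
    }
}

/* Fill in the six non-zero groups; call once before encoding. */
void init_class_table(void)
{
    set_group("bfpv",     1);
    set_group("cgjkqsxz", 2);
    set_group("dt",       3);
    set_group("l",        4);
    set_group("mn",       5);
    set_group("r",        6);
}
```

Indexing through unsigned char keeps the subscript in the range 0..255 even
on machines where plain char is signed.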

static char class[] = {

  /* put correct (256) initializers here */

};

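void soundex(src,dest)
char *src, *dest;
{
    char lastclass = 0, newclass;
    int even = 1;   /* 1: the next group goes into a high nibble */

    for ( ; *src != '\0'; src++) {
        /* cast so that chars >= 0x80 cannot yield a negative index */
        if ((newclass = class[(unsigned char)*src]) != lastclass) {
            lastclass = newclass;
            if (newclass != 0) {
                if (even) {
                    *dest = newclass << 4;      /* start a new byte */
                } else {
                    *dest++ |= newclass;        /* complete the byte */
                }
                even = !even;
            }
        }
    }
    if (!even) dest++;      /* keep a half-filled last byte */
    *dest = '\0';
}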
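
The packed result is not printable, so for debugging it may help to expand it
back into digits. The helpers below are a sketch only; soundex_unpack() and
soundex_equal() are made-up names. They rely on two properties of the packing
above: the high nibble of every byte is 1..6, and a low nibble of 0 can only
occur in the last byte of an odd-length code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expands a packed code into readable digits.
   out must have room for 2 * strlen(packed) + 1 bytes. */
void soundex_unpack(const char *packed, char *out)
{
    assert(packed != NULL && out != NULL);
    for (; *packed != '\0'; packed++) {
        unsigned char b = (unsigned char)*packed;

        *out++ = (char)('0' + (b >> 4));        /* high nibble */
        if ((b & 0x0F) != 0)                    /* 0 = unused half */
            *out++ = (char)('0' + (b & 0x0F));  /* low nibble */
    }
    *out = '\0';
}

/* Hypothetical helper: two packed codes match when the
   null-terminated byte strings are identical. */
int soundex_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

For instance, the packed bytes 0x32 0x23 0x23 0x22 expand to "32232322", and
0x35 0x60 expands to "356".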

Hope this satisfies your need  ---  good luck!

    Leo.