Xref: utzoo comp.misc:2778 comp.lang.c:11260
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!husc6!purdue!umd5!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.misc,comp.lang.c
Subject: Re: Soundex algorithm
Message-ID: <12410@mimsy.UUCP>
Date: 11 Jul 88 21:51:27 GMT
References: <2130@hubcap.UUCP> <12520@sunybcs.UUCP>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 27

[I have deleted groups comp.theory and comp.ai since Soundex has little
to do with these]

In article <12520@sunybcs.UUCP> stewart@sunybcs.uucp (Norman R. Stewart)
writes:
>2: Apply the following rules to produce a code of one letter and
>   three numbers.
>   A: The first letter of the word becomes the initial character
>      in the code.
>   B: When two or more letters from the same group occur together
>      only the first is coded.
>   C: If two letters from the same group are separated by an H or
>      a W, code only the first.
>   D: Group 7 letters are never coded (this does not include the
>      first letter in the word, which is always coded).

[I thought Soundex codes were usually fixed at four symbols.]

What if more than two letters from the same group are separated by H
or W?  For instance: FDHTWTHTWL.  Is this encoded as F334 or as F34?
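
To make the two readings concrete, here is a rough C sketch, not a
definitive Soundex.  Everything named in it (group_of, soundex_sketch,
the `chain' flag) is hypothetical.  The group table is the commonly
published one and is an assumption apart from L=4 and R=6; the sketch
also assumes that vowels break a run, that the first letter's group can
suppress the letter after it, and that the character set is ASCII.  Short
codes are left unpadded to match the notation above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumed group table (the commonly published one); only L=4 and R=6
 * are confirmed by the article.  0 means "never coded" (group 7 in the
 * quoted rules: vowels, H, W, Y).  Indexing by c - 'A' assumes ASCII. */
static int group_of(int c)
{
    /*                           ABCDEFGHIJKLMNOPQRSTUVWXYZ */
    static const char table[] = "01230120022455012623010202";

    c = toupper((unsigned char)c);
    if (c < 'A' || c > 'Z')
        return 0;
    return table[c - 'A'] - '0';
}

/* chain != 0: H and W are transparent however long the run is.
 * chain == 0: rule C drops only the second letter of a pair, after
 *             which the group is forgotten.
 * Writes at most one letter and three digits, with no zero padding. */
void soundex_sketch(const char *word, int chain, char out[5])
{
    size_t n = 0;
    int last, sep = 0;

    assert(word != NULL);
    if (word[0] == '\0') {
        out[0] = '\0';
        return;
    }
    out[n++] = (char)toupper((unsigned char)word[0]);
    last = group_of(word[0]);   /* assumption: first letter starts a run */

    for (word++; *word != '\0' && n < 4; word++) {
        int c = toupper((unsigned char)*word);
        int g = group_of(c);

        if (c == 'H' || c == 'W') {
            sep = 1;            /* note the separator, keep `last' */
            continue;
        }
        if (g == 0) {           /* vowel or non-letter: ends the run */
            last = 0;
            sep = 0;
            continue;
        }
        if (g == last) {        /* dropped by rule B or rule C */
            if (sep && !chain)
                last = 0;       /* pairwise reading: forget the group */
            sep = 0;
            continue;
        }
        out[n++] = (char)('0' + g);
        last = g;
        sep = 0;
    }
    out[n] = '\0';
}
```

With chain set, FDHTWTHTWL comes out as F34; with chain clear, it comes
out as F334.  Which of the two the rules intend is exactly the open
question.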

The table has L=4, R=6; I find this surprising, as both R and L are
semivowels and they are easily confused by those who did not grow up
with the distinction (e.g., some native speakers of East Asian
languages).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris