Xref: utzoo comp.misc:2778 comp.lang.c:11260
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!husc6!purdue!umd5!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.misc,comp.lang.c
Subject: Re: Soundex algorithm
Message-ID: <12410@mimsy.UUCP>
Date: 11 Jul 88 21:51:27 GMT
References: <2130@hubcap.UUCP> <12520@sunybcs.UUCP>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 27

[I have deleted groups comp.theory and comp.ai since Soundex has little
to do with these]

In article <12520@sunybcs.UUCP> stewart@sunybcs.uucp (Norman R. Stewart) writes:
>2: Apply the following rules to produce a code of one letter and
>   three numbers.
>   A: The first letter of the word becomes the initial character
>      in the code.
>   B: When two or more letters from the same group occur together
>      only the first is coded.
>   C: If two letters from the same group are separated by an H or
>      a W, code only the first.
>   D: Group 7 letters are never coded (this does not include the
>      first letter in the word, which is always coded).

[I thought Soundex codes were usually fixed at four symbols.]

What if more than two letters from the same group are separated by H
or W?  For instance: FDHTWTHTWL.  Is this encoded as F334 or as F34?

The table has L=4, R=6; I find this surprising, as both R and L are
semivowels and they are easily confused by those who did not grow up
with the distinction (e.g., some Orientals).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu	Path: uunet!mimsy!chris
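
To make the FDHTWTHTWL question concrete, here is a rough C sketch of
both readings of rule C. Everything in it is an assumption for
illustration, not anyone's reference implementation: the names group()
and soundex() and the `chain' flag are made up, and the letter table is
the usual Soundex one (the article above confirms only L=4, R=6, and
that group 7 is never coded). Codes are cut off at four symbols and are
not padded, so that a short result like F34 stays visible.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Group digit for an upper-case letter.  ASSUMPTION: the usual Soundex
 * table (BFPV=1, CGJKQSXZ=2, DT=3, L=4, MN=5, R=6); '7' marks the
 * letters that are never coded.  Indexing by c - 'A' assumes ASCII.
 */
static char group(int c)
{
    static const char tab[] = "71237127722455712623717272";
    return tab[c - 'A'];
}

/*
 * Encode word into out (room for 5 bytes).  Hypothetical flag `chain':
 *   nonzero: H and W are transparent, so a whole run of same-group
 *            letters joined by H/W collapses to a single digit;
 *   zero:    rule C is applied to pairs only, i.e. a letter is dropped
 *            only if the same-group letter before the H/W was coded.
 */
static void soundex(const char *word, int chain, char *out)
{
    size_t n = 0;
    char prev = 0;          /* group of the last letter in the run */
    int prev_coded = 0;     /* was that letter actually emitted? */
    int gap = 0;            /* H or W seen since that letter? */

    for (; *word != '\0'; word++) {
        int c = toupper((unsigned char)*word);
        char g;

        if (c < 'A' || c > 'Z')
            continue;                   /* ignore non-letters */
        g = group(c);
        if (n == 0) {                   /* rule A: keep first letter */
            out[n++] = (char)c;
            prev = g;
            prev_coded = 1;
            continue;
        }
        if (c == 'H' || c == 'W') {     /* separator for rule C */
            gap = 1;
            continue;
        }
        if (g == '7') {                 /* rule D; also ends any run */
            prev = 0;
            prev_coded = 0;
            gap = 0;
            continue;
        }
        if (g == prev && (!gap || chain || prev_coded)) {
            if (gap)
                prev_coded = 0;         /* dropped under rule C */
            gap = 0;
            continue;                   /* rule B or rule C */
        }
        if (n < 4)
            out[n++] = g;
        prev = g;
        prev_coded = 1;
        gap = 0;
    }
    out[n] = '\0';
}
```

With a `char buf[5]', soundex("FDHTWTHTWL", 1, buf) leaves "F34" in buf
and soundex("FDHTWTHTWL", 0, buf) leaves "F334", so the two readings
really do disagree on this input. Which one rule C intends is exactly
the open question; the sketch takes no side.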