Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site alice.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!alice!trickey
From: trickey@alice.UucP (Howard Trickey)
Newsgroups: net.internat
Subject: Re: Hyphenation
Message-ID: <4546@alice.UUCP>
Date: Sun, 10-Nov-85 09:17:41 EST
Article-I.D.: alice.4546
Posted: Sun Nov 10 09:17:41 1985
Date-Received: Mon, 11-Nov-85 05:42:18 EST
References: <471@harvard.ARPA> <773@mmintl.UUCP>, <1861@watdcsu.UUCP>
Organization: Bell Labs, Murray Hill
Lines: 17

> Yes, and none of them are any good.

The hyphenation algorithm invented by Frank Liang and incorporated in TeX
is good.  It is essentially a way of converting a hyphenated word list
(from a dictionary, but with all forms of all words) into a
list of "patterns".  You can set parameters to trade off table size
against the percentage of hyphens found and the error rate.
The standard TeX table takes about 20kbyte and finds 86.7% of the hyphens
in an inflected Webster's Pocket Dictionary (and all of the hyphens in
the 676 most common words), with no wrong hyphens.  With about 2kbyte
you could find 35.2% of the hyphens, again with no errors.
(Please note that this algorithm is in TeX82, not the original TeX.)
See "Word Hy-phen-a-tion by Com-put-er" by Frank Liang (Ph.D. thesis),
Stanford CS Dept. tech report STAN-CS-83-977 for details.
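
For anyone who wants a feel for how the patterns get applied, here is a
rough sketch in Python.  It is not the TeX code, just my reading of the
scheme: digits in a pattern score the gaps between letters, the highest
score at each gap wins, and an odd score allows a break.  The handful of
patterns in PATTERNS is a toy set picked for the one word shown, not the
real 20kbyte table, and the names parse_pattern, hyphenate, left_min and
right_min are made up for this example.

```python
import re

# Toy pattern set, for illustration only (NOT the full TeX table).
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na",
            "n2at", "1tio", "2io", "o2n"]

def parse_pattern(pat):
    """Split 'hy3ph' into its letters 'hyph' and gap scores [0, 0, 3, 0, 0]."""
    letters = re.sub(r"\d", "", pat)
    values = [0] * (len(letters) + 1)
    pos = 0  # number of letters seen so far = index of the current gap
    for ch in pat:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values

def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the pieces of `word` between permitted hyphens.

    left_min / right_min (assumed defaults) are the fewest letters
    allowed before the first break and after the last one.
    """
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."   # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)    # scores[j] = gap before padded[j]
    # Try every substring; keep the maximum score seen at each gap.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values:
                for k, v in enumerate(values):
                    scores[start + k] = max(scores[start + k], v)
    # An odd score permits a break; an even score forbids one.
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:      # gap before word[i]
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces

print(hyphenate("hyphenation", PATTERNS))   # ['hy', 'phen', 'ation']
```

Note that with this toy set the break in "a-tion" is missed, which is the
same kind of trade-off as above: a missed hyphen is tolerated, a wrong
one is not.  A real implementation would keep the patterns in a packed
trie rather than a dictionary; the dict here is only to keep the sketch
short.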

Several groups have built French hyphenation tables using this algorithm
and found that the tables are typically much smaller than English ones.