Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site brl-tgr.ARPA
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!think!harvard!seismo!brl-tgr!tgr!bilbo.jbrown@ucla-locus.ARPA
From: bilbo.jbrown@ucla-locus.ARPA (Jordan Brown)
Newsgroups: net.unix
Subject: International Unix
Message-ID: <2400@brl-tgr.ARPA>
Date: Thu, 24-Oct-85 15:57:31 EDT
Article-I.D.: brl-tgr.2400
Posted: Thu Oct 24 15:57:31 1985
Date-Received: Sat, 26-Oct-85 04:10:12 EDT
Sender: news@brl-tgr.ARPA
Lines: 33

A couple of notes on the message from Erik Fair (ucbvax!fair):

Unfortunately, you CAN'T build a good international character set. Some of those silly European countries have the same character in several languages, but sort the character in different places in each language. They also have interesting constructs like characters that sort as two characters, and pairs of characters that sort as single characters. That is, there might be a character @ which sorts as "xy", so that @m sorts right after xylophone and before xyn. Similarly, they sometimes say that the pair ll sorts as a single character; I don't remember where.

Character set is not (or should not be) a very basic assumption. Aren't there EBCDIC UNIXes out there? Most of the system is (should be) completely independent of the character set. The only place you should have problems will be programs which make assumptions about arithmetic on characters, or about the range of values characters take on. (Note that C promises that all characters are non-negative (this is not to say that all possible values of a char variable are non-negative, however).)

What characters does the kernel (for instance) know and care about? Slash (/), Null (\0), and maybe Dot (.) in the main body of the kernel; a few control characters in the tty drivers. No big deal.

There will be work, but it shouldn't be too bad. Much more grunt work is involved in isolating the messages for translation.
People writing code commercially should keep this in mind. Keep your messages in a separate module, or better yet in an external file. Try to make the code flexible about exactly how long messages are; the length will vary dramatically when you translate the message, and English is usually the most terse language.

Wouldn't it be easier to convince the Europeans to speak English? :-)
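
For what it's worth, here is a rough sketch in C of what "keep your messages in a separate module" and "be flexible about length" might look like. Everything in it is made up for illustration: the names msg_id, lang, catalog, msg_text and msg_format are hypothetical, the German strings are my own approximate translations (with "oe" standing in for o-umlaut), and it assumes a C99 snprintf that reports the needed length. A real system would more likely load the strings from an external file.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical message IDs: the rest of the program uses only these,
   never the message text itself. */
enum msg_id { MSG_CANNOT_OPEN, MSG_DONE, MSG_COUNT };

/* Hypothetical language selector. */
enum lang { LANG_EN, LANG_DE, LANG_COUNT };

/* All translatable text lives in this one table.  The German entries
   are illustrative translations, not taken from any real product. */
static const char *const catalog[LANG_COUNT][MSG_COUNT] = {
    [LANG_EN] = { "cannot open %s", "done" },
    [LANG_DE] = { "%s kann nicht geoeffnet werden", "fertig" },
};

/* Look up the raw text (a printf-style format) for a message. */
const char *msg_text(enum lang l, enum msg_id id)
{
    if ((unsigned)l >= LANG_COUNT || (unsigned)id >= MSG_COUNT)
        return "(unknown message)";
    return catalog[l][id];
}

/* Format a message that takes at most one string argument.
   The buffer is sized from the translated text instead of a fixed
   guess, so a longer translation cannot overflow it.
   Returns a malloc'd string the caller must free, or NULL on failure. */
char *msg_format(enum lang l, enum msg_id id, const char *arg)
{
    const char *fmt = msg_text(l, id);
    int n = snprintf(NULL, 0, fmt, arg);   /* measure only */
    char *buf;

    if (n < 0)
        return NULL;
    buf = malloc((size_t)n + 1);
    if (buf == NULL)
        return NULL;
    snprintf(buf, (size_t)n + 1, fmt, arg);
    return buf;
}
```

A caller would write something like msg_format(LANG_DE, MSG_CANNOT_OPEN, "foo.c"), print the result, and free it. Note that the argument lands at the front of the German message and at the end of the English one, which is one more reason not to build messages by concatenating fragments in the code.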