Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site calma.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!think!harvard!cmcl2!seismo!lll-crg!ucdavis!ucbvax!decvax!decwrl!sun!calma!radzy
From: radzy@calma.UUCP (Tim Radzykewycz)
Newsgroups: net.unix
Subject: Re: International Unix
Message-ID: <45@calma.UUCP>
Date: Thu, 31-Oct-85 15:22:35 EST
Article-I.D.: calma.45
Posted: Thu Oct 31 15:22:35 1985
Date-Received: Sun, 3-Nov-85 09:10:56 EST
References: <2400@brl-tgr.ARPA> <2344@ukma.UUCP>
Reply-To: radzy@calma.UUCP (Tim Radzykewycz)
Organization: GE/Calma Co., R&D Systems Engineering, Milpitas, CA
Lines: 65

In article <2344@ukma.UUCP> sambo@ukma.UUCP (Father of micro-ln) writes:
>In article <2400@brl-tgr.ARPA> bilbo.jbrown@ucla-locus.ARPA (Jordan Brown) writes:
>>Unfortunately, you CAN'T build a good international character set.
>>Some of those silly European countries have the same character in
>>several languages, but sort the character in different places in each
>>language. They also have interesting constructs like characters that
>>sort as two characters, and pairs of characters that sort as single
>>characters. That is, there might be a character @ which sorts as "xy",
>>so that @m sorts right after xylophone and before xyn. Similarly, they
>>sometimes say that the pair ll sorts as a single character; I don't
>>remember where.
>I guess I would like to see some examples of the above. Are you saying
>that in some language, the order of the letters might be "a b c ...",
>whereas in some other language, the order might be "a c b ..."? What
>pair of languages is like this? Also, in which language is some single
>character considered as two characters?

Basically, yes. That's the general idea. If you go through your
archives for net.nlang for about the last 3 or 4 weeks, you can get
about 6 examples of alphabets, at least two of which have "letters out
of sequence".
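
The two constructs Jordan describes are now usually called "expansions" (one character that sorts as two, like his hypothetical @ sorting as "xy") and "contractions" (two characters that sort as one, like ll). What follows is a minimal Python sketch of a per-language sort key that handles both. It is an illustration under stated assumptions, not any real locale: the tables, the "ll between l and m" ordering, and the names make_key, english, spanish and at_sign are all hypothetical.

```python
# Sketch of per-language collation keys with "expansions" (one character
# that sorts as two) and "contractions" (two characters that sort as one).
# The tables below are hypothetical illustrations, not a real locale.

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def make_key(order, expansions=None, contractions=()):
    """Build a sort-key function for a hypothetical collation.

    order        -- collation elements in sort order (may include "ll")
    expansions   -- maps a character to the string it sorts as
    contractions -- multi-character sequences that sort as one element
    """
    expansions = expansions or {}
    rank = {elem: i for i, elem in enumerate(order)}
    # Try longer contractions before shorter ones.
    multi = sorted(contractions, key=len, reverse=True)

    def key(word):
        # Apply expansions first: "@m" becomes "xym".
        text = "".join(expansions.get(ch, ch) for ch in word.lower())
        ranks = []
        i = 0
        while i < len(text):
            for seq in multi:
                if text.startswith(seq, i):
                    ranks.append(rank[seq])  # contraction: one element
                    i += len(seq)
                    break
            else:
                ranks.append(rank[text[i]])  # ordinary single letter
                i += 1
        return ranks

    return key


english = make_key(list(ALPHABET))
# Hypothetical ordering with "ll" as its own letter between l and m.
spanish = make_key(list(ALPHABET[:12]) + ["ll"] + list(ALPHABET[12:]),
                   contractions=("ll",))
# Hypothetical character "@" that sorts as "xy", as in the quoted article.
at_sign = make_key(list(ALPHABET), expansions={"@": "xy"})

if __name__ == "__main__":
    words = ["luz", "llama", "lobo"]
    print(sorted(words, key=english))  # ['llama', 'lobo', 'luz']
    print(sorted(words, key=spanish))  # ['lobo', 'luz', 'llama']
    print(sorted(["xyn", "@m", "xylophone"], key=at_sign))
    # ['xylophone', '@m', 'xyn']
```

The same three words come out in a different order under the two keys, which is the whole difficulty with a single built-in ordering. Characters missing from a table raise KeyError in this sketch; a real collation would need some fallback.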

One other way of looking at this (let's see how far ahead of myself I
can get) is to think of the reasons for an international character set:

   1. consistent sorting
   2. consistent pred/succ operations
   3. no special characters in one language that are printable chars
      in another

Well, reason 2 says we can't have gaps in the letters for *any*
language. Reason 3 says languages with smaller alphabets can't use the
extra chars. Reason 1 says everything has to be in order.

So let's take a look at 3 character sets (English, Spanish and German):

   a b c d e f g h i j k l m n o p q r s t u v w x y z     <- English
   a b c d e f g h i j k l ll m n o p q r s t u v w x y z  <- Spanish
   a b c d e f g h i j k l m n o p q r s ß t u v w x y z   <- German

(Pardon me if any of this is wrong, but at least it makes the point,
even if it *is* wrong.)

So the codes for the letters (E:m-z, S:ll-z, G:ß-z) all disagree from
one language to the next, and we're still on the Latin alphabet (how
about Cyrillic?).

Aside: I strongly recommend that anyone seriously interested in
international [issues|unix] read net.nlang. It is not too difficult to
cull the garbage from it and read only the relevant articles, such as
the ones I mentioned above. Please send flames to /dev/null and
discussions to me or the net.

>I speak Spanish and some French. Without thinking very much, something
>like the double "l" (which at least in Honduras is pronounced the same
>as a "y") would need to be treated as a single character, but written
>out as two characters. The problem is in capitalizing it. There need
>to be two forms for the uppercase double "l": "LL" and "Ll". This would
>mean that there would be two different codes for the uppercase double
>"l". Again, without thinking very much, this is the same situation as
>with vowels, since they may have an accent.

I assume this is all an argument in support of the original article;
however, I don't think that was clear the way it was written.
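
The three-alphabet comparison can be checked mechanically. The Python sketch below numbers each alphabet consecutively (what reasons 1 and 2 would demand of a single character set) and reports which letters end up with different codes. It takes the article's simplified alphabets as given, including the German sharp s, ß, placed between s and t; standard German dictionary order usually sorts ß as if it were "ss", so treat the German list as an assumption. The helper names codes, succ and differs_from are hypothetical.

```python
# Number each of the article's three alphabets consecutively, as a
# character set with consistent sorting and gap-free pred/succ would have
# to, and see where the codes disagree. The alphabets are the article's
# simplified lists, not any real standard.

ENGLISH = "a b c d e f g h i j k l m n o p q r s t u v w x y z".split()
SPANISH = "a b c d e f g h i j k l ll m n o p q r s t u v w x y z".split()
GERMAN = "a b c d e f g h i j k l m n o p q r s ß t u v w x y z".split()


def codes(alphabet):
    """Map each letter to its consecutive code (its position)."""
    return {letter: i for i, letter in enumerate(alphabet)}


def succ(alphabet, letter):
    """Return the letter after `letter`, or None at the end."""
    i = alphabet.index(letter)
    return alphabet[i + 1] if i + 1 < len(alphabet) else None


def differs_from(base, other):
    """Letters of `other` whose code differs from (or is missing in) `base`."""
    base_codes = codes(base)
    return [letter for i, letter in enumerate(other)
            if base_codes.get(letter) != i]


if __name__ == "__main__":
    print(codes(ENGLISH)["m"], codes(SPANISH)["m"], codes(GERMAN)["m"])  # 12 13 12
    print(succ(ENGLISH, "l"), succ(SPANISH, "l"))  # m ll
    print(differs_from(ENGLISH, SPANISH)[:3])  # ['ll', 'm', 'n']
    print(differs_from(ENGLISH, GERMAN)[:3])   # ['ß', 't', 'u']
    # The two uppercase forms of the Spanish "ll" element.
    print("ll".upper(), "ll".title())  # LL Ll
```

Spanish parts company with English at ll and German at ß, and everything after that point shifts, so no one consecutive numbering satisfies all three.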
-- 
Tim (radzy) Radzykewycz, The Incredible Radical Cabbage
        calma!radzy@ucbvax.ARPA
        {ucbvax,sun,csd-gould}!calma!radzy