Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site mmintl.UUCP
Path: utzoo!linus!philabs!pwa-b!mmintl!franka
From: franka@mmintl.UUCP (Frank Adams)
Newsgroups: net.internat
Subject: Re: What do we REALLY want?
Message-ID: <762@mmintl.UUCP>
Date: Sun, 3-Nov-85 21:33:09 EST
Article-I.D.: mmintl.762
Posted: Sun Nov 3 21:33:09 1985
Date-Received: Tue, 5-Nov-85 07:42:09 EST
References: <723@inset.UUCP> <960@erix.UUCP> <1569@hammer.UUCP> <6066@utzoo.UUCP> <224@l5.uucp> <988@erix.UUCP>
Reply-To: franka@mmintl.UUCP (Frank Adams)
Organization: Multimate International, E. Hartford, CT
Lines: 61

In article <988@erix.UUCP> robert@erix.UUCP (Robert Virding) writes:
>In article <224@l5.uucp> gnu@l5.uucp (John Gilmore) writes:
>>I think the proposals are that a coding scheme for text be defined which
>>allows 16-bit characters to be escape-coded into an 8-bit text stream.
>>The arguments mostly center on what kind of coding scheme would fit both
>>the needs of few-16-bit-char folks and few-8-bit-char folks without wasting
>>too much storage for either.
>
>Wow, this sounds like trying to convert ITS-Emacs' 9-bit ascii into
>7-bit sequences, but 7 bits worse. Talk about breaking existing
>programs. And who is to say that the *english* alphabet should be in
>the 8-bit set?

I think you miss the point here. Certainly the 8-bit code should support
the basic Roman alphabet and reasonable extensions to it. This will cover
all the European languages except Greek and those using the Cyrillic
alphabet. (What to do about those, as well as Arabic, is not obvious.)
What is not included is the Japanese and Chinese ideographs, which do not
fit in an 8-bit code even by themselves. Doubling the size of all text
files is just not a viable option.

Let me make a more concrete proposal for a standard (although still pretty
vague). One needs an escape character from an 8-bit ASCII code. The
obvious choice for this is decimal 255 (hex FF).
Following the escape byte would be a byte identifying the function.
Functions include:

* The following two bytes are a 16-bit character.
* Change into 16-bit mode.
* Specify the alphabet to be used for subsequent characters (e.g., Greek,
  Cyrillic, Arabic).

The same two-byte sequences can be used as escapes from the 16-bit mode.
Thus, if 01 is the function code for the Roman alphabet, the 16-bit
"character" FF01 would mean "drop into 8-bit mode, using the Roman
alphabet".

This would mean two bytes of overhead per file for documents using a
different alphabet. I do not think this is an unacceptable overhead.

Now, this would leave the default to be the Roman alphabet. This is
de facto discriminatory, but the reasons for it are not. The costs of
converting to a non-upward-compatible format are large. (The costs of
converting to an upward-compatible format are large enough that it will
be a problem.)

>This sounds a little like "we
>have to accept that the rest of the world may like to use their own
>language, but if we english speakers are going to
>have to change anything their sakes".

Yeah, it does sound a bit like that. And there are people who feel that
way. But there are also good economic reasons for finding an upward
compatible solution. And regardless of the reasons, if you don't make
it easy for the English speakers to adopt the standard, they won't, and
the effort will fail, or at best be much less successful than it could
have been for many years.

I think success in this endeavor is much more important than keeping to
any absolute standards of fairness. (Absolute is a key word in that
sentence. Some minimum of fairness is what this is all about.)

Frank Adams                           ihpn4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108
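
To make the escape scheme proposed above a little more concrete, here is a
minimal sketch of a reader for such a stream, written in Python. It is only
an illustration under stated assumptions, not a worked-out standard. The
escape byte FF and the "if 01 is the Roman alphabet" example come from the
proposal; every other function code, the big-endian byte order for 16-bit
characters, and the name decode() are made up for the sketch.

```python
ESC = 0xFF  # escape byte, as proposed (decimal 255)

# Function codes. Only 0x01 = Roman is taken from the proposal's example;
# all the other values below are hypothetical.
ALPHABETS = {0x01: "roman", 0x02: "greek", 0x03: "cyrillic"}
FN_WIDE_CHAR = 0x80  # hypothetical: the next two bytes are one 16-bit character
FN_WIDE_MODE = 0x81  # hypothetical: change into 16-bit mode


def decode(data: bytes):
    """Return a list of (kind, code) pairs.

    kind is an alphabet name for 8-bit characters, or "wide" for 16-bit ones.
    16-bit characters are assumed to be stored high byte first.
    """
    out = []
    alphabet = "roman"  # the default alphabet is Roman
    wide = False        # a stream starts in 8-bit mode
    i, n = 0, len(data)
    while i < n:
        if data[i] == ESC:
            # In 16-bit mode this is the "character" FFxx; in 8-bit mode it
            # is the escape byte followed by a function byte. Same two bytes.
            if i + 1 >= n:
                raise ValueError("truncated escape sequence")
            fn = data[i + 1]
            i += 2
            if fn in ALPHABETS:
                alphabet = ALPHABETS[fn]
                wide = False  # selecting an alphabet drops into 8-bit mode
            elif fn == FN_WIDE_MODE:
                wide = True
            elif fn == FN_WIDE_CHAR:
                if i + 2 > n:
                    raise ValueError("truncated 16-bit character")
                out.append(("wide", (data[i] << 8) | data[i + 1]))
                i += 2
            else:
                raise ValueError(f"unknown function code {fn:#04x}")
        elif wide:
            if i + 2 > n:
                raise ValueError("truncated 16-bit character")
            out.append(("wide", (data[i] << 8) | data[i + 1]))
            i += 2
        else:
            out.append((alphabet, data[i]))
            i += 1
    return out
```

Plain 8-bit text passes through untouched, which is the upward-compatible
part: decode(b"Hi") gives [("roman", 0x48), ("roman", 0x69)]. A document in
another alphabet pays the two bytes of overhead once, at the front. One
consequence of the sketch is that FF can never be an 8-bit character, and no
16-bit character can have FF as its high byte.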