Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site mmintl.UUCP
Path: utzoo!linus!philabs!pwa-b!mmintl!franka
From: franka@mmintl.UUCP (Frank Adams)
Newsgroups: net.internat
Subject: Re: What do we REALLY want?
Message-ID: <762@mmintl.UUCP>
Date: Sun, 3-Nov-85 21:33:09 EST
Article-I.D.: mmintl.762
Posted: Sun Nov  3 21:33:09 1985
Date-Received: Tue, 5-Nov-85 07:42:09 EST
References: <723@inset.UUCP> <960@erix.UUCP> <1569@hammer.UUCP> <6066@utzoo.UUCP> <224@l5.uucp> <988@erix.UUCP>
Reply-To: franka@mmintl.UUCP (Frank Adams)
Organization: Multimate International, E. Hartford, CT
Lines: 61

In article <988@erix.UUCP> robert@erix.UUCP (Robert Virding) writes:
>In article <224@l5.uucp> gnu@l5.uucp (John Gilmore) writes:
>>I think the proposals are that a coding scheme for text be defined which
>>allows 16-bit characters to be escape-coded into an 8-bit text stream.
>>The arguments mostly center on what kind of coding scheme would fit both
>>the needs of few-16-bit-char folks and few-8-bit-char folks without wasting
>>too much storage for either.
>
>Wow, this sounds like trying to convert ITS-Emacs' 9-bit ascii into
>7-bit sequences, but 7 bits worse.  Talk about breaking existing
>programs.  And who is to say that the *english* alphabet should be in
>the 8-bit set?

I think you miss the point here.  Certainly the 8-bit code should support
the basic Roman alphabet and reasonable extensions to it.  This will cover
all the European languages except Greek and those using the Cyrillic
alphabet.  (What to do about those, as well as Arabic, is not obvious.)
What is not included are the Japanese and Chinese ideographs, which cannot
fit in an 8-bit code by themselves.  Doubling the size of all text files
is simply not a viable option.

Let me make a more concrete proposal for a standard (although still a pretty
vague one).  One needs an escape character from an 8-bit ASCII code.  The
obvious choice for this is decimal 255 (hex FF).  Following the escape byte
would be a byte identifying the function.  Functions include:

* The following two bytes are a 16-bit character.

* Change into 16-bit mode.

* Specify the alphabet to be used for subsequent characters (e.g., Greek,
Cyrillic, or Arabic).

The same two-byte sequences can be used as escapes from the 16-bit mode.
Thus, if 01 is the function code for the Roman alphabet, the 16-bit
"character" FF01 would mean "drop into 8-bit mode, using the Roman alphabet".

This would mean two bytes of overhead per file for documents using a
different alphabet.  I do not think this is an unacceptable overhead.
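
For instance, using the illustrative codes in the sketch above, a file
written entirely in Greek would simply begin with the two bytes FF 04 and
carry ordinary one-byte text from there on; a Japanese document would begin
with FF followed by the enter-16-bit-mode code and pay two bytes per
character thereafter.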

Now, this would leave the Roman alphabet as the default.  This is de facto
discriminatory, but the reasons for it are not.  The cost of converting to
a non-upward-compatible format is large.  (The cost of converting to an
upward-compatible format is large enough that it will be a problem.)

>This sounds a little like "we
>have to accept that the rest of the world may like to use their own
>language, but not if we english speakers are going to
>have to change anything for their sakes".

Yeah, it does sound a bit like that.  And there are people who feel that way.
But there are also good economic reasons for finding an upward-compatible
solution.  And regardless of the reasons, if you don't make it easy for
English speakers to adopt the standard, they won't, and the effort will fail,
or at best be much less successful for many years than it could have been.
I think success in this endeavor is much more important than keeping to any
absolute standards of fairness.  (Absolute is a key word in that sentence.
Some minimum of fairness is what this is all about.)

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108