Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site fortune.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!fortune!mats
From: mats@fortune.UUCP (Mats Wichmann)
Newsgroups: net.internat
Subject: Re: What do we REALLY want?
Message-ID: <5762@fortune.UUCP>
Date: Fri, 8-Nov-85 13:22:17 EST
Article-I.D.: fortune.5762
Posted: Fri Nov  8 13:22:17 1985
Date-Received: Sun, 10-Nov-85 09:31:01 EST
References: 
Reply-To: mats@fortune.UUCP (Mats Wichmann)
Organization: Fortune Systems, Redwood City, CA
Lines: 43

Okay, I don't know whether anyone has posted this already; news
arrives very sporadically here, so I may have missed it. In any case,
there is an ISO standard for "code extension techniques" (ISO 2022)
which is supposed to address these wonderful issues.  It starts
from 7-bit ASCII (very important, because the extensions use the
8th bit...).
There are two ways to shift character sets: "single shift" and
"locking shift". A single shift is like pressing the SHIFT or
CONTROL key on your terminal - it has to be done for each character.
A locking shift puts you into a different mode until an unlock
sequence comes along.
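
To make the difference concrete, here is a rough sketch in Python
(my own toy model, not text from the standard). SO and SI are the
usual ASCII Shift Out / Shift In control codes, used here as the
locking shifts; 0x8e is the SS2 value quoted later in this article.
The set names "G0".."G2" and the output format are assumptions made
purely for illustration.

```python
# Toy model of the two shift styles. The set names and the output
# format are illustrative assumptions, not part of any proposal.
SO, SI = 0x0E, 0x0F   # locking shifts: Shift Out / Shift In
SS2 = 0x8E            # single shift (value quoted in this article)

def tag_codes(data):
    """Return (set_name, code) for every code that is not a shift."""
    locked = "G0"     # set chosen by the most recent locking shift
    pending = None    # set chosen by a single shift, for one code only
    out = []
    for code in data:
        if code == SO:
            locked = "G1"          # stays in effect until SI
        elif code == SI:
            locked = "G0"
        elif code == SS2:
            pending = "G2"         # applies to the next code only
        else:
            out.append((pending or locked, code))
            pending = None         # a single shift is used up at once
    return out
```

In this model, two codes after SO are both tagged "G1", while only
the one code right after SS2 is tagged "G2".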

The AT&T internationalization proposal is based on this idea,
but uses only single-shift, and basically follows these two rules:

1. If the high-order bit of an 8-bit byte is turned off, the 8-bit
   sequence comes from an ASCII character set.

2. If the high-order bit is turned on, the 8-bit sequence is non-ASCII
   and should be interpreted as belonging to one of the three local
   character sets. The exact character set it belongs to depends
   on the internal coding method and whether it was preceded by
   a single-shift character.
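
The two rules boil down to a test on the top bit of each byte. A
minimal sketch in Python, assuming nothing beyond the rules as stated
above; the function names and the "ascii"/"local" labels are mine and
purely hypothetical:

```python
from itertools import groupby

def is_ascii_byte(b):
    """Rule 1: the high-order bit is off, so the byte is ASCII."""
    return (b & 0x80) == 0

def split_runs(data):
    """Split a byte string into runs of ASCII and non-ASCII bytes.

    The labels are hypothetical; deciding WHICH local set a non-ASCII
    run belongs to (rule 2) needs more context than this looks at.
    """
    runs = []
    for ascii_run, group in groupby(data, key=is_ascii_byte):
        label = "ascii" if ascii_run else "local"
        runs.append((label, bytes(group)))
    return runs
```

For example, split_runs(b"ab\xc1\xc2c") separates the two high-bit
bytes from the ASCII text on either side of them.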

There will be special "single-shift characters" which signal that
a one- or two-byte sequence from an alternate set follows (the two
magic cookies which select this would be "SS2" = 0x8e and
"SS3" = 0x8f). The above is a major condensation, and only
represents the proposal as I understand it.
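
As I read it, a decoder for this scheme could look roughly like the
Python sketch below. Only the SS2 and SS3 values come from the
description above. The set numbering (0 for ASCII, 1 for unshifted
high-bit bytes, 2 and 3 after SS2 and SS3), the caller-supplied
widths table, and the requirement that every payload byte has its
high bit on are all my assumptions, since the byte count per set
depends on the internal coding method.

```python
SS2, SS3 = 0x8E, 0x8F   # single-shift values quoted in the article

def tokenize(data, widths):
    """Split bytes into (set_number, payload) pairs.

    widths is a hypothetical table giving bytes per character for
    sets 1..3, e.g. {1: 2, 2: 1, 3: 2}.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                 # rule 1: high bit off means ASCII
            out.append((0, data[i:i + 1]))
            i += 1
            continue
        if b == SS2:                 # rule 2: a shift picks set 2 or 3
            cs, i = 2, i + 1
        elif b == SS3:
            cs, i = 3, i + 1
        else:
            cs = 1                   # no shift: the default local set
        n = widths[cs]
        chunk = data[i:i + n]
        # Assumption: payload bytes always have the high bit on.
        if len(chunk) < n or any(c < 0x80 for c in chunk):
            raise ValueError("truncated or malformed sequence at %d" % i)
        out.append((cs, chunk))
        i += n
    return out
```

Note that the shift applies to exactly one character and then the
decoder falls back to set 1, which is what makes it a single shift.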

The reference document is: "Information Processing - ISO 7-bit
and 8-bit Coded Character Sets - Code Extension Techniques",
ISO 2022-1982(E).

I am relatively new to this game, so if anyone has sensible
objections to this scheme, I would love to be educated.

This sort of suggestion does not, of course, tackle issues
like sorting at all; it merely suggests how to represent
the data, not what you can do with it.

Mats Wichmann
Fortune Systems
{ihnp4,hplabs,dual}!fortune!mats