Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.3 alpha 5/22/85; site cbosgd.UUCP
Path: utzoo!watmath!clyde!cbosgd!mark
From: mark@cbosgd.UUCP (Mark Horton)
Newsgroups: net.internat
Subject: Re: What do we REALLY want?
Message-ID: <1594@cbosgd.UUCP>
Date: Sun, 10-Nov-85 20:01:25 EST
Article-I.D.: cbosgd.1594
Posted: Sun Nov 10 20:01:25 1985
Date-Received: Mon, 11-Nov-85 06:14:29 EST
References: <723@inset.UUCP> <960@erix.UUCP> <1569@hammer.UUCP>
Organization: AT&T Bell Laboratories, Columbus, Oh
Lines: 44

The Japanese Kanji character set can be input in the same phonetic
way as was described for Chinese.  You type in 2 or 3 Roman letters
that sound like the syllable you want, and they turn into the (unique)
Katakana glyph for that syllable.  You do this for every syllable in
the word and then press a special key, and something consults a (big)
table and finds all the Kanji glyphs that sound like that.  It puts up
a menu, which often has 2-6 choices, on an extra line at the bottom of
the terminal.  You pick one and it goes up on the screen.
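
To make the steps concrete, here is a rough sketch in Python.  It is
not how any real terminal does it: both tables are tiny made-up
stand-ins for the real (big) ones, and the names ROMAN_TO_KATAKANA,
CANDIDATES, to_katakana and menu are mine, chosen for illustration.

```python
# Toy sketch of the phonetic input scheme: Roman letters -> Katakana -> menu.
# Both tables are hypothetical and far smaller than a real one would be.

ROMAN_TO_KATAKANA = {
    "ka": "カ", "n": "ン", "ji": "ジ", "ki": "キ", "sha": "シャ",
}

# Katakana reading -> Kanji words that share that sound.
CANDIDATES = {
    "カンジ": ["漢字", "幹事", "監事"],
    "キシャ": ["汽車", "記者", "貴社"],
}


def to_katakana(roman):
    """Convert Roman letters to Katakana, trying the longest chunk first."""
    out = []
    i = 0
    while i < len(roman):
        for size in (3, 2, 1):
            chunk = roman[i:i + size]
            if chunk in ROMAN_TO_KATAKANA:
                out.append(ROMAN_TO_KATAKANA[chunk])
                i += len(chunk)
                break
        else:
            # No table entry matches at this position.
            raise ValueError("no syllable starts at %r" % roman[i:])
    return "".join(out)


def menu(roman):
    """Return the choices to show on the bottom line of the terminal."""
    kana = to_katakana(roman)
    # Fall back to the Katakana itself when the table has no Kanji for it.
    return CANDIDATES.get(kana, [kana])
```

Calling menu("kanji") turns the letters into the three Katakana
syllables and then looks them up, giving the short list the user
would pick from.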

I'm told there are about 60000 Kanji characters, and a few tens of
thousands more Chinese characters.  (I can't remember the exact numbers.)
However, a subset that fits in 14 bits (at most 16384 characters) is in
common use, and the Japanese are willing to restrict themselves to this
subset.

There are apparently already official standards for encoding Kanji
in 16 bits, intermixed with ASCII.  It seems that you take the 14
bits and put them in two bytes, 7 bits per byte, each byte with the
8th (parity) bit on.  Two consecutive bytes with the 8th bit on mean
a Kanji character.  A single byte with the 8th bit on might have a
different international meaning.
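
As I understand the scheme, it could be sketched like this in Python.
The split into a high and a low 7 bits is my assumption; a real
standard may well assign the 14 bits differently.  The function names
and the "single" label for a lone 8-bit byte are made up for the
example.

```python
def encode_kanji(code):
    """Pack a 14-bit code into two bytes, each with the 8th bit on."""
    if not 0 <= code < (1 << 14):
        raise ValueError("code does not fit in 14 bits")
    high = 0x80 | (code >> 7)     # top 7 bits (assumed layout)
    low = 0x80 | (code & 0x7F)    # bottom 7 bits
    return bytes([high, low])


def decode_stream(data):
    """Split a byte string into ("ascii", byte), ("kanji", code) and
    ("single", byte) items, reading left to right."""
    items = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # 8th bit off: plain ASCII.
            items.append(("ascii", b))
            i += 1
        elif i + 1 < len(data) and data[i + 1] >= 0x80:
            # Two consecutive bytes with the 8th bit on: one Kanji character.
            code = ((b & 0x7F) << 7) | (data[i + 1] & 0x7F)
            items.append(("kanji", code))
            i += 2
        else:
            # A lone byte with the 8th bit on: some other meaning.
            items.append(("single", b))
            i += 1
    return items
```

For instance, decode_stream(b"A" + encode_kanji(1234)) yields one
ASCII item followed by one Kanji item carrying the code 1234.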

This doesn't break tail or grep: every byte of a Kanji character has
the 8th bit on, so none of them can be mistaken for a newline or any
other ASCII character.  I don't know what they do if there are two
European characters in a row, but I gather there is some standard way
of dealing with this.  The only mode needed is attached to the
keyboard, so it can tell if you're typing in Roman or Katakana.
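
Here is a small Python sketch of why byte-oriented tools keep working.
The grep and tail functions below are simplified stand-ins I wrote for
the example, not the real programs, and the sample bytes are
arbitrary values with the 8th bit on.

```python
def grep(pattern, data):
    """Return the lines of data (bytes) that contain pattern (bytes)."""
    return [line for line in data.split(b"\n") if pattern in line]


def tail(data, count):
    """Return the last count lines of data (bytes)."""
    if count <= 0:
        return []
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()               # drop the empty piece after the final newline
    return lines[-count:]


# Three lines mixing ASCII with two-byte characters.  Note the byte 0x8A,
# which is a newline (0x0A) with the 8th bit on: it does not split a line.
TEXT = b"name: \xb0\xa1\xc6\xfc\nsize: 10\nnote: \x8a\xc1 ok\n"
```

With this data, grep(b"size", TEXT) finds only the second line and
tail(TEXT, 1) returns the third line whole.  An ASCII pattern can
never match part of a Kanji character here; a Kanji pattern could in
principle match across a character boundary, which this sketch does
not guard against.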

By the way, I've seen several references to a function "printw" with
an assumption that this would be a 16 bit printf.  I'd like to point
out that the name "printw" has already been taken by curses, which is
present in both 4BSD and System V.  (printw means "print window.")
I'm not even convinced that such a function is needed, since the
existing standards seem oriented toward streams of 8 bit bytes.
I don't think stdio cares whether a character is Kanji or Roman;
that's between the application and the terminal.  Regular old printf
works fine.
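
The same point can be sketched in Python, using printf-style
formatting on byte strings as a stand-in for C's printf.  The
print_record and demo names are hypothetical, and the two-byte values
are arbitrary bytes with the 8th bit on, not real character codes.

```python
import io


def print_record(stream, label, count):
    """Write "label: count" to a byte stream, printf style."""
    # The formatting never looks inside label; it just copies the bytes.
    stream.write(b"%s: %d\n" % (label, count))


def demo():
    """Format one Roman label and one two-byte label the same way."""
    out = io.BytesIO()
    print_record(out, b"files", 3)
    print_record(out, b"\xb0\xa1\xc6\xfc", 3)
    return out.getvalue()
```

The bytes come out exactly as they went in, so it is up to the
terminal to show each pair as one glyph.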

	Mark Horton

P.S. Everybody agrees that this group should exist and should be
distributed worldwide, but the name "net.internat" is terrible.  Let's
settle the issue of whether it's to be moderated (I understand we have
a volunteer to be the moderator) and then call it either net.international
or mod.international.