Xref: utzoo comp.sys.ibm.pc:16768 comp.binaries.ibm.pc.d:478 comp.emacs:3711
Path: utzoo!attcan!uunet!mcvax!unido!infbs!neitzel
From: neitzel@infbs.UUCP (Martin Neitzel)
Newsgroups: comp.sys.ibm.pc,comp.binaries.ibm.pc.d,comp.emacs
Subject: Re: US PC programmers still live in a 7-bit world!
Message-ID: <920@infbs.UUCP>
Date: 27 Jun 88 23:59:48 GMT
References: <1988Jun22.223158.1366@LTH.Se> <126@dcs.UUCP>
Reply-To: neitzel@infbs.UUCP (Martin Neitzel)
Organization: TU Braunschweig,Informatik,West Germany
Lines: 120

(Here comes yet another one of those European bastards... :-)

torsten@DNA.LTH.Se (Torsten Olsson) writes:
TO>	US PC programmers still live in a 7-bit world!
TO>	[...]

In article <126@dcs.UUCP> wnp@dcs.UUCP (Wolf N. Paul) replied:
WNP>	[...]
WNP>	The C functions toupper()/tolower() rely on upper and lower case to be
WNP>	two parallel groups of consecutive codes within the ASCII scheme.

Basically right.  But isn't one of the reasons for those
macros/functions to make code independent of the character set in
use, or am I completely wrong?  (I know, UNIX(tm) requires us to
write something like "if (isascii(c) && islower(c)) ...", but that
should be relatively easy to port to an 8-bit environment.)  On the
other hand, if someone thinks:  "Hey, 7 bits in my char for the ascii
code, now let's see what I can mess around with in the 8th!" -- that
is neither portable nor justified by K&R.
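
To make the difference concrete, here is a minimal sketch of my own
(not code from any particular system): the portable idiom first, the
8th-bit abuse below it.

	#include <ctype.h>

	/* The portable idiom: classify only genuine ASCII, exactly
	 * as the UNIX manuals ask for.  (isascii() is the
	 * traditional UNIX macro; it is not in K&R itself.)
	 */
	int is_lower_ascii(c)
	int c;
	{
		return isascii(c) && islower(c);
	}

	/* The unportable habit: grabbing the 8th bit of a char as a
	 * private flag.  Nothing in K&R sanctions this, and it
	 * clobbers every 8-bit character code.
	 */
	#define MARK(c)		((c) | 0x80)	/* DON'T */
	#define UNMARK(c)	((c) & 0x7f)	/* DON'T */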

[This is the point where I should mention that discussions about
 our European character sets usually get somewhat emotionally
 heated (probably because we have always had to "suffer" from
 ASCII, in a way, here in Europe).
 None of the following replies is intended to attack Wolf Paul
 personally.  But some points should be made clear.
]

WNP>	The way	IBM has chosen to implement non-English characters on its
WNP>	PC line is
WNP>	(a) non-standard (i.e. applies only to IBM-compatible machines) and

Then perhaps we can move to ISO Latin-X.  That would still confront us
with the same problems:  non-ASCII, 8-bit.  (For more on the standards
issue, see below.)

WNP>	(b) incompatible with the assumption about upper and lower case in ASCII
WNP>	    and thus in C and other programming languages.

K&R did not make any assumptions about a required character set,
except that it must contain at least one element with a positive
value.  So much for "C & ASCII".

WNP>	(c) incompatible with the way European characters are implemented
WNP>	    on MOST printers and ANSI terminals.

Perhaps I should explain what "this way" was:  The ascii characters
like []{}\~ were considered "not so useful for Europeans", and their
codes were reinterpreted as national characters.  That was a HACK,
not a solution!  Yes, use of those ascii characters and the national
characters was mutually exclusive.  You could not have both of them
in one document, listen?
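
For concreteness, here is -- from memory, so take the details with a
grain of salt -- the German 7-bit variant (DIN 66003); other
countries made analogous substitutions:

	code	ASCII	German variant
	0x5B	[	A-umlaut
	0x5C	\	O-umlaut
	0x5D	]	U-umlaut
	0x7B	{	a-umlaut
	0x7C	|	o-umlaut
	0x7D	}	u-umlaut
	0x7E	~	sharp s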

[btw, that's one of the reasons why I write my programs in English,
 and not in my native language (which is the language the people who
 want to read my programs here would prefer to read).]

Wolf is right: IBM did not constrain itself to any standard when
introducing their PCs.  But they made a first step in a reasonable
direction:
(1) keeping 0-127 ASCII, and
(2) providing most of the Europeans with "their" characters.

They certainly broke no previous standard.  The previous practice
explained above was not a standard.  As I said, it was a hack, and
nobody before had ever cared to find a real solution.


WNP>	The way IBM implemented it, all case functions would have to be
WNP>	table-driven, which is much less elegant than working with the
WNP>	parallel ranges of characters in standard ASCII.

Why is the table-driven approach "much less elegant"?  A syntax
table a la GNU Emacs is easy to implement, costs little memory,
simplifies and speeds up the code, offers classifications beyond
"letter/upper/lower" in a simple way, and is easy to re-configure at
run time; a small sketch follows below.  (I think that for an editor
a table-driven implementation is the only way to go.  But let's keep
to the 8-bit issue.  [Btw, the character classification problem is
one of Jon Bentley's primary examples of data-driven programming in
his "Programming Pearls".])
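
Here is what I mean, in a minimal sketch (the class names and the
sample 8-bit entry are my own invention, not GNU's actual tables):

	/* One classification byte per character code -- all 256 of
	 * them, so 8-bit national characters classify exactly as
	 * cheaply as ASCII ones.
	 */
	#define C_UPPER	0x01
	#define C_LOWER	0x02
	#define C_DIGIT	0x04
	#define C_SPACE	0x08

	static unsigned char ctbl[256];

	void init_ctbl()
	{
		int c;

		for (c = 'A'; c <= 'Z'; c++) ctbl[c] |= C_UPPER;
		for (c = 'a'; c <= 'z'; c++) ctbl[c] |= C_LOWER;
		for (c = '0'; c <= '9'; c++) ctbl[c] |= C_DIGIT;
		ctbl[' '] = ctbl['\t'] = ctbl['\n'] = C_SPACE;

		/* Re-configuring at run time is just a table store,
		 * e.g. for the IBM PC code of a-umlaut:
		 */
		ctbl[0x84] |= C_LOWER;
	}

	#define IS_LOWER(c)	(ctbl[(unsigned char)(c)] & C_LOWER)

Looking a character up is one array index and one mask, whatever its
code is.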

WNP>	So all of you Europeans should lobby hardware manufacturers to
WNP>	implement foreign characters in an intelligent way, and in a
WNP>	STANDARD WAY across different architectures, and THEN you can
WNP>	reasonably expect the authors of compilers and libraries and
WNP>	tools to support these characters. 

The Intelligent Way will be some ascii extended to eight bits.

It may be hard for Americans to accept that ASCII is not
sufficient as an international code for information interchange
(and information processing).  A change has to be made, and that
may hurt one or the other, resulting in emotionally heated
reactions on the American side.  After all, ASCII has proven
useful for some decades now, and large systems depend on it,
including UNIX(tm).

Authors of compilers, libraries, tools, and programming languages
have begun to at least consider foreign character sets.  ANSI C
is the current popular example.  While their trigraphs are, in my
opinion, just a superfluous wart on a wart, their concept of
"locales" is just the thing we all think of.  Your locale for
you, mine for me.
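
A sketch of how that is supposed to look, going by the current draft
(locale names, and whether your library actually has tables behind
them, are system-dependent; the IBM PC code is only my example):

	#include <locale.h>
	#include <ctype.h>
	#include <stdio.h>

	int main()
	{
		int c = 0x84;	/* a-umlaut in the IBM PC set */

		/* Ask for the user's native locale instead of the
		 * default "C" locale; after this, islower() and
		 * friends are supposed to know the national
		 * characters, too.
		 */
		setlocale(LC_CTYPE, "");

		printf("%d\n", islower(c)); /* non-zero in a
					       fitting locale */
		return 0;
	}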

In the meantime, please re-read Torsten Olsson's article.  All he
asks for is to respect all bits in a character variable as
pertaining to the character code itself.  Don't mess around with
the eighth, ninth, or whatever bit of it, please.
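
Two classic ways of messing with it, in a small sketch (the ISO
Latin-1 code is just an example):

	#include <stdio.h>

	int main()
	{
		char c = (char)0xE4;	/* a-umlaut in ISO Latin-1 */
		int  code;

		/* Sin #1: "stripping parity" with c &= 0x7f would
		 * turn the umlaut into a plain 'd'.
		 */

		/* Sin #2: on machines where plain char is signed, c
		 * is negative here; used directly as a table index
		 * or ctype argument, it misbehaves.  Go through
		 * unsigned char instead:
		 */
		code = (unsigned char)c;

		printf("%d\n", code);	/* 228: all 8 bits intact */
		return 0;
	}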

We Europeans don't want to prohibit you Americans from writing
programs using the ASCII code, don't get me wrong.  If we want to
extend your programs for our needs, we will probably manage that,
as we have until now.  Don't break your heads over writing sorting
routines for non-ASCII character sets.
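
In fact the draft ANSI library already provides the hook: strcoll()
compares strings by the collating sequence of the current locale.  A
sketch, assuming your system ships any locale tables at all ("\204"
is only my stand-in for the IBM PC a-umlaut):

	#include <string.h>
	#include <locale.h>
	#include <stdio.h>

	int main()
	{
		setlocale(LC_COLLATE, "");

		/* strcoll() compares by the locale's collating
		 * sequence; strcmp() compares raw codes.  In a
		 * German locale the first line prints 1 (a-umlaut
		 * sorts with 'a', before 'z'), the second prints 0
		 * (code 0x84 is numerically above 'z').
		 */
		printf("%d\n", strcoll("\204pfel", "zucker") < 0);
		printf("%d\n", strcmp("\204pfel", "zucker") < 0);
		return 0;
	}
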
Thank you.

							Martin

[PS: I have to admit it: I am frightened of the day I have to write
programs respecting Japanese/Chinese character sets, too.  Those
poor fellows suffer from ASCII the most, I think.]