Xref: utzoo comp.sys.ibm.pc:16768 comp.binaries.ibm.pc.d:478 comp.emacs:3711
Path: utzoo!attcan!uunet!mcvax!unido!infbs!neitzel
From: neitzel@infbs.UUCP (Martin Neitzel)
Newsgroups: comp.sys.ibm.pc,comp.binaries.ibm.pc.d,comp.emacs
Subject: Re: US PC programmers still live in a 7-bit world!
Message-ID: <920@infbs.UUCP>
Date: 27 Jun 88 23:59:48 GMT
References: <1988Jun22.223158.1366@LTH.Se> <126@dcs.UUCP>
Reply-To: neitzel@infbs.UUCP (Martin Neitzel)
Organization: TU Braunschweig,Informatik,West Germany
Lines: 120

(Here comes yet another one of those European bastards... :-)

torsten@DNA.LTH.Se (Torsten Olsson) writes:

TO> US PC programmers still live in a 7-bit world!
TO> [...]

In article <126@dcs.UUCP> wnp@dcs.UUCP (Wolf N. Paul) replied:

WNP> [...]
WNP> The C functions toupper()/tolower() rely on upper and lower case to be
WNP> two parallel groups of consecutive codes within the ASCII scheme.

Basically right. But one of the reasons for those macros/functions is to
make programs independent of the character set in use, or am I completely
wrong? (I know, UNIX(tm) requires us to write something like
"if (isascii(c) && islower(c)) ...", but that should be relatively easy
to port to an 8-bit environment; a small sketch follows below.) On the
other hand, if someone thinks "Hey, 7 bits in my char for the ASCII code,
now let's see what I can mess around with in the 8th!" -- that is neither
portable nor justified by K&R.

[This is the point where I should mention that discussions about our
European character sets are usually somewhat emotionally heated, probably
because we have always had to "suffer" from ASCII here in Europe. None of
the following replies is intended to attack Wolf Paul personally. But
some points should be made clear.]

WNP> The way IBM has chosen to implement non-English characters on its
WNP> PC line is
WNP> (a) non-standard (i.e. applies only to IBM-compatible machines) and

Then perhaps we can move to ISO Latin-X. That would still confront us
with the same problems: non-ASCII, 8-bit. (For more on the standards
issue, see below.)

WNP> (b) incompatible with the assumption about upper and lower case in ASCII
WNP> and thus in C and other programming languages.

K&R did not make any assumptions about a required character set, except
that its characters must have positive values. So much for "C & ASCII".

WNP> (c) incompatible with the way European characters are implemented
WNP> on MOST printers and ANSI terminals.

Perhaps I should explain what "this way" was: ASCII characters like
[]{}\~ were considered "not so useful for Europeans", and their codes
were re-interpreted as national characters. That was a HACK, not a
solution! Usage of those ASCII characters and of the national characters
was mutually exclusive: you could not have both in one document. [Btw,
that is one of the reasons why I write in English when programming, and
not in my native language (the language the people who will read my
programs here would actually prefer).]

Wolf is right: IBM did not constrain itself to any standard when
introducing its PCs. But it made a first step in a reasonable direction:
(1) keeping codes 0-127 ASCII, and (2) providing most Europeans with
"their" characters. IBM certainly broke no previous standard, because the
previous practice explained above was not a standard. As I said, it was a
hack, and nobody before had cared about finding a real solution.
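[Here is the small sketch announced above: the non-portable shortcut
versus the guarded idiom, in C. The function names are mine, not from
any library; in an 8-bit library, the isascii() test would simply give
way to one accepting all 256 codes.]

    #include <ctype.h>

    /*
     * Non-portable: assumes 'a'..'z' are consecutive codes and
     * that c is a 7-bit lower-case letter to begin with.
     */
    int
    upcase_hack(c)
    int c;
    {
            return c - 'a' + 'A';
    }

    /*
     * Portable: let the library decide what a lower-case letter
     * is.  The mask keeps all 8 bits and avoids handing a
     * sign-extended (negative) value to the ctype macros.
     */
    int
    upcase(c)
    int c;
    {
            c &= 0377;
            return (isascii(c) && islower(c)) ? toupper(c) : c;
    }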
WNP> The way IBM implemented it, all case functions would have to be
WNP> table-driven, which is much less elegant than working with the
WNP> parallel ranges of characters in standard ASCII.

Why is the table-driven approach "much less elegant"? A syntax table a la
GNU's is easy to implement, costs little memory, simplifies and speeds up
the code, offers classifications beyond "letter/upper/lower" in a simple
way, and is easy to re-configure at run time. (I think that for an
editor, a table-driven implementation is the only way to go. But let's
stick to the 8-bit issue. [Btw, the character classification problem is
one of Jon Bentley's primary examples of data-driven programming in his
"Programming Pearls".]) A small sketch of such a table follows at the end
of this article.

WNP> So all of you Europeans should lobby hardware manufacturers to
WNP> implement foreign characters in an intelligent way, and in a
WNP> STANDARD WAY across different architectures, and THEN you can
WNP> reasonably expect the authors of compilers and libraries and
WNP> tools to support these characters.

The Intelligent Way will be some ASCII extended to eight bits. It may be
hard for Americans to accept that ASCII is not sufficient as an
international code for information interchange (and information
processing). A change has to be made, and that may hurt one or the other,
resulting in emotionally heated reactions on the American side. At least
ASCII has proven useful for some decades now, and large systems depend on
it, including UNIX(tm).

Authors of compilers, libraries, tools, and programming languages have
begun to at least consider foreign character sets. ANSI C is the current
popular example. While its trigraphs are just a superfluous wart on a
wart in my opinion, its concept of "locales" is exactly the thing we are
all thinking of: yours for you, mine for me. (A second sketch at the very
end shows the idea.)

In the meantime, please re-read Torsten Olsson's article. All he asks for
is to respect all bits in a character variable as pertaining to the
character code itself. Don't mess around with the eighth, ninth, or
whatever bit of it, please.

We Europeans don't want to prohibit you Americans from writing programs
using the ASCII code, don't get me wrong. If we want to extend your
programs for our needs, we will probably manage that, as we have until
now. Don't rack your brains over sorting routines for non-ASCII character
sets. Thank you.

							Martin

[PS: I have to admit I am frightened of the day I will have to write
programs respecting Japanese/Chinese character sets, too. Those poor
fellows suffer from ASCII the most, I think.]
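[The promised sketch of such a table, assuming the IBM PC character set
(code 0x81 = u-umlaut, 0x9A = U-umlaut); the names and flag values are
mine, not from any existing library:]

    #define L_UPPER 01
    #define L_LOWER 02
    #define L_DIGIT 04

    static char ctable[256];        /* one flag byte per 8-bit code */

    /*
     * Mark whatever the local character set calls a letter.  The
     * two IBM PC codes are examples only; re-configuring at run
     * time is nothing but another pass over loops like these.
     */
    void
    init_ctable()
    {
            int c;

            for (c = 'A'; c <= 'Z'; c++) ctable[c] |= L_UPPER;
            for (c = 'a'; c <= 'z'; c++) ctable[c] |= L_LOWER;
            for (c = '0'; c <= '9'; c++) ctable[c] |= L_DIGIT;
            ctable[0x81] |= L_LOWER;        /* IBM PC u-umlaut */
            ctable[0x9A] |= L_UPPER;        /* IBM PC U-umlaut */
    }

    #define is_lower(c)     (ctable[(c) & 0377] & L_LOWER)
    #define is_upper(c)     (ctable[(c) & 0377] & L_UPPER)

[The whole table is 256 bytes, and a classification is one index plus one
mask -- no range comparisons, no special cases in the code.]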
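[And a minimal sketch of the "locales" mechanism, in (draft) ANSI C.
Whether a native locale is actually installed depends on your library, so
take it as an illustration only; 0xFC is the u-umlaut of ISO Latin-1,
which I assume here:]

    #include <locale.h>
    #include <ctype.h>
    #include <stdio.h>

    int main(void)
    {
            /*
             * "" selects the user's native locale from the
             * environment; the "C" locale stays the default.
             */
            if (setlocale(LC_CTYPE, "") == NULL)
                    printf("no native locale here, staying with \"C\"\n");

            /*
             * After the call, islower() is supposed to know the
             * full 8-bit set of the chosen locale.
             */
            printf("code 0xFC %s a lower-case letter here\n",
                    islower(0xFC) ? "is" : "is not");
            return 0;
    }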