Xref: utzoo comp.sys.ibm.pc:16795 comp.binaries.ibm.pc.d:481 comp.emacs:3717
Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!killer!wnp
From: wnp@killer.UUCP (Wolf Paul)
Newsgroups: comp.sys.ibm.pc,comp.binaries.ibm.pc.d,comp.emacs
Subject: Re: US PC programmers still live in a 7-bit world!
Message-ID: <4635@killer.UUCP>
Date: 28 Jun 88 20:33:41 GMT
References: <1988Jun22.223158.1366@LTH.Se> <126@dcs.UUCP> <920@infbs.UUCP>
Reply-To: wnp@killer.UUCP (Wolf Paul)
Organization: The Unix(R) Connection BBS, Dallas, Tx
Lines: 169

In article <920@infbs.UUCP> neitzel@infbs.UUCP (Martin Neitzel) writes:
>(Here comes yet another one of those European bastards... :-)

Well, we're not that bad!

>torsten@DNA.LTH.Se (Torsten Olsson) writes:
>TO> US PC programmers still live in a 7-bit world!
>TO> [...]

>In article <126@dcs.UUCP> wnp@dcs.UUCP (Wolf N. Paul) replied:
>WNP> [...]
>WNP> The C functions toupper()/tolower() rely on upper and lower case to be
>WNP> two parallel groups of consecutive codes within the ASCII scheme.

>Basically Right.  But one of the reasons for those macros/functions
>is to get independent of the used character set, or am I completely
>wrong?  (I know, the UNIX(tm) requires us to write something
>like "if (isascii(c) && islower(c) ...", but that should be relatively
>easy to port into an 8-bit environment.  On the other hand, if someone
>thinks: "Hey, 7 bits in my char for the ascii code, now let's see what
>I can mess around with the 8th!" -- that's neither portable nor justified
>by K&R.

I fully agree with you there. My point is that if we are going to use
the 8-bit character set, and if, like IBM, we call it "extended ASCII",
let's extend it consistently.

In standard ASCII, uppercase alphabetics are codes 65-90 (Hex 41-5A),
and lowercase alphabetics are codes 97-122 (Hex 61-7A).
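As a quick check of those two ranges, here is a minimal C sketch. The
helper names are made up for illustration, and the code assumes plain
7-bit ASCII letters only; it is not how any particular C library
implements toupper()/tolower().

```c
#include <assert.h>

/* Hypothetical helper: ASCII-only lower-casing by adding Hex 20.
   Codes outside 65-90 (Hex 41-5A) are returned unchanged. */
int ascii_to_lower(int c)
{
    return (c >= 0x41 && c <= 0x5A) ? c + 0x20 : c;
}

/* Hypothetical helper: ASCII-only upper-casing by subtracting Hex 20.
   Codes outside 97-122 (Hex 61-7A) are returned unchanged. */
int ascii_to_upper(int c)
{
    return (c >= 0x61 && c <= 0x7A) ? c - 0x20 : c;
}

/* Returns 1 if each of the 26 uppercase codes maps onto the matching
   lowercase code and back again, i.e. the two ranges run in parallel. */
int ascii_ranges_are_parallel(void)
{
    int i;

    for (i = 0; i < 26; i++) {
        if (ascii_to_lower(65 + i) != 97 + i)
            return 0;
        if (ascii_to_upper(97 + i) != 65 + i)
            return 0;
    }
    return 1;
}
```

Calling ascii_to_lower('A') on an ASCII machine gives 'a', and anything
that is not a letter, such as '1', passes through untouched.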
Thus, by adding 32 (Hex 20) to an uppercase character, I can convert it
to lower case, and by subtracting the same amount from a lowercase
character, I can convert it to upper case. If IBM's character set were
really "extended ASCII", this would work for the non-English, 8-bit
characters as well. It doesn't ...

Of course, IBM is not the only one to blame. When I worked with Apple //
computers, they would sell a computer in Germany or Austria which could
be switched between German and English, but if you moved that computer
to France, or even to Switzerland, you'd be stuck with some common
characters which your computer didn't produce. They used the eighth bit
as a "flashing" attribute.

> None of the following replies are intended to attack Wolf Paul
> personally.  But some points should be made clear.

I don't feel attacked. I'm European myself.

>
>WNP> The way IBM has chosen to implement non-English characters on its
>WNP> PC line is
>WNP> (a) non-standard (i.e. applies only to IBM-compatible machines) and
>
>Then perhaps we can move to ISO Latin-X.  That would still confront us
>with the same problems: non-ASCII, 8-bit.  (For more on the standards
>issue, see below.)

I'd be very interested if you or someone else could send me a table of
ISO Latin-X.

>WNP> (b) incompatible with the assumption about upper and lower case in ASCII
>WNP> and thus in C and other programming languages.
>
>K&R did not make any assumptions about required character sets, except
>there must be at least one positive element contained.  So far regarding
>"C & ASCII".

But in practice most compiler writers assume ASCII, probably because
computer manufacturers claim that their machines support ASCII --
albeit "extended".

>WNP> (c) incompatible with the way European characters are implemented
>WNP> on MOST printers and ANSI terminals.
>
>Perhaps I should explain what "this way" was: The ascii characters like
>[]{}\~ were considered as "not so useful for Europeans" and their codes
>were interpreted as national characters.  That was a HACK, not a
>solution!  Yes, usage of those ascii characters and national characters
>was mutually exclusive.  Not "both of them" in one document, listen?

I agree with that assessment.

>[btw, that's one of the reasons why I write in English when programming,
> and not in my native language (and the language the people that want to
> read my programs here would like to read.)]

One reason I avoid my native language (same as yours) when dealing with
computers and programming is the unbelievable kludges which seem to have
entered it. I can't bring myself to talk or write about the "Fietschers"
of a system. (I picked that one out of "Chip" a couple of years ago.)

>Wolf is right: IBM did not constrain itself to any standard when
>introducing their PCs.  But they made a first step into a reasonable
>direction:
>(1) Keeping 0-127 ASCII, and
>(2) Providing most of the Europeans with "their" characters.
>
>They certainly broke no previous standard.  The previous practice
>as explained above was not a standard.  As I said, it was a hack, and
>nobody ever before did care about finding a solution.

But they missed the mark when they failed to group upper and lower case
characters together in a manner similar to standard ASCII.

>WNP> The way IBM implemented it, all case functions would have to be
>WNP> table-driven, which is much less elegant than working with the
>WNP> parallel ranges of characters in standard ASCII.
>
>Why is the table-driven approach "much less elegant"?  A syntax-table
>ala GNU's is easy to implement, costs little memory, simplifies and speeds
>up the code, offers classifications beyond "letter/upper/lower" in a
>simple way, is easy to re-configure at run time.  (I think, for an editor
>a table-driven implementation is the only way to go.  But let's keep to
>the 8-bit issue.
>[Btw, the character classification problem is
>one of Jon Bentley's primary examples for data-driven programming
>in his "Programming Pearls".])

Because of the harebrained way IBM assigned characters to the eight-bit
positions, you would actually need two tables -- one to go from upper to
lower case, and another vice versa. THAT's inelegant.

>The Intelligent Way will be some ascii extended to eight bits.
>
>It may be hard to accept for Americans that ASCII is not
>sufficient as an international code for information interchange
>(and information processing).  A change has to be made and that
>may hurt one or the other, resulting in emotionally heated
>reactions on the American side.  At least ASCII has been proven
>to be useful for some decades now, and large systems depend on
>it, including UNIX(tm).
>
>Authors of compilers, libraries, tools, and programming languages
>have begun to at least consider foreign character sets.  ANSI C
>is the current popular example.  While their trigraphs are just
>a superfluous wart on a wart in my opinion, their concept of
>"locales" is just the thing we all think of.  Yours for
>you, mine for me.

I basically agree, but with today's terminals and printers, you need a
different ctype.h for your screen and for your printer.

>
>In the meantime, please re-read Torsten Olsson's article.  All he
>asks for is to respect all bits in a character variable as
>pertaining to the character code itself.  Don't mess around with
>the eighth, ninth, or whatever bit of it, please.

With reference to C compilers, I don't think there's any funny business
going on with the eighth bit. They just don't support the IBM
non-English characters as alphabetics, because they don't fit into the
ASCII scheme, even if one extended it to eight bits.

>We Europeans don't want to prohibit you Americans to write
>programs using the ASCII code anymore, don't get me wrong.  If
>we want to extend your programs for our needs we will probably
>manage that as we did until now.
>Don't break your head with
>writing sorting routines for non-ASCII character sets.

Now that would truly have to be localized on a per-language basis,
because different languages using the same character shapes don't
necessarily sort them in the same sequence. So we won't bother with it
over here -- I'll wait until I get back to Europe, and then see which
part I've landed in before I mess with sorting :-).

>[PS: I have to admit: I am frightened of the day I have to write
>programs respecting Japanese/Chinese character sets, too.  These
>poor fellows have to suffer from ASCII most, I think.]

Well, that won't fit into 8 bits, anyway.

-- 
Wolf N. Paul * 3387 Sam Rayburn Run * Carrollton TX 75007 * (214) 306-9101
UUCP:     killer!dcs!wnp                 ESL: 62832882
DOMAIN:   wnp@dcs.UUCP                   TLX: 910-380-0585 EES PLANO UD