Xref: utzoo comp.sys.ibm.pc:16916 comp.binaries.ibm.pc.d:515 comp.emacs:3753
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!rutgers!mcnc!rti!bcw
From: bcw@rti.UUCP (Bruce Wright)
Newsgroups: comp.sys.ibm.pc,comp.binaries.ibm.pc.d,comp.emacs
Subject: Re: US PC programmers still live in a 7-bit world!
Summary: Dealing with non-English words
Message-ID: <2344@rti.UUCP>
Date: 4 Jul 88 03:56:35 GMT
References: <1988Jun22.223158.1366@LTH.Se> <4581@killer.UUCP>
Organization: Research Triangle Institute, RTP, NC
Lines: 53

In article <4581@killer.UUCP>, bobc@killer.UUCP (Bob Calbridge) writes:
> In article <1988Jun22.223158.1366@LTH.Se>, newsuser@LTH.Se (Lund Institute of Technology news server) writes:
>
> > Why? Well, we are sure the intelligent reader already grasps the
> > reason. Take a look at the IBM PC character code set a b o v e
> > ASCII 127. Our alphabet is there, too, and you just can't imagine
> > what funny results your tools yield when encountering them.
> > So, if your pet program is to become our pet, too, you have
> > to rethink concerning using the 8th bit as a flag, you have
> > to rewrite toupper, tolower, word scan, delete word, word
> > counters and the like.
>
> :-) But then that's why, as you included in your missive, it's
> called ASCII. The A stands for American. Is there
> such a thing as ISCII???
>
> I can just imagine the loops and whirls you would have to
> go through to write a viable sort routine to include new
> alpha sorts. And who's to determine what the order of sort
> would be. I'll see ya at the international conference where
> we can hash this out. :-]
>

I realize that Bob was just trying to be funny, but even if you
live in the US you may want the international characters.
If you are trying to correspond with people outside the
English-speaking world (or sometimes even within the English-speaking
world), it can be >>EXTREMELY<< useful to have the true characters
rather than just their approximation in ASCII. This is especially
true if you are using letters (you know, paper and ink and all that)
rather than the network - your recipient will not appreciate
misspellings which are "unavoidable" in your text editor.

It's fairly easy to write a table to convert lower case to upper
case for any given character set. Unfortunately the IBM set
corresponds neither to the ISO character set nor to the DEC
multinational character set - this inhibits the portability of both
files and programs.

It would also be nice if the IBM-PC were set up to make
>>GENERATING<< the characters easier -- the DEC terminals and PCs
are much nicer in this respect. It is much easier to remember
"Compose-e-'" than it is to remember "Alt-1-3-0" (not to mention
fewer keystrokes). Unfortunately even on the DEC PCs most editors
don't deal with the multinational set well (though they do on some
of the larger DEC systems).

You do have a real problem with sorts though - even though any GIVEN
collating sequence is easy (just a table lookup), you will find that
there is no UNIVERSAL collating sequence for all the "special"
characters. For example, Spanish considers "Ll" a single letter and
collates it after "L", whereas English sorts "Ll" as two ordinary
letters. I suspect that you would have to have a number of
translation tables, one for each collating convention. But that
would be a relatively minor problem - at least for our purposes, we
are more interested in generating text for printing than we are in
sorting anything.

						Bruce C. Wright
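
The two table lookups mentioned above (case conversion for one given
character set, and one collating table per language) can be sketched
roughly as follows. This is only an illustration, not anyone's actual
implementation: it assumes the IBM PC set means code page 437, lists
only the accented letters whose capitals exist in that set, and
models Spanish with nothing more than the "ll" rule from the example.
All the names (CP437_UPPER_PAIRS, build_upper_table, to_upper_cp437,
SPANISH_ORDER, collation_key) are made up for the sketch.

```python
import string

# Lower-case -> upper-case byte pairs for IBM PC code page 437 (assumed).
# Letters such as a-acute have no capital in this set and stay unchanged.
CP437_UPPER_PAIRS = {
    0x87: 0x80,  # c-cedilla
    0x81: 0x9A,  # u-umlaut
    0x82: 0x90,  # e-acute (the "Alt-1-3-0" character)
    0x84: 0x8E,  # a-umlaut
    0x86: 0x8F,  # a-ring
    0x91: 0x92,  # ae ligature
    0x94: 0x99,  # o-umlaut
    0xA4: 0xA5,  # n-tilde
}


def build_upper_table(pairs):
    """Return a 256-entry table mapping every byte to its upper-case byte."""
    table = bytearray(range(256))          # start with the identity mapping
    for code in range(ord("a"), ord("z") + 1):
        table[code] = code - 32            # plain ASCII letters
    for lower, upper in pairs.items():
        table[lower] = upper               # letters above ASCII 127
    return bytes(table)


CP437_TO_UPPER = build_upper_table(CP437_UPPER_PAIRS)


def to_upper_cp437(data):
    """Upper-case a byte string with one table lookup per byte."""
    return data.translate(CP437_TO_UPPER)


# One collating table per convention. Each entry is a "collating unit";
# the simplified Spanish table inserts "ll" as its own unit after "l".
ENGLISH_ORDER = list(string.ascii_lowercase)
SPANISH_ORDER = []
for letter in string.ascii_lowercase:
    SPANISH_ORDER.append(letter)
    if letter == "l":
        SPANISH_ORDER.append("ll")


def collation_key(word, order):
    """Turn a word into a list of ranks according to the given table."""
    rank = {unit: i for i, unit in enumerate(order)}
    units = sorted(order, key=len, reverse=True)   # try "ll" before "l"
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        for unit in units:
            if word.startswith(unit, i):
                key.append(rank[unit])
                i += len(unit)
                break
        else:
            # Characters missing from the table sort after everything in it.
            key.append(len(order) + ord(word[i]))
            i += 1
    return key


if __name__ == "__main__":
    words = ["luz", "llama", "lomo"]
    print(to_upper_cp437(b"caf\x82"))
    print(sorted(words, key=lambda w: collation_key(w, ENGLISH_ORDER)))
    print(sorted(words, key=lambda w: collation_key(w, SPANISH_ORDER)))
```

With the English table the three words sort as llama, lomo, luz; with
the Spanish table "llama" moves to the end, after every word starting
with a plain "l". Supporting the ISO or DEC sets would mean swapping
in a different pair table, and another language a different order
list, while the lookup code stays the same.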