Xref: utzoo comp.sys.ibm.pc:16916 comp.binaries.ibm.pc.d:515 comp.emacs:3753
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!rutgers!mcnc!rti!bcw
From: bcw@rti.UUCP (Bruce Wright)
Newsgroups: comp.sys.ibm.pc,comp.binaries.ibm.pc.d,comp.emacs
Subject: Re: US PC programmers still live in a 7-bit world!
Summary: Dealing with non-English words
Message-ID: <2344@rti.UUCP>
Date: 4 Jul 88 03:56:35 GMT
References: <1988Jun22.223158.1366@LTH.Se> <4581@killer.UUCP>
Organization: Research Triangle Institute, RTP, NC
Lines: 53

In article <4581@killer.UUCP>, bobc@killer.UUCP (Bob Calbridge) writes:
> In article <1988Jun22.223158.1366@LTH.Se>, newsuser@LTH.Se (Lund Institute of Technology news server) writes:
> 
> > Why? Well, we are sure the intelligent reader already grasps the
> > reason. Take a look at the IBM PC character code set  a b o v e
> > ASCII 127. Our alphabet is there, too, and you just can't imagine
> > what funny results your tools yield when encountering them.
> > So, if your pet program is to become our pet, too, you have
> > to rethink concerning using the 8th bit as a flag, you have
> > to rewrite toupper, tolower, word scan, delete word, word
> > counters and the like.
> 
> :-) But then that's why, as you included in your missive, it's
> called ASCII.  The A stands for American.  Is there
> such a thing as ISCII???
> 
> I can just imagine the loops and whirls you would have to
> go through to write a viable sort routine to include new
> alpha sorts.  And who's to determine what the order of sort
> would be.  I'll see ya at the international conference where
> we can hash this out.  :-]
> 
I realize that Bob was just trying to be funny, but even if you live in
the US you may want the international characters.  If you are trying
to correspond with people outside the English-speaking world (or sometimes
even within the English-speaking world), it can be >>EXTREMELY<< useful
to have the true characters rather than just their approximation in ASCII.
This is especially true if you are using letters (you know, paper and ink
and all that) rather than the network - your recipient will not appreciate
misspellings which are "unavoidable" on your text editor.

It's fairly easy to write a table to convert lower case to upper case
for any given character set.  Unfortunately the IBM set corresponds neither
to the ISO character set nor to the DEC multinational character set, which
inhibits the portability of both files and programs.
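
To make the table idea concrete, here is a minimal sketch in Python, assuming
the text is stored as bytes in the IBM PC set (code page 437).  The names
CP437_LOWER_TO_UPPER, build_upper_table and cp437_upper are made up for this
illustration; only the accented letters that have an upper-case partner in
that set are listed.

```python
# Sketch: table-driven upper-casing for bytes in IBM PC code page 437.
# All names here are illustrative, not taken from any existing library.

# Lower-case letters above 127 that have an upper-case form in CP437.
CP437_LOWER_TO_UPPER = {
    0x81: 0x9A,  # u-umlaut  -> U-umlaut
    0x82: 0x90,  # e-acute   -> E-acute
    0x84: 0x8E,  # a-umlaut  -> A-umlaut
    0x86: 0x8F,  # a-ring    -> A-ring
    0x87: 0x80,  # c-cedilla -> C-cedilla
    0x91: 0x92,  # ae        -> AE
    0x94: 0x99,  # o-umlaut  -> O-umlaut
    0xA4: 0xA5,  # n-tilde   -> N-tilde
}


def build_upper_table():
    """Return a 256-byte table mapping each code to its upper-case code."""
    table = list(range(256))           # default: every byte maps to itself
    for code in range(ord("a"), ord("z") + 1):
        table[code] = code - 32        # plain ASCII letters
    for lower, upper in CP437_LOWER_TO_UPPER.items():
        table[lower] = upper           # letters above 127
    return bytes(table)


UPPER_TABLE = build_upper_table()


def cp437_upper(data):
    """Upper-case a CP437 byte string with one table lookup per byte."""
    return data.translate(UPPER_TABLE)
```

The same word needs a different table in another character set: e-acute is
code 0x82 in the IBM set but 0xE9 in ISO 8859-1, so a table built for one set
silently does the wrong thing on files written in the other.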

It would also be nice if the IBM-PC were set up to make >>GENERATING<< the
characters easier -- the DEC terminals and PCs are much nicer in this
respect.  It is much easier to remember "Compose-e-'" than it is to remember
"Alt-1-3-0" (not to mention fewer keystrokes).  Unfortunately, even on
the DEC PCs most editors don't deal with the multinational set well (though
they do on some of the larger DEC systems).
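
For anyone who has not used both schemes, the difference can be sketched
roughly as follows (Python).  The Alt method types the decimal code of the
character in the IBM set, while a compose key looks up a mnemonic pair.  The
COMPOSE table below is a tiny hypothetical subset written for this example,
and accepting the pair in either order is an assumption, not a statement about
any particular terminal.

```python
# Sketch: two ways of producing the same accented character.
# alt_code, COMPOSE and compose are illustrative names only.

def alt_code(digits):
    """Character produced by Alt plus a decimal code in the IBM PC set."""
    return bytes([int(digits)]).decode("cp437")


# Hypothetical subset of a compose table: (base letter, accent) -> character.
COMPOSE = {
    ("e", "'"): "é",
    ("a", "`"): "à",
    ("n", "~"): "ñ",
    ("u", '"'): "ü",
}


def compose(first, second):
    """Look up a compose pair, accepting the two keys in either order."""
    return COMPOSE.get((first, second)) or COMPOSE.get((second, first))
```

Here alt_code("130") and compose("e", "'") both give e-acute, but only the
second can be guessed without a code chart at hand.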

You do have a real problem with sorts, though - even though any GIVEN collating
sequence is easy (just a table lookup), you will find that there is no
UNIVERSAL collating sequence for all the "special" characters.  For example,
Spanish treats "Ll" as a single letter and collates it after "L", whereas
English sorts the same sequence as two separate letters.  I suspect that you
would need a separate translation table for each language.  But that would be
a relatively minor problem - at least for our purposes, we are more interested
in generating text for printing than we are in sorting anything.
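
A rough sketch of the "one table per language" approach, in Python.  It
assumes the traditional Spanish alphabet, in which "ch" and "ll" count as
letters of their own; the names make_sort_key, SPANISH_ALPHABET and
ENGLISH_ALPHABET are invented for this example, and accented vowels are left
out to keep it short.

```python
# Sketch: per-language collation driven by an ordered alphabet table.
# All names are illustrative; accented vowels are deliberately omitted.

ENGLISH_ALPHABET = list("abcdefghijklmnopqrstuvwxyz")

# Traditional Spanish ordering, with "ch" and "ll" as single letters.
SPANISH_ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
    "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
    "x", "y", "z",
]


def make_sort_key(alphabet):
    """Build a key function that ranks words by the given alphabet."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    longest = max(len(letter) for letter in alphabet)

    def key(word):
        word = word.lower()
        ranks = []
        i = 0
        while i < len(word):
            # Try the longest letter first so "ll" wins over "l".
            for size in range(longest, 0, -1):
                chunk = word[i:i + size]
                if chunk in rank:
                    ranks.append(rank[chunk])
                    i += len(chunk)
                    break
            else:
                # Symbols not in the table sort after every known letter.
                ranks.append(len(alphabet) + ord(word[i]))
                i += 1
        return ranks

    return key


words = ["luz", "llama", "lima", "loco"]
spanish_order = sorted(words, key=make_sort_key(SPANISH_ALPHABET))
english_order = sorted(words, key=make_sort_key(ENGLISH_ALPHABET))
```

With the Spanish table "llama" lands after "luz"; with the English table it
lands between "lima" and "loco".  The sort routine is the same in both cases,
only the table changes.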

						Bruce C. Wright