Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site brl-tgr.ARPA
Path: utzoo!linus!philabs!cmcl2!seismo!brl-tgr!tgr!hester@ICSE.UCI.EDU
From: hester@ICSE.UCI.EDU (Jim Hester)
Newsgroups: net.sources
Subject: Re: upper/lower case filter
Message-ID: <3114@brl-tgr.ARPA>
Date: Mon, 11-Nov-85 20:59:43 EST
Article-I.D.: brl-tgr.3114
Posted: Mon Nov 11 20:59:43 1985
Date-Received: Wed, 13-Nov-85 04:37:25 EST
Sender: news@brl-tgr.ARPA
Lines: 50

UNIX provides this facility with the 'tr' (translate characters)
program. To change everything to upper case, use

    tr a-z A-Z

I don't know what effect this has if the letters are not contiguous
(as in an IBM character code I won't name). If that is a problem,
you just explicitly list the letters from A to Z in both upper and
lower cases.

If the files are reasonably large, a more efficient algorithm (than
checking character types during input) is a table lookup scheme like
the following (which is the basic method used by tr):

    #include <stdio.h>
    #include <ctype.h>

    #define NCHARS 256

    int table[ NCHARS ], ch;

    int main(void)
    {
        /* build the translation table once, up front */
        for ( ch = 0 ; ch < NCHARS ; ++ch ) {
            if ( islower(ch) )
                table[ch] = toupper(ch);
            else
                table[ch] = ch;
        }

        /* after that, each input character costs one array lookup */
        while ( EOF != (ch = getchar()) )
            putchar( table[ch] );
        return 0;
    }

Running a few quick tests, table lookup took 3/4 of the time of
checking character types for each input character.

When alphabetic characters are contiguous (which implies a constant
difference between the upper and lower case forms of each letter,
which you took advantage of), as in ASCII, the initialization loop
can be sped up by eliminating the 256 calls to islower() and 26 calls
to toupper().
Simply remove the first three lines in the loop (leaving only
table[ch] = ch;) and add a new loop:

    shift = 'A'-'a';
    for ( ch = 'a' ; ch <= 'z' ; ++ch )
        table[ch] += shift;

Also, if the character code uses a single bit to distinguish
character case, you can speed it up even more by just ANDing or ORing
a mask to the appropriate locations in the table:

    mask = ~('a'-'A');
    for ( ch = 'a' ; ch <= 'z' ; ++ch )
        table[ch] &= mask;

Either of these speedups has a negligible effect on the run time for
large inputs: both apply only to a constant initialization step, so
the saving is independent of the input size. It's probably better to
stick with something closer to the original code I gave, for reasons
of simplicity and portability.

Jim
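
The three ways of filling the table described above can be put side
by side and compared. What follows is only a sketch, assuming an
ASCII-compatible character set (contiguous letters, case differing by
a single bit) and the default C locale. The function names
build_ctype, build_shift, build_mask and same_table are made up for
illustration and do not come from the post.

```c
#include <ctype.h>
#include <string.h>

#define NCHARS 256

/* Portable version: ask <ctype.h> about every character code. */
void build_ctype(int table[NCHARS])
{
    int ch;
    for (ch = 0; ch < NCHARS; ++ch)
        table[ch] = islower(ch) ? toupper(ch) : ch;
}

/* Identity mapping; the two shortcuts below start from this. */
static void build_identity(int table[NCHARS])
{
    int ch;
    for (ch = 0; ch < NCHARS; ++ch)
        table[ch] = ch;
}

/* ASSUMPTION: contiguous letters, constant case offset (true in ASCII). */
void build_shift(int table[NCHARS])
{
    int ch, shift = 'A' - 'a';
    build_identity(table);
    for (ch = 'a'; ch <= 'z'; ++ch)
        table[ch] += shift;
}

/* ASSUMPTION: a single bit separates the two cases (true in ASCII). */
void build_mask(int table[NCHARS])
{
    int ch, mask = ~('a' - 'A');
    build_identity(table);
    for (ch = 'a'; ch <= 'z'; ++ch)
        table[ch] &= mask;
}

/* Returns 1 if both tables hold the same mapping, 0 otherwise. */
int same_table(const int a[NCHARS], const int b[NCHARS])
{
    return memcmp(a, b, NCHARS * sizeof a[0]) == 0;
}
```

Under those assumptions, same_table() should report that all three
tables agree. Where the assumptions do not hold, only build_ctype can
be relied on, which is the portability argument made above.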