Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!cbosgd!ihnp4!ptsfa!ames!ucbcad!ucbvax!decvax!decwrl!labrea!glacier!jbn
From: jbn@glacier.UUCP
Newsgroups: comp.unix.wizards
Subject: Re: Type size problems
Message-ID: <17123@glacier.STANFORD.EDU>
Date: Thu, 9-Jul-87 14:39:28 EDT
Article-I.D.: glacier.17123
Posted: Thu Jul  9 14:39:28 1987
Date-Received: Sun, 12-Jul-87 03:05:20 EDT
References: <3659@spool.WISC.EDU> <743@geac.UUCP>
Reply-To: jbn@glacier.UUCP (John B. Nagle)
Organization: Stanford University
Lines: 131

     I did some work in this area at one time, back when Ada came in
four colors, and proposed some approaches that are sound but have more
of a Pascal or Ada flavor than C programmers are used to.  My basic
position was similar to that taken by the IEEE floating point standards
people: the important thing is to get the right answer.  As it turns
out, with some work in the compiler, we can do integer arithmetic in a
completely portable way with no loss in performance.

1.  Sizes belong to the program, not to the machine.  Thus, integer
variables should be declared by range, by giving a lower and an upper
bound for the value.  (In Pascal, this is called a "subrange",
reflecting Wirth's assumption that the type "integer" is somehow big
enough for all practical purposes.  He was using a Control Data 6600, a
machine with a 60-bit word, when he designed Pascal.)  For example, in
Pascal one writes

	VAR x: 0..255;

2.  Named types (such as "int" and "short") should be predefined but
not built in, and thus redefinable if needed.  Some standard
definitions, such as "unsigned_byte", should be defined the same way in
all implementations.
But in general programmers should use ranges.  (Of course, when
declaring a range, expressions evaluatable at compile time should be
allowed in the range bounds.  Pascal doesn't allow this, which results
in great frustration.)

	VAR unsigned_short: 0..65535;

is a typical declaration in Pascal.  C should have equivalent syntax.
It's silly that one has to guess what the type keywords mean in terms
of numeric range in each implementation, yet can't simply write the
range when one wants to.  Thus, if we had syntax in C for ranges,
along the lines of

	range 0..65535 unsigned_short;

we could do in C what one can do in Pascal.  Given range declarations,
one can create the "fundamental" types of C:

	typedef range 0..255            unsigned_byte;
	typedef range -(2^15)..(2^15)-1 short;
	typedef range 0..(2^16)-1       unsigned_short;
	typedef range -(2^31)..(2^31)-1 long;
	typedef range 0..(2^31)-1       unsigned_long;

These should be in an include file, not built into the compiler.

3.  Now here's the good part.  The compiler has to pick the size of
intermediate results.  (When we write "X = (A+B)+C;", "A+B" generates
an intermediate result.)  The compiler should always pick a size for an
intermediate result such that the intermediate cannot overflow unless
the final result would also overflow.  This strange rule does what you
want: if you write "X = X+1", and X has the range -32768..32767 (what
we usually call "short"), there's no need to compute a long result for
"X+1"; if X=32767, the intermediate overflows, but then overflow would
also occur in the final result, which is an error.  (One would like to
check for such errors; on VAXen, one can enable such checking in the
subroutine entry mask.  But nobody does; I once built PCC with it
enabled, and almost no UNIX program would work.  More on this later.)
On the other hand, if one writes "X = (A*B)/C;", and all variables are
"short", the term "A*B" will be computed as a "long" automatically,
thus avoiding the possibility of overflow.
(If you don't like that, you can write "X = ((short)(A*B))/C;" and the
compiler will recognize this as a statement that A*B should fit in a
"short".)

4.  Sometimes, but not often, one wants overflow, usually because one
is doing checksumming, hashing, or modular arithmetic.  The right way
to do this is to provide modular arithmetic operators.  One should be
able to write

	X = MODPLUS(X,1,256);

and get "(X+1) % 256".  The compiler must recognize as special cases
modular arithmetic with bounds of 2^n, and especially 2^(8*b), and do
those efficiently.  The above example ought to compile into a simple
byte-wide add on machines that have the instruction to do it.

5.  Some intermediate results aren't computable on most machines.

	short X, A, B, C, D, E, F, G, H, I;
	X = (A * B * C * D * E * F * G * H) / I;

should generate an error message at compile time indicating that the
intermediate result won't fit in the machine.  If the user really wants
something like that evaluated (and recognize that for most operand
values overflow would result in the above expression), some casts or
coercions will be necessary to tell the compiler what the user has in
mind.  Note that some programs that will compile on some machines won't
compile on others.  This is better than getting the wrong answer.

6.  Function declarations have to be available when calls are compiled,
so the compiler can see what types it is supposed to send.  Ada and
Pascal work this way, and C++ moves strongly in that direction.

7.  There probably shouldn't be a predefined type "int" or "integer" at
all.  (I've been thinking of publishing the thinking shown here under
the title "Type integer considered harmful".)  There's a general trend
toward making integer arithmetic portable in LISP, where
unlimited-length integers are often supported.  To the Common LISP
programmer, the width of the underlying machine's numeric unit is
irrelevant.  The performance penalty for this generality in LISP is
high.
But we can achieve equivalent portability in the hard-compiled
languages with some effort.

     This discussion probably should move to the C or C++ groups.

					John Nagle