From: utzoo!decvax!harpo!eagle!mhtsa!alice!npoiv!eisx!whuxlb!mash
Newsgroups: net.lang.c
Title: Re: C pet peeve
Article-I.D.: whuxlb.1040
Posted: Thu Mar 24 00:22:50 1983
Received: Fri Mar 25 21:35:41 1983

As various folks have mentioned, it is difficult to check C subscripts.
In fact, it is worse than has been mentioned: there may well be only
two rational design points for languages of the
C/PASCAL/FORTRAN/ALGOL... level:

1) (like C) use a language that models typical machines directly, with
   little extra overhead, and fairly unconstrained semantics, i.e., we
   all know pointers are addresses, and expect no protection.
OR
2) Design a language to be compile-time checkable from day one, with
   a) highly-constrained pointer semantics,
   b) either dope vectors/descriptors for any objects (like arrays)
      passed by reference, or array-size conformance required of
      functions (thus forbidding variably-sized arguments).

In case 2, given an optimizing compiler that does serious dataflow
analysis (i.e., like IBM FORTRAN IV(H)), it is possible to optimize
away many of the otherwise necessary subscript checks. However, much
care is needed in the design of language semantics or this becomes
excruciatingly difficult (excruciating because safety usually implies
numerous checks that are actually unnecessary). For example, in PL/I:

DCL X(10);              DCL X(10);              DCL X(10);
DO I = 1 TO 10;         DO I = 1 TO 10;         CALL SUBR(I);
   X(I) = X(I)+1;          CALL SUBR(I);        I = 1;
END;                       X(I) = X(I) + 1;     CALL SUBY;
                        END;                    X(I) = 1;

The left case needs no subscript checking; the 2nd case needs 1
subscript check for the assignment statement, because SUBR may have
modified I. (It probably didn't, but call-by-reference makes it very
difficult to know what's happening at the point of invocation -- here,
C's default call-by-value only is a great help: at least when you see
funct(&x) you expect that x might be changed.)
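
To make the call-by-value point concrete, here is a minimal C sketch of
the first two cases. C itself inserts no subscript checks, so the
sketch models by hand what a checking compiler would have to emit. All
names in it (at, by_value, by_ref, loop_by_value, loop_by_ref, checks)
are hypothetical, and subscripts run 0..9 rather than PL/I's 1..10.

```c
#include <assert.h>

#define N 10

static int checks = 0;   /* number of run-time subscript checks performed */

/* Hypothetical checked access: stands in for a compiler-inserted check. */
static int *at(int *x, int i)
{
    checks++;
    assert(0 <= i && i < N);
    return &x[i];
}

/* Receives a copy of i; nothing it does can alter the caller's i. */
static void by_value(int i) { i = 99; (void)i; }

/* Receives the address of i; the caller must assume i may change. */
static void by_ref(int *ip) { (void)ip; }

/* Call by value: the loop bounds alone prove 0 <= i < N at the
   assignment, so no check is needed. Returns the checks performed. */
int loop_by_value(int *x)
{
    int i;
    checks = 0;
    for (i = 0; i < N; i++) {
        by_value(i);
        x[i] = x[i] + 1;           /* unchecked access is safe here */
    }
    return checks;
}

/* Address passed: without looking inside by_ref, i is unknown after
   the call, so each assignment needs one check. */
int loop_by_ref(int *x)
{
    int i;
    checks = 0;
    for (i = 0; i < N; i++) {
        int *p;
        by_ref(&i);
        p = at(x, i);              /* one check covers the read and the write */
        *p = *p + 1;
    }
    return checks;
}
```

On a 10-element array, loop_by_value performs no checks and loop_by_ref
performs one per iteration. The difference is visible at the call site:
only the second loop writes &i.
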
Even worse, in the 3rd case, the X(I) above also needs a check, because
safety requires you to assume that once you give away the address of
anything (as in SUBR), it may be saved somewhere and the value modified
in any subroutine call. The same issue arises in some FORTRANs.
Solutions to the problem for typical languages require complex
interprocedural analysis, fancy linkers, or complex compilation/binding
systems.

What's the moral? This is not an argument against checking (for
subscript-in-range, undefined variables, pointer usage), but an
observation that doing checking well requires considerable language
design thought, or acceptance of considerable overhead in space and
time. I personally think that one should either
a) stick with something whose semantics is fairly straightforward,
   like C, or
b) go to a much higher level where subscript-checking mostly
   disappears into higher-level aggregate operations, i.e., go to APL
   or SETL, etc.

-mashey