Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!amelia!lemming.nas.nasa.gov!fouts
From: fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts)
Newsgroups: comp.lang.fortran
Subject: Re: Arrays and pointers
Message-ID: <1023@amelia.nas.nasa.gov>
Date: 20 Sep 88 22:40:47 GMT
References: <447@quintus.UUCP> <3826@lanl.gov>
Sender: news@amelia.nas.nasa.gov
Reply-To: fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts)
Lines: 131

From the point of view of a language designer, one way to think of
arrays is that they provide a notation for naming a block of contiguous
memory and a mapping of references to single elements of that block.
Consider the Fortran array A(2,2) (using 1-based indexing). This array
names 4 consecutive locations:

        Some_Address + 0        A(1,1)
        Some_Address + 1        A(2,1)
        Some_Address + 2        A(1,2)
        Some_Address + 3        A(2,2)

This approach generalizes to multiple dimensions, the elements don't
have to fit into a single memory word, and indices can range from any
minimum to any maximum. In this model, an array is precisely a
rectangular arrangement of adjacent elements.

This is exactly equivalent to using pointer arithmetic to accomplish
the naming, in which the arithmetic is given by the classic offset
formula:

        Pointer = Some_Address
        Address_Of(A(i,j)) = Pointer + (i-1) + ( (j-1) * 2 )

where the 2 is the extent of the first dimension. It doesn't matter
whether your language lets you express the latter form explicitly; if
you implement arrays as blocks of memory, the equivalence exists. [It
is, in fact, how the compiler does array reference calculations without
optimization in most languages. . .]

You may notice that (j-1) * 2 requires a multiplication. (A bit shift
in this case, but a multiplication if the array dimension isn't a power
of two.) This can be avoided by building a table of precomputed
addresses, called a dope vector. Some compilers do this as an
optimization for multiple-dimension arrays, to avoid a lot of runtime
calculation.
A 'trick' many Fortran programmers use to avoid array index
calculations when marching through a multiple-dimension array is to use
Fortran's argument passing to "flatten" the array:

        PROGRAM XMPL
        PARAMETER *** Specify values for N, M, and L
        DIMENSION X(N,M,L)
        CALL INIT(X,N*M*L)
  *** etc ***
        END

        SUBROUTINE INIT(X,ISIZE)
        DIMENSION X(ISIZE)
        DO 10 I = 1, ISIZE
           X(I) = *** initialization value
  10    CONTINUE
        RETURN
        END

Of course, if your language has explicit address manipulation, you can
avoid the subroutine call via

        P = &X(1,1,1)
        DO 10 I = 0, N*M*L - 1
           *(P+I) = *** initialization value
  10    CONTINUE

You can also do your own 'dope vector' optimization. The loop nest

        DO 30 I = 1, L
           DO 20 J = 1, M
              DO 10 K = 1, N
  *** Calculate using X(K,J,I)
  10          CONTINUE
  20       CONTINUE
  30    CONTINUE

can also be expressed as

        INDEX = 0
        DO 30 I = 1, L
           DO 20 J = 1, M
              DO 10 K = 1, N
  *** Calculate using *(P+INDEX)
                 INDEX = INDEX + 1
  10          CONTINUE
  20       CONTINUE
  30    CONTINUE

an optimization one can argue the compiler should make, but not all
compilers do.

Also, you can write your own function for mapping names to physical
addresses and implement sparse matrices, triangular matrices, etc., in
a way you can control. You can also generalize all of these things.

Current Fortran doesn't give you the kind of control needed to do this,
because it has no mechanism for explicit control of name-to-address
mapping and aliasing at this level. Current C does some of this,
although not all of it well. In addition, current C adds more general
address-to-name aliasing functionality.

Contrary to some of the arguments in this list, from the point of view
of a language designer, pointers and arrays serve very much the same
purpose. Contiguous arrays are a compact, special-purpose mechanism for
expressing a mapping from names to addresses. Pointers also provide
such a mechanism, plus additional semantics. C recognizes the existing
equivalence by allowing the two syntactic structures that express the
same semantics to be used interchangeably.
This interchangeability is a notational convenience which can make
algorithms easier to read. (Also harder; it is still possible to write
unreadable code in any language.)

To see the value of multiple syntaxes for the same semantics, consider
the result (proven long ago) that the only structures needed to express
control in sequential programs are sequence, branching, and
loop-while-true. Most languages offer several variations on branching
and looping for the programmer's notational use, and few people would
argue that we should do away with all control structures except goto
and while-condition-do-block.

A language with pointers has more expressive capability than one
without them (although, of course, it can't compute anything that a
language without them can't). This gives the programmer more power in
expressing computation, at the expense of providing more ways to make
mistakes. Some programmers would prefer not to have access to that
expressive power, as a way of avoiding bugs. Others, myself included,
are willing to take the chance.

Marty
+-+-+-+ I don't know who I am, why should you? +-+-+-+
|         fouts@lemming.nas.nasa.gov                 |
|         ...!ames!orville!fouts                     |
|         Never attribute to malice what can be      |
+-+-+-+  explained by incompetence.            +-+-+-+