Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!amelia!lemming.nas.nasa.gov!fouts
From: fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts)
Newsgroups: comp.lang.fortran
Subject: Re: Arrays and pointers
Message-ID: <1023@amelia.nas.nasa.gov>
Date: 20 Sep 88 22:40:47 GMT
References: <447@quintus.UUCP> <3826@lanl.gov>
Sender: news@amelia.nas.nasa.gov
Reply-To: fouts@lemming.nas.nasa.gov.nas.nasa.gov (Marty Fouts)
Lines: 131


From the point of view of a language designer, one way to think of
arrays is that they provide a notation for naming a block of
contiguous memory and a mapping from references to single elements of
that block.  Consider the Fortran array A(2,2) (using 1-based
indexing).  This array names 4 consecutive locations:

  Some_Address + 0  A(1,1)
  Some_Address + 1  A(2,1)
  Some_Address + 2  A(1,2)
  Some_Address + 3  A(2,2)

This approach generalizes: arrays can have more dimensions, elements
don't have to fit into a single memory word, and indices can range
from any minimum to any maximum.  In this model, an array is precisely
a rectangular block of adjacent elements.

This is exactly equivalent to using pointer arithmetic to accomplish
the naming, in which the arithmetic is given by the classic offset
formula:

  Pointer = Some_Address

  Address_Of(A(i,j)) = Pointer + (i-1) + ( (j-1) * 2 )

It doesn't matter whether your language allows you to express the
latter form explicitly; if you implement arrays as blocks of memory,
the equivalence exists.  [It is, in fact, how the compiler does array
reference calculations without optimization in most languages. . .]
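
As a minimal sketch of the formula above in C, generalized from 2 rows
to any number of rows: the names col_major_offset and address_of are
made up for illustration, and the element type double is an
assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i,j), 1-based, in a column-major array with
   `rows` rows: the first subscript varies fastest, as in Fortran.
   col_major_offset is an illustrative name, not from any library. */
size_t col_major_offset(size_t i, size_t j, size_t rows)
{
    assert(i >= 1 && i <= rows && j >= 1);
    return (i - 1) + (j - 1) * rows;
}

/* Address of A(i,j) given a pointer to the start of the block. */
double *address_of(double *base, size_t i, size_t j, size_t rows)
{
    return base + col_major_offset(i, j, rows);
}
```

With rows = 2 this gives offsets 0, 1, 2, 3 for A(1,1), A(2,1),
A(1,2), A(2,2), matching the table of locations above.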

Notice that (j-1) * 2 requires a multiplication.  (A bit shift in
this case, but a true multiplication if the array dimension isn't a
power of two.)  This can be avoided by building a table of precomputed
addresses, called a dope vector.  Some compilers do this as an
optimization for multidimensional arrays, to avoid a lot of runtime
calculations.
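
One way such a table might look in C, as a sketch only: the sizes ROWS
and COLS and the names build_table and element are assumptions chosen
for the example, and a real compiler's table need not look like this.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 3, COLS = 5 };   /* example sizes, chosen arbitrarily */

/* Fill col_start[j-1] with the address of A(1,j), paying for the
   multiplication once per column instead of once per reference. */
void build_table(double *base, double *col_start[])
{
    for (size_t j = 0; j < COLS; j++)
        col_start[j] = base + j * ROWS;
}

/* A(i,j), 1-based, found with a table lookup and one addition. */
double *element(double *col_start[], size_t i, size_t j)
{
    assert(i >= 1 && i <= ROWS && j >= 1 && j <= COLS);
    return col_start[j - 1] + (i - 1);
}
```

The table costs COLS pointers of extra storage, so it pays off only
when the array is referenced many times.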

A 'trick' many Fortran programmers use to avoid array index
calculations when marching through a multidimensional array is to
use a subroutine call to "flatten" the array:

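      PROGRAM XMPL
C     Example sizes only; any positive N, M, and L will do.
      PARAMETER (N = 4, M = 3, L = 2)
      DIMENSION X(N,M,L)
C     INIT sees the 3-D array as one vector of N*M*L elements.
      CALL INIT(X,N*M*L)
C     *** etc ***
      END

      SUBROUTINE INIT(X,ISIZE)
      DIMENSION X(ISIZE)
      DO 10 I = 1, ISIZE
C       0.0 stands in for whatever initialization value is wanted.
        X(I) = 0.0
   10 CONTINUE
      RETURN
      END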

Of course, if your language has explicit address manipulation, you can
avoid the subroutine call via

      P = &X(1,1,1)
      DO 10 I = 0, ISIZE - 1
        *(P+I) = *** initialization value
   10 CONTINUE
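
In C the pointer version of the loop above might be sketched as
follows.  The name init_flat and the element type double are
assumptions; the caller is assumed to own one contiguous block of the
given size.

```c
#include <assert.h>
#include <stddef.h>

/* Initialize `size` contiguous elements starting at p, ignoring
   however many dimensions the caller thinks the block has. */
void init_flat(double *p, size_t size, double value)
{
    assert(p != NULL || size == 0);
    for (size_t i = 0; i < size; i++)
        *(p + i) = value;
}
```

A caller holding an N by M by L block would pass a pointer to its
first element and N*M*L as the size.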

You can also do your own 'dope vector' optimization:

      DO 30 I = 1, L
        DO 20 J = 1, M
          DO 10 K = 1,N
            *** Calculate using X(K,J,I)
   10     CONTINUE
   20   CONTINUE
   30 CONTINUE

can also be expressed as

      INDEX = 0
      DO 30 I = 1, L
        DO 20 J = 1, M
          DO 10 K = 1,N
            *** Calculate using *(P+INDEX)
            INDEX = INDEX + 1
   10     CONTINUE
   20   CONTINUE
   30 CONTINUE

an optimization one can argue that the compiler should make, but not
all compilers do.  Also, you can write your own function for mapping
names to physical addresses and implement sparse matrices, triangular
matrices, etc., in a way you can control.
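
Both ideas can be sketched in C.  The first function checks that a
running index matches the computed offset when the innermost loop
variable is the first (fastest-varying) subscript of a column-major
block; the other two show one possible mapping for a lower triangular
matrix stored row by row.  All names here are invented for the
example, and the packed layout is just one of several in use.

```c
#include <assert.h>
#include <stddef.h>

/* Walk an n x m x l column-major block in storage order, checking
   that the running index equals the offset computed from the 1-based
   subscripts (k,j,i).  Returns the number of elements visited. */
size_t walk_and_check(size_t n, size_t m, size_t l)
{
    size_t index = 0;
    for (size_t i = 1; i <= l; i++)
        for (size_t j = 1; j <= m; j++)
            for (size_t k = 1; k <= n; k++) {
                size_t computed = (k - 1) + (j - 1) * n
                                + (i - 1) * n * m;
                assert(index == computed);
                index++;    /* one addition, no multiplications */
            }
    return index;
}

/* Offset of T(i,j), 1-based with j <= i, when only the lower
   triangle is stored, row by row: row i starts after
   1 + 2 + ... + (i-1) elements. */
size_t tri_offset(size_t i, size_t j)
{
    assert(j >= 1 && j <= i);
    return i * (i - 1) / 2 + (j - 1);
}

/* Number of elements needed for an n x n lower triangle. */
size_t tri_size(size_t n)
{
    return n * (n + 1) / 2;
}
```

The triangular mapping stores n*(n+1)/2 elements instead of n*n, at
the price of a slightly more expensive offset calculation.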

You can also generalize all of these things.  Current Fortran doesn't
give you the kind of control to do this, because it has no mechanism
for explicit control of name-to-address mapping and aliasing at this
level.  Current C does some of this, although not all of it well.  In
addition, current C adds more general address-to-name aliasing
functionality.

Contrary to some of the arguments in this list, from the point of view
of a language designer, pointers and arrays serve very much the same
purpose.  Contiguous arrays are a compact special-purpose mechanism
for annotating a mapping from names to addresses.  Pointers also
provide such a mechanism, plus give additional semantics.  C
recognizes the existing equivalence by allowing the two syntactic
structures which express the same semantics to be used
interchangeably.  This is a notational convenience which can make
algorithms easier to read.  (Also harder; it is still possible to
write unreadable code in any language.)
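
A small sketch of that interchangeability: the two functions below
compute the same sum, one with subscripts and one with explicit
pointer arithmetic.  The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n elements with subscript notation. */
double sum_indexed(const double a[], size_t n)
{
    double s = 0.0;
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s += a[i];              /* defined by C as *(a + i) */
    return s;
}

/* The same sum with explicit pointer arithmetic. */
double sum_pointer(const double *a, size_t n)
{
    double s = 0.0;
    assert(a != NULL || n == 0);
    for (const double *p = a; p != a + n; p++)
        s += *p;
    return s;
}
```

Which form reads better depends on the algorithm; the language treats
them as the same computation.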

To see the value of multiple syntaxes for the same semantics, consider
the result (proven long ago) that the only structures needed to
express control in sequential programs are sequence, branching, and
loop-while-true.  Most languages offer several variations on branching
and looping, for the notational use of the programmer, and few people
would argue that we should do away with all control structures except
goto and while-condition-do-block.

A language with pointers has more expressive capability (although, of
course, it can't compute anything that one without them can't) than
one which doesn't.  This gives the programmer more power in expressing
computation, at the expense of providing more ways to make mistakes.

Some programmers would prefer not to have access to the expressive
power, as a way of avoiding bugs.  Others, myself included, are
willing to take the chance.

Marty
+-+-+-+     I don't know who I am, why should you?     +-+-+-+
   |        fouts@lemming.nas.nasa.gov                    |
   |        ...!ames!orville!fouts                        |
   |        Never attribute to malice what can be         |
+-+-+-+     explained by incompetence.                 +-+-+-+