Path: utzoo!attcan!uunet!ubvax!ames!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: Memory latency / cacheing / scientific programs
Message-ID: <11106@ames.arc.nasa.gov>
Date: 1 Jul 88 00:36:49 GMT
References: <243@granite.dec.com> <779@garth.UUCP> <2033@pt.cs.cmu.edu> <11023@ames.arc.nasa.gov> <8429@pur-ee.UUCP>
Reply-To: lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster)
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 66

In article <8429@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes:
>Registers are not a valid substitute for cache:  they are fundamentally more
>restricted in function (although they are efficiently managed at
>compile-time).  For example, both a[i] and a[j] can be kept in cache ad
>infinitum, however, if we (the compiler) don't know if i==j, we can't put
>either one in a register without having to flush both from registers every
>time either one is stored into.  It's the classic ambiguous alias problem.

With these compilers only scalars are allocated in registers, so there
isn't an aliasing problem.  What these compilers do, and what
the architecture supports, is the use of registers for local scalars,
and the use of memory for everything else: arrays and global variables
of all kinds.  While this is patently not the best arrangement for
scalar-oriented C, it works very well for Fortran because:
1) Fortran modules tend to be "bigger" and have more local variables,
   compiler generated temporaries from expression evaluations, etc.,
   to store.
2) Arrays tend to have vector operations on them anyway, so a direct
   memory read/write is often in order.
3) Fortran does not have the same aliasing problems that "Algol-like"
   languages have.
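
To make the aliasing point concrete, here is a minimal sketch in C (not
taken from any of the compilers discussed; the function name and
arguments are made up for illustration).  Because the compiler cannot
prove i != j, the value of a[i] read before the store cannot safely stay
in a register across the store:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: a[i] is read, a[j] is stored, a[i] is read again.
 * If i == j the store changes a[i], so the second read must go back to
 * memory (or cache) instead of reusing a register copy. */
int read_store_read(int *a, size_t i, size_t j, int v)
{
    assert(a != NULL);

    int before = a[i];  /* candidate for a register ...                */
    a[j] = v;           /* ... but this store may overwrite a[i]       */
    int after = a[i];   /* so the value has to be reloaded here        */

    return before + after;
}
```

With a = {1, 2} the call read_store_read(a, 0, 0, 5) sees the new value
on the second read, while read_store_read(a, 0, 1, 5) does not.  A local
scalar whose address is never taken has no such ambiguity, which is why
it can live in a register for its whole lifetime.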

It is true that all other things being equal, cache is better.  In fact,
a really clever machine architecture might be able to dispense with
registers altogether.  On an array-oriented vector machine, however,
there is a serious problem.  It is not at all unusual for a large
fraction of memory to be accessed with each iteration of a problem,
with each element accessed only once or a few times.  If the entire
cache is emptied before an element is reused, there isn't much point
in having a cache.  This is one point where array-oriented machines and
scalar-oriented machines have mutually exclusive requirements.  Without
a register file, it is difficult to do the setup that the
vector unit needs fast enough.  Typically, code is reorganized on these
machines so that some of the setup necessary for the NEXT vector
instruction is started before beginning execution on the CURRENT instruction.
Then, while the scalar setup instructions are completing, the vector unit
is operating (everything pipelined, scoreboards, etc.).  So, on a vector
machine, you will still need sufficient registers, even with a cache,
and then it takes a lot of work to make sure that the cache doesn't hurt
your vector performance (it gets even more interesting when you throw
multiprocessor multitasking into the picture ;-)
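
The "cache is emptied before an element is reused" case can be
illustrated with a toy model.  The sketch below assumes a hypothetical
direct-mapped cache of 64 one-word lines (the size, the mapping and the
function name are assumptions for illustration, not a description of any
real machine) and counts hits over repeated sweeps through an array:

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINES 64  /* hypothetical: 64 lines, one word per line */

/* Count cache hits when `sweeps` passes are made over words 0..n-1
 * of a direct-mapped cache (word address modulo CACHE_LINES). */
size_t count_hits(size_t n, size_t sweeps)
{
    size_t tag[CACHE_LINES] = {0};   /* address currently held per line */
    int valid[CACHE_LINES] = {0};    /* nonzero once a line is filled   */
    size_t hits = 0;

    assert(n > 0);
    for (size_t s = 0; s < sweeps; s++) {
        for (size_t addr = 0; addr < n; addr++) {
            size_t line = addr % CACHE_LINES;
            if (valid[line] && tag[line] == addr) {
                hits++;              /* word is still resident */
            } else {
                valid[line] = 1;     /* miss: replace the line */
                tag[line] = addr;
            }
        }
    }
    return hits;
}
```

In this model an array that just fits (n = 64) hits on every access after
the first sweep, while an array twice the cache size (n = 128) never hits
at all, because every line is replaced before its word comes around
again.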

>them, as in some RISC designs).  As for the number of registers, we've
>recently found that a perfect (or just really good) global register
>allocator should rarely want more than about 10 registers per processor --

I have seen the number "32" bandied about previously as the ideal number
of registers for C.  10 looks pretty small to me.  I can remember writing
assembly code on a CDC 7600 and running out of registers very easily
doing expression evaluations with array elements.  It doesn't take many
multi-dimensional arrays to use up your index registers (but, of course,
multiplying the elements of a two-dimensional array times a column of a
third is an extremely unusual thing to do :-)  What sort of programs
is "10" based on?  (Heavily recursive "small" programs are not a fair
basis of comparison.  How about troff or cc?)  Again, the assumption
that a load/store costs about the same as any other instruction is not true
on a fast pipelined machine with no cache.  If a single load takes a lot
of cycles, you need more registers.  Many of these architectural choices
are difficult to separate from each other.
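
A rough way to see why long load latency calls for more registers is the
toy model below.  It is only a sketch under stated assumptions: at most
one load issues per cycle, each load ties up its destination register for
`latency` cycles, the loads are independent, and registers are reused
round-robin.  The function name and the MAX_REGS limit are made up for
the example.

```c
#include <assert.h>

#define MAX_REGS 64  /* arbitrary limit for this toy model */

/* Cycle at which the last of n independent loads completes. */
unsigned long cycles_for_loads(unsigned long n, unsigned long latency,
                               unsigned long regs)
{
    unsigned long free_at[MAX_REGS] = {0}; /* cycle each register frees */
    unsigned long issue = 0;               /* issue cycle of last load  */
    unsigned long done = 0;                /* completion of last load   */

    assert(latency > 0 && regs > 0 && regs <= MAX_REGS);
    for (unsigned long i = 0; i < n; i++) {
        unsigned long r = i % regs;                 /* round-robin reuse */
        unsigned long earliest = (i == 0) ? 0 : issue + 1;
        /* wait for both the issue slot and the destination register */
        issue = (free_at[r] > earliest) ? free_at[r] : earliest;
        free_at[r] = issue + latency;
        done = free_at[r];
    }
    return done;
}
```

For 100 loads at a latency of 10 cycles, this model gives 109 cycles with
10 registers, 204 with 5, and 1000 with 1: once there are fewer registers
than cycles of latency, the loads can no longer be overlapped fully.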


-- 
  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117