Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!umd5!uflorida!mailrus!iuvax!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: Memory latency / cacheing / scientific programs
Summary: Registers != Cache
Message-ID: <8429@pur-ee.UUCP>
Date: 30 Jun 88 18:34:03 GMT
References: <243@granite.dec.com> <779@garth.UUCP> <2033@pt.cs.cmu.edu> <11023@ames.arc.nasa.gov>
Organization: Purdue University Engineering Computer Network
Lines: 41

In article <11023@ames.arc.nasa.gov>, lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
> Neither the 205 nor the Cray has a cache.  The philosophy is to put in
> enough registers that a cache is unnecessary.  The 256 registers on
> the 205 were plenty for any module that I saw.  The place where this
> approach hurts is in scalar codes that have very frequent procedure
> calls (typical C system and utilities code) since data has to be
> saved and restored between procedure calls even if it is being reused.
> So, don't run code like that on these machines more than necessary...

Registers are not a valid substitute for cache:  they are fundamentally more
restricted in function (although they are efficiently managed at
compile-time).  For example, both a[i] and a[j] can be kept in cache ad
infinitum.  However, if we (the compiler) don't know whether i==j, we can't
keep either one in a register without flushing both from registers every
time either one is stored into.  It's the classic ambiguous alias problem.
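
To make the alias problem concrete, here is a rough C sketch (mine, not taken
from any particular compiler; the function names are made up for
illustration).  The first routine re-reads a[i] after the store to a[j],
which is what a compiler must do when it can't prove i != j.  The second
keeps a[i] in a local, standing in for a register, and so goes wrong exactly
when i == j.

```c
#include <assert.h>

/* Safe version: a[i] is re-read from memory after the store to a[j],
 * because the store may have changed a[i] (when i == j). */
int sum_with_reload(int *a, int i, int j, int v)
{
    int first = a[i];
    a[j] = v;               /* may or may not overwrite a[i] */
    return first + a[i];    /* second read goes back to memory */
}

/* Unsafe version: the local r plays the role of a register holding a[i]
 * across the store.  This is only correct if i != j is known. */
int sum_with_stale_register(int *a, int i, int j, int v)
{
    int r = a[i];
    a[j] = v;               /* if i == j, r is now stale */
    return r + r;           /* reuses the register copy without reloading */
}
```

With i != j both routines agree; with i == j the second one returns a sum
built from the old value of a[i].  A cache has no such trouble, since both
names resolve to the same address at run time.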

For vector registers, incidentally, you don't hit this problem because the
same alias problem which prevents keeping things in registers would also
prevent vectorization.  State save/restore on procedure calls is not so bad
a problem for registers, because the state variables typically have no
aliases, hence they can be maintained in registers (if you have enough of
them, as in some RISC designs).  As for the number of registers, we've
recently found that a perfect (or just really good) global register
allocator should rarely want more than about 10 registers per processor --
it's a pity there are so few good register allocators out in the real
world....  By the way, if you ignore loops, the cache size also can be a
very small number.

Chi-Hung Chi and I will have a paper discussing the aliasing problem and how
it relates to registers vs. cache at SUPERCOMPUTING '88:  "CRegs: A New Kind
of Memory for Referencing Arrays and Pointers."  The CReg is a new structure
which is managed like registers but treats aliased objects more like a cache
does...  patent perhaps-soon-to-be-pending?  Anyway, I'm willing to make the
paper available on a limited basis before the conference.

     __         /|
  _ |  |  __   / |  Compiler-oriented
 /  |--| |  | |  |  Architecture
/   |  | |__| |_/   Researcher from
\__ |  | | \  |     Purdue
    \    |  \  \
	 \      \   Hank Dietz, Ass't Prof. of EE