Path: utzoo!attcan!uunet!ubvax!ames!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: Memory latency / cacheing / scientific programs
Message-ID: <11106@ames.arc.nasa.gov>
Date: 1 Jul 88 00:36:49 GMT
References: <243@granite.dec.com> <779@garth.UUCP> <2033@pt.cs.cmu.edu> <11023@ames.arc.nasa.gov> <8429@pur-ee.UUCP>
Reply-To: lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster)
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 66

In article <8429@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes:
>Registers are not a valid substitute for cache: they are fundamentally more
>restricted in function (although they are efficiently managed at
>compile-time). For example, both a[i] and a[j] can be kept in cache ad
>infinitum, however, if we (the compiler) don't know if i==j, we can't put
>either one in a register without having to flush both from registers every
>time either one is stored into. It's the classic ambiguous alias problem.

Only scalars are allocated in registers using these compilers so there
isn't an aliasing problem. What these compilers do, and what the
architecture supports, is the use of registers for local scalars, and
the use of memory for everything else: arrays and global variables of
all kinds. While this is patently not the best arrangement for scalar
oriented C, it works very well for Fortran because:

1) Fortran modules tend to be "bigger" and have more local variables,
   compiler generated temporaries from expression evaluations, etc., to
   store.
2) Arrays tend to have vector operations on them anyway, so a direct
   memory read/write is often in order.
3) Fortran does not have the same aliasing problems that "Algol-like"
   languages have.

It is true that all other things being equal, cache is better. In fact,
a really clever machine architecture might be able to dispense with
registers altogether. On an array oriented vector machine, however,
there is a serious problem.
It is not unusual at all for a large fraction of memory to be accessed
with each iteration of a problem, with each element accessed only once
or a few times. If the entire cache is emptied before an element is
reused, there isn't much point in having a cache. This is one point
where array oriented machines and scalar oriented machines have
mutually exclusive requirements.

Without a register file, it is difficult to do the setup that the
vector unit needs fast enough. Typically, code is reorganized on these
machines so that some of the setup necessary for the NEXT vector
instruction is started before beginning execution on the CURRENT
instruction. Then, while the scalar setup instructions are completing,
the vector unit is operating (everything pipelined, scoreboards, etc.).
So, on a vector machine, you will still need sufficient registers, even
with a cache, and then it takes a lot of work to make sure that the
cache doesn't hurt your vector performance (it gets even more
interesting when you throw multiprocessor multitasking into the
picture ;-)

>them, as in some RISC designs). As for the number of registers, we've
>recently found that a perfect (or just really good) global register
>allocator should rarely want more than about 10 registers per processor --

I have seen the number "32" bandied about previously as the ideal
number of registers for C. 10 looks pretty small to me. I can remember
writing assembly code on a CDC 7600 and running out of registers very
easily doing expression evaluations with array elements. It doesn't
take many multi-dimensional arrays to use up your index registers (but,
of course, multiplying the elements of a two dimensional array times a
column of a third is an extremely unusual thing to do :-) What sort of
programs is "10" based on? (Heavily recursive "small" programs are not
a fair basis of comparison. How about troff or cc?)
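
To put a rough number on the cache point above (an array swept end to
end, with the cache emptied before any element is reused), here is a
small simulation. It is only a sketch under assumptions of my own: a
fully associative LRU cache holding one array element per line, and a
purely sequential sweep. The name lru_hit_rate and the sizes used are
illustrative and do not describe any particular machine.

```python
from collections import OrderedDict


def lru_hit_rate(cache_lines, array_len, sweeps):
    """Hit rate when an array of array_len elements is swept
    sequentially `sweeps` times through an LRU cache that holds
    cache_lines elements (assumed: fully associative, one element
    per line)."""
    cache = OrderedDict()  # keys are element addresses, oldest first
    hits = 0
    total = 0
    for _ in range(sweeps):
        for addr in range(array_len):
            total += 1
            if addr in cache:
                hits += 1
                cache.move_to_end(addr)  # mark as most recently used
            else:
                if len(cache) >= cache_lines:
                    cache.popitem(last=False)  # evict least recently used
                cache[addr] = True
    return hits / total


if __name__ == "__main__":
    # Array four times the cache size: every element is evicted
    # before the next sweep comes back to it.
    print(lru_hit_rate(1024, 4096, 10))
    # Array that fits in the cache: only the first sweep misses.
    print(lru_hit_rate(1024, 512, 10))
```

Under these assumptions the large sweep never hits at all, while the
array that fits misses only on its first pass. Real caches with
multi-word lines would pick up some spatial hits on a unit-stride
sweep, so take this as the worst case for reuse, not as a measurement.
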
Again, the assumption that a load/store costs about the same as another
instruction is not true on a fast pipelined machine with no cache. If a
single load takes a lot of cycles, you need more registers. Many of
these architectural choices are difficult to separate from each other.

--
  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035
  Phone:  (415)694-6117