Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!umd5!uflorida!mailrus!iuvax!pur-ee!hankd
From: hankd@pur-ee.UUCP (Hank Dietz)
Newsgroups: comp.arch
Subject: Re: Memory latency / cacheing / scientific programs
Summary: Registers != Cache
Message-ID: <8429@pur-ee.UUCP>
Date: 30 Jun 88 18:34:03 GMT
References: <243@granite.dec.com> <779@garth.UUCP> <2033@pt.cs.cmu.edu> <11023@ames.arc.nasa.gov>
Organization: Purdue University Engineering Computer Network
Lines: 41

In article <11023@ames.arc.nasa.gov>, lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
> Neither the 205 nor the Cray has a cache. The philosophy is to put in
> enough registers that a cache is unnecessary. The 256 registers on
> the 205 were plenty for any module that I saw. The place where this
> approach hurts is in scalar codes that have very frequent procedure
> calls (typical C system and utilities code) since data has to be
> saved and restored between procedure calls even if it is being reused.
> So, don't run code like that on these machines more than necessary...

Registers are not a valid substitute for cache: they are fundamentally more restricted in function (although they are efficiently managed at compile-time). For example, both a[i] and a[j] can be kept in cache ad infinitum; however, if we (the compiler) don't know whether i==j, we can't put either one in a register without having to flush both from registers every time either one is stored into, since a store through one name may change the value held under the other. It's the classic ambiguous alias problem.

For vector registers, incidentally, you don't hit this problem, because the same alias problem that prevents keeping things in registers would also prevent vectorization.

State save/restore on procedure calls is not so bad a problem for registers, because the state variables typically have no aliases and hence can be kept in registers (if you have enough of them, as in some RISC designs).
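The a[i]/a[j] argument can be made concrete with a small C sketch. It is not from the original post: all function names are hypothetical, and the locals `ai` and `vreg` merely stand in for a scalar register and a vector register. `sum_memory` rereads a[i] after the store to a[j], as a compiler must when it cannot prove i != j; `sum_promoted` keeps a[i] in a local across the store, which goes stale exactly when i == j. The second pair shows the same ambiguity blocking vectorization: loading a whole strip into a "vector register" before storing it matches the scalar loop only when source and destination do not overlap.

```c
#include <stdio.h>

/* Every use of a[i] goes to memory, so a store through a[j]
   is always seen by the later read of a[i]. */
int sum_memory(int *a, int i, int j)
{
    int s = a[i];      /* first read of a[i] */
    a[j] = 100;        /* store through a[j]; may be the same word */
    s += a[i];         /* reread a[i] from memory */
    return s;
}

/* "Register-promoted" version: a[i] is loaded once into a local
   (standing in for a register) and reused. Only legal if the
   compiler can prove i != j. */
int sum_promoted(int *a, int i, int j)
{
    int ai = a[i];     /* a[i] held in a "register" */
    int s = ai;
    a[j] = 100;        /* the store does not update the register copy */
    s += ai;           /* stale value if i == j */
    return s;
}

/* Scalar loop: when dst == src + 1, iteration k reads the value
   written by iteration k-1. */
void copy_scalar(int *dst, const int *src, int n)
{
    for (int k = 0; k < n; k++)
        dst[k] = src[k];
}

#define VLEN 4  /* hypothetical vector length, chosen arbitrarily */

/* "Vectorized" loop: load up to VLEN elements into a temporary
   (standing in for a vector register), then store them all.
   Equivalent to copy_scalar only if dst and src do not overlap. */
void copy_vector(int *dst, const int *src, int n)
{
    int vreg[VLEN];
    for (int k = 0; k < n; k += VLEN) {
        int len = (n - k < VLEN) ? n - k : VLEN;
        for (int m = 0; m < len; m++)
            vreg[m] = src[k + m];   /* vector load */
        for (int m = 0; m < len; m++)
            dst[k + m] = vreg[m];   /* vector store */
    }
}
```

With a = {1, 2, 3}, sum_memory(a, 0, 0) returns 101 while sum_promoted on a fresh copy returns 2; with i = 0 and j = 1 both return 2. Likewise, copy_scalar(e + 1, e, 4) on {1, 2, 3, 4, 5} smears the 1 across the whole array, whereas copy_vector on the same input yields {1, 1, 2, 3, 4}.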
As for the number of registers, we've recently found that a perfect (or just really good) global register allocator should rarely want more than about 10 registers per processor; it's a pity there are so few good register allocators out in the real world.... By the way, if you ignore loops, the cache size also can be a very small number.

Chi-Hung Chi and I will have a paper discussing the aliasing problem and how it relates to registers vs. cache at SUPERCOMPUTING '88: "CRegs: A New Kind of Memory for Referencing Arrays and Pointers." The CReg is a new structure that is managed like registers but treats aliased objects more like a cache does... patent perhaps-soon-to-be-pending? Anyway, I'm willing to make the paper available on a limited basis before the conference.

Compiler-oriented Architecture Researcher from Purdue
Hank Dietz, Ass't Prof. of EE