Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: Memory latency / cacheing / scientific programs
Keywords: cache latency bus memory
Message-ID: <11022@ames.arc.nasa.gov>
Date: 29 Jun 88 16:07:16 GMT
References: <243@granite.dec.com>
Reply-To: lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster)
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 32

In article <243@granite.dec.com> jmd@granite.dec.com (John Danskin) writes:
>
>I am interested in running a class of programs that process large
>(bigger than cache but smaller than memory) arrays of data repeatedly.

In some cases it is possible to reorganize array accesses so that, for
example, columns of the array are reused.  The generic example of how to
do this is Dongarra and Eisenstat's "Squeezing the most out of an
algorithm in Cray Fortran", ANL/MCS-TM-9 (Argonne Nat. Lab. - May 83).
This is the algorithm used in the "Matrix Vector" version of the Linpack
benchmark.  This type of reorganization can help cache, TLB translation,
and paging efficiency on some scalar machines (if they have fast enough
floating point to notice), and vector performance on both memory-to-memory
(e.g. Cyber 205) and vector register (e.g. Cray) vector machines.

If your algorithm requires a complete pass through the array for a small
amount of computation done per step, clever coding can only eliminate
unnecessary memory references.  It will not replace the full sweep
through memory that you have to do.  So the key to seeing whether any
improvement is possible is to check whether the elements of the array are
being used more than once in each iteration.

--
Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
Moffett Field, CA 94035
Phone:  (415)694-6117