Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!ames!lamaster
From: lamaster@ames.arc.nasa.gov (Hugh LaMaster)
Newsgroups: comp.arch
Subject: Re: Memory latency / cacheing / scientific programs
Keywords: cache latency bus memory
Message-ID: <11022@ames.arc.nasa.gov>
Date: 29 Jun 88 16:07:16 GMT
References: <243@granite.dec.com>
Reply-To: lamaster@ames.arc.nasa.gov.UUCP (Hugh LaMaster)
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 32

In article <243@granite.dec.com> jmd@granite.dec.com (John Danskin) writes:
>
>I am interested in running a class of programs that process large
>(bigger than cache but smaller than memory) arrays of data repeatedly.


In some cases it is possible to reorganize array accesses so that, for
example, columns of the array are reused.  The generic example of how
to do this is Dongarra and Eisenstat's "Squeezing the most out of an
algorithm in Cray Fortran", ANL/MCS-TM-9 (Argonne Nat. Lab. - May 83).
This is the algorithm used in the "Matrix Vector" version of the Linpack
benchmark.  This type of reorganization can improve cache, TLB, and
paging efficiency on some scalar machines (if their floating point is
fast enough to notice), and vector performance on both memory-to-memory
(e.g. Cyber 205) and vector-register (e.g. Cray) vector machines.
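As a rough sketch of the kind of reorganization involved (the
Dongarra/Eisenstat examples are in Cray Fortran; this is a hypothetical
C rendering, not code from their report): computing y = y + A*x by
columns with the column loop unrolled lets each sweep through y carry
the contributions of two columns, halving the number of full passes
over y.

```c
#include <stddef.h>

/* Column-oriented y = y + A*x with the outer (column) loop unrolled
   by two.  A is stored column-major (Fortran order) with leading
   dimension lda.  Each pass over y now folds in two columns of A,
   so y is swept through memory half as often.  Illustrative only. */
void matvec_unrolled(size_t n, const double *A, size_t lda,
                     const double *x, double *y)
{
    size_t j = 0;
    for (; j + 1 < n; j += 2) {          /* two columns per sweep of y */
        const double *a0 = A + j * lda;
        const double *a1 = A + (j + 1) * lda;
        double x0 = x[j], x1 = x[j + 1];
        for (size_t i = 0; i < n; i++)
            y[i] += a0[i] * x0 + a1[i] * x1;
    }
    for (; j < n; j++) {                 /* leftover odd column */
        const double *a0 = A + j * lda;
        for (size_t i = 0; i < n; i++)
            y[i] += a0[i] * x[j];
    }
}
```

Deeper unrolling (four or eight columns) cuts the traffic on y further,
up to the point where the register file or cache runs out.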

If your algorithm requires a complete pass through the array with only
a small amount of computation per step, clever coding can only
eliminate unnecessary memory references; it will not remove the full
sweep through memory that you have to make.  So the key to seeing
whether any improvement is possible is to check whether the elements of
the array are used more than once in each iteration.
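To illustrate the reuse test (a hypothetical example, not from the
referenced report): in a DAXPY-style sweep, every element of x and y is
referenced exactly once per pass, so the pass through memory is
irreducible and no amount of loop reorganization helps.

```c
#include <stddef.h>

/* DAXPY: y = y + alpha*x.  Each element of x and y is touched exactly
   once per call -- the reuse test fails, so blocking or reordering
   cannot reduce the memory traffic; the full sweep is the work. */
void daxpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```

Contrast this with a matrix-vector product, where each element of x is
reused n times: there, reorganizing the loops can keep the reused data
in registers or cache instead of refetching it from memory.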


-- 
  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117