Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!pioneer!eugene
From: eugene@pioneer.arpa (Eugene N. Miya)
Newsgroups: comp.arch
Subject: Re: Memory latency / cacheing / scientific programs
Keywords: cache latency bus memory
Message-ID: <10978@ames.arc.nasa.gov>
Date: 28 Jun 88 21:19:17 GMT
References: <243@granite.dec.com>
Sender: usenet@ames.arc.nasa.gov
Reply-To: eugene@pioneer.UUCP (Eugene N. Miya)
Organization: NASA Ames Research Center, Moffett Field, Calif.
Lines: 45

In article <243@granite.dec.com> jmd@granite.dec.com (John Danskin) writes:
>
>I am interested in running a class of programs that process large
>(bigger than cache but smaller than memory) arrays of data repeatedly.
> . . .
>How has prefetching worked out? Alan Jay Smith seems to recommend
>prefetching given that it is implemented well. Has anybody been able to
>do it?
> . . .
>Is my class of problem interesting? It is my understanding that many
>large scientific programs have similar behavior, but that the standard
>UNIX timesharing load (whatever that is) has significantly different behavior.

John, enjoyed lunch with you.
Yes, your problem is interesting, but few are willing to do anything
about it.  Obviously, significance is in the eye of the beholder;
personally, I don't think the two workloads differ much (gross
generalization, right?).

To wit: Alan stopped reading comp.arch about a year ago, but I did mail
the note to him.  He didn't have any special comment.  In the interest
of discussion, however, Alan and I have had a running argument about
how to deal with memory hierarchies.

Users currently write codes that run optimally on vector machines.
This contorts the code with compiler directives, tight vector loops
(rather than looser ones), and so forth.  It's still portable, but
looks funny at times.
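
To make the contortion concrete, here is a rough sketch in C (made up
for this post, not lifted from any real production code): the same
elementwise update written once as a looser loop with a branch inside,
and once as the tight loops a vectorizer is happier with.

    /* Looser form: one loop doing the update and the clamp together.
       The data-dependent branch can keep an older vectorizer from
       firing on the whole loop. */
    void update_loose(int n, double *a, double *b, double *c, double s)
    {
        int i;
        for (i = 0; i < n; i++) {
            a[i] = b[i] + s * c[i];
            if (a[i] < 0.0)
                a[i] = 0.0;
        }
    }

    /* "Tight" form: the same work split into two simple loops so each
       one vectorizes cleanly; real codes also sprinkle vendor
       directives (as comments or pragmas) above loops like these. */
    void update_tight(int n, double *a, double *b, double *c, double s)
    {
        int i;
        for (i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
        for (i = 0; i < n; i++)
            if (a[i] < 0.0)
                a[i] = 0.0;
    }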

Alan's argument is that if a machine has a cache, codes can be written
to assume a cache: do as much work as possible in the fast cache with a
minimum of faulting.  The problem is that people don't have a feel for
what it costs to bring data in, how much work is break-even, or what
fraction of a full cache they can count on (if I have a 16K cache, how
much work can I do?).  There are no guideposts, unlike the vector world.
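
For what it's worth, here is a minimal sketch in C of what "write for
the cache" means, with an assumed 16K-byte cache and a block size I
picked out of the air: instead of making several full sweeps over an
array bigger than the cache, make all the passes over one cache-sized
block while it is resident, then move on.  The arithmetic is identical;
only the memory traffic changes, and the block size and break-even
point are exactly the numbers nobody has a feel for.

    #include <stddef.h>

    #define CACHE_BYTES (16 * 1024)        /* assumed cache size */
    #define BLOCK (CACHE_BYTES / (2 * sizeof(double)))  /* headroom */

    /* Unblocked: three full sweeps.  For n much bigger than the
       cache, every sweep re-fetches the whole array from memory. */
    void sweeps(double *x, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++) x[i] = 2.0 * x[i] + 1.0;
        for (i = 0; i < n; i++) x[i] = x[i] * x[i];
        for (i = 0; i < n; i++) x[i] = x[i] - 3.0;
    }

    /* Blocked: the same three elementwise steps, applied to one
       cache-sized block at a time while it is (we hope) resident. */
    void sweeps_blocked(double *x, size_t n)
    {
        size_t lo, hi, i;
        for (lo = 0; lo < n; lo = hi) {
            hi = (n - lo > BLOCK) ? lo + BLOCK : n;
            for (i = lo; i < hi; i++) x[i] = 2.0 * x[i] + 1.0;
            for (i = lo; i < hi; i++) x[i] = x[i] * x[i];
            for (i = lo; i < hi; i++) x[i] = x[i] - 3.0;
        }
    }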

Perhaps people (grad students?) should write a few articles
on writing codes specifically for performance.  I would not;
it's too transitory a topic.  Just what are the tradeoffs?

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Send mail, avoid follow-ups.  If enough, I'll summarize."