Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!pyramid!voder!apple!bcase
From: bcase@apple.UUCP (Brian Case)
Newsgroups: comp.arch
Subject: Re: Horizontal pipelining
Message-ID: <6832@apple.UUCP>
Date: Wed, 25-Nov-87 12:18:23 EST
Article-I.D.: apple.6832
Posted: Wed Nov 25 12:18:23 1987
Date-Received: Sun, 29-Nov-87 02:10:42 EST
References: <201@PT.CS.CMU.EDU> <388@sdcjove.CAM.UNISYS.COM> <988@edge.UUCP> <958@winchester.UUCP> <11444@sci.UUCP>
Reply-To: bcase@apple.UUCP (Brian Case)
Organization: Apple Computer Inc., Cupertino, USA
Lines: 25

In article <11444@sci.UUCP> kenm@sci.UUCP (Ken McElvain) writes:

   [Seems to be talking about something like the PPUs of the old Cybers.]

>I agree that cache [or TLB] hit rates will almost certainly go down.
>However, miss penalties will also drop.  It is quite possible that
>a cache fill could happen in the time it takes for the barrel
>to turn around.
>A ten stage barrel processor running at 25Mhz would easily allow
>over 300ns for a cache fill before it cost another instruction slot.
>The performance limit here is likely to be the bandwidth of the
>cache fill mechanism.

Yes, but if a fair fraction of the processors in the barrel are missing
at about the same time (say 3 or so), then your memory system will have
to be multiported (or very fast, in which case why not just build one
fast processor?).  This doesn't invalidate what you are saying; it's
just an observation.
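
To put rough numbers on that, here is a back-of-the-envelope sketch
in Python.  The function names are mine, and the model is an
assumption: a single-ported memory that services outstanding misses
one at a time, with all the misses treated as arriving together.
Real timing depends on which slots miss, so take it as a crude bound,
not a design figure.

```python
def barrel_rotation_ns(stages, clock_mhz):
    """Time for the barrel to come back around to the same processor."""
    cycle_ns = 1000.0 / clock_mhz
    return stages * cycle_ns


def fill_budget_ns(stages, clock_mhz):
    """Time a missing processor has before its next issue slot arrives.

    The slot in which the miss occurred is already spent, so the fill
    has the remaining (stages - 1) cycles of the rotation.
    """
    cycle_ns = 1000.0 / clock_mhz
    return (stages - 1) * cycle_ns


def per_fill_time_ns(stages, clock_mhz, concurrent_misses):
    """Fill time each miss may take if a single-ported memory serves
    the misses serially (assumption: they all arrive together)."""
    return fill_budget_ns(stages, clock_mhz) / concurrent_misses


if __name__ == "__main__":
    # Ken's example: ten-stage barrel at 25 MHz, with my "3 or so" misses.
    print(barrel_rotation_ns(10, 25))   # full rotation
    print(fill_budget_ns(10, 25))       # budget for one miss
    print(per_fill_time_ns(10, 25, 3))  # budget per miss with 3 misses
```

With one miss outstanding the fill has nine 40ns cycles, which agrees
with the "over 300ns" figure.  With three outstanding and one port,
each fill has to fit in about a third of that, which is the "very
fast" memory I mean above.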

>Another issue is the instruction set.  It's not clear that you want
>a bunch of registers.  It may be much better to do more of a memory
>to memory architecture.  (I would recommend keeping some base registers).
>A number of other areas also have some surprising tradeoffs.

I fail to see why memory-to-memory would be better than registers.  Can
you give some evidence?  Also, which other areas have surprising
tradeoffs, and what are they?