Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!pyramid!voder!apple!bcase
From: bcase@apple.UUCP (Brian Case)
Newsgroups: comp.arch
Subject: Re: Why is SPARC so slow?
Message-ID: <6964@apple.UUCP>
Date: 10 Dec 87 19:46:04 GMT
References: <1078@quacky.UUCP> <8809@sgi.SGI.COM>
Reply-To: bcase@apple.UUCP (Brian Case)
Organization: Apple Computer Inc., Cupertino, USA
Lines: 71

In article <8809@sgi.SGI.COM> baskett@baskett writes:
    [A lot of well-considered stuff about why the current and soon-to-be
     SPARC machines are/will be "so slow."]

Forest, I agree completely with your reasoning on most points:  slow
loads and stores, slow branches, and (intertwined with the previous
two) a single bus that is cycled only once per clock.

>necessarily mean that the SPARC architecture has problems but I'd be
>reluctant to accept SPARC as the basis for an Application Binary 
>Interface standard until I saw some evidence that high performance
>implementations of SPARC are possible.

I also agree that a standardization of this kind is not the right idea.
But I believe it is possible to have a high-performance implementation
of the SPARC.  By high performance, I mean close enough to others in its
class so as to make the difference not worth too much worry.  Without large,
on-chip caches, processors in the class about which we are speaking need
chip-boundary bandwidth commensurate with on-chip data and instruction
consumption rates.  The lack of such bandwidth is, in my opinion, the
main failing of the SPARC implementation.  Notice that the Cypress
version will be no better, and perhaps worse (the floating-point bus
is gone!).
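To put rough numbers on the bandwidth point (my figures, chosen only for
illustration, not anyone's measured instruction mix): a single-issue part
wants an instruction word every cycle plus data traffic for its loads and
stores, while one bus cycled once per cycle can deliver only the
instruction stream.

```python
# Back-of-envelope sketch: on-chip consumption vs. what a single bus,
# cycled once per cycle, can supply.  All figures are assumptions.

CLOCK_MHZ = 16              # assumed clock for parts in this class
WORD_BYTES = 4              # 32-bit instructions and data
LOAD_STORE_FRACTION = 0.3   # assumed fraction of instructions touching memory

# Demand: one instruction fetch per cycle, plus data traffic.
demand = WORD_BYTES * (1 + LOAD_STORE_FRACTION) * CLOCK_MHZ   # MB/s
# Supply: one transfer per cycle on one bus.
supply = WORD_BYTES * CLOCK_MHZ                               # MB/s

print(f"demand   ~{demand:.0f} MB/s")
print(f"supply   ~{supply:.0f} MB/s")
print(f"shortfall ~{demand - supply:.0f} MB/s -> stalls on every load/store")
```

With these assumed numbers the core wants about 30% more bytes per second
than the pins can move, which is exactly the stall-on-load behavior being
discussed.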

With regard to:

>The R2000
>also has a single address bus and a single data bus but it can use them
>twice per cycle.  This means you can then split your cache into an
>instruction cache and a data cache and make use of the extra bandwidth
>by fetching an instruction every cycle in spite of loads and stores.
...
>The separate instruction and data cache only run
>at single cycle rates but they run a half cycle out of phase with each
>other so it all works out.  (Pretty slick, don't you think?)

Yes, I do think it is pretty slick, but I also think this is a liability
at clock speeds higher than 16 MHz (and maybe even at 16 MHz).  I am sure,
though, that MIPS has a plan to fix this problem.  It sure seems like the
way to go at 8 MHz.  Preventing bus crashes (i.e. meeting real-world
timing constraints) can be a problem.
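As a toy illustration of the scheme (my own sketch of the idea, not
MIPS's actual timing): treat each cycle as two bus phases, with
instruction fetches on one phase and data accesses on the other, and an
instruction still completes every cycle despite the loads and stores.

```python
# Toy model of a bus multiplexed twice per cycle: the I-cache owns
# phase 0, the D-cache owns phase 1, a half cycle out of phase.
# The instruction mix below is an arbitrary example.

program = ["load", "add", "store", "add", "load"]

bus_slots = []  # (cycle, phase, use)
for cycle, insn in enumerate(program):
    bus_slots.append((cycle, 0, f"ifetch {insn}"))       # fetch every cycle
    if insn in ("load", "store"):
        bus_slots.append((cycle, 1, f"data   {insn}"))   # same cycle, other phase

cycles_used = len(program)
print(f"{len(bus_slots)} bus transfers in {cycles_used} cycles")
```

Eight transfers fit in five cycles on one physical bus; with only one
transfer per cycle, the three memory operations would each have cost an
extra cycle.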

And:

>Since MIPS and Sun seem to be producing these systems with similar
>technologies at similar clock rates at similar times in history, these
>differences in the cycle counts for our most favorite and popular
>instructions seem to go a long way toward explaining why SPARC is so
>slow.
>Forest Baskett

Thanks again for the analysis.  However, I have one last point of contention.
SUN is not MIPS in many respects, not the least of which is dedication to
working with fabs and process technologies.  SUN's business seems to be
standards.  In light of their constraints, I applaud their success in
squeezing so much on a lowly gate array.  I am sure one of their chief
concerns was a future ECL implementation.  Sure, the SPARC processor core
(the stuff that actually does the work, minus the register file) is virtually
the same as anyone else's in function and in size (at least I think this
is true), and with that in mind, the MIPS R2000, the Am29000, or whatever
are all equally scalable (the other components on these chips are largely
implementations of integrated-system functions, not architectural ones;
the Branch Target Cache of the 29000 is NOT an architectural feature).  But
by choosing register windows (which lets them vary the number of registers,
in window increments, for a given implementation) and a very simple
definition otherwise, SUN simply did the best they could to make future
implementation easy.  However, I am a little dismayed (but happy for SUN)
at the incredible backing SPARC is getting in the world of huge, influential
conglomerates.  I think the standardization of UNIX is good, but the
standardization of processors is BAD.  We should have a way to achieve
processor independence without necessarily transporting source code (and
in fact, I have an idea for this, but can't share it).  We must not bet our
future on a given processor!  Comments?
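On the register-window point above, here is a simplified sketch of the
addressing (my own simplification, not the architecture spec): NWINDOWS is
the implementation's free parameter, each new window costs 16 physical
registers because the caller's outs alias the callee's ins.

```python
# Simplified SPARC-style register window addressing.  NWINDOWS is
# implementation-chosen (7 here as an example); the visible set is
# always 8 globals + 24 windowed registers.

NWINDOWS = 7
GLOBALS = 8
PHYS_REGS = GLOBALS + NWINDOWS * 16   # 16 new registers per window

def phys(cwp, r):
    """Map visible register r (0-31) at window cwp to a physical index."""
    if r < 8:
        return r                      # %g0-%g7: globals, shared by all windows
    # r8-r15 outs, r16-r23 locals, r24-r31 ins.  A call decrements CWP,
    # so the caller's outs land on the same cells as the callee's ins.
    return GLOBALS + (cwp * 16 + (r - 8)) % (NWINDOWS * 16)

caller, callee = 1, 0                 # callee's window = caller's minus one
assert phys(caller, 8) == phys(callee, 24)   # out 0 aliases in 0
print(f"{PHYS_REGS} physical registers for {NWINDOWS} windows")
```

The point for future implementations is visible in `PHYS_REGS`: a bigger
or faster part changes NWINDOWS and nothing else in the programmer's
model.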