Path: utzoo!mnetor!uunet!lll-winken!lll-lcc!pyramid!voder!apple!bcase
From: bcase@apple.UUCP (Brian Case)
Newsgroups: comp.arch
Subject: Re: Why is SPARC so slow?
Message-ID: <6964@apple.UUCP>
Date: 10 Dec 87 19:46:04 GMT
References: <1078@quacky.UUCP> <8809@sgi.SGI.COM>
Reply-To: bcase@apple.UUCP (Brian Case)
Organization: Apple Computer Inc., Cupertino, USA
Lines: 71

In article <8809@sgi.SGI.COM> baskett@baskett writes:
[A lot of well-considered stuff about why the current and soon-to-be SPARC machines are/will be "so slow."]

Forest, I agree completely with your reasoning on most points: slow loads and stores, slow branches, and (intertwined with the previous) only one bus, cycled only once per cycle.

>necessarily mean that the SPARC architecture has problems but I'd be
>reluctant to accept SPARC as the basis for an Application Binary
>Interface standard until I saw some evidence that high performance
>implementations of SPARC are possible.

I also agree that a standardization of this kind is not the right idea. But I do believe a high-performance implementation of the SPARC is possible; by high performance, I mean close enough to the others in its class that the difference is not worth much worry.

Without large on-chip caches, processors in the class we are discussing need chip-boundary bandwidth commensurate with their on-chip data and instruction consumption rates. The lack of such bandwidth is, in my opinion, the main failing of the SPARC implementation (I sketch a toy calculation below). Notice that the Cypress version will be no better, and perhaps worse (the floating-point bus is gone!).

With regard to:

>The R2000
>also has a single address bus and a single data bus but it can use them
>twice per cycle. This means you can then split your cache into an
>instruction cache and a data cache and make use of the extra bandwidth
>by fetching an instruction every cycle in spite of loads and stores.
...
>The separate instruction and data cache only run
>at single cycle rates but they run a half cycle out of phase with each
>other so it all works out. (Pretty slick, don't you think?)

Yes, I do think it is pretty slick, but I also think it becomes a liability at clock speeds above 16 MHz (and maybe even at 16 MHz): preventing bus crashes (i.e., meeting real-world timing constraints) can be a problem. I am sure, though, that MIPS has a plan to fix this. It sure seems like the way to go at 8 MHz.

And:

>Since MIPS and Sun seem to be producing these systems with similar
>technologies at similar clock rates at similar times in history, these
>differences in the cycle counts for our most favorite and popular
>instructions seem to go a long way toward explaining why SPARC is so
>slow.
>Forest Baskett

Thanks again for the analysis. However, I have one last point of contention: SUN is not MIPS in many respects, not the least of which is dedication to working with fabs and process technologies. SUN's business seems to be standards. In light of their constraints, I applaud their success in squeezing so much onto a lowly gate array; I am sure one of their chief concerns was a future ECL implementation.
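To put rough numbers on the bandwidth point above, here is a toy calculation in C. Every figure in it (clock rate, load/store frequency, reference widths) is an assumption I picked for illustration, not a measurement of any real part:

    #include <stdio.h>

    int main(void)
    {
        double clock_mhz = 16.0;  /* assumed clock rate (MHz)            */
        double ifetch    = 4.0;   /* bytes of I-fetch wanted per cycle   */
        double ls_freq   = 0.30;  /* assumed fraction of cycles doing a
                                     load or store                       */
        double dbytes    = 4.0;   /* bytes per data reference            */

        /* bandwidth the pipeline wants, vs. what one 4-byte bus
           cycled once per cycle can deliver (both in MB/s) */
        double need = clock_mhz * (ifetch + ls_freq * dbytes);
        double have = clock_mhz * 4.0;

        printf("need %.1f MB/s, have %.1f MB/s\n", need, have);
        return 0;
    }

With these assumed numbers the pipeline wants 83.2 MB/s and the single once-per-cycle bus delivers 64; that gap, however you juggle the exact figures, is the arithmetic behind the complaint above.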
Sure, the SPARC processor core (the stuff that actually does the work, minus the register file) is virtually the same as anyone else's in function and in size (at least I think this is true), and with that in mind, the MIPS R2000, the Am29000, or whatever are all equally scalable (the other components on these chips are largely implementations of integrated-system functions, not architectural ones; the Branch Target Cache of the 29000 is NOT an architectural feature). But by choosing register windows, which let an implementation vary the number of registers in window increments, and an otherwise very simple definition, SUN simply did the best they could to make future implementations easy (see the P.S. for a toy model).

However, I am a little dismayed (though happy for SUN) at the incredible backing SPARC is getting in the world of huge, influential conglomerates. I think the standardization of UNIX is good, but the standardization of processors is BAD. We should have a way to achieve processor independence without necessarily transporting source code (and in fact, I have an idea for this, but I can't share it). We must not bet our future on a given processor!

Comments?
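P.S. To make the register-window point concrete, here is a toy C model of an overlapping-window register file. Everything in it is a simplification I made up for illustration (the real SPARC register numbering and the direction the window pointer moves differ in detail); the only point is that NWINDOWS is an implementation knob, not an architectural one:

    #include <stdio.h>

    #define NWINDOWS 7                    /* the implementation's choice  */
    #define NPHYS    (8 + NWINDOWS * 16)  /* globals + 16 regs per window */

    static int regfile[NPHYS];
    static int cwp;                       /* current window pointer */

    /* Map architectural register r (0..31) onto a physical register.
       r0-r7 are shared globals; r8-r31 live in the current window,
       with the top 8 of one window overlapping the bottom 8 of the
       next -- that overlap is the cheap parameter passing windows
       exist for. */
    static int *reg(int r)
    {
        if (r < 8)
            return &regfile[r];
        return &regfile[8 + (cwp * 16 + (r - 8)) % (NWINDOWS * 16)];
    }

    /* SAVE/RESTORE just move cwp; no registers are copied. */
    static void save(void)    { cwp = (cwp + 1) % NWINDOWS; }
    static void restore(void) { cwp = (cwp + NWINDOWS - 1) % NWINDOWS; }

    int main(void)
    {
        *reg(24) = 42;   /* caller writes an outgoing argument    */
        save();          /* callee's window: the same cell is r8  */
        printf("callee sees %d\n", *reg(8));
        restore();
        return 0;
    }

Change NWINDOWS and the register file grows or shrinks in sixteen-register steps while the architecture, and the compiled code, stay exactly the same.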