Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!ubvax!vsi1!wyse!mips!earl
From: earl@mips.COM (Earl Killian)
Newsgroups: comp.arch
Subject: Re: RISC machines and scoreboarding
Message-ID: <2547@wright.mips.COM>
Date: 6 Jul 88 03:46:35 GMT
References: <1362@oakhill.UUCP>
Lines: 44
In-reply-to: mpaton@oakhill.UUCP's message of 1 Jul 88 20:57:15 GMT

In article <1362@oakhill.UUCP> mpaton@oakhill.UUCP (Michael Paton) writes:

mp> The MIPS processors do not snoop their bus and therefore leave
mp> memory coherence to the write-through mechanism.  In
mp> multiprocessing applications, the memory bus can become saturated
mp> with a few processors on the bus (~4?).  Write-back caches cause a
mp> sufficient reduction in memory bus traffic to allow twice the
mp> number of processing ensembles to utilize the bus.

Write-through to a 32-bit bus does indeed limit you to about 4
processors (the maximum supported by the 88000).  If you want more
than that, use a 64-bit bus (~8 processors), or add a secondary cache
(which has its own benefits) and make it write-back.  Both approaches
have already been implemented in R2000-based systems.

When you build an R3000-based MP, you don't have to limit the amount of
cache per processor (unlike, e.g., the 88000, which allows only 16KB
per processor in a 4-processor system).  If you're building an MP,
presumably you're interested in performance, so it seems strange to
cripple each processor with a small cache.


mp> ...in particular, we attempted to beat on the SRAM technology less
mp> hard.  If we are correct, this should be more scalable in the
mp> future (read ECL/GaAs) as off-chip delays approach .4 cycle.

It's hard to envision ECL output drive times approaching 0.4 clocks;
modern ECL parts (e.g. Sony's CXB1100Q 3-NOR) can receive signals from
off-chip, do the NOR function, and drive off-chip again (using 100K
levels) in 390 picoseconds. {EDN 6-23-88, p. 97}  Setting
(0.4 * Tclock) = 390ps gives Tclock = 975 picoseconds (1.03 GHz) !!!
A more "believable" clock period might be 4ns (a la Cray-2), in which
case the off-chip delay is about 0.1 clock, a smaller fraction of the
cycle than you quote for your CMOS design.
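
For anyone who wants to check the arithmetic, here is a minimal sketch
in Python.  The function names are made up for illustration; the only
inputs are the 390ps figure, the 0.4-clock fraction, and the 4ns clock
period mentioned above.

```python
# Rough check of the off-chip delay arithmetic.
# All names here are illustrative, not taken from any data sheet.

def clock_period_ps(delay_ps, fraction_of_cycle):
    """Clock period (ps) at which delay_ps equals the given fraction of a cycle."""
    return delay_ps / fraction_of_cycle

def frequency_ghz(period_ps):
    """Clock frequency in GHz for a period given in picoseconds."""
    return 1000.0 / period_ps

def delay_fraction(delay_ps, period_ps):
    """Fraction of one clock cycle consumed by delay_ps."""
    return delay_ps / period_ps

ECL_DELAY_PS = 390.0    # receive, NOR, and drive off-chip (figure quoted above)

# Clock period at which 390ps would be 0.4 of a cycle, and the matching frequency.
period = clock_period_ps(ECL_DELAY_PS, 0.4)
print("Tclock = %.0f ps, %.2f GHz" % (period, frequency_ghz(period)))

# Fraction of a 4ns (4000ps) cycle taken by the same 390ps delay.
print("fraction of a 4ns clock = %.4f" % delay_fraction(ECL_DELAY_PS, 4000.0))
```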


mp> Alternatively, our design costs less to manufacture in high volume
mp> and allow less costly SRAM parts than the MIPS Co. design.

An R3000 may require faster SRAMs, but these are multiple-sourced,
off-the-shelf, commodity devices, and the price of 30 16Kx4 20ns SRAMs
is actually lower than that of eight 88200s (sole-sourced).
-- 
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086