Path: utzoo!attcan!uunet!wyse!mips!earl
From: earl@mips.COM (Earl Killian)
Newsgroups: comp.arch
Subject: Re: RISC machines and scoreboarding
Message-ID: <2483@gumby.mips.COM>
Date: 1 Jul 88 02:14:13 GMT
References: <1082@nud.UUCP> <2438@winchester.mips.COM> <1098@nud.UUCP> <2459@gumby.mips.COM> <1110@nud.UUCP>
Lines: 52
In-reply-to: tom@nud.UUCP's message of 29 Jun 88 18:23:09 GMT

In article <1110@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:

TA#     I didn't design the 88100 (I am not a chip designer at all).
TA# However, I doubt that any speed difference is due to a tradeoff made
TA# to provide FP pipelining.  Whether the FP unit was pipelined or not, I
TA# think the FP latencies would still be the same.  Do you have some reason
TA# to think that not pipelining can decrease FP latency?  If no increase
TA# in latency is incurred to provide pipelining, then providing it can only
TA# help performance.

.........................................................................
Yes: I believe the lack of pipelining partially explains why the R3010
has lower latency than the pipelined 88k.  Do you have an alternative
explanation?

For example, non-pipelined functional units can reuse hardware
resources.  If an fp add takes 2 shifts, a non-pipelined adder can make
do with one shifter, but a pipelined adder needs 2.  The duplication is
especially costly for units like the divider, which reuse some hardware
(e.g. the quotient digit lookup table) many, many times: an N-stage
pipelined divider would need N of these lookup tables.
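
To make the counting concrete, here is a minimal Python sketch.  The
stage lists are made up for illustration (they are not the actual R3010
or 88100 datapaths), and the only assumption is that a non-pipelined
unit needs one copy of each resource while a fully pipelined unit needs
one copy per stage that uses it.

```python
from collections import Counter

def copies_needed(stages, pipelined):
    """Return {resource: copies} for a unit whose stages use the given resources.

    stages    -- list of resource names, one per stage, in execution order
    pipelined -- True if every stage may hold a different operation at once
    """
    uses = Counter(stages)
    if pipelined:
        # Every stage can be busy at the same time, so each use needs its own copy.
        return dict(uses)
    # Only one operation is in flight, so each resource can be reused.
    return {resource: 1 for resource in uses}

# Hypothetical fp add: align shift, add, normalize shift (2 shifts, as in the text).
fp_add = ["shifter", "adder", "shifter"]
# Hypothetical divider that consults its quotient digit table on each of 8 steps.
fp_div = ["qdigit_table"] * 8

print(copies_needed(fp_add, pipelined=False))
print(copies_needed(fp_add, pipelined=True))
print(copies_needed(fp_div, pipelined=True))
```

The step count of 8 for the divider is arbitrary; the point is only that
the table count grows with the number of pipelined steps.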

By avoiding the need for extra hardware, the non-pipelined implementation
frees up valuable chip area.  In the R3010 we used that area for
(i) a 64-bit x 16-word, 4-port register file; (ii) a divider that is
separate from, and independent of, the multiplier; (iii) delay-optimized
physical layout and hand-tweaked circuit design employing LARGE
transistors; (iv) a resource scheduler in the control unit that permits
4 instructions to execute concurrently.
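
As a rough illustration of what a resource scheduler for non-pipelined
units has to track, here is a toy Python model.  It is a sketch under
stated assumptions, not the R3010 control logic: it assumes in-order
issue of at most one instruction per cycle, models only three
hypothetical units (add, mul, div), ignores data dependencies, and
borrows the dp latencies from the table later in this post.

```python
def schedule(program, latency):
    """In-order issue, at most one instruction per cycle, non-pipelined units.

    program -- list of unit names, one per instruction, in program order
    latency -- {unit: cycles the unit stays busy per operation}
    Returns a list of (issue_cycle, done_cycle) pairs.
    """
    free_at = {unit: 0 for unit in latency}  # cycle at which each unit is idle again
    cycle = 0
    result = []
    for unit in program:
        # Stall until the unit is free (structural hazard only).
        issue = max(cycle, free_at[unit])
        done = issue + latency[unit]
        free_at[unit] = done
        result.append((issue, done))
        cycle = issue + 1  # the next instruction issues no earlier than the next cycle
    return result

dp_latency = {"add": 2, "mul": 5, "div": 19}  # dp cycles from the table in this post
program = ["div", "mul", "add", "add"]
for unit, (issue, done) in zip(program, schedule(program, dp_latency)):
    print(f"{unit}: issue at {issue}, done at {done}")
```

In this model the divide, multiply and first add overlap, while the
second add stalls until the adder is free again.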
.........................................................................


TA#     Would you post the R3010 FP stats?  Not that I doubt you, I'm just
TA# interested in how each particular R3010 FP instruction compares to the
TA# equivalent 88K instruction.

.........................................................................
		88100	R3010	cycle
Operation	cycles  cycles	ratio
======================================
sp add		  5	  2	  2.5
dp add		  6	  2	  3.0
sp mul		  5	  4	  1.2
dp mul		 10	  5	  2.0
sp div		 30	 12	  2.5
dp div		 60	 19	  3.2
sp convert	  5	1-3	1.7-5
dp convert	  6	1-3	  2-6
abs/neg/mov	  5?	  1
.........................................................................
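
To see how latency and pipelining trade off, here is a small Python
sketch using the sp add figures from the table above.  It assumes an
idealized pipelined unit that accepts a new operation every cycle, which
may not match the real 88100, and it looks at a single functional unit
in isolation.

```python
def cycles_pipelined(n, latency, dependent):
    """Idealized pipelined unit: accepts one new operation per cycle."""
    if dependent:
        return n * latency        # each op waits for the previous result
    return latency + (n - 1)      # fill the pipe once, then one result per cycle

def cycles_unpipelined(n, latency):
    """Non-pipelined unit: busy for the full latency of every operation."""
    return n * latency            # same whether or not the ops depend on each other

# sp add latencies from the table above
M88100_SP_ADD = 5
R3010_SP_ADD = 2

# Columns: n, pipelined (dependent), pipelined (independent), non-pipelined
for n in (1, 4, 16):
    print(n,
          cycles_pipelined(n, M88100_SP_ADD, dependent=True),
          cycles_pipelined(n, M88100_SP_ADD, dependent=False),
          cycles_unpipelined(n, R3010_SP_ADD))
```

Under these assumptions the pipelined unit pulls ahead only on runs of
more than four independent sp adds; on a dependent chain the shorter
latency wins at every length.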
-- 
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086