Path: utzoo!attcan!uunet!wyse!mips!earl
From: earl@mips.COM (Earl Killian)
Newsgroups: comp.arch
Subject: Re: RISC machines and scoreboarding
Message-ID: <2483@gumby.mips.COM>
Date: 1 Jul 88 02:14:13 GMT
References: <1082@nud.UUCP> <2438@winchester.mips.COM> <1098@nud.UUCP> <2459@gumby.mips.COM> <1110@nud.UUCP>
Lines: 52
In-reply-to: tom@nud.UUCP's message of 29 Jun 88 18:23:09 GMT

In article <1110@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:

TA# I didn't design the 88100 (I am not a chip designer at all).
TA# However, I doubt that any speed difference is due to a tradeoff made
TA# to provide FP pipelining. Whether the FP unit was pipelined or not, I
TA# think the FP latencies would still be the same. Do you have some reason
TA# to think that not pipelining can decrease FP latency? If no increase
TA# in latency is incurred to provide pipelining, then providing it can only
TA# help performance.
.........................................................................

The answer to this is yes: I believe the lack of pipelining does partially explain why the R3010 has lower latency than the pipelined 88k. Do you have an alternate suggestion?

For example, non-pipelined functional units can reuse hardware resources. If it takes 2 shifts to do an fp add, a non-pipelined adder can have just one shifter, but a pipelined adder needs 2 shifters. This is especially nasty for things like the divider, which reuses some hardware entities (e.g. the quotient-digit lookup table) many, many times. An N-stage pipelined divider would need N of these lookup tables. By avoiding the need for extra hardware, the non-pipelined implementation frees up valuable chip area.
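A minimal sketch of the reuse argument, assuming simple radix-2 restoring division as a stand-in (the post does not say which algorithm the R3010 divider uses, and `quotient_digit_step` and `iterative_divide` are hypothetical names). One step function is called once per quotient bit, so an iterative unit needs only a single copy of it.

```python
def quotient_digit_step(partial_rem: int, divisor: int) -> tuple[int, int]:
    """One recurrence step: choose the next quotient bit (0 or 1).

    Hypothetical stand-in for the quotient-digit selection hardware
    (a comparator here, a lookup table in higher-radix designs) that an
    iterative divider reuses on every step.
    """
    if partial_rem >= divisor:
        return 1, partial_rem - divisor
    return 0, partial_rem


def iterative_divide(dividend: int, divisor: int, n_bits: int) -> tuple[int, int, int]:
    """Radix-2 restoring division of an unsigned n_bits-bit dividend.

    Returns (quotient, remainder, step_uses). A single copy of the step
    logic is used n_bits times in sequence, as in a non-pipelined divider.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if dividend < 0 or dividend >> n_bits:
        raise ValueError("dividend must be an unsigned n_bits-bit value")
    rem, quo, uses = 0, 0, 0
    for i in reversed(range(n_bits)):
        rem = (rem << 1) | ((dividend >> i) & 1)  # bring down the next dividend bit
        bit, rem = quotient_digit_step(rem, divisor)  # reuse the same step logic
        quo = (quo << 1) | bit
        uses += 1
    return quo, rem, uses
```

For example, `iterative_divide(100, 7, 8)` returns `(14, 2, 8)`: eight uses of one step circuit. A fully pipelined divider that accepted a new division every cycle would unroll this loop into 8 stages, each holding its own copy of the step logic, which is the area cost described above.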
In the R3010 we used the area freed this way for:
 (i) a 64-bit x 16-word x 4-port register file;
 (ii) a separate divider, independent of the multiplier;
 (iii) delay-optimized physical layout and hand-tweaked circuit design employing LARGE transistors;
 (iv) a control-unit resource scheduler that permits 4 instructions to execute concurrently.
.........................................................................

TA# Would you post the R3010 FP stats? Not that I doubt you, I'm just
TA# interested in how each particular R3010 FP instruction compares to the
TA# equivalent 88K instruction.
.........................................................................

                 88100    R3010    cycle ratio
Operation        cycles   cycles   (88100/R3010)
================================================
sp add              5        2       2.5
dp add              6        2       3.0
sp mul              5        4       1.2
dp mul             10        5       2.0
sp div             30       12       2.5
dp div             60       19       3.2
sp convert          5       1-3      1.7-5
dp convert          6       1-3      2-6
abs/neg/mov         5?       1
.........................................................................
--
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086
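The table compares latency only. Pipelining pays off in throughput, when many independent operations of the same kind are issued back to back. A rough sketch of that tradeoff, using the add and multiply rows above, under loudly stated assumptions: every 88100 unit is treated as fully pipelined (a new operation can start every `interval` cycles), each R3010 unit is treated as non-pipelined, and the R3010's ability to overlap operations on *different* units is ignored. Divide is left out because the post does not say whether the 88100 divider is pipelined. All function names are hypothetical.

```python
# Latencies in cycles, copied from the table above: (88100, R3010).
LATENCY = {
    "sp add": (5, 2),
    "dp add": (6, 2),
    "sp mul": (5, 4),
    "dp mul": (10, 5),
}


def cycle_ratio(op: str) -> float:
    """The table's last column: 88100 latency / R3010 latency."""
    m88k, r3010 = LATENCY[op]
    return round(m88k / r3010, 1)


def pipelined_cycles(latency: int, k: int, interval: int = 1) -> int:
    """Cycles for k independent ops on a pipelined unit.

    Assumption: a new op can enter every `interval` cycles (1 = fully pipelined).
    """
    return latency + (k - 1) * interval


def unpipelined_cycles(latency: int, k: int) -> int:
    """Cycles for k independent ops on a non-pipelined unit (one at a time)."""
    return k * latency


def breakeven(op: str, interval: int = 1):
    """Smallest k at which the pipelined 88100 unit finishes no later than
    the non-pipelined R3010 unit, or None if it never catches up."""
    m88k, r3010 = LATENCY[op]
    k = 1
    while pipelined_cycles(m88k, k, interval) > unpipelined_cycles(r3010, k):
        if r3010 <= interval:
            return None  # the gap never shrinks, so it can never close
        k += 1
    return k
```

Under these assumptions the pipelined 88100 adder only catches up after 4 (sp) or 5 (dp) back-to-back independent adds, and the multiplier after 2 (sp) or 3 (dp) multiplies; a chain of dependent operations sees the raw latencies in the table.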