Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!cornell!batcomputer!itsgw!steinmetz!nuke!oconnor
From: oconnor@nuke.steinmetz (Dennis M. O'Connor)
Newsgroups: comp.arch
Subject: Re: RISC machines and scoreboarding
Message-ID: <11474@steinmetz.ge.com>
Date: 7 Jul 88 17:46:47 GMT
References: <1362@oakhill.UUCP>
Sender: news@steinmetz.ge.com
Reply-To: oconnor%sungod@steinmetz.UUCP
Organization: GE Corporate R&D Center
Lines: 119

An article by earl@mips.COM (Earl Killian) says:
] In article <1362@oakhill.UUCP> mpaton@oakhill.UUCP (Michael Paton) writes:
] mp> [...] one might want to ask the folks at MIPS Co. this
] mp> question:
] mp> Why did you multiplex your memory bus?
] mp> Consider factors related to power dissipation. The current M88100
] mp> processors are running between .25 and .5 watts @ 20Mhz. If we
] mp> were to multiplex the 2 memory ports as did MIPS Co., our worst
] mp> case power consumption would be 4 watts. The problem is that the
] mp> AC power dissipation is given by:
] mp> 2*C*V**2*F*N,
] mp> N = number of pins which make transitions.
] mp> The addresses from the instruction port are very highly correlated
] mp> (about 1.4 bits per cycle change). The addresses from the data
] mp> port are only partially correlated (less so with better
] mp> compilers). Mixing these two streams results in almost
] mp> uncorrelated address streams and therefore a bigger N, resulting
] mp> in more power dissipation.
]
] Your observation is interesting. But I hope the 88100's packaging
] isn't based on an _average_case_ analysis. Malicious programmers (who
] write, for example, a branch-to-branch infinite loop) can't melt the
] 88100, can they? So the worse case is really 4 watts, right? What
] does the datasheet say?

Seems to me a branch-to-branch i-loop would execute entirely
out of an onboard I-Cache if there is one. But, if not, a 5V 20MHz
signal driving, say, 30pF dissipates 30mW.
Multiply by 32 and you get about a 1 Watt power increase.
During this branch-to-branch, the data address bus would,
of course, be idle. Nastier would be something like :

   7FFFFFFF : STORE-0-TO-H55555555
   80000000 : BRANCH-TO-HFFFFFFFF
      . . .
   FFFFFFFF : STORE-FFFFFFFF-TO-HAAAAAAAA
   00000000 : BRANCH-TO-H7FFFFFFF

This sequence changes, on average, 64 output-pins per cycle :
96 during the cycles with the stores, 32 on the branches, assuming
the buses do the most conservative thing ( float ) during their
non-driven periods. This produces 3 Watts power dissipation.
Unless, of course, there's an on-board I-Cache, in which case
only 32 pins per cycle, average ( 64 in the store cycles, 0 in
the branch cycles ) for a dissipation of 1 Watt.

] Also the cpu subsystem power consumption isn't much reduced by your
] observation, since it only applies to the address outputs on the cpu,
] which is a small fraction of the total pins in the system (the I-cache
] SRAM pins change every cycle even if the address doesn't change much).

How to handle an extra Watt of power dissipation on a chip
is a more difficult question than how to handle it in a board
or hybrid, or how to add it to your power supply.

] Hmm, I guess the address outputs in an 88000 system are a significant
] fraction of the pins, unlike the R3000, because you send the full 32b
] virtual address off chip, instead of only 18b as in the R3000. That
] wastes as much power as you save on average from the demultiplexing,
] doesn't it?

A 16-bit processor would consume less power than a 32-bit processor,
all else being equal. People build the 32-bit ones anyway. But I digress.

So, what can you address with 18 bits ? 256KWords ? Not enough.
I'm kinda surprised if MIPS's address bus is 18 bits, unless
they have some kind of clever external mapping scheme.

] mp> Notice that the pin counts on the two packages are not that much
] mp> different (144 for the R3000 vs.
] mp> 180 for the MC88100) and neither
] mp> are the power/grounds pin counts (30 for the R3000 vs. 36 for the
] mp> MC88100).
] MIPS designed its first RISC chip (the R2000) 4 years ago, and at that
] time the difference between 180 pins and 144 pins was significant.
] The i80386, designed at roughly the same time, actually has fewer
] pins -- 132. The 2nd generation (R3000) could have changed the
] interface, but we felt it was desirable to provide an upgrade path for
] R2000 designs.

Big-pin-count packages are a necessary evil. They cost more,
take up more real estate, have lower packaging yields, are more
difficult to reliably attach to boards, have more problems from
thermal-expansion-coefficients, et cetera. PGAs can't be surface
mounted, which means you can't put them on both sides of the
board and multi-layer routing gets painful. Fine-pitch pins
have their own problems.

The RPM40 uses a 132 pin LCC. It only outputs new instruction
addresses on branches, traps, et cetera. It uses the same pins for
instruction addresses as for operand addresses, even though the
data buses for each are separate. The instruction memory system
is therefore necessarily a pipelined look-ahead system much like
those used on vector processors for vector fetch ( after all,
instruction memory is essentially a set of vectors, one vector
per basic block ). We felt we had to go this route, no choice :
at the time ( late '85 ) the 132-pin LCC was the biggest thing
we knew of that was fully qualified at 40MHz, and trying to
multiplex the instruction and operand addresses at 80MHz
was considered beyond CMOS capability.

] Back when RISC vs. CISC was still an issue, people complained RISC
] required more bandwidth. We just provided the necessary bandwidth.
] When we need more, we'll add more. If that means modifying the
] R3000-style cache interface, so what?
] --
] UUCP: {ames,decwrl,prls,pyramid}!mips!earl
] USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086

"RISC vs.
CISC" is, I think, not so much an absolute issue as an issue
of which fits a particular technology better. When memory
speeds are up there with adder speeds, RISC wins. When adders
are much faster than memory, perhaps CISC does. Different
horses for different courses, I think.

Woops, gotta go.
--
Dennis O'Connor   oconnor%sungod@steinmetz.UUCP  ARPA: OCONNORDM@ge-crd.arpa
   "Never confuse USENET with something that matters, like PIZZA."
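
A rough check of the per-pin arithmetic in the post, as a minimal Python sketch. It assumes the quoted formula 2*C*V**2*F*N with the post's figures (30pF load, 5V, 20MHz) and assumes every counted pin switches on every cycle. The names `pin_power_watts` and `toggles` are made up for illustration; nothing here comes from a datasheet.

```python
def pin_power_watts(c_farads, v_volts, f_hz, n_pins):
    """AC dissipation per the quoted formula 2*C*V**2*F*N."""
    return 2.0 * c_farads * v_volts ** 2 * f_hz * n_pins


def toggles(a, b, width=32):
    """Count bus lines that differ between two successive values."""
    return bin((a ^ b) & ((1 << width) - 1)).count("1")


# Assumed figures taken from the post: 30pF load, 5V swing, 20MHz.
C, V, F = 30e-12, 5.0, 20e6

# Instruction addresses of the four-instruction loop, in fetch order.
fetch_order = [0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x00000000]
addr_toggles = [
    toggles(fetch_order[i], fetch_order[(i + 1) % len(fetch_order)])
    for i in range(len(fetch_order))
]

# Store address and store data each flip all 32 lines between the two stores.
store_addr_toggles = toggles(0x55555555, 0xAAAAAAAA)
store_data_toggles = toggles(0x00000000, 0xFFFFFFFF)

if __name__ == "__main__":
    print("instruction address toggles per fetch:", addr_toggles)
    print("store address / data toggles:", store_addr_toggles, store_data_toggles)
    for pins in (1, 32, 64, 96):
        print(f"{pins:3d} pins switching: {pin_power_watts(C, V, F, pins):.2f} W")
```

Under these assumptions one pin comes to 0.03 W, 32 pins to just under 1 W, 64 pins to about 1.9 W, and 96 pins to about 2.9 W, so the 3 Watt figure lines up with the 96-pin store cycles more closely than with the 64-pin average.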