Path: utzoo!utgpu!watmath!clyde!att!pacbell!ames!amdcad!rpw3
From: rpw3@amdcad.AMD.COM (Rob Warnock)
Newsgroups: comp.arch
Subject: Re: penalty for microcode
Message-ID: <23663@amdcad.AMD.COM>
Date: 2 Dec 88 10:15:06 GMT
References: <3290@ucdavis.ucdavis.edu> <28200241@urbsdc> <568@m3.mfci.UUCP>
Reply-To: rpw3@amdcad.UUCP (Rob Warnock)
Organization: [Consultant] San Mateo, CA
Lines: 91

+---------------
| Cycles are a bad thing! The universe is not discrete.
| All instructions should be self-timed, to precisely the length of
| time required to do the operation.
+---------------

Uh, ever look at a DEC PDP-10? (The original KA-10 CPU, circa 1967,
based on the earlier DEC PDP-6 [1965?].) The internal implementation
was exactly what you seem to be asking for: Each "time-state" was a
pulse regenerator which strobed the results of the previous time-state
to its target register, conditionally set or cleared various bits to
choose what to do next (generally enabling operands onto the input
busses of the ALU, or enabling result registers to accept the output
of the ALU), and conditionally fed its output pulse into one of several
delay lines. The specific delay line was chosen so that when the pulse
came out the other end (and got regenerated as the next "time-state"
pulse), the operation was done.

There were hardware "subroutines", for example, "memory read cycle".
For every potential "caller", there was one flip-flop. The caller pulse
set that "return" flop, and also went into an inclusive-OR gate with
all of the other callers' pulses. At the bottom of the "subroutine",
the last time pulse was fanned out to a bunch of AND gates, one per
caller, whose other input was the "return-PC" ;-} ...that is, the
flip-flop that had been set when the subroutine was called. The output
wire of the selected AND gate fed back to the continuation point of
the caller.

There was no centralized "clock", nor were the delay lines bunched up
in some central place and shared.
There was exactly one delay line for each event in the CPU. In other
words, the "PC" of the microengine was expressed by which delay line
the pulse was hiding in at any given time. (Think of the micro-PC as
being in unary, rather than binary!) The "clock ticks" were those
instants when the pulse could be seen between delay lines, as it got
regenerated, when it could "do things", and get routed around before
hopping into another delay line.

In fact, the micro-PC could travel between cabinets. The memories, you
see, were in external boxes (a whole 16k words each!), and during a
memory-cycle subroutine the timing pulse travelled out to the selected
memory module and ran the timing routines of the memory itself out
there, and then travelled back to the CPU in the form of the "memory
done" pulse, only to be routed to whichever part of the CPU had called
the memory-cycle subroutine.

It was simply *lovely* to 'scope! The flow-charts of the instruction
interpreter were virtually one-to-one with the delay lines and pulse
regenerators of the hardware. And since the micro-PC was unary, it was
simplicity itself to trigger an oscilloscope on any desired micro-step.
(Ah! Nostalgia...)

p.s. A later version of this technique -- called "Chartware" by DEC,
for the ease with which you could go from the flowchart to the wiring
diagram -- was used in the PDP-14 industrial controller modules (a sort
of Tinker Toy build-your-own-CPU family -- there was never a
general-purpose computer built out of it that I know of). It did use a
centralized clock, but still had a unary micro-PC, and still left the
timing of the operations to the various functional units. It used a
scheme similar to the HP-IB (a.k.a. IEEE-488) bus. There was a common
wired-OR "ready" line, and as the clock ticked the selected functional
units pulled down on "ready" (made it false) until their operation was
complete.
The last one to let go allowed the clock to tick again, thus strobing
the results into the destination, and at the same time clocking the
"PC" from its previous location to the next. The "PC" in this case was
flip-flops instead of delay lines, and only one "control" flip-flop in
the system should be set at a time. (A unary PC, again.) It might be
better to say that the "PC" was a huge shift register, with loops and
branches.

John Alderman (founder of Dig. Comm. Assoc.) and I developed a still
simpler version we called "synchronous chartware" (though it owed as
much to the PDP-10 style as to the Chartware style), which was just a
shift register (with loops and branches) driven from a single system
clock, wherein the operations were timed by how many shift register
stages (flip flops) lay between the one that started the operation and
the one that used the result. Still, operations could be of different
lengths, and even of variable lengths. (Long variable-length delays
were implemented with a "while loop" which waited for the completion
signal from the functional unit.)

We found this design technique to be of great utility for things like
magtape and disk controllers. (Cheap fast ROMs weren't yet available
[circa 1971], nor was the now-common bit-slice microcode controller,
e.g. the Am2911.) The technique, though for most uses hopelessly
low-density by today's standards, still comes in handy for
very-high-speed state machines with a lot of multi-way transition
edges.


Rob Warnock
Systems Architecture Consultant

UUCP:     {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:      (415)572-2607
USPS:     627 26th Ave, San Mateo, CA 94403
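
For anyone who wants to poke at the "synchronous chartware" idea
described above, here is a rough Python sketch of a unary (one-hot)
micro-PC: a shift register with one loop, clocked from a single system
clock. It is only an illustration under stated assumptions. The stage
names (START, D1, D2, USE, WAIT, STORE, HALT), the function simulate(),
and the countdown that stands in for a slow functional unit are all
invented here; none of them come from any DEC or AMD design.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical transition table: each control flip-flop names the flop that
# receives the single token on the next clock tick, given the unit's "done".
GRAPH: Dict[str, Callable[[bool], str]] = {
    "START": lambda done: "D1",   # starts a fixed-length operation
    "D1":    lambda done: "D2",   # plain stages: the delay is simply the
    "D2":    lambda done: "USE",  # number of flops between START and USE
    "USE":   lambda done: "WAIT", # result is used; a slow unit is started
    "WAIT":  lambda done: "STORE" if done else "WAIT",  # the "while loop"
    "STORE": lambda done: "HALT",
}


def simulate(unit_latency: int, max_ticks: int = 50) -> List[str]:
    """Clock the one-hot controller; return the stages the token visits."""
    flops = {name: False for name in list(GRAPH) + ["HALT"]}
    flops["START"] = True                 # exactly one control flop is set
    remaining: Optional[int] = None       # ticks left in the (assumed) unit
    trace: List[str] = []
    for _ in range(max_ticks):
        active = [name for name, is_set in flops.items() if is_set]
        assert len(active) == 1, "unary PC: only one flop may be set"
        stage = active[0]
        trace.append(stage)
        if stage == "HALT":
            break
        if stage == "USE":
            remaining = unit_latency      # kick off the variable-length op
        done = remaining == 0             # completion signal from the unit
        nxt = GRAPH[stage](done)
        if remaining:                     # unit counts down once per tick
            remaining -= 1
        flops[stage] = False              # clock edge: the token moves on
        flops[nxt] = True
    return trace
```

Calling simulate(3) walks the token through START, D1, D2, USE, three
passes through WAIT, then STORE and HALT. The fixed delay is set purely
by how many stages sit between START and USE, while the variable delay
is the loop on WAIT, which is the distinction the post draws.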