Path: utzoo!utgpu!watmath!clyde!att!pacbell!ames!amdcad!rpw3
From: rpw3@amdcad.AMD.COM (Rob Warnock)
Newsgroups: comp.arch
Subject: Re: penalty for microcode
Message-ID: <23663@amdcad.AMD.COM>
Date: 2 Dec 88 10:15:06 GMT
References: <3290@ucdavis.ucdavis.edu> <28200241@urbsdc> <568@m3.mfci.UUCP>
Reply-To: rpw3@amdcad.UUCP (Rob Warnock)
Organization: [Consultant] San Mateo, CA
Lines: 91

+---------------
| Cycles are a bad thing! The universe is not discrete.
| All instructions should be self-timed, to precisely the length of
| time required to do the operation.
+---------------

Uh, ever look at a DEC PDP-10? (The original KA-10 CPU, circa 1967, based
on the earlier DEC PDP-6 [1965?].)

The internal implementation was exactly what you seem to be asking
for: Each "time-state" was a pulse regenerator which strobed the results
of the previous time-state to its target register, conditionally set
or cleared various bits to choose what to do next (generally enabling
operands onto the input busses of the ALU, or enabling result registers
to accept the output of the ALU), and conditionally fed its output pulse
into one of several delay lines. The specific delay line was chosen so
that when the pulse came out the other end (and got regenerated as the
next "time-state" pulse), the operation was done.
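To make the "self-timed" idea concrete, here is a minimal Python sketch (not DEC's actual logic) of a single pulse walking through time-states. The state names, the opcode encoding, and the nanosecond delays are all hypothetical. The point is only that each state does its work and then picks which delay line the pulse enters, so an ADD simply takes longer than a MOVE, with no clock anywhere.

```python
from typing import Callable, Dict, List, Optional, Tuple

Regs = Dict[str, int]
# A time-state acts when the pulse reaches it, then returns
# (delay in ns of the delay line it feeds, name of the next time-state).
TimeState = Callable[[Regs], Tuple[int, Optional[str]]]

def fetch(r: Regs) -> Tuple[int, Optional[str]]:
    r["br"] = r["mem"]                 # operand from memory into BR
    return 100, "alu"                  # hypothetical 100 ns delay line

def alu(r: Regs) -> Tuple[int, Optional[str]]:
    # Pick the delay line whose length matches how long this operation needs.
    if r["op"] == 1:                   # hypothetical opcode 1 = ADD
        r["alu_out"] = r["ac"] + r["br"]
        return 200, "store"            # carry propagation: long line
    r["alu_out"] = r["br"]             # hypothetical opcode 0 = MOVE
    return 50, "store"                 # pass-through: short line

def store(r: Regs) -> Tuple[int, Optional[str]]:
    r["ac"] = r["alu_out"]             # strobe the previous state's result
    return 0, None                     # end of instruction

STATES: Dict[str, TimeState] = {"fetch": fetch, "alu": alu, "store": store}

def run(start: str, r: Regs) -> Tuple[int, List[Tuple[int, str]]]:
    t = 0
    state: Optional[str] = start
    trace: List[Tuple[int, str]] = []
    while state is not None:
        trace.append((t, state))       # pulse visible between lines: a "tick"
        delay, state = STATES[state](r)
        t += delay                     # pulse hides inside the chosen line
    return t, trace
```

With mem=5, ac=7 and op=1 (ADD), `run("fetch", regs)` finishes at t=300 with ac=12; with op=0 (MOVE) it finishes at t=150. The instruction's duration falls out of the path the pulse took.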

There were hardware "subroutines", for example, "memory read cycle".
For every potential "caller", there was one flip-flop. The caller pulse
set that "return" flop, and also went into an inclusive-OR gate with
all of the other callers' pulses. At the bottom of the "subroutine",
the last time pulse was fanned out to a bunch of AND gates, one per caller,
whose other input was the "return-PC" ;-}   ...that is, the flip-flop that
had been set when the subroutine was called. The output wire of the selected
AND gate fed back to the continuation point of the caller.
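The return-flop trick can be sketched the same way. This is a hypothetical model: the caller names and the logged body are made up, and it assumes the return flop is cleared as the pulse passes back out (how the real flops were reset isn't described here).

```python
from typing import Dict, List

class PulseSubroutine:
    """A shared hardware "subroutine" with one return flip-flop per caller."""

    def __init__(self, callers: List[str]) -> None:
        self.ret_flops: Dict[str, bool] = {c: False for c in callers}
        self.log: List[str] = []

    def call(self, caller: str) -> str:
        if caller not in self.ret_flops:
            raise KeyError(caller)
        pulses = {c: c == caller for c in self.ret_flops}  # only one fires
        self.ret_flops[caller] = True        # caller pulse sets its return flop
        if any(pulses.values()):             # inclusive-OR of all callers' pulses
            self.body()
        return self.finish()

    def body(self) -> None:
        self.log.append("memory read cycle")  # hypothetical subroutine body

    def finish(self) -> str:
        last_pulse = True
        # Fan the final pulse out to one AND gate per caller.
        gates = {c: last_pulse and flop for c, flop in self.ret_flops.items()}
        fired = [c for c, out in gates.items() if out]
        if len(fired) != 1:
            raise RuntimeError("exactly one return flop should be set")
        self.ret_flops[fired[0]] = False     # assumed reset on the way out
        return fired[0]                      # continuation point of the caller
```

Calling `PulseSubroutine(["ifetch", "operand", "store"]).call("operand")` runs the body once and hands the pulse back to "operand". The "return address" is nothing more than which flop was left set.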

There was no centralized "clock", nor were the delay lines bunched up in
some central place and shared. There was exactly one delay line for each
event in the CPU.  In other words, the "PC" of the microengine was expressed
by which delay line the pulse was hiding in at any given time. (Think of
the micro-PC as being in unary, rather than binary!) The "clock ticks" were
those instants when the pulse could be seen between delay lines, as it got
regenerated, when it could "do things", and get routed around before hopping
into another delay line.
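A quick way to see the unary/binary contrast, offered as a hypothetical illustration rather than anything taken from the KA-10 prints: a unary micro-PC needs one wire (here, one delay line) per micro-step with exactly one active, while a binary micro-PC needs only ceil(log2 n) flip-flops but must be decoded before you can tell which step you're in.

```python
import math
from typing import List

def unary_pc(step: int, n_steps: int) -> List[int]:
    """One wire (delay line) per micro-step; exactly one holds the pulse."""
    return [1 if i == step else 0 for i in range(n_steps)]

def binary_pc(step: int, n_steps: int) -> List[int]:
    """ceil(log2 n) flip-flops, most significant bit first."""
    width = max(1, math.ceil(math.log2(n_steps)))
    return [(step >> b) & 1 for b in reversed(range(width))]

def trigger_unary(wires: List[int], k: int) -> bool:
    return wires[k] == 1                  # probe a single wire

def trigger_binary(bits: List[int], k: int, n_steps: int) -> bool:
    return bits == binary_pc(k, n_steps)  # needs a comparator on every bit
```

For 64 micro-steps the binary form needs only 6 flip-flops against 64 wires, but spotting step k means comparing all 6 bits, whereas in unary it means watching one wire.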

In fact, the micro-PC could travel between cabinets. The memories, you see,
were in external boxes (a whole 16k words each!), and during a memory-cycle
subroutine the timing pulse travelled out to the selected memory module and
ran the timing routines of the memory itself out there, and then travelled
back to the CPU in the form of the "memory done" pulse, only to be routed to
whichever part of the CPU had called the memory-cycle subroutine.

It was simply *lovely* to 'scope! The flow-charts of the instruction
interpreter were virtually one-to-one with the delay lines and pulse
regenerators of the hardware. And since the micro-PC was unary, it was
simplicity itself to trigger an oscilloscope on any desired micro-step.

(Ah! Nostalgia...)

p.s.
A later version of this technique -- called "Chartware" by DEC, for the ease
with which you could go from the flowchart to the wiring diagram -- was used
in the PDP-14 industrial controller modules (a sort of Tinker Toy build-your-
own-CPU family -- there was never a general-purpose computer built out of it
that I know of). It did use a centralized clock, but still had a unary micro-PC,
and still left the timing of the operations to the various functional units.
It used a scheme similar to the HP-IB (a.k.a. IEEE-488) bus. There was a
common wired-OR "ready" line, and as the clock ticked the selected functional
units pulled down on "ready" (made it false) until their operation was complete.
The last one to let go allowed the clock to tick again, thus strobing the
results into the destination, and at the same time clocking the "PC" from
its previous location to the next. The "PC" in this case was made of flip-flops
instead of delay lines, and only one "control" flip-flop in the system should be set
at a time.  (A unary PC, again.) It might be better to say that the "PC" was
a huge shift register, with loops and branches.
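The wired-OR "ready" handshake can be modeled as a straight-line sketch (no loops or branches). The unit names and durations are hypothetical, and time is counted in abstract units rather than anything from the PDP-14 documentation. Each step's selected units hold "ready" low until done, and only then does the clock tick and shift the one-hot PC.

```python
from typing import Dict, List

def run_chartware(program: List[Dict[str, int]]) -> List[int]:
    """program[i] maps each functional unit that step i selects to the number
    of (hypothetical) time units it needs. Returns the time of each clock tick."""
    if not program:
        return []
    pc = [1] + [0] * (len(program) - 1)      # one-hot "control" flip-flops
    t = 0
    ticks: List[int] = []
    while any(pc):
        assert sum(pc) == 1                  # only one control flop set
        busy = dict(program[pc.index(1)])    # selected units pull "ready" low
        while True:
            ready = not busy                 # wired-OR: high only if none hold it
            if ready:
                break
            t += 1
            busy = {u: n - 1 for u, n in busy.items() if n > 1}
        ticks.append(t)                      # last unit let go: clock ticks,
        pc = [0] + pc[:-1]                   # strobing results and shifting PC
    return ticks
```

A step that selects a 2-unit ALU and a 5-unit memory waits for the slower one, so `[{"alu": 2, "mem": 5}, {"alu": 2}, {"shift": 1}]` ticks at times 5, 7 and 8. Adding loops and branches would mean replacing the plain shift with a next-state choice.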

John Alderman (founder of Dig. Comm. Assoc.) and I developed a still simpler
version we called "synchronous chartware" (though it owed as much to the PDP-10
style as to the Chartware style), which was just a shift register (with loops
and branches) driven from a single system clock, wherein the operations were
timed by how many shift register stages (flip-flops) lay between the one that
started the operation and the one that used the result. Still, operations could
be of different lengths, and even of variable lengths. (Long variable-length
delays were implemented with a "while loop" which waited for the completion
signal from the functional unit.) We found this design technique to be of
great utility for things like magtape and disk controllers. (Cheap fast ROMs
weren't yet available [circa 1971], nor was the now-common bit-slice microcode
controller, e.g. the Am2911.) The technique, though for most uses hopelessly
low-density by today's standards, still comes in handy for very-high-speed
state machines with a lot of multi-way transition edges.
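Here is a rough sketch of the synchronous variant under stated assumptions. The stage names, the "seek" unit with its countdown, and the single pad stage are all hypothetical, chosen only to show the two timing styles: a fixed delay set by how many flops sit between "start" and "use", and a variable delay handled by a stage that loops on itself until a done signal appears. It is not the actual design Warnock and Alderman built.

```python
from typing import Callable, Dict, List, Optional, Tuple

Ctx = Dict[str, int]
# A stage's logic runs while its flip-flop is the one set; it returns the
# index of the stage to set on the next clock (None = sequence finished).
Stage = Tuple[str, Callable[[Ctx], Optional[int]]]

def start_seek(c: Ctx) -> Optional[int]:
    c["seek_left"] = c["seek_len"]       # kick off a variable-length operation
    return 1

def wait_seek(c: Ctx) -> Optional[int]:
    return 2 if c["seek_left"] == 0 else 1   # "while loop" on a done signal

def start_xfer(c: Ctx) -> Optional[int]:
    c["xfer_started_at"] = c["clock"]    # fixed-length operation begins
    return 3

def pad(c: Ctx) -> Optional[int]:
    return 4                             # a flop of pure delay, no logic

def use_result(c: Ctx) -> Optional[int]:
    c["xfer_used_at"] = c["clock"]       # result assumed valid by now
    return None

STAGES: List[Stage] = [
    ("start_seek", start_seek), ("wait_seek", wait_seek),
    ("start_xfer", start_xfer), ("pad", pad), ("use_result", use_result),
]

def run(seek_len: int) -> Ctx:
    c: Ctx = {"seek_len": seek_len, "seek_left": 0, "clock": 0}
    pc = [1] + [0] * (len(STAGES) - 1)   # one-hot: stage 0's flop is set
    while any(pc):
        assert sum(pc) == 1              # unary PC invariant
        nxt = STAGES[pc.index(1)][1](c)  # next-state logic of the set flop
        # --- one edge of the single system clock ---
        if c["seek_left"] > 0:
            c["seek_left"] -= 1          # hypothetical unit counts down
        c["clock"] += 1
        pc = [0] * len(STAGES)
        if nxt is not None:
            pc[nxt] = 1                  # shift, branch, or loop back
    return c
```

With seek_len=3 the whole sequence takes 7 clocks and with seek_len=5 it takes 9, while the gap between start_xfer and use_result stays at 2 clocks in both cases, because that gap is set purely by the stages wired between them.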


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403