Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site fortune.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxl!ihnp4!fortune!rpw3
From: rpw3@fortune.UUCP
Newsgroups: net.arch
Subject: Re: RISC perspective - (nf)
Message-ID: <2736@fortune.UUCP>
Date: Sat, 10-Mar-84 23:02:23 EST
Article-I.D.: fortune.2736
Posted: Sat Mar 10 23:02:23 1984
Date-Received: Sun, 11-Mar-84 07:06:49 EST
Sender: notes@fortune.UUCP
Organization: Fortune Systems, Redwood City, CA
Lines: 55

#R:orstcs:-280000100:fortune:16500005:000:2889
fortune!rpw3    Mar 10 19:07:00 1984

And while all you guys are busy stuffing things in registers, let us
not forget that process-switch time gets worse the more "short term"
state there is to save/restore. Especially when using an operating
system model of the Concurrent Euclid sort, or any other system in
which the "interrupt handlers" are full processes in their own right,
saving and restoring ("swapping") the enormous amount of state implied
by all those registers can result in a net reduction of throughput.
A properly designed non-write-through cache does not suffer from that
problem, since pieces of many interrupt processes can stay resident in
the cache at once.
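
To put rough numbers on the register side of that, here is a
back-of-the-envelope sketch in Python. The cycle counts and the name
switch_cycles are made up for illustration, not measured on any real
machine; the only point is that save/restore cost grows linearly with
the amount of register state.

```python
# Toy model: cost of one process switch as a function of register count.
# CYCLES_PER_WORD and FIXED_OVERHEAD are assumed numbers, not measurements.
CYCLES_PER_WORD = 4      # assumed cost to move one register to or from memory
FIXED_OVERHEAD = 40      # assumed scheduler/dispatch cost per switch


def switch_cycles(num_registers):
    """Cycles for one switch: save the old state, then restore the new state."""
    return FIXED_OVERHEAD + 2 * num_registers * CYCLES_PER_WORD


for n in (8, 16, 32, 128):
    print(n, "registers:", switch_cycles(n), "cycles per switch")
```

With interrupt handlers running as full processes, that per-switch figure
gets paid on every interrupt, which is where the throughput goes.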

Small aside:	I really wish that the Motorola 68000 had only 8
		general registers rather than 8 addr + 8 data. For any
		instruction set of the VAX/68k/16k style, 16 registers
		is too many if you do a lot of process switching.

If the "registers" are implemented as main memory locations which are addressed
with short addresses relative to some process base, and the identification
of "register" addresses is fast enough so that cached "registers" compete
with hardware register addressing, then the number of registers should be
set solely by the number of bits we wish to use for them in the instructions.
I still feel that even in this case the number of specially named "registers"
(short addresses) will be quite small (32 or less).
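
A minimal sketch of that arrangement, in Python: "registers" are just
short offsets from a per-process base in main memory. The 5-bit field
width, the Process class, and the helper names are all assumptions picked
for illustration, and the cache that would make this fast is not modeled.

```python
# Sketch: "registers" as short offsets from a per-process base in main memory.
REG_FIELD_BITS = 5                    # assumed width of a register field
NUM_REGS = 1 << REG_FIELD_BITS        # 32 nameable "registers"

memory = [0] * 1024                   # toy main memory, one word per cell


class Process:
    def __init__(self, base):
        self.base = base              # where this process's registers live


def reg_addr(proc, r):
    """Translate a short register number into a main-memory address."""
    assert 0 <= r < NUM_REGS
    return proc.base + r


def write_reg(proc, r, value):
    memory[reg_addr(proc, r)] = value


def read_reg(proc, r):
    return memory[reg_addr(proc, r)]


a = Process(base=0)
b = Process(base=NUM_REGS)
write_reg(a, 3, 111)
write_reg(b, 3, 222)
# A "process switch" is just picking a different base; nothing is copied.
print(read_reg(a, 3), read_reg(b, 3))
```

The register count falls straight out of the field width: 5 bits in the
instruction gives 32 names, no matter how much memory sits behind them.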

An interesting example to look at for register naming is the old RCA-1602 (?)
4-bit microcontroller chip. It had a few registers (4?), but the neat thing
was that the actual scratchpad cell that was used for a given register was
run-time settable. To do a "process switch" you said "Use loc.234 for the N
register, 107 for the T reg, and (last) loc.025 for the PC".

Now, following that through to bigger machines, we should look at decoupling
the NAMING of "registers" (which is restricted by the desire to keep the
instruction stream efficient) from the LOCATION of registers (which should
be a large address space).  The context-switch problem then becomes one of
changing the name-location map quickly, as the TI 9900 does by reloading its
workspace pointer.  Additionally, with smart
enough compilers (but don't ask me to program in assembly for THIS machine!)
you can change the assignment of registers to locations as needed throughout
a given module or even subroutine, to balance the "register working set".
("Quick Martha, call the nut house! He's gone and re-invented register
page-maps!") If the register page-map were actually of the
translation-lookaside-cache style (with the same number of entries as
registers), and were onboard the CPU chip, it could probably compete in
performance with
fixed hardwired register sets.
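
For the curious, here is roughly what such a map might look like, sketched
in Python. Everything named here (RegisterMap, rebind, the 8-name size) is
hypothetical and only meant to show the idea: one map entry per register
NAME, each holding the LOCATION currently bound to it. It says nothing
about how real hardware would do the lookup at speed.

```python
# Sketch: a register map with one entry per register NAME, each entry holding
# the LOCATION (main-memory address) currently bound to that name.
NUM_NAMES = 8                         # assumed number of register names

memory = [0] * 256                    # toy main memory


class RegisterMap:
    def __init__(self, locations):
        locations = list(locations)
        assert len(locations) == NUM_NAMES
        self.loc = locations          # loc[name] -> memory address

    def read(self, name):
        return memory[self.loc[name]]

    def write(self, name, value):
        memory[self.loc[name]] = value

    def rebind(self, name, new_location):
        """Point one register name at a different location (no data moved)."""
        self.loc[name] = new_location


# Two processes, each with its own map; a context switch swaps the active map.
map_a = RegisterMap(range(0, 8))
map_b = RegisterMap(range(100, 108))

active = map_a
active.write(0, 42)
active = map_b                        # "context switch": no registers copied
active.write(0, 7)
active = map_a                        # switch back; name 0 still reads 42
print(active.read(0))

# A compiler could also rebind a name partway through a routine:
active.rebind(0, 50)
active.write(0, 99)
print(memory[0], memory[50])
```

Swapping the whole map covers the process switch; rebind is the
per-module or per-subroutine reassignment, and the old value stays put in
its original location when a name is pointed elsewhere.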

Just a little something to jog the discussion along...

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphin Drive, Redwood City, CA 94065