Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site oakhill.UUCP
Path: utzoo!linus!philabs!prls!amdimage!amdcad!amd!vecpyr!lll-crg!seismo!ut-sally!oakhill!davet
From: davet@oakhill.UUCP (Dave Trissel)
Newsgroups: net.micro.68k,net.micro.16k
Subject: Re: Re: PDP11s vs the micros
Message-ID: <492@oakhill.UUCP>
Date: Fri, 16-Aug-85 21:46:06 EDT
Article-I.D.: oakhill.492
Posted: Fri Aug 16 21:46:06 1985
Date-Received: Tue, 20-Aug-85 06:35:39 EDT
References: <1617@hao.UUCP> <847@mako.UUCP> <2422@sun.uucp> <2607@sun.uucp> <5874@utzoo.UUCP>
Reply-To: davet@oakhill.UUCP (Dave Trissel)
Organization: Motorola Inc. Austin, Tx
Lines: 64
Xref: linus net.micro.68k:1005 net.micro.16k:336

In article <5874@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:

>> > Any particular reason to do this rather than restart the instruction from
>> > where it left off?
>> 
>
>Motorola obviously :-) views its 68020 line primarily as a way to sell
>memory chips.  Between the incredible pile of trash it heaves onto the
>stack when you take a page fault, and the huge internal state of the
>68881 FPU that has to be shoveled in and out every time you context-switch
>(what's the betting Motorola's next FPU chip has DMA? :-), the memory
>market is clearly what they're aiming at.  That and the cache market.

What you don't realize is the amazing performance we can get because of the
"incredible pile of trash" we heave on the stack.

The crux of the problem is that chips which have to back up and redo
instructions pay a nasty penalty in pipeline design.  Consider the following
generic microprocessor code sequence:

		MOVE   something to memory
		SHIFT  Reg by immediate
		MUL    Reg to Reg
		etc.

The MC68020 executes the MOVE and the bus unit schedules a write cycle.  Then
the execution unit/pipeline happily continues executing the instruction
stream without regard to the final status of the write.  Even if the write
fails (a bus error), several more instructions could be executed (in fact
any number, until one is hit which requires the bus again).

Contrast this with chips which redo instructions.  They must soon stop dead
in their tracks until the write cycle has been verified as properly done.
Otherwise they would alter the programmer's model and invalidate retry.
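
As a rough illustration of the difference, here is a toy cycle-count model
of the three-instruction sequence above.  Every number in it (instruction
timings, write latency) is a made-up assumption and not taken from any data
sheet, and the "restart" design is simplified to stall immediately after the
store rather than "soon" after it.

```python
# Toy model only: all cycle counts below are hypothetical.
WRITE_LATENCY = 4  # cycles until the bus reports the write's final status

# (name, execute_cycles, needs_bus)
PROGRAM = [
    ("MOVE", 2, True),    # store to memory: schedules a write cycle
    ("SHIFT", 2, False),  # register-only
    ("MUL", 6, False),    # register-only
]


def total_cycles(program, wait_for_write_status):
    """Return cycles until all instructions and bus writes have completed."""
    clock = 0
    write_done_at = 0  # time at which the outstanding write is resolved
    for _name, cycles, needs_bus in program:
        if needs_bus:
            # The bus is needed again, so any outstanding write must finish.
            clock = max(clock, write_done_at)
        clock += cycles
        if needs_bus:
            write_done_at = clock + WRITE_LATENCY
            if wait_for_write_status:
                # Restart-style design: stall until the write is verified.
                clock = write_done_at
    return max(clock, write_done_at)


continue_design = total_cycles(PROGRAM, wait_for_write_status=False)
restart_design = total_cycles(PROGRAM, wait_for_write_status=True)
print(continue_design, restart_design)
```

Under these assumed timings the design that keeps executing overlaps the
write latency with the SHIFT and MUL, while the stalling design pays the
latency in full.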

Another thing to consider is that the total operating system code executed
to continue from a page fault (assign an unused page frame and map it in the
MMU, or block the process and schedule a swapped-out page to be read) makes
the overhead of writing the internal 020 machine state seem insignificant.
The stack save equates to about the same overhead as executing 12
instructions.
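
To put that in proportion, a back-of-the-envelope calculation: the figure of
12 instruction-equivalents comes from the paragraph above, while the length
of the OS fault handler is a purely hypothetical input that varies from
system to system.

```python
STACK_SAVE_EQUIV = 12  # from the text: state save costs about 12 instructions


def save_overhead_fraction(handler_instructions, save_equiv=STACK_SAVE_EQUIV):
    """Fraction of total page-fault cost spent on the machine-state save."""
    return save_equiv / (handler_instructions + save_equiv)


# Hypothetical handler lengths, in instructions executed per fault.
for handler in (100, 1000, 10000):
    print(handler, round(save_overhead_fraction(handler), 4))
```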

Concerning floating-point state saves, we gave a lot of thought to minimizing
latency times.  What we did was give an indication to the OS of whether any
of the FP registers had been used.  If not, the OS could skip the context
save and restore completely.
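
A minimal sketch of how an OS might act on such an indication.  The Task and
FPU classes and the fp_used flag are hypothetical stand-ins invented for
this example; they do not describe the actual 68881 interface.

```python
class FPU:
    """Stand-in for the coprocessor's register file (hypothetical)."""

    def __init__(self):
        self.regs = [0.0] * 8


class Task:
    def __init__(self, name):
        self.name = name
        self.fp_used = False   # hypothetical "FP registers were used" flag
        self.saved_fp = None   # saved register image, if one exists


def switch_out(task, fpu):
    """Save FP state only if the task touched it. True if a save was done."""
    if not task.fp_used:
        return False           # nothing to save: skip the transfer entirely
    task.saved_fp = list(fpu.regs)
    return True


def switch_in(task, fpu):
    """Restore FP state if the task has any. True if a restore was done."""
    if task.saved_fp is None:
        fpu.regs = [0.0] * 8   # clean state, no saved image to copy back
        return False
    fpu.regs = list(task.saved_fp)
    return True


fpu = FPU()
editor = Task("editor")        # integer-only task
solver = Task("solver")        # uses floating point
solver.fp_used = True
fpu.regs[0] = 3.5
print(switch_out(solver, fpu), switch_in(editor, fpu), switch_out(editor, fpu))
```

Here the integer-only task costs nothing on either side of the switch; only
the task that actually used the FP registers pays for the transfer.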

Intel has a novel approach on their 8087 and 80287 where they let the process
context switch without saving FP state.  If another process then tries to use
floating-point, an interrupt occurs, letting the OS swap context only
when necessary.  The trouble with this technique is that all it takes is
for one out of every 20 or so context switches to require a re-save and you
start losing overall processor time over just saving it unconditionally.
At worst, if you have several processes constantly sharing the FP chip then
you have essentially forced a complete extra interrupt exception invocation
for every change in context - a massive penalty.
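
The trade-off can be written as a simple cost model.  This is a sketch under
stated assumptions: swap_cost (saving one FP context and restoring another)
and trap_cost (the extra interrupt/exception overhead) are hypothetical
parameters in arbitrary units, and the model ignores everything else.

```python
def unconditional_cost(swap_cost):
    """Average FP cost per context switch when state is always saved."""
    return swap_cost


def lazy_cost(p_fp_switch, swap_cost, trap_cost):
    """Average FP cost per context switch when saving is deferred.

    p_fp_switch is the fraction of switches after which the new process
    touches the FP chip, forcing a trap plus the deferred swap.
    """
    return p_fp_switch * (trap_cost + swap_cost)


def break_even_fraction(swap_cost, trap_cost):
    """Fraction of FP-using switches above which deferring loses."""
    return swap_cost / (swap_cost + trap_cost)


# Hypothetical units: a swap costing 10 and a trap costing 190.
print(break_even_fraction(10, 190))
print(lazy_cost(1.0, 10, 190), unconditional_cost(10))
```

In this model a break-even point of one switch in 20 corresponds to a trap
that costs 19 times as much as the swap itself, and with every switch
trapping (p_fp_switch = 1.0) the deferred scheme pays the full trap on top
of each swap.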

One solution would be to keep multiple contexts on chip.  Ah - if we only
had next decade's technology today.  Lots of exciting things are going to
happen once we can get millions of gates on a single chip running at 70 MHz.

 --  Dave Trissel
     Motorola Semiconductor Inc.
     Austin, Texas              {seismo,ihnp4}!ut-sally!oakhill!davet