Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site oakhill.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!panda!talcott!harvard!seismo!ut-sally!oakhill!davet
From: davet@oakhill.UUCP (Dave Trissel)
Newsgroups: net.micro.68k,net.micro.16k
Subject: Re: Re: PDP11s vs the micros
Message-ID: <493@oakhill.UUCP>
Date: Mon, 19-Aug-85 01:04:46 EDT
Article-I.D.: oakhill.493
Posted: Mon Aug 19 01:04:46 1985
Date-Received: Sat, 24-Aug-85 00:21:11 EDT
References: <1617@hao.UUCP> <847@mako.UUCP> <2422@sun.uucp> <2607@sun.uucp> <492@oakhill.UUCP> <489@talcott.UUCP>
Reply-To: davet@oakhill.UUCP (Dave Trissel)
Organization: Motorola Inc. Austin, Tx
Lines: 82
Xref: watmath net.micro.68k:1070 net.micro.16k:369

In article <489@talcott.UUCP> tmb@talcott.UUCP (Thomas M. Breuel) writes:
>|
>|		MOVE   something to memory
>|		SHIFT  Reg by immediate
>|		MUL    Reg to Reg
>|		etc.
>|
>|The MC68020 executes the MOVE and the bus unit schedules a write cycle.  Then
>|the execution unit/pipeline happily continues executing the instruction
>|stream without regard to the final status of the write.  Even if the write
>|fails (bus errors) there could be several more instructions executed (in fact
>|any amount until one is hit which requires the bus again.)
>
>I find this argument amusing. You just generated a page fault.  That
>means context switch, disk driver, housekeeping, ... .  Compared to all
>this, the overhead of your instruction re-start is going to be
>negligible no matter how inefficiently you do it.

You are missing the point, though maybe I did not make it clear enough.  Most
of the time instructions execute without a page fault.  The problem is that
microprocessors which back up and redo instructions must ALWAYS halt when a
write goes out on the bus, because there just might be a bus fault, even
though there almost never is.

The '020 pipeline halts only for memory operand reads, changes in supervisor
state, or locked bus cycle instructions like TAS and CAS.
The '020 bus probably averages somewhere around 30 percent write cycles, so
there are many chances for this overlap to increase performance.
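
As a rough feel for what that 30 percent could be worth, here is a
back-of-envelope sketch.  Only the 30 percent figure comes from the paragraph
above; the 3 clocks per write and the fraction of each write that gets hidden
are assumptions picked for illustration, and the function name is made up.

```python
def clocks_hidden(bus_cycles, write_fraction=0.30,
                  clocks_per_write=3, hidden_fraction=0.5):
    """Estimate clocks of write time overlapped with execution.

    write_fraction   -- share of bus cycles that are writes (30% per the text)
    clocks_per_write -- ASSUMED length of one write cycle, in clocks
    hidden_fraction  -- ASSUMED share of each write covered by useful work
    """
    writes = bus_cycles * write_fraction
    return writes * clocks_per_write * hidden_fraction

# For every 1000 bus cycles, under the assumptions above:
print(clocks_hidden(1000))
```

A chip that must halt on every write corresponds to hidden_fraction=0.0 and
recovers none of this.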

The overlap the '020 gains depends on how far the pipeline can crunch ahead
before another bus cycle is needed.  With a 256 byte cache and a large number
of work registers (15), a large percentage of the time one, two or more
instructions can be executed while a write is being done.  Even if the next
instruction requires an operand read or write from the bus and therefore
stops the pipe there, at least instruction decoding and the queueing of
another bus cycle are overlapped with the write before the halt.
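
To make the overlap concrete, here is a toy model comparing a processor that
halts on every bus cycle with one that lets execution run ahead of a pending
write.  It is a sketch only: the clock counts are invented for illustration
and are not real '020 timings, instruction fetches are assumed to hit the
cache, and the function names are hypothetical.

```python
# Toy model only: every number here is illustrative, not a real '020 timing.
BUS_CLOCKS = 3  # ASSUMED length of one bus cycle, in clocks

def clocks_halting(stream):
    """Processor that waits out every bus cycle, reads and writes alike."""
    return sum(exec_clocks + (BUS_CLOCKS if bus else 0)
               for _name, exec_clocks, bus in stream)

def clocks_overlapped(stream):
    """Processor that keeps executing while a write is still on the bus."""
    t = 0            # clock at which the execution unit is free
    bus_free_at = 0  # clock at which the bus is free
    for _name, exec_clocks, bus in stream:
        t += exec_clocks              # decode/execute overlaps a pending write
        if bus is None:
            continue                  # register-only work, no bus needed
        start = max(t, bus_free_at)   # queue behind any earlier bus cycle
        bus_free_at = start + BUS_CLOCKS
        if bus == "read":
            t = bus_free_at           # a read must wait for its data
    return max(t, bus_free_at)        # let the last write drain

# (mnemonic, ASSUMED execution clocks, bus use: None, "read" or "write")
write_then_regs = [("MOVE", 2, "write"), ("SHIFT", 2, None), ("MUL", 20, None)]
write_then_read = [("MOVE", 2, "write"), ("MOVE", 2, "read")]

for stream in (write_then_regs, write_then_read):
    print(clocks_halting(stream), clocks_overlapped(stream))
```

In the second stream the read stops the pipe right away, yet the model still
saves the clocks spent decoding it while the write was in progress.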

>In addition, I tend not to believe that what you gain in cache
>performance makes up for the time required to push a lot onto the
>stack.

Against the one to three million instructions the '020 may be executing each
second, the 24 extra longwords saved and restored around a bus fault (which
occurs anywhere from zero to, let's say, 10 times a second) don't really make
any difference.
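
The arithmetic is easy to check.  The sketch below uses the figures from the
paragraph above (24 longwords, up to 10 faults a second, one million
instructions a second as the low end) and adds two pessimistic assumptions
of its own: each longword is moved twice, once out and once back, and each
move costs as much as a whole instruction.  The function name is made up.

```python
def fault_overhead_fraction(faults_per_sec, longwords_per_fault,
                            instructions_per_sec, moves_per_longword=2):
    """Extra longword moves per second as a fraction of instructions per second.

    moves_per_longword=2 ASSUMES every longword is written once and read
    back once; treating a move as equal to an instruction is also an
    ASSUMPTION, chosen to overstate the cost.
    """
    extra_moves = faults_per_sec * longwords_per_fault * moves_per_longword
    return extra_moves / instructions_per_sec

# Worst case from the text: 10 faults/sec against 1 million instructions/sec.
fraction = fault_overhead_fraction(10, 24, 1_000_000)
print(f"{fraction:.5%} of the instruction budget")
```

Even with both assumptions stacked against it, the extra stack traffic comes
to well under a tenth of a percent.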

>Cache performance is going to increase in the way you describe
>it on writes only anyhow, since if you get a page fault on a read
>(which is probably the more common case) you have to wait for the
>page to be brought in no matter what.

Maybe I didn't make it clear that I was talking about the great majority of
the time, when you don't have a bus fault.  Yes, any operand read from memory
will stall the pipe, since it obviously cannot continue until the data
arrives, whether or not a bus fault is going to occur.

>Finally, the thought of having a page fault pending and the CPU
>happily executing more instructions before the fault is serviced
>somehow worries me. It may play havoc with simple-minded process
>synchronisation techniques.
>

There are some side-effects, but they don't arise in synchronisation since,
as I mentioned earlier, the pipe does not forge ahead on semaphore and lock
operations.  The side-effects are subtle and relate mostly to exception
handling and asynchronous exit invocations by the OS.  That's the small penalty
you pay for getting higher performance.  Any advanced pipeline mechanism is
going to be executing ahead, whether you're on the '020 or a supercomputer.

>Altogether, I don't buy that the 68020 gets 'amazing performance'
>because it pushes of the order of 20 longwords onto the stack every
>time it gets a page fault.

The way to tell is simply to look at some assembly code and follow the
instructions after each operand write.  You can get a pretty good estimate
this way.  And remember, even if the very next instruction after a write
forces a bus access, the '020 pipeline can progress up to the point of that
bus cycle request before it halts.
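
If you would rather not do the counting by hand, something like the sketch
below would do it.  It assumes you have already marked each instruction in a
listing with whether it needs an operand bus cycle; the listing format and
the function name are made up for illustration, and the sample listing is
not taken from real code.

```python
def instructions_after_writes(listing):
    """For each operand write, count the following instructions that need
    no operand bus cycle, i.e. the ones that could run during the write.

    listing -- sequence of (mnemonic, bus) pairs, where bus is None,
               "read" or "write" (a HYPOTHETICAL hand-annotated format).
    """
    runs = []
    for i, (_mnemonic, bus) in enumerate(listing):
        if bus != "write":
            continue
        count = 0
        for _next_mnemonic, next_bus in listing[i + 1:]:
            if next_bus is not None:
                break          # next bus access: the pipe stops here
            count += 1
        runs.append(count)
    return runs

sample = [
    ("MOVE", "write"), ("SHIFT", None), ("MUL", None),
    ("MOVE", "read"),
    ("MOVE", "write"), ("MOVE", "write"), ("ADD", None),
]
print(instructions_after_writes(sample))   # -> [2, 0, 1]
```

A zero in the result does not mean nothing was gained: decoding the next
instruction and queueing its bus cycle still overlap with the write.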

  -- Dave Trissel
     Motorola Semiconductor           {seismo,ihnp4}!ut-sally!oakhill!davet
     Austin, Texas