Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site oakhill.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!allegra!mit-eddie!genrad!panda!talcott!harvard!seismo!ut-sally!oakhill!davet
From: davet@oakhill.UUCP (Dave Trissel)
Newsgroups: net.micro.68k,net.micro.16k
Subject: Re: Re: PDP11s vs the micros
Message-ID: <501@oakhill.UUCP>
Date: Fri, 23-Aug-85 23:05:46 EDT
Article-I.D.: oakhill.501
Posted: Fri Aug 23 23:05:46 1985
Date-Received: Mon, 26-Aug-85 01:30:42 EDT
References: <1617@hao.UUCP> <847@mako.UUCP> <2422@sun.uucp> <5890@utzoo.UUCP>
Reply-To: davet@oakhill.UUCP (Dave Trissel)
Organization: Motorola Inc. Austin, Tx
Lines: 53
Xref: watmath net.micro.68k:1078 net.micro.16k:372

In article <5890@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:


>In other words, all you need is 12 contiguous non-memory-referencing
>instructions and the 68020's stack puke will actually break even! This is
>stretching it a bit, since on the pdp11 typically every third or fourth
>instruction did some sort of memory reference; I doubt that the 68000 family
>does much better.        ...
>

It's clear that you still don't understand what I'm getting at.  I'll try one
more time.

The '020 averages between 2 and 5 million external bus operations per second,
and that doesn't count the internal bus cycles run from the on-chip cache.
The overhead for the "puking" as you call it is 46 bus cycles (23 each way).

If you insist that those 46 bus cycles are significant against 2 to 5 million
bus cycles, then there's nothing more I can say.
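
To put rough numbers on that comparison, here is a minimal sketch that works
out what fraction of external bus traffic the 46 cycles amount to.  The figure
of 100 context switches per second is purely a hypothetical assumption chosen
for illustration, not a measurement, and the function name is made up here.

```python
# Fraction of external bus operations spent on the 46-cycle save/restore.
# ASSUMPTION: the context-switch rate is a hypothetical illustration value.

def switch_overhead_fraction(switches_per_sec, bus_ops_per_sec,
                             cycles_per_switch=46):
    """Return the share of bus operations consumed by context-switch overhead."""
    return switches_per_sec * cycles_per_switch / bus_ops_per_sec


if __name__ == "__main__":
    hypothetical_switch_rate = 100  # assumed switches per second
    for bus_rate in (2_000_000, 5_000_000):  # the 2 to 5 million range above
        frac = switch_overhead_fraction(hypothetical_switch_rate, bus_rate)
        print(f"{bus_rate:,} bus ops/s: {frac:.4%} of bus traffic")
```

Even if the assumed switch rate is off by a factor of ten, the overhead stays
a small fraction of the bus traffic.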

>On the other hand, I'm glad to hear that Motorola did have the sense to
>put a floating-point-used flag in the FPU, so at least you don't have to
>shovel 300 bytes of state around unnecessarily.

Your use of the word "state" is ambiguous.  If you mean the internal chip
context save state, then Mike Cruess has already put that into perspective.
If you mean the user register context size, then that's 208 bytes of state,
and I can go into our analysis at the time for not including DMA (or, more
correctly, bus mastership capability).

>> Intel has a novel approach on their 8087 and 2087 where they let the process
>> context switch without saving FP state.  If another process tries using
>> floating-point an interrupt occurs letting the OS then swap context only
>> when necessary.  The trouble with this technique is that all it takes is
>> for one out of every 20 or so context switches to require a re-save and you
>> start losing overall processor time over just saving it unconditionally.

I have since worked out the 286/287 overhead, and it is somewhat less than
what I stated.  It takes 209 clocks to determine that no other task has used
the 287 in the meantime and that there is no state to reload.  If the
exception routine detects that the 287 now holds some other task's registers,
then the exception routine's execution takes 765 clocks.

It takes 535 clocks to unconditionally save and restore the state.  However,
the 286 is not smart enough to handle the 287 with its task switching
capability, which means there really is little alternative but to use the
exception routine route anyway.

So the ratio for use is somewhere around one in four.
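
The clock counts above can be combined into an average cost per context
switch.  The sketch below assumes a simple model that this post does not
spell out: every switch goes through the exception routine and pays either the
209-clock check or the 765-clock swap, with a hypothetical fraction p_swap of
switches needing the swap.  Under other assumptions (for instance, counting
tasks that never touch the 287 at all) the break-even point moves, so treat
the output as illustrative only.  All names in the code are made up here.

```python
# Average 286/287 cost per context switch under an ASSUMED simple model:
# each switch pays either the cheap check or the full swap in the exception
# routine.  Clock counts are the ones quoted in the post.

CHECK_ONLY_CLOCKS = 209     # exception finds no other task used the 287
SWAP_CLOCKS = 765           # exception finds another task's registers
UNCONDITIONAL_CLOCKS = 535  # always save and restore the state


def exception_route_cost(p_swap):
    """Expected clocks per switch when a fraction p_swap needs the swap."""
    return (1 - p_swap) * CHECK_ONLY_CLOCKS + p_swap * SWAP_CLOCKS


def break_even_fraction():
    """Swap fraction at which both approaches cost the same in this model."""
    return ((UNCONDITIONAL_CLOCKS - CHECK_ONLY_CLOCKS)
            / (SWAP_CLOCKS - CHECK_ONLY_CLOCKS))


if __name__ == "__main__":
    for p in (0.05, 0.25, 0.50):  # hypothetical swap fractions
        print(f"p_swap={p:.2f}: {exception_route_cost(p):.1f} clocks "
              f"vs {UNCONDITIONAL_CLOCKS} unconditional")
    print(f"break-even in this model: {break_even_fraction():.3f}")
```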

  --  Dave Trissel            {seismo,ihnp4}!ut-sally!oakhill!davet
      Motorola Semiconductor Inc.
      Austin, Texas