Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site oakhill.UUCP
Path: utzoo!linus!philabs!prls!amdimage!amdcad!amd!vecpyr!lll-crg!seismo!ut-sally!oakhill!davet
From: davet@oakhill.UUCP (Dave Trissel)
Newsgroups: net.micro.68k,net.micro.16k
Subject: Re: Re: PDP11s vs the micros
Message-ID: <492@oakhill.UUCP>
Date: Fri, 16-Aug-85 21:46:06 EDT
Article-I.D.: oakhill.492
Posted: Fri Aug 16 21:46:06 1985
Date-Received: Tue, 20-Aug-85 06:35:39 EDT
References: <1617@hao.UUCP> <847@mako.UUCP> <2422@sun.uucp> <2607@sun.uucp> <5874@utzoo.UUCP>
Reply-To: davet@oakhill.UUCP (Dave Trissel)
Organization: Motorola Inc. Austin, Tx
Lines: 64
Xref: linus net.micro.68k:1005 net.micro.16k:336

In article <5874@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>> > Any particular reason to do this rather than restart the instruction from
>> > where it left off?
>> >
>Motorola obviously :-) views its 68020 line primarily as a way to sell
>memory chips.  Between the incredible pile of trash it heaves onto the
>stack when you take a page fault, and the huge internal state of the
>68881 FPU that has to be shoveled in and out every time you context-switch
>(what's the betting Motorola's next FPU chip has DMA? :-), the memory
>market is clearly what they're aiming at.  That and the cache market.

What you don't realize is the amazing performance we can get because of the
"incredible pile of trash" we heave on the stack.

The crux of the problem is that chips which have to back up and redo
instructions pay a nasty penalty in pipeline design.  Consider the following
generic microprocessor code sequence:

        MOVE    something to memory
        SHIFT   Reg by immediate
        MUL     Reg to Reg
        etc.

The MC68020 executes the MOVE and the bus unit schedules a write cycle.  Then
the execution unit/pipeline happily continues executing the instruction
stream without regard to the final status of the write.
Even if the write fails (bus errors) there could be several more instructions
executed (in fact any number, until one is hit which requires the bus again.)

Contrast this with chips which redo instructions.  They must soon stop dead in
their tracks until the write cycle has been verified as properly done.
Otherwise they would alter the programmer's model and invalidate retry.

Another thing to consider is that the total operating system code executed to
continue from a page fault (assign an unused page frame and map it in the
MMU, or block the process and schedule a swapped-out page to be read) makes
the overhead of writing the internal 020 machine state seem insignificant.
The stack save equates to about the same overhead as executing 12
instructions.

Concerning floating-point state saves, we gave a lot of thought to minimizing
latency times.  What we did was give an indication to the OS of whether any
of the FP registers had been used.  If not, the OS could skip the context
save and restore completely.

Intel has a novel approach on their 8087 and 80287 where they let the process
context switch without saving FP state.  If another process tries using
floating-point, an interrupt occurs, letting the OS then swap context only
when necessary.  The trouble with this technique is that all it takes is for
one out of every 20 or so context switches to require a re-save and you start
losing overall processor time compared with just saving it unconditionally.
At worst, if you have several processes constantly sharing the FP chip, then
you have essentially forced a complete extra interrupt exception invocation
for every change in context - a massive penalty.

One solution would be to keep multiple contexts on chip.  Ah - if we only had
next decade's technology today.  Lots of exciting things are going to happen
once we can get millions of gates on a single chip running at 70 MHz.

  --  Dave Trissel    Motorola Semiconductor Inc.  Austin, Texas
      {seismo,ihnp4}!ut-sally!oakhill!davet
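
The "one out of every 20 or so" break-even point in the 8087/80287
comparison above can be sketched as a small cost model.  This is only an
illustration under stated assumptions, not measured data: the function names
are hypothetical, costs are in arbitrary units, and the 19:1 ratio of trap
overhead to save cost is picked solely so that the break-even lands at the
1-in-20 figure quoted in the article.

```python
from fractions import Fraction


def eager_cost(switches, save_cost):
    """Total cost when FP state is saved/restored on every context switch."""
    return switches * save_cost


def lazy_cost(faulting_switches, save_cost, trap_cost):
    """Total cost when FP state is swapped only after a 'coprocessor in use
    by someone else' trap; each such switch pays the trap plus the save."""
    return faulting_switches * (save_cost + trap_cost)


def break_even_fraction(save_cost, trap_cost):
    """Fraction of switches needing a re-save at which both schemes tie.

    Solves p * (save_cost + trap_cost) == save_cost for p.
    """
    return Fraction(save_cost, save_cost + trap_cost)


if __name__ == "__main__":
    SAVE, TRAP = 1, 19          # assumed units: trap overhead = 19 saves
    SWITCHES = 1000
    print("break-even fraction:", break_even_fraction(SAVE, TRAP))
    for faulting in (10, 50, 200, 1000):
        print(faulting, "faulting switches:",
              "lazy =", lazy_cost(faulting, SAVE, TRAP),
              "eager =", eager_cost(SWITCHES, SAVE))
```

Under this model the lazy scheme wins only while fewer than 1 switch in 20
needs a re-save; when every switch faults (several processes constantly
sharing the FP chip), it costs the full trap on top of the save each time.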