Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84 SMI; site sun.uucp
Path: utzoo!linus!philabs!cmcl2!seismo!harvard!talcott!panda!genrad!decvax!decwrl!sun!gnu
From: gnu@sun.uucp (John Gilmore)
Newsgroups: net.arch
Subject: Re: MMU Cache revisited
Message-ID: <2581@sun.uucp>
Date: Thu, 8-Aug-85 20:29:14 EDT
Article-I.D.: sun.2581
Posted: Thu Aug  8 20:29:14 1985
Date-Received: Mon, 12-Aug-85 02:34:16 EDT
References: <5374@fortune.UUCP> <268@gcc-bill.ARPA> <1838@amdahl.UUCP>
Distribution: net
Organization: Sun Microsystems, Inc.
Lines: 26

Someone pointed out that changing contexts in the IBM 370 doesn't cause
a big performance hit because the page table cache (called the "TLB")
contains multiple contexts' entries, indexed with a pointer into another
cache (the "STO stack").  It should be mentioned that early 370s did not have
the STO hardware; it was added because context switching took too long.
(Their first virtual memory systems did not give each process its own
address space; they just let the single address space everybody shared be
larger than physical memory.  For that they never needed to change the MMU
context.)
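A minimal C sketch of the tagged-TLB idea follows.  It is not IBM's design.
The table sizes, the direct-mapped organization, the round-robin replacement
of STO-stack slots, and the function names (switch_context, tlb_lookup,
tlb_fill) are all hypothetical.  What it shows is that each TLB entry carries
a small tag naming its STO-stack slot, so a context switch only changes the
current tag and other contexts' entries stay resident.  Entries are purged
only when a slot is reused for a different address space.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 16   /* hypothetical TLB size */
#define STO_SLOTS   4    /* hypothetical STO stack depth */

typedef struct {
    bool     valid;
    unsigned sto_id;     /* index into the STO stack, not the full STO */
    uint32_t vpage;      /* virtual page number */
    uint32_t pframe;     /* physical frame number */
} TlbEntry;

static TlbEntry tlb[TLB_ENTRIES];
static uint32_t sto_stack[STO_SLOTS];   /* segment table origins seen recently */
static bool     sto_used[STO_SLOTS];
static unsigned sto_next_victim = 0;    /* hypothetical round-robin pointer */
static unsigned cur_sto_id;             /* tag of the running address space */

/* Invalidate every TLB entry tagged with one STO-stack slot. */
static void purge_sto_id(unsigned id) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].sto_id == id)
            tlb[i].valid = false;
}

/* Context switch: find or allocate a slot for the new STO.
   The TLB is NOT flushed; other contexts' entries stay resident. */
void switch_context(uint32_t sto) {
    for (unsigned i = 0; i < STO_SLOTS; i++) {
        if (sto_used[i] && sto_stack[i] == sto) {
            cur_sto_id = i;
            return;
        }
    }
    unsigned id = sto_next_victim;
    sto_next_victim = (sto_next_victim + 1) % STO_SLOTS;
    if (sto_used[id])
        purge_sto_id(id);               /* slot reused: drop its stale entries */
    sto_stack[id] = sto;
    sto_used[id]  = true;
    cur_sto_id    = id;
}

/* Direct-mapped lookup: a hit needs the page AND the context tag to match. */
bool tlb_lookup(uint32_t vpage, uint32_t *pframe) {
    TlbEntry *e = &tlb[vpage % TLB_ENTRIES];
    if (e->valid && e->sto_id == cur_sto_id && e->vpage == vpage) {
        *pframe = e->pframe;
        return true;
    }
    return false;
}

/* On a miss the hardware would walk the page tables; here the caller
   supplies the translation and we install it under the current tag. */
void tlb_fill(uint32_t vpage, uint32_t pframe) {
    TlbEntry *e = &tlb[vpage % TLB_ENTRIES];
    e->valid  = true;
    e->sto_id = cur_sto_id;
    e->vpage  = vpage;
    e->pframe = pframe;
}
```

Switching from one STO to another and back finds the first context's
translations still in the TLB.  Only when more address spaces are active
than there are STO-stack slots do entries have to be thrown away.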

Nobody has mentioned yet that systems where you copy page table
entries into dedicated fast RAMs need not recopy on every context
switch.  In the Sun-2 MMU, for example, 8 complete contexts can remain
in the fast RAMs, and the only reloading required is when you are
context-switching among more than 8 processes.  On a single-user system
(running Unix, where most processes die quickly anyway), this is not a
performance bottleneck.
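The Sun-2 scheme can be sketched the same way.  The 8 contexts come from the
text above; the per-context map size, the process table, the round-robin
choice of which context to steal, and the names switch_to and proc_exit are
assumptions for illustration, not Sun's kernel code.  A process keeps its
hardware context until it exits or another process steals it, so the copy
into the fast RAMs happens only on first use or after a steal.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NCONTEXTS 8      /* the Sun-2 keeps 8 contexts in fast RAM */
#define NSEGS     16     /* hypothetical: map entries per context */
#define NPROCS    32     /* hypothetical process table size */
#define NO_CTX    (-1)

typedef struct {
    int      ctx;               /* hardware context owned, or NO_CTX */
    uint32_t pmap[NSEGS];       /* software copy of this process's map */
} Proc;

static Proc     procs[NPROCS];
static int      ctx_owner[NCONTEXTS];      /* pid using each context, -1 if free */
static uint32_t mmu_ram[NCONTEXTS][NSEGS]; /* stands in for the MMU's fast RAMs */
static unsigned ctx_clock = 0;             /* hypothetical round-robin victim */
static unsigned reloads = 0;               /* copies into the fast RAMs */
static int      cur_ctx;                   /* models the context register */

void mmu_init(void) {
    for (int c = 0; c < NCONTEXTS; c++) ctx_owner[c] = -1;
    for (int p = 0; p < NPROCS; p++) procs[p].ctx = NO_CTX;
}

/* Context switch: reload the fast RAMs only if pid has no context. */
void switch_to(int pid) {
    Proc *p = &procs[pid];
    if (p->ctx == NO_CTX) {
        int c = -1;
        for (int i = 0; i < NCONTEXTS; i++)
            if (ctx_owner[i] == -1) { c = i; break; }
        if (c == -1) {                   /* all 8 in use: steal one */
            c = (int)ctx_clock;
            ctx_clock = (ctx_clock + 1) % NCONTEXTS;
            procs[ctx_owner[c]].ctx = NO_CTX;
        }
        memcpy(mmu_ram[c], p->pmap, sizeof p->pmap);  /* the only reload */
        reloads++;
        ctx_owner[c] = pid;
        p->ctx = c;
    }
    cur_ctx = p->ctx;   /* otherwise just select the resident context */
}

/* A dying process frees its context for the next newcomer. */
void proc_exit(int pid) {
    if (procs[pid].ctx != NO_CTX) {
        ctx_owner[procs[pid].ctx] = -1;
        procs[pid].ctx = NO_CTX;
    }
}
```

With 8 or fewer runnable processes, reloads stops growing once each has run;
a ninth process forces a steal and a reload, and an exiting process frees
its context for the next one.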

From the hardware designs I've seen, it's a lot harder to build an MMU
with a cache than it is to build one out of RAM.  This is because
the cache is doing in hardware what would otherwise be done in software
(updating the entries in the hardware translation table).  Whether
this is worth it or not depends on the individual system and what
it will be used for.  I suspect the overhead difference between the
two is negligible compared with the overall system load, unless the
hardware is badly designed, so I favor the cheaper approach.