Xref: utzoo comp.arch:7461 comp.sys.ibm.pc.rt:207
Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!rochester!pt.cs.cmu.edu!RPD.MACH.CS.CMU.EDU!rpd
From: rpd@RPD.MACH.CS.CMU.EDU (Richard Draves)
Newsgroups: comp.arch,comp.sys.ibm.pc.rt
Subject: Re: Why the original RT seemed/was slow (was ...)
Message-ID: <3764@pt.cs.cmu.edu>
Date: 4 Dec 88 17:37:14 GMT
References: <5046@polya.Stanford.EDU> <1287@auschs.UUCP> <1309@auschs.UUCP> <3736@pt.cs.cmu.edu> <447@scifi.UUCP>
Organization: Carnegie-Mellon University, CS/RI
Lines: 38

In article <447@scifi.UUCP> njs@scifi.UUCP (Nicholas J. Simicich) writes:
>At IBM T.J. Watson Research, we have a number of RT's running Mach.  A
>simple C program running CPU bound with a working set of around 300k
>running niced makes it impossible to do any other work on the machine.
>This does not happen on either the AOS or AIX machines we have.  The
>operating system seems to be the sole difference I can come up with.
>I believe that the Mach operating system runs well on a number of
>other machines and suspect that it is simply a matter of tuning.

I know of one performance gotcha with Mach on RTs.  The RT MMU only allows
sharing of segments, whereas Mach VM is more general and allows sharing of
pages.  Mach should still notice when what is being shared is in fact an
entire segment (notably, text segments) and use a common segment to
implement the sharing, but it doesn't do this.  (The interface between
the machine-independent and machine-dependent VM code makes it difficult
to figure out that this is possible/desirable.)

Instead, each address space is composed of different segments.  Because the
RT architecture only allows a page to be in one segment at a time, when
a process uses a shared page it may take a "translation fault" which moves
the page into the right segment.  These faults are pretty expensive; on a
Model 25 RT they take more than a millisecond.  For example, every time
our csh runs a command, about 80 of these faults occur.  Rich Sanzi recently
greatly improved the translation-fault handling time, but it is still
an unfortunate performance hit.
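
To make the mechanism concrete, here is a toy sketch of the "one segment
per page" rule and the fault ping-pong it causes.  This is not Mach's
actual pmap code; the names ToyRTMMU, touch and lower_bound_ms are made up
for illustration, and the cost estimate simply takes "more than a
millisecond" as 1 ms per fault.

```python
# Toy model: a physical page can be mapped in only one segment at a time.
# All names here are hypothetical, not Mach or RT identifiers.

class ToyRTMMU:
    def __init__(self):
        self.owner = {}    # page -> segment currently holding it
        self.faults = 0    # translation faults taken so far

    def touch(self, segment, page):
        """Reference `page` through `segment`; fault if another segment holds it."""
        if self.owner.get(page) != segment:
            if page in self.owner:
                self.faults += 1    # page has to be moved between segments
            self.owner[page] = segment


def lower_bound_ms(faults, ms_per_fault=1.0):
    """Lower bound on time spent in faults, assuming 1 ms per fault."""
    return faults * ms_per_fault


mmu = ToyRTMMU()
# Two address spaces alternately touch the same 4 shared text pages.
for _ in range(10):
    for seg in ("seg_A", "seg_B"):
        for page in range(4):
            mmu.touch(seg, page)

print(mmu.faults)            # 76: every pass after the first moves all 4 pages
print(lower_bound_ms(80))    # 80.0: at least 80 ms per csh command
```

With a common segment for the shared text, none of these moves would be
needed.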

I dug out my copy of Dhrystone 1.1 and tried to reproduce Sauer's numbers.

                Sauer   Draves
Model 25         4000     3270
Model 125        8300     7855
Model 135       10400     9765

I used hc2.1d and ran the tests single-user.  Problems with VM don't explain
the discrepancies.  (I wonder why the Model 25 number is especially far off?)
Is there some compiler better than hc2.1d?  Do AIX and AOS get different
numbers?
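
For what it's worth, the size of each discrepancy can be worked out
directly from the table above.  A small sketch (the helper name
shortfall_percent is mine): it shows my numbers falling short of Sauer's by
roughly 18% on the Model 25, versus about 5% and 6% on the other two.

```python
# Dhrystone 1.1 figures from the table above: (Sauer, Draves).
results = {
    "Model 25": (4000, 3270),
    "Model 125": (8300, 7855),
    "Model 135": (10400, 9765),
}


def shortfall_percent(reported, measured):
    """Percentage by which `measured` falls short of `reported`."""
    return 100.0 * (reported - measured) / reported


for model, (sauer, draves) in results.items():
    print(f"{model}: {shortfall_percent(sauer, draves):.1f}% below Sauer")
```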

Rich Draves
--