Xref: utzoo comp.arch:7461 comp.sys.ibm.pc.rt:207
Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!rutgers!rochester!pt.cs.cmu.edu!RPD.MACH.CS.CMU.EDU!rpd
From: rpd@RPD.MACH.CS.CMU.EDU (Richard Draves)
Newsgroups: comp.arch,comp.sys.ibm.pc.rt
Subject: Re: Why the original RT seemed/was slow (was ...)
Message-ID: <3764@pt.cs.cmu.edu>
Date: 4 Dec 88 17:37:14 GMT
References: <5046@polya.Stanford.EDU> <1287@auschs.UUCP> <1309@auschs.UUCP> <3736@pt.cs.cmu.edu> <447@scifi.UUCP>
Organization: Carnegie-Mellon University, CS/RI
Lines: 38

In article <447@scifi.UUCP> njs@scifi.UUCP (Nicholas J. Simicich) writes:
>At IBM T.J. Watson Research, we have a number of RT's running Mach. A
>simple C program running CPU bound with a working set of around 300k
>running niced makes it impossible to do any other work on the machine.
>This does not happen on either the AOS or AIX machines we have. The
>operating system seems to be the sole difference I can come up with.
>I believe that the Mach operating system runs well on a number of
>other machines and suspect that it is simply a matter of tuning.

I know of one performance gotcha with Mach on RTs. The RT MMU only
allows sharing of segments. Mach VM is more general and allows
sharing of pages. However, it should still notice when what is being
shared is in fact an entire segment (notably, text segments) and use a
common segment to implement the sharing. However, it doesn't do this.
(The interface between the machine-independent and machine-dependent
VM code makes it difficult to figure out that this is
possible/desirable.) Instead, each address space is composed of
different segments. Because the RT architecture only allows a page to
be in one segment at a time, when a process uses a shared page it may
take a "translation fault" which moves the page into the right
segment. These faults are pretty expensive; on a Model 25 RT they take
more than a millisecond.
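
To make the ping-ponging concrete, here is a toy Python sketch of the
behaviour described above. It is not Mach code: the function name, the
segment labels "A" and "B", and the access trace are all made up for
illustration, and the 1 ms cost is only the lower bound quoted above for
a Model 25. The model assumes a page lives in exactly one segment at a
time and counts a translation fault whenever an access finds the page in
some other segment.

```python
# Toy model, not Mach code: a shared page can sit in only one segment at
# a time, so touching it through another segment costs a translation
# fault that moves the page.

FAULT_COST_MS = 1.0  # assumed lower bound ("more than a millisecond")


def count_translation_faults(accesses):
    """Count moves of a page between segments.

    accesses: iterable of (segment, page) pairs in the order they occur.
    The first mapping of a page is not counted, only later moves.
    """
    owner = {}   # page -> segment currently holding it
    faults = 0
    for segment, page in accesses:
        previous = owner.get(page)
        if previous is not None and previous != segment:
            faults += 1          # page has to move to this segment
        owner[page] = segment
    return faults


# Two address spaces (hypothetical segments "A" and "B") taking turns on
# the same four shared text pages.
trace = [(seg, page) for page in range(4) for seg in ("A", "B", "A")]
faults = count_translation_faults(trace)
print(faults, "faults, at least", faults * FAULT_COST_MS, "ms")
```

If every access in the same trace goes through one common segment, the
count drops to zero, which is what sharing the whole text segment would
buy.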
For example, every time our csh runs a command about 80 of these faults
occur. Rich Sanzi recently greatly improved the translation-fault
handling time, but it is still an unfortunate performance hit.

I dug out my copy of Dhrystone 1.1 and tried to reproduce Sauer's
numbers.

                Sauer   Draves
Model 25         4000     3270
Model 125        8300     7855
Model 135       10400     9765

I used hc2.1d and ran the tests single-user. Problems with VM don't
explain the discrepancies. (I wonder why the Model 25 number is
especially far off?)

Is there some compiler better than hc2.1d? Do AIX and AOS get different
numbers?

Rich Draves
--