Path: utzoo!utgpu!water!watmath!clyde!bellcore!rutgers!aramis.rutgers.edu!athos.rutgers.edu!hedrick
From: hedrick@athos.rutgers.edu (Charles Hedrick)
Newsgroups: comp.unix.wizards
Subject: Re: Vax 11/780 performance vs Sun 4/280 performance
Message-ID:
Date: 4 Jun 88 20:27:33 GMT
References: <15875@brl-adm.ARPA>
Organization: Rutgers Univ., New Brunswick, N.J.
Lines: 34

I've played around with our Sun 4's a bit (and also with a VAX 750)
to duplicate the various tests. I can confirm that with many
processes waiting for very small times, there is in fact a very sharp
"knee". It happened for me at something like 19 processes. Vmstat is
probably the best tool for watching this. With 18 processes, vmstat
showed over 90% of the system idle. Start one more and suddenly 3%
idle and over 90% of the CPU spent in system state. Killing and
restarting that last process would cause the system to toggle between
the two states. It was very dramatic.

In retrospect it is very clear what is going on. There are a finite
number of hardware contexts in the MMU. Presumably (assuming rational
system programmers) they are managed much like virtual memory. That
is, when a process is to be activated, its MMU info must be put in
one of the contexts. If it isn't there already, some algorithm (maybe
LRU?) is used to decide which process' information to remove. Every
time a process is activated and its information isn't already in a
context register, some work has to be done (which seems to take about
1 msec).

Problems are going to occur when new processes have to be put in
context registers at a rate that is more than about 100/sec. This
requires not only a lot of processes, but also a lot of process
activations. That is, you are always OK if the number of active
processes is less than 15, since those will fit into the hardware
context registers. But you are also OK if you have more than 15
processes, as long as they aren't being activated at a high rate.

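The sharpness of the knee can be illustrated with a small simulation. This is only a sketch under assumptions the post does not confirm: that the contexts really are replaced LRU, that the test processes wake in strict round-robin order, and that 15 contexts are available to them. The function name context_misses and its parameters are made up for the illustration.

```python
from collections import OrderedDict


def context_misses(num_processes, num_contexts, activations):
    """Count context reloads for round-robin activations.

    Assumes LRU replacement of hardware contexts, which the post only
    guesses at ("maybe LRU?").
    """
    contexts = OrderedDict()  # keys are pids, least recently used first
    misses = 0
    for i in range(activations):
        pid = i % num_processes  # strict round-robin wakeup order (assumed)
        if pid in contexts:
            # Hit: the MMU info is still loaded, just mark it recently used.
            contexts.move_to_end(pid)
        else:
            # Miss: the MMU info has to be loaded into a context.
            misses += 1
            if len(contexts) >= num_contexts:
                contexts.popitem(last=False)  # evict least recently used
            contexts[pid] = None
    return misses


if __name__ == "__main__":
    for n in (14, 15, 16, 17):
        print(n, "processes:", context_misses(n, 15, 10_000), "reloads")
```

Under these assumptions, 15 processes cost only the 15 initial loads in 10,000 activations, while 16 processes miss on every single activation, because LRU always evicts exactly the process that will run next. That all-or-nothing behaviour is consistent with one extra process flipping the machine between the two states, although the real replacement policy and wakeup order may differ.
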
Even if you have 100 CPU-bound processes, the problem won't occur as
long as the scheduler gives them fairly long runtime slices. This is
the reason that changing the amount of sleep time in the tests was so
critical.

It's hard to know offhand exactly when this problem will show up in
practice, but I have to believe that somebody at Sun has done
simulation studies with reasonable job mixes, since that's the way
the game is played these days. But it is not the case that your
system will come to a screaming halt when you activate the 16th
process, and it certainly is not limited to 15 users. On the other
hand, nobody is claiming that the Sun 4's are intended for 100 users.
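
The point about runtime slices can be put in rough numbers. The sketch below takes the worst case, where every activation needs a context reload, and uses the roughly 1 msec reload cost estimated above. The function name reload_fraction and the simple model (one reload followed by one slice of useful work) are assumptions made for illustration, not measurements.

```python
def reload_fraction(slice_sec, reload_cost_sec=0.001):
    """Fraction of CPU time spent reloading contexts.

    Worst-case model (assumed): every activation misses, costs
    reload_cost_sec, and is followed by slice_sec of useful work.
    """
    return reload_cost_sec / (reload_cost_sec + slice_sec)


if __name__ == "__main__":
    for slice_sec in (0.1, 0.01, 0.001, 0.0001):
        pct = 100 * reload_fraction(slice_sec)
        print(f"slice {slice_sec * 1000:g} msec: {pct:.1f}% spent reloading")
```

With 100 msec slices the reload cost is about 1% of the CPU even if every activation misses, which is why long slices hide the problem. With slices as short as the reload itself the cost reaches 50%. By the same arithmetic, 100 reloads per second at 1 msec each is 10% of the CPU.
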