Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site ccivax.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!qantel!dual!lll-crg!gymble!umcp-cs!seismo!rochester!ritcv!ccivax!rb
From: rb@ccivax.UUCP (rex ballard)
Newsgroups: net.arch
Subject: Re: Scientific Computing and mips 
Message-ID: <256@ccivax.UUCP>
Date: Mon, 16-Sep-85 22:54:28 EDT
Article-I.D.: ccivax.256
Posted: Mon Sep 16 22:54:28 1985
Date-Received: Fri, 20-Sep-85 05:34:10 EDT
References: <419@kontron.UUCP> <2300001@uicsl> <1093@ames.UUCP> <1119@ames.UUCP>
Organization: CCI Telephony Systems Group,  Rochester NY
Lines: 71

> >
> >I think you need to make your performance measurements in such a way that
> >you get a set of distinct numbers which can be used analytically to determine
> >performance for a given program if you know certain properties of the
> >program.  For example:
> >
> >1) The rate of execution of each member of the set of arithmetic operations
> >provided by the machine's instruction set, ...
> >..., with cache disabled.
> >
> >2) The rate of execution of 1-word memory-to-memory moves, with cache
> >disabled.
> >
> >3) The rate of execution of a tight loop ...register-to-register
> >moves, with cache disabled.
> >
> >4) The rate of execution of a tight loop ... , with cache enabled.
> >
> >5) The rate of execution of a tight loop performing (same word size as #3
> >and #4 above) memory-to-memory moves that produce all cache "hits", with
> >cache enabled.  Note that this gives you two properties of your cache: your
> >speedup for operand fetch and store resulting from caching, and any
> >performance penalties resulting from a write-through vs. write-back cache.
> >
> >6) Specifications such as the number of registers available to the user,
> >the size of the cache, etc.
> >
> >Well, you get the idea, anyway... personally I tend to feel that statistical
> >performance measurements are not nearly as useful as analytical ones; I
> >would rather see a list of fairly distinct performance properties of a pro-
> >cessor anytime, since I think you can do more with them in terms of
> >saying how the machine will perform for a given application that way.
 
I would like to add a few more tests in this vein.

7) The time required to do a "structured call" (i.e., save the entire machine
state; transfer control to a "minimal subroutine" such as "return(arg1+arg2+arg3)",
with all arguments passed on the stack; place the result in a single register;
and return to the caller).

The reason for a test like this comes from a study done by M. McGowan.
In a study of several million lines of code, the number of revisions of
a given source module increased EXPONENTIALLY relative to its size.

Regardless of the language, the number of revisions increased by an
average of (1/25)**2.  The 25 was the number of lines displayable
on the screen at one time.

Ideally, the extra cost of a 'structured call' over the equivalent
'macro expansion' should be 0.

In conventional benchmarks, a "call-optimized" computer may show very little
superiority.  In general-purpose applications where "modular software
design" is a necessity, its relative performance may double.

Unfortunately, such a computer would show little of this advantage in
general benchmark tests.

8) The time required to do a "context switch" (i.e., save the entire machine
state, load a new context, then save that state and return to the old context).

This can be a good indicator of interrupt responsiveness and of suitability
for multitasking and "event-driven" situations.

9) The time required to save "equivalent states":

A machine with 8 registers may have less to do in a "state save" than a
machine with 32, but it can thereby "hide" the number of "real" state
values required for a context switch for benchmarking purposes.

(these opinions were my own, but I'm giving them up for adoption)