Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!sri-unix!sri-spam!ames!amdcad!sun!pitstop!sundc!seismo!uunet!hsi!stevens
From: stevens@hsi.UUCP
Newsgroups: unix-pc.general,comp.sys.att
Subject: 3b1 instruction timings
Message-ID: <787@hsi.UUCP>
Date: Fri, 4-Dec-87 17:00:11 EST
Article-I.D.: hsi.787
Posted: Fri Dec  4 17:00:11 1987
Date-Received: Wed, 9-Dec-87 06:29:52 EST
Organization: Health Systems Intl., New Haven, CT
Lines: 59
Keywords: 3b1, 7300
Xref: utgpu unix-pc.general:84 comp.sys.att:1725

While trying to hand-optimize some C code for a graphics routine
that I wanted to get as fast as possible, I performed some
timings on the 3b1.  What I wanted was the relative speeds of
the basic operations on different data types, to see
if there is anything "interesting".  My results are:

				  add	  sub	  mul	  div
				-----	-----	-----	-----
	register short		  1.0	  1.0	  4.0	 11.0
	short			  3.1	  3.1	  6.1	 13.1

	register long		  1.2	  1.2	 33.3	 44.5
	long			  4.3	  4.3	 36.5	 47.7

	register float		340.4	293.5	452.6	503.7
	float			344.1	296.5	458.7	504.3

	register double		 98.8	 90.5	211.9	258.9
	double			 98.8	 90.5	211.9	258.9
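
The test program behind these numbers is not shown.  What follows is
only a minimal sketch of how relative timings like these might be
gathered, assuming the standard clock() routine and a loop whose
overhead is measured separately and subtracted.  The names empty_loop,
short_add_loop, seconds_for and relative_cost are hypothetical.

```c
#include <time.h>

/* Volatile sink so the stores in the loops cannot be discarded. */
static volatile short short_sink;

/* Baseline: loop overhead plus one store, no arithmetic. */
void empty_loop(long n)
{
    register short a = 1;
    long i;

    for (i = 0; i < n; i++)
        short_sink = a;
}

/* Same loop with one register short add per iteration.
   Caveat: a modern optimizer may fold a + b into a constant,
   so check the generated assembly before trusting the result. */
void short_add_loop(long n)
{
    register short a = 1, b = 2;
    long i;

    for (i = 0; i < n; i++)
        short_sink = (short)(a + b);
}

/* CPU seconds spent running body(n). */
double seconds_for(void (*body)(long), long n)
{
    clock_t start = clock();

    body(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Cost of one operation relative to a baseline operation, after
   removing the empty-loop time from both measurements. */
double relative_cost(double op_secs, double base_secs, double loop_secs)
{
    return (op_secs - loop_secs) / (base_secs - loop_secs);
}
```

One loop per table entry would be timed with seconds_for(), and each
result passed to relative_cost() with the register short add as the
baseline, so that the baseline itself comes out as 1.0.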

I didn't try to compare absolute values for the 3b1 with any
other system; I just wanted to know how to write "optimal" code
when necessary (i.e., in the inner loops of graphics routines).  The
numbers above are all relative to the value 1.0 for a register short add.
I used the cc optimizer for all timings.  A couple of observations:

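- stick to shorts instead of ints or longs, when possible, since
	a 32-bit multiply or divide gets very expensive: in registers,
	a long multiply costs about 8 times a short multiply (33.3
	vs. 4.0) and a long divide about 4 times a short divide (44.5
	vs. 11.0).  Using shorts is usually possible for graphics
	routines, and indeed I've noticed that some source (such as an
	implementation of Bresenham's line drawing algorithm from
	The Store) uses only shorts.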

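- registers don't buy you much except for adds and subtracts (and
	assignments too, I'd guess).  For the integer types a register
	variable saves roughly 2 to 3 units per operation, which is a
	factor of 3 on an add but barely noticeable on a long multiply
	or divide.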

- avoid floats, and stick to doubles.  The C rule that forces all
	float arithmetic to be performed using double precision
	kills you on this system (a float add costs 340.4, against
	98.8 for a double add).

- this system really should have been designed with an FPU, as the
	floating point times are all 1 to 2 orders of magnitude greater
	than the integer times.  Would anyone from AT&T who is
	"in the know" about the 3b1 care to comment on why there isn't
	one available?
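
To illustrate the first observation, here is a minimal sketch of a
Bresenham line routine that uses only shorts, and only adds, subtracts
and compares in its loop.  It is not the source from The Store, which
is not reproduced here; the name line_points and its arguments are
hypothetical, and the points are stored in arrays rather than drawn.

```c
/* Store the points of the line (x0,y0)-(x1,y1) in xs[] and ys[].
   At most max points are stored; the return value is the number of
   points on the line.  Assumes the coordinate deltas are below
   16384, so that twice the error term still fits in a short. */
short line_points(short x0, short y0, short x1, short y1,
                  short xs[], short ys[], short max)
{
    register short dx, dy, err, e2;
    short sx, sy, n;

    dx = x1 - x0;
    if (dx < 0)
        dx = -dx;
    dy = y1 - y0;
    if (dy < 0)
        dy = -dy;
    sx = (x0 < x1) ? 1 : -1;    /* step direction in x */
    sy = (y0 < y1) ? 1 : -1;    /* step direction in y */
    err = dx - dy;
    n = 0;

    for (;;) {
        if (n < max) {
            xs[n] = x0;
            ys[n] = y0;
        }
        n++;
        if (x0 == x1 && y0 == y1)
            break;
        e2 = err + err;         /* doubling by an add, not a multiply */
        if (e2 > -dy) {
            err -= dy;
            x0 += sx;
        }
        if (e2 < dx) {
            err += dx;
            y0 += sy;
        }
    }
    return n;
}
```

With every variable a short, the inner loop stays on the cheapest
rows of the table and never touches the multiply or divide columns.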

There are a couple of other points that I figured out about the 3b1,
that may be of interest:

    there are 6 short registers available (d2, d3, d4, d5, d6, d7)
    there are 6 long registers available (d2, d3, d4, d5, d6, d7)
    there are 6 float registers available (d2, d3, d4, d5, d6, d7)
    there are 4 pointer registers available (a2, a3, a4, a5)
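
With so few registers, it seems worth spending register declarations
on the busiest variables of an inner loop.  A small sketch follows;
the function sum_shorts is hypothetical, and the idea that the
compiler hands out registers in declaration order is an assumption,
not something I have verified for the 3b1 cc.

```c
/* Sum count shorts starting at data.  The three loop variables are
   declared register, busiest first, on the assumption that the
   compiler assigns registers in declaration order. */
long sum_shorts(short *data, short count)
{
    register short *p = data;   /* candidate for a pointer register */
    register short n = count;   /* candidate for a data register */
    register long total = 0;    /* candidate for a data register */

    while (n-- > 0)
        total += *p++;
    return total;
}
```

This uses one of the four pointer registers and two of the six data
registers, leaving the rest for other variables in the same function.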

Overall I wasn't very impressed with the code quality of the C compiler,
even with the optimizer.

	Richard Stevens
	Health Systems International, New Haven, CT
           { uunet | ihnp4 } ! hsi ! stevens