Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!sri-unix!sri-spam!ames!amdcad!sun!pitstop!sundc!seismo!uunet!hsi!stevens
From: stevens@hsi.UUCP
Newsgroups: unix-pc.general,comp.sys.att
Subject: 3b1 instruction timings
Message-ID: <787@hsi.UUCP>
Date: Fri, 4-Dec-87 17:00:11 EST
Article-I.D.: hsi.787
Posted: Fri Dec 4 17:00:11 1987
Date-Received: Wed, 9-Dec-87 06:29:52 EST
Organization: Health Systems Intl., New Haven, CT
Lines: 59
Keywords: 3b1, 7300
Xref: utgpu unix-pc.general:84 comp.sys.att:1725

While trying to hand optimize some C code for a graphics routine
that I wanted to get as fast as possible, I performed some timings
on the 3b1.  What I wanted was the relative speeds of the basic
operations on different data types, to see if there is anything
"interesting".  My results are:

                         add     sub     mul     div
                        -----   -----   -----   -----
    register short        1.0     1.0     4.0    11.0
    short                 3.1     3.1     6.1    13.1
    register long         1.2     1.2    33.3    44.5
    long                  4.3     4.3    36.5    47.7
    register float      340.4   293.5   452.6   503.7
    float               344.1   296.5   458.7   504.3
    register double      98.8    90.5   211.9   258.9
    double               98.8    90.5   211.9   258.9

I didn't try to compare any absolute values for the 3b1 with any
other system, I just wanted to know how to write "optimal" code,
when necessary (i.e., inner loops of graphics routines).  The numbers
above are all relative to the value 1.0 for a register short add.
I used the cc optimizer for all timings.

A couple of observations:

- stick to shorts instead of ints or longs, when possible, since
  a 32-bit multiply or divide gets very expensive.  This is usually
  possible for graphics routines, and indeed I've noticed that some
  source (such as an implementation of Bresenham's line drawing
  algorithm from The Store) uses only shorts.

- registers don't buy you much except for adds and subtracts
  (and assignments too, I'd guess).

- avoid floats, and stick to doubles.
  The C rule that forces all float arithmetic to be performed using
  double precision kills you on this system.

- this system really should have been designed with an FPU, as the
  floating point times are all 1 to 2 orders of magnitude greater
  than the integer times.  Would anyone from AT&T who is "in the
  know" about the 3b1 care to comment on why there isn't one
  available?

There are a couple of other points that I figured out about the 3b1
that may be of interest:

    there are 6 short registers available (d2, d3, d4, d5, d6, d7)
    there are 6 long registers available (d2, d3, d4, d5, d6, d7)
    there are 6 float registers available (d2, d3, d4, d5, d6, d7)
    there are 4 pointer registers available (a2, a3, a4, a5)

Overall I wasn't very impressed with the code quality of the C
compiler, even with the optimizer.

    Richard Stevens
    Health Systems International, New Haven, CT
       { uunet | ihnp4 } ! hsi ! stevens
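
For anyone who wants to repeat measurements of this kind, here is a
minimal sketch of a timing harness in C.  It is not the code used
for the table above (that was not posted); every name in it
(loop_overhead, loop_short_add, loop_long_mul, time_loop,
relative_cost) is made up for illustration.  It assumes a compiler
that provides clock() and CLOCKS_PER_SEC, and it uses volatile memory
operands so that an optimizer cannot delete the arithmetic, which
also means it only covers the non-register rows.

```c
#include <time.h>

/* Volatile operands and results, so the optimizer must keep the work.
   These live in memory; the "register" rows are not covered here. */
static volatile short s_a = 3, s_b = 7, s_r;
static volatile long  l_a = 3, l_b = 7, l_r;

typedef void (*loop_fn)(long n);

/* Baseline: same loop, one load and one store, no arithmetic. */
void loop_overhead(long n)
{
    long i;
    for (i = 0; i < n; i++)
        s_r = s_a;
}

/* Reference operation: 16-bit add on memory operands. */
void loop_short_add(long n)
{
    long i;
    for (i = 0; i < n; i++)
        s_r = (short)(s_a + s_b);
}

/* Operation under test: 32-bit multiply on memory operands
   (assuming long is 32 bits, as on the 3b1). */
void loop_long_mul(long n)
{
    long i;
    for (i = 0; i < n; i++)
        l_r = l_a * l_b;
}

/* CPU seconds consumed by fn(n), as reported by clock(). */
double time_loop(loop_fn fn, long n)
{
    clock_t start = clock();
    fn(n);
    return (double)(clock() - start) / (double)CLOCKS_PER_SEC;
}

/* Cost of an operation relative to a reference operation, after
   subtracting the baseline from both.  The extra operand load in
   the measured loops is still counted.  Returns -1.0 when the
   reference is too small to measure. */
double relative_cost(double op_secs, double ref_secs, double overhead_secs)
{
    double ref_net = ref_secs - overhead_secs;
    if (ref_net <= 0.0)
        return -1.0;
    return (op_secs - overhead_secs) / ref_net;
}
```

A caller would compute something like
relative_cost(time_loop(loop_long_mul, n), time_loop(loop_short_add, n),
time_loop(loop_overhead, n)), with n large enough that each loop runs
for several seconds; otherwise the resolution of clock() swamps the
result.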