Xref: utzoo comp.lang.fortran:916 comp.unix.questions:8261
Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!mailrus!tut.cis.ohio-state.edu!rutgers!columbia!cubsun!shenkin
From: shenkin@cubsun.BIO.COLUMBIA.EDU (Peter Shenkin)
Newsgroups: comp.lang.fortran,comp.unix.questions
Subject: Re: Sun 3 vs uVAXII floating point speed....
Message-ID: <67@cubsun.BIO.COLUMBIA.EDU>
Date: 15 Jul 88 20:06:57 GMT
References: <25065@ucbvax.BERKELEY.EDU> <3381@phri.UUCP> <59936@sun.uucp>
Reply-To: shenkin@cubsun.UUCP (Peter Shenkin)
Distribution: na
Organization: Dept. of Biology, Columbia Univ., New York, NY
Lines: 98

For what it's worth, here are some benchmarks I did for one of my programs.
I list the total time, the time in the "tweak" (number-crunching)
subroutine, and the time in the "io" (heavy on io) subroutine, for two
separate runs, one of which is more io-intensive than the other.

The VAX was an 11/780 with fpa, running ULTRIX. The code was written in
Fortran, and compiled & run with f77 on the VAX and Sun, with fc on the
Convex C1. The different -O levels for the Convex refer to different levels
of optimization (see below).

Separate benchmarks of a different kind indicated that the uVAX-II is about
0.8 of an 11/780fpa on ordinary floating point arithmetic.

Lots depends on the compiler, though. A previous posting pointed out that
DEC now makes its own Fortran compiler, previously available only under VMS,
available under ULTRIX, and that Sun now has a DEC-compatible Fortran
compiler, which people say also produces better code than their version of
f77 used to.

I advise you to skip the data for now and come back to it after reading the
conclusions at the bottom.

Comparison of times on the VAX, Sun3 and Convex for two typical random
tweak runs:

    l2-1000-0.0:    relatively high io/compute ratio
    l2-150-2.0all:  relatively low io/compute ratio

NUMBERS:

l2-1000-0.0                  Sun-3    Sun-3   Convex   Convex   Convex
===========           VAX   -68881     -fpa      -O0      -O1      -O2
TIMES (cpu-s)
  total:             2766     2107     1199      325      302      300
  tweak:             1679     1824      950      208      184      174
  io:                1029      226      224      108      110      119
TOTAL SPEED:            1     1.31     2.31     8.51     9.16     9.22
  (VAX = 1)
*************************************************************************
*************************************************************************

l2-150-2.0all                Sun-3    Sun-3   Convex   Convex   Convex
===========           VAX   -68881     -fpa      -O0      -O1      -O2
TIMES (cpu-s)
  total:             2339     2734     1287      273      246      229
  tweak:             1656     2266     1062      205      180      162
  io:                 161       34       34       16       17       17
TOTAL SPEED:            1     0.86     1.82     8.57     9.51    10.21
  (VAX = 1)
*************************************************************************
*************************************************************************

CONCLUSIONS (for THIS PROGRAM!!!):

(1) Sun-3 vs. VAX: With -68881, Sun is 4-5 times faster on IO, about 0.8
    times as fast on single-precision arithmetic. (I know through other
    tests that it's several times faster on double-precision.) With -fpa
    (Weitek floating point board), same IO comparison holds, but Sun is
    about 1.7 times the speed of the VAX in single-precision arithmetic.

(2) Convex vs. VAX: with full optimization, about 9 times faster than the
    VAX on IO, about 10 times faster on single-precision arithmetic.
    Vectorization (-O2) gives a 20% speed-up over only local scalar
    optimization (-O0); full scalar optimization (-O1) gives a 10%
    speed-up over only local.

NOTES:

(1) The program is (a) poorly written, and (b) not well-suited in its
    present form to automatic vectorization. As such it is probably
    typical. (On the other hand, it works....)

(2) Estimates of IO and floating-point speeds were made from the io and
    tweak times, which are dominated by these kinds of operations,
    respectively.

(3) VAX is the 11/780-fpa at Columbia Biology (cubsvax); Sun3 -68881
    refers to the 68881 floating point processor. This was also at
    Columbia Biology (ramon). Sun3 -fpa was a machine at Sun in Fort Lee,
    NJ. Convex was cuhhca at Howard Hughes Institute, Columbia Medical
    School. See above for an explanation of the -O options.

(4) This particular program probably does not easily lend itself to great
    speed-up through vectorization, since the operations tend to be on
    fairly short vectors -- about 40 long in these examples, perhaps
    about 120 long in the "best" case, these being the numbers of atoms
    in the loop being repeatedly randomly generated. With difficulty, it
    might be possible to rewrite the program so as to generate many loops
    together, and thereby deal with longer vectors. Less drastic rewrites
    might conceivably speed things up by a factor of 1.5 to 2 overall
    (just a guess, based on the speed-up of those portions of the code
    where everything vectorized).
-- 
*******************************************************************************
Peter S. Shenkin, Department of Biological Sciences, Columbia University,
New York, NY 10027         Tel: (212) 280-5517 (work); (212) 829-5363 (home)
shenkin@cubsun.bio.columbia.edu    shenkin%cubsun.bio.columbia.edu@cuvmb.BITNET
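
For anyone who wants to recheck the TOTAL SPEED rows, here is a minimal
sketch in Python. It is not part of the benchmarked Fortran program; the
names MACHINES, TIMES and speed_vs_vax are made up for illustration. It
assumes each speed figure is simply the VAX time divided by the other
machine's time, rounded to two places, which matches the posted numbers.

```python
# CPU times in seconds, copied from the two tables in the post.
# Column order: VAX, Sun-3 -68881, Sun-3 -fpa, Convex -O0, -O1, -O2.
MACHINES = ["VAX", "Sun3-68881", "Sun3-fpa", "Convex-O0", "Convex-O1", "Convex-O2"]

TIMES = {
    "l2-1000-0.0": {
        "total": [2766, 2107, 1199, 325, 302, 300],
        "tweak": [1679, 1824, 950, 208, 184, 174],
        "io":    [1029, 226, 224, 108, 110, 119],
    },
    "l2-150-2.0all": {
        "total": [2339, 2734, 1287, 273, 246, 229],
        "tweak": [1656, 2266, 1062, 205, 180, 162],
        "io":    [161, 34, 34, 16, 17, 17],
    },
}


def speed_vs_vax(times):
    """Return speeds relative to the first entry (the VAX): vax_time / time."""
    vax = times[0]
    return [round(vax / t, 2) for t in times]


if __name__ == "__main__":
    for run, rows in TIMES.items():
        print(run)
        print("        " + " ".join(f"{m:>10s}" for m in MACHINES))
        for row, times in rows.items():
            ratios = speed_vs_vax(times)
            print(f"  {row:5s} " + " ".join(f"{r:10.2f}" for r in ratios))
```

Applying the same function to the "tweak" and "io" rows gives the per-category
ratios that the conclusions summarize as round numbers.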