Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!mo
From: mo@seismo.CSS.GOV (Mike O'Dell)
Newsgroups: comp.arch
Subject: Vector machines
Message-ID: <44042@beno.seismo.CSS.GOV>
Date: Fri, 24-Jul-87 14:08:42 EDT
Article-I.D.: beno.44042
Posted: Fri Jul 24 14:08:42 1987
Date-Received: Sat, 25-Jul-87 14:37:15 EDT
Organization: Center for Seismic Studies, Arlington, VA
Lines: 67

In a former lifetime I supercomputed to keep myself in
beans, and I remember two things people might
find interesting.

Based on the parallelizing compiler work done by MCC
(Massachusetts Computer Consultants) on IVTRAN for
Illiac IV, John Levesque and friends at RDA built
"RDALIB" for the CDC 7600.  This library basically
provided many of the facilities now in the CRAY
machines, but was in some ways more interesting.  The
automatic parallelizer was experimented with to
try to use RDALIB, but the best results were obtained
after further hand-tuning.  Anyway, the secret of
RDALIB was the "instruction stack" - what we now
would call a "prefetch cache" implemented in
very fast memory.  If you could contain the loop
in the instruction stack, the ol' 7600 could really
get with it - memory hits for instruction fetches
really slowed it down by over a factor of 2.
Between the incredibly tight code which kept the
RDALIB primitives in the i-stack, and the
work on the codes, the ol' 7600 was an unbelievably
fast machine, considering it was designed in the
late-middle 1960's.  If I remember right (considerable
fog...) it got well into the 40-megaflop range sustained,
measured over 8 hours of clock time, the usual stint
when you had "block time" on such machines.
And because of tricks like BUFFERIN and BUFFEROUT,
clocktime essentially equalled cputime.
In fact, when the CRAY-1 was first introduced, it
took considerable work to realize its potential
and the 7600 did not abdicate its megaflop crown 
readily.

Another vector machine which, to quote
the designer of the Star Trek M5, "was not
completely successful" was the TI-ASC.
It was a multipipeline beast - one to four
pipes.  The machine was largely compatible with
the System 360 instruction set except for
the vector pipe additions.  In fact, they
committed the unpardonable sin of reinventing
OS/360 for the ASC, with only slightly different
JCL syntax.  UGH!!!!  Anyway, the problem
with the machine and most of the pre-CRAY
vector machines (the STAR-100 in particular,
and to some extent still the Cyber 205)
was the pipeline startup overhead.  If the
vectors weren't about 100 elements long
(the number varied between 50 and 100 depending
on what you were doing in the pipe), starting
the pipeline actually SLOWED DOWN THE PROGRAM!!!
The vectorizer was again based on the MCC IVTRAN
work and was quite good at vectorizing DO loops, 
but because it usually pessimized the code,
the code had to be liberally laced with
$NOVECTORIZE directives.  Finally they
added a flag to make vectorizing default
to OFF and then you could put in only a few
$VECTORIZE directives.  Anyway, the
machine never achieved anything like its
advertised speeds.  There may still be one
ASC running, but there never were
very many built.
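
For anyone who wants to see why short vectors lose, here is
a rough sketch of the startup-overhead arithmetic.  The cycle
counts (150 cycles of pipe startup, 1 cycle per element in the
pipe, 3 cycles per element in scalar code) are made-up
illustrative numbers, NOT measured ASC or STAR-100 figures;
they were picked only so the break-even lands in the 50-100
element range mentioned above.  The function names are
hypothetical too.

```python
# Toy cost model: a vector op pays a fixed startup cost, then
# streams one element per `per_element` cycles; scalar code has
# no startup but a higher per-element cost.
# All default numbers are illustrative assumptions.

def scalar_cycles(n, per_element=3.0):
    """Cycles to process n elements with an ordinary scalar loop."""
    return n * per_element

def vector_cycles(n, startup=150.0, per_element=1.0):
    """Cycles to process n elements through the vector pipe."""
    return startup + n * per_element

def break_even_length(startup=150.0, scalar_per_element=3.0,
                      vector_per_element=1.0):
    """Vector length at which both approaches cost the same.

    Solves startup + n * tv == n * ts for n.
    """
    if scalar_per_element <= vector_per_element:
        raise ValueError("pipe must be faster per element than scalar code")
    return startup / (scalar_per_element - vector_per_element)

if __name__ == "__main__":
    print("break-even length:", break_even_length())
    for n in (10, 50, 75, 100, 1000):
        s, v = scalar_cycles(n), vector_cycles(n)
        verdict = "vector wins" if v < s else "vector does not win"
        print(f"n={n:5d}  scalar={s:7.0f}  vector={v:7.0f}  {verdict}")
```

Below the break-even length the startup cost dominates and
vectorizing the loop makes it slower, which is exactly why a
vectorizer that cannot see the trip count ends up needing
directives like the ones described above.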

	Yours for faster machines,
	-Mike O'Dell