Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!gatech!bloom-beacon!husc6!cmcl2!phri!roy
From: roy@phri.UUCP (Roy Smith)
Newsgroups: comp.arch
Subject: Re: What with these Vector's anyways?
Message-ID: <2806@phri.UUCP>
Date: Tue, 21-Jul-87 11:04:41 EDT
Article-I.D.: phri.2806
Posted: Tue Jul 21 11:04:41 1987
Date-Received: Thu, 23-Jul-87 05:35:16 EDT
References: <2378@ames.arpa> <687@elmgate.UUCP>
Reply-To: roy@phri.UUCP (Roy Smith)
Organization: Public Health Research Inst. (NY, NY)
Lines: 41
Keywords: vector Cray Cyber CDC Cpu Supercomputers

In article <687@elmgate.UUCP> jdg@aurora.UUCP (Jeff Gortatowsky)
wants to know what "vector" means in the context of "vector processors".

	Let's say you have 3 floating point arrays, x, y, and z, and you
want to set each element in z equal to the product of the corresponding
elements in x and y.  On a scalar processor (e.g. Vax, Sun, etc.) you would
write:

	for i goes from 1 to upper-limit-of-x,y,z
	do
		z[i] = x[i] * y[i]
	end

	The problem is that the cpu wastes a lot of time doing the donkey
work of executing the loop (incrementing the index and checking it against
the upper limit), computing the addresses for the array references, fetching
and decoding the multiply instruction opcode, etc., and only after all that
does it get to do the "real" work of doing the floating-point multiply.  On
a vector processor, you would have a single instruction to do the whole loop.
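
	For the C-minded, here is a rough sketch of the same scalar loop
with the overhead marked in comments.  The function name multiply_arrays
and the use of double are just choices for illustration, and C counts from
0 where the loop above counts from 1.  What a real compiler emits will
vary; the comments only point at the kinds of work described above.

```c
#include <stddef.h>

/* Element-wise product on a scalar machine: z[i] = x[i] * y[i].
 * multiply_arrays is an illustrative name, not from any library. */
void multiply_arrays(const double *x, const double *y, double *z, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {   /* loop overhead: test i against n, bump i */
        /* address arithmetic: locate x[i], y[i] and z[i] */
        /* the multiply opcode is fetched and decoded on every pass */
        z[i] = x[i] * y[i];     /* the one multiply we actually wanted */
    }
}
```

	Everything in that loop body except the multiply itself is the
bookkeeping a single vector instruction would make unnecessary.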

	Further, if you look carefully at a floating multiply operation,
you see it takes a dozen or so atomic steps: multiply the mantissas, add
the exponents, normalize the result, check for under/overflow, etc.  On a
scalar machine these operations get done in series.  On a vector machine,
if you have 6 multiplies to do (call them M1-M6), once you get the pipeline
primed, you can be doing mantissa-multiply for M4 at the same time that
another bit of hardware is doing the exponent-add for M3 while some other
piece of hardware is doing the normalize for M2 and the overflow-check for
M1 is being done by yet another bit of hardware.  Thus, if it takes N clock
cycles to do a complete multiply, on a vector machine you need N clock
cycles before the first result is complete, and thereafter you get another
result every clock cycle.
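
	The cycle counting above can be sketched as a toy model in C.  It
assumes every stage takes exactly one clock cycle, so an N-cycle multiply
has N stages, and that multiply number m enters the pipe in cycle m.  The
function names are made up for this example, and real hardware is messier
than this.

```c
/* Cycles for k multiplies when each must finish before the next starts. */
unsigned long serial_cycles(unsigned long n_stages, unsigned long k)
{
    return n_stages * k;
}

/* Cycles for k multiplies through the pipeline: n_stages cycles until
 * the first result, then one more result per cycle after that. */
unsigned long pipelined_cycles(unsigned long n_stages, unsigned long k)
{
    if (k == 0)
        return 0;
    return n_stages + (k - 1);
}

/* Stage (1..n_stages) that multiply m occupies during cycle c, both
 * counted from 1.  Returns 0 if m is not in the pipeline during c. */
unsigned long stage_at(unsigned long n_stages, unsigned long m,
                       unsigned long c)
{
    if (m == 0 || c < m || c - m >= n_stages)
        return 0;
    return c - m + 1;
}
```

	Taking just the four steps named above as the stages (N = 4), the
six multiplies cost serial_cycles(4, 6) = 24 cycles done in series but
pipelined_cycles(4, 6) = 9 cycles pipelined.  In cycle 4, stage_at() puts
M4 in stage 1, M3 in stage 2, M2 in stage 3 and M1 in stage 4, which is
the snapshot described above.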

	Some problems vectorize easily, some don't.  If you have the type
of problem that does, running it on a vector machine is a big win.  If you
have the type of problem that doesn't, running it on a vector machine is
just a good way to waste expensive hardware.
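
	As one common example of a loop that does not vectorize in the
straightforward way, consider a running product, where each result feeds
the next multiply.  The name running_product is only for illustration.

```c
#include <stddef.h>

/* Running product: z[0] = x[0], then z[i] = z[i-1] * x[i].
 * running_product is an illustrative name, not from any library. */
void running_product(const double *x, double *z, size_t n)
{
    size_t i;

    if (n == 0)
        return;
    z[0] = x[0];
    for (i = 1; i < n; i++)
        z[i] = z[i - 1] * x[i];   /* needs the result of the previous pass */
}
```

	Here the multiply for element i cannot start until the one for
element i-1 has come out the far end of the pipeline, so the multiplies
cannot overlap the way independent ones do.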
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016