Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!seismo!gatech!bloom-beacon!husc6!cmcl2!phri!roy
From: roy@phri.UUCP (Roy Smith)
Newsgroups: comp.arch
Subject: Re: What with these Vector's anyways?
Message-ID: <2806@phri.UUCP>
Date: Tue, 21-Jul-87 11:04:41 EDT
Article-I.D.: phri.2806
Posted: Tue Jul 21 11:04:41 1987
Date-Received: Thu, 23-Jul-87 05:35:16 EDT
References: <2378@ames.arpa> <687@elmgate.UUCP>
Reply-To: roy@phri.UUCP (Roy Smith)
Organization: Public Health Research Inst. (NY, NY)
Lines: 41
Keywords: vector Cray Cyber CDC Cpu Supercomputers

In article <687@elmgate.UUCP> jdg@aurora.UUCP (Jeff Gortatowsky) writes:
    wants to know what "vector" means in the context of "vector processors"

Let's say you have 3 floating point arrays, x, y, and z, and you want to
set each element in z equal to the product of the corresponding elements in
x and y. On a scalar processor (e.g. Vax, Sun, etc.) you would write:

    for i goes from 1 to upper-limit-of-x,y,z do
        z[i] = x[i] * y[i]
    end

The problem is that the cpu wastes a lot of time doing the donkey work of
executing the loop (incrementing the index and checking for the upper
limit), computing the addresses for the array references, fetching and
decoding the multiply instruction opcode, etc., and only after all that does
it get to do the "real" work of doing the floating-point multiply. On a
vector processor, you would have a single instruction to do the whole loop.

Further, if you look carefully at a floating multiply operation, you see
it takes a dozen or so atomic steps: multiply the mantissas, add the
exponents, normalize the result, check for under/overflow, etc. On a scalar
machine these operations get done in series.
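
To make the loop overhead described above a bit more concrete, here is a minimal Python sketch of the scalar loop that tallies bookkeeping steps separately from the multiplies. The function name scalar_multiply and the figure of four overhead steps per iteration (index increment, limit check, address computation, instruction fetch/decode) are assumptions chosen purely for illustration; they are not measurements of a Vax, a Sun, or any other real machine.

```python
def scalar_multiply(x, y):
    """Multiply two equal-length lists element by element, the scalar way.

    Returns (z, overhead_steps, multiply_steps). The step counts are a toy
    model of the bookkeeping a scalar CPU does, not real cycle counts.
    """
    n = len(x)
    z = [0.0] * n
    overhead_steps = 0   # loop bookkeeping: increment, check, addressing, decode
    multiply_steps = 0   # the "real" floating-point work
    i = 0
    while i < n:                  # upper-limit check on every pass
        z[i] = x[i] * y[i]        # address computation, then the multiply
        multiply_steps += 1
        i += 1                    # index increment
        overhead_steps += 4       # assumed: four bookkeeping steps per element
    return z, overhead_steps, multiply_steps


if __name__ == "__main__":
    z, overhead, multiplies = scalar_multiply([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    print(z, overhead, multiplies)
```

Under this toy model the bookkeeping grows with the array length right alongside the useful work, which is the cost a single vector instruction is meant to avoid.
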
On a vector machine, if you have 6 multiplies to do (call them M1-M6), once
you get the pipeline primed, you can be doing mantissa-multiply for M4 at
the same time that another bit of hardware is doing the exponent-add for M3
while some other piece of hardware is doing the normalize for M2 and the
overflow-check for M1 is being done by yet another bit of hardware. Thus, if
it takes N clock cycles to do a complete multiply, on a vector machine you
need N clock cycles before the first result is complete, and thereafter you
get another result every clock cycle.

Some problems vectorize easily, some don't. If you have the type of
problem that does, running it on a vector machine is a big win. If you have
the type of problem that doesn't, running it on a vector machine is just a
good way to waste expensive hardware.
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
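
The pipeline arithmetic in the post (N cycles until the first result, then one result per cycle) can be checked with a small Python sketch. It assumes an idealized pipeline in which every stage takes exactly one clock cycle and never stalls. The four-stage list and the function names are hypothetical, picked to match the M1-M6 example; real hardware would have more stages and less tidy timing.

```python
# Hypothetical four-stage breakdown of a floating-point multiply.
STAGES = ["mantissa-multiply", "exponent-add", "normalize", "overflow-check"]


def serial_cycles(num_multiplies, num_stages):
    """Cycles needed when each multiply runs all its stages before the next starts."""
    return num_multiplies * num_stages


def pipelined_cycles(num_multiplies, num_stages):
    """Cycles needed when a new multiply enters the pipeline every cycle."""
    if num_multiplies == 0:
        return 0
    # num_stages cycles to finish the first result, then one more per result.
    return num_stages + (num_multiplies - 1)


def stage_occupancy(cycle, num_multiplies, stages=STAGES):
    """Return {stage name: multiply label} for one clock cycle (0-based)."""
    occupancy = {}
    for position, name in enumerate(stages):
        m = cycle - position + 1      # multiply number sitting in this stage
        if 1 <= m <= num_multiplies:
            occupancy[name] = f"M{m}"
    return occupancy


if __name__ == "__main__":
    print(serial_cycles(6, len(STAGES)))      # 6 multiplies, one at a time
    print(pipelined_cycles(6, len(STAGES)))   # same work, pipelined
    print(stage_occupancy(3, 6))              # the cycle described in the post
```

With 6 multiplies and 4 one-cycle stages, the serial count is 24 cycles and the pipelined count is 9. At cycle 3 (counting from 0) the occupancy is M4 in mantissa-multiply, M3 in exponent-add, M2 in normalize and M1 in overflow-check, which is the snapshot described above.
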