Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!killer!ames!pasteur!agate!garnet!weemba
From: weemba@garnet.berkeley.edu (Obnoxious Math Grad Student)
Newsgroups: comp.arch
Subject: Re: Vectorising conditional code.
Message-ID: <11855@agate.BERKELEY.EDU>
Date: 9 Jul 88 14:18:47 GMT
References: <893@garth.UUCP>
Sender: usenet@agate.BERKELEY.EDU
Reply-To: weemba@garnet.berkeley.edu (Obnoxious Math Grad Student)
Organization: Brahms Gang Posting Central
Lines: 68
Supersedes: <11854@agate.BERKELEY.EDU>
In-reply-to: smryan@garth.UUCP (Steven Ryan)

In article <893@garth.UUCP>, smryan@garth (Steven Ryan) writes:

>        for i
>          a[i] := ...
>          exit if p[i]
>          b[i] := ...

>The actual number of iterations is controlled by the predicate p[].
>This means the actual number of elements stored into a[] is determined
>by a subsequent statement. I see no simple way to handle this.

See my comments below.  If the above code fragment is the inner loop of
a bigger calculation, and each inner loop is independent of all others,
then vectorization can be done straightforwardly.

>If I were actually working on this at the moment, I would like see enough
>typical cases

I once played with the Mandelbrot set on a Cray-1.  The relevant code frag-
ment is (with complex arithmetic):

	for c in [set of pixels]
	{	for(z=i=0; |z|<2 && i ...

... where all the extra work is worth the effort.

In this case, it was definitely worth it.
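
For concreteness, here is a minimal sketch in C of how a batch of pixels
can be iterated together.  It is not the Cray-1 code.  It assumes the
usual escape-time iteration z = z*z + c, and the names escape_scalar,
escape_batched, MAX_PIXELS and the cap max_iter are all made up for the
example.  Pixels that fail the test are retired and the survivors are
packed to the front, so the update loop always runs over dense arrays.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PIXELS 1024   /* hypothetical batch size, not from the post */

/* Scalar reference: one pixel at a time, data-dependent trip count. */
int escape_scalar(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int i = 0;
    while (zr * zr + zi * zi < 4.0 && i < max_iter) {   /* |z| < 2 */
        double t = zr * zr - zi * zi + cr;              /* z = z*z + c */
        zi = 2.0 * zr * zi + ci;
        zr = t;
        i++;
    }
    return i;
}

/* Batched version: all live pixels advance one step per pass. */
void escape_batched(const double *cr, const double *ci, int *iters,
                    size_t n, int max_iter)
{
    double zr[MAX_PIXELS], zi[MAX_PIXELS];   /* packed z values        */
    double wr[MAX_PIXELS], wi[MAX_PIXELS];   /* packed copies of c     */
    size_t home[MAX_PIXELS];                 /* original pixel index   */
    size_t live = n, j;
    int i;

    assert(n <= MAX_PIXELS);
    for (j = 0; j < n; j++) {
        zr[j] = zi[j] = 0.0;
        wr[j] = cr[j];
        wi[j] = ci[j];
        home[j] = j;
    }

    for (i = 0; live > 0; i++) {
        size_t kept = 0;

        /* Test every live pixel; retire the failures, pack the rest. */
        for (j = 0; j < live; j++) {
            if (zr[j] * zr[j] + zi[j] * zi[j] < 4.0 && i < max_iter) {
                zr[kept] = zr[j];  zi[kept] = zi[j];
                wr[kept] = wr[j];  wi[kept] = wi[j];
                home[kept] = home[j];
                kept++;
            } else {
                iters[home[j]] = i;          /* this pixel is finished */
            }
        }
        live = kept;

        /* Dense, branch-free update: the loop one would vectorize. */
        for (j = 0; j < live; j++) {
            double t = zr[j] * zr[j] - zi[j] * zi[j] + wr[j];
            zi[j] = 2.0 * zr[j] * zi[j] + wi[j];
            zr[j] = t;
        }
    }
}
```

Calling escape_batched(cr, ci, iters, n, max_iter) on n <= MAX_PIXELS
points should fill iters[] with the same counts escape_scalar gives one
point at a time.  Packing the data, rather than keeping an index list,
is one design choice among several; it costs copies on every pass but
keeps the update loop unit-stride.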

Now back to your example and my promised comments.

>        for i
>          a[i] := ...
>          exit if p[i]
>          b[i] := ...

What I did, then, was to vectorize the a[i] calculation across pixels.  I
had no b[i] calculation--this is true of any while loop--but one could be
handled just as easily.  Write one piece of code that does A-P (compute
a[i], then test p[i]) and another that does B-A-P.  Every pixel does one
round of A-P, and from then on it's just a B-A-P while loop over the
pixels that have not yet exited.
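
The A-P / B-A-P split can be sketched in C as follows.  The three stages
stage_a, stage_p and stage_b are toy stand-ins invented so the example
runs and terminates; run_scalar, run_batched and MAX_LANES are likewise
hypothetical names.  Each "lane" is one independent copy of the inner
loop, and the list of live lanes is compacted after every test.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LANES 1024   /* hypothetical cap on independent inner loops */

/* Toy stages (assumptions, not from the post). */
static unsigned stage_a(unsigned x) { return x + 1u; }    /* a[i] := ...  */
static int      stage_p(unsigned a) { return a >= 100u; } /* exit if p[i] */
static unsigned stage_b(unsigned a) { return 2u * a; }    /* b[i] := ...  */

/* Reference: each lane runs the original loop by itself. */
void run_scalar(const unsigned *seed, unsigned *a_out,
                unsigned *b_count, size_t n)
{
    size_t k;
    for (k = 0; k < n; k++) {
        unsigned x = seed[k], a, nb = 0;
        for (;;) {
            a = stage_a(x);
            if (stage_p(a))
                break;
            x = stage_b(a);
            nb++;
        }
        a_out[k] = a;        /* last a[] value stored by this lane */
        b_count[k] = nb;     /* how many times B ran               */
    }
}

/* Batched: one round of A-P for everyone, then B-A-P on survivors. */
void run_batched(const unsigned *seed, unsigned *a_out,
                 unsigned *b_count, size_t n)
{
    size_t active[MAX_LANES];
    size_t n_active = 0, j, k;

    assert(n <= MAX_LANES);

    /* A-P: every lane does A, then P decides who stays. */
    for (k = 0; k < n; k++) {
        a_out[k] = stage_a(seed[k]);
        b_count[k] = 0;
    }
    for (k = 0; k < n; k++)
        if (!stage_p(a_out[k]))
            active[n_active++] = k;

    /* B-A-P: loop while any lane is still live. */
    while (n_active > 0) {
        size_t kept = 0;
        for (j = 0; j < n_active; j++) {       /* B then A, no branches */
            k = active[j];
            a_out[k] = stage_a(stage_b(a_out[k]));
            b_count[k]++;
        }
        for (j = 0; j < n_active; j++) {       /* P, then compact list  */
            k = active[j];
            if (!stage_p(a_out[k]))
                active[kept++] = k;
        }
        n_active = kept;
    }
}
```

Both routines should leave identical a_out[] and b_count[] for the same
seeds.  The indexed accesses through active[] stand in for whatever
gather or compress facility the target machine offers; on hardware
without one, packing the data itself is the likelier choice.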

More fun happens if you replace the "exit" with "continue": then the pix-
els start in the A-P batch and eventually enough migrate to B-A-P allow-
ing both to loop for a while.  Keeping the vector registers full and beat-
ing off gridlock is tedious, but it is not overly difficult.

Remember, this method works if you have a huge outer loop that guarantees
that the code fragments A-P and B-A-P are vector calculations.  It can,
in principle, vectorize arbitrarily complex conditionals.

ucbvax!garnet!weemba	Matthew P Wiener/Brahms Gang/Berkeley CA 94720