Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!killer!ames!pasteur!ucbvax!decwrl!purdue!i.cc.purdue.edu!j.cc.purdue.edu!pur-ee!a.cs.uiuc.edu!uxc.cso.uiuc.edu!uicsrd.csrd.uiuc.edu!hoefling
From: hoefling@uicsrd.csrd.uiuc.edu
Newsgroups: comp.arch
Subject: Re: getting rid of branches
Message-ID: <43700043@uicsrd.csrd.uiuc.edu>
Date: 7 Jul 88 17:40:00 GMT
References: <12258@mimsy.UUCP>
Lines: 66
Nf-ID: #R:mimsy.UUCP:12258:uicsrd.csrd.uiuc.edu:43700043:000:1956
Nf-From: uicsrd.csrd.uiuc.edu!hoefling    Jul  7 12:40:00 1988


>/* Written  1:21 pm  Jul  4, 1988 by ho@svax.cs.cornell.edu */
>
>original:   do 100 i = 1, 100
>               statement1
>               if (x(i)) goto 200
>               statement2
>        100 continue
>            statement3
>        200 statement4
>
>transformed:
>            ex1 = .true.
>            do 100 i = 1, 100
>               if (ex1) statement1
>               if (ex1) ex1 = .not. x(i)
>               if (ex1) statement2
>        100 continue
>            if (.not. ex1) goto 200
>            statement3
>        200 statement4
>
>the do loop can now be vectorized.

If "x" in the original is invariant inside the loop (i.e. no dependences
involving it within the loop), then it is trivial to determine on which
iteration the exit will occur [ it's the first iteration i for which x(i)
is .TRUE. ].  Knowing that, it is just as easy to determine how many times
statement1, statement2, statement3 and statement4 will be executed, and
then to set up vector statements which do the statements that many times.
The crucial question is whether there are any dependences from statement2
to statement1, i.e. whether statement1 in some iteration needs a result
produced by statement2 in an earlier iteration.  Such a dependence would
make the loop not vectorizable (at some point, an instance of statement1
would have to wait for an instance of statement2 to finish).
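
To make the counting concrete, here is a small sketch in Python (my own
illustration, not anything from the original loop; the names
simulate_counts and predicted_counts are made up).  It assumes "x" is a
list of booleans that nothing in the loop modifies, walks the control flow
of the original loop while counting how often each statement runs, and
compares that with the counts predicted from the position of the first
true element alone.

```python
def simulate_counts(x):
    """Follow the original loop's control flow and count statement executions."""
    counts = {"statement1": 0, "statement2": 0, "statement3": 0, "statement4": 0}
    exited_early = False
    for flag in x:
        counts["statement1"] += 1
        if flag:                      # if (x(i)) goto 200
            exited_early = True
            break
        counts["statement2"] += 1
    if not exited_early:
        counts["statement3"] += 1     # only reached by falling out of the loop
    counts["statement4"] += 1         # label 200 is reached on both paths
    return counts


def predicted_counts(x):
    """Predict the same counts from the 1-based index of the first true element."""
    n = len(x)
    # Assumption: x holds real booleans, so list.index(True) finds the first .TRUE.
    exit_index = x.index(True) + 1 if True in x else 0
    if exit_index == 0:
        return {"statement1": n, "statement2": n, "statement3": 1, "statement4": 1}
    return {"statement1": exit_index, "statement2": exit_index - 1,
            "statement3": 0, "statement4": 1}
```

For x = [False, False, True, False] both functions report three executions
of statement1, two of statement2, none of statement3 and one of statement4.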

If we assume that there are no dependences on "x" and no dependences from 
statement2 to statement1, then the problem comes down to simply finding the 
index of the first occurrence of .TRUE. in "x".
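
That search is simple to write down.  A minimal serial sketch in Python,
assuming 1-based indices to match the Fortran and a return value of 0 when
no element is true (the name first_true_index is just my stand-in for
whatever routine a real system would supply):

```python
def first_true_index(x):
    """Return the 1-based index of the first true element of x, or 0 if none."""
    for i, flag in enumerate(x, start=1):
        if flag:
            return i
    return 0
```

This version scans serially; how the search would best be done on a given
vector machine is a separate question that this sketch does not address.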

C---Let's say that "first_TRUE_index" returns 0 if no x(i) is .TRUE.

	exit_index = first_TRUE_index(x)

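C---With no exit, both statements run on all 100 iterations.  Otherwise
C---statement1 still runs on the exit iteration, but statement2 does not.
	if (exit_index .EQ. 0) then
		limit1 = 100
		limit2 = 100
	else
		limit1 = exit_index
		limit2 = exit_index-1
	end if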

	dovector i=1,limit1
		statement1
	end dovector

	dovector i=1,limit2
		statement2
	end dovector

C---statement3 is reached only when the loop runs to completion.
	if (exit_index .EQ. 0) statement3

	statement4
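
As a sanity check on the scheme, here is a hedged sketch in Python that
runs the original loop and the split version side by side.  Everything
concrete in it is my own assumption for illustration: the arrays a, b, c,
the stand-in statements s1_independent, s1_dependent and s2, and plain
"for" loops standing in for "dovector".  With independent statements the
two versions agree; when statement1 reads what statement2 wrote on the
previous iteration, they do not.

```python
def first_true_index(x):
    """1-based index of the first true element of x, or 0 if none."""
    for i, flag in enumerate(x, start=1):
        if flag:
            return i
    return 0


def original_loop(x, stmt1, stmt2, state):
    """The original loop with its early exit, run serially."""
    for i in range(len(x)):
        stmt1(state, i)
        if x[i]:
            break                            # goto 200
        stmt2(state, i)
    else:
        state["log"].append("statement3")    # loop ran to completion
    state["log"].append("statement4")
    return state


def split_loops(x, stmt1, stmt2, state):
    """The split version: all of statement1 first, then all of statement2."""
    n = len(x)
    exit_index = first_true_index(x)
    limit1 = n if exit_index == 0 else exit_index
    limit2 = n if exit_index == 0 else exit_index - 1
    for i in range(limit1):                  # stands in for dovector i=1,limit1
        stmt1(state, i)
    for i in range(limit2):                  # stands in for dovector i=1,limit2
        stmt2(state, i)
    if exit_index == 0:
        state["log"].append("statement3")
    state["log"].append("statement4")
    return state


def make_state(n):
    return {"a": [0] * n, "b": list(range(n)), "c": [0] * n, "log": []}


def s1_independent(state, i):
    state["a"][i] = state["b"][i] + 1        # reads nothing written by s2


def s1_dependent(state, i):
    prev = state["c"][i - 1] if i > 0 else 0
    state["a"][i] = state["b"][i] + prev     # reads c(i-1), written by s2


def s2(state, i):
    state["c"][i] = state["a"][i] * 2
```

With x = [False, False, True, False, False], the independent pair leaves
a = [1, 2, 3, 0, 0] and c = [2, 4, 0, 0, 0] in both versions.  With
s1_dependent the original loop produces a = [0, 1, 4, 0, 0] while the split
version produces a = [0, 1, 2, 0, 0], which is exactly the statement2 to
statement1 dependence that rules the transformation out.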

Jay Hoeflinger
Center for Supercomputing Research and Development
U of Illinois