Path: utzoo!attcan!uunet!lll-winken!lll-lcc!ames!umd5!mimsy!chris
From: chris@mimsy.UUCP (Chris Torek)
Newsgroups: comp.arch
Subject: getting rid of branches
Message-ID: <12258@mimsy.UUCP>
Date: 30 Jun 88 21:43:22 GMT
References: <1941@pt.cs.cmu.edu> <3208@ubc-cs.UUCP> <1986@pt.cs.cmu.edu> <91odrecXKL1010YEOek@amdahl.uts.amdahl.com>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742
Lines: 43

In article <91odrecXKL1010YEOek@amdahl.uts.amdahl.com>
chuck@amdahl.uts.amdahl.com (Charles Simmons) writes:
>You guys aren't thinking big enough.  How about multiple parallel
>pipelines to compute all the various instruction threads in parallel
>and just keep the results of the one that is actually taken?

Actually, this sort of idea is contained in some research and thesis
work that is (was?) going on here at Maryland.  How do you move old
code (`dusty decks') onto a parallel processor?  One way is to slice up
the program into independent pieces that can be combined again later.
In some cases, rather than running the loop

	for i in [0..n)
		A := compute
		if test(A)
		then B[i] := fn1(A)
		else B[i] := fn2(A)
		fi
	rof

you run two or three loops:

		1			2			3

	for i in [0..n)		for i in [0..n)		for i in [0..n)
		A := compute		A := compute		A := compute
		B[i] := fn1(A)		B[i] := fn2(A)		combine[i] :=
									test(A)
	rof			rof			rof

and at the end, you pick the appropriate B[i]s based on the (secret)
combine[i]s (which may be assigned within either process 1 or 2, or
might be done on a third path, depending on how hard `test' is).
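
As a concrete illustration, here is a minimal C sketch of the sliced
version.  The names (compute_A, fn1, fn2, test) are hypothetical
stand-ins for the pseudocode above, and the four loops run in sequence
here; the point is that each of the first three could go to its own
processor, with processes 1 and 2 each keeping a private copy of B:

	extern double compute_A(int);		/* `A := compute' */
	extern double fn1(double), fn2(double);
	extern int test(double);

	void spliced(double *B, int n)
	{
		double B1[n], B2[n];	/* private Bs (C99 VLAs) */
		int combine[n];		/* process 3's results */
		int i;

		for (i = 0; i < n; i++) {	/* process 1 */
			double A = compute_A(i);
			B1[i] = fn1(A);
		}
		for (i = 0; i < n; i++) {	/* process 2 */
			double A = compute_A(i);
			B2[i] = fn2(A);
		}
		for (i = 0; i < n; i++)		/* process 3 */
			combine[i] = test(compute_A(i));
		for (i = 0; i < n; i++)		/* splice */
			B[i] = combine[i] ? B1[i] : B2[i];
	}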

The separation (slicing) and recombining (splicing) is relatively
straightforward, although there are oodles of messy details (computing
dominators, watching side effects, what about overflow?, ...).  If you
have a massively parallel machine, computing both answers, then
throwing away the wrong one, may be cheapest.  You have to account
for communications costs, and decide where the communication goes (can
process 3 stop processes 1 and 2 short? can 1 & 2 share A? etc.).
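
Incidentally, the splice step itself need not reintroduce a branch:
for integer data you can turn each combine[i] into a mask and select
with pure logic ops.  A sketch, assuming combine[i] is exactly 0 or 1
and two's complement arithmetic:

	long pick(long b1, long b2, int t)
	{
		long mask = -(long)t;	/* 1 -> all ones, 0 -> zero */

		return (b1 & mask) | (b2 & ~mask);
	}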
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris