Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!mailrus!tut.cis.ohio-state.edu!ucbvax!hplabs!amdcad!cayman!tim
From: tim@cayman.amd.com (Tim Olson)
Newsgroups: comp.arch
Subject: Re: delayed branch
Message-ID: <26716@amdcad.AMD.COM>
Date: 11 Aug 89 15:02:44 GMT
References: <828@eutrc3.urc.tue.nl> <26667@amdcad.AMD.COM> <26676@amdcad.AMD.COM> <8266@hoptoad.uucp>
Sender: news@amdcad.AMD.COM
Reply-To: tim@amd.com (Tim Olson)
Organization: Advanced Micro Devices, Austin, TX
Lines: 57

In article <8266@hoptoad.uucp> gnu@hoptoad.uucp (John Gilmore) writes:
| tim@cayman.amd.com (Tim Olson) wrote:
| > Does anyone else know of other processors with such restrictions?
| 
| I'm surprised that nobody mentioned the SPARC.

Well, from the previous postings and email conversations, it appears
that nearly every RISC processor besides the Am29000 has a restriction
on what can go in a branch delay slot, including SPARC, MIPS, 88000,
i860, and ROMP.  Most of the restrictions are advisory (don't do this; the
result is undefined), but the ROMP has hardware to detect and trap this
condition.

One interesting thing to think about if control transfers are allowed in
branch delay slots is how a delay-slot call should work:

loop:
	.
	.
	jmp	loop
	call	lr0, function
exit:
	.
	.


Calls are typically defined in RISC processors to save the return
address in a register.  Since calls themselves have delay slots, the
return address is normally the address of the second instruction after
the call.

The action that a delay-slot call takes depends upon how the return
address is calculated in the processor.  It could either be the address
of the call + 2 (words), or the address of the call's delay slot
instruction + 1.  These normally result in the same value, but if the
call is itself in a delay slot, they work differently:

	ret <- call+2			ret <- call_delay+1

	jmp	loop			jmp	loop
	call	lr0, function		call	lr0, function
	(loop)				(loop)
	(function)			(function)
	.				.
	(exit+1)			(loop+1)

In the former case, the jmp/call pair acts as a visit to the jmp's
target: the instruction at loop executes in place of the call's delay
slot, and the function returns to exit+1, so the instruction at exit is
never executed.  In the latter case, the jmp/call pair continues the
loop: the first instruction of the loop executes just before the call
target, and the function returns to the second instruction in the loop.
The Am29000 exhibits the second behavior.
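
For anyone who wants to step through the two cases, here is a rough
sketch of a toy machine in Python.  It is not a model of the Am29000 or
of any real pipeline: the PC/next-PC pair, the tiny opcode set, the
address layout, and the names return_address, run, and PROGRAM are all
assumptions made up for illustration.  Every control transfer has one
delay slot, and the return is treated as a delayed jump through lr0.

```python
def return_address(policy, pc, npc):
    """Value a call saves in lr0; pc is the call, npc the instruction after it in execution order."""
    if policy == "call+2":
        return pc + 2        # two words past the call itself
    if policy == "call_delay+1":
        return npc + 1       # one word past the call's delay-slot instruction
    raise ValueError(policy)


def run(program, policy, start, max_steps=20):
    """Run the toy machine and return the list of addresses executed."""
    pc, npc = start, start + 1
    lr0 = None
    trace = []
    for _ in range(max_steps):
        op, arg = program[pc]
        trace.append(pc)
        if op == "halt":
            break
        next_npc = npc + 1               # default: fall through
        if op == "jmp":
            next_npc = arg               # takes effect after the delay slot
        elif op == "call":
            lr0 = return_address(policy, pc, npc)
            next_npc = arg
        elif op == "ret":
            next_npc = lr0
        pc, npc = npc, next_npc
    return trace


LOOP, EXIT, FUNCTION = 0, 4, 6
PROGRAM = [
    ("op", None),        # 0  loop:
    ("op", None),        # 1  loop+1
    ("jmp", LOOP),       # 2
    ("call", FUNCTION),  # 3  call sitting in the jmp's delay slot
    ("op", None),        # 4  exit:
    ("halt", None),      # 5  exit+1
    ("op", None),        # 6  function:
    ("ret", None),       # 7  delayed jump through lr0
    ("op", None),        # 8  delay slot of the return
]

if __name__ == "__main__":
    for policy in ("call+2", "call_delay+1"):
        print(policy, run(PROGRAM, policy, start=2, max_steps=7))
```

Starting at the jmp (address 2) and running seven steps, both policies
execute the instruction at loop and then the function body.  With
"call+2" the trace ends at address 5 (exit+1), having skipped exit; with
"call_delay+1" it ends at address 1 (loop+1), so the loop continues.
When the call is not in a delay slot, npc is simply pc+1 and the two
policies save the same address.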


	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)