Path: utzoo!attcan!utgpu!jarvis.csri.toronto.edu!rutgers!tut.cis.ohio-state.edu!ucbvax!hoptoad!gnu
From: gnu@hoptoad.uucp (John Gilmore)
Newsgroups: comp.arch
Subject: Re: delayed branch
Message-ID: <8266@hoptoad.uucp>
Date: 11 Aug 89 10:09:48 GMT
References: <828@eutrc3.urc.tue.nl> <26667@amdcad.AMD.COM> <26676@amdcad.AMD.COM>
Organization: Grasshopper Group in San Francisco
Lines: 48

tim@cayman.amd.com (Tim Olson) wrote:
> Does anyone else know of other processors with such restrictions?

I'm surprised that nobody mentioned the SPARC.  It has restrictions on
which types of branches can sit in the delay slot of which other
types.  I think that in the first draft of the architecture, I was the
one who noticed that the intended "return from interrupt" sequence was
one of the invalid ones!  I don't have my SPARC manual handy, but as I
recall, the invalid combinations are defined to "keep you executing in
the same address space but otherwise jump to an undefined location"...
I found this quite a botch for a CPU architecture, but I'm not a chip
designer -- I got into this business via software.  Then again, it's
been on the market for a few years and nobody seems to be screaming
about it.

Here is a case where this restriction bit me:

While examining the function block profiler code (cc -a) I noticed an
interesting thing.  If you do:

	bcond,a	foo		[,a means annul]
	instruction
foo:

What you have is a "skip on not condition" instruction.  If the
condition is true, it does a delayed branch to foo, executing the
instruction in the delay slot.  If the condition is false, it falls
through, but annuls the instruction.  In either case, the execution time
is the same (two cycles) and you end up at foo.
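To make the timing concrete, here is a minimal Python sketch of a toy
PC/nPC machine.  It assumes one cycle per instruction slot (annulled
slots included) and a made-up instruction representation; `Branch`,
`Op`, and `run` are hypothetical names, not real SPARC encodings or any
existing simulator.  It shows that `bcond,a foo; insn; foo:` runs insn
only when the condition holds, yet reaches foo in the same number of
cycles either way.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    """Toy conditional branch; annul=True models the ',a' suffix."""
    target: str
    annul: bool


@dataclass
class Op:
    """Any ordinary (non-control-transfer) instruction."""
    name: str


def run(program, labels, cond):
    """Step a PC/nPC pair through program with one fixed condition value.

    Returns (names of ops actually executed, cycles used), counting one
    cycle per instruction slot, including annulled ones (an assumption).
    """
    pc, npc = 0, 1
    executed, cycles = [], 0
    annul_next = False      # the upcoming delay-slot instruction is squashed
    in_delay_slot = False   # the current instruction follows a branch
    while pc < len(program):
        insn = program[pc]
        cycles += 1
        if in_delay_slot and isinstance(insn, Branch):
            # Restricted/undefined on SPARC per the post, so the model refuses it.
            raise ValueError("branch in a delay slot is not modeled")
        if annul_next:
            annul_next = in_delay_slot = False
            pc, npc = npc, npc + 1
            continue
        if isinstance(insn, Branch):
            if cond:
                # Taken: the delay slot runs, then control goes to the target.
                target = labels[insn.target]
            else:
                # Untaken: fall through; ',a' squashes the delay slot.
                target = npc + 1
                annul_next = insn.annul
            pc, npc = npc, target
            in_delay_slot = True
        else:
            executed.append(insn.name)
            in_delay_slot = False
            pc, npc = npc, npc + 1
    return executed, cycles


# bcond,a foo ; insn ; foo: at_foo
program = [Branch("foo", annul=True), Op("insn"), Op("at_foo")]
labels = {"foo": 2}
print(run(program, labels, cond=True))   # (['insn', 'at_foo'], 3)
print(run(program, labels, cond=False))  # (['at_foo'], 3)
```

With annul=False the delay-slot instruction would run on both paths, so
it is the ,a that turns the pair into a skip.  The model refuses a branch
in a delay slot rather than guess at behavior the architecture leaves
undefined.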

This reminds me of the "skip" instructions on the old DG Nova and Eclipse.
They're quite nice on machines with a single instruction size.

You can also think of it as a "conditionally execute one instruction"
instruction; in this case you don't have to mentally reverse the condition.
E.g.   blt,a foo; insn; foo:   executes insn if less than.

There's a serious catch to it on the SPARC:  the second instruction
cannot be a delayed control transfer [i.e. a branch with a delay
slot].  If it is, what the CPU does is undefined!
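An assembler or optimizer could catch this pattern before it reaches
the CPU.  The following is a hypothetical lint pass in Python, assuming
one instruction per line, labels on their own lines, and a simplified,
incomplete set of delayed control-transfer mnemonics; find_bad_couples
and bump_counter are illustrative names, not an existing tool or
routine.  It flags an annulled conditional branch whose delay slot holds
a branch, CALL, or JMPL-style instruction, which is the case described
above.

```python
# Simplified subset of SPARC delayed control-transfer mnemonics (an assumption;
# not a complete list, e.g. floating-point and coprocessor branches are omitted).
BICC = {"ba", "bn", "bne", "be", "bg", "ble", "bge", "bl",
        "bgu", "bleu", "bcc", "bcs", "bpos", "bneg", "bvc", "bvs"}
OTHER_DCTI = {"call", "jmpl", "jmp", "ret", "retl", "rett"}
CONDITIONAL = BICC - {"ba", "bn"}  # ba/bn do not depend on the condition codes


def split(line):
    """Return (mnemonic, annulled) for a line such as 'be,a foo'."""
    base, _, suffix = line.split()[0].lower().partition(",")
    return base, suffix == "a"


def find_bad_couples(lines):
    """Return line indices of annulled conditional branches whose
    delay slot (the next instruction) is a delayed control transfer."""
    insns = [(i, s.strip()) for i, s in enumerate(lines)
             if s.strip() and not s.strip().endswith(":")]
    bad = []
    for (i, first), (_, second) in zip(insns, insns[1:]):
        base, annulled = split(first)
        nxt, _ = split(second)
        if annulled and base in CONDITIONAL and (nxt in BICC or nxt in OTHER_DCTI):
            bad.append(i)
    return bad


# The profiler-style case: a CALL in the slot of an annulled branch is flagged.
print(find_bad_couples(["be,a foo", "call bump_counter", "foo:", "nop"]))  # [0]
```

This only checks the bcond,a case discussed here; the other restricted
branch-in-delay-slot combinations would need their own rules.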

I was hoping to use this to shorten the block profiling code, but it doesn't
work because the second instruction is a CALL.  Still, there are probably
places where the optimizers can use it.
-- 
John Gilmore      {sun,pacbell,uunet,pyramid}!hoptoad!gnu      gnu@toad.com
      "And if there's danger don't you try to overlook it,
       Because you knew the job was dangerous when you took it"