Path: utzoo!utgpu!watmath!clyde!att!rutgers!mailrus!ames!vsi1!wyse!mips!mash
From: mash@mips.COM (John Mashey)
Newsgroups: comp.arch
Subject: Re: Memory-mapped floating point (was Re: ZISC computers)
Message-ID: <9139@winchester.mips.COM>
Date: 2 Dec 88 05:30:54 GMT
References: <22115@sgi.SGI.COM> <278@antares.UUCP> <2958@ima.ima.isc.com> <8939@winchester.mips.COM> <1044@microsoft.UUCP> <9061@winchester.mips.COM> <23656@amdcad.AMD.COM>
Reply-To: mash@mips.COM (John Mashey)
Organization: MIPS Computer Systems, Sunnyvale, CA
Lines: 78

In article <23656@amdcad.AMD.COM> tim@crackle.amd.com (Tim Olson) writes:
>In article <9061@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>| Here's an example.  Maybe this is close in speed, or maybe not, or maybe
>| I don't understand the 29K FPU interface well enough.  Here's a small
>| example (always be wary of small examples; I don't have time right now
>| for anything more substantive):
>| main() {
>| 	double x, y, z;
>| 	x = y + z;
>| }

>No, local doubles are kept in the Am29000 register file, so no
>loads/stores will occur to the memory stack.

Sorry, I was using a simple example, without optimization, to create
code typical of references to variables in memory.  The optimizer
of course erases this code.

>The Am29000 has two methods of generating floating-point code, either
>emitting floating point instructions (which trap in the current Am29000
>implementation) or emitting inline '027 code directly.  The fp
>instruction code for:

>double
>g(double x, double y)
>{
>	return x+y;
>}
>
>(essentially the same as your test case, but I had to revise it to make
>it emit *any* code) is:
>
>	jmpi	lr0
>	dadd	gr96,lr4,lr2

As I understand it, this is a trap to a routine that ends up issuing
something like the code in your next example....
The R3000 code for this routine is of course:
  [y.c:   3] 0x0:	03e00008	jr	ra
  [y.c:   3] 0x4:	462e6000	add.d	f0,f12,f14
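
For anyone who wants to regenerate the memory-reference flavor of the
earlier x = y + z example with the optimizer turned on, here is a minimal
sketch.  It assumes a standard C compiler; the function names are
hypothetical, and declaring the variables volatile at file scope is my
assumption about one way to keep the loads and the store from being
erased, not what either compiler was actually fed:

```c
/* Hypothetical test case: volatile file-scope doubles force the compiler
 * to emit real loads of y and z and a real store to x, even when
 * optimizing. */
volatile double x, y, z;

/* Register-to-register case, same shape as g() above: the operands
 * arrive as arguments and the sum is returned directly. */
double add_in_regs(double a, double b)
{
    return a + b;
}

/* Memory-to-memory case: two loads, one add, one store. */
void add_in_memory(void)
{
    x = y + z;
}
```

Compiling both routines and comparing the generated code should show the
difference between the two sets of assumptions being argued about here.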


>The in-line '027 code for this looks like:
>
>	const	gr96,1	; (0x1)
>	consth	gr96,65536	; (0x10000)
>	store	1,38,gr96,gr96
>	store	1,32,lr4,lr5
>	store	1,97,lr2,lr3
>	load	1,1,gr97,gr96
>	load	1,0,gr96,gr96

>Nope, it is:
>
>	DP +	DP *	SP +	SP *
>  R3000 9	12	6	8
>  29K	12	12	9	9
>
>This again assumes that the '027 instruction is not reused (which it
>would be in "real" code).  If it were reused, the counts would drop by 2
>cycles.
Given the same assumptions (variables already in registers, and we have
enough registers to make that happen occasionally too), and giving the
29K the benefit of the doubt on the constants, we end up with:
	DP +	DP *	SP +	SP *
  R3000 2	5	2	4
  29K	10	10	7	7
This, of course, is one of the most favorable-to-MIPS comparisons,
which is why I didn't use it in the first case.  Needless to say,
29K code generators will want to allocate FP variables to the FP unit
whenever possible, to avoid this kind of hit.
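
To make the size of the hit concrete, here is a small sketch that turns
the two tables in this article into 29K/R3000 cycle ratios.  The cycle
counts are copied from the tables; the array and function names are made
up for illustration:

```c
#include <assert.h>
#include <stdio.h>

enum { DP_ADD, DP_MUL, SP_ADD, SP_MUL, NOPS };

static const char *const op_name[NOPS] = { "DP +", "DP *", "SP +", "SP *" };

/* Cycle counts from the table quoted from Tim's article. */
const int r3000_quoted[NOPS]  = {  9, 12, 6, 8 };
const int am29k_quoted[NOPS]  = { 12, 12, 9, 9 };

/* Cycle counts from the variables-already-in-registers table. */
const int r3000_in_regs[NOPS] = {  2,  5, 2, 4 };
const int am29k_in_regs[NOPS] = { 10, 10, 7, 7 };

/* Ratio of 29K cycles to R3000 cycles for one operation. */
double cycle_ratio(int am29k_cycles, int r3000_cycles)
{
    assert(r3000_cycles > 0);
    return (double)am29k_cycles / (double)r3000_cycles;
}

/* Print one line per operation: both counts and their ratio. */
void print_ratios(const char *label, const int *r3000, const int *am29k)
{
    int i;

    printf("%s\n", label);
    for (i = 0; i < NOPS; i++)
        printf("  %s  R3000 %2d  29K %2d  ratio %.2f\n",
               op_name[i], r3000[i], am29k[i],
               cycle_ratio(am29k[i], r3000[i]));
}
```

Calling print_ratios("in registers", r3000_in_regs, am29k_in_regs) gives
a DP add ratio of 10/2 = 5, against 12/9 for the same operation in the
quoted table.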

>| Now: THE REAL PROOF IS IN RUNNING REAL PROGRAMS, THRU COMPILERS.
>Agreed.
-- 
-john mashey	DISCLAIMER: 
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086