Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!utcsri!utegc!utai!garfield!dalcs!mnetor!uunet!seismo!gatech!amdcad!tim
From: tim@amdcad.UUCP
Newsgroups: comp.lang.c,comp.lang.forth,comp.lang.misc
Subject: Re: The winner!
Message-ID: <17623@amdcad.AMD.COM>
Date: Tue, 21-Jul-87 12:34:01 EDT
Article-I.D.: amdcad.17623
Posted: Tue Jul 21 12:34:01 1987
Date-Received: Thu, 23-Jul-87 01:35:09 EDT
References: <398@sugar.UUCP> <8326@utzoo.UUCP> <1946@aw.sei.cmu.edu>
Reply-To: tim@amdcad.UUCP (Tim Olson)
Organization: Advanced Micro Devices, Inc., Sunnyvale, Ca.
Lines: 43
Xref: utgpu comp.lang.c:3042 comp.lang.forth:97 comp.lang.misc:546

In article <1946@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (PUT YOUR NAME HERE) writes:
+-----
| On our M/500, the time taken is about 250 ns for each sequence.
| 
| Can anyone beat THAT?
+-----
Now go to a faster cycle time, like the MIPS M/800 (62ns) or the Am29000
(40ns, simulated):

NEXT:
	jmpi	ip		; indirect jump through ip
	add	ip,ip,4		; executes in the branch delay slot

DOCOL:
	call	ret_addr, 
	store	ret_addr, ip


That's now 80ns for each sequence on the Am29000 (two instructions at
40ns each).
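
The 80ns figure is just instruction count times cycle time.  A minimal C
sketch of that arithmetic, assuming every instruction issues in a single
cycle with no stalls (the function name sequence_ns is made up for
illustration):

```c
/* Time for one threaded-code sequence, in nanoseconds.
 * Assumption: each instruction takes exactly one cycle and
 * nothing stalls, which is a simplification. */
unsigned sequence_ns(unsigned instructions, unsigned cycle_ns)
{
    return instructions * cycle_ns;
}
```

Under that assumption sequence_ns(2, 40) gives 80 for the Am29000 and
sequence_ns(2, 62) gives 124 for the M/800.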

You can also use the large register file of the Am29000 to cache the
FORTH parameter stack on-chip, so primitives like "+" can be written
like:

	plus:
		add	lr1, lr1, lr0	; perform add : nos <- nos + tos
		add	gr1, gr1, 4	; pop stack

instead of:

	plus:
		load	r0, (sp)
		load	r1, 4(sp)
		add	r1, r1, r0
		store	r1, 4(sp)
		add	sp, sp, 4

which has to go to memory three times (two loads and one store).
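
To make the memory-traffic difference concrete, here is a rough C model
of the two versions of "+".  It is only a sketch: the struct and
function names are hypothetical, the arrays stand in for data memory and
for the on-chip register file, and one array slot stands in for one
4-byte cell.

```c
#include <stddef.h>

enum { STACK_CELLS = 16 };

/* Parameter stack kept in memory, with a counter for memory traffic. */
struct machine {
    int mem[STACK_CELLS];
    size_t sp;                    /* index of top of stack; pop = sp + 1 */
    unsigned long mem_accesses;
};

static int mem_load(struct machine *m, size_t i)
{
    m->mem_accesses++;
    return m->mem[i];
}

static void mem_store(struct machine *m, size_t i, int v)
{
    m->mem_accesses++;
    m->mem[i] = v;
}

/* "+" with the stack in memory, following the five-instruction version. */
void plus_memory(struct machine *m)
{
    int tos = mem_load(m, m->sp);      /* load  r0, (sp)   */
    int nos = mem_load(m, m->sp + 1);  /* load  r1, 4(sp)  */
    nos = nos + tos;                   /* add   r1, r1, r0 */
    mem_store(m, m->sp + 1, nos);      /* store r1, 4(sp)  */
    m->sp += 1;                        /* add   sp, sp, 4  */
}

/* Parameter stack cached in registers; base plays the role of gr1,
 * so lr0 is lr[base] and lr1 is lr[base + 1]. */
struct regfile {
    int lr[STACK_CELLS];
    size_t base;
};

/* "+" with the stack in registers: no memory access at all. */
void plus_registers(struct regfile *r)
{
    r->lr[r->base + 1] += r->lr[r->base];  /* add lr1, lr1, lr0 */
    r->base += 1;                          /* add gr1, gr1, 4   */
}
```

In this model plus_memory bumps mem_accesses by three per call, while
plus_registers never touches memory.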


	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)