Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!utcsri!utegc!utai!garfield!dalcs!mnetor!uunet!seismo!gatech!amdcad!tim
From: tim@amdcad.UUCP
Newsgroups: comp.lang.c,comp.lang.forth,comp.lang.misc
Subject: Re: The winner!
Message-ID: <17623@amdcad.AMD.COM>
Date: Tue, 21-Jul-87 12:34:01 EDT
Article-I.D.: amdcad.17623
Posted: Tue Jul 21 12:34:01 1987
Date-Received: Thu, 23-Jul-87 01:35:09 EDT
References: <398@sugar.UUCP> <8326@utzoo.UUCP> <1946@aw.sei.cmu.edu>
Reply-To: tim@amdcad.UUCP (Tim Olson)
Organization: Advanced Micro Devices, Inc., Sunnyvale, Ca.
Lines: 43
Xref: utgpu comp.lang.c:3042 comp.lang.forth:97 comp.lang.misc:546

In article <1946@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (PUT YOUR NAME HERE) writes:
+-----
| On our M/500, the time taken is about 250 ns for each sequence.
|
| Can anyone beat THAT?
+-----

Now go to a faster cycle time, like the MIPS M/800 (62ns) or the Am29000
(40ns, simulated):

    NEXT:   jmpi    ip
            add     ip,ip,4

    DOCOL:  call ret_addr,store ret_addr, ip

That's now 80ns for each sequence.

You can also use the large register file of the Am29000 to cache the
FORTH parameter stack on-chip, so primitives like "+" can be written as:

    plus:   add     lr1, lr1, lr0   ; perform add : nos <- nos + tos
            add     gr1, gr1, 4     ; pop stack

instead of:

    plus:   load    r0, (sp)
            load    r1, 4(sp)
            add     r1, r1, r0
            store   r1, 4(sp)
            add     sp, sp, 4

which has to go to memory 3 times.

    -- Tim Olson
    Advanced Micro Devices
    (tim@amdcad.amd.com)
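
For readers following along in comp.lang.c, here is a rough C model of the two versions of "+" above. It is only a sketch under stated assumptions, not Am29000 code: the names MemStack, RegStack, mem_accesses, mem_push and reg_push are all made up for illustration, the `base` field merely stands in for the role gr1 plays in the register version, and the counter is a crude proxy for "trips to memory" rather than a timing model.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_CELLS 16

typedef uint32_t cell;          /* unsigned so "+" wraps instead of overflowing */

/* Hypothetical counter: bumped on every simulated memory reference. */
static unsigned mem_accesses;

/* Parameter stack kept in memory; grows downward, sp indexes the top cell. */
typedef struct {
    cell cells[STACK_CELLS];
    int  sp;                    /* STACK_CELLS means "empty" */
} MemStack;

static cell mem_load(const MemStack *s, int i)     { mem_accesses++; return s->cells[i]; }
static void mem_store(MemStack *s, int i, cell v)  { mem_accesses++; s->cells[i] = v; }

void mem_push(MemStack *s, cell v)
{
    assert(s->sp > 0);
    s->sp -= 1;
    mem_store(s, s->sp, v);
}

/* "+" on the memory stack: two loads, one store, then bump sp. */
void plus_mem(MemStack *s)
{
    assert(s->sp + 1 < STACK_CELLS);        /* need at least two items */
    cell tos = mem_load(s, s->sp);          /* load  r0, (sp)   */
    cell nos = mem_load(s, s->sp + 1);      /* load  r1, 4(sp)  */
    mem_store(s, s->sp + 1, nos + tos);     /* add, then store r1, 4(sp) */
    s->sp += 1;                             /* add   sp, sp, 4  */
}

/* Parameter stack cached in a register window; regs[base] plays lr0. */
typedef struct {
    cell regs[STACK_CELLS];     /* stands in for on-chip local registers */
    int  base;                  /* stands in for the window pointer      */
} RegStack;

void reg_push(RegStack *r, cell v)
{
    assert(r->base > 0);
    r->base -= 1;
    r->regs[r->base] = v;       /* register write, no memory reference */
}

/* "+" on the cached stack: one add, then slide the window by one slot. */
void plus_reg(RegStack *r)
{
    assert(r->base + 1 < STACK_CELLS);      /* need at least two items */
    r->regs[r->base + 1] += r->regs[r->base];   /* nos <- nos + tos */
    r->base += 1;                               /* pop stack        */
}
```

Pushing two values onto each stack, zeroing mem_accesses, and then calling plus_mem or plus_reg shows the difference directly: the memory version bumps the counter three times, the cached version not at all. The sketch ignores what happens when the cached stack outgrows the register file, which a real implementation would have to spill and refill.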