Path: utzoo!attcan!utgpu!watmath!att!dptg!rutgers!usc!ucsd!ames!amdahl!pyramid!ncc!alberta!idacom!andrew
From: andrew@idacom.UUCP (Andrew Scott)
Newsgroups: comp.lang.forth
Subject: Re: Forth Compilation (again) ?
Message-ID: <712@idacom.UUCP>
Date: 8 Aug 89 19:53:46 GMT
References: <114600003@uxa.cso.uiuc.edu>
Organization: IDACOM Electronics Ltd., Edmonton, Alta.
Lines: 61

In article <114600003@uxa.cso.uiuc.edu>, ews00461@uxa.cso.uiuc.edu writes:
> 
> I've seen Forth systems that generate object code.  How is usually done ?
> Inline code ?  Lots of jsr (subroutine calls) ?  Are they simply using
> the dictionary as a "symbol table" ?

Yes, yes, and sometimes.  Or, to be a little less glib, there are many ways
of implementing subroutine-threaded systems and you've touched on some of the
techniques used.

First of all, a subroutine-threaded Forth uses subroutine calls as the
threading mechanism.  Depending on the underlying architecture, this can be
faster than any other kind of threading because the inner interpreter has been
eliminated.  Literals are handled by inline code that pushes the value to the
stack.  Conditional code uses the processor's native branch instructions
instead of manipulating the inner interpreter's instruction pointer.
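
To give the flavour of the literal handling, here is a minimal sketch (not
production code) of a LITERAL that lays down a MOVE.L #n,-(SP) instruction
inline.  It assumes a 68000 Forth with 32-bit cells, the data stack pointer
in A7, and a word  W,  that compiles a 16-bit value into the dictionary:

	HEX
	: LITERAL  ( n -- )     \ compile  MOVE.L #n,-(SP)  inline
	   2F3C W,  ,  ; IMMEDIATE
	DECIMAL

The pushed literal costs six bytes of code, but there is no LIT primitive to
call and nothing for an inner interpreter to fetch.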

The subroutine calls need not take up more space, either.  For a 68000 system,
you can use the two-byte or four-byte forms of BSR when compiling calls to
words within 128 bytes or 32K of the current location.  You only need to use
a JSR instruction when you call a word over 32K away.  If the code is
sufficiently modular, it shouldn't happen that often.
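
The choice is easy to make at compile time.  Here is a sketch (again, not
production code) of a  ,CALL  that compiles the shortest form; it assumes
32-bit cells, a  W,  that compiles a 16-bit value, and a  WITHIN
( n lo hi -- flag ) :

	HEX
	: ,CALL  ( cfa -- )                \ compile the shortest call to cfa
	   DUP HERE 2 + -    ( cfa disp )  \ BSR displacements are from PC+2
	   DUP -80 80 WITHIN IF            \ fits in 8 bits:  BSR.S, 2 bytes
	      SWAP DROP  61 C,  FF AND C,
	   ELSE  DUP -8000 8000 WITHIN IF  \ fits in 16 bits: BSR.W, 4 bytes
	      SWAP DROP  6100 W,  W,
	   ELSE                            \ over 32K away:   JSR abs.l, 6 bytes
	      DROP  4EB9 W,  ,
	   THEN THEN ;
	DECIMAL

Since calls only ever go to words already in the dictionary, the displacement
is always negative and the zero-displacement quirk of BSR.S never comes up.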

Inline code is usually used as an optimization in these systems.  Words like
DUP, DROP, +, and EXIT are very short and do not increase the size of the
code when inlined.
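
In a subroutine-threaded system the compilation address points straight at
machine code, so inlining is just a block copy.  A sketch, assuming a
hypothetical  CODE-SIZE ( cfa -- n )  that returns the length in bytes of a
primitive's code, including its final two-byte RTS:

	: ,INLINE  ( cfa -- )    \ copy a primitive's code inline
	   DUP CODE-SIZE 2 -     ( cfa n )      \ leave off the trailing RTS
	   HERE OVER ALLOT       ( cfa n dest )
	   SWAP CMOVE ;

This is only legitimate for straight-line code with no PC-relative references,
which words like DUP, DROP and + satisfy.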

I recently wrote a subroutine-threaded Forth that added a new class of
optimizations to those above.  I made the compiler do more than " CFA , " when
compiling a word.  Instead, sequences of words are found that can be collapsed
into short sequences of inline code.  For example:

	Forth			68000 code
	-----			----------
	DUP >R			MOVE.L  (SP), -(RP)

	LIT +			ADDI.L  #N, (SP)     (ADDQ used when possible)

	= 0BRANCH		CMPM.L  (SP)+, (SP)+
				BNE.S   ??

The latter example illustrates how conditionals can be made even faster.  IF,
UNTIL, WHILE, etc. all compile the primitive 0BRANCH (or ?BRANCH in other
Forths).  A relational operator such as = can be "folded into" the branch,
eliminating the need to push a true/false value onto the stack only to have
0BRANCH pop it off and test it.

Using a list of about 75 sequence "rules", the compiler produced code for a
particular application that ran about three times as fast as indirect-threaded
Forth and was 12% smaller.

The optimizer was not that large, either.  It was written in about 250 lines
of assembler (for compilation speed), and the rules file was about 200 lines
of Forth.  No "inline" flags were required.  The only hook was to replace
the  CFA ,  in the guts of INTERPRET with a new word, which I called (COMPILE).
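
To give the shape of it, here is a stripped-down (COMPILE) with a single rule
wired in (the  DUP >R  collapse from the table above) rather than the rule-
list matching of the real thing.  It leans on the same assumptions as the
earlier sketches:  W,  compiles a 16-bit value,  ,CALL  is the word sketched
above, ALLOT accepts a negative count, and the return stack pointer is A6:

	HEX
	VARIABLE LAST-CFA   0 LAST-CFA !   \ cfa of the last word compiled
	VARIABLE LAST-HERE                 \ where its call was laid down
	: (COMPILE)  ( cfa -- )
	   DUP ['] >R =  LAST-CFA @ ['] DUP =  AND IF
	      DROP                         \ discard >R's cfa
	      LAST-HERE @ HERE - ALLOT     \ take back the call to DUP
	      2D17 W,                      \ MOVE.L (SP),-(RP)  does DUP >R
	      0 LAST-CFA !                 \ the pair has been consumed
	   ELSE
	      DUP LAST-CFA !  HERE LAST-HERE !  ,CALL
	   THEN ;
	DECIMAL

A fuller version keeps a table of rules instead of testing one pair, and also
clears LAST-CFA at definition boundaries and branch targets so that no rule
ever matches across them.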


(BTW: I'm glad to see that activity on comp.lang.forth has picked up recently!)
-- 
Andrew Scott			andrew@idacom
			- or -	{att, watmath, ubc-cs}!alberta!idacom!andrew