Path: utzoo!utgpu!water!watmath!clyde!bellcore!rutgers!cbmvax!uunet!mcvax!philmds!leo
From: leo@philmds.UUCP (Leo de Wit)
Newsgroups: comp.sys.atari.st
Subject: Re: null fill eliminated
Keywords: addition
Message-ID: <491@philmds.UUCP>
Date: 3 Jun 88 09:54:28 GMT
References: <490@philmds.UUCP>
Reply-To: leo@philmds.UUCP (L.J.M. de Wit)
Organization: Philips I&E DTS Eindhoven
Lines: 76

Here are some small corrections for the fast loader I put on the net this week.
1) There is a header for the module now. It says:

* Even when loading from a RAM disk or hard disk, the ROM loader null-fills
* all uninitialized data, heap and stack (often the major part of your RAM).
* This null filler makes loading programs faster: its null filling is 7 times
* as fast as the ROM's because it uses the quick movem.l instruction, and it
* clears only the BSS space.
* At least the fillhigh and filllow addresses have to be adapted to suit your
* ROM version.

2) The bigint definition should read:

bigint		equ $7ffffff0
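In the routine, bigint serves as a marker. fastload acts only when the interrupted PC lies between filllow and fillhigh and the saved D5 is still below bigint. It then writes bigint into the saved D5, so later calls see the work as already done. The C below is a minimal model of those entry checks, not the routine itself. It assumes the PC and saved D5 have already been fetched from the stack (74(sp) and 32(sp) in the listing). FILLLOW and FILLHIGH are hypothetical placeholder values, since the real ones depend on your ROM, and fastload_guard is an illustrative name. The listing tests the upper bound unsigned (bhi) and the lower bound signed (blt). For addresses below $80000000 that makes no difference, so the model compares both unsigned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Marker value from the post: a large positive 32-bit number. */
#define BIGINT ((int32_t)0x7ffffff0)

/* Hypothetical placeholders: the real filllow/fillhigh addresses
   depend on the ROM version and are not given in the post. */
#define FILLLOW  0x00fc1000u
#define FILLHIGH 0x00fc1040u

/* Returns true if the fill should be done now. On success the saved
   D5 is overwritten with the marker ("maximize D5 on stack"), so a
   second call for the same load returns false ("already filled"). */
bool fastload_guard(uint32_t pc, int32_t *saved_d5)
{
    if (pc > FILLHIGH || pc < FILLLOW)
        return false;          /* PC not inside the ROM fill loop */
    if (*saved_d5 >= BIGINT)
        return false;          /* already filled: signed test, like bge */
    *saved_d5 = BIGINT;
    return true;
}
```

Calling fastload_guard twice with the same saved D5 gives true and then false, which is what keeps the VBL routine from clearing the same BSS repeatedly.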

3) I have abandoned the idea of doing no null filling at all. Some programs
generated bus errors when started with this VBL routine active, so I looked
things up in K & R. Paragraph 4.9 (Initialization) says:
...In the absence of explicit initialization, external and static variables 
are guaranteed to be initialized to zero; ...
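A small C illustration of that guarantee, assuming an ordinary hosted compiler: the objects below have no initializer, so a typical toolchain places them in the BSS, and the program is entitled to find them zero. A program that, for example, tests a static pointer against NULL before using it can go wrong if the loader leaves garbage there instead. All names here are made up for the example.

```c
#include <stddef.h>

/* Static-duration objects without initializers; a typical toolchain
   puts these in the BSS, which the loader is expected to clear. */
int counter;                /* external variable */
static char buffer[256];    /* static array */
static int *ptr;            /* static pointer */

/* Returns 1 if every object above reads as zero or NULL, as the
   language guarantees, and 0 otherwise. */
int bss_is_zeroed(void)
{
    if (counter != 0 || ptr != NULL)
        return 0;
    for (size_t i = 0; i < sizeof buffer; i++)
        if (buffer[i] != 0)
            return 0;
    return 1;
}
```

On a system whose loader clears the BSS, bss_is_zeroed() returns 1.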
So the routine now clears the BSS space, and the programs that generated
errors now work OK. The clearing is done in chunks of 128 bytes using movem.l
instructions; that seems to be the fastest way, especially if you move many
registers at a time. The 'modulo 128' remainder is cleared first, byte by
byte, at the top of the BSS. Here it is (I have left the initialization
routine out):

fastload
	movea.l	74(sp),a0	* PC
	cmpa.l	#fillhigh,a0
	bhi.s	fastdone
	cmpa.l	#filllow,a0
	blt.s	fastdone
	lea.l	32(sp),a0	* Address D5 on stack
	cmp.l	#bigint,(a0)
	bge.s	fastdone	* Already filled
	move.l	#bigint,(a0)	* Maximize D5 on stack
	move.l	68(sp),a6	* Value of A6 on stack to A6
	move.l	-4(a6),a4	* Start of block to fill
	move.l	-58(a6),d0	* # bytes to fill: BSS size
	move.l	d0,d1
	and.w	#$7f,d1		* d1 = d0 & 0x7f
	moveq.l	#0,d2
	lea.l	(a4,d0.l),a5	* End (one past)
	bra.s	fastl1
fastl0
	move.b	d2,-(a5)	* Clear top d1 bytes
fastl1
	dbra	d1,fastl0
	moveq.l	#0,d0		* Nullify d0-d7/a0-a3
	move.l	d0,d1
	move.l	d0,d2
	move.l	d0,d3
	move.l	d0,d4
	move.l	d0,d5
	move.l	d0,d6
	move.l	d0,d7
	move.l	d0,a0
	move.l	d0,a1
	move.l	d0,a2
	move.l	d0,a3
	bra.s	fastl3		* a5 - a4 is now a multiple of 128
fastl2
	movem.l	d0-d7/a0-a3,-(a5)  * Clear 4 * (12 + 12 + 8) = 128 bytes / turn
	movem.l	d0-d7/a0-a3,-(a5)
	movem.l	d0-d7,-(a5)
fastl3
	cmpa.l	a4,a5
	bgt.s	fastl2		* Until start address A4 reached
fastdone
	rts

	section s.data

noque	dc.b	'No vbl entry available!',13,10,0

	end
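The clearing order of fastload can be modeled in portable C. This is a sketch for checking the logic, not a replacement for the assembler. memset stands in for the three movem.l stores (48 + 48 + 32 bytes), and clear_bss is a hypothetical name. As in the listing, the size % 128 bytes at the top are cleared first. What remains is then a whole number of 128-byte chunks, which are cleared from the top down.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 128   /* bytes cleared per pass, as in the post */

/* Clears size bytes starting at start, in the same order as fastload. */
void clear_bss(unsigned char *start, size_t size)
{
    unsigned char *end = start + size;   /* one past the end, like a5 */
    size_t rem = size & (CHUNK - 1);     /* d1 = d0 & 0x7f */

    while (rem-- > 0)
        *--end = 0;                      /* move.b d2,-(a5) */

    /* end - start is now a multiple of 128 */
    while (end > start) {
        end -= CHUNK;
        memset(end, 0, CHUNK);           /* the three movem.l stores */
    }
}
```

Doing the odd remainder first keeps the inner loop free of any end-of-block test other than the single address comparison, which is the same reason the assembler version does it.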