Path: utzoo!utgpu!attcan!uunet!cbmvax!daveh
From: daveh@cbmvax.UUCP (Dave Haynie)
Newsgroups: comp.arch
Subject: Re: Sw vs. Hw BitBlit.
Message-ID: <4461@cbmvax.UUCP>
Date: 10 Aug 88 15:47:53 GMT
References: <61783@sun.uucp>
Organization: Commodore Technology, West Chester, PA
Lines: 62

in article <61783@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) says:
> Keywords: BitBlit.

> 	  The second common case of "bitblt" is scrolling a rectangular region
> 	of a bitmap, usually the display.  Since the word boundaries in the
> 	scan lines of a bitmap are at the same place in each line, the speed of
> 	scrolling depends primarily on the speed of the MC68000 instruction

> 		mov.l	%a0@+, %a1@+

> 	or, in C,

> 		register long *p, *q;
> 		*p++ = *q++;

> 	For typical rectangles, the edges, which must be handled with more
> 	complicated code, do not dominate the performance.  There is nothing
> 	hardware can do to accelerate this loop except provide faster memory
> 	access.  If the display were accessed through a narrower or clumsier
> 	interface, it would take longer to move the data.

On an MC68000, not so.  Given equal memory access speed, something like
a DMA controller can be several times faster than the 68000.  All it needs to
do is fetch data from location A, dump it to location B, and increment some
internal counters.  While it looks like that's what the 68000 is doing, the
68000 is also fetching the move instruction and a branch instruction of some
kind.  So for every word of data moved, you're probably fetching roughly as
many instruction words as overhead.  Certainly the 68010 in some cases and the
68020 in most cases solve this problem via caching, but I can't yet buy either
of these parts for the $2.50 or so I pay for a 68000.
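
To put rough numbers on the instruction-fetch overhead, here is a minimal
sketch in C that just counts 16-bit bus accesses for a longword copy.  It is a
model under stated assumptions, not a cycle-accurate timing: every bus access
is taken to cost the same, the loop branch is assumed to be a two-word
instruction, and any extra internal execution time in the CPU is ignored.  The
function names and the `unroll` parameter are made up for illustration.

```c
/* Bus-access counts for copying 32-bit longwords over a 16-bit bus.
 * Assumption: every access costs the same; CPU-internal time is ignored. */
enum {
    DATA_ACCESSES_PER_LONG = 4, /* 2 word reads + 2 word writes */
    MOVE_OPCODE_WORDS      = 1, /* the move.l opcode is a single word */
    BRANCH_WORDS           = 2  /* assumed: branch opcode + 16-bit displacement */
};

/* A DMA-style mover touches the bus only for the data itself. */
unsigned long dma_accesses(unsigned long longwords)
{
    return longwords * DATA_ACCESSES_PER_LONG;
}

/* A CPU copy loop also fetches its own instructions.
 * `unroll` is the number of move instructions per loop branch. */
unsigned long cpu_loop_accesses(unsigned long longwords, unsigned unroll)
{
    unsigned long iterations;

    if (unroll == 0)
        unroll = 1; /* treat 0 as "not unrolled" */

    /* round up: a partial final pass still pays for one branch */
    iterations = (longwords + unroll - 1) / unroll;

    return longwords * (DATA_ACCESSES_PER_LONG + MOVE_OPCODE_WORDS)
         + iterations * BRANCH_WORDS;
}
```

In this model, unrolling the loop shaves off branch fetches but never gets rid
of the one opcode fetch per move, which is exactly the overhead a dedicated
mover doesn't have.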

> If a BitBlt chip is reasonably cheap, and can do the whole job, it may be worth
> it.  Note that in the cases shown, you got at most a 3.5x speedup (scroll
> screen horizontally).  For vertical scrolling, you got only 1.18x; for randomly
> drawing the letter 'a', you got only 1.23x; and for texturing a random 40x40
> square, you got 1.95x.  How cheap does it have to be for that to be worth it?
> (The "do the whole job" comes from comments made in the paper that a
> half-hearted hardware assist can get in the way, rather than help.)

You also have to consider a few more things.  For instance, if you have a
blitter that operates on video memory and lets the CPU work in non-video memory
in parallel (like on the Amiga, and apparently on the Sun mentioned), then you
have a big advantage: in terms of real CPU usage, any blit may end up costing
nothing but the setup time.  That's still no good reason to use the blitter for
small, single-character blits, but it can really be a justification for larger
things.  And given that a blit chip can often be a much simpler design than the
host CPU, there's a real good chance it WILL be able to have a faster path to
memory.
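
The setup-time tradeoff can be sketched as a tiny cost model in C.  This is an
illustration only: the `blit_costs` struct and its fields are hypothetical,
times are in arbitrary units (say, clock cycles), and it assumes the CPU has
useful work to do that doesn't contend with the blitter for memory.

```c
/* CPU time charged for a blit, in arbitrary units (hypothetical model). */
typedef struct {
    unsigned long setup;        /* time to load the blitter's registers */
    unsigned long cpu_per_word; /* time for the CPU to move one word itself */
} blit_costs;

/* Software blit: the CPU pays for every word it moves. */
unsigned long cpu_time_software(const blit_costs *c, unsigned long words)
{
    return c->cpu_per_word * words;
}

/* Parallel blitter: the CPU pays only the setup, whatever the size. */
unsigned long cpu_time_blitter(const blit_costs *c, unsigned long words)
{
    (void)words; /* size doesn't matter to the CPU in this model */
    return c->setup;
}

/* Smallest blit, in words, for which handing off costs the CPU less.
 * Returns 0 if software copying is modeled as free (no break-even). */
unsigned long break_even_words(const blit_costs *c)
{
    if (c->cpu_per_word == 0)
        return 0;
    return c->setup / c->cpu_per_word + 1;
}
```

Below the break-even size the setup costs more than just doing the copy, which
is the single-character case; above it, the CPU's cost stays flat while the
software cost keeps growing with the size of the blit.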

That depends, of course, on the chip and the base CPU in your system.  If the
combination of a blitter chip and a 68000 ran me more than a 68020, that had
better be one heck of a blitter, or I'm wasting my $$$ -- the 68020, being more
general purpose than a blitter, can give you better overall system performance.
But if I can get my blitter and 68000 CPU and maybe a bunch of other functions
for less than the cost of a 68010, I'm probably winning (if I'm not concerned
about the 68010's virtual memory facilities, which a Sun of course is).

-- 
Dave Haynie  "The 32 Bit Guy"     Commodore-Amiga  "The Crew That Never Rests"
   {ihnp4|uunet|rutgers}!cbmvax!daveh      PLINK: D-DAVE H     BIX: hazy
		"I can't relax, 'cause I'm a Boinger!"