Path: utzoo!attcan!uunet!cs.utexas.edu!usc!apple!oliveb!amiga!cbmvax!daveh
From: daveh@cbmvax.UUCP (Dave Haynie)
Newsgroups: comp.sys.amiga.tech
Subject: Re: DMA or polling (was Re: GVP controller)
Message-ID: <7616@cbmvax.UUCP>
Date: 10 Aug 89 18:10:43 GMT
References: <8908092130.AA23369@jade.berkeley.edu>
Organization: Commodore Technology, West Chester, PA
Lines: 88

in article <8908092130.AA23369@jade.berkeley.edu>, 451061@UOTTAWA.BITNET (Valentin Pepelea) says:

> Steve -Raz- Berry  writes in <120232@sun.Eng.Sun.COM>

>> In article <8908072207.AA14796@jade.berkeley.edu> 451061@UOTTAWA.BITNET
>>  (Valentin Pepelea) writes:

>> >The net result is that the processor therefore spends less time on the data
>> >transfer and is available more often for other concurrent tasks.

>>       Yikes! I'm sorry, but I TOTALLY disagree with you on this one.
>> Logicly, if you look at the time to complete a given task, based only
>> on the number of bus cycles it takes to transfer a given block of data,
>> DMA will always win. Period. 

> Clearly you don't understand, or perhaps I did not explain well. The 
> bottleneck here is the speed at which the hard disk turns, and therefore 
> the rate at which data is available to the DMA channel. 

>> Sorry, this is one EE type that just won't believe it. 

> Obviously some EE types are better than others. 

Well, you all know me as an EE type.  

I think there's confusion here because the problem hasn't been properly 
decomposed.  There are two transfers going on in most hard drive systems --
from the drive to the controller, and from the controller to system memory.
It's always a losing proposition to transfer directly from the data as
read from the drive to the system memory, regardless of whether you go via
a CPU read method or a DMA method.  Fortunately, it's almost impossible as
well, unless you're dealing with direct manipulation of an ST-506 interface.

Assuming a SCSI device, you really don't have any idea how the data is 
handled between the physical hard drive and the SCSI channel.  Still, the best
a direct asynchronous SCSI read or DMA can do is significantly less than any
buffering scheme you might come up with.  The Apple Macintosh is a good example
of what happens when you don't buffer up your SCSI, if for no other reason than
to convert the SCSI byte stream to a word stream before travelling between the
controller and the system memory.  So let's agree not to take any simple,
stupid approaches -- all the mentioned controllers, GVP, Commodore, and
Microbotics, take a much more intelligent approach.

GVP is the simplest in concept.  It sucks up a whole block into local RAM,
then transfers this at memory-to-memory speeds across the bus, from its
local RAM to its final destination.  On a 68000, even with some cleverly
designed copy loops like CopyMemQuick() or similar, you'll still have over
two bus crossings per word transferred -- one from the local RAM to the 
68000, one from the 68000 to the system RAM, and occasional stops to fetch
opcodes.  With a 68010 or better, you can basically ignore the opcode fetch
time, but you still have the two complete bus crossings per word.  With a
68020 or 68030 and some 32 bit memory, you can reduce this to two slow and
one fast bus crossings per longword, which comes pretty close to one bus
crossing per word, but not quite.
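
The bus-crossing counts above can be tallied in a rough model.  This is only
a sketch: the function name, the opcode_overhead fraction, and the fast_cost
weight are assumptions invented for illustration, not measured figures.

```python
def cpu_copy_crossings_per_word(cpu, opcode_overhead=0.25, fast_cost=0.5):
    """Approximate bus crossings per 16-bit word for a CPU-driven copy
    from controller-local RAM to system RAM.

    opcode_overhead: ASSUMED extra crossings per word spent fetching
                     opcodes on a plain 68000 (illustrative value only).
    fast_cost:       ASSUMED cost of one 32-bit memory crossing relative
                     to one slow 16-bit crossing (illustrative value only).
    """
    if cpu == "68000":
        # One read from local RAM, one write to system RAM,
        # plus occasional opcode fetches.
        return 2.0 + opcode_overhead
    if cpu == "68010":
        # Opcode fetch time can basically be ignored,
        # but both crossings per word remain.
        return 2.0
    if cpu in ("68020", "68030"):
        # Per longword: two slow 16-bit reads and one fast 32-bit write.
        # A longword is two words, so divide by 2.
        return (2.0 + fast_cost) / 2.0
    raise ValueError(f"unknown CPU: {cpu}")


if __name__ == "__main__":
    for cpu in ("68000", "68010", "68020", "68030"):
        print(cpu, cpu_copy_crossings_per_word(cpu))
```

With these assumed weights the 68020/68030 case lands between one and two
crossings per word, which is the "pretty close to one, but not quite" result.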

The Commodore controllers are all DMA driven and backed by a FIFO.  The 2090
will read from the SCSI controller into its FIFO, and when the FIFO starts to
fill, it'll take the bus, dump 32 words across at full speed, and then give
back the bus.  This results in one bus crossing per word, plus a small bus
arbitration time.  Most other DMA driven controllers work very similarly.
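
As a rough sketch of that arithmetic: the arbitration cost is paid once per
burst, so it gets spread over all the words in the burst.  The function name
and the arbitration_cost value here are assumptions for illustration; only
the 32-word burst size comes from the description above.

```python
def dma_crossings_per_word(burst_words=32, arbitration_cost=2.0):
    """Approximate bus crossings per word for a FIFO-backed DMA burst.

    burst_words:      words dumped per bus grab (32, as described above).
    arbitration_cost: ASSUMED cost of taking and releasing the bus,
                      expressed in bus crossings (illustrative value only).
    """
    if burst_words <= 0:
        raise ValueError("burst_words must be positive")
    # One crossing per word, plus the arbitration cost amortized
    # over the whole burst.
    return 1.0 + arbitration_cost / burst_words


if __name__ == "__main__":
    print(dma_crossings_per_word())
```

The longer the burst, the closer this gets to exactly one crossing per word.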

The main idea here is that the fastest a non-DMA controller will ever run is
approximately the same as the normal speed of a DMA controller.  Without a
68020 or 68030 and some 32 bit RAM, the DMA controller is always a win.  You
can, of course, pick a bad DMA controller and compare it to a good programmed
controller, or vice versa, to accentuate the point of YOUR particular
religious views, but I'm dealing in science here.

There is one situation where a non-DMA device will run faster than a DMA device
in Amiga systems.  If you have a 68020 or 68030 system with 32 bit memory above
the 24 bit address space of the 68000, a good non-DMA device like GVP's will go
faster under FFS.  The deal here is that the programmed transfer doesn't have 
any 24 bit limits, while the DMA transfer does.   Plus, with a 32 bit card, the
non-DMA transfer is already approaching the speed of the DMA transfer (the
difference with a fast '030 card may be as much software overhead as hardware
differences).  So while the non-DMA transfer works normally, the DMA device must
dump its data to a temporary RAM buffer, and then run a CPU driven copy to the
final destination.  That copy is likely about as fast as the non-DMA transfer,
so in this situation, the non-DMA device may be around twice as fast as the
DMA transfer.  This situation will disappear with full 32 bit DMA devices, but
you won't be having them on the A2000 bus.
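
To put rough numbers on the "around twice as fast" estimate, here is a
sketch of the two paths.  All the per-word costs are assumed, illustrative
values (the same kind of bus-crossing weights discussed earlier), and the
names are hypothetical; nothing here is a measurement.

```python
def transfer_cost(words, crossings_per_word, bounce_copy_per_word=0.0):
    """Total cost, in bus crossings, of moving `words` 16-bit words.

    bounce_copy_per_word: extra per-word cost of a CPU copy out of a
    temporary buffer; zero when the data lands at its destination directly.
    """
    return words * (crossings_per_word + bounce_copy_per_word)


WORDS = 256              # one 512-byte block
CPU_COPY = 1.25          # ASSUMED: programmed transfer on a 32-bit '030 card
DMA_BURST = 1.0625       # ASSUMED: DMA crossing plus amortized arbitration

# Non-DMA controller: the CPU copies straight to 32-bit memory.
non_dma = transfer_cost(WORDS, CPU_COPY)

# DMA controller: DMA into a buffer within the 24-bit space, then a CPU
# copy to the final destination at about the programmed-transfer rate.
dma_bounced = transfer_cost(WORDS, DMA_BURST, bounce_copy_per_word=CPU_COPY)

ratio = dma_bounced / non_dma

if __name__ == "__main__":
    print(f"non-DMA: {non_dma}, DMA + bounce copy: {dma_bounced}, "
          f"ratio: {ratio:.2f}")
```

With these assumed weights the bounced DMA path costs a bit under twice the
direct programmed transfer, in line with the estimate above.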

> Valentin

-- 
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: D-DAVE H     BIX: hazy
           Be careful what you wish for -- you just might get it