Path: utzoo!utgpu!jarvis.csri.toronto.edu!rutgers!mit-eddie!uw-beaver!ubc-cs!alberta!calgary!enme3!deraadt
From: deraadt@enme3.ucalgary.ca (Theo Deraadt)
Newsgroups: comp.sys.amiga
Subject: Re: GVP controller
Keywords: DMA is better.
Message-ID: <1701@cs-spool.calgary.UUCP>
Date: 12 Aug 89 09:28:30 GMT
References: <8908072207.AA14796@jade.berkeley.edu> <12254@grebyn.com> <501@tardis.Tymnet.COM>
Sender: news@calgary.UUCP
Reply-To: deraadt@enme3.UUCP (Theo Deraadt)
Organization: U. of Calgary, Calgary, Alberta, Canada
Lines: 67


Wrong. Everyone does it this way, e.g. the 2090A, VME boards.. all the good
boards use FIFOs and DMA circuitry, with the exception of some AWESOME
VME boards that have 2M of ram and 68020s on them... but anyways..

1.	wait for 1/2 full mark
2.	start DMA transfer
3.	transfer like heck till empty
4.	suspend DMA transfer
5.	goto 1
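
The loop above can be played through in a few lines of Python. This is only
a rough sketch: the 64 byte fifo, 3500ns per disk byte, 520ns per bus cycle
and the ~9 and ~4 cycle start/stop costs are the figures used in this post,
while the function name, the returned fields and the idea of treating the
last partial burst the same as the others are my own assumptions. The start
and stop costs are charged on every burst, so the total overhead depends
heavily on those two rough figures.

```python
# Figures from the post; the structure of the model is an assumption.
DISK_NS_PER_BYTE = 3500      # disk delivers one byte every 3500ns
BUS_NS_PER_CYCLE = 520       # one bus cycle (one word moved) takes 520ns
FIFO_BYTES = 64
HALF_MARK = FIFO_BYTES // 2  # DMA is requested at the 1/2 full mark
DMA_START_CYCLES = 9         # "~9 cpu cycles" to get the DMA going
DMA_END_CYCLES = 4           # "~4 cpu cycles" to end it


def simulate_sector(sector_bytes=512):
    """Hypothetical helper: walk one sector through the 5-step loop."""
    if sector_bytes % 2:
        raise ValueError("sector_bytes must be even (word transfers)")
    t = 0                 # current time in ns
    drained = 0           # bytes already moved into memory
    bus_busy = 0          # ns during which the bus is taken from the cpu
    max_fill = 0          # highest fifo level seen
    words_per_burst = []

    def arrived(now):
        # Bytes the disk has delivered by time `now`.
        return min(sector_bytes, now // DISK_NS_PER_BYTE)

    while drained < sector_bytes:
        # 1. wait for the 1/2 full mark (or for the tail of the sector)
        target = min(drained + HALF_MARK, sector_bytes)
        t = max(t, target * DISK_NS_PER_BYTE)
        max_fill = max(max_fill, arrived(t) - drained)
        # 2. start DMA transfer
        t += DMA_START_CYCLES * BUS_NS_PER_CYCLE
        bus_busy += DMA_START_CYCLES * BUS_NS_PER_CYCLE
        # 3. transfer like heck till empty (one word per bus cycle)
        words = 0
        while arrived(t) - drained >= 2:
            max_fill = max(max_fill, arrived(t) - drained)
            t += BUS_NS_PER_CYCLE
            bus_busy += BUS_NS_PER_CYCLE
            drained += 2
            words += 1
        # 4. suspend DMA transfer, 5. goto 1
        t += DMA_END_CYCLES * BUS_NS_PER_CYCLE
        bus_busy += DMA_END_CYCLES * BUS_NS_PER_CYCLE
        words_per_burst.append(words)

    return {"words_per_burst": words_per_burst,
            "bus_busy_ns": bus_busy,
            "total_ns": t,
            "max_fill": max_fill}


if __name__ == "__main__":
    result = simulate_sector()
    print("words in first burst:", result["words_per_burst"][0])
    print("bus busy (ns):", result["bus_busy_ns"])
    print("sector done at (ns):", result["total_ns"])
```

The point of modelling it this way is that the fifo never gets anywhere
near full: the bus drains it much faster than the disk fills it, so the bus
is only taken in short bursts.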

Example: the FIFO is 64 bytes, and the disk is delivering 1 byte every
3500ns. The bus can do one transfer every 520ns. Nothing happens till the
fifo hits 32 bytes. A DMA request is made at this point, and takes ~9 cpu
cycles = 4680ns. During that time, an extra 1.3 bytes arrive. Now the fifo
starts to unload while the disk drive is still filling it: 520ns per word
(the 2090 does words, some do bytes) * 16 words = 8320ns. During that time
another 2.3 bytes arrive, so one more word has to be transferred, and
that's another 520ns. So, 8320+520 = 8840ns, during which 17 words were
sent into the Amiga. Now the DMA cycle ends, which takes ~4 cpu cycles, and
the fifo chip starts to fill up again. Each DMA transfer took 520ns, which
is the fastest that you can transfer data on the Amiga bus, because that's
how a 68000 ~8MHz bus operates. Thus, to transfer 512 bytes = 256 words
takes 256 + some more cycles: 256 * 520ns = 133120+ns of cpu cycles. During
this time, the CPU is dead. Of course, the entire sector takes 1792000ns
to arrive, but the difference in time CAN be used by the cpu.
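
The sector arithmetic can be checked with a short Python sketch. The 3500ns,
520ns and 512 byte figures are the ones from this post; the function name is
made up for illustration, and the DMA start/stop overhead (the "some more
cycles") is deliberately left out.

```python
DISK_NS_PER_BYTE = 3500   # from the post
BUS_NS_PER_WORD = 520     # from the post
SECTOR_BYTES = 512


def sector_budget():
    """Hypothetical helper: bus time vs. arrival time for one sector."""
    words = SECTOR_BYTES // 2
    bus_ns = words * BUS_NS_PER_WORD             # time the cpu is locked out
    sector_ns = SECTOR_BYTES * DISK_NS_PER_BYTE  # time for the sector to arrive
    return bus_ns, sector_ns, bus_ns / sector_ns


if __name__ == "__main__":
    bus_ns, sector_ns, fraction = sector_budget()
    print(bus_ns, sector_ns, f"{fraction:.1%}")
```

With these numbers the bus is tied up for roughly 7% of the time the sector
takes to arrive, before counting the start/stop overhead.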

Now, the alternative is to use the cpu to transfer the data. In this case
you have the following. The data gets into a 512 byte fifo on the controller
board, and the processor gets an interrupt that the fifo is full. Well, it's
not gonna poll it, is it? So, it has to go get the data now. Oh yeah, we
should count the interrupt latency (the time to actually get to the right
code that copies the data out of the fifo into ram), which is NOT
insignificant compared to my 256 + max 50 bus cycles. Then you have to
execute lots of bus cycles to copy the data. If you have a 68010, which has
the fancy loop mode, and if you use that, you can BEGIN to get down to about
3 times the work that the DMA does. You can check this with the cycle
timings in just about any 68000 book. A 68010 in loop mode will do better
than a plain 68000, and a 68020 will do better again, but not by nearly as
much as you would think. In fact I suspect there might not be a difference
of more than 3% between the 68010 and the 68020.

I have designed a SCSI port for a VME board that is not built yet, but is
capable, through the use of large FIFOs (8K long), of probably 3M/sec. There
are devices that can get data that fast. If you had a Conner SCSI drive
with a cache on it, or one of the new Maxtors, the SCSI drive can download
from the cache at incredible speeds. I suspect the 2090 (2.5M/sec max)
cannot load its FIFO (~80ns cycle time I think) fast enough to keep up with
it.

With a SCSI drive, any of the ones less than 2 years old anyways, the drive
spin time is NOT a factor. You issue a command to the SCSI drive, and then
disconnect from the device. The device has a cache on it. It gets the first
bunch of stuff. Then it reconnects to the controller. It starts sending
data as fast as you/it can handle it, through an async protocol. As it's
doing this, it's already loading the next stuff into the fifo. These
drives go like spit. The end result is that if you request a big block
from your hard drive, you will spend your entire time with the processor
trying to keep your fifo clean, while the DMA will be able to do that
three times as fast. IF you could go that fast.
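
To put rough numbers on that, here is a small Python sketch of the peak
rates. The 520ns word cycle and the factor of 3 for a cpu copy are the
figures used in this post; the 3,000,000 bytes/sec drive rate is my reading
of "3M/sec", and the function name is invented for the example. It ignores
interrupt latency and DMA start/stop costs, so treat the results as upper
bounds, not measurements.

```python
BUS_NS_PER_WORD = 520         # from the post: one word per 520ns bus cycle
BYTES_PER_WORD = 2
CPU_COST_FACTOR = 3           # post's rough estimate for a cpu copy loop
DRIVE_BYTES_PER_SEC = 3_000_000  # assumption: "3M/sec" read as 3e6 bytes/sec


def peak_bytes_per_sec(ns_per_word):
    """Hypothetical helper: best-case rate if every word costs ns_per_word."""
    return BYTES_PER_WORD * 1_000_000_000 / ns_per_word


dma_rate = peak_bytes_per_sec(BUS_NS_PER_WORD)
cpu_rate = peak_bytes_per_sec(BUS_NS_PER_WORD * CPU_COST_FACTOR)

if __name__ == "__main__":
    print(f"DMA peak: {dma_rate:,.0f} bytes/sec")
    print(f"CPU peak: {cpu_rate:,.0f} bytes/sec")
    print("DMA keeps up with drive:", dma_rate >= DRIVE_BYTES_PER_SEC)
    print("CPU keeps up with drive:", cpu_rate >= DRIVE_BYTES_PER_SEC)
```

Under these assumptions the bus itself tops out a bit under 4M/sec with DMA,
while a cpu copy at three times the cost lands well below what a cached
drive can deliver.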

I have a databook for a Maxtor scsi, an earlier model actually. The thing
has two 8 bit processors (Z8 & i8031), a dual port ram, more than 32K of
rom, and a bunch of dma circuitry as well. It's more bloody complicated
than a simple DUMB NONDMA controller you might want to put in your Amiga.