From: utzoo!decvax!ittvax!swatt
Newsgroups: net.unix-wizards
Title: Re: DZ-11 vs DH-11
Article-I.D.: ittvax.437
Posted: Tue Sep 14 10:23:29 1982
Received: Wed Sep 15 04:06:47 1982
References: duke.2531

I can contribute a LITTLE information  on the  programmed I/O vs.
DMA tradeoff point.

We have a Megatek 7200  graphics  system  running  which  can  do
both.    The  DMA  interface  is  quite  conventional.   The  PIO
operation never interrupts unless you specify a bad address (even 
then the interrupt  can  be  disabled  and  the  error  is  still
available  in  the  status  register).  I modified a driver I got
from Purdue which had provisions for choosing which method to use 
based on transfer size.  This driver was for  an  11/70  and  the
threshold was set at 20 bytes.  

I/O to Megatek graphics memory  is  always  in  terms  of  32-bit
words,  so  the  Purdue  driver  would use DMA for transfers of 5
Megatek words or more.  
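
As a rough sketch of that size-based choice (not the Purdue driver's
actual code; mg_choose, MG_WORD_BYTES and the enum names are made up
for illustration), the decision amounts to one comparison:

```c
#include <assert.h>
#include <stddef.h>

#define MG_WORD_BYTES 4		/* Megatek graphics memory uses 32-bit words */

enum mg_method { MG_PIO, MG_DMA };

/*
 * Hypothetical helper: use DMA once the transfer reaches the threshold,
 * programmed I/O below it.  Sizes are in bytes and must be a whole
 * number of Megatek words.
 */
enum mg_method mg_choose(size_t nbytes, size_t threshold_bytes)
{
	assert(nbytes % MG_WORD_BYTES == 0);
	return nbytes >= threshold_bytes ? MG_DMA : MG_PIO;
}
```

With the 20-byte threshold, mg_choose(16, 20) picks PIO (4 words) and
mg_choose(20, 20) picks DMA (5 words).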

The PIO operation is such that  if  you're  transferring  several
words  to  sequential  addresses, you only load the address once.
Thereafter each transfer is just: 

	load most significant 16-bit half-word
	load least significant 16-bit half-word
	check for error

The transfer goes as fast as the VAX can run that loop.   I  have
added  code  that  allows a user-settable threshold and have done
some crude experiments in PIO vs.   DMA  overhead.   The  default
threshold is now 64 Megatek words (256 bytes), and it seems that
even for transfers of that size PIO has less overhead than DMA.  I
haven't  looked closely at the Unibus map allocate operation, but
it must be fairly involved.  
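
The PIO loop described above might look something like the following
in C.  This is only a sketch: the register layout, the MG_ERR bit and
the name mg_pio_write are assumptions, since the real Megatek
interface registers are not given here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
struct mg_regs {
	volatile uint16_t addr;		/* graphics address, loaded once */
	volatile uint16_t data_hi;	/* most significant half-word */
	volatile uint16_t data_lo;	/* least significant half-word */
	volatile uint16_t status;	/* error shows up here */
};

#define MG_ERR 0x8000			/* assumed error bit */

/*
 * Write n 32-bit words to sequential addresses starting at start.
 * Returns the number of words written without error.
 */
size_t mg_pio_write(struct mg_regs *r, uint16_t start,
    const uint32_t *words, size_t n)
{
	size_t i;

	assert(r != NULL && words != NULL);
	r->addr = start;			/* address is loaded only once */
	for (i = 0; i < n; i++) {
		r->data_hi = (uint16_t)(words[i] >> 16);
		r->data_lo = (uint16_t)(words[i] & 0xFFFF);
		if (r->status & MG_ERR)		/* check for error */
			break;
	}
	return i;
}
```

No interrupt is involved; the cost per word is two stores and one
status test.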

Now for devices like DZ's, where you can't transfer characters as 
fast as the CPU can stuff  them,  the  tradeoff  point  obviously
depends  on  how fast you can service an interrupt.  Berkeley 4.1
has a special assembly-language transmit interrupt routine for
DZ's that takes characters out of a buffer, stuffs them into the
DZ data buffer, and calls the C interrupt routine only when the
buffer is empty.  The Berkeley documents say that 1 DZ line doing
continuous output at 9600 baud consumes 5% of a 780 CPU, versus
3% for the same output from a DH line.
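
To put those percentages in per-character terms, here is a small
calculation.  It assumes 10 bits on the wire per character (start, 8
data, stop), which is an assumption and not something the Berkeley
documents are quoted as saying; usec_per_char is a made-up name.

```c
#include <assert.h>

/*
 * CPU microseconds consumed per character, given the fraction of the
 * CPU used and the line speed.
 */
double usec_per_char(double cpu_fraction, double baud, double bits_per_char)
{
	double chars_per_sec;

	assert(baud > 0.0 && bits_per_char > 0.0);
	chars_per_sec = baud / bits_per_char;
	return cpu_fraction * 1e6 / chars_per_sec;
}
```

At 9600 baud that is 960 characters per second, so 5% of the CPU
works out to about 52 microseconds per character for the DZ, and 3%
to about 31 microseconds per character for the DH.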

If you had a  DZ  device  with  an  internal  buffer  of  say  64
characters  per  line,  and  you could stuff characters into that
buffer as fast as the CPU could loop, and only get  an  interrupt
when the buffer was empty, then I bet such a device would be less 
overhead  than a DH in all cases (for VAX anyway; PDP-11 might be
different).  UNIX won't do DMA to DH devices in hunks larger than 
a cblock structure can hold anyway (28 characters on  4.1bsd;  14
characters on standard V7).  I'm SURE you could stuff 28 bytes in 
a  loop  in  a lot less time than it takes to allocate and free a
Unibus map.  
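
A crude way to compare the two schemes is to count how many times the
CPU has to service the device for a given amount of output.  The
sketch below assumes one service event per chunk (one buffer-empty
interrupt per 64-character refill for the imagined buffered DZ, one
DMA completion per cblock for the DH); service_events is a made-up
name and the model ignores the cost of each event.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of service events needed to send nchars characters when the
 * device accepts at most chunk characters per event (rounds up).
 */
size_t service_events(size_t nchars, size_t chunk)
{
	assert(chunk > 0);
	return (nchars + chunk - 1) / chunk;
}
```

For 1000 characters this gives 16 events with a 64-character buffer
against 36 with 28-character cblocks.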

	- Alan S. Watt
	([decvax!]ittvax!swatt)