Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site isrnix.UUCP
Path: utzoo!watmath!clyde!burl!we13!ihnp4!inuxc!iuvax!isrnix!greg
From: greg@isrnix.UUCP (Gregory R. Travis)
Newsgroups: net.unix-wizards
Subject: Some hacks I'll share!
Message-ID: <98@isrnix.UUCP>
Date: Fri, 9-Mar-84 04:10:51 EST
Article-I.D.: isrnix.98
Posted: Fri Mar  9 04:10:51 1984
Date-Received: Sun, 11-Mar-84 01:22:34 EST
Organization: Inst. of Social Res. (Indiana University)
Lines: 44


   I was doing some playing this evening and guess what I found out:

		1) On the PDP-11/44 a double-precision floating point
		   clear (8 bytes) is almost exactly twice as fast as
		   the four integer clr instructions (2 bytes each) it replaces.
		   I replaced the code in clrbuf (in bio.c) with
		   floating point clears for a code speedup.
		2) A floating point load (double prec. again) 
		   followed by a floating point store is just a weeeee
		   bit faster than the appropriate number of 'mov'
		   instructions (assuming the cache is disabled).
		   I'll bet on the 11/70 you could use floating point
		   load/stores for twice the speed over conventional
		   mov's.
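
  A rough C rendering of the two items above, as a sketch only.  It
  assumes a 2-byte short and an 8-byte double whose zero value is all
  zero bits (true of PDP-11 D format and of IEEE 754).  The names BSIZE,
  union blk, clear_words, clear_doubles and copy_doubles are made up for
  illustration and are not taken from bio.c.  Whether a given compiler
  turns these loops into clrd or ldd/std is not guaranteed.

```c
#include <stddef.h>

#define BSIZE 512                       /* assumed buffer size in bytes */

/* One buffer viewed as bytes, as 16-bit words, or as 8-byte doubles. */
union blk {
    unsigned char b[BSIZE];
    short         w[BSIZE / sizeof(short)];
    double        d[BSIZE / sizeof(double)];
};

/* Word-at-a-time clear: one store per short, like a run of clr. */
size_t clear_words(union blk *bp)
{
    size_t i, n = BSIZE / sizeof(short);

    for (i = 0; i < n; i++)
        bp->w[i] = 0;
    return n;                           /* number of stores issued */
}

/* Double-at-a-time clear: one store per double, like a run of clrd. */
size_t clear_doubles(union blk *bp)
{
    size_t i, n = BSIZE / sizeof(double);

    for (i = 0; i < n; i++)
        bp->d[i] = 0.0;
    return n;                           /* number of stores issued */
}

/*
 * Double-at-a-time copy: one load and one store per double, like ldd/std.
 * Assumption: every bit pattern in the source survives a round trip
 * through a floating point register unchanged.
 */
size_t copy_doubles(union blk *dst, const union blk *src)
{
    size_t i, n = BSIZE / sizeof(double);

    for (i = 0; i < n; i++)
        dst->d[i] = src->d[i];
    return n;                           /* number of load/store pairs */
}
```

  With 2-byte shorts and 8-byte doubles the double loops make a quarter
  as many trips as the word loop, which is where any saving would have
  to come from.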

  What the h*ll does this mean?  That for some applications involving
  manipulation of blocks of data, it may be keen-o to use the floating
  point processor for the manipulations.  Super-cool 11 floating point
  processors (like the FP-11C in the 11/70 and FP-11E in the 11/60)
  that operate in parallel with the CPU may give you quite a performance
  boost if you play your cards right.

  Can anyone see problems with this scheme?  Has anyone thought of it
  before?  

  Does anyone run a 44 or 24 with the commercial instruction set 
  option?  If you do,  do you use the block character move instructions?
  Here at isrnix I wrote some code that copies kernel buffers to/from the
  user's address space with 'mov' instructions (the scheme plays with the
  segmentation registers) instead of the slow m[t,f]p[d,i] instructions.
  It would be a thrill to see if I could pop a CIS board in our CPU and
  use the block move instruction and see what kind of a performance
  increase I get.  Even as things stand, my scheme copies buffers more than
  twice as fast as the previous copyin/copyout scheme.
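
  One way such a scheme could look, as a toy model in portable C and not
  the actual isrnix code.  Everything here is hypothetical: phys[] stands
  in for physical memory, window_par for a spare kernel page address
  register, and fetch_user_word for the per-word m[t,f]p[d,i] path.  The
  model also assumes the user buffer's physical byte address (uphys, even)
  is already known.  The only hardware fact relied on is that PDP-11 page
  address registers hold addresses in 64-byte units.

```c
#include <stddef.h>

#define CLICK      64u                  /* PAR granularity in bytes */
#define PHYS_WORDS 4096u                /* toy physical memory: 8 KB */

unsigned short phys[PHYS_WORDS];        /* hypothetical physical memory */
unsigned       window_par;              /* hypothetical spare kernel PAR */
unsigned long  slow_calls;              /* per-word fetches performed */

/* Translate a byte offset in the kernel window to an index into phys[]. */
unsigned window_index(unsigned off)
{
    return (window_par * CLICK + off) / 2;
}

/* Stand-in for the per-word path: one call, with its overhead, per word. */
unsigned short fetch_user_word(unsigned uphys)
{
    slow_calls++;
    return phys[uphys / 2];
}

/* Slow scheme: fetch the user buffer one word per call. */
void copyin_slow(unsigned short *kbuf, unsigned uphys, size_t nwords)
{
    size_t i;

    for (i = 0; i < nwords; i++)
        kbuf[i] = fetch_user_word(uphys + 2 * (unsigned)i);
}

/* Fast scheme: aim the window at the user buffer once, then plain moves. */
void copyin_mapped(unsigned short *kbuf, unsigned uphys, size_t nwords)
{
    unsigned off;
    size_t i;

    window_par = uphys / CLICK;         /* single mapping-register write */
    off = uphys % CLICK;                /* buffer offset inside the window */
    for (i = 0; i < nwords; i++)
        kbuf[i] = phys[window_index(off + 2 * (unsigned)i)];
}
```

  The point of the model is only that the mapped version pays its setup
  cost once per buffer rather than once per word.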

  Any comments?

-- 
    Gregory R. Travis
    Institute for Social Research - Indiana University - Bloomington, In
    ihnp4!inuxc!isrnix!greg
    {pur-ee,allegra,qusavx}!isrnix!greg