Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site isrnix.UUCP
Path: utzoo!watmath!clyde!burl!we13!ihnp4!inuxc!iuvax!isrnix!greg
From: greg@isrnix.UUCP (Gregory R. Travis)
Newsgroups: net.unix-wizards
Subject: Some hacks I'll share!
Message-ID: <98@isrnix.UUCP>
Date: Fri, 9-Mar-84 04:10:51 EST
Article-I.D.: isrnix.98
Posted: Fri Mar 9 04:10:51 1984
Date-Received: Sun, 11-Mar-84 01:22:34 EST
Organization: Inst. of Social Res. (Indiana University)
Lines: 44

I was doing some playing this evening, and guess what I found out:

1) On the PDP-11/44, a double-precision floating-point clear (8 bytes) is almost exactly twice as fast as 4 'clr' instructions (integer clear, 2 bytes each). I replaced the code in clrbuf (in bio.c) with floating-point clears for a speedup.

2) A double-precision floating-point load followed by a floating-point store is just a weeeee bit faster than the equivalent number of 'mov' instructions (assuming the cache is disabled). I'll bet that on the 11/70 you could use floating-point loads/stores to get twice the speed of conventional 'mov's.

What the h*ll does this mean? That for some applications that manipulate blocks of data, it may be keen-o to use the floating-point processor for the manipulation. Super-cool 11 floating-point processors that operate in parallel with the CPU (like the FP-11C in the 11/70 and the FP-11E in the 11/60) may give you quite a performance boost if you play your cards right.

Can anyone see problems with this scheme? Has anyone thought of it before?

Does anyone run a 44 or 24 with the commercial instruction set (CIS) option? If you do, do you use the block character move instructions? Here at isrnix I wrote some code that copies kernel buffers to and from the user's address space with 'mov' instructions (the scheme plays with the segmentation registers) instead of the slow m[t,f]p[d,i] instructions (mfpi, mfpd, mtpi, mtpd).
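For readers who want the arithmetic behind the clrbuf change in item 1, here is a rough C sketch, assuming a 512-byte buffer, an 8-byte double, and a floating zero that is all-zero bits (true of IEEE 754 and of the PDP-11 format). The union blk, the functions clear_by_words and clear_by_doubles, and the CLR_COST/CLRD_COST model are hypothetical names for illustration, not the code in bio.c. The two loops mirror 'clr (r0)+' and 'clrd (r0)+' loops only in how many stores they issue; a modern compiler will not necessarily emit anything like either.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed buffer size for this sketch (hypothetical). */
#define BSIZE 512

/* The sketch relies on an 8-byte double whose +0.0 is all-zero bits. */
static_assert(sizeof(double) == 8, "sketch assumes an 8-byte double");

/* One buffer viewed three ways, so both clear loops cover the same bytes. */
union blk {
    uint16_t      w[BSIZE / sizeof(uint16_t)]; /* 2-byte words, as 'clr' stores */
    double        d[BSIZE / sizeof(double)];   /* 8-byte doubles, as 'clrd' stores */
    unsigned char c[BSIZE];                    /* byte view for checking */
};

/* Clear with 2-byte stores; returns the number of stores issued. */
size_t clear_by_words(union blk *b)
{
    size_t n = 0;
    for (size_t i = 0; i < BSIZE / sizeof(uint16_t); i++, n++)
        b->w[i] = 0;
    return n;
}

/* Clear with 8-byte floating-point stores; returns the number of stores issued. */
size_t clear_by_doubles(union blk *b)
{
    size_t n = 0;
    for (size_t i = 0; i < BSIZE / sizeof(double); i++, n++)
        b->d[i] = 0.0;
    return n;
}

/* Rough cost model in "clr times", taking the 11/44 figure in item 1
   at face value: one 'clrd' costs about as much as two 'clr'. */
#define CLR_COST  1u
#define CLRD_COST 2u
```

Under that rough model, clearing 512 bytes word by word takes 256 stores (about 256 clr-times), while the floating version takes 64 stores (about 128 clr-times), which is the factor of two reported in item 1. Counting the same way for item 2, a 512-byte copy needs 256 'mov' instructions against 64 'ldd'/'std' pairs (128 instructions). The gain reported there was small, so instruction count alone does not predict the speedup.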
It would be a thrill to pop a CIS board into our CPU, use the block move instruction, and see what kind of performance increase I get. Even as things stand, I get better than twice the buffer-copying performance of the previous copyin/copyout scheme.

Any comments?
--
Gregory R. Travis
Institute for Social Research - Indiana University - Bloomington, In
ihnp4!inuxc!isrnix!greg
{pur-ee,allegra,qusavx}!isrnix!greg