Path: utzoo!utgpu!water!watmath!clyde!att!rutgers!ucsd!ucsdhub!hp-sdd!hplabs!hp-sde!hpcea!hpausla!cjh
From: cjh@hpausla.HP.COM (Clifford Heath)
Newsgroups: comp.arch
Subject: Re: block copy & VAX MOVC (was Re: Explanation, please!)
Message-ID: <2220003@hpausla.HP.COM>
Date: 26 Sep 88 07:35:17 GMT
References:
Organization: HP Australian Software Operation
Lines: 35

I played with Duff's device on an HP 9000/850 (a RISC machine) and got some interesting results. Duff's device is faster than the comparable non-unrolled loop, but only by about 20-30%. memcpy was heaps faster, so I looked at its assembly code in a debugger.

As a result, I changed the unrolling factor in Duff's device to 4 (not much change) and replaced the auto-increment pointer addressing with short-offset indexing (one pointer adjustment before the loop, and a single increment per pass just before the while test). That gained about another 30%. The 850 has auto-increment addressing, but it still costs time that needn't be wasted. It also has a good global optimizer, which seemed to do sensible things even with this strange device.

Duff's device was STILL about 50% slower than memcpy, and it couldn't handle byte-size moves, non-aligned moves, etc. Duff's device is really only a way of saving the code size needed for the leftover moves after the unrolled loop has run, which is a fairly poor excuse for using a construct that's so hard to read. The only other benefit is that the instructions for those extra moves may already be in the I-cache, which isn't really such a big deal.

The memcpy on the 850 is quite an astonishing effort. It uses word moves, with 8/16/24-bit double-register shifts when the source and destination are misaligned by different amounts. It also has a very small setup time, so small moves are caught early and handled quickly. Congratulations to the coder; a very good effort.

Before this experiment, I was convinced that C with a good optimizer could get within 10% of assembly code for anything. I now have a convincing counter-example.
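For anyone who hasn't met it, the two loops being compared can be sketched in C roughly as follows. This is a minimal sketch, assuming int elements; the names `plain_copy` and `duff_copy` are hypothetical, and the unroll factor of 4 is the one mentioned above (Tom Duff's original unrolled 8 times and wrote to a single device register rather than incrementing `to`).

```c
#include <stddef.h>

/* Hypothetical name: the plain, non-unrolled copy loop. */
void plain_copy(int *to, const int *from, size_t count)
{
    while (count-- > 0)
        *to++ = *from++;
}

/* Hypothetical name: Duff's device unrolled 4 times, memory to memory.
   The switch jumps into the middle of the do-while, so the first pass
   copies count % 4 elements (or 4 when that is 0); every later pass
   copies 4. */
void duff_copy(int *to, const int *from, size_t count)
{
    size_t passes;

    if (count == 0)          /* the device itself assumes count > 0 */
        return;
    passes = (count + 3) / 4;
    switch (count % 4) {
    case 0: do { *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```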
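The short-offset indexing change can be sketched like this: each pass copies through constant offsets from a base, and the base advances once, just before the `while` test. This is a sketch under assumptions, not the code actually timed: the name `duff_copy_indexed` is hypothetical, and instead of adjusting the pointers themselves before the loop (which, for this entry scheme, would point before the start of the array, undefined behaviour in ISO C) it starts a signed index at 0, -1, -2 or -3 so the first, partial pass lands on element 0. Whether `to[i + k]` becomes a single short-offset load or store is up to the compiler.

```c
#include <stddef.h>

/* Hypothetical: Duff's device unrolled 4 times, using constant offsets
   from a base index instead of auto-increment addressing. */
void duff_copy_indexed(int *to, const int *from, size_t count)
{
    size_t passes;
    ptrdiff_t i;

    if (count == 0)
        return;
    passes = (count + 3) / 4;
    /* One adjustment before the loop: entering at case r (r = count % 4,
       nonzero) runs only the statements with offsets 4-r .. 3, so start
       the base at r - 4 and the first element written is to[0]. */
    i = (count % 4 == 0) ? 0 : (ptrdiff_t)(count % 4) - 4;

    switch (count % 4) {
    case 0: do { to[i + 0] = from[i + 0];
    case 3:      to[i + 1] = from[i + 1];
    case 2:      to[i + 2] = from[i + 2];
    case 1:      to[i + 3] = from[i + 3];
                 i += 4;          /* single increment per pass */
            } while (--passes > 0);
    }
}
```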
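The code size Duff's device saves is the separate cleanup loop that ordinary unrolling needs for the count % 4 leftover elements. A sketch of that conventional form, for comparison (the name `unrolled_copy` is hypothetical; it also uses constant offsets with one pointer increment per pass):

```c
#include <stddef.h>

/* Hypothetical: conventional 4-way unrolling plus a cleanup loop. */
void unrolled_copy(int *to, const int *from, size_t count)
{
    size_t passes = count / 4;
    size_t left = count % 4;

    /* Main unrolled loop: 4 elements per pass. */
    while (passes-- > 0) {
        to[0] = from[0];
        to[1] = from[1];
        to[2] = from[2];
        to[3] = from[3];
        to += 4;
        from += 4;
    }
    /* Cleanup loop for the 0-3 leftover elements; Duff's device does this
       work by jumping into the middle of the unrolled body instead. */
    while (left-- > 0)
        *to++ = *from++;
}
```

Unlike the device, this version needs no special case for a zero count.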
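The shift-and-merge idea described for the 850's memcpy can be illustrated in portable C. This is only a sketch under stated assumptions, not HP's code: the name `shift_copy` and the 16-byte small-move cutoff are hypothetical, words are assumed to be 32 bits, the buffers must not overlap, and 4-byte `memcpy` calls stand in for load-word/store-word instructions (optimizing compilers usually turn them into single loads and stores). Bytes are copied until the destination is word aligned; if the source is then aligned too, plain word moves follow; otherwise each output word is merged from two aligned source words shifted by 8, 16 or 24 bits, with shift directions chosen for the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Runtime byte-order check: is the first byte of the word 1 the low byte? */
static int little_endian_host(void)
{
    const uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Move the bytes of w toward lower / higher addresses by bits/8 places. */
static uint32_t toward_lower(uint32_t w, unsigned bits, int little)
{
    return little ? w >> bits : w << bits;
}
static uint32_t toward_higher(uint32_t w, unsigned bits, int little)
{
    return little ? w << bits : w >> bits;
}

/* Hypothetical word-at-a-time copy for non-overlapping buffers. */
void *shift_copy(void *dstv, const void *srcv, size_t n)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;
    const int little = little_endian_host();
    size_t done = 0, s;

    if (n < 16) {                          /* small moves: caught early */
        for (; done < n; done++)
            dst[done] = src[done];
        return dstv;
    }
    while ((uintptr_t)dst % 4 != 0) {      /* align the destination */
        *dst++ = *src++;
        n--;
    }
    s = (uintptr_t)src % 4;                /* remaining source misalignment */
    if (s == 0) {                          /* equally aligned: word moves */
        for (; done + 4 <= n; done += 4) {
            uint32_t w;
            memcpy(&w, src + done, 4);
            memcpy(dst + done, &w, 4);
        }
    } else {                               /* unequally aligned: shift, merge */
        const unsigned sh = 8u * (unsigned)s;    /* 8, 16 or 24 */
        unsigned char head[4] = {0, 0, 0, 0};
        uint32_t carry, next, out;

        /* carry = the aligned word holding src[0], built without reading
           any byte before src[0] (those positions are left zero). */
        memcpy(head + s, src, 4 - s);
        memcpy(&carry, head, 4);

        /* Each pass reads the aligned word src[done+4-s .. done+7-s]. */
        for (; done + 8 - s <= n; done += 4) {
            memcpy(&next, src + done + 4 - s, 4);
            out = toward_lower(carry, sh, little)
                | toward_higher(next, 32u - sh, little);
            memcpy(dst + done, &out, 4);
            carry = next;
        }
    }
    for (; done < n; done++)               /* trailing bytes */
        dst[done] = src[done];
    return dstv;
}
```

Called as `shift_copy(dst, src, n)`, it behaves like `memcpy` for non-overlapping buffers and, unlike the Duff's device loops, handles byte counts and any alignment.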
In short, use the system-supplied routines for preference, and if they prove to be slow, replace them yourself AND SEND THE CODE to the company that wrote them. They'll probably be grateful.

Clifford Heath, Hewlett Packard Australian Software Operation.
(UUCP: hplabs!hpfcla!hpausla!cjh, ACSnet: cjh@hpausla.oz)