Xref: utzoo comp.sys.atari.st:12763 comp.os.minix:4237
Path: utzoo!utgpu!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!uwmcsd1!marque!uunet!mcvax!hp4nl!philmds!leo
From: leo@philmds.UUCP (Leo de Wit)
Newsgroups: comp.sys.atari.st,comp.os.minix
Subject: Re: improved long multiply for GCC -m68000
Message-ID: <883@philmds.UUCP>
Date: 6 Dec 88 12:52:38 GMT
References: <22486@watmath.waterloo.edu>
Reply-To: leo@philmds.UUCP (Leo de Wit)
Distribution: comp
Organization: Philips I&E DTS Eindhoven
Lines: 43

In article <22486@watmath.waterloo.edu> egisin@watmath.waterloo.edu (Eric Gisin) writes:
|Here's an improved long multiply for the 68000.
|The original came from some program in comp.sources.amiga.
|It uses a single hardware multiply when the arguments are
|the range 0 to 2^^16 - 1. The signed version is the
|same as the unsigned, since a non-widening twos complement
|multiply is the same for signed and unsigned operands
|(except for overflow detection, which we ignore).
|
|| optimized long multiply - author unknown

   [source left out]...

It is even feasible to do some more optimization (if we call the upper
half of d0 d0.hi, the lower half d0.lo, etc.): the routine currently
calculates d0.lo x d1.lo, and, if one of (d0.hi, d1.hi) is non-zero,
also d0.lo x d1.hi and d0.hi x d1.lo. That is, 1 mult. for d0, d1 in
the 0..65535 range, 3 mult.'s in all other cases.

Since multiplications are rather expensive (as compared to many other
operations), some more testing for special cases comes to mind:

If EITHER one of (d0.hi, d1.hi) is zero, we can leave out the
multiplication that uses it (for obvious reasons).

If either one of (d0.hi, d1.hi) is -1 (that is, 0xffff in this case),
we can reduce this to the 'zero case', after doing a neg.l d0 (or d1,
whichever it was), and negating the result in the end, or
alternatively subtracting d1.lo or d0.lo from the partial sum, since
adding a number multiplied by -1 is the same as subtracting it.
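
To make the case analysis above concrete, here is a rough sketch in C
rather than 68000 assembler. It is not the routine from the quoted
article, and all names in it (mulu, cross, lmul, mulu_count,
lmul_check) are made up for illustration. mulu() stands in for the
16 x 16 -> 32 hardware multiply and counts how often it is used. The
sketch takes the 'subtract instead of multiply' variant for a 0xffff
upper half, which needs no separate treatment of 0xffff0000, and it
simply falls back to three multiplies when neither shortcut applies.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 16x16 multiplies used by the last lmul() call (illustration only). */
static int mulu_count;

/* Stand-in for the 68000 MULU instruction: 16 x 16 -> 32 bits, unsigned. */
static uint32_t mulu(uint16_t a, uint16_t b)
{
    mulu_count++;
    return (uint32_t)a * b;
}

/* Cross term hi * lo, reduced mod 2^16, with the 0 and 0xffff shortcuts. */
static uint16_t cross(uint16_t hi, uint16_t lo)
{
    if (hi == 0)
        return 0;                       /* zero case: nothing to add */
    if (hi == 0xffff)
        return (uint16_t)(0u - lo);     /* -1 * lo: subtract, no multiply */
    return (uint16_t)mulu(hi, lo);      /* general case: a real multiply */
}

/* Low 32 bits of a * b; the same bits serve signed and unsigned operands. */
uint32_t lmul(uint32_t a, uint32_t b)
{
    uint16_t a_lo = (uint16_t)a, a_hi = (uint16_t)(a >> 16);
    uint16_t b_lo = (uint16_t)b, b_hi = (uint16_t)(b >> 16);
    uint32_t sum;

    mulu_count = 0;
    sum = mulu(a_lo, b_lo);             /* always needed */
    /* Only the low 16 bits of the cross terms survive the shift. */
    sum += (uint32_t)(uint16_t)(cross(a_hi, b_lo) + cross(b_hi, a_lo)) << 16;
    return sum;
}

/* Compare against the compiler's own 32-bit multiply (wraps mod 2^32). */
void lmul_check(uint32_t a, uint32_t b)
{
    assert(lmul(a, b) == (uint32_t)(a * b));
}
```

With this sketch, lmul((uint32_t)-3, 70000) uses two multiplies:
d0.lo x d1.lo plus one cross term, the 0xffff upper half being handled
by the subtraction.
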

This isn't such a special case if you consider the fact that either
one of (d0.hi, d1.hi) must be zero (possibly after the 'negation
trick') for the result not to overflow.

The conclusion is that two multiplications should suffice (in case of
no overflow). If someone isn't already hacking on this one, I'll take
a look at it and post the results when I'm done.

    Leo.

P.S. Lattice C passes the arguments for multiplication in registers
(d0 and d1 I think). This can further increase the routine's speed,
since you save 4 stack references (and a stack adjustment a la
addq.l #8,sp), although I don't know if it is possible to force the
compiler into this action for user-supplied library routines.