Path: utzoo!attcan!uunet!lll-winken!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!decwrl!alverson
From: alverson@decwrl.dec.com (Robert Alverson)
Newsgroups: comp.arch
Subject: Re: RISC bashing at USENIX
Summary: Library code does the right thing
Message-ID: <603@bacchus.DEC.COM>
Date: 14 Jul 88 20:50:36 GMT
References: <6965@ico.ISC.COM> <936@garth.UUCP> <202@baka.stan.UUCP> <59798@sun.uucp> <204@baka.stan.UUCP>
Reply-To: alverson@decwrl.UUCP (Robert Alverson)
Distribution: na
Organization: Digital Equipment Corporation
Lines: 30

In article <204@baka.stan.UUCP> stan!landru@boulder.edu writes:
>In article <59798@sun.uucp> pope@sun.UUCP (John Pope) writes:
>>>register long count;
>>>register long *src, *dst;
>           ^^^^
>>>   while( --count )
>>>   {
>>>      *dst++ = *src++;
>>>   }
>>*** Warning! Brain damaged software alert! ***
>>This should be re-coded to use the bcopy() library routine, which
>>does a 32 bit copy instead of a byte at a time. You should see a
>>*noticable* improvement. Moral: use your libraries, that's what they're 
>>there for.

Pope's reasoning is wrong (the loop above already copies 32-bit
longs, not bytes), but I tend to agree that you should use a library
routine for such a low-level function as copying memory.  In
particular, a library routine might unroll the loop many times, so
that the cost per word approaches that of a single load+store pair.
That would make the cost per word nearly 5 cycles on Sparc (I think),
or about 300ns (?).  This is still rather high; it seems like a RISC
ought to do a load+store in 2 or 3 cycles (scheduled!).
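
To make the unrolling point concrete, here is a rough sketch of what
such a routine could look like in C.  The name copy_words_unrolled and
the unroll factor of 4 are my own choices for illustration, not taken
from any real library; an actual bcopy() would likely be hand-tuned
assembly and would also deal with alignment and odd byte counts.

```c
#include <stddef.h>

/* Hypothetical helper (not a real library routine): copy `count`
 * words from src to dst, with the loop body unrolled four times. */
void copy_words_unrolled(long *dst, const long *src, size_t count)
{
    /* Main body: four load+store pairs per loop test and branch,
     * so the loop overhead is shared across four words. */
    while (count >= 4) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
        count -= 4;
    }

    /* Tail: the 0 to 3 words left over after the unrolled body. */
    while (count > 0) {
        *dst++ = *src++;
        count--;
    }
}
```

With a larger unroll factor the per-word overhead of the test and
branch shrinks further, which is why the cost tends toward the bare
load+store pair.  Note that this version copies exactly `count` words,
whereas the quoted `while( --count )` loop stops one word short.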

Similarly, on a VAX, the library routine might just happen to correspond
directly to a VAX instruction, so that the loop could be executed in
microcode.  In any case, copying memory seems like such a fundamentally
useful operation that you can expect the library code to be at least
as good as what you can get out of the compiler.
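
For comparison, the library version of the quoted loop is a one-liner.
This sketch uses memcpy(), the standard C counterpart of bcopy(); the
wrapper name copy_words_library is made up for the example.  Keep in
mind that bcopy() takes its arguments in the opposite order (source
first) and that both routines take a byte count, not a word count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copy `count` longs using the library routine
 * instead of a hand-written loop. */
void copy_words_library(long *dst, const long *src, size_t count)
{
    /* memcpy wants a length in bytes; the two regions must not
     * overlap (use memmove if they might). */
    memcpy(dst, src, count * sizeof *src);
}
```

Whether the library maps this onto an unrolled loop, a single
instruction, or something else is up to the implementation, which is
the point: the caller gets whatever the platform does best.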

Bob