Path: utzoo!utgpu!water!watmath!clyde!att!alberta!calgary!radford
From: radford@calgary.UUCP (Radford Neal)
Newsgroups: comp.arch
Subject: Re: RISC bashing at USENIX
Message-ID: <1746@vaxb.calgary.UUCP>
Date: 14 Jul 88 19:13:24 GMT
References: <6965@ico.ISC.COM> <936@garth.UUCP> <202@baka.stan.UUCP>
Organization: U. of Calgary, Calgary, Ab.
Lines: 45

In article <202@baka.stan.UUCP>, landru@stan.UUCP (Mike Rosenlof) writes:

> When I first brought up X on our color sun 4/260, recently converted from
> a sun 3/260, I was amazed that the X server performance for simple things
> like scrolling and moving windows around was no better...

> The loop which does most of the work for a bit blt looks like this for the
> common copy case:
> 
> register long count;
> register long *src, *dst;
> 
>    while( --count )
>    {
>       *dst++ = *src++;
>    }
> 
> [ goes on to examine the code generated for 68020 and SPARC ]


Your problem is that the above C code is grossly non-optimal. Assuming
that "count" is typically fairly large, the optimal C code is the
following:

     bcopy ((char*)src, (char*)dst, count*sizeof(long));

If for some bizarre reason your C compiler doesn't come with a "bcopy"
routine, I suggest something along the following lines:

     while (count>=8)
     {
        *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++;
        *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++;
        count -= 8;
     }
     while (count>0)
     {
        *dst++ = *src++;
        count -= 1;
     }

There are, of course, many variations, and it's hard to tell which will
be best on any particular processor, which is why "bcopy" was invented.
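
For anyone who wants to try both approaches side by side, here is a
minimal sketch of each as a self-contained function. The function names
are made up for illustration and come from no library. The library
version uses "memmove" from <string.h>, the standard C routine that,
like "bcopy", tolerates overlapping regions; note that its argument
order is (dst, src), the reverse of "bcopy". The unrolled version
assumes the regions do not overlap in a way that a forward copy would
clobber.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: let the library routine do the copy. */
void copy_longs_library(long *dst, const long *src, size_t count)
{
    memmove(dst, src, count * sizeof(long));
}

/* Hypothetical helper: copy eight longs per trip through the main
 * loop, then finish the remaining 0..7 longs one at a time. */
void copy_longs_unrolled(long *dst, const long *src, size_t count)
{
    while (count >= 8) {
        *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++;
        *dst++ = *src++; *dst++ = *src++; *dst++ = *src++; *dst++ = *src++;
        count -= 8;
    }
    while (count > 0) {
        *dst++ = *src++;
        count -= 1;
    }
}
```

Which one wins depends on the processor and the compiler, so time both
on the machine in question before committing to either.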

   Radford Neal