Path: utzoo!utgpu!water!watmath!clyde!att!alberta!calgary!radford
From: radford@calgary.UUCP (Radford Neal)
Newsgroups: comp.arch
Subject: Re: RISC bashing at USENIX
Message-ID: <1746@vaxb.calgary.UUCP>
Date: 14 Jul 88 19:13:24 GMT
References: <6965@ico.ISC.COM> <936@garth.UUCP> <202@baka.stan.UUCP>
Organization: U. of Calgary, Calgary, Ab.
Lines: 45

In article <202@baka.stan.UUCP>, landru@stan.UUCP (Mike Rosenlof) writes:

> When I first brought up X on our color sun 4/260, recently converted from
> a sun 3/260, I was amazed that the X server performance for simple things
> like scrolling and moving windows around was no better...

> The loop which does most of the work for a bit blt looks like this for the
> common copy case:
>
>     register long count;
>     register long *src, *dst;
>
>     while( --count )
>     {
>         *dst++ = *src++;
>     }
>
> [ goes on to examine the code generated for 68020 and SPARC ]

Your problem is that the above C code is grossly non-optimal. Assuming
that "count" is typically fairly large, the optimal C code is the
following:

    bcopy ((char*)src, (char*)dst, count*sizeof(long));

If for some bizarre reason your C compiler doesn't come with a "bcopy"
routine, I suggest something along the following lines:

    while (count>8)
    {
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        *dst++ = *src++;
        count -= 8;
    }

    while (count>0)
    {
        *dst++ = *src++;
        count -= 1;
    }

There are, of course, many variations, and it's hard to tell which will
be best on any particular processor, which is why "bcopy" was invented.

    Radford Neal
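
For anyone who wants to try the two approaches side by side, here is a
minimal self-contained sketch. It is not code from the post above: the
function names copy_longs_unrolled and copy_longs_memcpy are hypothetical,
the unrolled loop uses "count >= 8" rather than "count > 8" so that a block
of exactly eight words also takes the unrolled path, and memcpy stands in
for bcopy as the standard C routine (note that memcpy takes the destination
first, while bcopy takes the source first). Neither version handles
overlapping regions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): copy `count` longs
   from src to dst, eight words per iteration, then finish the remainder
   one word at a time. A count of zero or less copies nothing. */
static void copy_longs_unrolled(long *dst, const long *src, long count)
{
    while (count >= 8) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst[4] = src[4];
        dst[5] = src[5];
        dst[6] = src[6];
        dst[7] = src[7];
        dst += 8;
        src += 8;
        count -= 8;
    }
    while (count > 0) {
        *dst++ = *src++;
        count -= 1;
    }
}

/* Hypothetical helper: the same copy delegated to the library routine.
   memcpy(dst, src, n) is used here as the standard C counterpart of
   bcopy(src, dst, n); the argument order differs between the two. */
static void copy_longs_memcpy(long *dst, const long *src, long count)
{
    if (count > 0)
        memcpy(dst, src, (size_t)count * sizeof(long));
}
```

Which of the two is faster depends on the processor, the compiler, and the
library, so it is worth timing both on the machine in question rather than
assuming.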