Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!ncar!boulder!stan!landru
From: landru@stan.UUCP (Mike Rosenlof)
Newsgroups: comp.arch
Subject: Re: RISC bashing at USENIX
Message-ID: <202@baka.stan.UUCP>
Date: 13 Jul 88 15:29:34 GMT
References: <6965@ico.ISC.COM> <936@garth.UUCP>
Reply-To: stan!landru@boulder.edu
Organization: SAE Inc., Longmont, Colorado
Lines: 73

In article <936@garth.UUCP> walter@garth.UUCP (Walter Bays) writes:
>
>An article in the July issue of UNIX Review sheds some light on the issue.
>It's by David Wilson of Workstation Laboratories (another benchmark service).
>The article shows "... a class of tasks for which the Sun 4/260 is two or
>three times faster than the Sun 3/260, a class for which performance is about
>the same, and a class where the 4/260's performance seems slightly lower..."
>Wilson discusses the reasons for these results.
>

When I first brought up X on our color Sun 4/260, recently converted from
a Sun 3/260, I was amazed that the X server performance for simple things
like scrolling and moving windows around was no better.  This was just how
it looked; I didn't get out my stopwatch.  So I did a little comparison
of bit blt (bit block transfer: scrolling, moving a window, ...) timing.

The loop which does most of the work for a bit blt looks like this for the
common copy case:

register long count;
register long *src, *dst;

   while( --count )
   {
      *dst++ = *src++;
   }


The Sun 68k compiler, after optimizing, produces this code:

LY00001:
    movl    a5@+,a4@+
LY00000:
    subql   #1,d7
    jne LY00001

According to the 68020 User's Manual, this loop takes 10 clocks in the
best case and 15 clocks in the cache case.  With a 40 nsec clock, this
is 400 and 600 nsec per loop, respectively.


The Sun SPARC compiler, after optimizing, produces:

LY2:                    ! [internal]
    ld  [%o3],%o0
    dec %o5
    tst %o5
    st  %o0,[%o4]
    inc 4,%o3
    bne LY2
    inc 4,%o4

which takes 9 clocks, and with a 60 nsec clock, this is 540 nsec.
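
For anyone who wants to redo the arithmetic with other clock rates, here
is a rough sketch in C.  The function names are made up for illustration,
and it assumes one 4-byte long is moved per trip through the loop, as in
the loop above.  It only multiplies out the manual clock counts; it is not
a measurement.

```c
#include <assert.h>

/* Time for one trip through the copy loop, in nsec:
   clocks per loop times the clock period in nsec. */
static long loop_nsec(long clocks, long clock_period_nsec)
{
    assert(clocks > 0 && clock_period_nsec > 0);
    return clocks * clock_period_nsec;
}

/* Copy rate in millions of bytes per second, ASSUMING one 4-byte
   long is moved per loop.  4 bytes per N nsec is 4000/N bytes
   per microsecond. */
static double copy_mbytes_per_sec(long nsec_per_loop)
{
    assert(nsec_per_loop > 0);
    return 4.0 * 1000.0 / (double)nsec_per_loop;
}
```

With the figures above, loop_nsec(10, 40) and loop_nsec(15, 40) give 400
and 600 for the 68020, and loop_nsec(9, 60) gives 540 for SPARC.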

Since the 68K loop is so tight, I suspect we're seeing the best-case 68K
timing, with SPARC doing the setup work faster to make up some of the
difference.

Which processor is going to get faster clocks sooner?  Or will newer
versions reduce the clock count?  I've heard of a 33 MHz 68020 being
available now or soon; other SPARC implementations are also said to be
in the works.


My point is that with a reduced instruction set, you're very likely to 
find some applications that are slowed down by this reduction.  In this
case, I find that the Sun 4/260 makes a very nice compile or compute
server, but it's not a very impressive X server.

-- 
Mike Rosenlof		SAE			(303)447-2861
2190 Miller Drive			stan!landru@boulder.edu
Longmont Colorado			landru@stan.uucp
80501					...hao!boulder!stan!landru