Path: utzoo!utgpu!jarvis.csri.toronto.edu!mailrus!csd4.csd.uwm.edu!bionet!ames!amdcad!sun!road!khb
From: khb@road.Sun.COM (Keith Bierman - Advanced Languages - Floating Point Group )
Newsgroups: comp.arch
Subject: Re: John von Neumann, sqrt instr
Message-ID: <122600@sun.Eng.Sun.COM>
Date: 19 Aug 89 01:57:20 GMT
References: <21353@cup.portal.com> <25643@obiwan.mips.COM> <1513@l.cc.purdue.edu> <2376@wyse.wyse.com>
Sender: news@sun.Eng.Sun.COM
Reply-To: khb@sun.UUCP (Keith Bierman - Advanced Languages - Floating Point Group )
Distribution: usa
Organization: Sun Microsystems, Mountain View
Lines: 44

In article <2376@wyse.wyse.com> stevew@wyse.UUCP (Steve Wilson xttemp dept303) writes:
>In article <1513@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:

>The Cydra-5 included hardware divide and square-root on the same
>board ....
>
>I know that both operations sure did a number on scheduling the inner-most
>loop.  Both operations had a long latency, thus caused scheduling
>headaches.
>
>How does the scientific computing community feel about this functionality?

The Cydra-5's very long divide latency proved very harmful when
running customer-level application benchmarks. Divide may be "rare",
but when a code uses it, it tends to use it often. Furthermore, the
very long (roughly 26-cycle) latency was compounded by a decision to
reuse some of the stages, so there were 26 cycles between initiations
of successive divides, as opposed to 1 or 2 cycles for most other
operations. This resulted in compile times of HOURS for some simple
loops (e.g. a loop body of back-to-back divides, ( / / / / )) while
the compiler tried to find a sensible schedule (for some reason, my
suggestion of simply having a compiler directive to give up after a
couple of minutes wasn't accepted).
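To see why a 26-cycle gap between divide initiations hurts a
software-pipelined loop, here is a minimal sketch of the usual
resource-constrained lower bound on the initiation interval (ResMII).
The machine model (unit names, unit counts, 1-cycle issue for every
other operation) is a hypothetical assumption; only the 26-cycle
divide figure comes from the text above.

```python
import math
from collections import Counter

# Hypothetical machine model (NOT the real Cydra-5 description):
# op -> (functional unit, cycles that unit stays busy before it can
# start another op). Only the 26-cycle divide figure is from the post.
BASE_OPS = {
    "load":  ("mem", 1),
    "store": ("mem", 1),
    "add":   ("fadd", 1),
    "mul":   ("fmul", 1),
    "div":   ("fdiv", 26),   # not pipelined: 26 cycles between initiations
}
UNIT_COUNT = {"mem": 2, "fadd": 1, "fmul": 1, "fdiv": 1}

def res_mii(loop_body, ops=BASE_OPS, units=UNIT_COUNT):
    """Resource bound on the initiation interval (II) of a
    software-pipelined loop: for each unit, the busy cycles one
    iteration needs divided by the number of copies of that unit,
    rounded up. II can be no smaller than the largest such value."""
    busy = Counter()
    for op in loop_body:
        unit, cycles = ops[op]
        busy[unit] += cycles
    return max(math.ceil(total / units[unit]) for unit, total in busy.items())

# x(i) = a(i) / b(i) / c(i) / d(i) / e(i): five loads, four divides, one store
body = ["load"] * 5 + ["div"] * 4 + ["store"]

# Same machine, but with a hypothetical fully pipelined divider.
pipelined = dict(BASE_OPS, div=("fdiv", 1))

print(res_mii(body))             # 104 cycles per iteration
print(res_mii(body, pipelined))  # 4 cycles per iteration
```

This is only a floor: the scheduler still has to find a schedule at
or above it, and loop-carried dependences can push II higher still.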

Both divide and sqrt crop up when one wants to be VERY careful about
numerics, as several really good algorithms rely on them. There are
quicker algorithms for those applications, but they are usually less
numerically robust.
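One familiar example of such a careful algorithm (our choice; the post
does not name any) is the Givens rotation used in QR factorization,
which costs one square root and two divides per rotation. The function
name `givens` is illustrative. Square-root-free "fast Givens" variants
exist, but they need periodic rescaling to stay clear of overflow and
underflow, which is the kind of speed-for-robustness trade described
above.

```python
import math

def givens(a: float, b: float):
    """Return (c, s, r) such that c*a + s*b == r and -s*a + c*b == 0
    (up to rounding). Costs one square root and two divides."""
    if b == 0.0:
        return 1.0, 0.0, a
    r = math.hypot(a, b)  # sqrt(a*a + b*b), computed without overflow
    return a / r, b / r, r

c, s, r = givens(3.0, 4.0)
print(c, s, r)  # c = 0.6, s = 0.8, r = 5.0
```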

The Cydra-5 failed primarily for business reasons, but there were
some suboptimal technical decisions, and I'd place the 26-cycle
initiation interval (II) for divide on that list. Key applications
ran a good 10x slower; compile times went exponentially bad (though
that was fixable). The cost of NOT having done a better job on this
was quite large. I don't know if the designers considered giving up
sqrt for a pipelined divide, but it would have been a very good trade.
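If the sqrt hardware were given up in exchange for a pipelined
divider, square root could still be done in software. A minimal
sketch, assuming Newton's (Heron's) iteration x <- (x + a/x)/2, which
spends one divide per step; the function `newton_sqrt` and its
starting guess are illustrative, not anything the Cydra-5 did.

```python
import math

def newton_sqrt(a: float) -> float:
    """Square root of a >= 0 by Newton's (Heron's) iteration.
    Each step costs one divide, so a fast pipelined divider makes it cheap."""
    if a < 0.0:
        raise ValueError("negative argument")
    if a == 0.0 or math.isinf(a) or math.isnan(a):
        return a
    # a = m * 2**e with 0.5 <= m < 1, so 2**ceil(e/2) >= sqrt(a).
    _, e = math.frexp(a)
    x = math.ldexp(1.0, (e + 1) // 2)
    while True:
        y = 0.5 * (x + a / x)  # Newton step: one divide, one add, one multiply
        if y >= x:             # iterates fall toward sqrt(a); stop once they stop falling
            return x
        x = y
```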

While no one at Ardent will 'fess up, I feel pretty confident in
guessing that their next machine will have hardware divide, or they
will adopt a Cray-style "divide".
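Cray machines had no divide instruction: they formed an approximate
reciprocal in hardware, sharpened it with a Newton-Raphson step, and
multiplied. A minimal sketch of that scheme follows. The linear
starting guess 48/17 - 32/17*m (relative error at most 1/17, about 4
bits) is an assumed stand-in for the hardware reciprocal unit, not
Cray's actual approximation, so four refinement steps are used here.
Each step squares the relative error (1/17, then about 1/289, and so
on), so a better initial approximation needs fewer steps. The result
is close to, but not guaranteed to equal, the correctly rounded
quotient, which is why "divide" is in quotes. Both function names are
hypothetical.

```python
import math

def approx_reciprocal(b: float) -> float:
    """Hypothetical stand-in for a hardware reciprocal-approximation unit.
    Writes b = m * 2**e with 0.5 <= |m| < 1 and uses the linear guess
    48/17 - 32/17*|m|, whose relative error is at most 1/17."""
    m, e = math.frexp(b)
    y0 = 48.0 / 17.0 - 32.0 / 17.0 * abs(m)  # approximates 1/|m|
    return math.copysign(math.ldexp(y0, -e), b)

def cray_style_divide(a: float, b: float, steps: int = 4) -> float:
    """Approximate a / b as a * (1/b) for finite nonzero b,
    refining the reciprocal by Newton-Raphson."""
    if b == 0.0:
        raise ZeroDivisionError("division by zero")
    y = approx_reciprocal(b)
    for _ in range(steps):
        y = y * (2.0 - b * y)  # relative error e becomes e**2
    return a * y
```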


Keith H. Bierman    |*My thoughts are my own. !! kbierman@sun.com
It's Not My Fault   |	MTS --Only my work belongs to Sun* 
I Voted for Bill &  | Advanced Languages/Floating Point Group            
Opus                | "When the going gets Weird .. the Weird turn PRO"