Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.1 6/24/83; site redwood.UUCP
Path: utzoo!watmath!clyde!cbosgd!ihnp4!mhuxn!mhuxj!mhuxr!ulysses!allegra!bellcore!decvax!tektronix!hplabs!hpda!fortune!redwood!rpw3
From: rpw3@redwood.UUCP (Rob Warnock)
Newsgroups: net.arch
Subject: Re: Re: Caltech's Cosmic Cube
Message-ID: <183@redwood.UUCP>
Date: Tue, 5-Mar-85 01:39:04 EST
Article-I.D.: redwood.183
Posted: Tue Mar  5 01:39:04 1985
Date-Received: Sat, 9-Mar-85 08:47:42 EST
References: <333@oakhill.UUCP> <21294@lanl.ARPA> <7268@watrose.UUCP> <166@cmu-cs-wb1.ARPA> <198@cornell.UUCP>
Organization: [Consultant], Foster City, CA
Lines: 111

+---------------
| >What puzzles me is why use point to point channels between processors (and
| >do routing if a connection does not exist)?  Wouldn't it be much simpler to
| >use a dedicated ethernet?
| I assume that most of the communication between processors consists of very
| short packets, i.e., a single floating point number...
+---------------

Just went to a very interesting talk today at NASA/Ames given by Cleve
Moler of Intel Scientific Computers, who make a commercial hypercube
system (announced in net.arch previously). Don't know about Caltech's
applications, but for Intel's, the messages tend to be fairly large vectors,
actually. (Hundreds of floating-point numbers.)
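Because the messages are vectors rather than single numbers, it helps to see what one looks like on the wire. Below is a rough sketch that cuts a vector into Ethernet-sized data fields. It assumes 8-byte IEEE doubles, raw Ethernet frames with no higher-level header, a 1500-byte maximum data field, and the 46-byte minimum. `packetize` is a hypothetical helper and is not taken from any Intel or Caltech software.

```python
import struct

MIN_DATA_BYTES = 46    # shortest legal data field; shorter payloads get zero-padded
MAX_DATA_BYTES = 1500  # 1518-byte maximum frame minus 14 header bytes and 4 CRC bytes
DOUBLE = 8             # bytes per IEEE 754 double (assumed element type)


def packetize(vector: list[float]) -> list[bytes]:
    """Split a vector of doubles into Ethernet-sized data fields.

    Each field holds a whole number of doubles. A short field is zero-padded
    up to the 46-byte minimum, because the data field may not be shorter.
    """
    per_frame = MAX_DATA_BYTES // DOUBLE  # 187 whole doubles fit in 1500 bytes
    payloads = []
    for start in range(0, len(vector), per_frame):
        chunk = vector[start:start + per_frame]
        data = struct.pack(f">{len(chunk)}d", *chunk)  # big-endian doubles
        payloads.append(data.ljust(MIN_DATA_BYTES, b"\x00"))
    return payloads


if __name__ == "__main__":
    for n in (1, 200, 500):
        frames = packetize([1.0] * n)
        print(n, "doubles ->", len(frames), "frame(s), data sizes",
              [len(f) for f in frames])
```

With these assumptions, 200 doubles need two frames and 500 need three, and most of those frames are nearly full. A single number still takes up an entire 46-byte minimum data field.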

+---------------
|                                                  ...  Ethernet is very
| inefficient when it is handling short packets, since it has a lot of overhead
| per packet.  In actual practice, the 10mb bandwidth is approximated only when
| packets are very long (perhaps 10KB, I forget).
+---------------

Well, not 10KB, since the maximum legal packet is 1518 bytes. The minimum
packet size is 46 data bytes (64 bytes including addresses, type, and CRC,
plus 8 more bytes of preamble on the wire), and those can happen every 67.2
microseconds (57.6 for the packet and 9.6 "mikes" of inter-packet delay), or
every 84 byte times. Let's see, that's a minimum efficiency of 46/84, or about
55%, in the absence of collisions. Packets of only 128 data bytes yield 77%;
256 bytes, 87%; and 1024 bytes, 96%.
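Here is a minimal sketch of the same back-to-back arithmetic. It assumes standard 10 Mbit/s framing: an 8-byte preamble, 14 bytes of addresses and type, a 4-byte CRC, and a 9.6 microsecond (12 byte-time) gap. It also counts the preamble, which the 64-byte minimum frame size does not include. The constant and function names are illustrative only.

```python
BIT_TIME_US = 0.1      # one bit at 10 Mbit/s, in microseconds
PREAMBLE = 8           # preamble + start-of-frame delimiter, bytes
HEADER_AND_CRC = 18    # 6 destination + 6 source + 2 type + 4 CRC, bytes
GAP = 12               # 9.6 us inter-packet gap = 96 bit times = 12 byte times
MIN_DATA, MAX_DATA = 46, 1500


def byte_times_per_packet(data_bytes: int) -> int:
    """Channel time used by one back-to-back packet, in byte times."""
    if not MIN_DATA <= data_bytes <= MAX_DATA:
        raise ValueError("data field must be 46..1500 bytes")
    return PREAMBLE + HEADER_AND_CRC + data_bytes + GAP


def efficiency(data_bytes: int) -> float:
    """Fraction of channel time carrying data, assuming no collisions."""
    return data_bytes / byte_times_per_packet(data_bytes)


if __name__ == "__main__":
    for n in (46, 128, 256, 1024, 1500):
        period_us = byte_times_per_packet(n) * 8 * BIT_TIME_US
        print(f"{n:4d} data bytes: one packet per {period_us:6.1f} us, "
              f"efficiency {efficiency(n):.1%}")
```

With 46 data bytes, this works out to 84 byte times (67.2 microseconds) per packet. If you drop the preamble term, the same code gives 76 byte times.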

Even with collisions, channel efficiency stays high for packets over 128
bytes or so, but remember that in the backplane "bus" application here, the
Ethernet channel is VERY short (much less than a bit time), so collisions
are much less frequent.  (Try solving the equations for efficiency in the
original Ethernet paper for C = 10 Mbit/sec and T = 0.1 microsecond.)
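The "original Ethernet paper" is Metcalfe and Boggs (CACM, 1976). As usually quoted, its model gives efficiency E = (P/C) / (P/C + W*T), where W = (1 - A)/A and A = (1 - 1/Q)^(Q - 1). In that model, P is the packet length in bits, C is the channel rate, T is the slot time (roughly the end-to-end round-trip delay), and Q is the number of stations that always have a packet ready. The sketch below solves it for the suggested C and T. Q = 256 is an arbitrary stand-in for "many busy stations", and the standard 51.2 microsecond slot is included only for comparison. The function names are hypothetical.

```python
def acquisition_probability(q: int) -> float:
    """A: probability that exactly one of Q ready stations transmits in a slot."""
    if q < 1:
        raise ValueError("need at least one station")
    return (1.0 - 1.0 / q) ** (q - 1)


def channel_efficiency(packet_bits: float, c_bps: float,
                       t_slot_s: float, q: int) -> float:
    """E = (P/C) / (P/C + W*T), with W the mean number of wasted contention slots."""
    a = acquisition_probability(q)
    w = (1.0 - a) / a               # mean contention slots before a successful packet
    p_over_c = packet_bits / c_bps  # seconds spent actually sending one packet
    return p_over_c / (p_over_c + w * t_slot_s)


if __name__ == "__main__":
    C = 10e6  # 10 Mbit/s
    for t_us in (51.2, 0.1):
        for nbytes in (64, 128, 512, 1024):
            e = channel_efficiency(nbytes * 8, C, t_us * 1e-6, q=256)
            print(f"T = {t_us:5.1f} us, {nbytes:4d}-byte packets: E = {e:.3f}")
```

For 128-byte packets, E comes out at about 0.998 with T = 0.1 microsecond, compared with roughly 0.54 at the full 51.2 microsecond slot. Contention costs W*T per packet, so shrinking T makes collisions nearly free.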

+---------------
| Also, I bet most of the algorithms for the Cosmic Cube are fairly
| synchronous, so all the processors would want to be broadcasting at the
| same time...
+---------------

That didn't seem to be the case for the application problems I saw presented
today -- concurrent, yes; "synchronous", no.

Further, the targets of messages were always specific processors (processes,
actually). Broadcast did not seem to have been implemented (yet).

+---------------
|         ... Ethernet assumes that the net is not very loaded.  A 10%
| loaded Ethernet is very rare.
+---------------

True, a heavily loaded Ethernet is rare in, say, a real-life
office-automation environment. But Ethernet doesn't "assume" that; in
fact, the access algorithm and total throughput are stable even under
extreme overload. (See "Measured Performance of an Ethernet...", Shoch
& Hupp.) The net will not collapse, as long as the rules are followed,
and the throughput will be high if packets are a few hundred bytes or
more. On a "bus" backplane, the throughput will be even higher (the
number of "hosts" is smaller, and the "cable" is shorter).

+---------------
| Also, Ethernet is not that cheap.  Each connection runs a few hundred
| dollars.  A straightforward serial connection would only be a few dollars,
+---------------

Geez... I wonder why the Intel hypercube uses ETHERNET chips... EIGHT (8)
OF THEM!!! ;-} ;-}  And they use them for mere point-to-point links!

Seriously, you should look at current chip prices. In "backplane" applications
you don't need a full transceiver per connection; you can interconnect at
the "transceiver cable" level (or even at TTL levels, if you supply the clock).

+---------------
|      ...  A straightforward serial connection would only be a few dollars,
| and a parallel port is even faster and almost as cheap (wiring costs, you
| know).
+---------------

Sorry, most of the cost is NOT in the serialization, but in the bus interface,
buffer handling, and line driving/receiving -- all things which a parallel
interface also has to do. And the parallel interface doesn't have the noise
immunity (at least not a cheap TTL one), while the Ethernet transceiver-cable
driver/receivers cheerfully drive 50 meters over a shielded twisted pair
(differential shifted-ECL levels).

+---------------
|      ...As long as the interconnection pattern is regular and there are
| not too many processors (too many is more than the number that fit in one
| or two cabinets) the Cosmic Cube interconnection scheme should be cheap
| and simple.
|
| Ralph Johnson
+---------------

I'd like to see you interconnect 128 processors in a hypercube using 50-pin
ribbon cable! ;-} The interconnection pattern is regular, but it's not
necessarily convenient! (Remember, each processor is a "corner", and as you
"linearize" the Cube by putting it in a rack, the interconnects get to be a
bit of a rat's nest.)
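The hypercube connection rule itself is simple. With 2^n nodes numbered in binary, node i links to every node whose number differs from i in exactly one bit. The sketch below counts the links for 128 nodes. Assuming node i simply sits in rack slot i, it also shows how many slots each dimension's cable has to span. The `route` function uses dimension-order ("e-cube") routing, which is one common way to forward a message between nodes that are not directly connected. It is not necessarily what Caltech or Intel used, and all function names are hypothetical.

```python
def neighbors(node: int, dim: int) -> list[int]:
    """Nodes one link away: flip each of the dim address bits in turn."""
    return [node ^ (1 << d) for d in range(dim)]


def link_count(dim: int) -> int:
    """Each of the 2**dim nodes has dim links, and every link joins two nodes."""
    return (1 << dim) * dim // 2


def cable_span(dim: int) -> list[int]:
    """Slots crossed by the dimension-d cable if node i sits in rack slot i."""
    return [1 << d for d in range(dim)]


def route(src: int, dst: int, dim: int) -> list[int]:
    """Dimension-order (e-cube) routing: fix differing address bits, lowest first."""
    path = [src]
    node = src
    for d in range(dim):
        if (node ^ dst) & (1 << d):
            node ^= 1 << d
            path.append(node)
    return path


if __name__ == "__main__":
    DIM = 7  # 2**7 = 128 processors
    print("links per node:", len(neighbors(0, DIM)))
    print("total links:", link_count(DIM))
    print("cable span by dimension (slots):", cable_span(DIM))
    print("route 0 -> 127:", route(0, 127, DIM))
```

For 128 processors this gives 448 separate links. In a straight rack layout, the highest-dimension cables each reach 64 slots away, which is the rat's nest in question.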

Disclaimer: I am not selling the Intel method; I have some concerns
about having that many high-speed point-to-point links on a memory bus.
(I am an advocate of quasi-bus serial backplanes, rather than point-to-point).
However, Intel's use of Ethernet chips is quite reasonable, given the
connection pattern they chose, and is MUCH preferred to 8 parallel interfaces!


Rob Warnock
Systems Architecture Consultant

UUCP:	{ihnp4,ucbvax!dual}!fortune!redwood!rpw3
DDD:	(415)572-2607
USPS:	510 Trinidad Lane, Foster City, CA  94404