Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/18/84; site watrose.UUCP
Path: utzoo!watmath!watrose!cdshaw
From: cdshaw@watrose.UUCP (Chris Shaw)
Newsgroups: net.arch
Subject: Re: Cube designs vs. x,y,z bus
Message-ID: <7327@watrose.UUCP>
Date: Fri, 1-Mar-85 17:44:04 EST
Article-I.D.: watrose.7327
Posted: Fri Mar  1 17:44:04 1985
Date-Received: Sat, 2-Mar-85 02:53:37 EST
References: <48@pbear.UUCP> <268@oliveb.UUCP> <7306@watrose.UUCP> <5056@fortune.UUCP>
Reply-To: cdshaw@watrose.UUCP (Chris Shaw)
Organization: U of Waterloo, Ontario
Lines: 48


>   Ah, surely you jest.  Given that each node on the psuedo cube has 
> its own associated memory, the vast majority of that processors time
> will be spent without touching the 'bus'.  And in most cases, the only
> times the bus is used is for infrequent data movement (lets say for 
> mmu misses) and for interprocessor communication.

No, I don't jest, and here's why.
The purpose of the cube is NOT to have a machine which dozens of people
can log on to and have their own 286-based micro. The motivation for the
cube is to get a machine in which all of the processors work together
on the same VERY large problem. In other words, the cube is a PARALLEL
processing machine, not just a machine with lots of processors. As a 
previous reply to your posting indicated, you can't have
too much parallelism in a machine of this ilk.


>   As long as there are not too many processors on any one bus or 
>ethernet link, the number of times where you would have to wait for the
>bus would be minimal. The trade off as to how many would be allowed
>is part of the architects job, to analyze the usage of the machine, the
>tasks it must do, the performance requirements and the cost.

There are two things I see wrong with this suggestion:

1) It seems to imply that there would be several versions of a machine,
say a linear algebra engine, a database engine, etc. This sounds kind of
hokey to me.
2) Your estimate of the communications load is, I think, far too small. Caltech
has a cube running on which they solve the 7-body problem. This problem
requires that you calculate 21 pairwise interactions (I don't know for sure).
The structure of the solution was to have 7 body processes and one i/o process
on a 4-node square (2 processes per node). Each body process sent its position
to 3 other processes, and received similar data from the other 3. With the
info for each body, calculations were done, and the results passed on to the
remaining processes. (See Jan '85 CACM for the real description.) The point is,
the solution to this problem is highly communication based. Many applications
where matrix-bashing is needed are also likely to be dependent on getting
partial results sent to them from other processes within the cube.
Basically, as much time could conceivably be spent on sending and receiving
data as is spent on doing actual calculations.
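
As a rough check on the numbers in point 2, here is a minimal Python sketch.
It counts the distinct pairs among 7 bodies, then tries one possible way of
choosing the "3 other processes" each body sends to. The ring assignment
(body i sends to bodies i+1, i+2, i+3, modulo 7) is purely my assumption for
illustration and is not taken from the Caltech description; the names
ring_targets, N_BODIES and SENDS_PER_BODY are likewise made up.

```python
from itertools import combinations

N_BODIES = 7          # body processes, as in the post
SENDS_PER_BODY = 3    # each body sends its position to 3 others

def ring_targets(i, n=N_BODIES, k=SENDS_PER_BODY):
    """Hypothetical assignment: body i sends to the next k bodies on a ring."""
    return [(i + d) % n for d in range(1, k + 1)]

# Every unordered pair of bodies that must interact.
all_pairs = {frozenset(p) for p in combinations(range(N_BODIES), 2)}

# The pair "served" by each position message under the assumed ring scheme.
covered = [frozenset((i, j)) for i in range(N_BODIES) for j in ring_targets(i)]

print("pairs to compute:", len(all_pairs))
print("position messages:", len(covered))
print("each pair covered exactly once:",
      len(covered) == len(set(covered)) and set(covered) == all_pairs)
```

Under that assumption, the 7 x 3 position messages line up one-for-one with
the pairwise interactions, so even before the results are passed back, the
message count is of the same order as the number of interaction calculations.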


>-Jim Wall
>...amd!fortune!wall

Chris Shaw
University of Waterloo