Path: utzoo!utgpu!watmath!att!tut.cis.ohio-state.edu!gem.mps.ohio-state.edu!ginosko!husc6!think!barmar
From: barmar@think.COM (Barry Margolin)
Newsgroups: comp.arch
Subject: MIMD on Connection Machine
Message-ID: <26356@news.Think.COM>
Date: 9 Aug 89 19:40:58 GMT
References: <107900005@iuvax> <8953@june.cs.washington.edu> <2238@jhunix.HCF.JHU.EDU> <5796@pt.cs.cmu.edu>
Sender: news@Think.COM
Reply-To: barmar@kulla.UUCP (Barry Margolin)
Organization: Thinking Machines Corporation, Cambridge MA, USA
Lines: 49

In article <5796@pt.cs.cmu.edu> jps@cat.cmu.edu (James Salsman) writes:
>In article <2238@jhunix.HCF.JHU.EDU> ins_atge@jhunix.UUCP (Thomas G Edwards) writes:
>> And just because every processor gets the same instruction feed, one must not
>> think that every processor is "doing the same thing." Each CM processor
>> can hold an index to an array located in that processor,
>> so with the right software, the CM could become a MIMD machine.  
>
>Please do not attempt this.  I have tried for months to
>optimize a MIMD-emulator on the CM-2 with very disappointing
>results.  It might be a fun class project for a bunch of
>novice hackers learning parallelism.

Right.  I assume Thomas was thinking of keeping a macro-instruction
stream in the CM processor memory and stepping through that using the
indirect addressing feature.  You could then implement a parallel
simulator to interpret the macro-instruction stream.  However, only a
small part of that interpreter can actually run fully parallel.  Once
the macro-instruction is decoded and it's time to actually execue it,
you essentially have a big CASE (or, in *Lisp syntax, *COND)
statement.  Each branch will be executed in parallel by all the
processors that pass its associated test, but you still have to run
through all the tests sequentially.  The code might look something
like:

(*cond ((load-op-p opcode) (do-load operands))
       ((store-op-p opcode) (do-store operands))
       ((add-op-p opcode) (do-add operands))
       ...)

This will first do all the simultaneous LOAD operations in parallel,
then all the simultaneous STOREs, then all the simultaneous ADDs, etc.
Therefore, the speed of this MIMD simulator will be O(n), where n is
the number of different opcodes.  I can think of some ways to adjust
the above tests dynamically to change it to O(m), where m is the
average number of different opcodes being executed simultaneously, but
that's still not very good.

It'll never be a very good MIMD machine.  The individual processors
aren't very powerful.  It might be reasonable for research on
programming techniques for use on kiloprocessor MIMD machines until
the real things come along (it's probably better than trying to
simulate them on uniprocessors, since the performance there is O(p),
where p is tne number of processors being simulated).

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar