Path: utzoo!attcan!uunet!lll-winken!lll-lcc!lll-tis!helios.ee.lbl.gov!pasteur!ucbvax!decwrl!labrea!rutgers!gatech!hubcap!ian
From: ian@esl.UUCP (Ian Kaplan)
Newsgroups: comp.parallel
Subject: Re: parallel numerical algorithms
Message-ID: <1826@hubcap.UUCP>
Date: 6 Jun 88 13:00:17 GMT
Sender: fpst@hubcap.UUCP
Lines: 139
Approved: parallel@hubcap.clemson.edu

In article <1776@hubcap.UUCP> gerald@umb.umb.edu (Gerald Ostheimer) writes
(in response to a note from George Nelan):
>
>You should take a look at the work on the tagged-token data flow machine of
>MIT's Computation Structures Group under Arvind. (Their work, for some reason
>unbeknownst to me, did not yet receive any attention in this newsgroup.)

 [ much deleted ]

>
> A (possibly) surprising problem that turns up is that there can be too
> much parallelism in a program, which can overflow the pipelines and choke
> the machine.

 [ text deleted ]

>
> There are of course more problems.
> I for one never quite understood how the program is distributed over the
> CPU's.  This must happen dynamically (when calling functions or
> entering loops, for example), if parallelism is to be exploited.

To make up for the deficit of data flow discussion in this newsgroup, I will
submit this overly long note discussing some of Arvind's work.  Data flow is
complex, and I cannot provide an introduction in the space of a note that I
have time to write and that you are likely to read.  As a result, I will
assume some familiarity with data flow on the part of the reader.  If there
is enough interest, I might be persuaded to put together a brief
bibliography.

All programs must have a method of handling global state, which is usually
stored in global data structures.  As Mr. Ostheimer points out, Arvind's
I-structures handle this well.  Arvind's group has also looked at reducing
the overhead normally associated with handling these data structures.  Some
of these methods are algorithmic and others are architectural.  For example,
Steve Heller in Arvind's lab did some work on a hardware implementation of
an I-structure store, which is a sort of "smart memory".

Arvind's model of data flow is referred to as dynamic data flow (in contrast
to Prof. Dennis' model of static data flow).  In dynamic data flow, several
instances of a loop can be active at the same time.  As each loop instance
is instantiated, a loop field in the data flow tag is incremented.  This
allows separate data flow matching on the tokens of each loop instance.
While this technique generates a lot of parallelism for loops, as Mr.
Ostheimer points out, it can also generate so much parallelism that the data
flow matching store overflows.  One way to solve this problem is to allow a
loop to expand only so far; for example, a given loop would only be allowed
to have five instantiations.  This would be decided in advance by the
compiler.  How the compiler decides this, I have not seen explained.
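To make the tag matching a bit more concrete, here is a small sketch in C.
It is only my own illustration of the idea, under simplifying assumptions
(two-operand instructions, a tiny fixed-size table, integer values); none of
the names or sizes come from Arvind's actual hardware.  The point is simply
that the iteration field in the tag keeps tokens from different loop
instances from matching each other:

    /* A toy matching store: a two-operand instruction fires only when both
     * operands carrying the same (instruction, iteration) tag have arrived. */

    #include <stdio.h>

    #define MAX_KEYS 16

    struct slot {
        int in_use;
        int instr, iter;              /* the tag: instruction id, iteration number */
        int have_left, have_right;
        int left, right;
    };

    static struct slot store[MAX_KEYS];

    /* Deposit one operand.  Returns 1 and fills *a, *b when the partner
     * operand is already waiting, i.e. the instruction may now fire. */
    int token_arrives(int instr, int iter, int port, int value, int *a, int *b)
    {
        int i, free_slot = -1;

        for (i = 0; i < MAX_KEYS; i++) {
            if (store[i].in_use && store[i].instr == instr && store[i].iter == iter) {
                if (port == 0) { store[i].left = value;  store[i].have_left = 1; }
                else           { store[i].right = value; store[i].have_right = 1; }
                if (store[i].have_left && store[i].have_right) {
                    *a = store[i].left;
                    *b = store[i].right;
                    store[i].in_use = 0;          /* matched: slot is freed */
                    return 1;
                }
                return 0;
            }
            if (!store[i].in_use && free_slot < 0)
                free_slot = i;
        }
        /* First operand for this tag: park it and wait for its partner.
         * (A real store would also have to handle running out of slots.) */
        store[free_slot].in_use = 1;
        store[free_slot].instr = instr;
        store[free_slot].iter = iter;
        store[free_slot].have_left  = (port == 0);
        store[free_slot].have_right = (port == 1);
        if (port == 0) store[free_slot].left = value;
        else           store[free_slot].right = value;
        return 0;
    }

    int main(void)
    {
        int a, b;

        /* Two iterations of the same "+" node are in flight at once; the
         * iteration field in the tag keeps their operands apart. */
        token_arrives(1, 0, 0, 3,  &a, &b);
        token_arrives(1, 1, 0, 10, &a, &b);
        if (token_arrives(1, 0, 1, 4,  &a, &b)) printf("iteration 0 fires: %d\n", a + b);
        if (token_arrives(1, 1, 1, 20, &a, &b)) printf("iteration 1 fires: %d\n", a + b);
        return 0;
    }

Every parked operand occupies a slot in the store, which is exactly why
letting loops unfold without limit can overflow the matching store, and why
the compiler-imposed bound on loop instantiations matters.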
Arvind's implementation of dynamic data flow assumes that as constructs like
loops parallelize at run time, new loop instances run on additional
processors.  This means that the code for the loop must be resident on those
processors.  One easy way to handle this was used by the Cal. Tech. Cosmic
Cube group: a complete copy of the program is loaded onto every processor.
Of course, this is a very inefficient way to use memory, especially in a
small grain parallel computer.  Like Mr. Ostheimer, I have never seen an
explanation of how code is allocated to processors in a dynamic data flow
system.  On Arvind's data flow machine I don't think that this assignment is
dynamic (it is on the Manchester data flow machine).

The data flow model is an asynchronous model.  A data producer sends data to
a data consumer whenever it wants, and the consumer uses the data when it is
ready.  There are no rendezvous in data flow.  This has the advantage of
allowing pipelining, which is an important source of parallelism.  It has
the disadvantage of consuming large, potentially unbounded, amounts of
buffer space.  If a data producer (perhaps a data flow sub-graph) produces
data faster than a consumer can use it, the consumer's buffers will
overflow.

Arvind wrote two papers on "demand driven data flow", which allows the
consumer to tell the producer that it is ready for more data.  I do not know
whether the current Id compiler being used by Arvind's group uses demand
driven data flow or not.  Without something like demand driven data flow (or
reduction), an asynchronous data flow machine has the potential of consuming
all of its buffer memory and crashing.
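One way to picture the demand driven idea is as a credit scheme: the
consumer grants the producer a fixed number of credits, and the producer may
only emit a token while it holds one.  The little C sketch below is purely
my own illustration of that principle (the names, the credit count, and the
single-channel framing are all mine), not the scheme from Arvind's papers:

    /* A toy "demand driven" channel: the producer may only send while it
     * holds a credit, and each token the consumer removes returns a credit. */

    #include <stdio.h>

    #define CREDITS 2

    struct channel {
        int credits;                  /* tokens the producer may still send */
        int buf[CREDITS];
        int head, count;
    };

    int send(struct channel *c, int value)
    {
        if (c->credits == 0)
            return 0;                 /* no demand outstanding: producer must wait */
        c->credits--;
        c->buf[(c->head + c->count) % CREDITS] = value;
        c->count++;
        return 1;
    }

    int receive(struct channel *c, int *value)
    {
        if (c->count == 0)
            return 0;                 /* nothing to consume yet */
        *value = c->buf[c->head];
        c->head = (c->head + 1) % CREDITS;
        c->count--;
        c->credits++;                 /* consuming a token issues a new demand */
        return 1;
    }

    int main(void)
    {
        struct channel c = { CREDITS, {0}, 0, 0 };
        int v;

        printf("%d ", send(&c, 1));   /* 1: a credit was available */
        printf("%d ", send(&c, 2));   /* 1 */
        printf("%d\n", send(&c, 3));  /* 0: both credits are in use */
        receive(&c, &v);              /* consuming one token frees a credit */
        printf("%d\n", send(&c, 3));  /* 1 */
        return 0;
    }

With this discipline the buffer can never hold more tokens than the consumer
has granted credits for, so the unbounded buffering disappears, at the cost
of extra acknowledgment traffic and some lost pipelining when the credit
count is small.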
Arvind and Dennis propose data flow supercomputers.  Supercomputers are used
primarily for numeric computation (as opposed to symbolic computation),
which means that numeric programs must run well on the systems being
proposed.  As many people have noted, numeric programs spend most of their
time in loops.  While existing programs might be better restated for a data
flow machine (in Id, for example), they will still have a heavy loop
content.  Arvind's data flow model takes advantage of the parallelism in
loops and produces a great deal of parallelism while a loop is executing.

Many loops must synchronize, either during loop execution (if the loop is
pipelining) or at the end of the loop.  If one looks at a performance graph
for a numeric program executing on a data flow machine, there will be large
peaks, with a lot of parallelism, and troughs, where synchronization must
take place.  The synchronization is taking place on a relatively small
number of processors, so at those points the entire computation is limited
by the speed of the individual processors.  If the computation is highly
parallel, these synchronization points will come to dominate the computation
time.  The data flow model assumes that many (e.g., 10^4 to 10^6) relatively
inexpensive, relatively low performance processors are used.  This is fine
for the "peaks" of parallelism, but these slow processors become the
limiting factor during synchronization.  This is one of the reasons that
Prof. Kuck at the Univ. of Illinois proposes a few very powerful processors
for his Cedar parallel processor: since these processors are fast, they will
execute the synchronization sections rapidly.

Conclusion

Arvind's data flow model has a reasonable solution for handling global state
(global data structures).  The dynamic data flow model has the potential of
producing so much loop parallelism that the data flow matching stores
overflow; this can be managed by choking loop parallelism.  Arvind also has
a solution for the producer-consumer problem, but it is very theoretical and
needs work before it can be implemented.  Using a combination of reduction
and data flow may provide an elegant solution, but I have not seen Arvind
propose this (although there are people in his lab who know reduction well).
Also, it is not clear, at least to me, that small grain data flow is a good
model for numeric computation.  Symbolic computation might be a better
application, since less synchronization may be required.

Finally, I have heard a lot of grumbling at conferences along the lines of
"Why is there no real hardware implementation of a data flow machine?"  One
person even suggested that no machine had been built "because Arvind knew
that it would not work" (a view I don't subscribe to).  Although some of the
criticism is unfair, it remains true that there is no "iron".  Compared to a
couple of years ago, data flow work has lost momentum, if the ACM Computer
Architecture Conference proceedings and the International Conference on
Parallel Processing proceedings are any indication.  Of the data flow papers
I saw presented at the last ICPP, none addressed the hard problems in data
flow.  All in all, data flow in America does not appear as lively as it once
was.  However, the Japanese have just finished a fairly large data flow
parallel processor named Sigma, so there is still work going on.  I have not
seen any information on the latest Sigma work.

Well, I hope that this generates some light, in addition to heat.  Sorry it
was so long.  Any views expressed here are personal, and do not necessarily
reflect those of ESL or my department.

Ian L. Kaplan
ESL Inc.
Advanced Technology Systems

ian@esl.COM
ames!esl!ian