hehe, the claim is for parallel programming: you'd be dataflow oriented




Posted by: AA on 2005-9-29, 23:15:19:

In reply to: CORBA is more close to concurrent programming than, posted by steven on 2005-9-29, 12:04:51:

Control-flow-oriented programming can't make an application scale well, because people can't think through all the controls and interactions once the system scales up.

It's much easier with dataflow-oriented parallel programming, because then you care only about where the data comes from and where it goes (the hypothesis being that the data flows themselves can be reasoned about clearly). A very good analogy is the history book. There are basically two ways to write the history of a period: one is to describe, in time order, all the events happening at each point in time; the other is to describe it nation by nation and person by person, separately. The former approach is like pthread programming, and the latter is dataflow programming: the person is the data, flowing through all the historical events he was involved in. In the threading model, the big picture of an event is shaped in the programmer's (history writer's) mind before he describes it. In the dataflow model, the big picture need not be thought out clearly at all; instead, it is composed from the small pictures of the individual persons.
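To make the contrast concrete, here is a minimal sketch (my own toy example, not anyone's real system) in C: a tiny pipeline of pthreads connected by pipes. The worker stage knows only which descriptor its data comes from and which it goes to; no single routine holds the big picture of the whole computation:

/* A toy dataflow pipeline: each stage sees only its input
   and output pipe, like a biography that sees only one person. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *square_stage(void *arg) {
    int *fds = arg;               /* fds[0] = input, fds[1] = output */
    int x;
    while (read(fds[0], &x, sizeof x) == sizeof x) {
        x = x * x;                /* this stage's whole "biography" */
        write(fds[1], &x, sizeof x);
    }
    close(fds[1]);                /* propagate end-of-stream downstream */
    return NULL;
}

int main(void) {
    int in[2], out[2];            /* in: source->stage, out: stage->sink */
    pipe(in);
    pipe(out);

    int fds[2] = { in[0], out[1] };
    pthread_t stage;
    pthread_create(&stage, NULL, square_stage, fds);

    for (int i = 1; i <= 5; i++)  /* source: feed the data in */
        write(in[1], &i, sizeof i);
    close(in[1]);                 /* end of stream */

    int y;
    while (read(out[0], &y, sizeof y) == sizeof y)  /* sink: drain results */
        printf("%d\n", y);

    pthread_join(stage, NULL);
    return 0;
}

Adding more stages means adding more small, self-contained "biographies"; no existing stage needs to change, which is exactly the scalability claim above.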

If a reader of the dataflow history book wants a clear big picture of an event, he needs to go through all the biographies and compile the related pieces together. That looks tough, but it is the price of easy writing. Just imagine the book being written by a large group of writers, each responsible for ten biographies: no single writer needs to understand the big historical event before writing his part. The thicker the book, the better this model works. This is what "scalability" means in parallel programming.

The corresponding cost of losing the big picture of the historical event in dataflow parallel programming is lost data locality and bloated, redundant code.
Locality: the context of an event is common to (shared by) all the persons in it, but the biography writers have to describe it again and again for every person the event happens to.
Redundancy: when two persons share a scene and have a conversation, it's much easier to describe the situation once, as a chat; in the dataflow model you may need to describe the conversation twice, because each biography carries its own copy (see the sketch below).
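A toy illustration of that duplication (entirely my own sketch, with hypothetical names): the shared scene is copied into every per-person message, so the same context is shipped and re-described once per biography rather than once per event:

#include <stdio.h>

struct Msg {
    char scene[64];   /* duplicated context: every message carries a copy */
    char line[64];    /* this person's own part of the conversation */
};

static void biography_task(const char *who, const struct Msg *m) {
    /* each biography re-describes the same scene before its own line */
    printf("[%s] scene: %s\n[%s] says:  %s\n", who, m->scene, who, m->line);
}

int main(void) {
    const char *scene = "a meeting room, late at night"; /* shared context */

    struct Msg a, b;               /* two messages, two copies of one scene */
    snprintf(a.scene, sizeof a.scene, "%s", scene);
    snprintf(b.scene, sizeof b.scene, "%s", scene);
    snprintf(a.line, sizeof a.line, "Shall we begin?");
    snprintf(b.line, sizeof b.line, "Yes, let's.");

    biography_task("PersonA", &a); /* the one conversation is described... */
    biography_task("PersonB", &b); /* ...twice, once per biography */
    return 0;
}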

This cost is unacceptable if you care about performance: losing locality means the tasks have to move the same data around back and forth, and code bloat means the tasks execute redundant code. That's where the compiler kicks in. A book compiler could compose the big picture of an event by reading through all the biographies; likewise a code compiler could bring back the locality and remove the redundant code. But that requires the compiler to be smart enough, and as far as I know this is an area without enough exploration. This is the original question I raised, and I am hoping to do some work on it if possible.

In my dictionary, the threading model is the "annalistic style" (编年体), and the dataflow model is the "biographical style" (纪传体).

Some additional words: MPI is dataflow programming in some sense. It has been proven more scalable than DSM, but it has its own problem compared to the shared-memory model: sometimes you don't want to specify explicitly what data to access and from where, and in some cases you simply can't. For example, to program a matrix multiplication, you may want to just describe the algorithm rather than distribute the data across the system yourself. This is a known weak point of dataflow programming: where the shared-memory algorithm (the big picture) is already well defined, we certainly don't want to dissect it into pieces.
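To show concretely what that dissection costs, here is a hedged sketch in C with MPI (the matrix size, the by-rows partitioning, and all the names are my own illustration). The Scatter/Bcast/Gather calls are exactly the explicit data distribution that the shared-memory triple loop never had to mention:

#include <mpi.h>
#include <stdlib.h>

#define N 512   /* assume N divides evenly among the ranks */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                      /* my slice of A and C */
    double *A = NULL, *C = NULL;
    double *B     = malloc(N * N    * sizeof *B);
    double *Apart = malloc(rows * N * sizeof *Apart);
    double *Cpart = malloc(rows * N * sizeof *Cpart);

    if (rank == 0) {                          /* root owns the full matrices */
        A = malloc(N * N * sizeof *A);
        C = malloc(N * N * sizeof *C);
        for (int i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; }
    }

    /* explicit data distribution -- absent in the shared-memory version */
    MPI_Scatter(A, rows * N, MPI_DOUBLE,
                Apart, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* the algorithm itself: the same triple loop as on shared memory */
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < N; k++)
                s += Apart[i * N + k] * B[k * N + j];
            Cpart[i * N + j] = s;
        }

    /* explicit collection of the distributed result */
    MPI_Gather(Cpart, rows * N, MPI_DOUBLE,
               C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    free(Apart); free(Cpart); free(B);
    if (rank == 0) { free(A); free(C); }
    MPI_Finalize();
    return 0;
}

Everything outside the triple loop is distribution plumbing; that plumbing is what a smart enough compiler would ideally generate for us.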


