[Pydra] Return values of ParallelTask, etc.

Yin QIU allenchue at gmail.com
Fri May 29 17:43:08 UTC 2009


Hi,

It looks like we currently cannot obtain the return values of the
work units derived from a ParallelTask. The primary worker just
slices the total work and delegates the resulting work units to various
workers. The delegation is non-blocking and has no callbacks (as in
Worker.request_worker). So I'd like to ask: what can we do if we want to
collect the results (and perhaps do some post-processing)? Or how is
ParallelTask supposed to work in real applications? For example, if
we want to do DNA sequencing with Pydra, how do we combine the analyzed
results computed at multiple workers? One approach I can think of is
to let the task define the combining strategy. A simple strategy might
be appending results to a central data store. A similar problem also
exists in a parallel TaskContainer.
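To make the "task-defined combining strategy" idea concrete, something like
the following sketch is what I have in mind. All the names here
(ReduceStrategy, collect, combine) are made up for illustration; none of
them exist in Pydra today:

```python
# Hypothetical sketch: a task-defined combining strategy for work-unit
# results. The task supplies a reduce function; the framework would call
# collect() whenever a worker reports a finished unit, then combine() once
# all units are done. Not part of Pydra's current API.

class ReduceStrategy:
    """Collects per-worker results and folds them with a user function."""

    def __init__(self, reduce_fn, initial):
        self.reduce_fn = reduce_fn
        self.results = []        # the "central data store": appended results
        self.initial = initial

    def collect(self, worker_id, result):
        # Would be invoked when a worker finishes a work unit.
        self.results.append((worker_id, result))

    def combine(self):
        # Post-processing step, run after all work units complete.
        acc = self.initial
        for _, value in self.results:
            acc = self.reduce_fn(acc, value)
        return acc

# Usage: summing partial counts reported by three workers.
strategy = ReduceStrategy(lambda a, b: a + b, initial=0)
for wid, partial in [(1, 10), (2, 32), (3, 8)]:
    strategy.collect(wid, partial)
print(strategy.combine())  # 50
```

The same shape would work for a parallel TaskContainer, since it only
assumes a per-unit completion hook and a final reduction step.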

I'm asking these questions partly because a scheduler needs to know when
a worker has finished its work unit so that it can add the worker back to
the idle pool or assign it another work unit. What we proposed earlier was
to use a signal-based or message-oriented mechanism. I just want to
confirm that this won't conflict with the existing design.
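For concreteness, the message-oriented mechanism I mean is roughly the
following. Scheduler, worker_finished, idle_pool, and pending are all
illustrative names, not existing Pydra classes:

```python
# Hypothetical sketch: on a "work finished" message, the scheduler either
# hands the worker the next pending unit or returns it to the idle pool.

class Scheduler:
    def __init__(self):
        self.idle_pool = []      # workers with nothing to do
        self.pending = []        # work units waiting for a worker

    def worker_finished(self, worker_id, result):
        """Handler for a completion signal/message from a worker."""
        # `result` would be routed to the task's combining strategy here.
        if self.pending:
            unit = self.pending.pop(0)
            # ...dispatch `unit` to worker_id (omitted)...
            return ("assigned", unit)
        self.idle_pool.append(worker_id)
        return ("idle", None)

# Usage: one pending unit, then nothing left.
sched = Scheduler()
sched.pending = ["unit-b"]
print(sched.worker_finished("w1", 42))   # ('assigned', 'unit-b')
print(sched.worker_finished("w1", 43))   # ('idle', None)
```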

Another tiny question. I noticed the following lines in
ParallelTask.more_work():

if len(self._data_in_progress) or len(self._data):
    print '[debug] Paralleltask - still has more work: %s : %s' % \
        (self._data, self._data_in_progress)
    reactor.callLater(5, self.more_work)

I think this would cause an infinite loop: more_work() only reschedules
itself and never drains self._data. Was the third line supposed to be
something like "reactor.callLater(5, self._assign_work)"?
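To show what I mean, here is a minimal simulation of that polling loop
with the suggested fix applied. reactor.callLater is replaced by a plain
list of scheduled calls so the sketch runs without Twisted, and
ParallelTaskSketch is a stand-in, not the real ParallelTask:

```python
# Simulates the corrected polling loop: more_work() reschedules
# _assign_work (which actually drains _data) instead of itself,
# so the loop terminates once all work units are handed out and done.

scheduled = []

def call_later(_delay, fn):
    # Stand-in for reactor.callLater: record the call instead of timing it.
    scheduled.append(fn)

class ParallelTaskSketch:
    def __init__(self, data):
        self._data = list(data)
        self._data_in_progress = []

    def more_work(self):
        if self._data_in_progress or self._data:
            # Rescheduling more_work() here would spin forever, since
            # nothing in it ever drains _data; schedule the method that
            # actually hands out work units instead.
            call_later(5, self._assign_work)
            return True
        return False

    def _assign_work(self):
        if self._data:
            self._data_in_progress.append(self._data.pop(0))

task = ParallelTaskSketch(["u1"])
task.more_work()                  # schedules _assign_work
scheduled.pop(0)()                # simulate the reactor firing it
task._data_in_progress.clear()    # pretend the unit completed
print(task.more_work())           # False: nothing left, loop ends
```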

-- 
Yin Qiu
Nanjing University, China

