[Pydra] Return values of ParallelTask, etc.
peter at osuosl.org
Fri May 29 18:02:47 UTC 2009
Yin QIU wrote:
> It looks like that currently we cannot obtain the return values of
> those work units derived from a ParallelTask. The primary worker just
> slices the total work and delegates resultant work units to various
> workers. The delegation is non-blocking and has no callbacks (as in
> Worker.request_worker). So I want to ask what we can do if we want to
> collect the results (and perhaps do some post-processing)? Or how are
> ParallelTasks supposed to work in real applications? For example, if
> we want to do DNA sequencing with Pydra, how do we combine analyzed
> results computed at multiple workers? One approach I can think of is
> to let the task define the combining strategy. A simple strategy might
> be appending results to a central data store. A similar problem also
> exists in a parallel TaskContainer.
> I'm asking this partly because a scheduler needs to know when a
> worker has finished its work unit so that it can add it to the idle
> pool or assign it another work unit. What we proposed earlier was to
> use a signal-based or message-oriented mechanism. I just want to
> confirm that this won't conflict with existing design.
> Another tiny question. I noticed the following lines in
>     if len(self._data_in_progress) or len(self._data):
>         print '[debug] Paralleltask - still has more work: %s : %s' % (self._data, self._data_in_progress)
>         reactor.callLater(5, self.more_work)
> I think this would cause an infinite loop. Was the third line supposed
> to be something like "reactor.callLater(5, self._assign_work)"?
When a task finishes, here is the flow of calls:
1) This is how the master is notified that the worker finished the task.
Task -> Worker.work_complete(...) -> Master.send_results(...)
2) At this point the master has the results. It will handle anything
internally such as scheduling. If it was a subtask then the Main-Worker
Master -> Worker.receive_results(..) ->
When the workunit is assigned it can't be a blocking call. It returns
right away, with at most a confirmation that the workunit was assigned.
The workunit may take a long time to finish, so completion is reported
separately through the notification calls above. There is fault tolerance
in both the Master and Worker to ensure that if either goes away the
workunit is never dropped.
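
To make the flow above concrete, here is a rough sketch in plain Python, not the actual Pydra code. Only work_complete, send_results and receive_results are names from this thread; assign_workunit, run_pending, the attributes and the signatures are hypothetical. It also assumes that the master forwards subtask results to the main worker, which is my reading of step 2, and it leaves out the network layer and fault tolerance entirely.

```python
# Sketch of non-blocking workunit assignment with a separate completion
# notification. Everything except the three method names taken from the
# thread (work_complete, send_results, receive_results) is hypothetical.

class Master:
    def __init__(self):
        self.main_worker = None   # worker running the ParallelTask itself
        self.idle_workers = []    # scheduling state kept by the master

    def send_results(self, worker, workunit_id, results):
        # The master now has the results and can do its own bookkeeping,
        # for example returning the worker to the idle pool.
        self.idle_workers.append(worker)
        # Assumption: results of a subtask are forwarded to the main worker.
        if self.main_worker is not None:
            self.main_worker.receive_results(workunit_id, results)


class Worker:
    def __init__(self, master):
        self.master = master
        self.pending = []     # workunits accepted but not yet run
        self.collected = {}   # results gathered on the main worker

    def assign_workunit(self, workunit_id, func, data):
        # Hypothetical: queue the work and return right away with only
        # a confirmation that the workunit was accepted.
        self.pending.append((workunit_id, func, data))
        return True

    def run_pending(self):
        # Hypothetical stand-in for the event loop running queued work later.
        while self.pending:
            workunit_id, func, data = self.pending.pop(0)
            self.work_complete(workunit_id, func(data))

    def work_complete(self, workunit_id, results):
        # Completion is reported to the master as a separate notification.
        self.master.send_results(self, workunit_id, results)

    def receive_results(self, workunit_id, results):
        # The main worker collects results here for later combining.
        self.collected[workunit_id] = results


master = Master()
main = Worker(master)
master.main_worker = main
helper = Worker(master)

confirmed = helper.assign_workunit(1, sum, [1, 2, 3])  # returns immediately
helper.run_pending()                                   # the work finishes later
print(confirmed, main.collected)
```

The point of the split is that the caller of assign_workunit never waits on the work itself. Anything that wants the results, such as a combining step in the task, would hook into receive_results.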
There are a few issues here that I know need to be worked on, but I've been
putting them off for other things because they are mostly coding style changes.
1) The naming convention - it's not very clear which functions call what.
2) Files need to be organized and modularized so they are easier to
deal with. Master.py has grown very large and functions should be
grouped into smaller components.
3) There are additional functions that deal with sending failures
and returning work from disconnected Workers. The results passing calls
need to be unified, and reworked to expect a list of results. This is a
requirement for improved slicing which will result in lists of workunits.
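
As an illustration of what a unified, list-based results call might look like, here is a small sketch. It is not the existing Pydra API: WorkunitResult, handle_results and the two dictionaries are hypothetical names, and a failure is represented simply as a non-None value on the record.

```python
from collections import namedtuple

# Hypothetical record: one entry per workunit, carrying either a value
# or a failure, so successes and failures share a single code path.
WorkunitResult = namedtuple("WorkunitResult", ["workunit_id", "value", "failure"])


def handle_results(results, completed, failed):
    """Process a list of results in one call (hypothetical helper).

    `completed` and `failed` are plain dicts keyed by workunit id.
    """
    for result in results:
        if result.failure is not None:
            failed[result.workunit_id] = result.failure
        else:
            completed[result.workunit_id] = result.value


completed, failed = {}, {}
handle_results(
    [
        WorkunitResult(1, "ACGT", None),
        WorkunitResult(2, None, "worker disconnected"),
        WorkunitResult(3, "TTAG", None),
    ],
    completed,
    failed,
)
print(completed, failed)
```

With a single entry point like this, a slice that produces several workunits can report all of them in one message, and the failure and disconnect cases do not need separate functions.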