[Pydra] Return values of ParallelTask, etc.

Peter Krenesky peter at osuosl.org
Sun May 31 15:42:18 UTC 2009

Yin QIU wrote:
> I'm sorry but I've got another issue. What if a request_worker call by
> a ParallelTask fails? That is, what does a root worker do if it fails
> to request a worker from the master?
> The following is an excerpt from worker.py:
>         def request_worker(self, subtask_key, args, workunit_key):
>             """
>             Requests a work unit be handled by another worker in the cluster
>             """
>             print '[info] Worker:%s - requesting worker for: %s' %
> (self.worker_key, subtask_key)
>             deferred = self.master.callRemote('request_worker',
> subtask_key, args, workunit_key)
> This call will invoke Master.request_worker through a WorkerAvatar.
> But Master.request_worker returns no value. Since we know the master
> will eventually call Master.select_worker to pick a worker to delegate
> the work unit, what if select_worker fails because of lack of idle
> workers, or if a subsequent call to run_worker fails? The current code
> handles this by making Master.run_task() return a zero value. But
> obviously Master.request_worker() does not inspect the return value of
> run_task(). So does this mean that a worker has no way to detect
> failure of Worker.request_worker()?

You are correct, this is a problem.  It works well under ideal
conditions but breaks down quickly under failures.

It works because all the workers are preallocated to the MainWorker.  It
only requests workers at the start of a task, or when a previous work
unit completes.  Preallocation ensures that the workers are never busy
on another task.  If a worker disconnects, the request will fail,
because the worker is still assigned but no longer exists.  There is no
signal to the MainWorker to let it know that one of its allocated
workers has disconnected, or that another worker is idle and available.
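One way to make that failure visible is for Master.request_worker to return an explicit result the caller can inspect.  A minimal sketch in plain Python (standing in for the Twisted callRemote/deferred machinery; the names and the run_locally fallback are illustrative, not Pydra's actual code):

```python
# Sketch only: Master.request_worker returns the chosen worker, or None
# when no idle worker exists, instead of silently returning nothing.

class Master:
    def __init__(self, idle_workers):
        self.idle_workers = list(idle_workers)

    def request_worker(self, subtask_key, args, workunit_key):
        # Make "no idle worker" an explicit, inspectable outcome.
        if not self.idle_workers:
            return None
        return self.idle_workers.pop()

def request_or_run_locally(master, subtask_key, args, workunit_key,
                           run_locally):
    """Worker-side wrapper: fall back when the master reports failure."""
    worker = master.request_worker(subtask_key, args, workunit_key)
    if worker is None:
        # Failure is now visible to the caller; run the work unit
        # locally here (it could equally be requeued or retried).
        return run_locally(subtask_key, args)
    return worker
```

In the real code the same idea would be expressed with an errback on the deferred returned by callRemote, but the point is the same: the worker needs a signal it can act on.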

> If this is true, even in current setup, we cannot guarantee that a
> ParallelTask will complete with only 1 worker (in the extreme case). A
> work unit might be assigned on the premise that it will be run on a
> remote worker (and thus not assigned locally). But if it does not get
> run, the whole task will never complete.
ParallelTask uses a separate method call for assigning work to the
MainWorker.  It bypasses the master to reduce overhead.  The task is
already running on the MainWorker, so the MainWorker is always available
to it.


> I know failing to request a worker is unlikely to happen in current
> setup because a task acquires all available workers greedily when it
> starts. But this won't still be true if we take faulty nodes and
> competitive scheduling into account.
> Actually with a prospective scheduler, worker requests together with
> the work units (including arguments and subtask_keys) would be queued
> by the scheduler (at the master side?). In this sense, a request will
> eventually be handled. But it seems that the problem that a
> ParallelTask cannot complete with one worker would still exist,
> because we have no way to know when we should run a work unit locally.
> Any thoughts on this?

The algorithm I previously described (a priority queue that calls
task.get_work()) would solve these problems.

Currently it's a task that requests a worker that might not be ready.
Handling faults due to changing worker status requires lots of calls to
the MainWorker to keep it informed.

Shifting the work to the master allows assignment and failure to be
handled easily, because the master has direct knowledge of which workers
are connected and idle.  It can respond more quickly to failures, using
only local calls.
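The master-side algorithm can be sketched roughly like this (Scheduler, the priority field, and the get_work()/run() interfaces are assumptions for illustration; Pydra's real interfaces may differ):

```python
import heapq

class Scheduler:
    """Sketch: a priority queue of tasks on the master.  The master
    pulls work via task.get_work() and assigns it only to workers it
    already knows are connected and idle, so failures are handled with
    purely local calls."""

    def __init__(self):
        self._tasks = []        # heap of (priority, seq, task)
        self._seq = 0           # tie-breaker so tasks never compare
        self.idle_workers = []

    def add_task(self, priority, task):
        heapq.heappush(self._tasks, (priority, self._seq, task))
        self._seq += 1
        self._assign()

    def worker_idle(self, worker):
        # Called when a worker connects or finishes a work unit.
        self.idle_workers.append(worker)
        self._assign()

    def _assign(self):
        # All state is local to the master: an empty queue or a missing
        # worker is handled here, with no remote round-trips.
        while self.idle_workers and self._tasks:
            priority, seq, task = self._tasks[0]
            work = task.get_work()
            if work is None:
                heapq.heappop(self._tasks)  # no work available right now
                continue
            worker = self.idle_workers.pop()
            worker.run(work)
```

A worker disconnecting simply never re-enters idle_workers, and a task with no ready work units is skipped until it produces more.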

Handling work on the MainWorker is still an issue.  It requires special
logic because it can only process work units from the task it is
currently running.  Our choices are:

    1) keep slicing on the MainWorker and allow the MainWorker to
continue to assign work units locally, outside the Master request process.

    2) include special handling in the scheduler that allows the
MainWorker to exist as an idle resource, but only for the task it is
already running.

    I'm leaning towards 1) because it will allow easier containment of
the slicing operation.  Talking with Jakub about input helpers, we've
realized that they will be specific to tasks.  This means executing
custom code.  We need to contain all custom code for fault tolerance and
security.  The worker already requires containment because it runs the
task implemented by a user.  Slicing run on the worker would exist in
that same containment, rather than having to build it on the master too.

Containment works by running a section of code in a separate process
under a restricted user.  Workers already require a separate process to
allow concurrent execution, so it's easier to implement containment there.
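For concreteness, that kind of process-level containment on POSIX looks roughly like the sketch below (make_demoter, run_contained, and the uid/gid handling are illustrative, not Pydra's actual code; dropping privileges requires the parent process to run as root):

```python
import os
import subprocess

def make_demoter(uid, gid):
    # Illustrative helper: build a preexec_fn that drops privileges in
    # the child just before exec.  Only works if the parent is root.
    def demote():
        os.setgid(gid)
        os.setuid(uid)
    return demote

def run_contained(script_path, uid, gid):
    # Run user-supplied task code in a separate process under a
    # restricted user, so custom code can't touch the worker's own
    # state or files it shouldn't see.
    return subprocess.Popen(
        ['python', script_path],
        preexec_fn=make_demoter(uid, gid),
        close_fds=True,
    )
```

Since the worker already launches a separate process per task for concurrency, the demotion step is the only extra piece needed.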

That said, there are other reasons to prefer slicing on the master.  I
think I need to write them up so we can make a final decision on it.

- Peter
