[Pydra] Master-Node-Worker relationship refactor

Peter Krenesky peter at osuosl.org
Sat Aug 22 05:23:04 UTC 2009

I have a semi-functional version checked into the node_refactor branch.
It's been a tedious process moving things around, and it's been going
slower than I hoped, but I've made a lot of progress tonight.

This code has the majority of the refactor but is still missing a few
things for managing the workers.  It can run a simple task once.

   * does not kill workers after a task completes
   * does not kill workers after release has been called
   * will only run a task once due to a deferred error


Peter Krenesky wrote:
> Hi all,
> I've started refactoring Master, Node, and Worker to change the way in
> which they relate to each other.  When this refactor is complete Master
> will only communicate with Nodes.   Node will be the only component to
> interact with Workers.  Workers will be spawned per TaskInstance.
> == WHY? ==
>   - workers need to be chrooted (sandboxed) per TaskInstance to ensure
> no task can affect other users.  Even importing a task file to read task
> name and description puts the cluster at risk.
>   - Some libraries, Django especially, can only be configured once per
> runtime.  This means changing datasources is not possible under the
> current system.
>   - less network overhead from TCP connections.
>   - simpler networking logic.
> == How? ==
> Master
>      - remove WorkerConnectionManager module
>      - change add_node() so that instead of adding workers to the
> checker, WORKER_CONNECTED signals are emitted with a special proxy object
> that mimics a WorkerAvatar but is really the remote from the Node.  This
> allows all other logic in Master to remain the same.
>     - change node disconnection logic to include disconnecting workers
> as well
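
To illustrate the proxy idea for Master, here is a rough sketch in plain
Python.  It is not the code in the branch: WorkerProxy and FakeNodeRemote
are hypothetical names, and the callRemote() method on the fake only
imitates the shape of a remote reference.  The assumption is that Master
talks to a worker by calling a remote method on it, and that the Node can
route a call if it is given the worker's key as the first argument.

```python
class FakeNodeRemote:
    """Hypothetical stand-in for the remote reference to a Node.

    It only records calls so the forwarding can be inspected.
    """

    def __init__(self):
        self.calls = []

    def callRemote(self, method, *args, **kwargs):
        self.calls.append((method, args, kwargs))
        return "ok"


class WorkerProxy:
    """Looks like a worker to Master, but forwards everything to the Node.

    Hypothetical sketch: the real WorkerAvatar interface may differ.
    """

    def __init__(self, worker_key, node_remote):
        self.name = worker_key
        self._node_remote = node_remote

    def callRemote(self, method, *args, **kwargs):
        # Prepend the worker key so the Node knows which worker to use.
        return self._node_remote.callRemote(method, self.name, *args, **kwargs)


node_remote = FakeNodeRemote()
proxy = WorkerProxy("node1:0", node_remote)
proxy.callRemote("run_task", "TestTask")
```

Because the proxy exposes the same call that Master already uses, the
rest of the Master logic would not need to know a Node sits in between.
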
> Node
>      - Add WorkerConnectionManager Module, Master's version of this can
> be reused.
>      - Add mechanism for tracking running workers
>      - Add task_run that manages passing work to workers, and starting
> new workers. 
>      - Add callback system to task_run to handle asynchronous nature of
> waiting for a worker to start before passing on a task_run
>      - Add remotes that proxy all other functions in
> worker_task_controls to worker avatars
>      - Add remotes that proxy master functions to MasterAvatar.
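
The task_run callback system on the Node could look roughly like the
sketch below.  This is a simplified, synchronous illustration and not the
code in the branch: the Node class shape, spawn_worker, worker_connected,
and FakeWorker are all hypothetical names, and a real version would
likely use deferreds rather than plain callbacks.  The point is only that
a task_run arriving before its worker has connected is queued and
replayed once the worker shows up.

```python
class FakeWorker:
    """Hypothetical worker that just records the tasks it was given."""

    def __init__(self):
        self.ran = []

    def run_task(self, task_key, args):
        self.ran.append((task_key, args))


class Node:
    """Hypothetical sketch of a Node that starts workers on demand."""

    def __init__(self, spawn_worker):
        self.workers = {}   # worker_key -> connected worker
        self.pending = {}   # worker_key -> callbacks waiting for connect
        self._spawn_worker = spawn_worker

    def task_run(self, worker_key, task_key, args):
        def run(worker):
            worker.run_task(task_key, args)

        if worker_key in self.workers:
            # Worker is already running, pass the work straight through.
            run(self.workers[worker_key])
            return

        # Queue the call, and start a worker only for the first request.
        first_request = worker_key not in self.pending
        self.pending.setdefault(worker_key, []).append(run)
        if first_request:
            self._spawn_worker(worker_key)

    def worker_connected(self, worker_key, worker):
        # Called once the new worker has connected back to the Node.
        self.workers[worker_key] = worker
        for callback in self.pending.pop(worker_key, []):
            callback(worker)


spawned = []
node = Node(spawned.append)
node.task_run("w0", "TestTask", {"n": 1})   # queued, worker not up yet
worker = FakeWorker()
node.worker_connected("w0", worker)         # queued task_run fires now
```
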
> Worker
>     - Modify WorkerConnectionManager to connect locally only and use
> Node key for auth.
> == status ==
> Much of the above code is in place but it is not tested.  I'll likely have
> it complete within the next few days.
