[Pydra] Master-Node-Worker relationship refactor

Yin QIU allenchue at gmail.com
Tue Aug 18 15:01:44 UTC 2009


Thanks Peter. That's great news.

I have managed to make task synchronization work. In my experiment, I
ran the master and a node in two different folders on the same
machine. Initially, the node does not contain any task code in its
task_cache. After the scheduler dispatches a task (I used the simplest
TestTask; others will presumably work too) to the worker, the node
automatically synchronizes the task code with the master. The same
happens if the task code is updated on the master. As discussed, task
synchronization is done asynchronously.
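
To illustrate the two cases above (an empty task_cache and task code
updated on the master), here is a minimal sketch of how a node could
decide that a sync is needed. It is not the code on the branch: the
function names, the directory layout, and the choice of SHA-1 over
file contents are all assumptions made for this example.

```python
import hashlib
import os


def task_fingerprint(task_dir):
    """Return a SHA-1 hex digest over the relative paths and contents of
    every file under task_dir, or None if the directory does not exist
    (hypothetical helper, not part of Pydra)."""
    if not os.path.isdir(task_dir):
        return None
    digest = hashlib.sha1()
    for root, dirs, files in os.walk(task_dir):
        dirs.sort()  # visit subdirectories in a deterministic order
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, task_dir).replace(os.sep, "/")
            digest.update(rel.encode("utf-8"))
            with open(path, "rb") as handle:
                digest.update(handle.read())
    return digest.hexdigest()


def needs_sync(node_task_dir, master_fingerprint):
    """True when the node's cached copy is missing or differs from the
    fingerprint reported by the master."""
    return task_fingerprint(node_task_dir) != master_fingerprint
```

With something like this, a missing cache entry and an outdated one
both show up as a fingerprint mismatch, so the node only needs one
code path to trigger a sync.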

Under the hood, two modules, namely TaskSyncClient and TaskSyncServer,
interact with each other. The former is located on the node, and the
latter resides on the master. Currently, TaskSyncClient is a module of
the Worker ModuleManager. After your refactoring is done, it won't be
hard to migrate it to the Node ModuleManager.
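
The interaction between the two sides can be sketched roughly as
follows. This is a simplified, in-process illustration only: the class
names SketchSyncServer and SketchSyncClient, their methods, and the
(version, source) tuples are hypothetical, and a standard-library
thread pool stands in for whatever asynchronous mechanism the real
modules use.

```python
from concurrent.futures import ThreadPoolExecutor


class SketchSyncServer:
    """Master side: holds the authoritative task code (hypothetical)."""

    def __init__(self, tasks):
        self.tasks = dict(tasks)  # task name -> (version, source)

    def latest_version(self, name):
        return self.tasks[name][0]

    def fetch(self, name):
        return self.tasks[name]


class SketchSyncClient:
    """Node side: keeps a task_cache and refreshes it from the server."""

    def __init__(self, server, executor):
        self.server = server
        self.executor = executor
        self.task_cache = {}  # task name -> (version, source)

    def _sync(self, name):
        # Download only when the cache is empty or out of date.
        cached = self.task_cache.get(name)
        if cached is None or cached[0] != self.server.latest_version(name):
            self.task_cache[name] = self.server.fetch(name)
        return self.task_cache[name]

    def request_sync(self, name, on_ready):
        """Start a background sync; call on_ready(source) once it is done."""
        future = self.executor.submit(self._sync, name)
        future.add_done_callback(lambda done: on_ready(done.result()[1]))
        return future
```

The caller does not block on request_sync(); the task is started from
the on_ready callback once the code has arrived. Because the client
only needs a reference to the server, it does not matter much which
ModuleManager hosts it.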

The latest changes have been committed to the task_packaging branch on
GitHub.

On Tue, Aug 18, 2009 at 12:44 PM, Peter Krenesky<peter at osuosl.org> wrote:
> Hi all,
>
> I've started refactoring Master, Node, and Worker to change the way in
> which they relate to each other.  When this refactor is complete Master
> will only communicate with Nodes.   Node will be the only component to
> interact with Workers.  Workers will be spawned per TaskInstance.
>
>
> == WHY? ==
>  - workers need to be chrooted (sandboxed) per TaskInstance to ensure
> no task can affect other users.  Even importing a task file to read task
> name and description puts the cluster at risk.
>
>  - Some libraries, django especially, can only be configured once per
> runtime.  This means changing datasources is not possible under the
> current system.
>
>  - less network overhead from TCP connections.
>
>  - simpler networking logic.
>
>
>
> == How? ==
>
> Master
>     - remove WorkerConnectionManager module
>     - change add_node() so that instead of adding workers to the
> checker, WORKER_CONNECTED signals are emitted with a special proxy object
> that mimics a WorkerAvatar but is really the remote from the Node.  This
> allows all other logic in Master to remain the same.
>    - change node disconnection logic to include disconnecting workers
> as well
>
>
> Node
>     - Add WorkerConnectionManager Module, Master's version of this can
> be reused.
>     - Add mechanism for tracking running workers
>     - Add task_run that manages passing work to workers, and starting
> new workers.
>     - Add callback system to task_run to handle asynchronous nature of
> waiting for a worker to start before passing on a task_run
>     - Add remotes that proxy all other functions in
> worker_task_controls to worker avatars
>     - Add remotes that proxy master functions to MasterAvatar.
>
>
> Worker
>    - Modify WorkerConnectionManager to connect locally only and use
> Node key for auth.
>
>
>
> == status ==
>
> Much of the above code is in place but it is not tested.  I'll likely have
> it complete within the next few days.
> _______________________________________________
> Pydra mailing list
> Pydra at osuosl.org
> http://lists.osuosl.org/mailman/listinfo/pydra
>
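
For the Node items in the quoted plan (tracking running workers, and a
task_run that waits for a worker to start before passing on the work),
the callback step might look something like the sketch below. The
class SketchNode, the spawn_worker callable, and the worker.run()
method are names made up for this example; they are not taken from the
refactored code.

```python
class SketchNode:
    """Hypothetical Node that queues task_run calls until the worker
    they are meant for has connected."""

    def __init__(self, spawn_worker):
        self.spawn_worker = spawn_worker  # callable that starts a worker by key
        self.workers = {}  # worker key -> connected worker object
        self.pending = {}  # worker key -> list of queued (task, args)

    def task_run(self, worker_key, task, args):
        worker = self.workers.get(worker_key)
        if worker is not None:
            # Worker is already running: pass the work on directly.
            return worker.run(task, args)
        # Otherwise queue the call and start the worker only once.
        first_request = worker_key not in self.pending
        self.pending.setdefault(worker_key, []).append((task, args))
        if first_request:
            self.spawn_worker(worker_key)
        return None

    def worker_connected(self, worker_key, worker):
        """Callback fired when a spawned worker connects to the Node."""
        self.workers[worker_key] = worker
        for task, args in self.pending.pop(worker_key, []):
            worker.run(task, args)
```

Queued calls are replayed in the order they arrived, so the Master
side would not need to know whether the worker was already running.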



-- 
Yin Qiu
Nanjing University, China

