[Pydra] Master-Node-Worker relationship refactor
peter at osuosl.org
Tue Aug 18 15:55:04 UTC 2009
Great news. I'll check out the code when I get back to the office.
On Aug 18, 2009, at 8:01 AM, Yin QIU <allenchue at gmail.com> wrote:
> Thanks Peter. That's great news.
> I have managed to make task synchronization work. In my experiment, I
> ran the master and a node in two different folders on the same
> machine. Initially, the node does not contain any task code in its
> task_cache. After the scheduler dispatches a task to the worker (I
> used the simplest one, TestTask; others should presumably work too),
> the node automatically synchronizes the task code with the master. The
> same happens if the task code is updated on the master. As
> discussed, task synchronization is done asynchronously.
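Re-syncing when the task code changes on the master requires the node to detect that its cached copy is stale. The thread does not say how TaskSyncClient does this. One plausible approach, assumed here only for illustration, is to compare a content hash of the master's task package with the hash recorded for the node's cached copy. The helper names package_hash and needs_sync, the use of SHA-1, and modeling task_cache as a dict from task key to hash are all hypothetical.

```python
import hashlib


def package_hash(files):
    """Deterministic hash of a task package given as {relative path: bytes}."""
    digest = hashlib.sha1()
    for path in sorted(files):
        # Hash each file, then fold "path\0filehash" into the package hash so
        # that edits, additions and renames all change the result.
        file_hash = hashlib.sha1(files[path]).hexdigest()
        digest.update(f"{path}\0{file_hash}\n".encode("utf-8"))
    return digest.hexdigest()


def needs_sync(task_cache, task_key, master_hash):
    """True if the node has no copy of the task or an out-of-date copy."""
    return task_cache.get(task_key) != master_hash


task_cache = {}  # hypothetical: task_key -> hash of the cached package
files = {"test_task.py": b"class TestTask: pass\n"}
current = package_hash(files)
print(needs_sync(task_cache, "demo.TestTask", current))  # True: nothing cached yet
task_cache["demo.TestTask"] = current
print(needs_sync(task_cache, "demo.TestTask", current))  # False: up to date
files["test_task.py"] = b"class TestTask: x = 1\n"  # task updated on the master
print(needs_sync(task_cache, "demo.TestTask", package_hash(files)))  # True
```

Sorting the paths makes the hash independent of the order in which files are listed, so master and node agree whenever the contents agree.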
> Under the hood, two modules, TaskSyncClient and TaskSyncServer,
> interact with each other. The former is located on the node, and the
> latter resides on the master. Currently, TaskSyncClient is a module of
> the Worker ModuleManager. After your refactoring is done, it won't be
> hard to migrate it to the Node ModuleManager.
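Here is a minimal sketch of the TaskSyncClient/TaskSyncServer exchange. It assumes the client sends the version it has cached, and the server replies with new code only when that version is stale. The method names request_sync and sync, the (version, data) reply and the in-process call are hypothetical. The avatars and remotes mentioned in this thread suggest Pydra communicates over Twisted Perspective Broker. asyncio stands in for that here, only to show the asynchronous shape of the exchange.

```python
import asyncio
import hashlib


class TaskSyncServer:
    """Master side (sketch): serves the current code of each task package."""

    def __init__(self, packages):
        self.packages = packages  # task_key -> package bytes

    async def request_sync(self, task_key, node_version):
        """Reply (version, data); data is None when the node is already current."""
        current = hashlib.sha1(self.packages[task_key]).hexdigest()
        if node_version == current:
            return current, None
        return current, self.packages[task_key]


class TaskSyncClient:
    """Node side (sketch): keeps a local task cache in line with the master."""

    def __init__(self, server):
        self.server = server  # would be a remote reference in a networked setup
        self.task_cache = {}  # task_key -> (version, data)

    async def sync(self, task_key):
        """Return True if new code was transferred, False if already current."""
        cached = self.task_cache.get(task_key)
        local_version = cached[0] if cached else None
        version, data = await self.server.request_sync(task_key, local_version)
        if data is not None:  # only replace the cache when code was sent
            self.task_cache[task_key] = (version, data)
        return data is not None


async def main():
    server = TaskSyncServer({"demo.TestTask": b"v1"})
    client = TaskSyncClient(server)
    # Start the sync in the background; the node is free to do other work
    # until it awaits the result.
    pending = asyncio.create_task(client.sync("demo.TestTask"))
    fetched_first = await pending
    fetched_again = await client.sync("demo.TestTask")  # already current
    server.packages["demo.TestTask"] = b"v2"            # updated on the master
    fetched_update = await client.sync("demo.TestTask")
    return fetched_first, fetched_again, fetched_update, client.task_cache["demo.TestTask"][1]


print(asyncio.run(main()))  # (True, False, True, b'v2')
```

Sending the cached version with each request means an up-to-date node costs one small round trip and no code transfer.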
> Latest changes have been committed to the task_packaging branch on
> On Tue, Aug 18, 2009 at 12:44 PM, Peter Krenesky <peter at osuosl.org> wrote:
>> Hi all,
>> I've started refactoring Master, Node, and Worker to change the way
>> in which they relate to each other. When this refactor is complete,
>> Master will only communicate with Nodes. Node will be the only component
>> to interact with Workers. Workers will be spawned per TaskInstance.
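The new topology can be sketched as follows, assuming one OS process per TaskInstance. Master holds references to Nodes only. Each Node starts a fresh Python interpreter per TaskInstance, so module-level state, such as a library that can only be configured once per runtime, is never shared between TaskInstances. The Node and Master classes, spawn_worker, collect and the round-robin placement rule are hypothetical, and chroot sandboxing is deliberately left out.

```python
import subprocess
import sys

# Stand-in for the worker program; a real worker would load and run the task.
WORKER_CODE = "import sys; print('worker for TaskInstance', sys.argv[1])"


class Node:
    """Hypothetical: the only component that starts or talks to workers."""

    def __init__(self):
        self.workers = {}  # task_instance_id -> subprocess.Popen

    def spawn_worker(self, task_instance_id):
        # A fresh interpreter per TaskInstance, so nothing configured in one
        # worker leaks into another.
        proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_CODE, str(task_instance_id)],
            stdout=subprocess.PIPE,
            text=True,
        )
        self.workers[task_instance_id] = proc

    def collect(self):
        """Wait for every worker and return its output keyed by TaskInstance."""
        return {tid: p.communicate()[0].strip() for tid, p in self.workers.items()}


class Master:
    """Hypothetical: holds Node references only, never Worker references."""

    def __init__(self, nodes):
        self.nodes = nodes

    def run_task_instance(self, task_instance_id):
        # Trivial placement rule for the sketch: round-robin by id.
        node = self.nodes[task_instance_id % len(self.nodes)]
        node.spawn_worker(task_instance_id)


node = Node()
master = Master([node])
for tid in (1, 2):
    master.run_task_instance(tid)
print(node.collect())
```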
>> == WHY? ==
>> - workers need to be chrooted (sandboxed) per TaskInstance to ensure
>> that no task can affect other users. Even importing a task file to read
>> its name and description puts the cluster at risk.
>> - Some libraries, django especially, can only be configured once per
>> runtime. This means changing datasources is not possible under the
>> current system.
>> - fewer TCP connections, and therefore less network overhead.
>> - simpler networking logic.
>> == How? ==
>> - remove WorkerConnectionManager module
>> - change add_node() so that instead of adding workers to the
>> checker, WORKER_CONNECTED signals are emitted with a special proxy
>> that mimics a WorkerAvatar but is really the remote from the Node.
>> This allows all other logic in Master to remain the same.
>> - change node disconnection logic to include disconnecting workers
>> as well
>> - Add WorkerConnectionManager module; Master's version of this can
>> be reused.
>> - Add mechanism for tracking running workers
>> - Add task_run that manages passing work to workers and starting
>> new workers.
>> - Add callback system to task_run to handle the asynchronous nature of
>> waiting for a worker to start before passing on a task_run call.
>> - Add remotes that proxy all other functions in
>> worker_task_controls to worker avatars
>> - Add remotes that proxy master functions to MasterAvatar.
>> - Modify WorkerConnectionManager to connect locally only and use
>> Node key for auth.
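Two of the pieces above can be sketched together: the proxy that mimics a WorkerAvatar, and the task_run callback that holds work until a worker has started. Everything here is hypothetical. run_task stands in for whatever methods WorkerAvatar actually exposes. worker_started stands in for the callback fired when a spawned worker connects back to the Node. Spawning is only logged, not performed.

```python
class Node:
    """Hypothetical node-side bookkeeping for running and starting workers."""

    def __init__(self):
        self.running = set()  # keys of workers that are up
        self.starting = {}    # worker_key -> queued (task_key, args) requests
        self.log = []         # record of actions, for illustration

    def task_run(self, worker_key, task_key, args):
        if worker_key in self.running:
            self._pass_to_worker(worker_key, task_key, args)
        elif worker_key in self.starting:
            # Worker is already being started: just queue the request.
            self.starting[worker_key].append((task_key, args))
        else:
            self.starting[worker_key] = [(task_key, args)]
            self.start_worker(worker_key)

    def start_worker(self, worker_key):
        # A real node would spawn a process here and return immediately;
        # worker_started() is called later, once the worker connects.
        self.log.append(("spawn", worker_key))

    def worker_started(self, worker_key):
        """Callback: the worker is up, so flush everything queued for it."""
        self.running.add(worker_key)
        for task_key, args in self.starting.pop(worker_key, []):
            self._pass_to_worker(worker_key, task_key, args)

    def _pass_to_worker(self, worker_key, task_key, args):
        self.log.append(("run", worker_key, task_key, args))


class WorkerProxy:
    """Handed to Master in place of a WorkerAvatar; forwards calls to the Node."""

    def __init__(self, node, worker_key):
        self.node = node
        self.worker_key = worker_key

    def run_task(self, task_key, args):
        self.node.task_run(self.worker_key, task_key, args)


node = Node()
proxy = WorkerProxy(node, "node1:0")
proxy.run_task("demo.TestTask", 1)  # worker spawned, request queued
proxy.run_task("demo.TestTask", 2)  # queued behind the first
node.worker_started("node1:0")      # callback flushes both, in order
proxy.run_task("demo.TestTask", 3)  # worker running: passed straight through
for entry in node.log:
    print(entry)
```

Because Master only ever calls methods on the proxy, it cannot tell whether the worker was already running or was started on demand. This matches the goal of leaving the rest of Master's logic unchanged.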
>> == status ==
>> Much of the above code is in place, but it is not tested yet. I'll likely
>> have it complete within the next few days.
>> Pydra mailing list
>> Pydra at osuosl.org
> Yin Qiu
> Nanjing University, China