[Pydra] Master-Node-Worker relationship refactor

Peter Krenesky peter at osuosl.org
Tue Aug 18 15:55:04 UTC 2009


Great news.  I'll check out the code when I get back to the office.

Sent from my iPhone

On Aug 18, 2009, at 8:01 AM, Yin QIU <allenchue at gmail.com> wrote:

> Thanks Peter. That's great news.
>
> I have managed to make task synchronization work. In my experiment, I
> ran the master and a node in two different folders on the same
> machine. Initially, the node does not contain any task code in its
> task_cache. After the scheduler dispatches a task (I used the simplest
> TestTask; others will presumably work too) to the worker, the node
> automatically synchronizes the task code with the master. The same
> happens if the task code is updated on the master. As discussed, task
> synchronization is done asynchronously.
>
> Under the hood, two modules, namely TaskSyncClient and TaskSyncServer,
> interact with each other. The former is located on the node, and the
> latter resides on the master. Currently, TaskSyncClient is a module of
> the Worker ModuleManager. After your refactoring is done, it won't be
> hard to migrate it to the Node ModuleManager.
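
[Editorial sketch] The synchronization described above can be pictured roughly as follows. This is a minimal sketch and not the code on the task_packaging branch: the class names MasterTaskStore and NodeTaskCache, the method names, and the use of a SHA-1 hash as the version marker are all assumptions made for illustration. It is also synchronous for brevity, whereas the message says the real synchronization is asynchronous.

```python
import hashlib


def task_version(source):
    """Hash task source so a node can detect missing or stale code."""
    return hashlib.sha1(source).hexdigest()


class MasterTaskStore:
    """Hypothetical stand-in for the master side (TaskSyncServer's role)."""

    def __init__(self):
        self._tasks = {}  # task name -> source bytes

    def publish(self, name, source):
        self._tasks[name] = source

    def version(self, name):
        return task_version(self._tasks[name])

    def fetch(self, name):
        return self._tasks[name]


class NodeTaskCache:
    """Hypothetical stand-in for the node side (TaskSyncClient + task_cache)."""

    def __init__(self, master):
        self._master = master
        self._cache = {}  # task name -> (version, source bytes)

    def ensure_current(self, name):
        """Return True if code was downloaded, False if the cache was current."""
        wanted = self._master.version(name)
        cached = self._cache.get(name)
        if cached is not None and cached[0] == wanted:
            return False
        source = self._master.fetch(name)
        self._cache[name] = (task_version(source), source)
        return True
```

In this sketch the node starts with an empty cache, downloads on the first dispatch, skips the download while the versions match, and downloads again once the master publishes changed code.
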
>
> The latest changes have been committed to the task_packaging branch on
> GitHub.
>
> On Tue, Aug 18, 2009 at 12:44 PM, Peter Krenesky <peter at osuosl.org> wrote:
>> Hi all,
>>
>> I've started refactoring Master, Node, and Worker to change the way in
>> which they relate to each other.  When this refactor is complete, Master
>> will only communicate with Nodes.  Node will be the only component to
>> interact with Workers.  Workers will be spawned per TaskInstance.
>>
>>
>> == WHY? ==
>>  - workers need to be chrooted (sandboxed) per TaskInstance to ensure
>> no task can affect other users.  Even importing a task file to read task
>> name and description puts the cluster at risk.
>>
>>  - Some libraries, Django especially, can only be configured once per
>> runtime.  This means changing datasources is not possible under the
>> current system.
>>
>>  - less network overhead from TCP connections.
>>
>>  - simpler networking logic.
>>
>>
>>
>> == How? ==
>>
>> Master
>>     - remove WorkerConnectionManager module
>>     - change add_node() so that instead of adding workers to the
>> checker, WORKER_CONNECTED signals are emited with a special proxy  
>> object
>> that mimics a WorkerAvatar but is really the remote from the Node.   
>> This
>> allows all other logic in Master to remain the same.
>>    - change node disconnection logic to include disconnecting workers
>> as well
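
[Editorial sketch] The proxy object mentioned in the Master list could look roughly like this. It is a sketch under stated assumptions, not Pydra's implementation: WorkerProxy and FakeNodeRemote are hypothetical names, and the callRemote-style interface and the idea of passing the worker key as the first argument are assumptions chosen for illustration.

```python
class FakeNodeRemote:
    """Hypothetical stand-in for the remote reference Master holds to a Node."""

    def __init__(self):
        self.calls = []  # records every forwarded call for inspection

    def callRemote(self, name, *args, **kwargs):
        self.calls.append((name, args, kwargs))
        return "ok"


class WorkerProxy:
    """Looks like a worker avatar to Master, but forwards through the Node."""

    def __init__(self, node_remote, worker_key):
        self.name = worker_key
        self._node_remote = node_remote

    def callRemote(self, name, *args, **kwargs):
        # Prepend the worker key so the Node can route the call
        # to the right local worker.
        return self._node_remote.callRemote(name, self.name, *args, **kwargs)
```

Because the proxy exposes the same call surface as a directly connected worker, code in Master that holds a worker reference would not need to know that the call now travels through the Node.
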
>>
>>
>> Node
>>     - Add WorkerConnectionManager Module, Master's version of this can
>> be reused.
>>     - Add mechanism for tracking running workers
>>     - Add task_run that manages passing work to workers, and starting
>> new workers.
>>     - Add callback system to task_run to handle the asynchronous nature of
>> waiting for a worker to start before passing on a task_run
>>     - Add remotes that proxy all other functions in
>> worker_task_controls to worker avatars
>>     - Add remotes that proxy master functions to MasterAvatar.
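
[Editorial sketch] The worker tracking and task_run callback items in the Node list can be sketched as below. This is only an illustration of the idea of queueing work until a freshly spawned worker has connected. WorkerPool, its method names, and the spawn callable are hypothetical, and a worker is assumed to expose a run(task, args) method.

```python
class WorkerPool:
    """Tracks running workers and defers task_run until a worker is ready."""

    def __init__(self, spawn):
        self._spawn = spawn    # hypothetical callable that starts a worker
        self._connected = {}   # worker_key -> connected worker object
        self._pending = {}     # worker_key -> queued (task, args) tuples

    def task_run(self, worker_key, task, args):
        worker = self._connected.get(worker_key)
        if worker is not None:
            # Worker already running: pass the work on immediately.
            worker.run(task, args)
            return
        # Queue the request first, then spawn only once per worker key.
        first_request = worker_key not in self._pending
        self._pending.setdefault(worker_key, []).append((task, args))
        if first_request:
            self._spawn(worker_key)

    def worker_connected(self, worker_key, worker):
        """Callback fired once the spawned worker has connected to the Node."""
        self._connected[worker_key] = worker
        for task, args in self._pending.pop(worker_key, []):
            worker.run(task, args)
```

The request is queued before the spawn call so that the queue is already populated even if the worker connects immediately.
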
>>
>>
>> Worker
>>    - Modify WorkerConnectionManager to connect locally only and use
>> Node key for auth.
>>
>>
>>
>> == status ==
>>
>> Much of the above code is in place but it is not tested.  I'll likely have
>> it complete within the next few days.
>> _______________________________________________
>> Pydra mailing list
>> Pydra at osuosl.org
>> http://lists.osuosl.org/mailman/listinfo/pydra
>>
>
>
>
> -- 
> Yin Qiu
> Nanjing University, China

