[Pydra] Master-Node-Worker relationship refactor

Peter Krenesky peter at osuosl.org
Sun Aug 23 06:19:03 UTC 2009


The branch has been updated with a more functional version.

Simple tasks can now be run, and workers are cleaned up correctly.  I have
not yet tested with parallelTasks; I expect them to be broken, but most
of the work should be done.  I also have not implemented all of the worker
shutdown code for errors and cases other than a successful task.

-Peter

Peter Krenesky wrote:
> I have a semi-functional version checked into the node_refactor branch. 
> It's been a tedious process moving things around.  It's been going slower
> than I hoped, but I've made a lot of progress tonight.
>
> This code has the majority of the refactor but is still missing a few
> things for managing the workers.  It can run a simple task once.
>
> bugs:
>    * does not kill workers after a task completes
>    * does not kill workers after release has been called
>    * will only run a task once due to a deferred error
>
> -Peter
>
>
> Peter Krenesky wrote:
>   
>> Hi all,
>>
>> I've started refactoring Master, Node, and Worker to change the way in
>> which they relate to each other.  When this refactor is complete, Master
>> will only communicate with Nodes.  Node will be the only component to
>> interact with Workers.  Workers will be spawned per TaskInstance.
>>
>>
>> == WHY? ==
>>   - workers need to be chrooted (sandboxed) per TaskInstance to ensure
>> no task can affect other users.  Even importing a task file to read task
>> name and description puts the cluster at risk.
>>  
>>   - Some libraries, Django especially, can only be configured once per
>> runtime.  This means changing datasources is not possible under the
>> current system.
>>
>>   - less network overhead from TCP connections.
>>
>>   - simpler networking logic.
>>  
>>
>>
>> == How? ==
>>
>> Master
>>      - remove WorkerConnectionManager module
>>      - change add_node() so that instead of adding workers to the
>> checker, WORKER_CONNECTED signals are emitted with a special proxy object
>> that mimics a WorkerAvatar but is really the remote from the Node.  This
>> allows all other logic in Master to remain the same.
>>     - change node disconnection logic to include disconnecting workers
>> as well
>>    
>>
>> Node
>>      - Add WorkerConnectionManager module; Master's version of this can
>> be reused.
>>      - Add mechanism for tracking running workers
>>      - Add task_run that manages passing work to workers, and starting
>> new workers. 
>>      - Add callback system to task_run to handle the asynchronous nature of
>> waiting for a worker to start before passing on a task_run
>>      - Add remotes that proxy all other functions in
>> worker_task_controls to worker avatars
>>      - Add remotes that proxy master functions to MasterAvatar.
>>
>>
>> Worker
>>     - Modify WorkerConnectionManager to connect locally only and use
>> Node key for auth.
>>
>>
>>
>> == status ==
>>
>> Much of the above code is in place, but it is not tested.  I'll likely have
>> it complete within the next few days.
>>
>>   
>>     
>
>
>   
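
For readers following the plan quoted above, here is a rough sketch of the
Node-side task_run idea: start a worker if none is connected, hold the
request in a callback list until the worker connects, then pass the work
on.  This is not the code in the node_refactor branch.  It uses plain
Python callbacks instead of Twisted, and every name other than task_run
(Worker, Node, spawn_worker, worker_connected, worker_stopped, pending) is
a hypothetical stand-in chosen for illustration.

```python
class Worker:
    """Hypothetical stand-in for a worker spawned per TaskInstance."""

    def __init__(self, worker_key):
        self.worker_key = worker_key
        self.received = []  # (task_key, args) pairs handed to this worker

    def run_task(self, task_key, args):
        self.received.append((task_key, args))
        return "started %s" % task_key


class Node:
    """Sketch of a Node that tracks workers and queues work for them."""

    def __init__(self):
        self.workers = {}   # worker_key -> connected Worker
        self.pending = {}   # worker_key -> callbacks waiting for that worker
        self.spawned = []   # keys of workers we have asked to start

    def spawn_worker(self, worker_key):
        # Assumption: the real Node would launch a worker process here.
        # The sketch only records the request.
        self.spawned.append(worker_key)

    def task_run(self, worker_key, task_key, args, on_result):
        """Pass work to a worker, starting one first if necessary."""
        def deliver(worker):
            on_result(worker.run_task(task_key, args))

        worker = self.workers.get(worker_key)
        if worker is not None:
            deliver(worker)  # worker already connected: run immediately
            return

        if worker_key not in self.pending:
            # First request for this worker: start it exactly once.
            self.pending[worker_key] = []
            self.spawn_worker(worker_key)
        self.pending[worker_key].append(deliver)

    def worker_connected(self, worker):
        """Called once the worker has connected back to the Node."""
        self.workers[worker.worker_key] = worker
        for callback in self.pending.pop(worker.worker_key, []):
            callback(worker)

    def worker_stopped(self, worker_key):
        """Forget a worker after its task finishes or it is released."""
        self.workers.pop(worker_key, None)
```

A task_run issued before the worker exists is simply held until
worker_connected fires; later calls for the same worker go straight
through without spawning a second process.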


