[Pydra] Master-Node-Worker relationship refactor

Yin QIU allenchue at gmail.com
Thu Aug 20 06:07:26 UTC 2009


On Thu, Aug 20, 2009 at 12:12 AM, Peter Krenesky<peter at osuosl.org> wrote:
> Yin QIU wrote:
>> On Wed, Aug 19, 2009 at 5:31 AM, Peter Krenesky<peter at osuosl.org> wrote:
>>
>>> It does not appear to load existing tasks properly.  This happened when
>>> I ran the task a second time:
>>>
>>>  File "/usr/lib/python2.6/dist-packages/twisted/spread/flavors.py",
>>> line 114, in remoteMessageReceived
>>>    state = method(*args, **kw)
>>>  File
>>> "/home/peter/wrk/pydra.sync2/pydra/pydra_server/cluster/worker/worker_task_controls.py",
>>> line 73, in run_task
>>>    workunit_key, main_worker, task_id)
>>>  File
>>> "/home/peter/wrk/pydra.sync2/pydra/pydra_server/cluster/tasks/task_manager.py",
>>> line 304, in retrieve_task
>>>    module_search_path = [pydraSettings.tasks_dir, pkg.folder \
>>> exceptions.AttributeError: TaskPackage instance has no attribute 'folder'
>>>
>>>
>>
>> Sorry that I failed to test that branch of the control flow. I've
>> fixed the issue.
>>
>>
> Great!
>
>>> Otherwise it works really well, great job!
>>>
>>>  I didn't test it with larger files, but I didn't really notice the
>>> extra load time while syncing.  I like how you split out the package
>>> loading as well; great way of handling it.
>>>
>>>
>>
>> Good to hear this. Thanks!
>>
>>
>>> It handles updated files, but can it handle multiple versions of the
>>> same task, so that a task can be updated while it is running?  I know
>>> that one was a bit more complicated and likely required saving files
>>> in a different directory structure.
>>>
>>
>> Technically this wouldn't cause much trouble. The physical location of
>> a task package is now computed simply by concatenating the tasks_dir
>> and the package name. We read and write task packages using the
>> package name as a key. We can include versions in the lookup too,
>> which will allow multiple versions of a task package to exist
>> simultaneously. By examining the last-modified time of the folders,
>> we'll be able to tell which folder contains the latest code.
>>
>> My primary concern, however, is the complicated package deployment
>> logic. Imagining that we already have the multi-version-package
>> feature, I can think of two major problems (or complications).
>>
>> 1. The user deploys a task package called "foo" on the master. Now the
>> user shouldn't be allowed to directly put a "foo" folder in
>> task_cache. Since we may be tracking multiple versions of "foo", the
>> contents of "foo" are likely to be saved in "task_cache/foo/v3", where
>> "v3", in our implementation, is the calculated SHA-1 hash of the
>> package contents. From another perspective, we should prevent users
>> from manipulating task_cache manually, for that would cause
>> inconsistency.
>>
>>
> You're right about this.  A user manually editing it would cause problems.
>
> What if we store a processed copy of the task?  We provide a TASK_CACHE
> directory which users can deploy tasks to.  When the task manager reads
> the tasks from TASK_CACHE, they are copied to TASK_CACHE_INTERNAL if
> it's a new/updated version.  The tasks that actually get loaded/executed
> would be in TASK_CACHE_INTERNAL.
>
> It duplicates stored code but it allows users to change a file without
> affecting the code pydra uses.
>

The first thing that comes to mind is efficiency, but that can easily be
addressed by using hard links.
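A minimal sketch of that idea, assuming hypothetical function and directory names rather than Pydra's actual API: each file in the internal copy is a hard link to the user-deployed file, so no file data is duplicated.

```python
import os
import shutil

def link_tree(src, dst):
    """Mirror a deployed task package into the internal cache.

    Illustrative sketch only: files are hard-linked where possible
    (same inode, zero extra data), with a real copy as a fallback
    when src and dst are on different filesystems.
    """
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            source_file = os.path.join(root, name)
            target_file = os.path.join(target, name)
            try:
                os.link(source_file, target_file)       # hard link, no data copied
            except OSError:
                shutil.copy2(source_file, target_file)  # cross-device fallback
```

One caveat: a hard link shares the file's data, so a program that writes to the deployed file in place would also change the internal copy; tools that replace the file with a new inode (as most editors and deploy scripts do) avoid this.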

Besides, I guess we still have to provide some mechanism (e.g., write
permissions) to protect task_cache_internal, because it is essentially a
file system resource just as task_cache is.

Another thing I want to make clear is: how many different methods of
deploying task packages are we going to support? Knowing this will help
us consolidate the code that reads/writes task packages.
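However many deployment methods end up supported, the reads and writes could funnel through a single lookup keyed by package name and version, with the version derived from a content hash as discussed above. A rough sketch, with helper names that are my own and not Pydra's:

```python
import hashlib
import os

def package_version(pkg_root):
    """Derive a version id as the SHA-1 of the package contents.

    Sketch of the versioning scheme discussed above; walking files in
    sorted order keeps the hash stable across filesystems.
    """
    digest = hashlib.sha1()
    for root, dirs, files in os.walk(pkg_root):
        dirs.sort()                      # deterministic traversal order
        for name in sorted(files):
            digest.update(name.encode())
            with open(os.path.join(root, name), "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()

def package_path(tasks_dir, name, version):
    """Single place that maps (package name, version) to a folder,
    e.g. task_cache/foo/<sha1>."""
    return os.path.join(tasks_dir, name, version)
```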

>
>> 2. As the number of versions of a task package increases, we'll need
>> to clean up older contents. This can be done manually or
>> automatically. Obviously, automatic cleanup is more appealing. But to
>> do this, we have to track the status of each version of a task
>> package (probably in the scheduler). If a version is not used by any
>> worker, we can safely delete it.
>>
>>
> Definitely an automatic cleanup.  The only reason for multiple versions
> is to allow updates while tasks are running.  We could modify the
> scheduler to emit TASK_START and TASK_STOP signals so that the task
> manager could track which packages are in use.

Yes. Signals would be great.
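Those signals could drive a simple reference count in the task manager. The sketch below is only illustrative (class and method names are assumptions, wired to hypothetical TASK_START/TASK_STOP handlers): a version becomes deletable once no running task uses it and a newer version exists.

```python
from collections import defaultdict

class PackageUsageTracker(object):
    """Track which package versions are in use, based on scheduler
    signals.  Hypothetical sketch, not Pydra's actual code."""

    def __init__(self):
        self.in_use = defaultdict(int)  # (package, version) -> running tasks

    def task_started(self, package, version):
        # Handler for a TASK_START signal.
        self.in_use[(package, version)] += 1

    def task_stopped(self, package, version):
        # Handler for a TASK_STOP signal.
        key = (package, version)
        self.in_use[key] -= 1
        if self.in_use[key] <= 0:
            del self.in_use[key]

    def removable(self, package, version, latest_version):
        # Safe to delete only if no task runs it and it isn't current.
        return (version != latest_version
                and (package, version) not in self.in_use)
```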

>>> -Peter
>>>
>>>
>>>
>>>
>>> Yin QIU wrote:
>>>
>>>> Thanks Peter. That's great news.
>>>>
>>>> I have managed to make task synchronization work. In my experiment, I
>>>> ran the master and a node in two different folders on the same
>>>> machine. Initially, the node does not contain any task code in its
>>>> task_cache. After the scheduler dispatches a task (I used the simplest
>>>> TestTask; others will presumably work too) to the worker, the node
>>>> automatically synchronizes the task code with the master. The same
>>>> happens if the task code is updated on the master. As discussed, task
>>>> synchronization is done asynchronously.
>>>>
>>>> Under the hood, two modules, namely TaskSyncClient and TaskSyncServer,
>>>> interact with each other. The former is located on the node, and the
>>>> latter resides on the master. Currently, TaskSyncClient is a module of
>>>> the Worker ModuleManager. After your refactoring is done, it won't be
>>>> hard to migrate it to the Node ModuleManager.
>>>>
>>>> The latest changes have been committed to the task_packaging branch on GitHub.
>>>>
>>>> On Tue, Aug 18, 2009 at 12:44 PM, Peter Krenesky<peter at osuosl.org> wrote:
>>>>
>>>>
>>>>> Hi all,
>>>>>
>>>>> I've started refactoring Master, Node, and Worker to change the way
>>>>> they relate to each other.  When this refactor is complete, Master
>>>>> will only communicate with Nodes.  Node will be the only component to
>>>>> interact with Workers.  Workers will be spawned per TaskInstance.
>>>>>
>>>>>
>>>>> == WHY? ==
>>>>>  - workers need to be chrooted (sandboxed) per TaskInstance to ensure
>>>>> no task can affect other users.  Even importing a task file to read task
>>>>> name and description puts the cluster at risk.
>>>>>
>>>>>  - Some libraries, django especially, can only be configured once per
>>>>> runtime.  This means changing datasources is not possible under the
>>>>> current system.
>>>>>
>>>>>  - less network overhead from TCP connections.
>>>>>
>>>>>  - simpler networking logic.
>>>>>
>>>>>
>>>>>
>>>>> == How? ==
>>>>>
>>>>> Master
>>>>>     - remove WorkerConnectionManager module
>>>>>     - change add_node() so that instead of adding workers to the
>>>>> checker, WORKER_CONNECTED signals are emitted with a special proxy
>>>>> object that mimics a WorkerAvatar but is really the remote from the
>>>>> Node.  This allows all other logic in Master to remain the same.
>>>>>    - change node disconnection logic to include disconnecting workers
>>>>> as well
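A rough sketch of what such a proxy object might look like; the class and method names here are my own guesses, not the actual refactor code:

```python
class WorkerProxy(object):
    """Mimics a WorkerAvatar so existing Master logic is unchanged,
    but forwards every call through the remote held by the Node
    connection.  Entirely hypothetical names."""

    def __init__(self, worker_id, node_remote):
        self.name = worker_id       # what Master's logic expects of an avatar
        self._node = node_remote    # e.g. a Twisted RemoteReference

    def remote(self, method, *args):
        # Route the call through the Node, tagged with the worker id,
        # so Master never holds a direct Worker connection.
        return self._node.callRemote("worker_proxy", self.name, method, *args)
```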
>>>>>
>>>>>
>>>>> Node
>>>>>     - Add WorkerConnectionManager Module, Master's version of this can
>>>>> be reused.
>>>>>     - Add mechanism for tracking running workers
>>>>>     - Add task_run that manages passing work to workers, and starting
>>>>> new workers.
>>>>>     - Add callback system to task_run to handle asynchronous nature of
>>>>> waiting for a worker to start before passing on a task_run
>>>>>     - Add remotes that proxy all other functions in
>>>>> worker_task_controls to worker avatars
>>>>>     - Add remotes that proxy master functions to MasterAvatar.
>>>>>
>>>>>
>>>>> Worker
>>>>>    - Modify WorkerConnectionManager to connect locally only and use
>>>>> Node key for auth.
>>>>>
>>>>>
>>>>>
>>>>> == status ==
>>>>>
>>>>> Much of the above code is in place, but it is not tested.  I'll
>>>>> likely have it complete within the next few days.
>>>>> _______________________________________________
>>>>> Pydra mailing list
>>>>> Pydra at osuosl.org
>>>>> http://lists.osuosl.org/mailman/listinfo/pydra
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>
>



-- 
Yin Qiu
Nanjing University, China

