[Pydra] Master-Node-Worker relationship refactor
peter at osuosl.org
Tue Aug 18 21:31:21 UTC 2009
It does not appear to load existing tasks properly. This happened when
I ran the task a second time:
line 114, in remoteMessageReceived
state = method(*args, **kw)
line 73, in run_task
workunit_key, main_worker, task_id)
line 304, in retrieve_task
module_search_path = [pydraSettings.tasks_dir, pkg.folder \
exceptions.AttributeError: TaskPackage instance has no attribute 'folder'
Otherwise it works really well, great job!
I didn't test it with larger files, but I didn't really notice the
extra load time while syncing. I like how you split out the package
loading as well; that's a great way of handling it.
It handles updated files, but can it handle multiple versions of the
same task, so that a task can be updated while it is running? I know that
one was a bit more complicated and likely required saving files in a
different directory structure.
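
To make the "different directory structure" idea concrete, here is a minimal sketch, not what the branch currently does. It assumes each version of a task package is stored in its own folder named after a hash of its contents, so a running task can keep using the old folder while a new version lands next to it. The names package_version and store_version and the task_name/<version>/ layout are made up for illustration.

```python
import hashlib
import os
import tempfile


def package_version(files):
    """Return a hex digest over file names and contents.

    `files` maps a file name to its content as bytes (hypothetical shape).
    Names are sorted so the digest does not depend on dict ordering.
    """
    digest = hashlib.sha1()
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(files[name])
        digest.update(b"\0")
    return digest.hexdigest()


def store_version(cache_dir, task_name, files):
    """Write files to cache_dir/task_name/<version>/ and return that folder.

    An existing folder for the same version is left untouched, so a task
    that is still running from it is not disturbed.
    """
    version = package_version(files)
    folder = os.path.join(cache_dir, task_name, version)
    if not os.path.isdir(folder):
        os.makedirs(folder)
        for name, content in files.items():
            with open(os.path.join(folder, name), "wb") as handle:
                handle.write(content)
    return folder


if __name__ == "__main__":
    cache = tempfile.mkdtemp()
    old = store_version(cache, "TestTask", {"task.py": b"print(1)\n"})
    new = store_version(cache, "TestTask", {"task.py": b"print(2)\n"})
    # Both versions now sit side by side under cache/TestTask/.
    print(old != new)
```

With a layout like this, a worker would be pointed at one specific version folder when its task starts, and cleanup of old folders could wait until nothing is running from them.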
Yin QIU wrote:
> Thanks Peter. That's great news.
> I have managed to make task synchronization work. In my experiment, I
> ran the master and a node in two different folders on the same
> machine. Initially, the node does not contain any task code in its
> task_cache. After the scheduler dispatches a task (I used the simplest
> TestTask; others will presumably work too) to the worker, the node
> automatically synchronizes the task code with the master. The same
> happens if the task code is updated on the master. As
> discussed, task synchronization is done asynchronously.
> Under the hood, two modules, namely TaskSyncClient and TaskSyncServer,
> interact with each other. The former is located on the node, and the
> latter resides on the master. Currently, TaskSyncClient is a module of
> the Worker ModuleManager. After your refactoring is done, it won't be
> hard to migrate it to the Node ModuleManager.
> Latest changes have been committed to the task_packaging branch on github.
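
For anyone following along, the exchange described above can be pictured roughly like this. It is only a sketch under my own assumptions: the SyncServer and SyncClient classes and their methods are hypothetical stand-ins, not the actual TaskSyncClient/TaskSyncServer API, the version check is a plain SHA-1 comparison, and the calls are synchronous here even though the real synchronization is asynchronous.

```python
import hashlib


def digest_of(source):
    """Hex digest used to compare the cached code with the master's copy."""
    return hashlib.sha1(source).hexdigest()


class SyncServer:
    """Hypothetical stand-in for the master side: holds the current task code."""

    def __init__(self):
        self.tasks = {}

    def publish(self, name, source):
        self.tasks[name] = source

    def current_digest(self, name):
        return digest_of(self.tasks[name])

    def fetch(self, name):
        return self.tasks[name]


class SyncClient:
    """Hypothetical stand-in for the node side: keeps a local task cache."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.downloads = 0  # counts transfers, for illustration only

    def ensure_current(self, name):
        """Download the task code only if it is missing or out of date."""
        wanted = self.server.current_digest(name)
        cached = self.cache.get(name)
        if cached is None or digest_of(cached) != wanted:
            self.cache[name] = self.server.fetch(name)
            self.downloads += 1
        return self.cache[name]


if __name__ == "__main__":
    server = SyncServer()
    server.publish("TestTask", b"version one")
    client = SyncClient(server)
    client.ensure_current("TestTask")   # empty cache: downloads
    client.ensure_current("TestTask")   # up to date: no transfer
    server.publish("TestTask", b"version two")
    client.ensure_current("TestTask")   # changed on master: downloads again
    print(client.downloads)
```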
> On Tue, Aug 18, 2009 at 12:44 PM, Peter Krenesky<peter at osuosl.org> wrote:
>> Hi all,
>> I've started refactoring Master, Node, and Worker to change the way in
>> which they relate to each other. When this refactor is complete, Master
>> will only communicate with Nodes. Node will be the only component to
>> interact with Workers. Workers will be spawned per TaskInstance.
>> == WHY? ==
>> - workers need to be chrooted (sandboxed) per TaskInstance to ensure
>> no task can affect other users. Even importing a task file to read task
>> name and description puts the cluster at risk.
>> - Some libraries, Django especially, can only be configured once per
>> runtime. This means changing datasources is not possible under the
>> current system.
>> - less network overhead from TCP connections.
>> - simpler networking logic.
>> == How? ==
>> - remove WorkerConnectionManager module
>> - change add_node() so that instead of adding workers to the
>> checker, WORKER_CONNECTED signals are emitted with a special proxy object
>> that mimics a WorkerAvatar but is really the remote from the Node. This
>> allows all other logic in Master to remain the same.
>> - change node disconnection logic to include disconnecting workers
>> as well
>> - Add WorkerConnectionManager Module, Master's version of this can
>> be reused.
>> - Add mechanism for tracking running workers
>> - Add task_run that manages passing work to workers, and starting
>> new workers.
>> - Add callback system to task_run to handle asynchronous nature of
>> waiting for a worker to start before passing on a task_run
>> - Add remotes that proxy all other functions in
>> worker_task_controls to worker avatars
>> - Add remotes that proxy master functions to MasterAvatar.
>> - Modify WorkerConnectionManager to connect locally only and use
>> Node key for auth.
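
The task_run and callback items in the list above could look something like the following. This is a rough sketch under assumptions, not the code in the branch: WorkerTracker, worker_connected, worker_stopped and the spawn callable are hypothetical names, and plain Python callables stand in for whatever deferred/callback mechanism the real Node uses.

```python
class WorkerTracker:
    """Tracks one worker per task instance and queues work until it is ready."""

    def __init__(self, spawn):
        self.spawn = spawn    # hypothetical: callable that starts a worker process
        self.workers = {}     # task_instance_id -> connected worker
        self.pending = {}     # task_instance_id -> callbacks awaiting the worker

    def task_run(self, task_instance_id, work):
        """Pass `work` to the worker, starting the worker first if needed."""
        worker = self.workers.get(task_instance_id)
        if worker is not None:
            work(worker)
            return
        # No worker yet: queue the work, and spawn only on the first request.
        first_request = task_instance_id not in self.pending
        self.pending.setdefault(task_instance_id, []).append(work)
        if first_request:
            self.spawn(task_instance_id)

    def worker_connected(self, task_instance_id, worker):
        """Called once the spawned worker has connected back to the node."""
        self.workers[task_instance_id] = worker
        for work in self.pending.pop(task_instance_id, []):
            work(worker)

    def worker_stopped(self, task_instance_id):
        """Forget a worker, for example when its task instance finishes."""
        self.workers.pop(task_instance_id, None)


if __name__ == "__main__":
    spawned = []
    tracker = WorkerTracker(spawned.append)
    tracker.task_run(7, lambda worker: print("running on", worker))
    # Nothing has run yet; the callback fires once the worker connects.
    tracker.worker_connected(7, "worker-7")
```

The point of queueing callbacks per task instance is that several requests arriving while a worker is still starting trigger only one spawn and are then delivered in order.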
>> == status ==
>> Much of the above code is in place, but it is not tested. I'll likely have
>> it complete within the next few days.