[gsoc-dev] [GSoC 2014] Mirror Syncing - Progress Report and Usage

Kenneth Lett kennric at osuosl.org
Thu Jun 12 22:37:21 UTC 2014


Pranjal,

This is looking good so far; it looks like you have most of the basic
functionality in the master and client APIs. There are a couple of
minor style things that I would clean up:

Please read through PEP 257 [1] and consider updating the docstrings
in your code accordingly.
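For reference, here are the two docstring forms PEP 257 describes: a one-line docstring, and a multi-line one with a summary line, a blank line, then details. The function names and fields are hypothetical and only illustrate the layout. They are not the repository's actual API.

```python
def remove_project(name):
    """Remove the project with the given name from the sync schedule."""


def add_project(name, rsync_host, rsync_module):
    """Register a project for syncing from an upstream rsync source.

    Hypothetical signature used only to illustrate PEP 257 layout:
    a one-line summary ending in a period, a blank line, then details.

    Returns a dict describing the new project.
    """
    return {"name": name, "rsync_host": rsync_host, "rsync_module": rsync_module}
```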

Instead of string concatenation (x + "etc" + y), use %-style string
formatting: ("%s etc %s" % (x, y)).
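A quick side-by-side of the two styles, building the same URL. The hostname and port are placeholders. str.format() is included as the newer equivalent (available since Python 2.6).

```python
host = "ftp1.example.org"
port = 5000

# Concatenation: needs an explicit str() for non-strings and is easy to get wrong.
url_concat = "http://" + host + ":" + str(port) + "/"

# %-style expansion, as suggested above.
url_percent = "http://%s:%d/" % (host, port)

# str.format() produces the same result.
url_format = "http://{0}:{1}/".format(host, port)
```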

It might be a good idea to validate/sanitize the incoming POST data so
that a malicious user cannot send a bad string for the hostname.
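One way to do the hostname check is a minimal sketch like the following. It assumes RFC 1123-style DNS names (labels of letters, digits and hyphens, at most 253 characters overall). `is_valid_hostname` is a hypothetical helper, not something already in the code. Note that dotted IPv4 addresses also pass, since their labels are all digits.

```python
import re

# One DNS label: 1-63 letters, digits or hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_hostname(hostname):
    """Return True if hostname looks like a valid DNS hostname (RFC 1123 style)."""
    if not hostname or len(hostname) > 253:
        return False
    if hostname.endswith("."):
        hostname = hostname[:-1]  # allow a single trailing dot (fully qualified form)
    return all(_LABEL.match(label) for label in hostname.split("."))
```

A view would call this on the posted hostname and reject the request (e.g. with a 400 response) when it returns False.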

Shouldn't the port field in the model be an integer rather than a
string?
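With an integer port, range checks become trivial. A sketch of converting the posted value before it reaches the model follows. `parse_port` is a hypothetical helper, and no particular ORM column type is assumed.

```python
def parse_port(value):
    """Convert a port from POST data to an int, rejecting bad or out-of-range values."""
    try:
        port = int(value)
    except (TypeError, ValueError):
        raise ValueError("port must be an integer, got %r" % (value,))
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    return port
```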

Structurally, things look good. It might be a good idea to break the
modules up into models.py and views.py and put these in subdirectories:

master/
	__init__.py
	views.py
	models.py

slave/
	__init__.py
	views.py

One last thing: have you thought about a security model? As it stands,
anyone could POST to the API and create or modify projects. At some
point we'll need a system to prevent that, perhaps based on public keys.
I'll do a bit of research into how to do this with Flask apps, but feel
free to experiment if you have any ideas.
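As a simpler stand-in for the public-key idea (a different technique, using a shared secret rather than key pairs), requests could be signed with an HMAC over the raw body. The client would send the signature in a header (an `X-Signature` header, say, which is a hypothetical name), and the API would recompute and compare it before accepting the POST. The secret and helper names are placeholders.

```python
import hashlib
import hmac

# Hypothetical shared secret, distributed out of band to the master and to
# each authorized client. This is NOT the public-key scheme suggested above.
SHARED_SECRET = b"change-me"


def sign(body, secret=SHARED_SECRET):
    """Return a hex HMAC-SHA256 signature for the raw request body (bytes)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body, signature, secret=SHARED_SECRET):
    """Check a signature using a constant-time comparison to avoid timing attacks."""
    return hmac.compare_digest(sign(body, secret), signature)
```

On its own this does not stop replayed requests; including a timestamp or nonce in the signed body would be needed for that.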

[1] http://legacy.python.org/dev/peps/pep-0257/

Thanks,
Ken

On Thu, 12 Jun 2014 20:54:09 +0530
Pranjal Mittal <mittal.pranjal at gmail.com> wrote:

> Hello everyone,
> 
> I am quite excited to introduce you to my project on the re-architecture
> and implementation of tools for FTP mirror syncing, as part of GSoC
> 2014 with the awesome Open Source Lab.
> 
> *Background*
> 
> The objective of my project is to make the FTP mirror syncing process
> at OSL more scalable and easier to use by following a better
> architecture specification, and to build an API according to that spec
> on top of which other apps can be built, such as a visualization web
> interface and CLI tools.
> 
> The architecture specification is fairly straightforward and is
> described here [1].
> The architecture uses two types of API daemons: a Master API daemon
> that runs on the master node, and a Slave API daemon that runs on
> each of the FTP hosts.
> 
> The system administrator or user interacts primarily with the Master
> API daemon, while the Slave API daemon is used internally by the
> Master daemon, which sends it messages and asks it to perform tasks
> such as rsync-ing from the master node.
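To make the master-to-slave messaging concrete, here is a minimal sketch of how the Master daemon might ask one slave to pull a project. It assumes JSON over HTTP and Python 3. The `/sync_project/` endpoint, the payload fields and the slave port are hypothetical, not the repository's actual API. It only builds the request, so it can be inspected without a network.

```python
import json
from urllib.request import Request


def build_sync_request(slave_host, slave_port, project, master_host):
    """Build (but do not send) a POST asking one slave to rsync a project from the master."""
    payload = {"project": project, "master": master_host}
    url = "http://%s:%d/sync_project/" % (slave_host, slave_port)  # hypothetical endpoint
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Actually sending it would be `urllib.request.urlopen(req)`.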
> 
> 
> *Current Standing*
> 
> I have successfully implemented a Master API daemon with basic
> functionality such as adding, removing, and updating projects.
> The Slave API has also been implemented and interfaced with the
> Master API, so that projects synced by the master node are
> automatically synced by each of the slave nodes (after the master
> completes its sync from upstream). For this to work, the master node
> must already know about the slave nodes, so we have to add slave
> nodes to the master, as we do in Ganeti. (All of this is explained in
> the docs [3].)
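The requirement that the master knows its slaves beforehand can be sketched as a small registry that fans out to every known slave once the upstream sync finishes. `SlaveRegistry` and the `send` callback are hypothetical names. A real daemon would persist slaves in its model rather than in memory.

```python
class SlaveRegistry(object):
    """In-memory record of slave nodes known to the master (hypothetical sketch)."""

    def __init__(self):
        self._slaves = {}  # hostname -> API port

    def add(self, hostname, port):
        self._slaves[hostname] = port

    def remove(self, hostname):
        self._slaves.pop(hostname, None)

    def notify_all(self, project, send):
        """After the master finishes syncing `project` from upstream, ask every
        registered slave to pull it. `send` is any callable(host, port, project)."""
        results = {}
        for hostname, port in sorted(self._slaves.items()):
            results[hostname] = send(hostname, port, project)
        return results
```

Keeping `send` as a parameter means the fan-out logic can be tested without any slaves running.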
> 
> 
> 
> *How to use / Running the code*
> 
> 
>    - Clone the repository from here [2] on the master node.
>    - Start an rsync daemon on the master node with an appropriate
>      rsyncd.conf and password.
>    - Only this one rsync daemon, on the master node, is needed.
>    - Edit the settings.py file, where parameters such as
>      master_hostname, master_rsync_password, etc. are defined.
>    - Copy the repository with the edited settings.py file to each of
>      the nodes (both master and slaves); any location will do.
>    - Run the Master API daemon on the master node: python master.py
>      (runs on port 5000 by default).
>    - Run the Slave API daemon on each slave node: python slave.py
>    - You are now ready to use the API via the master node, as
>      explained in the README in the repository.
> 
>   - First, make the master node aware of each of the slave nodes by
>     adding the slave nodes to it.
>     (This step is redundant and will be removed later, since slaves
>     can inform the master automatically when their API starts.)
>   - Perform other actions, such as adding projects for syncing.
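A sketch of what the settings.py edited in the setup steps might look like. Only master_hostname, master_rsync_password and SLAVE_PUBLIC_DIR come from the description. MASTER_API_PORT and all of the values are placeholders.

```python
# settings.py -- illustrative values only.
master_hostname = "master.example.org"

# Assumed to be the password slaves use for the master's rsync daemon; it would
# need to match the "secrets file" configured in the master's rsyncd.conf.
master_rsync_password = "change-me"

# Where each slave publishes synced projects (its public_html directory).
SLAVE_PUBLIC_DIR = "/var/www/public_html"

# Hypothetical name; 5000 is the Master API default mentioned above.
MASTER_API_PORT = 5000
```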
> 
> 
> *Results*
> 
> The projects scheduled for syncing are synced by the master node to
> the appropriate destination and then automatically synced by each of
> the FTP hosts to the public_html directory defined by the
> SLAVE_PUBLIC_DIR setting in settings.py.
> The idea is minimum configuration and maximum flexibility: keeping as
> few settings as possible means that configuration management is not
> required and full control is exercised from the master node.
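The slave-side pull into SLAVE_PUBLIC_DIR could look roughly like this sketch. It builds the rsync argument list for pulling one project from the master's rsync daemon. The rsync module name and user are hypothetical, since they depend on the master's rsyncd.conf. Passing an argv list (rather than a shell string) also means a hostile project name cannot inject shell commands.

```python
import os
import subprocess


def build_rsync_command(master_hostname, module, project, public_dir, user="mirror"):
    """Return argv for a slave pulling one project from the master's rsync daemon.

    `module` and `user` are hypothetical; they depend on the master's rsyncd.conf.
    """
    source = "rsync://%s@%s/%s/%s/" % (user, master_hostname, module, project)
    dest = "%s/%s/" % (public_dir.rstrip("/"), project)
    # -a preserves permissions, times and links; -z compresses in transit;
    # --delete removes files that no longer exist on the master.
    return ["rsync", "-az", "--delete", source, dest]


def run_rsync(cmd, password):
    """Run the command, passing the daemon password via rsync's RSYNC_PASSWORD variable."""
    env = dict(os.environ, RSYNC_PASSWORD=password)
    subprocess.check_call(cmd, env=env)
```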
> 
> *Future Goals*
> 
> 
>    - The goal ahead is to add more features to the API and minimize
>      the amount of configuration involved.
>    - Provision for feedback. For example, after the slaves complete
>      their rsync from the master they can inform it, so that progress
>      is tracked centrally (useful for a web interface later).
>    - CLI tools over the API for convenience. In the meantime, example
>      Python scripts have been provided which are self-explanatory and
>      can be used for this purpose [4].
>    - { Testing, bug-patching, improving documentation, getting
>      feedback and improving } x REPEAT
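The feedback idea could start as a minimal master-side tracker of which slaves have reported completing which project. `SyncStatusTracker` and its methods are hypothetical. The clock is injectable so the behaviour can be checked without real timestamps.

```python
import time


class SyncStatusTracker(object):
    """Hypothetical master-side record of which slaves finished syncing which project."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._done = {}  # (slave, project) -> completion timestamp

    def report(self, slave, project):
        """Record that a slave reported its rsync of `project` from the master finished."""
        self._done[(slave, project)] = self._clock()

    def last_synced(self, slave, project):
        """Return the completion timestamp, or None if never reported."""
        return self._done.get((slave, project))

    def pending(self, project, slaves):
        """Return the slaves that have not yet reported `project` as synced."""
        return [s for s in slaves if (s, project) not in self._done]
```

A web interface could then poll `pending()` to show which FTP hosts are still behind.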
> 
> 
> *Feedback plea*
> 
> I would definitely encourage everyone to try the code, as I am hungry
> for feedback. :D Feedback is very important for me to be able to
> improve this code, add more useful features, and fix bugs.
> Anyone is welcome to contribute to the repository too, and feel free
> to ask me any questions about the setup process, since I might not
> have described everything well.
> 
> 
> [1] https://docs.google.com/presentation/d/1G3uTyIreF5JvAfRwVu0l751W0bHssdaB5kfAwPwANMM/
> [2] https://github.com/pramttl/mirror-sync-api
> [3] https://github.com/pramttl/mirror-sync-api/blob/develop/README.md
> [4] https://github.com/pramttl/mirror-sync-api/tree/develop/examples
> 
> 
> 
> Thanks a lot!
> Cheers,
> - Pranjal
> 
> 


