[gsoc-dev] [GSoC 2014] Mirror Syncing - Progress Report and Usage

Pranjal Mittal mittal.pranjal at gmail.com
Thu Jun 12 15:24:09 UTC 2014


Hello everyone,

I am quite excited to introduce you to my project on Re-architecture and
implementation of tools for FTP Mirror Syncing as a part of GSoC 2014 with
the awesome Open Source Lab.

*Background*

The objective of my project is to make the FTP Mirror Syncing process at
OSL more scalable and easier to use. To do that, it follows a better
architecture specification and builds an API according to that spec, so
that apps such as a visualization web interface or CLI tools can be built
on top of it.

The architecture specification is pretty straightforward and is described
here [1].
The architecture makes use of two types of API daemons: a Master API daemon
that runs on the master node and a Slave API daemon that runs on each of
the FTP hosts.

The system administrator or user primarily interacts with the Master API
daemon. The Master daemon uses the Slave API daemon internally, sending it
messages that ask it to perform tasks such as rsync-ing from the master
node.


*Current Standing*

I have successfully implemented a Master API daemon with basic
functionality like adding, removing and updating projects.
The Slave Node API has also been implemented and interfaced with the Master
API, so that projects synced by the master node are automatically synced
by each of the slave nodes (after the master completes its syncing from
upstream). For this to work, the master node must know about the slave
nodes beforehand, so we have to add slave nodes to the master like we do in
Ganeti. (All of this is explained in the docs [3].)
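To make the flow above concrete, here is a minimal sketch of the bookkeeping
involved. It is not the code from the repository: the class name
MasterRegistry, its methods and the example hostnames are all made up for
illustration, and the real daemon talks to slaves over its API rather than
returning a list.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRegistry:
    """Hypothetical in-memory model of what the master keeps track of."""

    slaves: set = field(default_factory=set)      # hostnames of known slave nodes
    projects: dict = field(default_factory=dict)  # project name -> upstream source

    def add_slave(self, hostname):
        # The master has to know a slave before it can ask it to sync.
        self.slaves.add(hostname)

    def add_project(self, name, upstream_url):
        self.projects[name] = upstream_url

    def remove_project(self, name):
        self.projects.pop(name, None)

    def on_master_sync_complete(self, project):
        """Return the (slave, project) sync requests the master would send out."""
        if project not in self.projects:
            raise KeyError(project)
        # Sorted only so the order of requests is predictable.
        return [(slave, project) for slave in sorted(self.slaves)]


registry = MasterRegistry()
registry.add_slave("ftp2.example.org")
registry.add_slave("ftp1.example.org")
registry.add_project("fedora", "rsync://upstream.example.org/fedora/")
requests = registry.on_master_sync_complete("fedora")
```

The point of the sketch is only the ordering: slaves are registered first,
and a slave sync is requested once the master has finished its own sync
from upstream.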



*How to use / Running the code*


   - Clone the repository from here [2] on the master node.
   - Start an rsync daemon on the master node with an appropriate
   rsyncd.conf and password.
   - We will keep only one rsync daemon, on the master node.
   - Edit the settings.py file suitably; parameters like master_hostname,
   master_rsync_password, etc. are defined there.
   - Now copy the repository with the new settings.py file to each of the
   nodes (both master and slaves) at any location.
   - Run the Master API daemon on the master node: python master.py (runs
   on port 5000 by default)
   - Run the Slave API daemon on each of the slave nodes: python slave.py
   - You are now ready to use the API via the master node, as explained in
   the README in the repository.

  - We will first make the master node aware of each of the slave nodes by
adding the slave nodes there.
    (This step is redundant and will be removed subsequently, as slaves can
automatically inform the master when their API starts.)
  - Perform other actions like adding projects for syncing, etc.
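Since the same settings.py is copied to every node, it can help to check it
once before copying. The sketch below is only an illustration: the three
setting names are the ones mentioned in this mail, all values are
placeholders, and the SETTINGS dict and the missing_settings helper are
hypothetical (the real settings.py may be laid out differently).

```python
# Placeholder values; only the three names come from this mail.
SETTINGS = {
    "master_hostname": "master.example.org",
    "master_rsync_password": "change-me",
    "SLAVE_PUBLIC_DIR": "/home/mirror/public_html",
}

REQUIRED = ("master_hostname", "master_rsync_password", "SLAVE_PUBLIC_DIR")


def missing_settings(settings):
    """Return the required setting names that are absent or empty."""
    return [name for name in REQUIRED if not settings.get(name)]


problems = missing_settings(SETTINGS)
```

An empty list means all three settings are filled in; anything else names
what still has to be set before the repository is copied to the nodes.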


*Results*

We will see that the projects scheduled for syncing are synced by the
master node to the appropriate destination and then automatically synced by
each of the FTP hosts to the public_html directory as defined by the
SLAVE_PUBLIC_DIR setting in the settings.py file.
The idea is to have minimum configuration and maximum flexibility, so the
goal is to keep as few settings as possible so that configuration
management is not required and full control is achieved from the master
node.
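As a rough picture of the slave-side step, a slave could pull a project from
the master's rsync daemon with a command like the one built below. This is a
sketch under assumptions, not the repository's code: the function name, the
rsync user, the module name "ftp" and the password file path are all
hypothetical. The rsync:// URL form and the -a and --password-file options
are standard rsync.

```python
import posixpath


def build_slave_rsync_command(user, master_hostname, module, project,
                              public_dir, password_file):
    """Build (but do not run) an rsync pull from the master's rsync daemon."""
    # rsync daemon URL: rsync://USER@HOST/MODULE/PATH
    source = "rsync://{}@{}/{}/{}/".format(user, master_hostname, module, project)
    # Each project gets its own directory under the slave's public directory.
    destination = posixpath.join(public_dir, project) + "/"
    return [
        "rsync",
        "-a",                                 # archive mode
        "--password-file=" + password_file,   # daemon password read from a file
        source,
        destination,
    ]


command = build_slave_rsync_command(
    user="mirror",                            # assumed rsync user
    master_hostname="master.example.org",     # would come from master_hostname
    module="ftp",                             # assumed rsyncd module name
    project="fedora",
    public_dir="/home/mirror/public_html",    # would come from SLAVE_PUBLIC_DIR
    password_file="/etc/rsync.pass",          # assumed location
)
```

The resulting list could be handed to subprocess.run(command, check=True);
it is built as a list rather than a string so that no shell quoting is
needed.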

*Future Goals*


   - The goal ahead is to add more features to the API and minimize the
   amount of configuration involved.
   - Provision for feedback. Example: after the slaves complete the rsync
   from master they can inform the master, so that we can keep track of
   things centrally (useful for the web interface later).
   - CLI tools over the API for convenience, though self-explanatory
   example Python scripts have been provided which can be used for the
   purpose. [4]
   - { Testing, bug-patching, improving documentation, getting feedback and
   improving } x REPEAT
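The feedback idea from the list above could look roughly like this on the
master side. It is not implemented yet, so everything here (the SyncTracker
class, its methods, the hostnames) is a hypothetical sketch of one possible
shape.

```python
class SyncTracker:
    """Hypothetical central record of which slaves have finished a project."""

    def __init__(self, slaves):
        self.slaves = set(slaves)
        self.completed = {}  # project name -> set of slaves that reported done

    def report_done(self, slave, project):
        # Called when a slave tells the master its rsync has finished.
        if slave not in self.slaves:
            raise ValueError("unknown slave: " + slave)
        self.completed.setdefault(project, set()).add(slave)

    def pending(self, project):
        """Slaves that have not yet reported completion for this project."""
        return sorted(self.slaves - self.completed.get(project, set()))


tracker = SyncTracker(["ftp1.example.org", "ftp2.example.org"])
tracker.report_done("ftp1.example.org", "fedora")
still_waiting = tracker.pending("fedora")
```

A web interface could then show, per project, which FTP hosts are still
catching up.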


*Feedback plea*

I would definitely encourage everyone to try the code as I am hungry for
feedback. :D The feedback is very important for me to be able to improve
this code going forward, add more useful features and patch bugs. Anyone is
welcome to contribute to the repository too, and feel free to ask me any
questions on the setup process since I might not have described everything
well.


[1]
https://docs.google.com/presentation/d/1G3uTyIreF5JvAfRwVu0l751W0bHssdaB5kfAwPwANMM/
[2] https://github.com/pramttl/mirror-sync-api
[3] https://github.com/pramttl/mirror-sync-api/blob/develop/README.md
[4] https://github.com/pramttl/mirror-sync-api/tree/develop/examples



Thanks a lot!
Cheers,
- Pranjal


-- 
Pranjal Mittal
B.Tech.  2014
Indian Institute of Technology,BHU
Varanasi, U.P,
India

Github <http://github.com/pramttl> | LinkedIn
<http://in.linkedin.com/pub/pranjal-mittal/26/660/318/> | Blog
<http://pranjalmittal.in>