[gsoc-dev] [GSoC 2014] Mirror Syncing - Progress Report and Usage

Pranjal Mittal mittal.pranjal at gmail.com
Fri Jun 13 08:56:37 UTC 2014


Hi Ken,

Thanks for the feedback. I will read PEP 257 on docstrings.
I have added the string expansion case to the GitHub issues and will
correct it.
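
For reference, a minimal sketch of both fixes together: a PEP 257 style
docstring, and % string expansion in place of concatenation. The function
name and the rsync URL layout are made up for illustration and are not
taken from the repository.

```python
def build_rsync_source(hostname, module, project):
    """Return the rsync source URL for a project on the given host.

    Hypothetical helper for illustration only; the name and the URL
    layout are assumptions, not code from the repository.
    """
    # String expansion instead of "rsync://" + hostname + "/" + module + ...
    return "rsync://%s/%s/%s/" % (hostname, module, project)


print(build_rsync_source("master.example.org", "mirror", "fedora"))
```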

I haven't quite thought through the security model yet, but we must think
about it. I will try to figure something out, perhaps an
authorization header in the POST requests.
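
One rough idea, only a sketch and not something the code does today: the
master signs each POST body with a shared secret and sends the signature in
a header, and the slave recomputes the signature and compares. The header
name X-Mirror-Signature, the secret, and the function names are all
assumptions for illustration.

```python
import hashlib
import hmac


def sign_body(secret, body):
    """Return a hex HMAC-SHA256 signature of the request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_authorized(headers, body, secret):
    """Check the (hypothetical) X-Mirror-Signature header against the body."""
    received = headers.get("X-Mirror-Signature", "")
    expected = sign_body(secret, body)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(received, expected)


secret = b"change-me"  # assumed shared secret, e.g. kept in settings.py
body = b'{"project": "fedora"}'
headers = {"X-Mirror-Signature": sign_body(secret, body)}
print(is_authorized(headers, body, secret))
```

A shared secret keeps the configuration small, which fits the goal of few
settings; public keys, as Ken suggested, would avoid having the same secret
on every node.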

Splitting up looks like a good idea. I was thinking that single files for
the master and slave APIs could be very convenient from an end-user
perspective, but we already have a separate settings.py file anyway. So we
can split the code into two apps, *master* and *slave*, as per the
structure you suggested.

P.S: I messed up the "How to use" section in my first email with typos and
some errors. Adding a corrected version below.

*How to use / Running the code*

   - Clone the repository from here [2] on the master node.
   - Start an rsync daemon on the master node with an appropriate rsyncd
   conf file and password.
   - We will keep only one rsync *module* on the master node.
   - Edit the settings.py file suitably where parameters like
   master_hostname, master_rsync_password etc are defined.
   - Now copy the repository with the new settings.py file to each of the
   nodes (both master and slaves) at any location.
   - Run the *master api* on the master node: python master.py (Runs on
   port 5000 by default)
   - Run the *slave api* on each of the slave nodes: python slave.py
   - You are now ready to use the API via the master node, as explained in
   README in the repository.
      - First make the master node aware of each of the slave nodes
        by adding them there.
      - Then perform other actions like adding projects for syncing, etc.
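
To make the last step concrete, here is a rough sketch of registering a
slave with the master API from Python. The /add_slave/ path, the JSON field
names, the hostnames, and the slave port are placeholders I made up for
illustration; the README has the actual endpoints. Only port 5000 comes from
the steps above. The request is built but not sent.

```python
import json
import urllib.request

# Placeholder host; 5000 is the master API's default port
MASTER_URL = "http://master.example.org:5000"


def build_add_slave_request(slave_hostname, slave_port):
    """Build (but do not send) a POST request that registers a slave node.

    The "/add_slave/" path and the payload keys are hypothetical.
    """
    payload = json.dumps({"hostname": slave_hostname, "port": int(slave_port)})
    return urllib.request.Request(
        "%s/add_slave/" % MASTER_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_add_slave_request("ftp1.example.org", 7000)
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```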



Best Regards,
- Pranjal


On Fri, Jun 13, 2014 at 4:07 AM, Kenneth Lett <kennric at osuosl.org> wrote:

> Pranjal,
>
> This is looking good so far. It looks like you have most of the basic
> functionality in the master and client APIs. There are a couple of
> minor style things that I would clean up:
>
> Please read through PEP-257 [1] and consider updating your docstrings
> in the code
>
> Instead of using string concatenation (x + "etc" + y), use string
> expansions: ("%s etc %s" % (x, y))
>
> It might be a good idea to validate/sanitize the incoming post data, so
> a malicious user can not send a bad string for the hostname.
>
> Shouldn't the port field in the model be an integer rather than a
> string?
>
> Structurally, things look good - it might be a good idea to break the
> modules up into models.py and views.py, and put these in subdirectories
>
> master/
>         __init__.py
>         views.py
>         models.py
>
> slave/
>         __init__.py
>         views.py
>
> One last thing: have you thought about a security model? As-is, anyone
> could POST to the API and create/modify projects. At some point we'll
> need a system to prevent that, perhaps public keys. I'll do a bit of
> research into how to do this with Flask apps, but feel free to
> experiment if you have any ideas.
>
> [1] http://legacy.python.org/dev/peps/pep-0257/
>
> Thanks,
> Ken
>
>  On Thu, 12 Jun 2014 20:54:09 +0530
> Pranjal Mittal <mittal.pranjal at gmail.com> wrote:
>
> > Hello everyone,
> >
> > I am quite excited to introduce you to my project on Re-architecture
> > and implementation of tools for FTP Mirror Syncing as a part of GSoC
> > 2014 with the awesome Open Source Lab.
> >
> > *Background*
> >
> > The objective of my project is to make the FTP Mirror Syncing process
> > at OSL more scalable and easier to use by following a better
> > architecture specification, and to build an API as per the spec that
> > allows apps to be built on top of it, such as a visualization web
> > interface, CLI tools, etc.
> >
> > The architecture specifications are pretty straightforward and are
> > described here [1].
> > The architecture makes use of two types of API daemons: a Master API
> > daemon that runs on the master node and a Slave API daemon that runs
> > on each of the FTP hosts.
> >
> > The system administrator or user primarily interacts with the Master
> > API daemon. The Master daemon uses the Slave API daemon internally,
> > sending it messages that ask it to perform tasks like rsync-ing from
> > the master node.
> >
> >
> > *Current Standing*
> >
> > I have successfully implemented a Master API daemon with basic
> > functionality like adding, removing, and updating projects.
> > The Slave Node API has also been implemented and interfaced with the
> > Master API, so that projects synced by the master node are
> > automatically synced by each of the slave nodes (after the master
> > completes its syncing from upstream). For this, the master node must
> > be aware of the slave nodes beforehand, so we have to add slave nodes
> > to the master like we do in Ganeti. (All of this is explained in the
> > docs [3].)
> >
> >
> >
> > *How to use / Running the code*
> >
> >
> >    - Clone the repository from here [2] on the master node.
> >    - Start an rsync daemon on the master node with an appropriate
> > rsyncd conf and password.
> >    - We will keep only one rsync daemon on the master node.
> >    - Edit the settings.py file suitably where parameters like
> >    master_hostname, master_rsync_password etc are defined.
> >    - Now copy the repository with the new settings.py file to each of
> > the nodes (both master and slaves) at any location.
> >    - Run an rsync daemon on the master node: python master.py (Runs
> > on port 5000 by default)
> >    - Run an rsync daemon on each of the slave node: python slave.py
> >    - You are now ready to use the API via the master node, as
> > explained in README in the repository.
> >
> >   - We will first make the master node aware of each of the slave
> > nodes by adding slave nodes there.
> >     (redundant, will be removed subsequently as slaves can
> > automatically inform master on starting api)
> >   - Perform other actions like adding projects for syncing, etc
> >
> >
> > *Results*
> >
> > We will see that the projects scheduled for syncing are synced by the
> > master node to the appropriate destination and then automatically
> > synced by each of the FTP hosts to the public_html directory, as
> > defined by the SLAVE_PUBLIC_DIR setting in the settings.py file.
> > The idea is minimum configuration and maximum flexibility: the goal is
> > to keep as few settings as possible, so that configuration management
> > is not required and full control is achieved from the master node.
> >
> > *Future Goals*
> >
> >
> >    - The goal ahead is to add more features to the API and minimize
> > the amount of configuration involved.
> >    - Provision for feedback. For example, after the slaves complete
> > the rsync from the master they can inform the master, so that we keep
> > track of things centrally (useful for a web interface later).
> >    - CLI tools over the API for convenience, though self-explanatory
> > example Python scripts have been provided that can be used for the
> > purpose. [4]
> >    - { Testing, bug-patching, improving documentation, getting
> > feedback and improving } x REPEAT
> >
> >
> > *Feedback plea*
> >
> > I would definitely encourage everyone to try the code as I am hungry
> > for feedback. :D The feedback is very important for me to be able to
> > improve this code, add more useful features, and patch bugs.
> > Anyone is welcome to contribute to the repository too. Feel free
> > to ask me any questions about the setup process, since I might not
> > have described everything well.
> >
> >
> > [1] https://docs.google.com/presentation/d/1G3uTyIreF5JvAfRwVu0l751W0bHssdaB5kfAwPwANMM/
> > [2] https://github.com/pramttl/mirror-sync-api
> > [3] https://github.com/pramttl/mirror-sync-api/blob/develop/README.md
> > [4] https://github.com/pramttl/mirror-sync-api/tree/develop/examples
> >
> >
> >
> > Thanks a lot!
> > Cheers,
> > - Pranjal
> >
> >
>
> _______________________________________________
> gsoc-dev mailing list
> gsoc-dev at lists.osuosl.org
> http://lists.osuosl.org/mailman/listinfo/gsoc-dev
>



-- 
Best Regards,
Pranjal Mittal
B.Tech.  2014
Indian Institute of Technology,BHU
Varanasi, U.P,
India

Github <http://github.com/pramttl> | LinkedIn
<http://in.linkedin.com/pub/pranjal-mittal/26/660/318/> | Blog
<http://pranjalmittal.in>