[gsoc-dev] [Mirror-Syncing] Question: Rsync configuration on master node

Pranjal Mittal mittal.pranjal at gmail.com
Tue May 27 21:56:07 UTC 2014


On Tue, May 27, 2014 at 11:41 PM, Lance Albertson <lance at osuosl.org> wrote:

> [Moving to the list so others can see our discussion]
>
> On Sun, May 25, 2014 at 4:34 PM, Pranjal Mittal <mittal.pranjal at gmail.com> wrote:
>
>> How are you doing?
>>
>
> Doing good! We had a 3-day holiday weekend so I was AFK most of the time.
>

>
>> Somehow our times weren't able to align last week for a chat.
>>
>
> I should be around today and tomorrow. However, I should let you know that
> I will be out on personal vacation from May 29 - Jun 6. I can certainly
> make sure I reply to emails as best I can. Does 10AM PDT seem to work
> best for you? I'll be in CDT during my vacation, which may or may not
> help you out. Let's try to find a time slot that works on most days
> (minus weekends) and I'll try my best to make it.
>

Yes, 10 AM PDT is fine for me, and I should be around at this time on most
days. I wouldn't want to trouble you during your vacation, but if I have
to, I think emails would do.


>
>> I was keen to know if you have tried testing the master API that I have
>> written so far [1]. I am quite in need of suggestions to improve it and
>> add more features. :)
>>
>
> I'm looking now and I'm trying to wrap my head around this compared to
> how it currently works. Here are a few things I think might be helpful.
> Let me know if you'd prefer I move these to GitHub issues. We can at least
> discuss them here first and then create issues later.
>

Definitely, it makes sense to add these as issues on GitHub. I'll try to
add most of them after reading this mail, and you can comment there / add
missing ones.


> 1. Tweaking the cron API
>
> I'm not sure how this works in the code, but to me as a user it's hard to
> read how the interval looks. Would it be possible to just give it a full
> cron-like string (e.g. "43 6,18 * * *")? It may not make sense to do this
> since it's just an API and we can write our tools to do this, but it's
> something to consider.
>

Yes, I thought about this. I think it's better to keep it as is in the API,
since it is more readable and easier to understand when all the parameters
are named. I think we can build a *CLI tool* on top of this anyway that
uses the 5-element string format. (One more thing to consider: since
apscheduler provides scheduling with a precision of seconds, there is a 6th
parameter available if we want to use it as well.)
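
To illustrate, such a CLI layer could translate the string into named
parameters with something like the sketch below. The field names, and the
choice to put the optional seconds field first, are assumptions made for
this example and not what the API currently uses. Weekday numbering can also
differ between cron and scheduler libraries, so that would need checking.

```python
# Hypothetical helper: field names are assumed for illustration only.
CRON_FIELDS = ("minute", "hour", "day", "month", "day_of_week")


def cron_string_to_fields(expr):
    """Split a cron-like string into a dict of named schedule fields."""
    parts = expr.split()
    if len(parts) == 6:
        # Assumed convention: an optional leading seconds field.
        names = ("second",) + CRON_FIELDS
    elif len(parts) == 5:
        names = CRON_FIELDS
    else:
        raise ValueError("expected 5 or 6 fields, got %d" % len(parts))
    return dict(zip(names, parts))


if __name__ == "__main__":
    print(cron_string_to_fields("43 6,18 * * *"))
```

The API itself keeps its named parameters; only the tool would need to know
about the string format.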


>
> 2. start_date
>
> What is this used for? Is this the effective time to start syncing? IMO it
> should start as soon as you enter it in. If we need to disable a sync that
> should be an API flag to set (which should be set as enabled to sync by
> default).
>

Yes, this is the effective time at which syncing starts (start time
included). If this parameter is not provided, syncing is in effect as soon
as the project is added.

An enable/disable flag (enabled by default) is a good idea! I'll add this
as a GitHub issue.
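
Roughly, the two settings could combine as in the sketch below. The
function name and the `enabled` flag are hypothetical, since the flag does
not exist in the API yet:

```python
from datetime import datetime


def sync_is_active(now, start_date=None, enabled=True):
    """Return True if a project should be syncing at time `now`."""
    if not enabled:
        # A disabled project never syncs, whatever its start_date.
        return False
    if start_date is None:
        # No start_date: syncing is in effect as soon as the project is added.
        return True
    # The start time itself is included.
    return now >= start_date


if __name__ == "__main__":
    start = datetime(2014, 6, 1, 6, 43)
    print(sync_is_active(datetime(2014, 5, 27), start_date=start))
```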


>
> 3. field names
>
> I think it might be helpful to use slightly different names for some of
> the fields. Here are a couple of suggestions:
>
> host -> rsync_host
> source -> rsync_module
>
> That way we know it's specific to rsync. Also, it looks like source implies
> that "::" will be prefixed in the rsync command. That might be documented
> somewhere. I don't think forcing the use of :: is a problem, but we may
> want to be able to disable that at some point.
>

Sure, I'll rename these.
The *source* means the rsync module plus a relative path from the module
directory. E.g. if there is a module called "*foo*" on the rsync host, we
can also have the source as "*foo/subdir*" or "*foo/subdir/*". So, to be
precise, source is not necessarily just the module name.
But I still get the feeling rsync_module is a better name than source. Are
there any other alternatives that describe the use of this parameter better?
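
As a small illustration of how the two fields combine with the "::"
separator, here is a sketch. The function name and the host
"rsync.example.org" are made up for the example:

```python
def rsync_source(rsync_host, rsync_module):
    """Join host and module (plus optional path) into a daemon-style source.

    rsync_module may carry a relative path below the module, e.g.
    "foo", "foo/subdir" or "foo/subdir/".
    """
    return "%s::%s" % (rsync_host, rsync_module)


if __name__ == "__main__":
    print(rsync_source("rsync.example.org", "foo/subdir/"))
```

Note that rsync treats a trailing slash on the source as "copy the contents
of this directory", so "foo/subdir" and "foo/subdir/" are not
interchangeable.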


>
> 4. Extra rsync args
>
> I think it's going to be very important to have something like
> rsync_options for specific sync arguments we may need per project.
>

Oh yes, this is important. Having an *rsync_options* list of strings would
be a nice idea.


> 5. --delete options
>
> We will likely need to have a different --delete option for some repos.
> Sometimes we prefer --delete-before sometimes we like --delete-after, etc.
> It might make sense to add an rsync_delete option.
>

I think this could go in the *rsync_options* list itself (mentioned
above)? I mean, we can omit the "--delete" option from the list when we
want to use "--delete-before".
Does this make sense?


>
> 6. default rsync options
>
> We likely want to have a global rsync option parameter. We'll probably use
> something like "-avH" as the default but it would be good to either make
> this a global or per project option.
>

Per project looks like the easy way out. We can set it globally, but if we
do not want to use -avH in some cases, then we would need some way to
override it. I can set -avH as the default if we always need it?
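
One possible way to combine a global default with a per-project override is
sketched below. The names, the default list, and the paths in the usage
line are assumptions for illustration. Here a project's *rsync_options*
replaces the default entirely, which also covers the --delete variants:

```python
# Assumed global default; a project's own rsync_options replaces it entirely.
DEFAULT_RSYNC_OPTIONS = ["-avH", "--delete"]


def build_rsync_command(source, dest, rsync_options=None):
    """Return the rsync argument list for one project."""
    if rsync_options is None:
        options = DEFAULT_RSYNC_OPTIONS
    else:
        options = rsync_options
    # Copy the options so the shared default list is never mutated.
    return ["rsync"] + list(options) + [source, dest]


if __name__ == "__main__":
    # Hypothetical project that prefers --delete-after over --delete.
    print(build_rsync_command("rsync.example.org::foo/", "/data/ftp/.1/foo/",
                              rsync_options=["-avH", "--delete-after"]))
```

Replacing rather than merging keeps the behaviour easy to predict, at the
cost of repeating -avH in projects that only want a different delete mode.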


>
> 7. rsync user/password
>
> I'm a little confused how you're using this. Where do you specify the
> rsync user if it has a password? I would also make sure you use
> "rsync_user" and "rsync_password" so that we know what they are for.
>

The *project* (name) is itself the *rsync_user*.
I think it makes sense to rename it to rsync_user. Would you suggest that?
I used *project* because it looked close to the terminology in the
*osuosl-mirror-sync* <https://github.com/osuosl/osuosl-mirror-sync> code.



>
>
>> I am thinking about the slave node API too. As per the reference
>> architecture [2], we have to run an rsync daemon on the master from which
>> the ftp hosts pull when told to.
>> I was wondering how many rsync modules it is ideal to have on the master
>> node. Currently I see that we use one per array?
>>
>
> Is this the same module that we use with port 9000 currently? If so, see
> below.
>
> Well, we don't need to follow the same methods we used in the old setup
> since we already know which path to sync to/from. Ideally we just need one
> rsync module that runs on a specific port and has special permissions that
> only allow slaves to sync from it. Are you planning on having the app
> manage this rsync module/service? I'm not sure that's the ideal way to do
> it, so we may want to talk about this. This is getting into the realm of
> stuff configuration management could do vs. what the app should do.
>

I am not sure I understood all of this perfectly. Since I wrote you the
last email, I figured out that we might not have to modify the rsyncd.conf
file on the master.
There would be an API endpoint on the master through which slave hosts (+
port info) can be added.
Aren't we having one rsync module per array (2 modules on the master rsync
daemon)? I mean, as you mentioned, we could have just a single daemon that
points to the *ftp* directory on the master node, and ftp/.1/ or ftp/.2/
can be specified in the rsync_module or source (whatever we call it) when
adding projects.


>
>
>> The rsync configuration file on the master node will be fixed, I suppose
>> (i.e. not changing with addition/update/removal of projects via the API)?
>>
>
> See above.
>
> On the topic of the rsync config for the public facing rsync service, I do
> think this app should at least generate the config file for that. But let's
> talk about this more in person. We need to be careful on what we do or
> don't do with the app.
>

Yes, I will discuss this part with you when we next talk on IRC / Hangouts,
to understand it better.




>
>> [1] https://github.com/pramttl/mirror-sync-api
>> [2]
>> https://docs.google.com/presentation/d/1G3uTyIreF5JvAfRwVu0l751W0bHssdaB5kfAwPwANMM/
>
>
>
> --
> Lance Albertson
> Director
> Oregon State University | Open Source Lab
>


Thanks,
Kind Regards,
- Pranjal



-- 
Pranjal Mittal
B.Tech.  2014
Indian Institute of Technology,BHU
Varanasi, U.P,
India

Github <http://github.com/pramttl> |
LinkedIn <http://in.linkedin.com/pub/pranjal-mittal/26/660/318/> |
Blog <http://pranjalmittal.in>