[darcs-devel] patch for lazy partial repos

David Roundy droundy at darcs.net
Tue Apr 10 13:06:44 PDT 2007


On Tue, Apr 10, 2007 at 09:21:37PM +0200, Eric Y. Kow wrote:
> On Mon, Apr 09, 2007 at 17:24:33 -0700, David Roundy wrote:
> > Right, I just couldn't think of a better description than "lazy" for what
> > we're doing.  It's not a very good name, since it's probably only
> > meaningful to haskell programmers and CS folks.
> 
> Oh.  I hadn't thought of that.  Maybe something like --postpone-download

Or maybe --download-when-needed?  This is the sort of thing I'd probably
run by darcs-users.  But I don't mind leaving it as lazy for a while, since
hashed repositories won't stabilize for a few months, at least.

> > > Perhaps it would be good if --lazy implied --partial and
> > > --hashed-inventory
> > 
> > Yes, I thought about that and wasn't sure.  And am still unsure.
> 
> What might also be interesting is if --lazy were completely orthogonal
> to --partial.  You know, lazy complete repositories which just avoid
> getting patches until they have to.

I thought about that, but a lazy complete repository would always be unlazy
(all the patches downloaded) by the time the get is completed.  On the
other hand, a lazy complete get would be resumable by running repair:
if darcs dies before all the patches have been downloaded, you could just
run repair to finish downloading the rest, so that's not a bad idea...

We could potentially introduce a get --resume (or resume get?) which
effectively does a repair and a revert --all.
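
Roughly, as a sketch only (resumeGet, repairRepo, and revertAll are
made-up names standing in for the real repair and revert entry points):

resumeGet :: [DarcsFlag] -> IO ()
resumeGet opts = do
  repairRepo opts  -- finish downloading any patches we're still missing
  revertAll opts   -- the equivalent of revert --all, discarding any
                   -- half-finished working-directory changes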

> > I'd prefer to always gzip them, but don't think it's a big deal.  In any
> > case, the user should be able to change the option using optimize.
> 
> Yay! I was wondering why they weren't being gzipped.  Also, would it
> make the code complicated to have the .gz extension on the patch
> filenames?

I'd rather not; either drop the .gz extension entirely or always keep it.
It's just easier for the file to always have the same name, so we don't
have to try twice to locate it.  And I thought it'd be a bit prettier not
to have the .gz extension, since this really is an internal data file.
But I could go either way (and historically, I *have* gone both ways) on
this question.
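
To make the lookup cost concrete, here's a minimal sketch (locatePatch
is a made-up name, not what the real code calls it):

import System.Directory (doesFileExist)
import System.FilePath ((</>))

-- With a fixed name, the first probe always suffices; if the extension
-- can vary, a miss forces a second probe.
locatePatch :: FilePath -> String -> IO FilePath
locatePatch dir hash = do
  let plain = dir </> hash
  exists <- doesFileExist plain
  if exists
    then return plain
    else return (plain ++ ".gz")  -- second filesystem probe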

> > > Also, this mechanism suffers from a variant of --partial's leapfrog
> > > problem (double get):
> 
> To be fair, how often do people actually do this?

More than you might think, I'd say.

> > One option that could alleviate things would be to simply use symlinks or
> > hard links when the url is local (i.e. has no ':' in it).  This might cover
> > many such possibilities.
> 
> Well, I guess we could try to use hardlinks if it's local, with the
> option of failing if it's a different filesystem, e.g. NFS mount.
> Symlinks frighten me somewhat.  What if somebody moves the source repo?
> 
> Actually, would that be a source of concern for reading lazy patches?
> One thing I've always liked about darcs's distributedness is the
> inherent robustness. Somebody move the repo away? No problem, just
> start pulling from elsewhere.  How would we deal with a file no
> longer existing?

The reason I'd be happy with symlinks is that they have precisely the same
behavior as lazy patches, in that they're unsafe if the source repository
might change.  It's something we'll have to make clear, that you don't want
to use --lazy unless you're sure the source repository will stay at the
same url.
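
For concreteness, a minimal sketch of the local case (linkOrFetchPatch
and copyRemote are made-up names; copyRemote stands in for our existing
download code):

import System.Posix.Files (createLink)

-- Treat a "url" with no ':' as a local path, as suggested above, and
-- hard link instead of copying.  createLink fails across filesystems
-- (e.g. an NFS mount), so we'd want to catch that and fall back to an
-- ordinary copy.
linkOrFetchPatch :: String -> FilePath -> IO ()
linkOrFetchPatch url dest
  | ':' `notElem` url = createLink url dest
  | otherwise         = copyRemote url dest

copyRemote :: String -> FilePath -> IO ()
copyRemote _ _ = undefined  -- stand-in for the existing fetch code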

> > It'd be a user-interface improvement (if optional), but I'm afraid it would
> > *greatly* complicate the code (which is currently very simple).
> 
> Too bad.
> 
> To continue my random thought, I was thinking that it would be great if
> partial repositories had some kind of checkpoint awareness, something
> like:
> 
>    I need to retrieve 43 patches up to checkpoint
>      2.0 stable
>    Go ahead? [ynaq]
> 
>    I need to retrieve 132 patches up to checkpoint
>      1.7
>    Go ahead? [ynaq]

Hmmmm.  I tend to agree, but am unsure how that'd look in terms of the
code.  Maybe if we made the inventory-reading code laziness-aware (so that
we could lazily read inventories as well as patches), that'd give us a
framework in which to place this feature.  I.e. we could have something
like

-- examineHashFile returns either the url to grab from or the contents of
-- the file
examineHashFile :: [DarcsFlag] -> String -> String -> String
                -> IO (Either String PackedString)

read_tag_inventory :: [DarcsFlag] -> String -> String -> String -> IO PatchSet
read_tag_inventory opts url outdir ihash = do
  either_inv <- examineHashFile opts url outdir ihash
  case either_inv of
    Left url' -> handle_lazy_inventory opts url' outdir ihash
    Right inv -> return $ parse_inventory inv

with the idea being that handle_lazy_inventory will grab the inventory
file, count the patches in it, prompt whether to go ahead (as in your
suggestion), and then run copyHashFile on all those patches and return the
parsed inventory.  If the user says no, it'll exit with an exception.
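
In rough pseudo-Haskell (fetchHashFile, patch_hashes_in, and promptYorn
are stand-ins for helpers we'd have to write, and I'm guessing at the
argument order of copyHashFile):

handle_lazy_inventory :: [DarcsFlag] -> String -> String -> String
                      -> IO PatchSet
handle_lazy_inventory opts url outdir ihash = do
  inv <- fetchHashFile url outdir ihash  -- download the inventory itself
  let hashes = patch_hashes_in inv       -- patch hashes listed in it
  ok <- promptYorn ("I need to retrieve " ++ show (length hashes)
                    ++ " patches.  Go ahead?")
  if not ok
    then fail "user declined to download patches"
    else do mapM_ (copyHashFile opts url outdir) hashes
            return (parse_inventory inv)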

I think this would work, and it'd have the advantage of both increasing the
laziness (inventories would be lazy) and providing a better UI.  We'd
obviously need a flag to avoid prompting during a non-interactive run (or
for users who don't mind downloading patches), and I suppose we'd also want
a global variable so we can implement the "a" option.  I'm not sure what
the difference between your "n" and "q" options would be.  I guess maybe
it'd be that with the "n" option all the file reads would return "Nothing"
instead of exiting with an exception, which would work for changes, since
changes happily skips over patches that it doesn't actually have.
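
Under "n", a read might then look something like this sketch
(readPatchMaybe and readHashFile are made-up names):

import System.Directory (doesFileExist)

-- A missing lazy patch reads as Nothing rather than throwing, so
-- commands like changes can simply skip it.
readPatchMaybe :: FilePath -> IO (Maybe PackedString)
readPatchMaybe path = do
  exists <- doesFileExist path
  if exists
    then fmap Just (readHashFile path)  -- stand-in for the real reader
    else return Nothing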

> > > It might serve as documentation for a good chunk of these functions.
> > 
> > Even better might be a newtype.  It's more of a pain, but actually
> > enforces the documentation.
> 
> I could live with that.  The pain might not be so severe if our
> functions are just passing the paths/urls around and not trying to
> match on them.  But I'm not sure how strong we want the typing on
> stuff like this, in practice.  In theory, I'm for it.

Right.  The idea would be that once everything was converted to
"statically-verified safe clean code" there wouldn't be any pain left.
Only on reading and writing would there be potential pain, and those
conversions might be leveraged to do some sort of canonicalization.  This
would also allow us to possibly hide PackedStrings in the url/filepath
type, which potentially could give us some more speed.
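
Something like this minimal sketch of the newtype idea (names made up;
canonicalization via canonicalizePath is just one possibility):

import System.Directory (canonicalizePath)

-- Keep the constructor private, so an AbsolutePath can only be made
-- via makeAbsolutePath, which canonicalizes; the representation (a
-- String here, though it could just as well be a PackedString) stays
-- hidden behind toFilePath.
newtype AbsolutePath = AbsolutePath String

makeAbsolutePath :: FilePath -> IO AbsolutePath
makeAbsolutePath f = fmap AbsolutePath (canonicalizePath f)

toFilePath :: AbsolutePath -> FilePath
toFilePath (AbsolutePath s) = s
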
-- 
David Roundy
Department of Physics
Oregon State University