[darcs-devel] patch for lazy partial repos
David Roundy
droundy at darcs.net
Mon Apr 9 17:24:33 PDT 2007
Hi Eric (and lurkers),
I'm cutting the bits that explain what's going on which are correct, which
is most of Eric's email. Thanks for reviewing this, Eric!
> The mechanism does not seem to particularly rely on Haskell laziness.
> The idea is that the first time you attempt to retrieve a patch, you
> merely write down where the patch lives. For example, let's take a look
> a hypothetical patch file from a recent get. The filepath is
> _darcs/patches/4b00ace798c51edc37702890d9a5262ed3ded490 and its contents
> are indented below:
>
> url: /private/tmp/x/a/_darcs/patches
Right, I just couldn't think of a better description than "lazy" for what
we're doing. It's not a very good name, since it's probably only
meaningful to haskell programmers and CS folks.
> General comments
> ----------------
> Perhaps it would be good if --lazy implied --partial and
> --hashed-inventory
Yes, I thought about that and wasn't sure. And am still unsure.
> There is seems to be some inconsistency about when patches are gzipped,
> and when they are not. As far as I can tell, when you create a
> repository with a hashed inventory, you do not have gzipped patches.
> However, when you retrieve patches via a lazy get, they *are*
> compressed. I guess it doesn't really matter if darcs is doing
> everything behind the scenes, but (i) it might be nice if patches were
> compressed at all times as is the case with regular unhashed inventory
> (ii) it may save us some confusion later on.
I'd prefer to always gzip them, but don't think it's a big deal. In any
case, the user should be able to change the option using optimize.
> Also, this mechanism suffers from a variant of --partial's leapfrog
> problem (double get):
>
> darcs get --lazy --partial --hashed-inventory foo bar
> # put bar on a web server or chmod -R -w bar
> darcs get --lazy --partial --hashed-inventory bar quux
>
> user error (problem gzopening file for write:
> /private/tmp/bar/_darcs/patches/4b00ace798c51edc37702890d9a5262ed3ded490)
>
> And even if it does work (say, on the local end), I'm not very keen on
> the idea of bar patches being rewritten (fleshed out) because I try to
> do a darcs get from it. I wonder if we could do something more clever
> to avoid this, somehow detect when bar patches contain urls and copy
> those over instead of pointing to bar.
This is indeed problematic, and I think the best we can hope do is probably
to document that --lazy is no good if the source isn't reliably
accessible.
One option that could alleviate things would be to simply use symlinks or
hard links when the url is local (i.e. has no ':' in it). This might cover
many such possibilities.
> Finally, I'm a bit skeptical that this will really avoid patches being
> retrieved. It seems that any little operation, specifically changes -v
> will trigger a full download and also when you're not particularly
> expecting it.
That's definitely a danger, but in the cases where --partial works, which
is read-only use or occasional development (with nothing tricky like
examining all the patches), --lazy will also work just as well. When
--partial does not work, most users would prefer to download more patches
rather than get an error message.
> As a random thought, maybe a sort of interactive laziness could be made
> to work, and if so useful:
>
> I need to retrieve the following patch:
> Adjust the frobnicator reuptake simulators [y/n/a/q]
>
> Or...
>
> I need to retrieve 765 patches to do this. Go ahead? [y/n]
It'd be a user-interace improvement (if optional), but I'm afraid it would
*greatly* complicate the code (which is currently very simple). So I'd
rather fall back on something similar to what we say with --partial
repositories, which is *don't do that*.
We definitely should have a universally-available flag informing darcs not
to download patches, so the user could add to defaults
changes --dont-download-lazy-stuff
> Detailed comments
> -----------------
> > + DarcsNoArgOption [] ["lazy"] Lazy
> > + "get files as needed",
>
> get *patch* files as needed, maybe?
I'd like to extend this to inventory files as well. Currently, the code
doesn't handle it, because we wouldn't have url stubs for the patch files
pointed to by the inventory file, but it shouldn't be *too* much trouble to
extend things so we can only download the parts ot the inventory that we
need, which would be great (or I think it'd be great, anyhow).
We might get the cost of a --lazy get down very close to the cost of
downloading a tarball.
> src/Data/Repository/HashedRepo.lhs
> ----------------------------------
> > +copyHashFile :: [DarcsFlag] -> String -> String -> String -> IO ()
> > +copyHashFile opts url dir hash
>
> > +readHashFile :: [DarcsFlag] -> String -> String -> IO PackedString
> > +readHashFile opts dir hash =
>
> I wonder if it might be useful to have a type synonym like
> type FilePathOrUrl = String
>
> It might serve as documentation for a good chunk of these functions.
Even better might be a newtype. It's more of a pain, but actually enforces
the documentation.
> > +read_inventories :: [DarcsFlag] -> String -> String -> IO [[(PatchInfo, String)]]
> > +read_inventories opts d ihash = do
>
> This is a candidate for a low level refactor. Perhaps we could replace
> the ihash String arguement by the PackedString that one would get from
> readHashFile. We could then move that call inside the recursive call
>
> do r <- unsafeInterleaveIO $
> readHashFile opts d (unpackPS h) >>= read_inventories opts d
>
> This allows us to simplify read_inventory_private:
>
> read_inventory_private opts d iname = do
> i <- fetchFilePS (d++"/"++iname) Uncachable
> read_inventories opts (d++"/inventories) i
>
> Not sure if this is desirable. Perhaps now the code wouldn't make as
> much sense as a result.
Maybe. I'm not sure how much this would simplify things... but then I
haven't time just now to look over the relevant code.
> src/Data/Repository.lhs
> -----------------------
> I'm not entirely clear on what the criteria are for something being
> in Repository or Repository.Interal. I guess it does not matter much.
Basically, Repository.Internal has everything that I can stick in there
without introducing a module import loop. Recursive modules would be ever
so nice! If only we had a compliant Haskell 98 compiler... :(
> > +data HashedVsOld a = HvsO { old, hashed :: a }
> > +
> > +decideHashedOrNormal :: Monad m => RepoFormat -> HashedVsOld (m a) -> m a
> > +decideHashedOrNormal rf (HvsO { hashed = h, old = o })
> > + | format_has_together [HashedInventory] rf = h
> > + | format_has_together [Darcs1_0,HashedInventory] rf = o >> h
> > + | otherwise = o
>
> This is repeated in Repository/Internal. Is that intentional?
I was too busy and tired to refactor... and didn't really want to export
this somewhat ugly helper function. It's not bad locally, but it'd just be
a horrible export.
> > +copyRepository :: Repository -> IO ()
> > +copyRepository fromrepository@(Repo _ opts _ _)
> > + | Partial `elem` opts || Lazy `elem` opts =
> > + do isPartial <- copyPartialRepository fromrepository
> > + unless (isPartial == IsPartial) $ copyFullRepository fromrepository
> > + | otherwise = copyFullRepository fromrepository
>
> > +data PorNP = NotPartial | IsPartial
> > + deriving ( Eq )
>
> Perhaps PartialOrNot is a better name for this.
Yes.
> > +copyInventory :: Repository -> IO ()
> > +copyInventory (Repo fromdir opts rf _) = do
>
> > + _ | format_has HashedInventory rf ||
> > + format_has HashedInventory rf2
> > + -> undefined
>
> The following block of code seems to trigger this case.
>
> mkdir a
> cd a; darcs init
> touch foo; darcs add foo; darcs record -a -m 'foo'
> darcs tag --checkpoint 1.0
> cd ..
> darcs get --hashed-inventory --lazy --partial a b
>
> It seems plausible to me that somebody might say 'oh let's try that new
> lazy repo feature' without first making sure that the source repository
> supports it.
It's undefined because I hadn't gotten around to making it work. It
definitely still needs to be implemented.
> > + _ -> DarcsRepo.copy_repo_patches opts fromdir todir
> > +
> > +copyPartialRepository :: Repository -> IO PorNP
> > +copyPartialRepository fromrepository@(Repo repodir opts _ _) = do
> > + mch <- get_checkpoint opts repodir
> > + case mch of
> > + Nothing -> do putStrLn "No checkpoint."
> > + return NotPartial
>
> When I was first playing with this, I did not realise that you needed
> --partial for --lazy to work. When you give just --lazy (and
> --hashed-inventory) as an argument, you get a mystifying 'No checkpoint'
> message, despite having created such checkpoints.
Hmmm. This needs to be fixed.
> Note that I have not thoroughly reviewed the rest of this file.
Thanks for the review, and I will look forward to more review at some later
date!
--
David Roundy
Department of Physics
Oregon State University
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://lists.osuosl.org/pipermail/darcs-devel/attachments/20070409/9352e23d/attachment.pgp
More information about the darcs-devel
mailing list