[darcs-devel] patch for lazy partial repos

Eric Y. Kow eric.kow at gmail.com
Mon Apr 9 05:39:17 PDT 2007


[Resending with extra snippage]

Hi again, David,

I'm going to take the bugfix patch and wait till next weekend so that we
can discuss the rest.  If you're not back by then, I'll just push them
in.

> For some reason the darcs.net server is down, so I'm sending
> these patches blind (as I'll be travelling soon).  My apologies
> if they don't work.  I decided to implement "lazy" downloading
> of patches, which is to say, a framework for downloading patches
> only when they are required.

Eric attempts to understand
---------------------------
This patch extends the partial repository mechanism.  In order to use
it, you must have a repository created with --hashed-inventory and
which has a checkpoint tag in it first. You then invoke the following:

  darcs get --hashed-inventory --partial --lazy foo bar

(Note that you can 'convert' a regular repo into a hashed-inventory one
 by using darcs get)

As usual, the idea is that we do not retrieve the patches past a
checkpoint.  In the regular version of --partial repositories, anything
that requires these patches will just fail.  In lazy partial
repositories, we will simply retrieve the patches when we do need them.

The mechanism does not seem to particularly rely on Haskell laziness.
The idea is that the first time you attempt to retrieve a patch, you
merely write down where the patch lives.  For example, let's take a look
a hypothetical patch file from a recent get.  The filepath is
_darcs/patches/4b00ace798c51edc37702890d9a5262ed3ded490 and its contents
are indented below:

  url: /private/tmp/x/a/_darcs/patches

Now we've gotten the repository, and we want to take a look inside.  We
do this with an invocation 'darcs changes -v'.  This requires darcs to
read every patch and display its contents. Whoops!  Nothing there, only
a url.  But that's ok, because the function for reading patches notices
this and consequently (1) retrieves the real patch (2) replaces our
file _darcs/patches/4b00ace798c51edc37702890d9a5262ed3ded490 with the
actual patch contents modulo gunzip:

  [foo
  Eric Kow <eric.kow at loria.fr>**20070409115210] {
  addfile ./a
  }

General comments
----------------
Perhaps it would be good if --lazy implied --partial and
--hashed-inventory

There is seems to be some inconsistency about when patches are gzipped,
and when they are not.  As far as I can tell, when you create a
repository with a hashed inventory, you do not have gzipped patches.
However, when you retrieve patches via a lazy get, they *are*
compressed.  I guess it doesn't really matter if darcs is doing
everything behind the scenes, but (i) it might be nice if patches were
compressed at all times as is the case with regular unhashed inventory
(ii) it may save us some confusion later on.

Also, this mechanism suffers from a variant of --partial's leapfrog
problem (double get):
 
 darcs get --lazy --partial --hashed-inventory foo bar
 # put bar on a web server or chmod -R -w bar
 darcs get --lazy --partial --hashed-inventory bar quux

 user error (problem gzopening file for write:
 /private/tmp/bar/_darcs/patches/4b00ace798c51edc37702890d9a5262ed3ded490)

And even if it does work (say, on the local end),  I'm not very keen on
the idea of bar patches being rewritten (fleshed out) because I try to
do a darcs get from it.  I wonder if we could do something more clever
to avoid this, somehow detect when bar patches contain urls and copy
those over instead of pointing to bar.

Finally, I'm a bit skeptical that this will really avoid patches being
retrieved.  It seems that any little operation, specifically changes -v
will trigger a full download and also when you're not particularly
expecting it.  As a random thought, maybe a sort of interactive laziness
could be made to work, and if so useful:

  I need to retrieve the following patch:
    Adjust the frobnicator reuptake simulators [y/n/a/q]

Or...

  I need to retrieve 765 patches to do this.  Go ahead? [y/n]

Detailed comments
-----------------
> +           DarcsNoArgOption [] ["lazy"] Lazy
> +           "get files as needed",

get *patch* files as needed, maybe?

src/Data/Repository/HashedRepo.lhs
----------------------------------
> +copyHashFile :: [DarcsFlag] -> String -> String -> String -> IO ()
> +copyHashFile opts url dir hash

> +readHashFile :: [DarcsFlag] -> String -> String -> IO PackedString
> +readHashFile opts dir hash =

I wonder if it might be useful to have a type synonym like
type FilePathOrUrl = String

It might serve as documentation for a good chunk of these functions.

> +read_inventories :: [DarcsFlag] -> String -> String -> IO [[(PatchInfo, String)]]
> +read_inventories opts d ihash = do

This is a candidate for a low level refactor.  Perhaps we could replace
the ihash String arguement by the PackedString that one would get from
readHashFile.  We could then move that call inside the recursive call

  do r <- unsafeInterleaveIO $
     readHashFile opts d (unpackPS h) >>= read_inventories opts d

This allows us to simplify read_inventory_private:

  read_inventory_private opts d iname = do
      i <- fetchFilePS (d++"/"++iname) Uncachable
      read_inventories opts (d++"/inventories) i
   
Not sure if this is desirable.  Perhaps now the code wouldn't make as
much sense as a result.

src/Data/Repository.lhs
-----------------------
I'm not entirely clear on what the criteria are for something being
in Repository or Repository.Interal. I guess it does not matter much.

> +data HashedVsOld a = HvsO { old, hashed :: a }
> +
> +decideHashedOrNormal :: Monad m => RepoFormat -> HashedVsOld (m a) -> m a
> +decideHashedOrNormal rf (HvsO { hashed = h, old = o })
> +    | format_has_together [HashedInventory] rf = h
> +    | format_has_together [Darcs1_0,HashedInventory] rf = o >> h
> +    | otherwise = o

This is repeated in Repository/Internal.  Is that intentional?

> +copyRepository :: Repository -> IO ()
> +copyRepository fromrepository@(Repo _ opts _ _)
> +    | Partial `elem` opts || Lazy `elem` opts =
> +        do isPartial <- copyPartialRepository fromrepository
> +           unless (isPartial == IsPartial) $ copyFullRepository fromrepository
> +    | otherwise = copyFullRepository fromrepository

> +data PorNP = NotPartial | IsPartial
> +             deriving ( Eq )

Perhaps PartialOrNot is a better name for this.

> +copyInventory :: Repository -> IO ()
> +copyInventory (Repo fromdir opts rf _) = do

> +    _ | format_has HashedInventory rf ||
> +        format_has HashedInventory rf2
> +          -> undefined

The following block of code seems to trigger this case.

  mkdir a
  cd a; darcs init
  touch foo; darcs add foo; darcs record -a -m 'foo'
  darcs tag --checkpoint 1.0
  cd ..
  darcs get --hashed-inventory --lazy --partial a b

It seems plausible to me that somebody might say 'oh let's try that new
lazy repo feature' without first making sure that the source repository
supports it.

>
> +    _ -> DarcsRepo.copy_repo_patches opts fromdir todir
> +
> +copyPartialRepository :: Repository -> IO PorNP
> +copyPartialRepository fromrepository@(Repo repodir opts _ _) = do
> +  mch <- get_checkpoint opts repodir
> +  case mch of
> +    Nothing -> do putStrLn "No checkpoint."
> +                  return NotPartial

When I was first playing with this, I did not realise that you needed
--partial for --lazy to work.  When you give just --lazy (and
--hashed-inventory) as an argument, you get a mystifying 'No checkpoint'
message, despite having created such checkpoints.

Note that I have not thoroughly reviewed the rest of this file.

-- 
Eric Kow                     http://www.loria.fr/~kow
PGP Key ID: 08AC04F9         Merci de corriger mon français.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 186 bytes
Desc: not available
Url : http://lists.osuosl.org/pipermail/darcs-devel/attachments/20070409/f1161f16/attachment-0001.pgp


More information about the darcs-devel mailing list