[darcs-devel] patch for lazy partial repos
Eric Y. Kow
eric.kow at gmail.com
Mon Apr 9 05:35:57 PDT 2007
Hi again, David,
I'm going to take the bugfix patch and wait till next weekend so that we
can discuss the rest. If you're not back by then, I'll just push them
in.
> For some reason the darcs.net server is down, so I'm sending
> these patches blind (as I'll be travelling soon). My apologies
> if they don't work. I decided to implement "lazy" downloading
> of patches, which is to say, a framework for downloading patches
> only when they are required.
Eric attempts to understand
---------------------------
This patch extends the partial repository mechanism. In order to use
it, you must have a repository created with --hashed-inventory and
which has a checkpoint tag in it first. You then invoke the following:
darcs get --hashed-inventory --partial --lazy foo bar
(Note that you can 'convert' a regular repo into a hashed-inventory one
by using darcs get)
As usual, the idea is that we do not retrieve the patches past a
checkpoint. In the regular version of --partial repositories, anything
that requires these patches will just fail. In lazy partial
repositories, we will simply retrieve the patches when we do need them.
The mechanism does not seem to particularly rely on Haskell laziness.
The idea is that the first time you attempt to retrieve a patch, you
merely write down where the patch lives. For example, let's take a look
a hypothetical patch file from a recent get. The filepath is
_darcs/patches/4b00ace798c51edc37702890d9a5262ed3ded490 and its contents
are indented below:
url: /private/tmp/x/a/_darcs/patches
Now we've gotten the repository, and we want to take a look inside. We
do this with an invocation 'darcs changes -v'. This requires darcs to
read every patch and display its contents. Whoops! Nothing there, only
a url. But that's ok, because the function for reading patches notices
this and consequently (1) retrieves the real patch (2) replaces our
file _darcs/patches/4b00ace798c51edc37702890d9a5262ed3ded490 with the
actual patch contents modulo gunzip:
[foo
Eric Kow <eric.kow at loria.fr>**20070409115210] {
addfile ./a
}
General comments
----------------
Perhaps it would be good if --lazy implied --partial and
--hashed-inventory
There is seems to be some inconsistency about when patches are gzipped,
and when they are not. As far as I can tell, when you create a
repository with a hashed inventory, you do not have gzipped patches.
However, when you retrieve patches via a lazy get, they *are*
compressed. I guess it doesn't really matter if darcs is doing
everything behind the scenes, but (i) it might be nice if patches were
compressed at all times as is the case with regular unhashed inventory
(ii) it may save us some confusion later on.
Also, this mechanism suffers from a variant of --partial's leapfrog
problem (double get):
darcs get --lazy --partial --hashed-inventory foo bar
# put bar on a web server or chmod -R -w bar
darcs get --lazy --partial --hashed-inventory bar quux
user error (problem gzopening file for write:
/private/tmp/bar/_darcs/patches/4b00ace798c51edc37702890d9a5262ed3ded490)
And even if it does work (say, on the local end), I'm not very keen on
the idea of bar patches being rewritten (fleshed out) because I try to
do a darcs get from it. I wonder if we could do something more clever
to avoid this, somehow detect when bar patches contain urls and copy
those over instead of pointing to bar.
Finally, I'm a bit skeptical that this will really avoid patches being
retrieved. It seems that any little operation, specifically changes -v
will trigger a full download and also when you're not particularly
expecting it. As a random thought, maybe a sort of interactive laziness
could be made to work, and if so useful:
I need to retrieve the following patch:
Adjust the frobnicator reuptake simulators [y/n/a/q]
Or...
I need to retrieve 765 patches to do this. Go ahead? [y/n]
Detailed comments
-----------------
> + DarcsNoArgOption [] ["lazy"] Lazy
> + "get files as needed",
get *patch* files as needed, maybe?
src/Data/Repository/HashedRepo.lhs
----------------------------------
> +copyHashFile :: [DarcsFlag] -> String -> String -> String -> IO ()
> +copyHashFile opts url dir hash
> +readHashFile :: [DarcsFlag] -> String -> String -> IO PackedString
> +readHashFile opts dir hash =
I wonder if it might be useful to have a type synonym like
type FilePathOrUrl = String
It might serve as documentation for a good chunk of these functions.
> +read_inventories :: [DarcsFlag] -> String -> String -> IO [[(PatchInfo, String)]]
> +read_inventories opts d ihash = do
This is a candidate for a low level refactor. Perhaps we could replace
the ihash String arguement by the PackedString that one would get from
readHashFile. We could then move that call inside the recursive call
do r <- unsafeInterleaveIO $
readHashFile opts d (unpackPS h) >>= read_inventories opts d
This allows us to simplify read_inventory_private:
read_inventory_private opts d iname = do
i <- fetchFilePS (d++"/"++iname) Uncachable
read_inventories opts (d++"/inventories) i
Not sure if this is desirable. Perhaps now the code wouldn't make as
much sense as a result.
> read_repo :: Repository -> IO PatchSet
> -read_repo (Repo r _ rf (DarcsRepository _))
> - | format_has HashedInventory rf = HashedRepo.read_repo r
> +read_repo (Repo r opts rf (DarcsRepository _))
> + | format_has HashedInventory rf = HashedRepo.read_repo opts r
> | otherwise = DarcsRepo.read_repo r
> read_repo (Repo r _ _ GitRepository) = GitRepo.read_repo r
>
> hunk ./src/Darcs/Repository.lhs 311
> -- both exist and be valid files. This is obviously unsafe in the senses
> -- that "isJust" doesn't tell us if a patch is actually parsable.
> unsafe_fast_read_repo :: Repository -> IO PatchSet
> -unsafe_fast_read_repo (Repo r _ rf (DarcsRepository _))
> - | format_has HashedInventory rf = HashedRepo.read_repo_lazily r
> +unsafe_fast_read_repo (Repo r opts rf (DarcsRepository _))
> + | format_has HashedInventory rf = HashedRepo.read_repo_lazily opts r
> | otherwise = DarcsRepo.lazily_read_repo r
> unsafe_fast_read_repo (Repo r _ _ GitRepository) = GitRepo.read_repo r
>
> }
> [fix bug Eric pointed out (which has also now bitten me).
> David Roundy <droundy at darcs.net>**20070404233317
> This was a bug in the Checkpoint repo, where we assumed we were
> looking at the current working directory, incorrectly.
> ]
> <
> > {
> hunk ./src/Darcs/Repository/Checkpoint.lhs 98
>
> get_check_internal :: [DarcsFlag] -> String -> IO (Maybe Patch)
> get_check_internal opts r = do
> - pis <- (map fst . concat) `fmap` (identifyRepository opts "." >>= read_repo)
> + pis <- (map fst . concat) `fmap` (identifyRepository opts r >>= read_repo)
> pistr <- fetchFilePS (r++"/_darcs/checkpoints/inventory") Uncachable
> `catchall` (return $ packString "")
> case filter (`elem` pis) $ reverse $ read_patch_ids pistr of
> }
> [add support for partial and lazy downloading of hashed repos.
> David Roundy <droundy at darcs.net>**20070405000616]
> <
> > {
> move ./src/Darcs/Repository.lhs ./src/Darcs/Repository/Internal.lhs
> addfile ./src/Darcs/Repository.lhs
> hunk ./GNUmakefile 74
> MODULES_REPOSITORY:=\
> Checkpoint.lhs \
> HashedRepo.lhs \
> - Format.lhs Motd.lhs Prefs.lhs \
> + Format.lhs Internal.lhs \
> + Motd.lhs Prefs.lhs \
> Pristine.lhs \
> DarcsRepo.lhs \
>
> hunk ./src/Darcs/Commands/Check.lhs 31
> import Darcs.Commands ( DarcsCommand(..), nodefaults )
> import Darcs.Arguments ( DarcsFlag( Quiet, Verbose, NoTest, LeaveTestDir ),
> partial_check, any_verbosity, notest,
> + verify_hash,
> leave_test_dir, working_repo_dir,
> )
> import Darcs.Hopefully ( actually )
> hunk ./src/Darcs/Commands/Check.lhs 91
> command_get_arg_possibilities = return [],
> command_argdefaults = nodefaults,
> command_darcsoptions = [partial_check,
> + verify_hash,
> any_verbosity,notest,
> leave_test_dir,
> working_repo_dir
> hunk ./src/Darcs/Commands/Get.lhs 144
> else write_inventory "." [[]]
>
> if format_has HashedInventory rf || format_has HashedInventory rfsource
> - then do identifyRepository opts repodir >>= copyRepository opts
> + then do identifyRepository opts repodir >>= copyRepository
> when (SetScriptsExecutable `elem` opts) $
> do putVerbose $ text "Making scripts executable"
> myname <- getCurrentDirectory
> hunk ./src/Darcs/Commands/Optimize.lhs 34
> TagName, CheckPoint,
> Relink, RelinkPristine ),
> tagname, verbose, checkpoint, reorder_patches,
> - uncompress_nocompress, verify_hash,
> + uncompress_nocompress,
> relink, relink_pristine, sibling,
> flagsToSiblings, modernize_patches,
> working_repo_dir, umask_option,
> hunk ./src/Darcs/Commands/Optimize.lhs 97
> reorder_patches,
> sibling, relink,
> relink_pristine,
> - verify_hash,
> umask_option]}
> \end{code}
> \begin{code}
> hunk ./src/Darcs/Repository/Checkpoint.lhs 45
> readPatch,
> gzWritePatch
> )
> -import Darcs.Repository ( Repository, identifyRepository, read_repo, slurp_recorded,
> - absolute_dir, withRecorded )
> +import Darcs.Repository.Internal
> + ( Repository, identifyRepository, read_repo, slurp_recorded,
> + absolute_dir, withRecorded )
> import Darcs.Repository.DarcsRepo ( apply_patches )
> import Darcs.Patch.Info ( PatchInfo, make_filename, readPatchInfo,
> showPatchInfo
> hunk ./src/Darcs/Repository/HashedRepo.lhs 27
> module Darcs.Repository.HashedRepo ( revert_tentative_changes, finalize_tentative_changes,
> add_to_tentative_inventory, remove_from_tentative_inventory,
> read_repo, read_repo_lazily,
> - write_inventory,
> + write_inventory, copy_repo,
> writeHashFile, copyHashFile, readHashFile
> ) where
>
> hunk ./src/Darcs/Repository/HashedRepo.lhs 33
> import System.IO.Unsafe ( unsafeInterleaveIO )
> import System.IO ( stderr, hPutStrLn )
> -import System.Directory ( doesDirectoryExist )
> import Data.List ( delete )
> import Control.Monad ( unless, when )
>
> hunk ./src/Darcs/Repository/HashedRepo.lhs 37
> import Workaround ( renameFile, getCurrentDirectory, createDirectoryIfMissing )
> -import Darcs.Flags ( DarcsFlag( Lazy, VerifyHash ) )
> +import Darcs.Flags ( DarcsFlag( Verbose, Lazy, Partial, VerifyHash ) )
> import Darcs.Patch.Set ( PatchSet )
> import Darcs.Repository.DarcsRepo ( absolute_dir )
> import Darcs.Hopefully ( Hopefully, actuallyHashed, extractHash, unavailable )
> hunk ./src/Darcs/Repository/HashedRepo.lhs 41
> -import Darcs.Patch ( Patch, showPatch, patch2patchinfo, readPatch, readPatchLazily,
> - gzReadPatchFileLazily )
> +import Darcs.Patch ( Patch, showPatch, patch2patchinfo, readPatch,
> + readPatchLazily )
> import Darcs.Patch.Depends ( commute_to_end, slightly_optimize_patchset )
> import Darcs.Patch.Info ( PatchInfo, showPatchInfo, readPatchInfo )
> hunk ./src/Darcs/Repository/HashedRepo.lhs 45
> -import FastPackedString ( PackedString, readFilePS, writeFilePS, gzReadFilePS, nullPS,
> +import FastPackedString ( PackedString, readFilePS, writeFilePS, nullPS,
> tailPS, lengthPS, takePS, dropPS,
> hunk ./src/Darcs/Repository/HashedRepo.lhs 47
> + gzWriteFilePS,
> packString, breakOnPS, unpackPS, dropWhitePS )
> import Printer ( Doc, hcat, ($$), text )
> import SHA1 ( sha1PS )
> hunk ./src/Darcs/Repository/HashedRepo.lhs 102
>
> copyHashFile :: [DarcsFlag] -> String -> String -> String -> IO ()
> copyHashFile opts url dir hash
> - | Lazy `elem` opts = writeFile dirhash $ "url: " ++ url
> + | Lazy `elem` opts || Partial `elem` opts =
> + writeFile dirhash $ "url: " ++ url
> | otherwise = copyFileOrUrl urlhash dirhash Cachable
> where urlhash = url ++ "/" ++ hash
> dirhash = dir ++ "/" ++ hash
> hunk ./src/Darcs/Repository/HashedRepo.lhs 112
> readHashFile opts dir hash =
> do hf <- gzFetchFilePS (dir ++ "/" ++ hash) Cachable
> if takePS 5 hf == packString "url: "
> - then readHashFile opts (unpackPS $ dropPS 5 hf) hash
> + then do ps <- readHashFile opts (unpackPS $ dropPS 5 hf) hash
> + gzWriteFilePS (dir ++ "/" ++ hash) ps
> + return ps
> else do when (VerifyHash `elem` opts && sha1PS hf /= hash) $
> fail $ "Hash failed hash: "++ dir ++ "/" ++ hash
> return hf
> hunk ./src/Darcs/Repository/HashedRepo.lhs 143
> read_repo_private :: [DarcsFlag] -> Bool -> FilePath -> FilePath -> IO PatchSet
> read_repo_private opts am_lazy d iname =
> do inventories <- read_inventory_private opts (d++"/_darcs") iname
> - isdir <- doesDirectoryExist d
> - let parse (_,h) = let fn = d ++ "/_darcs/patches/" ++ h
> - in if isdir then parse_local h fn
> - else parse_remote h fn
> + let dn = d ++ "/_darcs/patches"
> + parse (_,h) = do ps <- readHashFile opts dn h
> + return $
> + if am_lazy
> + then actuallyHashed h $ fst $ readPatchLazily ps
> + else hopefullyNoParseError h (dn++"/"++h) $
> + fst `fmap` readPatch ps
> read_patches parse `mapM` inventories
> hunk ./src/Darcs/Repository/HashedRepo.lhs 151
> - where -- parse_remote should really download to a temporary file removed
> - -- at exit
> - parse_remote h fn = do ps <- gzFetchFilePS fn Cachable
> - return $
> - if am_lazy
> - then actuallyHashed h $ fst $ readPatchLazily ps
> - else hopefullyNoParseError h fn $
> - fst `fmap` readPatch ps
> - parse_local h fn = if am_lazy
> - then actuallyHashed h `fmap` gzReadPatchFileLazily fn
> - else (hopefullyNoParseError h fn . (fmap fst . readPatch))
> - `fmap` gzReadFilePS fn
> - hopefullyNoParseError :: String -> String -> Maybe Patch -> Hopefully Patch
> + where hopefullyNoParseError :: String -> String -> Maybe Patch -> Hopefully Patch
> hopefullyNoParseError h _ (Just x) = actuallyHashed h x
> hopefullyNoParseError _ s Nothing = unavailable $ "Couldn't parse file "++s
> read_patches :: ((PatchInfo,String) -> IO (Hopefully Patch))
> hunk ./src/Darcs/Repository/HashedRepo.lhs 169
> write_inventory :: PatchSet -> IO ()
> write_inventory = write_either_inventory "hashed_inventory"
>
> +copy_repo :: [DarcsFlag] -> String -> String -> IO ()
> +copy_repo opts inr outr = withCurrentDirectory outr $ do
> + read_repo opts inr >>= write_inventory
> + inventories <- read_inventory_private opts (outr++"/_darcs") "hashed_inventory"
> + putVerbose "Lazily copying hashed patch files..."
> + mapM_ (copyHashFile opts inp outp . snd) $ concat inventories
> + where putVerbose = when (Verbose `elem` opts) . putStrLn
> + inp = inr ++ "/_darcs/patches"
> + outp = outr ++ "/_darcs/patches"
> +
> write_either_inventory :: String -> PatchSet -> IO ()
> write_either_inventory iname x =
> do createDirectoryIfMissing False "_darcs/inventories"
> hunk ./src/Darcs/Repository/Internal.lhs 20
> % Boston, MA 02110-1301, USA.
>
> \begin{code}
> -module Darcs.Repository ( Repository, maybeIdentifyRepository, identifyRepository,
> +module Darcs.Repository.Internal ( Repository(..), RepoType(..),
> + maybeIdentifyRepository, identifyRepository,
> findRepository, amInRepository, amNotInRepository,
> slurp_pending,
> slurp_recorded, slurp_recorded_and_unrecorded,
> hunk ./src/Darcs/Repository/Internal.lhs 37
> #endif
> tentativelyAddPatch, tentativelyRemovePatches, tentativelyAddToPending,
> finalizeRepositoryChanges,
> - copyRepository,
> unrevertUrl,
> applyToPristine, patchSetToPatches,
> createPristineDirectoryTree,
> hunk ./src/Darcs/Repository/Internal.lhs 42
> optimizeInventory,
> getMarkedupFile,
> - PatchSet
> + PatchSet,
> + add_to_tentative_pristine,
> ) where
> import Prelude hiding ( catch )
> import Darcs.External ( fetchFilePS, Cachable( Cachable ) )
> hunk ./src/Darcs/Repository/Internal.lhs 75
> import qualified Darcs.Repository.DarcsRepo as DarcsRepo
> import qualified Darcs.Repository.GitRepo as GitRepo
> import Darcs.Flags ( DarcsFlag(AnyOrder, Boring, LookForAdds, Verbose,
> - WorkDir, UMask) )
> + WorkDir, UMask) )
> import Darcs.Patch ( Patch, flatten, join_patches, reorder_and_coalesce,
> commute, patch2patchinfo, null_patch, readPatch,
> writePatch, flatten_to_primitives, invert, is_similar,
> hunk ./src/Darcs/Repository/Internal.lhs 377
> GitRepo.set_tentative_tree token
> return patch'
>
> -tentativelyAddPatchFromOutside :: Repository -> [DarcsFlag] -> Patch -> IO Patch
> -tentativelyAddPatchFromOutside (Repo dir _ rf (DarcsRepository _)) opts patch =
> - withCurrentDirectory dir $
> - do fp <- decideHashedOrNormal rf $ HvsO {
> - hashed = HashedRepo.add_to_tentative_inventory opts patch,
> - old = DarcsRepo.add_to_tentative_inventory opts patch
> - }
> - add_to_tentative_pristine fp
> - gzReadPatchFileLazily fp
> -tentativelyAddPatchFromOutside (Repo dir _ _ GitRepository) _ patch =
> - withCurrentDirectory dir $ do (patch', token) <- GitRepo.writePatch dir patch
> - GitRepo.set_tentative_tree token
> - return patch'
> -
> tentativelyAddToPending :: Repository -> [DarcsFlag] -> Patch -> IO ()
> tentativelyAddToPending (Repo dir _ _ rt) _ patch =
> withCurrentDirectory dir $ do
> hunk ./src/Darcs/Repository/Internal.lhs 550
> \end{code}
>
> \begin{code}
> -copyRepository :: [DarcsFlag] -> Repository -> IO ()
> -copyRepository opts fromrepository =
> - withRepoLock opts $ \torepository ->
> - do patches <- read_repo fromrepository
> - mapM_ (tentativelyAddPatchFromOutside torepository opts) $
> - map hopefully $ reverse $ concat patches
> - finalizeRepositoryChanges torepository
> - identifyPristine >>= createPristineWorking torepository
> - where createPristineWorking :: Repository -> Pristine -> IO ()
> - createPristineWorking repository pris =
> - do done <- easyCreatePristineDirectoryTree pris "."
> - unless done $ do
> - patches <- unsafe_fast_read_repo repository
> - apply_patches [] False noPut noPut $ reverse $ concat patches
> - noPut _ = return ()
> -
> optimizeInventory :: Repository -> IO ()
> optimizeInventory repository@(Repo r _ rf (DarcsRepository _)) =
> do ps <- read_repo repository
src/Data/Repository.lhs
-----------------------
I'm not entirely clear on what the criteria are for something being
in Repository or Repository.Interal. I guess it does not matter much.
> +data HashedVsOld a = HvsO { old, hashed :: a }
> +
> +decideHashedOrNormal :: Monad m => RepoFormat -> HashedVsOld (m a) -> m a
> +decideHashedOrNormal rf (HvsO { hashed = h, old = o })
> + | format_has_together [HashedInventory] rf = h
> + | format_has_together [Darcs1_0,HashedInventory] rf = o >> h
> + | otherwise = o
This is repeated in Repository/Internal. Is that intentional?
> +copyRepository :: Repository -> IO ()
> +copyRepository fromrepository@(Repo _ opts _ _)
> + | Partial `elem` opts || Lazy `elem` opts =
> + do isPartial <- copyPartialRepository fromrepository
> + unless (isPartial == IsPartial) $ copyFullRepository fromrepository
> + | otherwise = copyFullRepository fromrepository
> +data PorNP = NotPartial | IsPartial
> + deriving ( Eq )
Perhaps PartialOrNot is a better name for this.
> +copyInventory :: Repository -> IO ()
> +copyInventory (Repo fromdir opts rf _) = do
> + _ | format_has HashedInventory rf ||
> + format_has HashedInventory rf2
> + -> undefined
The following block of code seems to trigger this case.
mkdir a
cd a; darcs init
touch foo; darcs add foo; darcs record -a -m 'foo'
darcs tag --checkpoint 1.0
cd ..
darcs get --hashed-inventory --lazy --partial a b
It seems plausible to me that somebody might say 'oh let's try that new
lazy repo feature' without first making sure that the source repository
supports it.
>
> + _ -> DarcsRepo.copy_repo_patches opts fromdir todir
> +
> +copyPartialRepository :: Repository -> IO PorNP
> +copyPartialRepository fromrepository@(Repo repodir opts _ _) = do
> + mch <- get_checkpoint opts repodir
> + case mch of
> + Nothing -> do putStrLn "No checkpoint."
> + return NotPartial
When I was first playing with this, I did not realise that you needed
--partial for --lazy to work. When you give just --lazy (and
--hashed-inventory) as an argument, you get a mystifying 'No checkpoint'
message, despite having created such checkpoints.
Note that I have not thoroughly reviewed the rest of this file.
--
Eric Kow http://www.loria.fr/~kow
PGP Key ID: 08AC04F9 Merci de corriger mon français.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 186 bytes
Desc: not available
Url : http://lists.osuosl.org/pipermail/darcs-devel/attachments/20070409/35ca4bdb/attachment.pgp
More information about the darcs-devel
mailing list