[darcs-users] [patch260] Implement darcs optimize --http (and 2 more)

Alexey Levan exlevan at gmail.com
Thu Jun 3 22:00:13 UTC 2010


2010/6/3 Petr Rockai <me at mornfall.net>:
>
> Implement darcs optimize --http
> --------------------------------
>
>> hunk ./src/Darcs/Commands/Get.lhs 161
>>      putInfo opts $ text "Warning: 'old-fashioned-inventory' is ignored with a darcs-2 repository\n"
>>    let opts' = if formatHas Darcs2 rfsource
>>                then UseFormat2:opts
>> -              else if not (UseOldFashionedInventory `elem` opts)
>> +              else if not (UseOldFashionedInventory `elem` opts) &&
>> +                      not (Partial `elem` opts)
>>                     then UseHashedInventory:filter (/= UseFormat2) opts
>>                     else UseOldFashionedInventory:filter (/= UseFormat2) opts
>>    createRepository opts'
> I am not sure you want this hunk included in this patch. Could you
> explain what was the intention of this change? (There was a discussion
> in #darcs recently about --partial behaviour, but this certainly needs
> to be done as a separate patch, if at all.)

Ugh, that really doesn't belong in this patch. I saw a ticket while I
was hacking around, thought it would be trivial to implement, and just
did it. After that, I successfully forgot about the change :-) It
shouldn't be here; after all, I didn't even test it.

>> hunk ./src/Darcs/Commands/Optimize.lhs 36
>>  import Darcs.Arguments ( DarcsFlag( UpgradeFormat, UseHashedInventory,
>>                                      Compress, UnCompress,
>>                                      NoCompress, Reorder,
>> -                                    Relink, RelinkPristine, OptimizePristine ),
>> +                                    Relink, RelinkPristine, OptimizePristine,
>> +                                    OptimizeHTTP ),
>>                          reorderPatches,
>>                          uncompressNocompress,
>>                          relink, relinkPristine, sibling,
>> hunk ./src/Darcs/Commands/Optimize.lhs 43
>>                          flagsToSiblings,
>>                          upgradeFormat,
>> -                        workingRepoDir, umaskOption, optimizePristine
>> +                        workingRepoDir, umaskOption, optimizePristine,
>> +                        optimizeHTTP
>>                        )
>>  import Darcs.Repository.Prefs ( getPreflist )
>>  import Darcs.Repository ( Repository,
> OK
>
>> hunk ./src/Darcs/Commands/Optimize.lhs 89
>>  import Storage.Hashed.Plain( readPlainTree )
>>  import Storage.Hashed.Darcs( writeDarcsHashed )
>>
>> +import Codec.Archive.Tar ( create )
>> +
>>  #include "gadts.h"
>>
>>  optimizeDescription :: String
> OK
>
>> hunk ./src/Darcs/Commands/Optimize.lhs 134
>>                                                   sibling, relink,
>>                                                   relinkPristine,
>>                                                    upgradeFormat,
>> -                                                 optimizePristine]}
>> +                                                 optimizePristine,
>> +                                                 optimizeHTTP]}
>>
>>  optimizeCmd :: [DarcsFlag] -> [String] -> IO ()
>>  optimizeCmd origopts _ = do
> OK
>
>> hunk ./src/Darcs/Commands/Optimize.lhs 140
>>      when (UpgradeFormat `elem` origopts) optimizeUpgradeFormat
>> +    when (OptimizeHTTP `elem` origopts) doOptimizeHTTP
>>      withRepoLock opts $- \repository -> do
>>      if (OptimizePristine `elem` opts)
>>         then doOptimizePristine repository
> This should maybe run within a repository lock?

Yep, certainly.

>> hunk ./src/Darcs/Commands/Optimize.lhs 364
>>      withCurrentDirectory dir $ do
>>        gzs <- filter ((== ".gz") . takeExtension) `fmap` getDirectoryContents "."
>>        mapM_ removeFile gzs
>> +
>> +doOptimizeHTTP :: IO ()
>> +doOptimizeHTTP = withRepoLock [] $- \_ -> do
>> +  createDirectoryIfMissing False packsDir
>> +  create (packsDir </> "invprist.tar") "" $ map (darcsdir </>)
>> +    ["hashed_inventory", "inventories", "pristine.hashed"]
>> +  create (packsDir </> "patches.tar") "" $ [darcsdir </> "patches"]
>> + where
>> +  packsDir = darcsdir </> "packs"
>>  \end{code}
>
> I am not very happy with the "invprist" filename. At first sight, it
> sounds like "inverted pristine" to me, although I understand it's meant
> to say inventories and pristine. So one (probably too verbose) option
> would be to name it inventories_and_pristine.tar, or possibly
> inventories+pristine (but needs escaping in URLs) or maybe something
> like "basic.tar" or "minimal.tar" that would convey that this is a
> minimal working repository.

basic.tar is less typing, so I'll go with it :-)

>> hunk ./src/Darcs/Repository.hs 94
>>  import Darcs.Hopefully ( PatchInfoAnd, info, extractHash )
>>  import Darcs.Repository.Checkpoint ( identifyCheckpoint, writeCheckpointPatch, getCheckpoint )
>>  import Darcs.Repository.ApplyPatches ( applyPatches )
>> -import Darcs.Repository.HashedRepo ( applyToTentativePristine, pris2inv )
>> +import Darcs.Repository.HashedRepo ( applyToTentativePristine, pris2inv, revertTentativeChanges )
>>  import Darcs.Repository.InternalTypes ( Pristine(..), extractOptions )
>>  import Darcs.Patch ( RepoPatch, Named, Prim, Patch, patch2patchinfo, apply )
>>  import Darcs.Witnesses.Ordered ( FL(..), RL(..), bunchFL, mapFL, mapRL
>> hunk ./src/Darcs/Repository.hs 98
>> -                               , reverseRL ,lengthRL, (+>+) )
>> +                               , reverseRL ,lengthRL, (+>+), (:\/:)(..) )
>>  import Darcs.Patch.Info ( PatchInfo )
>>  import Darcs.Repository.Format ( RepoProperty ( HashedInventory ),
>>                                   createRepoFormat, formatHas, writeRepoFormat )
>> hunk ./src/Darcs/Repository.hs 104
>>  import Darcs.Repository.Prefs ( writeDefaultPrefs )
>>  import Darcs.Repository.Pristine ( createPristine, flagsToPristine, createPristineFromWorking )
>> -import Darcs.Patch.Depends ( getPatchesBeyondTag, areUnrelatedRepos )
>> +import Darcs.Patch.Depends ( getPatchesBeyondTag, areUnrelatedRepos, findCommonAndUncommon )
>>  import Darcs.Utils ( withCurrentDirectory, catchall, promptYorn, prettyError )
>> hunk ./src/Darcs/Repository.hs 106
>> -import Darcs.External ( copyFileOrUrl, Cachable(..) )
>> +import Darcs.External ( copyFileOrUrl, Cachable(..), fetchFilePS )
>> +import Darcs.Lock ( withTemp )
>>  import Progress ( debugMessage, tediousSize, beginTedious, endTedious )
>>  import Darcs.ProgressPatches (progressRLShowTags, progressFL)
>>  import Darcs.Lock ( writeBinFile, writeDocBinFile, rmRecursive )
>> hunk ./src/Darcs/Repository.hs 125
>>  import ByteStringUtils( gzReadFilePS )
>>
>>  import System.FilePath( (</>) )
>> +import Codec.Archive.Tar ( extract )
>>
>>  import qualified Data.ByteString.Char8 as BS
>>
> OK (just imports)
>
>> hunk ./src/Darcs/Repository.hs 148
>
>>  copyRepository :: RepoPatch p => Repository p C(r u t) -> IO ()
>> -copyRepository fromrepository@(Repo _ opts rf _)
>> +copyRepository fromRepo@(Repo fromDir opts rf _)
>>      | Partial `elem` opts && not (formatHas HashedInventory rf) =
>> hunk ./src/Darcs/Repository.hs 150
>> -        do isPartial <- copyPartialRepository fromrepository
>> -           unless (isPartial == IsPartial) $ copyFullRepository fromrepository
>> -    | otherwise = copyFullRepository fromrepository
>> +        do isPartial <- copyPartialRepository fromRepo
>> +           unless (isPartial == IsPartial) $ copyFullRepository fromRepo
>> +    | otherwise = do
>> +        isPacked <- doesRemoteFileExist $ fromDir </> darcsdir </>
>> +          "packs" </> "invprist.tar"
>> +        if isPacked
>> +          then copyPackedRepository fromRepo
>> +          else copyFullRepository fromRepo
>> + where
>> +  doesRemoteFileExist x = fetchFilePS x Cachable >> return True `catch`
>> +    (\_ -> return False)
> It is not very clear that this does what it seems to want to do. I am
> assuming that your intention was to get the download going in this
> place, and have the second request in copyPackedRepository below
> (copyFileOrUrl) be satisfied from the already fetched file.
>
> However, the Cachable bit only seems to affect proxies or generally the
> Pragma/Cache-Control HTTP headers and nothing much else. So it seems to
> me that this code as it is will actually download the code twice. For
> all I can tell, fetchFilePS is completely strict, so it will wait till
> the download is complete, and it will erase its temporary file after
> reading it into memory.
>
> I think the way to fix this is to just try copyPackedRepository right
> away and if it fails, fall back to copyFullRepository -- you will have
> to reorder copyPackedRepository a bit to *first* grab the tarball and
> only then set up prefs and finally unpack the tarball.

The Cachable bit is just a copy/paste artifact; it's taken from the
code that checks the repository format. The difference is that there
the inventory gets downloaded, so it's not a big deal. I'll follow
your suggestion on this.
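
Something like this, perhaps (a sketch only; it assumes
copyPackedRepository throws when the tarball can't be fetched, so
catchall can fall back to the slow path):

  copyRepository :: RepoPatch p => Repository p C(r u t) -> IO ()
  copyRepository fromRepo@(Repo _ opts rf _)
      | Partial `elem` opts && not (formatHas HashedInventory rf) =
          do isPartial <- copyPartialRepository fromRepo
             unless (isPartial == IsPartial) $ copyFullRepository fromRepo
      -- try the packed copy first, grabbing the tarball before touching
      -- the target repository; fall back to a full copy if that fails
      | otherwise = copyPackedRepository fromRepo
                      `catchall` copyFullRepository fromRepo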

>>  data PorNP = NotPartial | IsPartial
>>               deriving ( Eq )
>> hunk ./src/Darcs/Repository.hs 261
>>                     debugMessage "Writing the pristine"
>>                     pristineFromWorking torepository
>>
>> +copyPackedRepository :: forall p C(r u t). RepoPatch p => Repository p C(r u t) -> IO ()
>> +copyPackedRepository fromRepo@(Repo fromDir opts _ (DarcsRepository _ fromCache)) = do
>> +  debugMessage "Copying prefs"
>> +  copyFileOrUrl opts (fromDir ++ "/" ++ darcsdir ++ "/prefs/prefs")
>> +    (darcsdir ++ "/prefs/prefs") (MaxAge 600) `catchall` return ()
>> +  debugMessage "Grabbing lock in new repository..."
>> +  Repo toDir _ toFormat (DarcsRepository toPristine toCache) <-
>> +    identifyRepositoryFor fromRepo "."
>> +  toCache2 <- unionRemoteCaches toCache fromCache fromDir
>> +  let toRepo :: Repository p C(r u t)
>> +      toRepo = Repo toDir opts toFormat $ DarcsRepository toPristine toCache2
>> +  -- unpack inventory & pristine cache
>> +  withTemp $ \tmp -> do
>> +    let fromPacksDir = fromDir ++ "/" ++ darcsdir ++ "/packs/"
>> +    copyFileOrUrl [] (fromPacksDir ++ "invprist.tar") tmp Uncachable
>> +    extract toDir tmp
>> +    createPristineDirectoryTree toRepo "."
>> +  -- pull new patches
>> +  us <- readRepo toRepo
>> +  them <- readRepo fromRepo
>> +  comm :\/: unc <- return $ findCommonAndUncommon us them
>> +  revertTentativeChanges
>> +  Sealed pw <- tentativelyMergePatches toRepo "get" opts comm unc
>> +  withGutsOf toRepo $ do
>> +    finalizeRepositoryChanges toRepo
>> +    applyToWorking toRepo opts pw
>> +    return ()
>> +  -- get old patches
>> +  fetchPatchesIfNecessary opts toRepo
> Other than the above observation, I think this is basically OK. It may
> be useful to factor the "pull new patches" bit into a helper function
> (or look if there is one already, although there probably isn't). Or
> maybe not. I defer that decision to you.

Looks useful to me, too, though I wouldn't put a high priority on it.
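
Something along these lines, maybe (the name is hypothetical; the body
is just the "pull new patches" part of copyPackedRepository, unchanged):

  pullNewPatches :: forall p C(r u t). RepoPatch p
                 => Repository p C(r u t) -> Repository p C(r u t)
                 -> [DarcsFlag] -> IO ()
  pullNewPatches toRepo fromRepo opts = do
    us <- readRepo toRepo
    them <- readRepo fromRepo
    comm :\/: unc <- return $ findCommonAndUncommon us them
    revertTentativeChanges
    Sealed pw <- tentativelyMergePatches toRepo "get" opts comm unc
    withGutsOf toRepo $ do
      finalizeRepositoryChanges toRepo
      applyToWorking toRepo opts pw
      return ()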

> Recompress repository packs
> ---------------------------
>
>> hunk ./src/Darcs/Commands/Optimize.lhs 24
>>  {-# LANGUAGE CPP #-}
>>
>>  module Darcs.Commands.Optimize ( optimize ) where
>> -import Control.Monad ( when, unless )
>> +import Control.Applicative ( (<$>) )
>> +import Control.Monad ( when, unless, (<=<) )
>>  import Data.Maybe ( isJust )
>>  import System.Directory ( getDirectoryContents, doesDirectoryExist, doesFileExist )
>>  import qualified Data.ByteString.Char8 as BS
>> hunk ./src/Darcs/Commands/Optimize.lhs 29
>> +import qualified Data.ByteString.Lazy as BL
>>
>>  import Storage.Hashed.Darcs( decodeDarcsSize )
>>
>> hunk ./src/Darcs/Commands/Optimize.lhs 91
>>  import Storage.Hashed.Plain( readPlainTree )
>>  import Storage.Hashed.Darcs( writeDarcsHashed )
>>
>> -import Codec.Archive.Tar ( create )
>> +import Codec.Archive.Tar ( write )
>> +import Codec.Archive.Tar.Entry ( fileEntry, toTarPath )
>> +import Codec.Compression.GZip ( compress )
>>
>>  #include "gadts.h"
>>
> OK (imports again)
>
>> hunk ./src/Darcs/Commands/Optimize.lhs 371
>>
>>  doOptimizeHTTP :: IO ()
>>  doOptimizeHTTP = withRepoLock [] $- \_ -> do
>> +  rf <- either fail return =<< identifyRepoFormat "."
>> +  unless (formatHas HashedInventory rf) $ fail
>> +    "Unsupported repository format"
>>    createDirectoryIfMissing False packsDir
> Good point. Does this maybe belong to a separate patch, or maybe to the
> previous one? Does not seem to be directly related to re-compression.

I've put some fixes for the previous patch into this one; they'll end
up in the right place in the next amendment.

>> hunk ./src/Darcs/Commands/Optimize.lhs 375
>> -  create (packsDir </> "invprist.tar") "" $ map (darcsdir </>)
>> -    ["hashed_inventory", "inventories", "pristine.hashed"]
>> -  create (packsDir </> "patches.tar") "" $ [darcsdir </> "patches"]
>> +  i <- fileEntry' $ darcsdir </> "hashed_inventory"
>> +  is <- tarDarcsDir "inventories"
>> +  pr <- tarDarcsDir "pristine.hashed"
>> +  BL.writeFile (packsDir </> "invprist.tar.gz") . compress $ write (i : (is ++ pr))
>> +  ps <- tarDarcsDir "patches"
>> +  BL.writeFile (packsDir </> "patches.tar.gz") . compress $ write ps
>>   where
>>    packsDir = darcsdir </> "packs"
>> hunk ./src/Darcs/Commands/Optimize.lhs 383
>> +  fileEntry' x = do
>> +    content <- BL.fromChunks . return <$> gzReadFilePS x
>> +    tp <- either fail return (toTarPath False x)
>> +    return $ fileEntry tp content
>> +  dirContents d = map (d </>) . filter ((/= '.') . head) <$> getDirectoryContents d
>> +  tarDarcsDir = mapM fileEntry' <=< dirContents . (darcsdir </>)
> OK
>
>> hunk ./src/Darcs/Repository.hs 48
>>      ) where
>>
>>  import System.Exit ( ExitCode(..), exitWith )
>> +import Data.List ( isSuffixOf )
>>  import Data.Maybe( catMaybes )
>>
>>  import Darcs.Repository.State( readRecorded, readUnrecorded, readWorking, unrecordedChanges
>> hunk ./src/Darcs/Repository.hs 86
>>  import URL ( maxPipelineLength )
>>
>>  import Control.Monad ( unless, when )
>> -import System.Directory ( createDirectory, renameDirectory )
>> +import System.Directory ( createDirectory, renameDirectory, createDirectoryIfMissing )
>>  import System.IO.Error ( isAlreadyExistsError )
>>
>>  import qualified Darcs.Repository.DarcsRepo as DarcsRepo
>> hunk ./src/Darcs/Repository.hs 108
>>  import Darcs.Patch.Depends ( getPatchesBeyondTag, areUnrelatedRepos, findCommonAndUncommon )
>>  import Darcs.Utils ( withCurrentDirectory, catchall, promptYorn, prettyError )
>>  import Darcs.External ( copyFileOrUrl, Cachable(..), fetchFilePS )
>> -import Darcs.Lock ( withTemp )
>>  import Progress ( debugMessage, tediousSize, beginTedious, endTedious )
>>  import Darcs.ProgressPatches (progressRLShowTags, progressFL)
>>  import Darcs.Lock ( writeBinFile, writeDocBinFile, rmRecursive )
>> hunk ./src/Darcs/Repository.hs 125
>>  import ByteStringUtils( gzReadFilePS )
>>
>>  import System.FilePath( (</>) )
>> -import Codec.Archive.Tar ( extract )
>> -
>> +import qualified Codec.Archive.Tar as Tar
>> +import Codec.Compression.GZip ( compress, decompress )
>>  import qualified Data.ByteString.Char8 as BS
>> hunk ./src/Darcs/Repository.hs 128
>> +import qualified Data.ByteString.Lazy as BL
>>
>>  #include "impossible.h"
>>
> OK, more imports
>
>> hunk ./src/Darcs/Repository.hs 155
>>             unless (isPartial == IsPartial) $ copyFullRepository fromRepo
>>      | otherwise = do
>>          isPacked <- doesRemoteFileExist $ fromDir </> darcsdir </>
>> -          "packs" </> "invprist.tar"
>> +          "packs" </> "invprist.tar.gz"
>>          if isPacked
>>            then copyPackedRepository fromRepo
>>            else copyFullRepository fromRepo
> OK
>
>> hunk ./src/Darcs/Repository.hs 160
>>   where
>> -  doesRemoteFileExist x = fetchFilePS x Cachable >> return True `catch`
>> -    (\_ -> return False)
>> +  doesRemoteFileExist x = (fetchFilePS x Cachable >> return True) `catchall`
>> +    return False
> Changes catch to catchall, probably quite OK. Maybe should be amend of
> previous though, instead of part of this one.

The main fix here is the parentheses; without them the exception doesn't get caught.
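
A backquoted function defaults to infixl 9, which binds tighter than
>>, so (assuming catchall declares no fixity of its own) the old line
parsed as

  doesRemoteFileExist x = fetchFilePS x Cachable >>
    (return True `catchall` return False)

i.e. only the harmless return True was guarded, and an exception from
fetchFilePS escaped. With the explicit parentheses the fetch itself
runs inside the handler.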

>>  data PorNP = NotPartial | IsPartial
>>               deriving ( Eq )
>> hunk ./src/Darcs/Repository.hs 273
>>    toCache2 <- unionRemoteCaches toCache fromCache fromDir
>>    let toRepo :: Repository p C(r u t)
>>        toRepo = Repo toDir opts toFormat $ DarcsRepository toPristine toCache2
>> +      fromPacksDir = fromDir ++ "/" ++ darcsdir ++ "/packs/"
>> +  createDirectoryIfMissing False $ toDir </> darcsdir </> "inventories"
>> +  createDirectoryIfMissing False $ toDir </> darcsdir </> "pristine.hashed"
>> +  createDirectoryIfMissing False $ toDir </> darcsdir </> "patches"
>>    -- unpack inventory & pristine cache
>> hunk ./src/Darcs/Repository.hs 278
>> -  withTemp $ \tmp -> do
>> -    let fromPacksDir = fromDir ++ "/" ++ darcsdir ++ "/packs/"
>> -    copyFileOrUrl [] (fromPacksDir ++ "invprist.tar") tmp Uncachable
>> -    extract toDir tmp
>> -    createPristineDirectoryTree toRepo "."
>> +  writeCompressed . Tar.read . decompress . BL.fromChunks . return =<<
>> +    fetchFilePS (fromPacksDir ++ "invprist.tar.gz") Uncachable
> This is a little dangerous, since it means having the complete tarball
> in memory at once. The version with temporary directory is probably
> better. You can presumably decompress and unpack the tarball in a
> streamed fashion from disk somehow? This may need some benchmarking and
> maybe some creative way to measure memory usage during decompression.

A clone of fetchFilePS that works on lazy bytestrings should solve
that, even without saving to disk. And why the need for creative
benchmarking; wouldn't GHC's standard profiling features suffice?
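
Something like this, with fetchFileLazy being the hypothetical lazy
clone of fetchFilePS:

  -- hypothetical: fetchFileLazy :: String -> Cachable -> IO BL.ByteString
  writeCompressed . Tar.read . decompress =<<
    fetchFileLazy (fromPacksDir ++ "invprist.tar.gz") Uncachable

That would let Tar.read consume the stream as it arrives, instead of
holding the whole tarball in memory at once.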

>> +  createPristineDirectoryTree toRepo "."
>>    -- pull new patches
>>    us <- readRepo toRepo
>>    them <- readRepo fromRepo
>> hunk ./src/Darcs/Repository.hs 292
>>      applyToWorking toRepo opts pw
>>      return ()
>>    -- get old patches
>> -  fetchPatchesIfNecessary opts toRepo
>> +  writeCompressed . Tar.read . decompress . BL.fromChunks .
>> +    return =<< fetchFilePS (fromPacksDir ++ "patches.tar.gz") Uncachable
>> + where
>> +  writeCompressed Tar.Done = return ()
>> +  writeCompressed (Tar.Next x xs) = case Tar.entryContent x of
>> +    Tar.NormalFile x' _ -> do
>> +      let p = Tar.entryPath x
>> +      BL.writeFile p $ if "hashed_inventory" `isSuffixOf` p
>> +        then x'
>> +        else compress x'
>> +      writeCompressed xs
>> +    _ -> fail ""
>> +  writeCompressed (Tar.Fail e) = fail e
> OK

The one thing I'm not sure about is how to use the cache. To me it
looks like this:

case number of cached patches of
  Few  -> copy a patch pack, ignore the cached patches
  Many -> copy the cached patches, then fetch the remaining patches one by one

But a reliable way to determine the border between those two cases is
a mystery to me. Or maybe the two methods could somehow be combined?
Say, start copying a patch pack to get the patches from the end of
history, and in parallel copy patches one by one from the start, using
the cache. That looks like a good idea to me; are there cases where it
wouldn't work well?
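
In very rough terms, the simplest reading of "in parallel" is a race
between the two strategies (a sketch only; the fetch actions are
hypothetical placeholders, and cancelling the losing worker plus the
real meet-in-the-middle bookkeeping are glossed over):

  import Control.Concurrent ( forkIO )
  import Control.Concurrent.MVar ( newEmptyMVar, putMVar, takeMVar )

  -- run both fetch strategies concurrently and return as soon as
  -- either one has finished making the repository complete
  raceFetches :: IO () -> IO () -> IO ()
  raceFetches fetchPack fetchOneByOne = do
    done <- newEmptyMVar
    _ <- forkIO $ fetchPack     >> putMVar done ()
    _ <- forkIO $ fetchOneByOne >> putMVar done ()
    takeMVar done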

