[darcs-devel] [patch1134] resolve issue1987: Garbage collection for inventories ...

Guillaume Hoffmann bugs at darcs.net
Mon Apr 7 18:21:21 UTC 2014


Guillaume Hoffmann <guillaumh at gmail.com> added the comment:

Hi Marcio,

first, I recommend you to set your author string with the pattern
"Full Name <email at aaaa.aa>" since we have some scripts that catch that
format to generate the news bulletin. And it's more standard.

Also when sending an amended version of a patch, it is better to send
it with the same ticket number, in the present case it would have
been:

     darcs send --mail --subject '[patch1128]'

Let's continue with that ticket now (1134).

Code review (I'm doing it in an order that makes more sense to me):

> hunk ./src/Darcs/UI/Commands/Optimize.hs 167
> - "\n" ++
> - "There is one more optimization which CAN NOT be performed by this\n" ++
> - "command.  Every time your record a patch, a new inventory file is\n" ++
> - "written to `_darcs/inventories/`, and old inventories are never reaped.\n" ++
> - "\n" ++
> - "If `_darcs/inventories/` is consuming a relatively large amount of\n" ++
> - "space, you can safely reclaim it by using `darcs get` to make a\n" ++
> - "complete copy of the repo.  When doing so, don't forget to copy over\n" ++
> -  "any unsaved changes you have made to the working tree or to\n" ++
> - "unversioned files in `_darcs/prefs/` (such as `_darcs/prefs/author`).\n"
> + "\n"
> hunk ./src/Darcs/UI/Commands/Optimize.hs 198
> -    cleanRepository repository -- garbage collect pristine.hashed directory
> +    cleanRepository repository -- garbage collect pristine.hashed, inventories and patches directories

OK.

> hunk ./src/Darcs/Repository/Internal.hs 182
> +                            , cleanInventories
> +                            , cleanPatches
> hunk ./src/Darcs/Repository/Internal.hs 895
> -    HvsO { hashed = HashedRepo.cleanPristine repository,
> +    HvsO { hashed = cleanHashedRepo repository,
> hunk ./src/Darcs/Repository/Internal.hs 897
> +           where
> +            cleanHashedRepo r = do
> +              HashedRepo.cleanPristine r
> +              HashedRepo.cleanInventories r
> +              HashedRepo.cleanPatches r

OK, I'm fine with cleaning everything at once.

> hunk ./src/Darcs/Repository/HashedRepo.hs 22
> +    , cleanInventories
> +    , cleanPatches
> hunk ./src/Darcs/Repository/HashedRepo.hs 57
> +import qualified Data.Set as Set
> hunk ./src/Darcs/Repository/HashedRepo.hs 64
> -import System.Directory ( createDirectoryIfMissing )
> +import System.Directory ( createDirectoryIfMissing, getDirectoryContents, doesFileExist )
> hunk ./src/Darcs/Repository/HashedRepo.hs 85
> +    , removeFileMayNotExist
> hunk ./src/Darcs/Repository/HashedRepo.hs 140
> +patchesDir, patchesDirPath :: String
> +patchesDir = "patches"
> +patchesDirPath = makeDarcsdirPath patchesDir
> +

OK.

> hunk ./src/Darcs/Repository/HashedRepo.hs 241
> +-- |cleanInventories removes any obsolete (unreferenced) files in the
> +-- inventories directory
> +cleanInventories :: Repository p wR wU wT -> IO ()
> +cleanInventories _ = do
> +    debugMessage "Cleaning out inventories..."
> +    hs <- listInventories

This calls the function that returns you a list of inventory hashes,
jumping from one "Starting With:" to another, starting with the hash
given in hashed_inventory. This will give us the good inventory files,
the ones we want to keep.

> +    fs <- filter okayHash <$> getDirectoryContents inventoriesDirPath

This lists the files of _darcs/inventories , only keeping those that
have the name length of a hashed file.

A side-effect of that is that we're not going to clean any random file
that could have been added manually (_darcs/inventories/SOME_FILE).
Maybe that's not a problem.

> +    mapM_ (removeFileMayNotExist . (inventoriesDirPath </>))
> +        (unset $ (set fs) `Set.difference` (set hs))
> +    where   set = Set.fromList . map BC.pack
> +            unset = map BC.unpack . Set.toList
> +

And we're removing those files. Seems ok.


> +-- |cleanPatches removes any obsolete (unreferenced) files in the
> +-- patches directory
> +cleanPatches :: Repository p wR wU wT -> IO ()
> +cleanPatches _ = do
> +    debugMessage "Cleaning out patches..."
> +    hs <- listPatches
> +    fs <- filter okayHash <$> getDirectoryContents patchesDirPath
> +    mapM_ (removeFileMayNotExist . (patchesDirPath </>))
> +        (unset $ (set fs) `Set.difference` (set hs))
> +    where   set = Set.fromList . map BC.pack
> +            unset = map BC.unpack . Set.toList
> +

OK so what happens is that we only delete files that have the
"shape" of a hashed file (since we filter with `okayHash`).

Hence files like pending, pending.tentative, unrevert are not deleted, good.
The alternative would be to have a special "white list" of files we
don't want to delete,
and not use `okayHash`.

> hunk ./src/Darcs/Repository/HashedRepo.hs 457
> -    inv <- skipPristine <$> gzFetchFilePS (dir </> invName) Uncachable
> -    readInventoryFromContent (toPath dir ++ "/" ++ darcsdir ++ invName) inv
> +  inv <- skipPristine <$> gzFetchFilePS (dir </> invName) Uncachable
> +  readInventoryFromContent (toPath dir ++ "/" ++ darcsdir ++ invName) inv
> +

You can remove this hunk with `darcs amend --unrecord`.



> hunk ./src/Darcs/Repository/HashedRepo.hs 601
> -        fst <$> readInventoryPrivate  invDir inv
> +        fst <$> readSafeInventoryPrivate  invDir inv
> hunk ./src/Darcs/Repository/HashedRepo.hs 608
> +    readSafeInventoryPrivate dir invName = do
> +        b <- doesFileExist (dir </> invName)
> +        if b then readInventoryPrivate dir invName
> +            else return (Nothing, [])
> +
> +

You should define another function `listInventories`
that call `readSafeInventoryPrivate` because there are other
functions that call `listInventories` and you precisely want them
to retrieve abstent inventories, otherwise darcs believes that your
repository is almost empty... (this happens when
you're working in a lazy repository, the directory `_darcs/inventories`
is actually empty right after cloning).

You could call that separate function `listInventoriesLocal`,
and don't forget to update the Haddock comment.

Also I'm not sure about the name `readSafeInventoryPrivate` since
it's not a matter of safety really; you could call it
`readInventoryLocalPrivate`.

> +listPatches :: IO [String]
> +listPatches = do
> +    inventory <- readSafeInventoryPrivate darcsdir hashedInventory

I'd also name it `listPatchesLocal` and put it a comment as for
`listInventories`.

Same comment as above about `readSafeInventoryPrivate`.

> +    followStartingWiths (fst inventory) (getPatches inventory)

You're getting the hashes of the patches of hashed_inventory and
continuing with the next inventory in the "Starting With:" chain; good.

> +    where
> +        followStartingWiths Nothing patches = return patches

ok.

> +        followStartingWiths (Just startingWith) patches = do
> +            inv <- readSafeInventoryPrivate inventoriesDirPath startingWith
> +            (patches++) <$> followStartingWiths (fst inv) (getPatches inv)
> +
> +        getPatches inv = map snd (snd inv)

OK.

> +
> +        readSafeInventoryPrivate dir invName = do
> +            b <- doesFileExist (dir </> invName)
> +            if b then readInventoryPrivate dir invName
> +                else return (Nothing, [])
> +

Same remark as above.

__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/patch1134>
__________________________________


More information about the darcs-devel mailing list