[darcs-devel] [patch1134] resolve issue1987: Garbage collection for inventories ...
Guillaume Hoffmann
bugs at darcs.net
Mon Apr 7 18:21:21 UTC 2014
Guillaume Hoffmann <guillaumh at gmail.com> added the comment:
Hi Marcio,
first, I recommend you to set your author string with the pattern
"Full Name <email at aaaa.aa>" since we have some scripts that catch that
format to generate the news bulletin. And it's more standard.
Also when sending an amended version of a patch, it is better to send
it with the same ticket number, in the present case it would have
been:
darcs send --mail --subject '[patch1128]'
Let's continue with that ticket now (1134).
Code review (I'm doing it in an order that makes more sense to me):
> hunk ./src/Darcs/UI/Commands/Optimize.hs 167
> - "\n" ++
> - "There is one more optimization which CAN NOT be performed by this\n" ++
> - "command. Every time your record a patch, a new inventory file is\n" ++
> - "written to `_darcs/inventories/`, and old inventories are never reaped.\n" ++
> - "\n" ++
> - "If `_darcs/inventories/` is consuming a relatively large amount of\n" ++
> - "space, you can safely reclaim it by using `darcs get` to make a\n" ++
> - "complete copy of the repo. When doing so, don't forget to copy over\n" ++
> - "any unsaved changes you have made to the working tree or to\n" ++
> - "unversioned files in `_darcs/prefs/` (such as `_darcs/prefs/author`).\n"
> + "\n"
> hunk ./src/Darcs/UI/Commands/Optimize.hs 198
> - cleanRepository repository -- garbage collect pristine.hashed directory
> + cleanRepository repository -- garbage collect pristine.hashed, inventories and patches directories
OK.
> hunk ./src/Darcs/Repository/Internal.hs 182
> + , cleanInventories
> + , cleanPatches
> hunk ./src/Darcs/Repository/Internal.hs 895
> - HvsO { hashed = HashedRepo.cleanPristine repository,
> + HvsO { hashed = cleanHashedRepo repository,
> hunk ./src/Darcs/Repository/Internal.hs 897
> + where
> + cleanHashedRepo r = do
> + HashedRepo.cleanPristine r
> + HashedRepo.cleanInventories r
> + HashedRepo.cleanPatches r
OK, I'm fine with cleaning everything at once.
> hunk ./src/Darcs/Repository/HashedRepo.hs 22
> + , cleanInventories
> + , cleanPatches
> hunk ./src/Darcs/Repository/HashedRepo.hs 57
> +import qualified Data.Set as Set
> hunk ./src/Darcs/Repository/HashedRepo.hs 64
> -import System.Directory ( createDirectoryIfMissing )
> +import System.Directory ( createDirectoryIfMissing, getDirectoryContents, doesFileExist )
> hunk ./src/Darcs/Repository/HashedRepo.hs 85
> + , removeFileMayNotExist
> hunk ./src/Darcs/Repository/HashedRepo.hs 140
> +patchesDir, patchesDirPath :: String
> +patchesDir = "patches"
> +patchesDirPath = makeDarcsdirPath patchesDir
> +
OK.
> hunk ./src/Darcs/Repository/HashedRepo.hs 241
> +-- |cleanInventories removes any obsolete (unreferenced) files in the
> +-- inventories directory
> +cleanInventories :: Repository p wR wU wT -> IO ()
> +cleanInventories _ = do
> + debugMessage "Cleaning out inventories..."
> + hs <- listInventories
This calls the function that returns you a list of inventory hashes,
jumping from one "Starting With:" to another, starting with the hash
given in hashed_inventory. This will give us the good inventory files,
the ones we want to keep.
> + fs <- filter okayHash <$> getDirectoryContents inventoriesDirPath
This lists the files of _darcs/inventories , only keeping those that
have the name length of a hashed file.
A side-effect of that is that we're not going to clean any random file
that could have been added manually (_darcs/inventories/SOME_FILE).
Maybe that's not a problem.
> + mapM_ (removeFileMayNotExist . (inventoriesDirPath </>))
> + (unset $ (set fs) `Set.difference` (set hs))
> + where set = Set.fromList . map BC.pack
> + unset = map BC.unpack . Set.toList
> +
And we're removing those files. Seems ok.
> +-- |cleanPatches removes any obsolete (unreferenced) files in the
> +-- patches directory
> +cleanPatches :: Repository p wR wU wT -> IO ()
> +cleanPatches _ = do
> + debugMessage "Cleaning out patches..."
> + hs <- listPatches
> + fs <- filter okayHash <$> getDirectoryContents patchesDirPath
> + mapM_ (removeFileMayNotExist . (patchesDirPath </>))
> + (unset $ (set fs) `Set.difference` (set hs))
> + where set = Set.fromList . map BC.pack
> + unset = map BC.unpack . Set.toList
> +
OK so what happens is that we only delete files that have the
"shape" of a hashed file (since we filter with `okayHash`).
Hence files like pending, pending.tentative, unrevert are not deleted, good.
The alternative would be to have a special "white list" of files we
don't want to delete,
and not use `okayHash`.
> hunk ./src/Darcs/Repository/HashedRepo.hs 457
> - inv <- skipPristine <$> gzFetchFilePS (dir </> invName) Uncachable
> - readInventoryFromContent (toPath dir ++ "/" ++ darcsdir ++ invName) inv
> + inv <- skipPristine <$> gzFetchFilePS (dir </> invName) Uncachable
> + readInventoryFromContent (toPath dir ++ "/" ++ darcsdir ++ invName) inv
> +
You can remove this hunk with `darcs amend --unrecord`.
> hunk ./src/Darcs/Repository/HashedRepo.hs 601
> - fst <$> readInventoryPrivate invDir inv
> + fst <$> readSafeInventoryPrivate invDir inv
> hunk ./src/Darcs/Repository/HashedRepo.hs 608
> + readSafeInventoryPrivate dir invName = do
> + b <- doesFileExist (dir </> invName)
> + if b then readInventoryPrivate dir invName
> + else return (Nothing, [])
> +
> +
You should define another function `listInventories`
that call `readSafeInventoryPrivate` because there are other
functions that call `listInventories` and you precisely want them
to retrieve abstent inventories, otherwise darcs believes that your
repository is almost empty... (this happens when
you're working in a lazy repository, the directory `_darcs/inventories`
is actually empty right after cloning).
You could call that separate function `listInventoriesLocal`,
and don't forget to update the Haddock comment.
Also I'm not sure about the name `readSafeInventoryPrivate` since
it's not a matter of safety really; you could call it
`readInventoryLocalPrivate`.
> +listPatches :: IO [String]
> +listPatches = do
> + inventory <- readSafeInventoryPrivate darcsdir hashedInventory
I'd also name it `listPatchesLocal` and put it a comment as for
`listInventories`.
Same comment as above about `readSafeInventoryPrivate`.
> + followStartingWiths (fst inventory) (getPatches inventory)
You're getting the hashes of the patches of hashed_inventory and
continuing with the next inventory in the "Starting With:" chain; good.
> + where
> + followStartingWiths Nothing patches = return patches
ok.
> + followStartingWiths (Just startingWith) patches = do
> + inv <- readSafeInventoryPrivate inventoriesDirPath startingWith
> + (patches++) <$> followStartingWiths (fst inv) (getPatches inv)
> +
> + getPatches inv = map snd (snd inv)
OK.
> +
> + readSafeInventoryPrivate dir invName = do
> + b <- doesFileExist (dir </> invName)
> + if b then readInventoryPrivate dir invName
> + else return (Nothing, [])
> +
Same remark as above.
__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/patch1134>
__________________________________
More information about the darcs-devel
mailing list