[darcs-users] darcs patch: Import relevant bits of gorsvet, for now... (and 16 more)

Sun May 31 15:10:43 UTC 2009

Reinier Lamers <tux_rocker at reinier.de> writes:

> On Wednesday 27 May 2009 15:57:00 Eric Kow wrote:
>> Reinier: could you read Gorsvet.hs itself in a more thorough manner?
>
> Yes, of course. General remarks:
>  * Though I complain a bit below about missing documentation, the haddocks of      
>    hashed-storage are very helpful when present.
>  * No glaring bugs found, but quite a lot of questions. When Petr answers 
>    them, it's OK to apply the bundle.
>
> First thing I observe about Gorsvet.hs is that it gives an awful lot of 
> warnings about missing type signatures and other things.
Right, this is work in progress and things are in such a flux that it sometimes
doesn't make sense to fix all warnings. I agree they could be reduced a little.

> <imports skipped>. Darcs.Gorsvet does not explicitly declare what it exports. 
> I'd rather see it did, but that may be a personal preference. What do you 
> thnk?
It eventually will, but it's more hassle than worth at this stage (same as
warnings, I guess). In fact, I'd like to get rid of the module altogether, in
the long run.

>>floatFn = floatPath . fn2fp
>
> What floatPath does is not so well documented on hackage, especially because 
> the AnchoredPath type has no explanation.
Ok, I'll haddock AnchoredPath, it is a fairly important bit.

>>instance ReadableDirectory (StateT TreeState IO) where
>>    mDoesDirectoryExist d = gets (\x -> isJust $ findTree (tree x) (floatFn
>> d)) mDoesFileExist f = gets (\x -> isJust $ findFile (tree x) (floatFn f))
>> mInCurrentDirectory d action = do -- TODO bracket?
>>      wd <- gets cwd
>>      modify (\x -> x { cwd = floatFn d })
>>      x <- action
>>      modify (\x -> x { cwd = wd })
>>      return x
>>    mGetDirectoryContents = error "get dir contents"
>>    mReadFilePS p = do x <- readFile (floatFn p) -- ratify readFile: ...
>>                       return $ BS.concat (BL.toChunks x)
>
> Wouldn't it make the code more readable to use a type synonym for "StateT 
> TreeState IO"? The haddock for hashed-storage mentions "TreeIO" without a link 
> sometimes. Is that a type synonym for StateT TreeState IO?

> BTW, what is a TreeState? It's not documented anywhere. It appears to be a 
> Tree with a current working directory.

I think it's forbidden to use type synonyms in instance declarations or
something (I know I tried and GHC complained loudly for a reason or other). I
use TreeIO elsewhere. I'll also haddock TreeState.

> Is it OK to have mSetFileExecutable be a no-op? Or is this something you plan 
> to change in the future? Also, I'd love to see haskell_policy cursed in Czech 
> for a change :-). For the interested reader: all the functions with the system 
> call-like names (unlink, createDirectory, readFile) are not Haskell library 
> functions but functions exported by hashed-storage.
So far, it is OK, but yes, eventually this will need to be implemented. I'll
see what can be done for Czech. ;)

>>treeDiff :: (FilePath -> FileType) -> Tree -> Tree -> IO (FL Prim)
>>treeDiff ft t1 t2 = do
>>  (from, to) <- diffTrees t1 t2
>>  diffs <- sequence $ zipTrees diff from to
>>  return $ foldr (+>+) NilFL (diffs)
>
> I suppose the parentheses around diffs can be removed?
I guess so, too. ;)

[snip code]
> How about the case of a file in the one tree and a directory in the other 
> tree, or reverse?
Yes, this needs some thought. It would be also good to have this covered in the
testsuite somehow.

> That is one long function. Perhaps split the 'diff' and 'text_diff' functions 
> into separate top-level ones?
Dunno. I like the encapsulation effect -- it makes it less tempting to use
text_diff elsewhere, while it really should not.

>>readRecordedAndPending :: (RepoPatch p) => Repository p -> IO Tree
>>readRecordedAndPending repo = do
>>  pristine <- readDarcsPristine "."
>>  Sealed pend <- read_pending repo
>>  (_, t) <- virtualTreeIO (apply [] pend) pristine
>>  return t
>
> OK.
Sort of OK, anyway. :)

>>unrecordedChanges :: (RepoPatch p) => [DarcsFlag] -> Repository p
>>                  -> (Tree -> Tree) -> IO (FL Prim)
>>unrecordedChanges opts repo restrict_ = do
>>  checkIndex repo
>>  slurp_pending repo -- XXX: only here to get us the "pending conflicts"
>> check -- that I don't know yet how to implement properly pristine <-
>> readDarcsPristine "."
>>  Sealed pending_patches <- read_pending repo
>>  (res, current') <- virtualTreeIO (apply [] pending_patches) pristine
>>  let current = {- restrict -} current'
>>
>>  working <- case (LookForAdds `elem` opts, IgnoreTimes `elem` opts) of
>>               (False, False) -> (restrict_ `fmap` readIndex) >>= unfold
>>               (False, True) -> do guide <- unfold current
>>                                   restrict guide `fmap` readPlainTree "."
>>               (True, _) -> filter nodarcs `fmap` readPlainTree "."
>>  ft <- filetype_function
>>  diff <- treeDiff ft current working
>>  return $ sort_coalesceFL (pending_patches +>+ diff)
>>      where nodarcs (AnchoredPath (Name x:_)) _ | x == BS.pack "_darcs" =
>> False nodarcs _ _ = True
>
> What's the "restrict" business in this function? It looks OK further. 
> Shouldn't the nodarcs function take the user's boring file into account 
> eventually?
The restrict business is how the Tree gets narrowed down to only the
"interesting" bits (where we eg. just look for changes in a directory or
files). There's restrict_subpaths that can be used to produce a suitable
predicate from a list of SubPath's.

This suffers from one complicated issue though, that is pending renames. I so
far didn't figure out where and how to apply the renames without breaking
something, somewhere. Current darcs behaviour is to show file's changes
whenever either the pre-rename or post-rename filename given. I think the right
solution would be to extend the list of restrictions like this:

    let paths' = paths `union` apply_to_filepaths (inverse pending) paths

or such, and then feeding it to restrict_subpaths. However, I have layering
problem, since the client code shouldn't need to know anything about
pending. Probably an extra function akin to restrict_subpaths that would also
take the pending patch into account would be the right solution. (It needs to
go into IO though. Probably no big deal.)

There's still some bits to sort out though with this restrict business. Also,
as you point out, nodarcs needs to take boring into account, which further
complicates the code. I guess I'll rip the "nodarcs" filter out as a toplevel
function that produces a predicate from the boring file.

>>-- XXX both application actions below could avoid unfolding if TreeIO would
>> be -- smart enough to unfold-as-needed...
>>applyToTentativePristine _ patches =
>>    do pristine <- readDarcsPristine "." >>= unfold
>>       (_, tree) <- hashedTreeIO (apply [] patches)
>>                    pristine "_darcs/pristine.hashed"
>>       BS.writeFile "_darcs/tentative_pristine" $
>>         BS.concat [BS.pack "pristine:"
>>                   , darcsFormatHash (fromJust $ treeHash tree)]
>
> OK.
>
>>applyToWorking :: (RepoPatch p) => Repository p -> Sealed (FL Prim) -> IO
>> Tree applyToWorking _ (Sealed patches) =
>>    do pristine <- readDarcsPristine "." >>= unfold
>>       working <- readIndex
>>       snd `fmap` plainTreeIO (apply [] patches) working "."
>
> Why is it reading that pristine there? Just to ensure it is in a darcs repo?
>
>>tentativelyMerge r cmd usi themi =
>
> A type signature would have been helpful here.
>
>>  do let us = mapFL_FL hopefully usi
>>         them = mapFL_FL hopefully themi
>>         (_ :/\: pc) = merge (progressFL "Merging them" them
>>
>>                                             :\/: progressFL "Merging us"
>>                                             : us)
>>
>>     pend <- unrecordedChanges [] r id
>>     anonpend <- anonymous (fromPrims pend)
>>     let pend' :/\: pw = merge (pc :\/: anonpend :>: NilFL)
>>         pwprim = joinPatches $ mapFL_FL patchcontents pw
>>         Sealed standard_resolved_pw = standard_resolution pwprim
>>     mapM_ backupByCopying $ list_touched_files standard_resolved_pw
>>     have_conflicts <- announce_merge_conflicts cmd [] standard_resolved_pw
>>     have_unrecorded_conflicts <- check_unrecorded_conflicts [] pc
>>     let Sealed pw_resolution = if have_conflicts ||
>> have_unrecorded_conflicts then seal NilFL
>>                                   else seal standard_resolved_pw
>>     let doChanges :: FL (PatchInfoAnd p) -> IO ()
>>         doChanges NilFL = applyps r themi
>>         doChanges _     = applyps r (mapFL_FL n2pia pc)
>>     doChanges usi
>>     setTentativePending r (effect pend' +>+ pw_resolution)
>>     return $ seal (effect pwprim +>+ pw_resolution)
>>  where mapAdd :: RepoPatch p => Repository p -> FL (PatchInfoAnd p) -> [IO
>> ()] mapAdd _ NilFL = []
>>        mapAdd r' (a:>:as) =
>>               (add_to_tentative_inventory (extractCache r') NoCompression a
>> >> return ()) : mapAdd r' as applyps :: (RepoPatch p) => Repository p -> FL
>> (PatchInfoAnd p) -> IO () applyps repo ps = do sequence_ $ mapAdd repo ps
>>                             applyToTentativePristine repo ps
>
> The code in this function looks quite David-ish. Did you adapt it from 
> elsewhere? I have to say I don't understand it. First it merges the changes 
> from another source, then it tries to find out if there were conflicts and in 
> the end it applies the original merge to the tentative pristine. It does not 
> look particularly dangerous, but I can't tell you why that seal is there 
> around pw_resolution, for example.
Actually, I guess this code should be in fact removed. It's indeed a copy of a
function from Repository.Internal with things reone to use TreeIO instead of
Slurps.

>>filter_paths files =
>>    \p _ -> any (\x -> x `isPrefix` p || p `isPrefix` x) files
>>
>>restrict_paths files = if null files
>>                          then id
>>                          else filter $ filter_paths files
>>
>>restrict_subpaths = restrict_paths . map (floatPath . fn2fp . sp2fn)
>
> This goes from a list of subpaths and a list of anchored pahs to the list of 
> those anchored paths that are prefixes or suffixes of any files in the first 
> list of paths. Type signatures and docs are forthcoming, I hope?
Eventually, when I'll figure the right interface for this. This has been mostly
blocking on the pending issue above, which I hope is now on its path to
resolution.

>>checkIndex repo = do
>>  invalid <- doesFileExist "_darcs/index_invalid"
>>  exist <- doesFileExist "_darcs/index"
>>  when (not exist || invalid) $ updateIndex repo
>>  when invalid $ removeFile "_darcs/index_invalid"
>
> OK. Have you thought about races where two darcs processes would be trying to 
> rebuild the index at the same time?
I believe this is "safe" (for some value of safe, anyway). See also my reply to
Trent, message-id <87ws7yelb5.fsf at mornfall.net>.

>>updateIndex repo = do
>>  pristine <- readRecordedAndPending repo
>>  updateIndexFrom pristine >>= unfold
>>  return ()
>
> What determines what index updateIndexFro modifies? The current working 
> directory of the process?
Current working directory indeed. I think this is assumed to be the repository
directory at various places in darcs code (and --repodir's effect is to cwd
into the repodir as well).

Thanks for review.

Yours,
   Petr.

-- 
Peter Rockai | me()mornfall!net | prockai()redhat!com
 http://blog.mornfall.net | http://web.mornfall.net

"In My Egotistical Opinion, most people's C programs should be
 indented six feet downward and covered with dirt."
     -- Blair P. Houghton on the subject of C program indentation