[darcs-users] darcs patch: make rollback smarter about breaking up changes.

Jason Dagit dagit at codersbase.com
Thu May 14 01:38:17 UTC 2009


On Wed, May 13, 2009 at 5:55 PM, Eric Kow <kowey at darcs.net> wrote:

> On Tue, May 12, 2009 at 22:16:34 -0700, David Roundy wrote:
> > > We'd like to pull this patch from your version of darcs.  Any chance
> you
> > > could explain it to us?
> >
> > No, not really.  It's been a while, and I don't have time to review it
> now.
>
> OK, no problem.  I'll try to work out an explanation for this one.
>
> If a volunteer could step up to review David's patch, even better!
> Anybody?


] hunk ./src/Darcs/Commands/Rollback.lhs 45
                           read_repo, slurp_recorded,
                           tentativelyMergePatches, withGutsOf,
                           finalizeRepositoryChanges, sync_repo )
-import Darcs.Patch ( summary, invert, namepatch, effect, fromPrims,
sort_coalesceFL )
+import Darcs.Patch ( summary, invert, namepatch, effect, fromPrims,
+                     sort_coalesceFL, canonize )
 import Darcs.Ordered
 import Darcs.Hopefully ( n2pia )
 import Darcs.Lock ( world_readable_temp )
hunk ./src/Darcs/Commands/Rollback.lhs 127
                              exitWith ExitSuccess
        definePatches ps
        with_selected_last_changes_to_files' "rollback" opts
-               existing_files (sort_coalesceFL $ effect ps) $ \ (_:>ps'')
->
+               existing_files (concatFL $ mapFL_FL canonize $
+                               sort_coalesceFL $ effect ps) $ \ (_:>ps'')
->
          do when (nullFL ps'') $ do logMessage "No changes selected!"
                                     exitWith ExitSuccess
             let make_log = world_readable_temp "darcs-rollback"

Understanding this patch is really understanding why we want to canonize.
Looking in Darcs.Patch.Prim I see:

It can sometimes be handy to have a canonical representation of a given
patch.  We achieve this by defining a canonical form for each patch type,
and a function ``{\tt canonize}'' which takes a patch and puts it into
canonical form.  This routine is used by the diff function to create an
optimal patch (based on an LCS algorithm) from a simple hunk describing the
old and new version of a file.
\begin{code}
canonize :: Prim C(x y) -> FL Prim C(x y)
canonize (Split ps) = sort_coalesceFL ps
canonize p | IsEq <- is_identity p = NilFL
canonize (FP f (Hunk line old new)) = canonizeHunk f line old new
canonize p = p :>: NilFL

mapPrimFL :: (FORALL(x y) FL Prim C(x y) -> FL Prim C(x y))
             -> FL Prim C(w z) -> FL Prim C(w z)
mapPrimFL f x =
#ifdef GADT_WITNESSES
                f x
#else
-- an optimisation; break the list up into independent sublists
-- and apply f to each of them
     case mapM toSimple $ mapFL id x of
     Just sx -> foldr (+>+) NilFL $ elems $
                mapWithKey (\ k p -> f (fromSimples k (p NilFL))) $
                fromListWith (flip (.)) $
                map (\ (a,b) -> (a,(b:>:))) sx
     Nothing -> f x

-- | 'sort_coalesceFL' @ps@ coalesces as many patches in @ps@ as
--   possible, sorting the results according to the scheme defined
--   in 'comparePrim'
sort_coalesceFL :: FL Prim C(x y) -> FL Prim C(x y)
sort_coalesceFL = mapPrimFL sort_coalesceFL2

-- | The heart of "sort_coalesceFL"
sort_coalesceFL2 :: FL Prim C(x y) -> FL Prim C(x y)
sort_coalesceFL2 NilFL = NilFL
sort_coalesceFL2 (x:>:xs) | IsEq <- nullP x = sort_coalesceFL2 xs
sort_coalesceFL2 (x:>:xs) | IsEq <- is_identity x = sort_coalesceFL2 xs
sort_coalesceFL2 (x:>:xs) = either id id $ push_coalesce_patch x $
sort_coalesceFL2 xs

-- | 'push_coalesce_patch' @new ps@ is almost like @new :>: ps@ except
--   as an alternative to consing, we first try to coalesce @new@ with
--   the head of @ps at .  If this fails, we try again, using commutation
--   to push @new@ down the list until we find a place where either
--   (a) @new@ is @LT@ the next member of the list [see 'comparePrim']
--   (b) commutation fails or
--   (c) coalescing succeeds.
--   The basic principle is to coalesce if we can and cons otherwise.
--
--   As an additional optimization, push_coalesce_patch outputs a Left
--   value if it wasn't able to shrink the patch sequence at all, and
--   a Right value if it was indeed able to shrink the patch sequence.
--   This avoids the O(N) calls to lengthFL that were in the older
--   code.
--
--   Also note that push_coalesce_patch is only ever used (and should
--   only ever be used) as an internal function in in
--   sort_coalesceFL2.
push_coalesce_patch :: Prim C(x y) -> FL Prim C(y z)
                    -> Either (FL Prim C(x z)) (FL Prim C(x z))
push_coalesce_patch new NilFL = Left (new:>:NilFL)
push_coalesce_patch new ps@(p:>:ps')
    = case coalesce (p :< new) of
      Just new' | IsEq <- nullP new' -> Right ps'
                | otherwise -> Right $ either id id $ push_coalesce_patch
new' ps'
      Nothing -> if comparePrim new p == LT then Left (new:>:ps)
                            else case commutex (p :< new) of
                                 Just (new' :< p') ->
                                     case push_coalesce_patch new' ps' of
                                     Right r -> Right $ either id id $
                                                push_coalesce_patch p' r
                                     Left r -> Left (p' :>: r)
                                 Nothing -> Left (new:>:ps)


Yay for haddocks!  The comment in mapPrimFL means I can pretty much ignore
it, so I do.  I'm really glad someone took the time to read over
push_coalesce and optimize it and document it.  That is really cool!

So now it looks like the trick is to figure out what coalesce is doing:
-- | 'coalesce' @p2 :< p1@ tries to combine @p1@ and @p2@ into a single
--   patch without intermediary changes.  For example, two hunk patches
--   modifying adjacent lines can be coalesced into a bigger hunk patch.
--   Or a patch which moves file A to file B can be coalesced with a
--   patch that moves file B into file C, yielding a patch that moves
--   file A to file C.
coalesce :: (Prim :< Prim) C(x y) -> Maybe (Prim C(x y))
coalesce (FP f1 _ :< FP f2 _) | f1 /= f2 = Nothing
coalesce (p2 :< p1) | IsEq <- p2 =\/= invert p1 = Just null_patch
coalesce (FP f1 p1 :< FP _ p2) = coalesceFilePrim f1 (p1 :< p2) -- f1 = f2
coalesce (Identity :< p) = Just p
coalesce (p :< Identity) = Just p
coalesce (Split NilFL :< p) = Just p
coalesce (p :< Split NilFL) = Just p
coalesce (Move a b :< Move b' a') | a == a' = Just $ Move b' b
coalesce (Move a b :< FP f AddFile) | f == a = Just $ FP b AddFile
coalesce (Move a b :< DP f AddDir) | f == a = Just $ DP b AddDir
coalesce (FP f RmFile :< Move a b) | b == f = Just $ FP a RmFile
coalesce (DP f RmDir :< Move a b) | b == f = Just $ DP a RmDir
coalesce (ChangePref p f1 t1 :< ChangePref p2 f2 t2) | p == p2 && t2 == f1 =
Just $ ChangePref p f2 t1
coalesce _ = Nothing

I think, all this does is change the representation back and forth a bit
trying to break things apart and reassemble them to be minimal then
concatenate the whole thing back together.

I do wonder about the operations at the high level.  For example, it seems
as though we can dodge a call to mapFL_FL by doing a concatFL between the
sort_coalesceFL and the canonize.  At first I thought maybe the
sort_coalesceFL followed by a canonize was redundant and the sort_coalesceFL
could be removed, but it may be hard to verify that and I doubt we have any
test cases for rollback that would check if it is equivalent.

Bottom line:  It looks like it breaks things apart and tries to simplify
them as it puts them back together.  There may be some redundant cases that
would offer an unknown amount of optimization, but the current implentation
is probably more conservative (meaning safe) and not likely to be a bottle
neck.

Does that match your analysis Eric?

Jason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osuosl.org/pipermail/darcs-users/attachments/20090513/e0093c1e/attachment-0001.htm>


More information about the darcs-users mailing list