[darcs-users] darcs patch: add substitution mechanism for PatchChoices (and 4 more)

Eric Kow kowey at darcs.net
Wed Sep 23 07:35:18 UTC 2009


Still plodding along for this review.

On Sat, Sep 19, 2009 at 11:54:08 +0100, Ganesh Sittampalam wrote:
> Mon Aug 31 00:46:47 BST 2009  Ganesh Sittampalam <ganesh at earth.li>
>   * add some QC properties that demonstrate the problem with canonizeFL

Pushing this one... albeit with a slipshod review :-(
 
> Sat Sep 19 01:40:53 BST 2009  Ganesh Sittampalam <ganesh at earth.li>
>   * Resolve issue291: add (basic) interactive patch splitting

Not quite there yet, sorry!

Liberally re-arranging as I go

add some QC properties that demonstrate the problem with canonizeFL
-------------------------------------------------------------------
Silly Eric...

  data RepoModel C(x)
   = RepoModel {
       rmFileName :: !FileName,
       rmFileContents :: [B.ByteString]
     } deriving (Eq)
  
My brain wouldn't stop reading that 'rm' as 'remove' instead of 'repoModel'.  I
kept wondering why we were trying to track these removes.

Does this mean in general that our unit tests operate only on one-file repos?

> +repoModelToSlurpy :: RepoModel C(x) -> WSlurpy C(x)
> +repoModelToSlurpy rm
> +  = Witnessed $ Slurpy (rmFileName rm) (SlurpFile (Nothing, 0, 0) (B.concat (map (flip BC.snoc '\n') (rmFileContents rm))))

Create a one-file slurpy out of the RepoModel and tack a '\n' on it.

> +canonicalDiff :: RepoModel C(x) -> RepoModel C(y) -> FL Prim C(x y)
> +canonicalDiff rm1 rm2 = diff [IgnoreTimes] (const TextFile) (repoModelToSlurpy rm1) (repoModelToSlurpy rm2)

This will be better when Neil's CmdArgs work lands

> +instance Arbitrary (Sealed RepoModel) where
> +  arbitrary = do wes <- arbitraryState initRepoModel
> +                 let _ :: Sealed (FL Prim C(InitRepoModel)) = mapSeal wesPatch wes
> +                 return (mapSeal wesState wes)

Out of curiosity, why not just let foo :: and return foo?

For the interested, wes = WithEndState which the haddock says is
just a patch and its end state.

  data WithEndState px s C(y) = WithEndState { wesPatch :: px C(y)
                                             , wesState :: s C(y) }

> +  shrink (Sealed (RepoModel { rmFileName = fn, rmFileContents = lines }))
> +    = do lines' <- shrinkList lines
> +         return (Sealed (RepoModel { rmFileName = fn, rmFileContents = lines' }))

I don't really know much about QC so I'm learning as I go.  Shrink just returns
a list a possible shrinks - which I take to mean simpler candidate values.  I'm
guessing that the use for this would be to see if you can get the same failure
out of a shrink, in which case you should just use that to avoid reporting a
whole lot of gunk.  But that's just a guess.

> +shrinkList [] = []
> +shrinkList (x:xs) = xs:map (x:) (shrinkList xs)

I think the idea here is that each candidate shrink is just the result of
getting rid of some lines

Isn't this just some common list idiom like powerset? (Which is not to say
that we should necessarily use any of the fancy liftM tricks, but maybe just
call it what it is).

> +class Size p where
> +    size :: p C(x y) -> Int
>
> +instance Size p => Size (FL p) where
> +    size NilFL = 0
> +    size (p:>:ps) = size p + size ps

OK size looks like just a rough way of characterising patches.

> +instance Size FilePatchType where
> +   size RmFile = 1
> +   size AddFile = 1
> +   size (TokReplace _ _ _) = 1
> +   size (Binary c1 c2) = B.length c1 + B.length c2
> +   size (Hunk _ c1 c2) = length c1 + length c2

> +instance Size DirPatchType where
> +   size _ = 1

The important thing is that hunk changes are measured by the number of lines
affected.  Everything else (like a move patch) just counts as one change.

> +instance Size Prim where
> +    size (Move _ _) = 1
> +    size (DP _ dp) = size dp
> +    size (FP _ fp) = size fp
> +    size (Split ps) = size ps
> +    size Identity = 0
> +    size (ChangePref _ _ _) = 1

For the interested:

    Move :: !FileName -> !FileName -> Prim C(x y)
    DP :: !FileName -> !(DirPatchType C(x y)) -> Prim C(x y)
    FP :: !FileName -> !(FilePatchType C(x y)) -> Prim C(x y)
    Split :: FL Prim C(x y) -> Prim C(x y)
    Identity :: Prim C(x x)
    ChangePref :: !String -> !String -> !String -> Prim C(x y)

So we don't care about the filenames when grabbing the size of
directory of file patches.

> +prop_canonize_works :: (forall x y . FL Prim C(x y) -> FL Prim C(x y) -> a) -> Sealed RepoModel -> Sealed RepoModel -> Sealed RepoModel -> a
> +prop_canonize_works equality (Sealed rm1) (Sealed rm2) (Sealed rm3) =
>     canonicalDiff rm1 rm3 `equality` canonizeFL (canonicalDiff rm1 rm2 +>+ canonicalDiff rm2 rm3)

> +prop_canonize_is_canonical = prop_canonize_works ((==) `on` show)
> +prop_canonize_is_minimal = prop_canonize_works ((==) `on` size)

Now to the meaty bits.  Basically we want to check if canonicalDiff always
gives us the same result as taking two diffs and then canonizing.

For the interested, here is copy and pasting from a previous Ganesh
message the problem it's pointing out:
 
| state    1    2    3
| contents h => v => v
|          c         c
|          v         v
| 
| The direct diff is "h" -> "v" at the start, the indirect coalesced one
| removes "hc" at the start and adds "cv" at the end because it doesn't know that
| "v" exists in the middle.


diff :     -h
           +v

canonize:  -h
           -c
           +c
           +v

I guess we're looking for two things.  Canonicalness subsumes minimality, so
I'm not sure why we test for the latter.  My guess is that it's because
minimality is useful on its own and easier to achieve.

I confess I haven't run these tests yet -- is that easy to do?

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 194 bytes
Desc: not available
URL: <http://lists.osuosl.org/pipermail/darcs-users/attachments/20090923/30ea0bc2/attachment.pgp>


More information about the darcs-users mailing list