[darcs-users] darcs patch: simplify issue965 test (which took quite... (and 5 more)

Wed Sep 17 17:48:45 UTC 2008

Hi David,

This is a pretty exciting bundle.  I like it when people say things like
"fix bugs in pending" especially when we have regression tests for them.

These are going into stable, so no action on your part needed.  For
issue885, I have some suggestions to reduce the hokeyness you point out.
For issues709 and issue1012, I only have eric-tries-to-understand
comments, so no action on your part necessary.  I will warn you that
eric does not fully understand, so I am going to have to just trust the
tests.  Nicolas Pouillard was looking into this bug, so I am hoping he
too will have a chance to review your patches.  Maybe between the four
of us (you, me, Nicolas and Jason, we can be fairly confident about
this).

I am very tempted to put these in the 2.0.3 pre-release.  I think making
people wait for until January to see this new code hurts them more than
the risk of new bugs introduced by the code.  Any thoughts, darcs-users?
In the worst case, we can rollback (and rollback my rollbacks).

The bugs
========
So here is my rough understanding of issue709.  You've got three patches
in pending, which we will abbreviate AHC

 Add  foo
 Hunk foo
 Changepref boring foo

We've got logic that says "when you create a named patch in the
repository, get rid of its corresponding changes in pending".

So if you record a patch AH, you get rid of the corresponding changes,
leaving behind C in pending.

There were two bugs.  For some reason, pending was in this *corrupt*
state: RC.  The corruption is that we are removing a non-empty file.
David fixed this bug, so that pending was now merely wrong (H^RC) and
then moved on to fixing the second bug so that we get the desired
pending C.

Issue1012 is similar.  Unrecording a rmfile patch creates these two
patches in pending.

 H^
 R

When we try to re-record this patch, we get the corrupt pending
patch

 R

David's first fix is enough to correct this.

simplify issue965 test (which took quite a while for me to figure out).
-----------------------------------------------------------------------
The simplification here is to doing fancy things with the editor 'ed'
and just write out the file contents.  It's just part of a general trend
in our test suite, pushing for the very simple and explicit tests.

first-stage fix for issue709.
-----------------------------
>  Here I fix the bug which leads to a corrupt pending being left, with a
>  rmfile change but no hunk removing the contents.  This doesn't fix
>  issue709, since an incorrect pending is left, it's just no longer a
>  corrupt pending (i.e. it's still got the rmfile, but if you record it
>  there's no problem).

>  is_simple :: Prim C(x y) -> Bool
> -is_simple x = is_hunk x || is_binary x || is_setpref x ||
> -              is_addrmfile x || is_addrmdir x
> +is_simple x = is_hunk x || is_binary x || is_setpref x

For reference, here are all the possible primitive patches:
- mv
- rmdir, addir, rmfile, addfile [used to be simple]
- hunk,   binary                [simple]
- tokreplace
- split (obsolete)
- identity
- changepref                    [simple]

This is the kind of patch I don't know how to review :-(

By the way, it might help to (again) explain roughly what it means to
sift, assuming we are always using this word in the same sense, as in
'sift for pending'.

So the questions here I asked myself here are

 1. why do we care if a patch is "simple"?
 2. why should addrmfile and addrmdir not be considered simple?
 3. what are the risks of no longer considering them to be so?

But then I got stuck.  Below is me trying to find the thread.

There is a general pattern in handle_pend_for_add and crude_sift (the
two users of this function) that goes

  ps2 = if all is_simple pending
           then filter f ps
           else ps

Given a sequence of patches that potentially overlaps with pending, if
everything in pending is simple, we will just take the hunk and binary
patches from that sequence; otherwise, we take the whole lot.  By
considering fewer patches to be simple, we are more like to be grabbing
the whole lot.

It almost sounds like this is_simple business is some sort of
optimisation.  Maybe taking the whole lot always works, but in general,
things are a lot faster when we only consider the hunk and binary
patches, provided conditions are safe to do so?  And maybe the nature
of this patch is that we have discovered that it is NOT safe to only
take hunk and binary patches when there are addrm{file,dir} patches.

This function is used by handle_pend_for_add and crude_sift.

handle_pend_for_add: This is used when darcs adds a new patch to the
repository (for example, in darcs pull).  My rough understanding is that
its purpose is to remove primitives from the pending patch which also
appear in the patch we are adding.  The bug was that if the pending
patch contains only AHC, we feel like it's ok to remove H (because A,H,
and C are all simple).  But we should NOT permit ourselves to remove H
because it depends on A.

Thanks to David's change, if the patch we are trying to add contains any
rmfiles, we do not remove the corresponding hunks/binaries, so less the
corruption is avoided.  Now I just don't know if by failing to remove
some patches, we are doing something else that is bad for pending.  If
the is_simple stuff is just an optimisation then I think I can be happy.

crude_sift: this is a slightly more specific version of the pattern
above

  pending2 = if all is_simple pending
             then filter f pending 
             else pending 

where ps == pending

We assume that the filtered sequence has the same starting and ending
context, which I find to be somewhat magical :-/

> hunk ./bugs/issue709_pending_look-for-adds.sh 11
>  darcs init
> +
> +# Here we check whether recording just one of two --look-for-add
> +# addfiles causes any trouble (which it doesn't)

I wonder if the fixes in this bundle will help
http://bugs.darcs.net/issue1073

By the way David, maybe you should set yourself to nosy on all bugs
marked 'ThePendingPatch'

resolve issue709: avoid adding changes to pending in rmpend when possible.
--------------------------------------------------------------------------
rmpend needles haystack (haystack == pending)

>            rmpend (x:>:xs) xys | Just ys <- removeFL x xys = rmpend xs ys

Unchanged code: if the needle x is in xys and we can commute it out of
xys (notice we call this xys because we know that x is in it), then
great, try gettinng rid of the rest of the needles xs.

> -          rmpend (x:>:xs) ys = case commuteWhatWeCanFL (x:>xs) of
> -                               a:>x':>b -> (invert (x':>:b) +>+) `mapSeal` rmpend a ys

Old logic: So here we wanted to commute the x needle out of pending but
we couldn't!  To understand this, let's revisit the issue709 example
from above, only adding another orgthonal patch O so we can explore the
whole code.  We want to remove the added patches AHO from the pending
patch AH-C-O (the dashes should be read as just whitespace).  Well, we
cannot commute A out of AH-C-O because C depends on A (you can't set
boringfile to something that does not exist).  So we enter this case.

Don't panic.  First let's use commuteWhatWeCanFL to figure out what
parts of AHO actually depend on A.  We see from this result:
 O :> A :> H
that O does not depend on A, so lets remove it from pending if possible
(this results in just C).

The buggy bit: Second, let's invert what's left (AH => H^R) and prefix
that to pending.  Whoops!  This results in pending H^RC, which is not
what we wanted.

> +          rmpend (x:>:xs) ys =
> +              case commuteWhatWeCanFL (x:>xs) of
> +              a:>x':>b -> case rmpend a ys of
> +                          Sealed ys' -> case commute (invert (x':>:b) :> ys') of
> +                                        Just (ys'' :> _) -> seal ys''
> +                                        Nothing -> seal $ invert (x':>:b)+>+ys'
> +                                        -- DJR: I don't think this
> +                                        -- last case should be
> +                                        -- reached, but it also
> +                                        -- shouldn't lead to
> +                                        -- corruption.

A more sophisticated attempt is to check if we can then commute H^R and
C... if we can then we just ignore them and return C' (which fixes
issue709).  These changes *should* be able to commute out (maybe we
should emit a warning), but if not, we default to the old wrong behaviour.

I still find this a little mysterious.  If we're trying to commute xb :>
ys' does that not mean that we are saying that ys' has xb as its
context?  I wonder if pending would make more sense if we viewed it from
the other end (as an RL) and tried to pop stuff off it instead.

By the way, potential documentation for commuteWhatWeCanFL.
commuteWhatWeCanFL p xs will commute p down xs as far as possible giving
us

  what_we_can :> p' :> what_we_cannot

Note that xs gets re-ordered as necessary, so that we only include a
patch in what_we_cannot it either depends on p, or depends on something
that does.

resolve issue885: fix patchSetToRepository to work with hashed.
---------------------------------------------------------------
> -patchSetToRepository :: RepoPatch p => PatchSet p C(x) -> [DarcsFlag] -> IO (Repository p C(r u t))
> -patchSetToRepository patchset opts = do
> +patchSetToRepository :: RepoPatch p => Repository p C(r1 u1 r1) -> PatchSet p C(x)
> +                     -> [DarcsFlag] -> IO (Repository p C(r u t))
> +patchSetToRepository (Repo fromrepo _ rf _) patchset opts = do
> +    when (format_has HashedInventory rf) $ -- set up sources and all that
> +       do writeFile "_darcs/tentative_pristine" "" -- this is hokey
> +          repox <- writePatchSet patchset opts
> +          HashedRepo.copy_repo repox opts fromrepo

Ok, I understand this change (if the repo we are fetching from is
hashed, then create a hashed inventory so that
HashedRepo.finalize_tentative_changes does not get confused).

Some possible ways to reduce the hokeyiness footprint:
 - only pass in the desired repoformat?
 - factor HashRepo.finalize_tentative_inventory out of
   HashRepo.finalize_tentative_changes and only call the former
   in writePatchSet (this also has the benefit of us keeping our
   promise that we do not touch pristine)

resolve issue1012: it seems to be fixed by the fix for issue709.
----------------------------------------------------------------
> David Roundy <droundy at darcs.net>**20080916173116] move ./bugs/issue1012_unrecord_remove.sh ./tests/issue1012_unrecord_remove.sh

Cool! I'm very pleased we have these two pending regression tests under
our belt.

remove now-unused is_addrmfile and is_addrmdir.
-----------------------------------------------
no comment

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
Url : http://lists.osuosl.org/pipermail/darcs-users/attachments/20080917/b10b26c9/attachment.pgp