[darcs-users] Re: Parsing Patches (was: Where Arch is going)

Sat Jun 4 15:53:20 UTC 2005

On Fri, Jun 03, 2005 at 03:14:55PM -0400, Max Battcher wrote:
> David Roundy wrote:
> >Another interesting problem
> >would be that of creating a sort of lexing/parsing language that would
> >allow customized patch types that are specific to a particular programming
> >language.  This is a particularly hard problem, as you'd need to have the
> >parsing always succeed and always give meaningful (and reasonable) results.
> >And the resulting patches would have to merge and commute in a meaningful
> >and useful manner.

Just a few comments here.  This is indeed something I don't have time to
work on right now, and reading/writing long emails on the subject takes
quite a bit of time.  So I'll be brief, and will be happy to discuss this
later (maybe a year or so) when I *do* have time to actually do something
about this.

> I understand why you think it is important to create meaningful patches 
> for code that doesn't parse, for those instances when you want to save 
> unstable works in progress.

Making "fancy" patches apply to any file (or perhaps rather an
easily-defined subset of files) is probably important in making the
commutation simple.

> But I think there might be a good case for restricting such a patch 
> type to documents that parse (and using hunks if a document won't parse). 
> Documents that parse properly make much better semantic sense (and thus 
> yield better cherry-pickable patches).

The trouble is that if you do this, any hunk patch will almost certainly
conflict with any "parse" patch.

> A file can be "canonized" (pretty printed) from a tree structure:
> 
> [A]' = Canon<T>(B<T>)
> 
> [A] is not necessarily equal to [A]'.

I don't consider this acceptable.  It depends what data you're storing, but
the whitespace in my C++ programs is very often important in making it easy
to read, and I'd be upset if recording a patch messed that up.
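
To make the objection concrete, here is a minimal Python sketch of the canonicalization described above (not darcs code; darcs itself is written in Haskell).  The toy token language and the names `TOKEN`, `parse` and `canon` are hypothetical stand-ins for the tree type T and Canon<T>.  Because `parse` keeps only the tokens, pretty-printing its result gives an [A]' that no longer matches the hand-aligned [A], even though both parse to the same token sequence.

```python
import re
from typing import List

# Hypothetical toy language: identifiers, integers and single-character
# punctuation, separated by arbitrary whitespace.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|\S)")

def parse(text: str) -> List[str]:
    """Irreversible parse: keep the tokens (the 'tree'), discard all whitespace."""
    return TOKEN.findall(text)

def canon(tokens: List[str]) -> str:
    """Canonical pretty printer: a single space between every pair of tokens."""
    return " ".join(tokens)

original = "int x   = 1;\nint foo = 2;\n"  # hand-aligned '=' signs
canonical = canon(parse(original))

assert canonical != original                # [A]' differs from [A]: layout is lost
assert parse(canonical) == parse(original)  # ...but the token sequence survives
```

The tree is a fixed point of parse-then-print, so the semantic content is safe; it is exactly the layout a human chose that disappears.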

For machine-generated XML documents, on the other hand, this would probably
be fine.  But I don't use those, so they aren't so interesting to me.  The
parser I'm interested in would be a reversible one, i.e. one whose parse
tree can be printed back to exactly the original text.  In some ways this
is more of a pain than an irreversible parser (since you don't get to throw
information away), but in other ways it may be simpler, since you may be
able to build the parser from a smallish set of reversible primitives, and
one might be able to determine the commutation behavior of those primitives
in some sense.
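
One possible reading of "a smallish set of reversible primitives", sketched in Python under assumptions of my own: each hypothetical primitive (`Rev`, `regex`, `seq`, `many`) pairs a parser with a printer and obeys the law `show(value) + rest == input`.  Since `seq` and `many` preserve that law, any parser assembled from them round-trips exactly, whitespace included.  This only illustrates reversibility; it says nothing about how patches built on such a parser would commute.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

# Hypothetical design: a reversible primitive pairs a parser with a printer
# and must satisfy the round-trip law
#     parse(s) == (value, rest)  implies  show(value) + rest == s
Result = Optional[Tuple[Any, str]]

@dataclass
class Rev:
    parse: Callable[[str], Result]
    show: Callable[[Any], str]

def regex(pattern: str) -> Rev:
    """Primitive: match a regex at the start; the value is the exact matched text."""
    rx = re.compile(pattern)
    def parse(s: str) -> Result:
        m = rx.match(s)
        return (m.group(0), s[m.end():]) if m else None
    return Rev(parse, lambda v: v)

def seq(p: Rev, q: Rev) -> Rev:
    """Run p then q; the value is a pair and printing concatenates both halves."""
    def parse(s: str) -> Result:
        r1 = p.parse(s)
        if r1 is None:
            return None
        v1, rest = r1
        r2 = q.parse(rest)
        if r2 is None:
            return None
        v2, rest = r2
        return ((v1, v2), rest)
    return Rev(parse, lambda v: p.show(v[0]) + q.show(v[1]))

def many(p: Rev) -> Rev:
    """Repeat p until it fails or stops consuming input; the value is a list."""
    def parse(s: str) -> Result:
        items = []
        while True:
            r = p.parse(s)
            if r is None or r[1] == s:  # stop on failure or no progress
                return (items, s)
            items.append(r[0])
            s = r[1]
    return Rev(parse, lambda vs: "".join(p.show(v) for v in vs))

# A token keeps its leading whitespace instead of throwing it away.
token = seq(regex(r"\s*"), regex(r"[A-Za-z_]\w*|\d+|\S"))
program = seq(many(token), regex(r"\s*"))  # trailing whitespace is kept too

source = "int x   = 1;\nint foo = 2;\n"
tree, rest = program.parse(source)
assert rest == ""
assert program.show(tree) == source  # exact round trip, alignment included
```

Storing whitespace inside each token, rather than discarding it, is what makes the printer an exact inverse of the parser; the cost is the one mentioned above, that no information may ever be thrown away.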
-- 
David Roundy
http://www.darcs.net