[darcs-users] Language aware darcs

Thu Jan 13 12:42:05 UTC 2005

On Wed, Jan 12, 2005 at 07:00:31PM -0500, Michael Conrad wrote:
> On Wednesday, January 12, 2005 9:06 AM, Ketil Malde wrote:
> > And it needs to be told explicitly of any changes to it, like
> > e.g. moving a file.
> 
> Very true.  Automating that (to some extent) would be necessary in order to
> implement the idea.  And, I noticed another flaw in my description. While
> the filesystem and parse trees are both trees, the file system is a Map of
> Maps, where the parse tree is a List of Lists.

The parse tree doesn't *exactly* need to be a List of Lists, since at least
parts of it may be referenced by name.  So perhaps it should be a list of
(possibly) named lists? Although perhaps naming the items in the lists
isn't necesary, it's worth considering whether a modification should be to
the "third function" or "function foo".

> So, to refine my earlier statements, a super-robust version control system
> would need to be able to manage both Maps and Lists in an abstract way such
> that they could contain lists and maps as elements. (note that if we can
> support maps, we can inherently support "Sets")

Indeed.

> If this were succesful, then a tree of text files becomes a map of maps [of
> maps ...] of lists of text lines.

But I think it's easier and faster (as far as execution time goes) to
develop the concrete implementation first.  We have implemented hunk
modifications to lists, but haven't yet implemented hunk moves in lists,
which would be a very useful feature, but would have somewhat challenging
commutation behavior.

We've also already implemented an "item modification" patch, the replace
patch, which (possibly) modifies all items in a list.

Another possibility for a patch is a "line modification" patch, which
modifies a given line (or perhaps set of lines?) within a file.  Such a
patch would be potentially able to commute with hunk patches, provided the
line modification (e.g. indenting, or some such change) can be commuted
with the block modification.  This is also where perhaps word-within-a-line
patches (see other email I just sent) could be categorized.

My point being that I think many of the abstract patch types that we might
like to implement have concrete realizations within the current line-based
framework, and that implementing them now will allow us to more easily
develop the algorithms which can then later be generalized to more
interesting trees than a simple list of lines.

> If we wanted to manage text files where words were important, and lines were
> not, we could break the file into a list of words rather than a list of
> lines.
> 
> And, the load/save code could still be disconnected from the commutation
> code.  The loader would use whatever special algorithm was desired for
> breaking a file into Lists/Maps, the diffing and commutation and resolution
> code would handle the data, and the saver would put the data back in such a
> way that the loader would end up with the exact same data tree.

Indeed, the load/save code is already disconnected from the commutation
code.  True, there is only one load/save code, and the patch format doesn't
allow for more, but 

> > As a relatively non-intrusive and apparently-useful feature, wouldn't
> > better handling of directory structure be a good starting point for a
> > more tree-oriented darcs?  Detecting file moves, for instance?
> 
> Yeah, I think that would be a good place to start.  I guess I meant that the
> parsing end of the code would be really easy to implement, since it would be
> separate from the core of darcs.  As for implementing the diff algoritms for
> nested maps and lists, well, I'm sure someone would find it intellectually
> challenging and rewarding ;-)

There are three questions here:

1) Creating patch types to describe more complicated
   transformations such that they (or a useful subset) will commute with
   one another.  If they don't commute, they won't be useful, since you'll
   be perpetually getting conflicts.

2) Defining the actual commutation.  For more "interesting" patch types
   (e.g. hunk moves, which did exist in an earlier version of darcs) this
   can be a real challenge.  But without "interesting" patch types, there
   isn't much to be gained.  I.e. the whole point of new patch types is to
   increase the probability of commutation, and decrease the probability of
   conflicts.

3) Automatically determining the patches, i.e. a diff algorithm.  This is
   really secondary, and is only a user interface question.  That is, any
   old diff algorithm will work, and it can be improved incrementally.  Or
   one could manually specify the changes, and then only later implement a
   diff algorithm.  This is an interesting problem, but really isn't useful
   until the first two have been solved, which suggests that automatically
   finding file moves would be a good first attempt at an "interesting"
   diff algorithm, at least for now.

The issue of loading/saving parse trees, as you say is essentially
orthogonal to these issues except for the first one.

> > > While this isn't currently part of darcs' goals or design, it seems so
> > > appealing to me that I can't help but mention it.
> >
> > Your proposal does sound intriguing, but (perhaps I lack the necessary
> > imagination) I for one am quite happy with handling text for now.  To
> > me, it seems more important to have features like breaking up patches
> > on recording, hunk move detection and patches, and perhaps support for
> > finer granularity diffs and patches (say words instead of lines, or
> > like ediff's refinement).
> 
> Although as the feature requests keep coming in, it seems that an abstract
> approach might be a lot easier than continually trying to implement specific
> features without breaking things.  The abstract approach might actually save
> time in the long run.

In a sufficiently long run, probably abstracting things will help, but on
the other hand the implementation won't change when the types are
abstracted, so implementing it first does seem like a good idea.
I.e. generalizing the commutation of a hunk would just be the equivalent of
a change from

commuteHunk ::

to

commuteGeneralHunk :: Eq a => ((Int, [a], [a]), (Int, [a], [a]))
                           -> ((Int, [a], [a]), (Int, [a], [a]))

without changing any of the actual code.  Supporting nested Lists will
definitely be more challenging, but the parse trees for most languages are
not symmetric (at least not at all levels), so we'd most likely still have
a flat top-level List, which would commute in precisely the same way that
the existing hunks do (and that the proposed hunk moves would).
-- 
David Roundy
http://www.darcs.net