[darcs-users] darcs conflicts/dependencies -- is patch theory the place to start?

Tue Sep 18 22:06:30 UTC 2012

Kevin Quick <quick <at> sparq.org> writes:

> 
> I don't know if I'm helping the discussion: it seems we are exploring  
> theoretical VCS infrastructure so I'm enjoying our semi-abstract discourse  
> ...

Heck! So many new postings overnight (NZ time). It's very generous of
darcs-users to host this discourse; I hope we're not imposing on them.

I'm trying to keep it not-too-abstract. I find that many of the abstract VCS 
models fall down when it comes to being able to observe the programmer's 
intent from changes in what L/S/L call the 'external representation' (that is, 
the file system and its contents). (Perhaps that should say "guess" instead 
of "observe".)

I'm trying to think up a way to get from darcs' hunk changes (which are 
precisely observed) to a context-independent representation for patches. And 
since that's very hard, I'm starting with file system changes, where it's more 
natural to think of a file having identity and persistence.

> 
> > I'm suggesting the file id be the ppid of when it got added, to help  
> > with the
> > book-keeping. (I'm assuming this can also tell us in which repo the file
> > started life.)
> 
> For your purposes wouldn't any guid suffice?  I'm not sure that the source  
> repo is useful.
> 

I don't think there's anything stopping a ppid being a guid(?) What I'm 
thinking is that the addfile happens when the repo is standalone, but we need 
to generate a file-id that is guaranteed unique across all repos the file 
might get pulled/merged to. (I'm assuming darcs already handles this with a 
ppid mechanism?)

And in particular, if a hunk change to the file gets pulled later, it needs to 
target exactly that file wherever it's gone or whatever its name. (Note that 
the hunk change might get pulled into the repo the file was first added into.)
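
A minimal sketch of what I mean, in Python rather than anything darcs-like. The
Repo class and its method names are hypothetical, and uuid4 is only a stand-in
for "the ppid of the addfile patch"; I'm not claiming this is how darcs
identifies files today.

```python
import uuid


class Repo:
    """Toy repo: files are keyed by a stable id; the name is just an attribute."""

    def __init__(self):
        self.names = {}  # file_id -> current path
        self.lines = {}  # file_id -> list of lines

    def addfile(self, path):
        # Assumption: uuid4 stands in for the ppid of the addfile patch.
        file_id = uuid.uuid4().hex
        self.names[file_id] = path
        self.lines[file_id] = []
        return file_id

    def move(self, file_id, new_path):
        # A rename changes the path only; the identity is untouched.
        self.names[file_id] = new_path

    def hunk(self, file_id, at, old, new):
        # The hunk targets the id, so it finds the file whatever its name.
        content = self.lines[file_id]
        if content[at:at + len(old)] != old:
            raise ValueError("hunk does not apply: context mismatch")
        content[at:at + len(old)] = new


repo = Repo()
fid = repo.addfile("src/main.hs")
repo.hunk(fid, 0, [], ['main = putStrLn "hello"'])
repo.move(fid, "app/Main.hs")
# A hunk recorded before the move still lands on the right file.
repo.hunk(fid, 0, ['main = putStrLn "hello"'], ['main = putStrLn "hi"'])
```

The point is only that the second hunk never mentions a path, so the move
can't get in its way.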

> > This discussion is all by way of warming-up for dealing with
> > hunk changes, where we need to implement some sort of line-id, and  
> > detect line movements.
> 
> I'm not sure line mapping follows from file mapping.  The file tends to be  
> a long-lived entity even though it's contents morph over time/patches.  A  
> particular line doesn't really have a "rename" operation;

Not so fast! (IMHO)
- a token replace changes the content of a line, but not its identity
  (arguably your example s/ready/!ready/ is like that)
- refactoring is typically moving lines to different positions
  (maybe in a different file)
  I want to retain the identity of those lines
- arguably, copying a line retains its identity

Then the idea is that a bug-fix change to a specific line (or small group of 
lines) can be pulled, and apply to those lines in their refactored position. 
(Obviously that can't work willy-nilly. What are the pre-conditions? -- that's 
the hard part!)
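
Roughly what I have in mind, as a Python sketch. Everything here (Line,
new_line, move_line, replace_token, the counter used for ids) is made up for
illustration, and the only pre-conditions checked are the two obvious ones:
the line still exists, and the token is still in it.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical id source; a real one would have to be unique across repos.
_ids = count(1)


@dataclass
class Line:
    line_id: int
    text: str


def new_line(text):
    return Line(next(_ids), text)


def move_line(files, line_id, src, dst, pos):
    """Refactoring: the line changes position (maybe file), not identity."""
    line = next(l for l in files[src] if l.line_id == line_id)
    files[src].remove(line)
    files[dst].insert(pos, line)


def replace_token(files, line_id, old, new):
    """Change a line's content wherever it now lives; identity is untouched."""
    for lines in files.values():
        for line in lines:
            if line.line_id == line_id:
                if old not in line.text:
                    raise ValueError("pre-condition failed: token not present")
                line.text = line.text.replace(old, new)
                return
    raise KeyError("pre-condition failed: line no longer exists")


files = {"a.c": [new_line("if (ready) go();"), new_line("done();")], "b.c": []}
target = files["a.c"][0].line_id

move_line(files, target, "a.c", "b.c", 0)         # the refactoring
replace_token(files, target, "ready", "!ready")   # the bug-fix, pulled later
```

The bug-fix is addressed to a line-id, not to a (file, line number) pair, so
it follows the line into b.c.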

> 
> That said, I think there would be great value to a VCS that is  
> context-aware (as has been discussed previously in this thread) and  
> perhaps degrades to line-oriented management if no context can be  
> determined.  Two brief examples: ...
> 

Whoa! I think this is trying to run before we can walk. For the time being, 
I'm going to stick with line-based changes, with effects equivalent to darcs 
hunks.

What you're getting into with the examples is more like editing the Abstract
Syntax Tree, with specific knowledge of the semantics in the source file's
language. Editing XML files would need something similar (per the Lempsink
work).

Perhaps (one day) we could imagine plug-ins to the VCS that observe changes in 
a syntax-directed way, according to the file's type(?)

Contrast the situation if our repo had the semantics of a (relational) 
database:
- patches would be just sets of tuples to delete and insert
  (in the various relations)
- each tuple is 'self-identifying' because it carries key(s),
  as specified by the data model
- conflicts are attempts to insert duplicate keys
- dependencies are attempts to insert a tuple with a referential constraint
  (aka foreign key, aka inclusion dependency)
  (or attempts to delete a tuple that breaks a reference)
- we don't need to describe the conflict/dependencies (pre-conditions)
  in the patch semantics or the VCS,
  because they're declared as constraints in the data model.
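
To make the contrast concrete, here is a toy version in Python. The names
(apply_patch, Conflict, Dependency, the "files" and "lines" relations) are all
hypothetical, and treating "delete a tuple that isn't there" as a dependency is
my own assumption rather than anything from the database literature.

```python
class Conflict(Exception):
    """Raised when a patch inserts a key that already exists."""


class Dependency(Exception):
    """Raised when a patch would leave a dangling reference."""


def apply_patch(db, fks, deletes, inserts):
    """Apply a patch to a copy of db and return the copy.

    db:      relation name -> {key: row dict}
    fks:     list of (child relation, column, parent relation)
    deletes: list of (relation, key)
    inserts: list of (relation, key, row dict)
    """
    new = {rel: dict(rows) for rel, rows in db.items()}
    for rel, key in deletes:
        if key not in new[rel]:
            # Assumption: a missing tuple means an earlier patch is missing.
            raise Dependency(f"{rel}[{key}] does not exist")
        del new[rel][key]
    for rel, key, row in inserts:
        if key in new[rel]:
            raise Conflict(f"duplicate key {key} in {rel}")
        new[rel][key] = row
    # Referential constraints come from the data model, not from the patch.
    for child, col, parent in fks:
        for key, row in new[child].items():
            if row[col] not in new[parent]:
                raise Dependency(f"{child}[{key}].{col} has no {parent} tuple")
    return new


db = {"files": {"f1": {"name": "Main.hs"}}, "lines": {}}
fks = [("lines", "file", "files")]
db = apply_patch(db, fks, [], [("lines", "l1", {"file": "f1", "text": "main"})])
```

Inserting "l1" again would raise Conflict; inserting a line for a file "f2"
that was never added, or deleting "f1" while "l1" still refers to it, would
raise Dependency. Notice the patch itself carries no pre-conditions at all.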

I see the L/S/L model as an attempt to turn text-based files into a relational 
model. The difficulties they're struggling with are:
- there's no equivalent to a 'key' in a line of text
- indeed, it's perfectly possible to have many lines with the same content
- a file can be renamed/moved
  (whereas renaming a relational table (relvar) causes all sorts of trouble)
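
A tiny Python illustration of the 'no key' problem (find_by_content is just a
made-up helper): looking a line up by its content can't tell you which line
was meant.

```python
def find_by_content(lines, text):
    """Return every position whose content matches; content is not a key."""
    return [i for i, line in enumerate(lines) if line == text]


source = ["{", "    return x;", "}", "{", "    return x;", "}"]
hits = find_by_content(source, "    return x;")
# hits has two entries, so the content alone does not identify one line.
```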

AntC