[darcs-users] darcs conflicts/dependencies -- is patch theory the place to start?

Wed Sep 19 06:25:58 UTC 2012

AntC <anthony_clayden <at> clear.net.nz> writes:

> ...

OK, here's a very speculative approach for how to deal with hunk changes in a 
context-independent way. (Compare my post that talks about the approach for 
file-id's and tracing the 'same' file through repos and moves 
http://lists.osuosl.org/pipermail/darcs-users/2012-September/026682.html  )

I'm aiming to support the same behaviour and validation as darcs hunk changes, 
not introduce any of the semantic-oriented wishfulness that's been discussed 
in this thread.

The VCS gives each text-line an 'internal' id that is:
- unique across all repos; and
- persistent wherever the line goes,
  or however its position in the file changes.

Since each line must have got into the repo only through a hunk change, the 
line-id is to be a pair (hunk-id, offset) where:
- hunk-id is the ppid of the hunk change, or a guid, or whatever suits
- offset is the relative position of the line in the text getting inserted

To spell out the consequences:
- if the line shuffles up/down the file because of inserts/deletes around it,
  that doesn't change the (hunk-id, offset)
- if some lines of the hunk later get deleted,
  that doesn't change the (hunk-id, offset) of the lines that remain.
  So there'll be a 'gap' in the offset sequence.
- if some lines get inserted in the middle of the hunk,
  that doesn't change the (hunk-id, offset) of the original lines.
  So there'll be an 'intruder' amongst the offset numbering.
- if some lines get moved around the file (or moved to a different file)
  that doesn't change the (hunk-id, offset) of the moved lines.
  So there'll be offset numbering out of sequence.
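
To make those consequences concrete, here is a minimal Python sketch. It is only an illustration: the names LineId and ids_for_hunk, the string hunk-ids "A" and "B", and the choice to number offsets from 0 are all my assumptions, not anything darcs does. A file is modelled as nothing more than an ordered list of line-ids.

```python
from typing import NamedTuple

class LineId(NamedTuple):
    hunk_id: str   # hypothetical: any repo-unique id of the inserting hunk
    offset: int    # position of the line within the inserted text

def ids_for_hunk(hunk_id, n_lines):
    """Line-ids for the n_lines inserted by one hunk change."""
    return [LineId(hunk_id, i) for i in range(n_lines)]

# Hunk "A" inserts four lines into an empty file.
file_lines = ids_for_hunk("A", 4)                 # A0 A1 A2 A3

# Hunk "B" deletes A1 and inserts two fresh lines in its place.
pos = file_lines.index(LineId("A", 1))
file_lines[pos:pos + 1] = ids_for_hunk("B", 2)    # A0 B0 B1 A2 A3
# A now has a 'gap' (no A1) and two 'intruders' (B0, B1);
# the surviving A lines keep their original ids.

# A move-lines operation takes A3 to the top of the file.
moved = file_lines.pop(file_lines.index(LineId("A", 3)))
file_lines.insert(0, moved)                       # A3 A0 B0 B1 A2
# The A offsets are now out of sequence, but still unchanged.
```

The point of the sketch is that none of the three operations ever rewrites an existing LineId; only the ordering of the list changes.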

This implies there's a move-lines operation, serving a similar purpose to darcs' 
move-file operation. I think it also needs a copy-lines operation.

We need the VCS to maintain two 'internal' maps:
- (file-id, line-num) -> (hunk-id, offset)
- (hunk-id, offset) -> content
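
As a rough sketch, the two maps could be a pair of Python dictionaries. The names position_map, content_map and render, and the sample ids "f1", "A" and "B", are hypothetical and only there for illustration.

```python
# (file-id, line-num) -> (hunk-id, offset)
position_map = {
    ("f1", 1): ("A", 0),
    ("f1", 2): ("A", 1),
    ("f1", 3): ("B", 0),
}

# (hunk-id, offset) -> content
content_map = {
    ("A", 0): "first line",
    ("A", 1): "second line",
    ("B", 0): "third line",
}

def render(file_id):
    """Reassemble a file's text by following both maps in line-num order."""
    keys = sorted(k for k in position_map if k[0] == file_id)
    return [content_map[position_map[k]] for k in keys]
```

Reading a file is then a two-step lookup: line-num to line-id, then line-id to content.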

Notes:
(Yes, I know the file-to-line structure is horribly inefficient, and fails to 
capture that lines are in contiguous blocks, but for now I'm just trying to 
get the idea across.)

(The line-nums don't have to be integral, nor do they have to be sequential. 
Their only purpose is relative ordering of the lines. I've seen one cunning 
scheme where the line-nums were rationals, so that for line insertions, the 
line numbering interval could be arbitrarily sub-divided.)
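
(A small sketch of the rational-numbering idea, using Python's standard fractions module. Taking the midpoint is just one possible way to sub-divide the interval, and the helper name between is my own.)

```python
from fractions import Fraction

def between(lo, hi):
    """A line-num strictly between two existing ones (here: the midpoint)."""
    return (lo + hi) / 2

first, second = Fraction(1), Fraction(2)

# Insert a line between line 1 and line 2 without renumbering either.
new = between(first, second)      # 3/2

# Insert another line between line 1 and the line just added.
newer = between(first, new)       # 5/4

ordering = sorted([second, new, first, newer])
```

Because Fraction arithmetic is exact, the interval can be sub-divided as often as needed and the relative ordering stays well defined.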

(Replace token changes the content, but retains the (hunk-id, offset) and line 
position. I'm assuming that replace token can't merge or split lines.)

(A given (hunk-id, offset) could appear in more than one file, or even in more 
than one position within the same file. That would be the result of a 
copy-lines.)

Then a hunk-change patch is represented as a triple of:
- (hunk-id, offset) position for the operation
- a list of (hunk-id, offset, content) to delete, and
- a list of (hunk-id, offset, content) to insert.
(Where the to-insert hunk-id is this patch's ppid.)
(Either of those lists could be empty, meaning this is a delete-only or 
insert-only operation.)

Pre-condition for applying a hunk patch:
- all of the to-delete (hunk-id, offset)'s must exist in the target repo;
- and their content must match the to-delete content;
- and they must appear contiguously and in the same relative sequence,
  within some file
  (not necessarily the file they came from in the source of the patch)
- none of the to-insert (hunk-id, offset)'s may exist in the target repo.
  (that would be a duplicate patch)
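
Those four checks can be sketched in Python as below. This is only an illustration under my own assumptions: the repo is modelled as files (file-id -> ordered list of line-ids) plus content (line-id -> text), "exists in the target repo" is taken to mean "is a key of content", and the function name can_apply is hypothetical.

```python
def can_apply(files, content, deletes, inserts):
    """deletes and inserts are lists of (hunk_id, offset, text) triples."""
    del_ids = [(h, o) for h, o, _ in deletes]

    # Every to-delete line must exist, with matching content.
    for h, o, text in deletes:
        if content.get((h, o)) != text:
            return False

    # The to-delete lines must be contiguous and in the same relative
    # order within some file (any file, not necessarily the original).
    if del_ids:
        n = len(del_ids)
        found = any(
            lines[i:i + n] == del_ids
            for lines in files.values()
            for i in range(len(lines) - n + 1)
        )
        if not found:
            return False

    # No to-insert line-id may already exist (duplicate patch).
    return all((h, o) not in content for h, o, _ in inserts)
```

Note that the check never looks at line numbers or file names, only at line-ids, which is what makes it context-independent.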

Applying the patch is just like hunk change:
- delete the to-delete lines
- insert in their place the to-insert lines
- those might be different numbers of lines,
  so renumber the (file-id, line-num)s throughout.
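
A minimal sketch of the apply step, in the same spirit. The assumptions are mine: each file is an ordered Python list of line-ids, so the renumbering is implicit in the list positions; for a delete-and-insert the operation happens where the first to-delete line sits; for an insert-only patch the new lines go immediately before the position line; and the pre-conditions are taken as already checked. The name apply_hunk is hypothetical.

```python
def apply_hunk(files, content, position, deletes, inserts):
    """files: file-id -> ordered list of line-ids; content: line-id -> text.
    deletes and inserts are lists of (hunk_id, offset, text) triples."""
    del_ids = [(h, o) for h, o, _ in deletes]
    ins_ids = [(h, o) for h, o, _ in inserts]
    anchor = del_ids[0] if del_ids else position

    for lines in files.values():
        if anchor in lines:
            i = lines.index(anchor)
            if lines[i:i + len(del_ids)] == del_ids:
                # Delete the to-delete lines, insert the new ones in place.
                lines[i:i + len(del_ids)] = ins_ids
                break
    else:
        raise ValueError("pre-condition failed: lines not found")

    for lid in del_ids:
        del content[lid]
    for h, o, text in inserts:
        content[(h, o)] = text
```

With explicit (file-id, line-num) keys instead of a list, the slice assignment would become a renumbering pass (or a rational sub-division) over the affected file.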

One wrinkle: as described so far, there's no way to insert fresh lines 'after' 
existing lines.
- Especially, we can't insert fresh lines into an empty file,
- or into the end of a file.
But notice in this case that darcs' insertion point is last-line-num + 1. 
(That is, line number 1 for an empty file. This is a 'ghost': there is no line 
number 1.)

I'll use the same trick:
- every hunk map has an extra 'invisible' after-last line
  (where its content is <endofhunk> or somesuch)
- if a hunk operation deletes all lines from a hunk,
  that after-last line remains
- and every file map has an extra 'invisible' after-last line
  that points to that remnant.
- so a newly-added file automatically gets an (invisible) hunk;
  we might as well give it a hunk-id the same as the file-id.
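
Here is a rough Python sketch of the 'invisible after-last line' trick. It is simplified: it models only the per-file end marker, not the per-hunk ones, and the names add_file, insert_before and visible, plus the choice of offset 0 for the marker, are my assumptions.

```python
END = "<endofhunk>"   # content of the invisible after-last line

def add_file(files, content, file_id):
    """A new file holds just its end marker, whose hunk-id is the file-id."""
    sentinel = (file_id, 0)
    content[sentinel] = END
    files[file_id] = [sentinel]
    return sentinel

def insert_before(files, content, file_id, position, new_lines):
    """Insert (hunk_id, offset, text) triples just before the position line."""
    lines = files[file_id]
    i = lines.index(position)
    lines[i:i] = [(h, o) for h, o, _ in new_lines]
    for h, o, text in new_lines:
        content[(h, o)] = text

def visible(files, content, file_id):
    """The file's text as the user sees it, with end markers hidden."""
    return [content[lid] for lid in files[file_id] if content[lid] != END]
```

Inserting into an empty file and appending at the end of a file both become 'insert before the end marker', so no special case is needed.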

AntC