[darcs-users] patch file naming

Thu Mar 18 12:43:42 UTC 2004

On Wed, Mar 17, 2004 at 08:09:27AM -0800, Kevin Smith wrote:
> It *might* be possible to hash a canonical representation of the 
> conceptual patch that is implemented by the file. But canonical 
> representations have not yet been determined, and I haven't thought 
> about it long enough to even know if there could even be such a thing as 
> a canonical patch format in darcs.

I tried for about six months to find a canonical patch format, and failed,
which doesn't mean it's impossible, but does mean that I'm not going to do
it.  :) Basically, if we had a canonical patch format, darcs could be much
nicer and smarter when dealing with merges (which is why I was working on
it), but the problem is that there is almost nothing that is canonical
about a patch.  The filename could change, the patch contents could change,
almost anything could change.  I suppose it might be possible to create an
invariant content hash, which wouldn't include the filename or the contents
of lines modified... but that would seem to be too closely coupled with
what *other* patch types exist, and would thus prevent future extension of
darcs to support new patch types (or would overly restrict the commutation
of said new patch types).

> Your problem case does concern me a bit, but doesn't put me into a 
> panic. darcs should probably refuse to record two patches within the 
> same timestamp frame (1 second, I believe). That still wouldn't prevent 
> all problems, but probably eliminate any accidental cases.

Really, darcs should detect on record whether you're recording a duplicate
name (i.e. with the same date as well) and, if so, wait a second before
doing the record.  The problem with this is that it requires reading the
entire repo inventory on each record, which seems likely to cause quite a
performance penalty for large repositories.  I guess I could add a flag to
skip this check, but I've only seen this as a problem when someone is
creating an artificial test case.
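
A minimal sketch of that duplicate check, in Python rather than anything
resembling the darcs source.  It assumes a patch is identified by its name,
author and date at one-second resolution; patch_id and pick_record_date are
hypothetical names, and bumping the date by a second stands in for actually
waiting a second.

```python
from datetime import datetime, timedelta


def patch_id(name, author, date):
    # Hypothetical identity: name, author, and date truncated to whole seconds.
    return (name, author, date.replace(microsecond=0))


def pick_record_date(inventory, name, author, date):
    """Return a date whose patch identity is not already in the inventory."""
    # Building this set is the expensive step: it reads every entry in the
    # inventory each time a patch is recorded.
    seen = set(inventory)
    date = date.replace(microsecond=0)
    while patch_id(name, author, date) in seen:
        # Stand-in for "wait a second, then record".
        date += timedelta(seconds=1)
    return date
```

Recording a second patch with the same name, author and second would come
back with a date one second later; anything else keeps its original date.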

I guess this performance hit is probably only going to be a problem for
something like an automated conversion of a CVS repository.  On the other
hand, since my computer just spent nine days converting a (large) CVS
repository, I'm hesitant to slow that down by a factor of ten or
so... which is what I'd guess would be the hit for my particular test
case.  Obviously the larger the repository, the larger the performance
penalty for this check.
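
To see why the penalty grows with repository size, here is a rough count of
inventory entries read during a bulk conversion.  It assumes the check is a
plain linear scan of everything recorded so far, which is a simplification;
entries_read is a hypothetical helper, not a darcs function, and it says
nothing about actual wall-clock time.

```python
def entries_read(n_patches, check_duplicates):
    """Count inventory entries read while recording n_patches in sequence."""
    if not check_duplicates:
        return 0
    # The k-th record (counting from 0) scans the k patches already recorded,
    # so the total is 0 + 1 + ... + (n_patches - 1).
    return sum(range(n_patches))
```

Under that assumption the total grows quadratically: ten times as many
patches means roughly a hundred times as many entries read.
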
-- 
David Roundy
http://www.abridgegame.org