[darcs-users] [planet darcs] Automatic detection of file renames for Darcs

José Neder jlneder at gmail.com
Thu Aug 22 19:41:38 UTC 2013


It is pretty much impossible to mis-guess a move unless you really try,
because that would require two different files with the same inode, and
the only way to get that is to keep the same file, change its content,
and use it as if it were another file. I mean, the probability that a
new file gets the same inode number is low (once you have deleted the
old file, of course), and the probability that a file with the same
inode ends up in the same repo is even lower, much lower.
I think the inode number used by a new file is always an increment
from the inodes used before, so unless you run through all the numbers
in a 64-bit range, you will not see the same number twice. Maybe I'm
wrong.
If you restore a file from a backup that is not tracked by the darcs
repository, by renaming the backup to the filename in question, you
won't see any change, just as without "--look-for-moves", because the
flag detects moves and not adds and/or deletes. If you also change the
name, you will see a remove, like you would see without
"--look-for-moves".
For example:
~> touch foo
~> darcs record -al -m "foo added"
Finished recording patch 'foo added'
~> cp foo foo.bak
~> mv foo.bak foo
~> darcs wh --look-for-moves
No changes!                                  [1]
~> mv foo foo2
~> darcs wh --look-for-moves
rmfile ./foo                                     [2]

[1] There are no moves here, so there are no changes, even though foo
has another inode number: --look-for-moves only reports moves, not
removes or adds.
[2] darcs lost track of the file because the backup doesn't have the
same inode number.
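
A minimal sketch of why the backup in this example ends up with a
different inode while a plain rename keeps it. It assumes a POSIX-style
filesystem and uses only the Python standard library; inode_demo is a
hypothetical helper name, not part of darcs.

```python
import os
import shutil
import tempfile


def inode_demo():
    """Return (original, copy, renamed) inode numbers for a scratch file."""
    with tempfile.TemporaryDirectory() as tmp:
        foo = os.path.join(tmp, "foo")
        bak = os.path.join(tmp, "foo.bak")
        foo2 = os.path.join(tmp, "foo2")

        open(foo, "w").close()          # touch foo
        original = os.stat(foo).st_ino

        shutil.copy(foo, bak)           # cp foo foo.bak (a second, new file)
        copy = os.stat(bak).st_ino

        os.rename(foo, foo2)            # mv foo foo2 (same file, new name)
        renamed = os.stat(foo2).st_ino

        return original, copy, renamed


if __name__ == "__main__":
    original, copy, renamed = inode_demo()
    print("copy has a new inode:", copy != original)
    print("rename keeps the inode:", renamed == original)
```

The copy exists at the same time as the original, so the two cannot
share an inode number; the rename only changes the directory entry.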

But if you do this:
~> touch foo
~> darcs record -al -m "foo added"
Finished recording patch 'foo added'
~> cp foo foo.bak
~> mv foo foo.bak2
~> mv foo.bak foo
~> darcs wh --look-for-moves
move ./foo ./foo.bak2                      [1]

[1] darcs doesn't lose track of the file, because it is still in the
repo, so it reports a move even when you may not want one.
In this case you should turn off the --look-for-moves flag to get the
desired behavior.
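
The two examples above can be reproduced with a small sketch of an
inode-matching heuristic. This is only an illustration under stated
assumptions, not the actual darcs implementation: guess_changes is a
hypothetical function, the inode numbers are made up, and the "index"
is modelled as a plain dict from path to inode.

```python
def guess_changes(recorded, working):
    """Guess moves and removes by matching inode numbers.

    recorded: dict path -> inode, as remembered at the last record.
    working:  dict path -> inode, as found in the working tree now.
    Returns a list of ("move", old, new) and ("rmfile", old) tuples.
    Assumption: no hard links, so each inode maps to a single path.
    """
    by_inode = {ino: path for path, ino in working.items()}
    changes = []
    for path, ino in sorted(recorded.items()):
        if working.get(path) == ino:
            continue                      # same path, same inode: untouched
        new_path = by_inode.get(ino)
        if new_path is not None:
            changes.append(("move", path, new_path))
        elif path not in working:
            changes.append(("rmfile", path))
        # Otherwise the path still exists with another inode (restored
        # from a backup, for instance); that is not reported as a move.
    return changes


if __name__ == "__main__":
    index = {"foo": 1}
    # First example: backup renamed over foo, then foo renamed to foo2.
    print(guess_changes(index, {"foo": 2}))
    print(guess_changes(index, {"foo2": 2}))
    # Second example: original renamed to foo.bak2, backup put in place.
    print(guess_changes(index, {"foo.bak2": 1, "foo": 2}))
```

In the second example the old inode is still present under another
name, which is why a move is reported even though a file called foo
exists again.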

It should not lead to "data corruption" unless you do something like
that, in which case I guess it was on purpose, so it is not data
corruption.

I think a user can and should blindly accept the moves if she/he
knows about that limitation.

Boring files are filtered by default, so no problem there.

It would be nice to have an algorithm to detect similar files/hunks,
but how much similarity is enough? This cannot be generalized; the
only way is trial and error with a lot of examples, and even then it
would be nice to be able to give the similarity threshold as a
percentage option. It is like using the fill tool in GIMP to paint a
zone with a color: without the option to select the threshold, the
tool would be pretty useless in a lot of cases.
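
To make the threshold idea concrete, here is a rough sketch of a
content-similarity check with a user-supplied threshold. It uses
Python's difflib only as a stand-in for whatever measure darcs might
adopt; similarity and looks_like_same_file are hypothetical names, and
nothing here is an existing darcs option.

```python
from difflib import SequenceMatcher


def similarity(old_text, new_text):
    """Line-based similarity between two file contents, from 0.0 to 1.0."""
    matcher = SequenceMatcher(None, old_text.splitlines(), new_text.splitlines())
    return matcher.ratio()


def looks_like_same_file(old_text, new_text, threshold):
    """True when the contents are at least `threshold` similar."""
    return similarity(old_text, new_text) >= threshold


if __name__ == "__main__":
    old = "a\nb\nc\nd\n"
    new = "a\nb\nc\nx\n"
    # The same pair of files is accepted or rejected depending only on
    # the threshold the user chooses.
    for threshold in (0.5, 0.9):
        print(threshold, looks_like_same_file(old, new, threshold))
```

The same pair of files passes with a loose threshold and fails with a
strict one, which is the reason the value would have to be an option
rather than a fixed constant.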

2013/8/22 Ganesh Sittampalam <ganesh at earth.li>:
> Hi,
>
> Just to clarify: as I understand the rename detection with inodes is
> just being used as a heuristic. It will be used to suggest to users that
> they might want to record a change as a rename, but it won't directly
> lead to any kind of data corruption if darcs mis-guesses the situation.
>
> Is that correct?
>
> If the user blindly accepts the offered renames, or uses record -a, then
> there still could be a problem as a dodgy patch could get recorded and
> cause trouble further down the line.
>
> One thing to watch out for is that renames to boring files are handled
> correctly (i.e. rejected/ignored at least by default).
>
> Another thought is that even though inodes are a really nice way to
> capture user actions, it would be good to also have an alternative
> approach such as detecting similar content, for cases the inode tracking
> doesn't pick up. Detecting similar content would also be useful in
> future for inferring hunk move patches once we support those.
>
> Cheers,
>
> Ganesh
>
>
> On 22/08/2013 17:54, José Neder wrote:
>> I guess is a little delicate. Maybe there is some editor that write a
>> backup and then rename, in my experience i haven't had any problem
>> with gvim, and scratch(elementary editor). Obviously restoring from a
>> backup would confuse the algorithm and if you rename the file at the
>> same time it would lose track of it. If you check out an old version
>> there is not problem, at checkout the inodes in the index are updated.
>> I am interested in the bzr implementation, if you could throw me a
>> link or more info about it. I made a quick search but i didn't found
>> anything about it.
>>
>> 2013/8/22 Stephen J. Turnbull <stephen at xemacs.org>:
>>> AntC writes:
>>>
>>>  > [inode, etc] seems an easy way to keep track of files. Do other
>>>  > VCS's use it? Is there some reason darcs hasn't used it before?
>>>
>>> Sounds very delicate to me.  Points that would worry me: The ambiguity
>>> of multiple links to the same file.  Breaking of hard links on edit by
>>> some editors and not others.  Resurrection of deleted files (whether
>>> by checking out an old version or restoring from backup).
>>>
>>> bzr (bazaar.canonical.com) does track "containers" (ie, it could know
>>> which on-disk data is the original and which the copy, and it
>>> distinguishes between remove from repo + readd under a new name from
>>> an atomic rename), but it doesn't track inodes.  It has its own
>>> database and identifiers for this.
>>>
>>> _______________________________________________
>>> darcs-users mailing list
>>> darcs-users at darcs.net
>>> http://lists.osuosl.org/mailman/listinfo/darcs-users
>>
>

