[darcs-users] switching all references by patches to be by hash only
Max Battcher
me at worldmaker.net
Mon Aug 24 06:35:47 UTC 2009
On 8/24/2009 1:50, Nathaniel W Filardo wrote:
>> The problem with using just the file hash is that the object you want
>> may just not be on the repository you're reading from. You may be able
>> to reconstruct it through commutation, but then you'll need the regular
>> patchinfo stuff to do that.
>
> First off, it's probably rude of me to send you a context file and a pointer
> into my repository for which I am unable to provide the backing store.
> Equivalently, I should ensure that every context file I touch either has its
> dependencies satisfied locally or that I emit a modified version which does.
Possible tangent here, but "emit a modified version which does" isn't
always a trivial operation. It may be worth discussing nice
commands/extensions to simplify things like "create a context against
this branch and include the full details of patches that don't exist in
this mainline branch"...
> Secondly, there's no check currently that two patches which are decorated
> with the same patch info are actually different commuted forms of each
> other. This is Zooko's point about the failure of security in darcs -- it's
> possible to provide you a bogus patch that looks right. Switching to hashes
> as identifiers closes the bogus-return hole at the expense of requiring
> either more time (to generate a locally relevant context file) or space (to
> store all the different commuted forms of a patch).
>
> As an alternative proposal, since commutation is going to alter the content
> of a patch -- but merely its place, rather than real "content" -- we could
> instead give each file a unique identifier (other than its name; cf.
> arch/tla) and discard positional data of hunks when hashing the patch. Such
> an identifier should be unmolested by commutation and may therefore stably
> name a patch; at least one set of location data would have to be provided in
> order to commute such a thing, but that's probably OK. This feels similar
> to http://web.mornfall.net/blog/patch_formats.html .
Eric's wiki page on hashes is actually useful here: darcs already has a
patch hash that is invariant under commutation (the one
output by ``darcs changes --xml-output``). I'm wondering if a possible
approach to a "hash-only" context format might use a two-hash system:
the patch hash stands in for the "patchinfo" stuff, and
a patch contents hash can be used to look
up whether that specific commutation of the patch is already available in
caches and other available sources. (This isn't terribly far from the
modern inventory format, except that the inventory still carries the
full patchinfo...)
With something like Base-64 encoding you could end up with incredibly
pithy two-hash context files...
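To make the two-hash idea a bit more concrete, here is a rough Python
sketch. Everything in it is an assumption for illustration only: the
names (``ContextEntry``, ``make_entry``, ``encode_line``, ``lookup``),
the use of SHA-1 over raw bytes, and the one-line-per-patch layout are
all hypothetical and are not darcs's actual context or inventory format.

```python
import base64
import hashlib
from typing import Dict, NamedTuple, Optional, Tuple


class ContextEntry(NamedTuple):
    info_hash: bytes     # stable under commutation (stands in for patchinfo)
    content_hash: bytes  # names one specific commuted form of the patch


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def make_entry(patchinfo: bytes, commuted_body: bytes) -> ContextEntry:
    # Hypothetical: real darcs hashes structured patchinfo, not raw bytes.
    return ContextEntry(sha1(patchinfo), sha1(commuted_body))


def _b64(raw: bytes) -> str:
    # Strip '=' padding; a 20-byte SHA-1 digest becomes 27 characters.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def _unb64(text: str) -> bytes:
    # Restore the padding that _b64 stripped before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_line(entry: ContextEntry) -> str:
    # One context line per patch: "<info hash> <content hash>".
    return _b64(entry.info_hash) + " " + _b64(entry.content_hash)


def decode_line(line: str) -> ContextEntry:
    info, content = line.split(" ")
    return ContextEntry(_unb64(info), _unb64(content))


def lookup(
    entry: ContextEntry,
    by_content: Dict[bytes, bytes],
    by_info: Dict[bytes, bytes],
) -> Tuple[Optional[bytes], bool]:
    """Return (patch body, needs_commute).

    Prefer the exact commuted form; otherwise fall back to any form of
    the same patch, which the caller would still have to commute.
    """
    if entry.content_hash in by_content:
        return by_content[entry.content_hash], False
    if entry.info_hash in by_info:
        return by_info[entry.info_hash], True
    return None, False
```

Under those assumptions a context line is 55 characters per patch, and
a reader only has to fall back to commutation when the exact contents
hash is missing from every cache it can reach.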
It is still not "secure" in any of the ways that Zooko has brought up,
although faking two hashes that meaningfully work together with
all the other patches in a series/repo/context may take a lot of work or
a lot of luck to accomplish...
I'm just throwing that out there, I haven't thought very far through
this line of logic just yet, but it does seem like an "obvious" approach
to the problem, for the moment at least. (Still forgetting all the fun
problems like context file versioning and backwards compatibility for
the time being.)
--
--Max Battcher--
http://worldmaker.net