[darcs-users] switching all references by patches to be by hash only

Max Battcher me at worldmaker.net
Mon Aug 24 06:35:47 UTC 2009


On 8/24/2009 1:50, Nathaniel W Filardo wrote:
>> The problem with using just the file hash is that the object you want
>> may just not be on the repository you're reading from.  You may be able
>> to reconstruct it through commutation, but then you'll need the regular
>> patchinfo stuff to do that.
>
> First off, it's probably rude of me to send you a context file and a pointer
> into my repository for which I am unable to provide the backing store.
> Equivalently, I should ensure that every context file I touch either has its
> dependencies satisfied locally or that I emit a modified version which does.

A possible tangent here: "emit a modified version which does" isn't 
always a trivial operation. It may be worth discussing commands or 
extensions that simplify things like "create a context against this 
branch and include the full details of patches that don't exist in 
this mainline branch"...

> Secondly, there's no check currently that two patches which are decorated
> with the same patch info are actually different commuted forms of each
> other.  This is Zooko's point about the failure of security in darcs -- it's
> possible to provide you a bogus patch that looks right.  Switching to hashes
> as identifiers closes the bogus-return hole at the expense of requiring
> either more time (to generate a locally relevant context file) or space (to
> store all the different commuted forms of a patch).
>
> As an alternative proposal, since commutation is going to alter the content
> of a patch -- but merely its place, rather than real "content" -- we could
> instead give each file a unique identifier (other than its name; cf.
> arch/tla) and discard positional data of hunks when hashing the patch.  Such
> an identifier should be unmolested by commutation and may therefore stably
> name a patch; at least one set of location data would have to be provided in
> order to commute such a thing, but that's probably OK.  This feels similar
> to http://web.mornfall.net/blog/patch_formats.html .

Eric's wiki page on hashes is actually useful here: darcs already has a 
patch hash that is invariant under commutation (the one output by 
``darcs changes --xml-output``). I'm wondering if a possible approach to 
a "hash-only" context format might be a two-hash system: the patch hash 
stands in for the "patchinfo" stuff, and a patch contents hash can be 
used to look up whether that specific commutation of the patch is 
already available in caches and other sources. (This isn't terribly far 
from the modern inventory format, albeit that format still carries the 
full patchinfo...)
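
To make the two-hash idea a bit more concrete, here is a rough sketch in 
Python (not darcs code, and not how darcs computes its hashes today). 
The names ContextEntry, make_entry and lookup are made up for 
illustration, and SHA-1 over raw bytes is only an assumption standing in 
for whatever hash functions a real format would pick:

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest; an assumed stand-in for the real hash functions."""
    return hashlib.sha1(data).hexdigest()


@dataclass(frozen=True)
class ContextEntry:
    """Hypothetical two-hash context entry."""
    patch_hash: str     # hash of the patchinfo; same for every commuted form
    contents_hash: str  # hash of one specific commuted form of the patch


def make_entry(patchinfo: bytes, contents: bytes) -> ContextEntry:
    return ContextEntry(sha1_hex(patchinfo), sha1_hex(contents))


def lookup(
    entry: ContextEntry,
    cache_by_contents: Dict[str, bytes],
    repo_by_patch_hash: Dict[str, bytes],
) -> Tuple[str, Optional[bytes]]:
    """Try the exact commuted form first, then any form of the same patch."""
    exact = cache_by_contents.get(entry.contents_hash)
    # Only trust a cache hit if the bytes really hash to the requested value.
    if exact is not None and sha1_hex(exact) == entry.contents_hash:
        return "exact", exact
    other = repo_by_patch_hash.get(entry.patch_hash)
    if other is not None:
        # Same patch identity, different position: commutation still needed.
        return "needs-commute", other
    return "missing", None
```

The point of the split is that the second lookup can still find the 
patch when only a differently commuted form is around, while the first 
lookup avoids any commutation work when the exact form is cached.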

With something like Base-64 encoding you could end up with incredibly 
pithy two-hash context files...
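
For a sense of scale, a small Python sketch of one such context line, 
assuming two 20-byte SHA-1 digests per patch and unpadded URL-safe 
Base-64 (both are my assumptions here, and context_line and 
parse_context_line are hypothetical names, not anything darcs has):

```python
import base64
import hashlib
from typing import Tuple

DIGEST_LEN = 20  # SHA-1 digest size in bytes (assumed hash function)


def context_line(patchinfo: bytes, contents: bytes) -> str:
    """Encode the (patch hash, contents hash) pair as one Base-64 token."""
    pair = hashlib.sha1(patchinfo).digest() + hashlib.sha1(contents).digest()
    # Drop the trailing '=' padding; it can be restored when parsing.
    return base64.urlsafe_b64encode(pair).decode("ascii").rstrip("=")


def parse_context_line(line: str) -> Tuple[bytes, bytes]:
    """Recover the two raw digests from a line made by context_line."""
    padded = line + "=" * (-len(line) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) != 2 * DIGEST_LEN:
        raise ValueError("expected two SHA-1 digests")
    return raw[:DIGEST_LEN], raw[DIGEST_LEN:]
```

Under those assumptions each patch costs 54 characters in the context 
file, against 80 for the same two digests written in hex.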

It is still not "secure" in any of the ways that Zooko has brought up. 
That said, faking two hashes that work together meaningfully with all 
the other patches in a series/repo/context would take a lot of work or 
a lot of luck to accomplish...

I'm just throwing that out there. I haven't thought very far through 
this line of logic yet, but it does seem like an "obvious" approach to 
the problem, for the moment at least. (I'm also ignoring, for the time 
being, all the fun problems like context file versioning and backwards 
compatibility.)

--
--Max Battcher--
http://worldmaker.net
