[darcs-users] DRAFT Proposal: Navigating the space of versions using Tree hashes
kowey at darcs.net
Thu Sep 9 07:06:07 UTC 2010
On Wed, Sep 08, 2010 at 20:26:50 -0700, Jason Dagit wrote:
> That should be pretty clear if you read my proposal or glance at the
> older proposals. My proposal does not use the context to generate the
> hash but instead the pristine state. Everything else is an
> implication of that.
Doh! Lead-by-example fail. Apologies for failing to do my prior
discussion research in asking you for prior discussion research and
thank you for making the key difference/implications explicit!
Indeed I can now see how hashing the pristine state as opposed to the
inventory has the important implication of being independent of patch
I'll return to the meta-commentary at the end of my email, wherein I
make a plea for signposting to help the triage-y quantity-over-quality
bozos of the world.
> I'm pretty sure that prior to using hashed storage darcs never stored
> a hash of any states. David would have treated a hash of a tag to be
> equivalent in meaning.
Can that be right? I asked Petr if the pristine hash entry in
_darcs/hashed_inventory has been around since darcs 2.0.0, way before
the hashed storage days, and he said it was.
This is partly where I was led astray because for some reason I thought
David's proposal was using this fact even though re-reading it, it's
obvious he's just talking about the fact that we point to tag
inventories with their filecontents hash (which would be "secure").
> I'm assuming you didn't read the IRC discussion either. We came to
> the conclusion that it's actually really easy to get identical
> pristine states from different *sets* of patches. Meaning, not just
> different permutations of the inventory. This essentially means it's
> meaningless to talk about identical pristine state
Let's see if we can be a bit more precise than meaningless. I tried
to summarise the proposals so far (as I understand them) on
Hopefully the rest of you can correct any of my blunders.
The list of desiderata from it were:
- identify pristine state (id guarantees we have same pristine)
- identify patch set (guarantee we have same patch set)
- accepts patch reordering
- rejects false patch contents (patch info for A but something else in contents)
- rejects false ordering even of true patch contents (clever and malicious)
Dunno if that's wanting the right thing, and if I'm reading you correctly,
you're saying that the "identify patch set" wish is actually maybe not so
important in practice?
> It was generally agreed that hashing the inventory leads to the right
> solution. It's hard to say what order the inventory will be in, so as
> a relaxation of hashing the inventory, it was suggested that we could
> sort patchinfos and then hash the inventory. Hashing the entire
> inventory might be expensive. The proposed actual implementation
> schemes use optimizations such as only hashing the inventory after the
> most recent tag. Personally, I'm not convinced that those
> optimizations are guaranteed to work. Perhaps incremental hashes of
> the inventory could be used instead. Also, initially I thought that
> sorting the patch infos before hashing was problematic, but I'm less
> concerned about that now. I was thinking it was an aggressive
> over-approximation of the property that your repo is defined by the set of
> patches. Now I think it may be fine.
I have to run, so I'm afraid I have to skip thinking about this paragraph,
sorry again! Is this something we could add to the page above maybe?
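To make the distinction between the candidate identifiers concrete, here
is a rough sketch in Python. It is not how darcs stores or hashes
anything: the function names, the use of SHA-256, and the plain-string
"patch infos" are all assumptions made for illustration. It only shows
the three behaviours under discussion: a pristine hash that depends on
file contents alone, an inventory hash that is sensitive to patch order,
and a sorted patch-info hash that tolerates reordering.

```python
import hashlib


def pristine_hash(files):
    """Hash a pristine tree given as {path: bytes}.

    Hypothetical scheme: depends only on paths and contents,
    not on which patches produced them.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(hashlib.sha256(path.encode("utf-8")).digest())
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


def inventory_hash(patch_infos):
    """Hash patch infos in the order given (order-sensitive)."""
    h = hashlib.sha256()
    for info in patch_infos:
        h.update(hashlib.sha256(info.encode("utf-8")).digest())
    return h.hexdigest()


def patch_set_hash(patch_infos):
    """Hash patch infos after sorting, so reordering does not matter."""
    return inventory_hash(sorted(patch_infos))


# Two made-up repositories that reach the same pristine state
# from different sets of patches.
repo_x_patches = ["add a: hello"]
repo_y_patches = ["add a: hi", "replace hi with hello in a"]
repo_x_pristine = {"a": b"hello\n"}
repo_y_pristine = {"a": b"hello\n"}

same_pristine = pristine_hash(repo_x_pristine) == pristine_hash(repo_y_pristine)
same_patch_set = patch_set_hash(repo_x_patches) == patch_set_hash(repo_y_patches)

# The same patches in a different order.
reordered = list(reversed(repo_y_patches))
order_matters = inventory_hash(repo_y_patches) != inventory_hash(reordered)
order_ignored = patch_set_hash(repo_y_patches) == patch_set_hash(reordered)
```

In this toy setup the two repositories share a pristine hash but not a
patch-set hash, which is the situation from the IRC discussion, and the
sorted variant gives the same answer for both orderings while the plain
inventory hash does not. None of this touches the "false patch contents"
wish, since only the patch info strings are hashed here.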
Why Signposting is Helpful
[wrote this bit first]
Folks who may have seen me interacting with darcs-users may recall me
failing rather spectacularly at reading from time to time. So what can
we do about this sort of Erician bozotude? Well, Eric could be less of
a bozo, which would be good, and which I do keep promising.
A complementary/parallel solution, for coping with the bozos of the world,
is to adopt a technique of signposting, for example:
BAD: My proposal captures repository state using the pristine hash
GOOD?: Current proposals talk about hashing inventory contents to
capture state, but mine is different because it instead uses the
pristine hash to capture state.
Or as I've similarly requested to Petr on IRC:
- Single "start here" link to the hashed-storage
doc pointing out which module in the haddock to read first
- How-this-compares with cmdargs bullet points
to the cmdlib documentation
- Similar how-this-compares with pathtype for pathlib
The reason I try to make signposting an integral part of my writing if
I can remember to do it is that
A. some people (well me) adopt a two pass triage-based approach to
coping with information overload and
B. somewhat orthogonally, some people (well, me) have a bad habit of
systematically skimming/scanning when they should be reading.
The benefit of an A strategy plus B vice is that they allow me to churn
through a lot of messages, i.e. to read and tag all my email. The
downside is that many times, I may have read all my email, but read it
So compounding my natural dimness, A+B also mean I make a lot of
mistakes and miss things which should be obvious!
People aren't always going to connect dots even if you shove them in
their face, partly for want of connection skillz and partly due to
fuzziness of dot vision.
We readers should do a better job with pre-existing documentation, but
also for maximum effectiveness, we writers should strive to master the
fine art of making what is obvious in our heads (these connections)
obvious on paper. Damned near impossible, if you ask me, but we gotta
PS: reading the IRC logs makes me think that Em
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, try +44 (0)1273 64 2905 or
xmpp:kowey at jabber.fr (Jabber or Google Talk only)