[darcs-devel] [patch1823] v3: basic implementation and tests

Sun Jul 7 11:37:04 UTC 2019

Ganesh Sittampalam <ganesh at earth.li> added the comment:

(Sorry for the delay in following up on this)

On 15/06/2019 15:01, Ben Franksen wrote:
> 
> Ben Franksen <ben.franksen at online.de> added the comment:
> 
>>> Again, certainly possible. But remember that the PrimPatchId is
>>> part of the on-disk representation. Using the full meta-data
>>> would make patches quite a bit larger, slowing down read and
>>> write of patches and also equality tests. And since we read 
>>> PrimPatchIds from disk, we cannot (at least not automatically) 
>>> share equal PrimPatchIds in memory, so memory requirements
>>> would also go up significantly.
>>
>> Yeah. I was thinking that the on-disk representation could elide the 
>> PrimPatchId when it's the same as that of the containing patch, so
>> it'd only be needed for conflicts.
>>
>> When reading patches, this would also implicitly mean we get back 
>> most of the sharing.
>>
>> A pointer is smaller than a SHA1, so this would be a win for the
>> non-conflicting case.
> 
> Okay, this would solve the space issue (on disk and in memory), assuming
> the repo doesn't contain too many conflicted patches.
> 
>> We could also choose to use hash-consing to get back sharing for
>> the conflicting case, at the cost of some speed when reading, and
>> extra code complexity in reading patches.
> 
> Let me flesh that out a bit and please correct me if I misunderstood:
> 
> So the idea here is to use a global (per repo) lookup table from SHA1
> hashes to meta-data. That would allow us to use the optimized on-disk
> representation for Conflictors, too, i.e. the current one base on
> PrimPatchId, but use the real meta-data in a shared way in memory.
> Access to this table is of course impure, we would need to hide the
> necessary unsafePerformIO stuff behind some hopefully semantically pure
> abstraction.

Yes. It doesn't necessarily need to be a global with unsafePerformIO, we
could just thread it around when parsing a repo if we don't want to have
perfect sharing all the time.

> I still cannot see how this would give us back cheap equality tests for
> prim patches in the implementation of RepoPatchV3.

You're right, it's not as simple as I had thought.

To get cheap equality we'd also need to use 'reallyUnsafePtrEquality'.
Clearly the name should give us some pause :-) It sounds like it can't
give false positives (false reports that

> BTW, one optimization of the on-disk representation that would be
> minimally invasive and even serve as a possible first step toward the
> more far-reaching changes discussed above, is to eliminate the
> PrimPatchIds for unconflicted patches and just re-calculate them when
> reading the patch.

Yep. I think that's also what I meant by eliding it when it's the same
as the containing patch.

One option for now would be to keep the full patch names in the on-disk
format, but keep the hashes in the in-memory representation for now.
That would be easy on reading (just hash), but harder when writing as
we'd need a table of hash -> patch name somewhere. But then the hashing
would just be an implementation detail we could change at any time.

__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/patch1823>
__________________________________