<br><br><div class="gmail_quote">On Wed, Apr 7, 2010 at 9:45 AM, Duncan Coutts <span dir="ltr"><<a href="mailto:duncan.coutts@googlemail.com">duncan.coutts@googlemail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On Tue, 2010-04-06 at 09:03 -0700, Jason Dagit wrote:<br>
<br>
> On Tue, Apr 6, 2010 at 6:26 AM, Petr Rockai <<a href="mailto:me@mornfall.net">me@mornfall.net</a>> wrote:<br>
> Duncan Coutts <<a href="mailto:duncan.coutts@googlemail.com">duncan.coutts@googlemail.com</a>> writes:<br>
<br>
</div><div class="im">> > One option would be to use a representation like a<br>
> reverse-order list of path components, with each component<br>
> stored as a short packed string. That allows for sharing<br>
> between paths and would reduce the cost of using long absolute<br>
> paths.<br>
<br>
</div><div class="im">> Interesting idea. I have already started using a path type<br>
> that is a list of components (represented as bytestrings), it<br>
> just did not occur to me to make it reverse to improve<br>
> sharing. I'll try to look into doing that.<br>
><br>
> I could be wrong, but I was under the impression that many short<br>
> bytestrings leads to memory fragmentation in the GC.<br>
<br>
</div>Indeed, which is why I said "short packed string" rather than<br>
ByteString. I would implement a short packed string as a wrapper around<br>
GHC's ByteArray# type which can be allocated unpinned. That gives an<br>
overhead of 2 or 4 words compared to 5 or 8 words (differences depend on<br>
sharing and if the type is unpacked into a containing data constructor).<br>
<div class="im"><br>
> I was also under the impression that small bytestrings have worse<br>
> overhead than small Strings.<br>
<br>
</div>If I recall correctly they even up at around 3-5 characters. However the<br>
pinning of ByteString is a major PITA. So I would not suggest using them<br>
for short strings.<br>
<br>
IMHO, ByteStrings using ForeignPtrs was a mistake (and one I hope to<br>
correct at some point in the future).<br></blockquote><div><br></div><div><with my darcs dev hat on></div><div>It sounds like we should keep using ByteString instead of rolling our own solution. We're in the version control business, not the low level string abstraction business. Maybe we should use ByteString and let it catch up in optimizations on its own schedule.</div>
<div><removing the hat></div><div><br></div><div>How much work/effort is the refactor you talk about? I mean, the refactor of ByteString to not use ForeignPtrs.</div><div><br></div><div>How much work/effort to convince the ByteString user base that the refactor is ready for consumption?</div>
<div><br></div><div>I ask because it's a change that could add a lot of value to our benchmarking efforts for darcs by giving us more accurate heap usage information and better garbage collection if the fragmentation is reduced.</div>
<div><br></div><div>Jason</div></div>