[darcs-users] Regular Expression libraries and linker errors

Trent W. Buck twb at cybersource.com.au
Tue Oct 6 01:18:28 UTC 2009


Petr Rockai <me at mornfall.net> writes:

> Jason Dagit <dagit at codersbase.com> writes:
>> In my travels profiling the performance of record I noticed that we
>> do spend about 1/3 of the time just matching regular expressions on
>> filenames.
> Just one thing... do we match those on String or on ByteString?

In principle, I would like to match Unicode codepoints, not bytes.

In practice, I avoid non-ASCII and non-printable characters in file
names, because they cause so many encoding problems on Unix :-(

> Because not using String would probably lead to another substantial
> speedup on this. We may also want to switch to regex-dfa, since I
> believe we only care whether we have a match and not much else.

I see no downside there.

> But you are right that regex-pcre or pcre-light might be faster
> (before deciding, it may make a lot of sense to benchmark both in
> darcs, though).

I have no problem switching from EREs to PCREs, but if we do, please
let's do it for all of Darcs at once!

As well as benchmarking, someone will need to check that every default
regexp Darcs HAS ever shipped keeps the same semantics after switching
to PCRE.

For example, in the following case a boring file written for EREs will
be read by a PCRE-based Darcs:

    darcs init       # 1.0.9
    # install a libpcre darcs
    darcs add -r .   # 2.5 or whatever
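To give a feel for what that check involves, here is a rough sketch (not
anything Darcs itself ships) that asks whether a regexp selects the same
file names under POSIX ERE (grep -E) and under PCRE (grep -P). It assumes
GNU grep built with PCRE support; the helper name same_semantics, the
sample file names and the three boring-style patterns are all made up for
illustration. The [\.] pattern shows one genuine difference: inside a
POSIX bracket expression the backslash is literal, so the ERE also matches
"a\b", whereas PCRE reads \. as an escaped dot.

```shell
#!/bin/sh
# Sketch: does a regexp pick the same file names as a POSIX ERE
# (grep -E) and as a PCRE (grep -P)? Assumes GNU grep with -P support.
# Sample names and patterns are hypothetical, not Darcs's real defaults.

if ! printf 'x\n' | grep -P 'x' >/dev/null 2>&1; then
    echo "this grep has no -P (PCRE) support; nothing to compare" >&2
    exit 0
fi

# One file name per line; single quotes keep the backslash in a\b literal.
sample_names='foo.o
src/_darcs/prefs
a\b
a.b
README'

# same_semantics PATTERN: succeeds if ERE and PCRE select the same names.
same_semantics() {
    ere=$(printf '%s\n' "$sample_names" | grep -E -e "$1")
    pcre=$(printf '%s\n' "$sample_names" | grep -P -e "$1")
    [ "$ere" = "$pcre" ]
}

for pat in '\.o$' '(^|/)_darcs($|/)' '[\.]'; do
    if same_semantics "$pat"; then
        echo "same:      $pat"
    else
        echo "DIFFERENT: $pat"
    fi
done
```

Run against a PCRE-enabled GNU grep, this should report the first two
patterns as the same and [\.] as different. A real check would loop over
every boring file Darcs has shipped, not three hand-picked patterns, and
would still only cover the sample names it is given.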