[darcs-users] Regular Expression libraries and linker errors
dagit at codersbase.com
Tue Oct 6 03:05:35 UTC 2009
On Mon, Oct 5, 2009 at 6:18 PM, Trent W. Buck <twb at cybersource.com.au> wrote:
> Petr Rockai <me at mornfall.net> writes:
> > Jason Dagit <dagit at codersbase.com> writes:
> >> In my travels profiling the performance of record I noticed that we
> >> do spend about 1/3 of the time just matching regular expressions on
> >> filenames.
> > Just one thing... do we match those on String or on ByteString?
> In principle, I would like to match Unicode codepoints, not bytes.
On OS X, man regex gives these two definitions:

    int regcomp(regex_t *restrict preg, const char *restrict pattern,
                int cflags);
    int regexec(const regex_t *restrict preg, const char *restrict string,
                size_t nmatch, regmatch_t pmatch[restrict], int eflags);

So both regcomp and regexec take the pattern and the subject as const char *,
i.e. NUL-terminated vectors of bytes. If a wide-character (wchar_t) version
exists, I don't think the Haskell bindings are using it.
I think that as long as you're lucky enough that the regex and the string are
in the same encoding, ByteString and String will be equivalent in their
matching ability here. Unfortunately, I don't think darcs makes any such
guarantee.
> In practice, I avoid non-ASCII and non-printable characters in file
> names, because there are so many such issues on Unix :-(
> > Because not using String would probably lead to another substantial
> > speedup on this. We may also want to switch to regex-dfa, since I
> > believe we only care whether we have a match and not much else.
> I see no downside there.
I was going to agree that we don't need the extra capabilities, like
extracting matches and doing replacements, but it just occurred to me that we
could probably re-implement some things, like decode_white/encode_white,
using regexps and potentially get better performance. It's worth doing a
performance test to see. I'll try to get some data on this.
> > But you are right that regex-pcre or pcre-light might be faster
> > (before deciding, it may make a lot of sense to benchmark both in
> > darcs, though).
> I have no problem switching from EREs to PCREs, but if we do so, please
> let's do it for all of Darcs at once!
> As well as benchmarking, someone will need to check that the default
> regexps that Darcs HAS shipped will have the same semantics after
> switching to PCRE.
If it comes to that, do you think you would know how to check this? I'd
have to do a bit of research to figure it out myself.