[darcs-users] Regular Expression libraries and linker errors

Trent W. Buck trentbuck at gmail.com
Sat Oct 3 13:12:28 UTC 2009


Jason Dagit <dagit at codersbase.com> writes:

> 5) The default regexes that we provide (e.g., on darcs init) are not fully
> optimized and may not do what people expect in all cases.

What regexps are used by darcs init?

> Here is what I propose:
> a) We switch to regex-posix.

Fewer dependencies, so it suits me.

> b) We invest a small bit of time writing a function to optimize a list of
> simple regexes into one big but efficient regex.

The objection to this is that the resulting regex is unreadable, and
apparently people want to be able to edit _darcs/prefs/boring, not
merely append to it.

> Originally it looked like:
> \.foo$
> \.FOO$
>
> Specifically, our current list looks like:
> \.(foo|FOO)$

I made that change, because I was sick of the list being so damn long.

> I think we should transform that to:
> \.[fF][oO][oO]$
>
> I think that better captures the case-insensitive intent that we had.

I can't remember why this was kiboshed... possibly "readability".
Other than being fugly, I have no objection to it.
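
For illustration, the [fF][oO][oO] form can be generated mechanically
from a literal extension.  This is only a rough Python sketch; the name
case_insensitive_suffix is made up here and is not anything in darcs.
It assumes plain ASCII extensions and escapes everything that is not a
letter.

```python
import re

def case_insensitive_suffix(ext):
    # Hypothetical helper: turn a literal extension such as "foo" into a
    # suffix regex that matches it in any mix of upper and lower case.
    parts = []
    for ch in ext:
        if ch.isascii() and ch.isalpha():
            # One character class per letter: f -> [fF]
            parts.append("[" + ch.lower() + ch.upper() + "]")
        else:
            # Digits, dots, dashes etc. are matched literally.
            parts.append(re.escape(ch))
    return r"\." + "".join(parts) + "$"

print(case_insensitive_suffix("foo"))
```

Note that this accepts more than \.(foo|FOO)$ does: mixed-case names
such as "x.Foo" match as well, which is presumably the intent.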

> I can also imagine other stopgap proposals like making a standalone
> command-line tool that can optimize the regexes and write them back out
> so people have a chance to review them.

Emacs can do this for OR'd literals.  Here's one I prepared earlier:

    ## From Darcs 1.0.9
    \#
    ^\.\#
    (^|/),
    ^\.d(arcs-temp-mail|epend)$
    (^|/)(\.DS_Store|T(AGS|humbs\.db)|co(nfig\.(log|status)|re)|tags|vssver\.scc)$
    (^|/)(\.(arch-ids|svn|tmp_versions)|BitKeeper|C(VS|hangeSet)|MT|RCS|SCCS|_darcs|a(rch|utom4te\.cache)|{arch})($|/)
    (,v|\.(BAK|b(ak|zr)|c(lass|ore|vsignore)|e(lc|xe)|hi(-boot)?|ko\.cmd|l[ao]|mod\.c|o(-boot|\.cmd|bj|rig)|p(rof|y[co])|rej|s(o|wp)|[ao])|~)$

I think it basically looks for common substrings, and converts them to
[abc] or (aa|bb|cc) as appropriate.
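
A rough sketch of the common-prefix part of that idea, in Python.  I am
assuming the Emacs function in question is regexp-opt; the code below is
not its algorithm, just a plain trie over literal strings.  The names
build_trie and trie_to_regex are made up for illustration.  It only
factors shared prefixes, and it writes (a|o) where Emacs would emit
[ao].

```python
import re

def build_trie(words):
    # Nested dicts keyed by character; the key "" marks "a word ends here".
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie

def trie_to_regex(node):
    # A word may end at this node, so whatever follows is optional.
    optional = "" in node
    branches = [re.escape(ch) + trie_to_regex(node[ch])
                for ch in sorted(k for k in node if k != "")]
    if not branches:
        return ""
    if len(branches) == 1 and not optional:
        # A single continuation needs no grouping.
        return branches[0]
    body = "(" + "|".join(branches) + ")"
    return body + "?" if optional else body

words = ["bak", "bzr", "class", "core", "cvsignore"]
print(trie_to_regex(build_trie(words)))
```

The result still has to be wrapped by hand, e.g. as \.(...)$, to be
usable as a line in the boring file.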

> But, having darcs optimize them on the fly (or adding that to the
> regex-base library) is nice because then they throw any old regex at
> darcs and it tries to clean it up before using it.

+1.


