[darcs-devel] [issue1094] typo blacklist for patch names and descriptions

Trent Buck bugs at darcs.net
Tue Sep 23 03:28:02 UTC 2008


New submission from Trent Buck <trentbuck at gmail.com>:

Darcs' powerful cherry-picking features make it important to avoid
typos in patch names.  For example, if a user accidentally did

    darcs rec -am 'debain: bump changelog to 2.0.3~pre1-1'

then later doing

    darcs send -p ^debian:

would not match this patch.  For inattentive users like myself, this
can actually result in data loss -- because I *think* all the
important patches have been sent/pushed, and I rm -rf my local copy.
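To make the failure concrete, here is a minimal Python sketch of the
matching problem.  It assumes that -p searches the patch name with a
regular expression, and it uses Python's re module as a stand-in for
darcs' own matcher.  The first patch name is hypothetical; the second
is the one from the example above.

```python
import re

# Stand-in for `darcs send -p ^debian:`.  Python's `re` is an assumption
# here; it is not the regex engine darcs itself uses.
pattern = re.compile(r"^debian:")

patch_names = [
    "debian: bump changelog to 2.0.2-1",       # hypothetical, spelled correctly
    "debain: bump changelog to 2.0.3~pre1-1",  # the typo from the example
]

# Patches the pattern would select, and patches it silently skips.
selected = [name for name in patch_names if pattern.search(name)]
missed = [name for name in patch_names if not pattern.search(name)]

print("selected:", selected)
print("missed:  ", missed)
```

The misspelled patch lands in `missed` without any diagnostic, which
is exactly how it gets left behind.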

There are two ways to do spellchecking:

- The checker has a whitelist of "correct" words; if a word is not in
  this list, it's a mistake.  This results in lots of false positives.

- The checker has a blacklist of common typos; if a word is not in
  this list, it's correct.  This results in lots of false negatives.

On the mailing list the general consensus (which I agree with) is that
the former alternative is a lot of work and will annoy a lot of people
(particularly when you take into account multilingual environments).

I'd like to reiterate my argument for the latter alternative, because
the fallback behaviour -- false negatives -- is what darcs does now,
without any spellchecking!
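The trade-off between the two approaches can be sketched in a few
lines of Python.  Both word lists are tiny, hypothetical examples made
up for this illustration, not proposed defaults.

```python
import re

# Both tables are hypothetical and deliberately tiny.
WHITELIST = {"fix", "the", "debian", "bump", "to"}
BLACKLIST = {"debain": "debian"}

def words(text):
    """Split text into lowercase ASCII words."""
    return re.findall(r"[a-z]+", text.lower())

def whitelist_flags(text):
    """Flag every word that is not known to be correct."""
    return [w for w in words(text) if w not in WHITELIST]

def blacklist_flags(text):
    """Flag only words that are known to be typos."""
    return [w for w in words(text) if w in BLACKLIST]

good = "debian: bump changelog"
bad = "debain: bump chnagelog"

print(whitelist_flags(good))  # flags a correct word missing from the list
print(blacklist_flags(good))  # flags nothing
print(whitelist_flags(bad))   # flags both typos
print(blacklist_flags(bad))   # flags only the typo it knows about
```

The whitelist complains about "changelog" even when it is spelled
correctly, while the blacklist's only failure is to stay silent about
"chnagelog", which is no worse than having no checker at all.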

Here's a mockup of how this feature might behave.

    $ cat _darcs/typos
    typo tyop
    # Enforce en_GB-oed (Oxford English) conventions.
    colour color
    realize realise
    $ cat ~/.darcs/typos
    # Enforce capitalization as well as correct spelling.
    Debian debian debain
    # Unicode is fun, too.
    한국어 혼극어

Each line of the table holds a correctly spelled word, followed by one
or more incorrect versions of it.  Here darcs init has created
_darcs/typos with the "tyop" line, and the end user has gotten annoyed
with his co-workers and added some extra entries.  The user can also
have a personal typo blacklist in ~/.darcs/typos.
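A parser for this format could be quite small.  The following Python
sketch is only an illustration of the file format, not a proposed
implementation: the function names are made up, and letting the
personal table override the repository table is my assumption, since
the mockup does not say how the two files combine.

```python
from typing import Dict

def parse_typos(text: str) -> Dict[str, str]:
    """Map each blacklisted spelling to its preferred spelling."""
    corrections: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # First field is the correct word, the rest are its typos.
        correct, *typos = line.split()
        for typo in typos:
            corrections[typo] = correct
    return corrections

def load_blacklist(*texts: str) -> Dict[str, str]:
    """Merge several tables; later ones win (an assumption)."""
    merged: Dict[str, str] = {}
    for text in texts:
        merged.update(parse_typos(text))
    return merged

# File contents copied from the mockup above.
REPO_TYPOS = """\
typo tyop
# Enforce en_GB-oed (Oxford English) conventions.
colour color
realize realise
"""
USER_TYPOS = """\
# Enforce capitalization as well as correct spelling.
Debian debian debain
# Unicode is fun, too.
한국어 혼극어
"""

blacklist = load_blacklist(REPO_TYPOS, USER_TYPOS)
print(blacklist)
```

Keying the table on the typo rather than on the correct word makes the
later lookup a single dictionary access per word.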

    $ darcs rec -am "Fix colour blindness issue."

No change here, because there are no blacklisted words.

    $ darcs rec -a
    What is the patch name? Fix color blindness issue.

    Warning: possible typographical error (typo) detected!
    You said `color' but probably meant `colour'.

    Change patch name? [Y/n/q]

The user can choose n to override this warning, Y (the default) to go
back to the patch name step, or q to abort the entire record operation.

    $ darcs rec -am 'Fix color blindness issue.'
    Warning: possible typographical error (typo) detected!
    You said `color' but probably meant `colour'.
    Continuing anyway.

Because darcs was invoked non-interactively here, it just prints a
warning and continues anyway.
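The interactive and non-interactive behaviour described above could be
sketched in Python roughly as follows.  All function names and the
"keep"/"edit"/"abort" return values are hypothetical; the messages are
the ones from the mockup.  Matching is case-sensitive on purpose, so
that "debian" can be blacklisted while "Debian" is accepted.

```python
import re
from typing import Callable, Dict, List, Tuple

def find_typos(text: str, blacklist: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (typo, suggestion) pairs for blacklisted words in text."""
    # \w+ is Unicode-aware for str patterns in Python 3.
    return [(w, blacklist[w]) for w in re.findall(r"\w+", text) if w in blacklist]

def warnings_for(text: str, blacklist: Dict[str, str]) -> List[str]:
    """Format one two-line warning per blacklisted word."""
    lines = []
    for typo, suggestion in find_typos(text, blacklist):
        lines.append("Warning: possible typographical error (typo) detected!")
        lines.append(f"You said `{typo}' but probably meant `{suggestion}'.")
    return lines

def check_patch_name(name: str, blacklist: Dict[str, str], interactive: bool,
                     ask: Callable[[str], str] = input) -> str:
    """Return 'keep', 'edit' or 'abort' (hypothetical result names)."""
    lines = warnings_for(name, blacklist)
    if not lines:
        return "keep"  # nothing blacklisted, no change in behaviour
    print("\n".join(lines))
    if not interactive:
        print("Continuing anyway.")
        return "keep"
    answer = ask("Change patch name? [Y/n/q] ").strip().lower()
    if answer == "q":
        return "abort"
    if answer == "n":
        return "keep"
    return "edit"  # Y, or just Enter, goes back to the patch name step

demo = {"color": "colour"}
print(check_patch_name("Fix color blindness issue.", demo, interactive=False))
```

Passing the prompt function in as `ask` keeps the decision logic
testable without a terminal.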

Note that I haven't shown it here, but I envisage this feature also
checking the patch description against the typo blacklist.  It would
*NOT* check the patch body (i.e. the hunks of changed lines and
suchlike).

We can probably harvest the output of "darcs changes -s" on some large
repositories like darcs and ghc to create a useful default blacklist.

PS: this feature was inspired by lintian, which does the same thing
for Debian changelog entries.

----------
messages: 6091
nosy: dagit, kowey, simon, twb
status: unread
title: typo blacklist for patch names and descriptions

__________________________________
Darcs bug tracker <bugs at darcs.net>
<http://bugs.darcs.net/issue1094>
__________________________________

