[darcs-users] [patch37] Store textual patch metadata encoded in UTF-8

Trent W. Buck twb at cybersource.com.au
Tue Nov 10 08:32:32 UTC 2009


Eric Kow <kowey at darcs.net> writes:

> Trent: could you comment on the utf8.sh test?

As requested, I've prioritized this review.

I started commenting on this test script by hunks, before I realized it
had been amended later in the bundle.  Therefore this review jumps
around A LOT.  Sorry!

Apart from the VISUAL and JIS/10646 comments, I think these are all
style nitpicks.

>> [Add test for UTF-8 metadata
>> Reinier Lamers <tux_rocker at reinier.de>**20090917165327
>>  Ignore-this: 3e81237e8af61a45d64ac60269e1fe90 UTF-8
>> ] addfile ./tests/utf8.sh
>> hunk ./tests/utf8.sh 1
>> +#!/usr/bin/env bash
>> +## Test for issue64 - Should store patch metadata in UTF-8

The file might better be called issue64-utf8.sh.

>> +# This file is encoded in ISO-8859-15 aka latin9. It was crafted
>> with a hex editor.

Could do with vim/Emacs magic to tell them that the file is latin-9.

>> +echo 'Selbstverst\344ndlich \374berraschend' > something.txt

Is it worth including translations as comments, for monoglots? ;-)

>> +echo 'l33tking\2700r at example.org' > interaction_script.txt
>> +echo y >> interaction_script.txt
>> +echo y >> interaction_script.txt
>> +echo '\244uroh4xx0rz' >> interaction_script.txt
>> +echo n >> interaction_script.txt

Could be done with a heredoc later (i.e. <<EOF), but that might change
the nature of the input if bash and files do encoding differently.

>> +unset DARCSEMAIL
>> +unset DARCS_TESTING_PREFS_DIR
>> +unset EMAIL
>> +set

Is it necessary to clear these in this script specifically?  Could they
better be moved to tests/lib (or Setup.lhs) or deleted?

Either way, I would move them up so that interaction_script is written
to and read from on adjacent lines.

>> +# Test recording non-UTF-8-encoded non-ASCII ("funny") metadata from
>> +# interactive input
>> +darcs record -i < interaction_script.txt

This pattern repeats throughought the script, testing different ways for
metadata to get into Darcs (interactively, from a file, from arguments,
from the environment, ...)

>> +if grep 'l33tking\305\2760r at example.org' changes.xml ; then
>> +    echo "patch author OK from interactive prompt"
>> +else
>> +    echo "patch author from interactive prompt not UTF-8-encoded!"
>> +    exit 1
>> +fi

Given the way these tests are used, I would simplify this to

    # Patch author from interactive prompt is UTF-8 encoded?
    grep '...' changes.xml

Since if the grep fails, both the above lines will be shown by
Setup.lhs.  This comment applies to similar subsequent code.

Update: I see a later patch in the bundle replaces this with a wrapper
function.  I'm not enthused, because tests/lib's set -x will make this
even noisier.

AFAICT it doesn't matter for any of these strings, but I am in the habit
of saying "fgrep" if the search pattern is literal (i.e. not a regex).

>> +# Test recording funny metadata from environment,

Incidentally, "funny" is a bit loaded.  Non-Latin1 characters shouldn't
be amusing nor peculiar! :-)

>> +echo '#!/usr/bin/env bash' > editor
>> +echo 'echo All my \244s are gone > $1' >> editor # create an 'editor' that writes latin9
>> +chmod +x editor
>> +export EDITOR="`pwd`/editor"

Either use VISUAL, or ensure VISUAL is unset -- it takes precedence over
EDITOR.

>> +printf "y\ny\n" | darcs amend --edit -p 'Patch by '

This could simply be

    darcs amend ... <<<yyn

since Darcs doesn't care about missing newlines, and the bashism <<< is
a herestring.  You might even be able to get away with (untested):

    VISUAL=tee darcs amend --edit -p 'Patch by ' <<EOF
    yny
    All my \244s are gone
    EOF

>> +    if [ -z "$3" ]; then
>> +            last=""
>> +    else
>> +            last="--last $3"
>> +    fi
>> +
>> +    darcs changes $last --xml > changes.xml

For a function with optional extra arguments, I suggest simply using
"$@":

    f () {   # f x y zs
      x=$1 y=$2
      shift 2
      g "$x" "$y" "$@"
    }

    f foo bar
    f baz quux --last 3

This gives you some flexibility later if a different call to "f" needs
something other than a --last N, e.g.

    f quuux quuuux --max-count 7

In this case, you could perhaps add "changes max-count 1" to
_darcs/prefs/defaults.

>> +diff temp1/changes.xml temp2/changes.xml

Could use cmp instead of diff, since we only need to know IF the file
has changed, not HOW (because a human can inspect the files afterwards).

>> +if grep \264ors _darcs/prefs/author ; then

Not putting quotes around such a literal string won't affect a working
system, but I would be a little wary of The Right Thing happening when
the system isn't encoding things correctly.  That is, what happens if
Bash decides a multi-byte character is actually "Ã "?  You might end up
with "grep à oid file" instead of "grep 'à oid' file".  I'm probably
just being paranoid.

Hmm, but now that I'm thinking about it, this script is only testing
8859 -- a unibyte encoding.  Suggest iconving it into a second test,
using JIS or 10646.

PS: this test overall ends up being relatively long.  Long procedural
code means stuff might accidentally fail (or worse: accidentally pass)
due to state leftover from earlier in the script.  For example, I notice
you cleaning up after _darcs/prefs/author vs. $EMAIL.  Is it worth
breaking up this script into smaller "chunks" that each test a little
bit?



More information about the darcs-users mailing list