[darcs-users] [patch37] Store textual patch metadata encoded in UTF-8
Trent W. Buck
twb at cybersource.com.au
Tue Nov 10 08:32:32 UTC 2009
Eric Kow <kowey at darcs.net> writes:
> Trent: could you comment on the utf8.sh test?
As requested, I've prioritized this review.
I started commenting on this test script by hunks, before I realized it
had been amended later in the bundle. Therefore this review jumps
around A LOT. Sorry!
Apart from the VISUAL and JIS/10646 comments, I think these are all
style nitpicks.
>> [Add test for UTF-8 metadata
>> Reinier Lamers <tux_rocker at reinier.de>**20090917165327
>> Ignore-this: 3e81237e8af61a45d64ac60269e1fe90 UTF-8
>> ] addfile ./tests/utf8.sh
>> hunk ./tests/utf8.sh 1
>> +#!/usr/bin/env bash
>> +## Test for issue64 - Should store patch metadata in UTF-8
The file might better be called issue64-utf8.sh.
>> +# This file is encoded in ISO-8859-15 aka latin9. It was crafted
>> with a hex editor.
Could do with vim/Emacs magic to tell them that the file is latin-9.
>> +echo 'Selbstverst\344ndlich \374berraschend' > something.txt
Is it worth including translations as comments, for monoglots? ;-)
>> +echo 'l33tking\2700r at example.org' > interaction_script.txt
>> +echo y >> interaction_script.txt
>> +echo y >> interaction_script.txt
>> +echo '\244uroh4xx0rz' >> interaction_script.txt
>> +echo n >> interaction_script.txt
Could be done with a heredoc later (i.e. <<EOF), but that might change
the nature of the input if bash and files do encoding differently.
>> +unset DARCSEMAIL
>> +unset DARCS_TESTING_PREFS_DIR
>> +unset EMAIL
>> +set
Is it necessary to clear these in this script specifically? Could they
better be moved to tests/lib (or Setup.lhs) or deleted?
Either way, I would move them up so that interaction_script is written
to and read from on adjacent lines.
>> +# Test recording non-UTF-8-encoded non-ASCII ("funny") metadata from
>> +# interactive input
>> +darcs record -i < interaction_script.txt
This pattern repeats throughought the script, testing different ways for
metadata to get into Darcs (interactively, from a file, from arguments,
from the environment, ...)
>> +if grep 'l33tking\305\2760r at example.org' changes.xml ; then
>> + echo "patch author OK from interactive prompt"
>> +else
>> + echo "patch author from interactive prompt not UTF-8-encoded!"
>> + exit 1
>> +fi
Given the way these tests are used, I would simplify this to
# Patch author from interactive prompt is UTF-8 encoded?
grep '...' changes.xml
Since if the grep fails, both the above lines will be shown by
Setup.lhs. This comment applies to similar subsequent code.
Update: I see a later patch in the bundle replaces this with a wrapper
function. I'm not enthused, because tests/lib's set -x will make this
even noisier.
AFAICT it doesn't matter for any of these strings, but I am in the habit
of saying "fgrep" if the search pattern is literal (i.e. not a regex).
>> +# Test recording funny metadata from environment,
Incidentally, "funny" is a bit loaded. Non-Latin1 characters shouldn't
be amusing nor peculiar! :-)
>> +echo '#!/usr/bin/env bash' > editor
>> +echo 'echo All my \244s are gone > $1' >> editor # create an 'editor' that writes latin9
>> +chmod +x editor
>> +export EDITOR="`pwd`/editor"
Either use VISUAL, or ensure VISUAL is unset -- it takes precedence over
EDITOR.
>> +printf "y\ny\n" | darcs amend --edit -p 'Patch by '
This could simply be
darcs amend ... <<<yyn
since Darcs doesn't care about missing newlines, and the bashism <<< is
a herestring. You might even be able to get away with (untested):
VISUAL=tee darcs amend --edit -p 'Patch by ' <<EOF
yny
All my \244s are gone
EOF
>> + if [ -z "$3" ]; then
>> + last=""
>> + else
>> + last="--last $3"
>> + fi
>> +
>> + darcs changes $last --xml > changes.xml
For a function with optional extra arguments, I suggest simply using
"$@":
f () { # f x y zs
x=$1 y=$2
shift 2
g "$x" "$y" "$@"
}
f foo bar
f baz quux --last 3
This gives you some flexibility later if a different call to "f" needs
something other than a --last N, e.g.
f quuux quuuux --max-count 7
In this case, you could perhaps add "changes max-count 1" to
_darcs/prefs/defaults.
>> +diff temp1/changes.xml temp2/changes.xml
Could use cmp instead of diff, since we only need to know IF the file
has changed, not HOW (because a human can inspect the files afterwards).
>> +if grep \264ors _darcs/prefs/author ; then
Not putting quotes around such a literal string won't affect a working
system, but I would be a little wary of The Right Thing happening when
the system isn't encoding things correctly. That is, what happens if
Bash decides a multi-byte character is actually "Ã "? You might end up
with "grep à oid file" instead of "grep 'à oid' file". I'm probably
just being paranoid.
Hmm, but now that I'm thinking about it, this script is only testing
8859 -- a unibyte encoding. Suggest iconving it into a second test,
using JIS or 10646.
PS: this test overall ends up being relatively long. Long procedural
code means stuff might accidentally fail (or worse: accidentally pass)
due to state leftover from earlier in the script. For example, I notice
you cleaning up after _darcs/prefs/author vs. $EMAIL. Is it worth
breaking up this script into smaller "chunks" that each test a little
bit?
More information about the darcs-users
mailing list