Generalizing this:

ALL parsing tokens should accept reasonable alternative encodings -- if it
visibly looks alike, it should be treated alike.

So, for example, hyphen, en-dash and em-dash should be equivalent; back
tick, vertical tick and single quotes should be equivalent; and likewise
double quotes, guillemets and angle quotes.

If $ is used as a marker, then other currency symbols should be accepted too.

In terms of programming, the easiest way would be to have a 'table of
equivalencies', so that a first pass substitutes an arbitrary token for
these characters wherever they occur.  This allows easy customization
depending on what keyboard is local.
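
As a rough illustration of that first pass, here is a minimal Python sketch.
The table contents and the names EQUIVALENTS, build_translation and
normalize are my own assumptions for the example, not anything taken from
the DELTA spec or an existing parser.

```python
# Hypothetical table of equivalencies: canonical token -> look-alike characters.
# The entries are illustrative only; a real table would be tuned per locale.
EQUIVALENTS = {
    "-": "\u2010\u2013\u2014\u2212",   # hyphen, en dash, em dash, minus sign
    "'": "`\u00b4\u2018\u2019",        # back tick, acute accent, curly single quotes
    '"': "\u201c\u201d\u00ab\u00bb",   # curly double quotes, guillemets
}


def build_translation(equivalents):
    """Flatten the table into a code point -> canonical token mapping."""
    mapping = {}
    for canonical, variants in equivalents.items():
        for ch in variants:
            mapping[ord(ch)] = canonical
    return mapping


def normalize(text, equivalents=EQUIVALENTS):
    """First pass: replace every look-alike with its canonical token."""
    return text.translate(build_translation(equivalents))
```

With this, normalize("1\u20135") gives "1-5", so the rest of the parser only
ever has to look for the plain hyphen-minus.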

Along with this, however, you need some error checking so that the
same character cannot be used for two different tokens.  E.g. if

range operator = dash => -, em-dash, en-dash, U1234, etc

then later

choice of options = pipe => |, solidus, double-dagger, en-dash

SOMETHING better fuss.
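
A minimal Python sketch of such a check, assuming the table is held as a
dict from token name to its list of accepted characters.  The names
find_conflicts and TABLE, and the exact characters listed, are hypothetical
and only mirror the example above.

```python
def find_conflicts(table):
    """Return {char: [token names]} for characters claimed by more than one token."""
    owners = {}
    for token, chars in table.items():
        for ch in set(chars):           # ignore duplicates within one token
            owners.setdefault(ch, []).append(token)
    return {ch: names for ch, names in owners.items() if len(names) > 1}


# Illustrative table: the en dash (U+2013) is listed under both tokens.
TABLE = {
    "range": ["-", "\u2014", "\u2013"],          # hyphen-minus, em dash, en dash
    "choice": ["|", "/", "\u2021", "\u2013"],    # pipe, solidus, double dagger, en dash
}

conflicts = find_conflicts(TABLE)
if conflicts:
    for ch, names in conflicts.items():
        print(f"U+{ord(ch):04X} is used by more than one token: {', '.join(names)}")
```

Running the check when the table is loaded, before any file is parsed,
means a bad local customization gets reported up front rather than showing
up as a confusing parse error later.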


On Sun, Jan 29, 2012 at 10:18 AM, Thomas Kluyver <takowl at gmail.com> wrote:

> I found when copying and pasting examples from the DELTA standard that the
> examples of the 'to' separator all use the N dash (–, unicode U+2013),
> while the files I have use the hyphen-minus (-, U+002D, the standard dash
> on computer keyboards).
> I expect (and hope) that this is simply a mistake in the spec: the N dash
> is not an ASCII character, so it would be tricky to parse it reliably.
> However, for files encoded with windows-1252 (which is standard for more
> modern DELTA files), it is possible to store an N-dash.
> Can anyone confirm that code parsing DELTA files should only allow
> hyphen-minus for this separator? And if so, could the spec be updated to
> use hyphen-minus in examples?
> Thank-you,
> Thomas
