[delta-l] Delta files - character encoding
takowl at gmail.com
Mon Jan 23 15:15:14 CET 2012
> > Text Encoding
> > --------------------
> > The directive to specify the text encoding used in DELTA files is:
> > *TEXT ENCODING cp1252
> > New programs should be able to read files using at least cp850, cp1252
> > and utf-8. If the directive is not present, programs should assume
> > text is stored as cp1252, unless instructed otherwise.
> It might be better to use the form of the HTML 'charset' parameter:
> *TEXT ENCODING windows-1252
> An alternative mechanism would be to incorporate the information in a
> special comment at the start of the file:
> *COMMENT @charset=UTF-8
> This would allow the file to be used by older programs that might not need
> the encoding information.
I quite like this comment idea, if it avoids having to add a separate file
for the encoding. Borrowing from how the encoding is specified in Python
source code, I suggest that if the first line of a DELTA format file is a
comment, it is checked against the regex "coding[:=]\s*([-\w.]+)". This
would match comments such as:
*COMMENT coding: UTF-8
Encoding names would be case-insensitive. New programs should be prepared
to accept at least 'windows-1252', 'cp850' (or is 'ibm850' better for
this?) and 'utf-8'. If the first line isn't a comment, or is a comment
without this pattern, the encoding is assumed to be 'windows-1252'.
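
A rough sketch of how a reader might apply this rule, in Python. The
function name detect_delta_encoding is made up for illustration, and
several details are my assumptions rather than part of the proposal: the
directive keyword is matched case-insensitively, an unrecognised encoding
name falls back to the default, and a leading byte order mark is not
handled. Names are normalised through Python's codec registry, so
'windows-1252' comes back as 'cp1252' and 'ibm850' as 'cp850'.

```python
import codecs
import re

# Same pattern Python uses for source files (PEP 263).
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

# Python's canonical codec name for windows-1252.
DEFAULT_ENCODING = "cp1252"


def detect_delta_encoding(first_line: bytes) -> str:
    """Return the codec name declared on the first line, or the default."""
    # The declaration is plain ASCII in all three candidate encodings,
    # so any non-ASCII bytes can be replaced without losing it.
    text = first_line.decode("ascii", errors="replace")

    # Assumption: only a *COMMENT directive may carry the declaration.
    if not text.lstrip().upper().startswith("*COMMENT"):
        return DEFAULT_ENCODING

    match = CODING_RE.search(text)
    if match is None:
        return DEFAULT_ENCODING

    try:
        # codecs.lookup is case-insensitive and resolves aliases.
        return codecs.lookup(match.group(1)).name
    except LookupError:
        # Assumption: unknown names fall back to the default.
        return DEFAULT_ENCODING
```

A program would read the first line in binary mode, call this function,
and then reopen or decode the rest of the file with the returned name.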