[delta-l] Delta files - character encoding

Thomas Kluyver takowl at gmail.com
Mon Jan 16 20:00:28 CET 2012


Quentin,

Ideally, yes. But working with DELTA files is precisely the sort of
historic compromise you mention - the files need to be compatible with
older programs which don't use UTF-8.

I quite like Craig's suggestion of a character set directive, though. Even
if we have to keep using cp1252 for now, having the directive available for
new programs means we could one day use utf-8 (once the new software is
ubiquitous). Allow me to flesh out the proposal a bit:

Text Encoding
--------------------
The directive to specify the text encoding used in DELTA files is:
    *TEXT ENCODING cp1252
New programs should be able to read files using at least cp850, cp1252 and
utf-8. If the directive is not present, programs should assume text is
stored as cp1252, unless instructed otherwise.
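A minimal Python sketch of how a reader might apply this rule. The function names, the regular expression, and the assumption that the directive may appear anywhere in the file are all illustrative; none of them are part of the proposal. Because cp850, cp1252 and utf-8 encode ASCII identically, the reader can find the directive in the raw bytes before it knows the encoding.

```python
import codecs
import re

# Encodings the proposal says new programs should read (Python codec names).
SUPPORTED_ENCODINGS = {"cp850", "cp1252", "utf-8"}
# Fallback when no directive is present, per the proposal.
DEFAULT_ENCODING = "cp1252"

# Hypothetical pattern for the proposed "*TEXT ENCODING <name>" directive.
# Searching the raw bytes works because all supported encodings are ASCII-compatible.
_DIRECTIVE = re.compile(rb"\*TEXT\s+ENCODING\s+([A-Za-z0-9_.-]+)", re.IGNORECASE)


def detect_encoding(raw: bytes, default: str = DEFAULT_ENCODING) -> str:
    """Return the codec named by a *TEXT ENCODING directive, or the default."""
    match = _DIRECTIVE.search(raw)
    name = default if match is None else match.group(1).decode("ascii")
    try:
        # codecs.lookup normalises aliases, e.g. "UTF8" -> "utf-8",
        # "windows-1252" -> "cp1252".
        return codecs.lookup(name).name
    except LookupError:
        raise ValueError(f"unknown text encoding: {name}") from None


def read_delta_text(raw: bytes, default: str = DEFAULT_ENCODING) -> str:
    """Decode the contents of a DELTA file using the detected encoding."""
    encoding = detect_encoding(raw, default)
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported text encoding: {encoding}")
    return raw.decode(encoding)
```

A file written by an older program simply lacks the directive, so it falls back to cp1252 and is read as before.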

Thomas

On 16 January 2012 17:12, Quentin Groom <qgroom at reticule.co.uk> wrote:

> All,
>
> IMHO, there is only one character set to use for any multilingual program,
> and that is UTF-8. All the others are historic compromises. Only UTF-8 has
> the full range of characters for any language and is widely supported on
> all operating systems. Believe me, it is a mistake to use anything else,
> particularly as it is no more difficult to use than any other character set.
>
> Regards
>
> Quentin
>
> *From:* delta-l-bounces at science.uu.nl [mailto:delta-l-bounces at science.uu.nl] *On Behalf Of* Thomas Kluyver
> *Sent:* 16 January 2012 17:57
> *To:* Descriptive taxonomic databases
> *Subject:* Re: [delta-l] Delta files - character encoding
>
> Thanks, Eric,
>
> I'm currently using code page 1252, the standard Windows code page for
> English and other Western European languages (sometimes called "ANSI"), as
> the default for DELTA files, but allowing another encoding to be specified.
> From what I can find online, this has been the default in Windows since at
> least Windows 98, and probably earlier. Is code page 850 (from DOS) likely
> to be more common in existing DELTA files?
>
> As and when I implement saving DELTA files, I'll make 1252 the default
> there, as I presume this is what newer tools will expect. Ideally we could
> just use utf-8, but I guess this would cause problems for existing
> applications.
>
> Thankfully Python provides all the necessary tools for working with any of
> these character sets. The difficulty is just in picking the right one.
>
> Best wishes,
> Thomas
>
> On 16 January 2012 15:27, Eric Gouda <E.J.Gouda at uu.nl> wrote:
>
> On most PCs, code page 850 has been used, and there are not many
> differences from other frequently used code pages.
>
> As far as I know, Windows no longer works with the standard MS-DOS code
> pages, and I use OEM-to-ANSI conversion to load DELTA files into DXedit.
> If you need the conversion table, let me know.
>
> _______________________________________________
> delta-l mailing list
> delta-l at science.uu.nl
> http://mailman.science.uu.nl/mailman/listinfo/delta-l
>
>
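For files written under DOS, the OEM-to-ANSI conversion Eric describes can be approximated with Python's built-in codecs. This sketch assumes the source is code page 850 and the target is code page 1252. It is an illustration, not DXedit's conversion table: `oem_to_ansi` is a hypothetical helper, and characters with no cp1252 equivalent (such as DOS box-drawing characters) are replaced with `?`. It also shows why picking the wrong code page is a silent error. Byte 0x82 is `é` in cp850 but a low quotation mark in cp1252.

```python
def oem_to_ansi(raw: bytes, errors: str = "replace") -> bytes:
    """Re-encode code page 850 ("OEM") bytes as code page 1252 ("ANSI").

    Characters that cp1252 cannot represent are handled according to
    `errors`; the default "replace" substitutes "?".
    """
    return raw.decode("cp850").encode("cp1252", errors=errors)


# The same byte means different characters in the two code pages,
# so decoding with the wrong one produces wrong text without any error.
print(repr(b"\x82".decode("cp850")))   # 'é'
print(repr(b"\x82".decode("cp1252")))  # '‚' (U+201A, single low-9 quotation mark)
print(oem_to_ansi(b"caf\x82"))         # b'caf\xe9'
```

Accented Latin letters such as `é` and `ü` exist in both code pages and convert cleanly. Line-drawing characters and some other symbols fall back to `?`, which is worth checking before rewriting existing files.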