[delta-l] Delta files - character encoding
takowl at gmail.com
Mon Jan 16 20:00:28 CET 2012
Ideally, yes. But working with DELTA files is precisely the sort of
historic compromise you mention - the files need to be compatible with
older programs which don't use UTF-8.
I quite like Craig's suggestion of a character set directive, though. Even
if we have to keep using cp1252 for now, having the directive available for
new programs means we could one day use utf-8 (once the new software is
ubiquitous). Allow me to flesh out the proposal a bit:
The directive to specify the text encoding used in DELTA files is:
*TEXT ENCODING cp1252
New programs should be able to read files using at least cp850, cp1252 and
utf-8. If the directive is not present, programs should assume text is
stored as cp1252, unless instructed otherwise.
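To make that concrete, here is a rough sketch in Python of how a new program might apply the rule. This is only an illustration of the proposal, not existing code: the function names, the `fallback` parameter and the regular expression are my own assumptions. It relies on the fact that cp850, cp1252 and utf-8 all encode the ASCII directive line identically, so the directive can be found in the raw bytes before anything is decoded.

```python
import re

# Encoding assumed when the file carries no directive (per the proposal).
DEFAULT_ENCODING = "cp1252"

# Hypothetical pattern for the proposed directive, e.g. "*TEXT ENCODING utf-8".
# It is matched against raw bytes because the directive itself is plain ASCII.
_DIRECTIVE = re.compile(rb"^\*TEXT ENCODING[ \t]+([A-Za-z0-9_-]+)", re.MULTILINE)


def detect_encoding(raw, fallback=DEFAULT_ENCODING):
    """Return the encoding named by the directive, or the fallback if absent."""
    match = _DIRECTIVE.search(raw)
    if match is None:
        # No directive: assume cp1252 unless the caller instructed otherwise.
        return fallback
    return match.group(1).decode("ascii").lower()


def decode_delta(raw, fallback=DEFAULT_ENCODING):
    """Decode the raw bytes of a DELTA file to text."""
    # bytes.decode raises LookupError if the named encoding is unknown.
    return raw.decode(detect_encoding(raw, fallback))
```

A program loading old DOS-era files could then call `decode_delta(raw, fallback="cp850")`, which covers the "unless instructed otherwise" case without overriding a directive that is actually present.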
On 16 January 2012 17:12, Quentin Groom <qgroom at reticule.co.uk> wrote:
> IMHO, there is only one character set to use for any multilingual program
> and that is UTF8. All the others are historic compromises. Only UTF8 has
> the full range of characters for any language and is widely supported on
> all operating systems. Believe me, it is a mistake to use anything else!
> Particularly as it is no more difficult to use than any other character set.
> From: delta-l-bounces at science.uu.nl [mailto:delta-l-bounces at science.uu.nl] On Behalf Of Thomas Kluyver
> Sent: 16 January 2012 17:57
> To: Descriptive taxonomic databases
> Subject: Re: [delta-l] Delta files - character encoding
> Thanks, Eric,
> I'm currently using code page 1252, the standard Windows code page for
> English and Western (sometimes called "ANSI"), as the default for DELTA
> files, but allowing another encoding to be specified. From what I can find
> online, this was the default in Windows from at least 98, and probably
> earlier. Is codepage 850 (from DOS) likely to be more common in existing
> DELTA files?
> As and when I implement saving DELTA files, I'll make 1252 the default
> there, as I presume this is what newer tools will expect. Ideally we could
> just use utf-8, but I guess this would cause problems for existing programs.
> Thankfully Python provides all the necessary tools for working with any of
> these character sets. The difficulty is just in picking the right one.
> Best wishes,
> On 16 January 2012 15:27, Eric Gouda <E.J.Gouda at uu.nl> wrote:
> On most PCs code page 850 has been used, and there are not many
> differences from other frequently used code pages.
> As far as I know, Windows no longer works with the standard MS-DOS code
> pages, and I am using OEM-to-ANSI conversion to load Delta files
> into DXedit.
> If you need the conversion table, let me know.