[delta-l] Using geographical restrictions in interactive keys
m.j.dallwitz at netspeed.com.au
Mon Jan 30 00:20:50 CET 2012
- From: Ken Walker
>> Of course, the estimate of the taxa that might occur in a given area
>> can still be wrong. If you 'build a key' based on that estimate, then
>> an attempt to identify a specimen from a taxon that was erroneously
>> excluded from that area will inevitably fail.
> Keys are aids not axioms - they can all fail for a variety of
> key-related, data-related or user-related reasons.
Obviously. But it's better to avoid building keys with modes of failure
that can be avoided by better design of the keys.
The rate of failure of keys is often high - see 'Effectiveness of
Identification Methods – References'
(http://delta-intkey.com/www/idtests.htm). In the studies by Stucky et al.
(1984) and Morse et al. (1996), about 25% of the identifications were
wrong. Interactive-key programs should provide features that can reduce
the error rate. New programs often introduce one or two novel features,
but are ineffective because they lack long-known features that improve
identification accuracy, such as the features introduced by Goodall (1968)
and Morse (1971) (see 'History of interactive keys' in 'Principles of
interactive keys', http://delta-intkey.com/www/interactivekeys.htm).
Pankhurst's Online (Version 2, 1975) contained all the features of
Goodall's and Morse's programs, and Intkey (version 2, 1992) contained all
the features of Online (except one, which was deliberately omitted because
I considered it detrimental).
>> This is most easily done if the distribution data is recorded as a
>> character (or characters).
> I disagree that the process of including individual specimens records
> in a key, which could number in the tens of thousands for some
> species, can be "easily done" by recording them as characters.
The context in my posting was: 'If you use distribution information within
a complete key (covering all areas), then it's possible to recover from
erroneous distribution information. This is most easily done if the
distribution data is recorded as a character (or characters).' I meant
that this method is easiest for the user of the key, as shown in the
example I gave.
Unfortunately, things that are easier or better for the user are usually
more difficult for the author or programmer.
In my previous two postings, my main message may have been lost in the
detailed examples I gave. I was trying to convey that there are three
basic methods for incorporating distribution (or similar) data in an
identification:
1. Construct a special key for a region. (This method was often used for
conventional keys.) If the specimen being identified doesn't belong to one
of the taxa in the key, the identification inevitably fails. This is the
method that Ken seems to be advocating: 'build a key to the species of
Polychaetes recorded from Lizard Island', 'build a key for the known and
presumed taxa that occur within the geospatial area'.
2. In a full key, temporarily restrict the taxa to those found in a
region. With suitable software (e.g. Intkey) this can be done, or changed,
or undone, at any stage of the identification. If the initial
identification is wrong because the distribution information is wrong, the
user has to guess, or work out by trial and error, that the fault lies in
the distribution information.
3. Treat the distribution data like a character, subject to the 'error
tolerance' mechanism. If the initial identification is wrong, the user can
simply proceed with the identification. After the correct answer is
reached, the program can work out where the fault was. (Of course, if the
'error tolerance' mechanism isn't available in the program, this method
has no advantage over method 2.)
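
The difference between methods 2 and 3 can be illustrated with a small sketch. This is not how Intkey is implemented; the taxa, characters, and function names (`mismatches`, `identify`) are hypothetical, and 'error tolerance' is modelled simply as the number of characters allowed to disagree with the specimen.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    name: str
    # character name -> set of states recorded for the taxon
    states: dict = field(default_factory=dict)


def mismatches(taxon, observed):
    """List the observed characters whose state is not recorded for the taxon."""
    return [c for c, s in observed.items()
            if c in taxon.states and s not in taxon.states[c]]


def identify(taxa, observed, tolerance=0):
    """Names of taxa differing from the specimen in at most `tolerance` characters."""
    return [t.name for t in taxa if len(mismatches(t, observed)) <= tolerance]


# Hypothetical data: B is (wrongly) recorded as occurring only in the south.
taxa = [
    Taxon("A", {"wings": {"present"}, "colour": {"red"},
                "legs": {"eight"}, "region": {"north"}}),
    Taxon("B", {"wings": {"present"}, "colour": {"blue"},
                "legs": {"six"}, "region": {"south"}}),
    Taxon("C", {"wings": {"absent"}, "colour": {"blue"},
                "legs": {"eight"}, "region": {"north"}}),
]

# A specimen of B collected in the north.
specimen = {"wings": "present", "colour": "blue", "legs": "six"}

# Method 2: restrict the taxa to the region first, then identify.
northern = [t for t in taxa if "north" in t.states["region"]]
method2 = identify(northern, specimen)

# Method 3: region is just another character, and one mismatch is tolerated.
observed = {**specimen, "region": "north"}
method3 = identify(taxa, observed, tolerance=1)

# Once B is reached, the character at fault can be reported.
fault = mismatches(taxa[1], observed)
```

With method 2 no taxon remains, and the user has to work out why. With method 3 the identification reaches B, and the mismatch list shows that the distribution character was the one at fault.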
With any of these methods, distribution information could be incorporated,
manually or automatically, by the author of the key. As Ken pointed out,
this is rather inflexible, because the amount of information that can
be incorporated is limited. Nevertheless, even a single distribution
character with a fairly small number of states (as in the example in my
second posting), can considerably shorten identifications.
Intkey allows the user of any key to manually apply method 2 via a list of
taxon names. This allows complete flexibility, as it's not necessary to
rely on built-in lists. The list could be obtained by any means, e.g. from
a publication or by a database query.
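
A minimal sketch of method 2 driven by an externally supplied list of names follows. The `Session` class and `parse_names` function are hypothetical and do not correspond to Intkey commands; the point is only that the restriction can be applied, changed, or undone without touching the key's data.

```python
def parse_names(text):
    """Read one taxon name per line, ignoring blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


class Session:
    """Tracks which taxa of a full key are currently included."""

    def __init__(self, all_taxa):
        self.all_taxa = set(all_taxa)
        self.included = set(all_taxa)

    def restrict(self, names):
        """Include only the listed taxa; return names not found in the key."""
        names = set(names)
        self.included = self.all_taxa & names
        return sorted(names - self.all_taxa)

    def clear_restriction(self):
        """Undo the restriction, so that all taxa are included again."""
        self.included = set(self.all_taxa)


session = Session(["Taxon A", "Taxon B", "Taxon C"])

# The list might come from a publication or a database query.
unknown = session.restrict(parse_names("Taxon A\nTaxon C\n\nTaxon X\n"))
restricted = sorted(session.included)

session.clear_restriction()
restored = sorted(session.included)
```

Returning the names that are not in the key is a design choice: a list obtained elsewhere may use different spellings or synonyms, and silently dropping them would hide exactly the kind of error under discussion.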
Any of the methods could be implemented by automatically linking to a
specimen database, so that a user could make an arbitrary query of the
database and have the resulting information used in the key.
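
As a sketch of such a link, the following queries a specimen database for the taxa recorded within a bounding box. The table layout (`specimens` with `taxon`, `latitude`, `longitude` columns), the coordinates, and the function name `taxa_in_box` are all assumptions made for illustration; a real specimen database would have its own schema and query interface.

```python
import sqlite3


def taxa_in_box(conn, south, north, west, east):
    """Distinct taxon names with at least one specimen inside the box."""
    rows = conn.execute(
        "SELECT DISTINCT taxon FROM specimens "
        "WHERE latitude BETWEEN ? AND ? AND longitude BETWEEN ? AND ? "
        "ORDER BY taxon",
        (south, north, west, east),
    )
    return [row[0] for row in rows]


# In-memory stand-in for a specimen database (hypothetical records).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE specimens (taxon TEXT, latitude REAL, longitude REAL)"
)
conn.executemany(
    "INSERT INTO specimens VALUES (?, ?, ?)",
    [
        ("Taxon A", -14.7, 145.4),
        ("Taxon A", -14.6, 145.5),
        ("Taxon B", -33.9, 151.2),
        ("Taxon C", -14.8, 145.3),
    ],
)

nearby = taxa_in_box(conn, -15.0, -14.0, 145.0, 146.0)
```

The resulting list of names could then be used as a taxon subset in any of the methods.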
Method 1 is not worth considering in this context, because of the
intrinsic limitations of the method.
Method 2 would probably be fairly easy to implement for key programs that
already support taxon subsets. As I said in my first posting: 'It would
probably be possible to modify Intkey to query the specimen databases
directly, without using an intermediate keyword-definition file.'
Method 3 would be more difficult to implement. It would be best to
generalize the method to allow /any/ subset of taxa (not just those
resulting from database queries) to take part in the 'error tolerance'
mechanism. That is, the subset would behave like a character:
#n. <membership of subset X>/
1. belongs to subset X/
2. does not belong to subset X/
The links to specimen databases would make use of this general mechanism.
It would be possible to make (or change) the database query at any stage
of the identification process.
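
The generalization could be sketched as follows: any subset of taxa is converted into a two-state character, which can then be scored like any other character. The function name `subset_character` and the state numbering (1 = belongs, 2 = does not belong, following the character definition above) are assumptions for illustration only.

```python
def subset_character(all_taxa, subset):
    """Map each taxon to state 1 (belongs to subset) or 2 (does not belong)."""
    subset = set(subset)
    return {name: (1 if name in subset else 2) for name in all_taxa}


all_taxa = ["Taxon A", "Taxon B", "Taxon C"]

# The subset may come from a database query, a keyword, or a typed list.
membership = subset_character(all_taxa, ["Taxon A", "Taxon C"])

# Changing the query at a later stage simply regenerates the character.
revised = subset_character(all_taxa, ["Taxon B"])
```

Because the subset is expressed as an ordinary character, a taxon scored as state 2 is not removed outright; it merely accumulates one difference, which the error-tolerance setting may or may not allow.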
For some purposes, it would be necessary to retain the ability to use
subsets in the current manner, i.e. to absolutely include some taxa, and
exclude the rest.
Contact information: http://delta-intkey.com/contact/dallwitz.htm
DELTA home page: http://delta-intkey.com