[delta-l] Using geographical restrictions in interactive keys

Mike Dallwitz m.j.dallwitz at netspeed.com.au
Mon Jan 30 00:20:50 CET 2012

- From: Ken Walker

>> Of course, the estimate of the taxa that might occur in a given area
>> can still be wrong. If you 'build a key' based on that estimate, then
>> an attempt to identify a specimen from a taxon that was erroneously
>> excluded from that area will inevitably fail.
> Keys are aids not axioms - they can all fail for a variety of
> key-related, data-related or user-related reasons.

Obviously. But it's better not to build keys with modes of failure that 
better design could eliminate.

The rate of failure of keys is often high - see 'Effectiveness of 
Identification Methods – References' 
(http://delta-intkey.com/www/idtests.htm). In the studies by Stucky et al. 
(1984) and Morse et al. (1996), about 25% of the identifications were 
wrong. Interactive-key programs should provide features that can reduce 
the error rate. New programs often introduce one or two novel features, 
but are ineffective because they lack long-known features that improve 
identification accuracy, such as the features introduced by Goodall (1968) 
and Morse (1971) (see 'History of interactive keys' in 'Principles of 
interactive keys', http://delta-intkey.com/www/interactivekeys.htm).

Pankhurst's Online (version 2, 1975) contained all the features of 
Goodall's and Morse's programs, and Intkey (version 2, 1992) contained all 
the features of Online (except one, which was deliberately omitted because 
I considered it detrimental).

>> This is most easily done if the distribution data is recorded as a
>> character (or characters).
> I disagree that the process of including individual specimens records
> in a key, which could number in the tens of thousands for some
> species, can be "easily done" by recording them as characters.

The context in my posting was: 'If you use distribution information within 
a complete key (covering all areas), then it's possible to recover from 
erroneous distribution information. This is most easily done if the 
distribution data is recorded as a character (or characters).' I meant 
that this method is easiest for the user of the key, as shown in the 
example I gave.

Unfortunately, things that are easier or better for the user are usually 
more difficult for the author or programmer.

In my previous two postings, my main message may have been lost in the 
detailed examples I gave. I was trying to convey that there are three 
basic methods for incorporating distribution (or similar) data in an 
interactive key.

1. Construct a special key for a region. (This method was often used for 
conventional keys.) If the specimen being identified doesn't belong to one 
of the taxa in the key, the identification inevitably fails. This is the 
method that Ken seems to be advocating: 'build a key to the species of 
Polychaetes recorded from Lizard Island', 'build a key for the known and 
presumed taxa that occur within the geospatial area'.

2. In a full key, temporarily restrict the taxa to those found in a 
region. With suitable software (e.g. Intkey) this can be done, or changed, 
or undone, at any stage of the identification. If the initial 
identification is wrong because the distribution information is wrong, the 
user has to guess, or work out by trial and error, that the fault lies in 
that information.

3. Treat the distribution data like a character, subject to the 'error 
tolerance' mechanism. If the initial identification is wrong, the user can 
simply proceed with the identification. After the correct answer is 
reached, the program can work out where the fault was. (Of course, if the 
'error tolerance' mechanism isn't available in the program, this method 
has no advantage over method 2.)
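
The difference between methods 2 and 3 can be illustrated with a small 
sketch. This is not Intkey code: the taxa, the characters, and the function 
names method2 and method3 are all hypothetical, and 'error tolerance' is 
modelled in the simplest possible way, as a limit on the number of 
mismatched characters a taxon may have and still remain. The specimen is 
of 'Species C', collected in the north, but the key wrongly records that 
species from the south only.

```python
# Hypothetical toy data: taxon -> {character: set of recorded states}.
TAXA = {
    "Species A": {"leaf": {"entire"}, "flower": {"red"},
                  "fruit": {"capsule"}, "region": {"north"}},
    "Species B": {"leaf": {"lobed"}, "flower": {"red"},
                  "fruit": {"berry"}, "region": {"north", "south"}},
    # The northern record for Species C is missing (erroneous data).
    "Species C": {"leaf": {"entire"}, "flower": {"white"},
                  "fruit": {"berry"}, "region": {"south"}},
}


def mismatched(taxon_data, observations):
    """List the observed characters whose state is not recorded for the taxon."""
    return [char for char, state in observations.items()
            if state not in taxon_data.get(char, set())]


def method2(taxa, region, observations):
    """Hard restriction: discard taxa not recorded from the region,
    then keep only the taxa that match every observation."""
    subset = {name: data for name, data in taxa.items()
              if region in data["region"]}
    return sorted(name for name, data in subset.items()
                  if not mismatched(data, observations))


def method3(taxa, region, observations, tolerance=1):
    """Region treated as one more character: a taxon remains while its
    number of mismatches does not exceed the tolerance. The mismatched
    characters are returned so that the fault can be located afterwards."""
    obs = dict(observations, region=region)
    remaining = {}
    for name, data in taxa.items():
        bad = mismatched(data, obs)
        if len(bad) <= tolerance:
            remaining[name] = bad
    return remaining


specimen = {"leaf": "entire", "flower": "white", "fruit": "berry"}
hard = method2(TAXA, "north", specimen)   # no taxon remains
soft = method3(TAXA, "north", specimen)   # Species C remains, fault: region
```

With the hard restriction no taxon remains, and nothing indicates that the 
distribution data is at fault. With the region treated as a character, 
'Species C' remains with a single mismatch, on 'region'.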

With any of these methods, distribution information could be incorporated, 
manually or automatically, by the author of the key. As Ken pointed out, 
this is rather inflexible, because the amount of information that can be 
incorporated is limited. Nevertheless, even a single distribution 
character with a fairly small number of states (as in the example in my 
second posting) can considerably shorten identifications.

Intkey allows the user of any key to manually apply method 2 via a list of 
taxon names. This allows complete flexibility, as it's not necessary to 
rely on built-in lists. The list could be obtained by any means, e.g. from 
a publication or by a database query.
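
As a rough sketch of restricting a key by an externally obtained list, 
the following reads one taxon name per line and separates the names found 
in the key from those that are not. The list format (one name per line, 
'#' for comments) and the function names are assumptions made for this 
example, not an Intkey file format.

```python
def parse_name_list(text):
    """Return the taxon names in the text, one per line.
    Blank lines and anything after '#' are ignored."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.append(line)
    return names


def restrict(key_taxa, names):
    """Return (included, unknown): the key's taxa that are in the list,
    in key order, and the listed names that the key does not contain."""
    wanted = set(names)
    included = [taxon for taxon in key_taxa if taxon in wanted]
    unknown = sorted(wanted - set(key_taxa))
    return included, unknown


key_taxa = ["Species A", "Species B", "Species C"]
checklist = """\
Species C
# taken from a regional checklist
Species A
Species Z
"""
included, unknown = restrict(key_taxa, parse_name_list(checklist))
```

Reporting the unmatched names separately matters in practice, because a 
list from a publication or database may use synonyms or spellings that 
differ from those in the key.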

Any of the methods could be implemented by automatically linking to a 
specimen database, so that a user could make an arbitrary query of the 
database and have the resulting information used in the key.
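
A minimal sketch of such a link, using an in-memory SQLite database in 
place of a real specimen database. The table 'specimens', its columns, and 
the function taxa_in_bounding_box are hypothetical; a real specimen 
database would have its own schema and query interface. The query is a 
simple latitude/longitude box that does not handle boxes crossing the 
180-degree meridian.

```python
import sqlite3


def taxa_in_bounding_box(conn, south, north, west, east):
    """Return the distinct taxon names with at least one specimen record
    inside the given latitude/longitude box, in alphabetical order."""
    rows = conn.execute(
        "SELECT DISTINCT taxon FROM specimens "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY taxon",
        (south, north, west, east),
    )
    return [row[0] for row in rows]


# Hypothetical specimen records: (taxon, latitude, longitude).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimens (taxon TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO specimens VALUES (?, ?, ?)",
    [
        ("Species A", -14.7, 145.4),
        ("Species A", -14.6, 145.5),
        ("Species B", -33.9, 151.2),
        ("Species C", -14.7, 145.5),
    ],
)

names = taxa_in_bounding_box(conn, -15.0, -14.0, 145.0, 146.0)
```

The resulting list of names could then be used to restrict the taxa 
(method 2) or to define the states of a distribution character (method 3).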

Method 1 is not worth considering in this context, because of the 
intrinsic limitations of the method.

Method 2 would probably be fairly easy to implement for key programs that 
already support taxon subsets. As I said in my first posting: 'It would 
probably be possible to modify Intkey to query the specimen databases 
directly, without using an intermediate keyword-definition file.'

Method 3 would be more difficult to implement. It would be best to 
generalize the method to allow /any/ subset of taxa (not just those 
resulting from database queries) to take part in the 'error tolerance' 
mechanism. That is, the subset would behave like a character:

#n. <membership of subset X>/
     1. belongs to subset X/
     2. does not belong to subset X/

The links to specimen databases would make use of this general mechanism. 
It would be possible to make (or change) the database query at any stage 
of the identification process.
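
The generalization can be sketched as follows. The data, the function 
names, and the simple mismatch-count model of 'error tolerance' are 
assumptions made for illustration; they do not describe how Intkey is 
implemented. An arbitrary subset of taxa is converted into a two-state 
character, which then takes part in matching like any other character.

```python
def add_subset_character(taxa, char_name, subset):
    """Return a copy of the taxon data with a two-state character added,
    recording whether each taxon belongs to the subset."""
    result = {}
    for name, chars in taxa.items():
        state = "belongs" if name in subset else "does not belong"
        result[name] = {**chars, char_name: {state}}
    return result


def remaining(taxa, observations, tolerance):
    """Return the taxa whose number of mismatched characters does not
    exceed the tolerance."""
    kept = []
    for name, chars in taxa.items():
        errors = sum(1 for char, state in observations.items()
                     if state not in chars.get(char, set()))
        if errors <= tolerance:
            kept.append(name)
    return sorted(kept)


# Hypothetical key data and a subset that wrongly omits Species C.
TAXA = {
    "Species A": {"flower": {"red"}, "fruit": {"capsule"}},
    "Species B": {"flower": {"pink"}, "fruit": {"capsule"}},
    "Species C": {"flower": {"white"}, "fruit": {"berry"}},
}
subset_x = {"Species A", "Species B"}
with_x = add_subset_character(TAXA, "subset X", subset_x)

observed = {"subset X": "belongs", "flower": "white", "fruit": "berry"}
strict = remaining(with_x, observed, tolerance=0)
tolerant = remaining(with_x, observed, tolerance=1)
```

With a tolerance of 0 no taxon remains. With a tolerance of 1, 'Species C' 
is reached despite its erroneous omission from the subset.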

For some purposes, it would be necessary to retain the ability to use 
subsets in the current manner, i.e. to absolutely include some taxa, and 
exclude the rest.

Mike Dallwitz
Contact information: http://delta-intkey.com/contact/dallwitz.htm
DELTA home page: http://delta-intkey.com
