
We explored the accuracy of assignments to genus, tribe and subfamily for 118 query species under five different assignment criteria, one distance-based and four tree-based. These Costa Rican species belong to Sphingidae, a family for which there is an almost complete DNA barcode reference library. An automated procedure was used to simulate reference libraries at different levels of species richness (10% to 100% of the available species), and to record each assignment (positive or ambiguous) and its accuracy (true or false, based on current classification) under the five criteria.
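The simulation loop described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the procedure actually used: the names `subsample_library`, `score`, and `run_simulation` are hypothetical, the library is assumed to be a simple mapping from species name to higher-taxon name, and the assignment criterion is passed in as a placeholder function `assign` that returns a taxon name or `None` for an ambiguous result. The rule that an ambiguous result counts as "true" only when the query's taxon is absent from the library is likewise an assumption.

```python
import random

def subsample_library(library, fraction, rng):
    """Return a random subset holding `fraction` of the reference species.

    `library` maps species name -> higher-taxon name (hypothetical structure).
    """
    species = sorted(library)
    keep = max(1, round(fraction * len(species)))
    return {name: library[name] for name in rng.sample(species, keep)}

def score(assigned, true_taxon, taxa_in_library):
    """Label one query outcome; `assigned` is a taxon name or None (ambiguous)."""
    if assigned is not None:
        return "true positive" if assigned == true_taxon else "false positive"
    # Assumption: ambiguity is correct only when the query's taxon
    # is not represented in the library at all.
    return "false ambiguous" if true_taxon in taxa_in_library else "true ambiguous"

def run_simulation(library, queries, assign, fractions, seed=0):
    """Tally outcomes per richness level for one assignment criterion.

    `queries` maps query name -> true taxon; `assign(query, subset)` is a
    placeholder for any of the distance-based or tree-based criteria.
    """
    rng = random.Random(seed)
    results = {}
    for fraction in fractions:
        subset = subsample_library(library, fraction, rng)
        taxa = set(subset.values())
        counts = {}
        for query, true_taxon in queries.items():
            label = score(assign(query, subset), true_taxon, taxa)
            counts[label] = counts.get(label, 0) + 1
        results[fraction] = counts
    return results
```

Repeating `run_simulation` over several seeds and averaging the counts per richness level would give the kind of average accuracy figures reported below; swapping the `assign` function allows the criteria to be compared on identical subsampled libraries.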
Using a liberal tree-based criterion, an average of 80% of queries were accurately assigned a genus name with libraries containing 20% of the available species, while 87% were accurately assigned a genus name with a library containing all available species. Across all libraries, the liberal tree-based criterion accurately assigned an average of 74% of queries to tribes and an average of 90% to subfamilies. The results suggest that the species richness of the reference library had only a weak effect on assignment accuracy, whereas the choice of assignment criterion had a strong effect. Additional parameters in the tree-based criteria, such as exclusivity of taxa, decreased the number of false positive assignments but increased the number of false ambiguous assignments. Our findings suggest that barcode reference libraries can be useful for higher-taxon assignments long before the libraries achieve complete species richness.