On How to Perform a Gold Standard Based Evaluation of Ontology Learning
by K. Dellschaft and St. Staab
This work provides an excellent overview of ontology evaluation measures, specifies criteria for good measures, and introduces a new measure that considers (i) lexical precision and recall, (ii) taxonomic precision and recall, (iii) the taxonomic F- and F'-measures, and (iv) taxonomic overlap. An evaluation demonstrates how the new measure makes it possible to spot problems with ontology extension components more accurately.
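The F-measures named above are harmonic means of other scores. The following is a minimal sketch, not the authors' implementation. It assumes that the taxonomic F-measure combines taxonomic precision and recall, and that the F'-measure combines lexical recall with the taxonomic F-measure. All function names are hypothetical.

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two scores in [0, 1]; returns 0.0 when both are 0."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def taxonomic_f(taxonomic_precision: float, taxonomic_recall: float) -> float:
    """Taxonomic F-measure: balances taxonomic precision and recall."""
    return harmonic_mean(taxonomic_precision, taxonomic_recall)


def taxonomic_f_prime(lexical_recall: float, tf: float) -> float:
    """F'-measure (assumed definition): folds lexical recall into the
    taxonomic F-measure, so missing concepts also lower the score."""
    return harmonic_mean(lexical_recall, tf)
```

Under these assumptions, a learned ontology with a perfect taxonomy over only half of the reference concepts would still score well on the taxonomic F-measure but noticeably lower on F'.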
Ontology Evaluation Measures:
- lexical layer: binary versus scalar measures (e.g. based on the edit distance) used to compute precision and recall
- taxonomy:
  - Taxonomic Overlap (TO): compares two concepts based on the set of all their super- and sub-concepts.
  - Learning Accuracy (LA) -> augmented precision and recall
  - Balanced Distance Metric (BDM) -> augmented precision and recall
  - OntoRand Index: combines two alternative measures: (i) the set of common ancestors, (ii) the distance between the concepts within the tree (like LA and BDM)
- non-taxonomic relations:
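
The binary lexical measures and the taxonomic overlap listed above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: a taxonomy is represented as a hypothetical dict mapping each concept to its direct super-concepts (roots map to an empty set), and the overlap of two concepts is taken as the intersection over the union of their sets of super- and sub-concepts, each set including the concept itself.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: concept -> set of direct super-concepts.
Taxonomy = Dict[str, Set[str]]


def lexical_precision_recall(learned: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Binary lexical precision and recall over exact concept-name matches."""
    common = learned & reference
    precision = len(common) / len(learned) if learned else 0.0
    recall = len(common) / len(reference) if reference else 0.0
    return precision, recall


def ancestors(concept: str, taxonomy: Taxonomy) -> Set[str]:
    """All direct and indirect super-concepts of a concept."""
    seen: Set[str] = set()
    stack = list(taxonomy.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(taxonomy.get(parent, ()))
    return seen


def cotopy(concept: str, taxonomy: Taxonomy) -> Set[str]:
    """The concept together with all of its super- and sub-concepts."""
    supers = ancestors(concept, taxonomy)
    subs = {c for c in taxonomy if concept in ancestors(c, taxonomy)}
    return supers | subs | {concept}


def taxonomic_overlap(concept: str, learned: Taxonomy, reference: Taxonomy) -> float:
    """Overlap of a concept's position in two taxonomies (assumed: intersection over union)."""
    a, b = cotopy(concept, learned), cotopy(concept, reference)
    return len(a & b) / len(a | b)


# Toy data (invented for illustration): the learned taxonomy misplaces "car".
reference = {"thing": set(), "vehicle": {"thing"}, "car": {"vehicle"}, "bike": {"vehicle"}}
learned = {"thing": set(), "vehicle": {"thing"}, "car": {"thing"}, "bike": {"vehicle"}}

print(lexical_precision_recall(set(learned), set(reference)))  # (1.0, 1.0)
print(taxonomic_overlap("car", learned, reference))            # 2 shared of 3 concepts
print(taxonomic_overlap("vehicle", learned, reference))        # 0.75
```

In this toy case the lexical layer is perfect while the taxonomic overlap drops for "car" and "vehicle", which shows why the two layers are scored separately.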
Criteria for Good Measures:
- support for multiple dimensions (different weights for different kinds of errors)
- each measure should be influenced by only one dimension
- the impact of an error should be proportional to the distance between the correct and the incorrect result
- monotonicity (a decrease of the measure corresponds to a worse ontology)