On How to Perform a Gold Standard Based Evaluation of Ontology Learning

by K. Dellschaft and St. Staab

This work provides an excellent overview of ontology evaluation measures, specifies criteria for good measures, and introduces a new measure which considers (i) lexical precision and recall, (ii) taxonomic precision and recall, (iii) the taxonomic F- and F'-measure and (iv) taxonomic overlap. An evaluation demonstrates how the new measure makes it possible to spot problems with ontology extension components more accurately.
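
As a reminder of how these quantities fit together, here is a sketch of the definitions as I understand them from the paper. The symbols are my own shorthand and should be checked against the original: C_C and C_R are the concept sets of the computed and the reference ontology, and TP and TR are the global taxonomic precision and recall.

```latex
\begin{align*}
LP &= \frac{|C_C \cap C_R|}{|C_C|}, &
LR &= \frac{|C_C \cap C_R|}{|C_R|}, \\
TF &= \frac{2 \cdot TP \cdot TR}{TP + TR}, &
TF' &= \frac{2 \cdot LR \cdot TF}{LR + TF}.
\end{align*}
```

Read this way, TF summarizes the quality of the taxonomy alone, while TF' additionally folds in lexical recall.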

Ontology Evaluation Measures:

  • lexical layer: binary versus scalar measures (e.g. based on edit distance) used to compute precision and recall
  • taxonomy:
    • Taxonomic Overlap (TO) compares two concepts based on the set of all their super- and sub-concepts.
    • Learning Accuracy (LA) -> augmented precision and recall
    • Balanced Distance Metric (BDM) -> augmented precision and recall
    • OntoRand Index: combines two alternative measures: (i) set of common ancestors, (ii) distance of the concepts within the tree (like LA and BDM)
  • non-taxonomic relations:
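
The lexical-layer and taxonomic measures listed above can be illustrated with a minimal Python sketch. It assumes that an ontology is given as a plain dictionary mapping each concept to a list of its direct super-concepts, and that the set of all super- and sub-concepts of a concept (plus the concept itself) serves as its taxonomic context. All function names and the toy data are hypothetical and are not taken from the paper.

```python
def lexical_precision_recall(learned, reference):
    """Binary lexical precision and recall over two sets of concept names."""
    common = learned & reference
    precision = len(common) / len(learned) if learned else 0.0
    recall = len(common) / len(reference) if reference else 0.0
    return precision, recall


def ancestors(concept, parents):
    """All direct and indirect super-concepts of `concept`."""
    result = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def cotopy(concept, parents):
    """The concept together with all of its super- and sub-concepts."""
    concepts = set(parents) | {p for ps in parents.values() for p in ps}
    subs = {c for c in concepts if concept in ancestors(c, parents)}
    return ancestors(concept, parents) | subs | {concept}


def local_taxonomic_precision(concept, learned, reference):
    """Share of the concept's learned context that the reference confirms."""
    learned_context = cotopy(concept, learned)
    reference_context = cotopy(concept, reference)
    return len(learned_context & reference_context) / len(learned_context)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Toy data (hypothetical): child -> list of direct super-concepts.
reference = {"root": [], "animal": ["root"], "vehicle": ["root"],
             "dog": ["animal"], "cat": ["animal"], "car": ["vehicle"]}
learned = {"root": [], "animal": ["root"], "dog": ["animal"],
           "cat": ["dog"], "bike": ["root"]}

lp, lr = lexical_precision_recall(set(learned), set(reference))
tp_cat = local_taxonomic_precision("cat", learned, reference)
print(lp, lr, tp_cat, f_measure(lp, lr))
```

In the toy data, "cat" is wrongly placed below "dog", so its learned context contains one concept ("dog") that the reference context does not, which lowers its local taxonomic precision. Note that this sketch keeps lexical mismatches such as "bike" inside the taxonomic contexts, so it does not cleanly separate the lexical from the taxonomic dimension.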

Criteria for Good Evaluation Measures:

  • support for multiple dimensions (different weights to different kinds of errors)
  • each measure should only be influenced by one dimension
  • the impact of an error should be proportional to the distance between the correct and the incorrect result
  • monotonicity (a decrease of the measure corresponds to a worse ontology)