Domain relevance of terminology

Based on Navigli, R. and Velardi, P. (2004). "Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites", Computational Linguistics, 30(2), pages 151-179.

The authors define two measures to judge the relevance of a term $$t$$ to a domain $$D_k$$:
  • domain relevance ($$DR$$) - the conditional probability $$P(t|D_k)$$ that the term appears in domain $$D_k$$, divided by the maximum of this probability over all domains $$D_j$$: $$DR_{t,k} = P(t|D_k) / \max_{j} P(t|D_j)$$
  • domain consensus ($$DC$$) - which measures how evenly the term is used across the documents $$d$$ of a domain $$D_k$$: $$DC_{t,k} = \sum_{d \in D_k} P_t(d) \log(1/P_t(d))$$
They also apply a word sense disambiguation (WSD) algorithm called structural semantic interconnection (SSI), based on WordNet's concept sense definitions.
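
The two measures defined above, $$DR$$ and $$DC$$, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that $$P(t|D_k)$$ is estimated as the term's share of all tokens in the domain, that $$P_t(d)$$ is the term's frequency in document $$d$$ normalised over all documents of the domain, and that the logarithm is the natural one. The function names and the toy corpora are made up for this example.

```python
import math
from collections import Counter


def term_probability(term, domain_docs):
    """Estimate P(t|D_k) as the term's share of all tokens in the domain (assumption)."""
    counts = Counter(token for doc in domain_docs for token in doc)
    total = sum(counts.values())
    return counts[term] / total if total else 0.0


def domain_relevance(term, domain, corpora):
    """DR_{t,k} = P(t|D_k) / max_j P(t|D_j), with corpora mapping domain name -> documents."""
    probs = {name: term_probability(term, docs) for name, docs in corpora.items()}
    max_prob = max(probs.values())
    return probs[domain] / max_prob if max_prob > 0 else 0.0


def domain_consensus(term, domain_docs):
    """DC_{t,k} = sum over d in D_k of P_t(d) * log(1 / P_t(d)).

    P_t(d) is taken to be the term's frequency in d divided by its
    frequency in the whole domain (assumption); natural log is used.
    """
    freqs = [doc.count(term) for doc in domain_docs]
    total = sum(freqs)
    if total == 0:
        return 0.0
    # Documents without the term contribute nothing to the sum.
    return sum((f / total) * math.log(total / f) for f in freqs if f > 0)


# Invented toy data: each document is a list of already tokenised terms.
corpora = {
    "tourism": [["hotel", "room", "hotel"], ["hotel", "beach"]],
    "finance": [["bank", "loan", "hotel"], ["bank", "rate"]],
}

dr = domain_relevance("hotel", "tourism", corpora)
dc = domain_consensus("hotel", corpora["tourism"])
```

In this toy setup "hotel" is most probable in the tourism corpus, so its $$DR$$ there is the maximum value of 1, while a term that occurs in only one document of a domain gets a $$DC$$ of 0 because its distribution over documents has no spread.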