A conceptual density-based approach for disambiguation of toponyms

by Buscaldi and Rosso

This article explores the use of word-sense-disambiguation (WSD) techniques for toponym resolution. The authors explain two algorithms which are used for WSD:

  • Lesk's semantic similarity, which computes the similarity between a sense of a word and its context by calculating their overlap. Patwardhan et al. (2003) have shown that this measure is among the best for computing the semantic relatedness of two concepts.
  • Conceptual density (CD) based WSD, which was developed by the authors. It computes the CD measure for every sense and selects the sense with the highest CD value: $$CD(m, f, n) = m^{\alpha} \left(\frac{m}{n}\right)^{\log f}$$
    • m ... number of relevant synsets in the (WordNet) subhierarchy of the given sense
    • n ... number of total synsets in the subhierarchy of the given sense
    • f ... frequency rank of the sense (1, 2, ...)
  • Using CD-WSD with a disambiguation window of only two nouns yielded a precision of over 81.5% on nouns in the SemCor corpus.
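
The two scoring ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation. The function names, the whitespace tokenization in the Lesk-style overlap, the default `alpha = 0.1`, the use of the natural logarithm, and the toy counts for the two "Cambridge" senses are all invented for this example. In practice, m and n would be counted from the WordNet subhierarchy of each sense.

```python
import math


def lesk_overlap(gloss: str, context: str) -> int:
    """Simplified Lesk-style score: number of distinct lowercase tokens
    shared by a sense gloss and the context (whitespace tokenization
    is an assumption of this sketch)."""
    return len(set(gloss.lower().split()) & set(context.lower().split()))


def conceptual_density(m: int, f: int, n: int, alpha: float = 0.1) -> float:
    """CD(m, f, n) = m^alpha * (m / n)^(log f).

    m: relevant synsets in the sense's subhierarchy
    f: frequency rank of the sense (1 = most frequent)
    n: total synsets in the sense's subhierarchy
    alpha: constant exponent; 0.1 is an assumed default, and the
           natural logarithm is an assumed choice of base.
    """
    if m == 0:
        return 0.0  # no relevant synsets: no evidence for this sense
    return (m ** alpha) * ((m / n) ** math.log(f))


def best_sense(candidates: dict) -> str:
    """Return the sense whose (m, f, n) triple gives the highest CD."""
    return max(candidates, key=lambda s: conceptual_density(*candidates[s]))


# Hypothetical counts for two senses of the toponym "Cambridge".
senses = {
    "Cambridge (England)": (3, 1, 10),
    "Cambridge (Massachusetts)": (1, 2, 10),
}
chosen = best_sense(senses)
```

Note that for the most frequent sense (f = 1) the exponent log f is zero, so the score reduces to m^α. For lower-ranked senses the factor (m/n)^(log f) shrinks the score unless most synsets in the subhierarchy are relevant.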