A conceptual density-based approach for disambiguation of toponyms


by Buscaldi and Rosso

This article explores the use of word-sense disambiguation (WSD) techniques for toponym resolution. The authors describe two algorithms used for WSD:

Lesk's semantic similarity, which computes the similarity between a sense of the word and its context by calculating their overlap. Patwardhan et al. (2003) have shown that this measure is among the best for computing the semantic relatedness of two concepts.
Conceptual density (CD) based WSD, which has been developed by the authors. It computes the CD measure for every sense and selects the sense with the highest CD value: $$CD(m, f, n) = m^{\alpha} \left(\frac{m}{n}\right)^{\log f}$$

m ... number of relevant synsets in the (WordNet) subhierarchy of the given sense
n ... number of total synsets in the subhierarchy of the given sense
f ... frequency rank of the sense (1, 2, ...)
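
The formula can be tried out with a minimal sketch like the one below. It is not the authors' implementation: the function names, the default value of `alpha`, the use of the natural logarithm, and the candidate senses with their counts are all assumptions made purely for illustration.

```python
import math


def conceptual_density(m: int, n: int, f: int, alpha: float = 0.1) -> float:
    """Evaluate CD(m, f, n) = m^alpha * (m / n)^(log f).

    m: relevant synsets in the sense's subhierarchy
    n: total synsets in the sense's subhierarchy
    f: frequency rank of the sense (1 = most frequent)
    alpha: constant exponent; 0.1 is an assumed placeholder value
    """
    if m == 0 or n == 0:
        # No relevant synsets (or an empty subhierarchy) gives no evidence.
        return 0.0
    # Natural logarithm is an assumption; the summary does not state the base.
    return (m ** alpha) * ((m / n) ** math.log(f))


def pick_sense(candidates):
    """Return the label of the candidate with the highest CD value.

    candidates: iterable of (label, m, n, f) tuples.
    """
    best = max(candidates, key=lambda c: conceptual_density(c[1], c[2], c[3]))
    return best[0]


# Hypothetical candidate senses for an ambiguous toponym (made-up counts).
candidates = [
    ("cambridge_uk", 3, 10, 1),
    ("cambridge_ma", 1, 8, 2),
]
best_label = pick_sense(candidates)
```

For the most frequent sense (f = 1) the exponent log f is zero, so the value reduces to m^α. For lower-ranked senses the factor (m/n)^(log f) is below one whenever m < n, which penalizes rarer senses unless their subhierarchy is densely covered by relevant synsets.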

Using CD-WSD with a disambiguation window of only two nouns yielded a precision of over 81.5% on nouns in the SemCor corpus.


Albert Weichselbraun
