An empirical study of the effects of NLP components on Geographic IR performance


This article focuses on the impact of NLP components on the tasks of toponym resolution (TR) and geographic information retrieval (GIR).

TR consists of two steps: (a) named entity recognition and classification (NERC) and (b) the resolution step proper, which grounds each recognized geographic entry to a unique location entity. Stokes et al. perform their evaluation on the GeoCLEF collection (based on the Los Angeles Times and the Glasgow Herald), which tends to be more challenging to handle because of its high number of less well-known local place names.
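The two-step structure can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy gazetteer, its identifiers, and the function names `recognize_toponyms` and `candidate_locations` are hypothetical and do not come from Stokes et al. A real system would use a trained NERC model rather than dictionary lookup.

```python
import re

# Hypothetical toy gazetteer: surface form -> candidate location entries.
# The identifiers are made up for this example.
GAZETTEER = {
    "Glasgow": ["glasgow-scotland-uk", "glasgow-kentucky-us", "glasgow-montana-us"],
    "Los Angeles": ["los-angeles-california-us"],
}


def recognize_toponyms(text):
    """Step (a), a stand-in for NERC: find gazetteer names in the text."""
    found = []
    for name in GAZETTEER:
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), name))
    # Return the names in the order they appear in the text.
    return [name for _, name in sorted(found)]


def candidate_locations(toponym):
    """Step (b), first half: list the entries a toponym could refer to."""
    return GAZETTEER.get(toponym, [])


text = "Flights from Glasgow to Los Angeles were delayed."
mentions = recognize_toponyms(text)
ambiguity = {name: len(candidate_locations(name)) for name in mentions}
```

In this toy setup "Glasgow" has several candidates while "Los Angeles" has one, which is the kind of ambiguity the grounding step has to settle by picking a single entry.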

Evaluation

They manually identified metonymic references (e.g. "Washington defends its invasion of Iraq", where Washington denotes a political entity rather than a location).
They note that NERC systems report F-scores of around 90% on newswire data, but score much lower on non-standard sources.
They emphasize the usefulness of predominant sense information (e.g. the location with the highest population is expected to be the one named most frequently) for the word sense disambiguation (WSD) step.
They point to a comparison of toponym resolution heuristics by Leidner 2006a.
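The predominant sense idea mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate table, the identifiers, and the function name `resolve_by_population` are hypothetical, and the population values are rough round placeholders rather than authoritative figures. It is not the procedure used by Stokes et al.

```python
# Hypothetical candidate entries per toponym.
# Population values are illustrative placeholders, not authoritative data.
CANDIDATES = {
    "Perth": [
        {"id": "perth-scotland-uk", "population": 50_000},
        {"id": "perth-western-australia-au", "population": 2_000_000},
    ],
}


def resolve_by_population(toponym, candidates=CANDIDATES):
    """Pick the candidate with the highest population as the predominant sense."""
    options = candidates.get(toponym, [])
    if not options:
        return None  # unknown toponym: nothing to ground it to
    return max(options, key=lambda entry: entry["population"])["id"]


resolved = resolve_by_population("Perth")
```

Such a default only makes sense for mentions that actually denote a location; metonymic uses like the Washington example would have to be filtered out beforehand.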

Bibliography

Leidner 2006a: Toponym resolution: A first large-scale comparative evaluation


Albert Weichselbraun
