An empirical study of the effects of NLP components on Geographic IR performance


by Stokes et al.

This article focuses on the impact of NLP components on

  • the task of toponym resolution (TR) and
  • geographic information retrieval (GIR)

The overall task consists of two steps: (a) named entity recognition and classification (NERC) and (b) toponym resolution (TR) proper, which grounds the recognized geographic entities to unique location entities. Stokes et al. perform their evaluation on the GeoCLEF collection (based on the Los Angeles Times and the Glasgow Herald), which is comparatively challenging because of its high number of less well-known local place names.


  • they manually identified metonymic references (e.g. in "Washington defends its invasion of Iraq", Washington denotes a political entity, not a location).
  • they note that NERC systems report F-scores of around 90% on newswire data, but that performance is much lower on non-standard sources
  • they emphasize the usefulness of predominant sense information (e.g. the location with the highest population is expected to be the one named most frequently) for the word sense disambiguation (WSD) step
  • they point to a comparison of toponym resolution heuristics carried out by Leidner 2006a
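
The predominant sense idea from the list above can be sketched in a few lines of Python. This is only a minimal illustration, not the method used by Stokes et al.: the `Candidate` class, the `GAZETTEER` dictionary and the `resolve_by_population` function are hypothetical names, and the population figures are rough placeholder values rather than authoritative data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """One possible referent of an ambiguous place name."""
    name: str
    country: str
    population: int


# Toy gazetteer (assumption): keys are lower-cased toponyms, values are
# the candidate locations. Population values are illustrative placeholders.
GAZETTEER = {
    "perth": [
        Candidate("Perth", "Australia", 2_000_000),
        Candidate("Perth", "United Kingdom", 47_000),
    ],
    "glasgow": [
        Candidate("Glasgow", "United Kingdom", 600_000),
        Candidate("Glasgow", "United States", 15_000),
    ],
}


def resolve_by_population(toponym: str, gazetteer=GAZETTEER) -> Optional[Candidate]:
    """Return the most populous candidate for a toponym, or None if unknown."""
    candidates = gazetteer.get(toponym.strip().lower(), [])
    if not candidates:
        return None
    # Predominant sense heuristic: assume the largest place is the one meant.
    return max(candidates, key=lambda c: c.population)


if __name__ == "__main__":
    for mention in ["Perth", "Glasgow", "Atlantis"]:
        print(mention, "->", resolve_by_population(mention))
```

With these placeholder entries, a mention of Perth always resolves to the Australian city, even in an article from the Glasgow Herald. That may hint at why a collection rich in local place names is harder for such a default.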


  • Leidner 2006a: Toponym resolution: A first large-scale comparative evaluation