An evaluation dataset for the toponym resolution task

Leidner, Jochen L. (2006). ''An evaluation dataset for the toponym resolution task'', Computers, Environment and Urban Systems, pages 400-417

This paper motivates the need for a geo-referenced corpus for evaluating geo-tagging and describes the process of designing such a corpus (including a reference gazetteer). According to Leidner, both corpus and gazetteer need to be specified because the gazetteer influences the outcome of any evaluation experiment.
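The dependence on the gazetteer can be illustrated with a minimal sketch. The two toy gazetteers, their approximate coordinates, and the helper names `ambiguity` and `resolvable` are assumptions made for illustration; they are not taken from the paper. The point is only that the same toponym can be trivial, ambiguous, or impossible to resolve depending on which gazetteer is used.

```python
# Hypothetical toy gazetteers: name -> list of candidate (lat, lon) referents.
# Coordinates are approximate and for illustration only.
SMALL_GAZETTEER = {"Paris": [(48.86, 2.35)]}                   # Paris, France only
LARGE_GAZETTEER = {"Paris": [(48.86, 2.35), (33.66, -95.56)]}  # plus Paris, Texas


def ambiguity(name, gazetteer):
    """Number of candidate referents the gazetteer offers for a name."""
    return len(gazetteer.get(name, []))


def resolvable(name, gold, gazetteer):
    """True if the gold referent is among the gazetteer's candidates at all."""
    return gold in gazetteer.get(name, [])


# Assume the annotated (gold) referent of "Paris" in some text is Paris, Texas.
gold = (33.66, -95.56)

small_stats = (ambiguity("Paris", SMALL_GAZETTEER),
               resolvable("Paris", gold, SMALL_GAZETTEER))
large_stats = (ambiguity("Paris", LARGE_GAZETTEER),
               resolvable("Paris", gold, LARGE_GAZETTEER))

print(small_stats)  # (1, False)
print(large_stats)  # (2, True)
```

With the small gazetteer the toponym looks unambiguous, yet the correct referent cannot be chosen; with the large one the correct referent is available, but the system has to disambiguate. Scores obtained with different gazetteers are therefore not directly comparable.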

Criteria for the gazetteer selection

The author states six criteria: (i) scope, (ii) coverage, (iii) correctness, (iv) granularity, (v) balance (uniform degree of detail, correctness, etc.), and (vi) the richness of annotation.

Geo-Tagging

Leidner defines the following tasks:

  • geo-coding: mapping from implicitly geo-referenced data to an explicitly geo-referenced representation (grounding)
  • named-entity processing: consists of two steps: (a) flat text span recognition (where does a name begin and end) and (b) atomic classification/grounding (what type of name is it)
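The two tasks above can be sketched as a small pipeline. This is a simplified illustration, not Leidner's method: the toy gazetteer, its approximate coordinates, and the function names `recognize_spans` and `geo_code` are hypothetical. Span recognition is done by plain gazetteer lookup, every match is treated as a toponym, and each name has exactly one referent, so no disambiguation takes place.

```python
import re

# Hypothetical toy gazetteer: name -> (latitude, longitude), approximate values.
TOY_GAZETTEER = {
    "Edinburgh": (55.95, -3.19),
    "Paris": (48.86, 2.35),
}


def recognize_spans(text, gazetteer):
    """Span recognition: return (start, end, name) for each gazetteer name in text."""
    spans = []
    for name in gazetteer:
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), name))
    return sorted(spans)


def geo_code(text, gazetteer):
    """Geo-coding: attach explicit coordinates to each recognized name."""
    return [
        (start, end, name, gazetteer[name])
        for start, end, name in recognize_spans(text, gazetteer)
    ]


result = geo_code("She flew from Edinburgh to Paris.", TOY_GAZETTEER)
for start, end, name, coords in result:
    print(start, end, name, coords)
```

The character offsets make the implicit reference in the text explicit and machine-comparable, which is what an evaluation against a gold-standard corpus needs.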