Evaluating Entity Linking with Wikipedia

1 minute read

Hachey, B. et al., 2013. Evaluating Entity Linking with Wikipedia. Artificial Intelligence, 194, pp. 130–150.

This article compares the performance of three named entity linking (NEL) systems against a Wikipedia-derived gold standard. The authors also introduce a three-stage framework for named entity linking that makes different approaches easier to compare.

Named Entity Linking Framework

  1. Extractors detect and prepare named entity mentions; this stage therefore often includes pre-processing and tokenization steps.
  2. Searchers determine candidate entities for each mention.
  3. Disambiguators select the best-fitting entity for every mention.
The following sections describe the NEL approaches compared by Hachey et al.

Bunescu and Pasca

The authors use Support Vector Machines (SVM) for disambiguation and rank candidate entities based on the following features:

  1. the cosine similarity between the query context and the text of the candidate entity's Wikipedia page
  2. a combination of candidate categories (according to Wikipedia's classification) and context words.

Cucerzan

  1. Cucerzan uses a naive in-document co-reference resolution that condenses multiple mentions into the longest possible canonical mention in the document (e.g. IBM Development Center, IBM, IBM Development Center Switzerland -> "IBM Development Center Switzerland").
  2. Uppercase mentions are treated as acronyms and mapped to canonical mentions whose word-initial letters match the acronym.
  3. Disambiguation is based on document-level vectors derived from all entity mentions, which are compared to candidate vectors built from an article's categories and contexts.
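The condensation step in point 1 can be approximated by mapping every mention to the longest mention in the document that contains it. This is a naive stand-in for Cucerzan-style in-document co-reference, not his actual algorithm:

```python
def condense(mentions: list[str]) -> dict[str, str]:
    """Map each mention to the longest co-occurring mention that contains it."""
    canonical = {}
    for m in mentions:
        containing = [c for c in mentions if m in c]
        canonical[m] = max(containing, key=len)
    return canonical

mentions = ["IBM Development Center", "IBM", "IBM Development Center Switzerland"]
out = condense(mentions)
```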

Varma et al.

  1. The authors first identify acronyms and search the document for their verbose forms based on the words' starting letters.
  2. The searcher distinguishes between acronym and non-acronym queries and adapts its strategy accordingly.
  3. The disambiguation is based on the textual similarity, measured with the cosine, between the query context and the text of the candidate entity's page.

Evaluation Measures

The following metrics have been used to assess the performance of the NEL methods:

  1. accuracy
  2. candidate count
  3. candidate precision
  4. candidate recall
  5. nil precision
  6. nil recall

Literature

  1. Bunescu, R. & Pasca, M., 2006. Using encyclopedic knowledge for named entity disambiguation. In Proceedings of the 11th Conference of the EACL. Trento, Italy, pp. 9–16.
  2. Cucerzan, S., 2007. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP). pp. 708–716.
  3. Varma, V. et al., 2009. IIIT Hyderabad at TAC 2009. In Proceedings of the Text Analysis Conference (TAC).