Evaluating Entity Linking with Wikipedia

1 minute read

Hachey, B. et al., 2013. Evaluating Entity Linking with Wikipedia. Artificial Intelligence, 194, pp. 130–150.

This article compares the performance of three named entity linking (NEL) systems against a Wikipedia-derived gold standard. The authors also introduce a three-stage framework for named entity linking that makes different approaches easier to compare.

Named Entity Linking Framework

  1. Extractors detect and prepare named entity mentions; this stage therefore often includes pre-processing and tokenization steps.
  2. Searchers determine candidate entities for each mention.
  3. Disambiguators select the best-fitting entity for every mention.
The following sections describe the NEL approaches compared by Hachey et al.

Bunescu and Pasca

The authors use Support Vector Machines (SVM) for disambiguation and rank candidate entities based on the following features:

  1. the cosine similarity between the query context and the text of the candidate entity's Wikipedia page
  2. a combination of candidate categories (according to Wikipedia's classification) and context words.

Cucerzan

  1. Cucerzan uses a naive in-document co-reference resolution that condenses multiple mentions into the longest possible canonical mention in the document (e.g. IBM Development Center, IBM, IBM Development Center Switzerland -> "IBM Development Center Switzerland").
  2. Uppercase mentions are treated as acronyms and mapped to canonical mentions whose word-initial letters match the acronym.
  3. Disambiguation is based on document-level vectors derived from all entity mentions, which are compared to candidate vectors built from an article's categories and contexts.
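The condensation step in point 1 can be approximated by mapping every mention to the longest mention in the document that contains it. This is a naive stand-in for Cucerzan-style in-document co-reference, not his actual algorithm:

```python
def condense(mentions: list[str]) -> dict[str, str]:
    """Map each mention to the longest co-occurring mention that contains it."""
    canonical = {}
    for m in mentions:
        containing = [c for c in mentions if m in c]
        canonical[m] = max(containing, key=len)
    return canonical

mentions = ["IBM Development Center", "IBM", "IBM Development Center Switzerland"]
out = condense(mentions)
```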

Varma et al.

  1. The authors first identify acronyms and search the document for their verbose forms based on the words' starting letters.
  2. The searcher distinguishes between acronym and non-acronym queries and adapts its strategy accordingly.
  3. The disambiguation is based on the textual similarity, measured with the cosine, between the query context and the text of the candidate entity's page.

Evaluation Measures

The following metrics have been used to assess the performance of the NEL methods:

  1. accuracy
  2. candidate count
  3. candidate precision
  4. candidate recall
  5. nil precision
  6. nil recall

Literature

  1. Bunescu, R. & Pasca, M., 2006. Using encyclopedic knowledge for named entity disambiguation. In Proceedings of the 11th Conference of the EACL. Trento, Italy, pp. 9–16.
  2. Cucerzan, S., 2007. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP). pp. 708–716.
  3. Varma, V. et al., 2009. IIIT Hyderabad at TAC 2009. In Proceedings of the Text Analysis Conference (TAC).