Automatic Semantic Web Annotation of Named Entities


Charton, E., Gagnon, M., & Ozell, B. (2011). Automatic semantic web annotation of named entities. In Proceedings of the 24th Canadian conference on Advances in artificial intelligence (pp. 74–85). Berlin, Heidelberg: Springer-Verlag.

Summary

This article presents a method for identifying named entities in text and linking them to a semantic knowledge base. In contrast to named entity recognition, which focuses on identifying the entity type (e.g. organization, person, location), named entity linking determines which specific entity (i.e. individual) is mentioned in the text.

Method

The authors link entities to Wikipedia using the following description for each Wikipedia entity:

  1. surface forms (i.e. names that refer to this entity)
  2. entity description (i.e. the entity's context) - the tf/idf values are computed for each word occurring in these descriptions
  3. URI

The algorithm identifies candidate entities based on their surface forms. It then obtains context information by applying a sliding window to the text surrounding the entity mention. Finally, it ranks the candidates by computing the cosine similarity between each candidate's tf/idf values and the context terms from the sliding window.
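The three steps (candidate lookup, context window, cosine ranking) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy knowledge base, its weights, the function names, the window size of 5, and the use of raw term counts for the context vector are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base. Each entry mirrors the three-part
# description: surface forms, tf/idf-weighted description terms, and a URI.
# The weights are made up for illustration.
KB = [
    {
        "uri": "https://en.wikipedia.org/wiki/Paris",
        "surface_forms": {"paris"},
        "tfidf": {"france": 0.9, "capital": 0.7, "seine": 0.6},
    },
    {
        "uri": "https://en.wikipedia.org/wiki/Paris_Hilton",
        "surface_forms": {"paris", "paris hilton"},
        "tfidf": {"hotel": 0.8, "heiress": 0.9, "television": 0.5},
    },
]


def context_window(tokens, position, size=5):
    """Return the lowercased tokens within `size` positions of the mention."""
    start = max(0, position - size)
    neighbours = tokens[start:position] + tokens[position + 1:position + 1 + size]
    return [t.lower() for t in neighbours]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link(tokens, position, kb=KB, size=5):
    """Rank the knowledge base entries whose surface forms match the mention."""
    mention = tokens[position].lower()
    # Step 1: candidates are the entities that list the mention as a surface form.
    candidates = [e for e in kb if mention in e["surface_forms"]]
    # Step 2: context vector from the sliding window (raw counts, an assumption).
    context = Counter(context_window(tokens, position, size))
    # Step 3: score each candidate by cosine similarity and sort best first.
    scored = [(e["uri"], cosine(e["tfidf"], context)) for e in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


tokens = "The capital of France is Paris on the Seine".split()
ranking = link(tokens, position=5)
```

In this toy sentence the context words "capital", "France", and "Seine" overlap only with the description of the city, so that candidate is ranked first and the other candidate receives a similarity of zero.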

Experiments

The authors applied their approach to two corpora:

  1. the French ESTER 2 corpus
  2. the Wall Street Journal (WSJ) corpus from the CoNLL Shared Task 2008

They annotated each corpus in two steps:

  1. apply the annotator to provide tentative annotations
  2. manually remove or correct wrong semantic links

The evaluation considered only recall, which amounted to 0.93 for French and 0.84 for English. The lower recall for English entities is probably caused by the considerably greater size of the English Wikipedia, which makes disambiguation more difficult.
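For readers unfamiliar with the metric, recall for entity linking can be sketched as the share of gold-standard links that the annotator reproduced. The summary above does not state exactly how the authors computed it, so the function below, its name, and the toy data are assumptions for illustration only.

```python
def linking_recall(gold_links, predicted_links):
    """Fraction of gold links recovered by the annotator.

    Both arguments map a mention identifier to a URI (hypothetical format).
    """
    if not gold_links:
        return 0.0
    correct = sum(
        1 for mention, uri in gold_links.items()
        if predicted_links.get(mention) == uri
    )
    return correct / len(gold_links)


# Toy data: four manually corrected links, three of which the annotator found.
gold = {"m1": "uri:A", "m2": "uri:B", "m3": "uri:C", "m4": "uri:D"}
predicted = {"m1": "uri:A", "m2": "uri:B", "m3": "uri:X", "m4": "uri:D"}
toy_recall = linking_recall(gold, predicted)
```

Because the gold annotations come from manually correcting the annotator's own output, a measure of this kind says how many of the final links were already right, not how many spurious links were produced.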