Automatic Semantic Web Annotation of Named Entities
Charton, E., Gagnon, M., & Ozell, B. (2011). Automatic semantic web annotation of named entities. In Proceedings of the 24th Canadian conference on Advances in artificial intelligence (pp. 74–85). Berlin, Heidelberg: Springer-Verlag.
This article presents a method for identifying named entities in text and linking them to a semantic knowledge base. In contrast to named entity recognition, which focuses on identifying the entity type (e.g. organization, person, location), named entity linking determines which individual entity is mentioned in the text.
The authors link entities to Wikipedia using the following description for each Wikipedia entity:
- surface forms (i.e. names that refer to this entity)
- entity description (i.e. the entity's context); tf/idf values are computed for each word occurring in these descriptions
The algorithm identifies candidate entities based on their surface forms. It then obtains context information by applying a sliding window to the text surrounding the entity mention. Finally, it ranks the candidates by the cosine similarity between each candidate's tf/idf vector and the context terms from the sliding window.
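The candidate lookup, sliding window, and cosine ranking can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy entity descriptions, the window size, the `log(N / df)` idf variant, and the use of raw term counts for the context vector are all assumptions made for illustration.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase whitespace tokenizer that strips punctuation (a simplification)."""
    stripped = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in stripped if t]


def build_tfidf(descriptions):
    """Map each entity to a {word: tf/idf weight} dict over all descriptions."""
    tokens = {e: tokenize(d) for e, d in descriptions.items()}
    n_docs = len(tokens)
    df = Counter(w for toks in tokens.values() for w in set(toks))
    return {
        e: {w: c * math.log(n_docs / df[w]) for w, c in Counter(toks).items()}
        for e, toks in tokens.items()
    }


def context_window(tokens, start, end, size):
    """Return up to `size` tokens on each side of the mention tokens[start:end]."""
    return tokens[max(0, start - size):start] + tokens[end:end + size]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(mention, context, surface_forms, vectors):
    """Rank entities whose surface forms contain `mention`, best match first."""
    ctx = {w: float(c) for w, c in Counter(context).items()}
    candidates = [e for e, forms in surface_forms.items() if mention.lower() in forms]
    return sorted(((cosine(vectors[e], ctx), e) for e in candidates), reverse=True)


# Toy knowledge base (invented for this example, not taken from the paper).
descriptions = {
    "Paris_(city)": "Paris is the capital city of France on the river Seine",
    "Paris_Hilton": "Paris Hilton is an American media personality and hotel heiress",
    "Berlin": "Berlin is the capital city of Germany",
}
surface_forms = {
    "Paris_(city)": {"paris"},
    "Paris_Hilton": {"paris", "paris hilton"},
    "Berlin": {"berlin"},
}

vectors = build_tfidf(descriptions)
tokens = tokenize("The mayor of Paris said the capital of France will host the games")
mention_index = tokens.index("paris")
context = context_window(tokens, mention_index, mention_index + 1, size=5)
ranked = rank_candidates("paris", context, surface_forms, vectors)
print(ranked)
```

In this toy setup both "Paris" entities are candidates because they share the surface form, and only the overlap between the window terms and each description separates them. A real system would need a much larger knowledge base and more careful tokenization.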
The authors applied their approach to two corpora:
- the French ESTER 2 corpus
- the Wall Street Journal (WSJ) corpus from the CoNLL Shared Task 2008
They annotated the corpora in two steps:
- apply the annotator to provide tentative annotations
- manually remove or correct wrong semantic links
The evaluation only considered recall, which amounted to 0.93 for French and 0.84 for English. The lower recall for English entities is probably caused by the considerably greater size of the English Wikipedia, which makes disambiguation more difficult.