Automatic Semantic Web Annotation of Named Entities


Charton, E., Gagnon, M., & Ozell, B. (2011). Automatic semantic web annotation of named entities. In Proceedings of the 24th Canadian conference on Advances in artificial intelligence (pp. 74–85). Berlin, Heidelberg: Springer-Verlag.

Summary

This article presents a method for identifying named entities in text and linking them to a semantic knowledge base. In contrast to named entity recognition, which focuses on identifying the entity type (e.g. organization, person, location), named entity linking determines which specific entity (i.e. which individual) is mentioned in the text.

Method

The authors link entities to Wikipedia using the following description for each Wikipedia entity:

surface forms (i.e. names that refer to this entity)
entity description (i.e. the entity's context) - tf/idf values are computed for each word occurring in these descriptions
URI

The algorithm identifies candidate entities based on their surface forms. It then obtains context information by applying a sliding window to the text surrounding the entity mention. Finally, it ranks the candidates by computing the cosine similarity between each candidate's tf/idf values and the context terms from the sliding window.
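The ranking step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names Entity, context_window and rank_candidates, the example.org URIs, the window size of five tokens, the use of raw term counts for the context vector, and the restriction to single-token mentions are all assumptions made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Entity:
    """Hypothetical record mirroring the three fields listed above."""
    uri: str
    surface_forms: Set[str]    # lower-cased names that refer to the entity
    weights: Dict[str, float]  # word -> tf/idf weight from the description


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def context_window(tokens: List[str], position: int, size: int = 5) -> Dict[str, float]:
    """Count the terms within `size` tokens on either side of the mention."""
    start = max(0, position - size)
    window = tokens[start:position] + tokens[position + 1:position + 1 + size]
    # Assumption: raw term counts stand in for the context weights.
    return {term: float(n) for term, n in Counter(t.lower() for t in window).items()}


def rank_candidates(tokens: List[str], position: int,
                    entities: List[Entity], size: int = 5) -> List[Tuple[float, str]]:
    """Return (score, uri) pairs for matching candidates, best first."""
    mention = tokens[position].lower()
    context = context_window(tokens, position, size)
    # Candidate selection: keep entities whose surface forms contain the mention.
    candidates = [e for e in entities if mention in e.surface_forms]
    scored = [(cosine(e.weights, context), e.uri) for e in candidates]
    return sorted(scored, reverse=True)


# Toy knowledge base with two entities sharing the surface form "paris".
tokens = "She flew to Paris , the capital of France".split()
city = Entity("http://example.org/Paris_city", {"paris"},
              {"capital": 2.0, "france": 3.0})
person = Entity("http://example.org/Paris_person", {"paris"},
                {"heiress": 3.0, "hotel": 2.0})
ranked = rank_candidates(tokens, 3, [person, city])
```

In this toy example the city entity shares the terms "capital" and "france" with the window and therefore ranks first, while the person entity shares no terms and scores zero.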

Experiments

The authors applied their approach to two corpora:

the French ESTER 2 corpus
the Wall Street Journal (WSJ) corpus from the CoNLL Shared Task 2008

They annotated both corpora in two steps:

apply the annotator to provide tentative annotations
manually remove or correct wrong semantic links

The evaluation considered only recall, which was 0.93 for French and 0.84 for English. The lower recall for English entities is probably caused by the considerably greater size of the English Wikipedia, which makes disambiguation more difficult.


Albert Weichselbraun
