The Web is not a Person - An Analysis of the Performance of Named-Entity Recognition

Krovetz, R. et al. 2011. The web is not a person, Berners-Lee is not an organization, and African-Americans are not locations: an analysis of the performance of named-entity recognition. Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World.

This paper presents an evaluation of named-entity recognition (NER) that is based on

- the agreement rate between three different NER systems, and
- the number of ambiguous entities identified, i.e. entities that were assigned to different NE classes (e.g. PERSON and ORGANIZATION) within the same document.
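
Both measures can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the authors' evaluation code: this summary does not give the paper's exact agreement formula, so agreement is taken here as the share of mentions found by either tagger that both taggers label identically. The function names, the offset-keyed dictionaries, and the toy annotations are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_agreement(a, b):
    """Share of mentions found by either tagger that both label identically.

    `a` and `b` map a mention span (start, end) to an NE class.
    This definition of agreement is an assumption made for illustration.
    """
    mentions = set(a) | set(b)
    if not mentions:
        return 1.0
    agreed = sum(1 for m in mentions if a.get(m) == b.get(m))
    return agreed / len(mentions)


def ambiguous_entities(annotations):
    """Return entity strings that received more than one NE class in one document.

    `annotations` is an iterable of (entity_text, ne_class) pairs.
    """
    classes = defaultdict(set)
    for text, label in annotations:
        classes[text].add(label)
    return {text: labels for text, labels in classes.items() if len(labels) > 1}


# Hypothetical output of three taggers on the same document.
taggers = {
    "tagger_a": {(0, 10): "PERSON", (20, 23): "ORGANIZATION", (30, 36): "LOCATION"},
    "tagger_b": {(0, 10): "ORGANIZATION", (20, 23): "ORGANIZATION", (30, 36): "LOCATION"},
    "tagger_c": {(0, 10): "PERSON", (20, 23): "ORGANIZATION"},
}

for (name_1, out_1), (name_2, out_2) in combinations(taggers.items(), 2):
    print(name_1, name_2, round(pairwise_agreement(out_1, out_2), 2))

# Hypothetical annotations produced by a single tagger for one document.
doc = [("Berners-Lee", "PERSON"), ("Berners-Lee", "ORGANIZATION"), ("Web", "MISC")]
print(ambiguous_entities(doc))
```

A mention found by only one tagger counts as a disagreement in this sketch; a stricter or more lenient treatment of missed mentions would change the resulting rates.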

The authors show that although the literature reports an accuracy of 85-95% for named-entity recognition, the agreement rates between the taggers and the number of identified ambiguities suggest a much lower accuracy.

Resources

NER taggers
- Stanford tagger
- LBJ tagger
- IdentiFinder (proprietary)

Language resources used for training
- English Gigaword corpus
- Reuters 1996 news corpus
- North American News corpus

Other language resources
- ETS SourceFinder corpus
- American National Corpus (ANC) - http://www.anc.org/annotations.html (tagged with NE)

Suggestions and Remarks

- The authors hypothesize that the same word is unlikely to occur with different NE classes within a single document.
- They suggest using grammar patterns such as "Bank of [LOCATION]" versus "[LOCATION]" to distinguish between NE classes.
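
The grammar-pattern suggestion can be sketched as a small post-processing rule in Python. This is a hedged illustration only: the summary does not describe how the authors would implement such patterns, so the function name, the token-level label scheme ("O" for non-entities), and the decision to relabel the whole span as ORGANIZATION are assumptions.

```python
def apply_bank_of_pattern(tokens, labels):
    """Relabel 'Bank of <LOCATION>' spans as ORGANIZATION.

    `tokens` and `labels` are parallel lists; all other labels stay unchanged.
    The rule and the label names are illustrative assumptions.
    """
    out = list(labels)
    i = 0
    while i + 2 < len(tokens):
        if tokens[i] == "Bank" and tokens[i + 1] == "of" and labels[i + 2] == "LOCATION":
            j = i + 2
            # Cover a multi-token location such as "New York".
            while j < len(tokens) and labels[j] == "LOCATION":
                out[j] = "ORGANIZATION"
                j += 1
            out[i] = out[i + 1] = "ORGANIZATION"
            i = j
        else:
            i += 1
    return out


tokens = ["The", "Bank", "of", "England", "is", "in", "England"]
labels = ["O", "O", "O", "LOCATION", "O", "O", "LOCATION"]
print(list(zip(tokens, apply_bank_of_pattern(tokens, labels))))
```

In this toy sentence the first "England" becomes part of an ORGANIZATION span, while the standalone "England" keeps its LOCATION label.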


Albert Weichselbraun
