HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text

Yosef, M. A., Bauer, S., Hoffart, J., Spaniol, M., & Weikum, G. (2013). HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text. In 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, Proceedings of the Conference System Demonstrations, 4-9 August 2013, Sofia, Bulgaria (pp. 133–138). The Association for Computational Linguistics.

Introduction

This article introduces HYENA-live, a named entity recognition (NER) system that supports a fine-grained type hierarchy and multi-labeling (e.g., a single named entity may be a manager, a researcher, and a person). Most state-of-the-art tools consider only a small set of entity types such as Person, Location, and Organization.

The most common workaround for performing fine-grained NER is to run named entity linking (NEL) first and then derive the entity's type from the knowledge base. However, this approach has the following drawbacks:

  1. NEL is an inherently hard problem, especially with ambiguous mentions.
  2. NEL only works for mentions of entities that are present in the knowledge base and, therefore, depends heavily on the coverage and quality of that knowledge base.
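
The second drawback can be illustrated with a minimal sketch of the link-then-look-up workaround. The knowledge base, the mention dictionary, and the function names below are hypothetical toy stand-ins, not part of HYENA or of any real NEL system.

```python
from typing import Optional

# Toy knowledge base (hypothetical data): entity id -> set of types.
KB_TYPES = {
    "Angela_Merkel": {"person", "politician"},
    "Paris": {"location", "city"},
}

# Toy mention dictionary (hypothetical data): surface form -> candidate entity ids.
MENTION_CANDIDATES = {
    "Merkel": ["Angela_Merkel"],
    "Paris": ["Paris"],
}


def link_entity(mention: str) -> Optional[str]:
    """Naive linker: return the first candidate, or None for unknown mentions."""
    candidates = MENTION_CANDIDATES.get(mention, [])
    return candidates[0] if candidates else None


def types_via_linking(mention: str) -> set:
    """Link the mention first, then read its types off the knowledge base."""
    entity = link_entity(mention)
    if entity is None:
        # Mention of an entity outside the knowledge base: no types can be derived.
        return set()
    return KB_TYPES.get(entity, set())
```

In this sketch a mention such as "Merkel" receives its types from the knowledge base, while any mention without an entry yields an empty set, regardless of how informative the surrounding text is.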

Method

HYENA builds its classification taxonomy from YAGO by organizing 100 descendant classes under the following five top-level classes: Person, Organization, Location, Event, and Artifact. The descendant classes are chosen by the number of YAGO entities tagged with each class (i.e., the 100 most frequently used classes are selected).
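
The frequency-based selection could be sketched roughly as follows. The input format (`entity_types`, `top_level_of`) and the function name are hypothetical, and the sketch assumes that the cut-off `k` is applied separately for each top-level class; the summary above does not spell this out.

```python
from collections import Counter, defaultdict


def select_frequent_subclasses(entity_types, top_level_of, k=100):
    """Return, per top-level class, the k subclasses with the most tagged entities.

    entity_types: dict mapping an entity to the classes it is tagged with
                  (hypothetical input format).
    top_level_of: dict mapping a class name to the top-level class it descends from.
    """
    counts = defaultdict(Counter)
    for classes in entity_types.values():
        for cls in set(classes):  # count each class at most once per entity
            top = top_level_of.get(cls)
            if top is not None and cls != top:
                counts[top][cls] += 1
    # Assumption: the k most frequent subclasses are kept per top-level class.
    return {
        top: [cls for cls, _ in counter.most_common(k)]
        for top, counter in counts.items()
    }
```

Called with `k=100` on a mapping derived from YAGO, this would return up to 100 subclasses for each top-level class; classes that do not descend from one of the five top-level classes are ignored.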

HYENA then uses LIBLINEAR to train an SVM with the following feature set:

  1. the mention string and the uni-, bi-, and trigrams overlapping with the mention string (i.e., "short name" variants);
  2. sentence-level context, including whether an n-gram occurs before or after the mention (compare: prefixes and suffixes in Recognyze);
  3. paragraph-level context, i.e., features taken from a window of 2000 characters before and after the mention to obtain topical clues (compare: context terms in Recognyze);
  4. grammatical features such as (i) part-of-speech tags, (ii) the first occurrence of a pronoun (he/she) in the same sentence and in the sentence following the mention, and (iii) the closest verb-preposition pair preceding the mention;
  5. gazetteer features extracted from the YAGO knowledge base, which contain type-specific mentions (e.g., Alice for person, location, song, and movie; Lainach for location, ...).
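
To make the feature groups more concrete, here is a minimal sketch that extracts the mention, sentence-level, paragraph-level, and gazetteer features as a dictionary of binary indicators. It is not HYENA's implementation: the feature names, the regular-expression tokenizer, the punctuation-based sentence splitting, and the `gazetteer` argument are all assumptions made for illustration, and the grammatical features are left out because they would require a part-of-speech tagger.

```python
import re


def tokenize(text):
    """Lowercase word tokenizer (a simplification chosen for this sketch)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, max_n=3):
    """All uni-, bi-, and trigrams of a token list, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def extract_features(text, start, end, window=2000, gazetteer=None):
    """Binary features for the mention text[start:end]."""
    mention = text[start:end]
    features = {f"mention={mention.lower()}": 1}

    # 1. n-grams of the mention string ("short name" variants).
    for gram in ngrams(tokenize(mention)):
        features[f"mention_ngram={gram}"] = 1

    # 2. Sentence-level context, split into n-grams before and after the mention.
    #    Sentence boundaries are approximated by ".", "!" and "?".
    sent_start = max(text.rfind(ch, 0, start) for ch in ".!?") + 1
    ends = [i for i in (text.find(ch, end) for ch in ".!?") if i != -1]
    sent_end = min(ends) if ends else len(text)
    for gram in ngrams(tokenize(text[sent_start:start])):
        features[f"before={gram}"] = 1
    for gram in ngrams(tokenize(text[end:sent_end])):
        features[f"after={gram}"] = 1

    # 3. Paragraph-level context: unigrams within `window` characters.
    lo, hi = max(0, start - window), min(len(text), end + window)
    for token in tokenize(text[lo:start]) + tokenize(text[end:hi]):
        features[f"context={token}"] = 1

    # 5. Gazetteer features: types a name is known to occur with
    #    (hypothetical dict: lowercased name -> iterable of types).
    for type_name in (gazetteer or {}).get(mention.lower(), ()):
        features[f"gazetteer={type_name}"] = 1

    return features
```

Feature dictionaries of this shape could then be vectorized and passed to one binary linear classifier per type, which is one way to obtain multi-label output; scikit-learn's `LinearSVC`, for example, is built on LIBLINEAR.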

Resources and related work

  1. HYENA online demo: http://d5gate.ag5.mpi-sb.mpg.de/webhyena/
  2. Hoffart, J., Suchanek, F. M., Berberich, K., & Weikum, G. (2013). YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence, 194, 28–61. doi:10.1016/j.artint.2012.06.001
  3. Weichselbraun, A., Streiff, D., & Scharl, A. (2014). Linked Enterprise Data for Fine Grained Named Entity Linking and Web Intelligence. In 4th International Conference on Web Intelligence, Mining and Semantics (WIMS 2014). Thessaloniki, Greece.