[Banko:2008] Banko, Michele and Etzioni, Oren (2008). ''The Tradeoffs Between Open and Traditional Relation Extraction'', Proceedings of ACL-08: HLT, Association for Computational Linguistics, pages 28--36
This article discusses the shortcomings of traditional relation extraction methods and introduces Open Relation Extraction techniques. These techniques match the precision of traditional extraction systems but do not yield the same recall, although the paper presents a hybrid method yielding comparable recall.
The authors apply Conditional Random Fields (CRFs), which are undirected graphical models trained to maximize the conditional probability of a finite set of labels Y given a set of input observations X. The model considers features such as part-of-speech (POS) tags, regular expressions (capitalization, punctuation, etc.), context words, and conjunctions of features occurring in a sliding window of 13 words. The authors train their method with sets of labeled examples.
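To make the feature description concrete, here is a minimal sketch of how such window features might be computed for one token. It assumes the 13-word window means six words on either side of the current word. The function name `token_features`, the feature-key format, and the choice to conjoin adjacent POS tags are illustrative assumptions, not the authors' implementation.

```python
import re


def token_features(tokens, pos_tags, i, half_window=6):
    """Return a sparse feature dict for token i.

    Assumption: the window covers half_window words on each side of
    token i, i.e. 2 * half_window + 1 = 13 words by default.
    """
    features = {}
    lo = max(0, i - half_window)
    hi = min(len(tokens), i + half_window + 1)

    for j in range(lo, hi):
        offset = j - i
        word = tokens[j]
        # Context word and POS tag, keyed by position relative to token i.
        features[f"word[{offset}]={word.lower()}"] = 1
        features[f"pos[{offset}]={pos_tags[j]}"] = 1
        # Regular-expression features: capitalization and punctuation.
        if re.match(r"[A-Z]", word):
            features[f"cap[{offset}]"] = 1
        if re.fullmatch(r"[^\w\s]+", word):
            features[f"punct[{offset}]"] = 1

    # Conjunction features: here, pairs of adjacent POS tags (an assumption).
    for j in range(lo, hi - 1):
        key = f"pos[{j - i}]|pos[{j - i + 1}]={pos_tags[j]}|{pos_tags[j + 1]}"
        features[key] = 1

    return features
```

Dictionaries of this shape, one per token, are the usual input format for off-the-shelf linear-chain CRF toolkits.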
Extraction proceeds in two steps:
- entity identification using a phrase chunker, and
- relation labeling for candidate entity pairs.
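The pairing stage between the two steps above can be sketched as follows. The sketch assumes the chunker has already produced noun-phrase spans as `(start, end)` token offsets, and that only pairs separated by a bounded number of tokens are kept as candidates. The function name `candidate_pairs` and the default `max_gap` value are hypothetical choices made for illustration.

```python
from itertools import combinations


def candidate_pairs(chunks, max_gap=5):
    """Pair up entity spans that lie close together in a sentence.

    chunks:  list of (start, end) token spans, end exclusive, sorted by start.
    max_gap: hypothetical limit on the number of tokens between two entities.
    """
    pairs = []
    for (s1, e1), (s2, e2) in combinations(chunks, 2):
        gap = s2 - e1  # number of tokens between the two spans
        if 0 <= gap <= max_gap:
            pairs.append(((s1, e1), (s2, e2)))
    return pairs
```

Each returned pair, together with the tokens between the two spans, would then be passed to the sequence labeler, which decides which of those tokens (if any) express a relation.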
The issue of low recall is addressed by creating a relation-specific algorithm (R1-CRF), which the authors combine with the generic approach.