Relation Extraction and the Influence of Automatic Named-Entity Recognition


Giuliano, C., Lavelli, A. & Romano, L., 2007. Relation extraction and the influence of automatic named-entity recognition. ACM Transactions on Speech and Language Processing, 5(1), pp. 2:1-2:26.

Introduction

Relation extraction aims at identifying directed binary relations between two entities in text documents, where the order of the arguments matters: $$R_{ij} = (E_i, E_j) \neq R_{ji} = (E_j, E_i)$$. This article introduces an approach that uses kernel functions to integrate information from (i) the sentence in which the relation appears and (ii) the local context around the interacting entities.

Method

The authors treat relation extraction as a classification task that distinguishes the following classes:

correct: locatedIn(Chur, Switzerland) -> 2
correct, but incorrect direction: locatedIn(Switzerland, Chur) -> 1
incorrect: locatedIn(Chur, St. Gallen) -> 0
wrong entity types: locatedIn(Christian Toth, Chur) -> -1
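The four-way labeling can be sketched in Python as follows. This is a minimal illustration, assuming that gold relations are available as (relation, first argument, second argument) triples and that each relation type has an expected pair of entity types. The names `SIGNATURES` and `label_candidate` are hypothetical, and only the `locatedIn` signature is inferred from the examples above.

```python
# Hypothetical entity-type signatures; only locatedIn is inferred from the examples above.
SIGNATURES = {"locatedIn": ("LOC", "LOC")}


def label_candidate(rel, e1, e2, gold, entity_type):
    """Assign a candidate rel(e1, e2) to one of the four classes (sketch)."""
    expected = SIGNATURES[rel]
    observed = (entity_type[e1], entity_type[e2])
    # Accept either argument order here so that reversed relations
    # are not mistaken for entity-type errors.
    if observed not in (expected, expected[::-1]):
        return -1  # wrong entity types
    if (rel, e1, e2) in gold:
        return 2   # correct
    if (rel, e2, e1) in gold:
        return 1   # correct, but incorrect direction
    return 0       # incorrect


gold = {("locatedIn", "Chur", "Switzerland")}
types = {"Chur": "LOC", "Switzerland": "LOC", "St. Gallen": "LOC", "Christian Toth": "PER"}
for args in [("Chur", "Switzerland"), ("Switzerland", "Chur"),
             ("Chur", "St. Gallen"), ("Christian Toth", "Chur")]:
    # Prints the labels 2, 1, 0 and -1, in that order.
    print(args, label_candidate("locatedIn", *args, gold, types))
```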

The method uses two different kinds of kernel functions (described below) to assign each candidate relation to one of the four classes. These kernels are combined into a shallow linguistic kernel ($$K_{SL}$$) using the linear combination

\[ K_{SL}(R_1,R_2) = K_{GC}(R_1,R_2) + K_{LC}(R_1,R_2). \]
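Simply adding the two kernels is legitimate. Suppose each kernel is an inner product in some feature space, i.e. $$K_{GC}(R_1,R_2) = \langle \phi_{GC}(R_1), \phi_{GC}(R_2) \rangle$$ and $$K_{LC}(R_1,R_2) = \langle \phi_{LC}(R_1), \phi_{LC}(R_2) \rangle$$, where the feature maps $$\phi$$ are notation introduced here for illustration. Then the sum is the inner product of the concatenated feature vectors and is therefore itself a valid kernel:

\[ K_{SL}(R_1,R_2) = \langle \phi_{GC}(R_1), \phi_{GC}(R_2) \rangle + \langle \phi_{LC}(R_1), \phi_{LC}(R_2) \rangle = \left\langle \begin{pmatrix} \phi_{GC}(R_1) \\ \phi_{LC}(R_1) \end{pmatrix}, \begin{pmatrix} \phi_{GC}(R_2) \\ \phi_{LC}(R_2) \end{pmatrix} \right\rangle \]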

The method was implemented using the LibSVM package.
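LibSVM can work with such a custom kernel through its precomputed-kernel mode (`svm-train -t 4`). In this mode, every training instance lists its kernel values against all training instances: `<label> 0:<serial number> 1:K(x_i,x_1) ... L:K(x_i,x_L)`. The summary does not say whether the authors used this mode or modified LibSVM's kernel code. The following Python helper is therefore only a sketch of how a Gram matrix of $$K_{SL}$$ values could be written in that format, and `to_libsvm_precomputed` is a hypothetical name.

```python
def to_libsvm_precomputed(labels, gram):
    """Serialize a training Gram matrix in LibSVM's precomputed-kernel format.

    labels: class labels (e.g. 2, 1, 0, -1); gram[i][j] = K_SL(R_i, R_j).
    """
    if len(labels) != len(gram):
        raise ValueError("need exactly one Gram-matrix row per label")
    lines = []
    for i, (label, row) in enumerate(zip(labels, gram), start=1):
        # Column 0 holds the 1-based serial number of the training instance;
        # columns 1..L hold its kernel values against every training instance.
        values = " ".join(f"{j}:{k:.12g}" for j, k in enumerate(row, start=1))
        lines.append(f"{label} 0:{i} {values}")
    return "\n".join(lines) + "\n"


print(to_libsvm_precomputed([2, -1], [[14.0, 3.0], [3.0, 9.0]]), end="")
# 2 0:1 1:14 2:3
# -1 0:2 1:3 2:9
```

The resulting file could then be passed to `svm-train -t 4`. Test instances use the same layout, with their kernel values computed against the training instances.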

Global Context Kernel

Bunescu and Mooney (2005) observe that relations between entities are usually expressed in one of the following contexts:

Fore-Between (FB): e.g. "the head of [org], Dr [per]"
Between (B): e.g. "[org] spokesman [per]"
Between-After (BA): e.g. "[per], a [org] law professor"

Each of these contexts $$C$$ is represented as an unordered bag-of-words that counts how often each token $$t_i$$ (and each corresponding n-gram) occurs in $$C$$. Adding up the kernels computed on the three contexts yields the global context kernel ($$K_{GC}$$):

\[ K_{GC}(R_1, R_2) = K_{FB}(R_1, R_2) + K_{B}(R_1, R_2) + K_{BA}(R_1, R_2)\]
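A minimal Python sketch of $$K_{GC}$$ is shown below. It rests on several assumptions that the summary does not state:

* the contexts extend to the sentence boundaries;
* n-grams go up to length 2;
* entities are given as token spans, with the first entity preceding the second;
* n-grams do not cross the removed entity mentions;
* each sub-kernel is the plain (unnormalized) dot product of n-gram count vectors.

All function names are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n_max=2):
    """Bag of n-grams (1 <= n <= n_max) with occurrence counts."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def global_contexts(tokens, e1, e2):
    """Fore-Between, Between and Between-After bags for entity spans e1 before e2."""
    fore = ngram_counts(tokens[:e1[0]])
    between = ngram_counts(tokens[e1[1]:e2[0]])
    after = ngram_counts(tokens[e2[1]:])
    return {"FB": fore + between, "B": between, "BA": between + after}


def dot(c1, c2):
    # Counter returns 0 for missing keys, so only shared n-grams contribute.
    return sum(v * c2[k] for k, v in c1.items())


def k_gc(r1, r2):
    """K_GC = K_FB + K_B + K_BA for relations given as (tokens, span1, span2)."""
    c1, c2 = global_contexts(*r1), global_contexts(*r2)
    return sum(dot(c1[name], c2[name]) for name in ("FB", "B", "BA"))


# "the head of [org], Dr [per]" and "[per], a [org] law professor"
r_fb = ("the head of ACME , Dr Smith".split(), (3, 4), (6, 7))
r_ba = ("Smith , a ACME law professor".split(), (0, 1), (3, 4))
print(k_gc(r_fb, r_fb), k_gc(r_fb, r_ba))  # 14 3
```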

Local Context Kernel

The local context often provides clues for (i) the presence of a relation and (ii) its direction. The authors represent each local context using the following basic features, taking the order of the tokens into account:

Token
Lemma of the token
Part-of-speech (POS) tag of the token
Stem of the token
Orthographic class of the token, given by a function that maps tokens into equivalence classes such as capitalization, punctuation and numerals
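The orthographic feature could look like the following Python function. The set of equivalence classes and their names are assumptions chosen for illustration; the summary only says that capitalization, punctuation and numerals are distinguished.

```python
import string


def orthographic(token: str) -> str:
    """Map a token to a coarse orthographic equivalence class (illustrative classes)."""
    if token and all(ch in string.punctuation for ch in token):
        return "PUNCT"
    if token.isdigit():
        return "NUMERAL"
    if any(ch.isdigit() for ch in token):
        return "HAS_DIGIT"    # e.g. "B2B"
    if token.isupper():
        return "ALL_CAPS"     # e.g. "IBM"
    if token[:1].isupper():
        return "CAPITALIZED"  # e.g. "Chur"
    if token.islower():
        return "LOWERCASE"
    return "OTHER"


print([orthographic(t) for t in ["Chur", "IBM", ",", "2007", "law"]])
# ['CAPITALIZED', 'ALL_CAPS', 'PUNCT', 'NUMERAL', 'LOWERCASE']
```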

The local context kernel ($$K_{LC}$$) therefore sums the kernels computed on the local contexts of the left and the right entity:

\[ K_{LC}(R_1, R_2) = K_{left}(R_1, R_2) + K_{right}(R_1, R_2)\]
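A corresponding sketch of $$K_{LC}$$ in Python follows. It makes these assumptions:

* every token has already been annotated with the five features listed above, by some external tagger, lemmatizer, stemmer and orthographic function;
* each entity is represented by a single token index;
* the local context is a window of two tokens on either side of the entity, including the entity itself;
* each sub-kernel is the dot product of position-tagged feature counts.

Encoding the offset in every feature name is one way of respecting token order. All names are hypothetical.

```python
from collections import Counter

FEATURES = ("token", "lemma", "pos", "stem", "orth")


def annotate(*rows):
    """Turn (token, lemma, pos, stem, orth) tuples into feature dicts."""
    return [dict(zip(FEATURES, row)) for row in rows]


def local_features(sent, idx, window=2):
    """Features in a window around sent[idx]; the offset encodes token order."""
    feats = Counter()
    for offset in range(-window, window + 1):
        i = idx + offset
        if 0 <= i < len(sent):
            for name in FEATURES:
                feats[f"{name}[{offset}]={sent[i][name]}"] += 1
    return feats


def dot(c1, c2):
    # Counter returns 0 for missing keys, so only shared features contribute.
    return sum(v * c2[k] for k, v in c1.items())


def k_lc(r1, r2, window=2):
    """K_LC = K_left + K_right for relations given as (sentence, left_idx, right_idx)."""
    (s1, left1, right1), (s2, left2, right2) = r1, r2
    k_left = dot(local_features(s1, left1, window), local_features(s2, left2, window))
    k_right = dot(local_features(s1, right1, window), local_features(s2, right2, window))
    return k_left + k_right


s1 = annotate(("Chur", "Chur", "NNP", "chur", "CAPITALIZED"),
              ("is", "be", "VBZ", "is", "LOWERCASE"),
              ("in", "in", "IN", "in", "LOWERCASE"),
              ("Switzerland", "Switzerland", "NNP", "switzerland", "CAPITALIZED"))
s2 = annotate(("Zurich", "Zurich", "NNP", "zurich", "CAPITALIZED"),
              ("lies", "lie", "VBZ", "lie", "LOWERCASE"),
              ("in", "in", "IN", "in", "LOWERCASE"),
              ("Switzerland", "Switzerland", "NNP", "switzerland", "CAPITALIZED"))
print(k_lc((s1, 0, 3), (s1, 0, 3)), k_lc((s1, 0, 3), (s2, 0, 3)))  # 30 21
```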

Evaluation

The authors performed a 5-fold cross-validation on the dataset used by Roth and Yih (2004), which is based on the TREC 2004 corpus. The evaluation considers the relation types locatedIn, workFor, orgBasedIn, liveIn, kill, and noRel, and yields

F1 values between 71 and 82% with gold-standard named entities (i.e., all named entities are known), and
F1 values between 69 and 81% with automatically recognized named entities.

The evaluation also discusses the impact of spurious named entities (introduced by incorrect NER output) and of missing named entities.

Bibliography

Bunescu, R.C. & Mooney, R.J., 2005. Subsequence Kernels for Relation Extraction. In 19th Conference on Neural Information Processing Systems (NIPS'05). Vancouver, British Columbia, Canada.

Roth, D. & Yih, W., 2004. A Linear Programming Formulation for Global Inference in Natural Language Tasks. In H. T. Ng & E. Riloff, eds. 8th Conference on Computational Natural Language Learning (CoNLL 2004). Association for Computational Linguistics, pp. 1-8.

Albert Weichselbraun


You may also enjoy

Big, Linked Geospatial Data and Its Application in Earth Observation

Employment relations: a data driven analysis of job markets using online job boards and online professional networks

Suffix array

Dynamic feature scaling for online learning of binary classifiers