Identifying Relations for Open Information Extraction

Albert Weichselbraun

is a Professor of Information Science at the University of Applied Sciences of the Grisons.

by Fader et al.

This paper addresses two major shortcomings of state-of-the-art open information extraction systems:

uninformative extractions that omit critical information:
- "Faust made a deal with the devil" -> "Faust" - "made" - "deal"

incoherent extractions that yield relation phrases with no meaningful interpretation:
- "The guide contains dead links and omits sites" -> "contains omits"
- "The Mark 14 was central to the torpedo scandal of the fleet" -> "was central torpedo"

The authors show that combining syntactic and lexical constraints significantly improves precision.

Important Definitions

information extraction systems - learn an extractor per target relation and therefore do not scale well to corpora with a large number of relations
open information extraction - identify relational phrases by
- labeling sentences using heuristics or supervision
- learning phrases based on the training examples (TextRunner needs approximately 200,000 heuristically labeled sentences)
- extracting data based on the learned model

light verb constructions (LVC) - multi-word expressions composed of a verb and a noun, with the noun carrying the semantic content of the predicate. Examples:
- is -> is an album by, is the author of, is a city in
- has -> has a population of, has a Ph.D. in, ...

Applications of Open IE

learning of selectional preferences (Ritter et al., 2010)
acquiring common sense knowledge (Lin et al., 2010)
recognizing entailment (Schoenmackers et al., 2010; Berant et al., 2011)
mapping onto existing ontologies (Soderland et al., 2010)

Syntactic constraints

require phrases to match POS tag patterns
multiple possible matches => the longest is chosen
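
The syntactic constraint can be sketched as a regular expression over part-of-speech tags. The sketch below assumes the pattern V | V P | V W* P (V = verb with optional particle and adverb, W = noun, adjective, adverb, pronoun or determiner, P = preposition, particle or infinitive marker) and hand-tagged Penn Treebank input. The tag mapping and the names tag_class and relation_phrases are illustrative, not taken from the authors' implementation.

```python
import re

def tag_class(tag):
    """Collapse a Penn Treebank tag into a one-letter class (illustrative mapping)."""
    if tag.startswith("VB"):
        return "v"  # verb
    if tag == "RP":
        return "r"  # particle
    if tag.startswith("RB"):
        return "a"  # adverb
    if tag in ("IN", "TO"):
        return "p"  # preposition or infinitive marker
    if tag.startswith(("NN", "JJ", "PRP", "DT")):
        return "w"  # noun, adjective, pronoun or determiner
    return "x"      # anything else cannot be part of a relation phrase

# V | V P | V W* P, written over the one-letter classes.
# The quantifiers are greedy, so the longest match from each verb is preferred.
RELATION = re.compile(r"vr?a?(?:[wa]*[pr])?")

def relation_phrases(tagged):
    """Return the relation phrases found in a list of (token, tag) pairs."""
    classes = "".join(tag_class(tag) for _, tag in tagged)
    return [
        " ".join(token for token, _ in tagged[m.start():m.end()])
        for m in RELATION.finditer(classes)
    ]

faust = [("Faust", "NNP"), ("made", "VBD"), ("a", "DT"), ("deal", "NN"),
         ("with", "IN"), ("the", "DT"), ("devil", "NN")]
guide = [("The", "DT"), ("guide", "NN"), ("contains", "VBZ"), ("dead", "JJ"),
         ("links", "NNS"), ("and", "CC"), ("omits", "VBZ"), ("sites", "NNS")]

print(relation_phrases(faust))  # ['made a deal with']
print(relation_phrases(guide))  # ['contains', 'omits']
```

With this pattern the Faust sentence yields the informative phrase "made a deal with" rather than "made", and the guide sentence yields two separate phrases instead of the incoherent "contains omits".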

Lexical constraints

prevent over-specified relation phrases such as "is offering only ...." by requiring each relation phrase to occur with many distinct argument pairs in a large corpus
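
One way to read the lexical constraint is as a frequency filter: count, for each candidate relation phrase, the number of distinct argument pairs it occurs with, and keep only phrases above a threshold. The following is a minimal sketch under that assumption. The function name, the toy tuples and the threshold min_pairs=2 are hypothetical; a real system would use a much larger corpus and threshold.

```python
from collections import defaultdict

def frequent_relation_phrases(extractions, min_pairs=2):
    """Keep relation phrases seen with at least min_pairs distinct argument pairs.

    extractions: iterable of (arg1, relation_phrase, arg2) tuples.
    """
    pairs = defaultdict(set)
    for arg1, rel, arg2 in extractions:
        pairs[rel].add((arg1, arg2))  # a set, so repeated pairs count once
    return {rel for rel, seen in pairs.items() if len(seen) >= min_pairs}

# Toy corpus (invented for illustration).
extractions = [
    ("Faust", "made a deal with", "the devil"),
    ("Ann", "made a deal with", "Bob"),
    ("Acme", "is offering only ....", "customers"),
    ("Acme", "is offering only ....", "customers"),  # same pair again
]

print(frequent_relation_phrases(extractions))  # {'made a deal with'}
```

Counting distinct argument pairs rather than raw occurrences means that a long, over-specified phrase repeated with the same arguments is still filtered out.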

Evaluation metric

Precision-Recall curve
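
A precision-recall curve is obtained by ranking extractions by confidence and computing precision and recall at every cutoff. The sketch below assumes each extraction has a confidence score and a manual correct/incorrect label, and that the total number of correct extractions is known. The function name and the toy data are illustrative only.

```python
def precision_recall_points(scored, total_correct):
    """Return (recall, precision) at each confidence cutoff.

    scored: list of (confidence, is_correct) pairs.
    total_correct: number of correct extractions used as the recall denominator.
    """
    points = []
    hits = 0
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        hits += is_correct
        points.append((hits / total_correct, hits / rank))
    return points

# Toy labels (invented for illustration).
scored = [(0.9, True), (0.8, False), (0.7, True), (0.4, False)]
for recall, precision in precision_recall_points(scored, total_correct=2):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```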
