Identifying Relations for Open Information Extraction

by Fader et al.

This paper addresses two major shortcomings of state-of-the-art open information extraction (Open IE) systems:

  1. uninformative extractions that omit critical information, e.g. "Faust made a deal with the devil" -> "Faust" - "made" - "deal" instead of "Faust" - "made a deal with" - "the devil"
  2. incoherent extractions whose relation phrase has no meaningful interpretation, e.g. "The guide contains dead links and omits sites" -> "contains omits", or "The Mark 14 was central to the torpedo scandal of the fleet" -> "was central torpedo"

The authors show that imposing syntactic and lexical constraints on relation phrases yields significant improvements in precision.

Important Definitions

  • traditional information extraction systems - learn a separate extractor for each target relation and therefore do not scale well to large or open-ended sets of relations
  • open information extraction - extracts relational phrases without a predefined set of target relations; existing systems do so by
    • labeling training sentences using heuristics or supervision
    • learning a relation-phrase extractor from the labeled examples (TextRunner needs approximately 200,000 heuristically labeled sentences)
    • applying the learned model to extract relations from new text
  • light verb constructions (LVCs) - multi-word expressions composed of a verb and a noun, where the noun carries the semantic content of the predicate; prior Open IE systems often truncate them, as in the "Faust" example above. Examples:

    • is -> is an album by, is the author of, is a city in
    • has -> has a population of, has a Ph.D. in, ...

Applications of Open IE

  • learning of selectional preferences (Ritter et al., 2010)
  • acquiring common sense knowledge (Lin et al., 2010)
  • recognizing entailment (Schoenmackers et al., 2010; Berant et al., 2011)
  • mapping onto existing ontologies (Soderland et al., 2010)

Syntactic constraints

  • require every relation phrase to match a part-of-speech (POS) tag pattern that starts with a verb
  • if several phrases starting at the same verb match, the longest one is chosen
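A minimal sketch of the syntactic constraint in Python. It assumes the pattern as stated in the paper, V | V P | V W* P, where V = verb particle? adverb?, W = noun | adjective | adverb | pronoun | determiner, and P = preposition | particle | infinitive marker. It also assumes Penn Treebank tags assigned by hand (in practice they would come from a POS tagger). The names `tag_class` and `relation_phrases` are hypothetical, and the sketch only finds relation phrases; it does not locate the arguments or merge adjacent matches.

```python
import re

# Coarse classes over Penn Treebank tags (an assumption of this sketch):
#   v = verb, r = particle, a = adverb,
#   w = noun / adjective / pronoun / determiner, p = preposition or "to"
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
ADVERB_TAGS = {"RB", "RBR", "RBS"}
W_TAGS = {"NN", "NNS", "NNP", "NNPS", "JJ", "JJR", "JJS", "PRP", "PRP$", "DT"}
PREP_TAGS = {"IN", "TO"}

# V = verb particle? adverb?;  W = noun|adj|adv|pron|det;  P = prep|particle|"to"
# V | V P | V W* P  collapses to  V (W* P)?
RELATION_PATTERN = re.compile(r"vr?a?(?:[wa]*[pr])?")


def tag_class(tag: str) -> str:
    """Map one POS tag to the single-character class used by the pattern."""
    if tag in VERB_TAGS:
        return "v"
    if tag == "RP":
        return "r"
    if tag in ADVERB_TAGS:
        return "a"
    if tag in W_TAGS:
        return "w"
    if tag in PREP_TAGS:
        return "p"
    return "x"  # conjunctions, punctuation, etc. cannot be part of a match


def relation_phrases(tagged: list[tuple[str, str]]) -> list[str]:
    """Return, left to right, the longest pattern match starting at each verb."""
    classes = "".join(tag_class(tag) for _, tag in tagged)
    phrases = []
    i = 0
    while i < len(classes):
        end = None
        if classes[i] == "v":
            # Try the longest span first, so the first full match is the longest.
            for j in range(len(classes), i, -1):
                if RELATION_PATTERN.fullmatch(classes, i, j):
                    end = j
                    break
        if end is None:
            i += 1
        else:
            phrases.append(" ".join(word for word, _ in tagged[i:end]))
            i = end  # resume after the phrase so matches never overlap
    return phrases


FAUST = [("Faust", "NNP"), ("made", "VBD"), ("a", "DT"), ("deal", "NN"),
         ("with", "IN"), ("the", "DT"), ("devil", "NN")]
GUIDE = [("The", "DT"), ("guide", "NN"), ("contains", "VBZ"), ("dead", "JJ"),
         ("links", "NNS"), ("and", "CC"), ("omits", "VBZ"), ("sites", "NNS")]

print(relation_phrases(FAUST))  # ['made a deal with']
print(relation_phrases(GUIDE))  # ['contains', 'omits']
```

Because "contains dead links and" does not end in a preposition or particle, the guide sentence yields two separate phrases rather than the incoherent "contains omits", while the light verb construction "made a deal with" is kept whole. Trying the longest span first is a simple way to honor the longest-match rule; it is quadratic per verb, which is acceptable for sentence-length input.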

Lexical constraints

  • prevent over-specified relation phrases such as "is offering only ..." by requiring a relation phrase to occur with many distinct argument pairs in a large corpus
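A minimal sketch of the lexical constraint, assuming it is implemented as a dictionary of relation phrases built by counting distinct (arg1, arg2) pairs per phrase over previously extracted triples. The threshold `k`, the plain-lowercase normalization, the toy corpus, and the function names are all hypothetical choices for illustration.

```python
from collections import defaultdict


def build_relation_dictionary(extractions, k):
    """Keep relation phrases observed with at least k distinct argument pairs.

    extractions: iterable of (arg1, relation, arg2) strings from a corpus.
    Normalization is plain lowercasing (an assumption of this sketch).
    """
    arg_pairs = defaultdict(set)
    for arg1, relation, arg2 in extractions:
        arg_pairs[relation.lower()].add((arg1.lower(), arg2.lower()))
    return {rel for rel, pairs in arg_pairs.items() if len(pairs) >= k}


def passes_lexical_constraint(relation, dictionary):
    """A candidate relation phrase is kept only if it is in the dictionary."""
    return relation.lower() in dictionary


# Hypothetical toy corpus; a real dictionary would come from a very large one.
CORPUS = [
    ("Paris", "is a city in", "France"),
    ("Paris", "is a city in", "France"),  # repeated pair: counted once
    ("Hamburg", "is a city in", "Germany"),
    ("Osaka", "is a city in", "Japan"),
    ("Acme", "is offering only discounted tickets at", "the county fair"),
]

DICTIONARY = build_relation_dictionary(CORPUS, k=3)
print(DICTIONARY)  # {'is a city in'}
```

Counting distinct argument pairs rather than raw occurrences means a single sentence repeated many times cannot push an over-specified phrase past the threshold.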

Evaluation metric

  • Precision-Recall curve
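A minimal sketch of how a precision-recall curve is computed from judged extractions ranked by a confidence score. The confidence values, the correctness labels, and `total_correct` (the number of correct extractions recall is measured against, which in Open IE has to come from annotation) are hypothetical. The step-wise area summary is one common way to reduce the curve to a single number, not necessarily the exact computation used in the paper.

```python
def precision_recall_curve(scored, total_correct):
    """Precision and recall after each extraction, by descending confidence.

    scored: list of (confidence, is_correct) pairs for system extractions.
    total_correct: number of correct extractions recall is measured against.
    Returns a list of (recall, precision) points.
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    points = []
    true_positives = 0
    for rank, (_, correct) in enumerate(ranked, start=1):
        if correct:
            true_positives += 1
        points.append((true_positives / total_correct, true_positives / rank))
    return points


def area_under_curve(points):
    """Step-wise area: precision times each increase in recall, summed."""
    area, previous_recall = 0.0, 0.0
    for recall, precision in points:
        area += precision * (recall - previous_recall)
        previous_recall = recall
    return area


# Hypothetical judged extractions: (confidence, is_correct)
SCORED = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
CURVE = precision_recall_curve(SCORED, total_correct=4)
print(CURVE)  # [(0.25, 1.0), (0.5, 1.0), (0.5, 0.6666666666666666), (0.75, 0.75)]
print(area_under_curve(CURVE))  # 0.6875
```

Recall never reaches 1.0 here because one of the four correct extractions was never produced, which is the usual situation for extraction systems.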