Automatic hidden-web table interpretation, conceptualization, and semantic annotation

by Cui Tao and David W. Embley

This paper presents an approach to hidden-web table interpretation for cases in which sibling pages (commonly generated from databases) are available. By comparing these pages, the method distinguishes varying components, which contain data, from non-varying components, which contain category labels.
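
The core idea of sibling-page comparison can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the authors' algorithm: it assumes the sibling tables have already been parsed into grids of identical shape, and the function name `split_labels_and_data` and the sample pages are hypothetical.

```python
from typing import List, Set, Tuple

Table = List[List[str]]      # a parsed table: rows of cell strings
Position = Tuple[int, int]   # (row index, column index)


def split_labels_and_data(pages: List[Table]) -> Tuple[Set[Position], Set[Position]]:
    """Classify cell positions across same-shaped sibling tables.

    A position whose text is identical on every page is treated as a
    category label; a position whose text differs is treated as data.
    Assumption: all tables have the same number of rows and columns.
    """
    first = pages[0]
    labels: Set[Position] = set()
    data: Set[Position] = set()
    for r, row in enumerate(first):
        for c, cell in enumerate(row):
            unchanged = all(page[r][c] == cell for page in pages[1:])
            (labels if unchanged else data).add((r, c))
    return labels, data


# Two hypothetical sibling pages generated from the same template.
page_a = [["Name", "Alice"], ["Phone", "555-0100"]]
page_b = [["Name", "Bob"], ["Phone", "555-0199"]]

labels, data = split_labels_and_data([page_a, page_b])
```

Here the first column is constant across both pages and is classified as labels, while the second column varies and is classified as data. Real sibling pages rarely align cell for cell, so a practical system would need a more tolerant structural comparison than this positional check.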

Extracting categories makes it possible to automatically create a conceptualization of the data and to semantically annotate the data, which facilitates data access with query languages such as SPARQL. TISP++ (Table Interpretation with Sibling Pages) automatically generates an OWL ontology from the table conceptualizations. This ontology contains:

  • OWL classes for tables,
  • object properties that describe table nesting and relations between tables,
  • datatype properties that describe labels, and
  • constraints (optional and functional), if applicable.
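
To make the listed ontology components concrete, the following Python sketch emits a small OWL fragment in Turtle syntax. It is an illustration only and does not reproduce the output of TISP++: the function `to_turtle`, the `http://example.org/tisp#` namespace, the `has<Child>` naming scheme, and the sample table are all assumptions. Only the functional constraint is modeled; optionality is left out.

```python
from typing import Dict, List


def to_turtle(table: str, labels: Dict[str, bool], nested: List[str]) -> str:
    """Render a toy table conceptualization as OWL in Turtle syntax.

    table  -- name of the table, emitted as an OWL class
    labels -- maps each label name to True if it is functional
    nested -- names of tables nested inside `table`
    """
    lines = [
        "@prefix : <http://example.org/tisp#> .",  # hypothetical namespace
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
        f":{table} a owl:Class .",
    ]
    # One class and one object property per nested table.
    for child in nested:
        lines.append(f":{child} a owl:Class .")
        lines.append(
            f":has{child} a owl:ObjectProperty ; "
            f"rdfs:domain :{table} ; rdfs:range :{child} ."
        )
    # One datatype property per label, marked functional where applicable.
    for label, functional in labels.items():
        types = "owl:DatatypeProperty"
        if functional:
            types += ", owl:FunctionalProperty"
        lines.append(f":{label} a {types} ; rdfs:domain :{table} .")
    return "\n".join(lines)


# Hypothetical conceptualization of a car-listing table.
turtle = to_turtle("Car", {"price": True, "color": False}, ["Feature"])
print(turtle)
```

Data annotated against such an ontology could then be loaded into an RDF store and queried with SPARQL, for example by selecting every `:Car` together with its `:price` value.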