Automatic hidden-web table interpretation, conceptualization, and semantic annotation
by Cui Tao and David W. Embley
This paper presents an approach to interpreting hidden-web tables when sibling pages, which are commonly generated from databases, are available. By comparing these sibling pages, the method distinguishes varying components, which contain data, from non-varying components, which contain category labels.
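The core comparison can be illustrated with a small sketch. This is a simplification, not TISP's actual algorithm. It assumes each sibling table has already been parsed into a grid of cell strings with identical layout; the real system works on the pages' HTML structure and must first align the tables. The names `Grid` and `split_labels_and_data` are hypothetical. A cell whose text is the same on every sibling page is treated as a category label. A cell that differs on at least one page is treated as data.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a parsed table is a list of rows of cell texts.
Grid = List[List[str]]
Cell = Tuple[int, int, str]  # (row, column, text)


def split_labels_and_data(pages: Sequence[Grid]) -> Tuple[List[Cell], List[Cell]]:
    """Classify cells of sibling tables as labels (non-varying) or data (varying).

    Assumes all grids are already aligned and have identical shape.
    Cell texts are reported from the first page.
    """
    if len(pages) < 2:
        raise ValueError("need at least two sibling pages to compare")
    first = pages[0]
    for other in pages[1:]:
        if len(other) != len(first) or any(len(a) != len(b) for a, b in zip(first, other)):
            raise ValueError("sketch assumes sibling tables with identical layout")

    labels: List[Cell] = []
    data: List[Cell] = []
    for r, row in enumerate(first):
        for c, text in enumerate(row):
            # Non-varying across every sibling -> category label; otherwise data.
            if all(page[r][c] == text for page in pages[1:]):
                labels.append((r, c, text))
            else:
                data.append((r, c, text))
    return labels, data


if __name__ == "__main__":
    page1 = [["Name", "Alice"], ["Age", "34"]]
    page2 = [["Name", "Bob"], ["Age", "41"]]
    labels, data = split_labels_and_data([page1, page2])
    print(labels)  # [(0, 0, 'Name'), (1, 0, 'Age')]
    print(data)    # [(0, 1, 'Alice'), (1, 1, '34')]
```

Comparing more than two siblings lowers the chance of misclassifying a data value that happens to repeat, such as the same age on two pages, as a label.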
Extracting the category labels makes it possible to automatically create a conceptualization of the data and to semantically annotate the data, so that it can be accessed with query languages such as SPARQL. TISP++ (Table Interpretation with Sibling Pages) automatically generates an OWL ontology from the table conceptualizations. This ontology contains
- OWL classes for tables
- object properties that describe table nesting and relationships between tables
- datatype properties that describe labels, and
- constraints (optional and functional), if applicable.
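One possible mapping from a table conceptualization to such an ontology is sketched below. It emits Turtle text using only the standard library. The `Table` and `Label` classes, the `http://example.org/tisp#` namespace, the property-naming scheme, and the `xsd:string` range are hypothetical choices made for illustration; none of them is TISP++'s actual output format. In this sketch, each table becomes an `owl:Class`, each nested table yields an `owl:ObjectProperty`, and each label becomes an `owl:DatatypeProperty`. A functional label is also typed `owl:FunctionalProperty`, and a non-optional label gets an `owl:minCardinality 1` restriction. Relations between tables other than nesting are not modeled, and label names are assumed to already be valid Turtle local names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Label:
    """A category label that becomes a datatype property (hypothetical model)."""
    name: str
    functional: bool = True   # at most one value per table instance
    optional: bool = False    # value may be absent on some sibling pages


@dataclass
class Table:
    """A table conceptualization: its labels plus nested sub-tables (hypothetical model)."""
    name: str
    labels: List[Label] = field(default_factory=list)
    nested: List["Table"] = field(default_factory=list)


PREFIXES = (
    "@prefix : <http://example.org/tisp#> .\n"
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def to_turtle(table: Table) -> str:
    """Serialize a table conceptualization (and its nested tables) as OWL in Turtle."""
    lines: List[str] = []

    def emit(t: Table) -> None:
        lines.append(f":{t.name} a owl:Class .")
        for lab in t.labels:
            prop = f":{t.name}_{lab.name}"
            types = "owl:DatatypeProperty, owl:FunctionalProperty" if lab.functional else "owl:DatatypeProperty"
            lines.append(f"{prop} a {types} ; rdfs:domain :{t.name} ; rdfs:range xsd:string .")
            if not lab.optional:
                # Mandatory label: every instance must have at least one value.
                lines.append(
                    f":{t.name} rdfs:subClassOf [ a owl:Restriction ; owl:onProperty {prop} ; "
                    f'owl:minCardinality "1"^^xsd:nonNegativeInteger ] .'
                )
        for child in t.nested:
            # Table nesting becomes an object property from parent class to child class.
            lines.append(
                f":has{child.name} a owl:ObjectProperty ; rdfs:domain :{t.name} ; rdfs:range :{child.name} ."
            )
            emit(child)

    emit(table)
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    car = Table(
        "Car",
        labels=[Label("Price"), Label("Color", optional=True)],
        nested=[Table("Feature", labels=[Label("Name", functional=False)])],
    )
    print(to_turtle(car))
```

Emitting plain Turtle text keeps the sketch dependency-free; a real generator would more likely build the graph with an RDF library and let it handle serialization and name escaping.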