Thematic Exploration of Linked Data


by Castano et al.; Very Large Data Search (VLDS) 2011

This article addresses the problem of organizing linked data, which is inherently flat, into a more easily browsable representation that the authors call inCloud.

Creating an inCloud requires:

a computation of the similarity between nodes based on a string metric (the Dice coefficient) that considers the terms Term$$_i$$ appearing in a node $$n_i$$ and in all of its adjacent nodes.
clustering (thematic aggregation) based on the interconnectivity between nodes, using the clique percolation method (CPM), which yields multiple, potentially overlapping cliques.
the computation of descriptive terms for each clique by identifying frequent node types (rdf:type).
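The similarity step can be illustrated with a minimal Python sketch. It assumes a set-based Dice coefficient over whole terms, which may differ from the exact string metric used in the paper; the names `dice_coefficient`, `node_terms`, `own_terms` and `adjacency` are hypothetical and chosen only for this example.

```python
def dice_coefficient(terms_a, terms_b):
    """Set-based Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty term sets
    return 2 * len(a & b) / (len(a) + len(b))


def node_terms(node, own_terms, adjacency):
    """Terms of a node together with the terms of all adjacent nodes."""
    terms = set(own_terms.get(node, ()))
    for neighbour in adjacency.get(node, ()):
        terms |= set(own_terms.get(neighbour, ()))
    return terms


# Toy data (invented for illustration, not taken from the paper).
own_terms = {
    "a": {"jazz", "music"},
    "b": {"music", "band"},
    "c": {"jazz"},
    "d": {"football"},
}
adjacency = {"a": ["c"], "b": [], "c": ["a"], "d": []}

sim_ab = dice_coefficient(node_terms("a", own_terms, adjacency),
                          node_terms("b", own_terms, adjacency))
sim_ac = dice_coefficient(node_terms("a", own_terms, adjacency),
                          node_terms("c", own_terms, adjacency))
```

Because node c is adjacent to node a, it inherits a's terms, so the two end up with identical term sets even though c carries only one term of its own.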

The prominence of a cluster is determined by considering:

the cluster's variability, i.e. the degree of overlap among the cliques in that cluster (lower overlap = better defined and more consistent topic).
the cluster's density, where a higher density indicates a more focused and homogeneous discussion of the topic.

Graph Theory

The cluster density is the ratio between the number of links $$R_i$$ in the cluster and the maximum number of possible links among its $$N_i$$ nodes:

\[ d_i = \frac{2\cdot R_i}{N_i(N_i-1)} \]
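As a quick check of the formula, the following Python sketch computes the density of an undirected cluster. The function name `cluster_density` and the handling of clusters with fewer than two nodes are assumptions made for this example.

```python
def cluster_density(num_links, num_nodes):
    """Density d = 2R / (N(N-1)) of an undirected cluster."""
    if num_nodes < 2:
        return 0.0  # assumption: no links are possible, treat density as 0
    return 2 * num_links / (num_nodes * (num_nodes - 1))


full = cluster_density(6, 4)    # 4 nodes with all 6 possible links
sparse = cluster_density(3, 4)  # 4 nodes with only 3 links
```

A cluster of four nodes can hold at most 4 * 3 / 2 = 6 links, so six links give a density of 1 and three links give 0.5.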

A clique is a subset of vertices such that every two vertices in the clique are connected by an edge. A pre-print of the Nature article describing the clique percolation method (CPM) can be found here.
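To make the idea behind CPM concrete, here is a small brute-force Python sketch. It assumes the common formulation in which two k-cliques are adjacent when they share k-1 nodes and a community is the union of a chain of adjacent k-cliques. It is only suitable for tiny graphs and is not the implementation used in the paper; all function names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def k_cliques(adjacency, k):
    """All k-node subsets in which every pair is connected (brute force)."""
    return [
        frozenset(group)
        for group in combinations(sorted(adjacency), k)
        if all(v in adjacency[u] for u, v in combinations(group, 2))
    ]


def clique_percolation(adjacency, k=3):
    """Merge k-cliques that share k-1 nodes into communities."""
    cliques = k_cliques(adjacency, k)
    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(communities.values(), key=sorted)


# Toy graph: two triangles sharing the edge 2-3, plus a triangle on 4, 5, 6.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
communities = clique_percolation(build_adjacency(edges), k=3)
```

In this toy graph the triangles {1, 2, 3} and {2, 3, 4} share two nodes and merge into one community, while {4, 5, 6} shares only node 4 with them and stays separate. Node 4 therefore belongs to both communities, which shows how the method produces overlapping groups.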


Albert Weichselbraun
