# Thematic Exploration of Linked Data

by Castano et al.; Very Large Data Search (VLDS) 2011

This article addresses the problem of organizing linked data, which is inherently flat, into a representation that is easier to browse and that the authors call an inCloud.

Creating an inCloud requires:

- computing the **similarity between nodes** with a *string metric* (the Dice coefficient) over the terms $$\mathit{Term}_i$$ appearing in a node $$n_i$$ and in all of its adjacent nodes;
- clustering (thematic aggregation) based on the **interconnectivity** between nodes, using the clique percolation method (CPM), which builds each cluster from adjacent, overlapping cliques, so that clusters may themselves overlap;
- computing descriptive terms for each clique by identifying its frequent node types (`rdf:type`).
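The similarity step can be sketched as follows, assuming each node carries a set of terms and that the Dice coefficient $$2|A \cap B| / (|A| + |B|)$$ is applied to the union of a node's own terms with those of its neighbours. The helper names (`neighbourhood_terms`, `node_similarity`) and the example data are hypothetical; the paper may tokenize, normalize, or weight terms differently (for instance, applying Dice to character bigrams).

```python
def neighbourhood_terms(node, terms, adjacency):
    """Terms of `node` plus the terms of every node adjacent to it."""
    result = set(terms[node])
    for neighbour in adjacency.get(node, ()):
        result |= set(terms[neighbour])
    return result


def dice(a, b):
    """Dice coefficient of two sets: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def node_similarity(n1, n2, terms, adjacency):
    """Hypothetical node similarity: Dice over neighbourhood-extended term sets."""
    return dice(neighbourhood_terms(n1, terms, adjacency),
                neighbourhood_terms(n2, terms, adjacency))


# Illustrative data: terms per node and an undirected adjacency list.
terms = {
    "a": {"jazz", "music"},
    "b": {"music", "album"},
    "c": {"football"},
    "d": {"album", "rock"},
}
adjacency = {"a": ["b"], "b": ["a", "d"], "c": [], "d": ["b"]}

print(dice(terms["a"], terms["b"]))                 # 0.5 (own terms only)
print(node_similarity("a", "b", terms, adjacency))  # 6/7, about 0.857
```

Comparing only the nodes' own terms gives 0.5 for `a` and `b`; including adjacent nodes raises the score to 6/7 because the two nodes now share the context contributed by their links.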

The **prominence** of a cluster is determined by considering:

- the *cluster variability*, i.e., the degree of overlap among the cliques in that cluster (lower overlap means a better-defined and more consistent topic);
- the *cluster density*, where a higher density indicates a more focused and homogeneous discussion of the topic.
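These notes do not give a formula for variability. As one hypothetical way to quantify "degree of overlap among the cliques", the sketch below averages the pairwise Jaccard similarity of a cluster's cliques; both the measure and the name `clique_overlap` are assumptions, not the paper's definition.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two node sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def clique_overlap(cliques):
    """Hypothetical variability score: mean pairwise Jaccard overlap of a cluster's cliques.

    Lower values mean the cliques overlap less.
    """
    sets = [frozenset(c) for c in cliques]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0  # a single clique has nothing to overlap with
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Three 3-cliques forming a chain: {1,2,3} - {2,3,4} - {3,4,5}
print(clique_overlap([{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]))  # about 0.4 = (0.5 + 0.2 + 0.5) / 3
```

A mean over all pairs keeps the score in [0, 1] regardless of how many cliques a cluster contains, which makes clusters of different sizes comparable.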

## Graph Theory

The density $$d_i$$ of cluster $$i$$ is the ratio between the number of links $$R_i$$ inside the cluster and the maximum number of links possible among its $$N_i$$ nodes:

\[ d_i = \frac{2\cdot R_i}{N_i(N_i-1)} \]
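The density formula translates directly into code. This sketch assumes an undirected cluster given as a node set plus an edge list (the function name `cluster_density` is hypothetical); duplicate edges, self-loops, and links leaving the cluster are not counted in $$R_i$$.

```python
def cluster_density(nodes, edges):
    """Cluster density d_i = 2 R_i / (N_i (N_i - 1)) for an undirected cluster.

    R_i counts distinct links with both endpoints inside the cluster;
    self-loops and links leaving the cluster are ignored.
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0  # formula undefined for fewer than two nodes; return 0 by convention
    links = {frozenset((u, v)) for u, v in edges
             if u != v and u in nodes and v in nodes}
    return 2 * len(links) / (n * (n - 1))


cluster = {"a", "b", "c", "d"}
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "x")]
print(cluster_density(cluster, edges))          # 2*4 / (4*3), about 0.667
print(cluster_density({"a", "b", "c"}, edges))  # triangle: 1.0
```

With four nodes and four internal links, $$d = 2\cdot 4/(4\cdot 3) = 2/3$$; a triangle is a clique and therefore has the maximum density of 1.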

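A **clique** is a subset of vertices such that every two distinct vertices in it are connected by an edge. In CPM, two cliques of size $$k$$ ($$k$$-cliques) are adjacent if they share $$k-1$$ vertices, and a cluster is the union of all $$k$$-cliques reachable from one another through such adjacencies. A pre-print of the Nature article describing the clique percolation method (CPM) is also available.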
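To make the clique definition and CPM concrete, here is a brute-force sketch for small graphs, assuming an adjacency-set representation with sortable node labels. It enumerates all $$k$$-cliques, treats two of them as adjacent when they share $$k-1$$ vertices, and merges connected $$k$$-cliques with a union-find. The function names are illustrative; for larger graphs, NetworkX provides `networkx.algorithms.community.k_clique_communities`.

```python
from itertools import combinations


def is_clique(vertices, adjacency):
    """True if every two distinct vertices are connected by an edge."""
    return all(v in adjacency[u] for u, v in combinations(vertices, 2))


def cpm_communities(adjacency, k):
    """Brute-force clique percolation (exponential; small graphs only).

    Two k-cliques are adjacent when they share k-1 vertices; each community
    is the union of k-cliques connected through such adjacencies, so
    communities may overlap.
    """
    cliques = [frozenset(c) for c in combinations(sorted(adjacency), k)
               if is_clique(c, adjacency)]

    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return [frozenset(c) for c in communities.values()]


edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

print(is_clique((1, 2, 3), adjacency))  # True
print(is_clique((1, 2, 4), adjacency))  # False (no edge 1-4)
print(cpm_communities(adjacency, 3))
```

With $$k = 3$$, the triangles {1,2,3} and {2,3,4} share two vertices and percolate into one cluster, while {4,5,6} shares only vertex 4 with them and forms a second cluster; the two clusters overlap at vertex 4.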