From Names to Entities using Thematic Context Distance


Pilz, A., & Paaß, G. (2011). From Names to Entities Using Thematic Context Distance. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM), pp. 857–866.


Named entity disambiguation aims to link mentions of named entities to unique entities within a knowledge repository such as DBpedia. Miller and Charles (1991) have shown that words with similar meanings usually appear in similar contexts. Hence, many techniques rely on a mention's context for disambiguation. However, approaches that compare words rather than topics are not robust when the same subject is described with different wording.

Topic models address this problem by using topics rather than words to describe the entity context. The research presented in this paper uses LDA to create such a model and applies it to the English, German and French versions of Wikipedia.


Given a name mention $$m$$ and its surrounding text $$T(m)$$, named entity disambiguation aims to link the mention to the correct entity $$e(m)$$ out of the set of candidate entities $$\epsilon(m) = \{e_1, \ldots, e_{|\epsilon(m)|}\}$$ with $$name(e_i) = m$$. The method uses the Wikipedia text $$T(e_j)$$ of entity $$e_j$$ to assess whether $$e_j$$ is a good candidate for $$e(m)$$.

The authors use the complete Wikipedia article as the context $$T(m)$$, since it yields better results than a sliding window. Named entity disambiguation is then formulated as the following binary classification problem:

$$y(\vec{\phi}(m, e_j)) = \begin{cases} +1, & \text{if } e(m) = e_j\\ -1, & \text{otherwise.}\end{cases}$$

The entity with the highest score $$y(\vec{\phi}(m, e_j))$$ is considered the most likely entity for the mention $$m$$, and the difference between the best and second-best scores indicates the confidence of the decision. The paper uses LDA topic probabilities as features to train an SVM, which is then used in the evaluation.
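The selection step described above can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `pick_entity`, the candidate names and the score values are made up, and the scores are assumed to be real-valued classifier outputs (for example SVM decision values), one per candidate entity. Returning an infinite margin for a single candidate is also an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple


def pick_entity(scores: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Return the top-scoring candidate and its margin over the runner-up.

    `scores` maps each candidate entity to a real-valued classifier score.
    The margin (best minus second best) serves as a confidence estimate.
    """
    if not scores:
        return None, 0.0
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_entity, best_score = ranked[0]
    if len(ranked) == 1:
        # Assumption of this sketch: a lone candidate has no competitor.
        return best_entity, float("inf")
    return best_entity, best_score - ranked[1][1]


# Hypothetical scores for the mention "Apple" (values are invented).
scores = {"Apple_Inc": 1.3, "Apple_(fruit)": -0.4, "Apple_Records": -0.9}
entity, confidence = pick_entity(scores)
```

A small margin signals an uncertain decision; a threshold on it could be used to reject a link, although the summary does not say whether the paper does this.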

Background: Latent Dirichlet Allocation (LDA)

  1. LDA generates a low-dimensional representation of sparse, high-dimensional data (i.e. it converts the high-dimensional word vectors used in the vector space model into low-dimensional topic vectors).
  2. It is a Bayesian probabilistic model which describes each document d as a mixture of topics $$t_i$$.
  3. The resulting word distributions $$p(w_n|z_n, \beta)$$ for each topic $$z_n$$ assign high probabilities to co-occurring words. They therefore address the problems of polysemy and synonymy.
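To make the "mixture of topics" idea concrete, the sketch below computes a document's word distribution from its topic proportions. All numbers are hand-picked toy values, not results from the paper, and the names `beta`, `theta_doc` and `word_probs` are chosen for this illustration only. No inference is performed; in practice both quantities would be estimated from a corpus.

```python
# Toy model: 2 topics over a 4-word vocabulary (hand-picked numbers).
vocab = ["car", "automobile", "bank", "loan"]

# beta[t][w] = p(word w | topic t); each row sums to 1.
beta = [
    [0.45, 0.45, 0.05, 0.05],  # topic 0: vehicles
    [0.05, 0.05, 0.45, 0.45],  # topic 1: finance
]


def word_probs(theta, beta):
    """p(w | d) = sum over topics t of p(t | d) * p(w | t)."""
    n_topics, n_words = len(beta), len(beta[0])
    return [
        sum(theta[t] * beta[t][w] for t in range(n_topics))
        for w in range(n_words)
    ]


# theta_doc[t] = p(topic t | document): a 2-dimensional topic vector
# standing in for the 4-dimensional word vector of the document.
theta_doc = [0.9, 0.1]
p_w = word_probs(theta_doc, beta)
```

Because "car" and "automobile" are both likely under the same topic, a document about vehicles receives a similar topic vector whichever of the two words it uses, which is the sense in which topics smooth over synonymy.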

Background: Thematic Distance

The authors discuss the following options for computing thematic distances:

  1. use the document topic probabilities as features for a machine learning algorithm such as SVM
  2. symmetric Kullback-Leibler divergence
  3. Hellinger distance
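The two distribution-based distances can be sketched as follows. The exact variants used in the paper are not given in this summary, so the definitions here are assumptions: the symmetric Kullback-Leibler divergence is taken as the average of the two directed divergences, the Hellinger distance includes the factor that bounds it by 1, and a small `eps` is added to avoid division by zero.

```python
import math
from typing import Sequence


def kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Directed KL divergence KL(p || q); eps guards against zero entries."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Average of KL(p || q) and KL(q || p) (one common symmetrisation)."""
    return 0.5 * (kl(p, q) + kl(q, p))


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance, scaled to lie between 0 and 1."""
    return math.sqrt(
        0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    )


# Hypothetical topic distributions: a mention context and two candidates.
mention = [0.7, 0.2, 0.1]
close_entity = [0.6, 0.3, 0.1]
far_entity = [0.1, 0.2, 0.7]
```

With these toy vectors, both measures place `close_entity` nearer to `mention` than `far_entity`.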


Evaluation

  1. The authors use links to other Wikipedia articles within Wikipedia as reference data (i.e. a mention linking to Apple_Inc is considered to refer to the corresponding company).
  2. Non-covered entities are simulated by removing a fixed fraction of entities from the test set and assigning them to the "unknown" class.
  3. They use approaches which apply (i) a cosine similarity measure and (ii) word-category pairs as baseline methods.
  4. The evaluation shows that the trained disambiguation models obtain good results for all three languages.
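For comparison with the topic-based approach, a word-overlap baseline of the kind mentioned above can be sketched as follows. This is a minimal illustration, assuming whitespace tokenisation and raw term counts; the baseline in the paper may use a different weighting, and the texts and entity names below are invented.

```python
import math
from collections import Counter


def cosine(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical mention context and candidate entity texts.
context = "the company released a new phone"
candidates = {
    "Apple_Inc": "technology company that designs phone and computer products",
    "Apple_(fruit)": "edible fruit produced by an apple tree",
}
best = max(candidates, key=lambda e: cosine(context, candidates[e]))
```

The sketch also shows the weakness noted at the start of this summary: if the context said "firm" and "handset" instead of "company" and "phone", the overlap would drop to zero even though the topic is unchanged.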