AGDISTIS - Graph-Based Disambiguation of Named Entities Using Linked Data

1 minute read

Summary

This paper introduces a graph-based disambiguation approach for named entity linking that achieves higher F-measures than the state of the art and a quadratic time complexity. The authors consider named entity linking an optimization problem which optimizes the assignment \(\mu^*\) with

\(\mu^* = \underset{\mu}{\text{arg max}} (\psi(\mu(C,N), N) + \phi(\mu(C,N),K))\).

Method

The approach use named entity recognition to extract named entities from the text.
A coherence function \(\psi\) determines candidate entities C in the knowledge base K (e.g. based on a string similarity function such as trigram similarity and known surface forms).
An **expansion policy ensures the mapping of mentions that are substrings of other mentions to the same resource (e.g. Obama and Barack are mapped to Barack Obama, if the later has been mentioned earlier).
Computation of the optimal assignment of candidate nodes to named entities using a disambiguation graph \(G_d\) and the HITS algorithm.

Evaluation

The evaluation has been performed with the (i) English and German DBpedia, and (ii) English YAGO 2 knowledge bases on a total of six different evaluation corpora yielding the following insights:</p>

the search depth d (i.e. number of links considered in the disambiguation graph) has only a limited influence on the algorithm’s performance (unless d=0 is chosen).
using string similarity only (rather than surface forms) considerably reduces performance
expanding surface forms (e.g. Obama to Barack Obama if the later form has occured earlier in the text) boosts the F-measure of AGDISTIS by 4%

Share on

Twitter Facebook LinkedIn

Albert Weichselbraun

AGDISTIS - Graph-Based Disambiguation of Named Entities Using Linked Data

Summary

Method

Evaluation

Share on

You may also enjoy

Big, Linked Geospatial Data and Its Application in Earth Observation

Employment relations: a data driven analysis of job markets using online job boards and online professional networks

Suffix array

Dynamic feature scaling for online learning of binary classifiers