Record Matching in Digital Library Metadata

Kan, Min-Yen and Tan, Yee Fan: Record Matching in Digital Library Metadata, Communications of the ACM, Volume 51 (2), 91-94

The article provides an excellent overview of techniques used to identify duplicate records in digital libraries. The authors present:

  • uniform string matching techniques based on set matching (the Jaccard measure, cosine similarity, degree of similarity, ...), sequence-based measures (edit distance), and hybrid approaches (e.g. set matching for single words and sequence-based measures for the terms in sentences), and
  • graphical formalisms, such as social networks, network cuts, and random walks, to distinguish different authors with the same name.
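
To make the string matching measures more concrete, here is a minimal Python sketch of a set-based measure (Jaccard over word sets), a sequence-based measure (Levenshtein edit distance), and one possible hybrid. The function names, the greedy token pairing, and the 0.8 threshold are my own assumptions for illustration and are not taken from the article.

```python
def jaccard(a: str, b: str) -> float:
    """Set-based: Jaccard similarity over lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def edit_distance(a: str, b: str) -> int:
    """Sequence-based: Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1], where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def soft_jaccard(a: str, b: str, threshold: float = 0.8) -> float:
    """Hybrid (illustrative assumption): Jaccard-style overlap where two
    words count as equal if their edit similarity reaches the threshold.
    Words are paired greedily, each word used at most once."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta and not tb:
        return 1.0
    unmatched = list(tb)
    matches = 0
    for tok in ta:
        best = max(unmatched, key=lambda u: edit_similarity(tok, u), default=None)
        if best is not None and edit_similarity(tok, best) >= threshold:
            matches += 1
            unmatched.remove(best)
    return matches / (len(ta) + len(tb) - matches)
```

Applied to the title with and without the typo "Metdata", plain `jaccard` yields 5/7 because the misspelled word does not match, while `soft_jaccard` yields 1.0 since "metdata" and "metadata" are one edit apart. This is the kind of robustness to small spelling variations that a hybrid measure is meant to provide.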