Collective Intelligence

by Toby Segaran

An excellent guide to programming Web 2.0 applications, with code examples and clear explanations of the techniques used.

Similarity Metrics

  • Euclidean distance
  • Pearson coefficient (corrects for grade inflation, i.e. users who give consistently higher or lower ratings)
  • Tanimoto coefficient $$\frac{|A \cap B|}{|A \cup B|}$$
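
The three metrics above can be sketched in a few lines of Python. This is a minimal illustration rather than the book's own code: it assumes each user's ratings are stored as a dict mapping item to score, and the function names are made up for this note. The Euclidean distance is turned into a similarity with 1 / (1 + distance), which is one common convention.

```python
from math import sqrt


def euclidean_similarity(a, b):
    """Similarity in (0, 1] derived from the Euclidean distance over shared items."""
    shared = [k for k in a if k in b]
    if not shared:
        return 0.0
    dist = sqrt(sum((a[k] - b[k]) ** 2 for k in shared))
    return 1.0 / (1.0 + dist)


def pearson(a, b):
    """Pearson correlation over shared items; insensitive to a constant rating offset."""
    shared = [k for k in a if k in b]
    n = len(shared)
    if n == 0:
        return 0.0
    mean_a = sum(a[k] for k in shared) / n
    mean_b = sum(b[k] for k in shared) / n
    cov = sum((a[k] - mean_a) * (b[k] - mean_b) for k in shared)
    var_a = sum((a[k] - mean_a) ** 2 for k in shared)
    var_b = sum((b[k] - mean_b) ** 2 for k in shared)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for constant ratings
    return cov / sqrt(var_a * var_b)


def tanimoto(a, b):
    """Size of the intersection divided by the size of the union of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With `alice = {"A": 1, "B": 2, "C": 3}` and `bob = {"A": 3, "B": 4, "C": 5}`, Bob rates everything two points higher than Alice. The Euclidean similarity is low because the raw scores differ, while `pearson(alice, bob)` is 1.0 because the two users agree on the ordering.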

Clustering

The author applies Fick's law to clustering: only terms occurring in more than 10% and fewer than 50% of the documents are clustered.

  • hierarchical clustering
  • k-means
  • Multidimensional scaling (items are laid out so that the distances between them reflect how closely they are related)
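
As a rough sketch of two of the ideas in this section, the code below filters terms by document frequency and then runs a plain k-means. Both functions and their parameter names (`low`, `high`, `seed`) are assumptions made for this note, not the book's code. The filter keeps terms that appear in more than a fraction `low` and less than a fraction `high` of the documents, and the k-means uses squared Euclidean distance on tuples of numbers.

```python
import random


def filter_terms(docs, low=0.1, high=0.5):
    """Keep terms whose document frequency lies strictly between low and high.

    docs is a list of documents, each given as a list of words.
    """
    n = len(docs)
    counts = {}
    for doc in docs:
        for term in set(doc):  # count each term once per document
            counts[term] = counts.get(term, 0) + 1
    return sorted(t for t, c in counts.items() if low < c / n < high)


def kmeans(points, k, iterations=20, seed=0):
    """Return a cluster index for each point (points are tuples of numbers)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment
```

The filter drops words that are too common to distinguish documents as well as words too rare to link them. Hierarchical clustering differs from k-means in that it needs no `k` up front, at the cost of comparing every pair of items.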

Search engines

The book presents weighting techniques for search engines, including:

  • Number of occurrences
  • Document location (early words have higher weights)
  • Word distance (for multiple terms)
  • PageRank
  • Link text (higher weights for terms occurring in links)
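
To make the weighting idea concrete, here is a small sketch that combines the first three signals into one score and computes a simple PageRank. It assumes a toy in-memory corpus (a dict from document id to a list of lowercase words) and a dict of outgoing links. The function names, the 1 / (1 + x) transform for "smaller is better" signals, and the un-normalized PageRank form starting at 1.0 are choices made for this note. Link-text scoring is left out.

```python
def normalize(scores, small_is_better=False):
    """Rescale raw scores to [0, 1] so that 1 is always the best value."""
    if small_is_better:
        scores = {doc: 1.0 / (1.0 + s) for doc, s in scores.items()}
    top = max(scores.values(), default=0.0)
    return {doc: (s / top if top else 0.0) for doc, s in scores.items()}


def positions(words, term):
    """All indexes at which term occurs in words."""
    return [i for i, w in enumerate(words) if w == term]


def rank(docs, query, weights=(1.0, 1.0, 1.0)):
    """Rank the documents that contain every query term, best first."""
    terms = query.lower().split()
    hits = {doc: [positions(words, t) for t in terms] for doc, words in docs.items()}
    hits = {doc: pos for doc, pos in hits.items() if all(pos)}
    if not hits:
        return []
    # Number of occurrences: more matches are better.
    frequency = {doc: sum(len(p) for p in pos) for doc, pos in hits.items()}
    # Document location: earlier first occurrences are better.
    location = {doc: sum(p[0] for p in pos) for doc, pos in hits.items()}
    # Word distance: query terms close together are better (0 for one term).
    distance = {
        doc: sum(min(abs(i - j) for i in a for j in b) for a, b in zip(pos, pos[1:]))
        for doc, pos in hits.items()
    }
    parts = [normalize(frequency), normalize(location, True), normalize(distance, True)]
    total = {doc: sum(w * part[doc] for w, part in zip(weights, parts)) for doc in hits}
    return sorted(total.items(), key=lambda item: item[1], reverse=True)


def pagerank(links, damping=0.85, iterations=20):
    """Iterative PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    ranks = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_ranks = {}
        for p in pages:
            # Each linking page passes on its rank, split across its outgoing links.
            inbound = sum(ranks[src] / len(targets)
                          for src, targets in links.items() if p in targets)
            new_ranks[p] = (1 - damping) + damping * inbound
        ranks = new_ranks
    return ranks
```

Normalizing each signal to the same range before weighting keeps any single signal from dominating merely because of its scale. A PageRank value could be added to `rank` as a fourth normalized component in the same way.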