Programming Collective Intelligence
by Toby Segaran
An excellent guide to programming Web 2.0 applications, with code examples and clear explanations of the techniques used.
Similarity Metrics
- Euclidean distance
- Pearson correlation coefficient (corrects for grade inflation, i.e. users who consistently give higher or lower ratings than others)
- Tanimoto coefficient $$\frac{|A \cap B|}{|A \cup B|}$$
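The three metrics can be sketched in a few lines of Python. This is a minimal illustration rather than the book's own code: the function names, the dictionary-of-ratings input format, and the `1 / (1 + distance)` conversion from distance to similarity are assumptions made for this example.

```python
from math import sqrt


def euclidean_similarity(a, b):
    """Similarity in (0, 1] derived from Euclidean distance over shared items.

    `a` and `b` are dicts mapping item -> rating (an assumed input format).
    """
    shared = [k for k in a if k in b]
    if not shared:
        return 0.0
    dist = sqrt(sum((a[k] - b[k]) ** 2 for k in shared))
    return 1.0 / (1.0 + dist)  # identical ratings give 1.0


def pearson(a, b):
    """Pearson correlation over shared items; insensitive to a constant offset."""
    shared = [k for k in a if k in b]
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [a[k] for k in shared]
    ys = [b[k] for k in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for constant ratings
    return cov / sqrt(var_x * var_y)


def tanimoto(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical users: bob rates every item exactly one point higher than alice.
alice = {"A": 2, "B": 3, "C": 4}
bob = {"A": 3, "B": 4, "C": 5}
print(pearson(alice, bob))               # perfectly correlated despite the offset
print(euclidean_similarity(alice, bob))  # penalised by the offset
print(tanimoto({"a", "b", "c"}, {"b", "c", "d"}))
```

The example shows the grade-inflation point: the constant offset between the two users lowers the Euclidean similarity but leaves the Pearson coefficient at its maximum.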
Clustering
The author filters terms by document frequency before clustering, in the spirit of Zipf's law: only terms occurring in more than 10% and fewer than 50% of the documents are clustered.
- hierarchical clustering
- k-means
- multidimensional scaling (items are laid out in two dimensions so that the distances between them reflect how closely related the terms are)
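As a rough sketch of the frequency filter and of k-means, assuming documents are given as lists of tokens and points as numeric tuples: the names `filter_terms` and `kmeans` are hypothetical, and the deterministic choice of the first `k` points as starting centroids is a simplification for reproducibility (random initialisation is more common).

```python
def filter_terms(docs, low=0.1, high=0.5):
    """Keep terms whose document-frequency fraction lies strictly between low and high."""
    n = len(docs)
    doc_freq = {}
    for words in docs:
        for w in set(words):  # count each term at most once per document
            doc_freq[w] = doc_freq.get(w, 0) + 1
    return sorted(w for w, c in doc_freq.items() if low < c / n < high)


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns (cluster index per point, centroids)."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Toy data: two obvious groups of 2-D points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = kmeans(points, k=2)
print(labels)   # [0, 0, 1, 1]
print(centers)  # [[0.0, 0.5], [10.0, 10.5]]
```

In a text-clustering setting the points would be term-count vectors built only from the terms that survive `filter_terms`.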
Search engines
The book presents weighting techniques for search engines, including:
- Number of occurrences
- Document location (early words have higher weights)
- Word distance (for queries with multiple terms; documents where the terms appear close together score higher)
- PageRank
- Link text (higher weights for terms occurring in links)
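To illustrate how such signals might be combined, here is a minimal sketch covering occurrence count, document location, and a simplified PageRank. The function names, the in-memory `docs` and `links` dictionaries, the equal default weights, and the normalisation to a best score of 1.0 are all assumptions made for this example.

```python
def normalize(scores, small_is_better=False):
    """Rescale scores so that the best document gets 1.0."""
    if small_is_better:
        best = min(scores.values())
        return {d: best / s for d, s in scores.items()}
    best = max(scores.values())
    return {d: s / best for d, s in scores.items()}


def frequency_score(docs, terms):
    """Number of occurrences of the query terms in each document."""
    return {d: sum(words.count(t) for t in terms) for d, words in docs.items()}


def location_score(docs, terms):
    """Sum of 1-based first positions of the terms; smaller means earlier."""
    return {d: sum(words.index(t) + 1 for t in terms) for d, words in docs.items()}


def rank(docs, query, weights=(1.0, 1.0)):
    """Weighted sum of normalised frequency and location scores, best first."""
    terms = query.lower().split()
    # Only documents containing every query term are scored.
    matches = {d: w for d, w in docs.items() if all(t in w for t in terms)}
    if not matches:
        return []
    freq = normalize(frequency_score(matches, terms))
    loc = normalize(location_score(matches, terms), small_is_better=True)
    total = {d: weights[0] * freq[d] + weights[1] * loc[d] for d in matches}
    return sorted(total.items(), key=lambda kv: kv[1], reverse=True)


def pagerank(links, damping=0.85, iterations=50):
    """Simplified iterative PageRank; `links` maps page -> pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    scores = {p: 1.0 for p in pages}
    for _ in range(iterations):
        updated = {}
        for p in pages:
            inbound = [q for q in links if p in links[q]]
            updated[p] = (1 - damping) + damping * sum(
                scores[q] / len(links[q]) for q in inbound
            )
        scores = updated
    return scores


# Hypothetical three-document index, already tokenised and lower-cased.
docs = {
    "a": "python search engine tutorial".split(),
    "b": "a long intro before python is mentioned python python".split(),
    "c": "cooking recipes".split(),
}
print([d for d, _ in rank(docs, "python")])                      # ['a', 'b']
print([d for d, _ in rank(docs, "python", weights=(1.0, 0.0))])  # ['b', 'a']
```

Further signals such as word distance or link text would plug in the same way: compute a score per document, normalise it, and add it to the weighted sum.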