Programming Collective Intelligence
by Toby Segaran
An excellent guide to programming Web 2.0 applications, with code examples and clear explanations of the techniques used.
Similarity Metrics
- Euclidean distance
- Pearson coefficient (corrects for grade inflation, i.e. users who give consistently higher or lower ratings)
- Tanimoto coefficient $$\frac{|A \cap B|}{|A \cup B|}$$
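
The three metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: the function names, the dictionary-of-ratings layout, and the sample data are assumptions made for the example. The Euclidean variant maps the distance to a similarity of `1 / (1 + distance)` so that identical ratings score 1.

```python
from math import sqrt


def euclidean_similarity(a, b):
    """Similarity in (0, 1] derived from the Euclidean distance over shared items."""
    shared = [k for k in a if k in b]
    if not shared:
        return 0.0
    dist = sqrt(sum((a[k] - b[k]) ** 2 for k in shared))
    return 1.0 / (1.0 + dist)


def pearson(a, b):
    """Pearson correlation over shared items; insensitive to a constant rating offset."""
    shared = [k for k in a if k in b]
    n = len(shared)
    if n == 0:
        return 0.0
    mean_a = sum(a[k] for k in shared) / n
    mean_b = sum(b[k] for k in shared) / n
    cov = sum((a[k] - mean_a) * (b[k] - mean_b) for k in shared)
    var_a = sum((a[k] - mean_a) ** 2 for k in shared)
    var_b = sum((b[k] - mean_b) ** 2 for k in shared)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for constant ratings
    return cov / sqrt(var_a * var_b)


def tanimoto(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical ratings: bob rates every film exactly 1.5 points higher than alice.
alice = {"film1": 1.0, "film2": 2.0, "film3": 3.0}
bob = {"film1": 2.5, "film2": 3.5, "film3": 4.5}

euclid = euclidean_similarity(alice, bob)  # penalised by the constant offset
corr = pearson(alice, bob)                 # unaffected by the constant offset
overlap = tanimoto({"a", "b", "c"}, {"b", "c", "d"})
```

With this sample data the Pearson score is 1 even though bob's ratings are uniformly higher, while the Euclidean similarity is well below 1, which is the grade-inflation effect mentioned in the list.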
 
Clustering
The author applies Fick's law to clustering (only terms occurring in more than 10% and fewer than 50% of the documents are clustered).
- hierarchical clustering
- k-means
- Multidimensional scaling (the distance between items in the low-dimensional layout approximates their pairwise dissimilarity)
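
The term filter applied before clustering can be sketched as follows. This is an illustrative sketch under stated assumptions: `select_terms`, its `low` and `high` parameters (document fractions, here 0.1 and 0.5), and the toy corpus are hypothetical names and data, not taken from the book.

```python
def select_terms(documents, low=0.1, high=0.5):
    """Return the terms that occur in more than `low` and fewer than `high`
    of the documents, both given as fractions of the corpus size."""
    doc_count = {}
    for words in documents:
        for word in set(words):  # count each term once per document
            doc_count[word] = doc_count.get(word, 0) + 1
    total = len(documents)
    return sorted(w for w, c in doc_count.items() if low < c / total < high)


# Toy corpus of 10 documents (hypothetical data).
docs = [["the", "data"] for _ in range(10)]
docs[0].append("zebra")        # appears in 1 of 10 documents: too rare
for doc in docs[:3]:
    doc.append("cluster")      # appears in 3 of 10 documents: kept

kept = select_terms(docs)
```

Very common terms ("the", "data") and very rare ones ("zebra") are dropped, so only terms that can actually distinguish groups of documents reach the clustering step.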
 
Search engines
The book presents weighting techniques for search engines, including:
- Number of occurrences
- Document location (terms appearing early in a document get higher weights)
- Word distance (for multi-term queries, terms that appear close together score higher)
- PageRank
- Link text (higher weights for terms occurring in links)
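
As a rough sketch of how such weights can be combined, the example below scores pages by number of occurrences and by document location, normalises each score to the range 0 to 1, and sums them. The function names, the `weights` parameter, and the sample pages are assumptions for illustration only; the other techniques in the list (word distance, PageRank, link text) would plug in as additional score functions.

```python
def normalize(scores, small_is_better=False):
    """Scale scores so that the best page gets 1.0."""
    tiny = 1e-5  # guards against division by zero
    if small_is_better:
        best = min(scores.values())
        return {url: best / max(tiny, s) for url, s in scores.items()}
    best = max(scores.values()) or tiny
    return {url: s / best for url, s in scores.items()}


def frequency_score(pages, terms):
    """Number of occurrences of the query terms in each page."""
    return {url: sum(words.count(t) for t in terms) for url, words in pages.items()}


def location_score(pages, terms):
    """Sum of the 1-based positions of each term's first occurrence (lower is better)."""
    return {url: sum(words.index(t) + 1 for t in terms) for url, words in pages.items()}


def rank(pages, terms, weights=(1.0, 1.0)):
    """Return (url, score) pairs, best first, for pages containing every term."""
    matches = {u: w for u, w in pages.items() if all(t in w for t in terms)}
    if not matches:
        return []
    freq = normalize(frequency_score(matches, terms))
    loc = normalize(location_score(matches, terms), small_is_better=True)
    total = {u: weights[0] * freq[u] + weights[1] * loc[u] for u in matches}
    return sorted(total.items(), key=lambda item: item[1], reverse=True)


# Hypothetical pages, already split into words.
pages = {
    "a.html": "python search engine python ranking".split(),
    "b.html": "notes on ranking and a python search".split(),
    "c.html": "cooking recipes".split(),
}

results = rank(pages, ["python"])
```

Here `a.html` ranks first because it mentions the term more often and earlier than `b.html`, while `c.html` is excluded because it does not contain the term at all.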