# Evaluation Without Ground Truth in Social Media Research

Zafarani, Reza, and Huan Liu. Evaluation Without Ground Truth in Social Media Research.

*Communications of the ACM* 58, no. 6 (May 2015): 54–60. doi:10.1145/2666680.

### Summary

This paper discusses strategies for evaluating research findings in settings where no ground truth (or gold standard) is available, covering the following three types of evaluations:

- Spatiotemporal - predict when and/or where things are going to happen
- Causality - why events are happening, and
- Outcome - predicting a certain outcome

#### Spatiotemporal evaluation

- draw upon the periodicity of individual behavior (i.e. use behavior observed in the first *n* days as training to evaluate the next *m* days)
- crowdsourcing
- ensemble methods (combine statistically independent methods)
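
The periodicity idea above can be sketched as a temporal hold-out: fit on the first *n* days and score on the following *m* days. This is only an illustration under stated assumptions, not the paper's procedure: the check-in data is synthetic, and the function names (`temporal_split`, `fit_weekday_model`, `accuracy`) and the "most frequent place per weekday" model are hypothetical choices made for the example.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def temporal_split(checkins, start, n_days, m_days):
    """Split (day, place) records into the first n days and the next m days."""
    train_end = start + timedelta(days=n_days)
    test_end = train_end + timedelta(days=m_days)
    train = [c for c in checkins if start <= c[0] < train_end]
    test = [c for c in checkins if train_end <= c[0] < test_end]
    return train, test


def fit_weekday_model(train):
    """Assumed periodic model: the most frequent place for each weekday."""
    by_weekday = defaultdict(Counter)
    for day, place in train:
        by_weekday[day.weekday()][place] += 1
    return {wd: counts.most_common(1)[0][0] for wd, counts in by_weekday.items()}


def accuracy(model, test):
    """Fraction of held-out records whose place matches the weekday prediction."""
    if not test:
        return 0.0
    hits = sum(1 for day, place in test if model.get(day.weekday()) == place)
    return hits / len(test)


# Synthetic, perfectly periodic user: "office" Monday to Friday, "home" otherwise.
start = date(2015, 1, 5)
checkins = []
for i in range(21):
    day = start + timedelta(days=i)
    checkins.append((day, "office" if day.weekday() < 5 else "home"))

train, test = temporal_split(checkins, start, n_days=14, m_days=7)
model = fit_weekday_model(train)
score = accuracy(model, test)
print(len(train), len(test), score)
```

The later days act as a stand-in for ground truth here, which only works to the extent that the behavior really is periodic.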

#### Causality evaluation

- controlled experiments (two groups: control and treatment)
- natural experiments
  - use a *naturally* occurring event to measure the influence of an independent variable
  - temporal shuffling (i.e. use control groups from the time before an influencing independent variable became active)
- nonequivalent control - select a control group that can be considered similar to a randomized group
- Granger (pseudo) causality - i.e. consider a variable *X* causal for *Y* if past values of *X* improve the prediction of *Y* compared to using past values of *Y* alone
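
The Granger idea above amounts to comparing two nested regressions: one predicting *Y* from its own past, and one that also sees the past of *X*. The following is a minimal sketch under stated assumptions: the series are synthetic, the helper names (`solve_least_squares`, `rss`, `granger_f_statistic`) are hypothetical, and the small pure-Python least-squares solver is there only to keep the example self-contained. A real analysis would also turn the F statistic into a p-value and choose the lag with care.

```python
import random


def solve_least_squares(rows, targets):
    """Fit OLS coefficients via the normal equations (Gauss-Jordan elimination)."""
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def rss(rows, targets):
    """Residual sum of squares of the OLS fit."""
    coef = solve_least_squares(rows, targets)
    return sum(
        (t - sum(c * v for c, v in zip(coef, row))) ** 2
        for row, t in zip(rows, targets)
    )


def granger_f_statistic(x, y, lag=1):
    """Compare 'past of y only' against 'past of y plus past of x'."""
    times = range(lag, len(y))
    targets = [y[t] for t in times]
    restricted = [[1.0] + [y[t - k] for k in range(1, lag + 1)] for t in times]
    full = [
        row + [x[t - k] for k in range(1, lag + 1)]
        for row, t in zip(restricted, times)
    ]
    rss_r, rss_f = rss(restricted, targets), rss(full, targets)
    dof = len(targets) - (2 * lag + 1)  # observations minus full-model parameters
    f_stat = ((rss_r - rss_f) / lag) / (rss_f / dof)
    return rss_r, rss_f, f_stat


# Synthetic data in which x drives y with a one-step delay.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(300)]
y = [0.0]
for t in range(1, 300):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + random.gauss(0, 0.1))

rss_r, rss_f, f_stat = granger_f_statistic(x, y, lag=1)
_, _, f_reverse = granger_f_statistic(y, x, lag=1)
print(f"x -> y: RSS {rss_r:.2f} -> {rss_f:.2f}, F = {f_stat:.1f}")
print(f"y -> x: F = {f_reverse:.2f}")
```

A large F statistic in one direction and a small one in the other is the pattern the bullet describes. It shows predictive usefulness only, which is why the notes call it pseudo causality.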

#### Outcome evaluation

Outcome evaluations consider three different aspects:

- estimating the magnitude, i.e. the size of the population which has observed the contagion
  - network scale-up methods - estimate the number of affected people based on (i) the ratio of *people known to be affected* to *people known*, and (ii) the population size
  - mark and recapture - estimate the size of a population based on the number of marked and re-captured marked individuals
- estimating sample accuracy
- estimating the outcome
  - obtain feedback from other (correlated) indicators (e.g. Alexa page rank rather than page hits)
  - modularity for community-detection problems
  - perform controlled experiments (e.g. A/B testing, i.e. a fraction of the population uses the new user interface while the other fraction acts as control group)
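
The two magnitude estimators mentioned above, network scale-up and mark and recapture, both reduce to simple ratios. The sketch below uses the basic textbook forms (for mark and recapture, the Lincoln-Petersen estimator). The function names and all numbers are made up for illustration, and the paper may use refined variants.

```python
def network_scale_up(respondents, population_size):
    """Scale the affected share of respondents' contacts up to the population.

    respondents: list of (people_known, people_known_to_be_affected) pairs.
    """
    total_known = sum(known for known, _ in respondents)
    total_affected = sum(affected for _, affected in respondents)
    if total_known == 0:
        raise ValueError("respondents must know at least one person in total")
    return population_size * total_affected / total_known


def lincoln_petersen(marked_first, caught_second, recaptured):
    """Estimate population size from one marking round and one recapture round."""
    if recaptured == 0:
        raise ValueError("no marked individuals recaptured; estimate undefined")
    return marked_first * caught_second / recaptured


# Hypothetical survey: three respondents, 12 affected among 600 known people.
affected_estimate = network_scale_up(
    [(200, 4), (150, 3), (250, 5)], population_size=1_000_000
)

# Hypothetical sampling: 100 marked, 80 caught later, 20 of them already marked.
population_estimate = lincoln_petersen(
    marked_first=100, caught_second=80, recaptured=20
)

print(affected_estimate, population_estimate)
```

Both estimators assume the sampled contacts or captures are representative of the whole population, so in practice they give a rough magnitude, not an exact count.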