Evaluation Without Ground Truth in Social Media Research
Zafarani, Reza, and Huan Liu. Evaluation Without Ground Truth in Social Media Research. Communications of the ACM 58, no. 6 (May 2015): 54–60. doi:10.1145/2666680.
Summary
This paper discusses strategies for evaluating research findings in settings where no ground truth (or gold standard) is available, covering the following three types of evaluations:
- Spatiotemporal - predicting when and/or where things are going to happen
- Causality - explaining why events are happening
- Outcome - predicting a certain outcome
Spatiotemporal evaluation
- draw upon the periodicity of individual behavior (e.g. use behavior observed in the first n days as training data and evaluate predictions against the next m days).
- crowdsourcing
- ensemble methods (combine statistically independent methods)
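
The first strategy above (train on the first n days, evaluate on the next m days) can be sketched as a simple temporal holdout. This is a minimal illustration rather than the paper's procedure: the event format `(day_index, location)`, the weekday-based predictor, and all function names are assumptions made for the example.

```python
from collections import Counter


def split_by_day(events, n_train_days, m_test_days):
    """Split (day_index, location) events into the first n days and the next m days."""
    train = [e for e in events if e[0] < n_train_days]
    test = [e for e in events
            if n_train_days <= e[0] < n_train_days + m_test_days]
    return train, test


def fit_weekday_model(train):
    """Hypothetical periodic model: most frequent location per weekday slot (day % 7)."""
    counts = {}
    for day, location in train:
        counts.setdefault(day % 7, Counter())[location] += 1
    return {slot: c.most_common(1)[0][0] for slot, c in counts.items()}


def accuracy(model, test):
    """Fraction of held-out events whose location the model predicts correctly."""
    if not test:
        return 0.0
    hits = sum(1 for day, location in test if model.get(day % 7) == location)
    return hits / len(test)
```

The held-out m days play the role of the missing ground truth: the later behavior of the same individuals is used to score predictions made from their earlier behavior.
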
Causality evaluation
- controlled experiments (two groups: control and treatment)
- natural experiments
  - use a naturally occurring event to measure the influence of an independent variable
- temporal shuffling (i.e. use control groups from the time before an influencing independent variable became active)
- nonequivalent control - the control group is not chosen at random but is selected so that it resembles a randomized group
- Granger (pseudo) causality - i.e. consider a variable X causal for Y if adding past values of X improves the prediction of Y compared with using past values of Y alone
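
The Granger idea in the last bullet can be sketched by comparing two least-squares fits: one that predicts Y from its own past, and one that also uses the past of X. The version below is deliberately simplified and is an assumption of this example, not the paper's formulation: it uses a single lag and no intercept, and the function names are hypothetical.

```python
def rss_restricted(y):
    """Residual sum of squares of y[t] = a * y[t-1] (Y explained by its own past only)."""
    prev, cur = y[:-1], y[1:]
    syy = sum(p * p for p in prev)
    if syy == 0:
        raise ValueError("y has no variation to regress on")
    a = sum(p * c for p, c in zip(prev, cur)) / syy
    return sum((c - a * p) ** 2 for p, c in zip(prev, cur))


def rss_unrestricted(x, y):
    """Residual sum of squares of y[t] = a * y[t-1] + b * x[t-1]."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    yp, xp, cur = y[:-1], x[:-1], y[1:]
    syy = sum(p * p for p in yp)
    sxx = sum(q * q for q in xp)
    sxy = sum(p * q for p, q in zip(yp, xp))
    sy = sum(p * c for p, c in zip(yp, cur))
    sx = sum(q * c for q, c in zip(xp, cur))
    det = syy * sxx - sxy * sxy  # determinant of the 2x2 normal equations
    if det == 0:
        raise ValueError("lagged x and y are collinear")
    a = (sy * sxx - sx * sxy) / det
    b = (sx * syy - sy * sxy) / det
    return sum((c - a * p - b * q) ** 2 for p, q, c in zip(yp, xp, cur))


def granger_f(x, y):
    """F-style statistic: relative drop in error when lagged x is added (1 restriction)."""
    n = len(y) - 1  # number of (lagged, current) pairs
    if n <= 2:
        raise ValueError("need more than three observations")
    rss_r = rss_restricted(y)
    rss_u = rss_unrestricted(x, y)
    if rss_u == 0:
        return float("inf")
    return (rss_r - rss_u) / (rss_u / (n - 2))
```

A large value of `granger_f(x, y)` means that the past of X carries information about Y beyond what the past of Y already provides. A full test would typically use several lags and an intercept and compare the statistic against an F distribution. Either way, the result indicates predictive usefulness, not true causation, which is why the note above calls it pseudo causality.
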
Outcome evaluation
Outcome evaluations consider three different aspects:
- estimating the magnitude, i.e. the size of the population which has observed the contagion
  - network scale-up methods - estimate the number of affected people based on (i) the ratio of people known to be affected to people known, and (ii) the population size.
  - mark and recapture - estimate the size of a population based on the number of marked and re-captured marked individuals
- estimating sample accuracy
- estimating the outcome
  - obtain feedback from other (correlated) indicators (e.g. Alexa page rank rather than page hits).
  - modularity for community-detection problems
  - perform controlled experiments (e.g. A/B testing, where a fraction of the population uses the new user interface while the other fraction acts as the control group)
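
The two magnitude estimators mentioned above reduce to short formulas. The sketch below uses the classic Lincoln-Petersen form for mark and recapture and a basic ratio estimator for network scale-up. The function names and the per-respondent input format are assumptions of this example, and the paper may discuss more refined variants.

```python
def lincoln_petersen(marked, captured, recaptured):
    """Mark and recapture: N is roughly marked * captured / recaptured.

    marked     -- individuals marked in the first sample
    captured   -- individuals caught in the second sample
    recaptured -- individuals in the second sample that carry a mark
    """
    if recaptured <= 0:
        raise ValueError("need at least one recaptured individual")
    return marked * captured / recaptured


def network_scale_up(known_affected, known_total, population):
    """Network scale-up: scale the affected share of respondents' contacts to the population.

    known_affected[i] -- affected people that respondent i knows
    known_total[i]    -- all people that respondent i knows
    """
    total_known = sum(known_total)
    if total_known <= 0:
        raise ValueError("respondents must know at least one person in total")
    return population * sum(known_affected) / total_known
```

For instance, marking 100 users, sampling 50 later, and finding 10 marked users among them gives `lincoln_petersen(100, 50, 10)`, an estimate of 500. Both estimators assume the samples are representative, which is rarely guaranteed on social media, so the results are best read as rough magnitudes.
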