Evaluation Without Ground Truth in Social Media Research

Zafarani, Reza, and Huan Liu. Evaluation Without Ground Truth in Social Media Research. Communications of the ACM 58, no. 6 (May 2015): 54–60. doi:10.1145/2666680.


This paper discusses strategies for evaluating research findings in settings where no ground truth (or gold standard) is available, covering the following three types of evaluations:

  1. Spatiotemporal - predicting when and/or where things are going to happen
  2. Causality - determining why events are happening
  3. Outcome - predicting a certain outcome

Spatiotemporal evaluation

  1. draw upon the periodicity of individual behavior (e.g. use behavior observed in the first n days as training data and evaluate predictions against the next m days)
  2. crowdsourcing
  3. ensemble methods (combine statistically independent methods)
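
The first strategy above (train on the first n days, evaluate on the next m days) can be sketched as follows. This is only an illustrative sketch: the (day, user, location) event format, the function names, and the "most frequent location" baseline predictor are assumptions made for the example, not something taken from the paper.

```python
from collections import Counter

def temporal_split(events, n_days, m_days):
    """Split (day, user, location) tuples: days [0, n) train, days [n, n+m) test."""
    train = [e for e in events if e[0] < n_days]
    test = [e for e in events if n_days <= e[0] < n_days + m_days]
    return train, test

def most_frequent_location(train):
    """Hypothetical baseline: predict each user's most frequent training location."""
    counts = {}
    for _, user, loc in train:
        counts.setdefault(user, Counter())[loc] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}

def accuracy(predictions, test):
    """Fraction of held-out events whose location matches the prediction."""
    scored = [(user, loc) for _, user, loc in test if user in predictions]
    if not scored:
        return 0.0
    hits = sum(1 for user, loc in scored if predictions[user] == loc)
    return hits / len(scored)

# Toy data (made up for illustration): day index, user, observed location.
events = [
    (0, "alice", "cafe"), (1, "alice", "cafe"), (2, "alice", "gym"),
    (0, "bob", "office"), (1, "bob", "office"), (2, "bob", "office"),
    (3, "alice", "cafe"), (4, "alice", "gym"),
    (3, "bob", "office"), (4, "bob", "office"),
]

train, test = temporal_split(events, n_days=3, m_days=2)
model = most_frequent_location(train)
score = accuracy(model, test)
print(score)  # 0.75
```

The later days play the role of the missing ground truth: because behavior is assumed to be periodic, what users did in the held-out window is treated as the reference answer.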

Causality evaluation

  1. controlled experiments (two groups: control and treatment)
  2. natural experiments
    • use a naturally occurring event to measure the influence of an independent variable
    • temporal shuffling (i.e. use control groups from the time before an influencing independent variable became active)
  3. nonequivalent control - the control group is not assigned at random but is selected so that it is similar to a randomized group
  4. Granger (pseudo) causality - consider a variable X causal for Y if past values of X improve the prediction of Y compared to using past values of Y alone
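
The Granger idea in the last item can be sketched as a comparison of two least-squares fits: one that predicts Y from its own past, and one that also sees the past of X. The code below is a minimal sketch under stated assumptions. The function names, the single lag, and the synthetic data are invented for illustration, and a proper Granger test would additionally check the improvement for statistical significance (typically with an F-test), which is omitted here.

```python
import random

def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def rss(features, target):
    """Residual sum of squares of an ordinary least-squares fit with intercept."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    w = solve(xtx, xty)
    return sum((t - sum(wi * ri for wi, ri in zip(w, r))) ** 2
               for r, t in zip(rows, target))

def granger_improvement(x, y, lag=1):
    """Relative drop in RSS when lagged x is added to lagged y as a predictor of y."""
    steps = range(lag, len(y))
    target = [y[t] for t in steps]
    own = [[y[t - l] for l in range(1, lag + 1)] for t in steps]
    both = [o + [x[t - l] for l in range(1, lag + 1)] for o, t in zip(own, steps)]
    restricted = rss(own, target)   # Y predicted from its own past only
    full = rss(both, target)        # Y predicted from its own past plus past X
    return (restricted - full) / restricted

# Synthetic data (an assumption for the demo): y follows x with a one-step delay.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [rng.gauss(0, 0.1)] + [0.8 * x[t - 1] + rng.gauss(0, 0.1) for t in range(1, 200)]

x_helps_y = granger_improvement(x, y)
y_helps_x = granger_improvement(y, x)
print(x_helps_y, y_helps_x)
```

In this toy setup, adding past X removes most of the error in predicting Y, while adding past Y barely helps predict X. That asymmetry is what the Granger criterion looks for. It indicates predictive usefulness, not true causation, hence "pseudo".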

Outcome evaluation

Outcome evaluations consider three different aspects:

  1. estimating the magnitude, i.e. the size of the population which has observed the contagion
    • network scale-up methods - estimate the number of affected people from (i) the ratio of affected acquaintances to all acquaintances reported by a sample of individuals, and (ii) the total population size
    • mark and recapture - estimate the size of a population by marking a sample of individuals, drawing a second sample, and counting how many marked individuals are recaptured
  2. estimating sample accuracy
  3. estimating the outcome
    • obtain feedback from other (correlated) indicators (e.g. Alexa rank rather than page hits)
    • modularity for community-detection problems
    • perform controlled experiments (e.g. A/B testing, where a fraction of the population uses the new user interface while the rest acts as the control group)
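
The two magnitude estimators listed under item 1 reduce to short formulas, sketched below. The function names and the numbers are made up for illustration. The mark-and-recapture function uses the simple Lincoln-Petersen form (marked × caught / recaptured), which is one common variant and is assumed here rather than taken from the paper. The scale-up function uses the basic ratio described above.

```python
def lincoln_petersen(marked, caught, recaptured):
    """Mark and recapture: estimate population size as marked * caught / recaptured.

    marked     - individuals marked in the first sample
    caught     - size of the second sample
    recaptured - marked individuals that appear in the second sample
    """
    if recaptured == 0:
        raise ValueError("no marked individuals recaptured; estimate is undefined")
    return marked * caught / recaptured

def network_scale_up(known_affected, known_total, population):
    """Network scale-up: scale the affected share of respondents' acquaintances
    up to the whole population.

    known_affected - per respondent, number of acquaintances known to be affected
    known_total    - per respondent, total number of acquaintances
    population     - total population size
    """
    return population * sum(known_affected) / sum(known_total)

# Illustrative numbers only.
size_estimate = lincoln_petersen(marked=100, caught=80, recaptured=20)
affected_estimate = network_scale_up(
    known_affected=[2, 0, 1, 3],
    known_total=[150, 100, 120, 230],
    population=1_000_000,
)
print(size_estimate)      # 400.0
print(affected_estimate)  # 10000.0
```

Both estimators assume that the sample is representative: marked individuals mix evenly back into the population, and respondents' acquaintances resemble the population at large. On social media these assumptions may hold only approximately.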