Predicting the Future with Social Media

by Asur, S., & Huberman, B. A. (2010). IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)

This paper examines the use of Twitter for predicting the box-office revenues of movies based on (i) the number of tweets and retweets, and (ii) the sentiment of the tweets.

The authors show that their Twitter-based prediction model outperforms

  • the Hollywood Stock Exchange (HSX), an artificial market that has been considered the gold standard for predicting movie sales because it has usually been more accurate than other techniques such as surveys and opinion polls (Pennock et al., 2001; Chen et al., 2003); and
  • web-based indicators, such as the ones developed by Zhang & Skiena, 2009.

Asur & Huberman use the following sentiment metrics to refine their prediction model:

  1. a Subjectivity measure: \[ \text{Subjectivity} = \frac{|\text{positive and negative tweets}|}{|\text{neutral tweets}|} \] Subjectivity rises after the release of a movie.
  2. the Polarity: \[ P = \frac{|\text{positive tweets}|}{|\text{negative tweets}|} \]
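
To make the two ratios concrete, the following minimal sketch computes them from a list of per-tweet sentiment labels. The label names, the function name `sentiment_metrics`, and the sample data are assumptions made for illustration; the paper's sentiment classifier and data are not reproduced here.

```python
from collections import Counter


def sentiment_metrics(labels):
    """Return (subjectivity, polarity) for an iterable of tweet labels.

    Each label is assumed to be 'positive', 'negative', or 'neutral'.
    A ratio whose denominator is zero is returned as None.
    """
    counts = Counter(labels)
    pos, neg, neu = counts["positive"], counts["negative"], counts["neutral"]
    subjectivity = (pos + neg) / neu if neu else None
    polarity = pos / neg if neg else None
    return subjectivity, polarity


# Hypothetical labels for ten tweets about one movie.
labels = ["positive"] * 4 + ["negative"] * 2 + ["neutral"] * 4
subjectivity, polarity = sentiment_metrics(labels)
print(subjectivity, polarity)  # 1.5 2.0
```

With four positive, two negative, and four neutral tweets, subjectivity is (4 + 2) / 4 = 1.5 and polarity is 4 / 2 = 2.0.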

Lessons Learned

  1. Social media feeds can be an effective indicator of real-world performance
  2. Considering the tweet-rate time series (i.e., the number of tweets per day, observed over a certain period of time) alone yields an adjusted R² of 0.92; including the tweets' polarity raises this measure to 0.94.
  3. Retweets, i.e., posts originally made by one user and forwarded to other users, boost Twitter's potential for viral marketing
  4. The distribution of tweets per author is close to a Zipfian distribution
    • a few authors generate a large number of tweets
    • this is consistent with observations from other networks (Wu et al., 2009)
  5. At a deeper level, social media expresses collective intelligence and can yield powerful and accurate indicators of future outcomes
  6. Asur and Huberman also suggest the following predictive regression model for the sales (y) of a product: \[ y = \beta_0 + \beta_a \cdot A + \beta_p \cdot P + \beta_d \cdot D + \epsilon \]
    • A ... rate of attention seeking
    • P ... polarity of sentiment and reviews
    • D ... distribution parameter
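
As a rough illustration of how such a model could be fitted, the sketch below estimates the coefficients by ordinary least squares and reports the adjusted R² mentioned above as the quality measure. Everything in it is an assumption for illustration: the helper names, the use of the normal equations in pure Python (in practice a library such as NumPy or statsmodels would be the usual choice), and the synthetic data, which is generated from known coefficients rather than taken from the paper.

```python
def solve(a, b):
    """Solve the square, non-singular system a x = b by Gaussian
    elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def adjusted_r2(rows, y, beta):
    """Adjusted R² of the fitted model with p = len(beta) - 1 predictors."""
    n, p = len(y), len(beta) - 1
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical (A, P, D) observations for six products.
rows = [(1, 2, 1), (2, 1, 3), (3, 4, 2), (4, 3, 5), (5, 6, 4), (6, 5, 7)]
# Synthetic, noise-free sales generated from known coefficients.
y = [2.0 + 3.0 * a + 0.5 * p + 1.0 * d for a, p, d in rows]

beta = fit_ols(rows, y)
print([round(b, 6) for b in beta])
print(round(adjusted_r2(rows, y, beta), 6))
```

Because the synthetic sales contain no noise, the fit should recover coefficients close to the generating values (2, 3, 0.5, 1) and an adjusted R² close to 1; real data would give a noisier picture.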

Related Work

  1. Gruhl et al., 2005 and others mine blogs in order to predict spikes in book sales
  2. Zhang & Skiena, 2009 predict movie sales based on news analysis

References

  1. Gruhl, D., Guha, R., Kumar, R., Novak, J., & Tomkins, A. (2005). The predictive power of online chatter. Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD '05 (pp. 78–87). New York, NY, USA: ACM. doi:10.1145/1081870.1081883
  2. Wu, F., Wilkinson, D. M., & Huberman, B. A. (2009). Feedback Loops of Attention in Peer Production. Proceedings of the 2009 International Conference on Computational Science and Engineering - Volume 04 (pp. 409–415). Washington, DC, USA: IEEE Computer Society. doi:10.1109/CSE.2009.430
  3. Zhang, W., & Skiena, S. (2009). Improving Movie Gross Prediction through News Analysis. Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01, WI-IAT '09 (pp. 301–304). Washington, DC, USA: IEEE Computer Society. doi:10.1109/WI-IAT.2009.53
  4. Pennock, D. M., Lawrence, S., Giles, C. L., & Nielsen, F. A. (2001). The real power of artificial markets. Science, 291(5506), 987–988.
  5. Chen, K.-Y., Fine, L. R., & Huberman, B. A. (2003). Predicting the Future. Information Systems Frontiers, 5(1), 47–61. doi:10.1023/A:1022041805438