Predicting the Future with Social Media


by Asur, S., & Huberman, B. A. (2010). IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)

This article examines how Twitter can be used to predict the box-office revenues of movies based on (i) the number of tweets and retweets, and (ii) the sentiment of the Twitter messages.

The authors show that their Twitter-based prediction model outperforms

the Hollywood Stock Exchange (HSX), an artificial market that has been considered the gold standard for predicting movie sales, since it has usually been more accurate than other techniques such as surveys and opinion polls (Pennock et al., 2001; Chen et al., 2003); and
web-based indicators, such as the ones developed by Zhang & Skiena (2009).

Asur & Huberman use the following sentiment metrics to refine their prediction model:

a subjectivity measure: \[ \text{Subjectivity} = \frac{|\text{positive tweets}| + |\text{negative tweets}|}{|\text{neutral tweets}|} \] Subjectivity rises after the release of a movie.

the polarity: \[ P = \frac{|\text{positive tweets}|}{|\text{negative tweets}|} \]
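Both metrics are simple ratios of tweet counts. The following Python sketch computes them, assuming that every tweet has already been labelled "positive", "negative" or "neutral" by some sentiment classifier (not shown). The function name `sentiment_metrics`, the label strings and the choice to return infinity when a denominator is zero are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def sentiment_metrics(labels):
    """Compute (subjectivity, polarity) from per-tweet sentiment labels.

    Assumption: `labels` is an iterable of the strings "positive",
    "negative" or "neutral" produced by an external classifier.
    """
    counts = Counter(labels)  # missing labels count as 0
    pos, neg, neu = counts["positive"], counts["negative"], counts["neutral"]
    # Subjectivity: opinionated (positive + negative) tweets per neutral tweet
    subjectivity = (pos + neg) / neu if neu else float("inf")
    # Polarity: positive tweets per negative tweet
    polarity = pos / neg if neg else float("inf")
    return subjectivity, polarity


# Hypothetical sample: 6 positive, 2 negative and 4 neutral tweets
labels = ["positive"] * 6 + ["negative"] * 2 + ["neutral"] * 4
print(sentiment_metrics(labels))  # (2.0, 3.0)
```

Tracking both values day by day (for example, before and after the release date) yields the time series from which a change in subjectivity can be observed.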

Lessons Learned

Social media feeds can be an effective indicator of real-world performance
Considering the tweet-rate time series (i.e., the number of tweets per day, observed over a certain period of time) alone yields an adjusted R² of 0.92; adding tweet polarity raises this measure to 0.94.
Retweets, i.e., posts originally made by one user that are forwarded to other users, boost Twitter's potential for viral marketing
The distribution of tweets per author is close to a Zipfian distribution:
- a few authors generate a large number of tweets
- this is consistent with observations from other networks (Wu et al., 2009)
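The tweet-rate time series and the per-author distribution both reduce to counting. The Python sketch below assumes tweets are available as hypothetical `(author, day)` pairs; it counts tweets per day and estimates a Zipf exponent with a least-squares fit of log frequency against log rank. This log-log fit is a crude estimator chosen for brevity, not necessarily the method used by Asur & Huberman, and it needs at least two distinct authors.

```python
import math
from collections import Counter
from datetime import date


def tweet_rate(tweets):
    """Return the tweet-rate time series as date-sorted (day, count) pairs."""
    per_day = Counter(day for _, day in tweets)
    return sorted(per_day.items())


def zipf_exponent(tweets):
    """Estimate a Zipf exponent by regressing log(tweet count) on log(author rank).

    Requires at least two distinct authors (otherwise the slope is undefined).
    """
    counts = sorted(Counter(author for author, _ in tweets).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    # Zipf's law predicts count ~ rank ** (-s); the fitted slope estimates -s
    return -slope


# Hypothetical sample: authors a, b, c, d write 12, 6, 4 and 3 tweets (12 / rank)
tweets = ([("a", date(2010, 3, 1))] * 12 + [("b", date(2010, 3, 1))] * 6
          + [("c", date(2010, 3, 2))] * 4 + [("d", date(2010, 3, 2))] * 3)
print([count for _, count in tweet_rate(tweets)])  # [18, 7]
print(round(zipf_exponent(tweets), 3))             # 1.0
```

An exponent close to 1 on real data would indicate the "few authors, many tweets" pattern described above; a proper analysis would also test the goodness of fit rather than rely on the slope alone.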

At a deeper level, social media expresses collective intelligence and can yield powerful and accurate indicators of future outcomes
Asur and Huberman also suggest the following predictive regression model for the sales (y) of a product: \[ y = \beta_0 + \beta_a A + \beta_p P + \beta_d D + \epsilon \]
- A: rate of attention seeking
- P: polarity of sentiment and reviews
- D: distribution parameter
- ε: error term
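A linear model of this form is commonly estimated with ordinary least squares. The stdlib-only Python sketch below fits the coefficients from hypothetical `(A, P, D)` observations via the normal equations and reports the adjusted R², the goodness-of-fit measure quoted for the tweet-rate results. The function names, the synthetic data and the use of the normal equations (rather than, say, a QR decomposition or a library such as NumPy) are assumptions made to keep the example self-contained; the paper does not prescribe an implementation.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest absolute pivot into position
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_sales_model(rows, y):
    """OLS fit of y = b0 + ba*A + bp*P + bd*D; `rows` holds (A, P, D) tuples.

    Returns the coefficients [b0, ba, bp, bd] and the adjusted R^2.
    Requires more observations than parameters (n > 4).
    """
    X = [[1.0, *r] for r in rows]  # prepend the intercept column
    k = len(X[0])                  # parameters including the intercept
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * xi for b, xi in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    n, p = len(y), k - 1           # p = number of predictors
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return beta, adj_r2


# Synthetic data generated from y = 1 + 2A + 3P + 0.5D (no noise), purely illustrative
rows = [(1, 0.5, 2), (2, 1.0, 1), (3, 0.8, 4), (4, 2.0, 3), (5, 1.5, 5), (6, 3.0, 2)]
sales = [1 + 2 * a + 3 * p + 0.5 * d for a, p, d in rows]
beta, adj_r2 = fit_sales_model(rows, sales)
print([round(b, 3) for b in beta])  # [1.0, 2.0, 3.0, 0.5]
```

Because the synthetic data contain no noise, the fit recovers the generating coefficients and the adjusted R² is essentially 1. Solving the normal equations is adequate for a handful of well-conditioned predictors like these; for larger or nearly collinear feature sets a QR- or SVD-based solver is numerically safer.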

Related Work

Gruhl et al. (2005), among others, mine blogs to predict spikes in book sales
Zhang & Skiena (2009) predict movie sales based on news analysis

Literature

Gruhl, D., Guha, R., Kumar, R., Novak, J., & Tomkins, A. (2005). The predictive power of online chatter. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, KDD '05 (pp. 78-87). New York, NY, USA: ACM. doi:10.1145/1081870.1081883
Wu, F., Wilkinson, D. M., & Huberman, B. A. (2009). Feedback Loops of Attention in Peer Production. Proceedings of the 2009 International Conference on Computational Science and Engineering - Volume 04 (pp. 409-415). Washington, DC, USA: IEEE Computer Society. doi:10.1109/CSE.2009.430
Zhang, W., & Skiena, S. (2009). Improving Movie Gross Prediction through News Analysis. Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01, WI-IAT '09 (pp. 301-304). Washington, DC, USA: IEEE Computer Society. doi:10.1109/WI-IAT.2009.53
Pennock, D. M., Lawrence, S., Giles, C. L., & Nielsen, F. A. (2001). The real power of artificial markets. Science, 291(5506), 987-988.
Chen, K.-Y., Fine, L. R., & Huberman, B. A. (2003). Predicting the Future. Information Systems Frontiers, 5(1), 47-61. doi:10.1023/A:1022041805438


Albert Weichselbraun


You may also enjoy

Big, Linked Geospatial Data and Its Application in Earth Observation

Employment relations: a data driven analysis of job markets using online job boards and online professional networks

Suffix array

Dynamic feature scaling for online learning of binary classifiers