Subrahmanian, V. S., A. Azaria, S. Durst, V. Kagan, A. Galstyan, K. Lerman, L. Zhu, E. Ferrara, A. Flammini, and F. Menczer. The DARPA Twitter Bot Challenge. Computer 49, no. 6 (June 2016): 38–46. doi:10.1109/MC.2016.183.
Introduction
According to Twitter's SEC filing, approximately 8.5% of all Twitter users are bots. These include (i) spambots, (ii) paybots (i.e., bots that copy content from respected sources and paste it into micro URLs that pay the bot's creator for redirecting traffic to their sites), and (iii) influence bots (i.e., bots that try to shape discussions in accordance with a certain agenda).
Research has shown that such bots can have a surprisingly large influence. This prompted DARPA's Social Media in Strategic Communication program to run a competition for identifying influence bots that promote a pro-vaccination stance in Twitter discussions. The competition used a synthetic data set comprising over 7,000 Twitter profiles, more than 4 million tweets, and weekly snapshots of the Twitter network that capture changes to users' profiles and followers.
Approach
The teams considered different features for identifying bots, including:
- Tweet syntax (e.g., syntax indicating the use of natural language generation programs)
- Tweet semantics (e.g., number of posts related to the topic, sentiment, consistency)
- Temporal behavior features (e.g., variance in sentiment, duration of sessions, average number of tweets)
- Network features (e.g., deviation of a user's sentiment scores from those of their followers and followees, centrality)
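To make the temporal and network features in the list above more concrete, here is a minimal Python sketch of how a few of them might be computed per user. It is an illustration under assumptions, not any team's implementation: the `Tweet` record, the per-tweet sentiment scores, the 30-minute inactivity gap that ends a session, and all function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import fmean, pvariance

# Assumption: a new session starts after 30 minutes of inactivity.
# The challenge write-up does not define sessions this way.
SESSION_GAP = timedelta(minutes=30)


@dataclass
class Tweet:
    time: datetime
    sentiment: float  # assumed per-tweet score from some sentiment model


def session_durations(times, gap=SESSION_GAP):
    """Split timestamps into sessions separated by more than `gap`;
    return each session's length in seconds."""
    times = sorted(times)
    if not times:
        return []
    durations = []
    start = prev = times[0]
    for t in times[1:]:
        if t - prev > gap:  # an inactivity gap closes the current session
            durations.append((prev - start).total_seconds())
            start = t
        prev = t
    durations.append((prev - start).total_seconds())
    return durations


def temporal_features(tweets):
    """Temporal features for one user's (non-empty) list of tweets."""
    times = [t.time for t in tweets]
    active_days = {t.date() for t in times}
    return {
        "sentiment_variance": pvariance([t.sentiment for t in tweets]),
        "mean_session_seconds": fmean(session_durations(times)),
        "tweets_per_active_day": len(tweets) / len(active_days),
    }


def sentiment_deviation(user, neighbours, mean_sentiment):
    """Network feature: absolute gap between a user's mean sentiment and the
    average mean sentiment of their followers and followees."""
    others = [mean_sentiment[n] for n in neighbours if n in mean_sentiment]
    if not others:
        return 0.0
    return abs(mean_sentiment[user] - fmean(others))
```

Measuring "average number of tweets" per active day, rather than per calendar day, is itself a design choice here; a bot that posts in short, regular bursts would show both short sessions and a high per-day rate.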
Bot detection then proceeded in three stages:
- Identify an initial set of bots based on the features listed above.
- Use cluster analysis, outlier detection, and network analysis to locate further bots.
- Once a sufficiently large number of bots has been found, apply standard machine learning methods to identify the remaining bots.
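The three steps listed above can be sketched as a small pipeline over per-user feature vectors. This is an illustrative Python sketch, not the method of any particular team: the hand-written seed rule, the distance radius, the bot-share threshold, the analyst-supplied set of known humans, and the nearest-centroid classifier (a deliberately simple stand-in for "standard machine learning methods") are all assumptions.

```python
from math import dist
from statistics import fmean

# Each user maps to a hypothetical feature vector, e.g.
# (sentiment variance, tweets per active day, sentiment deviation).


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return tuple(fmean(col) for col in zip(*vectors))


def seed_bots(features, rule):
    """Stage 1: flag users whose feature vector satisfies a hand-written rule."""
    return {u for u, x in features.items() if rule(x)}


def expand(features, follows, bots, radius, min_bot_share):
    """Stage 2: add users that lie close to the bot centroid (cluster view)
    or whose follow links point mostly at known bots (network view)."""
    c = centroid([features[b] for b in bots])
    found = set(bots)
    for u, x in features.items():
        if u in bots:
            continue
        neighbours = follows.get(u, set())
        share = len(neighbours & bots) / len(neighbours) if neighbours else 0.0
        if dist(x, c) <= radius or share >= min_bot_share:
            found.add(u)
    return found


def classify_rest(features, bots, humans):
    """Stage 3: nearest-centroid classifier fitted on labelled bots and humans,
    applied to every user that is not yet labelled."""
    bot_c = centroid([features[u] for u in bots])
    human_c = centroid([features[u] for u in humans])
    unlabelled = set(features) - bots - humans
    return {u for u in unlabelled if dist(features[u], bot_c) < dist(features[u], human_c)}


def detect(features, follows, rule, humans, radius=1.0, min_bot_share=0.5, min_bots=3):
    """Run the three stages; stage 3 only runs once enough bots are known."""
    bots = seed_bots(features, rule) - humans
    if not bots:
        return set()
    bots = expand(features, follows, bots, radius, min_bot_share) - humans
    if len(bots) < min_bots or not humans:
        return bots
    return bots | classify_rest(features, bots, humans)
```

The `min_bots` check mirrors the condition that machine learning is applied only once a large enough number of bots has been found; with too few labelled bots the centroid would be a poor summary of bot behavior. In practice the thresholds would be tuned by the analysts rather than fixed as here.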