The DARPA Twitter Bot Challenge

1 minute read

Subrahmanian, V. S., A. Azaria, S. Durst, V. Kagan, A. Galstyan, K. Lerman, L. Zhu, E. Ferrara, A. Flammini, and F. Menczer. The DARPA Twitter Bot Challenge. Computer 49, no. 6 (June 2016): 38–46. doi:10.1109/MC.2016.183.

Introduction

According to Twitter's SEC filing, approximately 8.5% of all Twitter users are bots, such as (i) spambots, (ii) paybots (i.e., bots that copy content from respected sources and repost it together with micro-URLs that pay the bot's creator for redirecting traffic to their sites), and (iii) influence bots (i.e., bots that try to shape discussions in accordance with a certain agenda).

Research has shown that such bots have a surprisingly large influence. This prompted a competition within DARPA's Social Media in Strategic Communication program to identify influence bots promoting pro-vaccination views in Twitter discussions, based on a synthetic data set comprising over 7,000 Twitter profiles, more than 4 million tweets, and weekly snapshots of the Twitter network capturing changes to profiles and to users' followers.

Approach

The teams considered different features for identifying bots, including the following (a rough feature-extraction sketch is given after the list):

  1. Tweet syntax (i.e. patterns that indicate the use of natural language generation programs, etc.)
  2. Tweet semantics (number of posts related to the topic, sentiment, consistency, etc.)
  3. Temporal behavior features (variance in sentiment, duration of sessions, average number of tweets, etc.)
  4. Network features (deviation of a user's sentiment scores from those of their followers and followees, centrality, etc.)
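
To make these features concrete, below is a minimal Python sketch that computes a handful of them for a single user. The record fields (`text`, `timestamp`, `sentiment`), the use of followee sentiment averages, and the specific aggregations are assumptions made for illustration, not the feature definitions used by the competing teams.

```python
from statistics import mean, pvariance

def extract_features(tweets, followee_sentiments):
    """Compute a few illustrative per-user features.

    tweets: list of dicts with assumed keys 'text',
        'timestamp' (epoch seconds), and 'sentiment' (-1..1).
    followee_sentiments: list of average sentiment scores of the
        accounts this user follows (assumed to be precomputed).
    """
    sentiments = [t["sentiment"] for t in tweets]
    timestamps = sorted(t["timestamp"] for t in tweets)
    days = max((timestamps[-1] - timestamps[0]) / 86400.0, 1.0)
    avg_sentiment = mean(sentiments)

    return {
        # Tweet syntax: a crude hint of templated or generated text
        "avg_tweet_length": mean(len(t["text"]) for t in tweets),
        # Tweet semantics: posting volume and overall sentiment
        "tweets_per_day": len(tweets) / days,
        "sentiment_mean": avg_sentiment,
        # Temporal behavior: variance in sentiment over time
        "sentiment_variance": pvariance(sentiments),
        # Network feature: deviation from followees' average sentiment
        "sentiment_deviation_from_followees": (
            abs(avg_sentiment - mean(followee_sentiments))
            if followee_sentiments else 0.0
        ),
    }
```
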
The overall procedure for detecting bots consisted of the following steps:

  1. Identify an initial set of bots based on the features mentioned above.
  2. Use clustering, outlier, and network analysis to locate further bots.
  3. Once a large enough number of bots has been found, apply standard machine learning methods to identify the remaining bots.
Machine learning could not be applied at an earlier stage because not enough training data was available. In addition, all teams used semi-supervised approaches, i.e., machines would identify potential bots that were then confirmed (or rejected) by human experts.
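
The sketch below illustrates that bootstrapping loop: a classifier is trained on the seed labels, and high-confidence candidates are repeatedly passed to a human reviewer. The choice of scikit-learn's RandomForestClassifier, the hypothetical `human_review` callback, and the confidence threshold are assumptions for illustration, not the teams' actual implementations.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bootstrap_bot_detection(features, seed_bot_idx, seed_human_idx,
                            human_review, threshold=0.9):
    """Illustrative semi-supervised bot-detection loop.

    features: (n_users, n_features) array, e.g. from extract_features.
    seed_bot_idx / seed_human_idx: indices of users already labelled
        via the heuristics, clustering, and outlier analysis above.
    human_review: callable taking a user index and returning True
        (bot) or False (human) -- the human-in-the-loop step.
    """
    labels = np.full(len(features), -1)   # -1 marks unlabelled users
    labels[list(seed_bot_idx)] = 1
    labels[list(seed_human_idx)] = 0

    while True:
        known = labels != -1
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(features[known], labels[known])

        unknown = np.where(labels == -1)[0]
        if len(unknown) == 0:
            break
        bot_proba = clf.predict_proba(features[unknown])[:, 1]

        # Only high-confidence candidates are sent to a human expert
        candidates = unknown[bot_proba >= threshold]
        if len(candidates) == 0:
            break
        for idx in candidates:
            labels[idx] = 1 if human_review(idx) else 0

    return labels, clf
```

The loop stops once no unlabelled account clears the confidence threshold; at that point the trained classifier can be used to label the remaining accounts, as in step 3 above.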