Clash of the Contagions - Cooperation and Competition in Information Diffusion


by Seth A. Myers and Jure Leskovec, IEEE International Conference on Data Mining (ICDM 2012), Brussels, Belgium

Introduction

The authors present a statistical information diffusion model that considers competition as well as cooperation between contagions.

While processing information sources (Web pages, TV, tweets), that is, while being exposed to contagions, we constantly choose whether to process these events (= getting infected with the contagion) or to ignore them.
The authors distinguish three different factors that determine whether an infection takes place:

interestingness of the content (= content virality)
likelihood of the user to share the content (= user bias)
the content interaction term (= past exposures)

Potential benefits

improving click-through rates by optimizing content placement
combating the spread of negative pieces of information

Related work:

models of contagions in isolation: standard information diffusion approaches such as: Linear Threshold Models, Independent Cascade Models, exposure curves
models assuming that contagions are mutually exclusive (e.g. adoption of technology in a company such as Skype versus Google Hangout)

Method

The test set consists of URLs in Twitter tweets. Users are modeled as nodes, and a user counts as infected when they re-tweet a certain URL. If a user re-tweets a URL, all of their followers become exposed to that particular URL.
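The exposure mechanism described above can be illustrated with a minimal sketch. The function name, the dictionary-based follower graph, and the sample users are assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def exposures_from_retweets(followers, retweets):
    """Return, per user, the ordered list of URLs that user was exposed to.

    followers: dict mapping a user to the set of users who follow them
               (hypothetical representation of the follower graph).
    retweets:  iterable of (user, url) pairs in chronological order.
    """
    exposed = defaultdict(list)
    for user, url in retweets:
        # Every follower of the re-tweeting user becomes exposed to the URL.
        for follower in followers.get(user, ()):
            exposed[follower].append(url)
    return dict(exposed)


# Hypothetical toy graph: bob and carol follow alice, carol follows bob.
followers = {"alice": {"bob", "carol"}, "bob": {"carol"}}
retweets = [("alice", "url_a"), ("bob", "url_b")]
timeline = exposures_from_retweets(followers, retweets)
```

Here `timeline["carol"]` holds both URLs in the order they were re-tweeted, which is the kind of exposure sequence the sliding window in the next paragraph operates on.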

The probability of infection is altered by a contagion's (X) predecessors ($$Y_k$$) that are considered within a sliding window of size K. Therefore, the conditional probability of an infection is computed as

\[P(X|Y_1, Y_2, \ldots, Y_K)\]

The authors make the following assumptions to decrease the number of different contagion combinations:

$$Y_k$$ is independent of $$Y_l$$, so they only need to consider $$P(X|Y_k)$$ rather than every possible contagion sequence.
They only consider the interaction between clusters (i.e., latent topics) rather than between all pairs of contagions.
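
To see why the independence assumption helps, consider the following sketch of the reasoning (it is not necessarily the exact derivation used in the paper). If the $$Y_k$$ are assumed to be independent of each other, both marginally and given $$X$$, Bayes' rule lets the joint conditional factor into per-exposure terms:

\[P(X|Y_1, \ldots, Y_K) = \frac{P(X) \prod_{k=1}^{K} P(Y_k|X)}{\prod_{k=1}^{K} P(Y_k)} = \frac{1}{P(X)^{K-1}} \prod_{k=1}^{K} P(X|Y_k)\]

The second step substitutes $$P(Y_k|X) = P(X|Y_k) P(Y_k) / P(X)$$ for each k. Only the K pairwise terms $$P(X|Y_k)$$ remain to be modeled.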

Based on these assumptions they model

\[P(X|Y_k) = P(X) + \Delta^{(k)}_{cont}(u_i, u_j)\]

where $$\Delta^{(k)}_{cont}(u_i, u_j)$$ represents the effect that contagion $$u_i$$ has on contagion $$u_j$$ when it appears k exposures away.

Solution

In the rare cases where the $$\Delta$$ term leads to negative probabilities, the authors set the probability to a minimum value of 1E-10.
The resulting models have a high number of parameters. The authors tried numerous methods to optimize these parameters and found that a variation of stochastic gradient descent worked best for the given use case.
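
The additive model and the probability floor can be sketched in a few lines. The function and constant names are hypothetical and the numbers are made up for illustration; only the form $$P(X) + \Delta$$ and the minimum value of 1E-10 come from the summary above.

```python
MIN_PROBABILITY = 1e-10  # floor mentioned in the text above


def infection_probability(prior, delta):
    """Return P(X|Y_k) = P(X) + delta, floored at MIN_PROBABILITY.

    prior: P(X), the baseline infection probability of the contagion.
    delta: the interaction term for one earlier exposure
           (negative = suppression, positive = reinforcement).
    """
    return max(prior + delta, MIN_PROBABILITY)


# Illustrative values only, not taken from the paper.
boosted = infection_probability(0.02, 0.005)     # reinforcing exposure
suppressed = infection_probability(0.02, -0.05)  # sum is negative, so floored
```

In the second call the raw sum is negative, so the floor applies and the result stays a valid, strictly positive probability.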

Dataset

The raw dataset contained more than 3 billion tweets that were filtered based on the following criteria:

Tweets containing URLs that were tweeted by at least 50 users (191,650 URLs)
URLs referring to sites that contain enough text (>= 50 tokens) to determine a latent topic (39,771 URLs)
URLs referring to English sites (18,186 URLs and 2,664,207 infectious events, i.e. tweets)
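
The three filtering steps can be sketched as a simple pipeline. The `UrlRecord` type, its field names, and the sample records are assumptions made for illustration. How the authors counted tokens or detected the language is not described here, so both are taken as precomputed fields.

```python
from dataclasses import dataclass


@dataclass
class UrlRecord:
    """Hypothetical per-URL summary; field names are illustrative."""
    url: str
    num_users: int   # distinct users who tweeted the URL
    num_tokens: int  # tokens of text found on the linked page
    language: str    # precomputed language code of the linked page


def filter_urls(records, min_users=50, min_tokens=50, language="en"):
    """Keep URLs that pass all three criteria listed above."""
    kept = []
    for record in records:
        if record.num_users < min_users:
            continue  # tweeted by too few users
        if record.num_tokens < min_tokens:
            continue  # not enough text to determine a latent topic
        if record.language != language:
            continue  # not an English site
        kept.append(record)
    return kept


# Made-up sample records.
records = [
    UrlRecord("http://example.org/a", 120, 800, "en"),
    UrlRecord("http://example.org/b", 12, 900, "en"),
    UrlRecord("http://example.org/c", 75, 10, "en"),
    UrlRecord("http://example.org/d", 300, 450, "de"),
]
kept_urls = [record.url for record in filter_urls(records)]
```

Only the first sample record survives all three filters.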

Results

The paper provides evidence that more infectious URLs have

a negative (suppressive) effect on less infectious URLs of unrelated content, and
a positive effect on less infectious URLs of related content.


Albert Weichselbraun
