# Clash of the Contagions - Cooperation and Competition in Information Diffusion

by Seth A. Myers and Jure Leskovec, IEEE International Conference on Data Mining (ICDM 2012), Brussels, Belgium

# Introduction

The authors present a statistical information diffusion model that considers competition as well as cooperation between contagions.

- when consuming information sources (Web pages, TV, tweets), i.e. when being exposed to contagions, we constantly choose whether to process an item (= getting infected with the contagion) or to ignore it.
- the authors distinguish three different factors that determine whether an infection takes place:
  - interestingness of the content (= content virality)
  - likelihood of the user to share the content (= user bias)
  - the content interaction term (= past exposures)

## Potential benefits

- improving click-through rates by optimizing content placement
- combating the spread of negative pieces of information

## Related work:

- models of *contagions in isolation*: standard information diffusion approaches such as Linear Threshold Models, Independent Cascade Models, and exposure curves
- models assuming that *contagions are mutually exclusive* (e.g. adoption of a technology in a company, such as Skype versus Google Hangout)

# Method

The test set consists of URLs in Twitter tweets. Users are modeled as nodes and become infected when they re-tweet a certain URL. If a user re-tweets a URL, all of that user's followers become exposed to that particular URL. The probability of infection with a contagion *X* is altered by its predecessors ($$Y_k$$), i.e. the contagions the user was exposed to before *X*, which are considered within a sliding window of size *K*. Therefore, the conditional probability of an infection is computed as

\[P(X|Y_1, Y_2, \ldots, Y_K)\]
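
The sliding window can be illustrated with a minimal sketch. It assumes that a user's exposures are available as an ordered list and that $$Y_1$$ denotes the most recent predecessor of *X*; the function name `exposure_windows` and the sample URLs are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def exposure_windows(stream: List[str], K: int) -> List[Tuple[str, List[str]]]:
    """Pair each exposure X with up to K preceding exposures [Y_1, ..., Y_K].

    Assumption: Y_1 is the exposure directly before X, Y_2 the one before that.
    """
    windows = []
    for t, x in enumerate(stream):
        # Take the (at most) K exposures before position t, most recent first.
        preceding = stream[max(0, t - K):t][::-1]
        windows.append((x, preceding))
    return windows


# Hypothetical exposure stream of one user, oldest exposure first.
stream = ["url_a", "url_b", "url_c", "url_d"]
windows = exposure_windows(stream, K=2)
```

Early exposures simply get a shorter window, since fewer than *K* predecessors exist for them.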

The authors make the following assumptions to decrease the number of different contagion combinations:

- $$Y_k$$ is independent of $$Y_l$$ => they only need to consider $$P(X|Y_k)$$ rather than every possible contagion sequence.
- they only consider the interaction between clusters (i.e., latent topics) rather than between all pairs of contagions.
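
A rough count shows why the second assumption helps. The sketch below only counts interaction terms (one per ordered pair and per window position) and ignores any other model parameters; the number of clusters and the window size are hypothetical values chosen for illustration, while the URL count is the size of the filtered dataset described in the Dataset section.

```python
def interaction_parameter_count(num_items: int, window_size: int) -> int:
    """Count interaction terms: one per ordered pair (u_i, u_j) and offset k."""
    return num_items * num_items * window_size


NUM_URLS = 18_186    # URLs remaining after filtering (see the Dataset section)
NUM_CLUSTERS = 100   # hypothetical number of latent topics
K = 5                # hypothetical sliding-window size

per_url = interaction_parameter_count(NUM_URLS, K)
per_cluster = interaction_parameter_count(NUM_CLUSTERS, K)

print(f"pairwise URL interactions: {per_url:,}")
print(f"cluster interactions:      {per_cluster:,}")
```

Under these assumed values, modeling interactions between clusters needs several orders of magnitude fewer terms than modeling every pair of URLs.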

Based on these assumptions they model

\[P(X|Y_k) = P(X) + \Delta^{(k)}_{cont}(u_i, u_j)\]

where $$\Delta^{(k)}_{cont}(u_i, u_j)$$ represents the effect that contagion $$u_i$$ has on contagion $$u_j$$ from *k* exposures away.

## Solution

- in the rare cases where the $$\Delta$$ term leads to negative probabilities, the authors set the probability to a minimum value of 1E-10
- the models obtained have a **high number of parameters**. The authors tried numerous methods to optimize these parameters and discovered that a variation of **stochastic gradient descent** worked best for the given use case.
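
The additive form of the model and the probability floor can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name and the example values are hypothetical, and only the lower bound of 1E-10 comes from the text above.

```python
MIN_PROBABILITY = 1e-10  # floor mentioned in the list above


def infection_probability(prior: float, delta: float) -> float:
    """Return P(X | Y_k) = P(X) + Delta, floored at MIN_PROBABILITY.

    `prior` stands for P(X); `delta` stands for the interaction term.
    Both values are assumed to be given; no upper cap is applied here.
    """
    return max(prior + delta, MIN_PROBABILITY)


# Hypothetical values: a cooperative and a strongly suppressive interaction.
boosted = infection_probability(prior=0.02, delta=0.01)
suppressed = infection_probability(prior=0.02, delta=-0.05)
```

In the second call the sum is negative, so the floor takes over and the result stays a valid, very small probability.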

## Dataset

The raw dataset contained more than 3 billion tweets that were filtered based on the following criteria:

- Tweets containing URLs that were tweeted by at least 50 users (191,650 URLs)
- URLs referring to sites which contain enough text (>=50 tokens) to determine a latent topic (39,771 URLs)
- URLs referring to English sites (18,186 URLs and 2,664,207 infectious events, i.e. tweets)
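
The three filtering criteria can be expressed as a simple predicate. The sketch below assumes that per-URL statistics have already been computed; the `UrlRecord` type, its field names, and the sample records are hypothetical and only serve to illustrate the thresholds listed above.

```python
from dataclasses import dataclass
from typing import List

MIN_USERS = 50   # URL must be tweeted by at least 50 users
MIN_TOKENS = 50  # linked page must contain at least 50 tokens of text


@dataclass
class UrlRecord:
    url: str
    num_users: int   # distinct users who tweeted the URL
    num_tokens: int  # tokens of text found on the linked page
    language: str    # assumed language code of the page, e.g. "en"


def keep(record: UrlRecord) -> bool:
    """Apply the three criteria: popularity, text length, English content."""
    return (
        record.num_users >= MIN_USERS
        and record.num_tokens >= MIN_TOKENS
        and record.language == "en"
    )


# Hypothetical sample records.
records: List[UrlRecord] = [
    UrlRecord("http://example.com/a", num_users=120, num_tokens=400, language="en"),
    UrlRecord("http://example.com/b", num_users=12, num_tokens=900, language="en"),
    UrlRecord("http://example.com/c", num_users=75, num_tokens=20, language="en"),
    UrlRecord("http://example.com/d", num_users=60, num_tokens=300, language="de"),
]
kept = [r.url for r in records if keep(r)]
```

Only the first sample record passes all three checks; each of the others fails exactly one criterion.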

# Results

The paper provides evidence that more infectious URLs have

- a negative (suppressive) effect on less infectious URLs of *unrelated* content, and
- a positive effect on less infectious URLs of *related* content.