# Meme ranking to maximize post virality in microblogging platforms

Bonchi, F., Castillo, C., & Ienco, D. (2013). Meme ranking to maximize posts virality in microblogging platforms. Journal of Intelligent Information Systems, 40(2), 211–239. doi:10.1007/s10844-011-0181-4

This article discusses the meme ranking problem, i.e. the problem of selecting the top-*k* memes to show a user after logging into the system so as to maximize the size of the meme propagation tree and, therefore, the total number of reposts in the network.

# Analysis

The authors define a social network as a directed graph *G=(V,E)* based on the followed-follower relationship, in which users $$u, v \in V$$ are connected either by (i) an explicit follower relation, or (ii) a previous repost of a meme posted by *u*.

The average clustering coefficient of the resulting undirected graph is 0.25, the average number of posts per meme is 2.5 (the original post plus 1.5 reposts), and 77% of the memes are never reposted.

Reposts seem to depend on the following factors:

- *currentness* of the meme: 80% of the reposts occur within the first day the meme is published *(T)*
- *content*: memes that are more similar to a user's preferences (modeled by the concatenation of all memes the user has ever reposted) have a higher likelihood of being reposted *(M)*
- *followers* are more likely to repost a meme *(V)*
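
The content factor *(M)* can be illustrated with a minimal sketch. It assumes whitespace tokenization and cosine similarity over term counts; the summary above does not state which similarity measure the authors use, so both choices, as well as the function names and the sample texts, are hypothetical.

```python
from collections import Counter
import math


def bag_of_words(texts):
    """Concatenate texts and count lowercase whitespace-separated tokens."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (assumed measure)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical data: memes a user has reposted, and two candidate memes.
user_profile = bag_of_words(["new graph mining paper", "graph algorithms talk"])
on_topic = bag_of_words(["graph mining tutorial"])
off_topic = bag_of_words(["cute cat video"])

print(cosine_similarity(user_profile, on_topic))
print(cosine_similarity(user_profile, off_topic))
```

Under these assumptions, a meme that shares vocabulary with the user's profile scores higher than one that shares none, which is the property the content factor relies on.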

# Heuristics

- Baseline methods are (i) random, which selects the top memes at random, and (ii) recency, which chooses the *k* most recent memes for a user and is the current de facto standard for microblogging sites.
- User-centered methods select memes based on (i) the user's interest, (ii) the influence of *u* over *v* (i.e. the historical probability of posts coming from *u* to be reposted by *v*), and (iii) a combination of the two.
- Follower-centered methods optimize the content based on the preferences of a user's followers (rather than those of the user). The authors also reconfirmed that considering the top-*n* followers with respect to their spread yielded better performance than considering them by popularity (i.e. the out-degree). Therefore, the out-degree does not seem to be the best metric for future influence.
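
To make the difference between the recency baseline and an influence-based, user-centered ranking concrete, here is a minimal sketch. The `Meme` class, the function names, the time unit, and the sample values are all hypothetical; this is an illustration of the two selection rules, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Meme:
    meme_id: str
    author: str
    posted_at: float  # hours since an arbitrary reference time (assumed unit)


def rank_by_recency(memes, k):
    """Baseline: the k most recently posted memes, newest first."""
    return sorted(memes, key=lambda m: m.posted_at, reverse=True)[:k]


def rank_by_influence(memes, influence, k):
    """User-centered: prefer memes whose author this user has often reposted.

    `influence` maps an author u to the historical fraction of u's posts
    that the viewing user v reposted; unknown authors count as 0.
    """
    return sorted(memes, key=lambda m: influence.get(m.author, 0.0), reverse=True)[:k]


# Hypothetical timeline for one user.
timeline = [
    Meme("a", "alice", posted_at=1.0),
    Meme("b", "bob", posted_at=5.0),
    Meme("c", "carol", posted_at=3.0),
]
influence_on_user = {"alice": 0.4, "bob": 0.1}

print([m.meme_id for m in rank_by_recency(timeline, k=2)])
print([m.meme_id for m in rank_by_influence(timeline, influence_on_user, k=2)])
```

The two rules can disagree: the oldest meme in this toy timeline is dropped by the recency baseline but ranked first by influence, because its author has the highest historical repost rate for this user.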

# Method

The authors used Monte Carlo simulations with a granularity of one hour in their experiments and learned the repost similarities and parameters as follows:

- **Extended user interest** is represented as a concatenation (bag-of-words) of all the memes a user has ever posted or reposted and of the first user that reposted the memes. Considering these reposts often yields text for posts that contain only URLs to images or videos.
- **Influence** refers to the probability that memes posted by user *u* are reposted by a user *v*.
- **Repost probability** is computed using logistic regression on a training dataset. Afterwards, time is incorporated as follows:

\[ p(\text{repost}(v, u, m, t)) = \begin{cases} \max(p^m_{u,v}, \epsilon) & \text{if } t - t_u \leq \tau_v; \\ \epsilon & \text{otherwise} \end{cases} \]

The repost probability is therefore $$p^m_{u,v}$$ (bounded below by $$\epsilon$$) if the meme is recent (not older than $$\tau_v$$), or the residual value $$\epsilon$$ otherwise.
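
The time-dependent repost probability can be written as a small function. This is a sketch under stated assumptions: the parameter names are hypothetical, times are taken to be in hours to match the simulation granularity, and the sample values for $$\tau_v$$ and $$\epsilon$$ are made up for illustration.

```python
def repost_probability(p_muv, t, t_u, tau_v, epsilon):
    """Piecewise repost probability of meme m from u by v at time t.

    p_muv   -- learned probability p^m_{u,v} (e.g. from logistic regression)
    t, t_u  -- current time and the reference time t_u (assumed: hours)
    tau_v   -- recency threshold for user v
    epsilon -- small residual probability
    """
    if t - t_u <= tau_v:
        # Recent meme: use the learned probability, but never less than epsilon.
        return max(p_muv, epsilon)
    # Meme older than the threshold: only the residual probability remains.
    return epsilon


# Hypothetical values: a 24-hour threshold and a 1% residual probability.
print(repost_probability(0.3, t=5, t_u=0, tau_v=24, epsilon=0.01))
print(repost_probability(0.3, t=30, t_u=0, tau_v=24, epsilon=0.01))
```

Note that the `max` in the first branch means a recent meme with a very low learned probability still gets at least the residual value.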

# Notes

- Time-based ranking strategies are quite successful.
- Is the short attention span of microblogging platforms caused by the dominance of time-based ranking strategies, or are these strategies so successful because people using microblogging platforms only repost new memes rather than discuss old ones?