Meme ranking to maximize post virality in microblogging platforms

Bonchi, F., Castillo, C., & Ienco, D. (2013). Meme ranking to maximize posts virality in microblogging platforms. Journal of Intelligent Information Systems, 40(2), 211–239. doi:10.1007/s10844-011-0181-4

This article discusses the meme ranking problem, i.e. the problem of selecting the top k memes to show to each user after logging into the system so as to maximize the size of the meme propagation trees and, therefore, the total number of reposts in the network.

Analysis

The authors define a social network as a directed graph $$G=(V,E)$$ based on the followed-follower relationship. Two users $$u, v \in V$$ are connected either (i) by an explicit follower relation, or (ii) by a previous repost by v of a meme posted by u.
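
The graph construction can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the input format (lists of `(u, v)` pairs), the function name `build_graph`, and the choice to point edges from the poster u to the potential reposter v are all assumptions made for the example.

```python
from collections import defaultdict

def build_graph(follows, reposts):
    """Build a directed graph as adjacency sets: edges[u] = {v, ...}.

    follows: (u, v) pairs meaning "v follows u"          (assumed format)
    reposts: (u, v) pairs meaning "v reposted a meme of u" (assumed format)
    Edge direction u -> v is an assumption of this sketch.
    """
    edges = defaultdict(set)
    for u, v in follows:   # (i) explicit follower relation
        edges[u].add(v)
    for u, v in reposts:   # (ii) previous repost of a meme posted by u
        edges[u].add(v)
    return edges

# Toy data: bob and carol follow alice; dave once reposted bob without following him.
follows = [("alice", "bob"), ("alice", "carol")]
reposts = [("bob", "dave"), ("alice", "bob")]
graph = build_graph(follows, reposts)
```

Using sets means that a pair connected by both a follower relation and a repost contributes a single edge.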

The resulting undirected graph has an average clustering coefficient of 0.25. Each meme is posted 2.5 times on average (the original post plus 1.5 reposts), and 77% of the memes are never reposted.

Reposts seem to depend on the following factors:

  1. currentness of the meme: 80% of the reposts occur within the first day after the meme is published (T)
  2. content: memes that are more similar to a user's preferences (modeled by the concatenation of all memes the user has ever reposted) have a higher likelihood of being reposted (M)
  3. social relation: followers are more likely to repost a meme (V)
The probability of a repost is computed as a function $$p: M \times V \times V \times T \rightarrow [0,1]$$. The authors provide a detailed analysis outlining why solving the meme ranking problem is prohibitively complex.
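
To make the content factor concrete, the following sketch models a user's preferences as a bag of words over previously reposted memes and compares a candidate meme against it. Cosine similarity, whitespace tokenization, and the helper names are assumptions chosen for illustration; the summary above does not state which similarity measure the paper uses.

```python
import math
from collections import Counter

def bag_of_words(texts):
    """Concatenate texts into one word-count vector (naive whitespace tokens)."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (assumed measure)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# User profile: everything the user has reposted so far (toy data).
profile = bag_of_words(["funny cat video", "cat picture"])
on_topic = cosine_similarity(profile, bag_of_words(["cute cat"]))
off_topic = cosine_similarity(profile, bag_of_words(["stock market news"]))
```

In this toy example the cat-related meme scores higher than the unrelated one, which shares no words with the profile.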

Heuristics

  1. Baseline methods are (i) random, which selects the top memes at random, and (ii) recency, which chooses the k most recent memes for a user and is the current de facto standard for microblogging sites.
  2. User-centered methods select memes based on (i) the user's interest, (ii) the influence of the posting user u over the viewing user v (i.e. the historical probability that posts coming from u are reposted by v), and (iii) a combination of the two.
  3. Follower-centered methods optimize the selection for the preferences of a user's followers (rather than for the user). The authors also confirmed that selecting the top-n followers by their spread performs better than selecting them by popularity (i.e. the out-degree). Out-degree therefore does not seem to be the best predictor of future influence.
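
The difference between the recency baseline and a user-centered heuristic can be sketched as two ranking functions. The `Meme` fields, the precomputed `interest` score, and the use of a simple product to combine interest and influence are assumptions for illustration only; the paper's actual scoring functions may differ.

```python
from dataclasses import dataclass

@dataclass
class Meme:
    meme_id: str
    author: str
    posted_at: float  # hours since an arbitrary reference time (assumed unit)
    interest: float   # hypothetical precomputed match with the viewer's interests, in [0, 1]

def rank_by_recency(memes, k):
    """Baseline: show the k most recently posted memes."""
    return sorted(memes, key=lambda m: m.posted_at, reverse=True)[:k]

def rank_by_interest_and_influence(memes, influence, k):
    """User-centered sketch: score = interest * influence of the author on the viewer.

    influence[author] is the historical probability that the viewer
    reposts memes from that author; the product is an assumed combination.
    """
    def score(m):
        return m.interest * influence.get(m.author, 0.0)
    return sorted(memes, key=score, reverse=True)[:k]

candidates = [
    Meme("m1", "alice", posted_at=1.0, interest=0.9),
    Meme("m2", "bob", posted_at=5.0, interest=0.2),
    Meme("m3", "carol", posted_at=3.0, interest=0.6),
]
influence = {"alice": 0.5, "bob": 0.1, "carol": 0.4}

recent = [m.meme_id for m in rank_by_recency(candidates, 2)]
personal = [m.meme_id for m in rank_by_interest_and_influence(candidates, influence, 2)]
```

On this toy data the recency baseline selects m2 and m3, while the user-centered score prefers the older but more relevant m1 followed by m3.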

Method

The authors used Monte-Carlo simulations with a granularity of one hour in their experiments and learned the repost similarities and parameters as follows:

  1. Extended user interest is represented as a concatenation (bag-of-words) of all the memes a user has ever posted or reposted, together with the first repost of each of these memes by another user. Including these reposts often yields text for posts that contain only URLs to images or videos.
  2. Influence refers to the probability that a meme posted by user u is reposted by user v.
  3. Repost probability is computed using logistic regression on a training dataset. Afterwards, time is incorporated as follows: \[ p(\text{repost}(v, u, m, t)) = \begin{cases}\max(p^m_{u,v}, \epsilon) &\text{if } t-t_u \leq \tau_v; \\ \epsilon & \text{otherwise}\end{cases}\] The repost probability is therefore $$p^m_{u,v}$$ (but at least $$\epsilon$$) if the meme is recent, i.e. its age $$t-t_u$$ does not exceed $$\tau_v$$, and a residual value $$\epsilon$$ otherwise.
The simulations showed that time-dependent heuristics performed best, which suggests that time is the dominant factor determining the probability of a repost.
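
The piecewise definition of the repost probability translates directly into a small function. This is a minimal sketch: the argument names, the default value of `epsilon`, and the example numbers are assumptions, and times are expressed in hours to match the one-hour simulation granularity.

```python
def repost_probability(p_muv, t, t_u, tau_v, epsilon=0.001):
    """Time-dependent repost probability.

    p_muv:   learned probability that v reposts meme m coming from u
    t:       current time, t_u: reference time of the meme (both in hours)
    tau_v:   recency window of user v, in hours
    epsilon: small residual probability (the default is an assumed value)
    """
    if t - t_u <= tau_v:
        # Recent meme: use the learned probability, but never less than epsilon.
        return max(p_muv, epsilon)
    # Old meme: only the residual probability remains.
    return epsilon

fresh = repost_probability(0.3, t=5, t_u=0, tau_v=24)   # inside the window
stale = repost_probability(0.3, t=30, t_u=0, tau_v=24)  # outside the window
floor = repost_probability(0.0, t=5, t_u=0, tau_v=24)   # learned value below epsilon
```

Once a meme falls outside the window, its learned probability no longer matters, which is consistent with recency dominating the outcome of the simulations.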

Notes

  1. Time-based ranking strategies are quite successful.
  2. Is the short attention span on microblogging platforms caused by the dominance of time-based ranking strategies, or are these strategies so successful because people using microblogging platforms only repost new memes rather than discuss old ones?