Online Discussion Participation Prediction Using Non-Negative Matrix Factorization

Fung, Y.-H., Li, C.-H., & Cheung, W. K. (2007). Online Discussion Participation Prediction Using Non-negative Matrix Factorization. In Proceedings of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Workshops (pp. 284–287). Washington, DC, USA: IEEE Computer Society.


This paper presents an approach for predicting user participation in Internet forum discussions, i.e. for estimating whether a particular user will take part in a given discussion.

User participation follows Zipf's law:

The authors note that user participation frequency obeys Zipf's law, i.e. a user's posting frequency is inversely proportional to that user's rank in the "user activity" table. Consequently, a few highly active users post far more than the many less active ones.
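As a rough illustration of what "inversely proportional to rank" means, the sketch below ranks users by post count and fits a line to the log-log rank/frequency curve; a slope near -1 is what an ideal Zipf distribution would give. The function name `zipf_slope` and the synthetic counts are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def zipf_slope(post_counts):
    """Slope of log(frequency) against log(rank); about -1 for ideal Zipf data."""
    counts = np.sort(np.asarray(post_counts, dtype=float))[::-1]  # most active first
    counts = counts[counts > 0]              # log is undefined for zero counts
    ranks = np.arange(1, counts.size + 1)    # rank 1 = most active user
    slope, _intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
    return slope

# Synthetic, hypothetical post counts that follow f(r) = 1200 / r exactly.
counts = 1200 / np.arange(1, 51)
slope = zipf_slope(counts)
```

On real forum data the fitted slope would only approximate -1, and the fit is usually poor at the extreme ranks.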


The method encodes discussions and users in an $$n \times m$$ matrix $$F$$ ($$n$$ discussions, $$m$$ users), where the entry $$f_{ij}$$ is the number of posts written by user $$j$$ in discussion $$i$$. To account for the Zipf-like distribution of posting frequencies, the following normalization, inspired by pf/idf, is used:

$$x_{ij} = (pf_{ij}) \times (idf_j) = f_{ij} \times \log\frac{n}{n_j}$$

where $$n_j$$ is the number of discussions in which user $$j$$ has participated. Furthermore, each discussion $$i$$ is normalized to unit Euclidean length by dividing its pf/idf vector by that vector's $$L_2$$ norm.
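The weighting and row normalization described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function name `pf_idf`, the toy matrix, and the handling of users without posts and of empty discussions are assumptions.

```python
import numpy as np

def pf_idf(F):
    """Apply pf/idf weighting to a discussion-by-user count matrix F (n x m)
    and scale every discussion (row) to unit Euclidean length."""
    F = np.asarray(F, dtype=float)
    n = F.shape[0]                         # number of discussions
    n_j = np.count_nonzero(F, axis=0)      # discussions each user took part in
    # Assumption: users who never posted get idf 0 instead of log(n / 0).
    idf = np.zeros(F.shape[1])
    active = n_j > 0
    idf[active] = np.log(n / n_j[active])
    X = F * idf                            # x_ij = f_ij * log(n / n_j)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Assumption: all-zero rows are left as they are to avoid dividing by zero.
    norms[norms == 0] = 1.0
    return X / norms

# Toy example: 3 discussions (rows) and 3 users (columns).
F = np.array([[3, 0, 1],
              [0, 2, 1],
              [1, 0, 1]])
X = pf_idf(F)
```

Note that the third user posts in every discussion, so $$\log\frac{n}{n_j} = 0$$ and that user's column is weighted down to zero, which is the intended damping of very active users.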

Afterwards, Weighted Non-negative Matrix Factorization (WNMF), rather than Singular Value Decomposition (SVD), is used to find the latent factors from the observed participation frequencies, i.e. to estimate the model.
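The summary does not spell out the update rules, so the following is only a sketch of one common WNMF variant: multiplicative updates that minimize the squared error on the entries selected by a weight matrix. The 0/1 mask `W`, the rank `k`, the iteration count, and the function name `wnmf` are assumptions made for illustration and may differ from what the paper uses.

```python
import numpy as np

def wnmf(X, W, k, n_iter=200, eps=1e-9, seed=0):
    """Factor X (n x m) as U @ V with non-negative U (n x k) and V (k x m),
    reducing sum(W * (X - U @ V) ** 2) by multiplicative updates.
    W is a 0/1 mask: 1 marks an observed entry, 0 an entry to ignore."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k)) + eps           # strictly positive initial factors
    V = rng.random((k, m)) + eps
    WX = W * X                             # observed entries only
    for _ in range(n_iter):
        U *= (WX @ V.T) / ((W * (U @ V)) @ V.T + eps)
        V *= (U.T @ WX) / (U.T @ (W * (U @ V)) + eps)
    return U, V

# Toy normalized matrix; the entry at (2, 2) is treated as unobserved.
X = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [1.0, 0.0, 0.0]])
W = np.ones_like(X)
W[2, 2] = 0.0
U, V = wnmf(X, W, k=2)
Y = U @ V                                  # predictions, including the masked entry
```

Because the updates only multiply by non-negative ratios, the factors stay non-negative without any explicit projection step, which is one practical difference from an SVD-based approach.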


The evaluation experiments observe user interaction on three different discussion boards and use the mean absolute error (MAE) to measure the method's performance:

$$MAE = \frac{1}{N} \sum_{(i,j) \in S_{test}} |x_{ij} - y_{ij}|$$
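A minimal sketch of this metric, assuming $$y_{ij}$$ denotes the model's prediction for entry $$(i, j)$$, $$S_{test}$$ is given as a boolean mask, and $$N$$ is the number of test entries (the function name and the toy values are hypothetical):

```python
import numpy as np

def mean_absolute_error(X, Y, test_mask):
    """Average |x_ij - y_ij| over the entries selected by test_mask."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    test_mask = np.asarray(test_mask, dtype=bool)
    return float(np.abs(X[test_mask] - Y[test_mask]).mean())

# Toy values: two of the four entries belong to the test set.
X_true = np.array([[1.0, 0.0],
                   [0.5, 0.5]])
Y_pred = np.array([[0.5, 0.0],
                   [0.5, 1.0]])
mask = np.array([[True, False],
                 [False, True]])
mae = mean_absolute_error(X_true, Y_pred, mask)
```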