Text-Mining the Voice of the People


Evangelopoulos, N., & Visinescu, L. (2012). Text-mining the voice of the people. Communications of the ACM, 55(2), 62. doi:10.1145/2076450.2076467

This article discusses the application of text mining to e-government. E-government describes the use of ICT to provide citizens and businesses with convenient, high-quality access to government information and services. The major players in such systems are politicians, public administrators, and programmers, who design e-government processes, and citizens, who are the end users of these systems.

Method

  1. Citizens provide feedback (SMS, Twitter, ...).
  2. The feedback is summarized using concept-based text processing methods.
  3. The aggregated results are fed back to the elected decision makers.

Problems

  1. Self-selection: some citizen groups may be more willing than others to provide such feedback in order to promote their own agendas.
  2. Factors driving self-selection include income, education, and age.

Process and NLP Techniques Used

  • text cleanup -- typos, abbreviations, removal of duplicate messages
  • text preprocessing -- tokenization, stemming, filtering, term weighting (TF-IDF), and dimensionality reduction
  • text summarization: Latent Semantic Analysis (LSA), probabilistic LSA, Non-Negative Matrix Factorization, Latent Dirichlet Allocation
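The cleanup and preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abbreviation map, the stop-word list, and the crude suffix-stripping `stem` function are hypothetical stand-ins (a real system would use a proper stemmer such as Porter's), and term weighting uses the common tf * log(N/df) variant of TF-IDF. Duplicate removal here only catches messages that are identical after normalization.

```python
import math
import re
from collections import Counter

# Hypothetical abbreviation map; a real deployment would need a curated list.
ABBREVIATIONS = {"pls": "please", "govt": "government", "u": "you"}
# Tiny illustrative stop-word list (an assumption, not taken from the article).
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "on", "please", "the", "to", "you"}


def clean(messages):
    """Text cleanup: lowercase, expand abbreviations, drop duplicate messages."""
    seen, cleaned = set(), []
    for msg in messages:
        words = re.findall(r"[a-z]+", msg.lower())
        text = " ".join(ABBREVIATIONS.get(w, w) for w in words)
        if text not in seen:  # duplicates are detected after normalization
            seen.add(text)
            cleaned.append(text)
    return cleaned


def stem(word):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(doc):
    """Tokenization, stop-word filtering, and stemming."""
    return [stem(tok) for tok in doc.split() if tok not in STOP_WORDS]


def tfidf(docs_tokens):
    """Term-document matrix (rows = terms) weighted by tf * log(N / df)."""
    n_docs = len(docs_tokens)
    counts = [Counter(tokens) for tokens in docs_tokens]
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    vocab = sorted(df)
    matrix = [[c[term] * math.log(n_docs / df[term]) for c in counts] for term in vocab]
    return vocab, matrix


messages = [
    "Fix the potholes on Main St pls",
    "fix the potholes on main st PLS",  # near-duplicate, removed by clean()
    "More bus routes to the airport",
    "Potholes are damaging cars",
]
docs = clean(messages)
tokens = [preprocess(d) for d in docs]
vocab, matrix = tfidf(tokens)
print(vocab)
```

Because "pothole" appears in two of the three remaining messages, its IDF weight is lower than that of terms unique to a single message, which is the intended effect of TF-IDF weighting.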

Results

  • a list of "factors" that represent the relevant concepts discussed by the public
  • high-loading terms (the most significant keywords) for each of these factors
  • the corresponding high-loading documents
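To make "factors" and their high-loading terms and documents concrete, here is a small LSA-style sketch in Python. It assumes a toy term-document count matrix invented for illustration, and it extracts factors with a plain truncated SVD computed by power iteration with deflation; it does not reproduce any factor rotation or weighting scheme the authors may have applied, and in practice the input would be the TF-IDF weighted matrix produced during preprocessing. Term loadings come from each factor's left singular vector and document loadings from its right singular vector.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def truncated_svd(a, k, iters=200):
    """Top-k singular triplets (sigma, u, v) via power iteration with deflation."""
    a = [row[:] for row in a]  # work on a copy; deflation modifies it
    factors = []
    for _factor in range(k):
        at = transpose(a)
        v = normalize([1.0 + 0.1 * j for j in range(len(at))])
        for _step in range(iters):
            v = normalize(matvec(at, matvec(a, v)))  # v <- A^T A v, renormalized
        av = matvec(a, v)
        sigma = math.sqrt(sum(x * x for x in av))
        if sigma == 0:
            break
        u = [x / sigma for x in av]
        # Singular vectors have arbitrary sign; make the strongest term loading positive.
        if max(u, key=abs) < 0:
            u, v = [-x for x in u], [-x for x in v]
        factors.append((sigma, u, v))
        # Deflate: subtract sigma * u v^T so the next pass finds the next factor.
        a = [[a[i][j] - sigma * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]
    return factors


def top_loadings(labels, loadings, n=2):
    """Labels with the highest loadings on one factor."""
    ranked = sorted(zip(labels, loadings), key=lambda p: p[1], reverse=True)
    return [label for label, _ in ranked[:n]]


# Toy term-document counts (rows = terms, columns = documents); invented data.
terms = ["pothole", "road", "repair", "bus", "route", "airport"]
doc_ids = ["d0", "d1", "d2", "d3", "d4"]
matrix = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
factors = truncated_svd(matrix, k=2)
for i, (sigma, u, v) in enumerate(factors):
    print(f"Factor {i + 1} (sigma={sigma:.2f}): terms={top_loadings(terms, u)}, "
          f"docs={top_loadings(doc_ids, v)}")
```

With this toy matrix the first factor should load on the road-maintenance terms and documents and the second on the transit ones. Power iteration keeps the example dependency-free; for real corpora a library SVD routine would be the usual choice.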

Conclusion

One of the main advantages of the presented approach is the identification of abstract factors: topics and concepts discussed by the public, together with their corresponding keywords and documents. Compared with individual keywords, factors add a level of abstraction that makes it easier to identify and group important topics in the observed domain.