Text-Mining the Voice of the People
Evangelopoulos, N., & Visinescu, L. (2012). Text-mining the voice of the people. Communications of the ACM, 55(2), 62. doi:10.1145/2076450.2076467
This article elaborates on the application of text mining for e-government. E-government describes the use of ICT to provide citizens and businesses with convenient, high-quality access to government information and services. The major players in such systems are politicians, public administrators, programmers who design e-government processes, and citizens, who are the end users of these systems.
- citizens provide feedback (SMS, Twitter, ...)
- text summarization using concept-based processing methods.
- aggregated data is fed back to the elected decision makers.
- self-selection: some citizen groups may be more willing to provide such feedback in order to promote their own agendas.
- factors for self-selection: income, education, and age
Process and NLP Technologies Used
- text cleanup -- typos, abbreviations, removal of duplicate messages
- text preprocessing -- tokenization, stemming, filtering, term weighting (TF-IDF) and dimensionality reduction
- text summarization: Latent Semantic Analysis (LSA), probabilistic LSA, Non-Negative Matrix Factorization, Latent Dirichlet Allocation
- list of "factors" which represent relevant concepts discussed by the public
- high-loading terms (most significant keywords) for these factors and the corresponding high-loading documents
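
The steps listed above can be illustrated with a minimal sketch in plain Python. It is not the article's implementation: the function names, the tiny stopword list, and the sample messages are all hypothetical, stemming is omitted, and a simple power iteration stands in for a full LSA decomposition, so only the single strongest factor is recovered.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not taken from the article.
STOPWORDS = {"the", "a", "is", "are", "to", "of", "and", "in", "on", "for", "too"}


def clean(messages):
    """Text cleanup: lowercase, collapse whitespace, drop duplicate messages."""
    seen, unique = set(), []
    for message in messages:
        normalized = " ".join(message.lower().split())
        if normalized and normalized not in seen:
            seen.add(normalized)
            unique.append(normalized)
    return unique


def tokenize(text):
    """Preprocessing: alphabetic tokens with stopwords filtered (stemming omitted)."""
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS]


def tfidf(token_lists):
    """Term weighting: rows are documents, columns follow the sorted vocabulary."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vocab = sorted(doc_freq)
    idf = {t: math.log(n_docs / doc_freq[t]) + 1.0 for t in vocab}
    matrix = []
    for tokens in token_lists:
        counts = Counter(tokens)
        matrix.append([counts[t] * idf[t] for t in vocab])
    return vocab, matrix


def leading_factor(matrix, iterations=100):
    """Power iteration on A^T A: approximates the first right singular vector."""
    v = [1.0] * len(matrix[0])
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # A v
        w = [sum(row[j] * s for row, s in zip(matrix, scores))  # A^T (A v)
             for j in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    doc_loadings = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    return v, doc_loadings


# Hypothetical citizen messages, invented for illustration only.
messages = [
    "Fix the potholes on Main Street",
    "fix the  potholes on main street",
    "Potholes on Main Street damage cars",
    "The library hours are too short",
]

cleaned = clean(messages)
vocab, matrix = tfidf([tokenize(m) for m in cleaned])
term_loadings, doc_loadings = leading_factor(matrix)

# High-loading terms and the high-loading document for the strongest factor.
top_terms = sorted(zip(vocab, term_loadings), key=lambda p: p[1], reverse=True)[:3]
top_doc = cleaned[max(range(len(cleaned)), key=lambda i: doc_loadings[i])]
print([term for term, _ in top_terms])
print(top_doc)
```

For a real corpus, a library routine for truncated SVD (or one of the other methods named above, such as NMF or LDA) would presumably replace the power iteration and return several factors at once, each with its own high-loading terms and documents.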