Finding and tracking subjects within an ongoing debate


by Rudy Prabowo and Mike Thelwall (prabowo2008)

This article tracks subjects in postings and bulletins by identifying co-occurring terms that represent these subjects. The presented research differs from topic detection and tracking because it does not focus on tracking individual topics but rather on all subjects discussed within a broad topic.

Method:

  • identify terms:
    • identify sentences and extract terms (nouns, noun phrases)
    • split noun phrases containing conjunctions (and, or, ...)
    • depluralize words
    • apply a stopword list

  • create term graph:
    • form all pairwise combinations of terms and assess the strength of each association using mutual information (MI)
    • generate or extend a graph based on these terms
    • find blocks representing subjects in this graph using Tarjan's block finding algorithm
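
The steps above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: noun phrases are taken as already extracted per sentence, depluralization is a naive suffix rule, the stopword list is a placeholder, MI is computed in its pointwise form over sentence-level co-occurrence, and blocks are read as biconnected components found with a DFS low-link search. The names `normalize`, `mi_edges`, `blocks`, and `STOPWORDS` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations

STOPWORDS = {"thing", "way", "lot"}  # placeholder list, not from the paper


def normalize(phrase):
    """Split a noun phrase on conjunctions, depluralize naively, drop stopwords."""
    terms = []
    for part in re.split(r"\s*(?:,|\band\b|\bor\b)\s*", phrase.lower()):
        # Naive rule (assumption): strip one trailing "s" from longer words.
        words = [w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss")
                 else w for w in part.split()]
        term = " ".join(words)
        if term and term not in STOPWORDS:
            terms.append(term)
    return terms


def mi_edges(sentences, threshold=0.0):
    """Score every co-occurring term pair with pointwise MI (assumed form)."""
    n = len(sentences)
    term_count, pair_count = Counter(), Counter()
    for terms in sentences:
        uniq = sorted(set(terms))
        term_count.update(uniq)
        pair_count.update(combinations(uniq, 2))
    edges = {}
    for (a, b), c in pair_count.items():
        # log2( P(a,b) / (P(a) * P(b)) ) with sentence-level probabilities
        mi = math.log2(c * n / (term_count[a] * term_count[b]))
        if mi > threshold:
            edges[(a, b)] = mi
    return edges


def blocks(edges):
    """Return the blocks (biconnected components) of the term graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    index, low, stack, result = {}, {}, [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)
        for v in sorted(adj[u]):
            if v == parent:
                continue
            if v not in index:
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:
                    # u separates the subtree of v: pop one complete block.
                    block = set()
                    while True:
                        edge = stack.pop()
                        block.update(edge)
                        if edge == (u, v):
                            break
                    result.append(block)
            elif index[v] < index[u]:
                # Back edge to an ancestor.
                stack.append((u, v))
                low[u] = min(low[u], index[v])

    for node in sorted(adj):
        if node not in index:
            dfs(node, None)
    return result


# Made-up input: each sentence is a list of already-extracted noun phrases.
sentences = [
    ["fuel prices", "protests and blockades"],
    ["protests", "government", "tax cuts"],
    ["weather"],
    ["football"],
]
terms = [[t for phrase in s for t in normalize(phrase)] for s in sentences]
for block in blocks(mi_edges(terms)):
    print(sorted(block))
```

In this toy input the two sentences that mention protests form two separate blocks joined only through the term "protest", which is the kind of structure the method reads as distinct subjects. The recursive search is fine for small graphs; a large term graph would need an iterative version or a graph library.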

Evaluation: The approach is suitable for short documents, but it could not track subjects in the bulletin data (long documents rather than short debates) because the links between the subjects were too dense.