# Using Word Association to Detect Multitopic Structures in Text Documents

*IEEE Intelligent Systems*, 29(5), pp. 40-46.

## Method

Associative gravity focuses on the content structures *within* a single document and combines the following three steps:

1. **keyword detection** - computes a keyword rating *kw* that identifies the most important terms *w* from the number of times they occur in the text *n(w)*, the total number of documents *|D|*, and the number of documents in which they occur *I(w)*:

   $$kw(w) = n(w) \cdot \frac{|D|}{I(w)}$$

2. **CIMAWA** - determines the relations between those keywords. CIMAWA(x(y)) measures the association between the words x and y within a given window size (ws), plus the reverse association multiplied by the damping factor k. A window size of 10 was used in the experiments.

   $$CIMAWA(x(y)) = \frac{Cooc_{ws}(x,y)}{n(y)^\alpha} + k \cdot \frac{Cooc_{ws}(x,y)}{n(x)^\alpha}$$

3. **clustering** - determines semantic topic clusters. CIMAWA is used to compute the associative gravity force (AGF), which serves as the distance metric for the clustering:

   $$AGF(x, y) = \frac{CIMAWA(x(y)) \cdot kw(x)}{y}$$
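
The first two steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `keyword_rating`, `cooc`, and `cimawa` are hypothetical, documents are assumed to be pre-tokenized, and `Cooc_ws(x, y)` is assumed to count the pairs of occurrences of x and y that lie at most `ws` tokens apart. The values of `alpha` and `k` are not given above, so they are left as required parameters. The clustering step is left out.

```python
from collections import Counter


def keyword_rating(doc_tokens, corpus):
    """Return kw(w) = n(w) * |D| / I(w) for every word in doc_tokens.

    Assumption: `corpus` is a list of token collections that includes the
    document itself, so I(w) is never zero.
    """
    counts = Counter(doc_tokens)      # n(w)
    num_docs = len(corpus)            # |D|
    ratings = {}
    for word, n_w in counts.items():
        doc_freq = sum(1 for doc in corpus if word in doc)  # I(w)
        ratings[word] = n_w * num_docs / doc_freq
    return ratings


def cooc(tokens, x, y, ws):
    """Count occurrence pairs of x and y at most `ws` tokens apart.

    Assumption: this symmetric pair count stands in for Cooc_ws(x, y).
    """
    pos_x = [i for i, t in enumerate(tokens) if t == x]
    pos_y = [j for j, t in enumerate(tokens) if t == y]
    return sum(1 for i in pos_x for j in pos_y if 0 < abs(i - j) <= ws)


def cimawa(tokens, x, y, ws, alpha, k):
    """Evaluate the CIMAWA(x(y)) formula as written above.

    Both x and y must occur in `tokens`, otherwise n(x) or n(y) is zero.
    """
    counts = Counter(tokens)
    c = cooc(tokens, x, y, ws)
    return c / counts[y] ** alpha + k * c / counts[x] ** alpha
```

Calling `keyword_rating(tokens, corpus)` on one tokenized document returns a dictionary from which the top-rated terms can be picked, and `cimawa(tokens, x, y, ws=10, alpha=..., k=...)` can then be evaluated for each pair of those terms using the window size reported for the experiments.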