Learning information diffusion process on the web

by Wan, X. and Yang, J. (wan2007)

The authors present an approach for identifying how a particular topic diffuses across the web. Each document in a set of documents on a given topic, $$D=\{d_1, \ldots, d_n\}$$, is associated with a tuple $$(t_i, LocationSite_i)$$, where $$t_i$$ is the time at which $$d_i$$ was published at $$LocationSite_i$$. A diffusion step from a source document $$d_j$$ to a later document $$d_i$$ is then written as $$[LocationSite_j: d_j \rightarrow LocationSite_i: d_i] \; (t_j < t_i)$$.
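As a minimal sketch of this representation (not the authors' implementation), the snippet below stores each document with its timestamp and site, and enumerates candidate diffusion steps. It assumes a source must be published strictly before the document that derives from it. The `Doc` class, its field names, and `candidate_pairs` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Doc:
    """One document d_i with its tuple (t_i, LocationSite_i). Hypothetical structure."""
    doc_id: str
    site: str        # LocationSite_i
    timestamp: int   # t_i, e.g. seconds since some epoch
    text: str = ""


def candidate_pairs(docs):
    """Return (source, target) pairs in which the source was published strictly earlier.

    Assumption: only the ordering t_j < t_i is used to filter candidates.
    """
    return [
        (dj, di)
        for dj, di in permutations(docs, 2)
        if dj.timestamp < di.timestamp
    ]


docs = [
    Doc("d1", "site-a", 100),
    Doc("d2", "site-b", 250),
    Doc("d3", "site-c", 250),
]
pairs = [(dj.doc_id, di.doc_id) for dj, di in candidate_pairs(docs)]
print(pairs)
```

Documents with equal timestamps are not paired with each other here. Whether ties should be allowed is a design choice that this summary does not settle.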

They use a support vector machine (SVM) with the following features to determine whether $$d_j$$ is the source of $$d_i$$:

  • metadata-based features
  • cueword-based features: whether cue words (forward, from, source) appear in the document together with a mention of the source's name
  • similarity-based features: cosine similarity between the two documents
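The cueword and similarity features can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the paper's feature extraction. The tokenizer, the "same sentence" rule for cue words, and all function names are hypothetical. Metadata-based features are omitted because the summary does not say which metadata is used.

```python
import math
import re
from collections import Counter

# Cue words listed in the summary above.
CUE_WORDS = ("forward", "from", "source")


def tokens(text):
    """Lowercase alphanumeric tokens (a simplistic, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between raw term-frequency vectors of two texts."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def cueword_feature(text, source_name):
    """1.0 if some sentence mentions source_name and also contains a cue word.

    Assumption: "mentioning the source's name" is approximated by co-occurrence
    within one sentence.
    """
    name = source_name.lower()
    for sentence in re.split(r"[.!?\n]+", text):
        if name in sentence.lower() and any(
            t in CUE_WORDS for t in tokens(sentence)
        ):
            return 1.0
    return 0.0


def pair_features(source_text, target_text, source_name):
    """Feature vector for the question: is the first document the source of the second?"""
    return [
        cueword_feature(target_text, source_name),
        cosine_similarity(source_text, target_text),
    ]


src = "Flood warning issued for the river valley"
tgt = "Source: Daily Planet. Flood warning issued for the river valley"
features = pair_features(src, tgt, "Daily Planet")
print(features)
```

Vectors like these, computed for labelled document pairs, could then be passed to any SVM implementation (for example `sklearn.svm.SVC` from scikit-learn) to train the source classifier.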