Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification
by Melville et al. (IBM T.J. Watson Research Center) *****
Analyzing blog posts raises a number of interesting questions:
- how to identify the subset of blogs that discuss not the products themselves but high-level concepts (properties?) relevant to these products;
- how to identify the most authoritative/influential sources;
- how to detect the sentiment expressed about an entity (product, property, etc.).
The described approach pools multinomial distributions to build a composite Naive Bayes classifier that incorporates background knowledge alongside training examples. It does so by combining the probability distributions of two experts: (i) an expert trained on labelled training data and (ii) an expert representing a generative model that explains the sentiment lexicon.
There is a substantial literature on combining such distributions. The authors experimented with the following approaches:
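- the linear opinion pool, which performed best in the evaluation: $P = \sum_{i=1}^{K} \alpha_i P_i$; the pooled probability is the sum of the experts' probabilities, each weighted by a factor $\alpha_i$, with $\sum_{i=1}^{K} \alpha_i = 1$
- the logarithmic opinion pool: $P = \frac{1}{Z} \prod_{i=1}^{K} P_i^{\alpha_i}$, with $\sum_{i=1}^{K} \alpha_i = 1$, where $Z$ is a normalizing constant; if $\alpha_i = 1/K$, this is equivalent to the (normalized) geometric mean of all expert opinions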
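
The two pooling rules above can be sketched in a few lines of Python. This is only an illustration of the formulas, not the authors' implementation: the function names, the three-word toy vocabulary, the probability values and the equal weights are all made up for the example.

```python
import math


def linear_pool(experts, weights):
    """Linear opinion pool: weighted arithmetic mean of the experts' distributions."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n = len(experts[0])
    return [sum(a * p[j] for a, p in zip(weights, experts)) for j in range(n)]


def log_pool(experts, weights):
    """Logarithmic opinion pool: weighted geometric mean, renormalized by Z."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n = len(experts[0])
    # Unnormalized product of P_i ** alpha_i for each outcome.
    unnorm = [math.prod(p[j] ** a for a, p in zip(weights, experts)) for j in range(n)]
    z = sum(unnorm)  # normalizing constant Z
    return [u / z for u in unnorm]


# Hypothetical word distributions for a single class over a toy
# three-word vocabulary (values are invented for illustration).
data_expert = [0.5, 0.3, 0.2]     # expert trained on labelled data
lexicon_expert = [0.8, 0.1, 0.1]  # expert derived from the sentiment lexicon
weights = [0.5, 0.5]              # alpha_i, assumed equal here

linear = linear_pool([data_expert, lexicon_expert], weights)
logarithmic = log_pool([data_expert, lexicon_expert], weights)
print(linear)
print(logarithmic)
```

With equal weights the logarithmic pool reduces to the renormalized geometric mean of the two experts. The linear pool is already normalized as long as each expert's distribution and the weights sum to 1, so it needs no constant Z.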