Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification
by Melville et al. (IBM T.J. Watson Research Center) *****
Analyzing blog posts raises a number of interesting questions:
- identifying the subset of blogs that discuss not products themselves but high-level concepts (properties?) relevant to these products.
- identifying the most authoritative/influential sources.
- detecting the sentiment expressed about an entity (product, properties, etc.).
The described approach pools multinomial distributions to build a composite Naive Bayes classifier that incorporates background knowledge alongside training examples. It does so by combining the probability distributions of two experts: (i) an expert trained on labelled training data and (ii) an expert representing a generative model that explains the sentiment lexicon.
There is a substantial literature on combining such distributions. The authors experimented with the following approaches:
- linear opinion pool, which performed best in the evaluation: $$P=\sum_{i=1}^{K} \alpha_i P_i$$; the pooled probability is the sum of the experts' probabilities, each weighted by a factor $$\alpha_i$$, with $$\sum_{i=1}^{K} \alpha_i = 1$$
- logarithmic opinion pool: $$P=\frac{1}{Z}\prod_{i=1}^{K} P_i^{\alpha_i}$$, with $$\sum_{i=1}^{K} \alpha_i = 1$$, where $$Z$$ is a normalizing constant; if $$\alpha_i = 1/K$$, this reduces to the (normalized) geometric mean of all expert opinions
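The two pooling rules can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `linear_pool` and `log_pool`, the three-outcome distributions and the equal weights are all hypothetical, chosen only to show how two experts' distributions (one from training data, one from the lexicon) might be combined. The normalizing constant $$Z$$ is assumed here to be the sum of the unnormalized pooled values.

```python
import math

def linear_pool(experts, weights):
    """Linear opinion pool: weighted arithmetic mean of the experts' distributions."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    n = len(experts[0])
    return [sum(a * p[j] for a, p in zip(weights, experts)) for j in range(n)]

def log_pool(experts, weights):
    """Logarithmic opinion pool: weighted geometric mean, renormalized by Z."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    n = len(experts[0])
    unnorm = [math.prod(p[j] ** a for a, p in zip(weights, experts)) for j in range(n)]
    z = sum(unnorm)  # assumed form of the normalizing constant Z
    return [u / z for u in unnorm]

# Hypothetical distributions over three outcomes (illustrative values only).
p_data = [0.7, 0.2, 0.1]      # expert trained on labelled data
p_lexicon = [0.4, 0.4, 0.2]   # expert derived from the sentiment lexicon
weights = [0.5, 0.5]          # alpha_i, must sum to 1

linear = linear_pool([p_data, p_lexicon], weights)
logarithmic = log_pool([p_data, p_lexicon], weights)
print("linear pool:     ", linear)
print("logarithmic pool:", logarithmic)
```

Both results are valid probability distributions. With equal weights, the logarithmic pool is proportional to the geometric mean of the two experts, so it penalizes outcomes that either expert considers unlikely more strongly than the linear pool does.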