# Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification

by Melville et al. (IBM Watson Research Centre) *****

Analyzing blog posts raises a number of interesting questions:

- how to identify the subset of blogs discussing not products but **high-level concepts** (properties?) relevant to these products.
- the identification of the most **authoritative/influential sources**.
- detecting the **sentiment** expressed about an entity (product, properties, etc.)

The paper focuses on sentiment detection and proposes to **refine** a given **sentiment dictionary with training examples**.

The approach pools multinomial classifiers into a composite Naive Bayes classifier that incorporates both background knowledge and training examples. It does so by combining the probability distributions of two experts: (i) an expert trained on labelled training data and (ii) an expert representing a generative model that explains the sentiment lexicon.
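To show where the combined distribution enters, here is the standard multinomial Naive Bayes decision rule. The notation is assumed here rather than taken from the paper: $$n_{w,d}$$ is the count of word $$w$$ in document $$d$$, and $$P(w \mid c)$$ is the combined word distribution for class $$c$$.

$$\hat{c} = \arg\max_{c} \; P(c) \prod_{w \in d} P(w \mid c)^{n_{w,d}}$$

In this reading, the two experts differ only in how they estimate $$P(w \mid c)$$, so combining them amounts to plugging a pooled estimate into this term.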

There is a substantial literature on combining such distributions. The authors experimented with the following approaches:

- linear opinion pool, which performed best in the evaluation: $$P=\sum_{i=1}^K \alpha_i P_i$$; the pooled probability is the sum of the experts' probabilities, each weighted by a factor $$\alpha_i$$, with $$\sum_i \alpha_i = 1$$
- logarithmic opinion pool: $$P=\frac{1}{Z}\prod_{i=1}^K P_i^{\alpha_i}$$ with $$\sum_i \alpha_i = 1$$, where $$Z$$ is a normalizing constant; if $$\alpha_i = 1/K$$, this reduces to the (normalized) geometric mean of all expert opinions
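
Both pooling rules can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `linear_pool` and `log_pool`, the three-word vocabulary, and all probabilities are made up for the example, and each expert is represented as a plain dictionary holding a word distribution for a single class.

```python
import math


def linear_pool(experts, alphas):
    """Linear opinion pool: P(w) = sum_i alpha_i * P_i(w)."""
    vocab = experts[0].keys()
    return {w: sum(a * p[w] for a, p in zip(alphas, experts)) for w in vocab}


def log_pool(experts, alphas):
    """Logarithmic opinion pool: P(w) = (1/Z) * prod_i P_i(w) ** alpha_i."""
    vocab = experts[0].keys()
    unnormalized = {
        w: math.prod(p[w] ** a for a, p in zip(alphas, experts)) for w in vocab
    }
    z = sum(unnormalized.values())  # normalizing constant Z
    return {w: v / z for w, v in unnormalized.items()}


# Hypothetical word distributions P(w | positive) from the two experts.
lexicon_expert = {"good": 0.6, "bad": 0.1, "phone": 0.3}
trained_expert = {"good": 0.4, "bad": 0.2, "phone": 0.4}
alphas = [0.5, 0.5]  # equal weights, must sum to 1

pooled_linear = linear_pool([lexicon_expert, trained_expert], alphas)
pooled_log = log_pool([lexicon_expert, trained_expert], alphas)

print(pooled_linear)
print(pooled_log)
```

With equal weights, the linear pool is the arithmetic mean of the two distributions and is already normalized. The logarithmic pool is the geometric mean and needs the explicit division by $$Z$$ to sum to 1.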