# Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification

by Melville et al. (IBM Watson Research Centre) *****

Analyzing blog posts raises a number of interesting questions:

1. identifying the subset of blogs that discuss not the products themselves but high-level concepts (properties?) relevant to these products;
2. identifying the most authoritative/influential sources;
3. detecting the sentiment expressed about an entity (product, properties, etc.).

This article addresses the third question and describes how background lexical information can be used for sentiment detection: the authors refine the knowledge in a given sentiment dictionary using labelled training examples.

The approach pools multinomial Naive Bayes classifiers into a composite classifier that incorporates background knowledge alongside training examples. It does so by combining the probability distributions of two experts: (i) an expert trained on labelled training data and (ii) an expert representing a generative model that explains the sentiment lexicon.
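
To make the two experts concrete, here is a minimal sketch under simplifying assumptions: documents are token lists, there are two classes (`pos`, `neg`), and the data expert is a Laplace-smoothed multinomial model. The lexicon expert is a hypothetical stand-in, not the paper's exact construction: each lexicon word simply receives `boost` times the weight of any other vocabulary word in its own class. The names `train_nb`, `lexicon_expert`, `posterior`, and `boost` are illustrative only.

```python
import math
from collections import Counter

CLASSES = ("pos", "neg")

def train_nb(docs, labels, vocab, alpha=1.0):
    """Data expert: estimate P(w|c) from labelled token lists with Laplace smoothing."""
    counts = {c: Counter() for c in CLASSES}
    for tokens, label in zip(docs, labels):
        counts[label].update(t for t in tokens if t in vocab)
    model = {}
    for c in CLASSES:
        total = sum(counts[c].values()) + alpha * len(vocab)
        model[c] = {w: (counts[c][w] + alpha) / total for w in vocab}
    return model

def lexicon_expert(lexicon, vocab, boost=10.0):
    """Hypothetical lexicon expert: words listed for class c get `boost` times
    the weight of every other word in P(w|c); no training data is used."""
    model = {}
    for c in CLASSES:
        weights = {w: boost if lexicon.get(w) == c else 1.0 for w in vocab}
        z = sum(weights.values())
        model[c] = {w: v / z for w, v in weights.items()}
    return model

def posterior(model, tokens, prior=None):
    """P(c|doc) under a multinomial model, computed in log space for stability.
    Tokens outside the vocabulary are ignored."""
    prior = prior or {c: 1.0 / len(CLASSES) for c in CLASSES}
    logp = {
        c: math.log(prior[c]) + sum(math.log(model[c][t]) for t in tokens if t in model[c])
        for c in CLASSES
    }
    m = max(logp.values())
    unnorm = {c: math.exp(v - m) for c, v in logp.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# Toy usage: each expert turns a document into a class distribution P(c|doc).
vocab = {"great", "awful", "phone", "battery"}
lexicon = {"great": "pos", "awful": "neg"}
docs = [["great", "phone"], ["awful", "battery"]]
labels = ["pos", "neg"]

data_expert = train_nb(docs, labels, vocab)
lex_expert = lexicon_expert(lexicon, vocab)
print(posterior(data_expert, ["great"]))
print(posterior(lex_expert, ["great"]))
```

Both experts produce a distribution over the same classes for every document, which is what makes it possible to pool them.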

There is a substantial literature on combining such distributions. The authors experimented with the following approaches:

1. the linear opinion pool, which performed best in the evaluation: $P = \sum_{i=1}^{K} \alpha_i P_i$, i.e., the pooled probability is the sum of the experts' probabilities, each weighted by a factor $\alpha_i$ with $\sum_i \alpha_i = 1$;
2. the logarithmic opinion pool: $P = \frac{1}{Z} \prod_{i=1}^{K} P_i^{\alpha_i}$, where $\sum_i \alpha_i = 1$ and $Z$ is a normalizing constant; if $\alpha_i = 1/K$, this reduces to the (normalized) geometric mean of all expert opinions.
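
The two pooling rules translate almost directly into code. This sketch assumes each expert's output is a dictionary mapping class names to probabilities; `linear_pool` and `log_pool` are illustrative names, not functions from the paper.

```python
import math

def linear_pool(dists, weights):
    """Linear opinion pool: P(c) = sum_i alpha_i * P_i(c)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    classes = dists[0].keys()
    return {c: sum(a * d[c] for a, d in zip(weights, dists)) for c in classes}

def log_pool(dists, weights):
    """Logarithmic opinion pool: P(c) = (1/Z) * prod_i P_i(c) ** alpha_i."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    classes = dists[0].keys()
    unnorm = {c: math.prod(d[c] ** a for a, d in zip(weights, dists)) for c in classes}
    z = sum(unnorm.values())  # the normalizing constant Z
    return {c: v / z for c, v in unnorm.items()}

# Two experts with equal weights (alpha_i = 1/K, K = 2).
experts = [{"pos": 0.9, "neg": 0.1}, {"pos": 0.4, "neg": 0.6}]
alphas = [0.5, 0.5]
print(linear_pool(experts, alphas))
print(log_pool(experts, alphas))
```

With equal weights the logarithmic pool is the normalized geometric mean, so a class that any one expert considers unlikely (here `neg`, at 0.1 from the first expert) is penalized more heavily than under the linear pool.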

The authors compute the weights $\alpha_i$ from each expert's error in explaining the training data, so that the expert that fits the training data better has more influence on the pooled distribution.
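
The exact weighting formula is not given here, so the following is only one plausible sketch under stated assumptions: an expert's error is its misclassification rate on the training documents, and each weight is proportional to the inverse of that error (plus a small `eps` to avoid division by zero), normalized to sum to 1. `error_rate`, `inverse_error_weights`, and `eps` are hypothetical names.

```python
def error_rate(dists, labels):
    """Fraction of training documents whose most probable class differs from the label."""
    wrong = sum(1 for d, y in zip(dists, labels) if max(d, key=d.get) != y)
    return wrong / len(labels)

def inverse_error_weights(errors, eps=1e-3):
    """Hypothetical weighting: alpha_i proportional to 1 / (error_i + eps), normalized to sum to 1."""
    inv = [1.0 / (e + eps) for e in errors]
    total = sum(inv)
    return [v / total for v in inv]

# Toy training-set outputs of two experts on two documents labelled pos and neg.
labels = ["pos", "neg"]
expert_a = [{"pos": 0.8, "neg": 0.2}, {"pos": 0.3, "neg": 0.7}]  # classifies both correctly
expert_b = [{"pos": 0.8, "neg": 0.2}, {"pos": 0.7, "neg": 0.3}]  # misclassifies the second
errors = [error_rate(expert_a, labels), error_rate(expert_b, labels)]
alphas = inverse_error_weights(errors)
print(errors, alphas)
```

Inverse-error weighting lets a near-perfect expert dominate the pool; a smoother alternative such as $\alpha_i \propto 1 - \text{error}_i$ would keep more influence for the weaker expert.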
