Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification


by Melville et al. (IBM T.J. Watson Research Center) *****

Analyzing blog posts raises a number of interesting questions:

  1. how to identify the subset of blogs that discuss not the products themselves but high-level concepts (properties?) relevant to these products,
  2. how to identify the most authoritative/influential sources, and
  3. how to detect the sentiment expressed about an entity (product, property, etc.).

This article describes the use of background lexical information for sentiment detection. The authors refine a given sentiment dictionary with training examples.

The described approach pools multinomial classifiers to obtain a composite Naive Bayes classifier that combines background knowledge with training examples. This is achieved by combining the probability distributions of two experts: (i) an expert trained on labelled training data and (ii) an expert representing a generative model that explains the sentiment lexicon.
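
To make the two experts more concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the Laplace smoothing, and in particular the lexicon model (lexicon words are simply made `ratio` times more or less likely than neutral words, depending on whether their polarity matches the class) are assumptions chosen for illustration.

```python
from collections import Counter


def data_expert(docs, labels, vocab, smoothing=1.0):
    """Estimate P(word | class) from labelled documents (Laplace smoothing)."""
    counts = {c: Counter() for c in set(labels)}
    for tokens, c in zip(docs, labels):
        counts[c].update(t for t in tokens if t in vocab)
    model = {}
    for c, cnt in counts.items():
        total = sum(cnt.values()) + smoothing * len(vocab)
        model[c] = {w: (cnt[w] + smoothing) / total for w in vocab}
    return model


def lexicon_expert(vocab, positive, negative, ratio=10.0):
    """Hypothetical lexicon model: a lexicon word gets weight `ratio` in the
    class matching its polarity, 1/ratio in the other class, and words
    outside the lexicon get weight 1. Weights are normalised per class."""
    def weight(word, cls):
        if word in positive:
            return ratio if cls == "pos" else 1.0 / ratio
        if word in negative:
            return ratio if cls == "neg" else 1.0 / ratio
        return 1.0

    model = {}
    for cls in ("pos", "neg"):
        raw = {w: weight(w, cls) for w in vocab}
        z = sum(raw.values())
        model[cls] = {w: v / z for w, v in raw.items()}
    return model


# Toy data, invented for this example.
vocab = {"great", "awful", "phone"}
docs = [["great", "phone"], ["awful", "phone"]]
labels = ["pos", "neg"]

p_data = data_expert(docs, labels, vocab)
p_lex = lexicon_expert(vocab, positive={"great"}, negative={"awful"})
```

Both experts return one word distribution per class over the same vocabulary, which is what allows their distributions to be combined word by word.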

There is a substantial body of literature on combining such distributions. The authors experimented with the following approaches:

  1. the linear opinion pool, which performed best in the evaluation: $$P=\sum_{i=1}^K \alpha_i P_i$$. The pooled probability is the sum of the experts' probabilities, each weighted by a factor $$\alpha_i$$, with $$\sum_i \alpha_i = 1$$.
  2. the logarithmic opinion pool: $$P=\frac{1}{Z}\prod_{i=1}^K P_i^{\alpha_i}$$ with $$\sum_i \alpha_i = 1$$, where $$Z$$ is a normalizing constant that makes the pooled distribution sum to one. For $$\alpha_i = 1/K$$ this reduces to the normalized geometric mean of all expert opinions.

The authors compute the weights ($$\alpha_i$$) based on the experts' errors in explaining the training data.
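
The two pooling rules can be sketched in a few lines of Python (3.8+ for `math.prod`). The pooling functions follow the formulas above directly. The `error_weights` helper is an assumption: it uses the log-odds of each expert being correct, normalised to sum to one, as one plausible error-based weighting, and it requires every error rate to lie strictly between 0 and 0.5. The toy distributions and error rates are invented for the example.

```python
import math


def linear_pool(dists, alphas):
    """Weighted arithmetic mean of distributions over the same outcomes."""
    return {w: sum(a * d[w] for a, d in zip(alphas, dists)) for w in dists[0]}


def log_pool(dists, alphas):
    """Weighted geometric mean, renormalised so the result sums to one."""
    raw = {w: math.prod(d[w] ** a for a, d in zip(alphas, dists))
           for w in dists[0]}
    z = sum(raw.values())  # the normalizing constant Z
    return {w: v / z for w, v in raw.items()}


def error_weights(errors):
    """Assumed weighting scheme: log((1 - err) / err) per expert,
    normalised to sum to one. Requires 0 < err < 0.5 for every expert."""
    raw = [math.log((1.0 - e) / e) for e in errors]
    total = sum(raw)
    return [r / total for r in raw]


# Toy word distributions for a single class (illustrative values only).
p_train = {"good": 0.5, "bad": 0.2, "the": 0.3}    # expert from labelled data
p_lexicon = {"good": 0.8, "bad": 0.1, "the": 0.1}  # expert from the lexicon

alphas = error_weights([0.20, 0.35])  # hypothetical training-error rates
pooled_linear = linear_pool([p_train, p_lexicon], alphas)
pooled_log = log_pool([p_train, p_lexicon], alphas)
pooled_geometric = log_pool([p_train, p_lexicon], [0.5, 0.5])
```

With these numbers the expert with the lower error rate receives the larger weight, so both pooled distributions stay closer to `p_train` than to `p_lexicon`.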