Polarity Shift Detection, Elimination and Ensemble: A Three-Stage Model for Document-Level Sentiment Analysis

Xia, Rui, Feng Xu, Jianfei Yu, Yong Qi, and Erik Cambria. Polarity Shift Detection, Elimination and Ensemble: A Three-Stage Model for Document-Level Sentiment Analysis. Information Processing & Management, Emotion and Sentiment in Social and Expressive Media, 52, no. 1 (January 2016): 36–45.

Introduction

This article presents an approach for handling explicit polarity shifts due to (i) negation ("I don't like this movie") and (ii) contrast ("Fairly good, but not my style"), as well as implicit shifts due to sentiment inconsistency, which arises frequently when people express different opinions about different aspects of a product or event.

  1. Negation usually shifts the sentiment of the affected parts.
  2. Contrast can shift the polarity of neighboring sentences or subsentences. To account for such a shift, the impact of the shifted part should be decreased relative to the unshifted part.
  3. Sentiment inconsistency can be viewed as a type of implicit contrast, i.e. the impact of the inconsistent parts should be weakened.

Method

The authors deploy a three-stage model to address polarity shifts:

  1. Rules identify negation as well as forward and backward contrasts. A statistical method that draws upon the weighted log-likelihood ratio (WLLR) detects statistical inconsistencies in reviews (i.e. cases where one sentence sentiment is inconsistent with the overall document sentiment).
  2. Negation polarity shifts are eliminated by replacing negated terms with their antonyms. The antonyms are, again, determined based on the WLLR metric obtained for terms in the training documents. For example, the most positive term would be replaced with the most negative term according to WLLR (these substitutions correspond to the relative sentiment strength rather than to the actual meaning of the term). The relevance of a term $t_i$ for positive/negative sentiment is computed as follows:
     $$\text{relevance}(t_i, +) = p(t_i \mid +) \log \frac{p(t_i \mid +)}{p(t_i \mid -)}$$
     $$\text{relevance}(t_i, -) = p(t_i \mid -) \log \frac{p(t_i \mid -)}{p(t_i \mid +)}$$
  3. The polarity shift ensemble model is trained based on three components: (i) sentences for which negations have been eliminated, (ii) sentences containing contrast, and (iii) sentences with sentiment inconsistency as well as a base classifier for sentences without polarity shifts.
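
The WLLR relevance scores and the rank-based antonym pairing from stage 2 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `wllr_scores` and `antonym_dictionary`, the add-one smoothing (`alpha`), and the estimation of $p(t_i \mid \pm)$ from raw term counts are all assumptions made for the example.

```python
import math
from collections import Counter


def wllr_scores(pos_docs, neg_docs, alpha=1.0):
    """Return {term: (relevance_pos, relevance_neg)} for tokenized documents.

    Assumption: p(t | class) is estimated from term counts with additive
    smoothing (alpha); the summary above does not specify the estimator.
    """
    pos_counts = Counter(t for doc in pos_docs for t in doc)
    neg_counts = Counter(t for doc in neg_docs for t in doc)
    vocab = sorted(set(pos_counts) | set(neg_counts))
    pos_total = sum(pos_counts.values()) + alpha * len(vocab)
    neg_total = sum(neg_counts.values()) + alpha * len(vocab)

    scores = {}
    for t in vocab:
        p_pos = (pos_counts[t] + alpha) / pos_total
        p_neg = (neg_counts[t] + alpha) / neg_total
        scores[t] = (
            p_pos * math.log(p_pos / p_neg),  # relevance(t, +)
            p_neg * math.log(p_neg / p_pos),  # relevance(t, -)
        )
    return scores


def antonym_dictionary(scores):
    """Pair the k-th most positive term with the k-th most negative term.

    The pairing reflects relative sentiment strength, not lexical meaning.
    """
    pos_ranked = sorted(
        (t for t, (rp, _) in scores.items() if rp > 0),
        key=lambda t: -scores[t][0],
    )
    neg_ranked = sorted(
        (t for t, (_, rn) in scores.items() if rn > 0),
        key=lambda t: -scores[t][1],
    )
    pairs = {}
    for p, n in zip(pos_ranked, neg_ranked):
        pairs[p] = n
        pairs[n] = p
    return pairs


# Toy training data (hypothetical, for illustration only)
pos_docs = [["good", "good", "great", "movie"], ["good", "movie"]]
neg_docs = [["bad", "bad", "bad", "awful", "movie"], ["bad", "movie"]]

scores = wllr_scores(pos_docs, neg_docs)
pairs = antonym_dictionary(scores)
```

On this toy data the strongest positive term ("good") is paired with the strongest negative term ("bad"), which mirrors the rank-based substitution described above.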

Evaluation

The evaluation draws upon the Multi-Domain Sentiment Dataset by Blitzer, Dredze and Pereira (2007), which contains four datasets of 1000 positive and 1000 negative Amazon reviews each. The authors use linear SVM (LibSVM), logistic regression (LibLinear) and Naive Bayes (OpenPR-NB) to compare the following five methods:

  1. Baseline (bag of words without negation detection)
  2. Das (2001): negated words are marked with the suffix "-NOT" prior to training and classification.
  3. REV: sentiment words in the scope of negation are reversed to their antonyms.
  4. Li et al. (2010): text is separated into a polarity-shifted and polarity-unshifted fraction based on which two classifiers are trained.
  5. The presented approach

The evaluation demonstrates that the presented approach outperforms all other methods used in the experiments.
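
The two negation-handling baselines from the comparison, the "-NOT" suffix marking and the antonym reversal (REV), can be illustrated with a small sketch. Several details here are assumptions that the summary does not specify: the negator list `NEGATORS`, the rule that a negation scope ends at the next punctuation mark or "but" (`SCOPE_END`), and the choice to drop the negator in the REV variant. The `antonyms` mapping is a hypothetical hand-written dictionary.

```python
# Hypothetical trigger lists; real systems use larger, curated lists.
NEGATORS = {"not", "no", "never", "don't", "didn't", "isn't", "wasn't", "can't"}
SCOPE_END = {".", ",", ";", "!", "?", "but"}


def mark_negation(tokens, suffix="-NOT"):
    """Append a suffix to every token inside a negation scope."""
    out, in_scope = [], False
    for tok in tokens:
        low = tok.lower()
        if low in SCOPE_END:
            in_scope = False          # scope closes at punctuation or "but"
            out.append(tok)
        elif low in NEGATORS:
            in_scope = True           # scope opens after the negator
            out.append(tok)
        else:
            out.append(tok + suffix if in_scope else tok)
    return out


def reverse_negation(tokens, antonyms):
    """REV-style sketch: replace in-scope words that have an antonym.

    Assumption: the negator itself is dropped once its scope is reversed.
    """
    out, in_scope = [], False
    for tok in tokens:
        low = tok.lower()
        if low in SCOPE_END:
            in_scope = False
            out.append(tok)
        elif low in NEGATORS:
            in_scope = True           # negator is not copied to the output
        else:
            out.append(antonyms.get(low, tok) if in_scope else tok)
    return out


tokens = "I don't like this movie .".split()
marked = mark_negation(tokens)
reversed_tokens = reverse_negation(tokens, {"like": "dislike"})
```

With this input, `marked` is `['I', "don't", 'like-NOT', 'this-NOT', 'movie-NOT', '.']` and `reversed_tokens` is `['I', 'dislike', 'this', 'movie', '.']`. The suffix variant enlarges the feature vocabulary, whereas the reversal variant keeps the vocabulary unchanged but depends on the quality of the antonym dictionary.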