The viability of web-derived polarity lexicons


by Velikovich et al. (Google research)

This paper describes an approach for semi-automatically generating a sentiment lexicon from seed terms and a Web corpus. The authors build a graph of phrases and use a graph propagation algorithm to determine how positive and negative polarity spreads from the seed terms through this graph. The graph has the following components:

node set V: contains the candidate phrases (n-grams of up to ten tokens)
set of edges E: connects two candidate phrases based on the cosine similarity of their context vectors; a phrase's context vector is built by aggregating the terms that appear within a six-word window over all mentions of the phrase in the Web corpus
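The edge construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the whitespace tokenization, the reading of "six word window" as six tokens on each side of a mention, and the `threshold` used to drop weak edges are all hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vector(phrase, sentences, window=6):
    """Aggregate the tokens seen within `window` tokens of every mention of `phrase`.

    Assumption: the window covers `window` tokens on each side of the mention.
    """
    target = phrase.lower().split()
    n = len(target)
    vec = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == target:
                left = tokens[max(0, i - window):i]
                right = tokens[i + n:i + n + window]
                vec.update(left + right)
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_edges(phrases, sentences, threshold=0.1):
    """Return {(p, q): weight} for phrase pairs whose similarity reaches `threshold`.

    `threshold` is a hypothetical pruning parameter for this sketch.
    """
    vectors = {p: context_vector(p, sentences) for p in phrases}
    edges = {}
    for i, p in enumerate(phrases):
        for q in phrases[i + 1:]:
            weight = cosine(vectors[p], vectors[q])
            if weight >= threshold:
                edges[(p, q)] = weight
    return edges


corpus = [
    "the food was really good today",
    "the food was really great today",
    "the car is blue",
]
print(build_edges(["good", "great", "blue"], corpus, threshold=0.5))
```

In this toy corpus "good" and "great" occur in identical contexts, so they end up connected, while "blue" shares too little context with either to pass the threshold.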

The authors discuss why their propagation method is expected to outperform label propagation, which works well with high-quality data but not with the noisy and untrustworthy graphs constructed from the Web.

Evaluation Measure

Finally, the authors present an evaluation which makes use of the following purity measure:

\[ purity(X) = \sum_{ x \in X } pol_x / ( \delta + \sum_{ x \in X } |pol_x| ) \]
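To make the formula concrete, here is a minimal Python sketch of the purity computation. The function name, the list-of-floats input, and the default value of `delta` are assumptions for illustration; the summary does not state which value of \delta the authors use.

```python
def purity(polarities, delta=1.0):
    """Purity of a sentence given the polarity scores of its sentiment phrases.

    `delta` must be positive; its default here is an arbitrary choice for the sketch.
    """
    total = sum(polarities)
    magnitude = sum(abs(p) for p in polarities)
    return total / (delta + magnitude)


print(purity([1.0]))             # 0.5
print(purity([1.0, 1.0, 1.0]))   # 0.75
print(purity([1.0, -1.0]))       # 0.0
```

With a positive `delta`, a single positive phrase scores lower than three positive phrases, and a sentence with mixed polarity is pulled toward zero.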

Due to the parameter \delta, this measure assigns scores of larger magnitude to sentences that contain multiple sentiment phrases. It normalizes the polarity score to the range [-1, 1] and yields values closest to the extremes for sentences containing only positive or only negative terms.

Conclusion

The evaluations show that the proposed method outperforms traditional lexicons because the generated dictionaries contain a wider range of phrases such as

spelling variations
slang
vulgarity, and
multi-word expressions

which have not been available to previous systems.

Albert Weichselbraun
