Distributional Footprints of Deceptive Product Reviews

2 minute read

by Song Feng, Longfei Xing, Anupam Gogar and Yejin Choi, 2012

The authors of this paper argue that reviews in any given domain follow natural distributions of opinions, and that deviations from these distributions can be used to identify deceptive product reviews.

Motivation

  • There is anecdotal evidence about fake reviews and a small number of cases where it was possible to identify deceptive reviewers.
  • Ott et al. (2011) showed that humans perform only slightly better than chance when asked to identify deceptive reviews.
  • Fake reviews may contain lexical clues such as an overuse of self-references (I, me, my) and a lack of spatial information.
  • Automatic classifiers achieve an accuracy of up to 90% in identifying such reviews, provided that they have been trained on lexico-syntactic patterns or appropriate training data.

Distributions

  • Analyses of TripAdvisor and Amazon reviews show that reviews from one-time reviewers are much more likely to express extreme opinions (5- or 1-star ratings) than reviews from multi-time reviewers (see the sketch below).
  • Comparing the distributions of hotel reviews with average ratings between [3.2, 3.9] indicates that hotels with an average rating of 3.9 are supported by an unnaturally high number of 5-star reviews from single-time reviewers (compare: Gold Adler ;).
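
The comparison above could look roughly like the following minimal Python sketch; the review tuples, reviewer names and the "extreme share" helper are purely illustrative assumptions, not data or code from the paper.

```python
from collections import Counter

# Hypothetical data: each review is a (reviewer_id, star_rating) pair.
reviews = [
    ("alice", 5), ("bob", 4), ("bob", 3), ("carol", 1),
    ("dave", 5), ("dave", 2), ("dave", 4),
]

counts = Counter(reviewer for reviewer, _ in reviews)

def extreme_share(group):
    """Fraction of 1- or 5-star ratings among the reviews of a group of reviewers."""
    ratings = [stars for reviewer, stars in reviews if reviewer in group]
    return sum(stars in (1, 5) for stars in ratings) / len(ratings)

single_time = {r for r, c in counts.items() if c == 1}
multi_time = {r for r, c in counts.items() if c > 1}

print("single-time extreme share:", extreme_share(single_time))
print("multi-time extreme share:", extreme_share(multi_time))
```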

Method

  • The authors compare the "distribution of distributions", i.e. the sequence of star ratings ordered by how frequently they occur among an entity's reviews. For instance, the sequence (5 > 1 > 2 > 4) means that most reviews were 5-star reviews, followed by 1-, 2- and 4-star reviews. The authors neglect neutral (3-star) reviews in their analysis (see the sketch below).
  • Afterwards, they identify suspicious patterns, for instance (5 > 1 > 2 > 4), and observe that such patterns occur more frequently for hotels that benefit greatly from reviews authored by single-time reviewers.
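
A minimal sketch of how such a per-entity pattern could be computed, assuming the ratings are available as a plain list of star values; the function name and example data are hypothetical, not from the paper.

```python
from collections import Counter

def rating_signature(star_ratings):
    """Order the non-neutral star values (1, 2, 4, 5) by how often they occur,
    most frequent first -- the per-entity pattern described above."""
    counts = Counter(s for s in star_ratings if s != 3)
    # Counter returns 0 for missing values, so every signature covers all four values.
    return tuple(sorted((1, 2, 4, 5), key=lambda s: -counts[s]))

# Hypothetical example: mostly 5-star reviews, then 1-, 2- and 4-star reviews.
print(rating_signature([5, 5, 5, 5, 1, 1, 1, 2, 2, 4, 3, 3]))  # (5, 1, 2, 4)
```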

Identifying trustworthy and fake reviews

  • The authors argue that fake reviewers can be identified by comparing reviews over time, since there must have been a point in time before the entity became engaged in review solicitation.
  • They collect truthful reviewers based on the following criteria:
    1. such reviewers have a long reviewing history and have written more than 10 reviews,
    2. they do not post reviews in short time intervals, and
    3. do not deviate too much from the product's average rating.

  • The following metrics have been used for identifying deceptive reviews (the criteria and metrics are sketched in code below):
    1. the discrepancy between the average rating by truthful reviewers and the average rating by single-time reviewers
    2. the ratios of the number of strongly positive reviews to the number of strongly negative reviews among different groups of reviewers (e.g. single-time versus multi-time or truthful)
    3. sudden bursts of very positive or negative reviews
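
A rough sketch of how the truthful-reviewer criteria and the first two metrics could be implemented; all thresholds, field layouts and function names are illustrative assumptions rather than the paper's exact definitions.

```python
from statistics import mean

def is_truthful(history, min_reviews=10, min_gap_days=1, max_deviation=2.0):
    """history: list of (day, stars, product_avg) tuples for one reviewer.
    Thresholds are illustrative, not the paper's values."""
    if len(history) <= min_reviews:                                 # criterion 1: long history
        return False
    days = sorted(day for day, _, _ in history)
    if any(b - a < min_gap_days for a, b in zip(days, days[1:])):   # criterion 2: no bursts
        return False
    return all(abs(stars - avg) <= max_deviation                    # criterion 3: close to product average
               for _, stars, avg in history)

def rating_discrepancy(truthful_stars, single_time_stars):
    """Metric 1: gap between the truthful and the single-time average rating."""
    return abs(mean(truthful_stars) - mean(single_time_stars))

def pos_neg_ratio(stars):
    """Metric 2: strongly positive (5-star) vs. strongly negative (1-star) reviews in a group."""
    positives = sum(s == 5 for s in stars)
    negatives = sum(s == 1 for s in stars)
    return positives / max(negatives, 1)

# Hypothetical usage:
truthful = [4, 4, 5, 3]
single_time = [5, 5, 5, 5, 1]
print(rating_discrepancy(truthful, single_time))  # gap between group averages
print(pos_neg_ratio(single_time))                 # 4 strongly positive vs. 1 strongly negative
```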

Thoughts

  • Considering neutral (3-star) reviews might be beneficial for assessing controversial topics.
  • It would be interesting to investigate whether, and how, writer re-identification can help identify reviews produced by public relations companies.