Distributional Footprints of Deceptive Product Reviews
by Song Feng, Longfei Xing, Anupam Gogar, and Yejin Choi (2012)
The authors of this paper argue that there are natural distributions of opinions in reviews for any given domain that can be used for identifying deceptive product reviews.
Motivation
- There is anecdotal evidence about fake reviews and a small number of cases where it was possible to identify deceptive reviewers.
- Ott et al. (2011) showed that humans perform only slightly better than chance when asked to identify deceptive reviews.
- Fake reviews may contain lexical clues such as an overuse of self-references (I, me, my) and a lack of spatial information.
- Automatic classifiers achieve up to 90% accuracy in identifying such reviews, provided they are trained with lexico-syntactic patterns or appropriate training data.
Distributions
- Analyses of TripAdvisor and Amazon reviews show that reviews from single-time reviewers are much more likely to express extreme opinions (5- or 1-star ratings) than reviews from multi-time reviewers (see the sketch below).
- Comparing the rating distributions of hotels with average ratings between 3.2 and 3.9 indicates that hotels rated around 3.9 are supported by an unnaturally high number of 5-star reviews from single-time reviewers (compare: Gold Adler ;).
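A minimal sketch of how such per-group distributions could be tallied, assuming each review is a (reviewer_id, entity_id, stars) tuple; the field layout and sample records are made up for illustration, not taken from the paper:

```python
from collections import Counter

# Made-up review records: (reviewer_id, entity_id, stars)
reviews = [
    ("u1", "hotel_a", 5), ("u2", "hotel_a", 5), ("u3", "hotel_a", 1),
    ("u4", "hotel_a", 4), ("u4", "hotel_b", 3), ("u4", "hotel_c", 5),
]

def rating_distribution_by_group(reviews):
    """Normalized star-rating distribution for single-time vs. multi-time
    reviewers; single-time reviewers tend to cluster at 1 and 5 stars."""
    reviews_per_reviewer = Counter(r[0] for r in reviews)
    single, multi = Counter(), Counter()
    for reviewer, _, stars in reviews:
        target = single if reviews_per_reviewer[reviewer] == 1 else multi
        target[stars] += 1

    def normalize(counts):
        total = sum(counts.values())
        return {stars: n / total for stars, n in sorted(counts.items())}

    return normalize(single), normalize(multi)

print(rating_distribution_by_group(reviews))
```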
Method
- The authors compare the "distribution of distributions", i.e. the ordered sequence of how frequently each star rating occurs among an entity's reviews. For instance, the sequence (5 > 1 > 2 > 4) means that most reviews were 5-star reviews, followed by 1-, 2-, and 4-star reviews. The authors exclude neutral (3-star) reviews from their analysis. A small sketch of this ordering follows after this list.
- They then identify suspicious patterns such as (5 > 1 > 2 > 4) and observe that these patterns occur more frequently for hotels that benefit strongly from reviews authored by single-time reviewers.
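A small sketch of this rank-sequence computation, assuming a flat list of star ratings for a single entity; the tie-breaking rule (lower star level first) is my own choice and not specified in the paper:

```python
from collections import Counter

def rank_sequence(star_ratings, ignore=(3,)):
    """Order the star levels by frequency, most common first, skipping
    neutral ratings: (5, 1, 2, 4) means 5-star reviews dominate, followed
    by 1-, 2-, and 4-star reviews."""
    counts = Counter(s for s in star_ratings if s not in ignore)
    return tuple(sorted(counts, key=lambda s: (-counts[s], s)))

# A hypothetical hotel matching the suspicious 5 > 1 > 2 > 4 pattern
ratings = [5] * 40 + [1] * 25 + [2] * 10 + [4] * 5 + [3] * 8
print(rank_sequence(ratings))  # -> (5, 1, 2, 4)
```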
Identifying trustworthy and fake reviews
- The authors argue that fake reviewers can be identified by comparing reviews over time, since there must have been a point in time before the entity became engaged in review solicitation.
- They collect truthful reviewers based on the following criteria:
- such reviewers have a long reviewing history and have written more than 10 reviews,
- they do not post reviews in short time intervals, and
- do not deviate too much from the products' average ratings.
- The following metrics are used for identifying deceptive reviews (sketched together with the criteria above after this list):
- the discrepancy between the average rating by truthful reviewers and the average rating by single-time reviewers
- the ratios of the number of strongly positive reviews to the number of strongly negative reviews among different groups of reviewers (e.g. single-time versus multi-time or truthful)
- sudden bursts of very positive or negative reviews
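A minimal sketch of how the truthful-reviewer criteria and two of the metrics above could be wired together, assuming reviews arrive as (reviewer_id, entity_id, timestamp, stars) tuples; the thresholds (review count, posting interval, rating deviation) are illustrative guesses rather than the paper's exact parameters, and burst detection over time is omitted for brevity:

```python
from collections import defaultdict
from datetime import timedelta

def truthful_reviewers(reviews, min_reviews=10,
                       min_gap=timedelta(days=1), max_deviation=2.0):
    """Collect reviewers matching the criteria above: a long reviewing
    history, no posts in very short intervals, and ratings that stay
    close to each product's average."""
    by_reviewer = defaultdict(list)
    stars_by_entity = defaultdict(list)
    for reviewer, entity, ts, stars in reviews:
        by_reviewer[reviewer].append((ts, entity, stars))
        stars_by_entity[entity].append(stars)
    avg_by_entity = {e: sum(s) / len(s) for e, s in stars_by_entity.items()}

    truthful = set()
    for reviewer, history in by_reviewer.items():
        if len(history) <= min_reviews:
            continue
        times = sorted(ts for ts, _, _ in history)
        if any(b - a < min_gap for a, b in zip(times, times[1:])):
            continue  # posts in suspiciously short intervals
        if all(abs(stars - avg_by_entity[entity]) <= max_deviation
               for _, entity, stars in history):
            truthful.add(reviewer)
    return truthful

def deception_signals(reviews, entity, truthful):
    """Two of the signals listed above for one entity: the discrepancy
    between single-time and truthful average ratings, and the ratio of
    strongly positive (5-star) to strongly negative (1-star) reviews
    among single-time reviewers."""
    n_per_reviewer = defaultdict(int)
    for reviewer, _, _, _ in reviews:
        n_per_reviewer[reviewer] += 1
    single = [s for r, e, _, s in reviews if e == entity and n_per_reviewer[r] == 1]
    trusted = [s for r, e, _, s in reviews if e == entity and r in truthful]
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    discrepancy = avg(single) - avg(trusted)
    pos, neg = sum(s == 5 for s in single), sum(s == 1 for s in single)
    return discrepancy, (pos / neg if neg else float("inf"))
```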
Thoughts
- Considering neutral (3-star) reviews might be beneficial for assessing controversial topics.
- It would be interesting to investigate whether writer re-identification could help identify reviews produced by public-relations companies.