MARKET | Jun 6, 2017

Data analysis counters false reviews

A San Jose State University study highlights the role of data analysis in detecting false user reviews of companies and products.

The data available on customer behaviour and purchase choices is highly valuable for all companies, but according to a recent study one of the most interested sectors is food, which sees such data as an enabling factor capable of overturning established practice in the critical areas of food security, product development and supply chain management. An integrated reading of data across marketing, sales and service becomes crucial for building a customer data profile based on complete and accurate information, which can and must be gathered through new forms of business-to-business interactions.

Social networks and consumer interactions dominate the food industry, where reviews offer an in-depth view of behaviours and purchase choices that can help businesses improve the products and services they offer.

But how reliable are online reviews?

Several studies have estimated that almost a third of all consumer reviews are false. As a result, ratings are distorted and the reputations of accommodation facilities or food businesses are boosted or destroyed artificially and fraudulently, without any form of control or monitoring. The economic benefits of false reviews have even given rise to a market of users who are handsomely paid to falsify reviews, to promote one business over another, or to curb sales of competing companies’ products. This behaviour can be collectively described as opinion fraud, or opinion spam, which can distort business and purchasing activities.

Astroturfing is a marketing practice in which people comment, tweet and write reviews, often from several different accounts, with the aim of producing a false consensus. While in European Union countries this has already been classified as misleading advertising and unfair competition, and acknowledged as such in Italy, it is still legal practice in the United States. However, detecting a false review is not easy, as the only reference for each judgement expressed is its text, and sometimes the author, the date of publication and the rating are anonymous. In the past, the problem of detecting false reviews was addressed by examining the information within the text and analysing the behaviour of users considered to be fake.

A 2016 study conducted by the University of Texas at San Antonio tried to find a way to detect this practice by statistically analysing various written samples by means of a binary “n-gram” system that had already proven effective in identifying the authors of certain texts. The system reads text as a sequence of symbols: letters, punctuation marks and spaces between words are analysed, without regard to the meaning of the text or to grammatical rules. By counting how many times each n-gram appears in a given text, one can construct an indicator that profiles a given author, since it is precisely the numeric difference in n-gram counts that captures the stylistic differences between writers. Applying this system to social media, the researchers showed statistically that certain authors operating on different platforms were in fact the same person. Having sampled the styles of some of the most “prolific” commentators on numerous news websites, they discovered that many of these accounts could be traced back to a small number of individual users who had created multiple accounts.
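The counting step described above can be illustrated with a minimal Python sketch. It is not the method used in the study: the n-gram length of three, the function names and the choice of cosine similarity as a comparison measure are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Count every overlapping run of n characters.

    Letters, punctuation and spaces are all treated as plain symbols;
    n=3 is an illustrative choice, not a value taken from the study.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(profile_a, profile_b):
    """Compare two n-gram count profiles (assumed measure, for illustration).

    Returns a value between 0.0 (no n-grams in common) and 1.0
    (counts in identical proportions).
    """
    dot = sum(count * profile_b[gram] for gram, count in profile_a.items())
    norm_a = sqrt(sum(c * c for c in profile_a.values()))
    norm_b = sqrt(sum(c * c for c in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two short samples; real profiling would need far more text per author.
sample_a = char_ngrams("Great food, great service, will come back!")
sample_b = char_ngrams("Great place, great staff, will return soon!")
similarity = cosine_similarity(sample_a, sample_b)
```

A higher similarity between two accounts' profiles would suggest, but not prove, a shared author; any threshold for deciding that would have to be calibrated on known samples.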

The role of neural networks

San Jose State University researchers conducted a study analysing the case of the Yelp platform, one of the leading consumer review websites, which alone contains more than 100 million business reviews, with a market of approximately four billion dollars. Researchers at the Department of Computer Engineering studied the platform's users to establish how far their behaviour helps to classify them as potential fakers and spammers. Behavioural characteristics of social media interaction were analysed and compared between two groups of users: one whose reviews had been filtered as non-spam and another whose reviews had been flagged as spam, with the aim of improving the classification results of the algorithm the site uses to filter false and suspect reviews. All of these behavioural features were analysed using a neural network to estimate the genuineness of the user and to predict their social media activity. The study shows that a neural network can be considered an effective tool in these cases.

The ANN-based model

In the field of machine learning, an Artificial Neural Network (ANN) is a mathematical model made up of interconnected artificial neurons that process information using a connectionist approach to computation. An artificial neural network is often an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. Neural networks are non-linear statistical data modelling tools that can be used to model complex relationships between inputs and outputs that other analytical functions are not able to represent.

An artificial neural network receives external signals on a layer of nodes (processing units), each of which is connected to multiple internal nodes organised in multiple layers. Each node processes the signals it receives and transmits the result to the subsequent nodes. Neural networks, widely used for image processing applications such as facial recognition or autonomous driving, succeed in imitating the procedure by which our brain learns new skills and consolidates them for future use, applying them when necessary.
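The way signals pass from layer to layer can be sketched in a few lines of Python. This is a generic feed-forward pass, not the network used in the study: the sigmoid activation, the layer sizes and the weight values are assumptions chosen only to make the example runnable.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Each node sums its weighted inputs, adds its bias and applies the activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Feed the signal through each layer in turn; layers is a list of (weights, biases)."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal

# Illustrative network: 3 inputs -> 2 hidden nodes -> 1 output node.
# The numbers are arbitrary; in practice they are learned during training.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.2, -0.7]], [0.05]),
]
output = network_forward([1.0, 0.5, -1.0], layers)
```

The learning phase mentioned above consists of adjusting the weights and biases so that the output moves closer to known correct answers; that step is left out of this sketch.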

The recent model developed for the detection of spam or false reviews on Yelp uses features of social interaction behaviour, detecting fraudulent activity by analysing, among other things, the number of friends and followers and the number of photos, compliments and votes received. According to the researchers, the algorithm has proved suitable for detecting the problems linked to spam and fake reviews. This applies to Yelp, however: using the same methodology on other platforms would not necessarily lead to the same positive result.
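As a rough illustration of how such behavioural signals might be prepared for a neural network, the Python sketch below turns a user record into a numeric vector. The field names, the record layout and the logarithmic scaling are hypothetical; the article names the kinds of signal considered, not how the researchers encoded them.

```python
import math

# Hypothetical field names, chosen to mirror the signals listed in the article.
FEATURES = ("friends", "followers", "photos", "compliments", "votes")

def social_feature_vector(profile):
    """Turn raw activity counts into a fixed-order numeric vector.

    log1p compresses large counts so that a handful of very active users
    do not dominate the scale (an assumed preprocessing step); fields
    missing from the record are treated as zero.
    """
    return [math.log1p(profile.get(name, 0)) for name in FEATURES]

# Made-up example record, not real Yelp data.
example_user = {"friends": 120, "followers": 4, "photos": 35, "votes": 310}
vector = social_feature_vector(example_user)
```

A vector of this kind would be the input layer of the network, with the spam or non-spam label of each user's reviews serving as the target during training.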

Emma Pietrafesa