Online Spam Review Detection: A Survey of Literature

Guandong Xu

doi:10.1007/S44230-022-00001-3

Human-Centric Intelligent Systems

17 pages

Abstract

Increasingly developed online platforms, e.g., Yelp and Amazon, generate a large number of online reviews every moment. Consumers have gradually developed the habit of reading previous reviews before deciding to buy or choose among various products. Online reviews play a vital part in determining consumers’ purchase choices in e-commerce, yet many online reviews are intentionally created to confuse or mislead potential consumers. Moreover, driven by product reputations and merchants’ profits, more and more spam reviews are being inserted into online platforms. Such reviews can be positive, negative, or neutral, but they share common features: they mislead consumers or damage reputations. In the past decade, many researchers have studied spam review detection using statistical or deep learning methods on various datasets. In view of this, this article first introduces the task of online spam review detection and proposes a common definition of spam reviews. Then, we comprehen...

Figures (12)

Fig. 1 Distribution of focused research works. The review mining task accounts for 17% of the works, the end-to-end classification task for 22%, cold-start problems for the same proportion as the classification task (22%), and the spammer detection task for 39%
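As a quick arithmetic check on the Fig. 1 caption, the four reported shares (taking cold-start problems at the same 22% as classification, as the caption states) should together account for all surveyed works. The dictionary below simply transcribes the caption's numbers:

```python
# Shares of the surveyed research works per task, as given in the Fig. 1 caption (percent).
shares = {
    "review mining": 17,
    "end-to-end classification": 22,
    "cold-start problems": 22,  # "the same proportion as classification task"
    "spammer detection": 39,
}

total = sum(shares.values())
# List tasks from largest to smallest share, then the total.
for task, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task:<27} {pct:>3}%")
print(f"{'total':<27} {total:>3}%")
```

The shares sum to 100%, so the four categories partition the surveyed works, with spammer detection the largest single focus.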

Table 1 Description of features in CATS

Multicriteria Decision: Viviani et al. [20] used a multicriteria decision-making approach based both on the evaluation of multiple criteria and on the use of aggregation operators, with the aim of obtaining a veracity score associated with each review. Based on this score, spam reviews can be identified [20]. Specifically, they provide a definition of the aggregation operator F:
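One common family of operators for this kind of quantifier-guided veracity scoring is Yager's ordered weighted averaging (OWA) operator, whose weights are derived from a monotone linguistic quantifier Q via w_i = Q(i/n) - Q((i-1)/n). The sketch below assumes that form; it is not a reproduction of Viviani et al.'s exact operator F, and the quantifier Q(r) = r^2, the per-criterion scores, and the 0.5 threshold are all hypothetical:

```python
from typing import Callable, Sequence


def quantifier_weights(n: int, q: Callable[[float], float]) -> list[float]:
    """OWA weights induced by a monotone quantifier Q: w_i = Q(i/n) - Q((i-1)/n)."""
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted averaging: weights apply to the scores sorted in descending order."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per criterion score")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def most(r: float) -> float:
    """Hypothetical quantifier 'most': Q(r) = r**2 (larger exponents behave more like 'all')."""
    return r ** 2


# Hypothetical per-criterion veracity scores in [0, 1] for a single review.
criteria = [0.9, 0.4, 0.7]
w = quantifier_weights(len(criteria), most)
veracity = owa(criteria, w)
is_spam = veracity < 0.5  # hypothetical decision threshold
print(round(veracity, 4), is_spam)
```

Because Q(r) = r^2 puts more weight on the lower-ranked scores, a review must satisfy most criteria reasonably well to obtain a high veracity score.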

Fig. 2 Wang et al., model overview. In their model, product items form the head of the TransE network, reviewers form the translation (relation), and reviews form the tail
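The role assignment in Fig. 2 can be illustrated with the standard TransE scoring function, which treats a triple (head, relation, tail) as plausible when h + r is close to t. This is a minimal sketch with hypothetical 4-dimensional embeddings and the usual margin-based hinge loss. It omits training and whatever mechanism Wang et al. use to turn distances into spam decisions:

```python
import math


def transe_distance(head, relation, tail):
    """TransE dissimilarity ||h + r - t||_2; smaller means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def margin_loss(pos_dist, neg_dist, margin=1.0):
    """TransE-style hinge loss: a corrupted triple should be at least `margin` farther away."""
    return max(0.0, margin + pos_dist - neg_dist)


# Hypothetical 4-d embeddings, following the Fig. 2 role assignment:
# product item = head, reviewer = relation (translation), review = tail.
product = [0.2, -0.1, 0.4, 0.0]
reviewer = [0.1, 0.3, -0.2, 0.5]
observed_review = [0.31, 0.19, 0.2, 0.5]   # close to product + reviewer
corrupted_review = [-0.4, 0.6, 0.1, -0.3]  # a review that does not fit the pair

pos = transe_distance(product, reviewer, observed_review)
neg = transe_distance(product, reviewer, corrupted_review)
print(f"observed: {pos:.4f}  corrupted: {neg:.4f}  loss: {margin_loss(pos, neg):.4f}")
```

Here the observed triple already sits more than the margin closer than the corrupted one, so the hinge loss is zero. During training, only triples that violate this margin would produce gradient updates.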

Fig. 3 Li et al., model overview. The embedding network consists of four parts: user embedding layers, item embedding layers, review embedding networks, and rating embedding layers
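The caption does not specify how the four parts are combined. The following minimal sketch assumes simple lookup tables for users, items, and ratings, uses a mean-of-word-vectors stand-in for the review embedding network, and concatenates the four outputs into one feature vector. All table contents, names, and sizes are hypothetical:

```python
import random

DIM = 3
rng = random.Random(42)


def table(keys):
    """Hypothetical embedding table: one random DIM-vector per key (untrained)."""
    return {k: [rng.uniform(-1, 1) for _ in range(DIM)] for k in keys}


users = table(["u1", "u2"])
items = table(["i1", "i2"])
ratings = table([1, 2, 3, 4, 5])
words = table(["great", "food", "terrible", "service"])


def embed_review(text: str) -> list[float]:
    """Stand-in review encoder: mean of known word vectors (an assumption, not Li et al.'s network)."""
    vecs = [words[w] for w in text.lower().split() if w in words]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def represent(user, item, text, rating) -> list[float]:
    """Concatenate user, item, review, and rating embeddings for a downstream classifier."""
    return users[user] + items[item] + embed_review(text) + ratings[rating]


x = represent("u1", "i2", "Great food", 5)
print(len(x))  # four parts of DIM values each
```

Keeping the parts separate in this way means a new user or item with no history still yields a usable review and rating embedding, which is the concern that cold-start methods address.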

Fig. 4 Xie et al., model overview. Relationship of spammers, reviewers and customers

Fig. 5 Yang et al., WMUSVM algorithm overview. The model builds the hypersphere with the greatest volume that contains all deceptive review data, while all genuine review data lie outside this hypersphere
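The hypersphere decision rule in Fig. 5 can be illustrated geometrically. This is a simplified sketch, not WMUSVM itself: there are no instance weights, no multi-instance bags, no kernel, and no slack variables. It assumes the center is fixed at the centroid of the deceptive reviews and takes the largest radius that still keeps every genuine review outside. The 2-d feature vectors are hypothetical:

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def fit_sphere(deceptive, genuine):
    """Center the sphere on the deceptive centroid (an assumption) and grow the radius
    as far as possible while every genuine point stays outside."""
    center = centroid(deceptive)
    needed = max(math.dist(center, p) for p in deceptive)  # must enclose all deceptive points
    limit = min(math.dist(center, p) for p in genuine)     # nearest genuine point bounds the radius
    if needed >= limit:
        raise ValueError("classes are not separable by a sphere around this center")
    return center, limit


def is_deceptive(x, center, radius):
    """Points strictly inside the hypersphere are labelled deceptive."""
    return math.dist(x, center) < radius


# Hypothetical 2-d review feature vectors.
deceptive = [(0.9, 0.8), (1.1, 1.0), (1.0, 1.2)]
genuine = [(2.0, 1.0), (0.0, 0.0), (1.0, 2.5)]

center, radius = fit_sphere(deceptive, genuine)
print(is_deceptive((1.3, 1.1), center, radius), is_deceptive((2.2, 0.4), center, radius))
```

Taking the nearest genuine point as the radius matches the caption's "greatest volume" wording. A real SVM formulation would instead optimize the center and radius jointly and tolerate some violations.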

Fig. 6 Nayak et al., review indicator framework. They model review spamicity prediction as a two-class classification problem. The review spamicity indicator is built using stacked LSTM models, and the model outputs the individual probabilities of each review being genuine or fake
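The two-class stacked-LSTM setup described in Fig. 6 can be sketched in plain Python. The example uses two standard LSTM layers, where the second consumes the first layer's hidden states, followed by a linear layer and a softmax that yields P(genuine) and P(fake). Weights are random and untrained, and the token embeddings, sizes, and seed are hypothetical. The sketch shows the data flow only, not Nayak et al.'s trained model:

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LSTMLayer:
    """One LSTM layer with the four standard gates (input, forget, output, candidate)."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random):
        self.n_hidden = n_hidden
        # One (W, U, b) triple per gate, randomly initialised: the model is untrained.
        self.params = {
            gate: (
                [[rng.gauss(0, 0.3) for _ in range(n_in)] for _ in range(n_hidden)],
                [[rng.gauss(0, 0.3) for _ in range(n_hidden)] for _ in range(n_hidden)],
                [0.0] * n_hidden,
            )
            for gate in "ifog"
        }

    def _gate(self, name, x, h):
        w, u, b = self.params[name]
        return [a + c + d for a, c, d in zip(matvec(w, x), matvec(u, h), b)]

    def run(self, xs):
        """Process a sequence and return the hidden state at every time step."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in xs:
            i = [sigmoid(z) for z in self._gate("i", x, h)]
            f = [sigmoid(z) for z in self._gate("f", x, h)]
            o = [sigmoid(z) for z in self._gate("o", x, h)]
            g = [math.tanh(z) for z in self._gate("g", x, h)]
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
            outputs.append(h)
        return outputs


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(7)
emb_dim, hidden = 4, 5
layer1 = LSTMLayer(emb_dim, hidden, rng)
layer2 = LSTMLayer(hidden, hidden, rng)  # stacked: consumes layer 1's outputs
head = [[rng.gauss(0, 0.3) for _ in range(hidden)] for _ in range(2)]

# Hypothetical token embeddings for a 6-token review.
review = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(6)]
top = layer2.run(layer1.run(review))
p_genuine, p_fake = softmax(matvec(head, top[-1]))
print(f"P(genuine)={p_genuine:.3f}  P(fake)={p_fake:.3f}")
```

Only the final hidden state of the top layer feeds the classifier here. Using the last time step is one common choice; pooling over all time steps is another.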

Table 4 Op_spam_v1.4 dataset statistics

Table 5 Various features of different categories of products (e.g., industry-manufactured products such as electronics and computers)

Table 7 Statistics of the 500 restaurants in Shanghai

References (86)

Anderson M, Magruder J. Learning from the crowd: regression discontinuity estimates of the effects of an online review database. Econ J. 2012;122(563):957-89.
Luca M. Reviews, reputation, and revenue: the case of Yelp.com (March 15, 2016). Harvard Business School NOM Unit Working Paper no. 12-016; 2016.
Park C-H, Kim Y-G. Identifying key factors affecting consumer purchase behavior in an online shopping context. Int J Retail Distrib Manage. 2003.
Jindal N, Liu B. Opinion spam and analysis. In: Proceedings of the 2008 international conference on web search and data mining, 2008; pp. 219-30.
Wu Y, Ngai EW, Wu P, Wu C. Fake online reviews: literature review, synthesis, and directions for future research. Decis Support Syst. 2020;132:113280.
Li A, Qin Z, Liu R, Yang Y, Li D. Spam review detection with graph convolutional networks. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019; pp. 2703-11.
Lau RY, Liao S, Kwok RC-W, Xu K, Xia Y, Li Y. Text mining and probabilistic language modeling for online review spam detection. ACM Trans Manage Inf Syst (TMIS). 2012;2(4):1-30.
Ott M, Cardie C, Hancock J. Estimating the prevalence of deception in online review communities. In: Proceedings of the 21st international conference on World Wide Web, 2012; pp. 201-10.
López V, Del Río S, Benítez JM, Herrera F. Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data. Fuzzy Sets Syst. 2015;258:5-38.
Fei G, Mukherjee A, Liu B, Hsu M, Castellanos M, Ghosh R. Exploiting burstiness in reviews for review spammer detection. In: Proceedings of the International AAAI Conference on Web and Social Media, 2013; vol. 7, no. 1.
Mukherjee A, Liu B, Glance N. Spotting fake reviewer groups in consumer reviews. In: Proceedings of the 21st international conference on World Wide Web, 2012; pp. 191-200.
Wang C-C, Day M-Y, Chen C-C, Liou J-W. Detecting spamming reviews using long short-term memory recurrent neural network framework. In: Proceedings of the 2nd International Conference on E-commerce, E-Business and E-Government, 2018; pp. 16-20.
Weng H, Ji S, Duan F, Li Z, Chen J, He Q, Wang T. CATS: cross-platform e-commerce fraud detection. In: 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 2019; pp. 1874-85.
Rayana S, Akoglu L. Collective opinion spam detection: bridging review networks and metadata. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining, 2015; pp. 985-94.
Shehnepoor S, Salehi M, Farahbakhsh R, Crespi N. NetSpam: a network-based spam detection framework for reviews in online social media. IEEE Trans Inf Forensics Secur. 2017;12(7):1585-95.
Wang X, Liu K, Zhao J. Handling cold-start problem in review spam detection by jointly embedding texts and behaviors. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017; pp. 366-76.
Ren Y, Ji D. Learning to detect deceptive opinion spam: a survey. IEEE Access. 2019;7:42934-45.
Vidanagama DU, Silva TP, Karunananda AS. Deceptive consumer review detection: a survey. Artif Intell Rev. 2020;53(2):1323-52.
Lai C, Xu K, Lau RY, Li Y, Jing L. Toward a language modeling approach for consumer review spam detection. In: 2010 IEEE 7th International Conference on E-Business Engineering. IEEE, 2010; pp. 1-8.
Viviani M, Pasi G. Quantifier guided aggregation for the veracity assessment of online reviews. Int J Intell Syst. 2017;32(5):481-501.
Fontanarava J, Pasi G, Viviani M. An ensemble method for the credibility assessment of user-generated content. In: Proceedings of the International Conference on Web Intelligence, 2017; pp. 863-8.
Noekhah S, Fouladfar E, Salim N, Ghorashi SH, Hozhabri AA. A novel approach for opinion spam detection in e-commerce. In: Proceedings of the 8th IEEE international conference on E-commerce with focus on E-trust, 2014.
Yang X. One methodology for spam review detection based on review coherence metrics. In: Proceedings of 2015 International Conference on Intelligent Computing and Internet of Things. IEEE, 2015; pp. 99-102.
Li H, Liu B, Mukherjee A, Shao J. Spotting fake reviews using positive-unlabeled learning. Computación y Sistemas. 2014;18(3):467-75.
You Z, Qian T, Liu B. An attribute enhanced domain adaptive model for cold-start spam review detection. In: Proceedings of the 27th International Conference on Computational Linguistics, 2018; pp. 1884-95.
Li Q, Wu Q, Zhu C, Zhang J, Zhao W. An inferable representation learning for fraud review detection with cold-start problem. In: 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019; pp. 1-8.
Xie S, Wang G, Lin S, Yu PS. Review spam detection via temporal pattern discovery. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 2012; pp. 823-31.
Wang G, Xie S, Liu B, Yu PS. Review graph based online store review spammer detection. In: IEEE 11th international conference on data mining. IEEE, 2011; pp. 1242-7.
Wang G, Xie S, Liu B, Yu PS. Identify online store review spammers via social review graph. ACM Trans Intell Syst Technol (TIST). 2012;3(4):1-21.
Hussain N, Mirza HT, Hussain I, Iqbal F, Memon I. Spam review detection using the linguistic and spammer behavioral methods. IEEE Access. 2020;8:53801-16.
Aghakhani H, Machiry A, Nilizadeh S, Kruegel C, Vigna G. Detecting deceptive reviews using generative adversarial networks. In: IEEE Security and Privacy Workshops (SPW). IEEE, 2018; pp. 89-95.
Zheng P, Yuan S, Wu X, Li J, Lu A. One-class adversarial nets for fraud detection. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2019; vol. 33, no. 01, pp. 1286-93.
Chen C. Mining the web: discovering knowledge from hypertext data. J Am Soc Inf Sci. 2004;55(3):275.
Mukherjee A, Venkataraman V, Liu B, Glance N, et al. Fake review detection: classification and analysis of real and pseudo reviews. UIC-CS-03-2013. Technical Report, 2013.
Alom Z, Carminati B, Ferrari E. Detecting spam accounts on Twitter. In: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 2018; pp. 1191-8.
Swe MM, Myo NN. Fake accounts detection on Twitter using blacklist. In: 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS). IEEE, 2018; pp. 562-6.
Jia S, Zhang X, Wang X, Liu Y. Fake reviews detection based on LDA. In: 2018 4th International Conference on Information Management (ICIM). IEEE, 2018; pp. 280-3.
Aritsugi M, et al. Exploiting function words feature in classifying deceptive and truthful reviews. In: 2018 Thirteenth International Conference on Digital Information Management (ICDIM). IEEE, 2018; pp. 51-6.
Mesnil G, Mikolov T, Ranzato M, Bengio Y. Ensemble of generative and discriminative techniques for sentiment analysis of movie reviews. arXiv preprint; 2014. arXiv:1412.5335.
Yang X, Yu X. Recognizing deceptive reviews based on weighted multi-instance unbalanced support vector machine. In: Proceedings of the 2019 International Conference on Artificial Intelligence and Computer Science, 2019; pp. 705-8.
Kennedy S, Walsh N, Sloka K, McCarren A, Foster J. Fact or factitious? Contextualized opinion spam detection. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, 2019.
Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. 2018.
Nilizadeh S, Aghakhani H, Gustafson E, Kruegel C, Vigna G. Think outside the dataset: finding fraudulent reviews using cross-dataset analysis. In: The World Wide Web Conference, 2019; pp. 3108-15.
Tingxuan S, Lau RYK. Collective classification for social opinion spam detection. In: Proceedings of the 2019 2nd international conference on data science and information technology, 2019; pp. 181-6.
Sihombing A, Fong ACM. Fake review detection on Yelp dataset using classification techniques in machine learning. In: 2019 International conference on contemporary computing and informatics (IC3I). IEEE, 2019; pp. 64-8.
Ott M, Choi Y, Cardie C, Hancock JT. Finding deceptive opinion spam by any stretch of the imagination. arXiv preprint; 2011. arXiv:1107.4557.
Barushka A, Hajek P. The effect of text preprocessing strategies on detecting fake consumer reviews. In: Proceedings of the 2019 3rd international conference on e-business and internet, 2019; pp. 13-7.
Hassan R, Islam MR. Detection of fake online reviews using semi-supervised and supervised learning. In: 2019 International conference on electrical, computer and communication engineering (ECCE). IEEE, 2019; pp. 1-5.
Prakash P, Shashank N, Arjun M, Yadav PS, Shreyamsa S, Prazwal N. Fake review prevention using classification and authentication techniques. In: ICT Systems and Sustainability. Springer, 2020; pp. 397-406.
Caruana R, Niculescu-Mizil A. An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd international conference on machine learning, 2006; pp. 161-8.
Hastie T, Tibshirani R, Friedman J. The elements of statistical learning. 2001. J Roy Stat Soc. 2004;167(1):192-192.
Li H, Chen Z, Liu B, Wei X, Shao J. Spotting fake reviews via collective positive-unlabeled learning. In: 2014 IEEE International Conference on Data Mining. IEEE, 2014; pp. 899-904.
Ren Y, Ji D, Zhang H. Positive unlabeled learning for deceptive reviews detection. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014; pp. 488-98.
Hai Z, Zhao P, Cheng P, Yang P, Li X-L, Li G. Deceptive review spam detection via exploiting task relatedness and unlabeled data. In: Proceedings of the 2016 conference on empirical methods in natural language processing, 2016; pp. 1817-26.
Wu Z, Cao J, Wang Y, Wang Y, Zhang L, Wu J. hpsd: a hybrid pu-learning-based spammer detection model for product reviews. IEEE Trans Cybernet. 2018;50(4):1595-606.
Yilmaz CM, Durahim AO. SPR2EP: a semi-supervised spam review detection framework. In: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). IEEE, 2018; pp. 306-13.
Liu W, Jing W, Li Y. Incorporating feature representation into BiLSTM for deceptive review detection. Computing. 2020;102(3):701-15.
Barushka A, Hajek P. Review spam detection using word embeddings and deep neural networks. In: IFIP International conference on artificial intelligence applications and innovations. Springer, 2019; pp. 340-50.
Archchitha K, Charles E. Opinion spam detection in online reviews using neural networks. In: 2019 19th International Conference on Advances in ICT for Emerging Regions (ICTer), vol. 250. IEEE, 2019; pp. 1-6.
Yuan C, Zhou W, Ma Q, Lv S, Han J, Hu S. Learning review representations from user and product level information for spam detection. In: 2019 IEEE international conference on data mining (ICDM). IEEE, 2019; pp. 1444-9.
Wang Z, Zhang J, Feng J, Chen Z. Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the AAAI conference on artificial intelligence, 2014; vol. 28, no. 1.
Nayak A, Chen H, Ruan X, Ouyang J. DeepSpot: understanding online opinion spam by text augmentation using sentiment encoder-decoder networks. In: Proceedings of the 3rd ACM SIGSPATIAL international workshop on analytics for local events and news, 2019; pp. 1-10.
Ren Y, Zhang Y. Deceptive opinion spam detection using neural network. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: technical papers, 2016; pp. 140-50.
Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M. Graph neural networks: a review of methods and applications. AI Open. 2020;1:57-81.
Kindermann R. Markov random fields and their applications. Am Math Soc. 1980.
Sun H, Morales A, Yan X. Synthetic review spamming and defense. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 2013; pp. 1088-96.
Weng H, Li Z, Ji S, Chu C, Lu H, Du T, He Q. Online e-commerce fraud: a large-scale detection and analysis. In: 2018 IEEE 34th international conference on data engineering (ICDE). IEEE, 2018; pp. 1435-40.
Xue H, Wang Q, Luo B, Seo H, Li F. Content-aware trust propagation toward online review spam detection. J Data Inf Quality (JDIQ). 2019;11(3):1-31.
Yuan D, Miao Y, Gong NZ, Yang Z, Li Q, Song D, Wang Q, Liang X. Detecting fake accounts in online social networks at the time of registrations. In: Proceedings of the 2019 ACM SIGSAC conference on computer and communications security, 2019; pp. 1423-38.
Wang D, Lin J, Cui P, Jia Q, Wang Z, Fang Y, Yu Q, Zhou J, Yang S, Qi Y. A semi-supervised graph attentive network for financial fraud detection. In: 2019 IEEE international conference on data mining (ICDM). IEEE, 2019; pp. 598-607.
Liu Z, Chen C, Yang X, Zhou J, Li X, Song L. Heterogeneous graph neural networks for malicious account detection. In: Proceedings of the 27th ACM international conference on information and knowledge management, 2018; pp. 2077-85.
Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 2014; pp. 701-10.
Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint; 2013. arXiv:1301.3781.
Ali Alhosseini S, Bin Tareaf R, Najafi P, Meinel C. Detect me if you can: spam bot detection using inductive representation learning. In: Companion proceedings of The 2019 World Wide Web conference, 2019; pp. 148-53.
Hamilton WL, Ying R, Leskovec J. Inductive representation learning on large graphs. arXiv preprint; 2017. arXiv:1706.02216.
Pearl J. Probabilistic reasoning in intelligent systems: networks of plausible inference. Elsevier, 2014.
Wang J, Wen R, Wu C, Huang Y, Xiong J. FdGars: fraudster detection via graph convolutional networks in online app review system. In: Companion proceedings of The 2019 World Wide Web conference, 2019; pp. 310-6.
Ghadery E, Movahedi S, Faili H, Shakery A. MNCN: a multilingual ngram-based convolutional network for aspect category detection in online reviews. In: Proceedings of the AAAI conference on artificial intelligence, 2019; vol. 33, no. 01, pp. 6441-8.
Dong W, Moses C, Li K. Efficient k-nearest neighbor graph construction for generic similarity measures. In: Proceedings of the 20th international conference on World wide web, 2011; pp. 577-86.
Rakhlin A. Convolutional neural networks for sentence classification. GitHub, 2016.
Ott M, Cardie C, Hancock JT. Negative deceptive opinion spam. In: Proceedings of the 2013 conference of the north American chapter of the association for computational linguistics: human language technologies, 2013; pp. 497-501.
He R, McAuley J. Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering. In: Proceedings of the 25th international conference on world wide web, 2016; pp. 507-17.
McAuley J, Targett C, Shi Q, Van Den Hengel A. Image-based recommendations on styles and substitutes. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval, 2015; pp. 43-52.
Jindal N, Liu B. Opinion spam and analysis. In: WSDM '08: Proceedings of the 2008 international conference on web search and data mining, 2008; pp. 219-29.
Learning to identify review spam. In: IJCAI International Joint Conference on Artificial Intelligence, 2011; pp. 2488-93.
Mukherjee A, Venkataraman V, Liu B, Glance N. What Yelp fake review filter might be doing? In: Proceedings of the international AAAI conference on web and social media, 2013; vol. 7, no. 1.
