Detection of Fake News on COVID-19 on Web Search Engines

2021, Frontiers in Physics

https://doi.org/10.3389/FPHY.2021.685730

Abstract

In early January 2020, after China reported the first cases of the new coronavirus (SARS-CoV-2) in the city of Wuhan, unreliable and not fully accurate information started spreading faster than the virus itself. Alongside this pandemic, people have experienced a parallel infodemic, i.e., an overabundance of information, some of it misleading or even harmful, that has spread widely around the globe. Although social media are increasingly used as an information source, web search engines, such as Google or Yahoo!, still represent a powerful and trustworthy resource for finding information on the Web. This is due to their capability to capture the largest amount of information, helping users quickly identify the most relevant and useful, although not always the most reliable, results for their search queries. This study aims to detect potentially misleading and fake content by capturing and analysing the textual information that flows through search engines. By using a real-...
