SMS Spam Detection Using Machine Learning: An Experimental Study

2025, International Journal of Emerging Trends in Engineering Research

https://doi.org/10.30534/IJETER/2025/011372025

Abstract

The exponential growth of mobile communication has intensified the threat of SMS spam, compromising user security and trust in messaging platforms. This study addresses the problem by designing and deploying a spam detection system based on machine learning. We analyze a publicly available SMS dataset through pre-processing (text normalization, tokenization, and feature engineering) followed by TF-IDF vectorization. A comparative evaluation of 11 classifiers, spanning probabilistic models, ensemble methods, and linear classifiers, shows that ensemble techniques outperform traditional algorithms. The Extra Trees Classifier and XGBoost achieve the best results among the models compared, with 97.9% accuracy and 97.5% precision, demonstrating their efficacy in distinguishing spam from legitimate messages. To bridge the gap between research and practical application, we develop an interactive Streamlit web application that classifies messages in real time through a user-friendly interface. This work underscores the potential of ensemble learning for text classification tasks and provides a scalable framework for combating SMS spam in real-world scenarios.
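To make the TF-IDF vectorization step concrete, the sketch below computes term weights for a few tokenized messages using only the Python standard library. It is an illustration under stated assumptions, not the study's code: the function name `tfidf`, the toy messages, and the unsmoothed textbook formula (tf = count / document length, idf = log(N / df)) are all choices made here. Library implementations typically use smoothed and normalized variants, so their values will differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed textbook variant: tf = count / doc length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up messages (not taken from the study's dataset).
docs = [
    "win a free prize now".split(),
    "are you free for lunch".split(),
    "call now to claim your prize".split(),
]
vectors = tfidf(docs)
```

In this toy corpus, "win" occurs in only one message and therefore receives a higher weight in the first message than "free", which occurs in two. This down-weighting of common terms is what makes TF-IDF features useful to the downstream classifiers.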

FAQs

What machine learning algorithms performed best in SMS spam detection?

The study reveals that Extra Trees Classifier and XGBoost achieved the highest accuracy of 97.87%.

How was the effectiveness of the algorithms evaluated in the study?

The performance was rigorously assessed using metrics like accuracy, precision, recall, and F1-score.
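As a reminder of how these four metrics relate, the sketch below derives them from true and predicted labels, treating "spam" as the positive class. The function name `classification_metrics`, the label strings, and the toy labels are assumptions made for illustration; they are not the study's evaluation code or results.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels, made up for illustration only.
y_true = ["spam", "ham", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "ham", "spam", "spam", "ham", "ham"]
metrics = classification_metrics(y_true, y_pred)
```

Precision matters particularly for spam filtering because a false positive hides a legitimate message from the user, which is presumably why precision is reported alongside accuracy.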

What challenges exist with the current SMS spam detection dataset?

The dataset is limited to English messages, and imbalances in spam types may affect performance.

What preprocessing steps were taken in the SMS data handling?

Preprocessing included lowercasing, tokenization, stopword removal, and feature extraction for improved model performance.
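The first three steps can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the study's implementation: the function name `preprocess`, the regular-expression tokenizer, and the short hard-coded stopword set are all assumptions made here. In practice a fuller stopword list, such as the one distributed with NLTK, would normally be used.

```python
import re

# Illustrative stand-in for a real stopword list (assumption, deliberately tiny).
STOPWORDS = {"a", "an", "the", "to", "you", "your", "is", "are",
             "for", "of", "and", "in"}


def preprocess(message):
    """Lowercase a message, split it into tokens, and drop stopwords."""
    lowered = message.lower()
    # Keep runs of letters and digits; punctuation acts as a separator.
    tokens = re.findall(r"[a-z0-9]+", lowered)
    return [tok for tok in tokens if tok not in STOPWORDS]


# Made-up example message, not from the dataset.
tokens = preprocess(
    "Congratulations! You have WON a FREE prize. Call 0800 now to claim."
)
```

The resulting token list is what a vectorizer such as TF-IDF would then turn into numeric features.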

How does the developed web application enhance user interaction?

The interactive web application allows users to classify SMS messages in real-time, demonstrating practical applicability.
