The rapid proliferation of mobile communication has kept Short Message Service (SMS) a popular channel for personal and business communication. As usage rose, malicious exploitation followed, increasing the flow of unwanted and harmful content in the form of spam. SMS spam is a vector for identity theft, phishing, and fraud, so reliable detection methods are important. This paper presents a machine-learning-based intelligent spam filtering framework. It uses the SMS Spam Collection Dataset, which consists of 5,574 entries, each labeled as either spam or ham. Preprocessing included normalization, tokenization, lemmatization, and removal of uninformative tokens. Text was transformed into numerical vectors using Term Frequency-Inverse Document Frequency (TF-IDF) and Bag-of-Words (BoW) representations. Several algorithms were compared: Multinomial Naïve Bayes, Support Vector Machine, Logistic Regression, Random Forest, and K-Nearest Neighbors. Of these, the probabilistic classifier performed best. Ensemble strategies, namely a Voting Classifier and XGBoost, were used to improve predictive stability. Statistical and structural message attributes were extracted as additional features, and a dimensionality reduction method was applied to improve generalization. With these additions the framework reached 98.6% accuracy, evaluated with 10-fold cross-validation. The paper also examines possibilities for deployment on multilingual datasets and suggests making the model more robust against concept drift, adversarial inputs, and evolving linguistic patterns.
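The vectorize-then-classify step described above can be illustrated with a minimal sketch, using only the Python standard library. It is not the authors' implementation: the tokenizer is a simplified stand-in for the preprocessing, the smoothed IDF formula and the `alpha` smoothing value are assumptions, and the six training messages are hypothetical examples rather than entries from the SMS Spam Collection Dataset. It pairs TF-IDF weights with Multinomial Naïve Bayes, one of the classifiers the paper compares.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphanumeric runs (simplified stand-in for preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(token_docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(token_docs)
    doc_freq = Counter(tok for doc in token_docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; tokens unseen during training are dropped."""
    term_freq = Counter(tok for tok in tokens if tok in idf)
    return {tok: tf * idf[tok] for tok, tf in term_freq.items()}


class MultinomialNB:
    """Multinomial Naive Bayes over sparse weighted vectors, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed smoothing strength

    def fit(self, vectors, labels, vocabulary):
        self.classes = sorted(set(labels))
        label_counts = Counter(labels)
        weight_sums = {c: defaultdict(float) for c in self.classes}
        for vec, label in zip(vectors, labels):
            for tok, weight in vec.items():
                weight_sums[label][tok] += weight
        self.log_prior = {c: math.log(label_counts[c] / len(labels))
                          for c in self.classes}
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(weight_sums[c].values()) + self.alpha * len(vocabulary)
            self.log_likelihood[c] = {
                tok: math.log((weight_sums[c][tok] + self.alpha) / denom)
                for tok in vocabulary
            }
        return self

    def predict(self, vec):
        """Return the class with the highest log-posterior score."""
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_likelihood[c][tok] for tok, w in vec.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical toy messages, for illustration only.
TRAIN = [
    ("win a free prize now claim your cash", "spam"),
    ("free entry win cash prize call now", "spam"),
    ("urgent claim your free reward today", "spam"),
    ("are we still meeting for lunch today", "ham"),
    ("see you at home tonight", "ham"),
    ("can you call me when you are free", "ham"),
]

train_tokens = [tokenize(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_tokens)
train_vectors = [tfidf_vector(toks, idf) for toks in train_tokens]
model = MultinomialNB(alpha=1.0).fit(train_vectors, train_labels, set(idf))


def classify(text):
    """Vectorize a raw message with the fitted IDF and predict its label."""
    return model.predict(tfidf_vector(tokenize(text), idf))


print(classify("claim your free cash prize"))
print(classify("see you at lunch"))
```

A real evaluation would replace the toy messages with the full dataset and estimate accuracy with k-fold cross-validation instead of spot checks on single messages.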
Papers by Kesavan K7