Spam Detection in SMS Using Machine Learning Through Text Mining
2020, International Journal of Scientific & Technology Research
Abstract
The growth in the number of mobile phone users has led to a dramatic increase in SMS spam messages. Although in many parts of the world the mobile messaging channel is currently regarded as "clean" and trusted, recent reports clearly show that the volume of mobile phone spam is increasing dramatically. It is a growing problem, particularly in the Middle East and Asia. SMS spam filtering is a comparatively recent task for dealing with this problem. It inherits many issues and quick solutions from email spam filtering, but it also faces its own specific problems. This paper addresses the task of filtering mobile messages as ham or spam for Indian users by adding Indian messages to the globally available SMS dataset. The paper analyzes different machine learning classifiers on a large corpus of SMS messages.
Related papers
International Journal of Information Security Science, 2013
The growth of mobile phone users has led to a dramatic increase in SMS spam messages. Recent reports clearly indicate that the volume of mobile phone spam is increasing dramatically year by year. In practice, fighting this plague is made difficult by several factors, including the lower rate of SMS, which has allowed many users and service providers to ignore the issue, and the limited availability of mobile phone spam-filtering software. Probably one of the major concerns in academic settings is the scarcity of public SMS spam datasets, which are sorely needed for validation and comparison of different classifiers. Moreover, traditional content-based filters may have their performance seriously degraded, since SMS messages are fairly short and their text is generally rife with idioms and abbreviations. In this paper, we present details about a new real, public and non-encoded SMS spam collection that is the largest one as far as we know. Moreover, we offer a comprehensive analysis of this dataset to ensure that there are no duplicated messages coming from previously existing datasets, since duplicates may ease the task of learning SMS spam classifiers and could compromise the evaluation of methods. Additionally, we compare the performance achieved by several established machine learning techniques. In summary, the results indicate that the procedure followed to build the collection does not lead to near-duplicates and, regarding the classifiers, that Support Vector Machines outperform the other evaluated techniques and hence can be used as a good baseline for further comparison.
Turkish Journal of Computer and Mathematics Education (TURCOMAT), 2021
Over the last decade, the use of short message services has been rising. Text messages are even more powerful for corporations than emails, because about 80 percent of emails remain unopened while 98 percent of smartphone users read their text messages by the end of the day. Spam, which refers to any irrelevant text messages sent via mobile networks, has also become more common and is seriously irritating for consumers. Because of regional content and the use of abbreviated words, SMS spam detection is more challenging than e-mail spam detection; unfortunately, very little existing research addresses these challenges. Much of the current research that has attempted to filter spam has focused on manually identified features. This paper aims to address these concerns. Filtering is one of the most effective strategies among the methods developed to stop spam. Nowadays, machine learning techniques are used to process spam SMS automatically at a very good rate. The goal of this research is to differentiate between ham and spam messages by developing an accurate and responsive classification model that provides good accuracy with a low false positive rate.
2011
The growth of mobile phone users has led to a dramatic increase in SMS spam messages. In practice, fighting mobile phone spam is made difficult by several factors, including the lower rate of SMS, which has allowed many users and service providers to ignore the issue, and the limited availability of mobile phone spam-filtering software. On the other hand, in academic settings, a major handicap is the scarcity of public SMS spam datasets, which are sorely needed for validation and comparison of different classifiers. Moreover, as SMS messages are fairly short, content-based spam filters may have their performance degraded. In this paper, we offer a new real, public and non-encoded SMS spam collection that is the largest one as far as we know. Moreover, we compare the performance achieved by several established machine learning methods. The results indicate that the Support Vector Machine outperforms the other evaluated classifiers and hence can be used as a good baseline for further comparison.
2016
Objective: To report a review of various machine learning and hybrid algorithms for detecting SMS spam messages and to compare them according to an accuracy criterion. Data sources: Original articles written in English found in Sciencedirect.com, Google-scholar.com, Search.com, IEEE Xplore, and the ACM library. Study selection: Articles dealing with machine learning and hybrid approaches for SMS spam filtering. Data extraction: Many articles were extracted by searching for a predefined string; the outcome was reviewed by one author and checked by the second. The primary paper was reviewed and edited by the third author. Results: A total of 44 articles concerning machine learning and hybrid methods for detecting SMS spam messages were selected. 28 methods and algorithms were extracted from these papers and studied, and finally 15 of these algorithms were compared in one table according to their accuracy, strengths, and weaknesses in detecting spam messages of the Tiago ...
International Journal of Emerging Trends in Engineering Research, 2025
The exponential growth of mobile communication has intensified the threat of SMS spam, compromising user security and trust in messaging platforms. This study addresses this challenge by designing and deploying a robust spam detection system using machine learning. We analyze a publicly available SMS dataset through rigorous pre-processing, including text normalization, tokenization, and feature engineering, followed by TF-IDF vectorization. A comparative evaluation of 11 classifiers, spanning probabilistic models, ensemble methods, and linear classifiers, reveals that ensemble techniques outperform traditional algorithms. The Extra Trees Classifier and XGBoost achieve state-of-the-art results, with 97.9% accuracy and 97.5% precision, demonstrating their efficacy in distinguishing spam from legitimate messages. To bridge the gap between research and practical application, we develop an interactive Streamlit web application that enables real-time spam classification with a user-friendly interface. This work underscores the potential of ensemble learning for text classification tasks and provides a scalable framework for combating SMS spam in real-world scenarios.
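The pre-processing steps named in this abstract (text normalization, tokenization, TF-IDF vectorization) can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: the helper names `tokenize` and `tfidf_vectors`, the regular-expression tokenizer, the smoothed IDF formula, and the three sample messages are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(message):
    """Normalize to lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


def tfidf_vectors(messages):
    """Return (vocabulary, vectors), one sparse {term: weight} dict per message.

    Assumed weighting: raw term frequency times a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, followed by L2 normalization.
    """
    docs = [tokenize(m) for m in messages]
    n_docs = len(docs)
    # Document frequency: the number of messages each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: count * idf[t] for t, count in tf.items()}
        # L2-normalize so that message length does not dominate the weights.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return sorted(idf), vectors


# Hypothetical sample messages, not taken from any dataset.
messages = [
    "WIN a FREE prize now!!! Call 0800 now",
    "Are we still meeting for lunch today?",
    "Free entry: text WIN to claim your prize",
]
vocab, vectors = tfidf_vectors(messages)
print(sorted(vectors[0].items(), key=lambda kv: -kv[1])[:3])
```

In this toy corpus, terms that occur in only one message (such as "now") receive a higher weight than terms shared across messages (such as "free"), which is the behaviour TF-IDF is meant to provide before the vectors are passed to a classifier.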
Over recent years, as the popularity of mobile phones has increased, Short Message Service (SMS) has grown into a multi-billion-dollar industry. At the same time, the falling cost of messaging services has resulted in growth in unsolicited commercial advertisements (spam) being sent to mobile phones. SMS spamming is the activity of sending "unwanted messages" through text messaging or other communication services, normally using mobile phones. The SMS spam problem can be approached with legal, economic or technical measures. Nowadays there are many methods for SMS spam detection, ranging from list-based, statistical and IP-based approaches to machine learning. However, an optimum method for SMS spam detection is difficult to find because of issues of SMS length and of battery and memory performance. A database of real SMS spam from the UCI Machine Learning Repository is used, and after preprocessing and feature extraction, different machine learning techniques are applied to the database. Among the wide range of technical measures, Bayesian filters play a key role in stopping SMS spam. Here, we analyze to what extent Bayesian filtering techniques can be applied to the problem of detecting and stopping mobile spam. In particular, we have built SMS spam test collections of significant size in English. We have tested on them a number of message representation techniques and machine learning algorithms in terms of effectiveness. The effectiveness of the proposed features is empirically validated using multiple classification methods. The results demonstrate that the proposed features can improve the performance of SMS spam detection.
IEEE, 2022
In recent times, the increase in mobile phone usage has resulted in a huge number of spam messages. Spammers continuously apply new tricks, which makes managing or preventing spam messages a challenging task. The aim of this study is to detect spam messages in order to prevent different cybercrimes, as spam messages have become a security threat. In this paper, we build on previous studies of the SMS spam problem to achieve better accuracy using several different techniques, such as Support Vector Machine, K-Nearest Neighbor, Naïve Bayes, Random Forest, Logistic Regression and others. Our results indicate that the Support Vector Machine achieved the highest accuracy, 99%, suggesting it might be useful as an effective machine learning system for future research.
Transactions on Machine Learning and Artificial Intelligence, 2019
In past years, spammers have focused their attention on sending spam through the short message service (SMS) to mobile users. They have had some success because of the lack of appropriate tools to deal with this issue. This paper reviews and studies the relative strengths of various emerging technologies for detecting spam messages sent to mobile devices. Machine learning methods and topic modelling techniques have been remarkably effective in classifying spam SMS. SMS spam detection suffers from the limited availability of SMS datasets and the small number of features in an SMS. The various features extracted and datasets used by researchers, along with some related issues, are also discussed. The most important measurements used by researchers to evaluate the performance of these techniques were recall, precision, accuracy and the CAP curve. In this review, the performance achieved by machine learning algorithms was compared, and we found that Naive Bayes and SVM produce effective performance.
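As a rough illustration of the evaluation measures mentioned in this abstract, the sketch below computes accuracy, precision and recall from a confusion matrix, plus the false positive rate (the share of legitimate messages wrongly flagged as spam). The function names and the label lists are hypothetical; the CAP curve is not covered here.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives, treating `positive` as the target class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive="spam"):
    """Return accuracy, precision, recall and false positive rate as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical labels: 3 spam and 5 ham messages, with one error of each kind.
y_true = ["spam", "ham", "ham", "spam", "ham", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "spam", "ham", "ham", "ham", "ham"]
scores = evaluate(y_true, y_pred)
print(scores)
```

For these labels the counts are 2 true positives, 1 false positive, 4 true negatives and 1 false negative, giving an accuracy of 0.75, precision and recall of 2/3 each, and a false positive rate of 0.2. Because spam is usually the minority class, accuracy alone can look high even when many spam messages are missed, which is why precision and recall are reported alongside it.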
Advances in Artificial Intelligence Research, 2024
With the continuous rise in the number of mobile device users, SMS (Short Message Service) remains a prevalent communication tool accessible on both smartphones and basic phones. Consequently, SMS traffic has experienced a significant surge. This increase has also led to a rise in spam messages, as spammers seek financial or business gains through activities like marketing promotions, lottery scams, and credit card information theft. Consequently, spam classification has become a focal point of research. In this paper, we explore the effectiveness of 11 machine learning algorithms for SMS spam detection, including multinomial Naïve Bayes, K-Nearest Neighbors (KNN), and Random Forest, among others. Utilizing datasets from UCI and Bangla SMS collections, our experimental results reveal that the multinomial Naïve Bayes algorithm surpasses previous models in spam detection, achieving accuracies of 98.65% and 89.10% in the respective datasets.
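To make the multinomial Naïve Bayes classifier named in this abstract concrete, here is a minimal pure-Python sketch with add-one (Laplace) smoothing. It is an illustration only, not the authors' implementation: the class name, the tokenizer, the smoothing value `alpha=1.0` and the five training messages are assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes over word counts, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, messages, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum over tokens of log P(token | label)
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda label: self._log_score(tokens, label))


# Hypothetical training messages, not taken from the UCI or Bangla collections.
train = [
    ("Win a free prize, call now", "spam"),
    ("Free cash offer, claim your prize now", "spam"),
    ("Are we still meeting for lunch", "ham"),
    ("Call me when you get home", "ham"),
    ("See you at lunch tomorrow", "ham"),
]
model = MultinomialNB().fit([t for t, _ in train], [l for _, l in train])
pred_spam = model.predict("claim your free prize")
pred_ham = model.predict("lunch at home tomorrow")
print(pred_spam, pred_ham)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a word that appears in only one class from driving the other class's probability to zero.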
Short Message Service (SMS) is one of the most frequently used services on mobile phones, next to calls. In developing countries like India, SMS is the cheapest mode of communication, and advertising companies exploit this fact to reach the masses. Unsolicited SMS messages (a.k.a. spam SMS) generate notifications, thus consuming precious user attention. To formulate the spam SMS problem and understand users' needs and perceptions, we conducted an online survey with 458 participants in different cities of India. Most of the survey participants admitted that they are quite annoyed by bursts of SMS spam and by the ineffectiveness of regulatory solutions. However, some participants reported that they do sometimes get useful information from spam SMSes (e.g. discounts at a popular food joint). In this paper, we present the design and implementation of a user-centric spam SMS filtering application, SMSAssassin, that uses content-based machine learning techniques with user-generated features to filter unwanted SMSes and reduce the burden of notifications for a mobile user.