Spam Filtering

923 papers

1,962 followers

About this topic

Spam filtering is the process of identifying and blocking unsolicited or unwanted electronic messages, typically in email, using algorithms and heuristics to classify content as either legitimate or spam. This technique aims to enhance user experience and security by reducing the volume of irrelevant or harmful communications.

Key research themes

1. How do machine learning techniques address the evolving challenges of email spam filtering?

This theme explores the application and advancement of various machine learning (ML) algorithms in email spam filtering, focusing on handling concept drift, feature extraction, ensemble learning, and hybrid models to improve detection accuracy and adaptability under realistic scenarios where spam characteristics continuously evolve.

A review of machine learning approaches to Spam filtering

by Rajat Singh

2016

Key finding: This comprehensive review highlights that while traditional ML approaches such as Naive Bayes remain foundational, evolving challenges like concept drift and the obfuscation of spam texts necessitate adaptive filters. It... Read more

SPAM EMAIL DETECTION USING MACHINE LEARNING INTEGRATED IN CLOUD

by Joyece Jane

2023

Key finding: Demonstrates the effectiveness of ensemble learning strategies—bagging and boosting—applied to classifiers including multinomial Decision Trees, Naive Bayes, KNN, Random Forest, and SVM for spam detection. The study finds... Read more

Comparison of Three Machine Learning Models for the Detection of Emails Spam

by Raed Alkaied

2024, Research Square (Research Square)

Key finding: Through empirical comparison on the Spambase dataset, the study shows that Naive Bayes outperforms Support Vector Machines (SVM) and K-Nearest Neighbors (KNN) classifiers in email spam detection accuracy. This reinforces... Read more

Evaluation of Supervised Learning Models for Automatic Spam Email Detection

by Tsehay Assegie

2024, Research Square (Research Square)

Key finding: Provides a multi-model evaluation (including Random Forest, AdaBoost, Decision Tree, SVM, and Naive Bayes) using balanced datasets and multiple metrics beyond accuracy. The Random Forest model attains the highest accuracy... Read more

ML Approaches to Detect Email Spam Anamoly

by Joyece Jane

2023, various

Key finding: Proposes a spam detection system leveraging Naive Bayes classifiers integrated with tokenization and stop word filtering via scikit-learn. Emphasis is on the adaptability of ML techniques to changing spam tactics and the... Read more
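
To make the pipeline named in this finding concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier with tokenization and stop-word filtering. It is written in plain Python rather than scikit-learn, so it illustrates the general technique and is not the paper's implementation. The stop-word list, the toy training messages, and every function name are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "is", "in", "for", "you", "your"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def train(messages):
    """Count tokens per class. `messages` holds (text, label) pairs,
    where label is "spam" or "ham"."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the higher log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior from the class frequencies in the training data.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore tokens never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for illustration only.
TRAINING = [
    ("Win a free prize now", "spam"),
    ("Cheap pills, free offer, click now", "spam"),
    ("Meeting agenda for Monday", "ham"),
    ("Lunch on Monday? Agenda attached", "ham"),
]
model = train(TRAINING)
```

With this toy model, `classify("free prize offer", model)` scores the message against both classes and returns the more likely label. A real filter would be trained on a labeled corpus and retrained as spam tactics change.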

2. What are the roles and limitations of pre-acceptance filtering techniques in combating spam at the SMTP server level?

This research area investigates pre-acceptance filtering mechanisms, such as blacklisting, whitelisting, and sender behavior profiling, which are applied during the SMTP handshake before a message is accepted. The aim is to reduce server load and detect spam earlier. The theme also assesses the potential and practical limitations of these techniques in handling diverse spam sources.
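
As a rough illustration of the blacklist and whitelist checks described above, the sketch below decides what to do with a connecting client from its IP address alone, before any message content is received. The list contents use documentation-reserved address ranges, and the function name and return values are hypothetical, not taken from any of the papers or from a particular mail server.

```python
import ipaddress

# Hypothetical lists for illustration, using documentation-reserved ranges.
WHITELIST = [ipaddress.ip_network("192.0.2.10/32")]
BLACKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),    # a whole offending IP block
    ipaddress.ip_network("203.0.113.7/32"),  # a single offending sender
]


def pre_acceptance_decision(client_ip):
    """Decide at connection time, using only the client IP address.

    Returns "accept", "reject", or "continue" (hand over to later filters).
    """
    addr = ipaddress.ip_address(client_ip)
    # The whitelist is checked first so a trusted host inside a
    # blacklisted block is still accepted.
    if any(addr in net for net in WHITELIST):
        return "accept"
    if any(addr in net for net in BLACKLIST):
        return "reject"
    return "continue"
```

Checking the whitelist first is one possible design choice; it lets an operator exempt a known-good sender that sits inside a listed block. Senders that match neither list still need content-based filtering, which is the practical limitation this theme points to.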

On the Effectiveness of Pre-Acceptance Spam Filtering

by Zhuoqing Mao

2023

Key finding: Empirical analysis over millions of emails shows that well-constructed blacklists can filter up to 86% of spam by identifying offending IP blocks and individual senders during pre-acceptance SMTP interactions. However, a... Read more

Trusting Spam Reporters: A Reporter-Based Reputation System for Email Filtering

by Andrei Schrenck

2016

Key finding: Introduces a reactive spam filtering system leveraging reporter reputation to enable earlier spam campaign detection. The method prioritizes feedback from trustworthy users to identify spamming quickly before widespread... Read more

3. How can stylometric and content-based features alongside machine learning improve detection of sophisticated and AI-generated spam and phishing emails?

This theme focuses on detecting advanced unsolicited emails, including AI-generated phishing attempts, by extracting linguistic and stylometric features, employing interpretable machine learning models, and analyzing email content beyond traditional signature-based approaches to counteract increasingly sophisticated cyber threats.
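
To give a sense of what stylometric feature extraction can look like, here is a small sketch that computes five simple style measures from an email body. These particular features, their names, and the function itself are assumptions chosen for illustration; they are not the feature set used in any of the papers listed under this theme.

```python
import re
import string


def stylometric_features(text):
    """Compute a few simple style measures for a piece of text."""
    words = re.findall(r"[A-Za-z']+", text)
    # Split on runs of sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_chars = len(text)
    return {
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # Vocabulary richness: distinct words divided by total words.
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
        "punctuation_ratio": sum(c in string.punctuation for c in text) / n_chars if n_chars else 0.0,
        "uppercase_ratio": sum(c.isupper() for c in text) / n_chars if n_chars else 0.0,
    }
```

A feature dictionary like this would typically be turned into a numeric vector and passed to a classifier, so that the model sees how a message is written in addition to what it says.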

Evaluating spam filters and Stylometric Detection of AI-generated phishing emails

by Paolo Modesti

2025, Expert Systems With Applications

Key finding: This work evaluates major email providers' abilities to block GPT-4o generated phishing emails, revealing vulnerabilities especially in Gmail and Outlook. Applying 60 stylometric features to classifiers identified XGBoost as... Read more

Survey of Spam Comments Identification using NLP Techniques

by Vishal Borate

2024, International Journal of Research and Analytical Reviews (IJRAR)

Key finding: Investigates spam detection in user-generated comments by employing natural language processing (NLP) techniques, such as broken text flow and topic detection, combined with machine learning classifiers. The approach... Read more

Artificial Intelligence and Its Impact on Punjabi Culture

by Devinder Pal Singh

2023, Punjab Dey Rang. Lahore. Pakistan. 17(3). 5-10. July- Sept.

Key finding: While not directly about spam filtering, this paper discusses AI's broader societal impacts, emphasizing the emerging challenges and opportunities in cultural settings due to AI's integration. It underlines the importance of... Read more

All papers in Spam Filtering

Collective classification for spam filtering

by Igor Santos

2025, Logic Journal of IGPL

Spam has become a major issue in computer security because it is a channel for threats such as computer viruses, worms and phishing. Many solutions feature machine-learning algorithms trained using statistical representations of the terms... more

The Best Kept Secrets with Corpus Linguistics

by neil cooke

2025

This paper presents the use of corpus linguistics techniques on supposedly "clean" corpora and identifies potential pitfalls. Our work relates to the task of filtering sensitive content, in which data security is strategically important... more

Intellectual property escaped with the email? Press F1 for help

by neil cooke

2025

In this paper we describe an approach to information assurance in which we can prevent breach of confidentiality. Specifically, we examine aspects of the propagation of confidential information via email. Email provides one simple... more

Emotion Mining on YouTube: An Intelligent System for Real-Time Comment Sentiment Classification

by Dr. Nitin Saraswat

2025, IJRAR

The growing popularity of YouTube video-sharing platforms requires organizations to analyze viewer comments for public opinion assessment and content development. A web application powered by machine learning techniques analyzes sentiment... more

Content Based E Mail Classification

by Ms.sonal chakole

2025, International journal of scientific research in science, engineering and technology

Electronic mail (e-mail) has established a significant place in the lives of information users. E-mail is a major mode of information sharing because it is a fast and effective way to communicate. Email plays its... more

BLAC: Towards Lightweight Protocol-Agnostic Access Control for Cloud Computing utilizing Bit-Level Traffic Signatures

by Aman Routh

2025, Proceedings of the 2025 12th International Conference on Computing for Sustainable Global Development (INDIACom)

Access control in multi-tenant cloud environments faces significant challenges due to encrypted communications, protocol diversity, and dynamic tenant behavior. Traditional access control methods, such as static-ruled-based and... more

ADVANCES IN COMPUTER SCIENCE AND INFORMATION TECHNOLOGY

by Ejona Duci

2025, Evaluating SLA Performance in Xen Virtualized Environments

In the modern digital era, the demand for highly available and resilient systems is constantly increasing, especially in cloud environments and data centers that provide critical services. Xen virtualization is one of the most... more

SMS Spam Detection Using Machine Learning: An Experimental Study

by WARSE The World Academy of Research in Science and Engineering

2025, International Journal of Emerging Trends in Engineering Research

The exponential growth of mobile communication has intensified the threat of SMS spam, compromising user security and trust in messaging platforms. This study addresses this challenge by designing and deploying a robust spam detection... more

Spam filtering using hybrid local-global Naive Bayes classifier

by Rohit Solanki

2025, 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI)

Spam: A Big Data Challenge

by Dr. K. Poulose Jacob

2025, International Journal of Advanced Research in Computer Science

Spam comes in a variety of content types, such as text, images, embedded HTML and MIME attachments, and the volume of spam mail sent per day is massive. To handle this high volume, high velocity and large variety of spam, a scalable spam... more

E-Mail Spam Detection using Machine Learning and Deep Learning

by Shivam Pandey

2025, International Journal for Research in Applied Science and Engineering Technology

A Comparative Analysis of Prediction of Autism Spectrum Disorder (ASD) using Machine Learning

by jabez j

2025

The Autism Spectrum Disorder (ASD) is a neurological disease, which affects the mental, social and physical state of a person. A person of any age group can be found infected by it. It is very difficult to identify, if a person is the... more

High-Scale ATS and Semantic Resume Filtering using Django, NLP, and LLaMA3-GroqModels

by IJRASET Publication

2025, International Journal for Research in Applied Science & Engineering Technology (IJRASET)

An intelligent system that uses Natural Language Processing (NLP) and Machine Learning (ML) to automate resume classification is presented in this paper. Key resume features, such as education, skills, and job titles, are extracted and... more

Web Spam Detection Using Different Features

by Sumit Sahu

2025

SMS Spam Detection and Classification to Combat Abuse in Telephone Networks Using Natural Language Processing

by Oyeyemi Dare Azeez and

2025, Journal of Advances in Mathematics and Computer Science

In the modern era, mobile phones have become ubiquitous, and Short Message Service (SMS) has grown to become a multi-million-dollar service due to the widespread adoption of mobile devices and the millions of people who use SMS daily.... more

A Survey on Extraction Approach for Spam Filtering

by rajesh nigam

2025

With the growth of networking, the use of email has also increased. Owing to the rapid growth of the internet, communication depends mostly on electronic mail for both commercial and business purposes. According to today's... more

Assessing the quality of Web content

by Inayatullah Khan

2025

This paper describes our approach towards the ECML/PKDD Discovery Challenge 2010. The challenge consists of three tasks: (1) a Web genre and facet classification task for English hosts, (2) an English quality task, and (3) a multilingual... more

Email Spam Detector Research Paper

by Ajinkya Pratap Singh

2025

The widespread use of email as a primary communication medium has led to an increase in spam messages, which pose significant threats to privacy, productivity, and cybersecurity. Spam emails, often disguised as legitimate messages, can... more

PhishGuard: A Machine Learning Framework for Windows-Specific Phishing Detection

by IJRASET Publication

2025, International Journal for Research in Applied Science & Engineering Technology (IJRASET)

Phishing remains one of the most prevalent and evolving cybersecurity threats, exploiting human vulnerabilities through deceptive digital communication. This study proposes a dynamic, Windows-specific phishing detection model leveraging... more

Regular Expression Matching on Graphics Hardware for Intrusion Detection

by Sotiris Ioannidis

2025, Lecture Notes in Computer Science

The expressive power of regular expressions has been often exploited in network intrusion detection systems, virus scanners, and spam filtering applications. However, the flexible pattern matching functionality of regular expressions in... more

A spam filtering multi-objective optimization study covering parsimony maximization and three-way classification

by José Ramón Méndez

2025, Applied Soft Computing

Classifier performance optimization in machine learning can be stated as a multi-objective optimization problem. In this context, recent works have shown the utility of simple evolutionary multi-objective algorithms (NSGA-II, SPEA2) to... more

Question Answering System in Telugu using DeepSet/Xlm Roberta-Base-Squad2 Model

by Dr.SriSudha Garugu

2025

Question answering systems are a widely used learning mechanism in the student community. This paper mainly focuses on question answering based on paragraphs. Datasets such as the Stanford Question-Answering Dataset... more

Information Operations Across Infospheres

by Bhavani Thuraisingham

2025

There is a critical need for organizations to share data within and across infospheres and form coalitions so that analysts could examine the data, mine the data, and make effective decisions. Each organization could share information... more

Evaluating spam filters and Stylometric Detection of AI-generated phishing emails

by Paolo Modesti

2025, Expert Systems With Applications

The advanced architecture of Large Language Models (LLMs) has revolutionised natural language processing, enabling the creation of text that convincingly mimics legitimate human communication, including phishing emails. As AI-generated... more

Adaptive filtering of spam

by Jalal almhana

2025, Proceedings. Second Annual Conference on Communication Networks and Services Research, 2004.

In this paper, we present a new spam filter which acts as an additional layer in the spam filtering process. This filter is based on what we call a representative vocabulary. Spam e-mails are divided into categories in which each category... more

Processing Obtained Email Data By Using Naïve Bayes Learning Algorithm

by Aleksandr Baranenko

2025, International Journal of Computer Science and Information Technology

This paper gives a basic idea of how various machine learning techniques may be applied to processing the data from DEA services to find out whether people use these services for legitimate or non-legitimate purposes.

Generalized Knowledge Discovery from Relational Databases

by RAY-I CHANG

2025

The attribute-oriented induction (AOI) method is a useful tool for data capable of extracting generalized knowledge from relational data and the user's background knowledge. However, a potential weakness of AOI is that it... more

Trust, Security and Privacy in Global Computing

by Jean-Marc Seigneur

2025

During the past thirty years, the world of computing has evolved from large centralised computing centres to an increasingly distributed computing environment, where computation and communication capabilities are being embedded in artefacts of everyday life. Billions of computational entities will interact in systems with ever changing configurations determined by local and global context, for example, the location of the user. In such dynamic environments, users would be overwhelmed if involved in computing-related decisions every time the context changes. Due to the number of decisions required to sustain continuous service, most decisions will have to be made by the computing entities themselves. Moreover, due to the global scale of the environment and the potential risk of disconnected operations, the computing entities may have to make these decisions autonomously, without relying on a given fixed infrastructure. Knowledge, especially about the context of the interaction, is vital for the accuracy of these decisions. However, keeping information on a global scale is unfeasible for resource-constrained entities, so some degree of uncertainty must be assumed. This peer-to-peer type of interaction in an uncertain world where interactions are needed to go forward resembles what occurs in human social networks. The notion of trust has emerged in human society to allow humans to make decisions under such circumstances. It has been proposed that computing entities can make decisions based on a computational model of trust. The trust engine run by each entity distributes and gathers pieces of evidence, that is, knowledge about the interacting entities: direct observations, recommendations or reputation. Since the trust engines collaborate and malicious collaborating entities exist, security through collaboration must be considered. 
As the real world does not have a unique legitimate authority, computing entities are owned by multiple authorities and operated from multiple jurisdictions. As in real life, no administrator can be perpetually present to manage the interactions. The trust engine can adapt security in a peer-to-peer way. A crucial element for the use of trust is to know with whom the entities interact, which corresponds to authentication in traditional computer security. However, this element has been disregarded in computational trust: this is ill-fated given that virtual identities are the means for a number of attacks that are less possible in face-to-face settings. This thesis sets up a framework, called entification, which encompasses both computational trust and identity aspects, and whose goal is to be applicable to global computing. For this purpose, this thesis

Development of a Machine Learning Model for Image-based Email Spam Detection

by Christopher U. ONOVA

2025, FUOYE Journal of Engineering and Technology

Combatting email spam has remained a very daunting task. Despite the over 99% accuracy in most non-image-based spam email detection, studies on image-based spam hardly attain such a high level of accuracy as new email spamming techniques... more

Protecting the PIPE from malicious peers

by Neil Daswani

2025, Technical Report

Digital materials can be protected from failures by replicating them at multiple autonomous, distributed sites. A Peer-to-peer Information Preservation and Exchange (PIPE) network is a good way to build a distributed replication system. A... more

Emailvalet: Learning user preferences for wireless email

by Sofus Macskássy

2025

This paper presents EmailValet, a system that learns users' email-reading preferences on email-capable wireless platforms, specifically on two-way pagers with small "qwerty" keyboards and an 8-line 30-character display. In use by the... more

Strategies for PageRank Optimization

by Henrik Bondtofte

2025, Self-published

This paper explores the fundamental principles of internal linking and PageRank optimization. It provides a detailed overview of how internal links influence a website’s SEO, helping distribute link equity and improving user navigation.... more

Leveraging Vectorization Techniques for Malicious Website Detection With Machine Learning

by Chitra Baskar

2025, Iraqi Journal of Science

Malicious websites are those that are created to harm visitors or exploit their information for illegal purposes. These websites are commonly utilized in attacks, such as phishing, malware distribution, and scams. Clicking on a malicious... more

Stork: package management for distributed VM environments

by Duy Nguyen

2025

In virtual machine environments each application is often run in its own virtual machine (VM), isolating it from other applications running on the same physical machine. Contention for memory, disk space, and network bandwidth among... more

Enhancing detection of zero-day phishing email attacks in the Indonesian language using deep learning algorithms

by beei iaes

2025, Bulletin of Electrical Engineering and Informatics

Email phishing is a manipulative technique aimed at compromising information security and user privacy. To overcome the limitations of traditional detection methods, such as blacklists, this research proposes a phishing detection model... more

Auto-Grouping Emails For Faster E-Discovery

by Sachindra Joshi

2025

In this paper, we examine the application of various grouping techniques to help improve the efficiency and reduce the costs involved in an electronic discovery process. Specifically, we create coherent groups of email documents which... more

World Academy of Science, Engineering and Technology 37 2010 Analysis of Classifications of Unsolicited Bulk

by Apurva Desai

2025

In recent times, the problem of Unsolicited Bulk Email (UBE), commonly known as spam email, has grown at a tremendous rate. We present an analysis of survey based on classifications of UBE in various research works.... more

Identification Of Non-Lexicon Non-Slang Unigrams In Body-Enhancement Medicinal Ube

by Apurva Desai

2025

Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam email. The current work aims at identification of unigrams in more than 2700 UBE that... more

Multilingual Rules for Spam Detection

by Minh Trị Vũ

2025, Journal of Machine to Machine Communications

In this paper, we introduce a statistical rule-based method for creating SpamAssassin rules that detect spam in different languages. The theoretical framework for generating and maintaining multilingual rules is also illustrated. The... more

Modeling Coherency in Generated Emails by Leveraging Deep Neural Learners

by Rakesh Verma

2025, arXiv (Cornell University)

Advanced machine learning and natural language techniques enable attackers to launch sophisticated and targeted social engineering based attacks. To counter the active attacker issue, researchers have since resorted to proactive methods... more

Detecting Spambot as an Antispam Technique for Web Internet BBS

by Juned Laliwala

2025

Spam is one of the most widely discussed and most relevant topics that needs to be understood in the current scenario. Everyone, whether a small child or an old person, uses email every day all around the world. The... more

O-IPCAC and its Application to EEG Classification

by Gabriele Lombardi

2024, … of Workshop on …

In this paper we describe an online/incremental linear binary classifier based on an interesting approach to estimate the Fisher subspace. The proposed method allows to deal with datasets having high cardinality, being dynamically... more

A Novel Approach for Combating Spamdexing in Web using UCINET and SVM Light Tool

by Vijaya Kathiravan

2024

Search Engine spam is a web page or a portion of a web page which has been created with the intention of increasing its ranking in search engines. Web spamming refers to actions intended to mislead search engines and give some pages... more

Search Engine Optimization based on Effective Factors of Ranking in Web Sites: A Review

by Farhad Soleimanian Gharehchopogh

2024

Nowadays, search engines have made progress and the number of web site pages increases every day. Search engines are the most common search systems for meeting the needs of the users in searching the... more

Email Spam Classification using Neighbor Probability based Naïve Bayes Algorithm

by Dr.D.Suresh Babu

2024

Email spam is a kind of electronic spam and is among the more difficult of today's internet challenges. Spam mails are mostly sent for commercial purposes; some of them may contain malware links that lead to phishing... more

Lecture Notes in Networks and Systems

by Joyece Jane

2024

The series "Lecture Notes in Networks and Systems" publishes the latest developments in Networks and Systems quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of... more

Relacion entre la calidad del sueno y factores soc

by Joyece Jane

2024

Relationship between sleep quality and sociodemographic factors. This work is licensed under a Creative Commons license of type CC-BY-NC-SA.

Spam Detection by Combining Bayesian Method and Regression Analysis

by Srikanth Kadainti

2024, BP Publishers

This study proposes a new method that utilizes the correlation structure between the number of words in the mail and the Bayesian score. Spam mails usually do not have a stable style and features. Spammers who send such mails, go on... more

Fig. 1(a). Distribution of Z2 scores [2a]

Fig. 2. Percentage misclassification during random testing [2a]

Source: Research Updates in Mathematics and Computer Science, Vol. 3.

List 1. Classification errors

In this paper, we re-examine the Naive Bayesian filter by identifying some important statistical features of the filter. We also report the results of our experiment with the Enron data set, carry out statistical analysis, and establish the merits of the new method.

Table 1. Partial list of tokens and their probabilities obtained from the training data [2a]

Table 3. Z scores for different mails [2a]

Table 4. Summary measures of the Z2 statistic for the naive Bayesian and modified naive Bayesian methods [2a]

Interestingly, the misclassification rate has come down to 8.76% and false positives have dropped drastically. False negatives, however, have increased [2a]. The new model has 67% sensitivity and 98% specificity. The odds ratio is also very high in this case compared to the results given in Table 4. The AUC indicates a probability of 0.977 that a randomly selected spam mail has a higher Z2 value than a randomly selected ham mail [2a]. The results of the modified Bayesian classification procedure are shown in Table 4.

Table 5. Comparison of performance statistics under random sampling [2a]
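
The AUC reading quoted for this paper (the probability that a randomly chosen spam mail scores higher than a randomly chosen ham mail) can be checked directly by comparing every spam score with every ham score. The sketch below does that, and also shows how sensitivity and specificity follow from confusion-matrix counts. The function names and the scores in the usage note are invented for illustration and are not the paper's data.

```python
def auc_pairwise(spam_scores, ham_scores):
    """Estimate AUC as the fraction of (spam, ham) pairs in which the
    spam score is higher. Ties count as half a win."""
    wins = 0.0
    for s in spam_scores:
        for h in ham_scores:
            if s > h:
                wins += 1.0
            elif s == h:
                wins += 0.5
    return wins / (len(spam_scores) * len(ham_scores))


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)
```

For example, with the made-up scores `[2.1, 3.4, 0.9]` for spam and `[0.2, 1.0, -0.5]` for ham, 8 of the 9 pairs rank the spam mail higher, so `auc_pairwise` returns 8/9. Hypothetical counts of 67 true positives, 33 false negatives, 98 true negatives and 2 false positives would reproduce the 67% sensitivity and 98% specificity quoted above; the paper's actual counts are not given in this excerpt.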

An Anti-Filtering System Using a Hybrid Machine Learning Algorithm Based on a Variant of PSO

by Dr.D.Suresh Babu

2024

The use of email has grown exponentially over the past decade, making it one of the most widely used forms of electronic communication. Recently, spam emails have become a major issue for email users. A spammer is someone who sends out... more
