Academia.edu

Email Spam Filter

105 papers
90 followers
About this topic
An email spam filter is a software application designed to identify and block unsolicited or unwanted email messages, commonly known as spam. It employs various algorithms and techniques, such as keyword analysis and machine learning, to assess the likelihood of an email being spam and to protect users' inboxes from irrelevant or harmful content.
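To illustrate how a filter can "assess the likelihood of an email being spam", here is a minimal sketch of a multinomial Naïve Bayes classifier trained on word counts. Naïve Bayes is one of the classical techniques named on this page. The sketch makes simplifying assumptions: whitespace tokenization, add-one (Laplace) smoothing, and at least one training email per class. The class and method names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Hypothetical multinomial Naive Bayes filter with add-one smoothing.

    Assumes at least one email of each class ("spam", "ham") has been
    trained before spam_probability() is called.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Accumulate word frequencies and the number of emails per class.
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Log prior: fraction of training emails carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Convert the two log scores into a normalized probability.
        m = max(log_scores.values())
        exp_spam = math.exp(log_scores["spam"] - m)
        exp_ham = math.exp(log_scores["ham"] - m)
        return exp_spam / (exp_spam + exp_ham)
```

Working in log space avoids the numerical underflow that multiplying many small word probabilities would cause on long emails.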

Key research themes

1. How can machine learning algorithms and feature engineering optimally detect and classify email spam?

This research area focuses on leveraging diverse machine learning models, including traditional classifiers like Naïve Bayes, SVM, Random Forest, and ensemble techniques, to improve spam email detection accuracy. A strong emphasis is placed on feature extraction and data preprocessing methods such as TF-IDF vectorization, word embeddings, and keyword analysis to enhance the discriminatory power of models. This theme matters because accurate spam detection reduces resource wastage, protects user privacy, and mitigates financial and phishing risks associated with spam emails.
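TF-IDF, named above as a feature-extraction step, weights a term by how often it appears in an email and by how rare it is across the whole collection. The following is a minimal sketch of the textbook variant (tf = count / email length, idf = log(N / df)). Libraries such as scikit-learn use smoothed and normalized variants, so their numbers will differ. The function name is hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Textbook variant: tf = count / document length,
    idf = log(N / df), where df is the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: each term is counted at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

A word that appears in every email receives weight 0. Words that occur in only a few emails, which often include spam keywords, receive higher weights.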

Key finding: This study applied four machine learning models and two deep learning models to combined datasets, including TREC07 and Enron, to classify spam emails and identify recurrent spam keywords. It found that advanced feature engineering...
Key finding: Using TF-IDF text representation combined with machine learning algorithms such as Support Vector Machines (SVM), Random Forest, and Naïve Bayes, this work demonstrated how numerical features derived from NLP techniques...
Key finding: Through an empirical comparison of eight supervised models on a preprocessed and balanced email dataset, the study found that Random Forest consistently outperformed the others, with an accuracy of 96.6%. The evaluation incorporated...
Key finding: This paper systematically applies and compares machine learning algorithms such as Naïve Bayes, SVM, and ensemble methods alongside bio-inspired algorithms on multiple datasets with extensive preprocessing. It confirms that...
Key finding: The study investigated the application of multiple machine learning algorithms, including Random Forest, Logistic Regression, Naïve Bayes, and SVM, across multiple datasets, incorporating feature engineering, bagging, and...
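Several of these findings report accuracy (for example, 96.6% for Random Forest). Accuracy alone can hide how a filter errs, so spam studies commonly also report precision, recall, and F1. The following is a minimal sketch of the four metrics, treating spam as the positive class. The function name is hypothetical.

```python
def spam_metrics(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F1 with `positive` as the positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    tn = len(pairs) - tp - fp - fn                               # ham passed
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For a spam filter, precision is the fraction of flagged emails that really are spam. A low value means legitimate mail is being lost to the spam folder.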

2. What advances do deep learning and hybrid attention mechanisms offer for detecting spam in email data?

This theme investigates the application of deep learning architectures—especially models integrating convolutional neural networks (CNN), gated recurrent units (GRU), and attention mechanisms—for email spam filtering. These methods focus on hierarchical feature extraction and contextual weighting of informative text segments, aiming to overcome limitations of classical techniques and improve generalization across datasets. The novelty lies in capturing complex semantic structures and temporal dependencies in email content, a crucial advance given the linguistic complexity of spam emails.
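The "contextual weighting of informative text segments" described in this theme is commonly done with attention pooling. Each token's hidden vector (for example, a GRU output) is projected and scored against a context vector. The scores are normalized with softmax, and the email is represented by the weighted sum of the hidden vectors. Here is a minimal pure-Python sketch of this additive attention step. The parameters `W`, `b`, and `context` are supplied by hand rather than learned. The sketch is not the architecture of any specific paper listed here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, W, b, context):
    """Additive attention pooling over a sequence of hidden vectors.

    hidden_states: list of T vectors of length d (e.g. per-token GRU outputs).
    W (k x d), b (length k), context (length k): parameters that a real model
    would learn; here they are passed in directly.
    Returns (attention weights, pooled vector of length d).
    """
    scores = []
    for h in hidden_states:
        # u = tanh(W h + b): project each hidden state.
        u = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        # Score = dot product between u and the context vector.
        scores.append(sum(u_i * c_i for u_i, c_i in zip(u, context)))
    weights = softmax(scores)
    d = len(hidden_states[0])
    # Weighted sum: tokens with higher attention contribute more.
    pooled = [sum(a * h[j] for a, h in zip(weights, hidden_states)) for j in range(d)]
    return weights, pooled
```

Because the weights sum to 1, the pooled vector stays on the same scale as the token vectors while emphasizing the tokens that best match the context vector.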

Key finding: This research proposed a hybrid model combining CNN, GRU, and hierarchical attention mechanisms, which selectively focused on relevant parts of the email text during training. The temporal convolution layers enabled flexible...
Key finding: The study applied machine learning techniques to identify recurrent word groups characteristic of spam and introduced a feedback-trained model with tokenizers and Naïve Bayes classifiers to distinguish between spam and ham...
Key finding: This research demonstrated the application of neural networks to email spam filtering, showing their capability to learn complex patterns and outperform standard classifiers in accuracy metrics, specifically for phishing and...

3. How effective are email service providers' pre-acceptance spam filtering techniques and what are the limitations?

This research area delves into the strategies employed by major email providers like Gmail, Yahoo, and Outlook at the SMTP pre-acceptance stage to filter spam, including blacklists, whitelists, and sender reputation analysis. It quantifies the proportion of spam and legitimate emails filtered before message acceptance and analyses the challenges posed by sophisticated spam gangs and end-host spammers. Understanding these filtering boundaries is vital for optimizing server resources and enhancing spam mitigation strategies.
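A pre-acceptance filter decides during the SMTP conversation, before the message is accepted, whether to take mail from a connecting sender. The following is a minimal sketch of such a policy. It assumes lists of sender IP addresses and a reputation score in [0, 1]. The class name, thresholds, and the "defer" outcome are hypothetical illustrations, not the configuration of any provider named in this theme.

```python
from dataclasses import dataclass, field


@dataclass
class PreAcceptancePolicy:
    """Hypothetical pre-acceptance policy: explicit lists first, then reputation."""
    whitelist: set = field(default_factory=set)     # sender IPs always accepted
    blacklist: set = field(default_factory=set)     # sender IPs always rejected
    reputation: dict = field(default_factory=dict)  # IP -> score in [0, 1], 1 = trusted
    reject_below: float = 0.2
    defer_below: float = 0.5

    def decide(self, sender_ip):
        # Explicit lists take precedence over any heuristic score.
        if sender_ip in self.whitelist:
            return "accept"
        if sender_ip in self.blacklist:
            return "reject"  # in SMTP this would typically be a 5xx reply
        # Unknown senders get a neutral score (an assumption of this sketch).
        score = self.reputation.get(sender_ip, 0.5)
        if score < self.reject_below:
            return "reject"
        if score < self.defer_below:
            return "defer"   # in SMTP this would typically be a 4xx (temporary) reply
        return "accept"
```

Checking the lists before the score means a whitelisted sender is accepted even if its reputation is low. Rejecting early in the SMTP conversation can spare the server from receiving and scanning the message content, which is the resource saving this theme is concerned with.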

Key finding: Through a large-scale empirical study using millions of emails collected at UW-Madison, the authors found that pre-acceptance filtering methods, such as blacklists and whitelists constructed from sender-tracking heuristics,...
Key finding: This review synthesizes state-of-the-art machine learning implementations in major email providers’ spam filters, highlighting Google's advanced neural-network-based filtering, which achieves ~99.9% accuracy. It details innovative...

All papers in Email Spam Filter

Email deliverability is a critical factor in digital communication, determining whether messages reach recipients’ primary inboxes or are relegated to spam and promotional folders. This paper examines deliverability in the context of both...
When email became widely used in the late 1990s, I heard the term “snail mail” used to refer to postal mail. Whereas anyone online could almost instantly send a short or long letter to someone else online, if one uses traditional...
The variety and sheer number of spam emails pose great challenges to accurate email classification. In this paper, we present a behavior- and time-feature-based email classification method. Based on email logs, email...
A phishing email is an attack that targets people directly in order to circumvent existing traditional security algorithms. To internet users, email appears to be a dependable, appropriate, and solid communication medium. At present, the...
The exponential growth of mobile communication has intensified the threat of SMS spam, compromising user security and trust in messaging platforms. This study addresses this challenge by designing and deploying a robust spam detection...
In this paper, the classification algorithm known as Support Vector Classifier (SVC) is used. SVC and classifiers such as logistic regression and decision trees provide very high accuracy compared to others. The model first...
Spam comes in a variety of content types, such as text, images, embedded HTML, and MIME attachments, and the volume of spam sent per day is massive. To handle this high volume, high velocity, and wide variety of spam, a scalable spam...
This research project acts as a foundational guide for students and early-career professionals interested in AI. It combines academic theory, technical practice, and ethical reflection to provide a holistic understanding of AI and machine...
This paper presents a framework to detect spammers and fake users on social networking platforms using machine learning algorithms. With the rise in usage of platforms like Twitter and Facebook, malicious activities like spam posting and...
The widespread use of email as a primary communication medium has led to an increase in spam messages, which pose significant threats to privacy, productivity, and cybersecurity. Spam emails, often disguised as legitimate messages, can...
The number of email users is increasing every day worldwide. If you need to communicate officially with someone, whether on a business matter or otherwise, email is the best option. When identifying the emails,...
The advanced architecture of Large Language Models (LLMs) has revolutionised natural language processing, enabling the creation of text that convincingly mimics legitimate human communication, including phishing emails. As AI-generated...
During the past thirty years, the world of computing has evolved from large centralised computing centres to an increasingly distributed computing environment, where computation and communication capabilities are being embedded in...
With the rise of social media platforms and new applications, the rapid expansion of fake accounts has become an important concern, posing threats to security, privacy, and trustworthiness. In response, this research explores the...
The internet has made it easier than ever to connect and do business. But it's also given bad guys more ways to trick people. Phishing is a common online scam where criminals try to get your personal information, like passwords and bank...
With the continuous rise in the number of mobile device users, SMS (Short Message Service) remains a prevalent communication tool accessible on both smartphones and basic phones. Consequently, SMS traffic has experienced a significant...
Combating email spam remains a daunting task. Despite the over 99% accuracy achieved in most non-image-based spam email detection, studies on image-based spam rarely attain such a high level of accuracy, as new email spamming techniques...
by IJCSMC Journal and 1 other author
Phishing is still one of the biggest threats in cybersecurity. It exploits users through deceptive URLs. In this study, the outcomes of the Random Forest, Support Vector Machines, and Decision Tree models are...
Email spam is a kind of electronic spam that has become one of the more difficult internet challenges today. Spam mails are mostly sent for commercial purposes, and some of them may contain malware links that lead to phishing...
The series "Lecture Notes in Networks and Systems" publishes the latest developments in Networks and Systems quickly, informally, and with high quality. Original research reported in proceedings and post-proceedings represents the core of...
The use of email has grown exponentially over the past decade, making it one of the most widely used forms of electronic communication. Recently, spam emails have become a major issue for email users. A spammer is someone who sends out...
The upsurge in the volume of unwanted emails, called spam, has created an intense need for more dependable and robust antispam filters. Recent machine learning methods are being used to successfully detect and filter...
The rapid growth of email use and the convenience it provides have made email the most frequently used means of communication. Along with this growth, many parties are abusing email as a means of advertising and promotion,...
In the real world, many online shopping websites and service providers have a single email ID to which customers send their queries, concerns, etc. At the back end, service providers receive millions of emails every week; how can they identify...
With the advent of the internet, email, and social networking, new issues have left users vulnerable to attackers. Internet users face many undesirable emails, and their data privacy and...
Social media and e-commerce platforms have led online communities to utilize reviews as a means of exchanging opinions about products, services, and issues. Reviews can also help customers make better purchasing decisions by analyzing...
Social media and e-commerce sites have prompted online communities to use reviews to provide input on goods, products, and services, to help people analyze customer input when making buying choices, and to help corporations improve...
Feature extraction plays an important role in accurate preprocessing and real-world applications. High-dimensional features in the data have a significant impact on the machine learning classification system. Relevant feature extraction...
Here we present a comprehensive review of recent and successful content-based e-mail spam filtering techniques. Our focus is primarily on machine learning-based spam filters and variants that are inspired by them. We report on related ideas,...
Spamming is the use of messaging systems to send multiple unsolicited messages (spam) to large numbers of recipients for the purpose of commercial advertising, for the purpose of non-commercial proselytizing, for any prohibited...
Spam emails are one of the crucial problems faced by most email users. There are many algorithms for separating spam mail from ham mail. In this paper, two efficient filters, Bayesian filters and Artificial Immunity filters, are...
E-mail is an efficient and reliable data exchange service. Spam messages are undesired emails sent indiscriminately in bulk, usually for commercial aims. Obfuscated image spamming is one of the new tricks to bypass text-based and Optical...
At present, spam is a real and growing problem that compromises email communications across the world. Thus, several solutions have been proposed to stop or reduce this threat. However, methods based on negative...
Hybrid spam is an undesirable e-mail (electronic mail) that contains both image and text parts. It is more harmful and complex than image-based or text-based spam e-mail. Thus, an efficient and intelligent approach is required...
The proliferation of spam emails, a predominant form of online harassment, has elevated the significance of email in daily life. As a consequence, a substantial portion of individuals remain vulnerable to fraudulent activities. Despite...
With the rapid increase in internet users, e-mail spam is also increasing and has become a major problem. Nowadays, emails fall into two subcategories: spam and ham. In addition to harming the system, malicious link senders via spam...
Recently, machine learning has been applied to different major areas such as text classification, machine translation, and spam detection. The strong performance of machine learning algorithms in several fields has provided humans with...
In contemporary email communication, the ever-expanding volume of digital correspondence has ushered in an era where big data plays a pivotal role in addressing the challenge of distinguishing between legitimate (ham) and unsolicited...
The continuous growth in email users has resulted in an increase in unsolicited emails, also known as spam. Currently, server-side and client-side anti-spam filters have been introduced to detect different features of spam emails....
Short Message Service (SMS) is one of the most frequently used services in the mobile phones, next to calls. In developing countries like India, SMS is the cheapest mode of communication. The advantage of this fact is exploited by the...
Financial fraud detection poses a critical challenge in the contemporary digital economy due to its potential to inflict substantial harm on individuals, businesses, and financial institutions. In this research, we introduce an innovative...
The internet has provided numerous modes for secure data transmission from one end station to another, and email is one of them. Its popularity stems from its cost-effectiveness and capacity for fast communication. In the...
In the modern era, mobile phones have become ubiquitous, and Short Message Service (SMS) has grown to become a multi-million-dollar service due to the widespread adoption of mobile devices and the millions of people who use SMS daily....
With the emergence of big data and the interest in deriving valuable insights from ever-growing and ever-changing streams of data, machine learning has appeared as an effective data analytic technique as compared to traditional...