Academia.edu

Email Spam Filter

105 papers
91 followers
About this topic
An email spam filter is a software application designed to identify and block unsolicited or unwanted email messages, commonly known as spam. It employs various algorithms and techniques, such as keyword analysis and machine learning, to assess the likelihood of an email being spam and to protect users' inboxes from irrelevant or harmful content.

Key research themes

1. How can machine learning algorithms and feature engineering optimally detect and classify email spam?

This research area focuses on leveraging diverse machine learning models, including traditional classifiers like Naïve Bayes, SVM, Random Forest, and ensemble techniques, to improve spam email detection accuracy. A strong emphasis is placed on feature extraction and data preprocessing methods such as TF-IDF vectorization, word embeddings, and keyword analysis to enhance the discriminatory power of models. This theme matters because accurate spam detection reduces resource wastage, protects user privacy, and mitigates financial and phishing risks associated with spam emails.

Key finding: This study applied four machine learning and two deep learning models on combined datasets including TREC07 and Enron to classify spam emails and identify recurrent spam keywords. It found that advanced feature engineering...
Key finding: Utilizing TF-IDF text representation combined with machine learning algorithms such as Support Vector Machines (SVM), Random Forest, and Naïve Bayes, this work demonstrated how numerical features derived from NLP techniques...
Key finding: Through an empirical comparison of eight supervised models on a pre-processed and balanced email dataset, the study found Random Forest to consistently outperform others with an accuracy of 96.6%. The evaluation incorporated...
Key finding: This paper systematically applies and compares machine learning algorithms such as Naïve Bayes, SVM, and ensemble methods alongside bio-inspired algorithms on multiple datasets with extensive preprocessing. It confirms that...
Key finding: The study investigated the application of multiple machine learning algorithms including Random Forest, Logistic Regression, Naïve Bayes, and SVM across multiple datasets, incorporating feature engineering, bagging, and...
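
To make the classical approach in this theme concrete, here is a minimal sketch of one of the classifiers named above, a multinomial Naïve Bayes filter over word counts with add-one smoothing. It is an illustration only and does not reproduce the method of any listed paper: the class name, the tokenizer, and the six training messages are hypothetical, and the studies summarized here use real corpora and richer features such as TF-IDF.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def log_score(self, text, label):
        # log P(label) + sum of log P(word | label) over known words
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text):
        return max(("spam", "ham"), key=lambda lbl: self.log_score(text, lbl))


# Hypothetical toy data, invented for this sketch (not from any dataset).
train_texts = [
    "win a free prize now",
    "free money click now",
    "cheap pills limited offer",
    "meeting agenda for monday",
    "please review the attached report",
    "lunch on friday with the team",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

spam_filter = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(spam_filter.predict("claim your free prize"))
print(spam_filter.predict("agenda for the team meeting"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing term keeps a single unseen word-label pair from driving a probability to zero.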

2. What advances do deep learning and hybrid attention mechanisms offer for detecting spam in email data?

This theme investigates the application of deep learning architectures—especially models integrating convolutional neural networks (CNN), gated recurrent units (GRU), and attention mechanisms—for email spam filtering. These methods focus on hierarchical feature extraction and contextual weighting of informative text segments, aiming to overcome limitations of classical techniques and improve generalization across datasets. The novelty lies in capturing complex semantic structures and temporal dependencies in email content, a crucial advance given the linguistic complexity of spam emails.

Key finding: This research proposed a hybrid model combining CNN, GRU, and hierarchical attention mechanisms, which selectively focused on relevant email text parts during training. The temporal convolution layers enabled flexible...
Key finding: The study applied machine learning techniques to identify recurrent word groups characteristic of spam and introduced a feedback-trained model with tokenizers and Naïve Bayes classifiers to distinguish between spam and ham...
Key finding: This research demonstrated the application of neural networks to email spam filtering, showing their capability to learn complex patterns and outperform standard classifiers in accuracy metrics, specifically for phishing and...
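
The "contextual weighting of informative text segments" described in this theme can be illustrated with a small attention-pooling sketch. This shows only the generic attention step (dot-product scores, softmax weights, weighted sum), not the CNN/GRU hybrid from the listed work. The token vectors and the context vector below are made-up numbers; in a trained model they would come from the network's hidden states and learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(token_vectors, context):
    """Weight each token vector by its similarity to a context vector.

    Returns (weights, pooled), where pooled is the weighted sum of vectors.
    """
    scores = [sum(h * u for h, u in zip(vec, context)) for vec in token_vectors]
    weights = softmax(scores)
    dim = len(context)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, token_vectors))
        for d in range(dim)
    ]
    return weights, pooled


# Hypothetical 2-dimensional token representations (illustrative values only).
tokens = ["hello", "claim", "your", "prize"]
token_vectors = [[0.1, 0.0], [0.9, 0.8], [0.0, 0.1], [1.0, 0.9]]
context = [2.0, 2.0]  # stands in for a learned context vector

weights, pooled = attention_pool(token_vectors, context)
for tok, w in zip(tokens, weights):
    print(f"{tok}: {w:.3f}")
```

With these values, the tokens whose vectors align most with the context vector ("claim" and "prize") receive most of the weight, which is the sense in which attention focuses on the informative parts of a message.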

3. How effective are email service providers' pre-acceptance spam filtering techniques and what are the limitations?

This research area delves into the strategies employed by major email providers like Gmail, Yahoo, and Outlook at the SMTP pre-acceptance stage to filter spam, including blacklists, whitelists, and sender reputation analysis. It quantifies the proportion of spam and legitimate emails filtered before message acceptance and analyses the challenges posed by sophisticated spam gangs and end-host spammers. Understanding these filtering boundaries is vital for optimizing server resources and enhancing spam mitigation strategies.

Key finding: Through a large-scale empirical study using millions of emails collected at UW-Madison, the authors found that pre-acceptance filtering methods, such as blacklists and whitelists constructed from sender-tracking heuristics,...
Key finding: This review synthesizes state-of-the-art machine learning implementations in major email providers’ spam filters, highlighting Google's advanced neural network-based filtering achieving ~99.9% accuracy. It details innovative...
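
As a rough illustration of the pre-acceptance stage discussed in this theme, the sketch below decides whether to accept a connection based only on the sender's IP address, before any message content is seen. It is a simplified assumption of how whitelists, blacklists, and a reputation score might be combined, not a description of any provider's actual system. The function name, the 0.5 threshold, and the reputation table are hypothetical, and the addresses come from the reserved documentation ranges.

```python
import ipaddress


def pre_acceptance_decision(sender_ip, whitelist, blacklist, reputation,
                            min_reputation=0.5):
    """Return "accept" or "reject" for a connecting sender.

    Order of checks (an assumption of this sketch): whitelist first,
    then blacklist, then a numeric reputation score in [0, 1].
    """
    ip = ipaddress.ip_address(sender_ip)
    if any(ip in net for net in whitelist):
        return "accept"
    if any(ip in net for net in blacklist):
        return "reject"
    # Unknown senders get a neutral score and proceed to later filtering.
    score = reputation.get(sender_ip, min_reputation)
    return "accept" if score >= min_reputation else "reject"


# Hypothetical lists built from documentation-only address ranges.
whitelist = [ipaddress.ip_network("192.0.2.0/24")]
blacklist = [ipaddress.ip_network("198.51.100.0/24")]
reputation = {"203.0.113.7": 0.1, "203.0.113.8": 0.9}

for ip in ["192.0.2.10", "198.51.100.3", "203.0.113.7", "203.0.113.8"]:
    print(ip, pre_acceptance_decision(ip, whitelist, blacklist, reputation))
```

In this sketch "accept" only means the message is received and passed on; content-based filters such as those in the earlier themes would still run afterwards.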

All papers in Email Spam Filter

The upsurge in the volume of unwanted emails, called spam, has created an intense need for the development of more dependable and robust antispam filters. Machine learning methods have recently been used to successfully detect and filter...
E-mail spam has remained a scourge and menacing nuisance for users, internet and network service operators and providers, in spite of the anti-spam techniques available; and spammers are relentlessly circumventing these anti-spam...
Spam email is an unsolicited form of digital communication on the internet, which may be sent to an individual, a group of individuals, or a company. These spam emails may pose a serious threat to the user, i.e., the email...
Email has continued to be an integral part of our lives and a means of successful communication on the internet. The problem of spam mails occupying a huge amount of space and bandwidth, and the weaknesses of spam filtering techniques...
Email (electronic mail) is one of today's technological developments. With email, messages can be sent quickly and delivered to many recipients in a short time. However...
Here we present an inclusive review of recent and successful content-based e-mail spam filtering techniques. Our focus is primarily on machine learning-based spam filters and variants that are inspired by them. We report on related ideas,...
Spam emails regularly cause huge losses to businesses. Spam filtering is an automated technique to identify spam and ham (non-spam). Web spam filters can be categorized as: Content based spam filters...
The volume of SMS messages sent on a daily basis globally has continued to grow significantly over the past years. Hence, mobile phones are becoming increasingly vulnerable to SMS spam messages, thereby exposing users to the risk of fraud...
Email spam is used for advertising and selling, phishing attacks, DDoS attacks, and much more. Many solutions of various kinds, such as blacklisting, whitelisting, grey-listing, and content filtering, have been proposed at the sender and...
This paper reports on email classification and filtering, more specifically on spam versus ham and phishing versus spam classification, based on content features. We test the validity of several novel statistical feature extraction...
In this age of electronic money transactions, the opportunities for electronic crime have expanded at the same rate as the ever-expanding rise of online services. With the world becoming a global village, crime over the internet transcends no...
Ethereum is an open-source, public, blockchain-based distributed computing platform and operating system featuring smart contract functionality. In this paper, we proposed an Ethereum based electronic voting (e-voting) protocol,...
The aim of this work is to build an anti-spam filter, a software tool that automatically recognizes incoming unwanted messages. It is thus a machine learning system that, based on learned data about what is a good email and what is a bad one, can...
Hidden salting in digital media involves the intentional addition or distortion of content patterns with the purpose of content filtering. We propose a method to detect portions of a digital text source which are invisible to the end...
Email spam filtering is performed based on a sender reputation and message features. When an email message is received, a preliminary spam determination is made based, at least in part, on a combination of a reputation associated with the...
This paper presents a document classifier based on text content features and its application to email classification. We test the validity of a classifier which uses Principal Component Analysis Document Reconstruction (PCADR), where the...
Salting is the intentional addition or distortion of content, aimed to evade automatic filtering. Salting is usually found in spam emails. Salting can also be hidden in phishing emails, which aim to steal personal information from users....
Negative selection algorithms (NSAs) are inspired by the artificial immune system. They provide techniques aimed at developing immune-based models. This is done by distinguishing self from non-self spam in the generation of detectors. In...
This paper reports on email filtering based on content features. We test the validity of a novel statistical feature extraction method, which relies on dimensionality reduction to retain the most informative and discriminative features...