Academia.edu

Email Classification

45 papers
About this topic
Email classification is the process of categorizing email messages into predefined groups based on their content, context, or metadata. This technique utilizes algorithms and machine learning methods to enhance email management, improve user experience, and facilitate the identification of spam, important messages, or specific topics.

Key research themes

1. How can semantic and structural attributes improve context-based email classification?

This research area focuses on leveraging the rich semantic and structural characteristics of emails to enhance classification accuracy. By representing emails not merely as text but as structured entities (e.g., graphs capturing semantic roles and event types), classifiers can better differentiate among nuanced classes like social, personal, and professional emails. This approach moves beyond traditional bag-of-words or keyword models to embrace the contextual and layout features inherent in emails, which is crucial for applications such as event management and prioritization.
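The following minimal sketch shows the kind of structural and metadata attributes this theme refers to. It uses only Python's standard-library `email` package to extract a few layout and header features from a raw message. It does not implement the graph-based representation described in the findings below. The feature names (`num_recipients`, `has_html_part`, and so on) and the `SAMPLE` message are illustrative assumptions, not a published feature set.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical sample message, used for illustration only.
SAMPLE = (
    "From: alice@example.com\n"
    "To: bob@example.com, carol@example.com\n"
    "Cc: dave@example.com\n"
    "Subject: Re: team offsite\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "See the agenda attached.\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>See the agenda attached.</p>\n"
    "--XYZ\n"
    "Content-Type: application/pdf\n"
    'Content-Disposition: attachment; filename="agenda.pdf"\n'
    "\n"
    "JVBERi0=\n"
    "--XYZ--\n"
)


def structural_features(raw: str) -> dict:
    """Return illustrative structural/metadata features for one raw message."""
    msg = message_from_string(raw)
    parts = list(msg.walk())  # the message itself, then every MIME subpart
    # Parse To/Cc into (name, address) pairs instead of naively splitting on commas.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    subject = msg.get("Subject", "")
    return {
        "num_recipients": len(recipients),
        "is_reply": subject.lower().startswith("re:"),
        "is_multipart": msg.is_multipart(),
        "num_attachments": sum(p.get_content_disposition() == "attachment" for p in parts),
        "has_html_part": any(p.get_content_type() == "text/html" for p in parts),
        "subject_length": len(subject),
    }
```

A dictionary like this could be concatenated with text features before a classifier is trained. Which attributes actually help depends on the dataset.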

Key finding: This paper introduces a novel graph-based representation of emails capturing semantic and structural attributes, which serves as a foundation for template mining algorithms that identify frequent event patterns. By...
Key finding: This comprehensive review identifies the importance of exploiting diverse feature sets beyond simple textual content, including semantic features, structural patterns, and metadata in email classification. It highlights that...
Key finding: By integrating feature selection via Particle Swarm Optimization with Support Vector Machines, this study optimizes classification accuracy and reduces computational overhead on large email datasets. The work underscores how...

2. What machine learning models and feature engineering techniques yield high performance in spam email detection?

The surge in spam emails necessitates robust, efficient spam detection systems. This research theme investigates various supervised learning algorithms—such as Naive Bayes, Support Vector Machines (SVM), Random Forests, and ensemble methods like boosting—and feature extraction strategies like TF-IDF, bag-of-words, and word embeddings. It explores how these algorithms perform on benchmark datasets (e.g., Enron, Spambase, Ling-Spam) in terms of precision, recall, and accuracy, with considerations for computational efficiency and adaptability to evolving spam tactics.
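TF-IDF, one of the feature extraction strategies named above, weights each term by how often it occurs in a message and by how rare it is across the collection. The sketch below assumes the plain textbook form weight(t, d) = tf(t, d) × ln(N / df(t)), where tf is the term's relative frequency in message d, N is the number of messages, and df(t) is the number of messages containing t. Libraries such as scikit-learn apply smoothed and normalized variants by default, so their numbers will differ. The tokenizer regex and the `DOCS` corpus are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus.
DOCS = ["free prize now", "meeting at noon now", "claim your free prize now"]


def tokenize(text: str) -> list[str]:
    """Lowercase bag-of-words tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors
```

A term that occurs in every message, such as "now" in the toy corpus, gets weight zero. This is why TF-IDF tends to down-weight words that carry no discriminating signal.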

Key finding: Through experimental comparisons across multiple classifiers, this study found Rotation Forest to achieve the highest accuracy (94.2%) in spam detection without feature selection or boosting, highlighting the potential of...
Key finding: This work compares Naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM) classifiers on the Spambase dataset, demonstrating that Naive Bayes outperforms the others in terms of accuracy and evaluation...
Key finding: The study proposes hybrid bagging and boosting ensembles combining multinomial decision trees, Naive Bayes, KNN, Random Forest, and SVM to boost spam detection accuracy beyond standalone classifiers. Testing on Ling-Spam and...
Key finding: This research goes beyond bag-of-words by representing emails with combined paragraph vector (PV-DM) and TF-IDF features. Empirical results on the Enron and Ling-Spam datasets show that this representation, when coupled...
Key finding: This paper rigorously evaluates Naive Bayes, SVM, Random Forest, and XGBoost classifiers using Bag of Words and TF-IDF feature representations on combined spam datasets. The linear SVM with TF-IDF features yields the best...
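Several of the findings above compare Naive Bayes against other classifiers. The following dependency-free sketch shows how a multinomial Naive Bayes spam filter scores a message: it combines a log prior with Laplace-smoothed log likelihoods over bag-of-words counts. The tokenizer, the `alpha` default, and the toy training set are assumptions for illustration. This is not the implementation used in any of the cited studies, and it will not reproduce their reported accuracies.

```python
import math
import re
from collections import Counter

# Hypothetical toy training set.
TRAIN_TEXTS = [
    "win free prize now", "free money claim now", "claim your free prize",
    "meeting agenda for monday", "lunch at noon on monday", "project report attached",
]
TRAIN_LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]


def tokenize(text: str) -> list[str]:
    """Lowercase bag-of-words tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts: list[str], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary: every token seen in any class.
        self.vocab = set().union(*self.word_counts.values())
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word: str, c: str) -> float:
        # Smoothed estimate of log P(word | class).
        numerator = self.word_counts[c][word] + self.alpha
        denominator = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, text: str) -> str:
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many per-word probabilities are multiplied. Words never seen during training are skipped rather than assigned a probability.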

3. How do specialized language and regional characteristics influence email classification techniques?

This research question addresses the challenges and methodologies involved in classifying emails written in specific languages, particularly Arabic, which has unique morphological and syntactic traits compared to widely studied languages like English. The focus is on adapting deep learning and natural language processing approaches to handle limited training data, complex morphology, and language-specific lexicons to classify business emails effectively. Understanding these tailored models is essential for enabling accurate automatic email classification and filtration in regional and resource-constrained language contexts.
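The morphological challenges mentioned here are usually first addressed during preprocessing. A common heuristic in Arabic text pipelines is orthographic normalization: stripping diacritics and the tatweel (elongation) character, then folding letter variants together. The sketch below applies that heuristic. The specific mapping (hamza-bearing alef forms to bare alef, alef maqsura to yeh, teh marbuta to heh) is an assumption and not necessarily what the cited Arabic study did. The sketch also performs no stemming or morphological analysis.

```python
import re

# Arabic diacritics U+064B..U+0652: tanwin, short vowels, shadda, sukun.
_DIACRITICS = re.compile("[\u064B-\u0652]")

# Single-character folds; mapping to None deletes the character.
_FOLD = str.maketrans({
    "\u0640": None,      # tatweel (elongation)
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0622": "\u0627",  # alef with madda -> alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize_arabic(text: str) -> str:
    """Strip diacritics and tatweel, then fold common letter variants."""
    return _DIACRITICS.sub("", text).translate(_FOLD)
```

Folding variants reduces vocabulary sparsity, which matters when training data are limited. The cost is that some words differing only in these letters are merged.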

Key finding: This study develops deep learning models leveraging natural language processing and domain-specific lexicons to classify a large-scale Arabic business email dataset into urgency, sentiment, and topic categories. It addresses...
Key finding: The paper applies Random Forest classifiers for multi-label categorization of emails into user-defined semantic groups and label recommendation, demonstrating higher average recall (64%) compared to Naive Bayes (63%)....
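The finding above reports average recall for multi-label categorization, but the summary does not state which averaging scheme was used. The sketch below assumes a macro average over labels. For each label, recall is TP / (TP + FN), computed from per-email sets of true and predicted labels. The label names and sample data are hypothetical.

```python
# Hypothetical multi-label ground truth and predictions, one set per email.
Y_TRUE = [{"work", "urgent"}, {"personal"}, {"work"}, {"personal", "travel"}]
Y_PRED = [{"work"}, {"personal"}, {"urgent"}, {"travel"}]


def per_label_recall(y_true: list[set[str]], y_pred: list[set[str]]) -> dict[str, float]:
    """Recall = TP / (TP + FN) for every label that occurs in the ground truth."""
    recall = {}
    for label in sorted(set().union(*y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        recall[label] = tp / (tp + fn)  # tp + fn >= 1 because the label occurs in y_true
    return recall


def macro_recall(y_true: list[set[str]], y_pred: list[set[str]]) -> float:
    """Unweighted mean of per-label recall (macro averaging)."""
    r = per_label_recall(y_true, y_pred)
    return sum(r.values()) / len(r) if r else 0.0
```

Macro averaging gives rare labels the same weight as common ones. A micro average, which pools TP and FN across labels, would generally produce a different number from the same predictions.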

All papers in Email Classification

When email became more widely used, in the late 1990s, I heard the term “snail mail” to refer to mail by post. Whereas anyone online could almost instantly send a little letter to someone else online, or a big one, if one uses traditional...
Electronic mail (email) has established a significant place in information users’ lives. Email is a major mode of information sharing because it is a fast and effective way to communicate. Email plays its...
In the real world, many online shopping websites and service providers have a single email ID to which customers send their queries, concerns, etc. At the back end, a service provider receives millions of emails every week; how can they identify...
Trees have been a crucial component of human life for hundreds of years, providing food, shelter, and medicine. Some trees have medicinal properties that cure many diseases. In the old days, Ayurvedic methods were popular for...
Modern email clients support predicate-based folder assignment rules that can automatically organize emails. Unfortunately, users still need to write these rules manually. Prior machine learning approaches have framed automatically...
Personal and business users prefer email as one of their crucial means of communication. The usage and importance of email continue to grow despite the prevalence of alternative means, such as electronic messages, mobile...
The liver is one of the most significant organs in the human body. We can predict liver disease in a patient at an early stage based on previously predicted values, using data from patients with abnormal liver function, which helps the...
Learning outcomes are measurable statements that articulate educational aims in terms of what knowledge, skills, and other competences students possess after successfully completing a given learning experience. This paper presents an...
The continuous growth of email users has resulted in an increase in unsolicited emails, also known as spam. Currently, server-side and client-side anti-spam filters are used to detect different features of spam emails....
As an efficient, cost-effective, real-time mode of communication, email improves productivity among professionals in an organization. It constitutes almost 90% of daily office procedures in organizations; hence the...
Psychometric tests and personality assessments are widely used in a variety of settings, from academic research to employment screening. Traditional methods of administering and scoring these tests can be time-consuming and...
Stress is a prevalent issue that affects individuals' mental and physical well-being, leading to various health problems. The use of machine learning (ML) has been gaining popularity as a tool for stress detection. ML techniques have...
Email has become the most economical and fastest form of communication. However, during the past few years, the growth in email users has dramatically increased spam emails. Various anti-spam techniques have been developed to minimize...
Objective: To report a review of various machine learning and hybrid algorithms for detecting SMS spam messages and comparing them according to accuracy criterion. Data sources: Original articles written in English found in...
In this work we report on a pilot study where we used machine learning to predict whether students will correctly solve the classic "ballistic pendulum" problem based on an essay written by students elucidating their approach to solving...
Masked-face detection is becoming an essential part of health care safety due to the coronavirus pandemic and surveillance systems. One of the most challenging problems in face recognition systems is the accurate...
Big Data concerns large-volume, growing data sets that are complex and have multiple autonomous sources. Earlier technologies were not able to handle the storage and processing of such huge data; thus the Big Data concept came into existence. This is a...
Phishing is the act of attempting to steal a user's financial and personal information, such as credit card numbers and passwords, by pretending to be a trustworthy participant during online communication. Attackers may direct the users...
Leukemia is a type of blood cancer that affects the lymphatic system, the bone marrow, and white blood cells. Leukemia, in contrast to other types of cancer, does not produce solid tumors; instead, it produces a huge...
The Internet of Things (IoT) involves a set of devices that aid in achieving a smart environment. IoT-oriented healthcare systems provide monitoring services for patients' data and help take immediate steps in an emergency....
Short message service (SMS) spam refers to unsolicited or undesired messages received on mobile phones. These spam messages are a veritable nuisance to mobile subscribers. This marketing practice also worries...
As the most preferred communication method, email has become part of day-to-day life. Spam, also called unwanted, junk, or unsolicited mail, is one of the major problems in using email. There are basically two things that...
India, as we know, is a densely populated country, and every year more than 6 crore Indians graduate from diverse backgrounds and with diverse education. Almost the same number of students enter colleges to take various...
Recent interest in the design of information retrieval systems that can balance an ability to find relevant content with an ability to protect sensitive content creates a need for test collections that are annotated for both relevance and...
Most public spaces are being used to develop public buildings such as schools, hospitals, multipurpose halls, temples, community buildings, etc. The lack of lively, quality public places is faced by all in...
Sixth-generation (6G) networks impose stricter criteria on the online learning capability and high interpretability of trained algorithms. Machine learning is anticipated to be crucial for making networks effective and flexible; however...
Binary classification is challenging when dealing with imbalanced data sets where the important class is made of extremely rare events, usually with a prevalence of around 0.1%. Such data sets are common in various real-world problems....
Spam and phishing emails are not only annoying to users, but are a real threat to internet communication and web economy. The fight against unwanted emails has become a cat-and-mouse game between criminals and people trying to develop...
Email has become one of the major applications in daily life. The continuous growth in the number of email users has led to a massive increase in unsolicited emails, which are also known as spam emails. Managing and classifying this...
Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is...
Email spam has become a prevalent issue in recent times: with the growing number of internet users, spam emails are also on the rise. Many individuals use them for illegal and unethical activities such as phishing and fraud. Spammers send...
In the virtual world, many internet applications are used by masses of people for several purposes. Internet applications have become basic needs in the modern lifestyle and are also shaping people's habits. Like social...
Human beings have various capabilities that help them acquire different types of knowledge from their environment or surroundings. Among these, identifying a person's gender from their voice is an easy task for humans because humans, as a growing...
Email categorization is a critical function in any email client since it allows you to manage and arrange your emails into semantic groupings. Following the success of statistical artificial intelligence and machine learning in many areas...
...communication with various devices located throughout different water treatment plants, as well as preventive and data analysis strategies to help decision making.
The aim of malware analysis is to detect whether a file is infected or not in order to avoid any kind of system intrusion. The goal of this research is to find the optimal machine learning algorithm to predict whether a file is malicious...
The vehicular ad hoc network (VANET), a developing research area in intelligent transportation systems, provides the network's vehicles with crucial information. Road accidents harm around 160,000 people; thus, they must be reduced, and...
In renewable energy schemes, solar photovoltaic (PV) systems provide an effective means of generating electrical energy. Many current control techniques, such as hysteresis control, predictive control, and sliding mode control, are...
In many nations, the prevalence of diabetes is rising, and its impact on national health cannot be overlooked. Smart medicine is a medical concept in which technology is used to aid in disease detection and treatment. The objective of...
...Sp-DELES scale; and the second to measure academic satisfaction using the SA scale. The results demonstrated a perfect positive relationship with a = .935, determining a relationship between the psychosocial learning environment of...
Email is one of the most popular modes of communication we have today. Billions of emails are sent every day in our world but not every one of them is relevant or of importance. The irrelevant and unwanted emails are termed email spam....
Language classification is the process of identifying the disposition of a presented text, such as classifying an email or a text document into a particular category. Classifying text can involve determining...
As the web expands day by day and people generally rely on it for communication, emails have become the fastest way to send information from one place to another. Nowadays, all transactions and all communication, whether general or of...
Personal and business users prefer to use e-mail as one of the crucial sources of communication. The usage and importance of e-mails continuously grow despite the prevalence of alternative means, such as electronic messages, mobile...