Email Classification

45 papers

42 followers

About this topic

Email classification is the process of categorizing email messages into predefined groups based on their content, context, or metadata. This technique utilizes algorithms and machine learning methods to enhance email management, improve user experience, and facilitate the identification of spam, important messages, or specific topics.

Key research themes

1. How can semantic and structural attributes improve context-based email classification?

This research area focuses on leveraging the rich semantic and structural characteristics of emails to enhance classification accuracy. By representing emails not merely as text but as structured entities (e.g., graphs capturing semantic roles and event types), classifiers can better differentiate among nuanced classes like social, personal, and professional emails. This approach moves beyond traditional bag-of-words or keyword models to embrace the contextual and layout features inherent in emails, which is crucial for applications such as event management and prioritization.
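
As a rough illustration of treating an email as a structured entity rather than a bag of words, the sketch below encodes each email as a small set of labeled edges (sender role, event type, audience, scheduling) and keeps the edges that recur across several emails. This is a heavily simplified stand-in for template mining of frequent event patterns. The attribute names (`sender_role`, `event_type`, `audience`, `has_date`) and the edge scheme are hypothetical assumptions for illustration, not the representation used in any cited paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailGraph:
    """Hypothetical structured view of one email as (source, relation, target) edges."""
    edges: frozenset


def to_graph(sender_role, event_type, audience, has_date):
    # Attributes become typed nodes linked through the event node.
    edges = {
        (f"role:{sender_role}", "announces", f"event:{event_type}"),
        (f"event:{event_type}", "targets", f"audience:{audience}"),
    }
    if has_date:
        edges.add((f"event:{event_type}", "scheduled_on", "slot:date"))
    return EmailGraph(frozenset(edges))


def frequent_edges(graphs, min_support):
    """Count each edge once per email; keep edges seen in >= min_support emails."""
    counts = Counter(edge for g in graphs for edge in g.edges)
    return {edge: n for edge, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    inbox = [
        to_graph("manager", "meeting", "team", has_date=True),
        to_graph("manager", "meeting", "team", has_date=True),
        to_graph("friend", "party", "personal", has_date=False),
    ]
    for edge, n in sorted(frequent_edges(inbox, min_support=2).items()):
        print(n, edge)
```

Real systems would mine larger recurring subgraphs, not single edges; counting edges keeps the idea visible in a few lines.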

Context-based email classification model

by Dr. Shaukat Wasi

2016

Key finding: This paper introduces a novel graph-based representation of emails capturing semantic and structural attributes, which serves as a foundation for template mining algorithms that identify frequent event patterns. By...

Email Classification Research Trends: Review and Open Issues

by Liyana Shuib

2022, IEEE Access

Key finding: This comprehensive review identifies the importance of exploiting diverse feature sets beyond simple textual content, including semantic features, structural patterns, and metadata in email classification. It highlights that...

An Optimized Feature Selection Technique For Email Classification

by OLANIYAN AYODELE

2025

Key finding: By integrating feature selection via Particle Swarm Optimization with Support Vector Machines, this study optimizes classification accuracy and reduces computational overhead on large email datasets. The work underscores how...
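
Particle Swarm Optimization for feature selection is usually run in its binary form: each particle is a 0/1 mask over the features, and a sigmoid of the particle's velocity gives the probability of selecting each feature. The sketch below is a minimal, generic binary PSO under that assumption. In the cited setting the fitness would be something like an SVM's validation accuracy on the selected features; here `toy_fitness` is a hypothetical stand-in so the code runs without data or libraries, and the hyperparameters (`w`, `c1`, `c2`, `v_max`) are common illustrative defaults, not values from the study.

```python
import math
import random


def binary_pso(n_features, fitness, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO (sigmoid transfer) over 0/1 feature masks.
    Returns the best mask found and its fitness (higher is better)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best mask so far
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # swarm-wide best mask

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp the velocity
                # Sigmoid of the velocity = probability that feature d is selected.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in for "classifier accuracy on the selected features":
# features 0 and 2 are useful, and each selected feature costs 0.1.
def toy_fitness(mask):
    return mask[0] + mask[2] - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(n_features=6, fitness=toy_fitness, n_iters=100)
print(best_mask, best_fit)
```

The per-feature penalty is one simple way to reward smaller subsets, which is where the reduction in computational overhead comes from.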

2. What machine learning models and feature engineering techniques yield high performance in spam email detection?

The surge in spam emails necessitates robust, efficient spam detection systems. This research theme investigates various supervised learning algorithms—such as Naive Bayes, Support Vector Machines (SVM), Random Forests, and ensemble methods like boosting—and feature extraction strategies like TF-IDF, bag-of-words, and word embeddings. It explores how these algorithms perform on benchmark datasets (e.g., Enron, Spambase, Ling-Spam) in terms of precision, recall, and accuracy, with considerations for computational efficiency and adaptability to evolving spam tactics.
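
To make the Naive Bayes baseline concrete, here is a minimal multinomial Naive Bayes text classifier with add-alpha (Laplace) smoothing, written from scratch. It is a sketch under simplifying assumptions: lowercase whitespace tokenization, raw word counts rather than TF-IDF weights, and a tiny hypothetical training set instead of Enron, Spambase, or Ling-Spam.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial Naive Bayes over whitespace-tokenized text."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha (Laplace) smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def log_posterior(self, doc, label):
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)  # log prior
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for w in doc.lower().split():
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


train = ["win free money now", "free prize claim now",
         "meeting agenda for monday", "lunch with the team monday"]
labels = ["spam", "spam", "ham", "ham"]
clf = MultinomialNB().fit(train, labels)
print(clf.predict("claim your free money"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied.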

Comparative Analysis of Classification Algorithms for Email Spam Detection

by Shafi'i Muhammad ABDULHAMID

2017, I. J. Computer Network and Information Security

Key finding: Through experimental comparisons across multiple classifiers, this study found Rotation Forest to achieve the highest accuracy (94.2%) in spam detection without feature selection or boosting, highlighting the potential of...

Comparison of Three Machine Learning Models for the Detection of Emails Spam

by Raed Alkaied

2024, Research Square (Research Square)

Key finding: This work compares Naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM) classifiers on the Spambase dataset, demonstrating that Naive Bayes outperforms the others in terms of accuracy and evaluation...
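
Comparisons like these are usually reported with accuracy, precision, recall, and F1 for the spam class. The sketch below computes all four from predicted and true labels; the label strings and the convention of returning 0.0 when a denominator is zero are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F1 for one positive class.
    A ratio whose denominator is zero is reported as 0.0 (a common convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
print(binary_metrics(y_true, y_pred))
```

In this example there are 2 true positives, 1 false positive, 1 false negative, and 1 true negative, so accuracy is 0.6 and precision, recall, and F1 are each 2/3.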

SPAM EMAIL DETECTION USING MACHINE LEARNING INTEGRATED IN CLOUD

by Joyece Jane

2023

Key finding: The study proposes hybrid bagging and boosting ensembles combining multinomial decision trees, Naive Bayes, KNN, Random Forest, and SVM to boost spam detection accuracy beyond standalone classifiers. Testing on Ling-Spam and...
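
The bagging half of such an ensemble can be sketched in a few lines: train several copies of a base learner on bootstrap resamples of the training set and combine them by majority vote. The sketch below assumes a decision stump as the base learner and two hypothetical numeric features per email (count of the word "free", number of links) with labels 1 = spam and 0 = ham. It illustrates plain bagging only, not the hybrid bagging and boosting design of the cited study.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Decision stump for 0/1 labels: the single (feature, threshold, flip)
    rule with the best training accuracy."""
    best_acc, best = -1.0, None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for flip in (False, True):
                preds = [int((row[j] >= thr) != flip) for row in X]
                acc = sum(p == t for p, t in zip(preds, y)) / len(y)
                if acc > best_acc:
                    best_acc, best = acc, (j, thr, flip)
    j, thr, flip = best
    return lambda row: int((row[j] >= thr) != flip)


def bagging(X, y, fit=fit_stump, n_estimators=25, seed=0):
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample with replacement
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(m(row) for m in models)
        return votes.most_common(1)[0][0]  # majority vote across the ensemble

    return predict


# Hypothetical features: [count of "free", number of links]; 1 = spam, 0 = ham.
X = [[3, 5], [2, 4], [4, 6], [0, 1], [1, 0], [0, 0]]
y = [1, 1, 1, 0, 0, 0]
predict = bagging(X, y)
print(predict([5, 7]), predict([0, 0]))
```

Boosting differs in that learners are trained sequentially, each reweighting the examples its predecessors got wrong, rather than independently on resamples.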

Hybrid Email Spam Detection Model Using Artificial Intelligence

by Bouabid EL OUAHIDI

2025, International Journal of Machine Learning and Computing

Key finding: This research goes beyond bag-of-words by representing emails with combined paragraph vector (PV-DM) and TF-IDF features. Empirical results on the Enron and Ling-Spam datasets show that this representation, when coupled...

Performance Evaluation of Machine Learning Algorithms on Textual Datasets for Spam Email Classification

by IJRASET Publication

2022, International Journal for Research in Applied Science and Engineering Technology (IJRASET)

Key finding: This paper rigorously evaluates Naive Bayes, SVM, Random Forest, and XGBoost classifiers using Bag of Words and TF-IDF feature representations on combined spam datasets. The linear SVM with TF-IDF features yields the best...
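
For reference, a minimal TF-IDF weighting is sketched below, using term frequency = count / document length and idf = log(N / df). This is one of several common variants (libraries such as scikit-learn's `TfidfVectorizer` default to a smoothed idf and L2-normalized rows), and the toy documents are hypothetical. A linear SVM would then be trained on vectors like these.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, with
    tf = count / document length and idf = log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


docs = ["free money now", "meeting now", "free lunch now"]
for vec in tfidf(docs):
    print(vec)
```

A term that appears in every document (here "now") gets an idf of log(1) = 0, so it carries no weight, while a term unique to one document ("money") gets the largest weight.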

3. How do specialized language and regional characteristics influence email classification techniques?

This research question addresses the challenges and methodologies involved in classifying emails written in specific languages, particularly Arabic, which has unique morphological and syntactic traits compared to widely studied languages like English. The focus is on adapting deep learning and natural language processing approaches to handle limited training data, complex morphology, and language-specific lexicons to classify business emails effectively. Understanding these tailored models is essential for enabling accurate automatic email classification and filtration in regional and resource-constrained language contexts.
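
One concrete example of language-specific preprocessing is light orthographic normalization of Arabic text before tokenization, which reduces the sparsity caused by optional diacritics and spelling variants. The sketch below strips diacritics (U+064B to U+0652) and the tatweel elongation mark (U+0640), and unifies alef, alef maqsura, and teh marbuta variants. These mappings are one common convention, assumed here for illustration; they are not taken from the cited study, and pipelines differ in which of them they apply.

```python
import re

# Arabic diacritics (tashkeel, U+064B..U+0652) plus the tatweel mark (U+0640).
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Common orthographic unifications (one convention among several).
_CHAR_MAP = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0649": "\u064A",  # alef maqsura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize_arabic(text: str) -> str:
    """Remove diacritics and tatweel, then unify common letter variants."""
    return _DIACRITICS.sub("", text).translate(_CHAR_MAP)


print(normalize_arabic("\u0623\u064e\u062d\u0652\u0645\u064e\u062f"))
```

Normalization of this kind is lossy by design: it trades some distinctions in meaning for fewer distinct surface forms, which matters most when training data is limited.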

A novel approach for Arabic business email classification based on deep learning machines

by Muhannad Al-jabi

2024, PeerJ Computer Science

Key finding: This study develops deep learning models leveraging natural language processing and domain-specific lexicons to classify a large-scale Arabic business email dataset into urgency, sentiment, and topic categories. It addresses...

Multi-Label Email Classification Using Random Forest Classifier

by International scientific research and publications and

2022

Key finding: The paper applies Random Forest classifiers for multi-label categorization of emails into user-defined semantic groups and label recommendation, demonstrating higher average recall (64%) compared to Naive Bayes (63%)...
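
"Average recall" in multi-label classification can be defined in more than one way. The sketch below shows one common definition, macro-averaged per-label recall: for each label, the fraction of emails that truly carry it and were predicted to carry it, averaged over labels. Whether the cited paper uses this definition is not stated in the summary, so treat it as an assumption; the label names are hypothetical.

```python
def macro_recall(true_sets, pred_sets):
    """Macro-averaged recall over labels for multi-label predictions.
    true_sets[i] and pred_sets[i] are the label sets of email i."""
    labels = set().union(*true_sets)
    if not labels:
        return 0.0
    recalls = []
    for label in sorted(labels):
        relevant = [i for i, t in enumerate(true_sets) if label in t]
        hits = sum(label in pred_sets[i] for i in relevant)
        recalls.append(hits / len(relevant))
    return sum(recalls) / len(recalls)


true_sets = [{"work", "urgent"}, {"personal"}, {"work"}]
pred_sets = [{"work"}, {"personal", "work"}, set()]
print(macro_recall(true_sets, pred_sets))
```

Here "work" has recall 1/2, "urgent" 0, and "personal" 1, giving a macro average of 0.5. Note that the extra "work" prediction on the second email hurts precision, not recall.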

All papers in Email Classification

Postmasters didn't panic? When email is the same as snail mail

by Terence Rajivan Edward

2025

When email became more widely used, in the late 1990s, I heard the term “snail mail” to refer to mail by post. Whereas anyone online could almost instantly send a little letter to someone else online, or a big one, if one uses traditional...

Content Based E Mail Classification

by Ms.sonal chakole

2025, International journal of scientific research in science, engineering and technology

Electronic mail (e-mail) has established a significant place in information users' lives. Emails are a major and important mode of information sharing because they are a fast and effective way of communication. Email plays its...

Email Classification into Relevant Category Using Neural Networks

by Shruti Goyal

2024

In the real world, many online shopping websites and service providers have a single email ID to which customers can send their queries, concerns, etc. At the back end, the service provider receives millions of emails every week; how can they identify...

Machine Learning Techniques for Medicinal Leaf Prediction and Disease Identification

by International Journal of Experimental Research and Review ISSN 2455-4855 (Online) and

2024, International Journal of Experimental Research and Review

Trees have been a crucial component of human life for hundreds of years, providing food, shelter, and medicine. Some trees have medicinal properties that cure many diseases. In the old days, Ayurvedic methods were popular for...

EmFore: Online Learning of Email Folder Classification Rules

by Mukul Singh

2024, Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

Modern email clients support predicate-based folder assignment rules that can automatically organize emails. Unfortunately, users still need to write these rules manually. Prior machine learning approaches have framed automatically...
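
To make "predicate-based folder assignment rules" concrete, the sketch below represents a rule as a target folder plus a conjunction of predicates over email fields, and applies an ordered rule list where the first matching rule wins (a common convention, assumed here). EmFore learns such rules online; this sketch only applies fixed, hand-written rules. The field names, the `Rule` class, and the helper predicates are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Email = Dict[str, str]
Predicate = Callable[[Email], bool]


@dataclass
class Rule:
    folder: str
    predicates: List[Predicate] = field(default_factory=list)

    def matches(self, email: Email) -> bool:
        # A rule fires only if every predicate holds (a conjunction).
        return all(p(email) for p in self.predicates)


def from_domain(domain: str) -> Predicate:
    return lambda e: e["from"].lower().endswith("@" + domain.lower())


def subject_contains(word: str) -> Predicate:
    return lambda e: word.lower() in e["subject"].lower()


def assign_folder(email: Email, rules: List[Rule], default: str = "Inbox") -> str:
    # First matching rule wins; unmatched mail stays in the default folder.
    for rule in rules:
        if rule.matches(email):
            return rule.folder
    return default


rules = [
    Rule("Invoices", [from_domain("billing.example.com"), subject_contains("invoice")]),
    Rule("Team", [from_domain("corp.example.com")]),
]
print(assign_folder({"from": "ap@billing.example.com", "subject": "Invoice #12"}, rules))
```

Because rules are plain predicates, a learning system can propose new ones (for example from user moves between folders) while the matching logic stays the same.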

Email Prioritization Using Machine Learning

by sangameshwari maitri

2024, SSRN Electronic Journal

Personal and business users prefer to use email as one of the crucial sources of communication. The usage and importance of e-mails continuously grow despite the prevalence of alternative means, such as electronic messages, mobile...

Training output: for training, we took 80% of the actual data. The data went through the implementation process, which generates the desired output; the same process is then repeated for the test data. The graphs below show frequency (number of emails) against number of users (i.e., number of emails per user).
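
An 80/20 split like the one described is commonly done by shuffling once with a fixed seed and cutting the list. The sketch below assumes that approach; `split_train_test` and its parameters are hypothetical names, and a stratified split (keeping class proportions equal in both parts) would be a reasonable alternative for imbalanced email data.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` with a fixed seed and split it into
    (train, test) with `train_fraction` of the items in train."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = split_train_test(range(100))
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, so the same emails are held out each time the experiment is rerun.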

Multi Perceptron Neural Network and Voting Classifier for Liver Disease Dataset

by Khadar Babu

2024, IEEE Access

The liver is one of the most significant organs in the human body. We can predict liver disease in a patient at an early stage based on previously predicted values, using data from patients with abnormal liver function, which helps the...

FIGURE 1. Patient gender count (left); grouping by age (right). TABLE 1. An outline of the patient dataset for liver disease.

FIGURE 2. Classification of diseased persons by age (left); ratio of healthy to unhealthy livers (right).

FIGURE 5. Layer-wise implementation of the multilayer perceptron. Since our datasets have been standardized and scaled to lie between -1 and 1, we chose a 3-layer neural network with tanh and sigmoid activation functions to handle the highly overlapping, non-linear nature of the data, which can produce better accuracy. The procedure is shown in Figure 5.
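
The two ingredients mentioned here, scaling features to [-1, 1] and a small network with a tanh hidden layer and a sigmoid output, can be sketched as follows. This reads "3-layer" as input, hidden, and output layers, which is an assumption, and the weights in the usage line are hypothetical placeholders rather than trained values from the paper.

```python
import math


def scale_to_unit_range(column):
    """Min-max scale a list of numbers to [-1, 1]:
    x' = 2 * (x - min) / (max - min) - 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: map to the midpoint
    return [2 * (x - lo) / (hi - lo) - 1 for x in column]


def mlp_forward(x, W1, b1, w2, b2):
    """One tanh hidden layer followed by a single sigmoid output unit."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))  # probability-like score for the positive class


print(scale_to_unit_range([10, 20, 30]))
print(mlp_forward([0.5, -0.5], W1=[[0.3, -0.2], [0.1, 0.4]], b1=[0.0, 0.1],
                  w2=[0.7, -0.5], b2=0.05))
```

Scaling inputs to the same symmetric range that tanh outputs keeps the hidden units out of their flat, saturated regions early in training.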

FIGURE 7. Pair-plots among MLP, SVM, KNN, and HVC.

FIGURE 9. Comparison of models using the evaluation criteria. FIGURE 8. Correlation matrix of the attributes of the liver disease dataset. ...so we have to choose a non-linear model function for this dataset.

FIGURE 10. Comparison of individual models against each criterion.

TABLE 2. Results of models using various evaluation criteria.

FIGURE 11. Confusion matrices of MLP, SVM, KNN, and HVC.

Study on Using Machine Learning-Driven Classification for Analysis of the Disparities between Categorized Learning Outcomes

by Robert Banasiak

2024, Electronics

Learning outcomes are measurable statements that articulate educational aims in terms of what knowledge, skills, and other competences students possess after successfully completing a given learning experience. This paper presents an...

Spam Mails Filtering Using Different Classifiers with Feature Selection and Reduction Technique

by Renuka Yadav

2024, 2015 Fifth International Conference on Communication Systems and Network Technologies

The continuous growth in email users has resulted in an increase in unsolicited emails, also known as spam. Currently, server-side and client-side anti-spam filters are used to detect different features of spam emails...

Text Analysis Based Human Resource Productivity Profiling

by Joyece Jane

2024

Email, being an efficient, cost-effective, real-time communication mode, results in effective productivity among professionals in an organization. It constitutes almost 90% of daily office procedures in organizations; hence the...

For our research, we developed the approach depicted in Fig. 1. It is designed as a flow from extracting mail content from the Enron email dataset to categorizing employees by productivity, mapped to the frequency of the types of words they use in their mail. The Enron email dataset has around 517,431 digital communications. This dataset has been provided by the FERC (Federal Energy Regulatory Commission) for academic...

Fig. 2. Bar graph of email analysis for 1999

Fig. 5. Bar graph of email analysis over 5 years

Psychometric Test and Personality Assessment by using Machine Learning

by IJRASET Publication

2024, International Journal for Research in Applied Science & Engineering Technology (IJRASET)

Psychometric tests and personality assessments are widely used in a variety of settings, from academic research to employment screening. Traditional methods of administering and scoring these tests can be time-consuming and...

Stress Detection Based on Naïve Bayes Algorithm

by IJRASET Publication and

2023, International Journal for Research in Applied Science & Engineering Technology (IJRASET)

Stress is a prevalent issue that affects individuals' mental and physical well-being, leading to various health problems. The use of machine learning (ML) has been gaining popularity as a tool for stress detection. ML techniques have...

Using Decision Tree Algorithms in Detecting Spam Emails Written in Malay: A Comparison Study

by Rizal Salim

2023, ITM Web of Conferences

Emails have become the most economical and fastest form of communication. However, during the past few years, the growth in email users has dramatically increased spam emails. Various anti-spam techniques have been developed to minimize...

Study on Using Machine Learning-Driven Classification for Analysis of the Disparities between Categorized Learning Outcomes

by Adrianna Kozłowska

2023, Electronics

Multi Perceptron Neural Network and Voting Classifier for Liver Disease Dataset

by Victor Arun

2023, IEEE Access

SMS Spam Filtering Using Machine Learning Techniques: A Survey

by Hedieh Sajedi

2023

Objective: To review various machine learning and hybrid algorithms for detecting SMS spam messages and compare them by accuracy. Data sources: Original articles written in English found in...

Figure 1. Number of papers on SMS spam published in conference proceedings and journals from 2004 to 2015.

Table 1. Feature extraction methods used in different studies.

Table 2. Descriptive statistics of the datasets, compared from different aspects.

Table 3. Classification accuracy comparison of machine learning approaches using the same training and testing set [27].

Table 4. Evaluation and comparison of different approaches (ND means not defined; “-” means the method has no dataset). ...accuracy of 90.17%. Therefore, the accuracy of the practical version of this method is 90.17%, and it classifies an incoming SMS in 0.04 seconds. The drawback of this method is that it detects spam messages on the end-user side, so it does not eliminate the network traffic consumed by spam messages.

Using natural language processing to predict student problem solving performance

by Jeremy Munsell

2023, 2021 Physics Education Research Conference Proceedings

In this work we report on a pilot study where we used machine learning to predict whether students will correctly solve the classic "ballistic pendulum" problem based on an essay written by students elucidating their approach to solving...

Email Prioritization Using Machine Learning

by Ahmed Shaikh

2023, SSRN Electronic Journal

An Efficient method for Recognition of Occluded Faces from Images

by Dr.SHASHI DHAR V

2023

The detection of masked faces is becoming an essential part of health care safety and surveillance systems due to the coronavirus pandemic. One of the most challenging problems in face recognition systems is the accurate...

A Survey of Classification Techniques in the Area of Big Data

by Sheetal Girase

2023, arXiv (Cornell University)

Big Data concerns large-volume, growing datasets that are complex and have multiple autonomous sources. Earlier technologies could not handle the storage and processing of such huge data, so the Big Data concept came into existence. This is a...

A Multi-Classifier Based Prediction Model for Phishing Emails Detection Using Topic Modelling, Named Entity Recognition and Image Processing

by Emilin Shyni

2023, Circuits and Systems

Phishing is the act of attempting to steal a user's financial and personal information, such as credit card numbers and passwords, by pretending to be a trustworthy participant during online communication. Attackers may direct the users...

Enhanced Machine learning algorithms Lightweight Ensemble Classification of Normal versus Leukemic Cel

by Aatif Jamshed

2023, Journal of Pharmaceutical Negative Results

Leukemia is a type of blood cancer that affects the lymphatic system, the bone marrow, and white blood cells. Unlike other types of cancer, leukemia does not produce solid tumors; instead, it produces a huge...

Implementation of both pre-trained models is made easier by the open-source fastai deep learning library for Python. The 34-layer pre-trained model is called ResNet-34. The effectiveness of a deep neural network depends on the framework and the dataset: deeper CNNs trained on large datasets generally perform better, but as the network is made deeper still, performance starts to suffer. The vanishing gradient problem is the root cause of this issue. By bypassing some layers, ResNet fixes this issue, letting gradients propagate between the final and initial layers. The ResNet model's layers can be calculated mathematically using...
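
The equation itself did not survive extraction. For reference, the residual block introduced by He et al. (2016) is usually written as below, where F is the learned mapping of the stacked layers, W_i their weights, and L the training loss. This is the general ResNet formulation, not necessarily the exact expression used in this paper. Differentiating shows why the shortcut helps: the identity term I keeps a direct gradient path even when the derivative of F becomes small.

```latex
\mathbf{y} = \mathcal{F}(\mathbf{x}, \{W_i\}) + \mathbf{x},
\qquad
\frac{\partial \mathcal{L}}{\partial \mathbf{x}}
  = \frac{\partial \mathcal{L}}{\partial \mathbf{y}}
    \left( \frac{\partial \mathcal{F}}{\partial \mathbf{x}} + I \right)
```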

Figure 3: Architecture of ResNet-34. The gradient can flow directly through shortcut connections that skip layers, which speeds up training. ResNet-34 has 34 layers in total: an initial convolutional layer and a pooling layer, followed by four layer groups that repeat the same pattern. The four groups use 3×3 convolutions with feature maps of 64, 128, 256, and 512 channels, respectively.

Figure 4: ResNet-34 confusion matrix. The classification of leukemia subtypes by ResNet-34 and DenseNet-121, respectively, is represented by the confusion matrices. Figures 4 and 5 clearly show how well the presented models predict. Several metrics, including precision, recall, F1 score, and accuracy, are used to demonstrate the effectiveness of the procedures.

For ALL and healthy cases, ResNet-34 and DenseNet-121 both achieve 100% accuracy, and their precision, recall, and F1 score are all 1.0. ResNet-34's AML prediction accuracy is 99.66%, with precision 1.0, recall 0.99, and F1 score 0.98. For CLL, ResNet-34 has an accuracy of 99.74%, with precision, recall, and F1 score all 0.99. ResNet-34 predicts CML with an accuracy of 99.73%, precision of 0.99, recall of 1.0, and F1 score of 0.98. DenseNet-121 predicts AML with an accuracy of 99.92%, with precision, recall, and F1 score all 1.0. DenseNet-121's prediction accuracy for CLL is 99.92%, with precision 1.0, recall 0.98, and F1 score 1.0. DenseNet-121's F1 score, recall, precision, and accuracy for CML predictions are all 100%.

Classification of Non-Functional Requirements From IoT Oriented Healthcare Requirement Document

by iqra Khurshid

2023, Frontiers in Public Health

Internet of Things (IoT) involves a set of devices that aid in achieving a smart environment. IoT-oriented healthcare systems provide monitoring services for patients' data and help take immediate steps in an emergency...

A Review on Mobile SMS Spam Filtering Techniques

by Adamu Ibrahim

2023, IEEE Access

Short messaging service (SMS) spam refers to unsolicited or undesired messages received on mobile phones. SMS spam is a veritable nuisance to mobile subscribers. This marketing practice also worries...

Performance Analysis, Comparative Survey of Various Classification Techniques in Spam Mail Filtering

by Uma Uma

2023, Oriental journal of computer science and technology

As the most preferred communication method, e-mail has become part of day-to-day life. Spam, also called unwanted, junk, or unsolicited mail, is one of the major problems in using e-mail. There are basically two things that...

Fig. 2: The Process of Spam Mail Filtering. Feature Selection Methods

Table 1: Theoretical Findings of Classification Techniques

Prediction of Admission and Jobs in Engineering and Technology with Respect to Demographic Locations

by IJRASET Publication

2023, International Journal for Research in Applied Science & Engineering Technology (IJRASET)

India, as we know, is a densely populated country, and every year more than 6 crore Indians graduate from diverse educational backgrounds. A similar number of students enter colleges to pursue various courses that will help them seek jobs. Many sectors have experienced tremendous growth in employment, and thus many people opt for those sectors, whereas other sectors have high unemployment, either because few jobs are available or because skilled workers are required. By examining each branch and comparing it with current employment in India and abroad, we can find factors that help predict admission and job scenarios in the fields of engineering and technology, management, and pharmacy. Because technology, and the skills required for employment in India and abroad, keep changing, experts have suggested improvements for predicting admissions and jobs in engineering and technology with respect to demographic locations. This is not a one-time process; it must be repeated frequently as industry trends keep changing. Addressing this problem will introduce the changes needed to bring current and upcoming generations on par with students of other countries in terms of knowledge and skills in each domain. There is a need to forecast current trends in admissions and job sectors so that courses and syllabi can be adapted to keep young people employed and skilled in a rapidly changing world. Here we achieve this using a machine learning algorithm.

I. INTRODUCTION

A. Motivation

India, as we all know, is a highly populated nation, and every year more than 6 crore Indians graduate from a variety of educational and socioeconomic backgrounds. A nearly equal number of students enroll in institutions to pursue degrees that will aid their career search. Compared with ten years ago, the demands of the job market have changed significantly with the advancement of technology. While there is significant unemployment in many industries, due either to a lack of jobs or to a need for qualified workers, several sectors have seen remarkable growth in employment, and as a result many people choose to work in those sectors. By considering each branch and contrasting it with employment trends in India and abroad, we will undoubtedly discover factors that help predict admission and employment scenarios in the fields of management, pharmacy, and engineering and technology.

B. Problem Statement

Because of changing technology and its requirements for employment in India and abroad, experts have suggested improvements for predicting admissions and jobs in engineering and technology, management, and pharmacy with respect to demographic locations. Due to the constant shift in industry trends, this is not a one-time process; it needs to be repeated frequently. If this problem is solved, the necessary changes can be made so that today's young people and future generations have the same knowledge and skills as students in other countries. However, despite AICTE's approval of the proposed courses, graduates of such institutions, engineering divisions, management departments, schools, and others continue to be unemployed. There is currently no method for estimating or forecasting the short- and long-term employment potential of any engineering, management, pharmacy, or other course by combining data from various sources into a computer program or application. To keep young people employed and skilled in a rapidly changing world, it is necessary to forecast current trends in admissions and employment so that courses and syllabi can be adapted accordingly.

In output 1, admission to IIT Kharagpur is predicted for an AIEEE rank of 57. Figure 7.1.1 Linear Regression Output Graph

Figure 12.1 Salary vs Graduation year based on Specialization

Figure 12.2 Area wise Salary vs Graduation year

Those parameters are: Year, 10th Marks, 12th Marks, 12th Division, AIEEE Rank, College.

Figure 10.1 Dataset of Job Prediction. The dataset comprises different factors relevant to picking the right university. It contains 164 records, described by 4 parameters considered important when applying for engineering: College State, Specialization, Graduation Year, and Salary.

A Test Collection for Relevance and Sensitivity

by Mahmoud Foaad

2023, Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval

Recent interest in the design of information retrieval systems that can balance an ability to find relevant content with an ability to protect sensitive content creates a need for test collections that are annotated for both relevance and...

Book Cave: A Bookstore for Everyone

by IJRASET Publication

2023, International Journal for Research in Applied Science & Engineering Technology (IJRASET)

Public Open Space Assessment and Management: A case of Damak Municipality, Jhapa, Nepal

by IJRASET Publication

2023, International Journal for Research in Applied Science & Engineering Technology (IJRASET)

Most public spaces are being used to develop public buildings in the name of schools, hospitals, multipurpose halls, temples, community buildings, etc. The lack of lively, quality public places is faced by all in...

Optimization of E-Learning and Performance Using Iot and 6G Technology

by melanie lourens

2023, Journal of Pharmaceutical Negative Results

Sixth-generation (6G) networks impose stricter requirements on the online learning capability and interpretability of trained algorithms. Machine learning is anticipated to be crucial for making networks effective and flexible; however...

The results on the MNIST dataset using the visual assessment technique are shown in the figures. For the assessment, we show the visual classification results for MNIST-0 and MNIST-4 in Figures ... Note that the white pixels appearing in the last images of Fig. 2 seem to be somewhat more numerous than in Figures ..., because Sabbas picks more background pixels.

Besides the independent interpretable method, we also propose a joint method. In principle, the joint method can improve prediction performance while also providing interpretation results. Its interpretation results are practically the same as those of the independent method and are therefore omitted. Fig. 2. Evaluation results for digit 0 with the independent method. The first row shows the top-k important features, represented by white pixels. The second row shows the top-k important features in the digit's structure.

Fig. 3. Evaluation results for digit 4 with the independent method. The first row shows the top-k important features, represented by white pixels. The second row shows the top-k important features in the digit's structure.

Threshold Optimization in Multiple Binary Classifiers for Extreme Rare Events using Predicted Positive Data

by edgar robles

2023

Binary classification is challenging when dealing with imbalanced data sets where the important class is made of extremely rare events, usually with a prevalence of around 0.1%. Such data sets are common in various real-world problems...
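
With extremely rare positives, the default 0.5 cut-off on a classifier's score is often a poor choice, so the decision threshold itself is tuned. The sketch below shows only the basic single-classifier idea: scan the distinct scores as candidate thresholds and keep the one that maximizes F1 = 2TP / (2TP + FP + FN). It is an illustrative assumption, not the multi-classifier method of the cited paper, which optimizes thresholds using predicted positive data; the scores and labels are hypothetical.

```python
def best_f1_threshold(scores, labels):
    """Return (threshold, f1) maximizing F1 for label 1, predicting
    positive when score >= threshold. Candidates are the distinct scores."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(s >= t and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= t and y == 0 for s, y in zip(scores, labels))
        fn = sum(s < t and y == 1 for s, y in zip(scores, labels))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


scores = [0.1, 0.2, 0.3, 0.9, 0.95, 0.4]
labels = [0, 0, 0, 1, 1, 0]
print(best_f1_threshold(scores, labels))
```

In practice the threshold should be chosen on a validation set, not on the test data used to report results.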

Detecting known and new salting tricks in unwanted emails

by Brian Witten

2023

Spam and phishing emails are not only annoying to users but also a real threat to internet communication and the web economy. The fight against unwanted emails has become a cat-and-mouse game between criminals and people trying to develop...

Figure 3: ROC graphs for two tricks and two OCR engines

Figure 5: Detailed ROC graph for font size trick

Figure 4: Detailed ROC graph for font color trick

Table 2 shows the results we obtain for the detection of two salting tricks, font color and font size. Table 2: Results for salting trick detection

Table 3: Results for salting trick detection. In this subsection we look into the individual distance metrics proposed in Section 4. Table 3 summarizes the classification results for both salting tricks and each individual distance metric. We can observe that the TOC feature is clearly the most useful one. For the font color trick, the classifier based on this feature alone even outperforms the classifier based on all features. On the other hand, the LENGTH and COMPLEXITY features are useful for font color trick detection but not at all useful for font size trick detection.

Efficient email classification approach based on semantic methods

by Eman Bahgat

2023, Ain Shams Engineering Journal

Emails have become one of the major applications in daily life. The continuous growth in the number of email users has led to a massive increase of unsolicited emails, which are also known as spam emails. Managing and classifying this... more

Automatic Labeling for Entity Extraction in Cyber Security. Retrieved from Cornell University Library: http://arxiv.org/abs/1308.4941

by John Goodall

2023

Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is…

USING MACHINE LEARNING AND NLP TECHNIQUES FOR EFFICIENT SPAM EMAIL DETECTION

by Joyece Jane

2023

Email spam has become a prevalent issue in recent times: with the growing number of internet users, spam emails are also on the rise. Many individuals use them for illegal and unethical activities such as phishing and fraud. Spammers send…

Email-Based Cyberstalking Detection On Textual Data Using Multi-Model Soft Voting Technique Of Machine Learning Approach

by Joyece Jane

2023

In the virtual world, many internet applications are used by large numbers of people for a variety of purposes. Internet applications have become a basic need of modern life and shape people's daily habits. Like social media, e-mail technology is also prevalent among people of different categories for personal and official communication. The widespread use of e-mail-based communication is also giving rise to various types of cybercrime, including cyberstalking, in which cyberstalkers use e-mail to harass the victim. Cyberstalkers use several content-wise and intent-wise approaches to target the victim, such as spamming, phishing, spoofing, malicious, defamatory, and e-mail-bombing messages, as well as non-spam e-mails involving sexism, racism, and threats, and finally attempts to hack the victim's account over e-mail. This paper proposes an e-mail-based cyberstalking detection (EBCD) model for automatic cyberstalking detection on the textual data of e-mails, using the multi-model soft voting technique of the machine learning approach. Initially, all classifiers of three model sets were trained, tested, and validated on three different labeled datasets: dataset D1 contains spam, fraudulent, and phishing e-mail subjects; dataset D2 contains spam e-mail body text; and dataset D3 contains harassment-related data. The trained, tested, and validated classifiers of all model sets were then applied together to automatically classify the unlabeled e-mails from the user's mailbox using the multi-model soft voting technique. The proposed EBCD model successfully classifies the e-mails from the user's mailbox into cyberstalking e-mails, suspicious e-mails (spam and fraudulent), and normal e-mails. Each model set of the EBCD model uses several classifiers, namely support vector machine, random forest, naïve Bayes, logistic regression, and soft voting.
The final decision in classifying the e-mails from the user's mailbox was taken by the soft voting technique of each model set. The TF-IDF feature extraction method was used with all the applied machine learning model sets to obtain feature vectors from the data. Experimental results show that the soft voting technique not only improves the performance of the e-mail classification task but also helps avoid wrong classifications. The overall performance of soft voting was better than that of the other classifiers, although the support vector machine also performed notably well. The soft voting technique obtained an accuracy of 97.7%, 97.7%, and 98.9%; a precision of 97%, 98.3%, and 98.6%; a recall of 98.3%, 96.5%, and 99.1%; an F-score of 97.6%, 97.4%, and 98.9%; and an AUC of 99.4%, 99.7%, and 99.9% on datasets D1, D2, and D3, respectively. The average performance of soft voting of each model set on the classified e-mails from the user's mailbox was also notable, with an accuracy of 96.3%, precision of 98.1%, recall of 94%, F-score of 95.9%, and AUC of 96.8%.
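Soft voting, as named in the abstract above, combines classifiers by averaging their predicted class probabilities and picking the class with the highest mean. The following minimal sketch assumes equal weights and uses hypothetical probability outputs for the three EBCD classes. It is not the authors' implementation. In scikit-learn, the same idea is available as `VotingClassifier(voting="soft")`.

```python
from typing import Dict, List


def soft_vote(prob_outputs: List[Dict[str, float]]) -> str:
    """Soft voting: average each class's predicted probability across classifiers
    (equal weights assumed) and return the class with the highest mean."""
    if not prob_outputs:
        raise ValueError("need at least one classifier output")
    classes = list(prob_outputs[0])
    # Every classifier is assumed to score the same set of classes.
    mean = {c: sum(p[c] for p in prob_outputs) / len(prob_outputs) for c in classes}
    return max(mean, key=mean.get)


# Hypothetical probabilities from three classifiers for one e-mail.
svm = {"cyberstalking": 0.40, "suspicious": 0.35, "normal": 0.25}
rf = {"cyberstalking": 0.20, "suspicious": 0.50, "normal": 0.30}
nb = {"cyberstalking": 0.45, "suspicious": 0.40, "normal": 0.15}
print(soft_vote([svm, rf, nb]))  # prints: suspicious
```

A hard (majority) vote over the same outputs would return "cyberstalking", because the SVM and naïve Bayes outputs each rank it first by a small margin. Soft voting instead lets the random forest's more confident "suspicious" score tip the average.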

Classifying Human Gender by Learning the Acoustic Features of Voice Samples

by Sandeep Kumar

2022, Neuro Quantology

Human beings have various capabilities that help them acquire different types of knowledge from their environment or surroundings. Among these, identifying a person's gender from their voice is an easy task for humans because a human, as a growing…

Multi-Label Email Classification Using Random Forest Classifier

by International scientific research and publications and

2022

Email categorization is a critical function in any email client since it allows you to manage and arrange your emails into semantic groupings. Following the success of statistical artificial intelligence and machine learning in many areas…

…critical structural features in phishing emails and applying different machine learning algorithms to their dataset for classification. They used 16 features in their model, creating unique features based on keywords; for example, …
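The excerpt above mentions keyword-based features but is cut off before listing them. As a sketch of the general idea, each e-mail can be mapped to one binary feature per keyword. The keyword list `PHISHING_KEYWORDS` and the function `keyword_features` are hypothetical illustrations, not the 16 features used in the cited work.

```python
import re
from typing import List

# Hypothetical keyword list for illustration only.
PHISHING_KEYWORDS = ["verify", "account", "password", "urgent", "click"]


def keyword_features(text: str, keywords: List[str] = PHISHING_KEYWORDS) -> List[int]:
    """Return one 0/1 feature per keyword: 1 if it occurs as a whole token (case-insensitive)."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return [1 if kw in tokens else 0 for kw in keywords]


print(keyword_features("Urgent: please verify your account now"))  # prints: [1, 1, 0, 1, 0]
```

A vector like this can be concatenated with other structural features, such as link counts, before it is passed to any of the classifiers discussed on this page.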

The Smart IoT based Integrated Technology and Environmental Management for Air and Water Remediation

by International scientific research and publications and

2022

…communication with different devices deployed throughout various water treatment plants, as well as preventive and data analysis strategies to support decision making.

Static Malware Analysis Using Optimal Machine Learning Algorithm for Malware Detection

by International scientific research and publications

2022

The aim of malware analysis is to detect whether a file is infected or not in order to avoid any kind of system intrusion. The goal of this research is to find the optimal machine learning algorithm to predict whether a file is malicious…

Figure 2: Accuracy comparison of the classifiers used

NeuroQuantology | August 2022 | Volume 20 | Issue 10 | Page 4128-4141 | doi: 10.14704/nq.2022.20.10.NQ55402 | Mrunalini U. Buradkar / Static Malware Analysis Using Optimal Machine Learning Algorithm for Malware Detection

Ensemble based machine learning algorithms in 6G for VANET

by International scientific research and publications

2022

The vehicular ad hoc network (VANET), a developing study area in intelligent transportation systems, provides the network's vehicles with crucial information. Road accidents harm around 160 000 people; thus, they must be reduced, and…

Improvement of Robustness in Grid Connected Solar System Using Artificial Neural Network based Sliding Mode Controller

by International scientific research and publications and

2022

In renewable energy schemes, solar photovoltaic (PV) systems provide an effective way to incorporate electrical energy generation. Many current control techniques, such as hysteresis control, predictive control, and sliding mode control, are…

Artificial intelligence-based neural network for the diagnosis of diabetes and COVID: ANN model with optimum predictor variable

by International scientific research and publications

2022

In many nations, the prevalence of diabetes is rising, and its impact on national health cannot be overlooked. Smart medicine is a medical concept in which technology is used to aid in disease detection and treatment. The objective of…

Distance education psychosocial learning environment and graduate students' academic satisfaction during the pandemic

by International scientific research and publications

2022, Science Scholar

…Sp-DELES scale, and the second to measure academic satisfaction using the SA scale. The results demonstrated a very strong positive relationship (rs = .935), determining a relationship between the psychosocial learning environment of…

According to Table 2, the participants' perception was at the high level, with 64.2%. This means that students feel satisfied with the psychosocial learning environment for the development of their training and learning.

Distribution of frequencies of responses grouped by levels of the distance education psychosocial learning environment variable measured by the Sp-DELES scale

Table 3 shows that all six psychosocial subscales have high levels of perception: the highest-scoring was authentic learning (90.2%) and the lowest-scoring was collaboration among students (78.0%). This suggests that students have adapted positively to the psychosocial learning environment of distance education; however, social isolation reduced interaction and collaboration among them.

Table 3

Table 5 shows that the subscales of academic satisfaction were located at the highest levels of the participants' perception, the best positioned being teaching activity (78.0%) and the lowest rated being educational services (68.3%). This shows that the students' perception of the teaching activity and academic planning they received was positive, but not their perception of the educational services. Rawal et al. (2021), Poongodi M et al. (2022), Poongodi M et al. (2021), Dhiman P et al. (2022), Sahoo S.K et al. (2022), K.A et al. (2022), Dhanraj R.K et al. (2020), Yan Zhang et al. (2020), Md Hossain et al. (2021), Md Nazirul Islam Sarker et al. (2021), Y. Shi et al. (2020), Guobin Chen et al. (2020), Poongodi M et al. (2019), Poongodi M et al. (2020)

Table 6 shows a significance value of Sig. = 0.000, which is below the 0.010 significance level (Sig. < 0.010), together with a very strong positive correlation (rs = .935**). In summary, it was possible to demonstrate that the relationship…

Performance Evaluation of Machine Learning Algorithms on Textual Datasets for Spam Email Classification

by IJRASET Publication

2022, International Journal for Research in Applied Science and Engineering Technology (IJRASET)

Email is one of the most popular modes of communication today. Billions of emails are sent every day, but not all of them are relevant or important. Irrelevant and unwanted emails are termed email spam. …

Feature selection is essential for eliminating irrelevant features and avoiding the curse of dimensionality. In the combined dataset, data pre-processing tasks such as tokenization, punctuation removal, stop-word removal, lowercasing, and lemmatization were performed using the Natural Language Toolkit (NLTK) library [19] in Python.
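The pre-processing steps listed above can be sketched without third-party dependencies, as follows. This is a simplified stand-in for the NLTK pipeline, not the authors' code. It lowercases first so that stop words match, strips punctuation, splits on whitespace, and drops stop words from a tiny hypothetical `STOP_WORDS` set. Lemmatization is omitted because it needs a lexical database. With NLTK and its data installed, one would typically swap in `nltk.word_tokenize`, `nltk.corpus.stopwords.words("english")`, and `nltk.stem.WordNetLemmatizer`.

```python
import string
from typing import List

# A tiny illustrative stop-word list; NLTK's English list is much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you", "your", "for"}


def preprocess(text: str) -> List[str]:
    """Lowercase, strip ASCII punctuation, split on whitespace, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("Claim YOUR free prize, now!"))  # prints: ['claim', 'free', 'prize', 'now']
```

Removing punctuation before splitting also joins hyphenated words (for example, "e-mail" becomes "email"). Whether that behaviour is acceptable depends on the dataset.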

A confusion matrix combines the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), summarizing the classification performance of an ML algorithm in graphical or tabular format. Fig. 2, Fig. 3, Fig. 4, and Fig. 5 show the confusion matrices of the MNB, SVM, RF, and XGB algorithms, respectively (for both Phase-1 and Phase-2).
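The usual summary metrics follow directly from the four confusion-matrix counts mentioned above. The following sketch uses hypothetical counts for 1,000 test e-mails, with spam as the positive class. The function name `metrics_from_counts` is illustrative.

```python
from typing import Dict


def metrics_from_counts(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from the four binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for 1,000 test e-mails (spam = positive class).
m = metrics_from_counts(tp=90, tn=880, fp=10, fn=20)
print({k: round(v, 3) for k, v in m.items()})
# prints: {'accuracy': 0.97, 'precision': 0.9, 'recall': 0.818, 'f1': 0.857}
```

Here 97% accuracy coexists with 20 missed spam messages out of 110. This is why precision, recall, and F-score are usually reported alongside accuracy on imbalanced e-mail data.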

Fig. 6, Fig. 7, Fig. 8, and Fig. 9 show ROC curves of MNB, SVM, RF, and XGB algorithms respectively (for both Phase-1 and Phase-2).
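The area under a ROC curve (AUC) equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting one half. The following minimal sketch computes it directly from that definition. It uses hypothetical scores and is quadratic in the number of examples, so it is meant only for illustration.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """AUC as P(score of random positive > score of random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for five e-mails (1 = spam).
print(round(roc_auc([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0]), 3))  # prints: 0.833
```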

Table I: Performance of ML Algorithms with BoW features

Table I contains the performance metrics of all algorithms trained with BoW features in Phase 1, and Table II contains the performance metrics of all algorithms trained with TF-IDF features in Phase 2.

Table II: Performance of ML Algorithms with TF-IDF features
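The two feature representations compared in Phase 1 and Phase 2 differ as follows. Bag-of-words (BoW) keeps raw term counts, while TF-IDF down-weights terms that occur in many documents. The sketch below uses one common unsmoothed TF-IDF variant (tf = raw count, idf = ln(N / df)). The entry above does not state which variant was used. For comparison, scikit-learn's `TfidfVectorizer` defaults to a smoothed idf plus L2 normalization, so its numbers would differ.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(docs: List[List[str]]) -> List[Dict[str, int]]:
    """BoW: raw term counts for each tokenized document."""
    return [dict(Counter(doc)) for doc in docs]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF with tf = raw count and idf = ln(N / df), one common unsmoothed variant."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


docs = [["free", "prize", "free"], ["meeting", "agenda"], ["free", "meeting"]]
print(bag_of_words(docs)[0])  # prints: {'free': 2, 'prize': 1}
print({t: round(w, 3) for t, w in tf_idf(docs)[0].items()})  # prints: {'free': 0.811, 'prize': 1.099}
```

Although "free" appears twice in the first document, it ends up with a lower TF-IDF weight than "prize" because it also occurs in another document. Under this variant, a term present in every document gets a weight of zero.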

Multi-Label Email Classification Using Random Forest Classifier

by Dr.Ashutosh D Gaur

2022, AnKa Publisher

Vietnamese spam detection based on language classification

by Tran Lan Anh

2022, 2008 Second International Conference on Communications and Electronics

Language classification is the process of identifying the disposition of a presented text, such as classifying an email or a text document into a particular category. Classifying text can involve determining…
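One classic way to assign a text such as an email to a category is multinomial naive Bayes (MNB), which is also among the algorithms evaluated in the IJRASET entry above. The following sketch uses add-one (Laplace) smoothing on a hypothetical toy training set. It is offered as a general illustration of text categorization, not as the method of the Vietnamese spam detection paper, whose abstract is truncated.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Model = Tuple[Dict[str, float], Dict[str, Dict[str, float]]]


def train_mnb(examples: List[Tuple[List[str], str]]) -> Model:
    """Fit log class priors and Laplace-smoothed (add-one) log word likelihoods."""
    class_counts = Counter(label for _, label in examples)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n = len(examples)
    log_prior = {c: math.log(k / n) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_lik[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab
        }
    return log_prior, log_lik


def predict_mnb(model: Model, tokens: List[str]) -> str:
    """Return the class with the highest log posterior; unseen words are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in tokens if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical, already-tokenized training e-mails.
train = [
    (["free", "prize", "win"], "spam"),
    (["win", "cash", "free"], "spam"),
    (["meeting", "agenda", "notes"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]
model = train_mnb(train)
print(predict_mnb(model, ["free", "cash"]))  # prints: spam
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied. Smoothing keeps a single unseen class-word pair from zeroing out a class.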

Spam Mail Detection through Data Mining – A Comparative Performance Analysis

by Megha Rathi

2022, International Journal of Modern Education and Computer Science

As the web expands day by day and people generally rely on it for communication, e-mail has become the fastest way to send information from one place to another. Nowadays, all transactions and all communication, whether general or of…

Email Classification Research Trends: Review and Open Issues

by Liyana Shuib

2022, IEEE Access

Personal and business users prefer to use e-mail as one of the crucial sources of communication. The usage and importance of e-mails continuously grow despite the prevalence of alternative means, such as electronic messages, mobile…

A Survey of Classification Techniques in the Area of Big Data

by Henny Warsilah

2022
