Academia.edu

Statistical Classification

715 papers
3 followers
About this topic
Statistical classification is a method in statistics and machine learning that involves assigning items or observations to predefined categories based on their features. It utilizes algorithms to analyze data patterns and make predictions, enabling the categorization of new data points based on learned relationships from training datasets.
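As a minimal illustration of this definition, the sketch below implements a nearest-centroid rule, one of the simplest statistical classifiers. It learns one mean feature vector per category from labeled training data and assigns each new point to the category whose mean is closest. The function names and the two-feature toy data are hypothetical and chosen only for illustration.

```python
from math import dist

def fit_centroids(X, y):
    """Learn one centroid (mean feature vector) per class label."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], x))

# Hypothetical training data: two features, two predefined categories.
X_train = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [4.0, 4.2], [4.3, 3.9], [3.8, 4.1]]
y_train = ["A", "A", "A", "B", "B", "B"]

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, [1.0, 1.1]))  # A
print(predict(centroids, [4.1, 4.0]))  # B
```

Real classifiers learn richer decision rules, but the workflow is the same: fit on labeled examples, then predict categories for unseen points.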
lightbulbAbout this topic
Statistical classification is a method in statistics and machine learning that involves assigning items or observations to predefined categories based on their features. It utilizes algorithms to analyze data patterns and make predictions, enabling the categorization of new data points based on learned relationships from training datasets.

Key research themes

1. How can we comparatively evaluate the effectiveness of different classification algorithms across diverse application domains?

This research area focuses on empirically comparing the performance of widely used classification algorithms, such as Naive Bayes, Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Decision Trees, Random Forest, and Gradient Boosting, across various real-world datasets. Understanding each algorithm's strengths and weaknesses in different contexts helps practitioners select appropriate classifiers for specific domains such as education, medical diagnosis, network security, and text classification. Evaluations typically rely on metrics such as accuracy, precision, recall, F1 score, and computational efficiency.
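The quality metrics named above can be computed directly from counts of true positives, false positives, and false negatives. The sketch below is a minimal binary-case version, assuming a single positive class (here the hypothetical value 1) and reporting 0.0 when a ratio is undefined. Multi-class studies usually average these per class (macro or weighted averaging), which is not shown here, and computational efficiency has to be measured separately.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 score, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical true and predicted labels for eight test items (1 = positive).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```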

Key finding: This paper found that K-Nearest Neighbor (KNN) outperformed Naive Bayes and Support Vector Machines (SVM) for predicting student study duration from academic performance data. The study highlights the practical usage of...
Key finding: On breast cancer datasets, Decision Trees (ID3), Naive Bayes, SVM, and KNN were empirically compared using R. The study observed that Decision Trees and SVM performed well, with KNN and Naive Bayes lagging...
Key finding: Using five different datasets from the UCI repository, Decision Trees were found to generally provide higher accuracy than Naive Bayes and KNN, while KNN yielded faster execution times but higher average error rates. This...
Key finding: This study, comparing Random Forest, Logistic Regression, Support Vector Classification, Gradient Boosting, and XGBoost for SMS spam detection, found that Support Vector Classification (SVC) achieved the best accuracy (97.93%) and...
Key finding: Deep learning algorithms with Tanh and Exprectifier activation outperformed classical classifiers such as SVM, KNN, Naive Bayes, Random Forest, and Decision Tree on breast cancer prediction tasks, achieving 93.14% accuracy...
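Comparisons like these usually rest on a resampling scheme such as k-fold cross-validation, so that each classifier is scored only on data it was not trained on. The sketch below scores a from-scratch KNN classifier with two neighborhood sizes under 3-fold cross-validation. It is an illustration only, not the protocol of any study listed above: the round-robin fold assignment, the toy data, and all function names are assumptions.

```python
from collections import Counter
from math import dist

def knn_predict(X_train, y_train, x, k=3):
    """K-Nearest Neighbors: majority label among the k training points closest to x."""
    nearest = sorted(zip(X_train, y_train), key=lambda pair: dist(pair[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(X, y, predict_fn, n_folds=3):
    """Mean accuracy over n_folds folds; fold f holds items f, f + n_folds, ..."""
    scores = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(X), n_folds))
        X_tr = [x for i, x in enumerate(X) if i not in test_idx]
        y_tr = [t for i, t in enumerate(y) if i not in test_idx]
        hits = [predict_fn(X_tr, y_tr, X[i]) == y[i] for i in sorted(test_idx)]
        scores.append(sum(hits) / len(hits))
    return sum(scores) / n_folds

# Hypothetical toy data: two well-separated groups of six points each.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [1.1, 1.3], [0.8, 0.9], [1.3, 1.1],
     [3.0, 3.0], [3.2, 2.9], [2.8, 3.1], [3.1, 3.3], [2.9, 2.8], [3.3, 3.1]]
y = ["A"] * 6 + ["B"] * 6

for k in (1, 3):
    score = cross_val_accuracy(X, y, lambda Xt, yt, x, k=k: knn_predict(Xt, yt, x, k))
    print(f"KNN (k={k}): mean accuracy {score:.2f}")
```

Any other classifier can be compared under the same folds by passing a different `predict_fn` with the signature `(X_train, y_train, x)`.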

2. What are the critical complexity measures that characterize classification problems and how do they inform classifier selection and development?

This theme investigates theoretical and data-driven metrics to quantify the intrinsic difficulty of classification problems, encompassing class overlap, data sparsity, dimensionality, and decision boundary complexity. Such complexity measures can guide the choice of classification algorithms, feature engineering, and data preprocessing strategies by anticipating classification challenges and expected performance.

Key finding: The paper surveys numerous complexity measures extracted directly from training datasets, such as feature overlap, class separability, and decision boundary characteristics. It shows that these measures help predict problem...
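One widely cited example of a feature-overlap measure is the maximum Fisher's discriminant ratio (often labeled F1 in the data-complexity literature, not to be confused with the F1 score). It asks whether any single feature separates two classes: for each feature it computes (mean_1 - mean_2)^2 / (var_1 + var_2) and keeps the maximum. The sketch below assumes a two-class problem and population variances. Some formulations invert or normalize this value, and the summary above does not say which variant the surveyed paper uses.

```python
from statistics import mean, pvariance

def max_fisher_ratio(X, y):
    """Maximum Fisher's discriminant ratio over all features (two classes only).

    For each feature: (mean_1 - mean_2)^2 / (var_1 + var_2), using population
    variances. Larger values mean at least one feature separates the classes well.
    """
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    rows_1 = [row for row, label in zip(X, y) if label == classes[0]]
    rows_2 = [row for row, label in zip(X, y) if label == classes[1]]
    ratios = []
    for f in range(len(X[0])):
        v1 = [row[f] for row in rows_1]
        v2 = [row[f] for row in rows_2]
        spread = pvariance(v1) + pvariance(v2)
        gap = (mean(v1) - mean(v2)) ** 2
        # Zero spread: infinitely separable if the means differ, uninformative otherwise.
        ratios.append(gap / spread if spread else (float("inf") if gap else 0.0))
    return max(ratios)

# Hypothetical two-feature data sets, both labeled A, A, B, B.
y = ["A", "A", "B", "B"]
X_separable = [[0, 5], [2, 7], [10, 5], [12, 7]]    # feature 0 splits the classes
X_overlapping = [[0, 5], [10, 7], [2, 5], [12, 7]]  # no feature splits them cleanly
print(max_fisher_ratio(X_separable, y))    # 50.0
print(max_fisher_ratio(X_overlapping, y))  # 0.08
```

A low value warns that no single feature is enough, which may favor classifiers that combine features or nonlinear decision boundaries.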

3. How can hybrid methods integrating clustering and ensemble classification improve text categorization tasks like news classification?

This research theme explores the combination of unsupervised clustering techniques and ensemble-based supervised classifiers to enhance text document classification accuracy and interpretability. Clustering captures underlying data structure and groups similar documents, which can be used as additional features to augment classification models. Ensemble methods leverage multiple classifiers to improve robustness and predictive performance. The integration enables effective handling of noisy, heterogeneous, and high-dimensional text data.
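The idea of using cluster membership as an additional feature can be sketched as follows: cluster the already vectorized documents bottom-up, then append a one-hot cluster indicator to each feature vector before training a classifier. This is a minimal sketch under stated assumptions. It uses single-linkage agglomerative clustering, Euclidean distance, and a fixed number of clusters, and the vectors and function names are hypothetical. Studies in this theme may use other linkage criteria, distances, or cluster encodings.

```python
from math import dist

def agglomerative_labels(X, n_clusters):
    """Bottom-up (agglomerative) clustering with single linkage.

    Starts with one cluster per point and repeatedly merges the two clusters whose
    closest members are nearest, until n_clusters remain. O(n^3): toy sizes only.
    """
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(X[i], X[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * len(X)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels

def add_cluster_features(X, labels, n_clusters):
    """Append a one-hot encoding of each row's cluster id to its feature vector."""
    return [list(row) + [1.0 if labels[i] == c else 0.0 for c in range(n_clusters)]
            for i, row in enumerate(X)]

# Hypothetical 2-D document vectors (e.g. after TF-IDF and dimensionality reduction).
X_docs = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
labels = agglomerative_labels(X_docs, n_clusters=2)
X_augmented = add_cluster_features(X_docs, labels, n_clusters=2)
print(labels)  # [0, 0, 1, 1]
```

`X_augmented` can then be passed to any supervised classifier in place of the original vectors.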

by k srikala and 1 more
Key finding: This study proposed a pipeline combining Agglomerative Hierarchical Clustering with ensemble classifiers, including Gradient Boosting, Bagging Classifier, and Random Forest, on BBC News dataset features (derived from TF-IDF and...
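Of the ensemble methods named above, bagging is the easiest to show in a few lines: each base learner is trained on a bootstrap sample (drawn with replacement), and their predictions are combined by majority vote. The sketch below uses decision stumps (single-threshold trees) as base learners for brevity. It is a from-scratch illustration, not a library implementation such as scikit-learn's BaggingClassifier or the configuration used in the study, and the data and names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Decision stump: the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= threshold]
            right = [label for row, label in zip(X, y) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]

def stump_predict(stump, x):
    f, threshold, left_label, right_label = stump
    return left_label if x[f] <= threshold else right_label

def fit_bagging(X, y, n_estimators=15, seed=0):
    """Bagging: fit each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble

def bagging_predict(ensemble, x):
    """Combine the stumps by hard majority vote."""
    return Counter(stump_predict(s, x) for s in ensemble).most_common(1)[0][0]

# Hypothetical one-feature data: class "A" at low values, class "B" at high values.
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
y = ["A"] * 4 + ["B"] * 4
ensemble = fit_bagging(X, y)
print(bagging_predict(ensemble, [0.0]), bagging_predict(ensemble, [12.0]))
```

Random Forest extends this idea by also sampling a random subset of features at each split, which is not shown here.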

4. Which classification algorithms and features effectively support sentiment analysis in social media and cybersecurity contexts?

Sentiment analysis on social media and security-related textual data is hindered by linguistic subjectivity, informal language, and high dimensionality. This research area evaluates traditional machine learning models such as Naive Bayes, Support Vector Machine (SVM), Decision Trees, and ensemble methods alongside sophisticated feature extraction techniques (e.g., TF-IDF, network features) to improve the classification of sentiments and threat detection. It highlights the role of algorithm selection and feature engineering in boosting classifier performance.
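TF-IDF, the feature weighting mentioned above, gives a term a high score when it is frequent in one document but rare across the collection. The sketch below uses one common variant (relative term frequency times log(N / df), with lowercase whitespace tokenization). Libraries often add smoothing or vector normalization, so the exact weights in the cited studies may differ. The example documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), N = number of documents, df = documents containing the term
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid dividing by zero for empty documents
        vectors.append({term: (count / length) * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return vectors

# Hypothetical review snippets.
docs = ["great service great food", "terrible service", "great price"]
for doc, vec in zip(docs, tf_idf(docs)):
    print(doc, "->", {term: round(w, 3) for term, w in vec.items()})
```

A term that appears in every document gets weight 0 under this variant, which is one reason libraries often smooth the idf term.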

Key finding: The study integrated text network features extracted from word co-occurrence graphs with traditional textual features to enhance sentiment classification on Yelp reviews. Machine learning models including SVM, Random Forest,...
Key finding: Using 20,000 tweets about ChatGPT, SVM with optimized data splitting and feature selection achieved the highest classification accuracy (~80%), outperforming Naive Bayes, Decision Tree, and Gradient Boosting. This highlights...
Key finding: On social media data regarding the notorious hacker Bjorka, Naive Bayes achieved better sentiment classification accuracy (70%) than the C4.5 decision tree (68%) using TF-IDF weighted features. This study underlines Naive Bayes's...
Key finding: The Random Forest classifier achieved the highest accuracy (99.4%) in detecting DDoS attacks in network traffic data, outperforming Decision Tree and SVM classifiers. This supports ensemble methods' utility in cybersecurity...
Key finding: Using hybrid feature selection that combines genetic and grasshopper optimization algorithms with Random Forest classification, this work achieved accuracies of up to 99% for cloud intrusion detection on multiple benchmark...
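Several findings above use Naive Bayes as a text classifier. The sketch below is a minimal multinomial Naive Bayes for sentiment labels, assuming whitespace tokenization, raw word counts (rather than the TF-IDF weights used in some of the studies), and Laplace smoothing with alpha = 1. The training sentences and labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(texts, labels, alpha=1.0):
    """Multinomial Naive Bayes on word counts with additive (Laplace) smoothing."""
    doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in doc_counts.items():
        total = sum(word_counts[label].values())
        log_prior = math.log(n_docs / len(labels))
        log_lik = {w: math.log((word_counts[label][w] + alpha) / (total + alpha * len(vocab)))
                   for w in vocab}
        model[label] = (log_prior, log_lik)
    return model

def predict_nb(model, text):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    words = text.lower().split()

    def log_posterior(label):
        log_prior, log_lik = model[label]
        # Words never seen in training are skipped in this sketch.
        return log_prior + sum(log_lik[w] for w in words if w in log_lik)

    return max(model, key=log_posterior)

# Hypothetical labeled training texts.
texts = ["love this great phone", "great battery love it",
         "awful screen hate it", "hate the awful battery"]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(texts, labels)
print(predict_nb(model, "great phone"), predict_nb(model, "awful battery"))  # pos neg
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied.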

All papers in Statistical Classification

Abstract (English): There is presently no unified methodology for evaluating supervised or unsupervised classification algorithms. Supervised problems are e...
Illegal content, particularly online gambling promotions, is becoming increasingly prevalent on the internet as digital technology develops. To address this, our research developed an artificial intelligence (AI) model...
This paper offers a new method for improving network security by using machine learning (ML) techniques in the design and implementation of an intrusion detection system (IDS). The primary objective is to address current challenges...
Diabetes is a chronic disease that can significantly affect health globally, highlighting the importance of accurate early risk prediction to support prevention and management efforts. This study aims to evaluate the...
This paper presents a multi-stage algorithm for classifying multichannel ECG beats into normal and abnormal categories using sequential beat clustering and a cross-distance analysis algorithm. After the clustering stage, a search algorithm...
This paper presents a multi-stage algorithm for multichannel ECG beat classification into normal and abnormal categories using a sequential beat clustering and a crossdistance analysis algorithm. After clustering stage, a search algorithm... more
This paper investigates existing practices and prospects of medical data classification based on data mining techniques. It highlights major advanced classification approaches used to enhance classification accuracy. Past research has...
Heart disease often causes death if not treated quickly and appropriately. Early diagnosis can prevent more serious complications and allows heart disease patients to receive the best treatment. The existence of a disease prediction model can help health workers...
Insulator pollution is a significant issue for the operation of power networks, as it may lead to flashovers and thus excessive outages. Therefore, determining a site's pollution severity (SPS) is an important aspect of the procedures...
The problem of book classification in digital library systems, particularly at the senior high school (SMA) level, remains a challenge because many institutions have not yet adopted automatic classification systems. The manual process is considered not...
The use of a Machine Learning (ML) classification algorithm to classify airborne urban Light Detection And Ranging (LiDAR) point clouds into main classes such as buildings, terrain, and vegetation has been widely accepted. This paper...
Artificial immune systems (AIS) are a relatively new class of metaheuristics that mimic aspects of the human immune system to solve computational problems. They consist of three typical intelligent computational algorithms termed clonale...
Artificial immune systems (AIS) are relatively new class of meta-heuristics that mimics aspects of the human immune system to solve computational problems. They consist of three typical intelligent computational algorithms termed clonale... more
In 2023, Indonesia was again devastated by a hacker known as Bjorka. Bjorka did not act just once or twice; each time, Bjorka made the entire Indonesian population proud. The 19 million BPJS Employment records belonging to the Indonesian...
COVID-19 appeared in China, spread rapidly worldwide, and caused many injuries and deaths. It is possible to prevent or reduce the spread of the disease with machine learning and the diagnostic...
Regression, Random Forest, Naive Bayes), the algorithms achieved accuracies of 99.61%, 94.82%, 98.37%, and 96.57%, respectively, with execution times of 0.01 s, 0.7 s, 0.20 s, and 0.04 s, respectively. The Stochastic...
COVID-19 has appeared in china, spread rapidly the world wide and caused with many injuries, deaths between humans. It is possible to avoid the spread of the disease or reduce its spread with the machine learning and the diagnostic... more
COVID-19 emerged in China in 2019, spread rapidly worldwide, and caused many injuries and deaths. Accurate and early detection of COVID-19 can support patients' long-term survival and help prevent the spread of the...
Modeling the effects of climate change using machine learning: A simulation study. As climate change intensifies, policymakers face urgent decisions about...
Modeling the effects of climate change using machine learning: A simulation study Researcher's name: It has been studied in the context of climate change. As climate change intensifies, policymakers face urgent decisions about... more
Modeling the effects of climate change using machine learning: A simulation study Researcher's name: It has been studied in the context of climate change. As climate change intensifies, policymakers face urgent decisions about... more
Routers classify packets to determine which flow they belong to, and to decide what service they should receive. Classification may, in general, be based on an arbitrary number of fields in the packet header. Performing classification...
Predicting heart attacks is crucial as it can save lives and reduce the personal and societal impact of cardiovascular diseases. Early detection allows for timely intervention, enabling individuals to make lifestyle changes and medical...
COVID-19 has appeared in china, spread rapidly the world wide and caused with many injuries, deaths between humans. It is possible to avoid the spread of the disease or reduce its spread with the machine learning and the diagnostic... more
Artificial Intelligence (AI) has emerged as one of the most transformative technologies of the 21st century, with applications spanning industries from healthcare and finance to retail and transportation. This research explores...
A typical algorithm for signal classification consists of two steps: preliminary signal transformation and the classification itself. The preliminary transformation procedures are used to extract specific features of the initial signal and...
A typical algorithm for signal classification consists of two steps: signal preliminary transformation and classification itself. The procedures of preliminary transformation are used to extract specific features of the initial signal and... more
Classification is one of the most important supervised data mining techniques, used to assign data to predefined classes. It is mainly used in the healthcare sector for decision-making, diagnosis systems, and giving...
The spread of ubiquitous sensing technology brings with it an increasing number of innovative models. Smart mobility initiatives offer new opportunities for intelligent systems to maximize the utilization of real-time data that are...
The categorization of opinions into positive, negative, or neutral facilitates information gathering, pinpoints individual weaknesses, and streamlines the decision-making process. Precision in opinion classification enables...
Feature selection is the preprocessing step of identifying a relevant subset of features in high-dimensional data. To identify the required features, feature selection algorithms such as Relief and Parzen-Relief are used; such an algorithm attempts to...
The classification of learning objects (LOs) enables users to search for, access, and reuse them as needed, making e-learning as effective and efficient as possible. In this article, a multilabel learning approach is presented for...
Robotics and artificial intelligence have played a significant role in developing assistive technologies for people with motor disabilities. A Brain-Computer Interface (BCI) is a communication system that allows humans to communicate with...
Applications of learning algorithms in knowledge discovery are a promising and relevant area of research. They offer new possibilities and benefits in real-world applications, helping us better understand the mechanisms of our own methods...
Email phishing is one of the most significant cybersecurity threats in the digital era, leading to financial losses and data breaches. This study presents an AI-based real-time phishing detection system that employs machine learning...
Remote sensing and image processing techniques have brought about a great transformation in traditional measurements by providing spatial and temporal information, and they have the potential to increase our knowledge in technical and...
The classification problem has gained new importance with the growing aggregate value placed on social media such as Twitter. The huge number of short documents to be organized into subjects is challenging the...
The growing concern over climate change has necessitated the development of advanced tools for forecasting and mitigating carbon dioxide (CO₂) emissions. Machine learning (ML) approaches have emerged as powerful tools for improving the...
This research project focuses on utilizing machine learning techniques to predict loan default among applicants in the context of financial organizations. Loan approval decisions carry substantial risks, and not all applicants can be...
In this digital era, where social media is present everywhere, cyberbullying has emerged as a critical issue. It adversely affects mental health and online interactions. A bilingual cyberbullying detection system has been proposed to...
Intrusion detection is one of the most critical network security problems in the technology world. Machine learning techniques are being implemented to improve Intrusion Detection Systems (IDS). In order to enhance the performance of...
Cardiovascular disease (CVD) is a major global health issue that significantly affects death rates. This research aims to improve the early detection and diagnosis of cardiovascular illness by utilizing machine learning methods,...
Social media networks such as Twitter and Facebook play important roles in many aspects of our lives and affect many of our decisions. This paper presents a data mining model consisting of five different classification and regression...
Phishing is a major concern on the Internet today, and many users are falling victim to criminals' deceitful tactics. Blacklisting is still the most common defence users have against such phishing websites, but is failing to cope...
This work is licensed under a Creative Commons Attribution 4.0 International License. The license permits unrestricted use, distribution, and reproduction in any medium, on the condition that users give exact credit to the original...
This study aims to determine the impact of data augmentation and the Synthetic Minority Over-sampling Technique (SMOTE) on the performance of classification models using a small, imbalanced dataset of student academic...
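Because the last study above concerns SMOTE, a short sketch of the technique may help. SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbors. This is a simplified variant under stated assumptions: the base sample is drawn at random, distances are Euclidean, the seed is fixed, and the data are hypothetical. The original algorithm instead generates a fixed number of synthetic samples per minority instance.

```python
import random
from math import dist

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples by SMOTE-style interpolation.

    Simplified variant: each synthetic point is placed at a random position on the
    segment between a randomly chosen minority sample and one of its k nearest
    minority-class neighbors (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Hypothetical minority class with four samples (two features each).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.4], [1.2, 2.9]]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points))  # 6
```

Oversampling of this kind should be applied only to the training split, so that synthetic points do not leak into the data used for evaluation.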
Download research papers for free!