Predicting underwriting risk has become a major challenge due to the imbalanced datasets in the field. A real-world imbalanced dataset is used in this work with 12 variables in 30144 cases, where most of the cases were classified as... more
Many practical data mining systems such as those for fraud detection and surveillance deal with building classifiers that are not autonomous but part of a larger interactive system with an expert in the loop. The goal of these systems is... more
It is often expensive to acquire data in real-world data mining applications. Most previous data mining and machine learning research, however, assumes that a fixed set of training examples is given. In this paper, we propose an online... more
Learning with different classification costs; cost-sensitive classification. Cost-sensitive learning is a type of learning in data mining that takes the misclassification costs (and possibly other types of cost) into consideration. The... more
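As a hedged illustration of taking misclassification costs into consideration, the sketch below picks the class with the lowest expected cost given estimated class probabilities. It is a minimal example of the general idea, not the method of the truncated abstract; the function name `min_expected_cost_class` and the cost values are assumptions made up for illustration.

```python
from typing import Sequence


def min_expected_cost_class(
    probs: Sequence[float], cost: Sequence[Sequence[float]]
) -> int:
    """Return the class index with the lowest expected misclassification cost.

    probs[j]   : estimated probability that the true class is j
    cost[j][i] : cost of predicting class i when the true class is j
    """
    n_classes = len(probs)
    # Expected cost of predicting class i, averaged over the possible true classes.
    expected = [
        sum(probs[j] * cost[j][i] for j in range(n_classes))
        for i in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical costs: missing a positive (true class 1) costs 10,
# a false alarm (true class 0 predicted as 1) costs 1.
COST = [
    [0.0, 1.0],
    [10.0, 0.0],
]

# Class 0 is more probable, but predicting class 1 has the lower expected cost.
decision = min_expected_cost_class([0.8, 0.2], COST)
```

With uniform costs the rule reduces to picking the most probable class; the asymmetric cost matrix is what shifts the decision toward the rarer, more expensive class.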
AI has the potential to greatly transform the field of healthcare management through improving the accuracy of diagnoses, effectiveness of treatments, and overall experience of patients. This paper aims at reviewing the... more
The past decade has seen significant interest in the problem of inducing decision trees that take account of costs of misclassification and costs of acquiring the features used for decision making. This survey identifies over 50... more
Traditionally, software quality models are built by assuming a uniform misclassification cost. In other words, the cost implications of misclassifying a fault-prone module as fault-free are assumed to be the same as the cost implications of... more
The usage of data is increasing day by day. A huge amount of data storage is required to handle the millions of tweets and shares on social networks (Twitter, Facebook, WhatsApp, and YouTube) per second. Databases are playing a... more
Artificial intelligence (AI) is increasingly integrated into healthcare systems, promising advancements in diagnostics, treatment optimization, and patient care. However, patient apprehensions regarding AI adoption persist, influenced by... more
In the States, the rapid penetration of mobile platforms is changing things for people and enterprises, making life easier and more accessible. This move, however, brings with it new cyber threats that... more
Cyber threats targeting these systems are becoming increasingly refined as the use of online banking platforms continues to grow. To address these concerns, this paper offers insight into how integrating AI and ML algorithms can... more
Machine learning model validation is a cornerstone and a requisite before a machine learning model can be generalized or relied upon. This paper analyses the validation strategy challenges and solutions to quantify cross validation... more
Predicting hospital readmission is crucial for improving patient care and optimizing healthcare resource allocation. Traditional methods often overlook the imbalanced costs associated with different types of prediction errors. This study... more
Brain-computer interface (BCI) technology holds promise for revolutionizing interaction between humans and machines, enabling direct communication pathways based on neural signals. However, optimizing BCI applications requires overcoming... more
In the realm of pervasive computing, where interconnected devices seamlessly integrate into everyday environments, the role of middleware becomes pivotal in managing and leveraging contextual information. This study proposes a... more
In building bankruptcy prediction models, a class imbalance problem arises that limits the performance of the models. Most prior research addressed the problem by applying resampling methods such as the synthetic minority... more
Existing supervised learning techniques can support product recommendations but are ineffective in scenarios characterized by single-class learning; i.e., training samples consist of some positive examples and a much greater number of... more
Parkinson's disease is a neurodegenerative condition that affects millions of people worldwide. This abstract aims to shed light on the causes and consequences of this debilitating condition. The primary cause of Parkinson's disease is... more
Classification of data, the process of classifying documents into predefined categories, has become an important research area. Unbalanced data sets, a problem often found in real-world applications, can have a seriously negative effect on... more
Boosting is one of the most important recent developments in classification methodology. It can significantly improve the prediction performance of any single classification algorithm and has been successfully applied to many different... more
Health datasets typically comprise data that are heavily skewed towards the healthy class, resulting in classifiers that are biased towards this majority class. Due to this imbalance of data, traditional performance metrics, such as... more
The use of classification methods in real-world problems has costs that are usually neglected in the early algorithms which cause inefficiencies in practice. One of these costs, which is significant in many cases, is the cost of obtaining... more
Diabetes mellitus is a troublesome chronic disease characterized by hyperglycemia. Given the growing morbidity, it is estimated that by 2040 the number of diabetic patients worldwide will exceed 642 million. This means that each one of... more
Cost-sensitive learning, which deals with classification problems that have non-uniform costs, has attracted great attention from the machine learning and data mining communities in recent years. In this study, a rescaling-based... more
Classical construction of ecological models follows one of the following two approaches: (1) either the measured data are analyzed with a statistical approach and a black-box statistical model is constructed, or (2) the model is deduced... more
The use of classification as a data mining approach for performance prediction has been studied by many eminent researchers. The objective of this study is to determine the best classification models for predicting At Risk status of... more
Loan fraud is a critical factor in the insolvency of financial institutions, so companies make an effort to reduce the loss from fraud by building a model for proactive fraud prediction. However, there are still two critical problems to... more
Conformal prediction (CP) is a wrapper around traditional machine learning models, giving coverage guarantees under the sole assumption of exchangeability; in classification problems, a CP guarantees that the error rate is at most a... more
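To make the coverage guarantee concrete, here is a minimal sketch of split conformal classification, assuming a held-out calibration set and the common nonconformity score "one minus the predicted probability of the true label". The function names and the score choice are illustrative assumptions and are not taken from the truncated abstract.

```python
import math
from typing import Dict, List, Sequence


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Return the calibration quantile used to build prediction sets.

    cal_scores : nonconformity scores of held-out calibration examples,
                 here assumed to be 1 - p(true label)
    alpha      : target error rate (the sets should miss the true label
                 with probability at most alpha under exchangeability)
    """
    n = len(cal_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every label.
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(probs: Dict[str, float], threshold: float) -> List[str]:
    """Keep every label whose nonconformity score is within the threshold."""
    return [label for label, p in probs.items() if 1.0 - p <= threshold]


# Hypothetical calibration scores and a hypothetical test-time prediction.
scores = [0.10, 0.40, 0.20, 0.30, 0.05, 0.25, 0.15]
threshold = conformal_threshold(scores, alpha=0.25)
labels = prediction_set({"fraud": 0.75, "legit": 0.25}, threshold)
```

The underlying model is untouched; only the threshold is calibrated, which is why the approach works as a wrapper around any classifier that outputs scores.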
Multiclass imbalanced data problems in data mining are currently an interesting topic of study. These problems influence the classification process in machine learning. Some cases showed that minority class in the... more
Discretization is defined as the process that divides continuous numeric values into intervals of discrete categorical values. In this article, the concept of cost-based discretization as a pre-processing step to the induction of a... more
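The definition of discretization above can be sketched as follows: a continuous value is mapped to the label of the interval it falls into. This shows plain interval binning only; how the cut points are chosen (for example, by a cost-based criterion) is not shown, and the cut points and labels used here are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def discretize(value: float, cut_points: Sequence[float], labels: Sequence[str]) -> str:
    """Map a continuous value to the categorical label of its interval.

    cut_points must be sorted ascending; labels needs one more entry than
    cut_points. Intervals are closed on the left: [cut, next_cut).
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("labels must have exactly len(cut_points) + 1 entries")
    # bisect_right counts how many cut points are <= value,
    # which is the index of the interval containing the value.
    return labels[bisect_right(cut_points, value)]


# Hypothetical cut points for an age attribute.
CUTS = [18.0, 65.0]
LABELS = ["minor", "adult", "senior"]

binned = [discretize(age, CUTS, LABELS) for age in (17.9, 18.0, 42.5, 70.0)]
```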
Heart disease, one of the major causes of mortality worldwide, can be mitigated by early heart disease diagnosis. A clinical decision support system (CDSS) can be used to diagnose the subjects' heart disease status earlier. This study... more
The global need for effective disease diagnosis remains substantial, given the complexities of various disease mechanisms and diverse patient symptoms. To tackle these challenges, researchers, physicians, and patients are turning to... more
We design a new adaptive learning algorithm for misclassification cost problems that attempts to reduce the cost of misclassified instances derived from the consequences of various errors. Our algorithm (adaptive cost sensitive... more
This article has been accepted for publication in a future issue of this journal, but it is not yet the definitive version. Content may undergo additional copyediting, typesetting and review before the final publication.
Cost-sensitive learning based on Bregman divergences (Santos-Rodríguez et al.). Outline: 1. Introduction; 2. Posterior probability estimation; 3. Designing Bregman divergences; 4. Towards a maximum margin classifier; 5. Conclusions... more
A lot of approaches, each following a different strategy, have been proposed in the literature to provide AdaBoost with cost-sensitive properties. In the first part of this series of two papers, we have presented these algorithms in a... more
Classification is one important area in machine learning that labels the class of an instance via a classifier from known-class historical data. One of the popular classifiers is k-NN, which stands for “k-nearest neighbor” and requires a... more
The paper considers a conformal prediction method for a bounded regression task. The predictor is based on the Defensive Forecast algorithm and has been applied to a medical prognostic problem. These empirical results are compared and... more
The work describes an application of a recently developed machine-learning technique called Mondrian predictors to risk assessment of ovarian and breast cancers. The analysis is based on mass spectrometry profiling of human serum samples... more
Semi-supervised learning is one of the significant fields in machine learning and data mining. It deals with datasets that have many unlabeled and a few labeled samples. In this study we aim to predict students' success in educational... more
In recent years, it has become critical for financial institutions to develop efficient solutions that leverage all available heterogeneous customer data. While financial crime is growing in scale and complexity, financial analysts... more
This paper presents the fraud detection problem as one of the most common problems in the secure banking research field, due to its importance in reducing the losses of banks and e-transaction companies. Our work will include: applying the... more
Early disease prediction plays an important role in improving healthcare quality and can help individuals avoid dangerous health situations before it is too late. This paper proposes a disease prediction model (DPM) to provide an early... more
We provide a unifying perspective for two decades of work on cost-sensitive Boosting algorithms. When analyzing the literature 1997-2016, we find 15 distinct cost-sensitive variants of the original algorithm; each of these has its own... more
In cost-sensitive learning, misclassification costs can vary for different classes. This paper investigates an approach that reduces multi-class cost-sensitive learning to a standard classification task based on the data space expansion... more
This research explores Cost-Sensitive Learning (CSL) in the fraud detection domain to decrease the fraud class's incorrect predictions and increase its accuracy. Notably, we concentrate on shill bidding fraud that is challenging to detect... more
This paper describes modelling of the time behaviour of phytoplankton and zooplankton in the Danish lake Glumsø with a recently developed approach to machine learning in numerical domains, called Q² learning. An essential part of this... more