Journal of Advanced Research in Dynamical and Control Systems, 2019
The widely used pattern classification technique, Linear Discriminant Analysis (LDA), creates a classifier to allocate new observations into one of two known groups based on training sample mean vectors and covariance matrices. The optimality of the classifier's performance depends on the accuracy of the estimators. The default classical estimators are known to be easily influenced by contaminated data. Nevertheless, the influence of data contamination can be reduced by several approaches, including trimming. One existing trimming approach is the distance-based trimmed mean, but it has a drawback: it uses the sensitive-to-contamination classical mean as the location estimator. Thus, to overcome the sensitivity of the location estimator to contamination, a distance-based trimmed median is proposed. In this paper, three discriminant rules are developed: classical (CLDR), distance-based trimmed mean (RLDR M) and distance-based trimmed median (RLDR T). CLDR was constructed using classical estimators, while the two robust classifiers were constructed using the α-trimmed mean and the α-trimmed median, each paired with a robust covariance, producing RLDR M and RLDR T respectively. The simulation study showed that CLDR fared the worst compared with RLDR T and RLDR M when outliers exist. RLDR T is deemed the best among the investigated rules.
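As a rough illustration of how such a rule depends on its plug-in estimators, the sketch below builds a two-group linear discriminant rule from any pair of location vectors and a pooled covariance matrix. It assumes equal priors and equal misclassification costs, which the abstract does not state, and the function name `linear_discriminant_rule` is hypothetical. Passing classical or robust estimates changes only the inputs, not the rule.

```python
import numpy as np

def linear_discriminant_rule(mean1, mean2, pooled_cov):
    """Return a classifier for two groups (assumes equal priors and costs).

    mean1, mean2 : location estimates for group 1 and group 2
                   (classical means or robust alternatives).
    pooled_cov   : pooled covariance estimate shared by both groups.
    """
    m1 = np.asarray(mean1, dtype=float)
    m2 = np.asarray(mean2, dtype=float)
    cov = np.asarray(pooled_cov, dtype=float)

    # Discriminant direction: cov^{-1} (m1 - m2), solved without explicit inversion.
    w = np.linalg.solve(cov, m1 - m2)
    midpoint = (m1 + m2) / 2.0

    def classify(x):
        # A non-negative score allocates the observation to group 1.
        score = (np.asarray(x, dtype=float) - midpoint) @ w
        return 1 if score >= 0 else 2

    return classify

# Usage with made-up estimates (illustrative values only).
rule = linear_discriminant_rule([0.0, 0.0], [4.0, 4.0], np.eye(2))
label_a = rule([0.5, 0.5])
label_b = rule([3.5, 3.0])
```

Because the estimators enter only through `mean1`, `mean2` and `pooled_cov`, any contamination in them shifts the decision boundary directly.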
The presence of outliers in a dataset can cause the outcome of classical statistical tools to be inaccurate, especially in a multivariate context, where researchers have to deal with cellwise outliers, casewise outliers, or both. This study investigated the accuracy of the Multiple Discriminant Rule (MDR) when both cellwise and casewise outliers exist in a proportionate manner. Classical MDR (CMDR) was constructed using the classical sample mean (x̄) and sample covariance (S), while Robust MDR (RMDRHL) was constructed using the Hodges-Lehmann estimator (𝛉̂𝐇𝐋) and Robust Covariance (SR). In the simulation, cellwise outliers were shifted in location, while casewise outliers involved location, covariance and dual influence. Based on the simulation results, although CMDR and RMDRHL performed similarly under cellwise-location and casewise-location outliers, RMDRHL outperformed CMDR under both the cellwise-location with casewise-covariance and the cellwise-location with casewise-dual conditions. In summary, the use of 𝛉̂𝐇𝐋 in robustifying MDR was competent even when the percentage of outliers exceeded its tolerance.
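For readers unfamiliar with the Hodges-Lehmann estimator, a minimal sketch follows. It uses the standard one-sample definition (the median of all pairwise Walsh averages, including each point paired with itself). Applying it coordinate by coordinate to obtain a multivariate location vector is an assumption here, as the abstract does not say how the estimator was extended. Both function names are hypothetical.

```python
from itertools import combinations_with_replacement
from statistics import mean, median

def hodges_lehmann(values):
    """One-sample Hodges-Lehmann estimate: median of all Walsh averages."""
    data = list(values)
    if not data:
        raise ValueError("need at least one observation")
    # Walsh averages (x_i + x_j) / 2 for all i <= j.
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(data, 2)]
    return median(walsh)

def hodges_lehmann_vector(rows):
    """Coordinate-wise Hodges-Lehmann location (an assumed multivariate extension)."""
    return [hodges_lehmann(column) for column in zip(*rows)]

# One gross outlier pulls the mean far more than the Hodges-Lehmann estimate.
sample = [1, 2, 3, 4, 100]
hl_estimate = hodges_lehmann(sample)
classical_mean = mean(sample)
```

The pairwise construction costs O(n²) averages, which is acceptable for small simulated samples but worth noting for larger data.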
AIP Conf. Proc. of The 7th International Conference on Quantitative Sciences and its Applications (ICOQSIA2022), 2023
Multivariate data may be contaminated by cellwise and/or casewise outliers. Cellwise outliers are individual data points within a variable that are extreme, whereas casewise outliers are observations that come from a different distribution. Similar to other parametric methods, the Classical Multiple Discriminant Rule (CMDR) achieves optimal performance only when the normality assumption is fulfilled. The coexistence of cellwise and casewise outliers can disrupt the data distribution of the sample. Thus, to alleviate the problem, this paper employed a distribution-free estimator, the Harrell-Davis Median (𝛉̂𝐇𝐃), together with Robust Covariance (𝐒𝐑) to construct Robust MDR (RMDRHD). The MDRs were evaluated based on misclassification rate via a simulation study. The simulation results show that RMDRHD achieves a consistently lower misclassification rate than CMDR. Overall, the findings confirmed that the use of the distribution-free 𝛉̂𝐇𝐃 to robustify MDR is practical when dealing with both cellwise and casewise outliers.
Multivariate outliers can exist in two forms, casewise and cellwise. Collected data typically contain unknown proportions and types of outliers, which can jeopardize location estimation and affect research findings. When the two types coexist in the same data set, the traditional distance-based trimmed mean and coordinate-wise trimmed mean are unable to estimate location well. The distance-based trimmed mean suffers from leftover cellwise outliers after trimming, whereas the coordinate-wise trimmed mean is affected by extra casewise outliers. Thus, this paper proposes a new robust multivariate location estimator, known as the α-distance-based trimmed median, to deal with both types of outliers simultaneously in a data set. Simulated data were used to illustrate the feasibility of the new procedure by comparing it with the classical mean, the classical median and the α-distance-based trimmed mean. The classical mean performed best on clean data, but not on contaminated data. Meanwhile, the classical median outperformed the distance-based trimmed mean when dealing with both casewise and cellwise outliers, but was still affected by the combined effect of the outliers. Based on the simulation results, the proposed α-distance-based trimmed median yields better location estimation on contaminated data than the other three estimators considered in this paper. Thus, the proposed estimator can mitigate the issues of outliers and provide a better location estimation.
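One plausible reading of an α-distance-based trimmed median can be sketched as follows: compute squared Mahalanobis distances from the classical mean and covariance, discard the α fraction of observations with the largest distances, then take the coordinate-wise median of what remains. The abstract does not give the exact algorithm, so the trimming rule, the function name `distance_trimmed_median` and the default `alpha` are all assumptions.

```python
import numpy as np

def distance_trimmed_median(X, alpha=0.1):
    """Coordinate-wise median after trimming the alpha fraction of rows
    with the largest squared Mahalanobis distance (assumed procedure)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Classical centre and covariance used only to rank observations.
    center = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    diff = X - center

    # Squared Mahalanobis distance of each row; pinv guards against singular cov.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.pinv(cov), diff)

    # Keep the n - floor(alpha * n) rows closest to the centre.
    n_keep = n - int(np.floor(alpha * n))
    kept = np.argsort(d2)[:n_keep]

    # Median of the retained rows handles leftover cellwise outliers.
    return np.median(X[kept], axis=0)

# Illustrative contaminated sample: 90 clean rows plus 10 casewise outliers.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(90, 2))
outliers = rng.normal(50.0, 1.0, size=(10, 2))
data = np.vstack([clean, outliers])

trimmed_median = distance_trimmed_median(data, alpha=0.1)
classical_mean = data.mean(axis=0)
```

In this toy setup the classical mean is dragged toward the outlying cluster, while the trimmed median stays near the centre of the clean rows.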
The commonly employed classical linear discriminant rule, based on the classical mean and covariance, is highly sensitive to outliers. Therefore, outlier influence on location and scale estimation will affect the accuracy of a discriminant rule and lead to high misclassification rates. Past studies used the classical Mahalanobis Squared Distance (MSD) to alleviate the problem. However, the highly sensitive mean and covariance can still affect the distance computation, causing masking and swamping effects. In a previous study, researchers proposed a double trimming procedure that incorporated the MSD-based α-trimmed mean into the MSD-based α-trimmed median to construct a robust classifier. However, that procedure has an overlooked flaw because it employed the classical MSD in the computation. Thus, this study proposed to employ a robust MSD for the distance-based trimmed median procedure. The improved trimmed median was then used to construct a robust linear discriminant rule ...
Papers by Yik Siong Pang