local outlier factor (LOF)

231 papers

About this topic

Local Outlier Factor (LOF) is an algorithm used in anomaly detection that identifies outliers in a dataset by measuring the local density deviation of a data point with respect to its neighbors. It quantifies how isolated a point is compared to its surrounding points, allowing for the detection of local anomalies.
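To make the density comparison concrete, here is a minimal brute-force sketch of the textbook LOF computation in plain Python: k-distance, reachability distance, local reachability density, and the final ratio. It assumes Euclidean distance, no duplicate points, and exactly k neighbours per point (ties at the k-distance are not expanded). The function name `lof_scores` is illustrative and not taken from any library. A score near 1 means a point is about as dense as its neighbours; a score well above 1 marks a local outlier.

```python
import math


def lof_scores(points, k=3):
    """Local Outlier Factor for each point (brute force, O(n^2) distances)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point, excluding the point itself.
    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(k / sum(reach))

    # LOF: average ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = lof_scores(data, k=3)
```

On this toy data the five clustered points score close to 1, while (10, 10) scores far above 1.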

Key research themes

1. How can Local Outlier Factor (LOF) algorithms be adapted for scalable, real-time outlier detection in dynamic, high-dimensional data environments such as data streams and spatio-temporal data?

This research theme focuses on the challenges and methods of extending LOF algorithms—originally designed for static datasets—to handle the complexities of big data streams, multidimensional data, and spatio-temporal outliers. Key issues include computational scalability, concept drift responsiveness, and integrating spatial-temporal context for improved detection accuracy. Addressing these challenges enables LOF to operate effectively in domains requiring continuous anomaly monitoring such as network intrusion detection, sensor networks, and environmental monitoring.

A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams

by Terence Soule

2021, Big Data and Cognitive Computing

Key finding: This paper systematically reviews LOF and its variants developed for data stream environments, highlighting the limitations of traditional LOF on static datasets when applied to streaming data due to concept drift and volume...

Spatio-temporal outlier detection algorithms based on computing behavioral outlierness factor

by shaaban M abbady

2024, Data & Knowledge Engineering

Key finding: The authors introduce ST-BOF, a spatio-temporal extension to LOF, enabling simultaneous evaluation of spatial and temporal contexts. Their ST-BDBCAN and Approx-ST-BDBCAN algorithms leverage ST-BOF for clustering and outlier...

Online Outlier Detection Based on Relative Neighbourhood Dissimilarity

by Nguyễn Hoàng Vũ

2024, Lecture Notes in Computer Science

Key finding: This work proposes an unsupervised online outlier detection technique for multi-dimensional data streams using Relative Neighbourhood Dissimilarity (ReND). ReND adaptively learns from streaming data under concept drift,...

Disk-Based Sampling for Outlier Detection in High Dimensional Data

by Gia uyên Nguyễn phạm

2025, cse.iitb.ac.in

Key finding: The paper presents a novel sampling-based outlier detection method designed for large, high-dimensional datasets that combines randomized partitioning with efficient sampling to create a candidate outlier set. This approach...

2. What methodologies and statistical measures optimize the robustness and accuracy of outlier detection in multivariate, skewed, and circular data distributions beyond classical LOF applications?

This research theme investigates adapting LOF and other related methods to specialized data types such as skewed multivariate distributions and circular data, emphasizing robustness to data characteristics that invalidate assumptions of classical methods. It explores combining LOF with statistical depth functions, adjusted boxplots, and alternative metrics for better detection performance in these non-standard data spaces critical to applications like directional data analysis, regression diagnostics, signal processing, and environmental measurements.

Outlier Detection for Multivariate Skew-Normal Data: A Comparative study

by Herve Dovoedo and

2015

Key finding: This study compares the outlier detection capability of four robust outlyingness functions tailored for multivariate skew-normal distributions, noting that traditional Mahalanobis distance methods are insufficient for skewed...
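For reference, the classical baseline mentioned here flags a point by its squared Mahalanobis distance from the sample mean. The two-dimensional, pure-Python sketch below uses the ordinary (non-robust) mean and covariance and compares against 5.991, the 0.95 quantile of the chi-square distribution with 2 degrees of freedom. The function name and sample data are illustrative; none of the study's four robust outlyingness functions is reproduced.

```python
def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance entries (denominator n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Closed-form inverse of the 2x2 covariance matrix.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    return [ixx * (x - mx) ** 2 + 2 * ixy * (x - mx) * (y - my) + iyy * (y - my) ** 2
            for x, y in points]


CHI2_2DF_95 = 5.991  # 0.95 quantile of the chi-square distribution, 2 d.f.

sample = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (5, 5)]
d2 = mahalanobis_sq_2d(sample)
flagged = [p for p, v in zip(sample, d2) if v > CHI2_2DF_95]
```

Here (5, 5) receives the largest distance but still stays just below the cutoff, so `flagged` is empty: the extreme point inflates the very mean and covariance it is measured against. Robust estimators are the usual remedy for this masking effect.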

Detection of Outliers in Univariate Circular Data by Means of the Outlier Local Factor (LOF)

by Ali H Abuzaid

2022, Statistics in Transition New Series

Key finding: The authors extend LOF to univariate circular data by mapping angular observations to bivariate Cartesian coordinates and applying LOF in this transformed space. This approach overcomes limitations of existing...
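The mapping step can be sketched as follows, assuming angles in degrees and Euclidean (chord) distance between the mapped points. The compact `lof` helper repeats the textbook definition and is not the authors' code. The point of the transformation is that 350° and 10° end up close together on the unit circle, which a naive distance on raw angle values would miss.

```python
import math


def to_unit_circle(angles_deg):
    """Map each angle to (cos theta, sin theta) on the unit circle."""
    return [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in angles_deg]


def lof(points, k=3):
    """Textbook Local Outlier Factor with Euclidean distance (brute force)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    nbrs = [sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
            for i in range(n)]
    kdist = [d[i][nbrs[i][-1]] for i in range(n)]
    # Local reachability density from reachability distances.
    lrd = [k / sum(max(kdist[j], d[i][j]) for j in nbrs[i]) for i in range(n)]
    return [sum(lrd[j] for j in nbrs[i]) / (k * lrd[i]) for i in range(n)]


angles = [350, 355, 358, 0, 2, 5, 10, 15, 180]
scores = lof(to_unit_circle(angles), k=3)
```

The observations around 0° form one cluster across the wrap-around, so only the 180° observation receives a large score.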

Boxplot-Based Outlier Detection for the Location-Scale Family

by Herve Dovoedo and

2015

Key finding: Focusing on univariate location-scale families including skewed distributions, this paper modifies traditional boxplot fences by employing semi-interquartile ranges and controlling false positive outlier rates over sample...
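A generic version of semi-interquartile-range fences can be sketched as below: the lower fence uses the spread between Q1 and the median, the upper fence the spread between the median and Q3, so a long right tail widens only the upper fence. The constant k = 1.5, the quartile convention of `statistics.quantiles`, and the function names are assumptions; the paper's calibrated constants that control the false positive rate are not reproduced.

```python
import statistics


def siqr_fences(data, k=1.5):
    """Lower and upper fences built from the two semi-interquartile ranges."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    siqr_lower = q2 - q1  # spread below the median
    siqr_upper = q3 - q2  # spread above the median
    # Each fence scales with the spread on its own side of the median.
    return q1 - 2 * k * siqr_lower, q3 + 2 * k * siqr_upper


def flag_outliers(data, k=1.5):
    lo, hi = siqr_fences(data, k)
    return [x for x in data if x < lo or x > hi]


skewed = [1, 2, 2, 3, 3, 3, 4, 5, 7, 9, 30]
print(siqr_fences(skewed))    # asymmetric fences: (-1.0, 19.0)
print(flag_outliers(skewed))  # [30]
```

For symmetric data the two semi-interquartile ranges each equal half the IQR, so these fences reduce to Tukey's Q1 - 1.5·IQR and Q3 + 1.5·IQR.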

Robust multiple discriminant rule using Harrell-Davis median estimator: A distribution-free approach to cellwise-casewise outliers coexistence

by Yik Siong Pang

2023, AIP Conf. Proc. of The 7th International Conference on Quantitative Sciences and its Applications (ICOQSIA2022)

Key finding: This paper proposes a combination of the distribution-free Harrell-Davis median estimator with robust covariance estimation to handle simultaneous cellwise and casewise multivariate outliers, improving classification accuracy...
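The Harrell-Davis estimator replaces the single middle order statistic with a weighted average of all order statistics, where the weight of the i-th one is the Beta(p(n+1), (1-p)(n+1)) probability of the interval [(i-1)/n, i/n]. The standard-library sketch below handles only the median (p = 0.5) and integrates the Beta density with Simpson's rule instead of calling a special-function library; the helper names are hypothetical, and the paper's robust covariance step and discriminant rule are not shown.

```python
def _simpson(f, lo, hi, steps=200):
    """Composite Simpson's rule for the integral of f on [lo, hi] (steps even)."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for s in range(1, steps):
        total += (4 if s % 2 else 2) * f(lo + s * h)
    return total * h / 3


def harrell_davis_median(data):
    """Harrell-Davis median: a Beta-weighted average of all order statistics."""
    x = sorted(data)
    n = len(x)
    a = b = (n + 1) / 2  # Beta(p(n+1), (1-p)(n+1)) parameters with p = 0.5

    def density(t):
        # Unnormalised Beta(a, b) density; the constant cancels when we normalise.
        return t ** (a - 1) * (1 - t) ** (b - 1)

    # Weight of the i-th order statistic: Beta mass on [(i-1)/n, i/n].
    w = [_simpson(density, (i - 1) / n, i / n) for i in range(1, n + 1)]
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
```

Every observation receives some weight, so the estimate moves smoothly with the data: for [1, 2, 3, 4, 100] it rises above the sample median of 3.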

3. How can Local Outlier Factor (LOF) be integrated with advanced machine learning and deep learning techniques to improve anomaly detection in complex rule-based systems and cybersecurity?

This research avenue explores hybrid approaches that combine LOF with modern AI architectures including autoencoders, attention mechanisms, and clustering algorithms to optimize detection of anomalous patterns in rule-based knowledge bases, network security, and related domains. The focus is on enhancing feature representation, temporal dependency modeling, and leveraging unsupervised learning paradigms to complement LOF's density-based outlier scoring for improved precision and reduced false alarms.

Detecting outliers in rule-based knowledge bases using Self-Organizing Map and Local Outlier Factor algorithms

by Czesław Horyń

2023, Procedia Computer Science

Key finding: This research demonstrates the synergistic use of LOF and Self-Organizing Maps (SOM) to identify unusual or rare rules in rule-based knowledge bases, improving completeness and quality of decision support systems. LOF...

DDOS attacks detection based on attention-deep learning and local outlier factor

by Abdelkader Dairi

2023

Key finding: The paper presents a semi-supervised framework combining an attention-equipped GRU autoencoder with LOF for feature extraction and anomaly scoring in DDOS detection. The integration of temporal feature learning (via GRU and...

Hybrid Machine Learning–Statistical Method for Anomaly Detection in Flight Data

by Alan Muscat

2023, Applied sciences

Key finding: By incorporating the LOF algorithm within a hybrid machine learning and statistical framework, this study improves unsupervised anomaly detection in high-dimensional flight data monitoring. The methodology adapts LOF to...

An Efficient Hashing-based Ensemble Method for Collaborative Outlier Detection

by Kitty Li

2023

Key finding: This work enhances collaborative outlier detection by employing locality-sensitive hashing (LSH) ensemble methods combined with LOF principles, enabling mergeable and privacy-preserving model aggregation across decentralized...

All papers in local outlier factor (LOF)

Relative Asymptotics of Orthogonal Polynomials for Perturbed Measures

by Nikos Stylianopoulos

2025, arXiv (Cornell University)

CATS: Cluster-Aided Two-Step Approach for Anomaly Detection in Smart Manufacturing

by Dattaprasad Shetve

2025, Springer eBooks

In the age of smart manufacturing, there is typically a multitude of sensors connected to each assembly line. The amount of data generated could be used to create a digital twin model of the complete process, wherein virtual...

Detection Procedure for a Single Additive Outlier and Innovational Outlier in a Bilinear Model

by Ibrahim Ibrahim Ahmad

2025, Pakistan Journal of Statistics and Operation Research

A single-outlier detection procedure for data generated from BL(1,1,1,1) models is developed. It is carried out in three stages. First, the measures of impact of an innovational outlier (IO) and an additive outlier (AO), denoted by $\omega_{IO}$ and $\omega_{AO}$, respectively, are derived based on...
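To see why the two outlier types need separate impact measures, the sketch below simulates a BL(1,1,1,1) process, written here (as an assumed sign convention) as X_t = φX_{t-1} + e_t + θe_{t-1} + βX_{t-1}e_{t-1}. An additive outlier shifts a single observation, whereas an innovational outlier enters through e_t and is propagated by the recursion. The parameter values and function names are illustrative, and the paper's three-stage test is not reproduced.

```python
import random


def simulate_bl1111(noise, phi, theta, beta, io_at=None, omega=0.0):
    """Simulate X_t = phi*X_{t-1} + e_t + theta*e_{t-1} + beta*X_{t-1}*e_{t-1}.

    If io_at is set, an innovational outlier of size omega is added to e_{io_at}.
    """
    x_prev, e_prev = 0.0, 0.0
    series = []
    for t, z in enumerate(noise):
        e_t = z + (omega if t == io_at else 0.0)
        x_t = phi * x_prev + e_t + theta * e_prev + beta * x_prev * e_prev
        series.append(x_t)
        x_prev, e_prev = x_t, e_t
    return series


def add_additive_outlier(series, at, omega):
    """An additive outlier shifts only the single observation at time `at`."""
    return [x + omega if t == at else x for t, x in enumerate(series)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(50)]
clean = simulate_bl1111(noise, phi=0.4, theta=0.3, beta=0.1)
ao = add_additive_outlier(clean, at=20, omega=5.0)
io = simulate_bl1111(noise, phi=0.4, theta=0.3, beta=0.1, io_at=20, omega=5.0)
```

The AO series departs from the clean one only at t = 20, while the IO series keeps differing after t = 20 because the shock is fed back through the lagged terms.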

A Critical Review on Outlier Detection Techniques

by ritu Gautam

2025

Outlier detection is a data mining application. Outliers, which include noisy data, are researched in various domains. Various, more generic techniques are already being researched. We surveyed various techniques and...

An Outlier Detection-based Tree Selection Approach to Extreme Pruning of Random Forests

by Khaled Fawagreh

2025, arXiv (Cornell University)

Random Forest (RF) is an ensemble classification technique that was developed by Breiman over a decade ago. Compared with other ensemble techniques, it has proved its accuracy and superiority. Many researchers, however, believe that there...

An Outlier Ranking Tree Selection Approach to Extreme Pruning of Random Forests

by Khaled Fawagreh

2025, Communications in Computer and Information Science

An outlier ranking tree selection approach to extreme pruning of random forests. FAWAGREH, K., GABER, M.M. and ELYAN, E. 2016. An outlier ranking tree selection approach to extreme pruning of random forests. In Jayne, C. and Iliadis, L....

Spectral analysis of 2D outlier layout

by Mihai Putinar

2025, Journal of Spectral Theory

Thompson's partition of a cyclic subnormal operator into normal and completely non-normal components is combined with a noncommutative calculus for hyponormal operators for separating outliers from the cloud, in rather general point...

Discriminative Feature Selection by Nonparametric Way with Cluster Validation

by Dharmaiah Devarapalli

2025

Feature selection is the preprocessing step of identifying a subset of data from high-dimensional data. Feature selection algorithms are used to identify the required data. Like the Relief and Parzen-Relief algorithms, it attempts to...

A flexible outlier detector based on a topology given by graph communities

by Oriol Terrades

2025, arXiv (Cornell University)

Outlier, or anomaly, detection is essential for optimal performance of machine learning methods and statistical predictive models. It is not just a technical step in a data cleaning process but a key topic in many fields such as...

A Flexible Outlier Detector Based on a Topology Given by Graph Communities

by Oriol Terrades

2025, Big Data Research

AEGR: a simple approach to gradient reversal in autoencoders for network anomaly detection

by Kasra Babaei

2025, Soft Computing

Anomaly detection refers to a process whose aim is to detect data points that follow a different pattern from the majority of data points. Anomaly detection methods suffer from several well-known challenges that hinder their...

Anomaly Detection for Raw Water Quality – A Comparative Analysis of the Local Outlier Factor Algorithm and the Random Forest Algorithms

by Kiragu Henry

2025, International Journal of Computer Applications

The increased use of real-time water quality monitoring with automated sensor systems both demands and makes possible the timely identification of unexpected values. Anomalies are caused by technical issues that are likely to prevent...

Anomaly Detection for Raw Water Quality – A Comparative Analysis of the Local Outlier Factor Algorithm and the Random Forest Algorithms

by Nahshon Mokua

2025, International Journal of Computer Applications

Anomalous human behavior detection: an adaptive approach

by Klamer Schutte

2025, Proceedings of SPIE

Detection of anomalies (outliers or abnormal instances) is an important element in a range of applications such as fault, fraud, suspicious behavior detection and knowledge discovery. In this article we propose a new method for anomaly...

Fine asymptotics for Bergman polynomials over domains with corners

by Nikos Stylianopoulos

2025, arXiv (Cornell University)

Let $G$ be a bounded simply-connected domain in the complex plane $\mathbb{C}$, whose boundary $\Gamma := \partial G$ is a Jordan curve, and let $\{p_n\}_{n=0}^{\infty}$ denote the sequence of Bergman polynomials of $G$. This is defined as the sequence of polynomials that are...

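Strong asymptotic estimates for Bergman polynomials over domains with a piecewise analytic boundary (Estimations asymptotiques fortes pour les polynômes de Bergman sur des domaines ayant une frontière analytique par morceaux)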

by Nikos Stylianopoulos

2025

Let $G$ be a bounded simply-connected domain in the complex plane $\mathbb{C}$, whose boundary $\Gamma := \partial G$ is a Jordan curve, and let $\{p_n\}_{n=0}^{\infty}$ denote the sequence of Bergman polynomials of $G$. This is defined as the sequence of polynomials that are...

Random Projection in supervised non-stationary environments

by Frank-michael Schleif

2025, The European Symposium on Artificial Neural Networks

Random Projection (RP) is a popular and efficient technique to preprocess high-dimensional data and to reduce its dimensionality. While RP has been widely used and evaluated in stationary data analysis scenarios, non-stationary...
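As a concrete reference point, a common RP variant multiplies each d-dimensional row by a d-by-k matrix with i.i.d. N(0, 1/k) entries, which preserves squared Euclidean norms and distances in expectation. The plain-Python sketch below assumes a fixed seed for reproducibility; the function name is hypothetical, and the non-stationary setting the paper studies is not modelled.

```python
import math
import random


def gaussian_random_projection(rows, k, seed=0):
    """Project d-dimensional rows to k dimensions with a Gaussian random matrix.

    Entries are i.i.d. N(0, 1/k), so squared Euclidean norms and distances are
    preserved in expectation.
    """
    d = len(rows[0])
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    r = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    return [[sum(x[i] * r[i][j] for i in range(d)) for j in range(k)] for x in rows]


projected = gaussian_random_projection([[1.0, 2.0, 3.0, 4.0]], k=2, seed=42)
```

Because the projection is a fixed linear map for a given seed, it can be applied to new rows consistently; in a non-stationary stream, how long one matrix should be reused is exactly the kind of question the paper addresses.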

An Outlier Ranking Tree Selection Approach to Extreme Pruning of Random Forests

by Fawagreh Naseem

2024, Communications in Computer and Information Science

An outlier ranking tree selection approach to extreme pruning of random forests. In Jayne, C. and Iliadis, L. (eds.) Communications in computer and information science, 629, Engineering applications of neural networks: proceedings of the...

Life Span Prediction in Liver Transplantation Using Convolution Neural Network

by derrick dsouza

2024, International Journal of Advance Research and Innovative Ideas in Education

Due to the unavailability of a prediction system, the success rate of liver transplants is lower than it could be. For optimal organ allocation, the MELD score is used, which follows the sickest-first policy. In the sickest-first policy, the sickest patient gets...

Dynamic Construction of Outlier Detector Ensembles With Bisecting K-Means Clustering

by Manal Abdel Wahed

2024, IEEE Access

Outlier detection (OD) is a key problem, for which numerous solutions have been proposed. To deal with the difficulties associated with outlier detection across various domains and data characteristics, ensembles of outlier detectors have...

An Outlier Detection-based Tree Selection Approach to Extreme Pruning of Random Forests

by Fawagreh Naseem

2024, arXiv (Cornell University)

Data-driven prognostics using a combination of constrained K-means clustering, fuzzy modeling and LOF-based score

by Ricardo Sanz

2024, Neurocomputing

Today, the characterization and early detection of failure modes are key issues in complex assets. This is due to the negative impact of corrective operations and the conservative strategies usually put into practice, focused on preventive...

CAMLPAD: Cybersecurity Autonomous Machine Learning Platform for Anomaly Detection

by Trisha Pal

2024, Advances in Intelligent Systems and Computing

As machine learning and cybersecurity continue to explode in the context of the digital ecosystem, the complexity of cybersecurity data combined with complicated and evasive machine learning algorithms leads to vast difficulties in...

Filtered Clustering Based on Local Outlier Factor in Data Mining

by Brijesh Kumar Chaurasia

2024, International Journal of Database Theory and Application

In this paper, the impact of k-means and the local outlier factor on a dataset is studied. An outlier is an observation that is different from or inconsistent with the rest of the data. However, the main challenges of outlier detection are...

Self-Organizing Kernel-based Convolutional Echo State Network for Human Actions Recognition

by Stefan Wermter

2024, The European Symposium on Artificial Neural Networks

We propose a deterministic initialization of the Echo State Network reservoirs to ensure that the activation of its internal echo state representations reflects similar topological qualities of the input signal which should lead to a...

A Flexible Outlier Detector Based on a Topology Given by Graph Communities

by Debora Gil

2024, Big Data Research

Dynamic Construction of Outlier Detector Ensembles With Bisecting K-Means Clustering

by Inas Yassine

2024, IEEE Access

A Review on Educational Data Mining

by Deepika Pahuja

2024

Growing interest in data and analytics in education, teaching, and learning raises the priority of increased, high-quality research. Data mining is a technique used to discover potentially new information from huge amounts of data....

A Critical Review on Outlier Detection Techniques

by Deepika Pahuja

2024

An incremental and approximate local outlier probability algorithm for intrusion detection and its evaluation

by Matthew Russell

2024, Journal of Cyber Security Technology

This paper proposes a novel incremental modification to the Local Outlier Probabilities algorithm, which is commonly used for anomaly detection, to allow it to detect outliers nearly instantly in data streams. The proposed incremental...
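For orientation, the batch Local Outlier Probabilities (LoOP) score can be sketched as below, following the usual definition by Kriegel et al.: a probabilistic set distance with significance factor λ = 3, the PLOF ratio, and an erf-based mapping to [0, 1]. This is a brute-force static version with hypothetical names; the incremental and approximate updates proposed in the paper are not shown, and the sketch assumes at least one PLOF value is non-zero (otherwise the normaliser is zero).

```python
import math


def loop_scores(points, k=3, lam=3.0):
    """Batch Local Outlier Probabilities (brute force) for a list of points."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    nbrs = [sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
            for i in range(n)]
    # Probabilistic set distance: lam times the quadratic mean neighbour distance.
    pdist = [lam * math.sqrt(sum(d[i][j] ** 2 for j in nbrs[i]) / k) for i in range(n)]
    # PLOF: own pdist relative to the neighbours' mean pdist, minus 1.
    plof = [pdist[i] / (sum(pdist[j] for j in nbrs[i]) / k) - 1 for i in range(n)]
    # Normalisation: lam times the quadratic mean of PLOF over the whole data set.
    nplof = lam * math.sqrt(sum(v * v for v in plof) / n)
    # Map to [0, 1] with the Gaussian error function; negative values clip to 0.
    return [max(0.0, math.erf(v / (nplof * math.sqrt(2)))) for v in plof]


data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
probs = loop_scores(data)
```

Unlike raw LOF scores, the output is already a value in [0, 1], which makes a single alerting threshold easier to choose for a streaming intrusion detector.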

Anomaly detection in internet of medical things with artificial intelligence

by Ahmed Mohammed

2024, Eastern-European Journal of Enterprise Technologies

The Internet of Things (IoT) has become one of the most popular terms in recent advances in healthcare devices. Healthcare data in IoT processes and structures are very sensitive and critical in terms of health and technical considerations....

A Survey on Wind Data Pre-processing in Electricity Generation

by Mahima Susan Abraham

2024, International Journal on Cybernetics & Informatics

Wind energy integration research generally relies on complex sensors located at remote sites. The procedure for generating high-level synthetic information from databases containing large amounts of low-level data must therefore account...

Anomaly detection in internet of medical things with artificial intelligence

by Ahmed Mohammed

2024, Eastern-European Journal of Enterprise Technologies

Fuzzy logic (FL) is used as a decision support and validation tool by using the outputs of LOF, COF and GLOF. The purpose of FL is to reclassify points as outliers or inliers when one of the three outlier detection models gives a different classification. Accordingly, the proposed FL system consists of three inputs and one output, as shown in Fig. 2. All programming and computational implementation is carried out using MATLAB with the Fuzzy Logic Toolbox (MathWorks, Natick, MA, United States) [19], on a laptop with an Intel(R) Core i7-7300U CPU @ 1.90 GHz (2494 MHz, 2 cores). A Mamdani-type FL system is used, whose main components and stages are as follows.

Fig. 3. Membership functions of input «LOF».

Fuzzification: this stage converts the three inputs (degree of being an outlier or inlier) into fuzzy sets. The outputs of LOF, COF and GLOF represent the outlier score of each evaluated data point. The output of each method is mapped between. The four membership functions of each input divide the input value into four main fuzzy sets using trapezoidal and triangular functions, with the following labels: Completely Inlier (CI), Semi Inlier (SI), Semi Outlier (SO) and Completely Outlier (CO), as shown in Fig. 3 for the «LOF» input. The three inputs have the same membership structure and ranges after their values are mapped. The output represents the final classification of the data type using the same membership functions.
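The fuzzification stage can be illustrated with generic trapezoidal and triangular membership functions over the four labels CI, SI, SO and CO. The breakpoints below are hypothetical placeholders on a score assumed to be already mapped to [0, 1], since the exact ranges are not given here; the Mamdani rule base and defuzzification are not shown.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], is 1 on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def triangle(x, a, b, c):
    """Triangular membership peaking at b (a trapezoid with a one-point plateau)."""
    return trapezoid(x, a, b, b, c)


# Hypothetical breakpoints for a score already mapped to [0, 1].
FUZZY_SETS = {
    "CI": lambda s: trapezoid(s, -0.1, 0.0, 0.15, 0.35),  # Completely Inlier
    "SI": lambda s: triangle(s, 0.15, 0.35, 0.55),         # Semi Inlier
    "SO": lambda s: triangle(s, 0.45, 0.65, 0.85),         # Semi Outlier
    "CO": lambda s: trapezoid(s, 0.65, 0.85, 1.0, 1.1),    # Completely Outlier
}


def fuzzify(score):
    """Membership degree of one normalised outlier score in each fuzzy set."""
    return {label: mu(score) for label, mu in FUZZY_SETS.items()}
```

A score of 0.5, for example, belongs to SI and SO with degree 0.25 each; this kind of overlap is what the Mamdani rules then combine across the LOF, COF and GLOF inputs.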

data is classified wrongly. Thus, it can be confirmed that FL achieved the desired goal of mitigating false detections of anomalous data.

5.2. FL-based outlier detection using the physical activity monitoring dataset

The PAMAP2 data are evaluated using LOF, COF and GLOF separately, and the result of each model is then used as input to the FL system. The outlier detection results for this type of data also showed that FL improves detection, significantly increasing accuracy, precision and recall to 88.2 %, 87.6 % and 87 %, respectively (Fig. 5); the comparison shows a significant increase in detection efficiency over LOF, COF and GLOF individually, whose performance did not exceed 83 %.

variable vector like the OCSVM method [20] and Ramp loss based robust one-class SVM (ROCSVM) [21]. The results of the anomaly detection test on the HAR data show that the proposed FL model achieves a better accuracy of 98.2 % (Fig. 6), because fuzzy logic is able to take advantage of the varying anomaly score values produced by LOF, COF and GLOF. This result reflects the ability of fuzzy logic to reduce or eliminate the detection of misclassified or anomalous data by improving the accuracy, precision, and search metrics of the dataset under investigation.

Outlier Resistant PCA Ensembles

by Bogdan Gabrys

2024, Springer eBooks

Statistical re-sampling techniques have been used extensively and successfully in the machine learning approaches for generation of classifier and predictor ensembles. It has been frequently shown that combining so called unstable...

Effects of a single outlier on the coefficient of determination: An empirical study

by sohel rana

2024

This article investigates the effects of outliers on the coefficient of determination, R2 which is computed by Ordinary Least Squares (OLS) estimator. It is now evident that the OLS is greatly affected by outliers and hence the R2 is also...
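The basic effect is easy to reproduce: fit a simple OLS line, compute R² = 1 - SS_res/SS_tot, then move a single response value. The data below are made up for illustration and the function name is hypothetical; the article's empirical design is not reproduced.

```python
def ols_r2(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares and return R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = list(range(1, 11))
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 18.0, 19.8]  # roughly y = 2x
r2_clean = ols_r2(xs, ys)

ys_outlier = ys.copy()
ys_outlier[4] = 40.0  # a single contaminated response
r2_outlier = ols_r2(xs, ys_outlier)
```

With these numbers the nearly perfect fit (R² close to 1) drops to roughly a quarter after changing one of ten observations.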

Dynamic Construction of Outlier Detector Ensembles With Bisecting K-Means Clustering

by Rasha Ramadan

2024, IEEE Access

Detection of Winding Axial Deformation in Power Transformers by UWB Radar Imaging

by Razieh Mosayebi

2024, arXiv (Cornell University)

In this paper, a novel method for detecting transformer winding axial displacement has been presented. In this method, which is based on UWB radar imaging, a UWB pulse is transmitted to the transformer winding and the reflection from it...

Dynamic Construction of Outlier Detector Ensembles With Bisecting K-Means Clustering

by Manal Abdelwahed

2024, IEEE Access

Hybrid Machine Learning–Statistical Method for Anomaly Detection in Flight Data

by Alan Muscat

2024, Applied Sciences

This paper investigates the use of an unsupervised hybrid statistical–local outlier factor algorithm to detect anomalies in time-series flight data. Flight data analysis is an activity carried out by airlines primarily as a means of...

Class Outliers Mining: Distance-Based Approach

by Nabil M. Hewahi

2023

In large datasets, identifying exceptional or rare cases with respect to a group of similar cases is considered a very significant problem. The traditional problem (outlier mining) is to find exceptions or rare cases in a dataset...

Local Outlier Factor based Data Mining Model for Three Phase Transmission Lines Faults Identification

by Nithiyananthan Kannan

2023, International Journal of Computer Applications

The main objective of this paper is to design and develop a model for identifying faults in power system transmission lines using the Local Outlier Factor (LOF) technique based on data mining. 9-bus and 30-bus power systems...

Detecting outliers in rule-based knowledge bases using Self-Organizing Map and Local Outlier Factor algorithms

by Czesław Horyń

2023, Procedia Computer Science

Our research deals with intelligent decision support systems based on rule-based knowledge bases. Decision support systems use rules of the form “If a condition, then a decision” as a form of knowledge representation. In the process of inference,...

Outliers in rules - the comparison of LOF, COF and KMEANS algorithms

by Czesław Horyń

2023, Procedia Computer Science

The aim of the article is the analysis of using LOF, COF and Kmeans algorithms for outlier detection in rule based knowledge bases. The subject of outlier mining is very important nowadays. Outliers in rules mean unusual rules which are...

Fig. 1. Knowledge base vs. quality of rule clusters improvement

The course of the experiments can be described as follows. The first stage is to load and organize the data. Data subjected to hierarchical clustering must form a numeric matrix whose rows represent individual observations and whose columns represent variables. To achieve this, the original knowledge bases, whose rules contain varying numbers of premises, are transformed into a matrix in which the rows represent the rules and the columns represent all the possible conditional attributes appearing in the rules' premises. Such a matrix allows similarities between rules to be determined and similar rules to be combined into clusters. For the obtained hierarchical structure, we run a quality analysis that takes into account seven different indicators: Dunn, Davies-Bouldin, Silhouette, CPCC index, etc. (see Section 5). We aim to find the unusual rules constituting 1%, 5% and 10% of all rules in the knowledge base, respectively. The results of the three analyzed methods (LOF, COF and K-means) overlap only to some extent, i.e. only some of the rules were identified as deviations by all three methods. The similarity of the selected methods of detecting deviations, i.e. how similarly they indicate unusual rules, will be the subject of a separate study. After the unusual rules are removed, the quality of the clusters should improve. To check this, we recalculate the previously computed cluster quality indicators and count in how many cases the quality of the clusters improved, in how many cases no change in cluster quality was observed, and in how many cases the cluster quality indicator did not react at all to the atypical rules. It turns out that despite the removal of a significant number of rules from the set, the quality of the resulting clusters does not improve. In this way, it will be possible to analyze the specifics of these indicators and what really affects their values.
Ultimately, it will also turn out that a specific dataset has a very large impact on the results obtained in this experiment. There are data that did not allow the quality of the clusters to improve despite the removal of a significant number of clusters from the collection. The nature of these collections will be assessed in separate studies. The research is based on six different knowledge bases: weather, diseases, libra, diabetes, nursery and krukenberg. Five of them (weather, diseases, libra, diabetes and nursery) come from the well-known UC Irvine Machine Learning Repository [7]. The sixth dataset, krukenberg, is a real-life knowledge base created for the medical domain. Rules are acquired from the original datasets as follows. The authors focus on knowledge represented as rules generated automatically from data using rough set theory and the LEM2 algorithm, proposed by [8] and implemented in the RSES system. It generates short, easily interpretable rules. Of course, there are many other algorithms for inducing rules from data, such as generating decision rules from decision trees and algorithms for generating association rules. When the size of the input data (from which the rules are generated) increases, so does the number of generated rules. The diabetes dataset [7] contains 768 objects described by 8 continuous attributes. The objects are divided into two decision classes, where 1 means “tested positive for diabetes” and covers 268 objects, and 0 means the opposite and covers 500 instances. Processing the data with RSES, which contains an implementation of the LEM2 algorithm, produced 490 rules. A short characterization of each knowledge base is presented in Table 1. They differ in the number of rules and the number of attributes, but also in the type of attributes and the length of the rules.
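The transformation from rules to a clustering-ready matrix can be sketched as follows. Each rule premise is assumed to be a mapping from conditional attribute to value, and each cell records only whether the attribute occurs in the premise; this presence/absence encoding, the function name and the sample premises are hypothetical, and the hierarchical clustering itself is not shown.

```python
def rules_to_matrix(premises):
    """Return (attribute names, 0/1 matrix) with one row per rule premise."""
    # Columns: every conditional attribute that appears in any premise.
    attributes = sorted({attr for premise in premises for attr in premise})
    # Rows: 1 if the rule's premise uses the attribute, else 0.
    matrix = [[1 if attr in premise else 0 for attr in attributes] for premise in premises]
    return attributes, matrix


# Hypothetical premises in the style of the weather knowledge base.
premises = [
    {"outlook": "sunny", "humidity": "high"},
    {"outlook": "rainy", "windy": "true"},
    {"outlook": "overcast"},
]
attributes, matrix = rules_to_matrix(premises)
```

Rules with premises of different lengths all map to rows of the same width, which is what allows a distance between any two rules and hence hierarchical clustering.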

Table 3. The frequency of improving the quality of rule clusters vs. number of outliers.

Table 4. The frequency of clusters quality improvement vs. outlier detection method.

Surprisingly, the algorithm that most often results in no quality change is LOF, one of the most popular outlier detection algorithms. The optimal choice in this case would definitely be the COF algorithm, which improves rule cluster quality in 87% of all cases. In our research we also wanted to review the seven quality indexes in the context of exploring rules as outliers, and to check which index most frequently shows an improvement in cluster quality. It seems (as can be observed in Table 5) that the indexes that are definitely sensitive to outliers appearing in a given knowledge base are the Davies-Bouldin, Pseudo F and CPCC indexes. The index that least often reacted to the occurrence of outliers in rules is the Hubert and Levine index.

Table 5. The frequency of clusters quality improvement vs. quality indexes.

The last aspect of our research concerned the nature of the data. We wanted to check whether the data type influences the results. Figure 1 shows that there are knowledge bases (krukenberg) in which the removal of unusual rules improved the quality assessment of the rule clusters in every case. On the other hand, there is a knowledge base (weather) in which, in 70% of cases, removing the unusual rules did not improve the quality of the rule clusters.

Detection of Outliers in Univariate Circular Data by Means of the Outlier Local Factor (LOF)

by Ali H Abuzaid

2023, Statistics in Transition New Series

The problem of outlier detection in univariate circular data was the object of increased interest over the last decade. New numerical and graphical methods were developed for samples from different circular probability distributions. The...

A Hybrid Unsupervised Density-based Approach with Mutual Information for Text Outlier Detection

by Wesam Ashour

2023, International Journal of Intelligent Systems and Applications

The detection of outliers in text documents is a highly challenging task, primarily due to the unstructured nature of documents and the curse of dimensionality. Text document outliers refer to text data that deviates from the text found...

DDOS attacks detection based on attention-deep learning and local outlier factor

by Abdelkader Dairi

2023

One of the most significant security concerns confronting network technology is the detection of distributed denial-of-service (DDoS) attacks. This paper introduces a semi-supervised, data-driven approach to the detection of DDoS attacks. The...

Fig. 1: Proposed approach flowchart.

Moreover, the sequences of normal traffic after the training

Fig. 2: Deep recurrent autoencoder with attention mechanism.

Fig. 3: Intrusion detection using anomaly detection methods.

with attention (SAE-RNN-A) combined with two anomaly detection methods, namely: Elliptic Envelope [19] and Local Outlier Factor [20].

Fig. 4: Recorded F1-score for DDoS attack detection using the SAE-RNN-A based on the Elliptic Envelope.

Fig. 5: Recorded AUC for DDoS attack detection using the SAE-RNN-A based on the Elliptic Envelope.

Fig. 6: Recorded F1-score for DDoS attack detection using the SAE-RNN-A based on LOF.

Fig. 7: Recorded AUC for DDoS attack detection using the SAE-RNN-A based on LOF.

TABLE II: DDoS attack detection performance using the SAE-RNN-A based on LOF.

Multiple Instance Learning for Detecting Anomalies over Sequential Real-World Datasets

by Parastoo Kamranfar

2023, arXiv (Cornell University)

Detecting anomalies over real-world datasets remains a challenging task. Data annotation is an intensive human labor problem, particularly in sequential datasets, where the start and end time of anomalies are not known. As a result, data...