Local Outlier Factor (LOF)

232 papers

0 followers

About this topic

Local Outlier Factor (LOF) is an algorithm used in anomaly detection that identifies outliers in a dataset by measuring the local density deviation of a data point with respect to its neighbors. It quantifies how isolated a point is compared to its surrounding points, allowing for the detection of local anomalies.
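As a rough illustration of the density comparison described above, the following Python sketch computes LOF scores from scratch using only the standard library. The function name `lof_scores`, the neighbourhood size `k`, and the toy data are assumptions made for this example; ties at the k-th neighbour distance and duplicate points are not handled.

```python
import math


def lof_scores(points, k):
    """Return one LOF score per point (simplified: no tie or duplicate handling)."""
    n = len(points)
    # Pairwise Euclidean distances.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # indices of the k nearest neighbours of each point
    k_distance = []  # distance from each point to its k-th nearest neighbour
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_distance(j), dist(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Toy data (hypothetical): a 3x3 grid cluster plus one isolated point.
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
points.append((10.0, 10.0))

scores = lof_scores(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

Scores close to 1 indicate a point whose local density matches that of its neighbours; the isolated point at (10, 10) receives a much larger score than any grid point.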

Key research themes

1. How can Local Outlier Factor (LOF) algorithms be adapted for scalable, real-time outlier detection in dynamic, high-dimensional data environments such as data streams and spatio-temporal data?

This research theme focuses on the challenges and methods of extending LOF algorithms—originally designed for static datasets—to handle the complexities of big data streams, multidimensional data, and spatio-temporal outliers. Key issues include computational scalability, concept drift responsiveness, and integrating spatial-temporal context for improved detection accuracy. Addressing these challenges enables LOF to operate effectively in domains requiring continuous anomaly monitoring such as network intrusion detection, sensor networks, and environmental monitoring.

A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams

by Terence Soule

2021, Big Data and Cognitive Computing

Key finding: This paper systematically reviews LOF and its variants developed for data stream environments, highlighting the limitations of traditional LOF on static datasets when applied to streaming data due to concept drift and volume...

Spatio-temporal outlier detection algorithms based on computing behavioral outlierness factor

by Shaaban M Abbady

2024, Data & Knowledge Engineering

Key finding: The authors introduce ST-BOF, a spatio-temporal extension to LOF, enabling simultaneous evaluation of spatial and temporal contexts. Their ST-BDBCAN and Approx-ST-BDBCAN algorithms leverage ST-BOF for clustering and outlier...

Online Outlier Detection Based on Relative Neighbourhood Dissimilarity

by Nguyễn Hoàng Vũ

2024, Lecture Notes in Computer Science

Key finding: This work proposes an unsupervised online outlier detection technique for multi-dimensional data streams using Relative Neighbourhood Dissimilarity (ReND). ReND adaptively learns from streaming data under concept drift,...

Disk-Based Sampling for Outlier Detection in High Dimensional Data

by Gia Uyên Nguyễn Phạm

2025, cse.iitb.ac.in

Key finding: The paper presents a novel sampling-based outlier detection method designed for large, high-dimensional datasets that combines randomized partitioning with efficient sampling to create a candidate outlier set. This approach...

2. What methodologies and statistical measures optimize the robustness and accuracy of outlier detection in multivariate, skewed, and circular data distributions beyond classical LOF applications?

This research theme investigates adapting LOF and other related methods to specialized data types such as skewed multivariate distributions and circular data, emphasizing robustness to data characteristics that invalidate assumptions of classical methods. It explores combining LOF with statistical depth functions, adjusted boxplots, and alternative metrics for better detection performance in these non-standard data spaces critical to applications like directional data analysis, regression diagnostics, signal processing, and environmental measurements.

Outlier Detection for Multivariate Skew-Normal Data: A Comparative study

by Herve Dovoedo and

2015

Key finding: This study compares the outlier detection capability of four robust outlyingness functions tailored for multivariate skew-normal distributions, noting that traditional Mahalanobis distance methods are insufficient for skewed...

Detection of Outliers in Univariate Circular Data by Means of the Outlier Local Factor (LOF)

by Ali H Abuzaid

2022, Statistics in Transition New Series

Key finding: The authors extend LOF to univariate circular data by mapping angular observations to bivariate Cartesian coordinates and applying LOF in this transformed space. This approach overcomes limitations of existing...
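The coordinate mapping mentioned in this finding can be sketched as follows. This is a minimal illustration rather than the authors' code: the helper `to_cartesian` and the sample angles are hypothetical, and angles are assumed to be given in degrees. It only shows why the embedding helps: angles on either side of 0° become close neighbours on the unit circle, so a density-based score such as LOF computed on the embedded points would not mistake the wrap-around for a gap.

```python
import math


def to_cartesian(angles_deg):
    """Map angles in degrees to (cos, sin) points on the unit circle."""
    return [
        (math.cos(math.radians(a)), math.sin(math.radians(a)))
        for a in angles_deg
    ]


# Hypothetical sample: five angles clustered around 0 degrees and one at 180.
angles = [355.0, 358.0, 1.0, 3.0, 6.0, 180.0]
xy = to_cartesian(angles)

# On the raw scale, 358 and 1 look far apart; on the circle they are close.
naive_gap = abs(angles[1] - angles[2])
chord_gap = math.dist(xy[1], xy[2])

# The observation at 180 degrees stays far from the cluster after the mapping.
far_gap = math.dist(xy[2], xy[5])

print(naive_gap, round(chord_gap, 4), round(far_gap, 4))
```

Any LOF routine that accepts two-dimensional points could then be run on `xy` in place of the raw angles.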

Boxplot-Based Outlier Detection for the Location-Scale Family

by Herve Dovoedo and

2015

Key finding: Focusing on univariate location-scale families including skewed distributions, this paper modifies traditional boxplot fences by employing semi-interquartile ranges and controlling false positive outlier rates over sample...

Robust multiple discriminant rule using Harrell-Davis median estimator: A distribution-free approach to cellwise-casewise outliers coexistence

by Yik Siong Pang

2023, AIP Conf. Proc. of The 7th International Conference on Quantitative Sciences and its Applications (ICOQSIA2022)

Key finding: This paper proposes a combination of the distribution-free Harrell-Davis median estimator with robust covariance estimation to handle simultaneous cellwise and casewise multivariate outliers, improving classification accuracy...

3. How can Local Outlier Factor (LOF) be integrated with advanced machine learning and deep learning techniques to improve anomaly detection in complex rule-based systems and cybersecurity?

This research avenue explores hybrid approaches that combine LOF with modern AI architectures including autoencoders, attention mechanisms, and clustering algorithms to optimize detection of anomalous patterns in rule-based knowledge bases, network security, and related domains. The focus is on enhancing feature representation, temporal dependency modeling, and leveraging unsupervised learning paradigms to complement LOF's density-based outlier scoring for improved precision and reduced false alarms.

Detecting outliers in rule-based knowledge bases using Self-Organizing Map and Local Outlier Factor algorithms

by Czesław Horyń

2023, Procedia Computer Science

Key finding: This research demonstrates the synergistic use of LOF and Self-Organizing Maps (SOM) to identify unusual or rare rules in rule-based knowledge bases, improving completeness and quality of decision support systems. LOF...

DDOS attacks detection based on attention-deep learning and local outlier factor

by Abdelkader Dairi

2023

Key finding: The paper presents a semi-supervised framework combining an attention-equipped GRU autoencoder with LOF for feature extraction and anomaly scoring in DDOS detection. The integration of temporal feature learning (via GRU and...

Hybrid Machine Learning–Statistical Method for Anomaly Detection in Flight Data

by Alan Muscat

2023, Applied Sciences

Key finding: By incorporating the LOF algorithm within a hybrid machine learning and statistical framework, this study improves unsupervised anomaly detection in high-dimensional flight data monitoring. The methodology adapts LOF to...

An Efficient Hashing-based Ensemble Method for Collaborative Outlier Detection

by Kitty Li

2023

Key finding: This work enhances collaborative outlier detection by employing locality-sensitive hashing (LSH) ensemble methods combined with LOF principles, enabling mergeable and privacy-preserving model aggregation across decentralized...

All papers in local outlier factor (LOF)

Enhancing Data Analysis with Noise Removal

by Gaurav Pandey

2006, IEEE Transactions on Knowledge and Data Engineering

Removing objects that are noise is an important goal of data cleaning as noise hinders most types of data analysis. Most existing data cleaning methods focus on removing noise that is the result of low-level data errors that result from...

Enhancing Data Analysis with Noise Removal

by Gaurav Pandey

2005

DATA DRIVEN SOFT SENSOR FOR CONDITION MONITORING OF SAMPLE HANDLING SYSTEM (SHS)

by Computer Science & Information Technology (CS & IT) Computer Science Conference Proceedings (CSCP)

A gas sample is conditioned using a sample handling system (SHS) to remove particulate matter and moisture content before it is sent through Continuous Emission Monitoring (CEM) devices. The performance of the SHS plays a crucial role in reliable...

Enhancing data analysis with noise removal

by Michael Steinbach

2000, IEEE Transactions on Knowledge and Data Engineering

Figures and tables in this paper:

Fig. 1. The cluster nature of hyperclique patterns on the LA1 data set.
Fig. 2. A data mining framework for validating data cleaning techniques at the data analysis stage.
Fig. 3. The experimental evaluation process for data analysis.
Fig. 4. The impact of noise removal techniques on the performance of clustering analysis for ADS and WAP in terms of entropy.
Fig. 5. The impact of noise removal techniques on the performance of clustering analysis for OH8 and WEST5 in terms of entropy.
Fig. 6. The impact of noise removal techniques on the performance of clustering analysis for the yeast gene expression data in terms of entropy.
Fig. 7. The impact of noise removal techniques on the performance of clustering analysis for the ADS data set in terms of F-measure.
Fig. 8. The impact of noise removal on the results of association analysis for WEST5 and ADS in terms of the IS measure.
Fig. 9. The impact of noise removal on the results of association analysis for REO and OH8 in terms of the IS measure.
Fig. 10. The effect of the number of clusters on the performance of CCleaner for OH8 and WEST5 with respect to entropy.
Table. A sample transaction data set.
Table II. Examples of hyperclique patterns of words of the LA1 data set.
Table. Characteristics of the yeast gene expression data set.
Table VI. The experimental parameters for data analysis.

Enhancing data analysis with noise removal

by Gaurav Pandey

2006, IEEE Transactions on Knowledge and Data Engineering

Enhancing data analysis with noise removal

by Gaurav Pandey

2000, IEEE Transactions on Knowledge and Data Engineering
