A Fast Distance-Based Algorithm to Detect Outliers

2007, Journal of Computer Science

https://doi.org/10.3844/JCSSP.2007.944.947

Abstract

This paper proposes a fast distance-based algorithm for outlier detection. The proposed algorithm reduces the number of distance calculations compared with the nested-loop algorithm. Tests were performed on several well-known data sets, and the results showed that the proposed algorithm yields a reasonable saving in CPU time.
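For context, the nested-loop baseline that the abstract compares against can be sketched as below. This is only an illustration of the baseline, not the paper's proposed algorithm, which the abstract does not describe. It assumes the common distance-based definition in which a point is an outlier if at most `max_neighbors` other points lie within `radius` of it. The function name, parameter names, and the distance counter are hypothetical and chosen for this example.

```python
import math


def nested_loop_outliers(points, radius, max_neighbors):
    """Baseline nested-loop detection of distance-based outliers.

    A point is reported as an outlier if at most `max_neighbors` other
    points lie within `radius` of it (assumed definition, see lead-in).
    Returns the outlier indices and the number of distance calculations.
    """
    outliers = []
    distance_count = 0
    for i, p in enumerate(points):
        neighbors = 0
        is_outlier = True
        for j, q in enumerate(points):
            if i == j:
                continue
            distance_count += 1
            if math.dist(p, q) <= radius:
                neighbors += 1
                # Early exit: the point already has too many neighbors,
                # so it cannot be an outlier.
                if neighbors > max_neighbors:
                    is_outlier = False
                    break
        if is_outlier:
            outliers.append(i)
    return outliers, distance_count
```

Counting distance calculations in this way gives one possible yardstick for comparing a faster method against the baseline, since the abstract names the number of distance calculations as the quantity being reduced.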
