Spatial Weighted Outlier Detection
2006, Proceedings of the 2006 SIAM International Conference on Data Mining
https://doi.org/10.1137/1.9781611972764.71
Abstract
Spatial outliers are spatial objects whose features differ distinctly from those of their surrounding neighbors. Detecting spatial outliers helps reveal valuable information in large spatial data sets. In many real applications, spatial objects cannot simply be abstracted as isolated points: they differ in boundary, size, volume, and location. These spatial properties affect the impact of a spatial object on its neighbors and should be taken into consideration. In this paper, we propose two spatial outlier detection methods that integrate the impact of spatial properties into the outlierness measurement. Experimental results on a real data set demonstrate the effectiveness of the proposed algorithms.
FAQs
What distinguishes spatial outliers from traditional outliers?
Spatial outliers are judged against their local neighborhood, whereas traditional outliers are judged against the data set as a whole. Spatial outlier detection also has to handle complex spatial data formats, such as 3D objects, which traditional methods do not address.
How do the proposed algorithms improve outlier detection accuracy?
The algorithms compare each object with its neighbors using weights, so that a neighbor's impact depends on its spatial properties. For instance, the weights take into account factors such as distance and common border length.
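The weighting idea can be illustrated with a minimal Python sketch. The exact weight formula is not given in this summary, so the blend below is an assumption: the function name `neighbor_weights`, the mixing parameter `alpha`, and the linear combination of normalized inverse distance and normalized border length are all hypothetical choices made for illustration.

```python
def neighbor_weights(distances, border_lengths, alpha=0.5):
    """Return one weight per neighbor; the weights sum to 1.

    distances      -- center distance to each neighbor (all > 0)
    border_lengths -- length of the border shared with each neighbor (>= 0)
    alpha          -- hypothetical mixing factor between the two impacts
    """
    if not distances or len(distances) != len(border_lengths):
        raise ValueError("need one distance and one border length per neighbor")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")

    # Closer neighbors get a larger share of the distance-based impact.
    inverse = [1.0 / d for d in distances]
    inverse_total = sum(inverse)

    # Longer shared borders get a larger share of the border-based impact.
    border_total = sum(border_lengths)
    n = len(distances)

    weights = []
    for inv, border in zip(inverse, border_lengths):
        distance_part = inv / inverse_total
        # Fall back to equal shares when no neighbor shares a border.
        border_part = border / border_total if border_total > 0 else 1.0 / n
        weights.append(alpha * distance_part + (1.0 - alpha) * border_part)
    return weights


# Two neighbors: the first is closer and shares a longer border.
example = neighbor_weights([1.0, 2.0], [3.0, 1.0])
```

Because both components are normalized before blending, the weights always sum to 1, which keeps a weighted neighborhood average on the same scale as the attribute itself.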
What was the primary dataset used for algorithm validation?
The validation was conducted on the West Nile virus data from the U.S. CDC, covering veterinary cases in 2003. This dataset provided a spatial context for the outlier detection.
How does the AvgDiff algorithm differ from the weighted z value approach?
AvgDiff computes the weighted average of absolute differences, capturing variance among neighbors. In contrast, the weighted z value approach averages neighbor attribute values before comparison.
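The contrast between the two measures can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout, and the final standardization step of the weighted z variant are hypothetical, and each object's neighbor weights are assumed to sum to 1.

```python
from statistics import mean, pstdev


def avg_diff(values, weights):
    """Weighted average of absolute differences to each neighbor."""
    return {
        obj: sum(w * abs(values[obj] - values[nbr]) for nbr, w in nbrs.items())
        for obj, nbrs in weights.items()
    }


def weighted_z(values, weights):
    """Average the neighbors first, then compare and standardize."""
    diffs = {
        obj: values[obj] - sum(w * values[nbr] for nbr, w in nbrs.items())
        for obj, nbrs in weights.items()
    }
    mu = mean(list(diffs.values()))
    sigma = pstdev(list(diffs.values()))
    if sigma == 0:
        # All differences are identical, so no object stands out.
        return {obj: 0.0 for obj in diffs}
    return {obj: (d - mu) / sigma for obj, d in diffs.items()}


# Object "a" sits exactly between two very different neighbors.
values = {"a": 5.0, "b": 0.0, "c": 10.0}
weights = {
    "a": {"b": 0.5, "c": 0.5},
    "b": {"a": 1.0},
    "c": {"a": 1.0},
}

scores_avg_diff = avg_diff(values, weights)   # "a" scores 5.0
scores_weighted_z = weighted_z(values, weights)  # "a" scores 0.0
```

In this toy example the neighbors of "a" average out to its own value, so the weighted z variant gives it a score of 0, while AvgDiff still registers that "a" differs from each neighbor individually.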
What future improvements are planned for the spatial outlier detection methods?
Plans include extending the algorithms to detect outliers with multiple attributes and developing a classification-based training method to assess the importance of spatial features and their influence.