Data Mining, Big Data, Data Analytics

2019, IGI Global eBooks

https://doi.org/10.4018/978-1-5225-7501-6.CH006

Abstract

Big data analytics (BDA) is a systematic approach for analyzing and identifying different patterns, relations, and trends within a large volume of data. In this paper, we apply BDA to criminal data, where exploratory data analysis is conducted for visualization and trend prediction. Several state-of-the-art data mining and deep learning techniques are used. Following statistical analysis and visualization, some interesting facts and patterns are discovered from criminal data in San Francisco, Chicago, and Philadelphia. The predictive results show that the Prophet model and Keras stateful LSTM perform better than neural network models, where the optimal size of the training data is found to be three years. These promising outcomes will benefit police departments and law enforcement organizations by helping them better understand crime issues and by providing insights that enable them to track activities, predict the likelihood of incidents, deploy resources effectively, and optimize the decision-making process.

INDEX TERMS Big data analytics (BDA), data mining, data visualization, neural network, time series forecasting.
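The abstract's finding about training-data size can be illustrated with a small experiment: hold out the most recent period, fit a forecaster on the last N years before it, and compare the error across N. The sketch below is only an illustration under stated assumptions. It uses synthetic daily counts rather than the San Francisco, Chicago, or Philadelphia data, and a simple weekday-mean baseline rather than the Prophet or Keras stateful LSTM models used in the paper; all function names are hypothetical.

```python
import math
import random
from datetime import date, timedelta


def make_synthetic_counts(start, n_days, seed=0):
    """Hypothetical daily incident counts: slow drift + weekend bump + noise."""
    rng = random.Random(seed)
    series = []
    for i in range(n_days):
        day = start + timedelta(days=i)
        drift = 100.0 - 0.01 * i                    # assumed slow decline
        bump = 10.0 if day.weekday() >= 4 else 0.0  # assumed Fri-Sun increase
        series.append((day, max(0.0, drift + bump + rng.gauss(0.0, 5.0))))
    return series


def weekday_mean_forecast(train, horizon_days):
    """Forecast each future day as the mean of its weekday in the training window.

    Stand-in baseline only; this is not the paper's Prophet or LSTM model.
    """
    sums = [0.0] * 7
    counts = [0] * 7
    for day, value in train:
        sums[day.weekday()] += value
        counts[day.weekday()] += 1
    last_day = train[-1][0]
    forecast = []
    for h in range(1, horizon_days + 1):
        w = (last_day + timedelta(days=h)).weekday()
        forecast.append(sums[w] / counts[w])
    return forecast


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def evaluate_windows(series, window_years, horizon_days=90):
    """Return {years: RMSE} when training on the last `years` years before the hold-out."""
    history, test = series[:-horizon_days], series[-horizon_days:]
    actual = [value for _, value in test]
    scores = {}
    for years in window_years:
        train = history[-365 * years:]  # most recent `years` years of history
        scores[years] = rmse(actual, weekday_mean_forecast(train, horizon_days))
    return scores


if __name__ == "__main__":
    data = make_synthetic_counts(date(2012, 1, 1), 6 * 365)
    for years, score in evaluate_windows(data, (1, 2, 3, 4, 5)).items():
        print(f"train on last {years} year(s): RMSE = {score:.2f}")
```

Which window length scores best here depends entirely on the synthetic drift and noise chosen above, so the output says nothing about the three-year result reported in the abstract. The same loop could wrap any forecaster that takes a training window and a horizon.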

MINGCHEN FENG received the M.S. degree in software engineering from the School of Software and Microelectronics, Northwestern Polytechnical University, Xi'an, China, in 2015, where he is currently pursuing the Ph.D. degree with the School of Computer Science. He was a Visiting Researcher with the Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, U.K., in 2018. His research interests include big data analytics, data mining, machine learning, deep learning, and data visualization.

JIANGBIN ZHENG received the B.S., M.S., and Ph.D. degrees in computer science from Northwestern Polytechnical University, in 1993, 1996, and 2002, respectively. From 2000 to 2002, he was a Research Assistant with The Hong Kong Polytechnic University, Hong Kong. From 2004 to 2005, he was a Research Assistant with The University of Sydney, Sydney, Australia. Since 2009, he has been a Professor and Ph.D. Supervisor with the School of Computer Science, Northwestern Polytechnical University. His research interests include intelligent information processing, visual computing, multimedia signal processing, big data, and software engineering. He has published over 100 peer-reviewed journal/conference papers covering a wide range of topics in image/video analytics, pattern recognition, machine learning, and big data analytics.

JINCHANG REN received the B.E. degree in computer software, the M.Eng. degree in image processing, and the D.Eng. degree in computer vision from Northwestern Polytechnical University, Xi'an, China. He received the Ph.D. degree in electronic imaging and media communication from Bradford University, Bradford, U.K. He is currently a Senior Lecturer with the Centre for excellence for Signal and Image Processing (CeSIP), and also the Deputy Director of the Strathclyde Hyperspectral Imaging Centre, University of Strathclyde, Glasgow, U.K. He has published over 200 peer-reviewed journal/conference papers, and acts as an Associate Editor for six international journals, including the IEEE-JSTARS and the Journal of The Franklin Institute. His research interests include visual computing and multimedia signal processing, especially on semantic content extraction for video analysis and understanding, and more recently hyperspectral imaging.