Big data analytics (BDA) is a systematic approach for analyzing and identifying different patterns, relations, and trends within a large volume of data. In this paper, we apply BDA to criminal data where exploratory data analysis is conducted for visualization and trends prediction. Several the state-of-theart data mining and deep learning techniques are used. Following statistical analysis and visualization, some interesting facts and patterns are discovered from criminal data in San Francisco, Chicago, and Philadelphia. The predictive results show that the Prophet model and Keras stateful LSTM perform better than neural network models, where the optimal size of the training data is found to be three years. These promising outcomes will benefit for police departments and law enforcement organizations to better understand crime issues and provide insights that will enable them to track activities, predict the likelihood of incidents, effectively deploy resources and optimize the decision making process. INDEX TERMS Big data analytics (BDA), data mining, data visualization, neural network, time series forecasting.
INTERNATIONAL JOURNAL OF ADVANCE RESEARCH, IDEAS AND INNOVATIONS IN TECHNOLOGY
Due to the escalation in the rate of crimes, there is a requirement of the system that will detect and predict crimes at the dynamic time. The objective of this survey is to learn Data Mining techniques that will go on to help in detecting and predicting crimes using association rule mining, decision trees and naive Bayes, k-means clustering and Machine learning techniques such as deep neural network and artificial neural network. Noticeable findings from this survey were that when the dataset instances have a large number of missing values pre-processing becomes an important task and crime does not occur uniformly across urban landscapes but concentrates in specific areas. Hence, predicting crime hotspots is a very crucial task and also applying post-processing will help in decreasing the rate of crimes.
Predicting crime using machine learning and deep learning techniques has gained considerable attention from researchers in recent years, focusing on identifying patterns and trends in crime occurrences. This review paper examines over 150 articles to explore the various machine learning and deep learning algorithms applied to predict crime. The study provides access to the datasets used for crime prediction by researchers and analyzes prominent approaches applied in machine learning and deep learning algorithms to predict crime, offering insights into different trends and factors related to criminal activities. Additionally, the paper highlights potential gaps and future directions that can enhance the accuracy of crime prediction. Finally, the comprehensive overview of research discussed in this paper on crime prediction using machine learning and deep learning approaches serves as a valuable reference for researchers in this field. By gaining a deeper understanding of crime prediction techniques, law enforcement agencies can develop strategies to prevent and respond to criminal activities more effectively.
The main aim of this project is to explore the data and then draw out the distinct questions that come to mind at first glance of the data and try to answer those questions using the machine learning based crime prediction. Some interesting facts and patterns are discovered from criminal data which is been taken from the Kaggle it contains the data set of crime in Vancouver. Which includes the data from 2003-01-01 to 2017-07-13 it contains 530,652records. The Predictive model shows that the prophet model handles outliers well and also it is robust to missing data and shifts in the trend. These outcomes are going to benefit the police departments to better understand crime issues and provide insights that will enable them to track activities, predict the likelihood of incidents, effectively deploy resources and optimize the decision-making process.
International Journal of Recent Technology and Engineering
Big data analytics is a field in which we analyse and process information from large or convoluted data sets to be managed by methods of data-processing. Big data analytics is used in analysing the data and helps in predicting the best outcome from the data sets. Big data analytics can be very useful in predicting crime and also gives the best possible solution to solve that crime. In this system we will be using the past crime data set to find out the pattern and through that pattern we will be predicting the range of the incident. The range of the incident will be determined by the decision model and according to the range the prediction will be made. The data sets will be nonlinear and in the form of time series so in this system we will be using the prophet model algorithm which is used to analyse the non-linear time series data. The prophet model categories in three main category and i.e. trends, seasonality, and holidays. This system will help crime cell to predict the possibl...
The continued urbanization poses several challenges for law enforcement agencies to ensure a safe and secure environment. Countries are spending a substantial amount of their budgets to control and prevent crime. However, limited efforts have been made in the crime prediction area due to the deficiency of spatiotemporal crime data. Several machine learning, deep learning, and time series analysis techniques are exploited, but accuracy issues prevail. Thus, this study proposed a Bidirectional Long Short Term Memory (Bi-LSTM) and Exponential Smoothing (ES) hybrid for crime forecasting. The proposed technique is evaluated using New York City crime data from 2010–2017. The proposed approach outperformed as compared to state-of-the-art Seasonal Autoregressive Integrated Moving Averages (SARIMA) with low Mean Absolute Percentage Error (MAPE) (0.3738, 0.3891, 0.3433,0.3964), Root Mean Square Error (RMSE)(13.146, 13.669, 13.104, 13.77), and Mean Absolute Error (MAE) (9.837, 10.896, 10.598, ...
Maǧallaẗ Al-Ǧamʿiyyaẗ Al-Miṣriyyaẗ Li Nuẓum Al-Maʿlūmāt wa Tiknūlūğyā Al-Ḥāsibāt (Print), 2023
Big data is simply large, voluminous data collected from different sources which could be structured or unstructured. The ancient processing system may not be successful in processing such voluminous data. Big data analytics (BDA) uses extensive techniques and tools for analyzing large. This Paper introduced a solution to the crime prediction problem using machine learning algorithms. to classify this monstrous data. SVM and K-NN, Random Forest, and MLP algorithms solve this problem. In this Paper addresses, the researcher finds the best SVM algorithm, the main thing to keep in mind here is that these are to give you the most accurate answer possible as quickly as possible. The Predictive model shows that the prophet model handles outliers well and it is robust to missing data and shifts in the trend. These outcomes are going to benefit the police departments to better understand crime issues and provide insights that will enable them to track activities, predict the likelihood of incidents, effectively deploy resources, and optimize the decision-making process. After applying the proposed model to many machine learning algorithms, the paper concluded that the best crime prediction algorithm in terms of accuracy is in order as follows: SVM algorithm with an accuracy of 0.99997, time 1100 seconds, followed by K-NN algorithm with an accuracy of 0.999976, time 2450 seconds, then Random Forest algorithm with accuracy 0.999996, Time 1420 seconds, then MLP algorithm with accuracy 0.995886, time 1346 seconds. The best algorithm to achieve the best accuracy in the least time is SVM.
Big data is the collection of large amount of data which is generated from various application such as social media, e-commerce etc. Those large amount of data were found to be tedious for storage and analysis. Now a day's various tools and techniques are coming to an existence to solve this problem. One of the application where huge amount of data is massively increasing is Crime which makes a huge issue for government to provide deliberate decision by following the laws and order. These large data were kept with the solution called Big Data Analytics. The processing of this huge data generated can be analyzed with high performance when compared to traditional way of processing. By performing these analytics, the government and the people will be provided with security and a healthy nation.
In this paper, a detailed study on crime classification and prediction using deep learning architectures is presented. We examine the effectiveness of deep learning algorithms in this domain and provide recommendations for designing and training deep learning systems for predicting crime areas, using open data from police reports. Having time-series of crime types per location as training data, a comparative study of 10 state-of-the-art methods against 3 different deep learning configurations is conducted. In our experiments with 5 publicly available datasets, we demonstrate that the deep learning-based methods consistently outperform the existing best-performing methods. Moreover, we evaluate the effectiveness of different parameters in the deep learning architectures and give insights for configuring them to achieve improved performance in crime classification and finally crime prediction.
Crime activity in many cities worldwide causes significant damages to the lives of victims and their surrounding communities. It is a public disorder problem, and big cities experience large amounts of crime events. Spatio-temporal prediction of crimes activity can help the cities to have a better allocation of police resources and surveillance. Deep learning techniques are considered efficient tools to predict future events analyzing the behavior of past ones; however, they are not usually applied to crime event prediction using a spatio-temporal approach. In this paper, a Convolutional Neural Network (CNN) together with a Long-Short Term Memory (LSTM) network (thus CLSTM-NN) are proposed to predict the presence of crime events over the city of Baltimore (USA). In particular, matrices of past crime events are used as input to a CLSTM-NN to predict the presence of at least one event in future days. The model is implemented on two types of events: ''street robbery'' and ''larceny''. The proposed procedure is able to take into account spatial and temporal correlations present in the past data to improve future prediction. The prediction performance of the proposed neural network is assessed under a number of controlled plausible scenarios, using some standard metrics (Accuracy, AUC-ROC, and AUC-PR). INDEX TERMS CNN and LSTM neural networks, crime prediction, deep learning, spatial and temporal structure.
Crime is a bone of contention that can create a societal disturbance. Crime forecasting using time series is an efficient statistical tool for predicting rates of crime in many countries around the world. Crime data can be useful to determine the efficacy of crime prevention steps and the safety of cities and societies. However, it is a difficult task to predict the crime accurately because the number of crimes is increasing day by day. The objective of this study is to apply time series to predict the crime rate to facilitate practical crime prevention solutions. Machine learning can play an important role to better understand and analyze the future trend of violations. Different time-series forecasting models have been used to predict the crime. These forecasting models are trained to predict future violent crimes. The proposed approach outperforms other forecasting techniques for daily and monthly forecast.
Received May 8, 2019, accepted July 9, 2019, date of publication July 22, 2019, date of current version August 15, 2019.
Digital Object Identifier 10.1109/ACCESS. 2019.2930410
Big Data Analytics and Mining for Effective Visualization and Trends Forecasting of Crime Data
MINGCHEN FENG ⊙1, JIANGBIN ZHENG 1, JINCHANG REN ⊙2,3, (Senior Member, IEEE), AMIR HUSSAIN ⊙4, (Senior Member, IEEE), XIUXIU LI 5, YUE XI ⊙1, AND QIAOYUAN LIU ⊙6 1 School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China 2 Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow G11XW, U.K. 3 School of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan 030024, China 4 Cognitive Big Data and Cybersecurity Research Lab, Edinburgh Napier University, Edinburgh EH11 4DY, U.K. 5 School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China 6 Changchan Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchan 130000, China
Corresponding authors: Jiangbin Zheng (zhengjb@nwpu.edu.cn) and Jinchang Ren (jinchang.ren@strath.ac.uk)
This work was supported in part by the HGJ, HJSW, and Research and Development Plan of Shaanxi Province under Grant 2017ZDXM-GY-094 and Grant 2015KTZDGY04-01, and in part by the National Natural Science Foundation of China under Grant 61502382 .
ABSTRACT Big data analytics (BDA) is a systematic approach for analyzing and identifying different patterns, relations, and trends within a large volume of data. In this paper, we apply BDA to criminal data where exploratory data analysis is conducted for visualization and trends prediction. Several the state-of-theart data mining and deep learning techniques are used. Following statistical analysis and visualization, some interesting facts and patterns are discovered from criminal data in San Francisco, Chicago, and Philadelphia. The predictive results show that the Prophet model and Keras stateful LSTM perform better than neural network models, where the optimal size of the training data is found to be three years. These promising outcomes will benefit for police departments and law enforcement organizations to better understand crime issues and provide insights that will enable them to track activities, predict the likelihood of incidents, effectively deploy resources and optimize the decision making process.
INDEX TERMS Big data analytics (BDA), data mining, data visualization, neural network, time series forecasting.
I. INTRODUCTION
In recent years, Big Data Analytics (BDA) has become an emerging approach for analyzing data and extracting information and their relations in a wide range of application areas [1]. Due to continuous urbanization and growing populations, cities play important central roles in our society. However, such developments have also been accompanied by an increase in violent crimes and accidents. To tackle such problems, sociologists, analysts, and safety institutions have devoted much effort towards mining potential patterns and factors [34]. In relation to public policy however, there are many challenges in dealing with large amounts of available data. As a result, new methods and technologies need
The associate editor coordinating the review of this manuscript and approving it for publication was Yong Xiang.
to be devised in order to analyze this heterogeneous and multi-sourced data [35]. Analysis of such big data enables us to effectively keep track of occurred events, identify similarities from incidents, deploy resources and make quick decisions accordingly [36]. This can also help further our understanding of both historical issues and current situations, ultimately ensuring improved safety/security and quality of life, as well as increased cultural and economic growth.
The rapid growth of cloud computing and data acquisition and storage technologies, from business and research institutions to governments and various organizations, have led to a huge number of unprecedented scopes/complexities from data that has been collected and made publicly available [37]. It has become increasingly important to extract meaningful information and achieve new insights for understanding patterns from such data resources. BDA can effectively address
the challenges of data that are too vast, too unstructured, and too fast moving to be managed by traditional methods [2]. As a fast-growing and influential practice, DBA can aid organizations to utilize their data and facilitate new opportunities. Furthermore, BDA can be deployed to help intelligent businesses move ahead with more effective operations, high profits and satisfied customers. Consequently, BDA becomes increasingly crucial to organizations to address their developmental issues [3].
As one of the fundamental techniques of BDA, data mining is an innovative, interdisciplinary, and growing research area, which can build paradigms and techniques across various fields for deducing useful information and hidden patterns from data [4]. Data mining is useful in not only the discovery of new knowledge or phenomena but also for enhancing our understanding of known ones. With the support of such techniques, BDA can help us easily identify crime patterns which occur in a particular area and how they are related with time. The implications of machine learning and statistical techniques on crime or other big data applications such as traffic accidents or time series data, will enable the analysis, extraction and understanding of associated patterns and trends, ultimately assisting in crime prevention and management.
In this paper, state-of-the-art machine learning and big data analytics algorithms are utilized for the mining of crime data from three US cities, i.e. San-Francisco, Chicago and Philadelphia. After preprocessing, including data filtering and normalization, Google maps based geo-mapping of the features are implemented for visualization of the statistical results. Various approaches in machine learning, deep learning, and time series modeling are utilized for future trends analysis. The major contribution of this paper can be summarized as follows:
A series of investigative explorations are conducted to explore and explain the crime data in three US cities;
We propose a novel visual representation which is capable of handling large datasets and enables users to explore, compare, and analyze evolutionary trends and patterns of crime incidents;
A combination and comparison of different machine learning, deep learning and time series modeling algorithms to predict trends with the optimal parameters, time periods and models.
The rest of this paper is organized as follows. Following a literature review of related work in Section 2, Section 3 details the techniques used for data processing and visualization. Section 4 describes the deep learning and machine learning techniques for trends analysis. Experimental results are presented in Section 5, followed by some concluding remarks summarized in Section 6.
II. RELATED WORK
A. BIG DATA ANALYTICS
Big data analytics (BDA) has been extensively applied and studied in the fields of data science and computer science
for quite some time. While machine learning provide both opportunities and challenges when meets large amount of big data [52]. Raghupathi and Raghupathi [5] described the promise and potential of BDA in healthcare and summarized current challenges. Archenaa and Anita [6] conducted a survey on the applications of BDA in healthcare and the government. Londhe and Rao [7] presented various software frameworks available for BDA and discussed some widely used data mining algorithms. Grady et al. [8] illustrated the implications of an agile process for the cleansing, transformation, and analytics of data in BDA. Vatrapu et al. [9] demonstrated the suitability and effectiveness of BDA for conceptualizing, formalizing and analyzing of big social data from content-driven social media platforms e.g. Facebook. The so-called Social Set Analysis was used for studying events such as unexpected crises and coordinated marketing campaigns. Zhang et al. [10] proposed an overall BDA architecture for the lifecycle of products, where BDA and servicedriven patterns were integrated to assist in the decision making and overcome barriers. Ngai et al. [11] reviewed BDA and its applications on electronic markets. Liu et al. [12] exemplified the utilization of BDA in the tourism domain in terms of the datasets, data capture techniques, analytical tools, and analysis results to provide insights for Destination Management and Marketing. Fisher et al. [13] explained the conception of big data in BDA, its analytics and the associated challenges when interacting among them.
B. CRIME DATA MINING, VISUALIZATION & TRENDS FORECASTING
In criminology literature the relationship between crime and various factors has been intensively analyzed, where typical examples include historical crime records [14], unemployment rate [15], and spatial similarity [16]. Using data mining and statistical techniques, new algorithms and systems have been developed along with new types of data. For instance, classification and statistical models are applied for mining of crime patterns and crime prediction [17], [18], where transfer learning has been employed to exploit spatio-temporal patterns in New York city [19]. Wu et al. [20] developed a system to automatically collect crime-logged data for mining of crime patterns, in order to achieve more effective crime prevention in/around a university campus. In Vineeth et al. [21], a random forest was applied on the obtained correlation between crime types to classify the state based on their crime intensity point. Unsupervised learning based methods have also been used for mining of crime patterns and crime hotpots, such as memetic differential fuzzy cluster [22] for forecasting of criminal patterns, and fuzzy C-means algorithm [23] to cluster criminal events in space. Noor et al. [24] derived association mining rules to determine relationships between different crimes. Injadat et al. [53] conducted a survey to summary data mining techniques on social media.
With deep learning and neural networks, new models have been developed to predict crime occurrence [25]. As deep learning [26]-[28] and artificial intelligence [29], [30] have
achieved great success in computer vision, they are also applied in BDA for predicting trends and classification. In Zhao et al. [31] and Dai et al. [32], Long ShortTerm Memory networks were successfully applied to predict stock price and gas dissolved in power transformation. In Kashef et al. [33], a neural network was applied on a smart grid system to estimate the trends of power loss. Zheng et al. [55] proposed a big data processing architecture for radio signals analysis. Zhao et al. [56] proposed a neural network model to predict travel time and gained high accuracy. Niu et al. [57] utilized LSTM and developed an effective speed prediction model to solve spatio-temporal prediction problems. Peral et al. [58] summarized analytics techniques and proposed an architecture for online forum data mining.
In our proposed system, a similar but more comprehensive workflow has been adopted, which include statistical analysis, data visualization, and trends prediction. We explored crime data in three US cities, Chicago, San Francisco and Philadelphia. Data visualization and mining techniques are used to show the extracted statistical relationships among different attributes within the huge volume of data. State-of-the-art machine learning and deep learning algorithms are deployed to forecast trends and obtain optimal models with the highest accuracy.
III. DATA ANALYSIS AND VISUALIZATION
The three crime datasets we used for analysis are publicly available, which cover 3 cities in US, i.e. San-Francisco, Chicago, and Philadelphia. The San-Francisco crime data contains 2,142,685 crime incidents from 01/01/2003 to 11/08/2017 [38]. Data from Chicago has a total number of 5,541,398 records, dating back from 2017 to 2003 [39]. In the Philadelphia dataset, there are 2,371,416 crime incidents which were captured from 01/01/2006 to 12/31/2017 [40]. Detailed analysis of these dataset is presented as follows.
A. FEATURED ATTRIBUTES
For each entry of crime incidents in the datasets, the following 13 featured attributes are included:
IncidentNum - Case number of each incident;
Dates - Date and timestamp of the crime incident;
Category - Type of the crime. This is the target/label that we need to predict in the classification stage;
Descript - A brief note describing any pertinent details of the crime;
DayOfWeek - Day of the week that crime occurred;
PdDistrict - Police Department District ID where the crime is assigned;
Resolution - How the crime incident was resolved (with the perpetrator being, say, arrest or booked);
Address - The approximate street address of the crime incident;
X - Longitude of the location of a crime;
Y - Latitude of the location of a crime;
Coordinate - Pairs of Longitude and Latitude;
Dome - whether crime id domestic or not;
Arrest - Arrested or not;
B. DATA PREPROCESSING
Before implementing any algorithms on our datasets, a series of preprocessing steps are performed for data conditioning as presented below:
Time is discretized into a couple of columns to allow for time series forecasting for the overall trend within the data.
For some missing coordinate attributes in Chicago and Philadelphia datasets, we imputed random values sampled from the non-missing values, computed their mean, and then replaced the missing ones [41].
The timestamp indicates the date and time of occurrence of each crime, we deduced these attributes into five features: Year (2003-2017), Month (1-12), Day (1-31), Hour (0-23), and Minute (0-59).
We also omit some features that unneeded like incidentNum, coordinate.
C. NARRATIVE VISUALIZATION
Considering the geographic nature of the crime incidents, an interactive map based on Google map was used for data visualization, where crime incidents are clustered according to their latitude/longitude information. As shown in Fig. 1, the blue label stands for the distribution of police stations in each city, where the round label with numbers are for crime hot-spots and the associated number of incidents.
The marker cluster algorithm [42] can help us manage multiple markers at different zooming levels, corresponding to various spatial scales or resolutions. When zooming out, the markers will gather together into clusters to view a broader geographic range on the map yet with a much coarser scale. When viewing the map at a high zooming level, the individual markers can be shown on the map to indicate the exact location of the incident. By using this interactive map, users can look up crimes on specific dates and streets. This can provide some preliminary guidance of the distributions of the associated incidents. Actually, it will show more crimes at the downtown and less crimes around the edge of a city.
Fig. 2 summarized crime incidents in each year for the three cities. In San-Francisco, the number of crime incidents seems to soar since 2012 and reaches its peak in 2013, whilst the numbers for the other two cities tend to decreasing yet following certain patterns. Actually, crime incidents in these two cities tend to increase from January and reach the peaks in the middle of the year and then begin to decline until the end of year. It seems that at the beginning and the end of each year, crime incidents always reach the lowest point. This may due to it is the time when everyone is engaged in celebrating the new year. We also learned from US economic growth statistics [50] that San-Francisco is one of the 9 cities with the worst income inequality, which will lead to wealth gap and cause more crimes [51].
Among 30+ categories of reported crime incidents available in the datasets, the distribution of these categories is heavily skewed. As such, we focus mainly on the Top-10 frequently occurred crimes and plot their distributions as
The hourly trends unravel some interesting crime facts as shown in Fig. 4. As seen, crimes by hour in all three cities share similar patterns, where 5-6 am is the safest part of the day whilst 0 am is the most dangerous hour with the most crime incidents reported. Also, 12 pm is also very dangerous during a day and in fact the hour where the maximum incidents reported under some crime categories. This actually is reasonable as it is the time when more people go out for lunch and also the time when tired criminals become aggressive under limited police available [43]. As a result, more police resources should be allocated in the shift from noon
to midnight if the police force is limited. We also noticed that Chicago has more crimes than the other two cities, and San Francisco is only one-third of the size of Philadelphia but has the same number of criminal cases. From American census data we know that Chicago has more population and larger area, which explains somehow more crimes it has. When calculating the population density as person/square mile, however, it becomes 17179 for San-Francisco, 11841 for Chicago, and 11379 for Philadelphia, this actually is another main factor that boosts crimes [44].
In Fig. 5, we show the keywords of crimes for the three cities, where wordcloud plots are used to illustrate the significance of different categories of crimes. As seen, SanFrancisco is now facing the problem of theft and property
Fig. 6 shows the top streets where the maximum crime incidents were reported in the three cities. In San-Francisco the top three dangerous streets were Bryant Street, Market Street, and Mission Street. In Chicago these become Ohare Street, State Street, and Cicero Avenue. In Philadelphia,
Roosevelt BLVD, Frankford Avenue, and Market Street were the three streets with the most frequent crimes. These streets are all close to the downtown, where many financial districts, fortune 500 companies, and luxury hotels located nearby. This may be one reason that crime incidents boosts.
Fig. 7 demonstrates the monthly based crime statistics, in which we can clearly see that the number of crime incidents in San-Francisco is quite stable with an average of 400 cases per day. The month of December seems the safest and is observed to have the least crime incidents whilst October is
least safe month with the maximum number of reported crime incidents. For Philadelphia, the mean value of reported crimes in each day is 400-600 incidents with February the safest and June-August the least safe months. While Chicago has the most crimes reported among the three cities with a mean value of 1000 incidents per day, where February is the safest month followed by December and January. However, June-August are the most notorious months with the highest numbers of crime incidents. We found that monthly crime rate is very likely linked to local climates: With a Mediterranean climate, San-Francisco is warm hence people may share similar working and life patterns throughout the year. Whilst Chicago and Philadelphia are temperate continental climate with a cold winter and hot summer in June-August, significant more outdoor living/activities can be found in summer than winter days thus the associated higher or lower crime incidents in different seasons.
IV. PREDICTION MODELS
In order to tackle the problem of crime trends forecasting we explored several state-of-the-art machine learning and deep learning algorithms and time series models. A time series is a sequence of numerical data points successively indexed or listed/graphed in the time order. Usually, the
The Prophet model is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and/or daily seasonality, plus holiday effects [36]. It works best with time series that have strong seasonal effects and cover several seasons of historical data. The Prophet model is robust to missing data and shifts in the trend, and typically it handles outliers well. The Prophet model is designed to handle complex features in time series, it also designed to have intuitive parameters that can be adjusted without knowing the details of the underlying model.
The Prophet model decomposes time series into three main components, i.e. the trend, seasonality, and holidays. They are combined in the following equation:
y(t)=g(t)+s(t)+h(t)+εt
where g(t) is the trend of any non-periodic changes in the time series, s(t) represents periodic changes (e.g., weekly and yearly seasonality), and h(t) represents the holiday effects of any potentially irregular schedules over one or more days. The error term εt represents any random effects which are not accommodated by the model.
For the trend function g(t), we utilized a linear trend with limited change points, where a piece-wise constant rate of growth provides a parsimonious and useful model.
g(t)aj(t)=(k+a(t)Tδ)t+(m+a(t)Tγ)={1,0, if t≥sj otherwise
where k is the growth rate, δ is the rate adjustment, m is the offset parameter, and γ is set to −sjδj to make the function continuous.
Crime data is time series with multi-period seasonality, e.g. daily, weekly and annually. To this end, the seasonal function relies on Fourier series defined below, where P denotes the regular period:
s(t)=n=1∑N(ancos(P2πnt)+bnsin(P2πnt))
As for holidays related events, they are defined by generating a matrix of regressors:
Z(t)h(t)=[l(t∈D1,…,l(t∈DL))]=Z(t)k
where Di is the set of past and future dates for holidays, k∼Normal(0,v2),1 is an indicator function representing whether time t is during holiday i,Z(t) is the regressor matrix.
B. NEURAL NETWORK MODEL
A neural network is composed of a certain numbers of neurons, namely nodes in the network, which are organized in several layers and connected to each other cross different layers [45]. There are at least three layers in a neural network, i.e. the input layer of the observations, a non-observable hidden layer in the middle, and an output layer as the predicted results. In this paper we explored the multilayer feedforward network, where each layer of nodes receives inputs from the previous layer. The outputs of the nodes in one layer will become the inputs to the next layer. The inputs to each node are combined using a weighted linear combination below [46]:
zj=bj+i=1∑nωi,jxi
The hidden layer will modify the input above using a nonlinear function by
s(z)=1+e−z1
C. LSTM MODEL
LSTM model is a powerful type of recurrent neural network (RNN), capable of learning long-term dependencies [47]. For time series involves auto-correlation, i.e. the presence of correlation between the time series and lagged versions of itself, LSTMs are particular useful in prediction due to their capability of maintaining the state whilst recognizing patterns over the time series. The recurrent architecture enables the states to be persisted, or communicate between updated weights as each epoch progresses. Moreover, the LSTM cell architecture can enhance the RNN by enabling long term persistence in addition to short term [48].
where, ft is a sigmoid function to indicate whether to keep the previous state, Ct−1 is the old cell state, Ct is the updated cell state, Wf,Wi, and WC are the previous value in each layer, ht−1 and xt, is the input value, bf,bi, and bc are constant values, it decides which value will be used to update the state, Ct stands for the new candidate values.
V. EXPRRIMENTAL RESULTS
A. DECOMPOSE OF TIME SERIES
Time series can exhibit a variety of patterns, and it is always helpful to decompose a time series into several components, each representing an underlying pattern category. Fig. 8 illustrates a decomposed crime time series, where for each original time series on the top, the three decomposed parts can respectively show the estimated trend component, seasonal component, and irregular component, respectively. The estimated trend component has shown that the overall crimes in San-Francisco slightly decreased from 2003 to 2013, followed by a steady increase from then on to 2017. However, crimes in Chicago seemed to decrease quickly from 2003 to 2015 and then became quite stable, whilst in Philadelphia the number of crimes had a downward trend yet with some undulations until it became stable after 2016. Regarding the seasonal component, it changes slowly over the time, where a quite strong annual periodic pattern can be observed for Chicago and Philadelphia than San-Francisco. This may be due to the more apparent annual climate changes in the two cities, where the crimes reached the peak in the middle of the year when the temperature becomes the hottest. As such, time series models will perform well on these datasets to forecast crimes in the future.
B. RESULTS ANALYSIS
We have explored deep learning algorithms and time series forecast models to predict crime trends. For performance evaluation, the Root Mean Square Error (RMSE) and spearman correlation [49] are used in terms of different parameters and different sizes of training samples. The RMSE and spearman correlation are defined as follows.
RMSEρX,Y=n1i=1∑n(Yi−Yˉi)2=σXσYcov(X,Y)
where Yi and Y are the true values; Yˉi and X are predicted values; cov is the covariance; σX is the standard deviation of X , and σY is the standard deviation of Y .
To train our models for predicting trends, we first summarized the number of crime incidents per day, and then transformed these data into a “tibbletime” format, and then we divided the data into training and testing sets, where the training set contains data from 2003 to 2016 and the testing set has data from 2017, for training process we set 1 year’s data as validation set. We evaluated the performance of the prediction models whilst changing the number of training years from 1 to 10 and the results are summarized in Table 1. As seen in Table 2, more training data do not necessarily lead to better results although too little training data also fails to generate good results. The optimal time period for crime trends forecasting is 3 years where the RMSE is the minimum and the spearman correlation is the highest.
TABLE 1. Comparison of different algorithms/models in terms of RMSE and spearman correlation under different sizes of training samples.
city
Years for training
RMSE- Prophet
Correlation- Prophet
RMSE- LSTM
Correlation- LSTM
RMSE-Neural Network
Correlation- Neural Network
San-Francisco
10
38.21
0.384
55.21
0.354
54.79
0.097
San-Francisco
5
35.70
0.402
54.18
0.365
48.04
0.232
San-Francisco
4
36.18
0.415
53.96
0.411
48.04
0.236
San-Francisco
3
35.65
0.398
45.65
0.423
41.62
0.291
San-Francisco
2
91.93
0.087
160.2
0.098
41.17
0.128
San-Francisco
1
100.56
0.182
95.96
0.122
-
-
Chicago
10
76.89
0.560
77.01
0.532
77.19
0.367
Chicago
5
68.21
0.652
69.12
0.549
92.74
0.551
Chicago
4
66.75
0.654
67.45
0.612
88.04
0.492
Chicago
3
66.68
0.658
67.15
0.625
75.06
0.505
Chicago
2
67.42
0.632
68.14
0.576
75.90
0.552
Chicago
1
100.51
0.02
78.98
0.459
-
-
Philadelphia
10
51.83
0.716
50.65
0.709
82.21
0.422
Philadelphia
5
56.07
0.728
51.23
0.698
71.19
0.486
Philadelphia
4
55.37
0.728
49.22
0.714
67.74
0.588
Philadelphia
3
48.73
0.729
48.15
0.725
63.68
0.537
Philadelphia
2
50.35
0.718
57.16
0.705
170.12
0.128
Philadelphia
1
100.71
0.098
140.63
0.562
-
-
TABLE 2. Comparison of different changepoint ranges in the prophet model in terms of RMSE and spearman correlation.
The results also showed that Prophet model and LSTM model performed better than traditional neural network models as demonstrated in table. 1 that neural network seems has lower RMSE but the correlation between predicted values and the real ones is low. The visualization of the trends in Fig. 10, 11, and 12 also conforms this conclusion.
Besides, we also evaluated the effects of some key parameters in the best two approaches, the Prophet and LSTM models. For Prophet model, after training we can obtain trends and seasonality of the dataset, but for holiday components we have to manually input the value. As shown in Fig. 9, we summarize top 10 dates with the most and least crime
incidents respectively, thus we set these 20 dates as holidays. Moreover, we analyzed different changepoint ranges, referring to the proportion of history in which trend changepoints will be estimated. According to the results shown in Table 2, the best changepoint range is determined as 0.8 , which means that the first 80 points are specified as changepoints. For the LSTM model, we first investigated the effect of different echos, i.e. the total number of forward/backward propagation iterations. Generally, the more iterations, the better the model performs [46]. However, in our experiments, we found the optimal number of epoch for the best results is 300 for the three datasets. In addition, we analyzed in detail the impact of different layers and neurons as shown in table 3 and table 4, the best number of layers for LSTM
is computed as 50 . The number of neurons in our model is affected largely by a parameter called cell state, we found that when the cell state is set as 60 , we get the optimal results.
With the derived best training period as three years and the optimal parameters for the Prophet and the LSTM models, we predict the crimes for the three cities in 2018 as shown in Fig. 9, Fig. 10 and Fig. 11 respectively. All predictive results indicate that crime in 2018 may decrease but still boost in some months. When compared with part of the data available in 2018, our results show relatively low RMSE and high correlation with the new data. As such we suggest that police department deploy more forces in summer and citizens be more cautious when going out.
VI. CONCLUSION & FUTURE WORK
In this paper a series of state-of-the-art big data analytics and visualization techniques were utilized to analyze crime big data from three US cities, which allowed us to identify patterns and obtain trends. By exploring the Prophet model, a neural network model, and the deep learning algorithm LSTM, we found that both the Prophet model and the LSTM algorithm perform better than conventional neural network models. We also found the optimal time period for the training sample to be 3 years, in order to achieve the best prediction of trends in terms of RMSE and spearman correlation. Optimal parameters for the Prophet and the LSTM models are also determined. Additional results explained earlier will provide new insights into crime trends and will assist both police departments and law enforcement agencies in their decision making.
In future, we plan to complete our on-going platform for generic big data analytics which will be capable of processing various types of data for a wide range of applications. We also plan to incorporate multivariate visualization [59], graph mining techniques [54] and fine-grained spatial analysis [60] to uncover more potential patterns and trends within these datasets. Moreover, we aim to conduct more realistic case studies to further evaluate the effectiveness and scalability of the different models in our system.
REFERENCES
[1] A. Gandomi and M. Haider, “Beyond the hype: Big data concepts, methods, and analytics,” Int. J. Inf. Manage., vol. 35, no. 2, pp. 137-144, Apr. 2015.
[2] J. Zakir and T. Seymour, “Big data analytics,” Issues Inf. Syst., vol. 16, no. 2, pp. 81-90, 2015.
[3] Y. Wang, L. Kung, W. Y. C. Wang, and C. G. Cegielski, “An integrated big data analytics-enabled transformation model: Application to health care,” Inf. Manage., vol. 55, no. 1, pp. 64-79, Jan. 2018.
[4] U. Thongsatapornwatana, “A survey of data mining techniques for analyzing crime patterns,” in Proc. 2nd Asian Conf. Defence Technol., Chiang Mai, Thailand, 2016, pp. 123-128.
[5] W. Raghupathi and V. Raghupathi, “Big data analytics in healthcare: Promise and potential,” Health Inf. Sci. Syst., vol. 2, no. 1, pp. 1-10, Feb. 2014.
[6] J. Archenaa and E. A. M. Anita, “A survey of big data analytics in healthcare and government,” Procedia Comput. Sci., vol. 50, pp. 408-413, Apr. 2015.
[7] A. Londhe and P. Rao, “Platforms for big data analytics: Trend towards hybrid era,” in Proc. Int. Conf. Energy, Commun., Data Anal. Soft Comput. (ICECDS), Chennai, 2017, pp. 3235-3238.
[8] W. Grady, H. Parker, and A. Payne, “Agile big data analytics: AnalyticsOps for data science,” in Proc. IEEE Int. Conf. Big Data, Boston, MA, USA, Dec. 2017, pp. 2331-2339.
[9] R. Vatrapu, R. R. Mukkamala, A. Hussain, and B. Flesch, “Social set analysis: A set theoretical approach to Big Data analytics,” IEEE Access, vol. 4, pp. 2542-2571, 2016.
[10] Y. Zhang, S. Ren, Y. Liu, and S. Si, “A big data analytics architecture for cleaner manufacturing and maintenance processes of complex products,” J. Cleaner Prod., vol. 142, no. 2, pp. 626-641, Jan. 2017.
[11] E. W. Ngai, A. Gunasekaran, S. F. Wamba, S. Akter, and R. Dubey, “Big data analytics in electronic markets,” Electron. Markets, vol. 27, no. 3, pp. 243-245, Aug. 2017.
[12] Y.-Y. Liu, F.-M. Tseng, and Y.-H. Tseng, “Big Data analytics for forecasting tourism destination arrivals with the applied Vector Autoregression model,” Technol. Forecasting Social Change, vol. 130, pp. 123-134, May 2018.
[13] D. Fisher, M. Czerwinski, S. Drucker, and R. DeLine, “Interactions with big data analytics,” Interactions, vol. 19, no. 3, pp. 50-59, Jun. 2012.
[14] C.-H. Yu, M. Morabito, P. Chen, and W. Ding, “Hierarchical spatiotemporal pattern discovery and predictive modeling,” IEEE Trans. Knowl. Data Eng., vol. 28, no. 4, pp. 979-993, Apr. 2016.
[15] S. Musa, “Smart cities-A road map for development,” IEEE Potentials, vol. 37, no. 2, pp. 19-23, Mar./Apr. 2018.
[16] S. Wang, X. Wang, P. Ye, Y. Yuan, S. Liu, and F.-Y. Wang, “Parallel crime scene analysis based on ACP approach,” IEEE Trans. Computat. Social Syst., vol. 5, no. 1, pp. 244-255, Mar. 2018.
[17] S. Adler, A. Yadav, R. Vishwakarma, N. Yadav, and M. Timbadia, “Crime pattern detection, analysis & prediction,” in Proc. IEEE Int. Conf. Electron., Commun. Aerosp. Technol., Coimbatore, India, Apr. 2017, pp. 225-230.
[18] N. Baloian, C. E. Bassaletti, M. Fernández, O. Figueroa, P. Fuentes, R. Manasevich, M. Orchard, S. Peñafiel, J. A. Pino, and M. Vergara, “Crime prediction using patterns and context,” in Proc. 21st IEEE Int. Conf. Comput. Supported Cooperat. Work Design, Wellington, New Zealand, Apr. 2017, pp. 2-9.
[19] X. Zhao and J. Tang, “Exploring transfer learning for crime prediction,” in Proc. IEEE Int. Conf. Data Mining Workshops, New Orleans, LA, USA, Nov. 2017, pp. 1158-1159.
[20] S. Wu, J. Male, and E. Dragut, “Spatial-temporal campus crime pattern mining from historical alert messages,” in Proc. Int. Conf. Comput., Netw. Commun., Santa Clara, CA, USA, 2017, pp. 778-782.
[21] K. R. S. Vineeth, T. Pradhan, and A. Pandey, “A novel approach for intelligent crime pattern discovery and prediction,” in Proc. Int. Conf. Adv. Commun. Control Comput. Technol., Ramanathapuram, India, 2016, pp. 531-538.
[22] C. R. Rodriguez, D. M. Gomez, and M. A. M. Rey, “Forecasting time series from clustering by a memetic differential fuzzy approach: An application to crime prediction,” in Proc. IEEE Symp. Ser. Comput. Intell., Honolulu, HI, USA, Nov./Dec. 2017, pp. 1-8.
[23] A. Joshi, A. S. Sabitha, and T. Choudhury, “Crime analysis using K-means clustering,” in Proc. 3rd Int. Conf. Comput. Intell. Netw., Odisha, India, 2017, pp. 33-39.
[24] N. M. M. Noor, W. M. F. W. Nawawi, and A. F. Ghazali, “Supporting decision making in situational crime prevention using fuzzy association rule,” in Proc. Int. Conf. Comput., Control, Informat. Appl. (IC3INA), Jakarta, Indonesia, 2013, pp. 225-229.
[25] M. Wang, F. Zhang, H. Guan, X. Li, G. Chen, T. Li, and X. Xi, “Hybrid neural network mixed with random forests and Perlin noise,” in Proc. 2nd IEEE Int. Conf. Comput. Commun. (ICCC), Chengdu, China, Oct. 2016, pp. 1937-1941.
[26] Z. Wang, D. Zhang, M. Sun, J. Jiang, and J. Ren, “A deep-learning based feature hybrid framework for spatiotemporal saliency detection inside videos,” Neurocomputing, vol. 287, pp. 68-83, Apr. 2018.
[27] J. Ren and J. Jiang, “Hierarchical modeling and adaptive clustering for real-time summarization of rush videos,” IEEE Trans. Multimedia, vol. 11, no. 5, pp. 906-917, Aug. 2009.
[28] J. Han, G. Cheng, L. Guo, J. Ren, and D. Zhang, “Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 6, pp. 3325-3337, Jun. 2014.
[29] Y. Yan, J. Ren, H. Zhao, G. Sun, Z. Wang, J. Zheng, S. Marshall, and J. Soraghan, “Cognitive fusion of thermal and visible imagery for effective detection and tracking of pedestrians in videos,” Cognit. Comput., vol. 10, no. 1, pp. 94-104, Feb. 2018.
[30] Y. Yan, J. Ren, G. Sun, H. Zhao, J. Han, X. Li, S. Marshall, and J. Zhan, “Unsupervised image saliency detection with gestalt-laws guided optimization and visual attention based refinement,” Pattern Recognit., vol. 79, pp. 65-78, Jul. 2018.
[31] Z. Zhao, S. Tu, J. Shi, and R. Rao, “Time-weighted LSTM model with redefined labeling for stock trend prediction,” in Proc. IEEE 29th Int. Conf. Tools Artif. Intell. (ICTAI), Boston, MA, USA, Nov. 2017, pp. 1210-1217.
[32] J. Dai, G. Sheng, X. Jiang, and H. Song, “LSTM networks for the trend prediction of gases dissolved in power transformer insulation oil,” in Proc. 12th Int. Conf. Properties Appl. Dielectr. Mater., Xi’an, China, 2018, pp. 666-669.
[33] H. Kashef, M. Abdel-Nasser, and K. Mahmoud, “Power loss estimation in smart grids using a neural network model,” in Proc. Int. Conf. Innov. Trends Comput. Eng. (ITCE), Aswan, Egypt, 2018, pp. 258-263.
[34] H. Hassani, X. Huang, M. Ghodsi, and E. S. Silva, “A review of data mining applications in crime,” Stat. Anal. Data Mining, ASA Data Sci. J., vol. 9, no. 3, pp. 139-154, Apr. 2016.
[35] Z. Jia, C. Shen, Y. Chen, T. Yu, X. Guan, and X. Yi, “Big-data analysis of multi-source logs for anomaly detection on network-based system,” in Proc. 13th IEEE Conf. Autom. Sci. Eng. (CASE), Xi’an, China, Aug. 2017, pp. 1136-1141.
[36] A. Agresti, An Introduction to Categorical Data Analysis, 3rd ed. Hoboken, NJ, USA: Wiley, 2018.
[37] M. Huda, A. Maseleno, M. Siregar, R. Ahmad, K. A. Jasmi, N. H. N. Muhamad, and P. Atmotiyoso, “Big data emerging technology: Insights into innovative environment for online learning resources,” Int. J. Emerg. Technol. Learn., vol. 13, no. 1, pp. 23-36, Jan. 2018.
[38] San Francisco Crime Data. [Online]. Available: https://data.sfgov.org/ Public-Safety/Police-Department-Incident-Reports-Historical-2003/tnmfyery
[39] Chicago Crime Data. [Online]. Available: https://data.cityofchicago. org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
[40] Philadelphia Crime Data. [Online]. Available: https://www. opendataphilly.org/dataset/crime-incidents
[41] V. Viswanathan and S. R. Viswanathan, Data Analysis Cookbook, 2nd ed. Birmingham, U.K.: Packt Publishing Ltd, 2015, pp. 30-39.
[42] R. Boix, B. De Miguel-Molina, and J. L. Hervás-Oliver, “Microgeographies of creative industries clusters in Europe: From hot spots to assemblages,” Papers Regional Sci., vol. 94, no. 4, pp. 753-772, Jan. 2015.
[43] Z. Krizan and A. D. Herlache, “Sleep disruption and aggression: Implications for violence and its prevention,” Psychol. Violence, vol. 6, no. 4, pp. 542-552, Oct. 2016.
[44] US Population Density by City. [Online]. Available: http://www.governing. com/blogs/by-the-numbers/most-densely-populated-cities-data-map.html
[45] I. N. da Silva and D. H. Spatti, “Introduction,” in Artificial Neural Networks. Cham, Switzerland: Springer, 2017, pp. 3-19.
[46] R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and Practice, 2nd ed. O’Dots, 2018, pp. 333-339.
[47] K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, and J. Schmidhuber, “LSTM: A search space odyssey,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 10, pp. 2222-2232, Oct. 2017.
[48] J. Lai, B. Chen, S. Tong, K. Yu, and T. Tan, “Phone-aware LSTM-RNN for voice conversion,” in Proc. IEEE 13th Int. Conf. Signal Process. (ICSP), Chengdu, China, Nov. 2016, pp. 177-182.
[49] P. Schober, C. Boer, and L. A. Schwarte, “Correlation coefficients: Appropriate use and interpretation,” Anesthesia Analgesia, vol. 126, no. 5, pp. 1763-1768, May 2018.
[50] American Cities With the Worst Income Inequality. [Online]. Available: https://www.cbsnews.com/media/9-american-cities-with-the-worst-income-inequality/
[51] M. Lofstrom and S. Raphael, “Crime, the Criminal Justice System, and Socioeconomic Inequality,” J. Econ. Perspect., vol. 30, no. 2, pp. 26-103, Mar. 2016.
[52] L. Zhou, J. Wang, A. V. Vasilakos, and S. Pan, “Machine learning on big data: Opportunities and challenges,” Neurocomputing, vol. 237, pp. 350-361, May 2017.
[53] M. Injadat, F. Salo, and A. B. Nassif, “Data mining techniques in social media: A survey,” Neurocomputing, vol. 214, pp. 654-670, Nov. 2016.
[54] Y. Gao, Y. Xia, J. Qiao, and S. Wu, “Solution to gang crime based on graph theory and analytical hierarchy process,” Neurocomputing, vol. 140, pp. 121-127, Sep. 2014.
[55] S. Zheng, S. Chen, L. Yang, J. Zhu, Z. Luo, J. Hu, and X. Yang, “Big data processing architecture for radio signals empowered by deep learning: Concept, experiment, applications and challenges,” IEEE Access, vol. 6, pp. 55907-55922, 2018.
[56] J. Zhao, Y. Gao, Y. Qu, H. Yin, Y. Liu, and H. Sun, “Travel time prediction: Based on gated recurrent unit method and data fusion,” IEEE Access, vol. 6, pp. 70463-70472, 2018.
[57] K. Niu, H. Zhang, C. Cheng, C. Wang, and T. Zhou, “A novel spatiotemporal model for city-scale traffic speed prediction,” IEEE Access, vol. 7, pp. 30050-30057, 2019.
[58] J. Peral, A. Ferrández, D. Gil, E. Kauffmann, and H. Mora, “A review of the analytics techniques for an efficient management of online forums: An architecture proposal,” IEEE Access, vol. 7, pp. 12220-12240, 2019.
[59] L. Stopar, P. Skraba, D. Mladenic, and M. Grobelnik, “StreamStory: Exploring multivariate time series on multiple scales,” IEEE Trans. Vis. Comput. Graphics, vol. 25, no. 4, pp. 1788-1802, Apr. 2019.
[60] L. Guo, X. Cai, F. Hao, D. Mu, C. Fang, and L. Yang, “Exploiting finegrained co-authorship for personalized citation recommendation,” IEEE Access, vol. 5, pp. 12714-12725, 2017.
MINGCHEN FENG received the M.S. degree in software engineering from the School of Software and Microelectronics, Northwestern Polytechnical University, Xi’an, China, in 2015, where he is currently pursuing the Ph.D. degree with the School of Computer Science. He was a Visiting Researcher with the Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, U.K., in 2018. His research interests include big data analytics, data mining, machine learning, deep learning, and data visualization.
JIANGBIN ZHENG received the B.S., M.S., and Ph.D. degrees in computer science from Northwestern Polytechnical University, in 1993, 1996, and 2002, respectively.
From 2000 to 2002, he was a Research Assistant with The Hong Kong Polytechnic University, Hong Kong. From 2004 to 2005, he was a Research Assistant with The University of Sydney, Sydney, Australia. Since 2009, he has been a Professor and Ph.D. Supervisor with the School of Computer Science, Northwestern Polytechnical University. His research interests include intelligent information processing, visual computing, multimedia signal processing, big data, and soft engineering. He has published over 100 peer-reviewed journal/conference papers covering a wide range of topics in image/video analytics, pattern recognition, machine learning, and big data analytics.
JINCHANG REN received the B.E. degree in computer software, the M.Eng. degree in image processing, and the D.Eng. degree in computer vision from Northwestern Polytechnical University, Xi’an, China. He received the Ph.D. degree in electronic imaging and media communication from Bradford University, Bradford, U.K.
He is currently a Senior Lecturer with the Centre for excellence for Signal and Image Processing (CeSIP), and also the Deputy Director of the Strathclyde Hyperspectral Imaging Centre, University of Strathclyde, Glasgow, U.K. He has published over 200 peer-reviewed journal/conferences papers, and acts as an Associate Editor for six international journals, including the IEEE-JSTARS and the Journal of The Franklin Institute. His research interests include visual computing and multimedia signal processing, especially on semantic content extraction for video analysis and understanding, and more recently hyperspectral imaging.
**AMIR HUSSAIN** received the B.Eng. (Hons.) and Ph.D. degrees from the University of Strathclyde, Glasgow, U.K., in 1992 and 1997, respectively. Following postdoctoral and academic positions held at the West of Scotland, from 1996 to 1998, Dundee, from 1998 to 2000, and Stirling Universities, from 2000 to 2018, respectively. He is currently a Professor and the founding Head of the Cognitive Big Data and Cybersecurity (CogBiD) Research Lab with Edinburgh Napier University, U.K. He has coauthored more than 350 papers, including over a dozen books and around 140 journal papers. Amongst other distinguished roles, he is the General Chair for the IEEE WCCI 2020 (the world’s largest technical event in Computational Intelligence), and the Vice-Chair of the Emergent Technologies Technical Committee of the IEEE Computational Intelligence Society. He is also founding Editor-in-Chief of two leading journals: *Cognitive Computation* (Springer Nature), BMC Big Data Analytics (BioMed Central), of the Springer Book Series on Socio-Affective Computing, and *Cognitive Computation Trends*. He is also an Associate Editor for a number of prestigious journals, including: *Information Fusion* (Elsevier), *AI Review* (Springer), the IEEE Transactions on Neural Networks and Learning Systems, the *IEEE Computational Intelligence Magazine*, and the IEEE Transactions on Emerging Topics in Computational Intelligence. He has led major national, EU and internationally funded projects and supervised over 30 Ph.D. students.
**XIUXIU LI** received the Ph.D. degree with the School of Computer Science, Northwestern Polytechnical University, Xi’an, China, in 2012. She is currently a Lecturer with the Xi’an University of Technology. Her research interests include computer vision and pattern recognition.
**YUE XI** received the B.S. degree from the Qingdao University of Technology, China, in 2011, the M.S. degree from Guizhou University, China, in 2014. He is currently pursuing the dual Ph.D. degrees in computer science with the University of Technology Sydney, Australia, and Northwestern Polytechnical University, Xi’an, China. His research interests include computer vision, image processing, machine learning, and deep learning.
**QIAOYUAN LIU** received the bachelor’s degree from the Department of Computer Science and Technology, Northeast University, Shenyang, China, in 2014, and received the master’s and Ph.D. degrees from the Department of Computer Science and Technology, Northeast Normal University. She is currently with the Changchun Institute of Optics, Fine Mechanics, and Physics, Chinese Academy of Sciences. Her research interests include pattern recognition and image processing.
References (57)
A. Gandomi and M. Haider, ''Beyond the hype: Big data concepts, meth- ods, and analytics,'' Int. J. Inf. Manage., vol. 35, no. 2, pp. 137-144, Apr. 2015.
J. Zakir and T. Seymour, ''Big data analytics,'' Issues Inf. Syst., vol. 16, no. 2, pp. 81-90, 2015.
Y. Wang, L. Kung, W. Y. C. Wang, and C. G. Cegielski, ''An integrated big data analytics-enabled transformation model: Application to health care,'' Inf. Manage., vol. 55, no. 1, pp. 64-79, Jan. 2018.
U. Thongsatapornwatana, ''A survey of data mining techniques for ana- lyzing crime patterns,'' in Proc. 2nd Asian Conf. Defence Technol., Chiang Mai, Thailand, 2016, pp. 123-128.
W. Raghupathi and V. Raghupathi, ''Big data analytics in healthcare: Promise and potential,'' Health Inf. Sci. Syst., vol. 2, no. 1, pp. 1-10, Feb. 2014.
J. Archenaa and E. A. M. Anita, ''A survey of big data analytics in healthcare and government,'' Procedia Comput. Sci., vol. 50, pp. 408-413, Apr. 2015.
A. Londhe and P. Rao, ''Platforms for big data analytics: Trend towards hybrid era,'' in Proc. Int. Conf. Energy, Commun., Data Anal. Soft Comput. (ICECDS), Chennai, 2017, pp. 3235-3238.
W. Grady, H. Parker, and A. Payne, ''Agile big data analytics: AnalyticsOps for data science,'' in Proc. IEEE Int. Conf. Big Data, Boston, MA, USA, Dec. 2017, pp. 2331-2339.
R. Vatrapu, R. R. Mukkamala, A. Hussain, and B. Flesch, ''Social set analysis: A set theoretical approach to Big Data analytics,'' IEEE Access, vol. 4, pp. 2542-2571, 2016.
Y. Zhang, S. Ren, Y. Liu, and S. Si, ''A big data analytics architecture for cleaner manufacturing and maintenance processes of complex products,'' J. Cleaner Prod., vol. 142, no. 2, pp. 626-641, Jan. 2017.
E. W. Ngai, A. Gunasekaran, S. F. Wamba, S. Akter, and R. Dubey, ''Big data analytics in electronic markets,'' Electron. Markets, vol. 27, no. 3, pp. 243-245, Aug. 2017.
Y.-Y. Liu, F.-M. Tseng, and Y.-H. Tseng, ''Big Data analytics for fore- casting tourism destination arrivals with the applied Vector Autoregres- sion model,'' Technol. Forecasting Social Change, vol. 130, pp. 123-134, May 2018.
D. Fisher, M. Czerwinski, S. Drucker, and R. DeLine, ''Interactions with big data analytics,'' Interactions, vol. 19, no. 3, pp. 50-59, Jun. 2012.
C.-H. Yu, M. Morabito, P. Chen, and W. Ding, ''Hierarchical spatio- temporal pattern discovery and predictive modeling,'' IEEE Trans. Knowl. Data Eng., vol. 28, no. 4, pp. 979-993, Apr. 2016.
S. Musa, ''Smart cities-A road map for development,'' IEEE Potentials, vol. 37, no. 2, pp. 19-23, Mar./Apr. 2018.
S. Wang, X. Wang, P. Ye, Y. Yuan, S. Liu, and F.-Y. Wang, ''Parallel crime scene analysis based on ACP approach,'' IEEE Trans. Computat. Social Syst., vol. 5, no. 1, pp. 244-255, Mar. 2018.
S. Yadav, A. Yadav, R. Vishwakarma, N. Yadav, and M. Timbadia, ''Crime pattern detection, analysis & prediction,'' in Proc. IEEE Int. Conf. Electron., Commun. Aerosp. Technol., Coimbatore, India, Apr. 2017, pp. 225-230.
N. Baloian, C. E. Bassaletti, M. Fernández, O. Figueroa, P. Fuentes, R. Manasevich, M. Orchard, S. Peñafiel, J. A. Pino, and M. Vergara, ''Crime prediction using patterns and context,'' in Proc. 21st IEEE Int. Conf. Comput. Supported Cooperat. Work Design, Wellington, New Zealand, Apr. 2017, pp. 2-9.
X. Zhao and J. Tang, ''Exploring transfer learning for crime prediction,'' in Proc. IEEE Int. Conf. Data Mining Workshops, New Orleans, LA, USA, Nov. 2017, pp. 1158-1159.
S. Wu, J. Male, and E. Dragut, ''Spatial-temporal campus crime pattern mining from historical alert messages,'' in Proc. Int. Conf. Comput., Netw. Commun., Santa Clara, CA, USA, 2017, pp. 778-782.
K. R. S. Vineeth, T. Pradhan, and A. Pandey, ''A novel approach for intelligent crime pattern discovery and prediction,'' in Proc. Int. Conf. Adv. Commun. Control Comput. Technol., Ramanathapuram, India, 2016, pp. 531-538.
C. R. Rodríguez, D. M. Gomez, and M. A. M. Rey, ''Forecasting time series from clustering by a memetic differential fuzzy approach: An application to crime prediction,'' in Proc. IEEE Symp. Ser. Comput. Intell., Honolulu, HI, USA, Nov./Dec. 2017, pp. 1-8.
A. Joshi, A. S. Sabitha, and T. Choudhury, ''Crime analysis using K-means clustering,'' in Proc. 3rd Int. Conf. Comput. Intell. Netw., Odisha, India, 2017, pp. 33-39.
N. M. M. Noor, W. M. F. W. Nawawi, and A. F. Ghazali, ''Supporting decision making in situational crime prevention using fuzzy association rule,'' in Proc. Int. Conf. Comput., Control, Informat. Appl. (IC3INA), Jakarta, Indonesia, 2013, pp. 225-229.
M. Wang, F. Zhang, H. Guan, X. Li, G. Chen, T. Li, and X. Xi, ''Hybrid neural network mixed with random forests and Perlin noise,'' in Proc. 2nd IEEE Int. Conf. Comput. Commun. (ICCC), Chengdu, China, Oct. 2016, pp. 1937-1941.
Z. Wang, D. Zhang, M. Sun, J. Jiang, and J. Ren, ''A deep-learning based feature hybrid framework for spatiotemporal saliency detection inside videos,'' Neurocomputing, vol. 287, pp. 68-83, Apr. 2018.
J. Ren and J. Jiang, ''Hierarchical modeling and adaptive clustering for real-time summarization of rush videos,'' IEEE Trans. Multimedia, vol. 11, no. 5, pp. 906-917, Aug. 2009.
J. Han, G. Cheng, L. Guo, J. Ren, and D. Zhang, ''Object detection in optical remote sensing images based on weakly supervised learning and high-level feature learning,'' IEEE Trans. Geosci. Remote Sens., vol. 53, no. 6, pp. 3325-3337, Jun. 2014.
Y. Yan, J. Ren, H. Zhao, G. Sun, Z. Wang, J. Zheng, S. Marshall, and J. Soraghan, ''Cognitive fusion of thermal and visible imagery for effective detection and tracking of pedestrians in videos,'' Cognit. Comput., vol. 10, no. 1, pp. 94-104, Feb. 2018.
Y. Yan, J. Ren, G. Sun, H. Zhao, J. Han, X. Li, S. Marshall, and J. Zhan, ''Unsupervised image saliency detection with gestalt-laws guided opti- mization and visual attention based refinement,'' Pattern Recognit., vol. 79, pp. 65-78, Jul. 2018.
Z. Zhao, S. Tu, J. Shi, and R. Rao, ''Time-weighted LSTM model with redefined labeling for stock trend prediction,'' in Proc. IEEE 29th Int. Conf. Tools Artif. Intell. (ICTAI), Boston, MA, USA, Nov. 2017, pp. 1210-1217.
J. Dai, G. Sheng, X. Jiang, and H. Song, ''LSTM networks for the trend prediction of gases dissolved in power transformer insulation oil,'' in Proc. 12th Int. Conf. Properties Appl. Dielectr. Mater., Xi'an, China, 2018, pp. 666-669.
H. Kashef, M. Abdel-Nasser, and K. Mahmoud, ''Power loss estimation in smart grids using a neural network model,'' in Proc. Int. Conf. Innov. Trends Comput. Eng. (ITCE), Aswan, Egypt, 2018, pp. 258-263.
H. Hassani, X. Huang, M. Ghodsi, and E. S. Silva, ''A review of data mining applications in crime,'' Stat. Anal. Data Mining, ASA Data Sci. J., vol. 9, no. 3, pp. 139-154, Apr. 2016.
Z. Jia, C. Shen, Y. Chen, T. Yu, X. Guan, and X. Yi, ''Big-data analysis of multi-source logs for anomaly detection on network-based system,'' in Proc. 13th IEEE Conf. Autom. Sci. Eng. (CASE), Xi'an, China, Aug. 2017, pp. 1136-1141.
A. Agresti, An Introduction to Categorical Data Analysis, 3rd ed. Hoboken, NJ, USA: Wiley, 2018.
M. Huda, A. Maseleno, M. Siregar, R. Ahmad, K. A. Jasmi, N. H. N. Muhamad, and P. Atmotiyoso, ''Big data emerging technology: Insights into innovative environment for online learning resources,'' Int. J. Emerg. Technol. Learn., vol. 13, no. 1, pp. 23-36, Jan. 2018.
V. Viswanathan and S. R. Viswanathan, Data Analysis Cookbook, 2nd ed. Birmingham, U.K.: Packt Publishing Ltd, 2015, pp. 30-39.
R. Boix, B. De Miguel-Molina, and J. L. Hervás-Oliver, ''Micro- geographies of creative industries clusters in Europe: From hot spots to assemblages,'' Papers Regional Sci., vol. 94, no. 4, pp. 753-772, Jan. 2015.
Z. Krizan and A. D. Herlache, ''Sleep disruption and aggression: Impli- cations for violence and its prevention,'' Psychol. Violence, vol. 6, no. 4, pp. 542-552, Oct. 2016.
I. N. da Silva and D. H. Spatti, ''Introduction,'' in Artificial Neural Net- works. Cham, Switzerland: Springer, 2017, pp. 3-19.
R. J. Hyndman and G. Athanasopoulos, Forecasting: Principles and Prac- tice, 2nd ed. OTexts, 2018, pp. 333-339.
K. Greff, R. K. Srivastava, J. Koutnìk, B. R. Steunebrink, and J. Schmidhuber, ''LSTM: A search space odyssey,'' IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 10, pp. 2222-2232, Oct. 2017.
J. Lai, B. Chen, S. Tong, K. Yu, and T. Tan, ''Phone-aware LSTM-RNN for voice conversion,'' in Proc. IEEE 13th Int. Conf. Signal Process. (ICSP), Chengdu, China, Nov. 2016, pp. 177-182.
P. Schober, C. Boer, and L. A. Schwarte, ''Correlation coefficients: Appro- priate use and interpretation,'' Anesthesia Analgesia, vol. 126, no. 5, pp. 1763-1768, May 2018.
American Cities With the Worst Income Inequality. [Online]. Avail- able: https://www.cbsnews.com/media/9-american-cities-with-the-worst- income-inequality/
M. Lofstrom and S. Raphael, ''Crime, the Criminal Justice System, and Socioeconomic Inequality,'' J. Econ. Perspect., vol. 30, no. 2, pp. 26-103, Mar. 2016.
L. Zhou, J. Wang, A. V. Vasilakos, and S. Pan, ''Machine learning on big data: Opportunities and challenges,'' Neurocomputing, vol. 237, pp. 350-361, May 2017.
M. Injadat, F. Salo, and A. B. Nassif, ''Data mining techniques in social media: A survey,'' Neurocomputing, vol. 214, pp. 654-670, Nov. 2016.
Y. Gao, Y. Xia, J. Qiao, and S. Wu, ''Solution to gang crime based on graph theory and analytical hierarchy process,'' Neurocomputing, vol. 140, pp. 121-127, Sep. 2014.
S. Zheng, S. Chen, L. Yang, J. Zhu, Z. Luo, J. Hu, and X. Yang, ''Big data processing architecture for radio signals empowered by deep learning: Concept, experiment, applications and challenges,'' IEEE Access, vol. 6, pp. 55907-55922, 2018.
J. Zhao, Y. Gao, Y. Qu, H. Yin, Y. Liu, and H. Sun, ''Travel time prediction: Based on gated recurrent unit method and data fusion,'' IEEE Access, vol. 6, pp. 70463-70472, 2018.
K. Niu, H. Zhang, C. Cheng, C. Wang, and T. Zhou, ''A novel spatio- temporal model for city-scale traffic speed prediction,'' IEEE Access, vol. 7, pp. 30050-30057, 2019.
J. Peral, A. Ferrández, D. Gil, E. Kauffmann, and H. Mora, ''A review of the analytics techniques for an efficient management of online forums: An architecture proposal,'' IEEE Access, vol. 7, pp. 12220-12240, 2019.
L. Stopar, P. Skraba, D. Mladenic, and M. Grobelnik, ''StreamStory: Exploring multivariate time series on multiple scales,'' IEEE Trans. Vis. Comput. Graphics, vol. 25, no. 4, pp. 1788-1802, Apr. 2019.
L. Guo, X. Cai, F. Hao, D. Mu, C. Fang, and L. Yang, ''Exploiting fine- grained co-authorship for personalized citation recommendation,'' IEEE Access, vol. 5, pp. 12714-12725, 2017.
MINGCHEN FENG received the M.S. degree in software engineering from the School of Software and Microelectronics, Northwestern Polytechni- cal University, Xi'an, China, in 2015, where he is currently pursuing the Ph.D. degree with the School of Computer Science. He was a Visiting Researcher with the Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, U.K., in 2018. His research interests include big data analytics, data mining, machine learning, deep learning, and data visualization. JIANGBIN ZHENG received the B.S., M.S., and Ph.D. degrees in computer science from North- western Polytechnical University, in 1993, 1996, and 2002, respectively. From 2000 to 2002, he was a Research Assis- tant with The Hong Kong Polytechnic University, Hong Kong. From 2004 to 2005, he was a Research Assistant with The University of Sydney, Sydney, Australia. Since 2009, he has been a Professor and Ph.D. Supervisor with the School of Com- puter Science, Northwestern Polytechnical University. His research interests include intelligent information processing, visual computing, multimedia signal processing, big data, and soft engineering. He has published over 100 peer-reviewed journal/conference papers covering a wide range of topics in image/video analytics, pattern recognition, machine learning, and big data analytics. JINCHANG REN received the B.E. degree in computer software, the M.Eng. degree in image processing, and the D.Eng. degree in computer vision from Northwestern Polytechnical Univer- sity, Xi'an, China. He received the Ph.D. degree in electronic imaging and media communication from Bradford University, Bradford, U.K. He is currently a Senior Lecturer with the Centre for excellence for Signal and Image Pro- cessing (CeSIP), and also the Deputy Director of the Strathclyde Hyperspectral Imaging Centre, University of Strathclyde, Glasgow, U.K. He has published over 200 peer-reviewed journal/conferences papers, and acts as an Associate Editor for six international journals, includ- ing the IEEE-JSTARS and the Journal of The Franklin Institute. His research interests include visual computing and multimedia signal processing, espe- cially on semantic content extraction for video analysis and understanding, and more recently hyperspectral imaging.
Big Data Analytics (BDA) is a structured approach for examining and recognizing different patterns relations, and trends within a huge amount of information. The BDA is applied to criminal data where preliminary data analysis is conducted for visualization and trends prediction. Various data mining and deep learning techniques are used. The predictive results show the Keras stateful LSTM perform better than neural network models, where the optimal size of training data are found to be three years. The system shows the graphs as end result using the LSTM model which shows the time series graph. These end result will be helpful for police departments and law enforcement organization to keep track of activities, predict the chance of incidents, effectively locate resources and enhance the decision-making process.
Big Data Analytics (BDA) is a structured approach for examining and recognizing different patterns relations, and trends within a huge amount of information. The BDA is applied to criminal data where preliminary data analysis is conducted for visualization and trends prediction. Various data mining and deep learning techniques are used. The predictive results show the Keras stateful LSTM perform better than neural network models, where the optimal size of training data are found to be three years. The system shows the graphs as end result using the LSTM model which shows the time series graph. These end result will be helpful for police departments and law enforcement organization to keep track of activities, predict the chance of incidents, effectively locate resources and enhance the decision-making process.
Crime and violation are the threat to justice and meant to be controlled. Accurate crime prediction and future forecasting trends can assist to enhance metropolitan safety computationally. The limited ability of humans to process complex information from big data hinders the early and accurate prediction and forecasting of crime. The accurate estimation of the crime rate, types and hot spots from past patterns creates many computational challenges and opportunities. Despite considerable research efforts, yet there is a need to have a better predictive algorithm, which direct police patrols toward criminal activities. Previous studies are lacking to achieve crime forecasting and prediction accuracy based on learning models. Therefore, this study applied different machine learning algorithms, namely, the logistic regression, support vector machine (SVM), Naïve Bayes, k-nearest neighbors (KNN), decision tree, multilayer perceptron (MLP), random forest, and eXtreme Gradient Boosting (XGBoost), and time series analysis by long-short term memory (LSTM) and autoregressive integrated moving average (ARIMA) model to better fit the crime data. The performance of LSTM for time series analysis was reasonably adequate in order of magnitude of root mean square error (RMSE) and mean absolute error (MAE), on both data sets. Exploratory data analysis predicts more than 35 crime types and suggests a yearly decline in Chicago crime rate, and a slight increase in Los Angeles crime rate; with fewer crimes occurred in February as compared to other months. The overall crime rate in Chicago will continue to increase moderately in the future, with a probable decline in future years. The Los Angeles crime rate and crimes sharply declined, as suggested by the ARIMA model. Moreover, crime forecasting results were further identified in the main regions for both cities. Overall, these results provide early identification of crime, hot spots with higher crime rate, and future trends with improved predictive accuracy than with other methods and are useful for directing police practice and strategies. INDEX TERMS LSTM and ARIMA based crime prediction, analysis and forecast.
International Journal for Research in Applied Science & Engineering Technology, 2021
Crimes are common social problems that can even affect the quality of life, even the economic growth of a country. Big Data Analytics (BDA) is used for analyzing and identifying different crime patterns, their relations, and the trends within a large amount of crime data. Here, BDA is applied to criminal data in which, data analysis is conducted for the purpose of visualization. Big data analytics and visualization techniques were utilized to analyze crime big data within the different parts of India. Here, we have taken all the states of Indian for analysis, visualization and prediction. The series of operations performed are data collection, data pre-processing, visualization and trends prediction, in which LSTM model is used. The data includes different cases of crimes with in different years and the crimes such as crime against women and children in which, kidnap, murder, rape. The predictive results show that the LSTM perform better than neural network models. Hence, the generated outcomes will benefit for police and law enforcement organizations to clearly understand crime issues and that will help them to track activities, predict the similar incidents, and optimize the decision making process.
Ceza Hukuku ve Kriminoloji Dergisi-Journal of Penal Law and Criminology, 2020
Along with the rapid change of information technologies and the widespread use of the internet over time, data stacks with ample diversity and quite large volumes has emerged. The use of data mining is increasing day by day due to the huge part it plays in the acquisition of information by making necessary analyses especially within a large amount of data. Obtaining accurate information is a key factor affecting decision-making processes. Crime data is included among the application areas of data mining, being one of the data stacks which is rapidly growing with each passing day. Crime events constitute unwanted behaviour in every society. For this reason, it is important to extract meaningful information from crime data. This article aims to provide an overview of the use of data mining and machine learning in crime data and to give a new perspective on the decision-making processes by presenting examples of the use of data mining for a crime. For this purpose, some examples of data mining and machine learning in crime and security areas are presented by giving a conceptual framework in the subject of big data, data mining, machine learning, and deep learning along with task types, processes, and methods.
Crime influences people in many ways. Prior studies have shown the relationship between time and crime incidence behavior. This research attempts to determine and examine the relationship between time, crime incidences types and locations by using one of the neural network models for time series data that is, Long Short-Term Memory network. The collected data is pre-processed, analyzed and tested using Long Short-Term Memory recurrent neural network model. R-square score is also used to test the accuracy. The study results show that applying Long Short-Term Memory Recurrent Neural Network (LSTM RNN) enables to come up with more accurate prediction about crime incidence occurrence with respect to time. Predicting crimes accurately helps to improve crime prevention and decision and advance the justice system.
International journal of advanced research in computer science, 2024
The goal of crime prediction is to help law enforcement authorities to prevent crimes before they happen by identifying and reducing potential future crimes. In present days, crime cases are rapidly occur so it is a challenging task to accurately predict and classify the future crimes, as criminal patterns are adaptable and constantly evolve, making it a challenging task. Sentiment analysis is essential for deciphering the feelings conveyed in text since social media has grown into a popular venue for individuals to communicate their thoughts, feelings, views, and comments. This analysis is particularly valuable for making informed decisions in business, politics and government agencies especially to identify crimes. However, it faces the difficulties like lexical diversity and dataset imbalance. In order to make better decisions and maybe lower the risk of predicting the suspects in a crime, early crime prediction has lately made use of Artificial Intelligence (AI) models like Deep Learning (DL) and Machine Learning (ML). When applied to crime data, ML and DL models may assist pinpoint possible crime hotspots and foretell when crimes may occur. Crime prediction and categorization using social media posts is covered in depth in this paper, which offers a comprehensive overview of several ML and DL frameworks. Initially, different ML and DL based crime prediction models designed by many researchers are examined in brief. The next step is to do a comparison research to learn about the shortcomings of those frameworks and provide an alternative method for effectively predicting crimes based on social media postings.
Cyber crime is the new age concern which has originated due to the increasing growth and advancements in computer systems and networks, which has given rise to a new environment for criminal activities. Regular occurrences of the crime incidents pose a serious threat to the safety and well-being of the society. This paper offers a extensive overview of crime occurrences and their relevance in literature by combining approaches. The paper initially reviews and identifies features of crime incidents proposing a combinatorial incident description schema. The proposed schema intends to raise a opportunity to systematically combine different elements, or crime characteristics. Additionally, a comprehensive list of crime-related offences is put forward. This enables a thorough understanding of the repeating and underlying criminal activities. This paper intends to be of use to specialists and law enforcement officers in discovering patterns and trends for making forecasts, finding relatio...
International Journal of Research in Engineering and Technology, 2015
Crime reduction and prevention challenges in today's world are becoming increasingly complex and are in need of a new technique that can handle the vast amount of information that is being generated. Traditional police capabilities mostly fall short in depicting the original division of criminal activities, thus contribute less in the suitable allocation of police services. In this paper methods are described for crime event forecasting, using Hadoop, by studying the geographical areas which are at greater risk and outside the traditional policing limits. The developed method makes the use of a geographical crime mapping algorithm to identify areas that have relatively high cases of crime. The term used for such places is hot spots. The identified hotspot clusters give valuable data that can be used to train the artificial neural network which further can model the trends of crime. The artificial neural network specification and estimation approach is enhanced by processing capability of Hadoop platform.
Crime detection is one of the most important research applications in machine learning. Identifying and reducing crime rates is crucial to developing a healthy society. Big Data techniques are applied to collect and analyse data: determine the required features and prime attributes that cause the emergence of crime hotspots. The traditional crime detection and machine learning-based algorithms lack the ability to generate key prime attributes from the crime dataset, hence most often fail to predict crime patterns successfully. This paper is aimed at extracting the prime attributes such as time zones, crime probability, and crime hotspots and performing vulnerability analysis to increase the accuracy of the subject machine learning algorithm. We implemented our proposed methodology using two standard datasets. Results show that the proposed feature generation method increased the performance of machine learning models. The highest accuracy of 97.5% was obtained when the proposed meth...