Study on crime trend analysis and countermeasures based on law enforcement databases

With the development of society, crime has become increasingly intelligent, with uncertain targets and diversified methods. These characteristics make it harder for public security organs to solve cases under the new circumstances, and a single case-solving mode that relies on traditional investigation is, to some extent, no longer able to meet the needs of the times [1-4]. Public security organs therefore need to study the development trends of criminal offenses: rather than focusing only on detecting cases that have already occurred, they should also predict offenses that may occur in the future and deploy police forces in advance to prevent them [5-8].

Crime trend analysis is routine work for Chinese public security organs, and a set of traditional working methods has formed around it. These methods generally grasp the overall crime situation from a macro perspective and infer and estimate possible future crimes through a comprehensive analysis of the factors that may affect them [9-12]. A law enforcement database is a computer software system designed to support the registration, query, statistics and analysis of cases. Through informatization, it enables public security bureaus to handle cases more efficiently and accurately, and strengthens their ability to maintain social stability and public security [13-16]. Such a system supports rapid registration, accurate query, efficient statistics and in-depth analysis of case information, providing a scientific basis for the decisions and strategies of public security departments [17-19]. Overcoming the limitations of traditional crime trend analysis, making full use of law enforcement databases for comprehensive processing and analysis, and proposing countermeasures based on the resulting crime trends are therefore urgent problems [20-23].

In this paper, we use the time series decomposition method to analyze periodic and trend changes in time series, and the nearest neighbor index method to assess the degree of spatial aggregation of crime locations. A multilayer perceptron is used to enhance node features, the feature representation of each node to be predicted is obtained through two-step convolution, and binary cross entropy is used as the objective function; together these form the multi-view synchronous convolution algorithm. Based on the law enforcement database, the case category with the largest amount of data in City B (theft) is selected as the research object. Crime events are visualized with a two-dimensional color matrix to reveal their day-of-week and hour-of-day characteristics and high-incidence periods. Cases are categorized and aggregated at multiple time scales to explore long-term trends and patterns of criminal behavior. The standardized crime rate of each spatial unit is calculated to measure crime risk in the spatial dimension, and the local Moran index is used to study the specific clustering characteristics of theft cases. The multi-view synchronous convolution algorithm is then used to fuse spatio-temporal features and obtain the distribution of crimes across different time periods, and the model's fit on the training and test sets is examined to evaluate its performance. Finally, countermeasures for crime prevention are proposed based on the model output.

2

Time-space fusion analysis technique based on multi-view synchronized convolution algorithm

2.1

Spatio-temporal distribution pattern recognition

2.1.1

Time series decomposition

Time series decomposition is a method that decomposes a time series into trend, seasonal and random (stochastic) components. It can be used to analyze cyclical and trend changes in a time series and is widely used in practical forecasting.

The method assumes that a time series consists of trend, seasonal and random components. The trend component represents the long-term upward or downward movement of the series, the seasonal component represents fluctuations that repeat with a fixed period, and the random component represents irregular noise.

The time series decomposition method can be categorized into two types: additive model and multiplicative model. The additive model is applicable when the magnitude of fluctuations in the time series is relatively stable, while the multiplicative model is applicable when the magnitude of fluctuations increases over time.
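For reference, the two standard forms can be written as follows, where $Y_t$ is the observed series at time $t$, $T_t$ the trend, $S_t$ the seasonal and $R_t$ the random component (these symbols are introduced here for illustration and are not used elsewhere in the paper):

```latex
\begin{aligned}
\text{Additive:} \quad & Y_t = T_t + S_t + R_t \\
\text{Multiplicative:} \quad & Y_t = T_t \times S_t \times R_t
\end{aligned}
```

Taking logarithms turns the multiplicative form into an additive one, $\log Y_t = \log T_t + \log S_t + \log R_t$, which is a common way to handle series whose fluctuations grow with their level.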

The steps of the time series decomposition method are as follows:

1) Smooth the time series, for example with a moving average or exponential smoothing.

2) Determine the seasonal component from the deviations of the series from the smoothed series, for example using seasonal indices or regression analysis.

3) Analyze the trend of the seasonally adjusted series, for example using linear regression or curve fitting.

4) Analyze the random error that remains after the trend and seasonal components are removed, for example by calculating residuals or performing an analysis of variance (ANOVA).

The time series decomposition method makes it possible to obtain specific values for the trend, seasonal, and stochastic components and to predict future values based on these components.
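The steps above can be illustrated with a minimal classical (moving-average) additive decomposition. This is a sketch under stated assumptions, not necessarily the exact procedure used in this study: it assumes an evenly spaced count series (for example daily or monthly case counts) and a known seasonal period; the function name and the `period` parameter are illustrative.

```python
import numpy as np

def classical_additive_decompose(y, period):
    """Classical additive decomposition y = trend + seasonal + residual.

    Assumes an evenly spaced series and a known seasonal period. Trend values
    at both ends are NaN because the centred moving average is undefined there.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Smoothing: centred moving average of length `period`
    # (a 2 x period average when the period is even).
    if period % 2 == 1:
        weights = np.ones(period) / period
    else:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Seasonal indices: mean detrended value at each phase, centred to sum to zero.
    detrended = y - trend
    phase_means = np.array([np.nanmean(detrended[k::period]) for k in range(period)])
    phase_means -= phase_means.mean()
    seasonal = np.resize(phase_means, n)  # seasonal[i] = phase_means[i % period]
    # Random component: what remains after removing trend and season.
    residual = y - trend - seasonal
    return trend, seasonal, residual
```

For a series that is exactly a linear trend plus a zero-sum periodic pattern, the centred average recovers the trend exactly, the seasonal indices recover the pattern, and the residual is zero wherever the trend is defined.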

2.1.2

Nearest neighbor index method

The Nearest Neighbor Index (NNI) method is a spatial statistical method for analyzing point patterns and assessing the degree of aggregation of a point set. It computes the ratio of the observed mean distance from each point to its nearest neighbor to the mean nearest neighbor distance expected if the same number of points were randomly distributed over the study area; this ratio is called the Nearest Neighbor Index.

If the index is less than 1, the point set shows an aggregated (clustered) distribution; if it equals 1, the points are randomly distributed; and if it is greater than 1, the point set shows a dispersed distribution. The method is simple to understand, easy to calculate, and applicable to different types of point patterns. However, it is sensitive to the shape and density of the point pattern and may not be suitable for large-scale datasets. In addition, an accurate index requires careful treatment of the distance measure and of boundary effects in the point pattern.

The specific procedure of the nearest neighbor index method is as follows:

1) Calculate the mean distance between each case point and its nearest neighboring case point:

(1) $\bar{d}_{\min} = \frac{1}{n} \sum_{i=1}^{n} d_{\min}(b_{i})$

where n is the number of case points, b_i is the i-th case point in the study region, and d_min(b_i) is the distance from b_i to its nearest neighboring case point.

2) Calculate the expected nearest neighbor distance E(d) for a random distribution of the same number of points:

(2) $E(d) = \frac{1}{2\sqrt{n / A}}$

where A is the area of the study region.

3) Calculate the Nearest Neighbor Index:

(3) $NNI = \frac{\bar{d}_{\min}}{E(d)}$

The resulting NNI is interpreted as described above: a value below 1 indicates an aggregated distribution, a value of 1 a random distribution, and a value above 1 a dispersed distribution.
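Equations (1) to (3) can be sketched directly in Python. This is a brute-force illustration, assuming the case locations are already in projected planar coordinates (for example metres) and that the study-area size A is supplied by the caller; it builds the full pairwise distance matrix, so a spatial index would be needed for very large point sets.

```python
import numpy as np

def nearest_neighbor_index(points, area):
    """Nearest Neighbor Index for an (n, 2) array of projected coordinates."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Pairwise Euclidean distances (O(n^2) memory; fine for a sketch).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)               # ignore each point's distance to itself
    d_min_mean = dist.min(axis=1).mean()         # Eq. (1)
    expected = 1.0 / (2.0 * np.sqrt(n / area))   # Eq. (2)
    return d_min_mean / expected                 # Eq. (3)
```

As a check, a regular 10 x 10 grid with unit spacing in an area of 100 has every nearest neighbor distance equal to 1 and E(d) = 0.5, so NNI = 2 (dispersed), while 50 points packed into a 0.5 x 0.5 corner of the same area give an NNI far below 1 (aggregated).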

2.2

Predicting crime trends based on multi-view simultaneous convolution algorithms

To extract and fuse features from three perspectives (time, space and crime type) simultaneously, this paper proposes a multi-view synchronous convolution algorithm. The nodes to be predicted have no features of their own, so the algorithm first initializes their features from their temporal and spatial neighbors and then uses these initialized features for prediction. The algorithm consists of three parts: (1) feature enhancement; (2) neighborhood convolution; and (3) target prediction.

2.2.1

Feature enhancement

To enhance the representation of node features, we use a multilayer perceptron (MLP). The node features consist of three parts: the crime pattern F_pattern, local spatial features F_local and time period features F_time. Each part is enhanced by its own MLP, and the enhanced features are concatenated and passed through a further MLP that reduces their dimensionality to give the final node features:

(4) $X = MLP\left(MLP(F_{pattern}) \,\|\, MLP(F_{local}) \,\|\, MLP(F_{time})\right)$

where $\|$ denotes concatenation of the enhanced feature vectors.

A schematic of the feature enhancement process is shown in Figure 1.
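The data flow of Eq. (4) can be sketched as a forward pass in NumPy. The sketch assumes ReLU activations, reads the separator in Eq. (4) as concatenation, and uses hypothetical, randomly initialised (untrained) weights and feature widths; the helper names `init_mlp`, `mlp_forward` and `enhance_features` are illustrative only.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random weights for an MLP with layer widths sizes[0] -> ... -> sizes[-1]."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(i), size=(i, o)), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    """ReLU MLP forward pass."""
    for w, b in params:
        x = np.maximum(x @ w + b, 0.0)
    return x

def enhance_features(f_pattern, f_local, f_time, branch_params, fuse_params):
    """Eq. (4): enhance each feature group, concatenate, then reduce dimensionality."""
    parts = [mlp_forward(p, f) for p, f in zip(branch_params, (f_pattern, f_local, f_time))]
    return mlp_forward(fuse_params, np.concatenate(parts, axis=1))

# Hypothetical widths: 8 pattern, 6 local and 4 time features for 5 nodes.
rng = np.random.default_rng(0)
branches = [init_mlp([d, 16], rng) for d in (8, 6, 4)]  # each branch -> 16 dims
fuse = init_mlp([48, 32], rng)                          # 3 x 16 concatenated -> 32 dims
X = enhance_features(rng.normal(size=(5, 8)), rng.normal(size=(5, 6)),
                     rng.normal(size=(5, 4)), branches, fuse)
```

Here `X` has one 32-dimensional row per node; in a real model the weights would be learned jointly with the rest of the network.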

2.2.2

Neighborhood Convolution

The crime patterns of a jurisdiction are closely related to those of neighboring jurisdictions, so we obtain the feature representation of each node to be predicted by aggregating information from its neighbors. To strengthen each node's perception of time and space, we extend the aggregation to second-order neighbors.

Because the influence of different neighboring jurisdictions may differ, we construct edge features in the spatio-temporal fusion topology graph and, in the subsequent convolution, replace the adjacency matrix A of the graph with the edge feature matrix W.

In short, the initial features of a node to be predicted are obtained by summing the features of its first-order neighbors, weighted by the corresponding edge features. Before this, each first-order neighbor aggregates the information of its own neighbors (i.e., the second-order neighbors of the node to be predicted) in the same way.

The convolution proceeds in two stages: the first stage obtains the feature representations of the first-order neighbors of the node to be predicted, and the second stage obtains the features of the node to be predicted itself.

The first stage of the convolution is shown in Figure 2.

We denote the set of nodes to be predicted as R₀. For each node r_i ∈ R₀, we first select its first-order neighbors. These neighbors fall into two categories: featureless nodes (other nodes to be predicted), and nodes that have features in time or space. Featureless nodes contribute nothing to the subsequent information aggregation, so we temporarily exclude them from the spatio-temporal fusion topology so that they do not participate in the convolution. We denote the set of neighbors retained after this screening as R₁.

Next, for each node r_i ∈ R₁, we select its neighbors in the same way and sum their features weighted by the corresponding edge features, giving the neighbor aggregation result F_nbr:

(5) $F_{nbr} = \mathrm{Mask}(W)\, F_{self}$

where Mask(·) removes the temporarily excluded neighbors from the spatio-temporal fusion graph, and F_self is the matrix of the nodes' own features.

To preserve the own features of r_i (denoted F_self), we use a residual structure in which F_self is added to the features F_nbr aggregated from the neighboring nodes, giving the feature representation after first-order neighbor convolution:

(6) $F = F_{self} + F_{nbr}$
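The first stage, Eqs. (5) and (6), can be sketched in NumPy. The sketch makes two assumptions not stated in the paper: `W[i, j]` holds the weight of the edge carrying information from node j to node i, and Mask(·) is realised by zeroing the columns of the featureless (to-be-predicted) nodes so that they contribute nothing; `mask_edges` and `first_stage` are hypothetical names.

```python
import numpy as np

def mask_edges(W, featureless):
    """Mask(W): drop edges coming from featureless nodes so they contribute nothing."""
    W = np.array(W, dtype=float)   # copy, leave the caller's matrix untouched
    W[:, featureless] = 0.0
    return W

def first_stage(W, X, featureless):
    """Eqs. (5)-(6): masked neighbor aggregation plus a residual connection.

    W: (n, n) edge-feature matrix, W[i, j] = influence of node j on node i.
    X: (n, d) node features; rows of featureless nodes are never used as sources.
    """
    F_self = np.asarray(X, dtype=float)
    F_nbr = mask_edges(W, featureless) @ F_self   # Eq. (5)
    return F_self + F_nbr                          # Eq. (6)
```

For example, with node 0 to be predicted and nodes 1 and 2 in R₁, the rows of the result for nodes 1 and 2 combine their own features with those of their featured neighbors only; whatever values sit in row 0 of `X` never leak into them.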

In the second stage, the convolution obtains the features of the nodes to be predicted; this stage is shown in Fig. 3.

At this point, the effective first-order neighbors of the node to be predicted (R₁) have aggregated the information of their own neighbors (i.e., the second-order neighbors of the node to be predicted). The initial features of the node to be predicted are obtained by summing the features of its first-order neighbors weighted by the edge features. To improve the learning capacity of this stage, the convolution result is further transformed by a fully connected neural network layer.

This stage is formulated as:

(7) $X = \mathrm{Act}(W F W_{0} + b_{0})$

where W₀ is the weight matrix of the fully connected layer, b₀ is the bias vector, and Act() is the activation function.
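A matching sketch of Eq. (7), again assuming that edges from featureless nodes are masked so that each node to be predicted is built only from its effective first-order neighbors (R₁). The default activation `np.tanh` and the name `second_stage` are assumptions for illustration; `W0` and `b0` would normally be learned.

```python
import numpy as np

def second_stage(W, F, W0, b0, featureless, act=np.tanh):
    """Eq. (7): X = Act(W F W0 + b0), returned for the nodes to be predicted only.

    W: (n, n) edge-feature matrix, F: (n, d) output of the first stage,
    W0: (d, k) fully connected weights, b0: (k,) bias vector.
    """
    W_masked = np.array(W, dtype=float)
    W_masked[:, featureless] = 0.0        # only featured neighbors (R1) contribute
    H = act(W_masked @ F @ W0 + b0)
    return H[featureless]                 # rows of the to-be-predicted nodes
```

With the identity as activation, a node to be predicted that is linked with weight 0.5 to two featured neighbors with features `[2, 0]` and `[0, 4]` receives the initial feature `[1, 2]`.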

2.2.3

Target prediction

As mentioned above, the nodes to be predicted initially have no feature representation, because their original features are used as labels for model training. After the two-stage convolution, these nodes have initial feature representations, which are then used for training. We apply a two-layer neural network to these features, and the prediction results are given by:

(8) $Y = \mathrm{Act}\left(\mathrm{Act}\left(\mathrm{getNodes}(X) W_{1} + b_{1}\right) W_{2} + b_{2}\right)$

Here, getNodes(·) extracts the nodes to be predicted from the spatio-temporal fusion graph, W₁ and W₂ are the weight matrices of the fully connected layers, b₁ and b₂ are the bias vectors, and Act(·) is the activation function.
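Eq. (8) can be sketched as a two-layer head. The choice of ReLU for the inner activation and a sigmoid for the outer one is an assumption (the paper only writes Act), made so that each output is a probability for one crime type in one jurisdiction; `getNodes` is modelled as simple row selection and `predict` is a hypothetical name.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, node_ids, W1, b1, W2, b2):
    """Eq. (8): Y = Act(Act(getNodes(X) W1 + b1) W2 + b2).

    X[node_ids] plays the role of getNodes(X); each column of the output is
    the predicted probability that one crime type occurs in that jurisdiction.
    """
    H = relu(X[node_ids] @ W1 + b1)
    return sigmoid(H @ W2 + b2)
```

With all weights and biases at zero, every output equals sigmoid(0) = 0.5, which is a quick sanity check on the shapes.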

The model predicts whether each type of criminal activity occurs in each jurisdiction, which is a binary classification task, so we use binary cross entropy, a commonly used loss function for binary classification:

(9) $L_{c} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_{i} \log p_{i} + (1 - y_{i}) \log (1 - p_{i}) \right]$

where y_i denotes the ground truth and p_i the predicted probability.
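A minimal NumPy version of Eq. (9) is shown below. Clipping the probabilities by a small `eps` is an addition for numerical safety (it keeps log(0) out of the computation) and is not part of the paper's formula.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Eq. (9): mean of -[y log p + (1 - y) log(1 - p)] over all outputs."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
```

A prediction of 0.5 for every output gives a loss of log 2 (about 0.693) regardless of the labels, while confident correct predictions drive the loss towards 0.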

3

Analysis of crime trends based on law enforcement databases

Crime trend prediction depends on data analysis. Data analysis starts from existing data and visualizes both the characteristics of the data and the information hidden in it, so that this information can be grasped intuitively in preparation for further mining or processing. By analyzing law enforcement databases to predict crime trends in time and space, law enforcement departments can issue early warnings according to the patterns uncovered, which is of great significance for applying crime trend prediction to early policing intervention and preventive strikes.

3.1

Data analysis

3.1.1

Data pre-processing

The data come from real data of the informatization construction and application project of the Public Security Big Data Institute: the crime dataset of City B for 2022 and 2023, with 18,635 records in total. After desensitization and preprocessing (supplementing or deleting records with serious deficiencies, and removing duplicates), 17,336 valid records were retained. The raw data include: serial number; case number; case category; case category 2 (a specific description of the case category); start and end time of occurrence; detailed address of the location; type of premises where the case occurred; community name (the community to which the case belongs); alarm content or brief description of the case (time, informant, content of the report); total value of lost goods; informant information (name, gender, age, certificate number, identity, household registration, work unit, contact phone number, degree of victimization, attributes of the person); analysis of the criminal process (time, personnel, location, course of events); crime pattern (choice of timing, choice of place, choice of target, equipment used in the crime, characteristics of the technique); and offender information (name, gender, age, certificate number, identity, household registration and detailed address, contact phone number, degree of victimization, attributes of the person), together with a brief description of the solved or canceled case and its basis.
The processed data are shown in Table 1.

Table 1.

Processed data information (part)

Property	Example
Serial number	56
Case number	A12**********74
Case category	Theft case
Case category 2	Burglary at home
Start time of occurrence	2023/7/11
Duration	3:22(h)
Time period	Antemeridian
Day of week	Thursday
Address	*Village*number
Longitude of the crime location	118.3***
Latitude of the crime location	33.5***
Type of premises	Residential residence
Community name	***Street
Jurisdiction	***Area

The retained attributes are as follows. Case number, case category and case category 2 identify the type of crime and its detailed description. The date of the case is kept, and a Day of week attribute is derived from it. The duration of the case is kept. The Time period attribute records the segment of the day in which the case occurred; the day is divided into four periods, each including its lower bound: 0:00 to 6:00, 6:00 to 12:00, 12:00 to 18:00, and 18:00 to 24:00. The Address attribute records the detailed Chinese address of the crime scene; this paper converts it to longitude and latitude through an API, so the longitude and latitude of the crime location are added as attributes. The type of premises records where the case occurred, and the community name is kept. Finally, a Jurisdiction attribute is added based on the longitude and latitude returned by the API and the actual precinct boundaries.
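The derivation of the Day of week and Time period attributes can be sketched as follows, assuming the occurrence start time has already been parsed into a `datetime`. The field names and helper functions are hypothetical; only the four-period rule (each period includes its lower bound) comes from the text.

```python
from datetime import datetime

# Four periods used in the dataset; each interval includes its lower bound.
PERIODS = ("0-6", "6-12", "12-18", "18-24")
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

def time_period(ts: datetime) -> str:
    """Map an occurrence time to its period, e.g. 06:00 -> '6-12'."""
    return PERIODS[ts.hour // 6]

def derived_time_fields(ts: datetime) -> dict:
    """Attributes derived from the occurrence time (illustrative field names)."""
    # weekday() is 0 for Monday; avoids locale-dependent strftime("%A").
    return {"day_of_week": DAYS[ts.weekday()], "time_period": time_period(ts)}
```

For instance, a case starting at 03:00 on 13 July 2023 is labelled Thursday and period "0-6", and one starting at exactly 06:00 falls into "6-12".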

3.1.2

Data selection

The case types in the dataset are analyzed with a statistical chart, and the distribution of case types is shown in Figure 4. In both 2022 and 2023, theft was the main type of crime, far outnumbering the other types, with more than 12,000 theft cases in 2023, while the numbers of other crimes were relatively stable across the two years. The experiments therefore focus on theft crime data, which has a relatively large volume.

City B is divided into 10 jurisdictions. The theft cases of 2022 and 2023 were assigned to jurisdictions according to their detailed addresses, and the distribution of cases across the 10 jurisdictions is shown in Figure 5. On this basis, theft data from the 6th district of City B, hereinafter referred to as District X, were selected for further analysis. The full study uses District X as an example in order to avoid interference from atypical case data.

3.2

Time Distribution Pattern Recognition

After the numbers of theft crimes in different time dimensions are obtained from the law enforcement database, a two-dimensional color matrix can be used to analyze the temporal characteristics of crime events and their high-incidence times. The week-hour table of crime events is converted into a two-dimensional color matrix, shown in Figure 6, in which color saturation indicates the number of crimes. The figure shows that 3:00 a.m. on Friday is the peak time for theft crimes, and there is also a smaller peak between 15:00 and 17:00.
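The week-hour table behind such a color matrix is a 7 x 24 count matrix. A minimal sketch, assuming each case has a parsed occurrence `datetime` (the function name is hypothetical):

```python
import numpy as np
from datetime import datetime

def week_hour_matrix(times):
    """7 x 24 count matrix: rows Monday..Sunday, columns hour of day 0..23."""
    m = np.zeros((7, 24), dtype=int)
    for ts in times:
        m[ts.weekday(), ts.hour] += 1
    return m
```

Rendering the matrix as an image (for example with matplotlib's `imshow`) gives a color matrix like Figure 6, and `np.unravel_index(m.argmax(), m.shape)` returns the (weekday, hour) cell with the most cases.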

To further explore the long-term trends and patterns of criminal behavior, the theft cases in the study area were classified and summarized at five time scales: daily, weekly, monthly, quarterly and seasonal. The average annual temporal distribution of theft crimes in City B at these time scales is shown in Figure 7.

The daily distribution shows that the largest number of cases occurs on the 18th, in the middle of the month, and the smallest on the 26th and 31st, at the end of the month; several smaller peaks of theft crimes also appear within the month. Comparing each day's count with the daily average divides the month into four alternating phases ("high and low staggered, two high and two low"): the 1st day and the middle of the month (days 6-22) are above average and form the "high-incidence stages" of theft within the month, while days 2-5 and the end of the month (days 23-31) are below average and form the "low-incidence stages".

The weekly distribution shows that the largest number of cases occurs on Mondays and the smallest on Sundays. The week can be divided into two stages ("high and low staggered, one high and one low"): Monday to Wednesday is the "high-incidence stage" of theft within the week, and Thursday to Sunday is the "low-incidence stage".

The monthly distribution shows that October has the largest number of cases, 18% above the monthly average, while February has the smallest, 60% below the monthly average. The year can be divided into two stages ("high and low staggered, one high and one low"): January to April is the "low-incidence stage" of theft crimes within the year, and May to December is the "high-incidence stage".

The quarterly distribution shows that the fourth quarter has the most cases, about 15% above the quarterly average, while the first quarter has the fewest, about 30% below it. Comparing each quarter with the quarterly average divides the year into two phases: the first and second quarters are the "low-incidence" phase of theft, and the third and fourth quarters the "high-incidence" phase.

The seasonal distribution shows that winter has the fewest cases, about 20% below the seasonal average, while summer and autumn have the most, 12% above it. The year can therefore be divided into three stages by season: spring is a "low-incidence stage" of theft crimes, summer and autumn form the "high-incidence stage", and winter is again a "low-incidence stage".
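The comparisons above (percentage above or below the average of a time scale, and grouping consecutive buckets into high- and low-incidence stages) can be sketched as follows. The rule that a bucket exactly at the average counts as "high", and the non-circular merging of runs, are assumptions; the counts in the usage example are hypothetical, not the paper's data.

```python
import numpy as np

def deviation_from_mean(counts):
    """Percentage deviation of each bucket (day, weekday, month, ...) from the bucket average."""
    counts = np.asarray(counts, dtype=float)
    return 100.0 * (counts / counts.mean() - 1.0)

def incidence_stages(counts):
    """Label buckets 'high' (>= average) or 'low' and merge consecutive runs.

    Returns (label, first_index, last_index) tuples with 0-based indices.
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.where(counts >= counts.mean(), "high", "low")
    stages = []
    for i, lab in enumerate(labels):
        if stages and stages[-1][0] == lab:
            stages[-1][2] = i               # extend the current run
        else:
            stages.append([str(lab), i, i])  # start a new run
    return [tuple(s) for s in stages]
```

For hypothetical quarterly counts `[70, 90, 115, 125]` (average 100), the deviations are -30%, -10%, +15% and +25%, and the stages are a low phase for quarters 1-2 followed by a high phase for quarters 3-4.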

3.3

Spatial distribution pattern recognition

3.3.1

Standardized crime rate distribution analysis

Based on the theft crime data and demographic data of District X, the standardized crime rate of each spatial cell was calculated to measure crime risk and then visualized; the spatial distribution of crime risk is shown in Figure 8. Crime risk is higher in the southwestern and northeastern parts of the study area, and the highest-risk cell reaches 16.56, which is noteworthy. The risk in this unit is far above the average, suggesting that certain factors drive the high risk in these units. The southwestern part of the study area is the city's core functional area, with many large shopping malls, hospitals, transportation hubs and other places with heavy pedestrian traffic and many merchants; potential targets are more exposed while police resources are limited, which raises the risk of theft. The northeastern part has a lower population density and relatively weak crime supervision, so its crime risk is also higher. The central and northwestern parts of the study area have low crime risk, well below the average level.
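The paper does not give the formula for the standardized crime rate. One common definition, assumed here purely for illustration, is the ratio of each cell's crime rate (crimes per resident) to the rate of the whole study area, so that 1.0 means "average risk" and larger values mean proportionally higher risk; the function name is hypothetical.

```python
import numpy as np

def standardized_crime_rate(crimes, population):
    """Ratio of each cell's crime rate to the study-area rate (1.0 = area average).

    Assumed definition for illustration: (crimes_i / pop_i) / (sum crimes / sum pop).
    """
    crimes = np.asarray(crimes, dtype=float)
    population = np.asarray(population, dtype=float)
    if np.any(population <= 0):
        raise ValueError("every spatial cell needs a positive population")
    area_rate = crimes.sum() / population.sum()
    return (crimes / population) / area_rate
```

For three hypothetical cells with 10, 5 and 0 thefts and populations of 100, 100 and 200, the area rate is 15 / 400 and the standardized rates are about 2.67, 1.33 and 0.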

3.3.2

Analysis of spatial aggregation of crime

This paper further uses the local Moran index to analyze the specific clustering characteristics of theft cases; the results of the local Moran test are shown in Figure 9. High-high clusters of theft cases mainly occur in the center of the region, where the spatial autocorrelation of crime is more significant and an obvious near-repeat effect is present, while low-low clusters mainly occur in the east of the region.
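A minimal sketch of the local Moran statistic and the high-high / low-low classification, assuming a contiguity weight matrix with a zero diagonal that is row-standardized inside the function. Unlike a full analysis, it runs no permutation test, so the quadrants are not filtered by statistical significance; the function name is hypothetical.

```python
import numpy as np

def local_moran(x, W):
    """Local Moran's I_i and cluster quadrant (HH, HL, LH, LL) for each spatial unit.

    x: values per unit (e.g. theft counts); W: (n, n) contiguity weights with a
    zero diagonal. Every unit must have at least one neighbor.
    """
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    W = W / W.sum(axis=1, keepdims=True)   # row-standardize
    z = x - x.mean()
    m2 = (z ** 2).sum() / len(x)
    lag = W @ z                            # spatially lagged deviations
    I = z * lag / m2
    quad = np.where(z > 0, np.where(lag > 0, "HH", "HL"),
                    np.where(lag > 0, "LH", "LL"))
    return I, quad
```

For four units in a row with values 10, 9, 1 and 0, the two high units form a high-high cluster and the two low units a low-low cluster, all with positive local I.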

3.4

Analysis of crime trends in the integrated spatio-temporal dimension

3.4.1

Model predictions

The distribution of crimes across time periods can help allocate police forces effectively within a day and improve public security. The multi-view synchronous convolution algorithm proposed in this paper is used to fuse time and space for analysis, with 70% of the dataset used as the training set and 30% as the test set. The 24 hours of the day are divided into four time periods, and the distribution of crimes across them is shown in Fig. 10. Crime occurrence within a day falls roughly into three levels: the fewest theft cases occur from 0:00 to 6:00, relatively many from 6:00 to 18:00, and the most from 18:00 to 24:00. Field visits and research show that, in all four time periods, the three most frequent types of premises are, in order, residential homes, squares and streets, and commercial premises, so control of these three types of places needs to be tightened regardless of time period. In addition, restaurants and hotels are relatively high-incidence places for theft from 0:00 to 6:00, and hospitals from 6:00 to 12:00. Social prevention and control should be strengthened according to the temporal distribution of crime: a joint mechanism should be established that fully mobilizes social forces and, for different time periods and areas, especially shopping malls, bars and restaurants, brings together joint security personnel, shopkeepers and other relevant personnel to strengthen reminders and patrols.

3.4.2

Model performance levels

The fitting results of the crime trend prediction model based on the multi-view synchronous convolution algorithm on the training and test sets for City B are shown in Figure 11. On the training set, the model captures the overall sinusoid-like trend, but its predicted values are generally low and it fails to fit the local maxima and minima; on the test set, it captures the trend in the number of crime cases, and the overall prediction performance is good.
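The fit in Figure 11 is assessed visually. If a numerical check were wanted, a sketch like the following could quantify it; it assumes the 70/30 split is chronological (the paper does not say whether the split is chronological or random) and uses mean absolute error and root mean squared error, which are not reported in the paper. Both function names are hypothetical.

```python
import numpy as np

def chronological_split(series, train_frac=0.7):
    """Split a time-ordered series into the first 70% (train) and the last 30% (test)."""
    series = np.asarray(series, dtype=float)
    cut = int(round(train_frac * len(series)))
    return series[:cut], series[cut:]

def fit_errors(observed, predicted):
    """Mean absolute error and root mean squared error of predicted case counts."""
    err = np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)
    return {"MAE": float(np.abs(err).mean()), "RMSE": float(np.sqrt((err ** 2).mean()))}
```

Because RMSE weights large errors more heavily than MAE, a model that misses the peaks and troughs, as described for the training set, would show an RMSE noticeably larger than its MAE.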

4

Conclusions and countermeasures

4.1

Conclusion

In this paper, theft crimes in City B are selected to study the latent crime patterns based on law enforcement databases.

In the temporal dimension, 3 a.m. on Friday is the peak time for theft crimes, and there is also a smaller peak between 15:00 and 17:00. The month can be divided into four stages ("high and low staggered, two high and two low"), the week into two stages ("high and low staggered, one high and one low"), and the year by month into two stages ("high and low staggered, one high and one low"). By quarter, the year divides into a "low-incidence period" and a "high-incidence period", and by season into three stages ("high and low staggered, one high and two low").

In the spatial dimension, crime risk is higher in the southwest and northeast of the study area, with the highest-risk unit reaching 16.56. High-high clusters of theft cases mainly occur in the central part of the region, where the spatial autocorrelation of crime is more significant, and low-low clusters mainly occur in the eastern part.

When the multi-view synchronous convolution algorithm is used to fuse time and space, crime occurrence within a day falls roughly into three levels: the fewest theft cases occur from 0:00 to 6:00, relatively many from 6:00 to 18:00, and the most from 18:00 to 24:00. On the training set, the proposed model captures the overall sinusoid-like trend, but its predicted values are generally low and it fails to fit the local maxima and minima; on the test set, it captures the trend in the number of crime cases, and the overall prediction performance is good.

4.2

Countermeasures

In view of the temporal and spatial characteristics of theft crimes, this study puts forward three suggestions for theft prevention.

1) Strengthen patrol and monitoring during the night-time hours

According to the crime distribution analysis, relatively few theft cases occur from 0:00 to 6:00, but places such as restaurants and hotels carry a higher risk during this period. Public security departments and security forces can therefore intensify patrols and monitoring of restaurants, hotels and similar places during this period, adding security staff and police equipment and installing high-density monitoring equipment to strengthen crime-fighting capability.

2) Strengthen focused prevention and control during the day and in the evening

From 6:00 to 18:00, the number of crimes is relatively high, and public places such as hospitals also show a high incidence. Therefore, in addition to regular patrols, security control should be strengthened in places such as hospitals and public transportation hubs by adding security personnel and using intelligent monitoring systems to track suspicious behavior in real time, so that preventive measures can be taken promptly.

3) Strengthen prevention in residential and commercial areas

At all times of the day, residential houses, squares and streets, and commercial premises remain high-incidence places for theft. To reduce thefts in these areas, in addition to strengthening community security management, intelligent means can be used, such as installing more smart surveillance, alarm systems and smart door locks. In addition, commercial premises, squares and streets need to cooperate with shopping malls, stores and merchants to strengthen joint security, especially during peak hours.



Zexin Liu

Zhuqiang Ye

Published Online: Sep 29, 2025

Received: Dec 20, 2024

Accepted: Apr 18, 2025

DOI: https://doi.org/10.2478/amns-2025-1128

Keywords: Spatio-temporal analysis of crime, Time series decomposition method, Nearest neighbor index method, Multi-perspective simultaneous convolution algorithm, Crime trend

© 2025 Zexin Liu and Zhuqiang Ye, published by Sciendo.

This work is licensed under the Creative Commons Attribution 4.0 International License.
