Research on Intelligent Recognition System of Traffic Image Based on CNN and Intelligent Recognition of Foreign Object Intrusion

In China, high-speed rail is a typical representative of the rail transportation system, which occupies an important position in China's transportation and has the advantages of safety, timeliness, and affordable price, which cannot be replaced by other transportation systems [1–2]. With strong carrying capacity, small footprint, and low cost required for manufacturing and maintenance, railroad transportation has gradually become one of the backbone modes of the transportation system [3–4].

In recent years, China's railroad construction has developed rapidly, and railroads now cross many types of terrain. The environment around running trains is therefore complex and diverse, and objects often intrude into the clearance limit, creating serious hidden dangers if they are not detected and removed in time [5–6]. Track foreign objects are of many types, including trespassing people or animals, tools left behind after construction, and hanging objects, all of which endanger train operation [7–8]. As freight traffic grows and passenger trains run faster, collisions between trains and foreign objects become more destructive, and these intrusion events are random and unpredictable. A reliable foreign object intrusion detection method is therefore needed: once a foreign object is detected on the track, the system should immediately raise an on-site alarm, send alarm information to the dispatch center, and prompt timely removal of the object. Such a method can greatly reduce the incidence of safety accidents and is of great significance for maintaining rail transportation safety [9–12].

Early prevention of foreign object intrusion relied on installing protective nets and on manual inspection. In the densely populated eastern region these measures effectively reduce the probability of intrusion, but on long lines through sparsely populated areas, such as the Qinghai-Tibet line, protective nets and manual inspection are time-consuming and labor-intensive and provide poor protection [13–17].

With the development of national intelligent transportation, the existing video surveillance and the operation and maintenance of railroad facilities can no longer meet new task requirements. There is an urgent need for dynamic real-time intrusion monitoring of operating sites, real-time collection and analysis of anomalous information, automated personal protection, and intelligent analysis of monitoring data and of operation and maintenance [18–19]. Therefore, studying an intelligent image recognition system for railroad operation safety that builds on the existing video surveillance network and combines artificial intelligence, edge computing and other technologies is of great significance to the intelligent management of railroad operation [20–21].

Based on the images captured by road traffic surveillance cameras, this paper proposes a foreign object recognition method of “pre-determination” + “fine detection”. The pixel coordinates are extracted first, the video image is divided into sensitive and non-sensitive areas, and a moving object detection algorithm based on image processing is used to prejudge the intrusion. A pedestrian intrusion video is taken as an experimental sample to compare the detection accuracy of this paper's moving object detection method with that of the optical flow method, the three-frame difference method, the KNN background difference method and the hole-filling KNN background difference method. The frames judged to contain foreign objects are then finely detected using the convolutional neural network algorithm. PASCAL VOC is used as the experimental dataset to measure the evaluation indices of classical target detection algorithms and of this paper's method, and to analyze the foreign object intrusion recognition accuracy of the model. In addition, this paper's method is used to detect different objects in different complex environments to explore its foreign object intrusion recognition effect in different scenarios.

2

Overall technical approach

Detection of intruding foreign objects is a very important part of a road traffic intelligent monitoring system, because whether foreign objects intrude into the traffic line determines whether vehicles can operate safely. Although moving object detection can promptly and accurately detect a moving object entering the monitoring area, it cannot recognize or distinguish moving objects. Therefore, this paper proposes a CNN-based intelligent recognition method for foreign object intrusion, whose overall technical route is shown in Fig. 1. It comprises two processes, moving object detection and intelligent recognition analysis, carried out in the following steps:

1) Extract the pixel coordinate matrix from the input video image sequence.

2) Divide the image into multiple grids, determine the center coordinates of each grid, and classify the grids into sensitive and non-sensitive areas based on the Euclidean distance between the grid center and the road edge line.

3) Using traditional image recognition methods, apply a target detection algorithm to initially detect whether there is a moving object in each grid.

4) If a moving object is detected, use the foreign object intrusion intelligent recognition model for further judgment.

3

Moving object detection methods

Moving object detection mainly uses digital image processing techniques to accomplish the initial detection of moving targets within the video image, and the process detects all input videos in a comprehensive and real-time manner, including extracting pixel coordinates, calculating image sensitive areas, and detecting moving objects.

3.1

Extracting pixel coordinates

The process of extracting pixel coordinates from the input video image sequence is shown in Fig. 2. The implementation is as follows:

1) Preprocess the input images. To ensure that all calculations on image pixel coordinates use the same scale, the pixel dimensions of each frame of the input video sequence are standardized to (W × H). Grayscale conversion, Gaussian smoothing and other operations are then used to remove noise and other defects introduced into the original image by external factors, which improves image quality and simplifies the image data for the subsequent feature extraction.

2) Extract feature data from the processed single-channel image. The Canny operator is used to detect the edge features of the image, and the region of interest (ROI) is then extracted from the feature map, which removes redundant information and reduces the amount of computation. Finally, the Hough line detection algorithm is used to find all the straight lines in the ROI.

3) Extract coordinates in the image. The straight lines that match the road characteristics are selected by setting a slope threshold. The endpoint coordinates of all selected lines are then extracted and fitted to generate line equations, from which the pixel coordinate matrix [x_i, y_i] of the two road edge lines is obtained and used for the subsequent delineation of the sensitive areas.

3.2

Calculating the image sensitive area

The acquired field monitoring images also contain edge areas, such as vegetation outside the line limits. Foreign objects falling in these edge areas do not affect the safety of vehicle operation, so excluding them when detecting foreign object intrusion both improves the accuracy of the key area information and saves computational resources. In this section, the key image information is extracted by dividing the image into sensitive and non-sensitive grids, and the sensitive region calculation process is shown in Fig. 3. First, each standardized video image is divided into a K × K grid, and the pixel coordinate of the center of the j-th grid is [x_j, y_j].

The shortest Euclidean distance from the center of each grid to the edge of the road is then calculated:

$d_{j} = \min_{i} \sqrt{{(x_{j} - x_{i})}^{2} + {(y_{j} - y_{i})}^{2}}$ (1)

where d_j is the minimum Euclidean distance between the center of the j-th grid and the road, (x_j, y_j) is the center coordinate of the j-th grid, and (x_i, y_i) is a pixel coordinate on the road edge line.

Finally, the grids are classified into sensitive and non-sensitive areas based on this distance. If d_j < Max{W / 2K, H / 2K}, the j-th grid is a sensitive area grid; otherwise it is a non-sensitive area grid. Any grid that lies between sensitive grids in the X-axis direction is also counted as sensitive. The coordinates of the upper left vertex [x_left, y_up] and the lower right vertex [x_right, y_down] of each sensitive area grid are recorded.
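The grid classification described in this subsection can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `classify_grids`, the example image size and the synthetic road edge pixels are assumptions, and the rule for filling grids along the X axis is one possible reading of the text (every grid lying between two sensitive grids in the same row is marked sensitive).

```python
import math

def classify_grids(width, height, k, road_pixels):
    """Return a K x K list of booleans, True where a grid is sensitive.

    road_pixels is a list of (x, y) road edge pixel coordinates.
    """
    cell_w, cell_h = width / k, height / k
    # Threshold from the text: Max{W / 2K, H / 2K}
    threshold = max(width / (2 * k), height / (2 * k))
    sensitive = []
    for row in range(k):
        flags = []
        for col in range(k):
            cx = (col + 0.5) * cell_w   # grid center x_j
            cy = (row + 0.5) * cell_h   # grid center y_j
            # Eq. (1): shortest distance from the grid center to the road
            d_j = min(math.hypot(cx - x, cy - y) for x, y in road_pixels)
            flags.append(d_j < threshold)
        sensitive.append(flags)
    # Assumed reading of the X-axis rule: fill the gap between the
    # leftmost and rightmost sensitive grid of each row.
    for flags in sensitive:
        hits = [c for c, flag in enumerate(flags) if flag]
        if hits:
            for c in range(hits[0], hits[-1] + 1):
                flags[c] = True
    return sensitive

# Hypothetical 80 x 80 image with two vertical road edges at x = 25 and x = 55
road = [(25, y) for y in range(80)] + [(55, y) for y in range(80)]
mask = classify_grids(80, 80, 4, road)
for flags in mask:
    print(flags)
```

In this toy layout only the two middle columns of the 4 × 4 grid are marked sensitive, which matches the idea that edge areas far from the road are excluded from later processing.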

3.3

Detecting moving objects

Using traditional image processing methods, moving object detection is performed for each grid individually. If a moving object is detected in a sensitive grid, the corresponding image sequence is sent to the central device for further recognition and analysis. For the input video image sequence, the grayscale values f_n(x, y) and f_n–m(x, y) are calculated for each grid of the n-th and (n – m)-th frames, and the average grayscale difference of each grid between the current frame and the frame m frames earlier is calculated from them:

$m_{j} = \left| \frac{f_{n} (x_{j}, y_{j}) - f_{n - m} (x_{j}, y_{j})}{(W \times H) / (K \times K)} \right|$ (2)

where m_j is the average gray difference of the j-th grid, (x_j, y_j) is the center coordinate of the j-th grid, f_n(x_j, y_j) is the gray value of the j-th grid in the n-th frame, f_n–m(x_j, y_j) is the gray value of the corresponding j-th grid in the (n – m)-th frame, and (W × H) / (K × K) is the number of pixels in each grid.

A suitable gray threshold is then set. If the average gray difference m_j is greater than the threshold, the pixels of the corresponding grid have changed, which indicates that a moving object may be present, and the frame is uploaded to the central device for further recognition and analysis. Otherwise, it is judged that there is no moving object, and processing continues with the next grid or the next frame.
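The grid-wise frame difference can be illustrated with a small pure-Python sketch. It assumes that the gray value of a grid is the sum of its pixel values and that dividing by the number of pixels per grid gives the average difference; the function names, the 4 × 4 toy frames and the threshold of 20 are hypothetical and are not taken from the paper.

```python
def grid_mean_abs_diff(frame_now, frame_prev, k):
    """Average gray difference per grid between two H x W gray frames.

    Frames are lists of rows; H and W are assumed divisible by k.
    """
    h, w = len(frame_now), len(frame_now[0])
    cell_h, cell_w = h // k, w // k
    pixels_per_grid = cell_h * cell_w
    diffs = []
    for gr in range(k):
        row = []
        for gc in range(k):
            sum_now = sum_prev = 0
            for y in range(gr * cell_h, (gr + 1) * cell_h):
                for x in range(gc * cell_w, (gc + 1) * cell_w):
                    sum_now += frame_now[y][x]
                    sum_prev += frame_prev[y][x]
            # Absolute difference of the grid gray sums, averaged per pixel
            row.append(abs(sum_now - sum_prev) / pixels_per_grid)
        diffs.append(row)
    return diffs

def moving_grids(diffs, threshold):
    """Return (row, col) indices of grids whose difference exceeds the threshold."""
    return [(r, c) for r, row in enumerate(diffs)
            for c, m in enumerate(row) if m > threshold]

# Toy example: a bright object appears in the lower-right 2 x 2 block
frame_prev = [[10] * 4 for _ in range(4)]
frame_now = [row[:] for row in frame_prev]
for y in (2, 3):
    for x in (2, 3):
        frame_now[y][x] = 60

diffs = grid_mean_abs_diff(frame_now, frame_prev, k=2)
flagged = moving_grids(diffs, threshold=20)  # hypothetical gray threshold
print(diffs, flagged)
```

Only the grid whose pixels changed is flagged, so only frames with at least one flagged sensitive grid would need to be forwarded for CNN-based recognition.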

3.4

Experimental results and analysis

To test the moving object detection method in real scenarios, this paper shoots a video of foreign object intrusion on a railroad line. In the video, a pedestrian walks from a position close to the camera into the distance and then returns toward the camera, simulating pedestrian intrusion on a real railroad line. Because the pedestrian stays within the railroad safety limit throughout the video, the number of frames judged as containing a foreign object should equal the total number of video frames. Let the number of frames in which a foreign object is detected be f_r and the total number of frames be f_z; the accuracy of the foreign object judgment method, P_r, is the percentage of detected intrusion frames in the total number of frames, i.e., P_r = f_r / f_z × 100%. Because restricting detection to the intrusion region removes most of the interference from irrelevant image content, this paper sets the foreign object judgment pixel threshold T to 25: when the pixels belonging to a foreign object lie within the intrusion region and cover an area of more than 25 pixels, a foreign object is judged to be present. The video is imported into the PyCharm development platform, and the number of frames judged as containing foreign objects is compared across the optical flow method, the three-frame difference method, the KNN background difference method, the hole-filling KNN background difference method and the method in this paper.

The results of the foreign object judgment accuracy comparison are shown in Table 1. The optical flow method has the lowest judgment accuracy (67.83%), mainly because a pedestrian moving in the distance produces only a small range of motion in the camera image, which makes detection by optical flow more difficult. The three-frame difference method is more accurate (70.67%) than the optical flow method, but because it leaves many holes it can in practice only segment the foreground contour of the foreign object, and the foreground area it can segment is even smaller for distant objects, which leads to poor detection. The KNN background difference method has a high accuracy (85.31%), and hole filling improves it slightly: the hole-filling KNN background difference method segments the foreground more completely and reaches 88.20%. Taking the above comparisons into account, the method in this paper is the most accurate for detecting moving objects, with a detection accuracy of 92.33%.

Table 1. Comparison of foreign object judgment accuracy

Methods	f_r	f_z	P_r
Optical flow method	1265	1865	67.83%
Three-frame difference method	1318	1865	70.67%
KNN background difference method	1591	1865	85.31%
Hole-filling KNN background difference method	1645	1865	88.20%
Our method	1722	1865	92.33%
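As a quick check of how P_r follows from the frame counts, the short Python sketch below recomputes the accuracy column of Table 1 from f_r and f_z. The counts are those reported in the table; the variable and function names are illustrative only.

```python
TOTAL_FRAMES = 1865  # f_z from Table 1

# f_r (frames judged to contain a foreign object) from Table 1
detected_frames = {
    "Optical flow method": 1265,
    "Three-frame difference method": 1318,
    "KNN background difference method": 1591,
    "Hole-filling KNN background difference method": 1645,
    "Our method": 1722,
}

def judgment_accuracy(f_r, f_z):
    """P_r: percentage of frames judged as foreign object intrusion."""
    return 100.0 * f_r / f_z

accuracy = {name: judgment_accuracy(f_r, TOTAL_FRAMES)
            for name, f_r in detected_frames.items()}

for name, p_r in accuracy.items():
    print(f"{name}: {p_r:.2f}%")

# Margin of the proposed method over each baseline, in percentage points
margins = {name: accuracy["Our method"] - p_r
           for name, p_r in accuracy.items() if name != "Our method"}
```

The recomputed percentages agree with the table to two decimal places, and the margins over the baselines range from about 4.13 to 24.50 percentage points.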

4

Foreign object intrusion intelligent identification

After the central device receives the image of a scene in which a moving object has been detected, it uses the foreign object intrusion recognition model built on a convolutional neural network to determine the type of foreign object more accurately.

4.1

Convolutional Neural Networks

4.1.1

Convolutional layers

In mathematics, convolution is an operator that generates a third function z from two known functions f and g. It is commonly used to characterize the overlap between one function and a reversed and translated copy of the other. Its discrete definition is shown in equation (3):

$z (t) \overset{def}{=} f (t) * g (t) = \sum_{τ = - \infty}^{\infty} f (τ) g (t - τ)$ (3)

Its integral form is:

$z (t) = f (t) * g (t) = \int_{- \infty}^{+ \infty} f (τ) g (t - τ) d τ = \int_{- \infty}^{+ \infty} f (t - τ) g (τ) d τ$ (4)

Image processing uses the discrete form of the convolution operation. Taking a grayscale image as an example, the convolution operation on the image can be expressed as:

$z (i, j) = f (i, j) * g (i, j) = \sum_{k, l} f (i - k, j - l) g (k, l)$ (5)

where f(i, j) is the gray value of the pixel in the i-th row and j-th column of the image, and g is the convolution kernel, whose size can be set. If g is of size 3 × 3, the result of the convolution at one position can be expressed as:

$\begin{bmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \\ a_7 & a_8 & a_9 \end{bmatrix} * \begin{bmatrix} b_1 & b_2 & b_3 \\ b_4 & b_5 & b_6 \\ b_7 & b_8 & b_9 \end{bmatrix} = a_1 b_1 + a_2 b_2 + a_3 b_3 + a_4 b_4 + a_5 b_5 + a_6 b_6 + a_7 b_7 + a_8 b_8 + a_9 b_9$ (6)

By choosing different convolution kernels, the convolution operation can detect and extract the shapes, textures and specific colors of an image. Three convolution kernels are given in equation (7):

$K_{e} = \begin{bmatrix} 0 & - 4 & 0 \\ - 4 & 16 & - 4 \\ 0 & - 4 & 0 \end{bmatrix}, K_{h} = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ - 1 & - 2 & - 1 \end{bmatrix}, K_{r} = \begin{bmatrix} 1 & 0 & - 1 \\ 2 & 0 & - 2 \\ 1 & 0 & - 1 \end{bmatrix}$ (7)

where K_e is commonly used for overall edge filtering, K_h for horizontal edge filtering, and K_r for vertical edge filtering.
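To make the effect of these kernels concrete, the sketch below slides a 3 × 3 kernel over a small grayscale image and takes the element-wise multiply-and-sum of equation (6) at each position (no padding, stride 1, kernel not flipped). The function name `filter3x3` and the toy image are assumptions chosen for illustration.

```python
# Horizontal and vertical edge kernels from equation (7)
K_H = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
K_R = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

def filter3x3(image, kernel):
    """Apply a 3 x 3 kernel at every valid position of a 2D list image."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            # Element-wise product of the window and the kernel, summed
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(3) for b in range(3)))
        out.append(row)
    return out

# Toy image containing a single vertical edge (dark left, bright right)
image = [[0, 0, 9, 9] for _ in range(4)]

vertical_response = filter3x3(image, K_R)
horizontal_response = filter3x3(image, K_H)
print(vertical_response)    # strong response at the vertical edge
print(horizontal_response)  # zero response: there is no horizontal edge
```

The vertical-edge kernel responds strongly wherever the window straddles the intensity step, while the horizontal-edge kernel returns zero everywhere because every row of the toy image is identical.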

4.1.2

Pooling layer

Pooling is an operation that aggregates statistics of features at different locations in an image. A pooling unit calculates the value of a local block in the feature map, and adjacent pooling units read data from a small area by moving several rows or columns, and then process the data.
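As an illustration of one common pooling choice, the following sketch implements max pooling over a 2D feature map. The default 2 × 2 window with stride 2 mirrors the setting used later in the model description; the function name and the example feature map are illustrative assumptions.

```python
def max_pool(feature_map, size=2, stride=2):
    """Max pooling over a 2D list; windows that do not fit are dropped."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[i + a][j + b]
             for a in range(size) for b in range(size))
         for j in range(0, w - size + 1, stride)]
        for i in range(0, h - size + 1, stride)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 8],
]
pooled = max_pool(feature_map)
print(pooled)
```

Each output value summarizes one 2 × 2 block, so the 4 × 4 map is reduced to 2 × 2 while keeping the strongest response of every block.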

4.1.3

Activation layer

The activation layer follows operational layers such as the convolutional layer and the pooling layer and applies the activation function in the convolutional neural network. The sigmoid function is a common activation function, and its expression is shown in equation (8):

$f (x) = \frac{1}{1 + e^{- x}}$ (8)

To avoid the gradient saturation effect, the rectified linear unit (ReLU) is introduced into the neural network, with the expression shown in equation (9):

$f (x) = \max (0, x)$ (9)

The ReLU function saturates hard for x < 0. For x > 0 its derivative is 1, so ReLU keeps the gradient from decaying for x > 0.

Other functions that can be used as activation functions include the hyperbolic tangent (tanh) function:

$\tanh (x) = \frac{1 - e^{- 2 x}}{1 + e^{- 2 x}}$ (10)

the exponential linear unit (ELU):

$f (x) = \begin{cases} x, & x > 0 \\ α (e^{x} - 1), & x \leq 0 \end{cases}$ (11)

and the parametric ReLU, where the value of α varies with the network structure:

$f (x) = \begin{cases} x, & x > 0 \\ α x, & x \leq 0 \end{cases}$ (12)
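The activation functions discussed in this subsection can be written directly with the Python standard library. The sketch below uses the standard definitions of sigmoid, ReLU, tanh, ELU and parametric ReLU; the default values of `alpha` are arbitrary illustrative choices, not values from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def tanh(x):
    # Standard identity: tanh(x) = (1 - e^(-2x)) / (1 + e^(-2x))
    return (1.0 - math.exp(-2.0 * x)) / (1.0 + math.exp(-2.0 * x))

def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def prelu(x, alpha=0.25):
    return x if x > 0 else alpha * x

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), relu(x), tanh(x), elu(x), prelu(x))
```

Comparing the outputs for negative inputs shows the difference in saturation behavior: ReLU returns exactly zero, ELU approaches -alpha smoothly, and the parametric ReLU keeps a small linear slope.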

4.1.4

Fully connected layer

After the multi-layer convolution and pooling operation, the information of the picture needs to be output in the form of classes, and then it is necessary to apply the fully connected layer to generate an output equal to the number of classes needed. The fully connected layer acts as a “classifier” in the whole convolutional neural network.

4.2

Training of Convolutional Neural Networks

The training process of a neural network involves finding the parameters that correspond to the minimization of the cost function from the existing samples. The solution to minimize the cost function is usually done using the gradient descent method. The most important step is to find the gradient, which can usually be achieved using the backpropagation algorithm.

4.2.1

Cost function

The convolutional neural network uses the cross-entropy cost function instead of the quadratic (variance) cost function, with the expression:

$J (W, b; x, y) = - \frac{1}{m} \sum_{i = 1}^{m} [y_{i} \ln h_{i} + (1 - y_{i}) \ln (1 - h_{i})]$ (13)

For a single neuron with one input-output pair (x_i, y_i) and neuron output h_i, the cross-entropy cost function can be expressed as:

$J (W, b; x_{i}, y_{i}) = - [y_{i} \ln h_{i} + (1 - y_{i}) \ln (1 - h_{i})]$ (14)

The sensitivity of this neuron, i.e., the partial derivative of the cost function with respect to the input, is obtained from equation (14) as:

$δ_{i} = \frac{\partial J (W, b; x_{i}, y_{i})}{\partial x_{i}} = - \frac{y_{i} - h_{i}}{h_{i} (1 - h_{i})} \frac{\partial h_{i}}{\partial x_{i}}$ (15)

If this neuron uses the sigmoid function, then:

$\frac{\partial h_{i}}{\partial x_{i}} = f^{'} (x_{i}) = f (x_{i}) [1 - f (x_{i})] = h_{i} (1 - h_{i})$ (16)

Substituting Eq. (16) into Eq. (15) gives:

$δ_{i} = - (y_{i} - h_{i})$ (17)
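The result δ_i = h_i − y_i can be checked numerically. The sketch below compares the analytic sensitivity of a single sigmoid neuron under the cross-entropy cost with a central finite-difference estimate; the function names, the test input and the step size `eps` are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cross_entropy(y, h):
    """Single-sample cross-entropy cost, as in equation (14)."""
    return -(y * math.log(h) + (1.0 - y) * math.log(1.0 - h))

def delta_analytic(x, y):
    """Equation (17): delta = -(y - h) = h - y for a sigmoid neuron."""
    return sigmoid(x) - y

def delta_numeric(x, y, eps=1e-6):
    """Central finite-difference estimate of dJ/dx."""
    j_plus = cross_entropy(y, sigmoid(x + eps))
    j_minus = cross_entropy(y, sigmoid(x - eps))
    return (j_plus - j_minus) / (2.0 * eps)

x_i, y_i = 0.3, 1.0
print(delta_analytic(x_i, y_i), delta_numeric(x_i, y_i))
```

The two values agree to within numerical error, which illustrates why pairing the sigmoid output with the cross-entropy cost removes the h(1 − h) factor that would otherwise shrink the gradient when the neuron saturates.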

4.2.2

Backpropagation Algorithm

For convolutional neural networks, the error backpropagation algorithm is required to determine the gradients used for weight adjustment. Assuming that the input z^l and the output h^l of a neuron in layer l are related by h^l = f(z^l) = f(W^l h^(l–1) + b^l), the derivatives of the cost function with respect to the parameters can be expressed as:

$\begin{cases} \frac{\partial J}{\partial W^{l}} = (h^{l} - y) \odot f^{'} (z^{l}) \times {(h^{l - 1})}^{T} \\ \frac{\partial J}{\partial b^{l}} = (h^{l} - y) \odot f^{'} (z^{l}) \end{cases}$ (18)

where ⊙ denotes the Hadamard product: for two vectors A = (a₁, a₂, ⋯, a_n)^T and B = (b₁, b₂, ⋯, b_n)^T of the same dimension, A ⊙ B = (a₁b₁, a₂b₂, a₃b₃, ⋯, a_nb_n)^T. Let δ^l = (h^l – y) ⊙ f'(z^l). Once the δ^l of layer l is obtained, the δ^(l–1) of layer l – 1 can be derived as follows:

$δ^{l - 1} = \frac{\partial J}{\partial b^{l - 1}} = \frac{\partial J}{\partial z^{l - 1}} \frac{\partial z^{l - 1}}{\partial b^{l - 1}}$ (19)

$z^{l - 1} = W^{l - 1} h^{l - 2} + b^{l - 1}$ (20)

Since equation (20) gives ∂z^(l–1)/∂b^(l–1) = 1, the two equations above yield:

$δ^{l - 1} = \frac{\partial J}{\partial z^{l - 1}}$ (21)

Then, by the chain rule, equation (19) can be transformed into:

$δ^{l - 1} = \frac{\partial J}{\partial z^{l}} \times \frac{\partial z^{l}}{\partial z^{l - 1}} = δ^{l} \frac{\partial z^{l}}{\partial z^{l - 1}}$ (22)

Finding δ^(l–1) requires the relationship between z^l and z^(l–1), which can be obtained from the neuron's output equation:

$z^{l} = W^{l} h^{l - 1} + b^{l} = W^{l} f (z^{l - 1}) + b^{l}$ (23)

Accordingly, it can be concluded that:

$\frac{\partial z^{l}}{\partial z^{l - 1}} = {(W^{l})}^{T} \odot (f^{'} (z^{l - 1}), f^{'} (z^{l - 1}), \dots, f^{'} (z^{l - 1}))$ (24)

$δ^{l - 1} = δ^{l} \frac{\partial z^{l}}{\partial z^{l - 1}} = {(W^{l})}^{T} δ^{l} \odot f^{'} (z^{l - 1})$ (25)

This makes it possible to derive δ^(l–1) from δ^l and thus determine the gradient of the parameters in the previous layer.
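The recursion from δ^l to δ^(l−1) can be verified on a tiny network. The sketch below is a hedged illustration rather than the paper's model: it assumes a 2-input, 2-hidden-unit, 1-output network with sigmoid activations and the cross-entropy cost, so that the output sensitivity is h − y, and it checks one backpropagated weight gradient against a finite-difference estimate. All weights, inputs and names are arbitrary choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    z1 = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h1 = [sigmoid(z) for z in z1]
    z2 = sum(w * h for w, h in zip(W2, h1)) + b2
    return h1, sigmoid(z2)

def cost(x, y, W1, b1, W2, b2):
    _, h2 = forward(x, W1, b1, W2, b2)
    return -(y * math.log(h2) + (1.0 - y) * math.log(1.0 - h2))

def backward(x, y, W1, b1, W2, b2):
    h1, h2 = forward(x, W1, b1, W2, b2)
    delta2 = h2 - y  # output sensitivity for sigmoid + cross-entropy
    # Propagate backwards: delta1 = (W2)^T delta2 (Hadamard) f'(z1),
    # with f'(z1) = h1 * (1 - h1) for the sigmoid
    delta1 = [W2[j] * delta2 * h1[j] * (1.0 - h1[j]) for j in range(len(h1))]
    grad_W1 = [[d * xi for xi in x] for d in delta1]  # delta1 times x^T
    grad_W2 = [delta2 * h for h in h1]
    return grad_W1, grad_W2

def numeric_grad_W1(x, y, W1, b1, W2, b2, r, c, eps=1e-6):
    plus = [row[:] for row in W1]
    minus = [row[:] for row in W1]
    plus[r][c] += eps
    minus[r][c] -= eps
    return (cost(x, y, plus, b1, W2, b2) - cost(x, y, minus, b1, W2, b2)) / (2 * eps)

x, y = [0.5, -1.2], 1.0
W1, b1 = [[0.1, -0.3], [0.4, 0.2]], [0.0, 0.1]
W2, b2 = [0.7, -0.5], 0.05

grad_W1, grad_W2 = backward(x, y, W1, b1, W2, b2)
check = numeric_grad_W1(x, y, W1, b1, W2, b2, r=0, c=1)
print(grad_W1[0][1], check)
```

Agreement between the backpropagated gradient and the finite-difference value is a standard sanity check that the layer-by-layer recursion has been applied correctly.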

4.2.3

Gradient descent

After the gradient has been determined, gradient descent is used to update the parameters. Let x* be the minimum point of the cost function f(x). To obtain x*, an appropriate initial value x^(0) is chosen and x is updated iteratively so that it keeps approaching x*. Since f(x) has continuous first-order partial derivatives, if the value at the k-th iteration is x^(k), a first-order Taylor expansion of f(x) can be carried out in the vicinity of x^(k):

$f (x) \approx f (x^{(k)}) + g_{k}^{T} (x - x^{(k)})$ (26)

where g_k = g(x^(k)) = ∇f(x^(k)) is the gradient of f(x) at x^(k). The value x^(k+1) for the (k + 1)-th iteration is given by:

$x^{(k + 1)} = x^{(k)} + λ_{k} p_{k}$ (27)

where p_k is the search direction, which generally takes the negative gradient direction, i.e., p_k = –∇f(x^(k)) = –g_k, and λ_k is the step size (learning rate), which needs to be set manually.
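A minimal sketch of this update rule is shown below on a simple quadratic whose minimum is known. The objective f(x) = (x₁ − 3)² + 2(x₂ + 1)², the fixed step size of 0.1 and the iteration count are assumptions made purely for illustration.

```python
def grad_f(x):
    """Gradient of f(x) = (x1 - 3)^2 + 2 * (x2 + 1)^2."""
    return [2.0 * (x[0] - 3.0), 4.0 * (x[1] + 1.0)]

def gradient_descent(grad, x0, step=0.1, iterations=200):
    """Iterate x <- x + step * p with p equal to the negative gradient."""
    x = list(x0)
    for _ in range(iterations):
        g = grad(x)
        p = [-gi for gi in g]                      # search direction
        x = [xi + step * pi for xi, pi in zip(x, p)]
    return x

x_min = gradient_descent(grad_f, [0.0, 0.0])
print(x_min)  # close to the true minimum at (3, -1)
```

With this step size each coordinate error shrinks by a constant factor per iteration, so the iterate converges to the minimum; a step that is too large would instead make the iteration diverge, which is why the step size has to be chosen with care.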

4.2.4

Bounding box regression algorithm

To make the training and testing process faster, the bounding box regression algorithm is introduced. The bounding box of an object can be represented by a four-dimensional vector (x, y, w, h), where (x, y) denotes the center coordinates of the bounding box and (w, h) denotes its width and height. The aim of bounding box regression is, for an initial box (P_x, P_y, P_w, P_h), to find a predicted box (g_x, g_y, g_w, g_h) that is close to the given object bounding box (G_x, G_y, G_w, G_h), i.e., to find a function f such that:

$f (P_{x}, P_{y}, P_{w}, P_{h}) = (g_{x}, g_{y}, g_{w}, g_{h}) \approx (G_{x}, G_{y}, G_{w}, G_{h})$ (28)

The function f can be specified as:

$\begin{cases} g_{x} = P_{w} d_{x} (P) + P_{x} \\ g_{y} = P_{h} d_{y} (P) + P_{y} \\ g_{w} = P_{w} \exp (d_{w} (P)) \\ g_{h} = P_{h} \exp (d_{h} (P)) \end{cases}$ (29)

The problem of determining f is thus transformed into the problem of obtaining the parameters (d_x(P), d_y(P), d_w(P), d_h(P)) by gradient descent. Since the input to the network is the CNN feature corresponding to each window, i.e., a feature vector Φ_s(P), the quantity to be solved becomes the weight vector w_*:

$d_{*} (P) = w_{*}^{T} Φ_{s} (P)$ (30)

where * denotes one of (x, y, w, h). Based on the given object bounding box (G_x, G_y, G_w, G_h), the regression targets are defined as:

$\begin{cases} t_{x} = (G_{x} - P_{x}) / P_{w} \\ t_{y} = (G_{y} - P_{y}) / P_{h} \\ t_{w} = \log (G_{w} / P_{w}) \\ t_{h} = \log (G_{h} / P_{h}) \end{cases}$ (31)

The loss function is obtained as:

$Loss = \sum_{i}^{n} {(t_{*}^{i} - {\hat{w}}_{*}^{T} Φ_{s} (P^{i}))}^{2}$ (32)

The optimization goal is:

$W_{*} = \arg \min_{{\hat{w}}_{*}} \sum_{i}^{n} [{(t_{*}^{i} - {\hat{w}}_{*}^{T} Φ_{s} (P^{i}))}^{2} + λ \| {\hat{w}}_{*} \|^{2}]$ (33)
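The relationship between the box transform and its regression targets can be illustrated with a small encode/decode sketch. It assumes the standard R-CNN-style parameterization, in which the center offsets are normalized by the proposal width and height and the scale targets are log ratios; the function names and the example boxes are hypothetical.

```python
import math

def encode(proposal, ground_truth):
    """Regression targets (t_x, t_y, t_w, t_h) for a proposal P and a box G."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw,
            (gy - py) / ph,
            math.log(gw / pw),
            math.log(gh / ph))

def decode(proposal, deltas):
    """Apply predicted offsets (d_x, d_y, d_w, d_h) to a proposal P."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (pw * dx + px,
            ph * dy + py,
            pw * math.exp(dw),
            ph * math.exp(dh))

P = (50.0, 40.0, 20.0, 10.0)   # hypothetical proposal (x, y, w, h)
G = (56.0, 38.0, 30.0, 12.0)   # hypothetical ground-truth box

targets = encode(P, G)
recovered = decode(P, targets)
print(targets)
print(recovered)
```

Decoding the targets reproduces the ground-truth box, which shows that a regressor predicting d_*(P) close to t_* moves the proposal onto the object; normalizing by the proposal size keeps the targets comparable across boxes of different scales.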

4.3

Convolutional model structure

The structure of the convolutional neural network designed in this section is shown in Fig. 4. It has 31 layers: the first layer is the input layer, followed by a series of convolutional and pooling layers. The classification vector generated by the network contains the probability that the object to be detected belongs to each class.

In the network structure, the convolution kernels are of size 3 × 3 with a stride of 1, and the ReLU function is used as the activation function. The pooling units are of size 2 × 2 with a stride of 2, and max pooling is used throughout. A fully connected layer is inserted between the convolutional and pooling layers to compress the image features; it is implemented as a convolution with a 1 × 1 kernel. The successive convolution-pooling structure extracts more low-level features from the original input image, and the multilayer structure gradually transforms the extracted feature maps into more abstract high-level features, which improves the ability to recognize samples. The final classification function is the softmax function shown in equation (34):

$P (i) = \frac{e^{θ_{i}^{'} x}}{\sum_{k = 1}^{K} e^{θ_{k}^{'} x}}$ (34)
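For reference, the softmax classification step can be sketched as follows. The input is taken to be the vector of class scores θ_k' x already computed by the network; subtracting the maximum score before exponentiating is a common numerical-stability trick that does not change the result. The function name and the example scores are illustrative.

```python
import math

def softmax(scores):
    """Convert class scores into probabilities that sum to 1."""
    shift = max(scores)  # for numerical stability; cancels in the ratio
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probs = softmax(scores)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

The class with the largest score receives the largest probability, and the probabilities can be read directly as the entries of the classification vector produced by the network.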

4.4

Experimental results and analysis

4.4.1

Data sets

To validate the foreign object intrusion recognition effect of the proposed algorithm, validation experiments are conducted on the PASCAL VOC dataset, which contains 20 types of detection targets: airplanes, bicycles, boats, buses, cars, motorcycles, trains, birds, cats, cows, dogs, horses, pedestrians, sheep, potted plants, sofas, bottles, chairs, tables, and TVs. The dataset is composed of the VOC2007 and VOC2012 datasets: the training set consists of the merged VOC2007 training set and VOC2012 dataset, and the test set is the VOC2007 test set.

4.4.2

Comparative analysis of models

In this section, three classical detection algorithms are used as comparisons in the control experiments: the two-stage detection algorithm Faster R-CNN (VGG16) and the single-stage detection algorithms SSD300 and YOLOv5s. Faster R-CNN uses VGG16 as its backbone, with an input image size of 600 × 600. SSD also uses VGG16 as its backbone, with an input image size of 300 × 300. YOLOv5s-c is used for the YOLOv5s benchmark, which balances speed and accuracy.

The validation performance of the different models on the dataset is shown in Fig. 5, which plots each model's mAP at the 0.5 threshold on the validation set over 300 training iterations. The CNN-based foreign object intrusion intelligent recognition method of this paper achieves a significantly higher average accuracy than the SSD, Faster R-CNN and YOLOv5s models, and the model converges to an average accuracy of 0.945.

The recognition performance of the different algorithms on the test set is shown in Figure 6. Relative to the other intelligent recognition methods, the CNN-based foreign object intrusion intelligent recognition method in this paper improves the average accuracy over all categories on the test set by 2.04% to 18.64% and the FPS value by 3.44 to 74.44, so both detection accuracy and speed are improved. This shows that the convolutional neural network model in this paper recognizes foreign object intrusion better than the other intelligent recognition methods.

4.4.3

Identification of different foreign objects

On the PASCAL VOC dataset, this paper conducts comparison experiments between the CNN-based foreign object intrusion intelligent recognition method and other classical detection algorithms to verify the effectiveness of the proposed algorithm. The recognition results for different foreign object intrusions are shown in Table 2. The proposed method performs well on the PASCAL VOC dataset: its accuracy is the highest among all the algorithms for 18 of the 20 object categories, and its mAP is 7.92, 6.98 and 5.19 percentage points higher than that of Faster R-CNN, SSD300 and YOLOv5s, respectively. The model size and test time of this paper's method are 72.24M and 26.01ms, which are the smallest model size and the shortest test time among the compared algorithms.

Table 2. Identification results of different foreign object intrusions

Algorithms	Faster R-CNN	SSD300	YOLOv5s	Our method
mAP (%)	80.55	81.49	83.28	88.47
Model size (M)	619.81	192.93	98.12	72.24
Test time (ms)	287.52	87.58	49.85	26.01
Airplane	76.13	70.35	78.99	84.55
Bicycle	81.72	70.14	78.51	85.95
Bird	84.14	87.96	80.87	86.78
Boat	82.69	80.72	90.22	93.46
Bottle	80.07	86.95	77.57	89.42
Bus	78.26	76.59	88.21	90.05
Car	86.55	87.49	89.77	92.67
Cat	72.91	81.89	82.73	85.14
Chair	81.45	78.58	80.49	84.43
Cow	83.31	80.47	82.26	85.34
Table	83.96	82.02	83.77	87.92
Dog	79.08	83.81	90.12	92.65
Horse	81.77	84.12	81.63	87.63
Motorcycle	86.59	83.24	82.37	88.81
Pedestrians	88.58	78.68	80.41	94.65
Potted plant	74.75	78.47	80.66	84.97
Sheep	83.93	92.64	87.32	90.81
Sofa	74.11	83.62	80.39	86.82
Train	77.84	80.59	86.74	90.44
TV	73.19	81.56	82.52	86.84

4.4.4

Identification of different environments

To simulate actual railroad scenes, six railroad videos are shot in different periods, and simulation experiments are conducted on the MATLAB 2016b platform to compare the CNN-based foreign object intrusion intelligent recognition method of this paper with other classical detection algorithms.

Experiment 1: Simulated foreign object intrusion video sequence 1, shot in a sunny summer scene, is used to detect foreign object intrusion with the different algorithms.

Experiment 2: Simulated foreign object intrusion video sequence 2, shot in a cloudy winter environment, is used for detection.

Experiment 3: To verify that the foreign object intrusion intelligent recognition method is robust to camera shake, the jitter produced by external influences on the camera is simulated, and randomly selected video sequence frames are used to judge the detection effect.

Experiment 4: Railroad video 3 is acquired for the detection of moving objects in a complex environment.

Experiment 5: Railroad video 4, acquired at another location, is used for the detection of multiple moving objects.

Experiment 6: To test the detection performance of the algorithm in a complex environment, railroad video 5 is collected at the same location for the detection of multiple moving objects in a complex environment.

The foreign object recognition accuracy rates of different methods are shown in Figure 7. In different experimental environments, the recognition accuracy rate of foreign object intrusion by the CNN-based intelligent recognition method in this paper is above 88%, which is an overall improvement of 3.36% to 12.40% over other target detection algorithms, and can be well adapted to the detection of moving objects in the complex railroad environment.

5

Conclusion

Foreign object intrusion detection has long been an important research topic in the field of transportation security. Based on a CNN, this paper proposes an intelligent recognition model of foreign object intrusion that combines moving object detection and intelligent recognition. Railroad traffic images are selected to train and test the recognition model and to explore its recognition effect on foreign object intrusion. The main conclusions of the research are as follows:

1) In the moving object detection experiment, the detection accuracy of the proposed detection method for pedestrian intrusion on railroads is 92.33%, which is 4.13 to 24.50 percentage points higher than that of the other four detection algorithms, i.e., the proposed moving object detection method detects moving targets better.

2) The CNN-based foreign object intrusion intelligent recognition method in this paper excels in both detection accuracy and speed. Its average accuracy and FPS values are improved over the comparison algorithms by 2.04%~18.64% and 3.44~74.44, respectively. The model recognizes different categories of objects more accurately, and its recognition accuracy for foreign object intrusion is greater than 88% in different complex environments.

3) The CNN-based foreign object intrusion intelligent recognition system designed in this paper successfully and efficiently realizes the intelligent recognition of foreign object intrusion in rail transportation scenes by using image processing and deep learning technology, and tests show that the system detects well under different environmental conditions. However, detection in some scenes still needs to be improved, and the dataset needs to be further enriched. Future work can therefore expand the original dataset to improve its richness.

Language: English

Publication frequency: once a year
Journal subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other


Jinlin Tan

Liang Wang

Xiaotian Yang

Yunfei Song

Weiming Wang

Xin Yu

Publication date: 19 Mar 2025

Received: 31 Oct 2024

Accepted: 06 Feb 2025

DOI: https://doi.org/10.2478/amns-2025-0400

Keywords: CNN, Moving object detection, Foreign object intrusion, Intelligent recognition, Traffic image

© 2025 Jinlin Tan et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
