Research on Traffic Flow Detection by Incorporating Improved Deep Learning Algorithms under Intelligent Transportation Construction
Published online: 17 Mar 2025
Received: 09 Oct 2024
Accepted: 03 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0310
© 2025 Tiancheng Ma, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
An Intelligent Transportation System (ITS) applies computer, communication, and intelligent technologies to the management and control of traffic in order to improve the efficiency and safety of traffic flow. Deep learning is a class of machine learning methods that loosely imitates the way neural networks in the human brain work and performs complex pattern recognition and data analysis. Deep learning algorithms have been widely used in intelligent transportation systems and play an important role in improving traffic efficiency, optimizing resource scheduling, intelligent driving, and so on [1-3].
The goal of intelligent traffic flow detection is to accurately estimate the vehicle flow on the road over a coming period. Past research was mainly based on traditional statistical and machine learning methods, such as linear regression and support vector machines. However, these methods often fail to capture complex traffic flow patterns, and their results are limited [4-5]. With the development of deep learning techniques, especially recurrent neural networks (RNN) and convolutional neural networks (CNN), intelligent traffic flow prediction has made a major breakthrough. An RNN processes sequential data and is therefore well suited to time-series data, while a CNN is suited to extracting and learning spatial features in traffic data. Combining the two makes full use of the spatio-temporal information in traffic data and improves the accuracy of traffic flow detection [6-8].
Literature [9] describes a vehicle detection algorithm based on the YOLOv3 model and proposes a real-time vehicle tracking counter that achieves traffic flow detection by combining vehicle detection and vehicle tracking algorithms. Experiments show that the model can effectively detect traffic flow on edge devices with high accuracy. Literature [10] shows that target detection in smart cities is an important way to avoid traffic congestion. It examines an end-to-end target detection paradigm for UAV images based on different deep learning methods, including primary and secondary detectors, aimed at detecting targets in congested traffic, and analyzes the evaluation in terms of cost reduction and design optimization. Literature [11] proposes advanced deep learning methods and multiple vehicle tracking algorithms, which are tested on several different input videos and two benchmark datasets; the results indicate that the method gives reasonably good tracking results. Literature [12] explores the issues that need to be addressed to achieve a seamless integration of ITS with deep learning, aiming at improving traffic flow, predicting the best routes for cargo transportation, intelligent awareness of environmental conditions, traffic speed management, and accident prevention. Literature [13] developed a target detection method based on deep learning techniques and proposed a vehicle recognition algorithm based on FE-CNN. The experimental results verified that FE-CNN effectively improves the recognition accuracy and the convergence speed of the model, and that the developed algorithm has a very high recognition efficiency in a traffic environment, demonstrating real-time and accurate detection capabilities.
Literature [14] presents a simple and practical multi-target detection method, YOLO, for detecting moving vehicles, and uses an improved Kalman filter algorithm to dynamically track the detected vehicles. The results show that the method is robust to vehicle occlusion and congested roads and that its accuracy is very high. Literature [15] examines the techniques used to develop new systems for solving traffic congestion problems, emphasizing that safer, greener, and more efficient roads can be achieved with the assistance of emerging intelligent transportation technologies. Literature [16] designed a deep learning approach for automatic detection and tracking of urban vehicles that fuses YOLOv4 with the detection-based multi-target tracking algorithm DeepSORT. Simulation experiments showed that the algorithm was able to achieve automatic detection and tracking of vehicles.
Literature [17] developed a deep learning architecture for predicting traffic flow that is capable of capturing nonlinear spatio-temporal effects. It was applied to two bursts of traffic flow, which verified the effectiveness of the architecture and demonstrated the accuracy of deep learning in short-term traffic prediction. Literature [18] reviewed the research results of deep learning in traffic flow prediction, covering the various deep learning models developed for the problem, the factors affecting these models, and their effectiveness under different conditions. Literature [19] constructed convolutional neural networks and used modern fast region-based convolutional neural networks to classify and detect objects. The whole image is used as input and feature class probability estimates as output in a study of cars, bicycles, and other classes. The results show that the system can significantly improve the detection accuracy of traffic flow. Literature [20] discusses fast region-based convolutional neural networks for vehicle detection and systematically introduces Fast R-CNN. The accuracy of the proposed method is verified by detection tests on the SHRP 2 NDS database provided by VTTI. Literature [21] proposes a method based on Faster R-CNN, evaluates it on thermal and RGB images from the FLIR ADAS dataset, and shows through experiments on thermal images that the method greatly improves accuracy and unambiguous detection relative to the traditional Faster R-CNN method. Literature [22] introduces an enhanced framework based on Faster R-CNN for vehicle recognition with better accuracy and faster processing time. A study on a customized dataset shows that the presented method performs better in both detection efficiency and processing time than the traditional Faster R-CNN model.
Literature [23] presents a vehicle detection model based on a convolutional neural network and optimizes the model by evaluating different parameter configurations. Experiments show that the model, trained in combination with Python and OpenCV, reaches an accuracy of more than 94% and a precision of more than 95%.
In this paper, deep learning algorithms are used to localize and recognize target vehicles in videos or images in order to detect traffic flow. First, the detection principle and network framework of the YOLOv5s algorithm are described in detail. Then, to address insufficient vehicle detection accuracy and insufficient tracking in complex scenes, this paper puts forward three improvements and names the improved algorithm YOLOv5s-ours: an attention mechanism module is added to the network framework, the multi-scale feature fusion stage is optimized, and the Head structure is reconstructed. Finally, the reliability of the improved YOLOv5s-ours algorithm is tested through real-time system tests and by tracking the trend of the loss during training. The actual performance of the algorithm before and after the improvement is also compared in complex scenarios.
The YOLO series are the main single-stage detection algorithms based on deep learning regression. Faster R-CNN is the dominant target detection method and has better accuracy, but it still does not reach real-time speed. The advent of end-to-end, regression-based detection algorithms such as YOLO has opened up the possibility of real-time applications. The regression approach obtains the coordinates and class probabilities of the bounding boxes directly from the pixels of the input image, i.e., the objects and their locations can be predicted by looking at the image only once. The main advantages are that the region proposal process is skipped entirely and that the structure is simple, consisting of a single network trained end-to-end, which greatly improves the speed of the model. In addition, the model can acquire global information about the image.
YOLO uses the entire topmost feature map to predict the location of the bounding box and the confidence level of multiple categories. YOLO is simple and clear: it divides the input image into
The complete network structure of YOLO is borrowed from GoogLeNet. The main difference is in the last two layers of the structure, where the convolutional layer is followed by a 4096-dimensional fully connected (FC) layer connected to an
Compared to traditional object detection methods, YOLO, a unified model, has many advantages. First, YOLO is fast. Because object detection is defined as a regression problem, there is no need to perform candidate region extraction; the regression directly predicts the location and class of the object. Running on a Titan X without batch processing, YOLO's base network reaches a speed of 45 frames per second, which shows that YOLO can meet real-time processing requirements. Second, unlike methods based on sliding windows and region proposals, YOLO uses the global information of the image during training and testing. YOLO therefore implicitly encodes the context and appearance information related to each class as well. Fast R-CNN is a promising detection method, but it tends to mistake background regions for objects, mainly because it cannot see the larger context. Compared to Fast R-CNN, YOLO produces fewer than half as many background false detections. Further, YOLO can learn a generalized representation of objects, i.e., it generalizes well. When tested on images that differ from those used for training, YOLO outperforms top detection methods such as DPM and R-CNN. However, in terms of accuracy, YOLO is not very competitive and still lags behind detection algorithms such as Fast R-CNN. Because of the absence of the region proposal mechanism, only the grid of
The YOLOv5 model is a release from Ultralytics that enables fast training and inference. The detection results of YOLOv5 are clearer than those of the previous YOLO algorithms and are optimized for issues such as overlapping bounding boxes. Among the four YOLOv5 models, YOLOv5s has gained the most favor among researchers for being fast and capable: its detection speed is about 2.5 times that of the other three, and it also gives good detection accuracy, including on small targets. Therefore, YOLOv5s is adopted as the target detection model in this study.
The YOLOv5 network architecture is depicted in Fig. 1. For convenience of study, the YOLOv5s target detection algorithm is divided into four modules: the input, the backbone network, the Neck network, and the output Head.

Fig. 1. YOLOv5 network architecture analysis
The backbone network module is used to extract feature maps. The backbone network of YOLOv5 uses the Focus structure, which is original to YOLOv5. The Focus structure performs a slicing operation on the input image: after the original image is fed into the Focus structure, a series of slicing operations outputs a feature map with reduced spatial size but increased channel dimension, with no loss of information. A convolution layer with 32 convolution kernels (filters) then outputs a feature map of 304 × 304 × 32 to complete the feature extraction.
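The slicing step can be illustrated with a short NumPy sketch. It assumes a 608 × 608 × 3 input (not stated in the text, but consistent with the 304 × 304 × 32 output after the convolution) and a channels-last layout; the function name `focus_slice` is hypothetical. The sketch covers only the slicing, not the convolution that follows.

```python
import numpy as np


def focus_slice(image: np.ndarray) -> np.ndarray:
    """Rearrange an (H, W, C) image into (H/2, W/2, 4C) by taking every second pixel."""
    h, w, _ = image.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Four interleaved sub-images, stacked along the channel axis.
    return np.concatenate(
        [
            image[0::2, 0::2, :],
            image[1::2, 0::2, :],
            image[0::2, 1::2, :],
            image[1::2, 1::2, :],
        ],
        axis=-1,
    )


# Assumed input size: 608 x 608 RGB.
image = np.arange(608 * 608 * 3, dtype=np.float32).reshape(608, 608, 3)
sliced = focus_slice(image)
print(sliced.shape)  # (304, 304, 12)
```

Every input pixel appears exactly once in the output, which is why the spatial size is halved and the channel count quadrupled without losing information.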
The Neck network bridges the feature information generated by the backbone network and further enhances the robustness of the extracted features to interference. YOLOv5 initially used mainly the feature pyramid network (FPN) module in its Neck network. FPN passes feature maps along a top-down path and integrates the information of all the feature layers, which enhances detection at different image scales and therefore allows objects of different sizes to be recognized. Finally, the Neck network of YOLOv5s combines the PAN structure with the CSP2 structure, which enables the model to learn more features.
The loss function in deep learning drives the iterative updating of the network: the model parameters are updated by computing the gradient of the loss value. YOLOv5 chooses CIoU_Loss as its loss function, which calculates the position loss between the predicted frame and the real frame so as to measure the gap from the ground truth (GT). In contrast, YOLOv5 uses CIoU_Loss as in equation (2):
The initial IoU_loss function has some shortcomings. When there is no overlap between the predicted frame and the GT frame, the IoU is zero regardless of how far apart they are, so the gap between the two cannot be measured and backpropagation is not possible. Moreover, if two different predicted frames produce the same IoU, the loss function does not distinguish between them. In view of this, the CIoU_Loss of YOLOv5 additionally takes into account the aspect ratio and the distance between the center points to solve this problem. Finally, the DIoU_NMS method is used to filter the prediction frames: the most suitable prediction frames are obtained by subjecting the many candidate prediction frames to non-maximum suppression (NMS).
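A minimal sketch of a CIoU loss follows, using the commonly published formulation (1 − IoU, plus the squared center distance normalized by the squared diagonal of the smallest enclosing box, plus a weighted aspect-ratio term). This is an assumption about the form of equation (2), which is not reproduced in the text; boxes are taken to be `(x1, y1, x2, y2)` tuples and the function name is hypothetical.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter + eps
    iou = inter / union

    # Squared distance between the two box centers.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v


# Two non-overlapping predictions: IoU is 0 for both, but CIoU still ranks them.
gt_box = (0.0, 0.0, 10.0, 10.0)
near = ciou_loss((20.0, 0.0, 30.0, 10.0), gt_box)
far = ciou_loss((50.0, 0.0, 60.0, 10.0), gt_box)
print(near < far)  # True
```

The example shows the property the paragraph describes: with no overlap, a plain IoU loss would be identical for both predictions, whereas the center-distance term still gives the closer box a smaller loss.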
In truly complex traffic scenarios, such as dense vehicles, dense vehicles in the dark, multi-lane roads, and foggy weather, where vehicles occlude each other and light, trees, and pedestrians interfere, YOLOv5 vehicle detection and the traffic flow analysis algorithm based on it are still deficient in accuracy, speed, and vehicle tracking. Therefore, an improved YOLOv5 algorithm is proposed in this paper and named YOLOv5-ours.
Although the YOLOv5 algorithm has added many modules to improve its performance, it still struggles to maintain a high detection speed while achieving sufficiently high accuracy, and there is still considerable room for improvement in small-scale target detection. Therefore, this paper proposes three improvements based on the YOLOv5 algorithm: (1) Add an attention mechanism module to the backbone network. (2) Optimize the multi-scale feature fusion stage. (3) Reconstruct the structure of the Head. These improvements aim to maintain the detection speed while improving the detection accuracy as much as possible. This section describes the implementation of the three improvements in detail.
In order to enhance the performance of YOLOv5 for vehicle detection, this paper introduces the CA attention mechanism, which is obtained by further improving on the basis of SENet. Considering the attention mechanism module as a computational unit, inputting any intermediate feature tensor
SENet as a whole can be divided into two steps, squeezing and excitation, for Squeeze on channel
The second step Excitation can be expressed by Equation (4):
From Eq. (4), the information about the dependence between channels can be obtained, which is the main role of the step Excitation. Where the symbol. Denotes the channel multiplication,
SENet can capture the relationships between channels, but location information is lost in the two-dimensional global pooling. CA is designed to let the model capture long-range dependencies while retaining accurate location information. Its implementation can likewise be divided into two steps: coordinate information embedding and coordinate attention generation.
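The difference between the two pooling schemes can be sketched in NumPy. The sketch assumes a channels-first `(C, H, W)` tensor and uses average pooling for both; the function names are hypothetical, and only the embedding step is shown, not the attention generation that follows it.

```python
import numpy as np


def squeeze_global(x: np.ndarray) -> np.ndarray:
    """SE-style squeeze: average over both spatial axes, (C, H, W) -> (C,)."""
    return x.mean(axis=(1, 2))


def coordinate_embedding(x: np.ndarray):
    """CA-style embedding: two 1-D poolings that keep one spatial axis each."""
    z_h = x.mean(axis=2)  # (C, H): one descriptor per row
    z_w = x.mean(axis=1)  # (C, W): one descriptor per column
    return z_h, z_w


# A single activation at row 1, column 4 of a 4 x 6 map.
x = np.zeros((1, 4, 6), dtype=np.float32)
x[0, 1, 4] = 1.0

z = squeeze_global(x)
z_h, z_w = coordinate_embedding(x)
print(z.shape, z_h.shape, z_w.shape)            # (1,) (1, 4) (1, 6)
print(int(z_h[0].argmax()), int(z_w[0].argmax()))  # 1 4
```

The global squeeze reduces the map to one number per channel, so the position of the activation is gone; the two directional descriptors together still identify its row and column.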
In the coordinate information embedding stage, CA splits Eq. (4) into two one-dimensional feature encoding operations, so the output of channel
Similarly, the output of channel
In the coordinate attention generation phase, the two results obtained from Eq. (5) and Eq. (6) in the first phase are concatenated and then processed by a 1×1 convolutional transform function
The feature fusion in the YOLOv5 algorithm is not effective at collecting feature information at multiple scales, and its structure is simple and inefficient. Therefore, this paper introduces the BiFPN structure into the YOLOv5s algorithm, aiming to improve the feature extraction ability of the model and to make fuller use of feature information at different scales. The FPN and PAN structures simply fuse the feature information of different scales in one direction or in both directions, whereas the bi-directional fusion module of the BiFPN structure is more complex and can be stacked repeatedly, so that the extracted feature tensor retains more feature information from different scales.
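As an illustration of how a BiFPN node can combine features from different scales, the sketch below implements the fast normalized fusion used in the original BiFPN design, where each input has a learnable non-negative weight. Whether the variant in this paper uses the same weighting is not stated, so this is an assumption; the inputs are assumed to have already been resized to a common shape, and the function name is hypothetical.

```python
import numpy as np


def fast_normalized_fusion(features, weights, eps=1e-4):
    """Weighted sum of same-shaped feature maps with weights normalized to sum to ~1."""
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # ReLU keeps weights non-negative
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, features))


# Two feature maps already resized to the same resolution (assumed).
top_down = np.full((8, 20, 20), 1.0)
lateral = np.full((8, 20, 20), 3.0)
fused = fast_normalized_fusion([top_down, lateral], weights=[1.0, 1.0])
print(fused.shape)  # (8, 20, 20)
```

With equal weights the result is close to the plain average of the inputs; during training the weights would let the network decide how much each scale contributes.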
In this paper, a Decoupled head replaces the YOLO head in the YOLOv5 algorithm, aiming to accelerate the convergence of the model and further improve vehicle detection performance by decoupling the computation. In the YOLO series of algorithms, the Neck module, with structures such as FPN and PAN, is very tightly coupled with the Head module, and the classification task and the regression task conflict with each other in target detection. The classification and regression in the Head module should therefore ideally be independent of each other rather than coupled together, which reduces the performance of the model. The Decoupled head is based on the idea of separating classification from regression in order to accelerate convergence and increase the overall performance of the model.
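The separation can be sketched with 1×1 convolutions written as matrix products in NumPy. The layout below (a shared stem, then one branch for class scores and one branch for box and objectness outputs) follows a commonly used decoupled-head design and is an assumption, as are the channel widths, the random weights, and all names; the four classes correspond to the vehicle types used later in the accuracy test.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution on a (C_in, H, W) map with a (C_out, C_in) weight matrix."""
    return np.einsum("oc,chw->ohw", weight, x)


def decoupled_head(x: np.ndarray, params: dict) -> dict:
    """Shared stem followed by independent classification and regression branches."""
    stem = np.maximum(conv1x1(x, params["stem"]), 0.0)
    cls_feat = np.maximum(conv1x1(stem, params["cls_branch"]), 0.0)
    reg_feat = np.maximum(conv1x1(stem, params["reg_branch"]), 0.0)
    return {
        "cls": conv1x1(cls_feat, params["cls_out"]),  # class scores
        "box": conv1x1(reg_feat, params["box_out"]),  # box offsets
        "obj": conv1x1(reg_feat, params["obj_out"]),  # objectness
    }


num_classes, c_in, c_mid = 4, 8, 16  # illustrative sizes only
params = {
    "stem": rng.normal(size=(c_mid, c_in)),
    "cls_branch": rng.normal(size=(c_mid, c_mid)),
    "reg_branch": rng.normal(size=(c_mid, c_mid)),
    "cls_out": rng.normal(size=(num_classes, c_mid)),
    "box_out": rng.normal(size=(4, c_mid)),
    "obj_out": rng.normal(size=(1, c_mid)),
}
x = rng.normal(size=(c_in, 20, 20))
out = decoupled_head(x, params)
print({name: value.shape for name, value in out.items()})
```

Because the class scores and the box outputs no longer share their last layers, gradients from one task do not directly pull on the parameters of the other branch.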
A system with practical application value must be rigorously tested for reliability before users will accept it. This subsection tests the reliability of the deep learning-based traffic flow detection system using two indicators. The first is the real-time performance of system detection, which determines whether the system can meet the requirements of real-time traffic flow detection and is worth deploying in practice. The second is the accuracy of the model when running on the system, which determines whether the system meets the accuracy requirements for deployment. The real-time test measures the time consumed by the processor from reading the image from the register to completing the traffic flow detection inference; the time difference is computed with functions from the sys/time.h header. In this paper, the real-time test runs traffic flow detection on a road vehicle monitoring video with a duration of 120 seconds and a frame rate of 30 FPS. When a target vehicle first collides with the yellow virtual coil, it is traveling in the upward direction and the upward vehicle counter is incremented by 1; when a target vehicle first collides with the cyan virtual coil, it is traveling in the downward direction and the downward vehicle counter is incremented by 1. In this way the traffic flow is displayed in real time.
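The counting rule can be sketched as follows. The sketch assumes that each tracked vehicle is represented by the sequence of its bounding-box center points and that each virtual coil is an axis-aligned rectangle in image coordinates; the coil positions, the track data, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coil:
    """Axis-aligned virtual coil in image coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, point) -> bool:
        x, y = point
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def count_directions(tracks, yellow: Coil, cyan: Coil):
    """Count each track once, by the first coil its center point touches."""
    up = down = 0
    for centers in tracks.values():
        for point in centers:
            if yellow.contains(point):
                up += 1
                break
            if cyan.contains(point):
                down += 1
                break
    return up, down


yellow = Coil(0, 200, 640, 220)
cyan = Coil(0, 100, 640, 120)
tracks = {
    1: [(100, 400), (100, 300), (100, 210), (100, 100)],  # reaches yellow first
    2: [(300, 50), (300, 105), (300, 220)],               # reaches cyan first
    3: [(500, 10), (500, 20)],                            # touches neither coil
}
print(count_directions(tracks, yellow, cyan))  # (1, 1)
```

Stopping at the first coil touched is what prevents a vehicle that later crosses the other coil from being counted twice.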
The results of the system real-time test are shown in Figure 2. To verify the effectiveness of the improved detection algorithm running on the system, the original YOLOv5s algorithm and the improved YOLOv5s_Ours algorithm are each used as the detector of the traffic flow detection system for experimental comparison. The experiment uses frame-skipping detection, with the program set to run detection once every 10 frames, so as to improve the efficiency of traffic flow detection. Functions from sys/time.h are used in the detection system to calculate the time consumed from reading the video from the registers to completing the traffic flow detection inference. It can be seen that YOLOv5s_Ours takes less time than YOLOv5s, with times of up to 75 ms.
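The frame-skipping and timing procedure can be sketched as below. The paper's system uses sys/time.h on the embedded platform; here Python's `time.perf_counter` stands in for it, and the detector is a placeholder callable, so the measured values are not meaningful in themselves. The function name is hypothetical.

```python
import time


def run_with_frame_skipping(num_frames, detect, interval=10):
    """Run `detect` on every `interval`-th frame and record each latency in milliseconds."""
    latencies_ms = []
    for frame_index in range(0, num_frames, interval):
        start = time.perf_counter()
        detect(frame_index)  # placeholder for read + inference
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


total_frames = 120 * 30  # 120 s of video at 30 FPS
latencies = run_with_frame_skipping(total_frames, detect=lambda frame_index: None)
print(len(latencies))  # 360
```

With a 120-second video at 30 FPS and detection every 10 frames, inference runs on 360 frames in total.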

Fig. 2. Real-time test results of the system
To visualize and analyze the loss trend of the networks before and after the improvement, the loss curves of the YOLOv5-ours and YOLOv5 networks during training are extracted separately, as shown in Fig. 3. The training loss curve of YOLOv5 is always above that of YOLOv5-ours, which indicates that the loss of YOLOv5 during training is always higher and that YOLOv5-ours converges faster than YOLOv5. The loss of YOLOv5 decreases rapidly until about 10 epochs and starts to level off from about 170 epochs, with a final loss value of 0.623. The loss of YOLOv5-ours decreases rapidly until about 15 epochs. A sudden change occurs at about 52 epochs, which is due to a change in the learning rate made to prevent the network from overfitting. From the 70th epoch onwards, the loss gradually stabilizes, and the final loss value reaches 0.244. The smaller the loss value, the closer the detection box is to the labeled box and the more accurate the detection, which indicates the effectiveness of the improved algorithm.

Fig. 3. Loss variation before and after modification
As can be seen from the comparison experiment, when the traffic flow system uses YOLOv5s_Ours as the vehicle detector, inference is run on 360 frames on the RV1126 platform, and the average time for one pass, from reading the image from the register to completing the traffic flow detection inference, is about 66.9 ms. With YOLOv5s as the vehicle detector, the average time is about 78.4 ms, so its inference is noticeably slower than that of YOLOv5s_Ours. The vehicle counts detected in the road vehicle monitoring video tests and the results of the accuracy test of the traffic flow detection system are shown in Table 1. The accuracy of the system based on YOLOv5s_Ours is 6 percentage points higher than that of the system based on YOLOv5s. In the accuracy test of the system based on YOLOv5s_Ours, cars are the most numerous vehicle type, with a detection accuracy of 99.11%; the accuracy for buses and other vehicle types is 100%, the accuracy for vans is 96%, and the overall detection accuracy of the traffic flow detection system is 98.67%, which basically meets the accuracy requirements of traffic flow detection. The real-time and accuracy tests above show that the system performs well on all tested indicators and can meet the performance requirements for deployment in real application scenarios.
Table 1. Accuracy test of the traffic flow detection system
| Detector | Vehicle type | Actual quantity (vehicles) | System detection count (vehicles) | Detection accuracy |
|---|---|---|---|---|
| Yolov5s | Car | 112 | 110 | 98.21% |
| Yolov5s | Bus | 8 | 5 | 62.50% |
| Yolov5s | Van | 25 | 20 | 80.00% |
| Yolov5s | Other | 5 | 4 | 80.00% |
| Yolov5s | Total amount | 150 | 139 | 92.67% |
| rep_yolov5s_ours | Car | 112 | 111 | 99.11% |
| rep_yolov5s_ours | Bus | 8 | 8 | 100.00% |
| rep_yolov5s_ours | Van | 25 | 24 | 96.00% |
| rep_yolov5s_ours | Other | 5 | 5 | 100.00% |
| rep_yolov5s_ours | Total amount | 150 | 148 | 98.67% |
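The accuracy figures in the table can be reproduced from the counts with a few lines of Python. The sketch assumes that accuracy is defined as the detected count divided by the actual count, which matches every row of the table but does not separate missed vehicles from false detections; the dictionary layout and function names are hypothetical.

```python
# (actual, detected) vehicle counts per type, taken from the accuracy table.
counts = {
    "YOLOv5s": {"Car": (112, 110), "Bus": (8, 5), "Van": (25, 20), "Other": (5, 4)},
    "YOLOv5s_Ours": {"Car": (112, 111), "Bus": (8, 8), "Van": (25, 24), "Other": (5, 5)},
}


def accuracy(actual: int, detected: int) -> float:
    """Detection accuracy in percent, assumed to be detected / actual."""
    return 100.0 * detected / actual


def overall(per_type: dict) -> float:
    """Overall accuracy from the summed actual and detected counts."""
    actual = sum(a for a, _ in per_type.values())
    detected = sum(d for _, d in per_type.values())
    return accuracy(actual, detected)


overall_acc = {name: round(overall(per_type), 2) for name, per_type in counts.items()}
print(overall_acc)  # {'YOLOv5s': 92.67, 'YOLOv5s_Ours': 98.67}
```

The difference between the two overall values is 6 percentage points, in line with the comparison stated in the text.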
Two sets of experiments are designed to test the performance of the proposed vehicle flow detection algorithm, which is based on background prediction and group normalization, in dense environments. Experiment 1 evaluates the effectiveness of the algorithm in a dense vehicle environment during the daytime, while Experiment 2 evaluates it in a dense vehicle environment at night. Vehicles of different scales and mutual occlusion between vehicles are present in both tests. Traffic surveillance videos with different weather conditions and different shooting angles were selected, totaling 10 videos with a video length of 15 minutes. First, target vehicle detection is performed on each video, and the vehicle category and vehicle detection frame are marked. The detection results are shown in Fig. 4. The experimental results show that the vehicle detection method performs well in videos with different weather conditions and different shooting angles, and the traffic flow detection accuracy exceeds 95%. This indicates that the traffic flow detection algorithm based on background prediction and group normalization can reduce the traffic flow computation errors caused by missed and false detections of vehicles in dense environments and improve the detection accuracy of the traffic flow.

Fig. 4. Specific data
For the nighttime experiment, traffic surveillance videos with different shooting angles are selected, totaling 10 videos with a duration of 10 minutes. First, target vehicle detection is carried out for each video, and the vehicle category and vehicle detection frame are labeled. The specific detection data are shown in Figure 5. The experimental results show that the vehicle detection method performs well in videos with different shooting angles under nighttime conditions, and the traffic flow detection accuracy exceeds 90%. This indicates that the traffic flow detection algorithm can reduce the traffic flow calculation errors caused by missed and false detections of vehicles in dense nighttime environments and improve the detection accuracy of nighttime traffic flow.

Fig. 5. Effect assessment
In this subsection, the accuracy of the traffic counting scheme designed in this paper is verified in different traffic environments. The road traffic videos used to validate the traffic counting scheme were all shot manually from a viaduct at the angle of a simulated surveillance camera. A total of four videos were shot, namely three lanes in a daytime environment, five lanes in a daytime environment, three lanes in a nighttime environment, and five lanes in a nighttime environment, and each of the four videos is fifteen minutes long. The experiment uses the YOLOv5s_Ours algorithm as the detector. To verify the accuracy of the statistics, the true values are counted manually and the experimental results are compared with them. To verify the effectiveness of the improved algorithm, the YOLOv5 algorithm was also used in comparative experiments. The results of the traffic flow statistics experiment in the daytime environment are shown in Table 2. As can be seen from the experimental results, the traffic counting scheme designed in this paper achieves an accuracy of 94.88% in the three-lane environment during the daytime, which is 3.86 percentage points higher than the traffic counting scheme using the original algorithm. Since there are not many vehicles in this road condition and traffic is smooth, the accuracy of both methods is high, but the scheme of this paper still improves on it. In the daytime five-lane environment, the gap between the two schemes is large: the accuracy of the scheme of this paper is 94.10%, an improvement of 8.78 percentage points over the traffic flow statistics method using the original algorithm. This is due to the complexity of the five-lane roadway and the high number of vehicles, which are prone to congestion and mutual occlusion.
Table 2. Experimental results of multi-lane traffic flow
| Statistical plan | Experimental environment | Experimental results | True result | Accuracy rate |
|---|---|---|---|---|
| Yolov5s | Three-lane | 400 | 430 | 91.02% |
| Yolov5s | Five-lane | 593 | 695 | 85.32% |
| Yolov5s_ours | Three-lane | 408 | 430 | 94.88% |
| Yolov5s_ours | Five-lane | 654 | 695 | 94.10% |
This paper improves and optimizes the vehicle detection algorithm for vehicle flow detection based on YOLOv5. The accuracy of YOLOv5-ours for vehicle detection and traffic flow detection in complex environments is explored through comparative experiments. The experimental results show that:
YOLOv5-ours consumes less time than YOLOv5 from reading the video to completing traffic flow detection in the comparison experiments on a section of road vehicle monitoring video. The final training loss of YOLOv5-ours is 0.244, smaller than the 0.623 of YOLOv5, which indicates the effectiveness of the YOLOv5-ours algorithm.
The improved vehicle detection algorithm detects more accurately in various complex scenarios and can meet real-time detection needs.
