Vehicle Target Detection in Rainy and Foggy Scenes Based on Generative Adversarial Networks and Dynamic Fuzzy Compensation Techniques
Published online: 29 Sept. 2025
Received: 14 Jan. 2025
Accepted: 22 Apr. 2025
DOI: https://doi.org/10.2478/amns-2025-1096
Keywords
© 2025 Tao Dong et al., published by Sciendo.
This work is licensed under the Creative Commons Attribution 4.0 International License.
Weather is also an important factor in traffic accidents, and bad weather is a significant causative factor [1]. In typical bad weather such as fog and rain, low visibility and insufficient sight distance prevent the driver from clearly identifying the direction of travel, and the control performance and stability of the vehicle are also affected [2-3]. Compared with accidents on sunny days, traffic accidents caused by bad weather have more serious consequences, and the casualty rate increases significantly, to about 10 times that under normal weather conditions [4].
How to reduce the incidence of traffic accidents and improve the safety of road transportation is a global problem that deserves the common attention of mankind, and a great challenge faced by researchers, scholars and practitioners around the world [5-6]. To reduce traffic accidents and improve traffic efficiency, automobiles are gradually developing toward intelligence, lightweight design, network connectivity, electrification and shared travel modes [7]. With the development of intelligent transportation and smart cities, intelligent driving cars and intelligent driver assistance systems have attracted great attention [8]. As complex systems integrating perception, cognition, planning and control, intelligent driving cars benefit from the increasing maturity of machine vision, sensor technology, artificial intelligence, automatic control and other related technologies [9-11]. With good application prospects and a broad potential market, intelligent driving cars have received financial investment and research and development support from many countries and have achieved numerous technological breakthroughs [12-13].
Environment sensing technology for road traffic is a crucial link for intelligent vehicles to realize intelligence, and a basic guarantee for the safety and intelligence of intelligent transportation [14]. Environment sensing is equivalent to the "eyes" of an intelligent vehicle, which perceives the surrounding environment through visual sensors while driving; the most commonly used visual sensor is the camera [15-16]. Compared with active ranging sensors, cameras provide dense pixel information about the surrounding scene at relatively low cost, improving the accuracy of perceiving object categories and shapes, and are widely used in image processing and target detection [17-18]. Vehicles are among the main elements of the road traffic scene, and vehicle detection refers to the accurate identification and localization of vehicle targets from the acquired visual data [19]. Vehicle detection algorithms are important components of visual perception systems and are widely used in video surveillance, intelligent transportation and other fields [20]. Target detection is an important branch of image processing and computer vision and one of the most important, indispensable tasks in automated driving, because detecting targets quickly and accurately not only supports precise navigation but also helps deal with potential hazards in complex driving environments. It is also a prerequisite for other processing in ITS, such as vehicle monitoring, vehicle type recognition, traffic flow statistics and license plate recognition [21-22].
In recent years, a large number of detection models based on deep convolutional neural networks have been introduced to improve target detection performance [23]. Although these detectors achieve good accuracy under good weather conditions, bad weather such as rain, snow, fog, sand and dust is common in real life, and the images acquired under these conditions are degraded and blurred, making it difficult to extract effective features; the detection accuracy of these detectors therefore drops sharply under such conditions. It is estimated that almost 30% of all traffic accidents occur under rain and fog, a rate roughly 70% higher than under clear weather, because rain, snow and fog interfere with the driver's vision and lead to misjudgment of road information [24-27]. Rain and fog have therefore become important factors affecting road safety, and since such conditions cannot be eliminated artificially, improving the accuracy of vehicle target detection in rain and fog is extremely important for intelligent transportation management [28].
A dynamic blurred image processing method based on the Wiener filter and a generative adversarial network is first proposed. The Wiener deblurring algorithm removes noise by minimizing the mean square error. A generative adversarial network (GAN) model that is not constrained by a predefined data distribution is then considered, and a UNIT-based de-fogging and de-raining algorithm is proposed, which introduces an encoder-decoder architecture to capture more useful information. The loss function is further modified with a new formulation to generate realistic and clear images. On this basis, an image de-raining assisted, locally perception-enhanced vehicle detection model is constructed, with Swin Transformer chosen as the design prototype of the backbone network; the whole network is divided into two parts, image de-raining/de-fogging and target detection. Finally, the performance of the proposed method is verified by experiments on both simulated data and real scenes.
Filter-based deblurring is the estimation of a desirable clear image $f$ from a degraded observation $g = h * f + n$, where $h$ is the blur kernel and $n$ is additive noise.

For a pure denoising process of a noisy image unaffected by blurring, linear filtering can be considered a natural tool for noise suppression by convolution; for deblurring, it can be considered an attempt to remove the effect of one convolution operation by another convolution operation [29]. For example, without considering noise, the degradation can be expressed in the Fourier frequency domain, where the Fourier transforms of $g$, $h$ and $f$ are denoted $G(u,v)$, $H(u,v)$ and $F(u,v)$:

$$G(u,v) = H(u,v)\,F(u,v)$$

From the above equation, a direct inverse-filter estimate $\hat{F}(u,v) = G(u,v)/H(u,v)$ can be obtained, but it becomes unstable wherever $H(u,v)$ approaches zero, which typically happens in the high-frequency part of the spectrum.

In order to solve this instability in the recovery process, the above equation is improved as:

$$\hat{F}(u,v) = \frac{H^{*}(u,v)}{|H(u,v)|^{2} + \varepsilon}\,G(u,v)$$

where $*$ denotes the conjugate of a complex number, and the instability of the denominator in the high-frequency part is regularized by adding a positive factor $\varepsilon$.

Assuming that the estimate is recorded as $\hat{f}$, the recovery can be written as a convolution with a recovery filter $w$, $\hat{f} = w * g$, or in the Fourier frequency domain:

$$\hat{F}(u,v) = W(u,v)\,G(u,v)$$

In the low-frequency part of the spectrum, where $|H(u,v)|$ is large, the regularized filter behaves like the exact inverse filter, while in the high-frequency part the positive factor $\varepsilon$ prevents excessive amplification of noise.

Since image noise affects deblurring, it is particularly important to select an optimal regularization factor, for which Wiener's minimum mean square error criterion is used in the original implementation.
A generative adversarial network (GAN) contains a generator and a discriminator. The generator produces samples that "look like real samples" from an input noise signal, and the discriminator is used to distinguish the generated samples from real samples [30]. Taking photo generation as an example, throughout the training process the generator tries to generate realistic photos to deceive the discriminator, while the discriminator tries to distinguish real photos from generated ones. This realizes a dynamic "game process".

As a recent form of machine learning, generative adversarial networks offer high-resolution, sharp output images and impose few restrictions on the form of the generator and discriminator compared with general neural networks. Compared with other generative models, GANs no longer require a predefined data distribution and thus enjoy maximum freedom of fit.
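To make the adversarial "game process" concrete, the following is a minimal PyTorch sketch of one training step for a generic generator/discriminator pair; the toy fully connected networks, the noise dimension and the optimizer settings are illustrative assumptions, not the configuration used in this paper.

```python
import torch
import torch.nn as nn

# Illustrative toy networks; the paper's generator and discriminator are convolutional.
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real):                      # real: (B, 784) batch of real samples
    b = real.size(0)
    z = torch.randn(b, 100)

    # Discriminator: label real samples 1 and generated samples 0.
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(b, 1)) + bce(D(G(z).detach()), torch.zeros(b, 1))
    loss_d.backward()
    opt_d.step()

    # Generator: try to make the discriminator output 1 on generated samples.
    opt_g.zero_grad()
    loss_g = bce(D(G(z)), torch.ones(b, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```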
The Wiener filter assumes that the deblurring estimate is obtained by linearly filtering the observed image, $\hat{f} = w * g$, and seeks the filter $w$ that reaches the smallest mean square error, i.e. the optimal filter:

$$E\!\left[(f - \hat{f})^{2}\right] \rightarrow \min$$

which generates the orthogonality condition:

$$E\!\left[(f - w * g)\,g\right] = 0$$

This condition can be transformed as per the correlation functions:

$$R_{fg} = w * R_{gg}$$

This leads to the explicit form of the optimal Wiener filter in the frequency domain:

$$W(u,v) = \frac{S_{fg}(u,v)}{S_{gg}(u,v)}$$

For the blurred image model $g = h * f + n$ with noise independent of the image, the filter becomes

$$W(u,v) = \frac{H^{*}(u,v)}{|H(u,v)|^{2} + S_{n}(u,v)/S_{f}(u,v)}$$

where the regularization factor $\varepsilon = S_{n}(u,v)/S_{f}(u,v)$ is the noise-to-signal power ratio, so the Wiener filter supplies the optimal regularization in the minimum mean square error sense.
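As an illustration of the filtering described above, the following NumPy sketch applies the Wiener filter in the Fourier domain. The Gaussian blur kernel and the constant noise-to-signal ratio `K` are illustrative assumptions; in practice the kernel and the power spectra would come from the estimation procedure, not from fixed constants.

```python
import numpy as np

def wiener_deblur(blurred, kernel, K=0.01):
    """Frequency-domain Wiener deblurring.

    blurred : 2-D grayscale image degraded by `kernel` plus noise.
    kernel  : 2-D point spread function used for the blur.
    K       : assumed noise-to-signal power ratio S_n/S_f (scalar approximation).
    """
    # Pad the kernel to the image size and transform both to the frequency domain.
    H = np.fft.fft2(kernel, s=blurred.shape)
    G = np.fft.fft2(blurred)

    # W = H* / (|H|^2 + K), the explicit Wiener filter derived above.
    W = np.conj(H) / (np.abs(H) ** 2 + K)
    F_hat = W * G

    # Back to the spatial domain; padding the kernel at the origin causes a circular shift.
    return np.real(np.fft.ifft2(F_hat))

def gaussian_kernel(size=15, sigma=3.0):
    """Illustrative Gaussian point spread function for testing the filter."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()
```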
The UNIT structure is schematically shown in Fig. 1. The UNIT network can be formally viewed as a combination of two VAE/GAN models. The network consists of three main parts: the encoders, the generators (decoders) and the discriminators, with one encoder-generator pair (a variational autoencoder) and one discriminator assigned to each image domain.

Structure of the UNIT
As shown in (a), there are two picture domains of different styles in UNIT, the source domain and the target domain, which are assumed to share a common latent space, so that a pair of corresponding images from the two domains can be mapped to the same latent code.

As shown in (b), each domain has its own encoder, generator and discriminator: an image is encoded into the shared latent space and can then be decoded back into its own domain (reconstruction) or into the other domain (translation), while the discriminator of each domain judges whether the generated images belong to that domain.
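A schematic PyTorch skeleton of this shared-latent-space idea is sketched below, assuming simple convolutional encoders and decoders; the layer sizes, channel counts and module names are illustrative and do not reproduce the paper's exact architecture.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an image from either domain into the shared latent space."""
    def __init__(self, ch=64, z_ch=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, z_ch, 4, 2, 1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Generates an image of one specific domain from a shared latent code."""
    def __init__(self, ch=64, z_ch=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_ch, ch, 4, 2, 1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, 3, 4, 2, 1), nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z)

# One encoder/decoder pair per domain; translation = encode in A, decode in B.
enc_rainy, enc_clear = Encoder(), Encoder()
dec_rainy, dec_clear = Decoder(), Decoder()

rainy = torch.randn(1, 3, 64, 64)          # placeholder rainy/foggy image
z = enc_rainy(rainy)                        # shared latent code
restored = dec_clear(z)                     # translated into the clear domain
reconstructed = dec_rainy(z)                # reconstruction within the source domain
```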
Network structure
In this section, the proposed VAE-CoGAN de-fogging and de-raining model can better handle images in foggy and rainy scenes without pre-classifying them. VAE-CoGAN consists of three parts: an encoder, a generative model and a discriminative model. The encoder converts the input picture into a latent vector that serves as the input of the generative model in the GAN. For a blurred input picture, the encoder produces a latent encoding; the generative model is responsible for converting this latent encoding into a clear output image, and the discriminative model judges whether the generated image is realistic.
Loss function
The objective function of this network consists of four components: the VAE loss, the GAN loss, the cycle-consistency loss and the VGG perceptual loss. The objective function is as follows:

$$\mathcal{L} = \mathcal{L}_{VAE} + \mathcal{L}_{GAN} + \mathcal{L}_{CC} + \mathcal{L}_{VGG}$$
The VAE loss aims to minimize an objective consisting of two components, regularization and reconstruction error:

$$\mathcal{L}_{VAE_{i}}(E_{i}, G_{i}) = \mathrm{KL}\!\left(q_{i}(z_{i} \mid x_{i}) \,\|\, p(z)\right) - \mathbb{E}_{z_{i} \sim q_{i}(z_{i} \mid x_{i})}\!\left[\log p_{G_{i}}(x_{i} \mid z_{i})\right], \qquad i = 1, 2$$

where $q_{i}(z_{i} \mid x_{i})$ is the approximate posterior produced by encoder $E_{i}$, $p(z)$ is the Gaussian prior over the shared latent space, and $p_{G_{i}}(x_{i} \mid z_{i})$ is the likelihood of reconstructing $x_{i}$ from the latent code through generator $G_{i}$.

Regularization provides a simple way to sample from the latent space, and minimizing the negative log-likelihood term in the reconstruction error is equivalent to minimizing the absolute distance between the image and the reconstructed image. Two variational autoencoders are used, one for each image domain, where the encoder-generator pairs of the two domains share the same latent space.
In this model, the GAN loss function is used to ensure that the generated images are as similar as possible to the images in the target domain:

$$\mathcal{L}_{GAN_{i}}(E_{j}, G_{i}, D_{i}) = \mathbb{E}_{x_{i} \sim p_{X_{i}}}\!\left[\log D_{i}(x_{i})\right] + \mathbb{E}_{z_{j} \sim q_{j}(z_{j} \mid x_{j})}\!\left[\log\!\left(1 - D_{i}\!\left(G_{i}(z_{j})\right)\right)\right]$$

Pictures translated into domain $X_{i}$ through generator $G_{i}$ are judged by the discriminator $D_{i}$ against real images of that domain, so that the translated images gradually become indistinguishable from the target-domain images.
The formula for the VGG perceptual loss is:

$$\mathcal{L}_{VGG} = \left\| \phi(x) - \phi(\hat{x}) \right\|_{2}^{2}$$

where $\phi(\cdot)$ denotes the feature maps extracted by a fixed, pre-trained VGG network, $x$ is the clear reference image and $\hat{x}$ is the generated image; minimizing this distance encourages the generated image to be perceptually realistic rather than merely close in pixel values.
The final goal in this network is the joint minimax problem over all components:

$$\min_{E_{1}, E_{2}, G_{1}, G_{2}} \; \max_{D_{1}, D_{2}} \; \mathcal{L}_{VAE} + \mathcal{L}_{GAN} + \mathcal{L}_{CC} + \mathcal{L}_{VGG}$$

Here the encoders and generators are trained to minimize the objective, while the discriminators are trained to maximize the adversarial terms.
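The way the four terms combine on the generator side can be sketched in PyTorch as follows. This is a minimal sketch only: the loss weights, the L1/L2 choices for the reconstruction, cycle and perceptual terms, and the helper names are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def vae_loss(mu, logvar, x, x_rec):
    """KL regularization toward N(0, I) plus L1 reconstruction error."""
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    rec = F.l1_loss(x_rec, x)
    return kl + rec

def gan_loss_g(d_fake):
    """Generator side: push discriminator logits on generated images toward 'real'."""
    return F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))

def perceptual_loss(feat_fake, feat_real):
    """VGG perceptual loss: distance between fixed VGG feature maps."""
    return F.mse_loss(feat_fake, feat_real)

def total_objective(mu, logvar, x, x_rec, d_fake, feat_fake, feat_real, x_cyc,
                    w_vae=1.0, w_gan=1.0, w_cc=10.0, w_vgg=1.0):
    # Cycle consistency: translating to the other domain and back should recover x.
    cc = F.l1_loss(x_cyc, x)
    # Illustrative weights only; the paper's weighting is not specified here.
    return (w_vae * vae_loss(mu, logvar, x, x_rec)
            + w_gan * gan_loss_g(d_fake)
            + w_cc * cc
            + w_vgg * perceptual_loss(feat_fake, feat_real))
```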
This section focuses on the network structure of the vehicle detection part and proposes a detection framework with local perceptual enhancement, in contrast to the most commonly used CNN-based detection algorithms. Unlike the detection backbone used for foggy images, the model feeds the features extracted from the de-raining part into the locally perception-enhanced Transformer backbone; the vehicle detection network is shown in Fig. 2. Similar to Swin Transformer, the four stages contain 2, 2, 6 and 2 blocks, respectively.

Vehicle detection network
First, the rainy-day input image is divided into non-overlapping patches and then processed stage by stage by the hierarchical backbone.
The positional coding in a plain Transformer easily fails to capture local correlation and structural information in the image, so Swin Transformer uses a window-based hierarchical structure to address the scaling problem and high computational complexity of high-resolution images. Each Swin Transformer block consists of a normalization layer, a multi-head self-attention module, residual connections, and a multilayer perceptron (MLP) with two fully connected layers and GELU nonlinearity. The window-based multi-head self-attention (W-MSA) module and the shifted-window multi-head self-attention (SW-MSA) module are applied in two consecutive Transformer blocks, respectively.
Although Swin Transformer builds a hierarchical Transformer and performs attention within each non-overlapping window, it is limited in its ability to encode contextual information. To enhance the network's learning of local correlation and structural information, this paper proposes a locally perception-enhanced Transformer, in which each block is composed of two consecutive improved Transformer blocks.
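The window partitioning that W-MSA and SW-MSA operate on can be sketched as follows; this is a generic illustration of the mechanism, not the paper's implementation, and the 7×7 window size, shift of 3 and feature-map dimensions are example values.

```python
import torch

def window_partition(x, window_size):
    """Split a feature map (B, H, W, C) into non-overlapping windows.

    Returns a tensor of shape (num_windows * B, window_size, window_size, C),
    the layout on which window-based multi-head self-attention (W-MSA) operates.
    """
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

def shift_windows(x, shift):
    """Cyclically shift the feature map so that SW-MSA windows straddle
    the boundaries of the previous block's windows."""
    return torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))

# Example: 56x56 feature map with 96 channels, 7x7 windows, shift of 3.
feat = torch.randn(1, 56, 56, 96)
wins = window_partition(feat, 7)                        # (64, 7, 7, 96)
shifted_wins = window_partition(shift_windows(feat, 3), 7)
```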
The standard convolution kernel dilation process can be viewed as inserting spacing between the values of the convolution kernel during data processing. Dilated convolution introduces a hyperparameter, the dilation rate $r$, which enlarges the effective kernel size to

$$k' = k + (k - 1)(r - 1)$$

where $k$ is the original kernel size and $r$ is the dilation rate; $r = 1$ recovers the standard convolution.

Further, after the introduction of dilated convolution, the size of the corresponding output feature map is calculated as:

$$o = \left\lfloor \frac{i + 2p - k'}{s} \right\rfloor + 1$$

where $i$ is the input feature-map size, $p$ is the padding, $s$ is the stride and $k'$ is the effective kernel size defined above.
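The two formulas above can be checked numerically against PyTorch's dilated convolution; the channel counts and feature-map size below are arbitrary example values.

```python
import torch
import torch.nn as nn

def effective_kernel(k, r):
    """Effective kernel size of a dilated convolution: k' = k + (k - 1)(r - 1)."""
    return k + (k - 1) * (r - 1)

def output_size(i, k, r, p, s):
    """Output feature-map size: floor((i + 2p - k') / s) + 1."""
    return (i + 2 * p - effective_kernel(k, r)) // s + 1

# A 3x3 kernel with dilation 2 has an effective 5x5 receptive field.
x = torch.randn(1, 16, 64, 64)
conv = nn.Conv2d(16, 16, kernel_size=3, dilation=2, padding=2, stride=1)
print(conv(x).shape[-1], output_size(64, 3, 2, 2, 1))   # both print 64
```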
Comparison methods
The algorithm in this paper is compared with three different classes of rain removal methods: (1) one raindrop removal method, AttentGAN; (2) six rain streak removal methods, DetailNet, RESCAN, PReNet, JORDER-E, RCDNet and RLNet; and (3) two robust rain removal methods, Pix2pix and CCN. In addition to the algorithm in this paper, the RadNet algorithm is also tested.
Datasets
Four datasets are selected for training and testing: Rain200H, Rain200L, RainDrop and RainDS. RainDS contains three synthetic subsets (RS_syn, RD_syn and RDS_syn) and three real subsets (RS_real, RD_real and RDS_real). Based on these datasets, three data strategies are designed to test the robustness of each rain removal method: (1) the single-type data strategy, in which each image contains only one type of rain degradation; the eligible datasets include the single rain streak datasets (Rain200H, Rain200L, RS_syn and RS_real) and the single raindrop datasets (RainDrop, RD_syn and RD_real); (2) the stacked data strategy, in which each image of a single dataset contains both rain degradation types; the eligible datasets are RDS_syn and RDS_real; (3) the hybrid data strategy, in which multiple datasets are mixed so that each image may contain one or several degradation types; three hybrid datasets are constructed: Blended-1 = {RD_syn + RS_syn + RDS_syn}, Blended-2 = {RD_real + RS_real + RDS_real} and Blended-3 = {Rain200H + Rain200L + RainDrop}. In addition, real-scene data were collected from the Internet and previous work to construct real benchmarks for examining the de-raining ability of each method in real scenes. Peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are used to evaluate the paired data; for unlabeled data, comparisons are based on visual results.
Training details
The algorithms in this paper were trained on two NVIDIA GeForce RTX 3080 GPUs with 24 GB of memory using the PyTorch deep learning framework in a Python environment. Adam was chosen as the optimizer, with weight decay and momentum set to 0.0001 and 0.9, respectively. The initial learning rate is set to 1e-3 for the RAM and DRM modules and 1e-6 for the FWM module, and the learning rate is decayed by a factor of 0.2 every 30 epochs. Each image is randomly cropped to 128×128 pixels. The network is trained for 100 epochs with a batch size of 16.
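The optimizer and learning-rate schedule stated above can be expressed in PyTorch roughly as follows. The `model` placeholder and the data pipeline are illustrative, and mapping the stated momentum of 0.9 onto Adam's first beta is an assumption; only the numeric hyperparameters come from the text.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR
from torchvision import transforms

model = torch.nn.Conv2d(3, 3, 3, padding=1)   # placeholder for the de-raining network

# Adam with the stated weight decay; momentum 0.9 mapped onto beta1 (assumption).
optimizer = Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=1e-4)

# Multiply the learning rate by 0.2 every 30 epochs, for 100 epochs in total.
scheduler = StepLR(optimizer, step_size=30, gamma=0.2)

# Random 128x128 crops of the training images, batch size 16 in the data loader.
train_transform = transforms.Compose([
    transforms.RandomCrop(128),
    transforms.ToTensor(),
])

for epoch in range(100):
    # ... one pass over the training set (batch size 16), optimizer.step() per batch ...
    scheduler.step()
```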
Experimental results and analysis on single-type data
First, the performance of all methods is tested under the single-type strategy. The quantitative evaluation results on the single-type rain streak datasets are shown in Table 1 and on the single-type raindrop datasets in Table 2. It can be seen that (1) the performance of this paper's algorithm is much better than that of the other methods on the two real datasets, RS_real and RD_real; (2) compared with CCN, which is also a robust method, this paper's algorithm obtains better performance on all datasets except RainDrop, and specifically its PSNR on RS_syn is 5 dB higher than that of CCN; (3) the performance of this paper's algorithm on the RainDrop dataset is poorer than that of CCN, which may be attributed to the fact that CCN has two independent modules that process rain streaks and raindrops separately, allowing it to handle the RainDrop data containing large blurred regions; (4) in terms of the average score, the algorithm in this paper is highly competitive.
Table 1. Quantitative evaluation results on the single-type (rain streak) datasets (PSNR/SSIM)

| Method | Rain200H | Rain200L | RS_syn | RS_real | Average |
|---|---|---|---|---|---|
| AttentGAN | 22.98/0.73 | 28.29/0.885 | 27.53/0.658 | 24.42/0.425 | 25.81/0.675 |
| DetailNet | 26.34/0.835 | 34.41/0.869 | 30.86/0.686 | 26.18/0.84 | 29.45/0.808 |
| RESCAN | 26.71/0.114 | 37.02/0.723 | 38.64/0.982 | 26.36/0.938 | 32.18/0.689 |
| PReNet | 28.17/0.869 | 36.82/0.951 | 39.37/0.969 | 25.93/0.998 | 32.57/0.947 |
| JORDER-E | 29.51/0.741 | 39.29/0.975 | 40.16/0.737 | 26.31/0.603 | 33.82/0.764 |
| RCDNet | 30.77/0.775 | 39.79/0.286 | 44.16/0.833 | 27.24/0.991 | 35.49/0.721 |
| RLNet | 29.47/0.787 | 38.36/0.992 | 37.06/0.966 | 26.85/0.755 | 32.94/0.875 |
| Pix2pix | 24.1/0.496 | 29.78/0.943 | 28.06/0.815 | 24.9/0.937 | 26.71/0.798 |
| CCN | 28.98/0.722 | 37.86/0.864 | 35.1/0.904 | 26.81/0.901 | 32.19/0.848 |
| RadNet | 30.38/0.995 | 38.74/0.985 | 39.17/0.891 | 26.67/0.773 | 33.74/0.911 |
| Ours | 30.23/0.985 | 38.56/0.855 | 39.57/0.201 | 27.69/0.929 | 34.01/0.743 |
Table 2. Quantitative assessment results on the single-type (raindrop) datasets (PSNR/SSIM)

| Method | RainDrop | RD_syn | RD_real | Average |
|---|---|---|---|---|
| AttentGAN | 30.6/0.932 | 27.26/0.87 | 21.75/0.669 | 26.54/0.824 |
| DetailNet | 25.02/0.594 | 28.42/1.262 | 22.13/0.821 | 25.19/0.892 |
| RESCAN | 25.54/0.953 | 34.45/0.885 | 23.03/0.584 | 27.67/0.807 |
| PReNet | 25.6/0.526 | 34.92/0.8 | 23.66/0.465 | 28.06/0.597 |
| JORDER-E | 26.62/0.989 | 35.55/0.832 | 23.83/0.918 | 28.67/0.913 |
| RCDNet | 26.28/0.887 | 35.18/0.991 | 24.36/0.837 | 28.61/0.905 |
| RLNet | 26.6/0.518 | 33.28/0.921 | 23.85/0.726 | 27.91/0.722 |
| Pix2pix | 25.55/0.935 | 25.07/0.493 | 20.46/0.779 | 23.69/0.736 |
| CCN | 31.49/0.871 | 33.45/0.815 | 24.63/0.888 | 29.86/0.858 |
| RadNet | 24.66/0.945 | 35.4/0.922 | 23.69/0.832 | 27.92/0.900 |
| Ours | 24.09/0.475 | 35.61/0.582 | 28.25/0.943 | 29.32/0.667 |
Experimental results and analysis on stacked-type data
The de-raining task under this data strategy is more difficult than under the single-type strategy, mainly because the network needs to handle rain streaks and raindrops simultaneously. The quantitative evaluation results on the stacked-type datasets are shown in Table 3. From the PSNR/SSIM results in the table it can be seen that (1) the algorithm in this paper obtains the best performance on both the synthetic dataset (RDS_syn) and the real dataset (RDS_real); compared with CCN, it achieves a 2 dB PSNR improvement on RDS_syn and a 4 dB PSNR improvement on RDS_real; (2) all methods perform poorly on RDS_real, mainly because its image pairs do not correspond at the pixel level, which is difficult for all supervised methods; thanks to the effectiveness of the proposed FWM, the algorithm in this paper still achieves excellent results.
Table 3. Quantitative evaluation results on the stacked-type datasets (PSNR/SSIM)
| Method | RDS_syn | RDS_real | Average |
|---|---|---|---|
| AttentGAN | 24.91/0.816 | 21.03/0.999 | 22.97/0.908 |
| DetailNet | 26.56/0.814 | 22.43/0.19 | 24.50/0.502 |
| RESCAN | 31.65/0.898 | 21.66/0.446 | 26.66/0.672 |
| PReNet | 32.79/0.782 | 22.8/0.893 | 27.80/0.838 |
| JORDER-E | 33.3/0.372 | 23.09/0.352 | 28.20/0.362 |
| RCDNet | 34.18/0.744 | 23.33/0.817 | 28.76/0.781 |
| RLNet | 32.29/0.861 | 23.73/0.581 | 28.01/0.721 |
| Pix2pix | 23.78/0.658 | 20.16/0.652 | 21.97/0.655 |
| CCN | 32.15/0.98 | 22.81/0.805 | 27.48/0.893 |
| RadNet | 34.25/0.981 | 23.55/0.908 | 28.90/0.945 |
| Ours | 34.07/0.801 | 27.06/0.807 | 30.57/0.804 |
Experimental results and analysis on hybrid data
The de-raining task under this data strategy is more difficult than under the two strategies above: the network must not only handle the two degradation phenomena of rain streaks and raindrops simultaneously, but also cope with the fitting problems caused by the different distributions of the datasets. The quantitative assessment results on the hybrid datasets are shown in Table 4. From the PSNR/SSIM results it can be seen that this paper's algorithm achieves the best performance, exceeding RCDNet by 1 dB PSNR on the Blended-1 dataset, and outperforming RCDNet by close to 3 dB PSNR and 0.011 SSIM on the Blended-2 dataset. The results also show that the algorithm in this paper is 1 dB PSNR higher than CCN*.
Table 4. Quantitative evaluation results on the blended-type datasets (PSNR/SSIM)
| Method | Blended-1 | Blended-2 | Blended-3 | Average |
|---|---|---|---|---|
| AttentGAN | 26.09/0.414 | 22.64/0.446 | 23.91/0.916 | 24.21/0.592 |
| DetailNet | 27.15/0.893 | 23.52/0.905 | 23.78/0.803 | 24.82/0.867 |
| RESCAN | 33.16/0.776 | 23.49/0.897 | 28.56/0.987 | 28.40/0.887 |
| PReNet | 34.19/0.929 | 23.96/0.982 | 29.88/0.791 | 29.34/0.901 |
| JORDER-E | 34.99/0.828 | 24.2/0.812 | 28.23/0.822 | 29.14/0.821 |
| RCDNet | 35.54/0.725 | 24.49/0.902 | 29.49/0.761 | 29.84/0.796 |
| RLNet | 35.54/0.873 | 25.31/0.908 | 30.61/0.924 | 30.49/0.902 |
| Pix2pix | 24.59/0.73 | 22.49/0.581 | 24.53/0.68 | 23.87/0.664 |
| RadNet | 36.65/0.882 | 24.27/0.808 | 30.74/0.708 | 30.55/0.799 |
| Ours | 36.82/0.582 | 30.02/0.947 | 30.14/0.888 | 32.33/0.806 |
| CCN* | 33.6/0.735 | 24.58/0.561 | 32.91/0.592 | 30.36/0.629 |
| RadNet* | 36.37/0.931 | 24.75/0.872 | 31.34/0.948 | 30.82/0.917 |
| Ours* | 36.39/0.941 | 27.49/0.913 | 31.21/0.958 | 31.70/0.937 |
The performance of the different methods under the three data strategies is shown in Fig. 3 (panel (a) shows the PSNR results and panel (b) the SSIM results). It can be seen that (1) the algorithm in this paper is only slightly weaker than RCDNet under the single-type data strategy, while it obtains the best performance in all other cases; (2) the improvement of this paper's algorithm is very significant under the stacked and hybrid data strategies, which mainly test robustness, fully verifying the robustness of the algorithm; (3) the algorithm in this paper clearly outperforms the other robust rain removal method, CCN.

Different methods are performed under the strategy of three
Dataset library construction
When performing the target detection task under rain and fog conditions, it is crucial to obtain a sufficient number of rain and fog samples. However, publicly available rain and fog traffic datasets are relatively scarce, which makes it difficult to improve the performance of deep learning models under such conditions. To address this problem, this paper uses the BDD100K dataset as a basis and uniformly generates fog-containing samples of different concentrations with the atmospheric scattering model, so that the dataset contains foggy images ranging from light to dense fog and covers a variety of foggy conditions. However, samples generated only by the atmospheric scattering model are somewhat uniform and limited. To further enrich the dataset, a generative adversarial network is employed to produce more realistic and diverse fog-containing images, effectively expanding the size and diversity of the dataset. At the same time, real rainy-day images are manually collected and labeled, and a rain and fog dataset integrating real images, the physical model and the generative adversarial network is constructed. The rain and fog images generated in this way differ in concentration, which improves the breadth and richness of the dataset and provides more abundant data resources for the subsequent convolutional neural network based detection model.
Model parameter setting
The model in this paper is implemented in the PyTorch framework and trained on an NVIDIA RTX 3090 GPU. During training, the original image size of 640×360 is resized to 640×640 for input; the original size is maintained during testing. The initial learning rate is 0.01 and the optimizer is stochastic gradient descent (SGD). The momentum and weight decay are set to 0.937 and 0.0005, respectively, the batch size is 8, and the number of training epochs is 200. Mosaic and horizontal-flip data augmentation are used to enrich the dataset and enhance the generalization of the model.
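The fog synthesis step based on the atmospheric scattering model follows the standard formulation $I(x) = J(x)t(x) + A(1 - t(x))$ with transmission $t(x) = e^{-\beta d(x)}$. The sketch below assumes a constant atmospheric light and a per-image depth map, which are simplifying assumptions rather than details given in the text.

```python
import numpy as np

def add_fog(clear, depth, beta=1.0, A=0.9):
    """Synthesize a foggy image with the atmospheric scattering model.

    clear : H x W x 3 image in [0, 1] (the fog-free scene J).
    depth : H x W scene depth map (relative depth is sufficient).
    beta  : scattering coefficient; larger values give denser fog.
    A     : global atmospheric light, assumed constant here.
    """
    t = np.exp(-beta * depth)[..., None]        # transmission t(x) = exp(-beta * d(x))
    return clear * t + A * (1.0 - t)            # I(x) = J(x) t(x) + A (1 - t(x))

# Different beta values produce the light-to-dense fog levels used to expand the dataset.
rng = np.random.default_rng(0)
img = rng.random((360, 640, 3))                              # placeholder clear image
depth = np.linspace(0.1, 1.0, 360)[:, None] * np.ones((360, 640))  # placeholder depth
light_fog = add_fog(img, depth, beta=0.5)
dense_fog = add_fog(img, depth, beta=2.0)
```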
Experimental results and analysis of the effectiveness of data derivation for rain and fog scenes
To verify the effectiveness of the data augmentation method, the HazeSim, GANHaze and AtmoGAN Haze datasets are each fed into the network model of this paper, using mAP@0.5 as the evaluation index; the data augmentation results are shown in Table 5. As can be seen from the table, the AtmoGAN Haze dataset achieves 75.6%, 54.8% and 67.1% in precision (P), recall (R) and mean average precision (mAP), respectively. After data expansion, the AtmoGAN Haze dataset improves the detection result by 4.3% relative to the unexpanded HazeSim dataset and by 1.1% relative to the GANHaze dataset, showing that the expanded AtmoGAN Haze dataset yields better performance than the pre-expansion datasets.
Table 5. Data augmentation results
| Data set | P | R | mAP |
|---|---|---|---|
| AtmoGAN Haze | 75.6% | 54.8% | 67.1% |
| HazeSim | 72.8% | 49.8% | 62.8% |
| GANHaze | 67.3% | 45.5% | 66% |
| Fog Traffic | 85.5% | 58.9% | 72.7% |
The detection accuracy for each category is shown in Table 6. The AtmoGAN Haze dataset achieves relatively high AP for pedestrians, riders, cars, buses, trucks, bicycles and motorcycles. The AP for pedestrian detection reaches 48.1%, higher than that of the unexpanded dataset, which proves that the expanded dataset detects pedestrians better. For riders, cars, buses and trucks, AtmoGAN Haze reaches 71.1%, 80.9%, 66.8% and 60.4%, respectively, maintaining a lead over the unexpanded dataset. These results demonstrate that the fog-containing dataset augmented with different methods enhances the generalization of the model, and highlight the superiority and usefulness of the AtmoGAN Haze dataset in image processing tasks, providing an important reference for further research and applications.
Table 6. Detection accuracy for each category

| Data set | Pedestrian | Rider | Car | Bus | Truck | Bicycle | Motorcycle |
|---|---|---|---|---|---|---|---|
| AtmoGAN Haze | 48.1% | 71.1% | 80.9% | 66.8% | 60.4% | 64.2% | 73.7% |
| HazeSim | 44.8% | 67.3% | 68.2% | 71.9% | 57.7% | 60.2% | 66.7% |
| GANHaze | 57.6% | 60.2% | 86.3% | 54.3% | 51% | 27.2% | 57% |
| Fog Traffic | 70.6% | 70.4% | 81.4% | 84.6% | 77.2% | 65.2% | 62.7% |
To ensure the model also performs well under rainy conditions, 750 images were obtained from a rain dataset: 500 were added to the training set of the AtmoGAN Haze dataset and the other 250 to the test set, forming the rain and fog dataset HydroFogRain used in this study. This dataset was constructed to ensure that the model has robust target detection performance under rainy and foggy conditions. Based on the HydroFogRain dataset, the receptive-field-amplified network model of this paper is compared with the current mainstream model YOLOV5; the comparison results are shown in Table 7. The precision of this paper's model reaches 77.2%, higher than that of YOLOV5, indicating that the model predicts positive samples more accurately. The recall reaches 53.8%, also higher than YOLOV5, indicating that the model captures real positives better, and its mAP reaches 66.2%, significantly higher than YOLOV5, indicating better overall performance.
Table 7. Comparison with YOLOV5

| Model | P | R | mAP | Parameters | GFLOPs |
|---|---|---|---|---|---|
| YOLOV5 | 76.9% | 53% | 56.5% | 45.3M | 109.4 |
| Ours | 77.2% | 53.8% | 66.2% | 54.4M | 302.1 |
The detection accuracy for each category is shown in Table 8. For the various traffic scenarios, this paper's model achieves relatively high detection accuracy for pedestrians, riders, cars, buses, trucks, bicycles and motorcycles. First, its detection accuracy for pedestrians and riders is significantly better than that of YOLOV5, reaching 53.1% and 73.5% compared with only 44.5% and 59.1%, indicating that the model can more accurately detect pedestrians and riders in complex situations. Second, the model performs equally well on the other categories: for cars, buses and bicycles it achieves 80.9%, 62.9% and 66.5%, higher than YOLOV5's 80.3%, 56% and 46.6%, showing better detection performance for different traffic targets. Finally, its detection accuracy for trucks and motorcycles, 60.3% and 75.2%, is also higher than that of YOLOV5. In summary, the model performs well across multiple categories and is especially suitable for target detection in complex scenes.
Table 8. Detection accuracy for each category

| Model | Pedestrian | Rider | Car | Bus | Truck | Bicycle | Motorcycle |
|---|---|---|---|---|---|---|---|
| YOLOV5 | 44.5% | 59.1% | 80.3% | 56% | 56.9% | 46.6% | 66.5% |
| Model of this paper | 53.1% | 73.5% | 80.9% | 62.9% | 60.3% | 66.5% | 75.2% |
Comparison of mainstream target detection models
To verify the effectiveness of the detection model, the proposed receptive-field-amplified target detection model is compared with several mainstream deep learning networks on the HydroFogRain dataset, including Faster R-CNN, SSD, YOLOV3, YOLOV3-SPP, YOLOV4, YOLOV7, YOLOV8 and DETR; the comparative experiments are shown in Table 9. The table clearly shows that the mAP of the proposed model is much higher than that of the other mainstream detectors, almost twice that of SSD. Although Faster R-CNN, as a two-stage detector, offers relatively good accuracy, its detection accuracy is still far lower than that of this paper's model and it is not suitable for real-time applications. By contrast, YOLOV3, YOLOV3-SPP and YOLOV4 have faster detection speeds but lower accuracy and poorer overall results. Compared with YOLOV5X, the model in this paper performs well in precision and recall while using far fewer parameters. Compared with YOLOV7, YOLOV8 and DETR, it shows the best performance in precision, recall and detection speed, although its parameter count is not the smallest. Taken together, the proposed target detection algorithm achieves the best detection precision, and although its detection speed is slightly lower than that of some YOLO-series models, it remains the best choice in terms of comprehensive performance.
Table 9. Comparison experiments with mainstream detection models

| Model | P | R | mAP | Parameters | FPS |
|---|---|---|---|---|---|
| Faster R-CNN | 50.2% | 59.6% | 57.9% | 62M | 15.5 |
| SSD | 33.2% | 44.3% | 38.8% | 68.6M | 31.3 |
| YOLOV3 | 75.8% | 52.2% | 59.5% | 61.4M | 49.7 |
| YOLOV3_SPP | 73.5% | 46.6% | 54.3% | 64.5M | 43.8 |
| YOLOV4 | 69.5% | 45.9% | 48.8% | 63.9M | 50.5 |
| YOLOV5X | 78.9% | 53.7% | 57.8% | 85.8M | 27.2 |
| YOLOV7 | 75.9% | 53.5% | 59.4% | 34.4M | 42.2 |
| YOLOV8 | 70.1% | 51.6% | 59.6% | 41.8M | 34.1 |
| DETR | 61.8% | 44% | 46.7% | 31.7M | 28.8 |
| YOLO-Z | 71.5% | 47.7% | 52.8% | 55.6M | 28.2 |
| Ours | 77.1% | 52.4% | 68.2% | 53.2M | 35.5 |
To examine vehicle color recognition performance under rainy conditions, quantitative evaluations and qualitative comparisons were carried out between the method in this paper and Da-Faster, SA-Da-Faster and SMNN-MSFF. To ensure fairness, all settings follow the original publications. The method in this paper, Da-Faster and SA-Da-Faster train their network models on the labeled source-domain dataset Vehicle Color-24 and the unlabeled target-domain dataset Rain Vehicle Color-24, whose training sets contain 8094 and 8194 images, respectively; SMNN-MSFF is trained on the training set of Rain Vehicle Color-24. After training, all models are evaluated on the 576 test images of Rain Vehicle Color-24, and the detection accuracy of the different algorithms for each category is shown in Table 10. The table lists the AP value for each color category and the average over all categories. The experimental results show that the proposed method achieves the highest mAP compared with the other state-of-the-art unsupervised domain-adaptive detection methods and vehicle color recognition algorithms, exceeding Da-Faster, SA-Da-Faster and SMNN-MSFF by 3.73%, 2.23% and 1.19%, respectively, and can significantly improve the accuracy of vehicle color recognition under rainy conditions. Overall, the method reduces the domain gap of the model in the target domain and improves localization accuracy.
Table 10. Detection accuracy (AP) for each color category on Rain Vehicle Color-24
| Algorithm | Da-Faster | SA-Da-Faster | SMNN-MSFF | Ours |
|---|---|---|---|---|
| White | 0.78 | 0.63 | 0.64 | 0.77 |
| Black | 0.8 | 0.52 | 0.65 | 0.73 |
| Orange | 0.8 | 0.83 | 0.75 | 0.77 |
| Silver grey | 0.85 | 0.27 | 0.34 | 0.54 |
| Grass green | 0.69 | 0.8 | 0.84 | 0.82 |
| Deep grey | 0.73 | 0.27 | 0.45 | 0.54 |
| Scarlet | 0.77 | 0.65 | 0.49 | 0.8 |
| Gray | 0.17 | 0.04 | 0.25 | 0.12 |
| Red | 0.59 | 0.65 | 0.46 | 0.61 |
| Green color | 0.76 | 0.85 | 0.6 | 0.75 |
| champagne | 0.57 | 0.17 | 0.33 | 0.28 |
| Dark blue | 0.68 | 0.39 | 0.54 | 0.55 |
| Blue | 0.72 | 0.58 | 0.59 | 0.74 |
| Dark brown | 0.45 | 0.07 | 0.38 | 0.27 |
| Brown | 0.27 | 0.38 | 0.32 | 0.21 |
| Yellow | 0.52 | 0.66 | 0.31 | 0.31 |
| Lemon yellow | 0.87 | 0.96 | 0.57 | 0.43 |
| Dark orange | 0.61 | 0.63 | 0.34 | 0.99 |
| Dark green | 0.37 | 0.3 | 0.57 | 0.09 |
| Salmon | 0.27 | 0.33 | 0.37 | 0.38 |
| Earth yellow | 0.64 | 0.47 | 0.66 | 0.09 |
| Green | 0.61 | 0.08 | 0.17 | 0.73 |
| Pink | 0.55 | 0.7 | 0.91 | 0.58 |
| Purple | 0.00 | 0.00 | 0.22 | 0.00 |
| Mean accuracy(%) | 46.15 | 47.65 | 48.69 | 49.88 |
How to overcome the influence of bad weather on image data quality and ensure the accuracy of the detection system under various weather conditions is of great significance for improving the safety of automatic driving and the reliability of intelligent transportation systems. In this paper, a vehicle target detection method for rain and fog scenes is studied, and the experimental conclusions are as follows:
In this paper, a dynamic vehicle target detection technique based on a GAN combined with dynamic fuzzy compensation is designed, and its performance is evaluated under various data strategies. In the quantitative evaluation on the single-type (rain streak) datasets, the algorithm's PSNR on RS_syn is 5 dB higher than that of CCN. A large number of experiments demonstrate that the proposed method outperforms the other state-of-the-art de-raining methods.
In the experiments on the effectiveness of data derivation for rain and fog scenes, the proposed model achieves 75.6%, 54.8% and 67.1% in precision, recall and detection accuracy, respectively, showing excellent performance and the ability to accurately detect vehicles of different colors.
