Open Access

Research on image de-raining method based on high scale rain pattern image block training algorithm

Mar 19, 2025


Introduction

Image rain removal (de-raining) algorithms have emerged in recent years as an important branch of automated computer vision. They are applied in image processing and computer vision to improve the quality of images captured in low light or bad weather (e.g., rain), and have been widely used in automated vision systems such as robot navigation and street-scene image recognition [1-4]. The basic principle of an image de-raining algorithm is to use image restoration to remove raindrop blur and related noise. Its core is model construction: an accurate model improves image quality and reduces noise in the image [5-7]. The main technical strategies are quality-based methods, deep learning techniques, hybrid entropy algorithms, and optimization techniques [8-9].

The quality-based approach treats rain removal as a parameter estimation problem: the noise and noise-free parts of the image are estimated separately, and least squares is used to estimate the noise, yielding a small estimation error and greatly improving the quality of images captured in low light or bad weather [10-13]. Deep learning techniques use deep neural networks to automatically identify patterns and build models; in image de-raining, multiple deep networks can be trained to enhance image-processing performance and significantly improve the clarity of images degraded by raindrops and low light [14-17]. The hybrid entropy algorithm combines least squares with information entropy and constrained optimization techniques to achieve better rain removal [18-19].

In this paper, a CNN-based de-raining algorithm trained on high-proportion rain streak image patches is proposed to address the excessive blurring of rain-free regions produced by rain streak detection algorithms. A three-layer CNN network structure is used, and the rain streak module employs multi-scale feature extraction to obtain the location information of rain streaks. The mapping between high-proportion rain streak image patches and rain-free images is learned by the CNN, which reduces the interference of background features on network training and improves the accuracy of the de-raining algorithm. An objective function is established to separate the rain streak layer from the background layer, and residuals in the background layer are removed using a gradient prior before restoring the real image. Finally, experiments are designed to compare the performance of the proposed algorithm with classical de-raining algorithms on public datasets.

Deep learning based image de-raining algorithm
Description of the problem

Since rain and snow are often treated as the same type of noise in image processing, this paper only discusses rain. The complexity of the direction and shape of rain streaks leads to localized blurring in the de-rained image, which degrades road recognition accuracy. In heavy rain, a masking effect similar to fog is also easily produced; if this situation is not considered, the masking effect distorts the road after de-raining and affects recognition accuracy. An effective de-raining algorithm can therefore solve the problem of road recognition failure in complex weather.

In this regard, this paper proposes a de-raining algorithm trained on a high proportion of rain streak image patches, which can automatically detect the location of rain streaks and remove both the blurring caused by rain streaks and the masking effect on the image. The algorithm uses the following image processing techniques to address the shortcomings of existing methods.

Mechanism and characteristics of rain and fog image formation

Rain and fog are very common in nature. In rainy and foggy weather, their impact on the imaging quality of the imaging system should not be ignored: rain and fog cause image degradation, color distortion, loss of saturation, and reduced brightness and contrast.

In computer vision, the generalized fog imaging model is as follows: $I(x) = J(x)t(x) + A(1 - t(x))$

where I(x) is the foggy image, J(x) is the clear image, A is the global atmospheric light value, and t(x) is the direct transmittance between the light and the imaging system. The transmittance is generally given by the following equation: $t(x) = e^{-\beta d(x)}$

where β is the atmospheric scattering coefficient, which represents the ability of atmospheric components to scatter light of different wavelengths. d(x) is the scene depth. The physical meaning of transmittance is the proportion of atmospheric light that ends up in the imaging system after scattering and attenuation by a medium such as air particles.

Most studies on image defogging algorithms use the atmospheric scattering model as the theoretical model for image defogging. Based on the relevant a priori assumptions, the fog-free image J(x) can be recovered by estimating the atmospheric transmittance t(x) and the global atmospheric value A from the fog-bearing image and bringing them into (1) for solving.
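As an illustration of this recovery step, the following is a minimal NumPy sketch of inverting Eq. (1) once t(x) and A have been estimated; the function names and the lower clipping bound applied to the transmittance are illustrative assumptions, not part of any specific defogging method.

```python
import numpy as np

def transmittance(depth, beta=1.0):
    """Eq. (2): t(x) = exp(-beta * d(x))."""
    return np.exp(-beta * depth)

def dehaze(I, t, A, t_min=0.1):
    """Recover the fog-free image J from Eq. (1): I = J*t + A*(1 - t).

    I : foggy image, float array in [0, 1], shape (H, W, 3)
    t : estimated transmittance map, shape (H, W)
    A : estimated global atmospheric light, scalar or shape (3,)
    """
    t = np.clip(t, t_min, 1.0)[..., None]   # clip to avoid division by zero
    J = (I - A * (1.0 - t)) / t              # invert the scattering model
    return np.clip(J, 0.0, 1.0)
```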

In general, when an imaging system photographs rain, the rain appears as vertical streaks on the image; in the presence of other factors such as wind, these vertical streaks become diagonal.

Models of rain are generally divided into three categories: simple additive models, α-blending models, and nonlinear mixing models. The simple additive model is: $I(x) = B(x) + R(x)$

where x denotes a pixel of the rainy image, I(x) the input image, B(x) the rain-free background, and R(x) the rain streak foreground. In the additive model, the rainy image is simply the sum of the rain-free background and the rain streak foreground; it does not account for other influences present in real scenes.

The α-blending model is: $I(x) = B(x)\alpha(x) + R(x)(1 - \alpha(x))$

where parameter α takes values in the range 0 to 1. The α-blending model mixes the background and foreground images to generate the rainy image, with α controlling the ratio of rain foreground to rain-free background; by adjusting α, a more realistic rainy image is formed.

The nonlinear mixing model is: $I(x) = B(x) + R(x) - B(x)R(x)$

Compared with the simple additive model, this model adds the nonlinear term B(x)R(x), so the rainy image generated by the nonlinear mixing model is more realistic.
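For clarity, the three rain imaging models above can be written directly as synthesis routines. The following NumPy sketch assumes images normalized to [0, 1]; the subtraction in the nonlinear (screen-blend style) variant follows the common formulation, and all function names are illustrative.

```python
import numpy as np

def rain_additive(B, R):
    """Simple additive model: I = B + R."""
    return np.clip(B + R, 0.0, 1.0)

def rain_alpha_blend(B, R, alpha):
    """Alpha-blending model: I = B*alpha + R*(1 - alpha), alpha in [0, 1]."""
    return np.clip(B * alpha + R * (1.0 - alpha), 0.0, 1.0)

def rain_nonlinear(B, R):
    """Nonlinear (screen-blend style) model: I = B + R - B*R."""
    return np.clip(B + R - B * R, 0.0, 1.0)
```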

Image De-Raining
Imaging model for rain-containing images

Since physical models play a key role in describing the imaging process under rainfall conditions, this subsection briefly reviews the evolution of rainfall modeling. The traditional approach defines a simple linear combination model: $I = B + R$

where I denotes the rainy image, B denotes the clean background layer, and R denotes the rain layer. This rainfall model assumes that the rain streaks in the rainy image are sparse and share similar fall direction and shape. In reality this is not the case, so an improved model that accounts for rain shape and direction is defined as: $I = B + \sum_{i=1}^{N} R_i$

where $R_i$ denotes the i-th rain-streak layer and N denotes the total number of rain-streak layers. By iteratively learning the rain characteristics of the different rain layers and removing them layer by layer from the original image I, the clean background B can be recovered. However, this equation still ignores the attenuation and scattering of brightness caused by rain streaks accumulating in the air. For camera imaging, the accumulation of rain streaks produces an atmospheric veiling effect, which further blurs the image.

In order to solve the above problems, the formula is further extended to obtain a physical model for synthetic images that better represents natural images: $I = \alpha \left(B + \sum_{i=1}^{N} R_i\right) + (1 - \alpha)A$

where α denotes atmospheric transmitted light and A denotes global atmospheric light.

The atmospheric veiling effect produced by rain streak accumulation is imposed on the rain-contaminated image, and according to Equation (8) it is reasonable to separate the rain accumulation effect from the rain streaks and remove them separately. In practice, however, solving the equation requires a hand-crafted prior decomposition method. Therefore, in this paper we use a convolutional neural network to learn a function f that directly establishes the mapping from image I to B.

Evaluation indicators

In this paper, Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) are used to objectively evaluate image quality, and the performance of the de-raining methods is compared through these numerical values. The principles of the two evaluation metrics are described below.

PSNR: Peak Signal-to-Noise Ratio is an objective criterion that evaluates image quality through the error between corresponding pixels of two images. Its unit is dB, and a larger value indicates less distortion. That is:
$$\mathrm{MSE} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left(X(i,j) - Y(i,j)\right)^2$$
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{(2^n - 1)^2}{\mathrm{MSE}}\right)$$

where MSE denotes the mean square error between images X and Y, H denotes the height of the image, W the width, and n the number of bits per pixel.

SSIM: Structural similarity is a measure of similarity between two images, evaluated through three aspects: luminance, contrast, and structure. Its value lies in [0, 1], and larger values indicate less distortion. That is:
$$l(X,Y) = \frac{2\mu_X \mu_Y + C_1}{\mu_X^2 + \mu_Y^2 + C_1}, \quad c(X,Y) = \frac{2\sigma_X \sigma_Y + C_2}{\sigma_X^2 + \sigma_Y^2 + C_2}, \quad s(X,Y) = \frac{\sigma_{XY} + C_3}{\sigma_X \sigma_Y + C_3}$$

where $\mu_X$ and $\mu_Y$ denote the means of images X and Y, respectively, $\sigma_X^2$ and $\sigma_Y^2$ their variances, $\sigma_{XY}$ their covariance, and $C_1$, $C_2$ and $C_3$ are constants.
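Both metrics can be computed directly from the definitions above. The sketch below is a minimal NumPy implementation; the choice of constants $C_1 = (0.01L)^2$, $C_2 = (0.03L)^2$, $C_3 = C_2/2$ follows common practice and is an assumption here, and SSIM is evaluated globally over the whole image rather than over local sliding windows as library implementations (e.g., skimage.metrics.structural_similarity) do.

```python
import numpy as np

def psnr(x, y, n_bits=8):
    """PSNR from the MSE definition above; x, y are images of equal shape."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    peak = (2 ** n_bits - 1) ** 2
    return 10.0 * np.log10(peak / mse)

def ssim_global(x, y, n_bits=8):
    """Single-window SSIM using the luminance, contrast and structure terms above."""
    L = 2 ** n_bits - 1
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    C3 = C2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mu_x) * (y - mu_y)).mean()
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)   # luminance term
    c = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)           # contrast term
    s = (sxy + C3) / (sx * sy + C3)                             # structure term
    return l * c * s
```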

Rain removal algorithm based on high scale rain streak images

In this section, a CNN-based de-raining algorithm for high-proportion rain streak images is proposed within the CNN deep learning framework [20]. The data processing of this algorithm is divided into four steps, and the flowchart of the CNN-based de-raining algorithm is shown in Figure 1.

Figure 1.

Flowchart of the CNN-based rain removal algorithm

In the first step, since rain streaks belong to high-frequency noise, low-pass bilateral filtering is used to initially separate the low-frequency background layer (IL) from the high-frequency rain streak layer (IH). Meanwhile, in order to compute the location of rain streaks and improve the accuracy of the de-raining algorithm, the input rainy RGB image is transformed into a YUV image, and CNN1 is trained to learn the mapping from the luminance image to a binary rain map.

The second step utilizes the rain streak location information of the binary rain map to select the image blocks with high percentage of rain in the high-frequency rain layer as the training set of the CNN2, so that the CNN2 learns the mapping from the high-frequency rain layer to the high-frequency rain-free layer, and improves the de-raining ability of the CNN2, so as to obtain the high-frequency rain-free layer.

The third step superimposes the low-frequency background layer and the high-frequency rain-free layer, which were separated in the first step, to compute the rain-free image.

In the fourth step, when fog residue is detected in the image, the defogging algorithm of the rain-fog model is used to remove the masking effect, improving the clarity of the image and the subsequent road recognition accuracy.
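Putting the four steps together, a hypothetical outline of the flow in Figure 1 could be sketched as follows. Here cnn1, cnn2 and defog are placeholders for the trained rain-map network, the high-frequency de-raining network and the rain-fog defogging step, and the bilateral filter parameters are assumptions.

```python
import cv2
import numpy as np

def derain_pipeline(rgb, cnn1, cnn2, defog):
    """Hypothetical outline of the four-step flow in Figure 1."""
    # Step 1: bilateral low-pass split into background (IL) and rain streak (IH)
    # layers, plus a binary rain map predicted from the luminance channel.
    I = rgb.astype(np.float32) / 255.0
    IL = cv2.bilateralFilter(I, 9, 75, 75)
    IH = I - IL
    Y = cv2.cvtColor(rgb, cv2.COLOR_RGB2YUV)[..., 0].astype(np.float32) / 255.0
    rain_map = cnn1(Y)                      # binary rain-location map

    # Step 2: de-rain the high-frequency layer, guided by the rain map.
    IH_norain = cnn2(IH, rain_map)

    # Step 3: recompose the rain-free image.
    out = np.clip(IL + IH_norain, 0.0, 1.0)

    # Step 4: defog if a residual masking (fog) effect is detected.
    return defog(out)
```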

Rain streak detection subnetwork

The rain streak detection sub-network consists of a multi-scale feature extraction module and a multi-scale feature fusion module. The main purpose of the multi-scale feature extraction module is to extract additional features from multi-scale inputs. Different scales contain rich rain streak information, and the rain streaks at the same location will be complemented with the rain streak information at other locations after downsampling. The multi-scale feature fusion module is used to integrate and complement the obtained multi-scale rain streak information, so that the rain streak detection sub-network can quickly and accurately detect all rain streaks. These two parts are described in detail below.

Multi-scale feature extraction module

The input consecutive video frames I are first downsampled and then initially processed by 3D residual convolution to model the temporal correlation between consecutive frames. Features at three different scales are then obtained, and the processing can be described by the following equation: $I_s = \mathrm{Res3D}(\mathrm{DownSample}(I))$

where DownSample(·) denotes the downsampling process and Res3D(·) denotes the 3D residual convolution. The obtained multi-scale feature information then enters the corresponding 3D nonlinear-activation-function-free block, which simplifies the network structure and computation by removing or replacing the commonly used nonlinear activation functions. Moreover, replacing the 2D convolutional layers and the adaptive average pooling layer with 3D structures facilitates the extraction of additional temporal information.

The multiscale feature information is first normalized by layer normalization (LN), and then processed by a 3D convolution with a 1×1×1 kernel and a 3×3×3 3D depthwise-separable convolution. Model complexity is reduced by using simple gating units (SGUs) instead of nonlinear activation functions. The SGU splits the features into two parts along the channel dimension and multiplies them directly; the result is passed to simplified channel attention (SCA) for attention computation, and the output of the upper half of the 3D activation-function-free block is then obtained through a convolution with a 1×1×1 kernel.

The processing of the upper half of the 3D activation-function-free block can be described by the following equation: $F_s = \mathrm{Conv}(\mathrm{SCA}(\mathrm{SGU}(\mathrm{DConv}(\mathrm{Conv}(\mathrm{LN}(I_s))))))$

The input of the lower half of the 3D activation-function-free block is obtained through a skip connection, and is sequentially subjected to layer normalization, a 1×1×1 convolution, a simple gating unit, and another 1×1×1 convolution to obtain the final multiscale feature result. The processing can be described by the following equation: $F_s' = (F_s + I_s) + \mathrm{Conv}(\mathrm{SGU}(\mathrm{Conv}(\mathrm{LN}(F_s + I_s))))$
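A PyTorch sketch of the 3D activation-function-free block described above is given below. The channel widths, the use of GroupNorm as a stand-in for layer normalization, and the way simplified channel attention rescales the features are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    """SGU: split channels in half and multiply the halves."""
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return a * b

class NAFBlock3D(nn.Module):
    """Sketch of the 3D activation-function-free block:
    LN -> 1x1x1 conv -> 3x3x3 depthwise conv -> SGU -> SCA -> 1x1x1 conv,
    followed by the gated lower half."""
    def __init__(self, c):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, c)                       # stand-in for LayerNorm
        self.conv1 = nn.Conv3d(c, 2 * c, kernel_size=1)
        self.dwconv = nn.Conv3d(2 * c, 2 * c, kernel_size=3, padding=1, groups=2 * c)
        self.sgu = SimpleGate()
        self.sca = nn.Sequential(                             # simplified channel attention
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(c, c, kernel_size=1),
        )
        self.conv2 = nn.Conv3d(c, c, kernel_size=1)
        self.norm2 = nn.GroupNorm(1, c)
        self.conv3 = nn.Conv3d(c, 2 * c, kernel_size=1)
        self.conv4 = nn.Conv3d(c, c, kernel_size=1)

    def forward(self, x):
        # upper half: F_s = Conv(SCA(SGU(DConv(Conv(LN(I_s))))))
        y = self.conv1(self.norm1(x))
        y = self.dwconv(y)
        y = self.sgu(y)                                       # 2c -> c channels
        y = y * self.sca(y)
        y = self.conv2(y)
        x = x + y                                             # skip connection (F_s + I_s)

        # lower half: LN -> 1x1x1 conv -> SGU -> 1x1x1 conv, plus skip
        z = self.sgu(self.conv3(self.norm2(x)))
        return x + self.conv4(z)
```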

The obtained multiscale features have rich rain streak information at each scale, and the complementary information between them facilitates the subsequent multiscale feature fusion, and also provides an antecedent basis for accurate and fast rain streak detection.

Multi-scale feature fusion module

In order to efficiently fuse the spatio-temporal features extracted at different scales, a multi-scale feature fusion module (MSFF) is proposed; its structure is shown in Fig. 2. The features extracted at the lower scale are first up-sampled, $F_{s/2} = \mathrm{UpSample}(F_{s/2})$, then summed with the features of the current scale, i.e., $F_s + F_{s/2}$, and processed by a channel attention block (CAB) and a pixel attention block (PAB). The channel attention block consists of global average pooling (GAP) followed by two 3D convolutions with 1×1×1 kernels and a ReLU activation function, and the pixel attention block consists of two 3D convolutions with 1×1×1 kernels and a ReLU activation function. The obtained attention weights perform a soft selection by multiplying the features $F_s$ and $F_{s/2}$ of the two scales, respectively, and the sum of the soft-selected features of the two scales is taken as the final multiscale fusion result.

Figure 2.

Multi-scale feature fusion module structure

The processing can be represented as follows: $F_s' = \mathrm{PAB}(\mathrm{CAB}(F_s + F_{s/2})) \odot F_s + \left(1 - \mathrm{PAB}(\mathrm{CAB}(F_s + F_{s/2}))\right) \odot F_{s/2}$

where CAB(·) denotes the channel attention operation, whose channel attention weights are multiplied element-wise with $F_s + F_{s/2}$, PAB(·) denotes the pixel attention operation, and ⊙ denotes element-wise multiplication. The computed pixel attention weights are multiplied with the current-scale and lower-scale features to obtain the final fusion result $F_s'$ of the current scale. When s = 1, the predicted rain streak map R ∈ ℝ^{C×F×H×W} is obtained. Subtracting the rain streak map from the input I gives a coarse de-rained result $B_c$ ∈ ℝ^{C×F×H×W}.
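A PyTorch sketch of this fusion step is given below; the channel width, the reduction ratio inside the attention blocks, and the use of trilinear interpolation for up-sampling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSFF3D(nn.Module):
    """Sketch of the multi-scale feature fusion: upsample the lower-scale
    features, add them to the current scale, compute channel and pixel
    attention, and soft-select between the two scales."""
    def __init__(self, c, reduction=4):
        super().__init__()
        self.cab = nn.Sequential(                       # channel attention block
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(c, c // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv3d(c // reduction, c, 1),
        )
        self.pab = nn.Sequential(                       # pixel attention block
            nn.Conv3d(c, c // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv3d(c // reduction, 1, 1),
        )

    def forward(self, f_s, f_half):
        f_half = F.interpolate(f_half, size=f_s.shape[2:],
                               mode="trilinear", align_corners=False)
        fused = f_s + f_half
        w = torch.sigmoid(self.pab(fused * torch.sigmoid(self.cab(fused))))
        return w * f_s + (1.0 - w) * f_half             # soft selection between scales
```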

Image layer separation based on rain streak features

According to the frequency characteristics of rain streaks, low-pass filtering such as bilateral or median filtering initially decomposes the rainy image into a high-frequency rain streak image and a low-frequency background image (the layer separation by low-pass filtering is shown in Fig. 3) [21], as shown in equation (19): $I = I_H + I_L$

where $I_H$ and $I_L$ represent the high-frequency rain streak image and the low-frequency background image, respectively. Since rain streaks and background edge features both belong to the high-frequency component, as shown in Fig. 3(b), they remain in the $I_H$ layer after low-pass filtering, where the rain streaks are sparsely distributed. A CNN is therefore trained to learn the mapping from the $I_H$ layer to the high-frequency rain-free layer, as shown in Eq. (20), thus recovering the rain-free image.

Figure 3.

Low pass filtering method for image layer separation

$$L = \frac{1}{N} \sum_{n=1}^{N} \left\| f_w(I_n^H) - G_n^H \right\|_F^2$$

where $f_w(\cdot)$ represents the CNN network structure and weight parameters, N is the number of training images, $I_n^H$ represents the $I_H$ layer of the n-th image, $G_n^H$ is the high-frequency layer of the n-th real rain-free image, $\|\cdot\|_F$ denotes the Frobenius norm, and n is the image index.
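As an illustration, the layer separation of Eq. (19) and the loss of Eq. (20) could be sketched as follows; the bilateral filter parameters are assumptions, and the mean squared error used here equals the Frobenius-norm loss up to a constant normalization factor.

```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F

def split_layers(img):
    """Eq. (19): split a rainy image into a low-frequency background layer I_L
    and a high-frequency rain-streak layer I_H with a bilateral low-pass filter."""
    I = img.astype(np.float32) / 255.0
    IL = cv2.bilateralFilter(I, 9, 75, 75)
    IH = I - IL
    return IL, IH

def highfreq_loss(model, IH_batch, GH_batch):
    """Eq. (20): mean squared error between the network output on the
    high-frequency rainy layer and the clean high-frequency layer."""
    return F.mse_loss(model(IH_batch), GH_batch)
```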

By using the mean square error to minimize the difference between the $I_H$ layer and the high-frequency layer of the real rain-free image, the trained high-frequency rainy image converges toward the high-frequency rain-free image. Finally, the high-frequency rain-free layer and the low-frequency background layer are linearly superimposed to achieve image de-raining. However, this type of method has limitations: as the amount of rain increases, the density of rain streaks increases, low-pass filtering is no longer suitable for rain streaks of different sizes and directions, and a large number of rain streaks remain in the $I_L$ layer.

For example, as can be seen in Fig. 3(c), some rain streaks remain in the $I_L$ layer, as shown in the red area. Therefore, extracting rain streak features by training only on the $I_H$ layer fails in the case of heavy rain.

To overcome these limitations, a binary rain map is used to provide a high proportion of rain streak image blocks as the training set.

CNN and binary rain map based rain trace feature extraction

In order to compute the binary rain map and to save storage space, the rainy RGB color image is converted to a YCbCr image, and the luminance channel image (Y) is extracted as the input of CNN1, so that CNN1 learns the mapping from the rain streak luminance image to the binary rain map. The deep learning network CNN1 in this paper consists of three layers, and its structure can be expressed as follows:
$$Y = 0.257R + 0.564G + 0.098B + 16$$
$$f_0(I_Y) = I_Y$$
$$f_l(I_Y) = \sigma\left(W_l * f_{l-1}(I_Y) + b_l\right), \quad l = 1, 2$$
$$f_W(I_Y) = W_l * f_{l-1}(I_Y) + b_l, \quad l = 3$$

where $I_Y$ is the Y-channel input image of size N×M×c, with c the number of channels of $I_Y$, * denotes the convolution operation, $b_l$ denotes the bias, and l denotes the index of the convolutional layer. σ(·) is the rectified linear activation function (ReLU). To avoid overfitting in Eq. (24), two hidden layers are used in the deep learning architecture proposed in this paper to extract high-level rain streak features. The network loss function for the binary rain map S is shown in Eq. (25): $L_1 = \frac{1}{N} \sum_{n=1}^{N} \left\| f_w(I_n^Y) - G_n \right\|_F^2$

where Gn represents the true value of the binary rain map from the BSD200 dataset, which contains a large number of natural scene images.
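A PyTorch sketch of the luminance conversion of Eq. (21) and the three-layer CNN1 of Eqs. (22)-(25) is given below; the kernel sizes and the number of hidden channels are not stated in the text and are assumptions.

```python
import torch
import torch.nn as nn

def luminance(rgb):
    """Eq. (21): Y channel of an 8-bit RGB image."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.257 * r + 0.564 * g + 0.098 * b + 16.0

class CNN1(nn.Module):
    """Three-layer rain-map network of Eqs. (22)-(24): two ReLU convolutional
    layers followed by a linear output layer predicting the binary rain map."""
    def __init__(self, hidden=16):
        super().__init__()
        self.f1 = nn.Conv2d(1, hidden, kernel_size=3, padding=1)       # l = 1
        self.f2 = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1)  # l = 2
        self.f3 = nn.Conv2d(hidden, 1, kernel_size=3, padding=1)       # l = 3, no activation
        self.relu = nn.ReLU(inplace=True)

    def forward(self, y):
        x = self.relu(self.f1(y))
        x = self.relu(self.f2(x))
        return self.f3(x)

def rain_map_loss(model, y_batch, g_batch):
    """Eq. (25): mean Frobenius-norm error against the ground-truth binary rain maps."""
    return torch.mean((model(y_batch) - g_batch) ** 2)
```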

In order to obtain the high-frequency rain-free image, this paper combines the binary rain streak location information and the high-frequency layer rain streak image to obtain a denser rain streak feature map. Then CNN2 is trained to learn the mapping from the high-frequency rain streak map to the rain-free high-frequency image (Inr–H).

The rain-free image $O_{nr}$ is the linear superposition of the low-frequency background layer $I_L$ and the high-frequency rain-free layer $I_{nr\text{-}H}$ produced by the CNN2 network: $O_{nr} = I_{nr\text{-}H} + I_L$

After 1000 iterations of the CNN2 network, the de-raining network trained on high-T-value image blocks outperforms the results based on the high-frequency layer alone. A comparison of the de-raining results based on the high-frequency layer and based on the T value is shown in Fig. 4.

Figure 4.

Comparison of de-raining results

Rain removal algorithm based on CNN and gradient constraints

In order to solve the above problems and to speed up the algorithm, the fusion of LiDAR and vision sensors is used to obtain the position information of objects and the road; the image of this region is then optimized instead of optimizing the whole image. Based on the observation of residual rain streaks in the image and prior knowledge of natural images, the color gradient of a natural image varies more than that of an area covered by rain streaks. Therefore, an objective function based on gradient constraints is used to optimize local regions of the de-rained image.

Assuming that a spatial point P has coordinates $(x_c, y_c, z_c)$ in the vision sensor coordinate system and $(x_l, y_l, z_l)$ in the LiDAR coordinate system, the mapping relationship between them is as follows:
$$\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = R \begin{bmatrix} x_l \\ y_l \\ z_l \end{bmatrix} + T = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} \begin{bmatrix} x_l \\ y_l \\ z_l \end{bmatrix} + T$$

where R and T are the rotation and translation matrices, respectively, $r_{11}$ to $r_{33}$ are the elements of the rotation matrix R, and the value of $z_c$ depends on the vision sensor parameters. The pixel coordinates (u, v) of point P are then obtained by establishing the relationship between the LiDAR coordinate system and the image coordinate system, as shown in Equation (28):
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} R & t \end{bmatrix} \begin{bmatrix} x_l \\ y_l \\ z_l \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} \begin{bmatrix} x_l \\ y_l \\ z_l \\ 1 \end{bmatrix} = M \begin{bmatrix} x_l \\ y_l \\ z_l \\ 1 \end{bmatrix}$$

where K and [R, t] are the intrinsic parameter matrix and the extrinsic parameter matrix of the vision sensor, respectively, M is the projection matrix determined by the vision sensor parameters, and $m_{11}$ to $m_{34}$ are the elements of the projection matrix M.
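The point-wise projection of Eqs. (27)-(28) can be written as the following NumPy sketch, assuming a 3×3 intrinsic matrix K; the function and variable names are illustrative.

```python
import numpy as np

def lidar_to_pixel(p_l, R, t, K):
    """Project a LiDAR-frame point into pixel coordinates following Eqs. (27)-(28).

    p_l : (3,) point (x_l, y_l, z_l) in the LiDAR coordinate system
    R, t: (3, 3) rotation matrix and (3,) translation vector (LiDAR -> camera)
    K   : (3, 3) intrinsic parameter matrix of the vision sensor
    """
    p_c = R @ p_l + t                            # Eq. (27): camera-frame coordinates
    M = K @ np.hstack([R, t.reshape(3, 1)])      # Eq. (28): 3x4 projection matrix M
    uvw = M @ np.append(p_l, 1.0)                # homogeneous coords (z_c*u, z_c*v, z_c)
    z_c = p_c[2]                                 # depth in the camera frame
    u, v = uvw[0] / z_c, uvw[1] / z_c
    return u, v, z_c
```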

To optimize the region of interest (the background layer B), the binary rain streak location map S, and the rain streak map R, the maximum a posteriori probability (MAP) is used to maximize the joint probability density of B, S and R. Taking the negative logarithm of the joint probability gives the following objective function:
$$\min_{B,S,R} \; \| O - B - S \odot R \|_F^2 + P_1(B) + \alpha \| \nabla B \|_1 + P_2(S) + P_3(R) + \beta \| R \|_F^2$$

The first term $\| O - B - S \odot R \|_F^2$ of the objective function reduces the difference between the de-rained background layer and the real rain-free layer, where ⊙ denotes element-wise multiplication. The terms $P_1(B) + \alpha\|\nabla B\|_1$ represent the linearly weighted sum of the probability of the background layer and its gradient: $P_1(B)$ is the probability that each pixel belongs to the background layer, $\|\nabla B\|_1$ smooths the regions of the image locally affected by rain streaks, and the degree of smoothing is controlled by parameter α. $P_2(S)$ and $P_3(R)$ represent the probabilities that a pixel belongs to the binary rain streak location map S and to the rain streak layer R, respectively. $\|R\|_F^2$ is introduced for the sparse rain streak features in the image; this Frobenius-norm term, weighted by parameter β, reduces the background details left in the rain streak layer R so that they are retained in B.

The rain removal algorithm combining CNN and gradient constraints can obtain rain-free images with high clarity. However, under heavy rain conditions, rain streaks and the masking effect exist simultaneously, as shown in Fig. 5. The de-rained image is still affected by the masking effect and is locally blurred, reducing image clarity and causing subsequent road recognition failure. According to the rain-fog model proposed in this paper, the fog in the image is removed, the color information is restored, and road recognition accuracy is improved. The final de-raining result is shown in Fig. 5(c).

Figure 5.

De-raining results under heavy rain conditions

Comparison and analysis of experimental results
Experimental environment and experimental details

The experiments in this paper build the network models using the Python programming language and the PyTorch framework. Table 1 lists the specific hardware environment.

Table 1. Experimental environment

Device name | Equipment type | Detailed parameter information
Operating system | Windows 10 | E7-2620
CPU | I7-8700K | 2.1 GHz
GPU | NVIDIA RTX 3060 | 64G
Memory | DDR4 2666 | 16G×2
Hard disk | 860EVO | 1TB

The software environment for the experiments is Python 3.6.7, CUDA 10.0, and cuDNN 7.5. The models are trained with the same training setup except where specifically stated. The patch size is 100 × 100 and the training batch size is 10. The Adam optimizer is used with a learning rate of 1 × 10−3, and training ends after 500 iterations.
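For reference, a minimal PyTorch training loop matching the stated setup (Adam, learning rate 1 × 10−3, 100 × 100 patches, batch size 10, 500 iterations) might look as follows; the tiny placeholder network and random patches are illustrative only and are not the paper's model or data.

```python
import torch
import torch.nn as nn

# Placeholder network standing in for the de-raining model described above.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 3, 3, padding=1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(500):                              # 500 iterations
    rainy = torch.rand(10, 3, 100, 100)              # batch of 10 rainy 100x100 patches
    clean = torch.rand(10, 3, 100, 100)              # corresponding clean patches
    optimizer.zero_grad()
    loss = loss_fn(model(rainy), clean)
    loss.backward()
    optimizer.step()
```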

The datasets used for experimental training and testing are shown in Table 2. All the training and test sets used in this paper to train the model are listed in the table. Among them, the SPA-data test set is the real rain map test set, and all other training and test sets are synthetic datasets. All the above mentioned datasets are public datasets.

Table 2. Datasets used for experimental training and testing

Category | Training set | Test set
Heavy rain image dataset | Rain Train H | Rain100H
Heavy rain image dataset | Rain Train H | Rain300H
Light rain image dataset | Rain Train L | Rain100L
Light rain image dataset | Rain Train L | Rain300L
Irregular rain streak dataset | Rain Train H | Rain10
Multi-type rain streak dataset | Rain12600 | Rain1500
Real rain image dataset | Rain12600 | SPA-data
Single rain streak removal

In this section, a dataset with a single class of rain streaks is used for training, and its effect on rain removal is evaluated.

To demonstrate that the CNN-based de-raining algorithm trained on a high percentage of rain streak image blocks proposed in this paper produces high-fidelity outputs, this section quantitatively compares the output of several state-of-the-art models, including GMM, DDN, SSIR, ResGuideNet, JORDER, and RESCAN, on the test sets Rain10, Rain100H, and Rain100L. The comparison results are shown in Fig. 6. The proposed algorithm has significant advantages on all three datasets; its PSNR/SSIM scores are 35.77/0.977, 30.41/0.925, and 38.97/0.984 on Rain10, Rain100H, and Rain100L, respectively.

Figure 6.

Comparison result

Different types of rain streak removal

In order to verify the generalization ability and robustness of the CNN-based rain streak image de-raining algorithm proposed in this paper for various types of rain streaks, this section is tested on another dataset, Rain1500. The Rain1500 test set is synthesized for each clean image with 15 types of rain streaks. This section provides examples of the effectiveness of the rain removal algorithm for various road scenes, including different rain streak orientations and sizes, and the road recognition results after rain removal.

A comparison of the de-raining results of different algorithms is shown in Fig. 7. The first row shows three different road scenes with rain, with rain streak directions of +50°, -50° and +45° and corresponding rain intensities of light rain, heavy rain and heavy rain, respectively. The second to seventh rows show the results of GMM, RESCAN, SSIR, JORDER, DDN and the proposed method, respectively, and the eighth row shows the real rain-free image. JORDER removes most of the rain streaks, but some background details appear over-smoothed compared with the real rain-free image in the last row. The results of the proposed method are in the seventh row; compared with the real rain-free image in the eighth row, local areas of the de-rained image show no excessive smoothing, and a larger and more accurate drivable road area is obtained than with the other methods.

Figure 7.

Comparison of de-raining results of different algorithms

Ablation experiments

In order to validate the necessity of each module of the proposed CNN-based de-raining model for high-proportion rain streak images, ablation experiments are conducted in this section. The validation is divided into three parts:

Single use of rain-streak modeling module

Single use of multi-scale convolution module

Replacing the LSTM in the rain and fog model module with ordinary convolution for learning the rain and fog layer transfer spectra (NLSTM)

In this section, these three variants are trained separately and tested on Rain100H, Rain100L, Rain300H, Rain300L, and Rain10, and the resulting PSNR and SSIM values are shown in Table 3. The table lists the results of using the rain and fog model module alone, using the multi-scale convolution module alone, and NLSTM (the rain and fog model module without LSTM).

Table 3. Comparison of ablation experiments (SSIM/PSNR)

Dataset | Metric | Rain-fog model module only | Multi-scale convolution module only | NLSTM | Proposed CNN-based high-proportion rain streak de-raining model
Rain100H | SSIM | 0.875 | 0.922 | 0.896 | 0.941
Rain100H | PSNR | 30.44 | 29.55 | 21.65 | 30.79
Rain100L | SSIM | 0.961 | 0.971 | 0.874 | 0.981
Rain100L | PSNR | 34.96 | 35.43 | 20.14 | 35.67
Rain300H | SSIM | 0.873 | 0.913 | 0.803 | 0.939
Rain300H | PSNR | 28.89 | 29.68 | 22.69 | 30.65
Rain300L | SSIM | 0.968 | 0.975 | 0.878 | 0.987
Rain300L | PSNR | 35.51 | 34.78 | 21.41 | 36.12
Rain10 | SSIM | 0.957 | 0.908 | 0.952 | 0.966
Rain10 | PSNR | 36.42 | 36.69 | 24.57 | 36.58

It can be seen that combining the rain and fog model module with the multi-scale convolution module has a significant optimization effect on rain removal. The use of LSTM also significantly improves the model, while using only a single convolution to predict the rain-fog layer transfer spectra performs very poorly. The CNN-based de-raining algorithm for a high proportion of rain streak images proposed in this paper obtains the best results on all test datasets, with a mean SSIM of 0.9628 over Rain100H, Rain100L, Rain300H, Rain300L, and Rain10.

Real Image Processing
Rain and fog image processing

Next, the algorithms are compared in terms of running time, keeping all settings unchanged and using pictures of different concentrations. Four concentrations of images are selected from the SPA-data test set: light rain without fog, medium rain without fog, fog without rain, and medium rain with medium fog, all of size 256 × 256. Ten images of each concentration are processed with each algorithm, and the time difference between the input image and the output result is calculated.

The results of the runtime test experiments are shown in Fig. 8. In the figure, RESCAN is the algorithm that focuses on removing raindrops, and the proposed algorithm is 0.062 s to 0.076 s faster than the RESCAN raindrop removal algorithm. It also outperforms the other algorithms in runtime on all four concentrations. The fastest processing speed is achieved on the medium rain and medium fog pictures, which take only 0.056 s, and the time for all four concentrations is below 0.1 s.

Figure 8.

Run time test results

Comparison of rain and fog removal algorithms

Foggy and rainy pictures of different concentrations in the SPA-data dataset are selected and tested with the rain and fog removal algorithm to obtain the simulation results of image rain and fog removal in this paper. To verify the effectiveness of the proposed model, several models with significant results in recent years are used for comparison; the model is tested with different fog concentrations and rain of different sizes and directions, and compared with the GMM, DDN, SSIR, ResGuideNet, JORDER, and RESCAN algorithms in terms of PSNR and SSIM values and visual quality. Among them, RESCAN focuses on removing raindrops and DDN focuses on removing fog.

Pictures of different concentrations are tested separately, and classical and recent algorithms are applied for comparison; the test metrics are PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity). Two different sets of test samples are used for each type of picture, each containing 100 pictures with different rain and fog concentrations under the same environmental conditions. The PSNR and SSIM values of the de-rained and defogged pictures of each test sample are recorded and averaged. The comparison of PSNR and SSIM values is shown in Figure 9.

Figure 9.

Comparison of PSNR and SSIM

The PSNR values of the proposed algorithm are significantly better than those of the other algorithms on pictures with rain but no fog, fog but no rain, and rain with medium fog. RESCAN, as a rain removal model, is not effective on foggy pictures, with SSIM values of 0.755 and 0.764 on fog-without-rain pictures 1 and 2, respectively. The remaining algorithms perform poorly on pictures with high fog concentrations, whereas the proposed algorithm performs better on rainy and foggy pictures of all concentrations.

Conclusion

In this paper, we propose a de-raining algorithm based on training with high-proportion rain streak image patches to improve image quality and reduce the impact of rain and fog on road recognition in complex weather. The proposed algorithm is compared with the most common methods in the field to verify its efficiency on high-proportion rain streak images.

Images with different road scenes and rain streak directions are selected to test the algorithm. Compared with real rain-free images, the proposed CNN-based high-proportion rain streak de-raining algorithm shows no over-smoothing in local areas of the de-rained image and obtains a larger and more accurate drivable road area than other methods. In processing real images, it takes 0.062 s to 0.076 s less time than RESCAN (an algorithm focused on removing raindrops).

Considering both rain streak removal accuracy and real-image processing speed, the CNN-based de-raining algorithm for high-proportion rain streak images proposed in this paper performs better on rain and fog images of all concentrations, and is more practical and general.
