Open access

Research on infrared image target detection technology based on deep learning

17 March 2025

Introduction

With the continued development of artificial intelligence, target recognition based on deep learning is widely used in drone inspection, autonomous driving, security, and other fields. Target recognition localizes and classifies the targets that appear in an image by means of a deep learning model, and it is an interdisciplinary technology that draws on artificial intelligence, computer vision, and related theory [1-3]. Most mainstream open source target recognition models are built on visible light images, which are the preferred input for recognition algorithms because of their high resolution, clear edge contours, and rich detail [4-6]. Visible light images, however, are easily affected by changes in illumination, which greatly increases the difficulty of target recognition. In some weather conditions in particular, the visible distance is short and visibility is poor, so the captured images cannot be used for effective target localization and recognition [7-9]. Infrared imaging, by contrast, is passive: it forms an image from the difference in radiated energy between foreground and background, is independent of weather and lighting conditions, and offers day-and-night operation, long working distance, and easy concealment [10-13]. Under the hardware constraints imposed by current materials science and imaging devices, there is still considerable room to improve the accuracy, robustness, and real-time performance of infrared target detection, and to reduce its false alarm and missed detection rates, against dynamically changing complex backgrounds [14-16]. Infrared image target recognition based on deep learning can meet the needs of specific scenarios such as long-distance and night imaging with high reliability and stability, and is therefore of research value.

Literature [17] proposes TIRNet, a thermal infrared image target detection method based on a convolutional neural network. It adopts a lightweight feature extractor and, with the support of a continuous information fusion strategy, effectively handles complex backgrounds and occlusions, which improves the validity of the recognition results. Literature [18] constructed a target detection framework for images and videos from UAV thermal infrared cameras, used a YOLO model based on a convolutional neural network to extract features from images and videos captured by a forward-looking infrared camera, and assessed the detection results with evaluation metrics. Literature [19] improves an infrared image target detector built on the YOLOv5 core by compressing channels and optimizing parameters, and proposes the YOLO-FIR detector, which comparison experiments show to have substantially improved performance. Literature [20] integrated the SPD module into a deep learning-based target detection model for infrared images and developed a new algorithm based on YOLOv8, which significantly improved recognition accuracy for low-resolution images and small targets. Literature [21] shows that vision-based crack detection methods for steel plates can detect only surface defects and not internal features, and therefore proposes an infrared thermal imaging crack detection method based on a convolutional neural network, whose detection results on database images show good robustness.
Literature [22] pointed out that passive thermal imaging, without controlled excitation, complements traditional machine vision based on visible light, while active thermal imaging, with controlled excitation, is a non-destructive inspection method for quality assessment and safety assurance of objects that do not heat themselves, and emphasized that deep learning improves the intelligence and automation of infrared machine vision. Literature [23] investigated fused visual target tracking methods for visible and infrared images and showed that deep learning based methods have the strongest recognition performance on RGB-infrared datasets. Literature [24] used an image-to-image translation framework to generate a pseudo-RGB equivalent of a given thermal image, and trained a pseudo-multimodal target detector on the rich domain features extracted from that RGB equivalent to improve target detection in thermal infrared images.

In this paper, infrared image target detection uses the CBDNet network for image denoising and then introduces an attention mechanism module. For target detection on fused images, the YOLOv5 network is improved on the basis of the Transformer network. The improved model is then ported to an embedded platform to build a hardware platform for infrared image target detection. Finally, the target detection algorithm based on the improved YOLOv5 network is verified experimentally, and the embedded deployment method is tested against practical requirements, in order to assess the practical value of the deep learning-based infrared image detection method proposed here.

Deep learning based infrared image detection
Deep learning techniques

Deep learning is a research branch of machine learning that has developed rapidly in recent years. The ultimate goal of machine learning is to realize artificial intelligence, and deep learning has brought that goal a large step closer. The core idea of deep learning is to learn the intrinsic patterns and features of a large number of data samples, to extract interpretable information about data such as images and text and summarize the underlying regularities during learning, and finally to acquire the ability to learn from and analyze data such as text, images, and sound. Deep learning has achieved better results than traditional techniques in related research. Convolutional neural networks, one of the main network models in deep learning, have been widely used in speech recognition, computer vision, natural language processing, and other fields with good results.

The main feature of a convolutional neural network is the convolution operation, which is effective at extracting the overall information of a local region. Algorithms based on convolutional neural networks therefore perform well in target detection, image classification, and other computer vision tasks. In the order of inference, the network consists of an input layer, convolutional layers, pooling layers, fully connected layers, and other parts.

The role of a convolutional layer is to define a convolution operation. The weight parameters of the layer are trained to perform a specific convolution on a tensor such as the input image, which generally serves to extract important features. The convolution operation can be written as:

$$s(t) = (x * w)(t) \tag{1}$$

where x represents the input tensor and w represents the function that defines the convolution kernel. For images, the convolution between the kernel and the input image can be regarded as a sliding multiply-and-accumulate. Because the kernel is usually much smaller than the input image, the operation extracts local features of the image. Suppose the preprocessed input image is a 5×5 matrix and the convolution kernel is 3×3. The kernel traverses the image from the upper left corner to the lower right corner. At each position, the values of the input image and the kernel at corresponding positions are multiplied and then accumulated, so each placement of the kernel yields one scalar. Following the trajectory of the kernel, each scalar is written to the corresponding position of the output matrix, which forms the overall convolution result. The size of the output feature map is:

$$N_{output} = \frac{W_{input} - F + 2P}{S} + 1 \tag{2}$$

where $W_{input}$ represents the size of the input image, F represents the size of the convolution kernel, P represents the number of pixels added by padding, and S represents the stride with which the kernel moves. Training updates the parameters of the convolution kernels. Ideally, each kernel ends up fitted to its own function, such as extracting the edges of the target or extracting a particular morphology of the target.
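The 5×5 input and 3×3 kernel described above can be traced with a short sketch. It is only an illustration under stated assumptions: the image is square, no padding is applied, the kernel values are arbitrary, and the function names `conv_output_size` and `conv2d` are hypothetical. As in most deep learning frameworks, the kernel is not flipped, so strictly speaking the sketch computes a cross-correlation.

```python
def conv_output_size(w_input, f, p=0, s=1):
    """Output size N = (W_input - F + 2P) / S + 1, using floor division."""
    return (w_input - f + 2 * p) // s + 1


def conv2d(image, kernel, stride=1):
    """Sliding multiply-and-accumulate on a square image, without padding."""
    f = len(kernel)
    n = conv_output_size(len(image), f, p=0, s=stride)
    out = [[0] * n for _ in range(n)]
    for i in range(n):          # kernel moves top to bottom
        for j in range(n):      # and left to right
            acc = 0
            for a in range(f):
                for b in range(f):
                    acc += image[i * stride + a][j * stride + b] * kernel[a][b]
            out[i][j] = acc     # one scalar per kernel placement
    return out


# Toy 5x5 input whose value grows by 1 per column and by 5 per row.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
# Illustrative 3x3 kernel that takes a horizontal difference.
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
feature_map = conv2d(image, kernel)
```

With these inputs the output is a 3×3 feature map, in agreement with the size formula for W = 5, F = 3, P = 0, S = 1.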

The main role of the activation function is to introduce nonlinearity. Because the inference computation of a neural network consists largely of linear weighting operations, its output cannot by itself fit a nonlinear objective function, so activation functions are introduced to make the network nonlinear. Commonly used activation functions are as follows:

Sigmoid function

The Sigmoid function, also known as the Logistic function, is a typical activation function. It is defined as:

$$\sigma(x) = \frac{1}{\exp(-x) + 1} \tag{3}$$

Tanh function

The Tanh function, also called the hyperbolic tangent activation function, mainly addresses the non-zero-mean output of the Sigmoid function. Its output lies in the range (-1, 1), and its graph is symmetric about the origin of the Cartesian coordinate system, so the Tanh function can be regarded as a shifted and stretched version of the Sigmoid function. Although the Tanh function performs better than the Sigmoid function in experiments, the vanishing gradient problem still exists:

$$\sigma(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)} = 2\,\mathrm{sigmoid}(2x) - 1 \tag{4}$$

ReLU function

The derivative of the ReLU function is 1 for x > 0, so the ReLU function effectively mitigates the vanishing gradient phenomenon. This has made it one of the most commonly used activation functions in convolutional neural networks:

$$\sigma(x) = \max(0, x) = \begin{cases} x, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{5}$$
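The three activation functions can be compared numerically with a minimal sketch that uses only the Python standard library. The function names are chosen here for illustration. The loop checks the identity tanh(x) = 2 sigmoid(2x) - 1, and the gradient helpers show why Sigmoid gradients shrink (they never exceed 0.25) while the ReLU gradient stays at 1 for positive inputs.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the Sigmoid function; its maximum is 0.25 at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Derivative of ReLU; the value at x = 0 is a convention (0 here).
    return 1.0 if x > 0 else 0.0


# Tanh is a shifted and stretched Sigmoid.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0)) < 1e-12
```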

A typical convolutional network combines its modules in the order convolution, activation, pooling. First, the input feature map is convolved with each kernel of the convolutional layer to obtain the output feature map. The result is then passed through the activation function, which introduces nonlinearity into the network computation. The last step is pooling, which merges and sparsifies the features extracted by the convolutional and preceding modules, enhancing the network’s perception of overall features while reducing its computation.
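The pooling step can be illustrated with a small sketch. The text does not say which pooling variant is used, so 2×2 max pooling with stride 2 is assumed here, and the function name `max_pool2d` and the input values are hypothetical.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest activation in each size x size window."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# A 4x4 feature map as it might look after convolution and ReLU.
activated = [
    [0.0, 1.0, 2.0, 0.0],
    [3.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 5.0, 0.0],
    [2.0, 1.0, 0.0, 4.0],
]
pooled = max_pool2d(activated)
```

The 4×4 map is reduced to 2×2, so later layers process a quarter of the values while the strongest response in each window is preserved.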

Fully connected layers are more common in classical fully connected networks. In convolutional neural networks, a small number of fully connected layers are usually used to integrate the image target features and other information extracted by the preceding convolutional layers and other modules, and to make the decision that produces the final output of the whole algorithm. However, a fully connected layer generally requires an input tensor of fixed size, so in practice it is often replaced with a convolutional layer to form a fully convolutional network, whose output can likewise integrate the results and draw conclusions.

Infrared Image Target Detection and Processing

Target detection is a core task and an important research topic in computer vision. Its goal is to determine whether an arbitrary image contains objects belonging to predefined categories and, if so, to return the category and location of each object efficiently and accurately. Because of the diversity of target morphology and application scenarios, target detection is challenging. To overcome these difficulties, many researchers have worked on algorithms in this field, and the algorithms have improved in many respects. At present, deep learning-based target detection algorithms can be divided into two main categories according to their detection approach: two-stage algorithms based on the generation of candidate regions, and one-stage algorithms based on regression.

Two-stage target detection algorithms first extract candidate regions from the input image with a candidate region generation algorithm, feed the candidate regions into a CNN for feature extraction, and then use the extracted features to classify and regress the selected candidate targets. Examples include R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, and R-FCN.

One-stage target detection algorithms eliminate the separate step of generating candidate regions. Instead, they perform feature extraction, target classification, and regression directly in a single convolutional neural network. Examples include the YOLO series, SSD, and RetinaNet.

Image quality directly determines whether the algorithm can perform at its best and whether detection accuracy can reach its optimum, so the acquired dataset images need to be preprocessed before recognition and detection. Basic batch preprocessing of the dataset at an early stage reduces the influence of irrelevant information on image processing and lets the model focus on the valuable target information.

In this paper, dataset preprocessing consists of two parts: denoising according to the characteristics of infrared images, and data enhancement to compensate for the limited richness of the dataset.

Image denoising deals with the various kinds of noise that may arise during image acquisition. Infrared acquisition equipment is relatively complex and the acquisition environments are diverse, so preprocessing is particularly important for infrared image datasets.

The classification of denoising algorithms is shown in Figure 1. Traditional infrared image denoising methods include spatial domain methods based on various filtering algorithms, and transform domain methods such as the Fourier transform and the wavelet transform. In recent years, as deep learning technology has matured, many deep learning based image denoising algorithms have emerged, which can be grouped into four major classes.

Figure 1.

Classification of denoising algorithms

In this paper, the CBDNet network is used for image denoising. Its structure can be divided into two parts. The first part is a five-layer fully convolutional network that serves as the noise estimation sub-network and converts the noisy observation image x into the estimated noise level map $\hat{\sigma}(x)$. The second part is a UNet with a residual structure that serves as the denoising sub-network. It takes x and $\hat{\sigma}(x)$ as inputs and produces the final denoising result y by learning the residual mapping:

$$\mathcal{R}(x, \hat{\sigma}(x); W_D) \tag{6}$$

$W_D$ in Eq. (6) denotes the network parameters of the denoising sub-network. The clean image after denoising can then be expressed as:

$$y = x + \mathcal{R}(x, \hat{\sigma}(x); W_D) \tag{7}$$
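The data flow of the two sub-networks can be sketched as follows. This is not an implementation of CBDNet: the two callables are hypothetical stand-ins for the trained noise estimation and denoising sub-networks, and the image is flattened to a list of pixel values. The sketch only shows how the estimated noise level feeds the residual mapping and how the residual is added back to the noisy input.

```python
def cbdnet_forward(x, noise_estimator, residual_net):
    """Two-stage forward pass: estimate the noise level, then add the residual."""
    sigma_hat = noise_estimator(x)          # estimated noise level map
    residual = residual_net(x, sigma_hat)   # residual mapping of the denoiser
    return [xi + ri for xi, ri in zip(x, residual)]  # y = x + residual


def toy_noise_estimator(x):
    # Hypothetical stand-in: the same noise level for every pixel.
    return [0.1 for _ in x]


def toy_residual_net(x, sigma_hat):
    # Hypothetical stand-in: subtract the estimated noise level.
    return [-s for s in sigma_hat]


noisy = [0.5, 0.7, 0.9]
denoised = cbdnet_forward(noisy, toy_noise_estimator, toy_residual_net)
```

In a real model both stand-ins would be convolutional networks whose parameters are learned, and the residual would vary from pixel to pixel.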

Infrared target detection based on improved YOLOv5

Considering the excellent detection accuracy and inference speed of the YOLO series of target detection networks, this paper designs an improved YOLOv5 network based on the Transformer network. The improved YOLOv5 network structure is shown in Fig. 2. First, the Backbone is designed according to the YOLOv5 structure. It is based on CSPDarknet-53, which stacks multiple CSP layers and convolutional modules. A Focus module and Spatial Pyramid Pooling Fast (SPPF) are introduced in the Backbone. The Focus module reduces the image size without loss of information, and SPPF merges the contextual information of targets at different scales. In addition, because the YOLO series has low accuracy in small target detection, a Transformer structure is introduced to enhance the network’s feature extraction for small targets. Specifically, an improved Bot-CSP structure is designed, and the Vision Transformer structure is introduced to improve the performance of the network.
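How a Focus-style module can halve the image size without discarding pixels is easiest to see on a toy example. The sketch below assumes the interleaved slicing pattern commonly associated with the YOLOv5 Focus layer, applied to a single-channel image stored as nested lists. The function name `focus_slice` is hypothetical, and the convolution that normally follows the slicing is omitted.

```python
def focus_slice(image):
    """Rearrange an H x W image into four (H/2) x (W/2) channels."""
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # (row offset, column offset) of each interleaved sub-grid.
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
    return [
        [[image[r][c] for c in range(dc, w, 2)] for r in range(dr, h, 2)]
        for dr, dc in offsets
    ]


image = [[r * 4 + c for c in range(4)] for r in range(4)]  # values 0..15
channels = focus_slice(image)
```

Every input pixel appears exactly once in the output, so the spatial resolution is halved while the information is moved into the channel dimension.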

Figure 2.

Improved YOLOv5 network structure

In the Neck network, the top-down structure of FPN and the bottom-up structure of PAN are retained. FPN uses up-sampling to generate high-resolution feature maps, while PAN computes a feature hierarchy consisting of feature maps at multiple scales, which is advantageous for multiscale detection. After the Backbone processing, FPN uses up-sampling to progressively recover the feature maps, which contain abstract semantic information. Skip connections then merge the Backbone feature maps at different scales into FPN to preserve the target feature regions after up-sampling. The FPN structure thus obtains higher resolution features by up-sampling the spatially coarser but semantically stronger feature maps from the Backbone, while the PAN structure enhances these feature maps. Specifically, the merge connections in PAN combine feature maps of the same spatial scale from both FPN and PAN. Although the feature maps of PAN carry lower-level semantic information, their activation locations are more accurate because they have passed through fewer down-sampling steps.

Embedded Platform Construction

This paper investigates deep learning based infrared and visible light image fusion and target detection. For the image fusion part, a two-channel auto-encoding and decoding fusion algorithm is first proposed: two feature extraction branches are designed for the infrared and visible light source images respectively, their outputs are concatenated along the channel dimension, and the fused image is reconstructed by the decoding network. An image fusion algorithm based on an improved generative adversarial network is then proposed, combining the adversarial architecture with the previous method. It uses the game between the generator and the discriminator to achieve a better fusion effect and fused images of higher quality. For target detection on the fused image, an improved YOLOv5 algorithm based on the Transformer network is designed to improve the network’s ability to obtain information from the fused image, which increases detection accuracy and reduces time complexity. The infrared image fusion and target detection in this paper run on an embedded platform, and the image fusion and target detection hardware platform is shown in Figure 3. The algorithm design and training simulation are carried out mainly on the PC platform. After the algorithm has been verified, it is ported to the embedded platform Hi3559AV100 for actual testing. The main process is as follows:

(1) Use infrared and visible light detectors to acquire data in the same scene, then clean and label the data to obtain the final dataset.

(2) Use the dataset obtained in (1) to train the deep learning-based fusion algorithm and the target detection algorithm on the PC platform.

(3) Port the fused image target detection algorithm to the embedded platform Hi3559AV100 for testing on real data.

Figure 3.

Image fusion and target detection hardware platform

Infrared image detection experiment analysis
Experimental analysis of target detection based on improved YOLOv5
Comparative Experiments on Different Attention Mechanisms

This paper uses an attention mechanism to improve the backbone network. Widely used attention mechanisms include SENet, CA, and CBAM. To verify the effectiveness of adding an attention mechanism to the backbone network, comparison experiments are carried out with these three mechanisms.

Two datasets are used in this paper. One is a self-built near infrared dataset named Near_Infrare. The other is the thermal infrared dataset released by FLIR. The former is divided into training, validation, and test sets in the ratio 8:1:1, and the latter is divided into training and test sets in the ratio 8:2. Both are labeled with three target classes: pedestrians, cars, and motorcycles for the Near_Infrare dataset, and pedestrians, cars, and bicycles for the FLIR dataset.
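The two split ratios can be reproduced with a small helper. This is a sketch under assumptions: the dataset sizes are not stated in the text, so 1000 placeholder items are used, the shuffling seed is arbitrary, and the function name `split_dataset` is hypothetical.

```python
import random


def split_dataset(items, ratios, seed=0):
    """Shuffle items and cut them into consecutive parts by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    parts, start = [], 0
    for k, ratio in enumerate(ratios):
        if k == len(ratios) - 1:
            end = len(items)  # the last part takes whatever remains
        else:
            end = start + len(items) * ratio // total
        parts.append(items[start:end])
        start = end
    return parts


# 8:1:1 split, as described for the Near_Infrare dataset.
train_set, val_set, test_set = split_dataset(range(1000), (8, 1, 1))
# 8:2 split, as described for the FLIR dataset.
flir_train, flir_test = split_dataset(range(1000), (8, 2))
```

Fixing the seed keeps the split reproducible, which matters when several models are compared on the same test set.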

The comparison results of the three attention mechanisms are shown in Fig. 4. SENet and CA give a slightly lower mAP than CBAM on thermal infrared targets, which indicates that the CBAM attention mechanism captures key features better and improves the detection rate in thermal infrared scenes, and supports the addition of CBAM to the backbone network in this paper. With CBAM in the backbone network, mAP is 0.53% and 0.44% higher than with SENet and CA, respectively, and Precision is 3.21% and 2.25% higher. Recall, however, is lower than with the other two attention mechanisms, which shows that using CBAM only in the backbone network still has shortcomings and that the overall effectiveness of the model can be improved further.

Figure 4.

Comparison of the three attention mechanisms

Comparison experiment of different sampling algorithms

The nearest neighbor up-sampling used in the original YOLOv5 model produces discontinuous gray values, which affects the detection rate. To address this, this subsection replaces the up-sampling algorithm and compares it experimentally with two other classical interpolation methods, bilinear interpolation and bicubic interpolation. Bilinear interpolation computes each new pixel value as a weighted combination of the four points surrounding the mapped point. Bicubic interpolation uses the gray values of the 16 points surrounding the sampling point to perform cubic interpolation, which gives a smoother resampling result at a higher computational cost. Precision, Recall, mAP, and inference time are used as the criteria for selecting the algorithm. The comparison results for the three interpolation methods are shown in Fig. 5. Bicubic interpolation has the highest mAP, 0.809, while nearest neighbor and bilinear interpolation reach 0.792 and 0.807, respectively, so bilinear interpolation sits in the middle. Inference time shows the opposite ordering: bicubic interpolation takes the most time, nearest neighbor interpolation takes the least, and bilinear interpolation is again in the middle. Since the gain in accuracy is judged to outweigh the additional inference time, bicubic interpolation is chosen as the up-sampling method in the rest of this paper.
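The difference between blocky and smooth up-sampling can be seen on a 2×2 toy image. The sketch below implements nearest neighbor and bilinear up-sampling in plain Python. Bicubic interpolation is omitted for brevity, the bilinear version assumes a corner-aligned sampling grid (one of several conventions in use), and the function names are hypothetical.

```python
def upsample_nearest(img, scale=2):
    """Copy each source pixel into a scale x scale block."""
    h, w = len(img), len(img[0])
    return [
        [img[r // scale][c // scale] for c in range(w * scale)]
        for r in range(h * scale)
    ]


def upsample_bilinear(img, scale=2):
    """Weight the four surrounding source pixels (corner-aligned grid)."""
    h, w = len(img), len(img[0])
    out_h, out_w = h * scale, w * scale
    out = []
    for r in range(out_h):
        y = r * (h - 1) / (out_h - 1)          # source row coordinate
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        wy = y - y0
        row = []
        for c in range(out_w):
            x = c * (w - 1) / (out_w - 1)      # source column coordinate
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            wx = x - x0
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


img = [[0.0, 90.0], [90.0, 180.0]]
nearest = upsample_nearest(img)
bilinear = upsample_bilinear(img)
```

The nearest neighbor result jumps between gray levels at block boundaries, whereas the bilinear result changes in equal steps along each row, which is the discontinuity the replacement is meant to remove.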

Figure 5.

Comparison results of the three interpolation methods

Comparative testing of different models

To verify the usability of the improved YOLOv5 detection model, the proposed algorithm is compared experimentally with nine current mainstream algorithms. Lightweight algorithms have clear advantages in parameter count, computation, and model size, which makes them easier to develop and deploy on hardware devices that must detect in real time. The first group of experiments therefore compares the proposed algorithm with the lightweight models YOLOv3-tiny, YOLOv5s, YOLOv5-ghost, YOLOv5-repvgg, and YOLOv7-tiny. Table 1 shows that the lightweight YOLOv5 model has a size of 14.37 MB. The optimized model in this paper is 2.11 MB larger than the original YOLOv5, and its parameter count and computation also increase, but its precision and recall reach 85.7% and 81.1%, a clear improvement over YOLOv5 that effectively reduces the false detection rate. Its mAP rises by 4.8 percentage points to 86.2%. The mAP of the optimized algorithm is also higher than that of the YOLOv3-tiny, YOLOv5-ghost, YOLOv5-repvgg, and YOLOv7-tiny lightweight models.

Table 1. Comparison results of the lightweight models

Model | Parameters (M) | Model size (MB) | Precision | Recall | mAP | FLOPs (G)
YOLOv3-tiny | 9.35 | 18.08 | 0.800 | 0.620 | 0.706 | 13.58
YOLOv5 | 7.69 | 14.37 | 0.827 | 0.728 | 0.814 | 16.48
YOLOv5-ghost | 4.36 | 8.58 | 0.832 | 0.728 | 0.811 | 8.78
YOLOv5-repvgg | 7.69 | 15.48 | 0.845 | 0.778 | 0.839 | 16.48
YOLOv7-tiny | 6.69 | 12.98 | 0.813 | 0.754 | 0.826 | 13.68
Ours | 9.60 | 16.48 | 0.857 | 0.811 | 0.862 | 22.68

To further verify the generality of the optimized model, a second group of comparison experiments is conducted against YOLOv5m, YOLOv6s, YOLOv7, and YOLOv8s, models with larger parameter counts and higher detection accuracy. The results are shown in Table 2. Within this second group, the optimized model in this paper has the smallest parameter count, computation, and model size. YOLOv6 and YOLOv8 are each released in five variants with different parameter counts, and the YOLOv6s and YOLOv8s models used here are the lighter s variants, whose feature extraction capability is relatively weak. The mAP of the optimized model is 2.7 percentage points higher than that of YOLOv5m and 2.2 and 2.1 percentage points higher than those of YOLOv6s and YOLOv8s, respectively. It is only 0.3 percentage points lower than that of YOLOv7 (0.862 versus 0.865 in Table 2), while the parameter count is 27.06M lower than that of YOLOv7. The experiments indicate that the optimized model based on the improved YOLOv5 has the best overall balance of accuracy and model cost among the models compared.

Table 2. Comparison results of the models with large parameter counts

Model | Parameters (M) | Model size (MB) | Precision | Recall | mAP | FLOPs (G)
YOLOv5m | 21.03 | 42.37 | 0.831 | 0.772 | 0.835 | 48.58
YOLOv6s | 17.37 | 36.47 | 0.833 | 0.786 | 0.840 | 44.88
YOLOv7 | 36.66 | 74.97 | 0.861 | 0.814 | 0.865 | 103.88
YOLOv8s | 11.29 | 22.67 | 0.835 | 0.775 | 0.841 | 29.08
Ours | 9.60 | 16.48 | 0.857 | 0.811 | 0.862 | 22.68
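The mAP differences and the parameter saving can be recomputed from the tabulated values. The short script below copies the mAP and parameter columns of Tables 1 and 2 and expresses each difference in percentage points. The dictionary and function names are chosen here for illustration.

```python
# mAP values copied from Tables 1 and 2.
MAP = {
    "YOLOv5": 0.814,
    "YOLOv5m": 0.835,
    "YOLOv6s": 0.840,
    "YOLOv7": 0.865,
    "YOLOv8s": 0.841,
    "Ours": 0.862,
}
# Parameter counts in millions, copied from Table 2.
PARAMS_M = {"YOLOv7": 36.66, "Ours": 9.60}


def gain_pp(model):
    """mAP advantage of the proposed model, in percentage points."""
    return round((MAP["Ours"] - MAP[model]) * 100, 1)


gains = {name: gain_pp(name) for name in MAP if name != "Ours"}
param_saving = round(PARAMS_M["YOLOv7"] - PARAMS_M["Ours"], 2)
```

A negative gain means the compared model has the higher mAP, which is the case only for YOLOv7.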
Embedded real-time platform target detection test

In this section, infrared image fusion and target detection based on the improved YOLOv5 algorithm are implemented on an embedded platform. Three common edge deployment schemes, Torch, Darknet, and TensorRT, are tested against the scheme in this paper. The experiments use the C++ version of each inference framework and the YOLOv5sm algorithm, first to test the impact of quantization on model performance and then to test the performance and power consumption of the different deployment schemes. They are run on the VisDrone test set and a Xavier hardware device. The test set is inferred with a batch size of one at a resolution of 1536 × 768, the results are saved as JSON, accuracy is evaluated with the pycocotools tool, and real-time performance is evaluated by the average inference time. The comparison results are shown in Table 3. The accuracy of the deployment scheme in this paper is 54.87, which is 1.04 lower than Darknet and ranks second, and its CPU occupancy is 50.21%, which is mid-range among the schemes. However, its running time, memory usage, and number of parameters are the best of the four, at 43.150 ms, 2584360 KB, and 40, respectively. The inference time is more than 50% lower than that of Darknet (43.150 ms versus 102.757 ms). Overall, the deployment scheme in this paper performs better and is better able to meet the practical needs of infrared image detection.

Table 3. Comparison results of the model deployment schemes

Scheme | Accuracy | Running time (ms) | Memory footprint (KB) | CPU occupancy (%) | Parameter quantity
Torch | 50.43 | 44.946 | 2815645 | 80.45 | 45
Darknet | 55.91 | 102.757 | 2795642 | 35.42 | 95
TensorRT | 53.32 | 52.854 | 2809325 | 68.45 | 51
Ours | 54.87 | 43.150 | 2584360 | 50.21 | 40
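The running-time comparison can be checked directly against the table. The sketch below copies the accuracy, running time, and memory columns and derives the relative time reduction with respect to Darknet and the equivalent frame rate. The frame rate is a derived figure that assumes one image per inference, and the names used are hypothetical.

```python
# (accuracy, running time in ms, memory footprint in KB) per scheme.
TABLE_3 = {
    "Torch": (50.43, 44.946, 2815645),
    "Darknet": (55.91, 102.757, 2795642),
    "TensorRT": (53.32, 52.854, 2809325),
    "Ours": (54.87, 43.150, 2584360),
}


def relative_reduction(baseline, value):
    """Fraction by which value is lower than baseline."""
    return (baseline - value) / baseline


time_cut = relative_reduction(TABLE_3["Darknet"][1], TABLE_3["Ours"][1])
fastest = min(TABLE_3, key=lambda name: TABLE_3[name][1])
smallest_memory = min(TABLE_3, key=lambda name: TABLE_3[name][2])
# Derived throughput, assuming one image per inference.
frames_per_second = 1000.0 / TABLE_3["Ours"][1]
```

The reduction relative to Darknet comes out at a little under 60%, and the proposed scheme is both the fastest and the least memory-hungry of the four.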
Conclusion

In this paper, an infrared target detection model based on an improved YOLOv5 is constructed using deep learning, and an embedded platform is built. The main results are as follows:

To address the low accuracy of target detection in infrared images, this paper introduces an attention mechanism module. The comparison of attention mechanisms shows that SENet and CA give a slightly lower mAP than CBAM on thermal infrared targets. With CBAM, mAP is 0.53% and 0.44% higher than with SENet and CA, respectively, and Precision is 3.21% and 2.25% higher. In the comparison of sampling algorithms, the mAP values of bicubic, nearest neighbor, and bilinear interpolation are 0.809, 0.792, and 0.807, respectively. Bicubic interpolation takes the most inference time, but on balance it is adopted as the up-sampling method in this paper in order to preserve model accuracy.

The precision, recall, and mAP of the infrared target detection model based on the improved YOLOv5 are 0.857, 0.811, and 0.862, respectively. The improved model has advantages over both the lightweight models and the models with large parameter counts. In particular, its mAP is higher than that of every lightweight model compared.

The accuracy and CPU occupancy of the embedded deployment scheme in this paper are 54.87 and 50.21%, respectively. The accuracy is 1.04 lower than that of Darknet, and the CPU occupancy is mid-range. However, the running time, memory usage, and number of parameters are all the best among the schemes compared, and the inference time is more than 50% lower than that of Darknet. This verifies that the embedded deployment of the infrared target detection model based on the improved YOLOv5 performs well, is better able to meet the practical needs of infrared image detection, and has application value for the development of infrared target detection.