Open Access

A Study on Visual Enhancement of Graphic Processing Techniques in Interactive Image Design in Digital Media Arts

19 March 2025


Introduction

Digital media art is a comprehensive art form spanning many disciplines, bringing together computer science, engineering, visual design, and music production. It emerged from the progress and popularization of computer technology, which has made artistic expression more diversified, interactive, and immediate, attracting growing attention from both artists and audiences [1,2]. Digital media art takes many forms, including but not limited to digital images, digital audio, and digital video. Its most important characteristic is its strong expressive and demonstrative power; compared with traditional art, digital media art is more flexible and diverse. Applying graphics processing technology in digital media art can therefore promote the visual enhancement of interactive images [3-5].

With the continuous development of science and technology, graphics processing has become an important research topic and application field of the digital era. A variety of graphic processing techniques make it possible to discover, identify, and analyze the information contained in graphics, including color, shape, and texture. Graphics processing technology has a wide range of applications, including computer vision, medical imaging, security monitoring, virtual reality, and artificial intelligence, where it is used to improve the quality, visibility, or information content of graphics, or to fit specific needs [6,7].

Literature [8] emphasizes the importance of image enhancement techniques. It discusses spatial-domain image enhancement processing and classifies processing methods according to representative enhancement techniques, which facilitates the evaluation of the various approaches. Literature [9] surveys image processing detection techniques of recent years and provides an in-depth meta-analysis of image forgery and of the corresponding detection and localization work on the image-tampering problem. Literature [10] provides not only the concepts and models needed to analyze digital images and to develop computer vision and human-consumption applications, but also important information on algorithm development in the CVIPtools environment, aiming to serve as a reference tool in the field of digital image processing. Literature [11] emphasizes the wide range of applications of graphic image processing techniques, whose use in design can create visual impact; it outlines computer graphics and image techniques and visual communication design, and discusses the significance of the former for realizing the latter. Literature [12] developed an integrated material painting design system for installation art with the help of image processing technology, aiming to remedy the problems of integrated material painting in installation art. Experiments confirmed that the system has a high level of intelligence: it not only enhances the brightness of material painting but also ensures a certain degree of recognizability and clarity. Literature [13] explored the state of development of visual communication and image processing technology and the directions for applying computer-based image processing in visual communication systems, and studied a ship image optimization system based on visual communication technology; the experimental results show that ship image optimization can effectively reduce the peak signal-to-noise ratio while ensuring the visibility of the image. Literature [14] emphasizes the important role of digital image technology in graphic design; based on the relationship between the two, it explores processing guidelines for digital image technology in graphic design, with the aim of providing a reference for other researchers. Literature [15] affirms the important role of optimization methods in the visual communication of image processing techniques, which help improve the quality and clarity of visual effects while ensuring user understanding; it discusses optimization methods for the visual communication of image processing techniques, and the results show that these techniques achieve visual communication optimization. Literature [16] reviews image processing and image denoising for improving overall quality through a systematic qualitative study, finding that image processing has great potential for application and development in the new era.

In this paper, the visual characteristics of the human eye are first studied in depth, laying the groundwork for the proposed application of image processing technology in interactive image design; based on cerebral cortex (Retinex) theory, interactive images are then restored by adjusting image contrast. An interactive image segmentation method based on a dual-stream fusion network is also proposed. The network consists of a diversion network and a fusion network: the image-stream branch of the diversion network takes RGB images as input to extract image information, while the interaction-stream branch takes the interaction information as input and uses it as an intermediary for extracting image information. The outputs of the two branches are fused and fed into the fusion network, whose convolutional layers extract features to complete the segmentation of the image. Experiments verify that the two image processing techniques significantly enhance the visual quality of interactive image design.

Application of graphic processing techniques in interactive image design

Computer graphics and image processing technology refers to the use of computers to digitally represent, store, process, and transmit graphics and images. Its main goal is to process images with digital technology so that they become clearer, truer, more accurate, and visually more effective. The field encompasses two major areas: graphics processing and image processing. Graphics processing is mainly used for processing linear model graphics, such as CAD design and animation production, while image processing mainly deals with two-dimensional images, such as digital photography and satellite remote sensing.

Graphics processing technology
Image Enhancement

Image enhancement is a technique that improves the quality of an image through operations such as noise reduction, sharpening, and artifact removal [17]. Common methods include histogram equalization, filtering, and sharpening. Histogram equalization maps the gray values of the original image to new gray values through a function so as to enhance the image's contrast; filtering applies a low-pass or high-pass filter to smooth the image or to highlight details in a particular frequency band; and sharpening enhances the edges and details of the image by increasing its high-frequency component.
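
As an illustration, the following minimal Python/OpenCV sketch applies the three operations just described: histogram equalization, low-pass (Gaussian) filtering, and sharpening via unsharp masking. The file names and filter parameters are placeholders, not values from this paper.

```python
# A minimal sketch of the three enhancement operations described above,
# using OpenCV and NumPy; "input.png" is a hypothetical file name.
import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# Histogram equalization: remap gray values to stretch global contrast.
equalized = cv2.equalizeHist(img)

# Low-pass (Gaussian) filtering: smooth the image and suppress noise.
smoothed = cv2.GaussianBlur(img, (5, 5), sigmaX=1.0)

# Sharpening (unsharp masking): boost the high-frequency component by
# subtracting a blurred copy from a weighted original.
blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=2.0)
sharpened = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)

cv2.imwrite("enhanced.png", np.hstack([equalized, smoothed, sharpened]))
```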

Image Restoration

Image restoration is a technique for filling in missing information, for example by interpolating the image. In fields such as digital photography and satellite remote sensing, images may for various reasons contain missing information of different kinds, such as voids, scratches, and noise. Using methods such as interpolation, this missing information can be recovered, making the image more complete and realistic.
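
A minimal sketch of such restoration using OpenCV's inpainting is shown below; the mask marking the missing regions and the file names are hypothetical placeholders.

```python
# A minimal sketch of image restoration by inpainting with OpenCV; the mask
# marks missing regions (voids/scratches) as nonzero pixels. File names are
# placeholders, not from the paper.
import cv2

img = cv2.imread("damaged.png")
mask = cv2.imread("defect_mask.png", cv2.IMREAD_GRAYSCALE)

# Telea's fast-marching inpainting fills masked pixels by interpolating from
# the surrounding region; radius 3 controls the neighborhood used per pixel.
restored = cv2.inpaint(img, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite("restored.png", restored)
```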

Image Segmentation

Image segmentation is the process of dividing an image into multiple parts or regions [18]. Common image segmentation methods include the following:

Threshold-based segmentation: the gray values in the image are divided into several groups separated by thresholds, and each pixel is then assigned to a segment according to its gray value and the group it falls into (see the sketch after this list).

Region growing: a seed point is selected first, and neighboring pixels with similar properties are added to the same region until a stopping condition is met.

Edge detection: segmentation is realized by detecting boundaries in the image, using algorithms such as Sobel and Canny.

Cluster analysis: pixels are divided into several classes according to their mutual similarity; commonly used clustering algorithms include K-means and MeanShift.
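
The sketch below illustrates two of the families above, threshold-based segmentation (Otsu's method) and cluster-based segmentation (K-means), using OpenCV; the number of clusters and the stopping criteria are assumed values for illustration.

```python
# A minimal sketch of thresholding and clustering segmentation with OpenCV.
import cv2
import numpy as np

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Threshold-based segmentation: Otsu's method picks the threshold that best
# separates the gray-level histogram into two groups.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Cluster-based segmentation: K-means groups pixels by gray-value similarity.
pixels = gray.reshape(-1, 1).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(pixels, 3, None, criteria, 5,
                                cv2.KMEANS_PP_CENTERS)
segmented = centers[labels.flatten()].reshape(gray.shape).astype(np.uint8)
```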

Visual characteristics of the human eye

Since the ultimate goal of image processing is to provide humans or machines with images that are easy to interpret and recognize, and since the evaluation of image quality and of processing effects cannot be separated from the human visual system, understanding the characteristics of that system helps guide the construction of more effective image processing methods [19].

In-depth study of the visual characteristics of the human eye and the establishment of various mathematical models have made these characteristics a focus of attention in image processing. Some widely recognized visual characteristics of the human eye are described below.

Resolving power of the human eye. The ability of the human eye to distinguish the details of a scene is called its resolving (discriminative) power. It is defined as the reciprocal of the smallest visual angle θ at which two points in the observed scene can just be distinguished, i.e., resolving power $= 1/\theta$. The principle of human-eye resolution is shown in Figure 1.

Figure 1.

Resolution of the human eye

The minimum angle of view depends on the distance between two neighboring visually sensitive cells. The larger the minimum viewing angle, the lower the ability of the human eye to distinguish details.

In addition, the human eye's resolving power for a moving scene is much lower than for a static one: its discrimination angle is about 5 times that of the static case, i.e., θ = 7.5′. This value is closely related to the eye's sense of continuity of motion. When the scene contains moving objects, even if the frame rate satisfies the requirements for visual continuity, the eye will still perceive an object as jumping rather than moving smoothly if it travels too large a distance between two consecutive frames.

Luminance discrimination vs. color discrimination. At any given luminance adaptation level, the human eye's response to changes in luminance is nonlinear. The minimum difference in light intensity at which the eye can subjectively just discern a luminance difference is usually called the visibility threshold for luminance. That is, as the light intensity I increases within a certain range, the eye cannot perceive the change; only when the intensity reaches some value I + ΔI does the eye perceive that the brightness has changed. The eye's perception of brightness is therefore expressed by the brightness contrast: $C_w = \frac{\Delta I}{I}$ (1)

Expression (1) is called the contrast sensitivity, where I is the background luminance and ΔI is the luminance difference between the target and the background. Owing to visual adaptation, the threshold contrast corresponding to the just-noticeable difference (JND) is constant, and the discrimination threshold is lowest, in the luminance range of about 2 to 1096 cd/m². When the background luminance is either much stronger or much weaker, the contrast sensitivity threshold of the eye rises and its ability to discriminate luminance differences decreases. Moreover, in image restoration, if the error of the restored image is below the contrast sensitivity, the difference between the restored image and the original will not be detected by the human eye.

If only the color differs while the brightness remains the same, the corresponding discrimination threshold is called the color discrimination threshold.

Within the color discrimination threshold, the smallest wavelength difference that can be identified when comparing monochromatic lights is called the wavelength discrimination threshold.

Visual amplitude nonlinearity. The visual system's discrimination of details depends on relative changes in image brightness rather than on the absolute brightness of the whole image, and the increment in perceived brightness ΔS can be measured by the relative brightness increment: $\Delta S = K\frac{\Delta I}{I}$ (2) where I is the objective luminance and ΔI is the relative luminance increment. Integrating Eq. (2) yields the perceived luminance: $S = K\ln I + C = K'\lg I + C$ (3)

where K′ and C are constants, and K is a constant related to the average brightness of the whole image: when the average brightness is larger or smaller, a smaller value of K can be chosen, and for the usual brightness range K can be taken as 1. Eq. (3) shows that the brightness S perceived by the human eye is linearly related to the logarithm of the objective luminance I, which is the Weber-Fechner law.

Spatial frequency characteristics. A stimulus whose luminance varies sinusoidally is presented to the subject in order to find the boundary contrast at which light and dark stripes are just perceptible; this is the Michelson contrast, expressed as $C = \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}}$ (4) where $I_{\max}$ and $I_{\min}$ are the maximum and minimum luminance of the target, respectively. The Michelson contrast varies with the thickness of the stripes, i.e., with spatial frequency, and can therefore be used to represent the spatial frequency characteristics of vision; the resulting curve is called the contrast sensitivity function (CSF), which can be interconverted with the modulation transfer function (MTF).
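
As a small worked example of the two contrast measures, the Python lines below evaluate the Weber contrast of Eq. (1) and the Michelson contrast of Eq. (4) on made-up luminance values.

```python
# A small numeric illustration of the two contrast measures defined above;
# the luminance values are invented for illustration.
I, delta_I = 100.0, 2.0             # background luminance and increment (cd/m^2)
weber = delta_I / I                 # Eq. (1): C_w = dI / I  ->  0.02

I_max, I_min = 120.0, 80.0          # brightest / darkest stripe luminance
michelson = (I_max - I_min) / (I_max + I_min)   # Eq. (4): C = 0.2
print(weber, michelson)
```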

Masking effect. The masking effect generally refers to the perception of one signal being blocked by another: the visibility of an image signal is reduced by the influence (interference) of other signals. When multiple signals are present at the same time, the interference between them is very complex; masking effects are categorized as follows.

Contrast masking

The masking effect is an important phenomenon that must be considered when describing the interaction between signals in multiple channels. Masking changes the detection threshold (JND) of the visual system, and detection can thereby be either suppressed or enhanced. The more similar or identical two signals are in spatial frequency, orientation, and position, the stronger the contrast masking effect.

Entropy masking effect

The entropy masking effect is closely related to contrast masking. The basic idea is that a distortion signal is easily detected in smooth regions of an image, whereas it can be masked in regions rich in high-frequency components.

Spatial masking

Edges in an image have a masking effect on their surrounding regions, reducing visual sensitivity to small changes in these regions. Physiologically, this phenomenon is due to lateral inhibition.

Color Masking

The sensitivity of the human eye to the R, G, and B channels of a color image differs, with sensitivity to the blue channel being the weakest. Owing to the complexity of the color masking mechanism, there is still no good way to model it.

Retinex-based image enhancement method for interactive images

Retinex is a portmanteau of retina and cortex [20], and the approach is also known as cerebral cortex theory. The theory states that the color of an object is determined by the object's ability to reflect light, so the color of an object is consistent, i.e., it is not affected by non-uniform illumination in the image.

According to Retinex theory, an image is composed of a reflected component and an incident component, as shown in equation (5): $G(x,y) = R(x,y)L(x,y)$ (5)

G(x, y) is the color image captured by the camera, R(x, y) is the reflectance image of the original image, and L(x, y) is the illumination image of the original image. The basic flow of the Retinex algorithm is as follows:

1) Take the logarithm of the original image to simplify the computation; from equation (5): $\ln(G(x,y)) = \ln(R(x,y)L(x,y)) = \ln(R(x,y)) + \ln(L(x,y))$ (6)

2) Select an appropriate function for filtering to obtain the incident component ln(L(x, y)).

3) Substitute the result of step 2) into Eq. (6) to obtain the reflected component: $\ln(R(x,y)) = \ln(G(x,y)) - \ln(L(x,y))$ (7)

4) Apply the inverse logarithmic transformation to the reflected component to obtain the reflectance image F(x, y), i.e., the enhanced image: $F(x,y) = \exp(\ln(R(x,y)))$ (8)

Enhancement methods based on Retinex theory fall into two categories: those based on global features and those based on local features. Among the local methods, center/surround Retinex algorithms are the most widely used; they can be divided into the single-scale Retinex algorithm (SSR), the multi-scale Retinex algorithm (MSR), and the multi-scale Retinex algorithm with color restoration (MSRCR).
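
A minimal single-scale Retinex (SSR) sketch following Eqs. (5)-(8) is given below; a Gaussian surround filter stands in for the illumination estimate, and the scale sigma is an assumed parameter rather than a value from this paper.

```python
# A minimal sketch of single-scale Retinex (SSR) following Eqs. (5)-(8):
# take logs, estimate the illumination ln L with a Gaussian surround filter,
# subtract to get the reflectance ln R, and exponentiate.
import cv2
import numpy as np

def single_scale_retinex(img, sigma=80.0):
    img = img.astype(np.float64) + 1.0           # avoid log(0)
    log_g = np.log(img)                          # ln G(x, y)
    # Illumination estimate: Gaussian-blurred image stands in for L(x, y).
    log_l = np.log(cv2.GaussianBlur(img, (0, 0), sigma))
    log_r = log_g - log_l                        # Eq. (7): ln R = ln G - ln L
    r = np.exp(log_r)                            # Eq. (8): invert the log
    # Stretch the reflectance back to a displayable 8-bit range.
    r = (r - r.min()) / (r.max() - r.min()) * 255.0
    return r.astype(np.uint8)

enhanced = single_scale_retinex(cv2.imread("low_light.png"))
```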

Interactive image segmentation method
Dual-stream fusion network

The 1 × 1 convolutional kernel, known from the "network in network" architecture, is equivalent to a fully connected layer applied across channels. It does the following:

Dimensionality reduction and expansion

Since the 1 × 1 convolution kernel does not change the height or width of the feature map but only the number of channels, the channel dimension of the data can be increased or decreased.

Increase nonlinearity

Because a 1 × 1 convolution keeps the size of the feature map unchanged, i.e., without loss of resolution, a nonlinear activation function applied after it significantly increases the nonlinearity of the network, making it well suited to deep neural networks.

Cross-channel information interaction

Dimensionality reduction and expansion through a 1 × 1 convolutional kernel is in fact a linear recombination of inter-channel information. For example, adding a 1 × 1, 56-channel convolution after a 3 × 3, 128-channel convolution yields the effect of a 3 × 3, 56-channel convolution: the original 128 channels are linearly combined across channels into 56 channels, which is exactly the information interaction between channels. Cross entropy measures the difference between two probability distributions, and the loss function used for network training in this paper is the cross-entropy loss.
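
The PyTorch sketch below reproduces the 128-to-56-channel example: a 1 × 1 convolution recombines channels linearly without changing the spatial size, and a ReLU after it adds nonlinearity. The feature-map size is arbitrary.

```python
# A minimal sketch of the 128 -> 56 channel reduction described above,
# using a 1x1 convolution in PyTorch.
import torch
import torch.nn as nn

x = torch.randn(1, 128, 32, 32)         # N x C x H x W feature map

reduce_channels = nn.Sequential(
    nn.Conv2d(128, 56, kernel_size=1),  # linear recombination across channels
    nn.ReLU(inplace=True),              # adds nonlinearity; H and W unchanged
)

y = reduce_channels(x)
print(y.shape)                          # torch.Size([1, 56, 32, 32])
```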

Most current deep learning models use ReLU. The ReLU function has a great advantage in the convergence speed of gradient descent: its convergence is roughly 6 times faster than that of the Sigmoid function, and ReLU also avoids the vanishing-gradient problem of Sigmoid. However, because of its sparsity, ReLU is typically used only in the hidden layers. Sparsity here refers to the number of zero outputs: when a hidden layer receives inputs in a certain range, the function produces more zeros, so fewer neurons are activated, meaning that part of the network does not participate in the computation.

Interactive information

The visual-enhancement interaction chosen in this paper segments the object based on extreme points. Specifically, the extreme points of the target are labeled, i.e., its topmost, bottommost, leftmost, and rightmost points, and a two-dimensional Gaussian heat map centered on each extreme point is created. To ensure that the region delimited by the extreme points contains sufficient context, that region is widened by a few pixels.
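
A minimal sketch of building this interaction input is shown below: each of the four extreme points is turned into a 2D Gaussian on the image grid. The sigma value and the sample coordinates are assumptions for illustration.

```python
# A minimal sketch of turning the four labeled extreme points into the 2D
# Gaussian heat map used as interaction input.
import numpy as np

def extreme_point_heatmap(points, height, width, sigma=10.0):
    """points: list of (row, col) extreme points (top, bottom, left, right)."""
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for r, c in points:
        g = np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)   # overlay one Gaussian per point
    return heatmap

hm = extreme_point_heatmap([(40, 120), (200, 130), (110, 60), (115, 190)],
                           height=256, width=256)
```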

Network structure

In this section, we propose a fully convolutional two-stream fusion network for interactive image segmentation. It is divided into two parts, the diversion network and the fusion network, where the diversion network comprises the image-stream network and the interaction-stream network. The image-stream network takes RGB images as input to extract image information; the interaction-stream network takes the interaction information as input and extracts image information through it. The outputs of the two streams are then fused and passed to the fusion network, which performs layer-by-layer feature extraction through its convolutional layers and outputs the segmentation result, completing the segmentation of the image target.

The diversion network's input consists of two components: the image and the Gaussian heat map of user interactions. The diversion network outputs reduced-resolution probability maps of the foreground, predicting whether each pixel belongs to the foreground. The network uses ResNet-101 as its backbone and consists of three parts: the image-stream network, the interaction-stream network, and the fusion network. The image-stream and interaction-stream networks consist of Conv1 and Conv2 of ResNet-101, and the fusion network consists of Conv3, Conv4, and Conv5; the image-stream features and interaction-stream features are concatenated and fed into the fusion network. To solve the channel mismatch, a 1 × 1 convolution is added before the fusion network, and an upsampling layer to 512 × 512 is added at the end of the network so that the output feature map is restored to the size of the original image.
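
The following PyTorch sketch outlines one plausible reading of this architecture, assembled from torchvision's ResNet-101 stages. The exact prediction head, the channel counts after concatenation, and the upsampling details are assumptions, not the authors' released code.

```python
# A structural sketch of the two-stream fusion network described above.
import torch
import torch.nn as nn
from torchvision.models import resnet101

class TwoStreamFusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        def stem():
            r = resnet101(weights=None)
            # Conv1 (conv+bn+relu+maxpool) and Conv2_x (layer1) of ResNet-101.
            return nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool, r.layer1)
        self.image_stream = stem()        # input: 3-channel RGB image
        self.interaction_stream = stem()  # input: interaction heat map
        self.interaction_stream[0] = nn.Conv2d(1, 64, 7, stride=2, padding=3,
                                               bias=False)  # 1-channel input
        # 1x1 conv resolves the channel mismatch after concatenation.
        self.match = nn.Conv2d(256 + 256, 256, kernel_size=1)
        r = resnet101(weights=None)
        self.fusion = nn.Sequential(r.layer2, r.layer3, r.layer4)  # Conv3-5
        self.head = nn.Conv2d(2048, 1, kernel_size=1)  # foreground logits

    def forward(self, image, heatmap):
        f = torch.cat([self.image_stream(image),
                       self.interaction_stream(heatmap)], dim=1)
        logits = self.head(self.fusion(self.match(f)))
        # Upsample the low-resolution prediction back to the input size.
        return nn.functional.interpolate(logits, size=image.shape[-2:],
                                         mode="bilinear", align_corners=False)

net = TwoStreamFusionNet()
out = net(torch.randn(1, 3, 512, 512), torch.randn(1, 1, 512, 512))
```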

Research on image enhancement effects

To verify the effectiveness and credibility of the interactive-interface image enhancement method of this research, simulations are conducted, and several other proposed methods are compared with the studied method to evaluate the image enhancement effect objectively and scientifically.

The experiments were conducted on a laboratory computer running Windows 10.

The objective evaluation is mainly done by calculating the corresponding values through mathematical formulas in order to objectively reflect certain characteristics of the image. The main metrics used are as follows:

Average gradient: this value mainly reflects the gray-level differences at image boundaries, i.e., it characterizes the sharpness of the image; the larger the value, the richer the image's levels of detail and the higher its clarity. It is calculated as $V = \frac{1}{(M-1)(N-1)}\sum_{x=1}^{M-1}\sum_{y=1}^{N-1}\sqrt{\frac{1}{2}\left[\left(\frac{\partial F(x,y)}{\partial x}\right)^{2}+\left(\frac{\partial F(x,y)}{\partial y}\right)^{2}\right]}$ where F(x, y) is the gray value of the image at point (x, y) and M × N is the image size.

Peak signal-to-noise ratio (PSNR), calculated as $z = 10\lg\frac{L_{\max}^{2}}{MSE}$ where $L_{\max}$ is the maximum gray value of the image pixels.

In addition, the mean square error of the three methods' processing results is calculated as $Q = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(r_{ij} - e_{ij}\right)^{2}$ where $r_{ij}$ and $e_{ij}$ are the gray values of the original and processed images at pixel (i, j).
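
A minimal NumPy sketch of the three metrics (average gradient, PSNR, and mean square error) is given below; it assumes 8-bit images, so $L_{\max} = 255$.

```python
# A minimal sketch of the three objective metrics defined above.
import numpy as np

def average_gradient(img):
    f = img.astype(np.float64)
    dx = f[1:, 1:] - f[:-1, 1:]   # gray-level difference along rows
    dy = f[1:, 1:] - f[1:, :-1]   # gray-level difference along columns
    return np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0))

def mse(original, processed):
    r = original.astype(np.float64)
    e = processed.astype(np.float64)
    return np.mean((r - e) ** 2)

def psnr(original, processed, l_max=255.0):
    return 10.0 * np.log10(l_max ** 2 / mse(original, processed))
```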

The average gradients of the images enhanced by the three methods are compared in Figure 2. It can be seen that the studied interactive-interface image enhancement method yields the highest average gradient, reaching 9.2% when the number of recognized images is 8. This indicates that the image hierarchy is better and the texture more prominent after enhancement by the studied method. The average gradients of the other two methods are smaller, indicating that their enhanced images still have a poor sense of hierarchy and a weaker enhancement effect.

Figure 2.

Average gradient comparison

The peak signal-to-noise ratios of the images enhanced by the three methods are compared in Figure 3. The proposed enhancement method achieves the highest PSNR: with 8 images, the PSNR reaches 17.6 dB, indicating that image distortion is minimized and image quality is effectively improved. The other two methods yield lower signal-to-noise ratios and poorer image quality after enhancement.

Figure 3.

Peak signal-to-noise ratio

The mean square error mainly represents the mean squared difference between the pixel values of the original image and the distorted image; the comparison of the three methods is shown in Figure 4. The method studied in this paper yields a lower mean square error, which remains below 8% throughout and below that of the other two methods. This indicates that images processed by the studied method suffer less distortion and that a better processing effect is obtained.

Figure 4.

Mean square error comparison diagram

Experimental results of image segmentation methods

In this paper, the JSRT dataset is chosen for evaluation, comparing the performance of different networks on this dataset.

Interactive image segmentation has found applications in many fields, and the art field is no exception: there is an increasing number of interactive frameworks for art image segmentation. To compare the performance of the framework proposed in this paper with other interactive frameworks, this chapter presents comparative experiments on the JSRT dataset.

Table 1 outlines the results of experiments on the JSRT dataset using various interaction methods: GC refers to processing with the graph cut algorithm, RG to the region growing algorithm, DEXTRE to the extreme-point interaction network, and MIDeepSeg to segmentation with point interaction and bounding boxes; the method of this paper uses the dual-stream fusion network as the main network for interaction. Among these methods, GC and RG give the worst segmentation results, with the lowest indicator values being a Sen of 70.52%, an IoU of 66.40%, and an Acc of 90.92%. The method proposed in this paper performs best, achieving the highest Sen and IoU values of 89.62% and 73.02%, respectively.

Table 1. Results of different interactive segmentation methods

Interactive mode    Sen/%    IoU/%    Acc/%
GC                  76.79    67.32    90.92
RG                  70.52    66.40    93.46
DEXTRE              83.38    70.89    96.55
MIDeepSeg           88.13    72.41    95.99
This method         89.62    73.02    98.46

Comparing the proposed interaction with other interactions shows that this method is better suited to segmenting tiny objects with many fine structures. Other interactions are designed only to provide a priori information to the algorithm or network, whereas in this paper the interaction information guides the network to iterate in a better direction. Compared with the traditional graph cut algorithm (GC) or region growing algorithm (RG), the method makes full use of the computational advantages of deep learning: it can use multiple networks to explore more candidate results and introduces user guidance into the segmentation, so it matches the user's expectations of the segmentation result well. Fig. 5 shows the true segmentation of the image, and Fig. 6, Fig. 7, and Fig. 8 show the segmentation results of GC, RG, and this paper's method with its interactive framework on the JSRT dataset, respectively. It can be clearly seen from the images that the segmentation result of the dual-stream fusion network is better than the other two methods in the fine-structure parts. The performance is as expected, and comparison with the actual labeled images confirms the excellent results.

Figure 5.

True segmentation

Figure 6.

GC algorithm image

Figure 7.

RG algorithm image

Figure 8.

Algorithm image of this article

Evaluation of visual enhancement effects
Physiological experimental data analysis

After the eye movement experiments, Shapiro-Wilk normality tests were performed on first fixation time, total fixation time, number of fixation points, and mean pupil diameter for the original interactive image design scheme and the scheme optimized with the algorithms of this paper. The normality test results for the eye movement indexes are shown in Table 2.

Table 2. Normality test of eye movement index data

Eye movement index         Scheme             Statistic   df   Significance
First fixation time        Original scheme    0.925       14   0.169
                           Optimized scheme   0.972       14   0.753
Total fixation time        Original scheme    0.936       14   0.332
                           Optimized scheme   0.957       14   0.628
Number of fixation points  Original scheme    0.959       14   0.534
                           Optimized scheme   0.933       14   0.331
Mean pupil diameter        Original scheme    0.911       14   0.124
                           Optimized scheme   0.917       14   0.137

According to the test results, all four eye movement indicators follow a normal distribution (p>0.05), so paired-sample t-tests can be performed between the original and optimized schemes. The results are shown in Table 3. According to Table 3, there are significant differences between the original and optimized schemes in first fixation time, total fixation time, number of fixation points, and mean pupil diameter (p<0.001). Put simply, the optimized scheme outperforms the original scheme on all four indicators, improving the user's visual efficiency.

Table 3. Paired-sample t-test of eye movement index data

Eye movement index          Original scheme (mean ± SD)   Optimized scheme (mean ± SD)   t        Significance (two-tailed)
First fixation time/ms      7644.612 ± 1453.234           4822.613 ± 914.756             9.305    p<0.001
Total fixation time/ms      7474.265 ± 946.527            6079.599 ± 978.069             6.425    p<0.001
Number of fixation points   35.668 ± 3.965                20.579 ± 1.406                 11.879   p<0.001
Mean pupil diameter/mm      28.441 ± 0.776                26.467 ± 0.626                 13.754   p<0.001
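
The following SciPy sketch mirrors this statistical pipeline: Shapiro-Wilk normality tests per scheme followed by a paired-sample t-test. The arrays are random placeholders, not the experimental data.

```python
# A minimal sketch of the normality test and paired t-test used above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
original = rng.normal(7644.6, 1453.2, size=14)   # e.g., first fixation time (ms)
optimized = rng.normal(4822.6, 914.8, size=14)

# Normality should hold for both schemes before the paired t-test is valid.
print(stats.shapiro(original))                   # statistic and p-value
print(stats.shapiro(optimized))
print(stats.ttest_rel(original, optimized))      # paired-sample t-test
```
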
Analysis of subjective evaluation data

To compare the original interface design scheme with the optimized one, the user's subjective cognitive load was evaluated using the NASA-TLX scale [21].

The NASA-TLX scale is a classic subjective workload evaluation tool for assessing subjective cognitive load and work stress during task performance. It provides several evaluation dimensions, each corresponding to a subscale scored from 0 to 100 in steps of 5, with the leftmost and rightmost ends corresponding to the lowest and highest scores, respectively. After completing the experimental task, the subjects scored the six evaluation indicators according to their subjective experience of information acquisition and operation on the interface during the task, and the weight of each indicator was determined. The six indicators were compared pairwise, giving 15 pairs; in each pair the relatively more important indicator was selected and assigned a score of 1, so the maximum count for each indicator is 5. The final weighted average score provides an accurate assessment of the user's subjective workload.

The weights for the NASA-TLX subjective cognitive load scale are computed as $W_i = \frac{N_i}{N}, \quad N = \sum_{i=1}^{6} N_i$

In the experiment, the importance of the six evaluation indicators is expressed as indicator weights and entered into the total cognitive load calculation: $W = \sum_{i=1}^{6} W_i R_i$ where $R_i$ is the score of indicator i, $W_i$ its weight, and $N_i$ the number of times indicator i was selected as the more important one.
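
A small sketch of this weighting computation is shown below; the pairwise counts $N_i$ and ratings $R_i$ are illustrative numbers, not the experimental scores.

```python
# A minimal sketch of the NASA-TLX weighted workload: weights W_i come from
# the 15 pairwise comparisons, and the total load W is the weighted sum of
# the six subscale ratings R_i. All numbers here are illustrative.
n_i = [5, 4, 3, 2, 1, 0]                 # pairwise "importance" counts, sum = 15
r_i = [60, 55, 40, 35, 30, 20]           # subscale ratings on the 0-100 scale

n = sum(n_i)                             # N = sum of N_i (= 15 comparisons)
w_i = [ni / n for ni in n_i]             # W_i = N_i / N
total_load = sum(w * r for w, r in zip(w_i, r_i))   # W = sum of W_i * R_i
print(round(total_load, 2))
```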

After completing the operation task, the subjects scored each of the six evaluation indexes on the NASA-TLX scale and ranked them in order of importance.

The subjective cognitive load data of the original and optimized schemes are shown in Figure 9. The NASA-TLX score indicates the level of cognitive load: the lower the score, the lower the load. The NASA-TLX score of the original scheme is 58.56 and that of the optimized scheme is 32.55. The cognitive load of the optimized scheme is therefore lower, and users can more easily recognize and understand the information in the target image, indicating that the optimized scheme designed in this paper achieves its goal of visual enhancement.

Figure 9.

Total cognitive load

Conclusion

Based on the study of users' visual enhancement needs in interactive image design, this paper proposes an interactive image enhancement algorithm based on Retinex and an interactive image segmentation algorithm based on a dual-stream fusion network.

The interactive image enhancement method designed in this paper performs best in terms of average gradient, peak signal-to-noise ratio, and mean square error, reaching 9.2%, 17.6 dB, and below 8%, respectively, when the number of images is 8. This shows that the hierarchy and texture of the enhanced image are more pronounced, distortion is small, image quality is further improved, and the processing effect surpasses the other algorithms.

The interactive image segmentation method presented in this paper achieves the best Sen and IoU scores, at 89.62% and 73.02%, respectively. The segmented images on the JSRT dataset are more complete and clear, its performance on tiny structures is better than that of the other two methods, and the gap with the true labeled images is smaller. This validates the superiority of the proposed segmentation method, which is useful for visual enhancement of interactive image design.

The optimized scheme, which integrates the two algorithms, improves on the original scheme in first fixation time, total fixation time, number of fixation points, and mean pupil diameter, indicating that it improves the user's visual efficiency.

The NASA-TLX score of the optimized scheme is 32.55, which is 26.01 points lower than that of the original scheme. This shows that the cognitive load of the optimized scheme is lower, which helps users recognize and understand the designed image and achieves visual enhancement in visual art image design.