
A study of the intersection of art and design in the advertising industry and computer science and technology

  

Introduction

Since the start of the 21st century, the rapid development of information technology has ushered the digital media era into a stage of high-speed growth; supported by computer and Internet technologies, humanity has entered the digital age [1-2]. Driven by the rapid development of computers, the Internet, and multimedia technology, human production, lifestyles, and ways of thinking have changed markedly, and art and design have likewise evolved alongside information science and technology [3-6].

As the most mainstream form of advertising communication, digital media art has strong vitality and unlimited development prospects, unmatched by the traditional information service field [7-8]. The innovativeness of digital media lets people fully experience the convenience of the information age and, moreover, feel the charm of art in an all-round way [9-10]. With the continuous development and prosperity of digital media, advertising art forms emerge endlessly, and the pace of product advertising design keeps accelerating. Advertisements that people scroll past on social software and short-video platforms are quickly forgotten; if even saturation-level marketing fails to positively stimulate the public's visual attention and desire to consume, the advertising design can be called ineffective [11-14]. At present, most advertising design focuses only on publicizing the product rather than taking the public's perspective, and fails to combine product and advertising art well. Most product advertisements do not satisfy modern audiences' aesthetic and emotional needs, because designers have not thought enough about improving the audience's acceptance of, and taste for, art [15-18].

Digital media art gives advertising art great appeal to the public. Dynamic images and short videos increase the visual impact of advertisements and attract the audience's attention, while digital imaging and virtual reality technology make advertisements more realistic and improve their immersion and experience [19-21]. The power of digital advertising will therefore only grow year by year, and the creation of new media art has pushed digital technology to the forefront [22-23].

In this era of fresh energy, art design in the digital media environment should dare to break through traditional art forms and pursue diversified development in design style and form, rather than staying confined to the traditional comfort zone; it should have the courage to experiment and to "jump off" [24-26]. As the times progress and art develops, traditional advertising design no longer stimulates people's aesthetic responses, and uniform advertising design leaves the public with no sense of freshness, which in turn undermines the commercial value of the product [27-28].

Valeriivna Pryshchenko, S. studied methodologies for creating elements of art, culture, color, and connotation in existing advertising products, covering advertisement types ranging from posters to digital media advertisements [29]. Zhang, Q. proposed guidelines that advertisement design should follow, such as the principles of innovation, aesthetics, humanity, and comprehensiveness, and pointed out that visual communication design concepts play an active role in enhancing the design effect of online advertisements [30]. Pryshchenko, S. et al. explored the use and performance of color elements in advertising design, arguing that the audience for color elements is a group with aesthetic taste and national color, mainly from Italy, Switzerland and other countries [31]. Hüttl-Maack, V. showed that injecting artistic elements into product advertisement design has no significant effect for purely hedonistic products, but for moderately hedonistic products it influences, to some extent, the preferences and consumption intentions of people with a high level of artistic interest [32]. Estes, Z. et al. conducted assessment experiments revealing that branding significantly affects the emotional perceived value conveyed by artistic elements in advertising design, and that this emotional mediation is more pronounced for utilitarian than for hedonic products [33].

Artistic design elements are very important for advertising product design, and information-technology-empowered advertising design is equally meaningful. Liu, Z., based on the interactive characteristics of advertising, proposed a scientific fusion of information technology with visual communication design concepts to enhance the interaction between people and advertisements [34]. Yang, J. describes the decline of traditional art forms in the new media era and explores the innovation of traditional hand-drawn art and the forms and paths of its incorporation into new media advertising design [35]. Gao, Y. et al. combined observation, case study, and questionnaire survey methods to examine the practice of digital media art in advertising design, clarifying that large-scale application of digital media technology in advertising design is the future development trend of the advertising industry [36]. Future advertising design should pay more attention to the organic integration of computer science and technology with art design, while current research remains relatively shallow, consisting mainly of descriptive social-practice studies whose depth still needs to be improved.

In this paper, the Generative Adversarial Network (GAN) is selected as the technical support for research on artistic design in the advertising industry, covering two parts: advertisement layout generation and advertisement image style migration. Based on the layout principles of LayoutGAN, an advertisement layout generation model, FL-GAN, composed of a multimodal embedding network and a layout generation network is proposed, and its loss function is constructed by the least squares method. The multimodal embedding network and the layout generation network learn the semantic features of users' preferred clothing advertisement layouts and then generate reasonable and attractive advertisement layouts. To address style migration problems such as insufficiently pronounced image style, contour constraints are introduced and an advertisement image style migration model based on contour constraints is proposed. Erosion and blurring techniques are used to simulate the diffusion effect of pigment and to constrain the object contours in the content image, and SmoothL1 replaces L1 to optimize the loss function and improve the stability of model training. Simulation experiments on advertisement image style migration test the effectiveness of the proposed style migration model, which is then applied to the design of international cosmetic print advertisements; eye-tracking data are collected to analyze the practical application effect and performance.

Advertisement layout generation model based on generative adversarial networks

The layout generation model aims to learn the relationships among the layout elements of a page so as to automatically generate layouts with reasonable typography and aesthetic appeal. The layout design of an advertisement relates to the hierarchical structure, functional partitioning, and information communication of the page, which in turn shape the consumer's visual flow; the key to such design is to follow basic design guidelines and aesthetics. This chapter combines computer science and technology to propose an advertisement layout generation model based on a generative adversarial network, realizing efficient generation of advertisement layouts and providing the basis for the subsequent style migration of advertisement painting images [37].

LayoutGAN structural principles

LayoutGAN is a layout model for graphic design that synthesizes layouts by modeling the geometric parameters (size, position) of different graphic elements [38]. Before LayoutGAN was proposed, there were two main structures for data-driven typographic layout: one layout network consists of fully connected networks with geometric parameters as input and output, and the other consists of convolutional neural networks with wireframe diagrams as input and output. LayoutGAN combines the two structures to realize rendering from micro-geometric parameters to layout wireframes: the model's inputs are geometric parameters, which are mapped into new geometric parameters and the corresponding wireframes as output. The advantage of LayoutGAN over other layout models is that its ability to recognize the authenticity of layouts is effectively improved by introducing and training an additional discriminator.

LayoutGAN generator composition

The generator first feeds randomly generated category labels and bounding boxes into a self-encoding model. The random inputs are processed by an encoder consisting of a multilayer fully connected network and a deep neural network built from multilayer self-attention, so that the input categories and geometric parameters are refined from random values toward realistic ones while capturing the global relationships among the layout elements. The multilayer fully connected network first embeds the class label vector and geometric parameters of each graphic element, and the self-attention module embeds the features of each element. Denoting $f(p_i, \theta_i)$ as the embedded feature of graphic element $i$, the refined feature representation $f'(p_i, \theta_i)$ is obtained by a contextual residual learning process defined as:

$$f'(p_i, \theta_i) = W_r \frac{1}{N} \sum_{j \neq i} H\big(f(p_i, \theta_i), f(p_j, \theta_j)\big)\, U\big(f(p_j, \theta_j)\big) + f(p_i, \theta_i)$$

In the above equation, the unary function $U$ computes the embedded feature of element $j$, $H$ computes a scalar relation between elements $i$ and $j$, and the matrix $W_r$ computes the weights of the linear embedding that generates the contextual residual of $f(p_i, \theta_i)$ for feature refinement. The dot-product form of $H$ is:

$$H\big(f(p_i, \theta_i), f(p_j, \theta_j)\big) = \varphi\big(f(p_i, \theta_i)\big)^{T} \phi\big(f(p_j, \theta_j)\big)$$

where both $\varphi(f(p_i, \theta_i)) = W_\varphi f(p_i, \theta_i)$ and $\phi(f(p_j, \theta_j)) = W_\phi f(p_j, \theta_j)$ are linear embeddings.
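For intuition, the following is a minimal sketch of this contextual residual module, assuming PyTorch; the class name and dimensions are illustrative, not the authors' implementation.

```python
# Sketch of the contextual residual refinement over N embedded layout elements.
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.W_phi = nn.Linear(dim, dim, bias=False)  # linear embedding phi
        self.W_psi = nn.Linear(dim, dim, bias=False)  # second linear embedding
        self.W_u = nn.Linear(dim, dim, bias=False)    # unary function U
        self.W_r = nn.Linear(dim, dim, bias=False)    # residual projection W_r

    def forward(self, f):  # f: (batch, N, dim) embedded element features
        N = f.size(1)
        # H(f_i, f_j) = phi(f_i)^T psi(f_j), a dot-product relation score
        H = torch.bmm(self.W_phi(f), self.W_psi(f).transpose(1, 2))  # (B, N, N)
        H = H * (1.0 - torch.eye(N, device=f.device))  # exclude j == i
        context = torch.bmm(H, self.W_u(f)) / N        # (1/N) * sum_j H_ij U(f_j)
        return f + self.W_r(context)                   # residual refinement f'
```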

LayoutGAN discriminator composition

The discriminator linearly renders the data produced by the generator and computes the adversarial loss of the geometric parameters. The initial wireframe image input to LayoutGAN is continually trained adversarially against the generated wireframe image, and a CNN-based discriminator distinguishes real from fake wireframes, thereby learning the visual attributes of the classified layout. In the discriminator's rendering process, assume there are $N$ layout elements with parameters $\{(p_1, \theta_1), \ldots, (p_N, \theta_N)\}$. The grayscale image rendered for an element is $F_\theta(x, y)$, where $\theta$ encodes the shape (dots, rectangles, triangles), and each individual $F$ has size $W \times H$, with $W$ and $H$ the width and height of the elements in pixels. The CNN layer output is a multichannel image $I$ of size $W \times H \times M$, where $M$ is the number of element types, each channel corresponds to an element type, and $p_{i,c}$ is the probability that element $i$ belongs to class $c$. The pixel $(x, y)$ of the output image $I$ can be understood as the class activation vector of that pixel, defined by:

$$I(x, y, c) = \max_{i \in [1 \ldots N]} p_{i,c}\, F_{\theta_i}(x, y)$$

Layout element wireframe rendering

In the wireframe rendering diagram, the black grid represents the grid locations of the target image, the blue solid lines are differentiable functions of the geometric parameters and class probabilities represented as rasterized wireframes, and the orange dash-dotted lines represent the mapping of graphic elements onto the grid. The wireframe rendering of graphic elements allows the gradient to back-propagate to the geometric parameters and class probabilities for joint optimization.

A rectangular element can be represented by the coordinates of its upper-left and lower-right corners as $\theta = (x^L, y^T, x^R, y^B)$, with rectangle $i$ given by $\theta_i = (x_i^L, y_i^T, x_i^R, y_i^B)$. The rendering of the rectangle over $(x, y)$-space is:

$$F_{\theta_i}(x, y) = \max\Big( k(x - x_i^L)\, b(y - y_i^T)\, b(y_i^B - y),\; k(x_i^R - x)\, b(y - y_i^T)\, b(y_i^B - y),\; k(y - y_i^T)\, b(x - x_i^L)\, b(x_i^R - x),\; k(y_i^B - y)\, b(x - x_i^L)\, b(x_i^R - x) \Big)$$
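A hedged sketch of this differentiable rendering for a single rectangle follows, assuming PyTorch. The exact shapes of the kernels $k$ and $b$ are my assumptions (a narrow triangular peak along an edge line and a soft gate confining the edge to the rectangle's extent), since the text does not specify them.

```python
# Sketch of differentiable rectangle wireframe rendering (kernel shapes assumed).
import torch

def k(d, thickness=1.0):
    return torch.relu(1.0 - d.abs() / thickness)   # peaks where d == 0 (the edge)

def b(d, slope=1.0):
    return torch.clamp(slope * d, 0.0, 1.0)        # soft indicator of d >= 0

def render_rect(xL, yT, xR, yB, W=64, H=64):
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    in_y = b(ys - yT) * b(yB - ys)                 # inside the vertical extent
    in_x = b(xs - xL) * b(xR - xs)                 # inside the horizontal extent
    edges = torch.stack([k(xs - xL) * in_y,        # left edge
                         k(xR - xs) * in_y,        # right edge
                         k(ys - yT) * in_x,        # top edge
                         k(yB - ys) * in_x])       # bottom edge
    return edges.amax(dim=0)                       # F_theta_i(x, y)

# Channel c of the discriminator input I is then the element-wise maximum of
# p[i, c] * render_rect(*theta[i]) over all N elements i.
```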

With reference to the network structure of the LayoutGAN model and the layout generation principle, this paper proposes the content-aware model FL-GAN for generating print advertisement layouts.

FL-GAN model network structure design

FL-GAN consists of two parts: a multimodal embedding network and a layout generation network. The multimodal embedding network consists of multiple encoders, namely an image encoder, a text encoder, and an attribute encoder. These encoders form a semantic embedding network that learns the multimodal features of the layout, modeling the conditional distribution of the advertisement layout in terms of image vision, text semantics, and design attributes: the image encoder learns visual features, the text encoder learns textual features, and the attribute encoder learns content features. The layout generation network consists of three parts, the generator, the discriminator, and the encoder; it takes the multimodal feature y as its condition and learns the high-level features of the layout, with the generator and encoder together forming a generative network that learns adversarially against the discriminator.

Constructing a multimodal embedding network

The multimodal embedding network is jointly constructed from the image encoder, the text encoder, and the attribute encoder, whose role is to learn layout element features and fuse them into the category label y. The network takes layout images, keywords, and design attributes as inputs and outputs the corresponding feature vectors, which are reshaped by a fully connected network into vectors of the dimension expected by the layout generation network.
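As a rough illustration of this fusion, the sketch below assumes PyTorch and pre-extracted per-modality feature vectors; all layer sizes and names are invented for the example, not the authors' architecture.

```python
# Sketch: three encoders fused into the condition vector y by a fully connected layer.
import torch
import torch.nn as nn

class MultimodalEmbedding(nn.Module):
    def __init__(self, img_dim=512, txt_dim=300, attr_dim=32, out_dim=128):
        super().__init__()
        self.image_encoder = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU())  # visual features
        self.text_encoder = nn.Sequential(nn.Linear(txt_dim, 128), nn.ReLU())   # textual features
        self.attr_encoder = nn.Sequential(nn.Linear(attr_dim, 64), nn.ReLU())   # design attributes
        self.fuse = nn.Linear(256 + 128 + 64, out_dim)  # reshape to the generator's input dim

    def forward(self, img_feat, txt_feat, attr_feat):
        fused = torch.cat([self.image_encoder(img_feat),
                           self.text_encoder(txt_feat),
                           self.attr_encoder(attr_feat)], dim=-1)
        return self.fuse(fused)  # condition vector y
```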

Constructing Layout Generation Networks

The layout generation network introduces an encoder on top of the generator and discriminator; the encoder takes user-preferred advertisement layouts x as input and learns layout-aware features. The layout generator and discriminator follow the principle of the Conditional Generative Adversarial Network (CGAN): the generator receives as input a random vector z of dimension 128 and a category label y of the same dimension, where y is the output feature of the multimodal embedding network [39]. The generator maps the random vector z, conditioned on y, to a layout sample $\bar{x}$ with the same distribution as the training data.
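A minimal CGAN-style generator sketch consistent with this description is shown below; the hidden sizes and the flat (N elements x 4 geometric parameters) layout parameterization are assumptions for illustration.

```python
# Sketch: conditional generator mapping (z, y) to a layout sample.
import torch
import torch.nn as nn

class LayoutGenerator(nn.Module):
    def __init__(self, z_dim=128, y_dim=128, n_elements=9):
        super().__init__()
        self.n_elements = n_elements
        self.net = nn.Sequential(
            nn.Linear(z_dim + y_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_elements * 4), nn.Sigmoid(),  # (xL, yT, xR, yB) in [0, 1]
        )

    def forward(self, z, y):
        out = self.net(torch.cat([z, y], dim=-1))          # condition by concatenation
        return out.view(-1, self.n_elements, 4)            # layout sample conditioned on y
```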

Constructing the loss function

The choice of loss function is important for model performance [40]. The least squares method is a common optimization approach for finding the optimum of an objective function; it estimates parameters via the least squared error, minimizing the total residual between the model's true and predicted values [41]. The least squares method is therefore used to construct the loss functions in this experiment. Following this method, the loss functions of the discriminator, generator, and encoder are, respectively:

$$L_{GAN}^{D} = \frac{1}{2}\big(D(x, E(x, y), y) - 1\big)^2 + \frac{1}{2}\big(D(G(z, y), z, y)\big)^2$$

$$L_{GAN}^{G} = \frac{1}{2}\big(D(G(z, y), z, y) - 1\big)^2$$

$$L_{E} = L_{rec} + L_{KL}$$

In the encoder loss above, $L_{rec}$ denotes the reconstruction loss, which measures the difference between the reference image and the reconstructed image and thus ensures cycle consistency between the encoder and the generator. $L_{KL}$ denotes the KL-divergence loss, which draws the output distributions of the encoder and the generator close to each other, ensuring that their feature vectors lie in approximately the same space.
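These least-squares losses translate directly into code; the sketch below assumes PyTorch and callables D, G, E with the signatures used in the formulas above.

```python
# Sketch of the least-squares (LSGAN-style) losses for discriminator, generator, encoder.
import torch

def d_loss(D, E, G, x, y, z):
    real = D(x, E(x, y), y)          # discriminator score on a real layout
    fake = D(G(z, y), z, y)          # score on a generated layout
    return 0.5 * ((real - 1.0) ** 2).mean() + 0.5 * (fake ** 2).mean()

def g_loss(D, G, y, z):
    fake = D(G(z, y), z, y)
    return 0.5 * ((fake - 1.0) ** 2).mean()

def e_loss(rec_loss, kl_loss):
    # L_E = L_rec + L_KL: reconstruction plus KL-divergence terms
    return rec_loss + kl_loss
```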

Advertisement Image Style Migration Model Based on Generative Adversarial Networks

The previous chapter proposed a generative adversarial network-based advertisement layout generation model to automate the generation of advertisement layouts. This chapter continues to build on generative adversarial network technology to perform image style migration on the typography of the generated layouts, enhancing the artistic design effect of the advertisement.

Overall network structure

This chapter proposes a style migration method for art painting images based on contour constraints. The training network contains generator G, generator F, and discriminator D. The encoder VGG extracts features from the input image, generator G converts the real image into the target-domain image, generator F converts the target-domain image output by G back to the source domain, and discriminator D judges whether the converted image is realistic.

Compared with traditional methods, the network in this chapter models contours through edge detection and image processing techniques. This not only improves the quality of the migration but also allows the complex color-changing effects of the artistic creation process to be well simulated.

Optimization of loss constraints and loss functions
Contour constraints

Contour constraints were developed to describe clearly the contours of objects in art design images and to keep the global color tone and diffusion effect consistent between the real art design y and the generated art design G(x); for example, the main body of a house and the surrounding buildings should retain strong contours in the generated image. To model oil paintings of different styles uniformly, the contour constraints apply erosion operations and Gaussian blurring to simulate pigment diffusion. By adjusting the size of the erosion kernel and the standard deviation of the Gaussian blur, contour constraints adapted to the painting region can be formulated, and the eroded and blurred image better exhibits the hue and diffusion effects of the migrated image.

The Gaussian blur function $G_{k,l}$ is given below, where $\ominus$ is the erosion operator, $B$ is the erosion kernel, $y_{eb}$ is the real painting after erosion and blurring, $G(x)_{eb}$ is the generated painting after erosion and blurring, and $D_I$ is the adversarial discriminator trained to distinguish $y_{eb}$ from $G(x)_{eb}$:

$$G_{k,l} = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{k^2 + l^2}{2\sigma^2}\right)$$

$$y_{eb}(i, j) = \sum_{k,l} (y \ominus B)_{i+k,\, j+l}\, G_{k,l}$$

$$G(x)_{eb}(i, j) = \sum_{k,l} (G(x) \ominus B)_{i+k,\, j+l}\, G_{k,l}$$

The loss of the contour constraint is defined as:

$$L(G, D_I, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}\big[\log D_I(y_{eb})\big] + \mathbb{E}_{x \sim p_{data}(x)}\big[\log\big(1 - D_I(G(x)_{eb})\big)\big]$$
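The erosion-plus-blur preprocessing can be sketched with standard OpenCV operations; the kernel size and sigma below are illustrative tuning knobs, matching the adjustments described above.

```python
# Sketch: erode to strengthen dark contours, then Gaussian-blur to simulate
# pigment diffusion, producing y_eb / G(x)_eb for the contour discriminator.
import cv2
import numpy as np

def erode_and_blur(img: np.ndarray, kernel_size=3, sigma=2.0) -> np.ndarray:
    B = np.ones((kernel_size, kernel_size), np.uint8)  # erosion kernel B
    eroded = cv2.erode(img, B)                         # y (-) B
    return cv2.GaussianBlur(eroded, (0, 0), sigma)     # convolve with G_{k,l}

# The contour discriminator D_I is then trained to tell apart
# erode_and_blur(real_painting) from erode_and_blur(generated_painting).
```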

Optimization of the loss function

The L1 loss in CycleGAN networks is often used to measure the difference between the generated image and the real image, encouraging the generated image to be as close as possible to the real one. However, the L1 loss may in some cases make the detail of the generated image too smooth, losing features such as sharp edges and fine textures of the real image. We therefore replace the L1 loss in the generator's cycle consistency loss and identity loss with the SmoothL1 loss. The L1 loss is:

$$\mathrm{loss}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \big|y_i - f(x_i)\big|$$

where $y_i$ is the true value, $f(x_i)$ is the predicted value, and $n$ is the number of sample points. The SmoothL1 loss is:

$$\mathrm{loss}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \begin{cases} 0.5\,\big(y_i - f(x_i)\big)^2, & \text{if } \big|y_i - f(x_i)\big| < 1 \\ \big|y_i - f(x_i)\big| - 0.5, & \text{otherwise} \end{cases}$$

By contrast, the SmoothL1 loss is insensitive to outliers: it uses a squared function where the absolute error is small (less than 1) and an absolute value function where the error is large. In CycleGAN, this means that when some pixel values differ extremely between the generated and real images, the SmoothL1 loss can be more stable than the L1 loss, and the generated image preserves better detail. Because small errors incur a quadratic penalty, whose gradient is smaller than the constant gradient of L1, fine detail such as sharp edges and textures is not smoothed away as aggressively. The SmoothL1 loss thus alleviates, to a certain extent, the over-smoothing problem of the L1 loss and improves the quality of the migrated images.
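A small worked example illustrates the difference; PyTorch's built-in nn.SmoothL1Loss with its default beta of 1.0 matches the piecewise definition above.

```python
# Worked comparison of L1 vs SmoothL1 on a toy prediction with one outlier.
import torch
import torch.nn as nn

pred = torch.tensor([0.2, 1.5, 3.0])
target = torch.tensor([0.0, 1.0, 0.0])      # last pair is an "outlier" error of 3

l1 = nn.L1Loss()(pred, target)              # mean(|y - f(x)|) = 3.7 / 3
smooth = nn.SmoothL1Loss()(pred, target)    # squared below 1, linear above
# per-element SmoothL1: 0.5*0.2^2 = 0.02, 0.5*0.5^2 = 0.125, 3 - 0.5 = 2.5
print(l1.item(), smooth.item())             # ~1.233, ~0.882: the outlier dominates less
```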

Simulation experiment on the migration of advertising image styles

To verify the effectiveness of the proposed generative adversarial network-based advertisement image style migration model, this chapter carries out simulation experiments on advertisement image style migration, comparing our model with three other image style migration methods, SNI, DAT, and TransEditor, as well as the commonly used StyleGAN-2. To ensure a fair comparison, all methods were retrained on the same dataset using the same configuration.

Comparative analysis of generated results

For the quantitative comparison, the commonly used FID and PPL metrics were used to evaluate the generation quality of image style migration. The quantitative results on the traditional painting elements image dataset are shown in Table 1. Except for Gongbi (fine brushwork) landscape painting, where StyleGAN-2 achieves a slightly better FID, our model's FID outperforms the other comparison methods on all datasets. On the decoupling index PPL, our model scores 65, 45, 27.5, 79.1, and 48.6 on Thangka flower painting, ink landscape painting, face painting, potted plant painting, and Gongbi landscape painting, respectively, lower than the other comparison models. The decoupling performance of our method on the traditional painting dataset is thus better than the other methods, and it can better control the generation of painting element images with multiple combinations of content and style.

Table 1. FID and PPL indicators

| Dataset | FID↓ StyleGAN-2 | FID↓ SNI | FID↓ DAT | FID↓ TransEditor | FID↓ Ours | PPL↓ StyleGAN-2 | PPL↓ SNI | PPL↓ DAT | PPL↓ TransEditor | PPL↓ Ours |
|---|---|---|---|---|---|---|---|---|---|---|
| Thangka flower painting | 26.1 | 67.4 | 68.7 | 26.1 | 21.7 | 106.7 | 154.2 | 142 | 137.7 | 65 |
| Ink landscape painting | 42.3 | 126.8 | - | 64.6 | 41.4 | 66.3 | 244.3 | - | 206.8 | 45 |
| Face painting | 33.9 | 31.7 | 126.3 | 53.3 | 24.4 | 61.7 | 261 | 174.8 | 75.3 | 27.5 |
| Potted plant painting | 30.1 | 102.5 | 60.3 | 59.1 | 28.7 | 117.7 | 163 | 84.3 | 89 | 79.1 |
| Gongbi landscape painting | 45.7 | 76.3 | 148.5 | 91.2 | 48.3 | 57.4 | 370.4 | 76.4 | 137.9 | 48.6 |
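For reference, the FID values reported here follow the standard definition (the Fréchet distance between Gaussians fitted to Inception features); a minimal sketch, assuming precomputed feature matrices of shape (n_samples, 2048), is:

```python
# Sketch of the standard FID computation from precomputed Inception features.
import numpy as np
from scipy import linalg

def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    mu1, mu2 = feats_real.mean(0), feats_fake.mean(0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(c1 @ c2)               # matrix square root of C1*C2
    if np.iscomplexobj(covmean):
        covmean = covmean.real                    # drop tiny numerical imaginary parts
    return float(((mu1 - mu2) ** 2).sum() + np.trace(c1 + c2 - 2 * covmean))
```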

To further compare the decoupling performance of the different generation methods, decoupling scores were measured not only in the conventional latent space but also in the Z Gaussian noise space, the WC content space, and the WS style space. The quantitative decoupling results of all methods in the different spaces on the conventional painting element dataset are shown in Table 2. TransEditor was tested only in the WC content space and WS style space because of its dual-space design, and StyleGAN-2, as a non-decoupled method, is not compared. Our method achieves the best decoupling performance in all spaces. Taking face painting as an example, the decoupling scores of our method in the Z Gaussian noise space, WC content space, and WS style space are 53.5, 26.3, and 2.3, respectively, lower than the other methods.

Table 2. Decoupling performance

| Dataset | Evaluation index | SNI | DAT | TransEditor | Ours |
|---|---|---|---|---|---|
| Thangka flower painting | Z PPL↓ | 287.4 | 299.1 | - | 84.6 |
| Thangka flower painting | WC PPL↓ | 156.3 | 89.4 | 123 | 46 |
| Thangka flower painting | WS PPL↓ | 55.6 | 52.1 | 44.6 | 8.4 |
| Ink landscape painting | Z PPL↓ | 455.9 | - | - | 95.7 |
| Ink landscape painting | WC PPL↓ | 249.6 | - | 145 | 39.8 |
| Ink landscape painting | WS PPL↓ | 233.4 | - | 59.5 | 3 |
| Face painting | Z PPL↓ | 494.3 | 911.7 | - | 53.5 |
| Face painting | WC PPL↓ | 219.1 | 93.9 | 69.6 | 26.3 |
| Face painting | WS PPL↓ | 179.8 | 34.7 | 3.2 | 2.3 |
| Potted plant painting | Z PPL↓ | 233.7 | 132.9 | - | 126.4 |
| Potted plant painting | WC PPL↓ | 159 | 77.4 | 83.8 | 75.7 |
| Potted plant painting | WS PPL↓ | 87.7 | 5.2 | 6.1 | 3.1 |
| Gongbi landscape painting | Z PPL↓ | 670.5 | 160.3 | - | 71 |
| Gongbi landscape painting | WC PPL↓ | 433.7 | 71.3 | 93.7 | 42.2 |
| Gongbi landscape painting | WS PPL↓ | 261.8 | 9.5 | 45.5 | 5 |

To compare each method's ability to generate high-quality images, subjective tests were also conducted. A total of 800 images were randomly generated for each method. Because scoring traditional paintings differs from scoring natural everyday images, 10 professional observers were invited to classify the generated images into two levels, "acceptable image quality" and "unacceptable image quality". The subjective test results are shown in Figure 1. Compared with the other methods, our method has a higher percentage of acceptable image quality: 66.8% of its images were judged by the professional observers to reach an acceptable level of generation quality. Our method thus yields more high-quality images than the other methods, which partly validates the effectiveness of the proposed generative network.

Figure 1. Results of subjective testing

Extended Experiments on Natural Image Generation

To demonstrate the generality of our method, we performed extended experiments on several natural image datasets: the Oxford 102 Flower real flower dataset, the Caltech-UCSD Birds-200-2011 bird dataset, an automobile dataset, and the Flickr-Faces-HQ high-definition face dataset. For a fair comparison, both our method and StyleGAN-2 were retrained on the same datasets using the same configuration. The quantitative comparison results are shown in Table 3. Except on the bird dataset, where our method's PPL score of 180.7 is higher than StyleGAN-2's and its performance is slightly worse, the FID and PPL scores on the other datasets all outperform StyleGAN-2. Although the proposed method is designed for advertising painting element images, it still achieves excellent generation quality and decoupling performance on natural images, which verifies its versatility.

Table 3. Quantitative comparison

| Dataset | Resolution | Quantity | StyleGAN-2 FID↓ | StyleGAN-2 PPL↓ | Ours FID↓ | Ours PPL↓ |
|---|---|---|---|---|---|---|
| Real flowers | 128 | 8180 | 19.1 | 56.2 | 16.7 | 53.7 |
| Birds | 128 | 8165 | 12.3 | 164.3 | 9 | 180.7 |
| Car | 128 | 8846 | 17.1 | 101.2 | 16.1 | 97.7 |
| HD face | 256 | 50000 | 11.4 | 47 | 9.9 | 45.7 |
Practice in applying style migration to advertising images

This chapter applies the proposed generative adversarial network-based advertisement image style migration model to the design of international cosmetic print advertisements, producing cosmetic advertisement design works. Eye-tracking technology is used to accurately reflect the subjects' attention and cognition when viewing the advertisements, exploring the effect and performance of the style migration model as applied to advertisement design.

Experimental design
Experimental subjects

Eye-tracking studies usually use relatively small samples; a sample size in the range of 12 to 63 is considered appropriate. Given that the advertisements designed in this chapter are for cosmetics, and that, in light of existing eye-movement studies, college students have a certain amount of cosmetic purchasing experience and aesthetic judgment, selecting college students for the eye-tracking experiment is generally considered feasible and representative. A computer was therefore used to randomly select 40 undergraduate and graduate students as subjects, all with normal or corrected-to-normal vision, normal color vision, and Chinese as their native language; all volunteered to participate, and all had previous experience purchasing cosmetics. In total, 50 subjects were recruited for the experiment, and each subject viewed 8 experimental stimulus images of print advertisements, including the advertisement designed with the proposed advertisement image style migration model.

Experimental equipment

The Tobii Pro Lab X3-120 is a high-performance portable eye tracker with a sampling rate of up to 250 Hz that can be used in a variety of settings; it can be secured beneath the monitor, with the subject seated in front of the monitor, to conduct the experiment. The monitor is 15.6 inches with a screen resolution of 1920 x 1080. Tobii Pro Lab eye-tracking software was used to collect the subjects' eye movement data during the experiment.

Analysis of eye movement data

To better analyze the experimental data, this study first performed basic processing to summarize four eye-movement indexes for the areas of interest (spokesperson, brand name, slogan, and product) of the eight print advertisements. From the gaze data in the different interest areas, the differing effects of the various parts of the cosmetic print advertisements on subjects' attention can be identified. The eight print advertisement images are coded A1 to A8, where A1 is the work designed with the proposed advertisement image style migration model.
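A hypothetical sketch of this basic processing step follows, assuming per-trial records with the four indexes already extracted; all column names are invented for illustration.

```python
# Sketch: average the four eye-movement indexes per advertisement and area of interest.
import pandas as pd

fixations = pd.read_csv("fixations.csv")  # assumed columns: subject, ad, aoi,
                                          # gaze_s, n_fix, first_fix_time_s, first_fix_dur_s
summary = (fixations
           .groupby(["ad", "aoi"])[["gaze_s", "n_fix", "first_fix_time_s", "first_fix_dur_s"]]
           .mean()
           .round(3))
print(summary.loc["A1"])                  # indexes for the ad designed with our model
```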

Spokesperson’s Area of Interest

The eye movement data for the spokespersons of the eight print advertisements are shown in Table 4. Subjects' gaze at the spokesperson image in the A1 advertisement was the longest and most frequent, with means of 1.676 s and 12.622 fixations, while gaze at the oriental spokesperson image in A6 was the shortest and least frequent, with means of 1.073 s and 8.54 fixations. The first fixation time was shortest for the western spokesperson image in A4 and longest for the spokesperson image in A8. In terms of the duration of the first fixation, subjects' first fixation was longest on the spokesperson image in A8 and shortest on that in A2. Although A1, designed with this advertisement image style migration model, does not attract consumers' attention the fastest, it is the best at sustaining attention, and consumers may have engaged in more mental processing of it than of the other print advertisements.

Table 4. Eye movement data of the spokesperson interest area

| Print advertisement | Gaze duration (s) | Number of fixations | First fixation time (s) | Duration of first fixation (s) |
|---|---|---|---|---|
| A1 | 1.676 | 12.622 | 2.957 | 0.108 |
| A2 | 1.489 | 11.036 | 2.312 | 0.087 |
| A3 | 1.434 | 10.825 | 2.202 | 0.103 |
| A4 | 1.144 | 9.924 | 1.85 | 0.109 |
| A5 | 1.537 | 12.039 | 2.287 | 0.096 |
| A6 | 1.073 | 8.54 | 3.657 | 0.092 |
| A7 | 1.427 | 11.585 | 2.846 | 0.101 |
| A8 | 1.313 | 10.038 | 3.833 | 0.117 |
Brand Name Area of Interest

The eye movement data for the brand name interest area in each advertisement are shown in Table 5. The brand name in the A2 print advertisement has the shortest gaze duration, the brand name in A6 has the longest first fixation time (the longest gaze latency), and the brand name in A8 has the shortest duration of first fixation. The gaze duration and the duration of first fixation for the sinicized brand name in the A1 advertisement are the longest, at 1.15 s and 0.185 s respectively, while its first fixation time is the shortest at 3.461 s; it also receives the most fixations, 8.175. These data show that the brand name of advertisement A1, designed with the proposed advertisement image style migration model, is the most effective: it not only arouses consumers' interest but also attracts and holds their attention.

Table 5. Eye movement data of the brand name interest area

| Print advertisement | Gaze duration (s) | Number of fixations | First fixation time (s) | Duration of first fixation (s) |
|---|---|---|---|---|
| A1 | 1.15 | 8.175 | 3.461 | 0.185 |
| A2 | 0.562 | 4.62 | 4.24 | 0.139 |
| A3 | 0.972 | 7.851 | 3.642 | 0.126 |
| A4 | 0.753 | 5.639 | 3.829 | 0.169 |
| A5 | 1.093 | 7.301 | 4.418 | 0.129 |
| A6 | 0.638 | 4.341 | 4.677 | 0.134 |
| A7 | 1.074 | 7.89 | 3.913 | 0.131 |
| A8 | 0.877 | 6.784 | 4.038 | 0.117 |
Advertising slogan area of interest

The subjects' eye movement data for the advertising slogans of the different works are shown in Table 6. The slogan in the A1 advertisement received the longest and most frequent attention, at 1.353 s and 10.933 fixations respectively, drawing the most consumer attention. The slogan in the A2 advertisement had the shortest gaze duration and the A7 slogan the fewest fixations; these slogans are the least attractive and struggle to arouse consumers' interest. In addition, the A5 print advertisement had the longest first fixation time and duration of first fixation, and the A6 print advertisement the shortest first fixation time and duration of first fixation.

Table 6. Eye movement data of the advertising slogan interest area

| Print advertisement | Gaze duration (s) | Number of fixations | First fixation time (s) | Duration of first fixation (s) |
|---|---|---|---|---|
| A1 | 1.353 | 10.933 | 3.737 | 0.1 |
| A2 | 0.767 | 6.198 | 4.955 | 0.113 |
| A3 | 1.101 | 8.338 | 4.232 | 0.111 |
| A4 | 1.233 | 9.303 | 3.732 | 0.147 |
| A5 | 0.858 | 6.61 | 5.681 | 0.163 |
| A6 | 1.274 | 9.723 | 3.246 | 0.089 |
| A7 | 0.93 | 5.672 | 5.623 | 0.149 |
| A8 | 0.97 | 6.924 | 5.229 | 0.115 |
Product interest areas

The eye movement data for the product interest area of the different advertisement designs are shown in Table 7. The A1 print advertisement held subjects' attention on the product for the longest time, 1.237 s, while the A4 print advertisement held it for the least. Comparing the first fixation times of the product interest areas, the A8 print advertisement drew subjects' attention to the product fastest and the A4 print advertisement slowest. The fixation counts in the table also show that the A1 print advertisement attracted the most fixations, 9.658, reaching more consumer attention.

Table 7. Eye movement data of the product interest area

| Print advertisement | Gaze duration (s) | Number of fixations | First fixation time (s) | Duration of first fixation (s) |
|---|---|---|---|---|
| A1 | 1.237 | 9.658 | 3.707 | 0.119 |
| A2 | 0.879 | 6.998 | 4.627 | 0.124 |
| A3 | 1.211 | 8.129 | 4.127 | 0.11 |
| A4 | 0.428 | 3.713 | 7.303 | 0.116 |
| A5 | 0.643 | 5.716 | 4.112 | 0.096 |
| A6 | 0.638 | 5.502 | 4.671 | 0.084 |
| A7 | 0.628 | 5.085 | 5.623 | 0.12 |
| A8 | 0.86 | 7.798 | 3.514 | 0.113 |

Comparing and analyzing the above eye movement data, advertisement A1, designed with the proposed advertisement image style migration model, arouses consumers' interest better across the spokesperson, brand design, advertising slogan, and product image design, and its performance in advertisement design is outstanding, verifying the utility of the proposed generative adversarial network-based advertisement image style migration model in actual advertisement design.

Conclusion

In this paper, the advertisement layout generation model FL-GAN is built from a multimodal embedding network and a layout generation network; on the basis of efficiently generated advertisement layout typography, an advertisement image style migration model based on contour constraints is further proposed to achieve a better art style migration effect and to promote the application of art design in the advertising industry.

Style migration simulation experiments and application practice test the effectiveness of the proposed advertisement image style migration model. Using the FID and PPL indexes to evaluate the generation quality of image style migration, our model outperforms the comparison methods on all datasets except for the FID score on Gongbi landscape painting, where it is slightly inferior to StyleGAN-2. On the decoupling index PPL, our model scores 65, 45, 27.5, 79.1, and 48.6 on Thangka flower painting, ink landscape painting, face painting, potted plant painting, and Gongbi landscape painting, outperforming the comparison methods, and it also achieves the best decoupling performance in the Z Gaussian noise space, WC content space, and WS style space. In subjective tests comparing the image quality generated by each style migration method, our model was the most recognized by the rating observers, with 66.8% of images judged "acceptable image quality". In the natural image generation extension experiment, compared with StyleGAN-2, our model performs slightly worse only in the PPL score on the bird dataset, while its FID and PPL scores on the remaining datasets are all better than StyleGAN-2's. The model thus shows good performance in both the generation effect of the style migration and the generalization of the model.

This paper applies the advertisement image style migration model to the design of international cosmetic print advertisements and compares the resulting design with seven other designs in terms of eye movement data, exploring the model's practical application in advertisement design by analyzing the subjects' viewing behavior. Across the spokesperson, brand name, advertising slogan, and product interest areas, the A1 print advertisement designed with the proposed model has the longest gaze durations and the most fixations: gaze durations of 1.676 s, 1.15 s, 1.353 s, and 1.237 s, and fixation counts of 12.622, 8.175, 10.933, and 9.658, respectively. Overall, advertisement A1 better arouses consumers' interest, and its outstanding performance in advertisement design demonstrates the utility of the proposed advertisement image style migration model in advertisement art design.
