Research on AIGC empowering digital cultural and creative design style transfer and diversified generation methods
Published online: 24 Mar 2025
Received: 12 Nov 2024
Accepted: 13 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0791
© 2025 Ran Jia, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
With the continuous development and popularization of digital technology, digital cultural creativity is rapidly emerging as an important part of national cultural and creative industries and as a new engine of global economic development [1-2]. Characterized by digitization, networking, and intelligence [3-4], digital cultural creativity covers a wide range of fields such as digital cultural content creation, digital cultural product development, and digital cultural service provision, including digital film and television [5], digital music [6], digital games [7], digital design [8], digital art, and so on [9]. These fields have become an important part of the sector, injecting new vitality into its development. Digital cultural creativity plays an important role in promoting cultural inheritance and innovation, driving economic growth, and satisfying people's needs for spiritual culture [10-12].
However, the development of digital cultural creativity still faces some challenges. On the one hand, it needs to combine the connotations of cultural creativity to create innovative, valuable products that are recognized by the market [13], which requires breakthroughs in design style. On the other hand, it needs to overcome technical bottlenecks and improve content quality and user experience [14-15], which requires optimizing how technology is applied in design and introducing new technologies to achieve stylistic diversification. Style transfer and diversified generation provide new ideas, innovative forms of expression, and new visual effects for digital cultural and creative design: with the help of deep learning and computer vision, the style of one image can be applied to another, creating works with novel artistic effects and making them diverse and personalized [16-17].
In the context of the rapid development of AIGC, the author introduces artificial intelligence algorithms into the style transfer of cultural and creative designs, choosing the VGG-19 model as the pre-trained model. The CycleGAN algorithm is improved through a multi-head self-attention mechanism and bilinear interpolation, so that the style transfer of cultural and creative product designs becomes more natural and the visual effect is optimized, and an AIGC cultural and creative style transfer model based on the improved CycleGAN is constructed. The improved CycleGAN model is evaluated objectively by PSNR, MSE, MS-SSIM, Per-pixel acc, and the convergence of the loss function during training. An evaluation index system for the application of cultural and creative style transfer is then constructed with the analytic hierarchy process (AHP) to evaluate the improved CycleGAN model subjectively, so as to explore its effect on cultural and creative style transfer and generation.
AIGC stands for Artificial Intelligence Generated Content and is also known as generative artificial intelligence [18]. It refers to the creation of multiple types and styles of digital works, such as text, images, sounds, and videos, individually or in combination, based on user inputs or the system's own logic, through the use of AI technology. AIGC is not only capable of digitally presenting and augmenting real-world content, but can also generate original or variant content with the help of AI's creative autonomy. AIGC is characterized by automation, high efficiency, creativity, and interactivity. Its key technologies cover three important elements.
Data: as the core pillar of AIGC technology, data includes data sources (open-domain data, domain-specific data, user data), data storage methods (centralized, distributed, cloud-native, and vector databases), data forms (structured and unstructured data), and data processing methods (filtering, annotation, manipulation, enhancement, etc.), all of which directly affect the level and quality of the generated content.

Computing power: as the hardware infrastructure of AIGC technology, computing power includes semiconductor processors (commonly CPUs, GPUs, etc.), servers, large-scale model computing clusters, and distributed training environments built on Infrastructure as a Service (IaaS) or deployed in self-built data centers. It guarantees the running speed and performance of AIGC applications by providing hardware services such as cloud computing, edge computing, and distributed computing.

Algorithms: algorithm platforms cover machine learning platforms, model training platforms, automatic modeling platforms, and so on, spanning model design, model training, model inference, and model deployment. These platforms constitute the core innovative power of AIGC technology, support actual business, and determine the capability and effect of AIGC applications in operation.

At present, the AIGC industrial ecosystem has formed a three-layer structure: an infrastructure layer built on pre-trained models, a middle layer containing verticalized, scenario-based, and personalized models and application tools, and an application layer that provides content generation services such as text, images, audio, and video to consumer-facing (C-end) users. AIGC has four basic modes: text generation, audio generation, image generation, and video generation. From these it derives further modes such as cross-modal generation between text, audio, and image, strategy generation, game AI, and virtual human generation. One major advantage of AIGC-enabled cultural and creative products is their ability to apply big data technology intelligently, collecting and analyzing relevant information comprehensively and in a timely manner to support personalized product design. AIGC technology provides artists and designers with a wealth of product information while stimulating creative inspiration, helping them rapidly and automatically generate product concepts, prototypes, styles, and other design elements based on factors such as market demand, user preferences, and industry trends. This helps create more creative and aesthetically pleasing product designs, making products more personalized and attractive.
Style transfer algorithms combine artistic creation with computer vision to achieve image style conversion through deep learning. Neural-network-based style transfer algorithms fall mainly into two categories. One is the optimization method, which extracts features with a pre-trained CNN (e.g., VGG-19 or ResNet) and adjusts the pixels of the input image to approach the features of the target style by minimizing a loss function. The other is based on generative adversarial networks (GANs), which use generators and discriminators to learn the style transformation mapping. There are also other approaches, such as autoencoder-based and variational-autoencoder-based methods. In this paper, the VGG-19 pre-trained model is used in the experiments [19].
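As a concrete illustration of the optimization method described above, the following is a minimal sketch, assuming PyTorch and torchvision: pre-trained VGG-19 features supply a content loss and a Gram-matrix style loss, and the input image's pixels are optimized directly. The layer indices and loss weights are illustrative choices, not values from this paper.

```python
# Minimal sketch of optimization-based style transfer, assuming PyTorch
# and torchvision. Layer indices and loss weights are illustrative.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

vgg = vgg19(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def features(x, layers=(0, 5, 10, 19, 28)):
    """Collect feature maps from selected VGG-19 conv layers."""
    out = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            out.append(x)
    return out

def gram(f):
    """Gram matrix of a feature map, used as the style statistic."""
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

content_img = torch.rand(1, 3, 256, 256)   # placeholder content image
style_img = torch.rand(1, 3, 256, 256)     # placeholder style image
x = content_img.clone().requires_grad_(True)
opt = torch.optim.Adam([x], lr=0.02)

for step in range(200):
    opt.zero_grad()
    fx, fc, fs = features(x), features(content_img), features(style_img)
    content_loss = F.mse_loss(fx[3], fc[3])    # match deep content features
    style_loss = sum(F.mse_loss(gram(a), gram(b)) for a, b in zip(fx, fs))
    loss = content_loss + 1e4 * style_loss     # illustrative weighting
    loss.backward()
    opt.step()
```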
Current artificial intelligence applications are mainly built on deep learning, with convolutional networks as their structural foundation. Since convolutional networks underpin many AI applications, this section introduces the common components of convolutional networks and, using the VGG network, the convolutional operations relevant to style transfer.
Convolutional layer: the convolutional layer plays the central role in a convolutional neural network, performing matrix operations for feature extraction. Convolution kernels of different sizes and shapes slide over the input with a corresponding step size (stride) to extract features such as edges, lines, and global and local structure, providing the training data for subsequent network parameter updates. The core parameters of the operation are the kernel size and the stride, which together form the basis of the convolution. Each kernel position yields a single value, so the convolution process not only extracts feature maps but also reduces the amount of data. In the computation, each kernel slides horizontally and vertically by the stride to obtain the next region and repeats the matrix inner product; the kernel size and stride therefore determine the shape of the data after convolution.

Pooling layer: the pooling layer is an important part of a convolutional neural network; its main role is to reduce the shape of the input data by downsampling, lowering the number of parameters and the subsequent computation. Similar to convolution, pooling processes regions of a fixed window shape, moving by the corresponding step size each time to obtain the next computation region. In a multilayer network, pooling continuously merges the data from shallow to deep layers, feeding local shallow information into high-level deep information and thereby achieving a global grasp of the input data.

Activation function: the activation function is a special layer in a convolutional neural network whose main role is to apply a nonlinear transformation, remapping the data to increase the nonlinear expressive power of the network and enable better training. Ideally, an activation function would map inputs directly to "0" or "1" through a threshold; however, since the forward propagation and error back-propagation of a convolutional neural network require differentiation, the activation function must be continuous and differentiable. A small example of the convolution and pooling shape arithmetic follows; the common activation functions used in network construction and in the model layers for style transfer are then described below.
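To make the shape arithmetic above concrete, here is a small sketch (assuming PyTorch) of how kernel size, stride, and padding determine the output shapes of a convolution and a pooling layer; all sizes are illustrative.

```python
# Sketch of convolution/pooling output-shape arithmetic, assuming PyTorch.
# Output size per dimension: floor((in + 2*padding - kernel) / stride) + 1.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)            # batch, channels, height, width

conv = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

y = conv(x)                                # (224 + 2*1 - 3)/1 + 1 = 224
z = pool(y)                                # (224 - 2)/2 + 1 = 112
print(y.shape, z.shape)                    # [1, 64, 224, 224], [1, 64, 112, 112]
```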
ReLU-type activation functions were first brought to prominence by the AlexNet network in 2012. The derivative of ReLU is constant at 1 on the positive half-axis, which largely solves the vanishing-gradient problem that Sigmoid, Tanh, and similar functions suffer in deeper networks, and its ease of differentiation speeds up training. However, its gradient on the negative half-axis is 0, so with a larger learning rate neurons may "die"; LeakyReLU mitigates this drawback of ReLU by giving the negative half-axis a small positive slope.
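A quick numeric illustration of the two activations on the negative half-axis (assuming PyTorch; the negative slope value is illustrative):

```python
# Sketch comparing ReLU and LeakyReLU, illustrating the "dying neuron"
# mitigation described above. The slope 0.01 is an illustrative choice.
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(F.relu(x))                             # tensor([0.0000, 0.0000, 0.0000, 1.5000])
print(F.leaky_relu(x, negative_slope=0.01))  # tensor([-0.0200, -0.0050, 0.0000, 1.5000])
```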
In 2014, the University of Oxford proposed the VGG network, which replaces large convolution kernels with stacked uniform 3×3 kernels and obtains better results with deeper network layers. VGG was subsequently developed further and widely adopted, and in 2015 the VGG network was used for the first time for feature extraction in style transfer. Two architectures are commonly used: VGG-16 and VGG-19. When VGG is used for style transfer, whether building a model from selected layers or using a pre-trained network for feature extraction, mainly the convolution, pooling, and activation layers before the fully connected part are used. The computational procedure for feature map extraction using VGG is shown in equation (5):
$$F^{l} = \sigma\left(W^{l} * F^{l-1} + b^{l}\right), \qquad F^{0} = I \tag{5}$$

where $I$ is the input image, $F^{l}$ is the feature map output by layer $l$, $W^{l}$ and $b^{l}$ are the convolution kernel weights and bias of layer $l$, $*$ denotes the convolution operation, and $\sigma(\cdot)$ is the activation function (ReLU in VGG).
VGG-19 network
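As an illustration of feature-map extraction in the spirit of equation (5), the following hedged sketch pulls an intermediate feature map from torchvision's pre-trained VGG-19; the layer index is an arbitrary example, not this paper's choice.

```python
# Hedged sketch: extracting an intermediate feature map from pre-trained
# VGG-19 (torchvision), using only the convolution/pooling/activation
# layers before the fully connected part. Layer index 21 is illustrative.
import torch
from torchvision.models import vgg19

model = vgg19(weights="IMAGENET1K_V1").features.eval()

def extract(img, layer_idx=21):
    """Return the feature map after the given layer index."""
    x = img
    with torch.no_grad():
        for i, layer in enumerate(model):
            x = layer(x)
            if i == layer_idx:
                return x
    return x

feat = extract(torch.rand(1, 3, 224, 224))
print(feat.shape)   # torch.Size([1, 512, 28, 28]) after layer 21 (conv4_2)
```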
Self-attention mechanism is a mechanism that can model global dependencies within a sequence [20]. It obtains a new feature representation of a sequence by computing the correlation between positions in a sequence. Compared to traditional RNNs, this global attention mechanism can model dependencies at arbitrary distances in long sequences more efficiently and in parallel. The self-attention formula is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are the query, key, and value matrices obtained by linear projections of the input sequence, and $d_k$ is the dimension of the keys.
The flow of multi-head self-attention computation is as follows: first, the input sequence $X$ is linearly projected into $h$ subspaces to obtain per-head queries, keys, and values; each head then computes scaled dot-product self-attention independently; finally, the outputs of all heads are concatenated and fused into the output vector. A schematic of the computation is shown below.

Schematic diagram of the calculation of the multi-head self-attention

The specific computational procedure of the multi-head self-attention layer is:

1. Projection: obtain $Q_i = XW_i^Q$, $K_i = XW_i^K$, and $V_i = XW_i^V$ for each head $i = 1, \dots, h$.
2. Self-attention calculation: calculate the self-attention of each head separately, $\mathrm{head}_i = \mathrm{Attention}(Q_i, K_i, V_i)$.
3. Multi-head result fusion: the multiple self-attention results are fused to obtain the final output vector, $\mathrm{MultiHead}(X) = (\mathrm{head}_1 \oplus \mathrm{head}_2 \oplus \cdots \oplus \mathrm{head}_h)W^O$,

where $\oplus$ denotes splicing (concatenation) along the last dimension of the vectors; the dimension of the spliced vector may not be equal to the dimension of the original vector, and the output projection $W^O$ maps it back to the model dimension.
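A minimal multi-head self-attention module following the three steps above is sketched below, assuming PyTorch; the dimensions and head count are illustrative, not this paper's exact configuration.

```python
# Minimal multi-head self-attention sketch: joint projection, per-head
# scaled dot-product attention, then concatenation and output projection.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d_k = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # joint Q, K, V projection
        self.out = nn.Linear(d_model, d_model)       # fuse concatenated heads

    def forward(self, x):                            # x: (batch, seq, d_model)
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split into heads: (batch, heads, seq, d_k)
        q, k, v = (t.view(b, n, self.h, self.d_k).transpose(1, 2) for t in (q, k, v))
        attn = F.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_k), dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, n, self.h * self.d_k)
        return self.out(y)                           # final output vector

mhsa = MultiHeadSelfAttention()
print(mhsa(torch.randn(2, 64, 256)).shape)           # torch.Size([2, 64, 256])
```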
To further enhance the model's ability to generate target images, the multi-head self-attention mechanism is used: the input features are mapped to multiple subspaces, self-attention is computed in each subspace individually, and the outputs of all heads are finally concatenated. This allows the model to learn different semantic expressions of the input, such as context dependencies in color, texture, and shape, and facilitates the learning of style transfer.
The up-sampling method used in the decoder of the traditional CycleGAN generator is deconvolution, also known as transposed convolution. However, the deconvolution operation has some problems. When a large convolution kernel is used, the receptive field is wider, which improves feature utilization and the quality of the restored image, but the phenomenon of "uneven overlap" can also occur, leaving obvious stacking traces in the image. In addition, deconvolution is prone to the checkerboard effect and low-frequency artifacts when up-sampling the image resolution.
To address this problem, bilinear interpolation is used for sampling instead of deconvolution [21]. Bilinear interpolation provides smoother sampling and avoids the artifact problems of deconvolution: it linearly interpolates from the four surrounding points, making fuller use of peripheral pixel information, and its local operation is smoother than deconvolution. Bilinear interpolation produces no mismatch between the kernel size and the stride, and it can interpolate accurately at any scaling factor, effectively eliminating the checkerboard effect of deconvolution. It also has a small computational cost, improving efficiency while preserving quality. The specific principle is:
Let the image to be upsampled be $I$, and let $(x, y)$ be a target sampling point whose four nearest grid neighbors are $Q_{11} = (x_1, y_1)$, $Q_{12} = (x_1, y_2)$, $Q_{21} = (x_2, y_1)$, and $Q_{22} = (x_2, y_2)$. The pixel values are then calculated by the bilinear interpolation formula:

$$I(x, y) \approx \frac{1}{(x_2 - x_1)(y_2 - y_1)} \big[ I(Q_{11})(x_2 - x)(y_2 - y) + I(Q_{21})(x - x_1)(y_2 - y) + I(Q_{12})(x_2 - x)(y - y_1) + I(Q_{22})(x - x_1)(y - y_1) \big]$$

where $I(Q_{ij})$ denotes the pixel value at neighbor $Q_{ij}$; the result is a distance-weighted average of the four surrounding pixels, which is what produces the smooth sampling described above.
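The substitution described above can be sketched as follows (assuming PyTorch): a transposed-convolution up-sampler versus bilinear interpolation followed by an ordinary convolution; channel sizes are illustrative.

```python
# Sketch: replacing transposed-convolution upsampling with bilinear
# interpolation + ordinary convolution. Channel sizes are illustrative.
import torch
import torch.nn as nn

x = torch.randn(1, 256, 64, 64)

# Transposed convolution: kernel/stride interplay can cause checkerboard artifacts.
deconv = nn.ConvTranspose2d(256, 128, kernel_size=3, stride=2,
                            padding=1, output_padding=1)

# Bilinear interpolation + convolution: smoother, artifact-free upsampling.
up_bilinear = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
    nn.Conv2d(256, 128, kernel_size=3, padding=1),
)

print(deconv(x).shape, up_bilinear(x).shape)   # both torch.Size([1, 128, 128, 128])
```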
In this section, an improved CycleGAN modeling approach [22] is proposed for the image style transfer task. The improved network modifies the generator structure of the original CycleGAN in two ways. First, a multi-head self-attention mechanism is added between the encoder and the converter, with a new connection from the encoding layer to the converter layer so that style features are better passed to the converter. The self-attention module can model global dependencies between different regions of the image and capture its intrinsic structural information. The multi-head design can learn feature representations in different subspaces: different heads attend to different global structural information of the input image, such as shape and texture, and aggregating this information enhances the model's understanding of the image's global structure. This parallel multi-task learning enhances the effect of style transfer, producing result images with a more natural transferred style. Second, in the decoder's decoding network, deconvolution is replaced with bilinear interpolation for up-sampling. Bilinear interpolation yields more natural and smooth sampling results, effectively reducing artifacts such as the checkerboard effect easily produced by the deconvolution operation, further optimizing the visual quality of the style transfer results. By incorporating the self-attention mechanism together with bilinear interpolation, the improved CycleGAN network proposed in this paper further enhances the generation effect of style transfer while maintaining the processing efficiency of the original network. The overall network structure is shown in Fig. 3.

Diagram of the overall network structure
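Since the paper does not list its exact layer configuration, the following is only a hedged structural sketch of such a generator: an encoder, multi-head self-attention on the encoded features, residual converter blocks, and a decoder that up-samples by bilinear interpolation. All widths and block counts are assumptions.

```python
# Hedged structural sketch of the improved generator: encoder -> multi-head
# self-attention -> residual converter blocks -> bilinear-upsampling decoder.
# Block counts and channel widths are illustrative, not the paper's values.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.InstanceNorm2d(c), nn.ReLU(True),
            nn.Conv2d(c, c, 3, padding=1), nn.InstanceNorm2d(c),
        )

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    def __init__(self, ch=64, n_res=6, n_heads=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 7, padding=3), nn.ReLU(True),
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(True),
            nn.Conv2d(ch * 2, ch * 4, 3, stride=2, padding=1), nn.ReLU(True),
        )
        self.attn = nn.MultiheadAttention(ch * 4, n_heads, batch_first=True)
        self.converter = nn.Sequential(*[ResBlock(ch * 4) for _ in range(n_res)])
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(ch * 4, ch * 2, 3, padding=1), nn.ReLU(True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(ch * 2, ch, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(ch, 3, 7, padding=3), nn.Tanh(),
        )

    def forward(self, x):
        f = self.encoder(x)                          # (b, c, h, w)
        b, c, h, w = f.shape
        seq = f.flatten(2).transpose(1, 2)           # (b, h*w, c) token sequence
        attn_out, _ = self.attn(seq, seq, seq)       # multi-head self-attention
        f = f + attn_out.transpose(1, 2).reshape(b, c, h, w)
        return self.decoder(self.converter(f))

g = Generator()
print(g(torch.randn(1, 3, 128, 128)).shape)          # torch.Size([1, 3, 128, 128])
```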
In this paper, we explore the image generation effect of the improved CycleGAN style transfer model using the cultural and creative products of the Forbidden City as an example.
To verify the effectiveness of the proposed method, MSRes-CycleGAN, three experiments are designed: conversion of original image → cartoon style, original image → oil painting style, and original image → new Chinese style. Starting from objective evaluation indexes, the differences between the improved CycleGAN and other style transfer methods are compared using peak signal-to-noise ratio (PSNR), mean squared error (MSE), the multi-scale structural similarity index (MS-SSIM), per-pixel accuracy (Per-pixel acc) from the FCN scores, and the convergence of the loss function during model training. The final evaluation results of each objective index for the three conversions are shown in Table 1.
Style transfer experiment evaluation results
| Style transfer | Method | PSNR/dB | MSE | MS-SSIM | Per-pixel acc |
|---|---|---|---|---|---|
| Original image → cartoon style | AdaIN | 20.304 | 87.851 | 0.871 | 0.040 |
| | SANet | 19.801 | 85.228 | 0.919 | 0.041 |
| | StyTr2 | 20.181 | 88.609 | 0.921 | 0.051 |
| | AdaAttN | 20.436 | 87.892 | 0.887 | 0.064 |
| | CycleGAN | 19.778 | 90.999 | 0.898 | 0.067 |
| | Ours | 21.293 | 82.678 | 0.939 | 0.071 |
| Original image → oil painting style | AdaIN | 17.841 | 84.445 | 0.836 | 0.576 |
| | SANet | 17.477 | 84.234 | 0.832 | 0.477 |
| | StyTr2 | 17.583 | 83.689 | 0.843 | 0.524 |
| | AdaAttN | 17.552 | 83.709 | 0.842 | 0.608 |
| | CycleGAN | 17.635 | 83.275 | 0.838 | 0.419 |
| | Ours | 17.857 | 81.808 | 0.854 | 0.723 |
| Original image → new Chinese style | AdaIN | 22.882 | 73.425 | 0.951 | 0.088 |
| | SANet | 23.214 | 72.737 | 0.946 | 0.106 |
| | StyTr2 | 22.833 | 72.887 | 0.959 | 0.116 |
| | AdaAttN | 22.874 | 74.443 | 0.941 | 0.155 |
| | CycleGAN | 22.956 | 74.198 | 0.955 | 0.142 |
| | Ours | 24.306 | 71.243 | 0.962 | 0.179 |
Observing Table 1, in the conversion of original image → cartoon style, the improved CycleGAN method of this paper achieves a PSNR of 21.293 dB and an MS-SSIM of 0.939, the maximum values among all methods; an MSE of 82.678, the minimum among all methods; and a Per-pixel acc of 0.071, again the maximum among all methods. The improved CycleGAN method achieves the same standing in the conversions to oil painting style and new Chinese style. The improved CycleGAN method of this paper thus obtains the best results in all three image style transfer experiments. The images generated by this model are visually more vivid, with more concrete and realistic texture, and their subjective visual effect surpasses the style conversions of the other style transfer algorithms, giving the best style transfer effect.
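For reference, a minimal sketch of how the PSNR and MSE figures above can be computed with NumPy; MS-SSIM would typically come from a dedicated library (e.g., pytorch-msssim) and is omitted here.

```python
# Minimal NumPy sketch of PSNR and MSE between a style-transfer result
# and a reference image. Images here are random placeholders.
import numpy as np

def mse(a, b):
    """Mean squared error between two images in [0, 255]."""
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10 * np.log10(peak ** 2 / m)

ref = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
out = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
print(f"MSE={mse(ref, out):.3f}, PSNR={psnr(ref, out):.3f} dB")
```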
The loss functions during training on the three datasets (original image → cartoon style, original image → oil painting style, and original image → new Chinese style) are shown in Fig. 4. In terms of convergence, all models eventually converge, but the improved CycleGAN converges faster and rarely rebounds, indicating that the model training strategy designed in this paper is better.

Training process loss function
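For context, the CycleGAN objective whose convergence is plotted above combines adversarial and cycle-consistency terms. The following is only a hedged sketch of the generator-side loss; `G`, `F_net`, `Dx`, and `Dy` are assumed generator/discriminator modules, and `lambda_cyc` is the customary illustrative weight.

```python
# Hedged sketch of the generator-side CycleGAN losses: least-squares
# adversarial loss for each mapping plus L1 cycle-consistency loss.
import torch
import torch.nn.functional as F_loss

def cyclegan_generator_loss(G, F_net, Dx, Dy, real_x, real_y, lambda_cyc=10.0):
    fake_y = G(real_x)                 # X -> Y mapping
    fake_x = F_net(real_y)             # Y -> X mapping
    # Adversarial terms: generators try to make discriminators output "real" (1).
    adv = F_loss.mse_loss(Dy(fake_y), torch.ones_like(Dy(fake_y))) \
        + F_loss.mse_loss(Dx(fake_x), torch.ones_like(Dx(fake_x)))
    # Cycle consistency: x -> G(x) -> F(G(x)) should reconstruct x, and vice versa.
    cyc = F_loss.l1_loss(F_net(fake_y), real_x) + F_loss.l1_loss(G(fake_x), real_y)
    return adv + lambda_cyc * cyc
```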
To make the evaluation of the application of the improved CycleGAN to AIGC cultural and creative style transfer more scientific and reasonable, the analytic hierarchy process (AHP) is applied here. AHP quantifies the application evaluation indexes, making the evaluation results more accurate and specific. Below, AHP is applied to evaluate the application of the improved CycleGAN model in cultural and creative style transfer. Table 2 displays the evaluation index system for the improved CycleGAN model's application to cultural and creative style transfer.
Application evaluation index system of improved CycleGAN model
| | Primary index | Secondary index |
|---|---|---|
| Style transfer effect evaluation system | Image quality (A) | Image fidelity (A1) |
| | | Detail accuracy (A2) |
| | | Image diversity (A3) |
| | | Image expression (A4) |
| | | Image innovation (A5) |
| | | Image art (A6) |
| | | Image practicality (A7) |
| | Style simulation (B) | Style transfer coordination (B1) |
| | | Style transfer speciality (B2) |
| | | Style transfer applicability (B3) |
| | | Style fusion rationality (B4) |
| | Connotation expression (C) | Connotation expression form (C1) |
| | | Connotation art (C2) |
| | | Cultural expression (C3) |
| | | Historical expression (C4) |
To simplify the calculation, Matlab is applied here to compute the weights: once the judgment matrix is entered, the weight of each indicator can be calculated quickly. After the judgment matrix passes the consistency test, the synthetic weight of each index, i.e., the proportion it occupies, is calculated. The results are shown in Table 3, and a sketch of this computation follows the table.
Application of evaluation index weight synthetic distribution in improved CycleGAN
| | Primary index | Weight | Secondary index | Weight | Synthetic weight |
|---|---|---|---|---|---|
| Style transfer effect evaluation system | Image quality (A) | 0.3758 | Image fidelity (A1) | 0.1274 | 0.0479 |
| | | | Detail accuracy (A2) | 0.1820 | 0.0684 |
| | | | Image diversity (A3) | 0.1136 | 0.0427 |
| | | | Image expression (A4) | 0.1538 | 0.0578 |
| | | | Image innovation (A5) | 0.1243 | 0.0467 |
| | | | Image art (A6) | 0.1548 | 0.0582 |
| | | | Image practicality (A7) | 0.1441 | 0.0542 |
| | Style simulation (B) | 0.3815 | Style transfer coordination (B1) | 0.2639 | 0.1007 |
| | | | Style transfer speciality (B2) | 0.2213 | 0.0844 |
| | | | Style transfer applicability (B3) | 0.2847 | 0.1086 |
| | | | Style fusion rationality (B4) | 0.2301 | 0.0878 |
| | Connotation expression (C) | 0.2427 | Connotation expression form (C1) | 0.1745 | 0.0423 |
| | | | Connotation art (C2) | 0.2942 | 0.0714 |
| | | | Cultural expression (C3) | 0.2736 | 0.0664 |
| | | | Historical expression (C4) | 0.2577 | 0.0625 |
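As noted above, the AHP computation can be sketched in a few lines of NumPy: principal-eigenvector weights from a pairwise judgment matrix, plus the consistency-ratio test. The judgment matrix below is illustrative only and is not the matrix used in this paper.

```python
# Hedged NumPy sketch of AHP: eigenvector weights and consistency ratio.
# The 3x3 judgment matrix is an illustrative example, not the paper's data.
import numpy as np

RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}  # random index

def ahp_weights(A):
    """Return (weights, consistency_ratio) for judgment matrix A (n >= 3)."""
    vals, vecs = np.linalg.eig(A)
    k = np.argmax(vals.real)
    w = np.abs(vecs[:, k].real)
    w /= w.sum()                                  # normalized weight vector
    n = A.shape[0]
    ci = (vals[k].real - n) / (n - 1)             # consistency index
    return w, ci / RI[n]                          # consistency ratio CR

# Illustrative pairwise comparisons among the primary indexes A, B, C.
judgment = np.array([[1.0,   1.0,   1.5],
                     [1.0,   1.0,   1.6],
                     [1/1.5, 1/1.6, 1.0]])
w, cr = ahp_weights(judgment)
print(w, cr)   # weights are acceptable if CR < 0.1
```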
Using the improved CycleGAN model of this paper to carry out style transfer on the cultural and creative products of the Forbidden City, six experimenters (numbered 1-6) were invited to subjectively score the generated stylized products on a scale of 0-10. The scores of each index are weighted and converted to a 100-point basis, and the comprehensive scores for the style transfer of cultural and creative products with the improved CycleGAN model are shown in Table 4.
Weight calculation of application evaluation score in improved CycleGAN
| Primary index | Secondary index | Synthetic weight | Participant 1 | Participant 2 | Participant 3 | Participant 4 | Participant 5 | Participant 6 |
|---|---|---|---|---|---|---|---|---|
| Image quality (A) | A1 | 0.0479 | 9 | 8 | 9 | 8 | 8 | 9 |
| | A2 | 0.0684 | 9 | 8 | 9 | 9 | 8 | 8 |
| | A3 | 0.0427 | 8 | 8 | 9 | 8 | 9 | 9 |
| | A4 | 0.0578 | 9 | 9 | 9 | 9 | 9 | 8 |
| | A5 | 0.0467 | 10 | 10 | 9 | 10 | 9 | 9 |
| | A6 | 0.0582 | 9 | 9 | 10 | 10 | 9 | 8 |
| | A7 | 0.0542 | 10 | 10 | 10 | 10 | 9 | 10 |
| Style simulation (B) | B1 | 0.1007 | 10 | 9 | 10 | 8 | 8 | 8 |
| | B2 | 0.0844 | 9 | 10 | 9 | 8 | 8 | 8 |
| | B3 | 0.1086 | 9 | 9 | 9 | 9 | 8 | 9 |
| | B4 | 0.0878 | 9 | 10 | 9 | 10 | 8 | 9 |
| Connotation expression (C) | C1 | 0.0423 | 8 | 9 | 10 | 8 | 9 | 9 |
| | C2 | 0.0714 | 9 | 8 | 8 | 10 | 9 | 9 |
| | C3 | 0.0664 | 9 | 10 | 8 | 9 | 9 | 9 |
| | C4 | 0.0625 | 9 | 8 | 9 | 10 | 9 | 9 |
| Decimal score | | | 9.1166 | 9.0466 | 9.1176 | 9.0628 | 8.5022 | 8.6847 |
| Centesimal score | | | 91.17 | 90.47 | 91.18 | 90.63 | 85.02 | 86.85 |
| Overall evaluation average score | | | 89.22 | | | | | |
Table 4 shows that the evaluation of the application effect of the improved CycleGAN model on the style transfer of cultural and creative products is composed of three aspects: image quality, style simulation, and connotation expression. After the six experimenters' scores are weighted, the average comprehensive score is 89.22. According to the author's proposed rating scale (excellent: 85-100 points; good: 70-84 points; passing: 60-69 points; failing: below 60 points), this reaches the excellent level, showing that the improved CycleGAN model performs excellently in the style transfer and generation of cultural and creative products.
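The composite-scoring arithmetic behind Table 4 can be checked in a few lines; the sketch below reproduces participant 1's decimal and centesimal scores from the synthetic weights.

```python
# Sketch of the composite scoring: each participant's 0-10 ratings are
# weighted by the synthetic weights, then scaled to a 100-point score.
# Weights and ratings below are participant 1's values from Table 4.
weights = [0.0479, 0.0684, 0.0427, 0.0578, 0.0467, 0.0582, 0.0542,
           0.1007, 0.0844, 0.1086, 0.0878, 0.0423, 0.0714, 0.0664, 0.0625]
scores_p1 = [9, 9, 8, 9, 10, 9, 10, 10, 9, 9, 9, 8, 9, 9, 9]

decimal = sum(w * s for w, s in zip(weights, scores_p1))
print(round(decimal, 4), round(decimal * 10, 2))   # 9.1166 and 91.17
```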
In this paper, AIGC is used to empower cultural and creative design: the CycleGAN algorithm is improved to construct a cultural and creative style transfer model based on the improved CycleGAN. Objective and subjective evaluations of the model's style transfer and image generation effects on cultural and creative products are carried out to establish its effect on cultural and creative design.
In the conversions of original image → cartoon style, original image → oil painting style, and original image → new Chinese style, the improved CycleGAN model of this paper achieves PSNR values of 21.293 dB, 17.857 dB, and 24.306 dB, MSE values of 82.678, 81.808, and 71.243, MS-SSIM values of 0.939, 0.854, and 0.962, and Per-pixel acc values of 0.071, 0.723, and 0.179, respectively, the best style transfer performance among all comparison models. In the subjective evaluation, the improved CycleGAN model achieved a comprehensive score of 89.22 points, with individual experimenters' percentile scores ranging from 85.02 to 91.18 (two experimenters rated below 90 points), an excellent grade overall. The AIGC cultural and creative style transfer model based on the improved CycleGAN thus performs excellently in the style transfer and image generation of cultural and creative products.