Open access

Data Mining and User Profile Construction in Marketing Strategy of Cultural Industry

  
24 Sep 2025

Introduction

With the rapid development of the Internet, many new cultural industries have emerged and driven the further development of the sector. Cultural products that are rich in innovation and creativity enjoy higher market acceptance, and they in turn drive the progress of the cultural industry as a whole. To make the most of innovation and creativity, enterprises need to position themselves accurately in the market and market their cultural products well.

One distinctive characteristic of the cultural industry is that, unlike traditional industries, it does not have clearly defined forms of input and output [1-2]. It has its own input and output forms, but these are not limited to a single type or a single part of the production chain. Culture becomes a resource and foundation for the industry, while industrial operation takes the form of producing cultural products and satisfying market demand and the needs of the general public [3-5]. From a multi-dimensional marketing perspective, cultural industry marketing should consider the dialectical relationship between culture, people and the market, and pay more attention to the organic connection between the characteristics of cultural products and the self-realization of audience groups [6-8]. This practical demand points directly to a deeper exploration and use of new media marketing functions.

Simple, crude traditional marketing strategies can no longer meet the development needs of enterprises, and precision marketing that combines online (Internet) and offline (physical) channels is gradually penetrating various industries [9-10]. Data mining technology analyzes large volumes of user attribute and behavior data to find the rules and patterns hidden in them, and thereby provides the cultural industry with more scientific and accurate marketing strategies for marketing planning [11-12].

Gutnik, S. investigated the effectiveness of data mining and machine learning methods in digital marketing strategies, where companies use data analytics tools to transform the data and information generated by the marketing process into knowledge that aids the development of their marketing strategies [13]. Fan, L. mined customer consumption data and used the IPSO-k-means algorithm to build an RFM model that categorizes customers in order to achieve precision marketing of commercial consumer products [14]. Ernawati, E. et al. similarly constructed a customer segmentation framework based on RFM models and data mining techniques, showing that firms can use the framework to discover and understand customer characteristics and then develop marketing strategies in line with their target markets [15]. Yusnidar, Y. et al. used a hybrid approach to explore the dynamic relationship between personalized marketing and data mining, which can enhance customer engagement and satisfaction but also requires a balance between users’ personalized interests and privacy protection [16]. Yoseph, F. et al. proposed using the Hadoop distributed file system to process dynamic data and the Expectation Maximization (EM) and k-means++ clustering algorithms to segment the consumer market accurately, and their experiments showed that this marketing strategy increased the sales rate of retail products [17]. Hou, R. et al. discussed the use of various data mining techniques in marketing decision making and, on that basis, designed a marketing decision support system based on a BP neural network to provide scientifically sound marketing decision analysis [18].

This paper proposes a user data mining and preprocessing technique for cultural industry marketing and, on this basis, constructs a user portrait model for cultural industry marketing. The concept of local reachable density is used to optimize the selection of initialization centers, and users are labeled once the user data have been preprocessed and mined. The firefly algorithm is used to address the randomness in the selection of initial clustering centers in the traditional K-means algorithm: it searches for the optimal solution, which then serves as the initial clustering centers. The movement mode and random perturbation mode of the fireflies are improved to enhance the stability and accuracy of clustering and to obtain stable clustering results. The performance of the user portrait model is tested in terms of clustering effect and algorithm efficiency. The cultural and creative products industry of public libraries is chosen as the research object, with 15,068 valid user records collected by public library A over the past five years as sample data. The optimal clustering of user portraits is determined, and the marketing strategy for potential customers of public library cultural and creative products is analyzed around the clustering results, in order to support the development of marketing strategies in the cultural industry.

Data mining technology in marketing strategy of cultural industry

As a basic technology of big data, data mining is in essence the collection and sampling of huge amounts of data, followed by exploration, modeling and evaluation according to the characteristics of the data, in order to discover the data’s potential value. With the development of computer technology, data mining has become more accurate, faster and more scientific. Cultural industry marketing can use data mining technology for data collection, collation and analysis, which then helps enterprises make their next decisions and continuously optimize their precision marketing strategies.

The role of data mining technology in cultural industry marketing

Advancement: data mining technology can make cultural industry marketing more forward-looking.

With the support of data mining technology, customer information is standardized and organized. The model learns from customers’ past consumption, browsing and other records to derive their consumption habits and patterns, and then combines these with current development trends and other data to infer each customer’s likely next purchase. Based on these results, the enterprise can formulate a forward-looking cultural industry marketing strategy, so that marketing content keeps pace with the times and meets consumers’ needs, and the enterprise’s marketing strategy stays ahead of the market and consumers.

Economy: data mining technology can make cultural industry marketing faster.

Through data collection, model training and architecture construction, data mining can segment customers and learn their characteristics automatically, find target customers more quickly, save labor and material costs in cultural industry marketing, reduce the proportion of “useless work”, and improve the marketing success rate.

Reliability: data mining technology can make cultural industry marketing more scientific.

Applying data mining technology to cultural industry marketing expresses activities such as customer identification and strategy formulation as mathematical models, which improves the reliability of quantitative analysis and makes the marketing strategy more scientific. In addition, data mining can analyze previous cultural industry marketing strategies in depth and, by identifying the factors behind their success or failure, provide scientific suggestions for the enterprise’s next marketing strategy, thereby enhancing the strategy’s reliability.

Cultural Industry Marketing User Data Mining and Preprocessing

The collected user data for cultural industry marketing must be preprocessed before it can be used. Data preprocessing includes data cleaning, data integration and data standardization. In this paper, data are standardized with a linear function transformation, as shown in equation (1):

$$X_{normal} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{1}$$

where $X_{normal}$ is the normalized value, $X_{max}$ is the maximum value of the indicator, $X_{min}$ is the minimum value of the indicator, and $X$ is the initial value of the indicator.
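The transformation in equation (1) can be illustrated with a minimal Python sketch. The function name `min_max_normalize`, the sample values, and the handling of a constant indicator (mapped to 0.0) are assumptions made for illustration and are not specified in the paper.

```python
def min_max_normalize(values):
    """Scale an indicator to [0, 1] with (X - X_min) / (X_max - X_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant indicator carries no information, so map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical indicator: number of visits per user.
visits = [2, 5, 11, 20]
print(min_max_normalize(visits))
```

The smallest value maps to 0 and the largest to 1, so indicators measured on different scales become comparable before clustering.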

Software data are available in large quantities as structured data in various download centers and can be collected with a crawler and then cleaned. The web page crawling module crawls the content of the web pages visited by users for the subsequent analysis of web page interests.

Data mining layer

The traditional K-means algorithm has the defect of selecting its initialization centers at random. To address this problem, this paper uses the concept of local reachable density to optimize the selection of the initialization centers [19]. The concepts involved in local reachable density are described below.

Definition 1, $k$-th distance $d_k(O)$.

Let $d_k(O)$ be the $k$-th distance of point $O$ and $d(O,P)$ the distance from point $O$ to point $P$. Then $d_k(O) = d(O,P)$ if the following conditions are satisfied.

First, there exist at least $k$ points $P' \in D \setminus \{O\}$ such that $d(O,P') \le d(O,P)$.

Second, there exist at most $k-1$ points $P' \in D \setminus \{O\}$ such that $d(O,P') < d(O,P)$.

Definition 2, $k$-th distance neighborhood $N_k(O)$.

Let $N_k(O)$ be the $k$-th distance neighborhood of $O$, which satisfies equation (2):

$$N_k(O) = \{P' \in D \setminus \{O\} \mid d(O,P') \le d_k(O)\} \tag{2}$$

Definition 3, reachable distance $d_k(O,P)$.

Let $d_k(O,P)$ be the $k$-th reachable distance from point $P$ to point $O$, which satisfies equation (3):

$$d_k(O,P) = \max\{d_k(O),\, d(O,P)\} \tag{3}$$

Definition 4, local reachable density $\rho_k(O)$.

The local reachable density is defined by equation (4):

$$\rho_k(O) = \frac{1}{\sum_{P \in N_k(O)} d_k(O,P) / k} = \frac{k}{\sum_{P \in N_k(O)} d_k(O,P)} \tag{4}$$

From the above equation, it can be seen that the greater the local density of a point, the more likely it is to be a cluster center. However, if the selection is based on local density alone, the initial points may be chosen too close together. For this reason, this paper also introduces a distance factor. The improved procedure for selecting the set of K-means initialization centers is as follows (the data set is denoted by $D$, its size by $n$, the set of initialization centers by $C$, and the number of K-means initialization centers by $k$).

Step 1: Calculate the local reachable density $\rho_i$ of each sample $i$ in dataset $D$ and normalize it using equation (1) to obtain $\rho_i'$.

Step 2: Select the point with the highest reachable density in $D$ as the initial center $C_1$, and set $C = \{C_1\}$.

Step 3: Repeat the following until $k$ initialization centers have been found.

Calculate the distance $d(C_j, x_i)$ between each sample $i$ in $D$ and each initial center $C_j$ in $C$ using equation (5):

$$d(C_j, x_i) = \sqrt{(x_{i1} - c_{j1})^2 + (x_{i2} - c_{j2})^2 + \cdots + (x_{in} - c_{jn})^2} \tag{5}$$

For each $x_i \in D$, set $dis_i = \min_j d(C_j, x_i)$ and normalize it using equation (1) to obtain $dis_i'$.

Calculate the weight factor $\gamma_i$ of each sample $i$ using equation (6):

$$\gamma_i = \rho_i' \cdot dis_i' \tag{6}$$

Select the sample with the largest weight factor $\gamma_{max}$ as the next center $C_j$, and set $C = C \cup \{C_j\}$.

Step 4: Output the $k$ initialization centers.
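The selection procedure in Steps 1 to 4 can be sketched in plain Python as follows. This is only an illustrative reading of the steps, not the authors' implementation: the function names, the toy data, and the separate parameters `k_neighbors` (the $k$ of the $k$-th distance) and `n_centers` (the number of initialization centers) are assumptions, and the sketch assumes the data contain no group of more than `k_neighbors` identical points, so that every $k$-th distance is positive.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max(values):
    lo, hi = min(values), max(values)
    return [0.0] * len(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]


def local_reachable_density(data, k_neighbors):
    """Density of every point, following equations (2)-(4) as stated in the text."""
    densities = []
    for i, o in enumerate(data):
        dists = sorted(euclidean(o, p) for j, p in enumerate(data) if j != i)
        k_dist = dists[k_neighbors - 1]                    # k-th distance of O
        neighbours = [d for d in dists if d <= k_dist]     # N_k(O), ties included
        reach_sum = sum(max(k_dist, d) for d in neighbours)  # equation (3)
        densities.append(k_neighbors / reach_sum)          # equation (4)
    return densities


def select_initial_centers(data, n_centers, k_neighbors):
    rho = min_max(local_reachable_density(data, k_neighbors))   # Step 1
    chosen = [max(range(len(data)), key=lambda i: rho[i])]      # Step 2
    while len(chosen) < n_centers:                              # Step 3
        dis = min_max([min(euclidean(x, data[c]) for c in chosen) for x in data])
        gamma = [r * d for r, d in zip(rho, dis)]               # equation (6)
        candidates = [i for i in range(len(data)) if i not in chosen]
        chosen.append(max(candidates, key=lambda i: gamma[i]))
    return [data[i] for i in chosen]                            # Step 4


# Toy data: a tight group, a slightly looser group, and two isolated points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.5), (11.5, 10.0), (11.5, 11.5),
          (5.0, 20.0), (20.0, 5.0)]
print(select_initial_centers(points, n_centers=2, k_neighbors=2))
```

On this toy data the first center comes from the densest group and the second from the other dense group rather than from the isolated points, because the weight factor rewards points that are both dense and far from the centers already chosen.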

In this paper, the SSE (sum of squared errors) is used to measure the clustering effect. As $k$, the number of clusters, increases, the total SSE decreases rapidly. Once $k$ exceeds the actual number of clusters, the total SSE decreases only slowly as $k$ increases further.
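The behavior of the SSE can be seen in a small sketch. The data and the centers for each $k$ are hand-picked for illustration (they are not produced by a clustering run), and the helper name `sse` is an assumption.

```python
def sse(data, centers):
    """Sum of squared Euclidean distances from each sample to its nearest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
        for x in data
    )


# One-dimensional toy data with two obvious groups.
data = [(1.0,), (2.0,), (3.0,), (11.0,), (12.0,), (13.0,)]

# Hand-picked centers for k = 1, 2, 3 (illustrative, not fitted).
centers_by_k = {
    1: [(7.0,)],
    2: [(2.0,), (12.0,)],
    3: [(1.5,), (3.0,), (12.0,)],
}

for k, centers in centers_by_k.items():
    print(k, sse(data, centers))
```

Here the SSE falls from 154 at $k=1$ to 4 at $k=2$, and then only to 2.5 at $k=3$, which is the sharp drop followed by a slow decline described above.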

Label Marking Layer

After the user data collected by Internet access service establishments have been preprocessed and mined, users can be labeled. Labeling basic attributes is relatively simple: the corresponding information can be extracted directly. Consumption-attribute labels are derived from the data mined by statistical and clustering algorithms. For interest attributes, the software interest tag is assigned according to the established software database, and the web interest tag is assigned according to the classification result.

The qualitative labels of basic attributes and consumption attributes are unique and are therefore given a weight coefficient of 1. The weights of the qualitative labels in interest attributes need to be calculated, as shown in equation (7):

$$\omega_{target} = \frac{time_{target}}{time_{total}} \cdot N(t) \tag{7}$$

where $\omega_{target}$ is the weight of the tag to be calculated, $time_{target}$ is the time spent on the Internet behavior marked with the tag, $time_{total}$ is the total time spent by the user on the Internet, and $N(t)$ is the time decay function shown in equation (8):

$$N(t) = N_0 e^{-\lambda t} \tag{8}$$

where $t$ denotes the time elapsed since the user last generated the behavior, $N_0$ is a preset time weight factor that can be set according to the business, and $\lambda$ is a decay constant that can be adjusted to the actual application scenario. Using an exponential decay function as the time decay factor reflects the gradual cooling of a label’s popularity over time [20].
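As a rough illustration of equations (7) and (8), the sketch below computes an interest-label weight. The function names, the default values `n0=1.0` and `lam=0.1`, and the sample numbers are assumptions chosen for the example; the paper leaves $N_0$ and $\lambda$ to be set by the business scenario.

```python
import math


def time_decay(t, n0=1.0, lam=0.1):
    """Exponential time decay N(t) = N0 * exp(-lambda * t), equation (8)."""
    return n0 * math.exp(-lam * t)


def interest_weight(time_target, time_total, t, n0=1.0, lam=0.1):
    """Interest-label weight: share of online time multiplied by the decay factor, equation (7)."""
    if time_total <= 0:
        raise ValueError("time_total must be positive")
    return (time_target / time_total) * time_decay(t, n0, lam)


# Hypothetical user: 30 of 120 online minutes spent on the tagged behavior.
print(interest_weight(30, 120, t=0))    # behavior just occurred
print(interest_weight(30, 120, t=10))   # same share, 10 time units later
```

With the same share of online time, the weight shrinks as the interval since the last behavior grows, which is the cooling effect the decay function is meant to capture.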

User Profiling Model in Marketing Strategy of Cultural Industry

The previous chapter proposed a data mining and preprocessing method for cultural industry marketing users, using in-depth data mining for the refined analysis of user information. User profiling, an important technology that builds on data mining to analyze user characteristics accurately and depict user behavior, is becoming increasingly important in cultural industry marketing and strategy development [21].

Impact of user modeling on cultural industry marketing

The impact of user profiling on cultural industry marketing is mainly reflected in the following aspects.

More accurate positioning.

Through the establishment of user profiles, cultural industry marketing can more accurately understand the user’s needs, interests, preferences and other information, so as to be able to more accurately locate the user group. In this way, when personalized marketing is carried out, cultural industry marketing can better formulate marketing strategies for specific user groups and improve the accuracy and effect of marketing.

Improve user experience.

By understanding users’ personalized needs through user profiles, cultural industry marketing can provide users with products and services that better meet those needs. This not only improves user satisfaction and loyalty, but also increases users’ willingness to buy and their consumption frequency, thus improving the effect of personalized marketing in the cultural industry.

Strengthen marketing communication.

Through user profiles, enterprises can better understand users’ preferences and habits during cultural industry marketing, and can therefore choose more appropriate communication methods and channels for personalized marketing. User profiles thus help marketers target their marketing communication, for example by choosing communication channels according to the channel preferences recorded in the profile.

Improve marketing efficiency.

Through user profiling, cultural industry marketing strategies can be refined according to users’ purchasing behavior and consumption preferences, and users’ willingness to buy and spending power can be understood, so that product pricing strategies and promotional activities can be better formulated. This can increase sales and profitability and improve the returns of personalized marketing. Positioning and promotion can also be targeted to the needs and characteristics of different user groups, which better meets users’ needs and improves marketing effectiveness.

Application strategy of user profile in cultural industry marketing

The main research results on user profiling technology in the cultural industry focus on personalized service recommendation, precision marketing and related areas.

Personalized service recommendation.

Based on the data analysis of user information, the attribute characteristics of users are presented visually, providing a new method for personalized information recommendation services.

Precision marketing.

Building on research into marketing applications in the cultural industry, enterprises use user profiling technology to analyze user data and derive the information needed to support business decisions, including users’ consumption patterns, product preferences, daily browsing and interest records. On the basis of this data analysis, the enterprise provides precision marketing services that meet the different needs of its product users.

Culture Industry Marketing User Portrait Model

In order to realize the in-depth application of user portrait technology in the marketing and strategy formulation of cultural industry, this paper proposes a user portrait model based on the improved firefly optimization weighted K-means algorithm.

Basic idea of the algorithm

The improved firefly-optimized weighted K-means algorithm combines the traditional K-means algorithm and the firefly algorithm so that the strengths of each compensate for the weaknesses of the other, and then makes local improvements to optimize the algorithm’s performance.

Specifically: first, to overcome shortcomings of the traditional K-means algorithm such as the randomness of initial clustering center selection, this paper uses the firefly algorithm (FA) to find the optimal solution, which serves as the initial clustering centers of the K-means algorithm [22]. Second, the traditional K-means algorithm, being fast and effective at clustering, compensates for the firefly algorithm’s slow convergence and tendency to oscillate. Third, because the collected data differ in business relevance, weights are introduced into the traditional Euclidean distance to mitigate the impact of anomalies. Finally, the movement of the fireflies and the random perturbation are improved to increase the accuracy and stability of clustering and obtain stable clustering results.

Let $X$ be the sample dataset to be clustered and $m$ the number of data dimensions:

$$X = \{X_1, X_2, \ldots, X_j, \ldots, X_n\}; \quad X_j = (x_{j1}, x_{j2}, \ldots, x_{jm})^T \in R^m$$

The algorithm-related definitions are as follows.

Definition 1, the distance between fireflies $i$ and $j$:

$$r_{ij} = \|x_i - x_j\| = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2}$$

where $m$ is the data dimension and $x_{ik}$ is the $k$-th data component of firefly $i$.

Definition 2, the brightness of the firefly:

$$I = I_0 \times e^{-\gamma r_{ij}}$$

where $I_0$ is the fluorescence brightness at the firefly itself ($r = 0$) and $\gamma$ is the light intensity absorption coefficient, usually a constant.

This brightness formula is computationally expensive and slows the convergence of the firefly algorithm. Because brightness is related to the objective function, the algorithm in this paper uses the objective function $J_c$ (the weighted objective function given in Definition 6) directly to represent the brightness of a firefly.

Definition 3, firefly attractiveness:

$$\beta(r) = \beta_0 \times e^{-\gamma r_{ij}^2} \tag{12}$$

where $\beta_0$ is the maximum attractiveness, that is, the attractiveness at $r = 0$.

As a firefly is attracted and moves, the distance becomes smaller and smaller. By the principle of equivalent infinitesimal substitution, formula (13) can be used in place of formula (12) to reduce the amount of calculation and improve the running speed:

$$\beta(r) = \frac{\beta_0}{1 + \gamma r_{ij}^2} \tag{13}$$
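The substitution can be checked with a series expansion. In the short derivation below, the shorthand $x = \gamma r_{ij}^2$ is introduced here for convenience and does not appear in the paper; the approximation holds only when $x$ is small, i.e., when the fireflies are already close together.

```latex
e^{-x} \;=\; \frac{1}{e^{x}}
       \;=\; \frac{1}{1 + x + \frac{x^{2}}{2!} + \cdots}
       \;\approx\; \frac{1}{1 + x} \quad (x \to 0),
\qquad\text{so}\qquad
\beta_0\, e^{-\gamma r_{ij}^{2}} \;\approx\; \frac{\beta_0}{1 + \gamma r_{ij}^{2}} .
```

The rational form avoids evaluating an exponential for every pair of fireflies, which is where the saving in computation comes from.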

Definition 4, position update formula.

The perturbation term $\alpha \times (rand - 0.5)$ of the firefly algorithm has little perturbation effect and easily causes fluctuation near a local optimum. The perturbation operator $\alpha \times rand \times \frac{X_i - V_0}{2}$ is therefore introduced into the firefly algorithm, and the position update formula for firefly $i$ when it is attracted toward firefly $j$ can be optimized as equation (14). The position update depends on the attractiveness, which determines the moving distance:

$$X_{i+1} = X_i + \frac{\beta_0}{1 + \gamma r_{ij}^2}\,(X_i - V_0) + \alpha \cdot rand \cdot \frac{X_i - V_0}{2} \tag{14}$$

where $V_0$ is the current optimal clustering center, $\alpha$ is the step factor, a constant on $[0,1]$, and $rand$ is a random number uniformly distributed on $[0,1]$:

$$V_0 = \frac{1}{n_i} \sum_{y \in C_i} y$$

where $n_i$ is the number of data in cluster $C_i$ and $y$ represents the value of the data in cluster $C_i$.

The brightest firefly $X^{*}$ moves according to equation (16):

$$X^{*}_{+1} = X^{*} + \alpha \times rand \times \frac{X_i - V_0}{2} \tag{16}$$

The optimization of the perturbation operator effectively avoids the random movement of the brightest firefly and improves the convergence speed and accuracy of the algorithm.
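For orientation, the sketch below shows a single movement step of the standard firefly algorithm with the rational attractiveness $\beta_0 / (1 + \gamma r^2)$ swapped in. It deliberately uses the standard perturbation $\alpha \times (rand - 0.5)$ and the standard attraction toward the brighter firefly; the modified perturbation operator and the role of $V_0$ described in this section are not reproduced. All names and default parameter values are assumptions.

```python
import random


def attractiveness(r_squared, beta0=1.0, gamma=1.0):
    """Rational attractiveness beta0 / (1 + gamma * r^2)."""
    return beta0 / (1.0 + gamma * r_squared)


def move_towards(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.2, rng=None):
    """Move firefly i toward a brighter firefly j (one standard firefly step)."""
    rng = rng or random.Random()
    r_squared = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    beta = attractiveness(r_squared, beta0, gamma)
    # Standard perturbation alpha * (rand - 0.5), drawn independently per dimension.
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(x_i, x_j)]


# With alpha = 0 the step is deterministic: r^2 = 4 gives beta = 1/5,
# so firefly i covers one fifth of the gap to firefly j.
print(move_towards([0.0, 0.0], [2.0, 0.0], alpha=0.0))
```

The closer two fireflies are, the larger the attractiveness and the larger the fraction of the remaining gap covered in one step.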

Definition 5, Weights.

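Considering that the sample data to be clustered differ in business relevance and influence, the weights $\Omega = \{\omega_1, \omega_2, \ldots, \omega_j, \ldots, \omega_n\}$, $\omega_j = (\omega_{j1}, \omega_{j2}, \ldots, \omega_{jm})^T \in R^m$, are introduced into the calculation of the objective function to reflect the overall distribution characteristics of the data:

$$\omega_{id} = \frac{x_{id}}{\frac{1}{n} \sum_{l=1}^{n} x_{ld}}$$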

Definition 6, Objective function.

Brightness is related to the objective function, so the objective function is used to represent brightness. The smaller the value of the objective function, the better the clustering effect, the better the firefly’s position and the greater its brightness. Fireflies with high brightness attract fireflies with low brightness, so brightness determines the direction of movement.

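The clustering centers obtained with the traditional K-means algorithm are:

$$V = \{V_1, V_2, \ldots, V_k\}$$

The Euclidean distance between a data object and a clustering center is:

$$d(X, V) = \|x_i - v_j\| = \sqrt{\sum_{d=1}^{m} (x_{id} - v_{jd})^2}$$

where $V_j$ denotes the center position of class $j$, $i = 1, 2, 3, \ldots, n$, $j = 1, 2, 3, \ldots, k$, and

$$V_j = \frac{1}{n_j} \sum_{x_i \in G_j} x_i$$

where $G_j$ is the set of data in class $j$, $n_j$ is the number of data in $G_j$, and $x_i$ is the sample data in $G_j$.

The objective function obtained using only the traditional K-means algorithm is:

$$J(X, V) = \sum_{j=1}^{k} \sum_{X_i \in G_j} d(X_i, V_j) = \sum_{j=1}^{k} \sum_{X_i \in G_j} \sqrt{\sum_{d=1}^{m} (x_{id} - v_{jd})^2}$$

After weighting, the distance between a data object and a clustering center is:

$$d_\omega(X, V) = \|x_i - v_j\|_\omega = \sqrt{\sum_{d=1}^{m} \omega_{id} (x_{id} - v_{jd})^2} \tag{20}$$

The introduction of the weight $\omega_{id}$ in equation (20) highlights the distribution characteristics of the data, which makes it easier to exclude anomalies and improves clustering accuracy, while also reducing the number of iterations and speeding up the algorithm.

The weighted objective function is:

$$J_c(X, V) = \sum_{j=1}^{k} \sum_{X_i \in G_j} d_\omega(X_i, V_j) = \sum_{j=1}^{k} \sum_{X_i \in G_j} \sqrt{\sum_{d=1}^{m} \omega_{id} (x_{id} - v_{jd})^2}$$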
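A minimal sketch of the weighted distance and weighted objective is given below. It assumes that each weight is the feature value divided by the mean of that feature over all samples, and that the features are positive so that the weights are non-negative; both are readings made for illustration, and the function names and toy data are hypothetical.

```python
import math


def feature_weights(data):
    """Assumed weight: each feature value divided by that feature's mean over all samples."""
    n = len(data)
    means = [sum(column) / n for column in zip(*data)]
    return [[x / mu for x, mu in zip(row, means)] for row in data]


def weighted_distance(x, w, center):
    """Weighted Euclidean distance sqrt(sum_d w_d * (x_d - v_d)^2)."""
    return math.sqrt(sum(wd * (xd - vd) ** 2 for xd, wd, vd in zip(x, w, center)))


def weighted_objective(data, weights, labels, centers):
    """Sum of weighted distances from each sample to the center of its own cluster."""
    return sum(
        weighted_distance(x, w, centers[label])
        for x, w, label in zip(data, weights, labels)
    )


# Two samples assigned to a single cluster centered at (2, 2).
data = [(1.0, 2.0), (3.0, 2.0)]
weights = feature_weights(data)          # [[0.5, 1.0], [1.5, 1.0]]
print(weighted_objective(data, weights, labels=[0, 0], centers=[(2.0, 2.0)]))
```

In this example the two samples are equally far from the center in the ordinary Euclidean sense, but the sample with the larger first feature contributes more to the objective because its weight on that feature is larger.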

User Profile Construction and Application

To improve calculation speed and accuracy, this paper proposes a hierarchical clustering portrait recommendation model based on the improved firefly-optimized weighted K-means algorithm. First, for a single business, a simple, targeted tag library is designed to identify specific user groups that share typical commonalities. Through data analysis and deep mining of each group, the group characteristics of its users are then extracted. The improved firefly-optimized weighted K-means algorithm performs two-layer clustering of the users to build two sets of group portrait models with different shared characteristics, the $K_Y$ classes and the $K_N$ classes.

This paper only takes the public library cultural and creative products industry as an example to introduce the application method of product marketing to target potential users. The specific user portrait modeling process is as follows.

Data acquisition. Obtain a large amount of real and reliable user data from the management systems of public libraries, collecting business data closely related to book activities.

Data feature mapping. Let the user dataset be $X = \{X_1, X_2, \ldots, X_j, \ldots, X_n\}$. The customer data are preprocessed, and the vector space model (VSM) is used to map the $m$-dimensional features of users into $X = \{X_1, X_2, \ldots, X_j, \ldots, X_n\}$; $X_j = (x_{j1}, x_{j2}, \ldots, x_{jm})^T \in R^m$.

First-layer clustering. The improved firefly-optimized weighted K-means algorithm clusters the users into 2 classes: group Y, who have participated in public library activities, and group N, who have not.

Second-layer clustering. The improved firefly-optimized weighted K-means algorithm is applied again to group Y and group N separately, dividing group Y into $K_Y$ classes and group N into $K_N$ classes.

Group feature extraction. After two layers of clustering, a total of $K_Y + K_N$ clusters and cluster centers are obtained. Each cluster center represents all the objects in its cluster, and its parameters, i.e., labels, reflect the common characteristics of the group.

Group portrait expression. Visualize the labels to obtain the group portraits of the $K_Y + K_N$ user groups.

Empirical analysis of the application of user profile in cultural industry marketing

The empirical object selected in this chapter is the cultural and creative products industry of public libraries. Following the user profiling model based on firefly K-means clustering proposed in this paper, the Tableau data analytics platform was connected to the heterogeneous business systems of public library A, including the access control management system, the self-service system, the resource management platform and the space reservation system, and all types of user behavioral data from the past five years were collected in full. After the data mining techniques proposed in this paper were applied to clean the data and preprocess missing values, outliers and invalid data, a total of 15,068 valid user records were obtained.

User Profiling Model Performance Testing

This performance test uses the 15,068 valid user records of public library A from the past five years. The bisecting K-means algorithm and the classical K-means algorithm are selected for comparison, and comparative experiments are carried out on clustering effect and on algorithm efficiency, measured as average elapsed time.

Clustering effect

The improved algorithm of this paper, the bisecting K-means algorithm and the classical K-means algorithm were each run on the above sample set for different numbers of clusters, and the total sum of Mahalanobis distances of the clustering results was calculated. The smaller the sum, the closer the clustering result is to the global optimum and the better the clustering effect. The comparison results are shown in Figure 1. For every number of clusters, the sum of Mahalanobis distances of this paper’s algorithm is smaller than those of the bisecting K-means and classical K-means algorithms. At the largest number of clusters, 10, the sum for this paper’s algorithm is 13.84, lower than the 19.03 of the bisecting K-means algorithm and the 19.51 of the classical K-means algorithm.

Algorithm efficiency

In addition to the clustering effect, this study also compares the efficiency of the three algorithms for different numbers of clusters. For each number of clusters, each algorithm was run five times and the average time consumed was recorded; the comparison results are shown in Figure 2. The processing time of the algorithm in this paper is much lower than that of the bisecting K-means and classical K-means algorithms, and it is relatively stable, always remaining within 6 to 8 s.

Figure 1.

Cluster effect

Figure 2.

Algorithm efficiency

From the above two experimental results, it can be seen that the improved firefly K-means algorithm outperforms the bisecting K-means algorithm and the classical K-means algorithm in terms of global optimization, stability and efficiency.

Determination of the optimal number of clusters

Cluster analysis and Wilks’ Lambda indicator

The firefly K-means algorithm proposed in this paper was used to further classify all the sample data. The characteristic labeling factors of the public library user group were extracted and grouped into four major categories: interest and experience factors, service feeling factors, culture-oriented factors, and communication and sharing factors. K-means clustering was carried out on the four types of characteristic factors. After the variables were selected, the maximum number of iterations was set to 10, and the clustering analysis was run in turn with the number of clusters set to 3, 4 and 5. Wilks’ Lambda (the ratio of the within-group sum of squares to the total sum of squares; it ranges from 0 to 1, where values closer to 0 indicate stronger differences between groups and values closer to 1 indicate smaller differences) was used as the discriminant analysis index, the variables were discriminated step by step, and the final clustering result was selected by a comprehensive decision. The results of the cluster analysis and the Wilks’ Lambda indicator are shown in Table 1. When K is 5, the F-value gap between the factors is the smallest, with differences ranging from 200 to 510, which represents the smallest variability between the user profiles; although the Wilks’ Lambda value of 0.156 is the smallest, this is still not the optimal choice. When K is 3, the Wilks’ Lambda value reaches 0.315, the largest among all the schemes, and the F-value differences between the characteristic factors are moderate.
In summary, K = 4 is selected as the final scheme. With this value the F-value gaps between the characteristic factors are more pronounced, ranging from 222 to 649, so the clusters differ more clearly and the generated user profiles are more distinct. The Wilks’ Lambda value of 0.178 is close to 0, indicating clear differences between groups, and the grouping probability reaches 88.6%, the highest of the three schemes, which confirms the advantage of this choice.
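The ratio described above can be sketched for a single variable as follows. This is a simplified, one-variable illustration of the within-group to total sum-of-squares ratio only; the multivariate Wilks’ Lambda reported by statistical packages is computed from determinants of sum-of-squares matrices, which this sketch does not attempt. The function name and the toy groups are assumptions.

```python
def wilks_lambda_univariate(groups):
    """Within-group sum of squares divided by total sum of squares for one variable."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    within_ss = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        within_ss += sum((v - group_mean) ** 2 for v in group)
    return within_ss / total_ss


# Well-separated groups give a value near 0; identical groups give 1.
print(wilks_lambda_univariate([[1, 2, 3], [11, 12, 13]]))
print(wilks_lambda_univariate([[1, 2, 3], [1, 2, 3]]))
```

A value near 0 means almost all of the variation lies between the groups, which is why a smaller value is read as stronger differences between clusters.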

Elbow curve and silhouette coefficient curve

To further verify that the above clustering results can support the marketing of public library cultural and creative products, the elbow curve method was combined with the silhouette coefficient to establish the optimal number of clusters and test the clustering results. The elbow curve and the silhouette coefficient are shown in Figure 3. As K increases, the candidate values for the optimal number of clusters are 4, 5 and 6. At K = 4 the silhouette coefficient is 0.4921, much higher than for the other numbers of clusters, which indicates better separation of the user portrait clusters. Therefore, K = 4 is the best choice.
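The silhouette coefficient used here can be computed as in the following plain-Python sketch. It follows the usual definition: for each sample, $a$ is the mean distance to the other members of its own cluster, $b$ is the smallest mean distance to any other cluster, and the score is $(b - a) / \max(a, b)$. The function name, the convention of scoring singleton clusters as 0, and the toy data are assumptions for illustration.

```python
import math


def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(data, labels):
    """Mean silhouette coefficient over all samples; needs at least two clusters."""
    cluster_ids = sorted(set(labels))
    if len(cluster_ids) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for i, x in enumerate(data):
        # Mean distance from x to the members of each cluster, excluding x itself.
        mean_to = {}
        for c in cluster_ids:
            ds = [_dist(x, data[j]) for j in range(len(data))
                  if labels[j] == c and j != i]
            if ds:
                mean_to[c] = sum(ds) / len(ds)
        if labels[i] not in mean_to:
            scores.append(0.0)  # singleton cluster: score taken as 0
            continue
        a = mean_to[labels[i]]
        b = min(v for c, v in mean_to.items() if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(silhouette_score(points, [0, 0, 1, 1]))  # sensible grouping
print(silhouette_score(points, [0, 1, 0, 1]))  # poor grouping
```

Scores close to 1 indicate compact, well-separated clusters, and negative scores indicate samples that sit closer to another cluster than to their own, so comparing the score across candidate values of K complements the elbow curve.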

Table 1.

Cluster analysis and Wilks’ Lambda

| Clustering index | 3 clusters: F value | 3 clusters: Sig. | 4 clusters: F value | 4 clusters: Sig. | 5 clusters: F value | 5 clusters: Sig. |
| --- | --- | --- | --- | --- | --- | --- |
| Interest experience factor | 973.635 | .000 | 771.849 | .000 | 651.434 | .000 |
| Service sensory factor | 594.806 | .000 | 530.906 | .000 | 495.883 | .000 |
| Cultural orientation factor | 1096.785 | .000 | 956.419 | .000 | 802.983 | .000 |
| Communication sharing factor | 548.275 | .000 | 308.347 | .000 | 293.879 | .000 |
| Wilks’ Lambda | 0.311 | | 0.178 | | 0.156 | |
| Grouping probability (%) | 82.40 | | 88.60 | | 87.20 | |

Figure 3.

Elbow curve and silhouette coefficient curve

Analysis of user profile clustering results

The 15,068 valid user records of public library A from the past five years were clustered, and the final clustering centers are shown in Table 2. Each category of user profile differs clearly on the four clustering indexes: interest experience factor, service sensory factor, cultural orientation factor, and communication sharing factor. According to the mean values and sample sizes, Class A users can be categorized as “leisure” users, whose aim is physical and mental relaxation. Their sample size reaches 9,995, or 66.33% of the total, so they can be defined as the potential main customers for the marketing of public library cultural and creative products. Class B users pay more attention to product content quality and experience, while Class D users pay more attention to emotional and social needs; they can be categorized as “experience-oriented” users and “socialite” users, account for 12.34% and 19.07% of the sample respectively, and are potential secondary customers. Class C users have clear knowledge-learning needs and are categorized as “target learning” users; with a sample share of only 2.26%, they are potential marginal customers.

Based on the clustering results of the user profiles, marketing strategies can be targeted at specific user groups. For the potential main customers, the “recreational” users, marketing can emphasize social media and private traffic, combining online and offline channels to increase daily contact and leisure interaction with consumers. For “participation and experience” users, the quality and experience of cultural and creative products can be improved, and offline trial and experience activities can be offered to meet their expectations of product quality and experience. For “socialite” users, crossover marketing activities such as multi-IP co-branding can be organized to cater to their emotional and social needs. For “target learning” users, who account for the smallest share, the brand loyalty of these potential marginal customers can be maintained by regularly developing cultural and creative products built around education and knowledge-learning needs.
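One way to put this targeting into practice is to assign a user's four factor scores to the nearest cluster center from Table 2 and look up the matching strategy. The sketch below is an assumption-laden illustration: the function name `assign_segment`, the use of Euclidean distance, and the example scores are hypothetical, and the strategy strings only summarize the paragraph above.

```python
import math

# Cluster centers from Table 2, ordered as: interest experience, service sensory,
# cultural orientation, communication sharing.
CENTERS = {
    "A": (-0.745, -0.732, -0.647, -0.422),
    "B": (0.225, 0.334, 0.378, 0.472),
    "C": (-0.884, -1.044, -1.084, -1.135),
    "D": (0.558, 0.585, 0.654, 0.696),
}

# Segment label and a one-line summary of the strategy described in the text.
STRATEGIES = {
    "A": ("recreational", "social media and private traffic, online plus offline contact"),
    "B": ("participation and experience", "higher product quality, offline trial activities"),
    "C": ("target learning", "regular education and knowledge-oriented products"),
    "D": ("socialite", "crossover activities such as multi-IP co-branding"),
}

def assign_segment(scores):
    """Return the class whose center is closest in Euclidean distance (an assumption)."""
    return min(CENTERS, key=lambda name: math.dist(scores, CENTERS[name]))

# Hypothetical factor scores for one user.
user_scores = (0.6, 0.6, 0.6, 0.7)
segment = assign_segment(user_scores)
label, strategy = STRATEGIES[segment]
print(f"Class {segment} ({label}): {strategy}")
```

Nearest-center assignment keeps the segment definition identical to the one used during clustering, so new users can be placed without rerunning the full analysis.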

Table 2. Cluster results of user portraits

| Type of user portrait | Class A user | Class B user | Class C user | Class D user |
| --- | --- | --- | --- | --- |
| Interest experience factor | -0.745 | 0.225 | -0.884 | 0.558 |
| Service sensory factor | -0.732 | 0.334 | -1.044 | 0.585 |
| Cultural orientation factor | -0.647 | 0.378 | -1.084 | 0.654 |
| Communication sharing factor | -0.422 | 0.472 | -1.135 | 0.696 |
| Sample size | 9995 | 1859 | 341 | 2873 |
| Effective sample size | 9995 | 1859 | 341 | 2873 |
| Missing sample size | 0 | 0 | 0 | 0 |

Conclusion

Because traditional cultural industry marketing strategies can no longer meet market demand, this paper proposes data mining and preprocessing techniques for cultural industry marketing user data and constructs a user portrait model on that basis.

The performance of the user portrait model for cultural industry marketing constructed in this paper was tested. Its sum of Mahalanobis distances is smaller than that of the bisecting K-means algorithm and the classical K-means algorithm used for comparison, and its running time across different numbers of clusters stays within 6 to 8 s.

In the user portrait clustering analysis, when K = 4 the differences in F value between the feature factors range from 222 to 649, which indicates relatively large variability between clusters. Wilks' lambda is 0.178 and the grouping probability reaches 88.6%, producing more clearly distinguished user portraits. At K = 4 the silhouette coefficient is 0.4921, much higher than for the other cluster numbers, which confirms that the user portrait clusters are better separated.

With K = 4, cluster analysis was performed on the 15,068 valid user records of public library A from the past 5 years. Class A users, i.e., “recreational” users, are the potential main customers of the public library cultural and creative products industry, with a sample share of 66.33%. Class B and Class D users are categorized as “participation and experience” users and “socialite” users, respectively, with sample shares of 12.34% and 19.07%; they are potential secondary customers for marketing. Class C users, i.e., “target learning” users, have the lowest sample share at 2.26% and are potential marginal customers. The results of the cluster analysis of user profiles can provide a basis and support for formulating marketing strategies in the cultural industry.