Acceso abierto

Research and Application of User Behavior Data Analysis Technology for E-commerce

,  y   
27 feb 2025

Cite
Descargar portada

Introduction

In the wave of the information age, e-commerce, as a new business model, has profoundly changed people's consumption habits and enterprise operation modes [1]. With continuous advancement of Internet technology and the popularity of mobile devices, e-commerce platforms have sprung up and become the main channel connecting consumers and merchants [2]. Under this background, user behavior data analysis technology came into being, which is not only the key driving force for the development of e-commerce but also the core weapon for enterprises to enhance their competitiveness. The purpose of this paper is to discuss the research and application of user behavior data analysis technology for e-commerce and provide theoretical support and practical guidance for the continuous prosperity of e-commerce.

The rapid development of e-commerce platforms has brought a huge amount of user behavior data. These data include users' browsing records, purchase history, evaluation feedback, search habits, etc., which constitute a huge data resource library [3, 4]. However, how to mine valuable information from these seemingly chaotic data has become a major challenge in the field of e-commerce. User behavior data analysis technology is the key to solving this problem. Through in-depth analysis of user behavior, it helps enterprises better understand consumer needs, predict market trends, and formulate effective marketing strategies [5].

Globally, the competition in e-commerce market is becoming increasingly fierce, and the competition among enterprises has shifted from the traditional price war and product war to the data war [6, 7]. User behavior data analysis technology has become an important means for enterprises to compete for market share and improve user experience. Against this background, it is not only of theoretical significance but also of practical urgency to study user behavior data analysis technology. Through in-depth analysis of user behavior, enterprises can achieve accurate positioning of the market and personalized service to users so as to stand out from the competition.

The development of user behavior data analysis technology benefits from the promotion of cutting-edge technologies such as big data, cloud computing, and artificial intelligence [8, 9]. The integrated application of these technologies provides powerful tools and methods for user behavior data analysis. For example, big data technology makes it possible to process massive user data; cloud computing provides elastic and scalable computing resources for data analysis, and artificial intelligence improves the intelligent level of data analysis through machine learning, deep learning, and other technologies [10]. The progress of these technologies has laid a solid foundation for the in-depth research of user behavior data analysis technology.

In the academic field, user behavior data analysis has become a hot research direction. Researchers mine and analyze user behavior data from different angles, and put forward a variety of data analysis models and methods. However, the existing research still has some limitations, such as real-time data processing, generalization ability of models, privacy protection and so on. Therefore, this study aims to explore more efficient, accurate and secure user behavior data analysis technology to solve these problems.

At the practical application level, user behavior data analysis technology has achieved remarkable results. For example, by analyzing users' shopping behavior, enterprises can optimize product recommendation strategies and improve user conversion rates; By analyzing user evaluation, enterprises can improve product quality and enhance user satisfaction. However, these applications are still in their infancy, and how to exert the value of user behavior data analysis in deeper and wider scenarios is the focus of this study.

This article will conduct an in-depth discussion of the technical principles, method innovations, application scenarios, and other aspects of user behavior data analysis. Through three experimental results with numbers, this paper will show the actual effect of user behavior data analysis technology in improving prediction accuracy, optimizing recommendation systems, and improving enterprise benefits. We hope that this study can provide a new perspective and method for the analysis of user behavior data in the field of e-commerce and contribute to the healthy development of e-commerce. On this basis, this paper will discuss the research and application of user behavior data analysis technology for e-commerce in detail.

E-commerce and User Behavior Analysis Basis
E-Commerce Overview

E-commerce uses computer network information to facilitate commodity exchanges [11, 12]. It relies on platforms like the Internet, Intranet, and VAN for trading and services, known as electronic and networked traditional business [13]. Unlike traditional methods, e-commerce allows non-face-to-face transactions with flexible delivery times and recorded information, enhancing contractual security.

While definitions vary by country, e-commerce's core is a business model using electronic equipment and network technology. Its market is expanding, encompassing sales, purchases, logistics, bill inquiries, and product push services [14]. E-commerce spans supply chain management, marketing strategies, electronic money exchange, electronic trading markets, network marketing, and electronic data interchange (EDI). As a prominent new industry, e-commerce leverages advanced Internet, database, and mobile technologies to broaden and deepen its industry.

Before the network communication technology developed to a certain height, the e-commerce system did not have the function of electronic payment, and the payment methods were often through telephone, mail, and bank transfer [15]. With the development of network and electronic information technology, e-commerce has broken through the limitations of traditional transaction modes and gradually formed a cross-regional, safe, and efficient electronic payment method.

Theoretical basis of user behavior analysis

The theoretical basis of user behavior analysis is the basis of studying the behavior patterns and characteristics of users on e-commerce platform. Through an in-depth study of user behavior theory, we can better understand the psychological and behavioral characteristics of users in the shopping process so as to provide effective decision support for enterprises [16, 17].

The user behavior model is a theoretical model that describes the behavior characteristics of users on the e-commerce platform. Common user behavior models include the AIDA model, DISC model, etc. [18]. These models explain and analyze user behavior from the aspects of user needs, motivation, and behavior. Understanding the user behavior model helps enterprises to better grasp user needs and provide targeted products and services. Data mining technology is one of the core technologies of user behavior analysis. Data mining technology can mine valuable information from massive data and reveal the laws and characteristics of user behavior [19, 20]. Common data mining techniques include association rule mining, cluster analysis, classification analysis, etc. Mastering data mining technology helps enterprises to extract valuable information from user behavior data and provide support for decision-making. The machine learning algorithm is the key technology to realize user behavior prediction and personalized recommendations. By training the machine learning model, the future behavior of users can be predicted, and personalized recommendations can be realized [21]. Common machine-learning algorithms include linear regression, decision trees, support vector machines, neural networks, etc. Mastering machine learning algorithms help enterprises build high-accuracy user behavior prediction models and improve the effect of personalized recommendations. The application of user behavior analysis in e-commerce mainly includes commodity recommendation, marketing strategy optimization, user satisfaction improvement, and so on. By analyzing user behavior data, enterprises can accurately position the market, formulate effective marketing strategies, and improve user experience. For example, by analyzing users' purchase history and browsing records, enterprises can provide users with personalized product recommendations. By analyzing user evaluation and feedback, enterprises can optimize product quality and improve user satisfaction.

The theoretical basis of user behavior analysis provides theoretical guidance for studying e-commerce user behavior [22, 23]. Through in-depth study of theoretical foundations such as user behavior models, data mining technologies, and machine learning algorithms, enterprises can better grasp user needs and provide targeted products and services, thereby enhancing competitiveness. At the same time, understanding the application of user behavior analysis in e-commerce will help enterprises to achieve accurate positioning of the market, formulate effective marketing strategies, and improve user experience.

User behavior data analysis method oriented to e-commerce
Feature extraction of user behavior data

Constructing features in machine learning is sometimes even more important than selecting good models and iteratively optimizing parameters because feature engineering determines the best limit that each index can reach, and better models and parameters are only closer to this limit, so it is important to dig out various effective features of users and commodities [24]. The purpose of feature engineering is to abstract multi-dimensional features that can represent data characteristics from the original disordered data heap to the maximum extent and serve as the input data of the data mining system to provide a data basis for training model parameters.

In the usual machine learning process, hundreds or even higher-dimensional features can be generated by manual extraction of features and automatic mutual fusion between features [25]. However, every time a one-dimensional feature is constructed, it is necessary to verify the validity of the feature from all aspects, that is, whether the dimensional feature is related to the result to be predicted. The higher the correlation, the more important the feature is, and it should be selected first. The importance of features can be checked from many aspects. For example, if the variance of a dimensional feature itself is approximately equal to 0, it means that all values of the dimensional feature are basically the same, and there is no discrimination between the sample wood, so the feature can be removed.

The feature engineering of recommendation systems is mainly carried out from three main dimensions: user dimension, commodity dimension, and user-commodity interaction behavior dimension [26, 27]. The main problem solved by the user dimension is that during the scoring process, the behavior habits of different users will affect the final scoring results, which are used to eliminate the differences between users. The problem solved by the commodity dimension is that in the scoring process, the attributes and characteristics of commodities will have an impact on the final satisfaction of users, which is used to eliminate the differences between commodities. The interactive behavior between user-commodity pairs shows the interest of specific users in specific commodities. The recommendation process of the recommendation system is to find out the products for which users have no interactive behavior and have high interest.

User segmentation and personalized recommendation algorithm

In practical applications, in order to display recommended products to users, record users' reactions to recommended products, and update recommendation results in real-time, the application architecture of the recommendation system usually includes three parts: front-end display module, background log storage module, and recommendation algorithm module. The algorithm architecture is shown in Figure 1.

Figure 1.

Algorithm architecture of recommendation system

The recommendation system is a key application in the e-commerce industry to connect users and commodities, in which the recommendation algorithm is the core. Recommendation algorithms are mainly divided into three categories: one is to recommend similar new products based on users' historical preferences; The second is to recommend untouched products according to the preferences of other users; The third is to make recommendations by mining the potential hidden factors of users and items. The regression method involves feature engineering to extract features, linear regression is used for continuous value prediction, logistic regression is used for classification prediction, and the latter realizes binary classification through logistic function.

The equation of the linear regression model is as follows (1), θ is the parameter to be learned by the model, where Θ is the parameter to be learned by the model, X (x1, x2,..., xn) is the feature extracted from the user's historical behavior data set, n is the dimension of the feature, and ΘT is the transposition of the parameter vector Θ, as shown in equation (1): f(x)=ΘTX=θ0x0+θ1x1++θnxn

The equation of the logical function sigmoid mapping is as follows (2): g(z)=11+ez g (z) is the equation of logistic function sigmoid mapping. The algorithm of logistic regression uses logistic function for mapping on the basis of regression, as follows: hθ(x)=g(ΘTX)=11+eΘTX hθ(x) represents the output, and a logistic regression recommendation system is often used to estimate the click-through rate of advertisements. The key steps include extracting user historical data and item features because logistic regression is a linear model, and nonlinear relationships need to be mined through feature engineering. Firstly, features are extracted based on experience, and then a multivariate combination is carried out, and the feature combination is automatically discovered by GBDT. In e-commerce, the logistic regression system extracts user behavior, commodity attributes, and interaction as features, obtains feature weights through model training, and uses a sigmoid function to convert linear scores into binary classification results to predict user purchasing behavior. The logistic regression model is simple but requires complex feature engineering, which is suitable for researchers with industry experience. Nearest-neighbor recommendation technology is the foundation of the recommendation system, including two nearest-neighbor recommendation methods, user-based and item-based.

The user-based nearest neighbor recommendation algorithm firstly calculates the set of users who have common interests with the users to be recommended and then extracts the set of items N (i) that the users like in the user set, finds out the items i that the target users are interested in without producing the behavior to be predicted, and recommends them to the target users. Assume that the similarity between user μ and user ν is ωμν, the interest degree of user ν in item i is γνi, the set size of similar users is K, and S (μ, K) represents the K users most similar to the interest points of user μ. Then, the formula (4) for calculating the interest degree of the user μ and a specific new item in the algorithm based on the user (UserCF) is as follows: Iu,i=μS(μ,K),iN(i)ωμvγvi

In general, the value of γ is custom-defined, and there are different numerical definition methods depending on the characteristics of the data set feedback used. The historical behavior feedback of users about items collected in the system is mainly divided into three kinds, namely, multivariate feedback, binary feedback, and univariate feedback. For example, on some e-commerce websites, users are allowed to give a 5-star rating on the purchased goods. If they are very satisfied, they can give 5 points, and if they are very dissatisfied, they can give 1 point or 0 points. This kind of feedback belongs to multiple feedback. Some video websites allow users to give binary feedback on the videos they watch. Mornay feedback is some implicit feedback of users' interest in commodities, which is manifested in users' online visit to a commodity, browsing time, and adding shopping cart behavior on e-commerce websites. If only one purchase behavior is used to define the preference degree of all users for items, γνi = 1 can be made, and if other unary feedback information is also used, the corresponding γνi can be taken as a specific weight value. The improved cosine formula (5) for the calculation of the similarity ωμν between users is as follows: ωμv=iN(μ)N(v)1log(1+|N(i)|)|N(μ)|*|N(v)|

For example, in the above formula, the numerator uses the reciprocal of the log function as a penalty for popular items to prevent popular items from contributing too much to similarity, which reduces the proportion of popular items in the recommendation results and enhances the ability to mine long-tail items.

The item-based neighbor recommendation algorithm is to find K similar items of the items that the target user is interested in in history, calculate the weighted sum of similar items in all the sets of items of interest, screen out the items with the highest ranking, and push them to the user. Its formulas (6)-(7) are as follows: Iμ,i=iS(j,K),jN(μ)ωijγμi ωij=uN(i)N(j)1log(1+|N(μ)|)|N(i)|*|N(j)|

The log function is to punish active users and prevent active users from contributing too much to similarity. Another way to completely solve the problem of active users is to clean the corresponding data of users who are too active in the data preprocessing stage so as to avoid the unpredictable connection caused by such users with unrelated items.

Recommendation systems based on matrix factorization, as an application of the implicit semantic model, have attracted more and more attention because they can reveal implicit factors in user and item scoring and have higher accuracy. This paper will introduce several recommendation models based on factorization. Among them, the singular value decomposition technique (SVD) is the basic model, which reveals the hidden characteristics of users and items by decomposing the user's scoring matrix of items. SVD technology mines the hidden interests of users and the hidden categories of items through double dimensionality reduction and discovers hidden factors without the explicit participation of users, thus providing a better user experience. Assuming that the hidden factor dimension of the user and the item is k-dimensional, the k-dimensional hidden factor related to the item is the k-dimensional vector qk ∈ Rk, the hidden factor related to the user is the k-dimensional vector pk ∈ Rk, and the value of each dimension of the vector represents the importance of the factor, then the prediction score γμi of the new item i by the user μ can be obtained by the following formula (8): γμi=qiTpμ

Among them, T stands for transposition, and the recommendation system is mainly used to predict the interaction results between users and new items. However, due to the differences between users and items, this natural and interaction-independent deviation should be eliminated in the predicted results as much as possible. Assuming that the average score of all items is v, the deviation between the score of user μ and the average score is bμ, and the deviation between the score of item i and the average score is bi, the improved predicted score result is the following formula (9): γui=v+bu+bi+qiTpu

However, in practical applications, users often only rate a small number of items, so the user-item rating matrix is very sparse. When the matrix is sparse, the decomposition of SVD cannot be carried out, and the missing value of the matrix needs to be filled based on a certain method. However, the amount of filled matrix data increases greatly, which increases the difficulty of data decomposition and processing, which leads to more time-consuming data updating and upgrading, which is not conducive to the application of real-time recommendation technology. Inaccurate filling values will also make the final prediction result too biased and reduce the accuracy.

Therefore, the research focus of factorization models in recent years has focused on algorithms that do not need to fill the matrix, directly model based on observed scores and behaviors, improve the generalization ability of the model, and avoid overfitting by adding regularization terms. SVD + + belongs to this kind of algorithm, and it has been widely used recently with good effect. SVD + + algorithm is not limited to explicit feedback but considers implicit feedback information in the algorithm model because sometimes the cost of obtaining explicit feedback is too high or the amount of data is insufficient, resulting in poor prediction effect. Generally, it has a higher prediction accuracy than the SVD method. For example, we can add a factor vector of commodity i to the model. These new factor vectors can reveal the hidden characteristics of users' preferences for commodities according to users' implicit feedback information about commodities.

γui=v+bu+b,+q,T(pu+|R(u)|12jR(u)y,)

Equation (10) mainly considers the hidden preference degree of the number of goods rated by the user, where R (u) shows the number of all goods rated by the user. In the SVD algorithm, the user's preference is only modeled as pu; In SVD + +, it is modeled as pu+|R(u)|12R(u)yj implicit feedback is explicitly modeled into the model by adding the latter term, which is the reason why this algorithm is superior to algorithms that only consider explicit feedback.

Hierarchical recommendation system model

The entire model of the hierarchical recommendation system consists of an offline subsystem and a real-time subsystem. The offline subsystem is mainly used to produce basic data of users, commodities, relationships between users and commodities, etc., and is also responsible for producing basic scores of users. It can Use parallelization technology to produce background output within specified time intervals. After the offline subsystem, the real-time subsystem directly uses the results of offline calculation to personalize users and predict scores. The clustering-based hierarchical recommendation system model is shown in Figure 2.

Figure 2.

Hierarchical recommendation system model

The mouth of the grading recommendation system is used to score and predict user-commodities, and its core function is disassembled into two parts. The first part is how to generate prior knowledge, such as features and basic scores for a given application. It extracts useful features and information from users' historical behavior data in an offline environment as the first step to understanding users and forms users' prior knowledge. This part of the function is completed by the offline subsystem. The second part is how to predict users' final scores according to users' characteristics, basic scores, and other prior knowledge. Firstly, it directly uses the prior results of the offline output to mine users' personalized prosperity in real-time, predict the correlation between users and goods, and make score predictions. Then, the scoring result is weighted with the user benchmark score calculated offline to obtain the final scoring result.

In the aspect of the offline subsystem, using the historical behavior characteristics of users such as purchasing and browsing, the K-means + + clustering algorithm is used to calculate similarity degree among users, and similar user groups are clustered into small candidate set user lists, and a small candidate set list of similar users is produced. On the other hand, by extracting many important features of new users when registering, the benchmark score of users is calculated, which is used to prevent users from cold starting and supplement the scoring results of the factorization machine algorithm by insufficient prior knowledge.

Experiment and Results Analysis

The training set is input into the model to get the trained and tuned algorithm model, and then the test value is input to get the prediction result. Take the value range of the coefficient C of the regular term of logistic regression [1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001]. The algorithm performs three-fold cross-validation on the training set samples on the above value of C. Comparing the purchase results obtained by inputting the test set into the logistic regression algorithm model with the samples with real purchase behavior on the 31st day, experimental results are as follows in Table 1:

Overall prediction results

Model Name Accuracy rate Recall rate F1
Logistic regression 4.66% 4.72% 4.68%

Figure 3 shows the change in the feature importance table when we set the number of iterations to 400 and 600, respectively. According to the characteristics mentioned above, we can see the number of times users purchased goods, the interval between users, and the last time they joined the shopping cart. The time interval between users' last purchase, the number of users who purchased goods in the last three days, and other key factors affect whether users finally buy or not.

Figure 3.

Model feature importance diagram

We used accuracy, recall, and F1 scores as measures of final prediction accuracy. The experimental results are shown in Table 2. It can be seen that the LightGBM + LR prediction model has the highest prediction accuracy and obvious effect.

Comparison of effects of three models

Model Name Accuracy rate Recall rate F1 Value
Logistic regression 4.25% 4.31% 4.27%
LightGBM 5.61% 5.63% 5.62%
LightGBM + LR 5.81% 602.70% 5.93%

It can be seen from Figure 4 that the MAE of the two is very different at the beginning and converges when the number of users is close to 100. The MAE fluctuates up and down, but the fluctuation range is not large. It shows that the improved collaborative filtering algorithm has a certain improvement in prediction accuracy compared with the collaborative filtering algorithm when the number of users is small, and the convergence of the two algorithms is roughly the same on this data set, and the improved collaborative filtering algorithm is slightly better than the collaborative filtering algorithm in the final result. Compared with the application on the m100K dataset, the number of best neighbor users on this dataset is larger than that of m100K, not only because there are more users on this dataset but also because this dataset is sparser. However, it still appears that when the number of neighbors is only about 100, MAE has converged because the user-item matrix is too sparse, and more users cannot provide more valuable information.

Figure 4.

Variation of MAE under different nearest neighbor numbers

It can be seen from the experimental results in Figure 5 that the accuracy and coverage of the algorithm do not change in the case of category associations 1, 2, and 3, so in this data set, the strongest association rule can be used to achieve the role of association rule. Compared with the m100K data set, the number of users of this data is larger than the number of goods, so the strongest association rule can fully cover the commodity set, and some goods in the commodity set are abnormal goods, so the commodity coverage rate cannot achieve full coverage.

Figure 5.

Experiment on category number of association rules

When the number of nearest neighbors is 100, and the number of categories of strongest association rules is 1, experiments are carried out on CF, CF-FP-NN, and ACF-FP-NN algorithms to verify the usability of the recommendation process designed in this paper, and the algorithms are measured from the aspects of accuracy, coverage, MAE value, diversity, surprise degree, and novelty. The experimental results are shown in Table 3. From the data, it can be seen that the algorithm flow designed in this paper performs better than the traditional collaborative filtering algorithm in all aspects. The accuracy of the algorithm on this data set is not good because the user-item matrix is too sparse to analyze the personalized behavior of users further. However, the ACF-FP-NN algorithm can still give good accuracy, mainly because the user-item matrix is sparse, and recommendations based on commodity attributes and user attributes are more suitable for a sparse environment than a collaborative filtering algorithm based on clustering idea.

Evaluation indicators of each algorithm

Precision Coverage MAE Diversity Surprise Unexpectedness
CF 16.68% 8.34% 89.11% 0.365 10.956 8.841
CF-FP-NN 23.94% 99.20% 78.68% 0.914 8.130 1.633
ACF-FP-NN 26.40% 99.20% 76.93% 0.936 8.242 1.444

Different numbers of maximum confidence categories are selected respectively, and the experimental results are shown in Figure 6. In terms of accuracy, there is a big performance difference between association rules and collaborative filtering algorithms, and the accuracy does not change much under different N values. It can be seen that when N is 3, the accuracy is the highest, and the coverage rate reaches 100% because almost all categories positively related to the current category have been covered at this time. When N increases, the coverage rate no longer increases, and the accuracy rate improves slightly because users generate new behavioral interest bias. With the increase of N value, the calculation time increases in turn, and when N is 3, the calculation time is too much, and the improved accuracy and coverage are not high, so 2 is selected as the best N value.

Figure 6.

Experimental results for different N values

It can be seen from Figure 7 of the experimental results that in terms of accuracy, the neural network can better understand the user's behavior tendency, so FP-NN and ACF-FP-NN algorithms have better results in terms of accuracy. However, association rules perform better than collaborative filtering algorithms in coverage, diversity, and novelty, so the FP-NN algorithm is better than the ACF algorithm in all three aspects. However, the neural network has a stronger ability to predict scores than similarity calculation, and all FP-NN and ACF-FP-NN algorithms have better MAE values. ACF-FP-NN integrates the advantages of collaborative filtering, neural networks, and association rule algorithms and makes up for each other's shortcomings. Although there is still a gap between the best value and the best value in some values, the gap is almost negligible. Therefore, the ACF-FP-NN algorithm is selected as the core recommendation algorithm of the personalized recommendation verification system.

Figure 7.

Performance of each algorithm on various indicators

It can be seen from Figure 8 that the MAE of each algorithm decreases with the increase in the number of neighbors, and when the number of neighbors is around 60, the MAE tends to be stable. With the increase in the number of neighbors, user interference occurs, and the MAE gradually increases. Compared with the Pearson similarity algorithm, Jaccard-Pearson similarity, A-Pearson similarity, and the error value of this algorithm are greatly improved, and the final MAE value of this algorithm is stable at about 0.72, which is obviously improved compared with Jaccard-Pearson's final values of 0.77 and A-Pearson's 0.75.

Figure 8.

Average absolute error of each algorithm on the dataset

It can be seen from Figure 9 that the accuracy of each algorithm is the best when the number of nearest neighbors is about 80. When the number of neighboring users is small, A-Pearson has the highest accuracy rate. Through analysis, it is concluded that when it jointly scores items, it has a greater similarity with active users, and when the number of neighboring users is small, it can recommend more products, which improves the accuracy of recommendation. Compared with Pearson's similarity, the highest accuracy rate of the algorithm in this paper is greatly improved, but it is not much improved compared with A-Pearson. After analysis, because the same algorithm process is used, if you want to improve the accuracy rate, you need to optimize the algorithm process. In the application part of the algorithm, the algorithm process will be optimized, and the recommendation process will be formulated based on the actual business.

Figure 9.

Accuracy of each algorithm on the data set

As can be seen from Figure 10, cluster 1, cluster 2, cluster 3, cluster 9, and cluster 10 account for a high proportion of people, totaling 78.3%. The users of cluster 1 mainly show leisure and entertainment, download, information acquisition, and timely communication in terms of network service time, as well as information acquisition, timely communication, and leisure and entertainment in terms of network service traffic. Cluster 2 users mainly use games and e-commerce from the aspect of network business time and mainly use games, information acquisition, and e-commerce from the aspect of network business traffic. Cluster 2 belongs to the collection of typical users who focus on online games and e-commerce. Cluster 3 users mainly show e-commerce, download, leisure, and entertainment in terms of network business time and download, information acquisition, and e-commerce in terms of network business traffic. Cluster 3 belongs to a typical collection of users mainly focusing on e-commerce business, while Cluster 9 mainly shows leisure and entertainment in terms of network business time, leisure and entertainment, and information acquisition in terms of network business traffic. Cluster 9 belongs to a typical leisure and entertainment group. Cluster 10 users mainly show timely communication, leisure and entertainment, and information acquisition from the aspect of using network service traffic. Cluster 14, cluster 15, and cluster 16 accounted for the lowest proportion of people, and their representativeness was not great, so the clustered results were of little significance.

Figure 10.

User preference classification

Conclusion

In the era of the Internet economy, the rapid development of e-commerce has brought a huge amount of user behavior data, which contains great commercial value. This study focuses on e-commerce user behavior data analysis technology and has achieved remarkable results in practical application:

In the aspect of user behavior pattern recognition, we use the Apriori algorithm to mine association rules of user behavior. The experimental results show that 15 high-frequency user behavior patterns are found, among which the support of the "browsing-purchasingpurchasing" pattern reaches 8%, and the confidence reaches 70%. This finding provides a strong basis for enterprises to optimize the shopping process and improve user experience.

In terms of user purchase prediction model construction, we adopt deep learning technology, especially the model combining convolutional neural network (CNN) and recurrent neural network (RNN). The experimental results show that the accuracy rate of this model in predicting users' purchase intention reaches 90%, which is 15 percentage points higher than the 75% accuracy rate of a traditional logistic regression model. This achievement helps enterprises to grasp the market dynamics more accurately and formulate targeted marketing strategies.

In the optimization of the personalized recommendation system, we apply the research results to the recommendation algorithm of the e-commerce platform. Through comparative experiments, we found that the recommendation system using this research method increased the user click-through rate from the original 5% to 8% and the conversion rate from 3% to 5%, which increased by 60% and 67%, respectively. This improvement significantly improves users' shopping satisfaction and the platform's sales.

This study has made the following achievements in e-commerce user behavior data analysis technology: First, it reveals the inherent law of user behavior patterns; Second, a high-accuracy user purchase prediction model is constructed; third, it optimizes the personalized recommendation system and improves the recommendation effect. These achievements not only provide strong data support for e-commerce enterprises but also provide new ideas and methods for research in related fields. In the future, we will continue to explore more efficient data processing technology and dynamically updated model construction methods with a view to contributing more to the sustainable development of e-commerce.