Open Access

ARIMA model analyzes the tendency and challenges of intelligent marketing in the era of digitalization

  
24 Sep 2025


Introduction

Big data, a collection of data that is varied in type, large in volume and complex in structure, carries high business and application value. Capturing, computing, analyzing and mining big data on a cloud platform yields the data that enterprises need to gain real-time insight into economic development and consumer demand, predict market trends, optimize related businesses, and improve service levels [1-5]. The arrival of the big data era has profoundly changed the internal and external environment of marketing: the traditional market-oriented marketing model is declining, and consumer-centric smart marketing has emerged. Applying big data in smart marketing has greatly reduced the data-processing workload of marketing personnel, freeing them to devote more energy to creative planning and making marketing activities more cross-border, agile and creative [6-10]. Facing an increasingly competitive market in which there are “too many monks and too little porridge”, enterprises that want to maintain market share must not only accelerate product research and development, innovation and upgrading, but also actively embrace big data as a core tool for improving customer experience, customer stickiness and operations [11-14]. The Autoregressive Integrated Moving Average (ARIMA) model is one of the most widely used time series forecasting methods in statistics. When fitting a time series, it accounts for both the dependence of the target variable on its own past and the disturbance caused by random fluctuations, and it has shown good accuracy and reliability in trend forecasting [15-19].
Therefore, fully exploring the value of data, comprehensively grasping the trends and challenges of smart marketing, and continuously optimizing smart marketing strategies with the ARIMA model to create differentiated competitiveness can build long-term value for corporate brands [20-23].

This paper first reviews the basis of the ARIMA sales forecasting model, which consists of two parts: linear stationary models and linear non-stationary models. Following the ARIMA modeling steps, an ARIMA sales forecasting model is then built from the sales data of a product: the time series is tested for stationarity and made stationary, the ACF and PACF of the stationary series are calculated to determine the ARMA model type, and the model is tested and used to forecast sales. Finally, the shortcomings of a single ARIMA model in product sales forecasting are analyzed, a combined forecasting model that incorporates the ARIMA model is established, and the forecasts of the single model and the combined model are compared with the actual sales values of the product.

Predictions for smart marketing in the age of big data
Big Data Technology Aids Marketing Transformation

Compared with many other concepts of the Internet era, big data analytics for intelligent marketing is not a new idea. With the spread of cloud computing technologies, application platforms and mobile terminal devices, together with the emergence of social media, big data analytics and intelligent marketing systems have become more systematic and complete. This has increased the technical weight of big data in marketing and made it an inevitable choice for many companies. The main direction for the future integration of the online marketing market is therefore data analysis based on big data technology. In recent years, the development of traditional media such as radio stations, newspapers and magazines has gradually slowed. Together with the promotion of the multi-network integration strategy, this has allowed big data technology to integrate the data involved in intelligent marketing and in the day-to-day marketing work of traditional enterprises more thoroughly, producing a new marketing pattern in which data is king.

Big data analytics can help companies outline the consumer “DNA” of their customers; a comprehensive understanding of customers is the key to doing business with them. By comprehensively collecting and analyzing social media data, mobile data, web data and other data types, companies can gain a personalized understanding of their target consumer groups and keep abreast of their consumption needs and goals. In addition, by analyzing the data models they already hold, companies can identify in advance what customers have not yet asked for, and sometimes even needs that customers themselves are not yet aware of [24-25].

Intelligent Marketing and Major Marketing Approaches

Intelligent marketing is the comprehensive use of mobile Internet communication, cloud, Internet of Things, big data and other advanced technologies to reintegrate the traditional marketing system into a new one that is better adapted to the current era and more responsive to people’s growing and changing consumer needs. Supported by these technologies, intelligent marketing improves marketing performance, making marketing more precise and its interactions closer to the psychological needs of consumers.

The main marketing methods of smart marketing include:

Precision marketing

Smart marketing tracks and analyzes the consumption behavior of online consumers and derives more effective information through scientific inference, so as to carry out precision marketing. At the same time, products are developed and designed more accurately, from product culture to the physical product, according to the measured needs and consumption habits of market users. Intelligent marketing is thus consumer-oriented and uses modern high-tech means to carry out precision marketing.

Programmed Marketing

Intelligent marketing turns the content of marketing into specific programmed operations. Built on Internet and mobile communication technology and relying on modern programming technology, it achieves better human-computer interaction, highlights the human-centered qualities of the product, and improves the match between the product and the user.

Cross-border marketing

Cross-border marketing methods that merge different fields are increasingly common in smart marketing. When considering their own needs, consumers pay more attention to a particular product or type of product.

Interactive and experiential social marketing

Intelligent marketing uses its technological advantages to carry out interactive, experiential social marketing. It implements a closed-loop marketing model of “technology + service + product + creativity”, in which interactive services and service scenarios are the key.

ARIMA model for sales marketing forecasting
ARIMA

1) Linear stationary models

It is often assumed that a random sequence is generated by a linear combination of random shocks, and such sequences are described by a general linear stochastic model. In practice, especially when the model is applied to specific economic problems, it is desirable to use a model with few parameters. This can be achieved by describing the linear process with a small number of autoregressive and moving average terms. We first discuss the properties of the autoregressive moving average (ARMA) model [26].

Definition 1: Let $X_t$ be a real stationary random sequence with $E X_t = 0$ that satisfies the stochastic difference equation
$$X_t - \varphi_1 X_{t-1} - \cdots - \varphi_p X_{t-p} = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q},$$
where the polynomials
$$\varphi(z) = 1 - \sum_{k=1}^{p} \varphi_k z^k, \qquad \theta(z) = 1 - \sum_{k=1}^{q} \theta_k z^k$$
are real-coefficient polynomials whose roots all lie outside the unit circle $|z| = 1$, and $\varepsilon_t$ is a standard white noise sequence. Then $X_t$, $t = 0, \pm 1, \ldots$, is called an ARMA$(p,q)$ sequence (or ARMA model), i.e., an autoregressive moving average model.

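Definition 2: If, in Definition 1, $\theta_1 = \theta_2 = \cdots = \theta_q = 0$, so that the stationary random sequence $X_t$ satisfies
$$X_t - \varphi_1 X_{t-1} - \cdots - \varphi_p X_{t-p} = \varepsilon_t,$$
then $X_t$ is called an AR$(p)$ model, i.e., an autoregressive model.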

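Definition 3: If, in Definition 1, $\varphi_1 = \varphi_2 = \cdots = \varphi_p = 0$, so that the stationary random sequence $X_t$ satisfies
$$X_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q},$$
then $X_t$ is called an MA$(q)$ model, i.e., a moving average model. Clearly, both the AR$(p)$ and MA$(q)$ models are special cases of the ARMA$(p,q)$ model.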
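As an illustration of Definition 1, the following minimal Python sketch simulates an ARMA(p, q) sequence driven by standard Gaussian white noise. The function name `simulate_arma`, the burn-in length and the example coefficients are assumptions chosen for illustration; they do not come from the paper.

```python
import random


def simulate_arma(phi, theta, n, seed=0, burn_in=200):
    """Simulate X_t - sum_i phi_i X_{t-i} = eps_t - sum_j theta_j eps_{t-j}.

    phi and theta are lists of AR and MA coefficients (hypothetical inputs).
    The first `burn_in` values are discarded so that the zero start-up
    values have little influence on the returned sample.
    """
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    x, eps = [], []
    for t in range(n + burn_in):
        e = rng.gauss(0.0, 1.0)  # standard white noise shock
        eps.append(e)
        # Autoregressive part: uses only values that already exist.
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        # Moving average part: past shocks, entering with a minus sign.
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x.append(ar + e - ma)
    return x[burn_in:]


# Example ARMA(1, 1) with illustrative coefficients.
series = simulate_arma(phi=[0.6], theta=[0.3], n=500)
```

With `phi=[0.6]` and `theta=[0.3]`, the roots of both polynomials (1/0.6 and 1/0.3) lie outside the unit circle, so the simulated sequence satisfies the condition in Definition 1. Setting `theta=[]` gives an AR(p) sequence and `phi=[]` gives an MA(q) sequence.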

For the ARMA$(p,q)$ model, a fairly complete theory is available for processing data.

The first step is model identification, i.e., checking whether $X_t$ meets the characteristics and requirements of an autoregressive moving average model. If it does, a computer program can determine the model form (i.e., the orders $p$ and $q$), estimate the parameters $\varphi_i$, $i = 1, \ldots, p$, and $\theta_j$, $j = 1, \ldots, q$, and make predictions under given criteria (e.g., whether the estimator is linear, or whether the mean squared error is minimized). However, the ARMA$(p,q)$ model places strong demands on the time series: $X_t$ must be stationary, i.e., its mean and variance are constant and its covariance is a function of the time lag only. In many practical problems, especially for economic variables, the series does not satisfy the stationarity requirement but shows an obvious trend or seasonality. Such series cannot be modeled directly with the ARMA$(p,q)$ model and must be described by a more general model:
$$X_t = \mu_t + Y_t$$

where $\mu_t$ denotes the time-varying mean and $Y_t$ denotes a zero-mean stationary process that can be fitted by an ARMA model. Such a time series is handled by first eliminating $\mu_t$ by some method, then fitting $Y_t$ with an ARMA model, and finally recovering the prediction of $X_t$ from the prediction of $Y_t$ by inverting the transformation. This is known as the ARIMA method.

2) Linear non-stationary model

The time series formed by the sales of a product does not have a fixed mean as it evolves. Nevertheless, apart from local differences in level, the series shows a certain homogeneity, i.e., one part of the series closely resembles any other part. It is assumed that the process can be made stationary by appropriate differencing, i.e., after $d$-th order differencing the series becomes a stationary mixed autoregressive moving average process; the original process is then called an autoregressive integrated moving average (ARIMA) process.

Let:
$$\nabla X_t = X_t - X_{t-1}, \qquad \nabla^d X_t = \nabla^{d-1} X_t - \nabla^{d-1} X_{t-1}$$

These are called the first-order difference, the second-order difference, ..., and the $d$-th order difference, respectively. The following definition is then available.

Definition 4: If the time series $X_t$ is a stationary time series after $d$-th order differencing, then $X_t$ is called an ARIMA$(p,d,q)$ series.

For an ARIMA$(p,d,q)$ sequence $X_t$, take the $d$-th order difference $Z_t = \nabla^d X_t$ and use the ARMA$(p,q)$ model to fit and predict $Z_t$; the fitting and prediction results for $X_t$ are then obtained by the corresponding inverse transformation.
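The differencing step and its inversion can be sketched in Python as follows. The helper names `difference` and `undifference` are hypothetical, the numbers are made up, and the inverse assumes that the first value of each intermediate series was stored before differencing.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the d-th order difference)."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def undifference(diffed, initial_values):
    """Invert d-th order differencing by repeated cumulative summation.

    initial_values[k] must be the first value of the k-th order differenced
    series (k = 0 is the original series), for k = 0, ..., d - 1.
    """
    out = list(diffed)
    for start in reversed(initial_values):
        restored = [start]
        for value in out:
            restored.append(restored[-1] + value)
        out = restored
    return out


# Illustrative numbers only, not the paper's sales data.
sales = [3, 5, 9, 15, 23]
z = difference(sales, d=2)
restored = undifference(z, [sales[0], difference(sales, 1)[0]])
```

In a forecasting setting, the same cumulative summation would be applied to the predicted values of $Z_t$, starting from the last observed values, to obtain predictions of $X_t$.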

ARIMA modeling process

An ARIMA model is constructed in the following order. First, test the stationarity of the original data. If the data are not stationary, difference them and test the stationarity of the differenced series again, repeating the differencing until the series is stationary. Once the series is stationary, test it for white noise. If the series is white noise, the analysis ends, because the data are not suitable for ARIMA modeling. If the series is not white noise, fit the time series to build the ARIMA (differenced autoregressive moving average) model.

The ARIMA model construction process is shown in Figure 1.

Figure 1. The ARIMA model construction process

Modeling Process and Sales Forecasting

The sales data of a product are analyzed below to determine whether they are suitable for ARIMA modeling. The data were obtained from the commercial database of the product and cover its national sales from January 2019 to December 2023.

1) Time series stationarity test

In this paper, data plots combined with the unit root test are used to test the stationarity of the time series.

Data plot method: SAS software is used to draw a time series plot of the data, i.e., a line graph of the series in the $(t, X_t)$ plane. Stationarity is judged by observing the trend, periodicity and fluctuation amplitude of the plot. This method is not very accurate, because the judgment depends mainly on the subjective impression of the observer.

Unit root test (DF test): In MyEclipse, a unit root calculation function, “DickeyFullerTEST”, is written according to the unit root formula, so that the DF unit root test only requires running this program. If the sequence is non-stationary, then $\beta = 1$; if it is stationary, then $|\beta| < 1$. Assuming that the sequence is non-stationary, i.e., $\beta = 1$, the DF value is calculated from formulas (9) and (10):
$$s(\hat\beta) = \sqrt{\frac{\frac{1}{T-1}\sum_{t=2}^{T} \hat u_t^2}{\sum_{t=2}^{T} y_{t-1}^2}}$$
$$DF(X) = \frac{\hat\beta - 1}{s(\hat\beta)}$$

The program returns the unit root test value of the original data, from which the stationarity of the data can be judged: when the value does not exist or is particularly small, the original data are judged to be a stationary sequence; otherwise they are non-stationary. The time series plot of the original data is shown in Figure 2.

Figure 2. Time series plot of the raw data

Running the program “DickeyFullerTEST” in MyEclipse gives the unit root value of the original data as DF(x) = 0.565893372. Together with the time series plot of the original data, this shows that the experimental data form a non-stationary sequence.
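The paper's “DickeyFullerTEST” program is not reproduced in the text. The Python sketch below is only an assumption about how formulas (9) and (10) could be evaluated: it estimates $\beta$ by least squares in the regression $y_t = \beta y_{t-1} + u_t$ without intercept or trend, a choice the paper does not state. The function name `dickey_fuller_stat` and the sample data are hypothetical.

```python
import math


def dickey_fuller_stat(y):
    """Return (beta_hat, DF) for the regression y_t = beta * y_{t-1} + u_t.

    Assumption: no intercept and no trend term; this is not the paper's program.
    """
    T = len(y)
    lagged, current = y[:-1], y[1:]
    sum_lag_sq = sum(v * v for v in lagged)
    # Least-squares estimate of beta.
    beta_hat = sum(a * b for a, b in zip(lagged, current)) / sum_lag_sq
    # Residuals u_hat_t for t = 2, ..., T.
    residuals = [b - beta_hat * a for a, b in zip(lagged, current)]
    s2 = sum(r * r for r in residuals) / (T - 1)
    std_err = math.sqrt(s2 / sum_lag_sq)           # formula (9)
    return beta_hat, (beta_hat - 1.0) / std_err    # formula (10)


# Made-up series for illustration only.
beta_hat, df_value = dickey_fuller_stat([1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
```

The function would be called once on the raw series and again on each differenced series, mirroring the repeated runs described in the text.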

2) Making the time series stationary

Because the original series is non-stationary, it is made stationary by differencing. The formulas are:
$$\Delta x_t = x_t - x_{t-1}$$
$$\Delta^2 x_t = \Delta x_t - \Delta x_{t-1}$$
$$\Delta^d x_t = \Delta^{d-1} x_t - \Delta^{d-1} x_{t-1}$$

where $t$ is the time point. If the time series fluctuates periodically, a seasonal difference should also be applied to remove the periodicity. It is calculated as
$$\Delta_s X_t = X_t - X_{t-s}$$
where $s$ is the cycle length.
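A first-order difference followed by a seasonal difference can be sketched in Python as below. The synthetic monthly series is an assumption made up for illustration (a linear trend plus a fixed 12-month pattern) and is not the paper's sales data; the function names are hypothetical.

```python
def first_difference(series):
    """x_t - x_{t-1}: removes a linear trend."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def seasonal_difference(series, s=12):
    """x_t - x_{t-s}: removes a pattern that repeats every s periods."""
    return [series[t] - series[t - s] for t in range(s, len(series))]


# Synthetic 60-month series: linear trend plus a repeating 12-month pattern.
pattern = [5, 3, 0, -2, -4, -1, 2, 6, 4, 1, -3, -5]
monthly = [2 * t + pattern[t % 12] for t in range(60)]

step1 = first_difference(monthly)          # trend removed, pattern remains
step2 = seasonal_difference(step1, s=12)   # periodic pattern removed
```

For this noise-free example both the trend and the seasonal pattern cancel exactly, so `step2` contains only zeros; with real data the result would be the stationary remainder to which an ARMA model is fitted. Each differencing step shortens the series, here from 60 to 47 values.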

The time series plot after first-order differencing, $\Delta x_t = x_t - x_{t-1}$, is shown in Figure 3. The plot shows that the data are still not stationary after first-order differencing. Running the program “DickeyFullerTEST” in MyEclipse again gives the unit root value of the first-order differenced data: DF(x) = 0.115296404. Combining the time series plot of the first-order differenced data with the value of DF(x), it is concluded that the first-order differenced data are still a non-stationary series.

Figure 3. Time series plot after first-order differencing

The first-order differenced series therefore needs further differencing. The time series plot shows that the fluctuation of the data after first-order differencing is periodic, with significant fluctuations at every lag of 12 time points, so seasonal differencing is applied to the data:
$$\Delta_{12} X_t = X_t - X_{t-12}$$

After seasonal differencing, p(x) = 0.0000073.

The time series plot after seasonal differencing is shown in Figure 4. It can be seen that the original data become a stationary series after first-order differencing and seasonal differencing, and the order of differencing can be determined as $d = 5$.

Figure 4. Time series plot after seasonal differencing

3) Model identification

The autocorrelation and partial autocorrelation coefficients, or the AIC and SBC criteria, are computed to determine a suitable model for the stationary time series. If the autocorrelation function (ACF) of the series tails off and the partial autocorrelation function (PACF) cuts off, the series is suitable for an AR model. If the ACF cuts off and the PACF tails off, the series is suitable for an MA model. If both the ACF and the PACF tail off, the series is suitable for an ARMA model.

The ACF and PACF plots of the stationary time series obtained above are shown in Figure 5. Both the ACF and the PACF of the stationary series tail off, indicating that the series is suitable for an ARMA model.
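The sample ACF and PACF used for identification can be computed as in the following Python sketch. The paper does not say how its values were calculated; the Durbin-Levinson recursion used here for the PACF is one standard choice and is an assumption, as are the function names and the sample data.

```python
def acf(x, max_lag):
    """Sample autocorrelations rho_0, ..., rho_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = acf(x, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk.
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


# Made-up data for illustration only.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
r = acf(sample, 3)
p = pacf(sample, 3)
```

Plotting `r` and `p` against the lag gives charts of the kind shown in Figure 5, from which the tailing-off or cutting-off behavior is read.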

Figure 5. ACF and PACF plots of the differenced series

4) Model parameter estimation

After analyzing the ACF and PACF plots and iterative screening using the AIC criterion, p=5 and q=7 were determined.

5) Model testing

The validity of the model is tested mainly by checking whether its residuals are white noise. Given a stationary sequence $y_t$, if for any $s, t \in N$:
$$E(y_t) = \mu, \qquad \mathrm{Cov}(y_t, y_s) = \begin{cases} \sigma^2, & t = s \\ 0, & t \neq s \end{cases}$$

then $y_t$ is a white noise sequence and the model is valid; otherwise, go back and re-select the model for the sequence $y_1, y_2, \ldots, y_n$.
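The paper does not say which statistic its white noise test uses. One common choice, used here purely as an assumption, is the Ljung-Box statistic $Q = n(n+2)\sum_{k=1}^{h} \hat\rho_k^2 / (n-k)$, where $\hat\rho_k$ is the lag-$k$ sample autocorrelation of the residuals. A minimal Python sketch, with a hypothetical function name and made-up residuals:

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag.

    A large Q (relative to a chi-square quantile) indicates that the
    residuals are autocorrelated, i.e. not white noise.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


# Strongly autocorrelated made-up residuals: alternating +1 / -1.
q_stat = ljung_box_q([1.0, -1.0] * 10, max_lag=1)
```

For the alternating example the statistic is far above the 5% chi-square critical value for one degree of freedom (about 3.84), so such residuals would not be accepted as white noise and the model would have to be re-selected.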

6) Sales Forecasting

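The ARMA model for the stationary time series can be expressed as:
$$y_t = \varphi_1 y_{t-1} + \cdots + \varphi_p y_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$
where $\varepsilon_t$ is white noise and satisfies $E(\varepsilon_t \mid y_{t-1}, y_{t-2}, \ldots) = 0$ for all $t$. Here the moving average coefficients are written with positive signs, which differs from Definition 1 only in the sign convention for $\theta_j$.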
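Because the conditional expectation of $\varepsilon_t$ is zero, the one-step forecast of $y_t$ is obtained by dropping $\varepsilon_t$ from the model equation and replacing past shocks with past forecast errors. The Python sketch below illustrates this recursion under the assumption that shocks before the start of the sample are zero; the function name and coefficients are hypothetical and are not the paper's fitted values.

```python
def arma_forecasts(y, phi, theta):
    """One-step forecasts for y_t = sum_i phi_i y_{t-i} + sum_j theta_j eps_{t-j} + eps_t.

    Returns (in_sample_forecasts, next_forecast). Shocks before the first
    observation are treated as zero (an assumption of this sketch).
    """
    eps = []     # estimated shocks: observed value minus forecast
    preds = []
    for t in range(len(y) + 1):
        ar = sum(phi[i] * y[t - 1 - i]
                 for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j]
                 for j in range(len(theta)) if t - 1 - j >= 0)
        preds.append(ar + ma)
        if t < len(y):
            eps.append(y[t] - preds[t])
    return preds[:-1], preds[-1]


# Illustrative AR(1) example with phi_1 = 0.5.
in_sample, next_value = arma_forecasts([2.0, 1.0, 0.5], phi=[0.5], theta=[])
```

Since the model is fitted to the differenced series, forecasts produced this way would still need to be transformed back through the inverse of the differencing steps to give sales values.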

This model is used to forecast the sales of the key brand from January to November 2023, and the forecasts are compared with the actual sales values for the same period of 2023, measured in cases. The comparison between the ARIMA-based sales forecasts and the actual values is shown in Figure 6.

Figure 6. Comparison of sales forecast results based on the ARIMA model

The ARIMA-based sales forecasting model captures the seasonality and periodicity of the product's monthly sales well. However, the relative error of the predicted sales in January, February, July and November 2023 still exceeds 10%, and the extreme values in the time series are not predicted well enough, so the forecast can be improved further on this basis.

ARIMA-BP combined prediction model construction and validation
Limitations of the model in sales forecasting

The traditional time series model assumes a linear relationship, so it is less effective for predicting nonlinear time series. The artificial neural network model is a nonlinear modeling approach with strong learning and data processing capabilities that can capture the nonlinear features in the data. ARIMA and BP neural network models were each established to predict the sales volume of a product. Comparing predicted and true values, when the time series contains extreme values, the prediction error of the ARIMA model is larger than that of the BP neural network model. Since the sales time series is characterized by a dual trend, with both linear and nonlinear components, forecasting with only a single model gives less than satisfactory results.

In this section, the ARIMA model is used to fit the linear part of the time series, and the BP neural network model is then used to estimate the nonlinear residual part; the two are superimposed to give the sales forecast. Compared with a single model, this fusion model makes full use of the respective advantages of each single model, significantly improves forecasting performance, and reduces the risk of relying on one model.

Combinatorial prediction model construction

Each single forecasting model has its own assumptions and range of applicability. For example, the time series model requires the series to be stationary and assumes that environmental conditions remain similar; it fits linear trends well and predicts future trends from the relationship between historical data and time. The BP neural network model is better at fitting nonlinear relationships, fitting the relationship between multiple variables through its input, hidden and output layers. Combining several single prediction models can take the different characteristics of the actual data into account and reflect objective changes more accurately. It also considers the relationship between the prediction target and other influencing factors from different perspectives, which helps enterprises establish a more accurate prediction model.

The ARIMA-BP combination model is given by formula (16):
$$f(x) = w_1 G_1(x) + w_2 G_2(x)$$
where $w_1$ and $w_2$ are the weights of the single models, $G_1(x)$ is the value predicted by the time series model at time $x$, and $G_2(x)$ is the value predicted by the neural network model at time $x$.

The ARIMA time series model and the BP neural network model differ greatly in their algorithmic principles and implementations. The former is a classic statistical algorithm that models the time series entirely with time as the independent variable. The latter is a machine learning method whose core advantage is a strong ability to fit nonlinear data, and for which time is a non-essential factor. Therefore, when the two are combined to forecast the same problem, the time interval and granularity of the two single models must be consistent.

Consequently, the time factor needs to be taken into account when constructing the neural network in ARIMA-BP combination modeling. In essence, the combined forecast uses the time series model to fit the behavior of the forecast object along the time dimension, including seasonality, trend and other linear features, while the BP neural network fits the nonlinear relationships between other factors and the forecast value that the time series model cannot take into account. The input variables of the BP neural network model should therefore be data sets of the various factor variables at different times. All input and output layer data selected in this paper are values over a period of time, which reflect how these variables change over a continuous time period.

The second key issue in constructing the combined prediction model is the determination of the model weights. The determination of the weights is closely related to the predictive effect of the final combined model.

1) Equally weighted averaging method: the weights of the $n$ single models are equal, i.e., the predictive effect of each single model is assumed to be the same. The calculation is shown in Equation (17):
$$w_i = \frac{1}{n}, \quad i = 1, 2, 3, \ldots, n$$
where $\sum_{i=1}^{n} w_i = 1$ and $w_i \geq 0$, $i = 1, 2, 3, \ldots, n$.

2) Error variance weighted average method: this method first finds the variance $e$ of the prediction error of each single model and ranks the models in descending order of $e$, so that model $i$ has the $i$-th largest error variance. Following the principle that the weight of a single model is inversely related to its error variance, the larger the error variance $e$ of a single model, the smaller its weight in the combined prediction model. The calculation is shown in Equation (18):
$$w_i = \frac{i}{\sum_{i=1}^{n} i} = \frac{2i}{n(n+1)}, \quad i = 1, 2, 3, \ldots, n$$
where $\sum_{i=1}^{n} w_i = 1$ and $w_i \geq 0$, $i = 1, 2, 3, \ldots, n$.
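The two weighting schemes and the weighted combination of formula (16) can be sketched in Python as follows. The function names and the numbers are hypothetical, and the rank-based scheme assumes that models are ranked so that the largest error variance receives rank 1 and hence the smallest weight.

```python
def equal_weights(n):
    """Equation (17): every model receives weight 1/n."""
    return [1.0 / n] * n


def rank_weights(error_variances):
    """Equation (18): w = 2 * rank / (n * (n + 1)).

    Assumption: rank 1 goes to the model with the LARGEST error variance,
    so a larger variance always means a smaller weight.
    """
    n = len(error_variances)
    order = sorted(range(n), key=lambda k: error_variances[k], reverse=True)
    weights = [0.0] * n
    for rank, k in enumerate(order, start=1):
        weights[k] = 2.0 * rank / (n * (n + 1))
    return weights


def combine(predictions_by_model, weights):
    """Weighted sum of the single-model forecasts, period by period."""
    return [sum(w * p for w, p in zip(weights, column))
            for column in zip(*predictions_by_model)]


# Made-up example: model 0 has the larger error variance.
w = rank_weights([4.0, 1.0])                        # weights 1/3 and 2/3
combined = combine([[10.0, 20.0], [16.0, 26.0]], w)
```

With two models the rank-based weights are always 1/3 and 2/3, whatever the size of the variance gap, because Equation (18) depends only on the ranking.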

In this experiment, the sum of squared errors is used mainly to measure the goodness of fit of the time series model and the nonlinear fit of the BP neural network model. The formula is given in (19):
$$E = \sum_{i=1}^{n} (\hat y_i - y_i)^2$$

where $E$ is the sum of squared errors, $i$ indexes the $i$-th observation, $\hat y_i$ is the model's predicted value for the $i$-th observation, $y_i$ is the corresponding actual value, and $n$ is the number of observations.

The relative error is the ratio of the absolute error to the actual value, expressed as a percentage. The mean relative error is the average of the relative errors over all predicted values and reflects the model's predictive performance better than the absolute error. This indicator is not affected by the amount of input data or the magnitude of the values, so it better reflects the degree of model error. The formula is shown in (20):
$$E = \frac{1}{n} \sum_{i=1}^{n} \frac{|\hat y_i - y_i|}{y_i}$$

where $E$ is the mean relative error, $i$ indexes the $i$-th observation, $\hat y_i$ is the model's predicted value for the $i$-th observation, $y_i$ is the corresponding actual value, and $n$ is the number of observations.
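Both error measures are straightforward to compute. The Python sketch below uses hypothetical function names and made-up numbers, and assumes that all actual values are positive, as is the case for sales volumes.

```python
def sum_squared_errors(predicted, actual):
    """Formula (19): sum of squared differences between forecast and actual."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def mean_relative_error(predicted, actual):
    """Formula (20): average of |forecast - actual| / actual.

    Assumes every actual value is positive.
    """
    n = len(actual)
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / n


# Made-up forecasts and actual values for illustration.
sse = sum_squared_errors([110.0, 90.0], [100.0, 100.0])
mre = mean_relative_error([110.0, 90.0], [100.0, 100.0])
```

Here each forecast misses by 10 units on an actual value of 100, so the sum of squared errors is 200 and the mean relative error is 0.1, i.e., 10%.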

Combinatorial predictive modeling process

Data from January 2019 to December 2023 are input to each model, and 6 periods of demand forecasts (January 2023 to June 2023) are output to form the combined forecast test set.

The model outputs the optimal weights based on the forecast accuracy on this test set. The optimal weights of the combination prediction model are shown in Table 1.

Table 1. The optimal weights of the combination prediction model

| Set | Forecast periods | Accuracy metric | Weighted moving average | Single exponential smoothing | Double exponential smoothing | Triple exponential smoothing | ARIMA | Combination prediction |
|---|---|---|---|---|---|---|---|---|
| Test set | 6 | MAE | 256 | 397 | 251 | 204 | 224 | 298 |
| Test set | 6 | MSE | 45767 | 178937 | 79004 | 75232 | 178236 | 122365 |
| Test set | 6 | MAPE | 32% | 52% | 37.5% | 35.1% | 37.7% | 21.4% |
| Weight | | | 12.32% | 20.41% | 22.07% | 0.0% | 45.2% | 100.0% |

The combination prediction achieves MAPE = 21.4%, compared with MAPE = 32% for the weighted moving average and MAPE = 37.7% for ARIMA. By MAPE, the combination prediction is the best of the forecasting methods above, which suggests that combination prediction can improve the accuracy of product demand forecasting.

Measured effects of combinatorial predictions

Because different models are built on different principles, they may capture and learn different information, much as different people view the same thing from different perspectives. Synthesizing all views gives a comprehensive picture of the problem; likewise, combining models lets them complement each other’s strengths and weaknesses. Combining the results of several models in an appropriate way should improve the final prediction accuracy and yield a better integrated model. Even a model with poorer predictions is worth using, but only if it contains independent information. Combining models with poorer predictions and models with relatively small errors may improve the overall prediction accuracy.

For the product sales prediction problem studied in this paper, a nonlinear model and a linear model are combined to improve the final prediction. Specifically, the ARIMA model, which is based on the principle of inertia, is combined with the BP model to form a combined prediction scheme.

Substituting the predicted values of the ARIMA and BP models obtained earlier, together with the other required parameters, into the above formula gives the combined model predictions shown in Figure 7.

Figure 7. Weekly sales forecast of the combined model

The figure shows, for January to November 2023, the actual sales value of the product, the value predicted by the combined ARIMA-BP model, the absolute error between the combined prediction and the actual value, and the relative error as a percentage. The relative errors of the combined model for January to November 2023 are all less than 8%, so the prediction results have reference value.

The prediction deviations of the ARIMA model, the BP model and the combined ARIMA-BP model were summarized and compared. The error comparison of the three models is shown in Figure 8. The errors of the individual models are all greater than those of the combined model: the overall average error is 9.045% for the ARIMA model, 7.233% for the BP model and 2.948% for the combined model, so the combined model gives better results.

Figure 8. Error comparison of the three models

Combination forecasting means combining various forecasting methods, using the information provided by each, and obtaining a combined forecasting model through an appropriate weighted average. In this paper, the ARIMA sales forecasting model is optimized by combining it with the BP model to construct the ARIMA-BP combination forecasting model. The ARIMA-BP model can be used to analyze the consumption behavior of online consumers and, through scientific inference, to obtain more effective information for precision marketing. By guiding consumers toward rational consumption and communicating with them interactively, smart marketing enhances the consumption experience, creates a social effect of consumption, discovers more potential consumers, and extends the marketing chain. The closed-loop marketing system created by smart marketing strengthens the accumulation and reuse of data and builds lasting value for the product or brand.

Conclusion

In this paper, the ARIMA model is used to forecast product sales, and its forecasting results are obtained after fitting and validating the model. To reduce the relative error of the sales forecast and meet the demand for precision marketing in the era of big data, the ARIMA model is combined with the BP model to form the ARIMA-BP combined prediction model.

The analysis of the combined prediction modeling process shows that the MAPE of the ARIMA-BP model is 21.4%, which is better than that of the single models and demonstrates the forecasting advantage of the combined model. The ARIMA-BP model is also used to forecast the sales volume of the product from January to December 2023, and its relative errors are compared with those of the ARIMA model. Comparison of the relative errors of the ARIMA model, the BP model and the combined model shows that the forecasts of the combined model are closer to the actual sales values and are more informative. Adding the BP model overcomes the limitations of a single ARIMA model in sales forecasting and meets the need for constant updating and conversion in the data era.

Language:
English
Publication frequency:
1 time per year
Journal subjects:
Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other