Application and Accuracy Improvement of Big Data Analytics in Market Demand Forecasting in Tourism Economy 
Published online: 21 March 2025
Submitted: 09 Nov. 2024
Accepted: 14 Feb. 2025
DOI: https://doi.org/10.2478/amns-2025-0586
© 2025 Yonghe Yang, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Tourism is a large and growing industry that plays an important role in driving economic development. In the post-epidemic era, tourism has reached a new high.
The tourism economy contributes significantly to economic growth. First, it provides employment opportunities [1-4]. Tourism is a labor-intensive industry that creates a large number of jobs: both those directly engaged in tourism services and employees in closely related fields such as hotels, restaurants, and transportation can obtain stable employment from it. Second, it enhances the added value of the economy [5-7]. Tourism drives the development of many related industries, such as transportation, catering, and retail, and the growth of these industries increases the total output and wealth creation of the national economy. Third, it increases foreign exchange earnings [8-10]. Tourism usually attracts a large number of foreign tourists, whose spending brings in foreign exchange income and increases the country's foreign reserves. Fourth, it promotes regional economic balance [11-13]. Tourism development attracts visitors to specific regions, promotes regional economic development, and improves the economic pattern between regions. Finally, it promotes the protection of cultural heritage [14-16]. Tourism development requires that rich cultural heritage be protected and passed on; through the promotion of tourism, this heritage can be better protected and inherited, further enriching the diversity of human civilization.
The tourism market is characterized by plurality of supply, diversity of demand, and uncertainty of market demand [17-20]. Supply in the tourism market spans transportation, accommodation, catering, scenic spots, and other fields, forming a complex industrial chain, and collaboration and competition among suppliers promote the development of the market. Travelers have different needs: some pursue leisure and relaxation, some like exploration and adventure, and some are fond of history and culture. This diversity prompts enterprises to provide a variety of tourism products and services. Because tourism demand is affected by many factors, such as policy, weather, and holidays, market demand is uncertain, and tourism enterprises need to respond flexibly to this uncertainty in order to improve their competitiveness.
Both the dynamic, time-sensitive tourism industry and the hospitality industry that accommodates tourists need market intelligence for their promotion; predicting customer movements is the main direction of such work and can improve competitive advantage [21]. Literature [22] notes that, among Internet data sources, search engine data are the most widely used in tourism forecasting, because search queries reveal what information users want at a given time. In 2017, literature [23] proposed a big-data-based tourism forecasting framework in the hope of changing how the tourism business operates. Literature [24] predicted the daily passenger flow of an attraction with a long short-term memory (LSTM) network that combined multiple data sources, such as historical visitor numbers, search engine platforms, and weather; the method provides a reference for the attraction's reception management and contingency planning, as well as for tourist safety and itinerary arrangement. Literature [25] showed that big data can effectively predict the demand for cruise tourism in China, reducing the financial risk for ports, investors, promoters, and cruise operators. Literature [26] used big data to obtain timely and granular data on the decline in tourist air travel during the epidemic, which provided ideas for the marketing strategies of local tourism airlines. Literature [27] used big data to predict visitor numbers at tourist sites, examined the mutual predictive ability between web search engine results and forecast results, predicted the final visitor numbers more accurately, and improved the management and marketing strategies of tourist sites.
Literature [28] predicted the number of tourists using data from three platforms, Baidu, Ctrip, and GoWhere.com, in both single-source and multi-source analyses; the results showed that multi-source analysis outperformed single-source analysis. Both studies provide reference directions regarding the accuracy of big data in predicting market demand in the tourism economy.
In this paper, we first summarize four analysis steps for market demand forecasting based on tourism consumption behavior and construct a forecasting framework for tourism market demand. For the time series of tourist numbers, dynamic time warping is used to measure time series similarity, and the absolute level distance and the dynamic change distance are combined to measure the similarity between the numbers of travelers from each country. To address the shortcomings of ARIMA in extracting relevant information from seasonal time series data, a seasonal product model for tourism demand forecasting is constructed, and the inbound tourism market of Xiamen City, Fujian Province, is modeled and forecast through difference operations and non-stationary time series analysis. The comprehensive distance method is selected to cluster the tourism data, and the characteristics of each source market are analyzed. The seasonal fluctuation of the tourism data is reduced by taking logarithms and seasonal differences, the forecasting model is determined by the adjusted R-squared value, AIC, and MAPE, and the model is used to forecast tourist arrivals in Xiamen City.
To study the relationship between tourism big data and tourism market demand, we can start from the research framework of tourism consumption behavior.
Tourism consumption behavior refers to the behaviors and activities of people consuming tourism products or services (mainly the six links of food, accommodation, transportation, sightseeing, shopping, and entertainment). The research framework of tourism consumption behavior can be derived by analogy with the research framework of general consumer behavior (i.e., the personal consumption behavior of people purchasing means of living). In general, the study of consumer behavior identifies the factors that influence consumers' decisions before and during the purchase of goods or services. It is generally believed that consumer behavior is formed by a combination of successive and interacting activities, covering the whole process from the confirmation of demand to the post-purchase evaluation.
The first part is information collection: consumers gather information about a product through recommendations from friends and relatives, mass media publicity, personal experience, and other channels. Rational decision-making cannot be separated from information, and this is even more true of tourism purchasing decisions. Tourism spending is generally large, so consumers want a comprehensive understanding of tourism goods or services and collect tourism information from a variety of channels. Today, with the Internet and mobile Internet highly developed and the amount of information growing explosively, the difficulty has shifted from collecting information to finding effective information. The unavailability of tourism goods and services is also a major factor. In addition, the characteristics of tourism goods and services, such as non-storability, non-transferability, inseparability of production and consumption, and high demand elasticity, lead to high risk in tourism consumption; consumers therefore also prefer to reduce this risk through comprehensive information collection.
After comprehensive information collection, the second part is the assessment of choices: consumers analyze and weigh the information obtained and make preliminary choices. Their evaluation is based on the information collected in the previous part and mainly involves comparing tourism goods or services and making a personal value judgment. This judgment varies between consumers because many factors influence it, including price, quality, time, and location. Once the evaluation is complete and the most satisfactory tourism goods or services have been selected, the purchase decision follows naturally. It is the final expression of the consumer's intention to purchase, for example joining a tour group, purchasing an air ticket or a boat ticket, or completing a hotel room reservation.
The third part, at the end of the trip, is the evaluation of the post-purchase consumption effect, including post-purchase satisfaction and the attitude towards repurchasing. A good experience of tourism goods or services leads consumers to give positive evaluations and word-of-mouth publicity, while a poor experience leads to negative evaluations. With the Internet and mobile Internet, people share their experiences more conveniently, which amplifies this publicity and evaluation effect.
The fourth part is the confirmation of demand, which means that consumers develop a demand because of their own feelings or external stimuli. Consumers' demand for tourism is not created out of thin air but arises from short-term or long-term causes or stimuli. Short-term stimuli include recommendations from friends and relatives, travel photos shared in one's social media circle of friends, and tourism information published in newspaper advertisements; long-term causes include planning for a future period of relaxation. Once an intrinsic or extrinsic trigger for travel demand exists, the consumer's travel purchase decision takes shape.
To sum up, from the perspective of consumer behavior and people's general habit of using the Internet nowadays, all five parts of tourism consumption behavior leave traces on the Internet and form Internet search keywords. Therefore, this paper constructs an analytical framework linking the keyword web search index in tourism big data with tourism market demand, to elucidate the correlation between the two and lay the foundation for the next stage of tourism market demand forecasting. The analytical framework is shown in Figure 1.

Figure 1. Detailed framework of consumer travel consumption behavior
Cluster analysis is an unsupervised classification process that can be applied without a priori knowledge, and it plays an extremely important role in data analysis, pattern recognition, outlier detection, and refinement services. With a suitable clustering method, data can be analyzed at a deeper level to find the patterns hidden within them.
The K-means algorithm is a classical clustering algorithm that maximizes the similarity between points within each cluster and minimizes the similarity between clusters. Its steps are as follows. First, the number of clusters K is set by the author or user of the algorithm, and K initial centroid positions are generated at random. The distance from each point to each centroid is then calculated with an appropriate distance measure, and each point is assigned to its nearest centroid according to the distance minimization rule, so that every point belongs to a cluster. Because the initial K centroid positions may not be optimal, the cluster centers must be updated repeatedly: the mean of the points in each cluster is computed, and the position of that mean becomes the new centroid of the cluster. Assignment and updating are repeated until the centroids no longer change or the change falls below the optimization criterion. The algorithm then ends, and the final clustering result is the K clusters formed around the K stable centroids [29].
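As an illustration only, the iteration described above can be sketched in Python. The function name `k_means`, the optional `init` argument, the squared Euclidean distance, and the toy points are assumptions made for this sketch and do not come from the original study.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, init=None, max_iter=100, seed=0):
    """Assign each point to its nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    # Initial centroids: supplied by the caller, or drawn at random from the data.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable, so centroids are too
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean_point(members)
    return labels, centroids


# Toy data (hypothetical): two visually separated groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = k_means(points, k=2)
```

The stopping rule here checks that assignments no longer change, which is equivalent to the centroids no longer moving; a tolerance on centroid movement would be an equally reasonable choice.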
For the K-means clustering algorithm, the distance metric can take various forms, such as the Euclidean distance, the Manhattan distance, and the Minkowski distance. The metric used in practice should be selected according to the characteristics of the application. Since this paper clusters time series, metrics such as the Euclidean distance are not well suited, so dynamic time warping (DTW) is used to measure time series similarity.
Dynamic time warping (DTW) is an algorithm that uses dynamic programming to match similar points between two time series in sequence and uses the sum of the distances between all matched points to determine the similarity of the two series: the smaller the total distance, the higher the similarity [30].
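Suppose there are two time series $X = (x_1, x_2, \ldots, x_m)$ and $Y = (y_1, y_2, \ldots, y_n)$, and let $d(x_i, y_j)$ denote the distance between points $x_i$ and $y_j$.
Construct the warping distance matrix $WD$ to record the cumulative distance by dynamic programming, with matrix element
$$WD(i, j) = d(x_i, y_j) + \min\{WD(i-1, j),\ WD(i, j-1),\ WD(i-1, j-1)\}.$$
Finally, the points of the warping path can be found by backtracking on the warping distance matrix $WD$ from $WD(m, n)$, and $WD(m, n)$ is the DTW distance between the two series.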
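The dynamic programming recursion and the backtracking step can be sketched as follows. This is a minimal illustration, assuming the absolute difference as the point-wise distance and no window constraint; the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(x, y):
    """Return the DTW distance between two sequences and one optimal warping path."""
    m, n = len(x), len(y)
    inf = float("inf")
    # wd[i][j] holds the cumulative distance for the prefixes x[:i] and y[:j].
    wd = [[inf] * (n + 1) for _ in range(m + 1)]
    wd[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(x[i - 1] - y[j - 1])  # assumed point-wise distance
            wd[i][j] = cost + min(wd[i - 1][j], wd[i][j - 1], wd[i - 1][j - 1])

    # Backtrack from the last cell to recover the matched index pairs.
    path = []
    i, j = m, n
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (wd[i - 1][j - 1], i - 1, j - 1),
            (wd[i - 1][j], i - 1, j),
            (wd[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return wd[m][n], path


# The second series repeats one value, so the two series align perfectly.
distance, path = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

In this example the repeated value in the second series is matched twice to the same point of the first series, which is exactly the kind of time shift that a point-by-point Euclidean comparison cannot absorb.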
In this paper, we construct the following statistics to measure similarity between individuals.
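(1) The full-period "absolute level" distance between individuals $i$ and $j$, which compares the levels of the two series over the whole sample period.
(2) The full-period "dynamic change" distance between individuals $i$ and $j$, which compares the period-to-period changes $\Delta$ of the two series.
(3) The "combined" distance between individuals $i$ and $j$. This composite distance is built from the absolute level distance and the dynamic change distance, and its definition involves the standard deviation of the change distances.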
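To make the idea of combining a level distance with a change distance concrete, one possible construction is sketched below. It is not the formula used in the paper: the Euclidean form of both distances, the equal weights `w_level` and `w_change`, and all function names are assumptions for illustration, and the standardization step is omitted.

```python
import math


def level_distance(x, y):
    """Euclidean distance between the levels of two equally long series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def changes(x):
    """Period-to-period changes of a series."""
    return [b - a for a, b in zip(x, x[1:])]


def change_distance(x, y):
    """Euclidean distance between the period-to-period changes of two series."""
    return level_distance(changes(x), changes(y))


def combined_distance(x, y, w_level=0.5, w_change=0.5):
    """Weighted sum of the level and change distances (weights are assumed)."""
    return w_level * level_distance(x, y) + w_change * change_distance(x, y)


# Two series with the same dynamics but different levels, and a mirrored one.
rising = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]
falling = [3.0, 2.0, 1.0]
d_shifted = combined_distance(rising, shifted)
d_falling = combined_distance(rising, falling)
```

The shifted series differs from the rising one only in level, so its change distance is zero, whereas the falling series differs in both level and dynamics and ends up farther away. A DTW distance could be substituted for the Euclidean form in either component.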
The systematic (hierarchical) clustering method is one of the most commonly used cluster analysis methods, and its clustering process depends on the definition of the distance between individuals and the distance between classes. Its steps are as follows: first, each sample is treated as a class of its own; then the similarity statistic between classes is determined and the two closest classes (or several classes) are merged into a new class; the merged classes are then merged again according to their inter-class distances, and the samples are aggregated layer by layer until all of them are merged into a single class. This yields a clustering tree of the data samples, and the clustering can be stopped at any level of the tree [31]. The inter-class distances commonly used in practice are the shortest distance method, the longest distance method, the median distance method, the centroid method, the class average method, the flexible class average method, and Ward's method. In this paper, Ward's method is used as the measure of inter-class distance.
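The layer-by-layer merging can be sketched on a precomputed distance matrix. For brevity this sketch uses the shortest distance (single linkage) rule, one of the inter-class distances listed above, rather than Ward's method; the function name `agglomerative` and the toy values are assumptions.

```python
def agglomerative(dist, n_clusters):
    """Merge the two closest classes repeatedly until n_clusters classes remain.

    dist is a symmetric matrix (list of lists) of pairwise distances between
    individuals; the inter-class distance is the shortest pairwise distance.
    """
    clusters = [[i] for i in range(len(dist))]  # every sample starts as its own class
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))  # record the merge history
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges


# Toy example (hypothetical): five individuals described by a single value.
values = [0, 1, 10, 11, 50]
dist = [[abs(u - v) for v in values] for u in values]
clusters, merges = agglomerative(dist, n_clusters=3)
```

Because the function only needs a distance matrix, any pairwise distance between series, including a DTW-based or combined distance, can be passed in. The recorded `merges` list corresponds to the levels of the clustering tree.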
In practice, most time series data, especially quarterly or monthly data, are strongly seasonal. The long-term trend, seasonal fluctuations, and stochastic fluctuations in such series interact in complex ways, and the ordinary ARIMA model is not sufficient to extract the relevant information from the data. The ARIMA model is therefore extended to the seasonal product model, also known as the seasonal autoregressive integrated moving average (SARIMA) model.
For a time series $\{x_t, t \in T\}$, suppose the following three conditions hold: (1) $\forall t \in T$, $E(x_t^2) < \infty$; (2) $\forall t \in T$, $E(x_t) = \mu$, where $\mu$ is a constant; (3) $\forall t, s, k \in T$ with $k + s - t \in T$, $\gamma(t, s) = \gamma(k, k + s - t)$, where $\gamma$ denotes the autocovariance function. Then $\{x_t\}$ is called a weakly stationary time series.
Stationarity in time series analysis generally refers to weak stationarity, also known as wide-sense or second-order stationarity. A series that does not satisfy the stationarity conditions is called a non-stationary series. The stationarity test is the basis and premise of time series modeling, and the following two test methods are the most common.
(1) Graph test method. A time series plot is a two-dimensional chart with time on the horizontal axis and the series values on the vertical axis, which intuitively reflects the basic distribution characteristics of the time series. When a series always fluctuates randomly around a constant value and the fluctuation range has fairly clear bounds, the series is usually stationary. Conversely, if the time series plot shows clear periodic or trend characteristics, the series is usually non-stationary. To be on the safe side, autocorrelation plots should be used to assist identification after the time series plot has been inspected. An autocorrelation plot is a two-dimensional hanging line plot in which the horizontal axis represents the autocorrelation coefficient and the vertical axis represents the number of delay periods, with the magnitude of the autocorrelation coefficient shown by the hanging lines. A stationary time series usually has only short-term correlation, which means that the autocorrelation coefficient $\rho_k$ decays rapidly toward zero as the number of delay periods $k$ increases.
(2) Hypothesis testing. Judging stationarity from graphs is highly subjective, and hypothesis testing overcomes this limitation. Among the statistical methods for testing the stationarity of a series, the most widely used is the ADF unit root test. The principle is as follows. For any AR($p$) process
$$x_t = \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + \varepsilon_t,$$
its characteristic equation is:
$$\lambda^p - \phi_1 \lambda^{p-1} - \cdots - \phi_p = 0.$$
The time series $\{x_t\}$ is stationary if all characteristic roots lie inside the unit circle, and non-stationary if a characteristic root lies on the unit circle, in which case $\phi_1 + \phi_2 + \cdots + \phi_p = 1$.
Let $\rho = \phi_1 + \phi_2 + \cdots + \phi_p - 1$. The hypotheses are $H_0: \rho = 0$ (the series is non-stationary) against $H_1: \rho < 0$ (the series is stationary).
Its test statistic is:
$$\tau = \frac{\hat{\rho}}{S(\hat{\rho})},$$
where $S(\hat{\rho})$ is the sample standard deviation of the estimate $\hat{\rho}$.
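The statistic $\tau$ can be illustrated with a simplified Dickey-Fuller regression. The sketch below assumes a regression with a constant term and no lagged differences, so it is not the full augmented test; the function name `dickey_fuller_tau` and the simulated series are hypothetical. The resulting value has to be compared with tabulated Dickey-Fuller critical values, not with ordinary t critical values.

```python
import math
import random


def dickey_fuller_tau(x):
    """tau statistic from the regression  dx_t = c + rho * x_{t-1} + e_t."""
    lagged = x[:-1]
    diffs = [b - a for a, b in zip(x, x[1:])]
    n = len(diffs)
    mean_l = sum(lagged) / n
    mean_d = sum(diffs) / n
    sxx = sum((l - mean_l) ** 2 for l in lagged)
    sxy = sum((l - mean_l) * (d - mean_d) for l, d in zip(lagged, diffs))
    rho = sxy / sxx                       # OLS estimate of rho
    intercept = mean_d - rho * mean_l
    sse = sum((d - intercept - rho * l) ** 2 for l, d in zip(lagged, diffs))
    se_rho = math.sqrt(sse / (n - 2) / sxx)  # standard error of the estimate
    return rho / se_rho


# Simulated examples: white noise (stationary) and its cumulative sum (random walk).
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
walk = []
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)

tau_noise = dickey_fuller_tau(noise)
tau_walk = dickey_fuller_tau(walk)
```

For the white noise series the statistic is strongly negative, which leads to rejecting the unit root hypothesis, while the random walk yields a value much closer to zero.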
In time series analysis, the first step in analyzing time series observations, regardless of the method of analysis, is to take effective means to extract the deterministic information embedded in the time series observations.
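From Cramer's decomposition theorem, any time series can be decomposed into two parts, a deterministic trend given by a polynomial and a stationary zero-mean error:
$$x_t = \mu_t + \varepsilon_t = \sum_{j=0}^{d} \beta_j t^j + \Psi(B) a_t,$$
where $\{a_t\}$ is a zero-mean white noise sequence and $B$ is the delay operator.
Performing the $d$-order difference on the series gives
$$\nabla^d x_t = (1 - B)^d x_t = \sum_{i=0}^{d} (-1)^i \binom{d}{i} x_{t-i}.$$
It can be obtained after conversion:
$$x_t = \sum_{i=1}^{d} (-1)^{i+1} \binom{d}{i} x_{t-i} + \nabla^d x_t. \qquad (17)$$
From equation (17), it can be seen that the essence of $d$-order differencing is an autoregressive process: the historical values $x_{t-1}, \ldots, x_{t-d}$ are used to extract the deterministic information from $x_t$.
For example, 1st-order differencing is the subtraction between two series values separated by one period, i.e.: $\nabla x_t = x_t - x_{t-1}$. The $k$-step difference is the subtraction between two series values separated by $k$ periods:
$$\nabla_k x_t = x_t - x_{t-k} = (1 - B^k) x_t,$$
where $\nabla_k$ denotes the $k$-step difference operator.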
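Both kinds of differencing reduce to the same small operation, sketched below. The function name `difference` and its parameters `order` and `step` are chosen for this illustration.

```python
def difference(series, order=1, step=1):
    """Apply the step-period difference x_t - x_{t-step} `order` times."""
    result = list(series)
    for _ in range(order):
        result = [result[t] - result[t - step] for t in range(step, len(result))]
    return result


squares = [1, 4, 9, 16]
first = difference(squares)             # first-order difference
second = difference(squares, order=2)   # second-order difference
stepped = difference([1, 2, 3, 5, 8, 13], step=2)  # 2-step difference
```

The quadratic series becomes constant after two rounds of differencing, which illustrates how a $d$-order difference removes a polynomial trend of degree $d$. Each round shortens the series by `step` observations.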
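(1) ARIMA model. The difference operation has a strong ability to extract deterministic information, and most non-stationary time series show the properties of a stationary series after differencing. If the series $\{x_t\}$ becomes stationary after $d$-order differencing, an ARMA($p$, $q$) model can be fitted to the differenced series, and the resulting model is called the ARIMA($p$, $d$, $q$) model. The ARIMA model can be abbreviated as:
$$\Phi(B) \nabla^d x_t = \Theta(B) \varepsilon_t,$$
where $\nabla^d = (1 - B)^d$, $\Phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ is the autoregressive coefficient polynomial, $\Theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$ is the moving average coefficient polynomial, and $\{\varepsilon_t\}$ is a zero-mean white noise sequence.
(2) Seasonal product model (SARIMA). The seasonal product model is established by applying ordinary differencing and then seasonal differencing to the original non-stationary series and, once the result passes the stationarity test, fitting a model to it. The fitted model is actually the product of ARMA($p$, $q$) and ARMA($P$, $Q$) with seasonal period $S$:
$$\Phi(B) \Phi_S(B) \nabla^d \nabla_S^D x_t = \Theta(B) \Theta_S(B) \varepsilon_t.$$
In the formula:
$$\Phi_S(B) = 1 - \phi_{S,1} B^S - \cdots - \phi_{S,P} B^{PS}, \qquad \Theta_S(B) = 1 - \theta_{S,1} B^S - \cdots - \theta_{S,Q} B^{QS},$$
and $\nabla_S^D = (1 - B^S)^D$ is the $D$-order seasonal difference with period $S$.
The seasonal product model can be simply notated as ARIMA($p$, $d$, $q$) × ($P$, $D$, $Q$)$_S$.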
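The "product" in the model name refers to multiplying the ordinary and seasonal lag polynomials. The sketch below multiplies an AR(2) polynomial by a seasonal AR(1) polynomial with period 12; the coefficient values are hypothetical and are not the estimates reported in this paper.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials given as coefficient lists [c0, c1, ...] in powers of B."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def seasonal_poly(coeffs, s):
    """Build 1 - c1*B^s - c2*B^(2s) - ... as a coefficient list."""
    poly = [1.0] + [0.0] * (s * len(coeffs))
    for k, c in enumerate(coeffs, start=1):
        poly[k * s] = -c
    return poly


# Hypothetical coefficients: (1 - 0.5B - 0.2B^2) and (1 - 0.3B^12).
ar_poly = [1.0, -0.5, -0.2]
seasonal_ar_poly = seasonal_poly([0.3], 12)
full_ar_poly = poly_mul(ar_poly, seasonal_ar_poly)
```

The product contains cross terms at lags 13 and 14 that neither factor has on its own. These terms are how the multiplicative model captures the interaction between short-term and seasonal dependence with only a few parameters.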
In this paper, the inbound tourism source markets of Fujian Province are clustered with the comprehensive distance method, using the number of inbound tourists received in Fujian Province from 2014 to 2019 as reported in the Fujian Statistical Yearbook 2019. Table 1 shows the number of inbound tourists from each source market. Japan, South Korea, Hong Kong (China), and Taiwan (China) top the list, each exceeding 400,000 visitors in 2019.
Table 1. Number of inbound tourists to Fujian Province by source market
| Source market | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|---|---|
| Japan(1) | 162902 | 202580 | 262992 | 223202 | 348256 | 422070 | 
| South Korea(2) | 89334 | 121731 | 216002 | 202790 | 297193 | 454552 | 
| Malaysia(3) | 40004 | 48655 | 99031 | 81375 | 151483 | 177845 | 
| United States(4) | 65679 | 72984 | 90742 | 75071 | 129820 | 176730 | 
| Singapore(5) | 32430 | 37698 | 69847 | 57267 | 99759 | 108600 | 
| Thailand(6) | 20452 | 26852 | 48884 | 32991 | 67780 | 78972 | 
| Germany(7) | 23737 | 27613 | 30073 | 27600 | 55078 | 71956 |
| Italy(8) | 15345 | 21770 | 25838 | 26992 | 44086 | 62361 | 
| France(9) | 18096 | 22111 | 28512 | 22391 | 41139 | 55919 | 
| Indonesia(10) | 14788 | 18034 | 22338 | 24002 | 37792 | 42593 | 
| Australia(11) | 13068 | 16741 | 19630 | 20180 | 34600 | 45941 | 
| United Kingdom(12) | 14154 | 16336 | 20863 | 19936 | 30685 | 43617 |
| Hong Kong, China(13) | 180728 | 221070 | 265659 | 289380 | 409393 | 451129 |
| Taiwan, China(14) | 271175 | 399666 | 524182 | 427180 | 533778 | 640391 |
The above data were entered into SPSS statistical software, and the clustering results are shown in Figure 2. Using the shortest distance method to cluster the inbound tourism source markets of Fujian Province, the following conclusions are drawn:
(1) Among the 14 source markets from 2014 to 2019, Taiwan, China clearly forms a category of its own. Taiwan is geographically close to Fujian Province, many Taiwanese have ancestral roots in Fujian, and many Taiwanese have invested and done business in Fujian in recent years; these ties of geography and kinship have laid a good foundation for Taiwan to become an important source of tourists for Fujian. At present, Taiwanese tourists come to Fujian mainly for visiting relatives, sightseeing, and business.
(2) The Japanese and Hong Kong, China source markets form one category. Japan has long been a major overseas source market for Fujian Province, as it is for China as a whole. The rapid post-war development of the Japanese economy and the increase in residents' leisure time made outbound tourism a symbol of fashion and high quality of life in Japan, and spatial proximity and cultural similarity have led more and more Japanese tourists to choose China as one of their major overseas destinations. Fujian Province, as one of the more economically developed regions of mainland China, is particularly favored by Japanese tourists for its beautiful natural and cultural environment. Hong Kong, China has also been one of the key inbound source markets for Fujian Province, and its growth trend is in line with that of Japan. The Mazu Cultural Festival held in Fujian has expanded the influence of Fujian as a tourism destination, and tourists from Hong Kong and Macao are also interested in Gulangyu Island and other famous attractions.
(3) The South Korean source market forms one category. South Korea has developed close relations with China, with very frequent economic and cultural exchanges, and its inbound tourism market has maintained rapid growth. Korean tourists come to Fujian mainly for business tourism; in addition, as economic and cultural exchanges between Fujian and South Korea continue to increase, study and exchange visits have become a fast-growing source of visitors. The statistics show that South Korea, as an emerging source country for inbound tourism in Fujian Province, has a high growth rate and great potential, and it is the top priority for Fujian Province's future publicity and marketing.
(4) The U.S. and Malaysian source markets form one category. The United States is currently one of the world's largest tourist-generating countries and, after Japan and South Korea, is the third-largest source country for Fujian Province. The U.S. source market in Fujian is relatively stable and has maintained steady growth. U.S. tourists travel to China mainly for sightseeing and leisure and for business and conference tourism, as well as to visit friends and relatives; for U.S. tourists to Fujian, sightseeing is still the main purpose, followed by business activities, whose share has increased significantly. Malaysia is an emerging market for inbound tourism in Fujian Province whose development potential should not be ignored: it has a large population base and a large middle class. The two source markets have comparable growth potential.
(5) The Thai and Singaporean source markets form one category. Southeast Asia is a traditional source of tourists for China. Singapore has a very good investment base in Fujian, and every year many Singaporeans come to Fujian on study tours, economic and trade visits, and scientific, technological, and cultural exchanges; the Singapore Buddhist Association also holds exchanges in Xiamen and other places every year, and the local customs of southern Fujian are quite attractive to Singaporeans. Thai tourists to Fujian span a wide age range and mainly tour the commodity markets of Xiamen and Putian; trade activities have become an important travel motivation in recent years. These two source markets are grouped together because of their similar geographic location and similar trends in tourism to Fujian.
(6) Germany, Italy, France, Indonesia, Australia, the United Kingdom, and the remaining source markets form one category. Tourists from Western European countries come to China mainly for business, sightseeing, and vacations. The number of Italian tourists in Fujian has grown steadily in recent years and the market outlook is favorable, with business and leisure visitors predominating. The number of French tourists in Fujian has also grown quickly in recent years. The United Kingdom sends the largest number of travelers to China among European countries, but relatively few U.K. tourists visit Fujian. Germany is also a market worthy of vigorous tourism promotion. Australia is an emerging source of tourists for China. According to the forecast of the Australian Tourism Forecasting Council, the average annual growth rate of Australian travelers to China will be higher than that to any other tourist destination, which makes it a market with great potential. The Western European and Australian source markets can be regarded as a newly emerging category of source markets for Fujian Province; the number of tourists from these markets will not grow rapidly in the short term, but there is potential to be explored in the longer term.

Figure 2. Cluster dendrogram
The data selected for this paper are the monthly tourist trips to Xiamen from January 2014 to May 2019 (data source: Xiamen Tourism Network). The original data series is shown in Figure 3.

Figure 3. Original data series of tourist trips
An analysis of the time series characteristics shows that the data are clearly non-stationary and seasonal, with cyclical fluctuations and distinct peak seasons each year around April to May, July to August, and October.
In order to eliminate the trend and reduce the fluctuation of the series, the logarithm of the original series is taken and the resulting series is named ly. Its time series plot is shown in Fig. 4, and the series is still not stationary.

Figure 4. Time series plot of the logarithm of tourist trips
The first-order difference of the series ly is taken and the resulting series is named ily; its autocorrelation and partial autocorrelation analysis is shown in Fig. 5. The figure shows that the trend of the series is basically eliminated, but at k=12 the sample autocorrelation and partial autocorrelation coefficients are significantly different from 0, which indicates that seasonality is present.

Figure 5. Autocorrelation and partial autocorrelation of the series ily
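The check at k=12 relies on the sample autocorrelation coefficient, which can be computed as sketched below. The synthetic sine series and the approximate bound of $2/\sqrt{n}$ used to judge significance are assumptions for this illustration; the Xiamen data are not reproduced here.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Synthetic monthly series (hypothetical) with a 12-period seasonal cycle.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
acf = sample_acf(series, 12)
bound = 2 / math.sqrt(len(series))  # rough significance bound for a white noise series
```

For a series with a 12-period cycle, the coefficient at lag 12 is large and positive and lies far outside the bound, which is the pattern that signals remaining seasonality.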
Seasonal differencing is applied to the series ily to obtain the new series sily, whose autocorrelation and partial autocorrelation analysis is shown in Fig. 6. The sample autocorrelation and partial autocorrelation coefficients of sily quickly fall into the random interval, so the trend and seasonality of the series have been basically eliminated.

Figure 6. Autocorrelation and partial autocorrelation of the seasonally differenced series sily
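The sequence of transformations from the original series to ly, ily, and sily can be sketched as follows. The synthetic series below, an exponential trend with a multiplicative 12-month cycle over 65 months, is a hypothetical stand-in for the Xiamen data, and the helper name `difference` is an assumption.

```python
import math


def difference(series, step=1):
    """Return x_t - x_{t-step} for every t where both values exist."""
    return [series[t] - series[t - step] for t in range(step, len(series))]


# Hypothetical monthly series covering 65 months (January 2014 to May 2019).
y = [
    math.exp(4.0 + 0.01 * t + 0.3 * math.sin(2 * math.pi * t / 12))
    for t in range(65)
]

ly = [math.log(v) for v in y]    # logarithm stabilizes the fluctuation
ily = difference(ly, step=1)     # first-order difference removes the trend
sily = difference(ily, step=12)  # 12-step difference removes the seasonality
```

For this idealized series the final result is zero up to rounding error, because the logarithm turns the multiplicative pattern into an additive one, the first difference removes the linear trend, and the 12-step difference removes the exact seasonal cycle. Real data leave a stationary residual series instead, and 13 observations are lost along the way.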
The unit root test is a formal method for testing the stationarity of a time series. To further test whether the series sily is stationary, the ADF unit root test is performed on it, and Table 2 shows the results. The t-statistic of the test is -7.86162, which is smaller than the critical value at every significance level, including 1%, so the null hypothesis is rejected. It is concluded that the series has no unit root and that sily is therefore stationary.
Table 2. Unit root test of the sily series
| Item | t | p |
|---|---|---|
| ADF test statistic | -7.86162 | 0.000 |
| Critical value (1% level) | -3.51473 | |
| Critical value (5% level) | -2.88265 | |
| Critical value (10% level) | -2.59831 | |
In the model identification stage, we find that after first-order differencing of the logarithmic series the trend is basically eliminated, so $d = 1$; after seasonal differencing with a period of 12, the seasonality is also basically eliminated, so $D = 1$.
The Akaike information criterion (AIC), one of the best-criterion-function order selection methods, was used to determine the order of the models. The test results of the four candidate models are summarized in Table 3. All four models satisfy the stationarity and invertibility conditions of the ARMA process, so the model settings are reasonable. In addition, the p-values of the white noise test on the residual series show that the residuals of each model satisfy the independence assumption and that the models fit well. Comparing the results in Table 3, the first model (2, 1) has the largest adjusted R-squared value (0.83085), the smallest AIC and MAPE, and a small SC value. The first model, ARIMA(2, 1, 1)(1, 1, 1)$_{12}$, is therefore selected.
Table 3. Test results of each model
| (p, q) | Adjusted R² | AIC | SC | p-Q | MAPE |
|---|---|---|---|---|---|
| (2,1) | 0.83085 | -3.24701 | -3.29563 | 0.893 | 4.57 | 
| (2,0) | 0.80135 | -3.62486 | -3.36381 | 0.736 | 6.01 | 
| (1,1) | 0.80279 | -3.36972 | -3.13434 | 0.854 | 5.19 | 
| (1,0) | 0.82164 | -3.43296 | -3.31292 | 0.892 | 4.95 | 
Based on the identification and selection above, we choose ARIMA(2, 1, 1)(1, 1, 1)$_{12}$ as the best prediction model. The parameter estimates and the related test results are shown in Table 4. The results show that the parameter estimates of the model are statistically significant. The prediction model is: (1 − 0.2046
Table 4. Model parameter estimation and relevant test results
| Variable | Coefficient | Std. error | t | p |
|---|---|---|---|---|
| AR(1) | -0.51131 | 0.19472 | -3.11772 | 0.01832 | 
| AR(2) | -0.19991 | 0.21130 | -1.71485 | 0.08535 | 
| MA(1) | -0.47858 | 0.19733 | -2.59064 | 0.01897 | 
| SAR(12) | -0.20460 | 0.20054 | -2.32730 | 0.04667 | 
| SMA(12) | 0.88047 | 0.09894 | -9.10831 | -0.00464 |
| R² | 0.87094 | Mean of the dependent variable | -0.01129 | |
| Adjusted R² | 0.80094 | Standard deviation of the dependent variable | 0.08314 | |
| Regression standard error | 0.01898 | Akaike information criterion (AIC) | -3.63257 | |
| Residual sum of squares | 0.01663 | Schwarz criterion (SC) | -3.37128 | |
| Log-likelihood | 78.67004 | Durbin-Watson statistic | 2.03874 | |
The above model is used to forecast tourist trips to Xiamen. The fitted results for 2014 to 2019 are shown in Figure 7. The predicted values are basically consistent with the actual values, which shows that the model fits well.

Figure 7. Prediction of the number of tourists in Xiamen City
Using the above model, tourist arrivals in Xiamen are then predicted for June-December 2019, as shown in Table 5. The predictions are also basically consistent with the actual data. The model therefore has clear reference value for predicting and analyzing tourism reception at tourist destinations.
Table 5. Forecast results for the number of visitors from June to December 2019
| Month | Real value | Predicted value | 
|---|---|---|
| June | 113.77 | 113.91 | 
| July | 149.09 | 149.04 | 
| August | 167.51 | 167.51 | 
| September | 147.64 | 147.46 | 
| October | 210.73 | 210.81 | 
| November | 115.19 | 115.18 | 
| December | 102.37 | 102.40 | 
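The agreement between the real and predicted values in Table 5 can be summarized with the mean absolute percentage error (MAPE). The sketch below uses the values from Table 5; the function name `mape` is chosen for this illustration.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Real and predicted values from Table 5 (June to December 2019).
real_values = [113.77, 149.09, 167.51, 147.64, 210.73, 115.19, 102.37]
predicted_values = [113.91, 149.04, 167.51, 147.46, 210.81, 115.18, 102.40]

table5_mape = mape(real_values, predicted_values)  # about 0.05 percent
```

The largest relative deviations occur in June and September, and both are close to 0.12 percent.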
In view of the current difficulties in forecasting tourism market demand, this paper performs systematic clustering of tourism big data with the comprehensive distance method and, based on the clustering results, constructs a seasonal product model for forecasting market demand. Taking the tourism data of Fujian Province from 2014 to 2019 as an example, Taiwan, China is the most distinct and independent category of source market for Fujian Province, while the Australian source market has great potential. A significantly stationary time series (p<0.01) was obtained through logarithmic transformation and seasonal differencing, and the ARIMA(2, 1, 1)(1, 1, 1)$_{12}$ model was chosen for tourism demand forecasting. Between January 2014 and May 2019, the predicted number of tourist trips to Xiamen is basically consistent with the real values, and the model also performs well in predicting trips to Xiamen City for June-December 2019. It can therefore be concluded that clustering the tourism data and applying the seasonal product model improve the accuracy of market demand forecasting in the tourism economy.
