Combining big data technology to study the geographical distribution characteristics of tourism consumption behavior

In recent years, China’s consumption structure has undergone significant changes, especially since the implementation of double holidays, golden weeks and paid vacations. People’s leisure and recreation time has gradually increased, their daily lifestyles and contents have become richer and richer, and the opportunities for outbound travel have also increased; thus, tourism has become an important part of the daily life of urban residents [1-2]. However, along with the development of urbanization, the fast pace of urban life and the heavy pressure of urban life have made people’s tourism consumption behavior change greatly in form, content and purpose compared with the past [3-4]. In general, people’s total demand for tourism has expanded, the opportunities for tourism have increased, the spatial scope of tourism has been extended, and the consumption behavior of tourism has become more rational. In a research report on the tourism consumption of the new middle class in urban China, it is pointed out that people’s tourism consumption behavior is changing from original sightseeing to leisure, and it also brings challenges to the tourism industry in terms of tourism marketing and service [5-6]. Under the situation that the demand of the tourism market tends to be refined, it is of great significance to grasp the characteristics of travelers’ tourism consumption behavior to promote the development of the tourism industry [7-8].

Staying close to tourism consumers, understanding their new needs and new modes of tourism consumption, integrating new market resources, and planning new product development and operation modes should become basic work for a tourism industry seeking sustainable, high-quality development. Miah, S.J. et al. focused on extending strategic decision support based on social-media-generated big data in the tourism field and showed that the approach is general, while noting that the adaptation problems and solutions that may arise when it is applied to other domains or other types of big data streams merit further discussion [9]. Han, Q. et al. proposed and validated Tourism2vec as a novel and effective method for understanding and analyzing tourism behavior; it can help uncover regularities hidden in the data, portray destinations more accurately and comprehensively, and improve on traditional administrative zoning to support more scientific and accurate tourism planning and promotion [10]. Salas-Olmedo, M.H. et al. indicated that the study of urban tourism behavior should draw on a variety of data sources, and that analyzing the digital footprints provided by different sources allows tourists’ trajectories and spatial distribution within the city to be depicted more accurately and supports more detailed research on areas of different types or functions [11]. Song, H. et al. indicated that, with the help of big data technology, it will become possible to monitor and analyze travel activities precisely on a global scale and to adjust strategic planning and resource allocation according to the results, which will further promote the intelligent and sustainable development of the world tourism industry [12].

This paper divides tourism consumption activities into three parts (subjects, objects, and media) and attributes the factors affecting tourism consumption to four aspects: tourism consumption level, people’s living standard, economic development level, and tourism infrastructure construction level, so as to select the variables that drive spatial differences in tourism consumption. Spatial and economic distances are used to construct the spatial weights. The spatial lag model (SLM) and the spatial error model (SEM) are specified, and their parameters are estimated by the maximum likelihood method. Global and local spatial autocorrelation tests are conducted on the 2010-2022 sample data to validate the parameter estimation of the SLM and SEM. The degree of correlation of each influencing factor across the different models is discussed.

2

Establishment of a system of indicators

2.1

Subjects and Objects of Tourism Consumption

Tourism is a complex activity that functions as an economic activity and also delivers spiritual and cultural benefits. Tourism consumption lies at the intersection of tourism and consumption activities and is an integral part of total social consumption. It is hierarchical, structured, and conditional, and it belongs to the higher levels of consumption behavior. It is a form of enjoyment consumption as well as a form of development and investment, and its constituent factors are very complex. Normal market activity for tourism consumption is composed of three parts: the tourism consumption subject, the tourism consumption object, and the tourism consumption media.

The subject of tourism consumption is the person who leaves their place of residence to visit and stay in other places for sightseeing or other purposes while carrying out non-remunerated activities. The subject referred to here must be a tourist who has a desire to travel and a preference for a certain tourism product, and who has leisure time, financial security, and adequate physical fitness.

Tourism consumption subjects can be divided into three categories: inbound tourists, outbound tourists, and domestic tourists.

The tourism consumption object is what tourists consume. It has the attributes of a general commodity but is a special commodity. Here it refers to tourism products, i.e., the hardware and software of food, lodging, transport, sightseeing, shopping, and entertainment, together with composite products formed by combining services. Purchasing a tourism product means purchasing a right of enjoyment, which includes purchases that transfer ownership (such as food and souvenirs) and purchases that grant a right of use without ownership (such as transport, accommodation, and entertainment).

Tourism consumption media refers to intermediary organizations and enterprises serving tourism, including travel agencies, intermediaries, trade associations, and so on. The tourism media in this context refers to operators and intermediaries of tourism products and goods, including destinations and objects.

2.2

Variables affecting spatial differences in tourism consumption

2.2.1

Variable selection

Exploring the causal relationship between several variables using observed data is the basis of regression analysis. Subsequent statistical inference and analysis are only meaningful if the relationship between the dependent and independent variables is correctly formulated. However, in the absence of a clear theoretical relationship, it is uncertain which of the candidate regressors should be chosen as independent variables. Variable selection, as an important means of screening the independent variables, not only improves prediction accuracy and enhances the interpretability of the model but also reduces costs for practitioners and avoids unnecessary losses.

This section is based only on the linear regression model: (1) $y = X β + ε$ where y = (y₁,⋯,y_n)′ is the vector of observed values of the dependent variable, X is an n×k design matrix, β = (β₁,⋯,β_k)′ is the vector of regression coefficients, and ε is the vector of observation errors, which are independently and identically distributed with E(ε) = 0 and Var(ε) = σ²I_n.

What are the effects of variable misselection on estimation? And how does variable selection improve the predictive accuracy of the dependent variable? For simplicity, model (1) is used as an example, and the standard deviation of the error term σ = 1 is assumed.

Let X = (X₁,X₂), β = (β₁′,β₂′)′, where X₁ is a n×s-dimensional matrix, X₂ is a n×(k–s)-dimensional matrix, β₁ is an s-dimensional vector, and β₂ is a k–s-dimensional vector, the full model (1) is written in the following form: (2) $y = X_{1} β_{1} + X_{2} β_{2} + ε$

And the alternative model is: (3) $y = X_{1} β_{1} + ε$

For the full model (2), the least squares estimate of the regression coefficients is: (4) $\hat{β} = ({\hat{β}}_{1}', {\hat{β}}_{2}')' = {(X' X)}^{- 1} X' y$

For the alternative model (3), the least squares estimate of the regression coefficient β₁ is: (5) ${\bar{β}}_{1} = {(X_{1}' X_{1})}^{- 1} X_{1}' y$

If the full model (2) is correct, then $E(\hat{β}) = β$, $Var(\hat{β}) = {(X' X)}^{- 1}$, $E({\bar{β}}_{1}) = β_{1} + {(X_{1}' X_{1})}^{- 1} X_{1}' X_{2} β_{2}$, and $Var({\bar{β}}_{1}) = {(X_{1}' X_{1})}^{- 1}$.

It follows that whenever β₂ ≠ 0 (and X₁′X₂ ≠ 0), ${\bar{β}}_{1}$ is a biased estimate, so its variance alone cannot measure its accuracy. A more reasonable criterion is the mean square error matrix $MSE({\bar{β}}_{1}) = E[({\bar{β}}_{1} - β_{1}) {({\bar{β}}_{1} - β_{1})}']$, which evaluates to: (6) $MSE({\bar{β}}_{1}) = {(X_{1}' X_{1})}^{- 1} + {(X_{1}' X_{1})}^{- 1} X_{1}' X_{2} β_{2} β_{2}' X_{2}' X_{1} {(X_{1}' X_{1})}^{- 1}$

The above derivation shows that choosing the wrong model can result in biased parameter estimates. Furthermore, it can be shown that if $Var({\hat{β}}_{2}) - β_{2} β_{2}'$ is positive semi-definite, then $Var({\hat{β}}_{1}) - MSE({\bar{β}}_{1})$ is also positive semi-definite, indicating that a model with fewer variables can improve estimation accuracy. Therefore, if variables with very small effects (β₂ ≈ 0) are present in the model and can be removed by a suitable method, the resulting estimates will have little bias and improved accuracy.

In summary, it can be concluded from the discussion above that the wrong selection of variables will produce biased estimates and predictions, especially the wrong selection of significant variables will have a larger bias, making the subsequent statistical inference unreliable. However, when the effects of variables are very small or absent (coefficients close to or equal to zero), not selecting these variables not only results in less bias in estimation and prediction but also improves the accuracy of estimation and prediction. Therefore, it is necessary and meaningful to find a suitable method for variable selection.
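The omitted-variable argument above can be checked numerically. The following is a minimal sketch, assuming illustrative dimensions, coefficients, and a synthetic design matrix (none of which come from the paper's data). It uses a noise-free response so that the reduced-model estimate coincides with its expectation, and then builds the mean square error matrix as variance plus the outer product of the bias, with σ = 1 as assumed in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, k = 200, 2, 3                      # illustrative sizes (assumed)
X = rng.normal(size=(n, k))
X[:, 2] += 0.8 * X[:, 0]                 # make the omitted column correlated with X1
X1, X2 = X[:, :s], X[:, s:]
beta1 = np.array([1.0, -2.0])            # hypothetical true coefficients
beta2 = np.array([0.5])

# Noise-free response: the reduced-model estimate then equals its expectation.
y = X1 @ beta1 + X2 @ beta2

A = np.linalg.inv(X1.T @ X1)             # (X1'X1)^{-1}
beta1_bar = A @ X1.T @ y                 # reduced-model estimate, equation (5)
bias = A @ X1.T @ X2 @ beta2             # (X1'X1)^{-1} X1'X2 beta2

# MSE matrix with sigma = 1: variance term plus bias outer product.
mse = A + np.outer(bias, bias)

print("bias of reduced-model estimate:", bias)
print("trace of MSE matrix:", np.trace(mse))
```

Setting `beta2` to zero in this sketch makes the bias vanish, which is the case in which dropping the variable costs nothing.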

2.2.2

Selection of variables

China’s geographical environment is complex and diverse, with great differences between north, south, east, and west and uneven levels of economic development, so the spatial differences in the level of tourism consumption and their influencing factors should be considered from several aspects. Selecting representative influencing factors helps to identify which factors promote tourism consumption and which inhibit it. Based on the relevant literature, the factors affecting tourism consumption are attributed to four aspects: the level of tourism consumption, people’s living standards, the level of economic development, and the level of tourism infrastructure construction. To reduce the influence of heteroskedasticity, all data were log-transformed.

According to the previous theoretical analysis, this paper takes the mean value of the comprehensive score of tourism consumption as the explained variable; per capita tourism consumption, the consumption level of residents, and the number of A-grade scenic spots as the explanatory variables; and per capita GDP, the number of travel agencies, and the number of star-rated hotels as the control variables. They are explained as follows: 1)

Explained variables

The explained variable is the comprehensive score (ss), which reflects the degree of agglomeration of tourism consumption and is therefore used to measure the agglomeration level of tourism consumption development. Of the many indicators available for this purpose, this paper uses the comprehensive score (ss) to reflect both the level of tourism consumption and its level of agglomeration.

2)

Explanatory variables

Per capita tourism consumption is the average amount of money spent by each tourist, that is, the ratio of total tourism income to the total number of tourists, and it reflects the average spending level of tourists. In this paper, per capita tourism consumption (unit: yuan), denoted pcts, is used to measure the impact of the tourism consumption level.

The impact of people’s living standard on tourism consumption is mainly reflected in the effect of tourists’ income level and of demographic and environmental factors on tourism consumption. It is therefore measured by the level of residents’ consumption (unit: yuan), denoted hcl.

Tourism infrastructure is reflected in the number of A-class scenic spots, which plays a decisive role in people’s choice of whether to travel to a place. The number of A-class scenic spots (unit: count), denoted asa, is therefore chosen as the indicator.

3)

Control variables

GDP per capita: this paper mainly analyzes the spatial differences in tourism consumption, so it is appropriate to use the GDP per capita of each province. Per capita gross domestic product (unit: yuan), denoted pcgdp, is therefore chosen as the indicator of the level of economic development.

Travel agencies and star-rated hotels are tourism infrastructure that affects the level of development of tourism consumption, so the number of travel agencies, denoted nta, and the total number of star-rated hotels, denoted nsh, are selected as control variables.

3

Modeling

3.1

Spatial autocorrelation

Spatial autocorrelation is a spatial statistical method used to describe spatially interacting phenomena, referring to the correlation of the same variable at different spatial locations. Many geographical phenomena are spatially autocorrelated because they are influenced by processes that are continuous in their geographical distribution [13-15].

3.1.1

Measurement of spatial global autocorrelation

The global indices for measuring spatial autocorrelation are the global Moran’s I index, the global Geary’s C index and the global G index.

1)

The global Moran’s I index reflects the degree of similarity of attribute values of spatially adjacent or neighboring regional units, and its calculation formula is: (7) $I = \frac{N \cdot \sum_{i = 1}^{N} \sum_{j = 1}^{N} w_{i j} \cdot (x_{i} - \bar{x}) (x_{j} - \bar{x})}{(\sum_{i = 1}^{N} \sum_{j = 1}^{N} w_{i j}) \cdot \sum_{i = 1}^{N} {(x_{i} - \bar{x})}^{2}}$ where N is the number of spatial units, x_i and x_j are the attribute values in zones i and j, $\bar{x}$ is the mean of all attribute values, and w_ij is an element of the spatial weight matrix, which is generally symmetric with w_ii = 0.

The value of Moran’s I lies approximately between -1 and +1. A positive value indicates positive spatial correlation, a negative value indicates negative spatial correlation, and a value at or near zero indicates no spatial correlation, i.e., the spatial distribution is random.

For the global Moran’s I index, the significance of spatial autocorrelation can be tested using the standardized statistic Z(I): (8) $Z (I) = \frac{I - E (I)}{\sqrt{V a r (I)}}$ where Var(I) is the theoretical variance of Moran’s I and $E (I) = \frac{- 1}{N - 1}$ is its theoretical expectation.

2)

Global Geary’s C index

The value of Geary’s C ranges from 0 to 2. When 0<C<1, the attribute values are positively autocorrelated in space. When 1<C<2, they are negatively autocorrelated. When C ≈ 1, there is no spatial correlation, i.e., the attribute values are randomly distributed in space. The formula is as follows: (9) $C = \frac{\sum_{i = 1}^{N} \sum_{j = 1}^{N} w_{i j} {(x_{i} - x_{j})}^{2}}{2 \sum_{i = 1}^{N} \sum_{j = 1}^{N} w_{i j} σ^{2}}$ where $σ^{2} = \frac{\sum_{i = 1}^{N} {(x_{i} - \bar{x})}^{2}}{(N - 1)}$ is the variance of the attribute of the spatially analyzed objects.

As with Moran’s I, the significance of spatial autocorrelation measured by Geary’s C can be tested using the standardized statistic Z(C): (10) $Z (C) = \frac{C - E (C)}{\sqrt{V a r (C)}}$
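The two global indices in equations (7) and (9) can be computed directly from an attribute vector and a weight matrix. The following is a minimal sketch on a hypothetical toy example (four regions on a line, binary adjacency weights); the function names `morans_i` and `gearys_c` are illustrative and not taken from any package.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I, following equation (7)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                              # deviations from the mean
    return len(x) * (z @ W @ z) / (W.sum() * (z @ z))

def gearys_c(x, W):
    """Global Geary's C, following equation (9)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sigma2 = ((x - x.mean()) ** 2).sum() / (n - 1)
    diff2 = (x[:, None] - x[None, :]) ** 2        # (x_i - x_j)^2 for all pairs
    return (W * diff2).sum() / (2.0 * W.sum() * sigma2)

# Toy data: four regions on a line, each adjacent to its immediate neighbours.
x = np.array([1.0, 2.0, 3.0, 4.0])
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

I = morans_i(x, W)   # 1/3: neighbouring values are similar
C = gearys_c(x, W)   # 0.3: below 1, also indicating positive autocorrelation
print(I, C)
```

On this monotone toy series both indices agree on positive autocorrelation, which matches the interpretation rules given for the two statistics.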

3.1.2

Measurement of spatially localized autocorrelation

There are three main measures of local spatial autocorrelation: the local indicators of spatial association (LISA), the G statistic, and the Moran scatterplot. LISA includes two indices: local Moran’s I and local Geary’s C.

1)

Local Moran’s I

By decomposing the global Moran’s I statistic to the local level, Anselin proposed the local Moran’s I statistic, defined for every spatial unit i as: (11) $I_{i} = \frac{(N - 1) (x_{i} - \bar{x}) \sum_{j = 1, j \neq i}^{N} w_{i j} (x_{j} - \bar{x})}{\sum_{j = 1, j \neq i}^{N} {(x_{j} - \bar{x})}^{2}}$ where I_i is the local autocorrelation coefficient of the i-th unit. Its expectation and variance are, respectively: (12) $E (I_{i}) = - \frac{\sum_{j = 1}^{N} w_{i j}}{N - 1}$ (13) $V a r (I_{i}) = \frac{(N - b_{2}) \sum_{j = 1, j \neq i}^{N} w_{i j}^{2}}{N - 1} + \frac{(2 b_{2} - N) \sum_{k = 1, k \neq i}^{N} \sum_{h = 1, h \neq i}^{N} w_{i k} w_{i h}}{(N - 1) (N - 2)} - {[E (I_{i})]}^{2}$

Among them: (14) $b_{2} = N \sum_{i = 1}^{N} {(x_{i} - \bar{x})}^{4} / {(\sum_{i = 1}^{N} {(x_{i} - \bar{x})}^{2})}^{2}$

2)

Local Geary’s C

Local Geary’s C is defined as follows: (15) $C_{i} = \sum_{j = 1}^{N} w_{i j} {(Z_{i} - Z_{j})}^{2}$

Among them: (16) $Z_{i} = (x_{i} - \bar{x}), Z_{j} = (x_{j} - \bar{x})$

Local Geary’s C is related to global Geary’s C as follows: (17) $C = \frac{(N - 1) \sum_{i = 1}^{N} \sum_{j = 1}^{N} w_{i j} {(Z_{i} - Z_{j})}^{2}}{2 N S^{2} \sum_{i = 1}^{N} \sum_{j = 1}^{N} w_{i j}} = \frac{(N - 1) \sum_{i = 1}^{N} \sum_{j = 1}^{N} w_{i j} {(Z_{i} - Z_{j})}^{2}}{2 N^{2}} = \frac{(N - 1)}{2 N^{2}} \sum_{i = 1}^{N} C_{i}$ where $S^{2} = \frac{1}{N} \sum_{i = 1}^{N} Z_{i}^{2}$. The second equality holds when the variable is standardized so that S² = 1 and the weights are row-standardized so that $\sum_{i} \sum_{j} w_{i j} = N$.
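The local statistics in equations (11) and (15), and the link between local and global Geary’s C in equation (17), can be illustrated with a small sketch. It assumes a hypothetical four-region example; for the identity in (17) it additionally assumes a variable standardized to unit (population) variance and row-standardized weights. The function names are illustrative.

```python
import numpy as np

def local_morans_i(x, W):
    """Local Moran's I_i as in equation (11); unit i is excluded from the scale term."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = x - x.mean()
    W0 = np.array(W, dtype=float)
    np.fill_diagonal(W0, 0.0)                  # enforce w_ii = 0
    lag = W0 @ z                               # sum_j w_ij (x_j - mean)
    denom = (z ** 2).sum() - z ** 2            # sum over j != i of (x_j - mean)^2
    return (n - 1) * z * lag / denom

def local_gearys_c(z, W):
    """Local Geary's C_i = sum_j w_ij (z_i - z_j)^2, equation (15)."""
    z = np.asarray(z, dtype=float)
    diff2 = (z[:, None] - z[None, :]) ** 2
    return (np.asarray(W) * diff2).sum(axis=1)

x = np.array([1.0, 2.0, 3.0, 4.0])
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)      # binary adjacency (assumed)
W = A / A.sum(axis=1, keepdims=True)           # row-standardised weights

I_local = local_morans_i(x, A)

# Identity (17): standardise z to unit population variance, use row-standardised W.
n = len(x)
z = (x - x.mean()) / x.std()
C_local = local_gearys_c(z, W)
C_from_local = (n - 1) / (2 * n ** 2) * C_local.sum()

# Global Geary's C computed directly, for comparison.
dev = x - x.mean()
C_global = (n - 1) * (W * (x[:, None] - x[None, :]) ** 2).sum() / (2 * W.sum() * (dev @ dev))
print(I_local, C_from_local, C_global)
```

In this example the rescaled sum of the local values reproduces the global statistic, which is what makes the local index a decomposition of the global one.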

3.2

Spatial econometric modeling

3.2.1

Spatial weight setting

In defining the spatial weight, the first step is to quantify the location of the spatial unit, and the quantification of the location is generally based on the “distance”, and the most commonly used distance setting methods include spatial distance and economic distance.

1)

Spatial distance

Spatial distance weights are mainly of three kinds: adjacency (contiguity) weights, threshold distance weights, and negative exponential distance weights.

2)

Economic distance

Economic distance measures how far apart two locations are in one or several economic variables such as GDP or foreign trade volume: d_ij = |z_i–z_j|, where z_i and z_j are the GDP or foreign trade volume of the two regions. A distance decay function then converts the distance into a weight, e.g., w_ij = 1/d_ij. Economic distance has a zero-distance problem: when z_i = z_j, d_ij = 0 and 1/d_ij is undefined, so w_ij is set to 0 in that case.
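The weighting schemes listed above can be sketched as follows. This is an illustration under stated assumptions: the distance matrix `D`, the GDP vector `gdp`, the threshold `d_max`, and the decay rate `alpha` are hypothetical values, and zero economic distances are given weight 0 as one possible way of handling the zero-distance problem.

```python
import numpy as np

def row_normalize(W):
    """Scale each row to sum to one; rows without neighbours stay zero."""
    s = W.sum(axis=1, keepdims=True)
    return np.divide(W, s, out=np.zeros_like(W), where=s > 0)

def threshold_weights(D, d_max):
    """Binary weights: 1 if 0 < d_ij <= d_max, otherwise 0."""
    return ((D > 0) & (D <= d_max)).astype(float)

def neg_exp_weights(D, alpha=1.0):
    """Negative exponential decay w_ij = exp(-alpha * d_ij), with w_ii = 0."""
    W = np.exp(-alpha * D)
    np.fill_diagonal(W, 0.0)
    return W

def economic_weights(z):
    """Inverse economic distance w_ij = 1 / |z_i - z_j|; zero distances get weight 0."""
    z = np.asarray(z, dtype=float)
    d = np.abs(z[:, None] - z[None, :])
    return np.divide(1.0, d, out=np.zeros_like(d), where=d > 0)

# Hypothetical inputs: three regions on a line and their GDP values.
D = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 2.0],
              [3.0, 2.0, 0.0]])
gdp = np.array([10.0, 20.0, 20.0])

W_thr = threshold_weights(D, d_max=2.0)
W_exp = row_normalize(neg_exp_weights(D, alpha=0.5))
W_econ = row_normalize(economic_weights(gdp))
print(W_thr, W_exp, W_econ, sep="\n")
```

Regions 2 and 3 share the same GDP in this example, so their mutual economic weight is zero and each places all of its weight on region 1.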

3.2.2

Spatial econometric models

This paper focuses on spatial regression models that incorporate spatial effects (spatial correlation and spatial heterogeneity), of two types: the spatial lag model (SLM) and the spatial error model (SEM).

1)

Spatial lag model (SLM)

The general form of the spatial lag model (SLM) can be expressed as follows: (18) $y = ρ W y + X β + ε, \quad ε \sim N (0, σ^{2} I_{n})$

The model is a standard regression model augmented with the spatially lagged dependent variable Wy, where W is the n×n spatial weight matrix, usually determined by exogenous geographic factors, and ρ is the spatial autoregressive coefficient.

2)

Spatial Error Model (SEM)

In the spatial error model, the interrelationships between institutions or regions are represented through the error terms. Its general form is: (19) $\begin{matrix} y = X β + u \\ u = λ W u + ε, \quad ε \sim N (0, σ^{2} I_{n}) \end{matrix}$

The spatial error model combines a standard regression model with a spatially autoregressive error term. The spatial correlation acts through the disturbance u, reflecting the effect that error shocks to the dependent variable in neighboring regions have on observations in a given region. Here u is the vector of random error terms and W is set as in the spatial lag model. The parameter λ is the spatial error coefficient, which measures the spatial dependence in the sample observations; it captures omitted variables as well as disturbances and shocks that have a spatial form and are not easily observable. ε is the vector of normally distributed random errors.
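To make the difference between equations (18) and (19) concrete, the sketch below simulates data from both models by solving for their reduced forms. The ring-shaped neighbourhood structure, the parameter values, and the random seed are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Hypothetical ring of regions: each unit has two neighbours with weight 1/2.
W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])          # illustrative coefficients
eps = rng.normal(scale=0.5, size=n)
I_n = np.eye(n)
rho, lam = 0.4, 0.6                  # illustrative spatial parameters

# SLM reduced form: y = (I - rho W)^{-1} (X beta + eps)
y_slm = np.linalg.solve(I_n - rho * W, X @ beta + eps)

# SEM: u = (I - lam W)^{-1} eps, then y = X beta + u
u = np.linalg.solve(I_n - lam * W, eps)
y_sem = X @ beta + u
```

In the SLM the spatial multiplier acts on the regressors and the noise alike, whereas in the SEM it acts only on the disturbance, so the conditional mean of `y_sem` is still `X @ beta`.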

3.3

Selection and estimation of spatial econometric models

3.3.1

Model selection

To determine the spatial correlation of regional economic growth behavior, the Moran’s I test can be complemented by two Lagrange multiplier tests, LM-LAG and LM-ERR (for the spatial lag and spatial error models, respectively), and by their robust versions, R-LMLAG and R-LMERR.

LM-Lag and Robust LM-Lag are suitable for spatial lag models, and LM-Error and Robust LM-Error are suitable for spatial error models.

Both the LM-Lag and LM-Error statistics follow a chi-square distribution with one degree of freedom. They test different forms of the spatial econometric model, but in practice both tests need to be performed together.

The criteria for choosing between SLM and SEM are as follows. Given a significant Moran’s I test, if LM-Lag is more significant than LM-Error in the spatial dependence test and R-LMLAG is significant while R-LMERR is not, the spatial lag model (SLM) is chosen. Conversely, if LM-Error is more significant than LM-Lag and R-LMERR is significant while R-LMLAG is not, the spatial error model (SEM) is chosen.

Second, in diagnosing overall significance, in addition to comparing the goodness-of-fit R², the natural log-likelihood function value (Log L) can be used to make judgments.
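The selection criteria described above can be written as a small decision function. This is a sketch, not a full diagnostic routine: it takes the four p-values as given, reads "more significant" as "smaller p-value", and assumes a significance level of 0.05, which the text does not specify. The function name and the "undetermined" outcome are illustrative.

```python
def choose_spatial_model(p_lm_lag, p_lm_err, p_rlm_lag, p_rlm_err, alpha=0.05):
    """Pick SLM or SEM from LM and robust LM p-values, following the stated criteria."""
    # SLM: LM-Lag more significant than LM-Error, robust lag significant, robust error not.
    if p_lm_lag < p_lm_err and p_rlm_lag < alpha and p_rlm_err >= alpha:
        return "SLM"
    # SEM: the mirror-image condition.
    if p_lm_err < p_lm_lag and p_rlm_err < alpha and p_rlm_lag >= alpha:
        return "SEM"
    # Neither condition holds: compare fit statistics such as R^2 and Log L instead.
    return "undetermined"

# Hypothetical p-values for illustration only.
print(choose_spatial_model(0.001, 0.030, 0.010, 0.400))
print(choose_spatial_model(0.040, 0.002, 0.350, 0.008))
```

When the rule returns "undetermined", the goodness-of-fit and log-likelihood comparison discussed in the text becomes the deciding evidence.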

3.3.2

Estimation of spatial econometric models

Because of the endogeneity of the variables, ordinary least squares (OLS) estimation of spatial autoregressive models no longer yields unbiased, efficient, and consistent estimates. This paper therefore uses the maximum likelihood method to estimate the parameters of SLM and SEM in the empirical analysis.

The estimation steps of the spatial lag regression model include: 1)

Perform least squares estimation on model y = Xβ₀+ε₀ and calculate residual $e_{0} = y - X {\hat{β}}_{0}$ .

2)

Perform least squares estimation on model Wy = Xβ_L+ε_L and calculate residual $e_{L} = W y - X {\hat{β}}_{L}$ .

3)

From the values of e₀ and e_L, find the ρ that maximizes the concentrated log-likelihood function, i.e.: (20) $L_{c} = - (n / 2) \ln [(1 / n) {(e_{0} - ρ e_{L})}^{'} (e_{0} - ρ e_{L})] + \ln | I - ρ W |$

4)

Given $\hat{ρ}$ which maximizes L_c, compute the remaining parameter estimates: (21) $\hat{β} = {\hat{β}}_{0} - \hat{ρ} {\hat{β}}_{L}$ (22) ${\hat{σ}}_{ε}^{2} = (1 / n) {(e_{0} - \hat{ρ} e_{L})}^{'} (e_{0} - \hat{ρ} e_{L})$

Then the maximum likelihood function value is: (23) $\log L = - (n / 2) \ln (2 π) - (n / 2) \ln {\hat{σ}}_{ε}^{2} + \ln | I - \hat{ρ} W | - \frac{1}{2 {\hat{σ}}_{ε}^{2}} {(y - \hat{ρ} W y - X \hat{β})}^{'} (y - \hat{ρ} W y - X \hat{β})$
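The four estimation steps for the spatial lag model can be sketched in a few lines. The sketch below assumes a simple grid search over ρ in place of a numerical optimizer, a row-standardized weight matrix, and synthetic data on a hypothetical ring of regions; the function name `fit_slm` is illustrative and does not correspond to any package.

```python
import numpy as np

def fit_slm(y, X, W, grid=np.linspace(-0.95, 0.95, 381)):
    """Concentrated ML for y = rho*W*y + X*beta + eps via grid search over rho."""
    n = len(y)
    P = np.linalg.pinv(X)                      # (X'X)^{-1} X' for full-rank X
    b0 = P @ y                                 # step 1: regress y on X
    bL = P @ (W @ y)                           # step 2: regress Wy on X
    e0 = y - X @ b0
    eL = W @ y - X @ bL

    def conc_loglik(rho):                      # step 3: equation (20)
        r = e0 - rho * eL
        _, logdet = np.linalg.slogdet(np.eye(n) - rho * W)
        return -(n / 2) * np.log(r @ r / n) + logdet

    rho_hat = max(grid, key=conc_loglik)
    beta_hat = b0 - rho_hat * bL               # step 4: equation (21)
    resid = y - rho_hat * (W @ y) - X @ beta_hat
    sigma2_hat = resid @ resid / n             # equation (22)
    _, logdet = np.linalg.slogdet(np.eye(n) - rho_hat * W)
    loglik = (-(n / 2) * np.log(2 * np.pi) - (n / 2) * np.log(sigma2_hat)
              + logdet - resid @ resid / (2 * sigma2_hat))   # equation (23)
    return rho_hat, beta_hat, sigma2_hat, loglik

# Synthetic example: ring neighbours, true rho = 0.5, beta = (1, 2).
rng = np.random.default_rng(2)
n = 100
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(scale=0.1, size=n)
y = np.linalg.solve(np.eye(n) - 0.5 * W, X @ np.array([1.0, 2.0]) + eps)

rho_hat, beta_hat, sigma2_hat, loglik = fit_slm(y, X, W)
print(rho_hat, beta_hat, sigma2_hat, loglik)
```

The grid search keeps the example dependency-free; a bounded scalar optimizer would be the natural replacement when finer resolution in ρ is needed.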

The estimation procedure for the spatial error model is: 1)

Perform OLS estimation of model y = Xβ+u to obtain an unbiased estimate $\hat{β}$ of β.

2)

Calculate the residuals $e = y - X \hat{β}$ of the above OLS estimation.

3)

From the value of e, obtain the estimate $\hat{λ}$ of parameter λ by maximizing the concentrated log-likelihood function, i.e.: (24) $L_{c} = - (n / 2) \ln [(1 / n) (e - λ W e)' (e - λ W e)] + \ln | I - λ W |$

4)

Calculate the remaining parameter estimate from the value of $\hat{λ}$: (25) ${\hat{σ}}_{ε}^{2} = (1 / n) {(e - \hat{λ} W e)}^{'} (e - \hat{λ} W e)$

Then the maximum likelihood function value is: (26) $\log L = - (n / 2) \ln (2 π) - (n / 2) \ln {\hat{σ}}_{ε}^{2} + \ln | I - \hat{λ} W | - \frac{1}{2 {\hat{σ}}_{ε}^{2}} e' {(I - \hat{λ} W)}^{'} (I - \hat{λ} W) e$
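The spatial error procedure can be sketched in the same way. The code below follows the steps as listed, estimating λ from the OLS residuals by a grid search; it is a simplified illustration, since a full maximum likelihood routine would typically also re-estimate β given λ. The synthetic data, the ring-shaped weight matrix, and the function name `fit_sem` are assumptions.

```python
import numpy as np

def fit_sem(y, X, W, grid=np.linspace(-0.95, 0.95, 381)):
    """Estimate lambda from OLS residuals by maximising the concentrated likelihood."""
    n = len(y)
    beta_hat = np.linalg.pinv(X) @ y           # step 1: OLS estimate of beta
    e = y - X @ beta_hat                       # step 2: OLS residuals
    We = W @ e

    def conc_loglik(lam):                      # step 3: equation (24)
        r = e - lam * We
        _, logdet = np.linalg.slogdet(np.eye(n) - lam * W)
        return -(n / 2) * np.log(r @ r / n) + logdet

    lam_hat = max(grid, key=conc_loglik)
    r = e - lam_hat * We                       # (I - lam_hat W) e
    sigma2_hat = r @ r / n                     # step 4: equation (25)
    _, logdet = np.linalg.slogdet(np.eye(n) - lam_hat * W)
    loglik = (-(n / 2) * np.log(2 * np.pi) - (n / 2) * np.log(sigma2_hat)
              + logdet - r @ r / (2 * sigma2_hat))           # equation (26)
    return lam_hat, beta_hat, sigma2_hat, loglik

# Synthetic example: ring neighbours, true lambda = 0.6, beta = (1, 2).
rng = np.random.default_rng(3)
n = 300
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = np.linalg.solve(np.eye(n) - 0.6 * W, rng.normal(scale=0.5, size=n))
y = X @ np.array([1.0, 2.0]) + u

lam_hat, beta_hat, sigma2_hat, loglik = fit_sem(y, X, W)
print(lam_hat, beta_hat, sigma2_hat, loglik)
```

Because λ is identified only from the residual correlation structure, its estimate is noisier than the ρ estimate in a lag model of similar size, which is why a larger synthetic sample is used here.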

4

Analysis of factors influencing the level of tourism consumption

4.1

Spatial correlation analysis of tourism consumption levels

To facilitate comparative analysis, this paper constructs spatial weight matrices from geographic distance, from economic distance, and from functions of both. For the function-based weights, the gravity model is introduced into the study of spatial interaction. The gravity model follows the first law of geography, which states that near things are more related than distant things, and this is also in line with the general law of economic operation. Under the gravity model, the interaction between two regions or units is negatively related to the distance between them and positively related to their total economic volume. The gravity model has become an important tool for studying the spatial effects of factor flows between regions.

Based on the gravity model, a non-binary functional spatial weight matrix is established. After row normalization, global and local spatial autocorrelation tests are carried out. The Getis-Ord G index is not applicable, since it requires a non-standardized, symmetric spatial weight matrix whose elements are all 0 or 1. Only Moran’s I and Geary’s C were calculated for the explained and explanatory variables of tourism economic efficiency.
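A gravity-based weight matrix of the kind described above can be sketched as follows. The exact functional form is not given in the text, so the common specification w_ij = (G_i · G_j) / d_ij² is assumed here, with G the economic volume (for example GDP) and d_ij the geographic distance; the input values are hypothetical.

```python
import numpy as np

def gravity_weights(gdp, D):
    """Row-normalised gravity weights w_ij = gdp_i * gdp_j / d_ij^2 for i != j."""
    gdp = np.asarray(gdp, dtype=float)
    D = np.asarray(D, dtype=float)
    mass = np.outer(gdp, gdp)                          # product of economic volumes
    W = np.divide(mass, D ** 2, out=np.zeros_like(mass), where=D > 0)
    return W / W.sum(axis=1, keepdims=True)            # assumes every unit has a neighbour

# Hypothetical example: three regions with GDP 1, 2, 4 and pairwise distances.
gdp = np.array([1.0, 2.0, 4.0])
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])

W_grav = gravity_weights(gdp, D)
print(W_grav)
```

After row normalization the matrix is no longer symmetric and its entries are not restricted to 0 and 1, which is the property that rules out the Getis-Ord G index in this setting.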

4.1.1

Global correlation test

The global correlation test of the tourism consumption level is shown in Table 1. For 2010-2022, most values of Moran’s I are close to 0 and most values of Geary’s C are close to 1. The p-values are high, so neither global index can reject the null hypothesis of “no spatial autocorrelation”. Moran’s I is greater than 0 from 2012 to 2015, which points to a positive spatial correlation in those years, and Geary’s C, which is below 1 in the same years, gives the same result.

Table 1.

Global correlation test for consumption levels

YEAR	Moran’s I					Geary’s C
YEAR	I	E(I)	sd(I)	z	p-value*	C	E(C)	sd(C)	z	p-value*
2010	-0.025	-0.027	0.124	0.042	0.925	1.524	1.000	0.142	0.254	0.724
2011	-0.018	-0.027	0.124	0.214	0.798	0.839	1.000	0.143	-0.078	0.951
2012	0.105	-0.027	0.125	1.53	0.335	0.827	1.000	0.146	-1.241	0.352
2013	0.042	-0.027	0.124	0.721	0.652	0.931	1.000	0.142	-0.158	0.816
2014	0.045	-0.027	0.128	0.715	0.652	0.993	1.000	0.145	-0.129	0.948
2015	0.142	-0.027	0.127	1.568	0.241	0.824	1.000	0.145	-1.505	0.247
2016	-0.064	-0.027	0.124	-0.415	0.825	1.124	1.000	0.142	-1.557	0.124
2017	-0.089	-0.027	0.124	-0.524	0.662	1.181	1.000	0.143	0.682	0.542
2018	-0.036	-0.027	0.124	0.069	0.963	1.012	1.000	0.143	0.725	0.415
2019	-0.021	-0.027	0.125	0.051	0.942	1.068	1.000	0.142	0.241	0.856
2020	-0.055	-0.027	0.127	-0.182	0.785	1.043	1.000	0.143	0.359	0.521
2021	-0.075	-0.027	0.124	-0.358	0.852	1.029	1.000	0.142	0.522	0.856
2022	-0.061	-0.027	0.126	-0.182	0.896	1.014	1.000	0.145	0.384	0.722

4.1.2

Local spatial autocorrelation test

The local correlation test of tourism consumption levels is shown in Table 2. The results of the local spatial autocorrelation test indicate that spatial autocorrelation exists in most areas. The study shows that although there is no spatial autocorrelation globally, there is spatial autocorrelation locally, and the reason for this situation may lie in the fact that the local autocorrelations cancel each other out, resulting in the absence of autocorrelation globally.

Table 2.

Local correlation test for tourism consumption levels

Region	Moran’s I					Geary’s C
Region	Ii	E(Ii)	sd(Ii)	z	p-value*	ci	E(ci)	sd(ci)	z	p-value*
Beijing	1.253	-0.043	0.725	1.078	0.018	0.652	2.352	1.872	-0.789	0.179
Tianjin	0.893	-0.043	0.380	0.783	0.167	0.666	2.352	2.376	-0.133	0.234
Hebei	-0.425	-0.043	0.378	-0.606	0.516	4.463	2.352	1.887	0.848	0.476
Shanxi	0.025	-0.043	0.093	0.741	0.839	1.324	2.352	1.409	-0.789	0.092
Neimenggu	-0.072	-0.043	0.740	-0.674	0.239	4.884	2.352	1.307	0.275	0.399
Liaoning	0.024	-0.043	0.573	0.456	0.853	0.630	2.352	1.135	-0.509	0.174
Jilin	-0.142	-0.043	0.931	1.182	0.223	0.723	2.352	1.330	-0.058	0.383
Heilongjiang	1.352	-0.043	0.310	-0.431	0.283	1.835	2.352	1.037	-0.146	0.187
Shanghai	-0.012	-0.043	0.145	0.508	0.371	1.639	2.352	1.404	-0.393	0.120
Jiangsu	-0.157	-0.043	0.095	0.437	0.151	1.320	2.352	1.520	-0.710	0.831
Zhejiang	-0.214	-0.043	0.685	-0.609	0.396	1.493	2.352	1.047	-0.183	0.179
Anhui	0.135	-0.043	0.603	0.428	0.443	0.562	2.352	1.316	-0.422	0.334
Fujian	-0.024	-0.043	0.041	0.446	0.625	1.403	2.352	1.966	-0.798	0.409
Jiangxi	0.328	-0.043	0.546	0.772	0.036	0.527	2.352	1.306	-0.990	0.805
Shandong	0.258	-0.043	0.239	0.404	0.391	1.754	2.352	1.264	-0.245	0.759
Henan	0.321	-0.043	0.202	0.491	0.340	1.700	2.352	1.253	-0.682	0.097
Hubei	0.205	-0.043	0.079	0.334	0.756	0.620	2.352	1.351	-0.050	0.428
Hunan	0.172	-0.043	0.422	0.779	0.832	1.545	2.352	1.378	-0.052	0.526
Guangdong	-0.724	-0.043	0.353	-0.276	0.407	5.592	2.352	1.250	0.960	0.630
Guangxi	0.024	-0.043	0.180	0.756	0.144	0.603	2.352	1.364	-0.660	0.083
Hainan	-0.825	-0.043	0.537	-0.664	0.068	4.566	2.352	1.177	0.135	0.608
Chongqing	0.152	-0.043	0.244	0.728	0.540	1.793	2.352	1.265	-0.562	0.173
Sichuan	0.283	-0.043	0.395	0.710	0.617	0.656	2.352	1.022	-0.072	0.235
Guizhou	-0.983	-0.043	0.517	-0.340	0.230	5.449	2.352	1.261	-0.088	0.709
Yunnan	0.058	-0.043	0.406	0.635	0.802	1.519	2.352	1.113	0.422	0.429
Xizang	0.089	-0.043	0.244	0.759	0.262	1.538	2.352	1.210	-0.215	0.312
Shaanxi	0.087	-0.043	0.369	0.696	0.677	1.707	2.352	1.434	-0.602	0.533
Gansu	-0.135	-0.043	0.364	-0.363	0.590	6.610	2.352	1.177	0.494	0.309
Qinghai	0.198	-0.043	0.610	0.195	0.505	3.765	2.352	1.923	-0.028	0.013
Ningxia	-0.675	-0.043	0.205	-0.525	0.359	4.656	2.352	1.375	0.952	0.128
Xinjiang	-0.178	-0.043	0.380	1.078	0.862	2.597	2.352	1.526	0.416	0.021

Moran’s I shows that Beijing, Shanghai, Hainan, Guizhou, Gansu, Ningxia, and Jilin reject the null hypothesis of no correlation at the 10% significance level, so these regions are locally correlated. Geary’s C shows that Hebei, Guangdong, Hainan, Guizhou, Gansu, and Ningxia reject the null hypothesis at the 10% significance level, indicating local spatial correlation. Combining the two indices, the tested variable shows significant local spatial correlation, either positive or negative, in Beijing, Shanghai, Hainan, Guizhou, Gansu, Ningxia, Hebei, and Guangdong. Therefore, when choosing the spatial estimation model, it is necessary to take full account of the spatial correlation of tourism economic efficiency, introduce its spatial lag term into the model, and pay attention to the resulting correlation problems.

4.2

Analysis of the spatial effect of the index of regional tourism development

4.2.1

Validation of spatial effects

Depending on how spatial and time effects are controlled, the spatial panel model with fixed effects falls into four types: no fixed effects, spatial fixed effects, time fixed effects, and spatial and time fixed effects. First, the estimation results of the four types of SDM are compared and the optimal model is selected. Second, the individual fixed effects model of the non-spatial panel, the spatial lag model (SLM), and the spatial error model (SEM) are estimated as reference models. Estimation is carried out in Matlab R2020b with its spatial econometrics software package.
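The four specifications differ only in which means are removed from the panel before estimation. The Python sketch below illustrates the usual within (demeaning) transformation for each case. It is an illustration under assumptions, not the Matlab routine used in the paper: the function name `demean_panel`, the option labels, and the layout of `y` (rows are periods, columns are regions) are all hypothetical.

```python
import numpy as np


def demean_panel(y, effects="none"):
    """Within transformation of a T x N panel (rows: periods, columns: regions)."""
    y = np.asarray(y, dtype=float)
    region_mean = y.mean(axis=0, keepdims=True)  # one mean per region
    period_mean = y.mean(axis=1, keepdims=True)  # one mean per period
    if effects == "none":
        return y.copy()
    if effects == "spatial":   # removes region-specific intercepts
        return y - region_mean
    if effects == "time":      # removes period-specific intercepts
        return y - period_mean
    if effects == "both":      # removes both; add back the grand mean
        return y - region_mean - period_mean + y.mean()
    raise ValueError(f"unknown effects option: {effects}")


# Hypothetical panel: 2 periods, 2 regions.
y = [[1.0, 2.0],
     [3.0, 6.0]]
for option in ("none", "spatial", "time", "both"):
    print(option, demean_panel(y, option).tolist())
```

The same transformation would be applied to every explanatory variable and to the spatially lagged terms before the model is estimated.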

The regression results for the factors influencing the regional tourism development index are shown in Table 3. Combining the correlation tests with the model estimation results, the following preliminary conclusions can be drawn.

Table 3.

Regression results of the regional tourism development index

Variable	OLS (individual fixed)	SLM	SEM	SDM (time fixed)
hcl	-0.4254***	0.2986***	-0.1527**	0.2975***
hcl	(-4.2151)	(8.5407)	(-2.0124)	(7.8697)
asa	0.3048***	0.1243***	0.2235***	0.1543***
asa	(8.0053)	(3.6381)	(5.7680)	(4.0517)
pcgdp	0.2176***	0.2688***	0.2104***	0.2493**
pcgdp	(4.5206)	(7.9618)	(3.6524)	(6.3124)
nta	-0.5248	0.1275***	-0.827**	0.1993***
nta	(-1.2701)	(3.3562)	(9.8571)	(6.4215)
nsh	0.4867***	0.2534***	-0.875**	0.1942***
nsh	(10.4813)	(4.8965)	(-2.5354)	(4.2513)
W*hcl	-	-	-	-0.1896*
W*hcl	-	-	-	(-0.1935)
W*asa	-	-	-	0.0027
W*asa	-	-	-	(0.0528)
W*pcgdp	-	-	-	0.0924
W*pcgdp	-	-	-	(1.657)
W*nta	-	-	-	-0.5246***
W*nta	-	-	-	(-8.2012)
W*nsh	-	-	-	0.1530
W*nsh	-	-	-	(1.7852)
ρ/λ	-	0.1893***	0.5243***	0.3562***
ρ/λ	-	(3.6538)	(11.2042)	(7.4251)
Adj·R²	0.9147	0.7896	0.4869	0.7892
Log L	653.2568	541.2305	598.6258	463.5284

The SDM (time-fixed) model is the optimal model for the regional tourism development index. According to the spatial effect test results in Table 3, although the SEM (spatial-fixed) model has the highest log-likelihood value (Log L), its adjusted goodness-of-fit coefficient (Adj. R²) is the lowest. The SDM (time-fixed) model has the second-highest log-likelihood value, after the SEM (spatial-fixed) model, and the highest adjusted goodness-of-fit coefficient. Therefore, compared with the other two spatial models, the SDM (time-fixed) model is the optimal model for the regional tourism development index, which is consistent with the test results above.

Comparing the OLS and SDM models, the regression coefficients of the number of A-grade scenic spots (asa) and the total number of star-rated hotels (nsh) are 0.3048 and 0.4867 in the OLS model but 0.1543 and 0.1942 in the SDM model, which is markedly lower. The regression coefficient of per capita gross domestic product (pcgdp) is 0.2176 in the OLS model and 0.2493 in the SDM model, a slight increase. The regression coefficient of the consumption level of residents (hcl) is -0.4254 in the OLS model and 0.2975 in the SDM model. The regression coefficient of the number of travel agencies (nta) is -0.5248 in the OLS model, where it fails the significance test, and 0.1993 in the SDM model, where it is significant at the 0.01 level. These comparisons show that ignoring the spatial effects of the explanatory and explained variables overestimates the impact of the number of A-grade scenic spots and the total number of star-rated hotels on the regional tourism development index, underestimates the impact of per capita gross domestic product, and produces the paradoxical result that residents' consumption level and the number of travel agencies negatively affect the index.

The values of ρ in the SLM, λ in the SEM, and ρ in the SDM are 0.1893, 0.5243, and 0.3562, respectively, and all pass the 0.01 significance level test. All three spatial econometric models therefore confirm the existence of spatial spillover effects in the regional tourism development index.

4.2.2

Analysis of influencing factors

Because the regression coefficients of the explanatory variables in the spatial Durbin model cannot directly reflect their specific influence on the regional tourism development index, they need to be decomposed. The direct, indirect, and total effects of the factors influencing the regional tourism development index are shown in Table 4.

Table 4.

Direct effect, indirect effect and total effect of tourism development index

Variable	hcl	asa	pcgdp	nta	nsh
Direct effect	0.3512***	0.1375***	0.2635***	0.2286**	0.2425***
Direct effect	(7.8695)	(4.5264)	(6.9865)	(5.6838)	(4.4151)
Indirect effect	-0.0921	0.0785	0.2513***	-0.6879***	0.2441*
Indirect effect	(-0.8879)	(0.7196)	(3.5628)	(-6.0591)	(2.3561)
Total effect	0.2041	0.2215	0.5237***	-0.3604***	0.5124***
Total effect	(1.7245)	(1.8206)	(7.2653)	(-4.0111)	(2.9815)
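A common way to obtain such a decomposition for a spatial Durbin model is to average the entries of the matrix (I - ρW)⁻¹(βI + θW) for each variable: the mean diagonal entry is the direct effect, the mean row sum is the total effect, and their difference is the indirect effect. The Python sketch below illustrates this using ρ = 0.3562 and the pcgdp coefficients (0.2493 and 0.0924) from Table 3. The four-region ring matrix `w` and the function name `sdm_effects` are hypothetical, so the output is not expected to reproduce Table 4, whose weight matrix and estimation details are not given here.

```python
import numpy as np


def sdm_effects(rho, beta, theta, w):
    """Direct, indirect and total effects of one SDM regressor.

    rho: spatial autoregressive coefficient, beta: own coefficient,
    theta: coefficient of the spatially lagged regressor, w: weight matrix.
    """
    w = np.asarray(w, dtype=float)
    n = w.shape[0]
    identity = np.eye(n)
    # Partial-derivative matrix S = (I - rho*W)^(-1) (beta*I + theta*W)
    s = np.linalg.solve(identity - rho * w, beta * identity + theta * w)
    direct = np.trace(s) / n  # average own-region effect
    total = s.sum() / n       # average row sum
    return direct, total - direct, total


# Hypothetical row-standardized weights: four regions arranged in a ring.
w = np.array([[0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5, 0.0]])

direct, indirect, total = sdm_effects(rho=0.3562, beta=0.2493, theta=0.0924, w=w)
print(direct, indirect, total)
```

With a row-standardized weight matrix the total effect reduces to (β + θ)/(1 - ρ), which offers a quick sanity check on any implementation.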

Residents' consumption level promotes the regional tourism development index: its direct effect is 0.3512, significant at the 0.01 level, while its indirect effect is -0.0921, which does not pass the significance test. Each 1% increase in residents' consumption level directly raises the region's own tourism development index by 0.35%. The effect of residents' consumption level on the tourism development index of neighboring regions is not significant.

In summary, the results of the spatial effect analysis of the regional tourism development index show that:

1)

China's provincial regional tourism development index shows positive spatial autocorrelation from 2010 to 2022, and this spatial correlation strengthens over time.

2)

The consumption level of residents, the number of A-grade scenic spots, the gross domestic product per capita, the number of travel agencies, and the total number of star-rated hotels all have a positive direct effect on the regional tourism development index, with the consumption level of residents having the largest.

3)

GDP per capita and the total number of star-rated hotels have positive spillover effects on the regional tourism development index. A 1% increase in the GDP per capita or the total number of star-rated hotels of neighboring provinces indirectly raises the regional tourism development index by 0.25% and 0.24%, respectively.

4)

The number of travel agencies has a negative spillover effect on the regional tourism development index. A 1% increase in the number of travel agencies in neighboring provinces reduces the regional tourism development index by 0.68%.

5

Case study on the geographical distribution characteristics of tourism consumption behavior

5.1

Behavioral Data Sources and Processing

Taking a tourist attraction in Jilin Province as the main research object, questionnaire surveys, online check-in data, GPS data, communication data, travelogue texts, and other data were compared and analyzed. The questionnaire survey served as the main data source, online travelogue texts and geographic information as auxiliary supplements, and government statistical data as support, so that data on tourists' spatial and temporal behavior were collected and processed in several ways.

A total of 600 questionnaires, online and offline (paper), were distributed, and 579 valid questionnaires were collected. The questionnaires covered the basic profile of the surveyed tourists, their travel mode, the number of days they stayed, the scenic spots they visited, and their consumption behavior and satisfaction in the scenic spots, service spaces, transportation spaces, and so on.

5.2

Analysis of Tourist Consumption Behavior

The number of consumption instances at each scenic spot was obtained by summarizing interviews with tourists and tourists' comments on the different scenic spots in tourism apps. The average number of consumption instances at each scenic spot is shown in Table 5. Tourists consume no fewer than 2 times at any attraction. The statistics cover 19 attractions, with the average consumption counts totaling 103, or about 5 per attraction.

Table 5.

Average number of consumption instances at each scenic spot

Scenic spot name	Average number of consumption instances (times)
Changbaishan	5
Changbaishanxiagufushilinjingqu	2
Longshunxueshanfeihujingqu	8
Daxitaihejingqu	3
Chuangxingchangbaishanyuanshisamanbuluofengjingqu	2
Daguandongwenhuayuan	7
Mojiefengjingqu	3
Shangbaishanlishiwenhuayuan	5
Shangbaishanhepinghuaxuechang	5
Hongqichaoxianminsucun	12
Changbaishanbaoshixiaozhenlvyoudujiaqu	15
Xidongyouleyuan	4
Songhuacun	8
Baihuagujingqu	6
Changbaishandiyicunfengjingqu	4
Changbaishanwenhuaboliancheng	5
Haigouhuangjincheng	3
Huiyiyizhi	4
Genjudizhanshijinianguan	2
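The summary figures quoted for Table 5 can be checked directly from its rows. The short Python sketch below copies the values from the table and recomputes the number of attractions, the total, the mean, and the minimum; the variable names are illustrative.

```python
# Average number of consumption instances per scenic spot, copied from Table 5.
consumption = {
    "Changbaishan": 5,
    "Changbaishanxiagufushilinjingqu": 2,
    "Longshunxueshanfeihujingqu": 8,
    "Daxitaihejingqu": 3,
    "Chuangxingchangbaishanyuanshisamanbuluofengjingqu": 2,
    "Daguandongwenhuayuan": 7,
    "Mojiefengjingqu": 3,
    "Shangbaishanlishiwenhuayuan": 5,
    "Shangbaishanhepinghuaxuechang": 5,
    "Hongqichaoxianminsucun": 12,
    "Changbaishanbaoshixiaozhenlvyoudujiaqu": 15,
    "Xidongyouleyuan": 4,
    "Songhuacun": 8,
    "Baihuagujingqu": 6,
    "Changbaishandiyicunfengjingqu": 4,
    "Changbaishanwenhuaboliancheng": 5,
    "Haigouhuangjincheng": 3,
    "Huiyiyizhi": 4,
    "Genjudizhanshijinianguan": 2,
}

n_spots = len(consumption)           # number of attractions
total = sum(consumption.values())    # sum of the per-attraction averages
mean = total / n_spots               # mean per attraction
lowest = min(consumption.values())   # smallest per-attraction value
busiest = max(consumption, key=consumption.get)

print(n_spots, total, round(mean, 2), lowest, busiest)
```

The mean works out to roughly 5.4 per attraction, which matches the figure of about 5 given in the text.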

By grading the travel-time isochrones around this scenic spot, tourists' travel time is divided into three bands: half an hour for a short distance, one to two hours for a middle distance, and three hours for a long distance. Combined with the statistics on tourists' travel time, tourist consumption behavior shows a certain correlation with travel time.

Taking into account the city's geographic location and its current economic and social development, the Moran's I for the city gives p = 0.223, which is interpreted as local correlation. Combined with the results of the spatial effect analysis of the regional tourism development index, the development of this tourist attraction in Jilin Province is correlated to some degree with the consumption level of residents, the number of A-grade scenic spots, per capita gross domestic product, the number of travel agencies, and the total number of star-rated hotels.

6

Conclusion

This paper distinguishes the subject and object of tourism consumption behavior, selects variables for the spatial differences in tourism consumption, establishes a spatial econometric model, and analyzes the factors influencing the spatial differences in tourism consumption behavior. The spatial weight matrix is constructed from geographic distance and economic distance functions, respectively, and Moran's I and Geary's C tests are carried out based on the gravity model.

1)

The results of the spatial autocorrelation test show that spatial autocorrelation exists in most regions, and Moran’s I and Geary’s C indices point out that there are significant local spatial correlations between the explanatory variables in Beijing, Shanghai, Hainan, Guizhou, Gansu, Ningxia, Hebei, and Guangdong, which show either spatial positive correlation or spatial negative correlation.

2)

The SLM model, the SEM model, and the SDM model all confirm the existence of a spatial spillover effect of the whole region’s tourism development index. Spatial spillover is a significant factor that influences China’s provincial region-wide tourism development index. For example, a 1% increase in the whole-area tourism development index of neighboring provinces can indirectly promote the whole-area tourism development index of this region by 0.35% through spatial interaction.

3)

When we combine the statistics of tourists’ consumption behavior in scenic spots with the analysis of correlation factors, we can determine that the majority of tourists engage in scenic spot consumption at least twice, with an average frequency of approximately five times.


Combining big data technology to study the geographical distribution characteristics of tourism consumption behavior

Zhen Xu

Published online: 17 March 2025

Received: 11 Oct. 2024

Accepted: 26 Jan. 2025

DOI: https://doi.org/10.2478/amns-2025-0189

Keywords: SLM model, SEM model, SDM model, Spatial correlation, Tourism consumption

© 2025 Zhen Xu, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
