Construction of Time Series Prediction Models for Event Influence and Revenue Growth in Sports Industry

The influence of events in the sports industry spans economic, social, cultural, urban-image and environmental dimensions, and matters greatly to the host country or city as well as to participants and spectators [1-3]. Sports events usually attract large audiences and media attention, increase tourism revenue and media exposure, and stimulate hotels, restaurants, retail and other services, bringing considerable economic benefits to the local economy [4-5]. The impact and revenue of such events can be predicted from historical data and the current environment, and time series forecasting models are well suited to this task.

A time series is the sequence formed by arranging the values of a variable in chronological order; its time unit can be minutes, hours, days, weeks, decades, months, quarters, years, etc. [6]. A time series forecasting model is a statistical model used to analyze and forecast time series data. In essence it builds a mathematical model from the series itself; it is used mainly for short-term forecasting and belongs to the family of trend forecasting methods [7-9]. The data used to build such models are called time series data, a common data type in many real-world problems, such as sales data, stock prices, temperature changes and social reactions [10]. Widely used time series forecasting models include the moving average model, the autoregressive model, the ARIMA model and the LSTM model [11-14]. Because many global sports events now enjoy broad public participation, it is necessary to construct a model dedicated to events in the sports industry.

Several factors need to be considered to construct a suitable time series forecasting model, including data characteristics, model complexity, accuracy and so on. Constructing a suitable model according to the actual situation can lead to better prediction results.

At present, most research on prediction in the sports industry concerns match results, while predictions of the influence and revenue of the sports event itself are less common. For example, literature [15] used a time series model to predict the results of soccer matches in the national league, and literature [16] used an LSTM model to predict a player’s training state and, in most cases, the peak of the player’s execution ability, so that coaches and players can formulate individual and team training plans. Literature [17] used Autoregressive Integrated Moving Average (ARIMA) models and Recurrent Neural Networks (RNNs) to predict and analyze players’ behavior across seasons and teams, which assisted management. The popularity and revenue of a sporting event are broadly predictable: literature [18] calibrated existing expert and experimental information with time series model predictions to reduce the uncertainty caused by expert judgment bias, and obtained profitability against market odds. Literature [19] used real-time and historical Google Trends data on the relative popularity of search terms to evaluate the influence of sports leagues, and identified future popularity trends with three models: trend plus seasonal regression, the Holt-Winters multiplicative method (HWMM), and Seasonal Autoregressive Integrated Moving Average (SARIMA). Based on such trends, companies or organizers can place relevant advertisements to attract potential consumers during the corresponding period, so the trend predictions are useful to advertisers, companies and merchants at the host venue. In addition, literature [20] notes that robust predictive models for time series analysis help event managers make informed decisions and predictions about the success and profitability of the sports industry, enhancing the technical competitiveness of organizers.

Before studying the relationship between the influence of sports events and revenue growth in the sports industry, this paper first constructs an ARIMA model for measuring and forecasting economic revenue: a unit root test is applied to the time series, and non-stationary series are transformed into stationary series by differencing, so that the growth of the sports industry’s economic revenue can be measured and predicted. Building on the revenue measurement results, the relationship between revenue growth and the influence of sports events is then explored with an improved time series prediction model, the SVAR model. Within the VAR framework, OLS is used to obtain unbiased estimates of the parameters of the induced equations, the induced equations are converted into structural equations, and variable analysis methods based on this conversion, namely impulse response and variance decomposition, are introduced. Taking data on China’s sports industry from 2013 to 2023 as the sample, the relationship between event influence and revenue growth is analyzed with the time series prediction model constructed in this paper.

2

Model for measuring and forecasting the economic returns of the sports industry

Traditional econometric methods describe the relationships between variables with models based on economic theory. However, economic theory is usually insufficient to provide a rigorous description of the dynamic linkages between variables, and endogenous variables can appear on both the left and right sides of an equation, complicating estimation and inference. To address these issues, non-structural approaches to modeling the relationships between variables have emerged, such as vector autoregressive (VAR) and vector error correction (VEC) models.

Classical regression modeling focuses on establishing a functional (causal) relationship between different variables in order to examine the connections between things. Time series analysis instead asks how the data of a variable itself can be used to build a model, so as to examine the law of its own development and to predict its future accordingly. In practice it is often necessary to study how something develops over time, which requires studying its historical record. Many economic quantities, such as interest rate fluctuations, changes in yields, and the stock market conditions reflected by various indices, can be expressed as time series data, and studying these data reveals the pattern of change of the underlying variables. For some variables, too many factors affect their development, or data on the main influencing variables are difficult to collect, so it is difficult to build a regression model of their changes. Here time series models show their advantage: they do not require a causal model and can be built from the data of the variable itself. Within time series analysis, the ARIMA model is the most typical and commonly used model.

ARIMA contains three components: AR, I and MA. AR denotes autoregression, i.e., the autoregressive model [21]. I denotes integration, i.e., the order of integration. A time series must be stationary before an econometric model can be built on it, and the ARIMA model is no exception, so a unit root test is first applied to the series. If the series is non-stationary, it is transformed into a stationary series by differencing, and the number of differences required is called the order of integration. MA denotes the moving average model. The ARIMA model is thus a combination of the AR model and the MA model. In an $\mathrm{ARIMA}(p, d, q)$ model, $\mathrm{AR}(p)$ is the autoregressive process and $p$ is the number of autoregressive terms; $\mathrm{MA}(q)$ is the moving average process and $q$ is the number of moving average terms; $d$ is the number of differences needed to make the series stationary.

1) Autoregressive process $\mathrm{AR}(p)$ model

The autoregressive model describes the relationship between the current value and historical values, using the variable’s own history to predict itself. It requires the series to be stationary. The $p$-th order autoregressive process is expressed as equation (1): $y_{t} = μ + \sum_{i = 1}^{p} γ_{i} y_{t - i} + ε_{t}$

where $y_{t}$ is the current value, $μ$ is the constant term, $p$ is the order, $γ_{i}$ is the autoregressive coefficient, and $ε_{t}$ is the error term, i.e., white noise.

Written out term by term, with the series denoted $X_{t}$ and the random perturbation term denoted $μ_{t}$, the model is given by equation (2): $X_{t} = \partial_{1} X_{t - 1} + \partial_{2} X_{t - 2} + \dots + \partial_{p} X_{t - p} + μ_{t}$

If the random perturbation term is white noise ($μ_{t} = ε_{t}$) and the constant term is zero, the AR model is said to be a pure $\mathrm{AR}(p)$ process, denoted as equation (3): $X_{t} = \partial_{1} X_{t - 1} + \partial_{2} X_{t - 2} + \dots + \partial_{p} X_{t - p} + ε_{t}$
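As a minimal sketch of equation (1), the following Python function computes a one-step-ahead AR(p) forecast from the most recent p observations. The function name `ar_forecast`, the coefficient values and the sample series are illustrative assumptions, not estimates from the data used in this paper.

```python
def ar_forecast(history, coeffs, const=0.0):
    """One-step-ahead AR(p) forecast: const + sum_i coeffs[i-1] * y_{t-i}.

    history: observed values in chronological order (oldest first).
    coeffs:  [gamma_1, ..., gamma_p]; gamma_1 multiplies the latest value.
    The white-noise term has expectation zero, so it is omitted.
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p observations")
    recent = history[::-1][:p]          # y_{t-1}, y_{t-2}, ..., y_{t-p}
    return const + sum(g * y for g, y in zip(coeffs, recent))


# Hypothetical AR(2) with made-up coefficients and data.
series = [2.0, 3.0, 5.0]
print(ar_forecast(series, coeffs=[0.5, 0.25], const=1.0))
```

In practice the coefficients would be estimated from the stationary series rather than chosen by hand.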

As the formula shows, the current value is predicted from historical values; the order $p$ of the autoregressive model indicates how many periods of historical values are used to predict the current value.

2) Moving average process $\mathrm{MA}(q)$

The moving average model is concerned with the accumulation of the error terms in the autoregressive model. In the AR model, if $μ_{t}$ is not white noise, it is usually regarded as a moving average process $\mathrm{MA}(q)$ of order $q$, as shown in equation (4): $μ_{t} = β_{1} ε_{t - 1} + β_{2} ε_{t - 2} + \dots + β_{q} ε_{t - q} + ε_{t}$

where $ε_{t}$ denotes the white noise sequence. In particular, equation (5) is obtained when $μ_{t} = X_{t}$, i.e., the current value of the time series is unrelated to its historical values and depends only on a linear combination of current and past white noise: $X_{t} = β_{1} ε_{t - 1} + β_{2} ε_{t - 2} + \dots + β_{q} ε_{t - q} + ε_{t}$

3) ARIMA Model

In ARIMA, AR and MA are the autoregressive and moving average models, respectively, and I is the differencing step, which makes the data stationary.

Combining the autoregressive model (AR), the moving average model (MA) and differencing (I) gives the autoregressive integrated moving average model $\mathrm{ARIMA}(p, d, q)$, where $d$ is the number of times the data need to be differenced; ARIMA is thus an ARMA model applied to the differenced series.
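The differencing step (the I in ARIMA) can be sketched in a few lines of Python. The helper names `difference` and `undifference` and the sample series are assumptions made for illustration; the sample has a quadratic trend, so two rounds of differencing leave a constant series.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y_t -> y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference(diffed, first_value):
    """Invert one round of differencing given the first original value."""
    out = [first_value]
    for step in diffed:
        out.append(out[-1] + step)
    return out


# Made-up series with a quadratic trend.
y = [1, 4, 9, 16, 25]
print(difference(y, d=1))
print(difference(y, d=2))
```

Forecasts made on the differenced scale have to be mapped back with the inverse step, which is why `undifference` needs the starting value of the original series.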

3

Forecast analysis of the economic returns of the sports industry

In this chapter, the ARIMA-based econometric model and forecasting methodology constructed in the previous section are applied to quarterly year-on-year economic index data for China from Q1 2006 to Q2 2023. The data cover the sporting goods manufacturing industry (production of sporting goods and sports facilities) and the sports services industry (provision of sports services, sports media and information services, and sports tourism and recreation), and are used to measure the economic returns of China’s sports industry. The volatility time series of the economic return index is further calculated in order to analyze systematically and comprehensively the dynamics of China’s sports economic returns. The time paths of the economic return indices of the sporting goods manufacturing industry and the sports service industry are shown in Figure 1, panels (a) and (b), respectively.

Panel (a) shows that the economic return index of China’s sporting goods manufacturing industry fluctuates strongly over the sample period: it reached a peak in 2009, fell to its lowest point in 2013, and then rose rapidly. Since 2015, however, the index has declined slowly, reaching a trough in 2020. From 2021 to the present it has again shown a slow upward trend.

Panel (b) shows that the economic return index of the sports services industry (provision of sports services, sports media and information services, sports tourism and entertainment) also fluctuates strongly over the sample period: it reached a peak in 2012, fell to its lowest point in 2015, and then rose rapidly to its maximum in 2016. Since 2016 the index has declined slowly with small fluctuations, showing a small peak in 2021, and from 2021 to the present it too has shown a slow upward trend.

The descriptive statistics of the sports industry economic return index series are shown in Table 1. The skewness and kurtosis estimates show that the series have a pronounced “sharp peak and thick tail” distribution (every kurtosis value exceeds 3, the value for a normal distribution), and the J-B normality test statistics and their P-values show that the series do not follow a normal distribution, which is consistent with this distributional characteristic.

Table 1.

Statistical estimates

Industry	Sequence	Sample size	Skewness	Kurtosis	J-B statistic	P-value
Sporting goods manufacturing industry	Sporting goods	80	0.3291	4.7819	10.5215	0.0053
Sporting goods manufacturing industry	Sports facilities	80	0.6313	4.5542	11.691	0.0028
Sports service industry	Sports service	80	0.9757	4.7898	20.455	0.0000
Sports service industry	Sports media and information services	80	-1.6485	9.5655	157.4709	0.0000
Sports service industry	Sports and entertainment	80	0.7149	3.8629	8.1432	0.0172
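The statistics reported in Table 1 can be computed along the following lines. This is a sketch under stated assumptions: it uses population (biased) central moments and the asymptotic chi-square distribution with 2 degrees of freedom for the J-B statistic, conventions that may differ from the software used for Table 1, and the function name `jarque_bera` and the sample data are made up.

```python
import math


def jarque_bera(data):
    """Return (skewness, kurtosis, JB statistic, p-value) for a sample.

    Uses population (biased) central moments. Under normality JB is
    asymptotically chi-square with 2 degrees of freedom, whose survival
    function is exp(-x / 2).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2              # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)
    return skew, kurt, jb, p_value


# Made-up sample, for illustration only.
print(jarque_bera([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A small P-value leads to rejecting normality, which is the reading applied to every series in Table 1.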

4

Time-series forecasting model of the impact and revenue growth of the event

The vector autoregressive (VAR) model can describe the dynamic relationship among multiple variables, which makes it suitable for analyzing, within a single framework, the interaction between event influence and revenue growth in the sports industry [22]. However, a general VAR model cannot capture the contemporaneous relationships among variables and therefore cannot support inferences about macroeconomic structure, since such inferences require distinguishing correlation from causation. The SVAR model improves on this: it extracts the contemporaneous correlations hidden in the error terms and gives them economic meaning by imposing constraints, so that the dynamic impact of stochastic perturbations on the variable system can be analyzed further [23].

4.1

VAR modeling framework

A VAR model is a set of equations describing the interrelationships among multiple time series. It is classified as structural or induced (also called reduced form) according to whether the right-hand side of each equation contains other variables contemporaneous with the dependent variable on the left. The two forms can be converted into each other. Taking the bivariate first-order lagged VAR model as an example, its structural form is: $\begin{cases} y_{1 t} = a_{10} + a_{12} y_{2 t} + β_{11} y_{1, t - 1} + β_{12} y_{2, t - 1} + ε_{1 t} \\ y_{2 t} = a_{20} + a_{21} y_{1 t} + β_{21} y_{1, t - 1} + β_{22} y_{2, t - 1} + ε_{2 t} \end{cases}$ (6)

where $y_{1t}$ and $y_{2t}$ are stationary stochastic processes, and $ε_{1t}$ and $ε_{2t}$ are mutually uncorrelated random disturbance terms with variances $σ_{1}^{2}$ and $σ_{2}^{2}$, respectively. The right-hand side of each equation contains a contemporaneous effect on the left-hand side, i.e., $y_{2t}$ affects $y_{1t}$ and $y_{1t}$ affects $y_{2t}$, which accords with reality. If it is determined that there is no contemporaneous influence between the series, it suffices to set the corresponding coefficient to zero. This raises question (1): how is the above system estimated? OLS requires the regressors to be uncorrelated with the error term in order to yield unbiased estimates. Suppose $ε_{1t}$ is disturbed: the first equation of Eq. (6) shows that this affects $y_{1t}$, and the second equation shows that $y_{1t}$ in turn affects $y_{2t}$. Hence $ε_{1t}$ is correlated with $y_{2t}$, which is a regressor in the first equation, and the system cannot be estimated by OLS.

If neither equation contained contemporaneous effects, this problem would not arise, so an attempt is made to eliminate the contemporaneous variables by a transformation. First, equation (6) is rewritten in matrix form: $\begin{bmatrix} 1 & - a_{12} \\ - a_{21} & 1 \end{bmatrix} \begin{bmatrix} y_{1 t} \\ y_{2 t} \end{bmatrix} = \begin{bmatrix} a_{10} \\ a_{20} \end{bmatrix} + \begin{bmatrix} β_{11} & β_{12} \\ β_{21} & β_{22} \end{bmatrix} \begin{bmatrix} y_{1, t - 1} \\ y_{2, t - 1} \end{bmatrix} + \begin{bmatrix} ε_{1 t} \\ ε_{2 t} \end{bmatrix}$ (7)

The first matrix on the left-hand side of Eq. (7) is the identification matrix $N$ introduced in the literature review. Let: $\begin{bmatrix} a_{1} \\ a_{2} \end{bmatrix} = N^{- 1} \begin{bmatrix} a_{10} \\ a_{20} \end{bmatrix}, \quad \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = N^{- 1} \begin{bmatrix} β_{11} & β_{12} \\ β_{21} & β_{22} \end{bmatrix}, \quad \begin{bmatrix} e_{1 t} \\ e_{2 t} \end{bmatrix} = N^{- 1} \begin{bmatrix} ε_{1 t} \\ ε_{2 t} \end{bmatrix}$ (8)

Multiplying both sides of equation (7) by $N^{- 1}$ and substituting the definitions in (8) gives: $\begin{cases} y_{1 t} = a_{1} + b_{11} y_{1, t - 1} + b_{12} y_{2, t - 1} + e_{1 t} \\ y_{2 t} = a_{2} + b_{21} y_{1, t - 1} + b_{22} y_{2, t - 1} + e_{2 t} \end{cases}$ (9)

The system (9) no longer contains contemporaneous effects; it is called the induced set of equations and can be estimated without bias by OLS. The key to recovering the coefficients of the structural equations from those of the induced equations is to obtain $N$. Multiplying each side of the third relation in equation (8) by its own transpose gives: $N^{- 1} \begin{bmatrix} ε_{1 t} \\ ε_{2 t} \end{bmatrix} \begin{bmatrix} ε_{1 t} & ε_{2 t} \end{bmatrix} {(N^{- 1})}^{T} = \begin{bmatrix} e_{1 t} \\ e_{2 t} \end{bmatrix} \begin{bmatrix} e_{1 t} & e_{2 t} \end{bmatrix}$ (10)

Taking expectations, and using the assumption that the error terms of the structural equations are uncorrelated, yields: $N^{- 1} \begin{bmatrix} σ_{1}^{2} & 0 \\ 0 & σ_{2}^{2} \end{bmatrix} {(N^{- 1})}^{T} = \begin{bmatrix} \mathrm{Var}(e_{1 t}) & \mathrm{Cov}(e_{1 t}, e_{2 t}) \\ \mathrm{Cov}(e_{1 t}, e_{2 t}) & \mathrm{Var}(e_{2 t}) \end{bmatrix}$ (11)

where $\mathrm{Var}(\cdot)$ denotes the variance and $\mathrm{Cov}(\cdot, \cdot)$ denotes the covariance between variables. Since the parameters of the induced equations can be estimated by OLS, the right-hand side of equation (11) can be estimated from the OLS residuals and is therefore known. The problem of converting from the induced to the structural equations is also known as the SVAR model identification problem.
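The mapping from the structural form (6) to the induced form (9) can be made concrete with a small numerical sketch. The helper names (`inv2`, `matmul2`, `matvec2`, `to_induced`) and all parameter values below are hypothetical and chosen only for illustration; the code simply multiplies the structural intercepts and lag coefficients by $N^{-1}$, as in equation (8).

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec2(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[i][0] * v[0] + m[i][1] * v[1] for i in range(2)]


def to_induced(a12, a21, intercepts, beta):
    """Map structural parameters of Eq. (6) to induced parameters of Eq. (9)."""
    n_inv = inv2([[1.0, -a12], [-a21, 1.0]])    # N as defined in Eq. (7)
    return matvec2(n_inv, intercepts), matmul2(n_inv, beta), n_inv


# Made-up structural parameters, for illustration only.
a, b, n_inv = to_induced(a12=0.5, a21=0.0,
                         intercepts=[1.0, 2.0],
                         beta=[[0.4, 0.1], [0.2, 0.3]])
print(a)
print(b)
```

The reverse direction is the identification problem: OLS delivers the induced coefficients and residual covariance, and $N$ has to be recovered from them with the help of additional constraints.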

4.2

Identification of SVAR models

From the previous subsection, the key to converting from the induced VAR to the structural VAR is to find $N$, which requires imposing additional constraints. Identification schemes are classified according to the type of constraint imposed, as briefly introduced in the literature review. Different constraints correspond to different economic meanings; this paper illustrates only a special case of short-term constraints, the Cholesky decomposition [24].

Let $a_{21} = 0$ in Eq. (6); the corresponding structural equations are: $\begin{cases} y_{1 t} = a_{10} + a_{12} y_{2 t} + β_{11} y_{1, t - 1} + β_{12} y_{2, t - 1} + ε_{1 t} \\ y_{2 t} = a_{20} + β_{21} y_{1, t - 1} + β_{22} y_{2, t - 1} + ε_{2 t} \end{cases}$ (12)

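This means that $y_{1t}$ has no contemporaneous impact on $y_{2t}$. With $a_{21} = 0$ and $N$ as defined in equation (7), equation (11) gives: $a_{12} = \frac{\mathrm{Cov}(e_{1 t}, e_{2 t})}{\mathrm{Var}(e_{2 t})}$ (13)

Inverting the third relation in equation (8) to express the structural errors in terms of the induced errors gives: $\begin{cases} ε_{1 t} = e_{1 t} - a_{12} e_{2 t} \\ ε_{2 t} = e_{2 t} \end{cases}$ (14)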

It follows that $a_{21} = 0$ has two implications:

1) The error term of $y_{2t}$ in the structural equation is identical to its error term in the induced equation, while the structural error term $ε_{1t}$ of $y_{1t}$ is the residual from regressing the induced error $e_{1t}$ on $e_{2t}$, with $a_{12}$ as the regression coefficient.

2) $y_{1t}$ has no contemporaneous effect on $y_{2t}$.

The multivariate case can be reasoned analogously. In the Cholesky decomposition, a symmetric positive definite matrix is expressed as the product of a lower triangular matrix and its transpose; it is a special case of the LU triangular decomposition when $A$ is a symmetric positive definite matrix: $A = L L^{T}$ (15)

A symmetric positive definite matrix can also be decomposed as: $A = Q Λ Q^{T}$ (16)

where $Q$ is a unit triangular matrix (triangular with ones on the diagonal) and $Λ$ is a diagonal matrix with positive entries. Eqs. (16) and (11) correspond to each other: $A$ corresponds to $\begin{bmatrix} \mathrm{Var}(e_{1 t}) & \mathrm{Cov}(e_{1 t}, e_{2 t}) \\ \mathrm{Cov}(e_{1 t}, e_{2 t}) & \mathrm{Var}(e_{2 t}) \end{bmatrix}$, $Q$ corresponds to $N^{- 1}$, $Λ$ corresponds to $\begin{bmatrix} σ_{1}^{2} & 0 \\ 0 & σ_{2}^{2} \end{bmatrix}$, and the triangular factor is $L = Q Λ^{1/2}$ (with the variables ordered so that $N^{- 1}$ is lower triangular). After $A$ is obtained from the OLS estimation of the induced equations, it can be decomposed by Cholesky decomposition to obtain $L$, from which the elements of $Q$ and $Λ$ follow. In other words, imposing zero constraints on some elements of the original equations so that $N^{- 1}$ becomes triangular, and then solving for the remaining elements, is equivalent to a Cholesky decomposition of the covariance matrix of the induced equations: the elements of $Q$ give the corresponding coefficients of the identification matrix, and $Λ$ is the covariance matrix of the error terms of the structural equations. This identification method is therefore referred to as Cholesky decomposition. Imposing zero or other constraints on other coefficients or variances of the structural equations, in order to express different economic meanings, is a short-term constraint and is in effect a generalization of the Cholesky decomposition.
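The equivalence between the zero constraint $a_{21} = 0$ and a Cholesky factorization can be checked numerically. The sketch below is illustrative only: the function names and the covariance entries are made up, and it assumes the sign convention of equation (7), under which $e_{1t} = ε_{1t} + a_{12} ε_{2t}$ and $e_{2t} = ε_{2t}$. To obtain a lower triangular factor, the covariance matrix passed to `cholesky2` is ordered with $e_{2t}$ first.

```python
import math


def identify_cholesky(var_e1, var_e2, cov_e12):
    """Recover a12 and the structural variances when a21 = 0.

    Inputs are the entries of the induced-form error covariance matrix.
    With N = [[1, -a12], [0, 1]]:  e1 = eps1 + a12 * eps2,  e2 = eps2.
    """
    a12 = cov_e12 / var_e2                       # slope of e1 regressed on e2
    sigma2_sq = var_e2                           # Var(eps2)
    sigma1_sq = var_e1 - cov_e12 ** 2 / var_e2   # residual variance, Var(eps1)
    return a12, sigma1_sq, sigma2_sq


def cholesky2(m11, m12, m22):
    """Lower-triangular Cholesky factor of [[m11, m12], [m12, m22]]."""
    l11 = math.sqrt(m11)
    l21 = m12 / l11
    l22 = math.sqrt(m22 - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]


# Made-up covariance: Var(e1) = 2, Var(e2) = 4, Cov(e1, e2) = 2.
a12, s1_sq, s2_sq = identify_cholesky(2.0, 4.0, 2.0)
L = cholesky2(4.0, 2.0, 2.0)     # covariance ordered as (e2, e1)
print(a12, s1_sq, s2_sq)
print(L[1][0] / L[0][0], L[1][1] ** 2, L[0][0] ** 2)
```

Both routes return the same three quantities, which is the sense in which the recursive zero constraint and the Cholesky factorization coincide.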

4.3

Impulse Response Function

The impulse response function describes how the variables $y_{t}$ (the set $y_{1t}, y_{2t}, …, y_{pt}$) respond over time to a one-unit change in the current error term $ε_{it}$, with time as the independent variable and $y_{t}$ as the dependent variable [25]. To obtain a more intuitive description, the VAR model is converted into a vector moving average (VMA) model. Still using the bivariate first-order lag model as an example, let: $y_{t} = \begin{bmatrix} y_{1 t} \\ y_{2 t} \end{bmatrix}, \quad β = \begin{bmatrix} β_{11} & β_{12} \\ β_{21} & β_{22} \end{bmatrix}, \quad ε_{t} = \begin{bmatrix} ε_{1 t} \\ ε_{2 t} \end{bmatrix}$ (17)

where $ε_{1t}$ and $ε_{2t}$ are uncorrelated and the intercept term is ignored. Then: $y_{t} = β y_{t - 1} + ε_{t}$ (18)

Introducing the lag operator $L$, defined by $L y_{t} = y_{t - 1}$, gives: $(I - β L) y_{t} = ε_{t}$ (19)

Multiplying both sides by ${(I - β L)}^{- 1}$: $y_{t} = {(I - β L)}^{- 1} ε_{t}$ (20)

Direct multiplication shows that $(I + β L + β^{2} L^{2} + \dots + β^{n - 1} L^{n - 1})(I - β L) = I - β^{n} L^{n}$. As $n$ tends to infinity, $β^{n}$ tends to zero provided that all eigenvalues of $β$ are smaller than one in modulus (the stationarity condition of the VAR), so $I - β^{n} L^{n}$ tends to $I$, i.e., ${(I - β L)}^{- 1} = I + β L + β^{2} L^{2} + β^{3} L^{3} + \dots$. Substituting into Eq. (20) gives: $y_{t} = (I + β L + β^{2} L^{2} + β^{3} L^{3} + \dots) ε_{t}$ (21)

This is the moving average form of the VAR model, which expands to: $\begin{cases} y_{1 t} = \sum_{k = 0}^{\infty} β_{11} (k) ε_{1, t - k} + \sum_{k = 0}^{\infty} β_{12} (k) ε_{2, t - k} \\ y_{2 t} = \sum_{k = 0}^{\infty} β_{21} (k) ε_{1, t - k} + \sum_{k = 0}^{\infty} β_{22} (k) ε_{2, t - k} \end{cases}$ (22)

where $β_{11}(k)$, $β_{12}(k)$, $β_{21}(k)$ and $β_{22}(k)$ are the corresponding elements of the $k$-th power $β^{k}$ of the matrix $β$ (note that $β_{i j} (k) \neq β_{i j}^{k}$ in general; Zhao Hong provides a detailed derivation of its computation), i.e.: $β^{k} = \begin{bmatrix} β_{11} (k) & β_{12} (k) \\ β_{21} (k) & β_{22} (k) \end{bmatrix}$ (23)

Thus $y_{1t}$ and $y_{2t}$ each have two impulse response functions, describing how they are affected by shocks to the error terms $ε_{1t}$ and $ε_{2t}$, respectively. This shows why the error terms $ε_{1t}$ and $ε_{2t}$ must be uncorrelated: if they were correlated, a perturbation of $ε_{1t}$ could be accompanied by a perturbation of $ε_{2t}$, and the response of $y_{1t}$ would no longer be determined by the coefficients on $ε_{1t}$ alone. Since only the error terms of the structural equations are uncorrelated, the induced equations must be converted to structural equations before impulse response analysis is performed. More generally, an $n$-variable first-order lag VAR model has $n^{2}$ impulse response functions. For a VAR model with multiple lags, the moving average form can likewise be obtained by factorization.
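For a first-order VAR the impulse response coefficients are simply the entries of successive matrix powers, as in equation (23). The following sketch computes them for a made-up coefficient matrix; the names `matmul` and `impulse_responses` and the numerical values are assumptions for illustration, not estimates from this paper.

```python
def matmul(x, y):
    """Product of two square matrices stored as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def impulse_responses(beta, horizon):
    """Return [beta^0, beta^1, ..., beta^horizon].

    Entry [k][i][j] is the response of variable i, k periods after a
    one-unit shock to error term j (the coefficient beta_ij(k) in Eq. 22).
    """
    n = len(beta)
    current = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    out = [current]
    for _ in range(horizon):
        current = matmul(current, beta)
        out.append(current)
    return out


beta = [[0.5, 0.1], [0.2, 0.3]]     # made-up, stationary coefficient matrix
irf = impulse_responses(beta, horizon=2)
print(irf[2])
```

The top-left entry of `irf[2]` differs from the square of the top-left entry of `beta`, which illustrates the remark that $β_{ij}(k)$ is an element of the matrix power and not the power of an element.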

4.4

Variance decomposition

Variance decomposition decomposes the variance of the forecast error of a VAR model in order to characterize the contribution of each random perturbation to the variation of each dependent variable [26]. Taking the $n$-variable first-order lagged VAR model as an example, the general expression for the $i$-th variable follows by analogy with the previous section: $y_{i t} = \sum_{p = 1}^{n} \left( \sum_{k = 0}^{\infty} β_{i p} (k) ε_{p, t - k} \right)$ (24)

The $n$ error terms $ε_{pt}$ ($p = 1, 2, …, n$) are mutually uncorrelated with variances $σ_{p}^{2}$, from which it follows that: $\mathrm{Var}(y_{i t}) = \sum_{p = 1}^{n} \left( \sum_{k = 0}^{\infty} {(β_{i p} (k))}^{2} σ_{p}^{2} \right)$ (25)

That is, the variance of $y_{it}$ can be decomposed into $n$ uncorrelated contributions. To measure the contribution of each perturbation term, the relative variance contribution of the $p$-th perturbation term to the $i$-th variable is defined as: $\mathrm{RVC}_{p \to i} = \frac{\sum_{k = 0}^{\infty} {(β_{i p} (k))}^{2} σ_{p}^{2}}{\sum_{j = 1}^{n} \left( \sum_{k = 0}^{\infty} {(β_{i j} (k))}^{2} σ_{j}^{2} \right)}$ (26)
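Equation (26) can be evaluated numerically by truncating the infinite sums at a finite horizon, which is an approximation introduced here for illustration. In the sketch below the function names, the coefficient matrix and the structural variances are all hypothetical.

```python
def matmul(x, y):
    """Product of two square matrices stored as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def variance_shares(beta, sigma_sq, horizon):
    """Relative variance contributions, sums truncated at `horizon`.

    Returns a matrix whose entry [i][p] approximates RVC_{p -> i}: the
    share of the variance of variable i attributable to shock p.
    """
    n = len(beta)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    contrib = [[0.0] * n for _ in range(n)]
    for _ in range(horizon + 1):
        for i in range(n):
            for p in range(n):
                contrib[i][p] += power[i][p] ** 2 * sigma_sq[p]
        power = matmul(power, beta)      # move from beta^k to beta^(k+1)
    return [[c / sum(row) for c in row] for row in contrib]


beta = [[0.5, 0.1], [0.2, 0.3]]          # made-up coefficient matrix
shares = variance_shares(beta, sigma_sq=[1.0, 4.0], horizon=20)
for row in shares:
    print(row)
```

Each row sums to one by construction, so the entries can be read directly as percentage contributions of the individual shocks.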

5

Study on the time-series relationship between the impact of tournaments and revenue growth

In the previous chapter, this paper proposed a time series forecasting model for event impact and revenue growth, which will be used in this chapter to analyze the dynamic relationship between event impact and revenue growth variables in time series data in the context of the sports industry.

Many indicators can be used to measure the influence of sports industry events; commonly used ones include event attention, professionalism, contribution and economic benefits. This paper chooses “Tournament Economic Benefit (TYC)” as the indicator of event impact; it mainly refers to the final results of the impact of the sports events organized by all resident units of a country (or region) in a certain period. GDP statistics for the sports industry are compiled year by year and are continuous, so they objectively reflect the overall operation and development of the sports industry economy in recent years. Therefore, “GDP of sports industry” is adopted as the indicator of sports industry revenue growth.

In summary, this chapter analyzes the relationship between event influence and revenue growth in the sports industry, with “economic benefits of events (TYC)” and “gross domestic product (GDP) of the sports industry” as the analytical variables. The sample period is 2013-2023, and the data are mainly obtained from the China Statistical Yearbook, the China Tertiary Industry Statistical Bulletin, and the Sports Industry Statistical Bulletin of the State General Administration of Sports.

5.1

Parameter Estimation of a VAR Model of Event Impact and Revenue Growth

Since “Tournament Economic Benefits (TYC)” and “Gross Domestic Product (GDP)” are both time series, heteroskedasticity may be present. Therefore, before the empirical validation of the VAR model, the natural logarithm of each variable is taken, giving “lnTYC” and “lnGDP”. This transformation does not change the cointegration relationship between the original variables, linearizes the trend of the time series, and reduces the effects of heteroskedasticity.

5.1.1

Stationarity tests

To avoid spurious regression (“pseudo-regression”) when building a VAR model from the two time series “tournament economic benefits (TYC)” and “sports industry gross domestic product (GDP)”, a unit root test is required. This paper uses the ADF method, and the results are shown in Table 2. The ADF test values of the series lnTYC and lnGDP are greater than the critical value at the 10% significance level, so both series have a unit root and are non-stationary. After first-order differencing, the ADF test values are still greater than the critical value at the 10% significance level, so the first-order differenced series ΔlnTYC and ΔlnGDP (Δ is the first-order differencing operator) are also non-stationary. After second-order differencing, the ADF test values are all less than the critical value at the 1% significance level, so the second-order differenced series Δ²lnTYC and Δ²lnGDP are stationary. The series lnTYC and lnGDP therefore become stationary after second-order differencing and satisfy the stationarity condition.

Table 2.

Stationarity test results

Variable	ADF test value	Test type (c,t,k)	1% critical value	5% critical value	10% critical value	P	Stationarity
lnGDP	-0.32084	c,0,1	-5.29544	-4.0082	-3.46078	P>0.1	nonstationary
lnTYC	9.401743	c,0,1	-2.81664	-1.98227	-1.6012	P>0.1	nonstationary
ΔlnGDP	-1.04696	c,0,1	-2.84732	-1.9883	-1.60021	P>0.1	nonstationary
ΔlnTYC	0.208104	c,0,1	-2.84724	-1.98812	-1.60022	P>0.1	nonstationary
Δ²lnGDP	-3.53499	c,0,1	-2.93721	-2.00628	-1.59815	P<0.01	stationary
Δ²lnTYC	-3.00523	c,0,1	-2.88611	-1.99586	-1.59906	P<0.01	stationary

Δ denotes the first-order difference of a series.

Δ² denotes the second-order difference of a series.

c is the intercept term.

t is the trend term.

k is the lag order.
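The way Table 2 is read can be sketched in a few lines of code. The statistics and critical values are copied from Table 2; the function name `integration_order` and the data layout are illustrative assumptions. A series is treated as stationary at a given differencing order when its ADF statistic is below the critical value at the chosen significance level.

```python
# ADF statistic and critical values for each differencing order (0, 1, 2),
# copied from Table 2. The dictionary layout is an assumption of this sketch.
TABLE_2 = {
    "lnGDP": [
        (-0.32084, {"1%": -5.29544, "5%": -4.0082, "10%": -3.46078}),
        (-1.04696, {"1%": -2.84732, "5%": -1.9883, "10%": -1.60021}),
        (-3.53499, {"1%": -2.93721, "5%": -2.00628, "10%": -1.59815}),
    ],
    "lnTYC": [
        (9.401743, {"1%": -2.81664, "5%": -1.98227, "10%": -1.6012}),
        (0.208104, {"1%": -2.84724, "5%": -1.98812, "10%": -1.60022}),
        (-3.00523, {"1%": -2.88611, "5%": -1.99586, "10%": -1.59906}),
    ],
}

def integration_order(rows, level="5%"):
    """Return the smallest differencing order whose ADF statistic rejects a unit root."""
    for d, (stat, critical) in enumerate(rows):
        if stat < critical[level]:
            return d
    return None  # no tested order rejects the unit root

orders = {name: integration_order(rows) for name, rows in TABLE_2.items()}
print(orders)
```

With the values in Table 2, both series come out as order two at the 1%, 5%, and 10% levels, which matches the reading given in the text.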

5.1.2

Johansen cointegration test

The ADF test shows that the series lnTYC and lnGDP are not themselves stationary, but a linear combination of them may be stationary. Such a linear combination reflects a possible long-term stable relationship between the series, called a “cointegration relationship (equilibrium relationship)”. To determine whether this long-run relationship exists, a cointegration test is required; the results are shown in Table 3. The null hypothesis “None” states that there is no cointegration relationship between lnTYC and lnGDP. The trace test statistic under this hypothesis is 25.60196, which is greater than the 5% critical value of 14.2635 (P<0.01), so the null hypothesis is rejected and at least one cointegration relationship is considered to exist. The null hypothesis “At most 1” states that lnTYC and lnGDP have at most one cointegration relationship. The trace test statistic under this hypothesis is 0.008918, which is less than the 5% critical value of 3.841477 (P>0.05), so the null hypothesis is accepted. The fit of the cointegration equation shows that it fits well and reflects the long-term equilibrium relationship between the two variables: there is a positive correlation between the economic benefits of the event (TYC) and the gross domestic product (GDP) of the sports industry.

Table 3.

Results of the cointegration test

Null hypothesis	Eigenvalue	Trace statistic	Trace test critical value (5%)	P	Conclusion
None	0.941735	25.60196	14.2635	0.0007	Reject
At most 1	0.000987	0.008918	3.841477	0.9245	Accept
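The sequential reading of Table 3 can be sketched as follows. The statistics and critical values are copied from Table 3; the function name `cointegration_rank` is an illustrative assumption. The hypotheses are tested in order of increasing rank, and the procedure stops at the first null hypothesis that is not rejected.

```python
# (rank under the null hypothesis, trace statistic, 5% critical value), from Table 3.
TABLE_3 = [
    (0, 25.60196, 14.2635),   # "None": no cointegration relationship
    (1, 0.008918, 3.841477),  # "At most 1": at most one cointegration relationship
]

def cointegration_rank(rows, n_vars):
    """Return the first rank whose null hypothesis is not rejected."""
    for rank, statistic, critical in rows:
        if statistic <= critical:
            return rank
    return n_vars  # every null rejected: full rank

rank = cointegration_rank(TABLE_3, n_vars=2)
print(rank)
```

For the values in Table 3 the procedure stops at the second row, giving one cointegration relationship between lnTYC and lnGDP.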

5.1.3

Vector error correction model test

Although there is a cointegration relationship between the series lnTYC and lnGDP, that is, a long-run stable equilibrium relationship, in the short term the series may be affected by a variety of factors that push them away from the equilibrium path. Therefore, a vector error correction (VEC) model can be constructed on the basis of the preceding cointegration analysis to link the short-term dynamics of the two variables with their long-term equilibrium relationship. When short-term fluctuations are large, the model restricts the behavior of the endogenous variables so that they converge toward the cointegration relationship. The estimation results of the VEC model are shown in Table 4. In the VEC model, the coefficients of ΔlnTYC and ΔlnTYC_t-1 are -0.206788 and -0.529791, respectively, which indicates a negative correlation between tournament economic benefits (TYC) and the gross product of the sports industry (GDP) in the short term. This effect is opposite to the positive long-term equilibrium relationship and indicates that the interaction between TYC and GDP fluctuates considerably in the short term. To maintain the positive long-term equilibrium, the error correction term adjusts the previous period's disequilibrium (deviation) between the variables with a strength of -1.74408 in the current period, pulling the system back toward the long-term equilibrium state.

Table 4.

Vector error correction model results

Variable	Coefficient estimate	Standard error	T statistic	P
ΔlnTYC	-0.206788	0.113925	-1.81445	0.1427
C	0.168401	0.038592	4.364773	0.013
ΔlnGDP_t-1	0.933958	0.169932	5.495774	0.0052
ΔlnTYC_t-1	-0.529791	0.181398	-2.92057	0.0435
ECM_t-1	-1.74408	0.253961	-6.86723	0.0026
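The role of the error correction coefficient can be illustrated with a stylized sketch. It assumes a single-equation view in which the other variable is held fixed and the constant and short-run difference terms are ignored, so that a deviation e from equilibrium evolves as e_t = (1 + α)·e_(t-1). These simplifications and the function name `deviation_path` are assumptions of the sketch, not the paper's estimated dynamics; only the coefficient α = -1.74408 comes from Table 4.

```python
ECM_COEFFICIENT = -1.74408  # coefficient of ECM_t-1 in Table 4

def deviation_path(initial_deviation, alpha, periods):
    """Deviation from long-run equilibrium when only the error correction term acts."""
    path = [initial_deviation]
    for _ in range(periods):
        # The change this period is alpha * previous deviation,
        # so the new deviation is (1 + alpha) * previous deviation.
        path.append((1.0 + alpha) * path[-1])
    return path

path = deviation_path(1.0, ECM_COEFFICIENT, periods=5)
for period, deviation in enumerate(path):
    print(period, round(deviation, 5))
```

Under these assumptions, a coefficient between -2 and -1 means each correction overshoots the equilibrium: the deviation changes sign every period while shrinking in magnitude, which is in line with the large short-term fluctuations described in the text.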

5.2

Analysis of the relationship between tournament influence and revenue growth

5.2.1

Impulse Response Analysis

The impulse response function describes how a one-unit shock to one endogenous variable affects the current and future values of the other endogenous variables, and so shows the dynamic interaction between the variables. In this study, impulse responses are used to analyze the dynamic relationship between sports event influence and revenue growth, as shown in Figure 2. Figure 2(a) shows the response of earnings growth (GDP) to a shock in tournament influence (TYC), while Figure 2(b) shows the response of tournament influence (TYC) to a shock in earnings growth (GDP).

Figure 2(a) shows that when tournament influence is first shocked, economic earnings grow in the short term. Under a sustained shock, the positive effect of tournament influence on the growth of economic earnings remains relatively stable and weakens slowly as the shock period lengthens. The development of tournament influence can therefore promote economic growth, but its effect on the growth of economic income diminishes over time. To promote stable economic growth through sports, it is necessary to develop distinctive sports, create sports projects with regional characteristics, actively build sports-related industrial bases, and continually inject new vitality into the sports industry to sustain its development.

Figure 2(b) shows that tournament influence responds positively to a one-standard-deviation shock in economic revenue growth, and the development of sports rises in the short term. Under a sustained shock, the response peaks in the third period and then declines slowly over time, although it remains at a high level. This shows that the growth of economic revenue can promote the development of tournament influence; the effect is significant and decreases slowly. It can be concluded that, for sports to develop in a healthy way, the government needs to increase financial investment in sports, plan financial resources rationally, and keep financial investment on a stable cycle.
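How an impulse response is computed can be sketched for a two-variable VAR(1). The coefficient matrix below is hypothetical and is not the paper's estimate; the function names are also illustrative. For a VAR(1) with coefficient matrix A, the response at horizon h to a unit (non-orthogonalized) shock is the matrix power A^h, where entry (i, j) is the response of variable i to a shock in variable j.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def impulse_responses(coef, horizons):
    """Return [A^0, A^1, ..., A^horizons] for a VAR(1) coefficient matrix A."""
    n = len(coef)
    phi = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    responses = [phi]
    for _ in range(horizons):
        phi = matmul(coef, phi)
        responses.append(phi)
    return responses

# Hypothetical coefficients: row 0 is the TYC equation, row 1 is the GDP equation.
A = [[0.5, 0.2],
     [0.1, 0.6]]

irf = impulse_responses(A, horizons=10)
gdp_to_tyc_shock = [phi[1][0] for phi in irf]  # response of GDP to a unit TYC shock
print([round(x, 4) for x in gdp_to_tyc_shock])
```

In this hypothetical system the response starts at zero, rises for two periods, and then decays slowly toward zero, the same qualitative shape as a positive effect that weakens as the shock period lengthens.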

5.2.2

Granger causality test

The Granger causality test measures whether one set of series is exogenous to another and identifies statistically significant predictive (Granger) causal relationships between variables. The results of the Granger causality test for TYC and GDP are shown in Table 5. When tournament influence is the dependent variable, the p-value of earnings growth is 0.0488, which is less than 0.05, indicating that earnings growth is a Granger cause of tournament influence. When earnings growth is the dependent variable, the p-value of tournament influence is 0.122, which is greater than 0.05, indicating that tournament influence is not a Granger cause of earnings growth. Therefore, the causal relationship between tournament influence and earnings growth is unidirectional, running from earnings growth to tournament influence.

Table 5.

Results of the Granger causality test

Dependent variable	Exogenous variable	Chi-sq statistical value	P
TYC	GDP	3.845082	0.0488**
TYC	ALL	3.845082	0.0488**
GDP	TYC	2.256788	0.122
GDP	ALL	2.256788	0.122
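The decision rule applied to Table 5 can be written down explicitly. The p-values are copied from Table 5; the function name `granger_causes` and the data layout are illustrative assumptions. The null hypothesis in each row is that the excluded variable does not Granger-cause the dependent variable, and it is rejected when the p-value is below the significance level.

```python
# (dependent variable, excluded variable) -> p-value, copied from Table 5.
TABLE_5 = {
    ("TYC", "GDP"): 0.0488,
    ("GDP", "TYC"): 0.122,
}

def granger_causes(p_values, alpha=0.05):
    """Return (cause, effect) pairs for which non-causality is rejected at level alpha."""
    return [
        (excluded, dependent)
        for (dependent, excluded), p in p_values.items()
        if p < alpha
    ]

directions = granger_causes(TABLE_5)
print(directions)
```

At the 5% level only the direction from GDP to TYC is retained, which is the one-way relationship described in the text. Because the first p-value is very close to 0.05, the conclusion is sensitive to the chosen significance level.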

5.2.3

Variance decomposition

Building on the impulse response analysis and the Granger causality test, variance decomposition is used to examine how much each variable contributes to the other and to itself. The results of the variance decomposition for TYC and GDP over the first 10 periods are shown in Table 6. In the first period, the contribution of tournament influence to itself is largest, at 100%, and it falls to 93.77% by the tenth period. The contribution of earnings growth to tournament influence is zero in the first period and rises to 6.23% by the tenth period. The contribution of earnings growth to itself is 99.81% in the first period and decreases to 76.69% by the tenth period. The contribution of tournament influence to earnings growth is 0.19% in the first period and grows to 23.31% by the tenth period. It can be concluded that tournament influence and revenue growth each contribute most to themselves and less to each other.

Table 6.

Variance decomposition

Period	TYC variance due to TYC (%)	TYC variance due to GDP (%)	GDP variance due to GDP (%)	GDP variance due to TYC (%)
1	100	0	99.81126	0.18874
2	94.38029	5.61971	88.85596	11.14404
3	94.44463	5.55537	83.70355	16.29645
4	93.84035	6.15965	81.0841	18.9159
5	93.94914	6.05086	79.55577	20.44423
6	93.82881	6.17119	78.56684	21.43316
7	93.826	6.174	77.87858	22.12142
8	93.81337	6.18663	77.37445	22.62555
9	93.78918	6.21082	76.99104	23.00896
10	93.77399	6.22601	76.6908	23.3092
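The calculation behind a table of this kind can be sketched for a two-variable VAR(1). The coefficient matrix and the residual covariance matrix below are hypothetical, and the Cholesky ordering with TYC first is an assumption of the sketch rather than a detail stated in the paper. Orthogonalized responses are accumulated over the horizon, and each shock's share of the forecast error variance is reported in percent.

```python
import math

def matmul(a, b):
    """Multiply two 2x2 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def cholesky_2x2(sigma):
    """Lower-triangular factor P with P * P^T = sigma for a 2x2 covariance matrix."""
    p00 = math.sqrt(sigma[0][0])
    p10 = sigma[1][0] / p00
    p11 = math.sqrt(sigma[1][1] - p10 ** 2)
    return [[p00, 0.0], [p10, p11]]

def fevd(coef, sigma, horizon):
    """shares[h][i][j]: percent of variable i's (h+1)-step forecast error variance due to shock j."""
    p = cholesky_2x2(sigma)
    power = [[1.0, 0.0], [0.0, 1.0]]  # coef ** 0
    cumulative = [[0.0, 0.0], [0.0, 0.0]]
    shares = []
    for _ in range(horizon):
        theta = matmul(power, p)  # orthogonalized response at this step
        for i in range(2):
            for j in range(2):
                cumulative[i][j] += theta[i][j] ** 2
        shares.append(
            [[100.0 * cumulative[i][j] / sum(cumulative[i]) for j in range(2)] for i in range(2)]
        )
        power = matmul(coef, power)
    return shares

# Hypothetical inputs: variable 0 is TYC, variable 1 is GDP.
A = [[0.5, 0.2], [0.1, 0.6]]
SIGMA = [[1.0, 0.3], [0.3, 1.0]]

table = fevd(A, SIGMA, horizon=10)
for period, row in enumerate(table, start=1):
    print(period, [round(x, 2) for x in row[0]], [round(x, 2) for x in row[1]])
```

With a Cholesky ordering, the variable placed first explains 100% of its own variance in period 1, which is the pattern seen for TYC in Table 6; the shares in every row sum to 100%.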

6

Conclusion

This paper uses quarterly year-on-year data on the economic indices of the sporting goods manufacturing industry and the sports service industry from the first quarter of 2006 to the second quarter of 2023, and applies the ARIMA economic return measurement and forecasting model proposed in this paper to measure and forecast the economic returns of China's sports industry. The time series of the sports economic income index shows a pronounced “sharp peak and thick tail” distribution, so it does not follow a normal distribution. Based on the results of the economic revenue measurement, a time series prediction model was constructed to further investigate the relationship between the revenue growth of the sports industry and the influence of events. Before the VAR model is estimated, “tournament influence (TYC)” and “earnings growth (GDP)” are preprocessed by taking natural logarithms. The series lnTYC and lnGDP become stationary after second-order differencing, which satisfies the stationarity condition, and the Johansen cointegration test verifies the positive long-run correlation between TYC and GDP. The vector error correction model shows a negative correlation between TYC and GDP in the short run, with large short-run fluctuations in their interaction, and an error correction coefficient of -1.74408 that pulls deviations back toward the long-run equilibrium state. Impulse response analysis, the Granger causality test, and variance decomposition are used to analyze the relationship between TYC and GDP. In the impulse response analysis, both TYC and GDP respond positively to an initial shock; under a sustained shock, their mutual positive effects remain relatively stable but decrease slowly as the shock period lengthens.
The Granger causality test gives P<0.05 for GDP when TYC is the dependent variable, so GDP is a Granger cause of TYC, whereas P>0.05 for TYC when GDP is the dependent variable, so tournament influence is not a Granger cause of earnings growth. This indicates a one-way causal relationship between the two variables. Finally, the variance decomposition shows that both tournament influence and earnings growth contribute most to themselves: even in the tenth period their contributions to themselves are still 93.77% and 76.69%, respectively, while their contributions to each other are lower.

Xiaolu Li

Yuze Gao

Renfei Li

Published Online: Mar 21, 2025

Received: Oct 18, 2024

Accepted: Feb 10, 2025

DOI: https://doi.org/10.2478/amns-2025-0569

Keywords: ARIMA model, Time series forecasting model, Impulse response function, Variance decomposition, Sports industry

© 2025 Xiaolu Li et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
