Construction of Time Series Prediction Models for Event Influence and Revenue Growth in Sports Industry
Published: 21 Mar 2025
Received: 18 Oct 2024
Accepted: 10 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0569
© 2025 Xiaolu Li et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The influence of events in the sports industry spans economic, social, cultural, urban-image and environmental dimensions, and matters greatly to the host country or city as well as to participants and spectators [1-3]. Sports events usually attract large numbers of spectators and extensive media attention, increase tourism revenue and media exposure, and stimulate hotels, restaurants, retail and other services, bringing substantial benefits to the local economy [4-5]. The impact and revenue of such events can be predicted from historical data and current conditions, and time series forecasting models are well suited to this task.
A time series is a sequence formed by arranging the values of a variable in chronological order; its time unit can be minutes, hours, days, weeks, months, quarters, years, decades, etc. [6]. A time series forecasting model is a statistical model for analyzing and forecasting time series data. In essence it builds a mathematical model from the series itself, is mainly used for short-term forecasting, and belongs to the family of trend forecasting methods [7-9]. The data used to build such models are called time series data and are common in many real-world problems, such as sales figures, stock prices, temperature changes and social reactions [10]. Widely used time series forecasting models include the moving average model, the autoregressive model, the ARIMA model and the LSTM model [11-14]. Given the many global sports events with broad public participation, it is necessary to construct a model dedicated to events in the sports industry.
Several factors need to be considered to construct a suitable time series forecasting model, including data characteristics, model complexity, accuracy and so on. Constructing a suitable model according to the actual situation can lead to better prediction results.
At present, most research on prediction in the sports industry focuses on game outcomes, whereas the influence and revenue of the events themselves have received less attention. For example, literature [15] used a time series model to predict the results of soccer matches in a national league, and literature [16] used an LSTM model to predict players' training states and performance peaks, helping coaches and players formulate individual and team training plans. Literature [17] applied Autoregressive Integrated Moving Average (ARIMA) models and Recurrent Neural Networks (RNNs) to predict and analyze players' behavior across seasons and teams, which supported management decisions. The popularity and revenue of a sporting event are also broadly predictable. Literature [18] obtained profitability against market odds by calibrating existing expert and experimental information with time series model predictions, further reducing the uncertainty caused by expert judgment bias. Literature [19] used real-time and historical Google Trends data on the relative popularity of search terms to evaluate the influence of sports leagues, and identified future popularity trends with three models: trend plus seasonal regression, the Holt-Winters multiplicative method (HWMM), and the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. Based on such trends, companies or organizers can place advertisements to attract potential consumers during the relevant period, which benefits advertisers, companies and merchants at the host venue. In addition, literature [20] notes that robust predictive models for time series analysis help event managers make informed decisions and predictions about the success and profitability of the sports industry, enhancing the technical competitiveness of organizers.
Before studying the relationship between the influence of sports events and revenue growth in the sports industry, this paper first constructs an ARIMA model for measuring and forecasting economic revenue. A unit root test is conducted on the time series, and non-stationary series are transformed into stationary series by differencing, so that the economic revenue growth of the sports industry can be measured and predicted. Based on the measurement results, the relationship between revenue growth and the influence of sports events is further explored, and an improved time series prediction model, the SVAR model, is built. Within the framework of the VAR model, OLS is used to obtain unbiased estimates of the parameters of the reduced-form equations, the reduced-form equations are converted into structural equations, and variable analysis methods such as impulse response and variance decomposition are derived from this conversion. Taking data on China's sports industry from 2013 to 2023 as the sample, the relationship between event influence and revenue growth is analyzed with the time series prediction model constructed in this paper.
Traditional econometric methods rely on economic theory to specify models of the relationships between variables. However, economic theory is usually insufficient to describe the dynamic linkages between variables rigorously, and endogenous variables can appear on both the left and right sides of an equation, complicating estimation and inference. To address these issues, non-structural approaches to modeling the relationships between variables have emerged, such as vector autoregressive (VAR) and vector error correction (VEC) models.
Classical regression modeling focuses on establishing a functional (causal) relationship between different variables in order to examine the connections between things. It is equally important to ask how time series data alone can be used to build models that reveal how a phenomenon develops and that support predictions of its future development. In practice it is often necessary to study how something evolves over time, which requires examining its historical record to uncover its own law of development. Many quantities, such as interest rates, yields and stock market indices, can be expressed as time series, and studying these data reveals the patterns of change of the underlying economic variables. For some variables, too many factors affect their development, or data on the main influencing variables are difficult to collect, so it is hard to establish a regression model for them. In such cases time series models show their advantage: they do not require a causal model and can be built from the data of the variable itself. This modeling approach belongs to the field of time series analysis.
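The autoregressive (AR) process describes the relationship between the current value and historical values, using the variable's own history to predict itself. The autoregressive model requires the series to be stationary. The $p$-order autoregressive process is defined as:
$$y_t = \mu + \sum_{i=1}^{p} \gamma_i y_{t-i} + \epsilon_t \tag{1}$$
The formula expansion is shown in (2):
$$y_t = \mu + \gamma_1 y_{t-1} + \gamma_2 y_{t-2} + \cdots + \gamma_p y_{t-p} + \epsilon_t \tag{2}$$
If the random perturbation term $\epsilon_t$ is white noise, the process is a pure AR($p$) process. As the formula shows, the current value is predicted from the historical values.
The moving average (MA) process is concerned with the accumulation of the error terms in the autoregressive model. The $q$-order moving average process is defined as:
$$y_t = \mu + \epsilon_t + \sum_{i=1}^{q} \theta_i \epsilon_{t-i}$$
where $\theta_i$ are the moving average coefficients. The autoregressive moving average model ARMA($p$, $q$) combines the two processes:
$$y_t = \mu + \sum_{i=1}^{p} \gamma_i y_{t-i} + \epsilon_t + \sum_{i=1}^{q} \theta_i \epsilon_{t-i}$$
and the ARIMA($p$, $d$, $q$) model applies this ARMA model to the series obtained after $d$-order differencing.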
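As a minimal sketch of the two ingredients described above, the following Python code applies differencing and estimates a first-order autoregression by least squares. The function names (`difference`, `fit_ar1`, `simulate_ar1`) and the simulated data are illustrative assumptions, not the implementation used in this paper, and only the AR(1) case is shown.

```python
import random

def difference(series, order=1):
    """Apply first differencing `order` times (the 'I' step of ARIMA)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def fit_ar1(series):
    """OLS estimate of (mu, gamma) in y_t = mu + gamma * y_{t-1} + eps_t."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    gamma = sxy / sxx
    return my - gamma * mx, gamma

def simulate_ar1(mu, gamma, n, seed=0):
    """Generate a stationary AR(1) sample with standard normal shocks (hypothetical data)."""
    rng = random.Random(seed)
    y = [mu / (1 - gamma)]  # start at the unconditional mean
    for _ in range(n - 1):
        y.append(mu + gamma * y[-1] + rng.gauss(0, 1))
    return y

sample = simulate_ar1(mu=1.0, gamma=0.6, n=2000)
mu_hat, gamma_hat = fit_ar1(sample)
print(round(gamma_hat, 2))
```

The estimated coefficient should lie close to the value 0.6 used in the simulation; `difference(series, 2)` corresponds to the second-order differencing applied later in the paper.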
In this chapter, the ARIMA-based econometric model and forecasting methodology constructed in the previous section are applied to quarterly year-on-year data of China's economic indices from Q1 2006 to Q2 2023. The data cover the sporting goods manufacturing industry (production of sporting goods and sports facilities) and the sports services industry (provision of sports services, sports media and information services, and sports tourism and recreation), and are used to measure the economic returns of China's sports industry. The volatility time series of the economic income index is further calculated in order to analyze systematically the dynamics of China's sports economic income. The time-varying paths of the economic return indices are shown in Figure 1, where panels (a) and (b) represent the sporting goods manufacturing industry and the sports service industry, respectively.

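Figure 1. Economic return indices of the sports industry: (a) sporting goods manufacturing industry; (b) sports service industry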
Panel (a) shows that the economic return index of China's sporting goods manufacturing industry fluctuates strongly over the sample period. The index peaked in 2009, fell to its lowest point in 2013, and then rose rapidly. Since 2015 it has declined slowly, reaching a trough in 2020. From 2021 to the present, it has again shown slow upward momentum.
Panel (b) shows that the economic return index of the sports services industry (provision of sports services, sports media and information services, sports tourism and entertainment) also fluctuates strongly over the sample period. The index peaked in 2012, fell to its lowest point in 2015, and then rose rapidly to its maximum in 2016. Since 2016 it has declined slowly with small fluctuations, showing another small peak in 2021. From 2021 to the present, it has also shown slow upward momentum.
The descriptive statistics of the time series of the sports industry economic income index are shown in Table 1. The skewness and kurtosis estimates show that the "sharp peak and thick tail" distribution is highly pronounced: every kurtosis value exceeds 3. The J-B normality test statistics and their P-values, all below 0.05, show that the economic income index series do not follow a normal distribution, which is consistent with this distributional feature.
Table 1. Statistical estimates
Industry | Series | Sample size | Skewness | Kurtosis | J-B statistic | J-B probability P |
---|---|---|---|---|---|---|
Sporting goods manufacturing industry | Sporting goods | 80 | 0.3291 | 4.7819 | 10.5215 | 0.0053 |
Sporting goods manufacturing industry | Sports facilities | 80 | 0.6313 | 4.5542 | 11.691 | 0.0028 |
Sports service industry | Sports service | 80 | 0.9757 | 4.7898 | 20.455 | 0.0000 |
Sports service industry | Sports media and information services | 80 | -1.6485 | 9.5655 | 157.4709 | 0.0000 |
Sports service industry | Sports and entertainment | 80 | 0.7149 | 3.8629 | 8.1432 | 0.0172 |
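The J-B statistic in Table 1 can be sketched from the skewness $S$ and kurtosis $K$ using the standard formula $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, which is asymptotically chi-square with 2 degrees of freedom. The function name `jarque_bera` is an illustrative assumption. Note that the tabulated statistics are approximately reproduced with $n = 70$, the number of quarters from Q1 2006 to Q2 2023, rather than with the tabulated sample size of 80; this value of $n$ is an assumption made for the illustration.

```python
import math

def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic and its asymptotic chi-square(2) p-value."""
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # survival function of chi-square with 2 df
    return jb, p_value

# Sporting goods row of Table 1, with n = 70 assumed
jb, p = jarque_bera(70, 0.3291, 4.7819)
print(round(jb, 2), round(p, 4))
```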
The vector autoregressive (VAR) model can describe the dynamic relationship between multiple variables, which makes it suitable for analyzing the interaction between event influence and revenue growth in the sports industry within a single framework [22]. However, a general VAR model cannot capture the contemporaneous correlation between variables and therefore cannot support inferences about macroeconomic structure, since such inferences require distinguishing correlation from causation. The SVAR model improves on this: it extracts the contemporaneous correlations hidden in the error terms and gives them economic meaning by imposing constraints, so that the dynamic impact of random disturbances on the variable system can be analyzed further [23].
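A VAR model is a system of equations describing the interrelationships among multiple time series. It takes a structural or a reduced form depending on whether the right-hand side of each equation contains variables contemporaneous with the dependent variable on the left. The two forms can be converted into each other. Taking the bivariate first-order VAR model as an example, its structural form is:
$$\begin{cases} y_t = b_{10} - b_{12} z_t + \gamma_{11} y_{t-1} + \gamma_{12} z_{t-1} + \varepsilon_{yt} \\ z_t = b_{20} - b_{21} y_t + \gamma_{21} y_{t-1} + \gamma_{22} z_{t-1} + \varepsilon_{zt} \end{cases} \tag{6}$$
where $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are mutually uncorrelated white-noise disturbances. Because $y_t$ and $z_t$ each appear on the right-hand side of the other's equation, the regressors are correlated with the error terms and OLS estimation of (6) is biased.
If there were no contemporaneous effect on both sides of the equations, this problem would not occur, so the contemporaneous variables are eliminated by a transformation. Firstly, equation (6) is rewritten in matrix form:
$$\begin{bmatrix} 1 & b_{12} \\ b_{21} & 1 \end{bmatrix} \begin{bmatrix} y_t \\ z_t \end{bmatrix} = \begin{bmatrix} b_{10} \\ b_{20} \end{bmatrix} + \begin{bmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{bmatrix} \begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}, \quad \text{i.e.} \quad B x_t = \Gamma_0 + \Gamma_1 x_{t-1} + \varepsilon_t \tag{8}$$
The first term on the left-hand side of Eq. (8) is the identification matrix $B$. Left-multiplying both sides by $B^{-1}$ gives:
$$x_t = B^{-1}\Gamma_0 + B^{-1}\Gamma_1 x_{t-1} + B^{-1}\varepsilon_t \tag{9}$$
Defining $A_0 = B^{-1}\Gamma_0$, $A_1 = B^{-1}\Gamma_1$ and $e_t = B^{-1}\varepsilon_t$ and substituting into equation (9) gives:
$$x_t = A_0 + A_1 x_{t-1} + e_t$$
There is no longer a contemporaneous influence on either side of this system, which is called the reduced-form system and can be estimated without bias by OLS. The key to recovering the coefficients of the structural equations from those of the reduced-form equations is the matrix $B$.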
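The conversion from the structural to the reduced form can be sketched numerically for the bivariate case. In the code below, `B` stands for the matrix of contemporaneous coefficients, `Gamma0` and `Gamma1` for the structural intercepts and lag coefficients, and `A0`, `A1` for their reduced-form counterparts; these names and all numerical values are assumptions chosen for illustration only.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matvec2(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

# Hypothetical structural parameters
B = [[1.0, 0.4], [0.2, 1.0]]        # contemporaneous coefficients
Gamma0 = [0.5, 0.3]                 # structural intercepts
Gamma1 = [[0.7, 0.1], [0.2, 0.5]]   # structural lag coefficients

B_inv = inv2(B)
A0 = matvec2(B_inv, Gamma0)   # reduced-form intercepts
A1 = matmul2(B_inv, Gamma1)   # reduced-form lag coefficients
print(A0, A1)
```

Multiplying `A1` back by `B` recovers `Gamma1`, which illustrates why knowledge of `B` is what links the two forms.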
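By assumption, the error terms $\varepsilon_t = (\varepsilon_{yt}, \varepsilon_{zt})'$ in the system of structural equations are uncorrelated, therefore:
$$\Sigma_\varepsilon = E(\varepsilon_t \varepsilon_t') = \begin{bmatrix} \sigma_y^2 & 0 \\ 0 & \sigma_z^2 \end{bmatrix}$$
where $\sigma_y^2$ and $\sigma_z^2$ are the variances of $\varepsilon_{yt}$ and $\varepsilon_{zt}$.
As noted in the previous subsection, the key to converting from the reduced-form VAR to the structural VAR is to find the matrix $B$ of contemporaneous coefficients. Let the covariance matrix of the reduced-form errors $e_t$ be
$$\Sigma_e = E(e_t e_t')$$
Expressed as
$$\Sigma_e = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix}$$
From equation (8):
$$e_t = B^{-1}\varepsilon_t$$
It follows that
$$\Sigma_e = B^{-1} \Sigma_\varepsilon (B^{-1})'$$
Also, the symmetric positive definite matrix $\Sigma_e$ can be decomposed as:
$$\Sigma_e = P D P'$$
where $P$ is a lower triangular matrix with unit diagonal and $D$ is a diagonal matrix with positive entries, so that $B^{-1}$ and $\Sigma_\varepsilon$ can be matched with $P$ and $D$ once a recursive ordering of the variables is imposed.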
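The decomposition of a symmetric positive definite covariance matrix into a unit lower triangular factor and a diagonal factor can be sketched for the 2x2 case as follows. The function name `ldl2` and the example covariance matrix are assumptions for illustration; the recursive ordering implied by the triangular factor is one common identification choice, not necessarily the one used in this paper.

```python
def ldl2(sigma):
    """Factor a symmetric positive definite 2x2 matrix as P * D * P',
    with P unit lower triangular and D diagonal (returned as a list)."""
    s11, s12 = sigma[0]
    s21, s22 = sigma[1]
    if abs(s12 - s21) > 1e-12:
        raise ValueError("matrix must be symmetric")
    if s11 <= 0 or s11 * s22 - s12 * s21 <= 0:
        raise ValueError("matrix must be positive definite")
    l21 = s21 / s11                 # below-diagonal entry of P
    d = [s11, s22 - l21 * s12]      # diagonal entries of D
    return [[1.0, 0.0], [l21, 1.0]], d

# Hypothetical covariance matrix of reduced-form errors
sigma_e = [[1.0, 0.5], [0.5, 2.0]]
P, D = ldl2(sigma_e)
print(P, D)
```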
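The impulse response function describes the effect of a one-unit change in the current error term on the current and future values of the endogenous variables. Consider the reduced-form first-order VAR
$$x_t = A_0 + A_1 x_{t-1} + e_t$$
where $x_t$ is the vector of endogenous variables, $A_0$ and $A_1$ are the intercept vector and lag coefficient matrix, and $e_t$ is the error vector.
Introducing the lag operator $L$, with $L x_t = x_{t-1}$:
$$(I - A_1 L) x_t = A_0 + e_t$$
Multiply both sides by $(I - A_1 L)^{-1}$:
$$x_t = (I - A_1 L)^{-1} A_0 + (I - A_1 L)^{-1} e_t$$
By mathematical derivation it can be found that multiplying $(I - A_1 L)$ by $(I + A_1 L + A_1^2 L^2 + \cdots)$ yields $I$ when the VAR is stable, so that $(I - A_1 L)^{-1} = \sum_{i=0}^{\infty} A_1^i L^i$.
The moving average form of the VAR model is therefore expanded as:
$$x_t = \mu + \sum_{i=0}^{\infty} A_1^i e_{t-i}$$
where $\mu = (I - A_1)^{-1} A_0$ is the unconditional mean of $x_t$.
It can be found that the $(j, k)$ element of $A_1^i$ equals $\partial x_{j,t+i} / \partial e_{k,t}$, the response of variable $j$ after $i$ periods to a one-unit shock in the error term of variable $k$. Viewed as a function of $i$, it is the impulse response function.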
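As a minimal sketch, the impulse responses of a first-order VAR to unit shocks in the reduced-form errors can be computed as successive powers of the lag coefficient matrix. The name `impulse_responses` and the matrix `A1` below are hypothetical; shocks are not orthogonalized in this sketch.

```python
def matmul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def impulse_responses(A1, horizon):
    """Return [I, A1, A1^2, ..., A1^horizon]; entry [i][j][k] is the response
    of variable j after i periods to a unit shock in variable k."""
    n = len(A1)
    psi = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    out = []
    for _ in range(horizon + 1):
        out.append(psi)
        psi = matmul(A1, psi)
    return out

# Hypothetical stable lag coefficient matrix
A1 = [[0.5, 0.2], [0.1, 0.4]]
irf = impulse_responses(A1, 10)
print(irf[2])
```

Because the eigenvalues of this `A1` lie inside the unit circle, the responses shrink toward zero as the horizon grows, which is the pattern of gradually weakening responses discussed in the empirical section.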
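The variance decomposition decomposes the variance of the forecast error of a VAR model in order to characterize the contribution of each random disturbance to the fluctuation of each endogenous variable [26]. Taking the $n$-step-ahead forecast of $y$ as an example, and letting $\phi_{jk}(i)$ denote the response of variable $j$ after $i$ periods to a one-unit structural shock $k$, the forecast error is:
$$y_{t+n} - E_t y_{t+n} = \sum_{i=0}^{n-1} \left[ \phi_{11}(i)\,\varepsilon_{y,t+n-i} + \phi_{12}(i)\,\varepsilon_{z,t+n-i} \right]$$
That is, the error of the forecast of $y$ comes from both structural shocks, and its variance is
$$\sigma_y(n)^2 = \sigma_y^2 \sum_{i=0}^{n-1} \phi_{11}(i)^2 + \sigma_z^2 \sum_{i=0}^{n-1} \phi_{12}(i)^2$$
The share of each term in $\sigma_y(n)^2$ measures the contribution of the corresponding shock to the forecast-error variance of $y$.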
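The forecast-error variance shares can be sketched for a first-order VAR with orthogonal unit-variance shocks. In the code below, `A1` is a hypothetical lag coefficient matrix and `P` a hypothetical lower triangular impact matrix mapping the orthogonal shocks to the reduced-form errors; the function name `fevd` is likewise an assumption.

```python
def matmul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def fevd(A1, P, horizon):
    """shares[j][k]: fraction of the horizon-step forecast-error variance of
    variable j attributable to orthogonal unit-variance shock k."""
    n = len(A1)
    theta = P                        # responses at lag 0
    sums = [[0.0] * n for _ in range(n)]
    for _ in range(horizon):
        for j in range(n):
            for k in range(n):
                sums[j][k] += theta[j][k] ** 2
        theta = matmul(A1, theta)    # responses at the next lag
    return [[s / sum(row) for s in row] for row in sums]

A1 = [[0.5, 0.2], [0.1, 0.4]]   # hypothetical lag coefficients
P = [[1.0, 0.0], [0.5, 1.0]]    # hypothetical recursive impact matrix
shares_1 = fevd(A1, P, 1)
shares_10 = fevd(A1, P, 10)
print(shares_1, shares_10)
```

With a recursive ordering, the one-step forecast-error variance of the first variable is attributed entirely to its own shock, while later horizons allow the other shock to contribute.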
In the previous chapter, this paper proposed a time series forecasting model for event impact and revenue growth, which will be used in this chapter to analyze the dynamic relationship between event impact and revenue growth variables in time series data in the context of the sports industry.
Many indicators can be used to measure the influence of sports industry events. Commonly used ones include event attention, professionalism, contribution and economic benefits. This paper chooses "Tournament Economic Benefit (TYC)" as the indicator of event impact; it refers to the final economic outcome of the sports events organized by all resident units of a country (or region) in a given period. GDP statistics for the sports industry are compiled year by year and are continuous, so they can objectively reflect the overall operation and development of the sports industry economy in recent years. Therefore, "GDP of sports industry" is adopted as the indicator of sports industry revenue growth.
In summary, this chapter analyzes the relationship between event influence and revenue growth in the sports industry, with "economic benefits of events (TYC)" and "gross domestic product (GDP) of the sports industry" as the analytical variables. The sample period is 2013-2023, and the data are mainly obtained from the China Statistical Yearbook, the China Tertiary Industry Statistical Bulletin, and the Sports Industry Statistical Bulletin of the State General Administration of Sports.
Since "Tournament Economic Benefits (TYC)" and "Gross Domestic Product (GDP)" are both time series, heteroskedasticity may be present. Therefore, before the VAR model is estimated, the natural logarithm of both variables is taken, denoted "lnTYC" and "lnGDP". This transformation does not change the cointegration relationship between the original variables, linearizes the trend of the series, and reduces the effect of heteroskedasticity.
To prevent "spurious regression" when building a VAR model from the two time series "tournament economic benefits (TYC)" and "sports industry gross domestic product (GDP)", a unit root test is required. This paper uses the ADF method to test stationarity, and the results are shown in Table 2. The ADF test values of the series lnTYC and lnGDP are greater than the critical values at the 10% significance level, so both series have a unit root and are non-stationary. After first-order differencing, the ADF test values are still greater than the critical values at the 10% significance level, so the first-order difference series ΔlnTYC and ΔlnGDP (Δ is the first-order difference operator) are also non-stationary. After second-order differencing, the ADF test values are all less than the critical values at the 1% significance level, so the second-order difference series Δ2lnTYC and Δ2lnGDP are stationary. The series lnTYC and lnGDP therefore become stationary after second-order differencing and satisfy the stationarity condition.
Table 2. Stationarity test results
Variable | ADF test value | Test type (c,t,k) | 1% critical value | 5% critical value | 10% critical value | P | Stationarity |
---|---|---|---|---|---|---|---|
lnGDP | -0.32084 | c,0,1 | -5.29544 | -4.0082 | -3.46078 | P>0.1 | nonstationary |
lnTYC | 9.401743 | c,0,1 | -2.81664 | -1.98227 | -1.6012 | P>0.1 | nonstationary |
ΔlnGDP | -1.04696 | c,0,1 | -2.84732 | -1.9883 | -1.60021 | P>0.1 | nonstationary |
ΔlnTYC | 0.208104 | c,0,1 | -2.84724 | -1.98812 | -1.60022 | P>0.1 | nonstationary |
Δ2lnGDP | -3.53499 | c,0,1 | -2.93721 | -2.00628 | -1.59815 | P<0.01 | stationary |
Δ2lnTYC | -3.00523 | c,0,1 | -2.88611 | -1.99586 | -1.59906 | P<0.01 | stationary |
Δ denotes the first-order difference series.
Δ2 denotes the second-order difference series.
c is the intercept term.
t is the trend term.
k is the lag order.
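The decision rule applied to Table 2 can be sketched as follows: the unit-root null hypothesis is rejected when the ADF test value is smaller than the critical value at the chosen significance level. The helper name `adf_decision` is an illustrative assumption; the numbers are copied from Table 2.

```python
def adf_decision(stat, critical):
    """Return the strictest significance level at which the unit-root null
    is rejected, or None if the series is judged non-stationary."""
    for level in ("1%", "5%", "10%"):
        if stat < critical[level]:
            return level
    return None

# (ADF test value, critical values) taken from Table 2
table2 = {
    "lnGDP":   (-0.32084, {"1%": -5.29544, "5%": -4.0082,  "10%": -3.46078}),
    "lnTYC":   (9.401743, {"1%": -2.81664, "5%": -1.98227, "10%": -1.6012}),
    "ΔlnGDP":  (-1.04696, {"1%": -2.84732, "5%": -1.9883,  "10%": -1.60021}),
    "ΔlnTYC":  (0.208104, {"1%": -2.84724, "5%": -1.98812, "10%": -1.60022}),
    "Δ2lnGDP": (-3.53499, {"1%": -2.93721, "5%": -2.00628, "10%": -1.59815}),
    "Δ2lnTYC": (-3.00523, {"1%": -2.88611, "5%": -1.99586, "10%": -1.59906}),
}

decisions = {name: adf_decision(stat, crit) for name, (stat, crit) in table2.items()}
for name, level in decisions.items():
    print(name, "stationary at " + level if level else "non-stationary")
```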
The ADF test shows that the series lnTYC and lnGDP are not themselves stationary, but a linear combination of them may be. Such a combination reflects a possible long-term stable relationship between the series, called a "cointegration relationship (equilibrium relationship)". To establish whether it exists, a cointegration test is required, and its results are shown in Table 3. The null hypothesis "None" states that there is no cointegration relationship between lnTYC and lnGDP. The trace statistic under this hypothesis is 25.60196, which is greater than the 5% critical value of 14.2635 (P<0.01), so the hypothesis is rejected and there is at least one cointegration relationship. The null hypothesis "At most 1" states that lnTYC and lnGDP have at most one cointegration relationship. The trace statistic under this hypothesis is 0.008918, which is smaller than the 5% critical value of 3.841477 (P>0.05), so the hypothesis is accepted. The fit indices of the cointegration equation show that it fits well and reflects the long-term equilibrium relationship between the two variables: there is a positive correlation between the economic benefits of the event (TYC) and the gross domestic product (GDP) of the sports industry.
Table 3. Results of the cointegration test
Null hypothesis | Eigenvalue | Trace statistic | Trace test critical value (5%) | P | Conclusion |
---|---|---|---|---|---|
None | 0.941735 | 25.60196 | 14.2635 | 0.0007 | Reject |
At most 1 | 0.000987 | 0.008918 | 3.841477 | 0.9245 | Accept |
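The trace statistics in Table 3 follow from the eigenvalues through the standard Johansen formula $\lambda_{trace}(r) = -T \sum_{i=r+1}^{k} \ln(1 - \hat{\lambda}_i)$. The sketch below applies it with an effective sample size of $T = 9$, which is an assumption chosen because it approximately reproduces the tabulated values; the function name `trace_statistics` is also illustrative.

```python
import math

def trace_statistics(eigenvalues, T):
    """Johansen trace statistic for each null hypothesis r = 0, 1, ..., k-1."""
    lam = sorted(eigenvalues, reverse=True)
    return [-T * sum(math.log(1.0 - l) for l in lam[r:])
            for r in range(len(lam))]

# Eigenvalues from Table 3; T = 9 is an assumed effective sample size
stats = trace_statistics([0.941735, 0.000987], T=9)
print([round(s, 4) for s in stats])
```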
There is a cointegration relationship between the series lnTYC and lnGDP, i.e., a long-run stable equilibrium relationship between them. In the short term, however, various factors may push the series away from the equilibrium path. Therefore, a vector error correction (VEC) model can be constructed on the basis of the preceding cointegration analysis to link the short-term dynamics of the two series with their long-term equilibrium relationship. When short-term fluctuations of the series are large, convergence to the cointegration relationship is achieved by restricting the behavior of the endogenous variables. The estimation results of the vector error correction model are shown in Table 4. In the VEC model, the coefficients of ΔlnTYC and ΔlnTYC(t-1) are -0.206788 and -0.529791, respectively, which indicates a negative correlation between tournament economic benefits (TYC) and the gross product of the sports industry (GDP) in the short term. This effect is opposite to the positive long-term equilibrium relationship and indicates that the short-term interaction between TYC and GDP is characterized by large fluctuations. To maintain the long-term positive equilibrium relationship, the error correction term adjusts the previous period's disequilibrium (deviation) between the variables with a strength of -1.74408, pulling the system back toward the long-term equilibrium state.
Table 4. Vector error correction model results
Variable | Coefficient estimate | Standard error | T statistic | P |
---|---|---|---|---|
ΔlnTYC | -0.206788 | 0.113925 | -1.81445 | 0.1427 |
C | 0.168401 | 0.038592 | 4.364773 | 0.013 |
ΔlnGDP(t-1) | 0.933958 | 0.169932 | 5.495774 | 0.0052 |
ΔlnTYC(t-1) | -0.529791 | 0.181398 | -2.92057 | 0.0435 |
ECM(t-1) | -1.74408 | 0.253961 | -6.86723 | 0.0026 |
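To illustrate what an error correction coefficient of -1.74408 implies, the following stylized sketch tracks a deviation from long-run equilibrium when the error correction term is the only force acting on it. This is a simplifying assumption: the short-run terms in Table 4 are ignored, and the function name `deviation_path` and the unit starting deviation are hypothetical.

```python
def deviation_path(alpha, d0=1.0, periods=6):
    """Deviation from equilibrium under d_t = d_{t-1} + alpha * d_{t-1}."""
    path = [d0]
    for _ in range(periods):
        path.append(path[-1] + alpha * path[-1])
    return path

path = deviation_path(-1.74408)
print([round(d, 4) for d in path])
```

Because the coefficient lies between -2 and -1, each adjustment overshoots the equilibrium and the deviation changes sign every period while shrinking in magnitude, which is consistent with the large short-term fluctuations described above.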
The impulse response function traces the effect of a one-unit shock to one endogenous variable on the current and future values of the other endogenous variables, and shows clearly how the variables interact dynamically. In this study, impulse response analysis is used to examine the dynamic relationship between sports event influence and revenue growth, as shown in Figure 2. Panel (a) shows the impact of tournament influence (TYC) on earnings growth (GDP), while panel (b) shows the impact of earnings growth (GDP) on tournament influence (TYC).

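Figure 2. Impulse response analysis: (a) response of earnings growth (GDP) to tournament influence (TYC); (b) response of tournament influence (TYC) to earnings growth (GDP)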
Panel (a) shows that when the influence of sports events receives a shock, economic earnings grow in the short term. Over longer horizons, the positive effect of event influence on earnings growth remains relatively stable and weakens slowly as the horizon lengthens. The development of event influence can thus promote economic growth, but the effect of sports events on earnings growth diminishes over time. To promote stable economic growth through sports, it is necessary to develop distinctive sports, create sports projects with regional characteristics, actively build sports-related industrial bases, and continually inject new vitality into the sports industry to sustain its development.
Panel (b) shows that after a one-standard-deviation shock to economic revenue growth, the influence of sports events responds positively and sports undertakings develop in the short term. With a sustained shock, the response peaks in the third period and then declines slowly, although it remains at a high level. This indicates that growth in economic revenue promotes the development of tournament influence; the effect is significant, and it weakens slowly over time. It can be concluded that, for sports to develop soundly, the government needs to increase its financial investment in sports, plan financial resources rationally, and maintain a stable investment cycle.
The Granger causality test assesses whether one set of series helps predict another and thus identifies statistically significant predictive causality between variables. The results of the Granger causality test for TYC and GDP are shown in Table 5. When tournament influence is the dependent variable, the p-value for earnings growth is 0.0488, which is less than 0.05, indicating that earnings growth is a Granger cause of tournament influence. When earnings growth is the dependent variable, the p-value for tournament influence is 0.122, which is greater than 0.05, indicating that tournament influence is not a Granger cause of earnings growth. Therefore, the Granger causal relationship between tournament influence and earnings growth is unidirectional.
Table 5. Results of the Granger causality test
Dependent variable | Exogenous variable | Chi-sq statistic | P |
---|---|---|---|
TYC | GDP | 3.845082 | 0.0488** |
TYC | ALL | 3.845082 | 0.0488** |
GDP | TYC | 2.256788 | 0.122 |
GDP | ALL | 2.256788 | 0.122 |
On the basis of the impulse response analysis and the Granger causality test, variance decomposition is used to examine how much each variable contributes to the other and to itself. The results of the variance decomposition of TYC and GDP for the first 10 periods are shown in Table 6. In the first period, the contribution of tournament influence to itself is largest, at 100%, and it gradually decreases to 93.77% in the tenth period. The contribution of earnings growth to tournament influence is zero in the first period and rises to 6.23% by the 10th period. The contribution of earnings growth to itself is 99.81% in the first period and falls to 76.69% in the 10th period. The contribution of tournament influence to revenue growth is 0.19% in the first period and grows to 23.31% in the 10th period. It can be concluded that tournament influence and revenue growth each contribute most to themselves and less to each other.
Table 6. Variance decomposition
Period | TYC variance due to TYC (%) | TYC variance due to GDP (%) | GDP variance due to GDP (%) | GDP variance due to TYC (%) |
---|---|---|---|---|
1 | 100 | 0 | 99.81126 | 0.18874 |
2 | 94.38029 | 5.61971 | 88.85596 | 11.14404 |
3 | 94.44463 | 5.55537 | 83.70355 | 16.29645 |
4 | 93.84035 | 6.15965 | 81.0841 | 18.9159 |
5 | 93.94914 | 6.05086 | 79.55577 | 20.44423 |
6 | 93.82881 | 6.17119 | 78.56684 | 21.43316 |
7 | 93.826 | 6.174 | 77.87858 | 22.12142 |
8 | 93.81337 | 6.18663 | 77.37445 | 22.62555 |
9 | 93.78918 | 6.21082 | 76.99104 | 23.00896 |
10 | 93.77399 | 6.22601 | 76.6908 | 23.3092 |
This paper uses quarterly year-on-year data on the economic indices of the sporting goods manufacturing industry and the sports service industry from Q1 2006 to Q2 2023, and applies the proposed ARIMA model to measure and forecast the economic returns of China's sports industry. The "sharp peak and thick tail" distribution of the sports economic income index series is highly pronounced, and the series do not follow a normal distribution. Based on the revenue measurement results, a time series prediction model was constructed to investigate the relationship between revenue growth in the sports industry and the influence of events. Before the VAR model is estimated, "Tournament Impact (TYC)" and "Gain Growth (GDP)" are log-transformed; the series lnTYC and lnGDP become stationary after second-order differencing, and the Johansen cointegration test verifies a positive long-run correlation between TYC and GDP. The vector error correction model shows a negative correlation between TYC and GDP in the short run with large fluctuations, and deviations from the long-run equilibrium are corrected with an adjustment strength of -1.74408. Impulse response analysis, the Granger causality test and variance decomposition are used to analyze the relationship between TYC and GDP. In the impulse response analysis, both TYC and GDP respond positively to an initial shock; over longer horizons their mutual positive effects remain relatively stable but decrease slowly as the horizon lengthens.
In the causality test, P<0.05 for GDP when TYC is the dependent variable, i.e., GDP is a Granger cause of TYC, whereas P>0.05 for TYC when GDP is the dependent variable, i.e., tournament influence is not a Granger cause of earnings growth. This indicates a one-way causal relationship between the two variables. Finally, the variance decomposition shows that both tournament influence and earnings growth contribute most to themselves: even in the tenth period their own contributions still reach 93.77% and 76.69%, respectively, while their contributions to each other are lower.