
Application and Innovation in Financial Market Risk Management Based on Big Data and Artificial Intelligence

  
Mar 21, 2025


Introduction

Over the past thirty years, driven by economic globalization and financial integration, the global financial market has developed rapidly while its volatility has grown increasingly intense, exposing enterprises, financial institutions, and ordinary investors to unprecedented financial risks. Financial risk not only seriously disrupts the normal operation of enterprises and financial institutions and the livelihoods of individuals, but also threatens the health and stability of national and even global financial markets and economies [1-4].

With the progress of the times, people's investment philosophy has changed: investors are no longer satisfied with earning income through labor alone, and the financial market has changed drastically in a short period of time. A variety of trading markets have emerged as a result, including the stock exchange market, the currency market, and the debt market, with total daily transaction volumes reaching hundreds of millions of dollars [5-7]. These markets therefore require risk assessment and professional management.

Risk management is essentially the decision-making process by which social organizations or individuals seek to reduce the negative consequences of risk [8-10]. The process comprises risk identification, risk assessment, risk evaluation, selection of risk management techniques, and evaluation of risk management effectiveness [11-12]. Its purpose is to obtain the highest level of security within the expected range at the lowest cost [13-15]. The core and foundation of financial market risk management is the measurement of risk; as financial markets grow more complex, the scale of financial asset transactions expands, and financial theories and instruments develop, risk measurement techniques have become increasingly intricate [16-17]. In this setting, financial institutions and ordinary investors must balance return against risk, and the correct choice of risk control indicators for measuring financial market risk has become the key research question.

The development of science and technology has long shaped the financial field, especially the financial market. Visible and available data are collected through big data, and artificial intelligence is then used to analyze these data in order to assess, predict, and search for ways to reduce risk in the financial market [18]. Literature [19] used artificial intelligence to assess the trust risk of loans in the financing segment of the financial market, and analysis of publicly available data found that such assessment can increase the probability of obtaining a loan for applicants who would not otherwise qualify; this assessment is closer to the retail financial market and benefits both banks and borrowers. Literature [20] shows that internal corporate risk management is significantly correlated with the value added by audits, internal auditors, and internal auditing; targeted regulations can therefore help companies improve their risk management processes, which also serves as a reference for financial market risk management. In addition, literature [21] assessed the role of technologies such as big data and artificial intelligence in the financial sector, realizing fintech that reduces the risk of commercial banks while improving profitability and financial innovation. Literature [22] analyzed the available literature using a hybrid approach and showed that learning to apply artificial intelligence is necessary for financial market participants; AI can assist regulators such as governments and trading practitioners in understanding financial market risk, fraud detection, and credit scoring, and in conducting transactions more effectively. In addition, several studies have shown that gold acts as a hedge in times of stress [23-24]. Literature [25] therefore established an artificial neural network-based intelligent model to predict gold price fluctuations, providing a reference for investors' decision-making in the financial market. Literature [26] describes the use of AI in cryptocurrency trading to address cybersecurity, price trend prediction, and volatility prediction. Literature [27] summarizes recent applications of AI to the stock market for price prediction, sentiment analysis, portfolio optimization, and related tasks, which indirectly reduce market trading risk. Literature [28] analyzed sentiment indicators related to financial market data using algorithms and found that these indicators may be effective for predicting financial risks and issuing warnings. Literature [29] constructed a model of volatility indicators that measure market risk under high-frequency data and analyzed the predictions that can be derived from short-term volatility, assisting the various participants in and monitors of the financial market.

In this paper, financial market risk management is approached from two aspects: risk prediction and risk assessment. For risk prediction, three models are used, namely the GARCH model, the LSTM network, and the ARFIMA model, which are then combined into a hybrid ARFIMA-GARCH-LSTM model. For risk assessment, a VaR model is used to analyze the relative level of financial market risk by measuring stock returns over a given period. Relevant prediction indicators are introduced to empirically analyze the ARFIMA-GARCH-LSTM hybrid model, which is found to better predict the state of financial market risk. The evaluation effect of the VaR model is then verified using the returns of the SSE 50 index.

Theories and methods of risk management in intelligent financial markets
Stock market volatility

Generally speaking, market volatility refers to fluctuation within a certain reasonable range; fluctuation beyond that range is something that needs to be taken seriously.

It has always been the case that negative news in the stock market has a stronger impact on stock market volatility than equivalent positive news. In addition, return series share a common characteristic often referred to as "long memory", that is, correlation between earlier and later observations in the series.

Historical Volatility

This method estimates future volatility from historical volatility; it is easy to use in daily practice and is usually the first measure one encounters. Generally speaking, historical volatility is sufficient for measuring volatility; for example, the heteroskedasticity models in this paper are built on historical volatility. The formula is as follows: $$\sigma^2 = \frac{1}{t-1}\sum\limits_{i=1}^{t}(R_i - \bar{R})^2$$

In the above equation, $\sigma^2$ is the volatility, $R_i$ is the return on the asset, and $\bar{R}$ is the average return over $t$ days. The empirical part of this paper also uses this form to measure volatility.
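As a concrete illustration, the calculation above can be written in a few lines of Python; this is a minimal sketch, assuming `prices` is a hypothetical array of daily closing prices rather than the paper's actual data.

```python
# Minimal sketch of the historical-volatility formula above.
import numpy as np

def historical_volatility(prices):
    returns = np.diff(np.log(prices))       # daily log returns R_i
    # sample variance with the 1/(t-1) correction used in the formula
    return np.sqrt(np.var(returns, ddof=1))

prices = np.array([100.0, 101.2, 100.7, 102.3, 101.9, 103.4])  # placeholder data
print(historical_volatility(prices))        # daily volatility estimate
```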

Implied volatility

Next we introduce another measure: implied volatility, which can be derived by inverting an option pricing model and is commonly used in the options trading market.

Keeping the basic assumptions of the model constant, the volatility $\sigma^2$ can be derived from the following option pricing model: $$C = S N(d_1) - K e^{-r(T-t)} N(d_2)$$ $$d_1 = \frac{\ln\left(\frac{S}{K}\right) + \left(r + \frac{1}{2}\sigma^2\right)(T-t)}{\sigma\sqrt{T-t}}$$ $$d_2 = d_1 - \sigma\sqrt{T-t}$$

Symbols denote: C – the option (call) price, K – the option strike price, S – the spot price of the underlying asset, T – the expiration date, r – the risk-free rate, and N(·) – the standard normal cumulative distribution function; σ is the implied volatility to be solved for.
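Since the pricing formula cannot be inverted for σ in closed form, the implied volatility is usually found numerically. The sketch below, with purely illustrative inputs, backs σ out of the call-price formula above by bisection (here T denotes the remaining time to expiration, i.e., T − t).

```python
# Hedged sketch: implied volatility from the Black-Scholes call price by bisection.
import math
from scipy.stats import norm

def bs_call(S, K, r, sigma, T):
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm.cdf(d1) - K * math.exp(-r * T) * norm.cdf(d2)

def implied_vol(C_mkt, S, K, r, T, lo=1e-4, hi=5.0, tol=1e-8):
    # bisection: the call price is increasing in sigma, so bracket and halve
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, mid, T) > C_mkt:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

print(implied_vol(C_mkt=10.0, S=100.0, K=100.0, r=0.03, T=0.5))  # illustrative inputs
```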

Development and broadening of GARCH models

The idea underlying the ARCH model is that, conditional on the information currently available, the noise at a given time follows a normal distribution with mean zero.

Let $y_t$ be the dependent variable. Conditional on the information set $\Omega_{t-1}$ available up to moment $t-1$, the model is expressed as follows:

Mean equation: $$y_t = E(y_t \mid \Omega_{t-1}) + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_t^2)$$

Variance equation: $$\sigma_t^2 = \alpha_0 + \sum\limits_{i=1}^{p}\alpha_i\varepsilon_{t-i}^2 + \sum\limits_{i=1}^{q}\beta_i\sigma_{t-i}^2$$

where $\alpha_0 > 0$, $\alpha_i \ge 0$ $(i = 1, 2, \dots, p)$, and $\beta_i \ge 0$ $(i = 1, 2, \dots, q)$. When $q = 0$, GARCH($p$, $q$) reduces to the general ARCH($p$) model. $E(y_t \mid \Omega_{t-1})$ is the conditional expectation of the time series, $\varepsilon_t$ denotes the residual term, and $\sigma_t^2$ denotes the conditional variance of the residual term at moment $t$.

The first-order GARCH model, which is used most often in research, has the following form: $$\sigma_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1\sigma_{t-1}^2$$

The GARCH model [30] therefore captures the dynamics of the ARCH model more parsimoniously: at a relatively low order it can represent a high-order ARCH model well. Even so, the shortcomings of the original model are not completely resolved by GARCH.
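As a hedged illustration, a first-order GARCH model of this kind can be fitted with the widely used `arch` Python package; the series below is a simulated placeholder, and the package's parameter names (`omega`, `alpha[1]`, `beta[1]`) correspond to $\alpha_0$, $\alpha_1$, and $\beta_1$ in the notation above.

```python
# Sketch: fit a GARCH(1,1) model to a placeholder return series with the arch package.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = rng.standard_t(df=6, size=1000)      # placeholder heavy-tailed return series

model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
res = model.fit(disp="off")
print(res.params)                              # omega, alpha[1], beta[1] estimates
sigma_t = res.conditional_volatility           # fitted conditional volatility series
```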

In order to describe the asymmetric effects in financial markets and compensate for the shortcomings of the GARCH model, another GARCH-family model, the TGARCH model, is proposed; it can correctly distinguish the effects of positive and negative shocks on conditional volatility.

Mean equation: $$y_t = E(y_t \mid \Omega_{t-1}) + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_t^2)$$

Variance equation: $$\sigma_t^2 = \alpha_0 + \sum\limits_{i=1}^{p}\alpha_i\varepsilon_{t-i}^2 + \sum\limits_{i=1}^{r}\gamma_i\varepsilon_{t-i}^2 d_{t-i} + \sum\limits_{i=1}^{q}\beta_i\sigma_{t-i}^2$$

where $d_{t-i}$ is a dummy variable and $\sum\limits_{i=1}^{r}\gamma_i\varepsilon_{t-i}^2 d_{t-i}$ is known as the TGARCH term.

The difference between this model and the GARCH model is the additional term, called the asymmetric term. When $\sum_{i=1}^{r}\gamma_i > 0$, "bad news" has the stronger impact on the volatility of the series; when $\sum_{i=1}^{r}\gamma_i < 0$, "good news" has the stronger impact.

The first-order TGARCH model is more commonly used in practice and takes the following form: $$\sigma_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \gamma_1\varepsilon_{t-1}^2 d_{t-1} + \beta_1\sigma_{t-1}^2$$

$$d_t = \begin{cases} 0, & \varepsilon_t \ge 0 \\ 1, & \varepsilon_t < 0 \end{cases}$$ where $\varepsilon_t \ge 0$ means "good news" and $\varepsilon_t < 0$ means "bad news".

When $\varepsilon_t \ge 0$, financial market volatility receives a shock of magnitude $\alpha_1$; when $\varepsilon_t < 0$, it receives a shock of magnitude $\alpha_1 + \gamma_1$.
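The asymmetry can be made concrete with a small simulation of the first-order TGARCH recursion above; the parameter values are illustrative only and are not estimates from this paper.

```python
# Sketch: simulate the first-order TGARCH variance recursion with illustrative parameters.
import numpy as np

def simulate_tgarch(n, alpha0=0.05, alpha1=0.05, gamma1=0.10, beta1=0.85, seed=0):
    rng = np.random.default_rng(seed)
    eps = np.zeros(n)
    sigma2 = np.full(n, alpha0 / (1 - alpha1 - 0.5 * gamma1 - beta1))  # rough starting level
    for t in range(1, n):
        d = 1.0 if eps[t - 1] < 0 else 0.0      # dummy variable: 1 for "bad news"
        sigma2[t] = (alpha0 + alpha1 * eps[t - 1]**2
                     + gamma1 * d * eps[t - 1]**2 + beta1 * sigma2[t - 1])
        eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return eps, sigma2

eps, sigma2 = simulate_tgarch(2000)
# negative shocks feed back with weight alpha1 + gamma1, positive shocks with alpha1
```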

Another asymmetric model is the EGARCH model [31], the basic expression of this model is shown below:

Mean equation: $$y_t = E(y_t \mid \Omega_{t-1}) + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma_t^2)$$

Variance equation: $$\ln(\sigma_t^2) = \alpha_0 + \sum\limits_{i=1}^{p}\alpha_i\left|\frac{\varepsilon_{t-i}}{\sigma_{t-i}} - E\!\left(\frac{\varepsilon_{t-i}}{\sigma_{t-i}}\right)\right| + \sum\limits_{i=1}^{r}\gamma_i\frac{\varepsilon_{t-i}}{\sigma_{t-i}} + \sum\limits_{i=1}^{q}\beta_i\ln(\sigma_{t-i}^2)$$

There is another, more commonly used form of the above equation, on which the rest of this paper is based: $$\ln(\sigma_t^2) = \alpha_0 + \sum\limits_{i=1}^{p}\alpha_i\frac{|\varepsilon_{t-i}|}{\sigma_{t-i}} + \sum\limits_{i=1}^{r}\gamma_i\frac{\varepsilon_{t-i}}{\sigma_{t-i}} + \sum\limits_{i=1}^{q}\beta_i\ln(\sigma_{t-i}^2)$$

The sign of $\ln(\sigma_t^2)$ in the above model is not restricted; it can be either positive or negative. $\sum\limits_{i=1}^{r}\gamma_i\frac{\varepsilon_{t-i}}{\sigma_{t-i}}$ is the EGARCH term: if it is negative, "bad news" has the greater impact on the volatility of the series; if it is positive, the effect of "good news" is greater.

In practice, the first-order EGARCH model is most commonly used; its variance equation takes the following form: $$\ln(\sigma_t^2) = \alpha_0 + \alpha_1\frac{|\varepsilon_{t-1}|}{\sigma_{t-1}} + \gamma_1\frac{\varepsilon_{t-1}}{\sigma_{t-1}} + \beta_1\ln(\sigma_{t-1}^2)$$

As the above equation shows, the conditional variance takes an exponential form. When $\varepsilon_{t-1} \ge 0$, the variance equation gives the time series a shock of magnitude $\alpha_1 + \gamma_1$, while when $\varepsilon_{t-1} < 0$ it produces a shock of magnitude $\alpha_1 - \gamma_1$.

Based on the distributional characteristics of the series examined above, the normal distribution is not applicable in the rest of this paper. This paper therefore additionally introduces two widely used distributions.

GED distribution

The GED distribution is widely used in practical research and can describe the sharp-peaked, heavy-tailed characteristics of time series. Its probability density function is given below: $$f(\varepsilon \mid \mu, \sigma, \gamma) = \frac{\gamma}{2\sigma\,\Gamma\!\left(\frac{1}{\gamma}\right)}\exp\left\{-\left|\frac{\varepsilon-\mu}{\sigma}\right|^{\gamma}\right\}$$

The random variable ε obeys the GED distribution, abbreviated as ε ~ GED(μ, σ, γ). The parameter γ refers to the tail thickness indicator, whose value determines the characteristics of the GED distribution.

Student t distribution

The Student t distribution, also abbreviated as t distribution, can also be used to represent sequences with sharp peaks and thick tails.

Its probability density function is given below: $$f_t(x, n) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)}\left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}}, \quad x \in (-\infty, \infty)$$

Let $X_1$, $X_2$ be mutually independent random variables with $X_1 \sim N(0, 1)$ and $X_2 \sim \chi^2(n)$; then $$T = \frac{X_1}{\sqrt{X_2 / n}}$$ follows a t distribution with $n$ degrees of freedom ($n \ge 2$, $n$ an integer), abbreviated $T \sim t(n)$.
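For reference, both densities are available in SciPy, where the GED corresponds to the generalized normal distribution; the snippet below simply evaluates the two probability density functions against the normal benchmark (the shape parameters are illustrative).

```python
# Sketch: compare the GED (generalized normal) and Student-t densities with the normal.
import numpy as np
from scipy.stats import gennorm, t, norm

x = np.linspace(-5, 5, 7)
ged_pdf = gennorm.pdf(x, beta=1.2)   # shape parameter plays the role of gamma above
t_pdf = t.pdf(x, df=5)               # Student t with 5 degrees of freedom
print(np.round(ged_pdf, 4))
print(np.round(t_pdf, 4))
print(np.round(norm.pdf(x), 4))      # normal benchmark: thinner tails
```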

LSTM deep neural network

When there are many dependencies between the layers of a model, a neural network can suffer from vanishing gradients. Scholars have therefore proposed an improved model based on recurrent neural networks, the Long Short-Term Memory model (LSTM). Compared with the original model, it adds a memory storage component, which allows more accurate learning of long-term dependencies between inputs and outputs.

LSTM [32] replaces the activation function layer of the original neural network with a long short-term memory unit, which contains an input gate, a forget gate, and an output gate.

The input passed through the input gate yields the output information: $$a_i^t = \sum\limits_{i=1}^{r} W_{ii} X_i^t + \sum\limits_{h=1}^{H} W_{hi} b_h^{t-1} + \sum\limits_{c=1}^{C} W_{ci} S_c^{t-1}$$ $$b_i^t = f(a_i^t)$$

Similarly, the output information of the output gate is: $$a_\omega^t = \sum\limits_{i=1}^{r} W_{i\omega} X_i^t + \sum\limits_{h=1}^{H} W_{h\omega} b_h^{t-1} + \sum\limits_{c=1}^{C} W_{c\omega} S_c^{t-1}$$ $$b_\omega^t = f(a_\omega^t)$$

The output information of the forget gate is: $$a_\phi^t = \sum\limits_{i=1}^{r} W_{i\phi} X_i^t + \sum\limits_{h=1}^{H} W_{h\phi} b_h^{t-1} + \sum\limits_{c=1}^{C} W_{c\phi} S_c^{t-1}$$ $$b_\phi^t = f(a_\phi^t)$$

According to the update formulas for each gate above, the input information of the memory storage block is: $$a_c^t = \sum\limits_{i=1}^{r} W_{ic} X_i^t + \sum\limits_{h=1}^{H} W_{hc} b_h^{t-1}$$

The memory unit then filters the information of the historical state by means of the forget gate and retains part of it: $$S_c^t = b_\phi^t S_c^{t-1} + b_i^t g(a_c^t)$$

After the above filtering, the output information of the memory unit is: $$b_c^t = b_\omega^t h(S_c^t)$$

In the above equations, $g$ and $h$ are activation functions.

Since the direction in which the LSTM updates information is opposite to the direction of propagation, the LSTM network must propagate backward through time to update and compute the corresponding weights. In this process, the following quantities are defined: $$\delta_j^t = \frac{\partial L}{\partial a_j^t}, \quad \varepsilon_c^t = \frac{\partial L}{\partial b_c^t}, \quad \varepsilon_s^t = \frac{\partial L}{\partial s_c^t}$$

In the above equation, $\delta_j^t$ denotes the error signal of node $j$ at moment $t$; similarly, $\varepsilon_c^t$ and $\varepsilon_s^t$ denote the error signals of the output and the stored state of the memory cell, respectively, and $L$ denotes the error function.

Finally, we obtain the output error of the memory cell and of the output gate: $$\varepsilon_c^t = \frac{\partial L}{\partial b_c^t} = \sum\limits_{k=1}^{K}\frac{\partial L}{\partial a_k^t}\frac{\partial a_k^t}{\partial b_c^t} + \sum\limits_{h=1}^{H}\frac{\partial L}{\partial a_h^{t+1}}\frac{\partial a_h^{t+1}}{\partial b_c^t} = \sum\limits_{k=1}^{K}\delta_k^t\,\omega_{ck} + \sum\limits_{h=1}^{H}\delta_h^{t+1}\,\omega_{ch}$$ $$\delta_\omega^t = \frac{\partial L}{\partial a_\omega^t} = \frac{\partial b_\omega^t}{\partial a_\omega^t}\sum\limits_{c=1}^{C}\frac{\partial L}{\partial b_c^t}\frac{\partial b_c^t}{\partial b_\omega^t} = f'(a_\omega^t)\sum\limits_{c=1}^{C}\varepsilon_c^t\,h(s_c^t)$$

From the above formulas, we further derive more detailed expressions for the error signals of the memory cell: $$\varepsilon_s^t = \frac{\partial L}{\partial s_c^t} = \varepsilon_c^t\, b_\omega^t\, h'(s_c^t) + \omega_{c\omega}\delta_\omega^t + b_\phi^{t+1}\varepsilon_s^{t+1} + \omega_{c\phi}\delta_\phi^{t+1} + \omega_{ci}\delta_i^{t+1}$$ $$\delta_c^t = \frac{\partial L}{\partial a_c^t} = \frac{\partial L}{\partial s_c^t}\frac{\partial s_c^t}{\partial a_c^t} = \varepsilon_s^t\, b_i^t\, g'(a_c^t)$$

Based on the above, the detailed update equations for the error signals of the forget gate and the input gate are obtained in the same way: $$\delta_\phi^t = \frac{\partial L}{\partial a_\phi^t} = \frac{\partial b_\phi^t}{\partial a_\phi^t}\sum\limits_{c=1}^{C}\frac{\partial L}{\partial s_c^t}\frac{\partial s_c^t}{\partial b_\phi^t} = f'(a_\phi^t)\sum\limits_{c=1}^{C}\varepsilon_s^t\, s_c^{t-1}$$ $$\delta_i^t = \frac{\partial L}{\partial a_i^t} = \frac{\partial b_i^t}{\partial a_i^t}\sum\limits_{c=1}^{C}\frac{\partial L}{\partial s_c^t}\frac{\partial s_c^t}{\partial b_i^t} = f'(a_i^t)\sum\limits_{c=1}^{C}\varepsilon_s^t\, g(a_c^t)$$

Finally, the weights are updated based on the error signals obtained above: $$\omega_{ij} = \omega_{ij} - \eta\nabla L(\omega_{ij})$$

where $$\nabla L(\omega_{ij}) = \frac{\partial L}{\partial \omega_{ij}} = \frac{\partial L}{\partial a_j^t}\frac{\partial a_j^t}{\partial \omega_{ij}} = \delta_j^t\, b_i^t$$

This summarizes the detailed process of training, propagation, and derivation for the LSTM.
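In practice the gate recursions and the backward pass above are handled by a deep learning framework. The following is a compact sketch, assuming TensorFlow/Keras is available and using a simulated placeholder series, of training an LSTM on sliding windows of a univariate series; it illustrates the workflow rather than reproducing the paper's exact network.

```python
# Sketch: train a small LSTM on sliding windows of a univariate series.
import numpy as np
import tensorflow as tf

def make_windows(series, timestep=20):
    X = np.array([series[i:i + timestep] for i in range(len(series) - timestep)])
    y = series[timestep:]
    return X[..., np.newaxis], y               # shape: (samples, timestep, 1)

series = np.random.default_rng(0).standard_normal(500).cumsum()  # placeholder series
X, y = make_windows(series)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(X.shape[1], 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=16, verbose=0)
pred = model.predict(X[-1:], verbose=0)        # one-step-ahead forecast
```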

ARFIMA model

The ARMA model handles short-range correlation in a series through p-order autoregression and q-order moving-average terms. From the definition of the ARFIMA model and its derivation, the ARFIMA model can account for both the short and the long memory of a time series, making it better suited to fitting series with long memory.

Long memory concerns the persistent dependence of time series fluctuations, studied by analyzing sub-sequences over long time spans, and has become an indispensable factor in financial time series forecasting models.

Definition 1: For a time series $\{X_t\}$ with kth-order autocorrelation function $\rho_k$, if $\rho_k$ satisfies the following relation: $$\lim\limits_{n \to \infty}\sum\limits_{k=-n}^{n}|\rho_k| \to \infty$$

Then {Xt} is said to have long memory.

Definition 2: If the time series $\{X_t\}$ is stationary and, for $0 < d < 0.5$, the autocorrelation function $\rho_k$ and the lag order $k$ satisfy the following relation: $$\rho_k \sim c k^{2d-1}, \quad k \to \infty$$

Then $\{X_t\}$ is said to have long memory, and $\rho_k$ decays slowly as $k$ increases.

The Hurst exponent $H$ is used to characterize long memory. Peters further developed rescaled range (R/S) analysis to estimate the Hurst exponent:

Let the time series $\{X_t\}$ have length $N$ and divide it equally into $A$ sub-intervals of length $n$, so that $N = A \cdot n$. The cumulative deviation within each sub-interval is $$X_{t,a} = \sum\limits_{u=1}^{t}(x_{u,a} - M_a), \quad t = 1, 2, \dots, n$$

where $x_{u,a}$ is the uth value in the ath sub-interval and $M_a$ is the mean of that interval. Then $$R_a = \max(X_{t,a}) - \min(X_{t,a})$$ $$S_a = \sqrt{\frac{1}{n}\sum\limits_{u=1}^{n}(x_{u,a} - M_a)^2}$$ $$Q_a = \frac{R_a}{S_a}$$

Here the rescaled-range statistic $Q_a$ of the ath sub-interval is the quotient of that interval's range $R_a$ and its standard deviation $S_a$. Taking the mean over the $A$ sub-intervals gives the R/S statistic for sub-interval length $n$: $$Q_n = \frac{1}{A}\sum\limits_{a=1}^{A} Q_a$$

In the above process, choosing sub-intervals of different lengths $n$ yields different R/S statistics. The relationship between the statistic and the Hurst exponent $H$, obtained by Hurst in his study, is: $$Q_n = K n^H$$

where $K$ is a constant that does not affect $H$. Since $H$ appears as a power, it is difficult to solve for $H$ directly, so a logarithmic transformation produces a linear equation in $H$: $$\ln(Q_n) = \ln(K) + H\ln(n)$$

The linear relationship between $\ln(Q_n)$ and $\ln(n)$ is estimated by least squares, and the resulting $H$ lies in the range 0 to 1 (excluding 0 and 1). Whether the time series has long memory is then determined from the Hurst exponent as follows:

When $0.5 < H < 1$, the time series $\{X_t\}$ has long memory: the series at moment $t + 1$ tends to move in the same direction as at moment $t$, and the closer $H$ is to 1, the higher the probability of moving in the same direction and the stronger the memory, and vice versa. The values of the series are positively correlated.

When $H = 0.5$, the time series $\{X_t\}$ is a standard random walk: the data at different time points are unrelated, the past has no influence on the future, and the series has only short memory.

When $0 < H < 0.5$, the time series $\{X_t\}$ is mean-reverting: the series at moment $t + 1$ tends to move in the opposite direction from moment $t$, and the closer $H$ is to 0, the greater the probability of moving in the opposite direction.

After the series is judged to have long memory, the fractional difference order $d$ can be obtained from the relationship between the Hurst exponent and the fractional difference order, $H = d + 0.5$, and fractional differencing can then be applied to obtain a stationary series for analysis.
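The R/S procedure above translates directly into code. The sketch below estimates $H$ by regressing $\ln(Q_n)$ on $\ln(n)$ with least squares; the set of sub-interval lengths is an illustrative choice.

```python
# Sketch: rescaled-range (R/S) estimate of the Hurst exponent.
import numpy as np

def hurst_rs(x, lengths=(10, 20, 25, 50, 100)):
    x = np.asarray(x, dtype=float)
    log_n, log_q = [], []
    for n in lengths:
        A = len(x) // n
        qs = []
        for a in range(A):
            seg = x[a * n:(a + 1) * n]
            dev = np.cumsum(seg - seg.mean())      # cumulative deviations X_{t,a}
            R = dev.max() - dev.min()              # range R_a
            S = seg.std(ddof=0)                    # standard deviation S_a
            if S > 0:
                qs.append(R / S)
        log_n.append(np.log(n))
        log_q.append(np.log(np.mean(qs)))
    H, _ = np.polyfit(log_n, log_q, 1)             # slope = Hurst exponent H
    return H

x = np.random.default_rng(0).standard_normal(1000)
print(hurst_rs(x))      # ~0.5 for an uncorrelated series; d = H - 0.5
```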

Although the ARMA model is the preferred model for time series modeling, integer-order differencing may over-difference the series, losing the information in its low-frequency components and lowering prediction accuracy. If $\{Z_t, t \in T\}$ is the dth-order difference of $\{X_t, t \in T\}$, write $X_t \sim I(d)$. Let the spectral density function of $\{Z_t, t \in T\}$ be $f(w)$; then the spectral density function of $\{X_t, t \in T\}$ is: $$f_x(w) = \left|1 - e^{-iw}\right|^{-2d} f(w), \quad w \ne 0$$

To examine whether first-order differencing causes over-differencing: from the relation above, the product of the original series' spectral density $f(w)$ and $\left|1 - e^{-iw}\right|^{2}$ is the spectral density of the differenced series, which in the neighborhood of the origin is equivalent to multiplying by $2(1 - \cos w)$. If the series $\{Z_t, t \in T\}$ is stationary, then: $$\lim\limits_{w \to 0} f(w) = c, \quad c > 0$$

If $c = 0$, the integer differencing of $\{X_t\}$ is over-differencing. If a series suited to fractional differencing is instead differenced at integer order, its spectral density after first-order differencing becomes $f_{\Delta x}(w) = [2(1 - \cos w)]^{1-d} f(w)$. Since $\lim\limits_{w \to 0} f_{\Delta x}(w) = 0$, over-differencing occurs, and over-differencing affects the variance of the series; the MA(1) model is a typical example.

The autocorrelation function of a stationary series can be regarded as approximately normal with mean zero and standard deviation $1/\sqrt{N}$, so a band of $2/\sqrt{N}$ can be set according to the two-standard-deviation rule: as long as the autocorrelation and partial autocorrelation functions fall within this band, the function is considered to have cut off.

In modeling, the order can be identified from the autocorrelation and partial autocorrelation plots by roughly determining after which order the function values begin to converge to zero, giving initial values of $p$ and $q$. Because reading the plots is subjective, information criteria can be used to find the relatively optimal order combination within a limited range of orders. Commonly used criteria are AIC, BIC, and HQ: $$AIC = -2\ln(L) + 2k$$ $$BIC = -2\ln(L) + \ln(n)\,k$$ $$HQ = -2\ln(L) + \ln(\ln(n))\,k$$

where $L$ is the maximum likelihood value of the model, $n$ is the sample size, and $k$ is the number of unknown parameters. For all three criteria, the smaller the statistic, the better the model fit. The modeling process of the ARFIMA model is shown in Figure 1.
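For the order-selection step, an information-criterion search can be carried out with statsmodels; in this hedged sketch, `z` is a simulated stand-in for the (fractionally) differenced series, and the grid of (p, q) values is illustrative.

```python
# Sketch: choose (p, q) by minimizing AIC over a small grid with statsmodels.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

z = np.random.default_rng(0).standard_normal(500)   # placeholder differenced series

best = None
for p in range(3):
    for q in range(3):
        res = ARIMA(z, order=(p, 0, q)).fit()
        if best is None or res.aic < best[0]:
            best = (res.aic, p, q)
print("best (AIC, p, q):", best)                    # res.bic and res.hqic work similarly
```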

Figure 1.

Modeling process of ARFIMA model

Risk prediction and measurement models
ARFIMA-GARCH and LSTM based hybrid models

The serial hybrid approach, i.e., the linear-combination approach, first decomposes the data into a linear and a nonlinear part, fits the linear component with the ARFIMA model, processes the residual (nonlinear) component with the LSTM model, and takes the linear combination of the two predictions as the final predicted value.

The original data $\{x_i\}$ is first identified and decomposed into two parts, where $L_i$ represents the linear component and $N_i$ the nonlinear component: $$x_i = L_i + N_i$$

The linear component of the series $\{x_i\}$ is predicted using the linear ARFIMA-GARCH model to obtain the linear prediction $\hat{L}_i$, and the residual nonlinear component $N_i$ is calculated from the linear prediction results: $$N_i = x_i - \hat{L}_i$$

Finally, the LSTM model is used to fit and predict the nonlinear component $N_i$; the model prediction process is shown in Figure 2. At this point, the linear component of the data has been removed, so the LSTM's ability to handle nonlinear components can be fully exploited to obtain the nonlinear prediction $\hat{N}_i$, and then $\hat{x}_i$ is calculated: $$\hat{x}_i = \hat{L}_i + \hat{N}_i$$
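The serial decomposition can be summarized in a few lines. In the sketch below, `arfima_garch_predict` and `lstm_predict` are hypothetical placeholders for the two fitted models; only the decompose-and-recombine logic described above is shown.

```python
# Schematic sketch of the serial hybrid: linear fit, residual fit, recombination.
import numpy as np

def hybrid_series_forecast(x, arfima_garch_predict, lstm_predict):
    L_hat = arfima_garch_predict(x)      # linear component forecast \hat{L}_i (placeholder model)
    N = x - L_hat                        # residual (nonlinear) component N_i
    N_hat = lstm_predict(N)              # nonlinear component forecast \hat{N}_i (placeholder model)
    return L_hat + N_hat                 # \hat{x}_i = \hat{L}_i + \hat{N}_i
```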

Figure 2.

Series model prediction flow chart

Depending on the rule used to assign weights, the weighted combination method comes in many variants; in this paper, the combination is carried out using the inverse-error method.

The data are modeled and predicted with the LSTM and ARFIMA-GARCH models separately, and the inverse-error method is then used to assign weights to the two models (the model with the larger error receives the smaller weight, and the model with the smaller error the larger weight): $$LSTM_{weight} = \frac{ARFIMA\text{-}GARCH_{error}}{LSTM_{error} + ARFIMA\text{-}GARCH_{error}}$$ $$ARFIMA\text{-}GARCH_{weight} = \frac{LSTM_{error}}{LSTM_{error} + ARFIMA\text{-}GARCH_{error}}$$ $$Hybrid_{pred} = LSTM_{weight} \times LSTM_{pred} + ARFIMA\text{-}GARCH_{weight} \times ARFIMA\text{-}GARCH_{pred}$$
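The inverse-error weighting itself is a one-line calculation; in the sketch below the error arguments stand for any error measure (for example RMSE) computed on a validation set.

```python
# Sketch: inverse-error weighted combination of two model predictions.
def inverse_error_combine(lstm_pred, arfima_garch_pred, lstm_error, arfima_garch_error):
    total = lstm_error + arfima_garch_error
    w_lstm = arfima_garch_error / total          # smaller error -> larger weight
    w_ag = lstm_error / total
    return w_lstm * lstm_pred + w_ag * arfima_garch_pred
```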

VaR model

VaR is short for Value at Risk. It is a statistical estimate of the loss that an asset may incur over a holding period due to normal market fluctuations within a certain risk range. From the standpoint of a financial institution, VaR can be defined as the maximum loss that a financial asset can incur with a given probability over a defined period of time. From a statistical point of view, VaR is the maximum loss in the value of a financial asset or portfolio of securities that could be expected to occur over a given future period under normal market volatility at a given probability (confidence) level.

We know that future returns can be positive or negative; a negative return is what is called a loss or risk, and a positive return is a profit. If the random return of a financial asset at some future time is denoted $\Delta P$, then statistically the VaR at confidence level $1 - \alpha$ ($\alpha$ is the significance level) must satisfy: $$P\{\Delta P > VaR\} = 1 - \alpha$$

From the above, the VaR calculated according to this definition is negative, but in practice VaR is usually taken as a positive value, so the expression for calculating VaR can be written as: $$P\{P_t > -VaR_t\} = 1 - \alpha$$

where $P_t$ is the return of the asset at moment $t$, and $VaR_t$ is the VaR at moment $t$ at the $1 - \alpha$ level, taken as a positive value.

If the returns are negated, i.e., letting $R_t = -P_t$, this becomes: $$P\{R_t < VaR_t\} = 1 - \alpha$$

From this expression it can be seen that VaR is essentially the $1 - \alpha$ quantile of the probability distribution function of $R_t$; calculating VaR therefore amounts to estimating the $1 - \alpha$ quantile of the distribution that $R_t$ satisfies.

The value of $\alpha$ is determined by the financial institution's sensitivity to risk: the more sensitive it is, the smaller $\alpha$ is, and $\alpha$ is generally taken as 0.01, 0.05, or 0.1.

According to the expression for VaR, if the $\alpha$th quantile of the distribution function $F(X)$ of the standardized series $\{R_t\}$ is $X_\alpha$, then the general formula for VaR is: $$VaR_t = u_t + \sigma_t X_{1-\alpha}$$

where $u_t$ and $\sigma_t$ are the mean and standard deviation of the series $\{R_t\}$ at moment $t$, respectively.

To estimate the VaR at moment $t$, it is sufficient to know the mean $u_t$ of the series at moment $t$, the standard deviation $\sigma_t$, and the $1 - \alpha$ quantile $X_{1-\alpha}$ of the distribution function of the standardized series $\{R_t\}$.
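As a minimal illustration of the formula above, the parametric VaR can be computed with a distributional quantile; here the standard normal quantile is used, and the values of $u_t$ and $\sigma_t$ are placeholders that would in practice come from the fitted volatility model.

```python
# Sketch: parametric VaR_t = u_t + sigma_t * X_{1-alpha} with a normal quantile.
from scipy.stats import norm

def parametric_var(mu_t, sigma_t, alpha=0.05):
    x = norm.ppf(1 - alpha)              # (1 - alpha) quantile of the standardized series
    return mu_t + sigma_t * x

print(parametric_var(mu_t=0.0003, sigma_t=0.015, alpha=0.05))  # illustrative inputs
```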

The VaR calculation method is the core of the whole estimation of the VaR value, and different estimation methods have been produced depending on the starting point.

To portray the mean $u_t$ and standard deviation $\sigma_t$ of the series more accurately, scholars studying financial time series found that most such series exhibit sharp peaks, heavy tails, leverage effects, and similar features. They therefore use ARCH-family models (such as TARCH-family and EGARCH-family models), which can capture these features, to fit the volatility of the series, obtaining $u_t$ and $\sigma_t$ at each moment from the fitted model and thus solving the problem of estimating them. For the estimation of $X_{1-\alpha}$, it is often assumed that the series $R_t$ follows a normal distribution, from which the $1 - \alpha$ quantile of the standardized series is obtained. However, many financial time series do not satisfy the normal distribution, so to make the estimated VaR better reflect the real behaviour of financial time series, $R_t$ is instead assumed to follow a t distribution, a skewed t distribution, and so on, yielding a family of parametric VaR estimation methods under different distributional assumptions.

To avoid making any assumption about the distribution of $R_t$, a nonparametric VaR estimation method, the historical simulation method, has been proposed. It assumes that the historical returns of an asset are equally likely in the future, i.e., that returns follow a relatively stable distribution that will not change significantly; the future profit-and-loss distribution of the asset is then simulated with historical samples, and the VaR at a given confidence level is read off from the sample quantiles.

The main steps of the historical simulation method are:

First, a time series of returns over an appropriate time period is selected as historical data.

Then, the selected historical return series are sorted from largest to smallest.

Finally, for a given confidence level, say 95%, find the sample point at the 95th percentile of the sorted sample; the value at that point is the asset's future VaR at the 95% confidence level.
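These steps amount to reading a sample quantile off the historical data, as in the following sketch with a simulated placeholder return series.

```python
# Sketch: historical-simulation VaR as a sample quantile of losses.
import numpy as np

def historical_var(returns, confidence=0.95):
    losses = -np.asarray(returns)              # treat losses as positive numbers
    return np.quantile(losses, confidence)     # e.g., the 95th percentile loss

rets = np.random.default_rng(0).normal(0.0005, 0.015, size=1000)  # placeholder returns
print(historical_var(rets, 0.95))
```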

The extreme value theory method was proposed as scholars continued to improve the calculation of VaR. Extreme value theory mainly studies the tail characteristics of the distribution of independent and identically distributed (i.i.d.) sequences, and from the definition of VaR it is precisely the tail of the data distribution that matters. In early studies, scholars therefore assumed directly that the time series was i.i.d. and applied extreme value theory to the unprocessed series to compute VaR. In practice, however, the original series is not i.i.d., whereas the residual series filtered by an ARCH-family model can satisfy the i.i.d. condition; accordingly, an ARCH-family model is first fitted to the original series $\{R_t\}$, extreme value theory is then used to describe the residual series, and the VaR is obtained from that.

Denote the residual series obtained after fitting an ARCH-family model to the series $R_t$ by $e_t$.

Then: $$\begin{aligned} P\{R_t < VaR_t\} &= P(u_t + \varepsilon_t < VaR_t) \\ &= P(u_t + e_t\sigma_t < VaR_t) \\ &= P\!\left(e_t < \frac{VaR_t - u_t}{\sigma_t}\right) \\ &= 1 - \alpha \end{aligned}$$

That is: $$VaR_e = \frac{VaR_t - u_t}{\sigma_t}$$

So the value at risk $VaR_t$ of the series $\{R_t\}$ and the value at risk $VaR_e$ of the residual series $\{e_t\}$ satisfy the following relationship: $$VaR_t = u_t + \sigma_t VaR_e$$

where $u_t$ and $\sigma_t$ are the mean and standard deviation of the series $\{R_t\}$ at moment $t$, respectively.

Empirical analysis of model forecasts of index volatility
Feature Selection and Data Preprocessing

The empirical object of this paper is the SSE 50 index and its volatility, over the period from January 5, 2016 to September 30, 2021. The model features are constructed in four parts: volume and price data, parameter data of the GARCH-family models, attention index data, and realized volatility data. The volatility of the SSE 50 index, which is the dependent variable of the model, is represented by the realized volatility calculated from intraday high-frequency data.

Volume and price data regularization

The first part of the features is the volume and price data of the SSE 50 index, comprising six series: the opening price, high price, low price, closing price, trading volume, and turnover. The data come from the financial data of the MiBasket quantitative platform. The trend of the daily closing price of the SSE 50 Index from January 5, 2016 to September 30, 2021 is shown in Figure 3; the graph shows that the SSE 50 Index was highly volatile during 2016-2017. Table 1 displays the descriptive statistics of the six volume and price series.

Figure 3.

The trend of the closing price

Table 1. Descriptive statistics of the volume and price data

Variable Mean Median Maximum value Minimum value
Opening price 2619.061 2642.19 2597.586 2621.28
Maximum price 2614.54 2640.291 2591.675 2623.009
Lowest price 1915.107 1953.378 1874.166 1912.711
Closing price 3494.843 3494.713 3403.695 3458.697
Volume 339.332 341.808 334.741 338.293
Turnover amount 2619.051 2642.2 2597.486 2621.25
GARCH family model parameter estimation

The second part of the features consists of the parameters of the GARCH-family models. The GARCH family in this paper includes three models, the GARCH, eGARCH, and tGARCH models, with a total of 11 parameters: three parameters from the GARCH model, four from the eGARCH model, and four from the tGARCH model. First, the daily return series is calculated from the closing price data of the SSE 50 index from January 5, 2016 to September 30, 2021; the results are shown in Figure 4. The daily return series of the SSE 50 index appears basically stationary and roughly symmetrically distributed around the zero axis, but a stationarity test is needed before modeling the time series data.

Figure 4.

Daily return series of the SSE 50 index

The classical test is the unit root test, i.e., the ADF test. The null hypothesis (H0) of the ADF test is that the time series under study contains a unit root; if the ADF test statistic is smaller than the critical value at a given significance level, the null hypothesis can be rejected at the corresponding confidence level, indicating that the series contains no unit root and is stationary. Table 2 shows the descriptive statistics of the daily return series of the SSE 50 index together with the ADF test statistic and p-value. The ADF statistic of the daily return series is -7.625, smaller than the critical value at the 1% significance level, and the p-value is very close to 0, so the null hypothesis can be rejected, indicating that the return series of the SSE 50 index is stationary.

Table 2. Descriptive statistics of the daily return series

Mean Median Maximum value Minimum value Standard deviation Kurtosis Skewness ADF test (p-value)
2.78e-4 4.62e-4 7.82e-2 -9.35e-2 1.52e-2 6.475 -0.588 -7.625(0.000)
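As a hedged illustration, the ADF test reported in Table 2 can be reproduced with statsmodels; the series below is a simulated placeholder rather than the SSE 50 returns.

```python
# Sketch: ADF unit-root test on a placeholder return series.
import numpy as np
from statsmodels.tsa.stattools import adfuller

returns = np.random.default_rng(0).normal(0, 0.015, size=1200)   # placeholder series
stat, pvalue, *_ = adfuller(returns)
print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.4f}")
# A statistic below the 1% critical value (p-value near 0) rejects the unit-root null.
```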

Before using the GARCH-family models, an ARCH effect test must be carried out on the time series, i.e., a test for serial correlation of the conditional heteroskedasticity. The ARCH effect test is generally performed with the Ljung-Box test, whose null hypothesis is that the time series is white noise. The squared daily return series of the SSE 50 index is shown in Figure 5, which clearly shows significant volatility clustering. The Ljung-Box test is used below to test whether the squared daily return series of the SSE 50 index is autocorrelated.

Figure 5.

Squared daily returns of the SSE 50 index

With the lag order set from 1 to 10, the p-values of the LB statistic of the Ljung-Box test change as shown in Figure 6. The p-values are all far below 0.05, so the null hypothesis that the squared daily return series of the SSE 50 index is white noise can be rejected, indicating that the series exhibits an ARCH effect. Having passed the ARCH effect test, the daily return series of the SSE 50 index can be modeled with the GARCH-family models.
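The Ljung-Box check on the squared returns can likewise be carried out with statsmodels, as in the sketch below with a placeholder series.

```python
# Sketch: Ljung-Box test on squared returns for lags 1-10.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

returns = np.random.default_rng(0).normal(0, 0.015, size=1200)   # placeholder series
lb = acorr_ljungbox(returns**2, lags=10, return_df=True)
print(lb["lb_pvalue"])     # p-values well below 0.05 indicate an ARCH effect
```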

Figure 6.

p-values of the Ljung-Box test

Attention Index Collection

The effect of attention on stock market volatility has been studied in related fields: the volatility of the S&P 500 index has been predicted using deep learning with Google Trends as a feature, and the volatility of the CSI 300 index has been predicted using the Baidu index as a model input. Following these selection methods, this paper chooses 20 economy- and finance-related keywords as indicators reflecting macroeconomic factors and collects daily Baidu index data for these keywords. These data offer a fresh perspective for studying volatility. Taking one of the 20 keywords, such as "financial crisis", and comparing its trend with the realized volatility of the SSE 50 Index, as shown in Figure 7, it can be seen that the two move roughly together: when the search volume for "financial crisis" is large, the volatility of the SSE 50 Index is also large, and when the search volume is small, the volatility is also small.

Figure 7.

Comparison of realized volatility and the "financial crisis" search index

Analysis of model prediction effects

First, the model is used to predict the linear part of the test-set data, giving its specific expression; second, the nonlinear residual series et is calculated from the linear prediction results and normalized to obtain the normalized residual series. The samples are again divided into training and test sets in a 3:1 ratio, and the optimal model parameters timestep = 20 and batch size = 16 are used. The normalized residual series is fed into the model for training, and the autocorrelation function of the nonlinear-component error is shown in Figure 8. Apart from being significantly non-zero at order 0, the autocorrelation function of the error is controlled within the confidence interval at almost all other orders, so the network error can be considered within the required range. The loss curve is shown in Figure 9; the error is almost entirely concentrated in the range 0.002-0.005, indicating that the network is well trained.

Figure 8.

Autocorrelation function of the nonlinear component error

Figure 9.

Training results of the nonlinear component network

The linear and nonlinear predictions are linearly superimposed to obtain the prediction results of the hybrid model, shown in Figure 10; the prediction performance indexes are shown in Table 3. The serial hybrid model clearly has a better prediction effect: the predicted and actual curves overlap closely, and the MSE, RMSE, and MAE values are all small, at 0.000386, 0.019559, and 0.014479, respectively.

Figure 10.

Series hybrid model prediction

Table 3. Prediction performance indexes of the series hybrid model

Model MSE RMSE MAE
ARFIMA-GARCH-LSTM 0.000386 0.019559 0.014479
Risk metrics based on VaR modeling

Stock return prediction can be achieved through a variety of methods. The model in this paper has the best prediction effect, and optimizing the hyperparameters improves it slightly further. The residual series of the model is plotted as a histogram, shown in Figure 11, from which it can be seen that the residuals approximately follow a normal distribution.

Figure 11.

Histogram of the residual distribution

The descriptive statistics of the residual series are shown in Table 4. The mean is -0.0011645, which is approximately 0; the standard deviation is 0.0245188; the skewness S = -0.025 is approximately 0; and the kurtosis K = 3.312 is close to the kurtosis of the standard normal distribution, which is about 3. The residuals therefore approximately follow a normal distribution.

Table 4. Descriptive statistics of the residual series

Variable Numerical value
Mean value -0.0011645
Standard deviation 0.0245188
Maximum value 0.1445756
Minimum value -0.1678244
Skewness -0.025
Kurtosis 3.312

Next, rolling forecasts of the standard deviation of returns are made with the model; the optimization process is the same as for the return prediction. The optimized model parameters are epochs = 10, batch size = 8, N = 9, lstm units = 100, optimizer = "adam", dropout = 0.1, and activation = "tanh". The final loss value of the model is 0.00601476. The fitted line graph of the standard deviation prediction is shown in Figure 12; the prediction works well, with the predicted and actual values overlapping closely and the model fitting the trend and direction of the standard deviation.

Figure 12.

Prediction of the standard deviation

Regarding the measurement of VaR: in the historical simulation method, VaR is calculated from the empirical distribution of stock returns constructed from historical data. In this paper, the distribution of stock return forecasts is obtained with the deep learning method, from which the risk measure VaR and the value-at-risk series are computed. The portion of the test set obtained from the measurement is shown in Figure 13; it can be seen that the value at risk fluctuates sharply.

Figure 13.

Trend of the value at risk (VaR)

The 1% and 5% sample quantiles of stock returns are used as the stock market risk warning lines, and the warning chart is shown in Figure 14. An early warning signal is issued when the value at risk falls below the warning line. From the warning chart, it can be seen that around July 2020 the value at risk VaR is at a trough; zooming in on this interval, as shown in Figure 15, between June 25, 2020 and around July 10, 2020 the value at risk is below the warning line at the 1% sample quantile.

Figure 14.

Risk alert diagram

Intercepting the stock returns in this interval again to observe the true state of the returns, a bar chart is plotted as shown in Figure 15. Most of the returns fluctuate at a low level, except on July 7, 8, and 10, 2020, when the returns are slightly above zero. This verifies that the period is in the risky zone.

Figure 15.

Bar chart of stock returns over the interval

Financial market risk management strategies
Enhanced risk identification

In the financial market, accurate risk identification is a prerequisite for risk management. To achieve this goal, it is necessary to establish a comprehensive risk identification framework and systematize the identification of various risks by combining advanced data analysis techniques.

First, financial institutions need to establish a sound risk identification mechanism, which can adopt a risk identification matrix to categorize and grade risks and clarify the characteristics, sources, and impacts of various types of risks.

Second, financial institutions should ensure the comprehensiveness, accuracy, and timeliness of data by building a perfect data collection system.

Third, financial institutions should focus on the dynamic and forward-looking nature of risk identification. To this end, financial institutions should establish an early warning system for risk identification that tracks market changes in a timely manner to dynamically adjust risk identification strategies.

Fourth, risk identification is not only the responsibility of the risk management department, but also requires the cooperation of other departments, such as business, compliance, and audit. Financial institutions need to set up regular cross-departmental risk assessment meetings to exchange risk experience and countermeasures from each department to form a unified risk identification and management strategy.

Fifth, financial institutions need to strengthen external cooperation and information exchange to enhance their risk identification capabilities through industry cooperation and guidance from regulators.

Optimizing risk assessment

In risk management in financial markets, optimizing risk assessment is a key step in enhancing the efficiency and effectiveness of risk management.

First, financial institutions must ensure that risk assessment models are highly accurate, which requires constant updating and maintenance of the assessment models to ensure that they can truly reflect the latest risk dynamics.

Second, financial institutions need to establish a multi-dimensional risk assessment system, which should assess risks from multiple perspectives.

Third, with the development of information technology, real-time data analysis has become possible, and financial institutions need to utilize real-time data streams to continuously monitor risk indicators in order to quickly identify and respond to risks brought about by market movements.

Fourth, in order to optimize risk assessment in a comprehensive manner, financial institutions need to establish a cross-departmental risk management team, which should have expertise and experience in multiple fields, so as to analyze and assess risks from different perspectives.

Enhanced risk control

Financial institutions enhance risk control by establishing a multi-level risk control system, introducing advanced risk control technology, strengthening risk control coordination, and improving the adaptability of risk control.

First, financial institutions should establish a multi-level risk control system to ensure the comprehensiveness and effectiveness of risk control by setting risk limits, establishing internal control mechanisms, and formulating contingency plans.

Second, the implementation of a risk limit system by financial institutions is the basis for strengthening risk control, a strategy that requires institutions to set clear risk-tolerance thresholds and ensure that all operations are carried out within these limits.

Third, refined capital management is also a key component of enhanced risk control. Financial institutions are required to enhance their risk resilience by maintaining adequate capital buffers.

Fourth, financial institutions need to identify and correct deficiencies and loopholes in the risk management process in a timely manner through the establishment of a sound internal control system and frequent audits and inspections.

Fifth, the use of insurance and derivatives by financial institutions for risk transfer is a common means of controlling risks. The proper use of derivatives can help institutions lock in costs, reduce fluctuations in returns, and protect them from extreme market volatility. Enhanced use of derivatives will lead to better risk control.

Conclusion

Risk management is a central focus of financial market research. This paper combines traditional econometric models with a deep learning model to construct a more accurate prediction model for forecasting stock market risk and to measure that risk; the conclusions are as follows:

The error autocorrelation function of the model in this paper is non-zero at order 0 but is controlled within the confidence interval at the other orders, and the prediction error is concentrated between 0.002 and 0.005, which is very small. The models in this paper are combined in series; the predicted curves obtained in this way overlap closely with the actual ones, and the values of the relevant prediction indexes are all below 0.02. It is therefore considered that the model network is well trained, the prediction accuracy meets the requirements, and the serial connection of the hybrid model has a positive effect on its prediction accuracy.

In terms of risk measurement, a VaR assessment model is constructed, with the 1% and 5% sample quantiles used as warning lines; when the value at risk falls below the warning line, risk is high and preventive measures should be taken. On this basis, this paper discusses specific risk management strategies in depth from three aspects: improving the risk identification mechanism, optimizing the risk assessment method, and strengthening risk control measures.
