Research on multivariate statistical analysis methods of rural economic dynamics in the context of digital countryside

Introduction

Alongside continued monitoring of the dynamics of agricultural resources, appropriately enriching the content of agricultural and rural economic monitoring and broadening its scope is a breakthrough point for agricultural zoning work, and doing this work well is of great significance [1-3]. It helps to accelerate the transformation of the economic system and of the pattern of economic growth, and to speed up the shift from traditional to modern agriculture. Achieving this goal requires providing governments at all levels and the broad body of farmers with monitoring information on the supply of agricultural products and production materials, agricultural science and technology, and disaster prevention and mitigation [4-6]. Secondly, agricultural economic monitoring is needed to adapt to the development of the socialist market economy. Under a socialist market economic system, formulating rural industrial policy, adjusting the structure of the rural economy, planning the regional layout, and developing advantageous industries and regional economies must all be oriented to the market and grounded in information on market supply and demand [7-9]. Whether the monitoring work can keep pace bears directly on whether the government's macro-management and regulation goals can be realized and whether sharp swings in agricultural production and in market supply and demand can be avoided.

Agricultural economic monitoring is also needed to transform the mode of agricultural and rural economic growth. From the perspective of the growth mode, agricultural resources are tight and inputs insufficient on the one hand, while on the other hand operation is extensive, the utilization rate of production factors is low, and waste is serious [10-12]. This requires a comprehensive grasp of information on the flows of all types of agricultural production factors and reliance on science and technology to develop and utilize resources rationally. Finally, agricultural economic monitoring is needed to strengthen the functions of the agricultural zoning system. In terms of content, the dynamic monitoring of agricultural resources is currently limited to a few indicators such as arable land, labor, and field efficiency; its coverage is relatively narrow and the application of monitoring results is unsatisfactory [13-15]. Meanwhile, the statistical department conducts the agricultural census and agricultural departments are building and improving their own information systems; if the agricultural zoning department does not actively expand its scope of business, the functions assigned to it by governments at all levels not only cannot be strengthened, but its original functions may well be lost [16-17].

The significance of statistical work in the rural economy is that it makes it possible to investigate the actual situation of rural agriculture and farmers and to introduce targeted policies to guide the development of the agricultural economy. In the new period, rural statistical work faces new requirements, and current grassroots rural statistical work therefore exhibits a number of problems: statisticians lack strong professional skills, statistical methods remain traditional and backward, and the statistical management system is imperfect. Ideological understanding of grassroots rural statistical work needs to be improved and new statistical working methods adopted [18-21]. At the same time, given the importance of rural statistical work, rural economic statistics face difficulties in data collection, remain rather traditional, lack modern technology and tools, and rely on backward working methods, so innovation in rural statistical work needs to be strengthened. It is difficult to ensure the quality and authenticity of rural economic statistics by relying on manpower alone; the informatization of rural economic statistics should therefore be strengthened and the overall competence of rural statistical staff improved [22-25].

The article first introduces the definitions of the multiple linear regression model and the VAR model to lay a theoretical foundation for the models established later. It then sorts out and identifies the factors influencing rural GDP growth, providing a basis for selecting and refining the factors. Next, the GDP data of a village from January 2013 to December 2013 are used for an empirical study. After analyzing the relevant variables with a multiple linear regression model, a VAR model is built: a unit root test is performed on each variable's time series, a cointegration test is carried out once each series is confirmed to be stationary, a Granger causality test is conducted, and finally the vector autoregression model is estimated. Lastly, an analysis based on the VAR model is carried out using data on GDP, agricultural prices, farmers' consumption, and means of production to explore their impact on rural economic growth.

Method
Idea and construction of multiple regression models

Multiple regression models are applied to explain the relationship between one explained variable and several explanatory variables. For a model with k regressors, the regression function has the basic form: $${Y_i} = {\beta_1} + {\beta_2}{X_{2i}} + {\beta_3}{X_{3i}} + \cdots + {\beta_k}{X_{ki}} + \mu$$

When this is estimated from sample observations, a fitted value of the explained variable is obtained for each observation: $${\bar Y_i} = {\bar \beta_1} + {\bar \beta_2}{X_{2i}} + {\bar \beta_3}{X_{3i}} + \cdots + {\bar \beta_k}{X_{ki}}$$

There is also a residual ei between the actual value Yi of the explained variable and its sample estimate, so that: $${Y_i} = {\overline Y_i} + {e_i}$$

Once the multiple regression model has been specified, sample information must be used to estimate the sample regression function so that it reproduces the true regression relationship as closely as possible. The most commonly used method is least squares, which determines the sample regression function by minimizing the sum of squared residuals, i.e.: $$\sum {e_i^2} = \sum {{{\left( {{Y_i} - {{\bar Y}_i}} \right)}^2}} = \sum {{{\left( {{Y_i} - {{\bar \beta }_1} - {{\bar \beta }_2}{X_{2i}} - {{\bar \beta }_3}{X_{3i}} - \cdots - {{\bar \beta }_k}{X_{ki}}} \right)}^2}}$$

That is, the necessary conditions are: $$\frac{{\partial \left( {\sum {e_i^2} } \right)}}{{\partial {{\bar \beta }_j}}} = 0$$ $$\left\{ {\begin{array}{*{20}{l}} {2\sum {\left( {{Y_i} - {{\bar \beta }_1} - {{\bar \beta }_2}{X_{2i}} - {{\bar \beta }_3}{X_{3i}} - \cdots - {{\bar \beta }_k}{X_{ki}}} \right)} \left( { - 1} \right) = 0} \\ {2\sum {\left( {{Y_i} - {{\bar \beta }_1} - {{\bar \beta }_2}{X_{2i}} - {{\bar \beta }_3}{X_{3i}} - \cdots - {{\bar \beta }_k}{X_{ki}}} \right)} \left( { - {X_{2i}}} \right) = 0} \\ \vdots \\ {2\sum {\left( {{Y_i} - {{\bar \beta }_1} - {{\bar \beta }_2}{X_{2i}} - {{\bar \beta }_3}{X_{3i}} - \cdots - {{\bar \beta }_k}{X_{ki}}} \right)} \left( { - {X_{ki}}} \right) = 0} \end{array}} \right.$$

Writing these equations in matrix form gives: $$\left[ {\begin{array}{*{20}{c}} {\sum {{e_i}} } \\ {\sum {{X_{2i}}{e_i}} } \\ \vdots \\ {\sum {{X_{ki}}{e_i}} } \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} 1& \cdots &1 \\ \vdots &{}& \vdots \\ {{X_{k1}}}& \cdots &{{X_{kn}}} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {{e_1}} \\ \vdots \\ {{e_n}} \end{array}} \right] = {X^T}e = \left[ {\begin{array}{*{20}{c}} 0 \\ \vdots \\ 0 \end{array}} \right]$$

The multiple regression function is expressed in matrix form as: $$Y = X\bar \beta + e$$

Multiplying both sides of the equation by the transpose matrix XT of X simplifies the computation: $${X^T}Y = {X^T}X\bar \beta + {X^T}e = {X^T}X\bar \beta$$

From this, the vector of regression parameters of the multiple regression equation can be calculated as $$\bar \beta = {\left( {{X^T}X} \right)^{ - 1}}{X^T}Y$$

The multiple regression estimator constructed on the basis of this least squares method has three notable properties: linearity, unbiasedness, and efficiency.

Linearity is due to the fact that $$\bar \beta$$ is a linear function of Y.

Unbiasedness follows from the zero-mean assumption on the random disturbance term of the multiple regression model: $$E(\bar \beta ) = E\left( {{{\left( {{X^T}X} \right)}^{ - 1}}{X^T}Y} \right)$$

Since Y = Xβ + U, it follows that: $$\begin{array}{rcl} E(\bar \beta ) &=& E\left( {{{\left( {{X^T}X} \right)}^{ - 1}}{X^T}(X\beta + U)} \right) = E\left( {\beta + {{\left( {{X^T}X} \right)}^{ - 1}}{X^T}U} \right) \\ E(\bar \beta ) &=& \beta + {\left( {{X^T}X} \right)^{ - 1}}{X^T}E(U) = \beta \end{array}$$

Efficiency, in turn, rests on the fact that the least squares estimator has the smallest variance among all linear unbiased estimators.

Once the multiple regression model has been constructed, its goodness of fit can be tested with the coefficient of multiple determination R2, i.e., the proportion of the total variation in Y that is explained by the explanatory variables.

The decomposition of the total variation is: $$\sum {{{\left( {{Y_i} - \bar Y} \right)}^2}} = \sum {{{\left( {{{\bar Y}_i} - \bar Y} \right)}^2}} + \sum {{{\left( {{Y_i} - {{\bar Y}_i}} \right)}^2}}$$

The coefficient of multiple determination is: $${R^2} = \frac{{\sum {{{\left( {{{\bar Y}_i} - \bar Y} \right)}^2}} }}{{\sum {{{\left( {{Y_i} - \bar Y} \right)}^2}} }} = \frac{{\sum {{{\left( {{Y_i} - \bar Y} \right)}^2}} - \sum {{{\left( {{Y_i} - {{\bar Y}_i}} \right)}^2}} }}{{\sum {{{\left( {{Y_i} - \bar Y} \right)}^2}} }} = 1 - \frac{{\sum {e_i^2} }}{{\sum {{{\left( {{Y_i} - \bar Y} \right)}^2}} }}$$

The closer R2 is to 1, the better the model fits. However, with the sample size fixed, adding explanatory variables leaves the total sum of squares unchanged while the explained (regression) sum of squares cannot decrease, so R2 never falls as variables are added. Using the coefficient of multiple determination to compare two models estimated on samples of the same size but with different numbers of explanatory variables is therefore biased [26].

With the sample size unchanged, increasing the number of explanatory variables increases the number of parameters to be estimated and inevitably costs degrees of freedom, so the coefficient of multiple determination can be corrected for degrees of freedom. The adjusted coefficient of determination $${\bar R^2}$$ is: $${\bar R^2} = 1 - \frac{{\sum {e_i^2} /(n - k)}}{{\sum {{{\left( {{Y_i} - \bar Y} \right)}^2}} /(n - 1)}} = 1 - \frac{{n - 1}}{{n - k}}\frac{{\sum {e_i^2} }}{{\sum {{{\left( {{Y_i} - \bar Y} \right)}^2}} }}$$ $${\bar R^2} = 1 - \left( {1 - {R^2}} \right)\frac{{n - 1}}{{n - k}}$$

To ensure the regression is meaningful, the regression equation itself must also be tested, i.e., we test whether there is a significant linear relationship between the explanatory variables and the explained variable, with the hypotheses: $$\left\{ {\begin{array}{*{20}{l}} {{H_0}:{\beta_2} = {\beta_3} = \cdots = {\beta_k} = 0} \\ {{H_1}:{\beta_j}\ (j = 2,3, \ldots ,k){\text{ not all 0}}} \end{array}} \right.$$

The F-statistic is constructed from the decomposition of the variation, again taking degrees of freedom into account: $$F = \frac{{\sum {{{\left( {{{\bar Y}_i} - \bar Y} \right)}^2}} }}{{\sum {{{\left( {{Y_i} - {{\bar Y}_i}} \right)}^2}} }} \cdot \frac{{n - k}}{{k - 1}}$$

The F-statistic can also be written in terms of the coefficient of determination: $$F = \frac{{{R^2}}}{{1 - {R^2}}} \cdot \frac{{n - k}}{{k - 1}}$$

The F-statistic so constructed follows the F distribution with (k − 1, n − k) degrees of freedom. If its calculated value exceeds the critical value of the F distribution at the given significance level, the hypothesis H0 can be rejected; the regression equation is then significant and the explanatory variables have a significant joint effect on the explained variable.
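As an illustration of the estimation and testing steps above, the following is a minimal sketch, with simulated data and illustrative variable names rather than the paper's dataset, of fitting a multiple regression by ordinary least squares and reading off R2, the adjusted R2 and the overall F test using Python's statsmodels.

```python
# Minimal sketch: multiple regression by OLS on simulated data (illustrative only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 60
X = rng.normal(size=(n, 3))                            # three explanatory variables
y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(scale=0.5, size=n)

X_const = sm.add_constant(X)                           # prepend the intercept column
model = sm.OLS(y, X_const).fit()                       # beta_hat = (X'X)^(-1) X'y

print(model.rsquared)                                  # R^2: explained share of total variation
print(model.rsquared_adj)                              # adjusted R^2, corrected for degrees of freedom
print(model.fvalue, model.f_pvalue)                    # overall F test of H0: all slopes equal 0
```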

In real problems the explanatory factors are inevitably interrelated rather than completely independent, so multicollinearity can arise among the variables, distorting the fit and making it difficult to interpret the explanatory variables. In what follows we therefore also use principal component analysis to extract and re-screen the selected variables and to build a new regression model on mutually independent principal component factors, so that the relationships can be read more clearly. The main idea is to apply a linear transformation to the explanatory variables X to obtain principal components F, i.e.: $$\left\{ {\begin{array}{*{20}{c}} {{F_1} = {U_{11}}{X_1} + {U_{12}}{X_2} + \cdots + {U_{1p}}{X_p}} \\ {{F_2} = {U_{21}}{X_1} + {U_{22}}{X_2} + \cdots + {U_{2p}}{X_p}} \\ \vdots \\ {{F_n} = {U_{n1}}{X_1} + {U_{n2}}{X_2} + \cdots + {U_{np}}{X_p}} \end{array}} \right.$$

Several restrictions are imposed in the transformation to ensure that the principal components are extracted effectively, as follows.

$${u^T}u = 1$$

Fi and Fj are independent of each other $$\left( {i \ne j,\ i,j = 1,2, \ldots } \right)$$

The variances of the Fi are arranged in decreasing order, and the variance of each component reflects how much of the original information that component carries.

In this way, in the statistical analysis only the principal factors with the largest variances need to be retained; they represent the original variables well and make the analysis clearer and more concise.
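A hedged sketch of the principal component idea described above: the regressors are standardized, the leading components are extracted, and Y is regressed on these mutually orthogonal factors. The data, the variable names and the two-component cut-off are illustrative assumptions rather than the paper's exact procedure.

```python
# Hedged sketch: principal component regression on simulated, collinear regressors.
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 60
x1 = rng.normal(size=n)
X = np.column_stack([x1,
                     x1 + 0.05 * rng.normal(size=n),   # nearly collinear with x1
                     rng.normal(size=n)])
y = 2.0 + X @ np.array([1.0, 1.0, 0.5]) + rng.normal(scale=0.3, size=n)

Z = StandardScaler().fit_transform(X)                  # PCA is scale-sensitive
pca = PCA(n_components=2)                              # keep the two leading components
F = pca.fit_transform(Z)                               # orthogonal factors F1, F2
print(pca.explained_variance_ratio_)                   # share of original variance each factor keeps

pcr = sm.OLS(y, sm.add_constant(F)).fit()              # regress Y on the retained components
print(pcr.params)
```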

Econometric modeling of factors influencing changes in the dynamics of the rural economy
VAR model

Consider first the univariate autoregressive model: $${y_t} = c + {\phi_1}{y_{t - 1}} + {\phi_2}{y_{t - 2}} + \ldots + {\phi_p}{y_{t - p}} + {\varepsilon_t}$$

where: $$\begin{array}{*{20}{l}} {E\left( {{\varepsilon _t}} \right) = 0} \\ {E\left( {{\varepsilon _t}{\varepsilon _\tau }} \right) = \left\{ {\begin{array}{*{20}{c}} {{\sigma ^2}}&{t = \tau } \\ 0&{t \ne \tau } \end{array}} \right.} \end{array}$$

Extending this further, we need to consider the dynamic interactions among multiple variables, i.e., let yt be an (n × 1) vector [27]. Setting the lag order of the regression to p, the model is written as VAR(p): $${y_t} = c + {\Phi_1}{y_{t - 1}} + {\Phi_2}{y_{t - 2}} + \ldots + {\Phi_p}{y_{t - p}} + {\varepsilon_t}$$

Here c represents an (n × 1) vector of constant terms, Φj is an (n × n) matrix of autoregressive coefficients, j = 1, 2, …, p, and εt is an (n × 1) white noise vector characterized by: $$\begin{array}{rcl} E\left( {{\varepsilon_t}} \right) &=& 0 \\ E\left( {{\varepsilon_t}{\varepsilon_\tau }'} \right) &=& \left\{ {\begin{array}{*{20}{l}} \Omega &{t = \tau } \\ 0&{t \ne \tau } \end{array}} \right. \end{array}$$

where Ω is an (n × n) symmetric positive definite matrix.

Let ci denote the ith element of the vector c and let $$\phi_{ij}^l$$ denote the element in row i and column j of the matrix Φl. The equation for the first variable of the VAR model is then: $$\begin{array}{rcl} {y_{1t}} &=& {c_1} + \phi_{11}^1{y_{1(t - 1)}} + \phi_{12}^1{y_{2(t - 1)}} + \ldots + \phi_{1n}^1{y_{n(t - 1)}} \\ &&+ \phi_{11}^2{y_{1(t - 2)}} + \phi_{12}^2{y_{2(t - 2)}} + \ldots + \phi_{1n}^2{y_{n(t - 2)}} \\ &&+ \ldots + \phi_{11}^p{y_{1(t - p)}} + \phi_{12}^p{y_{2(t - p)}} + \ldots + \phi_{1n}^p{y_{n(t - p)}} + {\varepsilon_{1t}} \end{array}$$

In a VAR(p) system, each variable is thus regressed on a constant term and on the first p lags of itself and of all the other variables in the system.

For a VAR(p) process yt, the process is covariance stationary if its first moments $$E\left( {{y_t}} \right)$$ and second moments $$E\left( {{y_t}{y'_{t - j}}} \right)$$ do not depend on the date t. Once the VAR(p) process has been established to be covariance stationary, taking expectations of the equation gives the mean μ of the process: $$\mu = c + {\Phi_1}\mu + {\Phi_2}\mu + \ldots + {\Phi_p}\mu$$

Rearranging gives: $$\mu = {\left( {{I_n} - {\Phi_1} - {\Phi_2} - \ldots - {\Phi_p}} \right)^{ - 1}}c$$

The system equation can be written in deviation-from-mean form as: $$\left( {{y_t} - \mu } \right) = {\Phi_1}\left( {{y_{t - 1}} - \mu } \right) + {\Phi_2}\left( {{y_{t - 2}} - \mu } \right) + \ldots + {\Phi_p}\left( {{y_{t - p}} - \mu } \right) + {\varepsilon_t}$$

Define the stacked vectors and the companion matrix: $${\xi_t} = \left[ {\begin{array}{*{20}{c}} {{y_t} - \mu } \\ {{y_{t - 1}} - \mu } \\ \vdots \\ {{y_{t - p + 1}} - \mu } \end{array}} \right],\quad F = \left[ {\begin{array}{*{20}{c}} {{\Phi_1}}&{{\Phi_2}}&{{\Phi_3}}& \cdots &{{\Phi_{p - 1}}}&{{\Phi_p}} \\ {{I_n}}&0&0& \cdots &0&0 \\ 0&{{I_n}}&0& \cdots &0&0 \\ \vdots & \vdots & \vdots &{}& \vdots & \vdots \\ 0&0&0& \cdots &{{I_n}}&0 \end{array}} \right],\quad {v_t} = \left[ {\begin{array}{*{20}{c}} {{\varepsilon_t}} \\ 0 \\ \vdots \\ 0 \end{array}} \right]$$

Thus VAR(p) can be written more compactly in matrix form as: $${\xi_t} = F{\xi_{t - 1}} + {v_t}$$

where $$E\left( {{v_t}{v_\tau }'} \right) = \left\{ {\begin{array}{*{20}{l}} Q&{t = \tau } \\ 0&{t \ne \tau } \end{array}} \right.,\quad Q = \left[ {\begin{array}{*{20}{c}} \Omega &0& \cdots &0 \\ 0&0& \cdots &0 \\ \vdots & \vdots &{}& \vdots \\ 0&0& \cdots &0 \end{array}} \right]$$

Iterating this equation forward implies: $${\xi_{t + s}} = {v_{t + s}} + F{v_{t + s - 1}} + {F^2}{v_{t + s - 2}} + \ldots + {F^{s - 1}}{v_{t + 1}} + {F^s}{\xi_t}$$

The VAR model is covariance stationary if all the eigenvalues of the companion matrix F lie inside the unit circle, i.e., if the roots of the characteristic equation $$\left| {{I_n}{\lambda^p} - {\Phi_1}{\lambda^{p - 1}} - {\Phi_2}{\lambda^{p - 2}} - \ldots - {\Phi_p}} \right| = 0$$ all satisfy $$\left| \lambda \right| < 1$$ [28].
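The stationarity condition can be checked numerically by building the companion matrix F and verifying that all of its eigenvalues lie strictly inside the unit circle. The coefficient matrices below are made-up examples, not estimates from the paper's data.

```python
# Numerical check of the stationarity condition for a VAR(2) with made-up coefficients.
import numpy as np

Phi1 = np.array([[0.5, 0.1],
                 [0.0, 0.4]])
Phi2 = np.array([[0.2, 0.0],
                 [0.1, 0.1]])
n, p = 2, 2

# Companion matrix F: first block row holds Phi_1..Phi_p, identity blocks below.
F = np.zeros((n * p, n * p))
F[:n, :n] = Phi1
F[:n, n:] = Phi2
F[n:, :n] = np.eye(n)

eigvals = np.linalg.eigvals(F)
print(np.abs(eigvals))                                 # moduli of the eigenvalues of F
print(np.all(np.abs(eigvals) < 1))                     # True -> covariance-stationary VAR
```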

Granger causality tests

Granger causality test: given an information set At containing at least $$\left( {{X_t},{Y_t}} \right)$$, and assuming that "past and present changes can affect future outcomes, while future changes cannot affect the past", Xt is said to be a Granger cause of Yt if using the past values of Xt yields better predictions of Yt than not using them; the reverse direction is defined in the same way.

The Granger causality test model is as follows: if Xt and Yt are stationary processes, consider the model: $$\left\{ {\begin{array}{*{20}{l}} {{X_t} = {c_1} + \sum\limits_{j = 1}^p {{\alpha_j}} {X_{t - j}} + \sum\limits_{j = 1}^q {{\beta_j}} {Y_{t - j}} + {\varepsilon_{1t}}} \\ {{Y_t} = {c_2} + \sum\limits_{j = 1}^p {{\gamma_j}} {Y_{t - j}} + \sum\limits_{j = 1}^q {{\delta_j}} {X_{t - j}} + {\varepsilon_{2t}}} \end{array}} \right.$$

where ε1t and ε2t are white noise. The following cases can arise:

1) If βj = δj = 0 (j = 1, 2, ⋯, q), then Xt and Yt are independent of each other.

2) If βj = 0, δj ≠ 0 (j = 1, 2, ⋯, q), then Xt is the (Granger) cause of Yt.

3) If βj ≠ 0, δj = 0 (j = 1, 2, ⋯, q), then Yt is the (Granger) cause of Xt.

4) If βj ≠ 0, δj ≠ 0 (j = 1, 2, ⋯, q), then Xt and Yt are (Granger) causes of each other.

Obviously, 2) and 3) describe a one-way leading (causal) relationship, 4) a two-way feedback (causal) relationship, and 1) independence.

The Granger causality F test is as follows. For $${X_t} = {c_1} + \sum\limits_{j = 1}^p {{\alpha_j}} {X_{t - j}} + \sum\limits_{j = 1}^q {{\beta_j}} {Y_{t - j}} + {\varepsilon_{1t}}$$ set H0 : βj = 0 and H1 : βj ≠ 0, j = 1, 2, ⋯, q. Estimate this model by OLS and denote the residual sum of squares by ESSX(q, p) [29]. Then estimate the model $${X_t} = {c_1} + \sum\limits_{j = 1}^p {{\alpha_j}} {X_{t - j}} + {\varepsilon_t}$$ and denote its residual sum of squares by ESSX(p). Construct the statistic: $${F_X} = \frac{{\left[ {ES{S_X}(p) - ES{S_X}(q,p)} \right]/q}}{{ES{S_X}(q,p)/(n - p - q - 1)}}\sim F(q,n - p - q - 1)$$

Given a significance level α, look up the critical value Fα; if F > Fα, reject H0 and accept H1, and conclude that Y is a Granger cause of X.

Similarly, for the model $${Y_t} = {c_2} + \sum\limits_{j = 1}^p {{\gamma_j}} {Y_{t - j}} + \sum\limits_{j = 1}^q {{\delta_j}} {X_{t - j}} + {\varepsilon_{2t}}$$ the null and alternative hypotheses are H0 : δj = 0 and H1 : δj ≠ 0, j = 1, 2, ⋯, q. OLS estimation of this model gives a residual sum of squares ESSY(q, p); OLS estimation of the model $${Y_t} = {c_2} + \sum\limits_{j = 1}^p {{\gamma_j}} {Y_{t - j}} + {\varepsilon_t}$$ gives a residual sum of squares ESSY(p), and the F-statistic is: $${F_Y} = \frac{{\left[ {ES{S_Y}(p) - ES{S_Y}(q,p)} \right]/q}}{{ES{S_Y}(q,p)/(n - p - q - 1)}}\sim F(q,n - p - q - 1)$$

If H0 is rejected and H1 is accepted, consider X to be the Granger cause of Y.
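The two regressions and the F test above are what standard software automates. The sketch below, on simulated series in which x genuinely leads y and with placeholder names, uses statsmodels' grangercausalitytests, which fits the restricted and unrestricted models and reports the F statistic for each lag.

```python
# Granger F test on simulated series in which x leads y (illustrative only).
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(2)
T = 200
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.3 * y[t - 1] + 0.6 * x[t - 1] + rng.normal()

data = pd.DataFrame({"y": y, "x": x})
# Tests H0: "the second column does not Granger-cause the first" for lags 1..2.
results = grangercausalitytests(data[["y", "x"]], maxlag=2)
```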

Cointegration tests

This cointegration test method is suited to testing for the existence of a single cointegration relationship between variables. Taking two variables x and y as an example, both variables are first tested for their order of integration; if the two variables are integrated of the same order, a cointegration relationship may hold. Assume that x and y are both integrated of order one, i.e., x ~ I(1) and y ~ I(1). The steps of the EG two-step test are then as follows:

First, the following model is estimated by least squares: $${y_t} = {\beta_0} + {\beta_1}{x_t} + {\varepsilon_t}$$

and the corresponding residual series is computed as $${e_t} = {y_t} - \left( {{{\hat \beta }_0} + {{\hat \beta }_1}{x_t}} \right)$$

Second, the stationarity of the residual series is tested with one of the following regressions: $$\Delta {e_t} = \delta {e_{t - 1}} + \sum\limits_{i = 1}^m {{\gamma_i}} \Delta {e_{t - i}} + {\varepsilon_t}$$ $$\Delta {e_t} = \alpha + \delta {e_{t - 1}} + \sum\limits_{i = 1}^m {{\gamma_i}} \Delta {e_{t - i}} + {\varepsilon_t}$$ $$\Delta {e_t} = \alpha + \beta t + \delta {e_{t - 1}} + \sum\limits_{i = 1}^m {{\gamma_i}} \Delta {e_{t - i}} + {\varepsilon_t}$$

If the null hypothesis H0 : δ = 0 is rejected by the DF test (or ADF test), the residual series is stationary; there is then a cointegration relationship between x and y, and the cointegration equation is yt = β0 + β1xt + εt.
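A small sketch of the EG two-step procedure on simulated data: regress y on x by OLS, then run an ADF-type test without deterministic terms on the residuals. The series and parameter values are illustrative assumptions, not the paper's data.

```python
# Engle-Granger two-step test on simulated cointegrated series (illustrative only).
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
T = 300
x = np.cumsum(rng.normal(size=T))                      # x ~ I(1), a random walk
y = 0.5 + 2.0 * x + rng.normal(size=T)                 # y shares the stochastic trend of x

# Step 1: estimate the long-run relation and keep the residuals e_t.
step1 = sm.OLS(y, sm.add_constant(x)).fit()
e = step1.resid

# Step 2: ADF-type test on the residuals; regression="n" omits constant and trend,
# since the OLS residuals already have mean zero by construction.
adf_stat, pvalue, *_ = adfuller(e, regression="n")
print(adf_stat, pvalue)                                # a small p-value suggests cointegration
# Strictly, Engle-Granger critical values should be used here (statsmodels' coint() does this).
```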

Factor identification and selection and model construction
Factor Identification and Selection

Some scholars are concerned about the impact of population changes and changes in the unemployment rate on the rate of economic growth. In addition, some scholars believe that changes in the level of per capita income, labor productivity and other growth will also have a greater impact on future economic growth. In terms of future economic development, there is still much room for growth in per capita income level, productivity, factor prices, etc. Obviously, China’s GDP growth rate has a close relationship with population and productivity, and this paper believes that the three variables of per capita output growth, per capita income growth and labor productivity growth can be used to study the GDP growth rate.

Construction of the model

Per capita output growth is the growth rate of labor productivity per employee, labor productivity growth is taken as the growth rate of labor productivity per hour, and per capita income growth moves together with the growth rate of per capita GDP. The vector autoregressive model regresses each endogenous variable on the lagged values of all the endogenous variables in the system; this is the convention of the unstructured model, and it allows the dynamic correlations among the variables in the model to be uncovered. The author uses the GDP data of a village for the 12 months from January 2013 to December 2013 for the empirical analysis. The multiple regression model of the dynamic changes in the rural economy is specified as: $$Y = {\beta_0} + {\beta_1}{X_1} + {\beta_2}{X_2} + {\beta_3}{X_3} + {\beta_4}{X_4} + {\beta_5}{X_5} + {\beta_6}{X_6} + \mu$$

Where Y is the rural GDP, X1 is the monthly average price index of primary agricultural products, X2 is the monthly average price index of general agricultural products, X3 is the monthly data of the dollar exchange rate, X4 is the monthly data of the consumer price index of farmers, X5 is the monthly data of the price index of agricultural means of production, and X6 is the monthly data of the money supply M2. μ is the random error term.

Results and discussion
Multiple linear regression models

The initial regression results, obtained by ordinary least squares (OLS), are shown in Table 1. The regression gives R2 = 0.992166 and the equation is significant overall; the t-statistics of most of the independent variables are also significant, but the coefficient of X3 is not significant, and the coefficients of X5 and X6 are negative, which does not accord with reality and suggests a possible multicollinearity problem. In addition, D.W. = 0.669522, indicating severe positive autocorrelation.

Table 1. Least squares OLS regression results

Variable Coefficient Std.error T-Statistic Prob.
C -2133.652 444.6522 -4.730225 0.0000
X1 0.254331 0.073226 3.261251 0.0016
X2 1.465123 0.057223 26.03155 0.0000
X3 0.341162 0.352251 0.932651 0.3681
X4 11.23551 4.641622 3.820013 0.0005
X5 -2.953251 1.636952 -1.754662 0.0835
X6 -0.000265 8.12E-05 -3.026631 0.0035
R-squared= 0.992166 Prob(F-statistic) =0.000000 DW=0.669522

In view of the above problems, the regression results are not satisfactory and need to be corrected and further optimized, in particular for autocorrelation. Empirical and theoretical analysis indicates that the rural economy exhibits a lag, so a one-period lag of the dependent variable should be included as a regressor. The villagers' consumer price index (CPI) is a macroeconomic indicator that reflects changes in the prices of goods and services purchased by villagers; it measures the average change in the retail prices of more than 200 agricultural products and services, and its level reflects the severity of inflation. Not only does the dependent variable, the rural economy, affect the CPI, but the other independent variables affect it as well, and changes in the CPI in turn affect those variables. X4 may therefore be a source of multicollinearity and should be removed.

Positive serial correlation has three consequences. First, it leads to overestimating the reliability of the regression results; in general, autocorrelation biases the calculated standard errors of the coefficients. Second, because neighbouring residuals are not independent of each other, the regression function cannot produce optimal forecasts: if the residual of the previous period helps to estimate the residual of the current period, this link could be exploited to forecast the explained variable better. Finally, autocorrelation is a signal that the model is misspecified and that new explanatory influences need to be found.

To account for serial correlation, an AR(1) term should be included in the equation. This assumes that the random error term follows the process μt = ρμt−1 + εt, where the parameter ρ is the first-order serial autocorrelation coefficient. In effect, the AR(1) process brings the errors of past observations into the regression model for the current observation. The interpretation of the coefficients, standard errors and t-statistics does not change, but a model estimated with an AR(1) term has two different kinds of residuals. One is the unconditional residual μt, computed in the same way as when the AR(1) term is not included, i.e., the explained variable minus the sum of each explanatory variable multiplied by its regression coefficient; these residuals are serially correlated. The other is the conditional residual εt, which is related to the unconditional residual of the previous period; because of the serial correlation, these residuals are smaller, so the lagged unconditional residual can be used to improve the forecast. The initial multiple linear equation was optimized and corrected in this way before being re-estimated, and the regression results are shown in Table 2. The resulting regression equation is: $$\begin{array}{l} Y = - 995.9902 + 0.781522Y( - 1) + 0.195532{X_1} + 0.335698{X_2} \\ + 0.621852{X_3} + 1.720026{X_5} - 0.000192{X_6} \end{array}$$

Table 2. Regression results after correction (with lagged dependent variable and AR(1) term)

Variable Coefficient Std.error T-Statistic Prob.
C -995.9902 277.5201 -3.631125 0.0006
Y(-1) 0.781522 0.062366 15.03522 0.0000
X1 0.195532 0.041552 4.826638 0.0000
X2 0.335698 0.076552 4.189552 0.0002
X3 0.621852 0.226594 2.750622 0.0079
X5 1.720026 0.625512 2.795512 0.0062
X6 -0.000192 5.01E-05 -3.78445 0.0002
AR(1) 0.335216 0.096622 3.785123 0.0009
R-squared= 0.998562 Prob(F-statistic) =0.000000 DW=2.005163
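For reference, a hedged sketch of this kind of correction: include the lagged dependent variable as a regressor and allow an AR(1) error process, here via statsmodels' GLSAR, which iteratively re-estimates the autocorrelation coefficient and the regression coefficients. The series are simulated placeholders, not the paper's data.

```python
# Hedged sketch: regression with a lagged dependent variable and AR(1) errors (GLSAR).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
T = 120
df = pd.DataFrame({
    "Y":  100 + np.cumsum(rng.normal(size=T)),
    "X1": rng.normal(size=T),
    "X2": rng.normal(size=T),
})
df["Y_lag1"] = df["Y"].shift(1)                        # one-period lag of the dependent variable
df = df.dropna()

exog = sm.add_constant(df[["Y_lag1", "X1", "X2"]])
model = sm.GLSAR(df["Y"], exog, rho=1)                 # rho=1 -> first-order autoregressive errors
res = model.iterative_fit(maxiter=10)                  # alternately re-estimate rho and the betas

print(model.rho)                                       # estimated first-order autocorrelation of the errors
print(res.params)
print(durbin_watson(res.wresid))                       # DW near 2 -> AR(1) structure has absorbed the autocorrelation
```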

This time the multiple linear regression fits very well: all the variables are statistically significant and there is no serial correlation problem. Only the coefficient on X6 (money supply M2) is negative, which at first sight seems inconsistent with reality: if the money supply increases while the quantity of commodities remains unchanged, supply falls short of demand and prices rise, so the two should move in the same direction. Viewed from another angle, however, monetary policy works by regulating the money supply to influence interest rates; when inflation pushes prices up, a tight monetary policy, i.e., a reduction of the money supply, is needed to stabilize prices, so an inverse relationship is also reasonable. Monetary policy also operates with a time lag: neither the money supply nor interest rates affect prices immediately, and it is generally believed that a change in the money supply takes, on average, 9 to 10 months to show up in the rate of economic growth and (or) the rate of price increase. In addition, apart from the money supply, which is an absolute value, most of the other variables in the model are indices, which may also have some effect on the results.

Empirical Tests and Interpretation

Rural gross domestic product (GDP) is used to represent economic growth and serves as the dependent variable, while investment (DI), foreign direct investment (FDI) and the total number of employees (L) are used as independent variables. To eliminate heteroskedasticity, natural logarithms of the variables are taken in the empirical analysis.

Unit root test

The stationarity of the lnGDP, lnDI, lnFDI and lnL series was examined with the ADF test; the results are shown in Table 3. In the test type (c, t, n), c and t indicate whether the test equation contains a constant term and a time trend, n gives the lag order, and 0 means the corresponding term is not included; the p-value is MacKinnon's one-sided probability; *, ** and *** denote rejection of the null hypothesis at the 10%, 5% and 1% significance levels, respectively, i.e., the variable is stationary at the corresponding level. The results show that lnGDP, lnDI, lnFDI and lnL are non-stationary series; after first differencing, only ΔlnFDI and ΔlnL are stationary, with no unit root at the 10% and 1% significance levels respectively. After second differencing, all the variables reject the null hypothesis at the 1% significance level and contain no unit root; these variables are therefore I(2) series.

Table 3. ADF unit root test results for the series

Series Test type (c,t,n) ADF statistic Critical values (1%, 5%, 10%) P value Conclusion
lnGDP (c,t,1) -2.950332 (-4.2839, -3.64942, -3.18816) 0.3736 Non-stationary
ΔlnGDP (c,t,1) -2.658842 (-4.43536, -3.6879, -3.22979) 0.1466 Non-stationary
Δ2lnGDP (c,t,0) -5.036151*** (-4.51379, -3.66272, -3.27613) 0.0519 Stationary
lnDI (c,t,2) -2.056326 (-4.67817, -3.48604, -3.10642) 0.7093 Non-stationary
ΔlnDI (c,t,1) -3.261152 (-4.51428, -3.57928, -3.08134) 0.1344 Non-stationary
Δ2lnDI (c,t,2) -2.919522 (-4.45497, -3.6862, -3.2658) 0.1442 Stationary
lnFDI (c,t,2) -3.584622* (-4.44747, -3.69755, -3.11141) 0.1846 Non-stationary
ΔlnFDI (c,t,0) -3.598455*** (-4.38435, -3.68442, -3.16157) 0.0019 Stationary
Δ2lnFDI (c,t,3) -5.532632*** (-4.52978, -3.71165, -3.19359) 0.0962 Stationary
lnL (c,t,1) -2.362152 (-4.44082, -3.40217, -3.14687) 0.1579 Non-stationary
ΔlnL (c,t,0) -5.031162*** (-4.46693, -3.70134, -3.22446) 0.0304 Stationary
Δ2lnL (c,t,0) -6.952263*** (-4.52884, -3.66022, -3.45295) 0.0001 Stationary
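A compact sketch of the unit root checks summarized in Table 3: run the ADF test (with constant and trend) on each log series and on its first and second differences. The two series below are simulated stand-ins for the paper's variables, not the actual data.

```python
# ADF tests on levels, first and second differences of simulated I(2)-like series.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(5)
T = 80
series = {
    "lnGDP": np.cumsum(np.cumsum(rng.normal(size=T))),  # twice-integrated noise
    "lnDI":  np.cumsum(np.cumsum(rng.normal(size=T))),
}

for name, s in series.items():
    for d in range(3):                                  # d = 0 (level), 1, 2 (differences)
        tested = np.diff(s, n=d)                        # n=0 returns the series unchanged
        stat, pval, *_ = adfuller(tested, regression="ct")  # constant and trend in the test equation
        print(f"{name}, difference order {d}: ADF = {stat:.3f}, p = {pval:.3f}")
```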
Cointegration tests

Since all the variables are integrated of order two, a cointegration test can be performed, and the Johansen cointegration test is adopted. Combining the five indicators LR, FPE, AIC, SC and HQ, the optimal lag order of the VAR model is determined to be 3. At this lag, the inverse roots of the characteristic polynomial of the VAR(3) model all lie within the unit circle (the AR root diagram of the VAR(3) model is shown in Figure 1), so the model is stable. The lag order chosen for the cointegration test should therefore be 2 (the optimal lag order of the unconstrained VAR model minus 1).

Figure 1. The AR root diagram of the VAR(3) model
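A sketch of the lag-selection and Johansen steps on simulated placeholder series: choose the VAR order with the usual information criteria, then compute the Johansen trace and maximum-eigenvalue statistics with a lag of the optimal VAR order minus one. Nothing here reproduces the paper's figures or tables.

```python
# VAR lag selection and Johansen cointegration test on simulated placeholder series.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(6)
T = 100
trend = np.cumsum(rng.normal(size=T))                  # one common stochastic trend
data = pd.DataFrame({
    "lnGDP": trend + rng.normal(scale=0.2, size=T),
    "lnDI":  0.8 * trend + rng.normal(scale=0.2, size=T),
    "lnFDI": 0.5 * trend + rng.normal(scale=0.2, size=T),
})

order = VAR(data).select_order(maxlags=4)
print(order.summary())                                 # AIC, BIC(SC), FPE and HQ by lag

# Johansen test with a constant term (det_order=0) and
# k_ar_diff equal to the chosen VAR lag order minus one.
jres = coint_johansen(data, det_order=0, k_ar_diff=2)
print(jres.lr1)                                        # trace statistics
print(jres.lr2)                                        # maximum-eigenvalue statistics
print(jres.cvt)                                        # trace critical values (90%, 95%, 99%)
```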

The results of the Johansen cointegration test are shown in Table 4. The maximum-eigenvalue and trace statistics both indicate that there are 2 cointegration relationships among lnGDP, lnDI, lnFDI and lnL at the 1% significance level. Normalizing the cointegration vector on lnGDP yields the standardized cointegration vector and the cointegration equation.

Table 4. Johansen cointegration test results

Null hypothesis Max-eigenvalue statistic (p value) 1% critical value Trace statistic (p value) 1% critical value
No cointegrating vector 53.29066 (0.0000)* 34.94264 107.18813 (0.0000)* 61.64493
At most one cointegrating vector 28.69266 (0.0062)* 26.74797 53.47109 (0.0003)* 42.37507
At most two cointegrating vectors 15.86719 (0.0721) 20.90672 24.66381 (0.0105) 22.8413
At most three cointegrating vectors 8.87416 (0.0312) 12.12043 8.97686 (0.0322) 13.26284

The standardized cointegration vector is shown in Table 5. From the table and the cointegration equation it can be seen that the coefficients of lnDI, lnFDI and lnL are significant, that these variables affect lnGDP, and that there is a long-run stable relationship among the variables. The signs of the coefficients indicate that investment (DI), foreign direct investment (FDI) and the number of employees (L) move in the same direction as GDP, which is consistent with economic intuition. The degree of influence of DI, FDI and L on GDP differs considerably: rural internal investment has the greatest impact, with a 1% increase in rural investment triggering a 0.365521% increase in GDP, followed by FDI; the smallest impact comes from the number of employees, with GDP rising by 0.092311% for every 1% increase in employment.

Table 5. Normalized cointegration vector

lnGDP lnDI lnFDI lnL C
1.00000 -0.365521 -0.125112 -0.092311 -4.362281
Standard error 0.04551 0.02775 0.02932
Logarithmic likelihood 165.5226
Granger causality test

The cointegration test above shows that there is a long-term stable relationship between rural internal investment, foreign direct investment, the number of employees and GDP, but this does not necessarily constitute a causal relationship, so the Granger causality between the variables needs to be tested further. The Granger causality test results are shown in Table 6 (*, **, *** indicate rejection of the null hypothesis at the 10%, 5% and 1% significance levels, respectively). The analysis shows that rural internal investment, foreign direct investment and the number of employees are Granger causes of GDP growth, while GDP is a Granger cause only of the increase in rural internal investment. An increase in rural internal investment creates the environment, conditions and opportunities to attract more labor and foreign direct investment, so it is a Granger cause of the increases in both employment and foreign direct investment. Foreign direct investment adds to rural capital and accelerates the growth of rural internal investment, so it is a Granger cause of the increase in internal investment. An increase in the number of employees can drive regional economic growth and in turn attract more foreign direct investment, and the number of employees is a Granger cause of the increase in foreign direct investment at the 10% significance level.

Table 6. Granger causality test results

Null hypothesis F statistic P value
DI does not Granger-cause GDP 6.75032*** 0.0038
GDP does not Granger-cause DI 3.85562** 0.0349
FDI does not Granger-cause GDP 3.04122* 0.0637
GDP does not Granger-cause FDI 1.52155 0.2831
L does not Granger-cause GDP 5.88965*** 0.0072
GDP does not Granger-cause L 2.31662 0.1206
FDI does not Granger-cause DI 3.95223** 0.0291
DI does not Granger-cause FDI 4.59211** 0.0185
L does not Granger-cause DI 0.25705 0.8639
DI does not Granger-cause L 9.45112*** 0.0041
L does not Granger-cause FDI 2.63551* 0.0924
FDI does not Granger-cause L 1.28155 0.3154
Impulse response analysis and variance decomposition

While the impulse response function describes the effect of a shock to one endogenous variable in the VAR model on the other endogenous variables, the variance decomposition further evaluates the importance of the different structural shocks by analyzing the contribution of each shock to the variation of an endogenous variable (usually measured by its variance). The variance decomposition thus provides information on the relative importance of each random disturbance affecting the variables in the VAR model.
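A minimal sketch of this step on simulated data: fit a VAR, trace out orthogonalized (Cholesky) impulse responses, and decompose the forecast error variance. Variable names are placeholders for the paper's series, not the actual data.

```python
# Impulse responses and variance decomposition from a fitted VAR (simulated data).
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(7)
T = 150
e = rng.normal(size=(T, 2))
y = np.zeros((T, 2))
for t in range(1, T):
    y[t, 0] = 0.6 * y[t - 1, 0] + 0.2 * y[t - 1, 1] + e[t, 0]
    y[t, 1] = 0.1 * y[t - 1, 0] + 0.3 * y[t - 1, 1] + e[t, 1]

data = pd.DataFrame(y, columns=["lnGDP", "X1"])
res = VAR(data).fit(maxlags=3, ic="aic")

irf = res.irf(10)                                      # impulse responses over 10 periods
irf.plot(orth=True)                                    # orthogonalised (Cholesky) responses with +/- 2 s.e. bands

fevd = res.fevd(10)                                    # forecast-error variance decomposition
fevd.summary()                                         # share of each shock in each variable's forecast variance
```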

Impulse Response Analysis

The impulse response analysis is shown in Fig. 2. (a) From period 3 onwards, primary agricultural products have a certain pulling effect on gross output, but the effect does not last long: it begins to weaken in period 4 and only starts to pull again in period 8. (b) General agricultural products show a fluctuating negative effect on rural GDP. (c) Shocks to the dollar exchange rate cause inverse changes in rural GDP, with the negative effect gradually appearing from period 4 onwards. (d) Farmers' consumption has a large positive impact on rural GDP, although this positive effect tends to weaken gradually. (e) Agricultural means of production also show a positive impact, one that gradually strengthens, mainly owing to the continuous adjustment of the rural growth mode and industrial structure, in which the tertiary industry plays a pivotal role in promoting rural economic growth. (f) The money supply has a relatively stable positive effect on rural GDP, but the effect is not significant.

Figure 2. Response to Cholesky one S.D. innovations ± 2 S.E.

Variance decomposition

The results of the variance decomposition of lnGDP (%) are shown in Table 7. The table shows that, in the long run, the influence of GDP growth's own disturbance term declines gradually, from an initial 100% to 19.57%, while the disturbance terms of all the other economic variables exert an increasing effect on GDP growth. About 20% of rural GDP is determined by itself, about 35% by the gross value of the tertiary industry and about 20% by net exports, both of which affect GDP with a very pronounced lag; about 10% is determined by fixed investment, about 6% by villagers' consumption, and 5% to 7% by fiscal expenditure and the number of students enrolled in general higher education. This fully demonstrates that net exports and the gross value of the tertiary industry have a significant impact on rural GDP, which is highly consistent with the previous analysis.

Table 7. Variance decomposition of lnGDP (%)

Period S.E. GDP X1 X2 X3 X4 X5 X6
1 0.01464 100.0000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
2 0.02642 80.19756 1.89617 0.38904 0.01614 8.4983 7.82309 1.15286
3 0.04793 61.07213 1.79602 1.46392 0.21004 27.21808 7.34092 0.93649
4 0.09444 48.26298 6.05166 1.00878 2.33838 30.11781 10.77132 1.43688
5 0.04111 38.86444 6.89306 0.95321 10.51809 24.88367 15.31345 2.61018
6 0.07439 31.58043 5.14293 1.60678 15.9649 20.95409 21.90103 2.92125
7 0.07039 27.3731 5.23869 2.3259 15.48904 21.80547 25.31351 2.45179
8 0.09172 25.01815 5.40247 2.4154 13.97588 24.84384 26.36582 2.05364
9 0.11117 24.07645 4.93682 2.13853 13.07808 26.59145 27.30454 1.78482
10 0.06304 23.96772 4.5089 1.96726 12.36968 27.23405 28.1668 1.6897
11 0.1295 24.22285 4.29938 1.92374 11.50914 27.60699 28.81192 1.65327
12 0.05609 24.21947 4.15354 1.93988 11.17518 27.71136 29.21589 1.55658
13 0.09017 24.11485 4.43411 1.9123 11.2414 27.44178 29.28908 1.49808
14 0.10479 23.75325 5.0351 2.13766 11.50924 26.90724 29.11073 1.48679
15 0.09456 23.38903 5.35229 2.90684 1156786.9996 26.68382 28.63542 1.46943
16 0.10775 22.80885 5.51564 3.89579 11.26928 27.32772 27.80869 1.41112
17 0.14629 21.98509 5.84277 4.77636 10.74959 28.6756 26.63784 1.30445
18 0.11912 21.08472 6.33147 5.10996 10.36302 30.2377 25.64932 1.27842
19 0.09877 20.23258 6.71344 5.08929 10.04628 31.60989 25.03608 1.21295
20 0.11584 19.56506 6.998 4.91568 9.83756 32.63112 24.85332 1.28206
Cholesky Ordering:LNGDP LNC LNG lni LNNX Lntiv LNCS
Conclusion

The rural economy is an important element of rural revitalization and also bears on farmers' quality of life. It is therefore of great significance to forecast the dynamic changes of the rural economy.

The article takes the GDP data of a village for the 12 months from January 2013 to December 2013 as the research object and carries out an empirical analysis with multivariate statistical methods. The empirical results show that a 1% increase in rural investment triggers a 0.365521% increase in GDP, while the number of employees has the smallest impact on the rural economy, with GDP rising by 0.092311% for every 1% increase in employment. Primary agricultural products, farmers' consumption, agricultural means of production and the money supply all have a pulling effect on GDP. The results of the study have implications for the government in formulating policies for rural economic development.
