Research on the technical framework and critical path of new energy portfolio prediction based on multi-algorithm fusion

As environmental protection and energy efficiency issues are increasingly emphasized, the application of renewable energy in power systems has gradually become a research hotspot. However, the intermittency and instability of renewable energy bring new challenges to the power system. In order to solve this problem, renewable energy prediction has an important application value in the power system [1–4].

New energy forecasting techniques fall into two main categories: physical methods and statistical methods. Physical methods rely on meteorological, geographic, environmental and other factors, combined with tools such as numerical weather prediction, to forecast new energy output [5–7]. Statistical methods, by contrast, use historical data and statistical models to predict new energy output; these models include regression analysis, support vector regression, and neural networks [8–10]. Support vector regression (SVR) is currently the main technique used to predict wind and solar power output in power systems [11–14]. By extracting features from historical data and using them as inputs, SVR models learn patterns in the historical data to predict future renewable energy output [15–16]. These techniques can be used individually or in combination to improve the accuracy and reliability of the predictions. Their results should nevertheless be treated as a reference only, because they are affected by many factors, including policy, technology, economy and environment [17–20]. New energy prediction is therefore a complex problem that integrates multiple factors and requires the combined use of various methods and data [21–22].

This paper analyzes the power calculation for wind and photovoltaic (PV) generation, sets out the steps of wind and solar power prediction, and explains the sources of prediction error. A logistic regression model, an ARMA model, and a gray GM(1, 1) model are presented in turn, and the induced ordered weighted average (IOWA) operator is introduced to combine the single prediction models into a multi-algorithm fusion model for new energy power prediction. After dividing the weather into types, three prediction algorithms (LSSVM, similar day, and the combined prediction model based on the ordered weighted average operator) are used for short-term PV power prediction under each weather type. Taking wind power data as the research object, the ARMA model, the LSSVM model, and the combined prediction model of this paper are used for ultra-short-term wind power prediction.

2

Technical framework for new energy power mix forecasting

New energy power prediction technology has developed rapidly in recent years, and hundreds of algorithmic models now exist. Data cleaning, the use of meteorological information, and other aspects have improved greatly. However, prediction algorithms from different vendors and based on different strategies each have their own strengths, and how well they match a specific prediction scenario varies. In actual operation, new energy generation is vulnerable to extreme weather, and existing prediction technology struggles to predict accurately under such conditions, which may further aggravate power supply shortages.

Therefore, to meet the practical needs of dispatching operation, we explore an artificial-intelligence-based technical path and study an adaptive combined prediction enhancement method for new energy power that accommodates multiple algorithms according to local conditions. The aim is to improve prediction accuracy across multi-dimensional scenarios and to support the dispatching operation of new-energy-based power systems in a complex market environment.

2.1

Classification of complex weather types

The general idea of dividing the weather types is to classify the complex weather types of the study object based on the observed fluctuations in the PV power and the changes in the important factors affecting the power.

In this paper, the weather is classified by calculating and defining the sample entropy of the key influencing factor, solar radiation, for the daily PV power curve under study. Generally, for a dataset consisting of N data x(n) = {x(1), x(2), ⋯, x(N)}, the sample entropy is calculated as follows:

The data are arranged sequentially into m-dimensional vectors: $X_{m}(i) = \{x(i), x(i + 1), \dots, x(i + m - 1)\}$ (1) where X_m(i) denotes the i-th m-dimensional vector and x(i) denotes the sample data.

Define the distance d[X_m(i), X_m(j)] as the maximum absolute difference between corresponding elements of the two m-dimensional vectors, i.e.: $d[X_{m}(i), X_{m}(j)] = \max_{k = 0, \dots, m - 1} |x(i + k) - x(j + k)|$ (2)

For a given X_m(i), count the number of j (1 ≤ j ≤ N – m, j ≠ i) for which the distance between X_m(i) and X_m(j) is less than or equal to the tolerance r, and denote this count B_i.

For 1 ≤ i ≤ N – m, define: $B_{i}^{m}(r) = \frac{1}{N - m - 1} B_{i}$ (3) $B^{m}(r) = \frac{1}{N - m} \sum_{i = 1}^{N - m} B_{i}^{m}(r)$ (4) where $B_{i}^{m}(r)$ denotes the share of B_i in the total number of vectors and $B^{m}(r)$ denotes the average of all $B_{i}^{m}(r)$.

Increase the dimension to m + 1 and count the number of X_{m+1}(j) (1 ≤ j ≤ N – m, j ≠ i) whose distance from X_{m+1}(i) is less than or equal to r, denoted A_i: $A_{i}^{m + 1}(r) = \frac{1}{N - m - 1} A_{i}$ (5) $A^{m + 1}(r) = \frac{1}{N - m} \sum_{i = 1}^{N - m} A_{i}^{m + 1}(r)$ (6) where $A_{i}^{m + 1}(r)$ denotes the share of A_i in the total number of vectors and $A^{m + 1}(r)$ denotes the average of all $A_{i}^{m + 1}(r)$.

The sample entropy is defined as: $SampEn(m, r) = \lim_{N \to \infty} \{- \ln [A^{m + 1}(r) / B^{m}(r)]\}$ (7) where SampEn(m, r) denotes the sample entropy.

The sample entropy measures the complexity of the radiation sequence: it is non-negative, and a larger value indicates a more irregular sequence.

Set E₁ as the reference value for determining the complexity of the experimental sequence, and when the sample entropy of a sample sequence is greater than E₁, it is determined as a complex weather type.

The sample entropy reference value is determined by the following equation: $E_{1} = 0.1 \sqrt{\frac{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}}{n}}$ (8) where E₁ represents the sample entropy reference value, x_i represents the solar radiation sample data, $\bar{x}$ represents the average of the solar radiation sample data, and n represents the total number of samples.
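The classification rule of Eqs. (1) to (8) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the limit N → ∞ is replaced by the finite-sample ratio, and the tolerance r = 0.2 times the standard deviation is an assumed, commonly used choice that the text does not specify.

```python
import math


def sample_entropy(x, m, r):
    """Finite-sample SampEn(m, r) = -ln(A^{m+1}(r) / B^m(r))."""
    n = len(x)

    def matches(dim):
        # Compare the first N - m template vectors of length `dim`,
        # using the maximum absolute difference and excluding self-matches.
        vectors = [x[i:i + dim] for i in range(n - m)]
        return sum(
            1
            for i, vi in enumerate(vectors)
            for j, vj in enumerate(vectors)
            if i != j and max(abs(p - q) for p, q in zip(vi, vj)) <= r
        )

    b = matches(m)       # proportional to B^m(r)
    a = matches(m + 1)   # proportional to A^{m+1}(r), same normalisation
    if a == 0 or b == 0:
        return math.inf  # no matches found: treated as maximally irregular
    return -math.log(a / b)


def std(x):
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))


def is_complex_weather(radiation, m=2, r_factor=0.2):
    """Compare SampEn with the reference value E1 = 0.1 * std (Eq. 8)."""
    e1 = 0.1 * std(radiation)
    r = r_factor * std(radiation)  # assumed tolerance, not given in the text
    return sample_entropy(radiation, m, r) > e1
```

A strictly periodic radiation series yields a sample entropy of zero under this sketch and is therefore not flagged as complex, whereas an irregular series is.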

2.2

Wind power generation and forecast

Driven by energy-saving and emission-reduction measures, wind power generation and photovoltaic power generation (hereinafter referred to collectively as wind power) are growing rapidly. The randomness, volatility and uncertainty of large-scale grid-connected wind and solar generation have a profound impact on the safe operation of the grid, and the tension between safe grid operation and the accommodation of large amounts of new energy is increasingly apparent. In this situation, refined management and highly accurate prediction of wind and solar generation, together with its integration into grid scheduling plans and real-time operation and control, are important measures for solving this problem.

2.2.1

Power calculations

Wind power generation uses the wind to rotate the turbine blades; a gearbox then raises the rotational speed to drive the generator, converting wind energy into mechanical energy and then into electrical energy. The basic formula for calculating wind power is: $P = (C_{p} \times ρ \times v^{3} \times A) / 2$ (9)

where C_p is the wind energy utilization factor, ρ is the air density, which is related to altitude and humidity, v is the wind speed at hub height, and A is the swept area of the rotor.

Photovoltaic power generation converts solar energy into electricity through solar photovoltaic panels; the main influencing factors are solar radiation, clear sky index, sunshine hours, cloud cover, temperature, wind speed and dust. Taking a PV array system as an example, the engineering formula for its output power is: $P = N \times n_{1} \times n_{2} \times n_{3} \times A \times R_{a} \times [1 - s (T_{c} - 25)]$ (10)

where P is the PV output power, N is the number of PV groups, n₁ is the PV conversion efficiency, and n₂ is the maximum power point tracking efficiency.
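As a quick numerical check, Eqs. (9) and (10) can be evaluated directly. The sketch below is illustrative only. The text does not define n₃, A, R_a, s or T_c in Eq. (10); the code assumes they are a further efficiency factor, the array area, the irradiance on the array, a temperature coefficient, and the cell temperature in °C, respectively.

```python
def wind_power(cp, rho, v, area):
    """Eq. (9): P = C_p * rho * v^3 * A / 2 (watts for SI inputs)."""
    return cp * rho * v ** 3 * area / 2


def pv_power(n_groups, eta1, eta2, eta3, area, irradiance, s, t_cell):
    """Eq. (10); the meaning of eta3, area, irradiance, s, t_cell is assumed."""
    derating = 1 - s * (t_cell - 25)  # output falls as the cell heats above 25 degrees
    return n_groups * eta1 * eta2 * eta3 * area * irradiance * derating


# Example: C_p = 0.4, rho = 1.225 kg/m^3, v = 10 m/s, A = 100 m^2
p_wind = wind_power(0.4, 1.225, 10.0, 100.0)   # about 24.5 kW
p_pv = pv_power(1, 0.2, 1.0, 1.0, 10.0, 1000.0, 0.004, 35.0)  # about 1.92 kW
```

Because wind power scales with the cube of wind speed, a small error in the forecast wind speed produces a much larger relative error in the predicted power.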

2.2.2

The basic process of wind forecasting

The basic process of wind and solar forecasting is: 1)

Acquire and select historical wind power generation data and related information through accurate and detailed investigation.

2)

Processing of historical wind power generation data.

3)

Pre-process the wind power generation data, including normalization, smoothing, and interpolation of missing data. Abnormal data are mainly handled by horizontal processing and vertical processing.

4)

Construct a scientific and reasonable wind power prediction model. A good model captures the regularities and trajectory of the predicted quantity, so choosing the right model is a crucial part of the prediction process.

5)

Analyze the influencing factors comprehensively in light of the prediction error, adjust the prediction model and algorithm accordingly, and correct and finalize the predicted values.
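The pre-processing in step 3 can be sketched as follows. The example is a minimal illustration under stated assumptions: missing samples are marked with `None`, gaps are filled by linear interpolation between the nearest valid neighbours, and min-max scaling is used for normalization. The text does not specify which interpolation or normalization scheme the authors use, and the function names are hypothetical.

```python
def interpolate_missing(series):
    """Fill None gaps linearly; edge gaps copy the nearest valid sample."""
    filled = list(series)
    valid = [i for i, v in enumerate(series) if v is not None]
    if not valid:
        raise ValueError("series has no valid samples")
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def min_max_normalize(series):
    """Scale the series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


power = [1.0, None, 3.0, None]
clean = interpolate_missing(power)
scaled = min_max_normalize(clean)
```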

2.2.3

Sources of power prediction error

In the actual prediction process, weather, the prediction model, and subjective factors all give rise to prediction errors, which reduce prediction accuracy. Wind power prediction errors affect the scheduling plan of the power grid and, in turn, its normal operation. This paper mainly uses historical data as the reference for short-term wind power prediction. Because historical data are not real-time data and contain a certain degree of error, the error must be kept within a defined range so that the predicted value stays close to the actual value and the prediction accuracy is higher. The errors mainly arise from the following four sources: 1)

The influence of data. Differences in data sources and limitations of the data collection system can leave the collected data incomplete, so that the data obtained in the preparatory work differ from the true values. Other uncertainties can also cause missing or erroneous data.

2)

The influence of weather. The power output of wind farms is affected by factors such as wind speed, wind direction and ambient temperature, and weather uncertainty can cause the predicted values for a given period to deviate from the output under normal weather, seriously affecting the accuracy of power prediction.

3)

Selection of the prediction model. Different prediction models have different accuracy, and failing to take all the influencing factors into account when selecting a model increases the error. The model should therefore be chosen after comprehensive consideration and comparative analysis of the various factors, selecting the most appropriate and most accurate one.

4)

The influence of human factors. Even experienced staff are limited by their own technical level, and subjective factors such as mistakes during data entry may cause unpredictable errors in the prediction results.

2.3

Combined prediction models

Combined prediction modeling combines two or more single prediction models in a specific way to form a comprehensive combined prediction method [23–24]. There are many combined prediction methods, each with its own advantages and disadvantages. Under different data conditions their prediction performance may differ greatly, but they are also related and can complement each other. If the error of one single model in the combination is extremely large, it can be removed so that only the most suitable single models are combined.

The combined prediction model is defined by linking the single prediction models through weights, taking the characteristics of each into account. In a combined model, a large error in one single model does not strongly affect the overall result. By contrast, if the decision maker relies on only one forecasting model and that model does not match the data, the error in the predicted value will be large and may strongly affect the decision. A combination of forecasting methods avoids such excessively large errors and can therefore further improve the accuracy and reliability of prediction relative to a single method.

Combined prediction aims to combine several different prediction methods organically, bringing together the information of the single models through weighting. The weights are calculated using the Shapley value method, and the final prediction model is obtained by combining the methods through these weights. The first problem to be solved is therefore the calculation of the weight of each single prediction model, which allows the single methods to be combined more effectively and improves prediction accuracy.

The combined prediction weights are determined by the Shapley value method, a mathematical method mainly used for problems of cooperation among multiple individuals. It allocates the total benefit of a cooperation among the participants in a way that reflects the importance of each individual.

The method is applied to combined forecasting by treating each forecasting model as an individual in a cooperative relationship. The error generated by the models is regarded as the economic benefit of the cooperation of n individuals, and the importance of each single model is determined by assigning weights according to the magnitude of its contribution.

The definition is done by the following method:

Suppose there are n single forecasting methods in the combination, denoted: $I = \{1, 2, 3, \dots, n\}$ (11)

For any subsets t and s of I (each a combination of some of the n prediction methods), E(t) and E(s) denote the error values of the corresponding combinations. The following conditions are defined: I:

For any subsets t, s: $E(t \cup s) \leq E(t) + E(s)$ (12)

E(t ∪ s), E(t) and E(s) are the errors generated by the respective combinations of predictions.

II:

For s ⊆ I, Y_i is the absolute error value assigned to the i-th prediction model in the combination.

III:

The total error value of the combined prediction model is E(n): $E(n) = \sum_{i \in I} Y_{i}$ (13)

E is the error value of the combined prediction model, given by: $E = \frac{1}{n} \sum_{i = 1}^{n} E_{i}$ (14)

E_i is the mean absolute error of the i-th prediction model: $E_{i} = \frac{1}{m} \sum_{j = 1}^{m} |e_{ij}|, \quad i = 1, 2, 3, \dots, n$ (15)

Here E denotes the total error value of the combined prediction, e_ij is the error of the i-th model on the j-th sample, m denotes the number of samples, and n denotes the number of single prediction models. The total error is allocated in the following way.

Let i denote a single prediction model in the combination, s a subset containing model i, |s| the number of single prediction models in s, n the total number of single prediction models, and s – {i} the subset s with the i-th model removed. w(|s|) is the weighting factor applied to the marginal contribution of model i to the subset s. The Shapley value $E_{i}'$ denotes the amount of error shared by the i-th single forecasting model: $w(|s|) = \frac{(n - |s|)! (|s| - 1)!}{n!}$ (16) $E_{i}' = \sum_{s \ni i} w(|s|) [E(s) - E(s - \{i\})]$ (17)

The weight of each single prediction model is: $w_{i} = \frac{1}{n - 1} \cdot \frac{E - E_{i}'}{E}, \quad i = 1, 2, 3, \dots, n$ (18)

Because the Shapley values $E_{i}'$ sum to E, these weights sum to 1, and a model that bears a smaller share of the error receives a larger weight.
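The Shapley allocation of Eqs. (16) and (17) can be sketched as below. Two choices in the code are assumptions rather than statements from the text: the error of a subset E(s) is taken as the mean of its members' mean absolute errors (consistent with Eq. (14) for the full set), and the weights are normalized by the total error E so that they sum to 1. The function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def coalition_error(members, mae):
    # Assumption: E(s) is the average MAE of the models in s; E(empty set) = 0.
    return sum(mae[i] for i in members) / len(members) if members else 0.0


def shapley_errors(mae):
    """Error share of each model: sum over subsets s containing i of
    w(|s|) * [E(s) - E(s - {i})], with w(|s|) = (n-|s|)! (|s|-1)! / n!."""
    n = len(mae)
    shares = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        share = 0.0
        for size in range(n):  # number of models in s besides i
            for rest in combinations(others, size):
                s = rest + (i,)
                w = factorial(n - len(s)) * factorial(len(s) - 1) / factorial(n)
                share += w * (coalition_error(s, mae) - coalition_error(rest, mae))
        shares.append(share)
    return shares


def shapley_weights(mae):
    """Weights (E - share_i) / ((n - 1) * E); they sum to 1 by construction."""
    n = len(mae)
    total = sum(mae) / n
    return [(total - share) / ((n - 1) * total) for share in shapley_errors(mae)]


weights = shapley_weights([2.0, 3.0, 4.0])  # the most accurate model gets the largest weight
```

For three models with mean absolute errors 2, 3 and 4, the error shares are 0.25, 1.0 and 1.75, which sum to the total error E = 3.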

3

New energy power prediction based on combinatorial modeling

In this paper, three forecasting models are combined into a combined forecasting model. The combined model preserves the characteristics of the original system, better reflects its development trend, achieves higher prediction accuracy, and is applicable to near-term, short-term, and medium- and long-term prediction.

3.1

Prediction Model Based on Logistic Regression Theory

Logistic regression is a multivariate analysis method for studying the relationship between a dichotomous or multicategorical dependent variable and its influencing factors (independent variables); it is a form of probabilistic nonlinear regression and comes in dichotomous and multicategorical variants. In dichotomous logistic regression, the dependent variable y takes only two values, “yes” and “no”, labeled 1 and 0. Let p be the probability that y takes “yes” under the action of the independent variables, so that the probability of “no” is 1 – p; the model then studies the relationship between p and the independent variables. 1)

Logistic regression model

The logistic regression model is a linear regression model with $\ln (\frac{p}{1 - p})$ as the dependent variable: $\ln (\frac{p}{1 - p}) = β_{0} + β_{1} x_{1} + \dots + β_{p} x_{p} + ε$ (19)

Since $\ln (\frac{p}{1 - p})$ ranges over (–∞, +∞), the independent variables x₁, x₂, …, x_p can take any values. Denote g(x) = β₀ + β₁x₁ + ⋯ + β_px_p; then: $p = P(y = 1 | X) = \frac{1}{1 + e^{- g(x)}}$ (20) $1 - p = P(y = 0 | X) = 1 - \frac{1}{1 + e^{- g(x)}} = \frac{1}{1 + e^{g(x)}}$ (21)

2)

Logistic regression model interpretation: $\frac{p}{1 - p} = e^{β_{0} + β_{1} x_{1} + \dots + β_{p} x_{p} + ε}$ (22) where β₀ is the natural logarithm of the ratio of the probabilities of y = 1 and y = 0 when all independent variables x₁, x₂, …, x_p are zero, and β_i is the logarithm of the odds ratio of y = 1 when the independent variable x_i changes by one unit, i.e., x_i = 1 compared with x_i = 0.
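The interpretation of the coefficients can be checked numerically. The sketch below evaluates Eq. (20) for hypothetical coefficients (the values of β₀ and β₁ are made up for illustration) and confirms that the log odds ratio between x₁ = 1 and x₁ = 0 equals β₁.

```python
import math


def linear_predictor(beta0, betas, x):
    """g(x) = beta0 + beta1 * x1 + ... + betap * xp."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


def probability(beta0, betas, x):
    """Eq. (20): p = 1 / (1 + exp(-g(x)))."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(beta0, betas, x)))


def odds(p):
    """Odds p / (1 - p), the left-hand side of Eq. (22) without the noise term."""
    return p / (1.0 - p)


beta0, betas = -1.0, [0.8]           # hypothetical coefficients
p0 = probability(beta0, betas, [0])  # x1 = 0
p1 = probability(beta0, betas, [1])  # x1 = 1
log_odds_ratio = math.log(odds(p1) / odds(p0))  # equals beta1 up to rounding
```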

3.2

Forecasting model based on ARMA theory

The ARMA model is built on the AR model and the MA model, as follows.

AR(p) model: $u_{t} = c + ϕ_{1} u_{t - 1} + ϕ_{2} u_{t - 2} + \dots + ϕ_{p} u_{t - p} + ε_{t}, \quad t = 1, 2, \dots, T$ (23) where c is a constant, ϕ₁, ϕ₂, …, ϕ_p are the autoregressive coefficients, p is the autoregressive order, and ε_t is a white noise series with mean 0 and variance σ².

MA(q) model: $u_{t} = μ + ε_{t} + θ_{1} ε_{t - 1} + θ_{2} ε_{t - 2} + \dots + θ_{q} ε_{t - q}, \quad t = 1, 2, \dots, T$ (24) where μ is a constant, θ₁, θ₂, …, θ_q are the coefficients of the q-th order MA model, and ε_t is a white noise sequence with mean 0 and variance σ².

ARMA(p, q) model: $u_{t} = c + ϕ_{1} u_{t - 1} + ϕ_{2} u_{t - 2} + \dots + ϕ_{p} u_{t - p} + ε_{t} + θ_{1} ε_{t - 1} + \dots + θ_{q} ε_{t - q}, \quad t = 1, 2, \dots, T$ (25)
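Once the coefficients are known, Eq. (25) gives a one-step-ahead forecast by setting the unknown future noise term to its mean of zero. The sketch below assumes the coefficients and past residuals are already available; the function name and the numbers are hypothetical.

```python
def arma_one_step(c, phi, theta, history, residuals):
    """One-step forecast from Eq. (25) with the future noise set to zero.

    history[-1] is u_{t-1}, history[-2] is u_{t-2}, and so on;
    residuals[-1] is eps_{t-1}, residuals[-2] is eps_{t-2}, and so on.
    """
    ar_part = sum(p * history[-1 - i] for i, p in enumerate(phi))
    ma_part = sum(q * residuals[-1 - i] for i, q in enumerate(theta))
    return c + ar_part + ma_part


# ARMA(1, 1) with c = 1, phi_1 = 0.5, theta_1 = 0.2
forecast = arma_one_step(1.0, [0.5], [0.2], history=[3.0, 4.0], residuals=[0.1, 0.5])
```

With the last observation 4.0 and last residual 0.5, the forecast is 1 + 0.5 × 4 + 0.2 × 0.5 = 3.1.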

The ARMA model is identified as follows: 1)

Autocorrelation coefficient

The autocorrelation coefficient of the time series u_t at lag k is estimated by the following equation: $r_{k} = \frac{\sum_{t = k + 1}^{T} (u_{t} - \bar{u}) (u_{t - k} - \bar{u})}{\sum_{t = 1}^{T} (u_{t} - \bar{u})^{2}}$ (26) where $\bar{u}$ is the sample mean of the series and r_k is the autocorrelation coefficient of the sequence u_t, i.e., the correlation coefficient between values separated by k periods.
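Eq. (26) translates directly into code. The following is a minimal sketch with zero-based indexing; the function name is hypothetical.

```python
def autocorrelation(u, k):
    """Sample autocorrelation r_k of Eq. (26) for a list u and lag k >= 0."""
    t_len = len(u)
    mean = sum(u) / t_len
    numerator = sum((u[t] - mean) * (u[t - k] - mean) for t in range(k, t_len))
    denominator = sum((v - mean) ** 2 for v in u)
    return numerator / denominator


series = [1.0, 2.0, 3.0, 4.0, 5.0]
r1 = autocorrelation(series, 1)  # 0.4
r2 = autocorrelation(series, 2)  # -0.1
```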

2)

The partial autocorrelation coefficient is the conditional correlation between u_t and u_{t–k} given u_{t–1}, u_{t–2}, …, u_{t–k+1}. It is measured by the partial correlation coefficient φ_{k,k}, computed at lag k by: $φ_{k, k} = \begin{cases} r_{1}, & k = 1 \\ \frac{r_{k} - \sum_{j = 1}^{k - 1} φ_{k - 1, j} r_{k - j}}{1 - \sum_{j = 1}^{k - 1} φ_{k - 1, j} r_{j}}, & k > 1 \end{cases}$ (27) $φ_{k, j} = φ_{k - 1, j} - φ_{k, k} φ_{k - 1, k - j}$ (28)

This is a consistent estimate of the partial autocorrelation coefficient. A more precise estimate of φ_{k,k} is obtained from the regression: $u_{t} = α_{0} + α_{1} u_{t - 1} + \dots + α_{k - 1} u_{t - (k - 1)} + φ_{k, k} u_{t - k} + ε_{t}, \quad t = 1, 2, \dots, T$ (29)

Thus, the partial autocorrelation coefficient at lag k is the coefficient on u_{t–k} when u_t is regressed on u_{t–1}, u_{t–2}, …, u_{t–k}. It measures the correlation between values k periods apart after removing the correlation explained by the intervening k – 1 lags, hence the name partial correlation.
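The recursion of Eqs. (27) and (28) can be sketched as follows. The code uses the standard Durbin–Levinson form, in which the denominator is 1 minus the sum of φ_{k–1,j} r_j; the function name and the input convention (r[0] = 1) are assumptions made for this illustration.

```python
def partial_autocorrelations(r, max_lag):
    """Return [phi_11, phi_22, ..., phi_KK] from autocorrelations r[0..K]."""
    pacf = []
    prev = {}  # prev[j] holds phi_{k-1, j}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[1]
        else:
            numerator = r[k] - sum(prev[j] * r[k - j] for j in range(1, k))
            denominator = 1.0 - sum(prev[j] * r[j] for j in range(1, k))
            phi_kk = numerator / denominator
        # Eq. (28): update the lower-order coefficients for the next lag.
        current = {j: prev[j] - phi_kk * prev[k - j] for j in range(1, k)}
        current[k] = phi_kk
        pacf.append(phi_kk)
        prev = current
    return pacf


# Theoretical autocorrelations of an AR(1) process with coefficient 0.5
pacf = partial_autocorrelations([1.0, 0.5, 0.25, 0.125], 3)
```

For an AR(1) process the partial autocorrelations vanish after lag 1, which is what the example returns.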

3)

Identification of MA models

MA(q) model: $u_{t} = μ + ε_{t} + θ_{1} ε_{t - 1} + θ_{2} ε_{t - 2} + \dots + θ_{q} ε_{t - q}, \quad t = 1, 2, \dots, T$ (30) where ε_t is a white noise sequence with mean 0 and variance σ², and u_t has mean μ and autocovariance τ_k, i.e.: $τ_{k} = E[(u_{t + k} - μ)(u_{t} - μ)] = E[(ε_{t} + \sum_{j = 1}^{q} θ_{j} ε_{t - j})(ε_{t + k} + \sum_{i = 1}^{q} θ_{i} ε_{t + k - i})]$ (31)

This gives: $r_{k} = \frac{τ_{k}}{τ_{0}} = \begin{cases} 1, & k = 0 \\ \frac{θ_{k} + θ_{1} θ_{k + 1} + \dots + θ_{q - k} θ_{q}}{1 + θ_{1}^{2} + \dots + θ_{q}^{2}}, & 0 < k \leq q \\ 0, & k > q \end{cases}$ (32)

For the MA(q) model, r_k = 0 when k > q, so u_t is uncorrelated with u_{t+k}. This property is called truncation (cut-off) of the autocorrelation function.

4)

Identification of the AR model

The autocorrelation coefficient of the AR(p) process is: $r_{k} = g_{1} λ_{1}^{k} + g_{2} λ_{2}^{k} + \dots + g_{p} λ_{p}^{k}$ (33) where λ₁, λ₂, …, λ_p are the p roots of the characteristic equation $λ^{p} - ϕ_{1} λ^{p - 1} - ϕ_{2} λ^{p - 2} - \dots - ϕ_{p} = 0$ of the AR(p) model and g₁, g₂, …, g_p are p arbitrary constants.

The autocorrelation coefficients carry information about the AR(p) model, but the AR(p) process is described more effectively by the partial autocorrelation coefficients φ_{k,k}. For an AR(p) model: $u_{t} = c + ϕ_{1} u_{t - 1} + ϕ_{2} u_{t - 2} + \dots + ϕ_{p} u_{t - p} + ε_{t}, \quad t = 1, 2, \dots, T$ (34)

Multiply both sides by u_{t–k} (k = 1, 2, ⋯, p), take expectations, and divide by the variance of u_t to obtain the following system of linear equations in ϕ₁, ϕ₂, ⋯, ϕ_p: $\begin{cases} ϕ_{1} + ϕ_{2} r_{1} + \dots + ϕ_{p} r_{p - 1} = r_{1} \\ ϕ_{1} r_{1} + ϕ_{2} + \dots + ϕ_{p} r_{p - 2} = r_{2} \\ \vdots \\ ϕ_{1} r_{p - 1} + ϕ_{2} r_{p - 2} + \dots + ϕ_{p} = r_{p} \end{cases}$ (35) where r₁, r₂, ⋯, r_p are the autocorrelation coefficients of orders 1, 2, ⋯, p of the sequence u_t.
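The linear system (35) can be solved for the AR coefficients once the autocorrelations are known. The sketch below is an illustration with a small hand-written Gaussian elimination so that it needs no external libraries; the function names are hypothetical, and r[0] = 1 is assumed.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def yule_walker(r, p):
    """AR(p) coefficients from autocorrelations r[0..p], following Eq. (35)."""
    matrix = [[r[abs(i - j)] for j in range(p)] for i in range(p)]
    rhs = [r[k] for k in range(1, p + 1)]
    return solve_linear(matrix, rhs)


# Autocorrelations implied by an AR(2) process with phi_1 = 0.5, phi_2 = 0.3
rho1 = 0.5 / (1 - 0.3)
rho2 = 0.5 * rho1 + 0.3
phi = yule_walker([1.0, rho1, rho2], 2)  # recovers approximately [0.5, 0.3]
```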

5)

Identification and establishment of the model

When identifying the model, the autocorrelation and partial autocorrelation coefficients cannot by themselves determine the specific form of the model; they serve only as a basis. Identification also requires repeated statistical testing of the autocorrelation and partial autocorrelation coefficients against the data to select an appropriate model.

For a sequence of length T, the distribution of its autocorrelation coefficient is: $\hat{r} \sim N(0, 1)$ (36)

The distribution of the partial autocorrelation coefficients is: $\hat{φ}_{k, k} \sim N(0, \frac{1}{\sqrt{n}})$ (37)

For the stationary sequence u_t, the estimated autocorrelation and partial autocorrelation coefficients are plotted as bar charts with 2σ boundary lines, which helps to narrow down the specific form of model chosen for the sequence.

3.3

Prediction model based on gray GM(1,1) theory

The gray model mainly reveals the process of continuous development and change between things within the system [25]. 1)

Level ratio test

For the given sequence X⁽⁰⁾ = (x⁽⁰⁾(1), x⁽⁰⁾(2), ⋯, x⁽⁰⁾(n)), n ≥ 4, the level ratio is calculated as: $\partial (k) = \frac{x^{(0)} (k - 1)}{x^{(0)} (k)}, \quad k = 2, 3, 4, \dots, n$ (38)

The sequence of level ratios is: $\partial = (\partial (2), \partial (3), \partial (4), \dots, \partial (n))$ (39)

Finally, check whether the level ratios lie within the tolerable range. If all of them do, the series can be modeled and predicted with GM(1, 1).
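The level ratio check can be sketched as below. The text does not state the tolerable range; the code assumes the interval (e^{–2/(n+1)}, e^{2/(n+1)}) that is commonly used in the gray-model literature. The function names are hypothetical.

```python
import math


def level_ratios(x0):
    """Ratios x0(k-1) / x0(k) for k = 2..n (zero-based list input)."""
    return [x0[k - 1] / x0[k] for k in range(1, len(x0))]


def passes_level_ratio_test(x0):
    """True if every level ratio lies in the assumed admissible interval."""
    n = len(x0)
    low, high = math.exp(-2 / (n + 1)), math.exp(2 / (n + 1))
    return all(low < ratio < high for ratio in level_ratios(x0))


smooth = [100.0, 105.0, 110.0, 116.0, 122.0]
steep = [1.0, 10.0, 100.0, 1000.0]
ok_smooth = passes_level_ratio_test(smooth)  # True: ratios are close to 1
ok_steep = passes_level_ratio_test(steep)    # False: ratios of 0.1 fall outside the interval
```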

2)

Historical data processing

Data transformation is applied to sequences that fail the level ratio test so that the transformed sequence passes it. Commonly used transformations are the logarithmic transformation, the translation transformation, and the square root transformation.

3)

Establishment of the gray prediction model based on a differential equation

The gray GM(1, 1) prediction model is established on the basis of a differential equation.

The following sequence is obtained by first-order accumulation of X⁽⁰⁾ = (x⁽⁰⁾(1), x⁽⁰⁾(2), ⋯, x⁽⁰⁾(n)), n ≥ 4, that is, $x^{(1)} (k) = \sum_{i = 1}^{k} x^{(0)} (i)$: $X^{(1)} = (x^{(1)} (1), x^{(1)} (2), \dots, x^{(1)} (n)), \quad n \geq 4$ (40)

The differential equation corresponding to the gray GM(1, 1) model is then: $\frac{d X^{(1)} (k)}{d t} + a X^{(1)} (k) = u$ (41) where a is the development coefficient and u is the gray action quantity. The equation satisfies the initial condition X⁽¹⁾(1) = X⁽⁰⁾(1) at k = 1.

Its solution is: $X^{(1)} (k) = [X^{(0)} (1) - \frac{u}{a}] e^{- a (k - 1)} + \frac{u}{a}$ (42)

4)

A residual test is performed with the test formulas: $Δ^{(0)} (i) = | X^{(0)} (i) - \hat{X}^{(0)} (i) |, \quad i = 1, 2, 3, \dots, n$ (43) $ϕ (i) = \frac{Δ^{(0)} (i)}{X^{(0)} (i)} \times 100\%, \quad i = 1, 2, 3, \dots, n$ (44)

Generally, the test formula requires ϕ(i) ≤ 20% and optimally achieves ϕ(i) ≤ 10%. If the gray GM(1, 1) model built with the original data fails to pass the test or the test accuracy is not enough, the residuals of the model are corrected so as to improve the model prediction accuracy.
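The modeling and residual test steps can be sketched end to end. The text does not say how a and u are estimated; the code assumes the usual least-squares fit of x⁽⁰⁾(k) = –a z(k) + u, where z(k) is the mean of consecutive accumulated values. The function names are hypothetical, and the sketch does not handle a = 0.

```python
import math


def fit_gm11(x0):
    """Estimate (a, u) by least squares on x0(k) = -a * z(k) + u."""
    n = len(x0)
    x1 = [sum(x0[:k + 1]) for k in range(n)]               # accumulated series
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]   # background values
    y = x0[1:]
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    return -slope, (sy - slope * sz) / m


def predict_gm11(x0, a, u, steps=0):
    """Fitted values for the sample plus `steps` forecasts, using Eq. (42)."""
    total = len(x0) + steps
    x1_hat = [(x0[0] - u / a) * math.exp(-a * k) + u / a for k in range(total)]
    return [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, total)]


def relative_residuals(x0, x0_hat):
    """Eq. (44): relative residuals in percent over the sample."""
    return [abs(x - xh) / x * 100 for x, xh in zip(x0, x0_hat)]


x0 = [100 * 1.05 ** k for k in range(6)]   # synthetic series growing 5 % per step
a, u = fit_gm11(x0)
fitted = predict_gm11(x0, a, u, steps=2)
residuals = relative_residuals(x0, fitted)  # all well below the 10 % threshold
```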

3.4

Combined prediction model based on ordered weighted average operator

This section introduces the induced ordered weighted average (IOWA) operator. The fitting accuracies of the single prediction methods are ranked from high to low at each point of the sample interval, and, with the sum of squared errors as the criterion, a method is given for determining the weight coefficients of the IOWA-based combined prediction model, followed by a comparison of errors. 1)

Model construction

Let {x_t, t = 1, 2, ⋯, N} be the observed series of an indicator of a socio-economic phenomenon, and suppose there are m feasible single forecasting methods. Let x_{it} be the predicted (or fitted) value of the i-th method at time t, i = 1, 2, ⋯, m, t = 1, 2, ⋯, N. Let the weighting coefficients of the m single predictions in the combination be L = (l₁, l₂, ⋯, l_m)^T, satisfying $\sum_{i = 1}^{m} l_{i} = 1, \; l_{i} \geq 0, \; i = 1, 2, \dots, m$. Let $\hat{x}_{t} = \sum_{i = 1}^{m} l_{i} x_{it}, \; t = 1, 2, \dots, N$; then $\hat{x}_{t}$ is the traditional weighted arithmetic average combined prediction value at time t. Let: $a_{it} = \begin{cases} 1 - | (x_{t} - x_{it}) / x_{t} |, & | (x_{t} - x_{it}) / x_{t} | < 1 \\ 0, & | (x_{t} - x_{it}) / x_{t} | \geq 1 \end{cases} \quad i = 1, 2, \dots, m, \; t = 1, 2, \dots, N$ (45)

Then a_{it} is the prediction accuracy of the i-th method at time t, and clearly a_{it} ∈ [0, 1]. The prediction accuracy a_{it} is regarded as the inducing value of the predicted value x_{it}, so that the prediction accuracies of the m single methods at time t and their corresponding predicted values on the sample interval form m two-dimensional arrays, i.e.: $(\langle a_{1t}, x_{1t} \rangle, \langle a_{2t}, x_{2t} \rangle, \dots, \langle a_{mt}, x_{mt} \rangle)$ (46)

Sort the prediction accuracy sequence a_{1t}, a_{2t}, ⋯, a_{mt} of the m single methods at time t from largest to smallest, and let a-index(it) be the subscript of the i-th largest prediction accuracy. Let: $IOWA_{L} (\langle a_{1t}, x_{1t} \rangle, \langle a_{2t}, x_{2t} \rangle, \dots, \langle a_{mt}, x_{mt} \rangle) = \sum_{i = 1}^{m} l_{i} x_{a\text{-}index(it)}$ (47)

Equation (47) is the IOWA combined prediction value at time t induced by the prediction accuracy sequence a_{1t}, a_{2t}, ⋯, a_{mt}. Let $e_{a\text{-}index(it)} = x_{t} - x_{a\text{-}index(it)}$; the total sum of squared errors S of the combined prediction over the N periods is then: $S = \sum_{t = 1}^{N} (x_{t} - \sum_{i = 1}^{m} l_{i} x_{a\text{-}index(it)})^{2} = \sum_{i = 1}^{m} \sum_{j = 1}^{m} l_{i} l_{j} (\sum_{t = 1}^{N} e_{a\text{-}index(it)} e_{a\text{-}index(jt)})$ (48)

Therefore, the combined prediction model under the sum-of-squared-errors criterion is equivalent to the following optimization model: $\min S(L) = \sum_{i = 1}^{m} \sum_{j = 1}^{m} l_{i} l_{j} (\sum_{t = 1}^{N} e_{a\text{-}index(it)} e_{a\text{-}index(jt)})$ (49) $s.t. \begin{cases} \sum_{i = 1}^{m} l_{i} = 1, \\ l_{i} \geq 0, \; i = 1, 2, \dots, m \end{cases}$ (50)

Let $E_{ij} = \sum_{t = 1}^{N} e_{a\text{-}index(it)} e_{a\text{-}index(jt)}$, i, j = 1, 2, ⋯, m; then $E = (E_{ij})_{m \times m}$ is called the m-th order IOWA combined prediction error information matrix, so that: $\min S(L) = L^{T} E L; \quad s.t. \begin{cases} R^{T} L = 1 \\ L \geq 0 \end{cases}$ (51) where R = (1, 1, ⋯, 1)^T. If the non-negativity of the IOWA combined prediction weight vector L is not considered, this becomes: $\min S(L) = L^{T} E L; \quad s.t. \; R^{T} L = 1$ (52)
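The construction can be sketched for a small example. The code below builds the accuracy-ordered predictions, forms the error information matrix, and solves the equality-constrained problem (52) in closed form as L = E⁻¹R / (RᵀE⁻¹R), which follows from the Lagrangian of (52). Non-negativity is not enforced, so a constrained quadratic programming solver would be needed for (51). All function names and the sample data are hypothetical.

```python
def accuracy(actual, predicted):
    """Eq. (45): prediction accuracy in [0, 1]."""
    rel = abs((actual - predicted) / actual)
    return 1.0 - rel if rel < 1.0 else 0.0


def order_by_accuracy(actual, predictions):
    """ordered[i][t] is the prediction with the (i+1)-th highest accuracy at time t."""
    m, n = len(predictions), len(actual)
    ordered = [[0.0] * n for _ in range(m)]
    for t in range(n):
        ranked = sorted(range(m), reverse=True,
                        key=lambda i: accuracy(actual[t], predictions[i][t]))
        for position, i in enumerate(ranked):
            ordered[position][t] = predictions[i][t]
    return ordered


def error_matrix(actual, ordered):
    """E_ij = sum over t of e_i(t) * e_j(t) for the ordered predictions."""
    errors = [[x - p for x, p in zip(actual, row)] for row in ordered]
    m = len(errors)
    return [[sum(a * b for a, b in zip(errors[i], errors[j])) for j in range(m)]
            for i in range(m)]


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def iowa_weights(e):
    """Minimise L^T E L subject to R^T L = 1 (non-negativity ignored)."""
    z = solve_linear(e, [1.0] * len(e))
    return [v / sum(z) for v in z]


actual = [100, 110, 120, 130]
predictions = [[98, 113, 119, 134],   # single model 1
               [103, 108, 123, 127]]  # single model 2
ordered = order_by_accuracy(actual, predictions)
weights = iowa_weights(error_matrix(actual, ordered))
```

In this example the more accurate prediction at each time step receives a weight of about 0.61 and the less accurate one about 0.39.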

2)

Model Prediction

Based on the concept of the induced ordered weighted average operator, this section proposes a new combined prediction model. Solving the quadratic programming model above gives the optimal IOWA combination weights on the sample interval, denoted $L^{*} = (l_{1}^{*}, l_{2}^{*}, \dots, l_{m}^{*})^{T}$. According to the principle of prediction continuity, these weights can be used for IOWA combined prediction over the prediction interval [N + 1, N + 2, ⋯]: $IOWA_{L^{*}} (\langle a_{1t}, x_{1t} \rangle, \langle a_{2t}, x_{2t} \rangle, \dots, \langle a_{mt}, x_{mt} \rangle) = \sum_{i = 1}^{m} l_{i}^{*} x_{a\text{-}index(it)}, \quad t = N + 1, N + 2, \dots$ (53)

Over the prediction interval [N + 1, N + 2, ⋯], the prediction accuracy sequence a_{1t}, a_{2t}, …, a_{mt} is ordered by ranking the single prediction methods according to their average fitting accuracy over the most recent period of the sample interval.
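Out-of-sample combination can be sketched as follows. Because the true value is unknown at forecast time, the ranking uses each method's recent average fitting accuracy, as described above. The function name, the weights and the sample numbers are hypothetical.

```python
def iowa_forecast(weights, new_predictions, recent_accuracy):
    """Combine new single-model predictions using in-sample IOWA weights.

    weights[0] applies to the method with the highest recent average accuracy,
    weights[1] to the second highest, and so on.
    """
    ranked = sorted(range(len(weights)), key=lambda i: recent_accuracy[i], reverse=True)
    return sum(w * new_predictions[i] for w, i in zip(weights, ranked))


# Model 2 has been slightly more accurate recently, so it receives the first weight.
combined = iowa_forecast([0.6, 0.4], new_predictions=[140.0, 150.0],
                         recent_accuracy=[0.97, 0.98])
```

Here the combined forecast is 0.6 × 150 + 0.4 × 140 = 146.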

4

Power prediction analysis based on weather type

4.1

PV power prediction errors under different weather types

The prediction of PV power is an important guide for the operation of PV power plants and the scheduling of power grids. Weather type and PV power are strongly correlated, and changes in weather type cause the PV power curve to change accordingly. PV power is random and uncertain, so the accuracy of current PV power prediction is still limited. Therefore, it is particularly important to analyze PV power prediction errors.

PV power has obvious seasonal and daily characteristics, and the external environment has a pronounced effect on the daily curve, so the influence of different weather types on the PV power prediction error should not be underestimated.

As a case in point, consider a PV power plant in Ashland, USA in 2019. The weather is roughly divided into three types: sunny, rainy, and cloudy. Three prediction algorithms are then used: least squares support vector machine (LSSVM), the similar day method, and a combined prediction model based on the ordered weighted average operator. The short-term prediction of PV power under the three weather types is shown in Fig. 1.

As can be seen from the figure, the predictions for sunny days are more accurate and have smaller errors, whereas the predictions for rainy and cloudy days fluctuate drastically and are relatively poor.

In the rainy day prediction, the similar day and LSSVM prediction models perform better, while the combined prediction model based on the ordered weighted average operator proposed in this paper has poorer prediction accuracy, with its fluctuation range staying around 2000 W.

The PV power prediction error results are shown in Table 1. The table shows that, regardless of the prediction method used, the prediction error is lowest on sunny days, where the error values of all three prediction models are less than 0.1. This is mainly because there are fewer clouds on a sunny day, which reduces the loss of solar radiation received by the PV panels. As a result, the PV power curve is smoother and fluctuates less.

Table 1.

Photovoltaic power prediction error results

Prediction method	Sunny day		Rainy day		Cloudy day
Prediction method	MAE	RMSE/%	MAE	RMSE/%	MAE	RMSE/%
LSSVM	0.0174	3.5965	0.0556	9.1053	0.0689	15.0334
Similar day	0.0136	3.1277	0.0412	8.2114	0.0454	11.6782
The combination model of this article	0.0075	1.8069	0.0295	7.6067	0.0337	6.0381

Cloudy weather conditions have the largest prediction error values. This is likely because cloudy weather is characterized by violent, highly uncertain cloud motion, which makes the PV output power highly volatile. In the vast majority of rainy weather conditions, the actual PV output power is lower because solar irradiance is reduced, and cloud motion before and after the rainfall brings only small disturbances. These small fluctuations are not as dramatic as those in cloudy weather. This shows that weather conditions have a great influence on the prediction error. The processing of weather information should therefore not be neglected in future PV power prediction, and it is necessary to characterize the distribution of the PV power prediction error.

4.2

Performance analysis of combinatorial prediction models

In order to verify the superiority of the new energy power prediction model with the combination of multiple algorithms proposed in this paper, the ultra-short-term prediction of wind power is simulated and compared by using the time series ARMA model, the Least Squares Support Vector Machine (LSSVM), and the combination prediction model in this paper, respectively.

4.2.1

Description of wind power data

Most existing wind power stations are not equipped with a meteorological prediction system and lack the corresponding meteorological data. This section therefore takes only the wind power data as the research object, explores its intrinsic law, and predicts the power generation of the wind power station at future moments.

The research object is wind power data from a wind farm in Henan Province with an installed capacity of 60 MW. A total of 1200 measured wind power values collected from February 20 to March 8, 2019, at a sampling interval of 15 min, are used as the experimental data. The first 1000 values are used as the training data of the model, and the last 150 values are used as the test data for the ultra-short-term prediction. The raw wind power data are shown in Figure 2. The largest Lyapunov exponent of this wind power sequence, obtained by the Wolf method, is l_max = 0.136 > 0, which indicates that the sequence is chaotic. Using the C-C method, the delay time τ of the sequence is calculated as 16 and the delay time window width τ_w as 62, and the embedding dimension m is further calculated as 6. The sample input test set and training set for the ultra-short-term prediction model for wind power are constructed using Eq.
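A common way to build such input samples from a chaotic sequence is delay-coordinate (phase-space) reconstruction. The sketch below uses the delay time τ = 16 and embedding dimension m = 6 reported above. The function name, the one-step-ahead target, and the synthetic series are assumptions for illustration, since the construction equation itself is not reproduced here.

```python
import numpy as np

def delay_embed(series, m=6, tau=16, horizon=1):
    """Build delay vectors [x(t), x(t+tau), ..., x(t+(m-1)tau)] and their targets.

    The target is the value `horizon` steps after the last delayed coordinate
    (one step ahead by default, which is an assumption of this sketch).
    """
    x = np.asarray(series, dtype=float)
    span = (m - 1) * tau
    n = len(x) - span - horizon
    if n <= 0:
        raise ValueError("series too short for the chosen m, tau and horizon")
    # Row k of idx holds the indices k, k+tau, ..., k+(m-1)*tau.
    idx = np.arange(n)[:, None] + np.arange(m)[None, :] * tau
    X = x[idx]
    y = x[np.arange(n) + span + horizon]
    return X, y

# Stand-in for the 1200 measured points (synthetic values, not the paper's data).
series = np.arange(1200)
X, y = delay_embed(series, m=6, tau=16)
print(X.shape, y.shape)
```

Each row of `X` can then serve as the input of a prediction model such as LSSVM, with the corresponding entry of `y` as the output.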

4.2.2

Results and Analysis of Ultra-short-term Forecasts of Wind Power

The ultra-short-term prediction results of wind power are shown in Figure 3. The ARMA model, the LSSVM model and the combined prediction model in this paper all predict wind power well and track the changes of the real value, although all three show a certain lag. The difference between the real value of wind power and the predicted values of the three models stays fairly constant, and the wind power reaches its maximum between sampling points 60 and 80.

The absolute error of ultra-short-term wind power prediction is shown in Fig. 4. The LSSVM model has the largest absolute error, and its absolute error curve fluctuates significantly, reaching a maximum of 5.49. The absolute error of the combined prediction model proposed in this paper (the red line in the figure) stays within the range [0, 3] overall.

The prediction accuracy of the methods differs, and at different moments each method has its own error advantages and disadvantages. In general, however, the combined prediction model proposed in this paper maintains a high level of prediction accuracy.

For further objective illustration, the normalized root mean square error e_NRMSE, the normalized mean absolute error e_NMAE and the correlation coefficient I_c are calculated for the models. The prediction errors of the three prediction models are shown in Table 2.

Table 2.

The prediction error of the three prediction models

Prediction method	e_NRMSE (%)	e_NMAE (%)	I_c
ARMA	5.49	6.79	0.9065
LSSVM	4.25	5.33	0.9124
The combination prediction model of this article	2.73	4.26	0.9567

As with the PV prediction, the combined prediction model in this paper has the smallest prediction error and the largest correlation with the true value, with a correlation coefficient of 0.9567.
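The three indices reported in Table 2 can be computed along the following lines. This is a minimal sketch that assumes the errors are normalized by the installed capacity (60 MW for the wind farm studied here) and that I_c is the Pearson correlation coefficient; the paper's exact definitions may differ, and the sample values are hypothetical.

```python
import numpy as np

def nrmse_percent(actual, predicted, capacity):
    """Root mean square error as a percentage of installed capacity (assumed normalization)."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return 100.0 * np.sqrt(np.mean((a - p) ** 2)) / capacity

def nmae_percent(actual, predicted, capacity):
    """Mean absolute error as a percentage of installed capacity (assumed normalization)."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs(a - p)) / capacity

def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    return float(np.corrcoef(actual, predicted)[0, 1])

# Hypothetical values in MW, used only to exercise the functions.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
e_nrmse = nrmse_percent(actual, predicted, capacity=60.0)
e_nmae = nmae_percent(actual, predicted, capacity=60.0)
i_c = correlation(actual, predicted)
print(e_nrmse, e_nmae, i_c)
```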

5

Conclusion

This paper is oriented to the actual production needs of power dispatch operation, and proposes a new energy power prediction model with multi-algorithm fusion to improve the accuracy of new energy power prediction under complex weather, reduce the power prediction error of wind power generation, and optimize the power supply.

1)

The Shapley value method is used to determine the weights of the combined model, and the induced ordered weighted averaging operator is introduced to fuse the Logistic model, the time-series ARMA model, and the gray prediction GM (1, 1) model to form the combined prediction model. The prediction errors of the combined prediction model based on the ordered weighted averaging operator are analyzed under three weather types: sunny, rainy, and cloudy. The sunny-day predictions are more accurate than the rainy-day and cloudy-day predictions, which fluctuate sharply, and the sunny-day prediction error values are all less than 0.1.

2)

The wind power sampling data are processed, and the ARMA model, the LSSVM model, and the combined prediction model of this paper are applied to the ultra-short-term prediction of wind power. The prediction accuracies of the three methods differ, but the absolute error of the combined prediction model based on the ordered weighted average operator stays within [0, 3] overall. Its ultra-short-term prediction performance is better than that of the ARMA model and the LSSVM model, which indicates that the proposed combined prediction model is more valuable for engineering practice.

Language: English

Publication timeframe: 1 time per year
Journal Subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Zhongyuan Yan

Wulei Xue

Yi Zhang

Xinyi Du

Xiande Zheng

Published Online: Mar 19, 2025

Received: Nov 19, 2024

Accepted: Feb 18, 2025

DOI: https://doi.org/10.2478/amns-2025-0413

Keywords: Ordered weighted average operator, Shapley value method, ARMA model, Combined prediction, New energy power

© 2025 Zhongyuan Yan et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.