A new strategy for power monitoring data collection based on data mining and its role in improving prediction accuracy
Published Online: Mar 19, 2025
Received: Nov 16, 2024
Accepted: Feb 19, 2025
DOI: https://doi.org/10.2478/amns-2025-0551
© 2025 Junpeng Zhao et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The operation of a power system is complex. To monitor the operating status of the grid and its changes in real time, the relevant data must be recorded and analyzed, and terminal equipment should be used fully to measure, summarize, and collate power system operating data [1-3]. Power personnel should therefore attach great importance to collecting and checking data throughout the input and output process, help the grid control its data, detect problems and faults in system operation promptly, and apply effective measures to resolve them [4-6]. Strengthening the collection and collation of power monitoring data also helps optimize grid operation, increase grid resource reserves, and improve the integrity and effectiveness of grid resources. At the same time, power monitoring data collection is itself the information source of the power system and provides accurate data support for its operation [7-9]. The main goals of monitoring data collection are to ensure that data are selected and collected effectively at the power site and to control and optimize the input and output of monitoring data. Because the power sector must handle both information input and the output of processing results, past data management models can no longer meet current needs [10-12]. With the continuous development of computer technology, and in particular continued innovation in computer-based data mining, the large volume of data generated by power systems can now be analyzed with data mining techniques to draw valuable conclusions [13-14].
Among data mining methods, decision trees, clustering, classification, and regression analysis are the most commonly used in power monitoring data collection. Some of these methods describe the current state of power monitoring, while others predict future monitoring data; both provide valuable support for formulating data collection strategies and improving the accuracy of power monitoring prediction [15-18].
This paper proposes a method for analyzing power equipment operation data based on big data mining. First, the data produced during power monitoring are collected and preprocessed so that they are suitable for subsequent detection and analysis. Next, an iForest ensemble method based on the LOF algorithm is used to detect abnormal data generated during power monitoring. Finally, the processed data are fed into an improved Transformer model for power load forecasting, and its prediction accuracy is evaluated.
As power systems develop, updating the power monitoring data collection strategy allows remote information to be processed and refreshed in real time, which supports their long-term development. To simplify grid information monitoring, literature [19] introduced data mining into an intelligent monitoring system for grid monitoring information so that grid operation could be tracked in real time, and proposed an effective data mining algorithm for information monitoring. Literature [20] designed a data-mining framework for recognizing daily power usage patterns and detecting anomalies in building power usage data. Using time-series power usage data from three office buildings in Chongqing as a case study, it verified the validity and feasibility of the framework, which provides technical support for understanding building energy use patterns and improving building energy management. Literature [21] designed a framework for identifying the operation strategies of building power metering systems based on Classification and Regression Trees (CART) and Weighted Association Rule Mining (WARM). Field investigations of three buildings in Shanghai confirmed that the framework identifies building operation strategies accurately and automatically, improving the efficiency of this identification work. Literature [22] introduced data mining and IoT technology into Industry 4.0 smart grid monitoring and energy management, designed a smart grid monitoring platform that integrates the two, and verified through empirical analysis that the platform performs well and can monitor grid data and provide feedback in real time.
Literature [23] proposed a method for detecting the causes of outages in distribution systems by mining association rules with the Apriori algorithm. Experiments verified its feasibility: it effectively identifies the factors related to outage events and offers reference value for future distribution network planning and operation. Addressing faults in power electronic systems (PESs) that affect power monitoring data collection, literature [24] reviewed recent studies on PES fault detection and analyzed the data mining techniques they apply, such as artificial neural networks, machine learning, and deep learning. The review found that deep learning techniques extract features from measured signals more effectively than the other methods, which helps achieve reliable maintenance of power electronic systems. Literature [25] proposed a data-mining-based method for fault detection and operation optimization of centralized heating substations and verified its soundness and applicability through empirical analysis; the method can extract potentially useful knowledge and thus provides a reference for substation fault detection and operation optimization.
Big data collection has grown rapidly with the widespread use of sensors and electronic components. The rapid development of computing and the Internet of Things (IoT) is shifting data collection further toward networked devices, and traditional identification technologies such as barcodes, QR codes, and biometrics also contribute to big data collection.
The data collection tool is structurally divided into three parts: the physical layer, the access layer, and the data collection layer. The physical layer is the data source, namely the power black box. In the access layer, the communication link between the host computer and the black box is implemented through a serial protocol; the transmission protocol specifies the standards and rules for transmitting data between modules, and this layer serves the data collection layer above it. Because the black boxes are deployed in the power field, where address resources are limited, they are generally identified by dynamically assigned addresses. The host then listens for commands: once every black box has completed one round of address listening, one complete data collection has been achieved, and after the confirmation command is received, the second round of listening for power equipment operation data begins.
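The round-based polling described above can be sketched as a small in-memory simulation. This is not the tool's actual serial protocol: the `BlackBox` class, the batch-wise reuse of a small address pool, and the `confirmed` callback standing in for the confirmation command are all hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class BlackBox:
    """Hypothetical model of a field-side power black box."""
    serial: str
    pending: List[float] = field(default_factory=list)  # readings not yet uploaded
    address: Optional[int] = None  # short bus address, assigned dynamically


def collect_round(boxes: List[BlackBox], address_pool: int) -> Dict[str, Tuple[int, List[float]]]:
    """One complete collection: every box is polled once, reusing a small address pool."""
    collected: Dict[str, Tuple[int, List[float]]] = {}
    for start in range(0, len(boxes), address_pool):
        batch = boxes[start:start + address_pool]
        # Dynamically hand out the limited addresses 1..address_pool to this batch.
        for addr, box in enumerate(batch, start=1):
            box.address = addr
        # The host queries each address in turn (serial link simulated in memory).
        for box in batch:
            collected[box.serial] = (box.address, box.pending)
            box.pending = []    # buffer cleared once the readings are uploaded
            box.address = None  # address released for the next batch
    return collected


def run_collection(boxes: List[BlackBox], address_pool: int, rounds: int,
                   confirmed: Callable[[int], bool] = lambda _n: True) -> List[dict]:
    """Repeat collection rounds; a new round starts only after confirmation."""
    history = []
    for n in range(rounds):
        history.append(collect_round(boxes, address_pool))
        if not confirmed(n):  # stand-in for the host's confirmation command
            break
    return history
```

With five boxes and an address pool of two, the boxes are polled in three batches, so no address larger than 2 is ever in use.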
Data cleaning: power equipment operation data are multi-faceted; equipment detection data, current, voltage, temperature, and other quantities are all recorded. In the early stage, the system can clean the previously collected data so that only useful data are kept for the next step. The specific data-cleaning workflow is shown in Figure 1.

Specific workflow for data cleaning
1) Locate missing values: missing values are the null or empty values that appear when data are retrieved.
2) Extract the replacement value: this approach uses the
3) Replace missing values: each missing value is replaced with the replacement value extracted in the previous step.
4) Locate outliers: because all of the original data are assumed by default to follow a normal distribution, the arithmetic mean of the entire data set is calculated first:
$$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i$$
where $x_i$ is the $i$-th observation and $n$ is the number of observations.
5) Repeat steps 2) and 3) to extract the corresponding replacement values and use them to replace the outliers.
6) Secondary detection: repeated replacement of missing values and outliers perturbs the data set and changes its distribution, so outliers must be identified and replaced once more to ensure that the procedure works correctly.
7) After processing, the remaining useful data are stored in the database. The new data do not overwrite the original data, so old and new data can be compared and subsequent data analysis can proceed normally.
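The cleaning steps can be sketched in Python as below. Because step 2 is incomplete in the text and the outlier threshold is not stated, this sketch assumes that replacement values are the mean of the remaining valid values and that outliers are points more than three standard deviations from the mean (the 3σ rule); `clean_series`, `k`, and `max_passes` are hypothetical names.

```python
import math
import statistics
from typing import List, Optional


def is_missing(v: Optional[float]) -> bool:
    """Step 1: a value is missing if it is None or NaN."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def clean_series(raw: List[Optional[float]], k: float = 3.0, max_passes: int = 5) -> List[float]:
    """Sketch of steps 1-7: returns a cleaned copy and leaves `raw` unchanged."""
    # Steps 2-3 (assumed rule): replace missing values with the mean of the valid ones.
    valid = [v for v in raw if not is_missing(v)]
    fill = statistics.fmean(valid)
    data = [fill if is_missing(v) else v for v in raw]  # step 7: new list, original kept

    # Steps 4-6: flag values outside mean +/- k*std, replace them, then re-check.
    for _ in range(max_passes):
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        outliers = {i for i, v in enumerate(data) if abs(v - mu) > k * sigma}
        if not outliers:
            break  # secondary detection found nothing new
        replacement = statistics.fmean(v for i, v in enumerate(data) if i not in outliers)
        for i in outliers:
            data[i] = replacement
    return data
```

The loop illustrates why secondary detection matters: a value filled in during the first pass can itself become an outlier once the extreme point has been replaced and the standard deviation shrinks.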
Data normalization [26] uses one of three methods: standard (z-score) transformation, range normalization, and square-root normalization. They are implemented as follows:
Standard transformation: a vector of dimension
After the standard transformation:
Range normalization: the observation matrix expressed in equation (1)
Square-root normalization: the observations in the matrix of equation (1)
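For reference, the three normalization methods can be sketched with their common textbook definitions, applied column by column to an observation matrix. The exact formulas used in the paper are not recoverable here, so the z-score, min-max, and Euclidean-norm forms below are assumptions; a constant column would cause a division by zero.

```python
import math
import statistics
from typing import Callable, List

Matrix = List[List[float]]  # n observations (rows) x p indicators (columns)


def _by_column(X: Matrix, f: Callable[[List[float]], List[float]]) -> Matrix:
    """Apply f to every column of X and return the transformed matrix."""
    cols = [f(list(col)) for col in zip(*X)]
    return [list(row) for row in zip(*cols)]


def standardize(X: Matrix) -> Matrix:
    """Standard (z-score) transformation: (x - column mean) / column sample std."""
    def f(c):
        mu, s = statistics.fmean(c), statistics.stdev(c)
        return [(v - mu) / s for v in c]
    return _by_column(X, f)


def range_normalize(X: Matrix) -> Matrix:
    """Range (min-max) normalization of each column to [0, 1]."""
    def f(c):
        lo, hi = min(c), max(c)
        return [(v - lo) / (hi - lo) for v in c]
    return _by_column(X, f)


def sqrt_normalize(X: Matrix) -> Matrix:
    """Square-root normalization: divide by the column's Euclidean norm."""
    def f(c):
        norm = math.sqrt(sum(v * v for v in c))
        return [v / norm for v in c]
    return _by_column(X, f)
```

For example, the column `[3, 4, 0]` becomes `[0.75, 1.0, 0.0]` under range normalization and `[0.6, 0.8, 0.0]` under square-root normalization.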
During condition monitoring of power equipment, a huge volume of historical monitoring data accumulates over time, and these data must be analyzed quickly and effectively for condition assessment. Users can import the data into the database as needed, and the system then computes and analyzes them.
System algorithm implementation process
1) Data preparation: prepare the power equipment operating-status signal data, which are stored in an HBase table, and a small amount of sample data of known categories, collected in a laboratory environment and stored locally.
2) Signal feature extraction: extract features from the operating-status signal data and store the results in a SequenceFile.
3) Cluster centers: extract features from the small set of samples of known categories, and then compute the center of each category:
$$c_k=\frac{1}{n_k}\sum_{x\in S_k}x$$
where $S_k$ is the set of feature vectors of known category $k$ and $n_k$ is the number of vectors in $S_k$.
4) Clustering: specify the paths of the extracted state-signal features and of the cluster centers, and run the clustering; the results are written to HDFS, again stored as a SequenceFile.
5) Apply the KMeans output model to power equipment operating-state evaluation: retrain the model with the latest operating data so that it reflects the current state of the equipment accurately and promptly.
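Steps 3 to 5 can be illustrated with a small in-memory k-means sketch in which the initial cluster centers are the centroids of the known-category samples. This seeding is an assumption about how the centers are used; HBase, HDFS, and SequenceFile storage are replaced by plain Python lists, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def centroid(points: Sequence[Sequence[float]]) -> Vector:
    """Mean feature vector of a group of samples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def nearest(x: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the closest center under Euclidean distance."""
    return min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))


def kmeans(data: Sequence[Sequence[float]], init_centers: Sequence[Sequence[float]],
           max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means starting from the given centers."""
    centers = [list(c) for c in init_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [nearest(x, centers) for x in data]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for k in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:  # keep the previous center if a cluster becomes empty
                centers[k] = centroid(members)
    return labels, centers
```

In use, the seeds would be `[centroid(samples) for samples in known_categories]`, and retraining in step 5 amounts to calling `kmeans` again on the latest feature vectors.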
Cluster analysis [27] is an important data mining tool. It can be used for data analysis in its own right or as a preprocessing step for other algorithms, improving the accuracy of data processing.
The clustering process generally includes the following steps:
1) Data preparation: mainly feature normalization and dimensionality reduction.
2) Feature selection and representation: select the most effective features and store them in a vector.
3) Feature extraction: transform the selected features into new, more salient features.
4) Clustering: choose a distance function suited to the data type as the similarity measure, and then cluster or group the data.
5) Evaluation of clustering results: there are three main approaches, namely relative validity evaluation, internal validity evaluation, and external validity evaluation.
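As one example of an internal validity index for step 5, the sketch below computes the mean silhouette coefficient with Euclidean distance. The paper does not name a specific index, so choosing the silhouette coefficient is an assumption made for illustration.

```python
import math
from typing import List, Sequence


def silhouette(data: Sequence[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient; requires at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, x in enumerate(data):
        own = [math.dist(x, y) for j, y in enumerate(data) if labels[j] == labels[i] and j != i]
        if not own:  # by convention a singleton cluster scores 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(x, y) for j, y in enumerate(data) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters; negative values indicate that many points sit closer to another cluster than to their own.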
Density-based power anomaly detection [28], like distance-based approaches, computes for each data point a numerical value indicating its degree of outlierness. For a given data set, a point is considered normal if the points in its local neighborhood are dense, whereas an outlier is a point far from the nearest neighbors of normal points, usually with a threshold defining the distance. Among density-based anomaly detection methods, the most typical is the local outlier factor (LOF) [29].
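Five concepts underlie the LOF algorithm:
1) $k$-distance: for a data point $p$, the $k$-distance $d_k(p)$ is the distance between $p$ and its $k$-th nearest neighbor.
2) $k$-distance neighborhood: $N_k(p)$ is the set of all points whose distance from $p$ does not exceed $d_k(p)$.
3) Reachability distance:
Let $o$ be a data point and $d(p,o)$ the distance between $p$ and $o$.
The reachability distance of data point $p$ with respect to $o$ is
$$\mathrm{reach\text{-}dist}_k(p,o)=\max\{d_k(o),\,d(p,o)\}$$
4) Local reachability density: a measure of the local density of $p$, defined as the inverse of the mean reachability distance from $p$ to the points in its neighborhood:
$$\mathrm{lrd}_k(p)=\left(\frac{1}{|N_k(p)|}\sum_{o\in N_k(p)}\mathrm{reach\text{-}dist}_k(p,o)\right)^{-1}$$
5) Local outlier factor (LOF): the LOF characterizes a point's degree of outlierness, that is, how likely the point is to be an outlier. It is defined as
$$\mathrm{LOF}_k(p)=\frac{1}{|N_k(p)|}\sum_{o\in N_k(p)}\frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}$$
The LOF is a density contrast between the data point $p$ and its neighbors: a value close to 1 means $p$ is about as dense as its neighborhood, while a value well above 1 means $p$ lies in a sparser region than its neighbors and is likely an outlier.
Because it is defined through local density, LOF detects local outliers with good accuracy. However, its time complexity is $O(n^2)$ for $n$ data points with a naive nearest-neighbor search, because the $k$ nearest neighbors of every point must be found.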
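A direct implementation of the five LOF concepts is sketched below. It uses Euclidean distance and a brute-force distance matrix, so it has $O(n^2)$ time complexity; `lof_scores` is a hypothetical name, and the sketch assumes $k<n$ and that no point has more than $k$ exact duplicates (otherwise a reachability sum can be zero).

```python
import math
from typing import List, Sequence


def lof_scores(data: Sequence[Sequence[float]], k: int) -> List[float]:
    """Local outlier factor of every point (Euclidean distance, brute force)."""
    n = len(data)
    dist = [[math.dist(p, q) for q in data] for p in data]  # O(n^2) distance matrix

    # 1)-2) k-distance and k-distance neighborhood (ties at the k-distance are kept)
    k_dist, neigh = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neigh.append([j for d, j in others if d <= kd])

    # 3)-4) reachability distance and local reachability density
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    # 5) LOF: mean ratio of the neighbors' density to the point's own density
    return [sum(lrd[o] for o in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]
```

On a unit square of four points plus a distant point at (10, 10) with `k=2`, the square's corners score exactly 1.0 and the distant point scores well above 10.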
iForest (isolation forest) [30] is a fast, ensemble-based anomaly detection method. Because it does not rely on distance or density measures, it avoids a large amount of computation. iForest has linear time complexity and low memory requirements while still achieving high accuracy.
An iForest consists of many binary trees called isolation trees (iTrees). Unlike decision trees, iTrees select attributes and split values at random to partition the data into subspaces along their branches.
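Definition (isolation tree): let $T$ be a node of an isolation tree. $T$ is either an external node with no children or an internal node with one test and exactly two child nodes $T_l$ and $T_r$. A test consists of an attribute $q$ and a split value $p$; the test $q<p$ sends each data point to $T_l$ or $T_r$.
Given a data set $X=\{x_1,\dots,x_n\}$ of $n$ samples, an iTree is built by recursively dividing $X$ on a randomly selected attribute $q$ and a random split value $p$ until the tree reaches its height limit, $|X|=1$, or all data in $X$ have the same values.
Anomaly detection requires an anomaly score that reflects the degree of abnormality. iForest computes this score from the path length and ranks the data points by it; anomalies are the top-ranked points. The path length and anomaly score are defined as follows:
Path length: the path length $h(x)$ of a point $x$ is the number of edges $x$ traverses from the root of an iTree until it terminates at an external node.
Anomaly score: because an iTree has the same structure as a binary search tree, the average path length for a sample of size $n$ can be estimated by the average length of an unsuccessful search in a binary search tree:
$$c(n)=2H(n-1)-\frac{2(n-1)}{n},\qquad H(i)\approx\ln i+0.5772156649$$
The anomaly score of sample $x$ is then
$$s(x,n)=2^{-\frac{E(h(x))}{c(n)}}$$
where $E(h(x))$ is the mean of $h(x)$ over all iTrees. Scores close to 1 indicate anomalies, while scores well below 0.5 indicate normal points.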
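The isolation forest can be sketched in plain Python as below. Each iTree is grown on a random subsample of size $\psi$ with random attributes and split values up to a height limit of $\lceil\log_2\psi\rceil$, and path lengths are normalized by $c(n)=2H(n-1)-2(n-1)/n$, the average unsuccessful-search path length of a binary search tree. The defaults of 100 trees and subsamples of 256 follow values commonly used for iForest and are assumptions here, as are the function names.

```python
import math
import random
from typing import List, Sequence

EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Average path length of an unsuccessful search in a BST built on n points."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_itree(X: List[Sequence[float]], height: int, limit: int, rng: random.Random) -> dict:
    """Recursively split X on a random attribute q at a random value p."""
    if height >= limit or len(X) <= 1:
        return {"size": len(X)}  # external node
    varying = [q for q in range(len(X[0])) if min(x[q] for x in X) < max(x[q] for x in X)]
    if not varying:  # all points identical: cannot split further
        return {"size": len(X)}
    q = rng.choice(varying)
    p = rng.uniform(min(x[q] for x in X), max(x[q] for x in X))
    return {"q": q, "p": p,
            "left": build_itree([x for x in X if x[q] < p], height + 1, limit, rng),
            "right": build_itree([x for x in X if x[q] >= p], height + 1, limit, rng)}


def path_length(x: Sequence[float], node: dict, depth: int = 0) -> float:
    """h(x): edges from the root to the external node, plus c(size) for unsplit leaves."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["q"]] < node["p"] else node["right"]
    return path_length(x, child, depth + 1)


def iforest_scores(X: List[Sequence[float]], n_trees: int = 100,
                   sample_size: int = 256, seed: int = 0) -> List[float]:
    """Anomaly score 2 ** (-E[h(x)] / c(psi)) for every point in X (needs len(X) >= 2)."""
    rng = random.Random(seed)
    psi = min(sample_size, len(X))
    limit = math.ceil(math.log2(psi))
    trees = [build_itree(rng.sample(X, psi), 0, limit, rng) for _ in range(n_trees)]
    return [2 ** (-sum(path_length(x, t) for t in trees) / n_trees / c(psi)) for x in X]
```

A point far from a dense cluster is usually isolated by the first random split, so its mean path length is short and its score approaches 1.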
The load forecasting process based on the improved Transformer model is shown in Figure 2. The power load data collected at substations are preprocessed, normalized, and sampled with a sliding window to form samples and labels, which are fed into the improved Transformer model [31] for tuning, training, forecasting, and model evaluation.

Load forecasting flowchart based on the improved Transformer model
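The normalization and sliding-window sampling step can be sketched as follows: the load series is min-max scaled, and each fixed-length window of past values becomes one sample whose label is the next value or values. The window length and horizon are hypothetical parameters; because every window has the same length, no padding is needed.

```python
from typing import List, Sequence, Tuple


def minmax_scale(series: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scale a load series to [0, 1]; also return (lo, hi) to undo the scaling later."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series], lo, hi


def inverse_scale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map scaled predictions back to the original units (e.g. MW)."""
    return [lo + v * (hi - lo) for v in values]


def sliding_windows(series: Sequence[float], window: int, horizon: int = 1
                    ) -> Tuple[List[List[float]], List[List[float]]]:
    """Each sample holds `window` consecutive values; its label is the next `horizon` values."""
    samples, labels = [], []
    for start in range(len(series) - window - horizon + 1):
        samples.append(list(series[start:start + window]))
        labels.append(list(series[start + window:start + window + horizon]))
    return samples, labels
```

For example, `sliding_windows([0, 1, 2, 3, 4], window=3)` yields the samples `[[0, 1, 2], [1, 2, 3]]` with labels `[[3], [4]]`.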
1) Overall design: as shown in Figure 2, the encoder of the improved Transformer model consists of a CNN feature extractor, a positional information generator, a mask matrix, and a multi-layer multi-head attention unit. The design of each structure is introduced below.
2) CNN feature extractor: applying the Transformer model to power data has the following limitations. a. In natural language, the positions of word vectors in the semantic space carry semantic information, but power load data contain no semantics, and position encoding based on sine and cosine functions has no interpretation in such a space. b. The native model discards recurrent and convolutional structures when extracting sequence features, which inevitably fragments information and weakens the model's ability to capture local information and long-distance dependencies.
To address these problems, a feature extractor based on a convolutional neural network is introduced to perform the word-embedding step and improve the model's ability to fit local dependencies. Convolution kernels with 3 rows and 1 column are used with 1 unit of edge padding; the number of kernels (the dimension of the word vectors) is an adjustable hyperparameter. For an
The convolutional structure has the following benefits: a. Multi-dimensional convolution kernels with shared weights not only capture the simple “near-big-far-small” characteristic of power loads but also capture common patterns among neighboring positions. b. The convolutional structure recognizes different patterns in the local data across multiple channels and outputs them as multiple feature maps, so it is strong at extracting local information.
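The convolutional embedding can be sketched without a deep-learning framework: each of $d$ size-3 kernels slides over the load series with one unit of zero padding on each side, so a length-$L$ series becomes $L$ vectors of dimension $d$. In a real model the kernel weights and biases are learned; here they are hand-set for illustration, and `conv_embed` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def conv_embed(series: Sequence[float], kernels: Sequence[Sequence[float]],
               bias: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Turn a length-L series into L vectors of size d = len(kernels).

    Each kernel has 3 taps (3 rows x 1 column); one unit of zero padding on both
    ends keeps the output length equal to L.
    """
    if any(len(k) != 3 for k in kernels):
        raise ValueError("each kernel must have exactly 3 weights")
    bias = [0.0] * len(kernels) if bias is None else list(bias)
    padded = [0.0, *series, 0.0]
    embedded = []
    for t in range(len(series)):
        window = padded[t:t + 3]  # x[t-1], x[t], x[t+1]
        embedded.append([sum(w * x for w, x in zip(k, window)) + b
                         for k, b in zip(kernels, bias)])
    return embedded
```

With kernels `[0, 1, 0]` and `[1, 1, 1]`, the series `[1, 2, 3]` maps to `[[1, 3], [2, 6], [3, 5]]`. With this centred padding, the vector for step t also draws on x[t+1].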
3) Discarding the padding mask: the native Transformer uses a padding mask to handle input sequences of different lengths. Because this study fixes the window length during data segmentation and labeling, all inputs have the same length and the padding mask can be discarded.
4) Improved sequence mask: the improved algorithm moves the sequence (look-ahead) mask from the decoder to the encoder. With this design, the model automatically masks information after the current time step during encoding, which brings the input space closer to the real application scenario.
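The mask matrix is:
$$M_{ij}=\begin{cases}0, & j\le i\\ -\infty, & j>i\end{cases}$$
so that, after it is added to the attention scores, position $i$ can attend only to positions $j\le i$.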
5) Position encoding
Position encoding gives the input sequence positional information so that the model can automatically capture position-related local dependencies. The position encoding is computed as:
6) Multi-head multi-layer self-attention unit
(1) Working mechanism of the attention unit: the data are first input to the multi-head self-attention unit; its output undergoes a residual connection and layer normalization; the result is fed into a multi-layer perceptron, whose output again undergoes a residual connection and layer normalization to give the output of one self-attention layer. After the input data pass through all layers of this structure, the final result is output.
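(2) Calculation of the self-attention score:
Input: data $X\in\mathbb{R}^{n\times d}$, where $n$ is the sequence length and $d$ is the embedding dimension.
Compute the query matrix $Q=XW^{Q}$, the key matrix $K=XW^{K}$, and the value matrix $V=XW^{V}$, where $W^{Q}$, $W^{K}$, and $W^{V}$ are learnable weight matrices.
Output: $\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$, where $d_k$ is the dimension of the key vectors.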
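Scaled dot-product self-attention with an encoder-side sequence mask can be sketched in plain Python as below. It shows a single head; a multi-head unit runs several such heads with separate weight matrices and concatenates their outputs. The additive mask (0 on and below the diagonal, $-\infty$ above it) is one common way to hide future time steps and is an assumption here, as are the function names.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def causal_mask(n: int) -> Matrix:
    """0 where j <= i (visible), -inf where j > i (future time step)."""
    return [[0.0 if j <= i else -math.inf for j in range(n)] for i in range(n)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]  # exp(-inf) == 0.0 removes masked entries
    s = sum(e)
    return [v / s for v in e]


def self_attention(X: Matrix, Wq: Matrix, Wk: Matrix, Wv: Matrix,
                   masked: bool = True) -> Tuple[Matrix, Matrix]:
    """softmax(Q K^T / sqrt(d_k) + M) V for one head; returns (output, weights)."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, transpose(K))]
    if masked:
        scores = [[s + m for s, m in zip(r, mr)] for r, mr in zip(scores, causal_mask(len(X)))]
    weights = [softmax(r) for r in scores]
    return matmul(weights, V), weights
```

Because the first time step can attend only to itself, its attention weights are `[1.0, 0.0, 0.0]` for a three-step input, and every weight above the diagonal is zero.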
(3) Benefits of the multi-head multi-layer attention design:
Parallel computation: all self-attention scores are obtained in one step by matrix multiplication.
Complete memory: attention weights are computed between every pair of time steps, so the model fully learns both long-distance dependencies and the local dependencies characterized mainly by “near-big-far-small”.
Shorter signal paths: compared with RNNs and CNNs, self-attention has the shortest paths between units, so more useful gradient information is retained, which mitigates vanishing and exploding gradients to some extent.
Dropout reduces structural risk, and the residual structure prevents accuracy from degrading as layers are added after the network reaches saturation.
To verify the effectiveness of the algorithm, it is applied to another real data set: wind power generation in Province X from 2019 to 2023, sampled monthly, with 220 samples in total and 12 points per annual load curve. The distribution of the 220 sample points is shown in Figure 3. Most of the data are concentrated in one region, apart from one peak that clearly differs from the normal data. Wind power generation is seasonal and strongly affected by geography. Province X has abundant wind energy in winter but also experiences wind curtailment, that is, part of the wind turbines are suspended when wind energy is plentiful because the grid's accommodation capacity is insufficient, the electricity load is low, and wind power output is unstable.

Distribution curve of the 220 sample points
The decision diagram of the traditional fast density peak algorithm is shown in Figure 4. Two sample points in the upper right have both a large relative density and a large distance, and nesting occurs, while the remaining sample points have small values. This indirectly indicates that the traditional algorithm has limitations when handling power data, a data type with large variations in local density.

Decision diagram of the traditional fast density peak algorithm
The decision diagram of the improved fast density peak clustering algorithm is shown in Figure 5. The nesting in the upper right corner has disappeared, the clustering results are clearly better than before the improvement, and the cluster centers are more distinct.

Decision diagram of the improved fast density peak clustering algorithm
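A decision diagram plots, for each point, its local density $\rho$ against $\delta$, the distance to the nearest point of higher density; cluster centers appear in the upper right. The sketch below computes these quantities for the standard (unimproved) density peak clustering algorithm with a cutoff kernel. The paper's improvement is not described here, and the cutoff distance `d_c`, the tie-breaking by index, and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def decision_graph(data: Sequence[Sequence[float]], d_c: float) -> Tuple[List[int], List[float]]:
    """Return (rho, delta) for each point, as plotted in a decision diagram."""
    n = len(data)
    dist = [[math.dist(p, q) for q in data] for p in data]
    # Local density: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index (stable sort).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # densest point: distance to the farthest point
        else:
            delta[i] = min(dist[i][j] for j in order[:pos])  # nearest denser point
    return rho, delta


def pick_centers(rho: Sequence[int], delta: Sequence[float], n_centers: int) -> List[int]:
    """Centers lie in the upper right of the diagram: largest gamma = rho * delta."""
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(gamma)), key=lambda i: -gamma[i])[:n_centers]
```

For two well-separated groups of three points, one point from each group stands out with a large $\delta$, and these two are returned as centers.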
Because the outliers in this data set are spread over different years, they cannot be shown clearly as curves; their distribution is listed in Table 1. The algorithm detected all 9 outliers in the data set. It focuses on longitudinal comparison of generation data for the same period across different years. In this data set, generation in January and February is very low every year, even less than half of the highest monthly generation, which makes these months anomalous within each year; the algorithm nevertheless does not flag them, because it focuses on local changes in the data. Simulations and analysis on different power data show that the LOF-based method proposed in this paper detects outliers well, which demonstrates its effectiveness.
Anomalous data distribution
Outlier index | Date (year/month) |
---|---|
5 | 2022/05 |
21 | 2019/12 |
63 | 2019/10 |
96 | 2019/07 |
108 | 2019/05 |
126 | 2021/09 |
143 | 2022/12 |
187 | 2023/10 |
202 | 2023/06 |
Comparing different models reveals their relative strengths and weaknesses, so this section compares the proposed model with LSTM and BiLSTM-Attention and verifies its accuracy through the evaluation metrics of each model. The comparison models are as follows:
1) LSTM: the classical recurrent neural network LSTM solves the vanishing and exploding gradient problems that RNNs encounter during training.
2) BiLSTM-Attention: this model is based on BiLSTM, which is highly robust for modeling load sequences, and adds an attention mechanism that highlights the key features that matter for load forecasting.
The predictions of different models on public dataset 1 are shown in Figure 6. The predictions of the improved Transformer proposed in this paper track the real power data more closely than those of the other two models, indicating better prediction performance. The models are then compared quantitatively through their evaluation metrics.

Predictions of different models on public dataset 1
The evaluation metrics of the different models on public dataset 1 are shown in Table 2. The MAPE of the proposed model is 1.03%, a relative reduction of 65.2% and 61.13% compared with LSTM and BiLSTM-Attention, respectively. Its R² reaches 99.84%, very close to 1. All evaluation metrics show that the proposed model achieves superior prediction results.
Evaluation metrics of different models on public dataset 1
Model | MAPE (%) | RMSE (MW) | MAE (MW) | R² |
---|---|---|---|---|
LSTM | 2.96 | 281.0886 | 183.2736 | 0.9542 |
BiLSTM-Attention | 2.65 | 227.4795 | 161.2114 | 0.9701 |
Improved Transformer | 1.03 | 85.4571 | 62.3925 | 0.9984 |
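The four evaluation metrics and the relative MAPE reductions quoted above can be reproduced with their standard definitions, sketched below; the definitions are assumed, since this section does not state them. For example, $(2.96-1.03)/2.96\approx 65.2\%$ and $(2.65-1.03)/2.65\approx 61.13\%$.

```python
import math
from typing import Sequence


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percentage error, in %."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean squared error, in the units of y (MW here)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error, in the units of y (MW here)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative improvement of a lower-is-better metric, in %."""
    return 100.0 * (baseline - proposed) / baseline
```

`relative_reduction(2.96, 1.03)` rounds to 65.2 and `relative_reduction(2.65, 1.03)` rounds to 61.13, matching the figures reported for Table 2.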
The comparison with different models verified that the proposed method is highly accurate. To verify its robustness and generalization ability, another public dataset is used: the power load dataset (public dataset 2) from the 9th Electrician Cup Mathematical Contest in Modeling. The same comparison models as for public dataset 1 are trained on and used to predict the power load data of dataset 2. The predictions of the different models on public dataset 2 are shown in Figure 7. The predictions of the improved Transformer closely overlap with the true values, while the other two comparison models fit slightly less well.

Predictions of different models on public dataset 2
The evaluation metrics of the different models on public dataset 2 are shown in Table 3. On dataset 2, the proposed method achieves a MAPE of 1.40%, an RMSE of 124.5055 MW, an MAE of 84.5468 MW, and an R² of 99.63%. These results show that the model remains highly accurate when the dataset changes.
Evaluation metrics of different models on public dataset 2
Model | MAPE (%) | RMSE (MW) | MAE (MW) | R² |
---|---|---|---|---|
LSTM | 3.67 | 320.4859 | 235.0074 | 0.9397 |
BiLSTM-Attention | 3.02 | 235.8231 | 171.6748 | 0.9762 |
Improved Transformer | 1.40 | 124.5055 | 84.5468 | 0.9963 |
The predictions on the real dataset are shown in Figure 8. Comparing the predictions of the different models with the real load values shows that the proposed model also predicts the real dataset accurately, whereas the other two models fit the real values less closely.

Predictions on the real dataset
The evaluation metrics of the different models on the real dataset are shown in Table 4. The proposed model achieves a MAPE of 4.15%, an RMSE of 496.1061 MW, an MAE of 356.6518 MW, and an R² of 97.71%. All metrics are better than those of the other models, so the model can be applied to practical power load forecasting and help the power system make scheduling plans and decisions.
Results of different models on the real dataset
Model | MAPE (%) | RMSE (MW) | MAE (MW) | R² |
---|---|---|---|---|
LSTM | 5.23 | 683.7404 | 473.8774 | 0.9483 |
BiLSTM-Attention | 4.57 | 596.3083 | 426.4559 | 0.9583 |
Improved Transformer | 4.15 | 496.1061 | 356.6518 | 0.9771 |
In the context of data mining, this paper proposes a method for collecting and detecting data during power monitoring, builds an improved Transformer power load forecasting model on this basis, and verifies its accuracy. The main conclusions are as follows:
1) The clustering results of the improved LOF algorithm are clearly better than those before the improvement, and the cluster centers are more distinct. The algorithm detected all abnormal power values in the dataset, showing that the improved LOF algorithm detects outliers well and demonstrating its effectiveness.
2) Comparison with other models shows that the proposed prediction model has the lowest MAPE, a reduction of 65.2% and 61.13% relative to the other two models, and an R² of 99.84%, close to 1, so the model delivers superior predictions. Across datasets, the proposed method has lower MAPE and MAE than the other models on dataset 2, with an R² of 99.63%, indicating that the model remains highly accurate on different datasets. Validation on the real dataset gives results similar to those on the public datasets, and the proposed model again outperforms the other models. It can therefore be applied to practical power load forecasting to help the power system make scheduling plans and decisions.