Machine Learning Based Outlier Detection Algorithm for Distributed Flexible Sensing Module with Non-stationary Multi-Parametric Data
Published online: 25 Sep 2025
Received: 01 Jan 2025
Accepted: 18 Apr 2025
DOI: https://doi.org/10.2478/amns-2025-1028
© 2025 Suqin Xiong, Yang Li, Qiuyang Li and Zhiru Chen, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
In today’s era of rapid technological development, the demand for sensors keeps increasing. With advances in microelectronics and nanotechnology, flexible sensors have emerged as a relatively new type of sensor. The term usually refers to sensors that integrate flexible materials to measure physical quantities such as pressure, strain, bending deformation, conversion, temperature and humidity [1]. A flexible sensing module is the sensor component that senses and captures environmental data, such as external temperature and humidity, and biological data in real time [2]. In addition to being flexible, the module is wearable and embeddable, so flexible sensors are widely used in healthcare, smart gloves, virtual reality, environmental monitoring, robotics, and automation [3-8]. The data captured by flexible sensing modules usually involve multiple parameters, and these parameters may be disturbed in several ways: a wearable sensor may lose contact with the skin during human movement and interrupt data sensing, or sensor noise and sensor drift may arise in different dynamic environments [9-11]. As a result, the data captured by the flexible sensing module are non-stationary and multi-parametric. Since sensor data must be processed and analyzed after acquisition, such dynamic and complex data undoubtedly make processing and analysis more difficult. In distributed data acquisition, however, acquisition sub-devices are installed on the equipment, so that data are collected at multiple locations and can be centrally managed and processed through the network [12]. This approach not only greatly reduces the non-stationarity caused by missing data, errors, etc., but also reduces equipment wiring and equipment costs [13-14].
In the real world, data anomalies often have a significant impact on analysis and decision making, so outlier detection has become one of the important tasks in data mining and machine learning. Outliers are data objects whose feature values differ significantly from those of most other data points, possibly due to measurement errors, data entry errors, unusual events, etc. [15]. The goal of outlier detection is to identify these outliers in the dataset for further analysis and appropriate measures [16]. In practical applications, an appropriate outlier detection algorithm can be selected according to the specific situation and combined with domain knowledge and experience. However, traditional outlier detection algorithms are not sufficient to cope with non-stationary, multi-parametric data [17-19]. Based on this, a machine learning-based outlier detection algorithm for the non-stationary multi-parametric data of distributed flexible sensing modules is proposed.
Currently, outlier detection algorithms are mainly based on statistical, clustering, density, and deep learning methods. Among statistical methods, Ur Rehman and Belhaouari [20] designed two unsupervised outlier detection methods for multidimensional data that reduce the data to a one-dimensional space and thereby lower the computational complexity. Lu et al. [21] designed an outlier detection algorithm that identifies candidate outliers and transforms the data by linear interpolation, analyzes the correlation between the dimension-reduced data and the identified isolated outliers to determine the true isolated outliers, and then sorts the outliers with the multistage Otsu method. Nikolova et al. [22] proposed a new method for anomaly detection in fuzzy data: the leave-one-out method is applied to remove obvious outliers and obtain a high-quality regression model, and outliers are then detected repeatedly under a multiple testing procedure to avoid false anomalies. Traditional statistical methods are simple and effective but not ideal for complex data; the methods above improve on them by analyzing complex data through dimensionality reduction and multiple testing, which improves accuracy.
Among density-based approaches, the spatial outlier detection algorithm developed by Singh and Lalitha [23] explores anomalies in non-spatial attributes and their neighborhoods in spatial datasets to find locally unstable or extreme observations. Tang and He [24] used a relative density anomaly score as the criterion and estimated the local density distribution from k nearest neighbors, reverse nearest neighbors and shared nearest neighbors, obtaining a new outlier detection method that is simple and effective. Huang et al. [25] proposed an outlier detection algorithm, for the conditions of a fixed local density distribution and a small ratio of the number of k-nearest neighbors to the number of reverse k-nearest neighbors, that detects global anomalies, local anomalies and clusters of outliers by identifying outlier inflection points and treating these inflection points and their sparse neighborhoods as outliers. Tran et al. [26] developed CPOD, an efficient, low-memory outlier detection algorithm for data streams that uses core points and multi-distance indexing to retrieve outliers in the neighborhood of the core points. Such methods are independent of the data distribution, but they rely on density computations and adapt poorly to high-dimensional data.
Among clustering-based approaches, Ray et al. [27] proposed an outlier detection method that integrates the K-means, K-means++ and fuzzy C-means algorithms to make the information more comprehensive, and incorporates probabilistic techniques to assign membership values to hard-clustered data while identifying outliers. Wang et al. [28] improved the OPTICS clustering algorithm to obtain the OD-OPTICS outlier detection method, which uses a radius filtering strategy as a focused detection method, covers the spatial model to remove redundant radii, and calculates the distance between neighboring points. Because it detects within a radius range, this method can identify outlier points simultaneously, but the distribution of the outlier points clearly introduces strong uncertainty into the accuracy and efficiency of detection. In addition, clustering methods make it difficult to select the best K value, which in turn makes outliers difficult to identify.
Among deep learning approaches, Hassan et al. [29] used a deep neural network model to detect outliers in large and complex datasets; by learning data features and injecting new links into the model to improve feature abstraction and model performance, the resulting outlier detection method reaches 99% accuracy. Abhaya and Patra [30] identified anomalies with a self-organizing map clustering method and used the reconstruction error of an autoencoder on normal points to determine outliers. In a basic autoencoder, however, the reconstruction error is usually computed for both normal and outlier data, which makes it difficult to guarantee recognition accuracy. Deep learning is well known to capture the intrinsic regularities of data, but outlier detection with such methods depends heavily on the data sample size.
In this paper, a two-stage algorithm combining improved DBSCAN clustering and the local outlier factor is proposed and applied to outlier detection in the non-stationary multi-parametric data of distributed flexible sensing modules. First, a distributed flexible sensing terminal application system is designed based on the AMI architecture, and its functional modules are described in detail. The PCA algorithm and the maximum likelihood method are then used for dimensionality reduction and screening of the original data, after which the improved DBSCAN clustering algorithm is introduced to cluster the features and the improved local outlier factor algorithm is introduced to construct a two-stage multi-parametric data outlier detection algorithm. The effectiveness of these algorithms is verified and analyzed by simulation, and the error accuracy of the distributed flexible sensing terminal application system is discussed.
Power enterprises should do a good job in power generation, transmission, transformation and distribution in order to make the whole power grid more sound, and the electric energy measurement data has spatial distribution characteristics. With the rapid development of the Internet of Things and information technology, the power system has begun to use abnormal data mining methods to build a complete metering platform to model the power grid, and combined with the topology of the power grid to visually analyze the measurement data of electric energy, which lays a solid foundation for future electric energy planning. In the practical application of the power consumption information collection system, it can meet the special needs of customers to collect information, calculate, and analyze the abnormal data in the measurement according to the early warning system.
Advanced Metering Infrastructure (AMI), an important technical support for smart grid power metering, is a complete network system used to collect, store, analyze and apply information on users’ electricity consumption. It consists of home EMS, power meters, a local communication network, a remote communication network, a metering data management system and a data integration platform [31]. Figure 1 shows the schematic structure of AMI, a complete metering system composed of several key technologies and applications that can record the user’s electricity consumption behavior in near real time and transmit the collected data to the metering data management system (MDMS) through the network.

Figure 1. Diagram of AMI structure
The MDMS is a database with analytical tools capable of processing and storing metered values from energy meters. One of its basic functions is to validate, edit and evaluate AMI data to ensure the completeness and accuracy of the messages. By aggregating, analyzing and storing messages, a reasonable tariff strategy can be formulated, and the grid can still forecast demand accurately when the communication network is interrupted or the user side fails, without affecting the stability of the entire system. An AMI system generally consists of energy meters, collectors, concentrators, a metering data management system and a communication network; the communication network can be divided into an uplink network and a downlink network. The downlink network is mainly the network through which the metering data management system of the master station sends command information down to users over the electric power private network or a public network. The uplink network is the network through which the smart meter transmits power consumption information to the concentrator over media such as wireless or power line. The smart meter is the metering terminal of AMI on the user’s side; it is a programmable energy meter that provides fee control in addition to electricity recording. The smart meter also has a built-in communication module that can communicate with the home network and pass the collected data to the master system for aggregation. During demand response, and with the user’s permission, the smart meter can forward the power company’s control commands to the user’s equipment.
In terms of communication mode, the uplink communication of the flexible sensing terminal device supports wired (Ethernet, 485, HPLC, etc.) and wireless (infrared, Bluetooth, etc.) modes. Communication between the energy controller and the cloud master station prioritizes the wireless public network, and 5G or a wireless private network can be adopted as the network develops. Downlink communication supports HPLC, Bluetooth and other modes. HPLC is the main communication mode between gateways, with a frequency band of 2-12 MHz, a rate of 1 Mbps or more, and an average online rate of 99.95% or more, which meets the needs of massive low-voltage equipment access and real-time data interaction; different communication modules can be selected for different environments. The distributed flexible sensing terminal application system architecture mainly connects the distribution side, branch side and meter box side to collect the metering data of the power meters in real time and to support effective analysis of those data.
AMI-based online monitoring of power metering devices mainly manages the operating status of those devices: through information collection, data analysis and processing, and logical judgment, it monitors the basic operating parameters of the power metering device online. The specific executive functions comprise an online power collection, monitoring and power error testing unit, a current CT switching unit, a voltage PT switching unit, and a power meter pulse acquisition and switching unit. The power collection, monitoring and error testing unit mainly performs online monitoring of the power meter error in the current circuit, the secondary load of the transformer, and the PT secondary voltage drop. The current CT switching unit is responsible for the secondary switching of the current loop, the voltage PT switching unit for the switching of the multi-section PT bus, and the power meter pulse acquisition and switching unit for the switching of the power pulse input of the meter under test [32].
The basic composition of AMI-based online monitoring of power metering device mainly includes device abnormality knowledge base, cloud platform storage unit, a new generation of Internet application technology, interface module and background management, etc., which is capable of accomplishing data interaction between multiple subsystems of power metering device. Online monitoring is based on the power collection terminal, and the main function of the collection terminal is to collect, read and store the data signals of the monitoring points of each power measuring device. After obtaining various types of data through the collection terminal, the collected data are analyzed online, which can provide an effective and reliable basis for the fault diagnosis process.
By designing a modular measurement equipment structure, the traditional decentralized structure is replaced with a “building block” modular structure. In addition to ordinary measurement and sensing functions, an “expansion module” function is introduced: according to actual demand, a special-purpose expansion module can be attached through the “hot-swappable interface” to extend the functionality in a targeted way. To ensure a reliable connection between modules, the module connectors use a single row of pin-type interfaces with gold-plated alloy plugs, and fixing buckles are added at the upper and lower ends of the module so that spliced modules operate stably and resist vibration and shock. The expansion module design is shown in Figure 2.

Figure 2. Expansion module design drawing
At present, according to the application requirements of different work scenarios, expansion modules can be divided into the intelligent protection module, the displacement control module, the environmental awareness module and so on. For the protection unit, the expansion module can be fitted with an opening and closing control device built into the door lock and with a camera device: the module has one built-in magnetic sensing channel connected to the door lock, together with the built-in camera device. When an illegal intrusion occurs, the opening and closing control device in the door lock reports it through the magnetic sensing signal and triggers the two-way camera to take continuous pictures, which are stored locally and uploaded to the master station to trigger an early warning. At the same time, the built-in strong magnetic monitoring module can monitor the magnetic intensity around the box in real time and report to the master station in time when it exceeds the threshold, triggering an early warning.
The intelligent terminal reads data such as the energy meter clock, battery undervoltage status, programming and cover-opening events, and the correspondence between peak/valley energy and total energy, in order to detect the initial operating state and analyze whether there is any problem with the energy meter system. It then compares the voltage curve of each same-phase energy meter with its own voltage value for that phase (the actual distance is generally less than 1 meter, so the voltage difference inside the meter box should not be large), using the two sets of voltage data at the highest and lowest voltage during the testing time. The voltage error of each power meter is analyzed, results exceeding the set threshold are saved, and the terminal determines whether the meter has a voltage sampling error. The intelligent terminal also collects the current of its own test phase, calculates the difference from the current of the power meter on the same phase, and, if the error is larger than the set threshold, analyzes the error rate of each power meter at the moment a characteristic current (a sudden increase or decrease of current) is triggered. At the same time, the neutral and live currents of the power meter are used to assist the analysis (the live-wire current is measured through a manganese-copper shunt resistor and the neutral-wire current through a transformer, so the two measurements generally do not show the same error), until the specific abnormality of the power meter is confirmed and recorded. This step can identify a power meter with a large current sampling error.
Calibration meter verification.
A simulated metering program runs on the CPU of the intelligent terminal. After the energy meter to be measured is specified, the terminal first verifies whether the present current of the energy meter meets the calibration condition (more than 2 A) and whether there has been any collection task in the last 5 minutes; once the conditions permit, it collects the active energy value of the meter under calibration at high frequency. When the last decimal digit of the energy meter reading is updated, the current is read through the 485 interface at a frequency of 2 times per second and supplied to the simulated metering program, while the voltage sampling value of the same phase from the terminal’s own metering chip is continuously fed to the program through the built-in communication interface. After running for a period of time (chosen according to the current specification and the purpose of calibration; the longer the time, the higher the accuracy), the energy computed by the simulated program is compared with the energy indicated by the meter. This yields the metering error of the meter, so this step can directly determine the specific value of the metering error.
In the distributed flexible sensing terminal application system, data collection in the non-stationary environment is realized in combination with the online monitoring module for electric power metering. At present, the industry uses traditional sensor-type temperature and humidity acquisition devices for temperature and humidity measurement and calibration. Such devices can only record data; they cannot perform statistical analysis or other functions, and they scale poorly. They also require wired cables and probes to be laid out at each monitored location, which is inconvenient and does not meet calibration requirements. Based on this, this paper proposes an approach that supports outlier-based data detection in non-stationary environments, aiming to reduce the error of power metering data.
The multi-parameter electric energy metering data in the non-stationary environment, including the related temperature and humidity data, are obtained through the protection unit of the distributed flexible sensing terminal application system, but the high dimensionality of the collected data is not conducive to the outlier detection described later. To represent most of the information in the original variables with a smaller number of variables, this paper introduces principal component analysis (PCA) to reduce the dimensionality of the original multi-parameter electric energy data. PCA is a multivariate statistical method that, through dimensionality reduction, expresses the original variables with a few principal components, each a linear combination of the original variables [33].
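The comparison in this calibration step can be sketched as follows. This is a minimal illustration and not the terminal's firmware: the function names are hypothetical, active power is approximated as voltage times current (a unity power factor assumption the text does not state), and the simulated energy is taken as the reference for the relative error.

```python
MIN_CALIBRATION_CURRENT_A = 2.0  # calibration condition quoted in the text


def simulated_energy_kwh(voltages, currents, sample_hz):
    """Integrate sampled power into energy with the rectangle rule."""
    if len(voltages) != len(currents):
        raise ValueError("voltage and current series must have equal length")
    dt_hours = 1.0 / (sample_hz * 3600.0)
    # Assumption: unity power factor, so instantaneous power is V * I in watts.
    watt_hours = sum(v * i for v, i in zip(voltages, currents)) * dt_hours
    return watt_hours / 1000.0


def relative_metering_error(voltages, currents, sample_hz, metered_kwh):
    """Relative error of the meter reading against the simulated energy."""
    if min(currents) <= MIN_CALIBRATION_CURRENT_A:
        raise ValueError("current too low for calibration")
    simulated = simulated_energy_kwh(voltages, currents, sample_hz)
    return (metered_kwh - simulated) / simulated


# One hour sampled twice per second at a constant 220 V and 5 A.
n_samples = 2 * 3600
error = relative_metering_error(
    [220.0] * n_samples, [5.0] * n_samples, 2.0, metered_kwh=1.122
)
```

With these synthetic inputs the simulated energy is 1.1 kWh, so a meter that advanced by 1.122 kWh over the same hour would show a relative error of about 2%.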
Let
Let
where
The contribution rate indicates the ability of the principal components to synthesize the original variable, which is defined
The principal components in PCA are linear combinations of the variables of the data to be processed, chosen so that the variance of the transformed variables is maximized.
After dimensionality reduction, the maximum likelihood method is used to calculate how well each data item separates the various types of anomalies in the multi-parameter power metering data collected in the non-stationary environment. The higher the calculated separability, the more clearly the data item discriminates the anomalies, so data items with higher separability are selected as the feature data items of the various anomalies. The calculation process of feature data screening based on the maximum likelihood method is as follows:
Data preparation. The smart energy meter metering data are preliminarily screened by calculating the Pearson correlation coefficient between each feature and the target anomaly; features whose correlation coefficients are close to 1 or -1, i.e. those more strongly correlated with the anomaly, are kept as candidate features.
Calculate the likelihood value. For each candidate feature, calculate its maximum likelihood value in the normal sample data and in the abnormal sample data. The maximum likelihood value indicates the ratio of the probability density of the feature data in the normal sample data to its probability density in the abnormal sample data. The formula is as follows:
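As an illustration of how the contribution rate is typically used, the sketch below applies the standard textbook definition (each eigenvalue of the covariance matrix divided by the sum of all eigenvalues) and picks the number of components by cumulative contribution. The function names and the 0.85 target are assumptions for the example, not values taken from this paper.

```python
def contribution_rates(eigenvalues):
    """Share of total variance explained by each principal component."""
    total = sum(eigenvalues)
    return [lam / total for lam in eigenvalues]


def components_needed(eigenvalues, target=0.85):
    """Smallest number of leading components whose cumulative rate reaches target."""
    ordered = sorted(eigenvalues, reverse=True)
    cumulative = 0.0
    for count, rate in enumerate(contribution_rates(ordered), start=1):
        cumulative += rate
        if cumulative >= target:
            return count
    return len(ordered)


# Hypothetical eigenvalues of a 4-variable covariance matrix.
eigs = [4.0, 3.0, 2.0, 1.0]
rates = contribution_rates(eigs)
k = components_needed(eigs, target=0.85)
```

Here the first two components explain 70% of the variance and the first three about 90%, so three components are kept for an 85% target.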
Where
Calculate separability. Calculate the separability of the various types of abnormalities for each candidate feature according to the maximum likelihood value. The higher the separability, the more obvious the classification effect of the feature data on the anomalies. The formula is as follows:
Where separability denotes separability,
The formula for
where
Screening feature data. According to the separability values, the feature data items with higher separability are screened out and used as the basis for building the feature model. Specifically, a threshold is set, and a feature whose separability is higher than this threshold is considered valid. The screened valid feature data are then used as the input of the precision research function.
The main task of clustering electric energy metering data points is to aggregate metering data objects with similar characteristics into distinct clusters. In this paper, the fitting method based on the clustering results of electric energy metering data points cannot predict the final clustering result of the dataset before clustering starts: the number of clusters in the result is unknown, so algorithms that require the number of clusters as an input are not applicable. Among machine learning methods, the density-based DBSCAN clustering algorithm satisfies these constraints, so the clustering method for electricity metering data points in this paper is an improvement built on it [34].
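The data preparation step of this procedure can be sketched as below. Only the Pearson pre-screening is shown; the likelihood and separability formulas are not reproduced. The feature names, the sample values and the 0.8 cut-off are hypothetical and chosen only to make the example concrete.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # a constant series carries no correlation information
    return cov / (norm_x * norm_y)


def screen_features(features, anomaly_labels, threshold=0.8):
    """Keep features whose |r| with the anomaly label reaches the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, anomaly_labels)) >= threshold
    ]


labels = [0, 0, 0, 1, 1, 1]  # 1 marks an abnormal sample
features = {
    "voltage_dev": [0.1, 0.2, 0.1, 0.9, 1.0, 0.8],
    "clock_skew": [0.5, 0.1, 0.9, 0.4, 0.6, 0.3],
    "inverse": [5.0, 5.0, 5.0, 1.0, 1.0, 1.0],
}
selected = screen_features(features, labels)
```

Taking the absolute value keeps features that are strongly correlated in either direction, matching the "close to 1 or -1" criterion in the text.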
Assuming a data set of
Core object. For data object
Boundary object. For data object
Noise. For data object
Directly density-reachable. For data objects
Density-reachable. If there exists a chain of data objects
The Local Outlier Factor (LOF) algorithm aims to find outliers in a dataset, where whether a data point is an outlier depends on its local neighborhood. The basic principle is to quantify the density of data within the neighborhood of a data point and then detect outliers from that density. First the local reachability density of the point under test is calculated, and then its outlier factor, which characterizes the data density around it. The larger the outlier factor, the higher the outlier degree of the data point, i.e. the more likely it is anomalous; the smaller the outlier factor, the lower the outlier degree and the less likely the data are anomalous [35]. The process of calculating the LOF is as follows:
Construct the sample matrix from the sample data obtained in the previous section through PCA and DBSCAN clustering:
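The roles of core objects, boundary objects and noise can be illustrated with the standard DBSCAN definitions. In this sketch the parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbor count, the point itself included) follow common usage and are assumptions, since the paper's own symbols are not reproduced here.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise' per the DBSCAN definitions."""
    n = len(points)
    # eps-neighborhood of each point; the point itself is included.
    hoods = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i, hood in enumerate(hoods) if len(hood) >= min_pts}
    labels = []
    for i, hood in enumerate(hoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in hood):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels


# Four corners of a unit square, one point hanging off a corner, one far away.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]
roles = classify_points(pts, eps=1.1, min_pts=3)
```

In this toy set the four square corners are core objects, `(2, 1)` is a boundary object because it lies within `eps` of the core object `(1, 1)`, and `(5, 5)` is noise.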
where
The set of all data in dataset
Simply put, a small distance between data points is
The local reachable density lrd(
Where
The local outlier factor lof(
If data point
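The LOF procedure can be sketched in plain Python using the standard definitions (k-distance, reachability distance, local reachability density, outlier factor). This is an illustrative implementation under those textbook definitions, not the authors' code, and it assumes that no point has k or more exact duplicates, which would make a density infinite.

```python
import math


def lof_scores(points, k):
    """Local outlier factor of every point, using the standard definitions."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_dist, hood = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # Points tied at the k-distance all belong to the neighborhood.
        hood.append([j for d, j in others if d <= kd])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbor j.
        return max(k_dist[j], dist[i][j])

    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in hood[i]) / len(hood[i])
        lrd.append(1.0 / mean_reach)  # local reachability density
    return [
        sum(lrd[j] for j in hood[i]) / (len(hood[i]) * lrd[i]) for i in range(n)
    ]


# Four tightly grouped points and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(pts, k=2)
```

Points inside the square have a factor of 1 because their density matches that of their neighbors, while the isolated point `(5, 5)` receives a factor well above 1.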
Since the Q-statistic also couples the error change dynamics of other power metering data, it is necessary to locate the power metering data where the error change occurs after the error change is detected. According to the theoretical derivation, compared to the local outliers in the significant case, the possibility of error drift increases in the case where the latest calculated local outliers of the energy metering data are shifted the most. Therefore, by quantitatively analyzing the relative displacement of the local outliers of each power metering data and arranging them in order, it is possible to accurately pinpoint the power meters that are experiencing error drift. In addition, in order to more accurately describe the relative displacement of the local outliers of each meter, we introduce the concept of contribution, and the meter with the largest contribution can be localized as the meter with the error drift. Its calculation expression is:
where
Since outliers make up only a small proportion of the overall dataset of multi-parameter power metering data in non-stationary environments, the overall dataset is first preliminarily screened by clustering to make outlier mining more targeted. Density-based DBSCAN clustering does not require the number of clusters to be specified in advance and can find clusters of any number and shape in a noisy dataset, but the algorithm requires the user to set the parameters
Processing of DBSCAN input parameters
In order to improve the clustering quality and reduce resource consumption, the idea of K-nearest neighbors is introduced into the DBSCAN clustering algorithm: the distribution of the dataset is obtained while the k-distance is calculated, and prior knowledge of the dataset is obtained by using the input number of k neighbors instead of the density threshold
Definition of new core points for DBSCAN
Before the improvement, the parameters that determine the core points of the DBSCAN clustering algorithm require human input, whereas in the improved clustering algorithm the core points are determined from the calculated K-distance of each data object. That is, for any data object
Definition of LAOF
where the reachable distance formula is:
Since the LOF algorithm calculates the reachable distance and reachable density with a high time complexity of
The local density of object
The local density of object
According to the formula for solving the local outlier in the LOF algorithm, the formula for calculating the local outlier in the LAOF algorithm is analogous.
The local outlier factor of object
The local outlier factor of object
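The two-stage idea (screen by k-distance first, then score only the remaining candidates) can be sketched as follows. The paper's exact core-point rule and LAOF formula are not reproduced here, so the split rule below, which compares each k-distance with the mean k-distance, is purely an assumption for illustration, and the function names are hypothetical.

```python
import math


def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor."""
    result = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return result


def split_by_k_distance(points, k):
    """Split indices into a dense group and outlier candidates.

    Assumed rule: a point is treated as dense when its k-distance does not
    exceed the mean k-distance of the dataset.
    """
    kd = k_distances(points, k)
    threshold = sum(kd) / len(kd)
    dense = [i for i, d in enumerate(kd) if d <= threshold]
    candidates = [i for i, d in enumerate(kd) if d > threshold]
    return dense, candidates


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
dense, candidates = split_by_k_distance(pts, k=2)
```

Only the indices in `candidates` would then need an outlier-factor computation in the second stage, which is where the reduction in computation comes from.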
The power grid is developing toward intelligence, automation and large scale, and network information technology is being used ever more widely in it. As an important part of the smart grid, power metering data acquisition collects a large amount of power data at every moment. It analyzes the collected power parameter information through internal automatic data analysis algorithms, and labels and displays abnormal data to ensure the real-time accuracy and precision of the power metering data. This chapter validates and analyzes the outlier detection algorithm for non-stationary multi-parameter data in the distributed flexible sensing module designed in the previous section, to provide support for the promotion of the smart grid.
In this section, three low-voltage stations (A, B, and C) in an area of C city, Province H, are taken as an example for analysis and validation. Data dimensionality reduction and feature filtering are performed on historical power metering data collected by State Grid H Province Electric Power Co. through the collection system of distributed flexible sensing terminals. Figure 3 shows the feature extraction results of the power metering data, in which Figures 3(a)~(d) are the voltage peak-to-peak, kurtosis, skewness, and waveform factor features, respectively.

Figure 3. Electrical energy measurement data characteristics
As can be seen from the figure, the power metering data of different stations show obvious differences after feature extraction. The voltage peak-to-peak values of stations A, B and C fluctuate around 10, 6 and 30, respectively. The voltage kurtosis of stations A, B and C fluctuates around 2.5, 1 and 3, respectively. The voltage skewness of stations A, B and C fluctuates around 1.25, 0.5 and -1, respectively. The voltage waveform factor of stations A and B fluctuates around 1.005, while that of station C fluctuates around 1.05. Taken together, the feature extraction of the power metering data from different stations is satisfactory, and the features differentiate the stations clearly, which lays the foundation for the subsequent feature dimensionality reduction and clustering analysis.
Since the features of electricity metering data may be correlated to some degree, the purpose of dimensionality reduction is to improve data processing speed by removing noise and unimportant features while retaining the most important ones. Since the PCA method can effectively preserve the features of high-dimensional data and is also suitable for reducing high-dimensional data to low dimensions for visualization, it is used to visualize the results of the recognition method in this study. To simplify the calculation, the PCA algorithm is applied to reduce the dimensionality of the power metering feature dataset: the original power metering dataset is T*n dimensional, feature extraction forms a 6*n dimensional power metering feature dataset, and PCA further reduces it to 2*n dimensional low-dimensional feature data. The clustering of the data after dimensionality reduction is shown in Fig. 4, where Fig. 4(a)~(d) show the results obtained with the cosine distance, Chebyshev distance, Euclidean distance, and Mahalanobis distance metrics, respectively.
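For reference, the four features discussed above (peak-to-peak value, kurtosis, skewness and waveform factor) can be computed from a voltage series as sketched below. The definitions used are the common ones (population moments, non-excess kurtosis, waveform factor as RMS over mean absolute value); whether the paper uses exactly these variants is an assumption, and the sample values are synthetic.

```python
import math


def waveform_features(samples):
    """Peak-to-peak, kurtosis, skewness and waveform factor of a series."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((v - mean) ** 2 for v in samples) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in samples) / n
    m4 = sum((v - mean) ** 4 for v in samples) / n
    rms = math.sqrt(sum(v * v for v in samples) / n)
    mean_abs = sum(abs(v) for v in samples) / n
    # A constant series has m2 == 0 and would need special handling.
    return {
        "peak_to_peak": max(samples) - min(samples),
        "kurtosis": m4 / m2 ** 2,
        "skewness": m3 / m2 ** 1.5,
        "waveform_factor": rms / mean_abs,
    }


# Synthetic voltage readings around a 220 V nominal value.
feats = waveform_features([218.0, 220.0, 222.0, 220.0])
```

For a series that varies only slightly around a large mean, the waveform factor is very close to 1, which is consistent with the values near 1.005 reported for stations A and B.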

Clustering of the data after dimensionality reduction
Comparing the clustering results under the different distance metrics, the Mahalanobis distance is visibly the least effective, whereas the cosine, Chebyshev and Euclidean distances all separate the power metering feature data of the stations effectively. Because PCA does not preserve the physical meaning of the original dataset when it maps high-dimensional data to a low-dimensional space, the distances between clusters in the visualization are not meaningful and cannot be read as similarity. Given the computational speed advantage of the Euclidean distance, this paper chooses the Euclidean distance for PCA dimensionality reduction.
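The PCA projection step can be sketched as follows. This is a minimal pure-Python illustration that finds the leading eigenvectors of the covariance matrix by power iteration with deflation, not the authors' implementation; the function name `pca_project`, its parameters and the toy data are assumptions, and the sketch expects at least `n_components` directions with non-zero variance.

```python
import math
import random

def pca_project(rows, n_components=2, iters=200, seed=0):
    """Project samples (rows) onto their leading principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the features (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Random unit start vector for the power iteration.
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
                 for c in centered]
    return components, projected

# Hypothetical toy data: large spread along the first axis, small along the second.
toy = [[-3.0, 0.0], [3.0, 0.0], [0.0, -1.0], [0.0, 1.0]]
components, projected = pca_project(toy, n_components=2)
```

In the setting described above, each row would hold the six extracted features of one sample and `n_components=2` would give the two-dimensional data used for visualization. The sign of each component is arbitrary, as is usual for PCA.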
When detecting outliers in multi-parameter power metering data in a non-stationary environment, this paper uses the improved DBSCAN clustering algorithm to cluster the data features. The clustering quality of the algorithm is measured with the silhouette coefficient (contour coefficient, CC). The numerator of the coefficient measures the “empty space” between two clusters, and the denominator is the larger of two lengths: the radius of the cluster and the distance between the two clusters. The coefficient ranges from -1 to 1. A negative value indicates that the radius of the cluster is larger than the distance between the two clusters, i.e., that the clusters overlap, and a larger value indicates higher clustering quality. The average silhouette coefficient of all points within a single cluster is therefore used to measure the clustering quality of the whole cluster, and serves as the measure of the clustering effect of the improved DBSCAN clustering algorithm. From the power metering data of the three low-voltage station areas obtained in the previous section, 2000 data points covering seven categories were randomly selected. The Gaussian mixture model (GMM), load essential features (COL) and matching analysis method (MAA) are selected as comparison algorithms, and the experiments use ten-fold cross-validation. Table 1 shows the clustering effect of the different clustering algorithms on the electric energy metering data.
Energy metering data clustering effect
| No. | DBSCAN | MAA | COL | GMM | Ours |
|---|---|---|---|---|---|
| 1 | 0.401 | 0.472 | 0.469 | 0.528 | 0.724 |
| 2 | 0.413 | 0.436 | 0.436 | 0.514 | 0.693 |
| 3 | 0.371 | 0.415 | 0.455 | 0.509 | 0.689 |
| 4 | 0.356 | 0.467 | 0.429 | 0.487 | 0.691 |
| 5 | 0.401 | 0.415 | 0.454 | 0.522 | 0.707 |
| 6 | 0.422 | 0.423 | 0.441 | 0.491 | 0.719 |
| 7 | 0.389 | 0.439 | 0.439 | 0.507 | 0.726 |
| 8 | 0.358 | 0.478 | 0.463 | 0.518 | 0.681 |
| 9 | 0.395 | 0.461 | 0.457 | 0.473 | 0.693 |
| 10 | 0.388 | 0.427 | 0.432 | 0.525 | 0.725 |
| Mean | 0.389 | 0.443 | 0.448 | 0.507 | 0.705 |
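The quality measure reported in Table 1 can be illustrated with the standard silhouette definition, s = (b - a) / max(a, b), where a is the mean distance from a point to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The sketch below assumes this standard form, Euclidean distance, at least two clusters and no duplicate points; the function name `mean_silhouette` and the toy points are hypothetical.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (standard definition)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: the score is conventionally 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels follow the true grouping
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the grouping
```

A labeling that matches the geometry scores close to 1, while one that mixes the groups scores below 0, which mirrors the interpretation of negative values given above.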
To further verify the effectiveness of the improved DBSCAN clustering algorithm, its computation time is compared with that of the Gaussian mixture model (GMM), load essential features (COL) and matching analysis method (MAA) algorithms. The comparison results are shown in Fig. 5.

Calculation time comparison results
As Table 1 shows, the mean silhouette (contour) coefficient obtained by ten-fold cross-validation of the improved DBSCAN algorithm for feature clustering of multi-parameter electricity metering data is 0.705, which is 81.23% higher than that of the original DBSCAN clustering algorithm. Compared with the MAA, COL and GMM algorithms, the mean silhouette coefficient of the improved DBSCAN clustering algorithm is higher by 59.14%, 57.37% and 39.05%, respectively. The improved DBSCAN clustering algorithm therefore obtains better results when clustering the features of multi-parameter power metering data, and can provide reliable clustering results for outlier detection in multi-parameter power metering data in non-stationary environments.
According to the comparison results in Fig. 5, the computation time of non-stationary multi-parameter power metering data clustering based on the improved DBSCAN clustering algorithm remains relatively smooth. At the 80th iteration its computation time is 3.33 min, and the mean computation time over the 80 iterations is 3.38 min. The mean computation times over the 80 iterations for clustering based on the Gaussian mixture model, the load essential features and the matching analysis algorithm are 4.45, 5.13 and 6.32 min, respectively. The mean computation time of the improved DBSCAN clustering algorithm is thus 24.05%, 34.11% and 46.52% lower than that of the three comparison algorithms, and shorter than that of the traditional clustering algorithms for electricity metering data. This reflects the effectiveness of the improved DBSCAN clustering algorithm for non-stationary multi-parameter energy metering data: it clusters faster and performs better in energy metering data computation.
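The percentages quoted in the two preceding paragraphs can be re-derived from the reported means. The short sketch below uses only the values given in Table 1 and the mean computation times stated above; the helper names are hypothetical.

```python
def relative_gain(ours, baseline):
    """Percentage by which `ours` exceeds `baseline`."""
    return (ours - baseline) / baseline * 100.0

def relative_reduction(ours, baseline):
    """Percentage by which `ours` falls below `baseline`."""
    return (baseline - ours) / baseline * 100.0

# Mean coefficients from Table 1; the proposed algorithm's mean is 0.705.
coefficient_means = {"DBSCAN": 0.389, "MAA": 0.443, "COL": 0.448, "GMM": 0.507}
gains = {name: round(relative_gain(0.705, value), 2)
         for name, value in coefficient_means.items()}

# Mean computation times in minutes; the proposed algorithm's mean is 3.38 min.
time_means = {"GMM": 4.45, "COL": 5.13, "MAA": 6.32}
reductions = {name: round(relative_reduction(3.38, value), 2)
              for name, value in time_means.items()}
```

The gains reproduce the reported 81.23%, 59.14%, 57.37% and 39.05%. The reductions reproduce 34.11% and 46.52%, and the GMM figure comes out within about 0.01 percentage points of the reported 24.05%, which is consistent with rounding of the mean times.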
The non-stationary multi-parametric data outlier detection algorithm proposed in this paper is tested on both a simulated dataset and a real dataset. In the experiments, the algorithm is compared with the LOF, COF and LDOF algorithms in terms of time efficiency, outlier detection accuracy, and the effect of the parameters on detection accuracy.
The algorithm is first validated on a simulated dataset in which the data are sparsely distributed but follow a clear pattern. The dataset contains 24 data points, lying mainly on two oblique lines at an angle of 45° to the x-axis, with 12 data points on each line; the distribution is shown in Figure 6. The two oblique lines meet at the coordinate origin a, b is a point on the x-axis, and c is the point on the oblique lines farthest from the origin. Intuitively, a and c follow the pattern of the distribution, whereas b deviates from it, so the point with the highest degree of outlierness should be point b.

Simulated data set
On the simulated dataset, the accuracy of this paper’s algorithm in detecting outliers and the effect of the parameters on the algorithm are verified. Based on experience, the parameter k is set to 6. The experimental results are shown in Fig. 7, where the x-axis represents the data sequence in the simulated dataset and the y-axis represents the values of the different outlier factors. Examining the data point with the largest outlier factor for each algorithm shows that only the COF algorithm and the algorithm in this paper detect the outlier b. This shows that, when the data are sparsely distributed and follow a certain pattern, this paper’s algorithm inherits the advantages of LOF and can accurately detect outliers that lie outside the pattern.

Comparison of detection results with the same parameters
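For reference, the outlier factor of the baseline LOF algorithm can be sketched as below. This is the standard LOF definition with a simplified neighborhood (exactly k neighbors, ties broken by index, no duplicate points), not the improved algorithm proposed in this paper; the function names and the toy grid are hypothetical and do not reproduce the dataset of Figure 6.

```python
import math

def k_nearest(points, i, k):
    """Return the k nearest neighbours of point i as (distance, index) pairs."""
    dists = sorted((math.dist(points[i], points[j]), j)
                   for j in range(len(points)) if j != i)
    return dists[:k]

def lof_scores(points, k):
    """Local outlier factor of every point (values near 1 indicate inliers)."""
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neigh]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(k / sum(reach))
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for _, j in neigh[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical toy data: a 3x3 grid of inliers plus one distant point (index 9).
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
scores = lof_scores(points, k=3)
```

Because the k-distance enters both the reachability distance and the density, changing k changes every score, which is the sensitivity to k that the comparison above examines.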
Secondly, a further simulated dataset is designed, consisting of three data classes X1~X3 with different densities and five outliers Y1~Y5, and the four algorithms are run on this dataset. To simplify the analysis, only the detection accuracy of the algorithms under different parameters is compared, i.e., the number of correctly detected outliers divided by the total number of outliers. Figure 8 shows the comparison results of the different algorithms on the simulated dataset.
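The accuracy measure defined above can be sketched as follows. The sketch assumes that an algorithm flags as many points as there are true outliers, taking those with the largest outlier factors; the paper does not state how flagged points are selected, and the function name and toy scores are hypothetical.

```python
def detection_accuracy(scores, true_outliers):
    """Fraction of the true outliers found among the top-scoring points."""
    n = len(true_outliers)
    # Rank point indices by outlier factor, largest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n])  # assumed rule: flag the n highest-scoring points
    return len(flagged & set(true_outliers)) / n

# Hypothetical outlier factors for eight points; indices 5, 6 and 7 are the true outliers.
toy_scores = [1.0, 1.1, 0.9, 2.5, 1.0, 3.2, 1.2, 4.0]
accuracy = detection_accuracy(toy_scores, {5, 6, 7})
```

In this toy case the inlier at index 3 outranks the true outlier at index 6, so two of the three outliers are found.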

Changes of accuracy rate with different parameters
As the figure shows, most of the algorithms detect the outliers fairly accurately when the parameter k is set reasonably. As k increases, however, the accuracy of the LOF and COF algorithms drops significantly, while the accuracy of the LDOF algorithm and of this paper’s algorithm is almost unaffected by k, at 59.42% and 100%, respectively. The setting of k directly affects the k-distance neighborhood computed for each data object. When k is set unreasonably, the outlier factor of outliers close to a high-density region decreases and may even fall below that of data points at the edge of the high-density region, which degrades the detection results of the LOF and COF algorithms. The LDOF algorithm avoids this influence of the parameter by computing the ratio of a data object’s k-nearest-neighbor distance to its k-nearest-neighbor inner distance. For edge data, however, the k-nearest-neighbor distance is larger while the inner distance is smaller, so the outlier factor of edge data is generally larger, which lowers the detection accuracy of LDOF relative to the algorithm in this paper.
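The LDOF ratio described above can be sketched as follows, assuming the usual definition: the mean distance from a point to its k nearest neighbors divided by the mean pairwise distance among those neighbors. The function name and the toy grid are hypothetical, and the sketch requires k ≥ 2 and no duplicate points.

```python
import math
from itertools import combinations

def ldof_scores(points, k):
    """LDOF of every point: k-NN distance divided by k-NN inner distance."""
    scores = []
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j)
                         for j, q in enumerate(points) if j != i)[:k]
        # Mean distance from the point to its k nearest neighbours.
        knn_dist = sum(d for d, _ in nearest) / k
        # Mean pairwise distance among those neighbours (the inner distance).
        pairs = list(combinations([j for _, j in nearest], 2))
        inner_dist = sum(math.dist(points[a], points[b]) for a, b in pairs) / len(pairs)
        scores.append(knn_dist / inner_dist)
    return scores

# Hypothetical toy data: a 3x3 grid of inliers plus one distant point (index 9).
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
scores = ldof_scores(points, k=3)
```

In this toy grid the corner points score higher than the edge and centre points, a small-scale instance of the edge effect discussed above.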
To further verify the effectiveness of this paper’s algorithm on high-dimensional data with a larger data volume, the electricity metering data of the three low-voltage station areas selected in the previous section are used: five categories of data points are selected as the cluster data, and eight categorized data points are taken as the outlier data. The improved DBSCAN algorithm is first used for clustering, and the outlier detection accuracy and running time of the different algorithms are then compared on the real dataset. Fig. 9 shows the outlier detection effect of the different algorithms, where Fig. 9(a)~(b) show the outlier detection accuracy and the running time, respectively.

Outlier detection effect of different algorithms
As the figure shows, the accuracy of this paper’s algorithm is higher than that of the other three algorithms for the different values of the parameter k. Its accuracy reaches up to 0.845 for outlier detection on non-stationary multi-parameter power metering data, whereas the LOF, COF and LDOF algorithms reach at most 0.667, 0.575 and 0.727, respectively. In addition, the running time of the LOF and COF algorithms grows rapidly as the amount of data increases, and the running time of the LDOF algorithm is lower than that of the COF algorithm. The running time of this paper’s algorithm also grows with the amount of data, but it is at most 345.07 ms, which is shorter than that of the other three algorithms. This indicates that the advantages of the outlier detection algorithm designed in this paper become more prominent when the data volume is large. The algorithm therefore performs better in outlier detection for non-stationary multi-parameter power metering data.
This paper obtains power measurement data from the distributed flexible sensing terminal. The measurement data are collected through the protection unit, which is affected by temperature to some extent, so the acquired data can carry a large error. To analyze how the electric energy measurement data of different systems vary with temperature, data acquisition systems based on big data technology and on cloud computing technology are selected for comparison, in order to verify the effectiveness of the system in this paper in acquiring electric energy measurement data under temperature change. Figure 10 shows the trend of the ratio difference of the electric energy metering data at different temperatures.

Error comparison diagram under different temperatures
The analysis shows that at low temperature (-25℃~0℃), the temperature-induced ratio difference of the electric energy metering data of this paper’s system varies between -0.5% and -0.3%, while that of the big data and cloud computing systems varies within [-0.1%, 0.05%] and [-0.1%, 0.1%], respectively. At room temperature (0℃~18℃) and high temperature (18℃~25℃), the ratio difference of this paper’s system varies between -0.35% and -0.2%, while that of the other two systems fluctuates more. The results show that, for the different types of power metering data acquisition system, temperature has a significant impact on the error of the power metering data. The change in ratio difference is small at low temperature, increases at room temperature, and is large at high temperature, i.e., it is positively correlated with temperature. Comparatively, the ratio difference of the power metering data obtained by the system in this paper is relatively small, which provides reliable data support for accurate outlier detection and also shows the effectiveness of applying the distributed flexible sensing terminal in a non-stationary environment.
Outlier detection on distributed flexible sensing terminal data is carried out in order to better realize power dispatching. On this basis, the distributed flexible sensing terminal application system designed in the previous section is applied to the error analysis of the monthly total active power in the distribution network. Based on the total active power error data from 2023-05 to 2024-04 in City C, Province H, the error of the simulated output of the system is compared with the actual output. Figure 11 shows the monthly electrical energy error of the electrical energy metering system.

The monthly electrical error of the electric power measurement system
From the figure, the maximum actual monthly total active power error is -0.471% (2023-07), and the variation of the monthly power error over one year is 0.227% (the difference between 2023-09 and 2023-07). The overall metering performance is stable over one year of operation, so the system can be prioritized for use in electric energy trade settlement. In the distributed flexible sensing terminal application system, because of the B-phase meter failure, the maximum monthly total active power error is only -4.435% (2024-01). For the normally operating A and C phases, the maximum monthly active power errors are -0.272% (2023-11) and -0.199% (2023-07), and their one-year power error variations are only 0.212% and 0.289%, respectively. The electric energy metering data collection error of the distributed flexible sensing terminal application system is therefore small. Where normally operating metering equipment still shows a large system error, the error can be eliminated through parameter configuration before the equipment is put into operation for electric energy metering. In summary, the distributed flexible sensing terminal application system designed in this paper can accurately acquire power measurement data and meet the demand for power allocation in the distribution network.
This paper develops a distributed flexible sensing terminal application system based on the AMI architecture, and proposes an outlier detection algorithm for multi-parameter data in non-stationary environments by combining the improved DBSCAN clustering algorithm with the local outlier factor algorithm. The algorithm achieves an accuracy of up to 0.845 and a maximum running time of only 345.07 ms when detecting outliers in non-stationary multi-parameter electric energy metering data, which is better outlier detection performance than that of the comparison algorithms. When the distributed flexible sensing system is applied to the monthly power error calculation, the monthly total active power error is only -4.435%. Combining machine learning with multi-parameter energy metering data in a non-stationary environment makes it possible to detect outliers accurately and provides reliable data support for the optimal allocation of power in the distribution network.
