Design and Optimization Analysis of Energy-Efficient Routing Algorithms in Wireless Sensor Networks
Published online: 17 March 2025
Received: 27 Oct. 2024
Accepted: 06 Feb. 2025
DOI: https://doi.org/10.2478/amns-2025-0347
© 2025 Hao Li et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Wireless sensor networks draw on a range of advanced technologies; as a product of the intersection of multiple disciplines, they have broad development prospects but still require the study of many key technologies, among which routing protocols are a research hotspot. In most practical applications of wireless sensor networks, miniature sensor nodes are randomly deployed in monitoring areas that are difficult for personnel to reach or otherwise harsh, and node energy is mainly supplied by a "one-time" lithium battery [1]. How to obtain as much effective sensory information about the physical world as possible under such resource constraints, and relay it to end users for research and analysis, is a key issue that must be resolved before wireless sensor networks can be applied in practice; this issue ultimately comes down to network routing [2]. Data communication in the traditional Internet generally does not take energy consumption into account: its main goals are to improve quality of service (QoS), provide high bandwidth, and shorten communication delay, so that users can access network resources efficiently and effectively. By contrast, the miniature sensor nodes that form the physical carrier of a wireless sensor network have only limited power supply, computing power, storage capacity, and wireless communication capability. To ensure that sensed information is transmitted quickly, reliably, and correctly, routing protocol design should follow the principle of low energy consumption and high efficiency [3-4]. In recent years, many scholars and research institutions have studied routing protocols in depth, made progress, and proposed many classical routing algorithms, such as LEACH, TEEN, HEED, and EEUC.
For cluster head competition, data communication path selection, data fusion, and other aspects of routing, researchers have successively proposed many algorithms to improve routing performance, such as LEACH-C, LEACH-P, and APTEEN [5-6]. A wireless sensor network routing protocol should not only reduce the communication energy consumption of individual nodes as much as possible, but also balance the energy consumption of the whole network from a global perspective, because the early death of some nodes may paralyze the entire network; effectively prolonging the network's survival cycle is thus the main goal of routing protocol design. Designing and developing routing protocols with reliable communication, effective energy utilization, and high adaptability is of great practical importance for applying wireless sensor networks in real environments [7-8].
In this paper, to address the energy waste that arises when the distribution of sensor activity in the monitored area is not uniform, a reinforcement learning algorithm is applied to the wireless sensor network so that the network nodes form a multi-path transmission topology, MPHR. The Gibbs distribution is used to represent the selection probability of the sensor nodes, and a routing cost function based on node relative mobility is used, with cost as the main indicator, to select the optimal path for route establishment. The result is an adaptive wireless routing strategy with a reinforcement learning algorithm (MPHR-RL) that balances energy consumption and guarantees network quality.
Sensor nodes are tiny and generally powered by batteries with limited energy, while being deployed in large numbers in complex environments where replacing the batteries is almost impossible. Designing efficient and energy-saving routing protocols is therefore one of the current research hotspots in wireless sensor networks. Literature [9] categorizes existing energy-efficient routing protocols for wireless sensor networks, compares their performance metrics, and discusses their strengths and weaknesses, aiming to find energy-efficient routes that save energy and thus extend the lifetime of wireless sensor networks. Literature [10] discusses machine-learning-based energy-efficient routing algorithms in wireless sensor networks and concludes that such algorithms can serve as an effective way to create energy-efficient green routing models, which is informative for improving resource utilization and achieving energy-efficient load balancing. Literature [11] compares existing energy-efficient routing techniques in wireless sensor networks, categorizes these protocols according to a newly proposed taxonomy, and conducts simulation experiments with the NS3 simulator; the results show that routing based on a variety of intelligent techniques improves network lifetime and ensures better coverage of the sensed area. Literature [12] synthesized the exponentially weighted moving average concept, ant-lion optimization, and the whale optimization algorithm into a new exponential ant-lion whale optimization algorithm, and on this basis designed a high-efficiency routing model for wireless sensor networks; experiments verified the model's reliability, showing that it not only improves network lifetime but also maintains high scalability.
Literature [13] proposed an energy-efficient and reliable routing algorithm based on Dempster-Shafer evidence theory; theoretical analysis and simulation results showed that the proposed algorithm has good application prospects, as it can effectively extend network lifetime, reduce the packet loss rate, and improve the reliability of data transmission. Literature [14] designed an energy-efficient routing protocol for wireless sensor networks based on an efficient improved artificial bee colony algorithm, aiming to improve energy efficiency and further increase network throughput, and verified its superior performance by comparison with many similar protocols from recent years. Literature [15] focuses on LEACH-based and bio-inspired protocols, revealing their limitations by comparing their architectures, strategies, and performance, and proposes corresponding improvement strategies that in turn improve lifetime, scalability, and packet delivery rate.
In addition, literature [16] designed an energy-efficient routing algorithm for wireless body area networks of smart wearable patches; its effectiveness was examined through network lifetime, latency, error metrics, energy efficiency, and network throughput, and it not only eliminates data aggregation latency but also avoids routing loops in an effective way. Literature [17] pointed out the effect of routing protocol energy consumption on the lifetime of wireless sensor networks and proposed a novel data-aggregation-aware energy-efficient routing algorithm based on Q-learning; simulation analysis demonstrated the protocol's superior performance, showing that it can successfully reduce the amount of transmitted data and prolong network lifetime. Literature [18] proposed an energy-efficient cooperative routing scheme for heterogeneous wireless sensor networks that enables all sensors in a multi-WSN environment to share their routing paths/nodes and relay event packets for other WSNs; simulation experiments verified the scheme's feasibility and effectiveness in prolonging the lifespan of heterogeneous wireless sensor networks. Literature [19] proposed a three-layer cluster-based energy-efficient routing protocol for IoT sensor networks to overcome the problems of power consumption, network lifetime, network throughput, routing, and network security, and verified its superior performance in network survival time, network throughput, average energy consumption, and packet latency through extensive experiments.
Literature [20] proposed energy-efficient rate-optimized congestion control routing based on a hybrid of K-means, greedy best-first search, a firefly optimization strategy, and ant colony optimization to achieve energy-efficient transmission and reduce the energy consumption of the whole wireless sensor network; its effectiveness and reliability were verified on the MATLAB simulation platform. Literature [21] proposed an energy-efficient multi-hop routing protocol for heterogeneous wireless sensor networks based on the Grey Wolf Optimizer (GWO) and the Tabu Search Algorithm (TSA) and validated its superior performance in simulation experiments; it not only uses sensor node energy efficiently but also prolongs the life cycle of heterogeneous wireless sensor networks. Literature [22] discusses the factors affecting energy consumption in wireless sensor networks and proposes a new, improved LEACH routing protocol; MATLAB simulation and analysis found that the proposed algorithm outperforms the traditional LEACH protocol and improves network lifetime.
In most applications, sensors are uniformly arranged in the monitored premises because one cannot predict where events will occur. The scenario considered in this paper is that sensor nodes are uniformly deployed in the monitoring area, but the information to be collected is not uniformly distributed. That is, some nodes work continuously because events are triggered in their surroundings, while others, in areas where no events occur, never work. This wastes energy and prevents the network's total energy from being utilized in the best possible way. To solve this problem, we propose an adaptive wireless routing strategy based on a reinforcement learning algorithm: MPHR-RL. In MPHR-RL, we treat information routing as a reinforcement learning process of distributed intelligent nodes. Each sensor node is treated as an independent agent that decides its next-hop address through a parameterized selection probability and return, substituting nodes that have been idle for a long time for nodes that have been working for a long time. This averages out per-node energy consumption, prolongs the lifetime of the sensor network, and makes all the energy in the wireless sensor network effective.
One of the most important applications of sensors is to collect information about characteristic changes in the environment and transmit it to the gateway node (root node) for processing and analysis. Accordingly, the most commonly used sensor routing technique is the tree topology, in which data acquisition proceeds along tree-shaped transmission paths. However, the tree topology has serious limitations: its single transmission path tends to increase transmission delay, and the root node, which simultaneously serves multiple information-collecting child nodes, will be the first to run out of energy and become the breaking point of the whole transmission network. To solve this problem, in our previous research we proposed an optimal path selection algorithm for mobile networks that accounts for node energy states. The routing cost function that takes node relative mobility into account is explained below.
In recent years, studies have introduced reinforcement learning into routing for wireless networks. Most have focused on two aspects: quickly finding the shortest transmission path in ad-hoc networks with mobile nodes, and solving the QoS problem in wireless networks.
Reinforcement learning is an important technique in machine learning that allows an agent to autonomously decide the next optimal action toward a specific goal by using feedback from the environment about its previous actions. Assume the time step of each learning iteration is $t$. The state at time $t$ is denoted $s_t$, the action taken is $a_t$, and the return received after the action is $r_t$. In order to make the agents learnable, each agent is given a formal parameter $\theta$ that parameterizes its action-selection probability. When an agent is in a partially observable environment, there is uncertainty in the state transitions, and the state-transition function can only be described probabilistically.
This paper focuses on wireless sensor network systems for data collection. Each sensor node collects or forwards the monitored data, and all information is transmitted to the gateway aggregation node for further processing. We organize the network nodes into a multipath transmission topology, MPHR [23-24]. In MPHR, each node maintains its own next-hop routing table, which includes Class A next-hop addresses, i.e., parent nodes one level higher than itself, and Class B next-hop addresses, i.e., nodes at the same level as itself. Routing is then the process of selecting a node from one's own routing table as the next-hop forwarding node. If we regard this selection as a random process, we can define the probability $p_{ij}$ that node $i$ selects node $j$ from its routing table as the next hop.
Following the previous definition, the average return expectation over the routing process can be written as

$$\rho=\lim_{n\to\infty}\frac{1}{n}E\left[\sum_{t=1}^{n}r_t\right]$$

When the forwarding load is spread more evenly across the nodes, the cumulative sum of next-hop forwarding counts is smaller and the sum of returns is correspondingly larger, so maximizing the expected return tends to balance energy consumption across the network.
In order to make the sensor nodes learnable, we set a formal parameter $\theta_{ij}$ for each candidate next hop and let the selection probability follow the Gibbs distribution:

$$p_{ij}=\frac{e^{\theta_{ij}}}{\sum_{k\in N_i}e^{\theta_{ik}}}$$

The summation runs over all optional next-hop nodes $N_i$ in the node's own routing table. It is easy to see that the larger $\theta_{ij}$ is, the larger the probability that node $j$ is selected as the next hop.
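As a concrete illustration of Gibbs-distributed next-hop selection, the following Python sketch computes the selection probabilities from the per-hop parameters and samples a hop. The function and variable names are our own illustrative choices, not taken from the paper.

```python
import math
import random

def gibbs_probabilities(theta):
    """Softmax (Gibbs) distribution over the candidate next hops.

    `theta` maps a candidate node id to its selection parameter;
    a larger parameter yields a larger selection probability."""
    z = sum(math.exp(t) for t in theta.values())
    return {node: math.exp(t) / z for node, t in theta.items()}

def choose_next_hop(theta, rng=random):
    """Sample one next hop according to the Gibbs probabilities."""
    probs = gibbs_probabilities(theta)
    r, acc = rng.random(), 0.0
    for node, p in probs.items():
        acc += p
        if r <= acc:
            return node
    return node  # guard against floating-point edge cases

# Example: three candidate next hops in the routing table
probs = gibbs_probabilities({"A": 2.0, "B": 1.0, "C": 1.0})
```

Since the parameter for hop A is largest, A receives the highest selection probability, while B and C are equally likely.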
The goal of the reinforcement learning performed by the nodes is to find the parameter values that maximize the average return expectation $\rho$, which is done by gradient ascent:

$$\theta\leftarrow\theta+\Delta\theta,\qquad \Delta\theta=\alpha\nabla_{\theta}\rho$$

where $\alpha$ is the learning-rate step size and the gradient operator $\nabla$ is computed with respect to the parameter $\theta$.
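The paper's exact rule for computing Δθ is not reproduced here; the sketch below shows a standard REINFORCE-style gradient-ascent step for a Gibbs policy, which is consistent with the gradient-ascent description above. The function names and the learning-rate value are illustrative assumptions.

```python
import math

def gibbs_probs(theta):
    """Softmax selection probabilities over candidate next hops."""
    z = sum(math.exp(t) for t in theta.values())
    return {k: math.exp(t) / z for k, t in theta.items()}

def reinforce_update(theta, chosen, reward, alpha=0.1):
    """One gradient-ascent step: theta <- theta + alpha * reward * grad(log pi).

    For a Gibbs policy, d log pi(chosen) / d theta_k equals
    (1 - p_chosen) for the chosen hop and (-p_k) for every other hop."""
    probs = gibbs_probs(theta)
    new_theta = {}
    for k, t in theta.items():
        grad = (1.0 - probs[k]) if k == chosen else -probs[k]
        new_theta[k] = t + alpha * reward * grad
    return new_theta

theta = {"A": 0.0, "B": 0.0}
theta = reinforce_update(theta, chosen="A", reward=1.0)
```

After one positive-reward update for hop A, its parameter rises while B's falls, so A becomes more likely on the next selection.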
Based on the relative mobility between nodes and node energy, this paper establishes a mobility routing cost function [27-28] that considers both factors at the same time, so that nodes can use this cost as the main index when establishing routes and select the optimal path, thereby balancing energy consumption and ensuring network quality.
Define the remaining energy of node $i$ as:

$$E_{res}(i)=E_{init}(i)-E_{cons}(i)$$

where $E_{init}(i)$ is the node's initial energy and $E_{cons}(i)$ is the energy it has already consumed. On this basis, the energy residual rate is defined as

$$R_E(i)=\frac{E_{res}(i)}{E_{init}(i)} \tag{7}$$

Equation (7) defines the remaining energy rate of node $i$: the proportion of its initial energy that is still available, which reflects how much longer the node can keep working.
To represent the relative mobility between nodes, define the movement speed of node $i$ as the velocity vector $\vec v_i$, and the relative movement speed of nodes $i$ and $j$ as

$$v_{ij}=\left|\vec v_i-\vec v_j\right|$$

where the velocity with subscript $ij$ denotes the relative velocity between the node pair, obtained from the velocity vectors of the two individual nodes.
Similar to the definition of the energy residual rate, to further measure the relative mobility between nodes, the concept of node relative mobility is introduced:

$$M_{ij}=\frac{\left|\vec v_i-\vec v_j\right|}{\left|\vec v_i\right|+\left|\vec v_j\right|}$$

The numerator is the relative movement speed between the two nodes; the denominator is the sum of their absolute movement speeds, i.e., the maximum possible value of the relative movement speed, attained in the limit when the two nodes move in opposite directions. For node pairs with the same relative movement speed, the greater the sum of absolute movement speeds, the smaller the relative mobility, and the more stable the link between the two nodes.
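The relative mobility ratio can be sketched as follows, assuming 2-D velocity vectors; the function name is a hypothetical choice for illustration.

```python
import math

def relative_mobility(v_i, v_j):
    """M_ij = |v_i - v_j| / (|v_i| + |v_j|), a value in [0, 1].

    v_i and v_j are 2-D velocity vectors (vx, vy). The ratio reaches 1
    only when the nodes move in exactly opposite directions."""
    rel = math.hypot(v_i[0] - v_j[0], v_i[1] - v_j[1])
    total = math.hypot(*v_i) + math.hypot(*v_j)
    if total == 0.0:  # both nodes static: no relative movement
        return 0.0
    return rel / total

m_opposite = relative_mobility((1.0, 0.0), (-1.0, 0.0))  # opposite directions
m_parallel = relative_mobility((1.0, 0.0), (1.0, 0.0))   # same direction
```

The two example calls hit the two extremes: opposite motion gives mobility 1.0 (least stable link), identical motion gives 0.0 (most stable).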
Combining the energy residual rate and relative mobility defined above, the routing cost of node $i$ relative to next hop $j$ is defined as the weighted sum

$$C_{ij}=\alpha\left(1-R_E(j)\right)+\beta M_{ij}$$

where $\alpha$ and $\beta$ are weighting coefficients that balance the influence of energy and mobility ($\alpha+\beta=1$). The cost of a whole link is the sum of the per-hop costs:

$$C_{path}=\sum_{k=1}^{N-1}C_{k,k+1}$$

where $N$ denotes the number of nodes in the whole link. After an intermediate node receives the RREQ packet, it calculates the routing cost relative to the previous-hop node from its own speed and energy information; when the RREQ reaches the target node (or its parent node), the routing cost of the whole link is obtained. The target node (or its parent node) then selects the path with the minimum cost $C_{path}$ to establish the route.
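Putting the pieces together, a minimal sketch of the per-hop cost and minimum-cost path selection is shown below. The equal weights (alpha = beta = 0.5) and all names are illustrative assumptions, not values from the paper.

```python
def hop_cost(residual_rate, mobility, alpha=0.5, beta=0.5):
    """Weighted per-hop routing cost: low residual energy and high
    relative mobility both make a hop more expensive.
    alpha and beta are illustrative weights with alpha + beta = 1."""
    return alpha * (1.0 - residual_rate) + beta * mobility

def link_cost(hops):
    """Accumulate per-hop costs along a candidate RREQ path."""
    return sum(hop_cost(r, m) for r, m in hops)

# Two candidate paths, each a list of (residual_rate, mobility) per hop
path_a = [(0.9, 0.1), (0.8, 0.2)]  # energetic, slow-moving nodes
path_b = [(0.5, 0.4), (0.4, 0.6)]  # depleted, fast-moving nodes
best = min([("A", path_a), ("B", path_b)], key=lambda p: link_cost(p[1]))[0]
```

As expected, the target node would pick path A, whose hops have higher residual energy and lower relative mobility.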
To investigate whether the algorithms can compute multipath routing decisions in real time based on the network state in complex network environments, this paper compares the algorithms' performance and differences in terms of network life cycle, the number of anchors accessed, path length, latency, and task transmission success rate. Under static network load, the experiment divides the network load into three intensities (low, medium, and high) and compares the algorithms at the same load level. Under dynamic network load, the experiment sets the initial load intensity to low and increases the load during the iteration process, so as to compare each algorithm's performance indexes as the network load changes.
In this paper, the MPHR-RL algorithm is simulated and analyzed; the node distribution of the simulation network is shown in Fig. 1. $n$ sensor nodes are randomly distributed in a square monitoring area with an edge length of 800 m.

Distribution of nodes in the simulation network
In this paper, the round in which the first node dies is used as a measure of network lifetime. The number of surviving nodes of each algorithm as a function of the number of running rounds is depicted in Fig. 2. The experimental results show that the MPHR-RL algorithm has the longest network lifetime. At 1400 rounds of network operation, MPHR-RL still has 221 surviving nodes, while the E-WA, E-WM, and E-WI methods have very few, with 12, 23, and 5 nodes, respectively. Moreover, as shown in Fig. 2, the rounds at which half of the nodes have died for MPHR-RL, E-WA, E-WM, and E-WI are 1350, 1170, 1029, and 954, respectively. The half-death round and the all-death round of the MPHR-RL algorithm are both larger than those of the comparison algorithms, which fully demonstrates MPHR-RL's outstanding effect in prolonging network lifetime.

The operation of the network survival nodes of each algorithm
This is likely related to the following differences. The difference between the MPHR-RL and E-WA algorithms lies in the method of determining the anchor access sequence: MPHR-RL is optimized for the shortest path length based on the mobility routing cost function, assisted by reinforcement learning, while E-WA selects paths randomly. Under the same anchor distribution and the same number of anchor points, MPHR-RL greatly reduces the mobile base station (BS) travel distance and shortens its cruising time. It maximizes the number of finalized anchor points within the limited mobile BS endurance, which reduces the upload distance between nodes and the mobile BS, saving data transmission energy for the nodes and extending network lifetime. The difference between the MPHR-RL and E-WM algorithms lies in the routing method: MPHR-RL uses a distance threshold as the single-hop mode switching criterion; nodes whose anchor distance is less than the threshold use single-hop transmission, which effectively saves energy, while nodes beyond the threshold save energy by choosing nodes inside the threshold as relays, which helps balance energy among the nodes. The difference between the MPHR-RL and E-WI algorithms lies in how the anchor distribution is determined: E-WI selects cluster centers obtained by K-means as anchor points, which is random and does not take node energy into account, so high-consumption nodes easily emerge and the network has a shorter lifespan. The algorithm in this paper, by contrast, is based on the energy trough coefficient and selects low-energy nodes in the network as anchor points, which the mobile BS approaches for data collection; this greatly reduces their energy consumption in that round, helps equalize energy, and thus prolongs network lifetime.
The comparison of the number of BS-accessed anchor points under different algorithms is depicted in Fig. 3. The first 100 rounds of network operation are intercepted for algorithm performance analysis. The results show that for the MPHR-RL algorithm, with a network edge length of 800 m, the initial number of anchor points is 97; under the restriction of the mobile BS endurance, the finalized number of anchor points is 86. Over the whole process, the average number of anchor points is 90.87, with a maximum of 100 and a minimum of 74. For the three comparison methods E-WA, E-WM, and E-WI, the averages over the whole process are 8.78, 7.33, and 6.70, respectively, under the same network edge length of 800 m and the same 100 intercepted rounds. Clearly, the different algorithms are affected by the network energy trough of each round: the anchor distribution differs with the number of anchor points, and the corresponding anchor access sequence also differs, which affects the cruising time of the mobile BS; thus the number of anchor points fluctuates across rounds. However, the MPHR-RL algorithm generally stays at a high level and achieves the effect of maximizing the number of anchor points. Compared with the other algorithms, MPHR-RL yields the highest number of anchor points visited by the BS. It can be seen that the adaptive wireless routing strategy with a reinforcement learning algorithm proposed in this paper is extremely stable.

The comparison of the number of anchor points in the different algorithm
The results of the BS moving path length comparison under different algorithms are shown in Fig. 4. As can be seen from the figure, the MPHR-RL algorithm proposed in this paper is significantly better than the other three comparison methods: with a network edge length of 800 m and 100 intercepted rounds, the average BS moving path length of the MPHR-RL algorithm is 5.5049 km, while the averages of the E-WA, E-WM, and E-WI methods are 4.2805 km, 4.1161 km, and 3.7060 km, respectively. This result is likely related to the different algorithms' anchor distribution optimization schemes. In the first 100 rounds of network operation, the moving path lengths of MPHR-RL are distributed above 5 km, while those of the other three methods are below 5 km. It can be seen that this paper's anchor access sequence determination scheme effectively optimizes the BS moving paths; at the same time, the fluctuation of MPHR-RL's values is the smallest, which further demonstrates the algorithm's stability.

Different algorithms are compared with the length of the BS mobile path
This experiment compares the latency differences between the algorithms under dynamic network load. The algorithms' delay performance is evaluated before and after the node load changes. The delay comparison under dynamic network load is shown in Fig. 5. Under dynamic load, the load begins to change from iteration zero. At iteration 0, the MPHR-RL algorithm proposed in this paper has the lowest transmission delay of 29.99 ms, while the transmission delays of the three comparison methods E-WA, E-WM, and E-WI are 102.24 ms, 138.09 ms, and 164.34 ms, respectively. The transmission delays of the four methods stabilize at around 40 iterations, which is when their performance under dynamic network load settles. At 40 iterations, the transmission delays of the four methods are 85.51 ms, 219.93 ms, 246.43 ms, and 290.72 ms, respectively. The MPHR-RL algorithm thus reduces the transmission delay by 61.11%, 65.30%, and 70.59% relative to the other three algorithms, respectively. In contrast, the comparison algorithms cannot effectively adjust their strategies to the real-time network state, resulting in large transmission delays. Comparing the fluctuations of transmission delay between the algorithms during load changes shows that the proposed algorithm quickly adapts to a dynamically changing network environment.
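The reported delay reductions can be re-derived from the 40-iteration delays above; recomputing 1 − 85.51/baseline gives approximately 61.12%, 65.30%, and 70.59% (the first differs from the quoted 61.11% only by rounding).

```python
# Transmission delays (ms) at 40 iterations, as reported in the text
delays = {"MPHR-RL": 85.51, "E-WA": 219.93, "E-WM": 246.43, "E-WI": 290.72}

def reduction_pct(ours, baseline):
    """Relative delay reduction of `ours` versus a baseline, in percent."""
    return round((1.0 - ours / baseline) * 100.0, 2)

reductions = {k: reduction_pct(delays["MPHR-RL"], v)
              for k, v in delays.items() if k != "MPHR-RL"}
```
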

The delay of the dynamic network load is the result
The experiment compares the task transmission success rates of the different algorithms under static and dynamic network loads. Under dynamic network load, the experiment counts each algorithm's overall task transmission success rate as the network load changes during iteration. The comparison results are shown in Table 1. Comparing the algorithms under static network load, the success rates of MPHR-RL, E-WA, E-WM, and E-WI are 99.87%, 93.01%, 90.85%, and 89.93%, respectively, under low load. As the network becomes highly loaded, the success rates of the four algorithms become 98.01%, 53.83%, 65.87%, and 74.23%, respectively. Only the MPHR-RL algorithm keeps its transmission success rate basically unchanged, while the success rates of E-WA, E-WM, and E-WI drop by 39.18, 24.98, and 15.70 percentage points, respectively. Under dynamic network load, the MPHR-RL algorithm maintains a high task transmission success rate (98.85%). The other three algorithms, E-WA, E-WM, and E-WI, achieve 73.27%, 76.16%, and 79.78%, which are 25.58, 22.69, and 19.07 percentage points below MPHR-RL, respectively. Combining the results under static and dynamic network loads, the MPHR-RL algorithm can adjust its routing strategy in time through interaction with the environment to achieve stable task transmission, and avoids the drastic drop in task transmission success rate that occurs as the network load grows.
The comparison of the transmission success rate of each algorithm
| Task transmission success rate (%) | Load | MPHR-RL | E-WA | E-WM | E-WI |
|---|---|---|---|---|---|
| Static network load | Low | 99.87 | 93.01 | 90.85 | 89.93 |
| Static network load | Middle | 98.17 | 81.04 | 85.98 | 88.79 |
| Static network load | High | 98.01 | 53.83 | 65.87 | 74.23 |
| Dynamic network load | — | 98.85 | 73.27 | 76.16 | 79.78 |
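The percentage-point drops and gaps quoted in the discussion can be checked directly against the figures in Table 1:

```python
# Task transmission success rates (%) from Table 1
static_low  = {"MPHR-RL": 99.87, "E-WA": 93.01, "E-WM": 90.85, "E-WI": 89.93}
static_high = {"MPHR-RL": 98.01, "E-WA": 53.83, "E-WM": 65.87, "E-WI": 74.23}
dynamic     = {"MPHR-RL": 98.85, "E-WA": 73.27, "E-WM": 76.16, "E-WI": 79.78}

# Drop from low to high static load (percentage points)
drop = {k: round(static_low[k] - static_high[k], 2) for k in static_low}

# Gap to MPHR-RL under dynamic load (percentage points)
gap = {k: round(dynamic["MPHR-RL"] - v, 2) for k, v in dynamic.items()}
```

The computed drops (39.18, 24.98, 15.70) and gaps (25.58, 22.69, 19.07) match the values stated in the text, while MPHR-RL's own drop stays under 2 percentage points.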
In this paper, the adaptive wireless sensor routing strategy based on a reinforcement learning algorithm (MPHR-RL) is analyzed for energy effectiveness in the case of uneven distribution of sensor activity, demonstrating the effectiveness of the MPHR-RL algorithm in wireless sensor networks. The primary conclusions are as follows:
1) The half-death round of the MPHR-RL algorithm (1350 rounds) is larger than that of the comparison algorithms, and at 1400 rounds MPHR-RL still has 221 surviving nodes, which indicates that the MPHR-RL algorithm is outstandingly effective in prolonging the network lifetime.
2) Under the restrictions of a network edge length of 800 m and the mobile BS endurance, the MPHR-RL algorithm reaches a maximum of 100 anchor points in the first 100 intercepted rounds, a minimum of 74, and an average of 90.87, which is significantly higher than the other methods. In addition, the MPHR-RL algorithm has the largest mean BS moving path length (5.5049 km), with the smallest fluctuation and the highest stability.
3) The transmission delays of the four methods basically stabilize at 40 iterations, and the MPHR-RL algorithm has the lowest transmission delay among them, showing that the MPHR-RL algorithm adapts better to dynamic changes in the network environment.
4) Even under extreme changes in network load, the MPHR-RL algorithm remains extremely stable during task transmission, with a task transmission success rate above 98%. This shows that, with reinforcement learning added, the MPHR-RL algorithm can guarantee the task transmission success rate under both static and dynamic network loads.
This research was supported by the University-level Key Research Project of Suzhou University (2023yzd13, 2023yzd15); Provincial General Teaching Research Project of Anhui Province (2022jyxm1603); Suzhou University University-level Innovation Integration Demonstration Course (szxy2023zcsf10); Horizontal Project of Suzhou University (2023xhx141).
