Research on Computational Load Balancing for Massively Parallel Tasks Based on Adaptive Iterative Algorithm

With the development of computing technology, computers have been applied to an increasingly wide range of fields. In addition to the computer automation applications that we often see in our daily life, there are also some very complex fields, such as large-scale scientific computing, massive data information processing, etc., which usually have higher requirements on the processing power of computers, and the commonly used PCs simply can not meet their needs [1-3]. So people must strive to build more powerful computer systems to solve these complex problems. However, the application is endless, after solving a complex problem, more complex problems will emerge, putting higher requirements on computers [4-6].

Throughout the history of computer development, people have been in the constant pursuit of high computer performance. From the history of computer development, it is easy to see that a large number of hardware design solutions have appeared in the past forty to fifty years to improve computer speed. For example, increasing the CPU frequency, using multi-level buffering and instruction prediction, improving the access speed of memory and disk, improving the process to increase integration and reduce power consumption, etc. However, due to the physical characteristics of the hardware itself, such improvements are not only costly, but also bound to go to its limit [7-10]. Meanwhile, with the development of Ethernet, there exists a large amount of free computing resources on the network. Therefore, a more feasible approach is to use load balancing techniques to reschedule the tasks that have been assigned to each node and make the load of each node roughly equal through process migration (also known as task migration), so that the processors of all nodes can execute the tasks consecutively, avoiding the coexistence of idle and busy waits, and effectively improving the resource utilization rate, decreasing the average response time of the tasks, and realizing the execution time minimization goal [11-14]. Its greatest significance lies in its ability to fully utilize the existing idle CPUs in the network, providing computing power far exceeding that of the equivalent serial computers, while at the same time the price is lower than that of the equivalent minicomputers. If run properly, very good results can be obtained [15-17].

Writing parallel programs in a cluster environment is different from programming on a single machine, and the programmer has to deal with a great deal of application-independent drudgery. To make parallel programs run efficiently on clusters, a very critical issue is to ensure that all computers get the right amount of load, avoiding the situation where some computers are overloaded while others are idle [18-20]. In professional terms, this means that load balancing is required. The impact of load balancing on the overall performance of the cluster is very obvious, and exploring reasonable load balancing strategies is very important for improving cluster efficiency and shortening parallel computing time [21-23].

In this paper, the structural model and key techniques of task scheduling in cloud computing environment are first described, after which a mathematical model of task scheduling is obtained in terms of task and resource nodes, resource load and task scheduling time span. An improved adaptive iterative based algorithm (AGAPD) is obtained after processing using techniques such as fitness function, selection, crossover and mutation operators. The convergence of the AGAPD algorithm is then solved based on Markov chain solving for the equilibrium state of a massively parallel task, and the process of solving the parallel test task scheduling problem with the AGAPD algorithm is demonstrated. Finally, the solution time and CPU occupancy of the 2 traditional and this paper’s algorithms are tested by large and small scale use cases, and the performance of the algorithm’s load balancing is analyzed with the help of large scale use cases.

2

AGAPD-based load balancing algorithm for parallel task scheduling

2.1

Task Scheduling Model in Cloud Computing Environment

2.1.1

Structural model of task scheduling

The cloud computing environment based on virtualization technology is transforming the cluster of physical nodes into a cluster of resource nodes, and then the attribute information of the resource nodes is handed over to the data center agent. The users submit their respective tasks to the data center agent through the user interface and the data center agent is responsible for the specific scheduling process of the tasks. Task scheduling in cloud computing environment is to schedule different tasks to the appropriate resource nodes for execution and evaluate the advantages and disadvantages of the scheduling scheme with the overall execution benefits.

2.1.2

Key Technologies for Task Scheduling in Cloud Computing Environment

Tasks in a cloud computing environment are generally categorized into tasks that have dependencies on each other and tasks that are independent of each other. Measurement indexes of resource capacity on resource nodes generally include computing capacity, storage capacity, transmission capacity, etc. To ensure that the overall execution time of the system is minimized and the load of the cloud computing system is balanced, the resource nodes should be described by at least combining the two aspects of computing capacity and load rate, and this paper, in order to simplify the problem, focuses on designing the task scheduling model from the computational capacity of the resource nodes and the load rate. In this paper, in order to simplify the problem, we focus on designing the task scheduling model from the computing power and load rate of the resource nodes, and assume that the tasks to be scheduled are independent meta-tasks, i.e., each task is a nondivisible atomic task.

2.1.3

Mathematical model for task scheduling

1)

Representation Model of Tasks and Resource Nodes

Scheduling n mutually independent meta-tasks to be executed on m resource nodes (n > m), where the set of tasks is denoted by T, $T = {t_{1}, t_{2}, t_{3}, \dots, t_{n}}$ $$T = \left\{ {{t_1},{t_2},{t_3}} \right.,\left. { \cdots ,{t_n}} \right\}$$, and the set of resource nodes is denoted by R, $R = {r_{1}, r_{2}, r_{3}, \dots, r_{m}}$ $$R = \left\{ {{r_1},{r_2},{r_3}, \cdots ,{r_m}} \right\}$$, $t_{j} (j = 1, 2, 3, \dots, n)$ $${t_j}\left( {j = 1,2,3, \cdots ,n} \right)$$ denotes the jth subtask, and r_i(i = 1, 2, 3, ⋯, m) denotes the ith resource node. Each subtask can be executed on only one resource node. The scheduling relationship between task set T and resource node R is represented by a map matrix: (1) $m a p = [\begin{matrix} t r_{11} & t r_{12} & \dots & t r_{1 n} \\ t r_{21} & t r_{22} & \dots & t r_{2 n} \\ \dots \\ t r_{m 1} & t r_{m 2} & \dots & t r_{m n} \end{matrix}]$ $$map = \left[ {\begin{array}{*{20}{c}} {t{r_{11}}}&{ t{r_{12}}}& \cdots &{ t{r_{1n}}} \\ {t{r_{21}}}&{ t{r_{22}}}& \cdots &{ t{r_{2n}}} \\ \cdots &{ }&{ }&{ } \\ {t{r_{m1}}}&{ t{r_{m2}}}& \cdots &{ t{r_{mn}}} \end{array}} \right]$$

Where, map(i, j) denotes the scheduling (mapping) relationship between task j and resource node, map(i, j) ∈ {0, 1}, i = {1, 2, 3, ⋯, m}, j = {1, 2, 3, ⋯, n}, map(i, j) = 1 when it means that task j is scheduled to be executed on resource node i, and map(i, j) = 0 when it means that task j is not assigned to resource node i. 2)

Resource load balancing model

Denote by C_i the computing power of the ith resource node, W_j the expected required computation of task j, p_i the resource load ratio of resource node i, and ω_i the expected total computation required to complete all tasks scheduled to resource node i. Then: (2) $p_{i} = \frac{ω_{i}}{C_{i}} \times 100 %$ $${p_i} = \frac{{{\omega_i}}}{{{C_i}}} \times 100\%$$

can be deduced: (3) $p_{i} = \frac{ω_{i}}{C_{i}} = \frac{\sum W_{j}}{C_{i}}$ $${p_i} = \frac{{{\omega_i}}}{{{C_i}}} = \frac{{\sum {{W_j}} }}{{{C_i}}}$$

j ∈ {set of task numbers scheduled to resource node i}, and denote the cloud computing system resource load factor by P, then: (4) $P = = \frac{\sum_{i - 1}^{n} W_{j}}{\sum_{i = 1}^{m} C_{j}} \times 100 %$ $$P = = \frac{{\sum\limits_{i - 1}^n {{W_j}} }}{{\sum\limits_{i = 1}^m {{C_j}} }} \times 100\%$$

C_i = Num_i × Mips_i, Num_i denote the number of processors in resource node i, and Mips_i denotes the average speed of processors in resource node i.

Define the fitness function for the resource load rate: (5) $f_{p} (I) = \sum_{i = 1}^{m} \frac{1}{| P_{i} - p |} (i \in {1, 2, 3, \dots, m})$ $${f_p}(I) = \sum\limits_{i = 1}^m {\frac{1}{{\left| {{P_i} - p} \right|}}} \left( {i \in \left\{ {1,2,3, \cdots ,m} \right\}} \right)$$

Where p_i is closer to the system equilibrium rate, the larger the value of its fitness function, indicating that the system resource load is closer to the equilibrium level. 3)

Task scheduling time span modeling

Assuming that Etc(i, j) is used to denote the expected execution time of task j on resource node i, then, the mapping scheduling relationship matrix map corresponding to the set of tasks and the set of resource nodes can be constituted as the Etc task execution expected time matrix: (6) $E t c = [\begin{array}{l} E t c_{11} & E t c_{12} & E t c_{13} & \dots & E t c_{1 l} \\ E t c_{21} & E t c_{22} & E t c_{23} & \dots & E t c_{2 n} \\ E t c_{31} & E t c_{32} & E t c_{33} & \dots & E t c_{3 n} \\ \dots \\ E t c_{m 1} & E t c_{m 2} & E t c_{m 3} & \dots & E t c_{m n} \end{array}]$ $$Etc = \left[ {\begin{array}{*{20}{l}} {Et{c_{11}}}&{ Et{c_{12}}}&{ Et{c_{13}}}& \cdots &{ Et{c_{1l}}} \\ {Et{c_{21}}}&{ Et{c_{22}}}&{ Et{c_{23}}}& \cdots &{ Et{c_{2n}}} \\ {Et{c_{31}}}&{ Et{c_{32}}}&{ Et{c_{33}}}& \cdots &{ Et{c_{3n}}} \\ \cdots &{ }&{ }&{ }&{ } \\ {Et{c_{m1}}}&{ Et{c_{m2}}}&{ Et{c_{m3}}}& \cdots &{ Et{c_{mn}}} \end{array}} \right]$$

$M s_{j} (j = 1, 2, 3, \dots, m)$ $$M{s_j}\left( {j = 1,2,3, \cdots ,m} \right)$$ denotes the expected completion time of processing all the assigned tasks on the resource node j, then, based on the matrices map and Etc, Ms_j are defined as follows: (7) $M s_{j} = \sum_{i = 1}^{n} E t c (i, j) \times m a p (i, j)$ $$M{s_j} = \sum\limits_{i = 1}^n {Etc} (i,j) \times map(i,j)$$

Definitions: $M s_{\max} = \max {M s_{j}}$ $$M{s_{\max }} = \max \left\{ {M{s_j}} \right\}$$, $j \in {1, 2, 3, \dots, m}$ $$j \in \left\{ {1,2,3, \cdots ,m} \right\}$$, Since the resource nodes in a cloud computing environment process tasks in parallel, the total completion time of a task depends on the time used by the resource node with the slowest (longest) processing time. Therefore, we define the fitness function of the task processing time as: (8) $f_{M t} (I) = \frac{1}{M s_{\max}}$ $${f_{Mt}}(I) = \frac{1}{{M{s_{\max }}}}$$

The above equation indicates that the lesser the task execution time, the greater the value of adaptation.

2.2

Design of Load Balancing Scheduling Algorithm for Massively Parallel Task Computing

2.2.1

Improvement of adaptive genetic algorithms

1)

Coding and population initialization

Individuals are generated by randomly ordering the genes, and the initialization of the population is completed by generating individuals several times until the number of individuals reaches the population size. The random ordering is to make the individuals as random as possible, so that the initial population is uniformly distributed in the whole solution space.

2)

Design of fitness function

The fitness function [24] varies according to the optimization direction of the specific problem, the goal of parallel test task scheduling research in this paper is to determine the scheduling plan with the shortest total test time, so the total test time is used as a criterion for evaluating the strengths and weaknesses of individuals.

The acceleration ratio f is a function that evaluates the length of time it takes a parallel test to complete the test and is expressed as: (9) $f = \sum_{i = 1}^{m} τ_{i} / t i m e (T^{ρ})$ $$f = \sum\limits_{i = 1}^m {{\tau_i}} /time\left( {{T^\rho }} \right)$$

Where: $t i m e (T^{p})$ $$time\left( {{T^p}} \right)$$ is the time required to complete the test for the scheduling scheme indicated by T^p. f is used as the fitness function, with larger values indicating better individuals. 3)

Selection operator

The elite retention strategy of the selection operation selects a portion of the excellent individuals directly and inherits them to the offspring, solving the problem that the excellent individuals may not be selected or destroyed by the crossover and mutation operations when using the roulette algorithm. The selection probability obtained using the roulette algorithm is: (10) $p_{i} = \frac{f_{i}}{\sum_{k = 1}^{N} f_{k}}$ $${p_i} = \frac{{{f_i}}}{{\sum\limits_{k = 1}^N {{f_k}} }}$$

Where: p_i is the probability that the ind individual is selected, f_i is the fitness value of the ith individual, and N is the population size. 4)

Crossover operator

The crossover operation gets the offspring by crossing part of the gene segments of 2 parents, and inherits part of its own characteristics to the offspring to increase the diversity of the population and improve the global search ability. The two-point crossover method is adopted, and the specific steps are as follows: Step 1 Randomly select the parents to be crossover and the crossover gene fragments. Step 2 Exchange the gene fragments, and use “X” to occupy the gene positions that do not need to be exchanged. Step 3 In order to prevent coding conflicts, delete the genes in parent p₁ from child c₁ to get gene fragment $s_{1} = {3, 7, 10, 8, 4, 1}$ $${s_1} = \left\{ {3,7,10,8,4,1} \right\}$$, and put the genes in s₁ into child c₁ to get new child c₁, and similarly to get new child c₂.

5)

Mutation operator

Mutation operation is used to produce offspring by changing some genes in the parent to increase the diversity of the population and to avoid local optimization to some extent. Reverse-sequence mutation is used, the specific steps are as follows: step 1 randomly select the parent and the mutated gene fragments that need to be mutated. Step 2 Generate the offspring by reverse ordering the gene fragments.

6)

Adaptive crossover and mutation probability

In this paper, the population diversity is expressed in terms of the degree of dissimilarity, and the degree of dissimilarity d_i,j between individuals is defined as: (11) $d_{i, j} = \frac{\sum_{k = 1}^{m} d (x_{i k}, x_{j k})}{m} i \neq j$ $${d_{i,j}} = \frac{{\sum\limits_{k = 1}^m d \left( {{x_{ik}},{x_{jk}}} \right)}}{m}i \ne j$$ (12) $d (x_{i k}, x_{j k}) = {\begin{array}{l} 1 & x_{i k} \neq x_{j k} \\ 0 & x_{i k} = x_{j k} \end{array}$ $$d\left( {{x_{ik}},{x_{jk}}} \right) = \left\{ {\begin{array}{*{20}{l}} 1&{ {x_{ik}} \ne {x_{jk}}} \\ 0&{ {x_{ik}} = {x_{jk}}} \end{array}} \right.$$

Where: d_i,j is the dissimilarity between individuals x_i and x_j; x_ik and x_jk are the kth genes of individuals x_i and x_j, respectively.

The degree of dissimilarity D for the population as a whole is: (13) $D = \frac{\sum_{i = 1}^{N - 1} \sum_{j = i + 1}^{N} d_{i, j}}{C_{N}^{2}}$ $$D = \frac{{\sum\limits_{i = 1}^{N - 1} {\sum\limits_{j = i + 1}^N {{d_{i,j}}} } }}{{C_N^2}}$$ (14) $C_{N}^{2} = \frac{N (N - 1)}{2}$ $$C_N^2 = \frac{{N(N - 1)}}{2}$$

Individuals are categorized into inferior and superior individuals based on the average fitness value of the current population. For inferior individuals, i.e., those with small fitness values, they are defined with crossover and mutation probabilities: (15) $p_{c} = {\begin{array}{l} p_{c \max} - \frac{(p_{a \max} - p_{c \min}) (f' - f_{a v g}) D}{(f_{\max} - f_{a v g}) D_{0} + o} & f' \geq f_{a v g} \\ p_{a \max} & f' < f_{a v g} \end{array}$ $${p_c} = \left\{ {\begin{array}{*{20}{l}} {{p_{c\max }} - \frac{{\left( {{p_{a\max }} - {p_{c\min }}} \right)\left( {f^\prime - {f_{avg}}} \right)D}}{{\left( {{f_{\max }} - {f_{avg}}} \right){D_0} + o}}}&{ f^\prime \geq {f_{avg}}} \\ {{p_{a\max }}}&{ f^\prime < {f_{avg}}} \end{array}} \right.$$ (16) $p_{m} = {\begin{array}{l} p_{\max} - \frac{(p_{m \max} - p_{m \min}) (f - f_{a v g}) D}{(f_{\max} - f_{a v g}) D_{0} + o} & f \geq f_{a v g} \\ p_{\max} & f < f_{a v g} \end{array}$ $${p_m} = \left\{ {\begin{array}{*{20}{l}} {{p_{\max }} - \frac{{\left( {{p_{m\max }} - {p_{m\min }}} \right)\left( {f - {f_{avg}}} \right)D}}{{\left( {{f_{\max }} - {f_{avg}}} \right){D_0} + o}}}&{ f \geq {f_{avg}}} \\ {{p_{\max }}}&{ f < {f_{avg}}} \end{array}} \right.$$

where: p_c is the crossover probability; p_cmax is the initialized maximum crossover probability; p_cmin is the initialized minimum crossover probability; D₀ is the initial population dissimilarity; f′ is the value of the fitness of the larger parent individual in the crossover operation; f_avg is the average fitness value of the population; p_m is the probability of variance; p_mmax is the initialized maximum probability of variance; p_mmin is the initialized minimum probability of variance; f is the value of fitness of the variant individual; and o is the equivariate infinitesimal that Prevent the denominator from being 0.

According to equations (15) and (16), the change images of self-adaptation crossover [25-26] and mutation probability are plotted as shown in Figures 1 and 2. It can be seen that for individuals whose fitness value is smaller than the average fitness value, their crossover and variation probabilities are maximized; for individuals whose fitness value is larger than the average fitness value, the crossover and variation probabilities are determined based on their fitness value and the ratio of the current population dissimilarity D to the initial population dissimilarity D₀.

During the iteration process, when the diversity of the population is high, the population dissimilarity D is similar to the initial population dissimilarity D₀, i.e: (17) $p_{c} = {\begin{array}{l} p_{c \max} - \frac{(p_{c \max} - p_{c \min}) (f' - f_{a v g})}{(f_{\max} - f_{a v g}) + o} & f^{'} \geq f_{a v g} \\ p_{c \max} & f' < f_{a v g} \end{array}$ $${p_c} = \left\{ {\begin{array}{*{20}{l}} {{p_{c\max }} - \frac{{\left( {{p_{c\max }} - {p_{c\min }}} \right)\left( {f^\prime - {f_{avg}}} \right)}}{{\left( {{f_{\max }} - {f_{avg}}} \right) + o}}}&{ f' \geq {f_{avg}}} \\ {{p_{c\max }}}&{ f^\prime < {f_{avg}}} \end{array}} \right.$$ (18) $p_{m} = {\begin{array}{l} p_{m \max} - \frac{(p_{\max} - p_{m \min}) (f - f_{a v g})}{(f_{\max} - f_{a v g}) + o} & f \geq f_{a v g} \\ p_{m \max} & f < f_{a v g} \end{array}$ $${p_m} = \left\{ {\begin{array}{*{20}{l}} {{p_{m\max }} - \frac{{\left( {{p_{\max }} - {p_{m\min }}} \right)\left( {f - {f_{avg}}} \right)}}{{\left( {{f_{\max }} - {f_{avg}}} \right) + o}}}&{ f \geq {f_{avg}}} \\ {{p_{m\max }}}&{ f < {f_{avg}}} \end{array}} \right.$$

When the diversity of the population is low and the population dissimilarity D is much smaller than the initial population dissimilarity D₀, then there is: (19) $p_{c} = p_{c \max}$ $${p_c} = {p_{c\max }}$$ (20) $p_{m} = p_{m \max}$ $${p_m} = {p_{m\max }}$$

The initial population is randomly distributed throughout the solution space, when the population is most diverse and most dissimilar.

2.2.2

AGAPD algorithm flow

The flow of solving the parallel test task scheduling problem based on AGAPD algorithm is shown in Fig. 3. The basic steps are as follows:

Step 1: Initialize the algorithm parameters. Obtain the test task information.

Step 2: Use integer coding to encode the test tasks and generate individuals in a randomized order.

Step 3: Calculate the dissimilarity of the initial population based on the population obtained from initialization.

Step 4: Calculate the fitness value of each individual.

Step 5: Determine whether the termination condition is satisfied, here the termination condition is whether the number of iterations reaches the maximum number of iterations, if yes then execute step 9, otherwise execute step 6.

Step 6: Use a roulette algorithm with an elite retention strategy to perform a selection operation.

Step 7: First, calculate the dissimilarity degree of the current population; then, calculate the current crossover probability of the population; finally, carry out the crossover operation.

Step 8: First, the dissimilarity degree of the current population is calculated; then, the current mutation probability of the population is calculated; finally, the mutation operation is performed and step 4 is executed.

Step 9: According to the individual fitness value, the individual with the largest fitness value is output as the result of task scheduling.

3

Load balancing performance analysis for massively parallel task computing

3.1

Experimental validation analysis

In this paper, the performance of the algorithms is compared through small-scale and large-scale use cases, respectively. The small-scale case is used to test the algorithm’s ability to handle general test task scheduling, while the large-scale case is used to explore the algorithm’s extreme performance. The test task completion time and CPU utilization rate are used as the evaluation indexes of the algorithms.

3.1.1

Small-scale use case testing

The small-scale test case 1 is shown in Table 1, the scale: the number of test tasks is 7, the number of resources is 7. The algorithms in the table are solved by calling the integer planning exact algorithm (method 1), the genetic algorithm (method 2), and the AGAPD algorithm (method 3) in this paper, respectively. 3 algorithms can find the optimal solution of the case, and the difference of their solving efficiencies is not big.

Table 1.

Small-scale test case 1

Test number	R₁	R₂	R₃	R₄	R₅	R₆	R₇	R_i
Resource number	R₁	R₂	R₃	R₄	R₅	R₆	R₇	R_i
1	1	1	1	0	0	0	0	6
2	0	0	1	0	0	0	0	15
3	0	1	0	1	0	0	0	5
4	0	0	0	0	1	0	0	4
5	1	0	0	0	1	0	0	22
6	0	1	1	0	0	1	0	7
7	0	0	0	0	0	0	1	43
8	0	1	0	1	0	0	0	14

The solution results of the 3 algorithms are shown in Fig. 4.

The solution time and CPU occupancy of the three algorithms are shown in Fig. 5. It can be seen that the solving time of Method 1, Method 2 and Method 3 algorithms are 1.67s, 0.85s and 0.56s respectively, which is a big difference, and their CPU occupancy are 55%, 68% and 40% respectively, compared with Method 3, which has the lowest solving time and CPU occupancy, and has the best effect.

3.1.2

Large-scale use case testing

Large-scale example: the number of test tasks is 150, and the number of resources is 15. The results of the large-scale case test are shown in Fig. 6. When the test scale is enlarged to a certain degree, the integer planning exact algorithm and the genetic algorithm can barely accomplish the task, and their solution times reach 5944s and 3782s respectively, and their CPU occupancy rates are as high as 93.5% and 84.6% respectively.

In contrast, the AGADP algorithm (Adaptive Genetic Algorithm) proposed in this paper can search for the optimal solution in only 357 seconds, and its CPU occupancy rate is only 49.3% at this time. Thus, compared with the traditional heuristic algorithm, the AGADP algorithm proposed in this paper has a great advantage in large-scale testing tasks.

3.2

Load balancing performance result analysis of AGAPD algorithm

Figure 7 shows the input task statistics of this experiment, the system randomly generates 2000 tasks between 150 and 650 as test input. In this paper, the generated dataset in the simulated environment is used uniformly, so that although it does not guarantee that the AGAPD algorithm has the same performance in the real environment, it can ensure comparability under the premise of only considering the factor of computational performance, and lay the foundation for more in-depth research in the future.

3.2.1

Convergence of the degree of load imbalance

The convergence of the algorithm needs to be examined during the experiments in this subsection, i.e., whether the load equalization metrics show a decreasing trend as the number of iterations grows, and the comparison of the convergence speeds at different diffusion radii. The convergence of the degree of load imbalance is shown in Figure 8. The stability of the algorithm needs to be improved, especially by the time the system is about to end, as many processes are already in idle state, and the remaining tasks of the processes that have not yet completed the computation tasks are not much, so the tasks can only be appropriately relocated, but still can not change the situation of the load imbalance at this time.

Algorithm in the early stage is a steady decline and tend to stabilize the state, the difference is that the larger the diffusion radius (such as diffusion radius of 5), the sooner the system tends to stabilize the state. At this point, because the processes are in a state of information gap in the early stage, so in the first few iterations, the radius of information diffusion determines how quickly or slowly the system will make the decision of load balancing.

By the middle of the process, most of the processes in the system have gained information about other processes, the difference is only whether this information is up-to-date or not, but this does not affect the execution of the optimization algorithm. Because its design is adapted to the asynchronous update state, the system is at a steady state at this point.

At a later stage, on the basis of the original load distribution, not much balancing is needed, just let them each complete the remaining tasks, but the evaluation index will therefore tend to increase, considering that the effect of migration at this time is not very large, so the increase in the degree of load imbalance during this period can be ignored.

3.2.2

Information diffusion protocol efficiency

The experiments in this subsection focus on testing the propagation efficiency of the message diffusion protocol in terms of the number of diffusion times required to diffuse a message to 99% coverage in heterogeneous cluster systems of different sizes and with different diffusion radii. The number of diffusion times required for a message to diffuse to 99% of the processes is shown in Figure 9. It can be seen that as the system size increases, the number of diffusion times shows an extremely slow growth trend, which indicates that the message diffusion protocol used in the AGAPD algorithm has good scalability, and the number of diffusion times stays around a stable value no matter how the system grows. A larger diffusion radius will require fewer diffusions, as there are more objects exchanging information in each round, so the number of diffusions is relatively small to achieve 99% coverage.

3.2.3

Overhead of Dynamic Load Balancing

For the load balancing algorithms in the simulation experiments, the measurement of their overhead is relatively simple, i.e., the amount of tasks for which migration occurs; in fact, a more accurate measurement would also need to include the information sent and received by the information diffusion, but since the information diffusion of the AGAPD algorithm is designed to be asynchronous, and the delay in its communication does not have much impact on the final result, it is not included in the measurement. The additional overhead of the different algorithms at different system sizes is shown in Figure 10. The additional overhead of the AGAPD algorithm has been kept very low (0.18), while the rest of the algorithms have more system overhead. This may be related to the fact that the AGAPD algorithm fully takes into account the possibility of repeated task migrations and learns the structure of the probabilistic graph by reorganizing the system into a bipartite graph and a neutral set of nodes, thus reducing unnecessary task migrations.

3.2.4

Performance Improvement for Parallel Computing

The purpose of the experiments in this subsection is to test the speedup ratio improvement that the AGAPD algorithm brings to the application compared to other algorithms under different sizes of heterogeneous cluster systems. The speedup ratios of different algorithms at different system sizes are shown in Fig. 11. The results show that the acceleration ratio increases with the increase of system size, and on the whole, AGAPD algorithm brings better acceleration to the system than other algorithms, which shows a gradual incremental trend with the growth of the number of processes. This shows that a good load balancing algorithm can indeed bring efficiency improvement to parallel computing applications. The rest of the algorithms have a mediocre performance, the reason is that the scope of the collection of load information is limited to the processes to understand the system, although the local load balancing but the global effect is not ideal.

4

Conclusion

To solve the efficiency bottleneck caused by dynamic load fluctuations in massively parallel computing, this study proposes a task scheduling algorithm based on the improved adaptive iterative algorithm (AGAPD). This algorithm is used to compare with the traditional algorithm, and the application performance of the model in this paper is verified through experiments. The results show that the method proposed in this paper saves 90.56%-93.99% of time and reduces 41.73%-47.27% of hardware resources compared to the traditional solver, which improves the efficiency of solving this type of problems. Under heterogeneous systems it is more likely to lead to load imbalance and information about computational performance is crucial for load balancing algorithms, so it is necessary to notify the rest of the processes in a timely manner. The AGAPD algorithm ensures enough information while reducing the communication overhead.

Language:: English

Publication timeframe:: 1 times per year
Journal Subjects:: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Journal RSS Feed

Research on Computational Load Balancing for Massively Parallel Tasks Based on Adaptive Iterative Algorithm

Wenwen Huang

Xiaofang Hu

Fengfei Yang

Published Online: Sep 29, 2025

Received: Feb 06, 2025

Accepted: May 10, 2025

DOI: https://doi.org/10.2478/amns-2025-1104

KeywordsAdaptive Iterative Algorithm, Task Scheduling Model, AGAPD Algorithm, Load Balancing

© 2025 Wenwen Huang, Xiaofang Hu and Fengfei Yang, published by Sciendo.

This work is licensed under the Creative Commons Attribution 4.0 International License.

Keywords
Adaptive Iterative Algorithm, Task Scheduling Model, AGAPD Algorithm, Load Balancing