Modular design and algorithmic complexity optimization for large-scale software systems
Published: 17 Mar 2025
Received: 08 Nov 2024
Accepted: 14 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0228
© 2025 Tiande Pan, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Software modularization is a design approach that improves the maintainability, testability and scalability of software systems. By adopting top-down decomposition, bottom-up construction, hierarchical design, controlled module coupling, the single responsibility principle and well-designed interfaces [1-4], a software system with a clear structure and independent modules can be obtained. These methods help developers develop, test and maintain software systems more efficiently and improve the quality of software development [5-7].
In software development, program performance is a critical consideration. Performance optimization reduces time and space complexity and improves a program's responsiveness and resource utilization. The algorithm is the core of a program, and its optimization has the most significant impact on performance [8-11]. Sound selection and design of algorithms can greatly reduce time and space complexity. In computer science, algorithm complexity refers to the order of magnitude of operations an algorithm requires to solve a problem. As the problem size grows, so does the algorithm's cost, leading to slower execution and higher consumption of system resources [12-15]. Algorithm complexity is an important index of an algorithm's effectiveness: execution efficiency directly determines program speed, and it is essential knowledge for programmers. Algorithm complexity optimization is therefore an important means of improving program efficiency [16-19].
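As a minimal illustration (not from the paper) of how algorithm choice determines complexity, the following Python sketch contrasts an $O(n)$ linear scan with an $O(\log n)$ binary search for the same membership query; the function names are hypothetical:

```python
import bisect

def contains_linear(items, target):
    """O(n): scan every element until the target is found."""
    for x in items:
        if x == target:
            return True
    return False

def contains_binary(sorted_items, target):
    """O(log n): repeatedly halve the search interval of a sorted list."""
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

data = list(range(1_000_000))
# Both answer the same question; the second touches ~20 elements, not a million.
assert contains_linear(data, 999_999) == contains_binary(data, 999_999) == True
```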
In this paper, the research starts from the analysis and transformation of user requirements. An undirected weighted network is used to build a user demand model and analyze the interactions between demands. An expert scoring method based on triangular fuzzy numbers determines the degree of influence between nodes, which serves as the edge weights of the network. A K-means clustering algorithm is then introduced on top of the weighted network: user demand weights are obtained by normalizing the comprehensive eigenvalues of the demand nodes, realizing the clustering of user demands. Based on QFD, the fuzzy user requirements are summarized, classified and ranked by importance, and transformed into technical requirements that engineers and technicians can understand. A particle swarm optimization algorithm based on probabilistic selection (PSPSO) is used for the modular design of large-scale software systems. Experiments on theoretical data and real open-source projects compare the modular design results with those of the hMetis algorithm, highlighting the validity and reasonableness of the proposed algorithm. Finally, the algorithm's complexity is analyzed through its time and space complexity, the relationship between complexity, problem size and population size is explored, and complexity optimization methods combining dynamic programming, greedy algorithms and divide-and-conquer are given.
The design of large-scale software systems in intelligent software engineering requires knowing users' demands on the system in advance, collating and analyzing their functional requirements, and then pre-setting the system's common parts and module parts for users to choose from. Requirement acquisition is of great value for the design and development of large-scale software systems, and fully considering user demand in the early design stage has become a key factor in whether such systems can be competitive in the market.
Since users' demands on a system are often based on perceptual understanding, with ambiguity and uncertainty, users are unlikely to have a comprehensive, detailed grasp of the demanded system's structure, functions, technical parameters and other aspects. When describing their customization requirements, they generally give conceptual descriptions based on an intuitive understanding of the system, or merely state qualitative requirements on system functions. Moreover, because the natural language used to describe requirements is inherently fuzzy, requirements cannot be described precisely and quantitatively enough to be transformed directly into actual technical and functional parameters.
Users' needs change and expand continuously with economic development, environmental change and evolving ideas. On the one hand, as the functions of large-scale systems on the market diversify, users' original needs change accordingly. On the other hand, needs also shift during system design: for example, a user who lacked a comprehensive understanding of the system's attributes and features in the early stages may discover, as design proceeds, contradictions between the design and their needs, or find that the design requirements are not met, which then requires analysis and appropriate modification and adjustment.
In many cases users do not express their needs completely and clearly, or leave them implicit in the requirements they propose. On the one hand, users may regard some system requirements as unnecessary. On the other hand, since users do not fully understand the system, their requests may not be accurate enough. If users' needs are not carefully identified and the design simply follows the stated requirements, it will obviously fail to meet what users actually need.
To obtain user requirements in the pre-design stage, the main acquisition methods include: market research, the most basic method, which can be divided into inquiry surveys and observation surveys; network media acquisition, including online questionnaires and the collection of web-related data; and resource sharing with partner enterprises, collecting customer information through channel cooperation and achieving win-win cooperation through resource sharing. In intelligent software engineering, with deepening informatization and intelligence, the ways to acquire user requirements keep multiplying, and demand information can now be acquired in real time, accurately and conveniently.
With the rapid expansion of big data, it is increasingly important for the modular design of large-scale software systems; the design process needs big-data thinking to extract, store and analyze relevant external data. The user-system database accumulated during system design can also be organized and analyzed, so that user information, system information and behavioral data are effectively mined to obtain users' demand information.
A weighted network is an important form of complex network that indicates the closeness of connections between entity nodes by assigning weights to edges. Considering the mutual influence between user demands, this paper adopts an undirected weighted network to establish the user demand model [20]. User requirements are abstracted as nodes, and the degree of connection between requirements as weighted edges between nodes, so that the interactions between user requirements can be analyzed. Complex networks are usually characterized by statistical parameters such as node degree, node strength, average path length and clustering coefficient; the features commonly used in weighted networks to describe the local density of connections between nodes are as follows.
Degree is an important statistical indicator possessed by every node in a network; it is the number of edges connected to the current node, usually denoted as $k_i$ for node $i$.
Given a network $G=(V,E)$ with $N$ nodes, the average degree of the network is the mean of all node degrees, $\langle k \rangle = \frac{1}{N}\sum_{i=1}^{N} k_i$.
In a directed network, a node's degree consists of two parts: out-degree and in-degree. Out-degree is the number of edges from the current node pointing to other nodes, and in-degree, conversely, is the number of edges from other nodes pointing to the current node. Notably, in a directed network the average out-degree and the average in-degree are always equal, since every edge contributes exactly one to each.
Whether a network is sparse or dense is usually measured by its density. For a network with $N$ nodes and $M$ actual edges, the density is the ratio of $M$ to the maximum possible number of edges:
$$\rho = \frac{2M}{N(N-1)} \qquad (3)$$
As equation (3) shows, the "maximum possible number of edges" is the number of edges obtained by connecting every pair of nodes in the network. When the density is particularly small, i.e., the number of actual edges is far less than the number of possible edges, the network is sparse; otherwise it is dense. Admittedly, the exact standard for "far less than" is still debated in academia; it is generally held that a graph is sparse when its number of edges is far smaller than the square of its number of nodes.
Before discussing the average path length, one question must first be clarified: what is the distance between two points? In a complex network there are usually numerous paths between two nodes; among them, the shortest path is the one with the fewest edges, and its length is called the distance between the two points.
On this basis, the average path length is defined as the average shortest distance between all pairs of nodes in the network, usually denoted by $L$:
$$L = \frac{2}{N(N-1)}\sum_{i<j} d_{ij},$$
where $d_{ij}$ is the distance between nodes $i$ and $j$.
Note that the above definition applies to undirected unweighted networks, and in weighted networks, the length of the path with the smallest sum of weights is usually regarded as the distance between two points, which is quite different from the concept of the minimum number of edges in an undirected unweighted network.
Network diameter is another statistical measure of network size, usually denoted $D$ and defined as the maximum distance between any pair of nodes: $D = \max_{i,j} d_{ij}$.
However, real complex networks are usually not fully connected, and it is clearly unrealistic to compute the diameter by the definition above. Academics therefore usually characterize the diameter in another way, via the "effective diameter". Since a large complex network consists of partially connected components and some isolated nodes, the diameter can be redefined through the number of connected node pairs whose distance does not exceed a given value: given that there are $P(d)$ connected node pairs at distance at most $d$, out of $P$ connected pairs in total, the effective diameter is the smallest $d$ for which $P(d)/P$ reaches a prescribed fraction. Typically, if 90% of the connected node pairs are within distance $d$ of each other, $d$ is taken as the effective diameter of the network.
The clustering coefficient measures the degree of node clustering in a graph. Node clustering, in short, means that the neighbors of one's neighbors are still one's neighbors, in which case these three nodes exhibit obvious clustering. It is defined as follows: if the current node $i$ has $k_i$ neighbors and there are $E_i$ edges actually present among those neighbors, the clustering coefficient of node $i$ is
$$C_i = \frac{2E_i}{k_i(k_i-1)}.$$
If $C_i = 1$, the neighbors of node $i$ are all connected to one another; if $C_i = 0$, no edges exist among them. Averaging over all nodes gives the clustering coefficient of the whole network, $C = \frac{1}{N}\sum_{i=1}^{N} C_i$.
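To make these statistics concrete, here is a small hedged sketch using the `networkx` library on a hypothetical demand network; the node names and edge weights are illustrative, not the paper's data:

```python
import networkx as nx

# Hypothetical demand network: nodes are user requirements, edge weights are
# the defuzzified influence degrees between them (illustrative values only).
G = nx.Graph()
G.add_weighted_edges_from([
    ("R1", "R2", 0.09), ("R1", "R3", 0.93), ("R1", "R4", 0.58),
    ("R2", "R4", 0.04), ("R3", "R4", 0.37),
])

degree = dict(G.degree())                  # number of edges per node
strength = dict(G.degree(weight="weight")) # weighted degree (node strength)
density = nx.density(G)                    # 2M / (N(N-1))
# In a weighted network, distance is the path with the smallest weight sum.
avg_path = nx.average_shortest_path_length(G, weight="weight")
diameter = nx.diameter(G)                  # longest shortest path (in hops)
clustering = nx.clustering(G)              # per-node clustering coefficient

print(degree, round(density, 2), round(avg_path, 2), diameter)
```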
In a weighted network the degree of influence differs between node pairs, and existing work does not specify how to determine it. This paper uses an expert scoring method based on triangular fuzzy numbers to determine the degree of influence between nodes, which serves as the weight of the connecting edge: the linguistic evaluations given by experts are converted into triangular fuzzy numbers by the rough-set method, and the triangular fuzzy numbers are then converted into crisp values.
When using the triangular fuzzy number $\tilde{A} = (l, m, u)$ to express the experts' linguistic evaluations, the correspondence between semantic variables and triangular fuzzy numbers is shown in Table 1.
Table 1: Semantic variables based on triangular fuzzy numbers
Semantic variable | $l$ | $m$ | $u$
---|---|---|---
Very weak influence | 0 | 0 | 0.1 |
Weak influence | 0 | 0.1 | 0.3 |
General influence | 0.3 | 0.5 | 0.7 |
Strong influence | 0.7 | 0.9 | 1.0 |
Very strong influence | 0.9 | 1.0 | 1.0 |
In the table, $l$ and $u$ are the lower and upper bounds of the fuzzy evaluation, and $m$ is its most probable value.
Based on the above analysis, a questionnaire is distributed so that the experts rate the degree of influence between nodes in fuzzy semantic form, based on node characteristics, attributes and other indicators; each rating is converted to the corresponding triangular fuzzy number of the form $\tilde{a}_{ij} = (l_{ij}, m_{ij}, u_{ij})$, denoting the influence of node $i$ on node $j$.
The scoring data of the $K$ experts are combined into one value by averaging:
$$\tilde{a}_{ij} = \frac{1}{K}\sum_{k=1}^{K}\tilde{a}_{ij}^{(k)} = \left(\frac{1}{K}\sum_{k=1}^{K} l_{ij}^{(k)},\; \frac{1}{K}\sum_{k=1}^{K} m_{ij}^{(k)},\; \frac{1}{K}\sum_{k=1}^{K} u_{ij}^{(k)}\right)$$
The degree of influence between nodes is then determined by defuzzification. In this paper, the strength of the influence is determined via the expected value of the fuzzy number.
First, the membership function of the triangular fuzzy number $\tilde{A} = (l, m, u)$ is
$$\mu_{\tilde{A}}(x)=\begin{cases}\dfrac{x-l}{m-l}, & l \le x \le m,\\[4pt] \dfrac{u-x}{u-m}, & m < x \le u,\\[4pt] 0, & \text{otherwise.}\end{cases}$$
Its left and right membership functions are therefore $\mu_L(x) = \frac{x-l}{m-l}$ on $[l, m]$ and $\mu_R(x) = \frac{u-x}{u-m}$ on $[m, u]$.
The corresponding inverse functions are $\mu_L^{-1}(y) = l + (m-l)y$ and $\mu_R^{-1}(y) = u - (u-m)y$ for $y \in [0,1]$.
Therefore, the expectation of the triangular fuzzy number is
$$E(\tilde{A}) = \frac{1}{2}\int_0^1\left[\mu_L^{-1}(y) + \mu_R^{-1}(y)\right]\mathrm{d}y = \frac{l + 2m + u}{4},$$
which is taken as the crisp degree of influence between the two nodes.
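A minimal sketch of the aggregation and defuzzification steps just derived, assuming the averaging rule above and the expectation $(l + 2m + u)/4$; the expert scores are made up for illustration:

```python
def average_tfn(scores):
    """Combine several experts' triangular fuzzy numbers (l, m, u) by averaging."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

def defuzzify(tfn):
    """Expected value of a TFN via its inverse membership functions: (l + 2m + u) / 4."""
    l, m, u = tfn
    return (l + 2 * m + u) / 4

# Three hypothetical expert ratings of the influence between two demand nodes.
experts = [(0.7, 0.9, 1.0), (0.3, 0.5, 0.7), (0.7, 0.9, 1.0)]
combined = average_tfn(experts)          # one aggregated fuzzy number
print(round(defuzzify(combined), 2))     # single crisp influence degree
```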
Through cluster analysis of user requirements, complex requirements can be simplified and organized into several categories, each containing its own detailed requirements, with low similarity between categories and high similarity within a category. This helps carry out system design accurately, reduces development cost and improves the system's competitiveness. In this paper, the K-means clustering algorithm is introduced on top of the weighted network model: user demand weights are obtained by normalizing the comprehensive eigenvalues of the demand nodes, the number of clusters is determined with the elbow method, the initial cluster centers are chosen according to the magnitude of the demand weights, and the clustering of user demands is then carried out [21].
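The following sketch shows one plausible realization of this weighted K-means/elbow procedure with scikit-learn; the feature matrix, the use of `sample_weight` to carry the demand weights, and the final $K$ are assumptions for illustration only:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical inputs: one row per demand node, columns are its network
# eigenvalues (degree, strength, average path length, ...); the demand
# weight is the normalized comprehensive eigenvalue of the node.
rng = np.random.default_rng(0)
X = rng.random((20, 5))
weights = X.sum(axis=1) / X.sum()          # normalized composite eigenvalues

# Elbow method: tabulate the within-cluster sum of squares (inertia)
# against K and pick the K at the inflection point of the curve.
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X, sample_weight=weights)
    print(k, round(km.inertia_, 3))

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    X, sample_weight=weights)              # K read off the elbow plot
```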
The system design house of quality is the core of quality function deployment (QFD) [22]. Its role is to convert fuzzy user requirements into technical requirements that engineers and technicians can understand, and then decompose them, waterfall-style, into system part characteristics, development characteristics and quality characteristics; it is an effective tool for user requirement transformation. The complete system design house of quality is shown in Fig. 1. Its input, the user requirements, is a hierarchical list of the discrete, fuzzy user requirements collected from market research, summarized, classified and ranked by importance, which is the key to implementing QFD.

Figure 1: Structure of the system design house of quality
In recent years, as artificial intelligence technology has matured, the design of large-scale software systems has become increasingly intelligent, bringing users great convenience. In 2021, despite a general downturn in the consumer market, survey data showed that household appliances built on large-scale software systems still grew, with the self-cleaning technology of sweeping robots occupying the core position in the cleaning-appliance market. Compared with the past, technical innovation has greatly improved and upgraded every function of the sweeping robot: in navigation and obstacle avoidance, a combination of technologies has solved common problems of earlier sweeping robots such as "chaotic collision" and "missed sweeping"; modular design and other AI-based large-scale software system design techniques have improved the robot's intelligent cleaning ability and route-planning ability, reducing the burden of manual labor; and mopping, water tank capacity and other aspects have also improved markedly. Combining previous industry reports on sweeping robots with consultation of relevant experts and industry personnel, numerous demands were organized; the detailed user demands are shown in Table 2.
Table 2: User demands
Demand number | Specific demand | Demand number | Specific demand
---|---|---|---
1 | Strong cleaning ability | 11 | Voice control
2 | Monitoring failure information | 12 | Time booking
3 | Clean speed | 13 | Small running sound
4 | Anti-fall collision | 14 | Lightness
5 | Drag integration | 15 | Simple operation
6 | Mobile control | 16 | Dump dust box
7 | Automatic recharge | 17 | Automatic cleaning mop
8 | Standby time | 18 | Automatic add cleaner
9 | Fault prediction | 19 | Large dust box
10 | Price appropriate | 20 | Maintenance convenience
For the determined user requirements, triangular fuzzy numbers are first used to turn semantic evaluations into interval values, and the crisp values of the relationships between requirements are then obtained by defuzzification. To ensure the accuracy and objectivity of the user demand relationships, users are invited to take part in the evaluation in addition to professional researchers. In this paper, 10 experts and 10 users form an evaluation group that scores the semantic evaluation of the interrelationships between demands from different perspectives; the comprehensive triangular-fuzzy-number evaluation values are shown in Table 3.
Table 3: Comprehensive triangular-fuzzy-number evaluation values of user demands
Demand number | 1 | 2 | 3 | 4 | ⋯ | 17 | 18 | 19 | 20
---|---|---|---|---|---|---|---|---|---
1 | 1 | (0,0.03,0.12) | (0.78,0.93,1) | (0.32,0.58,0.74) | ⋯ | 0 | 0 | 0 | 0
2 | | 1 | (0,0.08,0.17) | (0,0.04,0.19) | ⋯ | 0 | 0 | (0.9,1.0,1.0) | (0.28,0.47,0.63)
3 | | | 1 | (0.16,0.34,0.58) | ⋯ | (0,0.09,0.26) | 0 | (0,0,0.07) | 0
4 | | | | 1 | ⋯ | (0.6,0.81,0.93) | (0,0.15,0.33) | 0 | 0
⋯ | | | | | 1 | ⋯ | ⋯ | ⋯ | ⋯
17 | | | | | ⋯ | 1 | (0,0.05,0.17) | 0 | 0
18 | | | | | ⋯ | | 1 | 0 | 0
19 | | | | | ⋯ | | | 1 | (0.54,0.71,0.82)
20 | | | | | ⋯ | | | | 1
The comprehensive triangular-fuzzy-number evaluation values given by the expert group are defuzzified according to the formula above to obtain the corresponding crisp values; the defuzzified comprehensive evaluation values of the user demand relationships are shown in Table 4. Each value represents the degree of interaction between two user demands: the larger the value, the more strongly the two demands influence each other, while a value of 0 means the two demands do not affect each other and are mutually independent.
Table 4: Comprehensive evaluation values of user demand relationships
Demand number | 1 | 2 | 3 | 4 | ⋯ | 17 | 18 | 19 | 20
---|---|---|---|---|---|---|---|---|---
1 | 1 | 0.09 | 0.93 | 0.58 | ⋯ | 0 | 0 | 0 | 0
2 | | 1 | 0.02 | 0.04 | ⋯ | 0 | 0 | 0.99 | 0.47
3 | | | 1 | 0.37 | ⋯ | 0.06 | 0 | 0.05 | 0
4 | | | | 1 | ⋯ | 0.79 | 0.17 | 0 | 0
⋯ | | | | | 1 | ⋯ | ⋯ | ⋯ | ⋯
17 | | | | | ⋯ | 1 | 0.08 | 0 | 0
18 | | | | | ⋯ | | 1 | 0 | 0
19 | | | | | ⋯ | | | 1 | 0.73
20 | | | | | ⋯ | | | | 1
Using the formulas for the attribute eigenvalues of the weighted network, the parameter eigenvalues of each demand node are computed separately, and the user demand weights are obtained by normalizing the comprehensive eigenvalues; the results are shown in Table 5. By building the demand network model, the mutual influence between user demands is taken into account and more accurate demand weights are obtained, laying the foundation for demand clustering and demand transformation.
Table 5: User demand network node eigenvalues
Demand node | Degree | Network sparseness | Average path length | Network diameter | Clustering coefficient | Weight
---|---|---|---|---|---|---
1 | 12 | 3.51 | 39.08 | 0.80 | 0.55 | 0.0658
2 | 10 | 5.21 | 45.65 | 0.91 | 0.50 | 0.0628
3 | 15 | 5.91 | 50.16 | 0.81 | 0.52 | 0.0637
4 | 12 | 5.24 | 45.98 | 0.86 | 0.54 | 0.0661
5 | 16 | 3.83 | 49.13 | 0.71 | 0.43 | 0.0537
6 | 10 | 4.17 | 22.49 | 0.83 | 0.50 | 0.0577
7 | 8 | 3.13 | 36.80 | 0.83 | 0.40 | 0.0550
8 | 9 | 3.62 | 43.00 | 0.75 | 0.48 | 0.0555
9 | 9 | 4.93 | 39.93 | 0.58 | 0.37 | 0.0499
10 | 17 | 7.08 | 53.58 | 0.61 | 0.40 | 0.0531
11 | 9 | 3.31 | 35.27 | 0.66 | 0.46 | 0.0550
12 | 9 | 2.86 | 38.62 | 0.64 | 0.40 | 0.0534
13 | 8 | 1.30 | 35.54 | 0.79 | 0.41 | 0.0476
14 | 6 | 1.59 | 25.22 | 0.68 | 0.37 | 0.0419
15 | 5 | 2.01 | 22.94 | 0.49 | 0.33 | 0.0396
16 | 5 | 2.21 | 22.80 | 0.69 | 0.39 | 0.0448
17 | 4 | 1.47 | 16.64 | 0.69 | 0.37 | 0.0439
18 | 3 | 1.31 | 15.05 | 0.61 | 0.37 | 0.0400
19 | 5 | 1.72 | 12.73 | 0.61 | 0.31 | 0.0457
20 | 8 | 2.67 | 29.35 | 0.61 | 0.34 | 0.0048
As can be seen from Table 5, the demand nodes ranked highest by weight are nodes 4, 1, 3 and 2, whose normalized comprehensive eigenvalues are 0.0661, 0.0658, 0.0637 and 0.0628 respectively, indicating that these demands interact most strongly with the rest of the demand network and deserve priority attention in the design.
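As a hedged sketch of how such weights could be computed, the snippet below min-max normalizes the node eigenvalues of Table 5 and averages them into a comprehensive eigenvalue before normalizing across nodes; the equal weighting of the features is an assumption, since the paper does not spell the combination out:

```python
import numpy as np

# First three demand nodes of Table 5; columns follow the table
# (degree, sparseness, average path length, diameter, clustering).
features = np.array([
    [12, 3.51, 39.08, 0.80, 0.55],
    [10, 5.21, 45.65, 0.91, 0.50],
    [15, 5.91, 50.16, 0.81, 0.52],
])

# Min-max normalize each column, combine into a comprehensive eigenvalue
# (assumed: simple mean), then normalize across nodes so weights sum to 1.
norm = (features - features.min(0)) / (features.max(0) - features.min(0) + 1e-12)
composite = norm.mean(axis=1)
demand_weights = composite / composite.sum()
print(demand_weights.round(4))
```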
Figure 2 shows the result of processing the demand data with the elbow method; an obvious inflection point appears at a particular number of classes $K$, which is therefore taken as the number of clusters, and the user demand clustering results obtained on this basis are shown in Figure 3.

Figure 2: Clustering deviation for different K values

Figure 3: User demand clustering results
In establishing the QFD model, the independence of each product function must be taken into account, which is guaranteed by the independence axiom of axiomatic design. Analysis of the household sweeping robot shows that its main functions include the sweeping and cleaning function, the monitoring and planning function, the intelligent control function, the optimization function and the human-machine interaction function. The improved QFD is used to build the house of quality linking user requirements to product functions. The degree of correlation between the parameters in the requirement-function relationship matrix is represented by the numbers "1", "3", "5", "7" and "9", a larger value indicating a stronger relationship, while "0" indicates no relationship between the two parameters. Denoting the five functions of the sweeping robot by $F_1$ to $F_5$ in the order above, the relationship matrix between user requirements and product functions is shown in Table 6.
Table 6: Relationship between user requirements and product functions
Demand node | Weight | $F_1$ | $F_2$ | $F_3$ | $F_4$ | $F_5$
---|---|---|---|---|---|---
1 | 0.0658 | 7 | 3 | 5 | 9 | 5
2 | 0.0628 | 9 | 7 | 7 | 1 | 9
3 | 0.0637 | 5 | 5 | 3 | 3 | 3
4 | 0.0661 | 3 | 3 | 1 | 3 | 1
5 | 0.0537 | 0 | 0 | 0 | 5 | 0
6 | 0.0577 | 1 | 1 | 9 | 0 | 3
7 | 0.0550 | 3 | 0 | 0 | 2 | 7
8 | 0.0555 | 5 | 0 | 1 | 3 | 3
9 | 0.0499 | 3 | 5 | 1 | 5 | 1
10 | 0.0531 | 1 | 0 | 3 | 7 | 0
11 | 0.0550 | 5 | 3 | 5 | 3 | 0
12 | 0.0534 | 7 | 1 | 7 | 5 | 3
13 | 0.0476 | 3 | 7 | 7 | 3 | 5
14 | 0.0419 | 0 | 0 | 5 | 7 | 0
15 | 0.0396 | 0 | 5 | 3 | 9 | 0
16 | 0.0448 | 1 | 3 | 9 | 3 | 0
17 | 0.0439 | 0 | 1 | 3 | 1 | 3
18 | 0.0400 | 0 | 3 | 1 | 0 | 1
19 | 0.0457 | 3 | 0 | 0 | 3 | 5
20 | 0.0048 | 5 | 3 | 3 | 5 | 7
Table 6 gives the user demand-product function relationship matrix. The absolute importance of each function is computed from the demand weights and the relationship matrix: for each function column, the entries are weighted by the corresponding demand weights and summed, giving the function's absolute importance, which is then normalized to obtain the relative importance; the results are shown in Table 7. The sweeping and cleaning function is the most important in a household sweeping robot, followed by the intelligent control function and the monitoring and planning function, which need focused attention, and finally the optimization function and the human-machine interaction function. The design and implementation of the latter two can improve user experience and satisfaction. The function importance results in Table 7 accord with engineering practice and provide solid data support for a modular design of large-scale software systems that meets user needs.
Table 7: Product function importance
Importance | $F_1$ | $F_2$ | $F_3$ | $F_4$ | $F_5$
---|---|---|---|---|---
Absolute importance | 3.9543 | 3.1687 | 3.4832 | 2.5738 | 1.7064 |
Relative importance | 0.2386 | 0.2438 | 0.2573 | 0.1354 | 0.1249 |
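The weighted-sum computation described above is easy to reproduce; the sketch below applies it to the first three demand rows of Table 6 (a partial, illustrative slice of the data):

```python
import numpy as np

# Demand weights and the demand-function relationship matrix from Table 6
# (first three demand rows shown; entries 0/1/3/5/7/9 as in the paper).
weights = np.array([0.0658, 0.0628, 0.0637])
relation = np.array([
    [7, 3, 5, 9, 5],
    [9, 7, 7, 1, 9],
    [5, 5, 3, 3, 3],
])

absolute = weights @ relation          # weighted column sums per function
relative = absolute / absolute.sum()   # normalize to relative importance
print(absolute.round(4), relative.round(4))
```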
Modular design gathers identical or similar elements in a system into functionally independent, structurally complete units (modules), and combines these units as required to produce systems that meet different users' needs. The modular design of a large-scale software system must go through a series of processes, including system requirement analysis, module division, module combination and module evaluation, before it can finally face the user. The resulting modular design system for large-scale software systems is shown in Fig. 4.

Figure 4: Modular design system
Modular design of large-scale software systems starts by analyzing the system's conceptual model from user requirements, i.e., preliminarily decomposing the system's functional structure through market research, completing the mapping from function to structure, and preparing for the subsequent module division. Module division is a comprehensive, multi-attribute optimization process: modular design goals are studied, division guidelines are formulated, and an appropriate division method is applied. The resulting modules are classified into general modules, special modules and so on; module interfaces are designed and suitable combinations selected, so that module combination yields the required modular products. Module evaluation is the key step in judging whether a modular design scheme is reasonable: by analyzing the rationality of the modules' functional structure, their modularity, complexity and other indicators, the strengths and weaknesses of the scheme are assessed comprehensively so as to arrive at the system's optimal module division scheme.
Module partitioning is the first step in modular design, which determines how to decompose the system into a number of relatively independent modules. Common ways of module division include division by function, division by hierarchy and division by domain. The goal of module partitioning is to reasonably allocate system functions to each module, so that each module can independently complete a specific function and the dependency relationship between modules is minimized.
The module interface is the bridge for communication between modules; it defines how modules interact. Interface design should follow the principles of simplicity, stability and ease of use to ensure efficient, reliable communication between modules. Common interface design methods include designs based on abstract classes or interfaces, on events or messages, and on services or APIs.
Module implementation refers to how a module's functionality is concretely realized. Common approaches include object-oriented programming, functional programming and component-based programming. Object-oriented programming implements module functionality by defining classes and objects, using encapsulation, inheritance and polymorphism to preserve module independence. Functional programming implements module functionality with pure functions and higher-order functions, emphasizing statelessness and immutability. Component-based programming implements module functionality by defining components or plug-ins, supporting modular assembly and dynamic replacement.
Module combination assembles individual modules into a complete system. Combination methods include dependency injection, the mediator pattern, and process- or state-machine-based composition. Dependency injection manages dependencies between modules through a framework, enabling dynamic combination and replacement of modules. The mediator pattern manages communication between modules through a mediator object, decoupling modules while letting them collaborate. Process- or state-machine-based composition defines processes or state machines that govern the execution order and state transitions of modules, achieving orderly combination and control.
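As a small illustrative sketch (the names `Storage`, `FileStorage` and `ReportModule` are invented), dependency injection against a module interface might look like this in Python:

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """Module interface: callers depend on this abstraction, not a concrete module."""
    @abstractmethod
    def save(self, data: str) -> None: ...

class FileStorage(Storage):
    def save(self, data: str) -> None:
        print(f"writing {data!r} to disk")

class ReportModule:
    # Dependency injection: the collaborating module is passed in from
    # outside, so modules can be combined and replaced dynamically.
    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def run(self) -> None:
        self._storage.save("report")

ReportModule(FileStorage()).run()   # swap in another Storage without edits
```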
The particle swarm optimization algorithm is a global stochastic search algorithm based on swarm intelligence that simulates the migration and flocking behaviors of bird flocks during foraging [23]. Solving the software modular design problem with particle swarm optimization involves key steps such as encoding and initialization, and position updating.
For the modular design problem of a software system, each particle represents a feasible modular design scheme, expressed as a modular design matrix. In the elementary particle swarm algorithm the population is initialized randomly, which makes no use of prior knowledge of the problem, so the initial solutions tend to be poor. In the method network diagram of a software system, methods with direct calling relationships are connected by edges, and the more frequent the calls, the closer the connection between two methods. For methods with indirect calling relationships, the degree of connection is measured by the minimum number of edges on a path between the two vertices in the graph. In this paper, taking the characteristics of the software modular design problem into account, probabilistic selection is combined with the shortest-path principle of the graph to initialize the population, improving the quality of the initial solutions.
The call relationship graph of a software system is expressed as $G = (V, E)$, where the vertex set $V$ represents the methods of the system and the edge set $E$ the calling relationships between them, with edge weights reflecting call frequency.
Assume the position of the $i$th particle is $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$, where $x_{ij}$ denotes the module to which method $j$ is assigned and $n$ is the total number of methods. During initialization, each method is assigned to a module by probabilistic selection: the shorter the shortest-path distance between a method and the methods already placed in a module, the higher the probability that the method is assigned to that module. A method with close calling relationships to a module is thus likely to be placed in that module from the outset, which raises the quality of the initial population.
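A hedged sketch of this shortest-path-guided probabilistic initialization is given below; the toy call graph, the seed methods and the inverse-distance weighting are assumptions used only to illustrate the principle stated above:

```python
import random
import networkx as nx

# Hypothetical call graph: vertices are methods; distance is the minimum
# number of edges between two methods, as described in the text.
G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e")])
seeds = {"a": 0, "d": 1}          # one seed method per module (assumed)
dist = dict(nx.all_pairs_shortest_path_length(G))

def init_assignment():
    """Assign each method to a module with probability inversely
    related to its shortest-path distance from the module's seed."""
    assignment = dict(seeds)
    for v in G:
        if v in seeds:
            continue
        probs = [1.0 / (1 + dist[v][s]) for s in seeds]
        total = sum(probs)
        r, acc = random.random() * total, 0.0
        for s, p in zip(seeds, probs):
            acc += p
            if r <= acc:
                assignment[v] = seeds[s]
                break
    return assignment

print(init_assignment())
```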
Traditional particle swarm optimization algorithms are not globally convergent because their iterative update formula is too strongly directional. Here we give a global convergence theorem for hybrid optimization algorithms.
Theorem 1. An iterative evolutionary algorithm converges globally to the optimal solution with probability 1 if it satisfies the following two conditions.
Condition 1. Elite retention is used when comparing the historical optimal solution with the current solution:
$$P_{t+1} = \begin{cases} X_{t+1}, & f(X_{t+1}) > f(P_t),\\ P_t, & \text{otherwise,}\end{cases}$$
where $P_t$ is the historical optimum at iteration $t$, $X_{t+1}$ the current solution, and $f$ the fitness function.
Condition 2. The probability of transferring from any non-globally-optimal point (denoted $x$) to the set of global optima is greater than zero.
It is easy to see that the probabilistic-selection particle swarm algorithm satisfies the two conditions of Theorem 1, so it converges globally to the optimal solution with probability 1. This can be seen as follows: first, in the algorithm's elite retention strategy, the solution with the largest fitness value is kept as the historical optimum, satisfying Condition 1; second, because the probabilistic selection step assigns every candidate solution a non-zero selection probability, any non-globally-optimal point can transfer to the set of global optima with positive probability, satisfying Condition 2.
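Condition 1 amounts to a one-line update rule; the sketch below is a generic illustration of elite retention, not the paper's exact implementation:

```python
def elite_retention(history_best, candidate, fitness):
    """Condition 1: keep the historical optimum unless the new solution
    is strictly better, so the best-so-far never degrades."""
    return candidate if fitness(candidate) > fitness(history_best) else history_best

# Toy usage: maximize f(x) = -(x - 3)^2 over a stream of candidate solutions.
f = lambda x: -(x - 3) ** 2
best = 0.0
for cand in (1.0, 4.0, 2.5, 3.1):
    best = elite_retention(best, cand, f)
print(best)   # 3.1
```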
Experimental analysis is based on theoretical data and real open-source projects, with comparison against the hMetis algorithm, in order to verify the rationality and effectiveness of the proposed software modular design algorithm. The experimental data are shown in Table 8; the line counts of the selected software systems all exceed 10,000, meeting the standard for large-scale software systems.
Table 8: Experimental data
Case number | Software system name | Number of classes | Number of methods | Number of calls | Lines of code
---|---|---|---|---|---
1 | Merge | 103 | 138 | 113 | 12983 |
2 | Carch | 192 | 291 | 462 | 13482 |
3 | Yahoo | 231 | 584 | 593 | 15671 |
4 | Bitcoin | 576 | 4083 | 2194 | 19834 |
5 | Emule | 588 | 8326 | 3586 | 20976 |
6 | Svn | 784 | 9248 | 5735 | 25847 |
7 | V8 | 932 | 11823 | 10842 | 39594 |
This section carries out the experimental analysis from two aspects, theoretical data and real cases, and subsequently compares the design results with those of other algorithms to verify the rationality and effectiveness of the proposed method.
For the theoretical data, test cases whose source code volume reflects that of a large-scale software system allow the modular design situation to be analyzed directly; the algorithm proposed in this paper is then applied to perform the modular design, and the two together verify its rationality. Among the cases, Merge and Carch are theoretical data. Merge is essentially the merger of two independent programs: under modular design the two programs are naturally divided into their own modules, and the two sub-modules are then divided into finer-grained sub-modules according to modularity. Carch is used to exercise the algorithm of this paper. The experimental results of the modular design for the software systems in Table 8 are shown in Table 9.
Table 9: Module division experimental results
Case number | Software system name | Number of vertices | Population size | Number of modules
---|---|---|---|---
1 | Merge | 13 | 15 | 5 |
2 | Carch | 19 | 12 | 4 |
3 | Yahoo | 31 | 29 | 7 |
4 | Bitcoin | 138 | 108 | 9 |
5 | Emule | 283 | 236 | 13 |
6 | Svn | 302 | 197 | 19 |
7 | V8 | 734 | 494 | 134 |
Real open-source software systems are also analyzed experimentally in this paper, and a number of other open-source systems have been verified through extensive experiments. Unlike the theoretical data, large open-source systems cannot be inspected visually, so the experimental data are obtained with the help of source code analysis tools (e.g., Understand, Source-Navigator), including the size of the system's source code, the number of classes and the inter-class calls within the source code, which together determine the scale and complexity of the experimental data.
In this section, based on the modules produced by the above experiments, the design results of the proposed method are compared with those of the hMetis algorithm to assess their quality. A good design result balances tighter association within modules against lower coupling between modules. To make the difference between the two design results intuitively visible, they are visualized: the resulting modules and the inter-module associations are displayed, and the average modularity of the modules produced by the two algorithms is computed. Taking case 4 as an example, the design results based on hMetis and on this paper's method are shown in Fig. 5(a) and Fig. 5(b) respectively. Each dot represents a resulting module, and its size indicates how many vertices the module contains; the dots are colored to show the tightness of connections within the module, a darker color meaning stronger cohesion, which is desirable; the modules are connected by lines of varying thickness representing the degree of inter-module coupling, a thicker line meaning greater coupling, which is undesirable. In Fig. 5(a) the modules appear more uniform in size, but the connecting lines between modules are thicker than in Fig. 5(b), i.e., the hMetis result exhibits larger inter-module coupling.

Figure 5: Module division results
The expected modularity values obtained by the two algorithms are shown in Fig. 6. The design results of this paper's algorithm have higher expected modularity than those of the hMetis algorithm, i.e., the proposed algorithm is better than hMetis overall, which verifies its rationality and effectiveness.

Figure 6: Comparison of expected modularity
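Modularity itself can be computed directly from a partition of the call graph; the following sketch uses the standard Newman modularity in `networkx` on an invented graph and partition as a stand-in for the paper's measure:

```python
import networkx as nx
from networkx.algorithms.community import modularity

# Hypothetical design result: a call graph and a module partition (each
# set is one module). Higher modularity indicates tighter intra-module
# cohesion and looser inter-module coupling.
G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)])
partition = [{1, 2, 3}, {4, 5, 6}]
print(round(modularity(G, partition), 3))
```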
The purpose of algorithm analysis is to choose the right algorithm for a problem and to improve algorithms. The same problem can usually be solved by different algorithms. Evaluating the complexity of an algorithm usually involves two main parts: its time complexity and its space complexity [24].
The PSPSO algorithm has been widely applied to modular design problems of large-scale software systems, but its time complexity has rarely been studied. This section gives an initial exploration of the relationship between the time complexity of the PSPSO algorithm, the problem size and the population size.
To facilitate the analysis, the concept of pheromone is introduced into the PSPSO algorithm to characterize the biological properties of the particle population. The time complexity of the simplified PSPSO algorithm is analyzed via the pheromone rate, a function of the pheromone, briefly introduced as follows: for each candidate solution $s$ at iteration $t$, the pheromone rate is the proportion of its pheromone $\tau_s(t)$ to the total pheromone, $p_s(t) = \tau_s(t) / \sum_{j} \tau_j(t)$.
Because the PSPSO algorithm is complex and its time complexity is difficult to obtain through purely theoretical study, its probability selection function and pheromone update strategy are simplified as follows.
Probability selection function:
$$P_s(t) = \frac{\tau_s(t)}{\sum_{j} \tau_j(t)} = p_s(t).$$
In the probability selection function above, the effect of heuristic information is not considered, so the complexity analysis in this section mainly considers the effect of the pheromone rate $p_s(t)$.
The pheromone update strategy is:
$$\tau_s(t+1) = \tau_s(t) + \Delta\tau_s(t),$$
where $\Delta\tau_s(t)$ is the pheromone deposited on solution $s$ at iteration $t$.
The effect of the pheromone residual (evaporation) rate function is not considered in this update strategy.
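The two simplified rules can be prototyped in a few lines; the sketch below uses roulette-wheel selection over pheromone values and an additive deposit, with the candidate names and the deposit amount invented for illustration:

```python
import random

tau = {"s1": 1.0, "s2": 1.0, "s3": 1.0}   # pheromone per candidate solution

def select():
    """Simplified probability selection: choose a solution in proportion
    to its pheromone rate tau_s / sum(tau), with no heuristic term."""
    total = sum(tau.values())
    r, acc = random.random() * total, 0.0
    for s, t in tau.items():
        acc += t
        if r <= acc:
            return s

def update(best, delta=0.5):
    """Simplified update: deposit pheromone on the chosen solution only,
    ignoring the residual (evaporation) rate."""
    tau[best] += delta

update(select())
print(tau)
```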
In the following, the time complexity of the PSPSO algorithm will be initially analyzed based on the simplified model of the algorithm given above.
PSPSO-1: Set the number of particle swarms to 1 and consider a problem of size $n$.

Let $p^*(t)$ denote the pheromone rate of the optimal solution at iteration $t$, and let $E(T)$ denote the expected number of iterations for the algorithm to reach the optimal solution for the first time (the expected first hitting time).

The derivation proceeds as follows. Substituting the simplified probability selection function and pheromone update strategy into the definition of the pheromone rate yields a recurrence for the pheromone rate of the optimal solution (Eq. (21)). Introducing auxiliary notation to simplify Eq. (21) (Eq. (22)), and applying the definition of the pheromone rate (Eqs. (23) and (24)), the recurrence reduces to a closed form (Eq. (26)). From Eqs. (22), (23) and (24), the pheromone rate of the optimal solution increases monotonically with the number of iterations, and Eq. (26) shows that for a sufficiently large number of iterations it approaches 1; once it reaches 1, the algorithm selects the optimal solution with certainty. Based on the analysis of the convergence rate of the PSPSO algorithm, the pheromone rate at iteration $t$ therefore determines the probability that the optimal solution is selected at that iteration (Eq. (27)).

Setting this selection probability into the expression for the expected first hitting time, as defined by the probabilistic selection function of the PSPSO-1 algorithm (Eq. (29)), and expanding it as a series over the iteration at which the optimum is first selected (Eqs. (30)-(34)), each term can be bounded because the selection probability grows with $t$; summing the bounds gives an upper bound on $E(T)$ in terms of the problem size $n$ (Eq. (35)).

According to Eq. (35), the expected first hitting time grows with the problem size $n$. Since the number of particle swarms in the PSPSO-1 algorithm is 1, this bound corresponds to the single-swarm case; with several swarms searching in parallel, the expected time is reduced accordingly. It follows that as the problem size $n$ increases, the expected time for the algorithm to first reach the optimal solution, and hence its time complexity, increases, while increasing the number of particle swarms reduces it.
Based on the above case study of the PSPSO algorithm it can be concluded that the time complexity of the PSPSO algorithm increases as the problem size increases and decreases as the number of particle swarms increases.
The following analyzes the space complexity of the particle swarm optimization algorithm based on probabilistic selection. Taking the simplified algorithm above as an example, memory consumption is dominated by storing the positions and historical optima of the particles and the pheromone values of the candidate solutions, so the space occupied grows with both the problem size and the population size.
1) Experiment on the relationship between the average number of iterations of the PSPSO algorithm and the size of the problem being solved
Based on the time complexity analysis of the PSPSO algorithm, the experiments use the specific probability selection function given above, i.e., selection in proportion to the pheromone rate.
The relationship between the algorithm's average number of iterations and the problem size is shown in Fig. 7. As the problem size increases, the number of iterations the algorithm needs to find the optimal solution increases gradually; that is, the time required by the probabilistic-selection particle swarm optimization algorithm grows with the size of the problem being solved.
Figure 7: Relationship between problem scale and average number of iterations
2) Experiment on the relationship between the convergence time of the PSPSO algorithm and the number of particle swarms in the algorithm
Fig. 8 shows the convergence results for different numbers of particle swarms, where the horizontal axis is the number of iterations and the vertical axis the convergence time of the resulting modular design algorithm. Within the same number of iterations, as the number of particle swarms increases, both solution quality and convergence time improve. In summary, appropriately increasing the number of particle swarms reduces the algorithm's convergence time, i.e., it reduces the algorithm's time complexity.
Figure 8: Convergence for different numbers of particle swarms
From the above experimental analysis, the time complexity of the PSPSO algorithm can be effectively reduced by measures such as increasing the number of particle swarms. In addition, several general techniques are commonly used for algorithm complexity optimization. This section focuses on dynamic programming, greedy algorithms and divide-and-conquer, as references for optimizing the complexity of the PSPSO algorithm in the modular design of large-scale software systems.
Dynamic programming solves a complex problem by decomposing it into subproblems and storing the solution of each subproblem to avoid repeated computation. It suits problems with overlapping subproblems and optimal substructure, such as the Fibonacci sequence, shortest path problems and the knapsack problem. The time complexity of dynamic programming is usually polynomial in the number of subproblems, e.g., $O(n)$ or $O(n^2)$, far below the exponential cost of naive recursion.
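For instance, a memoized Fibonacci (a standard example, not specific to this paper) shows how storing subproblem solutions collapses the cost:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Memoized Fibonacci: each subproblem is solved once, so the
    exponential naive recursion drops to O(n) time and O(n) space."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(50))   # instant; the naive recursive version would take hours
```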
A greedy algorithm seeks a globally optimal solution by choosing the locally optimal option at each step. Greedy algorithms suit problems satisfying the greedy-choice property and optimal substructure, such as the minimum spanning tree, single-source shortest paths and activity selection. The time complexity of a greedy algorithm is usually dominated by an initial sort, e.g., $O(n \log n)$.
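A standard example is activity selection by earliest finishing time (again illustrative, not from the paper):

```python
def select_activities(intervals):
    """Greedy activity selection: sort by finish time (O(n log n)) and
    always take the earliest-finishing compatible activity."""
    chosen, last_end = [], float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen

print(select_activities([(1, 4), (3, 5), (0, 6), (5, 7), (8, 9)]))
# [(1, 4), (5, 7), (8, 9)]
```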
Divide-and-conquer solves a complex problem by recursively decomposing it into independent subproblems and merging their solutions. It suits problems whose subproblems are independent and whose solutions can be merged, such as quicksort, merge sort and large-integer multiplication. The time complexity of divide-and-conquer is usually of the form $O(n \log n)$, as in merge sort.
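Merge sort is the canonical example of this $O(n \log n)$ pattern:

```python
def merge_sort(a):
    """Divide and conquer: split, recursively sort, merge -- O(n log n)."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    # Merge step: repeatedly take the smaller head of the two sorted halves.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 7]))   # [1, 2, 5, 7, 9]
```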
Based on the analysis and transformation of user requirements, this paper explores a modular design method for large-scale software systems and examines the complexity and optimization strategies of the proposed algorithm from the time and space perspectives. The PSPSO modular design algorithm used in this paper yields modules with tighter internal connections, stronger cohesion and lower inter-module coupling, and the expected modularity of its design results exceeds that of the hMetis algorithm in comparison. With the initial pheromone set appropriately, the algorithm's time complexity increases with the problem size and decreases as the number of particle swarms grows, and techniques such as dynamic programming, greedy algorithms and divide-and-conquer offer further avenues for optimizing its complexity.