
Modular design and algorithmic complexity optimization for large-scale software systems

  
17 March 2025

Introduction

Software modularization is a design approach that improves the maintainability, testability and scalability of software systems. By applying top-down decomposition, bottom-up construction and hierarchical design, controlling module coupling, observing the single responsibility principle and designing clean interfaces [1-4], a software system with a clear structure and independent modules can be obtained. These methods help developers to develop, test and maintain software systems more efficiently and improve the quality of software development [5-7].

Program performance is a central concern in software development. Performance optimization reduces time and space complexity and improves a program's responsiveness and resource utilization. The algorithm is the core of a program, and its optimization has the most significant impact on performance [8-11]; reasonable selection and design of algorithms can greatly reduce time and space complexity. In computer science, algorithm complexity refers to the order of magnitude of operations required by an algorithm to solve a problem. As the problem size increases, the complexity of the algorithm grows with it, which slows the program down and consumes more system resources [12-15]. Algorithm complexity is an important index for judging the effectiveness of an algorithm, and the execution efficiency of an algorithm directly affects program speed, making it essential knowledge for programmers [16-19]. Therefore, algorithm complexity optimization is an important means of improving program efficiency.

In this paper, the research starts from the analysis and transformation of user needs. An undirected weighted network is used to build the user demand model and to analyze the interaction relationships between user demands. An expert scoring method based on triangular fuzzy numbers determines the degree of influence between nodes, which serves as the weight of the connecting edges in the weighted network. A K-means clustering algorithm is introduced on top of the weighted network, and user demand weights are obtained by normalizing the comprehensive eigenvalues of the demand nodes, so that user demands can be clustered. Based on QFD, the fuzzy user requirements are summarized, classified and weighted by importance, and transformed into technical requirements that engineers and technicians can understand. A particle swarm optimization algorithm based on probabilistic selection is then used for the modular design of large-scale software systems. Experiments on theoretical data and actual open-source projects compare the modular design results with those of the hMetis algorithm, highlighting the validity and reasonableness of the proposed algorithm for the modular design of large-scale software systems. The complexity of the algorithm is analyzed in terms of its time and space complexity, the relationship between the algorithm's complexity and the problem size and population size is explored, and complexity optimization methods combining dynamic programming, greedy algorithms and partitioning methods are given.

User demand analysis and transformation

The design of large-scale software systems in intelligent software engineering requires knowing the user's demand information for the system in advance, collating and analyzing the user's functional requirements, and then pre-setting the system's common parts and module parts for users to choose from. Requirement acquisition therefore has important application value for the design and development of large-scale software systems, and fully considering user demand in the early stage of system design has become a key factor in whether a large-scale software system can be competitive in the market.

User Requirements Acquisition
Characterization of user needs
Fuzzy

Since the user's demand for the system is usually based on perceptual understanding, it carries ambiguity and uncertainty; users are unlikely to have a comprehensive and detailed understanding of the structure, functions, technical parameters and other aspects of the required system. When describing their customization requirements, they generally give conceptual descriptions based on their intuitive understanding of the system, or only state qualitative requirements on system functions. Moreover, because the natural language used to describe requirements is itself fuzzy, the requirements cannot be described precisely and quantitatively or translated directly into actual technical and functional parameters.

Dynamic

Users' needs constantly change and expand with improvements in economic conditions, environmental changes and the updating of concepts. On the one hand, as the functions of large-scale systems on the market become more diversified, users' original needs change accordingly. On the other hand, users also revise some of their own needs during system design: for example, when a user lacks a comprehensive understanding of the system's attributes and features in the early stages and later finds, during design, that the design contradicts or fails to meet the requirements, the corresponding modifications and adjustments have to be analyzed and handled.

Hiddenness

In many cases users do not express their needs completely and clearly, or leave them implicit in the requirements they do propose. On the one hand, users may consider some requirements of the system unnecessary. On the other hand, since users are not fully aware of the system, their requests may not be accurate enough. If the user's needs are not carefully identified and the design simply follows the user's stated requirements, it will obviously fail to truly satisfy the user.

User demand acquisition methods

To obtain user requirements in the pre-design stage, the main acquisition channels include: market research, the most basic method, which can be divided into inquiry surveys and observation surveys; network media, including online questionnaires and the collection of web-related data; and resource sharing with cooperative enterprises, collecting customer information through channel cooperation and achieving win-win cooperation through shared resources. In the field of intelligent software engineering, with deepening informatization and intelligence, there are more and more ways to acquire user requirements, and the acquired demand information has the advantages of timeliness, accuracy and convenience.

With the rapid expansion of big data, it has become increasingly important for the modular design of large-scale software systems: the modular design process needs big-data thinking to extract, store and analyze relevant external data. The user-system database can also be organized and analyzed during system design, so that user information, system information and behavioral data are analyzed effectively to obtain the user's demand information.

Analysis and Transformation of Needs
Demand model construction based on a weighted network
Weighted Complex Network Definition and Characteristics

A weighted network is an important form of complex network, in which the closeness of the connection between entity nodes is indicated by assigning weights to the edges. Considering the mutual influence between user demands, this paper adopts an undirected weighted network to establish the user demand model [20]. User requirements are abstracted as nodes of the undirected weighted network, and the degree of connection between requirements is abstracted as weighted edges between the nodes, so that the interaction relationships between user requirements can be analyzed. Complex networks are usually characterized by statistical parameters such as node degree, node strength, average path length and clustering coefficient. The features commonly used in weighted networks to describe the density of connections between nodes in a local range are as follows:

Degree

Degree is an important statistical indicator possessed by every node in a network: it is the number of edges connected to the current node, usually denoted $k_i$. Summing the degrees of all nodes in the network and averaging gives $k_{avg}$, the average degree of the network.

If a network $H$ with adjacency matrix $A$ is given, then: \[{{k}_{i}}=\sum\limits_{j=1}^{N}{{{a}_{ij}}}=\sum\limits_{j=1}^{N}{{{a}_{ji}}}\] \[{{k}_{avg}}=\frac{1}{N}\sum\limits_{i=1}^{N}{{{k}_{i}}}=\frac{1}{N}\sum\limits_{i,j=1}^{N}{{{a}_{ij}}}\]

In a directed network, the degree of a node consists of two parts: out-degree and in-degree. The out-degree is the number of edges from the current node pointing to other nodes, while the in-degree is the opposite, the number of edges from other nodes pointing to the current node. Note that in a directed network the average out-degree and the average in-degree are always equal, since every edge contributes exactly once to each total.
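As a quick illustration of the two formulas above, the following sketch (not from the paper; the adjacency matrix is a made-up example) computes $k_i$ and $k_{avg}$ with NumPy:

```python
# Illustrative sketch: degree and average degree of an undirected network
# from its adjacency matrix A, as in the k_i and k_avg formulas above.
import numpy as np

A = np.array([[0, 1, 1, 0],      # hypothetical 4-node undirected network
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])

k = A.sum(axis=1)                 # k_i: row sums of the adjacency matrix
k_avg = k.mean()                  # k_avg = (1/N) * sum_i k_i

print(k)       # [2 3 2 1]
print(k_avg)   # 2.0
```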

Network sparsity

Whether a network is sparse or dense is usually measured by its density. For a network with $N$ nodes and $M$ edges, the density is defined as the ratio of the number of edges actually present in the network to the maximum possible number of edges, so for a network $H$ its density $\rho$ is: \[\rho =\frac{2M}{N\left( N-1 \right)}\]

As equation (3) shows, the "maximum possible number of edges" is the number of edges obtained when every pair of nodes in the network is connected, i.e., $N(N-1)/2$. When the density of the network is particularly small, that is, when the number of edges actually present is far smaller than the number of possible edges, the network is a sparse network; otherwise it is a dense network. The threshold for "far smaller" is still debated in the literature; it is generally accepted that a graph is sparse when its number of edges is much less than the square of its number of nodes.

Average path length

Before discussing the average path length, one question must first be clarified: what is the distance between two points? In a complex network there are usually many paths between two nodes; among them, the shortest path is the one with the fewest edges, and the length of the shortest path is called the distance between the two points.

On this basis, the average path length is defined as the average shortest distance between any pair of nodes in the network, usually denoted $L$. If the distance between nodes $i$ and $j$ is denoted $d_{ij}$, then: \[L=\frac{2}{N\left( N-1 \right)}\sum\limits_{i>j}{{{d}_{ij}}}\]

Note that the above definition applies to undirected unweighted networks; in a weighted network, the length of the path with the smallest sum of edge weights is usually taken as the distance between two points, which differs from the minimum-number-of-edges notion used in unweighted networks.

Network diameter

The network diameter is another statistical measure of network size, usually denoted $D$; in a complex network it is the maximum distance between any two points, i.e.: \[D=\max \left( {{d}_{ij}} \right)\]

In reality, however, complex networks are usually not fully connected, and computing the network diameter from the definition above is clearly unrealistic. Academics therefore often characterize the network diameter in another way, the "effective diameter". Since a large complex network consists of partially connected components and some isolated nodes, the network diameter can be redefined through the fraction of connected node pairs whose distance does not exceed a certain value. If network G contains m connected node pairs whose distance does not exceed a value x, and n connected node pairs in total, this fraction is denoted g(x): \[g\left( x \right)=\frac{m}{n}\]

Typically, if x satisfies g(x–1)<0.9 and g(x)≥0.9, then x is said to be the effective diameter of network G.

Clustering coefficient

The clustering coefficient is usually used to measure the degree of node clustering in a graph. Node clustering means, in short, that neighbors of a node's neighbors are still its own neighbors, in which case the three nodes show a clear clustering phenomenon. It is defined as follows: if the current node $i$ has $k$ edges connecting it to other nodes, then the ratio of the number of edges $E_i$ that actually exist among those $k$ neighbors to the maximum possible number of such edges is the clustering coefficient, i.e.: \[{{C}_{c}}=\frac{2{{E}_{i}}}{k\left( k-1 \right)}\]

If $E_i$ is regarded as counting triangles with the current node as a vertex, an equivalent definition of the clustering coefficient is: \[{{C}_{c}}=\frac{{{T}'}}{T}\] where $T'$ is the number of triangles containing node $i$ and $T$ is the number of connected triples centered at node $i$. A "connected triple centered at node $i$" means three connected nodes with at least two edges among them that are not self-loops.
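The following sketch, which is illustrative rather than part of the paper, computes the statistics just defined (density, average path length, diameter, clustering coefficient and the 90% effective diameter) for a stand-in graph using networkx; in the weighted case the distance-based measures would additionally be given the edge-weight attribute:

```python
# Illustrative sketch of the network statistics defined above for a hypothetical graph.
import networkx as nx

G = nx.karate_club_graph()                       # stand-in example network

rho = nx.density(G)                              # 2M / (N(N-1))
L   = nx.average_shortest_path_length(G)         # mean d_ij over node pairs
D   = nx.diameter(G)                             # max d_ij
C_i = nx.clustering(G)                           # per-node clustering coefficient
C   = nx.average_clustering(G)

# Effective diameter g(x): smallest x with >= 90% of connected pairs within distance x.
dists = [d for _, row in nx.all_pairs_shortest_path_length(G) for d in row.values() if d > 0]
x = 1
while sum(d <= x for d in dists) / len(dists) < 0.9:
    x += 1

print(rho, L, D, C, x)
```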

Triangular fuzzy number assignment

In a weighted network the degree of influence between nodes differs from pair to pair, and existing work does not specifically address how to determine it. In this paper, an expert scoring method based on triangular fuzzy numbers is used to determine the degree of influence between nodes, which serves as the weight of the connecting edges in the weighted network: the linguistic evaluations given by the experts are converted into triangular fuzzy numbers by the rough set method, and the triangular fuzzy numbers are then converted into crisp values.

Determining fuzzy semantic variables and corresponding triangular fuzzy numbers

When using a triangular fuzzy number M(l,m,μ), the corresponding fuzzy semantic variables must first be defined. The fuzzy semantic variables and their triangular fuzzy numbers used in this paper are shown in Table 1.

Semantic variables based on triangular fuzzy numbers

Semantic variable Triangular fuzzy number
l m μ
Very weak influence 0 0 0.1
Weak influence 0 0.1 0.3
General influence 0.3 0.5 0.7
Strong influence 0.7 0.9 1.0
Very strong influence 0.9 1.0 1.0

In the table, l, m and μ represent the minimum possible value, the most likely value and the maximum possible value of the triangular fuzzy number, respectively.

Based on the above analysis, a questionnaire is distributed so that experts give fuzzy semantic evaluations of the degree of influence between nodes according to node characteristics, attributes and other indicators; each evaluation is converted into the corresponding triangular fuzzy number $B_{k}^{t}=b_{k}^{t}(l_{k}^{t},m_{k}^{t},\mu _{k}^{t})$, which represents the evaluation of the strength of the $k$th relationship given by the $t$th expert.

Triangular fuzzy number defuzzification

The scoring data of multiple experts are combined into a single value by averaging: \[{{\bar{b}}_{k}}=\frac{\sum\limits_{t=1}^{T}{b_{k}^{t}\left( l_{k}^{t},m_{k}^{t},\mu _{k}^{t} \right)}}{T}\] where ${{\bar{b}}_{k}}$ denotes the average of all the experts' scores and is still a triangular fuzzy number, and $T$ is the total number of experts.

The degree of influence between nodes is then determined by defuzzification. In this paper, the strength of the influence between nodes is determined using the expectation of the triangular fuzzy number.

First, the membership function of the triangular fuzzy number M(l,m,μ) is defined as \[{{f}_{M}}\left( x \right)=\left\{ \begin{array}{*{35}{l}} {\left( x-l \right)}/{\left( m-l \right)}\; & l\le x\le m \\ {\left( x-\mu \right)}/{\left( m-\mu \right)}\; & m\le x\le \mu \\ 0 & x<l\ \text{or}\ x>\mu \\ \end{array} \right.\]

Its left and right membership functions are then: \[f_{M}^{L}\left( x \right)={\left( x-l \right)}/{\left( m-l \right)}\;,\quad f_{M}^{R}\left( x \right)={\left( x-\mu \right)}/{\left( m-\mu \right)}\;\]

The corresponding inverse functions are: \[\varphi _{M}^{L}\left( y \right)=l+\left( m-l \right)y,\quad \varphi _{M}^{R}\left( y \right)=\mu +\left( m-\mu \right)y\]

Therefore, the expectation of the triangular fuzzy number M(l,m,μ) is: \[\begin{align} & I\left( M \right)=\eta \int_{0}^{1}{\varphi _{M}^{L}}\left( y \right)dy+\left( 1-\eta \right)\int_{0}^{1}{\varphi _{M}^{R}}\left( y \right)dy \\ & =\frac{\eta }{2}\left( l+m \right)+\frac{1-\eta }{2}\left( m+\mu \right) \end{align}\] where 0≤η≤1 is the decision-maker's "optimism-pessimism coefficient". To ensure general applicability and operability, this paper sets η = 0.5, i.e., the decision maker is assumed to be neutral, in which case: \[I\left( M \right)=\frac{l+2m+\mu }{4}\] This is the crisp degree of influence between nodes after defuzzification of the triangular fuzzy number, i.e., the inter-node weight $w_{ij}$.
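A minimal sketch of the scoring-and-defuzzification pipeline described above, assuming the Table 1 label set and hypothetical expert answers; it averages the experts' triangular fuzzy numbers and applies I(M) = (l + 2m + μ)/4 with η = 0.5:

```python
# Hedged sketch: semantic labels -> triangular fuzzy numbers (Table 1) -> expert
# average -> defuzzified edge weight w_ij. Expert answers below are hypothetical.
import numpy as np

TFN = {
    "very weak":   (0.0, 0.0, 0.1),
    "weak":        (0.0, 0.1, 0.3),
    "general":     (0.3, 0.5, 0.7),
    "strong":      (0.7, 0.9, 1.0),
    "very strong": (0.9, 1.0, 1.0),
}

def edge_weight(expert_labels):
    """Average the experts' triangular fuzzy numbers and defuzzify to a crisp weight."""
    tfns = np.array([TFN[lab] for lab in expert_labels])   # one (l, m, mu) row per expert
    l, m, mu = tfns.mean(axis=0)                           # component-wise average b_k
    return (l + 2 * m + mu) / 4.0                          # expectation with eta = 0.5

print(edge_weight(["strong", "very strong", "general"]))   # e.g. weight of one demand pair
```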

K-means based user requirement clustering

Through cluster analysis of user requirements, complex requirements can be simplified and organized into several categories, each containing its own detailed requirements, with low similarity between requirements in different categories and high similarity within the same category. This helps the system design work to proceed accurately, reduces development cost and improves the competitiveness of the system. In this paper, the K-means clustering algorithm is introduced on top of the weighted network model: the user demand weights are obtained by normalizing the comprehensive eigenvalues of the demand nodes, the number of clusters is determined with the elbow method, the initial cluster centers are chosen according to the user demand weights, and the user demands are then clustered [21].
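A hedged sketch of this clustering step, under the assumption that each requirement node is described by placeholder network features and weighted by its demand weight; it scans k with the elbow criterion (within-cluster sum of squared errors) and then runs K-means with the chosen k using scikit-learn:

```python
# Minimal sketch: elbow method over k followed by weighted K-means.
# Feature values and weights are placeholders, not the paper's data.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(20, 5)              # 20 requirement nodes x 5 network features (hypothetical)
weights = np.random.rand(20)           # user-demand weights used to bias the clustering
weights /= weights.sum()

inertias = []
for k in range(1, 9):                  # elbow method: look for the bend in this curve
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X, sample_weight=weights)
    inertias.append(km.inertia_)

best_k = 3                             # read off the elbow (k = 3 in the paper's example)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X, sample_weight=weights)
print(inertias, labels)
```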

QFD-based user requirements transformation

The system design house of quality is the core of quality function deployment [22]. Its role is to convert fuzzy user requirements into technical requirements that engineers and technicians can understand, and then to decompose them, waterfall-style, into system part characteristics, development characteristics and quality characteristics; it is an effective tool for user requirement conversion. The complete system design house of quality is shown in Fig. 1. The user requirements entering the house of quality form a hierarchical list obtained by summarizing, classifying and weighting by importance the discrete and fuzzy user requirements collected from market research, which is the key to implementing QFD.

Figure 1.

Structure of system design house of quality

Example analysis
Analysis of user needs

In recent years, with the increasing maturity of artificial intelligence technology, the design of large-scale software systems has become more and more intelligent, providing users with great convenience. In 2021, despite the overall downturn of the consumer market, survey data showed that household appliances built on large-scale software systems still grew, and sweeping robots equipped with self-cleaning technology occupied the core position of the cleaning-appliance market. Compared with the past, technical innovation has greatly improved and upgraded every function of the sweeper: in navigation and obstacle avoidance, a combination of technologies solves the "chaotic collision" and "missed sweeping" problems common in earlier sweeping robots, while modular design and other AI-oriented large-scale software system design techniques improve the robot's intelligent cleaning ability and route planning and reduce the burden of manual labor; mopping, water tank capacity and other aspects have also been greatly improved. In this paper, drawing on previous industry reports on sweepers and consultation with relevant experts and industry personnel, numerous demands are organized; the detailed user demands are shown in Table 2.

User demand

Demand number Specific demand Demand number Specific demand
SR1 Strong cleaning ability SR11 Voice control
SR2 Monitoring failure information SR12 Time booking
SR3 Clean speed SR13 Small running sound
SR4 Anti-fall collision SR14 Lightness
SR5 Drag integration SR15 Simple operation
SR6 Mobile control SR16 Dump dust box
SR7 Automatic recharge SR17 Automatic cleaning mop
SR8 Standby time SR18 Automatic add cleaner
SR9 Fault prediction SR19 Large dust box
SR10 Price appropriate SR20 Maintenance convenience

For the identified user requirements, triangular fuzzy numbers are first used to convert the semantic evaluations into interval values, and the precise values of the relationships between user requirements are then obtained by defuzzifying the triangular fuzzy numbers. To ensure the accuracy and objectivity of the user demand relationships, users are invited to participate in the evaluation in addition to professional researchers. In this paper, 10 experts and 10 users form an evaluation group that scores the semantic evaluation of the interrelationships between demands from different perspectives; the comprehensive triangular-fuzzy-number evaluation values are shown in Table 3.

User demand triangular fuzzy number comprehensive evaluation value

Demand number SR1 SR2 SR3 SR4 SR17 SR18 SR19 SR20
SR1 1 (0,0.03,0.12) (0.78,0.93,1) (0.32,0.58,0.74) 0 0 0 0
SR2 1 (0,0.08,0.17) (0,0.04,0.19) 0 0 (0.9,1.0,1.0) (0.28,0.47,0.63)
SR3 1 (0.16,0.34,0.58) (0,0.09,0.26) 0 (0,0,0.07) 0
SR4 1 (0.6,0.81,0.93) (0,0.15,0.33) 0 0
⋮
SR17 1 (0,0.05,0.17) 0 0
SR18 1 0 0
SR19 1 (0.54,0.71,0.82)
SR20 1

The comprehensive triangular-fuzzy-number evaluation values given by the expert group are defuzzified according to the formula above to obtain the corresponding exact values; the comprehensive evaluation values of the user demand relationships after defuzzification are shown in Table 4. Each value represents the degree of interaction between two user demands: the larger the value, the greater the degree of mutual influence, and a comprehensive evaluation value of 0 means that the two demands do not affect each other and are independent.

User demand relationship comprehensive evaluation value

Demand number SR1 SR2 SR3 SR4 SR17 SR18 SR19 SR20
SR1 1 0.09 0.93 0.58 0 0 0 0
SR2 1 0.02 0.04 0 0 0.99 0.47
SR3 1 0.37 0.06 0 0.05 0
SR4 1 0.79 0.17 0 0
⋮
SR17 1 0.08 0 0
SR18 1 0 0
SR19 1 0.73
SR20 1

According to the formulas for the attribute eigenvalues of the weighted network, the eigenvalue of each parameter is calculated for every demand node, and the user demand weight is obtained by normalizing the comprehensive eigenvalue; the calculation results are shown in Table 5. By establishing the demand network model, the mutual influence between user demands is taken into account and more accurate demand weights are obtained, laying the foundation for demand clustering and demand transformation.

User requirement network node eigenvalues

Demand node Degree Network sparsity Average path length Network diameter Clustering coefficient Weight
SR1 12 3.51 39.08 0.80 0.55 0.0658
SR2 10 5.21 45.65 0.91 0.50 0.0628
SR3 15 5.91 50.16 0.81 0.52 0.0637
SR4 12 5.24 45.98 0.86 0.54 0.0661
SR5 16 3.83 49.13 0.71 0.43 0.0537
SR6 10 4.17 22.49 0.83 0.50 0.0577
SR7 8 3.13 36.80 0.83 0.40 0.0550
SR8 9 3.62 43.00 0.75 0.48 0.0555
SR9 9 4.93 39.93 0.58 0.37 0.0499
SR10 17 7.08 53.58 0.61 0.40 0.0531
SR11 9 3.31 35.27 0.66 0.46 0.0550
SR12 9 2.86 38.62 0.64 0.40 0.0534
SR13 8 1.30 35.54 0.79 0.41 0.0476
SR14 6 1.59 25.22 0.68 0.37 0.0419
SR15 5 2.01 22.94 0.49 0.33 0.0396
SR16 5 2.21 22.80 0.69 0.39 0.0448
SR17 4 1.47 16.64 0.69 0.37 0.0439
SR18 3 1.31 15.05 0.61 0.37 0.0400
SR19 5 1.72 12.73 0.61 0.31 0.0457
SR20 8 2.67 29.35 0.61 0.34 0.0048
Clustering of user requirements

As Table 5 shows, the demand nodes with the highest weights are SR4, SR1, SR3, SR2 and SR6, corresponding to anti-fall collision, strong cleaning ability, fast cleaning speed, monitoring of failure information, and mobile control; these are the needs users care about most, and prioritizing them in the design of a large-scale software system can provide users with a better experience, help improve the system's market competitiveness and raise user satisfaction. In the user demand clustering process, the top-weighted user demands serve as candidate points for the initial cluster centers. The elbow method is used to determine the number of clusters: the user demand weights are taken as input, the within-cluster sum of squared errors is computed in MATLAB for each number of classes k, and the optimal number of clusters is read off the resulting curve. All user demands are then classified and grouped with the clustering algorithm, which helps to better understand user requirements, grasp the real needs and key pain points of users, and provide strong support and guidance for developing a large-scale software system sweeper that better matches user expectations and needs.

Figure 2 shows the result of processing the demand data with the elbow method; there is an obvious inflection point at k = 3, which corresponds to the optimal number of clusters. According to the weight ordering of the demands in Table 5 and the optimal number of clusters, the initial cluster centers of the K-means algorithm are determined to be demand nodes SR4, SR1 and SR3, and the user demands are clustered with the K-means algorithm; the clustering results are shown in Figure 3. The user requirements fall into three categories: {SR1,SR2,SR3,SR4}, {SR5,SR6,SR7,SR8,SR9,SR10,SR11,SR12} and {SR13,SR14,SR15,SR16,SR17,SR18,SR19,SR20}. The first category comprises functional requirements, which are closest to the actual use of the large-scale software system product and satisfy its basic functionality. The second category comprises emotional requirements, which further increase user satisfaction. The third category comprises additional requirements, important considerations that follow the realization of the system functions.

Figure 2.

Clustering deviation of different K values

Figure 3.

User demand clustering results

Transformation of user needs

When establishing the QFD model, the independence of each product function must be taken into account; this independence is guaranteed by the independence axiom of axiomatic design. Analysis of the home sweeper shows that its main functions are the sweeping and cleaning function, the monitoring and planning function, the intelligent control function, the optimization function and the human-machine interaction function, denoted FR1, FR2, FR3, FR4 and FR5 respectively. The improved QFD is used to build the user requirement-product function house of quality; the degree of correlation between parameters in the requirement-function relationship matrix is represented by the numbers 1, 3, 5, 7 and 9, where a larger value indicates a stronger relationship and 0 indicates no relationship between the two parameters. The quality configuration relationship between the user needs of the household floor sweeper and the product functions is shown in Table 6.

Relationship between User Requirements and Product Quality

Demand node Weight FR1 FR2 FR3 FR4 FR5
SR1 0.0658 7 3 5 9 5
SR2 0.0628 9 7 7 1 9
SR3 0.0637 5 5 3 3 3
SR4 0.0661 3 3 1 3 1
SR5 0.0537 0 0 0 5 0
SR6 0.0577 1 1 9 0 3
SR7 0.0550 3 0 0 2 7
SR8 0.0555 5 0 1 3 3
SR9 0.0499 3 5 1 5 1
SR10 0.0531 1 0 3 7 0
SR11 0.0550 5 3 5 3 0
SR12 0.0534 7 1 7 5 3
SR13 0.0476 3 7 7 3 5
SR14 0.0419 0 0 5 7 0
SR15 0.0396 0 5 3 9 0
SR16 0.0448 1 3 9 3 0
SR17 0.0439 0 1 3 1 3
SR18 0.0400 0 3 1 0 1
SR19 0.0457 3 0 0 3 5
SR20 0.0048 5 3 3 5 7

Table 6 gives the user demand-product function relationship matrix. The absolute importance of each function is calculated from the demand weights and the relationship matrix: for each product function column, the entries are weighted by the demand weights and summed to obtain the absolute importance, which is then normalized to obtain the relative importance; the results are shown in Table 7. The sweeping and cleaning function is the most important in a household sweeper, followed by the intelligent control function and the monitoring and planning function, which need to be emphasized, and finally the optimization function and the human-computer interaction function. The design and implementation of the optimization and human-computer interaction functions can still improve the user experience and satisfaction. The function importance results in Table 7 are consistent with engineering practice and provide solid data support for a modular design of the large-scale software system that meets user needs. A minimal sketch of this weighted-sum calculation is given after Table 7.

Product function importance

Importance FR1 FR2 FR3 FR4 FR5
Absolute importance 3.9543 3.1687 3.4832 2.5738 1.7064
Relative importance 0.2386 0.2438 0.2573 0.1354 0.1249
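A minimal sketch of the importance calculation behind Tables 6 and 7 (only the first four rows of Table 6 are reproduced here, so the printed numbers will not match Table 7):

```python
# Sketch: absolute importance of each function FR_j is the weight-weighted sum of its
# requirement-function column; relative importance is the normalized share.
import numpy as np

w = np.array([0.0658, 0.0628, 0.0637, 0.0661])          # demand weights SR1..SR4 (from Table 5)
R = np.array([[7, 3, 5, 9, 5],                          # SR1 row of the relationship matrix
              [9, 7, 7, 1, 9],                          # SR2
              [5, 5, 3, 3, 3],                          # SR3
              [3, 3, 1, 3, 1]])                         # SR4

absolute = w @ R                                        # weighted column sums -> absolute importance
relative = absolute / absolute.sum()                    # normalization -> relative importance
print(absolute, relative)
```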
Modular design basis
Modular Design System

Modular design gathers the same or similar elements of a system into functionally independent and structurally complete units (modules), and combines these units as required to produce systems that meet the needs of different users. The modular design of a large-scale software system has to go through a series of processes, system requirement analysis, module division, module combination and module evaluation, before it can finally be delivered to users. The resulting modular design system for large-scale software systems is shown in Fig. 4.

Figure 4.

Modular design system

The modular design of a large-scale software system starts by analyzing the conceptual model of the system based on user requirements: the functional structure of the system is decomposed preliminarily through market research, the mapping from function to structure is completed, and the subsequent module division is prepared for. Module division is a comprehensive, multi-attribute optimization process in which the modular design objectives are studied, relevant division guidelines are formulated, and an appropriate module division method is applied. The resulting modules are divided into general modules, special modules and so on; module interfaces are designed and suitable combinations are selected to carry out module combination and obtain the required modular product. Module evaluation is the key step for judging whether the modular design scheme is reasonable: by analyzing indicators such as the rationality of the module's functional structure, modularity and complexity, the advantages and disadvantages of the module scheme are assessed comprehensively so as to arrive at the optimal module division scheme for the system.

Key Elements of Modular Design
Module division

Module partitioning is the first step in modular design, which determines how to decompose the system into a number of relatively independent modules. Common ways of module division include division by function, division by hierarchy and division by domain. The goal of module partitioning is to reasonably allocate system functions to each module, so that each module can independently complete a specific function and the dependency relationship between modules is minimized.

Module Interfaces

A module interface is the bridge for communication between modules; it defines how modules interact. The design of module interfaces should follow the principles of simplicity, stability and ease of use to ensure efficient and reliable communication between modules. Common interface design methods include designs based on abstract classes or interfaces, on events or messages, and on services or APIs.
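As an illustration of abstract-class-based interface design, the following sketch (class names are hypothetical, not from the paper) defines a narrow module contract that concrete modules implement:

```python
# Illustrative sketch: a module interface as an abstract base class.
# Other modules depend only on this contract, keeping coupling low.
from abc import ABC, abstractmethod

class CleaningModule(ABC):              # hypothetical interface for a sweeper sub-module
    @abstractmethod
    def start(self, area_id: str) -> None: ...

    @abstractmethod
    def status(self) -> str: ...

class BrushModule(CleaningModule):      # one possible implementation behind the interface
    def start(self, area_id: str) -> None:
        print(f"sweeping area {area_id}")

    def status(self) -> str:
        return "idle"

BrushModule().start("kitchen")
```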

Module implementation

Module implementation refers to how the functionality of a module is concretely realized. Common approaches include object-oriented programming, functional programming and componentized programming. Object-oriented programming implements module functionality by defining classes and objects, using encapsulation, inheritance and polymorphism to ensure module independence. Functional programming implements module functionality with pure functions and higher-order functions, emphasizing statelessness and immutability. Componentized programming implements module functionality by defining components or plug-ins, supporting modular assembly and dynamic replacement.

Module combinations

Module combination assembles the individual modules into a complete system. Methods of module combination include dependency injection, the mediator pattern, and process- or state-machine-based combination. Dependency injection manages the dependencies between modules through a framework, allowing modules to be combined and replaced dynamically. The mediator pattern manages communication between modules through a mediator object, decoupling the modules while letting them collaborate. Process- or state-machine-based combination manages the execution order and state transitions of modules by defining processes or state machines, achieving orderly composition and control.
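The dependency-injection style of combination can be sketched as follows (the classes are hypothetical); the composed system receives its modules from outside, so any of them can be replaced without changing the system class:

```python
# Minimal sketch of dependency injection as a module-combination method.
class Navigator:
    def plan(self) -> list[str]:
        return ["kitchen", "hall"]

class Cleaner:
    def start(self, area: str) -> None:
        print(f"cleaning {area}")

class SweeperSystem:
    def __init__(self, navigator: Navigator, cleaner: Cleaner):
        self.navigator = navigator      # injected dependencies, not hard-wired imports
        self.cleaner = cleaner

    def run(self) -> None:
        for area in self.navigator.plan():
            self.cleaner.start(area)

SweeperSystem(Navigator(), Cleaner()).run()   # composition root wires the modules together
```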

Modular design algorithm
Particle swarm optimization algorithm based on probabilistic selection
Particle Swarm Optimization Algorithm

Particle swarm optimization is a global stochastic search algorithm based on swarm intelligence that simulates the migration and flocking behavior of bird flocks during foraging [23]. Solving the software modular design problem with particle swarm optimization involves key steps such as coding and initialization and position updating.

Coding and population initialization

For the modular design problem of a software system, each particle represents a feasible modular design solution, represented by a modular design matrix. In the elementary particle swarm algorithm the population is initialized randomly, which makes no use of prior knowledge about the problem, so the initial solutions tend to be poor. In the method network diagram of a software system, methods with direct calling relationships are connected by edges; the more frequent the calling relationship, the closer the two methods are connected. For two methods with only an indirect calling relationship, the minimum number of edges on a path between their vertices measures the degree of connection between them. Therefore, taking the characteristics of the software modular design problem into account, this paper combines probabilistic selection with the shortest-path principle of the graph during initialization, so as to improve the quality of the initial solutions.

The call relationship graph of a software system is expressed as G = (V,R). Assuming that the software system is to be designed as m modules, the m vertices with the largest degrees that are not directly adjacent to one another are selected in graph G as module centers, denoted ${V}'=\{{{{v}'}_{1}},{{{v}'}_{2}},\cdots ,{{{v}'}_{k}},\cdots ,{{{v}'}_{m}}\}(1\le k\le m)$ with V′⊆V. The shortest path from each vertex $v_i$ to each module center is computed as $dist({{v}_{i}},{{{v}'}_{k}})(1\le k\le m)$; by comparing $dist({{v}_{i}},{{{v}'}_{1}}),dist({{v}_{i}},{{{v}'}_{2}}),\cdots ,dist({{v}_{i}},{{{v}'}_{m}})$, the module center to which $v_i$ belongs is selected probabilistically, with shorter paths given higher selection probability.
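A hedged sketch of this initialization, assuming a hypothetical call graph and an inverse-distance selection probability (the paper does not fix the exact probability formula in this section, only that shorter paths are favoured):

```python
# Sketch: pick m high-degree, non-adjacent vertices as module centres, then assign
# every vertex to a centre with probability inversely related to its shortest-path distance.
import random
import networkx as nx

G = nx.erdos_renyi_graph(30, 0.15, seed=1)       # stand-in for a method call graph
m = 3

centers = []                                     # m largest-degree, pairwise non-adjacent vertices
for v, _ in sorted(G.degree, key=lambda kv: -kv[1]):
    if all(not G.has_edge(v, c) for c in centers):
        centers.append(v)
    if len(centers) == m:
        break

dist = dict(nx.all_pairs_shortest_path_length(G))
assignment = {}
for v in G.nodes:
    d = [dist[v].get(c, len(G)) for c in centers]          # shortest path to each centre
    p = [1.0 / (di + 1) for di in d]                       # shorter path -> higher probability
    total = sum(p)
    assignment[v] = random.choices(centers, weights=[pi / total for pi in p])[0]

print(assignment)
```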

Position update method

Assuming that the position of the $w$th particle at iteration $t$ is denoted $Cluster_{M}^{t}=\left[ \begin{matrix} a_{M11}^{t} & a_{M12}^{t} & \cdots & a_{M1n}^{t} \\ a_{M21}^{t} & a_{M22}^{t} & \cdots & a_{M2n}^{t} \\ \vdots & \vdots & \vdots & \vdots \\ a_{Mm1}^{t} & a_{Mm2}^{t} & \cdots & a_{Mmn}^{t} \\ \end{matrix} \right]$, the position code $Cluster_{M}^{t+1}$ at iteration $t+1$ must be determined, i.e., the module to which each method of particle $w$ belongs at iteration $t+1$ must be determined. Suppose the module of method $f_j (1\le j\le n)$ is to be determined, the value in the $i (1\le i\le m)$th row and $j$th column of $Cluster_{M}^{t}$ is 1, and the other values of the $j$th column are 0, i.e., $a_{Mij}^{t}=1$, $a_{M1j}^{t}=0$, $a_{M2j}^{t}=0$, $a_{M3j}^{t}=0,\cdots$, $a_{Mmj}^{t}=0$. From this the matrix is constructed: \[A_{\omega }^{{{t}_{i}}}=\left[ \begin{matrix} \alpha _{\omega 11}^{t} & \alpha _{\omega 12}^{t} & \cdots & 0 & \cdots & \alpha _{\omega 1n}^{t} \\ \alpha _{\omega 21}^{t} & \alpha _{\omega 22}^{t} & \cdots & 0 & \cdots & \alpha _{\omega 2n}^{t} \\ \vdots & \vdots & {} & \vdots & {} & \vdots \\ \alpha _{\omega i1}^{t} & \alpha _{\omega i2}^{t} & \cdots & 1 & \cdots & \alpha _{\omega in}^{t} \\ \vdots & \vdots & {} & \vdots & {} & \vdots \\ \alpha _{\omega m1}^{t} & \alpha _{\omega m2}^{t} & \cdots & 0 & \cdots & \alpha _{\omega mn}^{t} \\ \end{matrix} \right]\]

In total, $m$ such matrices $Cluster_{M}^{{{t}_{i}}}\left( 1\le i\le m \right)$ can be constructed. The fitness value $fitnes{{s}_{M_{i}^{*}}}$ of each position code $Cluster_{M}^{{{t}_{i}}}$ is calculated according to Eq. (15), and the probability of selecting each position code is then calculated from the fitness values: \[pro{{b}_{{{x}_{i}}}}=\frac{fitnes{{s}_{{{x}_{i}}}}}{\sum\limits_{k=1}^{m}{fitnes{{s}_{{{x}_{k}}}}}}\]

Method $f_j$ then selects module $i$ with probability $prob_i$, which determines the module to which $f_j$ belongs; doing this for every method determines the module membership of the $w$th particle, i.e., the value of $Cluster_{M}^{t+1}$.
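A minimal sketch of this roulette-wheel position update; the fitness values are placeholders for the objective of Eq. (15), which is not reproduced in this section:

```python
# Sketch: re-assign one method (column) of the m x n design matrix by sampling a
# module index with probability prob_i = fitness_i / sum_k fitness_k.
import numpy as np

rng = np.random.default_rng(0)

def update_column(cluster: np.ndarray, j: int, fitness: np.ndarray) -> np.ndarray:
    """Re-assign method j by roulette-wheel selection over the m candidate placements."""
    prob = fitness / fitness.sum()                 # prob_i for each candidate placement
    i = rng.choice(len(fitness), p=prob)           # sample the module index
    cluster[:, j] = 0
    cluster[i, j] = 1                              # exactly one 1 per column: f_j belongs to module i
    return cluster

cluster = np.zeros((3, 5), dtype=int)              # hypothetical: 3 modules, 5 methods
cluster[0, :] = 1                                  # start with every method in module 0
fitness_for_method_2 = np.array([0.2, 0.5, 0.3])   # placeholder fitness of the 3 candidates
print(update_column(cluster, 2, fitness_for_method_2))
```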

Convergence analysis

Traditional particle swarm optimization algorithms are not globally convergent because the iterative update formula is too strongly directional. In this paper, we give a global convergence theorem for the hybrid optimization algorithm.

Theorem 1. An iterative evolutionary algorithm converges globally to the optimal solution with probability 1 if it satisfies the following two conditions.

Condition 1. Elite retention is used when comparing the historical optimal solution with the current solution: \[f_{i+1}^{*}=\left\{ \begin{array}{*{35}{l}} f\left( x \right) & f\left( x \right)<f_{i}^{*} \\ f_{i}^{*} & \text{otherwise} \\ \end{array} \right.\] where $f_i^*$ is the optimal value before the $i$th iteration.

Condition 2. The probability of transferring from any non-globally-optimal point (denoted $x_2$) to the corresponding level set $L(x_2) = \{x \mid f(x)<f(x_2), x\in S\}$ is greater than zero, where S is the feasible solution space.

It is easy to see that the probabilistic-selection particle swarm algorithm satisfies the two conditions of Theorem 1, so it converges globally to the optimal solution with probability 1. This can be seen as follows. First, in the elite-retention strategy of the particle swarm algorithm, the solution with the best fitness value is retained as gbest at every iteration, which satisfies Condition 1. Second, the algorithm selects every possible position with a positive probability, and any two positions in the feasible solution space are mutually reachable: at any moment the probability of moving from one position to another is greater than zero, so the algorithm can escape local optima, satisfying Condition 2.
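Condition 1 amounts to the following elite-retention rule (written here for a minimization objective; the values are hypothetical):

```python
# Small sketch of the elite-retention rule of Condition 1: the recorded best
# value only ever improves, which is what the convergence argument relies on.
def retain_elite(f_best: float, f_current: float) -> float:
    """Keep f*_{i+1} = f(x) if it improves on f*_i, otherwise keep f*_i."""
    return f_current if f_current < f_best else f_best

f_star = float("inf")
for f in [5.2, 4.8, 6.1, 4.1]:          # hypothetical fitness trajectory
    f_star = retain_elite(f_star, f)
print(f_star)                            # 4.1
```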

Experiments and Analysis of Results

The experimental analysis in this paper is based on theoretical data and actual open-source projects and includes a comparison with the hMetis algorithm, in order to verify the rationality and effectiveness of the proposed software modular design algorithm. The experimental data are shown in Table 8; the selected software systems all have more than 10,000 lines of code, meeting the standard for large-scale software systems.

Experimental data

Case number Software system name Number of classes Number of methods Number of calls Lines of code
1 Merge 103 138 113 12983
2 Carch 192 291 462 13482
3 Yahoo 231 584 593 15671
4 Bitcoin 576 4083 2194 19834
5 Emule 588 8326 3586 20976
6 Svn 784 9248 5735 25847
7 V8 932 11823 10842 39594
Experimental cases

This section carries out the experimental analysis from two aspects, theoretical data and actual cases, and then compares the design results with other algorithms to verify the reasonableness and effectiveness of the proposed method.

The theoretical test cases are constructed so that they reflect the source-code scale of a large-scale software system while the expected modular structure can still be analyzed by inspection; the proposed algorithm is then used to carry out the modular design, and the two are compared to verify its reasonableness. Among the cases, Merge and Carch are theoretical data. Merge is essentially the merger of two independent programs; under modular design, the two independent programs are obviously divided into their own modules, and the two sub-modules are then split into finer-grained sub-modules based on modularity. Carch is handled with the proposed algorithm in the same way. The experimental results of the design for the software systems in Table 8 are shown in Table 9.

Module division experimental results

Case number Software system name Number of vertices Population size Number of modules
1 Merge 13 15 5
2 Carch 19 12 4
3 Yahoo 31 29 7
4 Bitcoin 138 108 9
5 Emule 283 236 13
6 Svn 302 197 19
7 V8 734 494 134

Real open-source software systems are also analyzed experimentally in this paper, and a number of other open-source systems have been verified through extensive experiments. Unlike the theoretical data, the information of large open-source systems cannot be inspected visually, so the experimental data are obtained with the help of source code analysis tools (e.g., Understand, Sourcenav), including the size of the system's source code, the number of classes and the inter-class calls within the source code, which give the scale and complexity of the experimental data.

Analysis of results

In this section, the modules produced by the experiments above are compared with the design results of the hMetis algorithm to analyze the quality of the proposed design. A good design result balances tight association within each module against low coupling between modules. To show the difference between the two design results intuitively, the results are visualized: the resulting modules and the associations between them are displayed, and the average module degree of the modules produced by the two algorithms is calculated. Taking case 4 as an example, the design results based on hMetis and on the method of this paper are shown in Fig. 5; panels (a) and (b) are the modular design results of the hMetis algorithm and of this paper's algorithm, respectively. The dots represent the resulting modules, and their size indicates how many vertices each module contains. The dots are colored differently to indicate the tightness of connections within the module: the darker the color, the stronger the cohesion, which indicates a better design. The modules are connected by lines of different thickness representing the degree of coupling between modules: the thicker the line, the greater the inter-module coupling, which is undesirable. In Fig. 5(a) the modules appear more uniform in size, but the connecting lines between modules are thicker than in Fig. 5(b), i.e., Fig. 5(a) exhibits a larger degree of inter-module coupling.

Figure 5.

Module division results

The expected modularity values obtained with the two algorithms are shown in Fig. 6. The design results obtained with this paper's algorithm have a higher expected modularity than those based on the hMetis algorithm, i.e., this paper's algorithm is better than hMetis overall, which verifies its reasonableness and effectiveness.

Figure 6.

Comparison of expected modularity

Modular design algorithm complexity analysis and optimization
Algorithm complexity analysis

The purpose of algorithm analysis is to choose the right algorithm to solve a problem, and also to improve the algorithm. The same problem can usually be solved by different algorithms. Usually, evaluating the complexity of an algorithm consists of two main parts: the time complexity and the space complexity of the algorithm [24].

Time complexity

The PSPSO algorithm has been widely applied to many large-scale software system modular design problems, but there are few studies on its time complexity. This section makes an initial exploration of the relationship between the time complexity of the PSPSO algorithm and the problem size and population size.

To facilitate the analysis, the concept of pheromone is introduced into the PSPSO algorithm to characterize the biological properties of the particle population. In this section, the time complexity of the simplified PSPSO algorithm is analyzed based on the pheromone rate, which is a function of the pheromone; the concept is briefly introduced below. For (i*,j*)∈s*, the pheromone rate is: \[c(i,t)=\frac{{{\tau }^{\alpha }}\left( {{i}^{*}},{{j}^{*}},t \right)}{\sum\limits_{k\in {{D}_{i}}}{{{\tau }^{\alpha }}}\left( {{i}^{*}},k,t \right)}\] where τ(i*,j*,t) is the pheromone on the optimal path (i*,j*) at time t, and $D_i$ is the set of all paths that can be selected by a particle located at point i.

Because the PSPSO algorithm is relatively complex and its time complexity is difficult to study fully through theory, the probability selection function and pheromone update strategy of the PSPSO algorithm are simplified as follows.

Probability selection function: \[P\left( i,j,t \right)=\left\{ \begin{array}{*{35}{l}} \frac{{{\tau }^{\alpha }}\left( i,j,t \right)}{\sum\limits_{k\in {{D}_{i}}}{{{\tau }^{\alpha }}}\left( i,k,t \right)} & if\left( i,j \right)\in {{D}_{i}} \\ 0 & otherwise \\ \end{array} \right.\]

In the probability selection function of the above equation, the effect of heuristic information is not considered, so the analysis of complexity in this section mainly considers the effect of pheromone rate c(i,t) on complexity. From Eq. (19), it can be seen that the probability selection function is a bounded function, which is in line with the definition of the probability selection function in the PSPSO algorithm.

The pheromone update strategy is: \[\tau \left( i,j,{{t}_{m+1}} \right)=\tau \left( i,j,{{t}_{m}} \right)+\mu \left( i,j,{{t}_{m}} \right)\]

The effect of the pheromone residual rate function is not considered in the pheromone update strategy.
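The simplified model of Eqs. (19) and (20) can be sketched as follows (the pheromone matrix and increment values are illustrative only):

```python
# Hedged sketch of the simplified PSPSO model used in the analysis: the
# pheromone-based selection probability of Eq. (19) and the additive,
# evaporation-free pheromone update of Eq. (20).
import numpy as np

def selection_prob(tau_row: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """P(i, j, t) = tau(i, j, t)^alpha / sum_k tau(i, k, t)^alpha over the feasible set D_i."""
    powered = tau_row ** alpha
    return powered / powered.sum()

def update_pheromone(tau: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """tau(i, j, t_{m+1}) = tau(i, j, t_m) + mu(i, j, t_m)."""
    return tau + mu

tau = np.array([[1.0, 2.0, 1.0],
                [0.5, 0.5, 3.0]])                  # illustrative pheromone values
print(selection_prob(tau[0]))                      # selection probabilities from node 0
print(update_pheromone(tau, 0.1 * np.ones_like(tau)))
```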

In the following, the time complexity of the PSPSO algorithm will be initially analyzed based on the simplified model of the algorithm given above.

PSPSO-1: set the number of particle swarms k = 1, ∀(x,y)∈s, s∈S_iter(t) (t = 1,2,⋯); the pheromone increment function μ(i,j,t_m) is variable, and the probability selection function and pheromone update strategy are given by Eqs. (19) and (20).

Let $\xi^{IGACO-1}(t)$ be the stochastic process of the PSPSO-1 algorithm; the expectation of the pheromone rate at time t+1 is $E\{c(i,t+1)\mid \xi^{IGACO-1}(t)\}$. The relationship between this expectation and the pheromone rate c(i,t) is derived and simplified as follows.

The derivation process is as follows: \[\begin{align} & E\left\{ c\left( i,t+1 \right)\left| {{\xi }^{IGACO-1}} \right.\left( t \right) \right\} \\ & =\sum\limits_{\left\langle i-1,y \right\rangle \in D\left( i-1 \right)-\left\{ \left\langle i-1,i \right\rangle \right\}}{\left[ \frac{\tau \left( i-1,y,t \right)}{\sum\limits_{\left\langle i-1,x \right\rangle \in D\left( i-1 \right)}{\tau }\left( i-1,x,t \right)}\cdot \frac{\tau \left( i-1,i,t \right)}{l} \right]} \\ & +\frac{\tau \left( i-1,i,t \right)}{\sum\limits_{\left\langle i-1,x \right\rangle \in D\left( i-1 \right)}{\tau }\left( i-1,x,t \right)}\cdot \frac{\tau \left( i-1,i,t \right)+\mu \left( i-1,i,t \right)}{g} \\ & =\sum\limits_{\left\langle i-1,y \right\rangle \in D\left( i-1 \right)-\left\{ \left\langle i-1,i \right\rangle \right\}}{\left[ \frac{\tau \left( i-1,i,t \right)}{\sum\limits_{\left\langle i-1,x \right\rangle \in D\left( i-1 \right)}{\tau }\left( i-1,x,t \right)}\cdot \frac{\tau \left( i-1,y,t \right)}{l} \right]} \\ & +\frac{\tau \left( i-1,i,t \right)}{\sum\limits_{\left\langle i-1,x \right\rangle \in D\left( i-1 \right)}{\tau }\left( i-1,x,t \right)}\cdot \frac{\tau \left( i-1,i,t \right)+\mu \left( i-1,i,t \right)}{g} \\ \end{align}\]

In Eq. (21), to simplify the expression, let: \[l=\mu \left( i-1,y,t \right)+\sum\limits_{\left\langle i-1,x \right\rangle \in D\left( i-1 \right)}{\tau }\left( i-1,x,t \right)\] \[g=\mu \left( i-1,i,t \right)+\sum\limits_{\left\langle i-1,x \right\rangle \in D\left( i-1 \right)}{\tau }\left( i-1,x,t \right)\] \[h=\sum\limits_{\left\langle i-1,y \right\rangle \in D\left( i-1 \right)-\left\{ \left\langle i-1,i \right\rangle \right\}}{\frac{\tau \left( i-1,y,t \right)\cdot [\mu \left( i-1,i,t \right)-\mu \left( i-1,y,t \right)]}{l}}\]

Based on the concept of the pheromone rate, equation (21) can be further simplified as follows: \[\begin{align} & E\left\{ c\left( i,t+1 \right)\left| {{\xi }^{IGACO-1}} \right.(t) \right\} \\ & =\left[ \frac{\tau \left( i-1,i,t \right)+\mu \left( i-1,i,t \right)}{g}+\sum\limits_{\left\langle i-1,y \right\rangle \in D\left( i-1 \right)-\left\{ \left\langle i-1,i \right\rangle \right\}}{\frac{\tau \left( i-1,y,t \right)}{l}} \right]\cdot c\left( i,t \right) \\ & =\left[ \frac{\tau \left( i-1,i,t \right)+\mu \left( i-1,i,t \right)}{g}+\sum\limits_{\left\langle i-1,y \right\rangle \in D\left( i-1 \right)-\left\{ \left\langle i-1,i \right\rangle \right\}}{\frac{\tau \left( i-1,y,t \right)}{g}+\frac{h}{g}} \right]\cdot c\left( i,t \right) \\ & =\left[ \frac{\mu \left( i-1,i,t \right)+\sum\limits_{\langle i-1,x\rangle \in D(i-1)}{\tau }\left( i-1,i,t \right)}{g}+\frac{h}{g} \right]\cdot c\left( i,t \right) \\ & =\left( \frac{g}{g}+\frac{h}{g} \right)\cdot c\left( i,t \right) \\ & =\left( 1+\frac{h}{g} \right)\cdot c\left( i,t \right) \\ \end{align}\]

From equations (22), (23) and (24), l, g and h are functions of the variables i and t. To represent the relationship between the expectation of the pheromone rate at the next moment and the current pheromone rate concisely, let $\psi \left( i,t \right)=\frac{h}{g}$; then: \[E\left\{ c\left( i,t+1 \right)\left| {{\xi }^{IGACO-1}} \right.\left( t \right) \right\}=\left[ 1+\psi \left( i,t \right) \right]\cdot c\left( i,t \right)\]

From equation (26), for i = 1,2,⋯,n and t = 1,2,⋯, $E\{c(i,t+1)\mid \xi^{IGACO-1}(t)\}$ is proportional to c(i,t). Also, by the definition of the pheromone rate, c(i,t) is proportional to the pheromone increment function μ(i–1,i,t–1), so $E\{c(i,t+1)\mid \xi^{IGACO-1}(t)\}$ is also proportional to the pheromone increment function μ(i–1,i,t–1).

If ψ(i,t)≥ψ>0, then: \[E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}} \right.\left( t-1 \right) \right\}\ge \left( 1+\psi \right)\cdot c\left( i,t-1 \right)\] \[E\left\{ c\left( i,t \right)-c\left( i,t-1 \right)\left| {{\xi }^{IGACO-1}} \right.\left( t-1 \right) \right\}\ge \psi \cdot c\left( i,t-1 \right)\]

Based on the analysis of the convergence rate of the PSPSO algorithm, $\left\{ {{\xi }^{IGACO-1}}\left( t \right) \right\}_{t=0}^{+\infty }$ is an absorbing-state Markov process, so: \[\begin{align} & c\left( i,t \right)\approx E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}} \right.\left( 0 \right),{{\xi }^{IGACO-1}}\left( 1 \right),\cdots ,{{\xi }^{IGACO-1}}\left( t-1 \right) \right\} \\ & =E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}}\left( t-1 \right) \right. \right\} \end{align}\]

From equation (27): \[\begin{align} & E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}} \right.\left( 0 \right),{{\xi }^{IGACO-1}}\left( 1 \right),\cdots ,{{\xi }^{IGACO-1}}\left( t-1 \right) \right\} \\ & =E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}} \right.\left( t-1 \right) \right\} \\ & \ge \left( 1+\psi \right)\cdot c\left( i,t-1 \right) \\ & \approx \left( 1+\psi \right)\cdot E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}} \right.\left( t-2 \right) \right\} \\ & \ge {{\left( 1+\psi \right)}^{2}}\cdot c\left( i,t-2 \right) \\ & \vdots \\ & \ge {{\left( 1+\psi \right)}^{t}}\cdot c(i,0) \\ \end{align}\]

From Eq. (27), the pheromone rate at t = 0 is c(i,0) = 1/n, so Eq. (30) can be further simplified as: \[E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}}\left( 0 \right),{{\xi }^{IGACO-1}}\left( 1 \right),\cdots ,{{\xi }^{IGACO-1}}\left( t-1 \right) \right. \right\}\ge \frac{{{\left( 1+\psi \right)}^{t}}}{n}\] \[t\le {{\log }_{1+\psi }}\left( n\cdot E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}}\left( 0 \right),{{\xi }^{IGACO-1}}\left( 1 \right),\cdots ,{{\xi }^{IGACO-1}}\left( t-1 \right) \right. \right\} \right)\]
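For intuition, consider a purely illustrative numerical instance (the values of n and ψ here are assumptions, not taken from the experiments): assuming the pheromone rate is bounded above by 1, its conditional expectation is also at most 1, so with n = 100 and ψ = 0.1, Eq. (32) gives \[t\le {{\log }_{1.1}}\left( 100 \right)=\frac{\ln 100}{\ln 1.1}\approx 48.3\] i.e., under these assumed values, at most about 48 iterations are needed before the expected pheromone rate approaches its upper level.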

Let clow > c(i,0) = 1/n, and let t0 be such that c(i,t) ≥ clow when t ≥ t0 and c(i,t) ≤ clow when 0 < t ≤ t0. According to equation (32): \[\begin{align} & E\left[ {{t}_{0}}\left| {{\xi }^{IGACO-1}}\left( 0 \right) \right. \right] \\ & \le {{\log }_{1+\psi }}\left( n\cdot E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}}\left( 0 \right),{{\xi }^{IGACO-1}}\left( 1 \right),\cdots ,{{\xi }^{IGACO-1}}\left( t-1 \right) \right. \right\} \right) \\ \end{align}\]

It is also obtained from equation (29): \[\begin{align} & E\left[ {{t}_{0}}\left| {{\xi }^{IGACO-1}}\left( 0 \right) \right. \right]={{\log }_{1+\psi }}\left( n\cdot E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}}\left( t-1 \right) \right. \right\} \right) \\ & ={{\log }_{1+\psi }}\left( n\cdot c\left( i,t \right) \right) \\ & \le {{\log }_{1+\psi }}\left( n\cdot {{c}_{low}} \right) \end{align}\]

From the probabilistic selection function of the PSPSO-1 algorithm, ω(clow,t0) is defined as the upper bound given by Eq. (18), so that: \[\omega \left( {{c}_{low}},{{t}_{0}} \right)=\left[ 1-\phi \left( 0 \right) \right]{{\left[ 1-{{\left( 1-c_{low}^{n} \right)}^{m}} \right]}^{-1}}\]

For t′ = 1,2,⋯ and i = 1,2,⋯,n, let t′ = t0 + t; then c(i,t′) ≥ clow > 0. Since c(i,t) > clow > 0 whenever t > t0, Eq. (35) gives: \[E\lambda \le {{t}_{0}}+\omega \left( {{c}_{low}},{{t}_{0}} \right)\]

Because $E\lambda =\sum\limits_{t=0}^{+\infty }{\left[ (1-\phi (0))\prod\limits_{i=1}^{t}{\left( 1-{{p}_{i}} \right)} \right]}$, and because, by the definition of the process model of the PSPSO algorithm, ξIGACO−1(0) = (S(0),T(0)) with S(0) = {Sbs(0)∪Siter(0)} = ∅ during the initialization of the algorithm, it follows that ϕ(0) = 0. Then $E\lambda =\sum\limits_{t=0}^{+\infty }{\left[ \prod\limits_{i=1}^{t}{\left( 1-{{p}_{i}} \right)} \right]}$.

From ϕ(0) = 0, it follows that P{ξIGACO−1(0)∈Y*} = 0, so $E\lambda =\sum\limits_{t=0}^{+\infty }{\left[ \prod\limits_{i=1}^{t}{\left( 1-{{p}_{i}} \right)} \right]}$ is independent of ξIGACO−1(0). This gives E[λ|ξIGACO−1(0)] = Eλ.

According to equation (35), ω(clow,t0) is independent of ξIGACO−1(0), so E[ω(clow,t0)|ξIGACO−1(0)] = ω(clow,t0). Summarizing the above, we have: \[E\lambda \le E\left[ {{t}_{0}}\left| {{\xi }^{IGACO-1}}\left( 0 \right) \right. \right]+\omega \left( {{c}_{low}},{{t}_{0}} \right)\]

Since the number of particle swarms in the PSPSO-1 algorithm is 1, setting m = 1 in Eq. (35) together with ϕ(0) = 0 gives: \[\omega \left( {{c}_{low}},{{t}_{0}} \right)=c_{low}^{-n}\] Combining Eqs. (34), (37), and (38) then yields: \[E{{\lambda }^{IGACO-1}}\le {{\log }_{\left( 1+\psi \right)}}\left( n\cdot {{c}_{low}} \right)+c_{low}^{-n}\]

From the above equation it can be concluded that the required running time grows as the problem size n becomes larger.
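As a rough illustration of this growth, the following Python sketch evaluates the upper bound of Eq. (39) for several problem sizes; the values chosen for ψ and clow are hypothetical and only serve to show the trend, not to reproduce the paper's experiments.

```python
import math

def pspso1_time_bound(n: int, psi: float, c_low: float) -> float:
    """Upper bound on the expected convergence time from Eq. (39):
    E[lambda] <= log_{1+psi}(n * c_low) + c_low**(-n)."""
    return math.log(n * c_low, 1 + psi) + c_low ** (-n)

# Hypothetical parameter values (assumed, not from the paper): psi = 0.1, c_low = 0.9.
for n in (5, 10, 20, 40):
    print(n, round(pspso1_time_bound(n, psi=0.1, c_low=0.9), 1))
```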

Based on the above analysis of the PSPSO algorithm, it can be concluded that its time complexity increases as the problem size grows and decreases as the number of particle swarms increases.

Space complexity

The following analyzes the space complexity of the particle swarm optimization algorithm based on probabilistic selection. Taking the modular design problem of a large-scale software system of size n as an example, with m particle swarms used in the PSPSO algorithm, an analysis of each step of the algorithm gives the space complexity: \[S\left( n \right)=O\left( {{n}^{2}} \right)+O\left( n\cdot m \right)\]
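The two terms correspond to the pheromone information maintained over the n×n element relations and to the m particle swarms, each carrying a candidate module assignment of length n. The following minimal sketch of this bookkeeping uses assumed data-structure names for illustration only; it is not the paper's implementation.

```python
import numpy as np

def allocate_pspso_state(n: int, m: int):
    """Dominant memory of a PSPSO-style run:
    an n x n pheromone matrix            -> O(n^2) entries,
    plus m particle positions of length n -> O(n * m) entries."""
    pheromone = np.full((n, n), 1.0 / n)          # uniform initial pheromone rate
    positions = np.random.randint(0, n, (m, n))   # module index assigned to each element
    return pheromone, positions

pheromone, positions = allocate_pspso_state(n=200, m=10)
print(pheromone.nbytes + positions.nbytes, "bytes, dominated by the n^2 term")
```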

Algorithm Complexity Experiments

1) Experiment on the relationship between the average number of iterations of the PSPSO algorithm and the size of the problem being solved

Based on the analysis of the time complexity of the PSPSO algorithm, a specific probability selection function is given: \[P(i,j,t)=\frac{{{\left[ \sum\limits_{r=1}^{2}{{{w}_{r}}{{\tau }^{r}}(i,j,t)} \right]}^{\alpha }}}{\sum\limits_{l\in D_{i}^{k}}{{{\left[ \sum\limits_{r=1}^{2}{{{w}_{r}}{{\tau }^{r}}(i,l,t)} \right]}^{\alpha }}}},\quad (i,j)\in {{D}_{i}}\] where the composite pheromone w1·τ1(i,j,t) + w2·τ2(i,j,t) replaces the single pheromone τ(i,j,t), and w1 and w2 are the weight coefficients corresponding to the modular design objectives. The relevant parameters are set as follows: initialization pheromone τ0 = 15, α = 3, β = 7, γ0 = 6, λ = 0.05, A = 1.03 in the pheromone increment function, number of particle swarms m = 10, and w1 = w2 = 0.5. A small sketch of this selection rule follows.
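A minimal Python sketch of the selection rule in Eq. (41); the array layout, candidate set, and uniform initial values below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def selection_probabilities(tau1, tau2, i, candidates, w1=0.5, w2=0.5, alpha=3.0):
    """P(i, j, t) from Eq. (41): the composite pheromone w1*tau1 + w2*tau2 on each
    feasible edge (i, j), raised to alpha and normalised over the candidate set."""
    composite = w1 * tau1[i, candidates] + w2 * tau2[i, candidates]
    weighted = composite ** alpha
    return weighted / weighted.sum()

# Hypothetical usage with the initial pheromone tau0 = 15 on every edge.
n = 6
tau1 = np.full((n, n), 15.0)
tau2 = np.full((n, n), 15.0)
candidates = np.array([1, 2, 4])   # assumed feasible choices D_i for element i = 0
print(selection_probabilities(tau1, tau2, 0, candidates))  # uniform: [1/3, 1/3, 1/3]
```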

The relationship between the average number of iterations of the algorithm and the problem size is shown in Fig. 7. As the problem size increases, the number of iterations the algorithm needs to reach the optimal solution grows gradually; that is, the time required by the particle swarm optimization algorithm based on probabilistic selection to solve the problem also increases with the problem size.

Figure 7. The relationship between problem scale and the average number of iterations

2) Experiment on the relationship between the convergence time of the PSPSO algorithm and the number of particle swarms in the algorithm

Fig. 8 shows the convergence results for different numbers of particle swarms, where the horizontal axis indicates the number of iterations and the vertical axis indicates the convergence time of the resulting modular design algorithm. The figure shows that, within the same number of iterations, increasing the number of particle swarms improves both the solution quality and the convergence time. In summary, within the same number of iterations, an appropriate increase in the number of particle swarms shortens the convergence time of the algorithm, i.e., it reduces the time complexity of the algorithm.

Figure 8. Convergence results for different numbers of particle swarms

Algorithm complexity optimization strategy

From the above experimental analysis, the time complexity of the PSPSO algorithm can be effectively reduced by measures such as increasing the number of particle swarms. In addition, several general techniques are widely used in algorithm complexity optimization. This section focuses on dynamic programming, the greedy algorithm, and the partitioning (divide-and-conquer) method, as references for optimizing the complexity of the PSPSO algorithm in the modular design of large-scale software systems.

Dynamic programming

Dynamic programming solves a complex problem by decomposing it into subproblems and storing the solution to each subproblem to avoid repeated computation. It is suitable for problems with overlapping subproblems and optimal substructure, such as the Fibonacci sequence, the shortest path problem, and the knapsack problem. The time complexity of dynamic programming is usually O(n2) or O(n3), but lower complexity can be achieved with appropriate optimization. For example, the recursive formula for the Fibonacci sequence can be reduced to a time complexity of O(n) by memoized recursion, as sketched below.
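A standard textbook sketch of memoized Fibonacci in Python (not specific to this paper), showing how caching subproblem results turns the exponential naive recursion into an O(n)-time computation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Fibonacci via memoized recursion: each subproblem is solved once,
    so the time complexity drops from O(2^n) to O(n)."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, computed without re-expanding overlapping subproblems
```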

Greedy Algorithm

A greedy algorithm solves an optimization problem by choosing the locally optimal option at each step. Greedy algorithms are suitable for problems that satisfy the greedy-choice property and the optimal-substructure property, such as the minimum spanning tree, single-source shortest paths, and activity selection. The time complexity of a greedy algorithm is usually O(nlog(n)) or O(n2), depending on the specific implementation. For example, Kruskal's algorithm for the minimum spanning tree has a time complexity of O(E log V), where E is the number of edges and V is the number of vertices. The advantage of the greedy algorithm is its simplicity and efficiency, but in some situations it may fail to obtain a globally optimal solution.
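As a concrete instance of the greedy strategy, the following compact Kruskal sketch builds a minimum spanning tree with a union-find structure; sorting the edges dominates, giving O(E log E) = O(E log V):

```python
def kruskal_mst(num_vertices, edges):
    """edges: list of (weight, u, v). Greedily take the lightest edge that
    does not close a cycle, using union-find to track components."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):           # O(E log E) sort drives the complexity
        ru, rv = find(u), find(v)
        if ru != rv:                        # lightest edge joining two components
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total

edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3)]
print(kruskal_mst(4, edges))  # ([(1, 2, 1), (2, 3, 2), (0, 2, 3)], 6)
```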

Partitioning

The partitioning (divide-and-conquer) method solves a complex problem by recursively decomposing it into subproblems. It is suitable for problems whose subproblems are independent and whose solutions can be merged, such as quicksort, merge sort, and large-integer multiplication. The time complexity of the partitioning method is usually O(nlog(n)) or O(n2), depending on the specific decomposition and merging strategies. For example, quicksort has an average time complexity of O(nlog(n)) but may degenerate to O(n2) in the worst case. The advantage of the partitioning method is its efficiency and its flexibility across many different types of problems; a minimal merge sort sketch follows.
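A minimal merge sort sketch in Python illustrating the divide-and-conquer pattern: split the input, sort the halves recursively, and merge them in linear time, for O(nlog(n)) overall:

```python
def merge_sort(items):
    """Divide-and-conquer sort: T(n) = 2T(n/2) + O(n) = O(n log n)."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge step: O(n)
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```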

Conclusion

Based on the analysis and transformation of user requirements, this paper explores a modular design method for large-scale software systems and examines the complexity of the proposed algorithm, together with optimization strategies, from the time and space perspectives. The PSPSO modular design algorithm used in this paper produces modules with tighter internal connections, stronger cohesion, and lower inter-module coupling, and the expected modularity of its design results exceeds that of the hMetis algorithm. With the initialization pheromone set to τ0 = 15, α = 3, β = 7, γ0 = 6, λ = 0.05 and the number of particle swarms m = 10, the number of iterations the algorithm needs to find the optimal solution to the large-scale software system design problem grows gradually with the problem scale; that is, the complexity of the algorithm rises as the problem scale expands. Within the same number of iterations, increasing the number of particle swarms shortens the convergence time and reduces the complexity of the algorithm. The results of this study show that the complexity of the modular design algorithm can be optimized by reasonably controlling the problem size and appropriately increasing the number of particle swarms, thereby improving the convenience and reliability of large-scale software system design.