
Modular design and algorithmic complexity optimization for large-scale software systems

  
17 March 2025

Introduction

Software modularization is a design approach that improves the maintainability, testability and scalability of software systems. By applying top-down decomposition, bottom-up construction and hierarchical design, controlling module coupling, observing the single responsibility principle and designing clean interfaces [1-4], a software system with a clear structure and independent modules can be obtained. These methods help developers to develop, test and maintain software systems more efficiently and improve the quality of software development [5-7].

Program performance is a central concern in software development. Performance optimization reduces time and space complexity and improves a program's responsiveness and resource utilization. The algorithm is the core of a program, and its optimization has the most significant impact on performance [8-11]; reasonable selection and design of algorithms can greatly reduce time and space complexity. In computer science, algorithm complexity refers to the order of magnitude of operations required by an algorithm to solve a problem. As the problem size increases, the complexity of the algorithm grows with it, which slows the program down and consumes more system resources [12-15]. Algorithm complexity is an important index for judging the effectiveness of an algorithm, and the execution efficiency of an algorithm directly affects program speed, making it essential knowledge for programmers [16-19]. Therefore, algorithm complexity optimization is an important means of improving program efficiency.

In this paper, the research starts from the analysis and transformation of user needs. An undirected weighted network is used to build the user demand model and to analyze the interaction relationships between user demands. An expert scoring method based on triangular fuzzy numbers determines the degree of influence between nodes, which serves as the weight of the connecting edges in the weighted network. A K-means clustering algorithm is introduced on top of the weighted network, and user demand weights are obtained by normalizing the comprehensive eigenvalues of the demand nodes, so that user demands can be clustered. Based on QFD, the fuzzy user requirements are summarized, classified and weighted by importance, and transformed into technical requirements that engineers and technicians can understand. A particle swarm optimization algorithm based on probabilistic selection is then used for the modular design of large-scale software systems. Experiments on theoretical data and actual open-source projects compare the modular design results with those of the hMetis algorithm, highlighting the validity and reasonableness of the proposed algorithm for the modular design of large-scale software systems. The complexity of the algorithm is analyzed in terms of its time and space complexity, the relationship between the algorithm's complexity and the problem size and population size is explored, and complexity optimization methods combining dynamic programming, greedy algorithms and partitioning methods are given.

User demand analysis and transformation

The design of large-scale software systems in intelligent software engineering requires knowing the user's demand information for the system in advance, collating and analyzing the user's functional requirements, and then pre-setting the system's common parts and module parts for users to choose from. Requirement acquisition therefore has important application value for the design and development of large-scale software systems, and fully considering user demand in the early stage of system design has become a key factor in whether a large-scale software system can be competitive in the market.

User Requirements Acquisition
Characterization of user needs
Fuzzy

Since the user's demand for the system is usually based on perceptual understanding, it carries ambiguity and uncertainty; users are unlikely to have a comprehensive and detailed understanding of the structure, functions, technical parameters and other aspects of the required system. When describing their customization requirements, they generally give conceptual descriptions based on their intuitive understanding of the system, or only state qualitative requirements on system functions. Moreover, because the natural language used to describe requirements is itself fuzzy, the requirements cannot be described precisely and quantitatively or translated directly into actual technical and functional parameters.

Dynamic

Users' needs constantly change and expand with improvements in economic conditions, environmental changes and the updating of concepts. On the one hand, as the functions of large-scale systems on the market become more diversified, users' original needs change accordingly. On the other hand, users also revise some of their own needs during system design: for example, when a user lacks a comprehensive understanding of the system's attributes and features in the early stages and later finds, during design, that the design contradicts or fails to meet the requirements, the corresponding modifications and adjustments have to be analyzed and handled.

Hiddenness

In many cases users do not express their needs completely and clearly, or leave them implicit in the requirements they do propose. On the one hand, users may consider some requirements of the system unnecessary. On the other hand, since users are not fully aware of the system, their requests may not be accurate enough. If the user's needs are not carefully identified and the design simply follows the user's stated requirements, it will obviously fail to truly satisfy the user.

User demand acquisition methods

To obtain user requirements in the pre-design stage, the main acquisition channels include: market research, the most basic method, which can be divided into inquiry surveys and observation surveys; network media, including online questionnaires and the collection of web-related data; and resource sharing with cooperative enterprises, collecting customer information through channel cooperation and achieving win-win cooperation through shared resources. In the field of intelligent software engineering, with deepening informatization and intelligence, there are more and more ways to acquire user requirements, and the acquired demand information has the advantages of timeliness, accuracy and convenience.

With the rapid expansion of big data, it has become increasingly important for the modular design of large-scale software systems: the modular design process needs big-data thinking to extract, store and analyze relevant external data. The user-system database can also be organized and analyzed during system design, so that user information, system information and behavioral data are analyzed effectively to obtain the user's demand information.

Analysis and Transformation of Needs
Demand model construction based on a weighted network
Weighted Complex Network Definition and Characteristics

A weighted network is an important form of complex network, in which the closeness of the connection between entity nodes is indicated by assigning weights to the edges. Considering the mutual influence between user demands, this paper adopts an undirected weighted network to establish the user demand model [20]. User requirements are abstracted as nodes of the undirected weighted network, and the degree of connection between requirements is abstracted as weighted edges between the nodes, so that the interaction relationships between user requirements can be analyzed. Complex networks are usually characterized by statistical parameters such as node degree, node strength, average path length and clustering coefficient. The features commonly used in weighted networks to describe the density of connections between nodes in a local range are as follows:

Degree

Degree is an important statistical indicator possessed by every node in a network: it is the number of edges connected to the current node, usually denoted $k_i$. Summing the degrees of all nodes in the network and averaging gives $k_{avg}$, the average degree of the network.

If a network $H$ with adjacency matrix $A$ is given, then: \[{{k}_{i}}=\sum\limits_{j=1}^{N}{{{a}_{ij}}}=\sum\limits_{j=1}^{N}{{{a}_{ji}}}\] \[{{k}_{avg}}=\frac{1}{N}\sum\limits_{i=1}^{N}{{{k}_{i}}}=\frac{1}{N}\sum\limits_{i,j=1}^{N}{{{a}_{ij}}}\]

In a directed network, the degree of a node consists of two parts: out-degree and in-degree. The out-degree is the number of edges from the current node pointing to other nodes, while the in-degree is the opposite, the number of edges from other nodes pointing to the current node. Note that in a directed network the average out-degree and the average in-degree are always equal, since every edge contributes exactly once to each total.
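As a quick illustration of the two formulas above, the following sketch (not from the paper; the adjacency matrix is a made-up example) computes $k_i$ and $k_{avg}$ with NumPy:

```python
# Illustrative sketch: degree and average degree of an undirected network
# from its adjacency matrix A, as in the k_i and k_avg formulas above.
import numpy as np

A = np.array([[0, 1, 1, 0],      # hypothetical 4-node undirected network
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])

k = A.sum(axis=1)                 # k_i: row sums of the adjacency matrix
k_avg = k.mean()                  # k_avg = (1/N) * sum_i k_i

print(k)       # [2 3 2 1]
print(k_avg)   # 2.0
```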

Network sparsity

Whether a network is sparse or dense is usually measured by its density. For a network with $N$ nodes and $M$ edges, the density is defined as the ratio of the number of edges actually present in the network to the maximum possible number of edges, so for a network $H$ its density $\rho$ is: \[\rho =\frac{2M}{N\left( N-1 \right)}\]

As equation (3) shows, the "maximum possible number of edges" is the number of edges obtained when every pair of nodes in the network is connected, i.e., $N(N-1)/2$. When the density of the network is particularly small, that is, when the number of edges actually present is far smaller than the number of possible edges, the network is a sparse network; otherwise it is a dense network. The threshold for "far smaller" is still debated in the literature; it is generally accepted that a graph is sparse when its number of edges is much less than the square of its number of nodes.

Average path length

Before discussing the average path length, one question must first be clarified: what is the distance between two points? In a complex network there are usually many paths between two nodes; among them, the shortest path is the one with the fewest edges, and the length of the shortest path is called the distance between the two points.

On this basis, the average path length is defined as the average shortest distance between any pair of nodes in the network, usually denoted $L$. If the distance between nodes $i$ and $j$ is denoted $d_{ij}$, then: \[L=\frac{2}{N\left( N-1 \right)}\sum\limits_{i>j}{{{d}_{ij}}}\]

Note that the above definition applies to undirected unweighted networks; in a weighted network, the length of the path with the smallest sum of edge weights is usually taken as the distance between two points, which differs from the minimum-number-of-edges notion used in unweighted networks.

Network diameter

The network diameter is another statistical measure of network size, usually denoted $D$; in a complex network it is the maximum distance between any two points, i.e.: \[D=\max \left( {{d}_{ij}} \right)\]

In reality, however, complex networks are usually not fully connected, and computing the network diameter from the definition above is clearly unrealistic. Academics therefore often characterize the network diameter in another way, the "effective diameter". Since a large complex network consists of partially connected components and some isolated nodes, the network diameter can be redefined through the fraction of connected node pairs whose distance does not exceed a certain value. If network G contains m connected node pairs whose distance does not exceed a value x, and n connected node pairs in total, this fraction is denoted g(x): \[g\left( x \right)=\frac{m}{n}\]

Typically, if x satisfies g(x–1)<0.9 and g(x)≥0.9, then x is said to be the effective diameter of network G.

Clustering coefficient

The clustering coefficient is usually used to measure the degree of node clustering in a graph. Node clustering means, in short, that neighbors of a node's neighbors are still its own neighbors, in which case the three nodes show a clear clustering phenomenon. It is defined as follows: if the current node $i$ has $k$ edges connecting it to other nodes, then the ratio of the number of edges $E_i$ that actually exist among those $k$ neighbors to the maximum possible number of such edges is the clustering coefficient, i.e.: \[{{C}_{c}}=\frac{2{{E}_{i}}}{k\left( k-1 \right)}\]

If $E_i$ is regarded as counting triangles with the current node as a vertex, an equivalent definition of the clustering coefficient is: \[{{C}_{c}}=\frac{{{T}'}}{T}\] where $T'$ is the number of triangles containing node $i$ and $T$ is the number of connected triples centered at node $i$. A "connected triple centered at node $i$" means three connected nodes with at least two edges among them that are not self-loops.
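The following sketch, which is illustrative rather than part of the paper, computes the statistics just defined (density, average path length, diameter, clustering coefficient and the 90% effective diameter) for a stand-in graph using networkx; in the weighted case the distance-based measures would additionally be given the edge-weight attribute:

```python
# Illustrative sketch of the network statistics defined above for a hypothetical graph.
import networkx as nx

G = nx.karate_club_graph()                       # stand-in example network

rho = nx.density(G)                              # 2M / (N(N-1))
L   = nx.average_shortest_path_length(G)         # mean d_ij over node pairs
D   = nx.diameter(G)                             # max d_ij
C_i = nx.clustering(G)                           # per-node clustering coefficient
C   = nx.average_clustering(G)

# Effective diameter g(x): smallest x with >= 90% of connected pairs within distance x.
dists = [d for _, row in nx.all_pairs_shortest_path_length(G) for d in row.values() if d > 0]
x = 1
while sum(d <= x for d in dists) / len(dists) < 0.9:
    x += 1

print(rho, L, D, C, x)
```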

Triangular fuzzy number assignment

In a weighted network the degree of influence between nodes differs from pair to pair, and existing work does not specifically address how to determine it. In this paper, an expert scoring method based on triangular fuzzy numbers is used to determine the degree of influence between nodes, which serves as the weight of the connecting edges in the weighted network: the linguistic evaluations given by the experts are converted into triangular fuzzy numbers by the rough set method, and the triangular fuzzy numbers are then converted into crisp values.

Determining fuzzy semantic variables and corresponding triangular fuzzy numbers

When using a triangular fuzzy number M(l,m,μ), the corresponding fuzzy semantic variables must first be defined. The fuzzy semantic variables and their triangular fuzzy numbers used in this paper are shown in Table 1.

Semantic variables based on triangular fuzzy numbers

Semantic variable Triangular fuzzy number
l m μ
Very weak influence 0 0 0.1
Weak influence 0 0.1 0.3
General influence 0.3 0.5 0.7
Strong influence 0.7 0.9 1.0
Very strong influence 0.9 1.0 1.0

In the table, l, m and μ represent the minimum possible value, the most likely value and the maximum possible value of the triangular fuzzy number, respectively.

Based on the above analysis, a questionnaire is distributed so that experts give fuzzy semantic evaluations of the degree of influence between nodes according to node characteristics, attributes and other indicators; each evaluation is converted into the corresponding triangular fuzzy number $B_{k}^{t}=b_{k}^{t}(l_{k}^{t},m_{k}^{t},\mu _{k}^{t})$, which represents the evaluation of the strength of the $k$th relationship given by the $t$th expert.

Triangular fuzzy number defuzzification

The scoring data of multiple experts are combined into a single value by averaging: \[{{\bar{b}}_{k}}=\frac{\sum\limits_{t=1}^{T}{b_{k}^{t}\left( l_{k}^{t},m_{k}^{t},\mu _{k}^{t} \right)}}{T}\] where ${{\bar{b}}_{k}}$ denotes the average of all the experts' scores and is still a triangular fuzzy number, and $T$ is the total number of experts.

The degree of influence between nodes is then determined by defuzzification. In this paper, the strength of the influence between nodes is determined using the expectation of the triangular fuzzy number.

First, the membership function of the triangular fuzzy number M(l,m,μ) is defined as \[{{f}_{M}}\left( x \right)=\left\{ \begin{array}{*{35}{l}} {\left( x-l \right)}/{\left( m-l \right)}\; & l\le x\le m \\ {\left( x-\mu \right)}/{\left( m-\mu \right)}\; & m\le x\le \mu \\ 0 & x<l\ \text{or}\ x>\mu \\ \end{array} \right.\]

Its left and right membership functions are then: \[f_{M}^{L}\left( x \right)={\left( x-l \right)}/{\left( m-l \right)}\;,\quad f_{M}^{R}\left( x \right)={\left( x-\mu \right)}/{\left( m-\mu \right)}\;\]

The corresponding inverse functions are: \[\varphi _{M}^{L}\left( y \right)=l+\left( m-l \right)y,\quad \varphi _{M}^{R}\left( y \right)=\mu +\left( m-\mu \right)y\]

Therefore, the expectation of the triangular fuzzy number M(l,m,μ) is: \[\begin{align} & I\left( M \right)=\eta \int_{0}^{1}{\varphi _{M}^{L}}\left( y \right)dy+\left( 1-\eta \right)\int_{0}^{1}{\varphi _{M}^{R}}\left( y \right)dy \\ & =\frac{\eta }{2}\left( l+m \right)+\frac{1-\eta }{2}\left( m+\mu \right) \end{align}\] where 0≤η≤1 is the decision-maker's "optimism-pessimism coefficient". To ensure general applicability and operability, this paper sets η = 0.5, i.e., the decision maker is assumed to be neutral, in which case: \[I\left( M \right)=\frac{l+2m+\mu }{4}\] This is the crisp degree of influence between nodes after defuzzification of the triangular fuzzy number, i.e., the inter-node weight $w_{ij}$.
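A minimal sketch of the scoring-and-defuzzification pipeline described above, assuming the Table 1 label set and hypothetical expert answers; it averages the experts' triangular fuzzy numbers and applies I(M) = (l + 2m + μ)/4 with η = 0.5:

```python
# Hedged sketch: semantic labels -> triangular fuzzy numbers (Table 1) -> expert
# average -> defuzzified edge weight w_ij. Expert answers below are hypothetical.
import numpy as np

TFN = {
    "very weak":   (0.0, 0.0, 0.1),
    "weak":        (0.0, 0.1, 0.3),
    "general":     (0.3, 0.5, 0.7),
    "strong":      (0.7, 0.9, 1.0),
    "very strong": (0.9, 1.0, 1.0),
}

def edge_weight(expert_labels):
    """Average the experts' triangular fuzzy numbers and defuzzify to a crisp weight."""
    tfns = np.array([TFN[lab] for lab in expert_labels])   # one (l, m, mu) row per expert
    l, m, mu = tfns.mean(axis=0)                           # component-wise average b_k
    return (l + 2 * m + mu) / 4.0                          # expectation with eta = 0.5

print(edge_weight(["strong", "very strong", "general"]))   # e.g. weight of one demand pair
```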

K-means based user requirement clustering

Through cluster analysis of user requirements, complex requirements can be simplified and organized into several categories, each containing its own detailed requirements, with low similarity between requirements in different categories and high similarity within the same category. This helps the system design work to proceed accurately, reduces development cost and improves the competitiveness of the system. In this paper, the K-means clustering algorithm is introduced on top of the weighted network model: the user demand weights are obtained by normalizing the comprehensive eigenvalues of the demand nodes, the number of clusters is determined with the elbow method, the initial cluster centers are chosen according to the user demand weights, and the user demands are then clustered [21].
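A hedged sketch of this clustering step, under the assumption that each requirement node is described by placeholder network features and weighted by its demand weight; it scans k with the elbow criterion (within-cluster sum of squared errors) and then runs K-means with the chosen k using scikit-learn:

```python
# Minimal sketch: elbow method over k followed by weighted K-means.
# Feature values and weights are placeholders, not the paper's data.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(20, 5)              # 20 requirement nodes x 5 network features (hypothetical)
weights = np.random.rand(20)           # user-demand weights used to bias the clustering
weights /= weights.sum()

inertias = []
for k in range(1, 9):                  # elbow method: look for the bend in this curve
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X, sample_weight=weights)
    inertias.append(km.inertia_)

best_k = 3                             # read off the elbow (k = 3 in the paper's example)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X, sample_weight=weights)
print(inertias, labels)
```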

QFD-based user requirements transformation

The system design house of quality is the core of quality function deployment [22]. Its role is to convert fuzzy user requirements into technical requirements that engineers and technicians can understand, and then to decompose them, waterfall-style, into system part characteristics, development characteristics and quality characteristics; it is an effective tool for user requirement conversion. The complete system design house of quality is shown in Fig. 1. The user requirements entering the house of quality form a hierarchical list obtained by summarizing, classifying and weighting by importance the discrete and fuzzy user requirements collected from market research, which is the key to implementing QFD.

Figure 1.

Structure of system design house of quality

Example analysis
Analysis of user needs

In recent years, with the increasing maturity of artificial intelligence technology, the design of large-scale software systems has become more and more intelligent, providing users with great convenience. In 2021, despite the overall downturn of the consumer market, survey data showed that household appliances built on large-scale software systems still grew, and sweeping robots equipped with self-cleaning technology occupied the core position of the cleaning-appliance market. Compared with the past, technical innovation has greatly improved and upgraded every function of the sweeper: in navigation and obstacle avoidance, a combination of technologies solves the "chaotic collision" and "missed sweeping" problems common in earlier sweeping robots, while modular design and other AI-oriented large-scale software system design techniques improve the robot's intelligent cleaning ability and route planning and reduce the burden of manual labor; mopping, water tank capacity and other aspects have also been greatly improved. In this paper, drawing on previous industry reports on sweepers and consultation with relevant experts and industry personnel, numerous demands are organized; the detailed user demands are shown in Table 2.

User demand

Demand number Specific demand Demand number Specific demand
SR1 Strong cleaning ability SR11 Voice control
SR2 Monitoring failure information SR12 Time booking
SR3 Clean speed SR13 Small running sound
SR4 Anti-fall collision SR14 Lightness
SR5 Drag integration SR15 Simple operation
SR6 Mobile control SR16 Dump dust box
SR7 Automatic recharge SR17 Automatic cleaning mop
SR8 Standby time SR18 Automatic add cleaner
SR9 Fault prediction SR19 Large dust box
SR10 Price appropriate SR20 Maintenance convenience

For the identified user requirements, triangular fuzzy numbers are first used to convert the semantic evaluations into interval values, and the precise values of the relationships between user requirements are then obtained by defuzzifying the triangular fuzzy numbers. To ensure the accuracy and objectivity of the user demand relationships, users are invited to participate in the evaluation in addition to professional researchers. In this paper, 10 experts and 10 users form an evaluation group that scores the semantic evaluation of the interrelationships between demands from different perspectives; the comprehensive triangular-fuzzy-number evaluation values are shown in Table 3.

User demand triangular fuzzy number comprehensive evaluation value

Demand number SR1 SR2 SR3 SR4 SR17 SR18 SR19 SR20
SR1 1 (0,0.03,0.12) (0.78,0.93,1) (0.32,0.58,0.74) 0 0 0 0
SR2 1 (0,0.08,0.17) (0,0.04,0.19) 0 0 (0.9,1.0,1.0) (0.28,0.47,0.63)
SR3 1 (0.16,0.34,0.58) (0,0.09,0.26) 0 (0,0,0.07) 0
SR4 1 (0.6,0.81,0.93) (0,0.15,0.33) 0 0
⋮
SR17 1 (0,0.05,0.17) 0 0
SR18 1 0 0
SR19 1 (0.54,0.71,0.82)
SR20 1

The comprehensive triangular-fuzzy-number evaluation values given by the expert group are defuzzified according to the formula above to obtain the corresponding exact values; the comprehensive evaluation values of the user demand relationships after defuzzification are shown in Table 4. Each value represents the degree of interaction between two user demands: the larger the value, the greater the degree of mutual influence, and a comprehensive evaluation value of 0 means that the two demands do not affect each other and are independent.

User demand relationship comprehensive evaluation value

Demand number SR1 SR2 SR3 SR4 SR17 SR18 SR19 SR20
SR1 1 0.09 0.93 0.58 0 0 0 0
SR2 1 0.02 0.04 0 0 0.99 0.47
SR3 1 0.37 0.06 0 0.05 0
SR4 1 0.79 0.17 0 0
⋮
SR17 1 0.08 0 0
SR18 1 0 0
SR19 1 0.73
SR20 1

According to the formulas for the attribute eigenvalues of the weighted network, the eigenvalue of each parameter is calculated for every demand node, and the user demand weight is obtained by normalizing the comprehensive eigenvalue; the calculation results are shown in Table 5. By establishing the demand network model, the mutual influence between user demands is taken into account and more accurate demand weights are obtained, laying the foundation for demand clustering and demand transformation.

User requirement network node eigenvalues

Demand node Degree Network sparsity Average path length Network diameter Clustering coefficient Weight
SR1 12 3.51 39.08 0.80 0.55 0.0658
SR2 10 5.21 45.65 0.91 0.50 0.0628
SR3 15 5.91 50.16 0.81 0.52 0.0637
SR4 12 5.24 45.98 0.86 0.54 0.0661
SR5 16 3.83 49.13 0.71 0.43 0.0537
SR6 10 4.17 22.49 0.83 0.50 0.0577
SR7 8 3.13 36.80 0.83 0.40 0.0550
SR8 9 3.62 43.00 0.75 0.48 0.0555
SR9 9 4.93 39.93 0.58 0.37 0.0499
SR10 17 7.08 53.58 0.61 0.40 0.0531
SR11 9 3.31 35.27 0.66 0.46 0.0550
SR12 9 2.86 38.62 0.64 0.40 0.0534
SR13 8 1.30 35.54 0.79 0.41 0.0476
SR14 6 1.59 25.22 0.68 0.37 0.0419
SR15 5 2.01 22.94 0.49 0.33 0.0396
SR16 5 2.21 22.80 0.69 0.39 0.0448
SR17 4 1.47 16.64 0.69 0.37 0.0439
SR18 3 1.31 15.05 0.61 0.37 0.0400
SR19 5 1.72 12.73 0.61 0.31 0.0457
SR20 8 2.67 29.35 0.61 0.34 0.0048
Clustering of user requirements

As Table 5 shows, the demand nodes with the highest weights are SR4, SR1, SR3, SR2 and SR6, corresponding to anti-fall collision, strong cleaning ability, fast cleaning speed, monitoring of failure information, and mobile control; these are the needs users care about most, and prioritizing them in the design of a large-scale software system can provide users with a better experience, help improve the system's market competitiveness and raise user satisfaction. In the user demand clustering process, the top-weighted user demands serve as candidate points for the initial cluster centers. The elbow method is used to determine the number of clusters: the user demand weights are taken as input, the within-cluster sum of squared errors is computed in MATLAB for each number of classes k, and the optimal number of clusters is read off the resulting curve. All user demands are then classified and grouped with the clustering algorithm, which helps to better understand user requirements, grasp the real needs and key pain points of users, and provide strong support and guidance for developing a large-scale software system sweeper that better matches user expectations and needs.

Figure 2 shows the result of processing the demand data with the elbow method; there is an obvious inflection point at k = 3, which corresponds to the optimal number of clusters. According to the weight ordering of the demands in Table 5 and the optimal number of clusters, the initial cluster centers of the K-means algorithm are determined to be demand nodes SR4, SR1 and SR3, and the user demands are clustered with the K-means algorithm; the clustering results are shown in Figure 3. The user requirements fall into three categories: {SR1,SR2,SR3,SR4}, {SR5,SR6,SR7,SR8,SR9,SR10,SR11,SR12} and {SR13,SR14,SR15,SR16,SR17,SR18,SR19,SR20}. The first category comprises functional requirements, which are closest to the actual use of the large-scale software system product and satisfy its basic functionality. The second category comprises emotional requirements, which further increase user satisfaction. The third category comprises additional requirements, important considerations that follow the realization of the system functions.

Figure 2.

Clustering deviation of different K values

Figure 3.

User demand clustering results

Transformation of user needs

When establishing the QFD model, the independence of each product function must be taken into account; this independence is guaranteed by the independence axiom of axiomatic design. Analysis of the home sweeper shows that its main functions are the sweeping and cleaning function, the monitoring and planning function, the intelligent control function, the optimization function and the human-machine interaction function, denoted FR1, FR2, FR3, FR4 and FR5 respectively. The improved QFD is used to build the user requirement-product function house of quality; the degree of correlation between parameters in the requirement-function relationship matrix is represented by the numbers 1, 3, 5, 7 and 9, where a larger value indicates a stronger relationship and 0 indicates no relationship between the two parameters. The quality configuration relationship between the user needs of the household floor sweeper and the product functions is shown in Table 6.

Relationship between User Requirements and Product Quality

Demand node Weight FR1 FR2 FR3 FR4 FR5
SR1 0.0658 7 3 5 9 5
SR2 0.0628 9 7 7 1 9
SR3 0.0637 5 5 3 3 3
SR4 0.0661 3 3 1 3 1
SR5 0.0537 0 0 0 5 0
SR6 0.0577 1 1 9 0 3
SR7 0.0550 3 0 0 2 7
SR8 0.0555 5 0 1 3 3
SR9 0.0499 3 5 1 5 1
SR10 0.0531 1 0 3 7 0
SR11 0.0550 5 3 5 3 0
SR12 0.0534 7 1 7 5 3
SR13 0.0476 3 7 7 3 5
SR14 0.0419 0 0 5 7 0
SR15 0.0396 0 5 3 9 0
SR16 0.0448 1 3 9 3 0
SR17 0.0439 0 1 3 1 3
SR18 0.0400 0 3 1 0 1
SR19 0.0457 3 0 0 3 5
SR20 0.0048 5 3 3 5 7

Table 6 gives the user demand-product function relationship matrix. The absolute importance of each function is calculated from the demand weights and the relationship matrix: for each product function column, the entries are weighted by the demand weights and summed to obtain the absolute importance, which is then normalized to obtain the relative importance; the results are shown in Table 7. The sweeping and cleaning function is the most important in a household sweeper, followed by the intelligent control function and the monitoring and planning function, which need to be emphasized, and finally the optimization function and the human-computer interaction function. The design and implementation of the optimization and human-computer interaction functions can still improve the user experience and satisfaction. The function importance results in Table 7 are consistent with engineering practice and provide solid data support for a modular design of the large-scale software system that meets user needs. A minimal sketch of this weighted-sum calculation is given after Table 7.

Product function importance

Importance FR1 FR2 FR3 FR4 FR5
Absolute importance 3.9543 3.1687 3.4832 2.5738 1.7064
Relative importance 0.2386 0.2438 0.2573 0.1354 0.1249
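A minimal sketch of the importance calculation behind Tables 6 and 7 (only the first four rows of Table 6 are reproduced here, so the printed numbers will not match Table 7):

```python
# Sketch: absolute importance of each function FR_j is the weight-weighted sum of its
# requirement-function column; relative importance is the normalized share.
import numpy as np

w = np.array([0.0658, 0.0628, 0.0637, 0.0661])          # demand weights SR1..SR4 (from Table 5)
R = np.array([[7, 3, 5, 9, 5],                          # SR1 row of the relationship matrix
              [9, 7, 7, 1, 9],                          # SR2
              [5, 5, 3, 3, 3],                          # SR3
              [3, 3, 1, 3, 1]])                         # SR4

absolute = w @ R                                        # weighted column sums -> absolute importance
relative = absolute / absolute.sum()                    # normalization -> relative importance
print(absolute, relative)
```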
Modular design basis
Modular Design System

Modular design gathers the same or similar elements of a system into functionally independent and structurally complete units (modules), and combines these units as required to produce systems that meet the needs of different users. The modular design of a large-scale software system has to go through a series of processes, system requirement analysis, module division, module combination and module evaluation, before it can finally be delivered to users. The resulting modular design system for large-scale software systems is shown in Fig. 4.

Figure 4.

Modular design system

The modular design of a large-scale software system starts by analyzing the conceptual model of the system based on user requirements: the functional structure of the system is decomposed preliminarily through market research, the mapping from function to structure is completed, and the subsequent module division is prepared for. Module division is a comprehensive, multi-attribute optimization process in which the modular design objectives are studied, relevant division guidelines are formulated, and an appropriate module division method is applied. The resulting modules are divided into general modules, special modules and so on; module interfaces are designed and suitable combinations are selected to carry out module combination and obtain the required modular product. Module evaluation is the key step for judging whether the modular design scheme is reasonable: by analyzing indicators such as the rationality of the module's functional structure, modularity and complexity, the advantages and disadvantages of the module scheme are assessed comprehensively so as to arrive at the optimal module division scheme for the system.

Key Elements of Modular Design
Module division

Module partitioning is the first step in modular design, which determines how to decompose the system into a number of relatively independent modules. Common ways of module division include division by function, division by hierarchy and division by domain. The goal of module partitioning is to reasonably allocate system functions to each module, so that each module can independently complete a specific function and the dependency relationship between modules is minimized.

Module Interfaces

A module interface is the bridge for communication between modules; it defines how modules interact. The design of module interfaces should follow the principles of simplicity, stability and ease of use to ensure efficient and reliable communication between modules. Common interface design methods include designs based on abstract classes or interfaces, on events or messages, and on services or APIs.
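As an illustration of abstract-class-based interface design, the following sketch (class names are hypothetical, not from the paper) defines a narrow module contract that concrete modules implement:

```python
# Illustrative sketch: a module interface as an abstract base class.
# Other modules depend only on this contract, keeping coupling low.
from abc import ABC, abstractmethod

class CleaningModule(ABC):              # hypothetical interface for a sweeper sub-module
    @abstractmethod
    def start(self, area_id: str) -> None: ...

    @abstractmethod
    def status(self) -> str: ...

class BrushModule(CleaningModule):      # one possible implementation behind the interface
    def start(self, area_id: str) -> None:
        print(f"sweeping area {area_id}")

    def status(self) -> str:
        return "idle"

BrushModule().start("kitchen")
```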

Module implementation

Module implementation refers to how the functionality of a module is concretely realized. Common approaches include object-oriented programming, functional programming and componentized programming. Object-oriented programming implements module functionality by defining classes and objects, using encapsulation, inheritance and polymorphism to ensure module independence. Functional programming implements module functionality with pure functions and higher-order functions, emphasizing statelessness and immutability. Componentized programming implements module functionality by defining components or plug-ins, supporting modular assembly and dynamic replacement.

Module combinations

Module combination assembles the individual modules into a complete system. Methods of module combination include dependency injection, the mediator pattern, and process- or state-machine-based combination. Dependency injection manages the dependencies between modules through a framework, allowing modules to be combined and replaced dynamically. The mediator pattern manages communication between modules through a mediator object, decoupling the modules while letting them collaborate. Process- or state-machine-based combination manages the execution order and state transitions of modules by defining processes or state machines, achieving orderly composition and control.
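The dependency-injection style of combination can be sketched as follows (the classes are hypothetical); the composed system receives its modules from outside, so any of them can be replaced without changing the system class:

```python
# Minimal sketch of dependency injection as a module-combination method.
class Navigator:
    def plan(self) -> list[str]:
        return ["kitchen", "hall"]

class Cleaner:
    def start(self, area: str) -> None:
        print(f"cleaning {area}")

class SweeperSystem:
    def __init__(self, navigator: Navigator, cleaner: Cleaner):
        self.navigator = navigator      # injected dependencies, not hard-wired imports
        self.cleaner = cleaner

    def run(self) -> None:
        for area in self.navigator.plan():
            self.cleaner.start(area)

SweeperSystem(Navigator(), Cleaner()).run()   # composition root wires the modules together
```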

Modular design algorithm
Particle swarm optimization algorithm based on probabilistic selection
Particle Swarm Optimization Algorithm

Particle swarm optimization is a global stochastic search algorithm based on swarm intelligence that simulates the migration and flocking behavior of bird flocks during foraging [23]. Solving the software modular design problem with particle swarm optimization involves key steps such as coding and initialization and position updating.

Coding and population initialization

For the modular design problem of a software system, each particle represents a feasible modular design solution, represented by a modular design matrix. In the elementary particle swarm algorithm the population is initialized randomly, which makes no use of prior knowledge about the problem, so the initial solutions tend to be poor. In the method network diagram of a software system, methods with direct calling relationships are connected by edges; the more frequent the calling relationship, the closer the two methods are connected. For two methods with only an indirect calling relationship, the minimum number of edges on a path between their vertices measures the degree of connection between them. Therefore, taking the characteristics of the software modular design problem into account, this paper combines probabilistic selection with the shortest-path principle of the graph during initialization, so as to improve the quality of the initial solutions.

The call relationship graph of a software system is expressed as G = (V,R). Assuming that the software system is to be designed as m modules, the m vertices with the largest degrees that are not directly adjacent to one another are selected in graph G as module centers, denoted ${V}'=\{{{{v}'}_{1}},{{{v}'}_{2}},\cdots ,{{{v}'}_{k}},\cdots ,{{{v}'}_{m}}\}(1\le k\le m)$ with V′⊆V. The shortest path from each vertex $v_i$ to each module center is computed as $dist({{v}_{i}},{{{v}'}_{k}})(1\le k\le m)$; by comparing $dist({{v}_{i}},{{{v}'}_{1}}),dist({{v}_{i}},{{{v}'}_{2}}),\cdots ,dist({{v}_{i}},{{{v}'}_{m}})$, the module center to which $v_i$ belongs is selected probabilistically, with shorter paths given higher selection probability.
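A hedged sketch of this initialization, assuming a hypothetical call graph and an inverse-distance selection probability (the paper does not fix the exact probability formula in this section, only that shorter paths are favoured):

```python
# Sketch: pick m high-degree, non-adjacent vertices as module centres, then assign
# every vertex to a centre with probability inversely related to its shortest-path distance.
import random
import networkx as nx

G = nx.erdos_renyi_graph(30, 0.15, seed=1)       # stand-in for a method call graph
m = 3

centers = []                                     # m largest-degree, pairwise non-adjacent vertices
for v, _ in sorted(G.degree, key=lambda kv: -kv[1]):
    if all(not G.has_edge(v, c) for c in centers):
        centers.append(v)
    if len(centers) == m:
        break

dist = dict(nx.all_pairs_shortest_path_length(G))
assignment = {}
for v in G.nodes:
    d = [dist[v].get(c, len(G)) for c in centers]          # shortest path to each centre
    p = [1.0 / (di + 1) for di in d]                       # shorter path -> higher probability
    total = sum(p)
    assignment[v] = random.choices(centers, weights=[pi / total for pi in p])[0]

print(assignment)
```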

Position update method

Assuming that the position of the $w$th particle at iteration $t$ is denoted $Cluster_{M}^{t}=\left[ \begin{matrix} a_{M11}^{t} & a_{M12}^{t} & \cdots & a_{M1n}^{t} \\ a_{M21}^{t} & a_{M22}^{t} & \cdots & a_{M2n}^{t} \\ \vdots & \vdots & \vdots & \vdots \\ a_{Mm1}^{t} & a_{Mm2}^{t} & \cdots & a_{Mmn}^{t} \\ \end{matrix} \right]$, the position code $Cluster_{M}^{t+1}$ at iteration $t+1$ must be determined, i.e., the module to which each method of particle $w$ belongs at iteration $t+1$ must be determined. Suppose the module of method $f_j (1\le j\le n)$ is to be determined, the value in the $i (1\le i\le m)$th row and $j$th column of $Cluster_{M}^{t}$ is 1, and the other values of the $j$th column are 0, i.e., $a_{Mij}^{t}=1$, $a_{M1j}^{t}=0$, $a_{M2j}^{t}=0$, $a_{M3j}^{t}=0,\cdots$, $a_{Mmj}^{t}=0$. From this the matrix is constructed: \[A_{\omega }^{{{t}_{i}}}=\left[ \begin{matrix} \alpha _{\omega 11}^{t} & \alpha _{\omega 12}^{t} & \cdots & 0 & \cdots & \alpha _{\omega 1n}^{t} \\ \alpha _{\omega 21}^{t} & \alpha _{\omega 22}^{t} & \cdots & 0 & \cdots & \alpha _{\omega 2n}^{t} \\ \vdots & \vdots & {} & \vdots & {} & \vdots \\ \alpha _{\omega i1}^{t} & \alpha _{\omega i2}^{t} & \cdots & 1 & \cdots & \alpha _{\omega in}^{t} \\ \vdots & \vdots & {} & \vdots & {} & \vdots \\ \alpha _{\omega m1}^{t} & \alpha _{\omega m2}^{t} & \cdots & 0 & \cdots & \alpha _{\omega mn}^{t} \\ \end{matrix} \right]\]

In total, $m$ such matrices $Cluster_{M}^{{{t}_{i}}}\left( 1\le i\le m \right)$ can be constructed. The fitness value $fitnes{{s}_{M_{i}^{*}}}$ of each position code $Cluster_{M}^{{{t}_{i}}}$ is calculated according to Eq. (15), and the probability of selecting each position code is then calculated from the fitness values: \[pro{{b}_{{{x}_{i}}}}=\frac{fitnes{{s}_{{{x}_{i}}}}}{\sum\limits_{k=1}^{m}{fitnes{{s}_{{{x}_{k}}}}}}\]

Method $f_j$ then selects module $i$ with probability $prob_i$, which determines the module to which $f_j$ belongs; doing this for every method determines the module membership of the $w$th particle, i.e., the value of $Cluster_{M}^{t+1}$.
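A minimal sketch of this roulette-wheel position update; the fitness values are placeholders for the objective of Eq. (15), which is not reproduced in this section:

```python
# Sketch: re-assign one method (column) of the m x n design matrix by sampling a
# module index with probability prob_i = fitness_i / sum_k fitness_k.
import numpy as np

rng = np.random.default_rng(0)

def update_column(cluster: np.ndarray, j: int, fitness: np.ndarray) -> np.ndarray:
    """Re-assign method j by roulette-wheel selection over the m candidate placements."""
    prob = fitness / fitness.sum()                 # prob_i for each candidate placement
    i = rng.choice(len(fitness), p=prob)           # sample the module index
    cluster[:, j] = 0
    cluster[i, j] = 1                              # exactly one 1 per column: f_j belongs to module i
    return cluster

cluster = np.zeros((3, 5), dtype=int)              # hypothetical: 3 modules, 5 methods
cluster[0, :] = 1                                  # start with every method in module 0
fitness_for_method_2 = np.array([0.2, 0.5, 0.3])   # placeholder fitness of the 3 candidates
print(update_column(cluster, 2, fitness_for_method_2))
```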

Convergence analysis

Traditional particle swarm optimization algorithms are not globally convergent because the iterative update formula is too strongly directional. In this paper, we give a global convergence theorem for the hybrid optimization algorithm.

Theorem 1. An iterative evolutionary algorithm converges globally to the optimal solution with probability 1 if it satisfies the following two conditions.

Condition 1. Elite retention is used when comparing the historical optimal solution with the current solution: \[f_{i+1}^{*}=\left\{ \begin{array}{*{35}{l}} f\left( x \right) & f\left( x \right)<f_{i}^{*} \\ f_{i}^{*} & \text{otherwise} \\ \end{array} \right.\] where $f_i^*$ is the optimal value before the $i$th iteration.

Condition 2. The probability of transferring from any non-globally-optimal point (denoted $x_2$) to the corresponding level set $L(x_2) = \{x \mid f(x)<f(x_2), x\in S\}$ is greater than zero, where S is the feasible solution space.

It is easy to see that the probabilistic-selection particle swarm algorithm satisfies the two conditions of Theorem 1, so it converges globally to the optimal solution with probability 1. This can be seen as follows. First, in the elite-retention strategy of the particle swarm algorithm, the solution with the best fitness value is retained as gbest at every iteration, which satisfies Condition 1. Second, the algorithm selects every possible position with a positive probability, and any two positions in the feasible solution space are mutually reachable: at any moment the probability of moving from one position to another is greater than zero, so the algorithm can escape local optima, satisfying Condition 2.
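Condition 1 amounts to the following elite-retention rule (written here for a minimization objective; the values are hypothetical):

```python
# Small sketch of the elite-retention rule of Condition 1: the recorded best
# value only ever improves, which is what the convergence argument relies on.
def retain_elite(f_best: float, f_current: float) -> float:
    """Keep f*_{i+1} = f(x) if it improves on f*_i, otherwise keep f*_i."""
    return f_current if f_current < f_best else f_best

f_star = float("inf")
for f in [5.2, 4.8, 6.1, 4.1]:          # hypothetical fitness trajectory
    f_star = retain_elite(f_star, f)
print(f_star)                            # 4.1
```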

Experiments and Analysis of Results

The experimental analysis in this paper is based on theoretical data and actual open-source projects and includes a comparison with the hMetis algorithm, in order to verify the rationality and effectiveness of the proposed software modular design algorithm. The experimental data are shown in Table 8; the selected software systems all have more than 10,000 lines of code, meeting the standard for large-scale software systems.

Experimental data

Case number Software system name Number of classes Number of methods Number of calls Lines of code
1 Merge 103 138 113 12983
2 Carch 192 291 462 13482
3 Yahoo 231 584 593 15671
4 Bitcoin 576 4083 2194 19834
5 Emule 588 8326 3586 20976
6 Svn 784 9248 5735 25847
7 V8 932 11823 10842 39594
Experimental cases

This section carries out the experimental analysis from two aspects, theoretical data and actual cases, and then compares the design results with other algorithms to verify the reasonableness and effectiveness of the proposed method.

The theoretical test cases are constructed so that they reflect the source-code scale of a large-scale software system while the expected modular structure can still be analyzed by inspection; the proposed algorithm is then used to carry out the modular design, and the two are compared to verify its reasonableness. Among the cases, Merge and Carch are theoretical data. Merge is essentially the merger of two independent programs; under modular design, the two independent programs are obviously divided into their own modules, and the two sub-modules are then split into finer-grained sub-modules based on modularity. Carch is handled with the proposed algorithm in the same way. The experimental results of the design for the software systems in Table 8 are shown in Table 9.

Module division experimental results

Case number Software system name Number of vertices Population size Number of modules
1 Merge 13 15 5
2 Carch 19 12 4
3 Yahoo 31 29 7
4 Bitcoin 138 108 9
5 Emule 283 236 13
6 Svn 302 197 19
7 V8 734 494 134

Real open-source software systems are also analyzed experimentally in this paper, and a number of other open-source systems have been verified through extensive experiments. Unlike the theoretical data, the information of large open-source systems cannot be inspected visually, so the experimental data are obtained with the help of source code analysis tools (e.g., Understand, Sourcenav), including the size of the system's source code, the number of classes and the inter-class calls within the source code, which give the scale and complexity of the experimental data.

Analysis of results

In this section, the modules produced by the experiments above are compared with the design results of the hMetis algorithm to analyze the quality of the proposed design. A good design result balances tight association within each module against low coupling between modules. To show the difference between the two design results intuitively, the results are visualized: the resulting modules and the associations between them are displayed, and the average module degree of the modules produced by the two algorithms is calculated. Taking case 4 as an example, the design results based on hMetis and on the method of this paper are shown in Fig. 5; panels (a) and (b) are the modular design results of the hMetis algorithm and of this paper's algorithm, respectively. The dots represent the resulting modules, and their size indicates how many vertices each module contains. The dots are colored differently to indicate the tightness of connections within the module: the darker the color, the stronger the cohesion, which indicates a better design. The modules are connected by lines of different thickness representing the degree of coupling between modules: the thicker the line, the greater the inter-module coupling, which is undesirable. In Fig. 5(a) the modules appear more uniform in size, but the connecting lines between modules are thicker than in Fig. 5(b), i.e., Fig. 5(a) exhibits a larger degree of inter-module coupling.

Figure 5.

Module division results

The expected modularity values obtained with the two algorithms are shown in Fig. 6. The design results obtained with this paper's algorithm have a higher expected modularity than those based on the hMetis algorithm, i.e., this paper's algorithm is better than hMetis overall, which verifies its reasonableness and effectiveness.

Figure 6.

Comparison of expected modularity

Modular design algorithm complexity analysis and optimization
Algorithm complexity analysis

The purpose of algorithm analysis is to choose the right algorithm to solve a problem, and also to improve the algorithm. The same problem can usually be solved by different algorithms. Usually, evaluating the complexity of an algorithm consists of two main parts: the time complexity and the space complexity of the algorithm [24].

Time complexity

The PSPSO algorithm has been widely applied to many large-scale software system modular design problems, but there are few studies on its time complexity. This section makes an initial exploration of the relationship between the time complexity of the PSPSO algorithm and the problem size and population size.

To facilitate the analysis, the concept of pheromone is introduced into the PSPSO algorithm to characterize the biological properties of the particle population. In this section, the time complexity of the simplified PSPSO algorithm is analyzed based on the pheromone rate, which is a function of the pheromone; the concept is briefly introduced below. For (i*,j*)∈s*, the pheromone rate is: \[c(i,t)=\frac{{{\tau }^{\alpha }}\left( {{i}^{*}},{{j}^{*}},t \right)}{\sum\limits_{k\in {{D}_{i}}}{{{\tau }^{\alpha }}}\left( {{i}^{*}},k,t \right)}\] where τ(i*,j*,t) is the pheromone on the optimal path (i*,j*) at time t, and $D_i$ is the set of all paths that can be selected by a particle located at point i.

Because the PSPSO algorithm is relatively complex and its time complexity is difficult to study fully through theory, the probability selection function and pheromone update strategy of the PSPSO algorithm are simplified as follows.

Probability selection function: \[P\left( i,j,t \right)=\left\{ \begin{array}{*{35}{l}} \frac{{{\tau }^{\alpha }}\left( i,j,t \right)}{\sum\limits_{k\in {{D}_{i}}}{{{\tau }^{\alpha }}}\left( i,k,t \right)} & if\left( i,j \right)\in {{D}_{i}} \\ 0 & otherwise \\ \end{array} \right.\]

In the probability selection function of the above equation, the effect of heuristic information is not considered, so the analysis of complexity in this section mainly considers the effect of pheromone rate c(i,t) on complexity. From Eq. (19), it can be seen that the probability selection function is a bounded function, which is in line with the definition of the probability selection function in the PSPSO algorithm.

The pheromone update strategy is: \[\tau \left( i,j,{{t}_{m+1}} \right)=\tau \left( i,j,{{t}_{m}} \right)+\mu \left( i,j,{{t}_{m}} \right)\]

The effect of the pheromone residual rate function is not considered in the pheromone update strategy.
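The simplified model of Eqs. (19) and (20) can be sketched as follows (the pheromone matrix and increment values are illustrative only):

```python
# Hedged sketch of the simplified PSPSO model used in the analysis: the
# pheromone-based selection probability of Eq. (19) and the additive,
# evaporation-free pheromone update of Eq. (20).
import numpy as np

def selection_prob(tau_row: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """P(i, j, t) = tau(i, j, t)^alpha / sum_k tau(i, k, t)^alpha over the feasible set D_i."""
    powered = tau_row ** alpha
    return powered / powered.sum()

def update_pheromone(tau: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """tau(i, j, t_{m+1}) = tau(i, j, t_m) + mu(i, j, t_m)."""
    return tau + mu

tau = np.array([[1.0, 2.0, 1.0],
                [0.5, 0.5, 3.0]])                  # illustrative pheromone values
print(selection_prob(tau[0]))                      # selection probabilities from node 0
print(update_pheromone(tau, 0.1 * np.ones_like(tau)))
```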

In the following, the time complexity of the PSPSO algorithm will be initially analyzed based on the simplified model of the algorithm given above.

PSPSO-1: set the number of particle swarms k = 1, ∀(x,y)∈s, s∈S_iter(t) (t = 1,2,⋯); the pheromone increment function μ(i,j,t_m) is variable, and the probability selection function and pheromone update strategy are given by Eqs. (19) and (20).

Let $\xi^{IGACO-1}(t)$ be the stochastic process of the PSPSO-1 algorithm; the expectation of the pheromone rate at time t+1 is $E\{c(i,t+1)\mid \xi^{IGACO-1}(t)\}$. The relationship between this expectation and the pheromone rate c(i,t) is derived and simplified as follows.

The derivation process is as follows: \[\begin{align} & E\left\{ c\left( i,t+1 \right)\left| {{\xi }^{IGACO-1}} \right.\left( t \right) \right\} \\ & =\sum\limits_{\left\langle i-1,y \right\rangle \in D\left( i-1 \right)-\left\{ \left\langle i-1,i \right\rangle \right\}}{\left[ \frac{\tau \left( i-1,y,t \right)}{\sum\limits_{\left\langle i-1,x \right\rangle \in D\left( i-1 \right)}{\tau }\left( i-1,x,t \right)}\cdot \frac{\tau \left( i-1,i,t \right)}{l} \right]} \\ & +\frac{\tau \left( i-1,i,t \right)}{\sum\limits_{\left\langle i-1,x \right\rangle \in D\left( i-1 \right)}{\tau }\left( i-1,x,t \right)}\cdot \frac{\tau \left( i-1,i,t \right)+\mu \left( i-1,i,t \right)}{g} \\ & =\sum\limits_{\left\langle i-1,y \right\rangle \in D\left( i-1 \right)-\left\{ \left\langle i-1,i \right\rangle \right\}}{\left[ \frac{\tau \left( i-1,i,t \right)}{\sum\limits_{\left\langle i-1,x \right\rangle \in D\left( i-1 \right)}{\tau }\left( i-1,x,t \right)}\cdot \frac{\tau \left( i-1,y,t \right)}{l} \right]} \\ & +\frac{\tau \left( i-1,i,t \right)}{\sum\limits_{\left\langle i-1,x \right\rangle \in D\left( i-1 \right)}{\tau }\left( i-1,x,t \right)}\cdot \frac{\tau \left( i-1,i,t \right)+\mu \left( i-1,i,t \right)}{g} \\ \end{align}\]

In Eq. (21), to simplify the expression, let: \[l=\mu \left( i-1,y,t \right)+\sum\limits_{\left\langle i-1,x \right\rangle \in D\left( i-1 \right)}{\tau }\left( i-1,x,t \right)\] \[g=\mu \left( i-1,i,t \right)+\sum\limits_{\left\langle i-1,x \right\rangle \in D\left( i-1 \right)}{\tau }\left( i-1,x,t \right)\] \[h=\sum\limits_{\left\langle i-1,y \right\rangle \in D\left( i-1 \right)-\left\{ \left\langle i-1,i \right\rangle \right\}}{\frac{\tau \left( i-1,y,t \right)\cdot [\mu \left( i-1,i,t \right)-\mu \left( i-1,y,t \right)]}{l}}\]

Based on the concept of the pheromone rate, equation (21) can be further simplified as follows: \[\begin{align} & E\left\{ c\left( i,t+1 \right)\left| {{\xi }^{IGACO-1}} \right.(t) \right\} \\ & =\left[ \frac{\tau \left( i-1,i,t \right)+\mu \left( i-1,i,t \right)}{g}+\sum\limits_{\left\langle i-1,y \right\rangle \in D\left( i-1 \right)-\left\{ \left\langle i-1,i \right\rangle \right\}}{\frac{\tau \left( i-1,y,t \right)}{l}} \right]\cdot c\left( i,t \right) \\ & =\left[ \frac{\tau \left( i-1,i,t \right)+\mu \left( i-1,i,t \right)}{g}+\sum\limits_{\left\langle i-1,y \right\rangle \in D\left( i-1 \right)-\left\{ \left\langle i-1,i \right\rangle \right\}}{\frac{\tau \left( i-1,y,t \right)}{g}+\frac{h}{g}} \right]\cdot c\left( i,t \right) \\ & =\left[ \frac{\mu \left( i-1,i,t \right)+\sum\limits_{\langle i-1,x\rangle \in D(i-1)}{\tau }\left( i-1,i,t \right)}{g}+\frac{h}{g} \right]\cdot c\left( i,t \right) \\ & =\left( \frac{g}{g}+\frac{h}{g} \right)\cdot c\left( i,t \right) \\ & =\left( 1+\frac{h}{g} \right)\cdot c\left( i,t \right) \\ \end{align}\]

From equations (22), (23) and (24), l, g and h are functions of the variables i and t. To represent the relationship between the expectation of the pheromone rate at the next moment and the current pheromone rate concisely, let $\psi \left( i,t \right)=\frac{h}{g}$; then: \[E\left\{ c\left( i,t+1 \right)\left| {{\xi }^{IGACO-1}} \right.\left( t \right) \right\}=\left[ 1+\psi \left( i,t \right) \right]\cdot c\left( i,t \right)\]

From equation (26), for i = 1,2,⋯,n and t = 1,2,⋯, $E\{c(i,t+1)\mid \xi^{IGACO-1}(t)\}$ is proportional to c(i,t). Also, by the definition of the pheromone rate, c(i,t) is proportional to the pheromone increment function μ(i–1,i,t–1), so $E\{c(i,t+1)\mid \xi^{IGACO-1}(t)\}$ is also proportional to the pheromone increment function μ(i–1,i,t–1).

If ψ(i,t)≥ψ>0, then: \[E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}} \right.\left( t-1 \right) \right\}\ge \left( 1+\psi \right)\cdot c\left( i,t-1 \right)\] \[E\left\{ c\left( i,t \right)-c\left( i,t-1 \right)\left| {{\xi }^{IGACO-1}} \right.\left( t-1 \right) \right\}\ge \psi \cdot c\left( i,t-1 \right)\]

Based on the analysis of the convergence rate of the PSPSO algorithm, $\left\{ {{\xi }^{IGACO-1}}\left( t \right) \right\}_{t=0}^{+\infty }$ is an absorbing-state Markov process, so: \[\begin{align} & c\left( i,t \right)\approx E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}} \right.\left( 0 \right),{{\xi }^{IGACO-1}}\left( 1 \right),\cdots ,{{\xi }^{IGACO-1}}\left( t-1 \right) \right\} \\ & =E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}}\left( t-1 \right) \right. \right\} \end{align}\]

From equation (27): \[\begin{align} & E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}} \right.\left( 0 \right),{{\xi }^{IGACO-1}}\left( 1 \right),\cdots ,{{\xi }^{IGACO-1}}\left( t-1 \right) \right\} \\ & =E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}} \right.\left( t-1 \right) \right\} \\ & \ge \left( 1+\psi \right)\cdot c\left( i,t-1 \right) \\ & \approx \left( 1+\psi \right)\cdot E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}} \right.\left( t-2 \right) \right\} \\ & \ge {{\left( 1+\psi \right)}^{2}}\cdot c\left( i,t-2 \right) \\ & \vdots \\ & \ge {{\left( 1+\psi \right)}^{t}}\cdot c(i,0) \\ \end{align}\]

From Eq. (27), the pheromone rate at t = 0 is c(i,0) = 1/n, so Eq. (30) can be further simplified as: \[E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}}\left( 0 \right),{{\xi }^{IGACO-1}}\left( 1 \right),\cdots ,{{\xi }^{IGACO-1}}\left( t-1 \right) \right. \right\}\ge \frac{{{\left( 1+\psi \right)}^{t}}}{n}\] \[t\le {{\log }_{1+\psi }}\left( n\cdot E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}}\left( 0 \right),{{\xi }^{IGACO-1}}\left( 1 \right),\cdots ,{{\xi }^{IGACO-1}}\left( t-1 \right) \right. \right\} \right)\]
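For intuition, consider a purely illustrative numerical instance (the values of n and ψ here are assumptions, not taken from the experiments): assuming the pheromone rate is bounded above by 1, its conditional expectation is also at most 1, so with n = 100 and ψ = 0.1, Eq. (32) gives \[t\le {{\log }_{1.1}}\left( 100 \right)=\frac{\ln 100}{\ln 1.1}\approx 48.3\] i.e., under these assumed values, at most about 48 iterations are needed before the expected pheromone rate approaches its upper level.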

Let clow > c(i,0) = 1/n, and let t0 be such that c(i,t) ≥ clow when t ≥ t0 and c(i,t) ≤ clow when 0 < t ≤ t0. According to equation (32): \[\begin{align} & E\left[ {{t}_{0}}\left| {{\xi }^{IGACO-1}}\left( 0 \right) \right. \right] \\ & \le {{\log }_{1+\psi }}\left( n\cdot E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}}\left( 0 \right),{{\xi }^{IGACO-1}}\left( 1 \right),\cdots ,{{\xi }^{IGACO-1}}\left( t-1 \right) \right. \right\} \right) \\ \end{align}\]

It is also obtained from equation (29): \[\begin{align} & E\left[ {{t}_{0}}\left| {{\xi }^{IGACO-1}}\left( 0 \right) \right. \right]={{\log }_{1+\psi }}\left( n\cdot E\left\{ c\left( i,t \right)\left| {{\xi }^{IGACO-1}}\left( t-1 \right) \right. \right\} \right) \\ & ={{\log }_{1+\psi }}\left( n\cdot c\left( i,t \right) \right) \\ & \le {{\log }_{1+\psi }}\left( n\cdot {{c}_{low}} \right) \end{align}\]

From the probabilistic selection function of the PSPSO-1 algorithm, ω(clow,t0) is defined as the upper bound given by Eq. (18), so that: \[\omega \left( {{c}_{low}},{{t}_{0}} \right)=\left[ 1-\phi \left( 0 \right) \right]{{\left[ 1-{{\left( 1-c_{low}^{n} \right)}^{m}} \right]}^{-1}}\]

For t′ = 1,2,⋯ and i = 1,2,⋯,n, let t′ = t0 + t; then c(i,t′) ≥ clow > 0. Since c(i,t) > clow > 0 whenever t > t0, Eq. (35) gives: \[E\lambda \le {{t}_{0}}+\omega \left( {{c}_{low}},{{t}_{0}} \right)\]

Because $E\lambda =\sum\limits_{t=0}^{+\infty }{\left[ (1-\phi (0))\prod\limits_{i=1}^{t}{\left( 1-{{p}_{i}} \right)} \right]}$, and because, by the definition of the process model of the PSPSO algorithm, ξIGACO−1(0) = (S(0),T(0)) with S(0) = {Sbs(0)∪Siter(0)} = ∅ during the initialization of the algorithm, it follows that ϕ(0) = 0. Then $E\lambda =\sum\limits_{t=0}^{+\infty }{\left[ \prod\limits_{i=1}^{t}{\left( 1-{{p}_{i}} \right)} \right]}$.

From ϕ(0) = 0, it follows that P{ξIGACO−1(0)∈Y*} = 0, so $E\lambda =\sum\limits_{t=0}^{+\infty }{\left[ \prod\limits_{i=1}^{t}{\left( 1-{{p}_{i}} \right)} \right]}$ is independent of ξIGACO−1(0). This gives E[λ|ξIGACO−1(0)] = Eλ.

According to equation (35), ω(clow,t0) is independent of ξIGACO−1(0), so E[ω(clow,t0)|ξIGACO−1(0)] = ω(clow,t0). Summarizing the above, we have: \[E\lambda \le E\left[ {{t}_{0}}\left| {{\xi }^{IGACO-1}}\left( 0 \right) \right. \right]+\omega \left( {{c}_{low}},{{t}_{0}} \right)\]

Since the number of particle swarms in the PSPSO-1 algorithm is 1, setting m = 1 in Eq. (35) together with ϕ(0) = 0 gives: \[\omega \left( {{c}_{low}},{{t}_{0}} \right)=c_{low}^{-n}\] Combining Eqs. (34), (37), and (38) then yields: \[E{{\lambda }^{IGACO-1}}\le {{\log }_{\left( 1+\psi \right)}}\left( n\cdot {{c}_{low}} \right)+c_{low}^{-n}\]

From the above equation it can be concluded that the required running time grows as the problem size n becomes larger.
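As a rough illustration of this growth, the following Python sketch evaluates the upper bound of Eq. (39) for several problem sizes; the values chosen for ψ and clow are hypothetical and only serve to show the trend, not to reproduce the paper's experiments.

```python
import math

def pspso1_time_bound(n: int, psi: float, c_low: float) -> float:
    """Upper bound on the expected convergence time from Eq. (39):
    E[lambda] <= log_{1+psi}(n * c_low) + c_low**(-n)."""
    return math.log(n * c_low, 1 + psi) + c_low ** (-n)

# Hypothetical parameter values (assumed, not from the paper): psi = 0.1, c_low = 0.9.
for n in (5, 10, 20, 40):
    print(n, round(pspso1_time_bound(n, psi=0.1, c_low=0.9), 1))
```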

Based on the above analysis of the PSPSO algorithm, it can be concluded that its time complexity increases as the problem size grows and decreases as the number of particle swarms increases.

Space complexity

The following analyzes the space complexity of the particle swarm optimization algorithm based on probabilistic selection. Taking the modular design problem of a large-scale software system of size n as an example, with m particle swarms used in the PSPSO algorithm, an analysis of each step of the algorithm gives the space complexity: \[S\left( n \right)=O\left( {{n}^{2}} \right)+O\left( n\cdot m \right)\]
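The two terms correspond to the pheromone information maintained over the n×n element relations and to the m particle swarms, each carrying a candidate module assignment of length n. The following minimal sketch of this bookkeeping uses assumed data-structure names for illustration only; it is not the paper's implementation.

```python
import numpy as np

def allocate_pspso_state(n: int, m: int):
    """Dominant memory of a PSPSO-style run:
    an n x n pheromone matrix            -> O(n^2) entries,
    plus m particle positions of length n -> O(n * m) entries."""
    pheromone = np.full((n, n), 1.0 / n)          # uniform initial pheromone rate
    positions = np.random.randint(0, n, (m, n))   # module index assigned to each element
    return pheromone, positions

pheromone, positions = allocate_pspso_state(n=200, m=10)
print(pheromone.nbytes + positions.nbytes, "bytes, dominated by the n^2 term")
```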

Algorithm Complexity Experiments

1) Experiment on the relationship between the average number of iterations of the PSPSO algorithm and the size of the problem being solved

Based on the analysis of the time complexity of the PSPSO algorithm, a specific probability selection function is given: \[P(i,j,t)=\frac{{{\left[ \sum\limits_{r=1}^{2}{{{w}_{r}}{{\tau }^{r}}(i,j,t)} \right]}^{\alpha }}}{\sum\limits_{l\in D_{i}^{k}}{{{\left[ \sum\limits_{r=1}^{2}{{{w}_{r}}{{\tau }^{r}}(i,l,t)} \right]}^{\alpha }}}},\quad (i,j)\in {{D}_{i}}\] where the composite pheromone w1·τ1(i,j,t) + w2·τ2(i,j,t) replaces the single pheromone τ(i,j,t), and w1 and w2 are the weight coefficients corresponding to the modular design objectives. The relevant parameters are set as follows: initialization pheromone τ0 = 15, α = 3, β = 7, γ0 = 6, λ = 0.05, A = 1.03 in the pheromone increment function, number of particle swarms m = 10, and w1 = w2 = 0.5. A small sketch of this selection rule follows.
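A minimal Python sketch of the selection rule in Eq. (41); the array layout, candidate set, and uniform initial values below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def selection_probabilities(tau1, tau2, i, candidates, w1=0.5, w2=0.5, alpha=3.0):
    """P(i, j, t) from Eq. (41): the composite pheromone w1*tau1 + w2*tau2 on each
    feasible edge (i, j), raised to alpha and normalised over the candidate set."""
    composite = w1 * tau1[i, candidates] + w2 * tau2[i, candidates]
    weighted = composite ** alpha
    return weighted / weighted.sum()

# Hypothetical usage with the initial pheromone tau0 = 15 on every edge.
n = 6
tau1 = np.full((n, n), 15.0)
tau2 = np.full((n, n), 15.0)
candidates = np.array([1, 2, 4])   # assumed feasible choices D_i for element i = 0
print(selection_probabilities(tau1, tau2, 0, candidates))  # uniform: [1/3, 1/3, 1/3]
```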

The relationship between the average number of iterations of the algorithm and the problem size is shown in Fig. 7. As the problem size increases, the number of iterations the algorithm needs to reach the optimal solution grows gradually; that is, the time required by the particle swarm optimization algorithm based on probabilistic selection to solve the problem also increases with the problem size.

Figure 7. The relationship between problem scale and the average number of iterations

2) Experiment on the relationship between the convergence time of the PSPSO algorithm and the number of particle swarms in the algorithm

Fig. 8 shows the convergence results for different numbers of particle swarms, where the horizontal axis indicates the number of iterations and the vertical axis indicates the convergence time of the resulting modular design algorithm. The figure shows that, within the same number of iterations, increasing the number of particle swarms improves both the solution quality and the convergence time. In summary, within the same number of iterations, an appropriate increase in the number of particle swarms shortens the convergence time of the algorithm, i.e., it reduces the time complexity of the algorithm.

Figure 8. Convergence results for different numbers of particle swarms

Algorithm complexity optimization strategy

From the above experimental analysis, the time complexity of the PSPSO algorithm can be effectively reduced by measures such as increasing the number of particle swarms. In addition, several general techniques are widely used in algorithm complexity optimization. This section focuses on dynamic programming, the greedy algorithm, and the partitioning (divide-and-conquer) method, as references for optimizing the complexity of the PSPSO algorithm in the modular design of large-scale software systems.

Dynamic programming

Dynamic programming solves a complex problem by decomposing it into subproblems and storing the solution to each subproblem to avoid repeated computation. It is suitable for problems with overlapping subproblems and optimal substructure, such as the Fibonacci sequence, the shortest path problem, and the knapsack problem. The time complexity of dynamic programming is usually O(n2) or O(n3), but lower complexity can be achieved with appropriate optimization. For example, the recursive formula for the Fibonacci sequence can be reduced to a time complexity of O(n) by memoized recursion, as sketched below.
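A standard textbook sketch of memoized Fibonacci in Python (not specific to this paper), showing how caching subproblem results turns the exponential naive recursion into an O(n)-time computation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Fibonacci via memoized recursion: each subproblem is solved once,
    so the time complexity drops from O(2^n) to O(n)."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(50))  # 12586269025, computed without re-expanding overlapping subproblems
```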

Greedy Algorithm

A greedy algorithm solves an optimization problem by choosing the locally optimal option at each step. Greedy algorithms are suitable for problems that satisfy the greedy-choice property and the optimal-substructure property, such as the minimum spanning tree, single-source shortest paths, and activity selection. The time complexity of a greedy algorithm is usually O(nlog(n)) or O(n2), depending on the specific implementation. For example, Kruskal's algorithm for the minimum spanning tree has a time complexity of O(E log V), where E is the number of edges and V is the number of vertices. The advantage of the greedy algorithm is its simplicity and efficiency, but in some situations it may fail to obtain a globally optimal solution.
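As a concrete instance of the greedy strategy, the following compact Kruskal sketch builds a minimum spanning tree with a union-find structure; sorting the edges dominates, giving O(E log E) = O(E log V):

```python
def kruskal_mst(num_vertices, edges):
    """edges: list of (weight, u, v). Greedily take the lightest edge that
    does not close a cycle, using union-find to track components."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):           # O(E log E) sort drives the complexity
        ru, rv = find(u), find(v)
        if ru != rv:                        # lightest edge joining two components
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
    return mst, total

edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3)]
print(kruskal_mst(4, edges))  # ([(1, 2, 1), (2, 3, 2), (0, 2, 3)], 6)
```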

Partitioning

The partitioning (divide-and-conquer) method solves a complex problem by recursively decomposing it into subproblems. It is suitable for problems whose subproblems are independent and whose solutions can be merged, such as quicksort, merge sort, and large-integer multiplication. The time complexity of the partitioning method is usually O(nlog(n)) or O(n2), depending on the specific decomposition and merging strategies. For example, quicksort has an average time complexity of O(nlog(n)) but may degenerate to O(n2) in the worst case. The advantage of the partitioning method is its efficiency and its flexibility across many different types of problems; a minimal merge sort sketch follows.
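A minimal merge sort sketch in Python illustrating the divide-and-conquer pattern: split the input, sort the halves recursively, and merge them in linear time, for O(nlog(n)) overall:

```python
def merge_sort(items):
    """Divide-and-conquer sort: T(n) = 2T(n/2) + O(n) = O(n log n)."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge step: O(n)
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```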

Conclusion

Based on the analysis and transformation of user requirements, this paper explores a modular design method for large-scale software systems and examines the complexity of the proposed algorithm, together with optimization strategies, from the time and space perspectives. The PSPSO modular design algorithm used in this paper produces modules with tighter internal connections, stronger cohesion, and lower inter-module coupling, and the expected modularity of its design results exceeds that of the hMetis algorithm. With the initialization pheromone set to τ0 = 15, α = 3, β = 7, γ0 = 6, λ = 0.05 and the number of particle swarms m = 10, the number of iterations the algorithm needs to find the optimal solution to the large-scale software system design problem grows gradually with the problem scale; that is, the complexity of the algorithm rises as the problem scale expands. Within the same number of iterations, increasing the number of particle swarms shortens the convergence time and reduces the complexity of the algorithm. The results of this study show that the complexity of the modular design algorithm can be optimized by reasonably controlling the problem size and appropriately increasing the number of particle swarms, thereby improving the convenience and reliability of large-scale software system design.