Optimization Design of College Teaching Reform Paths in the Context of Big Data Mining-Driven High-Quality Development of Commerce and Circulation Based on Big Data Mining

Establishing educational reform projects is an important way to encourage university teachers to carry out research on education and teaching and to improve the quality of university education [1]. An education reform project must fit the actual conditions of the university and be planned together with, and supported by, the school's overall reform. Starting from the school's talent cultivation goals, education reform work should be considered and planned at the level of the whole university, so that the strengths of all parties are pooled to advance education reform research. Each school has its own philosophy and characteristics, and its positioning and development concepts differ accordingly, so the actual management of education reform projects also varies considerably [2-4]. With the rapid development of the communication and computer industries, the concept of big data has gained favor with government, society, and researchers. For colleges and universities, the significance of the big data era lies in a change of ideas [5]. Applying big data concepts in colleges and universities can make educational management, decision-making, and evaluation more intelligent [6]. Against this background, in order to improve the efficiency of managing education and teaching reform research projects and to give full play to the guiding and service functions of project management, a number of universities have carried out research on building informatization platforms for education reform project management [7-8].

Higher education informatization is an effective way to promote reform and innovation in higher education and to improve its quality, and it is the innovation frontier of education informatization. Future work should focus on the in-depth integration of information technology and higher education; on modernizing educational content, teaching means, and methods; on innovating the modes of talent training, research organization, and social service; on promoting cultural heritage and innovation; and on improving the overall quality of higher education [9-11]. University teaching managers need to make active use of advanced information technology to carry out educational and teaching reform and management in innovative ways. In particular, under the concept of big data, they should collect and use educational and teaching data to raise the level of educational management, guide the school's educational and teaching work, and continuously improve the quality of talent cultivation [12-14].

Educational reform projects are a key link in, and an important means of, educational and teaching reform in colleges and universities, and the level at which they are managed affects how that reform develops. The department that manages education reform projects organizes the declaration and completion of various projects every year and holds a library of projects from past years. These projects are the results of the school's teaching reform and a store of experience for guiding its education and teaching reform [15-17]. However, project declaration and related work have long been carried out mostly on paper, so most data are stored in scattered paper records. Data summary and analysis are limited to project names, and the form, content, mode, results, and other aspects of each study cannot be summarized and analyzed comprehensively and effectively. Moreover, with the incentives of education reform policy, teachers' enthusiasm for education reform keeps rising and the number of declared projects increases year by year. The pressure on effective project management has therefore grown significantly, and it is necessary to raise the level of informatization of project management, make active use of project management data and information, and effectively improve management efficiency [18-20].

First, factor analysis is systematically reviewed: based on the mathematical model of the factor analysis method, its computational characteristics are summarized, the computational process and steps are sorted out, and the correlations between the factor analysis variables are studied. Factor analysis is then used, following a defined technical route, to mine and analyze student achievement data in order to discover the shortcomings of current teaching in colleges and universities. Next, the study explores the construction of a practical teaching system of “basic interconnection, hierarchical progression, integration of competition and innovation, and comprehensive leapfrogging” within the professional group, and the building of a “five-in-one, virtual and real” cross-professional integrated simulation training center. At the same time, the study introduces the “teaching factory” model in the professional group, deepens the “combination of engineering and learning, school-enterprise cooperation”, and innovates the practical teaching model. On this basis, the article proposes a teaching quality assessment model in which the fireworks algorithm (FWA) optimizes k-means clustering. The fireworks algorithm, which can balance global and local search, is run first, and its results are used as the initial cluster centroids of the k-means algorithm, which addresses the tendency of k-means to fall into local optima. Finally, based on the results of students majoring in commerce and circulation at one university, the FWA-optimized k-means algorithm is used to achieve accurate and effective cluster segmentation of innovation education and to explore the relationship between college students' innovation education and course teaching.

2 Mathematical modeling of factor analysis and geometric interpretation

2.1 Raw data and correlation matrix

To study an object using factor analysis is to study the underlying relationships between its attributes. The raw data are the sample values. Suppose there are two random variables $x$ and $y$, representing two attributes A and B, whose values are measured on $n$ specimens:
$$\vec{x} = (x_1, x_2, \cdots, x_n) \tag{1}$$
$$\vec{y} = (y_1, y_2, \cdots, y_n) \tag{2}$$

The samples are first standardized. The means and variances are calculated as follows:
$$\begin{array}{l} \bar{x} = \frac{1}{n}\sum\limits_{i=1}^{n} x_i \\ \bar{y} = \frac{1}{n}\sum\limits_{i=1}^{n} y_i \\ \sigma_x^2 = \frac{1}{n}\sum\limits_{i=1}^{n} (x_i - \bar{x})^2 \\ \sigma_y^2 = \frac{1}{n}\sum\limits_{i=1}^{n} (y_i - \bar{y})^2 \end{array} \tag{3}$$

Let:
$$x_i^\prime = \frac{x_i - \bar{x}}{\sigma_x},\quad y_i^\prime = \frac{y_i - \bar{y}}{\sigma_y},\quad i = 1, 2, \cdots, n \tag{4}$$

The standardized samples satisfy the following conditions:
$$\overline{x^\prime} = \frac{1}{n}\sum\limits_{i=1}^{n} x_i^\prime = 0,\quad \overline{y^\prime} = \frac{1}{n}\sum\limits_{i=1}^{n} y_i^\prime = 0 \tag{5}$$
$$\sigma_{x^\prime}^2 = \frac{1}{n}\sum\limits_{i=1}^{n} {x_i^\prime}^2 = 1,\quad \sigma_{y^\prime}^2 = \frac{1}{n}\sum\limits_{i=1}^{n} {y_i^\prime}^2 = 1 \tag{6}$$

Here, $\vec{x}$ and $\vec{y}$ are still used to denote the standardized samples. Their variances and correlation coefficient can then be calculated as follows, where the prime denotes the transpose:
$$\left\{ \begin{array}{l} \sigma_x^2 = \frac{1}{n}\sum\limits_{i=1}^{n} x_i^2 = \frac{1}{n}\vec{x}^\prime \vec{x} = 1 \\ \sigma_y^2 = \frac{1}{n}\sum\limits_{i=1}^{n} y_i^2 = \frac{1}{n}\vec{y}^\prime \vec{y} = 1 \\ r_{xy} = \frac{1}{n}\sum\limits_{i=1}^{n} x_i y_i = \frac{1}{n}\vec{x}^\prime \vec{y} \end{array} \right. \tag{7}$$

It follows that if the random variables $\vec{x}$ and $\vec{y}$ are uncorrelated, then $r_{xy} = 0$, which is algebraically equivalent to a zero inner product, $\vec{x}^\prime \vec{y} = 0$, and geometrically means that the two vectors are orthogonal.
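The two-variable case can be checked numerically. The following is a minimal sketch, assuming NumPy and using made-up measurements (`x_raw` and `y_raw` are hypothetical, not data from this study): it standardizes both samples with the divisor $n$, then computes the correlation coefficient as the scaled inner product and as the cosine of the angle between the two vectors.

```python
import numpy as np

def standardize(v):
    """Center v and scale by the population standard deviation (divisor n)."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()  # np.std divides by n by default (ddof=0)

# Hypothetical measurements of two attributes on n = 6 specimens.
x_raw = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 8.0])
y_raw = np.array([1.0, 3.0, 2.0, 5.0, 6.0, 9.0])

x = standardize(x_raw)
y = standardize(y_raw)
n = x.size

var_x = x @ x / n   # (1/n) x'x, equals 1 after standardization
r_xy = x @ y / n    # (1/n) x'y, the correlation coefficient
# Both vectors have length sqrt(n), so r_xy is also the cosine of their angle.
cos_angle = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
```

Because both standardized vectors have length $\sqrt{n}$, a zero correlation corresponds to a right angle between them, which is the geometric reading used in the rest of this section.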

For $n$ samples with $m$ variables each, the original data matrix is as follows:
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \cdots & \cdots & \cdots & \cdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix} = \begin{bmatrix} \vec{x}_1, \vec{x}_2, \cdots, \vec{x}_m \end{bmatrix} \tag{8}$$

The column vectors on the right-hand side of the equation are:
$$\vec{x}_j = (x_{1j}, x_{2j}, \cdots, x_{nj})^\prime,\quad j = 1, 2, \cdots, m \tag{9}$$

Each $\vec{x}_j$ holds the observations of the $j$-th variable on the $n$ samples and can be viewed as a point or vector in $n$-dimensional Euclidean space. The relationships between the original variables are studied by examining the relative positions of these $m$ points or vectors.

If the sample data are standardized, i.e., $X$ is a standardized matrix, then:
$$\bar{x}_j = \frac{1}{n}\sum\limits_{i=1}^{n} x_{ij} = 0,\quad j = 1, 2, \cdots, m \tag{10}$$
$$\sigma_j^2 = \frac{1}{n}\sum\limits_{i=1}^{n} x_{ij}^2 = \frac{1}{n}\vec{x}_j^\prime \vec{x}_j = 1,\quad j = 1, 2, \cdots, m \tag{11}$$

Then the correlation coefficient between $\vec{x}_j$ and $\vec{x}_k$ is:
$$r_{jk} = \frac{1}{n}\sum\limits_{i=1}^{n} x_{ij} x_{ik} = \frac{1}{n}\vec{x}_j^\prime \vec{x}_k,\quad j, k = 1, 2, \cdots, m \tag{12}$$

The correlation coefficient matrix $R$ consists of the correlation coefficients between the $m$ variables:
$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1m} \\ r_{21} & r_{22} & \cdots & r_{2m} \\ \cdots & \cdots & \cdots & \cdots \\ r_{m1} & r_{m2} & \cdots & r_{mm} \end{bmatrix} = \frac{1}{n}X^\prime X \tag{13}$$

The correlation coefficient matrix $R$ is symmetric and at least positive semi-definite, which means that all its eigenvalues are non-negative.
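These properties can be verified on a small example. The sketch below assumes NumPy and uses randomly generated, purely illustrative data in which one column is deliberately built as a linear combination of two others, so that the matrix is only semi-definite (its smallest eigenvalue is zero up to rounding).

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, illustrative data only
n, m = 50, 4
raw = rng.normal(size=(n, m))
raw[:, 3] = raw[:, 0] + 0.5 * raw[:, 1]    # column 4 depends on columns 1 and 2

# Column-wise standardization with divisor n, as in Eqs. (10) and (11).
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)

R = X.T @ X / n                            # Eq. (13): R = (1/n) X'X

# eigvalsh is the eigenvalue routine for symmetric matrices.
eigenvalues = np.linalg.eigvalsh(R)
```

The diagonal of `R` is all ones, so the eigenvalues sum to $m$; the exact linear dependence shows up as one eigenvalue that is numerically zero rather than strictly positive.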

The correlation coefficient matrix is the starting point of the factor analysis method, and an important part of factor analysis is the study of the structure of the correlation matrix [21]. Factor analysis also often involves the correlation coefficient matrix between two sets of variables. Suppose that, in addition to the previous $m$ random variables, there are another $p$ random variables, with data matrix:
$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1p} \\ y_{21} & y_{22} & \cdots & y_{2p} \\ \cdots & \cdots & \cdots & \cdots \\ y_{n1} & y_{n2} & \cdots & y_{np} \end{bmatrix} = \begin{bmatrix} \vec{y}_1, \vec{y}_2, \cdots, \vec{y}_p \end{bmatrix} \tag{14}$$

Assuming all data are standardized, the correlation coefficient between $\vec{y}_k$ and $\vec{x}_j$ is:
$$S_{kj} = \frac{1}{n}\vec{y}_k^\prime \vec{x}_j,\quad k = 1, 2, \cdots, p;\ j = 1, 2, \cdots, m \tag{15}$$

Written in matrix form:
$$S_{p \times m} = \begin{bmatrix} S_{11} & S_{12} & \cdots & S_{1m} \\ S_{21} & S_{22} & \cdots & S_{2m} \\ \cdots & \cdots & \cdots & \cdots \\ S_{p1} & S_{p2} & \cdots & S_{pm} \end{bmatrix} = \begin{bmatrix} \frac{1}{n}\vec{y}_1^\prime \vec{x}_1 & \frac{1}{n}\vec{y}_1^\prime \vec{x}_2 & \cdots & \frac{1}{n}\vec{y}_1^\prime \vec{x}_m \\ \frac{1}{n}\vec{y}_2^\prime \vec{x}_1 & \frac{1}{n}\vec{y}_2^\prime \vec{x}_2 & \cdots & \frac{1}{n}\vec{y}_2^\prime \vec{x}_m \\ \cdots & \cdots & \cdots & \cdots \\ \frac{1}{n}\vec{y}_p^\prime \vec{x}_1 & \frac{1}{n}\vec{y}_p^\prime \vec{x}_2 & \cdots & \frac{1}{n}\vec{y}_p^\prime \vec{x}_m \end{bmatrix} = \frac{1}{n}\begin{bmatrix} \vec{y}_1^\prime \\ \vec{y}_2^\prime \\ \vdots \\ \vec{y}_p^\prime \end{bmatrix} \begin{bmatrix} \vec{x}_1, \vec{x}_2, \cdots, \vec{x}_m \end{bmatrix} = \frac{1}{n}Y^\prime X \tag{16}$$
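The cross-correlation matrix between two variable sets reduces to a single matrix product once both sets are standardized. A minimal sketch, assuming NumPy and illustrative random data (the helper `standardize_columns` is introduced here for the example only):

```python
import numpy as np

def standardize_columns(M):
    """Standardize each column to mean 0 and variance 1 (divisor n)."""
    M = np.asarray(M, dtype=float)
    return (M - M.mean(axis=0)) / M.std(axis=0)

rng = np.random.default_rng(1)   # fixed seed, illustrative data only
n, m, p = 40, 3, 2
X = standardize_columns(rng.normal(size=(n, m)))   # first set: m variables
Y = standardize_columns(rng.normal(size=(n, p)))   # second set: p variables

S = Y.T @ X / n   # p x m matrix; entry (k, j) is the correlation of y_k and x_j
```

Each entry of `S` agrees with the pairwise correlation coefficient of the corresponding columns, and the matrix has $p$ rows and $m$ columns.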

2.2 Mathematical model for factor analysis

The common factor model of factor analysis can be expressed in the following linear form:
$$\left\{ \begin{array}{l} \vec{x}_1 = a_{11}\vec{f}_1 + a_{21}\vec{f}_2 + \cdots + a_{p1}\vec{f}_p + u_1\vec{\varepsilon}_1 \\ \vec{x}_2 = a_{12}\vec{f}_1 + a_{22}\vec{f}_2 + \cdots + a_{p2}\vec{f}_p + u_2\vec{\varepsilon}_2 \\ \cdots \\ \vec{x}_m = a_{1m}\vec{f}_1 + a_{2m}\vec{f}_2 + \cdots + a_{pm}\vec{f}_p + u_m\vec{\varepsilon}_m \end{array} \right. \tag{17}$$

Abbreviated as:
$$\vec{x}_j = \sum\limits_{k=1}^{p} a_{kj}\vec{f}_k + u_j\vec{\varepsilon}_j,\quad j = 1, 2, \cdots, m \tag{18}$$

Here $\vec{f}_1, \vec{f}_2, \cdots, \vec{f}_p$ and $\vec{\varepsilon}_1, \vec{\varepsilon}_2, \cdots, \vec{\varepsilon}_m$ are the new variables sought. The former are the common factors, which can be understood as what the variables have in common; the latter are called the single factors, or individuality factors. The positive integer $p$ is the number of common factors, which is much smaller than the number of original variables $m$, so the formula simplifies the original $m$ variables into a small number of factors. The coefficients $a_{kj}$ and $u_j$ ($j = 1, 2, \cdots, m$; $k = 1, 2, \cdots, p$) are called factor loadings: the former are the common factor loadings and the latter the single factor loadings. Since we are concerned only with the common factors, “factor loadings” usually refers only to the former.

Denote:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \cdots & \cdots & \cdots & \cdots \\ a_{p1} & a_{p2} & \cdots & a_{pm} \end{bmatrix}_{p \times m} \tag{19}$$

where $a_{kj}$ is the loading of the $j$-th variable on the $k$-th factor ($k = 1, 2, \cdots, p$; $j = 1, 2, \cdots, m$). The common factors are collected in the matrix:
$$F = \begin{bmatrix} \vec{f}_1, \vec{f}_2, \cdots, \vec{f}_p \end{bmatrix} = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1p} \\ f_{21} & f_{22} & \cdots & f_{2p} \\ \cdots & \cdots & \cdots & \cdots \\ f_{n1} & f_{n2} & \cdots & f_{np} \end{bmatrix}_{n \times p} \tag{20}$$

Here column $k$ holds the values of the $k$-th factor on each specimen; this matrix is called the factor measure matrix, i.e., the matrix of factor scores. The single factor loadings form the matrix:
$$U = \begin{bmatrix} u_1 & 0 & \cdots & 0 \\ 0 & u_2 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & u_m \end{bmatrix}_{m \times m} \tag{21}$$

This is a diagonal matrix of order $m$ whose $j$-th diagonal element $u_j$ is the loading of variable $\vec{x}_j$ on the single factor $\vec{\varepsilon}_j$ ($j = 1, 2, \cdots, m$). The single factors are collected in the matrix:
$$E = \begin{bmatrix} \vec{\varepsilon}_1, \vec{\varepsilon}_2, \cdots, \vec{\varepsilon}_m \end{bmatrix} = \begin{bmatrix} \varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1m} \\ \varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2m} \\ \cdots & \cdots & \cdots & \cdots \\ \varepsilon_{n1} & \varepsilon_{n2} & \cdots & \varepsilon_{nm} \end{bmatrix} \tag{22}$$

where column $j$ holds the values of $\vec{\varepsilon}_j$ on each specimen. The factor model can then be rewritten in matrix form:
$$\begin{bmatrix} \vec{x}_1, \vec{x}_2, \cdots, \vec{x}_m \end{bmatrix} = \begin{bmatrix} \vec{f}_1, \vec{f}_2, \cdots, \vec{f}_p \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \cdots & \cdots & \cdots & \cdots \\ a_{p1} & a_{p2} & \cdots & a_{pm} \end{bmatrix} + \begin{bmatrix} \vec{\varepsilon}_1, \vec{\varepsilon}_2, \cdots, \vec{\varepsilon}_m \end{bmatrix} \begin{bmatrix} u_1 & 0 & \cdots & 0 \\ 0 & u_2 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & u_m \end{bmatrix} \tag{23}$$
or, compactly:
$$X = FA + EU \tag{24}$$
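To make the index conventions concrete, the following sketch (assuming NumPy, with arbitrary illustrative values for all four matrices) builds the data matrix once from the matrix form and once variable by variable from the summation form, and the two agree. Note that the loading matrix is stored with factors as rows and variables as columns.

```python
import numpy as np

rng = np.random.default_rng(2)               # fixed seed, illustrative values only
n, m, p = 8, 4, 2                            # specimens, variables, common factors

F = rng.normal(size=(n, p))                  # column k: values of common factor f_k
E = rng.normal(size=(n, m))                  # column j: values of single factor eps_j
A = rng.uniform(-0.6, 0.6, size=(p, m))      # A[k, j]: loading of variable j on factor k
U = np.diag(rng.uniform(0.3, 0.9, size=m))   # u_j on the diagonal

# Matrix form, Eq. (24).
X_matrix = F @ A + E @ U

# Summation form, Eq. (18), assembled one variable (column) at a time.
X_columns = np.column_stack([
    sum(A[k, j] * F[:, k] for k in range(p)) + U[j, j] * E[:, j]
    for j in range(m)
])
```

The matrices here are not yet standardized or orthogonal; the example only illustrates that the two notations describe the same decomposition.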

2.3 Factor loadings

We already know that the original variables $\vec{x}_j$ in the factor model are all standardized. Now assume further that the common factors $\vec{f}_k$ ($k = 1, 2, \cdots, p$) and the single factors $\vec{\varepsilon}_j$ ($j = 1, 2, \cdots, m$) to be solved for are also standardized variables.

Assume also that the correlation coefficients between all the common factors, between all the single factors, and between the common and single factors are 0. The following relationships then hold:
$$\left\{ \begin{array}{l} \frac{1}{n}\sum\limits_{i=1}^{n} f_{ik} = 0,\quad k = 1, 2, \cdots, p \\ \frac{1}{n}\sum\limits_{i=1}^{n} \varepsilon_{ij} = 0,\quad j = 1, 2, \cdots, m \\ \frac{1}{n}\vec{f}_k^\prime \vec{f}_\ell = \delta_{k\ell} = \begin{cases} 1, & k = \ell \\ 0, & k \ne \ell \end{cases} \quad k, \ell = 1, 2, \cdots, p \\ \frac{1}{n}\vec{\varepsilon}_j^\prime \vec{\varepsilon}_q = \delta_{jq} = \begin{cases} 1, & j = q \\ 0, & j \ne q \end{cases} \quad j, q = 1, 2, \cdots, m \\ \frac{1}{n}\vec{f}_k^\prime \vec{\varepsilon}_j = 0,\quad k = 1, 2, \cdots, p;\ j = 1, 2, \cdots, m \end{array} \right. \tag{25}$$

Writing these relations in matrix form, the correlation matrix between the common factors is:
$$\frac{1}{n}F^\prime F = \frac{1}{n}\begin{bmatrix} \vec{f}_1^\prime \\ \vec{f}_2^\prime \\ \vdots \\ \vec{f}_p^\prime \end{bmatrix} \begin{bmatrix} \vec{f}_1, \vec{f}_2, \cdots, \vec{f}_p \end{bmatrix} = \begin{bmatrix} \frac{1}{n}\vec{f}_1^\prime \vec{f}_1 & \frac{1}{n}\vec{f}_1^\prime \vec{f}_2 & \cdots & \frac{1}{n}\vec{f}_1^\prime \vec{f}_p \\ \frac{1}{n}\vec{f}_2^\prime \vec{f}_1 & \frac{1}{n}\vec{f}_2^\prime \vec{f}_2 & \cdots & \frac{1}{n}\vec{f}_2^\prime \vec{f}_p \\ \cdots & \cdots & \cdots & \cdots \\ \frac{1}{n}\vec{f}_p^\prime \vec{f}_1 & \frac{1}{n}\vec{f}_p^\prime \vec{f}_2 & \cdots & \frac{1}{n}\vec{f}_p^\prime \vec{f}_p \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I_p \tag{26}$$

where $I_p$ is the identity matrix of order $p$. Similarly, the correlation matrix between the single factors is:
$$\frac{1}{n}E^\prime E = I_m \tag{27}$$

The correlation matrix between the common factors and the single factors is:
$$\frac{1}{n}F^\prime E = \frac{1}{n}\begin{bmatrix} \vec{f}_1^\prime \\ \vec{f}_2^\prime \\ \vdots \\ \vec{f}_p^\prime \end{bmatrix} \begin{bmatrix} \vec{\varepsilon}_1, \vec{\varepsilon}_2, \cdots, \vec{\varepsilon}_m \end{bmatrix} = \begin{bmatrix} \frac{1}{n}\vec{f}_1^\prime \vec{\varepsilon}_1 & \frac{1}{n}\vec{f}_1^\prime \vec{\varepsilon}_2 & \cdots & \frac{1}{n}\vec{f}_1^\prime \vec{\varepsilon}_m \\ \frac{1}{n}\vec{f}_2^\prime \vec{\varepsilon}_1 & \frac{1}{n}\vec{f}_2^\prime \vec{\varepsilon}_2 & \cdots & \frac{1}{n}\vec{f}_2^\prime \vec{\varepsilon}_m \\ \cdots & \cdots & \cdots & \cdots \\ \frac{1}{n}\vec{f}_p^\prime \vec{\varepsilon}_1 & \frac{1}{n}\vec{f}_p^\prime \vec{\varepsilon}_2 & \cdots & \frac{1}{n}\vec{f}_p^\prime \vec{\varepsilon}_m \end{bmatrix} = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} = H \tag{28}$$

Substituting the matrix form of the factor model, the correlation matrix between the common factors and the original variables is obtained:
$$\begin{bmatrix} \frac{1}{n}\vec{f}_1^\prime \vec{x}_1 & \frac{1}{n}\vec{f}_1^\prime \vec{x}_2 & \cdots & \frac{1}{n}\vec{f}_1^\prime \vec{x}_m \\ \frac{1}{n}\vec{f}_2^\prime \vec{x}_1 & \frac{1}{n}\vec{f}_2^\prime \vec{x}_2 & \cdots & \frac{1}{n}\vec{f}_2^\prime \vec{x}_m \\ \cdots & \cdots & \cdots & \cdots \\ \frac{1}{n}\vec{f}_p^\prime \vec{x}_1 & \frac{1}{n}\vec{f}_p^\prime \vec{x}_2 & \cdots & \frac{1}{n}\vec{f}_p^\prime \vec{x}_m \end{bmatrix} = \frac{1}{n}\begin{bmatrix} \vec{f}_1^\prime \\ \vec{f}_2^\prime \\ \vdots \\ \vec{f}_p^\prime \end{bmatrix} \begin{bmatrix} \vec{x}_1, \vec{x}_2, \cdots, \vec{x}_m \end{bmatrix} = \frac{1}{n}F^\prime X = \frac{1}{n}F^\prime FA + \frac{1}{n}F^\prime EU = I_p A + HU = A \tag{29}$$

where H is the zero matrix.

This shows that the elements of the factor loading matrix $A$ are the correlation coefficients between the common factors and the original variables:
$$\frac{1}{n}\vec{f}_k^\prime \vec{x}_j = a_{kj},\quad k = 1, 2, \cdots, p;\ j = 1, 2, \cdots, m \tag{30}$$

The factor loading $a_{kj}$ reflects the link between factor $\vec{f}_k$ and variable $\vec{x}_j$. When $a_{kj} > 0$, factor $\vec{f}_k$ and variable $\vec{x}_j$ are positively correlated. When $a_{kj} < 0$, they are negatively correlated. When $a_{kj} \approx 0$, the link between factor $\vec{f}_k$ and variable $\vec{x}_j$ is weak. The larger $|a_{kj}|$ is, the more clearly the role of the factor in the variable can be seen.
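The identification of loadings with correlations can be reproduced on synthetic data. The sketch below assumes NumPy; the helper `orthonormal_factors` and the loading values in `A` are hypothetical and chosen only for illustration. Factors that satisfy the standardization and zero-correlation assumptions exactly are obtained from a QR decomposition of a centered random matrix, and the single factor loadings are set to $u_j = \sqrt{1 - \sum_k a_{kj}^2}$ so that every variable keeps unit variance.

```python
import numpy as np

def orthonormal_factors(n, p, m, seed=0):
    """Return F (n x p) and E (n x m) with standardized, mutually uncorrelated columns."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, p + m))
    Z -= Z.mean(axis=0)          # center every column
    Q, _ = np.linalg.qr(Z)       # orthonormal columns spanning the same space (mean zero)
    Q *= np.sqrt(n)              # rescale so that (1/n) Q'Q = I
    return Q[:, :p], Q[:, p:]

n, m, p = 30, 4, 2
F, E = orthonormal_factors(n, p, m)

# Hypothetical loadings: rows are factors, columns are variables.
A = np.array([[0.8, 0.7, 0.2, 0.1],
              [0.1, 0.3, 0.7, 0.6]])
U = np.diag(np.sqrt(1.0 - (A ** 2).sum(axis=0)))   # keeps each x_j at unit variance

X = F @ A + E @ U
loadings_from_data = F.T @ X / n   # (1/n) F'X, which should reproduce A
```

In this construction the first two variables load mainly on the first factor and the last two on the second, which is the kind of pattern that factor loadings are read for in practice.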

The correlation matrix $R$ can then be expressed as:
$$\begin{aligned} R &= \frac{1}{n}X^\prime X = \frac{1}{n}(FA + EU)^\prime (FA + EU) \\ &= \frac{1}{n}A^\prime F^\prime FA + \frac{1}{n}U^\prime E^\prime FA + \frac{1}{n}A^\prime F^\prime EU + \frac{1}{n}U^\prime E^\prime EU \\ &= A^\prime \left(\frac{1}{n}F^\prime F\right)A + U^\prime \left(\frac{1}{n}E^\prime F\right)A + A^\prime \left(\frac{1}{n}F^\prime E\right)U + U^\prime \left(\frac{1}{n}E^\prime E\right)U = A^\prime A + U^\prime U \\ &= R^* + \begin{bmatrix} u_1^2 & 0 & \cdots & 0 \\ 0 & u_2^2 & \cdots & 0 \\ \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & \cdots & u_m^2 \end{bmatrix} \end{aligned} \tag{31}$$

Here:
$$R^* = A^\prime A \tag{32}$$

is called the approximate correlation matrix. Its off-diagonal elements are the same as those of $R$, i.e., the correlation coefficients of the original variables:
$$r_{ij} = \sum\limits_{k=1}^{p} a_{ki} a_{kj},\quad i, j = 1, 2, \cdots, m;\ i \ne j \tag{33}$$

The diagonal elements of $R^*$ are:
$$h_j^2 = \sum\limits_{k=1}^{p} a_{kj}^2 = 1 - u_j^2,\quad j = 1, 2, \cdots, m \tag{34}$$

$h_j^2$ is called the common factor variance (communality) of variable $\vec{x}_j$. It represents the share of the variance of the original variable that is accounted for by the common factors, and is numerically equal to the sum of the squares of the elements in column $j$ of $A$. The common factor variance measures the extent to which the original variable can be explained by the $p$ common factors; it takes values between 0 and 1, and the larger the value, the better the variable is explained.
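The decomposition of the correlation matrix into a common part and a diagonal single-factor part can be traced with a few lines of NumPy. The loading matrix below is hypothetical and used only for illustration.

```python
import numpy as np

# Hypothetical loadings (p = 2 factors as rows, m = 4 variables as columns).
A = np.array([[0.8, 0.7, 0.2, 0.1],
              [0.1, 0.3, 0.7, 0.6]])

R_star = A.T @ A                    # Eq. (32): approximate correlation matrix
h2 = (A ** 2).sum(axis=0)           # Eq. (34): column sums of squares
u2 = 1.0 - h2                       # squared single factor loadings
R = R_star + np.diag(u2)            # Eq. (31): R = A'A + U'U

# Boolean mask selecting the off-diagonal entries.
off_diagonal = ~np.eye(A.shape[1], dtype=bool)
```

For these loadings the common factor variances are 0.65, 0.58, 0.53 and 0.37, so the first variable is the best explained by the two common factors and the fourth the least. Adding the diagonal part only restores the unit diagonal of `R`; the off-diagonal entries of `R` and `R_star` coincide.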

The sum of the common factor variances is: (35) $$\sum\limits_{j = 1}^m {h_j^2} = \sum\limits_{j = 1}^m {\sum\limits_{k = 1}^p {a_{kj}^2} } = \sum\limits_{k = 1}^p {\sum\limits_{j = 1}^m {a_{kj}^2} } = \sum\limits_{k = 1}^p {S_k^2}$$

where: (36) $$S_k^2 = \sum\limits_{j = 1}^m {a_{kj}^2} ,\quad k = 1,2, \cdots ,p$$

is called the variance contribution of factor ${\vec f_k}$. It is numerically equal to the sum of the squares of the elements in row k of A and indicates the relative importance of factor ${\vec f_k}$ among all the common factors: the larger $S_k^2$ is, the greater the contribution of that factor.
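As a minimal numerical sketch of Eqs. (34) to (36), the following Python snippet computes the communalities, specific variances and variance contributions from a small loading matrix. The matrix values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
# Hypothetical loading matrix A with p = 2 common factors (rows) and
# m = 3 variables (columns); A[k][j] is the loading a_kj.
A = [
    [0.8, 0.7, 0.6],
    [0.3, -0.4, 0.5],
]
p, m = len(A), len(A[0])

# Eq. (34): communality h_j^2 = sum of squared loadings in column j.
communalities = [sum(A[k][j] ** 2 for k in range(p)) for j in range(m)]

# Specific variance u_j^2 = 1 - h_j^2 for standardized variables.
specific_variances = [1.0 - h2 for h2 in communalities]

# Eq. (36): variance contribution S_k^2 = sum of squared loadings in row k.
contributions = [sum(A[k][j] ** 2 for j in range(m)) for k in range(p)]

# Eq. (35): both totals add up the same squared loadings.
total_communality = sum(communalities)
total_contribution = sum(contributions)
print(communalities, contributions)
```

Summing by columns gives one value per variable, while summing by rows gives one value per factor, which is why the two totals agree.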

2.4

Geometric Interpretation

All the attributes, variables or indicators in the factor model, whether known or calculated, can be regarded as vectors in an n-dimensional Euclidean space. Since they are all standardized variables, their squared lengths are n, i.e., each vector has length $$\sqrt n$$. The vectors corresponding to the factors are pairwise orthogonal and form a basis of a (p + m)-dimensional subspace; the factor model is the expansion of the variable ${\vec x_j}$ in this basis, and the factor loadings are the coordinates of ${\vec x_j}$ in this basis. The variables and their projections onto the common factor space are shown in Figure 1. The common factor space is the p-dimensional subspace spanned by the p common factors, and the projection of variable ${\vec x_j}$ onto it is: (37) $$\vec x_j^* = {a_{1j}}{\vec f_1} + {a_{2j}}{\vec f_2} + \cdots + {a_{pj}}{\vec f_p} = \sum\limits_{k = 1}^p {{a_{kj}}} {\vec f_k},\quad j = 1,2, \cdots ,m$$

$\vec x_j^*$ is called the projected variable of ${\vec x_j}$, and the square of its length is: (38) $$||\vec x_j^*|{|^2} = \vec x_j^{*\prime} \vec x_j^* = {\left(\sum\limits_{k = 1}^p {{a_{kj}}} {\vec f_k}\right)^\prime }\left(\sum\limits_{\ell = 1}^p {{a_{\ell j}}} {\vec f_\ell }\right) = \sum\limits_{k = 1}^p {\sum\limits_{\ell = 1}^p {{a_{kj}}} } {a_{\ell j}}\vec f_k^\prime {\vec f_\ell } = \sum\limits_{k = 1}^p {a_{kj}^2} \vec f_k^\prime {\vec f_k} = n\sum\limits_{k = 1}^p {a_{kj}^2} = nh_j^2$$

Since $||{\vec x_j}|{|^2} = n$, the squared cosine of the angle θ between the vector ${\vec x_j}$ and its projection $\vec x_j^*$ is: (39) $${\cos^2}\theta = {\left(\frac{{||\vec x_j^*||}}{{||{{\vec x}_j}||}}\right)^2} = \frac{{nh_j^2}}{n} = h_j^2$$

3

Application of comprehensive evaluation of student achievement based on factor analysis

3.1

Modeling

3.1.1

KMO and Bartlett’s test

1)

KMO test

Kaiser gave commonly used KMO criteria for judging whether data are suitable for factor analysis; these criteria are shown in Table 1.

2)

Bartlett’s test of sphericity

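Bartlett’s test of sphericity checks the degree of association between the variables by testing the null hypothesis that their correlation matrix is an identity matrix, i.e., that the variables are mutually uncorrelated.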

3)

Based on the above principles, the KMO and Bartlett’s tests were carried out on the student performance data; the results are shown in Table 2. Larger KMO values are more favorable for factor analysis. As the output shows, the KMO value in this case is 0.965, and Bartlett’s test of sphericity gives a very large chi-square value with a significance (Sig.) value of 0.000, far below 0.05. The correlation matrix of the observed variables is therefore unlikely to be an identity matrix: the variables are correlated, and the data are well suited for factor analysis.
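The Bartlett statistic discussed above can be sketched in a few lines of Python. This is an illustrative sketch using the standard formula chi-square = -(n - 1 - (2m + 5)/6) ln|R| with m(m - 1)/2 degrees of freedom; the 3 x 3 correlation matrix and the sample size are hypothetical, and the helper names are not from the study. For the 27 courses analysed here the same formula gives 27 x 26 / 2 = 351 degrees of freedom.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

def bartlett_sphericity(corr, n_samples):
    """Return (chi_square, degrees_of_freedom) for Bartlett's test."""
    m = len(corr)
    chi_square = -(n_samples - 1 - (2 * m + 5) / 6.0) * math.log(determinant(corr))
    dof = m * (m - 1) // 2
    return chi_square, dof

# Hypothetical correlation matrix and sample size (not the paper's data).
R = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
chi2, dof = bartlett_sphericity(R, n_samples=100)
print(chi2, dof)
```

The closer the determinant of R is to 1 (an identity matrix), the closer the statistic is to 0; strongly correlated variables drive the determinant towards 0 and the statistic upwards.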

Table 1.

KMO standard

Suitability for factor analysis	Very suitable	Suitable	Acceptable	Barely suitable	Unsuitable
KMO value (K)	K>=0.9	0.9>K>=0.8	0.8>K>=0.7	0.7>K>=0.6	K<0.6

Table 2.

KMO and Bartlett’s test

KMO measure of sampling adequacy		0.955
Bartlett sphericity test	15205.074	15215.621
	.351	0.362
	Significance	0.000

3.1.2

Common factor variance

The common factor variances (communalities) are shown in Table 3. As the table shows, the extracted communalities of the variables lie in the range 0.403-0.782, so the common factors are considered to capture the variance of each course reasonably well.

Table 3.

Common factor variance

	Initial	Extraction
Pathology	1.000	0.66
Formulology	1.000	0.782
The golden chamber is slightly read	1.000	0.664
Internal reading	1.000	0.671
Chilling theory	1.000	0.689
Physiology	1.000	0.655
Biochemistry	1.000	0.769
Microbial parasitology	1.000	0.619
Epidemiology	1.000	0.553
Western medicine	1.000	0.611
Pharmacology	1.000	0.748
Medical history	1.000	0.545
Acupuncture	1.000	0.703
Diagnostic foundation	1.000	0.619
Human anatomy	1.000	0.746
Chinese medical history	1.000	0.403
Combination of Chinese and western medicine	1.000	0.687
Chinese and western medicine combine the oral and throat	1.000	0.702
Combination of Chinese and western medicine combined with gynecology	1.000	0.712
Chinese and western medicine combined with foreign science	1.000	0.68
Chinese and western medicine combined ophthalmology	1.000	0.685
Chinese medicine	1.000	0.676
Traditional Chinese medicine	1.000	0.676
Basic theory of Chinese medicine	1.000	0.554
Internal medicine	1.000	0.714
TCM diagnosis	1.000	0.657
Histology	1.000	0.724

3.1.3

Extraction of principal factor components

The total variance explained is shown in Table 4. As shown in the table, the system initially sets 27 components, of which 4 have initial eigenvalues greater than 1: component 1 explains 50.604% of the variance, component 2 explains 6.244%, component 3 explains 4.444%, and component 4 explains 4.133%. The cumulative variance contribution rate is 65.425%, that is, the 4 components explain 65.425% of the variance of the original 27 variables. This shows that the common factors extracted by factor analysis represent most of the information in the variables analyzed. The scree plot is a graphical representation of the eigenvalues of all components in the factor analysis; the scree plot of the factor variables is shown in Figure 2. It can be clearly seen that the eigenvalues of the first 4 components are greater than 1 and that the eigenvalues of the subsequent components decrease in turn. Therefore, it is reasonable to extract four principal components.
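The extraction rule can be checked with a short Python sketch. It uses the first six initial eigenvalues reported in Table 4 and applies the eigenvalue-greater-than-1 rule (often called the Kaiser criterion, a name the text itself does not use); the variable names are illustrative.

```python
# First six initial eigenvalues reported in Table 4 (27 variables in total).
eigenvalues = [13.663, 1.686, 1.2, 1.116, 0.866, 0.863]
n_variables = 27

# For a correlation matrix the eigenvalues sum to the number of variables,
# so eigenvalue / 27 is the share of total variance of that component.
percent = [100.0 * ev / n_variables for ev in eigenvalues]

# Keep the components whose eigenvalue exceeds 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]

# Cumulative variance explained by the retained components, in percent.
cumulative = sum(percent[i - 1] for i in retained)
print(retained, round(cumulative, 2))
```

Dividing each eigenvalue by 27 reproduces the percentages in Table 4 up to rounding, and the four retained components together account for roughly 65.4% of the variance.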

Table 4.

Total variance explained

Component	Initial eigenvalues			Extraction sums of squared loadings			Rotation sums of squared loadings
	Total	% of variance	Cumulative %	Total	% of variance	Cumulative %	Total	% of variance	Cumulative %
1	13.663	50.604	50.604	13.663	50.604	50.604	5.184	19.200	19.200
2	1.686	6.244	56.848	1.686	6.244	56.848	4.872	18.044	37.244
3	1.2	4.444	61.292	1.2	4.444	61.292	4.719	17.478	54.722
4	1.116	4.133	65.425	1.116	4.133	65.425	2.89	10.703	65.425
5	0.866	3.207	68.632
6	0.863	3.196	71.828
7	0.632	2.341	74.169
8	0.603	2.233	76.402
9	0.564	2.089	78.491
10	0.548	2.030	80.521
11	0.477	1.767	82.288
12	0.435	1.611	83.899
13	0.435	1.611	85.51
14	0.361	1.337	86.847
15	0.357	1.322	88.169
16	0.352	1.304	89.473
17	0.315	1.167	90.64
18	0.307	1.137	91.777
19	0.291	1.078	92.855
20	0.286	1.059	93.914
21	0.283	1.048	94.962
22	0.272	1.007	95.969
23	0.231	0.856	96.825
24	0.22	0.815	97.64
25	0.219	0.811	98.451
26	0.217	0.804	99.255
27	0.201	0.744	100.000

1)

Factor model before rotation

The component matrix before rotation is shown in Table 5. As can be seen from the table, the factor analysis yields four principal components. Principal component 1 loads strongly on every observed variable, with factor loadings of roughly equal size and none standing out. Principal component 2 is associated with the courses “Jin Kui Yao Brief Reading”, “Selected Readings of the Neijing” and “Selected Readings of Typhoid Fever”; principal component 3 with the courses “Biochemistry”, “Microbial Parasitology”, “Warm Disease” and “Ancient Medical Literature”; and principal component 4 with the courses “Pathology”, “Formulary”, “Selected Readings of the Neijing” and “Selected Readings of Typhoid Fever”, among others. The loadings of principal components 2, 3 and 4 on these courses are negative, and their characteristics are distinct. Since the loadings of principal component 1 are too uniform across the observed variables, the corresponding principal factors cannot be interpreted from this component matrix. It is therefore necessary to apply a rotation.

2)

Rotated factor model

The rotated component matrix is shown in Table 6. The varimax (maximum variance) method is used to rotate the factor loading matrix, which yields the rotated factor loading matrix. The rotation concentrates the loadings of each principal component on fewer observed variables, which makes the meaning of the different principal factors easier to interpret.
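To illustrate what a varimax rotation does, the following Python sketch rotates pairs of factors in turn, each time by the angle that maximizes the variance of the squared loadings (the classical planar-rotation form of varimax). It is a simplified sketch: it omits the Kaiser normalization that statistical packages usually apply, and it uses only six rows of Table 5, so its output is not expected to reproduce Table 6.

```python
import math

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    n_vars, n_factors = len(loadings), len(loadings[0])
    total = 0.0
    for f in range(n_factors):
        sq = [row[f] ** 2 for row in loadings]
        mean = sum(sq) / n_vars
        total += sum((s - mean) ** 2 for s in sq) / n_vars
    return total

def varimax(loadings, sweeps=50, tol=1e-10):
    """Raw varimax by repeated planar rotations of factor pairs."""
    L = [row[:] for row in loadings]
    n_vars, n_factors = len(L), len(L[0])
    for _ in range(sweeps):
        largest_angle = 0.0
        for p in range(n_factors - 1):
            for q in range(p + 1, n_factors):
                u = [row[p] ** 2 - row[q] ** 2 for row in L]
                v = [2.0 * row[p] * row[q] for row in L]
                A, B = sum(u), sum(v)
                C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                D = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                num = D - 2.0 * A * B / n_vars
                den = C - (A * A - B * B) / n_vars
                phi = 0.25 * math.atan2(num, den)  # optimal angle for this pair
                c, s = math.cos(phi), math.sin(phi)
                for row in L:
                    x, y = row[p], row[q]
                    row[p], row[q] = x * c + y * s, -x * s + y * c
                largest_angle = max(largest_angle, abs(phi))
        if largest_angle < tol:
            break
    return L

# Six rows of the unrotated component matrix in Table 5.
unrotated = [
    [0.743, 0.172, 0.175, -0.022],
    [0.760, 0.069, 0.141, -0.402],
    [0.665, 0.405, 0.311, 0.147],
    [0.723, 0.100, -0.361, 0.281],
    [0.606, -0.307, -0.054, 0.219],
    [0.474, -0.331, 0.506, 0.377],
]
rotated = varimax(unrotated)
print(varimax_criterion(unrotated), varimax_criterion(rotated))
```

Because every step is an orthogonal rotation, the communality of each course (the sum of its squared loadings) is unchanged; only the way it is shared among the factors changes.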

3)

Factor scores

Factor analysis represents each observed variable as a linear combination of the common factors and a specific factor. Conversely, the common factors can be expressed as linear combinations of the observed variables, which gives the factor scores. The component score coefficient matrix is shown in Table 7.
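A factor score is thus a weighted sum of standardized grades, with the weights taken from the score coefficient matrix. The Python sketch below illustrates this with the Table 7 coefficients of only three of the 27 courses; the raw grades are hypothetical, and a real score would sum over all 27 courses.

```python
# Score coefficients of three courses, copied from Table 7
# (one list per course, one entry per component).
score_coefficients = {
    "Pathology":   [-0.110, 0.126, 0.026, 0.002],
    "Formulology": [-0.019, 0.333, -0.142, -0.134],
    "Physiology":  [-0.274, 0.056, 0.180, 0.135],
}

def standardize(values):
    """Convert raw grades to z-scores (mean 0, population std 1)."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

def factor_scores(z_by_course, coefficients):
    """Score on factor k = sum over courses of coefficient * z-score."""
    n_factors = len(next(iter(coefficients.values())))
    return [
        sum(coefficients[c][k] * z_by_course[c] for c in coefficients)
        for k in range(n_factors)
    ]

# Hypothetical raw grades of four students in the three courses.
raw = {
    "Pathology":   [85, 70, 90, 75],
    "Formulology": [78, 82, 88, 60],
    "Physiology":  [92, 65, 80, 71],
}
z = {course: standardize(grades) for course, grades in raw.items()}
first_student = {course: z[course][0] for course in z}
scores = factor_scores(first_student, score_coefficients)
print(scores)
```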

Table 5.

Component matrix before rotation

	Component
Course name	1	2	3	4
Pathology	0.743	0.172	0.175	-0.022
Formulology	0.76	0.069	0.141	-0.402
The golden chamber is slightly read	0.74	-0.254	0.07	0.098
Internal reading	0.722	-0.043	0.188	-0.205
Chilling theory	0.826	-0.121	0.184	-0.098
Physiology	0.665	0.405	0.311	0.147
Biochemistry	0.723	0.1	-0.361	0.281
Microbial parasitology	0.79	0.146	-0.038	-0.178
Epidemiology	0.606	-0.307	-0.054	0.219
Western medicine	0.704	-0.297	0.15	0.161
Pharmacology	0.763	-0.017	0.342	-0.098
Medical history	0.646	0.365	-0.218	0.212
Acupuncture	0.795	-0.107	0.151	-0.11
Diagnostic foundation	0.73	0.011	-0.324	0.136
Human anatomy	0.714	0.345	-0.084	0.287
Chinese medical history	0.554	0.256	0.169	0.056
Combination of Chinese and western medicine	0.633	-0.288	-0.341	-0.278
Chinese and western medicine combine the oral and throat	0.674	-0.3	-0.227	-0.276
Combination of Chinese and western medicine combined with gynecology	0.764	-0.303	-0.126	-0.007
Chinese and western medicine combined with foreign science	0.783	-0.251	-0.057	0.148
Chinese and western medicine combined ophthalmology	0.746	-0.262	-0.127	0.211
Chinese medicine	0.729	0.158	-0.103	-0.201
Traditional Chinese medicine	0.474	-0.331	0.506	0.377
Basic theory of Chinese medicine	0.604	0.311	-0.241	-0.065
Internal medicine	0.768	-0.222	0.059	-0.063
TCM diagnosis	0.632	0.387	0.057	-0.192
Histology	0.776	0.293	-0.043	0.147

Table 6.

Rotated component matrix

	Component
Course name	1	2	3	4
Pathology	0.197	0.591	0.486	0.273
Formulology	0.366	0.758	0.243	0.085
The golden chamber is slightly read	0.44	0.32	0.188	0.553
Internal reading	0.321	0.269	0.257	0.662
Chilling theory	0.434	0.375	0.228	0.551
Physiology	-0.038	0.44	0.618	0.393
Biochemistry	0.566	0.047	0.624	0.157
Microbial parasitology	0.378	0.556	0.436	0.073
Epidemiology	0.491	0.032	0.292	0.397
Western medicine	0.463	0.271	0.238	0.452
Pharmacology	0.266	0.635	0.316	0.437
Medical history	0.217	0.181	0.726	0.056
Acupuncture	0.426	0.575	0.275	0.399
Diagnostic foundation	0.562	0.136	0.491	0.131
Human anatomy	0.181	0.21	0.787	0.216
Chinese medical history	0.086	0.326	0.477	0.23
Combination of Chinese and western medicine	0.727	0.334	0.122	-0.016
Chinese and western medicine combine the oral and throat	0.714	0.397	0.09	0.045
Combination of Chinese and western medicine combined with gynecology	0.642	0.397	0.263	0.312
Chinese and western medicine combined with foreign science	0.655	0.254	0.282	0.376
Chinese and western medicine combined ophthalmology	0.639	0.104	0.356	0.387
Chinese medicine	0.399	0.561	0.479	0.001
Traditional Chinese medicine	0.802	0.117	0.007	0.111
Basic theory of Chinese medicine	0.277	0.32	0.574	-0.097
Internal medicine	0.538	0.467	0.254	0.381
TCM diagnosis	0.131	0.636	0.514	-0.032
Histology	0.247	0.336	0.704	0.231

Table 7.

Component score coefficient matrix

	Component
Course name	1	2	3	4
Pathology	-0.11	0.126	0.026	0.002
Formulology	-0.019	0.333	-0.142	-0.134
The golden chamber is slightly read	0.076	-0.018	-0.038	0.179
Internal reading	-0.062	0.014	-0.158	0.278
Chilling theory	0.023	0.089	-0.081	0.191
Physiology	-0.274	0.056	0.18	0.135
Biochemistry	0.196	-0.275	0.171	-0.067
Microbial parasitology	0.02	0.14	0.024	-0.116
Epidemiology	0.135	-0.183	0.023	0.158
Western medicine	0.21	-0.036	-0.063	0.015
Pharmacology	-0.114	0.222	-0.113	0.15
Medical history	-0.025	-0.123	0.3	-0.07
Acupuncture	-0.023	0.138	-0.084	0.058
Diagnostic foundation	0.201	-0.149	0.135	-0.083
Human anatomy	-0.047	-0.179	0.322	0.029
Chinese medical history	-0.159	0.075	0.158	0.091
Combination of Chinese and western medicine	0.331	0.019	-0.12	-0.23
Chinese and western medicine combine the oral and throat	0.246	0.078	-0.106	-0.122
Combination of Chinese and western medicine combined with gynecology	0.165	-0.014	-0.061	0.003
Chinese and western medicine combined with foreign science	0.136	-0.046	-0.023	0.157
Chinese and western medicine combined ophthalmology	0.166	-0.162	0.031	0.104
Chinese medicine	0.039	0.168	0.061	-0.204
Traditional Chinese medicine	0.566	-0.081	-0.069	-0.134
Basic theory of Chinese medicine	0.046	0.004	0.186	-0.19
Internal medicine	0.091	0.096	-0.091	0.027
TCM diagnosis	-0.128	0.228	0.086	-0.132
Histology	-0.046	-0.073	0.228	0.034

3.2

Results and Analysis of Teaching Quality Monitoring

A comparison of the composite score rankings of the top 30 students with their mean score rankings is shown in Figure 3. The figure shows that the ranking by factor analysis composite score differs from the traditional ranking by mean score. For example, student #1 has the highest composite score and ranks first in the factor analysis, but ranks 30th by mean score.
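How the two rankings can diverge is easy to see in a small Python sketch. It assumes that the composite score is the average of the four factor scores weighted by the rotated variance percentages in Table 4; this is a common convention, but the text does not state the weighting that was actually used. The student data are hypothetical.

```python
# Rotated variance percentages of the four components, from Table 4.
variance_percent = [19.200, 18.044, 17.478, 10.703]
weights = [v / sum(variance_percent) for v in variance_percent]

def composite_score(factor_scores):
    """Variance-weighted average of one student's factor scores (assumed rule)."""
    return sum(w * f for w, f in zip(weights, factor_scores))

def rank(values):
    """Rank 1 = largest value; ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

# Hypothetical factor scores and mean grades of three students.
students = {
    "S1": ([1.2, -0.3, 0.4, 0.1], 81.0),
    "S2": ([0.2, 0.9, -0.1, 0.5], 84.5),
    "S3": ([-0.8, 0.1, 0.3, -0.6], 79.0),
}
names = list(students)
composite = [composite_score(students[n][0]) for n in names]
composite_rank = dict(zip(names, rank(composite)))
mean_rank = dict(zip(names, rank([students[n][1] for n in names])))
print(composite_rank, mean_rank)
```

In this made-up example S1 leads on the composite score while S2 leads on the mean grade, which mirrors the kind of disagreement visible in Figure 3.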

To summarize, in the context of the high-quality development of commerce and distribution, students differ considerably in their ability to deal with common diseases in the various clinical disciplines, to carry out clinical diagnosis and identify traditional Chinese medicines, to examine patients and take medical histories, and to recognize and treat illnesses with Chinese medicine. Teachers can tailor their teaching to the needs of their students. Teaching managers can also use the structure of professional knowledge and ability obtained from factor analysis as an objective reference when refining the professional curriculum and formulating professional training objectives.

4

Practical teaching reform ideas and measures

In view of the common problems in the practice teaching of professional groups, we actively carry out research on and reform of practice teaching, construct a practice teaching system of “basic interoperability, hierarchical progression, integration of competition and creation, and comprehensive leap”, and put forward effective reform measures for the construction of training rooms and teaching resources [22].

1)

Constructing the practice teaching system of “basic interoperability, hierarchical progression, integration of competition and creation, and comprehensive leap”.

In accordance with the principle of “basic interoperability, hierarchical progression, integration of competition and creation, and comprehensive leap”, the practice teaching system of the commerce professional group centers on the requirements of enterprises for the technical skills and comprehensive quality of finance and commerce management personnel in the context of industrial integration. It is based on the cognitive law of students’ professional learning and the needs of career development, is aligned with the work tasks and work processes of finance and commerce job groups, and follows the law of professional growth. It breaks through the boundaries that separate the individual majors, highlights the cross-major nature of practice teaching on the basis of the shared basic knowledge of economic management, circulation and commerce and of the work associated with finance and trade management jobs, and realizes the interoperability and sharing of practice teaching resources among different majors.

The practical teaching system of “Basic Interoperability, Hierarchical Progression, Integration of Competition and Creation, and Comprehensive Leap” is shown in Fig. 4. It includes 6 stages: professional cognitive experience training, general practical training of the professional group, professional basic practical training, professional development practical training, practical training for the national higher vocational skills competition, and cross-major comprehensive practical training with on-the-job internship. Following the law of students’ professional learning cognition and ability enhancement, it forms a hierarchical progression of professional ability cognition, professional ability formation, professional ability expansion and comprehensive professional ability enhancement, and at the same time emphasizes the integration and enhancement of personal quality, professional quality and comprehensive quality. Professional cognitive experience training and general practical training of the professional group are the common basic practical training activities of every major in the group. On the one hand, they enhance students’ awareness and understanding of all the occupational positions in finance, economics and trade; on the other hand, they give students a full understanding of the whole process of enterprise operation and of the cooperation between departments. Professional basic practical training, professional development practical training and national higher vocational skills competition training strengthen students’ professional foundation and core skills and promote the formation of vocational ability and the ability to deal with complex vocational problems. Cross-major comprehensive practical training is an indispensable stage in cultivating versatile, development-oriented talents: students integrate knowledge and abilities across majors, learn to solve practical problems quickly, efficiently and creatively, and become highly adaptable to the new era, new business models and new lines of business.

2)

Construction of a “five-in-one, virtual and real” cross-disciplinary integrated simulation training center

To meet the requirements of the integrated and synergistic development of the manufacturing industry and the finance and trade service industry, research on enterprises and job groups found that the four majors of e-commerce, logistics management, accounting, and advertising planning and marketing are highly related in positions such as supply chain operation, marketing, warehousing and distribution, purchasing and supply, and financial management, and that they share common core skills such as marketing planning, resource planning, cost control and online trading. To strengthen the links between the majors and improve the core practical training courses and core skills training, we pooled the best teaching resources on the basis of the existing on-campus training rooms and training bases and built a cross-major integrated simulation training center. Following the idea of “school-enterprise co-construction, intra-group sharing, and professional co-management”, the center introduces enterprise production management standards, is benchmarked against real enterprise positions, refines the types of work covered, and is oriented towards the development trend of finance and commerce. It integrates “practical teaching, skill competition, vocational training, teaching and research, innovation and entrepreneurship incubation” and is named the “simulation training center for the integrated development of financial and commercial services and manufacturing industry”. The center is shown in Figure 5.

3)

Introducing the “Teaching Factory” model, connecting with real work and innovating practical teaching resources.

The “Teaching Factory” education model was created by Nanyang Polytechnic in Singapore. It introduces the practical environment of enterprises into the teaching environment, uses projects as the link, and deeply integrates teaching, learning and research, which plays an important role in cultivating students’ professional and vocational abilities. By combining enterprises with schools, practice with theory, and teachers with experts, the “Teaching Factory” realizes a seamless connection between graduates and jobs, and it has cultivated a large number of highly skilled, application-oriented talents with strong technological research and development capabilities for Singapore.

5

Educational reform data mining based on FWA-optimized k-means clustering algorithm

5.1

FWA

The fireworks algorithm (FWA) is a recent swarm intelligence algorithm inspired by the sparks produced when fireworks explode. An explosion is regarded as a search of the local space around a specific point, the location of the firework, by means of the sparks it generates; fireworks are then set off repeatedly in the search space until the sparks reach the optimal location.

According to their fitness, fireworks are divided into 2 types, good and bad. As Eqs. (40) and (41) show, a firework with a smaller fitness value has a smaller explosion radius and produces more sparks, so it searches its neighborhood more finely (stronger local search). Conversely, a firework with a larger fitness value explodes with a larger radius and produces fewer sparks, so it explores a larger region (stronger global search) [23]. In FWA, 2 parameters play a decisive role: the explosion radius R_i of firework i and the number of sparks S_i produced by the explosion of firework i, which are calculated as follows: (40) $${R_i} = \hat R\frac{{{f_i} - {y_{\min }} + \varepsilon }}{{\sum\limits_{j = 1}^N {({f_j} - {y_{\min }})} + \varepsilon }}$$ (41) $${S_i} = M\frac{{{y_{\max }} - {f_i} + \varepsilon }}{{\sum\limits_{j = 1}^N {({y_{\max }} - {f_j})} + \varepsilon }}$$

Where: $\hat R$ is the average explosion radius of the fireworks. f_i is the fitness value of firework i. y_min = min f_i and y_max = max f_i are the minimum and maximum fitness values in the fireworks population, respectively. ε is a very small constant (machine epsilon) that prevents division by zero. M is a constant. N is the number of fireworks.

The number of sparks S_i is also subject to the following bounds: (42) $${S_i} = \left\{ {\begin{array}{*{20}{l}} {{\text{round}}(aW),}&{{s_i} < aW} \\ {{\text{round}}(bW),}&{{s_i} > bW} \\ {{\text{round}}({s_i}),}&{{\text{otherwise}}} \end{array}} \right.$$

Where: a, b are constants with a < b, set by default to a = 0.1 and b = 0.5. round(·) is the rounding function. s_i is the number of sparks of firework i before the bounds are applied. W is the weight parameter.
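Eqs. (40) to (42) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it reads the second bound in Eq. (42) as an upper bound (s_i > bW), and the function name, argument names and example values are hypothetical.

```python
def explosion_parameters(fitness, r_hat, m_const, w, a=0.1, b=0.5, eps=1e-12):
    """Return (radii, spark_counts) for a population of fireworks."""
    y_min, y_max = min(fitness), max(fitness)
    radius_den = sum(f - y_min for f in fitness) + eps
    spark_den = sum(y_max - f for f in fitness) + eps
    radii, counts = [], []
    for f in fitness:
        # Eq. (40): larger fitness value -> larger explosion radius.
        radii.append(r_hat * (f - y_min + eps) / radius_den)
        # Eq. (41): smaller fitness value -> more sparks.
        s = m_const * (y_max - f + eps) / spark_den
        # Eq. (42): keep the spark count between round(a*W) and round(b*W).
        if s < a * w:
            counts.append(round(a * w))
        elif s > b * w:
            counts.append(round(b * w))
        else:
            counts.append(round(s))
    return radii, counts

# Hypothetical fitness values of three fireworks (smaller is better).
fitness = [1.0, 4.0, 9.0]
radii, counts = explosion_parameters(fitness, r_hat=10.0, m_const=20, w=20)
print(radii, counts)
```

The best firework receives the smallest radius and the most sparks, while the bounds stop it from taking all of the sparks and stop the worst firework from producing none.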

Let the dimension of the search space be k and the position of firework i be x_i = (x_i1, x_i2, ⋯, x_ik). The explosion radius R_i and the number of sparks S_i of firework i are calculated from Eqs. (40) and (41), u (1 ≤ u ≤ w) position components of the firework are selected at random, and each selected component is updated to generate an explosion spark, that is: (43) $$x_{ik}^\prime = {x_{ik}} + {r_i}U( - 1,1)$$

Where: x_ik is the position component of firework i in dimension k. x′_ik is the value of x_ik after the explosion update. U(−1, 1) is a random number uniformly distributed on the interval [-1, 1]. r_i is the scaling parameter of firework i.

Sparks produced by an explosion may fall outside the boundary of the feasible domain. Sparks within the range are left unchanged, but sparks outside the range are assigned to a new position in the search space by the mapping rule. The mapping formula is: (44) $${\hat x_{ik}} = {x_{lb,k}} + |{\hat x_{ik}}| \bmod ({x_{ub,k}} - {x_{lb,k}})$$

Where: ${\hat x_{ik}}$ on the left-hand side is the mapped position component of the next-generation firework i in dimension k, and ${\hat x_{ik}}$ on the right-hand side is the out-of-range component before mapping. x_ub,k and x_lb,k are the upper and lower bounds of the solution space in dimension k, respectively. mod is the modulo operation.

A mutation operator is introduced in FWA to generate Gaussian sparks, which increases the diversity of the population. A firework x_i is first selected at random, a specific dimension k is then selected, and Gaussian mutation is applied to component k of x_i, i.e.: (45) $${\hat x_{ik}} = {x_{ik}}e$$

where e ~ N(1, 1) is a Gaussian random number with mean 1 and variance 1.
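The three spark operations can be sketched in Python as follows. This is a sketch under stated assumptions: the mapping rule is read as a modulo operation, the number of perturbed coordinates is drawn between 1 and the number of dimensions, and the function names and example values are hypothetical.

```python
import random

def map_into_bounds(value, lower, upper):
    """Mapping rule: fold an out-of-range coordinate back into [lower, upper)."""
    if lower <= value <= upper:
        return value
    return lower + abs(value) % (upper - lower)

def explosion_spark(position, radius, lower, upper, rng):
    """Eq. (43): shift a random subset of coordinates by radius * U(-1, 1)."""
    spark = list(position)
    n_dims = rng.randint(1, len(position))
    for k in rng.sample(range(len(position)), n_dims):
        moved = spark[k] + radius * rng.uniform(-1.0, 1.0)
        spark[k] = map_into_bounds(moved, lower[k], upper[k])
    return spark

def gaussian_spark(position, lower, upper, rng):
    """Eq. (45): scale one random coordinate by e drawn from N(1, 1)."""
    spark = list(position)
    k = rng.randrange(len(position))
    spark[k] = map_into_bounds(spark[k] * rng.gauss(1.0, 1.0), lower[k], upper[k])
    return spark

rng = random.Random(42)  # fixed seed so that runs are repeatable
lower, upper = [0.0, 0.0], [10.0, 10.0]
firework = [9.0, 1.0]
sparks = [explosion_spark(firework, 3.0, lower, upper, rng) for _ in range(5)]
sparks.append(gaussian_spark(firework, lower, upper, rng))
print(sparks)
```

Whatever the random draws are, every coordinate of every spark ends up inside the feasible interval, which is the purpose of the mapping rule.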

The computational flow of the fireworks algorithm is shown in Fig. 6.

5.2

k-means clustering algorithm

The principle of the k-means clustering algorithm is to make objects in different clusters as different as possible and objects in the same cluster as similar as possible according to a given similarity rule. The algorithm uses distance as the criterion for assigning objects to clusters, and the Euclidean distance is usually used to measure the distance between data samples, i.e.: (46) $$d = \sqrt {\sum\limits_{i = 1}^N {{{({x_i} - {y_i})}^2}} }$$

Where: d is the Euclidean distance between the data samples X = {x₁, x₂, ⋯, x_N} and Y = {y₁, y₂, ⋯, y_N}, and x_i, y_i are their i-th components, i ∈ [1, N].

During k-means clustering, each iteration recalculates the mean of all samples in each cluster, which is the cluster centroid c_i. The update of c_i is calculated as: (47) $${c_i} = \frac{1}{{|{C_i}|}}\sum\limits_{x \in {C_i}} x$$

Where C_i is the i-th cluster and |C_i| is the number of samples in it.

The k-means clustering algorithm iteratively updates the cluster assignments and the cluster centroids c_i until a termination condition is met. By default, the algorithm terminates when the number of iterations reaches the maximum value or when the objective function of the algorithm falls below a threshold value.

From the above description, it can be seen that the core of the k-means clustering algorithm is the minimum sum-of-squared-errors criterion. The basic idea is to assign each data object to the cluster with the nearest center, recalculate the cluster centers, repeat these two steps, and output the result when the criterion function converges [24]. The criterion function E is defined as: (48) $$E = \sum\limits_{i = 1}^k {\sum\limits_{x \in {C_i}} {{{(x - {{\bar x}_i})}^2}} }$$

where ${\bar x_i}$ is the mean of the samples in cluster C_i.

It can be seen from Eq. (48) that the larger E is, the lower the similarity within clusters; conversely, the smaller E is, the higher the similarity within clusters.
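The iteration described above can be sketched in Python as follows. This is a minimal sketch of the standard k-means (Lloyd) iteration started from given initial centroids; the data points, the initial centroids and the stopping tolerance are hypothetical. Here the loop stops when E no longer decreases, which is one common form of the termination condition.

```python
import math

def euclidean(x, y):
    """Eq. (46): Euclidean distance between two samples."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def k_means(points, centroids, max_iter=100, tol=1e-9):
    """Run k-means from given centroids; returns (centroids, labels, sse)."""
    centroids = [list(c) for c in centroids]
    previous_sse = float("inf")
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
                  for p in points]
        # Update step, Eq. (47): each centroid becomes the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
        # Criterion function E, Eq. (48).
        sse = sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if previous_sse - sse < tol:
            break
        previous_sse = sse
    return centroids, labels, sse

# Hypothetical two-dimensional samples forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels, sse = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids, sse)
```

The result depends on the initial centroids passed in, which is the sensitivity that an external initializer is meant to address.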

5.3

k-means clustering algorithm based on FWA optimization

The k-means clustering algorithm depends heavily on the initial cluster centroids and easily falls into a local optimum, whereas FWA balances global search and local search. FWA is therefore used to optimize the k-means clustering algorithm. First, FWA is used to find k cluster centers as the initial cluster centroids of the k-means clustering algorithm, and the k-means clustering algorithm is then run from these centroids to obtain the final clustering. A selection strategy is set in FWA: the candidate set Y_H (H is the total number of elements) contains the original fireworks, the explosion sparks and the Gaussian sparks. The optimal element of the candidate set is always retained; in addition, Q − 1 further elements are selected from the set to form a new population of Q elements, where the selection probability of each element is: (49) $${p_i} = \frac{{\sum\limits_{j = 1}^H {{d_{ij}}} }}{{\sum\limits_{i = 1}^H {\sum\limits_{j = 1}^H {{d_{ij}}} } }}$$ (50) $${d_{ij}} = \left| {{f_i} - {f_j}} \right|$$

where d_ij is the distance between firework i and firework j, measured here by the absolute difference of their fitness values.
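The selection step of Eqs. (49) and (50) can be sketched in Python as follows. The sketch assumes a minimization problem (the best candidate has the smallest fitness value) and draws the remaining Q − 1 elements with replacement; neither detail is stated in the text, and the function names and example data are hypothetical.

```python
import random

def selection_probabilities(fitness):
    """Eqs. (49)-(50): p_i is proportional to sum_j |f_i - f_j|."""
    row_sums = [sum(abs(fi - fj) for fj in fitness) for fi in fitness]
    total = sum(row_sums)
    if total == 0:  # all candidates have equal fitness: choose uniformly
        return [1.0 / len(fitness)] * len(fitness)
    return [r / total for r in row_sums]

def select_next_generation(candidates, fitness, q, rng):
    """Keep the best candidate, then draw Q - 1 more by the probabilities above."""
    best = min(range(len(candidates)), key=lambda i: fitness[i])
    probs = selection_probabilities(fitness)
    drawn = rng.choices(range(len(candidates)), weights=probs, k=q - 1)
    return [candidates[best]] + [candidates[i] for i in drawn]

# Hypothetical candidate positions and their fitness values.
candidates = [[0.1, 0.2], [1.0, 1.1], [2.0, 1.9], [5.0, 5.2]]
fitness = [1.0, 2.0, 4.0, 9.0]
probs = selection_probabilities(fitness)
population = select_next_generation(candidates, fitness, q=3, rng=random.Random(7))
print(probs, population)
```

Candidates whose fitness lies far from the others receive a higher probability, which favours diversity in the next generation.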

6

Empirical studies

6.1

Data comprehension

In this part of the experiment, the grades of three courses related to commerce and distribution at a university were selected for statistical analysis, and Q-Q plots were used to observe changes among students at the high-grade level; the effective sample size is 1,366. The normal Q-Q plots for the year of enrollment are shown in Figure 7. There is no significant change, mainly because the laboratory equipment and the size of the innovation laboratory limit the number of students that can be accommodated. The trend normal Q-Q plot for the year of enrollment is shown in Figure 8. The main results so far have been achieved by the students of the classes of 2021 and 2022, with the class of 2023 as a strong reserve. The trend normal Q-Q plots of examination results in the main courses are shown in Fig. 9 (Fig. 9(a) for Data Structures and Fig. 9(b) for Database Principles). From the figures above, it can be seen that, after a period of participation in the innovation laboratory and enterprise practical training, the students’ motivation to learn has improved. This is reflected in the improvement of the average grade and, in particular, in the growing number of excellent grades. At the same time, polarization is also increasing, since the innovation laboratory does not cover all students.

6.2

Data pre-processing

Database Principles was selected as the comparison object, and an ANOVA model was constructed with the Database Principles score as the dependent variable and the year of enrollment as the fixed factor. The resulting test of between-subjects effects for the Database Principles course is shown in Table 8. Here df is the degrees of freedom, F is the F statistic (the ratio of the between-group mean square to the error mean square), and Sig. is the significance value of the test, which is generally compared with 0.05 or 0.01: a value below the chosen threshold indicates a significant difference. In Table 8, Sig. for the year of admission is .110, which is greater than 0.05, so the differences in Database Principles scores between enrollment years are not significant at the 0.05 level. Next, a random factor analysis was conducted that compares achievement in Data Structures and Database Principles by year of enrollment; the resulting test of between-subjects effects is shown in Table 9. For the year of admission, Sig. = .015 < 0.05, which indicates a significant difference between enrollment years, whereas for Data Structures, Sig. = .143 > 0.05, so that effect is not significant at the 0.05 level. Taken together with the before-and-after comparison of the two courses, these results are interpreted as showing that, as the innovative activities deepen, the learning outcomes of participating students diverge from those of the remaining students.

Table 8.

Tests of between-subjects effects for the Database Principles course

Source	Type III sum of squares	df	Mean square	F	Sig.
Corrected model	756.853a	1	369.495	2.234	.112
Intercept	1313830.727	1	1315231.672	7952.285	.000
Year of admission	751.996	2	369.493	2.324	.110
Error	52524.312	315	168.606
Total	1925995.000	305
Corrected total	52269.326	301

Table 9.

Tests of between-subjects effects comparing the Data Structures and Database Principles courses

Source		Type III sum of squares	df	Mean square	F	Sig.
Intercept	Hypothesis	748752.631	1	747482.833	4214.836	.000
Intercept	Error	17236.548	98.417	167.951a
Year of admission	Hypothesis	912.435	1	922.431	5.822	.015
Year of admission	Error	33364.626	215	156.282b
Data structure	Hypothesis	6899.072	40	195.932	1.285	.143
Data structure	Error	33164.526	215	156.122b
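The F statistic reported in tables of this kind can be reproduced for the simplest case, a single fixed factor, with the short sketch below. It assumes a plain one-way ANOVA on made-up scores grouped by enrollment year and does not reproduce the mixed model behind Table 9. Turning F into a Sig. value requires the F distribution, which the Python standard library does not provide; `scipy.stats.f.sf(F, df_between, df_within)` is one option if SciPy is available.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (error) sum of squares: deviations from each group's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up scores for three enrollment years.
by_year = [[71, 75, 80, 68], [78, 82, 85, 74], [69, 73, 77, 70]]
f_stat, df_b, df_w = one_way_anova(by_year)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

The difference between groups is significant at the 0.05 level only if the Sig. value obtained from this F statistic and its two degrees of freedom is below 0.05.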

To compare the mined data more intuitively, an RFM model was introduced to measure students' learning motivation and innovation ability by comparing the year of enrollment, the main course scores and the score trends. The comparison of students' year of enrollment and main course scores is shown in Figure 10, and the distribution of the test score trends of the trained students is shown in Figure 11. The figures indicate that innovation education has a clear impact on students' course learning, which supports the goal of optimizing the path of education reform in colleges and universities.
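The text does not state how the three RFM dimensions are mapped to student data. Purely as an assumption for illustration, the sketch below maps R to the year of enrollment, F to the main course score and M to the score trend, and scores each dimension from 1 to 3 by rank tercile. The field names, the tercile scheme and the sample records are all hypothetical.

```python
def tercile_scores(values):
    """Score each value 1..3 by rank tercile (3 = highest third); ties keep input order."""
    order = sorted(range(len(values)), key=values.__getitem__)
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = 1 + (3 * rank) // len(values)
    return scores

def rfm_table(students):
    """Return (id, R, F, M, total) per student under the assumed mapping."""
    r = tercile_scores([s["year"] for s in students])    # R: year of enrollment (assumed)
    f = tercile_scores([s["score"] for s in students])   # F: main course score (assumed)
    m = tercile_scores([s["trend"] for s in students])   # M: score trend (assumed)
    return [(s["id"], ri, fi, mi, ri + fi + mi)
            for s, ri, fi, mi in zip(students, r, f, m)]

# Made-up records; "trend" stands for the change in score between two terms.
students = [
    {"id": "s1", "year": 2021, "score": 72, "trend": -1.5},
    {"id": "s2", "year": 2022, "score": 88, "trend": 4.0},
    {"id": "s3", "year": 2023, "score": 65, "trend": 0.5},
]
for row in rfm_table(students):
    print(row)
```

A summed score is the simplest way to rank students; keeping the three component scores separate instead would allow groups such as "high score, falling trend" to be examined on their own.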

7

Conclusion

This study analyzes in depth an optimal design method for college education reform paths based on data mining algorithms and discusses its practical application. The experimental conclusions are as follows:

1) The experiment uses factor analysis to evaluate the performance data comprehensively. The student with student number 1 ranks first in the comprehensive factor analysis rating but only 30th by average score. This suggests that the comprehensive evaluation derived from factor analysis better reflects students' overall professional competence. Because students differ greatly in their ability to handle different types of problems, teachers can tailor their teaching to individual students.

2) Cluster analysis was used to compare the year of enrollment, main course grades and grade trends of students at one school. The comparison shows that innovative education has a clear impact on students' course learning, which helps to achieve the goal of optimizing the path of education reform in colleges and universities.


Li He

Published online: 29 Sep 2025

Received: 20 Jan 2025

Accepted: 27 Apr 2025

DOI: https://doi.org/10.2478/amns-2025-1101

Keywords: Data mining techniques, Factor analysis, Fireworks algorithm, Clustering algorithm

© 2025 Li He, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
