Design and Implementation of Precision Strategy for Civic Education of Student Management in Colleges and Universities Based on Big Data Analysis

With the expansion of colleges and universities and the spread of data and information technology, student numbers continue to grow and student management becomes increasingly difficult. As educational institutions, colleges and universities must not only teach subject knowledge but, more importantly, provide students with ideological and moral education. Student management has traditionally relied on strict discipline, guidance and instruction, but this approach can easily provoke resentment, resistance and even aversion to school among students. Therefore, student management in colleges and universities must attend to emotional education: by paying attention to students’ emotional needs and respecting their personalities and ideas, it lets students feel cared for and understood, which stimulates their self-awareness and development potential [1-3]. At the same time, students need to be bound by macro-level rules, norms and group discipline during their school years, to comply with school rules and regulations, and to avoid misconduct. Student management must therefore also emphasize normative education: strengthening students’ self-discipline, respecting their autonomy, and educating them to abide by laws and regulations and to consciously resist undesirable behavior, so that they form good behavioral habits and ethical values as they grow up and lay a solid foundation for their future development [4-6].

At the same time, with accelerating globalization and rapid changes in information technology, students’ ideologies are becoming increasingly diverse and their individual characteristics increasingly distinct. The traditional ideological and political education model of “standardization and relative uniformity” ignores individual differences among students and struggles to cope with the diversification and individualization of students’ thinking [7]. The information age has exacerbated this limitation. The Internet and social media have greatly broadened the reach of information dissemination, information content is more complex and diverse than ever before, and the ideas students encounter in daily life are increasingly varied [8-9]. The vast amount of information online can support ideological and political education, but it can also introduce misinformation or bias through the penetration of negative influences [10]. In response to these challenges, precision ideological and political education has emerged. In building curriculum-based ideological and political education, teachers can use big data technology to carry out supply-side reform based on students’ ideological characteristics and developmental needs, and thus achieve a precise supply of ideological and political education, i.e., precision ideological and political education [11].
Precision ideological and political education relies on big data and other modern technologies to provide personalized, customized ideological and political education services through in-depth analysis of students’ ideological dynamics, behavioral characteristics and psychological needs. Its core lies in using data to accurately identify individual differences among students and to optimize educational content and resource allocation, thereby enhancing the relevance and effectiveness of ideological and political education [12-15]. Compared with the traditional model, precision ideological and political education emphasizes the individualization and scientific rigor of education, helping ideological and political education remain effective in a diversified environment [16]. In this context, how to carry out ideological and political education within college student management has become a question that school leaders and teachers need to consider carefully.

Based on big data mining and analysis technology, this paper designs precision strategies for student management and ideological and political education in colleges and universities. First, the relevant concepts of data mining and analysis are described. Next, an improved K-Means algorithm is applied to cluster the daily behaviors of college students. Then, based on a multiple linear regression model, a stepwise regression analysis of several course grades and Civic Education grades is carried out to predict college students’ Civic Education grades. Finally, based on the results of the big data analysis and the current state of teaching management and civic education, precision strategies for education management and curriculum-based civic education are proposed.

2

Big data mining and analysis

This paper mines and analyzes the relevant big data of students in colleges and universities to provide data support for realizing the design of precise strategies for the management of students in colleges and universities in the field of Civic and Political Education. Specifically, the improved K-means algorithm is used to cluster and analyze students’ daily behaviors, and the students’ performance in civic education is predicted based on the multiple linear regression model.

2.1

Overview of Data Mining

2.1.1

Concepts of data mining

Data mining is the process of extracting information that is implicit, previously unknown and potentially valuable, and that cannot be seen on the surface, from massive, noisy, fuzzy, incomplete and random data. Compared with traditional data analysis (such as statistics, reporting, querying and online analytical processing), data mining discovers knowledge without prior hypotheses: the information and patterns it uncovers are previously unknown, useful and valid. Data mining can assist decision makers by analyzing historical and current data and uncovering hidden relationships in them, which in turn provides a basis for predicting future behavior in areas such as business decision-making, financial forecasting and market planning. Data mining is also a broad interdisciplinary field involving artificial intelligence, machine learning, mathematical statistics, databases, visualization and other disciplines. Data mining techniques are now widely applied: finance, insurance, economic management, health care, education and other fields have used them to solve practical problems.

2.1.2

The process of data mining

Data mining is one of the most important steps in knowledge discovery in databases (KDD). The process of data mining is shown in Figure 1.

The process of data mining (knowledge discovery) mainly includes the following steps:

1) Data collection: the required target data are determined by analyzing the research content and related data, and the data needed for the study are extracted from the data sources.

2) Data cleaning: because the original data are complex, they may contain unnecessary or erroneous records that would affect the mining results, so the data must be cleaned to remove noise and delete unnecessary and erroneous records.

3) Data integration: the data required for mining may come from different sources, so data from multiple sources are combined to facilitate subsequent data management and operations.

4) Data transformation: the current data are converted into the form required for mining so that they can be used and mined more effectively. Commonly used transformation methods include normalization, aggregation, attribute construction and generalization.

5) Data mining: according to the nature of the target data and the intended knowledge representation, suitable mining algorithms are selected, and intelligent methods are used to extract data patterns and build models, so as to uncover the latent patterns and knowledge in the data.

6) Knowledge representation: after the above steps, the latent knowledge in the data has been mined; visualization or knowledge representation techniques are then used to present the useful knowledge clearly to users and provide them with decision support.

A data mining system consists of various types of databases, a pre-mining processing module, a mining operation module, a pattern evaluation module and a knowledge input module. Together these modules constitute the data mining architecture shown in Figure 2.
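The cleaning, integration and transformation steps above can be illustrated with a toy pipeline. The sketch below is a minimal, hypothetical example in plain Python: the record layout, the field names `student` and `minutes`, and the cleaning rule are assumptions for illustration, not the paper's data schema. It cleans two sources, integrates them, and transforms them by aggregation followed by min-max normalization.

```python
def clean(records):
    """Data cleaning: drop records whose duration is missing or negative (assumed rule)."""
    return [r for r in records if r.get("minutes") is not None and r["minutes"] >= 0]


def integrate(*sources):
    """Data integration: combine records from several sources into one list."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def transform(records):
    """Data transformation: aggregate minutes per student, then min-max normalize to [0, 1]."""
    totals = {}
    for r in records:
        totals[r["student"]] = totals.get(r["student"], 0) + r["minutes"]
    lo, hi = min(totals.values()), max(totals.values())
    span = hi - lo
    return {s: (v - lo) / span if span else 0.0 for s, v in totals.items()}


# Hypothetical records from two sources (e.g. Wi-Fi logs and library gate logs).
wifi = [{"student": "s1", "minutes": 120}, {"student": "s2", "minutes": None},
        {"student": "s2", "minutes": 60}]
library = [{"student": "s1", "minutes": 30}, {"student": "s3", "minutes": -5},
           {"student": "s3", "minutes": 90}]
features = transform(integrate(clean(wifi), clean(library)))
```

The mining and knowledge representation steps would then operate on `features`, for instance by clustering them as in Section 3.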

2.2

Data Mining Related Techniques

Data mining is an application-driven field that has absorbed techniques from many areas, such as machine learning, statistics, databases, pattern recognition, data warehousing, high-performance computing, information retrieval and visualization. This interdisciplinary character of data mining research and development has greatly contributed to its widespread use and success today.

Statistics studies the collection, analysis, interpretation and presentation of data, so data mining is closely related to statistics. Statistical models are generally mathematical functions that describe the behavior of the objects under study in terms of random variables and their probability distributions. Statistics offers many tools that use data and statistical models for prediction and forecasting, and these tools can also be used to check and validate data mining results.

Machine learning studies how computers can learn from data to improve their performance. It focuses on enabling computer programs to automatically learn to recognize complex patterns and make intelligent decisions based on data. Machine learning is mainly divided into supervised learning, unsupervised learning, semi-supervised learning and active learning. Supervised learning is essentially synonymous with classification: the supervision comes from the labeled instances in the training data set. Unsupervised learning is essentially synonymous with clustering: the learning process is unsupervised because the input instances carry no class labels, and clustering is typically used to discover classes in the data. Semi-supervised learning uses both labeled and unlabeled instances when learning a model; in one approach, labeled instances are used to learn class models, while unlabeled instances are used to refine the boundaries between classes. Active learning lets users play an active role in the learning process, for example by asking a domain expert to label selected instances.

A data warehouse integrates data from multiple sources and different time periods; it consolidates data in a multidimensional space and forms partially materialized data cubes. It supports online analytical processing (OLAP) in multidimensional databases and can promote multidimensional data mining.

Information retrieval is the science of searching for documents, which may be text or multimedia and may reside on the Web. Traditional information retrieval differs from database systems in two respects: the data it searches are unstructured, and its queries are mainly keyword searches without complex structure, whereas database systems use structured SQL queries. Probabilistic models are the classic approach to information retrieval.

3

Cluster analysis of student behavior based on improved K-Means algorithm

3.1

Student Behavioral Feature Extraction

In this paper, students of a university in China are selected as the research subjects, and their behavioral data are obtained by processing and analyzing the records generated when they connect to the Internet through campus Wi-Fi. By analyzing how long a student stays at different types of campus location, we can gauge how actively he or she studies. Generally speaking, if a student spends a large amount of time in study places such as teaching buildings and libraries, it can be assumed that the student takes study seriously and is highly motivated to learn; otherwise, the student’s motivation to learn can be assumed to be low.

The campus locations are first classified by function into 7 categories: teaching building, library, dormitory building, stadium, college activity center, cafeteria and other locations. The trajectory data of each student are then arranged chronologically, and features are extracted by day and by week. The distribution matrix of the time spent by student z at the different types of location in the nth week of a semester is defined in equation (1): (1) $X^{(n)} = \begin{bmatrix} α_{11}^{(n)} & \cdots & α_{17}^{(n)} \\ \vdots & \ddots & \vdots \\ α_{71}^{(n)} & \cdots & α_{77}^{(n)} \end{bmatrix}, \quad 0 < i \leq 7,\ 0 < j \leq 7,\ 0 < n \leq N$

where N denotes the number of weeks in a semester, and $α_{ij}^{(n)}$ denotes the proportion of the total recorded time on the ith day of the nth week that student z spent at location type j. It is calculated by equation (2): (2) $α_{ij}^{(n)} = \frac{t_{ij}^{(n)}}{\sum_{0 < j \leq 7} t_{ij}^{(n)}}$

where $t_{ij}^{(n)}$ denotes the length of time the student spends at location type j on day i of week n. The spatio-temporal behavioral feature vector of student z is defined as $S_{z} = [S_{z1}, S_{z2}, \dots, S_{z7}]$, where $S_{zj}$ denotes the average daily proportion of time that student z spends at location type j, averaged over the 7 days of each week and over the N weeks of the semester. $S_{zj}$ is computed by equation (3): (3) $S_{zj} = \frac{\sum_{0 < n \leq N} \frac{\sum_{0 < i \leq 7} α_{ij}^{(n)}}{7}}{N}$

Equation (3) yields the feature vector $S_{z}$ of student z’s behavioral pattern, which gives the average proportion of each day that the student spends at each type of campus location.
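Equations (1) to (3) can be checked with a small sketch. The function names and the nested-list layout (a list of N weeks, each holding 7 days, each holding 7 durations $t_{ij}^{(n)}$) are illustrative assumptions; days with no recorded time are treated here as all-zero rows, which the paper does not specify.

```python
LOCATIONS = 7  # teaching building, library, dormitory, stadium, activity center, cafeteria, other
DAYS = 7


def day_proportions(minutes_by_location):
    """Eq. (2): share of the day's recorded time spent at each location type."""
    total = sum(minutes_by_location)
    if total == 0:
        # Assumption: a day without records contributes zeros.
        return [0.0] * len(minutes_by_location)
    return [t / total for t in minutes_by_location]


def feature_vector(weeks):
    """Eq. (3): S_z[j] = (1/N) * sum over weeks of (1/7) * sum over days of alpha_ij^(n)."""
    n_weeks = len(weeks)
    s = [0.0] * LOCATIONS
    for week in weeks:
        alphas = [day_proportions(day) for day in week]  # matrix X^(n) of Eq. (1)
        for j in range(LOCATIONS):
            s[j] += sum(alphas[i][j] for i in range(DAYS)) / DAYS
    return [v / n_weeks for v in s]


# Hypothetical two-week semester: week 1 split between teaching building and library,
# week 2 spent entirely in the dormitory.
week1 = [[60, 60, 0, 0, 0, 0, 0]] * DAYS
week2 = [[0, 0, 120, 0, 0, 0, 0]] * DAYS
s_z = feature_vector([week1, week2])
```

Because each row of $X^{(n)}$ sums to 1, the entries of $S_{z}$ also sum to 1, which is a convenient sanity check.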

3.2

K-Means clustering algorithm and its improvement

3.2.1

Improved K-Means Algorithm for Cluster Center Screening

The K-Means clustering algorithm is an iterative cluster analysis method that assigns each object to the nearest cluster center according to its distance from the current centers [17]. After every object has been assigned, each cluster center is recomputed from the objects currently in that cluster. This cycle continues until the cluster centers no longer change or no object is reassigned to another cluster. However, K-Means is sensitive to the choice of initial cluster centers: different initial values can lead to very different results. Therefore, this paper improves K-Means as follows: the value of k is varied, K-Means is run for each value of k in turn, and the resulting cluster centers are used in place of the original data. As k increases, the clusters obtained from each K-Means run cover a wider space and carry more information. Candidate centroids for the daily behavioral data of college students are generated in this way.

If a behavioral category overlaps with other behavioral categories, both the number of categories and the final clustering quality will be adversely affected. To eliminate intermediate nodes that are unfavorable to clustering and to improve clustering accuracy, the distance between category centers is combined with the category radii, and the following condition is set: (4) $d(o_{i}, o_{j}) < \max(r_{i}, r_{j}) - \min(r_{i}, r_{j})$

In this formula, $r_{i}$ and $r_{j}$ are the clustering radii of the ith and jth category centers $o_{i}$ and $o_{j}$, respectively. If the condition holds, i.e., the larger of the two radii exceeds the sum of the smaller radius and the distance between the two centers, the smaller category lies entirely inside the larger one. This is regarded as the larger category containing cross-class information, which would cause classes that should be distinguished to be merged; therefore, the center with the larger radius is deleted and the set of centers is updated.
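A minimal sketch of the screening rule in Eq. (4) follows, assuming centers are coordinate tuples and their radii are already known. The function name is hypothetical, and the pairwise visiting order is our assumption, since the paper does not state how chains of nested clusters are resolved.

```python
import math


def prune_nested_centers(centers, radii):
    """Delete the larger-radius center of every pair satisfying Eq. (4):
    d(o_i, o_j) < max(r_i, r_j) - min(r_i, r_j), i.e. one cluster lies inside the other.
    """
    removed = set()
    n = len(centers)
    for i in range(n):
        for j in range(i + 1, n):
            if i in removed or j in removed:
                continue  # a deleted center no longer takes part in comparisons
            d = math.dist(centers[i], centers[j])
            if d < max(radii[i], radii[j]) - min(radii[i], radii[j]):
                removed.add(i if radii[i] > radii[j] else j)
    keep = [k for k in range(n) if k not in removed]
    return [centers[k] for k in keep], [radii[k] for k in keep]


# Hypothetical centers: the second cluster sits inside the first, the third is separate.
centers, radii = prune_nested_centers([(0, 0), (0.5, 0), (5, 5)], [3, 1, 1])
```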

3.2.2

Data clustering based on improved K-Means algorithm

Based on the cluster center screening results of the improved K-Means algorithm, the data aggregation density of the region around each cluster center is calculated. The region with the highest density is taken as the first initial cluster, and further regions are selected in descending order of density until the required number of clusters is reached. Initial clusters selected in this way better reflect the spatial distribution of the data, which helps prevent the clustering from terminating in a local optimum because of random initialization. After the cluster centers have been screened, the clustering procedure based on the improved K-Means algorithm is designed as follows:

Step 1: After the cluster centers are determined, the data are screened. In the raw daily behavior data, the same student ID can have multiple behavior records, which can be expressed as: (5) $x_{d(o_{i}, o_{i})} = \sum_{k = 1}^{ID} s_{k} t_{k}$

where $t_{k}$ denotes the time-delay code of the kth record, and $s_{k}$ denotes the erroneous data arising when one ID corresponds to multiple records. To resolve this, the multiple records of the same student in the same time period are accumulated and synthesized into a single record. On this basis, a model separating normal from abnormal data is constructed: (6) $f(x_{d(o_{i}, o_{i})}) = \begin{cases} 1, & n_{nullCount} \geq η \\ 0, & n_{nullCount} < η \end{cases}$

where $n_{nullCount}$ denotes the number of feature parameters and η denotes the number of consecutive empty behavioral features in a record. A value of 1 means that the number of consecutive empty behavioral features does not exceed the number of feature parameters, so the record is kept. A value of 0 means that the number of consecutive empty behavioral features exceeds the number of feature parameters, so the record is deleted. This preserves data continuity and avoids discarding potentially useful data.
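The record merging and the flag of Eq. (6) can be sketched as follows. This is a hypothetical illustration: records are assumed to be `(student_id, period, features)` tuples with `None` marking an empty feature, merged records are summed feature by feature, and η is taken to be the longest run of consecutive empty features. The meaning of the flag follows the textual description (1 = keep, 0 = delete).

```python
def merge_records(records):
    """Accumulate all records of one student in one time period into a single record.

    Non-empty values are summed position by position (assumed accumulation rule);
    a position stays None only if it is empty in every merged record.
    """
    merged = {}
    for student_id, period, features in records:
        key = (student_id, period)
        if key not in merged:
            merged[key] = list(features)
            continue
        acc = merged[key]
        for k, value in enumerate(features):
            if value is not None:
                acc[k] = value if acc[k] is None else acc[k] + value
    return merged


def longest_empty_run(features):
    """Length of the longest run of consecutive empty (None) features."""
    run = best = 0
    for value in features:
        run = run + 1 if value is None else 0
        best = max(best, run)
    return best


def separation_flag(features, n_null_count):
    """Eq. (6): 1 (keep) if n_nullCount >= eta, else 0 (delete).
    eta is assumed to be the longest run of consecutive empty features."""
    eta = longest_empty_run(features)
    return 1 if n_null_count >= eta else 0


# Hypothetical raw data: two records of s1 in period w1 and one sparse record of s2.
merged = merge_records([("s1", "w1", [1, None, 2]),
                        ("s1", "w1", [3, 4, None]),
                        ("s2", "w1", [None, None, 5])])
```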

Step 2: Agglomerative hierarchical cluster analysis is performed on the students’ daily behavior data. Treating the m data objects as m initial clusters, the Euclidean distance between objects i and j is: (7) $d_{ij} = \sqrt{(c_{i1} - c_{j1})^{2} + (c_{i2} - c_{j2})^{2} + \dots + (c_{im} - c_{jm})^{2}}$

where $c_{i1}, c_{j1}, \dots, c_{im}, c_{jm}$ denote the corresponding principal component scores in the original sample feature matrix. The distances are sorted in ascending order, keeping track of the pair of objects each distance refers to. Then, in this order, the clusters containing the two objects of each distance are compared; if they are different clusters, they are fused into one. This continues until only one cluster remains or a specified stopping condition is met.
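Processing sorted pairwise distances and fusing the clusters of each pair, as Step 2 describes, is single-linkage agglomerative clustering and can be implemented with a union-find structure. The sketch below assumes points are tuples of principal-component scores and uses a target number of clusters as the stopping condition; both are assumptions for illustration, and the function name is hypothetical.

```python
import math
from itertools import combinations


def agglomerative(points, target_clusters=1):
    """Fuse clusters in order of increasing Euclidean distance (Eq. (7))
    until `target_clusters` clusters remain (single linkage)."""
    parent = list(range(len(points)))

    def find(a):
        # Follow parent links to the cluster representative (with path halving).
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    pairs = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    clusters = len(points)
    for _, i, j in pairs:
        if clusters <= target_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # the two objects are in different clusters: fuse them
            parent[ri] = rj
            clusters -= 1
    groups = {}
    for k in range(len(points)):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())


# Hypothetical 2-D scores: two tight pairs and one isolated point.
groups = agglomerative([(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)], target_clusters=3)
```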

Step 3: The improved K-Means clustering algorithm merges the selected cluster centers to obtain better initial results. The core idea of centroid merging is to combine several nearby centroids into a single category and to use the mean of these centroids as the centroid of the new category. Two centroids are merged into one category when the distance between them is less than the sum of their cluster radii, i.e., when their clusters overlap.

An indicator matrix for merging category centers is constructed as: (8) $T_{ij} = \begin{cases} 1, & d(o_{i}, o_{j}) < r_{i} + r_{j} \\ 0, & d(o_{i}, o_{j}) \geq r_{i} + r_{j} \text{ or } i = j \end{cases}$

When $T_{ij} = 1$, the two centroids should be merged. Using the transitivity and mutual exclusivity of the merge relation, a set is maintained to store the centroids that have not yet been assigned to a merged category; its indicator takes the value 1 while the set still contains elements and 0 otherwise.
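A sketch of Step 3 under stated assumptions: the indicator matrix follows Eq. (8), transitivity is implemented as a traversal over the nonzero entries of T, and the `unassigned` set plays the role of the set of centroids that are not yet divided. Function and variable names are hypothetical.

```python
import math


def merge_overlapping_centers(centers, radii):
    """Build T from Eq. (8), group centers transitively, and replace each
    group by the mean of its centers."""
    n = len(centers)
    T = [[1 if i != j and math.dist(centers[i], centers[j]) < radii[i] + radii[j] else 0
          for j in range(n)] for i in range(n)]
    unassigned = set(range(n))  # centers not yet placed in any merged category
    merged = []
    while unassigned:
        seed = min(unassigned)
        unassigned.discard(seed)
        group, stack = [], [seed]
        while stack:  # transitivity: collect every center connected through T
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if T[i][j] and j in unassigned:
                    unassigned.discard(j)
                    stack.append(j)
        dim = len(centers[0])
        merged.append(tuple(sum(centers[k][d] for k in group) / len(group)
                            for d in range(dim)))
    return merged


# Hypothetical centers: 0-1 and 1-2 overlap (so 0, 1, 2 merge transitively); 3 stays alone.
merged_centers = merge_overlapping_centers([(0, 0), (1, 0), (2, 0), (10, 10)], [0.6] * 4)
```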

Step 4: Within the same data set, the value ranges of different attributes can differ greatly, and some attributes are measured in different units from others. If the attributes are analyzed directly, those with large values will dominate those with small values, so the latter are not fully exploited, leading to deviations and errors in the results. To solve this problem, the raw data are normalized using: (9) $Z_{i} = \frac{Z_{a} - \min_{a}}{\max_{a} - \min_{a}}$

where $Z_{a}$ is a value of attribute a, and $\max_{a}$ and $\min_{a}$ denote the maximum and minimum values of attribute a, respectively. This interval (min-max) normalization maps all behavioral data values into the range [0, 1]. If the values in a row are all 0, no normalization is needed and they are kept as 0; normalization is applied only when the values are not all identical, since identical values would make the denominator zero.
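Eq. (9) applied attribute by attribute can be sketched as below. Treating every constant attribute (not only an all-zero one) as 0 is an assumption made here to avoid division by zero; the function names are hypothetical.

```python
def min_max_normalize(values):
    """Eq. (9): rescale one attribute's values to [0, 1].

    If all values are identical (e.g. all 0), max - min is 0, so the values
    are returned as 0.0 instead of dividing by zero (assumption).
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def normalize_columns(rows):
    """Normalize each attribute (column) of a row-major data matrix independently."""
    columns = [min_max_normalize(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Hypothetical data: three students, three attributes (the last two are constant).
normalized = normalize_columns([[10, 0, 5], [20, 0, 5], [30, 0, 5]])
```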

Step 5: After normalization, features are extracted from the students’ daily behavior data to obtain the channel transmission power spectral density of the behavioral data of each cluster, expressed as: (10) $ρ = \frac{l^{2}}{μ_{z_{i}} \cdot h_{i}}$

where l denotes the sequence number of the behavioral data channel, $μ_{z_{i}}$ denotes the data channel transmission frequency, and $h_{i}$ denotes the transmission time spectrum. From the computed power spectral density, the clustering function for parallel integration is constructed: (11) $W = H \cdot ϑ \cdot \frac{ρ}{q(l)}$

where H denotes the degree to which data of different classes affect one another, ϑ denotes the data set frequency, and q(l) denotes the number of data items in the channel.

Step 6: Repeat the above steps until the cluster centers no longer change and no data are reassigned to other clusters.
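Step 6 is the usual K-Means assignment and update cycle. The sketch below starts from already screened initial centers and stops when no point changes cluster; the `max_iter` safeguard and keeping an empty cluster's previous center are our assumptions. It illustrates only the assignment and update loop, not the weighting of Step 5, and the function name is hypothetical.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Refine screened initial centers until no point is reassigned (Step 6)."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment: each point goes to its nearest center (Euclidean distance).
        new_labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:  # no reassignment: converged
            break
        labels = new_labels
        # Update: each center becomes the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous center if a cluster is empty
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


labels, final_centers = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
```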

3.3

Clustering results and analysis

The behavioral feature vectors $S_{z}$ obtained above were clustered with the improved K-Means algorithm, and the clustering results of students’ daily behaviors are shown in Fig. 3. The horizontal axis shows the student codes and the vertical axis the location types, which from top to bottom are teaching building, library, dormitory, stadium, activity center, cafeteria and other locations. The color of each cell indicates the percentage of time the student spent at that location: the closer the color is to red, the larger the percentage, and vice versa.

Some of these locations serve similar purposes: for example, the teaching building and the library are both places for studying, while the stadium and the activity center are places for social activities and physical exercise. Therefore, according to the proportion of time spent at different types of location, students can be divided into three behavior patterns: hard-working, socially inactive and physically active. The time distribution of hard-working students is concentrated mainly in teaching buildings and libraries, and these students are highly motivated to learn. The time distribution of socially inactive students is concentrated mainly in the dormitory; these students show weak motivation to study and also lack physical exercise. The time distribution of physically active students is concentrated mainly in activity centers, stadiums and similar places, and these students are more interested in physical activities and club activities.

The clustering shows that hard-working, socially inactive and physically active students account for 36%, 38% and 26% of all students, respectively.

To further understand the factors influencing students’ learning, the students of each type were further analyzed by gender and by degree level. The results for students of different genders are shown in Figure 4.

As Figure 4 shows, among hard-working students the percentage of male students (47%) is slightly smaller than that of female students (53%). Among socially inactive students, female students (58%) clearly outnumber male students (42%). Among physically active students, the percentage of male students (63%) is clearly larger than that of female students (37%). This suggests that male students are more active in physical activities and socializing, whereas female students prefer to stay in indoor places such as dormitories or libraries.

The results for students of different degree levels are shown in Figure 5. Graduate students account for the larger share of both the hard-working and the socially inactive types (59% and 58%, respectively), while undergraduates account for the larger share of the physically active type (51%). A likely reason is that graduate students generally have fewer scheduled classes and spend most of their time studying in the library or laboratory or working on projects with their supervisors, whereas undergraduates spend most of their time attending courses scheduled by the university and can usually arrange physical exercise or club activities in their spare time.

4

Research on the prediction of students’ performance in civic education based on multiple linear regression

4.1

Regression analysis model

4.1.1

Linear regression

Linear regression has many practical uses, which fall into two broad categories. First, if the goal is prediction or forecasting, linear regression can be used to fit a predictive model to an observed data set of y and X values. Once such a model has been fitted, it can be used to predict y for a new value of X for which no y is observed.

Second, given a variable y and a number of variables X₁, X₂,⋯,X_p that may be related to y, linear regression analysis can be used to quantify the strength of the relationship between y and each X_j, to identify the X_j that are unrelated to y, and to identify which subsets of the X_j contain redundant information about y.

4.1.2

Multiple linear regression

Multiple regression analysis is a statistical method that treats one variable as the dependent variable and one or more related variables as independent variables, and uses sample data to establish and analyze a quantitative mathematical model of the linear or nonlinear relationships between them [18]. The linear dependence of multiple dependent variables on multiple independent variables, known as multivariate multiple regression, is also studied. Usually more than one factor affects the dependent variable, and this situation of several independent variables affecting one dependent variable can be handled by multiple regression analysis. In linear regression analysis, multiple linear regression is therefore of greater practical significance than simple (univariate) linear regression.

The basic tasks of multiple linear regression analysis are as follows: to establish a multiple linear regression equation of the dependent variable on several independent variables from their observed values; to evaluate the relative importance of each independent variable’s effect on the dependent variable; and to assess the accuracy of the optimal multiple linear regression equation. Many multivariate nonlinear regression problems can be transformed into multiple linear regression problems, so multiple linear regression has a wide range of applications.

4.1.3

Multiple linear regression models

Let there be the following linear relationship between variable Y and variable X₁,X₂,⋯,X_p: (12) $Y = β_{0} + β_{1} X_{1} + \dots + β_{p} X_{p} + ε$

where β₀ is the regression constant and β₁,β₂,⋯,β_p are the population regression coefficients. When p = 1, equation (12) is called a simple (univariate) linear regression model; when p ≥ 2, it is called a multiple linear regression model. ε is the random error term and follows the distribution $ε \sim N(0, σ^{2})$.

The most commonly used method for estimating β is ordinary least squares (OLS), which minimizes the objective function: (13) $Q(β) = \sum_{i = 1}^{n} (y_{i} - x_{i} β)^{2}$, where $x_{i}$ denotes the ith row of the design matrix X.

When X′X is non-singular, i.e., no explanatory variable is an exact linear combination of the others (no perfect multicollinearity), the least squares estimate is: (14) $\hat{β} = (X^{'} X)^{-1} X^{'} Y$

The fitted regression model is then: (15) $\hat{Y} = X \hat{β} = X (X^{'} X)^{-1} X^{'} Y$
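Equations (14) and (15) can be evaluated without any external library by solving the normal equations $(X'X)\hat{β} = X'Y$. The sketch below is a plain-Python illustration with hypothetical function names: an intercept column is prepended automatically and Gauss-Jordan elimination with partial pivoting is used. In practice a numerical library would be used, and solving the linear system is numerically preferable to forming the inverse explicitly.

```python
def ols(X, y):
    """Least squares estimate beta_hat = (X'X)^{-1} X'y (Eq. (14)).

    X is a list of rows without the intercept column; a column of ones is prepended.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    XtX = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    M = [XtX[i] + [Xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity)")
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


def predict(X, beta):
    """Fitted values y_hat = X beta_hat (Eq. (15))."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]


# Hypothetical data generated exactly by y = 1 + 2*x1 + 3*x2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
beta = ols(X, y)
```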

4.1.4

Tests of multiple linear regression models

Once the multiple linear regression model has been established and its coefficients estimated, the R² test can be used to assess how well the entire regression equation fits the data.

The coefficient of determination R² is defined as: (16) $R^{2} = 1 - \frac{SSE}{SST}$

where SSR denotes the regression sum of squares, defined in Equation (17), which reflects the part of the variation in y explained by the linear relationship between x and y; SST denotes the total sum of squares, defined in Equation (18), which reflects the total deviation of the n observed values of the dependent variable from their mean; and SSE denotes the residual sum of squares, defined in Equation (19), which reflects the part of the variation in y caused by factors other than the linear effect of x, i.e., the variation that cannot be explained by the regression line: (17) $SSR = \sum_{i = 1}^{n} (\hat{y}_{i} - \bar{y})^{2}$ (18) $SST = \sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}$ (19) $SSE = \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}$

For a least squares fit that includes an intercept, the three sums of squares satisfy: (20) $SST = SSR + SSE$

R² reflects the goodness of fit of the regression line to the data and takes values in [0, 1]. The closer R² is to 1, the better the regression equation fits the data; the closer R² is to 0, the worse the fit.
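A short sketch of Eqs. (16) to (19) follows; the function name is hypothetical and the data are a toy example, not the study's. For y = [1, 3, 2, 5, 4] observed at x = 1, ..., 5, the least squares line is ŷ = 0.6 + 0.8x, which gives SST = 10, SSR = 6.4 and SSE = 3.6, so Eq. (20) holds and R² is 0.64 up to floating-point rounding.

```python
def r_squared(y, y_hat):
    """Return (R^2, SST, SSR, SSE) following Eqs. (16)-(19)."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # Eq. (18)
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # Eq. (17)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # Eq. (19)
    return 1 - sse / sst, sst, ssr, sse                    # Eq. (16)


# Toy data fitted by the least squares line y_hat = 0.6 + 0.8 x.
y = [1, 3, 2, 5, 4]
y_hat = [0.6 + 0.8 * x for x in range(1, 6)]
r2, sst, ssr, sse = r_squared(y, y_hat)
```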

4.2

Stepwise regression analysis of student course grades

4.2.1

Sample Selection

The sample consists of the grades (on a 100-point scale) of 232 students from three grade levels at a university in China in eight courses: political science theory (X1), history of Western political thought (X2), public relations (X3), basic theory and practice of economics (X4), selected Marxist classics (X5), topics of socialist economy (X6), history of the Chinese revolution (X7), and ideological and political education (X8). The seven courses other than Ideological and Political Education are all related to it and have some influence on how students learn it.

4.2.2

Correlation analysis

First, the Pearson correlation coefficient was used to measure the correlation between each of the seven courses and the Ideological and Political Education grades; the results are shown in Table 1 [19]. All the coefficients lie in the interval [0.51, 0.69], indicating moderate correlation between the grades, so the data are suitable for regression analysis.

Table 1.

Results of correlation coefficients calculation

Course title	Correlation coefficient
Political theory	0.6844
History of Western political thought	0.6427
Public relations	0.5813
Basic theory and practice of economics	0.5648
Selected readings of Marxist classics	0.5646
Special topic on socialist economy	0.5309
History of Chinese revolution	0.5173
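A minimal sketch of this correlation step follows, assuming NumPy. The helpers `pearson_r` and `rank_by_correlation` and the grades are invented for illustration and do not reproduce Table 1. Sorting the courses by r gives the order in which variables enter the stepwise regression later in this section.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation: sum(da*db) / sqrt(sum(da^2) * sum(db^2)) on mean-centred data."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    return float(da @ db / np.sqrt((da @ da) * (db @ db)))


def rank_by_correlation(courses, target):
    """Return (course, r) pairs sorted from the largest to the smallest r."""
    scores = {name: pearson_r(grades, target) for name, grades in courses.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical grades for five students (course names as in Table 1, data invented).
target = [78, 85, 62, 90, 71]  # Ideological and Political Education grades
courses = {
    "Political theory": [75, 88, 60, 92, 70],
    "Public relations": [80, 70, 75, 85, 65],
}
for name, r in rank_by_correlation(courses, target):
    print(f"{name}: {r:.4f}")
```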

4.2.3

Multicollinearity test

To examine the correlations among the seven courses, a correlation heat map was drawn, as shown in Figure 6. It shows no strong correlation among the courses.

The variance inflation factor (VIF) was used to test for multicollinearity among the explanatory variables; the VIFs are shown in Table 2. Two courses, Selected Readings of Marxist Classics (VIF 7.4882) and History of Western Political Thought (VIF 5.2183), show weak multicollinearity with the other courses, while there is no multicollinearity among the remaining explanatory variables. The relationship between the grades can therefore be analyzed with a multiple linear regression model.

Table 2.

Variance inflation factors of the explanatory variables

Course title	Variance inflation factor
Political theory	4.3472
History of Western political thought	5.2183
Public relations	3.2546
Basic theory and practice of economics	3.8159
Selected readings of Marxist classics	7.4882
Special topic on socialist economy	2.6147
History of Chinese revolution	3.2038
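The VIF of the j-th explanatory variable is $1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of determination from regressing that variable on all the other explanatory variables. The sketch below (assuming NumPy, with a hypothetical helper `vif` and synthetic data) shows how values like those in Table 2 could be computed; a frequently cited rule of thumb treats a VIF above 10 as serious multicollinearity.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (n x k, without intercept).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors[j] = 1.0 / (1.0 - r2)
    return factors


# Hypothetical data: x3 is almost a linear combination of x1 and x2; x4 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(70, 10, 200)
x2 = rng.normal(70, 10, 200)
x3 = 0.5 * x1 + 0.5 * x2 + rng.normal(0, 1, 200)
x4 = rng.normal(70, 10, 200)
v = vif(np.column_stack([x1, x2, x3, x4]))
print(np.round(v, 2))
```

In this synthetic setup the near-collinear columns should show large VIFs, while the independent column should stay close to 1.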

4.2.4

Multiple linear regression analysis

The grades of the seven courses X1 to X7 are denoted x₁, x₂, …, x₇, in the order of the courses in Table 2, and the Ideological and Political Education grade is denoted y. Entering the courses in descending order of their correlation with Ideological and Political Education, a multiple linear regression model was built by stepwise regression: (21) $y = β_{0} + β_{1} x_{1} + \dots + β_{p} x_{p} + ε$

where $β_{0}, β_{1}, \dots, β_{p}$ ($p = 1, 2, \dots, 7$) are the parameters to be estimated and ε is the random error term.

The stepwise regression coefficients were computed in MATLAB from the grades of the 232 students. During the regression, if the confidence interval of a coefficient contained zero, the data were first checked for outliers. Outlier removal and coefficient estimation were repeated until the sample contained no outliers, yielding the stable regression coefficients shown in Table 3. If the confidence interval of a coefficient still contained zero after outliers had been removed (coefficients marked * in the table), the corresponding variable was considered to have no significant effect on the Ideological and Political Education grade and was not considered in subsequent iterations. Terms whose coefficients were significant but whose addition reduced the goodness of fit were also excluded (marked **).
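The entry rule described above can be sketched as forward stepwise selection. This is an illustrative reconstruction in Python (NumPy assumed), not the author's MATLAB code: variables enter in descending order of correlation with y, and a variable whose coefficient has an approximate 95% confidence interval containing zero is discarded for good. The outlier-removal loop and the "**" rule are not reproduced, because this section does not say which goodness-of-fit measure was used (plain R² never decreases when a term is added to an OLS model). Function names, the `crit` parameter, and the data are hypothetical.

```python
import numpy as np


def ols_with_se(design, y):
    """OLS coefficients and their standard errors; `design` includes an intercept column."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = (resid @ resid) / (n - k)  # unbiased error-variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return beta, se


def forward_stepwise(X, y, crit=1.96):
    """Enter the columns of X one at a time in descending order of Pearson r with y.

    A candidate is kept only if the approximate confidence interval
    beta +/- crit * se of its new coefficient excludes zero; otherwise it is
    dropped and never reconsidered (the '*' rule). crit=1.96 is a large-sample
    normal approximation to the 95% t quantile (an assumption of this sketch).
    """
    n, p = X.shape
    r = [np.corrcoef(X[:, j], y)[0, 1] for j in range(p)]
    order = sorted(range(p), key=lambda j: r[j], reverse=True)
    selected = []
    for j in order:
        design = np.column_stack([np.ones(n), X[:, selected + [j]]])
        beta, se = ols_with_se(design, y)
        lower, upper = beta[-1] - crit * se[-1], beta[-1] + crit * se[-1]
        if lower > 0 or upper < 0:  # interval excludes zero: keep the variable
            selected.append(j)
    return selected


# Hypothetical data: y depends on columns 0 and 1; column 2 is unrelated noise.
rng = np.random.default_rng(42)
X = rng.normal(75, 10, size=(232, 3))
y = 5 + 0.4 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 2, 232)
print(forward_stepwise(X, y))
```

The two informative columns should always be retained; whether the pure-noise column slips through depends on chance, exactly as with a 5% significance test.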

Table 3.

Stepwise regression coefficients

Iteration	β₀	β₁	β₂	β₃	β₄	β₅	β₆	β₇
1	31.5643	0.5864	-	-	-	-	-	-
2	17.8265	0.4114	0.3685	-	-	-	-	-
3	16.5447	0.2336	0.2217	0.1684*	-	-	-	-
4	13.6524	0.1647	0.2358	-	0.0718**	-	-	-
5	12.3472	0.0941	0.2034	-	-	0.1753	-	-
6	11.4656	0.0935	0.2165	-	-	0.1732	0.1187	-
7	10.2945	0.1052	0.1823	-	-	0.1865	0.1034	0.0642

Stepwise regression excluded two courses, Public Relations and Basic Theory and Practice of Economics, and yielded the regression equation (22): (22) $y = 5.2034 + 0.1382 x_{1} + 0.1744 x_{2} + 0.1093 x_{5} + 0.1561 x_{6} + 0.1185 x_{7}$

The regression effect test statistics are shown in Table 4. As Table 4 shows, the goodness of fit of the regression using only first-order (linear) terms is relatively low, which suggests that the Ideological and Political Education grade may also depend on quadratic or interaction terms. Stepwise regression was therefore continued, adding quadratic and interaction terms to the regression equation one at a time and retaining those with strong significance and high goodness of fit.

Table 4.

Regression effect test statistic

Statistic	Linear regression model	Extended model
Goodness of fit R²	0.7595	0.8613
F statistic	79.4132	99.5874
P value	0.0000	0.0000
Error variance	11.3862	6.1947

The extended model obtained after regression, in which $t_{1,2}$ is the added interaction term, is shown in Equation (23): (23) $y = 45.8714 - 0.3721 x_{1} + 0.1586 x_{2} + 0.1762 x_{5} + 0.1093 x_{6} + 0.0895 x_{7} + 0.5849 t_{1,2}$

Adding the interaction term raises the goodness of fit R² from 0.7595 to 0.8613, increases the F statistic from 79.4132 to 99.5874, and reduces the error variance from 11.3862 to 6.1947, so the extended model further improves the regression.
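The statistics in Table 4 can be computed from any OLS fit. The sketch below (assuming NumPy, with a hypothetical helper `regression_stats` and synthetic data) compares a linear model with one extended by an interaction column. This section does not define $t_{1,2}$ precisely, so the raw product $x_1 x_2$ used here is an assumption; the error variance is taken to be SSE/(n - p - 1), which is likewise an assumption about how Table 4 was computed.

```python
import numpy as np


def regression_stats(design, y):
    """R^2, overall F statistic and error variance of an OLS fit.

    `design` includes an intercept column, so p = design.shape[1] - 1 regressors.
    F = (SSR / p) / (SSE / (n - p - 1)); error variance = SSE / (n - p - 1).
    """
    n, k = design.shape
    p = k - 1
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ beta
    sse = np.sum((y - y_hat) ** 2)
    ssr = np.sum((y_hat - y.mean()) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    error_var = sse / (n - p - 1)
    return {"R2": 1 - sse / sst, "F": (ssr / p) / error_var, "error_var": error_var}


# Hypothetical data in which the grade genuinely depends on an x1-x2 interaction.
rng = np.random.default_rng(7)
n = 232
x1, x2 = rng.normal(75, 8, n), rng.normal(75, 8, n)
y = 20 + 0.2 * x1 + 0.2 * x2 + 0.05 * (x1 - 75) * (x2 - 75) + rng.normal(0, 1, n)

linear = np.column_stack([np.ones(n), x1, x2])
extended = np.column_stack([linear, x1 * x2])  # hypothetical interaction column t_12 = x1 * x2
lin_stats = regression_stats(linear, y)
ext_stats = regression_stats(extended, y)
print(lin_stats)
print(ext_stats)
```

When an interaction effect is really present, the extended model should show a higher R² and a lower error variance, mirroring the direction of change reported in Table 4.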

The results of regression analysis showed that there was a linear relationship between the grades of five courses, namely, political science theory, history of western political thought, selected Marxist classics, topics of socialist economy, and history of the Chinese revolution, and the grades of ideological and political education.

4.3

Prediction of students’ performance in civic education

To test the prediction accuracy of the regression model, ten-fold cross-validation was used: the grades of the 232 students were randomly divided into a training set of 207 students and a test set of the remaining 25 students. The parameters of the extended multiple linear regression model were estimated on the training set, and the model was then used to predict the test-set students' Ideological and Political Education grades, which were compared with their actual grades. Prediction accuracy is measured by the mean square error (MSE) and the mean relative error (MRE).

To study the prediction performance of the regression model, 10 experiments were conducted; the results are shown in Table 5. Because the training and test sets are assigned at random, the regression coefficients obtained in each training run differ somewhat. In Table 5, MSE and MRE are the mean square error and mean relative error of each trained model on its test set. Their maximum values are 48.164 and 0.075, respectively, indicating high prediction accuracy, so the regression model can be used to predict students' performance in Ideological and Political Education.

Table 5.

Ten-fold cross-validation results

Parameter or metric	Round 1	Round 2	Round 3	Round 4	Round 5	Round 6	Round 7	Round 8	Round 9	Round 10
β₀	74.852	32.549	56.493	53.501	25.176	36.523	58.624	55.431	53.426	50.786
β₁	-0.718	-0.146	-0.482	-0.396	-0.062	-0.275	-0.495	-0.421	-0.463	-0.462
β₂	0.105	0.058	0.1319	0.094	0.096	0.169	0.107	0.098	0.164	0.126
β₅	0.095	0.076	0.195	0.109	0.093	0.114	0.129	0.078	0.091	0.125
β₆	0.157	0.181	0.116	0.123	0.131	0.119	0.148	0.125	0.078	0.155
β₇	0.104	0.139	0.102	0.117	0.138	0.077	0.128	0.125	0.097	0.117
t_1,2	1.006	0.425	0.776	0.789	0.358	0.396	0.735	0.825	0.749	0.671
MSE	20.673	22.134	33.157	22.615	48.164	38.217	19.524	33.619	30.427	36.428
MRE	0.051	0.053	0.059	0.044	0.075	0.069	0.048	0.059	0.054	0.065
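The validation procedure can be sketched as repeated random hold-out, following the description above (207 training and 25 test students per round, 10 rounds); a strict ten-fold scheme would instead split the 232 students into ten disjoint folds of about 23 each. MRE is taken here as the mean of |ŷ - y| / |y| over the test set, which is an assumption because the section does not define it. The helpers `mse_mre` and `repeated_holdout` and the data are hypothetical; NumPy is assumed.

```python
import numpy as np


def mse_mre(y_true, y_pred):
    """Mean square error and mean relative error |y_pred - y_true| / |y_true|."""
    y_true = np.asarray(y_true, dtype=float)
    err = np.asarray(y_pred, dtype=float) - y_true
    return float(np.mean(err ** 2)), float(np.mean(np.abs(err) / np.abs(y_true)))


def repeated_holdout(design, y, n_test=25, rounds=10, seed=0):
    """Fit OLS on a random training split and score it on the held-out students, `rounds` times."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(rounds):
        idx = rng.permutation(len(y))
        test, train = idx[:n_test], idx[n_test:]
        beta, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        scores.append(mse_mre(y[test], design[test] @ beta))
    return scores


# Hypothetical data for 232 students (207 train / 25 test per round, as in the text).
rng = np.random.default_rng(3)
grades = rng.normal(75, 8, size=(232, 2))
y = 10 + 0.5 * grades[:, 0] + 0.3 * grades[:, 1] + rng.normal(0, 3, 232)
design = np.column_stack([np.ones(232), grades])
scores = repeated_holdout(design, y)
print(max(m for m, _ in scores), max(r for _, r in scores))
```

Reporting the maximum MSE and MRE over the rounds, as the text does, gives a conservative summary of how the model behaves on unseen students.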

5

Precision strategies for collaborative education through educational management and curriculum Civics

Based on the big data analysis of college students' daily behavior and their performance in Civic Education, this paper designs effective paths for combining educational management with curriculum-based Civic Education, covering the following aspects:

5.1

Management of Students through Curriculum Civics

Students are the main body of the school, and this principle should be upheld when carrying out ideological and political education activities. Combining these activities with student management promotes students' intellectual development and turns them from passive recipients into active participants, which makes management work more effective. For example, teachers can set up suggestion boxes in schools to encourage students to express their opinions on ideological and political issues. Schools can also adjust their ideological and political work according to students' psychological needs, improve students' ideological concepts and attitudes, integrate a people-oriented approach into the overall design of collaborative education, and build a new model of education management. At the same time, to implement and innovate management more effectively, schools should help students feel cared for and encourage them to take an active part in management activities, so that they leave the traditional passive state and the two aspects become integrated, laying a solid foundation. To this end, schools should make use of ideological and political teaching: teachers should observe students' psychological states and their reactions to ideological issues more carefully, and the relevant staff should adjust management promptly according to students' actual situation. Management should put people at the center, let management activities exert a subtle, unobtrusive influence, and bring personal care into them. Such management can reveal students' views on traditional management, and integrating the two can greatly improve students' comprehensive quality.

5.2

Improve the comprehensive quality of the Civic Education team

To achieve good educational results and make ideological and political teaching and student management guidance more effective, teachers of ideological and political disciplines need well-rounded and excellent qualities: a firm political stance, a correct political direction, noble character and personal charm, solid theoretical knowledge, persistent spiritual pursuit, a rigorous and realistic academic attitude, serious and responsible educational conduct, a self-disciplined personal style, a decent and dignified professional image, passion and vitality for education, and active ideological exploration and follow-up guidance. For students to participate actively in ideological and political courses and to raise their "attendance" and "advancement" rates, teachers must strengthen theoretical education and raise their own academic competence. In teaching, they should improve classroom quality, show academic appeal, deepen the teaching of Civics, give students room for development, and increase students' motivation and initiative to participate. Schools should further implement the government's measures for improving teacher quality, motivate Civics teachers, and enhance their initiative and creativity. These teachers are encouraged to serve as counselors or class advisers so that they can understand students' inner world and guide them thoroughly. To understand students' ideological conditions and improve teaching efficiency, teachers in teaching and research departments should be organized to prepare lessons using new media and should receive training in Civics to improve their teaching.

5.3

Diversification of Teaching Forms in Civics Programs

Ideological and political education in schools has long struggled with theory-guided teaching, largely because the theoretical content is cumbersome and the teaching method is monotonous; practical resources urgently need to be integrated and upgraded. Ideological and political education should not be limited to the knowledge of a single subject but should cover language, mathematics, nature, society, aesthetics, health, physical education, and other areas. In such a diversified curriculum environment, students can both gain knowledge and improve their comprehensive quality. The choice and use of teaching methods in Civics is also very important. Traditional teaching methods have gradually drifted away from students' needs, which is one of the main reasons for poor educational outcomes. Teachers therefore need to adopt a variety of teaching methods to meet students' diverse needs. In student management, Civic Education should guide students in putting theoretical knowledge into practice and should strengthen both ideological and political content and other useful knowledge, so that students acquire and master more important knowledge. Only through such methods can the overall effectiveness and efficiency of education be significantly improved.

5.4

Synergizing the content of Civics courses with student education and management

The content of education needs to be continually updated and developed, and its form also needs cultural innovation; ideological and political education should cover all aspects of students' study and life, evaluate them comprehensively, and thereby achieve comprehensive Civic and Political Education. At present, ideological and political education is carried out through a variety of channels, but in specific educational activities, methods and channels such as online education dominate, clearly emphasizing form over content. Although these teaching methods are novel, the teaching content is not updated, the teaching resources are not integrated, and formalism and rote book learning are evident. As a result, the practicability and significance of such education are clearly weakened, giving rise to many practical problems and shortcomings. The causes of this problem are multifaceted. Schools should understand them objectively, take account of the characteristics of student management, combine the two effectively, and improve and optimize the quality of education, so that student management becomes part of the practice of Civic and Political Education. The content of ideological and political education should be combined with student management work to form a comprehensive integration mechanism for organizing the teaching content of practical courses. The teaching and research departments of Civics courses can set up a special working group to analyze the educational content of the practical courses in depth, sort out the types of themes, formulate a theme for each discipline, and divide each theme into a number of objectives and task items. These objectives and tasks are then carried out in the daily work of student education and management and accomplished in various forms.

5.5

Utilizing Civic Synergy Management to Improve Students’ Literacy

Higher education is an important channel for cultivating young talent, and students' words and conduct bear directly on the image of the country and on the quality of school education. To promote unity, harmony, and healthy development among students, schools should formulate corresponding regulations and require all students to observe them strictly. Schools should also attend to building the campus cultural environment, using campus culture and teaching to support the collaborative practice of high-quality education.

First, students should firmly embrace patriotism and collectivism, take an active part in volunteer and social practice activities, strengthen their patriotism and sense of social responsibility, raise their ideological and political awareness, support the school leadership, and safeguard the interests of the school.

Second, when a risk or crisis arises in the collaborative work of student management and ideological education, it can be communicated through campus forums, official media accounts, and other information channels; schools should actively guide students, strengthen the guidance of information, and strictly regulate the relevant behavior.

6

Conclusion

This paper combines big data analysis methods, including the K-Means clustering algorithm, Pearson correlation, and multiple regression analysis, to mine student management and civic education data in colleges and universities, and on this basis proposes a precise strategy for collaborative education between education management and civic education in colleges and universities.

1)

Cluster analysis with the improved K-Means algorithm, based on the proportion of time students spend in different types of locations, divides students into three categories: hard-working, unsocialized, and physically active, which account for 34%, 39%, and 27% of students, respectively. The time-distribution hotspots of hard-working students are concentrated in study places such as teaching buildings and libraries; those of unsocialized students are concentrated in dormitories; and those of physically active students are concentrated in activity centers, stadiums, and similar places. Further analysis of the clusters by gender and educational level shows that the proportion of female students is larger than that of male students in the hard-working and unsocialized types, whereas the opposite holds in the physically active type. Likewise, the proportion of graduate students is larger than that of undergraduates among hard-working and unsocialized students but smaller than that of undergraduates among physically active students.

2)

A prediction model for Civic Education grades was established with multiple linear regression. The correlation coefficients between the grades of each selected course and the grades of Civic and Political Education all lie in the range [0.51, 0.69], indicating moderate correlation, and only two courses, Selected Readings of Marxist Classics (VIF 7.4882) and History of Western Political Thought (VIF 5.2183), show weak multicollinearity with the other courses, so the data are suitable for regression analysis. The MSE and MRE of the resulting model are low (at most 48.164 and 0.075 over the ten validation rounds), so the model's prediction accuracy is high and the prediction model is valid.

3)

Based on the big data analysis, an innovative and precise strategy for collaborative education that combines student education management with civic education is proposed.

Language: English

Frequency: once a year
Journal subjects: Life Sciences, Life Sciences (other), Mathematics, Applied Mathematics, General Mathematics, Physics, Physics (other)



Hua Deng

Published online: 26 Jan 2025

Received: 15 Jan 2025

Accepted: 08 May 2025

DOI: https://doi.org/10.2478/amns-2025-1061

Keywords: K-Means, Multiple linear regression, Pearson correlation, Civic education, Big data analysis

© 2025 Hua Deng, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
