Extraction and analysis of individual features of students’ ideological and political education based on big data clustering algorithm

With the rapid development and popularization of information technology and the Internet, information technology increasingly intersects with everyday life. The volume of global data has grown explosively and data from around the world is increasingly aggregated, and this surge of massive data has affected the economic and social development of countries and many aspects of people’s lives [1-2]. At the same time, the development of big data technology provides impetus and technical support for the innovation of modern ideological and political education [3]. Applying big data technology to ideological and political education is a natural step as education keeps pace with the times and big data technology gradually matures [4]. Big data technology helps colleges and universities optimize and innovate the teaching methods, modes, management, and evaluation of ideological and political education, which improves its precision and enhances its effectiveness [5-6]. Colleges and universities should treat the digital revolution as a major historical opportunity, give full play to the functional role of big data in ideological and political education, coordinate and balance the tensions that ideological and political education faces in the big data setting, identify effective paths for big data-driven innovation and development, and effectively improve the effectiveness of ideological and political education, thereby writing a new chapter for ideological and political education in the history of human digital development.

As a product of the new era, big data has both natural and social characteristics and offers powerful data analysis and prediction functions, which makes it possible to grasp the ideological dynamics and behavioral trajectories of college students as a whole. Li, Y. et al. showed that using big data technology to transform and analyze the huge amount of data generated by teaching activities makes it possible to monitor teaching quality accurately and then improve teaching methods, providing strong support for universities to strengthen teaching [7]. Liu, R. constructed a data analysis model for educational evaluation using the K-means clustering technique; by mining and analyzing the important features generated in the teaching and learning process, the model provides a detailed understanding of the problems students face in learning and a basis for developing effective student management strategies [8]. Kausar, S. et al. analyzed the performance of different mining techniques in educational data analysis and pointed out that more robust educational mining tools can adaptively mine effective information from large amounts of educational data to support administrators’ educational decisions [9]. Wang, Z. emphasized the important role of data mining techniques in instructional management, using clustering algorithms to assess students’ performance and classify students into highly similar groups, which helps instructional administrators understand students’ learning characteristics and adjust their teaching strategies [10]. Oladipupo, O. O. et al. used median deviation statistics to assess the performance of different clustering algorithms in knowledge mining of student data; the algorithms with the highest clustering potential can provide educational administrators with more accurate clustering features, which is important for improving learning outcomes and student success and for supporting strategic decision making [11]. Urbina Nájera, A. B. et al. introduced clustering algorithms that reflect the teacher-student relationship in instructional tutoring; using clustering algorithms such as k-means to relate teacher and student skills and affinity can effectively strengthen the effectiveness of tutoring [12]. Dwivedi, S. et al. explored the application of big data analytics in educational recommender systems, extracting individual student characteristics to discover patterns of similarity in grade level, subject matter, and so on, in order to recommend appropriate courses of study [13]. Ding, D. et al. proposed a student behavior description index system and a student behavior segmentation model based on cluster analysis, which identifies student groups with different cluster characteristics by segmenting students’ behaviors and facilitates further student management in schools [14]. Through data analysis and data processing technology, both “group portraits” and “individual portraits” of college students can be drawn; however, in the field of ideological and political education, how to use big data technology to achieve precise educational goals still needs further research.

The big data clustering algorithm helps reveal the characteristics and developmental laws of college student groups and supports targeted adjustment of the existing ideological and political teaching system in colleges and universities. First, based on clustering theory, this paper focuses on students of higher vocational colleges and adopts the FCM algorithm to construct student portraits. Second, relying on the school information platform dataset and taking a psychological perspective, individual characteristics of students are extracted from three aspects: diligence, sleep patterns, and consumption behavior. Finally, students are divided into five groups and the Target Group Index (TGI) is introduced to characterize the different groups. Comparing the clustered TGI values reveals the characteristics and growth patterns of the student body as a whole and of the different groups.

2

Cluster analysis

2.1

Data structure for cluster analysis

Two data structures are generally used in cluster analysis: the data matrix and the dissimilarity matrix.

Assuming that an entity set with n members is described using p attributes, its data matrix (n rows, p columns) can be expressed as: $[\begin{matrix} x_{11} & \dots & x_{1 j} & \dots & x_{1 p} \\ \dots & \dots & \dots & \dots & \dots \\ x_{i 1} & \dots & x_{i j} & \dots & x_{i p} \\ \dots & \dots & \dots & \dots & \dots \\ x_{n 1} & \dots & x_{n j} & \dots & x_{n p} \end{matrix}]$

Let d(i,j) denote the dissimilarity between object i and object j, then the dissimilarity matrix can be expressed as: $[\begin{matrix} 0 & d (1, 2) & d (1, 3) & \dots & d (1, n) \\ d (2, 1) & 0 & d (2, 3) & \dots & d (2, n) \\ d (3, 1) & d (3, 2) & 0 & \dots & d (3, n) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d (n, 1) & d (n, 2) & d (n, 3) & \dots & 0 \end{matrix}]$ where (1) d(i,j) is generally non-negative; (2) the more similar or “closer” objects i and j are, the closer d(i,j) is to 0; and (3) conversely, the less similar the two objects are, the larger d(i,j) is.

Because many clustering algorithms operate on dissimilarity matrices, the data matrix is often transformed into a dissimilarity matrix before the clustering algorithm is used.
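As a minimal sketch of this transformation, the following Python snippet builds the n × n dissimilarity matrix from an n × p data matrix. The function names and sample values are illustrative assumptions, and the Euclidean distance is used only as a placeholder for whichever dissimilarity measure suits the data type.

```python
import math


def euclidean(a, b):
    """Placeholder dissimilarity: straight-line distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_matrix(data, dist):
    """Turn an n x p data matrix into an n x n dissimilarity matrix."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]  # diagonal stays 0: d(i, i) = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Fill both halves at once, since d(i, j) = d(j, i).
            d[i][j] = d[j][i] = dist(data[i], data[j])
    return d


# Hypothetical data matrix: 3 objects described by 2 attributes.
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = dissimilarity_matrix(data, euclidean)
```

Only the upper triangle is computed explicitly, because the matrix is symmetric with a zero diagonal.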

2.2

Calculation of the degree of variability between data

Depending on the type of data, the calculation of the degree of difference between the data is different. The common types of data and their calculation of the degree of difference are described below:

1)

Interval Scaled Type

Interval-scaled variables are continuous measurements on a roughly linear scale. Clustering results for this type of variable can be strongly influenced by the measurement units: in general, the smaller the unit, the larger the range of values and the greater the effect on the clustering result. Standardizing the data reduces the influence of the units on the clustering result.

(1)

Calculate the mean absolute deviation s_f: (1) $s_{f} = \frac{1}{n} (| x_{1 f} - m_{f} | + | x_{2 f} - m_{f} | + \dots + | x_{n f} - m_{f} |)$ where x_1f, x_2f, ⋯, x_nf are the n measurements of variable f and m_f is the mean of f: (2) $m_{f} = \frac{1}{n} (x_{1 f} + x_{2 f} + \dots + x_{n f})$

(2)

Calculate the standardized measure, or z-score: (3) $z_{i f} = \frac{x_{i f} - m_{f}}{s_{f}}$
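The two standardization steps can be sketched in Python as follows. The function name and the sample values are assumptions for illustration; the code uses the mean absolute deviation of Eq. (1), not the standard deviation.

```python
def standardize(values):
    """Return z-scores based on the mean absolute deviation (Eqs. 1-3)."""
    n = len(values)
    m_f = sum(values) / n                         # Eq. (2): mean of variable f
    s_f = sum(abs(x - m_f) for x in values) / n   # Eq. (1): mean absolute deviation
    if s_f == 0:
        raise ValueError("variable is constant; z-scores are undefined")
    return [(x - m_f) / s_f for x in values]      # Eq. (3): z-score


# Hypothetical measurements of one variable, in two different units.
z_small_unit = standardize([2000.0, 4000.0, 6000.0, 8000.0])
z_large_unit = standardize([2.0, 4.0, 6.0, 8.0])
```

Both calls give the same z-scores, which illustrates why standardization removes the influence of the measurement unit.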

For interval-scaled variables, the following distance measures are generally used to describe the dissimilarity between each pair of objects. Let object i = (x_i1, x_i2, ⋯, x_in) and object j = (x_j1, x_j2, ⋯, x_jn), then: (1)

The Euclidean distance is defined as: (4) $d (i, j) = \sqrt{{(x_{i 1} - x_{j 1})}^{2} + {(x_{i 2} - x_{j 2})}^{2} + \dots + {(x_{i n} - x_{j n})}^{2}}$

(2)

The Manhattan distance is defined as: (5) $d (i, j) = | x_{i 1} - x_{j 1} | + | x_{i 2} - x_{j 2} | + \dots + | x_{i n} - x_{j n} |$

(3)

Minkowski distance is defined as: (6) $d (i, j) = {({| x_{i 1} - x_{j 1} |}^{p} + {| x_{i 2} - x_{j 2} |}^{p} + \dots + {| x_{i n} - x_{j n} |}^{p})}^{1 / p}$

In the above equation, p is a positive integer. When p = 1, the formula represents the Manhattan distance; when p = 2, the formula represents the Euclidean distance.

The above distance functions have the following properties: d(i,j) ≥ 0; d(i,i) = 0; d(i,j) = d(j,i); d(i,j) ≤ d(i,h) + d(h,j).

When it is necessary to differentiate the importance of variables, weights can be assigned to the variables with the following formula: (7) $d (i, j) = \sqrt{w_{1} {| x_{i 1} - x_{j 1} |}^{2} + w_{2} {| x_{i 2} - x_{j 2} |}^{2} + \dots + w_{n} {| x_{i n} - x_{j n} |}^{2}}$
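The distance measures above can be sketched in a few lines of Python. The function names and sample vectors are illustrative assumptions; `minkowski` covers the Manhattan (p = 1) and Euclidean (p = 2) cases of Eq. (6), and `weighted_euclidean` follows the weighted form of Eq. (7).

```python
def minkowski(a, b, p=2):
    """Minkowski distance of Eq. (6); p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def weighted_euclidean(a, b, weights):
    """Weighted Euclidean distance of Eq. (7), one weight per variable."""
    return sum(w * abs(x - y) ** 2 for w, x, y in zip(weights, a, b)) ** 0.5


# Hypothetical objects described by two interval-scaled variables.
obj_i, obj_j = [0.0, 0.0], [3.0, 4.0]
manhattan = minkowski(obj_i, obj_j, p=1)
euclid = minkowski(obj_i, obj_j, p=2)
first_only = weighted_euclidean(obj_i, obj_j, [4.0, 0.0])
```

Setting a weight to zero removes that variable from the distance, while a larger weight makes differences in that variable count more.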

2)

Binary variables

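A binary variable has only two states, 0 and 1: the value is 1 when the attribute is present and 0 when it is absent. For two objects i and j described by binary variables, the possible value combinations are counted in Table 1, where a is the number of variables equal to 1 for both objects, b is the number equal to 1 for object i and 0 for object j, c is the number equal to 0 for object i and 1 for object j, d is the number equal to 0 for both objects, and p = a + b + c + d.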

Table 1.

Contingency table for binary variables

Object i \ Object j	1	0	Sum
1	a	b	a + b
0	c	d	c + d
Sum	a + c	b + d	p

Binary variables are subdivided into symmetric and asymmetric binary variables.

A variable is symmetric binary if its two states have equal value and same weight. An example is the gender attribute. The dissimilarity formula for symmetric binary variables is: (8) $d (i, j) = \frac{b + c}{a + b + c + d}$

A binary variable is asymmetric if its two states are not equally important. For example, for the positive and negative results of a disease test, the rarer and more important outcome, usually the positive result, is coded as 1 and the other as 0. Because a match on 1 is more informative than a match on 0, the negative matches d are ignored, and the dissimilarity of asymmetric binary variables is: (9) $d (i, j) = \frac{b + c}{a + b + c}$
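The following Python sketch counts a, b, c, and d from two binary vectors and computes both dissimilarities. The function names and sample vectors are illustrative assumptions. The asymmetric variant drops the negative matches d from the denominator (the Jaccard-style form), which is the assumption made in this sketch.

```python
def binary_counts(x, y):
    """Count the four cells of the 2 x 2 contingency table for two objects."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d


def symmetric_dissimilarity(x, y):
    """Eq. (8): all four cells count equally."""
    a, b, c, d = binary_counts(x, y)
    return (b + c) / (a + b + c + d)


def asymmetric_dissimilarity(x, y):
    """Jaccard-style form: 0/0 matches (cell d) are ignored."""
    a, b, c, _ = binary_counts(x, y)
    if a + b + c == 0:
        return 0.0  # assumption: objects with no 1s at all are treated as identical
    return (b + c) / (a + b + c)


# Hypothetical test results for two patients (1 = positive, 0 = negative).
patient_i = [1, 0, 1, 0, 0, 0]
patient_j = [1, 0, 0, 0, 0, 0]
sym = symmetric_dissimilarity(patient_i, patient_j)
asym = asymmetric_dissimilarity(patient_i, patient_j)
```

The two patients share four negative results, so the symmetric measure rates them as very similar, whereas the asymmetric measure looks only at the positive results.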

3)

Categorical variables

A categorical variable is similar to a binary variable, but it can take more than two state values. For example, a color variable may have five values: red, yellow, orange, green, and blue. Assuming that the number of states of a categorical variable is M, there are two methods for calculating the dissimilarity of categorical variables: (1)

Simple matching method (10) $d (i, j) = \frac{p - m}{p}$ where m is the number of variables that take the same value for object i and object j, and p is the number of all variables.

(2)

Create one binary variable for each of the M states of the categorical variable, encode the categorical variable with these asymmetric binary variables, and then compute the dissimilarity of the categorical variable as the dissimilarity of the asymmetric binary variables.
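Both methods can be illustrated with a short Python sketch. The function names, the color states, and the sample objects are assumptions for illustration only.

```python
def simple_matching(x, y):
    """Eq. (10): share of variables on which the two objects disagree."""
    p = len(x)
    m = sum(1 for u, v in zip(x, y) if u == v)  # number of matching variables
    return (p - m) / p


def one_hot(value, states):
    """Method (2): encode one categorical value as M binary variables."""
    return [1 if value == s else 0 for s in states]


# Hypothetical objects described by three categorical variables.
obj_i = ["red", "A", "x"]
obj_j = ["red", "B", "x"]
d_ij = simple_matching(obj_i, obj_j)

# Hypothetical state list for a color variable.
colors = ["red", "yellow", "orange", "green", "blue"]
encoded = one_hot("green", colors)
```

The one-hot vectors can then be compared with the asymmetric binary dissimilarity.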

4)

Ordinal variables

Ordinal variables are divided into discrete ordinal variables and continuous ordinal variables. Discrete ordinal variables are similar to categorical variables except that the ordering of the M states is meaningful. For example, faculty title ranks are categorized as assistant professor, instructor, associate professor, and full professor. Continuous ordinal variables are similar to interval scale variables, but they do not have units; for example, the ranking of gold, silver, and bronze medals in sports competitions.

Suppose that some ordinal-type variable f describing an object has M_f ordered states, defining these states as 1,2,⋯⋯, M_f. The degree of dissimilarity with respect to a variable f can be computed using the following method: (1)

Replace the value x_if of the f variable of the ith object with the corresponding ordinal r_if ∈ {1,⋯⋯, M_f};

(2)

Compute z_if using the equation $z_{i f} = \frac{r_{i f} - 1}{M_{f} - 1}$ , which maps the ranks onto the interval [0, 1], and replace r_if with z_if.

(3)

Replace the value of variable f for the ith object with z_if, and then calculate the degree of dissimilarity using any of the distance measures in Eqs. (4)-(7).
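The three steps above can be sketched in Python as follows. The function name is an assumption, and the medal ranking from the text is used as a hypothetical ordered state list.

```python
def ordinal_to_interval(values, ordered_states):
    """Map ordinal values to [0, 1] via z = (r - 1) / (M - 1)."""
    m_f = len(ordered_states)
    if m_f < 2:
        raise ValueError("an ordinal variable needs at least two states")
    # Step 1: rank of each state, starting at 1.
    rank = {state: r for r, state in enumerate(ordered_states, start=1)}
    # Step 2: normalize the ranks so that every ordinal variable spans [0, 1].
    return [(rank[v] - 1) / (m_f - 1) for v in values]


# Hypothetical ordering, from lowest to highest.
medals = ["bronze", "silver", "gold"]
z = ordinal_to_interval(["gold", "bronze", "silver"], medals)

# Step 3: the z values can now be compared like interval-scaled data.
gap_gold_bronze = abs(z[0] - z[1])
```

Normalizing by M_f - 1 keeps a variable with many states from dominating one with few states.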

5)

Proportional Scale Variables

Variables of this type approximately follow $A e^{B t}$ or $A e^{- B t}$, where A and B are positive constants and t generally represents time. There are three ways to calculate the dissimilarity of this type of variable: (1)

Use the same method as for dealing with interval scaled variables;

(2)

Apply a logarithmic transformation to the values of the proportional scale variable, and then calculate the dissimilarity of the results using the same methods as for interval-scaled variables;

(3)

Treat the proportional scale variable as continuous ordinal data and process it in the same way as ordinal data. The method should be chosen according to actual needs, but generally the latter two are more effective.

6)

Mixed-type variables

In practice, the objects to be processed are often described by variables of several different types. For objects described by mixed types of variables, the dissimilarity d(i,j) between object i and object j is defined as: (11) $d (i, j) = \frac{\sum_{f = 1}^{p} δ_{i j}^{(f)} d_{i j}^{(f)}}{\sum_{f = 1}^{p} δ_{i j}^{(f)}}$ where p is the number of variables describing the objects. The indicator $δ_{i j}^{(f)} = 0$ when x_if or x_jf is missing, or when x_if = x_jf = 0 and variable f is an asymmetric binary variable; otherwise $δ_{i j}^{(f)} = 1$ .

Depending on the type of variable f, its contribution $d_{i j}^{(f)}$ to the dissimilarity between i and j is calculated as follows: (1)

f is an interval scale variable: $d_{i j}^{(f)} = \frac{| x_{i f} - x_{j f} |}{{max}_{h} x_{h f} - {min}_{h} x_{h f}}$ (h traverses all objects).

(2)

f is a binary or categorical variable: $d_{i j}^{(f)} = 0$ if x_if = x_jf and $d_{i j}^{(f)} = 1$ otherwise.

(3)

f is an ordinal variable: compute the ranks r_if and $z_{i f} = \frac{r_{i f} - 1}{M_{f} - 1}$ , and then treat z_if as an interval scale variable.

(4)

f is a proportional scale variable: one way is to log-transform it and then treat the result as an interval scale variable; another way is to treat f as continuous ordinal data, compute r_if and z_if, and then treat z_if as an interval scale variable.
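A minimal Python sketch of Eq. (11) is given below. The function name, the `kinds` labels, and the sample objects are assumptions for illustration. Ordinal and proportional scale variables are assumed to have been converted to interval-scaled values beforehand, and missing values are represented by `None`.

```python
def mixed_dissimilarity(x_i, x_j, kinds, ranges):
    """Eq. (11): weighted average of per-variable contributions.

    kinds[f] is "interval", "nominal" (binary or categorical), or
    "asym_binary"; ranges[f] is max - min of variable f over all objects
    (only used for interval variables).
    """
    numerator, denominator = 0.0, 0.0
    for f, kind in enumerate(kinds):
        a, b = x_i[f], x_j[f]
        if a is None or b is None:
            continue  # delta = 0: a value is missing
        if kind == "asym_binary" and a == 0 and b == 0:
            continue  # delta = 0: joint absence of an asymmetric variable
        if kind == "interval":
            contribution = abs(a - b) / ranges[f]
        else:
            contribution = 0.0 if a == b else 1.0
        numerator += contribution
        denominator += 1.0
    if denominator == 0:
        raise ValueError("no comparable variables for this pair of objects")
    return numerator / denominator


# Hypothetical objects: one interval, one categorical, one asymmetric binary variable.
kinds = ["interval", "nominal", "asym_binary"]
ranges = [10.0, None, None]
d_ij = mixed_dissimilarity([2.0, "red", 0], [7.0, "blue", 0], kinds, ranges)
d_missing = mixed_dissimilarity([None, "red", 1], [7.0, "red", 0], kinds, ranges)
```

Because every contribution lies in [0, 1], variables of different types enter the average on a common scale.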

2.3

Fuzzy Clustering Algorithm

2.3.1

Basic concepts of fuzzy clustering

Clustering is an unsupervised learning method that aims to group data objects in a dataset into distinct groups or clusters in such a way that the data objects within a group are highly similar and the similarity between groups is low. Clustering algorithms usually group data without prior knowledge or labeling. As one of the classical machine learning algorithms, clustering algorithms are characterized by simple principles, high efficiency and practicality, and have high applicability in several application areas such as image segmentation, document analysis, feature learning and market segmentation.

Fuzzy clustering was proposed in the late 1960s as an improvement on traditional distance-based clustering, with the aim of handling complex datasets that traditional clustering algorithms handle poorly. It is based on a fuzzy partition of the objects combined with a fuzzy measure of each object: a data point in the clustering space can belong to several clusters at the same time, and each data point carries a weight for each cluster that indicates its degree of membership. The result is expressed as a fuzzy membership matrix, from which one can decide, combining subjective and objective considerations, which clusters each object belongs to and to what degree, which makes the clustering result more intuitive.

2.3.2

Introduction to fuzzy C-means clustering

Fuzzy C-means (FCM) is the best-known fuzzy clustering method. In hard clustering algorithms such as K-means, each feature vector is a member of exactly one cluster, whereas in FCM a feature vector can belong to one or more clusters with different degrees of membership, subject to the constraint that its membership degrees sum to one.

Let $X = [x_{1}, x_{2}, \dots, x_{n}] \in ℝ^{d \times n}$ be a dataset consisting of n samples with d features each, defined by the following matrix: (12) $X = [\begin{matrix} x_{11} & x_{12} & \dots & x_{1 n} \\ x_{21} & x_{22} & \dots & x_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{d 1} & x_{d 2} & \dots & x_{d n} \end{matrix}]$

FCM minimizes an objective function based on the distances from the feature vectors to the cluster centers: (13) $J (U, C) = \sum_{i = 1}^{n} \sum_{j = 1}^{k} u_{i j}^{m} {\| x_{i} - c_{j} \|}^{2}$ where k is the number of clusters, m > 1 is the fuzzification parameter, u_ij denotes the degree of membership of sample x_i in cluster j, x_i is the ith sample with d-dimensional features, c_j (j = 1,2,⋯, k) is the center of cluster j, which also has d dimensions, and ‖·‖ is the Euclidean norm. In addition, the memberships satisfy the following constraints: (14) $\sum_{j = 1}^{k} u_{i j} = 1, 0 \leq u_{i j} \leq 1$ where i = 1,2,⋯, n and j = 1,2,⋯, k. Minimizing Eq. (13) under the constraints (14) with the method of Lagrange multipliers gives: (15) $c_{j} = \frac{\sum_{i = 1}^{n} u_{i j}^{m} x_{i}}{\sum_{i = 1}^{n} u_{i j}^{m}}$ (16) $u_{i j} = \frac{1}{\sum_{l = 1}^{k} {(\frac{\| x_{i} - c_{j} \|}{\| x_{i} - c_{l} \|})}^{\frac{2}{m - 1}}}$ where i = 1,2,⋯, n and j = 1,2,⋯, k. The partition matrix U consists of the memberships u_ij of the ith feature vector in the jth cluster. In each iteration, FCM computes C and U by Eq. (15) and Eq. (16). If $u_{i j}^{(t)}$ is the value of u_ij computed after t iterations, the iteration stops when: (17) $\| U^{(t)} - U^{(t - 1)} \| = \max_{i, j} | u_{i j}^{(t)} - u_{i j}^{(t - 1)} | \leq ε$

In this equation, ε > 0 is a pre-assigned parameter that specifies the required level of accuracy. The FCM algorithm takes the number of cluster centers k and the fuzzification parameter m as pre-assigned inputs and then alternately updates the membership matrix U and the cluster centers C. The FCM algorithm can be summarized as follows:

Algorithm 1: Fuzzy C-Means Clustering Algorithm (FCM)

Input: dataset D, number of cluster centers k, fuzzification parameter m, threshold ε

Output: set of cluster center vectors C

Step 1: Initialize the membership matrix U as a random matrix that satisfies the constraints (14).

Step 2: Calculate the cluster centers c_j, j = 1,2,⋯, k according to Eq. (15).

Step 3: Update the degrees of membership u_ij, i = 1,2,⋯, n, j = 1,2,⋯, k according to Eq. (16).

Step 4: Repeat steps 2 to 3 until the change in U calculated by Eq. (17) is no greater than ε.

Step 5: Return the set of cluster center vectors C = {c₁,c₂,⋯, c_k}.
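The algorithm can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function signature, the `max_iter` and `seed` parameters, and the sample points are hypothetical. After the random initialization of U, the sketch first computes the centers from the current memberships and then updates the memberships, which is the order in which the two update formulas can be evaluated.

```python
import random


def fcm(data, k, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy C-means: returns (centers C, membership matrix U)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Step 1: random memberships, each row normalized to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        U.append([u / total for u in row])
    C = []
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means.
        C = []
        for j in range(k):
            w = [U[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            C.append([sum(w[i] * data[i][f] for i in range(n)) / w_sum
                      for f in range(d)])
        # Step 3: memberships from the distances to the centers.
        new_U = []
        for i in range(n):
            dist = [sum((data[i][f] - C[j][f]) ** 2 for f in range(d)) ** 0.5
                    for j in range(k)]
            zero = [j for j in range(k) if dist[j] == 0.0]
            if zero:
                # A sample sitting exactly on a center belongs fully to it.
                row = [1.0 / len(zero) if j in zero else 0.0 for j in range(k)]
            else:
                inv = [dj ** (-2.0 / (m - 1.0)) for dj in dist]
                inv_sum = sum(inv)
                row = [v / inv_sum for v in inv]
            new_U.append(row)
        # Step 4: stop when the largest membership change is at most eps.
        change = max(abs(new_U[i][j] - U[i][j])
                     for i in range(n) for j in range(k))
        U = new_U
        if change <= eps:
            break
    return C, U


# Hypothetical data: two well-separated groups of three points each.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, memberships = fcm(points, k=2)
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
```

Taking the cluster with the largest membership for each sample, as in `labels`, turns the fuzzy partition into a hard one when a single group assignment is needed.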

3

Extraction and analysis of individual student characteristics based on clustering algorithms

3.1

Student Portrait Construction

3.1.1

Label Extraction

To adjust the existing teaching system and carry out teaching precisely, ideological and political education in higher vocational colleges relies on student portraits, so portrait labels urgently need to be obtained from the data collected by the school. Establishing portrait labels is the key task of user profiling; a label name is a short, semantic text that summarizes the function or meaning of the label. Following the teaching logic of Civics classes in higher vocational colleges, the labels are extracted from students’ basic information data, academic level data, social practice data, and physical and mental health data.

3.1.2

Student Portrait Construction

Through the above data collection and label extraction, the Civics teaching management department of a higher vocational college can, before the first semester of the new academic year begins, draw portraits of incoming students at the grade, major, and class levels, so as to formulate teaching plans and assign lecturers reasonably. First, at the grade level, the student portrait is combined with the academic level labels for the first round of division, which separates freshmen who have never taken university Civics classes from students who have already taken them. Second, at the major level, the portrait is combined with the basic information and academic level labels for the second round of division, according to disciplinary background (arts, sciences, engineering, and agriculture) and the overall performance profile of the major (including willingness to learn Civics and learning ability in other subjects). Third, at the class level, the portrait is combined with the basic information and academic level labels for the third round of division, according to students’ middle and high school background and their enthusiasm for independent learning. Finally, at the student level, the portrait is combined with the basic information, social practice, and physical and mental health labels for the fourth round of division, according to students’ individual profiles (including freshman or former-freshman status, hukou type, age, sex, and grade), professional recognition, social service awareness, self-management, and soundness of personality. Following the teaching arrangement, the Civics teaching management department can start from the first round of division and carry out label extraction, classification, and integration for the target group, so as to construct student portraits that match actual teaching.

3.2

Individual student characteristics

The dataset consists mainly of data generated by students using the information systems provided by the school. To abstract away from specific business scenarios as far as possible, students’ behavioral characteristics are described in terms of intrinsic traits, so that the characteristics remain useful across a wide range of application scenarios. Therefore, this paper portrays students’ individual characteristics from three perspectives: diligence, sleep patterns, and consumption behavior.

3.2.1

Diligence

Diligence is strongly correlated with its sub-trait, aggressiveness. Aggressiveness reflects the pursuit of competence and success in work or study. Ideally, diligence would be measured by the amount of time a student spends studying. Although the campus card data contain no such explicit metric, the frequency with which a student is present in study areas can be calculated as a proxy. The most obvious study areas are libraries and academic buildings. For security reasons, colleges and universities install access control at the library so that no one can enter without a campus card; students must therefore swipe their cards to enter the library. In addition, students borrow books in the library with their campus cards. The academic buildings have no access control, but students usually fetch water there during classes or self-study. In the academic buildings of Chinese colleges and universities, the boiled-water dispensers are usually fitted with card readers, so that students have to swipe their cards to get boiled water, which saves water. These consumption records can therefore be regarded as a reflection of students’ study activity in the building.

For this reason, this paper counts the number of times each student appears in the academic buildings and the library and uses it as the diligence indicator: the more often a student appears in the library and the academic buildings, the more diligent the student is considered to be. The correlation between diligence and academic performance is shown in Fig. 1(a) and Fig. 1(b), where the effect of the mean has been removed so that different students can be compared. As the figures show, diligence and achievement rank are inversely related, that is, more diligent students tend to rank closer to the top, with a mean Spearman rank correlation coefficient of -0.381. This observation suggests that students’ efforts tend to be rewarded with good academic performance. In addition, the correlation varies across semesters. In particular, the correlation coefficients of -0.152 and -0.149 in the first semester are markedly weaker than those in other semesters, where the correlation is stronger than -0.261. One possible reason is that first-semester grades may still depend heavily on what was learned in high school. Another reason is that many universities impose a strict routine during freshman year, resulting in less variation in behavior across students. It is worth noting that this paper also includes the number of times students go to the print room as an indicator of diligence, because students often need to print a lot of instructional materials to review before exams (20 days). Cramér’s V was used for one of the sleep patterns, while the other correlations are Spearman correlation coefficients, and the p-values of all correlations are much smaller than 0.001.
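As an illustration of how such a rank correlation is obtained, the following Python sketch computes the Spearman coefficient as the Pearson correlation of the ranks, with tied values sharing their average rank. The function names and the visit counts and grade ranks are hypothetical and are not taken from the dataset.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for t in range(pos, end + 1):
            ranks[order[t]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if a variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: study-area visits and achievement rank (1 = best).
visits = [120, 95, 60, 30, 10]
grade_rank = [1, 2, 3, 4, 5]
rho = spearman(visits, grade_rank)
```

In this toy example the student with the most visits has the best rank, so the coefficient is strongly negative, the same direction as the relationship reported above.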

3.2.2

Sleep patterns

Literature examining the relationship between sleep patterns and student achievement suggests that students with good sleep habits are likely to achieve good grades. In particular, wake-up time and bedtime have a significant impact on student achievement, and students who stay up late and sleep in are generally less successful.

The dataset collected from the school contains no actual wake-up times or bedtimes. However, students need to use their campus cards to turn on the water, enter and exit the dormitory, or take a shower. Therefore, in this paper, the times of the first and last card swipes each day are used as the wake-up time and bedtime. Because the first and last swipe times vary from day to day and from student to student, this paper does not use overly specific times to represent the sleep pattern. Instead, for each student, the frequencies of the hours in which the first swipe and the last swipe fall are calculated, and the hour with the highest frequency is used to denote that student’s sleep pattern. As a result, wake-up and bedtime patterns take only a few discrete values: wake-up patterns are mainly concentrated at 6:00, 7:00, 8:00, 9:00, and 10:00, while bedtime patterns are mainly concentrated at 22:00, 23:00, 24:00, 1:00, and 2:00. The correlation between students’ sleep patterns and grades is shown in Fig. 2(a) and Fig. 2(b).
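The extraction of the most frequent first-swipe and last-swipe hours can be sketched as follows. The function name, the `day_start_hour` parameter, and the timestamps are hypothetical. Because bedtimes of 1:00 and 2:00 fall after midnight, the sketch assumes that a "day" runs from 04:00 to 04:00; this boundary is an assumption of the example, not something stated in the source.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta


def sleep_pattern(swipes, day_start_hour=4):
    """Return (most frequent first-swipe hour, most frequent last-swipe hour)."""
    by_day = defaultdict(list)
    for ts in swipes:
        # Shift the day boundary so that swipes shortly after midnight
        # are counted with the previous evening.
        day = (ts - timedelta(hours=day_start_hour)).date()
        by_day[day].append(ts)
    wake_hours = Counter(min(day_swipes).hour for day_swipes in by_day.values())
    bed_hours = Counter(max(day_swipes).hour for day_swipes in by_day.values())
    return wake_hours.most_common(1)[0][0], bed_hours.most_common(1)[0][0]


# Hypothetical card swipes of one student over three days.
swipes = [
    datetime(2019, 3, 4, 7, 10), datetime(2019, 3, 4, 12, 0),
    datetime(2019, 3, 4, 23, 30),
    datetime(2019, 3, 5, 7, 45), datetime(2019, 3, 6, 0, 40),
    datetime(2019, 3, 6, 8, 5), datetime(2019, 3, 6, 23, 50),
]
wake_hour, bed_hour = sleep_pattern(swipes)
```

Using the modal hour rather than the exact time keeps the feature robust to day-to-day variation, which matches the motivation given above.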

In Figure 2(a), students who wake up later generally have worse grades. Since the first class usually starts at 8:00 a.m., students who wake up at 6:00-7:00 a.m. or 7:00-8:00 a.m. are considered early risers, and those who often wake up during these hours have the best grades. In Figure 2(b), students who stay up late also have lower grades, which may be due to indulging in recreational activities such as games or novels. This paper therefore includes the sleep pattern features among the individual characteristics as well. Note that the sleep pattern feature is one-hot encoded, so the Spearman correlation coefficient cannot be calculated; instead, Cramér’s V between discretized grades and sleep patterns is used, with a mean of 0.0677.
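For reference, Cramér’s V can be computed from a contingency table of sleep-pattern categories against discretized grade bands, as in the sketch below. The function name and the two toy tables are hypothetical and are not data from this study.

```python
def cramers_v(table):
    """Cramér's V for an r x c contingency table of counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_sums), len(col_sums)) - 1
    return (chi2 / (n * k)) ** 0.5


# Hypothetical tables: rows = sleep-pattern category, columns = grade band.
perfect_association = [[10, 0], [0, 10]]
no_association = [[5, 5], [5, 5]]
v_high = cramers_v(perfect_association)
v_zero = cramers_v(no_association)
```

The statistic ranges from 0 (no association) to 1 (perfect association), so a value such as 0.0677 indicates a weak association.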

3.2.3

Consumer behavior

The students’ consumption behavior data, extracted and aggregated according to the consumption behavior description indices, are shown in Table 2. To characterize consumption behavior more accurately, the average monthly consumption amount is the amount a student spends in the restaurant, the peak monthly consumption amount is the total amount a student spends in the restaurant and the supermarket, and the average monthly consumption frequency is the number of times a student makes purchases in the restaurant.

Table 2.

Student consumption behavior data

Student number	Average monthly consumption/Yuan	Peak monthly consumption/Yuan	Average monthly consumption frequency/times
201901****	739.54	820.24	73.0
201901****	832.28	1093.13	87.3
201901****	428.42	592.01	50.3
201901****	924.14	1093.23	59.1
201901****	380.09	532.08	50.2
201901****	879.24	967.29	78.5
201901****	921.33	987.23	79.2
201901****	350.20	380.92	48.3
201901****	730.29	829.34	78.1
201901****	926.24	988.26	80.5
201901****	539.21	678.29	67.9
201901****	370.13	540.23	73.2
201901****	863.92	930.92	64.1
201901****	572.39	783.93	70.3

Considering the actual situation of the school and its students, the overall average monthly consumption amount is not large, which suggests that students do not spend much in the restaurant and that the school restaurant’s prices are relatively favorable. However, the overall average monthly consumption frequency is not high either, which suggests that students’ eating habits are rather irregular and that they often skip breakfast.

3.3

Processing of clustered data

In this paper, the clustered student data are divided into five categories, comprising four special groups and one reference group: Class I, learning difficulties group; Class II, economic difficulties group; Class III, psychological difficulties group; Class IV, employment difficulties group; Class V (reference group), general group.

To outline the group portraits more easily, the Target Group Index (TGI) is introduced to characterize different students. The TGI measures how strongly a characteristic is expressed within a group: it is the ratio of the proportion of a characteristic in the target group to the proportion of the same characteristic in the whole population, multiplied by 100, and it therefore captures both the strength of the characteristic within a group and the differences between groups. TGI > 100 indicates a tendency above the overall average, and TGI < 100 a tendency below it. Usually, when TGI > 120 the characteristic can be considered more significant in the target group, and when TGI < 80 it can be considered more significant in the overall population. The clustered TGI values for the characteristics of the different student groups are shown in Table 3.

Table 3.

Clustering table of TGI values of student group characteristics

Behavioral label	Learning difficulties group	Economic difficulties group	Psychological difficulties group	Employment difficulties group	General group
Diligence	82.89	84.92	88.03	81.47	94.18
Sleep pattern	159.23	92.11	110.35	137.35	98.46
Consumption behavior	105.32	74.92	101.29	102.51	103.63

For example, the TGI of the economic difficulties group for the “consumption behavior” characteristic is 74.92, which indicates that this group falls significantly below the overall level on this characteristic. Because the TGI expresses group characteristics numerically, a portrait based on it can better reflect the characteristics and growth patterns of each student group.
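The TGI definition and thresholds described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors’ implementation: the function names `tgi` and `interpret_tgi` and the example shares are hypothetical, and the scaling by 100 is assumed from the thresholds of 80, 100, and 120 quoted in the text.

```python
def tgi(target_share: float, overall_share: float) -> float:
    """Target Group Index: share of a characteristic in the target group
    divided by its share in the whole population, scaled so 100 = parity."""
    if overall_share <= 0:
        raise ValueError("overall_share must be positive")
    return target_share / overall_share * 100


def interpret_tgi(value: float) -> str:
    """Label a TGI value using the thresholds quoted in the text."""
    if value > 120:
        return "significantly above overall"
    if value < 80:
        return "significantly below overall"
    return "above overall" if value > 100 else "at or below overall"


# Hypothetical shares, chosen only to illustrate the arithmetic:
# 30% of a target group shows a characteristic versus 40% of all students.
example = tgi(0.30, 0.40)
print(round(example, 2), interpret_tgi(example))

# Value reported in Table 3 for the economic difficulties group.
print(interpret_tgi(74.92))
```

Under these assumptions, the Table 3 value of 74.92 falls into the “significantly below overall” band, which matches the reading given in the text.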

3.4

Characterization of the student population

To further reveal the characteristics and growth patterns of the student body as a whole and of the different groups, and to move from a numerical description of the groups to a graphical portrait, the TGI clustering results are shown in Figure 3. The TGI values show that the three characteristics exhibit both commonalities and differences across the student groups.

Among the three behavioral labels, the TGI values for diligence differ little across the five student groups. Their mean is 86.298, which is below the baseline of 100 but not below 80, indicating that students in all five groups are firm in their pursuit of values and in their cultural confidence. Most of them strive persistently, while a minority choose to “lie flat”.

The mean TGI for sleep pattern across the five groups is 119.5, which is above the baseline, but the groups differ considerably. The TGI values of the learning difficulties group (159.23) and the employment difficulties group (137.35) are well above 120, which is attributed to students’ poor self-discipline and initiative, their entertainment-dominated use of mobile phones, and their weak capacity for self-education and self-management. It is therefore necessary to emphasize role models in order to strengthen the learning atmosphere and cultivate students’ awareness of independent learning.

The mean TGI for consumption behavior across the five groups is 97.534, close to the baseline. Within this, the TGI of the economic difficulties group is 74.92, well below the baseline. This group should be managed with precision, through an aid model that centers on financial assistance and combines it with spiritual education.
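The three averages quoted in this section can be checked directly against Table 3. The sketch below is illustrative only: the names `TABLE_3`, `GROUPS`, and `summarize` are hypothetical, while the values are copied from Table 3 and the 120 and 80 cut-offs are the ones given in the TGI definition.

```python
from statistics import mean

GROUPS = ["learning", "economic", "psychological", "employment", "general"]

# TGI values copied from Table 3 (rows: behavioral labels, columns: groups).
TABLE_3 = {
    "Diligence": [82.89, 84.92, 88.03, 81.47, 94.18],
    "Sleep pattern": [159.23, 92.11, 110.35, 137.35, 98.46],
    "Consumption behavior": [105.32, 74.92, 101.29, 102.51, 103.63],
}


def summarize(table, groups, high=120.0, low=80.0):
    """Return {label: (mean TGI, groups above `high`, groups below `low`)}."""
    summary = {}
    for label, values in table.items():
        above = [g for g, v in zip(groups, values) if v > high]
        below = [g for g, v in zip(groups, values) if v < low]
        summary[label] = (round(mean(values), 3), above, below)
    return summary


for label, (avg, above, below) in summarize(TABLE_3, GROUPS).items():
    print(f"{label}: mean={avg}, above 120={above}, below 80={below}")
```

Only the learning and employment difficulties groups exceed 120 (on sleep pattern), and only the economic difficulties group falls below 80 (on consumption behavior), which is consistent with the discussion above.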

4

Conclusion

To adjust the existing ideological and political education teaching system in higher vocational colleges and universities, this paper extracts and analyzes individual characteristics of students’ ideological and political education using a big data clustering algorithm.

Diligence and grade ranking are inversely related, with an average Spearman’s rank correlation coefficient of -0.381. Sleep pattern and grade ranking are also correlated, with an average Cramér’s V of 0.0677. Consumption behavior differs across student groups.

The TGI values of the five student groups for diligence differ little, with a mean of 86.298, which is below the baseline of 100 but not below 80. The mean TGI for sleep pattern is 119.5, which exceeds the baseline but varies greatly between groups: the values for the learning difficulties group and the employment difficulties group are well above 120. The mean TGI for consumption behavior is 97.534, and the value for the economic difficulties group is 74.92, well below the baseline.

The big data clustering algorithm reveals the characteristics and growth patterns of different student groups intuitively and improves the precision of ideological and political education. Precise ideological and political education in colleges and universities in the new era should not only extract useful information from data, but also build an integrated system covering data extraction, analysis and research, pattern discovery, strategy formulation, effective implementation, evaluation and feedback, and dynamic adjustment, in order to meet the requirement that ideological and political education be “transforming according to the events, advancing according to the times, and being new according to the situation”.

Acknowledgements

This research was supported by the Hubei Province Philosophy and Social Science Research Special Task Project of 2020 (Ideological and Political Theory Course), “Research on Synergistic Training of Innovation and Entrepreneurship Education and Ideological and Political Education in Universities” (20Z059).

Language: English

Publication frequency: 1 issue per year
Journal subjects: Biology, Biology (other), Mathematics, Applied Mathematics, Mathematics (general), Physics, Physics (other)

Ying Zhai

Yong Gan

Published online: 19 March 2025

Submitted: 27 Oct. 2024

Accepted: 18 Feb. 2025

DOI: https://doi.org/10.2478/amns-2025-0531

Keywords: Student behavior, Cluster analysis, Student portrait, Precise thinking, FCM algorithm

© 2025 Ying Zhai, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
