The realization of information technology-based big data analysis in personalized ideological and political education in colleges and universities

Ideological and political education in colleges and universities is an important way to enhance the cohesion and leading force of socialist ideology, a soul-casting channel for cultivating socialist core values, and an important guarantee that colleges and universities fulfil the fundamental task of establishing morality and educating people and cultivate high-quality talents with high motivation, strong initiative and creativity. With the continuous development of information technology, digital technology has been integrated into ideological and political education in colleges and universities in a variety of forms. In particular, in-depth analysis of large-scale data sets makes it possible to trace the trajectory of each educational target in real time, to reveal the ideological status and behavioral dynamics of the whole group in aggregate, and to predict future trends more accurately. At the same time, big data analysis has shown clear practical advantages in optimizing the information environment of ideological and political education in colleges and universities, realizing personalized knowledge transfer, and enhancing its relevance, scientific rigor and effectiveness.

Personalized education advocates that education should respect students, serve students, create favorable conditions for them, and meet their needs and promote their development as far as possible. An important premise of personalized education is understanding students’ needs, and in the era of big data, massive individual data give educators strong support for understanding students’ individual characteristics and needs [1]. On the one hand, by dynamically analyzing these continuously expanding full-sequence data, big data analysis can enhance the universality, continuity and flow of knowledge transfer in ideological and political education in colleges and universities, thus changing its traditional pattern [2-3]. On the other hand, educators rely on big data analysis technology to refine the raw data along dimensions such as volume, velocity and form, and thereby form an initial understanding of how each education object is developing as an individual, so that the independent learning ability, learning emotions and other qualities of the education object can be comprehensively improved [4-5].

Wang, X. et al. designed a personalized recommendation method based on data mining for classifying civic education resources in universities. It addresses the drawback that a K-means clustering recommender system is affected by dynamic changes in user preferences; the recommended resource samples have a high click-through rate and reliability, and the method promotes the sharing of civic education resources [6]. Xu, Y. et al. proposed a personalized learning resource recommendation system for Civics courses, which identifies students’ preferences, goals, skills and interests so that they receive resource information that meets their learning needs at the right time. Experiments found that the system effectively improves students’ Civics scores and is well worth disseminating [7]. Ma, X. showed that promoting informatized classroom reform and the construction of network teaching platforms helps to enhance the overall understanding of ideological and political education in colleges and universities. An informatized carrier for this education can be constructed from the three dimensions of network learning resources, network learning platforms and network learning interactions, which supports students’ personalized learning [8]. Jiang, Y. designed a method that provides accurate knowledge recommendation based on students’ feedback and behavioral data. It uses data profiling technology to extract knowledge tokens related to educational content according to students’ personality traits, and adopts recursive neural networks to recommend knowledge services based on the classification of knowledge tokens by an association rule model [9]. Zhang, Q. et al. explored the advantages of artificial intelligence technologies such as data mining, personalized recommendation algorithms, and intelligent tutoring systems in empowering precise ideological and political education in colleges and universities, which improves the relevance and effectiveness of education by meeting the personalized and diversified learning needs of today’s college students [10]. Li, G. achieved personalized recommendation of Civics course resources by building a user model. The user’s online browsing behavior records contain rich behavioral characteristics and learning styles, and a collaborative recommendation algorithm calculates similarity and nearest neighbors, which provides a well-performing service for students’ Civics course learning [11]. Zhou, D. pointed out that intelligent algorithms can accurately portray the user’s “portrait” and match content to user needs, and that combining intelligent algorithms with ideological and political education can make teaching more precise and influence the teaching of ideological and theoretical courses [12]. Hong, J. explored an innovative teaching mode that integrates cloud computing with ideological and political education in colleges and universities. With its fast processing speed, large data processing capacity and high overall efficiency, cloud computing can provide effective support for personalized ideological and political education, and it is an innovative development path for strengthening this education in colleges and universities [13].

This paper takes the students of one university as the research object and mines data on students’ performance and daily behavior. Histograms and kernel density plots are used to show the pattern of changes in students’ performance. The consumption module data in the one-card system are then clustered, and the Apriori algorithm is used to mine association rules between students’ behavioral habits and performance, so as to establish the intrinsic association between behavioral characteristics and performance in ideology and politics. On this basis, personalized education in ideology and politics can be carried out in colleges and universities, improving students’ grades and raising the level of teaching management. After a semester of applying big data analysis to personalized education in ideology and politics, the experimental group and the control group are compared to verify whether the personalized teaching model improves students’ learning outcomes.

2 Higher education data processing and statistical analysis

2.1 Data mining

Data mining, also known as knowledge discovery in databases (KDD), usually consists of seven phases: data cleaning, data integration, data selection, data transformation, pattern discovery, pattern evaluation, and knowledge representation. Data cleaning, data integration, data selection, and data transformation together make up data preprocessing, and the quality of data mining depends largely on the effectiveness of preprocessing. Pattern discovery is the process of extracting useful patterns from data using data mining algorithms. Pattern evaluation and knowledge representation are the subsequent steps, which identify the truly useful knowledge through metrics and present it to the user using techniques such as visualization. The logistical data and student achievement data used in this study were generated by all students of the School of Civic and Political Science from March 2023 to June 2023. The data include campus one-card data, campus takeout data, library-related data, student Internet access data, and student achievement data. The main purpose is to collect and analyze the daily behavior data of college students, build data applications such as data portraits, visualize the dynamic trends of college students’ thoughts and behaviors, and explore the patterns of their daily ideological and political education.

2.2 Cluster analysis

Cluster analysis is one of the most commonly used data analysis techniques in data mining and belongs to unsupervised learning in machine learning. Clustering has a wide range of applications in fields such as business intelligence, image pattern recognition, Web search, biology, and security, and can also be used as a preprocessing step for other data mining algorithms. Clustering is the process of dividing data objects into clusters, where objects within a cluster are similar to each other and objects in different clusters are dissimilar. In many cases, objects within the same cluster can be treated as a whole.

1) Clustering Criteria

According to the criteria used to divide the data, clustering algorithms are usually categorized into division (partitioning) methods, hierarchical methods, density-based methods, and grid-based methods.

Division methods: this method divides the n original data objects into k clusters (k ≤ n), each containing at least one object. Assuming that C_t (1 ≤ t ≤ k) is a cluster after clustering division and U represents the original data set, then (with m and n used here as cluster indices):

$\bigcup_{t = 1}^{k} C_{t} = U$ (1)

$C_{m} \cap C_{n} = \varnothing, \quad m \neq n, \ 1 \leq m \leq k, \ 1 \leq n \leq k$ (2)

Typical division algorithms are the K-Means and K-Medoids algorithms, both of which use heuristics to progressively approximate the optimal clustering result. Division-based clustering algorithms are suitable for discovering spherical clusters in small to medium-sized databases.

Hierarchical methods: hierarchical methods are categorized into agglomerative hierarchical clustering and divisive hierarchical clustering. Hierarchical clustering methods uncover patterns of data aggregation at different levels.

Density-based methods: these methods are designed to discover non-spherical clusters. The main idea is that a cluster can continue to expand as long as the density of its neighborhood exceeds a given threshold. Usually, density-based clustering algorithms consider only mutually exclusive clusters and ignore fuzzy sets. DBSCAN and DENCLUE are both density-based clustering algorithms.

Grid-based methods: this method divides the data space into a finite number of cells to form a grid structure and then clusters on the grid structure. The main grid-based clustering algorithms include GRIDCLUS and STING.

2) Separability of clusters

Depending on the separability of the clusters, clustering results can be categorized into mutually exclusive clusters and fuzzy clusters. Traditional (hard) clustering assigns each data object strictly to one cluster, which results in distinct boundaries between clusters. Fuzzy clusters are not mutually exclusive: a data object can belong to multiple clusters with different degrees of membership, which gives an uncertain description of the object with respect to the categories. Fuzzy clusters often reflect the objective world better than hard-divided clusters. The fuzzy C-means (FCM) clustering algorithm is a typical fuzzy clustering algorithm, which obtains the final soft division by minimizing an objective function.

The K-Means algorithm is commonly employed in medicine, biology, and text document clustering. It is used to discover the distribution of object clusters and their degree of similarity, and thereby obtain the group characteristics of the objects.

In web user data mining, the K-Means algorithm is used to discover the interest characteristics of web users quickly and effectively. Users can then be clustered to analyze the interest characteristics of each group, which helps to predict user interests and recommend content at a later stage.

Assume that the set of data points is:

$D = \{x_{1}, x_{2}, \dots, x_{n}\}$ (3)

where

$x_{i} = (x_{i1}, x_{i2}, \dots, x_{ir})$ (4)

is an r-dimensional vector in real number space and n denotes the number of data points. The K-Means algorithm is described as follows:

Step (1): k data points are randomly selected from the data set D as the initial cluster centers.

Step (2): for each remaining data point, the distance to every cluster center is calculated and the point is assigned to the cluster represented by the nearest center; the center of each cluster is then recalculated. This process is repeated until the termination condition is satisfied.

The formula for calculating the cluster centers of the K-Means algorithm is:

$m_{j} = \frac{1}{|C_{j}|} \sum_{x_{i} \in C_{j}} x_{i}$ (5)

where C_j denotes the j-th cluster, j = 1,2,…,k; m_j denotes the cluster center of cluster C_j (the mean vector of all data points in the cluster); and |C_j| denotes the number of data points contained in cluster C_j. The distance from data point x_i to cluster center m_j is calculated as:

$dist(x_{i}, m_{j}) = \| x_{i} - m_{j} \| = \sqrt{(x_{i1} - m_{j1})^{2} + (x_{i2} - m_{j2})^{2} + \dots + (x_{ir} - m_{jr})^{2}}$ (6)

The iteration termination condition can be any of the following conditions:

No more data points are reassigned.

The cluster center no longer changes.

The sum of squared errors (SSE) is locally minimized.

The SSE is calculated by the formula:

$SSE = \sum_{j = 1}^{k} \sum_{x \in C_{j}} dist(x, m_{j})^{2}$ (7)

where dist(x,m_j) denotes the distance between data point x and cluster center m_j. The computational complexity of K-Means is O(tkn), where t is the number of loop iterations, k is the number of clusters to be divided, and n is the number of data points.
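The two steps and formulas (5) to (7) can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in the study: the function names, the `max_iter` and `seed` parameters, and the sample points are all assumptions introduced here, and the loop stops when no data point is reassigned.

```python
import math
import random


def dist(x, m):
    """Euclidean distance between two r-dimensional points (Eq. 6)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, m)))


def mean_vector(points):
    """Cluster center as the mean vector of the cluster's points (Eq. 5)."""
    r = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(r))


def kmeans(data, k, max_iter=100, seed=0):
    """Return (centers, labels, sse) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)  # Step (1): random initial centers
    labels = None
    for _ in range(max_iter):
        # Step (2): assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: dist(x, centers[j]))
                      for x in data]
        if new_labels == labels:  # no point was reassigned: stop
            break
        labels = new_labels
        # Recompute each center; an empty cluster keeps its old center.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = mean_vector(members)
    # Sum of squared errors over all clusters (Eq. 7).
    sse = sum(dist(x, centers[lab]) ** 2 for x, lab in zip(data, labels))
    return centers, labels, sse


# Hypothetical two-dimensional points forming two well-separated groups.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels, sse = kmeans(data, k=2)
```

Each pass over the data costs k distance computations per point, which is where the O(tkn) complexity comes from.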

K-Means clustering is characterized by the need to pre-determine the number of clusters k to be divided, and the clustering effect depends to a large extent on the selection of the initial cluster center. Currently, the commonly used method to determine the number of clusters k is to set multiple k values, perform multiple clustering, evaluate the results of multiple clustering, and finally determine the number of clusters based on the evaluation results. There are many ways to select the initial cluster centers, but the one that is easier to understand and accept is the “density method”, which first divides all data points according to a certain radius, and selects the first k points with the highest number of data points falling in the circular area centered on the data points as the initial cluster centers.
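As a rough illustration of the “density method”, the sketch below counts, for every data point, how many points fall within a given radius of it and returns the k points with the highest counts. The function name `density_init`, the tie-breaking by original order, and the sample data are assumptions made for this example; the paper does not specify these details.

```python
import math


def density_init(data, k, radius):
    """Pick the k points with the most neighbors within `radius`."""
    counts = []
    for i, x in enumerate(data):
        # Count points (including x itself) inside the circle around x.
        n_close = sum(1 for y in data if math.dist(x, y) <= radius)
        counts.append((n_close, i))
    # Highest count first; ties are broken by original position.
    counts.sort(key=lambda c: (-c[0], c[1]))
    return [data[i] for _, i in counts[:k]]


# Hypothetical points: a dense group near the origin and a sparse pair.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.5, 0.5),
          (10.0, 10.0), (10.0, 11.0)]
initial_centers = density_init(points, k=2, radius=1.0)
```

In this example both selected centers come from the same dense region, which shows a limitation of the plain rule: a practical variant would usually also require the chosen centers to be some minimum distance apart.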

2.3 Association rules

Association rule mining is an important topic in data mining. Here it is used to mine the deeper relationships in a large amount of teaching data, so as to discover the correlations and regularities between students’ daily behavior data and their performance, as well as the implicit associations within the data. The following definitions are needed to understand association rules:

Definition 1: If the itemset

$I = \{I[1], I[2], \dots, I[k]\}$ (8)

contains k items, then I is a k-itemset.

Definition 2: Support measures the importance of an association rule. The support of the rule X ⇒ Y is the probability that itemset X and itemset Y occur together in a transaction, i.e.:

$\sup(X \Rightarrow Y) = \sup(X \cup Y) = P(X \cup Y) = \frac{\text{Number of transactions containing X and Y}}{\text{Total number of transactions}} \times 100\%$ (9)

In practice, a minimum support min_sup is specified. It is a threshold defined by the user according to their needs, and is used to filter the resulting association rules and remove unimportant or useless ones.

Definition 3: For an itemset I, if its support satisfies sup(I) ≥ min_sup, then I is a frequent itemset.

Definition 4: The conditional probability that the consequent of an association rule occurs is described by the confidence, denoted conf(X ⇒ Y) and expressed as follows:

$conf(X \Rightarrow Y) = \frac{\sup(X \cup Y)}{\sup(X)} = P(Y | X)$ (10)

Confidence expresses the probability that Y occurs given that X occurs, and is generally used to judge the accuracy of a rule. In practice, users specify a minimum confidence min_conf, and association rules whose confidence is below this threshold are removed.

Definition 5: If a rule X ⇒ Y satisfies sup(X ⇒ Y) ≥ min_sup and conf(X ⇒ Y) ≥ min_conf, then X ⇒ Y is a strong association rule. In general, strong association rules are the useful information that the user needs.
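Definitions 2 to 5 translate directly into a few lines of code. The sketch below is only illustrative: the function names, the thresholds, and the five transactions (with made-up behavior labels such as `"breakfast"` and `"good"`) are assumptions and do not come from the study’s data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset` (Eq. 9)."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, x, y):
    """conf(X => Y) = sup(X u Y) / sup(X) (Eq. 10)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


def is_strong(transactions, x, y, min_sup, min_conf):
    """Definition 5: both thresholds must be met."""
    return (support(transactions, set(x) | set(y)) >= min_sup
            and confidence(transactions, x, y) >= min_conf)


# Hypothetical transactions, one set of labels per student.
transactions = [
    {"breakfast", "library", "good"},
    {"breakfast", "good"},
    {"breakfast", "library"},
    {"library", "poor"},
    {"poor"},
]
sup_rule = support(transactions, {"breakfast", "good"})       # 2 of 5
conf_rule = confidence(transactions, {"breakfast"}, {"good"})  # 2 of 3
strong = is_strong(transactions, {"breakfast"}, {"good"},
                   min_sup=0.3, min_conf=0.6)
```

Here the rule breakfast ⇒ good appears in 2 of the 5 transactions, and 2 of the 3 transactions containing breakfast also contain good, so it passes both illustrative thresholds.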

The Apriori algorithm is one of the classic frequent itemset mining algorithms for association rules. It uses a level-wise iterative search strategy: starting from the frequent 1-itemsets, it scans the database repeatedly and uses the frequent k-itemsets to generate the frequent (k+1)-itemsets step by step.

1) Apriori Properties

The Apriori algorithm is a classical association rule mining algorithm for discovering frequent itemsets in a dataset, which it mines in an iterative manner. It uses two main steps: the first finds the frequent itemsets, and the second uses the frequent itemsets to generate association rules. The key to the algorithm is a property of frequent itemsets known as the “Apriori principle”: if an itemset is frequent, then all of its non-empty subsets must also be frequent. Conversely, if an itemset is infrequent, then all of its supersets must also be infrequent. This property makes it possible to prune the candidate itemsets and reduce their number, thus improving the efficiency of the algorithm.

2) Joining and pruning operations

Candidate itemsets are generated by the join operation: the set L_k of all frequent k-itemsets is joined with itself, i.e., the candidate (k+1)-itemsets are generated by L_k ⋈ L_k and recorded as C_{k+1}. Let L_k = {I₁,I₂,I₃,⋯,I_n} denote the n frequent k-itemsets, and write each itemset as I_i = {I_i[1],I_i[2],I_i[3],⋯,I_i[k]}, with the items in I_i arranged in dictionary order.

I_i and I_j can be joined only if (I_i[1] = I_j[1]) ∧ (I_i[2] = I_j[2]) ∧ … ∧ (I_i[k – 1] = I_j[k – 1]) ∧ (I_i[k] < I_j[k]), and the resulting itemset is {I_i[1],I_i[2],I_i[3],⋯,I_i[k – 1],I_i[k],I_j[k]}.

The pruning operation filters the candidate set C_{k+1} generated by the self-join, removing the itemsets whose support is below the minimum support. The Apriori property is applied at the same time: any candidate that has an infrequent subset is discarded before its support is counted, which reduces the amount of database scanning required.
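The join condition and the subset-based part of pruning can be sketched as follows, with itemsets represented as sorted tuples. The function names and the example set of frequent 3-itemsets are assumptions chosen for illustration; the support-based filtering, which needs a scan of the transaction database, is not shown here.

```python
from itertools import combinations


def join(frequent_k):
    """L_k joined with itself: merge two sorted k-itemsets that agree on
    their first k-1 items and whose last items are in increasing order."""
    items = sorted(frequent_k)
    candidates = []
    for a in items:
        for b in items:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidates.append(a + (b[-1],))
    return candidates


def prune(candidates, frequent_k):
    """Drop every candidate that has an infrequent k-subset
    (Apriori property)."""
    frequent = set(frequent_k)
    return [c for c in candidates
            if all(s in frequent for s in combinations(c, len(c) - 1))]


# Hypothetical frequent 3-itemsets over the items a..e.
L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")]
C4_joined = join(L3)
C4_pruned = prune(C4_joined, L3)
```

In this example the join yields two candidates, and the second one is pruned because its subsets ("a", "d", "e") and ("c", "d", "e") are not frequent.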

3) Steps of Apriori algorithm

(1) Scan the transactions in the database and generate the candidate k-itemsets.

(2) Obtain the frequent k-itemsets by keeping the candidates whose support meets min_sup.

(3) Generate the candidate (k+1)-itemsets from the frequent k-itemsets by the join and pruning operations, and repeat the previous steps until no new frequent itemsets are found.

(4) Generate strong association rules from the obtained frequent itemsets using the minimum confidence min_conf.

4) Apriori algorithm implementation

Input: D, the transaction database; min_sup, the minimum support.

Output: L, the frequent itemsets in D.
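A compact sketch matching this input and output, together with step (4), is given below. It follows the textbook form of Apriori and should not be read as the exact procedure used in the study; the function names, the dictionary return type, the thresholds, and the sample transactions are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset (sorted tuple): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        # One database scan: count each candidate and apply min_sup.
        kept = {}
        for c in candidates:
            sup = sum(1 for t in transactions if t.issuperset(c)) / n
            if sup >= min_sup:
                kept[c] = sup
        return kept

    items = sorted({item for t in transactions for item in t})
    level = keep_frequent([(item,) for item in items])  # frequent 1-itemsets
    frequent = dict(level)
    while level:
        prev = sorted(level)
        candidates = []
        for a in prev:
            for b in prev:
                # Join step: same first k-1 items, ordered last items.
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    c = a + (b[-1],)
                    # Prune step: every k-subset must itself be frequent.
                    if all(s in level for s in combinations(c, len(a))):
                        candidates.append(c)
        level = keep_frequent(candidates)
        frequent.update(level)
    return frequent


def strong_rules(frequent, min_conf):
    """Return (X, Y, support, confidence) for rules meeting min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for x in combinations(itemset, r):
                y = tuple(i for i in itemset if i not in x)
                conf = sup / frequent[x]  # subsets of frequent sets are frequent
                if conf >= min_conf:
                    rules.append((x, y, sup, conf))
    return rules


# Hypothetical transactions with made-up behavior labels.
D = [{"breakfast", "library", "good"}, {"breakfast", "good"},
     {"breakfast", "library", "good"}, {"library"}, {"breakfast", "good"}]
L = apriori(D, min_sup=0.5)
rules = strong_rules(L, min_conf=0.8)
```

With these sample transactions the only frequent 2-itemset is ("breakfast", "good"), and both rules derived from it have confidence 1.0.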

3 Implementation path of personalized ideological and political education for university students

To promote the innovation of college students’ daily ideological and political education based on big data, it is necessary not only to follow the laws of the discipline and deepen the theory, but also to adhere to a practice orientation and promote the application of big data, so that theory and practice inform each other. On the basis of established principles for using big data in the daily ideological and political education of college students, the application process is first discussed at the macro level. It comprises five work modules: data collection, data pre-processing and storage, data mining and analysis, data application and visualization, and data interpretation and feedback. At the micro level, program design and data model construction are carried out for typical applications of big data in this field, such as data portraits. Through this exploration, we provide a practical implementation path for using big data in the daily ideological and political education of college students.

4 Practical research on personalized education in civics and politics with big data analysis

4.1 Example of Personalized Education Analysis

4.1.1 Analysis of performance in instructional system design

This paper analyzes student behavior using data collected from students’ ideological and political achievements, students’ daily behavior, and teaching and training programs. Based on the results, counseling and assistance for struggling students are strengthened and teaching and training programs are adjusted to improve teaching efficiency. The university studied is a key university in its province and grants doctoral, master’s, and bachelor’s degrees.

The histogram shows the regularity of data changes and reflects the distribution of the data to be analyzed; it is mainly used to describe the frequency distribution of continuous variables. The kernel density plot is also suited to observing the distribution of continuous variables; it is a nonparametric method for estimating the probability density function of a random variable. Civics performance in the teaching system is shown in Figure 1. The number of students in each score band is not evenly distributed, which creates a distinct achievement gap. The large differences between score bands may be related to the relative ease of the test questions, and the number of students with high scores of 80 to 93 points is large.
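For readers who want to see what the two plots compute, the sketch below builds a histogram by counting scores in fixed-width bands and a kernel density estimate with a Gaussian kernel. The scores, the band width of 10 points, and the bandwidth of 5 are hypothetical values chosen for illustration; the paper does not report the settings it used.

```python
import math


def histogram(scores, low, high, width):
    """Count scores in bins [low, low+width), ...; the last bin includes `high`.
    Assumes every score lies between `low` and `high`."""
    n_bins = int((high - low) / width)
    counts = [0] * n_bins
    for s in scores:
        idx = min(int((s - low) // width), n_bins - 1)
        counts[idx] += 1
    return counts


def gaussian_kde(scores, bandwidth):
    """Return a function f(x): the Gaussian kernel density estimate."""
    n = len(scores)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                          for s in scores)

    return density


# Hypothetical Civics scores, concentrated in the high score band.
scores = [55, 62, 71, 78, 82, 85, 88, 90, 91, 93]
band_counts = histogram(scores, low=50, high=100, width=10)
density = gaussian_kde(scores, bandwidth=5.0)
```

The histogram depends on the chosen band edges, while the density estimate depends on the bandwidth: a larger bandwidth gives a smoother curve and a smaller one follows the individual scores more closely.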

4.1.2 Consumption data in the one-card system

First, according to the analysis requirements, the data of the consumption module in the one-card system are collected and preprocessed using big data technology. For all students of the class of 2023 of a university college, the number of card uses and the spending on food (combining consumption at Peach Garden Dining Hall, Apricot Garden Dining Hall, Guiyuan Dining Hall, Food City, and Commercial Street food and beverage outlets), supermarkets, hot water, and public transportation are selected as the clustering variables for K-means clustering. In addition, the raw consumption data were discretized before the cluster analysis: each variable was divided into five ranges based on quintiles, namely very high (assigned a value of 2), high (1), medium (0), low (-1), and very low (-2). The results of this transformation are shown in Figure 2. Supermarket consumption, hot water consumption, and public transportation consumption have similar discrete value distributions, with the most students at the value -2 and the fewest at the value 2. Food consumption and the number of card uses also have similar discrete distributions, which suggests that there is some connection between them.
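The quintile-based discretization can be sketched as follows. This is one plausible reading of the procedure: the cut points are taken from `statistics.quantiles` with its default method, a value equal to a cut point is placed in the lower range, and the spending values are hypothetical. The paper does not state how the quintile boundaries or ties were handled.

```python
import bisect
import statistics


def quintile_codes(values):
    """Map each value to -2 (very low) .. 2 (very high) by quintile."""
    cuts = statistics.quantiles(values, n=5)  # four cut points
    # The number of cut points below a value selects its range.
    return [bisect.bisect_left(cuts, v) - 2 for v in values]


# Hypothetical hot water spending for ten students.
hot_water = [3, 9, 1, 7, 5, 10, 2, 8, 4, 6]
codes = quintile_codes(hot_water)
```

When many students share the same value, for example zero spending, several cut points coincide and the five ranges no longer hold equal numbers of students, which is consistent with the uneven distributions described for Figure 2.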

In this paper, the R language is used to run several trials, and the number of clusters C is finally set to 4, which gives relatively good clustering results. The distribution of the number of students in the four clustered categories is shown in Figure 3. The second category contains the fewest students and the third category the most.

Based on the clustering results, the four categories are characterized as follows:

Category 1: This category has the highest number of card usage (Usetime), food spending (Sc.exp) and supermarket spending (Sm.exp), while hot water spending (Ib.exp) and public transportation spending (Bus.exp) are at the lower normal level.

Category 2: The most distinctive feature of this category is that its hot water consumption is the highest, while the number of card uses, food consumption, supermarket consumption, and bus consumption are at the lower normal level.

Category 3: The consumption characteristics of this category are not prominent. The number of card uses and the amount of food consumption are slightly higher than those in categories 2 and 4, while supermarket, hot water and public transportation consumption are all at a lower level.

Category 4: This group has the lowest food consumption and the lowest number of card uses among the four categories, but the highest public transportation consumption. Consumption at on-campus supermarkets and convenience stores is at the upper-normal level, and hot water consumption is relatively low. The correlation between this group and academic performance suggests that its members are generally underperforming and that the school should pay more attention to them.

4.1.3 Example of Analyzing Student Achievement and One-Card Spending Data

After clustering the student one-card consumption data in the previous section, we further analyzed the data with distinctive consumption characteristics and obtained the results shown in Table 1. The consequents of the frequent itemsets are academic performance variables, and attributes such as eating breakfast, taking examinations, and the number of books borrowed are strongly associated with good or poor academic performance. The degree of enhancement (lift) of the 10 itemsets is in the range of 1.1225~1.4026, indicating a good degree of association. Whether students eat breakfast is strongly associated with their performance: the more often students eat breakfast, the higher the probability of better performance. Conversely, students who ate breakfast less often also borrowed books less often. Students who did not receive scholarships were also more likely to have poorer grades.

Table 1. Correlation analysis of student achievement and one-card consumption data

ID	Support	Confidence	Lift
1	0.068956	0.475282	1.1524
2	0.065254	0.522367	1.2935
3	0.056204	0.480214	1.1920
4	0.179532	0.452312	1.2014
5	0.175268	0.475268	1.2536
6	0.156231	0.462187	1.2125
7	0.134029	0.433257	1.1653
8	0.110022	0.241587	1.1235
9	0.110022	0.312240	1.4026
10	0.112357	0.274295	1.2548
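Assuming the last column of Table 1 is the standard lift, lift(X ⇒ Y) = conf(X ⇒ Y) / sup(Y) = sup(X ∪ Y) / (sup(X) · sup(Y)), the three reported values of each rule determine the supports of its antecedent and consequent. The sketch below works this out for rule 1; the function names are introduced here for illustration and the derived supports are implied values, not figures reported in the paper.

```python
def lift(sup_xy, sup_x, sup_y):
    """lift(X => Y) = sup(X u Y) / (sup(X) * sup(Y))."""
    return sup_xy / (sup_x * sup_y)


def implied_supports(sup_xy, conf, lift_value):
    """Recover sup(X) and sup(Y) from support, confidence and lift."""
    sup_x = sup_xy / conf      # from conf = sup(X u Y) / sup(X)
    sup_y = conf / lift_value  # from lift = conf / sup(Y)
    return sup_x, sup_y


# Rule 1 of Table 1: support, confidence, lift.
sup_x, sup_y = implied_supports(0.068956, 0.475282, 1.1524)
check = lift(0.068956, sup_x, sup_y)
```

A lift above 1 means that the antecedent and the consequent occur together more often than they would if they were independent, which is why all ten rules in the table count as positively associated.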

4.2 Analysis of the effect of the application of personalized education in civic politics

For the analysis of the teaching effect, this study adopts a quasi-experimental method to investigate how personalized teaching supported by big data improves students’ academic performance. In addition, interviews and questionnaires were designed to capture teachers’ and students’ experience of data-supported teaching, their acceptance of personalized teaching, and its perceived effectiveness. Starting from the feasibility of the experiment, this study chose to carry out the teaching practice of technology-supported personalized teaching in the field of information technology.

During and after the teaching practice, experimental data were collected and analyzed statistically. The data fall into three parts. First, the stage-test and final-test Civics scores of the experimental group and the control group were recorded. Second, the learning experiences of the students in the two groups were investigated using questionnaires. Third, ten students in the experimental group were randomly selected and interviewed about their learning experience.

In this study, a personalized teaching experiment was conducted for one semester at a key university in a province. During the course, two classes were tested in stages. A total of six tests were conducted, all online, to examine students’ mastery of knowledge related to Civics and Politics; the type, number and difficulty of the test questions were similar across tests. The Civics scores of the control class and the experimental class on the stage tests are shown in Figure 4. The control class scored somewhat better in Test 1, probably because it had a better foundation in information technology. Although both classes improved overall, the experimental class showed a greater trend of improvement and outperformed the control class on the last three tests. In Test 6, the experimental class scored 3.86 points more than the control class.

The change in the distribution of scores from Test 1 to Test 6 for the control and experimental classes is shown in Figure 5. The excellence rate is the number of people with scores of 85 and above as a percentage of the total number of students in the class, the good rate is the number of people with scores between 75 and 84 (inclusive of 75 and 84) as a percentage of the total number of students in the class, the pass rate is the number of people with scores between 60 and 74 (inclusive of 60 and 74) as a percentage of the total number of students in the class, and the fail rate is the number of people with scores lower than 60 as a percentage of the total number of students in the class. As can be seen from the graph, the excellence rate of both classes has increased, with the control class increasing from 44.7% to 64.7%, but in contrast, the experimental class has made more significant progress, with the excellence rate increasing from 52.5% to 94.6%. It can be seen that students at different levels have improved on their original foundation, especially those with a poorer foundation who have been able to make continuous progress and break through their learning bottlenecks, a change that requires a lot of effort. To a certain extent, it also shows that personalized teaching can stimulate students’ motivation to learn, and can meet the learning styles of the majority of students, making them all able to make progress in their learning, and at the same time promoting the improvement of the teaching level.
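The four rates defined above can be computed with a short function such as the one below. The thresholds follow the definitions in the text, while the function name and the sample scores are hypothetical; the sketch assumes whole-number scores, so a score such as 84.5, which the stated bands do not cover, would be counted as good.

```python
def grade_rates(scores):
    """Return the share of students in each score band."""
    counts = {"excellent": 0, "good": 0, "pass": 0, "fail": 0}
    for s in scores:
        if s >= 85:
            counts["excellent"] += 1
        elif s >= 75:
            counts["good"] += 1
        elif s >= 60:
            counts["pass"] += 1
        else:
            counts["fail"] += 1
    return {band: c / len(scores) for band, c in counts.items()}


# Hypothetical scores for a class of eight students.
rates = grade_rates([92, 85, 84, 75, 74, 60, 59, 88])
```

For these eight scores the function gives an excellence rate of 37.5%, a good rate of 25%, a pass rate of 25% and a fail rate of 12.5%.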

The results of the one-way ANOVA are shown in Table 2. The mean score of the experimental group rose from 70.14 on Test 1 to 92.23 on Test 6, and the corresponding p-value is less than 0.01. By statistical convention, a p-value below 0.05 indicates a significant difference and a p-value below 0.01 indicates a highly significant difference. The experimental group’s performance in Civics therefore improved significantly from the beginning to the end of the semester, which suggests that personalized teaching with big data analytics can help promote students’ learning in Civics. In addition, the experimental class scored higher than the control class, with a significance level of P = 0.004, which is less than 0.01. The mean scores thus differ significantly, i.e., the students in the experimental class scored significantly higher than those in the control class in Civics and Politics in the final exam.

Table 2.

Results of the one-way analysis of variance

	Average score	Standard deviation	Significance level
Test 1	70.14	13.16	P<0.01
Test 6	92.23	3.22
B group (control class)	88.36	6.42	P=0.004
A group (experimental class)	92.23	3.22
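To make the comparison in Table 2 concrete, the following is a minimal sketch of how a one-way ANOVA F statistic is computed from two or more groups of scores. It is not the authors' procedure: the function name `one_way_anova_f` and the sample scores are assumptions for illustration, and the scores are invented rather than taken from the study.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical final-test scores (not data from the study).
experimental = [90, 94, 92, 95, 89]
control = [85, 88, 91, 84, 87]

f_stat, df_b, df_w = one_way_anova_f(experimental, control)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

The p-value is the upper-tail probability of the F distribution with these degrees of freedom. If SciPy is available, `scipy.stats.f_oneway(experimental, control)` returns both the F statistic and the p-value directly.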

5

Conclusion

This paper mines and analyzes data from the teaching systems of colleges and universities. It uses the K-means algorithm and an improved Apriori algorithm to analyze students’ consumption habits and academic performance, and mines the connection between students’ performance and their behavioral habits, which provides a useful reference for the personalized training of students. The results show that the number of students differs considerably across score bands, with most students concentrated in the high score band. The consumption data of students were clustered into 4 categories; students in the 4th group generally deviated in academic performance and should receive extra attention from schools and teachers. Eating breakfast, attending exams, and the number of books borrowed are more strongly associated with academic performance and can influence students’ results. The teaching practice shows that personalized teaching based on big data analysis can help promote students’ learning of Civics.
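As a reminder of what an association between behavior and performance means in Apriori-style mining, the sketch below computes the support and confidence of a single rule. It illustrates only these two standard measures, not the improved Apriori algorithm used in the paper; the helper names, item labels and records are all hypothetical.

```python
def support(records, itemset):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for r in records if itemset <= r)
    return hits / len(records)

def confidence(records, antecedent, consequent):
    """Support of the whole rule divided by support of the antecedent.

    Assumes the antecedent occurs in at least one record.
    """
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)

# Hypothetical student records (not data from the study).
records = [
    {"breakfast", "borrows_books", "good_grades"},
    {"breakfast", "good_grades"},
    {"breakfast", "borrows_books", "good_grades"},
    {"borrows_books"},
    {"breakfast"},
]

rule_support = support(records, {"breakfast", "good_grades"})
rule_confidence = confidence(records, {"breakfast"}, {"good_grades"})
print(rule_support, rule_confidence)
```

A rule such as "breakfast implies good grades" is kept only when both values exceed thresholds chosen by the analyst.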

Funding:

Shaanxi Provincial “14th Five-Year Plan” Education Science Plan 2022 annual project: Shaanxi Provincial private universities “Internet + Party building” theory and practice research (project approval number: SGH22Y1736); 2024 Ideological and Political Work Research Fund of Xi’an Mingde Institute of Technology: Research on the Construction and Optimization of the Paradigm of Online Ideological and Political Education in Universities under the Background of Big Data (project number: SZ2024Y01).

Wenju He

Yongheng Wang

Published Online: Mar 21, 2025

Received: Nov 06, 2024

Accepted: Feb 04, 2025

DOI: https://doi.org/10.2478/amns-2025-0659

Keywords: Data mining, Personalized education, K-means clustering, Apriori association rule

© 2025 Wenju He, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
