The realization of information technology-based big data analysis in personalized ideological and political education in colleges and universities
Published Online: Mar 21, 2025
Received: Nov 06, 2024
Accepted: Feb 04, 2025
DOI: https://doi.org/10.2478/amns-2025-0659
© 2025 Wenju He, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Ideological and political education in colleges and universities is an important way to enhance the cohesion and leading force of socialist ideology, a soul-casting channel for cultivating socialist core values, and an important guarantee that colleges and universities fulfil the fundamental task of establishing morality and educating people and cultivate high-quality talents with high motivation, strong initiative and creativity. With the continuous development of information technology, digital technology has been integrated into ideological and political education in colleges and universities in a variety of forms. In particular, in-depth analysis of large-scale data sets makes it possible to trace the trajectory of each educational target in real time, to reveal the ideological status and behavioral dynamics of the whole group in aggregate, and to predict future trends. At the same time, big data analysis has shown clear practical advantages in optimizing the information environment of ideological and political education in colleges and universities, realizing personalized knowledge transfer, and enhancing its relevance, scientific rigor and effectiveness.
Personalized education advocates that education should respect students, serve students, create favorable conditions for students, and meet students' needs and promote their development as far as possible. The important premise of realizing personalized education is to understand students' needs, and in the era of big data, massive individual data provide strong support for educators to understand students' individual characteristics and needs [1]. On the one hand, big data analysis can enhance the universality, continuity and flow of knowledge transfer in ideological and political education in colleges and universities by dynamically analyzing these continuously expanding whole-sequence data, thus changing the traditional pattern of ideological and political education in colleges and universities [2-3]. On the other hand, educators can rely on big data analysis technology to refine the raw data along dimensions such as volume, velocity and form, and thereby form an initial understanding of each student's individual development trend, so that students' independent learning ability, learning emotions and other qualities can be comprehensively improved [4-5].
Wang, X. et al. designed a personalized recommendation method based on data mining for civic education resource classification in universities, which addresses the drawback that recommender systems based on the K-means clustering algorithm are affected by dynamic changes in user preference; the recommended resource samples have a high click-through rate and reliability, and the method promotes the sharing of civic education resources [6]. Xu, Y. et al. proposed a personalized learning resource recommendation system for Civics teaching courses, which ensures that students receive resource information that meets their learning needs at the right time by identifying their preferences, goals, skills, and interests. Experiments found that the system effectively improves students' Civics scores, so it is well worth disseminating [7]. Ma, X. showed that promoting informatization classroom reform and network teaching platform construction is conducive to enhancing the overall understanding of ideological and political education in colleges and universities. The informatization carrier of ideological and political education can be constructed from the three dimensions of network learning resources, network learning platforms and network learning interactions, helping students' personalized learning [8]. Jiang, Y. designed a new method to provide accurate knowledge recommendation services based on students' feedback information and behavioral data, using data profiling technology to extract knowledge tokens related to educational content based on students' personality traits, and adopting recursive neural networks to achieve accurate recommendation of knowledge services based on the classification of knowledge tokens of an association rule model [9]. Zhang, Q. et al. explored the advantages of artificial intelligence technologies such as data mining, personalized recommendation algorithms, and intelligent tutoring systems in empowering precision ideological and political education in colleges and universities, which improve the relevance and effectiveness of education by meeting the personalized and diversified learning needs of today's college students [10]. Li, G. achieved personalized recommendation of Civics course resources by building a user model: the user's online browsing behavior record contains rich behavioral characteristics and learning styles, and a collaborative recommendation algorithm is used to calculate similarity and nearest neighbors, which can provide well-performing services for students' Civics course learning [11]. Zhou, D. pointed out that intelligent algorithms can accurately portray the user's "portrait" and realize the accurate distribution of user needs, and that combining intelligent algorithms with ideological and political education can improve the precision of teaching and affect the teaching of ideological and theoretical courses [12]. Hong, J. explored an innovative teaching mode that integrates cloud computing with ideological and political education in colleges and universities. With its fast processing speed, large data processing capacity, and high overall efficiency, cloud computing technology can provide effective support for personalized ideological and political education, and it is an innovative development path for strengthening ideological and political education in colleges and universities [13].
This paper takes the students of one university as the research object and mines data on students' performance and daily behavior. Histograms and kernel density plots are used to show how students' performance is distributed. The consumption module data in the one-card system are then clustered, and the Apriori algorithm is used to mine association rules between students' behavioral habits and performance, establishing the intrinsic association between behavioral characteristics and performance in ideology and politics. On this basis, personalized education in ideology and politics can be implemented in colleges and universities, improving students' grades and the level of teaching management. After a semester of applying big data analysis to personalized ideological and political education, the experimental group and the control group are compared to verify whether the personalized teaching model improves students' learning outcomes.
Data mining, also known as knowledge discovery in databases (KDD), usually consists of seven phases: data cleaning, data integration, data selection, data transformation, pattern discovery, pattern evaluation, and knowledge representation. Data cleaning, data integration, data selection, and data transformation are all processes of data preprocessing, and the quality of data mining is largely dependent on the effectiveness of preprocessing. Pattern discovery is the process of extracting useful patterns from data using data mining algorithms. Pattern evaluation and knowledge representation are the subsequent processing steps to identify really useful knowledge through metrics and present it to the user using techniques such as visualization. The logistical data and student achievement data used in this study are from the data generated by all the students of the School of Civic and Political Science from March 2023 to June 2023. The data include campus one-card data, campus takeout data, library-related data, student Internet access data, and student achievement data. The main purpose is to collect and analyze the daily behavior data of college students, carry out data application such as data portrait, visualize the dynamic trend of college students’ thoughts and behaviors, and explore the law of daily ideological and political education of college students.
Cluster analysis is one of the most commonly used data analysis techniques for data mining and is related to unsupervised learning for machine learning. Clustering has a wide range of applications in many fields such as business intelligence, image pattern recognition, Web search, biology, and security, and can also be used as a preprocessing step for other data mining algorithms. Clustering is the process of dividing data objects into clusters, where objects within clusters are similar to each other and objects in different clusters are different from each other. In many cases, objects within the same cluster can be treated as a whole.
According to different criteria for clustering division, clustering algorithms are usually categorized into division methods, hierarchical methods, density-based methods, and grid-based methods.
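Division methods: this method divides a data set of $n$ objects into $k$ partitions ($k \le n$), with each partition representing one cluster.
Typical division algorithms are K-Means and K-Medoids.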
Hierarchical methods: hierarchical methods are categorized into cohesive (agglomerative) hierarchical clustering and split (divisive) hierarchical clustering. Hierarchical clustering methods uncover patterns of data aggregation at different levels.
Density-based methods: density-based clustering methods are designed to discover non-spherical clusters. The main idea is that a cluster can continue to expand as long as the density exceeds a given threshold. Usually, density-based clustering algorithms consider only mutually exclusive clusters and ignore fuzzy sets. DBSCAN and DENCLUE are both density-based clustering algorithms.
Grid-based methods: this method divides the data space into a finite number of cells to form a grid structure and then clusters on the grid structure. The main grid-based clustering algorithms are GRIDCLUS, STING, etc.
The clustering results can be categorized into mutually exclusive clusters and fuzzy clusters depending on the separability of the clusters. The traditional clustering division is characterized by strict classification of data objects, which results in distinct categorical boundaries between clusters. Fuzzy clusters are not mutually exclusive, and data objects can have different degrees of affiliation to multiple clusters, which establishes an uncertain description of data objects with respect to categories. Fuzzy clusters reflect the objective world better than hard-divided clusters.
The K-Means algorithm is commonly employed in medicine, biology, and text document clustering. This clustering algorithm is used to discover the distribution of object clusters and the degree of similarity to obtain group characteristics of the objects.
The K-Means algorithm is also used in web user data mining to discover the interest characteristics of web users quickly and effectively. Users can then be clustered to analyze the interests of each group, which helps to predict user interests and recommend content at a later stage.
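Assuming that the set of data points is:
$$X = \{x_1, x_2, \ldots, x_n\}$$
Among them: $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ is a $d$-dimensional vector, and $k$ is the number of clusters to be generated.
Step (1), $k$ data points are randomly selected as the initial cluster centers.
Step (2), each data point outside the cluster centers calculates the distance to each cluster center separately, the data points are divided into the clusters represented by the nearest cluster centers, and then the cluster center of each cluster is recalculated. This process is iterated until the termination condition is satisfied.
The formula for calculating the cluster centers of the K-Means algorithm is:
$$m_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$$
where $C_j$ is the $j$-th cluster, $|C_j|$ is the number of data points in it, and $m_j$ is its center.
The iteration termination condition can be any of the following conditions:
No more data points are reassigned.
The cluster center no longer changes.
The sum of squared errors (SSE) is locally minimized.
The SSE is calculated by the formula:
$$SSE = \sum_{j=1}^{k} \sum_{x_i \in C_j} \mathrm{dist}(x_i, m_j)^2$$
where $\mathrm{dist}(x_i, m_j)$ is the distance between data point $x_i$ and the center $m_j$ of the cluster to which it belongs.
K-Means clustering is characterized by the need to pre-determine the number of clusters $k$.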
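The two steps and the termination test can be illustrated with a minimal sketch in Python. It is not the implementation used in this study (the clustering here was run in R); the function names `dist`, `mean_point` and `k_means`, the Euclidean distance, and the toy points are all assumptions made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def mean_point(points):
    """Cluster center: the coordinate-wise mean of the points in one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Return (centers, labels, sse) for a list of numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # Step (1): k random initial centers
    labels = None
    for _ in range(max_iter):
        # Step (2a): assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: dist(x, centers[j]))
                      for x in points]
        if new_labels == labels:  # termination: no point was reassigned
            break
        labels = new_labels
        # Step (2b): recompute each center; an empty cluster keeps its old one.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_point(members)
    # Sum of squared distances from each point to its own cluster center.
    sse = sum(dist(x, centers[lab]) ** 2 for x, lab in zip(points, labels))
    return centers, labels, sse


# Hypothetical toy data: two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_centers, toy_labels, toy_sse = k_means(toy_points, k=2)
```

Because the initial centers are random, different seeds can give different local minima of the SSE on real data, which is one reason the number of clusters is usually chosen after several trial runs.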
Association rule mining is an important topic in data mining. Here it is used to mine the deep relationships in a large amount of teaching data, so as to discover the correlations and regularities between students' daily behavior data and their performance, as well as the correlation information implicit in the data. The following definitions are needed to understand association rules:
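Definition 1: In the item set:
$$I = \{i_1, i_2, \ldots, i_m\}$$
The number of contained items is $m$. A subset of $I$ that contains $k$ items is called a $k$-itemset, and each transaction $T$ in the database $D$ is a subset of $I$.
Definition 2: The importance of an association rule $X \Rightarrow Y$ is measured by its support, the probability of simultaneous occurrence of itemsets $X$ and $Y$ in $D$:
$$support(X \Rightarrow Y) = P(X \cup Y) = \frac{|\{T \in D : X \cup Y \subseteq T\}|}{|D|}$$
In practice a minimum support, min_sup, is set. It is a threshold defined by the user according to their own needs and is used to filter the resulting association rules and remove unimportant or useless ones.
Definition 3: For an itemset $X$, if $support(X) \ge$ min_sup, then $X$ is called a frequent itemset.
Definition 4: The conditional probability of an association rule outcome occurring is usually described by the confidence level, denoted as:
$$confidence(X \Rightarrow Y) = P(Y \mid X) = \frac{support(X \cup Y)}{support(X)}$$
Confidence is used to express the specific probability of $Y$ occurring in the transactions in which $X$ occurs.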
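Definition 5: Set ($X \Rightarrow Y$) as an association rule. If its support is not less than min_sup and its confidence is not less than the minimum confidence min_conf, it is called a strong association rule.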
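The measures in these definitions can be computed directly from a list of transactions. The sketch below is illustrative only: the function names and the toy transactions are hypothetical, and `lift` is included on the assumption that it is the standard measure, confidence divided by the support of the consequent.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(X and Y together) / support(X) for the rule X => Y."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """confidence(X => Y) / support(Y); values above 1 mean positive association."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))


# Hypothetical transactions, one set of attributes per student.
toy_transactions = [
    {"breakfast", "good_grade"},
    {"breakfast", "good_grade"},
    {"breakfast"},
    {"good_grade"},
    {"library"},
]
rule_support = support(toy_transactions, {"breakfast", "good_grade"})
rule_confidence = confidence(toy_transactions, {"breakfast"}, {"good_grade"})
rule_lift = lift(toy_transactions, {"breakfast"}, {"good_grade"})
```

A rule would count as strong in this toy setting only if `rule_support` and `rule_confidence` both reach the thresholds chosen by the analyst.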
The Apriori algorithm is a classic association rule mining algorithm for discovering frequent itemsets in a dataset. It uses a level-wise iterative search strategy: the frequent itemsets found in one pass over the database are used to generate the candidates for the next pass, which are one item longer. The algorithm consists of two main steps: the first step is to find the frequent itemsets, and the second step is to use the frequent itemsets to generate association rules. The key to the algorithm is the "Apriori principle": if an itemset is frequent, then all of its non-empty subsets must also be frequent. Conversely, if an itemset is infrequent, then all the supersets that contain it must also be infrequent. This property allows candidate itemsets to be pruned, which reduces their number and improves the efficiency of the algorithm.
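Candidate itemsets are usually generated by the join operation, usually by joining the set $L_{k-1}$ of frequent $(k-1)$-itemsets with itself to produce the candidate set $C_k$ of $k$-itemsets.
In the case of ($k-1$)-itemsets $l_1$ and $l_2$ in $L_{k-1}$, the two can be joined when their first $k-2$ items are identical.
The pruning operation sifts through the set of itemsets $C_k$ generated after the self-join: any candidate that has a $(k-1)$-subset not contained in $L_{k-1}$ cannot be frequent and is removed.
(1) Scan the transactions in the database and generate the candidate 1-itemsets $C_1$.
(2) The frequent 1-itemsets $L_1$ are obtained by keeping the candidates in $C_1$ whose support is not less than min_sup.
(3) By joining and pruning operations, the candidate $k$-itemsets $C_k$ are generated from $L_{k-1}$, and the database is scanned to obtain the frequent $k$-itemsets $L_k$. This step is repeated until no new frequent itemsets are found.
(4) Strong association rules are generated for the obtained frequent itemsets using minimum confidence min_conf.
Input: transaction database $D$; minimum support min_sup.
Output: Frequent itemsets in $D$.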
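The level-wise search described above can be sketched in a few lines of Python. This is a simplified illustration rather than the improved Apriori variant used in this study: the join step simply unions pairs of frequent itemsets instead of matching sorted prefixes, and the names `apriori`, `strong_rules` and the toy transactions are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # One database scan: count each candidate and keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_sup}

    # Steps (1)-(2): candidate 1-itemsets filtered into frequent 1-itemsets.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: union pairs of frequent (k-1)-itemsets that give k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        level = frequent(candidates)  # Step (3), repeated until level is empty
        result.update(level)
        k += 1
    return result


def strong_rules(freq, min_conf):
    """Step (4): rules X => Y whose confidence is at least min_conf."""
    rules = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for x in combinations(itemset, r):
                x = frozenset(x)
                conf = sup / freq[x]  # x is frequent by the Apriori property
                if conf >= min_conf:
                    rules.append((x, itemset - x, sup, conf))
    return rules


# Hypothetical transactions over three items A, B and C.
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
toy_frequent = apriori(toy, min_sup=0.6)
toy_rules = strong_rules(toy_frequent, min_conf=0.7)
```

In this toy run the three single items and the three pairs are frequent, while the triple {A, B, C} appears in only two of five transactions and is dropped.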
To promote the innovation of daily ideological and political education of college students based on big data, it is not only necessary to follow the disciplinary law and focus on the deepening of theories, but also adhere to the practice orientation, and endeavor to promote the application of big data to facilitate the two-way transformation of theories and practices. On the basis of establishing the principle of big data to promote daily ideological and political education of college students, on the one hand, the application process of big data in daily ideological and political education of college students is discussed at the macro level, including the five major work module processes of data collection, data pre-processing and storage, data mining and analysis, data application and visualization, and data interpretation and feedback of big data in daily ideological and political education of college students. On the other hand, the program design and data model construction of the application of big data in the typical field of college students’ daily ideological and political education, such as data portrait, are carried out at the micro level. Through exploration, we provide a practical implementation path for the utilization of big data in daily ideological and political education of college students.
This paper focuses on the analysis of students’ data behavior by collecting students’ data from students’ ideological and political achievements, students’ daily behavioral data, and teaching and training programs, strengthening counseling and assistance for problematic students, and rationally adjusting teaching and training programs according to the results of the analysis to improve the efficiency of teaching. A university is a key university in a province, and it offers three degree-granting qualifications: doctoral, master’s, and bachelor’s degrees.
The histogram shows the regularity of data changes and reflects the distribution of the data to be analyzed; it is mainly used to describe the frequency distribution of continuous variables. The kernel density plot is also suited to observing the distribution of continuous variables, and it is a nonparametric method for estimating the probability density function of a random variable. With these tools in hand, the comparative analysis can proceed. Civics performance in the teaching system is shown in Figure 1. The number of students in each score band is not evenly distributed, which creates a distinct achievement gap. The large differences between score bands may also be related to the relative ease of the test questions, since many students scored in the high band of 80 to 93 points.

Figure 1. Teaching system design results
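For readers who want to reproduce this kind of plot, the following sketch shows how the two summaries are computed. It assumes a Gaussian kernel with a hand-picked bandwidth; the scores are hypothetical and are not the data behind Figure 1.

```python
import math


def histogram(values, lo, hi, bins):
    """Count values per equal-width bin on [lo, hi]; hi falls in the last bin."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def gaussian_kde(values, bandwidth):
    """Return a function f(x): the Gaussian kernel density estimate."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2)
                          for v in values)

    return density


# Hypothetical scores, for illustration only.
toy_scores = [55, 62, 68, 74, 81, 83, 85, 88, 90, 93]
toy_counts = histogram(toy_scores, lo=50, hi=100, bins=5)
toy_density = gaussian_kde(toy_scores, bandwidth=5.0)
```

The histogram depends on where the bin edges fall, while the density estimate depends on the bandwidth, so the two views are usually read together.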
Firstly, according to the analysis requirements, the data of the consumption module in the one-card system are collected and preprocessed with big data technology. The number of card uses, food consumption (combining consumption at Peach Garden Dining Hall, Apricot Garden Dining Hall, Guiyuan Dining Hall, Food City, and Commercial Street food and beverage outlets), supermarket consumption, hot water consumption, and public transportation consumption of all the students of the class of 2023 of one college of a university are selected as the clustering variables, giving the data to be used for K-means clustering. In addition, the raw consumption data were discretized before the cluster analysis: each feature was divided into five ranges based on quintiles, namely very high (assigned a value of 2), high (1), medium (0), low (-1), and very low (-2). The transformed data are shown in Figure 2. Supermarket consumption, hot water consumption, and public transportation consumption have similar discrete value distributions, with the most students at the value -2 and the fewest at the value 2. Food consumption and the number of card uses also have similar discrete distributions, which suggests that the two are connected.

Figure 2. Discretized consumption data from the one-card system
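The quintile-based discretization can be sketched as follows. The exact cut-point rule used in this study is not stated, so the nearest-rank rule below is an assumption, as are the function names and the toy values.

```python
import bisect


def quintile_cutpoints(values):
    """Nearest-rank cut points at the 20%, 40%, 60% and 80% positions."""
    ordered = sorted(values)
    n = len(ordered)
    # (p * n + 4) // 5 is the integer ceiling of p * n / 5.
    return [ordered[(p * n + 4) // 5 - 1] for p in (1, 2, 3, 4)]


def discretize(values):
    """Map each value to -2 (very low) .. 2 (very high) by quintile range."""
    cuts = quintile_cutpoints(values)
    # bisect_left counts the cut points strictly below the value (0..4).
    return [bisect.bisect_left(cuts, v) - 2 for v in values]


# Hypothetical spending amounts with many zeros, as for an optional service.
toy_spending = [0, 0, 0, 0, 0, 0, 5, 8, 9, 20]
toy_levels = discretize(toy_spending)
```

With tied values such as the zeros in `toy_spending`, several cut points coincide and most records fall into the -2 level, which is one way a skewed distribution of discrete values can arise.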
In this paper, the R language is used to run several tests, and the number of clusters (the C value) is finally set to 4, which gives relatively good clustering results. The distribution of the number of students in the four clustered categories is shown in Figure 3. It can be seen that the number of students in the second category is the smallest, and the number of students in the third category is the largest.
From the clustering results situation, the four different categories are characterized as follows:
Category 1: This category has the highest number of card usage (Usetime), food spending (Sc.exp) and supermarket spending (Sm.exp), while hot water spending (Ib.exp) and public transportation spending (Bus.exp) are at the lower normal level.
Category 2: The most significant difference between this category and the other categories is characterized by the fact that hot water consumption is the highest, while the number of card uses, food consumption, supermarket consumption, and bus consumption are at the lower normal consumption level.
Category 3: The consumption characteristics of this category are not prominent. The difference that can be seen is that the number of card uses and the amount of food consumption are slightly higher than those in categories 2 and 4, while supermarket consumption, hot water consumption and public transportation consumption are all at a lower level.
Category 4: This group has the lowest food consumption and the fewest card uses among the four categories but the highest public transportation consumption. Its consumption at on-campus supermarkets and convenience stores is at the upper-normal level, and its hot water consumption is relatively low. Correlating this group with academic performance shows that it is generally underperforming, so the school should pay more attention to these students.

Figure 3. Distribution of students across the clusters
After clustering the student one-card consumption data in the previous section, the data with distinctive consumption characteristics were analyzed further with association rule mining, and the results are shown in Table 1. The consequents of the rules are the academic performance variables, and attributes such as eating breakfast, participating in examinations, and the number of books borrowed form the itemsets most strongly associated with good or bad academic performance. The lift of the 10 itemsets is in the range of 1.1225~1.4026; since every value is greater than 1, the associations are positive. Whether students eat breakfast is strongly associated with whether their performance is good or bad: the more often students eat breakfast, the higher the probability of better performance. Conversely, students who ate breakfast less often also borrowed books less often, and students who did not receive scholarships were more likely to have poorer grades.
Table 1. Association analysis of student achievement and one-card consumption data
| ID | Support | Confidence | Lift |
|---|---|---|---|
| 1 | 0.068956 | 0.475282 | 1.1524 |
| 2 | 0.065254 | 0.522367 | 1.2935 |
| 3 | 0.056204 | 0.480214 | 1.1920 |
| 4 | 0.179532 | 0.452312 | 1.2014 |
| 5 | 0.175268 | 0.475268 | 1.2536 |
| 6 | 0.156231 | 0.462187 | 1.2125 |
| 7 | 0.134029 | 0.433257 | 1.1653 |
| 8 | 0.110022 | 0.241587 | 1.1235 |
| 9 | 0.110022 | 0.312240 | 1.4026 |
| 10 | 0.112357 | 0.274295 | 1.2548 |
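The three columns are linked, so each row also implies the support of its antecedent and of its consequent. The sketch below assumes that the last column is the standard lift, that is, confidence divided by the support of the consequent; the function names are hypothetical.

```python
def consequent_support(confidence, lift):
    """support(Y) implied by lift = confidence / support(Y)."""
    return confidence / lift


def antecedent_support(support, confidence):
    """support(X) implied by confidence = support(X and Y) / support(X)."""
    return support / confidence


# Rule 9 of the table: support 0.110022, confidence 0.312240, lift 1.4026.
rule9_consequent = consequent_support(0.312240, 1.4026)
rule9_antecedent = antecedent_support(0.110022, 0.312240)
print(round(rule9_consequent, 4), round(rule9_antecedent, 4))
```

Under that assumption, a lift above 1 means the consequent is more common among students who match the antecedent than among students overall.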
For the analysis of the effect of teaching application, this study adopts the quasi-experimental method to investigate the effect of personalized teaching with the application of big data on the improvement of students’ academic performance. In addition, interviews and questionnaires were designed to obtain teachers’ teaching feelings and students’ learning feelings, to understand teachers’ and students’ data-supported teaching experience, the acceptance level of supported personalized teaching and its teaching effectiveness. This study started with the feasibility of the experiment and chose to carry out a study on the teaching practice of personalized teaching supported by technology in the field of information technology.
During and after the teaching practice, experimental data were collected and analyzed statistically. The experimental data are divided into three parts. First, the stage test and final test Civics scores of the experimental group and the control group were collected. Second, the learning experiences of the students in the two groups were investigated using questionnaires. Third, ten students in the experimental group were randomly selected and interviewed about their learning experience.
In this study, a personalized teaching experiment was conducted at a key university in a province for one semester. During the course, the two classes were tested in stages. A total of six tests were conducted, all taken online, to examine students' mastery of knowledge related to Civics and Politics, and the type, number and difficulty of the test questions were kept close across tests. The Civics scores of the control class and the experimental class on the stage tests are shown in Figure 4. The control class scored relatively better in Test 1, probably because it had a better foundation in information technology. Although the scores of both classes improved overall, the experimental class showed a stronger upward trend and outperformed the control class on the last three tests. In Test 6 the experimental class scored 3.86 points more than the control class.

Figure 4. Stage test scores of the control class and the experimental class
The change in the distribution of scores from Test 1 to Test 6 for the control and experimental classes is shown in Figure 5. The excellence rate is the number of people with scores of 85 and above as a percentage of the total number of students in the class, the good rate is the number of people with scores between 75 and 84 (inclusive of 75 and 84) as a percentage of the total number of students in the class, the pass rate is the number of people with scores between 60 and 74 (inclusive of 60 and 74) as a percentage of the total number of students in the class, and the fail rate is the number of people with scores lower than 60 as a percentage of the total number of students in the class. As can be seen from the graph, the excellence rate of both classes has increased, with the control class increasing from 44.7% to 64.7%, but in contrast, the experimental class has made more significant progress, with the excellence rate increasing from 52.5% to 94.6%. It can be seen that students at different levels have improved on their original foundation, especially those with a poorer foundation who have been able to make continuous progress and break through their learning bottlenecks, a change that requires a lot of effort. To a certain extent, it also shows that personalized teaching can stimulate students’ motivation to learn, and can meet the learning styles of the majority of students, making them all able to make progress in their learning, and at the same time promoting the improvement of the teaching level.

Figure 5. Score distribution of the control class and the experimental class
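The four rates defined above follow directly from the score thresholds. The sketch below uses hypothetical scores and function names, and it treats any non-integer score between 84 and 85 as "good", which is an assumption, since the definitions are stated for whole-number scores.

```python
from collections import Counter


def score_band(score):
    """Band of a single score: excellent, good, pass or fail."""
    if score >= 85:
        return "excellent"
    if score >= 75:
        return "good"
    if score >= 60:
        return "pass"
    return "fail"


def band_rates(scores):
    """Share of the class in each band; the four rates sum to 1."""
    counts = Counter(score_band(s) for s in scores)
    return {band: counts[band] / len(scores)
            for band in ("excellent", "good", "pass", "fail")}


# Hypothetical class of eight students.
toy_rates = band_rates([90, 85, 84, 75, 74, 60, 59, 40])
```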
The results of the one-way ANOVA test are shown in Table 2. The mean score of the experimental group is 70.14 for Test 1 and 92.23 for Test 6, so the mean score of the experimental group improved markedly, and the p-value is less than 0.01. In statistical terms, a p-value of less than 0.05 indicates a significant difference, and a p-value of less than 0.01 indicates a highly significant difference. This indicates that in this experimental study, the experimental group's performance in Civics was significantly improved from the beginning of the semester to the end of the semester, reflecting the fact that personalized teaching with big data analytics can help to promote students' learning in Civics. The experimental class also scored higher than the control class, and the significance level P value is 0.004, which is less than 0.01, indicating that the mean scores show a significant difference, i.e., the students in the experimental class scored significantly higher than the control class in Civics and Politics in the final exam.
Table 2. One-way ANOVA results
| | Average score | Standard deviation | Significance level |
|---|---|---|---|
| Test 1 | 70.14 | 13.16 | P<0.01 |
| Test 6 | 92.23 | 3.22 | |
| B group (control class) | 88.36 | 6.42 | P=0.004 |
| A group (experimental class) | 92.23 | 3.22 | |
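For reference, the F statistic behind a one-way ANOVA can be computed from raw scores as sketched below. The per-student scores of this study are not published, so the two groups here are hypothetical, as is the function name. The sketch stops at the F statistic; turning it into a p-value additionally requires the F distribution with the returned degrees of freedom.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for two or more groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for two small groups.
toy_f, toy_df_between, toy_df_within = one_way_anova_f([1, 2, 3], [4, 5, 6])
```

With only two groups, the one-way ANOVA is equivalent to a two-sample t-test with pooled variance, and F equals the square of the t statistic.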
This paper mines and analyzes data from a university teaching system. The K-means algorithm and the improved Apriori algorithm are used to analyze students' consumption habits and academic performance and to mine the connection between students' performance and their behavioral habits, which provides an important reference for the personalized training of students. The results show that the number of students differs greatly between score bands and is concentrated in the high score band. The consumption data of students were clustered into 4 categories, and the students in the 4th category generally underperformed academically, so schools and teachers should pay extra attention to them. Eating breakfast, attending exams, and the number of books borrowed are clearly associated with academic performance and may influence it. The teaching practice shows that personalized teaching based on big data analysis can help promote students' learning of Civics.
Shaanxi Provincial "14th Five-Year Plan" Education Science Plan 2022 annual project: Shaanxi Provincial private universities "Internet + Party building" theory and practice research (project approval number: SGH22Y1736); 2024 Ideological and Political Work Research Fund of Xi'an Mingde Institute of Technology: Research on the Construction and Optimization of the Paradigm of Online Ideological and Political Education in Universities under the Background of Big Data (Project number: SZ2024Y01).
