Research on the Optimization Path of Data Mining Algorithms and Strategies for Mental Health Education in Colleges and Universities under the New Quality Productivity Framework 
Published online: 29 Sept. 2025
Submitted: 26 Jan. 2025
Accepted: 02 May 2025
DOI: https://doi.org/10.2478/amns-2025-1116
© 2025 Xixi Chen et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
With the development of the Internet, data has become an indispensable part of daily life. Data are generated continuously, from national economic statistics to an individual’s health records, and have penetrated every area of life. Extracting useful information from such complex data is a long-standing problem of wide concern. Data mining technology emerged in response and has become an important data-processing tool, widely used in education, finance, medicine, and agriculture [1-4]. Algorithms are the core of data mining: good algorithms screen data faster and more accurately and extract more valuable information, which in turn promotes the development of many industries [5-6]. A data mining algorithm is a set of computational methods for extracting valuable information from large amounts of data. The data mining process is generally divided into preprocessing, model selection, model construction, model evaluation, and application [7]. The preprocessing stage cleans, denoises, and transforms the raw data to facilitate subsequent operations. The model selection stage chooses an appropriate algorithm according to the data characteristics and application goals. In the model construction stage, the algorithm and model are tuned to the data characteristics and application goals, and useful information is extracted from the resulting model. In the model evaluation stage, the best algorithm and model are selected by comparing the performance of the candidates. In the application stage, the model is applied to real scenarios to mine and use the information [8-10].
In higher education, students’ mental health is receiving increasing attention. As social competition intensifies and life pressures grow, many college students face heavy pressure from academics, employment, and interpersonal relationships, and their psychological problems are becoming increasingly prominent [11-12]. Strengthening mental health education and helping students maintain their mental health has therefore become an important task for colleges. Applying data mining to mental health education both modernizes the means of education and provides a more scientific and comprehensive understanding of students’ mental health problems, enabling targeted teaching [13].
Guided by the concept of new quality productivity, this paper uses data mining technology to collect typical features of college students’ mental health and applies preprocessing steps such as data cleaning so that interfering information does not affect the results. Based on the data characteristics, it defines the item sets, frequent patterns, minimum support, and strong association rules, uses the Apriori algorithm to compute the support and confidence of each itemset and rule, and builds an association rule model. Using this model, it examines mental health education in colleges and universities and proposes strategies and optimization paths, aiming to improve students’ psychological quality and the level of psychological education in colleges and universities.
Under the new quality productivity framework, many human activities, such as telecommuting and information sharing, are no longer constrained by time and space. Increasingly sophisticated database technologies and pervasive data applications have led to exponential growth in the data humans produce. Efforts to mine latent information from these massive data have produced a variety of data mining techniques. Among them, mental health education data mining is a new technology that makes full use of the large data flows generated by the operation of a digital campus. By combining various data mining techniques, large amounts of data are integrated, classified, and refined, making them valuable for application in continuously developing and innovating education.
Mental health data mining is a multidisciplinary research field that draws mainly on three disciplines: education, computer science, and statistics. Their combination has also given rise to related research areas, such as computer-based education, machine learning and data mining, and learning analytics.
Mental health data mining has been defined in several ways. It is an emerging interdisciplinary field that aims to discover valuable information in mental health data by developing new techniques, and thereby to better understand students’ learning and psychological status. The most common definition describes it as the application of data mining techniques to datasets from educational settings in order to address key educational issues.
With the advent of the big data era, data mining technology has matured and is widely used in fields such as precision marketing, loan repayment, biology, and medicine. In the 21st century it has also emerged in education. Why should its application to education be treated as a separate research direction? On the one hand, the data of each research field have characteristics specific to that field; on the other hand, mental health data, beyond those field-specific characteristics, also have the following typical characteristics.
1. Staged nature. In China, the education system is generally divided into four stages: elementary school, middle school, high school, and university. Research questions differ considerably between stages, and even students at the same stage vary by major, grade, class, course, and teacher, so their data usually have different characteristics.
2. Contextual diversity. As national prosperity has made education universal, a student’s educational situation can change greatly from one moment, or one stage, to the next. Education systems therefore usually record a particular event for a particular student at a particular time and place.
3. Variability in data collection. Educational systems differ between universities: they often record data in different formats or fields and may collect data at different frequencies and intervals, which makes unifying mental health data mining techniques a major challenge.
4. Time span. Mental health data are usually recorded when a behavior occurs, so they form time series that track each student’s campus life and learning behavior over a semester, an academic year, or even the whole period of study; their time spans therefore vary.
The process of mental health data mining is basically the same as that of data mining, which mainly includes the following steps:
1. Data collection. Because educational environments and systems differ, the data collected to solve different problems also differ. Mental health data come from many sources, including management systems, psychological counseling records, and questionnaires, and these multi-source data must be collected and organized.
2. Preprocessing. Mental health data are complex and diverse. Different educational systems generally store the data for the same problem in different formats, so the relevant data must be extracted according to the actual problem. It is also crucial to choose the data structure best suited to the type of problem, so converting the raw data into a suitable structure helps solve it.
3. Data mining. Mental health data are analyzed with data mining techniques; in education, classification, clustering, and association analysis are most commonly used.
4. Interpretation of results. Finally, the educational environment or system can be improved on the basis of the results. For decision-making, the models produced by data mining algorithms can be interpreted and used to design systems that provide decisions, opinions, or suggestions to educators.
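The collection and preprocessing steps above can be illustrated with a minimal Python sketch. It assumes two hypothetical source systems, a questionnaire export and a counseling-center export, that store the same attributes under different field names; `FIELD_MAP`, `harmonize`, and `to_transactions` are illustrative names, not part of the study. Records are renamed to one schema, identifier and unmapped fields are dropped, and each student becomes a set of `attribute=value` items ready for association analysis.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mapping from each source system's field names to one common schema.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "questionnaire": {"stu_id": "student_id", "anx_level": "anxiety", "dep_level": "depression"},
    "counseling": {"ID": "student_id", "Anxiety": "anxiety", "Depression": "depression"},
}


def harmonize(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Rename fields to the common schema; fields without a mapping (e.g. dates) are dropped."""
    mapping = FIELD_MAP[source]
    return {mapping[k]: v.strip().lower() for k, v in record.items() if k in mapping}


def to_transactions(records: List[Dict[str, str]]) -> List[Set[str]]:
    """Turn harmonized records into item sets such as {'anxiety=mild'}.

    The identifier carries no mining value and is skipped; empty (missing)
    values simply produce no item.
    """
    return [
        {f"{k}={v}" for k, v in rec.items() if k != "student_id" and v}
        for rec in records
    ]


raw: List[Tuple[str, Dict[str, str]]] = [
    ("questionnaire", {"stu_id": "001", "anx_level": "Mild", "dep_level": "None", "date": "2024-03-01"}),
    ("counseling", {"ID": "002", "Anxiety": "moderate", "Depression": ""}),
]
transactions = to_transactions([harmonize(src, rec) for src, rec in raw])
print(transactions)
```

Keeping the renaming in a table rather than in code makes it easy to add a further source system without touching the mining step.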
Association analysis is a method of discovering latent, valuable, or regular connections in a target data set by executing an algorithm [14-16]. The connections discovered are generally expressed as association rules, which are generated from frequent itemsets. Association rule mining has been widely used in many fields.
Association rules are generally expressed in the form $X \Rightarrow Y$, in which $X$ and $Y$ are disjoint itemsets ($X \cap Y = \emptyset$): $X$ is the antecedent and $Y$ the consequent of the rule.
The support of a rule is the proportion of transactions in which all the items of the rule occur together, and its confidence measures how reliable the rule is. Support and confidence therefore express, respectively, the usefulness and the certainty of a mined rule. Minimum support and minimum confidence thresholds are usually set in advance by domain professionals and mining experts; a rule whose support and confidence are greater than or equal to these thresholds is considered useful. Statistical correlation analysis can also be performed to reveal characteristic patterns of correlation between related items.
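Assume that $I = \{i_1, i_2, \ldots, i_d\}$ is the set of all items and $T = \{t_1, t_2, \ldots, t_N\}$ is the set of all transactions, where each transaction $t_j$ is a subset of $I$.
The number of items in a transaction is called its transaction width. If $X \subseteq t_j$ for an itemset $X$, transaction $t_j$ is said to contain $X$.
The support count, an important property of an itemset, is the number of transactions that contain that itemset. It is expressed as $\sigma(X) = \lvert \{ t_j \mid X \subseteq t_j,\ t_j \in T \} \rvert$.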
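A minimal Python sketch of these definitions, using a made-up list of four transactions (illustrative only, not survey data): `support_count` computes the support count $\sigma(X)$, the number of transactions containing $X$, by testing $X \subseteq t$ for every transaction, and `support` divides that count by the number of transactions $N$.

```python
from typing import FrozenSet, List

# Toy transactions: one set of items per student (illustrative values only).
T: List[FrozenSet[str]] = [
    frozenset({"learning_stress", "anxiety"}),
    frozenset({"learning_stress", "compulsion"}),
    frozenset({"learning_stress", "compulsion", "anxiety"}),
    frozenset({"depression"}),
]


def support_count(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    """sigma(X): the number of transactions t with X a subset of t."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """s(X) = sigma(X) / N: the fraction of transactions that contain X."""
    return support_count(itemset, transactions) / len(transactions)


X = frozenset({"learning_stress", "compulsion"})
print(support_count(X, T), support(X, T))  # 2 0.5
```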
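Itemsets, sequences, or other patterns that recur many times in a dataset are called frequent patterns. For example, combinations of items that frequently appear together in a purchase dataset are frequent itemsets.
If the support and the confidence of a rule $X \Rightarrow Y$ are greater than or equal to the minimum support and minimum confidence thresholds, respectively, the rule is called a strong association rule.
Minimum support is the user-specified threshold that an itemset’s support must reach for the itemset to be considered frequent.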
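Whether a rule $X \Rightarrow Y$ is strong can be checked directly. The sketch below assumes the standard definitions, support $\sigma(X \cup Y)/N$ and confidence $\sigma(X \cup Y)/\sigma(X)$; the thresholds 0.3 and 0.6 and the toy transactions are placeholders, not values from this study.

```python
from typing import FrozenSet, List, Tuple

# Toy transactions (illustrative only).
T: List[FrozenSet[str]] = [
    frozenset({"learning_stress", "anxiety"}),
    frozenset({"learning_stress", "compulsion"}),
    frozenset({"learning_stress", "compulsion", "anxiety"}),
    frozenset({"depression"}),
]


def support_count(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> int:
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def check_rule(x: FrozenSet[str], y: FrozenSet[str], transactions: List[FrozenSet[str]],
               min_sup: float, min_conf: float) -> Tuple[float, float, bool]:
    """Return (support, confidence, is_strong) for the rule x => y."""
    if x & y:
        raise ValueError("antecedent and consequent must be disjoint")
    n_xy = support_count(x | y, transactions)
    n_x = support_count(x, transactions)
    sup = n_xy / len(transactions)
    conf = n_xy / n_x if n_x else 0.0
    return sup, conf, sup >= min_sup and conf >= min_conf


# Placeholder thresholds, not the study's values.
MIN_SUP, MIN_CONF = 0.3, 0.6
print(check_rule(frozenset({"compulsion"}), frozenset({"learning_stress"}), T, MIN_SUP, MIN_CONF))
# (0.5, 1.0, True)
print(check_rule(frozenset({"anxiety"}), frozenset({"compulsion"}), T, MIN_SUP, MIN_CONF))
# (0.25, 0.5, False)
```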
The Apriori algorithm is the classic, foundational association rule mining algorithm. It finds strong association rules on the basis of support and confidence; by using the minimum support threshold together with the property that every subset of a frequent itemset must itself be frequent, it effectively controls the size of the candidate itemsets and efficiently mines the association rules that meet the requirements.
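Step 1: Scan the entire dataset, count how many times each item occurs, and keep the items that meet the minimum support to obtain the frequent 1-itemsets.
Step 2: Generate candidate 2-itemsets from the frequent 1-itemsets obtained in Step 1.
Step $k$: Generate candidate $k$-itemsets from the frequent $(k-1)$-itemsets and scan the dataset to retain those that meet the minimum support as frequent $k$-itemsets.
Last step: Repeat the iteration until step $k$ produces no new frequent itemsets.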
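The level-wise loop can be sketched in Python as follows. This is an illustrative implementation under simple assumptions (transactions are sets of item labels and `min_sup` is a fraction of transactions), not the code used in the study. Candidates are formed by uniting pairs of frequent $(k-1)$-itemsets whose union has $k$ items and discarding any candidate with an infrequent $(k-1)$-subset; after pruning this yields the same candidates as the prefix-based join.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def apriori(transactions: Iterable[Set[str]], min_sup: float) -> Dict[FrozenSet[str], float]:
    """Level-wise search for every itemset whose support is >= min_sup.

    Returns a dict mapping each frequent itemset to its support.
    """
    tx: List[FrozenSet[str]] = [frozenset(t) for t in transactions]
    n = len(tx)

    def keep_frequent(candidates: Set[FrozenSet[str]]) -> Dict[FrozenSet[str], float]:
        # One pass over the data per level: count every candidate, keep the frequent ones.
        counts = {c: sum(1 for t in tx if c <= t) for c in candidates}
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_sup}

    # Step 1: frequent 1-itemsets.
    level = keep_frequent({frozenset([item]) for t in tx for item in t})
    result = dict(level)
    k = 2
    # Steps 2..k: build candidate k-itemsets from frequent (k-1)-itemsets; stop when none are frequent.
    while level:
        prev = set(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Discard candidates with an infrequent (k-1)-subset (Apriori property).
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        result.update(level)
        k += 1
    return result


data = [
    {"learning_stress", "compulsion"},
    {"learning_stress", "anxiety"},
    {"learning_stress", "compulsion", "anxiety"},
    {"compulsion"},
]
freq = apriori(data, min_sup=0.5)
for itemset, sup in sorted(freq.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), sup)
```

In this toy run the 3-itemset never reaches the counting scan, because its subset {compulsion, anxiety} has support 0.25 and is already infrequent.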
The two main steps for generating frequent $k$-itemsets from the frequent $(k-1)$-itemsets are joining and pruning.
Step 1: Pairs of itemsets in the previous level’s frequent itemset collection are joined to produce the candidate itemset collection; two frequent $(k-1)$-itemsets are joined when their first $k-2$ items are identical. This is expressed as $C_k = L_{k-1} \bowtie L_{k-1}$, where $L_{k-1}$ is the collection of frequent $(k-1)$-itemsets and $C_k$ the collection of candidate $k$-itemsets.
Step 2: Infrequent itemsets in the candidate collection are pruned. The candidates produced by the join are not necessarily all frequent, so the infrequent ones must be identified and removed. Specifically, the dataset is scanned, the support count of each candidate is compared with the minimum support, and candidates that meet or exceed it are retained while the rest are pruned. To reduce the size of $C_k$ before this scan, any candidate that has a $(k-1)$-subset not contained in $L_{k-1}$ can be discarded immediately, because every subset of a frequent itemset must itself be frequent.
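The join and prune steps can be isolated in one small function. This sketch assumes, as in the usual formulation, that the items inside each itemset are kept in lexicographic order, represented here as sorted tuples; the name `apriori_gen` and the example itemsets are illustrative.

```python
from itertools import combinations
from typing import Set, Tuple

Itemset = Tuple[str, ...]  # items kept in sorted order


def apriori_gen(frequent_prev: Set[Itemset]) -> Set[Itemset]:
    """Build C_k from L_{k-1}: join step followed by prune step."""
    prev = sorted(frequent_prev)
    candidates: Set[Itemset] = set()
    # Join: merge two (k-1)-itemsets that agree on their first k-2 items.
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:
                # a precedes b and shares its prefix, so a[-1] < b[-1] and the tuple stays sorted.
                candidates.add(a + (b[-1],))
    # Prune: a candidate with any infrequent (k-1)-subset cannot be frequent.
    return {c for c in candidates
            if all(s in frequent_prev for s in combinations(c, len(c) - 1))}


L2 = {
    ("anxiety", "compulsion"),
    ("anxiety", "learning_stress"),
    ("compulsion", "learning_stress"),
    ("depression", "learning_stress"),
}
print(apriori_gen(L2))  # {('anxiety', 'compulsion', 'learning_stress')}
# Without ('compulsion', 'learning_stress') the joined candidate is pruned.
print(apriori_gen(L2 - {("compulsion", "learning_stress")}))  # set()
```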
Collective activities play a key role in students’ mental health: they help students build social networks, stabilize their psychological state, and improve academic performance. This study counts multi-dimensional data such as the frequency of collective activities, the club participation rate, volunteer service hours, and the frequency of participation in sports, quantifies students’ activeness in collective activities using the Apriori algorithm, and treats this activeness as an important predictor of mental health. Social interaction helps alleviate stress and loneliness, club and volunteer activities reflect social responsibility and a sense of fulfillment, and physical activity benefits both physical and mental health. The collective activity assessment model can therefore comprehensively capture and quantify students’ participation in group life, providing a basis for assessing and predicting their mental health status, supporting individual mental health management, and guiding schools and educational institutions toward more targeted support services.
To capture the associations between collective activity features more closely, a covariance matrix and a multivariate normal distribution are introduced on top of the Apriori algorithm so that potential associations between the feature terms are considered more comprehensively. The proposed expression is as follows:
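$$\text{Collective Activity Score} = P(x_1, x_2, x_3, x_4)$$
where $x_1$ is the collective activity frequency, $x_2$ the club organization participation rate, $x_3$ the volunteer service hours, and $x_4$ the sports activity participation frequency. Treating the four features jointly captures the associations between them better. Under the multivariate normal assumption:
$$P(\mathbf{x}) = \frac{1}{(2\pi)^{2}\,\lvert\Sigma\rvert^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$$
where $\mathbf{x} = (x_1, x_2, x_3, x_4)^{\top}$ is a student’s feature vector, $\boldsymbol{\mu}$ is the mean feature vector, and $\Sigma$ is the $4 \times 4$ covariance matrix of the features.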
The probabilities of the individual feature terms are thus obtained as follows:
Here, 

Figure 1. Distribution of collective activity scores
To better understand the relationship between students’ collective activity scores and their individual feature items, the model results are visualized in Figure 2 as a scatter plot of the feature activities. Each point represents a student: the horizontal coordinate is the value of one of the student’s feature items and the vertical coordinate is the student’s collective activity score. The scatter points for each feature item are distributed differently, with possible concentrations and outliers, which gives a fuller picture of how students’ collective activity scores differ and provides a basis for further analysis. The figure gives a first view of how collective activity frequency, club organization participation rate, volunteer service hours, and sports participation frequency relate to the collective activity score. Collective activity frequency shows a positive correlation with the score: students who participate in collective activities more often tend to have higher scores. Club organization participation also appears to have a positive effect, whereas the relationships of volunteer service hours and sports participation frequency with the score are more diffuse, and no clear trend can be identified. These preliminary observations point to directions for further in-depth analysis of how different collective activity features affect students’ scores. The collective activity score can thus be interpreted as a measure of a student’s participation in group activities, giving a fuller picture of the student’s social, academic, and mental health status.

Figure 2. Feature activity scatter plot
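The multivariate normal scoring of the collective activity model can be sketched in pure Python as below. It assumes four numeric features per student and estimates the mean vector and the sample covariance matrix from the data; the feature values are invented for illustration and the helper names are hypothetical. The quadratic form and the determinant are obtained through a Cholesky factorization, which also fails with an error if the covariance matrix is not positive definite.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mean(rows: Sequence[Sequence[float]]) -> Vector:
    """Column means of the data matrix."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix (divides by n - 1)."""
    mu = mean(rows)
    d, n = len(mu), len(rows)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L^T == a; raises an error if a is not positive definite."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mvn_density(x: Sequence[float], mu: Sequence[float], cov: Matrix) -> float:
    """Multivariate normal density N(x; mu, cov)."""
    d = len(mu)
    low = cholesky(cov)
    # Forward substitution solves L z = x - mu, so z.z = (x - mu)^T cov^-1 (x - mu).
    z: Vector = []
    for i in range(d):
        z.append((x[i] - mu[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))  # log|cov|
    return math.exp(-0.5 * (quad + d * math.log(2.0 * math.pi) + log_det))


# Hypothetical students: [activity frequency, club participation rate, volunteer hours, sports frequency]
students = [
    [4.0, 0.50, 10.0, 2.0],
    [6.0, 0.70, 22.0, 5.0],
    [2.0, 0.30, 4.0, 1.0],
    [5.0, 0.80, 12.0, 6.0],
    [3.0, 0.20, 9.0, 3.0],
    [7.0, 0.60, 18.0, 4.0],
    [1.0, 0.10, 2.0, 2.0],
]
mu = mean(students)
cov = covariance(students)
scores = [mvn_density(s, mu, cov) for s in students]
for s, p in zip(students, scores):
    print(s, f"{p:.3e}")
```

Because the density is largest near the mean feature vector, it measures how typical a student’s combination of activities is rather than how active the student is; mapping it to a participation rating would need an additional, separately chosen step.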
Several scholars have studied the psychological problems of primary and secondary school students. However, most of them still rely on conventional statistical analysis and do not use association rule mining algorithms to analyze the correlations in the data. In fact, the factors influencing psychological problems are correlated with one another, so mining the latent associations between these factors has positive theoretical significance for research on, and guidance of, mental health education.
Student mental health assessment data contain a large amount of inaccurate and noisy data, as well as incomplete and invalid records caused by student inattention or other reasons. This worthless information ultimately distorts the mining results. Preprocessing such non-ideal data sources can therefore significantly improve both the efficiency of data mining algorithms and the accuracy of knowledge discovery.
1. Data extraction. Attributes whose values are unique to each record, such as date, student number, and name, have no mining significance, and ethnicity has no effect on the results because most of the students are Han Chinese; these attributes are therefore deleted. Based on the characteristics of students’ psychological data, nine dimensions of psychological problems were identified: interpersonal tension and sensitivity, somatization, anxiety, obsessive-compulsive disorder, hostility, horror, psychosis, depression, and paranoia. The study then mined both the associations between six influencing factors (being an only child, being a student cadre, region, monthly family income, gender, and family structure) and these problems, and the correlations among the nine dimensions.
2. Data cleaning. The extracted data still contain many defects (dirty data), on which a good mining model cannot be built, so they must be cleaned. Data cleaning mainly covers missing-value handling, noisy-data handling, abnormal-data handling, duplicate checking, and data validation.
3. Data reduction. Each psychological problem in the students’ data is scored and divided into five levels: none, mild, moderate, severe, and extremely severe. Statistically, the proportion of students with high scores on each dimension is much lower than the proportion at the other levels. If the minimum support threshold is set too high, these less frequent items fall below it and are filtered out during frequent itemset mining, even though the filtered information may be of greater value.
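The data reduction step can be sketched as follows: each dimension score is mapped to one of the five levels and then encoded as an item such as `A3=moderate`, using the variable codes of Table 1. The cut points in `CUTS` are hypothetical placeholders on an assumed 1 to 5 score scale, since the study’s level boundaries are not stated; `level` and `encode` are illustrative names.

```python
from bisect import bisect_right
from typing import Dict, Set

LEVELS = ["none", "mild", "moderate", "severe", "extremely_severe"]
# Hypothetical cut points on an assumed 1-5 score scale; a score equal to a
# cut point falls into the higher level.
CUTS = [2.0, 2.5, 3.0, 4.0]


def level(score: float) -> str:
    """Map a dimension score to one of the five ordered levels."""
    return LEVELS[bisect_right(CUTS, score)]


def encode(scores: Dict[str, float], codes: Dict[str, str]) -> Set[str]:
    """Encode one student's dimension scores as items such as 'A3=moderate'."""
    return {f"{codes[dim]}={level(s)}" for dim, s in scores.items()}


codes = {"Depression": "A3", "Anxiety": "A5"}  # codes taken from Table 1
print(encode({"Depression": 2.7, "Anxiety": 1.4}, codes))
```

Encoding the level into the item label keeps rare high-severity levels as separate items, so their support can be inspected directly rather than being merged with mild cases.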
Based on the nine psychological problems and six influencing factors above, each item was coded separately to obtain the expression variables and codes of the mental health problems common among students, as shown in Table 1.
Table 1. Common psychological problem expression variables
| Variable name | Variable code |
|---|---|
| Interpersonal tension and sensitivity | A1 |
| Feeling of learning pressure | A2 |
| Depression | A3 |
| Poor adjustment to school life | A4 |
| Anxiety | A5 |
| Compulsion | A6 |
| Paranoia | A7 |
| Hostility | A8 |
| Horror | A9 |
The Apriori algorithm was used to scan the psychological problem variables and obtain the itemsets whose weighted support is greater than or equal to 0.08. Table 2 lists the frequent itemsets of common psychological problems ranked by weighted support. The top eight single items, in descending order of support, are learning stress, compulsion, anxiety, interpersonal tension and sensitivity, depression, paranoia, hostility, and maladjustment. Learning stress, compulsion, and their combination are the most frequent psychological problems among the surveyed students: 71.5% and 62.6% of students have learning stress and compulsion problems respectively, and 10.1% have both. It can therefore be inferred that most students have either learning stress or compulsion problems.
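The text reports weighted support without defining it. One simple scheme, assumed here for illustration and not necessarily the one used in this study, gives each item a weight, takes the mean weight of an itemset’s items, and multiplies it by the ordinary support. Giving high weight to severe levels lets rare but important items, such as the high-severity levels mentioned earlier, compete with frequent mild ones. The item labels and weights below are hypothetical.

```python
from typing import Dict, FrozenSet, List


def weighted_support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]],
                     weights: Dict[str, float]) -> float:
    """Assumed scheme: mean item weight of the itemset times its ordinary support."""
    sup = sum(1 for t in transactions if itemset <= t) / len(transactions)
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * sup


# Hypothetical data: A2 = learning stress, A6 = compulsion (codes from Table 1).
tx = [
    frozenset({"A2=mild", "A6=mild"}),
    frozenset({"A2=mild"}),
    frozenset({"A2=severe", "A6=mild"}),
    frozenset({"A2=mild", "A6=mild"}),
]
# Hypothetical weights that favor severe levels over mild ones.
weights = {"A2=mild": 0.25, "A2=severe": 1.0, "A6=mild": 0.25}

print(weighted_support(frozenset({"A2=mild"}), tx, weights))    # 0.1875
print(weighted_support(frozenset({"A2=severe"}), tx, weights))  # 0.25
```

Although `A2=mild` occurs three times as often as `A2=severe`, the severe item ends up with the higher weighted support under these weights.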
Table 2. Frequent itemsets of common psychological problems
| N | Weighted support | Itemset |
|---|---|---|
| 1 | 0.715 | {Learning stress} |
| 2 | 0.626 | {Compulsion} |
| 3 | 0.384 | {Anxiety} |
| 4 | 0.207 | {Interpersonal tension and sensitivity} |
| 5 | 0.126 | {Depression} |
| 6 | 0.119 | {Paranoia} |
| 7 | 0.105 | {Hostility} |
| 8 | 0.103 | {Maladjustment} |
| 9 | 0.101 | {Learning stress, compulsion} |
| 10 | 0.089 | {Learning stress, interpersonal tension and sensitivity} |
| 11 | 0.086 | {Depression, maladjustment} |
| 12 | 0.076 | {Learning stress, depression} |
| 13 | 0.075 | {Learning stress, maladjustment, interpersonal tension and sensitivity} |
| 14 | 0.072 | {Horror} |
To find connections between psychological problems, rules with a confidence of 0.57 or higher were generated and ranked by confidence; the strong association rules for common psychological problems are shown in Table 3. {Compulsion, anxiety} → {Learning stress} has the highest confidence, 0.9036, indicating that 90.36% of students with compulsion and anxiety also experience learning stress; its weighted support of 0.1888 is among the highest in the table, so the rule occurs relatively often. {Interpersonal tension and sensitivity} → {Maladjustment} has a weighted support of 0.1208 and a weighted confidence of 0.8977, combining high confidence with relatively high support, so this rule is also likely to occur.
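Rule generation with a confidence threshold can be sketched as below. It enumerates every split of each frequent itemset into an antecedent $X$ and a consequent $Y$ and keeps the rules with $s(X \cup Y)/s(X) \ge$ `min_conf`. The 0.57 threshold comes from the text, but the supports in the example are made up, and plain (unweighted) confidence is used because the weighting applied in the study is not specified; `generate_rules` is an illustrative name.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]  # (X, Y, support, confidence)


def generate_rules(freq: Dict[FrozenSet[str], float], min_conf: float) -> List[Rule]:
    """Return every rule X => Y with conf >= min_conf, sorted by confidence (descending).

    `freq` maps each frequent itemset to its support. By the Apriori property
    every non-empty subset of a frequent itemset is frequent, so it is in `freq`.
    """
    rules: List[Rule] = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / freq[antecedent]  # s(X u Y) / s(X)
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


# Made-up supports, consistent with the Apriori property (not the study's values).
freq = {
    frozenset({"compulsion"}): 0.6,
    frozenset({"anxiety"}): 0.4,
    frozenset({"learning_stress"}): 0.7,
    frozenset({"compulsion", "anxiety"}): 0.3,
    frozenset({"compulsion", "learning_stress"}): 0.4,
    frozenset({"anxiety", "learning_stress"}): 0.3,
    frozenset({"compulsion", "anxiety", "learning_stress"}): 0.25,
}
rules = generate_rules(freq, min_conf=0.57)  # threshold taken from the text
for x, y, sup, conf in rules:
    print(sorted(x), "=>", sorted(y), f"sup={sup:.2f} conf={conf:.3f}")
```

Because confidence depends on direction, {anxiety} => {compulsion} (0.75) passes the threshold while {compulsion} => {anxiety} (0.5) does not, even though both come from the same itemset.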
Table 3. Strong association rules for common psychological problems
| N | Cause item set | Result item set | Weighted support | Weighted confidence |
|---|---|---|---|---|
| 1 | {Compulsion, anxiety} | {Learning stress} | 0.1888 | 0.9036 |
| 2 | {Interpersonal tension and sensitivity} | {Maladjustment} | 0.1208 | 0.8977 |
| 3 | {Anxiety, depression} | {Learning stress} | 0.0661 | 0.8548 |
| 4 | {Interpersonal tension and sensitivity, maladjustment} | {Learning stress} | 0.0916 | 0.8151 |
| 5 | {Interpersonal tension and sensitivity} | {Learning stress} | 0.0586 | 0.8105 |
| 6 | {Interpersonal tension and sensitivity} | {Learning stress} | 0.1124 | 0.8084 |
| 7 | {Depression} | {Learning stress} | 0.0975 | 0.7946 |
| 8 | {Maladjustment, depression} | {Learning stress} | 0.0652 | 0.7806 |
| 9 | {Anxiety, hostility} | {Compulsion} | 0.0964 | 0.7454 |
| 10 | {Compulsion, paranoia} | {Somatization} | 0.0134 | 0.7185 |
| 11 | {Compulsion, interpersonal tension and sensitivity} | {Somatization} | 0.0912 | 0.6509 |
| 12 | {Anxiety, depression} | {Compulsion} | 0.2505 | 0.6378 |
| 13 | {Compulsion, paranoia} | {Interpersonal tension and sensitivity} | 0.3138 | 0.5958 |
| 14 | {Horror, interpersonal tension and sensitivity} | {Depression} | 0.1626 | 0.5888 |
Next, the two psychological problems that appeared in a higher proportion, maladjustment and learning stress, were mined with the Apriori algorithm described in this paper. Table 4 shows some of the association rules between psychological problem influencing factors and maladjustment. Rural students from low-income families who show maladjustment account for 47.08% of the surveyed students (weighted support), and 68.06% of rural students from low-income families show maladjustment (weighted confidence). Maladjustment occurs in 60.27% of students with Internet addiction, whereas 71.84% of students from high-income rural families show no maladjustment. Regarding sleep quality, one of the students’ personal factors, students with higher sleep quality are less likely to be maladjusted.
Table 4. Partial association rules between psychological problem influencing factors and maladjustment
| N | Cause item set | Result item set (maladjustment) | Weighted support | Weighted confidence |
|---|---|---|---|---|
| 1 | {Rural, low income} | YES | 0.4708 | 0.6806 |
| 2 | {Rural, only child} | YES | 0.1676 | 0.5459 |
| 3 | {Town, only child} | YES | 0.0946 | 0.4678 |
| 4 | {Single-parent family, low income} | YES | 0.2408 | 0.4229 |
| 5 | {Rural, high income} | YES | 0.5176 | 0.7184 |
| 6 | {High sleep quality} | YES | 0.2598 | 0.4686 |
| 7 | {Internet addiction} | YES | 0.1877 | 0.6027 |
| 8 | {Non-single-parent family, large and medium-sized city} | YES | 0.1604 | 0.4486 |
| 9 | {High income} | YES | 0.2355 | 0.5903 |
Table 5 shows some of the association rules between psychological problem influencing factors and learning stress. Student cadres from low-income families have a 64.86% probability of experiencing learning stress, which indicates that family economic factors and students’ personal status both contribute to learning stress. By contrast, only 37.77% of physically fit students experience learning stress, suggesting that physically fit students are better able to relieve stress. Comparing rules 4 and 5 shows that students from single-parent families in large and medium-sized cities are more likely to experience learning stress (69.09%) than those from non-single-parent families (26.88%), suggesting that students from intact families have better mental health than those from single-parent families.
Table 5. Partial association rules between psychological problem influencing factors and learning stress
| N | Cause item set | Result item set (learning stress) | Weighted support | Weighted confidence |
|---|---|---|---|---|
| 1 | {Low income, student cadre} | YES | 0.1939 | 0.6486 |
| 2 | {High income} | YES | 0.0644 | 0.5408 |
| 3 | {High income, rural} | NO | 0.1846 | 0.4916 |
| 4 | {Single-parent family, large and medium-sized city} | YES | 0.3755 | 0.6909 |
| 5 | {Non-single-parent family, large and medium-sized city} | YES | 0.1358 | 0.2688 |
| 6 | {Non-student cadre, town} | NO | 0.1646 | 0.3759 |
| 7 | {High sleep quality} | YES | 0.0956 | 0.4409 |
| 8 | {Good health} | NO | 0.1626 | 0.3777 |
| 9 | {Not only child, non-single-parent family} | YES | 0.0858 | 0.4178 |
Under the concept of “three-pronged education”, college students’ mental health education must involve all staff, which requires building an integrated education team. First, the responsibilities and roles of each party in education must be clarified so that everyone participates. Teachers should attend not only to students’ knowledge learning but also to their mental health; counselors should strengthen ideological and psychological guidance for students; and administrators should actively participate in mental health education and provide the necessary support and safeguards. Second, collaboration among these parties must be strengthened: teachers, counselors, and administrators should establish good communication and cooperation mechanisms and work together to solve the problems students face. Finally, training and management of the education team must be strengthened: teachers’ professionalism and skills should be continuously improved so that they can better provide mental health services and support; the counselor team should be built up to improve its professional ability and performance; and training and management of administrative staff should be strengthened to improve their efficiency and service quality.
To provide mental health education under the concept of “three-pronged education”, colleges need a continuous education mechanism grounded in the idea of educating students throughout the entire process. Colleges and universities should formulate targeted mental health education plans and programs according to the needs and characteristics of students at each stage, covering the whole process from enrollment to graduation. For students with psychological problems, a mechanism for timely detection and resolution is needed: regular psychological assessment and screening detect problems promptly, and effective interventions are then taken so that problems do not accumulate or worsen. Finally, to ensure that mental health education is effective, continuous educational support should be provided through a variety of activities and curricula, such as mental health lectures, psychological counseling courses, and psychological training camps. Together, these measures establish a continuous, whole-process education mechanism that strongly supports students’ mental health education.
Under the concept of new quality productivity, this paper uses an association rule model to interpret the current state of mental health education in colleges and universities in depth and proposes targeted strategies and paths.
1. Most students’ collective activity scores lie in a range around the mean and are approximately normally distributed, which helps to build a full picture of students’ social, academic, and mental health status.
2. The rule {low-income family, student cadre} → {learning stress} has a weighted confidence of 64.86%, indicating that family economic factors and students’ personal identity are important causes of learning stress and suggesting that students from better-off families experience relatively less learning stress.
3. Mental health education in colleges and universities can be optimized by building an integrated education team and establishing a continuous education mechanism.
