Research on the precise economic support model of college student financial aid policy to the funded group based on cluster analysis

With the first centenary goal of building a moderately prosperous society in all respects achieved, the Chinese people are setting out toward the second centenary goal. At this historical juncture, consolidating the achievements of the fight against poverty and continuing to help economically disadvantaged people will lay a more solid foundation for realizing the great rejuvenation of the Chinese nation [1-2]. However, in many impoverished areas, residents’ income is relatively low, and children in some families are unable to complete their education [3]. Education promotes social justice and equity, and its success is directly related to the stable development of China’s economy and society [4-5]. Therefore, the state must continuously increase subsidies for students from families with financial difficulties and help them complete their education, which is not only the minimum requirement of precise poverty alleviation but also an important part of national poverty alleviation through education [6-8].

Precision financial aid is an improvement and refinement of the traditional crude financial aid. It mainly implements financial support by providing material and spiritual help for students in difficulty and taking detailed measures, including clear funding objectives, standardizing the funding process and specific funding content [9-11]. Accurate funding adheres to the principles of individual differences and specific analysis of specific problems, and formulates funding programs according to the individual differences and actual needs of students, in order to achieve more effective funding effects [12-14]. Accurate financial aid needs to be built on the basis of accurate identification, focusing on the precise needs and precise forms, and guaranteeing the implementation effect through the supervision and feedback mechanism [15-16]. In the context of the digital era, the heads and teachers of major universities can rely on digital technology to achieve accurate support for financial aid, promote the smooth implementation of financial aid for students with family economic difficulties in China, and comprehensively realize the fairness, justice and openness of education and society [17-20].

The implementation of scientific, reasonable, and accurate financial aid policies can lay a solid foundation for the comprehensive development and growth of students in difficulty, and ensure the implementation of financial aid policies. This article first reviews the relevant data mining techniques, including data preprocessing, the association rule algorithm, FCM cluster analysis, and the MD-KNN algorithm. Secondly, based on the campus one-card data of University W, poverty data are modeled, the initial feature set of the poverty data is established with the FSIPD algorithm, and a classification model of poor students is built in combination with the MD-KNN algorithm. Finally, to assess the accuracy of university student financial aid policy, the Apriori algorithm is used to mine strong association rules for poor students, and FCM cluster analysis is used to explore differences in study status and in satisfaction with financial aid policy among the funded groups.

2

Data mining process and related algorithms

In the new historical stage, colleges and universities should firmly establish the concept of accurate financial support, understand its importance in serving the goal of common prosperity, adhere to the direction of socialist school running, and promote the overall development of students, so as to overcome the current difficulties of accurate financial support. The effectiveness of precise funding in educating people can be enhanced by starting from the precise identification of funding recipients, improving funding methods, and strengthening value guidance. Accurate funding is the goal pursued by college financial aid work. Making full use of information and data provides a scientific reference and forecast for policy formulation, financial investment, and the funding effect of college financial aid work, effectively promotes accurate funding for students in colleges and universities, and provides a basis for decision-making on accurate funding.

2.1

Data Mining and Preprocessing

2.1.1

Data mining process

Data mining is the process of using one or several computer science techniques to automatically analyze data and extract information from it. Data mining technology can be seen as a natural outcome of the development of social informatization, and it has passed through the stages of conception, acceptance, wide research and exploration, and finally practical application. The specific process of data mining is shown in Figure 1, which mainly includes the steps of data preparation, mining data, interpreting and evaluating data, and applying models [21].

1) Data preparation. Data is the foundation of data mining, so data preparation is critical. To reduce negative effects later in the process, preliminary data preparation should be improved in three respects: drawing up a good mining plan based on the objectives, collecting and acquiring data, and integrating data.

2) Mining data. After completing the data preparation stage, select a suitable data mining algorithm, provide the prepared data to the data mining algorithm, and use this algorithm to carry out modeling.

3) Interpretation and evaluation of data. Interpreting and evaluating data means checking the output value of data mining, evaluating whether it can meet the goal of mining, and understanding whether the information and data found by it have value.

4) Model application. Model application refers to applying the information and knowledge obtained from using data mining algorithms to actual problems and examples.

2.1.2

Data pre-processing

The root of data analysis is data, and data quality is the life of data. If there is a problem with data quality, even the most advanced analysis and mining tools can only extract worthless information from it. The set of techniques applied before data mining methods is called data preprocessing, and it is well known to be one of the key stages in the process of discovering knowledge from data.

Preprocessing is an important step in the process of data mining, and the use of data preprocessing methods can convert the large amount of problematic data collected into usable data, effectively improving the quality of the data, and ultimately improving the classification performance so that the classification algorithms can operate more efficiently. Data preprocessing is also a complex task that requires a proper understanding of the problems that arise in the data after analyzing it in order to best utilize the preprocessing methods to process the data.

1) Data integration and consolidation. Putting data with different representations together so that conflicts within the data can be resolved, such as using multiple databases or files to integrate for ease of analysis.

2) Data transformation. Normalization and aggregation of data. Normalization is a process that ensures that there is no redundant data, that all data is stored in one place, and that all dependencies are logical.

3) Data reduction. When the amount of data is large, accessing the database can become slow and expensive, and the data become difficult to store correctly. Data reduction aims at a simplified representation of the data in the data warehouse, which is obtained through a variety of methods.

4) Data discretization. Data can be replaced with the interval levels of the original values to reduce model training time.
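As a minimal illustration of steps 2) and 4), the sketch below rescales a numeric column and replaces values with interval levels. It takes "normalization" in the numeric-scaling (min-max) sense, which is an assumption rather than the paper's stated procedure; the function names, the cut points, and the spending figures are hypothetical.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, edges):
    """Replace each value by the index of the interval it falls into.

    `edges` are ascending cut points (hypothetical); level j means the
    value is greater than exactly j of the cut points.
    """
    return [sum(1 for e in edges if v > e) for v in values]


# Hypothetical daily cafeteria spending (yuan) for five students.
spending = [8.0, 12.5, 20.0, 35.0, 15.0]
print(min_max_normalize(spending))
print(discretize(spending, edges=[10.0, 20.0]))
```

Discretizing into a handful of levels is also what makes numeric consumption attributes usable as items in association rule mining.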

2.2

Data Mining Algorithms

2.2.1

Association Rule Algorithm

The association rule mining problem is simply a two-step operation, the first step is to find all frequent itemsets that are greater than the minimum support, and the second step is to generate association rules from the identified frequent itemsets. That is, to find the association rules that satisfy the minimum support and minimum confidence, in which finding all the frequent itemsets is the core content of the association rule algorithm.

Let I = {I₁,I₂,…,I_m} be a collection of items, referred to as the itemset; a set containing K items is a K-itemset. A transaction T is a non-empty subset of I, T ⊆ I, each transaction T is identified by a unique identifier TID, and D is the set of all transactions [22].

An association rule is an implication of the form X → Y, where X ⊆ I, Y ⊆ I and X ∩ Y = ∅; X is the condition (antecedent) of the association rule and Y is its result (consequent).

1) The support S (support) of association rule X → Y in D can be expressed as: (1) $\mathrm{Support}(X \to Y) = P(X \cup Y)$

That is, the percentage of transactions in D that contain both the itemset X and the itemset Y. Support is a measure of importance.

2) The confidence C (confidence) of association rule X → Y can be expressed as: (2) $\mathrm{Confidence}(X \to Y) = P(Y \mid X)$

That is, among the transactions in D that contain the itemset X, the percentage that also contain the itemset Y. Confidence is a measure of accuracy.

3) Thresholds. In order to find out useful association rules, then two thresholds need to be set artificially, i.e., minimum support threshold minsup and minimum confidence threshold minconf. If these two thresholds are satisfied, the association rule is considered to exist.

4) Frequent itemsets. Itemsets that satisfy the minimum support minsup are called frequent itemsets. Finding the frequent itemsets is one of the most important parts of association rule mining.

The basic idea of Apriori algorithm is to scan the database several times to find out the frequent itemsets which are greater than the minimum support, and then generate association rules through these frequent itemsets.

The Apriori algorithm first scans the database to obtain the frequent 1-itemsets L₁ whose support exceeds the minimum support. In the K-th scan (K > 1), the frequent (K–1)-itemsets L_{K–1} from the previous scan are used to produce the candidate set C_K, and the database is scanned again to determine the support of each candidate in C_K, yielding the frequent K-itemsets L_K. Scanning ends when the candidate set C_K is empty.
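The level-wise search and the rule generation step can be sketched as follows. This is a minimal illustration rather than the code used in the paper: the function names and the toy transactions are hypothetical, thresholds are treated as "greater than or equal to", and support is expressed as a fraction of the transactions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # First scan: frequent 1-itemsets L_1.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: build candidates C_k from L_{k-1}.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Scan the database and keep candidates meeting the threshold (L_k).
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Generate rules X -> Y with confidence = support(X u Y) / support(X)."""
    out = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for x in combinations(itemset, r):
                x = frozenset(x)
                conf = s / frequent[x]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    out.append((x, itemset - x, s, conf))
    return out


# Hypothetical transactions built from discretized student attributes.
transactions = [
    {"low_income", "loan", "special_poverty"},
    {"low_income", "loan", "special_poverty"},
    {"low_income", "special_poverty"},
    {"loan"},
]
freq = apriori(transactions, min_support=0.5)
for x, y, s, c in rules(freq, min_confidence=0.8):
    print(sorted(x), "->", sorted(y), f"support={s:.2f}", f"confidence={c:.2f}")
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is counted against the database only if all of its smaller subsets were already found to be frequent.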

2.2.2

Cluster analysis algorithms

Cluster analysis refers to the division of a series of entities or abstract objects into groups according to their degree of similarity, so that data objects in the same group are highly similar and data objects in different groups are dissimilar. Clustering partitions all the samples in an unprocessed sample set into a number of generally disjoint subsets, each of which is regarded as a class. Each class may correspond to attributes or concepts of the samples that are not known to the clustering algorithm in advance.

Assume that sample set D = {x₁,x₂,…,x_m} contains m samples, and consider each sample x_i = (x_i1;x_i2;…;x_in) in the sample set as an n-dimensional vector. The clustering algorithm divides sample set D into k disjoint classes {C_l|l = 1,2,…,k}, where $C_{l'} \cap_{l' \neq l} C_{l} = \emptyset$ and $D = \cup_{l = 1}^{k} C_{l}$. Accordingly, we use λ_i ∈ {1,2,…,k} to denote the “labeled class” of sample x_i, i.e., $x_{i} \in C_{λ_{i}}$. The result of clustering can be represented by a labeled class vector λ = (λ₁;λ₂;…;λ_m) containing m elements.

Fuzzy C-means clustering (FCM) is a mathematical method for categorizing things according to certain requirements when the boundaries between them are fuzzy, i.e., the degree to which each sample belongs to each category is calculated iteratively. Fuzzy clustering is a form of cluster analysis that introduces fuzzy logic, which has the advantage of improving classification accuracy and robustness. At the same time, it provides richer information about the data and more reference for subsequent data mining and decision-making.

Given dataset X = {x₁,…,x_n}, let k be the number of categories, m_j the center of the jth cluster, and μ_j(x_i) the membership (affiliation) function of the ith sample with respect to the jth category. The clustering loss function based on the membership function is: (3) $J_{f} = \sum_{j = 1}^{k} \sum_{i = 1}^{n} [μ_{j}(x_{i})]^{2} \| x_{i} - m_{j} \|^{2}$

Setting the partial derivatives of J_f with respect to m_j and μ_j(x_i) to zero, subject to the constraint that the memberships of each sample sum to one, gives the necessary conditions for a minimum of Eq. (3): (4) $m_{j} = \frac{\sum_{i = 1}^{n} [μ_{j}(x_{i})]^{2} x_{i}}{\sum_{i = 1}^{n} [μ_{j}(x_{i})]^{2}}$ (5) $μ_{j}(x_{i}) = \frac{\| x_{i} - m_{j} \|^{-2}}{\sum_{s = 1}^{k} \| x_{i} - m_{s} \|^{-2}}$

Eqs. (4) and (5) are solved iteratively until the convergence condition is satisfied, where $U_{ij}^{(r)}$ denotes the membership of the ith sample in the jth category at iteration r. The convergence condition is: (6) $\max_{ij} \| U_{ij}^{(r + 1)} - U_{ij}^{(r)} \| < ε$

The key to the FCM algorithm is to minimize the loss function (also known as the objective function) until the convergence condition is reached [23].
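The alternating updates of Eqs. (4) and (5) with the stopping rule of Eq. (6) can be sketched in plain Python as follows. This is an illustrative sketch, not the paper's implementation: the function names, the random initialization, the default tolerance, and the toy points are all assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fcm(points, k, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy C-means with the exponent 2 used in Eqs. (3)-(5).

    Returns (centers, u), where u[i][j] is the membership of point i in
    cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])

    for _ in range(max_iter):
        # Eq. (4): centers are means weighted by squared memberships.
        centers = []
        for j in range(k):
            w = [u[i][j] ** 2 for i in range(n)]
            total = sum(w)
            centers.append([
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ])
        # Eq. (5): memberships from inverse squared distances.
        new_u = []
        for x in points:
            d2 = [sq_dist(x, c) for c in centers]
            if 0.0 in d2:  # point coincides with a center
                hits = [1.0 if d == 0.0 else 0.0 for d in d2]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                inv = [1.0 / d for d in d2]
                total = sum(inv)
                new_u.append([v / total for v in inv])
        # Eq. (6): stop when the largest membership change is below eps.
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(k))
        u = new_u
        if change < eps:
            break
    return centers, u


# Two hypothetical, well-separated groups of 2-D points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fcm(pts, k=2)
for p, row in zip(pts, u):
    print(p, [round(v, 3) for v in row])
```

Unlike a hard clustering, each row of `u` keeps a graded membership for every cluster, which is the "richer information" the fuzzy formulation provides.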

2.2.3

MD-KNN algorithm

The K Nearest Neighbor algorithm (KNN) is a method that can be used for classification and regression. Its core idea is that the category of a test sample is decided only from the categories of its 1 or K nearest neighbor samples in the feature space. KNN is a lazy learning algorithm: there is no training phase, new samples are classified directly against the training set, and the implementation is simple.

An important feature of the Mahalanobis distance (MD) is that the correlation within the dataset is taken into account: the covariance matrix is added to the distance measure to exclude the interference of the variables’ different scales, and of the correlation between variables, on the distance measure. The basic role of MD is to eliminate the covariance of the original data and make the measure more robust through a data transformation that does not change the overall distribution of the sample, based on the principal component analysis (PCA) technique of dimensionality reduction. In theory, features with a high probability of occurrence in the overall sample are assigned smaller weights, which reduces the computed distance, whereas features with a low probability of occurrence in the overall sample are given larger weights, which increases the computed distance.

Let the mean of a multivariate data vector x = (x₁,x₂,⋯,x_n)^T be μ = (μ₁,μ₂,⋯,μ_n)^T, and let its covariance matrix and the inverse of that matrix be Σ and Σ⁻¹. Then the Mahalanobis distance d_m(x) of this single data point is: (7) $d_{m}(x) = \sqrt{(x - μ)^{T} Σ^{-1} (x - μ)}$

The Mahalanobis distance between two random variables x and y that obey the same distribution and whose covariance matrix is ∑ can be described as: (8) $d_{m} (x, y) = \sqrt{{(x - y)}^{T} Σ^{- 1} (x - y)}$

If the covariance matrix Σ is the identity matrix, i.e., the dimensions are uncorrelated and each has unit variance, the Mahalanobis distance reduces to the Euclidean distance.
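Eq. (8) and its reduction to the Euclidean distance can be checked with a short sketch. The helper names below are hypothetical, and the linear solve replaces an explicit matrix inverse purely as an implementation choice.

```python
import math


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x, y, cov):
    """Eq. (8): sqrt((x - y)^T cov^{-1} (x - y))."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    z = solve(cov, diff)  # z = cov^{-1} (x - y), without forming the inverse
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[4.0, 0.0], [0.0, 1.0]]  # first feature has variance 4
print(mahalanobis([3.0, 4.0], [0.0, 0.0], identity))  # equals the Euclidean distance
print(mahalanobis([2.0, 0.0], [0.0, 0.0], scaled))    # offset of 2 counts as one standard deviation
```

With the `scaled` covariance, a difference of 2 along the high-variance feature contributes the same distance as a difference of 1 along the unit-variance feature, which is how MD removes the effect of differing measurement scales.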

Based on the above analysis, this paper combines MD with KNN algorithm to design MD-KNN classification algorithm, which aims to better mine the data features in the data. The Mahalanobis distance can clearly detect the distance between both observed and known samples, which is very suitable for the process of identification and classification. The Mahalanobis distance represents the covariance distance of the data, which is different from the Euclidean distance in that it takes into account the connection between various features and can measure the distance of similarity between samples, and thus detects weak features when the variance of the same variable is significantly different in different states.

The traditional KNN algorithm computes the Euclidean distance between samples without taking the local differences among samples into account, so it performs poorly at detecting weak features when the variances of different samples differ significantly. Feature detection and classification based on the MD-KNN algorithm instead uses the Mahalanobis distance between samples, so the differing sparsity of the samples does not need to be handled separately, which can further improve the classification accuracy of the sample features.

3

MD-KNN-based classification model for poor students

State financial assistance for students whose families are under financial pressure is the basis for ensuring fairness in education. The funding method has the characteristic of guarantee, can basically satisfy the material needs of students from families with financial difficulties, and can also realize the political function of fair teaching as well as the economic function of helping the needy. The implementation of accurate financial aid for students in difficulty can effectively stimulate the internal motivation of students in difficulty and enable them to realize sustainable development. The implementation of scientific, reasonable, and accurate financial aid policies can lay a solid foundation for the comprehensive development and growth of students in difficulty, and ensure the implementation of financial aid policies.

3.1

Poverty data modeling and processing

3.1.1

Campus One Card Data Collection

In order to analyze the correlation between financial aid policies for college students and precise economic support for poor students, this paper takes University W as the research object and obtains the data for subsequent mining and analysis mainly from its campus one-card data. Among these massive data, the data of enrolled students account for about 85% or more of the total data storage, and within the enrolled-student data, the consumption flow records of enrolled undergraduates dining in the cafeteria account for about 85%.

The campus one-card data are collected mainly from the following business sub-systems: the campus integrated fee management sub-system (integrated business data, the various fee and consumption data of Han meal and Qing meal, and cafeteria consumption data from the sub-systems of the transfer restaurant on the second floor of the Logistics Department); the business sub-system (consumption data of the international students’ cafeteria, Han meal and Qing meal); the third-party docking sub-system (docking with the campus hospital’s HIS system, and consumption data from the water control, drinking water and other sub-systems of the International Education Building); the Ethernet POS sub-system (consumption data from deposits, bill payments, vending machines in the exchange center, the bakery, supermarkets, stores, lounge bars, fruit and dried fruit stores, barbershops, laundromats, etc.); the Campus One Card Library Management sub-system (data from the supermarket on the first floor of the main campus library, overdue book repayment data, and lounge bar consumption data); and other business sub-systems.

After collection and summarization, the consumption flow data obtained in this paper from the various business sub-systems in the backend database of the campus card cover the 142 days of the first semester of the 2022-2023 academic year and amount to about 8,160,000 entries, with a total storage size of about 3 GB.

3.1.2

Poverty data modeling treatment

The goal of preprocessing the poor students’ data is to improve the classification accuracy of the data, reduce the time and computational cost of classifying and identifying poor students, and provide a data basis for the identification of poverty. The objective function set in this paper uses the data classification accuracy as the measurement criterion, and the overall optimization objective can be expressed as follows: (9) $\text{maximize } Z = \frac{1}{N} \sum_{i = 1}^{N} I(h(x_{i}) = Y_{i})$ (10) $\text{s.t. } h(x) = \sum \mathrm{Model}(x)$ (11) $\frac{1}{N} \sum_{i = 1}^{N} I(h(x_{i}) = Y_{i}) > φ$

Where h(x_i) denotes the predicted class label of the poor student sample x_i and Y_i denotes the true class label of x_i. I(h(x_i) = Y_i) tests whether the predicted class label of x_i is consistent with the true class label: if h(x_i) and Y_i are equal, the value 1 is returned, and otherwise the value 0 is returned. N is the total sample size of the data on students in poverty, and i ∈ {1,…,N}. The constraints ensure that the classification accuracy of the data on students in poverty remains above the value of φ.
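The objective Z of Eq. (9) and the constraint of Eq. (11) amount to a plain accuracy computation, sketched below. The classifier `predict`, the sample values, and the labels are hypothetical placeholders, not the paper's model or data.

```python
def accuracy(predict, samples, labels):
    """Z in Eq. (9): mean of the indicator I(h(x_i) = Y_i)."""
    hits = sum(1 for x, y in zip(samples, labels) if predict(x) == y)
    return hits / len(samples)


def meets_constraint(predict, samples, labels, phi):
    """Eq. (11): accuracy must stay strictly above the threshold phi."""
    return accuracy(predict, samples, labels) > phi


# Hypothetical one-feature classifier: low average meal cost => "poor".
def predict(x):
    return "poor" if x < 15 else "not_poor"


samples = [10, 12, 20, 30]
labels = ["poor", "not_poor", "not_poor", "not_poor"]
print(accuracy(predict, samples, labels))
print(meets_constraint(predict, samples, labels, phi=0.7))
```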

In this paper, we propose a feature search method based on FSIPD, which aims to construct a better initial feature set to support feature extraction for poor students. It is assumed that the preprocessed data form the data set D of poor students with m classes, that the feature set after initial screening is f = [f₁,f₂,…,f_v], where v is the number of features in the poverty data D, and that φ is the threshold for feature selection in D. The specific search steps are as follows:

Step1 For the feature set f of D, calculate the F(f_k) value of each feature: (12) $S_{b}(f_{k}) = \sum_{j = 1}^{m} \frac{D_{j}}{N} (μ_{j}^{k} - μ^{k})^{2}$ (13) $S_{ω}(f_{k}) = \frac{1}{N} \sum_{j = 1}^{m} \sum_{x \in j} (x^{k} - μ_{j}^{k})^{2}$ (14) $F(f_{k}) = \frac{S_{b}(f_{k})}{S_{ω}(f_{k})}$

where N denotes the total number of samples in the poverty data D, D_j denotes the number of samples of poor students in category j in D, μ^k is the mean of all the samples in D on feature f_k, $μ_{j}^{k}$ is the mean of the samples of category j in D on feature f_k, j ∈ {1,…,m}, and k ∈ {1,…,v}. Eq. (12) is the interclass scatter S_b(f_k) of feature f_k and Eq. (13) is the intraclass scatter S_ω(f_k) of feature f_k. For a feature f_k selected from the poverty data, the larger the interclass scatter S_b(f_k) and the smaller the intraclass scatter S_ω(f_k), the higher the value of F(f_k), which indicates that the discriminative ability of the feature is stronger.

Step2 Sort the computed F values in descending order and construct a feature sequence in that order.

Step3 Following the F-rank order, add the next feature f_k to the relevant subset f_rel of the poverty data, evaluate the resulting subset with the classification model, and record its classification accuracy as φ_i.

Step4 Repeat Step3 until φ_{i+1} – φ_i < φ, then stop the search and output the last selected relevant feature subset f_rel.
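Steps 1 to 4 can be sketched as a Fisher-score ranking followed by a greedy forward search. The sketch uses squared deviations for both scatters (the usual Fisher-score form), takes the accuracy evaluation as a caller-supplied hypothetical callback `evaluate`, and assumes that the feature whose gain falls below φ is not kept; none of these details is confirmed by the paper.

```python
def fisher_scores(rows, labels):
    """F(f_k) = S_b(f_k) / S_w(f_k) for every feature column k."""
    n, v = len(rows), len(rows[0])
    classes = sorted(set(labels))
    scores = []
    for k in range(v):
        col = [r[k] for r in rows]
        mu = sum(col) / n
        s_b = s_w = 0.0
        for c in classes:
            vals = [x for x, y in zip(col, labels) if y == c]
            mu_c = sum(vals) / len(vals)
            s_b += len(vals) / n * (mu_c - mu) ** 2      # interclass scatter
            s_w += sum((x - mu_c) ** 2 for x in vals) / n  # intraclass scatter
        scores.append(s_b / s_w if s_w > 0 else float("inf"))
    return scores


def forward_search(rows, labels, evaluate, phi):
    """Add features in descending F order until the accuracy gain < phi.

    `evaluate(feature_indices)` is a hypothetical callback returning the
    classification accuracy obtained with that feature subset.
    """
    scores = fisher_scores(rows, labels)
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    selected, prev_acc = [], None
    for k in order:
        acc = evaluate(selected + [k])
        if prev_acc is not None and acc - prev_acc < phi:
            break  # gain too small: keep the previously selected subset
        selected.append(k)
        prev_acc = acc
    return selected


# Hypothetical data: feature 0 separates the classes, feature 1 does not.
rows = [[1.0, 5.0], [1.2, 1.0], [3.0, 5.2], [3.2, 0.8]]
labels = [0, 0, 1, 1]
print(fisher_scores(rows, labels))
print(forward_search(rows, labels, lambda idx: {1: 0.90, 2: 0.905}[len(idx)], phi=0.01))
```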

3.2

Feature Extraction and Classification Model

3.2.1

Characterization of poor students

After completing the construction of the dataset of poor students and the search of the initial feature set, this paper extracts and constructs the relevant abstract features for students to provide reliable data support for the MD-KNN classification model.

The abstract features are constructed by combining the goal of identifying students from poor families with the working experience of the relevant staff. They mainly include regularity and the economic level of the friend circle. Regularity can be described by the entropy of the occurrence of a person’s behavior over specific time intervals.

Assuming that there are n time intervals, i.e., T = {t₁,t₂,⋯,t_n}, the probability of a student’s behavior v occurring in time interval t_i is: (15) $P_{v}(T = t_{i}) = \frac{n_{v}(t_{i})}{\sum_{l = 1}^{n} n_{v}(t_{l})}$

where n_v(t_i) is the frequency of occurrence of behavior v in time interval t_i. Then the entropy of behavior v is: (16) $E_{v} = - \sum_{i = 1}^{n} P_{v}(T = t_{i}) \lg P_{v}(T = t_{i})$

The higher the entropy of a behavior, the more evenly that behavior is spread over the different time periods, i.e., the less regular the behavior is. In this paper, the entropy of behaviors such as dining, non-cafeteria consumption and going to the library is considered.
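Eqs. (15) and (16) can be illustrated with a few lines of Python. The sketch reads lg as the base-10 logarithm, which is an assumption, and the per-interval counts are hypothetical.

```python
import math


def behavior_entropy(counts):
    """Entropy of a behavior from its counts per time interval, Eq. (16)."""
    total = sum(counts)
    entropy = 0.0
    for c in counts:
        if c > 0:  # intervals with no events contribute nothing
            p = c / total  # Eq. (15)
            entropy -= p * math.log10(p)
    return entropy


# Hypothetical dining counts over four time intervals for two students.
regular = [40, 0, 0, 0]      # always eats in the same interval
irregular = [10, 10, 10, 10]  # spread evenly over all intervals
print(behavior_entropy(regular))
print(behavior_entropy(irregular))
```

A student who always dines in the same interval has entropy 0, while a student whose dining is spread uniformly over the n intervals reaches the maximum value lg n.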

For students in colleges and universities nowadays, the circle of friends reflects quite a lot of information, and the economic level of a person may be related to the average economic level of his/her circle of friends.

First, the concept of intimacy is introduced, which indicates the closeness of the relationship between two people. The intimacy R_A(B) of any two students is calculated and a threshold H is set; a classmate B whose intimacy with A exceeds H, i.e., R_A(B) > H, is considered a friend of A. In this way, the circle of friends is constructed. Intimacy can be calculated from the number of times two people appear at the same place at the same time within a certain time period, with different weights for different card-swiping scenarios. The intimacy between student A and student B in time period T is calculated by the formula: (17) $R_{A}(B) = \sum_{i \in L} \lfloor \frac{R_{A}^{i}(B)}{C_{A}(i)} \frac{|S|}{S_{A}(i)} \rfloor$

Where L denotes all swipe locations, C_A(i) denotes the total number of swipes made by student A at location i within time period T, $R_{A}^{i}(B)$ denotes the number of times student A co-occurs with student B at location i within time period T, |S| denotes the total number of students, and S_A(i) denotes the total number of people co-occurring with student A at location i. Intimacy is directed: the fact that A has a high intimacy for B does not necessarily mean that B has a high intimacy for A, i.e., in Eq. (17) R_A(B) ≠ R_B(A) in general. With the intimacy R_A(B) computed for any two students A and B and the threshold H set, every student B with R_A(B) > H is a friend of A, which gives the friend circle of student A. Next, the economic level F_A of the circle of friends is defined from the number of students in the circle who have received a grant and the total number of friends of the student: (18) $F_{A} = \frac{P_{A}^{2}}{N_{A}}$

where N_A represents the total number of friends of student A and P_A represents the number of friends of A who are from poor families.
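The two friend-circle quantities can be sketched as follows. The sketch follows Eq. (17) as printed and therefore treats the brackets as a floor, which is an assumption (a `use_floor` switch is provided); the dictionary layout, names, and numbers are hypothetical, and returning 0 for a student with no friends is also an assumption.

```python
import math


def intimacy(co_counts, swipes, co_people, n_students, use_floor=True):
    """R_A(B) following Eq. (17), summed over swipe locations.

    co_counts[i] : times A and B co-occur at location i  (R_A^i(B))
    swipes[i]    : total swipes by A at location i        (C_A(i))
    co_people[i] : people co-occurring with A at i        (S_A(i))
    n_students   : total number of students               (|S|)
    """
    total = 0.0
    for loc, r in co_counts.items():
        term = r / swipes[loc] * n_students / co_people[loc]
        total += math.floor(term) if use_floor else term
    return total


def friend_circle_level(intimacies, aided, threshold):
    """F_A = P_A^2 / N_A over friends whose intimacy exceeds `threshold`."""
    friends = [b for b, r in intimacies.items() if r > threshold]
    if not friends:
        return 0.0  # no friends: level defined as 0 here (an assumption)
    p_a = sum(1 for b in friends if b in aided)
    return p_a ** 2 / len(friends)


# Hypothetical swipe statistics for student A with respect to student B.
r_ab = intimacy(
    co_counts={"cafeteria": 30, "library": 5},
    swipes={"cafeteria": 60, "library": 20},
    co_people={"cafeteria": 200, "library": 50},
    n_students=1000,
)
print(r_ab)
print(friend_circle_level({"B": r_ab, "C": 1.0, "D": 9.0}, aided={"B"}, threshold=5))
```

Dividing by S_A(i) down-weights crowded locations, so that co-occurring in a place where A meets almost everyone contributes less than co-occurring in a place where A meets few people.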

3.2.2

Process of categorizing poor students

Based on the feature extraction results of poor students from campus one-card data, this paper introduces the MD-KNN algorithm to establish a classification model for poor students, and its training and classification process is as follows:

1)

Training phase:

Input a training set T = (X₁,X₂,⋯,X_N) of multidimensional vectors, where N denotes the number of classifications and each X_i in the training set must belong to a certain classification.

The covariance matrices S = (S₁,S₂,⋯,S_N) of the classifications are calculated according to the formula for MD, where S_i corresponds to the ith classification.

Output the covariance matrix for each classification.

2)

Classification stage:

Input multidimensional vector Y.

Calculate the Mahalanobis distance between each classification and Y based on the covariance matrix obtained in the training phase.

Find the smallest distance closest to Y.

The output determines the classification with the minimum Mahalanobis distance.

With this algorithm, the classifier does not need the parameter K to be chosen in advance in order to classify samples against the scattered set of training samples.
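The training and classification stages above can be sketched as follows. This is one possible reading rather than the paper's implementation: it assumes that the distance "between each classification and Y" is the Mahalanobis distance from Y to the class mean under that class's covariance matrix, and all names and sample values are hypothetical.

```python
import math


def mean_and_cov(rows):
    """Sample mean vector and covariance matrix of one class."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mu, cov


def solve(a, b):
    """Solve a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def train(samples, labels):
    """Training phase: one (mean, covariance) pair per classification."""
    model = {}
    for c in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == c]
        model[c] = mean_and_cov(rows)
    return model


def classify(model, y):
    """Classification phase: class with the smallest Mahalanobis distance."""
    def dist(mu, cov):
        diff = [yi - mi for yi, mi in zip(y, mu)]
        z = solve(cov, diff)
        return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))
    return min(model, key=lambda c: dist(*model[c]))


# Hypothetical features: (average meal cost in yuan, behavior entropy score).
samples = [(10, 1), (12, 2), (11, 3), (13, 1.5),
           (30, 5), (32, 7), (31, 4), (34, 6)]
labels = ["poor"] * 4 + ["not_poor"] * 4
model = train(samples, labels)
print(classify(model, (11, 2)))
print(classify(model, (33, 6)))
```

Because each class is summarized by its mean and covariance, classification compares Y with N class summaries instead of searching for K neighbours, which is why no value of K has to be fixed.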

4

Validation analysis of precise clustering of poor students

In recent years, the evaluation system for the effectiveness of student financial aid in colleges and universities has been steadily improved. With the six-in-one student aid policy system of “awards, grants, loans, work-study, exemptions and subsidies” as the core, the effectiveness evaluation of “guarantee”-type student financial aid work has been realized through the in-depth promotion and extensive application of “precise funding”. Financial aid and human development work are combined with the development needs of students, and material and spiritual aid are combined to meet students’ basic needs and provide opportunities for their long-term development, creating a virtuous cycle of “Relief - Human Development - Achievement - Giving Back”.

4.1

Analysis of association rules for poor students

4.1.1

Frequent itemset mining

The Apriori association rule mining algorithm introduced in the previous section is applied to the campus card data to mine the frequent patterns of poor students. The thresholds of minimum support and minimum confidence are set to 30% and 80%, respectively, the consequents are precisely matched to “general poverty”, “poverty” and “special poverty”, and the frequent itemsets are generated on this basis. When the Apriori algorithm finishes, 1584 frequent itemsets are obtained, some of which are shown in Table 1. The frequent itemsets produced by the Apriori algorithm allow an in-depth analysis of the specific circumstances of students receiving financial support in different cases, and the support and confidence counts make clear how many students fall under each category of frequent itemset. This provides a reference for developing the specific content of college student financial aid policy and for better planning of financial aid applications, which helps college student financial aid policy to be implemented effectively, ensures that poor students can complete their studies without worry, and provides reliable economic support for the cultivation of talent in colleges and universities.

Table 1.

Partial frequent sets

No.	Frequent set	Support count	Confidence count
1	No scholarship, not poor, female, 1000128	186	217
2	Non-poverty, female, 1000128	224	308
3	Non-poor students, 1000128	231	284
4	Female, 1000136	269	262
5	No scholarship, 1000124	307	135
6	Male, 1000037	86	96
7	1000064, 11000069	74	88
8	Male, 1000082, 1000084	81	101
9	Male, 22, 1000071	83	79

4.1.2

Analysis of association rules

Running the Apriori association rule code on the frequent itemsets generated a total of 217 strong association rules. Counted by rule length (the number of associated attributes), these include 5 rules of length 3, 35 rules of length 4 and 177 rules of length 5. The support ranges from 12% to 25% and the confidence from 80% to 100%, which indicates that, in terms of statistical significance and probability, the extracted attributes are highly correlated with the attribute of student poverty. The consequent is precisely matched to “special poverty” to obtain the corresponding association rules, which are sorted by decreasing support, and the first five rules are organized as shown in Table 2.

Table 2.

Highly trusted approximate precision association rules: special poverty

No.	Approximate precision rule	Support	Confidence
1	Per capita net income < 2200 yuan =< Special poverty	25%	100%
2	Whether to loan = Yes & per capita net income < 2200 yuan =< Special poverty	25%	100%
3	Whether out of poverty = out of poverty & per capita net income < 2200 yuan =< Special poverty	25%	100%
4	Whether to loan = Yes & Whether out of poverty = out of poverty & per capita net income < 2200 yuan =< Special poverty	25%	100%
5	Physical health = health & per capita net income < 2200 yuan =< Special poverty	25%	100%
……

Among the first five association rules obtained when the consequent is matched exactly to “special poverty”, the strongly associated attributes are “per capita net income < 2200 yuan”, “whether the loan = yes”, “whether the student is out of poverty = no poverty alleviation”, and “physical health = health”, with support and confidence of 25% and 100%, respectively. This matches the everyday practice of judging a family’s financial hardship mainly by its income and by whether it has applied for a student loan.
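To make the two measures concrete, the following sketch computes the support and confidence of a rule and keeps only the rules that reach both thresholds. The helper names `rule_metrics` and `strong_rules`, the item encoding, and the eight records are hypothetical; the toy data are constructed so that the first rule reproduces the 25% / 100% pattern, and they are not the study's data.

```python
def rule_metrics(records, antecedent, consequent):
    """Support and confidence of the rule antecedent => consequent.

    support    = share of all records containing antecedent and consequent
    confidence = share of antecedent records that also contain the consequent
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

def strong_rules(records, antecedents, consequent, min_support, min_confidence):
    """Keep the candidate antecedents whose rule meets both thresholds,
    sorted by decreasing support."""
    kept = []
    for antecedent in antecedents:
        support, confidence = rule_metrics(records, antecedent, consequent)
        if support >= min_support and confidence >= min_confidence:
            kept.append((sorted(antecedent), support, confidence))
    return sorted(kept, key=lambda rule: -rule[1])

# Hypothetical student records (attribute=value items).
records = [
    frozenset({"income<2200", "loan=yes", "level=special"}),
    frozenset({"income<2200", "loan=no", "level=special"}),
    frozenset({"income>=2200", "loan=yes", "level=poverty"}),
    frozenset({"income>=2200", "loan=yes", "level=poverty"}),
    frozenset({"income>=2200", "loan=no", "level=general"}),
    frozenset({"income>=2200", "loan=no", "level=general"}),
    frozenset({"income>=2200", "loan=yes", "level=general"}),
    frozenset({"income>=2200", "loan=no", "level=none"}),
]

income_rule = rule_metrics(records, {"income<2200"}, {"level=special"})
loan_rule = rule_metrics(records, {"loan=yes"}, {"level=special"})
selected = strong_rules(
    records,
    antecedents=[{"income<2200"}, {"loan=yes"}],
    consequent={"level=special"},
    min_support=0.2,
    min_confidence=0.8,
)
print(income_rule, loan_rule, selected)
```

Fixing the consequent before filtering, as `strong_rules` does, corresponds to matching the consequent exactly to one poverty level and then ranking the surviving rules by support.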

The association rules obtained so far all have “special poverty” as their consequent. To extract the main factors associated with “poverty” and “general poverty”, the consequents of the association rules are matched exactly to these two categories, and the minimum support and minimum confidence parameters are reset to extract the corresponding strong association rules. With the minimum support adjusted to 15% and the minimum confidence to 70%, 20 association rules that meet the requirements are extracted. The rules obtained for “poverty” and “general poverty” were sorted by decreasing support, and the first five for each are shown in Table 3.

Table 3.

High-confidence association rules (consequents: poverty and general poverty)

Consequent: “poverty”
No.	Rule	Support	Confidence
1	Property income < 328.7 yuan & per capita net income > 3200 yuan & Annual income > 18000 yuan ⇒ poverty	15%	76%
2	Family number > 10 & Annual income > 18000 yuan ⇒ poverty	14%	82%
3	Family number > 10 & per capita net income > 3200 yuan & Annual income > 18000 yuan ⇒ poverty	14%	85%
4	Whether to loan = Yes & Family number > 10 & Annual income > 18000 yuan ⇒ poverty	14%	82%
5	Family number > 10 & whether out of poverty = out of poverty & Annual income > 18000 yuan ⇒ poverty	13%	82%
……
Consequent: “general poverty”
No.	Rule	Support	Confidence
1	Family woodland area < 1 mam & Wage income < 11000 yuan & per capita net income < 2200 yuan ⇒ general poverty	9%	60%
2	Wage income < 11000 yuan & per capita net income < 2200 yuan & Annual income = 12000~18000 yuan ⇒ general poverty	8%	60%
3	Wage income < 11000 yuan & Productive expenditure = 1200~1900 yuan & per capita net income < 2200 yuan ⇒ general poverty	6%	63%
4	Wage income < 11000 yuan & Transfer income = 0 yuan & per capita net income < 2200 yuan ⇒ general poverty	6%	60%
5	Production operating income = 3800~7000 yuan & Wage income < 11000 yuan & per capita net income < 2200 yuan ⇒ general poverty	6%	60%
……

When the consequent is matched exactly to “poverty”, the strongly associated attributes are “property income”, “per capita net income”, “annual household income”, “family size”, “whether to take out loans” and “whether the place of origin is out of poverty”, with support between 13% and 15% and confidence between 76% and 85%. For the consequent “general poverty”, the strongly associated attributes are “family forest land area”, “wage income”, “per capita net income”, “family annual income”, “productive expenditure”, “transfer income” and “production and operation income”, with support between 6% and 9% and confidence between 60% and 63%. In summary, the association rule algorithm can mine the data on poor students thoroughly and provides a reliable data basis for the precise economic support of the college student financial aid policy.

4.2

Clustering of financial aid policies and learning status

4.2.1

Clustering of Students’ Learning Status

For students’ learning status, the FCM cluster analysis considers four variables: learning behavior, learning interest, learning pressure, and learning gain. Before clustering, the dimensions of learning status were standardized by converting the raw scores of the four dimensions (factor scores calculated according to the weights after percentile forward assignment) into standard scores (Z-scores). The standardized variables were then clustered with the FCM algorithm. The results are shown in Table 4, and Figure 2 shows the final cluster centers of the different types of funded college students.

Table 4.

Types of college students’ learning status

Variable	Cluster 1	Cluster 2	Cluster 3
Learning behavior	-0.173	1.138	-0.924
Learning interest	-0.328	1.316	-0.811
Learning pressure	-0.062	0.134	0.015
Learning gain	0.083	0.985	-1.247
Number (%)	4264 (52.42%)	2028 (24.93%)	1842 (22.65%)
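The two preprocessing and clustering steps behind Table 4 can be illustrated with a small standard-library sketch: column-wise Z-score standardization followed by the textbook fuzzy c-means updates. The function names, the fuzzifier `m = 2`, the random initialization, and the six hypothetical score rows (behavior, interest, pressure, gain) are assumptions made for illustration; the paper does not state its FCM settings in this section, and only two clusters are used here to keep the toy example small.

```python
import math
import random

def zscore_columns(rows):
    """Standardize each column to mean 0 and population standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [
        math.sqrt(sum((v - mu) ** 2 for v in c) / len(c))
        for c, mu in zip(columns, means)
    ]
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)] for row in rows]

def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means. Returns (centers, memberships), where
    memberships[i][j] is the degree to which point i belongs to cluster j."""
    rng = random.Random(seed)
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in points:
        weights = [rng.random() for _ in range(n_clusters)]
        total = sum(weights)
        u.append([w / total for w in weights])
    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
            )
        # Membership update from the relative distances to all centers.
        new_u = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            new_u.append(
                [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
            )
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u

# Hypothetical raw scores: behavior, interest, pressure, gain.
raw_scores = [
    [85, 90, 62, 88], [80, 86, 60, 84], [88, 92, 64, 90],  # engaged students
    [50, 45, 56, 40], [55, 50, 58, 46], [48, 42, 55, 42],  # disengaged students
]
standardized = zscore_columns(raw_scores)
centers, memberships = fuzzy_c_means(standardized, n_clusters=2)
labels = [max(range(2), key=row.__getitem__) for row in memberships]
print(labels)
```

Because the inputs are Z-scores, a positive center coordinate means the cluster lies above the sample average on that dimension and a negative one below it, which is how the cluster centers in Table 4 are read.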

Cluster 1 is the stray type. Nationally funded college students of this type have negative values for learning behavior, learning interest and learning pressure but a positive value for learning gain. All four values are close to 0, which indicates that these students are at the average level overall. Compared with the other two types, their learning behavior, learning interest and learning gain lie in the middle. These students therefore invest an average amount of time and energy in their studies, have relatively low interest in learning, and achieve average learning gains. This type corresponds to what is usually called average performance in all aspects of study, and it accounts for 52.42% of the students.

Cluster 2 is the active type. Nationally funded college students of this type have positive values in all dimensions of learning status, that is, their performance in every dimension is above average, and every dimension is the highest among the three types. These students show more learning behaviors, a higher interest in learning, and greater learning gains, such as improved professional knowledge and skills. This type corresponds to students with excellent performance who learn actively, and about 24.93% of the students belong to it.

Cluster 3 is the lazy type. Nationally funded college students of this type have negative values for learning behavior, learning interest and learning gain, i.e., clearly below average and the lowest among the three types. These students invest little time and energy in their studies, and their interest in studying is extremely low. Their learning pressure value is slightly above 0 and lies in the middle compared with the other two types, which indicates that they experience a certain amount of study pressure. This type corresponds to what is usually called poor performance in all aspects of learning, and about 22.65% of the students belong to it.

4.2.2

Satisfaction with student financial assistance policies

The financial assistance policies for college students fall into four main categories, namely national scholarships, national inspirational scholarships, national grants and national student loans, denoted Y1~Y4. Applying different financial aid policies to different types of students can strengthen their motivation to study and help them complete their studies. This study examines the groups funded under each policy and compares the satisfaction of the different student types with the funding policy they receive. The results are shown in Figure 3, where *** indicates a significant difference in satisfaction with the financial aid policy among the student types (P<0.05).

One-way ANOVA shows that, for college students funded by national scholarships and national inspirational scholarships, satisfaction with the financial aid policy does not differ significantly among the stray, active and lazy types (Sig=0.297/0.238>0.05). For college students funded by national grants and national student loans, satisfaction differs significantly among the three types (Sig=0.002/0.001<0.05). In these two groups, satisfaction with the financial aid policy ranks from high to low as active type > stray type > lazy type. That is, among students funded by national grants and national student loans, the active type is significantly more satisfied with the financial aid policy than the stray type and the lazy type.
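For readers unfamiliar with the test, the F statistic behind a one-way ANOVA can be computed directly from the group scores, as in the sketch below. The function name and the three sets of satisfaction scores are hypothetical and are not the study's data. The sketch stops at the F statistic and its degrees of freedom; the significance value (Sig) would then be read from the F distribution, for example with a statistics package such as SciPy's `scipy.stats.f_oneway`.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples.

    F is the ratio of the between-group mean square to the within-group
    mean square; a large F suggests that the group means differ.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical satisfaction scores for one funding policy, by student type.
lazy_type = [1, 2, 3]
stray_type = [2, 3, 4]
active_type = [5, 6, 7]

f_stat, df_between, df_within = one_way_anova_f([lazy_type, stray_type, active_type])
print(round(f_stat, 3), df_between, df_within)
```

With three student types the between-group degrees of freedom are always 2, so the within-group degrees of freedom, and hence the sample size of each funded group, determine how large F must be to reach P<0.05.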

4.3

Precise classification and identification of poor students

4.3.1

Student data selection

Based on the Campus One Card consumption data collected in the previous section, the average daily meal consumption of each student is computed, and the results are shown in Figure 4. The horizontal axis is the average daily consumption amount, and the vertical axis is the number of students at that amount. The distribution is roughly bell-shaped: most average daily consumption falls in the range of 20 to 27 yuan, where the corresponding number of students exceeds 240, while the number of students in other ranges is relatively small. In addition, the subject count table and the average meal cost table in the consumption data are joined and processed; in these tables, the first 10 columns hold statistics for different types of consumption. Further analysis reveals some conflicting records, such as students whose average daily meal consumption exceeds 30 yuan but who are assessed as poor, or students who consume very little but are not selected. Under a normal distribution, such records would have a significant impact on the results, so they were removed from the study.
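The record-removal step can be sketched as a simple filter. The field names `is_poor` and `avg_daily_meal`, the function name, and the sample records are hypothetical. The upper threshold of 30 yuan follows the text, whereas the lower threshold of 10 yuan is purely an assumption, since the text only says “very little”.

```python
def drop_conflicting_records(records, high=30.0, low=10.0):
    """Remove records whose consumption contradicts their poverty label.

    high: above this average daily meal spend (yuan), a 'poor' label is
          treated as conflicting (30 yuan, as in the text).
    low:  below this spend, a 'not selected' label is treated as
          conflicting (assumed value; not given in the text).
    """
    kept = []
    for record in records:
        poor_but_high = record["is_poor"] and record["avg_daily_meal"] > high
        frugal_but_unselected = (not record["is_poor"]) and record["avg_daily_meal"] < low
        if not (poor_but_high or frugal_but_unselected):
            kept.append(record)
    return kept

# Hypothetical per-student summaries derived from campus card data.
students = [
    {"id": 1, "is_poor": True, "avg_daily_meal": 18.5},
    {"id": 2, "is_poor": True, "avg_daily_meal": 34.0},   # poor label, high spend
    {"id": 3, "is_poor": False, "avg_daily_meal": 7.5},   # low spend, not selected
    {"id": 4, "is_poor": False, "avg_daily_meal": 24.0},
]
cleaned = drop_conflicting_records(students)
kept_ids = [record["id"] for record in cleaned]
print(kept_ids)
```

Keeping both thresholds as parameters makes it easy to check how sensitive the downstream classification is to where the cut-offs are placed.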

4.3.2

MD-KNN Experimental Analysis

The MD-KNN algorithm is applied to the collected experimental sample data. More than twenty labels are set, including economy, consumption, study, place of origin, whether there is a certificate of poverty from the place of origin, and whether the student has a disability, and the analysis is then run iteratively. After a preliminary list of poor students is obtained, screening conditions such as the number of students to be selected and the level of scholarship are applied to the results, which yields the list of poor students recommended by MD-KNN screening. This list is compared with the list of students recommended by the actual manual review. The matching rate of the two lists is about 55%, which is not high, and there are two possible reasons. First, the MD-KNN algorithm may need to be improved to better suit the screening of poor students in colleges and universities. Second, the manually screened list carries a great deal of uncertainty: teachers and students often base their recommendations on the application form and their everyday impressions (and may not even know the applicant), so manual screening also has loopholes. It is therefore necessary to analyze the consumption of the students in the two lists. Figure 5 shows the comparison of average daily consumption.
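The comparison of the two lists can be expressed as a short sketch. The function name, the student IDs, and the consumption figures are hypothetical. The text does not say how the matching rate is normalized, so the sketch assumes it is the share of the manual list that the algorithm also selects; when both lists have the same length, the choice of denominator makes no difference.

```python
from statistics import mean

def matching_rate(algorithm_list, manual_list):
    """Share of manually recommended students that the algorithm also selected
    (assumed definition; the text does not specify the denominator)."""
    algorithm_set, manual_set = set(algorithm_list), set(manual_list)
    return len(algorithm_set & manual_set) / len(manual_set)

# Hypothetical student IDs on each list.
algorithm_ids = [1, 2, 3, 4]
manual_ids = [3, 4, 5, 6]

# Hypothetical average daily consumption (yuan) per student ID.
avg_daily_consumption = {1: 15.0, 2: 17.0, 3: 19.0, 4: 21.0, 5: 25.0, 6: 27.0}

rate = matching_rate(algorithm_ids, manual_ids)
algorithm_mean = mean(avg_daily_consumption[i] for i in algorithm_ids)
manual_mean = mean(avg_daily_consumption[i] for i in manual_ids)
print(rate, algorithm_mean, manual_mean)
```

A low matching rate alone does not say which list is better, which is why the mean consumption of each list is computed alongside it.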

The comparison shows that the consumption level of the students screened by the model in this paper is significantly lower than that of the poor students on the list provided by the school, which was produced by the staff involved in the actual evaluation. This indicates that the MD-KNN-based model designed in this paper for accurately identifying poor students is effective for screening. Although some economically disadvantaged students have higher consumption because of their physical condition or illness, most economically disadvantaged students should have lower consumption than the rest of the student population. The MD-KNN algorithm is therefore an effective way to screen students in difficulty and deserves further analysis and research.

5

Conclusion

This paper uses association rules, cluster analysis and the MD-KNN algorithm to study college student financial aid policies, student categories, and student consumption data, with the aim of making economic support for funded college students more precise.

1) The Apriori association rule algorithm yields strong association rules for “special poverty”, “poverty” and “general poverty”. The attributes involved mainly include “per capita net income”, “whether to take out loans”, “whether the student is out of poverty”, “property income”, “annual household income”, “productive expenditure” and “production and operation income”, and the support and confidence of the rules remain high. Each of these attributes is therefore important information for identifying poor students and can serve as a reference when determining the subsidized group.

2) Based on the FCM cluster analysis algorithm, the funded groups are divided into the stray, active and lazy types, which account for 52.42%, 24.93% and 22.65% of the research sample, respectively. For the groups funded by national grants and national student loans, satisfaction with the funding policy differs significantly among the stray, active and lazy types (Sig=0.002/0.001<0.05).

3) The MD-KNN algorithm can be used to classify students’ average daily consumption data more finely. The average daily consumption of poor students is mainly concentrated in the range of 20 to 27 yuan. Some students have a higher average daily consumption, which may be caused by physical reasons.

In summary, data mining techniques can help colleges and universities accurately identify poor students and choose financial aid policies according to students’ specific conditions, thereby providing more precise economic support that meets students’ needs in completing their studies.

Fund Project:

Key Scientific Research Projects of Guangxi Science and Technology Normal University’s 2023 Scientific Research Fund Project (B Block), “‘Three-Full Person’ Concept Under the College Funding Person Work Study” (Project Number: GXKS2023ZDB016).


Ying Deng

Mingxia Li

Xiaolong Li

Published online: 21 March 2025

Received: 15 Nov. 2024

Accepted: 14 Feb. 2025

DOI: https://doi.org/10.2478/amns-2025-0655

Keywords: FCM clustering, MD-KNN, Association rules, Financial aid policy

© 2025 Ying Deng, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
