Data mining and neural network modeling for teaching and learning in vocational education: promoting innovation in academic management and teaching reforms
Published Online: Mar 19, 2025
Received: Nov 22, 2024
Accepted: Feb 20, 2025
DOI: https://doi.org/10.2478/amns-2025-0444
© 2025 Wangkai Xu et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Since its inception, data mining technology has been widely applied in fields such as finance, telecommunications, retail, health care, biomedicine, and science and engineering, each of which has developed mature mining systems that have contributed greatly to the field. Its adoption, however, is far from universal, and applications in colleges and universities remain relatively rare. As the technology matures and its application domains continue to expand, many higher-education researchers have begun to study how data mining can support university teaching and management [1–5]. Higher vocational education is an important component of both higher education and vocational education. It has developed rapidly in recent years, yet problems remain. Its development depends not on scale or capital investment but on the quality of education. To improve that quality, a higher vocational college should first fully understand its object of cultivation, namely its students, and then combine this understanding with its own characteristics and the demands of society to formulate corresponding educational measures and strategies, forming a management system with its own characteristics [6–10].
Higher vocational education is employment-oriented: schools should cultivate the talents that society needs. Coursework assessment in vocational education should consider every element that may affect students' coursework outcomes, but these elements are numerous and influence one another to varying degrees, so assessment results are usually difficult to express with an exact and appropriate mathematical formula, and traditional classification methods struggle to handle the problem accurately. Neural networks, with their nonlinear mapping, self-learning, self-organizing, and self-adaptive abilities, perform well on problems whose internal mechanisms are complex and hard to model, offering new approaches to nonlinear classification, pattern recognition, signal processing, and similar tasks [11–15]. Grasping the twin centers of vocational education and employment, applying data mining and neural networks to higher vocational education can therefore uncover a large amount of valuable information, yielding a data mining and neural network model for analyzing student characteristics and success factors, predicting student employment, and analyzing coursework assessment. Applying this information to academic management in colleges and universities can promote further reform, improvement, and development of the education system, provide an important basis for management decision-making, and thus support the sustained and healthy development of higher vocational education [16–20].
In this paper, an early warning model (TabNet) for students' academic management and teaching is constructed based on data mining techniques. Data cleaning, transformation, and reduction methods are used to preprocess students' course performance data and construct the dataset. On this basis, an academic early warning system for vocational education teaching in colleges and universities is designed and implemented; the system warns of problems in students' academic performance while providing data and decision-making support for the college's teaching management. Finally, a construction method and implementation process for an intelligent classroom for vocational education teaching are presented.
Data mining [21] is a relatively complex systems-engineering task. In the context of vocational education and teaching, this paper divides the data mining process for academic management and teaching reform into five phases: clarifying the problem, data collection, data preprocessing, model building, and model interpretation and evaluation. The overall process is shown in Figure 1.

Data mining flow chart
In data mining, most collected data is incomplete and disorganized and cannot be used directly for mining analysis. Preprocessing is therefore required to improve the quality of the analysis. Preprocessing methods have matured over many years of development; the following sections introduce data cleaning, data transformation, and data generalization.
Data cleaning is the procedure of detecting and correcting identifiable errors in a dataset, including checking data consistency and handling invalid and missing values. This paper focuses on cleaning incomplete data and noisy data.

Incomplete data. Using incomplete data for mining can distort the results of data mining algorithms and may even cause them to fail. For missing values in the original dataset, it is necessary to determine their range, calculate the proportion of missing values in each field, and choose a cleaning method according to that proportion and the importance of the field.

Noisy data. Noisy data refers to records containing outliers or erroneous values, which affect mining results and must be processed before analysis. Three common approaches are: binning, which divides the data into subintervals by attribute value and smooths the data within each subinterval; clustering, which finds and removes isolated points far from any cluster center; and regression, which fits a curve between two attributes, uses one attribute to predict the other, and removes data that deviate significantly from the predicted values.
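The binning approach described above can be sketched in a few lines. The following helper (my own illustration, not the paper's code) smooths noisy scores by equal-frequency binning, replacing each value with the mean of its bin:

```python
import numpy as np

def smooth_by_bin_means(values, n_bins=3):
    """Equal-frequency binning: sort the values, split them into n_bins
    roughly equal groups, and replace each value by its bin's mean."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    bins = np.array_split(values[order], n_bins)
    smoothed_sorted = np.concatenate([np.full(len(b), b.mean()) for b in bins])
    # restore the original record order
    smoothed = np.empty_like(smoothed_sorted)
    smoothed[order] = smoothed_sorted
    return smoothed

print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], n_bins=3))
# -> [ 9.  9.  9. 22. 22. 22. 29. 29. 29.]
```

Values inside each subinterval are pulled toward the local mean, damping isolated noise spikes without discarding records.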
Data transformation converts data into a form suitable for analysis, or changes the structure or format of the original data. For example, before mining the student achievement data in a university's academic management system, the raw scores can be converted into indicators such as grade point average, failure rate, and failed credits, making the data more meaningful. Two common transformation methods are introduced below.

Min-max normalization. Let max and min be the maximum and minimum values of a field in the dataset. Min-max normalization maps a value $x$ of that field linearly onto a new interval $[\mathrm{new\_min}, \mathrm{new\_max}]$:

$$x' = \frac{x - \min}{\max - \min}\,(\mathrm{new\_max} - \mathrm{new\_min}) + \mathrm{new\_min}$$

Decimal scaling normalization. Decimal scaling normalization, also called base transformation, normalizes a field by moving the decimal point. The transformation is

$$x' = \frac{x}{10^{j}}$$

where $j$ is the smallest integer such that $\max(|x'|) < 1$.
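The two normalization methods can be sketched as follows; the function names and the default target interval $[0, 1]$ are illustrative choices, not taken from the paper:

```python
import numpy as np

def min_max_normalize(x, new_min=0.0, new_max=1.0):
    """Map a field's values linearly from [min, max] to [new_min, new_max]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return new_min + (x - lo) * (new_max - new_min) / (hi - lo)

def decimal_scaling_normalize(x):
    """Shift the decimal point: x' = x / 10**j, with the smallest j
    such that max|x'| < 1."""
    x = np.asarray(x, dtype=float)
    j = int(np.ceil(np.log10(np.abs(x).max())))
    return x / 10**j

scores = [45, 60, 75, 90]
print(min_max_normalize(scores))          # -> [0.0, 0.333..., 0.666..., 1.0]
print(decimal_scaling_normalize(scores))  # -> [0.45, 0.6, 0.75, 0.9]
```

Min-max normalization preserves the relative spacing of values, while decimal scaling only rescales their magnitude.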
Data generalization (reduction) simplifies the dataset as much as possible while preserving its information content, improving the efficiency of data mining. For large datasets, reduction should be performed before mining; two main methods are introduced below.

Dimensionality reduction. This takes two main forms: the first deletes, directly from the original dataset, fields that are duplicated or weakly related to the mining target; the second recombines the original fields into a small set of mutually uncorrelated composite variables while retaining as much of the information contained in the original fields as possible.

Numerosity reduction. This reduces the data volume by adopting smaller data units or by replacing the original data with a model, and falls into two categories. Parametric methods fit a model to the original dataset and store only the model parameters rather than the data itself. Non-parametric methods store actual data: the original dataset is divided into clusters such that data within a cluster are similar and data in different clusters differ, and the clusters are then used in place of the individual records.
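The second form of dimensionality reduction, recombining the original fields into a few uncorrelated composite variables, is classically realized with principal component analysis. A minimal numpy sketch (an illustration, not the paper's implementation):

```python
import numpy as np

def pca_reduce(X, k):
    """Project the fields of X onto the top-k principal components:
    k mutually uncorrelated composite variables that retain as much
    of the original variance as possible."""
    Xc = X - X.mean(axis=0)                  # center each field
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # scores of the top-k components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))   # 100 records, 6 original fields
Z = pca_reduce(X, k=2)
print(Z.shape)                  # -> (100, 2)
```

The resulting columns of `Z` are uncorrelated by construction, so the six original fields are compressed into two composite variables.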
Because different courses have different grading standards and examination content, the distribution of achievement data differs across majors, and abnormal distributions appear: some courses' grades are uniformly high or uniformly low rather than approximately normal. To examine intuitively whether the grade distributions of different courses in the academic data are normal, this experiment divides scores into five intervals: (80, 100], (60, 80], (40, 60], (20, 40], and [0, 20]. To verify the approach, several courses were randomly selected as examples: students of four majors A, B, C, and D, with their respective course scores C1, C2, C3, and C4 as the objects of division. The resulting distributions are shown in Figure 2, whose four panels are histograms of the scores in each class after division by interval. The C1 scores of major A are mainly distributed between 80 and 90 points, the C2 scores of major B between 70 and 85, and the C3 scores of major C between 70 and 90, while the C4 grades of students in major D are spread across every interval and do not follow a normal distribution. Grade distributions thus differ significantly across the courses of different majors. This large difference arises not only from course characteristics and the difficulty of the teacher's marking, but also from missing and erroneous values in the raw data, which distort the overall distribution of students' grades; the student data therefore requires further processing.

Grade distributions of the four majors and their corresponding courses
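The interval division used in the experiment can be reproduced with a short helper. The implementation below (illustrative only) counts scores into the five right-closed intervals:

```python
import numpy as np

def score_distribution(scores):
    """Count scores into the intervals [0,20], (20,40], (40,60],
    (60,80], (80,100] used in the experiment (right-closed)."""
    edges = np.array([20, 40, 60, 80])
    labels = ["[0,20]", "(20,40]", "(40,60]", "(60,80]", "(80,100]"]
    # side="left" makes boundary scores (20, 40, ...) fall in the lower interval
    idx = np.searchsorted(edges, scores, side="left")
    counts = np.bincount(idx, minlength=5)
    return dict(zip(labels, counts.tolist()))

print(score_distribution([12, 55, 61, 78, 83, 91, 70]))
# -> {'[0,20]': 1, '(20,40]': 0, '(40,60]': 1, '(60,80]': 3, '(80,100]': 2}
```

Plotting these counts per course produces histograms like the panels of Figure 2.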
Since a college has multiple majors and different majors have different course systems, the academic early warning model constructed in this experiment must apply to students of all relevant majors in the college. Moreover, students take different courses, so scores cannot be compared directly across courses, and treating each individual course as a feature would inflate the feature space, making the model overly complicated and poorly generalizable. Therefore, to represent each student's academic performance objectively, an overall per-semester assessment was used: statistical indicators such as GPA, credits taken, credits failed, number of failed courses, and average score in each semester serve as the input features for analysis. After processing, four students were randomly selected and their input data for the first three semesters listed; Table 1 shows the data input format.
Data input format for the first three semesters
| First semester | | | | |
|---|---|---|---|---|
| Student number (ID) | 20240528 | 20240157 | 20240346 | 2024004 |
| First-semester GPA | 3.72 | 2.56 | 2.88 | 3.83 |
| First-semester credits taken | 27.5 | 32.0 | 28.5 | 27.0 |
| First-semester credits failed | 0 | 4.5 | 0 | 5.0 |
| First-semester failed courses | 0 | 2 | 0 | 3 |
| First-semester average score | 88.53 | 76.56 | 72.38 | 88.86 |
| Second semester | | | | |
| Student number (ID) | 20240077 | 20240172 | 20240369 | 20240605 |
| Second-semester GPA | 2.90 | 2.77 | 3.52 | 3.96 |
| Second-semester credits taken | 27.0 | 30.5 | 26.5 | 28.0 |
| Second-semester credits failed | 4.5 | 0 | 0 | 5.5 |
| Second-semester failed courses | 1 | 0 | 0 | 2 |
| Second-semester average score | 88.63 | 76.66 | 72.48 | 88.96 |
| Third semester | | | | |
| Student number (ID) | 20240077 | 20240172 | 20240369 | 20240605 |
| Third-semester GPA | 2.90 | 2.76 | 3.51 | 3.95 |
| Third-semester credits taken | 27.0 | 29.5 | 25.5 | 28.0 |
| Third-semester credits failed | 0 | 4.5 | 5.5 | 0 |
| Third-semester failed courses | 0 | 1 | 2 | 0 |
| Third-semester average score | 88.53 | 76.56 | 72.38 | 88.89 |
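The per-semester features of Table 1 could be derived from raw course records roughly as follows. The record format and the 4.0-scale grade-point mapping below are assumptions made for illustration, not taken from the paper:

```python
def semester_features(records):
    """records: list of (score, credits) tuples for one student-semester.
    Returns the Table-1 style indicators for that semester."""
    def grade_point(score):
        # assumed linear 4.0-scale mapping: 60 -> 1.0, 90+ capped at 4.0
        return min((score - 50) / 10.0, 4.0) if score >= 60 else 0.0

    total_credits = sum(c for _, c in records)
    failed = [(s, c) for s, c in records if s < 60]
    gpa = sum(grade_point(s) * c for s, c in records) / total_credits
    avg = sum(s for s, _ in records) / len(records)
    return {
        "GPA": round(gpa, 2),
        "credits_taken": total_credits,
        "credits_failed": sum(c for _, c in failed),
        "failed_courses": len(failed),
        "average_score": round(avg, 2),
    }

print(semester_features([(88, 3.0), (72, 2.0), (55, 2.5)]))
```

Aggregating courses this way makes students with different course lists comparable, which is exactly why the paper uses semester-level statistics rather than per-course scores.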
Traditional academic warning suffers from lag: students' academic problems are often exposed only at the end of a semester or in the upper grades, by which point a warning comes too late and its effect is greatly reduced. To intervene earlier, this experiment moves the warning forward: based on students' grades in the first two semesters, it predicts whether a student will need an academic warning in the third semester. The focus of the study is to warn students in the early grades in time for them to make corrections. The student number was used as each student's unique ID; the final data entry format is shown in Table 2.
The final data input format for each student
| Serial number | Field name |
|---|---|
| 1 | Student number (ID) |
| 2 | Student number (ID) |
| 3 | First-semester GPA |
| 4 | First-semester credits taken |
| 5 | First-semester credits failed |
| 6 | First-semester failed courses |
| 7 | Second-semester GPA |
| 8 | Second-semester credits taken |
| 9 | Second-semester credits failed |
| 10 | Second-semester failed courses |
| 11 | Second-semester average score |
Raw data often contain null course scores for some students, caused by missed exams, credit recognition through comprehensive tests, scores not being entered into the system, and so on, so these null values must be handled. There are three mainstream imputation methods: filling with the mean score of the course, filling with the mode or median, and filling with zero. Because this paper uses per-semester statistical indicators, missing course values can simply be discarded without imputation; this operation barely affects each student's overall statistical indicators, which still essentially reflect the student's academic performance in each semester.
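The three imputation strategies, and the drop-the-nulls strategy actually used in this paper, can be sketched with the standard library:

```python
import statistics

scores = [78, None, 85, 62, None, 85, 91]       # None marks a null score
observed = [s for s in scores if s is not None]

mean_filled   = [s if s is not None else statistics.mean(observed)   for s in scores]
median_filled = [s if s is not None else statistics.median(observed) for s in scores]
mode_filled   = [s if s is not None else statistics.mode(observed)   for s in scores]
zero_filled   = [s if s is not None else 0                           for s in scores]
dropped       = observed   # strategy used here: discard the nulls outright

print(mode_filled)   # -> [78, 85, 85, 62, 85, 85, 91]
```

Dropping is safe here precisely because the downstream features are semester-level aggregates, so a few missing course scores barely move them.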
The Pearson correlation coefficient measures the degree of linear correlation between two variables, taking values between -1 and 1. For two variables $X$ and $Y$ with samples $x_i$ and $y_i$ ($i = 1, \dots, n$), it is defined as

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^{2}}}$$
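A direct numpy implementation of the coefficient, useful for screening feature pairs before modeling:

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation r = cov(X, Y) / (sigma_X * sigma_Y), in [-1, 1]."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# hypothetical feature columns for illustration
gpa    = [3.7, 2.6, 2.9, 3.8, 3.1]
failed = [0, 2, 1, 0, 1]
print(pearson(gpa, failed))   # strongly negative: more failures, lower GPA
```

Features that correlate too strongly with each other are candidates for removal during dimensionality reduction.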

Feature correlation scatter plot
The process by which a neural network [22] constructs its decision manifold is similar to the process by which a decision tree makes decisions: in the figure, each internal node compares one feature of the sample against a threshold and routes the sample down the corresponding branch, until a leaf, i.e., a decision region, is reached.

Tree model decision case diagram
The neural network, in turn, mimics this tree-model decision process using designed fully connected layers and the ReLU function; the decision flow is shown in Figure 5. As Figure 5 shows, the input to the network is a feature vector: mask layers first select individual features for each branch, fully connected layers multiply the selected features by weights and add biases to reproduce the threshold comparisons, ReLU zeroes out the regions that fail a comparison, and the branch outputs are summed and passed through Softmax to produce the decision-region output, just as a tree aggregates its leaves.

Neural network decision flow
The TabNet architecture is shown in Fig. 6. TabNet [23–24] is a stack of multiple decision steps, each consisting of a Feature transformer, an Attentive transformer, a Mask layer, a Split layer, and ReLU. For discrete input features, TabNet first maps them into continuous numerical features with trainable embeddings, and batch normalization is applied to the raw numerical features, so that every decision step receives the same $B \times D$ feature matrix, where $B$ is the batch size and $D$ the feature dimension.

TabNet schema
Feature transformer
The Feature transformer implements the feature computation at each decision step. Compared with a plain neural network that processes features through fully connected (FC) layers, the Feature transformer is somewhat more complex: it consists of FC layers, BN layers, and Gated Linear Unit (GLU) layers. The purpose of the GLU is to add a gate unit on top of an ordinary FC layer, computed as in Eq. (4):

$$\mathrm{GLU}(\mathbf{x}) = (\mathbf{x}W + \mathbf{b}) \otimes \sigma(\mathbf{x}V + \mathbf{c}) \tag{4}$$

where $\sigma$ is the sigmoid function and $\otimes$ denotes elementwise multiplication.
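A minimal numpy sketch of the GLU computation; the shapes and random initialization are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def glu(x, W, b, V, c):
    """Gated Linear Unit: an FC layer gated elementwise by a second,
    sigmoid-activated FC layer: GLU(x) = (xW + b) * sigma(xV + c)."""
    return (x @ W + b) * sigmoid(x @ V + c)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                      # batch of 4 samples, 8 features
W, V = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
b, c = np.zeros(16), np.zeros(16)
print(glu(x, W, b, V, c).shape)                  # -> (4, 16)
```

Because the gate lies in (0, 1), each output component is a damped copy of the linear branch, letting the layer learn which feature combinations to pass through.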
Split layer
The Split layer cuts the vector output by the Feature transformer into two parts, as in Eq. (5):

$$[\mathbf{d}[i], \mathbf{a}[i]] = \mathrm{Split}\big(\mathrm{FT}(\mathbf{M}[i] \odot \mathbf{f})\big) \tag{5}$$

In the above equation, $\mathbf{f}$ is the input feature vector, $\mathbf{M}[i]$ is the mask of decision step $i$, $\mathbf{d}[i]$ is passed through ReLU to contribute to the step's output, and $\mathbf{a}[i]$ is fed to the Attentive transformer of the next decision step.
Attentive transformer
The Attentive transformer computes the Mask-layer matrix of the current decision step from the output of the previous decision step, and keeps the Mask matrix sparse and non-repeating, so that different student samples receive different masks; its function is to let each sample select different features.
The Attentive transformer structure is shown in Fig. 7. In Fig. 7, the Sparsemax layer makes the Attentive transformer's output sparse: Sparsemax projects the feature vector directly onto the probability simplex to achieve sparsification, calculated as

$$\mathrm{sparsemax}(\mathbf{z}) = \underset{\mathbf{p} \in \Delta^{K-1}}{\arg\min} \; \lVert \mathbf{p} - \mathbf{z} \rVert^{2}$$

where $\Delta^{K-1}$ is the $(K-1)$-dimensional probability simplex.

Attentive transformer Framework
Applying the Sparsemax function in the Attentive transformer, the mask of decision step $i$ is computed as

$$\mathbf{M}[i] = \mathrm{sparsemax}\big(\mathbf{P}[i-1] \cdot h_i(\mathbf{a}[i-1])\big)$$

In the above equation, $h_i$ denotes the FC and BN layers of the Attentive transformer, $\mathbf{a}[i-1]$ is the Split-layer output of the previous step, and $\mathbf{P}[i-1] = \prod_{j=1}^{i-1}(\gamma - \mathbf{M}[j])$ is the prior scale term, which records how heavily each feature has already been used in earlier steps; the relaxation parameter $\gamma \geq 1$ controls how freely a feature may be reused across steps.
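Sparsemax has a closed-form solution via sorting (the Euclidean projection onto the simplex); a numpy sketch for a single vector:

```python
import numpy as np

def sparsemax(z):
    """Project z onto the probability simplex. Unlike softmax, the
    result can contain exact zeros, which is what makes the TabNet
    masks sparse."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                 # descending
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    support = 1 + ks * z_sorted > cumsum        # coordinates kept in the support
    k = ks[support][-1]
    tau = (cumsum[k - 1] - 1.0) / k             # threshold
    return np.maximum(z - tau, 0.0)

print(sparsemax([1.0, 0.8, -1.0]))   # -> [0.6 0.4 0. ]  (sums to 1, exact zero)
```

The exact zeros mean a sample simply ignores the masked-out features at that decision step, instead of merely down-weighting them as softmax would.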
The KNN interpolation method uses the remaining (non-missing) features to construct a multidimensional space and selects the $k$ samples nearest to the sample with the missing value. The distance between two student samples $x_a$ and $x_b$ is measured by the Euclidean distance over their jointly observed features:

$$d(x_a, x_b) = \sqrt{\sum_{j}(x_{aj} - x_{bj})^{2}}$$

In the above equation, $x_{aj}$ and $x_{bj}$ are the values of feature $j$ in the two samples. For a sample with a missing value, the $k$ nearest neighbors that do have that value are found. If the missing feature is continuous, the missing value is filled with the (distance-weighted) mean of the neighbors' values; if it is discrete, the mode of the neighbors' values is used instead. For the KNN interpolation method, when the distance-weighted mean is adopted, the estimate is

$$\hat{x} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}, \qquad w_i = \frac{1}{d_i}$$

where $x_i$ is neighbor $i$'s value of the missing feature and $d_i$ its distance to the sample being filled.
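A simple unweighted variant of KNN imputation, written as an illustration (the paper does not give its implementation):

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs: for each sample with a missing field, find the k
    nearest complete samples (Euclidean distance over the fields the
    sample does have) and use the mean of their values for that field."""
    X = np.asarray(X, dtype=float).copy()
    for i, j in zip(*np.where(np.isnan(X))):
        obs = ~np.isnan(X[i])                       # fields present in sample i
        candidates = [r for r in range(len(X))
                      if r != i and not np.isnan(X[r, j])
                      and not np.isnan(X[r][obs]).any()]
        d = [np.linalg.norm(X[i, obs] - X[r][obs]) for r in candidates]
        nearest = np.argsort(d)[:k]
        X[i, j] = np.mean([X[candidates[n], j] for n in nearest])
    return X

# hypothetical feature rows: [GPA, average score, credits failed]
X = [[3.7, 88.5, 0.0],
     [2.6, 76.6, 4.5],
     [2.9, np.nan, 0.0],
     [3.8, 88.9, 5.0]]
print(knn_impute(X, k=2))
```

Replacing the plain mean with the inverse-distance weights of the preceding formula gives the weighted variant.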
For visualization, this paper uses 30 test sets and 30 training sets and reports accuracy (a), precision (p), recall (r), and F1 for decision-tree warning alongside the TabNet neural network. The performance of the decision tree on the warning data is shown in Fig. 8. As the figure shows, the classifier's average accuracy on the test set is 93.11% with an average recall of 74.35%, while on the training set it performs worse, with an overall accuracy of 81.36% and a recall of only 47.07%, indicating that the classifier is not precise enough in classifying the positive examples. In contrast to the decision tree, the support vector machine also performed well on the test set; since it maps the data into a high-dimensional space for classification, its decision process is difficult to visualize directly, so only its performance is analyzed here.

The results of the decision tree on the warning data
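The four metrics reported in this section can be computed from the confusion-matrix counts; an illustrative helper (not the paper's code):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for a binary warning label
    (1 = student needs an academic warning)."""
    tp = sum(t == 1 and y == 1 for t, y in zip(y_true, y_pred))
    tn = sum(t == 0 and y == 0 for t, y in zip(y_true, y_pred))
    fp = sum(t == 0 and y == 1 for t, y in zip(y_true, y_pred))
    fn = sum(t == 1 and y == 0 for t, y in zip(y_true, y_pred))
    a = (tp + tn) / len(y_true)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return a, p, r, f1

print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1]))
```

A high accuracy with a low recall, as seen on the training set above, means the classifier misses many true warning cases even while getting most labels right overall.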
For the support vector machine algorithm, the dataset was normalized. Because the number of features in the warning dataset is modest and the sample set is sufficiently large relative to it, a Gaussian kernel function was used, with its parameter gamma set to 0.5 according to the inverse of the number of warning-result categories. The performance of the support vector machine on the warning dataset is shown in Fig. 9: it obtained 87.36% recall and 91.14% precision on the training set. Ideally one would like both high recall and high precision, but the two metrics constrain each other and cannot be maximized simultaneously. To balance them, the harmonic mean F1 of the two was computed; the average F1 values on the training and test sets were 81.57% and 81.85%, respectively, showing that the model behaves almost identically on the two sets.

Support vector machine algorithm results in the warning data set
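The Gaussian (RBF) kernel with the paper's setting gamma = 0.5 is simply:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2).
    gamma = 0.5 follows the paper's setting (the inverse of the number
    of warning-result categories)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

print(rbf_kernel([3.7, 88.5], [3.7, 88.5]))   # identical samples -> 1.0
```

The kernel value decays from 1 toward 0 as two samples move apart, which is what lets the SVM separate classes nonlinearly in the implicit high-dimensional space.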
This paper compares the TabNet neural network algorithm against the traditional BP neural network algorithm. The loss curves of the two algorithms on the training and test sets are shown in Fig. 10, where (a) and (b) give the loss of the BP network and the TabNet network, respectively. On the test set, the loss curve of the TabNet algorithm is 43.75% lower than that of the BP network, reaching a minimum of 0.045, showing that TabNet generalizes better to new data than the traditional BP network, with a lower chance of overfitting. The BP network ended with a gap of about 33.33% between its training-set and test-set loss curves, while the corresponding gap for TabNet is an order of magnitude smaller. Some gap between training and test loss is inevitable, since a classifier fitted on the training set will incur error when applied to test samples, but an excessive gap signals overfitting. Combined with the accuracy results, both classifiers perform well here, but TabNet is the more refined of the two.

The loss rate curve of the two algorithms in the training set and the test set
To judge the predictive value of the two classifiers intuitively, ROC curves are introduced: the larger the area under the curve, the more valuable the classifier, i.e., the closer the curve lies to 0 on the X-axis and to 1 on the Y-axis, the higher the accuracy. The ROC curves of the two algorithms on the training and test sets are shown in Fig. 11. The TabNet curve lies closer to the upper-left corner, with areas of 0.8054 and 0.8913, respectively, indicating that TabNet's predictions are more accurate and its classification results of greater practical value.

ROC curves of the two algorithms on the training and test sets
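The area under the ROC curve can be computed directly from scores via its rank interpretation: the probability that a randomly chosen positive sample is scored above a randomly chosen negative one. A sketch:

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney formulation: fraction of
    (positive, negative) pairs ranked correctly, ties counted as 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return float(wins / (len(pos) * len(neg)))

print(roc_auc([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2]))   # -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation, which is why the reported areas of 0.8054 and 0.8913 indicate genuinely useful classifiers.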
To avoid issuing warnings at times when they would be ineffective, the three weeks at the beginning and end of the semester were excluded as candidate times, and the classifier was tested on dynamic data grouped in two-week windows. Weeks 9-10 were determined to be the best time to trigger the warning system, both from the standpoint of accuracy and from behavioral data such as the frequency of students' library visits, which showed motivation slackening at that point. A warning at that time gives students room to adjust their learning habits while the prediction remains accurate; afterwards, the student's dynamic academic profile continues to be monitored, with warnings issued periodically. A typical alert indicates that a student is failing exams frequently and that recent study behavior and self-discipline remain poor, so a warning should be issued urging the student to correct course and strengthen study habits.
Building an intelligent teaching system means collecting and analyzing learning behavior data related to vocational education teaching through big data, cloud computing, the Internet of Things, and other emerging information technologies, in order to perceive and identify learners' profiles and learning states and thereby lay the foundation for intelligent teaching. Specific practices include: ① appointing teachers with strong network and information-technology skills as team leaders of the teaching team; ② building teaching resources through human-computer collaboration, cross-disciplinary cooperation, and data-driven methods, chiefly an intelligent classroom platform based on a networked teaching resource library; ③ using the network, artificial intelligence, and related means to build a platform for communication, interaction, and feedback between teachers and students, on which students' questions, answers, and classroom interactions are recorded in real time.
In the intelligent teaching mode, instructional design is a critical link that determines the effectiveness of the course and the teacher's ability to personalize teaching. First, the intelligent teaching mode collects and analyzes data during the design process and, on that basis, proposes personalized learning programs for students of different levels and abilities; this is also the foundation on which personalized teaching modes and methods achieve their expected results. Second, during design, data-empowered analysis yields portraits of students' learning behavior; these portraits are fed into the student learning model and combined with each student's current learning state to design targeted activity plans that meet the learner's knowledge-transfer requirements. Third, the collected student data are classified and analyzed during the design process. Finally, the collected data and information are synthesized, and the results are used to develop a personalized learning-model implementation plan and specific activity arrangements.
The intelligent classroom teaching mode is implemented mainly in two ways. First, before class, the teacher pushes the course syllabus, teaching objectives, unit training content, learning tasks, practice questions, and other relevant content through a mobile application, along with information and knowledge matched to the learning objectives and requirements the teacher has set in advance. Second, teaching and instruction are delivered by intelligent robots or intelligent terminals through online and offline training platforms.
In this paper, an academic management dataset is constructed based on big data, and then the TabNet neural network algorithm is used to provide early warning on the academic results of vocational education teaching. Finally, based on the early warning results, it proposes the construction method and implementation process of the intelligent teaching model for colleges and universities. The primary conclusions are as follows:
Because of missing and erroneous values in the raw grade data of different majors' courses, the overall distributions of students' grades differ widely. This study therefore adopts per-semester statistical indicators, such as GPA, credits taken, credits failed, number of failed courses, and average score, as the input features for analysis. The decision-tree classifier's average accuracy and recall on the test set are 93.11% and 74.35%, respectively, while on the training set it is not precise enough in classifying the positive examples, with a recall of only 47.07% against an overall average accuracy of 81.36%. Compared with the decision tree, the support vector machine achieves 87.36% recall and 91.14% precision on the training set and behaves almost identically on the training and test sets, with mean F1 values of 81.57% and 81.85%, respectively. On the test set, the TabNet algorithm generalizes better than the BP neural network, with a lower chance of overfitting: the BP network ended with a gap of about 33.33% between its training and test loss curves, while TabNet's gap was minimal, and the ROC curves show that TabNet is more accurate and its classification results of greater practical value. Through data mining technology and neural network algorithms, the academic management process is standardized, the collected data are comprehensively analyzed, and the results are used to formulate personalized learning-model implementation plans and specific activity arrangements, thereby promoting innovation in academic management and teaching reform.
