Selection and Optimization of University English Teaching Path Based on Knowledge Distillation and Transfer Learning
Published Online: Mar 17, 2025
Received: Oct 17, 2024
Accepted: Jan 29, 2025
DOI: https://doi.org/10.2478/amns-2025-0348
© 2025 Jing Li, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Traditional data-driven diagnostic methods require training and test data to follow the same distribution; however, distributional differences inevitably exist across domains. To adapt to these differences, transfer learning methods, which fine-tune a pre-trained model so that a limited amount of target data can be reused, have become increasingly popular [1-4]. In computer vision, transfer learning is widely used in tasks such as image classification, object detection, and image generation. It also plays an important role in natural language processing: in tasks such as text classification, sentiment analysis, and machine translation, pre-trained word vectors or language models can significantly improve model performance and generalization ability. Several challenges remain: how to accurately assess the similarity between source and target tasks to determine whether transfer learning will be effective, how to choose transfer strategies suited to different tasks and data distributions, and how to balance the source and target tasks so as to avoid negative transfer [5-8].
Knowledge distillation refers to transferring high-precision diagnostic knowledge from a cumbersome model to a lightweight model in order to improve the accuracy of the latter. A number of novel algorithms and techniques have emerged in this field in recent years. Contrastive distillation introduces an additional contrastive loss term into the student model so that it learns more detailed feature representations [9-11]. Attention-guided knowledge distillation uses the attention mechanism of the teacher model to guide the student model toward important feature maps or temporal information; by distilling the attention weights or the attention distribution of the feature maps, the student model learns the key features more effectively [12]. Adaptive distillation dynamically adjusts the distillation strategy according to the performance of the student model during training, so that the teacher's knowledge is used more efficiently [13]. Multi-teacher distillation combines the knowledge of multiple teacher models with different strengths to provide more comprehensive guidance to the student model. These techniques continue to push the boundaries of model compression and transfer learning, providing more powerful and flexible solutions for real-world applications [14-15].
Learning ability is one of the core competencies to be cultivated in the English curriculum and a key element in their development; it helps students master scientific learning methods and form good lifelong learning habits [16-17]. Teachers' teaching methods shape the way students learn English, and scientific, effective teaching methods are an important means of helping students improve their learning outcomes and develop their learning ability. Teachers should therefore purposefully and consciously apply theories related to knowledge distillation and transfer learning to guide instructional design and promote the transfer of learning in classroom practice [18-20].
In this paper, knowledge distillation and transfer learning are combined to build a model for selecting and optimizing English teaching paths in colleges and universities, and a teaching experiment is evaluated using t-tests. The model has two main parts, a teacher model and a student model: a dense convolutional neural network (DenseNet) serves as the teacher model and an artificial neural network (ANN) as the student model. In the evaluation experiment, the experimental group followed the model-optimized English teaching path and the control group followed the traditional teaching method. The characteristics of the research subjects are described first; the differences in their background variables and their representativeness are then examined to ensure that the sample was selected reliably. Finally, independent-samples and paired-samples t-tests are applied to compare the improvement in English proficiency of students in the experimental and control groups.
In transfer learning, the source domain is the domain from which knowledge is transferred, while the target domain is the domain to be learned. The transfer task involves applying models and knowledge learned in the old domain to the new domain. Transfer learning is achieved by identifying the commonalities between domains and using them as a bridge to systematically transfer existing knowledge from the source domain to the target domain.
In this paper, we use the notation
Specifically, through the method of transfer learning, the source domain data is utilized on the target domain to learn a model that minimizes the error of the prediction function :
A core aspect of transfer learning is constructing and optimizing a predictive model for the target domain on the basis of existing source-domain data. When learning one task facilitates another task, the process is called “positive transfer”; when learning one task hinders another task, it is called “negative transfer”.
The essence of knowledge distillation (KD) belongs to the category of transfer learning, and its main idea is to take the well-trained model as the teacher model, and “distill” the “knowledge” from the output of the teacher model for the training of the student model by controlling the temperature
The knowledge distillation model consists of three parts: teacher model, student model and knowledge transfer, and the whole process is trained on a supervised dataset. It transfers the knowledge learned by the teacher model, which has a large number of parameters and strong learning ability, to the student model, which has fewer parameters and weaker learning ability.
Based on the knowledge used for distillation, distillation can be divided into the following three ways:
1) Response-based distillation: the student model learns the output of the teacher model. In DistilBERT, for example, the student model learns the knowledge of the teacher model's output layer.
2) Feature-based distillation: the student model learns the knowledge of the teacher model's intermediate layers, as in PKDBERT.
3) Relation-based distillation: the student model learns the relationships between layers of the teacher model or between samples. TinyBERT, for example, learns the knowledge of the embedding layer.
The main reason why this knowledge is effective is that some implicit features (dark knowledge) cannot be represented at the data level, and teacher models with strong learning ability can learn these features. For general classification problems, the label of data is a “one-hot” category, i.e., the category of a piece of data is fixed, which is called “hard label”.
In knowledge distillation, the trained teacher network provides the label probability distributions of its softmax layer to the student model as guidance. These distributions contain inter-category information and are referred to as soft labels.
The degree of knowledge distillation is determined by the temperature: a higher temperature value indicates a higher degree of distillation and a smoother label distribution. Conversely, a lower temperature value indicates a lower degree of distillation and a sharper label distribution. The temperature therefore has to be chosen with care, because an unsuitable value can amplify the probabilities of incorrect categories and introduce unnecessary noise.
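The effect of the temperature on the label distribution can be illustrated with a minimal sketch. It assumes the usual temperature-scaled softmax; the function name `softmax_with_temperature` and the logits are hypothetical values chosen for illustration, not taken from the paper's experiments.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw class scores into probabilities, softened by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for three teaching-path categories.
teacher_logits = [4.0, 2.0, 0.5]
sharp = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)
print("T=1:", [round(p, 3) for p in sharp])
print("T=4:", [round(p, 3) for p in soft])
```

With the higher temperature, the probability of the top category shrinks and the probabilities of the other categories grow, which is the smoothing behaviour that soft labels rely on.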
The teacher network and the student network are jointly trained, and the knowledge and learning style of the teacher network affects the learning of the student network, and the loss function Loss is shown in equation (1):
Where:
The combination of knowledge distillation and transfer learning provides an innovative approach to teaching English in higher education. Specifically, a model pre-trained on a large-scale multilingual dataset can be adapted to a specific English teaching task using transfer learning techniques. Then, knowledge distillation techniques are used to further optimize this adapted model to make it more suitable for specific teaching scenarios and student groups. This combination mechanism not only improves the pedagogical adaptability and efficiency of the model, but also reduces the reliance on large amounts of labeled data to some extent.
In this paper, the selected teacher model is Dense Convolutional Neural Network (DenseNet).
DenseNet is a deep learning network inspired by, and improving on, the residual network (ResNet) architecture; its structure is shown in Fig. 1 [21]. Its approach to improving network performance differs from that of ResNet: the core idea is dense connection, which establishes connections between different layers, makes full use of the feature information of each layer, and improves the training of the network. DenseNet consists mainly of multiple dense blocks and transition layers. Within a dense block, each layer is connected to all preceding layers by concatenation, and the feature maps of all layers keep the same size. Adjacent dense blocks are connected by a transition layer, which performs downsampling through a batch normalization layer, an activation layer, a convolution layer, and a pooling layer: the 1×1 convolution reduces the number of channels and the pooling reduces the size of the feature maps, so the transition layer serves to compress the model.

Structure of dense convolutional neural network
In a general convolutional neural network, layer
And in DenseNet, it will connect each previous layer as an input:
where
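The dense connection pattern can be sketched in a few lines. This is only an illustration of the concatenation idea under simplifying assumptions: feature maps are represented as flat lists of channel values, and `make_toy_layer` is a hypothetical stand-in for the composite function of a real layer (batch normalization, activation, convolution), not the network used in this paper.

```python
def dense_block(x0, layer_fns):
    """Run a dense block: every layer sees the concatenation of all earlier outputs."""
    features = [x0]
    for layer_fn in layer_fns:
        # Concatenate x_0, x_1, ..., x_{l-1} before applying layer l.
        concatenated = [v for feat in features for v in feat]
        features.append(layer_fn(concatenated))
    return [v for feat in features for v in feat]

def make_toy_layer(growth_rate):
    """Hypothetical layer: emits `growth_rate` new channels from its input."""
    def layer_fn(inputs):
        mean = sum(inputs) / len(inputs)
        return [max(0.0, mean + i) for i in range(growth_rate)]  # ReLU-like output
    return layer_fn

x0 = [0.5, -1.0, 2.0, 0.0]                       # 4 input channels (toy values)
layers = [make_toy_layer(3) for _ in range(3)]   # 3 layers, growth rate 3
out = dense_block(x0, layers)
print(len(out))  # channel count grows as 4 + 3 * 3
```

Because each layer only adds a small number of channels while reusing all earlier ones, the channel count grows linearly with depth, which is why transition layers are needed to compress the features between blocks.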
In this paper, the selected student model is the Artificial Neural Network (ANN).
An ANN is a multilayer perceptron trained by supervised learning, with a strong self-learning ability to minimize the empirical risk; its structure is shown in Fig. 2 [22]. The ANN model consists mainly of two parts: feed-forward propagation of information and back-propagation of errors. In feed-forward propagation, features are extracted adaptively and stochastically from the input samples through multiple hidden layers and are mapped to the target type by the output layer.

Structure diagram of ANN
Let the training sample set be
Where:
The expression of the output layer prediction function of ANN is:
Where:
Determine the training samples {
The training parameters
Where:
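The two phases of the ANN, feed-forward propagation and error back-propagation, can be sketched as follows. This is a minimal illustration assuming one sigmoid hidden layer, a sigmoid output, a squared-error objective, and per-sample gradient descent; the parameter names, initial weights, learning rate, and toy samples are all hypothetical and are not the configuration used in this paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    """Feed-forward propagation: input -> sigmoid hidden layer -> sigmoid output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(params["W1"], params["b1"])]
    output = sigmoid(sum(w * h for w, h in zip(params["W2"], hidden)) + params["b2"])
    return hidden, output

def mean_squared_error(samples, params):
    return sum((forward(x, params)[1] - y) ** 2 for x, y in samples) / len(samples)

def train_epoch(samples, params, lr=0.5):
    """Error back-propagation: one gradient-descent update per sample."""
    for x, y in samples:
        hidden, out = forward(x, params)
        # Gradient of 0.5 * (out - y)^2 with respect to the output pre-activation.
        delta_out = (out - y) * out * (1.0 - out)
        # Error fed back to each hidden unit (computed before W2 is changed).
        delta_hidden = [delta_out * w * h * (1.0 - h)
                        for w, h in zip(params["W2"], hidden)]
        for j, h in enumerate(hidden):
            params["W2"][j] -= lr * delta_out * h
            params["b1"][j] -= lr * delta_hidden[j]
            for i, xi in enumerate(x):
                params["W1"][j][i] -= lr * delta_hidden[j] * xi
        params["b2"] -= lr * delta_out

# Toy samples: the label equals the first input feature.
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
params = {"W1": [[0.1, -0.2], [0.3, 0.2]], "b1": [0.0, 0.0],
          "W2": [0.2, -0.1], "b2": 0.0}

initial_loss = mean_squared_error(samples, params)
for _ in range(500):
    train_epoch(samples, params)
final_loss = mean_squared_error(samples, params)
print(initial_loss, final_loss)
```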
In this paper, the teacher model is used as the training model for the source domain, while the student model is used as the training model for the target domain. A dense convolutional neural network is built as the teacher model to pre-train the source domain samples, and the output predictions are soft labels. The artificial neural network is built as the student model, the main task is the hard labels fitted at the student model

Schematic diagram of knowledge distillation
The KD method introduces the value of the temperature factor
Where:
where
The teacher model loss function
Where:
Student Model Loss Function
Where:
Soft-label loss consists of two parts: teacher model soft-label loss and student model soft-label prediction loss, i.e., the accumulation of teacher model cross entropy and student model cross entropy, and its functional expression is:
Where,
Hard-label loss consists of two parts of hard-label prediction loss and hard-label loss of the student model, i.e.,
Where:
Both model tasks train college English teaching path samples, which are similar, so the two tasks share the hidden layer parameters while retaining the output layer of their respective tasks. The distillation loss function is mainly composed of two parts of the loss function of the teacher model
Where:
The distillation loss above addresses only the marginal distribution difference between domains. On this basis, this paper introduces the stratified transfer learning (STL) algorithm to reduce the conditional distribution difference between domain samples, with the function expression:
Where:
The final distillation target loss
where
In summary, for sample
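How a soft-label term and a hard-label term combine into a single training objective can be illustrated with a generic sketch. It assumes the commonly used form of knowledge distillation, a weighted sum of a temperature-softened cross entropy and an ordinary cross entropy, with the soft term scaled by the squared temperature by convention. The weight `alpha`, the function names, and the logits are hypothetical, and the sketch omits the STL term, so it should not be read as the exact loss of this paper.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))

def distillation_loss(teacher_logits, student_logits, hard_label,
                      temperature=4.0, alpha=0.7):
    """Weighted sum of soft-label loss (teacher guidance) and hard-label loss."""
    soft_targets = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # Scaling by T^2 keeps the soft term comparable across temperatures (a convention).
    soft_loss = cross_entropy(soft_targets, student_soft) * temperature ** 2
    one_hot = [1.0 if i == hard_label else 0.0 for i in range(len(student_logits))]
    hard_loss = cross_entropy(one_hot, softmax(student_logits))
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

teacher_logits = [4.0, 2.0, 0.5]   # hypothetical teacher output, true class 0
loss_matching = distillation_loss(teacher_logits, [4.0, 2.0, 0.5], hard_label=0)
loss_mismatched = distillation_loss(teacher_logits, [0.5, 2.0, 4.0], hard_label=0)
print(loss_matching, loss_mismatched)
```

A student whose outputs agree with both the teacher and the hard label receives the smaller loss, which is the signal fed back to update the student model.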
The English teaching path selection and optimization model for colleges and universities proposed in this paper consists of four main parts: acquisition of English teaching resources and teaching contexts, sample preprocessing, model construction and training, and transfer optimization of college English teaching paths. The process framework of the model, based on knowledge distillation and transfer learning, is shown in Figure 4, and its main steps are as follows:
1) Input of English teaching resources and teaching contexts. The basic knowledge points of English teaching and diversified teaching materials are taken as teaching resources, specific teaching tasks and needs are taken as teaching contexts, and both are input into the model as samples.
2) Sample preprocessing. The input teaching resources and teaching contexts are normalized and paired to obtain English teaching path samples. The source domain samples are divided into a training set and a test set at a ratio of 7:3, and the target domain samples are divided into 20% labeled and 80% unlabeled samples.
3) Model building and training. First, DenseNet adaptively extracts features from the normalized source domain samples, and the source domain test samples are used to obtain the optimal source domain transfer model. Second, the target domain samples are pre-trained by the ANN model, the temperature factor T is set, and knowledge distillation is performed to obtain the soft target loss and the hard target loss. Stratified transfer learning is then introduced to reduce the conditional distribution difference, the final distillation loss function is obtained, and it is fed back to update the student model.
4) Transfer optimization of college English teaching paths. The target domain samples are input to the distilled student model, the features are mapped to a high-dimensional reproducing kernel Hilbert space (RKHS), and a Softmax classifier is used to make the optimization decision for the college English teaching path.
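The sample partitioning in the preprocessing step can be sketched as follows. The helper `split_by_ratio`, the sample names, the sample counts, and the random seed are hypothetical; only the 7:3 and 20%/80% ratios come from the text.

```python
import random

def split_by_ratio(samples, first_fraction, seed=0):
    """Shuffle the samples reproducibly and split them at the given fraction."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical teaching-path samples (placeholders for real feature vectors).
source_samples = [f"source_path_{i}" for i in range(100)]
target_samples = [f"target_path_{i}" for i in range(50)]

# Source domain: training and test sets at a ratio of 7:3.
source_train, source_test = split_by_ratio(source_samples, 0.7)
# Target domain: 20% labeled and 80% unlabeled samples.
target_labeled, target_unlabeled = split_by_ratio(target_samples, 0.2)

print(len(source_train), len(source_test), len(target_labeled), len(target_unlabeled))
```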

Flow chart for selecting and optimizing college English teaching paths
On the basis of evaluating the model, this paper designs a controlled experiment on teaching and uses independent samples t-test and paired samples t-test to explore the effectiveness of the English teaching path in colleges and universities obtained by model optimization.
The t-test for two independent samples is used to test whether two independent samples come from populations with the same mean, that is, to test whether two independent normal populations have equal means [23].
The two independent samples t-test tests whether there is a significant difference between the means of the two populations. Its null hypothesis is
The two independent samples test of means presupposes that the two independent populations obey normal distributions
Under the condition that the null hypothesis holds, the t-statistic is used to test the means of two independent samples. The statistic is constructed differently in two cases.
1) When the variances of the two populations are unknown but equal, i.e.,
Where:
This statistic obeys a t-distribution with a degree of freedom of
2) When the variances of the two populations are unknown and unequal, i.e.
This statistic obeys a t-distribution with modified degrees of freedom:
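The two cases can be sketched with the standard textbook forms of the statistics: the pooled-variance statistic with n1 + n2 - 2 degrees of freedom for equal variances, and the Welch statistic with Welch-Satterthwaite degrees of freedom for unequal variances. The function names and the scores below are hypothetical illustrations, not data from this study.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """t-statistic assuming equal population variances (pooled variance)."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(sample1, sample2):
    """t-statistic without the equal-variance assumption (modified degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1) / n1, variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical post-test scores for two small groups.
group_a = [78.0, 82.0, 75.0, 90.0, 85.0, 80.0]
group_b = [70.0, 72.0, 68.0, 75.0, 71.0, 74.0]

t_pooled, df_pooled = pooled_t(group_a, group_b)
t_welch, df_welch = welch_t(group_a, group_b)
print(t_pooled, df_pooled, t_welch, df_welch)
```

When the two groups have the same size, the two statistics coincide and only the degrees of freedom differ; the choice between them matters most when both the sizes and the variances differ.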
In statistical analysis, two populations with equal variances are said to satisfy homogeneity of variance. Determining whether two independent samples satisfy homogeneity of variance is the key to constructing and selecting the two independent samples t-test statistic; Levene's F test for homogeneity of variance can be used to test whether the variances of the two populations differ significantly.
First, the null hypothesis
The formula for calculating the value of the F statistic in the F test is:
Where:
Given the null hypothesis, the observed value of the test statistic is obtained by substituting the test value
When the
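The F statistic of Levene's test can be sketched as a one-way analysis of variance on the absolute deviations of each observation from its group mean. This is the standard mean-centered form and is assumed here for illustration; the function name and the scores are hypothetical.

```python
from statistics import mean

def levene_f(*groups):
    """Levene's F statistic (mean-centered) and its degrees of freedom."""
    # Absolute deviation of each observation from its own group mean.
    deviations = []
    for group in groups:
        center = mean(group)
        deviations.append([abs(y - center) for y in group])
    k = len(deviations)
    n_total = sum(len(z) for z in deviations)
    grand_mean = mean([v for z in deviations for v in z])
    between = sum(len(z) * (mean(z) - grand_mean) ** 2 for z in deviations)
    within = sum((v - mean(z)) ** 2 for z in deviations for v in z)
    f_value = (n_total - k) / (k - 1) * between / within
    return f_value, (k - 1, n_total - k)

# Hypothetical scores: class_b is class_a shifted by 10 (same spread),
# while class_c is far more dispersed.
class_a = [60.0, 62.0, 64.0, 66.0, 68.0]
class_b = [70.0, 72.0, 74.0, 76.0, 78.0]
class_c = [40.0, 55.0, 64.0, 73.0, 88.0]

f_same, _ = levene_f(class_a, class_b)
f_diff, df = levene_f(class_a, class_c)
print(f_same, f_diff, df)
```

The p-value is then read from the F distribution with these degrees of freedom. In practice a library routine such as `scipy.stats.levene` can supply it directly; note that this routine centers on the median by default, so `center='mean'` is needed to match the form sketched here.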
The two paired samples test is used to test whether two related samples come from a normal population with the same mean, i.e., for two paired samples, it is inferred whether the means of the two populations are significantly different [24].
The paired samples t-test likewise tests whether there is a significant difference between two population means, with the null hypothesis that
Let (
Under the condition that the null hypothesis holds, the population mean of the paired differences is zero.
The paired samples t-test uses the t-statistic, constructed as:
When
The criteria for determining the significance of differences in the paired-samples t-test are consistent with the independent-samples t-test.
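The paired-samples statistic can be sketched in its standard form: the mean of the paired differences divided by its standard error, with n - 1 degrees of freedom. The function name and the scores are hypothetical; the difference is taken as pre-test minus post-test, so an improvement yields a negative mean difference.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired-samples t-statistic on the differences before - after."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical pre-test and post-test scores of five students.
pretest = [60.0, 65.0, 70.0, 58.0, 72.0]
posttest = [68.0, 70.0, 79.0, 66.0, 80.0]

t_value, df = paired_t(pretest, posttest)
print(round(t_value, 3), df)
```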
This study aims to verify the impact of the college English teaching path optimization model by intervening in college English teaching through a teaching path optimized based on knowledge distillation and transfer learning. The purpose of the specific experimental study includes two aspects: first, to verify whether the optimized college English teaching path has a significant effect on the improvement of English proficiency level of non-English major college students. Second, to verify whether the optimized college English teaching path has different effects on different dimensions of English proficiency.
In the 2023-2024 academic year, the researcher taught college English courses to the 2023 cohorts of product design and animation majors at the College of Art and Design, University of S. These students were therefore identified as the subjects of this experimental study. The product design cohort consists of three natural classes (referred to as “Product 2301-03”), totaling 92 students, and the animation cohort consists of two natural classes (referred to as “Animation 2301-02”), totaling 54 students. Because college English at the case school is taught under a co-teaching system, Product 2301-03 was randomly selected as the experimental class and Animation 2301-02 as the control class, and a nonequivalent-groups pretest-posttest teaching experiment was carried out with the two groups.
A summary of the personal background information of the experimental research subjects is shown in Table 1.
The personal information of the experimental subjects
| Background information | Category | Experimental class (number) | Experimental class (percentage) | Control class (number) | Control class (percentage) |
|---|---|---|---|---|---|
| Gender | Male | 34 | 36.96% | 20 | 37.04% |
| | Female | 58 | 63.04% | 34 | 62.96% |
| Nationality | The Han Nationality | 75 | 81.52% | 43 | 79.63% |
| | Minority Nationality | 17 | 18.48% | 11 | 20.37% |
| Place of student source | The north | 91 | 98.91% | 54 | 100.00% |
| | The south | 1 | 1.09% | 0 | 0.00% |
| College entrance examination results | 50-89 points | 70 | 76.09% | 42 | 77.78% |
| | 90-150 points | 22 | 23.91% | 12 | 22.22% |
As can be seen from Table 1, the research subjects in the two classes are highly similar. In terms of gender, both classes are predominantly female, with female students accounting for more than 62%. In terms of ethnicity, both are predominantly Han, accounting for more than 75%. In terms of place of origin, only one student in the experimental class comes from the south, while all the others come from the north. In terms of college entrance examination English scores, more than 75% of the subjects in both classes scored below 90 points, corresponding to the level of students in English B classes at other universities, and more than 20% scored 90 points or above, corresponding to the level of students in English A classes at other universities. This is basically in line with the distribution of English proficiency among non-English-major students at the case school.
Due to the large difference in the number of students in the experimental and control classes, further tests of the relevant variables for the study population were needed to ensure that there were no significant differences between the study population in the experimental and control classes. The results of the test of differences in background variables between the samples of the experimental and control classes are shown in Table 2.
Difference test of background variables between experimental class and control class
| Variable | Source | Sum of squares | Degree of freedom | Mean square | F | Significance |
|---|---|---|---|---|---|---|
| Gender | Between groups | 0.067 | 1 | 0.067 | 0.236 | 0.648 |
| | Within groups | 34.185 | 144 | 0.258 | | |
| | Total | 34.264 | 145 | | | |
| Nationality | Between groups | 0.253 | 1 | 0.239 | 1.305 | 0.272 |
| | Within groups | 25.144 | 144 | 0.181 | | |
| | Total | 25.316 | 145 | | | |
| Place of student source | Between groups | 0.008 | 1 | 0.008 | 0.042 | 0.841 |
| | Within groups | 28.641 | 144 | 0.214 | | |
| | Total | 28.657 | 145 | | | |
| College entrance examination results | Between groups | 0.152 | 1 | 0.152 | 0.224 | 0.654 |
| | Within groups | 91.246 | 144 | 0.694 | | |
| | Total | 91.383 | 145 | | | |
As can be seen from Table 2, an analysis of variance (ANOVA) was conducted on the research subjects in the experimental and control classes in the dimensions of gender, ethnicity, place of origin, and college entrance examination English scores. The F-values were 0.236, 1.305, 0.042, and 0.224, and the p-values were 0.648, 0.272, 0.841, and 0.654, respectively, all greater than 0.05. This shows that there is no significant difference between the subjects of the two classes in any of these dimensions. The two classes are therefore highly homogeneous and meet the conditions required for conducting the experiment.
In order to further test the representativeness of the students in the quasi-experimental classes, a one-sample t-test was conducted with the research subjects participating in the experiment as a single sample and the 704 students who took part in the English proficiency level survey at the case school as the reference population, in order to test the difference in the distribution of English proficiency levels between the experimental sample and the full survey sample. The results of the one-sample t-test are shown in Table 3.
Single sample T test results
| Test value = 1.72 | t | Degree of freedom | Sig. (two-tailed) | Mean difference | 95% confidence interval for the difference (lower limit) | 95% confidence interval for the difference (upper limit) |
|---|---|---|---|---|---|---|
| Inter group | -0.204 | 145 | 0.857 | -0.016 | -0.17 | 0.12 |
As can be seen from Table 3, although the experimental research subjects in this paper are art majors, there is no significant difference between them and other non-English-major undergraduates at the case school in English proficiency level (t=-0.204, p=0.857>0.05), which suggests that they are representative of the overall level of non-English-major undergraduates at the case school. In addition, combined with Table 1, the college entrance examination English scores of the experimental research subjects are concentrated in the range of 70-90 points, which is also generally consistent with the distribution of English proficiency among non-English-major undergraduates at the case school, so the sample reflects the overall characteristics of that population.
This study adopts an experimental comparison approach. The experimental class and the control class were taught college English with the same teaching materials in the same teaching environment for two semesters, and all subjects took an English proficiency test before and after the experiment (pre-test and post-test). The test covered listening, reading comprehension, oral expression, and written expression, each accounting for 25% of the total score, and was scored on a 100-point scale. The experimental class was taught along the optimized English teaching path derived from the model in this paper, while the control class was taught in the traditional way.
The study used quantitative methods, applying SPSS 28.0 statistical software to run independent samples t-tests and paired samples t-tests. These tests compare the performance of the students in the experimental and control classes in all dimensions of English proficiency before and after the experiment and examine the gap between the two groups in terms of progress.
The results of the statistical analysis of the pre and post-test paired samples of the experimental and control classes are shown in Table 4. In the table, D1, D2, D3 and D4 represent the average scores of listening, reading comprehension, oral expression and written expression, respectively.
Statistical analysis of paired samples before and after testing
| Test | Class | D1 | D2 | D3 | D4 | Total | Number | Standard deviation | Standard error |
|---|---|---|---|---|---|---|---|---|---|
| Pretest | Experimental class | 65.12 | 71.09 | 56.89 | 74.53 | 66.91 | 92 | 9.423 | 1.476 |
| | Control class | 63.64 | 69.24 | 58.13 | 73.96 | 66.24 | 54 | 8.205 | 1.215 |
| Posttest | Experimental class | 78.57 | 86.35 | 71.37 | 86.52 | 80.70 | 92 | 7.341 | 1.027 |
| | Control class | 70.21 | 76.42 | 62.44 | 80.11 | 72.30 | 54 | 8.256 | 1.234 |
As can be seen from Table 4, the mean total scores of the experimental class and the control class in the pre-test were 66.91 and 66.24, with standard deviations of 9.423 and 8.205, respectively. The English level of the experimental class before the experiment was thus slightly higher than that of the control class, but the overall difference was small. In the post-test, the mean total scores of the experimental class and the control class were 80.70 and 72.30, with standard deviations of 7.341 and 8.256, respectively. This indicates that the English proficiency of the experimental class improved after the experiment and that the spread of its scores narrowed, while the control class improved less and the spread of its scores widened slightly. As for the individual dimensions of English proficiency, the largest score difference between the two classes in the pre-test is 1.85 points, whereas the smallest score difference in the post-test is 6.41 points, so the gap between the classes increased markedly.
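The score gaps quoted from Table 4 can be reproduced with a short arithmetic check. The dimension scores below are copied from Table 4; the variable and function names are chosen here for illustration.

```python
# Dimension means (D1 listening, D2 reading, D3 oral, D4 written) from Table 4.
pretest = {"experimental": [65.12, 71.09, 56.89, 74.53],
           "control": [63.64, 69.24, 58.13, 73.96]}
posttest = {"experimental": [78.57, 86.35, 71.37, 86.52],
            "control": [70.21, 76.42, 62.44, 80.11]}

def absolute_gaps(scores):
    """Absolute score difference between the two classes for each dimension."""
    return [round(abs(e - c), 2)
            for e, c in zip(scores["experimental"], scores["control"])]

def total_mean(dimension_scores):
    """Total score as the mean of four equally weighted dimensions."""
    return sum(dimension_scores) / len(dimension_scores)

pre_gaps = absolute_gaps(pretest)
post_gaps = absolute_gaps(posttest)
print("pre-test gaps:", pre_gaps, "largest:", max(pre_gaps))
print("post-test gaps:", post_gaps, "smallest:", min(post_gaps))
print("post-test totals:",
      total_mean(posttest["experimental"]), total_mean(posttest["control"]))
```

The largest pre-test gap occurs in reading comprehension (D2) and the smallest post-test gap in written expression (D4), and the dimension means average to the totals reported in the table.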
The results of the independent samples t-test for the pre and post-tests of the experimental and control classes are shown in Table 5. Where D1, D2, D3 and D4 denote listening, reading comprehension, oral expression and written expression, respectively.
Independent sample T-test for pretest and posttest
| Test | Dimension | Assuming | Levene's test for equality of variances: F | Levene's test for equality of variances: Sig. | t | Sig. (2-tailed) | 95% confidence interval of the difference (lower limit) | 95% confidence interval of the difference (upper limit) |
|---|---|---|---|---|---|---|---|---|
| Pretest | D1 | Equal variances | 3.315 | 0.081 | 0.145 | 0.715 | -3.527 | 4.024 |
| | D2 | Equal variances | 4.232 | 0.079 | 0.156 | 0.637 | -4.135 | 3.859 |
| | D3 | Equal variances | 2.964 | 0.093 | 0.189 | 0.741 | -2.958 | 3.421 |
| | D4 | Equal variances | 3.109 | 0.105 | 0.195 | 0.822 | -3.763 | 4.325 |
| Posttest | D1 | Equal variances | 2.942 | 0.089 | 2.943 | 0.007 | -6.942 | -2.354 |
| | D2 | Equal variances | 3.857 | 0.094 | 3.127 | 0.000 | -7.257 | -2.678 |
| | D3 | Equal variances | 2.645 | 0.132 | 3.481 | 0.003 | -8.104 | -2.593 |
| | D4 | Equal variances | 3.054 | 0.103 | 2.894 | 0.018 | -7.109 | -1.742 |
From Table 5, it can be seen that in the pre-test, the significance values of the test for homogeneity of variance for listening, reading comprehension, oral expression, and written expression are 0.081, 0.079, 0.093, and 0.105, respectively, all greater than 0.05, which means that the variances of the two classes are homogeneous in every dimension. Under homogeneity of variance, the two-tailed significance values of the dimensions are 0.715, 0.637, 0.741, and 0.822, respectively, all greater than 0.05. Combined with Table 4, it can be seen that, although the means and standard deviations of the two classes differ slightly, their scores in the dimensions of English proficiency did not differ significantly before the experiment, i.e., the two classes were basically comparable in English proficiency before the experiment. In the post-test, the p-values of the test for homogeneity of variance are again all greater than 0.05, i.e., the variances are homogeneous. Under homogeneity of variance, the two-tailed significance values are all less than 0.05, which indicates a statistically significant difference between the post-test scores of the two groups.
The results of the pre and post-test paired samples t-tests for the experimental and control classes are shown in Table 6, with D1, D2, D3 and D4 denoting listening, reading comprehension, oral expression and written expression, respectively.
Paired sample T-test for pretest and posttest
| Class | Dimension | Pair | Paired difference: mean | Paired difference: std. deviation | Paired difference: std. error mean | t | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|
| Experimental class | D1 | Pretest and posttest | -13.45 | 1.421 | 0.241 | 3.147 | 0.000 |
| | D2 | Pretest and posttest | -15.26 | 1.714 | 0.197 | 4.274 | 0.000 |
| | D3 | Pretest and posttest | -14.48 | 2.047 | 0.259 | -2.716 | 0.000 |
| | D4 | Pretest and posttest | -11.99 | 1.892 | 0.264 | -4.207 | 0.000 |
| Control class | D1 | Pretest and posttest | -6.57 | 4.125 | 0.755 | 4.815 | 0.085 |
| | D2 | Pretest and posttest | -7.18 | 5.264 | 0.761 | -5.143 | 0.073 |
| | D3 | Pretest and posttest | -4.31 | 3.973 | 0.804 | -3.459 | 0.104 |
| | D4 | Pretest and posttest | -6.15 | 4.677 | 0.825 | -4.531 | 0.096 |
As can be seen from Table 6, the p-values for the pre- and post-test scores of the experimental class in listening, reading comprehension, oral expression, and written expression are all reported as 0.000, which is less than 0.05, indicating a statistically significant difference between the two test scores of the experimental class in all dimensions of English proficiency. This result supports the hypothesis that the optimized English teaching path presented in this paper can promote the improvement of students' English proficiency. By contrast, there is no statistically significant difference between the two test scores of the control class in any dimension (p>0.05). The progress of the control class is thus not obvious compared with that of the experimental class, which supports the validity of the college English teaching path optimization model proposed in this paper.
To enhance the quality and efficiency of English teaching, this paper examines a college English teaching path selection and optimization model that incorporates knowledge distillation and transfer learning. The teacher and student models are used to pre-train the English teaching path samples in the source and target domains, respectively, and to extract the teaching path features adaptively and stochastically. The final distillation loss is obtained by continuously adjusting the temperature factor T and is fed back to the student model, which ultimately makes the optimization decision for college English teaching paths. The t-test method is used to evaluate the optimization effect of the model. The conclusions of the model evaluation experiment are as follows:
First, this paper tests the reliability of the selected research subjects. An analysis of variance (ANOVA) was conducted on the students in the experimental and control classes in the dimensions of gender, ethnicity, place of origin, and college entrance examination English scores; the F-values were 0.236, 1.305, 0.042, and 0.224, and the p-values were 0.648, 0.272, 0.841, and 0.654, all greater than 0.05. This indicates that the research subjects in the two classes are homogeneous in these dimensions and meet the conditions for implementing the experiment. In addition, a one-sample t-test was conducted with the research subjects participating in the experiment as a single sample and the students participating in the English proficiency level survey at the case school as the reference population. The results show no significant difference in English proficiency level between the experimental research subjects and other non-English-major undergraduates at the case school (t=-0.204, p=0.857>0.05), indicating that the selected research subjects are representative.
Second, independent samples and paired samples t-tests were used to assess the differences in the English proficiency scores of the two groups of students. Before the experiment, there is no significant difference in English proficiency between the experimental group, which adopted the optimized teaching path of this paper's model, and the control group, which adopted the traditional teaching method (p>0.05), while there is a significant difference after the experiment (p<0.05). Moreover, the English proficiency of the experimental group after the experiment is significantly higher than before the experiment (p<0.05), while that of the control group is not significantly improved (p>0.05). These results support the usefulness of the model in this paper for optimizing the English teaching process in colleges and universities.
This research was supported by the Henan Provincial Higher Education Teaching Reform Research and Practice Project Approval (Progressive Integration, Categorized Expansion: Research and Practice on the Paradigm of English Curriculum Serving the Career Development of Medical Students, 2024SJGLX0876).
