Prediction of English Vocabulary Learning Difficulty and Adjustment of Teaching Strategies Based on Decision Tree Algorithm 
Published: 21 Mar 2025
Received: 07 Nov 2024
Accepted: 15 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0614
© 2025 Liqiang Song et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
English vocabulary is the foundation of English education and strongly influences the development of students’ reading and writing ability, so its importance to English teaching is self-evident [1-2]. To learn English well, students need to master a large vocabulary so that they can understand the language, extract information from a text, and grasp the thoughts and feelings it conveys [3-4]. Many teachers regard vocabulary teaching as a process of vocabulary recognition, interpretation, memorization and application; in fact, it is also one of the important links in cultivating core English literacy [5-6]. Effective optimization of vocabulary teaching helps develop students’ language proficiency, learning ability, thinking quality and cultural awareness [7].
Vocabulary teaching is a major difficulty in English education. In traditional teaching, teachers often ask students to memorize vocabulary to gain basic knowledge and to practice related exercises to score well in exams [8-9]. However, this form of teaching tends to limit the development of students’ thinking and to dampen their interest in learning. Under the new curriculum standard, teachers should raise students’ interest in learning when teaching English vocabulary, so that students learn vocabulary with a view to improving their core literacy in English [10-12]. Adjusting the vocabulary teaching process so as to promote students’ all-round development is also a considerable test for teachers [13-14].
With the rapid development of information technology, the mode of English teaching has changed greatly. Vocabulary is the foundation of English learning [15-16], yet the traditional way of teaching it is often monotonous and poorly adapted to the needs of an information society [17]. Facing this challenge, teachers need to explore new teaching strategies and make full use of information technology to stimulate students’ interest in learning and improve teaching efficiency [18-19]. It is therefore necessary to explore English vocabulary teaching strategies and methods suited to the information technology context, in order to improve students’ vocabulary mastery and the quality of English teaching [20-21].
It is therefore necessary to study the current state of English vocabulary teaching and of students’ English vocabulary level. Literature [22] systematically analyzes the importance of English vocabulary and the current state of English vocabulary learning, focuses on current English vocabulary learning strategies, and contributes to the design and optimization of English vocabulary teaching methods. That research suggests that students’ English vocabulary knowledge is far from satisfactory and that optimizing English vocabulary teaching methods is imperative. Literature [23], based on a mixed-method design, reveals that both teachers and students are motivated in English vocabulary teaching: teachers most prefer whole-context strategies, while students prefer determination and metacognitive strategies, and the choice of vocabulary strategy is highly correlated with students’ level of vocabulary learning. Literature [24] combines qualitative techniques with a descriptive research design, taking English teachers as subjects and analyzing documents, interviews, and observations. It reveals that English teachers at MTsS Siulak Gedang chose tools such as dictionaries and translation techniques to teach English vocabulary, and the teachers indicated that they selected these strategies according to the principles of appropriateness and simplicity.
Literature [25] describes that English vocabulary teaching in Chinese university vocabulary teaching classrooms is mainly rote memorization and passive learning with a lack of opportunities for practice and interaction, and proposes the introduction of the CLT approach to English vocabulary teaching to help students build vocabulary, and points out that this strategy can also help to improve the students’ practical performance in real-life linguistic and English language contexts. Literature [26] designed a vocabulary learning application based on the mobile platform of cell phones and discussed in detail the development and design process, aiming to meet the needs of students’ English vocabulary learning in any space and at any time. Using semi-structured interviews, [27] revealed that strategies such as shared learning, feedback, and peer assessment can help improve students’ vocabulary knowledge in English vocabulary teaching practice. Literature [28] discusses an educational game teaching model centered on the Android platform, which has positive significance in promoting vocabulary enhancement for English learners, and considers this game education model an innovative teaching practice that provides an important reference for the optimization of English vocabulary teaching.
This paper first introduces the definition and method of DBSCAN clustering and evaluates the performance of the clustering method using one internal and five external clustering evaluation indexes. Then the decision tree classification algorithm is introduced; building on the most common ID3 algorithm, the C4.5 algorithm is adopted as the English vocabulary learning difficulty classification algorithm. To address the long computation time of decision tree construction in the C4.5 algorithm, a balance coefficient is introduced for optimization, so as to carry out the prediction of English vocabulary learning difficulty. Finally, a method for adjusting English vocabulary teaching strategies is proposed, and the improved teaching strategies are applied in practice to test and evaluate the method presented in this paper.
Cluster analysis [29] partitions data by similarity, so that the data points within a group are highly similar while the similarity between groups is small. Assume that there exists a data set 
Evaluation metrics for cluster analysis currently include internal evaluation metrics and external evaluation metrics. Several clustering evaluation metrics are described below: the silhouette coefficient (SC), the normalized mutual information (NMI), the Jaccard coefficient (Jac), the Rand index (RI), the F-Measure (FM), and the F1 index (F1).
The silhouette coefficient (SC) is a typical internal evaluation metric for clustering. Suppose, dataset 
Where point 
Secondly, the silhouette coefficient 
Finally, the silhouette coefficient (SC) of dataset 
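As a minimal sketch of the internal metric described above, the following uses the standard silhouette definition: for each point, `a` is the mean distance to the other members of its own cluster, `b` is the smallest mean distance to any other cluster, and the point's score is `(b - a) / max(a, b)`. The function name, the Euclidean distance, and the convention that a singleton cluster scores 0 are assumptions of this sketch and may differ from the paper's notation.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette score over all points (standard definition, Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette coefficient needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_coefficient(toy_points, [0, 0, 1, 1])
bad = silhouette_coefficient(toy_points, [0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters; a labeling that mixes the two groups, as in `bad`, yields a negative score.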
Normalized mutual information (NMI) is a typical external evaluation metric. Suppose, 
The four evaluation metrics, Jaccard coefficient (Jac), Rand index (RI), F-Measure (FM), and F1 index (F1), are classical, simple, and commonly used external evaluation metrics. Suppose, there are data points 
The Jaccard coefficient (Jac) can be calculated by Equation (8); the larger its value, the better the clustering effect.
The Rand index (RI) can be calculated by Equation (9); the larger its value, the better the clustering effect.
The 
F-Measure (FM) uses Precision (Pre) and Recall (Rec) together as an index to evaluate the performance of clustering. Where 
F-Measure (FM) can be calculated by Equation (13); the larger its value, the better the clustering effect.
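The pair-counting metrics above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact formulas: every pair of points is classified by whether it shares a cluster in the reference labeling and in the clustering result, and the names `ss`, `sd`, `ds`, `dd` as well as the weight `beta` (with `beta = 1` giving F1) are chosen here for the example.

```python
from itertools import combinations


def pair_counts(truth, pred):
    """Count point pairs by agreement between reference labels and cluster labels."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            ss += 1  # together in both
        elif same_pred:
            sd += 1  # together in the clustering, apart in the reference
        elif same_truth:
            ds += 1  # apart in the clustering, together in the reference
        else:
            dd += 1  # apart in both
    return ss, sd, ds, dd


def external_metrics(truth, pred, beta=1.0):
    """Jaccard, Rand index, pairwise precision/recall and F-measure."""
    ss, sd, ds, dd = pair_counts(truth, pred)
    total = ss + sd + ds + dd
    jac = ss / (ss + sd + ds) if (ss + sd + ds) else 0.0
    ri = (ss + dd) / total if total else 0.0
    pre = ss / (ss + sd) if (ss + sd) else 0.0
    rec = ss / (ss + ds) if (ss + ds) else 0.0
    denom = beta ** 2 * pre + rec
    fm = (1 + beta ** 2) * pre * rec / denom if denom else 0.0
    return {"Jac": jac, "RI": ri, "Pre": pre, "Rec": rec, "FM": fm}


# Hypothetical labels: the clustering wrongly merges one point into cluster 0.
metrics = external_metrics([0, 0, 1, 1], [0, 0, 0, 1])
```

For this toy input the pair counts are `ss=1, sd=2, ds=1, dd=2`, so the Jaccard coefficient is 0.25, the Rand index 0.5, and F1 0.4.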
DBSCAN [30] is a pioneering and popular algorithm for density-based cluster analysis. It can recognize clusters of any shape without the number of clusters being set in advance and is robust to outliers, so in this paper it is applied to English vocabulary learning difficulty clustering. The relevant definitions of the DBSCAN algorithm are as follows:
Definition 1.
Definition 2. Density: the density of data point
Definition 3. Core point: point
Definition 4. Boundary Points: a point
Definition 5. Noise Point: If point
Definition 6. Direct Density Reachable: a point
Definition 7. Density Reachability: If there exists a series of data points
Definition 8. Density connected: Point
Definition 9. Clustering: a cluster
(1) For any point
(2) For any
The main idea of the DBSCAN algorithm is to randomly select a point from the dataset at point 
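The core, boundary and noise point notions listed above can be illustrated with a minimal pure-Python sketch of the standard DBSCAN procedure. The parameter names `eps` (neighborhood radius) and `min_pts` (density threshold), the Euclidean distance, and the brute-force neighbor search are assumptions of this sketch; the paper's own parameter settings are not given here.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # eps-neighborhood of point i, including i itself
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a boundary point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # boundary point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is core: keep expanding
    return labels


# Hypothetical toy data: two dense groups and one outlier.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
toy_labels = dbscan(toy, eps=1.5, min_pts=3)
```

On this toy input the two dense groups receive labels 0 and 1 and the isolated point is marked as noise, without the number of clusters being specified in advance.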
The decision tree algorithm is an inductive learning algorithm that infers a decision tree representation and its classification rules from a collection of unordered data samples.
The following is a general description of constructing a decision tree:
1) First, attribute selection is performed over all the training samples at the root node, and the best feature attribute is chosen as the root node.
2) According to this feature attribute, all training samples at this node are divided, and the resulting subsets of training samples are the best classification at this node.
3) Determine whether each divided subset of samples is correctly classified; if so, create a leaf node.
4) If some subsets of samples still cannot be assigned to the correct class, find the best feature for each of these subsets, continue to split them, and build the corresponding nodes.
5) The whole process is recursive. It terminates when all training samples are correctly classified or no features remain, so that every sample corresponds to a leaf node.
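The five steps above can be sketched as a short recursive function. This is a generic illustration assuming categorical attributes and information gain as the selection criterion (as in ID3); the attribute names and sample data are hypothetical and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the largest information gain."""
    base = entropy(labels)

    def gain(attr):
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [y for r, y in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return base - remainder

    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    if len(set(labels)) == 1:
        return labels[0]  # step 3: subset is pure, create a leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # step 5: no features left
    attr = best_attribute(rows, labels, attributes)  # steps 1 and 4
    remaining = [a for a in attributes if a != attr]
    node = {"attribute": attr, "children": {}}
    for value in sorted(set(r[attr] for r in rows)):  # step 2: split on the attribute
        idx = [i for i, r in enumerate(rows) if r[attr] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return node


def predict(tree, row):
    # Walk down until a leaf (a class label) is reached; an attribute value
    # not seen during training raises KeyError in this sketch.
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attribute"]]]
    return tree


# Hypothetical samples: difficulty depends only on word frequency.
rows = [
    {"length": "short", "frequency": "high"},
    {"length": "long", "frequency": "high"},
    {"length": "short", "frequency": "low"},
    {"length": "long", "frequency": "low"},
]
labels = ["easy", "easy", "difficult", "difficult"]
tree = build_tree(rows, labels, ["length", "frequency"])
```

Here the root splits on `frequency`, because that attribute alone separates the two classes, and both children are leaves.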
This section focuses on a brief introduction of the research status of the above aspects.
In the process of constructing a decision tree, the core problem of the whole algorithm is to select the nodes of the decision tree. Information gain and information gain rate are attribute selection metrics based on information entropy, so information entropy is introduced first.
Information entropy is a measure of the degree of disorder in a set of information. If the classes inside a data subset are uniformly mixed, the information entropy is high; if the subset is dominated by a single class, the information entropy is low. Its specific definition is as follows:
Let 
Information gain refers to the difference between the original information of the data set and the information after categorization, and the calculation process of information gain is as follows:
Let attribute 
Then the information gain obtained after partitioning the sample set for the current node according to the 
The information gain rate is the normalization of the information gain, and the normalization of the information gain uses the concept of split information.
Attribute 
Equation (18) is the formula for the gain rate of attribute 
The 
The 
If the total data set contains more cluttered categories, the 
Scholars have proposed pruning techniques to improve the overfitting situation. Currently, the commonly used pruning methods are pre-pruning and post-pruning.
Pre-pruning method is to prune the decision tree by stopping the construction of the tree in advance, which is not often used because the threshold of stopping cannot be obtained in advance.
Post-pruning methods are used to simplify large-scale decision trees by pruning the branches of certain nodes after the decision tree has grown completely.
The C4.5 algorithm is an optimized version of the ID3 algorithm, as it is more accurate and faster than ID3. The decision tree created by the C4.5 algorithm can be used for classification prediction. The main improvements of the C4.5 algorithm over ID3 are as follows.
(1) The C4.5 algorithm is able to handle missing values in the training data.
(2) There is a more sophisticated approach to the pruning process of decision trees.
(3) The C4.5 algorithm is able to handle continuous data by discretizing the attributes of continuous types.
The main difference between the C4.5 algorithm and the ID3 algorithm [31] lies in the attribute selection metric: the ID3 algorithm uses information gain, whereas the C4.5 algorithm uses the information gain rate. In addition, the C4.5 algorithm can process continuous attributes after first discretizing them.
The C4.5 algorithm uses the information gain rate to select the split attribute of the current node, which counteracts the tendency of information gain to favor attributes with many distinct values.
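The bias of information gain toward many-valued attributes, and how the gain rate counteracts it, can be seen in a small sketch. It uses the standard definitions (entropy, information gain, split information, and gain rate as gain divided by split information); the function names and the toy attributes `word_id` and `frequency` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def information_gain(attr_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    n = len(labels)
    groups = {}
    for v, y in zip(attr_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def split_information(attr_values):
    # Entropy of the partition induced by the attribute itself
    return entropy(attr_values)


def gain_ratio(attr_values, labels):
    split = split_information(attr_values)
    return information_gain(attr_values, labels) / split if split > 0 else 0.0


# Hypothetical data: eight words, four easy (E) and four hard (H).
labels = ["E", "E", "E", "E", "H", "H", "H", "H"]
word_id = list(range(8))                                   # unique per sample
frequency = ["hi", "hi", "hi", "hi", "lo", "lo", "lo", "hi"]

gain_id, gain_freq = information_gain(word_id, labels), information_gain(frequency, labels)
ratio_id, ratio_freq = gain_ratio(word_id, labels), gain_ratio(frequency, labels)
```

Information gain prefers the meaningless `word_id` attribute (gain 1.0 against roughly 0.55 for `frequency`), while the gain rate prefers `frequency` (roughly 0.58 against 1/3), because the split information of `word_id` is log2(8) = 3.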
The flowchart of C4.5 algorithm is shown in Fig. 1.

C4.5 algorithm flowchart
The calculation of the information gain rate is actually a process of normalizing the information gain, and the concept of split information is used in the calculation of the information gain rate.
The split information of 
The C4.5 algorithm information gain rate is calculated as:
For the problem of high time complexity of C4.5, the concept of equivalent infinitesimal is utilized to reduce the computation time of the decision tree. In addition, due to the application of this algorithm to the prediction of English vocabulary learning difficulty, in order to improve the accuracy of the algorithm, a balancing coefficient 
The concept of equivalent infinitesimal in Taylor’s formula is introduced to reduce the computational cost of the decision tree, so as to achieve the purpose of saving the decision tree generation time. According to the definition of Taylor’s series [32] a tetrad of infinitely differentiable functions 
Then when 
So when the value of 
The simplification leads to the simplified C4.5 algorithm’s information gain rate formula (25):
After simplification, the formula is changed from logarithmic operation to the basic operation of addition, subtraction, multiplication and division, which improves the efficiency of the algorithm.
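The idea behind replacing a logarithm with arithmetic can be illustrated numerically. The sketch below does not reproduce formula (25); it only assumes the familiar equivalent infinitesimal ln(1 + x) ~ x as x approaches 0, which follows from the first-order Maclaurin expansion, and shows how the relative error shrinks as x becomes small.

```python
import math


def log1p_first_order(x):
    """First-order Maclaurin approximation: ln(1 + x) is close to x for small x."""
    return x


def relative_error(x):
    exact = math.log1p(x)
    return abs(log1p_first_order(x) - exact) / abs(exact)


for x in (0.5, 0.1, 0.01, 0.001):
    print(f"x={x:<6} ln(1+x)={math.log1p(x):.6f} "
          f"approx={log1p_first_order(x):.6f} rel.err={relative_error(x):.4%}")
```

The approximation is only reliable when the argument is small, so any speed-up obtained this way trades some accuracy for cheaper arithmetic.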
In the process of calculating the information gain rate, a balancing degree factor 
Among them:
Then after simplification:
In order to avoid the influence of subjective factors, different scores are set according to the importance of different English vocabularies 
The vector features of the seven dimensions of English vocabulary with different learning difficulties in the dataset were input into the DBSCAN algorithm, and the grid search was carried out by comprehensively considering the number of clusters and the DBI metrics, and finally all the words in the dataset were clustered into six clusters, and the results are shown in Table 1.
Word memory retrieval difficulty clustering results
| Cluster | 0 | 1 | 2 | 3 | 4 | 5 | 
|---|---|---|---|---|---|---|
| Quantities | 233 | 17 | 4 | 6 | 6 | 4 | 
| 1 try | 0.31 | 1.28 | 0.28 | 0.02 | 0.21 | 0.41 | 
| 2 tries | 5.02 | 14.51 | 1.96 | 1.35 | 4.02 | 10 | 
| 3 tries | 22.73 | 33.28 | 10.27 | 9 | 25.27 | 37.04 | 
| 4 tries | 34.82 | 30.22 | 24.26 | 26.01 | 40.97 | 35.01 | 
| 5 tries | 24.52 | 15.25 | 31.81 | 36.41 | 22.54 | 13.21 | 
| 6 tries | 10.65 | 5.19 | 25.25 | 23.01 | 6.19 | 3.2 | 
| 7 or more | 1.74 | 0.84 | 6.5 | 3.83 | 0.79 | 0.01 | 
Clusters with very similar seven-dimensional vectors of English vocabulary learning difficulty were merged pairwise to obtain three categories of word memory retrieval difficulty: easy, medium, and difficult. The distribution of the mean seven-dimensional vectors of the three difficulty categories is shown in Figure 2.

The average profile of different difficult words
The prediction for the word “cockroach” is “medium”, with an average of 4.49 guesses. Figure 3 shows the distribution of the seven-dimensional vector features of “cockroach” compared with those of words of different difficulties. The word “cockroach” lies toward the harder end, but its profile still differs considerably from that of the difficult words.

The number of word guesses and different difficulty comparisons
The model prediction results were evaluated using the mean absolute error (MAE) and the goodness of fit (
The number of decision trees is set to 1000 and the feature selection ratio is 40%. Train the set of classification models of this paper and calculate their MAE and 
Model set error evaluation index
| Tries | 1 | 2 | 3 | 4 | 5 | 6 | 7+ | 
|---|---|---|---|---|---|---|---|
| MAE | 0.19 | 0.87 | 1.83 | 1.14 | 1.15 | 1.42 | 0.63 | 
| R2 | 0.85 | 0.92 | 0.94 | 0.92 | 0.92 | 0.92 | 0.91 | 
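For reference, the two error indexes reported in Table 2 can be computed as in the following sketch, which uses the standard definitions of the mean absolute error and the coefficient of determination. The function names and the sample values are hypothetical and unrelated to the paper's data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
mae = mean_absolute_error(observed, predicted)
r2 = r_squared(observed, predicted)
```

For these values the MAE is 0.25 and R2 is 0.9; a smaller MAE and an R2 closer to 1 indicate a better fit.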
Accuracy, precision, recall, and 
Leave 25% of the samples as the test set and retrain the classification model. Predicting the samples in the test set gives an accuracy of 0.988 for the random forest classification prediction model; the other generalization ability evaluation indexes are shown in Table 3. All of them are above 0.8. The recall of the easy category is slightly worse, at only 0.81, but the overall performance of the model is excellent.
Classification prediction model generalization ability evaluation index
| Category | Precision | Recall | F1 | Sample size |
|---|---|---|---|---|
| Easy | 1 | 0.81 | 0.92 | 9 |
| Medium | 0.97 | 1 | 0.97 | 60 |
| Difficult | 1 | 0.99 | 0.98 | 6 |
| Macro average | 0.98 | 0.95 | 0.99 | 70 |
| Weighted average | 0.98 | 0.98 | 0.97 | 70 |
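The per-category indexes and the two averages in Table 3 follow the usual definitions, which the sketch below spells out. The function name and the toy labels are hypothetical; the macro average weights each category equally, while the weighted average weights each category by its sample size.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Per-class precision, recall and F1, plus macro and weighted averages."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = {"precision": precision, "recall": recall,
                        "f1": f1, "support": support[c]}

    n = len(y_true)
    macro, weighted = {}, {}
    for key in ("precision", "recall", "f1"):
        macro[key] = sum(m[key] for m in per_class.values()) / len(classes)
        weighted[key] = sum(m[key] * m["support"] for m in per_class.values()) / n
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return per_class, macro, weighted, accuracy


# Hypothetical test labels: one easy word is misclassified as medium.
truth = ["easy", "easy", "medium", "medium", "medium", "difficult"]
preds = ["easy", "medium", "medium", "medium", "medium", "difficult"]
per_class, macro, weighted, accuracy = classification_report(truth, preds)
```

In this toy case the easy category has precision 1.0 but recall 0.5, the same pattern (high precision, lower recall) that the table shows for its smallest categories; the weighted-average recall always equals the overall accuracy.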
Robustness refers to how well the model tolerates changes in the data. A model is robust if small deviations in the data have only a small effect on the model’s output. After deleting all structural information of the first and last letters in the 142-dimensional sparse matrix, the C4.5 model set of “4 tries” is trained and the new model is compared with the original model to obtain the results in Fig. 4. Figure 4 shows that after removing 26 features, the R2 of the model does not change much. This indicates that the regression model is robust.

Test-set R2 of the model after removing part of the input
Each of the seven dimensions is deleted in turn, and the remaining six-dimensional vector is input into the classification prediction model. The results are shown in Table 4; all the indexes are above 0.9, which indicates that the C4.5 classification prediction model is robust.
Evaluation of the model after removing part of the input
| Delete | Accuracy | Precision | Recall | F1 | 
|---|---|---|---|---|
| - | 0.99 | 0.98 | 0.99 | 0.97 | 
| 1 | 0.96 | 0.95 | 0.93 | 0.93 | 
| 2 | 0.95 | 0.99 | 0.97 | 0.95 | 
| 3 | 0.99 | 0.99 | 0.99 | 0.99 | 
| 4 | 0.94 | 0.96 | 0.96 | 0.95 | 
| 5 | 0.99 | 0.99 | 0.99 | 0.99 | 
| 6 | 0.98 | 0.98 | 0.99 | 0.95 | 
| 7 | 0.98 | 0.99 | 0.99 | 0.99 | 
Guided by the OBE (outcome-based education) concept, English teaching in higher vocational colleges should uphold the principle of “student-oriented and result-oriented”, clarify educational objectives, make the classroom fully effective, and build a reasonable teaching assessment system to promote students’ continuous progress. The following discusses vocabulary teaching strategies for higher vocational English based on the OBE concept from several aspects.
First, begin with the end in mind and establish teaching objectives. Under the OBE teaching mode, when teachers set vocabulary teaching goals, they should take students’ expected performance as both the starting point and the end point, and use it as a guide to plan the teaching content and methods.
Second, focus on the outcomes and determine the teaching process. Teachers should make detailed and clear teaching plans at the beginning of each class so that students clearly understand their own learning objectives and expected results.
Third, understand the needs and determine the teaching strategies. In teaching practice, teachers need to deeply understand the unique personality and ability of each student, so that they can adjust their teaching strategies in a targeted manner and operate in a diversified teaching mode that is more adaptable to individual needs.
First, multimodal learning. English teaching in higher vocational colleges and universities has widely used the “multimodal” learning strategy, which aims at mobilizing multi-sensory experiences such as visual, auditory and tactile senses in order to strengthen students’ language skills.
Second, vocabulary contextualization. English vocabulary contextualization refers to learning in a specific context, so that students can learn and understand vocabulary in the actual environment. In this way, it can not only enhance students’ memory of new vocabulary, but also enable them to understand the meanings and usages of these vocabulary words in real contexts, matching the OBE teaching concept of focusing on effectiveness.
Third, vocabulary interactive exercises. As a teaching strategy that integrates language communication, cooperation, and competition, vocabulary interactive practice can achieve student-centered interactive practice and enhance students’ learning interest and effectiveness.
Fourth, vocabulary classification and generalization. Vocabulary classification and induction is an effective learning strategy in higher vocational English courses, the core concept of which is to guide students to systematize and memorize new vocabulary in a logical order in order to improve the efficiency of memorization and learning effectiveness.
First, design an evaluation index system. In order to deeply and systematically evaluate the effectiveness of higher vocational English teaching practice based on the OBE education concept, teachers should construct a comprehensive evaluation system, which covers three dimensions: “learning outcomes”, “teaching process” and “teaching environment”.
Second, combine qualitative and quantitative assessment. To ensure that course effects are assessed comprehensively and accurately, teachers should adopt a combination of qualitative and quantitative assessment methods. In qualitative assessment, relevant data can be collected in diversified ways, such as classroom observation and evaluation activities and interviews with teachers and students, in order to understand in depth how teachers and students actually experience the education model. The information collected can then be analyzed and compared in depth to make clear the strengths and weaknesses of this education method.
Third, analyze the evaluation results and verify effectiveness. The analysis of the evaluation results and the verification of effectiveness should start from the following aspects: evaluate the effect of the education model in improving students’ practical English ability and cultivating their self-directed learning and teamwork ability; ensure that the classroom teaching strategies are fully adapted to the actual needs of students and stimulate their initiative and enthusiasm; focus on the contribution of the curriculum to shaping students’ innovative thinking and practical skills; and emphasize the potential of the education model to improve the quality of teachers, especially their professional quality.
Fourth, self-referential assessment is emphasized. Self-referential assessment is an emerging mode of assessment in the current field of education, which encourages students to look at their own learning process from a developmental perspective, based on their current level of understanding of the subject matter, their mastery of skills, and the dynamics of their feelings and emotions.
To test whether the vocabulary learning attitudes of the students in the experimental class changed significantly after the experiment, the author administered an attitude scale before and after the experiment. Fifty scales were distributed and all 50 were recovered, a recovery rate of 100%. The collected data were analyzed using the paired-samples t-test in SPSS; the descriptive statistics and the paired-samples t-test results are shown in Table 5 and Table 6, respectively. The descriptive statistics show that the average score of the experimental class on the learning attitude scale was 71.352 in the pre-test and 78.045 in the post-test, an increase of 6.693 points. In the paired-samples t-test, the two-sided significance value (p-value) is reported as 0.000, which is less than 0.05. This indicates that the difference in the vocabulary learning attitudes of the experimental class before and after the experiment is significant, i.e., after the application of the improved C4.5 algorithm and the adjusted teaching strategies, the students’ attitudes toward English vocabulary learning changed markedly.
Descriptive statistics of the learning attitude scale
| Scale | Phase | Mean | N | SD | SEM |
|---|---|---|---|---|---|
| Learning attitude | Pre-exp. | 71.352 | 50 | 9.451 | 1.289 |
| Learning attitude | After-exp. | 78.045 | 50 | 11.568 | 1.536 |
Paired-samples t-test of the learning attitude scale before and after the experiment
| Pair | Mean difference | SD | SEM | 95% CI lower limit | 95% CI upper limit | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|
| (Pre-exp.)-(After-exp.) | -6.1694 | 7.1562 | 0.9845 | -8.1456 | -4.1977 | -6.279 | 49 | 0.000 |
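The quantities in a paired-samples t-test table can be reproduced from raw scores as in the sketch below. It assumes the standard procedure (differences taken as pre minus post, sample standard deviation, t equal to the mean difference divided by its standard error); the function name and the five pairs of scores are hypothetical, and the p-value is omitted because it requires the t distribution, which SPSS or a statistics library would supply.

```python
import math
from statistics import mean, stdev


def paired_t_test(pre, post):
    """Mean difference, SD, SEM, t statistic and degrees of freedom for paired scores."""
    diffs = [a - b for a, b in zip(pre, post)]  # (pre) - (post) sign convention
    n = len(diffs)
    mean_diff = mean(diffs)
    sd = stdev(diffs)                # sample standard deviation (n - 1)
    sem = sd / math.sqrt(n)          # standard error of the mean difference
    return {"mean_diff": mean_diff, "sd": sd, "sem": sem,
            "t": mean_diff / sem, "df": n - 1}


# Hypothetical pre-test and post-test scores for five students.
pre_scores = [70, 72, 68, 75, 71]
post_scores = [74, 75, 70, 80, 73]
result = paired_t_test(pre_scores, post_scores)
```

A negative mean difference under this sign convention means that the post-test scores are higher than the pre-test scores.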
The learning attitude scale used for the experimental class was adopted from a previous scale and investigates four aspects: attitude during vocabulary learning, attitude during vocabulary recognition, attitude during vocabulary review, and frequency of vocabulary review. The pre- and post-test scores of each student on each dimension were analyzed with the paired-samples t-test in SPSS in order to understand how students’ attitudes toward vocabulary learning changed in these four dimensions before and after the experiment. The results of descriptive statistics and paired-samples t-test for each dimension are shown in Tables 7 and 8, respectively.
Descriptive statistics for each dimension before and after the experiment
| Dimension | Phase | Mean | N | SD | SEM |
|---|---|---|---|---|---|
| Lexical acquisition | Pre-exp. | 29.548 | 50 | 4.6258 | 0.6358 |
| Lexical acquisition | After-exp. | 31.056 | 50 | 5.4698 | 0.7589 |
| Memorize words | Pre-exp. | 20.456 | 50 | 4.0658 | 0.5546 |
| Memorize words | After-exp. | 21.598 | 50 | 4.2879 | 0.5839 |
| Main ways of reviewing words | Pre-exp. | 11.395 | 50 | 2.7892 | 0.3476 |
| Main ways of reviewing words | After-exp. | 13.245 | 50 | 3.2689 | 0.4215 |
| Frequency of reviewing words | Pre-exp. | 8.657 | 50 | 1.4539 | 0.2103 |
| Frequency of reviewing words | After-exp. | 10.698 | 50 | 1.4258 | 0.1987 |
Paired-samples t-test for each dimension before and after the experiment
| Dimension | Pair | Mean difference | SD | SEM | 95% CI lower limit | 95% CI upper limit | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|---|
| Lexical acquisition | (Pre-exp.)-(After-exp.) | -1.6985 | 5.0598 | 0.6985 | -3.2655 | -0.3036 | -2.436 | 49 | 0.019 |
| Memorize words | (Pre-exp.)-(After-exp.) | -1.3356 | 4.0987 | 0.5698 | -2.4561 | -0.2152 | -2.368 | 49 | 0.023 |
| Main ways of reviewing words | (Pre-exp.)-(After-exp.) | -1.3025 | 3.1456 | 0.4316 | -2.1625 | -0.4358 | -3.025 | 49 | 0.006 |
| Frequency of reviewing words | (Pre-exp.)-(After-exp.) | -1.6548 | 1.9784 | 0.2874 | -2.2358 | -1.1167 | -6.136 | 49 | 0.000 |
The differences between the mean post-test and pre-test scores of the experimental class on the four dimensions of the learning attitude scale are 1.508, 1.142, 1.85, and 2.041, respectively. In other words, after the experiment the students’ scores on all four dimensions, namely attitude when learning vocabulary, attitude when recognizing vocabulary, attitude when reviewing vocabulary, and frequency of reviewing vocabulary, are higher than in the pre-test. In the paired-samples t-tests of the pre- and post-test scores on each dimension, the two-sided significance values (p-values) are 0.019, 0.023, 0.006 and 0.000, respectively, which indicates that the scores of the experimental class on these four dimensions differ significantly before and after the experiment. That is, English vocabulary teaching based on the improved C4.5 algorithm and the adjusted teaching strategies has a positive effect on students’ attitudes when learning vocabulary, attitudes when recognizing words, ways of reviewing vocabulary, and frequency of reviewing words.
In this paper, we propose an English vocabulary learning difficulty prediction method based on DBSCAN clustering and the C4.5 classification prediction algorithm. The model predicts a seven-dimensional vector representation of English vocabulary learning difficulty as well as a more concise difficulty classification into three categories: “easy”, “medium”, and “difficult”. The experiments show that the model has very good goodness of fit and prediction accuracy, with a prediction accuracy of 0.988. In addition, the model is robust as a whole: its R2 does not change much after some of the features are deleted.
Applying the methodology of this paper to English vocabulary teaching practice, we propose a method for adjusting English vocabulary teaching strategies. The results show that the teaching strategy based on the OBE education concept can effectively improve the level of English vocabulary teaching and students’ attitudes towards vocabulary learning. The four dimensions of vocabulary learning attitude examined also improved significantly: compared with the pre-test, the average post-test scores for attitude when learning vocabulary, attitude when recognizing vocabulary, attitude when reviewing vocabulary, and frequency of reviewing vocabulary rose by 1.508, 1.142, 1.85, and 2.041 points, respectively.
