Research on the Optimization of English Teaching Mode and Personalized Learning Path in Colleges and Universities Based on Big Data Regression Analysis

An innovative teaching mode is key to improving teaching quality and cultivating students’ core competitiveness, so exploring and applying data-driven innovation in teaching has become crucial. Through data analysis, teachers can gain a deep understanding of students’ learning needs, progress and styles, and can then design and adjust teaching content and methods in a targeted way. This makes personalized teaching possible and helps each student achieve better learning outcomes [1-4]. Realizing such innovation, however, requires overcoming several challenges: continuously improving the technical infrastructure, ensuring data security and privacy protection, and raising teachers’ professionalism and level of training [5]. Only in this way can colleges and universities meet the challenges facing education and create a more valuable environment for students’ learning and development.

In the era of big data, innovation in college English teaching has attracted extensive attention from both academia and educators. With the rapid development of information technology, traditional English teaching methods are increasingly lagging behind and cannot fully meet students’ language learning needs in a context of diversified disciplines and globalization [6-7]. The spread of big data has brought an unprecedented shift in educational methods, and how to make full use of these technical means to improve the quality of college English teaching is an urgent problem. The changes brought by the big data era are not only technological updates but also a challenge to traditional concepts and methods of English teaching. Students’ disciplinary backgrounds are increasingly diverse, and society’s demands for English application ability and practical talent are more complex, which requires college English teaching to be more adaptable and innovative [8-10]. In-depth research on college English teaching in the big data era is therefore imperative, both to provide more targeted educational programs and to promote students’ overall development in language proficiency, innovative thinking, personalized learning and other aspects [11-14].

Literature [15] analyzed an English teaching model optimized with LAN-based computing, covering classroom structure, classroom practice, the teaching system and classroom construction strategy; the optimized model significantly improved teaching efficiency and teacher-student communication. Literature [16] built a user-driven, rapid-response knowledge space that integrates multimodal subject knowledge resources, optimizing a student-oriented English teaching mode under the guidance of educational objectives and increasing classroom interaction. Literature [17] processed college English teaching data with neural network algorithms and achieved innovation and optimization of English teaching by simulating cognitive processes. Literature [18] explored optimizing the teaching mode by combining deep learning with blended English teaching; by analyzing the principles for integrating the two, it showed the importance of balancing online English resources with traditional teaching, but the combined mode showed no distinctive advantages over the traditional one. Existing work on optimizing English teaching thus mainly improves teaching interaction and efficiency or innovates the teaching mode itself, and offers few concrete references on how the teaching mode should be optimized.

In addition, literature [19] used a recurrent neural network tuned by an improved fruit fly (Drosophila) optimization algorithm to personalize students’ English learning: English learning data are analyzed and evaluated, features are extracted, and personalized learning paths are recommended on that basis. Literature [20] studied learning supported by intelligent education technology in business English using an explanatory sequential mixed-methods design, showing that personalized learning concepts are positively related to students’ motivation, commitment and achievement, and that intelligent education technology promotes personalized teaching. Literature [21] used big data technology to study personalized English teaching strategies and learning analytics, mainly applying data mining to analyze how well students absorb content and evaluating the personalized learning effect of a WeChat-based personalized network platform, but did not describe the personalized path itself. As this review shows, research on learning path planning remains relatively scarce. Literature [22] explored it through self-mapping learning path theory, in which students develop multi-level personalized paths based on their own learning situation and the course design. However, such paths place high demands on students’ self-control, prior knowledge and self-understanding, and are impractical for students who are weak in these respects. A more general personalized learning path is therefore needed to meet the needs of students at different stages and achieve personalized learning in the true sense.

Big data regression analysis explores the relationships between two or more variables by constructing a numerical model that predicts the value of the dependent variable from known values of the independent variables [23]. It therefore offers strong guidance for optimizing the teaching mode and planning personalized learning paths when English teaching is influenced by multiple factors.

Building on the logistic regression model, this paper constructs an ordered multicategorical logistic regression model and uses it to analyze the factors influencing college English teaching. It then integrates modern and traditional English teaching modes and proposes a learner state model based on online learning behavior, which judges each learner’s state and recommends a learning path suited to that state. Finally, an accurate personalized learning path based on learner state is designed.

2

Optimization of English Teaching in Colleges and Universities Based on Big Data Regression Analysis

2.1

Big data regression modeling

2.1.1

Mathematical principles

Logistic regression predicts the probability that an event occurs given a set of input variables. It can also be viewed as a decision between two opposing events, such as A occurring and A not occurring: if the predicted probability of A exceeds a manually preset threshold, A is judged to occur; otherwise, A is judged not to occur. Logistic regression requires little time both for training on the training set and for prediction, which is a considerable advantage over iteratively optimized learning algorithms such as support vector machines and neural networks [24].

Logistic regression is most commonly applied to binary classification, where it distinguishes only between class 0 and class 1. The class probabilities are:

$$P(Y = 1 \mid x, w) = \frac{e^{w \cdot x + b}}{1 + e^{w \cdot x + b}} = \frac{1}{1 + e^{-(w \cdot x + b)}} \tag{1}$$

$$P(Y = 0 \mid x, w) = \frac{1}{1 + e^{w \cdot x + b}} \tag{2}$$

where:

w - model weight coefficients.

x - Input variables.

Y - Output results.

b - bias.

Here x ∈ Rⁿ, w ∈ Rⁿ, b ∈ R and Y ∈ {0, 1}. x is the vector of input features of an example, w and b are the model parameters, Y is the model output, and P(Y = 0) and P(Y = 1) are the confidence that the output belongs to class 0 and class 1, respectively.

When there are multiple inputs, the weight vector and the input vector are augmented so that the bias is absorbed into them; in this paper they are still denoted w and x: $w = (w^{1}, w^{2}, \dots, w^{n}, b)^{T}$, $x = (x^{1}, x^{2}, \dots, x^{n}, 1)^{T}$, where $w^{i}$ and $x^{i}$ denote the i-th components of w and x. The logistic regression model then becomes:

$$P(Y = 1 \mid x, w) = \frac{1}{1 + e^{-w^{T} x}} \tag{3}$$

$$P(Y = 0 \mid x, w) = \frac{1}{1 + e^{w^{T} x}} \tag{4}$$

For convenience, both equations can be expressed through a single function, since $P(Y = 1 \mid x, w) = f(x)$ and $P(Y = 0 \mid x, w) = 1 - f(x)$:

$$f(x) = \frac{1}{1 + e^{-w^{T} x}} \tag{5}$$

Equation (5) is known as the logistic regression function.
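The following minimal Python sketch evaluates Eqs. (3) to (5) for augmented vectors w and x (bias folded into the last component, as defined above). The function names `logistic` and `class_probabilities` are illustrative and not from the paper; the branch on the sign of z only avoids floating-point overflow and does not change the value computed.

```python
import math

def logistic(z: float) -> float:
    """Logistic function 1 / (1 + e^{-z}) of Eq. (5), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^{z} / (1 + e^{z}).
    ez = math.exp(z)
    return ez / (1.0 + ez)

def class_probabilities(w, x):
    """Return (P(Y=1|x,w), P(Y=0|x,w)) for augmented w = (w^1..w^n, b), x = (x^1..x^n, 1)."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    z = sum(wi * xi for wi, xi in zip(w, x))  # w^T x, which already includes the bias b
    p1 = logistic(z)                           # Eq. (3)
    return p1, 1.0 - p1                        # Eq. (4) equals 1 - Eq. (3)
```

For example, `class_probabilities([0.5, -0.25, 0.1], [2.0, 4.0, 1.0])` gives $w^{T}x = 0.1$, so P(Y = 1) is slightly above 0.5.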

2.1.2

Model training

To obtain the optimal coefficients of the logistic regression model, gradient descent is commonly used during training. At each iteration, the coefficients are moved against the gradient of the loss, which depends on the deviation between the predicted and actual results, and the size of each step is scaled by the learning rate (a preset parameter). After a number of iterations, the coefficients approach their optimal values.

Assume there are m samples, labeled only with classes 0 and 1; the model weights can then be trained by maximum likelihood estimation. Writing $P(Y = 1 \mid x_i, w)$ as $P_i$ and $P(Y = 0 \mid x_i, w)$ as $1 - P_i$, the probability of observation $y_i$ is:

$$P(y_i) = P_i^{y_i} \cdot (1 - P_i)^{1 - y_i} \tag{6}$$

where $y_i \in \{0, 1\}$ is the observed output. The likelihood function for the m samples is:

$$L(w) = \prod_{i=1}^{m} f(x_i)^{y_i} \cdot \left(1 - f(x_i)\right)^{1 - y_i} \tag{7}$$

Taking the logarithm gives the log-likelihood:

$$\begin{aligned} l(w) &= \sum_{i=1}^{m} \left( y_i \ln f(x_i) + (1 - y_i) \ln \left(1 - f(x_i)\right) \right) \\ &= \sum_{i=1}^{m} \left( y_i \ln \frac{f(x_i)}{1 - f(x_i)} + \ln \left(1 - f(x_i)\right) \right) \\ &= \sum_{i=1}^{m} \left( y_i \, w^{T} x_i - \ln \left(1 + e^{w^{T} x_i}\right) \right) \end{aligned} \tag{8}$$
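As a numerical sanity check on the algebra in Eq. (8), the sketch below computes the log-likelihood both from its first line and from its simplified last line; the two should agree up to rounding error. The function names and the toy weights and samples are made up for illustration.

```python
import math

def log_likelihood_direct(w, X, y):
    """First line of Eq. (8): sum of y ln f(x) + (1 - y) ln(1 - f(x))."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xij for wj, xij in zip(w, xi))
        f = 1.0 / (1.0 + math.exp(-z))
        total += yi * math.log(f) + (1 - yi) * math.log(1.0 - f)
    return total

def log_likelihood_simplified(w, X, y):
    """Last line of Eq. (8): sum of y w^T x - ln(1 + e^{w^T x})."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xij for wj, xij in zip(w, xi))
        total += yi * z - math.log1p(math.exp(z))
    return total

# Toy data: each row of X is an augmented input (feature, 1).
w = [0.3, -0.2]
X = [[1.0, 1.0], [2.0, 1.0], [-1.0, 1.0]]
y = [1, 0, 1]
print(log_likelihood_direct(w, X, y), log_likelihood_simplified(w, X, y))
```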

The loss function is defined from Eq. (8) as the negative average log-likelihood, $J(w) = -\frac{1}{m} l(w)$, i.e., the logarithmic loss. Using $\frac{\partial f(x_i)}{\partial w_j} = f(x_i)\left(1 - f(x_i)\right) x_i^{j}$, its partial derivative with respect to $w_j$ is:

$$\begin{aligned} \frac{\partial J(w)}{\partial w_j} &= -\frac{1}{m} \sum_{i=1}^{m} \left( y_i \frac{1}{f(x_i)} \frac{\partial f(x_i)}{\partial w_j} - (1 - y_i) \frac{1}{1 - f(x_i)} \frac{\partial f(x_i)}{\partial w_j} \right) \\ &= -\frac{1}{m} \sum_{i=1}^{m} \left( y_i - f(x_i) \right) x_i^{j} \\ &= \frac{1}{m} \sum_{i=1}^{m} \left( f(x_i) - y_i \right) x_i^{j} \end{aligned} \tag{9}$$

With this gradient, each weight is updated at every training iteration as:

$$w_j := w_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f(x_i) - y_i \right) x_i^{j} \tag{10}$$

Equation (10) is the weight update rule. After each round of predictions on the training samples, the deviation between predictions and labels is measured by the logarithmic loss, whose gradient is given by Eq. (9), and each weight $w_j$ is then adjusted according to this gradient. The learning rate α controls the step size, preventing over-correction (steps that are too large) or under-correction (steps that are too small).
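A minimal NumPy sketch of batch gradient descent following Eqs. (9) and (10) is given below. It assumes the augmented layout described above (the last column of X is 1), a fixed learning rate α and a fixed number of iterations; a practical implementation would normally add a convergence test. The function name and its defaults are hypothetical.

```python
import numpy as np

def fit_logistic_gd(X, y, alpha=0.1, n_iter=1000):
    """Batch gradient descent for logistic regression, Eqs. (9)-(10).

    X: (m, n+1) array whose last column is 1 (augmented bias input).
    y: (m,) array of 0/1 labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m = X.shape[0]
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        f = 1.0 / (1.0 + np.exp(-X @ w))  # f(x_i) for all samples at once
        grad = X.T @ (f - y) / m           # Eq. (9): (1/m) sum (f(x_i) - y_i) x_i^j
        w -= alpha * grad                  # Eq. (10)
    return w
```

For example, fitting `X = [[-2, 1], [-1, 1], [1, 1], [2, 1]]` with `y = [0, 0, 1, 1]` yields a positive slope weight, while the bias stays at zero because the data are symmetric about the origin.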

To prevent overfitting and improve the generalization of the logistic regression model, an L2-norm regularization term is often added during training. The L2 term is the sum of the squared weights, taken over $w_1, \dots, w_n$ so that the bias b (the last component of the augmented w) is not penalized:

$$L_2 = \sum_{j=1}^{n} w_j^{2} \tag{11}$$

With the L2 term added, the loss function becomes:

$$J(w) = -\frac{1}{m} \sum_{i=1}^{m} \left( y_i \ln f(x_i) + (1 - y_i) \ln \left(1 - f(x_i)\right) \right) + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^{2} \tag{12}$$

and the regularized gradient descent update for $j = 1, \dots, n$ becomes:

$$w_j := w_j - \alpha \left( \frac{1}{m} \sum_{i=1}^{m} \left( f(x_i) - y_i \right) x_i^{j} + \frac{\lambda}{m} w_j \right) \tag{13}$$

where λ is the regularization factor.
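The sketch below implements one regularized update in NumPy, using the gradient of the L2-penalized log loss, $\frac{1}{m}\sum_i (f(x_i) - y_i)\, x_i^{j} + \frac{\lambda}{m} w_j$. As an assumption consistent with penalizing only $w_1, \dots, w_n$, the last component of the augmented w (the bias b) is left unpenalized and therefore moves exactly as in Eq. (10). The function name is hypothetical.

```python
import numpy as np

def l2_regularized_step(w, X, y, alpha, lam):
    """One gradient step on the L2-penalized log loss.

    The last component of w is treated as the bias and is not penalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    m = X.shape[0]
    f = 1.0 / (1.0 + np.exp(-X @ w))
    grad = X.T @ (f - y) / m          # data term, as in Eq. (9)
    penalty = (lam / m) * w           # gradient of (lam / 2m) * sum w_j^2
    penalty[-1] = 0.0                 # leave the bias component unpenalized
    return w - alpha * (grad + penalty)
```

Compared with λ = 0, a positive λ subtracts an extra $\alpha \frac{\lambda}{m} w_j$ from each non-bias weight, pulling large weights toward zero.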

2.2

Analysis of Influencing Factors of English Teaching in Higher Education Institutions

2.2.1

Data sources

The questionnaires were distributed through the public online questionnaire platform Questionstar. A total of 545 questionnaires were returned between September and October 2023, of which 531 were valid, a validity rate of 97.43%. All respondents had at least half a year of experience studying in university English courses.

2.2.2

Questionnaire design

After defining each variable, the author designed the questionnaire on a 5-point scale with objective multiple-choice items in four sections: learner factors, teacher factors, online course factors and environmental factors. The items were adapted in light of the author’s own observations of English classroom teaching: learner factors cover learning motivation and learning strategies (six items); teacher factors cover professional literacy and teaching guidance (five items); course factors cover content form (five items); and environmental factors cover platform design and teaching interaction (six items). Before the formal survey, the initial scale was piloted on a small sample: the 50 questionnaires collected at this stage were factor analyzed, the items were revised according to the resulting categorization, and the final questionnaire was then administered to the participants.

2.2.3

Big data regression analysis

The ordered multicategorical logistic regression model is a probabilistic nonlinear regression model for analyzing the relationship between an ordered categorical dependent variable and multiple independent variables. It does not require the variables to be normally distributed, its independent variables may be continuous or discrete, and it is best suited to discrete dependent variables with ordered (graded) categories [25]. Its basic idea is to split the ordered categories of the dependent variable into two groups at each cut point and to fit a binary logistic regression model for each split.

Let the ordered dependent variable y take k categories 1, 2, ⋯, k. Denote the probability that y falls in category r by $\pi_r = P(y = r \mid x)$ and the cumulative probability by $p_r = P(y \le r \mid x)$, $r = 1, 2, \dots, k$, with $\sum_{r=1}^{k} \pi_r = 1$. Splitting the k categories into the two groups {1, 2, ⋯, s} and {s + 1, s + 2, ⋯, k} for each s = 1, 2, ⋯, k − 1 allows the ordered dependent variable to be handled with the binary logistic regression model. Hence k − 1 binary logistic regression equations are fitted, as shown in Eq. (14):

$$L_s = \ln \left( \frac{p_s}{1 - p_s} \right) = \alpha_s + \sum_{i=1}^{q} \beta_i x_i, \quad s = 1, 2, \dots, k - 1 \tag{14}$$

where $L_s$ is the s-th cumulative logit, $(x_1, x_2, \dots, x_q)$ is the vector of independent variables, q is the number of independent variables, $\alpha_s$ is the intercept parameter of the s-th split, and $\beta_i$ is the partial regression coefficient of $x_i$, shared by all k − 1 equations.

The model parameters can be estimated by maximum likelihood. Let $x_g$ (g = 1, …, G) denote the G distinct combinations of levels of x, and let $n_j(x_g)$ be the number of observations at level j of y under condition $x_g$. The log-likelihood function is:

$$\ln L = \sum_{g=1}^{G} \sum_{j=1}^{k} n_j(x_g) \ln P(y = j \mid x_g) \tag{15}$$

where $P(y = j \mid x) = \pi_j = p_j - p_{j-1}$, with $p_0 = 0$ and $p_k = 1$. The parameter estimates are the values that maximize ln L. Since there is no closed-form solution, they must be computed iteratively; this paper performs the computation in the statistical software R.
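To make Eqs. (14) and (15) concrete, the following sketch computes the category probabilities $\pi_j = p_j - p_{j-1}$ from the cumulative logits and sums $\ln P(y = j \mid x)$ over observations (grouped counts $n_j(x)$ simply correspond to repeated observations). It assumes increasing intercepts $\alpha_1 < \dots < \alpha_{k-1}$, which keeps every $\pi_j$ positive; maximizing the function with a numerical optimizer is omitted. Note that R’s `MASS::polr`, used in this paper, parameterizes the model as logit P(y ≤ s) = ζ_s − η, so its coefficient signs are opposite to those in Eq. (14). The function names are hypothetical.

```python
import math

def category_probabilities(alphas, betas, x):
    """pi_1..pi_k under the cumulative logit model of Eq. (14).

    alphas: increasing intercepts alpha_1 < ... < alpha_{k-1}.
    betas, x: coefficients beta_1..beta_q and covariates x_1..x_q.
    """
    eta = sum(b * xi for b, xi in zip(betas, x))
    # Cumulative probabilities p_s = P(y <= s | x), padded with p_0 = 0 and p_k = 1.
    cum = [0.0] + [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas] + [1.0]
    # pi_j = p_j - p_{j-1}
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def ordinal_log_likelihood(alphas, betas, data):
    """Eq. (15) written per observation: sum of ln P(y = j | x) over (x, j) pairs."""
    return sum(math.log(category_probabilities(alphas, betas, x)[j - 1])
               for x, j in data)
```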

In this paper, teachers’ evaluation performance is divided into five grades and used as the dependent variable, taking the values y = 1, 2, 3, 4, 5. The ordered multicategorical logistic regression model is therefore used to model the actual situation and identify the key factors affecting college teachers’ evaluation performance.

To test the accuracy of the model, 120 samples were used as the test set to evaluate the fit, and the remaining 360 samples were used as the training set to fit the model. An ordered multicategorical logistic regression model was fitted to the training set with the polr function from the MASS package in R, giving the full-model parameter estimates shown in Table 1. Many variables in Table 1 have small absolute t-values, while the most extreme value is −13.92 (teaching guidance, x₈), which suggests that the independent variables may suffer from multicollinearity.

Table 1.

Full model parameter estimation result

Variable	Coefficient	Standard error	T value
Intercept 1\|2	4.47	1.71	2.47
Intercept 2\|3	4.03	1.69	2.87
Intercept 3\|4	6.02	1.66	3.48
Intercept 4\|5	8.11	1.66	4.67
Learner factor (x₁)	1.27	0.28	4.44
Teacher factor (x₂)	0.29	0.32	0.83
Online course factor (x₃)	0.68	0.35	1.86
Environmental factor (x₄)	0.53	0.81	0.62
Learner motivation (x₅)	0.02	0.01	1.84
Learning strategy (x₆)	-0.01	0.01	-1.02
Professional literacy (x₇)	-0.69	0.36	-1.79
Teaching guidance (x₈)	-12.57	0.88	-13.92
Content form (x₉)	0.02	0.01	3.31
Platform design (x₁₀)	0.43	0.03	12.41
Teaching interaction (x₁₁)	0.35	0.05	2.47
Residual deviance	409.00	AIC	439.00

To refine the model, the independent variables were screened by backward stepwise regression, eliminating insignificant variables one at a time until only learner factors (x₁), online course factors (x₃) and environmental factors (x₄) remained. The ordered multicategorical logistic model was then refitted on these variables; the regression and test results are shown in Tables 2 and 3.

Table 2.

Reduced model parameter estimates

Variable	Coefficient	Standard error	T value
Intercept 1\|2	-4.41	0.61	-6.97
Intercept 2\|3	-3.91	0.52	-7.48
Intercept 3\|4	-2.91	0.38	-7.48
Intercept 4\|5	-0.89	0.29	-2.95
learner factor (x₁)	1.28	0.26	4.45
course factor (x₃)	0.79	0.35	2.18
Environment factor (x₄)	-0.85	0.30	-2.01

Table 3.

Test result

Variable	LR χ²	df	p-value	95% CI lower limit	95% CI upper limit
Learner factor (x₁)	19.2045	1	5.786e-05***	1.1963	1.3457
Course factor (x₃)	4.3988	1	0.018675*	0.75478	0.84279
Environment factor (x₄)	8.7842	1	0.001679**	-0.9167	-0.8317
Residual deviance	421.78
AIC	435.78
-2 log likelihood	1317.39

As Tables 2 and 3 show, the p-value of every variable in the model is below 0.02, so all are significant. The model’s AIC is 435.78, lower than the 439.00 of the full model, indicating relatively little information loss. Overall, the model fits well.
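For R’s polr fits, the residual deviance equals −2 ln L, and the reported AIC is the deviance plus twice the number of estimated parameters. The short check below recomputes the AIC values of Tables 1 and 3 from their residual deviances, counting the four cut-point intercepts plus each model’s coefficients. The helper name is illustrative.

```python
def aic_from_deviance(residual_deviance: float, n_params: int) -> float:
    """AIC = -2 ln L + 2k, with the residual deviance standing in for -2 ln L."""
    return residual_deviance + 2 * n_params

# Full model (Table 1): 4 intercepts + 11 coefficients = 15 parameters.
full_aic = aic_from_deviance(409.00, 4 + 11)
# Reduced model (Tables 2 and 3): 4 intercepts + 3 coefficients = 7 parameters.
reduced_aic = aic_from_deviance(421.78, 4 + 3)
print(round(full_aic, 2), round(reduced_aic, 2))
```

The reduced model has a higher deviance (421.78 versus 409.00) but uses 8 fewer parameters, which is why the AIC favors it.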

The model’s accuracy was then checked on the test set: predictions were obtained with R’s predict function and compared with the true values, as shown in Table 4.

Table 4.

Comparison of predicted and true values (rows: predicted grade; columns: true grade)

Predicted \ True	1	2	3	4	5
1	0	0	0	0	0
2	0	0	0	0	0
3	0	0	0	0	0
4	0	0	0	0	1
5	0	1	4	14	95

Table 4 shows that no test sample has a true grade of 1. All samples with true grades 2, 3 and 4 are predicted as grade 5, numbering 1, 4 and 14, respectively. Of the samples with a true grade of 5, 1 is predicted as grade 4 and 95 are predicted correctly. The overall accuracy of the model is therefore 95/120 = 0.7917.

Predictions that differ from the true grade by only one level are close to correct: 14 samples with a true grade of 4 are predicted as 5, and 1 sample with a true grade of 5 is predicted as 4. Counting these as approximately correct gives an accuracy of (1 + 14 + 95)/120 ≈ 0.92. This is a good prediction result, suggesting that the chosen regression model is reasonable. We conclude that the main factors influencing college English teaching are learner factors, course factors and environmental factors.

3

Optimization of English Teaching Mode in Colleges and Universities by Integrating Personalized Learning Paths

3.1

English Teaching Mode in Colleges and Universities Supported by Information Technology

3.1.1

Teacher’s classroom dominance

In the information-technology-supported mode of college English teaching, information technology enters the teaching process as a teaching aid, while the teacher still occupies the dominant position in the English classroom. The teacher needs to guide students through every stage of learning and resolve the problems and difficulties they encounter in a timely manner, so that teaching is targeted to students’ actual learning situation and the quality and efficiency of English classroom teaching improve.

3.1.2

Introduction of information technology

In the information-technology-supported mode of college English teaching, information technology offers clear advantages, and teachers should make full use of them to create a good environment for students’ English learning and improve their learning efficiency. In actual teaching, teachers should also develop learning programs suited to students’ actual situation and prepare accordingly, so as to provide students with targeted instruction.

3.1.3

Integration of modern teaching methods with traditional teaching modes

In the information-technology-supported mode, although information technology has clear advantages, the traditional teaching mode retains its own educational value. Information technology can provide students with rich learning content, while the traditional mode better consolidates the process of acquiring knowledge and deepens students’ understanding and mastery of it, effectively improving the efficiency of classroom teaching. In actual teaching, therefore, modern teaching methods should be combined with traditional ones to improve the quality and efficiency of college English teaching.

3.2

Learner State Modeling

3.2.1

Knowledge point difficulty modeling

This study starts from an often-overlooked perspective: the implicit relationship between learners’ online learning behaviors and the difficulty of knowledge points. When judging the difficulty of a knowledge point, it takes into account a series of online learning behaviors, including video viewing, forum interaction and practice. To abstract the relationships among these parameters, the author constructed a model that measures how difficult a given knowledge point is for students in general, i.e., a knowledge point difficulty score model based on learner behavior, given in Equation (16):

$$diff_{(j)} = \omega_1 \left[ 1 - \overline{sco}(j) \right] + \omega_2 \, \overline{rep}(j) + \omega_3 \, \overline{com}(j) \tag{16}$$

where $\overline{rep}(j)$ and $\overline{com}(j)$ are the quantified results obtained by normalizing learners’ behavior data during online learning, and $\overline{sco}(j)$ is derived from the practice test scores learners obtain after watching the video.

In Eq. (16), ω₁, ω₂ and ω₃ are the weights of the input parameters. $\overline{sco}(j)$, $\overline{rep}(j)$ and $\overline{com}(j)$ are, respectively, the average test score on knowledge point j, the average number of repeated viewings of the video for knowledge point j, and the average number of discussions of knowledge point j, each averaged over the $N_j^{history}$ learners who have studied knowledge point j; $sco_{ij}$, $rep_{ij}$ and $com_{ij}$ are the corresponding values for learner i. They are calculated by Eqs. (17), (18) and (19):

$$\overline{sco}(j) = \frac{\sum_{i=1}^{N_j^{history}} sco_{ij}}{N_j^{history}} \tag{17}$$

$$\overline{rep}(j) = \frac{\sum_{i=1}^{N_j^{history}} rep_{ij}}{N_j^{history}} \tag{18}$$

$$\overline{com}(j) = \frac{\sum_{i=1}^{N_j^{history}} com_{ij}}{N_j^{history}} \tag{19}$$

The inputs of the knowledge point difficulty model are the average historical learning performance of all users who have studied the knowledge point, i.e., $\overline{sco}(j)$, $\overline{rep}(j)$ and $\overline{com}(j)$, and the output is the difficulty of mastering the knowledge point, $diff_{(j)}$. A larger value of $diff_{(j)}$ means that the knowledge point is more difficult for students.
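The following sketch computes $diff_{(j)}$ for one knowledge point from per-learner records. It assumes Eq. (16) reads $\omega_1 (1 - \overline{sco}(j)) + \omega_2 \overline{rep}(j) + \omega_3 \overline{com}(j)$ and that the three per-learner quantities have already been normalized to [0, 1]. The paper gives no values for ω₁, ω₂ and ω₃, so any weights passed in are placeholders; the function name is hypothetical.

```python
from statistics import mean

def knowledge_point_difficulty(records, w1, w2, w3):
    """Difficulty diff_(j) of one knowledge point j, following Eqs. (16)-(19).

    records: one (sco_ij, rep_ij, com_ij) tuple per learner i who has studied
             knowledge point j, each value normalized to [0, 1].
    w1, w2, w3: placeholder weights for the three terms.
    """
    if not records:
        raise ValueError("no learners have studied this knowledge point")
    sco_bar = mean(r[0] for r in records)  # Eq. (17): average test score
    rep_bar = mean(r[1] for r in records)  # Eq. (18): average repeated viewings
    com_bar = mean(r[2] for r in records)  # Eq. (19): average discussions
    return w1 * (1 - sco_bar) + w2 * rep_bar + w3 * com_bar  # Eq. (16)
```

For instance, two learners with records (0.9, 0.1, 0.2) and (0.7, 0.3, 0.4) and placeholder weights 0.5, 0.3 and 0.2 give averages 0.8, 0.2 and 0.3 and a difficulty of 0.5 × 0.2 + 0.3 × 0.2 + 0.2 × 0.3 = 0.22.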

3.2.2

Judgment of learning status

Assessing learner state values lays the foundation for subsequent personalized path planning. To judge the state of students’ knowledge, this study constructs a learner state model based on online learning behavior, which aims to objectively evaluate and score each learner’s mastery of knowledge points. The model’s inputs are an individual learner’s video viewing behavior for a specific knowledge point and the corresponding practice test results; its output is that learner’s mastery of the knowledge point, i.e., the state value.

The learner state is judged in the following steps:

1) The online learner follows the initial sequence of the course plan.

2) The learner watches the videos and completes the chapter test questions.

3) The learner’s learning state is determined from the normalized values of their online learning behavior and practice test results.

4) The next appropriate sequence of knowledge points is planned according to the learner’s state, completing the personalized path planning process.

The online learner’s learning behavior is recorded as log data, and the results of completed tests are normalized. In this study, the learning state of online learners is divided into four states: “not learned”, “not mastered”, “insufficiently mastered” and “mastered”, with state values of 1, 2, 3 and 4, respectively.
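The paper does not state how normalized behavior and test scores are mapped to the four state values, so the rule below is a hypothetical illustration only: a learner with no video activity and no test attempt is “not learned”, and otherwise the practice test score is compared with two assumed thresholds (0.6 and 0.85) that are not taken from the study.

```python
from enum import IntEnum
from typing import Optional

class LearnerState(IntEnum):
    NOT_LEARNED = 1
    NOT_MASTERED = 2
    INSUFFICIENT_MASTERY = 3
    MASTERED = 4

def judge_state(video_progress: float, test_score: Optional[float],
                low: float = 0.6, high: float = 0.85) -> LearnerState:
    """Hypothetical mapping from normalized behavior to a state value.

    video_progress: normalized video-viewing measure in [0, 1].
    test_score: normalized practice test score in [0, 1], or None if no test was taken.
    low, high: assumed thresholds for illustration, not values from the study.
    """
    if video_progress == 0 and test_score is None:
        return LearnerState.NOT_LEARNED           # no recorded activity at all
    if test_score is None or test_score < low:
        return LearnerState.NOT_MASTERED          # watched but untested, or low score
    if test_score < high:
        return LearnerState.INSUFFICIENT_MASTERY
    return LearnerState.MASTERED
```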

3.3

Personalized learning paths based on learner state

3.3.1

Formal Representation of Learning Paths

A learning path is an ordered sequence of learning content and learning activities experienced by a learner in the learning process, in which the learner realizes the learning of basic knowledge, the mastery of the method system, and the completion of the problem solving and tasks, so as to enhance the corresponding competence [26]. Therefore, the learning path can be represented by a three-dimensional vector matrix including three dimensions: knowledge $(K b)$ , method $(M t)$ and problem task $(Q s \cup T s \cup C s)$ .

The formal representation of a learning path is:

$$\begin{bmatrix} Qs_1(A_i) \cup Ts_1(A_i) \cup Cs_1(A_i) & Qs_2(A_i) \cup Ts_2(A_i) \cup Cs_2(A_i) & \cdots & Qs_n(A_i) \cup Ts_n(A_i) \cup Cs_n(A_i) \\ Mt_1(v_1, r_1) & Mt_2(v_2, r_2) & \cdots & Mt_n(v_n, r_n) \\ Kb_1(Mt_i) & Kb_2(Mt_i) & \cdots & Kb_n(Mt_i) \end{bmatrix} \tag{20}$$

where the subscripts 1, 2, …, n number the n successive elements of the learning path, $A_i$ identifies learner i, v denotes validity and r denotes reliability.

3.3.2

Personalized Learning Path Planning

Learners’ learning paths can be divided into mainstream learning paths and personalized learning paths. A mainstream learning path is a simple sequence of learning content and activities that meets the learning needs of most students; it is derived from big data on the learning outcomes of the student population and from knowledge mapping, and it likewise has three dimensions: knowledge $(Kb)$, method $(Mt)$ and problem task $(Qs \cup Ts \cup Cs)$. Knowledge mapping, aimed at achieving students’ learning objectives and cultivating higher-order thinking skills, links the knowledge structure of a subject with the problems or tasks corresponding to the competencies built on that knowledge and presents them graphically. The mainstream learning path can be represented as:

$$\begin{bmatrix} Qs_n \cup Ts_n \cup Cs_n \\ Mt_n \\ Kb_n \end{bmatrix} \tag{21}$$

A personalized learning path is a learning sequence that, based on analysis of each learner’s learning outcomes, sets learning objectives meeting that learner’s needs and provides content and activities matching his or her learning style and cognitive characteristics, with the pace controlled by the learner. A personalized learning path can be expressed as:

$$\begin{bmatrix} Qs_n(A_i) \cup Ts_n(A_i) \cup Cs_n(A_i) \\ Mt_n(v_n, r_n) \\ Kb_n(Mt_i) \end{bmatrix} \tag{22}$$

Learning path planning is to match each learner’s learning profile with a learning path that suits the learner’s individual development on the basis of mainstream learning paths.
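One way to hold the three-row representation of Eqs. (21) and (22) in code is a list of steps, each carrying a knowledge point (Kb), a method (Mt) and its problems or tasks (Qs ∪ Ts ∪ Cs). The `personalize` rule below, which keeps the mainstream order but drops steps whose knowledge point the learner already has in the “mastered” state (value 4), is a hypothetical simplification for illustration, not the paper’s matching procedure; the validity and reliability annotations (v, r) are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PathStep:
    """One column of the path matrix: knowledge point, method, problems/tasks."""
    knowledge: str                                  # Kb
    method: str                                     # Mt
    tasks: List[str] = field(default_factory=list)  # Qs ∪ Ts ∪ Cs

MASTERED = 4  # state value of "mastered" in the learner state model

def personalize(mainstream: List[PathStep], states: Dict[str, int]) -> List[PathStep]:
    """Hypothetical rule: keep the mainstream order, skipping steps whose knowledge
    point the learner has already mastered. Unknown points count as "not learned" (1)."""
    return [step for step in mainstream if states.get(step.knowledge, 1) < MASTERED]
```

For example, with a mainstream path over knowledge points A1, A2 and A3 and learner states {A1: 4, A2: 3}, the personalized path keeps only the A2 and A3 steps.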

4

Personalized Learning Path Design for English Teaching in Colleges and Universities

4.1

Learner State Clustering

This study used the kmeans function in R to perform exploratory cluster analysis on the final estimated attribute mastery probabilities of 531 students from the College of Foreign Languages of a university. The analysis settled on 11 clusters, as shown in Table 5.
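The paper uses R’s kmeans function, whose default algorithm is Hartigan-Wong. As an illustration of the same clustering idea, the sketch below implements the simpler Lloyd’s algorithm in NumPy; it would take the 531 × 9 matrix of attribute mastery probabilities (A1 to A7, T1, T2) as X, which is not reproduced in the paper. Cluster labels, and for some initializations the clusters themselves, may differ from R’s output. The function names are hypothetical.

```python
import numpy as np

def _assign(X, centers):
    # Squared Euclidean distance from every point to every center, shape (N, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, centers).

    X: (N, d) array; each row could be one student's attribute mastery probabilities.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(X, centers), centers
```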

Table 5.

Attribute mastery probabilities of each cluster

Cluster	A1	A2	A3	A4	A5	A6	A7	T1	T2	Mean
Ks1	0.977	0.99	0.262	0.172	0.737	0.965	0.839	0.396	0.178	0.613
Ks2	0.751	0.949	0.699	0.272	0.521	0.174	0.358	0.284	0.125	0.459
Ks3	0.751	0.949	0.699	0.272	0.521	0.174	0.358	0.284	0.125	0.534
Ks4	0.992	1	0.883	0.933	0.423	0.99	0.592	0.395	0.792	0.778
Ks5	0.948	0.955	0.381	0.095	0.179	0.902	0.539	0.308	0.242	0.505
Ks6	0.968	0.962	0.686	0.688	0.762	0.95	0.467	0.481	0.556	0.724
Ks7	0.836	0.936	0.223	0.307	0.332	0.974	0.838	0.467	0.809	0.636
Ks8	0.814	0.96	0.244	0.211	0.458	0.841	0.646	0.802	0.161	0.571
Ks9	0.000	0.409	0.001	0.275	0.066	0.000	0.229	0.192	0.005	0.131
Ks10	0.000	0.01	0.137	0.01	0.037	0.042	0.003	0.961	0.48	0.187
Ks11	0.968	0.962	0.686	0.688	0.762	0.951	0.556	0.792	0.539	0.767

In Table 5, ks9 has the lowest mean attribute mastery probability (0.131) and ks4 the highest (0.778). Notably, ks11, which corresponds to mastery of all attributes, does not have the highest mean: its value of 0.767 is slightly below that of ks4.
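The comparison of means can be checked directly against Table 5. The short sketch below averages each printed row over the nine attributes and picks the lowest and highest states. The values are copied from Table 5 as printed; nothing is assumed beyond the arithmetic mean.

```python
from statistics import mean

# Rows of Table 5 as printed (attribute order A1..A7, T1, T2).
TABLE5 = {
    "ks1": [0.977, 0.99, 0.262, 0.172, 0.737, 0.965, 0.839, 0.396, 0.178],
    "ks2": [0.751, 0.949, 0.699, 0.272, 0.521, 0.174, 0.358, 0.284, 0.125],
    "ks3": [0.751, 0.949, 0.699, 0.272, 0.521, 0.174, 0.358, 0.284, 0.125],
    "ks4": [0.992, 1.0, 0.883, 0.933, 0.423, 0.99, 0.592, 0.395, 0.792],
    "ks5": [0.948, 0.955, 0.381, 0.095, 0.179, 0.902, 0.539, 0.308, 0.242],
    "ks6": [0.968, 0.962, 0.686, 0.688, 0.762, 0.95, 0.467, 0.481, 0.556],
    "ks7": [0.836, 0.936, 0.223, 0.307, 0.332, 0.974, 0.838, 0.467, 0.809],
    "ks8": [0.814, 0.96, 0.244, 0.211, 0.458, 0.841, 0.646, 0.802, 0.161],
    "ks9": [0.000, 0.409, 0.001, 0.275, 0.066, 0.000, 0.229, 0.192, 0.005],
    "ks10": [0.000, 0.01, 0.137, 0.01, 0.037, 0.042, 0.003, 0.961, 0.48],
    "ks11": [0.968, 0.962, 0.686, 0.688, 0.762, 0.951, 0.556, 0.792, 0.539],
}

# Mean mastery probability of each knowledge state over the nine attributes.
row_means = {state: mean(probs) for state, probs in TABLE5.items()}
lowest = min(row_means, key=row_means.get)
highest = max(row_means, key=row_means.get)

print(lowest, round(row_means[lowest], 3))    # → ks9 0.131
print(highest, round(row_means[highest], 3))  # → ks4 0.778
print("ks11", round(row_means["ks11"], 3))    # → ks11 0.767
```

Averaging the printed rows reproduces the Mean column for every state except ks3, whose printed probabilities are identical to those of ks2.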

4.2

Learning path construction

There are five complete learning paths leading from the state in which no attribute is mastered to the state in which all attributes are mastered. Merging the numbers of students in the knowledge states along each path gives the number of students on each of these five complete paths, as shown in Table 6. Learning path 4 has the most students, 245, or 46.14% of the total. On this path, students progress from mastering no attributes to mastering A1 (basic knowledge), A2 (lexical properties of words), A3 (grammatical composition of English) and A5 (English sentence translation), which places them in knowledge state ks2. Advancing to ks6 requires mastering, on the basis of ks2, attributes A4 (reading comprehension problem solving), A6 (word mastery) and T2 (relational representation). Finally, mastering A7 (English listening) and T1 (the skill of recognizing implicit conditions) brings students to ks11, the state of full mastery of all attributes.

Table 6.

Types of complete learning paths and number of students on each

Type	Path process	Number of students (merged across states)
1	ks9→ks3→ks11	117
2	ks9→ks10→ks5→ks8→ks11	154
3	ks9→ks10→ks5→ks1→ks11	179
4	ks9→ks2→ks6→ks11	245
5	ks9→ks10→ks5→ks7→ks4→ks11	235
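The attribute sequence described for learning path 4 can be read off Table 5 by marking an attribute as mastered in a state when its cluster-mean probability reaches 0.5, then taking set differences between consecutive states. The 0.5 cutoff is an assumption: this section does not state how a state's attributes are labelled as mastered, although 0.5 is consistent with the mp and ks rows of Table 8. The helper names are hypothetical.

```python
ATTRIBUTES = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "T1", "T2"]

# Rows of Table 5 for the knowledge states on learning path 4.
CENTROIDS = {
    "ks9": [0.000, 0.409, 0.001, 0.275, 0.066, 0.000, 0.229, 0.192, 0.005],
    "ks2": [0.751, 0.949, 0.699, 0.272, 0.521, 0.174, 0.358, 0.284, 0.125],
    "ks6": [0.968, 0.962, 0.686, 0.688, 0.762, 0.95, 0.467, 0.481, 0.556],
    "ks11": [0.968, 0.962, 0.686, 0.688, 0.762, 0.951, 0.556, 0.792, 0.539],
}


def mastered(state, cutoff=0.5):
    """Attributes whose cluster-mean probability reaches the assumed cutoff."""
    return {a for a, p in zip(ATTRIBUTES, CENTROIDS[state]) if p >= cutoff}


def gains(path):
    """Attributes newly mastered at each step of a learning path."""
    return [
        (src, dst, sorted(mastered(dst) - mastered(src), key=ATTRIBUTES.index))
        for src, dst in zip(path, path[1:])
    ]


for src, dst, new in gains(["ks9", "ks2", "ks6", "ks11"]):
    print(f"{src} -> {dst}: {new}")
# ks9 -> ks2: ['A1', 'A2', 'A3', 'A5']
# ks2 -> ks6: ['A4', 'A6', 'T2']
# ks6 -> ks11: ['A7', 'T1']

print(f"share of path 4: {245 / 531:.2%}")  # → share of path 4: 46.14%
```

Under this cutoff the three steps reproduce the attribute groups given in the text, and 245 of 531 students gives the reported 46.14%.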

4.3

Determination of individual student learning paths

4.3.1

Procedure for Determining Individual Learning Paths

Taking the current knowledge state of each of the 531 students in this study as the starting point yields 15 individual learning path types. The number of students on each path type and their mean competency values were then derived.

The individual learning path types and the number of students on each are shown in Table 7. Twenty-two students, 4.14% of the total, were judged to have already mastered all attributes. Learning path 13 has the smallest share of students of all path types, only 0.56%, whereas path type 1 has the largest, 22.79% of the total. On path 1, the student's current knowledge state is ks4, meaning that the student has mastered every attribute except T1 (the skill of recognizing implicit conditions); mastering T1 moves the student to ks11, the state of full mastery. The average competency value of students classified on this path type is 0.47.

Table 7.

Individual learning path types and number of students on each

Type	Path process	Number of students	Mean competency value
1	ks4→ks11	121	0.41
2	ks1→ks11	55	0.265
3	ks8→ks11	15	-0.102
4	ks3→ks11	85	-0.614
5	ks6→ks11	76	0.574
6	ks2→ks6→ks11	27	-0.201
7	ks7→ks4→ks11	25	0.004
8	ks5→ks7→ks4→ks11	15	0.011
9	ks5→ks8→ks11	6	-0.052
10	ks5→ks1→ks11	15	-0.025
11	ks10→ks5→ks7→ks4→ks11	52	-1.635
12	ks9→ks10→ks5→ks7→ks4→ks11	10	-1.835
13	ks9→ks2→ks6→ks11	3	-1.263
14	ks9→ks3→ks11	4	-1.236
15	Already at ks11 (all attributes mastered)	22	1.041
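The percentages quoted above follow from the counts in Table 7, which add up to the 531 students in the sample. The check below uses only those counts.

```python
# Number of students on each individual learning path type (Table 7).
PATH_COUNTS = {1: 121, 2: 55, 3: 15, 4: 85, 5: 76, 6: 27, 7: 25, 8: 15,
               9: 6, 10: 15, 11: 52, 12: 10, 13: 3, 14: 4, 15: 22}

total = sum(PATH_COUNTS.values())
shares = {path: n / total for path, n in PATH_COUNTS.items()}
largest = max(shares, key=shares.get)
smallest = min(shares, key=shares.get)

print(total)                                # → 531
print(largest, f"{shares[largest]:.2%}")    # → 1 22.79%
print(smallest, f"{shares[smallest]:.2%}")  # → 13 0.56%
print(15, f"{shares[15]:.2%}")              # → 15 4.14%
```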

4.3.2

Attribute Mastery for Student #107

Based on the above analysis, Table 8 presents a learning diagnostic report of student No. 107's attribute mastery. The student's estimated mastery probabilities for attributes A3 (grammatical composition of English), A4 (reading comprehension problem solving), A5 (English sentence translation), T1 (skill of recognizing implicit conditions) and T2 (relational representation) are relatively low, so the student is judged not to have mastered them. The student's learning should therefore focus on A3, A4, A5, T1 and T2.

Table 8.

Attribute mastery of student No. 107

	A1	A2	A3	A4	A5	A6	A7	T1	T2	θ
mp	0.9927	0.9965	0.4527	0.041	0.0332	0.9645	0.5115	0.2601	0.2347	0.113
ks	1	1	0	0	0	1	1	0	0

Note: mp refers to this student’s estimated probability of mastery of the attribute; ks refers to the student’s estimated pattern of mastery.
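The step from the mp row to the ks row, and from there to a knowledge state, can be sketched as follows. Two assumptions are involved. First, an attribute is treated as mastered when mp is at least 0.5; the paper does not state its rule, but this cutoff reproduces the ks row of Table 8. Second, the student is placed in the knowledge state whose Table 5 cluster center is nearest in Euclidean distance, which is the K-means assignment rule rather than a procedure the paper spells out. Both choices lead to ks5, the state used in the recommendation for this student.

```python
import math

ATTRIBUTES = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "T1", "T2"]

# Student No. 107's estimated mastery probabilities (mp row of Table 8).
mp_107 = [0.9927, 0.9965, 0.4527, 0.041, 0.0332, 0.9645, 0.5115, 0.2601, 0.2347]

# Cluster centers of the 11 knowledge states (rows of Table 5 as printed).
CENTROIDS = {
    "ks1": [0.977, 0.99, 0.262, 0.172, 0.737, 0.965, 0.839, 0.396, 0.178],
    "ks2": [0.751, 0.949, 0.699, 0.272, 0.521, 0.174, 0.358, 0.284, 0.125],
    "ks3": [0.751, 0.949, 0.699, 0.272, 0.521, 0.174, 0.358, 0.284, 0.125],
    "ks4": [0.992, 1.0, 0.883, 0.933, 0.423, 0.99, 0.592, 0.395, 0.792],
    "ks5": [0.948, 0.955, 0.381, 0.095, 0.179, 0.902, 0.539, 0.308, 0.242],
    "ks6": [0.968, 0.962, 0.686, 0.688, 0.762, 0.95, 0.467, 0.481, 0.556],
    "ks7": [0.836, 0.936, 0.223, 0.307, 0.332, 0.974, 0.838, 0.467, 0.809],
    "ks8": [0.814, 0.96, 0.244, 0.211, 0.458, 0.841, 0.646, 0.802, 0.161],
    "ks9": [0.000, 0.409, 0.001, 0.275, 0.066, 0.000, 0.229, 0.192, 0.005],
    "ks10": [0.000, 0.01, 0.137, 0.01, 0.037, 0.042, 0.003, 0.961, 0.48],
    "ks11": [0.968, 0.962, 0.686, 0.688, 0.762, 0.951, 0.556, 0.792, 0.539],
}

# Assumed rule 1: mastered (1) if the probability is at least 0.5.
pattern = [1 if p >= 0.5 else 0 for p in mp_107]
not_mastered = [a for a, bit in zip(ATTRIBUTES, pattern) if bit == 0]

# Assumed rule 2: nearest cluster center by Euclidean distance.
state = min(CENTROIDS, key=lambda s: math.dist(mp_107, CENTROIDS[s]))

print(pattern)       # → [1, 1, 0, 0, 0, 1, 1, 0, 0]
print(not_mastered)  # → ['A3', 'A4', 'A5', 'T1', 'T2']
print(state)         # → ks5
```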

4.3.3

Learning Path Recommendation for Student #107

Based on the student's attribute mastery, student No. 107 is in knowledge state ks5 on the learning path map. Three paths lead from there to full mastery: ks5→ks7→ks4→ks11, ks5→ks1→ks11 and ks5→ks8→ks11. The first path begins by learning attribute T2, the second by learning A5, and the third by learning T1. To decide which path suits student No. 107 best, the center distances from ks5 to ks7, ks1 and ks8 were calculated as 0.885761, 0.916453 and 0.977684, respectively. Choosing the smallest distance makes ks7 the student's next knowledge state, so the recommended learning path for student No. 107 is ks5→ks7→ks4→ks11.
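The recommendation step can be sketched as two operations: enumerate every path from the student's current state to ks11 in the graph formed by the complete paths of Table 6, then choose the path whose next state lies at the smallest center distance. The distances are copied from the text; since this section does not specify how the "center distance" was computed, the sketch takes them as given rather than recomputing them. The helper names `build_graph` and `all_paths` are hypothetical.

```python
# The five complete learning paths of Table 6.
TABLE6_PATHS = [
    ["ks9", "ks3", "ks11"],
    ["ks9", "ks10", "ks5", "ks8", "ks11"],
    ["ks9", "ks10", "ks5", "ks1", "ks11"],
    ["ks9", "ks2", "ks6", "ks11"],
    ["ks9", "ks10", "ks5", "ks7", "ks4", "ks11"],
]

# Center distances from ks5 to each candidate next state, as reported in the text.
DIST_FROM_KS5 = {"ks7": 0.885761, "ks1": 0.916453, "ks8": 0.977684}


def build_graph(paths):
    """Directed graph of knowledge-state transitions (state -> next states)."""
    graph = {}
    for path in paths:
        for src, dst in zip(path, path[1:]):
            graph.setdefault(src, set()).add(dst)
    return graph


def all_paths(graph, start, goal):
    """Depth-first enumeration of every path from start to goal (graph is acyclic)."""
    if start == goal:
        return [[goal]]
    return [
        [start] + rest
        for nxt in sorted(graph.get(start, ()))
        for rest in all_paths(graph, nxt, goal)
    ]


graph = build_graph(TABLE6_PATHS)
candidates = all_paths(graph, "ks5", "ks11")
# Pick the path whose first step (the next knowledge state) is closest to ks5.
best = min(candidates, key=lambda path: DIST_FROM_KS5[path[1]])

print(candidates)
# → [['ks5', 'ks1', 'ks11'], ['ks5', 'ks7', 'ks4', 'ks11'], ['ks5', 'ks8', 'ks11']]
print(" -> ".join(best))  # → ks5 -> ks7 -> ks4 -> ks11
```

The same two functions apply to any starting state. For example, `all_paths(graph, "ks9", "ks11")` returns the five complete paths of Table 6.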

5

Conclusion

An ordered multi-category (ordinal) logistic regression model was used to analyze the factors influencing English teaching in colleges and universities. Three factors, namely learner factors, curriculum factors and environmental factors, were found to have a significant influence on the English teaching mode. A learner state model was then proposed, and a personalized learning path planning framework was designed on the basis of learner states. K-means clustering grouped the 531 students into 11 knowledge states, from which five complete learning paths were derived. The knowledge states Ks4 and Ks6 contained the largest number of students, 122, the largest proportion of the sample.

For individual students, learning paths are determined from their estimated knowledge states. Student No. 107 is in knowledge state ks5, from which three learning paths lead to full mastery. The calculation shows that ks5→ks7→ks4→ks11 is the most time-saving of these paths.

Language: English

Frequency: once a year
Journal subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Research on the Optimization of English Teaching Mode and Personalized Learning Path in Colleges and Universities Based on Big Data Regression Analysis

Dongmei Li

Published online: 24 March 2025

Received: 1 November 2024

Accepted: 10 February 2025

DOI: https://doi.org/10.2478/amns-2025-0794

Keywords: Logistic regression, Teaching model optimization, Personalized learning path, English teaching

© 2025 Dongmei Li, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
