Open Access

A Study of Goal Motivation Strategies Based on Dynamic Planning Methods to Enhance the Effectiveness of Public English Teaching in Universities

  
Sep 26, 2025


Introduction

In recent years, public English teaching in universities has faced many challenges. Although students' public English proficiency keeps rising, many still find public English learning tedious and struggle to persevere to the end. In response, teachers keep proposing new teaching approaches to enhance students' enthusiasm for public English learning [1-4]. Against this background, goal motivation has become a prominent topic in the academic community, as it can motivate students to participate in public English learning and improve both their learning attitudes and the outcomes of public English teaching [5-7].

Generally speaking, the concept of goal motivation is based on learners’ individual abilities and approaches, and it is a learner-centered teaching method used to motivate, guide and encourage learners to realize their self-worth in the process of reaching goals [8-11]. The goal motivation method mainly utilizes the basic principle of “taking the goal as the starting point, guided by realistic feedback, and based on regulating responses”, and divides the whole process of achieving learners’ goal behaviors into four parts: cognition, action, feedback and regulation [12-15].

In public English teaching, goal motivation shares much with other learner-centered teaching methods. In teaching activities, teachers should, on the one hand, provide students with clear learning goals, give timely feedback on learning outcomes, help students draw correct lessons from mistakes, and motivate students to move forward on learning tasks [16-19]. On the other hand, students should also be allowed to study the learning tasks, solve problems, analyze the learning outcomes, and adjust the learning process according to their learning situation, so as to stimulate learners' motivation and autonomy [20-23]. Therefore, in public English teaching, teachers should pay attention to students' participation and, according to students' different characteristics, use a variety of formats such as seminar competitions, group discussions, and role plays, so that students devote themselves to classroom activities and work toward their own English learning goals [24-27].

This paper investigates the design of a goal motivation strategy based on the dynamic planning method and the effect of this strategy on the effectiveness of university public English teaching. The strategy is implemented with the adaptive learning path recommendation model RL4ALPR, while the students' knowledge level is tracked and modeled in depth by the deep knowledge tracking model FDKT-ED, which integrates exercise difficulty and forgetting behavior. To verify the effectiveness of this strategy, the actual efficacy of the FDKT-ED model and the RL4ALPR model was evaluated separately, and teaching experiments were designed to assess their effect on the teaching of public English courses in universities.

A goal motivation model for university public English based on the dynamic planning method

In order to enhance the teaching effectiveness of university public English courses, this paper designs a goal motivation strategy based on the dynamic planning approach, which realizes adaptive dynamic planning of students’ learning paths by applying the FDKT-ED model for in-depth knowledge tracking of students and using the RL4ALPR model framework.

Deep Knowledge Tracking Model for College Public English

In this paper, based on the traditional DKVMN model [28], a deep knowledge tracking model (FDKT-ED) that integrates exercise difficulty and forgetting behavior is proposed. The model takes both the answering results and the difficulty of the exercises into account, optimizes the simulation of the learner's learning process, and incorporates the key behavior of forgetting into the modeling process.

Basic definitions

Learner learning data generated in online education is usually viewed as a specified learning sequence, with the answering situation at moment $t$ denoted by $x_t$, where $x_t$ is described as a pair $x_t = \{e_t, a_t\}$: $e_t$ denotes the question answered at moment $t$, and $a_t$ denotes the corresponding answering result. In general, $a_t$ takes the binary value 0 or 1, indicating whether the question was answered correctly or not.

The knowledge tracking problem is to track and analyze the entire learning process by modeling the learning sequence $\{x_1, x_2, x_3, \cdots, x_t\}$ in chronological order and predicting the answering performance at the next moment $x_{t+1}$. Define $K$ as the set of knowledge points, $E$ as the set of exercises, and $k_t \subseteq K$ as the set of knowledge points involved in exercise $e_t$. Matrix $M^K$ $(d_k \times |K|)$ represents the embedded representation of all $|K|$ knowledge points, and each $d_k$-dimensional column vector represents the embedded representation of one knowledge point. Matrix $M_t^V$ $(d_v \times |K|)$ represents the embedding matrix of the student's knowledge point mastery at the end of learning at moment $t$, while matrix $M_t^{FV}$ $(d_v \times |K|)$ represents the embedding matrix of the student's knowledge point mastery before learning at moment $t$ starts. Matrix $M_t^{FV}$ is obtained from matrix $M_{t-1}^V$ through the forgetting process. Define $level_t$ as the student's knowledge mastery at the end of learning at moment $t$, represented by a number in $(0, 1)$, where 0 means no mastery at all and 1 means full mastery.
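For concreteness, the learning sequence defined above can be represented as a simple time-ordered record structure. The following Python sketch is purely illustrative; the field names are assumptions, not identifiers from the paper.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Interaction:
    """One answering record x_t = (e_t, a_t) in a learner's sequence."""
    exercise_id: int            # e_t: the exercise answered at moment t
    correct: int                # a_t: 1 if answered correctly, 0 otherwise
    timestamp: float            # used later to derive the RK/RL forgetting intervals
    knowledge_points: Set[int]  # k_t: knowledge points covered by e_t

# A learning sequence {x_1, ..., x_t} is simply a time-ordered list of records.
sequence: List[Interaction] = [
    Interaction(exercise_id=17, correct=1, timestamp=0.0, knowledge_points={3}),
    Interaction(exercise_id=42, correct=0, timestamp=120.0, knowledge_points={3, 8}),
]
```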

The whole knowledge tracking process of the model in this paper not only focuses on the answering situation over the time series, but also combines factors such as exercise difficulty, answering intervals, and answering cycles. The model consists of five modules: weight calculation, forgetting processing, learning simulation, result prediction, and knowledge level output. It uses an LSTM network for modeling, which ultimately yields more accurate prediction results [29].

Calculation of weights

The role of weight calculation is to compute the weights associating the exercise with the corresponding knowledge points. The inputs to the module are the student's current exercise $e_t$ and the set of knowledge points covered by it, $k_t$. $e_t$ is multiplied with the embedding matrix $A$ $(d_k \times |E|)$ to obtain a $d_k$-dimensional exercise embedding vector $v_t$. The knowledge point embedding matrix is $N_t$, where each $d_k$-dimensional column vector represents one knowledge point embedding vector. The inner product of the exercise embedding vector $v_t$ and each covered knowledge point embedding vector $N_t(i)$ is computed first, and the inner products are then passed through the Softmax function to obtain the associated weight vector $w_t$ of the exercise and knowledge points, i.e.: $$w_t(i) = \text{Softmax}(v_t^T N_t(i))$$
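As a minimal NumPy sketch of the weight calculation step (under the assumption that the embeddings have already been looked up from $A$ and $N_t$), the weights $w_t(i)$ are simply a Softmax over the inner products:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()                      # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def knowledge_weights(v_t: np.ndarray, N_t: np.ndarray) -> np.ndarray:
    """w_t(i) = Softmax(v_t^T N_t(i)) over the knowledge points covered by e_t.

    v_t : (d_k,)          exercise embedding vector
    N_t : (d_k, |k_t|)    embedding vectors of the covered knowledge points (columns)
    """
    scores = v_t @ N_t                   # inner products, shape (|k_t|,)
    return softmax(scores)

# Example with random stand-in embeddings (d_k = 4, three covered knowledge points)
rng = np.random.default_rng(0)
w_t = knowledge_weights(rng.normal(size=4), rng.normal(size=(4, 3)))
print(w_t, w_t.sum())                    # the weights sum to 1
```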

Forgetting processing

The act of forgetting occurs immediately after the act of learning, and the rate of forgetting slows down gradually. The theory of the forgetting curve shows that students’ forgetting of knowledge is mainly influenced by two aspects: the number of repetitions of learning and the time interval between two learning sessions. In the process of knowledge tracking, there is not only the learning process, but also the forgetting process. The present study proposes four factors for forgetting behavior: the time interval between repetition of the same knowledge point (RK), the time interval from the last learning (RL), the number of repetitions of the same knowledge point (KT), and the degree of mastery of the knowledge point (KM).

Since the forgetting behavior acts on the students' knowledge mastery, the matrix of the students' forgetting factors for each knowledge point is obtained first. First, RK, RL and KT are combined to obtain $C_t(i) = [RK(i), RL(i), KT(i)]$, which represents the first three factors affecting the forgetting of knowledge point $i$; the vectors $C_t(i)$ of all knowledge points are then combined to obtain the matrix $C_t$. The student's knowledge point mastery matrix $M_{t-1}^V$ constitutes the fourth forgetting factor, KM. $C_t$ is combined with KM to obtain the matrix $F_t = [M_{t-1}^V, C_t]$, which represents the four factors affecting forgetting.

To perform the forgetting process, the knowledge mastery state matrix of the previous moment is erased, and then the knowledge mastery matrix is updated. The main structure of the forgetting module is shown in Figure 1.

Figure 1.

Forgetting module structure

The student's forgetting factor $F_t(i)$ for knowledge point $i$ is converted into a forgetting vector $fe_t(i)$ by a Sigmoid function: $$fe_t(i) = \text{Sigmoid}(FE^T F_t(i) + b_{fe})$$

The fully connected layer weight matrix $FE$ has shape $(d_v + d_c) \times d_v$ and the bias vector $b_{fe}$ is $d_v$-dimensional.

The student's forgetting factor $F_t(i)$ for knowledge point $i$ is then converted into an update vector $fu_t(i)$ by a Tanh function: $$fu_t(i) = \text{Tanh}(FU^T F_t(i) + b_{fu})$$

The weight matrix $FU$ likewise has shape $(d_v + d_c) \times d_v$ and the bias vector $b_{fu}$ is $d_v$-dimensional.

The student's knowledge mastery state matrix $M_{t-1}^V$ is then updated based on the obtained forgetting and update vectors to obtain matrix $M_t^{FV}$: $$M_t^{FV}(i) = M_{t-1}^V(i)(1 - fe_t(i))(1 + fu_t(i))$$

After the forgetting layer is processed, we obtain the students' knowledge mastery matrix $M_t^{FV}$ before the current learning step begins.
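The erase-and-update logic of the forgetting module can be sketched in NumPy as follows. This is an illustrative version only: the mastery matrix is stored with knowledge points as rows (transposed relative to the paper's $d_v \times |K|$ notation), and the parameters $FE$, $FU$, $b_{fe}$, $b_{fu}$ are assumed to have been learned elsewhere.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def forgetting_step(M_prev, C_t, FE, b_fe, FU, b_fu):
    """Apply the erase/update forgetting gates to every knowledge point.

    M_prev : (|K|, d_v)       mastery matrix M_{t-1}^V, one row per knowledge point
    C_t    : (|K|, d_c)       the RK/RL/KT factors per knowledge point
    FE, FU : (d_v + d_c, d_v) fully connected weights; b_fe, b_fu : (d_v,)
    Returns M_t^FV of shape (|K|, d_v).
    """
    F_t = np.concatenate([M_prev, C_t], axis=1)    # the four forgetting factors F_t(i)
    fe = sigmoid(F_t @ FE + b_fe)                  # forgetting (erase) vectors fe_t(i)
    fu = np.tanh(F_t @ FU + b_fu)                  # update vectors fu_t(i)
    return M_prev * (1.0 - fe) * (1.0 + fu)        # M_t^FV(i)
```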

Learning simulation

The main role of the learning simulation module is to update the student's knowledge mastery matrix $M_t^{FV}$ before the current learning step, generate the knowledge mastery matrix $M_t^V$ at the end of the step, and construct a model of the student's learning behavior based on the answering results. The input is the student's answer at moment $t$, represented by the pair $(e_t, a_t)$. The pair $(e_t, a_t)$ is multiplied with the answer result embedding matrix $B$ $(d_v \times 2|E|)$ to obtain the $d_v$-dimensional answer result embedding vector $r_t$. Then the answer result embedding vector $r_t$, together with the exercise-related knowledge point weight vector $w_t$, is used as input to update the student's knowledge mastery state through the LSTM network, completing the modeling of the learning behavior, that is: $$M_t^V(i) = \text{LSTM}(r_t, w_t(i) M_t^{FV}(i))$$
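A minimal PyTorch sketch of the learning simulation step is given below (the paper's experiments use MXNet). Treating each knowledge point slot as a batch element of an LSTM cell, and concatenating $r_t$ with the weighted mastery rows as the cell input, are modeling assumptions made for illustration.

```python
import torch
import torch.nn as nn

d_v, num_k = 32, 50                                  # illustrative dimensions
lstm_cell = nn.LSTMCell(input_size=2 * d_v, hidden_size=d_v)

def learning_simulation(r_t, w_t, M_FV, cell_state):
    """Update the mastery state of each knowledge point with an LSTM cell.

    r_t        : (d_v,)         answer-result embedding for (e_t, a_t)
    w_t        : (num_k,)       knowledge point weights from the weight module
    M_FV       : (num_k, d_v)   mastery matrix after the forgetting step (row-wise)
    cell_state : (h, c)         previous LSTM hidden/cell states, each (num_k, d_v)
    """
    weighted = w_t.unsqueeze(1) * M_FV               # w_t(i) * M_t^FV(i)
    x = torch.cat([r_t.expand(num_k, d_v), weighted], dim=1)
    h, c = lstm_cell(x, cell_state)                  # h plays the role of M_t^V
    return h, (h, c)
```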

Prediction of results

In this study, the difficulty of the exercise is computed by a Tanh function, i.e.: $$d_{t+1} = \text{Tanh}(W_D^T v_{t+1} + b_D)$$

where $W_D$ and $b_D$ denote the weight vector and bias vector in the fully connected layer, respectively.

The main purpose of the result prediction module is to predict the student's performance on the next question $e_{t+1}$ based on his/her knowledge mastery matrix. The knowledge mastery matrix is first updated to obtain the knowledge mastery matrix $M_{t+1}^{FV}$ at the start of the next learning step, and then the probability of answering $e_{t+1}$ correctly is predicted. The weighted sum of the knowledge point weights $w_{t+1}$ and the knowledge mastery matrix $M_{t+1}^{FV}$ gives the weighted mastery embedding vector $m_{t+1}$ for the knowledge points related to the exercise: $$m_{t+1} = \sum_{i=1}^{K} w_{t+1}(i) M_{t+1}^{FV}(i)$$

Then vectors $m_{t+1}$, $v_{t+1}$ and $d_{t+1}$ are concatenated into the new vector $[m_{t+1}, v_{t+1}, d_{t+1}]$, which is fed into the Tanh function to get: $$h_{t+1} = \text{Tanh}(W_1^T [m_{t+1}, v_{t+1}, d_{t+1}] + b_1)$$

where $W_1$ and $b_1$ denote the weight and bias vectors in the fully connected layer, respectively.

Finally, the obtained vector is input into the Sigmoid function to obtain the probability $p_{t+1}$ that the student will answer question $e_{t+1}$ correctly: $$p_{t+1} = \text{Sigmoid}(W_2^T h_{t+1} + b_2)$$

where $W_2$ and $b_2$ denote the weight and bias vectors in the fully connected layer, respectively.
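Putting the result prediction equations together, a forward pass of the module might look like the following NumPy sketch; the parameter shapes are assumptions consistent with the formulas above, and the mastery matrix is again stored row-wise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_next(w_next, M_FV_next, v_next, W_D, b_D, W_1, b_1, W_2, b_2):
    """Probability that the student answers exercise e_{t+1} correctly.

    w_next    : (|K|,)      weights of the knowledge points covered by e_{t+1}
    M_FV_next : (|K|, d_v)  mastery matrix M_{t+1}^FV after the forgetting step
    v_next    : (d_k,)      embedding of exercise e_{t+1}
    W_D       : (d_d, d_k), W_1 : (d_h, d_v + d_k + d_d), W_2 : (d_h,)
    """
    d_next = np.tanh(W_D @ v_next + b_D)             # exercise difficulty d_{t+1}
    m_next = w_next @ M_FV_next                      # weighted mastery vector m_{t+1}
    h_next = np.tanh(W_1 @ np.concatenate([m_next, v_next, d_next]) + b_1)
    return float(sigmoid(W_2 @ h_next + b_2))        # p_{t+1}
```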

Knowledge level output layer

The main purpose of the knowledge level output layer is to output the student's mastery of each knowledge point at the end of the learning step. This module takes as input the student's knowledge mastery matrix at the end of the step and outputs a $|K|$-dimensional knowledge mastery level vector $level_t$. The mastery level of a knowledge point is represented by a number in $(0, 1)$.

In the knowledge level output layer, only the student's overall mastery of each knowledge point is required, so the unit vector $\delta_i = (0, 0, \cdots, 1, \cdots, 0)$, whose $i$-th entry is 1, is used as the weight vector.

The embedding vector of the student's mastery of knowledge point $i$ is extracted using $$M_t^V(i) = \delta_i M_t^V$$

Then $$x_t(i) = \text{Tanh}(W_1^T [M_t^V(i), \mathbf{0}] + b_1)$$ is applied to obtain the student's knowledge mastery level, i.e.: $$level_t(i) = \text{Sigmoid}(W_2^T x_t(i) + b_2)$$

where $W_1$, $b_1$, $W_2$, $b_2$ use the same settings as in the result prediction module, and the $\mathbf{0}$ vector merely pads the vector dimension and has no practical meaning.
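A matching sketch of the knowledge level output layer, which reuses the prediction module's parameters and pads the exercise slot with zeros, could read as follows (again with the mastery matrix stored row-wise):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def knowledge_level(M_V, W_1, b_1, W_2, b_2, pad_dim):
    """Map the end-of-step mastery matrix M_t^V to per-concept levels in (0, 1).

    M_V     : (|K|, d_v)  mastery matrix at the end of step t
    pad_dim : size of the zero padding that stands in for the exercise-related inputs
    """
    levels = np.empty(M_V.shape[0])
    for i, m_i in enumerate(M_V):                    # m_i corresponds to delta_i * M_t^V
        x_i = np.tanh(W_1 @ np.concatenate([m_i, np.zeros(pad_dim)]) + b_1)
        levels[i] = float(sigmoid(W_2 @ x_i + b_2))  # level_t(i)
    return levels
```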

Adaptive learning path planning model based on reinforcement learning

In order to realize the dynamic planning of learning paths for university public English, this paper proposes RL4ALPR, an adaptive learning path recommendation model based on reinforcement learning.

General framework of the model

The general framework of the RL4ALPR model is shown in Figure 2. Assuming the length of the learning path is N, i.e., the recommendation process is divided into a total of N time steps, the model details are explained below in terms of time step t.

Figure 2.

RL4ALPR framework

RL4ALPR is divided into four sub-modules: knowledge level modeling (FDKT-ED), candidate learning item screening (CN), the reinforcement learning recommender (A2C), and reward computation. At each time step, FDKT-ED explores the learner's latent knowledge level based on his/her historical interactions. Based on the learning item the learner answered at the previous time step, CN filters a set of candidate learning items from the prerequisite graph, which serves as the action space for A2C. A2C provides the learner with the learning item to be answered next. The reward is passed back to A2C to improve the recommendation strategy.

The path generation process is modeled as a Markov decision process. At time step $t$, the state vector $s_t = [y_t, T]$, composed of the learner's knowledge level $y_t$ and the learning objective $T$, is the input of the policy network $\pi(a \mid s; \theta)$. Acting as the actor, the policy network recommends a learning item $k_t$ according to the current policy: its output is a vector in which each dimension represents the probability that the corresponding learning item is selected, and a learning item $k_t$ is then randomly selected from the set of candidate learning items screened by CN according to the $\varepsilon$-greedy strategy. The learner's answer to $k_t$ generates a new interaction record, and the environment produces a new knowledge level $y_{t+1}$ and a new state $s_{t+1}$ based on this record. The value network $v(s; w)$ acts as the critic that evaluates the action, i.e., the recommended learning item; its input is the state vector $s_t$ and its output is a scalar representing the predicted future cumulative discounted reward (the return), which is used to evaluate how good the current recommendation $k_t$ is and to improve the policy $\pi(a \mid s; \theta)$ for selecting learning items. The degree of change in the learner's knowledge level at time step $t$ is used as the reward $r_t$ for updating the parameters $\theta$ and $w$ of the policy network $\pi(a \mid s; \theta)$ and the value network $v(s; w)$. Each time step of the recommendation process generates a trajectory $(s_t, k_t, r_t, s_{t+1})$.
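The time-step loop described above can be condensed into the following sketch. Everything here is illustrative: `policy_net` and `env.answer` are hypothetical callables standing in for the A2C policy network and the environment model, and the learning objective $T$ is assumed to be encoded as a binary indicator vector over knowledge points.

```python
import numpy as np

rng = np.random.default_rng(0)

def recommend_step(y_t, target, policy_net, candidates, env, eps=0.1):
    """One time step of the RL4ALPR recommendation loop (illustrative only).

    y_t        : (|K|,)  current knowledge level from FDKT-ED
    target     : (|K|,)  binary indicator vector encoding learning objective T
    candidates : array of learning item indices filtered by cognitive navigation (CN)
    policy_net : hypothetical callable returning pi(a|s; theta) over all items
    env        : hypothetical environment that simulates the learner answering an item
    """
    s_t = np.concatenate([y_t, target])              # state s_t = [y_t, T]
    probs = policy_net(s_t)
    if rng.random() < eps:                           # epsilon-greedy exploration
        k_t = rng.choice(candidates)
    else:                                            # sample within the CN action space
        cand_p = probs[candidates] / probs[candidates].sum()
        k_t = rng.choice(candidates, p=cand_p)
    y_next = env.answer(k_t)                         # learner answers k_t, state evolves
    r_t = float((y_next - y_t)[target.astype(bool)].sum())   # gain on target concepts
    s_next = np.concatenate([y_next, target])
    return (s_t, k_t, r_t, s_next), y_next           # trajectory tuple for A2C
```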

Knowledge level modeling

A learner’s state of mastery of a knowledge point is unobservable, but it is crucial for subsequent recommendations, because only by knowing the learner’s current state can we provide him/her with a learning program that matches his/her situation. A knowledge tracking task is defined as a task that explores a student’s state of mastery of each knowledge point based on his/her past learning activities, and can be used for knowledge level modeling and predicting the learner’s performance on the next learning item. In this paper, the deep knowledge tracking model (FDKT-ED) that incorporates the difficulty of exercises and forgetting behaviors as previously mentioned is used for knowledge level modeling.

Candidate Learning Item Screening

Due to the complexity of the logical relationships between knowledge points, this paper designs a cognitive navigation algorithm based on a central node on the prerequisite graph for quickly filtering candidate nodes related to the central node. The set of learning items represented by the candidate nodes is the action space of the recommender, from which the policy network selects a learning item. Filtering the set of candidate learning items avoids recommending learning items that violate the logic of human cognition and, at the same time, reduces the search space of the policy function in the agent, which accelerates convergence. The algorithm takes as input the prerequisite graph G, the learning objective T, the center node $k_c$, and the hop count n. At each time step, the center node is the knowledge point contained in the learning item recommended to the learner at the previous time step. The time complexity of the cognitive navigation algorithm is related to the size of the prerequisite graph G and the number of hops n, i.e., the time complexity of the algorithm is $O(|G| \cdot n)$.
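A minimal sketch of the cognitive navigation step, written as a breadth-first search bounded to $n$ hops around the center node; the adjacency-list representation of the prerequisite graph is an assumption made for illustration.

```python
from collections import deque
from typing import Dict, List, Set

def cognitive_navigation(graph: Dict[int, List[int]], center: int, n_hops: int) -> Set[int]:
    """Collect candidate knowledge points within n hops of the center node k_c.

    graph  : adjacency list of the prerequisite graph G (node -> neighbouring nodes)
    center : the knowledge point of the item recommended at the previous time step
    n_hops : hop count n; worst-case cost is O(|G| * n)
    """
    visited, frontier = {center}, deque([(center, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == n_hops:
            continue
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, depth + 1))
    return visited    # candidate nodes related to the center node
```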

Recommender Modeling

The role of the reinforcement learning A2C algorithm [30] is to select the next action based on the current state, i.e., according to the learner's current level of mastery of each knowledge point, it recommends the learning item to be answered at the next moment. The structure of the A2C algorithm is shown in Fig. 3.

Figure 3.

A2C structure

The reinforcement learning A2C algorithm mainly consists of two parts, the policy network $\pi(a \mid s; \theta)$ and the value network $v(s; w)$, both of which are feed-forward fully connected neural networks. The policy network $\pi(a \mid s; \theta)$ approximates the policy function $\pi(a \mid s)$; it controls the agent's selection of the action $a$ that interacts with the environment, i.e., recommending the learning item $k_t$. It uses a Softmax activation function to output a vector whose dimensions give the probability of each learning item being selected, and the learning item $k_t$ is then randomly selected according to these probabilities using the $\varepsilon$-greedy strategy. The value network $v(s; w)$ approximates the state value function $V_\pi(s)$; it evaluates how good the current state $s$ is, outputting a scalar. In contrast to the action value function $q(s, a; w)$ in actor-critic, $v(s; w)$ depends only on the state $s$ and is independent of the action $a$, so $v(s; w)$ is easier to train.

The A2C algorithm uses the policy gradient algorithm with a baseline to update the parameters $\theta$ of the policy function in the actor, which gives the updated policy smaller variance and faster convergence. The TD algorithm is used to update the parameters $w$ of the value function in the critic; because single-step updates generate a large amount of noise, the parameters are usually updated using the TD algorithm with a Multi-step TD Target.

At each moment $t$ during the learning path recommendation process, after observing the current state $s_t$, the agent uses the $\varepsilon$-greedy policy to randomly sample an action $a_t$, i.e., a learning item $k_t$, according to the policy network $\pi(\cdot \mid s_t; \theta_t)$. The learner answers $k_t$, and the environment gives a new state $s_{t+1}$ and reward $r_t$, yielding the trajectory $(s_t, k_t, r_t, s_{t+1})$ at moment $t$. After that, $s_{t+1}$ serves as the input of $\pi(\cdot \mid s_{t+1}; \theta_t)$ to recompute a new probability distribution over actions, i.e., a new probability that each learning item is recommended, and a learning item $\tilde{k}_{t+1}$ is randomly sampled from this distribution. $\tilde{k}_{t+1}$ does not need to be answered by the learner; it is sampled only in order to update the parameters $\theta$ and $w$ of the policy network and the value network.

After observing the $m$ trajectories $\{(s_{t+i}, k_{t+i}, r_{t+i}, s_{t+1+i})\}_{i=0}^{m-1}$ within the period from moment $t$ to moment $(t+m-1)$, the Multi-step TD Target is calculated first, as shown in Equation (12): $$mTDT_t = \sum_{i=0}^{m-1} \gamma^i \cdot r_{t+i} + \gamma^m \cdot v(s_{t+m}; w)$$

where $\gamma$ is the discount rate. The TD Error is calculated using Equation (13): $$\delta_t = v(s_t; w) - mTDT_t$$

The parameters $\theta$ of the policy network $\pi(k_t \mid s_t; \theta_t)$ are updated using the policy gradient algorithm, and the derivative of the policy network $\pi(k_t \mid s_t; \theta_t)$ is computed using Equation (14): $$d_{\theta,t} = \frac{\partial \ln \pi(k_t \mid s_t; \theta_t)}{\partial \theta_t}$$

The parameter $\theta$ is updated by gradient ascent as shown in Equation (15): $$\theta_{t+1} = \theta_t - \beta \cdot \delta_t \cdot d_{\theta,t}$$

The parameter $w$ of the value network $v(s; w)$ is updated using the TD algorithm with the Multi-step TD Target; the derivative of the value network $v(s_t; w)$ is given by Equation (16): $$d_{w,t} = \frac{\partial v(s_t; w)}{\partial w_t}$$

Equation (17) is used to update the parameter $w$: $$w_{t+1} = w_t - \alpha \cdot \delta_t \cdot d_{w,t}$$

where $\delta_t \cdot d_{w,t}$ is the gradient of the loss function, and the loss function is given in Equation (18): $$Loss(w) = \frac{1}{m}\sum_{i=0}^{m-1} \left[ v(s_t; w) - mTDT_t \right]^2 = \frac{1}{m}\sum_{i=0}^{m-1} \delta_t^2$$

The Monte Carlo approximation $g(k_t; \theta)$ used to update the policy network is shown in Equation (19): $$g(k_t; \theta) \approx \frac{\partial \ln \pi(k_t \mid s_t; \theta)}{\partial \theta} \cdot \left[ \sum_{i=0}^{m-1} \gamma^i \cdot r_{t+i} + \gamma^m \cdot v(s_{t+m}; w) - v(s_t; w) \right]$$ where the bracketed term equals $mTDT_t - v(s_t; w) = -\delta_t$.
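The update rules of Equations (12)-(17) can be condensed into the sketch below. The gradient callables `grad_logpi` and `grad_v` are hypothetical placeholders for automatic differentiation; in practice the networks would be trained inside a deep learning framework rather than with hand-written gradients.

```python
import numpy as np

def a2c_update(trajectories, v, grad_v, grad_logpi, theta, w,
               gamma=0.99, alpha=1e-3, beta=1e-3):
    """One multi-step A2C update following Equations (12)-(17).

    trajectories : list of m tuples (s, k, r, s_next), ordered from moment t onward
    v            : callable v(s, w) -> scalar value estimate
    grad_v       : callable returning the gradient of v(s; w) with respect to w
    grad_logpi   : callable returning the gradient of ln pi(k|s; theta) w.r.t. theta
    """
    s_t, k_t, _, _ = trajectories[0]
    s_end = trajectories[-1][3]                                   # s_{t+m}
    rewards = [r for (_, _, r, _) in trajectories]
    m = len(trajectories)
    # Equation (12): multi-step TD target
    td_target = sum(gamma ** i * r for i, r in enumerate(rewards)) + gamma ** m * v(s_end, w)
    delta = v(s_t, w) - td_target                                 # Equation (13): TD error
    theta = theta - beta * delta * grad_logpi(k_t, s_t, theta)    # Equations (14)-(15)
    w = w - alpha * delta * grad_v(s_t, w)                        # Equations (16)-(17)
    return theta, w
```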

Reward calculation

After each action is executed in a reinforcement learning algorithm, the environment provides an immediate reward $r_t$ to guide the generation of the next action. For the learning path recommendation problem, when the recommended learning item is appropriate for the current learner and beneficial to increasing his/her knowledge level, the reward should encourage this behavior; otherwise it should penalize it. Therefore, the reward function is defined as the degree of change in the learner's mastery of the target knowledge points after answering the recommended learning item. At each time step in the recommendation process, the reward function is set as follows: $$r_t = \sum_k (y_{t+1,k} - y_{t,k})$$

where $k$ indexes the knowledge points included in learning objective $T$, and $y_{t,k}$ denotes the learner's mastery of knowledge point $k$ at moment $t$.

Model application analysis

In order to prove the effectiveness of the proposed goal motivation model for university public English, this chapter uses the deep knowledge tracking model to predict the students’ knowledge state, then uses the dynamic learning path planning model to recommend appropriate learning paths for the students according to the students’ knowledge state, and finally designs the teaching experiments to evaluate the model’s enhancement effect on the effectiveness of university public English teaching.

Students’ public college English knowledge tracking experiment
Prediction of student performance

Most knowledge tracking models indirectly measure the accuracy of knowledge state modeling through the accuracy of predicting students' future answering performance. Therefore, the AUC and F1-Score of the FDKT-ED model proposed in this paper are compared with the following seven baseline models on the task of predicting students' future performance in a university public English course, using two publicly available online datasets, ASSIST15 and ASSIST09:

DKT: This model is a single-state knowledge tracking model which utilizes RNN to track students’ knowledge states.

CKT: This model is the first single-state knowledge tracking model that introduces CNNs into the knowledge tracking domain, taking into account students’ personalized prior knowledge and learning rates.

ContextKT: This model is a single-state knowledge tracking model that utilizes LSTM to capture the temporal information of students’ historical question-answering interaction sequences, and uses the attention mechanism to capture the total learning transfer impact of similar concepts examined in historical exercises on the concepts currently examined.

DKVMN: This model is a multi-state knowledge tracking model, which assumes that the knowledge of each course consists of multiple potential concepts, and uses the Key matrix to save these potential concepts, and the Value matrix to store and update the knowledge state of potential concepts.

SPARSEKT: This model is a full-state knowledge tracking model based on the attention mechanism, which models the learning transfer effect between historically similar concepts and current concepts, and proposes 2 sparsification heuristics to enhance the robustness and generalization of the model.

GKT: This model is the first full-state knowledge tracking model based on learning transfer, which takes into account the positive-transfer precedence relationships between concepts when constructing knowledge structures.

SKT: This model is a popular full-state knowledge tracking model that considers learning transfer factors; it considers both positive-transfer precedence relations and positive-transfer similarity relations between concepts, and uses influence propagation to model the learning transfer effects corresponding to these 2 transfer relations.

The experimental results of the 8 models on the 2 datasets are shown in Table 1. From the results in Table 1, it can be seen that:

Compared with the DKT model, the DKVMN model improves the AUC and F1-Score values by an average of 7.10% and 7.74%, respectively, across the ASSIST15 and ASSIST09 datasets. This is because the DKT model fuses the knowledge states of all concepts into a single vector, whereas the DKVMN model is able to model finer-grained knowledge states (i.e., the knowledge states of multiple potential concepts).

The AUC and F1-Score values of the ContextKT model are higher than those of the DKT model on the 2 datasets, which suggests that modeling the effect of learning transfer between concepts with the attention mechanism can improve the performance of knowledge tracking models to some extent.

The AUC and F1-Score values of the GKT model on the ASSIST09 dataset are 77.58% and 70.51%, respectively, which are better than all other models except the FDKT-ED model in this paper. This suggests that modeling the effect of learning transfer under one learning transfer relationship using highly interpretable learning transfer diagrams, and accurately modeling the knowledge state of each specific concept in a course, can significantly improve the performance of knowledge tracking models on datasets with long sequences of students' historical question-answer interactions.

The AUC and F1-Score values of the SKT model on the ASSIST15 dataset are also better than all other models except the FDKT-ED model in this paper, reaching 73.11% and 54.28%, respectively. This suggests that on datasets with shorter sequences of students' historical question-answer interactions, more fully exploiting the learning transfer relationship and modeling the knowledge state of each specific concept can improve the predictive performance of the model.

The FDKT-ED model proposed in this paper outperforms the other models on the 2 datasets, with its AUC and F1-Score averaging 71.03% on ASSIST15 and 75.68% on ASSIST09. On the ASSIST15 dataset, the AUC and F1-Score values of the FDKT-ED model improve by 3.38% and 22.46%, respectively, over the SKT model. On the ASSIST09 dataset, the AUC and F1-Score values of the FDKT-ED model improve by 2.24% and 2.17%, respectively, relative to the GKT model. This suggests that incorporating exercise difficulty and students' forgetting behaviors into deep knowledge tracking based on the traditional DKVMN model can improve the accuracy of predicting students' future answering performance.

Table 1. Experimental results of 8 models on 2 datasets

Model       ASSIST15            ASSIST09
            AUC      F1-Score   AUC      F1-Score
DKT         0.6673   0.4834     0.7128   0.6539
CKT         0.7142   0.4917     0.7345   0.6751
ContextKT   0.7125   0.4562     0.7684   0.6958
DKVMN       0.7153   0.5275     0.7628   0.6954
SPARSEKT    0.7121   0.4932     0.7625   0.6843
GKT         0.7156   0.5124     0.7758   0.7051
SKT         0.7311   0.5428     0.7647   0.6954
FDKT-ED     0.7558   0.6647     0.7932   0.7204
Visualization analysis

The visualization of the evolution of the knowledge state was carried out as follows: first, the historical question-answering interaction sequences of 2 students were selected from the ASSIST15 dataset. Knowledge tracking was then performed on each of these 2 sequences using the FDKT-ED model. Finally, the knowledge states were visualized. The visualization results of the knowledge state changes of the 2 students are shown in Figure 4, where Figures (a) and (b) are the heatmaps of the knowledge state changes of the 1st and 2nd students, respectively. The values in the heatmap color blocks were rounded to two decimal places.

Figure 4.

Visualization of knowledge state change of two students

From Figure 4(a), it can be seen that when a student completes the exercise about Concept C24, there is a change in that student’s state of knowledge about Concept C24, and a change in the state of knowledge about other knowledge concepts that have a learning-transfer relationship with Concept C24, such as Concept C51 and Concept C64. At the same time, the student’s state of knowledge does not undergo a sudden change due to a momentary correct or incorrect answer, but is smoothly updated. For example, when a student answered four consecutive exercises about Concept C63 correctly from Time Step 3 to Time Step 6, the student’s mastery of Concept C63 also increased only from 0.6 to 0.8. When the student answered the exercises about Concept C96 incorrectly at Time Steps 10 and 11, the student’s mastery of Concept C96 changed by less than 0.1 (from 0.58524637 to 0.55741926 to 0.54124584). The change in the state of knowledge for the 2nd student was similar to the change in the state of knowledge for the 1st student.

Adaptive dynamic learning path planning experiments

This section validates the effectiveness of the proposed adaptive learning path recommendation model RL4ALPR based on reinforcement learning and uses it in a real case.

Experimental setup

Experimental dataset

The experimental dataset used in this section is from junyiacademy.org, which consists of a knowledge graph and more than 12 million learner logs of university public English. Each record in the learner log contains information about a learner, including user ID, the name of the concept involved, session ID, the answer to the exercise, and a timestamp. Each exercise can be mapped to a node in the knowledge graph based on the concept name, and each exercise involves only one concept. Records with the same session ID come from the exercise data of the same learner within one learning phase.

Environment Modeling

Since the agent in reinforcement learning needs to interact with an environment model in order to be trained, which the existing offline data alone cannot support, this paper designs two different environment models to close the loop of the reinforcement learning process for the agent: the knowledge structure-based environment model (KSE) and the knowledge evolution-based environment model (KEE).

Evaluation Metrics

In order to obtain more comprehensive results, both learning logic and cognitive state enhancement are used as evaluation indexes. In this paper, the students' college public English test score enhancement Ep, obtained from the environment model, is used as the cognitive state enhancement. At the same time, several educational experts are invited to score the learning logic of the paths planned by the RL4ALPR model from the point of view of whether they conform to general cognitive laws, so as to validate the model's effect from another angle.

Benchmarking method

In addition, in order to prove the effectiveness and robustness of the RL4ALPR model proposed in this paper, it was compared with the following benchmark methods:

KNN: This model finds the other learners who are most similar to the learner by comparing the cosine distances of the learning paths, and decides the next learning program for the new learner based on the learning paths of the other learners.

GRU4Rec: GRU4Rec is a classic session-based recommendation model. The input of the model is a sequence of sessions, while the output is a probability distribution of learning items that appear in the next step.

MCS: Monte Carlo Search (MCS) can be combined with knowledge tracing models to form a search method. That is, the knowledge tracing model predicts the learning effect of each searched path and is used as a criterion to judge how good or bad the learning path is.

DQN: Reinforcement learning is used to solve the learning path recommendation problem, which would otherwise require rich domain expertise to design the state transition matrix of the MDP and to set an accurate initial state. Instead of discrete state and transition matrices and simple Q-learning, DQN uses a knowledge tracking model and a deep value network, respectively.

CN-Random: A number of candidate learning items are selected based on the cognitive navigation approach, and one of them is recommended by uniform sampling. This is a simple knowledge structure-based recommendation method.

Cog: Similar to CN-Random, a number of candidate learning items are selected based on cognitive navigation, but the difference is that Cog samples from the candidates in a weighted way.

In this paper, we use MXNet to implement the deep learning models and train each of the benchmark models and the models in this paper on a Linux server with quad-core 2.0 GHz Intel Xeon E5-2620 CPUs and a Tesla K20m GPU.

Analysis of experimental results

Cognitive Enhancement Evaluation Based on Environmental Modeling

According to the statistics, the median number of exercises included in a single learning stage is 25, and the length of the paths to be recommended for learning public English at university is uniformly set to 25. The average cognitive state enhancement of the paths recommended by all the models to the learners, Ep, is shown in Table 2.

As can be seen from Table 2, the proposed RL4ALPR model performs the best, with its Ep given by the environment models KSE and KEE reaching around 0.35 and 0.41, respectively. More specifically, by modeling the cognitive state it beats CN-Random, GRU4Rec and KNN, and by filtering and recommending candidate learning items on the knowledge structure it achieves better results than DQN, MC-10 and MC-50. By organically combining cognitive states and knowledge structures and providing immediate rewards to guide learner behavior, RL4ALPR also outperforms Cog. Further, methods with knowledge structures perform better than those without in KSE, which suggests that knowledge structures have an impact on the effectiveness of recommended learning paths. It can also be found that in KEE, GRU4Rec outperforms all methods except RL4ALPR without explicit modeling of cognitive state and knowledge structure, which suggests that simply introducing cognitive state and knowledge structure does not necessarily guarantee an improvement in recommendation effectiveness. Overall, unified modeling of cognitive structures, including cognitive states and knowledge structures, is necessary in adaptive learning, but how to model and apply them in the recommendation process remains challenging.

Assessment of the Logic of Pathway Learning Based on Expert Ratings

Inspired by existing research work, this paper invites six experts in the field of education who are familiar with learning path planning to rate the results of the recommended learning paths for university public English generated by various methods in terms of learning logic. The scores ranged from 1 to 5, with higher scores representing greater logic. The experts rated the 200 samples of recommendations based on their own experience and in terms of whether the paths conformed to general cognitive patterns. Each sample contains the learner’s historical learning records, learning goals, and recommended learning paths. Four benchmark models, GRU4Rec, DQN, CN-Random, and Cog, were selected to compare with the methods in this paper, which differ in their cognitive state and knowledge structure modeling.

The results of the path learning logic assessment based on expert ratings are shown in Figure 5. As can be seen in Figure 5, the recommended results of RL4ALPR are the most consistent with what human experts consider logical for learning, outperforming all benchmark models. In addition, the approaches with cognitive navigation received higher scores in the expert evaluation, which suggests that cognitive navigation helps to maintain the logic of the learning path. The comparison also reveals that the scores given by different experts for the same sample sometimes differ considerably. Moreover, comparing these ratings with the Ep results given by the environment models, some models with higher logic scores are not necessarily evaluated as well by the environment models. From these observations, it is clear that logicality is a relatively subjective metric that is difficult to quantify objectively: the perception of learning logic varies somewhat from person to person, and learning logic is not equivalent to score improvement.

Validation of the effect of learning path length

The median length of an exercise sequence in a learning phase is 25, so it can be conjectured that a relatively reasonable learning length for a learning phase in KEE should be close to this value. This paper verifies this conjecture through experiments and also conducts extensive experiments on KSE. The validation results for the impact of learning path length are shown in Fig. 6, where (a) and (b) show the impact of learning path length on Ep in KSE and KEE, respectively. It can be seen that in KSE, Ep grows as the length increases, because the knowledge evolution law in KSE is formulated by rules that place no restriction on length. In KEE, on the other hand, the learning effect of all methods does not increase much once the path length exceeds 25, which verifies the conjecture of this paper.

Case Study

A visualization example is given in Fig. 7. For better understanding, a sub-diagram including the learning target and its three-hop neighbor points is drawn in panel (a), while panels (b)~(d) show the learning paths planned by the GRU4Rec, Cog, and RL4ALPR models, respectively.

In this example, a learner wants to master the learning item numbered 638 (completing_the_square_1), and his/her historical learning records are {(638,0), (638,0), (638,0), (638,0), (638,0)}. It is easy to see that the learner has been repeatedly attempting to learn item 638 but has not been able to break through, which indicates that he/she is having some difficulty mastering that learning objective. To help this learner, different approaches give different learning paths. In the path recommended by GRU4Rec, the learner keeps repeating the final objective even though the previous analysis shows that he/she may have learning difficulties. In the path recommended by Cog, most of the learning items are far away from the learning goal (not even adjacent to it). The RL4ALPR method in this paper, on the other hand, first lets the learner review the learning items with prerequisite relationships and then attempt the learning goal while reviewing those prerequisite items. It is easy to see that RL4ALPR plans a more reasonable and effective learning path for the learner.

Table 2. Comparison of Ep results

Model       KSE        KEE
KNN         0.000812   0.268027
GRU4Rec     0.007819   0.210638
MC-10       0.123466   0.002347
MC-50       0.112548   -0.005241
DQN         0.105724   0.002795
CN-Random   0.281973   0.140437
Cog         0.172039   0.175481
RL4ALPR     0.354762   0.412704
Figure 5.

Logical evaluation results of path learning based on expert scores

Figure 6.

Validation of the effect of learning path length

Figure 7.

Learning path for completing_the_square_1 given by different methods

Analysis of the Effectiveness of Public English Teaching in Universities

In order to verify the effect of the proposed model on improving the effectiveness of university public English teaching, a controlled teaching experiment was designed. The experimental class used the goal motivation strategy that combines the FDKT-ED model and the RL4ALPR model proposed in this paper, while the control class used the conventional teaching model and motivation strategy; the two classes were at the same English level before the experiment. After a year-long teaching practice, the final exam scores of the experimental and control classes (i.e., the final exams of January 2023 and July 2023) in reading, writing, and translation were collected, and the data were processed and analyzed using the Statistical Package for the Social Sciences (SPSS).

Descriptive statistics of students’ scores

SPSS statistical software was used to study the differences between classes in the reading, composition, and translation scores and the total scores, and the descriptive statistics of students' scores are shown in Table 3. As can be seen from Table 3, the mean score of each item in the experimental class is higher than that of the control class. The difference in the total score is the largest, at 13.61 points; reading is next, with a difference of 6.03 points; the smallest difference is in composition, at 1.27 points. This indicates that in this teaching experiment, the adaptive learning path planning method based on deep knowledge tracking helped students in the experimental class improve their performance in college public English.

Table 3. Descriptive statistics of students' scores in each item of the examination paper

Item          Class               Number   Average score   Standard deviation   Standard error of mean
Reading       Experimental class  82       18.54           4.483                0.524
              Control class       83       12.51           4.325                0.482
Composition   Experimental class  82       14.52           2.427                0.271
              Control class       83       13.25           4.136                0.464
Translation   Experimental class  82       12.96           3.729                0.418
              Control class       83       10.87           5.804                0.641
Total points  Experimental class  82       65.13           7.418                0.829
              Control class       83       51.52           14.213               1.514
Independent samples t-test for student scores

An independent samples t-test was used to study the differences in reading, writing, and translation scores between students in the two classes, and the results for the experimental and control classes are shown in Table 4. From the data in Table 4, the students in the experimental and control classes showed a significant difference in reading scores: since Sig.=0.615>0.05, the variances are homogeneous, while Sig.(2-tailed)=0.001<0.05. The variances of the writing and translation scores of the experimental and control classes were not homogeneous (p=0.000<0.05, p=0.001<0.05), and there were significant differences in both writing (p=0.004<0.05) and translation scores (p=0.003<0.05). As for the total scores, the variances of the two classes were homogeneous (p=0.849>0.05) and the difference was still significant (p=0.002<0.05). Students' scores on the ismat platform likewise showed significant differences. As shown in Table 3, the average scores of the experimental class in each item are higher than those of the control class, which indicates that the teaching motivation strategy based on the dynamic planning method helped improve the teaching effectiveness of university public English in this experiment.

Table 4. Independent samples t-test of students' scores

Item          Assumption          Levene's F   Sig.    t       df        Sig.(2-tailed)   95% CI lower   95% CI upper
Reading       Equal variances     0.284        0.615   8.726   163       0.001            4.728          7.476
              Unequal variances                        8.742   161.829   0.001            4.728          7.476
Composition   Equal variances     31.628       0.000   3.018   163       0.004            0.548          2.604
              Unequal variances                        3.026   131.584   0.004            0.552          2.603
Translation   Equal variances     21.204       0.001   3.159   163       0.003            0.819          3.874
              Unequal variances                        3.167   139.715   0.003            0.821          3.862
Total points  Equal variances     0.054        0.849   4.342   163       0.002            12.728         31.485
              Unequal variances                        4.341   162.758   0.002            12.756         31.486

(Levene's test for equality of variances: F, Sig.; t-test for equality of means: t, df, Sig.(2-tailed), 95% confidence interval of the difference.)
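For readers who want to reproduce this kind of analysis without SPSS, the following SciPy sketch runs Levene's test and the corresponding independent samples t-test. The simulated score arrays only mimic the reported group sizes, means, and standard deviations; they are not the study data.

```python
import numpy as np
from scipy import stats

# Hypothetical total-score samples shaped like the reported statistics (not real data).
experimental = np.random.default_rng(1).normal(loc=65.13, scale=7.418, size=82)
control = np.random.default_rng(2).normal(loc=51.52, scale=14.213, size=83)

# Levene's test decides whether to assume equal variances in the t-test.
lev_stat, lev_p = stats.levene(experimental, control)
equal_var = lev_p > 0.05
t_stat, p_two_tailed = stats.ttest_ind(experimental, control, equal_var=equal_var)
print(f"Levene p={lev_p:.3f}, equal_var={equal_var}, t={t_stat:.3f}, p={p_two_tailed:.4f}")
```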
Conclusion

In this paper, we constructed a deep knowledge tracking model, FDKT-ED, that integrates exercise difficulty and forgetting behavior, and built on it an adaptive learning path recommendation model, RL4ALPR. Together they serve as a goal motivation strategy, and we explored the efficacy of this strategy in enhancing the effectiveness of public English teaching in universities.

The FDKT-ED model outperformed the other models on the ASSIST15 and ASSIST09 datasets, with its AUC and F1-Score averaging 71.03% on ASSIST15 and 75.68% on ASSIST09. The AUC and F1-Score values of this model on the ASSIST15 dataset improved by 3.38% and 22.46%, respectively, over the best baseline, SKT, while on the ASSIST09 dataset the model achieved improvements of 2.24% and 2.17% relative to the best baseline, GKT. This indicates that deep knowledge tracking of students based on the traditional DKVMN model, incorporating exercise difficulty and students' forgetting behaviors, enables accurate prediction of students' future answering performance.

Under the environment models KSE and KEE, the RL4ALPR model achieved test score improvements Ep of around 0.35 and 0.41, respectively. By organically combining cognitive state and knowledge structure, the model beat all baselines, including CN-Random, GRU4Rec, KNN, DQN, MC-10, MC-50, and Cog. Compared with the four benchmark models GRU4Rec, DQN, CN-Random, and Cog, RL4ALPR's learning path planning results are also the most consistent with the learning logic perceived by human experts, which demonstrates the feasibility of the RL4ALPR model for the dynamic learning path planning task.

In the teaching experiment, the experimental class utilized the learning path dynamic planning and goal motivation strategies designed in this paper, while the control class used conventional teaching strategies. The experimental results show that the experimental class outperforms the control class, which is at the same English level before the experiment, in reading, translation, writing scores and total scores in university public English, which verifies that the modeling strategy of this paper can effectively improve the teaching effectiveness of university public English.
