A study on the efficiency and accuracy of neural network model to optimize personalized recommendation of teaching content
Published online: 21 March 2025
Received: 08 November 2024
Accepted: 16 February 2025
DOI: https://doi.org/10.2478/amns-2025-0547
© 2025 Weihang Zhang et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Today’s education needs to be tailored to different learners with different teaching content and methods, which requires personalized teaching. The demand for personalized education is reflected in two aspects [1-3]. On the one hand, society demands personalization from education. With the development of science and technology and the intensifying competition for talent, society’s requirements for talent have changed profoundly, and education has shifted rapidly from molding talent in a single cast to cultivating diversified, personalized talent. On the other hand, learners have individualized needs for education. Individual learners have different needs and different learning characteristics across courses. Educators in traditional classroom settings face a challenge: how to adapt their teaching or educational methods to this diversity [4-7].
The demand for personalized education must be met by personalized teaching and learning. As an emerging form of education based on network technology, online teaching makes it both easier and more necessary to reflect personalization in instructional design. Because learners differ as individuals, their learning goals, abilities, interests, habits, foundations, styles, and personalities all vary [8-10]. At present, although many online teaching systems hold vast resources, their learning processes are fixed and their learning methods and modes are relatively uniform, which has created a growing contradiction between system design and learner needs. Many online teaching systems ignore the fact that learning is itself a personalized process and fail to teach students according to their individual characteristics or to account for individual differences. There is therefore an urgent need for a personalized recommendation function in online teaching systems [11-14].
Personalized recommendation of teaching content can provide personalized service for learners based on their own preferences, the information left by their historical visits, or the relevant information of other similar learners [15-17]. Recommendations can take the form of recommending pages for students to browse, recommending learning content, recommending knowledge resources of interest to students in order to improve their learning efficiency, providing personalized information services, and distributing targeted emails. Using neural network models to optimize personalized recommendation of teaching content can yield more personalized web services, whereby decision makers can adjust certain information to cater to different learners [18-21].
In this paper, we construct a student cognitive diagnostic model based on GCN and design a teaching resource recommendation method based on the convolutional joint probability matrix factorization (CUPMF) model, so as to optimize the efficiency and accuracy of personalized recommendation of teaching content. First, the graph convolutional neural network is combined with the traditional cognitive diagnosis model to obtain the students’ knowledge point mastery matrix. Then, the CUPMF model performs probability matrix factorization on students’ cognitive diagnosis information, teaching resource information, and students’ score performance on teaching resources, and solves the implicit feature matrix of the test questions, the implicit feature matrix of the students, and the knowledge point feature matrix by stochastic gradient descent, so as to predict students’ probable performance on teaching resources and, according to their cognitive diagnosis results, make personalized recommendations of teaching resources to student-users. Finally, the performance of the model is verified experimentally by comparing its efficiency and accuracy against traditional personalized recommendation of teaching content.
In this paper, a cognitive diagnosis model based on graph convolutional neural network (GCN) is constructed to judge the cognitive level of students, and a teaching resource recommendation method based on convolutional joint probabilistic matrix factorization (CUPMF) model is proposed to carry out personalized and intelligent recommendation of teaching resources according to the cognitive level of students, so as to improve the efficiency and accuracy of recommendation.
Suppose there are
In addition, the cognitive diagnostic model requires the input of a knowledge matrix
The task of the student cognitive diagnostic model in this paper is defined as: given an array
In order to solve the problem posed above, this paper establishes a student cognitive diagnostic model incorporating Graph Convolutional Neural Network (GCN) [22]. In this, the important mathematical notations used are as follows:
The main framework of the developed cognitive diagnostic model is shown in Figure 1. Typically, cognitive diagnostic models need to model three elements: the student

Framework of cognitive diagnosis model
In this paper, we use the vector of students’ mastery on each knowledge point
First the student vector
The sigmoid in Eq. (2) represents the nonlinear activation function, and
In this paper, the updating method of graph convolutional neural network is combined in calculating the feature vectors of the test questions, which can be used to obtain a new representation of the test question vectors by aggregating the neighbor information around each test question, and the calculation steps are as follows:
The input data of the cognitive diagnostic model is the one-hot representation vector
where each node also forms a
The initial feature matrix
Matrix
In this paper, the number of layers
Then use
The sigmoid function in Eq. (5) represents a nonlinear activation function,
From the above two-step calculation and training, the student vector
where the two vectors represented by the computational symbol ∘ are multiplied element by element, and the dimensions of vectors
Without loss of generality, this paper assumes that each element of the parameter matrix
The loss function of the cognitive diagnostic model is the sum of the cross-entropy between each student’s predicted score
Unlike traditional models for predicting scores, the student vector
In this paper, we combine the convolutional neural network with the joint probability matrix factorization (PMF) model to propose a teaching resource recommendation method based on the convolutional joint probability matrix factorization (CUPMF) model [23]. Its algorithmic framework is shown in Fig. 2, which mainly contains three parts:
1. GCN-based student-user cognitive diagnostic modeling, which yields the student-knowledge point mastery matrix.
2. The convolutional neural network module, whose main purpose is to deeply mine teaching resources along different dimensions through convolutional neural networks, and to integrate the mined features seamlessly into the joint probability matrix decomposition through a nonlinear transformation in the output layer.
3. Joint probability matrix decomposition, which combines multiple sources of information, such as students’ cognitive diagnosis information, teaching resource information, and students’ score performance on teaching resources, and solves, by stochastic gradient descent, the implicit feature matrix of the test questions, the implicit feature matrix of the students, and the knowledge point feature matrix containing the CNN parameters. The model then predicts students’ probable performance on teaching resources and recommends resources of suitable difficulty to the current student-user based on the cognitive diagnosis.
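The joint factorization step in part 3 can be sketched in a few lines. Below is a minimal probabilistic-matrix-factorization sketch trained with stochastic gradient descent; it omits the CNN and cognitive diagnosis inputs of the full CUPMF model, and the hyperparameters, matrix sizes, and ratings are all illustrative, not the paper's:

```python
import numpy as np

def pmf_sgd(R, k=8, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Minimal PMF-style factorization trained with SGD.

    R: student-by-resource score matrix with np.nan for unobserved entries.
    Returns implicit feature matrices U (students) and V (resources) such
    that U @ V.T approximates the observed entries of R.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, k))   # student implicit features
    V = 0.1 * rng.standard_normal((m, k))   # resource implicit features
    rows, cols = np.where(~np.isnan(R))     # observed (student, resource) pairs
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = R[i, j] - U[i] @ V[j]              # error on one observed score
            U[i] += lr * (err * V[j] - reg * U[i])   # gradient step with L2 prior
            V[j] += lr * (err * U[i] - reg * V[j])
    return U, V

# toy score matrix: 1 = answered correctly, 0 = incorrectly, nan = unseen
R = np.array([[1.0, 0.0, np.nan],
              [1.0, np.nan, 0.0],
              [np.nan, 1.0, 1.0]])
U, V = pmf_sgd(R)
pred = U @ V.T                              # predicted performance for all pairs
```

The unobserved entries of `pred` are the model's estimates of how each student would perform on unseen resources, which is the quantity the recommendation step ranks.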

Schematic diagram of CUPMF recommendation algorithm
In this paper, a convolutional neural network (CNN) is used to mine the teaching resource data. The CNN framework is mainly responsible for mining the potential features of the test question teaching resources, generating the implicit feature vectors of the test questions, and constructing the implicit feature matrix representation of the test questions with the CNN weight parameters, which is then used for training and solving in the joint probability matrix decomposition model [24]. The convolutional network framework of the CUPMF model is shown in Fig. 3, and it consists of the following four layers:

Convolutional network diagram of CUPMF model
The word embedding layer converts the original test question information into a dense numeric matrix as input to the next convolutional layer. The test question information mainly contains three parts: the question stem, the answer, and the explanation. These three parts are processed through word segmentation; each word is randomly initialized or converted into a word vector by a pre-trained word embedding model, and the test question is finally represented as a dense numeric matrix by concatenating the word vectors in the test question information
Where
The convolutional layer is mainly used to extract the feature information of a test question. A trial context feature
where “⊗” denotes the convolution operation,
Considering the limited test question feature information captured by using only a single shared weight, this paper employs multiple sets of shared weights in the convolutional layer to obtain multiple sets of feature vectors describing test question feature information.
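The multiple sets of shared weights above, combined with the max-over-time pooling described next, can be sketched as follows. This is a simplified numpy illustration, not the paper's implementation: the filter shapes, the tanh activation, and the random data are all assumptions:

```python
import numpy as np

def text_cnn_features(E, filters):
    """Pooled features from a word-embedding matrix via shared-weight convolution.

    E: (seq_len, d) dense matrix produced by the word embedding layer.
    filters: list of (window, d) shared-weight arrays; each set of shared
    weights yields one feature vector, and max pooling reduces it to a scalar.
    """
    feats = []
    for W in filters:
        h, d = W.shape
        # slide the window over the word sequence: one activation per position
        conv = np.array([np.tanh(np.sum(W * E[t:t + h]))
                         for t in range(E.shape[0] - h + 1)])
        feats.append(conv.max())   # max-over-time pooling keeps the strongest response
    return np.array(feats)

rng = np.random.default_rng(0)
E = rng.standard_normal((20, 16))                           # 20 words, 16-dim embeddings
filters = [rng.standard_normal((3, 16)) for _ in range(8)]  # nc = 8 shared weight sets
s = text_cnn_features(E, filters)                           # fixed-length question feature
```

Because each shared-weight set contributes exactly one pooled value, questions of different lengths all map to a fixed-length vector, which is what allows the subsequent output layer to be constructed.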
After the convolution operation, the test question information is represented as a feature matrix with nc rows, but the dimensions of the test question feature vectors in the matrix are not uniform, i.e., the number of columns of the matrix is not fixed. This not only makes the vector dimensions too high for good computational performance, but also makes it difficult to construct the subsequent layers. Therefore, the model uses pooling layers to extract representative features from each test question feature vector and reduces the representation of the test question document to
The output layer is mainly responsible for making a nonlinear mapping of the output of the previous layers. Therefore, it is necessary to map
where
where
The joint probability matrix decomposition model of the CUPMF model is shown in Fig. 4, whose main idea is to decompose the student-test score information matrix

Joint probability matrix decomposition frame of CUPMF model
The prior probability of the initialization matrix
The prior probability of matrix
The initialization of matrix
The weights
The word vectors
The Gaussian noise
Thus, the trial question
The probability distribution of matrix
From the implicit vector
where
Similarly, from the implicit eigenvector
where
Similarly, from the implicit eigenvector
where
Combined with the above equation for the prior probability distribution, the posterior probability distribution for matrix
The above equation can be obtained by taking logarithms on both sides of Equation (23):
where
where the value of parameter
The experiments were completed in a Python 3.11.7 environment using the open-source deep learning framework PyTorch, on the Ubuntu 24.10 operating system, with remote development in PyCharm and an RTX 4090D graphics card for high-performance computing.
In this experiment, two representative public datasets in the field of smart education were used: the ASSISTMents2015 dataset and the EdNet dataset. The ASSISTMents2015 dataset contains the answer records of students from 2015 to 2016 collected by the ASSISTMents platform, with rich information on students, exercises, and knowledge points. EdNet is a massive dataset that collects hundreds of millions of learning behavior records from nearly a million students across multiple platforms; in this experiment, only the information about students, exercises, knowledge points, and whether students answered questions correctly is used.
In this experiment, the dataset was preprocessed as follows. First, exercises without knowledge point annotations and students with fewer than 20 answer records were filtered out. In addition, considering that the EdNet dataset is very large, 1200 relatively active students were randomly selected for the diagnostic study. The data were then split into training, validation, and test sets in a 7:1:2 ratio. After preprocessing, the statistics of the two datasets are shown in Table 1.
The statistics of datasets
Statistical term | ASSISTMents2015 | EdNet |
---|---|---|
Number of students | 4286 | 1200 |
Number of exercises | 18026 | 13624 |
Knowledge points | 131 | 195 |
Number of answer records | 289653 | 1125341 |
Number of correctly answered exercises | 201328 | 836725 |
Number of incorrect answers to exercises | 88325 | 228616 |
Comparison of models
In this experiment, traditional diagnostic models such as IRT, MIRT, and DINA, and recent neural cognitive diagnostic models such as NeuralCD and RCD, are selected for comparison in order to validate the effectiveness of the GCN-based cognitive diagnostic model proposed in this paper. The comparison models are described as follows:
IRT: a continuous nonlinear model, mainly used to model the relationship between one-dimensional student ability and exercise characteristics (e.g., exercise difficulty and discrimination).
MIRT: an extension of the traditional unidimensional item response theory model that takes students’ fine-grained abilities into account and models student abilities and exercise characteristics from the perspective of multidimensional knowledge points.
DINA: a discrete diagnostic model that views student cognitive states as multidimensional and independent, with each knowledge point either mastered or not mastered. It also models the student’s guessing and slipping factors.
NeuralCD: introduces neural networks to model the complex interactions between students and exercises, and uses the monotonicity assumption from traditional diagnostic models to ensure the interpretability of student and exercise factors. The model uses multidimensional continuous vectors to represent students and exercises, where each dimension corresponds to a knowledge point: for the student factor vector, the degree to which the student has mastered that knowledge point; for the exercise factor vector, the degree of relevance of the exercise to that knowledge point.
RCD: models student-exercise-knowledge point relationships with a hierarchical graph structure, in particular the dependency graph of knowledge concepts.
Evaluation Indicators
Considering that students’ ability level cannot be directly measured in cognitive diagnosis, this experiment uses several common indicators to evaluate model performance: root mean square error (RMSE), accuracy (ACC), and area under the ROC curve (AUC). From a regression perspective, RMSE measures the difference between the predicted value and the actual score (0 or 1); smaller values are better. From a classification perspective, predicting whether a student answers an exercise correctly can be viewed as a binary classification problem: the prediction is set to 1 if the predicted value is greater than or equal to 0.5 and 0 otherwise, and performance is measured with ACC and AUC, where values closer to 1 are better.
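The three indicators can be computed directly from the predicted probabilities; a minimal numpy sketch (the example labels and probabilities are illustrative):

```python
import numpy as np

def rmse_acc_auc(y_true, p):
    """RMSE, ACC (0.5 threshold) and AUC for binary response prediction."""
    y_true, p = np.asarray(y_true, float), np.asarray(p, float)
    rmse = np.sqrt(np.mean((p - y_true) ** 2))
    acc = np.mean((p >= 0.5).astype(float) == y_true)
    # AUC as the probability that a positive outranks a negative (ties count 0.5)
    pos, neg = p[y_true == 1], p[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = np.mean((diff > 0) + 0.5 * (diff == 0))
    return rmse, acc, auc

y = [1, 0, 1, 0, 1]                 # actual scores (correct / incorrect)
p = [0.9, 0.2, 0.7, 0.6, 0.4]       # model's predicted probabilities
rmse, acc, auc = rmse_acc_auc(y, p)
```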
The experimental results of the GCN-based cognitive diagnostic model proposed in this paper on the ASSISTMents2015 and EdNet datasets as well as their comparison experiments are shown in Table 2.
Overall results on student performance prediction
Model | ASSISTMents2015 |  |  | EdNet |  |  |
---|---|---|---|---|---|---|
 | ACC↑ | RMSE↓ | AUC↑ | ACC↑ | RMSE↓ | AUC↑ |
IRT | 0.6555 | 0.5361 | 0.6092 | 0.6921 | 0.4654 | 0.7118 |
DINA | 0.6659 | 0.5337 | 0.6788 | 0.6894 | 0.4871 | 0.6826 |
MIRT | 0.7033 | 0.4695 | 0.7251 | 0.7047 | 0.4476 | 0.7273 |
NeuralCD | 0.7406 | 0.4481 | 0.7605 | 0.7166 | 0.4304 | 0.7784 |
RCD | 0.7421 | 0.4349 | 0.7938 | 0.7299 | 0.4239 | 0.7865 |
This article | 0.7826 | 0.4271 | 0.8072 | 0.7497 | 0.4043 | 0.7907 |
From the data in Table 2, the ACC, RMSE, and AUC of this paper’s model on the ASSISTMents2015 dataset are 0.7826, 0.4271, and 0.8072, respectively, while on the EdNet dataset they reach 0.7497, 0.4043, and 0.7907. Compared with the other models, including the state-of-the-art hierarchical graph-based RCD model, the GCN-based cognitive diagnostic model of this paper achieves the best performance on all evaluation metrics: ACC and AUC reach the maximum values and RMSE the minimum. This indicates the effectiveness of combining the graph convolutional neural network with the traditional cognitive diagnostic model. In addition, the results on the ASSISTMents2015 dataset show that the model performs more stably when the dataset is sparse, suggesting that the interaction function module between the student vectors and the test question vectors plays an important role in this model. Finally, across all comparison experiments, the neural network-based approaches outperform the traditional cognitive diagnostic models, indicating that neural networks can more effectively capture the complex relationships among students, exercises, and knowledge points.
In this section of the experiment, the student score matrix

Student cognitive portrait
From Fig. 5, it can be seen that the students’ knowledge point mastery varies, for example, the cognitive vector
The student score matrix

Student learning status portrait
In this section, five recommendation methods are selected for comparison experiments: the random selection strategy Random, the multiple-decision-tree-based selection strategy DT, the Item Response Theory-based selection strategy IRT, the probabilistic matrix factorization model PMF, and the teaching resource recommendation method based on the CUPMF model described in this paper.
Experimental environment: Windows 11, with programs implemented in Python 3.11.7. The data required for the experiment are shown in Table 3.
Experimental data
Network node | 5 knowledge points in each chapter |
---|---|
Data | 15623 student history answer data |
KU relationship | |
Recommended number of test questions n | |
Number of question banks |
Experimental design: a group of target students in each of the three categories A, B, and C was selected for test question recommendation and evaluation. The experimental test was conducted once each student’s historical answer record reached 30 questions.
Taking class A students as an example, the recommendation method of this paper was applied to recommend 10 test questions for each of 10 class A students, totaling 100 questions; after removing duplicates, 20 recommended test questions were obtained. At the same time, without communicating with each other, two experienced classroom teachers were each asked to recommend a group of 10 questions, yielding 19 effective test questions, and the teachers’ recommendations were used to evaluate the accuracy of the algorithm’s recommendations. Similarly, the three groups of students A, B, and C were each given recommendations and tested, and test data were obtained. The same method was used to obtain the recommendation test results of the remaining comparative methods (Random, DT, IRT, and PMF), and the test result data from the three types of students and the five recommendation methods were analyzed to compare performance in terms of the common model indicators, recommendation accuracy, and recommendation reasonableness.
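The pooling, de-duplication, and teacher-based scoring described above can be sketched as a simple set-overlap computation. The item IDs below are illustrative, and measuring agreement as overlap between the pooled algorithm set and the pooled teacher set is an assumption about the evaluation, not the paper's exact procedure:

```python
def evaluate_against_teachers(per_student_recs, teacher_recs):
    """Pool and de-duplicate per-student recommendations, then score the
    pooled set against the teachers' pooled recommendations."""
    algo = set().union(*per_student_recs)    # deduplicated algorithm pool
    teach = set().union(*teacher_recs)       # deduplicated teacher pool
    hit = len(algo & teach)                  # questions both sides recommended
    precision = hit / len(algo)              # fraction of recs teachers agree with
    recall = hit / len(teach)                # fraction of teacher recs covered
    f1 = 2 * precision * recall / (precision + recall) if hit else 0.0
    return precision, recall, f1

# three students' recommendation lists and two teachers' lists (toy IDs)
per_student_recs = [{1, 2, 3}, {2, 3, 4}, {3, 5, 6}]
teacher_recs = [{2, 3, 7}, {3, 4, 8}]
p, r, f1 = evaluate_against_teachers(per_student_recs, teacher_recs)
```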
In order to assess the effectiveness and accuracy of the algorithm, a variety of metrics were used to comprehensively evaluate the performance of the algorithm.
Precision, Recall and F1
Precision, Recall, and F1 are used to evaluate the performance of the CUPMF model and the other comparative methods as recommendation algorithms. They are defined as follows:
Where
MAE
MAE (mean absolute error) measures the average absolute error between the predicted and actual values, and is used here to reflect the difference between the positive response rate of the test question recommendation results and its statistical expectation, i.e.:
where
Standard deviation of ideal response rate
Unlike the “higher is better” tendency of common product recommendations, a higher positive response rate is not always better for students doing exercises. Different types of students achieve optimal learning results when their positive response rate falls within a certain range.
In this paper, students are categorized into three types: class A students with a good foundation, class B students with a moderate foundation, and class C students with a poor foundation. The optimal range of positive answer rate differs among the three categories. The expected positive response rate when recommending test questions to different students depends on a number of factors, such as the difficulty of the test questions and the knowledge level of the students. Based on the analysis of the historical data and answer records of students in each category, the historically estimated correct response rate is found; recommendations whose response rate is close to this range are considered suitable for the students, and within this range students learn better.
Accordingly, this paper uses the correct response rate, defined as CRR (correct response rate), i.e.:
Where
Based on the statistically obtained historical positive answer rates of the three types of students, the experiment takes the average positive answer rate at the Nth answer record of students of the same type as the ideal positive answer rate of that type at the Nth answer. In the experiment, the 31st answer record was selected; the ideal positive answer rate was 0.5287 for type A students, 0.4530 for type B students, and 0.4089 for type C students. The standard deviation in the experiment was calculated with respect to the ideal positive answer rate of the corresponding student type.
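Under these definitions, the CRR and the spread of observed rates around the ideal rate can be computed directly. In this sketch, treating the "standard deviation" as the root-mean-square deviation around the fixed ideal rate is an assumption, and the sample rates are illustrative; the 0.5287 ideal rate for class A is the value stated above:

```python
import numpy as np

def crr(correct, total):
    """Correct response rate on a recommended set of questions."""
    return correct / total

def ideal_rate_std(observed_rates, ideal_rate):
    """RMS deviation of observed CRRs around the ideal rate for the student type."""
    observed = np.asarray(observed_rates, float)
    return np.sqrt(np.mean((observed - ideal_rate) ** 2))

# CRRs of three hypothetical class A students on their recommended questions
rates = [crr(5, 10), crr(6, 10), crr(4, 10)]
sigma = ideal_rate_std(rates, ideal_rate=0.5287)   # class A ideal rate from the text
```

A smaller `sigma` means the recommended questions keep students closer to the response rate at which they learn best, which is why the tables below report this deviation alongside accuracy metrics.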
After completing the experiments and calculating the evaluation indexes, the comparison results of the indicator data for the different algorithm models are shown in Tables 4~6, which present the test results of class A, B, and C students, respectively.
Comparison of indicator data for test results of class A student
Algorithm/model | Precision | Recall | F1 | MAE | CRR std | KU |
---|---|---|---|---|---|---|
Random | 0.5562 | 0.7544 | 0.6403 | 0.0525 | 0.0235 | 1,2,4 |
DT | 0.6164 | 0.7613 | 0.6812 | 0.0387 | 0.0016 | 2,3,4 |
IRT | 0.6599 | 0.5836 | 0.6194 | 0.0656 | 0.0421 | 1,2,4,5 |
PMF | 0.8599 | 0.8774 | 0.8686 | 0.0308 | 0.0045 | 1,2,4 |
CUPMF | 0.8953 | 0.9105 | 0.9028 | 0.0147 | 0.0018 | 1,2,4 |
Comparison of indicator data for test results of class B student
Algorithm/model | Precision | Recall | F1 | MAE | CRR std | KU |
---|---|---|---|---|---|---|
Random | 0.6648 | 0.7534 | 0.7063 | 0.0289 | 0.0182 | 1,2,4,3 |
DT | 0.7081 | 0.7521 | 0.7294 | 0.0226 | 0.0129 | 1,2,4 |
IRT | 0.7541 | 0.8453 | 0.7971 | 0.0188 | 0.0141 | 1,2,4 |
PMF | 0.8028 | 0.9052 | 0.8509 | 0.0111 | 0.0023 | 1,2,4 |
CUPMF | 0.8517 | 0.9121 | 0.8809 | 0.0136 | 0.0019 | 1,2,4 |
Comparison of indicator data for test results of class C student
Algorithm/model | Precision | Recall | F1 | MAE | CRR std | KU |
---|---|---|---|---|---|---|
Random | 0.6062 | 0.7651 | 0.6764 | 0.0285 | 0.0051 | 1,2,4,5 |
DT | 0.7036 | 0.8247 | 0.7594 | 0.0275 | 0.0118 | 2,3,4 |
IRT | 0.6796 | 0.8247 | 0.7452 | 0.0223 | 0.0138 | 1,2,4,3 |
PMF | 0.7637 | 0.8839 | 0.8194 | 0.0182 | 0.0032 | 1,2,4 |
CUPMF | 0.8046 | 0.8921 | 0.8461 | 0.0141 | 0.0029 | 1,2,4 |
According to the experimental results, a comprehensive view of the data comparison tables for each category shows that, compared with the decision tree algorithm, the algorithm designed in this paper performs well on the precision, recall, and F1 indexes; moreover, after optimizing the sub-modules and improving the prediction network, the overall capability of the model increases further, reflecting that the model in this paper outperforms the other models.
In terms of the MAE index, this paper’s model performs best, with lower mean errors: the MAE values of the test results for class A, B, and C students are 0.0147, 0.0136, and 0.0141, respectively. In terms of the standard deviation of the CRR, this paper’s model gives the best recommendations; its
In this paper, we optimize the efficiency and accuracy of personalized recommendation of teaching content by constructing a student cognitive diagnosis model based on GCN and a teaching resource recommendation model based on convolutional joint probability matrix factorization (CUPMF).
First, the cognitive diagnostic model in this paper achieved the best performance among the compared models, with ACC, RMSE, and AUC of 0.7826, 0.4271, and 0.8072 on the ASSISTMents2015 dataset and 0.7497, 0.4043, and 0.7907 on the EdNet dataset, respectively. This indicates that the graph convolutional neural network can be effectively combined with the traditional cognitive diagnosis model to more effectively capture the complex relationships among students, exercises, and knowledge points.
Secondly, cognitive portraits and learning state portraits of student users are constructed from the students’ cognitive diagnosis results, so that students’ cognitive states can be understood from the cognitive portraits, their learning states from the learning state portraits, and targeted attention can be paid to students with high warning coefficients.
Finally, the personalized recommendation algorithm for teaching content designed in this paper has good performance in terms of precision rate, recall rate and F1 indexes, and from the point of view of the two indexes of MAE and
Higher Education Teaching Reform Research and Practice Project of Hebei Province: Construction and Practice of Innovation and Entrepreneurship Education Curriculum System Based on “Whole Field Double”.
Higher Education Teaching Reform Research and Practice Project of Hebei Province: Construction and practice of the curriculum system of "Labor Education" in local normal universities under the background of application transformation development -- based on the perspective of "vocational maturity" (2023GJJG371).
Research and Practice Project on Teaching Reform of Innovation and Entrepreneurship Education of Hebei Provincial Department of Education: Construction of Innovation and Entrepreneurship Education Curriculum System Based on the Linkage of Three Courses: A Case Study of Primary Education (2023CXCy175).