Research on the method of combining artificial intelligence technology to improve the effectiveness of teaching analytical chemistry in colleges and universities
Published online: 21 Mar 2025
Received: 12 Nov 2024
Accepted: 14 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0592
© 2025 Linghua Chen, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Analytical chemistry, as an important public foundation course in higher agricultural and forestry colleges and universities for agronomy, forestry, medicine, environmental science, life science and other related majors, is characterized by many knowledge points, strong logic and a strong scientific character [1-2]. Through the course, students master its basic principles, methods and experimental skills; develop the ability to discover, analyze and solve problems; and cultivate a rigorous scientific attitude and a research style that strives for excellence, laying a solid theoretical and practical foundation for subsequent professional courses and scientific research.
The traditional teaching content of the analytical chemistry course is designed in a single way: the teacher describes the main content in words and guides the students to analyze the important and difficult knowledge independently [3-4]. However, traditional means of analysis and testing are outdated, and the experimental results they produce lack precision, leading to an insufficient understanding of engineering expertise and stagnation in the development of engineering specialties [5-7]. The wide application of intelligent technology in China's education has changed the traditional teaching mode, and intelligent classroom teaching based on artificial intelligence can create a good analytical environment for practice [8-11]. With the help of artificial intelligence, the important and difficult teaching content can be displayed intuitively to students, who are guided to actively analyze the operation process of three-dimensional images of chemistry experiments; virtual interactive experiments replace the teacher's traditional demonstration, saving lecturing time and solving the problem that the experimental process cannot be clearly observed when classes are large [12-15]. A large amount of classroom time can then be devoted to independent experiments, stimulating students' interest in exploration and their initiative in analyzing chemical experiments, with the teacher playing a supporting role [16-19]. In addition, artificial intelligence technology keeps a complete record of the many data parameters involved in classroom experiments, laying a solid foundation for future experimental investigations in analytical chemistry [20-21].
This article first introduces the definition of cognitive diagnosis. Based on cognitive diagnosis technology and students' historical answer records in chemistry, the TDINA model is proposed: influencing factors such as the number of times a student has answered resources related to a knowledge point are introduced into the calculation of the positive answer rate, improving the existing personalized cognitive diagnosis models. Ablation and comparison experiments are conducted on two datasets, Junyi and EdNet-KT1, and the predictive accuracy of the model is evaluated and analyzed using the predicted ACC and AUC. Finally, 310 valid test samples of the subject students were collected, and the proposed TDINA model was used to diagnose and analyze the response data, calculating each student's probability of mastering each attribute and determining the students' attribute mastery patterns after choosing a suitable algorithm. By analyzing the data on students' attribute mastery probabilities and patterns, the characteristics of and differences between student groups are examined to verify the effectiveness of the proposed model.
In the field of educational psychology, the goal of measurement is to gauge students' competence, knowledge mastery and other states in the learning process. The difficulty of psychological measurement lies in the subjective and hidden nature of human psychology, and the performance of students in the same class often varies greatly. The task of cognitive diagnosis is to diagnose students' knowledge mastery from their answer records: by analyzing these records, it determines each student's degree of mastery of every knowledge point covered by the test questions, revealing what has been learned and what is still lacking. Traditional cognitive diagnostic models are divided into continuous and discrete models according to how they model students' knowledge mastery status; the representative models are item response theory (IRT) and the deterministic input, noisy "and" gate model (DINA) [22].
Item response theory (IRT) assumes that students' answer records are independent of each other and represents each student's cognitive state by a single real number. Combined with objective characteristics of the test questions, such as question difficulty, question discrimination, and the probabilities of guessing correctly and of slipping, these parameters provide a diagnostic assessment of a student's cognitive level. The relationship between a student's ability level and the result of answering a test question can be expressed as a function, called the item response function (IRF). The three most common item response functions are the one-parameter, two-parameter and three-parameter logistic models, shown in equations (1) to (3) respectively. The curve plotted from such a function is known as the item characteristic curve (ICC). The meaning of the parameters on the three-parameter item characteristic curve is shown in Figure 1.

Item characteristic curve
Here, parameter a denotes the discrimination of the item, parameter b its difficulty, and parameter c the lower asymptote of the curve, i.e., the probability of answering correctly purely by guessing.
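As a sketch of the logistic models referred to as equations (1)-(3), the three-parameter item response function can be written as follows; the symbol names a, b, c follow the conventional IRT notation (some formulations additionally include a scaling constant D ≈ 1.7, omitted here):

```python
import math

def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Three-parameter logistic item response function.

    theta: student ability; a: discrimination; b: difficulty;
    c: guessing parameter (lower asymptote). Setting c = 0 gives the
    two-parameter model; additionally fixing a = 1 gives the
    one-parameter (Rasch) model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

For example, a student whose ability equals the item difficulty answers with probability midway between the guessing floor c and 1.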
Multidimensional item response theory (MIRT) extends IRT by converting the unidimensional ability feature into multidimensional features, using multidimensional vectors to represent students' abilities and the characteristics of the test questions.
The accuracy of IRT modeling depends on the accuracy of parameter estimation. Commonly used parameter estimation methods for IRT models include maximum likelihood estimation, the EM algorithm and Bayesian methods. Taking maximum likelihood estimation as an example, when all the parameters in IRT are unknown, alternate estimation is usually used: an initial value is first assumed for the unknown parameters, the students' answer records are substituted in, and the maximum likelihood function of the ability parameters is established as in equation (5).
The DINA (deterministic input, noisy "and" gate) model is a multidimensional discrete cognitive diagnostic model in which a student's knowledge mastery is represented by a binary discrete vector: each dimension of the vector corresponds to a knowledge point, with the binary values "1" and "0" denoting "mastered" and "not mastered". Among the item factors, DINA involves only two parameters, the guessing factor and the miss (slip) factor, and is therefore more flexible than other models. The model consists of the probability of not slipping in the "mastered" case and the probability of guessing successfully in the "not mastered" case, expressed as shown in equations (7) and (8):
where gj denotes the guessing factor of test question j and sj its miss (slip) factor.
The marginal likelihood of this formula can be maximized with the EM algorithm to obtain the maximum likelihood estimators of the guessing and miss factors.
Unlike the IRT model, which can only model ability through a single-dimensional variable, DINA models all knowledge points simultaneously and introduces a Q-matrix that links each test question to the knowledge points it examines.
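A minimal sketch of the DINA response probability from equations (7) and (8), assuming the conventional g/s parameter names: a student answers correctly with probability 1−s when all knowledge points required by the Q-matrix row are mastered, and with probability g otherwise.

```python
def dina_prob(alpha, q_row, g, s):
    """Probability that a student answers item j correctly under DINA.

    alpha: binary mastery vector; q_row: binary Q-matrix row for item j
    (which knowledge points the item requires); g: guessing factor;
    s: miss (slip) factor. The latent indicator eta is 1 iff every
    required knowledge point is mastered.
    """
    eta = all(a >= q for a, q in zip(alpha, q_row))
    return (1.0 - s) if eta else g
```

So a student who has mastered all required points succeeds unless they slip, while one missing any required point succeeds only by guessing.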
The schematic of the neurocognitive diagnosis model is shown in Figure 2. The model is divided into three components: the first is the interaction vector composed of students' knowledge point mastery, test question differentiation, knowledge point difficulty and knowledge point relevance; the second is the interaction function composed of deep neural networks; the third is the prediction layer that predicts the students' answers to the test questions at the next moment.

Schematic of the neurocognitive diagnostic model
In the neurocognitive diagnostic model, given the student's answer record and the Q-matrix of the test questions, the student one-hot vector and the test question one-hot vector are taken as input. The student vector is multiplied by a corresponding learnable matrix A to obtain the student's knowledge point mastery vector, and the question vector is transformed analogously to obtain the question factors.
In the neurocognitive diagnostic model, the interaction function, i.e., the counterpart of the cognitive diagnostic function in IRT, consists of three fully connected layers restricted to non-negative weights. The probability of a student answering a question correctly should be positively correlated with the student's mastery of the knowledge points, and a neural network built from such positively weighted layers guarantees the monotonicity assumption of IRT, fitting the real student-question interaction curve more closely and keeping the model interpretable.
Finally, the prediction layer outputs the probability that the tested student answers the test question correctly at the next moment. Training uses the cross-entropy between the prediction and the real answer record as the loss function, yielding the final diagnosis of the student's knowledge point mastery.
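The monotonicity constraint described above can be illustrated with a tiny sketch: clamping all weights of the fully connected layers to be non-negative guarantees that the predicted probability never decreases as any component of the mastery vector grows. The network shape and weight values here are illustrative, not the paper's architecture.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def monotone_mlp(x, W1, b1, W2, b2):
    """Two positively weighted fully connected layers.

    x: mastery vector; W1 (hidden x input) and W2 (output weights) are
    clamped to non-negative values, which enforces the IRT monotonicity
    assumption: the output probability is non-decreasing in each
    component of x.
    """
    W1 = [[max(w, 0.0) for w in row] for row in W1]  # clamp hidden weights
    W2 = [max(w, 0.0) for w in W2]                   # clamp output weights
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
```

With any non-negative weights, a student with higher mastery on every knowledge point always receives at least as high a predicted success probability.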
Definition and nature of knowledge points

Definition of knowledge points: knowledge points are the basic elements of online education and of classroom teaching activities in the process of transferring teaching information. They are the smallest units of knowledge and its most specific content, in some cases also called "test points".

Definition of atomic knowledge points: a knowledge point that contains no further knowledge points is atomic.

Definition of the containment relationship between knowledge points: one knowledge point contains another when the second belongs to the content of the first. A set of associated knowledge points can, according to these containment relations, constitute a knowledge point structure tree, i.e., a lesson tree built from the containment relations between the associated knowledge points [24].

Definition of the precedence relationship between knowledge points: within the full set of knowledge points, one point precedes another when it must be learned before it.

Definition of the brotherhood relationship between knowledge points: two distinct knowledge points contained by the same parent knowledge point are brothers.

The following properties can be derived from the above definitions:

Property 1: a non-atomic knowledge point can contain multiple knowledge points. In the process of knowledge point division, depending on its complexity, a knowledge point can include more than one sub-point.

Property 2: the containment relationship between knowledge points is irreversible: if one knowledge point contains another, the reverse does not hold.

Property 3: the containment relationship between knowledge points is transitive; for example, the knowledge point "the area of the polygon" thereby contains the knowledge point "the calculation of the base and height of the parallelogram".

Definition of Knowledge Competency Level (KSL):
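The containment, atomicity and brotherhood relations defined above can be sketched on a hypothetical mini knowledge tree (the point names below are illustrative, echoing the polygon-area example):

```python
# Hypothetical mini knowledge-point tree: a parent maps to the
# knowledge points it directly contains.
children = {
    "area of polygons": ["area of parallelograms", "area of triangles"],
    "area of parallelograms": ["base and height calculation"],
}

def is_atomic(k):
    """A knowledge point containing no sub-points is atomic (Property 1's converse)."""
    return not children.get(k)

def contains(a, b):
    """Transitive containment (Property 3): a contains b directly or
    through any chain of descendants."""
    direct = children.get(a, [])
    return b in direct or any(contains(c, b) for c in direct)

def siblings(a, b):
    """Brotherhood relation: two distinct points contained by the same parent."""
    return a != b and any(a in cs and b in cs for cs in children.values())
```

Irreversibility (Property 2) follows from the tree structure: if `contains(a, b)` holds, `contains(b, a)` cannot.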

Flowchart of the TDINA model cognitive diagnosis algorithm
Step1: Select the target student user to be diagnosed.
Step2: Construct a student-test score matrix from the behavioral data (mainly answer data) collected by the system, denoted as the R matrix.

R matrix and Q matrix
Step3: Define each student user's attribute mastery vector, whose binary entries mark each knowledge point as mastered or not mastered.
Step4: Define the student's initial response, where a correct answer is recorded as 1 and an incorrect answer as 0.
Step5: Account for the actual situation of student users, i.e., students sometimes slip or guess when answering a test question. The error rate and guessing rate are therefore defined as Equation (12) and Equation (13), respectively.
Step6: Combine the formulas and matrices defined in the previous steps to compute the positive answer rate for each test question.
Step7: From the students' point of view, a student's positive answer rate on the test questions changes constantly with the student's own characteristics, and this personal information has a great influence on the calculation of the positive answer rate. This paper therefore introduces these influencing factors, including a time factor, into the calculation.
Step8: The EM algorithm is used to maximize the marginal likelihood in the equation and find the maximum likelihood estimators of the miss rate and guessing rate. EM algorithm solution: the TDINA model specifies the conditional distribution of the response data.
The conditional probability distribution of the score matrix, given the students' attribute mastery patterns, follows from this distribution.
In order to calculate the estimates of the guess factor and the miss factor, the total likelihood function of the response data is given as equation (19).
Randomly given initial values of the guessing and miss factors, the algorithm iterates the following E and M steps.
E-step: compute the expected posterior probability matrix of the attribute mastery patterns under the current parameter estimates.
M-step: set the derivatives of the likelihood with respect to the parameters to zero and solve the resulting equations to update the estimates.
Repeat the E and M steps until each parameter estimate converges.
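The E/M loop above can be sketched for the plain DINA guess and miss parameters. This toy version assumes a uniform prior over all 2^K mastery patterns and is an illustrative sketch, not the paper's exact TDINA update:

```python
from itertools import product

def dina_em(R, Q, n_iter=20, g0=0.2, s0=0.2):
    """Toy EM estimation of DINA guess (g) and miss/slip (s) parameters.

    R: response matrix (students x items, 0/1); Q: Q-matrix (items x
    knowledge points, 0/1). A uniform prior over all mastery patterns
    is assumed for the E-step.
    """
    K, J = len(Q[0]), len(Q)
    patterns = list(product([0, 1], repeat=K))
    # eta[p][j] = 1 iff pattern p masters every point required by item j
    eta = [[int(all(a >= q for a, q in zip(alpha, Q[j]))) for j in range(J)]
           for alpha in patterns]
    g, s = [g0] * J, [s0] * J
    for _ in range(n_iter):
        # E-step: posterior probability of each mastery pattern per student
        post = []
        for r in R:
            like = []
            for e in eta:
                p = 1.0
                for j in range(J):
                    pj = (1 - s[j]) if e[j] else g[j]
                    p *= pj if r[j] else (1 - pj)
                like.append(p)
            z = sum(like)
            post.append([l / z for l in like])
        # M-step: re-estimate g and s from expected counts
        for j in range(J):
            g_num = g_den = s_num = s_den = 0.0
            for r, w in zip(R, post):
                for e, wi in zip(eta, w):
                    if e[j]:
                        s_den += wi
                        s_num += wi * (1 - r[j])
                    else:
                        g_den += wi
                        g_num += wi * r[j]
            g[j] = min(max(g_num / max(g_den, 1e-12), 1e-3), 0.5)
            s[j] = min(max(s_num / max(s_den, 1e-12), 1e-3), 0.5)
    return g, s
```

The clamping of g and s to (0, 0.5] reflects the usual assumption that guessing and slipping occur less than half of the time.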
In this section, we conduct several experiments to demonstrate the effectiveness of our proposed method and its implementation from various perspectives.
Datasets. In order to evaluate the performance of the TDINA method, we chose two publicly available datasets that are large and satisfy the requirements of the proposed method. The Junyi dataset collects the records of hundreds of thousands of learners answering test questions in Junyi Academy from 2022 to 2023. The EdNet-KT1 dataset collects the test-answering records of hundreds of thousands of learners studying in Santa over a period of more than two years. The details of the datasets are shown in Table 1. We randomly selected 10% of the data as the test set: 12120552 records for the Junyi test set and 33152633 records for the EdNet-KT1 test set.

Baseline methods. We compared our proposed model to several classical or recent baseline methods to evaluate its effectiveness. According to the techniques used, the baselines fall into three main categories: methods based on recurrent neural networks, methods based on customized neural networks, and methods based on attention mechanisms.

Experimental setup. We set the hidden layer size of TDINA to 512, stacked 6 layers in both the encoder block and the decoder block, set the number of heads in the multi-head attention to 8 and the learning rate to 0.0001, used the Adam optimizer, and set all dropout rates to zero. Since the accuracy of the knowledge state itself is difficult to assess, we evaluate the model by predicting the learner's performance, with two evaluation metrics: Area Under Curve (AUC) and Accuracy (ACC). Note that we set a threshold of 0.5 when computing ACC: a prediction greater than or equal to the threshold is regarded as a correct answer, otherwise as an incorrect one.
Data set details
| Particulars | Junyi | EdNet-KT1 |
|---|---|---|
| Number of interactions | 12120552 | 33152633 |
| Number of learners | 195352 | 765223 |
| Number of questions | 726 | 14385 |
| Number of knowledge points | 42 | 21 |
| Average number of learner interactions | 69.6 | 45.8 |
| The average number of questions related to knowledge points | 19.3 | 155.24 |
| The average number of knowledge points associated with the test | 2 | 3.6 |
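The two evaluation metrics used below, ACC with a 0.5 threshold and AUC, can be computed as follows; this is a minimal sketch in which AUC is implemented via its rank interpretation rather than a library call:

```python
def acc(preds, labels, threshold=0.5):
    """Accuracy with the paper's 0.5 cut-off: a predicted probability
    at or above the threshold counts as a predicted correct answer."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(preds, labels))
    return hits / len(preds)

def auc(preds, labels):
    """AUC as the probability that a randomly chosen positive example
    is ranked above a randomly chosen negative one (ties count 1/2)."""
    pos = [p for p, y in zip(preds, labels) if y]
    neg = [p for p, y in zip(preds, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike ACC, AUC is threshold-free: it depends only on how the predictions rank correct answers relative to incorrect ones.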
RQ1: Predictive Assessment of Learner Performance
We compared TDINA with seven baseline methods; the comparison with the baseline models on AUC and ACC is shown in Table 2. We make three observations. First, our method achieves the best performance on both datasets: on the Junyi dataset it outperforms the best-performing baseline, AKT, by 1.25% in AUC, and by 1.59% on EdNet-KT1, indicating that our method effectively improves the accuracy of predicting chemistry learners' answer performance. Second, among the baselines on the Junyi dataset, the attention-based AKT performs very similarly to the recurrent LPKT, verifying that LPKT and AKT achieve comparable performance. Third, on the EdNet-KT1 dataset AKT outperforms LPKT by 1.55% in AUC, and our method outperforms LPKT by 3.14% in AUC; this may be because, on large-scale datasets, attention-based methods can mine rich correlation information better than LSTM-based ones and obtain better prediction performance, in line with the thesis in AKT.
Comparison with the baseline models on AUC and ACC
| Method | Junyi AUC | Junyi ACC | EdNet-KT1 AUC | EdNet-KT1 ACC |
|---|---|---|---|---|
| DKT | 0.7901 | 0.8767 | 0.7517 | 0.6006 |
| LPKT | 0.8073 | 0.8557 | 0.7822 | 0.6709 |
| DKVMN | 0.8123 | 0.8572 | 0.809 | 0.7271 |
| HawkesKT | 0.804 | 0.8533 | 0.7447 | 0.6359 |
| SAKT | 0.7113 | 0.8659 | 0.6558 | 0.6181 |
| SAINT+ | 0.7388 | 0.8523 | 0.6763 | 0.6507 |
| AKT | 0.8091 | 0.8657 | 0.7977 | 0.7122 |
| Ours | 0.8216 | 0.7993 | 0.8136 | 0.7207 |
RQ2: Ablation Experiment
In this section, we conducted ablation experiments to further demonstrate the impact of each component on the final results. The results are shown in Table 3 (TDINA-K disregards knowledge point coding; TDINA-F uses only indexed positions in the answer sequence instead of temporal positions; TDINA-L excludes the learning proficiency module; TDINA-S excludes response speed). The experimental results reveal some intriguing conclusions:
TDINA-S, which excludes response speed, performs poorly on both datasets, so the multitask prediction with added response speed has the most important impact on the TDINA method. Learner reaction speed is rarely considered in traditional knowledge tracing methods, so their portrayal of knowledge states is one-sided; reaction speed is a crucial component of a response and offers a more precise depiction of the cognitive state.

Temporal distance attention based on the forgetting law also has a large impact on our approach. TDINA assigns positions on the time axis to the time units in which learners interact, i.e., it captures the interval-time distance of the sequence as a temporal feature, transforming the traditional equidistant, discrete indexed positions into unequal, continuous temporal positions. Temporal distance attention captures not only the correlation of knowledge states between a temporal unit and its preceding units but also the correlation between units in temporal distance.

For the knowledge point experiment, the TDINA-K method without knowledge points performs poorly, because knowledge points tie together the practice of test questions covering the same points, and responses to such questions in the preceding sequence of time units affect the current knowledge state.

The TDINA-L method, which excludes learning proficiency, performs slightly below the full TDINA method in AUC, and the gap is especially pronounced on the Junyi dataset.
This gap is less obvious on EdNet-KT1, possibly because learning proficiency is closely related to the number of repetitions of, and intervals between, practice on the same knowledge point. The average number of learner interactions in Junyi exceeds that of EdNet-KT1, so learners practice the same knowledge point more repeatedly on Junyi and the effect becomes more important.
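The temporal-distance idea, discounting attention to a preceding time unit by how long ago it occurred, can be sketched as follows. The exponential decay form and the constant tau are illustrative assumptions modeling a forgetting law, not the paper's exact formulation:

```python
import math

def temporal_attention_weights(scores, times, tau=30.0):
    """Illustrative temporal-distance attention weighting.

    scores: raw attention scores for the preceding time units;
    times: their positions on the real time axis (same units as tau);
    the last entry is treated as the current unit. Each score is
    discounted in log-space by (now - t) / tau, i.e., multiplied by
    exp(-dt / tau), then normalized with a numerically stable softmax.
    """
    now = times[-1]
    adj = [s - (now - t) / tau for s, t in zip(scores, times)]
    m = max(adj)
    exps = [math.exp(a - m) for a in adj]
    z = sum(exps)
    return [e / z for e in exps]
```

With equal raw scores, more recent time units automatically receive higher weights, mirroring the forgetting-law behavior described above.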
Ablation experiment results
| Methods | knowledge | speed | learning | forgetting | Junyi AUC | Junyi ACC | EdNet-KT1 AUC | EdNet-KT1 ACC |
|---|---|---|---|---|---|---|---|---|
| TDINA-K | | √ | √ | √ | 0.8068 | 0.8505 | 0.7814 | 0.7247 |
| TDINA-F | √ | √ | √ | | 0.8672 | 0.8692 | 0.7957 | 0.6942 |
| TDINA-L | √ | √ | | √ | 0.8453 | 0.8484 | 0.7826 | 0.724 |
| TDINA-S | √ | | √ | √ | 0.808 | 0.8557 | 0.7915 | 0.6964 |
| TDINA | √ | √ | √ | √ | 0.8185 | 0.8543 | 0.7769 | 0.7836 |
RQ3: Impact of Attention
In order to verify the impact of the three masked multi-head self-attention mechanisms used in TDINA, we randomly selected one learner's records of answering seven consecutive test questions and visualized the attention weights, as shown in Fig. 5. Heat maps (a) to (c) correspond, in order, to the self-attention mechanism in the time-unit encoder and to the first and second attention modules in the time-series decoder. The values in the heat maps are the weights assigned to the preceding time units by the three attention modules when predicting question responses.

Attention visualization
Specifically, we observe that the first attention module assigns attention weights based on correlation with the preceding time units. The weights are largest on the diagonal, i.e., each time unit is most strongly correlated with itself, and time units closer to the current one receive higher weights. This implies that learners' cognitive states are more strongly influenced by their most recent learning behavior than by more distant history, in line with the findings of the MF-DAKT method.
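The masked self-attention underlying these heat maps can be sketched in a minimal single-head form, where position i attends only to positions up to i (the causal mask used when predicting the next response):

```python
import math

def causal_attention(q, k, v):
    """Single-head self-attention with a causal mask.

    q, k, v: lists of equal-length vectors (queries, keys, values).
    Position i computes softmax-normalized dot-product scores over
    positions j <= i only, then returns the weighted sum of values.
    """
    d = len(q[0])
    out = []
    for i in range(len(q)):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(i + 1)]            # mask: only j <= i
        m = max(scores)
        w = [math.exp(s - m) for s in scores]       # stable softmax
        z = sum(w)
        w = [x / z for x in w]
        out.append([sum(wj * v[j][t] for j, wj in enumerate(w))
                    for t in range(len(v[0]))])
    return out
```

The row of weights w computed for each position i is exactly one row of the heat maps discussed above.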
RQ4: Visualization of speed
The visualization of reaction speed and response results is shown in Fig. 6. Applying the method of this paper, panels (a) to (d) respectively show a question with few judgment errors, one showing a high level of knowledge acquisition, one where learners have a high probability of incorrect responses, and the most difficult test question. We randomly selected these four test questions from the test set and visualized the response speed and predicted probability of a correct answer for all learners who answered them, with the real responses indicated by color. The horizontal coordinate represents response speed (in 10 categories); the larger the value, the faster the response. The vertical coordinate represents the predicted result; the higher the value, the higher the predicted probability that the learner answers the question correctly. An interesting phenomenon appears in the scatter plots: the upper region is dominated by hollow circles while the lower region is dominated by solid circles. This is related to the threshold on the prediction value: if the prediction is greater than 0.5 the learner is judged to have answered correctly, and if it is less than 0.5, incorrectly. The polarized upper and lower trends show that our method predicts accurately in most cases, with only a small number of incorrect judgments.

Visualization of reaction velocity and response results
RQ5: Visualization of Knowledge Proficiency
In order to explore how learning proficiency changes learners' knowledge states, we plotted the evolution of the knowledge state over 30 consecutive test questions answered by one learner; the knowledge states (m represents the overflow value) are listed in Table 4 and visualized in Figure 7(a). This models the effect of learning and forgetting on the knowledge state. As seen from the figure, after the learner practiced questions related to knowledge point K4 many times, mastery of K4 improved significantly. In addition, the proficiency required differs across test questions and knowledge points; for example, the mastery state of K4 remains lower than that of the less-practiced K2 and K5 even after much practice, indicating that the difficulty of knowledge points varies and that the state thresholds required to answer different questions correctly also differ. For more difficult questions, learners need more effort to reach a proficient state. We will continue to explore the connections and differences between the various knowledge points. Finally, the learner's knowledge state after training on 30 test questions is shown in Figure 7(b). Overall, the learner's mastery improves after repeated practice, as with knowledge points K2 and K4, while less-practiced points such as K1 and K3 gradually weaken over time because their mastery is not firm.
Knowledge state (m represents the overflow value)
| Serial number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Repetitions | 2 | 3 | 3 | 0 | 0 | 0 | 2 | 3 | 3 | 5 | 5 | 6 | 7 | 8 | 8 | 9 | 11 | 12 | 14 | 12 | 13 | 18 | 15 | 15 | 13 | 11 | 10 | 9 | 8 | 7 |
| Interval time | 45 | 12 | 17 | / | / | / | 13 | 8 | 8 | 5 | 5 | 10 | m | 51 | 75 | m | 36 | m | m | 37 | 53 | m | 7 | 5 | 4 | 5 | 9 | 32 | 8 | 8 |

Knowledge status visualization
By analyzing the students' mastery of each attribute, it is possible to see clearly which attributes the students master well and which less well, so that the latter can be consolidated in subsequent teaching. The attribute mastery probabilities of all subjects as well as those of each class are shown in Table 5, and the attribute mastery probability distribution of all subjects is shown in Figure 8. Looking at each class, the probability of mastering attribute A1 (chemical reaction rate) reaches 100% in the four classes A, B, C and D, higher than the average of the whole group, indicating a very good mastery of A1, while in classes E and F it is relatively low and below the group average. The probability of mastering attribute A2 (factors affecting the rate of chemical reaction) is greater than 0.9 for classes A and B, while for the other four classes it ranges from 0.72 to 0.89. The probability of mastering attribute A3 (reversible reaction) is relatively low for classes C and D, at 0.82 and 0.77, while the other four classes are at 0.92 or above. The probability of mastering attribute A4 (chemical equilibrium state) is below 0.9 for classes C, E and F and above 0.9 for all other classes. The probability of mastering attribute A5 (regulation of chemical reactions) is generally low: around 0.85 in classes A and B and below 0.8 in the other four classes. Combining the probabilities of the five attributes, the mastery probability of each attribute is higher in the focus classes A and B than in the other four parallel classes.
All class properties master probability statistics
| Class | A1 | A2 | A3 | A4 | A5 |
|---|---|---|---|---|---|
| Class A | 1.0000 | 0.9148 | 0.9604 | 0.9028 | 0.845 |
| Class B | 1.0000 | 0.9905 | 0.9756 | 0.9683 | 0.8619 |
| Class C | 1.0000 | 0.7717 | 0.8235 | 0.8558 | 0.7408 |
| Class D | 1.0000 | 0.7299 | 0.7716 | 0.9505 | 0.7959 |
| Class E | 0.8961 | 0.8803 | 0.9278 | 0.8187 | 0.7105 |
| Class F | 0.9364 | 0.8824 | 0.9223 | 0.8874 | 0.7055 |
| Totality | 0.8864 | 0.897 | 0.9998 | 0.9025 | 0.7793 |

Attribute mastery probability distribution of all subjects
As the table and figure show, the mastery probability of every attribute lies between 0.7 and 1.0 for all students, indicating a good overall mastery of chemical reaction rates and limits, with some deficiencies in particular areas. The best-mastered attributes are A1 (rate of chemical reaction) and A3 (reversible reaction), with mastery probabilities of 99.53% and 94.35% respectively; attributes A2 (factors affecting the rate of chemical reactions) and A4 (state of chemical equilibrium) are mastered fairly well, with probabilities of 89.76% and 91.15%; attribute A5 (regulation of chemical equilibrium) has a low mastery probability of 80.28%. In the established attribute hierarchy, attribute A5 is directly or indirectly related to attributes A1 through A4, so in the constructed ideal mastery model any item examining A5 necessarily also examines the other four cognitive attributes; such items are more comprehensive and more difficult, which pushes the final mastery probability down. This also reflects that students encounter certain difficulties with more comprehensive questions. Overall, the diagnostic results point out the direction for subsequent consolidation teaching: teachers should focus on the regulation of chemical reactions while also reviewing the factors affecting the rate and limit of chemical reactions and the chemical equilibrium state, and should help students build a hierarchical knowledge system of reaction rates and limits to support more in-depth study later.
Calculation of attribute mastery patterns. The calculation of students' attribute mastery patterns is based on their responses to the items. The students' responses, with correct answers recorded as 1 and incorrect answers as 0, form a vector of 0/1 elements in test-question order, called the item response pattern in cognitive diagnostic theory. In the cognitive diagnostic analysis platform (flexCDMs), the DINA model is selected; it requires the item response patterns and the Q matrix as input, after which a suitable subject parameter estimation method is chosen to calculate the students' attribute mastery patterns. Commonly used estimation methods include maximum likelihood estimation (MLE), maximum a posteriori estimation (MAP), and expected a posteriori estimation (EAP). According to recent research on choosing among these methods, the MLE algorithm distinguishes mastery patterns better but cannot estimate parameters for test takers with full or zero scores; the MAP algorithm improves on this, so the improved MAP algorithm was chosen in this study to compute the subjects' attribute mastery patterns. Table 6 exhibits the answer results, item response patterns, and mastery patterns of all subjects.
Answer results, item response patterns and attribute mastery patterns of the subjects
| Subject number | Results | Project response mode | Attribute master mode |
|---|---|---|---|
| 1 | ABDBCDAADAAAD | 1111111111011 | 11011 |
| 2 | ABDBCDBACADAB | 1111111100101 | 11011 |
| 3 | ABBBCDBBDABCB | 1111111111100 | 11110 |
| 4 | ABDCCDACDABDB | 1111110011011 | 11111 |
| 5 | ABDBCDBBDACCB | 1111101111011 | 11110 |
| 6 | ABDBCDBADADBA | 1111111011011 | 11110 |
| ...... | ...... | ...... | ...... |
| 304 | ABDBCDCABCADD | 1110101100110 | 11111 |
| 305 | ABBBCDCABCABC | 0111110100111 | 11111 |
| 306 | ABDBCBCBDADAB | 1110110111011 | 11110 |
| 307 | ABDBCDDDDBDDC | 1101110100011 | 10111 |
| 308 | ABCBCDABDABDD | 1111110110010 | 11101 |
| 309 | CBDBCDBDDADDB | 0111111010010 | 11110 |
| 310 | ADDBCDBBDDBDB | 1101110110100 | 11110 |
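The MAP estimation step described above can be sketched as follows. This is a minimal illustration of how a MAP estimate is obtained under the DINA model, not the actual implementation in flexCDMs; the function name and the uniform default prior are assumptions for the sketch. For each candidate pattern, the DINA item-success probability is `1 - slip` when the pattern masters every attribute the item requires, and `guess` otherwise; the pattern maximizing the posterior is returned.

```python
import itertools
import numpy as np

def dina_map_pattern(responses, Q, slip, guess, prior=None):
    """Return the MAP attribute-mastery pattern for one examinee under DINA.

    responses : 0/1 vector over J items (the item response pattern)
    Q         : J x K 0/1 matrix (item j requires attribute k)
    slip, guess : length-J item parameter vectors
    prior     : optional prior over the 2^K patterns (uniform by default)
    """
    J, K = Q.shape
    patterns = np.array(list(itertools.product([0, 1], repeat=K)))  # all 2^K patterns
    if prior is None:
        prior = np.full(len(patterns), 1.0 / len(patterns))

    log_post = np.log(prior).copy()
    for m, alpha in enumerate(patterns):
        # eta_j = 1 iff this pattern masters every attribute item j requires
        eta = np.all(alpha >= Q, axis=1)
        p = np.where(eta, 1 - slip, guess)        # P(correct | alpha) per item
        lik = np.where(responses == 1, p, 1 - p)  # Bernoulli likelihood per item
        log_post[m] += np.sum(np.log(lik))
    return patterns[np.argmax(log_post)]
```

Because every candidate pattern has positive prior mass, the posterior is well defined even for all-correct or all-incorrect response vectors, which is the practical advantage of MAP over MLE noted above.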
The DINA-model cognitive diagnosis yields the attribute mastery patterns of all responding students. Since some students share the same pattern, the patterns were grouped to facilitate the study; the classification of all subjects' attribute mastery patterns is shown in Table 7. After categorization, every subject is found to have mastered three or more cognitive attributes, indicating that the overall mastery is good and that no student finished the chapter with very low mastery of its content. The table shows six attribute mastery patterns among the subjects: 10111, 11010, 11111, 11001, 11010, and 11110. By calculation, there are 9 ideal mastery patterns. However, an ideal mastery pattern is only a learning path in an idealized state, and not every student follows one; that is, students' estimated patterns need not all be ideal mastery patterns. Three of the six observed attribute mastery patterns, namely 11010, 11001, and 11010, are not among the ideal mastery patterns. To gauge how close the subjects are to the ideal learning state, the classification rate can be computed: the ratio of subjects whose pattern falls within the ideal mastery patterns to the total number of subjects.
Classification of attribute mastery patterns for all subjects
| Mastery pattern | Frequency | Proportion (%) | Cumulative proportion (%) | Ideal pattern? |
|---|---|---|---|---|
| 10111 | 26 | 8.39% | 8.39% | YES |
| 11010 | 18 | 5.81% | 14.2% | NO |
| 11010 | 2 | 0.65% | 14.85% | YES |
| 11001 | 25 | 8.06% | 22.91% | NO |
| 11111 | 42 | 13.55% | 36.46% | NO |
| 11110 | 197 | 63.55% | 100% | YES |
The classification rate of the ideal mastery pattern is shown in Table 8: the MAP algorithm achieves a classification rate of 92.2%. The number of students with mastery pattern 10111 is 26, with pattern 11111 is 42, and with pattern 11110 is 197; all of these belong to the ideal mastery patterns, 265 students in total.
Classification rate of the ideal mastery pattern
| Algorithm | MAP |
|---|---|
| Count | 288 |
| Total number | 310 |
| Classification rate | 92.2% |
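The classification rate defined above is a simple ratio: the number of subjects whose estimated pattern belongs to the ideal set, divided by the total number of subjects. A minimal sketch (the function name and pattern-string encoding are assumptions for illustration):

```python
from collections import Counter

def classification_rate(mastery_patterns, ideal_patterns):
    """Fraction of subjects whose estimated mastery pattern is an ideal pattern.

    mastery_patterns : iterable of pattern strings, one per subject
    ideal_patterns   : set of pattern strings derived from the attribute hierarchy
    """
    counts = Counter(mastery_patterns)
    n_ideal = sum(n for pattern, n in counts.items() if pattern in ideal_patterns)
    return n_ideal / sum(counts.values())
```

Counting by distinct pattern first mirrors the layout of Table 7, where each row is one pattern with its frequency.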
Differentiated evaluation of student groups in different classes

Based on the students' attribute mastery patterns calculated in the previous section, the distribution of mastery patterns in each class can be sorted out; the number of students in each mastery pattern per class is shown in Table 9. Comparison shows that the attribute mastery pattern with the largest share in all six classes is 10111, with the largest proportions in classes A, D, and E and the smallest in class F, where it still accounts for 65.31% of the class. This indicates that students who have mastered most of the cognitive attributes make up the majority in every class and that the overall learning status is good, though the proportion in the key classes is generally higher than in the parallel classes. Attribute mastery pattern 11010 appears only in class E, with three students accounting for 5.26% of that class; the class teacher could individually remind these students to review chemical reaction rates. Attribute mastery pattern 11010 appears more often in class D, with 9 students accounting for 15.79% of the class; these students may have an insufficient understanding of the regulation of chemical equilibrium, and the teacher can add the relevant content to the review.
In summary, according to the differing proportions of attribute mastery patterns in each class, teachers can reasonably adjust the focus of subsequent lectures, checking and filling the gaps in the class's weak cognitive attributes to achieve precise teaching. They can also help students sort out the logical relationships between knowledge points on the basis of a reasonable knowledge structure, so that, after laying a good foundation, students can grasp the more comprehensive knowledge points more firmly.
Number of students in each mastery pattern by class
| Attribute mastery pattern | 10111 | 11010 | 11010 | 11001 | 11111 | 11110 | Total |
|---|---|---|---|---|---|---|---|
| Class A | 35 | 6 | 0 | 4 | 2 | 2 | 49 |
| Class B | 33 | 8 | 0 | 0 | 1 | 4 | 46 |
| Class C | 32 | 5 | 0 | 5 | 5 | 5 | 52 |
| Class D | 35 | 9 | 0 | 5 | 3 | 5 | 57 |
| Class E | 35 | 6 | 3 | 4 | 4 | 5 | 57 |
| Class F | 32 | 8 | 0 | 3 | 1 | 5 | 49 |
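The per-class statistics in Table 9 amount to tallying mastery patterns within each class and identifying each class's dominant pattern, which is what the teacher uses to target review. A minimal sketch (the function name and the `(class, pattern)` record format are assumptions for illustration):

```python
from collections import Counter

def tally_patterns_by_class(records):
    """Tally mastery patterns per class and find each class's modal pattern.

    records : list of (class_label, mastery_pattern) pairs, one per student
    Returns (dict of class -> Counter over patterns,
             dict of class -> most frequent pattern in that class).
    """
    by_class = {}
    for cls, pattern in records:
        by_class.setdefault(cls, Counter())[pattern] += 1
    modal = {cls: counter.most_common(1)[0][0] for cls, counter in by_class.items()}
    return by_class, modal
```

The per-class Counter gives the row of Table 9 for that class, and the modal pattern flags where the bulk of the class stands, so attention can go to the minority patterns.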
Cognitive diagnosis can analyze students' knowledge level at specific knowledge points based on their historical learning records and predict their future learning performance, making it a core component of intelligent education systems. On this basis, this article proposes the TDINA model based on cognitive diagnosis theory and analyzes the effectiveness of chemistry teaching through relevant experiments.
In the learner performance prediction experiment, this article's method exceeds AKT by 1.25% in AUC, and by 1.59% on EdNet-KT1. The comparison experiments show that the proposed model achieves better prediction accuracy than the other models compared.
The classification rate of the ideal mastery pattern is 92.2%, with 26, 42, and 197 students in mastery patterns 10111, 11111, and 11110, respectively, for a total of 265 students, or 85%. This validates the usefulness of the proposed model in improving the effectiveness of chemistry teaching.
