Open Access

Research on the Innovative Model of English Teaching by Integrating Traditional Culture and Artificial Intelligence

  
29 Sep 2025

Introduction

As cultural self-confidence becomes a new dimension of competition in China's economic, social and industrial development in the new era, English has become a necessary medium for Chinese culture to go global [1]. English teaching in colleges and universities bears directly on the cultural competitiveness of the times, the construction of socialist culture, and the realization of the cross-cultural attributes of English, so it is necessary to cultivate cultural self-confidence in college English teaching [2-4]. Cultural self-confidence is the full affirmation and active practice by a nation, country, or political party of its own cultural values, as well as its firm confidence in the vitality of its own culture [5-7]. English is an extremely important tool for cultural communication, and college English teaching is an important way to ensure that English plays this role as a communication medium [8-9]. Accordingly, colleges and universities should not only be responsible for teaching language knowledge itself, but should also take the combination of excellent traditional Chinese culture and English teaching as an important task, using language as the communication medium to cultivate students' cultural self-confidence through college English teaching [10-13].

However, some problems remain in the current integration of traditional cultural elements into college English teaching, such as the lack of systematic teaching resources, a single teaching method, and the absence of an effective assessment mechanism [14-15]. The rapid development of information technology provides a new opportunity and platform for integrating traditional cultural elements [16]. With the popularization of network technology, educational resources can circulate and be shared freely, greatly enriching teaching content and improving teaching quality [17-18]. At the same time, students and teachers in the information age can interact in real time, improving teaching effectiveness [19]. In addition, through technologies such as big data and artificial intelligence, teachers can personalize teaching to students' different needs and characteristics, understanding each student's learning situation, interests and learning style so as to provide a personalized teaching program for each student [20-23]. Finally, education in the information age is no longer limited by time and space: as long as a network is available, students can learn anytime and anywhere, and online education can also make education fairer and more widely accessible [24-25]. Therefore, research on how to use artificial intelligence technology to effectively integrate traditional cultural elements into college English teaching and improve students' cross-cultural communication skills has important practical significance and theoretical value.

In this paper, we construct an intelligent recommendation model for similar English topics combined with deep knowledge tracking. A Dynamic Key-Value Memory Network based on forgetting behavior in sequences (DKVMN-F) is used to track and predict students' knowledge level. A heterogeneous-data attention neural network framework (HANN) extracts information features from the heterogeneous data of the topics and, taking students' knowledge level into account, realizes intelligent recommendation of exercises in English teaching. Based on the knowledge-tracking and exercise-recommendation models, an innovative model of English teaching is designed and combined with the teaching strategy of integrating traditional culture into the English curriculum, realizing the organic integration and improvement of traditional culture and English teaching based on artificial intelligence.

Teaching Strategies for Integrating Chinese Traditional Culture into English Courses

The reform of the "Three Teachings" (teachers, teaching materials and teaching methods) is an important component and direction of the supply-side reform of vocational education and of schools' connotation construction, a major initiative for the development of higher education in the new era, and an important opportunity to enhance the Chinese cultural identity of public language courses in colleges and universities. The "Three Teachings" reform takes as its foundation the fundamental task of cultivating morality and nurturing high-quality workers and technical-skilled talents with both virtue and skill; it deepens the integration of industry and education and school-enterprise cooperation, combines theory with practice, and improves the relevance, vocational orientation and practicality of teaching so as to upgrade the level of talent cultivation. Based on the content of the "Three Teachings" reform, this study proposes teaching strategies to enhance Chinese cultural identity in college English courses.

Strengthening the concepts and objectives of teaching Chinese cultural identity

College English courses in higher education should clarify the concepts and objectives of teaching Chinese cultural identity, that is, to cultivate students’ sense of Chinese cultural identity as the core, to improve students’ cultural literacy and cross-cultural communication skills, and to promote students’ all-round development and implement the practice of socialist core values as the orientation. Specifically, the following points should be achieved:

Based on the historical inheritance and practical needs of the Chinese nation, emphasize the important position and role of Chinese excellent traditional culture in English teaching in colleges and universities, so as to make students realize the necessity and urgency of learning and spreading Chinese excellent traditional culture.

Highlighting the people-oriented approach, paying attention to students’ personality differences and interests, respecting students’ subjective status and right to choose, so that students can actively participate in the teaching of Chinese cultural identity. Focus on cultivating students’ innovative spirit and practical ability, encouraging students to think and express themselves creatively on the basis of understanding and appreciating the excellent traditional Chinese culture, combining the excellent traditional Chinese culture with modern social life, and giving full play to its contemporary value and practical significance.

To guide students to establish a correct worldview, outlook on life and values, to cultivate a sense of social responsibility and a sense of mission, and to enable students to contribute to the realization of the Chinese dream of the great rejuvenation of the Chinese nation on the basis of the excellent traditional Chinese culture and with the guidance of the socialist core values.

Enriching the Content and Form of Teaching Chinese Cultural Identity

College English programs should enrich the contents and forms of teaching Chinese cultural identity, adopting varied teaching forms such as lecturing, discussion, demonstration, experience and participation, so that students can feel and understand excellent traditional Chinese culture from multiple perspectives and levels. Specifically, college English programs should do the following:

Select teaching contents related to Chinese excellent traditional culture, which should cover many fields such as philosophy, history, literature, art, science and technology, and reflect the spiritual connotation, value concepts, aesthetic characteristics, and innovative achievements of Chinese excellent traditional culture. Taking into account the professional characteristics of English teaching in colleges and universities and the needs of the industry, the teaching content related to students’ professional background and future development should be selected to reflect the application and influence of Chinese excellent traditional culture in different fields and industries. A variety of teaching forms are used, such as lectures, discussions, demonstrations, experiences, etc., to increase the interest and interactivity of teaching, to mobilize students’ subjective initiative and participation, and to improve the effectiveness and efficiency of teaching.

Innovative Methods and Means of Teaching Chinese Cultural Identity

College English programs should innovate the methods and means of teaching Chinese cultural identity, using information technology, network platforms, virtual reality and other modern educational technologies to create distinctive and attractive teaching scenarios of Chinese cultural identity, setting up platforms for exchange and cooperation with relevant institutions and individuals at home and abroad, and expanding the space and channels for teaching Chinese cultural identity. Specifically, college English programs should do the following:

Use information technology, network platforms, virtual reality and other modern educational technologies to provide rich resources and support for the teaching of Chinese cultural identity such as the use of multimedia, network video, e-books, etc. to present the image and sound of the excellent traditional Chinese culture, and the use of online courses, online tests, cloud assignments, etc. to improve the quality and efficiency of the teaching of Chinese cultural identity.

Build platforms for exchange and cooperation with relevant organizations and individuals to provide a broader vision and more opportunities for teaching Chinese cultural identity: for example, use social media, online forums and blogs to communicate with experts, scholars and cultural institutions; use video conferencing, online interviews and webcasts to dialogue and interact with notable figures; and learn about the inheritance and innovation of excellent traditional Chinese culture in different regions.

Expand the space and channels for teaching Chinese cultural identity and provide more diversified forms and approaches: for example, use libraries, museums and art galleries on and off campus for field visits and study; use campus clubs, organizations and volunteer teams to carry out practical and volunteer activities and participate in the preservation and promotion of excellent traditional Chinese culture; and use exchange programs and study-abroad programs to visit and study abroad, experience the cultural customs of different regions and countries, and disseminate excellent traditional Chinese culture.

Intelligent Recommendation Model for Similar English Topics Combined with Deep Knowledge Tracking

In order to explore an innovative model of English teaching that integrates traditional culture and artificial intelligence, this paper constructs a dynamic key-value memory network based on forgetting behaviors in sequences (DKVMN-F) to track students' knowledge in depth, and on this basis proposes a recommendation model for similar English topics based on a heterogeneous-data attention neural network (HANN).

Dynamic key-value memory network based on sequence forgetting behavior

In this paper, we introduce features such as the repetition interval and the number of past answers for modeling on the basis of the Dynamic Key-Value Memory Network (DKVMN) [26], and integrate forgetting factors and learning factors in the prediction part through a gating mechanism, proposing a Dynamic Key-Value Memory Network based on forgetting behavior in sequences (DKVMN-F).

Problem definition

Given a respondent's answer sequence $X = \{(q_1, y_1), (q_2, y_2), \ldots, (q_{t-1}, y_{t-1})\}$, where $q_t$ is the question number and $y_t \in \{0, 1\}$ indicates whether the respondent answered the question correctly (1 for correct, 0 for incorrect), the main task of knowledge tracking is to predict the probability $p(y_t = 1 \mid q_t, X)$ that the respondent will answer a new question $q_t$ correctly at time $t$, based on the given answer sequence $X$.

Sequence feature extraction

The desired features are extracted based on the sequence X:

Repetition Time Gap (RTG), i.e., the time elapsed between a student's current interaction with an exercise on a knowledge point and the previous interaction on the same knowledge point.

Past Answer Count (PTC), i.e., the number of times a student has already answered an exercise on the same knowledge point. The RTG of the first occurrence of a question is set to an infinite value, and the PTC is set to 0. To avoid the data being too sparse, the extracted features are subjected to the $\mathrm{lb}(\cdot)$ (binary logarithm) operation.
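The RTG and PTC extraction described above can be sketched as follows (a minimal Python illustration; the function name, the $\mathrm{lb}(1+x)$ compression, and the finite sentinel standing in for the "infinite" first-occurrence gap are our own implementation choices):

```python
import math
from collections import defaultdict

SENTINEL_GAP = 2 ** 20  # stand-in for the "infinite" first-occurrence gap

def extract_rtg_ptc(sequence):
    """Extract Repetition Time Gap (RTG) and Past Answer Count (PTC)
    from a chronological list of (knowledge_point_id, timestamp) pairs.
    Both features are log-compressed with lb(1 + x) to avoid sparsity."""
    last_seen = {}
    counts = defaultdict(int)
    rtg, ptc = [], []
    for kp, ts in sequence:
        gap = ts - last_seen[kp] if kp in last_seen else SENTINEL_GAP
        rtg.append(math.log2(1 + gap))      # RTG feature
        ptc.append(math.log2(1 + counts[kp]))  # PTC feature (0 on first try)
        last_seen[kp] = ts
        counts[kp] += 1
    return rtg, ptc
```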

Model framework construction

Based on the overall architecture of DKVMN, the repetition-interval and past-answer-count features are added to its prediction part, and a gating mechanism is then used for feature fusion; the improved DKVMN-F network structure is shown in Fig. 1. The DKVMN-F framework consists of an embedding layer, a key processing layer, a reading layer, a writing layer, and a prediction layer.

Figure 1.

DKVMN-F Model Framework

The implementation process of DKVMN-F model is as follows:

Initialization of mastery level. In DKVMN, the key matrix stores the latent knowledge points and the value matrix stores the mastery of those knowledge points. Set the dimensions of the key matrix (size $N \times d_k$) and the value matrix (size $N \times d_v$), where $N$ is the number of latent knowledge points and $d_k$, $d_v$ are the embedding dimensions, and randomly initialize both matrices.

Answer sequence embedding. The exercise number $q_t$ and the tuple $(q_t, y_t)$ are embedded in the embedding layer. The joint representation of the tuple $(q_t, y_t)$ is calculated as $qy_t = 2q_t + y_t$; $q_t$ and $qy_t$ are then multiplied by the embedding matrices $A$ and $B$, respectively, to obtain the exercise-number embedding $k_t$ (dimension $d_k$) and the joint embedding $v_t$ (dimension $d_v$). The one-hot codings of the repetition-interval and past-answer-count features are multiplied by the embedding matrices $W_{RTG}$ and $W_{PTC}$, respectively, to obtain the learning factor vector $l_t$ and the forgetting factor vector $f_t$.

Knowledge point correlation calculation. At the key processing layer, the inner product of the exercise-number embedding $k_t$ with each cell $M^k(i)$ of the key matrix $M^k = (M^k(1), M^k(2), \ldots, M^k(N))$ is computed and passed through a Softmax activation to obtain the correlation weight between $k_t$ and each cell of the key matrix: $$w_t(i) = \mathrm{Softmax}(k_t^{T} M^k(i))$$

That is, its correlation with each potential knowledge point.

Knowledge point mastery calculation. At the reading layer, the knowledge-point mastery required for exercise $q_t$ is calculated from the weights $w_t$ obtained in the correlation calculation, i.e., as the weighted sum over the cells of the value matrix $M^v = (M^v(1), M^v(2), \ldots, M^v(N))$: $$r_t = \sum_{i=1}^{N} w_t(i)\, M_t^{v}(i)$$

The residual connection method is used to combine the exercise-number embedding $k_t$ with the knowledge-point mastery $r_t$, and the learning factor vector $l_t$ is introduced to obtain a vector summarizing the mastered information: $$o_t = \mathrm{Tanh}(W_1^{T}[r_t; k_t; l_t] + b_1)$$

where $W_1$ and $b_1$ are the weight matrix and bias vector of the mastered knowledge, respectively; this improves the prediction performance.
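The correlation, reading and summary steps above can be sketched in NumPy as follows (an illustrative sketch with random stand-in parameters, not the trained model; the function name is our own):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dkvmn_read(k_t, M_k, M_v, l_t, W1, b1):
    """Read path of DKVMN-F:
    k_t : (d_k,)   exercise-number embedding
    M_k : (N, d_k) key matrix (latent knowledge points)
    M_v : (N, d_v) value matrix (mastery per knowledge point)
    l_t : (d_l,)   learning-factor vector
    """
    w_t = softmax(M_k @ k_t)            # correlation weight per knowledge point
    r_t = w_t @ M_v                     # weighted read: sum_i w_t(i) M_v(i)
    z = np.concatenate([r_t, k_t, l_t]) # residual combination [r_t; k_t; l_t]
    o_t = np.tanh(W1 @ z + b1)          # summary vector o_t
    return w_t, r_t, o_t
```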

Knowledge mastery update. The update operation is performed on the value matrix $M_t^v$ at the writing layer, i.e., the student's knowledge mastery is updated based on the question $q_t$ and the answer status $y_t$. For the embedding vector $v_t$, the forgetting (erase) vector $e_t$ is computed first: $$e_t = \mathrm{Sigmoid}(W_e^{T} v_t + b_e)$$

where We, be are the weight matrix and bias vector of the forgotten information, respectively.

Next, the value of each knowledge point in the value matrix after forgetting is calculated: $$\tilde{M}_{t+1}^{v}(i) = M_t^{v}(i)\left(1 - w_t(i)\, e_t\right)$$

where $w_t(i)$ is the correlation weight of each knowledge point obtained from the correlation calculation.

Then the vector of added information learned from the topic is calculated as: $$a_t = \mathrm{Tanh}(W_a^{T} v_t + b_a)$$

where Wa, ba are the weight matrix and bias vector of the added learning information, respectively.

Finally, the updated memory matrix is: $$M_{t+1}^{v}(i) = \tilde{M}_{t+1}^{v}(i) + w_t(i)\, a_t$$
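The erase-then-add update above can be sketched as follows (NumPy illustration; the weights are random stand-ins and the function name is our own):

```python
import numpy as np

def dkvmn_write(M_v, w_t, v_t, W_e, b_e, W_a, b_a):
    """Write (update) operation: erase with the forget vector e_t,
    then add the newly learned information a_t.
    M_v : (N, d_v) value matrix before the update
    w_t : (N,)     correlation weights from the key layer
    v_t : (d_e,)   joint embedding of (q_t, y_t)
    """
    e_t = 1 / (1 + np.exp(-(W_e @ v_t + b_e)))  # erase vector, entries in (0, 1)
    a_t = np.tanh(W_a @ v_t + b_a)              # add vector
    M_tilde = M_v * (1 - np.outer(w_t, e_t))    # per-slot erase, scaled by w_t(i)
    return M_tilde + np.outer(w_t, a_t)         # per-slot add, scaled by w_t(i)
```

Note that a slot with zero correlation weight is left untouched, which matches the role of $w_t(i)$ in both the erase and add terms.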

Answer prediction. The forgetting-gate mechanism is used to integrate the degree of mastery of the knowledge points, combining the total learning-information vector $o_t$ with the total forgetting vector: $$fs_t = w_t \odot f_t$$

where ft is the forgetting factor vector.

First, the gate vector $g_t \in \mathbb{R}^N$, which controls the reading and writing of memories, is computed: $$g_t = \mathrm{Sigmoid}(W_o o_t + W_f fs_t)$$

where $W_o \in \mathbb{R}^{N \times N}$ and $W_f \in \mathbb{R}^{N \times N}$ are the weight matrices of the total learning information and the total forgetting information, respectively.

Next, the combined information vector $gl_t \in \mathbb{R}^N$, integrating forgetting and learning, is obtained: $$gl_t = g_t \odot o_t + (1 - g_t) \odot fs_t$$

Finally, the combined information vector $gl_t$ is fed into the Sigmoid layer to calculate the probability of correctly answering the current exercise: $$p_t = \mathrm{Sigmoid}(W_g^{T} gl_t + b_g)$$

where $W_g \in \mathbb{R}^N$ and $b_g$ are the weight vector and bias of the combined information, respectively.

Model optimization. The model is optimized using the cross-entropy loss between the predicted answer status and the real answer status $y_t$: $$L = -\sum_t \left( y_t \log p_t + (1 - y_t) \log(1 - p_t) \right)$$
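The forgetting-gate fusion and the cross-entropy objective can be sketched together as follows (illustrative NumPy code with random stand-in weights; function names are our own):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def dkvmn_f_predict(o_t, f_t, w_t, W_o, W_f, W_g, b_g):
    """Forgetting-gate fusion and answer prediction.
    o_t : (N,) total learning-information vector
    f_t : (N,) forgetting-factor vector
    w_t : (N,) correlation weights
    """
    fs_t = w_t * f_t                       # total forgetting vector fs_t
    g_t = sigmoid(W_o @ o_t + W_f @ fs_t)  # gate vector, entries in (0, 1)
    gl_t = g_t * o_t + (1 - g_t) * fs_t    # gated combination gl_t
    return sigmoid(W_g @ gl_t + b_g)       # p_t: P(answer correct)

def bce_loss(p, y):
    """Cross-entropy between predicted and true answer status."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```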

Neural Network Based Recommendation Model for Similar English Topics

In order to recommend suitable topics for students under limited resources in AI-based English teaching, this paper proposes a topic recommendation scheme based on the attention-based neural network framework (HANN), which extracts information features from the heterogeneous data of the topics (topic text, topic knowledge points, and topic source code).

Problem definition

For any two topics $E_a$ and $E_b$, this paper uses $S(E_a, E_b)$ to evaluate the similarity between them: the higher the score $S(E_a, E_b)$, the more similar $E_a$ and $E_b$ are. Ignoring the loss value, Recommending Similar English topics (RSE) can be formulated as: $$F(E, R, \Theta) \to R^s$$

where $\Theta$ denotes the parameters of $F$, $R = (E_1, E_2, E_3, \ldots)$ is the set of similar topics for topic $E$, and $R^s = (E_1^s, E_2^s, E_3^s, \ldots)$ is $R$ sorted in descending order of the similarity scores $s(E, E_1^s), s(E, E_2^s), \ldots$; the more similar a topic is to $E$, the higher its score.

To solve the above problem, this paper proposes a two-phase solution comprising a training phase and a testing phase. In the training phase, the text, knowledge points and source code of each topic are given. We propose the HANN framework to learn a unified semantic representation of each topic through multi-modal processing of heterogeneous and structured data, while calculating the similarity score of each pair of topics. A pairwise loss function is then used to train the HANN. After training, for any topic $E_a$ in the testing phase, its similar topics $(E_{a,1}^s, E_{a,2}^s, \ldots)$ can be found, and the set of similar topics is ordered by ranking them according to the similarity score.

Attention Neural Networks for Heterogeneous Data

The specific flow of the Heterogeneous Data Attention Neural Network (HANN) framework is shown in Fig. 2. HANN consists of three main parts: the Heterogeneous Data Representation Layer (HERL) of the topic, the Attention Layer (SA), and the Prediction Scoring Layer (SL).

Figure 2.

Flowchart of the HANN framework

HERL layer

For the input of HERL, the topic text features are first extracted, the topic knowledge points are transformed into a knowledge-point matrix (Q matrix), and the topic codes are one-hot encoded to extract the code features. Then, an attention-based long short-term memory network (Attention-based LSTM) is designed to learn a unified semantic representation of each topic by integrating the different topic data through the LSTM [27].

Topic input. For the HERL framework, the input is the data of topic $E$: the text $E^T$, the topic source code $E^C$ and the topic knowledge matrix $E^Q$. The text is formalized as a sequence of $N$ words $E^T = (w_1, w_2, \ldots, w_N)$, where $w_i \in \mathbb{R}^{d_0}$ is initialized with a pre-trained $d_0$-dimensional word2vec embedding. For the topic code, the keyword information of the code is extracted and one-hot encoded, so the code in topic $E$ is represented by the matrix $E^C = (k_1, k_2, \ldots, k_L) \in \{0, 1\}^{L \times L_{att}}$, where $k_i$ is a one-hot vector whose dimension $L_{att}$ equals the total number of code keywords in the topic library and $L$ is the number of code keywords in topic $E$. The knowledge points of the topic are represented as Q-matrix information $E^Q = (q_1, q_2, \ldots, q_M) \in \{0, 1\}^{M \times N_{att}}$.

Source code embedding. Because the one-hot representation of the code keywords is too sparse to train on, this paper uses an embedding operation to transform the initialized keyword vectors into low-dimensional dense vectors. Formally, the code keyword $k_i$ is transformed into the vector $u_i$: $$u_i = k_i W_u$$

where $W_u \in \mathbb{R}^{L_{att} \times d_2}$ is the parameter of the embedding layer and $u_i \in \mathbb{R}^{d_2}$ is its output. The source-code keyword matrix $E^C$ is thereby transformed into the matrix $u = (u_1, u_2, \ldots, u_L) \in \mathbb{R}^{L \times d_2}$.
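The keyword-embedding step can be illustrated as follows (a toy NumPy sketch; the vocabulary size, embedding dimension and random table values are arbitrary stand-ins):

```python
import numpy as np

# One-hot code keywords are too sparse to train on directly, so each
# keyword is projected into a dense d2-dimensional space: u_i = k_i W_u.
rng = np.random.default_rng(0)
L_att, d2 = 50, 8                    # keyword vocabulary size, embedding dim
W_u = rng.normal(size=(L_att, d2))   # embedding table (stand-in values)

def embed_keyword(idx):
    k_i = np.zeros(L_att)
    k_i[idx] = 1.0                   # one-hot keyword vector
    return k_i @ W_u                 # dense embedding u_i

# Embedding a topic's keyword sequence gives an (L, d2) matrix u.
u = np.stack([embed_keyword(i) for i in [3, 17, 3]])
```

Multiplying a one-hot vector by $W_u$ simply selects a row of the table, which is why embedding layers are implemented as lookups in practice.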

Attention-based LSTM. After obtaining the feature representations of the knowledge points and the code, all the heterogeneous data of topic $E$ (the text representation, the Q matrix and the code representation) are integrated. In each topic, different parts of the text are associated with different source codes and knowledge points. Therefore, this paper designs an attention-based LSTM architecture to learn the representation of each topic, with two attention strategies: text-code attention (TCA) and text-knowledge-point attention (TQA), which capture the text-source-code and text-knowledge-point associations, respectively.

This paper uses an LSTM-based architecture to learn the representation of word-sequence topics of any length. The input to the LSTM network is all the data of each topic as a sequence $x = (x_1, x_2, \ldots, x_N)$. The hidden state $h_t$ at input step $t$ is updated as: $$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)$$ $$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)$$ $$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)$$ $$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)$$ $$h_t = o_t \odot \tanh(c_t)$$

where $i_t$, $f_t$, $c_t$ and $o_t$ are the input gate, forget gate, memory cell and output gate of the LSTM, respectively, and the $W_\cdot$ and $b_\cdot$ are learned weight matrices and bias vectors.
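One step of the gate updates above might look as follows in NumPy (a sketch; packing the four gates into a single weight matrix `W` is an implementation convenience of ours, not part of the paper):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM update matching the gate equations above.
    x_t : (d_x,) multimodal input of step t
    W   : (4*d_h, d_x + d_h) packed gate weights; b : (4*d_h,) biases
    """
    d_h = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b   # all four pre-activations
    i = sigmoid(z[:d_h])                        # input gate i_t
    f = sigmoid(z[d_h:2 * d_h])                 # forget gate f_t
    o = sigmoid(z[2 * d_h:3 * d_h])             # output gate o_t
    c = f * c_prev + i * np.tanh(z[3 * d_h:])   # memory cell c_t
    h = o * np.tanh(c)                          # hidden state h_t
    return h, c
```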

Obviously, at each input step, $x_t$ is a multimodal vector combining text, knowledge points and source code, i.e.: $$x_t = w_t \oplus \hat{u}_t \oplus \hat{q}_t$$

where $\oplus$ joins two vectors into one long vector, $w_t$ is the step-$t$ word vector of the text $E^T$, and $\hat{u}_t$ and $\hat{q}_t$ are the representations of the topic's source-code keywords and knowledge matrix, learned by TCA and TQA, respectively.

TCA aims to capture the association between text and source code. At input step $t$, TCA first measures the association between the word $w_t$ of the text and each keyword of the source code. Since the keywords of the source code are related to phrases in the text, and $h_{t-1}$ carries information about the words before step $t$, $h_{t-1}$ must be taken into account when measuring the associations. The related source-code keyword representation $\hat{u}_t$ is then modeled as the weighted sum over $u$: $$\hat{u}_t = \sum_{j=1}^{L} \alpha_j u_j, \quad \alpha_j = \frac{\varphi(u_j, w_t, h_{t-1})}{\sum_{i=1}^{L} \varphi(u_i, w_t, h_{t-1})}, \quad \varphi(u_j, w_t, h_{t-1}) = V_{ac} \tanh(W_{ac}[u_j \oplus w_t \oplus h_{t-1}])$$

where $V_{ac}$ and $W_{ac}$ are the learned parameters of TCA; in topic $E$, $\varphi(u_j, w_t, h_{t-1})$ measures the association between the $j$-th source-code keyword $u_j$ and $w_t$, and $\alpha_j$ is the normalized attention weight of $\varphi(u_j, w_t, h_{t-1})$.
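The TCA weighting can be sketched as follows (NumPy illustration with random stand-in parameters; a softmax is used here to normalize the scores $\varphi$, a common numerically stable stand-in for the ratio normalization above):

```python
import numpy as np

def tca(u, w_t, h_prev, W_ac, V_ac):
    """Text-Code Attention: attend over the code-keyword embeddings `u`
    given the current word w_t and previous hidden state h_{t-1}.
    u : (L, d2) keyword embeddings; returns the attended vector u_hat_t.
    """
    scores = np.array([
        V_ac @ np.tanh(W_ac @ np.concatenate([u_j, w_t, h_prev]))
        for u_j in u
    ])                                     # phi(u_j, w_t, h_{t-1}) per keyword
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()            # normalized attention weights alpha_j
    return alpha @ u                       # u_hat_t = sum_j alpha_j u_j
```

TQA has exactly the same shape, with the keyword embeddings replaced by knowledge-point vectors and its own parameters.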

The goal of TQA is to capture text-knowledge-point associations. Similar to TCA, the related knowledge-point representation $\hat{q}_t$ in TQA can be modeled in the form of Eq. (17), simply replacing $u_j$, $V_{ac}$ and $W_{ac}$ with $q_j$ and the learned TQA parameters $V_{aq}$ and $W_{aq}$.

Through the attention-based LSTM, the hidden-state sequence $h = (h_1, h_2, \ldots, h_N)$ is obtained from the input sequence $x$. Following common practice in natural language processing, the final hidden state $h_N$ carries the semantic information of the whole input sequence of topic $E$, so $h_N$ is used as the semantic representation: $r(E) = h_N$. In addition, the hidden state $h_t$ at step $t$ retains only the information of $(x_1, x_2, \ldots, x_t)$; therefore, $h(E) = h$ is further taken as the representation of topic $E$'s heterogeneous data, giving the unified semantic representation $(r(E), h(E))$ of $E$.

Attention Mechanism (SA)

In this paper, a similarity attention mechanism is designed to measure the similar parts of two exercises and learn their attentional representations. An attention matrix $A \in \mathbb{R}^{N_{E_a} \times N_{E_b}}$ measures the similar parts of the input topic pair $(E_a, E_b)$ by calculating the cosine similarity between each part of $h(E_a)$ and $h(E_b)$: $$A_{ij} = \cos(h_i(E_a), h_j(E_b))$$

where $1 \le i \le N_{E_a}$, $1 \le j \le N_{E_b}$, and $N_{E_a}$ and $N_{E_b}$ are the lengths of the word sequences of $E_a$ and $E_b$, respectively. $h_i(E_a)$ is the step-$i$ representation in $h(E_a)$ and $h_j(E_b)$ is the step-$j$ representation in $h(E_b)$.

With the attention matrix $A$, the total score $s_i(E_a) = \sum_{k=1}^{N_{E_b}} A_{i,k}$ measures the summed similarity between the step-$i$ representation in $h(E_a)$ and every step representation in $h(E_b)$. Similarly, $s_j(E_b) = \sum_{k=1}^{N_{E_a}} A_{k,j}$ measures the summed similarity between the step-$j$ representation in $h(E_b)$ and every step representation in $h(E_a)$. The two similarity score vectors $s(E_a)$ and $s(E_b)$ are therefore taken as the similarity attention representations of $E_a$ and $E_b$, respectively.
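The attention matrix and the two score vectors can be computed as follows (a minimal NumPy sketch; function names are our own):

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def similarity_attention(h_a, h_b):
    """Pairwise cosine-similarity attention matrix A between the step
    representations of two topics, plus the row/column sums used as
    similarity-attention vectors.
    h_a : (N_a, d), h_b : (N_b, d)
    """
    A = np.array([[cos(ha, hb) for hb in h_b] for ha in h_a])
    s_a = A.sum(axis=1)  # s_i(Ea): similarity of step i of Ea to all of Eb
    s_b = A.sum(axis=0)  # s_j(Eb): similarity of step j of Eb to all of Ea
    return A, s_a, s_b
```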

For topics $E_a$ and $E_b$, the semantic attentional representations $h_{att}(E_a)$ and $h_{att}(E_b)$ are modeled as weighted sums over $h(E_a)$ and $h(E_b)$, respectively: $$h_{att}(E_a) = \sum_{i=1}^{N_{E_a}} A_{i, N_{E_b}}\, h_i(E_a), \quad h_{att}(E_b) = \sum_{j=1}^{N_{E_b}} A_{N_{E_a}, j}\, h_j(E_b)$$

With the help of the attention mechanism, the attention matrix $A$ can be obtained, and the similarity attention representations $s(E_a)$ and $s(E_b)$ and the semantic attention representations $h_{att}(E_a)$ and $h_{att}(E_b)$ of the input topic pair $(E_a, E_b)$ can be learned.

Prediction Score (SL)

The goal of the prediction scoring layer is to compute a similarity score for each topic pair and use these scores to rank the set of similar topics. For a topic pair $(E_a, E_b)$, the representations are first concatenated into one vector, $\tilde{z}_{ab} = r(E_a) \oplus r(E_b) \oplus s(E_a) \oplus s(E_b) \oplus h_{att}(E_a) \oplus h_{att}(E_b)$. The similarity score $S(E_a, E_b)$ is then obtained by two fully connected networks, the first with the nonlinear activation $\mathrm{ReLU}(x) = \max(0, x)$ and the second with the Sigmoid function, i.e.: $$\tilde{o}_{ab} = \mathrm{ReLU}(W_1 \tilde{z}_{ab} + b_1), \quad S(E_a, E_b) = \sigma(W_2 \tilde{o}_{ab} + b_2)$$

where W1, b1, W2, and b2 are network parameters.
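The two-layer scoring network can be sketched as follows (NumPy illustration; parameter shapes and values are arbitrary stand-ins):

```python
import numpy as np

def similarity_score(z_ab, W1, b1, W2, b2):
    """Prediction scoring layer: a ReLU hidden layer followed by a
    Sigmoid output, giving S(Ea, Eb) in (0, 1).
    z_ab : concatenated pair representation z~_ab
    """
    o = np.maximum(0.0, W1 @ z_ab + b1)       # ReLU(W1 z + b1)
    return 1 / (1 + np.exp(-(W2 @ o + b2)))   # sigma(W2 o + b2)
```

Because the output is squashed into (0, 1), the score can be read directly as a graded similarity and compared across pairs for ranking.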

HANN Learning

In this section, a pairwise loss function is specified in order to train the HANN. In the training phase, for a topic $E$, $\mathrm{Sim}(E)$ denotes its labeled similar topics, and the unlabeled topics are considered dissimilar, $DS(E)$. Since the similarity score of a similar pair $(E, E_s)$ should be higher than that of a dissimilar pair $(E, E_{ds})$, where $E_s \in \mathrm{Sim}(E)$ and $E_{ds} \in DS(E)$, the pairwise loss function is constructed as: $$L(\Theta) = \sum_{E, E_s, E_{ds}} \max\left(0, \mu - \left(S(E, E_s) - S(E, E_{ds})\right)\right) + \lambda_\Theta \|\Theta\|^2$$

where $S(\cdot, \cdot)$ is calculated by Eq. (20), $\Theta$ denotes all the parameters of the HANN, $\lambda_\Theta$ is a regularization hyperparameter, and $\mu$ is a margin parameter that forces $S(E, E_s)$ to be greater than $S(E, E_{ds})$. In this way, HANN can be trained by directly minimizing the loss function $L(\Theta)$ with Adam [28].
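The pairwise margin loss can be sketched as follows (illustrative NumPy code; this hypothetical helper takes precomputed similarity scores rather than topics, and the default values of `mu` and `lam` are arbitrary):

```python
import numpy as np

def pairwise_loss(s_pos, s_neg, mu=0.1, lam=0.0, theta_sq_norm=0.0):
    """Pairwise margin loss: pushes the score of each labeled-similar
    pair above the score of each dissimilar pair by at least the margin
    mu, plus an L2 regularization term lam * ||Theta||^2.
    s_pos, s_neg : arrays of scores S(E, E_s) and S(E, E_ds)
    """
    hinge = np.maximum(0.0, mu - (s_pos - s_neg)).sum()
    return hinge + lam * theta_sq_norm
```

When a positive pair already beats its negative counterpart by more than the margin, that triplet contributes zero loss, so training focuses on pairs the model still ranks incorrectly.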

Model application experiment and analysis
DKVMN-F model experiment and analysis
Experimental setup

In this section, we experimentally validate the effectiveness of the knowledge-tracking approach based on the DKVMN-F model by comparing its performance with five benchmark models on four real English online-education datasets, namely ASSISTments2009, Statics2011, ASSISTments2012 and Synthetic-5, and finally conduct an ablation experiment to observe the effectiveness of the core modules of the DKVMN-F model.

The benchmark models used are shown below:

DKT: Modeling students’ knowledge states using RNN.

Bi-CLKT: uses contrastive learning with global and local bi-layer structures, applying graph-level and node-level GCNs to extract exercise-to-exercise and concept-to-concept relationship information, respectively.

DKVMN: A key-value matrix is used to track students’ knowledge status of each knowledge point in real time.

LFKT: Taking students’ forgetting behavior into account, four forgetting influencing factors are comprehensively considered to simulate students’ forgetting behavior in the learning process.

DKT+forget: extends the DKT model by adding forgetting features calculated from the question-answer sequences, taking students' forgetting behaviors into account.

In the experiments, the AUC (area under the curve) metric is used to assess model performance. The ROC curve (receiver operating characteristic curve) is plotted with the false positive rate (FPR) on the horizontal axis and the true positive rate (TPR) on the vertical axis. The closer the ROC curve approaches the upper-left corner, the better the classification model performs. The AUC value equals the area under the ROC curve, and the larger the AUC, the better the model's performance.
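The AUC described above can also be computed directly from its probabilistic interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half (a minimal pure-Python sketch; libraries such as scikit-learn provide the same metric as `roc_auc_score`):

```python
def auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative),
    ties counted as 0.5; equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```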

The model in this paper is run on a host with 32 GB of memory, an Intel Core i7 CPU and an RTX 3060 GPU. The dataset is randomly divided in a 7:2:1 ratio into three parts: training set, validation set and test set.
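The 7:2:1 split can be sketched as follows (Python illustration; the seed value is an arbitrary choice for reproducibility):

```python
import random

def split_dataset(items, seed=42):
    """Random 7:2:1 split into train/validation/test sets."""
    items = list(items)
    rng = random.Random(seed)
    rng.shuffle(items)                       # shuffle before partitioning
    n = len(items)
    n_train, n_val = int(n * 0.7), int(n * 0.2)
    return (items[:n_train],                 # 70% training
            items[n_train:n_train + n_val],  # 20% validation
            items[n_train + n_val:])         # 10% test
```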

Model comparison experiments

The AUC comparison between the DKVMN-F model of this paper and the five benchmark models on the four publicly available datasets is shown in Table 1. The AUC values of the DKVMN-F model on the four datasets are 0.8669, 0.8670, 0.7757 and 0.8701, all better than those of the other five models. Relative to the other models, DKVMN-F achieves average AUC improvements of 7.86%, 5.17%, 4.49% and 6.86% on the four datasets, respectively, demonstrating the superiority of this paper's model.

Table 1. Comparison of AUC on different datasets

Model ASSISTments2009 Statics2011 ASSISTments2012 Synthetic-5
DKT 0.7437 0.8142 0.7197 0.7598
DKT+forget 0.7528 0.7540 0.7354 0.7621
DKVMN 0.8254 0.8395 0.7362 0.8370
LFKT 0.8569 0.8619 0.7605 0.8614
Bi-CLKT 0.8549 0.8628 0.7617 0.8646
DKVMN-F 0.8669 0.8670 0.7757 0.8701
Model ablation experiments

The DKVMN-F model proposed in this paper introduces features such as the number of past answers and the repetition interval on the basis of the DKVMN model, i.e., it integrates learning and forgetting behaviors, named the learning module and the forgetting module, respectively. In this section, ablation experiments are conducted to validate the performance of these two modules in DKVMN-F. The AUC values of the different DKVMN-F variants on the four datasets are shown in Table 2. DKVMN-F (basic) denotes the baseline model without the learning and forgetting modules, DKVMN-F denotes the final knowledge-tracking model, and DKVMN-F (without forget) and DKVMN-F (without learning) denote the models with only the learning module or only the forgetting module added, respectively.

AUC values of DKVMN-F variants on different datasets

Model ASSISTments2009 Statics2011 ASSISTments2012 Synthetic-5
DKVMN-F (basic) 0.8288 0.8386 0.7412 0.8406
DKVMN-F (without forget) 0.8481 0.8586 0.7560 0.8610
DKVMN-F (without learning) 0.8463 0.8587 0.7519 0.8567
DKVMN-F 0.8669 0.8670 0.7757 0.8701

As can be seen from Table 2, DKVMN-F (basic) has the lowest AUC, because it uses neither module and takes into account neither past learning behaviors nor students' forgetting behaviors. DKVMN-F (without forget) ignores students' forgetting behaviors, and its predictive performance is lower than that of DKVMN-F, which suggests that the factors affecting forgetting help to accurately track students' knowledge state and that forgetting behavior influences that state. DKVMN-F (without learning) ignores students' past answer information and likewise underperforms DKVMN-F, indicating that the past answer information captured by the learning module effectively improves the model's accuracy.

In order to verify the effectiveness of the forgetting module, this paper compares the knowledge-state predictions of the DKT+forget, DKVMN, and DKVMN-F models; the comparison results are shown in Figure 3, where the arrows indicate chronological order. The example uses ten questions answered by a randomly selected student from ASSISTments2009, covering five knowledge points. As Figure 3 shows, DKT+forget predicts a low probability for knowledge point k3 at moment t=10. Compared with DKT+forget, DKVMN-F captures the fact that Q6 at moment t=10 and Q3 at moment t=9 involve the same knowledge point, so its predicted forgetting of k3 is lower. At moments t=13 and t=15, DKT+forget predicts a low probability for k4, because Q4 and Q10 each appear for the first time in the answer sequence and DKT+forget cannot capture the relationship between two questions that contain the same knowledge point. In addition, after k5 is learned at moment t=12 and not practiced again, DKVMN-F gives a lower prediction probability at moment t=16 than DKVMN, because DKVMN does not consider the forgetting factor. These observations confirm that the forgetting module is highly effective in knowledge tracking.
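The paper names "number of past answer times" and "repetition intervals" as the forget-related inputs but does not give their exact definitions; the sketch below is one plausible reading, using per-knowledge-point attempt counts and step-based gaps since the last occurrence.

```python
from collections import defaultdict

def forgetting_features(skill_seq):
    """For each step in a student's answer sequence, derive two
    forget-related features: the number of past attempts on the same
    knowledge point, and the gap (in steps) since that knowledge point
    last appeared (-1 marks its first appearance)."""
    last_seen = {}
    counts = defaultdict(int)
    feats = []
    for t, skill in enumerate(skill_seq):
        gap = t - last_seen[skill] if skill in last_seen else -1
        feats.append((counts[skill], gap))
        counts[skill] += 1
        last_seen[skill] = t
    return feats

# A short sequence over knowledge points k1..k3
print(forgetting_features(["k1", "k2", "k1", "k3", "k1"]))
# [(0, -1), (0, -1), (1, 2), (0, -1), (2, 2)]
```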

Figure 3.

Comparison results of knowledge states of different models

In order to verify the effectiveness of the learning module in knowledge tracking, the 120 knowledge points in the ASSISTments2009 dataset are clustered, with the same icon indicating similar knowledge points; the knowledge points are renumbered for ease of representation. The clustering results are shown in Figure 4.

Figure 4.

ASSISTments2009 Data clustering class

In Figure 4, the knowledge points fall into 9 categories, showing that the learning module can group the knowledge points of the questions well and capture both students' previous answer information and the relationships between questions, which demonstrates the module's effectiveness in knowledge tracking.
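The text does not name the clustering algorithm used to produce the 9 categories; a minimal sketch under the assumption of k-means over knowledge-point embedding vectors (random vectors stand in here for the learning module's learned representations):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-ins for the 120 knowledge-point vectors; in the paper these
# would come from the learning module's learned representations.
embeddings = rng.normal(size=(120, 16))

# Cluster into 9 categories, matching the count reported for Figure 4
kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(embeddings)
print(kmeans.labels_.shape)  # (120,)
```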

Experiments on English topic recommendation based on HANN modeling

In order to measure the effectiveness of the proposed heterogeneous data attention neural network (HANN)-based similar English topic recommendation model, this paper conducts model comparison experiments and hyperparameter experiments on it.

Experimental setup

Data sets

In this paper, two types of datasets are used to validate the effectiveness of the sequence recommendation model. The first type consists of two standard sequence recommendation datasets: MovieLens and Gowalla. MovieLens is a movie rating dataset created by the GroupLens project team; MovieLens-1M (ML-1M), which contains 1M ratings, is selected for the experiment. Gowalla is a location-based social website on which users check in by sharing their location, and its check-in dataset is used in this experiment. The second type is the educational dataset ASSISTments2009.

Baseline model

In this paper, the following baseline models are compared with the HANN model to verify its effectiveness:

Pop: i.e., popularity-based recommendation method, is a recommender system method that ranks items based on their popularity.

BPR: is an algorithm designed for personalized recommender systems that specifically targets users’ implicit feedback data to generate personalized item rankings.

FPMC: is a recommendation model that combines matrix factorization (MF) and first-order Markov chain (MC).

GRU4rec: is a recurrent neural network (RNN)-based recommendation model designed to predict user preferences by capturing dynamic changes in user behavioral sequences.

Caser: is a convolutional neural network (CNN)-based sequential recommendation method that captures the representation of a user’s interests by performing convolutional operations on the first L items of the user’s behavior in two dimensions.

RCNN: is a hybrid neural network model that combines RNN and CNN.

CosRec: is a model designed for sequential recommendation tasks, which captures complex patterns and contextual information in a sequence of user behaviors by utilizing a two-dimensional convolutional neural network (2DCNN) to accurately predict a user’s future behavior or preferences.

SCosRec: is a method to further optimize the sequence recommendation model. By introducing symmetrically generated convolutional layers, the predictive power of the model is enhanced while reducing the computational and memory requirements of the model.
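The BPR baseline above ranks items from implicit feedback by optimizing a pairwise objective; a minimal NumPy sketch of that loss (the logistic form is the standard one, assumed rather than taken from this paper):

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """Bayesian Personalized Ranking loss: for each (user, positive
    item, negative item) triple, push the score of the item the user
    interacted with above the score of an item they did not."""
    x = np.asarray(pos_scores) - np.asarray(neg_scores)
    return float(-np.mean(np.log(1.0 / (1.0 + np.exp(-x)))))

# A correctly ranked pair (pos > neg) yields a smaller loss than a
# mis-ranked one
print(bpr_loss([2.0], [0.0]) < bpr_loss([0.0], [2.0]))  # True
```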

Comparison experiments

The performance comparisons between the HANN model proposed in this paper and the eight baseline models listed above on the MovieLens-1M, Gowalla, and ASSISTments2009 datasets are shown in Tables 3, 4, and 5, respectively.

Comparison of model performance on ML-1M

Model MAP Precision@1 Precision@5 Precision@10 Recall@1 Recall@5 Recall@10
Pop 0.0694 0.1279 0.1122 0.1009 0.0055 0.0225 0.0369
BPR 0.0914 0.1472 0.1292 0.1183 0.0067 0.0301 0.0564
FPMC 0.1033 0.2004 0.1674 0.1448 0.0132 0.0454 0.0772
GRU4rec 0.1435 0.2512 0.2135 0.1914 0.0158 0.0618 0.1094
Caser 0.1512 0.2501 0.2189 0.1994 0.0146 0.0634 0.1118
RCNN 0.1681 0.2830 0.2491 0.2225 0.0191 0.0728 0.1267
CosRec 0.1895 0.3297 0.2831 0.2493 0.0211 0.0831 0.1441
SCosRec 0.1969 0.3447 0.2930 0.2585 0.0228 0.0886 0.1526
HANN 0.1971 0.3349 0.2971 0.2641 0.0238 0.0924 0.1613
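The Precision@k and Recall@k metrics reported in Tables 3-5 can be computed from a ranked recommendation list as sketched below; the item IDs in the example are purely illustrative.

```python
def precision_recall_at_k(ranked_items, relevant_items, k):
    """Precision@k: fraction of the top-k recommendations that are
    relevant. Recall@k: fraction of all relevant items that appear
    in the top-k."""
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    precision = hits / k
    recall = hits / len(relevant_items) if relevant_items else 0.0
    return precision, recall

p, r = precision_recall_at_k(
    ["q3", "q7", "q1", "q9", "q2"],       # model's ranked list
    {"q1", "q2", "q8", "q5"},             # ground-truth relevant items
    k=5,
)
print(p, r)  # 0.4 0.5
```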

Comparison of model performance on Gowalla

Model MAP Precision@1 Precision@5 Precision@10 Recall@1 Recall@5 Recall@10
Pop 0.0226 0.0519 0.0359 0.0282 0.0051 0.0272 0.0407
BPR 0.0756 0.1637 0.0983 0.0736 0.0236 0.0747 0.1083
FPMC 0.0762 0.1556 0.0934 0.0694 0.0248 0.0726 0.1065
GRU4rec 0.0582 0.1049 0.0733 0.0781 0.0151 0.0514 0.0831
Caser 0.0922 0.1955 0.1145 0.0575 0.0315 0.0858 0.1228
RCNN 0.0768 0.1767 0.0973 0.0737 0.0269 0.0708 0.1030
CosRec 0.0982 0.2139 0.1187 0.0872 0.0335 0.0878 0.1305
SCosRec 0.1011 0.2192 0.1192 0.0892 0.0343 0.0918 0.1317
HANN 0.1027 0.2199 0.1246 0.0918 0.0367 0.0948 0.1347

Comparison of model performance on ASSISTments2009

Model MAP Precision@1 Precision@5 Precision@10 Recall@1 Recall@5 Recall@10
Caser 0.0303 0.0490 0.0451 0.0432 0.0028 0.0135 0.0249
RCNN 0.0327 0.0489 0.0481 0.0491 0.0022 0.0131 0.0269
CosRec 0.0375 0.0591 0.0544 0.0515 0.0038 0.0177 0.0354
SCosRec 0.0394 0.0604 0.0572 0.0537 0.0036 0.0211 0.0367
HANN 0.0431 0.0661 0.0614 0.0565 0.0041 0.0218 0.0389

On the MovieLens and Gowalla datasets, the sequential models (e.g., Caser) outperform the non-sequential models (e.g., BPR) among the baseline methods. This observation validates the benefit of sequence modeling for prediction, likely because sequential models better account for the ordering relationships between items. Comparing different forms of sequence modeling, FPMC models behavior through a first-order Markov chain, while CosRec achieves better results by capturing higher-order sequential relationships, indicating that considering higher-order relationships helps improve prediction accuracy. Three convolutional methods are compared on these datasets: Caser, CosRec, and HANN. Caser uses one-dimensional convolution kernels to capture the evolution of user interests; CosRec encodes item pairs and applies two-dimensional convolution to capture more complex interactions between items. Unlike both, the HANN model proposed in this paper introduces a contrastive learning framework for heterogeneous and structured data, using an attention-based LSTM and pairwise loss functions to implement the framework. As a result, HANN substantially outperforms the other two, achieving a performance improvement of 0.5%-7.5% on both the MovieLens-1M and Gowalla datasets, likely because contrastive learning yields deeper representations when learning complex feature relationships.

The experimental results on the ASSISTments2009 dataset show that HANN outperforms the other models on several evaluation metrics. Specifically, on MAP, the scores rise steadily from 0.0303 for Caser to 0.0431 for HANN. On Precision@1, HANN reaches a maximum of 0.0661, a clear improvement over 0.0591 for the CosRec base model, and a similar increasing trend is observed on Precision@5 and Precision@10. In terms of recall, HANN also performs best, reaching 0.0218 on Recall@5 and 0.0389 on Recall@10. These results demonstrate the effectiveness of adding an exercise similarity attention mechanism to the HANN model; the gains in precision and recall in particular indicate that the model has strong potential and practical value for the intelligent recommendation of exercises in English teaching.

Hyperparametric experiments

This experiment compares the impact of the contrastive loss weight on model effectiveness by analyzing how the performance of the HANN model changes under different settings of the contrastive learning weight λ. The results on the ASSISTments2009 dataset, using Precision@5 and Recall@5 as the comparison metrics, are shown in Fig. 5.

Figure 5.

Influence of different comparative loss weights on Precision@5 and Recall@5

From the experimental results in Fig. 5, it can be seen that model performance is sensitive to the weight coefficient. When the loss weight λ is set to a low value, around 0.1 in particular, the model outperforms the other weight settings on both Precision@5 and Recall@5. This suggests that a lower contrastive loss weight helps the model focus more effectively on its primary learning task; when the weight is set to 0, the model does not use the contrastive learning framework for data augmentation and reduces to the baseline model of this paper.
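The joint objective implied by this experiment can be sketched as follows; the additive form L = L_rec + λ·L_contrast is an assumption based on the text, not an equation given in the paper.

```python
def total_loss(rec_loss, contrast_loss, lam=0.1):
    """Joint objective assumed from the text: the main recommendation
    loss plus a contrastive term scaled by weight lambda. Setting
    lam = 0 recovers the baseline model without contrastive learning."""
    return rec_loss + lam * contrast_loss

print(total_loss(0.8, 0.5, lam=0.0))  # 0.8 (baseline, no contrastive term)
print(total_loss(0.8, 0.5, lam=0.1))  # ~0.85
```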

An Innovative Model of English Teaching Based on Knowledge Tracking and Personalized Recommendation

Based on the deep knowledge tracking model DKVMN-F and the English topic intelligent recommendation model HANN built in this paper, an innovative English teaching model that integrates traditional culture and artificial intelligence is designed. This teaching model is concretely realized through learner portraits and game tasks.

Application of Learner Profiling with the Introduction of Knowledge Tracking Models

A learner portrait is built by collecting and analyzing data on students' learning behaviors, learning outcomes, and other aspects, and then abstracting a set of personalized characteristic labels for each student. By portraying students' multi-dimensional characteristics such as learning styles, interests, and abilities in detail, the learner portrait provides a more accurate and personalized basis for English teaching.

These labels cover learning styles, interests, ability levels, and other aspects, and can comprehensively and objectively reflect students' learning status and needs. Constructing a learner portrait is a systematic process with three stages: data collection, data analysis, and portrait formation. In the data collection stage, data on students' learning behaviors and learning outcomes are gathered. In the data analysis stage, data mining and related techniques are used to process and analyze the collected data. In the portrait formation stage, students' learning characteristics are abstracted and summarized from the analysis results to form a personalized learner portrait.
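The label set produced by the three-stage pipeline above might be held in a simple record type; the field names below are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class LearnerProfile:
    """Illustrative learner-portrait record covering the label
    dimensions named in the text (style, interests, mastery)."""
    student_id: str
    learning_style: str                   # e.g. "visual", "auditory"
    interests: list = field(default_factory=list)
    # knowledge point -> mastery estimate, e.g. from a tracking model
    knowledge_mastery: dict = field(default_factory=dict)

profile = LearnerProfile("s001", "visual",
                         ["traditional festivals"], {"k3": 0.82})
print(profile.knowledge_mastery["k3"])  # 0.82
```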

In order to better understand the students’ English knowledge mastery, this paper introduces the deep knowledge tracking model DKVMN-F and the English topic intelligent recommendation model HANN, which form a refined learner portrait by tracking and analyzing the students’ learning in each knowledge point. By combining the results of the learner portrait and the DKVMN-F model, we provide more accurate and personalized teaching resources and tutoring strategies for each student’s learning characteristics and needs. According to the students’ learning progress and ability level, the HANN model is used to recommend appropriate learning paths and expansion content for students. For students who have difficulty in theoretical learning, tutoring courses and practical application sessions are added to deepen their understanding of theory through practice. For students with strong learning ability, more advanced English topic challenges and practical projects are provided to meet English learning needs. The online personalized recommendation strategy based on learner behavior is shown in Figure 6.

Figure 6.

Online personalized recommendation strategy

By adopting personalized teaching strategies, students’ motivation, engagement, and interest in English are improved. At the same time, through the application of learner profiling and knowledge tracking models, teachers are able to more accurately understand students’ learning needs and difficulties, so as to provide more effective guidance and assistance.

Task-Driven Instructional Design for Games Based on Learner Profiles
Integration of learner profiling and game task-driven instruction

The learner portrait, a tool that comprehensively describes students’ learning characteristics, interests, and cognitive styles, brings unprecedented precision to the design of English language instruction. In order to build an effective learner profile to support game task-driven instruction, we first collect multi-dimensional information on students’ historical learning data, course performance, and practical use of English.

Through data analysis and mining, students’ learning preferences, strengths and weaknesses are identified, and an accurate learner profile is drawn for each student. In the game task-driven teaching mode, this portrait information is utilized to design tasks for students that match their interests and ability levels.

Design and implementation of teaching activities

Taking the class of 2021 students majoring in Artificial Intelligence in the School of Computer and Software of Dalian Neusoft School of Information as an example, a game task-driven teaching design was carried out based on learner portraits. The specific implementation steps are as follows.

Analyze the learner portrait and determine the teaching objectives. Analyze the learner portraits of the participating students, including information about their learning styles, interest preferences, English proficiency, oral communication skills, and so on. Through data analysis, identify students’ commonalities and differences in order to provide them with personalized learning support and challenges. Meanwhile, the instructional objectives were designed to:

Enhance students’ practical application of English.

Enhance students’ ability to think creatively and extend their knowledge.

To develop students’ ability to speak English and express traditional culture.

Application of Teaching Strategies Integrating Traditional Culture

Combined with the teaching strategy of integrating traditional Chinese culture into English courses proposed in this paper, traditional culture is integrated into interesting games and teaching links, combined with artificial intelligence technology, creating a traditional Chinese cultural environment, guiding students to realize the English paraphrase and creative expression of traditional culture, enhancing students’ cultural self-confidence, and laying a foundation for promoting traditional culture to the world.

Analysis of implementation effects

After the completion of the English teaching tasks, a diversified evaluation method was used to assess the effectiveness of the teaching. Students’ course performance and their actual English proficiency were subjectively and objectively rated through preset grading criteria.

Taking the first-year non-English-major students of College H as an example, a comparison with the English scores of previous cohorts shows that average academic performance improved significantly: the average grade increased by 13.6% over the pre-implementation period, the number of students scoring good or above (80-100 points) rose by 8.8%, and the failure rate fell by 4.5%. In terms of classroom participation, teachers' observation records show that the number of students' independent presentations increased by 22.8% after the teaching model was implemented, and the frequency of students' after-class questions and discussions also increased significantly.

Through the student satisfaction survey, it was found that 89.7% of the students were positive about the design of the teaching model, and 42.9% of them were very satisfied. Compared with the pre-implementation period, the satisfaction level increased by 8.9 percentage points. Students generally agreed that the model could better meet their English learning needs and improve learning outcomes.

Conclusion

In this paper, by constructing the Dynamic Key-Value Memory Network with forgetting behavior (DKVMN-F) and the heterogeneous data attention neural network (HANN), we realize the intelligent recommendation of English learning resources to students, and on this basis design an innovative English teaching model that integrates traditional culture and artificial intelligence.

The AUC values of the DKVMN-F model on the four datasets are 0.8669, 0.8670, 0.7757, and 0.8701, respectively; relative to the other models, it achieves average AUC improvements of 7.86%, 5.17%, 4.49%, and 6.86% on the four datasets, verifying the superiority of the model in this paper. Meanwhile, in the ablation experiments, the variants ranked by AUC in descending order are DKVMN-F, DKVMN-F (without forget), DKVMN-F (without learning), and DKVMN-F (basic), indicating that introducing the learning and forgetting modules improves the prediction performance of the DKVMN-F model.

On the MovieLens and Gowalla datasets, the HANN model improves prediction performance by 0.5%-7.5% over both the Caser and CosRec models, achieving the best overall results. On the ASSISTments2009 dataset, HANN outperforms the other models on several evaluation metrics, which demonstrates the effectiveness of adding an exercise similarity attention mechanism to the HANN model; the significant gains in precision and recall indicate that the model is well suited to the intelligent recommendation of exercises in English teaching.

After the implementation of the designed innovative English teaching model, students' average academic performance increased by 13.6% compared with the pre-implementation period, a significant improvement in teaching effectiveness. In addition, 89.7% of the students were positive about the design of the teaching model, of whom 42.9% were very satisfied, and teaching satisfaction increased by 8.9 percentage points compared with before implementation, verifying the effectiveness of the English teaching model.
