An Intelligent Classification Method for Online Resource Data of College Language Teaching Based on Deep Reinforcement Learning
Published online: 29 Sept 2025
Received: 28 Dec 2024
Accepted: 17 Apr 2025
DOI: https://doi.org/10.2478/amns-2025-1134
Keywords
© 2025 Jing Dong and Zhuyun Wang, published by Sciendo.
This work is licensed under the Creative Commons Attribution 4.0 International License.
With the continuous development of science and technology and the reform and innovation of education, digital education has become a hot topic in today’s education world. In college and university language teaching, online resources provide students with a new way of learning and a broader learning space [1-4]. However, within such a huge body of digital teaching resources it is difficult for teachers and students to find what they need; classifying digital teaching resources is therefore of great significance for improving the efficiency and quality of teaching [5-7].
Digitization of teaching resources is the process of transforming traditional teaching resources (such as books, textbooks, and courseware) into digital form for storage, dissemination and utilization. This process includes not only scanning paper materials into electronic documents, but also converting non-text resources such as instructional videos, audio, and interactive simulation experiments into digital formats [8-11]. Through digital teaching resources, educators can break through the limitations of time and space and provide students with richer and more diverse learning content. The advantages of digital teaching resources lie in their convenience, accessibility and interactivity: students can access them anytime and anywhere via the Internet for independent learning and inquiry [12-15]. Teachers can also integrate teaching resources more easily and design more innovative and interactive teaching activities. Digital teaching resources can also support students with special needs, such as those with visual or hearing impairments [16-19].
With the continuous development of artificial intelligence applications, deep learning is increasingly used in many fields. Reinforcement learning, a machine learning method that learns optimal behavioral strategies by interacting with the environment, has been widely researched and applied in resource scheduling for data centers, terminal devices, cloud services, wireless networks, and other settings [20-23]. For classifying digital resources for language teaching in higher education, reinforcement learning can autonomously learn, optimize, and classify complex digital resources through interaction with the environment [24-25].
After a preliminary study of deep reinforcement learning, and based on the characteristics of online educational resources, this paper first performs feature extraction on online language teaching resources in colleges and universities, identifies the key features in the online resource library, and carries out a preliminary classification of the resources. Secondly, an intelligent classification model of college language online teaching resources based on DRML is constructed by optimizing the node combination problem of the original deep reinforcement learning algorithm. Taking text and image resources, which account for the largest proportion of college language online teaching resources, as experimental objects, we compare the text classification and image classification performance of the DRML model with that of other classification models to verify its effectiveness. Finally, users’ evaluations of the resource classification effect of the DRML model are collected through a questionnaire to explore the model’s practical application effect.
Deep reinforcement learning combines deep learning and reinforcement learning [26], and its main framework is shown in Fig. 1. Deep learning is mainly used for system perception, while reinforcement learning makes decisions based on that perception to achieve specific goals. The overall process of deep reinforcement learning is as follows: the agent first observes the environment to obtain its high-dimensional features and uses deep learning to perceive a specific state representation. It then evaluates the expected value of every action based on the expected return and selects an action according to a specific selection strategy (e.g., a greedy strategy). Finally, the action changes the environment, which enters a new state. By continuously cycling through these steps, the optimal policy is eventually learned.
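For readers who prefer code, this perceive-decide-act cycle can be sketched in a few lines of Python. The environment interface (Gym-style `reset`/`step`), the `agent` object, and its `estimate_action_values` and `learn` methods are illustrative assumptions, not components specified in this paper.

```python
import random

def drl_interaction_loop(env, agent, num_episodes=100, epsilon=0.1):
    """Minimal sketch of the DRL perceive-decide-act cycle (Gym-style env assumed)."""
    for _ in range(num_episodes):
        state = env.reset()                      # observe high-dimensional environment features
        done = False
        while not done:
            # Perception maps the observation to per-action value estimates,
            # then an epsilon-greedy policy selects the action.
            q_values = agent.estimate_action_values(state)
            if random.random() < epsilon:
                action = random.randrange(len(q_values))
            else:
                action = max(range(len(q_values)), key=lambda a: q_values[a])
            next_state, reward, done, _ = env.step(action)   # environment enters a new state
            agent.learn(state, action, reward, next_state, done)
            state = next_state
```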

DRL model framework
The DQN algorithm is a common deep reinforcement learning algorithm [27] that uses a deep neural network to replace the value function of traditional Q-learning. Because deep neural networks can learn high-dimensional abstract features from data, DQN can handle high-dimensional state spaces and produce a score for each action. In DQN, after the agent observes the state from the environment and performs an action, the environment returns a reward and a new state. The agent uses this information to update its estimate of the value function so that the estimate approaches the true value as closely as possible. However, because the agent’s interactions keep changing during reinforcement learning, updating the value-function estimate using only the most recent interaction history can make the estimate unstable and harm training. DQN therefore employs experience replay, caching previous interaction histories and replaying them at random. With the experience replay buffer, DQN can update the value-function estimate smoothly and alleviate overfitting to recent experience and the forgetting of earlier experience.
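A minimal PyTorch sketch of an experience replay buffer and one DQN update is given below; the buffer size, batch size, and other hyperparameters are assumptions for illustration and do not reproduce the exact configuration used in this paper.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class ReplayBuffer:
    """Caches past interactions so that value updates use decorrelated mini-batches."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        states, actions, rewards, next_states, dones = zip(*random.sample(self.buffer, batch_size))
        return (torch.stack(states), torch.tensor(actions),
                torch.tensor(rewards, dtype=torch.float32),
                torch.stack(next_states), torch.tensor(dones, dtype=torch.float32))

def dqn_update(q_net, target_net, optimizer, buffer, batch_size=32, gamma=0.99):
    """One DQN step: move Q(s, a) toward r + gamma * max_a' Q_target(s', a')."""
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        best_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * best_next * (1.0 - dones)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```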
High-quality learning resources can improve learners’ learning efficiency, help online learning platforms establish trust with learners and teaching staff, and enhance the authority of online learning. They are also conducive to the joint maintenance of the online teaching environment by learners and teachers and promote a positive cycle of resource learning.
Online educational teaching resources are characterized by knowledge-intensive, rich text content and a clear logical structure, so most resource classification models focus on extracting semantic features from the text of the resources. However, false and misleading resources are highly concealed and inflammatory, which makes them difficult to identify from semantic features alone.
Misleading resources disguise themselves by imitating the structure and writing style of high-quality teaching resources and induce readers to spread them quickly through inflammatory remarks. Research has shown that inflammatory text content stimulates readers’ emotions such as fear, disgust, and shock, thereby increasing their willingness to spread and interact with it; emotional incitement is an important driver of the spread of false and misleading information. Misleading information usually uses expressions with strong emotional color and tendency, while genuine teaching resources tend to be objective and factual.
The review of online learning resources must ensure that the content is accurate, authoritative and in line with academic standards to prevent the spread of false and misleading information. Learning resources also need to comply with relevant national education laws and regulations to ensure compliance of online education resources.
Suppose that N homogeneous agents are placed in a static, unknown environment, and each agent’s position is randomly initialized at the beginning. At each time step, an agent collects a partial observation of the environment at its current position and processes the observation locally. The agent then chooses an action to change its position based on the observation. The common goal of the agents is to recognize the key features of the resource information from a limited number of categories and to classify the resources within a limited number of time steps. Under the centralized-training, decentralized-execution architecture, the observations of all agents are learned by a centralized training network. The specific process is shown in Fig. 2.

Multi-agent image feature extraction process
To satisfy the requirement of decentralized execution, each agent’s movement strategy can rely only on its own local information. The agents therefore need to learn to extract relevant features from partial observations and plan their travel paths in the environment rationally, so as to find the most valuable resource information and solve the classification problem reliably. The agent observation module is shown in Fig. 3.

Agent observation module
It is assumed that, at each time step, an agent obtains a partial observation of the environment at its current position and encodes it locally into a compact feature representation.
The current position of the agent is also useful information, so it is more efficient to learn it together with the agent’s locally observed features; the current position is handled by a learned function that maps it into the same feature space.
The trajectory in the environment is a sequence of observations, so a Long Short-Term Memory (LSTM) unit is used as the agent’s perceptual network to enable it to learn from long sequences. At each time step, an agent’s LSTM hidden state is updated from its previous hidden state and the current encoded observation, which describes how the perceptual network evolves over time.
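As an illustrative sketch only (not the paper’s actual network), the per-agent perception module described above can be written as a small PyTorch module that encodes the local observation and the current position, concatenates them, and feeds the result to an LSTM cell; all layer sizes and names are assumed.

```python
import torch
import torch.nn as nn

class AgentPerception(nn.Module):
    """Per-agent module: encode local observation and position, track history with an LSTM."""
    def __init__(self, obs_dim, pos_dim, hidden_dim=128):
        super().__init__()
        self.obs_encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.pos_encoder = nn.Linear(pos_dim, hidden_dim)     # function mapping of the position
        self.lstm = nn.LSTMCell(2 * hidden_dim, hidden_dim)

    def forward(self, obs, pos, state=None):
        # state = (h, c): this agent's previous LSTM hidden and cell states (None at t = 0)
        z = torch.cat([self.obs_encoder(obs), self.pos_encoder(pos)], dim=-1)
        h, c = self.lstm(z, state)
        return h, (h, c)
```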
The agents also need to use the knowledge they have learned to categorize the information: a prediction network takes the final joint hidden state of all agents and outputs a probability distribution over the candidate categories. The most probable category is then taken as the result.
Rather than reaching a consensus on categories, the agents combine the local perceptions of all agents into a single prediction.
The global reward of the environment is computed from the category prediction and the true label of the information.
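The joint prediction and global reward might be realized as in the following sketch; fusing the agents’ hidden states by concatenation and using a 0/1 correctness reward are assumptions made here for illustration, not the paper’s stated choices.

```python
import torch
import torch.nn as nn

class JointClassifier(nn.Module):
    """Fuse the final hidden states of all agents and predict a single category."""
    def __init__(self, hidden_dim, num_agents, num_classes):
        super().__init__()
        self.head = nn.Linear(hidden_dim * num_agents, num_classes)

    def forward(self, agent_hidden_states):      # list of [batch, hidden_dim] tensors
        joint = torch.cat(agent_hidden_states, dim=-1)
        return self.head(joint)                  # unnormalized category scores

def predict_and_reward(logits, true_label):
    """Take the most probable category and give all agents a shared global reward."""
    probs = torch.softmax(logits, dim=-1)
    pred = probs.argmax(dim=-1)
    reward = (pred == true_label).float()        # 1 if correct, 0 otherwise (assumed scheme)
    return pred, reward
```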
This section describes the generic DRL framework used to solve the node combination optimization problem to construct the DRML resource classification model. The specific information is shown below.
The deep Q-network is the core of the DRL method: a deep neural network analyzes the input features of the nodes and computes a score for each node [28]. The DQN is formulated as a state-action value function that estimates, for the current state, the expected cumulative reward of selecting each candidate node as the next action.
The Markov decision process is an important part of reinforcement learning [29]. Reinforcement learning aims to learn a function [30] that maps state-action pairs to a score, while the Markov decision process executes actions based on this function to obtain a reward and the next state. Each step of this process can be represented as a quadruple consisting of the current state, the action taken, the reward received, and the next state.
To obtain a higher cumulative reward, the Markov decision process uses a greedy strategy [31] that selects the action with the maximum estimated score in the current state. The score estimates are then updated toward the immediate reward plus the discounted maximum score of the next state, so that they gradually approach the true expected returns.
Through the DQN and the Markov decision process, the node combination optimization problem can be equivalently transformed into an optimization problem over the DQN, improving the model’s classification of teaching resources. Given a DQN, the solution state is expanded step by step: at each step the node with the highest estimated score is added to the current combination until a complete node combination is obtained.
During the training process, RL employs a wandering (exploration) strategy: with a small probability a random node is selected instead of the highest-scoring one, so that the state space is explored more fully and the model does not converge prematurely to a locally optimal combination.
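A rough sketch of greedy node selection with an exploration probability is shown below; the node-scoring network, feature shapes, and stopping condition are assumptions, since the paper’s exact formulation of the node combination problem is not reproduced here.

```python
import random
import torch

def select_node_sequence(q_net, node_features, num_steps, epsilon=0.1):
    """Greedily build a node combination from DQN scores, with epsilon exploration."""
    num_steps = min(num_steps, node_features.size(0))
    chosen, remaining = [], list(range(node_features.size(0)))
    for _ in range(num_steps):
        scores = q_net(node_features[remaining])       # one score per remaining candidate node
        if random.random() < epsilon:                  # wandering/exploration step
            idx = random.randrange(len(remaining))
        else:
            idx = int(scores.squeeze(-1).argmax().item())
        chosen.append(remaining.pop(idx))
    return chosen
```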
Because the datasets used in this classification task contain very large label sets, directly applying traditional classification evaluation metrics would consume a large amount of computation; moreover, each sample is relevant to only a very small number of labels, and model performance is poor when a few labels are selected directly from a high-dimensional label set. In practical task scenarios, the prediction for each sample is therefore a short ranked list of relevant labels, where a higher rank indicates higher relevance, and the evaluation metric should accordingly give more weight to higher-ranked labels. Researchers thus prefer ranking-sensitive metrics to measure classifier performance in such scenarios. Following existing work, the evaluation metric in this section is Precision at top-k (P@k), which counts how many of the labels in the first k positions of the ranked list are relevant to the sample; the better the model’s classification performance, the higher the relevant labels score and the closer to the head of the list they rank, so P@k is larger. In this section the DRML model is compared with multiple text classification models on the same datasets, and the results are shown in Table 1.
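As a concrete illustration of the metric, P@k can be computed as in the following sketch, where `ranked_labels` is the model’s ranked label list for one sample and `relevant` is its set of true labels; both names and the toy example are illustrative.

```python
def precision_at_k(ranked_labels, relevant, k):
    """Fraction of the top-k predicted labels that are truly relevant to the sample."""
    top_k = ranked_labels[:k]
    return sum(1 for label in top_k if label in relevant) / k

# Example: 2 of the top 3 predicted labels are relevant -> P@3 ≈ 0.667
print(precision_at_k(["poetry", "grammar", "syntax"], {"poetry", "syntax"}, k=3))
```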
Comparison experiment of DRML and other reference models (%)
| Algorithm | Eurlex-4k P@1 | Eurlex-4k P@3 | Eurlex-4k P@5 | AmazonCat-13k P@1 | AmazonCat-13k P@3 | AmazonCat-13k P@5 | Wiki10-31k P@1 | Wiki10-31k P@3 | Wiki10-31k P@5 |
|---|---|---|---|---|---|---|---|---|---|
| PfastreXML | 74.91 | 73.45 | 57.02 | 92.31 | 78.04 | 63.56 | 82.84 | 69.88 | 60.11 |
| DisMec | 83.90 | 72.02 | 61.15 | 97.46 | 81.12 | 67.48 | 85.79 | 74.92 | 64.88 |
| Parabel | 81.41 | 70.49 | 55.96 | 94.05 | 78.89 | 64.11 | 85.89 | 75.67 | 59.42 |
| SLEEC | 88.01 | 71.57 | 58.40 | 96.69 | 79.84 | 61.33 | 86.40 | 79.60 | 69.35 |
| XML-CNN | 83.18 | 65.77 | 54.59 | 91.73 | 82.04 | 66.88 | 87.02 | 75.89 | 61.37 |
| LAHA | 80.37 | 69.04 | 58.53 | 94.92 | 80.16 | 65.78 | 85.24 | 76.39 | 62.87 |
| AttentionXML | 81.86 | 77.95 | 62.62 | 95.13 | 85.16 | 66.60 | 85.94 | 79.63 | 60.45 |
| X-Transformer | 74.34 | 78.36 | 62.51 | 93.60 | 83.38 | 69.18 | 88.12 | 80.70 | 70.51 |
| APLC-XLNet | 81.14 | 68.85 | 58.15 | 97.25 | 79.49 | 69.54 | 84.75 | 77.78 | 64.17 |
| LightXML | 89.39 | 66.34 | 64.22 | 94.58 | 86.73 | 65.18 | 89.86 | 81.14 | 64.82 |
| DRML | 90.25 | 80.56 | 69.74 | 98.71 | 86.47 | 71.22 | 91.04 | 82.43 | 71.53 |
From the experimental results in Table 1, it can be seen that, compared with the one-vs-rest, embedding-based, and tree-structure-based text classification models, the DRML model proposed in this paper outperforms the compared models on all classification evaluation metrics, improving P@1, P@3, and P@5 by 6.35%, 8.54%, and 8.59% on Eurlex-4k, by 4.66%, 7.58%, and 7.11% on AmazonCat-13k, and by 5.25%, 7.51%, and 6.65% on Wiki10-31k, which demonstrates the superior ability of deep learning to learn sample features.
Compared with the deep learning based text classification models, the DRML model is slightly below LightXML only on P@3 for AmazonCat-13k, and is ahead of all the deep learning models, including LightXML, on every other metric. Relative to LAHA, the improvement is larger: P@1, P@3, and P@5 improve by 9.88%, 11.52%, and 11.21% on Eurlex-4k, by 3.79%, 6.31%, and 5.44% on AmazonCat-13k, and by 5.80%, 6.04%, and 8.66% on Wiki10-31k.
The experiments in this section validate the relationship between the use of sentiment features or label semantic features and the classification performance of the DRML model. The results are shown in Table 2, where Difference gives the performance gap between the two settings: “+” indicates that model performance improves and “-” indicates that it decreases.
Relationship between emotion, label semantic features and classification performance (%)
| Dataset | P@k | Emotion feature | Label semantic feature | Difference |
|---|---|---|---|---|
| Eurlex-4k | P@1 | 85.95 | 89.42 | +3.47 |
| | P@3 | 76.42 | 79.12 | +2.70 |
| | P@5 | 66.37 | 68.15 | +1.78 |
| AmazonCat-13k | P@1 | 94.86 | 98.43 | +3.57 |
| | P@3 | 84.25 | 86.44 | +2.19 |
| | P@5 | 68.36 | 69.75 | +1.39 |
| Wiki10-31k | P@1 | 89.45 | 91.64 | +2.19 |
| | P@3 | 79.64 | 81.06 | +1.42 |
| | P@5 | 69.63 | 70.87 | +1.24 |
As can be seen from Table 2, on all three experimental datasets, using label semantic features as the final text features for the classification task yields better evaluation metrics than using the sentiment feature outputs: P@1, P@3, and P@5 are 3.47%, 2.70%, and 1.78% higher on Eurlex-4k; 3.57%, 2.19%, and 1.39% higher on AmazonCat-13k; and 2.19%, 1.42%, and 1.24% higher on Wiki10-31k.
To test whether the improved module contributes to the classification performance of the DRML model, this section performs an ablation experiment on the module that dynamically adjusts the text set. The experiment covers two settings: “Static” means the text set is not dynamically adjusted and interacts with the text features in its initialized form, while “Dynamic” means the semantic features of the text set are dynamically adjusted by the text of the sample data. “Difference” indicates the performance gap between the two settings, where “+” indicates that model performance improves and “-” indicates that it decreases. The results are shown in Table 3.
Ablation experiment results (%)
| Dataset | P@k | Static | Dynamic | Difference |
|---|---|---|---|---|
| Eurlex-4k | P@1 | 85.46 | 89.63 | +4.17 |
| | P@3 | 74.59 | 78.45 | +3.86 |
| | P@5 | 66.32 | 67.96 | +1.64 |
| AmazonCat-13k | P@1 | 84.26 | 87.58 | +3.32 |
| | P@3 | 73.94 | 75.68 | +1.74 |
| | P@5 | 67.06 | 69.33 | +2.27 |
| Wiki10-31k | P@1 | 87.16 | 89.56 | +2.40 |
| | P@3 | 76.49 | 80.02 | +3.53 |
| | P@5 | 69.28 | 70.63 | +1.35 |
From the experimental results in Table 3, it can be seen that the model with dynamically adjusted text features, i.e., the full DRML model, classifies text on the experimental datasets better than the model using static text features: P@1, P@3, and P@5 are 4.17%, 3.86%, and 1.64% higher on Eurlex-4k; 3.32%, 1.74%, and 2.27% higher on AmazonCat-13k; and 2.40%, 3.53%, and 1.35% higher on Wiki10-31k, which further indicates the strong classification performance of the DRML model.
Among the online teaching resources for college languages, text and images account for the largest share and are the most important resource types; this subsection therefore investigates the image classification performance of the DRML model in depth.
The experiments compare the classification effectiveness of the DRML model in this paper with the current state-of-the-art image classification models on three image datasets (miniImageNet, tieredImageNet, and QHGIM), and the results are shown in Table 4.
Comparison of classification accuracy of different methods (%)
| Method | miniImageNet | tieredImageNet | QHGIM |
|---|---|---|---|
| ProtoNet | 69.48 | 75.46 | 73.52 |
| RelationNet | 70.56 | 75.08 | 70.63 |
| SimCLR | 81.03 | 80.12 | 76.28 |
| SimSiam | 81.89 | 83.44 | 78.49 |
| TPMN | 85.65 | 85.47 | 80.36 |
| RE-Net | 84.74 | 84.23 | 80.07 |
| ProtoNet+Swin | 75.64 | 78.55 | 74.68 |
| BML | 77.42 | 84.64 | 83.59 |
| SUN | 85.79 | 87.09 | 83.66 |
| DRML | 88.96 | 90.41 | 89.73 |
Compared with the state-of-the-art models, DRML outperforms the other models on all three datasets, miniImageNet, tieredImageNet, and QHGIM, with accuracies of 88.96%, 90.41%, and 89.73%, respectively. DRML exceeds the baseline model (ProtoNet) by 19.48%, 14.95%, and 16.21% in classification accuracy, and improves on the best existing model (SUN) by 3.17%, 3.32%, and 6.07%, respectively. This indicates that DRML can classify images quickly and accurately and has practical value.
To examine the feature extraction capability of DRML more intuitively, this subsection samples 20 examples per category from the QHGIM dataset and visualizes their distribution in the feature space, as shown in Fig. 4, where (a) and (b) show the feature visualizations of the baseline model ProtoNet and of DRML, respectively. The results show that DRML generates more accurate decision boundaries, so that different categories are separated more clearly.

The visualized characteristics of the test sample
To investigate the overall generalizability of different models on the QHGIM dataset, this section visualizes the feature distributions of all samples in the dataset under different methods, as shown in Fig. 5, where (a) to (d) are the original feature distribution and the feature distributions produced by ProtoNet, SUN, and DRML, respectively.

The visualized characteristics of all test samples in QHGIM dataset
As shown in Fig. 5(a), the different categories in the original QHGIM dataset overlap heavily, which makes classification challenging. As shown in Fig. 5(b), the conventional prototype network has limited classification ability on the QHGIM dataset: it can distinguish only a small number of samples, and most samples still show significant inter-class overlap. As shown in Fig. 5(c), SUN achieves effective classification for most samples, but the spacing between different classes is too small and some samples are still confused between classes. As shown in Fig. 5(d), DRML exhibits the best classification performance, effectively reducing inter-class overlap and improving the overall result. This further demonstrates the effectiveness of DRML on image classification tasks.
The DRML model of this paper was applied to language teaching at School S, and questionnaires were distributed within the school to investigate whether the DRML-based classification of language teaching resources satisfied its users. Evaluation questionnaires were distributed to 442 teachers and students, and the recovered questionnaires were tallied. The results are shown in Table 5, where -2, -1, 0, 1, and 2 stand for Strongly Disagree, Disagree, Uncertain, Agree, and Strongly Agree, respectively, and Fi stands for the score rate, i.e., the combined percentage of Agree and Strongly Agree responses.
Evaluation for the college Chinese online teaching resource classification of DRML model
| Index | -2 (%) | -1 (%) | 0 (%) | 1 (%) | 2 (%) | Fi (%) | Mean |
|---|---|---|---|---|---|---|---|
| Classification accuracy | 1.79% | 4.91% | 12.14% | 54.76% | 26.40% | 81.16% | 0.99 |
| Resource quality | 0.19% | 2.31% | 10.45% | 55.82% | 31.23% | 87.05% | 1.16 |
| Classification speed | 0.38% | 4.14% | 12.42% | 53.69% | 29.37% | 83.06% | 1.08 |
| Result clarity | 0.97% | 2.45% | 12.36% | 51.28% | 32.94% | 84.22% | 1.13 |
| Learning difficulty | 1.72% | 2.47% | 20.40% | 57.05% | 18.36% | 75.41% | 0.88 |
| Acquisition difficulty | 0.18% | 2.74% | 21.58% | 53.32% | 22.18% | 75.50% | 0.95 |
| Efficiency improvement | 1.32% | 2.07% | 15.79% | 57.07% | 23.75% | 80.82% | 1.00 |
| Tool efficiency | 1.64% | 4.80% | 16.60% | 46.04% | 30.92% | 76.96% | 1.00 |
| Adopt willingness | 1.18% | 3.27% | 11.36% | 47.99% | 36.20% | 84.19% | 1.15 |
| Using willingness | 0.39% | 1.87% | 11.89% | 54.68% | 31.17% | 85.85% | 1.14 |
As can be seen from Table 5, for all 10 indicators the proportion of subjects who agreed or strongly agreed exceeds 75%, the proportion who strongly disagreed is no more than 2%, and the proportion who disagreed is no more than 5%. The mean scores of the 10 indicators lie in the range 0.88-1.16, which shows that the DRML-based classification of online language teaching resources in colleges and universities proposed in this paper received satisfactory evaluations from users.
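For reference, the Fi and Mean columns in Table 5 can be reproduced from the response percentages as sketched below, using the classification-accuracy row as a worked example and assuming, as the table’s figures indicate, that Fi is the combined share of Agree and Strongly Agree responses.

```python
# Response percentages for one indicator, keyed by score (-2 .. 2): classification accuracy row.
classification_accuracy = {-2: 1.79, -1: 4.91, 0: 12.14, 1: 54.76, 2: 26.40}

fi = classification_accuracy[1] + classification_accuracy[2]           # agree + strongly agree
mean = sum(score * pct / 100 for score, pct in classification_accuracy.items())

print(f"Fi = {fi:.2f}%, mean = {mean:.2f}")   # Fi = 81.16%, mean = 0.99
```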
This article fully investigates the characteristics of online educational resources, and feature extraction of online resources for college language teaching is carried out using a multi-agent deep reinforcement learning method. The original deep reinforcement learning algorithm is optimized to construct the DRML model for the intelligent classification of online language teaching resources in colleges and universities.
In text classification, the overall performance of the DRML classification model is better than that of the other classification models. In the feature comparison, classification using label semantic features outperforms classification using sentiment features: P@1, P@3, and P@5 are 3.47%, 2.70%, and 1.78% higher on Eurlex-4k, 3.57%, 2.19%, and 1.39% higher on AmazonCat-13k, and 2.19%, 1.42%, and 1.24% higher on Wiki10-31k. The DRML model with dynamically adjusted text features also classifies text better than the model using static text features. In the image classification of teaching resources, the accuracy reaches 88.96%, 90.41%, and 89.73% on the miniImageNet, tieredImageNet, and QHGIM datasets, outperforming the best existing model by 3.17%, 3.32%, and 6.07%, respectively, and feature visualization confirms the superiority of the DRML model in image classification. In the user evaluation of the DRML model’s classification of teaching resources, the mean scores of the indicators lie in the interval [0.88, 1.16], more than 75% of subjects agreed or strongly agreed on all 10 indicators, and the proportions who strongly disagreed or disagreed are below 2% and 5%, respectively. The DRML model in this paper thus obtained highly favorable evaluations.
