Research on Adaptive Teaching Strategy of Smart Aesthetic Education Teaching Platform Based on Reinforcement Learning 
Published Online: Sep 23, 2025
Received: Jan 07, 2025
Accepted: Apr 19, 2025
DOI: https://doi.org/10.2478/amns-2025-0992
© 2025 Songyu Wu et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Modern technology is being used to accelerate the reform of talent training modes and to combine large-scale education organically with personalized training [1-2]. At the level of top-level design, the state has set the direction of educational development: education and information technology should be deeply integrated to promote change in the mode of education. In this context, the concept of the smart classroom came into being [3-4]. The smart classroom is an innovative teaching mode that uses information technology to make classroom teaching informatized and intelligent, uses data analysis to give teachers a basis for teaching design, uses rich data resources to give students a way to learn independently, and promotes the in-depth integration of information technology and teaching so as to achieve better teaching results [5-8].
Aesthetic education refers to education that cultivates students’ ability to recognize, appreciate and create beauty [9-10]. It shoulders the important responsibility of promoting the overall development of college students and plays an irreplaceable role in cultivating students’ moral sentiments, personality, and aesthetic ability [11-12]. In the intelligent education environment, the teaching of aesthetic education in institutions has undergone fundamental changes [13-14]. The educational concept has shifted from the single aim of inculcating aesthetic theory or enhancing art skills to a comprehensive, vocational, innovative and diversified concept that promotes the spirit of Chinese aesthetic education [15-17]. The aesthetic education resources of institutions have broken through their former limitations, become interconnected and shared on a global scale, and grown richer. The evaluation of aesthetic education teaching is more focused on diversification, comprehensiveness, dynamics and informatization, which can better promote students’ aesthetic education learning and literacy enhancement [18-21].
The smart education environment provides rich technical support and new development opportunities for aesthetic education teaching [22]. By formulating curriculum standards, integrating aesthetic education resources, improving classroom ecology, and creating a nurturing ecology, an innovative, personalized, and highly efficient intelligent aesthetic education teaching model can be constructed [23-25].
Educational experts have studied the core and importance of aesthetic education. Muzyka, O et al. revealed that the process of aesthetic education neglects the development of students’ creativity, that aesthetic education is not mainstreamed in the educational system, and that it lacks teaching resources [26]. Asamatdinova, J et al. explored the current state of practice and the links among the three components of adolescents’ cognitive, aesthetic and moral education, making a positive contribution to the understanding of the meaning of aesthetic education [27]. Hurren, W. J, writing from the personal perspective of activism, analyzed the connection between aesthetic education and activism as well as the importance of aesthetic ability and emotional cultivation, and discussed them in detail with actual cases [28]. With the importance of aesthetic education well understood, scholars have turned their attention to how to improve the effect of aesthetic education teaching and promote its innovation.
Education empowered by information and intelligent technology promotes educational innovation and reform. Gao, J et al. designed a cloud-based smart online teaching system built on a collaborative filtering recommendation algorithm, which makes online cloud classrooms more intelligent and informatized [29]. Qin, L analyzed the advantages and disadvantages of the smart classroom teaching model, the teaching process, and the teaching logic, aiming to build a teaching quality assessment model adapted to the smart teaching classroom [30]. Zhu, Q et al. verified, through teaching practice, the intelligent classroom empowered by Moso Teach, an intelligent teaching system, which to a certain extent promoted students’ learning motivation but did not provide students with a personalized learning experience [31]. Research related to information and intelligent technology is broad, covering teaching assessment, personalized intelligent classrooms, online cloud teaching, and more. There is also research on teaching modes that integrate information and intelligent technology into multidisciplinary courses, but there is a gap in research on intelligent, information-based classrooms for aesthetic education.
This paper proposes an adaptive teaching strategy for the smart aesthetic education teaching platform based on reinforcement learning to meet the needs of aesthetic education. First, a user-centered design framework for the smart aesthetic education teaching platform is proposed, and a smart aesthetic education knowledge graph is constructed. Second, knowledge representation learning is introduced on top of the DDQN model, and the KERL4Rec model is proposed. Finally, KERL4Rec, DDQN4Rec and four classical models are compared on the MOOPer and MOOCCube datasets. A questionnaire is used to investigate user satisfaction and analyze the feasibility of the resource recommendation system, and a fine-grained evaluation method is used to test the application effect of the adaptive teaching strategy of the smart aesthetic education teaching platform based on reinforcement learning.
This paper positions the smart aesthetic education teaching platform as follows: the smart aesthetic education teaching platform is a digital aesthetic education platform that addresses the needs of aesthetic education and establishes a link between social aesthetic education resources and users’ aesthetic education needs, so as to achieve the goal of improving the quality of users’ aesthetic education.
The design process framework of the smart aesthetic education teaching platform is based on the user-centered design method. User-centered design takes the user experience as the center of the design strategy: in every part of the product design process, the user’s experience and needs are treated as design elements, and the design revolves around them.
The international standard ISO 13407 defines the user-centered design process, which is divided into the following steps:
1. Obtain user requirements and provide design concepts for the product, including user needs and usage scenarios. By screening and analyzing user needs, detailed user requirements are derived to pave the way for the next stage of design.
2. Formulate and implement the design scheme. Based on the user needs gathered in the early stage, the designer develops an initial design scheme, covering user research, product positioning, and the design process plan, and carries out the initial design work by implementing it.
3. Evaluate the design results against user requirements through user testing, and use the feedback to improve the scheme and help designers refine it in depth.
4. Re-determine user requirements and redesign the solution. Using the feedback from the previous step, the designer incorporates further user requirements and iterates the design solution to make it more complete.
Thus, the Aesthetic Needs-Based Design Process Framework can be understood as a methodological process that guides designers in exploring users’ aesthetic needs and design expression, which includes a number of design skills and tasks, such as user research, design expression, and so on. The design process framework based on aesthetic needs is shown in Figure 1, which consists of several major steps.

Figure 1. Sorting out the process framework
In this study, the purpose of constructing a knowledge graph for smart aesthetic education is to provide auxiliary information for the resource recommendation model, to promote personalized learning and accurate teaching, and to give learners and teachers a visual representation of the knowledge structure. The knowledge graph stores data as triples of the form “entity, relationship, entity”. Because the source dataset contains a large amount of unstructured and semi-structured data, knowledge extraction is required.
Knowledge extraction is divided into two parts: entity recognition and relationship recognition, and relationship recognition is completed on the basis of entity recognition. In the process of knowledge identification, this study takes the smart aesthetic education ontology model as the extraction principle to automatically extract relevant data. In the relationship extraction stage, according to the planned relationship model, the common relationships in the data such as “contains” and “takes over” will be directly extracted, while some of the more specialized relationships will be manually extracted and defined.
Knowledge fusion merges different types of graph relationships to form a more complete knowledge graph, and mainly includes three types: knowledge point fusion, relationship fusion, and fusion of heterogeneous information. Fusion here refers to merging data for similar concepts at the same level. Because the same information is described in different ways, some data in the smart aesthetic education knowledge graph are consistent in nature but inconsistent in name; knowledge fusion is carried out to solve this problem.
In the schema layer of the smart aesthetic education knowledge graph, the smart aesthetic education ontology contains three parts: learners’ personal information, interactions in the learning system, and learning resources. Knowledge fusion is needed to correlate these three parts of heterogeneous information. In this knowledge graph, the “Learner ID” is used to associate the learner’s personal information with the learning interaction, the “Learning Resources” and “Learner ID” are used to associate the learners with the knowledge points, and the learning interaction and the learning resources are associated via “Learning Resources” and “Resource Type”.
After the schema layer and data layer of the knowledge graph are constructed, the entities and relationships need to be stored. This paper selects the Neo4j graph database: the main data are defined according to the knowledge graph ontology model, with the conceptual model defined as nodes and the relationship model defined as edges. The specific steps of knowledge graph storage are shown in Figure 2.
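The naming problem described above can be illustrated with a minimal Python sketch. The alias table, the `canonical` and `fuse` helpers, and the sample triples are all hypothetical and are not taken from the paper; the sketch only shows one simple way to merge entities that are the same in nature but differently named.

```python
# Hypothetical alias table: each variant name maps to one canonical name.
ALIASES = {
    "colour theory": "color theory",
    "color theory basics": "color theory",
}

def canonical(name: str) -> str:
    """Normalize case and whitespace, then resolve known aliases."""
    key = " ".join(name.lower().split())
    return ALIASES.get(key, key)

def fuse(triples):
    """Merge triples whose entities differ only in naming and drop duplicates."""
    fused = []
    seen = set()
    for head, relation, tail in triples:
        triple = (canonical(head), relation, canonical(tail))
        if triple not in seen:
            seen.add(triple)
            fused.append(triple)
    return fused

# Three raw triples that describe the same fact under different names.
raw = [
    ("Painting 101", "contains", "Colour Theory"),
    ("painting 101", "contains", "color theory"),
    ("Painting 101", "contains", "Color  Theory Basics"),
]
fused = fuse(raw)
```

A hand-written alias table is the simplest choice for a sketch; a real pipeline would more likely derive aliases from string or embedding similarity.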
After the ontology-based knowledge graph of intelligent aesthetic education is stored by Neo4j graph database, a total of 230,651 nodes and 1,958,202 relations are generated.

Figure 2. Steps of knowledge graph storage
Reinforcement learning based recommender systems represent each user’s state and action independently, and do not effectively explore and exploit the potential connections between different products. To solve this problem, this paper proposes a knowledge representation-enhanced multi-task reinforcement learning recommender model (KERL4Rec), which connects the Double-DQN (DDQN) recommendation model training process and the knowledge representation learning process through a feature combination sharing unit. The introduction of knowledge representation learning enables the recommendation model to use information from the knowledge graph to alleviate data sparsity, and to use the domain information of the related tasks to improve its generalization ability. The next part of this paper presents and analyzes the deep reinforcement learning based recommendation module as well as the training and experimental results of the model.
Mainstream classical reinforcement learning algorithms fall into two families: policy gradient algorithms and value-function-based Q-learning algorithms. Policy-based algorithms use a neural network to approximate the policy directly, which suits continuous action outputs and converges well, but they must collect the complete experience of each episode before updating the gradient, which lowers sampling efficiency. When the policy gradient is updated, even a small step along the gradient direction can sometimes produce high variance, which makes the model hard to train and prone to local optima. Q-learning suits discrete state and action spaces and does not need to sample complete episodes, but it is difficult to train on large-scale state and action spaces, and a small change in the estimated $Q$ values can substantially change the resulting policy.
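As a rough illustration of storing “entity, relationship, entity” triples as nodes and edges, the Python sketch below builds parameterized Cypher `MERGE` statements. The `Entity` label, the `name` property, and the helper names are assumptions made for this example; the paper does not give its actual schema or import code. The statements are only constructed here, not sent to a database.

```python
import re

def rel_type(relation: str) -> str:
    """Turn a free-text relation into a Cypher relationship type,
    e.g. 'takes over' -> 'TAKES_OVER'. Relationship types cannot be
    passed as query parameters, so they are sanitized and inlined."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", relation).strip("_").upper()
    if not cleaned:
        raise ValueError(f"unusable relation name: {relation!r}")
    return cleaned

def to_cypher(head: str, relation: str, tail: str):
    """Return (query, parameters) that upserts both nodes and the edge.
    The 'Entity' label and 'name' property are illustrative assumptions."""
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{rel_type(relation)}]->(t)"
    )
    return query, {"head": head, "tail": tail}

query, params = to_cypher("Sketching course", "contains", "Perspective")
```

Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes or edges, which matters when the same entity appears in many triples.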
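Although DQN improves the convergence and stability of reinforcement learning, the same target network both selects the next action and generates the $Q$ value used as the learning target, so $Q$ values tend to be overestimated. Double DQN (DDQN) separates the two steps: it first uses the online network, with parameters $\theta$, to select the action with the largest $Q$ value in the next state $s_{t+1}$, and then uses the target network, with parameters $\theta^{-}$, to evaluate that action:

$$y_t = r_t + \gamma\, Q\left(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta); \theta^{-}\right)$$

The training process of DDQN is shown in Fig. 3: DDQN chooses the optimal action in the new state $s_{t+1}$ with the online network and scores it with the target network, which reduces the overestimation of DQN.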

Figure 3. DDQN training process
Based on the above analysis, this paper chooses DDQN, an improved variant of DQN, as the model of the recommendation task module. The system usually holds a series of interaction records between users and items, and the model uses this historical interaction information to generate the next item to recommend to the user.
In reinforcement learning based recommender systems, the interaction between the user and the recommender system is modeled as a Markov decision process (MDP): the user serves as the environment that responds, the recommender system serves as the agent, and the recommender system updates its policy through interaction with the user. In the recommendation scenario, an MDP is a tuple $(S, A, R, P, \gamma)$ consisting of the state space, the action space, the reward, the state transition probability, and the discount factor.
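The difference between the DQN and Double DQN learning targets can be sketched in a few lines of Python. This is the standard textbook formulation, not code from the paper; the function names and the toy Q values are illustrative assumptions.

```python
def dqn_target(reward, next_q_target, gamma, done=False):
    """Vanilla DQN: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)

def ddqn_target(reward, next_q_online, next_q_target, gamma, done=False):
    """Double DQN: the online network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best]

# Toy Q values for three candidate items in the next state (illustrative only).
q_online = [0.2, 0.9, 0.4]   # online network prefers action 1
q_target = [0.3, 0.5, 0.8]   # target network's largest value is for action 2

y_dqn = dqn_target(1.0, q_target, gamma=0.9)               # 1.0 + 0.9 * 0.8
y_ddqn = ddqn_target(1.0, q_online, q_target, gamma=0.9)   # 1.0 + 0.9 * 0.5
```

In this toy case the Double DQN target is lower than the DQN target, because the evaluated action is the one chosen by the online network rather than the target network’s own maximum.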
Each element in the MDP tuple is defined as follows:
- State space $S$: In this model, the state is constructed from the user’s historical item interactions. Both LSTM and its variant GRU can handle time-series data well and capture long-term dependencies, while GRU simplifies LSTM and makes the model easier to converge, which makes it more suitable for constructing more complex networks. Therefore, to better capture the user’s sequential behaviors, GRU is used to process the user’s item-interaction-set features. Each GRU unit takes the current input and the hidden state of the previous step, and its update gate controls how much of that previous state is carried forward.
- Action space $A$: The action is the feature vector representation of the item to be recommended to the user at that moment.
- Reward $R$: The reward is the feedback of the environment, i.e. the user, on the recommended item.
- State transition probability $P$: In the model of this paper, when the system recommends an item to a user and the user returns a positive reward, the item interaction set in the user’s next state is updated accordingly.
- Discount factor $\gamma$: The discount factor indicates how much importance the recommender system attaches to future returns. When its value is 0, the recommender system only values the user’s current feedback; when its value is 1, the recommender system counts the user’s feedback at every moment fully and equally in the return.

Figure 4. State composition

Figure 5. State transition diagram
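The two extreme values of the discount factor described above can be checked with a short Python sketch. The reward sequence is made up for illustration and `discounted_return` is a hypothetical helper name.

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one episode."""
    g = 0.0
    for r in reversed(rewards):   # accumulate from the last step backwards
        g = r + gamma * g
    return g

# Illustrative user feedback over four recommendation steps.
rewards = [1.0, 0.0, 1.0, 1.0]

only_current = discounted_return(rewards, 0.0)   # counts the first reward only
equal_weight = discounted_return(rewards, 1.0)   # plain sum of all rewards
in_between = discounted_return(rewards, 0.5)     # later feedback counts less
```

With a discount factor of 0 the return equals the immediate reward, and with 1 it equals the undiscounted sum, matching the two cases in the text.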
To demonstrate the recommendation performance of the learning resource recommendation model, the KERL4Rec model proposed in this paper, the main task model DDQN4Rec, and four classical models are compared on the MOOPer dataset and the MOOCCube dataset. The baseline models fall into three types: traditional recommendation, knowledge graph-based recommendation, and sequence recommendation. The traditional recommendation model is FM, the knowledge graph-based recommendation model is KGCN, and the sequence-based recommendation models are GRU4Rec and KERL. The details are described below:
FM: a standard factorization model that obtains user vectors for recommendation by fusing users’ second-order features. For the comparison experiments, the learning resources a learner has interacted with and the connected entity information are used as feature inputs.
KGCN: an end-to-end recommendation framework based on the knowledge graph, which uses convolutional networks and the knowledge graph to capture the association information between items and compute entity vectors, and recommends by taking the product of the entity and user vectors as the interaction probability.
GRU4Rec: a model that uses gated recurrent units to capture user preferences for session and sequence recommendation.
KERL: a knowledge-guided sequence recommendation model that uses the knowledge graph and a Markov decision process to mine user preferences.
The experimental results of KERL4Rec, DDQN4Rec and the baseline models on the two educational datasets are shown in Table 1 and Table 2. The performance comparison of KERL4Rec, DDQN4Rec and the baseline models on the MOOPer dataset and the MOOCCube dataset in terms of two metrics, accuracy and recall, is shown in Figure 6 and Figure 7.
Table 1. Comparison experimental results of DDQN on the MOOPer dataset
| Model | HR@10 | HR@5 | HR@3 | HR@1 | NDCG@10 | MRR |
|---|---|---|---|---|---|---|
| FM | 0.4829 | 0.3822 | 0.3219 | 0.2516 | 0.3728 | 0.3264 | 
| KGCN | 0.7823 | 0.6718 | 0.5567 | 0.3629 | 0.2662 | 0.3105 | 
| GRU4Rec | 0.7612 | 0.7629 | 0.7873 | 0.7053 | 0.7451 | 0.7322 | 
| KERL | 0.9638 | 0.9511 | 0.9340 | 0.9073 | 0.9371 | 0.9205 | 
| DDQN4Rec | 0.9711 | 0.9605 | 0.9411 | 0.9105 | 0.9417 | 0.9215 | 
| KERL4Rec | 0.9892 | 0.9784 | 0.9685 | 0.9235 | 0.9688 | 0.9398 | 
Table 2. Comparison experimental results of DDQN on the MOOCCube dataset
| Model | HR@10 | HR@5 | HR@3 | HR@1 | NDCG@10 | MRR |
|---|---|---|---|---|---|---|
| FM | 0.0536 | 0.0303 | 0.0189 | 0.0082 | 0.0241 | 0.0378 | 
| KGCN | 0.4351 | 0.3378 | 0.2767 | 0.1197 | 0.1822 | 0.1920 | 
| GRU4Rec | 0.7862 | 0.7439 | 0.6903 | 0.5209 | 0.6605 | 0.5832 | 
| KERL | 0.9285 | 0.8802 | 0.8577 | 0.7721 | 0.8440 | 0.8299 | 
| DDQN4Rec | 0.9309 | 0.8921 | 0.8602 | 0.7805 | 0.8512 | 0.8351 | 
| KERL4Rec | 0.9412 | 0.9088 | 0.8789 | 0.7932 | 0.8793 | 0.8573 | 

Figure 6. Precision comparison on two datasets

Figure 7. Comparison of Recall on two datasets
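For readers unfamiliar with the ranking metrics in Tables 1 and 2, the following Python sketch computes HR@K, NDCG@K and MRR under the common setup in which each test case has exactly one held-out target item. The paper does not spell out its evaluation protocol, so this setup, the helper names, and the toy rankings are assumptions.

```python
import math

def rank_of(target, ranked):
    """1-based position of the target in a ranked list, or None if absent."""
    try:
        return ranked.index(target) + 1
    except ValueError:
        return None

def evaluate(cases, k=10):
    """cases: list of (target_item, ranked_item_list) pairs."""
    hits = ndcg = mrr = 0.0
    for target, ranked in cases:
        r = rank_of(target, ranked)
        if r is None:
            continue                       # target never recommended: contributes 0
        mrr += 1.0 / r                     # reciprocal rank, no cutoff
        if r <= k:
            hits += 1.0                    # hit within the top K
            ndcg += 1.0 / math.log2(r + 1) # single relevant item, ideal DCG = 1
    n = len(cases)
    return {"HR": hits / n, "NDCG": ndcg / n, "MRR": mrr / n}

# Toy rankings: target at rank 1, at rank 2, and missing.
cases = [
    ("a", ["a", "b", "c"]),
    ("b", ["c", "b", "a"]),
    ("z", ["a", "b", "c"]),
]
scores = evaluate(cases, k=2)
```

Under this single-target setup, HR@K can only stay equal or grow as K increases, which is a quick sanity check when reading such tables.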
By observing the experimental results, this paper has the following findings.
As shown in Tables 1 and 2, KERL4Rec performs outstandingly on the learning resource recommendation task in comparison with the baseline models, outperforming the other models on all six recommendation metrics on both the MOOPer dataset and the MOOCCube dataset. Compared with the KERL model, which has the best performance among the four classical models, KERL4Rec improves HR@10, HR@5, HR@3, HR@1, NDCG@10, and MRR by 2.64%, 2.87%, 3.69%, 1.79%, 3.38%, and 2.10% on the MOOPer dataset, and by 1.37%, 3.15%, 2.47%, 2.73%, 4.18%, and 3.30% on the MOOCCube dataset, respectively. This shows that the KERL4Rec model successfully captures the higher-order structural information in the graph and characterizes the learner state along three dimensions, namely the knowledge graph entity state, the sequence state, and the learner state, effectively modeling the learner’s knowledge state and dynamic preferences. KERL4Rec thus substantially enhances reinforcement learning based recommendation, which proves the validity of this model.

From the comparison with the DDQN4Rec model in Table 1 and Table 2, it can be seen that introducing knowledge representation learning into the learning resource recommendation algorithm can alleviate the data sparsity problem. The preprocessed MOOCCube dataset contains 37,826 recommendable learning resources, 34,164 learners, and 4,291,853 learner interaction behaviors, with no duplicate learning record for the same learner, and the non-zero elements in its interaction matrix are calculated to account for about 0.36% of the total elements. Therefore, the MOOCCube dataset is a sparse dataset. On the MOOCCube dataset, KERL4Rec improves HR@10, HR@5, HR@3, HR@1, NDCG@10, and MRR by 1.11%, 1.87%, 2.17%, 1.63%, 3.30%, and 2.66%, respectively, compared with DDQN4Rec, which does not introduce knowledge representation learning. This shows that introducing knowledge representation learning can effectively alleviate the data sparsity problem and yield better learning resource recommendation results.

From the comparison with other models in Fig. 6 and Fig. 7, it can be seen that KERL4Rec can effectively capture user interests. On the MOOPer dataset, the accuracy of KERL4Rec is improved by 2.27% and the recall by 2.41% compared with the best-performing other model; on the MOOCCube dataset, the accuracy is improved by 3.41% and the recall by 3.49%. This shows that the model can better capture user interests and improve recommendation performance.
This paper analyzes the feasibility of the resource recommendation system through a user satisfaction survey conducted as a controlled experiment. Twenty students were selected as users to retrieve resources through the Baidu search engine (control group) and the resource recommendation system (experimental group). The 20 research subjects voted anonymously on a questionnaire covering three dimensions of satisfaction with the resource recommendation system, namely practicality, applicability, and accuracy, with each dimension scored from 0 to 5 points in order of increasing satisfaction. The results are expressed as mean ± standard deviation, and the Mann-Whitney U test is used to analyze the differences, with P<0.05 taken as statistically significant. The results of the user satisfaction test are shown in Table 3.
The total user satisfaction score was 13.75±0.59 for the experimental group and 7.80±0.46 for the control group. According to the Mann-Whitney U test, the difference in total user satisfaction scores between the control group and the experimental group was statistically significant (P<0.05). The survey also showed that the share of 5-point ratings for practicality, applicability, and accuracy was 71.25%, 82.00%, and 84.75%, respectively, which indicates that the resources recommended by this system could meet the needs of most users in this test.
Table 3. User satisfaction test results
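The percentages quoted against KERL on MOOPer are consistent with a relative improvement, (new - old) / old, computed from the Table 1 values. The short Python sketch below reproduces them; the helper name `relative_gain` is an assumption, while the numbers are copied from Table 1.

```python
def relative_gain(new, old):
    """Relative improvement of new over old, in percent, rounded to 2 decimals."""
    return round((new - old) / old * 100.0, 2)

metrics = ["HR@10", "HR@5", "HR@3", "HR@1", "NDCG@10", "MRR"]

# Rows copied from Table 1 (MOOPer dataset).
kerl = [0.9638, 0.9511, 0.9340, 0.9073, 0.9371, 0.9205]
kerl4rec = [0.9892, 0.9784, 0.9685, 0.9235, 0.9688, 0.9398]

gains = [relative_gain(n, o) for n, o in zip(kerl4rec, kerl)]
report = dict(zip(metrics, gains))
```

Reporting relative rather than absolute gains makes small differences between already high scores look larger, so the two should not be confused when comparing with other papers.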
| Group | Practicability | Applicability | Accuracy | Total points | 
|---|---|---|---|---|
| Control group | 2.85±0.26 | 2.33±0.27 | 2.74±0.28 | 7.80±0.46 | 
| Experimental group | 3.55±0.33 | 4.08±0.42 | 4.16±0.18 | 13.75±0.59 | 
| Z | 1.300 | 4.886 | 4.267 | 0.783 | 
| P | 0.022 | <0.001 | <0.001 | 0.043 | 
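The Mann-Whitney U statistic behind such a comparison can be sketched in plain Python. The paper reports only summary statistics, so the two score lists below are hypothetical, and `mann_whitney_u` is an illustrative helper that computes the U statistic only (no Z or P value). In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y), using midranks for tied values."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1                          # pooled[i:j] is one block of ties
        midrank = (i + 1 + j) / 2.0         # average of 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    return u_x, n1 * n2 - u_x

# Hypothetical 0-5 satisfaction scores, NOT the study's data.
experimental = [4, 5, 5, 3]
control = [2, 3, 3, 1]
u_exp, u_ctl = mann_whitney_u(experimental, control)
```

The test is rank-based, so it suits ordinal questionnaire scores where a normal distribution cannot be assumed.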
In this paper, the recommendation model is also evaluated through teaching effectiveness: the better the teaching effect obtained, the better the recommendation is shown to be.
The evaluation experiment process is as follows:
1. For a given knowledge point, a standardized test is conducted before and after class to obtain each student’s degree of mastery of the knowledge point.
2. The students are divided into three groups by clustering, and the teaching effect is calculated for the high group and the medium-low group. Each group’s mastery of the knowledge point is taken as the mean mastery of the students in the group.
3. The groups’ mastery of the knowledge points before and after the lesson is visualized to show the teaching effect of the experimental group in comparison with the control group.
In this section, eight knowledge points are chosen to compare the teaching effect. The experiment is divided into an experimental group and a control group: teachers in the experimental group prepare lessons using the resource recommendation results, while the control group does not use them. The experimental data for middle and low level students and for high-level students are shown in Fig. 8 and Fig. 9, respectively.
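The group-level calculation in the steps above can be sketched as follows in Python. The records, group labels, and the `group_mastery` helper are hypothetical; the paper does not state which clustering method assigns students to groups, so the labels are simply taken as given here.

```python
from statistics import mean

def group_mastery(records, group):
    """Mean pre-class and post-class mastery of one group for one knowledge point."""
    pre = [r["pre"] for r in records if r["group"] == group]
    post = [r["post"] for r in records if r["group"] == group]
    return mean(pre), mean(post)

# Hypothetical per-student mastery scores in [0, 1], NOT the study's data.
records = [
    {"group": "high", "pre": 0.80, "post": 0.90},
    {"group": "high", "pre": 0.84, "post": 0.92},
    {"group": "mid_low", "pre": 0.50, "post": 0.70},
    {"group": "mid_low", "pre": 0.40, "post": 0.60},
]

high_pre, high_post = group_mastery(records, "high")
low_pre, low_post = group_mastery(records, "mid_low")
low_gain = low_post - low_pre   # improvement of the medium-low group
```

Running the same calculation for the experimental and control classes, per knowledge point, gives the values that are compared in the figures.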

Figure 8. Comparison of knowledge points of middle and low level students

Figure 9. Comparison of knowledge points of high-level students
For high-level students, the mastery level of the experimental group and the control group are both concentrated at 0.9, and the difference is not obvious. For middle and low level students, the difference between the experimental group and the control group is obvious, with the experimental group performing significantly better than the control group, and the highest difference in mastery of knowledge points is 0.06.
This paper proposes an adaptive teaching strategy based on reinforcement learning for a smart aesthetic education teaching platform, and evaluates the application effect of this teaching strategy from three aspects: model performance, feasibility, and practical application effect.
The performance of the KERL4Rec model is compared with the DDQN4Rec model, which does not introduce knowledge representation learning, and with four classical models on the MOOPer and MOOCCube datasets. The accuracy on the MOOPer and MOOCCube datasets is improved by 2.27% and 3.41%, and the recall by 2.41% and 3.49%, respectively, over the best-performing other model. On the sparse MOOCCube dataset, KERL4Rec improves on DDQN4Rec in HR@10, HR@5, HR@3, HR@1, NDCG@10, and MRR by 1.11%, 1.87%, 2.17%, 1.63%, 3.30%, and 2.66%, respectively. The results show that the KERL4Rec model performs outstandingly on the learning resource recommendation task, and that introducing knowledge representation learning into the learning resource recommendation algorithm can alleviate the data sparsity problem.

The feasibility of the resource recommendation system is analyzed through the user satisfaction survey, in which the total user satisfaction score is 13.75±0.59 for the experimental group and 7.80±0.46 for the control group. The share of 5-point ratings for practicality, applicability and accuracy is 71.25%, 82.00% and 84.75%, respectively, which indicates that the resources recommended by the system in this paper can satisfy most users’ needs.

In the fine-grained evaluation of the system’s effect, the difference between the experimental group and the control group is not obvious for high-level students, whose mastery of knowledge is concentrated around 0.9 in both groups. For middle and low level students, the difference between the two groups is obvious, with a highest difference in knowledge mastery of 0.06, which supports the effectiveness of the recommendation system.
