Nonlinear Adaptive Optimization of Multi-Modal Learning Paths Using Graph Convolutional Networks and Reinforcement Learning for Intelligent Educational Systems 
Published online: 17 March 2025
Received: 19 October 2024
Accepted: 4 February 2025
DOI: https://doi.org/10.2478/amns-2025-0829
© 2025 TongLI, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
With the rapid development of artificial intelligence technologies, personalized learning recommendation systems have become a key research focus in the field of smart education. These systems aim to enhance students' learning efficiency by providing tailored learning pathways that dynamically adapt to their evolving learning behaviors and needs. However, the inherent complexity, heterogeneity, and dynamism of educational data pose significant challenges for traditional recommendation methods. These methods struggle to address issues such as the integration of multi-modal learning resources, modeling the temporal dynamics of learning behaviors, and dynamically optimizing learning pathways[1].
Recent advances in deep learning-based recommendation systems have shown promising results in modeling complex data relationships and non-linear patterns. For example, neural collaborative filtering (NCF) models utilize neural networks to capture intricate feature interactions between users and items, leading to improved recommendation accuracy [2]. Furthermore, the introduction of graph neural networks (GNN), particularly lightweight graph convolution networks (LightGCN), has provided new methods for modeling user-resource interaction relationships[3]. However, these models face limitations in educational contexts, including insufficient support for multi-modal resources and inadequate consideration of learners' dynamic behaviors.
To address knowledge dependencies and temporal dynamics in education, researchers have introduced knowledge graphs and temporal modeling techniques. Knowledge graph attention networks (KGAT) enhance the representation of complex knowledge relationships by integrating knowledge graphs with attention mechanisms[4]. Additionally, temporal modeling methods such as long short-term memory (LSTM) networks and Transformers have demonstrated strong applicability in capturing the sequential nature of learning behaviors and identifying critical learning stages [5][4]. Meanwhile, multi-modal fusion techniques have gained traction, particularly in educational recommendations, where dynamic weighting mechanisms are used to jointly model heterogeneous data such as textbooks, instructional videos, and coding exercises [14].
Reinforcement learning (RL) techniques have introduced new possibilities for dynamically optimizing recommendation strategies. In educational recommendation systems, RL not only adjusts recommendations in real time to meet students' personalized needs but also optimizes long-term learning objectives through reward signal design. For instance, deep reinforcement learning (DRL)-based recommendation methods, which incorporate learning pathways, resource coverage, and user behavior into multi-objective rewards, have significantly improved recommendation diversity and user satisfaction [18].
This paper addresses the aforementioned challenges by proposing a dynamic personalized learning recommendation system based on graph convolutional networks (GCN) and attention mechanisms. Using data from a "Computer Networks" course as a case study, the system integrates multi-modal learning resources (e.g., textbooks, assignments, and instructional videos) with temporal modeling of user learning behaviors and reinforcement learning optimization to overcome challenges in heterogeneity, diversity, and dynamism. Specifically, this study contributes:
1) An end-to-end recommendation model combining GCN and attention mechanisms: The GCN models the interaction graph structure between users and resources, capturing dependency relationships between knowledge points, while the attention mechanism dynamically identifies critical learning nodes.
2) A dynamic multi-modal fusion mechanism: By incorporating gating mechanisms and reinforcement learning strategies, the system dynamically adjusts the weights of different modalities, such as textbooks, videos, and experiments, to accommodate learning behaviors at various stages.
3) A temporal modeling framework for learning behaviors: Combining LSTM and Transformers, the system effectively captures the temporal dependencies and phased characteristics of user learning behaviors, significantly enhancing the precision of personalized recommendations.
4) Optimized recommendation strategies: By designing reward signals in reinforcement learning, the system dynamically adjusts recommendation objectives, improving overall system performance in terms of recommendation accuracy, diversity, and user satisfaction.
Recommendation systems have emerged as a critical research area in artificial intelligence, finding applications across various domains. With the increasing demand for personalized learning and the diversification of educational resources, recommendation systems have evolved from traditional collaborative filtering techniques to deep learning-driven models that integrate knowledge graphs, multimodal data, and temporal modeling techniques. These advancements address the complex data characteristics and dynamic learning needs inherent in educational scenarios. However, compared to domains like e-commerce and social media, educational recommendation systems face unique challenges, such as modeling dependencies between knowledge points, capturing the temporal dynamics of student behaviors, and integrating multimodal, heterogeneous learning resources.
Collaborative filtering (CF) is a foundational technique in recommendation systems, leveraging user-item interaction matrices to predict user preferences. Matrix factorization techniques, such as singular value decomposition (SVD) and non-negative matrix factorization (NMF), have demonstrated strong performance in static recommendation tasks[8]. However, CF methods encounter significant limitations in educational recommendations:
1) Cold-Start Problem: CF struggles to generate accurate recommendations for new users or newly introduced resources due to sparse interaction data[9].
2) Lack of Dynamic Modeling: CF fails to capture the time-dependent nature of user behaviors and learning processes, limiting its applicability in personalized education scenarios.
Recent research has addressed these limitations by incorporating contextual features (e.g., learning background and goals) or integrating knowledge graph-based methods. However, these enhancements still fall short in dynamic modeling and multimodal resource processing.
Deep learning has significantly advanced recommendation systems by enabling non-linear feature modeling. Representative models include:
1) Neural Collaborative Filtering (NCF): Replacing traditional linear inner-product operations with multi-layer perceptrons (MLPs), NCF captures complex user-item interactions[2].
2) DeepFM: Combining factorization machines with deep neural networks, DeepFM enhances the extraction of high-dimensional, sparse features[11].
3) Dynamic Interest Networks (DIN/DIEN): These models leverage attention mechanisms to capture the evolution of user interests, improving recommendation accuracy and real-time responsiveness.
Despite their success in general applications, these models face challenges in education, such as processing heterogeneous learning resources (e.g., textbooks, instructional videos, and experimental code) and capturing the dynamic temporal aspects of user learning behaviors.
Knowledge graphs (KG) have seen increasing adoption in educational recommendations, enabling semantic representation of course chapters, knowledge points, and their dependencies. Recent advancements include:
1) Knowledge Graph Embeddings: Techniques like TransE and DistMult map entities and relationships into low-dimensional vector spaces, enhancing semantic representation[5].
2) Graph Neural Networks (GNNs): The integration of GNNs with KGs has further improved the representation of complex knowledge structures[21].
3) Knowledge Graph Attention Networks (KGAT): By incorporating attention mechanisms and graph convolution, KGAT effectively models complex knowledge relationships, supporting dynamic learning path generation [1].
While these methods have shown promise, they often lack robust support for temporal dynamics and multimodal data integration.
Modern educational resources are inherently multimodal, encompassing textual content, videos, and programming exercises. Recent efforts to model multimodal data include:
1) Multi-Gate Mixture-of-Experts (MMoE): This framework dynamically assigns weights to different modalities, reflecting their varying importance in learning tasks[7].
2) Pre-trained Models: BERT and CodeBERT have achieved significant advancements in textual and programming language modeling, respectively[21],[10].
Despite these innovations, challenges remain in dynamically adjusting modality weights to reflect changes in learning stages and effectively integrating the semantics of heterogeneous data sources.
Temporal modeling plays a crucial role in capturing the dynamic nature of user learning behaviors. Key techniques include:
1) Long Short-Term Memory (LSTM): LSTM networks excel in capturing long-term dependencies in sequential data[22].
2) Temporal Convolutional Networks (TCN): TCNs improve efficiency and parallelization in temporal modeling while maintaining high performance [23].
3) Transformers: Attention mechanisms in Transformers highlight critical events in sequences, making them highly suitable for optimizing dynamic learning paths[5].
Attention mechanisms have emerged as a key solution for addressing challenges in multimodal fusion and temporal modeling. By focusing on critical features and time points, attention mechanisms significantly enhance the performance of recommendation systems [23].
Reinforcement learning (RL) optimizes recommendation strategies based on real-time user interactions. Deep reinforcement learning (DRL) extends RL capabilities with deep neural networks, achieving notable success in domains such as e-commerce and video streaming [14].
In education, RL enables dynamic adjustment of learning paths to accommodate evolving student needs, offering a promising direction for personalized recommendations [15].
Reward signal design is critical in RL, particularly in education, where multi-objective optimization (e.g., learning completion, knowledge coverage, and resource diversity) is essential. Multi-objective reward functions provide theoretical foundations for dynamic recommendation strategies [16].
This study introduces a gating mechanism combined with reinforcement learning to achieve dynamic weighting of multimodal features, such as textbooks, instructional videos, and experiments, effectively addressing varying learning stage requirements.
A novel framework integrating graph convolutional networks (GCN), temporal modeling, and reinforcement learning is proposed. This framework captures interdependencies among knowledge points, dynamically adjusts recommendation paths, and represents a paradigm shift in personalized educational recommendations.
In summary, recent studies on educational recommendation systems have made significant progress, yet notable research gaps remain in the generation of dynamic learning paths, the integration of multimodal data, and the modeling of temporal and knowledge associations. This paper comprehensively reviews recent advancements, highlighting the strengths and limitations of traditional collaborative filtering methods, deep learning-based recommendation techniques, and knowledge graph-driven models in educational contexts. Additionally, it explores the latest developments in multimodal learning and temporal modeling, as well as the potential of reinforcement learning for personalized recommendations [7],[13]. Based on this foundation, the paper proposes an innovative framework for personalized learning recommendation systems that integrates graph convolutional networks (GCN)[1],[2],[20], attention mechanisms, dynamic weighting of multimodal features, and reinforcement learning, offering a novel technical pathway to address these challenges.
The Computer Networks course is a critical component of computer science education, characterized by unique learning needs and recommendation challenges. These include: diverse learning objectives, encompassing both fundamental theories (e.g., network protocols, layered architectures) and practical exercises (e.g., network simulations, socket programming); personalized learning paths, requiring the system to dynamically adapt to students' varying learning levels, interests, and mastery of knowledge; and temporal dependency, indicating that students' learning needs are often sequential, such as studying foundational knowledge (e.g., TCP/IP protocols) before advancing to hands-on practices[16].
This study proposes a dynamic personalized learning recommendation system tailored to the Computer Networks course. The system is designed to recommend multimodal learning resources, such as textbooks, instructional videos, lab projects, and code repositories[13],[17]. By modeling user learning behaviors and employing temporal sequence analysis, the system dynamically adjusts recommendation strategies, enabling students to construct efficient learning paths. To achieve this, the architecture leverages a combination of Graph Convolutional Networks (LightGCN)[2], attention mechanisms [5], and reinforcement learning[8][23] to optimize recommendations, while incorporating domain-specific input features to capture the unique aspects of the Computer Networks course[17].
The proposed dynamic personalized learning recommendation system is structured as illustrated in Figure 1, which outlines the hierarchical workflow of the system, from input processing to recommendation generation and feedback optimization. This framework includes five key modules: the Input Layer, the Multimodal Data Fusion Layer[7], the Temporal Modeling Module, the Recommendation Engine Module, and the Feedback Module[18].

Figure 1. Infrastructure components of the designed system
Figure 1 provides a detailed view of how the various components of the system are interconnected, including data flow and the roles of specific technologies. For instance, the Input Layer processes raw user data and extracts features via a shared embedding network, feeding them into subsequent layers. The Multimodal Data Fusion Layer[7], as depicted, employs a gated mechanism enhanced by reinforcement learning to integrate multi-modal information dynamically. This layered design ensures the system's ability to adaptively recommend learning materials tailored to individual students' needs, particularly within the context of a Computer Networks course[18].
The proposed system consists of five key components:
1) Input Layer: Collects user learning behavior data (e.g., resource clicks, dwell time), course content features (e.g., textbook sections, code snippets), and temporal features (e.g., learning stages) while performing feature engineering and preprocessing.
2) Multimodal Data Fusion Layer: Uses a gating mechanism to dynamically integrate features from different modalities, with reinforcement learning optimizing the fusion strategy.
3) Temporal Modeling Module: Employs LSTM to capture the temporal dependencies of user behavior while integrating attention mechanisms to focus on key learning activities.
4) Recommendation Engine Module: Utilizes LightGCN [2] for feature propagation over the user-item interaction graph and combines attention mechanisms to generate personalized recommendations.
5) Feedback Module: Implements reinforcement learning with reward signals (e.g., recommendation efficiency and diversity) to iteratively refine recommendation quality.
1) Multimodal Feature Fusion: A gating mechanism is designed to dynamically integrate multimodal features, effectively combining semantic features from text, videos, and code relevant to the Computer Networks course.
2) Temporal Sequence Modeling: By combining LSTM and attention mechanisms, the system effectively models students' stage-specific learning needs.
3) Reinforcement Learning Optimization: Reward signals based on diversity and efficiency improve the quality of recommendations[8],[23].
4) Efficient Feature Propagation: LightGCN [2] is employed for efficient modeling of user-item interactions, significantly reducing computational complexity while maintaining high performance.
The use of pretrained models like BERT and CodeBERT for extracting textual and code semantic features has been validated in multiple studies [9],[10].
The Multimodal Data Fusion Layer, illustrated in pink in Figure 1, is responsible for integrating the processed input features derived from the Input Layer. By utilizing a gated mechanism and reinforcement learning[8], this layer dynamically weighs the importance of different data modalities—textbooks, videos, and code examples—based on their contextual relevance. This fusion ensures that the output is a rich, comprehensive representation that retains essential features from all modalities and prepares it for time-series analysis in the Temporal Modeling Module. The design and optimization strategies for this layer address the specific challenges posed by the diverse resource types in the Computer Networks course.
The resources for the computer networks course are highly diverse, encompassing textbooks (text), videos (dynamic visualizations), and code examples (hands-on practice). These modalities significantly influence students’ learning behaviors and progress. However, the heterogeneous nature of these modalities presents challenges in directly fusing them, as this could result in information loss or imbalanced contributions from different modalities. To address these challenges, we design a dynamic multi-modal fusion strategy tailored for computer networks course resources. This strategy leverages a gating mechanism and reinforcement learning to dynamically adjust the weights of modalities, ensuring precise modeling of students’ learning needs.
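To dynamically adjust the importance of different modalities, the gating mechanism incorporates user behavior features into a weighted fusion of the modality features:

$$F_{\text{fused}} = \sum_{m=1}^{M} \alpha_m F_m$$

where the modality weight $\alpha_m$ is computed as:

$$\alpha_m = \frac{\exp\big(g(F_m, A_m)\big)}{\sum_{k=1}^{M} \exp\big(g(F_k, A_k)\big)}$$

Here, g(·) is a nonlinear function (e.g., an MLP) that outputs an importance score based on the modality feature $F_m$ and behavior feature $A_m$.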
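The gated weighting can be illustrated with a minimal sketch. It assumes that the importance scores are normalized with a softmax (as the description of the pseudo code later in this paper suggests) and replaces the MLP g(·) with a hypothetical linear scorer; the names `gate_score`, `fuse_modalities`, `w_f`, and `w_a`, as well as the toy feature values, are illustrative and do not come from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gate_score(feature, behavior, w_f, w_a):
    """Hypothetical linear stand-in for g(F_m, A_m); the paper suggests an MLP."""
    return (sum(w * x for w, x in zip(w_f, feature))
            + sum(w * x for w, x in zip(w_a, behavior)))

def fuse_modalities(features, behaviors, w_f, w_a):
    """Return (alpha, fused), where fused = sum_m alpha_m * F_m."""
    scores = [gate_score(f, a, w_f, w_a) for f, a in zip(features, behaviors)]
    alpha = softmax(scores)
    dim = len(features[0])
    fused = [sum(a * f[d] for a, f in zip(alpha, features)) for d in range(dim)]
    return alpha, fused

# Toy inputs (illustrative only), one row per modality: textbook, video, code.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
behaviors = [[0.2], [0.9], [0.5]]
alpha, fused = fuse_modalities(features, behaviors, w_f=[0.5, 0.5], w_a=[1.0])
```

In this toy setting the code modality receives the largest weight because its combined feature and behavior score is highest; in the system described here the scorer would be learned rather than fixed.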
Weight Design for Computer Networks Course: For chapter-based content (e.g., link layer, network layer, application layer), the weights of different modalities are dynamically adjusted. For example:
Textbooks are assigned higher weights during topics like the "Link Layer" to emphasize theoretical understanding.
Code Examples are prioritized for topics like the "Application Layer," where practical coding exercises are more relevant.
The reinforcement learning optimization strategy, as described in Figure 1, enhances this gating mechanism by dynamically learning optimal weights during the recommendation process. The reward function used for reinforcement learning is defined as Formula (8):
where Learning Progress measures the completion of learning tasks, and Engagement Level evaluates student interactions with the recommended resources.
The multi-modal fusion process is modeled as a Markov Decision Process (MDP), where the state $s_t$ represents the modality features in the current learning phase, and the action $a_t$ adjusts the modality weights $\alpha_m$. The objective of reinforcement learning is to maximize the cumulative reward R:

$$R = \sum_{t=0}^{T} \gamma^{t} r_t$$

where γ is the discount factor and $r_t$ is the reward received at step t. A Deep Q-Network (DQN) is utilized as the policy network, which takes the state $s_t$ as input and outputs the optimal action $a_t$. The iterative optimization ensures that the system adapts to students’ evolving learning behaviors over time.
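As a small worked illustration of the discounted objective, the sketch below computes a cumulative discounted reward and the standard one-step Q-learning target that a DQN is usually trained against. The function names and the reward values are hypothetical; the paper does not specify its training targets.

```python
def discounted_return(rewards, gamma):
    """Compute sum_t gamma^t * r_t by accumulating backwards from the last step."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

def td_target(reward, next_q_values, gamma, done=False):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)

# Three steps of unit reward with gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
example_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

A smaller γ makes the agent favor immediate engagement, while a γ close to 1 emphasizes long-term learning progress.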
As highlighted in Figure 1, the reinforcement learning strategy works in tandem with the gating mechanism to dynamically adjust the fusion strategy during training. This approach not only maximizes learning efficiency but also ensures resource diversity in the recommendations.
The Input Layer (highlighted in blue in Figure 1) is responsible for collecting and preprocessing user data. For the Computer Networks course, the input data consists of the following components:
1) Behavioral Data: Captures students' interactions with learning resources, including:
(1) Click behaviors: Frequency and duration of clicks on textbooks, videos, and labs.
(2) Interaction behaviors: Time and quality of completing lab projects or submitting code.
(3) Dwell behaviors: Time spent on specific learning resources.
2) Course Content Features:
(1) Textual Modality: Extracts semantic features from textbook sections using:

$$f_{\text{content}} = \text{BERT}(\text{Content})$$

where Content represents the textual content of the textbook, and $f_{\text{content}} \in \mathbb{R}^d$ is the semantic feature vector.
(2) Code Modality: Extracts semantic representations from code snippets using:

$$f_{\text{code}} = \text{CodeBERT}(\text{Code})$$

where Code is the programming code, and $f_{\text{code}} \in \mathbb{R}^d$ is the feature vector.
3) Temporal Features: Captures the sequential dependency of learning modules. Temporal features are encoded using:

$$e_t = \text{Embedding}(\text{Module})$$

where Module represents the module ID, and $e_t$ is the temporal embedding vector.
To adapt multimodal input data, the following preprocessing steps are performed:
Outliers are removed and features are standardized:

$$x' = \frac{x - \mu}{\sigma}$$

where μ and σ denote the mean and standard deviation of the features, respectively.
Textual and code features are extracted using pretrained models (e.g., BERT, CodeBERT) and mapped to a shared feature space:

$$f' = W f + b$$

where f is the extracted feature vector, and W and b are learnable parameters.
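The shared embedding network maps multi-modal features (text, code, and temporal) into a unified feature space, generating the final input vector:

$$h = \sigma\big(W_s\,[f_{\text{content}};\, f_{\text{code}};\, e_t] + b_s\big)$$

where h represents the user’s multi-modal feature representation, $[\cdot\,;\cdot]$ denotes concatenation, $W_s$ and $b_s$ are the parameters of the shared embedding network, and σ is an activation function (e.g., ReLU).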
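The preprocessing steps above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions: standardization uses the population standard deviation, the shared embedding network is reduced to a single linear layer followed by ReLU, and the three feature vectors are combined by concatenation. The function names and toy weights are hypothetical.

```python
import math

def standardize(values):
    """Z-score a list of floats: (x - mean) / std, using the population std."""
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    if sigma == 0.0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def linear(W, x, b):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(v):
    return [max(0.0, x) for x in v]

def shared_embedding(f_text, f_code, e_t, W, b):
    """Concatenate the three feature vectors (an assumption) and apply ReLU(W z + b)."""
    z = f_text + f_code + e_t
    return relu(linear(W, z, b))

z_scores = standardize([1.0, 2.0, 3.0])
h = shared_embedding([1.0], [2.0], [3.0],
                     W=[[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]], b=[0.0, 0.0])
```

In practice the text and code vectors would come from the pretrained encoders, and W and b would be trained jointly with the rest of the model.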
In the Computer Networks course, knowledge points exhibit a strong sequential and dependency relationship. For instance, learning the "Network Layer" relies on foundational knowledge from the "Link Layer." To model this dependency, a dependency matrix D is defined as follows:
In the LSTM, the hidden state update formula is modified to incorporate the dependency relationship:
where Dt·ht–1 explicitly represents the influence of the current knowledge point's prerequisite dependencies on the hidden state.
As shown in Figure 1, this dependency modeling is embedded in the Temporal Modeling Layer, enabling the system to accurately capture the hierarchical structure of knowledge points in the course. The dependency matrix enhances the temporal dynamics modeled by LSTM by introducing explicit representations of prerequisite relationships.
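To make the dependency matrix concrete, the following sketch builds a binary matrix from a list of prerequisite pairs. It assumes the convention that entry D[i][j] is 1 when knowledge point j is a prerequisite of knowledge point i; the paper's exact definition may differ, and the helper names are hypothetical.

```python
def build_dependency_matrix(points, prerequisites):
    """Return D where D[i][j] = 1 if points[j] is a prerequisite of points[i] (assumed convention)."""
    index = {name: i for i, name in enumerate(points)}
    n = len(points)
    D = [[0] * n for _ in range(n)]
    for point, prerequisite in prerequisites:
        D[index[point]][index[prerequisite]] = 1
    return D

def unmet_prerequisites(point, points, D, completed):
    """List the prerequisites of `point` that the learner has not completed yet."""
    row = D[points.index(point)]
    return [points[j] for j, flag in enumerate(row)
            if flag and points[j] not in completed]

points = ["Link Layer", "Network Layer", "Transport Layer"]
prerequisites = [("Network Layer", "Link Layer"),
                 ("Transport Layer", "Network Layer")]
D = build_dependency_matrix(points, prerequisites)
```

With this toy course structure, a learner who has completed nothing still has "Link Layer" as an unmet prerequisite of "Network Layer", which is the situation the dependency term in the hidden state update is meant to capture.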
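The temporal sequence data X = [x1, x2, … , xT] represents the student's learning behavior sequence, where each xt is the feature vector of learning activities at time step t. LSTM is employed to capture the temporal dependencies in this sequence. The core equations of LSTM are as follows:

$$
\begin{aligned}
f_t &= \sigma\big(W_f [h_{t-1}, x_t] + b_f\big) \\
i_t &= \sigma\big(W_i [h_{t-1}, x_t] + b_i\big) \\
\tilde{c}_t &= \tanh\big(W_c [h_{t-1}, x_t] + b_c\big) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\big(W_o [h_{t-1}, x_t] + b_o\big) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

where σ(·) is the sigmoid activation function and ⊙ denotes element-wise multiplication.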
As depicted in Figure 1, the Temporal Modeling Layer applies LSTM to the processed input sequence from the Multimodal Data Fusion Layer, effectively learning long-term and short-term dependencies in students' learning behaviors. This facilitates capturing the sequential nature of knowledge acquisition in the course.
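For readers who want to trace the recurrence step by step, here is a minimal pure-Python sketch of a standard LSTM cell. It implements the textbook gates only and does not include the dependency term described earlier; the parameter dictionary layout and the function names are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Affine map W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One standard LSTM step; p holds W_f, W_i, W_o, W_c and matching biases."""
    z = h_prev + x_t  # list concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]    # forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]    # input gate
    o = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]    # output gate
    g = [math.tanh(a) for a in affine(p["W_c"], z, p["b_c"])]  # candidate cell state
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

def run_lstm(sequence, p, hidden_size):
    """Run the cell over a sequence of input vectors and collect all hidden states."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    states = []
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, p)
        states.append(h)
    return states
```

Each weight matrix has one row per hidden unit and `hidden_size + input_size` columns, matching the concatenated input.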
To enhance the modeling of the importance of course knowledge points, an attention mechanism is introduced to compute the significance weight of each time step:
The final weighted representation of the temporal sequence is then computed as:
This attention mechanism, as visualized in Figure 1, enhances the LSTM's output by focusing on critical knowledge points in the sequence. The weights $\beta_t$ can be further adjusted according to chapter importance, enabling the system to prioritize more impactful sections of the course material.
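The attention pooling over time steps might be sketched as follows. The scoring function is an assumption (a dot product between each hidden state and a learned query vector), as is the way chapter importance is applied (multiplying the unnormalized weights and renormalizing); the paper does not spell out either choice.

```python
import math

def attention_pool(hidden_states, query, chapter_importance=None):
    """Return (beta, context), where context = sum_t beta_t * h_t."""
    # Dot-product scores between each hidden state and the query (assumed scoring rule).
    scores = [sum(q * h for q, h in zip(query, h_t)) for h_t in hidden_states]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    if chapter_importance is not None:
        # Optional reweighting by chapter importance before normalization.
        weights = [w * imp for w, imp in zip(weights, chapter_importance)]
    total = sum(weights)
    beta = [w / total for w in weights]
    dim = len(hidden_states[0])
    context = [sum(b * h_t[d] for b, h_t in zip(beta, hidden_states))
               for d in range(dim)]
    return beta, context

states = [[1.0, 0.0], [0.0, 1.0]]
beta_plain, ctx_plain = attention_pool(states, query=[0.0, 0.0])
beta_imp, ctx_imp = attention_pool(states, query=[0.0, 0.0],
                                   chapter_importance=[1.0, 3.0])
```

With a zero query both time steps receive equal weight; supplying chapter importance of 1 and 3 shifts the weights to 0.25 and 0.75.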
Beyond the attention mechanism, reinforcement learning is utilized to optimize the temporal modeling of students' learning behaviors. A reward function is designed to encourage students to follow the dependency chain of knowledge points sequentially:
where:
Correctness_t evaluates the accuracy of the learning content at the current time step.
Completion_t measures the proportion of learning objectives achieved at the current stage.
As shown in Figure 1, this reinforcement learning framework is integrated into the Feedback Module and linked back to the Temporal Modeling Layer. The dynamic reward strategy ensures that the model adapts to individual students' progress and learning goals, enhancing both personalization and temporal alignment in recommendations.
The code snippet shown in Figure 2 represents a core part of the dynamic personalized learning recommendation system described in the paper. The snippet outlines the process for multi-modal data fusion and temporal modeling for user representation generation, which are crucial for the proposed model. A brief breakdown follows:
Figure 2. Pseudo code of the "Multimodal Data Fusion and Temporal Modeling" algorithm
Step 1: Multi-modal Data Fusion:
This step combines multiple data sources (e.g., text, images, or other features) using a weighted fusion technique. Each modality's contribution to the final representation is computed based on the softmax function, which normalizes the weights across different modalities.
Step 2: Temporal Modeling with LSTM
In this step, the Long Short-Term Memory (LSTM) model is used to process the time sequence data. LSTM is commonly used for sequential data and helps capture temporal dependencies in user behavior (e.g., learning patterns over time).
Step 3: Attention Mechanism for Temporal Features
This step applies an attention mechanism to the temporal features, which allows the model to focus more on important time steps in the sequence. The attention mechanism uses learned weights βt to assign different attention scores to each time step.
Step 4: Final User Representation
Finally, the outputs of the multi-modal fusion and temporal modeling (i.e., the fused feature vectors and attention-modulated time steps) are concatenated to form the final user representation, which is then used for personalized recommendations.
This code showcases an efficient method to fuse multi-modal data and account for temporal dependencies in user behavior, key for the personalized learning recommendations the paper addresses.
For the recommendation task in the Computer Networks course, we constructed not only a user-resource interaction graph G = (V, E), representing the relationships between users and learning resources (e.g., textbook chapters, experimental videos, code projects), but also an extended knowledge-point graph Gkn = (Vkn, Ekn). The components of the graphs are defined as follows:
Vkn: A set of knowledge points (e.g., key topics such as the Link Layer, Network Layer, and Transport Layer).
Ekn: A set of prerequisite relationships between knowledge points (e.g., the Network Layer depends on understanding the Link Layer).
The embedding updates for nodes in the interaction graph incorporate multimodal relationships and are defined as:
where:
N(v): The set of neighbors of node v in the interaction graph.
Nkn(v): The set of neighbors of resource node v in the knowledge-point graph.
λ: A weighting parameter that controls the influence of knowledge dependencies on embedding updates.
As illustrated in Figure 1, the Graph Construction step within the Recommendation Engine Module integrates both interaction and knowledge-point graphs. This dual-graph modeling approach combines users’ learning behaviors with the knowledge structure of the course, thereby enriching the embeddings used in subsequent recommendation generation.
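One way to picture the dual-graph update is the sketch below. Since the exact update rule is not reproduced in this text, the sketch assumes LightGCN-style symmetric normalization over interaction neighbors N(v) plus a λ-weighted mean over knowledge-point neighbors Nkn(v). The function name, node identifiers, and embeddings are hypothetical.

```python
import math

def propagate(emb, inter_nbrs, kn_nbrs, lam):
    """One propagation layer over both graphs (assumed form, not the paper's exact rule)."""
    new_emb = {}
    for v, h_v in emb.items():
        dim = len(h_v)
        out = [0.0] * dim
        # Interaction graph: symmetric degree normalization as in LightGCN.
        for u in inter_nbrs.get(v, []):
            norm = math.sqrt(len(inter_nbrs[v]) * len(inter_nbrs[u]))
            for d in range(dim):
                out[d] += emb[u][d] / norm
        # Knowledge-point graph: lambda-weighted mean of prerequisite neighbors.
        kn = kn_nbrs.get(v, [])
        for w in kn:
            for d in range(dim):
                out[d] += lam * emb[w][d] / len(kn)
        new_emb[v] = out
    return new_emb

emb = {"u1": [1.0, 0.0], "r_link": [0.0, 1.0], "r_net": [1.0, 1.0]}
inter_nbrs = {"u1": ["r_link", "r_net"], "r_link": ["u1"], "r_net": ["u1"]}
kn_nbrs = {"r_net": ["r_link"]}  # the Network Layer resource depends on the Link Layer resource
updated = propagate(emb, inter_nbrs, kn_nbrs, lam=0.5)
```

Setting `lam` to 0 recovers plain interaction-graph propagation, which makes the contribution of the knowledge dependencies easy to ablate.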
In the recommendation generation process, the user's progress in learning specific course resources is incorporated into the scoring computation. The recommendation score 
where:
Attention(hu, hi): The compatibility score between the user and the learning resource, calculated as:
with Wu, Wi, and qT as learnable parameters.
Progress(u,i): The completion level of user u for resource i.
γ: A weighting parameter for the progress score.
As depicted in Figure 1, the Recommendation Engine Module combines embeddings from the Feature Propagation step (via LightGCN) with user-progress information to produce personalized recommendations. By incorporating user progress, the system ensures that recommended resources align with the user's current learning stage, providing a dynamic and context-aware recommendation experience.
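A hedged sketch of the scoring step follows. It assumes an additive attention form, q^T tanh(Wu·hu + Wi·hi), and an additive combination with the progress term weighted by γ; the text names the parameters but not the exact formula, so both choices are assumptions, as are the function names.

```python
import math

def matvec(W, x):
    """Matrix-vector product with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attention_score(h_u, h_i, W_u, W_i, q):
    """Assumed additive attention: q^T tanh(W_u h_u + W_i h_i)."""
    hidden = [math.tanh(a + b)
              for a, b in zip(matvec(W_u, h_u), matvec(W_i, h_i))]
    return sum(qk * hk for qk, hk in zip(q, hidden))

def recommendation_score(h_u, h_i, progress, gamma, W_u, W_i, q):
    """Assumed combination: attention compatibility plus gamma times progress."""
    return attention_score(h_u, h_i, W_u, W_i, q) + gamma * progress

identity = [[1.0, 0.0], [0.0, 1.0]]
score = recommendation_score([1.0, 0.0], [0.0, 0.0], progress=0.5, gamma=0.3,
                             W_u=identity, W_i=identity, q=[1.0, 0.0])
```

Whether progress should raise or lower the score depends on the intended behavior (resuming partly finished resources versus steering toward untouched ones), which can be expressed through the sign of γ.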
In the Computer Networks course, the hierarchical structure of knowledge (e.g., progressing from the Link Layer to the Application Layer) plays a crucial role in ensuring effective learning. To respect this hierarchical progression, the recommendation system prioritizes incomplete but essential chapters or experimental projects. The re-ranking mechanism dynamically adjusts recommendation results by combining the user's personalized compatibility score with the prerequisite importance of the learning resources. The re-ranking score for resource i is defined as:
where:
PrerequisiteScore(i): A dynamically adjusted score that reflects the importance of resource i in the context of the knowledge hierarchy. This score leverages the prerequisite relationships in the knowledge-point graph Gkn, as shown in Figure 1, to assign higher priority to foundational topics.
The PrerequisiteScore is derived using the knowledge-point dependencies modeled in the Graph Construction step (depicted in Figure 1) and is calculated based on the criticality and completion status of the resource's prerequisite nodes. For example, if a student has not completed the foundational chapters in the "Link Layer," these chapters are assigned higher priority in the re-ranking process, ensuring that subsequent topics (e.g., "Network Layer") are not recommended prematurely.
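The re-ranking behavior in this example can be approximated with a short sketch. It uses a hypothetical readiness-style PrerequisiteScore (the fraction of a resource's prerequisites already completed, or 1.0 for foundational resources) added to the base score with an assumed weight; the paper's actual score may also account for criticality.

```python
def prerequisite_score(resource, prerequisites, completed):
    """Fraction of the resource's prerequisites already completed (1.0 if it has none)."""
    required = prerequisites.get(resource, [])
    if not required:
        return 1.0
    return sum(1 for r in required if r in completed) / len(required)

def rerank(base_scores, prerequisites, completed, weight=0.5):
    """Rank incomplete resources by base score plus weighted prerequisite score."""
    candidates = [r for r in base_scores if r not in completed]
    final = {r: base_scores[r] + weight * prerequisite_score(r, prerequisites, completed)
             for r in candidates}
    return sorted(final, key=final.get, reverse=True)

base_scores = {"Link Layer": 0.4, "Network Layer": 0.6}
prerequisites = {"Network Layer": ["Link Layer"]}
before = rerank(base_scores, prerequisites, completed=set())
after = rerank(base_scores, prerequisites, completed={"Link Layer"})
```

Before the learner finishes the "Link Layer", it is ranked ahead of the "Network Layer" despite its lower compatibility score; once it is completed, only the "Network Layer" remains in the list.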
In the recommendation task for the Computer Networks course, the learning state of a user must comprehensively represent the current progress and the remaining tasks. The state st at time step t is defined as:
where:
 (
As depicted in Figure 1, the Feedback Module (green block) processes this state to generate a reward signal Rt, which reflects the user's learning engagement, progress, and diversity in the recommended resources. This detailed representation ensures that the system effectively tracks and adapts to the user’s learning trajectory, aligning recommendations with both completed and pending knowledge points.
The reward signal is tailored to the learning scenario, incorporating user interactions, knowledge point progress, and course completion rate. The composite reward function is defined as:
where:
As illustrated in Figure 1, the reward signal computation is tightly integrated with the Feedback Module. This module dynamically adjusts the reward by monitoring the user’s progress along the knowledge graph, captured in the Knowledge Graph Propagation step (purple block).
To encourage recommendation diversity, a diversity score based on the distribution of knowledge points is incorporated into the reward signal. The diversity score is defined as:
where:
 The final reward signal is updated to include diversity:
where λ4 is the weight of the diversity score. As depicted in Figure 1, the diversity-enhanced reward signal is processed in the Reward Signal block of the Feedback Module, ensuring that the recommendations are not only personalized but also varied, encouraging exploration and comprehensive learning.
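As an illustration of a diversity-aware reward, the sketch below measures diversity as the normalized Shannon entropy of the knowledge points covered by the recommended resources and combines it linearly with the other reward terms. Both the entropy-based definition and the linear weighting with λ1 to λ4 are assumptions; the function names and weights are hypothetical.

```python
import math
from collections import Counter

def diversity_score(knowledge_points, num_points):
    """Normalized entropy of recommended knowledge points, in [0, 1] (assumed definition)."""
    n = len(knowledge_points)
    if n == 0 or num_points <= 1:
        return 0.0
    counts = Counter(knowledge_points)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(num_points)

def composite_reward(interaction, progress, completion, diversity, weights):
    """Weighted sum of the four reward terms; weights = (l1, l2, l3, l4)."""
    l1, l2, l3, l4 = weights
    return l1 * interaction + l2 * progress + l3 * completion + l4 * diversity

balanced = diversity_score(["Link", "Network"], num_points=2)
repeated = diversity_score(["Link", "Link"], num_points=2)
reward = composite_reward(1.0, 0.5, 0.0, balanced, weights=(0.4, 0.3, 0.2, 0.1))
```

A recommendation list that covers knowledge points evenly scores close to 1, while one that repeats a single knowledge point scores 0, so a larger λ4 pushes the policy toward broader coverage.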
Below is the pseudocode that integrates knowledge graph propagation, dynamic user progress adjustment, recommendation generation, and reinforcement learning-based optimization.
As illustrated in Figure 1, the algorithm relies on distinct modules, including the Knowledge Graph Propagation (purple block), Multi-modal Data Fusion (pink block), and the Feedback Module (green block), to achieve personalized, adaptive, and effective recommendations.

To highlight the characteristics of the Computer Networks course, we analyze the complexity added by knowledge dependencies and dynamic user progress updates.
1) Knowledge Dependency Propagation Complexity: For the knowledge-point graph Gkn = (Vkn, Ekn), the feature propagation complexity is O(K·|Ekn|·dg), where K is the number of propagation layers, |Ekn| is the number of edges, and dg is the embedding dimension.
2) Dynamic User Progress Updates: Updating progress for each user and module interaction has a complexity of O(Nu·Nm), where Nu is the number of users and Nm is the number of modules.
3) Multi-modal Data Fusion Complexity: For M modalities with feature dimension dm, the feature fusion complexity is O(M·dm).
4) Recommendation Generation Complexity: Computing matching scores for user-resource pairs costs O(Nu·Nm·dg).
5) Reinforcement Learning Feedback Optimization: The feedback optimization complexity is influenced by the state-action space, with single-step complexity O(T·df), where T is the recommendation sequence length and df is the feature dimension.
6) Total Complexity: O(K·|Ekn|·dg + Nu·Nm + M·dm + Nu·Nm·dg + E·T·df).
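To see which term dominates in practice, the cost terms listed above can be tallied for concrete sizes. The sketch below is only an operation-count estimate. All sizes except the number of users are hypothetical, and E, which the text does not define, is treated here as a number of training episodes (an assumption).

```python
def time_cost_terms(K, n_edges, d_g, N_u, N_m, M, d_m, T, d_f, E=1):
    """Return each additive term of the total time complexity as an operation count."""
    return {
        "graph_propagation": K * n_edges * d_g,   # O(K·|Ekn|·dg)
        "progress_update": N_u * N_m,             # O(Nu·Nm)
        "modal_fusion": M * d_m,                  # O(M·dm)
        "score_generation": N_u * N_m * d_g,      # O(Nu·Nm·dg)
        "rl_feedback": E * T * d_f,               # O(E·T·df), E assumed to be episodes
    }


# Hypothetical sizes; only N_u = 486 (the number of students) comes from the paper.
terms = time_cost_terms(K=2, n_edges=500, d_g=64, N_u=486, N_m=40,
                        M=3, d_m=768, T=20, d_f=128, E=100)
dominant = max(terms, key=terms.get)
print(dominant, terms[dominant])
```

Under these assumed sizes the user-resource scoring term is the largest, which suggests that candidate pruning would matter more than reducing the number of propagation layers.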
Considering the characteristics of the knowledge-point graph and course modules, the space complexity is:
1) Embedding Storage: For user and module embeddings, O(Nu·dg + Nm·dg).
2) Knowledge Graph Storage: Sparse storage for the knowledge graph, O(|Ekn|).
3) Multi-modal Feature Storage: For M modalities, O(M·dm).
4) Total space complexity: O(Nu·dg + Nm·dg + |Ekn| + M·dm).
To comprehensively and scientifically validate the effectiveness of the proposed GCN-Attention-RL framework for dynamic personalized learning recommendation systems, this section presents detailed analysis and evaluation using multiple metrics and comparisons between the experimental group (students using the proposed framework) and the control group (students using traditional learning methods).
The experimental data is derived from the 2016 to 2022 cohorts of Computer Science students from the School of Information Engineering, who have participated in the Computer Networks course. The dataset comprises 486 students, divided into two groups:
(1) Students in this group utilized the proposed Graph Convolutional Network (GCN) and Attention Mechanism-based Dynamic Personalized Learning Recommendation Framework[1].
(2) The learning resources provided to students were dynamically adjusted, incorporating multimodal features such as textbooks, instructional videos, and code snippets, along with time-sequence modeling[10].
(3) Personalized recommendations included customized textbook chapters, experimental tasks, and practice code resources.
(1) Students followed a fixed teaching schedule (e.g., textbook chapter order) to complete the course.
(2) All students used identical textbook chapters and experimental projects[14].
(3) No personalized recommendation or dynamic adjustment was provided.
(1) Textual modality (textbook chapters) was processed using BERT to extract semantic embeddings.
(2) Code modality was processed using CodeBERT to extract code embeddings.
(1) Anomalous data (e.g., extremely short or long durations) was filtered out.
(2) Click and engagement behaviors were normalized using standardization.
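The two cleaning steps can be sketched as follows. The duration thresholds and the record layout (`duration` in seconds, `clicks` as a count) are hypothetical, since the paper does not state them. "Standardization" is read here as the usual z-score, which is also an assumption.

```python
from statistics import mean, pstdev


def filter_durations(records, min_s=5.0, max_s=7200.0):
    """Drop records whose session duration is implausibly short or long."""
    return [r for r in records if min_s <= r["duration"] <= max_s]


def standardize(values):
    """Z-score standardization: zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in values]


# Hypothetical interaction log.
records = [
    {"duration": 1.0, "clicks": 1},       # too short, filtered out
    {"duration": 300.0, "clicks": 3},
    {"duration": 600.0, "clicks": 9},
    {"duration": 90000.0, "clicks": 50},  # too long, filtered out
]
clean = filter_durations(records)
z_clicks = standardize([r["clicks"] for r in clean])
print(len(clean), z_clicks)
```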
To comprehensively evaluate the performance of the model, the following metrics were used:
1) Recommendation Accuracy: Precision@K, Recall@K, and NDCG@K.
2) Time Modeling Effectiveness: RMSE (Root Mean Square Error) was used to evaluate the accuracy of time-sequence modeling [22].
3) Recommendation Diversity: Diversity was calculated to measure the difference among recommended items [6]:
where |S| is the size of the recommendation set, and Sim(·) is the cosine similarity.
4) Learning Effectiveness:Learning completion rates (e.g., chapter completion percentages) and the average experimental task scores[7].
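For reference, the ranking and error metrics can be sketched in a few lines. Precision@K, Recall@K, and NDCG@K are written in their common binary-relevance forms, and diversity is taken as one minus the mean pairwise cosine similarity over the recommendation set. The paper's exact formulas are not reproduced here, so these forms are assumptions.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k


def recall_at_k(recommended, relevant, k):
    """Share of the relevant items that appear in the top-k recommendations."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal discounted gain."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


def rmse(predicted, actual):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diversity(embeddings):
    """One minus the mean pairwise cosine similarity over the set S."""
    n = len(embeddings)
    if n < 2:
        return 0.0
    total = sum(cosine(embeddings[i], embeddings[j])
                for i in range(n) for j in range(i + 1, n))
    return 1.0 - 2.0 * total / (n * (n - 1))


# Hypothetical recommendation list and ground truth.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
print(precision_at_k(recommended, relevant, 5),
      recall_at_k(recommended, relevant, 5),
      ndcg_at_k(recommended, relevant, 5))
```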
The performance of the proposed framework was compared against the following methods:
1) Baseline Models: Collaborative Filtering (CF); Matrix Factorization (MF).
2) Deep Learning-based Models: Neural Collaborative Filtering (NCF); DeepFM.
3) Graph-based Models: GC-MC; LightGCN.
We use Precision@K, Recall@K, and NDCG@K as the primary metrics for recommendation accuracy. The results are shown in Figure 3:
Figure 3. Recommendation accuracy with primary metrics
Analysis:
1) The proposed GCN-Attention-RL model achieves the best performance across all metrics. Compared to the strong baseline LightGCN, Precision@5 improves by 5.9%, Recall@5 by 5.6%, and NDCG@5 by 5.0% [2].
2) The improvement is attributed to:
(1) Multi-modal data fusion: Effectively integrates textbook, video, and code features for comprehensive resource representation.
(2) Temporal modeling: Combines LSTM and attention mechanisms to capture critical time points in user learning behavior, enhancing recommendation accuracy and personalization.
(3) Reinforcement learning-based feedback optimization: Dynamically improves recommendation performance through user interaction feedback.
The results of time series modeling using RMSE (Root Mean Square Error) are shown in Figure 4:
Figure 4. The results of time series modeling using RMSE
Analysis:
1) The GCN-Attention-RL model achieves the best RMSE score, reducing error by 20.3% compared to LSTM and by 13.4% compared to TCN [5].
2) The incorporation of attention mechanisms assigns differentiated weights to key behavior points, enabling more precise temporal modeling and avoiding redundant information.
We evaluate recommendation diversity and user learning outcomes, including course completion rate and average experimental scores. The results are presented in Figure 5:
Figure 5. Diversity and learning outcome analysis results
Analysis:
1) Recommendation Diversity: The diversity metric for GCN-Attention-RL reaches 0.698, a 10.1% improvement compared to LightGCN [4]. This improvement is due to:
(1) Diversity-oriented reinforcement learning reward functions: Encourage diverse recommendation outcomes.
(2) Multi-modal data integration: Incorporates textbook, video, and code features, enriching the structural variety of the recommended resources.
2) Learning Outcomes:
(1) Completion rate (74.9%) and average experimental score (87.1) in the experimental group significantly outperform the control group and baseline models.
(2) This indicates that the dynamic recommendation system not only enhances the usage of learning resources but also improves students' learning effectiveness.
To further quantify the practical impact of the GCN-Attention-RL framework on learning behavior, we compare the experimental group (using the proposed framework) with the control group (using traditional methods) on several key metrics. Results are summarized in Figure 6:
Figure 6. Experimental group vs. control group analysis
(1) Completion Rate and CTR: The experimental group achieves an 11.7% higher completion rate and a 10.2% higher CTR compared to the control group, demonstrating that personalized recommendations significantly enhance student engagement with learning resources[16].
(2) Experiment Submission Rate and Scores: The experimental group shows an 11.2% increase in experiment submission rates and an 8.5% improvement in experimental scores, indicating that the proposed framework effectively supports students in completing hands-on tasks and improves their overall learning performance.
(3) Conclusion: The significant improvement in the experimental group highlights the effectiveness of personalized learning recommendations in adapting to student needs and enhancing learning outcomes[20].
(1) Superior Overall Performance: The GCN-Attention-RL model exhibits outstanding performance across recommendation accuracy, temporal modeling, diversity, and user satisfaction metrics, demonstrating the strong potential of dynamic personalized learning recommendation systems.
(2) Applicability to Computer Networks Course: The framework effectively integrates multi-modal resources (textbooks, experiments, and code) and aligns with the diverse learning needs of the computer networks course.
(3) Impact on Teaching Practices: The comparison between the experimental and control groups validates the framework's application value in real-world teaching, significantly improving student engagement and academic performance.
To further validate the effectiveness of each module in the proposed model, a series of ablation experiments were conducted by progressively removing key components. The experiments focused on evaluating the contribution of the multi-modal data fusion module, temporal modeling module, and reinforcement learning feedback module.
The multi-modal data fusion module was removed, and the model was evaluated using individual modalities (e.g., textbook content, code features, and video features). The following four configurations were compared:
(1) Content Only: Using only textbook content features.
(2) Code Only: Using only experiment code features.
(3) Video Only: Using only teaching video features.
(4) Full Model: Utilizing all modalities, including textbook, code, and video features, through the fusion module.
Table 1 shows the performance in terms of recommendation accuracy and diversity across different configurations.
| Model Configuration | Precision@5 | Recall@5 | Diversity | 
|---|---|---|---|
| Content Only | 0.632 | 0.511 | 0.581 | 
| Code Only | 0.647 | 0.520 | 0.562 | 
| Video Only | 0.624 | 0.495 | 0.534 | 
| Full Model | 0.732 | 0.641 | 0.693 | 
(1) Models using a single modality performed poorly in both recommendation accuracy and diversity. The Code Only configuration slightly outperformed the others in Precision@5 and Recall@5, indicating the significance of code-related features in computer networking courses, while Content Only achieved the highest single-modality diversity.
(2) The Full Model, which combines all modalities, significantly improved both recommendation accuracy (Precision@5 increased by 13.1% relative to Code Only, the best single-modality configuration) and diversity (increased by 19.2%). This demonstrates that multi-modal data fusion effectively captures diverse and complementary information from different resources.
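As an illustration of how a fusion module can weight modalities, the sketch below applies a softmax gate to per-modality scores and takes the weighted sum of the modality vectors. This is a generic gated-fusion pattern, not the paper's implementation. In a trained model the gate logits would be produced by learned parameters, whereas here they are fixed placeholder numbers and the feature vectors are toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(features, gate_logits):
    """Weighted sum of same-dimension modality vectors; weights come from a softmax gate."""
    names = list(features)
    weights = softmax([gate_logits[n] for n in names])
    dim = len(features[names[0]])
    fused = [sum(w * features[n][i] for w, n in zip(weights, names))
             for i in range(dim)]
    return fused, dict(zip(names, weights))


# Toy two-dimensional embeddings for the three modalities named in the paper.
features = {"text": [1.0, 0.0], "code": [0.0, 1.0], "video": [1.0, 1.0]}
gate_logits = {"text": 0.0, "code": 0.0, "video": 0.0}  # placeholder gate outputs

fused, weights = gated_fusion(features, gate_logits)
print(fused, weights)
```

With equal logits the gate reduces to plain averaging. Unequal logits let one modality dominate the fused vector.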
The temporal modeling module was removed, and the model performance was compared under the following three configurations:
(1) No Temporal Module: Static features without any temporal modeling.
(2) LSTM Only: Using LSTM for temporal modeling without incorporating attention mechanisms.
(3) Full Temporal Module: Combining LSTM with attention mechanisms.
Table 2 shows the results for recommendation performance and temporal sequence prediction accuracy (measured by RMSE).
| Model Configuration | Precision@5 | Recall@5 | RMSE (Temporal Prediction) | 
|---|---|---|---|
| No Temporal Module | 0.681 | 0.582 | - | 
| LSTM Only | 0.704 | 0.603 | 2.11 | 
| Full Temporal Module | 0.732 | 0.641 | 1.72 | 
(1) Removing the temporal modeling module (No Temporal Module) significantly reduced recommendation performance, confirming that temporal user behavior is critical for effective recommendations.
(2) The LSTM Only configuration improved both Precision@5 and RMSE compared to the static model, showcasing the importance of sequential modeling.
(3) The Full Temporal Module, integrating LSTM with attention mechanisms, achieved the best results. The RMSE decreased from 2.11 to 1.72, highlighting the ability of attention mechanisms to capture key time points in user behavior.
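The role of attention on top of a recurrent encoder can be illustrated with dot-product attention pooling over per-step hidden states. The sketch below is not the paper's architecture: the hidden states are hard-coded stand-ins for LSTM outputs, and the query vector is a hypothetical learned parameter.

```python
import math


def attention_pool(hidden_states, query):
    """Dot-product attention over time steps; returns the context vector and weights."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # stable softmax over time steps
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states))
               for i in range(dim)]
    return context, weights


# Stand-in hidden states for three time steps; the second step is the "key" one.
hidden_states = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]  # hypothetical learned query vector

context, weights = attention_pool(hidden_states, query)
print(weights)
```

The step whose hidden state aligns best with the query receives most of the weight, which is the sense in which attention can emphasize key time points rather than treating all steps equally.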
The reinforcement learning feedback module was removed, and a static recommendation approach was used for comparison. The following two configurations were tested:
(1) No RL: Using fixed multi-modal fusion weights and a static recommendation ranking.
(2) Full RL Module: Employing reinforcement learning to dynamically adjust recommendation weights and ranking.
Table 3 summarizes the impact of the reinforcement learning module on recommendation diversity and user satisfaction.
| Model Configuration | Diversity | CTR (Click-Through Rate) | Learning Completion Rate | 
|---|---|---|---|
| No RL | 0.612 | 0.465 | 0.602 | 
| Full RL Module | 0.693 | 0.514 | 0.681 | 
(1) The removal of the reinforcement learning module (No RL) led to lower diversity and user satisfaction metrics. This indicates that a static approach fails to adapt to dynamic user behavior and diverse learning preferences.
(2) The Full RL Module significantly enhanced recommendation diversity (13.2% increase), click-through rate (10.5% increase), and learning completion rate (13.1% increase). These results confirm that reinforcement learning enables the system to dynamically adjust to user needs, resulting in more effective and engaging recommendations.
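The percentages quoted for the reinforcement learning ablation are relative gains over the No RL configuration. A short check, using only the values in Table 3, reproduces them (the helper name `relative_gain` is illustrative):

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# (No RL, Full RL Module) pairs copied from Table 3.
table3 = {
    "Diversity": (0.612, 0.693),
    "CTR": (0.465, 0.514),
    "Learning Completion Rate": (0.602, 0.681),
}
gains = {name: round(relative_gain(b, f), 1) for name, (b, f) in table3.items()}
print(gains)
```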
Through the ablation experiments, we validated the contributions of each module in the proposed model. The multi-modal data fusion module, temporal modeling module, and reinforcement learning feedback module collectively contribute to the significant improvements in recommendation accuracy, diversity, and user satisfaction. These findings further underscore the robustness and adaptability of the proposed model for personalized learning recommendations in computer networking courses.
| Module | Time Complexity | Optimization Suggestions | 
|---|---|---|
| Multi-modal Fusion Module | O(M·dm) | Reduce the feature dimension dm. | 
| Temporal Modeling (LSTM) | | Decrease the sequence length T. |
| Temporal Modeling (Attention) | O(T²·dt) | Apply sparse attention mechanisms. |
| Reinforcement Learning Module | O(T·df) | Enhance parallelization of training. | 
The Multi-modal Fusion Module has a linear complexity with respect to the number of modalities M and feature dimension dm, making it relatively efficient. The Temporal Modeling Module using LSTM and attention mechanisms exhibits quadratic complexity in the sequence length T, where sparse attention can help mitigate the computational overhead. The Reinforcement Learning Module adds linear complexity for each time step T, making it suitable for real-time recommendation when implemented efficiently.
Table 5 provides the space complexity of each module in the proposed framework.
| Module | Space Complexity | Optimization Suggestions | 
|---|---|---|
| User and Resource Embeddings | O(Nu·dg + Nm·dg) | Reduce embedding dimensions dg. | 
| Knowledge Graph Storage | O(\|Ekn\|) | Use sparse representations. | 
The User and Resource Embedding Module has a space complexity linear in the number of users Nu and resources Nm, scaled by the embedding dimension dg. Reducing dg without compromising feature representation can alleviate memory consumption. Additionally, the Knowledge Graph Storage Module, which stores dependencies between learning modules, can be optimized using sparse representations to manage large-scale storage efficiently.
This detailed complexity analysis ensures the proposed model remains scalable for real-world applications, particularly in the context of personalized learning recommendation systems.
This study utilizes learning data from the "Computer Networks" course to develop and validate a dynamic personalized learning recommendation system that integrates Graph Convolutional Networks (GCN), attention mechanisms, and reinforcement learning. The following discussion delves deeper into the research findings and their significance.
1) Effectiveness of Multimodal Feature Fusion: By dynamically adjusting the weights of text, code, and video modalities using a gating mechanism, the model significantly improves recommendation accuracy (e.g., Precision@5 increased by 13.1%) and diversity (increased by 19.2%)[13][19]. This validates the effectiveness of multimodal data fusion in complex course resource recommendations and highlights its innovation in utilizing heterogeneous data comprehensively.
2) Critical Role of Temporal Modeling: The temporal modeling module, combining LSTM and attention mechanisms, not only accurately captures the temporal dependencies in user learning behaviors but also identifies key time points, ensuring that recommendations align with students’ learning progress[5],[22].
3) Contribution of Reinforcement Learning: By dynamically adjusting recommendation strategies, the reinforcement learning module effectively enhances user satisfaction (e.g., CTR increased by 5.8%) and learning completion rate (increased by 6.8%)[8],[16]. This design offers a novel approach for dynamically optimizing educational recommendation systems.
1) Data Scale and Diversity: The scale and scope of the dataset used in this study are limited. Future work could involve incorporating larger datasets and exploring cross-disciplinary recommendation scenarios to improve the model’s generalizability [17].
2) Computational Complexity: The inclusion of multiple modules increases the computational cost of training and inference. Future research could adopt efficient model compression and acceleration techniques to enhance performance[11],[2].
3) Cold Start Problem: Due to the reliance on user behavior data, the model may perform poorly for new or inactive users. Future studies could address this issue through transfer learning or meta-learning approaches[14],[18].
1) Support for Personalized Teaching: The proposed method provides dynamic and personalized learning path recommendations, helping students optimize their learning process and improve efficiency [6],[20].
2) Assistance for Instructional Decision-Making: The model identifies students’ weaknesses in learning and offers intelligent suggestions for resource allocation, demonstrating significant practical implications [13],[15].
This study proposes a dynamic personalized learning recommendation system that integrates Graph Convolutional Networks, attention mechanisms, and reinforcement learning, and validates its effectiveness through experiments on "Computer Networks" course data. The main contributions are as follows:
1) A multimodal feature fusion framework is proposed to effectively address the challenge of integrating heterogeneous data [10].
2) The temporal modeling module enhances the model’s ability to capture temporal dependencies in learning behaviors [22],[5].
3) The reinforcement learning-based recommendation strategy dynamically optimizes recommendations, significantly improving accuracy, diversity, and user satisfaction [8],[16].
1) Expansion to Multiple Courses and Disciplines: Extending the model to other core courses (e.g., Data Structures, Operating Systems) or interdisciplinary courses (e.g., Artificial Intelligence) to verify its applicability and generalizability [17],[18].
2) Optimization of Temporal Modeling and Real-Time Performance: Introducing sparse attention mechanisms or transformer-based models to optimize the temporal modeling module[5].
3) Addressing Cold Start Problems: Leveraging transfer learning or meta-learning techniques to reduce reliance on user behavior data and improve recommendations for new or inactive users[14],[18].
The proposed method not only advances the development of personalized educational recommendation systems but also provides technical insights for multimodal recommendation models based on GCNs and attention mechanisms. With the expansion of educational data and continuous algorithmic improvements, this approach is expected to further enhance educational equity and help students achieve efficient, personalized learning experiences [7].
