Nonlinear Adaptive Optimization of Multi-Modal Learning Paths Using Graph Convolutional Networks and Reinforcement Learning for Intelligent Educational Systems

With the rapid development of artificial intelligence technologies, personalized learning recommendation systems have become a key research focus in the field of smart education. These systems aim to enhance students' learning efficiency by providing tailored learning pathways that dynamically adapt to their evolving learning behaviors and needs. However, the inherent complexity, heterogeneity, and dynamism of educational data pose significant challenges for traditional recommendation methods. These methods struggle to address issues such as the integration of multi-modal learning resources, modeling the temporal dynamics of learning behaviors, and dynamically optimizing learning pathways[1].

Recent advances in deep learning-based recommendation systems have shown promising results in modeling complex data relationships and non-linear patterns. For example, neural collaborative filtering (NCF) models utilize neural networks to capture intricate feature interactions between users and items, leading to improved recommendation accuracy [2]. Furthermore, the introduction of graph neural networks (GNN), particularly lightweight graph convolution networks (LightGCN), has provided new methods for modeling user-resource interaction relationships[3]. However, these models face limitations in educational contexts, including insufficient support for multi-modal resources and inadequate consideration of learners' dynamic behaviors.

To address knowledge dependencies and temporal dynamics in education, researchers have introduced knowledge graphs and temporal modeling techniques. Knowledge graph attention networks (KGAT) enhance the representation of complex knowledge relationships by integrating knowledge graphs with attention mechanisms[4]. Additionally, temporal modeling methods such as long short-term memory (LSTM) networks and Transformers have demonstrated strong applicability in capturing the sequential nature of learning behaviors and identifying critical learning stages [5][4]. Meanwhile, multi-modal fusion techniques have gained traction, particularly in educational recommendations, where dynamic weighting mechanisms are used to jointly model heterogeneous data such as textbooks, instructional videos, and coding exercises [14].

Reinforcement learning (RL) techniques have introduced new possibilities for dynamically optimizing recommendation strategies. In educational recommendation systems, RL not only adjusts recommendations in real-time to meet students' personalized needs but also optimizes long-term learning objectives through reward signal designs. For instance, deep reinforcement learning (DRL)- based recommendation methods, which incorporate learning pathways, resource coverage, and user behavior into multi-objective rewards, have significantly improved recommendation diversity and user satisfaction [18].

This paper addresses the aforementioned challenges by proposing a dynamic personalized learning recommendation system based on graph convolutional networks (GCN) and attention mechanisms. Using data from a "Computer Networks" course as a case study, the system integrates multi-modal learning resources (e.g., textbooks, assignments, and instructional videos) with temporal modeling of user learning behaviors and reinforcement learning optimization to overcome challenges in heterogeneity, diversity, and dynamism. Specifically, this study contributes:

1) An end-to-end recommendation model combining GCN and attention mechanisms: The GCN models the interaction graph structure between users and resources, capturing dependency relationships between knowledge points, while the attention mechanism dynamically identifies critical learning nodes.

2) A dynamic multi-modal fusion mechanism: By incorporating gating mechanisms and reinforcement learning strategies, the system dynamically adjusts the weights of different modalities, such as textbooks, videos, and experiments, to accommodate learning behaviors at various stages.

3) A temporal modeling framework for learning behaviors: Combining LSTM and Transformers, the system effectively captures the temporal dependencies and phased characteristics of user learning behaviors, significantly enhancing the precision of personalized recommendations.

4) Optimized recommendation strategies: By designing reward signals in reinforcement learning, the system dynamically adjusts recommendation objectives, improving overall system performance in terms of recommendation accuracy, diversity, and user satisfaction.

2

Related Work

2.1

Learning Resource Recommendation Systems

Recommendation systems have emerged as a critical research area in artificial intelligence, finding applications across various domains. With the increasing demand for personalized learning and the diversification of educational resources, recommendation systems have evolved from traditional collaborative filtering techniques to deep learning-driven models that integrate knowledge graphs, multimodal data, and temporal modeling techniques. These advancements address the complex data characteristics and dynamic learning needs inherent in educational scenarios. However, compared to domains like e-commerce and social media, educational recommendation systems face unique challenges, such as modeling dependencies between knowledge points, capturing the temporal dynamics of student behaviors, and integrating multimodal, heterogeneous learning resources.

2.1.1

Evolution of Traditional Collaborative Filtering

Collaborative filtering (CF) is a foundational technique in recommendation systems, leveraging user-item interaction matrices to predict user preferences. Matrix factorization techniques, such as singular value decomposition (SVD) and non-negative matrix factorization (NMF), have demonstrated strong performance in static recommendation tasks[8].However, CF methods encounter significant limitations in educational recommendations:

1) Cold-Start Problem: CF struggles to generate accurate recommendations for new users or newly introduced resources due to sparse interaction data[9].

2) Lack of Dynamic Modeling: CF fails to capture the time-dependent nature of user behaviors and learning processes, limiting its applicability in personalized education scenarios.

Recent research has addressed these limitations by incorporating contextual features (e.g., learning background and goals) or integrating knowledge graph-based methods. However, these enhancements still fall short in dynamic modeling and multimodal resource processing.

2.1.2

Deep Learning-Driven Recommendation Techniques

Deep learning has significantly advanced recommendation systems by enabling non-linear feature modeling. Representative models include:

1) Neural Collaborative Filtering (NCF): Replacing traditional linear inner-product operations with multi-layer perceptrons (MLPs), NCF captures complex user-item interactions[2].

2) DeepFM: Combining factorization machines with deep neural networks, DeepFM enhances the extraction of high-dimensional, sparse features[11].

3) Dynamic Interest Networks (DIN/DIEN): These models leverage attention mechanisms to capture the evolution of user interests, improving recommendation accuracy and real-time responsiveness.

Despite their success in general applications, these models face challenges in education, such as processing heterogeneous learning resources (e.g., textbooks, instructional videos, and experimental code) and capturing the dynamic temporal aspects of user learning behaviors.

2.1.3

Knowledge Graphs in Educational Recommendations

Knowledge graphs (KG) have seen increasing adoption in educational recommendations, enabling semantic representation of course chapters, knowledge points, and their dependencies. Recent advancements include:

1) Knowledge Graph Embeddings: Techniques like TransE and DistMult map entities and relationships into low-dimensional vector spaces, enhancing semantic representation[5].

2) Graph Neural Networks (GNNs): The integration of GNNs with KGs has further improved the representation of complex knowledge structures 1)[21].

3) Knowledge Graph Attention Networks (KGAT): By incorporating attention mechanisms and graph convolution, KGAT effectively models complex knowledge relationships, supporting dynamic learning path generation [1].

While these methods have shown promise, they often lack robust support for temporal dynamics and multimodal data integration.

2.2

Multimodal Learning and Temporal Modeling

2.2.1

Multimodal Data Representation in Education

Modern educational resources are inherently multimodal, encompassing textual content, videos, and programming exercises. Recent efforts to model multimodal data include:

1) Multi-Gate Mixture-of-Experts (MMoE): This framework dynamically assigns weights to different modalities, reflecting their varying importance in learning tasks[7].

2) Pre-trained Models: BERT and CodeBERT have achieved significant advancements in textual and programming language modeling, respectively[21],[10].

Despite these innovations, challenges remain in dynamically adjusting modality weights to reflect changes in learning stages and effectively integrating the semantics of heterogeneous data sources.

2.2.2

Temporal Modeling of User Behavior

Temporal modeling plays a crucial role in capturing the dynamic nature of user learning behaviors. Key techniques include:

1) Long Short-Term Memory (LSTM): LSTM networks excel in capturing long-term dependencies in sequential data[22].

2) Temporal Convolutional Networks (TCN): TCNs improve efficiency and parallelization in temporal modeling while maintaining high performance [23].

3) Transformers: Attention mechanisms in Transformers highlight critical events in sequences, making them highly suitable for optimizing dynamic learning paths[5].

2.2.3

Attention Mechanisms for Multimodal and Temporal Fusion

Attention mechanisms have emerged as a key solution for addressing challenges in multimodal fusion and temporal modeling. By focusing on critical features and time points, attention mechanisms significantly enhance the performance of recommendation systems [23].

2.3

Reinforcement Learning in Recommendations

2.3.1

Reinforcement Learning for Dynamic Recommendation

Reinforcement learning (RL) optimizes recommendation strategies based on real-time user interactions. Deep reinforcement learning (DRL) extends RL capabilities with deep neural networks, achieving notable success in domains such as e-commerce and video streaming [14].

In education, RL enables dynamic adjustment of learning paths to accommodate evolving student needs, offering a promising direction for personalized recommendations [15].

2.3.2

Reward Signal Optimization in Educational Recommendations

Reward signal design is critical in RL, particularly in education, where multi-objective optimization (e.g., learning completion, knowledge coverage, and resource diversity) is essential. Multi-objective reward functions provide theoretical foundations for dynamic recommendation strategies [16].

2.4

Innovation and Contributions of This Study

2.4.1

Advances in Dynamic Multimodal Fusion

This study introduces a gating mechanism combined with reinforcement learning to achieve dynamic weighting of multimodal features, such as textbooks, instructional videos, and experiments, effectively addressing varying learning stage requirements.

2.4.2

Time and Knowledge-Driven Personalized Recommendations

A novel framework integrating graph convolutional networks (GCN), temporal modeling, and reinforcement learning is proposed. This framework captures interdependencies among knowledge points, dynamically adjusts recommendation paths, and represents a paradigm shift in personalized educational recommendations.

In summary, recent studies on educational recommendation systems have made significant progress, yet notable research gaps remain in the generation of dynamic learning paths, the integration of multimodal data, and the modeling of temporal and knowledge associations. This paper comprehensively reviews recent advancements, highlighting the strengths and limitations of traditional collaborative filtering methods, deep learning-based recommendation techniques, and knowledge graph-driven models in educational contexts. Additionally, it explores the latest developments in multimodal learning and temporal modeling, as well as the potential of reinforcement learning for personalized recommendations [7],[13]. Based on this foundation, the paper proposes an innovative framework for personalized learning recommendation systems that integrates graph convolutional networks (GCN)[1],[2],[20], attention mechanisms, dynamic weighting of multimodal features, and reinforcement learning, offering a novel technical pathway to address these challenges.

3

Methodology

3.1

Overall Architecture Design

3.1.1

System Design Objectives

The Computer Networks course is a critical component of computer science education, characterized by unique learning needs and recommendation challenges. These include: diverse learning objectives, encompassing both fundamental theories (e.g., network protocols, layered architectures) and practical exercises (e.g., network simulations, socket programming); personalized learning paths, requiring the system to dynamically adapt to students' varying learning levels, interests, and mastery of knowledge; and temporal dependency, indicating that students' learning needs are often sequential, such as studying foundational knowledge (e.g., TCP/IP protocols) before advancing to hands-on practices[16].

This study proposes a dynamic personalized learning recommendation system tailored to the Computer Networks course. The system is designed to recommend multimodal learning resources, such as textbooks, instructional videos, lab projects, and code repositories[13],[17]. By modeling user learning behaviors and employing temporal sequence analysis, the system dynamically adjusts recommendation strategies, enabling students to construct efficient learning paths. To achieve this, the architecture leverages a combination of Graph Convolutional Networks (LightGCN)[2], attention mechanisms [5], and reinforcement learning[8][23] to optimize recommendations, while incorporating domain-specific input features to capture the unique aspects of the Computer Networks course[17].

3.1.2

System Structure Description

The proposed dynamic personalized learning recommendation system is structured as illustrated in Figure 1, which outlines the hierarchical workflow of the system, from input processing to recommendation generation and feedback optimization. This framework includes five key modules: the Input Layer, the Multimodal Data Fusion Layer[7], the Temporal Modeling Module, the Recommendation Engine Module, and the Feedback Module[18].

Figure 1 provides a detailed view of how the various components of the system are interconnected, including data flow and the roles of specific technologies. For instance, the Input Layer processes raw user data and extracts features via a shared embedding network, feeding them into subsequent layers. The Multimodal Data Fusion Layer[7], as depicted, employs a gated mechanism enhanced by reinforcement learning to integrate multi-modal information dynamically. This layered design ensures the system's ability to adaptively recommend learning materials tailored to individual students' needs, particularly within the context of a Computer Networks course[18].

The proposed system consists of five key components:

1) Input Layer: Collects user learning behavior data (e.g., resource clicks, dwell time), course content features (e.g., textbook sections, code snippets), and temporal features (e.g., learning stages) while performing feature engineering and preprocessing.

2) Multimodal Data Fusion Layer: Uses a gating mechanism to dynamically integrate features from different modalities, with reinforcement learning optimizing the fusion strategy.

3) Temporal Modeling Module: Employs LSTM to capture the temporal dependencies of user behavior while integrating attention mechanisms to focus on key learning activities.

4) Recommendation Engine Module: Utilizes LightGCN [2] for feature propagation over the user-item interaction graph and combines attention mechanisms to generate personalized recommendations.

5) Feedback Module: Implements reinforcement learning with reward signals (e.g., recommendation efficiency and diversity) to iteratively refine recommendation quality.

3.1.3

Summary of Key Innovations

1) Multimodal Feature Fusion: A gating mechanism is designed to dynamically integrate multimodal features, effectively combining semantic features from text, videos, and code relevant to the Computer Networks course.

1) Temporal Sequence Modeling: By combining LSTM and attention mechanisms, the system effectively models students' stage-specific learning needs.

2) Reinforcement Learning Optimization: Reward signals based on diversity and efficiency improve the quality of recommendations[8],[23].

3) Efficient Feature Propagation: LightGCN [2] is employed for efficient modeling of user-item interactions, significantly reducing computational complexity while maintaining high performance.

3.2

Input Layer: User Data Collection and Processing

The use of pretrained models like BERT and CodeBERT for extracting textual and code semantic features has been validated in multiple studies [9],[10].

3.3

Multi-Modal Data Fusion Layer

The Multimodal Data Fusion Layer, illustrated in pink in Figure 1, is responsible for integrating the processed input features derived from the Input Layer. By utilizing a gated mechanism and reinforcement learning[8], this layer dynamically weighs the importance of different data modalities—textbooks, videos, and code examples—based on their contextual relevance. This fusion ensures that the output is a rich, comprehensive representation that retains essential features from all modalities and prepares it for time-series analysis in the Temporal Modeling Module. The design and optimization strategies for this layer address the specific challenges posed by the diverse resource types in the Computer Networks course.

3.3.1

Challenges in Multi-Modal Data Fusion

The resources for the computer networks course are highly diverse, encompassing textbooks (text), videos (dynamic visualizations), and code examples (hands-on practice). These modalities significantly influence students’ learning behaviors and progress. However, the heterogeneous nature of these modalities presents challenges in directly fusing them, as this could result in information loss or imbalanced contributions from different modalities. To address these challenges, we design a dynamic multi-modal fusion strategy tailored for computer networks course resources. This strategy leverages a gating mechanism and reinforcement learning to dynamically adjust the weights of modalities, ensuring precise modeling of students’ learning needs.

3.3.2

Gating Mechanism Design and Implementation

To dynamically adjust the importance of different modalities, the gating mechanism incorporates user behavior features A_m, such as click rate, video watch duration, and interaction frequency. The fusion formula is defined as: (6) $F_{fused} = \sum_{m = 1}^{M} α_{m} F_{m}$ \[{{\text{F}}_{\text{fused }\!\!~\!\!\text{ }}}=\sum\nolimits_{\text{m}=1}^{\text{M}}{{{\alpha }_{\text{m}}}{{\text{F}}_{\text{m}}}}\]

where the modality weight α_m is computed as: (7) $α_{m} = \frac{\exp (g (F_{m}, A_{m}))}{\sum_{j = 1}^{M} \exp (g (F_{j}, A_{j}))}$ \[{{\alpha }_{\text{m}}}=\frac{\exp \left( \text{g}\left( {{\text{F}}_{\text{m}}},{{\text{A}}_{\text{m}}} \right) \right)}{\sum\nolimits_{\text{j}=1}^{\text{M}}{\exp \left( \text{g}\left( {{\text{F}}_{\text{j}}},{{\text{A}}_{\text{j}}} \right) \right)}}\]

Here, g(·) is a nonlinear function (e.g., an MLP) that outputs an importance score based on the modality feature F_m and behavior feature A_m.

Weight Design for Computer Networks Course:For chapter-based content (e.g., link layer, network layer, application layer), the weights of different modalities are dynamically adjusted. For example:

Textbooks are assigned higher weights during topics like the "Link Layer" to emphasize theoretical understanding.

Code Examples are prioritized for topics like the "Application Layer," where practical coding exercises are more relevant.

The reinforcement learning optimization strategy, as described in Figure 1, enhances this gating mechanism by dynamically learning optimal weights during the recommendation process. The reward function used for reinforcement learning is defined as Formula (8): (8) $R = λ_{1} \cdot Learning Progress + λ_{2} \cdot Engagement Level$ \[\text{R}={{\lambda }_{1}}\cdot ~\text{Learning Progress}~+{{\lambda }_{2}}\cdot ~\text{Engagement Level }\!\!~\!\!\text{ }\]

where Learning Progress measures the completion of learning tasks, and Engagement Level evaluates student interactions with the recommended resources.

3.3.3

Reinforcement Learning Optimization Strategy

The multi-modal fusion process is modeled as a Markov Decision Process (MDP), where the state s_t represents the modality features in the current learning phase, and the action a_t adjusts the modality weights α_m. The objective of reinforcement learning is to maximize the cumulative reward RR: (9) $ℒ_{RL} = E [\sum_{t = 1}^{T} γ^{t} R_{t}]$ \[{{\mathcal{L}}_{\text{RL}}}=\mathbb{E}\left[ \sum\nolimits_{\text{t}=1}^{\text{T}}{{{\gamma }^{\text{t}}}{{\text{R}}_{\text{t}}}} \right]\]

where γ is the discount factor. A Deep Q-Network (DQN) is utilized as the policy network, which takes the state s_t as input and outputs the optimal action a_t.The iterative optimization ensures that the system adapts to students’ evolving learning behaviors over time.

As highlighted in Figure 1, the reinforcement learning strategy works in tandem with the gating mechanism to dynamically adjust the fusion strategy during training. This approach not only maximizes learning efficiency but also ensures resource diversity in the recommendations.

3.3.4

User Behavior and Course Content Feature Collection

The Input Layer (highlighted in blue in Figure 1 is responsible for collecting and preprocessing user data. For the Computer Networks course, the input data consists of the following components:

1) Behavioral Data: Captures students' interactions with learning resources, including:

(1) Click behaviors: Frequency and duration of clicks on textbooks, videos, and labs.

(2) Interaction behaviors: Time and quality of completing lab projects or submitting code.

(3) Dwell behaviors: Time spent on specific learning resources.

2) Course Content Features:

(1) Textual Modality: Extracts semantic features from textbook sections using: (1) $f_{content} = BERT (Content)$ \[{{\text{f}}_{\text{content }\!\!~\!\!\text{ }}}=~\text{BERT}(\text{Content})~\]

where Content represents the textual content of the textbook, and f_content ϵ ℝ^d is the semantic feature vector.

(2) Code Modality: Extracts semantic representations from code snippets using: (2) $f_{code} = CodeBERT (Code)$ \[{{\text{f}}_{\text{code }\!\!~\!\!\text{ }}}=~\text{CodeBERT}(\text{Code})~\]

where Code is the programming code, and f_code ϵ ℝ^d is the feature vector.

3) Temporal Features: Captures the sequential dependency of learning modules. Temporal features are encoded using: (3) $e_{t} = ModuleEncoding (Module)$ \[{{\mathbf{e}}_{\text{t}}}=~\text{ModuleEncoding}(\text{Module})~\]

where Module represents the module ID, and e_t is the temporal embedding vector.

3.3.5

Data Preprocessing and Feature Engineering

To adapt multimodal input data, the following preprocessing steps are performed:

1)

Behavioral Data Cleaning and Normalization:

Removes outliers and standardizes features: (4) $x^{'} = \frac{x - μ}{σ}$ \[\mathbf{{x}'}=\frac{\mathbf{x}-\mu }{\sigma }\]

where μ and σ\sigma denote the mean and standard deviation of the features, respectively.

2)

Feature Extraction and Mapping:

Textual and code features are extracted using pretrained models (e.g., BERT, CodeBERT) and mapped to a shared feature space: (5) $z = W [f_{content}, f_{code}] + b$ \[z=W\left[ {{\text{f}}_{\text{content }\!\!~\!\!\text{ }}},{{\text{f}}_{\text{code }\!\!~\!\!\text{ }}} \right]+b\]

where W and b are learnable parameters.

3.3.6

Shared Embedding Network Design

The shared embedding network maps multi-modal features (text, code, and temporal) into a unified feature space, generating the final input vector: (6) $h = σ (W_{1} z b_{1})$ \[h=\sigma \left( {{\text{W}}_{1}}z{{b}_{1}} \right)\]

where h represents the user’s multi-modal feature representation, and σ is an activation function (e.g., ReLU).

3.4

Temporal Modeling Module

3.4.1

Dependency Modeling of Knowledge Points

In the Computer Networks course, knowledge points exhibit a strong sequential and dependency relationship. For instance, learning the "Network Layer" relies on foundational knowledge from the "Link Layer." To model this dependency, a dependency matrix D is defined as follows: (10) $D_{ij} = {\begin{matrix} 1, if chapterj is a prerequisite for chapter i \\ \begin{array}{l} \begin{matrix} \begin{matrix} \begin{matrix} \begin{matrix} 0, \end{matrix} \end{matrix} & otherwise . \end{matrix} \end{matrix} \end{array} \end{matrix}$ \[{{\text{D}}_{\text{ij}}}=\left\{ \begin{matrix} 1,~\text{if chapterj is a prerequisite for chapter i }\!\!~\!\!\text{ } \\ \begin{array}{*{35}{l}} \begin{matrix} \begin{matrix} \begin{matrix} \begin{matrix} 0, & {} \\ \end{matrix} & {} & {} \\ \end{matrix} & {} & \text{otherwise}. \\ \end{matrix} & {} & {} \\ \end{matrix} & {} & {} \\ \end{array} \\ \end{matrix} \right.\]

In the LSTM, the hidden state update formula is modified to incorporate the dependency relationship: (11) $h_{t} = σ (W_{h} x_{t} + U_{h} h_{t - 1} + D_{t} \cdot h_{t - 1} + b_{h})$ \[{{\text{h}}_{\text{t}}}=\sigma \left( {{\text{W}}_{\text{h}}}{{\text{x}}_{\text{t}}}+{{\text{U}}_{\text{h}}}{{\text{h}}_{\text{t}-1}}+{{\text{D}}_{\text{t}}}\cdot {{\text{h}}_{\text{t}-1}}+{{\text{b}}_{\text{h}}} \right)\]

where D_t·h_t–1 explicitly represents the influence of the current knowledge point's prerequisite dependencies on the hidden state.

As shown in Figure 1, this dependency modeling is embedded in the Temporal Modeling Layer, enabling the system to accurately capture the hierarchical structure of knowledge points in the course. The dependency matrix enhances the temporal dynamics modeled by LSTM by introducing explicit representations of prerequisite relationships.

3.4.2

LSTM-Based Temporal Sequence Modeling

The temporal sequence data X = [x₁,x₂, … , x_T] represents the student's learning behavior sequence, where each x_t is the feature vector of learning activities at time step t LSTM is employed to capture the temporal dependencies in this sequence. The core equations of LSTM are as follows: (12) $\begin{matrix} i_{t} & = σ (W_{i} x_{t} + U_{i} h_{t - 1} + b_{i}), \\ f_{t} & = σ (W_{f} x_{t} + U_{f} h_{t - 1} + b_{f}), \\ o_{t} & = σ (W_{o} x_{t} + U_{o} h_{t - 1} + b_{o}), \\ c_{t} & = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ \tanh (W_{c} x_{t} + U_{c} h_{t - 1} + b_{c}), \\ h_{t} & = o_{t} ⊙ \tanh (c_{t}), \end{matrix}$

where σ(·)is the sigmoid activation function and ⊙denotes element-wise multiplication.

As depicted in Figure 1, the Temporal Modeling Layer applies LSTM to the processed input sequence from the Multimodal Data Fusion Layer, effectively learning long-term and short-term dependencies in students' learning behaviors. This facilitates capturing the sequential nature of knowledge acquisition in the course.

3.4.3

Attention Mechanism Optimization

To enhance the modeling of the importance of course knowledge points, an attention mechanism is introduced to compute the significance weight of each time step: (13) $β_{t} = \frac{\exp (q^{T} \tanh (W_{a} h_{t} + b_{a}))}{\sum_{j = 1}^{T} \exp (q^{T} \tanh (W_{a} h_{j} + b_{a}))}$ \[{{\beta }_{\text{t}}}=\frac{\exp \left( {{\text{q}}^{\text{T}}}\tanh \left( {{\text{W}}_{\text{a}}}{{\text{h}}_{\text{t}}}+{{\text{b}}_{\text{a}}} \right) \right)}{\sum\nolimits_{\text{j}=1}^{\text{T}}{\exp \left( {{\text{q}}^{\text{T}}}\tanh \left( {{\text{W}}_{\text{a}}}{{\text{h}}_{\text{j}}}+{{\text{b}}_{\text{a}}} \right) \right)}}\]

The final weighted representation of the temporal sequence is then computed as: (14) $H_{attn} = \sum_{t = 1}^{T} β_{t} h_{t} .$ \[{{\text{H}}_{\text{attn}}}=\sum\nolimits_{\text{t}=1}^{\text{T}}{{{\beta }_{\text{t}}}{{\text{h}}_{\text{t}}}.}\]

This attention mechanism, as visualized in Figure 1, enhances the LSTM's output by focusing on critical knowledge points in the sequence. The weights βt\beta_t can be further adjusted according to chapter importance, enabling the system to prioritize more impactful sections of the course material.

3.4.4

Reinforcement Learning-Assisted Temporal Optimization

Beyond the attention mechanism, reinforcement learning is utilized to optimize the temporal modeling of students' learning behaviors. A reward function is designed to encourage students to follow the dependency chain of knowledge points sequentially: (15) $R_{t} = λ_{1} \cdot Correctness_{t} + λ_{2} \cdot {Completion}_{t}$ \[{{\text{R}}_{\text{t}}}={{\lambda }_{1}}\cdot ~\text{Correctness}{{~}_{\text{t}}}+{{\lambda }_{2}}\cdot ~\text{Completion}{{\text{ }\!\!~\!\!\text{ }}_{\text{t}}}\]

where:

Correctness_t evaluates the accuracy of the learning content at the current time step,

Completion_t measures the proportion of learning objectives achieved at the current stage.

As shown in Figure 1, this reinforcement learning framework is integrated into the Feedback Module and linked back to the Temporal Modeling Layer. The dynamic reward strategy ensures that the model adapts to individual students' progress and learning goals, enhancing both personalization and temporal alignment in recommendations.

The code snippet shown in the Figure 2 represents a core part of the dynamic personalized learning recommendation system described in the paper. The snippet outlines the process for multi-modal data fusion and temporal modeling for user representation generation, which are crucial for the proposed model. Here's a brief breakdown:

Step 1: Multi-modal Data Fusion:

This step combines multiple data sources (e.g., text, images, or other features) using a weighted fusion technique. Each modality's contribution to the final representation is computed based on the softmax function, which normalizes the weights across different modalities.

Step 2: Temporal Modeling with LSTM

In this step, the Long Short-Term Memory (LSTM) model is used to process the time sequence data. LSTM is commonly used for sequential data and helps capture temporal dependencies in user behavior (e.g., learning patterns over time).

Step 3: Attention Mechanism for Temporal Features

This step applies an attention mechanism to the temporal features, which allows the model to focus more on important time steps in the sequence. The attention mechanism uses learned weights β_t to assign different attention scores to each time step.

Step 4: Final User Representation

Finally, the outputs of the multi-modal fusion and temporal modeling (i.e., the fused feature vectors and attention-modulated time steps) are concatenated to form the final user representation, which is then used for personalized recommendations.

This code showcases an efficient method to fuse multi-modal data and account for temporal dependencies in user behavior, key for the personalized learning recommendations the paper addresses.

3.5

Recommendation Engine Module

3.5.1

Construction of User-Learning Resource Interaction Graph

For the recommendation task in the Computer Networks course, we constructed not only a user-resource interaction graph G = (V, E), representing the relationships between users and learning resources (e.g., textbook chapters, experimental videos, code projects), but also an extended knowledge-point graph G_kn = (V_kn, E_kn). The components of the graphs are defined as follows:

V_kn: A set of knowledge points (e.g., key topics such as the Link Layer, Network Layer, and Transport Layer).

E_kn: A set of prerequisite relationships between knowledge points (e.g., the Network Layer depends on understanding the Link Layer).

The embedding updates fZor nodes in the interaction graph incorporate multimodal relationships and are defined as: (16) $h_{v}^{(k)} = \underset{u \in N (v)}{\sum^{}} \frac{1}{\sqrt{l}} h_{u}^{(k - 1)} + λ \underset{j \in N_{kn} (v)}{\sum^{}} h_{j}^{(k - 1)}$ \[\text{h}_{\text{v}}^{(\text{k})}=\underset{\text{u}\in \text{N}(\text{v})}{\mathop{\mathop{\sum }^{}}}\,\frac{1}{\sqrt{\text{l}}}\text{h}_{\text{u}}^{(\text{k}-1)}+\lambda \underset{\text{j}\in {{\text{N}}_{\text{kn}}}(\text{v})}{\mathop{\mathop{\sum }^{}}}\,\text{h}_{\text{j}}^{(\text{k}-1)}\]

where:

$h_{v}^{(k)}$ \[\text{h}_{\text{v}}^{(\text{k})}\]: The embedding representation of user or learning resource node v at the h_th layer.

N(v: The set of neighbors of node v in the interaction graph.

N_kn(v): The set of neighbors of resource node v in the knowledge-point graph.

λ: A weighting parameter that controls the influence of knowledge dependencies on embedding updates.

As illustrated in Figure 1, the Graph Construction step within the Recommendation Engine Module integrates both interaction and knowledge-point graphs. This dual-graph modeling approach combines users’ learning behaviors with the knowledge structure of the course, thereby enriching the embeddings used in subsequent recommendation generation.

3.5.2

Progress-Aware Recommendation Generation

In the recommendation generation process, the user's progress in learning specific course resources is incorporated into the scoring computation. The recommendation score s_u,i for user u and resource i is calculated as: (17) $s_{u, i} = Attention (h_{u}, h_{i}) + γ \cdot Progress (u, i)$ \[{{s}_{u,i}}=\text{Attention}\left( {{h}_{u}},{{h}_{i}} \right)+\gamma \cdot \text{Progress}(u,i)\]

where:

Attention(h_u, h_i): The compatibility score between the user and the learning resource, calculated as: (18) $Attention (h_{u}, h_{i}) = q^{T} \tanh (W_{u} h_{u} + W_{i} h_{i} + b)$ \[\text{Attention}\left( {{\text{h}}_{\text{u}}},{{\text{h}}_{\text{i}}} \right)={{\text{q}}^{\text{T}}}\tanh \left( {{\text{W}}_{\text{u}}}{{\text{h}}_{\text{u}}}+{{\text{W}}_{\text{i}}}{{\text{h}}_{\text{i}}}+\text{b} \right)\]

with W_u, W_i, and q^T as learnable parameters.

Progress(u,i): The completion level of user u for resource i, reflecting the user’s learning progress.

γ: A weighting parameter for the progress score.

As depicted in Figure 1, the Recommendation Engine Module combines embeddings from the Feature Propagation step (via LightGCN) with user-progress information to produce personalized recommendations. By incorporating user progress, the system ensures that recommended resources align with the user's current learning stage, providing a dynamic and context-aware recommendation experience.

3.5.3

Adaptive Re-Ranking Based on Knowledge Hierarchies

In the Computer Networks course, the hierarchical structure of knowledge (e.g., progressing from the Link Layer to the Application Layer) plays a crucial role in ensuring effective learning. To respect this hierarchical progression, the recommendation system prioritizes incomplete but essential chapters or experimental projects. The re-ranking mechanism dynamically adjusts recommendation results by combining the user's personalized compatibility score with the prerequisite importance of the learning resources. The re-ranking score for resource iii is defined as: (19) $Rank (i) = s_{u, i} \cdot PrerequisiteScore (i)$ \[\text{Rank}(\text{i})={{s}_{u,i}}\cdot \text{PrerequisiteScore}(i)\]

where:

s_u,i : The compatibility score between user uand resource i, computed in the recommendation process (see 3.5.2 Progress-Aware Recommendation Generation for details).

PrerequisiteScore(i): A dynamically adjusted score that reflects the importance of resource iii in the context of the knowledge hierarchy. This score leverages the prerequisite relationships in the knowledge-point graph G_kn , as shown in Figure 1, to assign higher priority to foundational topics.

The PrerequisiteScore is derived using the knowledge-point dependencies modeled in the Graph Construction step (depicted in Figure 1) and is calculated based on the criticality and completion status of the resource's prerequisite nodes. For example, if a student has not completed the foundational chapters in the "Link Layer," these chapters are assigned higher priority in the re-ranking process, ensuring that subsequent topics (e.g., "Network Layer") are not recommended prematurely.

3.6

Feedback Module and Model Optimization

3.6.1

State Definition in Reinforcement Learning

In the recommendation task for the Computer Networks course, the learning state of a user must comprehensively represent the current progress and the remaining tasks. The state s_t at time step t is defined as: (20) $s_{t} = (h_{u}, C_{completed}, C_{remaining})$ \[{{s}_{\text{t}}}=\left( {{\text{h}}_{\text{u}}},{{\text{C}}_{\text{completed }\!\!~\!\!\text{ }}},{{\text{C}}_{\text{remaining }\!\!~\!\!\text{ }}} \right)\]

where:

(h_u : The embedding representation of the user.

C_completed: The set of knowledge points already completed by the user.

C_remaining: The set of knowledge points yet to be completed by the user.

As depicted in Figure 1, the Feedback Module (green block) processes this state to generate a reward signal R_t, which reflects the user's learning engagement, progress, and diversity in the recommended resources. This detailed representation ensures that the system effectively tracks and adapts to the user’s learning trajectory, aligning recommendations with both completed and pending knowledge points.

3.6.2

Reward Signal Design and Optimization

The reward signal is tailored to the learning scenario, incorporating user interactions, knowledge point progress, and course completion rate. The composite reward function is defined as: (21) $R_{t} = λ_{1} \cdot Engagement + λ_{2} \cdot Learning Progress + λ_{3} \cdot CompletionRate$ \[{{\text{R}}_{\text{t}}}={{\lambda }_{1}}\cdot ~\text{Engagement}~+{{\lambda }_{2}}\cdot ~\text{Learning Progress}~+{{\lambda }_{3}}\cdot \text{ }\!\!~\!\!\text{ CompletionRate}~\]

where:

Engagement: Measures the user's interaction with the recommended resources, calculated as: (22) $Engagement = \frac{Interactions}{Total Resources} .$ \[\text{ }\!\!~\!\!\text{ Engagement}~=\frac{\text{ }\!\!~\!\!\text{ Interactions }\!\!~\!\!\text{ }}{\text{ }\!\!~\!\!\text{ Total Resources }\!\!~\!\!\text{ }}.\]

Learning Progress: Reflects the user’s advancement in the knowledge graph, expressed as: (23) $Learning Progress = \frac{| C_{completed} |}{| C_{completed} | | C_{remaining} |}$ \[~\text{Learning Progress}~=\frac{\left| {{\text{C}}_{\text{completed }\!\!~\!\!\text{ }}} \right|}{\left| {{\text{C}}_{\text{completed }\!\!~\!\!\text{ }}} \right|\left| {{\text{C}}_{\text{remaining }\!\!~\!\!\text{ }}} \right|}\]

CompletionRate: Represents the proportion of the course completed by the user.

As illustrated in Figure 1, the reward signal computation is tightly integrated with the Feedback Module. This module dynamically adjusts the reward by monitoring the user’s progress along the knowledge graph, captured in the Knowledge Graph Propagation step (purple block).

3.6.3

Diversity-Enhanced Optimization

To encourage recommendation diversity, a diversity score based on the distribution of knowledge points is incorporated into the reward signal. The diversity score is defined as: (24) $Diversity = 1 - \frac{\sum_{i, j \in s} sim (h_{i}, h_{j})}{| S | \cdot (| S | - 1)}$ \[~\text{Diversity}~=1-\frac{\sum\limits_{\text{i},\text{j}\in \text{s}}{\text{sim}\left( {{\text{h}}_{\text{i}}},{{\text{h}}_{\text{j}}} \right)}}{|\text{S}|\cdot (|\text{S}|-1)}\]

where:

S: The set of recommended resources.

Sim(h_i,h_j): The similarity between resource embeddings h_i and h_j, typically computed using cosine similarity.

The final reward signal is updated to include diversity: (25) $R_{t} = R_{t} + λ_{4} \cdot Diversity$ \[{{\text{R}}_{\text{t}}}={{\text{R}}_{\text{t}}}+{{\lambda }_{4}}\cdot ~\text{Diversity}~\]

where λ₄ is the weight of the diversity score. As depicted in Figure 1, the diversity-enhanced reward signal is processed in the Reward Signal block of the Feedback Module, ensuring that the recommendations are not only personalized but also varied, encouraging exploration and comprehensive learning.

3.7

Algorithm Pseudocode and Complexity Analysis

3.7.1

Pseudocode for the Overall Model

Below is the pseudocode that integrates knowledge graph propagation, dynamic user progress adjustment, recommendation generation, and reinforcement learning-based optimization.

As illustrated in Figure 1, the algorithm relies on distinct modules, including the Knowledge Graph Propagation (purple block), Multi-modal Data Fusion (pink block), and the Feedback Module (green block), to achieve personalized, adaptive, and effective recommendations.

3.7.2

Time Complexity Analysis

To highlight the characteristics of the Computer Networks course, we analyze the complexity added by knowledge dependencies and dynamic user progress updates.

1) Knowledge Dependency Propagation Complexity: For the knowledge-point graph G_kn = (V_kn, E_kn), the feature propagation complexity is:O(K·|E_kn|d_g),where K is the number of propagation layers, |E_kn| is the number of edges, and d_g is the embedding dimension.

2) Dynamic User Progress Updates: Updating progress for each user and module interaction has a complexity of: O(N_u·N_m) ,where N_u is the number of users, and N_m is the number of modules.

3) Multi-modal Data Fusion Complexity: For M modalities with feature dimensions d_m, the feature fusion complexity is:O(M·d_m).

4) Recommendation Generation Complexity: Computing matching scores for user-resource pairs involves:O(N_u·N_m·d_g)

5) Reinforcement Learning Feedback Optimization: The feedback optimization complexity is influenced by the state-action space, with single-step complexity: O(T·d_f) ,where Tis the recommendation sequence length, and d_f is the feature dimension.

6) The total complexity is:O(K·|E_kn|d_g + N_u·N_m + M·d_m + N_u·N_m·d_g + E·T·d_f)

3.7.3

Space Complexity Analysis

Considering the characteristics of the knowledge-point graph and course modules, the space complexity is:

1) Embedding Storage: For user and module embeddings:O(N_u·d_g + N_m·d_g).

2) Knowledge Graph Storage: Sparse storage for the knowledge graph:O(|E_kn|).

3) Multi-modal Feature Storage: For MMM modalities:O(M·d_m).

4) Total space complexity:O(N_u·d_g + N_m·d_g + |E_kn|M·d_m).

4

Experiment and Results Analysis

To comprehensively and scientifically validate the effectiveness of the proposed GCN-Attention-RL framework for dynamic personalized learning recommendation systems, this section presents detailed analysis and evaluation using multiple metrics and comparisons between the experimental group (students using the proposed framework) and the control group (students using traditional learning methods).

4.1

Experimental Setup

4.1.1

Data-set and Experimental Groups

The experimental data is derived from the 2016 to 2022 cohorts of Computer Science students from the School of Information Engineering, who have participated in the Computer Networks course. The dataset comprises 486 students, divided into two groups:

1)

Experimental Group (Adopting the Proposed Framework):

(1) Students in this group utilized the proposed Graph Convolutional Network (GCN) and Attention Mechanism-based Dynamic Personalized Learning Recommendation Framework[1].

(2) The learning resources provided to students were dynamically adjusted, incorporating multimodal features such as textbooks, instructional videos, and code snippets, along with time-sequence modeling[10].

(3) Personalized recommendations included customized textbook chapters, experimental tasks, and practice code resources.

2)

Control Group (Traditional Learning Approach):

(1) Students followed a fixed teaching schedule (e.g., textbook chapter order) to complete the course.

(2) All students used identical textbook chapters and experimental projects[14].

(3) No personalized recommendation or dynamic adjustment was provided.

4.1.2

Data Preprocessing

1)

User Behavior Data:

(1) Click Behavior: Logs of student interactions with learning resources, including click frequency and time spent on resources[17].

(2) Engagement Behavior: Submission time and accuracy of experimental tasks[15].

(3) Learning Progress: Completion rates of textbook chapters by students[13].

2)

Course Content Features:

(1) Textual modality (textbook chapters) was processed using BERT to extract semantic embeddings.

(2) Code modality was processed using CodeBERT to extract code embeddings.

3)

Data Cleaning and Normalization:

(1) Anomalous data (e.g., extremely short or long durations) was filtered out.

(2) Click and engagement behaviors were normalized using standardization.

4.1.3

Evaluation Metrics

To comprehensively evaluate the performance of the model, the following metrics were used:

1) Recommendation Accuracy :

• Precision@K: The proportion of relevant items in the top K recommended items[2].

• Recall@K: The proportion of relevant items retrieved out of all possible relevant items.

• NDCG@K: Normalized Discounted Cumulative Gain at K, assessing ranking quality[2].

2) Time Modeling Effectiveness:RMSE (Root Mean Square Error) was used to evaluate the accuracy of time-sequence modeling[22].

3) Recommendation Diversity:Diversity was calculated to measure the difference among recommended items[6]: (26) $Diversity = 1 - \frac{\underset{i, j \in s}{\sum^{}} Sim (h_{i}, h_{j})}{| S | \cdot (| S | - 1)}$ \[~\text{Diversity}~=1-\frac{\underset{\text{i},\text{j}\in \text{s}}{\mathop{\mathop{\sum }^{}}}\,\text{Sim}\left( {{\text{h}}_{\text{i}}},{{\text{h}}_{\mathfrak{j}}} \right)}{|\text{S}|\cdot (|\text{S}|-1)}\]

where |S| is the size of the recommendation set, and Sim(·)is the cosine similarity.

4) Learning Effectiveness:Learning completion rates (e.g., chapter completion percentages) and the average experimental task scores[7].

4.1.4

Comparative Methods

The performance of the proposed framework was compared against the following methods:

1) Baseline Models:Collaborative Filtering (CF);Matrix Factorization (MF) .

2) Deep Learning-based Models:Neural Collaborative Filtering (NCF);DeepFM .

3) Graph-based Models:GC-MC ;LightGCN.

4.2

Experiment Results

4.2.1

Recommendation Accuracy Evaluation

We use Precision@K, Recall@K, and NDCG@K as the primary metrics for recommendation accuracy. The results are shown in Figure 3:

Analysis:

1) The proposed GCN-Attention-RL model achieves the best performance across all metrics. Compared to the strong baseline LightGCN, Precision@5 improves by 5.9%, Recall@5 by 5.6%, and NDCG@5 by 5.0% [2].

2) The improvement is attributed to:

(1) Multi-modal data fusion: Effectively integrates textbook, video, and code features for comprehensive resource representation.

(2) Temporal modeling: Combines LSTM and attention mechanisms to capture critical time points in user learning behavior, enhancing recommendation accuracy and personalization.

(3) Reinforcement learning-based feedback optimization: Dynamically improves recommendation performance through user interaction feedback.

4.2.2

Temporal Modeling Performance Analysis

The results of time series modeling using RMSE (Root Mean Square Error) are shown in Figure 4:

Analysis:

1) The GCN-Attention-RL model achieves the best RMSE score, reducing error by 20.3% compared to LSTM and by 13.4% compared to TCN[5].

2) The incorporation of attention mechanisms assigns differentiated weights to key behavior points, enabling more precise temporal modeling and avoiding redundant information.

4.2.3

Diversity and Learning Outcome Analysis

We evaluate recommendation diversity and user learning outcomes, including course completion rate and average experimental scores. The results are presented in Figure 5:

Analysis:

1) Recommendation Diversity: The diversity metric for GCN-Attention-RL reaches 0.698, a 10.1% improvement compared to LightGCN [4]. This improvement is due to:

(1) Diversity-oriented reinforcement learning reward functions: Encourages diverse recommendation outcomes.

(2) Multi-modal data integration: Incorporates textbook, video, and code features, enriching the structural variety of the recommended resources.

2) Learning Outcomes:

(1) Completion rate (74.9%) and average experimental score (87.1) in the experimental group significantly outperform the control group and baseline models.

(2) This indicates that the dynamic recommendation system not only enhances the usage of learning resources but also improves students' learning effectiveness.

4.2.4

Experimental Group vs. Control Group Analysis

To further quantify the practical impact of the GCN-Attention-RL framework on learning behavior, we compare the experimental group (using the proposed framework) with the control group (using traditional methods) on several key metrics. Results are summarized in Figure 6:

1)

Analysis:

(1) Completion Rate and CTR: The experimental group achieves an 11.7% higher completion rate and a 10.2% higher CTR compared to the control group, demonstrating that personalized recommendations significantly enhance student engagement with learning resources[16].

(2) Experiment Submission Rate and Scores: The experimental group shows a 11.2% increase in experiment submission rates and an 8.5% improvement in experimental scores, indicating that the proposed framework effectively supports students in completing hands-on tasks and improves their overall learning performance.

(3) Conclusion: The significant improvement in the experimental group highlights the effectiveness of personalized learning recommendations in adapting to student needs and enhancing learning outcomes[20].

2)

Summary and Discussion

(1) Superior Overall Performance:The GCN-Attention-RL model exhibits outstanding performance across recommendation accuracy, temporal modeling, diversity, and user satisfaction metrics, demonstrating the strong potential of dynamic personalized learning recommendation systems.

(2) Applicability to Computer Networks Course:The framework effectively integrates multi-modal resources (textbooks, experiments, and code) and aligns with the diverse learning needs of the computer networks course.

(3) Impact on Teaching Practices:The comparison between the experimental and control groups validates the framework's application value in real-world teaching, significantly improving student engagement and academic performance.

4.3

Ablation Study

To further validate the effectiveness of each module in the proposed model, a series of ablation experiments were conducted by progressively removing key components. The experiments focused on evaluating the contribution of the multi-modal data fusion module, temporal modeling module, and reinforcement learning feedback module.

4.3.1

Analysis of the Multi-Modal Data Fusion Module

1)

Experimental Design:

The multi-modal data fusion module was removed, and the model was evaluated using individual modalities (e.g., textbook content, code features, and video features). The following four configurations were compared:

(1) Content Only: Using only textbook content features.

(2) Code Only: Using only experiment code features.

(3) Video Only: Using only teaching video features.

(4) Full Model: Utilizing all modalities, including textbook, code, and video features, through the fusion module.

2)

Results and Analysis:

Table 1 shows the performance in terms of recommendation accuracy and diversity across different configurations.

Table 1.

Performance of ablation experiments

Model Configuration	Precision@5	Recall@5	Diversity
Content Only	0.632	0.511	0.581
Code Only	0.647	0.520	0.562
Video Only	0.624	0.495	0.534
Full Model	0.732	0.641	0.693

3)

Conclusions:

(1) Models using a single modality performed poorly in both recommendation accuracy and diversity. The Code Only modality slightly outperformed others, indicating the significance of code-related features in computer networking courses.

(2) The Full Model, which combines all modalities, significantly improved both recommendation accuracy (Precision@5 increased by 13.1%) and diversity (increased by 19.2%). This demonstrates that multi-modal data fusion effectively captures diverse and complementary information from different resources.

4.3.2

Analysis of the Temporal Modeling Module

1)

Experimental Design:

The temporal modeling module was removed, and the model performance was compared under the following three configurations:

(1) No Temporal Module: Static features without any temporal modeling.

(2) LSTM Only: Using LSTM for temporal modeling without incorporating attention mechanisms.

(3) Full Temporal Module: Combining LSTM with attention mechanisms.

2)

Results and Analysis:

Table 2 shows the results for recommendation performance and temporal sequence prediction accuracy (measured by RMSE).

Table 2.

Analysis of the Temporal Modeling Module

Model Configuration	Precision@5	Recall@5	RMSE (Temporal Prediction)
No Temporal Module	0.681	0.582	-
LSTM Only	0.704	0.603	2.11
Full Temporal Module	0.732	0.641	1.72

3)

Conclusions:

(1) Removing the temporal modeling module (No Temporal Module) significantly reduced recommendation performance, confirming that temporal user behavior is critical for effective recommendations.

(2) The LSTM Only configuration improved both Precision@5 and RMSE compared to the static model, showcasing the importance of sequential modeling.

(3) The Full Temporal Module, integrating LSTM with attention mechanisms, achieved the best results. The RMSE decreased from 2.11 to 1.72, highlighting the ability of attention mechanisms to capture key time points in user behavior.

4.3.3

Analysis of the Reinforcement Learning Feedback Module

1)

Experimental Design:

The reinforcement learning feedback module was removed, and a static recommendation approach was used for comparison. The following two configurations were tested:

(1) No RL: Using fixed multi-modal fusion weights and a static recommendation ranking.

(2) Full RL Module: Employing reinforcement learning to dynamically adjust recommendation weights and ranking.

2)

Results and Analysis:

Table 3 summarizes the impact of the reinforcement learning module on recommendation diversity and user satisfaction.

Table 3.

Analysis of the Temporal Modeling Module

Model Configuration	Diversity	CTR (Click-Through Rate)	Learning Completion Rate
No RL	0.612	0.465	0.602
Full RL Module	0.693	0.514	0.681

Conclusions:

(1) The removal of the reinforcement learning module (No RL) led to lower diversity and user satisfaction metrics. This indicates that a static approach fails to adapt to dynamic user behavior and diverse learning preferences.

(2) The Full RL Module significantly enhanced recommendation diversity (13.2% increase), click-through rate (10.5% increase), and learning completion rate (13.1% increase). These results confirm that reinforcement learning enables the system to dynamically adjust to user needs, resulting in more effective and engaging recommendations.

Through the ablation experiments, we validated the contributions of each module in the proposed model. The multi-modal data fusion module, temporal modeling module, and reinforcement learning feedback module collectively contribute to the significant improvements in recommendation accuracy, diversity, and user satisfaction. These findings further underscore the robustness and adaptability of the proposed model for personalized learning recommendations in computer networking courses.

4.4

Complexity Analysis

4.4.1

Time Complexity Evaluation

Table 4.

summarizes the time complexity of each module in the proposed model.

Module	Time Complexity	Optimization Suggestions
Multi-modal Fusion Module	O(M·d_m)	Reduce the feature dimension d_m.
Temporal Modeling (LSTM)	$O (T \cdot d_{t}^{2})$ \[\text{O}(\text{T}\cdot \text{d}_{\text{t}}^{2})\]	Decrease the sequence length T.
Temporal Modeling (Attention)	O(T²·d_t)	Apply sparse attention mechanisms.
Reinforcement Learning Module	O(T·d_f)	Enhance parallelization of training.

The Multi-modal Fusion Module has a linear complexity with respect to the number of modalities MM and feature dimension d_m, making it relatively efficient. The Temporal Modeling Module using LSTM and attention mechanisms exhibits quadratic complexity concerning the sequence length T, where sparse attention can help mitigate the computational overhead. The Reinforcement Learning Module adds linear complexity for each time step T, making it suitable for real-time recommendation when implemented efficiently.

4.4.2

Space Complexity Evaluation

Table 5 provides the space complexity of each module in the proposed framework

Table 5.

Space Complexity of each module in the proposed framework

Module	Space Complexity	Optimization Suggestions
User and Resource Embeddings	O(N_u·d_g + N_m·d_g)	Reduce embedding dimensions d_g.
Knowledge Graph Storage	^O(\|Ekn\|).	E_kn

The User and Resource Embedding Module has a space complexity linear in the number of users N_u and resources N_m, scaled by the embedding dimension d_g Reducing d_g without compromising feature representation can alleviate memory consumption. Additionally, the Knowledge Graph Storage Module, which stores dependencies between learning modules, can be optimized using sparse representations to manage the large-scale storage efficiently.

This detailed complexity analysis ensures the proposed model remains scalable for real-world applications, particularly in the context of personalized learning recommendation systems.

5

Discussion

This study utilizes learning data from the "Computer Networks" course to develop and validate a dynamic personalized learning recommendation system that integrates Graph Convolutional Networks (GCN), attention mechanisms, and reinforcement learning. The following discussion delves deeper into the research findings and their significance.

5.1

Analysis of Model Advantages and Innovations

1) Effectiveness of Multimodal Feature Fusion: By dynamically adjusting the weights of text, code, and video modalities using a gating mechanism, the model significantly improves recommendation accuracy (e.g., Precision@5 increased by 13.1%) and diversity (increased by 19.2%)[13][19]. This validates the effectiveness of multimodal data fusion in complex course resource recommendations and highlights its innovation in utilizing heterogeneous data comprehensively.

2) Critical Role of Temporal Modeling: The temporal modeling module, combining LSTM and attention mechanisms, not only accurately captures the temporal dependencies in user learning behaviors but also identifies key time points, ensuring that recommendations align with students’ learning progress[5],[22].

3) Contribution of Reinforcement Learning: By dynamically adjusting recommendation strategies, the reinforcement learning module effectively enhances user satisfaction (e.g., CTR increased by 5.8%) and learning completion rate (increased by 6.8%)[8],[16]. This design offers a novel approach for dynamically optimizing educational recommendation systems.

5.2

Limitations and Directions for Improvement

1) Data Scale and Diversity: The scale and scope of the dataset used in this study are limited. Future work could involve incorporating larger datasets and exploring cross-disciplinary recommendation scenarios to improve the model’s generalizability [17].

2) Computational Complexity: The inclusion of multiple modules increases the computational cost of training and inference. Future research could adopt efficient model compression and acceleration techniques to enhance performance[11],[2].

3) Cold Start Problem: Due to the reliance on user behavior data, the model may perform poorly for new or inactive users. Future studies could address this issue through transfer learning or meta-learning approaches[14],[18].

5.3

Implications for the Education Domain

1) Support for Personalized Teaching:The proposed method provides dynamic and personalized learning path recommendations, helping students optimize their learning process and improve efficiency[6],[20].

2) Assistance for Instructional Decision-Making:The model identifies students’ weaknesses in learning and offers intelligent suggestions for resource allocation, demonstrating significant practical implications[13],[15].

6

Conclusion

6.1

Research Summary

This study proposes a dynamic personalized learning recommendation system that integrates Graph Convolutional Networks, attention mechanisms, and reinforcement learning, and validates its effectiveness through experiments on "Computer Networks" course data. The main contributions are as follows:

1) A multimodal feature fusion framework is proposed to effectively address the challenge of integrating heterogeneous data[10].

2) The temporal modeling module enhances the model’s ability to capture temporal dependencies in learning behaviors[22],[5].

3) The reinforcement learning-based recommendation strategy dynamically optimizes recommendations, significantly improving accuracy, diversity, and user satisfaction[8],[16].

6.2

Future Research Directions

1) Expansion to Multiple Courses and Disciplines: Extending the model to other core courses (e.g., Data Structures, Operating Systems) or interdisciplinary courses (e.g., Artificial Intelligence) to verify its applicability and generalizability [17],[18].

2) Optimization of Temporal Modeling and Real-Time Performance: Introducing sparse attention mechanisms or transformer-based models to optimize the temporal modeling module[5].

3) Addressing Cold Start Problems: Leveraging transfer learning or meta-learning techniques to reduce reliance on user behavior data and improve recommendations for new or inactive users[14],[18].

6.3

Societal and Technological Significance

The proposed method not only advances the development of personalized educational recommendation systems but also provides technical insights for multimodal recommendation models based on GCNs and attention mechanisms. With the expansion of educational data and continuous algorithmic improvements, this approach is expected to further enhance educational equity and help students achieve efficient, personalized learning experiences30[7].

Lingua:: Inglese

Frequenza di pubblicazione:: 1 volte all'anno
Argomenti della rivista:: Scienze biologiche, Scienze della vita, altro, Matematica, Matematica applicata, Matematica generale, Fisica, Fisica, altro

Feed RSS della rivista

Nonlinear Adaptive Optimization of Multi-Modal Learning Paths Using Graph Convolutional Networks and Reinforcement Learning for Intelligent Educational Systems

TongLI,

Pubblicato online: 17 mar 2025

Ricevuto: 19 ott 2024

Accettato: 04 feb 2025

DOI: https://doi.org/10.2478/amns-2025-0829

Parole chiaveGraph Convolutional Networks (GCN), Attention Mechanism, Personalized Learning Recommendation, Multi-Modal Data Fusion, Temporal Behavior Modeling, Reinforcement Learning, Nonlinear Mathematical Modeling, Educational Technology

© 2025 TongLI, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.

Parole chiave
Graph Convolutional Networks (GCN), Attention Mechanism, Personalized Learning Recommendation, Multi-Modal Data Fusion, Temporal Behavior Modeling, Reinforcement Learning, Nonlinear Mathematical Modeling, Educational Technology