Traffic Flow Prediction Using Deep Learning Techniques in Urban Road Networks

Accurate traffic flow prediction is a crucial component of intelligent transportation systems (ITS), playing a key role in managing urban road networks, reducing congestion, and improving mobility. Lv et al. [1] showcased the potential of deep learning in traffic flow prediction by employing a stacked autoencoder, achieving performance that surpassed traditional approaches. Polson and Sokolov [2] built on this foundation, applying deep learning methods for short-term traffic forecasting and emphasizing their adaptability in dynamic urban settings. Further advancements by Pamuła and Żochowska [3] explored deep learning models for origin-destination (OD) matrix estimation, showcasing their ability to handle uncongested urban road networks.

Recent innovations in graph-based models have also emerged as a promising direction. Yang and Lv [4] proposed a graph deep learning approach that captures spatial dependencies in urban traffic flow, achieving improved prediction accuracy. Medina-Salgado et al. [5] provided a comprehensive review of urban traffic flow prediction techniques, emphasizing the importance of integrating spatial-temporal patterns into predictive models. Razali et al. [6] highlighted current gaps and evaluation metrics in traffic flow prediction with machine learning and deep learning, offering guidance for future studies.

Deep learning techniques using convolutional methods have been employed for short-term traffic forecasting. Bilotta et al. [7] utilized convolutional neural networks (CNNs) to predict urban traffic flow, effectively extracting spatial patterns from traffic data. Liu et al. [8] developed deep learning models based on mobility data, showcasing the capability of neural networks to handle large-scale urban traffic datasets. Chen et al. [9] advanced this further by integrating deep learning into Internet of Vehicles (IoV) traffic prediction, proposing innovative methods for incorporating real-time vehicle data.

To address data quality challenges, Pamuła [10] examined how data loss impacts traffic flow predictions in neural networks, emphasizing the importance of robust preprocessing methods. Essien et al. [11] explored incorporating external factors, such as traffic incidents derived from social media, into deep learning models, offering a new method to enhance prediction accuracy. Wang et al. [12] introduced a route-oriented deep learning model capable of accurately capturing path-specific traffic dynamics in urban transportation networks.

Federated learning has recently gained attention as a potential solution for traffic flow prediction in heterogeneous scenarios. Pei et al. [13] reviewed federated learning methods for handling diverse data environments, while Xia et al. [14] proposed novel approaches for memory evaluation and federated unlearning in distributed traffic management networks. These advancements pave the way for privacy-preserving and scalable traffic prediction systems.

Spatiotemporal modeling continues to be a key area of research. Xie et al. [15] examined machine learning techniques for urban flow prediction, emphasizing the integration of spatial and temporal features. Han and Huang [16] introduced a deep learning approach for short-term traffic flow forecasting, demonstrating its suitability for real-time use cases. Abdullah et al. [17] improved traffic flow forecasting with soft GRU-based recurrent neural networks, enabling better congestion management in smart cities.

Hybrid deep learning approaches have demonstrated significant potential in traffic forecasting. Wu et al. [18] integrated multiple deep learning frameworks to create a hybrid model for traffic flow prediction, achieving enhanced performance in diverse scenarios. Fouladgar et al. [19] proposed scalable neural networks to predict urban traffic congestion, effectively tackling computational challenges in large-scale applications. Miglani and Kumar [20] conducted an extensive review of deep learning methods for traffic prediction in autonomous vehicles, highlighting solutions and challenges in incorporating predictive models into advanced transportation systems.

While deep learning methods have greatly improved traffic flow prediction, challenges remain. Many existing models face difficulties in effectively managing the complexity of spatial-temporal dependencies in urban traffic data, particularly in dynamic and unpredictable scenarios. Additionally, many approaches face limitations in scalability and computational efficiency, particularly when deployed in real-world, large-scale transportation networks. To tackle these challenges, This paper presents a new deep learning-based algorithm for traffic flow prediction, integrating advanced spatiotemporal feature extraction with adaptive optimization techniques. The proposed method leverages a hybrid architecture combining convolutional and recurrent neural networks to capture fine-grained spatial relationships and long-term temporal trends, ensuring robust and accurate predictions. Furthermore, a multi-task learning framework is employed to enhance computational efficiency and support the simultaneous prediction of multiple traffic metrics, such as flow, speed, and congestion levels. The contributions of this work are threefold: (1) introducing an innovative hybrid deep learning framework tailored for urban traffic flow prediction, (2) addressing scalability challenges through adaptive optimization techniques, and (3) demonstrating the model’s efficacy in real-world scenarios through extensive experimental validation on multiple benchmark datasets. This research provides a comprehensive solution to the pressing challenges in traffic flow prediction, paving the way for smarter and more efficient urban traffic management.

2

Method

This section presents a deep learning framework tailored for traffic flow prediction in urban road networks. The design focuses on tackling challenges such as capturing intricate spatial-temporal dependencies, managing large-scale data, and maintaining computational efficiency. The approach incorporates advanced methods, utilizing convolutional neural networks (CNNs) to extract spatial features and recurrent neural networks (RNNs) to capture temporal dynamics, all incorporated into a multi-task learning framework.

2.1

Framework Overview

The proposed framework consists of three primary modules: data preprocessing, feature extraction, and prediction modeling. Figure 1 illustrates the architecture.

1) Data Preprocessing: Traffic data, encompassing flow, speed, and occupancy, is gathered from multiple sources, including loop detectors, GPS devices, and IoT sensors. This raw data is preprocessed to remove outliers, fill missing values, and normalize the inputs for subsequent learning.

2) Spatial-Temporal Feature Extraction: A combined architecture of CNNs and gated recurrent units (GRUs) is used to model spatial-temporal dependencies. The CNN layers focus on extracting localized spatial features from the road network, while the GRU layers handle temporal dynamics over successive time intervals.

3) Prediction Modeling: A multi-task learning framework predicts multiple traffic metrics simultaneously, enhancing efficiency and improving prediction accuracy by leveraging interrelated tasks.

2.2

Data Preprocessing

Accurate traffic flow prediction depends on reliable and high-quality data. In this study, raw traffic datasets, including historical traffic volumes, vehicle speeds, and congestion levels, were obtained from sensors deployed across urban road networks. However, such data frequently contains noise, missing entries, and outliers that can negatively impact prediction accuracy. To mitigate these issues, a detailed data preprocessing pipeline was implemented, comprising the following steps:

1) Data Cleaning: To remove anomalies caused by sensor errors or external interference, outlier detection was performed using statistical methods. Specifically, data points exceeding three standard deviations from the mean were flagged as outliers and replaced with interpolated values. For missing data, linear interpolation was applied to maintain temporal continuity.

2) Data Normalization: To standardize features and improve the convergence of deep learning models, all numerical attributes were normalized to a range of [0,1] using the min-max scaling formula: $x' = \frac{x - min (x)}{max (x) - min (x)}$ \[x=\frac{x-\text{min}(x)}{\text{max}(x)-\text{min}(x)}\]

1) where x represents the original value, and max(x) and min(x) denote the feature's maximum and minimum values, respectively.

3) Temporal Aggregation: Traffic data often has high temporal granularity, leading to redundancy and noise. To address this, data was aggregated into 15-minute intervals, balancing prediction accuracy and computational efficiency.

4) Feature Engineering: Temporal features like time of day, day of the week, and holiday indicators were extracted from timestamps to reflect periodic traffic patterns. Spatial features, including road type and connectivity, were also encoded to reflect spatial dependencies in the road network.

5) Data Splitting: The processed dataset was divided into three parts: 70% for training, 15% for validation, and 15% for testing. A time-based split was applied to preserve the temporal order of the traffic data, ensuring realistic evaluation of the prediction model.

By applying this preprocessing pipeline, the resulting dataset was standardized, noise-free, and representative of the underlying traffic dynamics, providing a robust foundation for deep learning-based traffic flow prediction.

2.3

Spatial-Temporal Feature Extraction

Accurately predicting traffic flow requires an effective representation of both spatial and temporal dependencies. Traffic data inherently exhibits strong correlations across spatially connected road segments and temporally evolving patterns. To address these complexities, a hybrid feature extraction framework combining graph-based spatial modeling and temporal sequence analysis was employed.

1) Spatial Feature Extraction: The urban road network was modeled as a graph $G = (V, ℰ, A)$ \[\mathcal{G}=(\mathcal{V},\mathcal{E},\mathbf{A})\], where V epresents the nodes (road intersections), ε denotes the edges (road segments), and A is the adjacency matrix capturing the network's connectivity. To model spatial dependencies effectively, a Graph Convolutional Network (GCN) was utilized. The GCN aggregates information from neighboring nodes based on their connectivity and updates node features iteratively. For a node v_i, the feature update is given by: $h_{i}^{(l + 1)} = σ (\sum_{j \in N (i)} \frac{h_{j}^{(l)}}{\sqrt{deg (v_{i}) \cdot deg (v_{j})}} W^{(l)})$ \[\mathbf{h}_{i}^{(l+1)}=\sigma \left( \sum\limits_{j\in \mathcal{N}(i)}{\frac{\mathbf{h}_{j}^{(l)}}{\sqrt{\text{deg}\left( {{v}_{i}} \right)\cdot \text{deg}\left( {{v}_{j}} \right)}}}{{\mathbf{W}}^{(l)}} \right)\] where $h_{i}^{(l)}$ \[\mathbf{h}_{i}^{(l)}\] is the feature vector of node v_i at layer l, N(i) represents the neighbors of v_i, deg(v_i) is the degree of node v_i, W^(l) is the learnable weight matrix, and σ(·) is an activation function. This approach ensures that spatial correlations between neighboring road segments are effectively captured.

2) Temporal Feature Extraction: Temporal patterns in traffic data, such as daily fluctuations or rush-hour peaks, were modeled using a Long Short-Term Memory (LSTM) network. LSTMs are well-suited for sequential data as they effectively capture long-term dependencies. For a traffic data sequence ${x_{t}}_{t = 1}^{T}$ \[\left\{ {{x}_{t}} \right\}_{t=1}^{T}\], the LSTM updates its hidden state h_t and cell state c_t at each time step as follows: $\begin{matrix} \begin{matrix} f_{t} = σ (W_{f} [h_{t - 1}, x_{t}] + b_{f}), & i_{t} = σ (W_{i} [h_{t - 1}, x_{t}] + b_{i}) \end{matrix} \\ \begin{matrix} o_{t} = σ (W_{o} [h_{t - 1}, x_{t}] + b_{o}), & c_{t} = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ tanh (W_{c} [h_{t - 1}, x_{t}] + b_{c}) \end{matrix} \\ h_{t} = o_{t} ⊙ tanh (c_{t}) \end{matrix}$ \[\begin{matrix} \begin{matrix} {{\mathbf{f}}_{t}}=\sigma \left( {{\mathbf{W}}_{f}}\left[ {{\mathbf{h}}_{t-1}},{{x}_{t}} \right]+{{\mathbf{b}}_{f}} \right), & {{\mathbf{i}}_{t}}=\sigma \left( {{\mathbf{W}}_{i}}\left[ {{\mathbf{h}}_{t-1}},{{x}_{t}} \right]+{{\mathbf{b}}_{i}} \right) \\ \end{matrix} \\ \begin{matrix} {{\mathbf{o}}_{t}}=\sigma \left( {{\mathbf{W}}_{o}}\left[ {{\mathbf{h}}_{t-1}},{{x}_{t}} \right]+{{\mathbf{b}}_{o}} \right), & {{\mathbf{c}}_{t}}={{\mathbf{f}}_{t}}\odot {{\mathbf{c}}_{t-1}}+{{\mathbf{i}}_{t}}\odot \text{tanh}\left( {{\mathbf{W}}_{c}}\left[ {{\mathbf{h}}_{t-1}},{{x}_{t}} \right]+{{\mathbf{b}}_{c}} \right) \\ \end{matrix} \\ {{\mathbf{h}}_{t}}={{\mathbf{o}}_{t}}\odot \text{tanh}\left( {{\mathbf{c}}_{t}} \right) \\ \end{matrix}\] where f_t, i_t, and o_t are the forget, input, and output gates, respectively; W and b are the weight matrices and biases, and ⊙ represents element-wise multiplication. The LSTM network models the temporal evolution of traffic data, allowing the system to predict future states based on historical observations.

3) Combined Spatial-Temporal Modeling: To integrate spatial and temporal features, a hybrid Graph Convolutional Network and LSTM (GCN-LSTM) model was constructed. The GCN extracts spatial embeddings H_s for all nodes, while the LSTM processes the temporal sequences H_t for each node. The combined representation H_st is obtained by concatenating the outputs: $H_{s t} = Concat (H_{s}, H_{t})$ \[{{\mathbf{H}}_{st}}=\text{ Concat }\left( {{\mathbf{H}}_{s}},{{\mathbf{H}}_{t}} \right)\] where H_st serves as the input to subsequent prediction layers. This hybrid approach ensures that both spatial and temporal dependencies are effectively captured, leading to more accurate traffic flow predictions.

By combining the strengths of GCNs and LSTMs, the proposed framework provides a robust mechanism for extracting meaningful spatial-temporal features, which are critical for understanding and predicting traffic dynamics in urban road networks.

2.4

Prediction Modeling

Accurate prediction of traffic flow requires a sophisticated model that effectively integrates the extracted spatial-temporal features into a predictive framework. This section outlines the prediction model design, utilizing the combined outputs of the previously discussed Graph Convolutional Network (GCN) and Long Short-Term Memory (LSTM) network. This framework efficiently captures spatial relationships among road segments and temporal patterns over time, enabling precise traffic flow predictions.

1) Model Structure: The prediction model is a multi-layer neural network designed to process spatial-temporal features H_st and output predicted traffic flow ŷ for future time steps. The structure is mathematically represented as: $\hat{y} = f_{predict} (H_{s t}; Θ)$ \[\mathbf{\hat{y}}={{f}_{\text{predict }}}\left( {{\mathbf{H}}_{st}};\Theta \right)\] where ŷ denotes the predicted traffic flow, f_predict is the prediction function, and Θ represents the model's trainable parameters.

The model comprises the following essential components: - Input Layer: Accepts the spatial-temporal feature matrix H_st ∈ ℝ^N×D, where N represents the number of nodes (road segments) and D is the dimensionality of the features. - Hidden Layers: Multiple fully connected layers with nonlinear activation functions are used to capture complex interactions among the spatial-temporal features. The hidden layers are formulated as: $z^{(l + 1)} = σ (W^{(l)} z^{(l)} + b^{(l)})$ \[{{\mathbf{z}}^{(l+1)}}=\sigma \left( {{\mathbf{W}}^{(l)}}{{\mathbf{z}}^{(l)}}+{{\mathbf{b}}^{(l)}} \right)\] where z^(l) is the output of the l -th layer, W^(l) and b^(l) are the weight matrix and bias vector, respectively, and σ(·) is the ReLU activation function. - Output Layer: Produces the final traffic flow prediction for each node at the target time step. The output layer is defined as: $\hat{y} = w_{out} z^{(L)} + b_{out}$ \[\mathbf{\hat{y}}={{\mathbf{w}}_{\text{out }}}{{\mathbf{z}}^{(L)}}+{{\mathbf{b}}_{\text{out }}}\] where z^(L) is the output of the last hidden layer, and W_out and b_out are the output layer parameters.

2) Loss Function: The model is trained using the Mean Squared Error (MSE) loss function, which measures the difference between predicted traffic flow ŷ and the ground truth y. The loss is expressed as: $ℒ_{MSE} = \frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2},$ \[{{\mathcal{L}}_{\text{MSE}}}=\frac{1}{N}\sum\limits_{i=1}^{N}{{{\left( {{y}_{i}}-{{{\hat{y}}}_{i}} \right)}^{2}},}\] where N is the total number of nodes. The MSE penalizes large deviations, ensuring accurate traffic flow predictions.

3) Regularization: To reduce overfitting and improve the model's generalization, L2 regularization is applied to the trainable parameters: $ℒ_{reg} = λ \sum_{l} {‖ W^{(l)} ‖}_{2}^{2},$ \[{{\mathcal{L}}_{\text{reg}}}=\lambda \sum\limits_{l}{\left\| {{\mathbf{W}}^{(l)}} \right\|_{2}^{2}},\] where λ is the regularization coefficient. The total loss function for training the model is a combination of the MSE and the regularization term: $ℒ = ℒ_{MSE} + ℒ_{reg} .$ \[\mathcal{L}={{\mathcal{L}}_{\text{MSE}}}+{{\mathcal{L}}_{\text{reg}}}.\]

4) Training Procedure: The model is trained using a gradient-based optimization algorithm, such as Adam, which adjusts the parameters Θ iteratively to minimize the loss function. The training process involves: 1. Forward propagation to compute predictions ŷ. 2. Calculation of the total loss ℒ. 3. Backpropagation to compute gradients with respect to Θ. 4. Parameter updates using the Adam optimizer: $Θ \leftarrow Θ - η \nabla_{Θ} ℒ,$ \[\Theta \leftarrow \Theta -\eta {{\nabla }_{\Theta }}\mathcal{L},\] where η is the learning rate.

5) Multi-Step Prediction: For real-world applications, predicting traffic flow over multiple future time steps is crucial. The model adopts a recursive approach for multi-step prediction. Given the predicted traffic flow at time step t, ŷ_t, the next prediction is made by feeding ŷ_t back into the model: ${\hat{y}}_{t + 1} = f_{predict} ({\hat{y}}_{t}; Θ) .$ \[{{\mathbf{\hat{y}}}_{t+1}}={{f}_{\text{predict }}}\left( {{{\mathbf{\hat{y}}}}_{t}};\Theta \right).\]

This approach enables the model to predict traffic flows for extended horizons, supporting proactive traffic management and decision-making.

By integrating spatial-temporal feature extraction and a robust prediction modeling framework, the proposed method achieves high accuracy in traffic flow forecasting while maintaining scalability and adaptability for urban road networks.

2.5

Optimization and Training

To ensure the effectiveness and efficiency of the proposed traffic flow prediction model, an optimization-driven training framework is employed. This section details the optimization strategies, training procedures, and techniques used to enhance the model’s performance and generalization capabilities.

1) Loss Function Design: The model aims to minimize prediction errors while maintaining robustness. The Mean Squared Error (MSE) is used as the primary loss function to quantify the difference between predicted traffic flows ŷ and actual values y. It is expressed as: $ℒ_{MSE} = \frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2},$ \[{{\mathcal{L}}_{\text{MSE}}}=\frac{1}{N}\sum\limits_{i=1}^{N}{{{\left( {{y}_{i}}-{{{\hat{y}}}_{i}} \right)}^{2}}},\] where N is the number of road segments. To enhance generalization and prevent overfitting, L2 regularization is applied to the trainable weight matrices W^(l) in each layer, expressed as: $ℒ_{reg} = λ \sum_{l} {‖ W^{(l)} ‖}_{2}^{2},$ \[{{\mathcal{L}}_{\text{reg}}}=\lambda \sum\limits_{l}{\left\| {{\mathbf{W}}^{(l)}} \right\|_{2}^{2}},\] where λ is the regularization coefficient. The total loss combines these terms: $ℒ = ℒ_{MSE} + ℒ_{reg} .$ \[\mathcal{L}={{\mathcal{L}}_{\text{MSE}}}+{{\mathcal{L}}_{\text{reg}}}.\]

2) Gradient-Based Optimization: The model minimizes the loss function ℒ using the Adam optimizer, an advanced gradient-based method that adapts learning rates for individual parameters. Adam is particularly effective for handling sparse gradients and dynamically adjusting learning rates, making it suitable for complex models. The parameter update rule is: $Θ \leftarrow Θ - η \nabla_{Θ} ℒ,$ \[\Theta \leftarrow \Theta -\eta {{\nabla }_{\Theta }}\mathcal{L},\] where Θ represents the model parameters, η is the learning rate, and ▽_Θℒ is the gradient of the loss function.

3) Learning Rate Scheduling: An adaptive learning rate scheduler dynamically adjusts the learning rate during training, reducing it by a factor of γ if the validation loss shows no improvement for k consecutive epochs: $\begin{matrix} η \leftarrow η \cdot γ, & if no improvement for k epochs . \end{matrix}$ \[\begin{matrix} \eta \leftarrow \eta \cdot \gamma , & \text{if no improvement for }k\text{ epochs}\text{.} \\ \end{matrix}\]

This strategy ensures steady convergence while avoiding premature stagnation.

4) Batch Training: To balance computational efficiency and model convergence, the model is trained in mini-batches. Each batch contains a subset of the training dataset, allowing for efficient memory utilization and reducing variance in gradient updates. For a mini-batch of size B, the loss is computed as: $ℒ_{batch} = \frac{1}{B} \sum_{i = 1}^{B} {(y_{i} - {\hat{y}}_{i})}^{2} .$ \[{{\mathcal{L}}_{\text{batch }}}=\frac{1}{B}\sum\limits_{i=1}^{B}{{{\left( {{y}_{i}}-{{{\hat{y}}}_{i}} \right)}^{2}}}.\]

5) Early Stopping: Early stopping is employed to prevent overfitting by monitoring performance on a validation set. Training is halted if the validation loss does not improve after a set number of epochs.

6) Multi-Step Prediction Training: For multi-step traffic flow forecasting, the model is trained iteratively to predict sequential future time steps. The predicted output ŷ_t at time t is used as input for generating the forecast for the next time step ŷ_t+1. The training objective for multi-step prediction is: $min_{Θ} \sum_{t = 1}^{T} ℒ_{t},$ \[\underset{\Theta }{\mathop{\text{min}}}\,\sum\limits_{t=1}^{T}{{{\mathcal{L}}_{t}}},\] where T is the number of predicted time steps, and ℒ_t is the loss at time step t.

7) Computational Efficiency: To enhance computational efficiency, training is parallelized using GPU acceleration. Libraries such as PyTorch are employed to optimize tensor computations and leverage GPU capabilities. Additionally, mixed-precision training is used to reduce memory consumption and accelerate computation.

8) Training Workflow: The complete training workflow is as follows:

Initialize model parameters Θ and optimizer settings.

(1) Shuffle the training dataset and divide it into mini-batches.

For each mini-batch, perform forward propagation to compute predictions ŷ.

Compute the loss ℒ and perform backpropagation to calculate gradients ▽_Θℒ.

(2) Update parameters using the Adam optimizer.

(3) Monitor validation loss and adjust the learning rate if necessary.

(4) Apply early stopping if validation performance stagnates.

This optimization and training framework ensures that the model achieves high accuracy and generalization performance while maintaining computational efficiency.

3

Experiment

The proposed traffic flow prediction model was evaluated through extensive experiments on benchmark datasets, such as METR-LA and PEMS-BAY. These experiments aimed to examine the model's scalability, resilience to missing data, computational efficiency, and accuracy in long-term predictions. Comparisons were made against baseline models, including LSTM, GCN, ST-GCN, and T-GCN, to validate the superiority of the proposed method.

3.1

Experimental Setup and Datasets

The experiments were conducted in a high-performance computing environment with NVIDIA Tesla GPUs, using Python and the PyTorch framework for implementation. To improve computational efficiency, training was parallelized across multiple GPUs. The evaluation utilized two well-known urban traffic datasets: METR-LA and PEMS-BAY. METR-LA includes data from 207 loop detectors on Los Angeles highways, measured every 5 minutes over 4 months, capturing speed, volume, and occupancy rates. PEMS-BAY comprises data from 325 sensors in the Bay Area, recorded at 5-minute intervals over several months, providing detailed metrics such as speed and flow rates. These datasets offer extensive spatial-temporal information, enabling a robust evaluation of the model’s performance across various urban traffic conditions. Missing data were imputed, and traffic flows were normalized to enhance model convergence during training.

3.2

Baseline Models

To demonstrate the strengths of the proposed model, its performance was evaluated against several state-of-the-art models: LSTM, a recurrent neural network designed to capture temporal dependencies in sequential data; GCN, a graph convolutional network that extracts spatial features from graph-structured traffic data; ST-GCN, a spatial-temporal graph convolutional network combining GCNs and RNNs for joint spatial-temporal learning; and T-GCN, a time-aware GCN model that directly integrates temporal information into the graph convolution process. For fair comparisons, all baseline models were fine-tuned, with hyperparameters such as learning rate, batch size, and hidden layer dimensions optimized using grid search and cross-validation.

3.3

Scalability and Robustness Evaluation

Scalability is a crucial aspect of traffic flow prediction models, especially when applied to large urban networks with extensive sensor deployments. To assess the scalability of the proposed model, subsets of the METR-LA dataset containing 50, 100, and 200 sensors were analyzed. As summarized in Table 1, the proposed model consistently outperformed baseline models such as LSTM, GCN, and ST-GCN, achieving lower Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) values across all sensor subsets. Notably, the MAE increased by only 8.6% when the number of sensors increased fourfold (from 50 to 200), indicating the model's ability to scale effectively without significant performance loss. This scalability is attributed to its efficient spatial-temporal feature extraction, which mitigates the challenges posed by higher data dimensions.

Table 1.

Scalability Analysis Results on METR-LA Dataset

Number of Sensors	50	100	200
MAE (Proposed, mph)	2.31	2.43	2.51
RMSE (Proposed, mph)	4.52	4.67	4.80
MAE (ST-GCN, mph)	2.45	2.60	2.78
RMSE (ST-GCN, mph)	4.76	4.98	5.24
MAE (LSTM, mph)	2.68	2.83	3.01
RMSE (LSTM, mph)	5.05	5.32	5.60

Robustness was evaluated by introducing missing data into the METR-LA dataset at rates of 10%, 20%, and 30%. As depicted in Table 2, the proposed model demonstrated remarkable resilience, with MAE increasing by only 15.9% under 30% data loss. In comparison, baseline models such as T-GCN and LSTM showed steeper degradation in accuracy. This robustness stems from the model's attention mechanism, which adaptively assigns weights to input features, effectively reducing the influence of missing data.

Table 2.

Robustness to Missing Data

Missing Data Rate	10%	20%	30%
MAE (Proposed, mph)	2.45	2.62	2.84
RMSE (Proposed, mph)	4.69	4.92	5.18
MAE (T-GCN, mph)	2.61	2.82	3.12
RMSE (T-GCN, mph)	4.88	5.16	5.54
MAE (LSTM, mph)	2.85	3.12	3.45
RMSE (LSTM, mph)	5.15	5.46	5.84

The results highlight the proposed model’s advantages in both scalability and robustness. The minimal increase in error with larger network sizes and higher data loss underscores the model’s adaptability to real-world scenarios. These improvements are attributed to its hybrid spatial-temporal feature extraction and dynamic attention mechanisms, which effectively manage increased data complexity and incomplete input data.

3.4

Computational Efficiency and Long-Term Prediction

The computational efficiency of the proposed model was assessed by measuring training time, inference time, and GPU memory usage during training and prediction. These metrics were compared against state-of-the-art models, such as LSTM, T-GCN, and ST-GCN, utilizing the METR-LA dataset.

Computational Efficiency. Table 3 summarizes the computational efficiency results. The proposed model achieved a training speed of 1.8 seconds per epoch, significantly faster than ST-GCN (2.6 seconds per epoch) and LSTM (3.1 seconds per epoch). This efficiency can be attributed to the hybrid attention mechanism, which reduces redundant computations by focusing on the most relevant spatial-temporal features. Similarly, the inference time of the proposed model was 42 milliseconds per batch, which is approximately 21% faster than the closest competitor, ST-GCN. Moreover, the model’s memory usage was 13% lower than LSTM, demonstrating its ability to handle larger datasets without excessive hardware demands.

Table 3.

Computational Efficiency Analysis

Model	Training Time (s/epoch)	Inference Time (ms/batch)	GPU Memory (GB)
Proposed Model	1.8	42	6.2
ST-GCN	2.6	53	6.8
T-GCN	2.9	56	7.1
LSTM	3.1	59	7.2

Long-Term Prediction Accuracy. The model's ability to predict over extended horizons was tested with a 60-minute prediction window. Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) were evaluated for prediction intervals of 15, 30, and 60 minutes. As shown in Figure 2, the proposed model consistently outperformed baseline models, with an MAE of 3.12 mph at the 60-minute horizon, compared to 3.45 mph for ST-GCN and 3.74 mph for LSTM. The model’s superior long-term performance can be attributed to its dynamic attention mechanism, which prioritizes critical temporal features and minimizes error propagation over extended time horizons.

Error Trend Analysis. Figure 3 illustrates the error trend over increasing prediction horizons. While all models show a gradual increase in error as the horizon extends, the proposed model exhibits a more stable trend with smaller increments between 15, 30, and 60 minutes. This stability demonstrates the model's superior capability to capture long-term dependencies compared to baseline methods, making it suitable for real-world applications requiring extended predictions.

Insights. The results from computational efficiency and long-term prediction evaluations underscore the practical advantages of the proposed model. Its computational efficiency ensures rapid deployment and adaptability in resource-constrained environments, while its robust long-term prediction accuracy highlights its ability to handle extended forecast horizons.

3.5

Discussion of Results

The experimental results clearly demonstrate the superiority of the proposed model in predicting traffic flow across varying time horizons, as evidenced by its consistently lower MAE compared to benchmark models such as ST-GCN, T-GCN, and LSTM. The scalability and robustness evaluation highlighted the model’s ability to adapt to datasets of different sizes and complexities, maintaining high prediction accuracy even under scenarios of increased data volume and dynamic variations in traffic patterns. This advantage can be attributed to the effective integration of spatial-temporal feature extraction with deep neural architectures, which allows the model to capture both local and global traffic dynamics. Furthermore, the computational efficiency and long-term prediction experiments revealed that the proposed model not only reduces training time but also achieves reliable predictions over extended periods. This is primarily due to the optimization techniques employed during training, including gradient-based tuning and loss function customization, which ensure convergence to optimal solutions without overfitting. By leveraging its ability to learn spatial dependencies through graph convolution and temporal patterns through sequence modeling, the model inherently overcomes the limitations of traditional approaches that often treat spatial and temporal features independently. The model's robustness to noise and its ability to generalize across various urban networks emphasize its practical applicability in real-world settings. These results highlight the potential of the proposed approach to tackle key challenges in traffic flow prediction, providing a reliable, scalable, and computationally efficient solution for complex urban road systems.

4

Conclusion

This study presented a deep learning model for predicting traffic flow in urban road networks,addressing key challenges such as scalability, robustness, and computational efficiency. By integrating spatial-temporal feature extraction with graph-based convolution and sequence modeling, the model effectively captures complex traffic dependencies. Extensive experiments demonstrated its advantages over state-of-the-art methods, delivering higher prediction accuracy, improved computational efficiency, and better scalability across diverse datasets. Additionally, the model's resilience to noise and ability to generalize across varying traffic conditions highlight its practical relevance for real-world urban applications. This work provides a scalable and adaptive solution to meet the growing demands of intelligent transportation systems, supporting optimized urban traffic management. Future efforts will focus on enhancing the model with multi-modal data sources and exploring its use in multi-objective optimization for smarter urban mobility.

Language:: English

Publication timeframe:: 1 times per year
Journal Subjects:: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Journal RSS Feed

Traffic Flow Prediction Using Deep Learning Techniques in Urban Road Networks

Yilin Han

Published Online: Mar 17, 2025

Received: Oct 15, 2024

Accepted: Feb 12, 2025

DOI: https://doi.org/10.2478/amns-2025-0832

KeywordsTraffic Flow Prediction, Deep Learning, Urban Road Networks, Graph Neural Networks (GNN), Long Short-Term Memory (LSTM), Transformer Models, Smart City, Spatial-Temporal Analysis

© 2025 Yilin Han, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.

Keywords
Traffic Flow Prediction, Deep Learning, Urban Road Networks, Graph Neural Networks (GNN), Long Short-Term Memory (LSTM), Transformer Models, Smart City, Spatial-Temporal Analysis