Research on Collection and Preprocessing Strategies of Traffic Data Driven by Big Data 
Published: 21 Mar 2025
Received: 07 Nov 2024
Accepted: 08 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0613
© 2025 Hongyu Shi, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Continuously developing information technology has also updated the means of collecting transportation data, allowing traffic data to be collected from richer sources and broadening the areas in which the data can be applied. Traffic data today span multiple kinds of datasets [1-3]. Static traffic data describe the transportation system in a stable state over a certain period of time and mainly comprise facility information, traffic volume, and similar content, while dynamic traffic data comprise continuously changing information such as traffic control information and real-time traffic flow information [4-6].
Intelligent transportation is a product of the fusion of electronic information technology and transportation technology, which is the best means to solve urban traffic congestion, improve driving safety, and improve travel efficiency [7-8]. Intelligent transportation system (ITS) is an inevitable trend in the development of the transportation industry, and the collection and processing of real-time dynamic traffic information is a prerequisite for the realization of ITS. If there is no massive traffic information collection and processing as a support, ITS will only stay in the conceptual stage [9-10].
To understand big data, one must first understand cloud computing. Cloud computing is a type of computing developed in recent years. It is a support platform for processing big data and a systematic undertaking of the big data era, and it is significantly affecting the development of information technology and of transportation industry services [11-13].
Data preprocessing is the processing applied to data before data mining. Its main working principle is to carry out a series of operations on the original data, such as the necessary cleaning, integration, transformation, discretization, and reduction, so that the data meet the minimum specifications and standards that data mining algorithms require for knowledge acquisition and research [14-16]. The change process of dynamic traffic data is a real-time, nonlinear, high-dimensional, non-stationary stochastic process whose uncertainty, randomness, and incompleteness vary with factors such as road and time, so research on data preprocessing is required [17-18]. In a complete data mining process, preprocessing takes about 60% of the time. As a key step in improving data quality, data preprocessing is important for the following reasons. First, the massive data of ITS are large in volume and widely distributed, so traditional manual elimination and screening methods are not suitable. Second, long periods of continuous operation in the field greatly increase the chance of errors and failures in traffic detection equipment [19-21]. Furthermore, different systems and users have different requirements for data quality and accuracy and need different preprocessing methods for real-time traffic data. Common data preprocessing techniques are data cleaning, data integration, data transformation, and data reduction. Data cleaning consists of filling in missing values, smoothing noisy data, identifying or removing outliers, and correcting inconsistencies in the data [22-23].
Data integration refers to merging data from multiple data sources into a consistent data store. Data transformation means converting data into a form suitable for mining, for example by scaling attribute values so that they fall within a small, specific interval. Data reduction compresses the data through numerical aggregation and the removal of redundant features without affecting the mining results, which improves the quality of the mining model and reduces time complexity. It should be noted that these approaches are interrelated [24-25]. For example, the removal of redundant data is both a form of data cleaning and a form of data reduction, and data cleaning still needs to be carried out after the data have been integrated.
The main applications of big data technology in intelligent transportation systems include traffic data collection, data analysis and prediction, and data storage. Among studies of the data sources of intelligent transportation systems based on big data technology, literature [26] classifies the data sources used for traffic flow prediction, quantifies the contribution of data sources such as social media and cellular networks, and compares traffic prediction data sources in terms of reliability, adequacy, and operation and maintenance costs. Some scholars have focused on data collection and mining strategies in Intelligent Transportation Systems (ITS) from the perspectives of algorithms, tools, and so on. Literature [27] conceived a big data analytics architecture for ITS that supports the analysis and storage of data collected from multimodal transportation sensors, and verified the feasibility of the proposed architecture using Hadoop. Literature [28] introduces an automatic vehicle detection and counting strategy based on distributed fiber optic sensors, confirms its effectiveness through performance tests, and also proposes a rough vehicle classification scheme that distinguishes heavy from light vehicles. The most popular research direction is how to use big data technology to preprocess and predict traffic data for control. Literature [29] discusses problems related to traffic data acquisition and dynamic processing and proposes combining software analysis with visualization techniques to predict and analyze traffic flow in each time period, providing an important reference for researchers in the field of transportation.
Literature [30] addresses issues in modern traffic cyber-physical networks such as network fitness, the complexity of data types, data collection, and data transmission. It proposes an intelligent traffic information fusion cloud control system scheme and confirms through simulation experiments that the scheme performs well in predicting and controlling urban traffic flow. Literature [31] compared the performance of the full sample demand distribution model, the traffic integration model, and the model biological protein expression database model in traffic flow prediction, emphasized that the development of big data technology has brought new opportunities and challenges to the field, and pointed out that the reasonable use of big data technology is a key future research direction in traffic flow prediction. Finally, some researchers have analyzed the value of traffic data based on big data technology, intelligent transportation system communication based on big data technology, and related issues. Literature [32] discusses the application potential of the spatio-temporal big data generated in the field of transportation, including the prediction of individual travel demand and of transportation demand, and its use as a reference for the planning and construction of urban transportation networks. Literature [33] envisioned an intelligent transportation system architecture with big data analytics as its core logic to support data processing and friendly communication in intelligent transportation, and confirmed the effectiveness of the proposed scheme in data evaluation tests with multiple input libraries.
This paper analyzes the basic data system and information characteristics of traffic data, and processes the microwave data acquired by RTMS instruments on the expressway in M city. The preprocessed traffic flow data’s missing states are categorized. On the basis of considering the multidimensional spatio-temporal characteristics of lane traffic flow on urban expressways, tensor decomposition theory is introduced, and a tensor Tucker decomposition-based traffic flow missing data repair method (TDIM) is constructed. Five lane cross-sections in M city are selected as the research objects, and the tensor input model of expressway lane correlation is constructed, and the repair effect of the TDIM method is examined by constructing the missing tensor.
According to how the data are obtained, traffic big data can be subdivided into four types: traffic flow monitoring data, traffic floating vehicle monitoring data, traffic monitoring video data, and traffic service data. These data include both unstructured data, such as video and images, and structured data. Because the sources and information characteristics of the data differ, the visual analysis of traffic big data is likewise diverse.
Traffic flow monitoring data refers to the information about traffic flow and passenger flow collected by vehicle network sensor devices laid on urban road networks. This information is generally obtained through a variety of sensors, such as loop coil detectors, ultrasonic coil detectors, magnetic coil detectors, microwave ultraviolet detectors, and so on.
Floating vehicles are ordinary vehicles equipped with satellite positioning and communication control devices. While driving, floating vehicles can provide real-time information such as driving time, driving direction, driving speed, and position coordinates. The dynamic data obtained through floating vehicle collection equipment have the technical advantages of high efficiency, flexibility, and low cost, and can continuously provide a large number of valuable data samples for traffic visual analysis.
Traffic monitoring video data are obtained mainly by establishing a digital monitoring network covering highways, major traffic arteries, and intersections. The monitoring data are acquired in real time and uploaded to the vehicle scheduling center, where a visualization system is used to monitor vehicles. Based on the video monitoring data, vehicle characteristics can be identified, vehicle information matched, and vehicle locations predicted, which provides a data source for traffic visual analysis.
Traffic service data mainly refers to a series of unstructured traffic service information released by the media to inform travelers of current traffic conditions and assist in travel planning. Similar to traditional traffic data, traffic service data is unstructured data, and using it also requires structured processing to convert it into text or image information that users can recognize.
1) Spatio-temporal mobility
Traffic data is dynamic and shows how the trajectory of a moving object changes across locations. Since a traffic target cannot generate a trajectory at a single point in time, the change process of a traffic entity must be described over continuous points in time. The movement of a traffic entity can be regarded as a series of space-time pairs. Data sensing equipment can record a huge number of such pairs (points) and thereby generate the movement trajectory of the target entity, from which traffic situation information can be perceived through visualization.
2) Multi-source heterogeneity
Multi-source refers to the wide range of traffic information sources, data types, and formats. The information sources, information forms, times and places of occurrence, and information users all differ greatly. Heterogeneity mainly manifests itself in different forms of data and information expression, with different levels of certainty and different standard formats. Because traffic data come from different collection devices and application systems with different interface standards and storage formats, traffic data contain a large amount of unstructured data as well as structured data with different attributes.
3) Cycle similarity
Although urban transportation is in constant dynamic change, within the same geographic location and time range the travel characteristics of residents show repetitiveness and regularity over a certain time cycle. Based on this cycle similarity, existing traffic sample data can be combined with visual analysis to predict future traffic trends.
4) Regional correlation
The structure of an urban transportation network is complex, and the traffic flows on different roads or in different regions often interact, merging and diverging, which produces a certain correlation between them. Based on correlation analysis of regional traffic flow, changes in the regional traffic structure can be studied to provide decision makers with a reference. The cycle similarity and regional correlation of traffic flow are the basis for predicting traffic dynamics.
The data source of this paper is the microwave data acquired by the RTMS instrument on the expressway of M city [34]. The data used in the subsequent analysis are the traffic flow data of one roadway in M city, collected from March 8, 2023 to March 22, 2023, a total of two weeks. The microwave data are collected every two minutes, i.e., the time interval between two adjacent records is 2 min, and each detector collects 800 records per day. The fields contained in the raw data are the detector number (POSID), the moment of data collection (TIMETAG), and the flow (VOLUME), speed (SPEED), and occupancy (OCCUPANCY) of each lane. The study section is a bidirectional 6-lane roadway divided into inner ring lanes and outer ring lanes, with a guardrail in the middle of the roadway serving as the central barrier. The three outer ring lanes are numbered L1, L2, and L3 from the median outward, and the three inner ring lanes are likewise numbered L11, L12, and L13 from the median outward.
This paper proposes the following method for the processing of timestamps [35]:
The time of a day is divided into 720 intervals according to the interval of 2 min, and each interval can be expressed as
I. If there exists a unique
II. If there are two or more
III. If there is no corresponding
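The three timestamp cases can be illustrated with a minimal Python sketch. The interval formula and the exact handling rules are not reproduced in the text above, so the choices here are assumptions made only for illustration: a unique record is kept, duplicates in one interval are averaged, and an empty interval is marked as missing with `None`. The names `slot_index` and `align_day` are hypothetical.

```python
from datetime import datetime
from statistics import mean

INTERVAL_MIN = 2                           # sampling interval stated in the text
SLOTS_PER_DAY = 24 * 60 // INTERVAL_MIN    # 720 intervals per day

def slot_index(ts: datetime) -> int:
    """Map a timestamp to its 2-minute interval (0..719) within the day."""
    return (ts.hour * 60 + ts.minute) // INTERVAL_MIN

def align_day(records):
    """Align (timestamp, value) records of one lane and one day to 720 slots.

    Assumed handling (not taken from the paper):
      case I   - exactly one record in the slot: keep it
      case II  - two or more records in the slot: average them
      case III - no record in the slot: mark as missing (None)
    """
    buckets = [[] for _ in range(SLOTS_PER_DAY)]
    for ts, value in records:
        buckets[slot_index(ts)].append(value)

    aligned = []
    for values in buckets:
        if len(values) == 1:
            aligned.append(values[0])
        elif len(values) >= 2:
            aligned.append(mean(values))
        else:
            aligned.append(None)
    return aligned

sample = [
    (datetime(2023, 3, 8, 0, 0, 30), 12),
    (datetime(2023, 3, 8, 0, 2, 5), 10),
    (datetime(2023, 3, 8, 0, 3, 50), 14),   # falls in the same slot as the previous record
]
day = align_day(sample)
```

Slots marked `None` are the positions that a later repair step would have to fill.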
Threshold theory refers to the identification of abnormal data by setting a reasonable critical value interval for the parameters and observing whether the actual detected values of the parameters are within the specified theoretical threshold range. In this paper, for any traffic flow parameter of flow rate, speed and occupancy, the actual detected values in the microwave data are screened by giving its reasonable value range, and the data points which are not within the threshold range are eliminated. This method can simply and quickly remove the data with abnormal values and make the values of each parameter of the study roadway realistic.
I. Flow rate:
where.
II. Speed:
Where, 
III. Occupancy:
Since the original data selected in this paper are microwave data, the occupancy data obtained from the detection is time occupancy, i.e., the proportion of the time in which a vehicle is detected to have passed over a period of time during the observation process to the total time of detection in that period, so it is expressed in the form of a percentage, and accordingly, the size of its value should be in the range of [0, 100%].
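A threshold screen of this kind can be sketched in a few lines of Python. Only the occupancy range [0, 100%] comes from the text; the upper bounds for flow and speed below are placeholder values chosen for illustration, and the record layout (a dict keyed by the field names VOLUME, SPEED, OCCUPANCY) is likewise an assumption.

```python
# Illustrative per-lane limits for one 2-minute record.
LIMITS = {
    "VOLUME": (0, 80),       # vehicles per 2 min; upper bound is an assumed placeholder
    "SPEED": (0, 120),       # km/h; upper bound is an assumed placeholder
    "OCCUPANCY": (0, 100),   # percent; range stated in the text
}

def within_thresholds(record, limits=LIMITS):
    """Return True if every parameter lies inside its allowed interval."""
    return all(low <= record[key] <= high for key, (low, high) in limits.items())

records = [
    {"VOLUME": 30, "SPEED": 62.0, "OCCUPANCY": 12.5},   # plausible
    {"VOLUME": 30, "SPEED": 250.0, "OCCUPANCY": 12.5},  # speed out of range
    {"VOLUME": -1, "SPEED": 62.0, "OCCUPANCY": 12.5},   # negative flow
    {"VOLUME": 10, "SPEED": 40.0, "OCCUPANCY": 130.0},  # occupancy above 100%
]
kept = [r for r in records if within_thresholds(r)]
```

Records that fail the screen are dropped here; in a repair pipeline they could instead be flagged as missing so that they are imputed later.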
A traffic flow mechanism is a theoretical representation of the correlation between different parameters and thus describes the operation of the traffic flow at the macro level. Even when the detected value of each parameter is within its threshold range, the values must also satisfy the logical relationships that characterize traffic flow. For example, suppose one of the parameters of a lane is 0 at a certain point in time. If the other two parameters are not 0, the record passes the threshold test but violates the logical relationship among the traffic flow parameters. Such a record is abnormal, possibly because a software or hardware failure prevented the data from being recorded, and it should be repaired or eliminated. If the other two parameters are also 0, no vehicle passed at that point in time and the detector detected nothing, which is consistent with the logical relationship among the parameters and with the operational characteristics of traffic flow, so the record is correct.
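The zero-value rule described above can be written as a small check. This is a sketch under one added assumption: the text only discusses the cases "one zero, two nonzero" and "all zero", and the function below also treats "two zeros, one nonzero" as abnormal. The function name is hypothetical.

```python
def is_logically_consistent(volume, speed, occupancy):
    """Check the zero-value rule for one lane record.

    Consistent: all three parameters are zero (no vehicle passed), or none is zero.
    Abnormal:   a mix of zero and nonzero values.
    """
    zeros = [volume == 0, speed == 0, occupancy == 0]
    return all(zeros) or not any(zeros)

checks = {
    "empty interval": is_logically_consistent(0, 0, 0),        # valid
    "normal record": is_logically_consistent(25, 58.0, 9.5),   # valid
    "zero speed only": is_logically_consistent(25, 0, 9.5),    # abnormal
    "zero flow only": is_logically_consistent(0, 58.0, 9.5),   # abnormal
}
```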
Depending on the differences in the collection methods of traffic speed time series data, data correlation analysis can be performed in two dimensions: the dimension of adjacent time series and the dimension of cycle time series. Adjacent time series refers to the time series of traffic data for the same road section in a continuous and uninterrupted time range. Taking the traffic speed dataset of M city as an example, the traffic speed data of its same road section for the week of March 1 to March 7, 2023 is analyzed. The speed time series plot in adjacent time series mode is shown in Figure 1. Analyzing the data observations, it can be learned that in the consecutive one-week time period, the value of the traffic speed of the same road does not change much, and the overall trend of change is similar.

Figure 1. Speed data for one week on a road in M city
Next, the correlation between the daily traffic speed series for this road section was calculated using the Pearson correlation coefficient. The coefficient values for the weekly speed data are shown in Table 1. The results show that the Pearson correlation coefficients among the weekday (Monday to Friday) speed data lie between 0.8915 and 0.9343, a high linear correlation, which indicates that weekday speed data can serve as reference data for the data repair model. The coefficient between the two weekend days is 0.7867, also a high degree of linear correlation, so the weekend speed data are likewise significantly similar and can also serve as reference data for the traffic data interpolation model. However, in conjunction with the circled example in Fig. 1, the coefficients between weekdays and weekend days are in the range 0.6116-0.684, a significant but weaker linear correlation, because the peak traffic hours start and end at different times in the two periods. Therefore, although weekday and weekend traffic data are similar to a certain degree, the similarity is not high enough for them to serve well as reference data for each other in the interpolation model.
Table 1. Pearson correlation coefficients of the speed data
| ρ | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
|---|---|---|---|---|---|---|---|
| Monday | 1 | | | | | | |
| Tuesday | 0.9324 | 1 | | | | | |
| Wednesday | 0.9245 | 0.9343 | 1 | | | | |
| Thursday | 0.9211 | 0.8915 | 0.9319 | 1 | | | |
| Friday | 0.9141 | 0.9187 | 0.9123 | 0.9275 | 1 | | |
| Saturday | 0.635 | 0.6555 | 0.6569 | 0.6465 | 0.684 | 1 | |
| Sunday | 0.6122 | 0.6116 | 0.6382 | 0.6271 | 0.6143 | 0.7867 | 1 |
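A correlation table of this kind can be computed with NumPy. The sketch below uses synthetic daily speed profiles (720 samples per day with two peak-hour dips), not the M city data, so the resulting values are illustrative only. The helper `pearson` spells out the coefficient and `np.corrcoef` produces the full matrix.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

rng = np.random.default_rng(0)
t = np.arange(720)                                  # 2-minute slots of one day
# Synthetic weekday speed profile in km/h with morning and evening dips (assumed shape).
base = 60 - 20 * np.exp(-((t - 240) / 40) ** 2) - 15 * np.exp(-((t - 540) / 50) ** 2)
days = np.stack([base + rng.normal(0, 2, t.size) for _ in range(3)])

rho = np.corrcoef(days)    # rho[i, j] is the coefficient between day i and day j
```

Each row of `days` plays the role of one day's speed series, and the lower triangle of `rho` corresponds to the layout of the table above.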
Cycle time series, on the other hand, refers to the time series of traffic data on the same roadway on the same day of the week. As an example, the traffic speed data of the same roadway in City M on four consecutive Wednesdays in March 2023 are analyzed. The speed time series plot in cycle time series mode is shown in Figure 2. The observations show that the traffic speeds on the same roadway do not differ much on the same weekday across the weeks of a month, and the overall trends are similar.

Figure 2. Speed time series plot in cycle time series mode
The correlation between the traffic speed data for the same weekday in different weeks was calculated using the Pearson correlation coefficient. The coefficients for the Wednesday speed data of four consecutive weeks are shown in Table 2. The values range from 0.7219 to 0.8316, a significant linear correlation, which indicates that traffic speed data from the same weekday in other weeks of the month can be considered as reference data for the data repair model.
Table 2. Pearson correlation coefficients of the speed data for four consecutive weeks
| ρ | Week 1 | Week 2 | Week 3 | Week 4 |
|---|---|---|---|---|
| Week 1 | 1 | 0.8316 | 0.7792 | 0.7936 |
| Week 2 | | 1 | 0.7458 | 0.7219 |
| Week 3 | | | 1 | 0.8248 |
| Week 4 | | | | 1 |
Also based on the above idea, in analyzing the spatial correlation of traffic data, this section delves into the traffic speed data of five adjacent roadway segments on the day of 7/3/2023. The adjacent roadway speed data are shown in Figure 3, which depicts the curves of traffic speed over time for these five adjacent roadways.

Figure 3. Adjacent road speed data
The Pearson correlation coefficients for the speed data of neighboring roads are shown in Table 3. The coefficients between the five selected road sections range from 0.5075 to 0.8714, indicating that the speed data of these road sections are significantly linearly correlated. Meanwhile, the correlation between different road segments changes with spatial distance. For example, the correlation between road 4 and road 5 (0.8714) is much higher than that between road 3 and road 4 (0.5075), and similarly the correlations among roads 1, 2, and 3 (0.8249-0.8608) are higher than those between any of them and road 4 or road 5 (0.5075-0.5286). In summary, the traffic speed data of different road sections show a certain correlation, and the degree of this correlation is affected by the spatial structure of the road network, so the traffic data of adjacent roads should be selected in a targeted way as reference data for the data repair model.
Table 3. Pearson correlation coefficients of the adjacent road speed data
| ρ | Road 1 | Road 2 | Road 3 | Road 4 | Road 5 |
|---|---|---|---|---|---|
| Road 1 | 1 | | | | |
| Road 2 | 0.8608 | 1 | | | |
| Road 3 | 0.8249 | 0.8299 | 1 | | |
| Road 4 | 0.5107 | 0.5145 | 0.5075 | 1 | |
| Road 5 | 0.5286 | 0.5181 | 0.5153 | 0.8714 | 1 |
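A tensor of order $N$ is denoted $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, and its elements are denoted $x_{i_1 i_2 \cdots i_N}$.
Freely varying two of the tensor subscript variables and fixing the other subscripts yields a slice of the tensor, i.e., for a third-order tensor, the slices $\mathcal{X}_{i::}$, $\mathcal{X}_{:j:}$, and $\mathcal{X}_{::k}$.
The matricization of mode $n$ of a tensor $\mathcal{X}$, written $\mathbf{X}_{(n)} \in \mathbb{R}^{I_n \times \prod_{k \neq n} I_k}$, rearranges the elements of the tensor into a matrix whose rows are indexed by $i_n$.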
The expansion matrix of the third-order tensor model [36] is shown in Fig. 4.

Figure 4. The expansion matrix of the third-order tensor model
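Let the third-order tensor be $\mathcal{X} \in \mathbb{R}^{3 \times 3 \times 2}$; then the mode-1, mode-2, and mode-3 expansion matrices are of size $3 \times 6$, $3 \times 6$, and $2 \times 9$, respectively.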
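The mode-n expansion of a 3×3×2 tensor can be tried out in NumPy. This is a minimal sketch that assumes the common column ordering in which the earliest remaining index varies fastest; other orderings exist and only permute the columns. The helper names `unfold` and `fold` are hypothetical.

```python
import numpy as np

def unfold(X, mode):
    """Mode-n expansion: rows are indexed by `mode`; the remaining indices are
    flattened with the earliest index varying fastest (assumed convention)."""
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order="F")

def fold(M, mode, shape):
    """Inverse of `unfold` for a tensor of the given shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(np.reshape(M, full, order="F"), 0, mode)

# Entries 1..18 arranged so that X[:, :, 0] holds 1..9 and X[:, :, 1] holds 10..18.
X = np.arange(1, 19).reshape(3, 3, 2, order="F")

X1 = unfold(X, 0)   # 3 x 6
X2 = unfold(X, 1)   # 3 x 6
X3 = unfold(X, 2)   # 2 x 9
```

Under this convention the first row of `X1` is `[1, 4, 7, 10, 13, 16]`, and the two rows of `X3` are the two frontal slices flattened column by column.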
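Definition 1: Tensor inner product
The inner product of two Nth-order tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is the sum of the products of their corresponding elements: $\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{i_1 i_2 \cdots i_N}\, y_{i_1 i_2 \cdots i_N}$.
Definition 2: Frobenius norm of a tensor
The Frobenius norm of a tensor $\mathcal{X}$ is $\|\mathcal{X}\|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}$.
Definition 3: Kronecker product of matrices
If we have matrix $\mathbf{A} \in \mathbb{R}^{I \times J}$ and matrix $\mathbf{B} \in \mathbb{R}^{K \times L}$, their Kronecker product $\mathbf{A} \otimes \mathbf{B} \in \mathbb{R}^{IK \times JL}$ is the block matrix whose $(i, j)$ block is $a_{ij}\mathbf{B}$.
Definition 4: Khatri-Rao product of matrices
If there is a matrix $\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_K] \in \mathbb{R}^{I \times K}$ and a matrix $\mathbf{B} = [\mathbf{b}_1, \ldots, \mathbf{b}_K] \in \mathbb{R}^{J \times K}$, their Khatri-Rao product is the column-wise Kronecker product $\mathbf{A} \odot \mathbf{B} = [\mathbf{a}_1 \otimes \mathbf{b}_1, \ldots, \mathbf{a}_K \otimes \mathbf{b}_K] \in \mathbb{R}^{IJ \times K}$.
Definition 5: Hadamard product of tensors
If two tensors $\mathcal{X}$ and $\mathcal{Y}$ have the same dimensions, their Hadamard product $\mathcal{X} * \mathcal{Y}$ multiplies corresponding elements: $(\mathcal{X} * \mathcal{Y})_{i_1 i_2 \cdots i_N} = x_{i_1 i_2 \cdots i_N}\, y_{i_1 i_2 \cdots i_N}$.
Definition 6: n-mode product of a tensor
If there is a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and a matrix $\mathbf{U} \in \mathbb{R}^{J \times I_n}$, their n-mode product $\mathcal{X} \times_n \mathbf{U} \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$ has elements $(\mathcal{X} \times_n \mathbf{U})_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 i_2 \cdots i_N}\, u_{j i_n}$.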
To facilitate the illustration of the 
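Definition 7: Weighted norm of a tensor
If there is a tensor $\mathcal{X}$ and a non-negative weight tensor $\mathcal{W}$ of the same dimensions, the weighted norm of $\mathcal{X}$ is $\|\mathcal{X}\|_{\mathcal{W}} = \|\mathcal{W} * \mathcal{X}\|_F$.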
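The operations named in Definitions 1-7 have direct NumPy counterparts. The sketch below follows the standard textbook forms of these operations, which is an assumption here because the formulas themselves are not reproduced in the text. In particular, the weighted norm is taken to be the Frobenius norm of the element-wise product with a weight tensor. All variable names are illustrative.

```python
import numpy as np

def mode_n_product(X, U, mode):
    """n-mode product: multiply tensor X by matrix U (J x I_n) along `mode`."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, mode)), 0, mode)

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 4, 2))
Y = rng.normal(size=(3, 4, 2))
W = (rng.random((3, 4, 2)) > 0.3).astype(float)   # binary weights: 1 = observed

inner = float(np.sum(X * Y))                  # Definition 1: inner product
fro = float(np.sqrt(np.sum(X * X)))           # Definition 2: Frobenius norm
A = rng.normal(size=(2, 3))
B = rng.normal(size=(4, 3))
kron = np.kron(A, B)                          # Definition 3: shape (8, 9)
khatri_rao = np.stack(                        # Definition 4: column-wise Kronecker, shape (8, 3)
    [np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])], axis=1
)
hadamard = X * Y                              # Definition 5: element-wise product
U = rng.normal(size=(5, 4))
Z = mode_n_product(X, U, 1)                   # Definition 6: shape (3, 5, 2)
weighted = float(np.linalg.norm(W * X))       # Definition 7 (assumed form)
```

A quick consistency check for the n-mode product is that every mode-1 fiber is transformed by the matrix, i.e. `Z[i, :, k]` equals `U @ X[i, :, k]`.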
Tensor Tucker decomposition is the representation of a tensor as a kernel tensor multiplied by a matrix along each mode. For a third order tensor 
The Tucker decomposition extends the matrix singular value decomposition to higher-dimensional space, and it is therefore also called the higher-order singular value decomposition (HOSVD).
The Tucker decomposition of a third-order tensor X ∈ ℝ

Figure 5. Tucker decomposition of third-order tensors
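As an illustration of a kernel tensor multiplied by a matrix along each mode, the following sketch computes a truncated HOSVD with NumPy and rebuilds the tensor from its factors. It is a generic textbook construction, not the paper's MATLAB implementation, and the helper names are hypothetical.

```python
import numpy as np

def unfold(X, n):
    """Mode-n expansion of X into an (I_n, product of the other dims) matrix."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

def mode_n_product(X, U, n):
    """Multiply tensor X by matrix U (J x I_n) along mode n."""
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def hosvd(X, ranks):
    """Truncated HOSVD: returns the kernel tensor S and factor matrices."""
    factors = []
    for n, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, n), full_matrices=False)
        factors.append(U[:, :r])            # leading r left singular vectors of mode n
    S = X
    for n, U in enumerate(factors):
        S = mode_n_product(S, U.T, n)       # project onto each factor basis
    return S, factors

def reconstruct(S, factors):
    """Rebuild the tensor as S x_1 U1 x_2 U2 x_3 U3."""
    X = S
    for n, U in enumerate(factors):
        X = mode_n_product(X, U, n)
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5, 3))
S_full, U_full = hosvd(X, ranks=(4, 5, 3))     # no truncation: exact reconstruction
S_small, U_small = hosvd(X, ranks=(2, 2, 2))   # truncated: low-rank approximation
approx_error = np.linalg.norm(X - reconstruct(S_small, U_small))
```

With full ranks the factors are orthogonal and the reconstruction is exact; with smaller ranks the kernel tensor shrinks to the chosen size and the reconstruction becomes an approximation.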
The Tucker decomposition process for the third-order tensor 
When the individual dimensions of the kernel tensor are the same and diagonal, i.e., 

Figure 6. Relationship between CP decomposition model and Tucker decomposition
If the above Tucker decomposition model is generalized to any higher order tensor, the Tucker decomposition model for the Nth order tensor 
The expansion matrix is of the following form:
Suppose A is the original tensor of dimension 
Eq. 
The expression for the TDIM model is shown below:
The goal of the TDIM model is to obtain the kernel tensor 
Higher-order singular value decomposition (HOSVD) [37] is applied to determine the initial values as follows:
For a given (
The gradient of F(S,
The initial value of the kernel matrix 
The following equation can be obtained by making M = W*A,N = W*(
Since the values of W and A do not change during the iteration, M can be obtained by precomputation. Then the partial derivative of the objective function F(
To implement the TDIM traffic flow data repair algorithm, this paper uses the MATLAB programming platform together with the Tensor Toolbox 2.6 tensor processing package to solve 
The specific steps of the TDIM traffic flow missing data repair algorithm proposed in this paper are shown below.
Step 1: Data input
Input data include: original traffic data tensor A , missing data location tensor 
Compute M = W*A.
Step 2: Initialization
Apply HOSVD to compute the initial tensor 
Step 3: Iterative process
For 
Calculation N = W*(
Calculate 
Compute function gradients 
Calculate 
If 
End for
Step 4: Data Output
Output the estimation tensor 
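The four steps can be sketched in NumPy as follows. Several details are assumptions rather than the paper's formulas: the objective is taken to be half the squared Frobenius norm of W*(A - S x1 U1 x2 U2 x3 U3) with a binary W, and a plain gradient descent with backtracking stands in for the paper's update rule. The function name `tdim_sketch` is hypothetical. Missing entries of A are expected to hold finite placeholders such as 0.

```python
import numpy as np

def unfold(X, n):
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order="F")

def mode_n_product(X, U, n):
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def tucker(S, Us, skip=None):
    """S x_1 U1 x_2 U2 ...; mode `skip` (if given) is left untouched."""
    X = S
    for n, U in enumerate(Us):
        if n != skip:
            X = mode_n_product(X, U, n)
    return X

def objective(M, W, S, Us):
    """Assumed objective: 0.5 * ||W*(S x1 U1 x2 U2 x3 U3) - M||_F^2."""
    return 0.5 * np.sum((W * tucker(S, Us) - M) ** 2)

def tdim_sketch(A, W, ranks, iters=200, tol=1e-8):
    M = W * A                                            # Step 1: M = W*A
    Us = [np.linalg.svd(unfold(M, n), full_matrices=False)[0][:, :r]
          for n, r in enumerate(ranks)]                  # Step 2: HOSVD initial values
    S = tucker(M, [U.T for U in Us])
    f = objective(M, W, S, Us)
    history = [f]
    for _ in range(iters):                               # Step 3: iterate
        R = W * tucker(S, Us) - M                        # N - M, N = W*(S x1 U1 x2 U2 x3 U3)
        gS = tucker(R, [U.T for U in Us])                # gradient w.r.t. the kernel tensor
        gUs = [unfold(R, n) @ unfold(tucker(S, Us, skip=n), n).T
               for n in range(len(Us))]                  # gradients w.r.t. the factors
        step = 1.0
        while step > 1e-12:                              # backtracking: halve until F decreases
            S_new = S - step * gS
            Us_new = [U - step * g for U, g in zip(Us, gUs)]
            f_new = objective(M, W, S_new, Us_new)
            if f_new < f:
                break
            step *= 0.5
        else:
            break                                        # no decreasing step found
        converged = f - f_new < tol * max(f, 1.0)
        S, Us, f = S_new, Us_new, f_new
        history.append(f)
        if converged:
            break
    X_hat = tucker(S, Us)
    return W * A + (1 - W) * X_hat, history              # Step 4: keep observed, fill missing

rng = np.random.default_rng(0)
truth = tucker(rng.normal(size=(2, 2, 2)), [rng.normal(size=(d, 2)) for d in (8, 7, 6)])
W = (rng.random(truth.shape) > 0.3).astype(float)        # 1 = observed, 0 = missing
completed, history = tdim_sketch(truth * W, W, ranks=(2, 2, 2))
```

The output keeps every observed value unchanged and takes only the missing positions from the low-rank estimate, which matches the role of the missing-data location tensor in Step 1.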
In this paper, two types of traffic data are selected as experimental data, namely traffic flow data and traffic speed data. The superiority of Tucker’s tensor decomposition algorithm can also be verified by comparing the recovery results with those of five missing value recovery methods.
In order to verify the superiority of the algorithm in this paper, typical missing value recovery algorithms such as LRMC, PPCA, LLS, kernel sparse representation with elastic net regularization (KSR-EN), and the high accuracy low rank tensor completion method (HaLRTC) are introduced for comprehensive comparison. Among them, LRMC and HaLRTC are matrix completion and tensor completion methods, respectively. PPCA is based on a probabilistic generative model, and LLS uses a linear regression model for imputation. KSR-EN is a missing value recovery model based on augmented sparse representation that combines kernel learning with elastic net regularization and uses a kernel algorithm to build sparse representations in a high-dimensional nonlinear space. The parameters of each method are tuned to their optimal values during the experimental validation. In order to reduce random effects, each trial is repeated 10 times, and the average value is taken as the final experimental result. Taking the traffic flow data as an example, the experimental results in the MCAR, MIXED, and MAR missing modes are shown in Table 4.
Table 4. Experimental results in the MCAR, MIXED, and MAR missing modes
| Traffic flow data | δ | LRMC | PPCA | LLS | KSR-EN | HaLRTC | TDIM |
|---|---|---|---|---|---|---|---|
| MCAR | 0.2 | 81.83 | 77.23 | 73.84 | 68.25 | 64.04 | 60.55 |
| | 0.3 | 83.2 | 81.57 | 81.42 | 70.61 | 65.91 | 60.34 |
| | 0.4 | 87.41 | 83.88 | 88.83 | 75.76 | 71.19 | 64.07 |
| | 0.5 | 89.1 | 88.86 | 100.5 | 79.92 | 73.39 | 66.15 |
| | 0.6 | 95.59 | 93.23 | 113.49 | 86.41 | 77.8 | 71.99 |
| MIXED | 0.2 | 88.82 | 81.86 | 80.25 | 71.62 | 67.99 | 61.87 |
| | 0.3 | 91.48 | 87.66 | 84.98 | 76.34 | 69.17 | 62.72 |
| | 0.4 | 95.65 | 89.45 | 93.21 | 80.68 | 71.05 | 64.4 |
| | 0.5 | 97.36 | 94.29 | 102.48 | 87.28 | 74.92 | 69.69 |
| | 0.6 | 103.23 | 100.61 | 117.36 | 94.83 | 79.79 | 73.52 |
| MAR | 0.2 | 98.81 | 90.37 | 84.32 | 77.2 | 68.15 | 66.23 |
| | 0.3 | 100.3 | 96.04 | 90.09 | 79.94 | 71.93 | 66.61 |
| | 0.4 | 102.17 | 98.61 | 100.03 | 86.83 | 74.93 | 72.39 |
| | 0.5 | 107.4 | 101.79 | 112.08 | 91.53 | 76.77 | 74.87 |
| | 0.6 | 108.91 | 105.64 | 129.41 | 99.1 | 85.72 | 79.6 |
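An experiment of this kind needs a way to construct the missing tensor at a given missing rate δ and to score the recovered values. The sketch below is illustrative only. It assumes that MCAR drops entries independently at random, that runs of consecutive time steps can stand in for the successive losses of the MAR mode, and that the error is a root mean square error over the missing entries. The section does not name the error metric, so RMSE is an assumption, and the function names are hypothetical.

```python
import numpy as np

def mcar_mask(shape, rate, rng):
    """1 = observed, 0 = missing; every entry is dropped independently."""
    return (rng.random(shape) >= rate).astype(float)

def block_mask(shape, rate, block_len, rng):
    """Drop runs of `block_len` consecutive time steps (last axis) until at
    least `rate` of all entries is missing (assumed stand-in for successive loss)."""
    W = np.ones(shape)
    n_time = shape[-1]
    target = int(round(rate * W.size))
    while W.size - int(W.sum()) < target:
        idx = tuple(rng.integers(0, s) for s in shape[:-1])
        start = rng.integers(0, n_time - block_len + 1)
        W[idx + (slice(start, start + block_len),)] = 0.0
    return W

def rmse_on_missing(truth, estimate, W):
    """Root mean square error over the entries marked missing in W."""
    miss = W == 0
    return float(np.sqrt(np.mean((truth[miss] - estimate[miss]) ** 2)))

rng = np.random.default_rng(0)
shape = (5, 7, 60)                    # e.g. lanes x days x time slots (illustrative sizes)
W_mcar = mcar_mask(shape, 0.3, rng)
W_block = block_mask(shape, 0.4, block_len=10, rng=rng)
truth = np.ones(shape)
zero_fill_error = rmse_on_missing(truth, np.zeros(shape), W_mcar)
```

Scoring only the missing positions keeps the metric from being diluted by observed entries, which every method reproduces exactly.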
Meanwhile, in order to illustrate the recovery results more intuitively and fully demonstrate the degree of fit between the recovered values and the real values, the recovery results of the different methods are compared in Fig. 7 for traffic flow data in the MIXED missing mode and 
1) In general, data recovery is least difficult in the MCAR missing mode, while MAR is the most difficult case to recover. This is reasonable because successive losses remove a lot of valuable information, which increases the difficulty of accurate recovery. It can also be observed that the recovery error of each method increases with the missing rate.
2) The poor recovery results of the LRMC, PPCA, and LLS methods may be due to the fact that their a priori assumptions cannot be satisfied on this dataset. The KSR-EN and HaLRTC methods outperform these methods in modeling complex spatio-temporal data. Moreover, the recovery performance of HaLRTC is better than that of KSR-EN, which implies that multidimensional correlation modeling of traffic data facilitates the recovery of missing values.
3) In Table 4, the method proposed in this paper outperforms HaLRTC and all other compared methods at every missing rate and in every missing mode. The recovery errors obtained by this paper's algorithm are 34%-42.12%, 25.89%-35.82%, 29.58%-45.37%, 25.72%-31.94%, and -5.16%-4.47% lower than those of the LRMC, PPCA, LLS, KSR-EN, and HaLRTC algorithms, respectively, under the three datasets.

Figure 7. Comparison of the recovery results of different methods
The experimental results for traffic speed data under different missing modes and missing rates are shown in Table 5, and a similar phenomenon can be observed. In summary, the experimental results show that the method presented in this paper can effectively improve the recovery performance of the low-rank tensor completion method for spatio-temporal traffic data.
Table 5. Experimental results for traffic speed data under different missing modes and missing rates
| Traffic speed data | δ | LRMC | PPCA | LLS | KSR-EN | HaLRTC | TDIM |
|---|---|---|---|---|---|---|---|
| MCAR | 0.2 | 4.446 | 4.161 | 3.982 | 3.702 | 2.738 | 2.804 |
| | 0.3 | 4.522 | 4.402 | 4.403 | 3.833 | 2.842 | 2.792 |
| | 0.4 | 4.756 | 4.530 | 4.815 | 4.119 | 3.135 | 2.999 |
| | 0.5 | 4.850 | 4.807 | 5.463 | 4.350 | 3.257 | 3.115 |
| | 0.6 | 5.211 | 5.049 | 6.185 | 4.711 | 3.502 | 3.439 |
| MIXED | 0.2 | 4.834 | 4.418 | 4.338 | 3.889 | 2.957 | 2.877 |
| | 0.3 | 4.982 | 4.740 | 4.601 | 4.151 | 3.023 | 2.924 |
| | 0.4 | 5.214 | 4.839 | 5.058 | 4.392 | 3.127 | 3.018 |
| | 0.5 | 5.309 | 5.108 | 5.573 | 4.759 | 3.342 | 3.312 |
| | 0.6 | 5.635 | 5.459 | 6.400 | 5.178 | 3.613 | 3.524 |
| MAR | 0.2 | 5.389 | 4.891 | 4.564 | 4.199 | 2.966 | 3.119 |
| | 0.3 | 5.472 | 5.206 | 4.885 | 4.351 | 3.176 | 3.141 |
| | 0.4 | 5.576 | 5.348 | 5.437 | 4.734 | 3.343 | 3.462 |
| | 0.5 | 5.867 | 5.525 | 6.107 | 4.995 | 3.445 | 3.599 |
| | 0.6 | 5.951 | 5.739 | 7.069 | 5.416 | 3.942 | 3.862 |
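Relative error reductions of the kind quoted in the comparison can be recomputed from individual table rows. The sketch below uses three entries of the table above and assumes the reduction is defined as (baseline - TDIM) / baseline. A negative result means the baseline error is lower than the TDIM error for that row.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers the error of `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# Rows taken from the traffic speed table: (baseline error, TDIM error).
lrmc_mar_02 = relative_reduction(5.389, 3.119)     # LRMC, MAR, missing rate 0.2
lls_mar_06 = relative_reduction(7.069, 3.862)      # LLS, MAR, missing rate 0.6
halrtc_mar_02 = relative_reduction(2.966, 3.119)   # HaLRTC, MAR, missing rate 0.2

summary = {
    "LRMC": round(lrmc_mar_02, 2),
    "LLS": round(lls_mar_06, 2),
    "HaLRTC": round(halrtc_mar_02, 2),
}
```

These three rows give 42.12%, 45.37%, and -5.16%, which match the upper bounds quoted for LRMC and LLS and the lower bound quoted for HaLRTC.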
In this study, the microwave data of traffic are acquired by RTMS instrument, the time stamps and outliers of the collected data are preprocessed, the missing data of traffic flow are repaired by using Tucker tensor decomposition technique, and the expressway lane operation data are utilized for analysis.
1) This paper systematically analyzes the spatio-temporal characteristics of the road network traffic data from two perspectives: temporal correlation and spatial correlation, and finds that the strong spatio-temporal correlation demonstrated by the urban road network traffic data is reflected in multiple levels. On the one hand, there is an obvious change cycle law of the road itself; on the other hand, the traffic conditions between different road sections influence and propagate each other. Therefore, by combining the theory of spatio-temporal correlation research, the traffic data repair model can be constructed more comprehensively and accurately.
2) From the perspective of the temporal and spatial correlation of traffic data, the proposed method mines the overall correlation of the data from a higher dimension while taking into account the correlation between samples within the traffic data, and it makes effective use of the data patterns. The experimental results show that the recovery error of this paper's method is -5.16%-45.37% lower than that of the LRMC, PPCA, LLS, KSR-EN, and HaLRTC algorithms across three different datasets, and that it has the lowest recovery error in every setting on the traffic flow data.
