Open Access

Feature identification and processing strategies of machine learning techniques in big data traffic analysis

  
Sep 24, 2025


Figure 1.

SAE model structure process

Figure 2.

Big data traffic identification processing flow

Figure 3.

Classification confusion matrix of the model on the test set

Figure 4.

Online Detection Accuracy

New attack detection changes before and after the update (%)

Test no.   Normal flow        New attack         Attack flow
           Before   After     Before   After     Before   After
1          98.15    98.88     91.21    97.38     97.23    99.71
2          97.89    98.45     90.66    97.86     97.61    99.14
3          97.50    98.14     90.20    97.78     98.03    98.53
4          97.57    98.00     90.23    97.87     98.50    98.49
5          97.71    98.46     91.94    96.92     97.52    98.96
6          98.26    98.94     90.69    96.31     97.85    98.04
7          98.35    98.43     90.69    96.28     97.02    98.13
8          97.58    98.10     91.83    97.17     98.03    98.34
9          97.98    98.46     90.79    98.00     98.25    99.00
10         98.09    98.49     90.44    96.43     97.72    98.97
11         98.07    98.01     90.71    97.56     97.57    99.96
12         97.31    98.29     90.19    96.36     98.17    98.16
13         98.10    98.25     91.85    97.97     97.73    99.31
14         97.90    98.92     91.61    97.07     98.27    98.78
15         97.20    98.92     91.59    96.77     97.94    99.48
16         98.17    98.25     91.28    97.89     97.58    99.71
17         97.16    98.94     90.16    96.95     97.02    98.65
18         97.67    98.95     91.92    96.62     97.50    99.55
19         97.82    98.69     90.56    96.14     97.87    98.91
20         97.36    98.47     91.54    97.86     97.11    98.73
Mean       97.79    98.50     91.00    97.16     97.73    98.93
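
The mean row can be reproduced from the per-test values. The following is a minimal sketch for the "New attack" columns only, with the numbers copied from the table above; the function name `summarize` and the variable names are illustrative and do not come from the article.

```python
from statistics import mean

# New-attack detection rate (%) for the 20 test runs, copied from the table.
new_attack_before = [
    91.21, 90.66, 90.20, 90.23, 91.94, 90.69, 90.69, 91.83, 90.79, 90.44,
    90.71, 90.19, 91.85, 91.61, 91.59, 91.28, 90.16, 91.92, 90.56, 91.54,
]
new_attack_after = [
    97.38, 97.86, 97.78, 97.87, 96.92, 96.31, 96.28, 97.17, 98.00, 96.43,
    97.56, 96.36, 97.97, 97.07, 96.77, 97.89, 96.95, 96.62, 96.14, 97.86,
]


def summarize(before, after):
    """Return (mean before, mean after, gain in percentage points)."""
    mean_before = mean(before)
    mean_after = mean(after)
    return mean_before, mean_after, mean_after - mean_before


mean_before, mean_after, gain = summarize(new_attack_before, new_attack_after)
print(f"before={mean_before:.2f}%  after={mean_after:.2f}%")
print(f"gain in percentage points: {gain:.3f}")
```

Rounded to two decimals, the two means agree with the 91.00 and 97.16 reported in the mean row; the same helper can be applied to the other column pairs.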

Flow type statistics of the test datasets

CICIDS2017
Label   Flow type           Training set   Test set
0       Normal              3256245        1395534
1       DoS GoldenEye       7524           3225
2       DoS Hulk            152634         65415
3       DoS SlowHTTPTest    3526           1511
4       DoS SlowLoris       4528           1941
5       Heartbleed          15             6

UNSW-NB15
Label   Flow type           Training set   Test set
0       Normal              54222          23238
1       DoS GoldenEye       5963           2556
2       DoS Hulk            6852           2937
3       DoS SlowHTTPTest    7724           3310
4       DoS SlowLoris       1524           653
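
The CICIDS2017 counts above appear to follow a roughly 70/30 training/test split within each class, and the normal class dominates the training set. The sketch below checks both observations directly from the table; the helper names `train_fraction` and `class_share` are illustrative and are not taken from the article.

```python
# (training, test) flow counts for CICIDS2017, copied from the table.
cicids2017 = {
    "Normal": (3256245, 1395534),
    "DoS GoldenEye": (7524, 3225),
    "DoS Hulk": (152634, 65415),
    "DoS SlowHTTPTest": (3526, 1511),
    "DoS SlowLoris": (4528, 1941),
    "Heartbleed": (15, 6),
}


def train_fraction(train, test):
    """Fraction of one class's flows that landed in the training set."""
    return train / (train + test)


def class_share(counts, label):
    """Share of the whole training set taken up by one class."""
    total_train = sum(train for train, _ in counts.values())
    return counts[label][0] / total_train


fractions = {
    label: train_fraction(train, test)
    for label, (train, test) in cicids2017.items()
}
normal_share = class_share(cicids2017, "Normal")

for label, fraction in fractions.items():
    print(f"{label:18s} train fraction = {fraction:.3f}")
print(f"Normal share of training set = {normal_share:.3f}")
```

With only 15 training and 6 test flows, Heartbleed is the extreme minority class, so per-class metrics for it rest on very few samples.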

Experimental results of different models on CICIDS2017

Model        Precision (%)   Recall (%)   Accuracy (%)   F1-Score (%)
BiAE-KNN     90.01           92.84        94.78          91.21
BiAE-MLP     91.39           89.19        93.88          91.85
BiAE-RF      90.47           91.16        92.10          93.34
GBDT         91.19           92.27        91.29          93.56
AdaBoost     90.15           90.93        92.02          90.83
This paper   96.35           97.52        96.10          95.63

Experimental results of different models on UNSW-NB15

Model        Precision (%)   Recall (%)   Accuracy (%)   F1-Score (%)
BiAE-KNN     92.06           92.17        92.54          93.65
BiAE-MLP     93.78           93.78        92.09          93.87
BiAE-RF      92.78           93.94        92.53          93.40
GBDT         93.07           93.57        92.93          92.09
AdaBoost     92.15           92.72        93.07          93.64
This paper   98.75           97.35        97.28          96.35
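
For readers who want to see how the four reported metrics relate to a multi-class confusion matrix, here is a minimal sketch. It assumes macro averaging over classes, which this page does not state, and it uses a small hypothetical 3-class matrix rather than the article's data; the function name `macro_metrics` is illustrative.

```python
from statistics import mean


def macro_metrics(cm):
    """Compute accuracy and macro-averaged precision, recall and F1.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Macro averaging is an assumption made here.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        true_positive = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = true_positive / predicted_k if predicted_k else 0.0
        r = true_positive / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(f1)
    return {
        "accuracy": correct / total,
        "precision": mean(precisions),
        "recall": mean(recalls),
        "f1": mean(f1_scores),
    }


# Hypothetical confusion matrix, for illustration only (not from the article).
toy_cm = [
    [90, 5, 5],
    [10, 80, 10],
    [0, 10, 90],
]
metrics = macro_metrics(toy_cm)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under macro averaging every class counts equally regardless of size, so rare classes influence the averaged precision, recall and F1 far more than they influence accuracy.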
Language:
English