Feature identification and processing strategies of machine learning techniques in big data traffic analysis

Li, Ze

Open Access

Sep 24, 2025

Applied Mathematics and Nonlinear Sciences

Volume 10 (2025): Issue 1 (January 2025)

Published Online: Sep 24, 2025

Received: Dec 28, 2024

Accepted: Apr 27, 2025

DOI: https://doi.org/10.2478/amns-2025-0997

Keywords
SAE model, SCNN model, Distance function, Feature extraction, Traffic identification technique

© 2025 Ze Li, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.

Big data traffic identification process

Confusion matrix of the model's classification on the test set

New attack detection changes before and after the update (%)

Test run	Normal flow (before)	Normal flow (after)	New attack (before)	New attack (after)	Attack flow (before)	Attack flow (after)
1	98.15	98.88	91.21	97.38	97.23	99.71
2	97.89	98.45	90.66	97.86	97.61	99.14
3	97.5	98.14	90.2	97.78	98.03	98.53
4	97.57	98	90.23	97.87	98.5	98.49
5	97.71	98.46	91.94	96.92	97.52	98.96
6	98.26	98.94	90.69	96.31	97.85	98.04
7	98.35	98.43	90.69	96.28	97.02	98.13
8	97.58	98.1	91.83	97.17	98.03	98.34
9	97.98	98.46	90.79	98	98.25	99
10	98.09	98.49	90.44	96.43	97.72	98.97
11	98.07	98.01	90.71	97.56	97.57	99.96
12	97.31	98.29	90.19	96.36	98.17	98.16
13	98.1	98.25	91.85	97.97	97.73	99.31
14	97.9	98.92	91.61	97.07	98.27	98.78
15	97.2	98.92	91.59	96.77	97.94	99.48
16	98.17	98.25	91.28	97.89	97.58	99.71
17	97.16	98.94	90.16	96.95	97.02	98.65
18	97.67	98.95	91.92	96.62	97.5	99.55
19	97.82	98.69	90.56	96.14	97.87	98.91
20	97.36	98.47	91.54	97.86	97.11	98.73
Mean value	97.79	98.50	91.00	97.16	97.73	98.93

Flow type statistics of the test datasets

CICIDS2017
Tags	Flow type	Training set	Test set
0	Normal	3256245	1395534
1	DoS GoldenEye	7524	3225
2	DoS Hulk	152634	65415
3	DoS SlowHTTPTest	3526	1511
4	DoS SlowLoris	4528	1941
5	Heartbleed	15	6
UNSW-NB15
Tags	Flow type	Training set	Test set
0	Normal	54222	23238
1	DoS GoldenEye	5963	2556
2	DoS Hulk	6852	2937
3	DoS SlowHTTPTest	7724	3310
4	DoS SlowLoris	1524	653

Experimental results of different models on CICIDS2017

Model	Precision (%)	Recall (%)	Accuracy (%)	F1-Score (%)
BiAE-KNN	90.01	92.84	94.78	91.21
BiAE-MLP	91.39	89.19	93.88	91.85
BiAE-RF	90.47	91.16	92.1	93.34
GBDT	91.19	92.27	91.29	93.56
AdaBoost	90.15	90.93	92.02	90.83
This article	96.35	97.52	96.1	95.63

Experimental results of different models on UNSW-NB15

Model	Precision (%)	Recall (%)	Accuracy (%)	F1-Score (%)
BiAE-KNN	92.06	92.17	92.54	93.65
BiAE-MLP	93.78	93.78	92.09	93.87
BiAE-RF	92.78	93.94	92.53	93.4
GBDT	93.07	93.57	92.93	92.09
AdaBoost	92.15	92.72	93.07	93.64
This article	98.75	97.35	97.28	96.35

Language: English

Publication timeframe: 1 time per year
Journal Subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other
