Research on Efficient Algorithms for Intelligent Computing in Big Data Analytics
Feb 03, 2025
About this article
Published online: Feb 03, 2025
Submitted: Sept 15, 2024
Accepted: Jan 04, 2025
DOI: https://doi.org/10.2478/amns-2025-0020
© 2025 Xiguo Zhou et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Comparison of query execution time (unit: ms)
| Dataset | System | Run | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LUBM-5 | Hadoop HDFS | Cold | 235 | 9445 | 241 | 369 | 425 | 1491 | 299 | 365 | 14K | 277 |
| LUBM-5 | Hadoop HDFS | Hot | 114 | 9188 | 159 | 152 | 194 | 513 | 109 | 142 | 14K | 152 |
| LUBM-5 | Jena-Hbase | Cold | 20K | 11K | 60K | 4256 | 62K | 2378 | NA | NA | NA | 18K |
| LUBM-5 | Jena-Hbase | Hot | 16K | 10K | 45K | 4024 | 9345 | 864 | NA | 322K | NA | 18K |
| LUBM-5 | SHARD | Cold | 156K | 302K | 184K | 212K | 287K | 672K | 65K | 203K | 856K | 200K |
| LUBM-5 | SHARD | Hot | 101K | 285K | 112K | 124K | 169K | 611K | 42K | 172K | 432K | 142K |
| LUBM-50 | Hadoop HDFS | Cold | 244 | 9051 | 303 | 314 | 415 | 2003 | 511 | 425 | 14K | 363 |
| LUBM-50 | Hadoop HDFS | Hot | 112 | 8879 | 115 | 164 | 185 | 1734 | 203 | 302 | 14K | 122 |
| LUBM-50 | Jena-Hbase | - | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| LUBM-50 | SHARD | Cold | 188K | 415K | 224K | 306K | 179K | 406K | 206K | 108K | 425K | 174K |
| LUBM-50 | SHARD | Hot | 116K | 315K | 189K | 177K | 133K | 342K | 166K | 77K | 348K | 130K |
| LUBM-500 | Hadoop HDFS | Cold | 218 | 8974 | 266 | 273 | 231 | 18K | 237 | 321 | 15K | 227 |
| LUBM-500 | Hadoop HDFS | Hot | 112 | 8546 | 105 | 130 | 121 | 17K | 133 | 201 | 15K | 102 |
| LUBM-500 | Jena-Hbase | - | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| LUBM-500 | SHARD | Cold | 306K | 986K | 426K | 387K | 462K | 884K | 506K | 472K | 926K | 412K |
| LUBM-500 | SHARD | Hot | 245K | 758K | 285K | 204K | 306K | 695K | 330K | 394K | 734K | 283K |
Hadoop HDFS index storage usage
| | LUBM-5 | LUBM-50 | LUBM-500 |
|---|---|---|---|
| Total | 195.4 MB | 2.0 GB | 17.9 GB |
| Avg. ± Std. | 10.25 ± 1.68 MB | 118.00 ± 19.48 MB | 1.02 GB ± 203.45 MB |
Comparison of clustering time cost of different parallel DBSCAN algorithms
| Data set | Algorithm | Clustering time (s) |
|---|---|---|
| R15 | Naive DBSCAN | 20.485 |
| R15 | Spark DBSCAN | 17.065 |
| Jain | Naive DBSCAN | 18.746 |
| Jain | Spark DBSCAN | 15.062 |
| Pathbased | Naive DBSCAN | 17.223 |
| Pathbased | Spark DBSCAN | 16.012 |
| Aggregation | Naive DBSCAN | 15.462 |
| Aggregation | Spark DBSCAN | 4.726 |
| D31 | Naive DBSCAN | 87.633 |
| D31 | Spark DBSCAN | 40.745 |
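
The clustering times above are easier to compare as speedup ratios (naive time divided by Spark time). The following is a minimal sketch of that arithmetic using only the values reported in the table; the names `times` and `speedup` are illustrative and do not come from the article.

```python
# Clustering times in seconds, copied from the table above:
# data set -> (Naive DBSCAN, Spark DBSCAN).
times = {
    "R15": (20.485, 17.065),
    "Jain": (18.746, 15.062),
    "Pathbased": (17.223, 16.012),
    "Aggregation": (15.462, 4.726),
    "D31": (87.633, 40.745),
}


def speedup(naive_s: float, spark_s: float) -> float:
    """Return how many times faster the Spark run is than the naive run."""
    return naive_s / spark_s


# Hypothetical helper dictionary: one speedup ratio per data set.
speedups = {name: speedup(naive, spark) for name, (naive, spark) in times.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

On these numbers the ratio ranges from roughly 1.08x (Pathbased) to roughly 3.27x (Aggregation), with D31 at about 2.15x.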
Comparison of clustering result indexes of different parallel DBSCAN algorithms
| Data set | Algorithm | Silhouette coefficient | Purity | Rand index | Adjusted Rand index | F1-score |
|---|---|---|---|---|---|---|
| R15 | Naive DBSCAN | 0.7658 | 0.9644 | 0.9685 | 0.9532 | 0.9412 |
| R15 | Spark DBSCAN | 0.7346 | 0.9416 | 0.9602 | 0.9263 | 0.9331 |
| Jain | Naive DBSCAN | 0.3015 | 0.9745 | 0.4913 | 0.1026 | 0.2578 |
| Jain | Spark DBSCAN | 0.3015 | 0.9745 | 0.4913 | 0.1026 | 0.2578 |
| Pathbased | Naive DBSCAN | 0.3562 | 0.9278 | 0.7016 | 0.1152 | 0.1723 |
| Pathbased | Spark DBSCAN | 0.3562 | 0.9278 | 0.7016 | 0.1152 | 0.1723 |
| Aggregation | Naive DBSCAN | 0.3325 | 0.8244 | 0.8078 | 0.1605 | 0.2346 |
| Aggregation | Spark DBSCAN | 0.3325 | 0.8244 | 0.8078 | 0.1605 | 0.2346 |
| D31 | Naive DBSCAN | 0.5815 | 0.9045 | 0.9952 | 0.8142 | 0.8156 |
| D31 | Spark DBSCAN | 0.5685 | 0.8712 | 0.9896 | 0.7724 | 0.7789 |
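
For readers unfamiliar with two of the indexes in the table above, the sketch below shows the standard textbook definitions of purity and the (unadjusted) Rand index on a toy labelling. It is not the article's evaluation code. The function names and the six-point example are hypothetical, and how DBSCAN noise points were treated in the reported results is not stated here, so this sketch simply treats every label as an ordinary cluster.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, pred_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    clusters = {}
    for truth, pred in zip(true_labels, pred_labels):
        clusters.setdefault(pred, []).append(truth)
    # For each predicted cluster, count the points in its most common true class.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(true_labels)


def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two labellings agree (same/different)."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agreements = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Toy example (hypothetical data): one point of class 0 is placed in cluster 1.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]

print(f"purity = {purity(truth, found):.4f}")
print(f"rand index = {rand_index(truth, found):.4f}")
```

Purity only checks each cluster's majority class, while the Rand index checks every pair of points, so the two can diverge on the same result.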
