Design of Convolutional Neural Network Optimization Algorithm Based on Embedded System and Its Application in Real-Time Image Processing
24 Mar 2025
About this article
Published: 24 Mar 2025
Received: 06 Oct 2024
Accepted: 02 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0744
Keywords
© 2025 Baoyuan Liu et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Logic resource consumption statistics
| Module | LUT | LUTRAM | DSP | BRAM36K | FF |
|---|---|---|---|---|---|
| Convolution accelerator | 28.9K | 14.7K | 331 | 43 | 17.6K |
| ARM soft core | 14.8K | 0 | 5 | 19 | 3.2K |
| AXI DMA | 28.1K | 2.21K | 0 | 31.6 | 6.7K |
| Total | 71.8K | 16.91K | 336 | 93.6 | 27.5K |
CNN performance test results
| Test number | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Sample size | 100 | 100 | 100 | 100 | 100 |
| True number | 130 | 159 | 142 | 129 | 155 |
| Number of omissions | 6 | 17 | 9 | 10 | 19 |
| Missed detection rate | 4.6% | 10.7% | 6.3% | 7.6% | 12.2% |
| Average missed detection rate | 8.28% | | | | |
| Identification number | 128 | 154 | 138 | 128 | 151 |
| Recognition rate | 98.4% | 96.8% | 97.2% | 99.2% | 97.4% |
| Average recognition rate | 97.8% | | | | |
| Time per sheet | 1.23 s | 1.46 s | 0.92 s | 1.12 s | 1.24 s |
| Average time per sheet | 1.19 s | | | | |
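The rate rows in the table above can be rechecked from the raw counts. The sketch below assumes that each rate is the corresponding count divided by the true number for that test, and that the averages are plain means over the five tests; the source does not spell out either definition. The function name `percent` is illustrative only. Recomputed values can differ from the printed ones in the last digit because the table appears to round or truncate.

```python
# Counts copied from the CNN performance table (tests 1 to 5).
true_counts = [130, 159, 142, 129, 155]
missed = [6, 17, 9, 10, 19]
identified = [128, 154, 138, 128, 151]
times_s = [1.23, 1.46, 0.92, 1.12, 1.24]


def percent(numerators, denominators):
    """Element-wise ratio expressed in percent (assumed rate definition)."""
    return [100.0 * n / d for n, d in zip(numerators, denominators)]


def mean(values):
    """Plain arithmetic mean (assumed averaging rule)."""
    return sum(values) / len(values)


miss_rates = percent(missed, true_counts)
recognition_rates = percent(identified, true_counts)
avg_time_s = mean(times_s)

for i, (m, r) in enumerate(zip(miss_rates, recognition_rates), start=1):
    print(f"test {i}: missed {m:.1f}%, recognized {r:.1f}%")
print(f"mean missed rate: {mean(miss_rates):.2f}%")
print(f"mean recognition rate: {mean(recognition_rates):.2f}%")
print(f"mean time per sheet: {avg_time_s:.2f} s")
```

Running the check on other test batches only requires replacing the four input lists.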
Comparison of CNN operation time on different hardware
| Operation time (s) | Cortex-A9 single-core | Intel CPU | Zynq-7035 |
|---|---|---|---|
| Conv1+Pool1 | 20.5745 | 0.9790 | 0.1619 |
| Conv2+Pool2 | 161.4176 | 5.4650 | 0.2265 |
| Conv3 | 160.3363 | 5.4470 | 0.2023 |
| Conv4+Pool3 | 320.6059 | 10.9780 | 0.3118 |
| Conv5 | 159.0134 | 5.5630 | 0.1875 |
| Conv6+Pool4 | 318.0369 | 11.0760 | 0.3334 |
| Conv7 | 79.7953 | 2.7340 | 0.1552 |
| Conv8+Pool5 | 79.7529 | 2.7350 | 0.1497 |
| FC1 | 4.5438 | 0.0530 | 0.2569 |
| FC2 | 0.0261 | 0.0000 | 0.0021 |
| FC3 | 0.0002 | 0.0000 | 0.0000 |
| Total time | 1304.1029 | 45.03 | 1.9858 |
| Total duration ratio | ×658.12 | ×23.18 | ×1.0 |
| Convolution layer duration ratio | 99.72% | 99.69% | 87.15% |
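The summary rows of the timing table (total time, ratio to the Zynq-7035 total, and share of time spent in convolution layers) can be recomputed from the per-layer values. The sketch below assumes that the ratio is each platform's total divided by the Zynq-7035 total and that the convolution share counts every row whose name starts with `Conv`; both are readings of the table, not definitions given in the source. Because the per-layer values are rounded, the recomputed figures may differ slightly from the printed summary rows.

```python
# Per-layer times in seconds, copied from the timing table.
# Tuple order: (Cortex-A9 single-core, Intel CPU, Zynq-7035)
PLATFORM_NAMES = ("Cortex-A9 single-core", "Intel CPU", "Zynq-7035")
LAYER_TIMES = {
    "Conv1+Pool1": (20.5745, 0.9790, 0.1619),
    "Conv2+Pool2": (161.4176, 5.4650, 0.2265),
    "Conv3": (160.3363, 5.4470, 0.2023),
    "Conv4+Pool3": (320.6059, 10.9780, 0.3118),
    "Conv5": (159.0134, 5.5630, 0.1875),
    "Conv6+Pool4": (318.0369, 11.0760, 0.3334),
    "Conv7": (79.7953, 2.7340, 0.1552),
    "Conv8+Pool5": (79.7529, 2.7350, 0.1497),
    "FC1": (4.5438, 0.0530, 0.2569),
    "FC2": (0.0261, 0.0000, 0.0021),
    "FC3": (0.0002, 0.0000, 0.0000),
}


def column_sum(index, only_conv=False):
    """Sum one platform's column, optionally restricted to Conv* rows."""
    return sum(
        times[index]
        for name, times in LAYER_TIMES.items()
        if not only_conv or name.startswith("Conv")
    )


totals = [column_sum(i) for i in range(3)]
conv_share = [100.0 * column_sum(i, only_conv=True) / totals[i] for i in range(3)]
ratio_to_zynq = [t / totals[2] for t in totals]  # Zynq-7035 is the baseline

for name, total, ratio, share in zip(PLATFORM_NAMES, totals, ratio_to_zynq, conv_share):
    print(f"{name}: total {total:.4f} s, x{ratio:.2f} vs Zynq, conv share {share:.2f}%")
```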
Comparison of object detection hardware-level acceleration experiments
| Item | CPU | GPU | Zynq |
|---|---|---|---|
| Experimental platform | Intel Core i5-10400F | NVIDIA RTX 3060 | ZU5EV |
| Development language | C | C | Verilog HDL |
| mAP | 71% | 71% | 69% |
| Data precision | Float32 | Float32 | INT8 |
| FPS | 30 | 355 | 220 |
| Power consumption (W) | 70 | 185 | 4.5 |
| Throughput (GOP) | 0.94 | 202 | 145 |
| Energy efficiency ratio (GOP/W) | 0.15 | 1.13 | 31.4 |
| Clock frequency | 2.8 GHz | 1321 MHz | 201 MHz |
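An energy efficiency figure is commonly obtained by dividing throughput by power. The sketch below applies that assumed definition to the power and throughput rows of the comparison table, and adds frames per second per watt as a second, purely illustrative metric. The source does not state how its energy efficiency row was measured, so the recomputed values need not match the printed ones; the dictionary keys and the helper `per_watt` are hypothetical names.

```python
# Values copied from the hardware comparison table.
MEASUREMENTS = {
    "CPU": {"fps": 30, "power_w": 70.0, "throughput_gop": 0.94},
    "GPU": {"fps": 355, "power_w": 185.0, "throughput_gop": 202.0},
    "Zynq": {"fps": 220, "power_w": 4.5, "throughput_gop": 145.0},
}


def per_watt(value, power_w):
    """Normalize a performance figure by power draw (assumed definition)."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return value / power_w


gop_per_watt = {
    name: per_watt(m["throughput_gop"], m["power_w"])
    for name, m in MEASUREMENTS.items()
}
fps_per_watt = {
    name: per_watt(m["fps"], m["power_w"])
    for name, m in MEASUREMENTS.items()
}

for name in MEASUREMENTS:
    print(f"{name}: {gop_per_watt[name]:.3f} GOP/W, {fps_per_watt[name]:.2f} FPS/W")
```

Normalizing by power rather than comparing raw FPS is what lets a low-clock, low-power device be ranked against a desktop GPU on equal terms.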
