Design of Convolutional Neural Network Optimization Algorithm Based on Embedded System and Its Application in Real-Time Image Processing 
Mar 24, 2025
About this article
Published Online: Mar 24, 2025
Received: Oct 06, 2024
Accepted: Feb 02, 2025
DOI: https://doi.org/10.2478/amns-2025-0744
© 2025 Baoyuan Liu et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Logical resource consumption statistics
| Module | LUT | LUTRAM | DSP | BRAM36K | FF |
|---|---|---|---|---|---|
| Convolution accelerator | 28.9K | 14.7K | 331 | 43 | 17.6K | 
| ARM soft core | 14.8K | 0 | 5 | 19 | 3.2K | 
| AXI DMA | 28.1K | 2.21K | 0 | 31.6 | 6.7K | 
| Total | 71.8K | 16.91K | 336 | 93.6 | 27.5K |
CNN performance test results
| Test number | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Sample size | 100 | 100 | 100 | 100 | 100 | 
| True number | 130 | 159 | 142 | 129 | 155 | 
| Number of omissions | 6 | 17 | 9 | 10 | 19 | 
| Missed detection rate | 4.6% | 10.7% | 6.3% | 7.6% | 12.2% |
| Average missed detection rate | 8.28% | | | | |
| Number identified | 128 | 154 | 138 | 128 | 151 |
| Recognition rate | 98.4% | 96.8% | 97.2% | 99.2% | 97.4% |
| Average recognition rate | 97.8% | | | | |
| Time per sheet | 1.23 s | 1.46 s | 0.92 s | 1.12 s | 1.24 s |
| Average time per sheet | 1.19 s | | | | |
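
The rate rows in the table above appear to be simple ratios against the true object count: omissions divided by true number, and identifications divided by true number. That reading is inferred from the numbers, not stated on this page. A minimal Python sketch under that assumption is given below; the variable and function names are illustrative, and recomputed values may differ slightly from the printed percentages.

```python
# Counts copied from the CNN performance table (five tests, 100 images each).
true_counts = [130, 159, 142, 129, 155]
omissions = [6, 17, 9, 10, 19]
identified = [128, 154, 138, 128, 151]


def rate_percent(part, whole):
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


# Assumed definitions: both rates are taken relative to the true object count.
missed_rates = [rate_percent(o, t) for o, t in zip(omissions, true_counts)]
recognition_rates = [rate_percent(i, t) for i, t in zip(identified, true_counts)]

avg_missed = sum(missed_rates) / len(missed_rates)
avg_recognition = sum(recognition_rates) / len(recognition_rates)

for test_no, (m, r) in enumerate(zip(missed_rates, recognition_rates), start=1):
    print(f"test {test_no}: missed {m:.2f}%, recognized {r:.2f}%")
print(f"average missed {avg_missed:.2f}%, average recognized {avg_recognition:.2f}%")
```

Averaging the per-test rates, as done here, weights each test equally regardless of how many objects it contained; pooling all counts before dividing would be the alternative.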
Comparison of CNN operation time on different hardware
| Operation time (s) | Cortex-A9 single-core | Intel CPU | Zynq-7035 |
|---|---|---|---|
| Conv1+Pool1 | 20.5745 | 0.9790 | 0.1619 | 
| Conv2+Pool2 | 161.4176 | 5.4650 | 0.2265 | 
| Conv3 | 160.3363 | 5.4470 | 0.2023 | 
| Conv4+Pool3 | 320.6059 | 10.9780 | 0.3118 | 
| Conv5 | 159.0134 | 5.5630 | 0.1875 | 
| Conv6+Pool4 | 318.0369 | 11.0760 | 0.3334 | 
| Conv7 | 79.7953 | 2.7340 | 0.1552 | 
| Conv8+Pool5 | 79.7529 | 2.7350 | 0.1497 | 
| FC1 | 4.5438 | 0.0530 | 0.2569 | 
| FC2 | 0.0261 | 0.0000 | 0.0021 | 
| FC3 | 0.0002 | 0.0000 | 0.0000 | 
| Total time | 1304.1029 | 45.03 | 1.9858 | 
| Total duration ratio | ×658.12 | ×23.18 | ×1.0 | 
| Convolution layer duration ratio | 99.72% | 99.69% | 87.15% | 
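
The summary rows of the timing table can be rederived from the per-layer rows. The Python sketch below assumes that the total duration ratio is each platform's total time divided by the Zynq-7035 total, and that the convolution layer duration ratio is the summed time of the Conv rows (including their fused pooling stages) divided by the total. Both definitions are inferred from the row labels. Because the printed layer times are rounded, the recomputed figures may not match the printed ratios exactly.

```python
# Per-layer times in seconds, copied from the timing table.
# Tuple order: Cortex-A9 single-core, Intel CPU, Zynq-7035.
layer_times = {
    "Conv1+Pool1": (20.5745, 0.9790, 0.1619),
    "Conv2+Pool2": (161.4176, 5.4650, 0.2265),
    "Conv3": (160.3363, 5.4470, 0.2023),
    "Conv4+Pool3": (320.6059, 10.9780, 0.3118),
    "Conv5": (159.0134, 5.5630, 0.1875),
    "Conv6+Pool4": (318.0369, 11.0760, 0.3334),
    "Conv7": (79.7953, 2.7340, 0.1552),
    "Conv8+Pool5": (79.7529, 2.7350, 0.1497),
    "FC1": (4.5438, 0.0530, 0.2569),
    "FC2": (0.0261, 0.0000, 0.0021),
    "FC3": (0.0002, 0.0000, 0.0000),
}
platforms = ("Cortex-A9 single-core", "Intel CPU", "Zynq-7035")
ZYNQ = 2  # index of the baseline platform in each tuple

# Total time per platform, and the part spent in convolution (+pooling) layers.
totals = [sum(t[i] for t in layer_times.values()) for i in range(3)]
conv_totals = [
    sum(t[i] for name, t in layer_times.items() if name.startswith("Conv"))
    for i in range(3)
]

# Assumed definitions of the two summary rows.
duration_ratio = [total / totals[ZYNQ] for total in totals]
conv_share = [c / t for c, t in zip(conv_totals, totals)]

for name, total, ratio, share in zip(platforms, totals, duration_ratio, conv_share):
    print(f"{name}: total {total:.4f} s, x{ratio:.2f} vs Zynq, conv share {share:.2%}")
```

Keeping the per-layer data in one mapping makes it easy to see that the fully connected layers, negligible on the CPUs, account for a visibly larger share of the time on the Zynq-7035 once the convolutions are accelerated.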
Comparison of object detection hardware-level acceleration experiments
| Item | CPU | GPU | Zynq |
|---|---|---|---|
| Experimental platform | Intel Core i5-10400F | NVIDIA GTX 3060 | ZU5EV |
| Development language | C | C | Verilog HDL | 
| mAP | 71% | 71% | 69% | 
| Data precision | Float32 | Float32 | INT8 |
| FPS | 30 | 355 | 220 |
| Power consumption (W) | 70 | 185 | 4.5 |
| Throughput (GOP) | 0.94 | 202 | 145 |
| Energy efficiency ratio (GOP/W) | 0.15 | 1.13 | 31.4 |
| Clock frequency | 2.8 GHz | 1321 MHz | 201 MHz |
