Combined Application of Speech Recognition and Natural Language Processing Technologies in the Electric Power Industry
Mar 19, 2025
About this article
Published Online: Mar 19, 2025
Received: Nov 02, 2024
Accepted: Feb 02, 2025
DOI: https://doi.org/10.2478/amns-2025-0530
© 2025 Tao Xu et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
System response time for different speech durations and input types
Speech duration (s) | Real-time voice response time (s) | Voice file response time (s) | Telephone terminal response time (s) |
---|---|---|---|
5 | 2.6 | 1.6 | 3.3 |
10 | 3.0 | 1.9 | 3.8 |
20 | 4.5 | 3.0 | 4.9 |
30 | 5.3 | 4.4 | 5.8 |
60 | 8.5 | 7.1 | 9.7 |
Comparison of different input features
Dataset | Input feature | CER (%) |
---|---|---|
THCHS-30 | Spectrogram | 15.72 |
THCHS-30 | Fbank | 17.28 |
THCHS-30 | MFCC | 18.68 |
AISHELL-1 | Spectrogram | 15.54 |
AISHELL-1 | Fbank | 16.77 |
AISHELL-1 | MFCC | 19.30 |
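The page does not say how CER is computed. As a minimal sketch, assuming the conventional definition (character-level edit distance divided by the number of reference characters), it could be calculated as follows. The function names and the sample transcript are hypothetical and do not come from the article.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (assumed conventional definition)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Hypothetical transcript pair: one substituted character out of seven.
print(round(cer("变压器温度过高", "变压器温度过低"), 2))  # 14.29
```

Spaces are stripped before scoring because Mandarin corpora such as THCHS-30 and AISHELL-1 are usually scored per character; that choice is also an assumption here.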
Comparison of experimental results on the power voice dataset
Method | Parameter quantity | WER (%) | RTF |
---|---|---|---|
Discriminative | 20 | 11.42 | 0.040 |
Pseudo Visual | 23 | 13.34 | 0.047 |
Vanilla Transformer | 25 | 10.62 | 0.031 |
Speech-Transformer | 26 | 10.06 | 0.026 |
Open-Transformer | 36 | 9.64 | 0.029 |
Ours | 25 | 8.21 | 0.017 |
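To make the WER and RTF columns easier to read, the sketch below assumes the usual definition of real-time factor (processing time divided by audio duration) and computes the relative WER reduction of the proposed method against each baseline. The WER values are copied from the table above; the helper names are hypothetical, and the page itself does not confirm how RTF was measured.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF under the assumed standard definition: decode time / audio length."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction in percent of `improved` with respect to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# WER (%) values copied from the table above.
wer = {
    "Discriminative": 11.42,
    "Pseudo Visual": 13.34,
    "Vanilla Transformer": 10.62,
    "Speech-Transformer": 10.06,
    "Open-Transformer": 9.64,
}
ours = 8.21

for method, value in wer.items():
    print(f"{method}: {relative_reduction(value, ours):.1f}% relative WER reduction")
```

Under that definition, an RTF of 0.017 would correspond to roughly 0.17 s of processing for 10 s of audio, and the reported 8.21% WER is about 14.8% lower in relative terms than the strongest baseline (Open-Transformer, 9.64%).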