Language model optimization and corpus construction techniques for Japanese speech recognition
Published: 21 Mar 2025
Received: 08 Nov 2024
Accepted: 24 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0697
© 2025 Zhou Huang et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
A speech recognition system consists of two core modules: an acoustic model (AM) and a language model (LM). In this paper, we design a Japanese speech recognition model composed of a BiLSTM-CTC-based acoustic model and an RNN-based language model. To speed up training of the RNN language model, we propose a parallel training algorithm based on mini-batch processing, and we construct a Japanese speech recognition corpus using a speech-to-text alignment technique. In a comparison of language-model perplexity, the optimized RNN language model achieves a perplexity of 13.69%, and the speech-text alignment technique achieves a sentence cut-off accuracy of 91.33%. The overall effectiveness of the BiLSTM-CTC-based Japanese speech recognition model is evaluated by word error rate: the word error rates of the optimized speech recognition models designed in this paper are consistently lower than those of the baseline model and other comparison models. These results show that the language model optimization method and corpus construction technique chosen in this paper improve the generalization ability of the Japanese speech recognition model.
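As a rough, hypothetical illustration of the acoustic-model side described above (not the authors' implementation), a minimal BiLSTM-CTC model in PyTorch might look like the sketch below; the feature dimension, layer sizes, and token inventory are invented for the example.

```python
# Minimal BiLSTM-CTC acoustic model sketch (illustrative; all
# dimensions and the token inventory are assumptions, not the paper's).
import torch
import torch.nn as nn

class BiLSTMCTC(nn.Module):
    def __init__(self, n_feats=80, hidden=256, n_tokens=100):
        super().__init__()
        # Bidirectional LSTM encoder over acoustic feature frames.
        self.lstm = nn.LSTM(n_feats, hidden, num_layers=3,
                            bidirectional=True, batch_first=True)
        # Project to per-frame token logits; index 0 is the CTC blank.
        self.proj = nn.Linear(2 * hidden, n_tokens)

    def forward(self, feats):                    # feats: (B, T, n_feats)
        out, _ = self.lstm(feats)                # (B, T, 2*hidden)
        return self.proj(out).log_softmax(-1)    # log-probs for CTC

model = BiLSTMCTC()
ctc = nn.CTCLoss(blank=0)

feats = torch.randn(4, 200, 80)                  # 4 utterances, 200 frames
targets = torch.randint(1, 100, (4, 30))         # label sequences (no blanks)
log_probs = model(feats).transpose(0, 1)         # CTCLoss wants (T, B, C)
loss = ctc(log_probs, targets,
           torch.full((4,), 200), torch.full((4,), 30))
loss.backward()                                  # train against CTC loss
```

CTC lets the model learn frame-to-token alignments without frame-level labels, which is why it pairs naturally with a bidirectional LSTM encoder.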

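The mini-batch idea behind the parallel LM-training optimization can be sketched in the same spirit: instead of updating on one sentence at a time, a batch of sentences is pushed through the RNN in a single forward/backward pass, and perplexity is recovered as the exponential of the mean cross-entropy. Model sizes and hyperparameters below are assumptions for illustration only.

```python
# Mini-batch RNN language model training sketch (illustrative;
# vocabulary size, dimensions, and learning rate are assumptions).
import math
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab=8000, emb=256, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.RNN(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens):                   # tokens: (B, T)
        h, _ = self.rnn(self.embed(tokens))      # (B, T, hidden)
        return self.out(h)                       # next-token logits

model = RNNLM()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# One mini-batch: 32 sentences advance in parallel, which is the
# source of the speed-up over sentence-by-sentence updates.
batch = torch.randint(0, 8000, (32, 21))
inputs, targets = batch[:, :-1], batch[:, 1:]    # predict the next token
opt.zero_grad()
loss = loss_fn(model(inputs).reshape(-1, 8000), targets.reshape(-1))
loss.backward()
opt.step()

print("perplexity:", math.exp(loss.item()))      # exp of mean cross-entropy
```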