Open access

Language model optimization and corpus construction techniques for Japanese speech recognition

21 Mar 2025


A speech recognition system consists of two core modules: an acoustic model (AM) and a language model (LM). In this paper, we design a Japanese speech recognition model composed of a BiLSTM-CTC-based acoustic model and an RNN-based language model. We propose a parallel optimized training algorithm based on mini-batch processing to improve the training speed of the RNN language model, and we construct a Japanese speech recognition corpus using a speech-to-text alignment technique. In the language model perplexity comparison, the perplexity of the optimized RNN language model is 13.69%. The speech-text alignment technique achieves a sentence cutoff accuracy of 91.33%. The overall effectiveness of the BiLSTM-CTC-based Japanese speech recognition model is evaluated by word error rate (WER). The word error rates of the optimized speech recognition models designed in this paper are all lower than those of the baseline model and the other speech recognition models compared. This shows that the language model optimization method and corpus construction technique chosen in this paper can improve the generalization ability of the Japanese speech recognition model.
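For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of how word error rate and perplexity are conventionally defined. It is not the authors' evaluation code. The function names `word_error_rate` and `perplexity` are hypothetical, and the inputs are assumed to be already tokenized, since Japanese text has no whitespace word boundaries and the tokenizer used in the paper is not stated here.

```python
import math


def word_error_rate(reference, hypothesis):
    """Standard WER: (substitutions + deletions + insertions) / len(reference).

    Both arguments are sequences of tokens (assumed pre-tokenized).
    """
    ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the processed prefix of ref
    # and hyp[:j]; it starts as the distance from an empty reference.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def perplexity(token_probs):
    """Perplexity as exp of the mean negative log-probability per token.

    token_probs are the probabilities a language model assigned to each
    token of a held-out sequence (each must be > 0).
    """
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    ref = ["a", "b", "c", "d"]
    hyp = ["a", "x", "c"]  # one substitution, one deletion
    print(word_error_rate(ref, hyp))
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower values are better for both metrics. A model that assigns uniform probability 1/k to every token has perplexity k under this definition.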

Language:
English
Publication frequency:
1 time per year
Journal subjects:
Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other