Open Access

Language model optimization and corpus construction techniques for Japanese speech recognition

21 March 2025
ABOUT THIS ARTICLE


A speech recognition system consists of two core modules: an acoustic model (AM) and a language model (LM). In this paper, we design a Japanese speech recognition model composed of a BiLSTM-CTC-based acoustic model and an RNN-based language model. The training speed of the RNN language model is improved with a proposed parallel training algorithm based on mini-batch processing, and the Japanese speech recognition corpus is constructed using a speech-to-text alignment technique. In a comparison of language model perplexity, the optimized RNN language model achieves a perplexity of 13.69%. The speech-to-text alignment technique achieves a sentence cutoff accuracy of 91.33%. The overall effectiveness of the BiLSTM-CTC-based Japanese speech recognition model is evaluated by word error rate (WER). The word error rates of the optimized speech recognition models designed in this paper are all lower than those of the baseline model and of other speech recognition models, which shows that the language model optimization method and corpus construction technique chosen in this paper improve the generalization ability of the Japanese speech recognition model.
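The abstract gives no implementation details, but the two ingredients it names for the language model, mini-batch training and perplexity evaluation, can be illustrated with a minimal PyTorch sketch. The vocabulary size, network dimensions, and random toy data below are assumptions for illustration only, not the authors' configuration.

    # Minimal sketch (not the authors' code): an RNN language model trained
    # with mini-batches; perplexity = exp(mean per-token cross-entropy).
    import math
    import torch
    import torch.nn as nn

    class RNNLanguageModel(nn.Module):
        def __init__(self, vocab_size=1000, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
            self.proj = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):
            # tokens: (batch, seq_len) integer token ids
            h, _ = self.rnn(self.embed(tokens))
            return self.proj(h)  # (batch, seq_len, vocab_size) logits

    vocab_size, batch_size, seq_len = 1000, 32, 20
    model = RNNLanguageModel(vocab_size)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    # Toy corpus: random token ids stand in for the Japanese text corpus.
    data = torch.randint(0, vocab_size, (batch_size * 10, seq_len + 1))

    for step in range(0, data.size(0), batch_size):       # mini-batch loop
        batch = data[step:step + batch_size]
        inputs, targets = batch[:, :-1], batch[:, 1:]     # next-token targets
        logits = model(inputs)
        loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Perplexity on the last mini-batch for brevity; held-out text would be
    # used in practice.
    with torch.no_grad():
        logits = model(inputs)
        ce = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
        print("perplexity:", math.exp(ce.item()))

Perplexity here is simply the exponential of the average per-token cross-entropy, which is the quantity the abstract compares between the baseline and optimized RNN language models.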

Language:
English
Publication frequency:
Once a year
Journal subjects:
Biological Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other