Open Access

Language model optimization and corpus construction techniques for Japanese speech recognition

21 March 2025
ABOUT THIS ARTICLE


A speech recognition system consists of two core modules: an acoustic model (AM) and a language model (LM). In this paper, we design a Japanese speech recognition model composed of a BiLSTM-CTC-based acoustic model and an RNN-based language model. The training speed of the RNN language model is improved with a proposed parallel training algorithm based on mini-batch processing, and the Japanese speech recognition corpus is constructed using a speech-to-text alignment technique. In a comparison of language model perplexity, the optimized RNN language model achieves a perplexity of 13.69%. The speech-to-text alignment technique achieves a sentence cutoff accuracy of 91.33%. The overall effectiveness of the BiLSTM-CTC-based Japanese speech recognition model is evaluated by word error rate (WER). The word error rates of the optimized speech recognition models designed in this paper are all lower than those of the baseline model and of other speech recognition models, which shows that the language model optimization method and corpus construction technique chosen in this paper improve the generalization ability of the Japanese speech recognition model.
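The abstract gives no implementation details, but the two ingredients it names for the language model, mini-batch training and perplexity evaluation, can be illustrated with a minimal PyTorch sketch. The vocabulary size, network dimensions, and random toy data below are assumptions for illustration only, not the authors' configuration.

    # Minimal sketch (not the authors' code): an RNN language model trained
    # with mini-batches; perplexity = exp(mean per-token cross-entropy).
    import math
    import torch
    import torch.nn as nn

    class RNNLanguageModel(nn.Module):
        def __init__(self, vocab_size=1000, embed_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
            self.proj = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):
            # tokens: (batch, seq_len) integer token ids
            h, _ = self.rnn(self.embed(tokens))
            return self.proj(h)  # (batch, seq_len, vocab_size) logits

    vocab_size, batch_size, seq_len = 1000, 32, 20
    model = RNNLanguageModel(vocab_size)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    # Toy corpus: random token ids stand in for the Japanese text corpus.
    data = torch.randint(0, vocab_size, (batch_size * 10, seq_len + 1))

    for step in range(0, data.size(0), batch_size):       # mini-batch loop
        batch = data[step:step + batch_size]
        inputs, targets = batch[:, :-1], batch[:, 1:]     # next-token targets
        logits = model(inputs)
        loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Perplexity on the last mini-batch for brevity; held-out text would be
    # used in practice.
    with torch.no_grad():
        logits = model(inputs)
        ce = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
        print("perplexity:", math.exp(ce.item()))

Perplexity here is simply the exponential of the average per-token cross-entropy, which is the quantity the abstract compares between the baseline and optimized RNN language models.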

Language:
English
Publication frequency:
Once a year
Journal subjects:
Biological Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other