Language model optimization and corpus construction techniques for Japanese speech recognition
Published: 21 Mar 2025
Received: 08 Nov 2024
Accepted: 24 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0697
© 2025 Zhou Huang et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
A speech recognition system consists of two core modules: an acoustic model (AM) and a language model (LM). In this paper, we design a Japanese speech recognition model composed of a BiLSTM-CTC-based acoustic model and an RNN-based language model. To speed up training of the RNN language model, we propose a parallel training algorithm based on mini-batch processing, and we construct a Japanese speech recognition corpus using a speech-to-text alignment technique. In a comparison of language-model perplexity, the optimized RNN language model achieves a perplexity of 13.69%, and the speech-text alignment technique achieves a sentence cut-off accuracy of 91.33%. The overall effectiveness of the BiLSTM-CTC-based Japanese speech recognition model is evaluated by word error rate: the word error rates of the optimized speech recognition models designed in this paper are consistently lower than those of the baseline model and other comparison models. These results show that the language model optimization method and corpus construction technique chosen in this paper improve the generalization ability of the Japanese speech recognition model.
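As a rough, hypothetical illustration of the acoustic-model side described above (not the authors' implementation), a minimal BiLSTM-CTC model in PyTorch might look like the sketch below; the feature dimension, layer sizes, and token inventory are invented for the example.

```python
# Minimal BiLSTM-CTC acoustic model sketch (illustrative; all
# dimensions and the token inventory are assumptions, not the paper's).
import torch
import torch.nn as nn

class BiLSTMCTC(nn.Module):
    def __init__(self, n_feats=80, hidden=256, n_tokens=100):
        super().__init__()
        # Bidirectional LSTM encoder over acoustic feature frames.
        self.lstm = nn.LSTM(n_feats, hidden, num_layers=3,
                            bidirectional=True, batch_first=True)
        # Project to per-frame token logits; index 0 is the CTC blank.
        self.proj = nn.Linear(2 * hidden, n_tokens)

    def forward(self, feats):                    # feats: (B, T, n_feats)
        out, _ = self.lstm(feats)                # (B, T, 2*hidden)
        return self.proj(out).log_softmax(-1)    # log-probs for CTC

model = BiLSTMCTC()
ctc = nn.CTCLoss(blank=0)

feats = torch.randn(4, 200, 80)                  # 4 utterances, 200 frames
targets = torch.randint(1, 100, (4, 30))         # label sequences (no blanks)
log_probs = model(feats).transpose(0, 1)         # CTCLoss wants (T, B, C)
loss = ctc(log_probs, targets,
           torch.full((4,), 200), torch.full((4,), 30))
loss.backward()                                  # train against CTC loss
```

CTC lets the model learn frame-to-token alignments without frame-level labels, which is why it pairs naturally with a bidirectional LSTM encoder.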

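The mini-batch idea behind the parallel LM-training optimization can be sketched in the same spirit: instead of updating on one sentence at a time, a batch of sentences is pushed through the RNN in a single forward/backward pass, and perplexity is recovered as the exponential of the mean cross-entropy. Model sizes and hyperparameters below are assumptions for illustration only.

```python
# Mini-batch RNN language model training sketch (illustrative;
# vocabulary size, dimensions, and learning rate are assumptions).
import math
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab=8000, emb=256, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.rnn = nn.RNN(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens):                   # tokens: (B, T)
        h, _ = self.rnn(self.embed(tokens))      # (B, T, hidden)
        return self.out(h)                       # next-token logits

model = RNNLM()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# One mini-batch: 32 sentences advance in parallel, which is the
# source of the speed-up over sentence-by-sentence updates.
batch = torch.randint(0, 8000, (32, 21))
inputs, targets = batch[:, :-1], batch[:, 1:]    # predict the next token
opt.zero_grad()
loss = loss_fn(model(inputs).reshape(-1, 8000), targets.reshape(-1))
loss.backward()
opt.step()

print("perplexity:", math.exp(loss.item()))      # exp of mean cross-entropy
```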