Open Access

Language model optimization and corpus construction techniques for Japanese speech recognition

Mar 21, 2025


Boulton, A., & Cobb, T. (2017). Corpus use in language learning: A meta‐analysis. Language Learning, 67(2), 348-393.

Gablasova, D., Brezina, V., & McEnery, T. (2017). Exploring learner language through corpora: Comparing and interpreting corpus frequency information. Language Learning, 67(S1), 130-154.

Dunn, J. (2020). Mapping languages: The corpus of global language use. Language Resources and Evaluation, 54(4), 999-1018.

Gablasova, D., Brezina, V., & McEnery, T. (2017). Collocations in corpus‐based language learning research: Identifying, comparing, and interpreting the evidence. Language Learning, 67(S1), 155-179.

Kawazoe, Y., Shibata, D., Shinohara, E., Aramaki, E., & Ohe, K. (2021). A clinical specific BERT developed using a huge Japanese clinical text corpus. PLoS ONE, 16(11), e0259763.

Hayashibe, Y. (2020, May). Japanese realistic textual entailment corpus. In Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 6827-6834).

Iida, R., Komachi, M., Inoue, N., Inui, K., & Matsumoto, Y. (2017). NAIST text corpus: Annotating predicate-argument and coreference relations in Japanese. Handbook of Linguistic Annotation, 1177-1196.

Omura, M., & Asahara, M. (2018, November). UD-Japanese BCCWJ: Universal Dependencies annotation for the Balanced Corpus of Contemporary Written Japanese. In Proceedings of the Second Workshop on Universal Dependencies (UDW 2018) (pp. 117-125).

Vitalaru, B. (2022). Public service interpreting and translation: Training and useful skills for the labour market. TRANS: Revista de Traductología, (26), 329-347.

Clouet, R. (2021). Foreign languages applied to translation and interpreting as languages for specific purposes: Claims and implications. RLA. Revista de Lingüística Teórica y Aplicada, 59(1), 39-62.

Oksana Andriivna, B., Olena, K., Oksana Pavlivna, K., & Valeriia Mykhaylivna, S. (2020). Using distance EdTech for remote foreign language teaching during the COVID-19 lockdown in Ukraine. Arab World English Journal (AWEJ) Special Issue on the English Language in Ukrainian Context.

Kruse, I., Lutskovskaia, L., & Stepanova, V. V. (2022, September). Advantages and disadvantages of distance teaching in foreign language education during COVID-19. In Frontiers in Education (Vol. 7, p. 964135). Frontiers Media SA.

Mori, D., Ohta, K., Nishimura, R., Ogawa, A., & Kitaoka, N. (2024). Recognition of target domain Japanese speech using language model replacement. EURASIP Journal on Audio, Speech, and Music Processing, 2024(1), 40.

Fu, J., Chiba, Y., Nose, T., & Ito, A. (2020). Language modeling in speech recognition for grammatical error detection based on neural machine translation. Acoustical Science and Technology, 41(5), 788-791.

Hori, T., Cho, J., & Watanabe, S. (2018, December). End-to-end speech recognition with word-based RNN language models. In 2018 IEEE Spoken Language Technology Workshop (SLT) (pp. 389-396). IEEE.

Lee, J. F. (2018). Gender representation in Japanese EFL textbooks: A corpus study. Gender and Education, 30(3), 379-395.

Zhang, J., & Matsumoto, T. (2019). Corpus augmentation for neural machine translation with Chinese-Japanese parallel corpora. Applied Sciences, 9(10), 2036.

Zhang, J., Tian, Y., Mao, J., Han, M., & Matsumoto, T. (2022). WCC-JC: A web-crawled corpus for Japanese-Chinese neural machine translation. Applied Sciences, 12(12), 6002.

El Maghraby, E. E., & Gody, A. M. (2020). Noise robust speech recognition system using multimodal audio-visual approach using different deep learning classification techniques. International Journal of Advanced Computer Research (IJACR), (47), 51-71.

Kumar, A., Gaur, N., & Nanthaamornphong, A. (2024). Bi-LSTM based deep learning algorithm for NOMA-MIMO signal detection system. National Academy Science Letters, (prepublish), 1-4.

Sidhu, B. K. (2024). Explore the N-gram model of adaptive prediction text. Journal of Educational Research and Policies, (9), 19-21.

Panda, S. P., & Nayak, A. K. (2018). A context-based numeral reading technique for text to speech systems. International Journal of Electrical and Computer Engineering (IJECE), (6), 4533-4544.

Language:
English