Open Access

A Study on the Evolution of Language Style in Japanese Academic Articles Based on Text Mining

Mar 17, 2025


Distribution of sentence length of Japanese academic articles

| Period | Item | 1~15 | 16~30 | 31~45 | 46~60 | 61~75 | >75 | Total |
|---|---|---|---|---|---|---|---|---|
| 1981-1985 | Sentence number | 11876 | 8250 | 6144 | 4601 | 3368 | 2135 | 36374 |
| | Proportion | 32.65% | 22.68% | 16.89% | 12.65% | 9.26% | 5.87% | 100% |
| 1986-1990 | Sentence number | 12530 | 9344 | 6642 | 5170 | 3551 | 2440 | 39677 |
| | Proportion | 31.58% | 23.55% | 16.74% | 13.03% | 8.95% | 6.15% | 100% |
| 1991-1995 | Sentence number | 13779 | 9841 | 7338 | 5675 | 3630 | 2597 | 42860 |
| | Proportion | 32.15% | 22.96% | 17.12% | 13.24% | 8.47% | 6.06% | 100% |
| 1996-2000 | Sentence number | 13357 | 11024 | 7080 | 6237 | 4330 | 2334 | 44362 |
| | Proportion | 30.11% | 24.85% | 15.96% | 14.06% | 9.76% | 5.26% | 100% |
| 2001-2005 | Sentence number | 13577 | 10327 | 8490 | 8142 | 5647 | 2166 | 48349 |
| | Proportion | 28.08% | 21.36% | 17.56% | 16.84% | 11.68% | 4.48% | 100% |
| 2006-2010 | Sentence number | 14343 | 10728 | 9594 | 8796 | 5873 | 3908 | 53242 |
| | Proportion | 26.94% | 20.15% | 18.02% | 16.52% | 11.03% | 7.34% | 100% |
| 2011-2015 | Sentence number | 16058 | 11881 | 9756 | 8608 | 5850 | 4369 | 56522 |
| | Proportion | 28.41% | 21.02% | 17.26% | 15.23% | 10.35% | 7.73% | 100% |
| 2016-2020 | Sentence number | 17761 | 13345 | 10628 | 9664 | 6236 | 2615 | 60249 |
| | Proportion | 29.48% | 22.15% | 17.64% | 16.04% | 10.35% | 4.34% | 100% |
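
The Proportion rows can be recomputed from the sentence counts. The following is a minimal sketch, not the authors' code: it assumes each proportion is the bin count divided by the period total, and the function name `bin_proportions` is made up for illustration. Only the 1981-1985 counts from the table are used.

```python
# Sentence counts per length bin for 1981-1985, copied from the table.
bins = ["1~15", "16~30", "31~45", "46~60", "61~75", ">75"]
counts = [11876, 8250, 6144, 4601, 3368, 2135]


def bin_proportions(counts):
    """Return each bin's share of the total as a percentage (2 decimals).

    Hypothetical helper: assumes proportion = bin count / period total.
    """
    total = sum(counts)
    return [round(100 * c / total, 2) for c in counts]


proportions = bin_proportions(counts)
for label, pct in zip(bins, proportions):
    print(f"{label}: {pct:.2f}%")
```

Under that assumption the computed shares match the Proportion row reported for 1981-1985.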

ALW and DLW of Japanese academic articles from 1981 to 2020

| Period | Total character number | Total word number | Average length of word (ALW) | Dispersion length of word (DLW) |
|---|---|---|---|---|
| 1981-1985 | 1585188 | 864852 | 1.8329 | 0.358 |
| 1986-1990 | 1733027 | 931985 | 1.8595 | 0.362 |
| 1991-1995 | 1881979 | 997286 | 1.8871 | 0.346 |
| 1996-2000 | 2032242 | 1086470 | 1.8705 | 0.351 |
| 2001-2005 | 2245315 | 1187746 | 1.8904 | 0.338 |
| 2006-2010 | 2487462 | 1298868 | 1.9151 | 0.345 |
| 2011-2015 | 2723730 | 1406522 | 1.9365 | 0.346 |
| 2016-2020 | 2968347 | 1521683 | 1.9507 | 0.346 |
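
The ALW column appears to be the total character number divided by the total word number. The sketch below checks that reading against the table; the helper name `average_word_length` is an assumption, and DLW is not recomputed because it would need the length of every individual word, which the table does not provide.

```python
# (total characters, total words) per period, copied from the table.
totals = {
    "1981-1985": (1585188, 864852),
    "1986-1990": (1733027, 931985),
    "1991-1995": (1881979, 997286),
    "1996-2000": (2032242, 1086470),
    "2001-2005": (2245315, 1187746),
    "2006-2010": (2487462, 1298868),
    "2011-2015": (2723730, 1406522),
    "2016-2020": (2968347, 1521683),
}


def average_word_length(total_chars, total_words):
    """Assumed definition: characters per word over the whole period."""
    return total_chars / total_words


alw = {
    period: round(average_word_length(chars, words), 4)
    for period, (chars, words) in totals.items()
}
for period, value in alw.items():
    print(period, value)
```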

Statistical results of segmented sentence length of Japanese academic articles

| Period | Character number | Segmented sentence number | Segmented sentence length |
|---|---|---|---|
| 1981-1985 | 1585188 | 206674 | 7.67 |
| 1986-1990 | 1733027 | 225949 | 7.67 |
| 1991-1995 | 1881979 | 304527 | 6.18 |
| 1996-2000 | 2032242 | 347392 | 5.85 |
| 2001-2005 | 2245315 | 418122 | 5.37 |
| 2006-2010 | 2487462 | 515002 | 4.83 |
| 2011-2015 | 2723730 | 599941 | 4.54 |
| 2016-2020 | 2968347 | 711834 | 4.17 |

Statistical results of type-token ratio of Japanese academic articles of 1981-2020

| Period | Total word number | Type number | Type-token ratio |
|---|---|---|---|
| 1981-1985 | 864852 | 24109 | 35.8726 |
| 1986-1990 | 931985 | 64061 | 14.5484 |
| 1991-1995 | 997286 | 63524 | 15.6994 |
| 1996-2000 | 1086470 | 70239 | 15.4682 |
| 2001-2005 | 1187746 | 171101 | 6.9418 |
| 2006-2010 | 1298868 | 171165 | 7.5884 |
| 2011-2015 | 1406522 | 87635 | 16.0498 |
| 2016-2020 | 1521683 | 100517 | 15.1386 |
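
A note on reading this column: the reported values are numerically consistent with total word number divided by type number (tokens per type), whereas the type-token ratio is more commonly defined as types divided by tokens. The sketch below computes both for three periods so the two readings can be compared; the function names are hypothetical and this is an arithmetic check, not the authors' code.

```python
# (total words, types) copied from the table.
rows = {
    "1981-1985": (864852, 24109),
    "1986-1990": (931985, 64061),
    "2001-2005": (1187746, 171101),
}


def tokens_per_type(tokens, types):
    """Ratio that matches the values printed in the table."""
    return tokens / types


def type_token_ratio(tokens, types):
    """Conventional TTR: distinct word types divided by running tokens."""
    return types / tokens


for period, (tokens, types) in rows.items():
    print(
        period,
        round(tokens_per_type(tokens, types), 4),
        round(type_token_ratio(tokens, types), 4),
    )
```

On the tokens-per-type reading, a lower value means more distinct words per running word, so 2001-2005 and 2006-2010 would be the lexically most varied periods.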

Word density of Japanese academic articles of 1981-2020

| Period | Real word number | Total word number | Word density |
|---|---|---|---|
| 1981-1985 | 751470 | 864852 | 0.8689 |
| 1986-1990 | 811107 | 931985 | 0.8703 |
| 1991-1995 | 791446 | 997286 | 0.7936 |
| 1996-2000 | 941209 | 1086470 | 0.8663 |
| 2001-2005 | 1034646 | 1187746 | 0.8711 |
| 2006-2010 | 1081308 | 1298868 | 0.8325 |
| 2011-2015 | 1159677 | 1406522 | 0.8245 |
| 2016-2020 | 1292213 | 1521683 | 0.8492 |

Distribution of word length

| Period | Monosyllable frequency | Two-syllable frequency | Trisyllable frequency | Four-syllable frequency | Above-four-syllable frequency |
|---|---|---|---|---|---|
| 1981-1985 | 70.62% | 27.36% | 1.26% | 0.64% | 0.12% |
| 1986-1990 | 68.52% | 26.92% | 2.73% | 1.29% | 0.54% |
| 1991-1995 | 67.05% | 26.79% | 2.76% | 2.06% | 1.34% |
| 1996-2000 | 66.93% | 26.84% | 3.37% | 2.21% | 0.65% |
| 2001-2005 | 66.65% | 26.68% | 3.73% | 2.38% | 0.56% |
| 2006-2010 | 64.44% | 25.52% | 4.24% | 4.27% | 1.53% |
| 2011-2015 | 63.79% | 24.92% | 5.36% | 4.84% | 1.09% |
| 2016-2020 | 62.78% | 23.55% | 5.96% | 6.03% | 1.68% |

Character and sentence number and average sentence length of Japanese academic articles

| Period | Character number | Sentence number | Average sentence length |
|---|---|---|---|
| 1981-1985 | 1585188 | 36374 | 43.58 |
| 1986-1990 | 1733027 | 39677 | 43.68 |
| 1991-1995 | 1881979 | 42860 | 43.91 |
| 1996-2000 | 2032242 | 44362 | 45.81 |
| 2001-2005 | 2245315 | 48349 | 46.44 |
| 2006-2010 | 2487462 | 53242 | 46.72 |
| 2011-2015 | 2723730 | 56522 | 48.19 |
| 2016-2020 | 2968347 | 60249 | 49.27 |

Statistical results of single occurrence word of Japanese academic articles of 1981-2020

| Period | Single occurrence word number | Total word number | Accumulative frequency |
|---|---|---|---|
| 1981-1985 | 10724 | 864852 | 0.0124 |
| 1986-1990 | 19292 | 931985 | 0.0207 |
| 1991-1995 | 25431 | 997286 | 0.0255 |
| 1996-2000 | 31073 | 1086470 | 0.0286 |
| 2001-2005 | 106185 | 1187746 | 0.0894 |
| 2006-2010 | 93909 | 1298868 | 0.0723 |
| 2011-2015 | 69763 | 1406522 | 0.0496 |
| 2016-2020 | 76388 | 1521683 | 0.0502 |

Lexical richness of Japanese academic articles of 1981-2020

| Period | Word density | Type-token ratio | Single occurrence word frequency |
|---|---|---|---|
| 1981-1985 | 0.8689 | 35.8726 | 0.0124 |
| 1986-1990 | 0.8703 | 14.5484 | 0.0207 |
| 1991-1995 | 0.7936 | 15.6994 | 0.0255 |
| 1996-2000 | 0.8663 | 15.4682 | 0.0286 |
| 2001-2005 | 0.8711 | 6.9418 | 0.0894 |
| 2006-2010 | 0.8325 | 7.5884 | 0.0723 |
| 2011-2015 | 0.8245 | 16.0498 | 0.0496 |
| 2016-2020 | 0.8492 | 15.1386 | 0.0502 |