Research on Corpus-Based Linguistic Feature Analysis and Pattern Recognition in English Majors in Colleges and Universities
Mar 19, 2025
About this article
Published Online: Mar 19, 2025
Received: Oct 28, 2024
Accepted: Jan 31, 2025
DOI: https://doi.org/10.2478/amns-2025-0474
© 2025 Yuemei Fu et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Figure 1.

Figure 2.

Figure 3.

Keyword search statistics
| Work | Measure | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 |
|---|---|---|---|---|---|---|
| Mountains of California | Frequency | 7905 | 4741 | 3849 | 2612 | 774 |
| | Key (keyness) | 341.05 | 339.78 | 7.46 | 50.03 | 17.41 |
| | Key words | the | of | and | in | with |
| Travel to Alaska | Frequency | 5990 | 3832 | 2944 | 1611 | 865 |
| | Key (keyness) | 108.99 | 7.31 | 82.65 | 3.51 | 76.44 |
| | Key words | the | and | of | in | on |
| Long grass | Frequency | 4940 | 2465 | 2077 | 1365 | 1237 |
| | Key (keyness) | 197.58 | 1.03 | 33.49 | 24.60 | 45.18 |
| | Key words | the | and | of | a | in |
| Natural path | Frequency | 5316 | 2421 | 1540 | 1348 | 595 |
| | Key (keyness) | 67.03 | 72.38 | 1.59 | 27.44 | 26.92 |
| | Key words | the | of | a | in | it |
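The Key values in the table above read like keyness scores, but this excerpt does not say how they were computed. As a minimal sketch, assuming a log-likelihood (G2) measure, which is one common choice in corpus tools, the score for a single word could be computed as follows. The function name and the corpus sizes in the usage lines are hypothetical and are not taken from the article.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of one word: study corpus vs. reference corpus.

    Assumption: this is a generic keyness measure, not necessarily the one
    behind the Key values reported in the table.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * ln(0) is treated as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2

# Hypothetical corpus sizes, for illustration only.
same_rate = log_likelihood(50, 1000, 50, 1000)
higher_in_study = log_likelihood(100, 1000, 50, 1000)
```

A word used at the same rate in both corpora scores 0, and the score grows as the two rates diverge, so ranking by it highlights words that are unusually frequent in the study corpus.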
Relative frequencies of the behavioral characteristics
| Identification code | Tag hierarchy | To manage | To lake | To go very rapidly |
|---|---|---|---|---|
| Tropism | Intransients | 0.0143 | 0.6667 | 0.9057 |
| | Single sum | 0.9571 | 0.1250 | 0.0656 |
| | Verb | 0.0143 | 0.1667 | 0.0123 |
| | Complex verb | 0.0143 | 0.0416 | 0.0164 |
| Form | Indefinite type | 0.1929 | 0.1250 | 0.2090 |
| | In present tense | 0.1285 | 0.2500 | 0.0656 |
| | Now done | 0.2500 | 0.2083 | 0.2582 |
| | Past | 0.1214 | 0.1667 | 0.3770 |
| | Past participle | 0.2786 | 0.1667 | 0.0574 |
| | Imperative sentence | 0.0286 | 0.0833 | 0.0328 |
Absolute frequencies of the behavioral feature vector
| Identification code | Tag hierarchy | To manage | To lake | To go very rapidly |
|---|---|---|---|---|
| Tropism | Intransients | 2 | 16 | 221 |
| | Single sum | 134 | 3 | 16 |
| | Verb | 2 | 4 | 3 |
| | Complex verb | 2 | 1 | 4 |
| Form | Indefinite type | 27 | 3 | 51 |
| | In present tense | 18 | 6 | 16 |
| | Now done | 35 | 5 | 63 |
| | Past | 17 | 4 | 92 |
| | Past participle | 39 | 4 | 14 |
| | Imperative sentence | 4 | 2 | 8 |
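The two tables above are linked by simple arithmetic: within each identification code, a relative frequency is the absolute count divided by the column total for that code. The sketch below reproduces the Form rows of the "To manage" column. The function and variable names are illustrative and are not from the article.

```python
def relative_frequencies(counts):
    """Divide each tag's count by the total over all tags in the same group."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

# Absolute counts for the Form group, "To manage" column (from the table above).
manage_form = {
    "Indefinite type": 27,
    "In present tense": 18,
    "Now done": 35,
    "Past": 17,
    "Past participle": 39,
    "Imperative sentence": 4,
}

shares = relative_frequencies(manage_form)
for tag, share in shares.items():
    print(f"{tag}: {share:.4f}")
```

The counts in this group sum to 140, so "Now done" comes out at 35/140 = 0.2500, matching the relative table. Two reported values (0.1285 for 18/140 and 0.0416 for 1/24) appear to be truncated rather than rounded.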
Part-of-speech distribution of the retrieved corpora
| Survey item | Mountains of California (observed) | Travel to Alaska (observed) | Long grass (observed) | Natural path (observed) | Walden (reference) | Maine forest (reference) |
|---|---|---|---|---|---|---|
| Nouns (%) | 24.49 | 24.75 | 23.08 | 23.16 | 17.39 | 16.64 |
| Verbs (%) | 14.13 | 16.21 | 16.26 | 18.87 | 16.92 | 12.86 |
| Adjectives (%) | 11.92 | 9.84 | 10.24 | 7.38 | 7.39 | 5.96 |
| Adverbs (%) | 8.08 | 7.01 | 8.85 | 7.03 | 7.38 | 6.81 |
| Numerals (%) | 2.27 | 3.64 | 2.45 | 2.31 | 2.99 | 2.49 |
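One way to read this table is to compare the average share of each word class in the four works under study with the average in the two reference works (Walden and Maine forest). The sketch below does only that arithmetic on the table values. The unweighted mean and the helper name are assumptions made for illustration, not a method stated in the article.

```python
from statistics import mean

# Percentages copied from the table: (four studied works, two reference works).
pos_share = {
    "Nouns": ([24.49, 24.75, 23.08, 23.16], [17.39, 16.64]),
    "Verbs": ([14.13, 16.21, 16.26, 18.87], [16.92, 12.86]),
    "Adjectives": ([11.92, 9.84, 10.24, 7.38], [7.39, 5.96]),
    "Adverbs": ([8.08, 7.01, 8.85, 7.03], [7.38, 6.81]),
    "Numerals": ([2.27, 3.64, 2.45, 2.31], [2.99, 2.49]),
}

def mean_gap(studied, reference):
    """Unweighted mean share in the studied works minus that in the reference works."""
    return mean(studied) - mean(reference)

gaps = {pos: mean_gap(studied, ref) for pos, (studied, ref) in pos_share.items()}
for pos, gap in gaps.items():
    print(f"{pos}: {gap:+.2f} percentage points")
```

On these figures nouns show the largest gap and numerals are the only class with a slightly lower average share in the studied works. Because the means ignore text length, the gaps are only a rough indication.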
Frequency of the forms of the verb "be" in the corpus
| Name of work | Total sentences | is | are | was | were |
|---|---|---|---|---|---|
| Mountains of California | 2866 | 1038 | 729 | 473 | 344 |
| Travel to Alaska | 2968 | 514 | 427 | 836 | 549 |
| Long grass | 2686 | 838 | 319 | 430 | 292 |
| Natural path | 3022 | 1167 | 478 | 306 | 266 |
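Counts like those in the table above can be gathered with a whole-word search for the four forms. The following is a minimal sketch under that assumption. The excerpt does not describe the tool the authors used, and the sample sentence and helper names are made up for illustration.

```python
import re
from collections import Counter

BE_FORMS = ("is", "are", "was", "were")
# Whole-word, case-insensitive match; contractions such as "isn't" are not counted.
_BE_PATTERN = re.compile(r"\b(is|are|was|were)\b", re.IGNORECASE)

def count_be_forms(text):
    """Count occurrences of each of the four forms of 'be' in a text."""
    counts = Counter(m.group(1).lower() for m in _BE_PATTERN.finditer(text))
    return {form: counts.get(form, 0) for form in BE_FORMS}

def present_share(counts):
    """Share of present-tense forms (is, are) among all four forms."""
    total = sum(counts[form] for form in BE_FORMS)
    return (counts["is"] + counts["are"]) / total if total else 0.0

# Made-up sample text, not from the corpus.
sample = "The lake is calm. The peaks were white, and the air was cold. They are here."
sample_counts = count_be_forms(sample)

# Row for Mountains of California, copied from the table.
mountains = {"is": 1038, "are": 729, "was": 473, "were": 344}
mountains_present = present_share(mountains)
```

Applied to the table rows, `present_share` gives a quick view of whether a work leans toward present-tense or past-tense description.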
