A Quantitative Analysis Study of Rhetorical Strategies in English Speeches Based on an Internet Corpus
Published Online: Mar 19, 2025
Received: Nov 03, 2024
Accepted: Feb 11, 2025
DOI: https://doi.org/10.2478/amns-2025-0416
© 2025 Chao Fang, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
A corpus is an important tool in linguistic research: a large collection of text samples of actual language use that is gathered, stored and analyzed. With a corpus, researchers can reveal the regularities and characteristics of language [1–2]. In the 21st century, corpus linguistics has entered a new stage, characterized mainly by the development and construction of various types of large-scale online corpora, which are becoming ever larger, richer in type and more widely applied [3–6]. A web corpus is a large body of corpus resources collected and organized from the Internet. Building a web corpus yields a large amount of linguistic, cultural and social information, which is of great value for the study of English speeches [7–9].
Speech is an important means of communication, and rhetoric is an integral part of it. Rhetoric refers to the use of rhetorical devices, including metaphor, personification, prose, rhetorical questions, etc. Applying these devices makes an English speech more vivid, interesting and persuasive, and helps it hold the attention of the audience [10–13]. Through learning and practice, speakers can master the use of rhetorical devices, improve their speaking ability, and deliver better speeches [14–17].
To study the rhetorical strategy tendencies of English speakers with different linguistic backgrounds, this paper takes rhetorical structure theory as its theoretical basis. TED global conference speeches are selected as corpus samples of native English speakers, while English speeches by Chinese intermediate and advanced EFL learners are collected as the learner corpus samples. To maximize the objectivity and accuracy of the study, a quantitative approach is taken to analyze the rhetorical strategies in the English speeches of Chinese EFL speakers and native English speakers. The significance of the difference in the frequency of rhetorical strategies used by the two groups is assessed with chi-square tests. AntConc software is used to calculate the connective-use tendencies of Chinese speakers and of British and American speakers. Finally, the metatexts in the English speeches of the two groups are compared.
Rhetorical structure theory describes the rhetorical relations and rhetorical strategies that hold between clauses [18]. A basic semantic unit is the smallest unit in a discourse analysis; by itself it is sufficient to express complete and continuous information and has independent semantic integrity. A document is formed by organizing these basic semantic units according to rules that define how different types of text can be combined. In this theory, each basic semantic unit plays a different role in semantic expression, and these roles are organized and linked together through rhetorical relations.
In rhetorical structure theory, a rhetorical relationship connects two non-overlapping text fragments. The core semantic unit is the fragment that conveys the important semantic information, and the satellite semantic units are the fragments that support it. For example, in the BACKGROUND relationship, one fragment expresses the author’s core point of view, and the other provides additional descriptive information that helps the reader understand the author’s intention. Mann and Thompson formalized rhetorical structure theory and identified and categorized several typical rhetorical relationships. Each rhetorical relation is defined through the following four aspects:
1. Restrictions on core semantic units.
2. Restrictions on satellite semantic units.
3. Restrictions on combinations of core and satellite semantic units.
4. Rhetorical effects.
For example, in the definition of the EVIDENCE relation, the restriction on the core semantic unit is that the author believes the reader may not trust the content of the core semantic unit to the degree the author expects. The restriction on the satellite semantic unit is that the reader trusts the content it expresses. The restriction on the combination of the two is that the reader’s understanding of the satellite semantic unit can increase their trust in the core semantic unit. The rhetorical effect is that the reader’s trust in the core semantic unit increases. Precise definitions of this kind are the best tool for identifying the relationships between text segments.
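The four-part definition can be pictured as a small record type. The following is only an illustrative sketch: the class name `RelationDefinition`, its field names, and the `describe` helper are assumptions introduced here, and the constraint texts paraphrase the EVIDENCE example above rather than quoting Mann and Thompson.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationDefinition:
    """One rhetorical relation, described by the four aspects (hypothetical layout)."""
    name: str
    nucleus_constraint: str    # restriction on the core semantic unit
    satellite_constraint: str  # restriction on the satellite semantic unit
    combined_constraint: str   # restriction on the core + satellite combination
    effect: str                # intended rhetorical effect on the reader


# Paraphrase of the EVIDENCE example given in the text (not an official wording).
EVIDENCE = RelationDefinition(
    name="EVIDENCE",
    nucleus_constraint="The reader may not trust the core unit as much as the author expects.",
    satellite_constraint="The reader trusts the content of the satellite unit.",
    combined_constraint="Understanding the satellite increases the reader's trust in the core unit.",
    effect="The reader's trust in the core unit increases.",
)


def describe(rel: RelationDefinition) -> str:
    """Render a definition as five labelled lines."""
    return "\n".join([
        f"Relation: {rel.name}",
        f"Core constraint: {rel.nucleus_constraint}",
        f"Satellite constraint: {rel.satellite_constraint}",
        f"Combined constraint: {rel.combined_constraint}",
        f"Effect: {rel.effect}",
    ])


if __name__ == "__main__":
    print(describe(EVIDENCE))
```

An annotator could keep one such record per relation and consult the three constraints before assigning a label to a pair of segments.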
Rhetorical structure theory (RST) relies on the concept of rhetorical relations. Rhetorical relations exist between two segments that do not overlap but are clearly connected, and they are unstated but derivable relational propositions [19]. A segment in RST is any part of a text that forms a functional whole from the point of view of textual organization. Relationships hold between discourse segments and are identified through relation definitions.
The definition of relationship is the basis and criteria for determining the relationship between two discourse segments, which includes two aspects:
1. Constraints: the constraints on the core discourse segment, the constraints on the auxiliary discourse segment, and the joint constraints on the two segments together.
2. Effect: a description of the effect that the author wishes to achieve by using a given relation, and of where that effect is located.
Of the two discourse segments that make up a rhetorical relation, the one that is more important for realizing the author’s communicative intent is called the core unit, and the relatively less important one is called the auxiliary unit. Besides these core-auxiliary (monocore) rhetorical relations, discourse also contains multicore rhetorical relations, in which two or more segments are equally important for realizing the author’s communicative intent and are therefore all core units.
The distinction between nucleus (N, the core unit) and satellite (S, the auxiliary unit) reflects which parts of a multi-unit discourse fulfill the author’s central goals and which parts complement or are subordinate to those goals. If S is deleted from a given relation, the N left behind still fulfills essentially the same function in the discourse. But if N is deleted and only S is left, the discourse is no longer as coherent. Also, unlike N, S can be replaced with different information without changing the function of the entire segment.
RST highlights that the rhetorical relationships present in a discourse are closely linked to coherence and can therefore be used to explain the coherence of a text. On the one hand, because the effect of a particular relation may need to be expressed through a complex unit that contains other relations, relations can be applied recursively to a piece of discourse until every unit in it is a constituent of some rhetorical structural relation. On the other hand, the effect of a piece of discourse can be summed up by a single top-level relation, which can in turn be broken down into the relations that produce that effect. RST thus defines text structure as a hierarchy of relations between discourse segments, each level larger than the one below it, and a discourse can be considered coherent if its RST structure is a connected whole, with every discourse segment connected to that structure in some way.
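The recursive application of relations and the "connected whole" criterion can be sketched as a tree check. This is a minimal illustration under stated assumptions: the `Segment` and `RelationNode` classes, the function names, and the three example sentences are all hypothetical, and "connected" is simplified here to mean that every segment appears exactly once under a single root.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Segment:
    """A minimal discourse segment (leaf), identified by its position in the text."""
    index: int
    text: str


@dataclass
class RelationNode:
    """A relation applied to child spans; children may themselves be relations."""
    relation: str
    nuclei: List["Span"]
    satellites: List["Span"] = field(default_factory=list)


Span = Union[Segment, RelationNode]


def covered_segments(span: Span) -> List[int]:
    """Collect the indices of all leaf segments under a span, recursively."""
    if isinstance(span, Segment):
        return [span.index]
    indices: List[int] = []
    for child in span.nuclei + span.satellites:
        indices.extend(covered_segments(child))
    return sorted(indices)


def is_connected_whole(root: Span, n_segments: int) -> bool:
    """Simplified coherence check: segments 0..n-1 each occur exactly once under root."""
    return covered_segments(root) == list(range(n_segments))


# Hypothetical three-segment example: an EVIDENCE relation nested in a SEQUENCE.
s0 = Segment(0, "Rhetoric makes a speech persuasive.")
s1 = Segment(1, "Audiences remember vivid comparisons.")
s2 = Segment(2, "Speakers then practise these devices.")
evidence = RelationNode("EVIDENCE", nuclei=[s0], satellites=[s1])
root = RelationNode("SEQUENCE", nuclei=[evidence, s2])
```

Here `is_connected_whole(root, 3)` holds because all three segments hang from one top-level relation, whereas `evidence` alone leaves the third segment unattached.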
Rhetorical structures can be represented as tree diagrams. Mann and Thompson proposed five basic sub-tree models that can be used to represent the rhetorical relationship between two discourse segments. Figure 1 shows examples of these sub-tree models and relationships. Each sub-tree model consists of three parts: a certain number of discourse segments, a description of the relationship between the segments, and a description of the relationship of a given segment to the discourse as a whole. In a discourse, a discourse segment is a structural fragment of any length that can be either a core or an auxiliary part.

Figure 1. Five basic sub-tree models of rhetorical structure theory
The horizontal lines in the figure represent the segments of the text being analyzed, the vertical (and diagonal) lines mark the segments identified as “core units”, the labels on the arcs name the relationships between the segments, and the arrows run from the auxiliary units to the core units. The figure shows a linear relationship between the core unit and the auxiliary units. In sequence and joint relations there are only core units, with no corresponding auxiliary units. Subtree model (a) can represent various asymmetric relationships. Subtree models (b, c, and e) correspond to symmetric relations, and the dyadic relations are convertible, while the joint and sequence relations are not. The joint relation does not establish content relations between its child nodes, so there are no arcs between these nodes.
The study first formulated six topic areas closely related to everyday social life: politics, economy, society, culture and education, environmental protection, and science and technology. Based on these topics, TED global conference speeches with wide influence around the world were selected as the reference group of discourse samples, representing advanced proficient speakers of English at (or near) native-speaker proficiency. Corresponding speech tasks were also designed around the same topics, and the speeches completed by Chinese intermediate and advanced EFL learners who had reached a comparable level of language proficiency were collected as the discourse samples of the research group. In addition, speeches by winners of national English speech contests that related to the proposed thematic directions or topics were added to the learner corpus. Admittedly, the speakers in the reference corpus and in the study corpus differ considerably in background, such as identity, education, qualifications and age. However, the aim of this study is not to judge one group as superior to the other, but to reveal the similarities and differences between the rhetorical strategies of the Chinese English speakers and those of the (near-)native speakers of English, together with their regularities and characteristics.
Before the study, existing resources were surveyed: the websites of the International English Speaking Association (IESA) and the Spoken English Consortium (SEC), the English Academic Speaking Corpus (EASC), the spoken corpora of the BNC and COCA, the Chinese Students’ Corpus of Spoken and Written English (SWECCL) (2.0) (2008), the Parallel Corpus of Chinese College Student Interpreters (PACCEL-S) (2009), the Chinese Learners’ Corpus of Spoken English (CLCSE) (2009), the Chinese Corpus of Oral English for Learners of English (COLSEC) (2015), and others. The survey examined the speech corpora for English learners or users that these resources provide, as well as their spoken corpora related to speech discourse. Comparable corpora should be basically balanced in size and structure with respect to the specific type of discourse in focus, the identity, language, culture and educational background of the speakers, and the occasions, forms and themes of the speeches, including how these factors co-occur. Given these requirements, it is difficult to find well-matched corpus samples among the readily available corpus resources. In selecting the corpus, the researcher therefore followed the standards used for self-built small and medium-sized specialized corpora in language research, in order to ensure as far as possible that the corpus samples are relevant, adequately sized, well structured, valid and representative.
The data on rhetorical strategies in speech discourse are shown in Table 1. Chinese speakers used 29 types of rhetorical strategy with a total of 1,451 occurrences: 22 types of monocore (mono-nuclear) relations with 1,013 occurrences and 7 types of multicore (multi-nuclear) relations with 438 occurrences. Native English speakers also used 29 types, with a total of 1,201 occurrences: 23 types of monocore relations with 738 occurrences and 6 types of multicore relations with 463 occurrences. Overall, both groups used both monocore and multicore rhetorical strategies, and they converged in the types of relations used. They diverged, however, in frequency of use: Chinese speakers used monocore strategies, and rhetorical strategies overall, far more often than native speakers, and both differences were significant (p=0.000), whereas they used multicore relations less often than native speakers, a difference that was not significant (p=0.183).
Table 1. Rhetorical relationship data
| RR | Category: Chinese speaker | Category: English speaker | Category: χ^2 | Category: p | Frequency: Chinese speaker | Frequency: English speaker | Frequency: χ^2 | Frequency: p |
|---|---|---|---|---|---|---|---|---|
| Mono-Nuclear | 22 | 23 | 0.1681 | 0.652 | 1013 | 738 | 28.895 | 0.000 |
| Multi-Nuclear | 7 | 6 | 0.1568 | 0.395 | 438 | 463 | 1.381 | 0.183 |
| Sum | 29 | 29 | 0.0023 | 0.924 | 1451 | 1201 | 12.541 | 0.000 |
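How a chi-square statistic is obtained from frequency counts can be illustrated with a short sketch. It assumes a plain Pearson chi-square test on a 2×2 table (relation type by speaker group) without continuity correction; the paper does not state its exact test design or the expected frequencies it used, so this sketch is not expected to reproduce the values reported in Table 1. The function name `chi_square_2x2` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi_square_2x2(table: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        raise ValueError("every row and column must have a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With one degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Frequencies from Table 1: rows are relation types, columns are speaker groups.
observed = [
    [1013, 738],  # mono-nuclear: Chinese speakers, English speakers
    [438, 463],   # multi-nuclear: Chinese speakers, English speakers
]

if __name__ == "__main__":
    chi2, p_value = chi_square_2x2(observed)
    print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

This particular table asks whether the split between mono-nuclear and multi-nuclear relations differs between the two groups, which is one of several ways the counts could be compared.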
Both types of speech discourse make heavy use of monocore strategies such as elaboration-general-specific, comment, background, example, antithesis, volitional result and volitional cause, and of multicore strategies such as contrast, comparison, joint and conjunction. This suggests that these seven monocore strategies and four multicore strategies are the dominant rhetorical strategies of both native English speakers and Chinese speakers when delivering English speeches. In addition, both types of discourse seldom use monocore strategies such as purpose and definition, or multicore strategies such as analogy. Table 2 shows the statistics for monocore rhetorical strategies, and Table 3 those for multicore rhetorical strategies.
Table 2. Statistics of mono-nuclear rhetorical strategies
| No. | Rhetorical relation | Chinese speaker | English speaker | χ^2 | p |
|---|---|---|---|---|---|
| 1 | Elaboration-general-specific(S) | 258 | 280 | 4.3172 | |
| 2 | Comment(N) | 130 | 53 | 29.7357 | |
| 3 | Background(S) | 81 | 80 | 0.0911 | 0.846 |
| 4 | Example(S) | 83 | 65 | 0.4232 | 0.563 |
| 5 | Antithesis(S) | 70 | 33 | 12.1395 | |
| 6 | Volitional result(N) | 66 | 44 | 3.3782 | 0.078 |
| 7 | Volitional cause(S) | 46 | 40 | 1.1085 | 0.334 |
| 8 | Non-Volitional cause(S) | 36 | 19 | 5.2894 | 0.055 |
| 9 | Solutionhood(N) | 33 | 6 | 11.3376 | |
| 10 | Concession(S) | 28 | 21 | 0.8462 | 0.407 |
| 11 | Restatement(S) | 25 | 13 | 1.5192 | 0.131 |
| 12 | Motivation(N) | 24 | 3 | 18.5899 | |
| 13 | Non-volitional result(N) | 20 | 6 | 4.6919 | |
| 14 | Summary(N) | 18 | 38 | 12.5773 | |
| 15 | Evidence(S) | 15 | 12 | 0.2665 | 0.661 |
| 16 | Hypothesis(S) | 10 | 8 | 2.1939 | 0.131 |
| 17 | Rhetorical question(S) | 4 | 1 | 1.6688 | 0.188 |
| 18 | Elaboration-part-whole(S) | 2 | 2 | 0.3228 | 0.649 |
| 19 | Purpose(S) | 2 | 4 | 1.2768 | 0.216 |
| 20 | Definition(S) | 1 | 3 | 0.0386 | 0.213 |
| 21 | Conclusion(N) | 37 | 0 | 38.1436 | |
| 22 | Evaluation(N) | 18 | 0 | 18.6983 | |
| 23 | Topic-shift(S) | 6 | 0 | 4.7069 | 0.069 |
| 24 | Circumstance(S) | 0 | 3 | 1.2595 | 0.208 |
| 25 | Means(S) | 0 | 1 | 1.1073 | 0.945 |
| 26 | Otherwise(S) | 0 | 3 | 1.1852 | 0.372 |
| Total | | 1013 | 738 | 27.1443 | |
Table 3. Statistics of multi-nuclear rhetorical strategies
| No. | Rhetorical relation | Chinese speaker | English speaker | χ^2 | p |
|---|---|---|---|---|---|
| 1 | Contrast(M) | 57 | 61 | 0.3024 | 0.561 |
| 2 | Comparison(M) | 68 | 103 | 7.8961 | |
| 3 | Joint(M) | 123 | 125 | 0.2577 | 0.613 |
| 4 | Conjunction(M) | 142 | 135 | 0.0446 | 0.842 |
| 5 | Analogy(M) | 5 | 8 | 0.1175 | 0.732 |
| 6 | Sequence(M) | 35 | 31 | 0.0634 | 0.838 |
| 7 | Parallelism(M) | 8 | 0 | 4.6563 | |
| Total | | 438 | 463 | 12.6871 | 0.187 |
The two types of speech discourse differ significantly in the frequencies of 11 strategies. Relative to native speakers, Chinese speakers both over-used and under-used certain rhetorical strategies: 9 strategies were over-used and 2 were under-used. The comment, motivation, antithesis, conclusion, evaluation and solutionhood strategies were far more frequent in the speech discourse of Chinese speakers than in that of native speakers, with significant differences (p<0.05). The comment strategy, for example, occurred 130 times in the Chinese speakers’ speeches but only 53 times in the native English speakers’ speeches, a difference of 77 occurrences.
Chinese speakers also used the antithesis strategy more often: 70 occurrences, 37 more than in the native speakers’ discourse (33 occurrences). The solutionhood and motivation strategies likewise appeared more often in Chinese speakers’ speeches, with 33 and 24 occurrences respectively, whereas native speakers rarely used them (6 and 3 occurrences respectively). In addition, Chinese speakers’ speeches used the conclusion and evaluation strategies 37 and 18 times respectively, while the native speakers’ discourse did not use these two strategies at all.
In addition, Chinese speakers’ speeches used the non-volitional cause, topic-shift, non-volitional result and parallelism strategies significantly more often than native speakers’ discourse (p<0.05). Compared with the native speakers’ discourse, the Chinese speakers’ speeches contained 17 more occurrences of the non-volitional cause strategy, 6 more of the topic-shift strategy, 14 more of the non-volitional result strategy and 8 more of the parallelism strategy; the native speakers’ discourse used these strategies seldom or not at all.
Compared with the Chinese speakers’ speeches, the native speakers’ discourse used the summary, elaboration-general-specific and comparison strategies more often. The summary strategy occurred only 18 times in Chinese speakers’ speeches but 38 times in native speakers’ discourse, a significant difference (p=0.000). The comparison strategy occurred 68 times in Chinese speakers’ speeches and 103 times in native speakers’ discourse, also a significant difference (p=0.001). The elaboration-general-specific strategy occurred 258 times in Chinese speakers’ speeches and 280 times in native speakers’ discourse, again a significant difference (p=0.033). This suggests that the speech structure of native English speakers develops in a linear fashion, while Chinese speakers structure their English speeches in a spiral fashion.
We used AntConc software to count all connectives in the Chinese speaker group and in the British and American speaker group, and then normalized the counts to frequencies per 10,000 words. The results are shown in Figure 2.

Figure 2. Frequency of connective use
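The per-10,000-word normalization behind Figure 2 can be sketched in a few lines. This is only an illustration of the arithmetic, not the AntConc workflow used in the study: the function names, the simple regular-expression tokenizer, and the sample sentence are assumptions made here.

```python
import re
from typing import Dict, Iterable


def per_10k(count: int, total_words: int) -> float:
    """Normalize a raw count to a frequency per 10,000 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count * 10_000 / total_words


def connective_rates(text: str, connectives: Iterable[str]) -> Dict[str, float]:
    """Count each connective (case-insensitive, whole phrase) and normalize it."""
    lowered = text.lower()
    # Assumption: a word is a run of letters or apostrophes.
    total_words = len(re.findall(r"[a-z']+", lowered))
    rates: Dict[str, float] = {}
    for phrase in connectives:
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        hits = len(re.findall(pattern, lowered))
        rates[phrase] = per_10k(hits, total_words)
    return rates


# Hypothetical ten-word sample, used only to show the calculation.
sample = "We stayed and as a result we missed the train"
rates = connective_rates(sample, ["as a result", "consequently"])
```

Normalizing in this way makes corpora of different sizes comparable, which is why the figures in the text are reported per 10,000 words rather than as raw counts.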
The statistics show that Chinese speakers tend to use many connectives expressing “cause and effect, clarity and transition” relationships (Chinese vs. British and American speakers, cause and effect: 71.58 vs. 57.68, clarity: 22.38 vs. 12.21, transition: 24.95 vs. 9.14). British and American speakers, on the other hand, preferred connectives expressing “illustrate and reinforce” relationships (British and American vs. Chinese speakers, illustrate: 20.81 vs. 15.33, reinforce: 6.82 vs. 2.22). There is only a slight difference between the two groups in the use of connectives expressing “additional and opposite” relationships (additional: 56.68 vs. 55.82, opposite: 20.27 vs. 18.39).
In general, Chinese speakers use more connectives than British and American speakers, and the four connectives that Chinese speakers use most markedly more often than British and American speakers are “consequently, whereas, as a result, on the other hand”. This shows that Chinese speakers more frequently use connectives that express cause and effect and transition. We also find that Chinese speakers use only a few causal connectives more frequently than British and American speakers. They tend to use “as a result”, “consequently” and “since” more often, but they rarely use other causal connectives such as “hence”, “so”, “before” and “because”. We can therefore say that Chinese speakers lack diversity in their choice of causal connectives, because English is not their native language.
Regarding the use of transitional connectives, we similarly observe a difference between Eastern and Western rhetorical strategies. The Eastern model tends to be indirect and non-linear. With this rhetorical strategy, Chinese speakers commonly produce more transitional passages, which naturally require more transitional connectives. Chinese speakers with higher English proficiency in particular use more transitional connectives in their English speeches in order to make the text coherent and logical.
We also found that British and American speakers tend to use more examples in their speeches, which suggests that they may be more audience-friendly. Chinese speakers are not being rude towards their audience, but they may assume that the audience already possesses a high level of intelligence and knowledge, so that many examples are unnecessary. “Reinforcing” connectives such as “in fact”, “as a matter of fact”, and “on the contrary” are used less often by Chinese speakers. This may be due to the low overall frequency of these connectives and to sampling bias, and the difference needs to be studied further.
Because counting every generalized metatext in the main corpus is difficult and not feasible in the short term, we used random sampling to extract 120 sentences from the British and American speaker group and 115 sentences from the Chinese speaker group. We then examined the sample sentence by sentence to count the metatexts, and calculated the number of metatexts per 10,000 words. The results are shown in Figure 3.

Figure 3. Number of metatexts
It can be seen that Chinese speakers used more metatexts for both interactive and interactional functions. However, British and American speakers used frame markers (89.82 vs. 21.15), evidentials (259.38 vs. 210.71), and self-mentions (102.81 vs. 67.88) more often. Frame markers indicate order or mark the different phases of a speech; representative frame markers are “first, second, finally, as follows”. Chinese speakers are less aware of frame markers as a rhetorical strategy for establishing logic and coherence in their speeches. At the same time, although a considerable number of Chinese speakers have some knowledge of this rhetorical strategy, they often make mistakes when using English rhetorical strategies because of their limited English proficiency and vocabulary.
British and American speakers tend to use evidentials more often, which means that they more often quote other people’s opinions to support and confirm their own in their speeches. This reflects the advantage of speaking in their mother tongue: British and American speakers find it easier to read large amounts of English material, whereas Chinese speakers, although they have been learning English for many years, often cannot read English material as quickly as native speakers.
In the secondary corpus, the British and American speaker group has more “self-references”. This seems to contradict the findings from the main corpus and may be a bias due to random sampling. However, the preference for “author’s presence” shown in the secondary corpus is consistent with the previous findings: the majority of self-references in the British and American speaker group are “I”, while the self-references of the Chinese speakers include both “I” and “we”.
Overall, both Chinese speakers and British and American speakers used both monocore and multicore rhetorical strategies. The comment, motivation, antithesis, conclusion, evaluation and solutionhood strategies were significantly more frequent in the English speeches of Chinese speakers than in those of native speakers (p<0.05). Chinese speakers’ speeches also used the non-volitional cause, topic-shift, non-volitional result and parallelism strategies significantly more often than native speakers’ discourse (p<0.05), while the summary strategy was used less often. Per 10,000 words of speech, Chinese speakers tend to use more causal, clarifying and transitional connectives, but their use of causal connectives is unbalanced and lacks variety compared with native speakers. We conclude that Eastern rhetorical strategies tend to be non-linear and indirect, while Western rhetorical strategies are linear. British and American speakers were also found to make more use of rhetorical strategies such as examples, frame markers and evidentials. These findings show that English speakers from different linguistic backgrounds use different rhetorical strategies in their speeches.
