Open Access

Research on the Methods of Enhancing Phonetic Fluency in Oral English Teaching Based on Information Technology

  
17 March 2025


Introduction

With the continuous development of modern society and the economy, information technology has advanced rapidly, which in turn drives social and economic development and opens up a huge space for the development of oral English education [1]. At present, many practitioners are applying, or attempting to apply, information technology to oral English education. Applying information technology to the modern educational process allows teaching and learning to break through the single, monotonous, and narrow limitations of the traditional teaching mode. It makes education both more accessible and more personalized, and makes the teaching process of oral English more vivid and interesting [2-4].

As is well known, the ultimate goal of language learning is communication, so the ability to speak fluent oral English is particularly important. Speaking is the use of oral language to express ideas in communicative activities: it requires acquiring knowledge, information, and language through reading and listening, processing and reorganizing that content and language on the basis of existing knowledge through thought, giving it new content, and then producing output, thereby completing the communication process [5-7]. Fluent oral expression is not achieved overnight; it can only be accumulated gradually through dedicated training. To improve oral expression, learners must acquire it through systematic study and sufficient practice [8-9]. However, in actual teaching, owing to the influence of traditional teaching methods and differences in teaching environment, teacher level, teaching materials and equipment, student quality, and student psychology, students' levels of oral English expression vary widely [10-12]. It is therefore necessary, on the basis of information technology and students' needs, to build a mode of oral English teaching that deeply integrates information technology with classroom teaching, gives full play to the advantages of information technology, and broadens the channels of oral English learning [13-15].

The integration of information technology and oral English teaching creates a more intuitive and enriched listening and speaking environment for students, prompting them to develop the desire to speak, read, and use the language. Literature [16] emphasizes the necessity of fully integrating oral English teaching with information technology and proposes a multi-interactive multimedia teaching model based on computer technology; the experimental results show that the proposed multi-interactive strategy improves students' oral English fluency more than the traditional teaching model. Literature [17] takes interdisciplinary research as an entry point and, based on the relevant theories of ecology and systems science, discusses the feasibility of an oral English teaching method based on information technology, putting forward strategies and measures to address the problems existing in current university oral English teaching. Literature [18] developed a mobile oral English teaching case platform based on mobile virtual technology and verified its superior performance through system testing, with the advantages of good platform interactivity, short response time, and high user satisfaction. Literature [19], based on a computer-aided learning system, designed an oral English teaching system with five functional modules: user management for teachers and students, English listening training, oral English training, English homework, and the oral learning process; experiments verified the system's effectiveness in helping students develop autonomous learning ability and improve phonetic fluency.
Literature [20] reforms the method of teaching oral English in the network environment and proposes an effective platform for oral English teaching, which mainly contains four modules: oral English perception processing, an oral English pronunciation acquisition program, oral English pronunciation pre-processing, and modeling of oral English pronunciation error detection. Literature [21] proposes a university oral English teaching method based on virtual reality technology, which builds a specific language environment so that each student can truly become immersed in an English environment, thereby improving speech fluency and raising the effectiveness of oral English teaching scientifically and efficiently.

Taking Chinese students' oral English pronunciation as the research object, this paper focuses on two information technologies, pronunciation error detection and pronunciation quality assessment, and organically integrates them to build an efficient intelligent evaluation model of English pronunciation quality. Based on the 4C/ID model, this paper combines the evaluation model with traditional oral English teaching activities to design an intelligent-evaluation-assisted mode of oral English teaching. Application practice of the new teaching mode in the information technology environment is then carried out, focusing on three aspects: oral English performance, students' learning attitudes, and the credibility of the model's evaluation. The new teaching mode is gradually improved until a complete and effective teaching mode is formed.

Model for evaluating the quality of oral English pronunciation
General structure of the model

The English read-aloud pronunciation quality evaluation model constructed in this paper consists of five parts: a speech signal preprocessing module, an acoustic feature extraction module, a Viterbi search decoding module, a pronunciation error detection module, and a pronunciation quality evaluation module [22]. The overall structure of the intelligent evaluation model is shown in Figure 1. The speech signal preprocessing module first adjusts and corrects the learner's input oral English pronunciation; after a series of processing steps such as pre-emphasis, framing, and windowing, it removes the influence of various interfering factors on the pronunciation.

Figure 1.

Overall structure of English pronunciation quality evaluation model

Then, the acoustic feature extraction module extracts the feature parameters that most accurately characterize the acoustic properties of the speech signal from the preprocessed read-aloud pronunciation, most often the Mel Frequency Cepstrum Coefficients (MFCC). Next, the Viterbi search and decoding module, under the constraints and guidance of the information provided by a knowledge base consisting of the acoustic model, the language model, and the pronunciation lexicon, forcibly aligns the learner's read-aloud pronunciation with the given text and segments the result into recognized phoneme units. On this basis, the various evaluation features required for pronunciation quality evaluation are calculated and fed into the pronunciation error detection and pronunciation quality evaluation modules respectively, which comprehensively evaluate the learner's English read-aloud pronunciation quality according to the designed error detection strategy and evaluation strategy. Finally, the results of the pronunciation quality evaluation are output and fed back to the learner.
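To make the preprocessing stage concrete, the following is a minimal numpy sketch of pre-emphasis, framing, and Hamming windowing; the 25 ms frame, 10 ms hop, and 0.97 pre-emphasis coefficient are typical assumed values, not specified by the paper.

```python
import numpy as np

def preprocess(signal, fs=16000, frame_ms=25, hop_ms=10, alpha=0.97):
    """Pre-emphasis, framing, and Hamming windowing of a speech signal."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    frame_len = int(fs * frame_ms / 1000)        # 400 samples at 16 kHz
    hop = int(fs * hop_ms / 1000)                # 160 samples at 16 kHz
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * np.hamming(frame_len)        # window each frame

frames = preprocess(np.random.randn(16000))      # one second of dummy audio
print(frames.shape)  # (98, 400)
```

The windowed frames would then be passed to an MFCC front end (filterbank plus DCT), which is omitted here.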

Word Pronunciation Error Detection

The process of word pronunciation phoneme error detection is shown in Figure 2:

First, acoustic features are extracted from the preprocessed learners’ English read-aloud pronunciations.

Then, the trained acoustic model is utilized to perform Viterbi alignment between the acoustic features extracted from the read-aloud pronunciation and the given read-aloud text to slice out the boundary information of each phoneme segment in the read-aloud pronunciation.

Based on this, the pronunciation standardization score required for phoneme error detection is computed.

Finally, the designed phoneme error detection strategy is used to obtain the final results for phoneme error detection.

Figure 2.

The flow of word pronunciation phoneme error detection

It can be seen that the pronunciation standardization algorithm, the acoustic model, and the phoneme error detection strategy all affect the error detection performance for word pronunciation phonemes. Next, this paper performs pronunciation error detection using the log posterior probability algorithm and then proposes an improvement scheme for the acoustic model.

Log posterior probability algorithm

The log posterior probability algorithm calculates the final phoneme score from the posterior probability scores of all frames within the phoneme. For phoneme $q_i$, assuming each of its frames has observation vector $o_t$, the frame-level posterior probability score $P(q_i|o_t)$ is calculated as: $$P(q_i|o_t)=\frac{P(q_i,o_t)}{P(o_t)}=\frac{P(o_t|q_i)P(q_i)}{\sum_{j=1}^{M}P(o_t|q_j)P(q_j)} \tag{1}$$ where $P(q_i)$ is the prior probability of $q_i$, $P(o_t|q_i)$ is the likelihood of the current observation vector $o_t$ for phoneme $q_i$, and $M$ is the total number of phonemes in the acoustic model set.

Assuming $\tau_i$ is the onset frame of $q_i$ and $d_i$ is its duration, the log posterior probability score $LPP$ of phoneme $q_i$ is the sum of the frame-level log posterior probability scores within $q_i$: $$LPP(q_i)=\sum_{t=\tau_i}^{\tau_i+d_i-1}\log P(q_i|o_t)$$
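The two formulas above can be sketched in a few lines of numpy; the function names and the toy uniform prior are illustrative, not from the paper.

```python
import numpy as np

def frame_posterior(loglik, logprior):
    """Log-domain Bayes rule over M phonemes: returns log P(q_i | o_t) for all i."""
    joint = np.asarray(loglik, float) + np.asarray(logprior, float)
    return joint - np.logaddexp.reduce(joint)    # subtract the log of the denominator sum

def lpp(frame_logliks, logprior, i):
    """LPP(q_i): sum of frame-level log posteriors over the phoneme's frames."""
    return float(sum(frame_posterior(ll, logprior)[i] for ll in frame_logliks))

# Toy check: uniform prior and equal likelihoods give posterior 1/M per frame.
M = 4
prior = np.log(np.full(M, 1.0 / M))
score = lpp([np.zeros(M)] * 3, prior, i=0)       # a 3-frame phoneme
print(round(score, 4))  # 3 * log(1/4) = -4.1589
```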

In actual phoneme articulation error detection, the spectrum of the speech signal is altered by differences in learners' individual characteristics or by changes in the transmission channel. In the log posterior probability algorithm, however, the numerator and denominator of Equation (1) are affected by such alterations to the same extent. Overall, this makes the scoring accuracy of the algorithm less sensitive to these changes and enables it to focus on measuring how standard the learner's pronunciation is, with a high degree of consistency with manual judgments.

Acoustic model optimization adjustment

In this paper, a two-level adaptive strategy combining Maximum Likelihood Linear Regression (MLLR) and Maximum A Posteriori (MAP) estimation is adopted to adjust the model parameters. When using MLLR to adjust the acoustic model, all state Gaussian distributions share the same transformation matrix, allowing all model parameters to be updated; the MAP algorithm is then used to adjust the model in finer detail.

The specific process of MLLR-MAP Level 2 acoustic model adaptation is as follows:

1) Using a standard American acoustic model, the students’ read-aloud speech was sliced into separate phoneme segments.

2) The pronunciation standardization (Goodness of Pronunciation, GOP) scores for these phoneme segments are calculated. Assuming that the observation vector corresponding to phoneme $q_i$ is $o_t$ and $M$ is the total number of phonemes in the acoustic model set, the GOP is calculated as: $$\mathrm{GOP}(q_i)=\log P(q_i|o_t)=\log\frac{P(o_t|q_i)P(q_i)}{\sum_{j=1}^{M}P(o_t|q_j)P(q_j)}$$

3) Phoneme segments with scores higher than a preset threshold are used as the adaptive corpus.

4) The model parameters are adjusted using the MLLR algorithm.

5) The MAP algorithm is used to adjust the model in finer detail.
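Steps 4) and 5) require an ASR toolkit for the MLLR and MAP re-estimation itself, but the corpus-selection step 3) can be sketched as follows; the threshold value and phoneme labels are assumed placeholders.

```python
def select_adaptation_corpus(segments, gop_scores, threshold=-5.0):
    """Step 3): keep only phoneme segments whose GOP score exceeds a preset
    threshold; the retained segments form the adaptive corpus that is then
    fed to MLLR and MAP re-estimation (steps 4 and 5)."""
    return [seg for seg, score in zip(segments, gop_scores) if score > threshold]

corpus = select_adaptation_corpus(["ae", "t", "ih"], [-2.1, -7.3, -4.0])
print(corpus)  # ['ae', 'ih']
```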

Pronunciation quality assessment
Pronunciation standardization assessment

For the articulatory quality assessment task, this paper mainly uses two algorithms, log posterior probability and GOP, in order to obtain a higher agreement with manual scoring [23].

For phoneme $q_i$, assuming its duration is $d_i$, the calculated log posterior probability score is $LPP(q_i)$. The log posterior probability score of the whole sentence is the average of the duration-normalized scores of all phonemes in the sentence: $$LPP=\frac{1}{N}\sum_{i=1}^{N}\frac{LPP(q_i)}{d_i}$$ where $N$ is the total number of phonemes in the sentence.

Similarly, the GOP score for phoneme $q_i$ is normalized by phoneme duration and then averaged to obtain the GOP score for the entire sentence: $$\mathrm{GOP}=\frac{1}{N}\sum_{i=1}^{N}\frac{\mathrm{GOP}(q_i)}{d_i}$$
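Both sentence-level scores are the same duration-normalized average, which can be sketched once (illustrative function name):

```python
def sentence_score(phone_scores, durations):
    """Sentence-level score: average of duration-normalized phoneme scores.
    Works identically for LPP and GOP phoneme scores."""
    return sum(s / d for s, d in zip(phone_scores, durations)) / len(phone_scores)

print(sentence_score([-6.0, -4.0], [3.0, 2.0]))  # (-2.0 + -2.0) / 2 = -2.0
```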

Pronunciation fluency assessment

In this paper, we first extract five high-quality temporal metrics, namely speech rate ($V_l$), articulation speed ($V_p$), articulation time ratio ($\delta$), average speech fluency ($\varphi$), and average pause length ($l$), defined as follows [24]: $$V_l=\frac{q_N}{T_i},\quad V_p=\frac{q_N}{T_i-T^*},\quad \delta=\frac{T_i-T^*}{T_i},\quad \varphi=\frac{q_N}{N^*},\quad l=\frac{T^*}{N^*}$$ where $T^*$ and $N^*$ are the total pause time and number of pauses, respectively, and $q_N$ and $T_i$ are the total number of phonemes and the total time of the read-aloud speech, respectively.
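The five metrics can be sketched directly from their definitions; the argument names mirror the symbols above and the sample values are invented.

```python
def fluency_metrics(q_n, t_i, t_star, n_star):
    """Five temporal fluency measures: q_n is the total phoneme count, t_i the
    total time, t_star the total pause time, n_star the number of pauses."""
    return {
        "V_l":   q_n / t_i,                # speech rate
        "V_p":   q_n / (t_i - t_star),     # articulation speed
        "delta": (t_i - t_star) / t_i,     # articulation time ratio
        "phi":   q_n / n_star,             # average speech fluency
        "l":     t_star / n_star,          # average pause length
    }

m = fluency_metrics(q_n=120, t_i=30.0, t_star=6.0, n_star=12)
print(m["V_l"], m["V_p"], m["delta"])  # 4.0 5.0 0.8
```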

In addition, a phoneme segment duration feature ($\chi$) was extracted in the experiment, defined as: $$\chi=\frac{1}{N}\sum_{i=1}^{N}\lg P(d_i\times V_p\,|\,q_i)$$ where the segment duration $d_i$ is the duration of phoneme $q_i$ in the learner's pronunciation. After $d_i$ is normalized by the learner's speech rate, it is input into a prior probability distribution for phoneme $q_i$ pre-trained on standard pronunciation data; this feature also measures the fluency of the learner's pronunciation.

Pronunciation rhythmic assessment

The rhythm of a language consists of differences and similarities in the pitch, stress, length, and tempo of speech, and the regular, periodic alternation of speech-unit fragments of a given class. Rhythm falls into three types: complete repetition, incomplete repetition, and emphasized repetition. In reading aloud and speaking, these alternate in different combinations of rhythmic groups as units, and their meaningful function is to enhance melody and musicality.

English sentences have the following three characteristics:

1) Generally speaking, the more stressed syllables a sentence contains, the slower the rate of speech and the clearer the syllables sound.

2) Non-stressed syllables crowded between stressed syllables sound brisk and slurred.

3) The time it takes to speak a sentence depends not on the number of words or syllables it contains, but primarily on the number of stressed syllables in it.

Stressed syllables play the role of emphasis and contrast in the organization of utterances and in semantic expression, and have three characteristics: loudness, long duration, and clarity.

The evaluation mechanism of oral English pronunciation rhythm is shown in Figure 3, which includes the following steps:

Figure 3.

Oral English pronunciation rhythm evaluation mechanism

Extracting the short-time energy value of speech and composing a speech intensity graph

The loudness of stressed syllables in a sentence is directly reflected in the energy intensity in the time domain, i.e., the speech energy of stressed syllables is large. The short-time energy of speech signal $s(n)$ is defined as: $$E_n=\sum_{m=-\infty}^{\infty}\left[s(m)\,\omega(n-m)\right]^2$$

Short-time energy values are extracted for speech sentences to form an intensity profile.
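The intensity profile can be sketched as a frame-wise short-time energy computation; the window and hop sizes here are assumed values.

```python
import numpy as np

def short_time_energy(signal, win_len=256, hop=128):
    """E_n = sum_m [s(m) w(n-m)]^2 evaluated frame by frame:
    each hop position gives one point of the intensity profile."""
    w = np.hamming(win_len)
    n_frames = 1 + max(0, (len(signal) - win_len) // hop)
    return np.array([np.sum((signal[i * hop : i * hop + win_len] * w) ** 2)
                     for i in range(n_frames)])

e = short_time_energy(np.ones(1024))
print(len(e))  # (1024 - 256) // 128 + 1 = 7
```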

Regularizing sentences

Due to the difference in speech speed between different speakers, for the pronunciation of the same sentence, the sentence duration varies from speaker to speaker, but their pronunciation follows a certain pattern, i.e., the ratio of the duration of the stressed syllable in the sentence to the duration of the whole sentence is relatively fixed. Therefore, in order to facilitate data processing and obtain more objective evaluation results, before evaluating the test utterances, it is necessary to regularize the duration of the test utterances proportionally to a degree similar to the standard utterances.

Calculate the intensity profile match between the standard utterance and the input utterance

The Dynamic Time Warping (DTW) algorithm matches the otherwise mismatched time lengths of the test template and the reference template. The traditional Euclidean distance is used to calculate their similarity; with the reference and test templates denoted $R$ and $T$, the smaller the distance $D[T,R]$, the higher the similarity. The disadvantage of the traditional DTW algorithm is that in template matching all frames carry the same weight and every template must be matched, so the computation is relatively heavy; in particular, as the number of templates increases, the amount of computation grows especially fast.

In this paper, by setting matching boundaries, the grid points that need to be computed are restricted to a parallelogram. $R$ and $T$ are divided into $N$ and $M$ frames of equal duration, and the distance is computed along three path segments, $(1, X_a)$, $(X_a+1, X_b)$, and $(X_b+1, N)$, whose boundaries are: $$X_a=\frac{1}{3}(2M-N),\qquad X_b=\frac{2}{3}(2N-M)$$

where $X_a$ and $X_b$ are rounded to the nearest integer. When the constraints $2M-N\ge 3$ and $2N-M\ge 2$ are not satisfied, no dynamic matching is performed, which reduces system overhead.

Each frame on the X-axis is matched against the frames between $[y_{\min},y_{\max}]$ on the Y-axis, where $y_{\min}$ and $y_{\max}$ are calculated as follows: $$y_{\min}=\begin{cases}\frac{1}{2}x, & x\in[0,X_b]\\[2pt] 2x+(M-2N), & x\in(X_b,N]\end{cases}\qquad y_{\max}=\begin{cases}2x, & x\in[0,X_a]\\[2pt] \frac{1}{2}x+\left(M-\frac{1}{2}N\right), & x\in(X_a,N]\end{cases}$$

If $X_a > X_b$, the matched paths are instead divided as $(1,X_b)$, $(X_b+1,X_a)$, and $(X_a+1,N)$. For each forward step along the X-axis, although the number of corresponding Y-axis frames differs, the regularity is the same, and the cumulative distance is: $$D(x,y)=d(x,y)+\min\{D(x-1,y),\,D(x-1,y-1),\,D(x-1,y-2)\}$$ where $D$ and $d$ denote the cumulative distance and the frame matching distance, respectively.
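The slope-constrained matching described above can be sketched as follows; this is an illustrative implementation assuming one-dimensional features per frame and absolute difference as the frame distance.

```python
import numpy as np

def dtw_parallelogram(test, ref):
    """DTW restricted to the slope-constraint parallelogram: column x only
    visits rows in [ymin(x), ymax(x)], and the recursion advances one test
    frame per step, D(x,y) = d(x,y) + min(D(x-1,y), D(x-1,y-1), D(x-1,y-2))."""
    n, m = len(test), len(ref)
    if 2 * m - n < 3 or 2 * n - m < 2:       # constraints not met: skip matching
        return None
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for x in range(1, n + 1):
        lo = max(1, int(np.ceil(max(x / 2.0, 2.0 * x + m - 2.0 * n))))
        hi = min(m, int(np.floor(min(2.0 * x, x / 2.0 + m - n / 2.0))))
        for y in range(lo, hi + 1):
            d = abs(test[x - 1] - ref[y - 1])          # per-frame distance
            prev = min(D[x - 1, y], D[x - 1, y - 1],
                       D[x - 1, y - 2] if y >= 2 else np.inf)
            D[x, y] = d + prev
    return float(D[n, m])

print(dtw_parallelogram([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

Identical intensity profiles yield a cumulative distance of zero, and templates violating the length constraints are skipped entirely, as the text prescribes.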

Carry out the division of accent units and determine the number of accent pronunciations

In this paper, the double-threshold comparison method is used for accent endpoint detection. The thresholds, verified through extensive experiments, are set as follows: $$\text{Stress threshold: } T_u=\left(\max(sig\_in)+\min(sig\_in)\right)/2.5$$ $$\text{Non-stress threshold: } T_l=\left(\max(sig\_in)+\min(sig\_in)\right)/10$$

In the double-threshold comparison method, the sentence's energy values are searched one by one for the maximum speech energy value $S_{max}$ that is greater than the stress threshold $T_u$. Then, starting from $S_{max}$, the energy values are searched to the left and to the right until values $S_l$ and $S_r$ equal to the non-stress threshold $T_l$ are reached, and the span between $S_l$ and $S_r$ is taken as a stress unit of the sentence. At the same time, the energy values between $S_l$ and $S_r$ are set to 0 to avoid repeatedly searching the same span. Because stressed syllables in a sentence are long in duration, a unit found in the first step of the search may have a large energy value yet a short duration, i.e., sound loud but brief.
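The iterative search can be sketched as follows; the energy values in the example are invented.

```python
import numpy as np

def count_stresses(energy):
    """Double-threshold stress counting: find the global maximum above T_u,
    expand left/right down to T_l, zero out that span, and repeat."""
    sig = np.asarray(energy, float).copy()
    t_u = (sig.max() + sig.min()) / 2.5      # stress threshold
    t_l = (sig.max() + sig.min()) / 10.0     # non-stress threshold
    count = 0
    while sig.max() > t_u:
        p = int(np.argmax(sig))              # S_max position
        left = p
        while left > 0 and sig[left] > t_l:
            left -= 1                        # expand to S_l
        right = p
        while right < len(sig) - 1 and sig[right] > t_l:
            right += 1                       # expand to S_r
        sig[left:right + 1] = 0.0            # avoid re-finding this unit
        count += 1
    return count

print(count_stresses([0, 1, 8, 1, 0, 1, 9, 1, 0]))  # 2
```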

Calculating the respective rhythmic correlations of standard and input utterances

Calculating the pairwise variability index (PVI) from the segment durations of successive syllabic units yields the syllable duration variability, which can be used to measure the difference between stress-timed and syllable-timed languages, i.e., the correlation of speech rhythms. The pairwise variability index calculates the variability of duration between neighboring units; the smaller the variability, the more isochronous the units.

According to the variability characteristics of English speech-unit duration, this paper adopts an improved dPVI parameter, compares the durations of the syllable-unit segments of the standard utterance and the test utterance, and uses the converted parameter as the system's evaluation basis: $$dPVI=100\times\left(\sum_{k=1}^{m-1}|d_{1k}-d_{2k}|+|d_{1t}-d_{2t}|\right)/Len$$

where $d$ is the duration of a speech-unit segment of the sentence (e.g., $d_k$ is the duration of the $k$-th speech-unit segment), $m=\min(\text{number of standard units},\text{number of test units})$, and $Len$ is the length of the standard utterance. Since the length of the test utterance has been regularized to be comparable to that of the standard utterance before the PVI calculation, $Len$ alone can be used as the normalization unit.

Finally, by comprehensively comparing the number of stresses, the intensity-curve matching degree, and the dPVI parameters of the test sentence and the standard sentence, rhythm evaluation and feedback on English pronunciation quality are carried out.
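The dPVI comparison can be sketched with invented segment durations; in this sketch the terminal-segment term is taken to be the last paired unit.

```python
def dpvi(std_durs, test_durs, sent_len):
    """Improved dPVI: accumulated absolute differences between paired
    speech-unit durations of the standard and test utterance, scaled by the
    standard-utterance length Len."""
    m = min(len(std_durs), len(test_durs))
    diffs = sum(abs(std_durs[k] - test_durs[k]) for k in range(m - 1))
    diffs += abs(std_durs[m - 1] - test_durs[m - 1])   # terminal-segment term
    return 100.0 * diffs / sent_len

print(round(dpvi([0.30, 0.20, 0.25], [0.28, 0.24, 0.25], sent_len=0.75), 3))  # 8.0
```

A smaller dPVI indicates that the test utterance's rhythm is closer to the standard utterance's.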

Intelligent Evaluation Modeling to Assist in the Design of Oral English Instruction

Before designing the oral English teaching, it is first necessary to analyze the teaching needs to clarify the application scenario, target users, and teaching objectives of the intelligent evaluation model constructed in the previous section. The oral English pronunciation quality evaluation model proposed in this study is mainly intended to assist oral English classroom teaching: first, students can practice speaking through the evaluation model; second, teachers can use the model to assist oral teaching.

Evaluation modeling to assist the English acquisition process

In the information environment, the language acquisition process based on the assistance of the oral English pronunciation quality evaluation model is shown in Figure 4.

Figure 4.

The process of English acquisition with the aid of evaluation Model

1) In the language input stage, the oral English pronunciation quality evaluation model can provide learners with a moderately difficult and comprehensible corpus to study. At the same time, it can also serve as a multimedia player for corpus materials the teacher has prepared for the classroom, releasing tasks and presenting the problems students need to solve.

2) In the language practice stage, the learning content provided by the oral English pronunciation quality evaluation model is not only interesting but also related to learners' previous experience, offering learning resources matched to their level of language knowledge. Using the model, learners need not follow a rigid grammar syllabus, but can flexibly arrange their learning content and activities in a non-linear way.

3) In the language output stage, the oral English pronunciation quality evaluation model changes the existing language teaching mode and learning environment, allowing learners to study on their own at any time and place. It not only gives learners accurate, objective, and timely pronunciation evaluation and feedback, but also helps them find the differences between their pronunciation and the standard pronunciation through repeated listening and comparison, and correct their own pronunciation errors, thereby improving the efficiency of language learning.

Models of Design for Teaching English as a Foreign Language

The oral English pronunciation quality evaluation model constructed in this paper assists the oral English teaching process shown in Figure 5. The process is divided into three main parts: teaching task analysis, teaching process design, and teaching evaluation and reflection, each of which is further divided into several subsections. Teaching task analysis lays the foundation for the whole teaching design and includes the analysis of teaching objectives, participants, and learning tasks according to the 4C/ID model. Teaching process design creates a realistic learning situation using the technical advantages of the evaluation model; it also selects appropriate learning strategies and uses reasonable learning resources and tools for learners to carry out independent inquiry learning. Teaching evaluation and reflection assess students' learning outcomes and the overall teaching situation so as to optimize the teaching design scheme, and are an indispensable part of the teaching design.

Figure 5.

Evaluation model-assisted oral English teaching design

Analysis of teaching tasks

Pre-teaching task analysis includes teaching goal analysis, learning task analysis and participant analysis.

Teaching objectives can be designed according to Bloom’s taxonomy of educational objectives in terms of the three dimensions of knowledge and skills, process and method, and affective attitude and values. Learning tasks can be designed according to the design idea of the four-component instructional design (4C/ID) model, analyzing the system of knowledge to be mastered and sorting out its intrinsic connection. Then the teaching tasks are decomposed into interconnected individual skills and sub-skills, so that they are presented to the learners in the order from easy to difficult and from existing knowledge and experience to new knowledge.

Students need to be analyzed to determine internal factors such as learners' knowledge base, cognitive ability, and motivation to learn. The oral English pronunciation quality evaluation model itself also needs to be understood: its performance in terms of control, appearance, tactile and visual experience, audio quality, and connectivity must be clarified in order to determine how the model will be used in the spoken English classroom.

Teaching process design

The design of the teaching process includes the design of learning situations, the design of teaching resources and the design of learning strategies:

1) Learning situation design is to design learning scenarios close to life according to the scenarios involved in the learning tasks and the theme of learning. And the evaluation model is used to conduct different role-plays in the real language environment to attract students’ attention, stimulate their interest in learning and subjective initiative, and pave the way for future learning.

2) Teaching resources are designed to provide students with the necessary preparatory knowledge, including supportive and procedural information, according to the needs of the learning content. The types of teaching resources should be as comprehensive as possible, and here usually refers to digital teaching resources for learners’ multifaceted learning.

3) Learning strategies are complex programs about the learning process that are purposely and consciously developed to improve the effectiveness and efficiency of learning under different teaching and learning conditions. The design of learning strategies needs to be selected and formulated based on the available teaching conditions, the characteristics of the knowledge content, and the intellectual and non-intellectual factors of the students. The model for evaluating the quality of oral English pronunciation, on the other hand, serves as an auxiliary tool for the implementation of teaching tasks, while the teacher can monitor the process of teaching activities as a whole, grasp the rhythm, check the gaps, control the overall situation, and ensure that the teaching activities are carried out in an orderly manner.

Empirical analysis of students’ speaking skills in the information technology environment
Comparative Analysis of English Speaking Fluency

In order to explore the effect of evaluation-model-assisted oral English teaching on the improvement of students' speech fluency, this paper conducted an oral English fluency test with experimental class 701 and control class 705 in school X. Before the experiment, the researcher administered a pre-test to both classes; the speaking test system used was the X Speaking Test System. The scoring system gives students scores based on their actual performance in pronunciation standardization (out of 8), fluency (out of 12), and rhythm (out of 10), which together form the total speaking score. This quasi-experiment was conducted in the school computer room. The subjects were two classes totaling 102 students, all of whom participated in the speaking pre-test. The data were analyzed with SPSS after collection.

Comparative analysis of pre-test scores

The comparative results of the pre-test scores of Group 1 (experimental group) and Group 2 (control group) are shown in Table 1. The mean spoken English score of the experimental class was 17.61 and that of the control class was 17.47, with standard errors of the means of 0.36727 and 0.36001 respectively, indicating that the dispersion of the two classes' speaking scores was similar. Levene's test for equality of variances yielded F = 0.153 with a significance (p-value) of 0.1806, greater than the 0.05 significance level, indicating no significant difference in variances. Meanwhile, the significance values of the three sub-dimensions, pronunciation standardization, fluency, and rhythm, are 0.5549, 0.0982, and 0.6217 respectively, all greater than the 0.05 significance level, indicating no significant difference between the two classes in any of these dimensions. It can therefore be concluded that there is no significant difference between the English speaking ability of the experimental class and that of the control class, so the two classes can serve as research samples for the designed experiment.

Table 1. Comparison of oral English pre-test results

Dimension         Group   N    Mean    SD      Std. error mean   P
Total score       1       51   17.61   2.612   0.36727           0.1806
                  2       51   17.47   2.464   0.36001
Standardization   1       51   3.61    1.017   0.14104           0.5549
                  2       51   3.48    1.142   0.15133
Fluency           1       51   8.77    2.022   0.24224           0.0982
                  2       51   8.85    2.111   0.18667
Rhythm            1       51   5.23    1.721   0.27448           0.6217
                  2       51   5.14    1.335   0.27231
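The group comparisons above rest on the independent-samples t statistic that SPSS computes; a minimal numpy sketch of the pooled-variance form, with a tiny invented sample (not the paper's data):

```python
import numpy as np

def independent_t(a, b):
    """Pooled-variance independent-samples t statistic: the quantity behind
    the p-values reported for the pre-test and post-test comparisons."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    # Pooled variance from the two sample variances (ddof=1: sample estimate)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / na + 1.0 / nb))

# Tiny invented sample, just to show the call:
print(round(independent_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]), 4))  # -1.2247
```

The resulting t value is compared against the Student-t critical value for the pooled degrees of freedom (here df = 100 for two classes of 51) to obtain the reported p-values.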
Comparative analysis of post-test scores

The experimental class used the evaluation model designed in the previous section to assist oral English teaching, while the control class used the traditional teaching mode, over a three-month teaching experiment. At the end of the experiment, the researcher conducted a post-test of oral proficiency for both classes, again using the English oral pronunciation quality evaluation model; the data were analyzed with SPSS, and the post-test analysis is shown in Table 2. The mean total post-test score of the experimental class is 22.59 and that of the control class is 18.48, with standard errors of the means of 0.64722 and 0.61375 respectively. The corresponding significance (p-value) of 0.0081 is less than the 0.05 significance level, so it can be concluded that the post-test English speaking scores of the experimental class and the control class differ significantly. In other words, the new oral teaching mode can effectively improve students' overall oral proficiency.

Table 2. Comparison of oral English post-test results

Dimension        Group  N   Mean   SD     SE mean   P
Total score      1      51  22.59  4.258  0.64722   0.0081**
                 2      51  18.48  4.405  0.61375
Standard degree  1      51  5.15   2.204  0.31706   0.0422**
                 2      51  4.21   2.119  0.30624
Fluency degree   1      51  10.82  2.045  0.25882   0.0099**
                 2      51  8.72   1.971  0.20724
Rhythmic degree  1      51  6.62   2.442  0.32338   0.0415**
                 2      51  5.55   2.627  0.37262

Meanwhile, the mean speech fluency score was 10.82 in the experimental class and 8.72 in the control class, with standard errors of the mean of 0.25882 and 0.20724, respectively. The corresponding significance (p-value) of 0.0099 is less than the 0.05 significance level, so there is a significant difference between the students of the two classes. In other words, after the experiment, the oral English fluency of the students in the experimental class is significantly higher than that of the students in the control class.

Comparative analysis of pre- and post-test scores

The speaking scores of the control class before and after the experiment and the corresponding paired-samples t-tests are shown in Tables 3 and 4, respectively. The changes in the control class's mean sub-dimension scores between the pre-test and post-test ranged from -0.13 to 0.73. The paired-samples t-test shows that the p-value of each paired item is greater than the 0.05 significance level, indicating no significant difference in the control class's oral scores before and after the experiment; that is, the oral proficiency of the control class students changed little in any aspect of oral English (pronunciation standardization, fluency, and rhythmic expression).

Table 3. Oral performance of the control class before and after the experiment

Group 2          Phase      N   Mean   SD     SE mean
Total score      Pre-test   51  17.47  2.464  0.36001
                 Post-test  51  18.48  4.405  0.61375
Standard degree  Pre-test   51  3.48   1.142  0.15133
                 Post-test  51  4.21   2.119  0.30624
Fluency degree   Pre-test   51  8.85   2.111  0.18667
                 Post-test  51  8.72   1.971  0.20724
Rhythmic degree  Pre-test   51  5.14   1.335  0.27231
                 Post-test  51  5.55   2.627  0.37262

Table 4. Paired-samples t-test of the control class's oral performance before and after the experiment

Group 2          Pairing difference                        t       DF  P
                 Mean   SD     Lower limit  Upper limit
Total score      1.01   5.334  -2.562       0.525         -1.415  50  0.0672
Standard degree  0.73   3.134  -1.621       0.119         -1.931  50  0.0808
Fluency degree   -0.13  2.560  -0.728       0.447         -1.742  50  0.6374
Rhythmic degree  0.41   2.056  -1.692       0.209         -0.505  50  0.1616
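The pre/post comparison in Tables 4 and 6 is a paired-samples t-test on each student's own score change. A minimal sketch with SciPy follows; the pre/post arrays are synthetic stand-ins (pre-test scores plus a simulated per-student gain), not the study's raw data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pre = rng.normal(17.61, 2.612, 51)        # simulated pre-test total scores
post = pre + rng.normal(4.98, 2.0, 51)    # post-test = pre-test + individual gain

# ttest_rel tests whether the mean paired (post - pre) difference is zero.
t_stat, p_value = stats.ttest_rel(post, pre)
mean_gain = (post - pre).mean()
print(f"mean gain = {mean_gain:.2f}, t = {t_stat:.3f}, p = {p_value:.4f}")
```

The paired design controls for each student's baseline level, which is why it is the appropriate test for within-class pre/post comparisons.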

The speaking scores of the experimental class before and after the experiment and the corresponding paired-samples t-tests are shown in Tables 5 and 6, respectively. The mean speaking scores of the experimental class improved after the experiment, and the paired-samples t-test shows that the p-value of each paired item is less than the 0.05 significance level, indicating a significant difference in the experimental class students' speaking scores before and after the experiment. That is, after the evaluation-model-based oral English teaching design was put into practice, the experimental class students' oral English levels (pronunciation standardization, fluency, and rhythmic expression) improved significantly. Fluency improved most markedly, with a mean increase of 2.05 points, an improvement of about 23.37%.

Table 5. Oral English scores of the experimental class before and after the experiment

Group 1          Phase      N   Mean   SD     SE mean
Total score      Pre-test   51  17.61  2.612  0.36727
                 Post-test  51  22.59  4.258  0.64722
Standard degree  Pre-test   51  3.61   1.017  0.14104
                 Post-test  51  5.15   2.204  0.31706
Fluency degree   Pre-test   51  8.77   2.022  0.24224
                 Post-test  51  10.82  2.045  0.25882
Rhythmic degree  Pre-test   51  5.23   1.721  0.27448
                 Post-test  51  6.62   2.442  0.32338

Table 6. Paired-samples t-test of the experimental class's oral performance before and after the experiment

Group 1          Pairing difference                        t      DF  P
                 Mean   SD     Lower limit  Upper limit
Total score      4.98   0.635  1.324        4.052         4.038  50  0.007**
Standard degree  1.54   0.385  0.771        2.304         3.565  50  0.002**
Fluency degree   2.05   0.391  0.602        2.118         3.671  50  0.001**
Rhythmic degree  1.39   0.266  0.417        1.451         4.338  50  0.000**
Analysis of students’ learning attitudes

Attitude is the basis for all activities, and learning attitude reflects individuals' subjective disposition under different teaching modes. In the information technology environment, the oral English teaching mode assisted by the English oral pronunciation quality evaluation model is an exploration of integrating information technology teaching means to create an efficient classroom, as well as a practical exploration of the concept of independent learning. The aim of implementing the teaching model is to improve students' initiative in independent learning, classroom participation, and analytical and problem-solving abilities.

In the information technology environment, a new oral English teaching mode based on the independent-learning concept and assisted by the English oral pronunciation quality evaluation model was implemented in Class 701 of School X, and its practical effects were studied. The students' oral performance improved, especially their speech fluency. To better understand the effects of the new oral English teaching method in an information-technology-based environment, this paper prepared and distributed a questionnaire in the experimental class. The survey focused on learning attitude, mainly exploring students' acceptance of the new teaching mode; 51 copies were distributed and 51 were recovered, a recovery rate of 100%. The questionnaire and the statistical results of the survey are shown in Table 7 and Figure 6, respectively. The reliability of the questionnaire was analyzed with SPSS and found to be 0.813, which meets the standard. The analysis shows that, compared with the previous teaching mode, students' attitude towards independent learning changed noticeably after the new teaching mode was implemented: 29 students (56.86%) said that they liked this teaching mode very much (Q1), that they always felt they could not get enough of it after each lesson, and that they not only remembered the content presented in the classroom materials but could also use it fluently on appropriate occasions.
Seventeen students (33.33%) said that they adapted to the new teaching mode very quickly (Q2), that they learned to listen and speak without realizing it by following the scenarios created by the multimedia courseware, and that they improved their learning effectiveness through interaction and presentation.
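The questionnaire reliability of 0.813 was computed in SPSS; the same Cronbach's alpha statistic can be reproduced from an items matrix as sketched below. The 5-point responses here are randomly generated placeholders (a shared per-respondent tendency plus item-level noise, so items correlate), not the actual survey data.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of respondents' totals
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(2)
# 51 respondents x 12 items on a 5-point scale (placeholder data)
base = rng.integers(1, 6, size=(51, 1))
responses = np.clip(base + rng.integers(-1, 2, size=(51, 12)), 1, 5)
print(f"Cronbach's alpha = {cronbach_alpha(responses):.3f}")
```

An alpha above roughly 0.7 is conventionally taken as acceptable internal consistency, which is the sense in which the reported 0.813 "meets the standard".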

Table 7. Questionnaire items

Item  Specific content
Q1    Do you like learning in the new English teaching mode?
Q2    Can you adapt to this teaching mode?
Q3    Do you like the current pronunciation quality evaluation model for spoken English?
Q4    Do you think the current English teaching model improves English learning?
Q5    Can the new teaching mode improve the initiative of English preview?
Q6    Can the new teaching mode improve the effect of English preview?
Q7    Do you like to actively present yourself in the new model of presentation class?
Q8    Do you like the model of the presentation class?
Q9    Do you like the new English learning mode of group cooperative learning?
Q10   Is the new mode of group cooperative learning fruitful?
Q11   Do you like the feedback class in the new model?
Q12   Did you learn much from the feedback class in the new model?
Figure 6. Questionnaire survey statistics

For the 12 questionnaire items, the percentages of students who said they did not like, rarely used, or were not satisfied with an aspect of the new mode did not exceed 30%. The vast majority of students can therefore accept and enjoy the relaxed learning atmosphere brought by information technology. The oral English teaching mode assisted by the English oral pronunciation quality evaluation model has a clear effect on stimulating students' interest in learning, and the new teaching mode brings a new learning experience. Students actively complete the pre-study tasks, communicate and present in class, and actively think and acquire knowledge. Students are highly engaged in completing the tasks, indicating a very positive learning attitude.

Consistency analysis of human-machine evaluation of pronunciation quality

The English read-aloud recordings of the 51 experimental class students were selected as the acoustic adaptive modeling corpus. The English speech database mainly contains the speech of these 51 students, whose pronunciation levels differ significantly. Each student read aloud 20 sentences from the corpus, each consisting of approximately 12 words, as accurately as possible. Three experienced English experts were invited as judges to evaluate pronunciation quality in terms of pronunciation standardization, pronunciation fluency, and rhythmic integrity on a 0-10 scale, and the mean of the experts' scores was used as the manual score for each utterance.

Manual scoring is an important basis for machine scoring, so its consistency needs to be assessed first. The results of the expert consistency analysis are shown in Table 8. The mean correlations at the sentence level and the reader level are 0.72 and 0.83, respectively, indicating that the consistency of manual scoring is good and that it can serve as the reference for machine scoring.

Table 8. Manual scoring consistency results

Correlation       Sentence level  Reader level
English expert A  0.67            0.83
English expert B  0.78            0.86
English expert C  0.71            0.79
Mean value        0.72            0.83
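One common way to obtain per-expert consistency figures like those in Table 8 is to correlate each expert's scores with the mean of the other experts' scores. The sketch below illustrates this with synthetic 0-10 ratings (a latent sentence quality plus independent rater noise), not the experts' actual scores, and the exact consistency statistic used in the paper is not specified, so Pearson correlation is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(3)
quality = rng.uniform(2, 9, 60)  # latent per-sentence pronunciation quality
# Each expert's score = latent quality plus independent rating noise, clipped to 0-10
experts = np.stack([np.clip(quality + rng.normal(0, 1, 60), 0, 10)
                    for _ in range(3)])

for i, name in enumerate("ABC"):
    others_mean = np.delete(experts, i, axis=0).mean(axis=0)
    r = np.corrcoef(experts[i], others_mean)[0, 1]
    print(f"English expert {name}: r = {r:.2f}")
```

Averaging sentence-level scores per reader before correlating would yield the higher reader-level figures, since averaging smooths out per-sentence rating noise.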

Figure 7 shows bubble plots of the manual ratings against the ratings of the spoken English pronunciation quality evaluation model, where the horizontal axis lists the four grades D, C, B, and A in that order. The bubble size represents the number of machine-rated samples corresponding to a given manual rating. As the figure shows, the vast majority of manual and machine ratings differ by at most one grade and form clusters. The largest bubbles outside the two diagonal lines occur where the manual rating is A and the machine rating is B, which reflects the fact that machine rating is stricter than manual rating.

Figure 7. Bubble plots of manual versus machine ratings
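The "within one grade" agreement read off Figure 7 can be computed as sketched below: map 0-10 scores to the four grades D/C/B/A and count how often the machine and manual grades differ by at most one level. The score pairs and the 2.5-point-wide grade bands are illustrative assumptions; the paper does not state the actual grade cut-offs.

```python
import numpy as np

def to_grade(score):
    """Map a 0-10 score to a grade index (0=D, 1=C, 2=B, 3=A).
    The 2.5-wide bands are an assumption for illustration."""
    return int(min(score // 2.5, 3))

rng = np.random.default_rng(4)
manual = rng.uniform(0, 10, 200)
# Machine scores: manual score plus noise and a slight strictness offset
machine = np.clip(manual + rng.normal(-0.3, 0.8, 200), 0, 10)

diffs = [abs(to_grade(m) - to_grade(h)) for m, h in zip(machine, manual)]
within_one = sum(d <= 1 for d in diffs) / len(diffs)
print(f"agreement within one grade: {within_one:.1%}")
```

The negative offset in the synthetic machine scores mimics the pattern in Figure 7 where manual A ratings are frequently scored B by the machine.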

The above results show that the oral English pronunciation quality evaluation model proposed in this paper is highly adaptable and standardized in evaluating readers' English pronunciation, and can comprehensively measure students' read-aloud pronunciation quality in terms of pronunciation standardization, fluency, and rhythmic integrity. It can be applied in actual oral English teaching so that students can practice speaking and teachers can be assisted in oral English instruction.

Conclusion

Under the background of the new curriculum standard, students are the main focus of classroom learning; only when students independently experience the process of perceiving, participating, and communicating can they acquire knowledge. Combined with the learning characteristics of current oral English teaching, a construction strategy for the new model was developed. By adding the two information technology means of pronunciation error detection and pronunciation quality assessment, the pronunciation quality evaluation model for spoken English was constructed, completing the innovative design of the oral English teaching model. The software implementation can not only point out students' mispronunciations but also assess their overall pronunciation level and provide effective feedback and guidance. Experimental verification shows that the evaluation methods for pronunciation standardization, fluency, and rhythmic completeness adopted in this paper are credible: they give learners timely, accurate, and objective evaluation and feedback, help learners find the differences between their pronunciation and standard pronunciation, and correct pronunciation errors, thereby effectively improving students' speech fluency.