Corpus-Driven Deep Learning-Based English-Chinese Translation Model Construction and Its Application to College English Teaching

The application scenarios of translation models have gradually expanded from single bilingual translation to speech translation, conference translation, intelligent Q&A, document translation, video subtitle translation, and other directions [1–2]. As translation models evolve, higher requirements are being placed on the accuracy of English-Chinese translation. When an English-Chinese translation model is used, factors such as the ambiguity of English utterances and differing habits of expression lead to ambiguous output and inaccurate semantic analysis, which lowers translation quality [3–4]. It is therefore necessary either to update and optimize existing English-Chinese translation models or to construct a new one.

English-Chinese translation is a multi-attribute semantic decision-making problem. It requires analysis of relevance and semantic similarity, combined with a fuzzy decision-making model, to design the semantic expression preferences of a translation [5–7], and a corpus is a valuable aid in constructing an English-Chinese translation model. In addition, corpora and English-Chinese translation models are being combined ever more closely in university English teaching. The focus and themes of this research have moved beyond the traditional teaching field and now extend more broadly, mainly to English for Specific Purposes, terminology research, evaluation of students’ translations, translator training, and curricula and teaching materials [8–10]. At the same time, the concept of a corpus has itself expanded: beyond the monolingual or bilingual parallel corpora used at first, researchers have begun to build one-time corpora, archives of student translation work, and similar resources to support English teaching [11–12]. As an evaluation tool, a corpus is highly valuable for assessing the quality of English-Chinese translation models, and both corpora and English-Chinese translation are widely used in university English teaching [13–15]. English-Chinese translation models can now be applied in English education to update university English teaching materials, support the reform of those materials, and improve teaching quality, and the construction of a new English-Chinese translation model can contribute to this.

To investigate the effect of applying an English-Chinese translation model in university English teaching, this paper first constructs an English-Chinese bilingual corpus. Next, with the support of the deep neural network Transformer model, we estimate the probability that each bilingual sentence pair is an equivalent translation, so as to encode the collected data and construct a parallel corpus, and on this basis we build an Encoder-Decoder model with an attention mechanism. Finally, the effects of this model on dual-drive university English teaching are analyzed to support the application of corpus-driven English-Chinese translation learning approaches in classroom teaching.

2

Bilingual corpus construction for English-Chinese machine translation

2.1

The process of English-Chinese bilingual corpus construction

2.1.1

Design thinking

In response to the problems involved in corpus construction, targeted preparatory work was carried out in four parts: collecting the corpus, organizing it, upgrading the word alignment model, and entering the parallel sentence pairs into the corpus using web mining technology [16] and web content collection algorithms, so that the subsequent construction and management of the English-Chinese bilingual corpus could proceed smoothly. The following subsections describe this preparation for the bilingual corpus used in English-Chinese machine translation, together with the specific steps of collection, labeling, and processing. The corpus design process is shown in Figure 1.

2.1.2

Collection of the corpus

1)

Source of corpus

English-Chinese machine translation has high data requirements and needs a large English-Chinese bilingual corpus, and there are many possible ways to acquire such data. To ensure that the corpus is adequate, the source finally chosen is the “English-Chinese Detailed Dictionary”, collected with the help of Python crawler technology.

2)

Entry of corpus

Once the dictionary has been confirmed as a source, entry of the Chinese and English corpus begins. First, the English-Chinese dictionary is scanned with a high-speed scanner. Because the scans are in PDF format, they must be converted to the plain-text (TXT) format used by the corpus. Second, a web-based acquisition method is used: a crawler program crawls English websites to obtain corpus documents, from which useful English text is extracted as a corpus input source. A generic web crawler with a breadth-first search strategy analyzes the HTML documents of English “navigation websites” to extract a seed set of URLs for multiple English websites. The seed URLs are then used to crawl each English website in turn, and the web page documents are downloaded and saved. The usable English corpus is extracted in five steps (HTML tag filtering, an alphabet recognition algorithm, phrase filtering, repetition filtering, and spell checking) and is finally saved in an XML document as the corpus source.
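The breadth-first crawl and part of the filtering pipeline described above can be sketched as follows. This is a minimal illustration and not the crawler used in this work: the in-memory link graph `TOY_WEB`, the function names, and the regular expressions are assumptions introduced for the example, no network access is performed, and only three of the five filtering steps (tag filtering, alphabet recognition, repetition filtering) are shown.

```python
import re
from collections import deque


def bfs_crawl(seed_urls, get_links, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs."""
    seeds = list(dict.fromkeys(seed_urls))  # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()               # FIFO queue gives breadth-first order
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def extract_english(html_docs):
    """Keep unique sentences that consist only of English characters."""
    seen, kept = set(), []
    for doc in html_docs:
        text = re.sub(r"<[^>]+>", " ", doc)                 # HTML tag filtering
        for sent in re.split(r"(?<=[.!?])\s+", text):
            sent = " ".join(sent.split())
            # Alphabet recognition: reject sentences with non-English characters.
            if not sent or not re.fullmatch(r"[A-Za-z0-9 ,.'!?;:-]+", sent):
                continue
            if sent.lower() in seen:                        # repetition filtering
                continue
            seen.add(sent.lower())
            kept.append(sent)
    return kept


# Hypothetical link graph standing in for real HTTP fetches.
TOY_WEB = {
    "nav": ["site-a", "site-b"],
    "site-a": ["site-a/1", "site-b"],
    "site-b": ["site-b/1"],
    "site-a/1": [],
    "site-b/1": ["nav"],
}

order = bfs_crawl(["nav"], lambda url: TOY_WEB.get(url, []))
sentences = extract_english(["<p>Hello world. Hello world.</p>", "<div>你好 world.</div>"])
```

In a real crawler, `get_links` would fetch a page and parse its anchors, and the remaining phrase-filtering and spell-checking steps would be applied to `sentences` before saving them to XML.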

2.1.3

Organization of the corpus

1)

Pre-processing of the corpus

To ensure the quality of the corpus and the accuracy of the study, the scanned text must be proofread carefully. The corpus is checked for garbled characters, spelling errors, and passages whose meaning differs from the original text, and any problems are corrected promptly. Preprocessing mainly consists of standardizing the format and removing impurities so that the Chinese and English corpora can be shared accurately. After the proofread corpus has been entered, its format standardized, and impurities removed, the Chinese and English corpus is divided into separate documents, each named with letters to make querying and loading easier.

2)

Categorical labeling of the corpus

Classification criteria: corpus information can be categorized according to different criteria. To handle this, each text in the corpus is tagged with three basic attributes (style, genre, and domain), which allows it to be categorized at multiple levels. By genre, the corpus is divided into three types: “Literature, Journalism, and Practical Writing”.

Bias labeling: the purpose of building an English-Chinese bilingual corpus is to better serve machine translation and to help language learners master the use of machine-translated language and its development. Errors in the corpus may give rise to further errors. To grasp objectively and accurately how often learners use specific terms or expressions, and how linguistic elements combine or connect, the types of errors in the corpus must be clearly identified and labeled.

2.1.4

Processing of the corpus

An automatic alignment procedure identifies sentence and paragraph boundaries, producing sentence-level and paragraph-level bilingual alignment. The constraints of the IGT model are introduced into the log-linear word alignment model, which constrains the word sequences within the global context. To further constrain word order, syntactic tree constraints are integrated into the IGT-based word alignment model. With these two types of syntactic knowledge integrated, word order variation during word alignment is effectively limited both globally and in local regions. Automation is combined with manual checking: the automatic alignment results are reviewed manually to obtain the correct sentence and paragraph boundary markers and alignment markers for the English-Chinese bilingual parallel corpus.

2.2

Extraction Techniques for English-Chinese Bilingual Parallel Sentence Pairs

2.2.1

Negative sampling of data

The parallel corpus C used to train the extraction model for English-Chinese bilingual sentence pairs is composed of parallel sentence pairs $(S_{k}^{S}, S_{k}^{T})$, where $S^{S}$ and $S^{T}$ denote sentences in the source and target languages, respectively, and k ∈ {1, 2, ⋯, n}. Since these sentence pairs come from the standard English-Chinese bilingual parallel corpus, they are all positive examples. However, a classifier that distinguishes parallel from non-parallel sentence pairs must also be trained on negative examples. An automatic generation strategy is therefore applied: during training, for each sentence pair $(S_{k}^{S}, S_{k}^{T})$, m negative pairs $(S_{k}^{S}, S_{j}^{T})$ with k ≠ j are drawn automatically from C. Each round of training thus contains n(1 + m) triples $(S_{i}^{S}, S_{i}^{T}, y_{i})$, where $S_{i}^{S} = (w_{i, 1}^{S}, w_{i, 2}^{S}, \dots, w_{i, N}^{S})$ is the source-language sentence containing N words, $S_{i}^{T} = (w_{i, 1}^{T}, w_{i, 2}^{T}, \dots, w_{i, M}^{T})$ is the target-language sentence containing M words, and y_i ∈ {0, 1} is a label indicating whether $(S_{i}^{S}, S_{i}^{T})$ is parallel.
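The sampling strategy can be illustrated with a short sketch. It is an assumption-laden illustration rather than the implementation used here: the function name `build_training_triples`, the toy sentence pairs, and the use of uniform sampling without replacement are all hypothetical choices.

```python
import random


def build_training_triples(pairs, m, rng):
    """Return (source, target, label) triples: n positives plus n*m negatives.

    pairs: list of (source_sentence, target_sentence) parallel pairs.
    m:     number of negative targets sampled per source sentence (m <= n - 1).
    rng:   a random.Random instance, so that sampling is reproducible.
    """
    n = len(pairs)
    triples = []
    for k, (src, tgt) in enumerate(pairs):
        triples.append((src, tgt, 1))                  # positive example, y = 1
        candidates = [j for j in range(n) if j != k]   # enforce k != j
        for j in rng.sample(candidates, m):
            triples.append((src, pairs[j][1], 0))      # negative example, y = 0
    return triples


# Hypothetical toy corpus.
toy_pairs = [
    ("i like cats", "我 喜欢 猫"),
    ("good morning", "早上 好"),
    ("thank you", "谢谢 你"),
    ("see you tomorrow", "明天 见"),
]
triples = build_training_triples(toy_pairs, m=2, rng=random.Random(0))
```

With n = 4 and m = 2 the sketch yields n(1 + m) = 12 triples, of which 4 are positive. Resampling the negatives in every training round, as the text describes, only requires calling the function again with the same `rng`.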

2.2.2

Model architecture

The extraction model for English-Chinese bilingual parallel sentence pairs uses a deep neural network to learn cross-lingual semantic information and uses this information to estimate the probability that a bilingual sentence pair consists of mutual translations, i.e., $p (y_{i} = 1 | S_{i}^{S}, S_{i}^{T})$, where $S^{T}$ is the target-side sentence and $S^{S}$ is the source-side sentence. The architecture adopted is a bidirectional RNN that encodes the bilingual sentences; the recurrent unit of the RNN [17] can be either an LSTM or a gated recurrent unit (GRU).

For the i-th source sentence $S_{i}^{S}$, the word $w_{i, t}^{S}$ at time step t is identified by its index k in the source vocabulary $V^{S}$, i.e., by the one-hot vector $w_{k} \in ℝ^{|V^{S}|}$ whose k-th element is 1 and whose other elements are 0. Multiplying this one-hot vector by the embedding matrix $E^{S} \in ℝ^{|V^{S}| \times d_{e}}$ gives a continuous vector representation of the word, $w_{i, t}^{S} \in ℝ^{d_{e}}$, which is the input to the forward and backward RNNs of the encoder. The forward RNN reads the sentence word by word from the first word to the sentence terminator <EOS> and produces a fixed-length continuous vector representation $\overrightarrow{h}_{i, N}^{S} \in ℝ^{d_{h}}$, where d_h is the dimension of the output state of the encoder's hidden layer. The backward RNN reads from <EOS> back to the first word and produces the hidden layer representation $\overleftarrow{h}_{i, 1}^{S} \in ℝ^{d_{h}}$. The final encoder representation of the source sentence is the concatenation of these two vectors, $h_{i}^{S} = [\overrightarrow{h}_{i, N}^{S}; \overleftarrow{h}_{i, 1}^{S}]$. The target sentence is encoded in the same way, giving $h_{i}^{T} = [\overrightarrow{h}_{i, M}^{T}; \overleftarrow{h}_{i, 1}^{T}]$. The hidden states of the bidirectional RNN are computed as follows:

$\begin{aligned} w_{i, t}^{S} &= (E^{S})^{\top} w_{k} \\ \overrightarrow{h}_{i, t}^{S} &= ϕ (\overrightarrow{h}_{i, t - 1}^{S}, w_{i, t}^{S}) \\ \overleftarrow{h}_{i, t}^{S} &= ϕ (\overleftarrow{h}_{i, t + 1}^{S}, w_{i, t}^{S}) \end{aligned}$ (1)

where ϕ can be any recurrent activation function; GRU is chosen in this work.

After the encoder representations of the two sentences are obtained, reliable features indicating whether the sentences are mutual translations are captured by feeding the element-wise product $h_{i}^{(1)}$ and the absolute element-wise difference $h_{i}^{(2)}$ of the two representations into the subsequent fully connected layer:

$h_{i}^{(1)} = h_{i}^{S} ⊙ h_{i}^{T}$ (2)

$h_{i}^{(2)} = | h_{i}^{S} - h_{i}^{T} |$ (3)

$h_{i} = \tanh (W^{(1)} h_{i}^{(1)} + W^{(2)} h_{i}^{(2)} + b)$ (4)

$p (y_{i} = 1 | h_{i}) = σ (W^{(3)} h_{i} + c)$ (5)

where σ is the sigmoid function (equivalently, a two-class softmax), and $W^{(1)}$, $W^{(2)}$, $W^{(3)}$, b and c are the parameters to be learned and optimized by the model. The whole model is trained by minimizing the cross entropy over the labeled sentence pairs:

$L = - \sum_{i = 1}^{n (1 + m)} \left[ y_{i} \log σ (W^{(3)} h_{i} + c) + (1 - y_{i}) \log (1 - σ (W^{(3)} h_{i} + c)) \right]$ (6), (7)

In the inference phase, a sentence pair is marked as a parallel (positive) example if its probability score equals or exceeds a predetermined threshold ρ:

$\hat{y}_{i} = \begin{cases} 1 & \text{if } p (y_{i} = 1 | h_{i}) \geq ρ \\ 0 & \text{if } p (y_{i} = 1 | h_{i}) < ρ \end{cases}$ (8)
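The classification head of Eqs. (2) to (8) can be traced with a small numerical sketch. The code below is illustrative only: it assumes the sentence encodings $h_{i}^{S}$ and $h_{i}^{T}$ are already available as plain lists, treats σ as the logistic sigmoid, and uses hypothetical names (`parallel_probability`, `cross_entropy`, `predict`) and toy parameter values that do not come from this work.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def parallel_probability(h_src, h_tgt, W1, W2, b, w3, c):
    """Probability that a sentence pair is parallel, following Eqs. (2)-(5)."""
    h1 = [s * t for s, t in zip(h_src, h_tgt)]         # Eq. (2): element-wise product
    h2 = [abs(s - t) for s, t in zip(h_src, h_tgt)]    # Eq. (3): absolute difference
    pre = [u + v + bias
           for u, v, bias in zip(matvec(W1, h1), matvec(W2, h2), b)]
    h = [math.tanh(v) for v in pre]                    # Eq. (4)
    return sigmoid(sum(w * v for w, v in zip(w3, h)) + c)  # Eq. (5)


def cross_entropy(probs, labels, eps=1e-12):
    """Binary cross entropy summed over all labeled pairs, Eqs. (6)-(7)."""
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels))


def predict(p, rho=0.5):
    """Eq. (8): label the pair as parallel when p reaches the threshold rho."""
    return 1 if p >= rho else 0


# Toy check with all-zero parameters (an assumption): the score is exactly 0.5.
zeros = [[0.0, 0.0], [0.0, 0.0]]
p = parallel_probability([0.3, -0.2], [0.1, 0.4], zeros, zeros, [0.0, 0.0], [0.0, 0.0], 0.0)
```

The small constant `eps` is added only to keep the logarithm finite when a probability saturates at 0 or 1; it is a numerical convenience and not part of the loss as defined above.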

2.3

Algorithm implementation

2.3.1

Data pre-processing

Data processing proceeds through the following steps in order.

Step 1:

The input and target sentences are indexed by creating a dictionary class. “SOS” and “EOS” denote the start and end markers of a sentence and are assigned the indices 0 and 1, respectively. As each word is analyzed, any word in the sentence that has not yet been indexed (an out-of-vocabulary word) is stored in a separate dictionary.

Step 2:

Convert the data file from Unicode to ASCII, convert all content to lowercase, and trim most of the punctuation marks.

Step 3:

Split the data file into lines, split each line into a Chinese-English pair, and standardize the text according to the length and content of the data. Because the whole dataset is too large, only part of it is used as the training set: sentences with fewer than 30 words are selected, sentences that do not satisfy this condition are filtered out, and the Chinese and English sentences that do satisfy it are stored in separate dictionaries.
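The three preprocessing steps can be sketched as below. The sketch makes several assumptions that are not stated in the text: each input line holds a Chinese sentence and an English sentence separated by a tab, the Chinese side is already segmented into space-separated words, and ASCII folding is applied only to the English side (applied to Chinese text it would remove every character). The names `Vocab`, `normalize_english`, and `prepare_pairs` are hypothetical.

```python
import re
import unicodedata

SOS_TOKEN, EOS_TOKEN = 0, 1   # Step 1: indices of the sentence markers
MAX_LENGTH = 30               # Step 3: keep sentences shorter than this


class Vocab:
    """Word index for one language, with SOS and EOS pre-registered."""

    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.index2word = {SOS_TOKEN: "SOS", EOS_TOKEN: "EOS"}

    def add_sentence(self, sentence):
        for word in sentence.split():
            if word not in self.word2index:       # a word not yet indexed
                index = len(self.index2word)
                self.word2index[word] = index
                self.index2word[index] = word


def normalize_english(text):
    """Step 2: fold to ASCII, lowercase, and trim most punctuation."""
    text = unicodedata.normalize("NFD", text.lower().strip())
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"([.!?])", r" \1", text)       # detach sentence-final marks
    text = re.sub(r"[^a-z.!?]+", " ", text)       # drop everything else
    return text.strip()


def prepare_pairs(lines, max_length=MAX_LENGTH):
    """Step 3: split lines into pairs, filter by length, and build the vocabularies."""
    zh_vocab, en_vocab = Vocab("zh"), Vocab("en")
    pairs = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue                              # skip malformed lines
        zh, en = parts[0].strip(), normalize_english(parts[1])
        if not zh or not en:
            continue
        if len(zh.split()) < max_length and len(en.split()) < max_length:
            pairs.append((zh, en))
            zh_vocab.add_sentence(zh)
            en_vocab.add_sentence(en)
    return pairs, zh_vocab, en_vocab


sample = ["我 喜欢 猫\tI like cats.", "坏 行", "长 句\t" + "word " * 40]
pairs, zh_vocab, en_vocab = prepare_pairs(sample)
```

In this toy run only the first line survives: the second has no tab-separated partner and the third exceeds the length limit.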

2.3.2

The Encoder-Decoder model

The main task of the Encoder-Decoder model [18–19] is to predict the output target utterance E given a source utterance F. In machine translation, this means computing the target utterance with the highest probability, i.e., the best match, for the given source utterance. Before the final probability is calculated, the initial state of the language model is determined by feeding the source utterance F into an RNN. The basic idea is to encode the source utterance F with an Encoder neural network layer, which yields a real-valued vector h (the hidden layer information), and then to use another neural network layer, the Decoder, to predict the target utterance E. The structure of the Encoder-Decoder model is shown in Fig. 2.

Here the Encoder is denoted RNN^(f)(·) and the Decoder RNN^(e)(·). The Decoder RNN^(e) is initialized from the final Encoder hidden state h_F, and a softmax function transforms its hidden state into a probability p^(e) at each time step t.

The operational steps of the Encoder-Decoder model are as follows.

Step 1:

First compute $m_{t}^{(f)}$, the embedding of the source word $f_{t}$ at time step t:

$m_{t}^{(f)} = M_{\cdot, f_{t}}^{(f)}$ (9)

Step 2:

Compute the hidden state of the Encoder for the source sentence F at time step t:

$h_{t}^{(f)} = \begin{cases} RNN^{(f)} (m_{t}^{(f)}, h_{t - 1}^{(f)}) & t \geq 1, \\ 0 & \text{otherwise.} \end{cases}$ (10)

Once $h_{t}^{(f)}$ has been computed for every time step, the Encoder has seen all the words in the source sentence, so in theory the final hidden state $h_{F}^{(f)}$ encodes all the information in the source sentence.

Step 3:

In the decoding phase, the probability of the word e_t must be predicted at each time step t. The computation is much the same as for $m_{t}^{(f)}$, except that the decoder input at step t is the embedding of the previous word e_{t-1}, from which the current word e_t is then predicted:

$m_{t}^{(e)} = M_{\cdot, e_{t - 1}}^{(e)}$ (11)

The decoder is then run to compute $h_{t}^{(e)}$. This is similar to the Encoder's steps, except that $h_{0}^{(e)}$ is set to the final Encoder state $h_{F}^{(f)}$:

$h_{t}^{(e)} = \begin{cases} RNN^{(e)} (m_{t}^{(e)}, h_{t - 1}^{(e)}) & t \geq 1, \\ h_{F}^{(f)} & \text{otherwise.} \end{cases}$ (12)

Step 4:

Finally, the probability $p_{t}^{(e)}$ is computed by applying the softmax function to the hidden state $h_{t}^{(e)}$:

$p_{t}^{(e)} = \mathrm{softmax} (W_{h s} h_{t}^{(e)} + b_{s})$ (13)

In the Encoder-Decoder model, the Encoder thus learns an internal representation of the data that is fed to it.
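The four steps can be followed end to end in a small numerical sketch. It is not the model trained in this work: it assumes a plain tanh recurrent cell in place of RNN^(f) and RNN^(e), randomly initialized toy parameters, integer word indices, and teacher forcing in the decoder, and all function and parameter names are hypothetical. Embedding tables are stored with one row per word, which corresponds to selecting the column $M_{\cdot, f_{t}}$ in the notation of Eq. (9).

```python
import math
import random


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    top = max(z)                                   # subtract max for stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_step(W_x, W_h, m_t, h_prev):
    """Assumed recurrent cell: h_t = tanh(W_x m_t + W_h h_{t-1})."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_x, m_t), matvec(W_h, h_prev))]


def encode(F, params):
    h = [0.0] * params["d_h"]                      # Eq. (10): h_0 = 0
    for f_t in F:
        m_t = params["M_f"][f_t]                   # Eq. (9): embedding lookup
        h = rnn_step(params["Wx_f"], params["Wh_f"], m_t, h)
    return h                                       # final encoder state h_F


def decode(E, h_F, params, sos=0):
    """Return one distribution over target words per step (teacher forcing)."""
    h, prev, dists = h_F, sos, []                  # Eq. (12): h_0 = h_F
    for e_t in E:
        m_t = params["M_e"][prev]                  # Eq. (11): embed previous word
        h = rnn_step(params["Wx_e"], params["Wh_e"], m_t, h)
        logits = [s + b for s, b in zip(matvec(params["W_hs"], h), params["b_s"])]
        dists.append(softmax(logits))              # Eq. (13)
        prev = e_t
    return dists


def init_params(src_vocab, tgt_vocab, d_e=4, d_h=5, seed=0):
    rng = random.Random(seed)
    return {
        "d_h": d_h,
        "M_f": rand_matrix(src_vocab, d_e, rng),
        "M_e": rand_matrix(tgt_vocab, d_e, rng),
        "Wx_f": rand_matrix(d_h, d_e, rng),
        "Wh_f": rand_matrix(d_h, d_h, rng),
        "Wx_e": rand_matrix(d_h, d_e, rng),
        "Wh_e": rand_matrix(d_h, d_h, rng),
        "W_hs": rand_matrix(tgt_vocab, d_h, rng),
        "b_s": [0.0] * tgt_vocab,
    }


params = init_params(src_vocab=6, tgt_vocab=7)
target = [5, 6, 1]                                 # hypothetical indices, 1 = EOS
h_F = encode([2, 3, 4], params)
dists = decode(target, h_F, params)
log_prob = sum(math.log(p[e]) for p, e in zip(dists, target))
```

The quantity `log_prob` is the log-probability that the untrained toy model assigns to the target sentence; training would adjust the parameters to increase it for correct translations.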

2.3.3

Modeling

Step 1:

Build an encoder. The encoder initializes the hidden state and the number of network layers. The layers are stacked outside the recurrent neural network rather than inside it: if they were placed inside, the number of layers would determine the number of hidden states. Keeping the layer count outside the recurrent network simplifies the encoder, so that there is only one hidden state, which is passed on to the subsequent network layers. In forward propagation, the input sequence of token indices is transformed into a sequence of word vectors, and the network then returns its output and hidden state.

Step 2:

Build a decoder. The decoder adds one more linear layer as an output layer. Its dimension is set to the number of words in the vocabulary, since the final output is chosen according to the output probability of each word.

Step 3:

Add the attention mechanism to the decoder. To improve translation quality and reduce translation complexity, the attention mechanism is placed in the decoder, giving a decoder with attention.

To define the attention mechanism, several additional network layers are introduced at initialization. In the forward function, the network inputs are first transformed into word vectors; the word vector c_t and the hidden state h_t are then concatenated, and a linear layer followed by a softmax activation layer outputs the fixed-length sequence ${\tilde{h}}_{t}$ and the predictive distribution p:

${\tilde{h}}_{t} = \tanh (W_{c} [c_{t}; h_{t}])$ (14)

$p (y_{t} | y_{<t}, x) = \mathrm{softmax} (W_{s} h_{t})$ (15)

Here y_t denotes the target word at step t, and ${\tilde{h}}_{t}$ is the attention sequence, in which the magnitude of each value indicates the importance assigned by the attention. The attention weights and the encoder outputs are then combined by batch matrix multiplication using torch.bmm, and the result is concatenated with the network input. A linear layer transforms this into the dimension accepted by the recurrent neural network, which takes it as input and produces the final output.
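One attention step can be sketched numerically as follows. The text does not specify how the attention scores are computed, so the sketch assumes dot-product scoring between each encoder output and the decoder hidden state, and it interprets c_t as the attention-weighted sum of encoder outputs; both are assumptions. The weighted sum is written as an explicit loop in plain Python, which computes for a single sentence what a batched matrix product computes for a whole batch. All names and values are hypothetical.

```python
import math


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(enc_outputs, h_t, W_c):
    """Return (weights, c_t, h_tilde) for one decoding step."""
    # Assumed scoring function: dot product of each encoder output with h_t.
    scores = [sum(a * b for a, b in zip(h_s, h_t)) for h_s in enc_outputs]
    weights = softmax(scores)                      # attention weights, sum to 1
    # Weighted sum of encoder outputs (weights times encoder-output matrix).
    c_t = [sum(w * h_s[i] for w, h_s in zip(weights, enc_outputs))
           for i in range(len(h_t))]
    # Eq. (14): combine c_t and h_t through a linear layer and tanh.
    h_tilde = [math.tanh(v) for v in matvec(W_c, c_t + h_t)]
    return weights, c_t, h_tilde


# Toy values: two encoder positions, hidden size 2.
enc_outputs = [[1.0, 0.0], [0.0, 1.0]]
h_t = [10.0, 0.0]
W_c = [[1.0, 0.0, 0.0, 0.0],      # picks out c_t[0]
       [0.0, 1.0, 0.0, 0.0]]      # picks out c_t[1]
weights, c_t, h_tilde = attention_step(enc_outputs, h_t, W_c)
```

In the toy example the decoder state points along the first encoder output, so nearly all of the attention weight falls on the first source position.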

2.3.4

Model training

After the model is built, it is trained on the data. In the encoding process, an empty sequence is first defined, the output of each encoding step is written into this sequence, and the final hidden state is saved as the initial hidden state of the decoding process.

During training, the sequence of source words (x₁,x₂,…,x_T) is first passed through the encoder. The decoder then receives the start symbol <SOS> as its first input and takes the last hidden state of the Encoder as its initial hidden state. The hidden state at time t is:

$h_{t} = \mathrm{sigmoid} (W^{h x} x_{t} + W^{h h} h_{t - 1})$ (16)

The output sequence produced by the RNN model is (y₁,y₂,…,y_T), where:

$y_{t} = W^{y h} h_{t}$ (17)

In the decoding process, the input and the hidden state are passed into the decoder one step at a time, together with the encoder output. The result of each step is passed into the next step of the decoder, and if a terminator is encountered during the loop, the sentence ends there and the loop exits.
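The decoding loop with early termination can be sketched as below. The sketch assumes greedy selection of the most probable word at each step, which the text does not state explicitly, and it replaces the trained decoder with a hypothetical scripted function `toy_step` so that the control flow can be followed without a model.

```python
def greedy_decode(step_fn, h0, sos, eos, max_len=30):
    """Feed each predicted word back into the decoder until EOS or max_len.

    step_fn(prev_word, hidden) must return (probabilities, new_hidden).
    """
    tokens = []
    prev, hidden = sos, h0
    for _ in range(max_len):
        probs, hidden = step_fn(prev, hidden)
        prev = max(range(len(probs)), key=probs.__getitem__)   # most probable word
        if prev == eos:
            break                  # terminator reached: leave the loop
        tokens.append(prev)
    return tokens


def toy_step(prev_word, hidden):
    """Hypothetical decoder stub: emits a fixed script, one word per step."""
    script = [3, 4, 1, 2]          # word indices; 1 plays the role of EOS
    probs = [0.0] * 5
    probs[script[hidden]] = 1.0
    return probs, hidden + 1       # the "hidden state" is just a step counter


decoded = greedy_decode(toy_step, h0=0, sos=0, eos=1)
```

The `max_len` cap is a safeguard for the case in which the decoder never emits the terminator.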

2.4

A dual-drive model for teaching English with a bilingual corpus

Data-driven learning (DDL) can be viewed as a distinctive inductive pedagogy in which students discover patterns in the data themselves through induction rather than through lectures by the teacher. Teachers cannot predict what students will discover, and the discoveries may be completely new or may be regularities unfamiliar to both teachers and experts. Students are the center of teaching and learning activities, and teachers act only as instructors and coordinators. DDL can be further divided into direct DDL and indirect DDL according to whether students have direct access to computers. Direct DDL requires specific computer hardware and software, whereas in indirect DDL the teacher provides the corpus text, which removes students’ direct dependence on computers and is therefore more convenient for classroom teaching. With the spread of computers and mobile phone networks, both data-driven learning approaches can now be applied in classroom teaching.

2.4.1

Indirect data-driven dual-drive instruction

Indirect DDL uses a smaller quantity of corpus resources that the teacher has processed and organized, so the teacher can control the difficulty and content, which makes it more suitable for teaching words, phrases, and grammar. The basic steps of "identification-classification-induction" have been proposed, and methods of word teaching have been discussed with examples such as "the difference between convince and persuade" and "the usage of should", which demonstrated the good effect of the data-driven approach. Indirect DDL in which teachers organize “micro-texts” is a feasible and effective way of teaching. Teachers can control the content and difficulty of the corpus and discuss it with students to prevent them from being misled by wrong generalization or over-generalization. A native-speaker-driven approach is also more likely to capture students’ interest and help them better understand English words and phrases, as well as the similarities and differences between English and Chinese expressions.

2.4.2

Direct data-driven dual-drive instruction

In direct DDL, students search large amounts of linguistic data themselves and use the results as an aid to self-motivated learning. This approach is mainly used for composition, translation training, and independent learning. Using an online web corpus, students can check their own expressions, distinguish correct from incorrect usage, and eventually find the correct form. Empirical studies have shown that a corpus-based approach to teaching college English writing can effectively improve students’ writing skills. In translation teaching, the convenient search and rich content of a corpus help to sharpen students’ perception of micro-linguistic phenomena, which can effectively improve the accuracy and efficiency of translation. In addition, data-driven teaching based on a bilingual corpus has the advantage of being native-language-driven, which gives it more room to be exploited in English teaching.

3

Results and analysis of the application of the English-Chinese translation model in university English teaching

3.1

Performance Test of English-Chinese Translation Models

To verify the effectiveness of the methods in this paper, the original Transformer model and the improved Transformer model are each combined with the maximum likelihood training method and the adversarial training method, and the resulting combinations are compared with other baseline methods. Six methods are compared in total: the four combinations and two other methods.

3.1.1

Comparison of BLEU with sentence length for different models

The BLEU scores of the different models as a function of sentence length are compared in Fig. 3. The figure plots, for the existing baseline methods and this paper’s method, how the BLEU score of the translations generated on the newstest2023 test set varies with sentence length. It shows that this paper’s method outperforms the other methods across the different sentence lengths.

3.1.2

Performance of different models in terms of perplexity with the training process

The test set newstest2023 belongs to the news domain. As a public corpus, it reflects the performance of the various methods to some extent, but the robustness of a method is also very important. Many methods that achieve good performance on public datasets do not perform well in real production environments. This is not because the model is overfitted, but because public corpora have been pre-processed and standardized, which amounts to a certain degree of feature engineering, and models naturally perform very well on such data. The perplexity of the different models over the course of training on the newstest2023 test set is shown in Figure 4. The figure plots the perplexity of the existing baseline methods and this paper’s method against the number of training rounds. BLEU and perplexity are currently the mainstream metrics for evaluating machine translation.

The perplexity of the different models over the course of training on the test set of the English-Chinese Translation Corpus is shown in Figure 5. The figure plots the perplexity of the existing baseline methods and this paper’s method against the number of training rounds on this corpus. The lower the perplexity, the better the model, and the target translations generated by this paper’s method achieve lower perplexity in the target language than those of the other methods.
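For reference, perplexity can be computed from the probabilities a model assigns to the reference words as the exponential of the average negative log-likelihood. The sketch below uses this standard definition; the function name and the probability values are hypothetical and are not taken from the experiments.

```python
import math


def perplexity(token_probs):
    """Exponential of the mean negative log-probability of the reference words."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Hypothetical per-word probabilities assigned by two models to the same sentence.
confident_model = [0.5, 0.4, 0.6, 0.5]
uncertain_model = [0.1, 0.2, 0.1, 0.05]

ppl_confident = perplexity(confident_model)
ppl_uncertain = perplexity(uncertain_model)
```

A model that assigns probability 0.25 to every reference word has a perplexity of 4, i.e., it is as uncertain as a uniform choice among four words, which is why lower values indicate a better model.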

3.1.3

Comparison results of performance experiments of different methods

The English-Chinese Translation Corpus has complex representations and contains more noise, which makes it an important resource for measuring the robustness of a model. In addition to the performance of the different models and methods on the newstest2023 test set and on the English-Chinese Translation Corpus, this paper reports the average perplexity, the memory occupied by the model, and the average BLEU value of each method, so that the methods can be analyzed and compared intuitively. The results are shown in Table 1. The Encoder-Decoder model has the lowest average perplexity, 7.02, and the highest average BLEU value and memory footprint, 29.93 and 298M, respectively. For the other five models, the average perplexity ranges from 11.63 to 41.98, the average BLEU value from 17.03 to 24.49, and the memory occupied from 89M to 255M. The Encoder-Decoder model with the adversarial training method proposed in this paper therefore performs better on the test data than the traditional Transformer and the maximum-likelihood-based training methods.

Table 1.

Performance experiment comparison results of different methods

Model	Perplexity	Param (M)	BLEU
Encoder-Decoder	7.02	298	29.93
Transformer+RL	11.63	255	24.49
ATransformer	15.04	174	24.05
Transformer	19.53	136	22.83
RNN-embed	36.67	89	19.57
NN PR	41.98	97	17.03

3.2

Analysis of word alignment and attentional information results for experimental example sentences

3.2.1

Word alignment of experimental example sentences

The results above compare the performance of the different models. To verify the effectiveness of multilayer aggregation, this paper visualizes and analyzes the word alignment and the attention distribution of sentence pairs consisting of Chinese sentences and their generated English translations. The word alignment of a traditional Chinese-English sentence pair is shown in Figure 6, which visualizes the word-to-word relationship between the source and target sentences of the experimental example, obtained with statistical machine translation based on a shallow machine learning method. The traditional word alignment method clearly aligns each Chinese word rigidly with its corresponding translated word and does not reflect its relationship with other words.

3.2.2

Attention Distribution of Chinese and English Sentence Pairs

The attention distribution of Chinese-English sentence pairs under the Encoder-Decoder method is shown in Fig. 7, which visualizes the word-to-word attention between the source and target sentences of the experimental example. The associations between words can be seen in the figure: the darker the color, the stronger the connection. Compared with the traditional alignment method, the Encoder-Decoder machine translation model proposed in this paper captures these associations between words, which indicates that it is effective.

3.3

Experimental study of dual-drive teaching of translation

Two classes of second-year English majors at University Z were randomly selected as the research sample, and these students were the subjects of the teaching experiment. The two classes were chosen because their teachers belong to the same school, so interference caused by differences between teachers can be ignored. Moreover, the content and hours of instruction are the same for the two classes, and the main differences are the main textbook, the common teaching aids, and the teaching mode. Second-year university students have a strong awareness of learning and relatively stable learning habits and behaviors. They are capable of both independent and cooperative learning, participate actively in the teaching process, and are willing to try a variety of useful teaching activities.

3.3.1

Effectiveness of Translation Theory and Technique Acquisition

In Levene’s test of homogeneity of variance, Fmax=1.63<2, indicating that the variances of the control group and the experimental group are homogeneous and that the distributions of the two groups’ data do not differ under the normal distribution condition. An independent samples t-test was then conducted to examine the overall effect of learning translation theory and technique in the experimental and control groups. The results of the independent samples t-test between the control group and the experimental group are shown in Table 2. The data show that the experimental group (M=88.96, SD=2.958) achieved better mastery of translation theory and skills in English translation than the control group (M=81.42, SD=3.741), t=-9.378, p<0.05 (one-tailed test).

Table 2.

Independent samples t-test results for the two groups

Group	N	Mean	SD	t	P
Control group	54	81.42	3.741	-9.378	<0.05
Experimental group	53	88.96	2.958	-9.378	<0.05
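The two checks described above can be sketched from the summary statistics in Table 2. This is an illustration under assumptions: Fmax is taken to be the ratio of the larger sample variance to the smaller one, and the t statistic is the pooled (equal-variance) Student's t. The paper does not state which software, test variant, or rounding was used, so values recomputed from the rounded table entries need not match the reported statistics. Levene's test proper also requires the raw scores, which are not available here.

```python
import math

def hartley_fmax(sd_a, sd_b):
    """Ratio of the larger sample variance to the smaller one."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t (equal variances assumed) from summary statistics.

    Returns the t statistic for mean_a - mean_b and the degrees of freedom.
    """
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

# Summary statistics as printed in Table 2 (control first, experimental second).
fmax = hartley_fmax(3.741, 2.958)
t_stat, df = pooled_t(81.42, 3.741, 54, 88.96, 2.958, 53)

print(f"Fmax = {fmax:.2f} (rule of thumb: below 2 suggests similar variances)")
print(f"t = {t_stat:.3f}, df = {df}")
```

Passing the control group first makes the t statistic negative when the experimental mean is higher, which matches the sign convention of the tables.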

The results of the independent samples t-test between the control group and the experimental group on each sub-component of translation theory and technique mastery are shown in Table 3. For translation theory mastery, the experimental group (M=42.58, SD=2.148) scored higher than the control group (M=38.14, SD=3.021), t=-5.269, p<0.05 (one-tailed test). For translation skills mastery, the experimental group (M=44.39, SD=2.002) also scored higher than the control group (M=37.83, SD=2.679), t=-9.884, p<0.05 (one-tailed test).

Table 3.

Independent samples t-test for each sub-component of translation theory and technique mastery

Dimension	Group	Mean	SD	t	P
Translation theory(50)	Control group	38.14	3.021	-5.269	<0.05
Translation theory(50)	Experimental group	42.58	2.148	-5.269	<0.05
Translation technique(50)	Control group	37.83	2.679	-9.884	<0.05
Translation technique(50)	Experimental group	44.39	2.002	-9.884	<0.05

3.3.2

Effects of English-Chinese Translation Mastery

The results of the independent samples t-test of the control group and the experimental group on English-Chinese translation are shown in Table 4. In Levene’s test of homogeneity of variance, Fmax=1.03<2, which by the rule of thumb indicates that the variances of the control group and the experimental group are homogeneous and that both sets of data satisfy the normal distribution condition, with no difference between them. An independent samples t-test was then conducted on the overall effect of English-Chinese translation learning for the experimental and control groups. The data show that the experimental group (M=81.43) mastered English-Chinese translation skills better than the control group (M=75.17), t=-7.983, p<0.05; the difference between the two groups is significant.

Table 4.

Independent samples t-test of the two groups on English-Chinese translation

Group	N	Mean	SD	t	P
Control group	54	75.17	3.274	-7.983	<0.05
Experimental group	53	81.43	2.762	-7.983	<0.05

The results of the independent samples t-test of the control group and the experimental group in each sub-section of English-Chinese translation are shown in Table 5. The data show that, within English-Chinese mutual translation, significant differences remain in every sub-section between the experimental group taught with dual-drive teaching and the control group taught with traditional lecture-based teaching. For single-sentence mutual translation, the experimental group (M=16.08) scored higher than the control group (M=12.17), t=-6.847, p<0.05. For compound-sentence translation, the experimental group (M=15.82) scored higher than the control group (M=9.22), t=-10.048, p<0.05. For the use of subordinate clauses, the experimental group (M=15.45) scored higher than the control group (M=10.27), t=-7.739, p<0.05. For conversion and structural adjustment, the experimental group (M=17.88) scored higher than the control group (M=10.14), t=-12.972, p<0.05. For the conversion and application of stylistic types, the experimental group (M=18.46) scored higher than the control group (M=13.02), t=-8.405, p<0.05. All five differences are significant.

Table 5.

Independent samples t-test for each sub-section of English-Chinese translation

Dimension	Group	Mean	SD	t	P
Single sentence(20)	Control group	12.17	2.08	-6.847	<0.05
Single sentence(20)	Experimental group	16.08	3.06	-6.847	<0.05
Compound interpretation(20)	Control group	9.22	4.46	-10.048	<0.05
Compound interpretation(20)	Experimental group	15.82	1.22	-10.048	<0.05
Usage of clauses(20)	Control group	10.27	1.36	-7.739	<0.05
Usage of clauses(20)	Experimental group	15.45	1.24	-7.739	<0.05
Conversion and structural adjustment (20)	Control group	10.14	2.17	-12.972	<0.05
Conversion and structural adjustment (20)	Experimental group	17.88	2.37	-12.972	<0.05
The conversion and application of stylistic types (20)	Control group	13.02	3.6	-8.405	<0.05
The conversion and application of stylistic types (20)	Experimental group	18.46	1.06	-8.405	<0.05
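The mean score gains quoted for these sub-sections in the conclusion (3.91, 6.6, 5.18, 7.74, and 5.44 points) follow directly from the group means in Table 5. The short sketch below recomputes them; the dictionary layout and the function name `mean_gains` are illustrative choices, not part of the original study.

```python
# (control mean, experimental mean) per sub-section, copied from Table 5.
table5 = {
    "Single sentence": (12.17, 16.08),
    "Compound interpretation": (9.22, 15.82),
    "Usage of clauses": (10.27, 15.45),
    "Conversion and structural adjustment": (10.14, 17.88),
    "Conversion and application of stylistic types": (13.02, 18.46),
}

def mean_gains(rows):
    """Experimental mean minus control mean, rounded to two decimals."""
    return {name: round(exp - ctrl, 2) for name, (ctrl, exp) in rows.items()}

gains = mean_gains(table5)
for name, gain in gains.items():
    print(f"{name}: +{gain} points (maximum score 20)")
```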

3.3.3

Analysis of Students’ Learning Effectiveness in Writing

The results of the independent samples t-test for the control and experimental groups on writing learning are shown in Table 6. First, in Levene’s test of homogeneity of variance, Fmax=1.46<2, which indicates that the variances of the control group and the experimental group are homogeneous and that the distributions of the two groups’ data do not differ under the normal distribution condition. An independent samples t-test was then conducted to assess the overall effect of writing learning for the experimental and control groups. The data show that the experimental group (M=83.15) learned English writing better than the control group (M=72.34), t=-16.124, p<0.05, and the difference between the two groups is significant.

Table 6.

Independent samples t-test of the two groups on writing learning

Group	N	Mean	SD	t	P
Control group	54	72.34	3.415	-16.124	<0.05
Experimental group	53	83.15	2.701	-16.124	<0.05

The results of the independent samples t-tests of the control group and the experimental group in each sub-section of writing learning are shown in Table 7. The data show that a significant difference remains between the experimental group taught with dual-drive teaching and the control group taught with traditional didactic teaching in each sub-section of college English writing.

Table 7.

Independent samples t-test for each sub-section of writing learning

Dimension	Group	Mean	SD	t	P
General essay writing(25)	Control group	13.35	1.58	-10.992	<0.05
General essay writing(25)	Experimental group	21.87	2.97	-10.992	<0.05
Common application writing(25)	Control group	17.83	1.51	-7.209	<0.05
Common application writing(25)	Experimental group	23.75	0.94	-7.209	<0.05
Descriptive writing(25)	Control group	20.44	0.22	-3.759	<0.05
Descriptive writing(25)	Experimental group	24.94	3.24	-3.759	<0.05
Discerning writing (25)	Control group	13.48	2.25	-10.268	<0.05
Discerning writing (25)	Experimental group	20.82	0.99	-10.268	<0.05

Specifically, for general short essay writing, the experimental group (M=21.87) scored higher than the control group (M=13.35), t=-10.992. For common application writing, the experimental group (M=23.75) scored higher than the control group (M=17.83), t=-7.209. For descriptive writing, the experimental group (M=24.94) scored higher than the control group (M=20.44), t=-3.759. For discerning writing, the experimental group (M=20.82) scored higher than the control group (M=13.48), t=-10.268. Overall, the difference between the experimental group and the control group is significant (p<0.05) in every sub-section of writing learning.

4

Conclusion

This paper constructs an English-Chinese neural machine translation model based on corpus technology and then evaluates and analyzes students’ learning outcomes under the dual-drive model of English teaching. The main conclusions are as follows:

1)

Across the sentence lengths tested, the Encoder-Decoder method performs best. On the newstest2023 test set, the Encoder-Decoder model has the lowest average perplexity, 7.02, and the highest average BLEU score, 29.93. The Encoder-Decoder model proposed in this paper, trained with the adversarial training method, therefore performs well on the test data.

2)

The attention distributions of Chinese-English sentence pairs show that the Encoder-Decoder machine translation model proposed in this paper captures closer word-to-word connections between source and target sentences than the traditional method, which supports the effectiveness of the proposed method.

3)

In terms of students’ learning outcomes, dual-drive teaching also showed clear effects. Students in the experimental group scored 4.44 and 6.56 points higher than the control group (p<0.05) on the translation theory and translation skills dimensions, respectively, indicating that dual-drive teaching promotes college students’ mastery of translation theory and skills in English translation. In the five subsections of English-Chinese translation (single-sentence translation, compound-sentence translation, use of subordinate clauses, conversion and structural adjustment, and conversion and application of stylistic types), the mean scores of the experimental group were significantly higher than those of the control group by 3.91, 6.6, 5.18, 7.74, and 5.44 points, respectively (p<0.05). In the four subsections of college English writing (general short essay writing, common application writing, descriptive writing, and discursive writing), the experimental group taught with dual-drive teaching scored 8.52, 5.92, 4.5, and 7.34 points higher, respectively, than the control group taught with traditional didactic teaching (p<0.05). A college English classroom based on dual-drive teaching can therefore be more effective, engaging, and efficient.

Language: English

Publication frequency: 1 issue per year
Journal subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other


Corpus-Driven Deep Learning-Based English-Chinese Translation Model Construction and Its Application to College English Teaching

Fang Ju

Published online: 21 Mar 2025

Received: 13 Nov 2024

Accepted: 15 Feb 2025

DOI: https://doi.org/10.2478/amns-2025-0565

Keywords: English-Chinese translation, Attention mechanism, Encoder-Decoder, BLEU, Deep neural network

© 2025 Fang Ju, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
