Corpus-Driven Deep Learning-Based English-Chinese Translation Model Construction and Its Application to College English Teaching
Published online: 21 Mar 2025
Received: 13 Nov 2024
Accepted: 15 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0565
© 2025 Fang Ju, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
The application scenarios of translation models have gradually expanded from simple bilingual translation to speech translation, conference translation, intelligent Q&A, document translation, video subtitle translation, and other directions [1–2]. As translation models evolve, higher demands are placed on the accuracy of English-Chinese translation. When an English-Chinese translation model is used, factors such as the ambiguity of English utterances and differing habits of expression lead to ambiguous output and inaccurate semantic analysis, which lowers translation quality [3–4]. It is therefore necessary either to update and optimize existing English-Chinese translation models or to construct a new one.
English-Chinese translation is a multi-attribute semantic decision-making problem: it requires analysis of relevance and semantic similarity, combined with a fuzzy decision-making model, to design the semantic expression preferences of a translation [5–7]. A corpus is a valuable aid in constructing an English-Chinese translation model. In addition, corpora and English-Chinese translation models are increasingly combined in university English teaching, and the focus and themes of research have moved beyond traditional teaching into a broader range of fields, mainly including English for Specific Purposes, terminology research, evaluation of students’ translations, translator training, and curricula and teaching materials [8–10]. At the same time, the concept of a corpus has itself expanded: beyond the monolingual or bilingual parallel corpora of the early days, researchers have begun to build one-time corpora, archives of student translations, and similar resources to support English teaching [11–12]. As an evaluation tool, a corpus is highly valuable for assessing the quality of English-Chinese translation models, and both corpora and English-Chinese translation are widely used in university English teaching [13–15]. Today, English-Chinese translation models can be applied to English education to update university English teaching materials, support textbook reform, and improve teaching quality, and constructing a new English-Chinese translation model can contribute to these goals.
In this paper, to investigate the effect of applying an English-Chinese translation model in university English teaching, an English-Chinese bilingual corpus is first constructed. A deep neural network, the Transformer model, is then used to estimate the probability that a bilingual sentence pair is a mutual translation; the collected data are encoded in this way to build a parallel corpus, on which an Encoder-Decoder model with an attention mechanism is constructed. Finally, the effect of this model on dual-drive university English teaching is analyzed, in support of applying corpus-driven English-Chinese translation learning in classroom teaching.
To address the problems that arise in corpus construction, targeted preparatory work was carried out so that the English-Chinese bilingual corpus could later be built and managed smoothly. This work comprised four elements: collecting the corpus, organizing it, upgrading the word alignment model, and entering the parallel sentence pairs into the corpus, using web mining technology [16] and web content collection algorithms. The following section describes the preparation of the bilingual corpus for English-Chinese machine translation, as well as the specific process of collection, labeling, and processing. The corpus design process is shown in Figure 1.

Process of corpus design
Source of corpus
English-Chinese machine translation has high data requirements and needs a large English-Chinese bilingual corpus, and there are many ways to acquire such data. To ensure that the corpus is adequate, the “English-Chinese Detailed Dictionary” was chosen as the source and collected with Python crawler technology.
Entry of corpus
Once the dictionary has been chosen as the source, entry of the Chinese and English corpus begins. First, the English-Chinese dictionary is scanned with a high-speed scanner; since the scans are in PDF format, they must be converted to the plain-text (TXT) format used by the corpus. Second, a web-based acquisition method is used: a crawler program crawls English websites to obtain corpus documents, from which valuable English text is extracted as a corpus input source. Using a generic web crawler with a breadth-first search strategy, the HTML documents of English “navigation websites” are analyzed to extract a seed set of URLs for multiple English websites; the seed URLs are then used to crawl each English website in turn, and the web page documents are downloaded and saved. The usable English corpus is extracted in five steps (HTML tag filtering, alphabet recognition, phrase filtering, duplicate filtering, and spell checking) and finally saved in an XML document as the corpus source.
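The breadth-first crawl described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the crawler used in the paper: the `fetch` callable, the in-memory `pages` mapping, and the `max_pages` limit are hypothetical stand-ins for real HTTP requests, and only the HTML tag filtering step is shown, using Python's standard `html.parser`.

```python
from collections import deque
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collects href targets and text nodes while dropping the HTML tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl_breadth_first(seed_urls, fetch, max_pages=100):
    """Visit pages level by level, starting from the seed URLs.

    `fetch` is a hypothetical callable that maps a URL to an HTML string,
    or to None when the page is unavailable.
    """
    queue = deque(seed_urls)
    seen = set(seed_urls)
    corpus = {}
    while queue and len(corpus) < max_pages:
        url = queue.popleft()          # FIFO order gives breadth-first traversal
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        corpus[url] = " ".join(parser.text_parts)
        for link in parser.links:
            if link not in seen:       # duplicate filtering on URLs
                seen.add(link)
                queue.append(link)
    return corpus


# Toy in-memory "web" standing in for real navigation sites.
pages = {
    "a": '<p>Seed page</p><a href="b">b</a><a href="c">c</a>',
    "b": '<p>Second page</p><a href="a">back</a>',
    "c": "<p>Third page</p>",
}
corpus = crawl_breadth_first(["a"], pages.get)
```

The queue guarantees that all pages linked from the seeds are visited before pages two links away, which is what distinguishes this strategy from a depth-first crawl.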
Pre-processing of the corpus
To ensure the quality of the corpus and the accuracy of the study, the scanned text must be proofread carefully: the corpus is checked for garbled characters, spelling errors, and content that departs from the meaning of the original text, and any problems are corrected promptly. Preprocessing mainly involves standardizing the format and removing impurities so that the Chinese and English corpora can be shared accurately. After the proofread corpus has been entered, its format standardized, and impurities removed, the Chinese and English corpus is divided into separate documents, each named with letters to make the documents easy to query and load.
Categorical labeling of the corpus
Classification criteria: corpus information can be categorized according to different criteria. To handle this, each text in the corpus is tagged with three basic attributes (style, genre, and domain) so that it can be categorized at multiple levels. By genre, the corpus falls into three types: “Literature, Journalism, and Practical Writing”.
Error labeling: an English-Chinese bilingual corpus is built to better serve machine translation and to help language learners master machine-translated language and its development. Errors in the corpus may in turn give rise to other errors. To grasp objectively and accurately how often learners use specific terms or expressions, and how linguistic elements combine or connect, the types of errors in the corpus must be clearly identified and labeled.
An automatic alignment procedure identifies sentence and paragraph boundaries, producing sentence-level and paragraph-level bilingual alignment. The constraints of the IGT model are introduced into the log-linear word alignment model, which changes the word sequences in the global context; to further constrain word order, syntactic tree constraints are integrated into the IGT-based word alignment model. With these two types of syntactic knowledge integrated, word order variation during word alignment can be effectively limited both globally and locally. Automation is combined with manual checking: the automatic alignment results are manually reviewed to obtain correct sentence and paragraph boundary markers and alignment markers for the English-Chinese bilingual parallel corpus.
The parallel corpus
The extraction model for English-Chinese bilingual parallel sentence pairs utilizes a deep neural network to learn cross-linguistic semantic information and uses this information to make a probabilistic estimation of whether a bilingual sentence pair is a reciprocal translation or not, i.e.,
For the set of source sentences
After obtaining the coding layer representation of the bilingual sentences, in order to capture the reliable features of whether the bilingual sentences are reciprocal translations or not, the product
In the inference phase of the model, a sentence pair is marked as a parallel positive example if its probability score equals or exceeds a predetermined threshold of
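The scoring and thresholding idea in this section can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the sentence vectors, the weights, the bias, and the default threshold of 0.5 are hypothetical, and the absolute-difference feature is added alongside the product mentioned in the text because it is a common companion feature, not because the paper specifies it.

```python
import math


def pair_features(src_vec, tgt_vec):
    """Element-wise product and absolute difference of two sentence vectors."""
    product = [s * t for s, t in zip(src_vec, tgt_vec)]
    abs_diff = [abs(s - t) for s, t in zip(src_vec, tgt_vec)]
    return product + abs_diff


def parallel_probability(src_vec, tgt_vec, weights, bias):
    """Logistic score in (0, 1) that the pair is a mutual translation."""
    feats = pair_features(src_vec, tgt_vec)
    logit = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def is_parallel(prob, threshold=0.5):
    """A pair is a positive example if its score equals or exceeds the threshold."""
    return prob >= threshold


# Hypothetical encoder outputs and classifier parameters.
src_vec = [0.5, -0.2]
tgt_vec = [0.4, -0.1]
weights = [4.0, 4.0, -1.0, -1.0]
prob = parallel_probability(src_vec, tgt_vec, weights, bias=0.0)
keep = is_parallel(prob)
```

In a trained system the weights would be learned from labeled parallel and non-parallel pairs, and the threshold would be tuned on held-out data.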
Starting from processing the data, the steps of the experiment are in order:
The input and target sentences are indexed by a dictionary class. “SOS” and “EOS” denote the start and end markers of a sentence and are assigned the indices 0 and 1, respectively. As each word is processed, any word in the sentence that has not yet been indexed is stored in a separate dictionary. The data file stored as Unicode characters is converted to ASCII, all content is lowercased, and most punctuation marks are trimmed. The data file is split into lines and each line into a pair, Chinese against English, and the text is standardized according to the length and content of the data. Because the whole dataset is too large, only part of it is selected as the training set: sentences shorter than 30 are kept, sentences that do not satisfy this condition are filtered out, and the Chinese and English sentences that remain are stored in separate dictionaries.
The main job of the Encoder-Decoder [18–19] model is to predict the output target utterance E given a source utterance F. In machine translation, its task is to compute the target utterance with the highest probability, i.e. the best match, given the source utterance. Before the final probability is calculated, the initial state of the language model is determined by feeding the source utterance F through an RNN. The basic idea of the Encoder-Decoder model is to encode the source utterance F with an Encoder neural network layer, obtaining a real-valued vector h (the hidden-layer information), and then to use another neural network layer, the Decoder, to predict the target utterance E. The structure of the Encoder-Decoder model is shown in Fig. 2.
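The preprocessing steps above can be sketched in a few lines of standard-library Python. This is an illustration under assumptions rather than the paper's code: the tab-separated line format, the names `Lang`, `normalize_english`, and `prepare_data`, and the choice to leave the Chinese side untouched (assumed to be already word-segmented with spaces) are all hypothetical. ASCII conversion is applied to the English side only, since it would erase Chinese characters.

```python
import re
import unicodedata

SOS_TOKEN = 0    # start-of-sentence index
EOS_TOKEN = 1    # end-of-sentence index
MAX_LENGTH = 30  # keep only pairs whose sentences are shorter than this


class Lang:
    """Maps words to indices and back for one language."""

    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.index2word = {SOS_TOKEN: "SOS", EOS_TOKEN: "EOS"}
        self.n_words = 2  # SOS and EOS are already counted

    def add_sentence(self, sentence):
        for word in sentence.split(" "):
            if word not in self.word2index:  # first time this word is seen
                self.word2index[word] = self.n_words
                self.index2word[self.n_words] = word
                self.n_words += 1


def unicode_to_ascii(text):
    """Strip combining marks (e.g. accents) after NFD decomposition."""
    return "".join(
        ch for ch in unicodedata.normalize("NFD", text)
        if unicodedata.category(ch) != "Mn"
    )


def normalize_english(text):
    """Lowercase, strip accents, and trim most punctuation."""
    text = unicode_to_ascii(text.lower().strip())
    text = re.sub(r"([.!?])", r" \1", text)   # split off sentence-final marks
    text = re.sub(r"[^a-z.!?]+", " ", text)   # replace everything else by a space
    return text.strip()


def filter_pairs(pairs, max_length=MAX_LENGTH):
    """Keep pairs in which both sentences have fewer than max_length tokens."""
    return [
        (zh, en) for zh, en in pairs
        if len(zh.split(" ")) < max_length and len(en.split(" ")) < max_length
    ]


def prepare_data(lines):
    """Each line: a segmented Chinese sentence, a tab, its English translation."""
    pairs = []
    for line in lines:
        zh, en = line.rstrip("\n").split("\t")
        pairs.append((zh.strip(), normalize_english(en)))
    pairs = filter_pairs(pairs)
    zh_lang, en_lang = Lang("zh"), Lang("en")
    for zh, en in pairs:
        zh_lang.add_sentence(zh)
        en_lang.add_sentence(en)
    return zh_lang, en_lang, pairs


lines = ["我 喜欢 茶\tI like tea.", "你 好\tHello!"]
zh_lang, en_lang, pairs = prepare_data(lines)
```

Keeping one `Lang` object per language mirrors the description of storing the filtered Chinese and English sentences under separate dictionaries.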

Encoder Decoder model diagram
Where Encoder is denoted as
The operational steps of the Encoder-Decoder model are shown below:
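First, the word vector of each source word is looked up. The hidden state of the Encoder is then computed at each time step of the source statement F, and the hidden state at the final time step summarizes the whole source statement. In the decoding phase, at each time step the word vector of the previous target word is looked up. After that, the Decoder is run to compute its hidden state, starting from the final hidden state of the Encoder. Finally, the probability of each candidate target word is computed from the Decoder hidden state. In this way, the Encoder learns an internal representation of the data fed to it.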
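For reference, one standard way of writing these steps for a recurrent Encoder-Decoder is given below. The notation is an assumption and is not taken from the paper: M^(f) and M^(e) are the source and target embedding matrices, f_t and e_t are the source and target words at time step t, h_t^(f) and h_t^(e) are the Encoder and Decoder hidden states, and W_hs and b_s are the parameters of the output layer.

```latex
\begin{align}
m_t^{(f)} &= M^{(f)}_{\cdot,\, f_t} \\
h_t^{(f)} &=
  \begin{cases}
    \mathrm{RNN}^{(f)}\!\left(m_t^{(f)},\, h_{t-1}^{(f)}\right) & t \ge 1 \\
    \mathbf{0} & t = 0
  \end{cases} \\
m_t^{(e)} &= M^{(e)}_{\cdot,\, e_{t-1}} \\
h_t^{(e)} &=
  \begin{cases}
    \mathrm{RNN}^{(e)}\!\left(m_t^{(e)},\, h_{t-1}^{(e)}\right) & t \ge 1 \\
    h_{|F|}^{(f)} & t = 0
  \end{cases} \\
p_t^{(e)} &= \mathrm{softmax}\!\left(W_{hs}\, h_t^{(e)} + b_s\right)
\end{align}
```

The only link between the two networks in this formulation is the initial Decoder state, which is set to the final Encoder state.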
Build an encoder. The encoder initializes the hidden state and the number of network layers. The number of layers is set outside the recurrent neural network: if it were set inside, the number of layers would determine the number of hidden states. To keep the encoder simple, the layers are therefore stacked outside the recurrent unit, so that there is only one hidden state, which is passed on to the subsequent network layers. In forward propagation, the input sequence of indices is converted into a sequence of word vectors, and the network then returns its output and hidden state.
Build a decoder. Next, a decoder is built with one additional linear layer as the output layer. The dimension of this layer is set to the number of words in the vocabulary, since the output word is ultimately chosen according to the output probabilities.
Add the attention mechanism to the decoder. To improve translation quality and reduce translation complexity, the attention mechanism is placed in the decoder, giving a decoder with attention.
To define the attention mechanism, a portion of other network layers are introduced in the initialization. In the forward function, the network inputs are first transformed into word vectors, then word vectors
Where
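The general shape of an attention step can be sketched as follows. This is an illustration under assumptions: dot-product scoring is used here for simplicity and may differ from the scoring function used in the paper, and the vectors are made up.

```python
import math


def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_outputs):
    """Return attention weights and the weighted sum of encoder outputs."""
    # One score per source position: dot product with the decoder state.
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc))
        for enc in encoder_outputs
    ]
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    # Context vector: encoder outputs averaged with the attention weights.
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
        for i in range(dim)
    ]
    return weights, context


# Hypothetical encoder outputs for a three-word source sentence.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_outputs)
```

The context vector is then combined with the current word vector before the recurrent step, so the decoder can focus on different source words at each output position.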
After the model is built, it is trained on the data. In the encoding process, an empty sequence is defined and filled with the output of each encoding step; the final hidden state is saved as the initial hidden state of the decoding process.
When training the data, first the sequence of source statements (
The output sequence produced after the RNN model is: (
Second, in the decoding process, the input and hidden state are passed into the decoder one step at a time together with the encoder outputs, and the result of each step is passed into the next step of the decoder. If a terminator is encountered during the loop, the sentence ends there and the loop exits.
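The decoding loop just described can be sketched as below. The `decoder_step` callable is a hypothetical interface standing in for a trained decoder; `toy_decoder_step` is a scripted placeholder used only to make the example runnable, and the token indices follow the SOS = 0, EOS = 1 convention.

```python
SOS_TOKEN = 0
EOS_TOKEN = 1


def greedy_decode(decoder_step, encoder_outputs, encoder_hidden, max_length=30):
    """Feed the decoder one token at a time until EOS or max_length."""
    decoder_input = SOS_TOKEN
    hidden = encoder_hidden  # final encoder state initializes the decoder
    output_tokens = []
    for _ in range(max_length):
        scores, hidden = decoder_step(decoder_input, hidden, encoder_outputs)
        next_token = max(range(len(scores)), key=scores.__getitem__)
        if next_token == EOS_TOKEN:
            break  # terminator reached: the sentence ends here
        output_tokens.append(next_token)
        decoder_input = next_token  # the prediction becomes the next input
    return output_tokens


def toy_decoder_step(token, hidden, encoder_outputs):
    """Hypothetical stand-in for a trained decoder: emits 2, then 3, then EOS."""
    script = {SOS_TOKEN: 2, 2: 3, 3: EOS_TOKEN}
    scores = [0.0] * 4
    scores[script[token]] = 1.0
    return scores, hidden


tokens = greedy_decode(toy_decoder_step, encoder_outputs=[], encoder_hidden=None)
```

The `max_length` bound protects against a decoder that never emits the terminator.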
Data-driven learning (DDL) can be viewed as a distinctive inductive pedagogy in which students discover patterns in the data themselves through induction, rather than through lectures by the teacher. Teachers cannot predict what students will discover, and the discoveries may be entirely new or unfamiliar to both teachers and experts. Students are the center of teaching and learning activities, and teachers act only as instructors and coordinators. DDL can be further divided into direct DDL and indirect DDL according to whether students have direct access to computers. Direct DDL requires specific computer hardware and software, whereas in indirect DDL the teacher provides the corpus text, which removes students’ direct dependence on computers and is therefore more convenient for classroom teaching. In fact, with the spread of computers and mobile networks, both data-driven learning approaches can be applied to classroom teaching.
Indirect DDL uses a smaller quantity of corpus resources that the teacher has processed and organized, with difficulty and content under the teacher’s control, so it is better suited to teaching words, phrases, and grammar. The basic steps of "identification-classification-induction" have been proposed, and methods of word teaching have been discussed with examples such as "the difference between convince and persuade" and "the usage of should", which demonstrated the good effect of data-driven learning. Indirect DDL in which teachers organize “micro-texts” is a feasible and effective way of teaching. Teachers can control the content and difficulty of the corpus and discuss it with students, preventing them from being misled by wrong generalization or over-generalization. A native-language-driven approach is also more likely to capture students’ interest in learning and to help them better understand English words and phrases, as well as the similarities and differences between English and Chinese expressions.
In direct DDL, students search large amounts of linguistic data themselves and use the results as an aid to self-motivated learning. This approach is mainly used for composition, translation training, and independent learning. Using an online web corpus, students can verify their expressions, distinguish correct from incorrect usage, and eventually find the correct form. Empirical studies have shown that a corpus-based approach to college English writing teaching can effectively improve students’ writing skills. In translation teaching, the convenient search and rich content of a corpus help to heighten students’ awareness of microlinguistic phenomena, which can effectively improve the accuracy and efficiency of translation. In addition to these advantages, data-driven teaching based on a bilingual corpus has the advantage of being native-language-driven, which gives it more scope in English teaching.
To verify the effectiveness of the proposed method, the two models (the original Transformer and the improved Transformer) are each combined with the two training methods (maximum likelihood training and adversarial training), and the resulting combinations are compared with other baseline methods. Six methods are compared in total: the four combinations and two other baselines.
The BLEU scores of the different models as a function of sentence length are shown in Fig. 3, which plots BLEU against sentence length for translations generated on the newstest2023 test set by the existing baseline methods and by the proposed method. Across the sentence lengths tested, the proposed method achieves higher BLEU scores than the other methods.

The BLEU score of different models varies with the length of the sentence
The newstest2023 test set belongs to the news domain. As a public corpus it reflects the performance of the various methods to some extent, but robustness also matters: many methods achieve good performance on public datasets yet perform poorly in real production environments. This is not because the model is overfitted, but because public corpora have been pre-processed and standardized, which amounts to a degree of feature engineering, and models naturally perform well on such data. The perplexity of the different models over the course of training on the newstest2023 test set is shown in Figure 4, which plots perplexity against the number of training rounds for the existing baseline methods and the proposed method. BLEU and perplexity are currently the mainstream metrics for evaluating machine translation.
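The two metrics can be made concrete with a short sketch. It rests on simplifying assumptions: perplexity is computed from per-token natural-log probabilities that a model is assumed to supply, and BLEU is shown in a sentence-level, single-reference form without smoothing, whereas published scores are normally corpus-level and reported on a 0 to 100 scale.

```python
import math
from collections import Counter


def perplexity(token_log_probs):
    """exp of the average negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Single-reference BLEU with uniform weights and a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clipped overlap: a candidate n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    if len(candidate) > len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat".split()
uniform_ppl = perplexity([math.log(0.25)] * 4)  # model always assigns p = 0.25
perfect = sentence_bleu(reference, reference)
disjoint = sentence_bleu("a dog ran far away".split(), reference)
```

Lower perplexity means the model assigns higher probability to the observed tokens, while higher BLEU means more n-gram overlap with the reference translation.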

Perplexity of different models on the newstest2023 test set
The perplexity of the different models over the course of training on the test set of the English-Chinese Translation Corpus is shown in Figure 5, which plots perplexity against training rounds for the existing baseline methods and the proposed method. The lower the perplexity, the better the model. The target translations generated by the proposed method achieve lower perplexity in the target language than those generated by the other methods.

Perplexity of different models on the microblogs test set
The English-Chinese Translation Corpus is an important resource for measuring model robustness because its text has a complex representation and contains more noise. In addition to the performance of the different models and methods on the newstest2023 test set and on the English-Chinese Translation Corpus, this paper reports the average perplexity, the memory occupied by each model, and the average BLEU value of each method, so that the methods can be compared intuitively. The experimental comparison is shown in Table 1. The Encoder-Decoder model has the lowest average perplexity, 7.02, while its average BLEU value and memory footprint are the largest, at 29.93 and 298M, respectively. For the other five models, the average perplexity ranges from 11.63 to 41.98, the average BLEU value from 17.03 to 24.49, and the memory occupied from 89M to 255M. The Encoder-Decoder model with the adversarial training method proposed in this paper thus performs better on the test data than the traditional Transformer and the maximum-likelihood-based training methods.
Performance experiment comparison results of different methods
Model | Perplexity | Param (M) | BLEU |
---|---|---|---|
Encoder-Decoder | 7.02 | 298 | 29.93 |
Transformer+RL | 11.63 | 255 | 24.49 |
ATransformer | 15.04 | 174 | 24.05 |
Transformer | 19.53 | 136 | 22.83 |
RNN-embed | 36.67 | 89 | 19.57 |
NN PR | 41.98 | 97 | 17.03 |
The above results compare the performance of the different models. To verify the effectiveness of multilayer aggregation, this paper visualizes and analyzes the word alignment and the attention distribution of sentence pairs consisting of Chinese sentences and their generated English translations. The word alignment results for traditional Chinese-English sentence pairs are shown in Figure 6, which visualizes the word-to-word relationships between source and target sentences for the example sentences, obtained with statistical machine translation based on a shallow machine learning method. The traditional word alignment method simply aligns each Chinese word rigidly with its corresponding translated word and does not reflect its relationship with other words.

Word alignment results for Chinese-English sentence pairs with the traditional method
The attention distribution of Chinese-English sentence pairs under the Encoder-Decoder method is shown in Fig. 7, which visualizes the word-to-word attention between source and target sentences for the example sentences. The associations between words are visible in the figure: the darker the color, the stronger the connection. Compared with the traditional method, the Encoder-Decoder machine translation model proposed in this paper captures these associations and is therefore effective.

Attention distribution of Chinese-English sentence pairs in the Encoder-Decoder model
Two second-year classes of English majors at University Z were randomly selected as the research sample, and their students served as the subjects of this teaching experiment. The classes were chosen because their teachers belong to the same school, so interference from differences between teachers can be ignored. Moreover, the content and hours of instruction are the same for both classes, and the main differences include the main textbook, common teaching aids, and the teaching mode. Second-year university students have a strong sense of learning and relatively stable learning habits and behaviors, combine learner autonomy with good cooperation, participate actively in the teaching process, and are willing to try a variety of useful teaching activities.
In Levene’s test for homogeneity of variance, Fmax=1.63<2, indicating that the variances of the control group and the experimental group are homogeneous and that the two groups do not differ in satisfying the normal distribution condition. An independent samples t-test was then conducted to examine the overall effect of lexical learning in the experimental and control groups. The results of the independent samples t-test between the control group and the experimental group on lexical mastery are shown in Table 2. The data show that the experimental group (M=88.96, SD=2.958) scored significantly higher than the control group (M=81.42, SD=3.741) on the mastery of translation theory and skills in English translation, t=-9.378, p<0.05 (one-tailed test).
Independent samples t-test results for the two groups
Group | N | Mean | SD | t | P |
---|---|---|---|---|---|
Control group | 54 | 81.42 | 3.741 | -9.378 | <0.05 |
Experimental group | 53 | 88.96 | 2.958 | | |
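The procedure described above, a variance-ratio check followed by an independent samples t-test, can be sketched with the standard library. The scores below are made up for illustration and are not the study's data; the function names are hypothetical. Note that the variance ratio shown here is the simple rule-of-thumb quantity (Fmax < 2) quoted in the text, which is not the same computation as Levene's test.

```python
import math
import statistics


def variance_ratio(group_a, group_b):
    """Larger sample variance divided by the smaller one (Fmax)."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b)


def pooled_t_statistic(group_a, group_b):
    """Student's t for two independent samples, assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, n_a + n_b - 2  # t statistic and degrees of freedom


# Made-up scores for illustration only; these are not the study's data.
control = [78, 82, 80, 85, 79, 83]
experimental = [88, 91, 86, 90, 89, 92]
f_max = variance_ratio(control, experimental)
t_value, dof = pooled_t_statistic(control, experimental)
```

With the control group passed first, a negative t indicates that the experimental group has the higher mean, which is the sign convention used in the tables.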
The results of the independent samples t-test between the control group and the experimental group on each sub-component of lexical mastery are shown in Table 3. In the dimension of university English translation theory and skills, the experimental group (M=42.58, SD=2.148) achieved significantly better mastery of translation theory than the control group (M=38.14, SD=3.021), t=-5.269, p<0.05 (one-tailed test). The experimental group (M=44.39, SD=2.002) also achieved significantly better mastery of translation skills than the control group (M=37.83, SD=2.679), t=-9.884, p<0.05 (one-tailed test).
Independent samples t-test results for each sub-component of mastery
Dimension | Group | Mean | SD | t | P |
---|---|---|---|---|---|
Translation theory (50) | Control group | 38.14 | 3.021 | -5.269 | <0.05 |
Translation theory (50) | Experimental group | 42.58 | 2.148 | | |
Translation technique (50) | Control group | 37.83 | 2.679 | -9.884 | <0.05 |
Translation technique (50) | Experimental group | 44.39 | 2.002 | | |
The results of the independent samples t-test of the control group and the experimental group on English-Chinese translation are shown in Table 4. In Levene’s test for homogeneity of variance, Fmax=1.03<2, which by the rule of thumb indicates that the data of both groups satisfy the normal distribution condition and that the two groups do not differ in this respect. An independent samples t-test was then conducted on the overall effect of syntactic learning for the experimental and control groups. The data show that the experimental group (M=81.43) achieved significantly better mastery of English-Chinese mutual translation skills than the control group (M=75.17), t=-7.983, p<0.05.
Independent samples t-test results for the two groups on English-Chinese translation
Group | N | Mean | SD | t | P |
---|---|---|---|---|---|
Control group | 54 | 75.17 | 3.274 | -7.983 | <0.05 |
Experimental group | 53 | 81.43 | 2.762 | | |
The results of the independent samples t-test of the control group and the experimental group in each sub-section of English-Chinese translation are shown in Table 5. The data show that, within English-Chinese mutual translation in university English translation, significant differences remain in every sub-section between the experimental group taught with dual-drive teaching and the control group taught with traditional lecture-based teaching. In single-sentence mutual translation, the experimental group (M=16.08) outperformed the control group (M=12.17), t=-6.847, p<0.05. In compound-sentence translation, the experimental group (M=15.82) outperformed the control group (M=9.22), t=-10.048, p<0.05. In the use of subordinate clauses, the experimental group (M=15.45) outperformed the control group (M=10.27), t=-7.739, p<0.05. In conversion and structural adjustment, the experimental group (M=17.88) outperformed the control group (M=10.14), t=-12.972, p<0.05. In the conversion and application of stylistic types, the experimental group (M=18.46) outperformed the control group (M=13.02), t=-8.405, p<0.05. The difference is significant in every case.
Independent samples t-test results for each sub-section of English-Chinese translation
Dimension | Group | Mean | SD | t | P |
---|---|---|---|---|---|
Single sentence (20) | Control group | 12.17 | 2.08 | -6.847 | <0.05 |
Single sentence (20) | Experimental group | 16.08 | 3.06 | | |
Compound interpretation (20) | Control group | 9.22 | 4.46 | -10.048 | <0.05 |
Compound interpretation (20) | Experimental group | 15.82 | 1.22 | | |
Usage of clauses (20) | Control group | 10.27 | 1.36 | -7.739 | <0.05 |
Usage of clauses (20) | Experimental group | 15.45 | 1.24 | | |
Conversion and structural adjustment (20) | Control group | 10.14 | 2.17 | -12.972 | <0.05 |
Conversion and structural adjustment (20) | Experimental group | 17.88 | 2.37 | | |
The conversion and application of stylistic types (20) | Control group | 13.02 | 3.6 | -8.405 | <0.05 |
The conversion and application of stylistic types (20) | Experimental group | 18.46 | 1.06 | | |
The independent samples t-test for the control and experimental groups on writing learning is shown in Table 6. First, in Levene’s test for homogeneity of variance, Fmax=1.46<2, which indicates that the variances of the control group and the experimental group are homogeneous and that the two groups do not differ in satisfying the normal distribution condition. An independent samples t-test was then conducted to assess the overall effect of lexical learning for the experimental and control groups. The data show that the experimental group (M=83.15) performed significantly better in English writing than the control group (M=72.34), t=-16.124, p<0.05.
Independent samples t-test results for the two groups on writing learning
Group | N | Mean | SD | t | P |
---|---|---|---|---|---|
Control group | 54 | 72.34 | 3.415 | -16.124 | <0.05 |
Experimental group | 53 | 83.15 | 2.701 | | |
The independent samples t-tests of the control group and the experimental group in each sub-section of writing learning are shown in Table 7. The data show that, in the teaching of college English writing, significant differences remain in every sub-section between the experimental group taught with dual-drive teaching and the control group taught with traditional didactic teaching.
Independent samples t-test results for each sub-section of writing learning
Dimension | Group | Mean | SD | t | P |
---|---|---|---|---|---|
General essay writing (25) | Control group | 13.35 | 1.58 | -10.992 | <0.05 |
General essay writing (25) | Experimental group | 21.87 | 2.97 | | |
Common application writing (25) | Control group | 17.83 | 1.51 | -7.209 | <0.05 |
Common application writing (25) | Experimental group | 23.75 | 0.94 | | |
Descriptive writing (25) | Control group | 20.44 | 0.22 | -3.759 | <0.05 |
Descriptive writing (25) | Experimental group | 24.94 | 3.24 | | |
Discursive writing (25) | Control group | 13.48 | 2.25 | -10.268 | <0.05 |
Discursive writing (25) | Experimental group | 20.82 | 0.99 | | |
Specifically, in general short essay writing, the experimental group (M=21.87) outperformed the control group (M=13.35), t=-10.992. In common application writing, the experimental group (M=23.75) outperformed the control group (M=17.83), t=-7.209. In descriptive writing, the experimental group (M=24.94) outperformed the control group (M=20.44), t=-3.759. In discursive writing, the experimental group (M=20.82) outperformed the control group (M=13.48), t=-10.268. Overall, the difference between the experimental group and the control group was significant (p<0.05) in every sub-section of writing learning.
This paper constructs an English-Chinese neural machine translation model based on corpus technology, and then evaluates and analyzes the learning effect of students under the dual-drive model of English teaching. The main conclusions are as follows:
The Encoder-Decoder method performs best across the sentence lengths tested. On the newstest2023 test set, the Encoder-Decoder model has the lowest average perplexity, 7.02, and the highest average BLEU value, 29.93, so the Encoder-Decoder model with the adversarial training method proposed in this paper performs well on the test data. The attention distribution of Chinese-English sentence pairs shows that the proposed Encoder-Decoder machine translation model captures closer word-to-word connections between source and target sentences than the traditional method, which supports the effectiveness of the proposed method.
Dual-drive teaching also had a clear effect on students’ learning. In the dimensions of university English translation theory and translation skills, students in the experimental group scored 4.44 and 6.56 points higher, respectively, than the control group (p<0.05), indicating that dual-drive teaching promotes college students’ mastery of translation theory and skills in English translation. In the five sub-sections of English-Chinese translation, namely “single-sentence translation, compound-sentence translation, use and conversion of subordinate clauses, use of structural adjustments, and conversion and application of stylistic types”, the mean scores of students in the experimental class were significantly higher than those of students in the control class, by 3.91, 6.6, 5.18, 7.74, and 5.44 points, respectively (p<0.05).
In the four sub-sections of college English writing teaching, “general short essay writing, common application essay writing, descriptive writing and discursive writing”, the experimental group taught with dual-drive teaching scored 8.52, 5.92, 4.5, and 7.34 points higher, respectively, than the control group taught with traditional didactic teaching (p<0.05). A college English classroom based on dual-drive teaching can therefore be more effective, interesting, and efficient.