A Study on the Influence Mechanism of English Corpus on Translation Quality in Multilingual Website Translation
Published online: 21 Mar 2025
Received: 31 Oct 2024
Accepted: 06 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0612
© 2025 Ying Pu et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Translation quality assessment is the process of judging translation products against defined standards. It enables translation activities to comply with established norms and to meet specific needs [1-2]. In translation research, and especially in professional translation, evaluation has never fully shed the baggage of subjectivity: a translation acceptable to one evaluator in one context may be judged poor and unacceptable in another context or by another evaluator [3-4]. Within such a subjective evaluation system, both the client as evaluator and the translator as service provider are left confused. To achieve good translation results, translators must find a workable path between language comparison and quality assessment. Researching and establishing a set of objective criteria, on the basis of which rigorous and considered analysis can be carried out, is the inevitable choice for breaking the vicious circle of subjectivity in translation quality assessment [5-8].
The Translational English Corpus (TEC) covers translations from almost all languages in common international use, and its holdings are drawn mainly from publicly available published translations into English [9]. Notably, the translators of the TEC texts are by default native speakers of English, and the vast majority of the texts were translated after 1983, making them typical of contemporary English translations [10-11]. The text types are biography, fiction, newspapers, and magazines, of which fiction accounts for roughly 80%, magazines for 15%, and biography and newspapers for the remaining 5%. To date, TEC stores roughly 10 million word tokens, and since the number of translations worldwide keeps increasing, new translated texts can be added once copyright is cleared and the texts are scanned, edited, and annotated; the TEC vocabulary has therefore grown continuously [12-14]. To support in-depth study of the characteristics of translated text, TEC carries out two forms of annotation: textual and metadata. TEC does not annotate in detail inside the translated text itself, a precondition for preserving the completeness of the text. Parts that need no translation, such as introductions and summaries, are specially handled in the database index: they are intentionally ignored and hidden, and therefore cannot be found in the index [15-17]. To serve research needs, TEC metadata records the extralinguistic features of each translated text in detail, including basic information such as the translator's name, gender, nationality, and occupation, as well as the source language of the translation, the publisher, the word count, and specifics of the original author. All of these annotations are kept as independent supplementary information in XML markup, following the widely used TEI scheme. These annotations support comparison of content/function word ratios, type/token ratios, sentence lengths, and word collocation patterns, as well as word frequencies in translations and differences between translations from different source languages, from which the characteristics of translated text can be derived by inductive analysis [18-21]. Currently, the TEC client browser supports syntactic queries, word-frequency queries, sorting of concordance-line collocations, and saving of retrieval results [22].
The English translation corpora described above have been widely used to assess learners' translation quality in translation teaching. In the translation industry, the use of corpora for quality assessment is also growing. Guiding such professional assessment effectively and obtaining better results requires researchers to grasp several key factors in the quality assessment model [23-24].
An English corpus is an important tool for English translation, and researchers have analyzed and synthesized the state of corpus use in translation through literature reviews and other methods, offering suggestions for improvement. De Sutter, G. et al. analyzed the state of corpus-based translation research and argued that, under a revised research agenda for corpus-based translation studies, a multi-dimensional, multi-method approach should be taken to explore the factors affecting corpus-based translation, such as socio-cultural context, technology, and cognition, which they discussed in detail with practical cases [25]. Wu, K. et al. synthesized the trajectory of corpus-based translation research, noting that the field is trending toward diversified methods and interdisciplinary thinking, and that translation quality checking and computer-assisted translation are its future directions [26]. In translation, corpora mainly serve to assist translation and to detect and improve translation quality. Giampieri, P. showed empirically that corpus tools can effectively help students make sound translation decisions and develop translation skills such as vocabulary collocation and fixed expressions, while cautioning that, in the course of corpus use, students should not be distracted by Internet data [27]. De Clercq, O. et al. explored the quality of English-French machine translation with corpus-based methods, showing that machine-translated texts contain linguistic features that deviate significantly from the norms observed in original French, and argued that recording these features and using them for machine translation optimization would further improve translation quality [28]. Carl, M. et al. analyzed the effect of cross-linguistic syntactic and semantic distance on translation production time using a corpus tool for multilingual alternative translations, noting that non-literal translation is very difficult both for translation from scratch and for post-editing [29]. Imankulova, A. et al. proposed a quality estimation strategy built around sentence-level round-trip translation, combined with a filtered pseudo-parallel corpus for data-augmented training, which effectively improved BLEU scores; their experiments also corroborated the positive effect of iterative bootstrapping on translation quality [30].
In this study, the influence mechanism of an English corpus on translation quality in multilingual website translation is explored from two angles: the construction of a neural machine translation (NMT) model and an experiment on English corpus-assisted translation. By clustering the training corpus with the K-Means algorithm in a clustering layer and building a memory module on top of it, a neural machine translation model incorporating translation memory is proposed, and its BLEU-score improvements are analyzed in six translation directions: English-German, English-Vietnamese, English-Russian, German-English, Vietnamese-English, and Russian-English. A controlled experiment with an English parallel corpus as the control variable is then designed to verify the corpus's effect on translation quality. On this basis, a translation quality enhancement mechanism combining the NMT model and the English corpus is proposed.
In order to make full use of corpus resources and realize high-quality translation of multilingual websites, this paper proposes a neural machine translation model that incorporates translation memories.
The role of machine translation is to convert a source language into a target language. Language modeling is foundational to natural language processing and plays an integral role in machine translation. Neural language models, such as feed-forward networks and the Transformer, likewise treat the conditional probabilities of the language model as the main path. A language model is a mathematical model that describes the regularities of natural language in a form suited to automatic processing by computers, and its final output is a probability. Specifically, a language model determines the probability that a given sentence occurs in the language: common sentences should receive high probability, while ill-formed sentences should receive probability tending to zero, and a fluent translation is obtained by comparing whether one symbol sequence is more likely to occur than another.
In general, a language model decomposes the probability of a word sequence $w_1, w_2, \ldots, w_T$ into a product of conditional probabilities by the chain rule: $$P(w_1, w_2, \ldots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \ldots, w_{t-1})$$
The N-gram model was proposed to approximate these conditional probabilities: with the help of the Markov assumption, each word's occurrence depends only on a finite number of preceding words, and the model is trained by maximum likelihood estimation [31]. In the unigram model, the conditional probability of the current word ignores context entirely. In the bigram model, the conditional probability of a word considers only the single word preceding it. As N increases, more context is captured, but the complexity of the language model grows, so in practice N is generally set to no more than 3.
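To make this concrete, the following minimal Python sketch estimates a bigram (N = 2) model by maximum likelihood on a tiny illustrative corpus; the corpus and the start/end markers are assumptions for illustration only, not data from this study.

```python
# Minimal bigram language model estimated by maximum likelihood.
from collections import Counter

corpus = [
    ["<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "the", "dog", "sat", "</s>"],
    ["<s>", "the", "cat", "ran", "</s>"],
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(
    (sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1)
)

def p_bigram(prev, word):
    """MLE estimate: P(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def p_sentence(sent):
    """Chain-rule probability under the Markov (bigram) assumption."""
    p = 1.0
    for i in range(1, len(sent)):
        p *= p_bigram(sent[i - 1], sent[i])
    return p

print(p_sentence(["<s>", "the", "cat", "sat", "</s>"]))  # common sentence: ~0.33
print(p_sentence(["<s>", "sat", "the", "cat", "</s>"]))  # implausible order: 0.0
```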
To realize machine translation, word embedding techniques are generally used to convert natural language text into a form that machines can recognize and process. Word embedding represents natural language text as discrete or distributed vectors. The discrete representation, also known as one-hot encoding, represents each word in the vocabulary as a vector whose dimension equals the vocabulary size, with a 1 at the word's own index and 0 elsewhere.
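The following minimal sketch contrasts the two representations; the vocabulary and the embedding dimension are illustrative assumptions, with the embedding matrix initialized randomly in place of trained parameters.

```python
# One-hot (discrete) versus distributed word representations.
import numpy as np

vocab = ["the", "cat", "dog", "sat", "ran"]
word2id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """One-hot vector: dimension = |V|, a single 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[word2id[word]] = 1.0
    return v

# Distributed representation: a |V| x d embedding matrix, normally learned;
# here initialized randomly for illustration (d = 4).
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), 4))

def embed(word):
    """Low-dimensional dense vector: the word's row of the embedding matrix."""
    return embedding[word2id[word]]

print(one_hot("cat"))  # [0. 1. 0. 0. 0.]
print(embed("cat"))    # dense 4-dimensional vector
```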
Feed-forward neural network language models have far fewer parameters than N-gram models: each word is represented as a low-dimensional vector, modeling is done in a continuous space, and all N-grams need not be stored explicitly.
A “sequence” is a common data structure. In computer vision, an image can be viewed as a sequence of pixels; in natural language processing, a sentence can be regarded as a sequence of characters, words, or phrases. Building on the sequence-to-sequence (Seq2Seq) training paradigm, the encoder-decoder structure has become the mainstream framework for neural machine translation models. Its core idea is to convert the source text sequence into a semantic encoding with an encoder and then decode it with a decoder. In the encoder-decoder framework, the length of the input sequence may differ from that of the output sequence, which matches the needs of translation applications. The structure of the encoder-decoder framework is shown in Figure 1.

Training process of the encoder-decoder framework
As can be seen in Figure 1, the encoder-decoder framework consists of two parts. During training, the input parallel sentence pairs are embedded into lists of word vectors, the model parameters are randomly initialized, the mapping between the source language sequence and the target language sequence is learned, and the parameters are updated until the loss is minimized. Assuming the source sequence is $x = (x_1, \ldots, x_n)$ and the target sequence is $y = (y_1, \ldots, y_m)$, training maximizes the conditional probability $P(y \mid x)$.
The inference process of the encoder-decoder framework is shown in Figure 2. The inputs to the decoder are the context vector, the hidden state of the previous moment, and the predicted output of the previous moment; that is, the gold target text fed in during training is replaced by the model's own prediction at each step, which becomes the input for the next prediction. The training method of feeding only the correctly labeled words as inputs for the next moment is known as teacher forcing, which effectively alleviates problems such as the weak prediction ability of early recurrent neural networks during training. In both training and inference, a special start flag “<sos>” serves as the initial state of the decoder and an end flag “<eos>” terminates generation.

Inference process of the encoder-decoder framework
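As an illustration of the two regimes just described, the following PyTorch sketch pairs a teacher-forced training pass with a greedy inference loop in a minimal GRU encoder-decoder; the vocabulary sizes, hidden dimension, special-token indices, and toy batch are all assumptions for illustration, not the configuration used in this paper.

```python
# Minimal GRU encoder-decoder with teacher forcing (training) and
# greedy decoding (inference). Hyperparameters are illustrative.
import torch
import torch.nn as nn

SRC_V, TGT_V, DIM, SOS, EOS = 100, 100, 32, 1, 2

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_V, DIM)
        self.tgt_emb = nn.Embedding(TGT_V, DIM)
        self.encoder = nn.GRU(DIM, DIM, batch_first=True)
        self.decoder = nn.GRU(DIM, DIM, batch_first=True)
        self.out = nn.Linear(DIM, TGT_V)

    def forward(self, src, tgt):
        """Teacher forcing: the gold previous word is fed at every step."""
        _, h = self.encoder(self.src_emb(src))      # final encoder state
        dec_out, _ = self.decoder(self.tgt_emb(tgt[:, :-1]), h)
        return self.out(dec_out)                    # predicts tgt[:, 1:]

    @torch.no_grad()
    def translate(self, src, max_len=20):
        """Inference: each step consumes the model's own previous prediction."""
        _, h = self.encoder(self.src_emb(src))
        word = torch.full((src.size(0), 1), SOS, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            dec_out, h = self.decoder(self.tgt_emb(word), h)
            word = self.out(dec_out).argmax(-1)     # greedy choice
            outputs.append(word)
            if (word == EOS).all():
                break
        return torch.cat(outputs, dim=1)

model = Seq2Seq()
src = torch.randint(3, SRC_V, (2, 5))               # toy batch: 2 source sentences
tgt = torch.randint(3, TGT_V, (2, 6))               # gold targets incl. markers
loss = nn.functional.cross_entropy(
    model(src, tgt).reshape(-1, TGT_V), tgt[:, 1:].reshape(-1)
)
print(loss.item(), model.translate(src).shape)
```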
The encoder-decoder model has been widely used across natural language and speech processing, including machine translation, text generation, and speech recognition. Encoders and decoders are commonly built from RNNs, CNNs, and similar architectures.
BLEU is currently the most widely used automatic evaluation metric for machine translation quality.
The accuracy of n-gram matching $p_n$ is the proportion of n-grams in the candidate translation $C$ that also appear in the reference, with each n-gram's matching count clipped by its count in the reference: $$p_n = \frac{\sum_{\text{n-gram} \in C} \mathrm{Count}_{\mathrm{clip}}(\text{n-gram})}{\sum_{\text{n-gram} \in C} \mathrm{Count}(\text{n-gram})}$$
The above evaluation method favors short sentences; to make machine translation generate sentences of appropriate length, BLEU introduces a brevity penalty factor $\mathrm{BP}$, where $c$ is the length of the candidate translation and $r$ the length of the reference: $$\mathrm{BP} = \begin{cases} 1, & c > r \\ e^{\,1 - r/c}, & c \le r \end{cases}$$ The final score combines the penalty with the geometric mean of the n-gram precisions: $$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\Big(\sum_{n=1}^{N} w_n \log p_n\Big)$$
Many studies have confirmed BLEU's effectiveness in differentiating translation quality, and its results are relatively stable. The correlation between BLEU and manual evaluation is high at the system level, but may be poor at the sentence level.
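The following minimal Python sketch computes sentence-level BLEU with clipped n-gram precision and the brevity penalty, following the standard formulation above; the smoothing of zero counts is an assumption added for the toy example, not part of the original metric.

```python
# Sentence-level BLEU: clipped n-gram precision plus brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        clipped = sum(min(c, ref[g]) for g, c in cand.items())  # clipped matches
        total = max(sum(cand.values()), 1)
        precisions.append(max(clipped, 1e-9) / total)           # smooth zeros
    # Brevity penalty: BP = 1 if c > r, else exp(1 - r/c).
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    # Uniform weights w_n = 1/N give the geometric mean of the precisions.
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(round(bleu(cand, ref), 4))
```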
To enable the model to exploit the corpus more fully, this paper introduces external knowledge through translation memories and uses it to guide the training of the translation model, so as to improve the quality of the model's output. The architecture of the neural machine translation model incorporating translation memory is illustrated in Figure 3. The encoding layer uses a bidirectional encoder to encode the input source utterance $x$ into a sequence of hidden states.

Structure of neural machine translation model integrating translation memory
For the current input source-target utterance pair $\langle x, y \rangle$, the bidirectional encoder produces a forward hidden state $\overrightarrow{h}_i$ and a backward hidden state $\overleftarrow{h}_i$ for each source word. The two are spliced to obtain the expression of each word, as shown in Equation (7): $$h_i = [\overrightarrow{h}_i; \overleftarrow{h}_i] \tag{7}$$ Meanwhile, the spliced forward and backward expressions at the final moments are used as the final expression of the sentence, as shown in Equation (8): $$s_x = [\overrightarrow{h}_n; \overleftarrow{h}_1] \tag{8}$$
Similarly, the same operations are performed on the target utterance $y$, yielding an expression for each target word and a final sentence expression $s_y$.
The clustering layer divides the source-target utterance pairs in the current corpus into $K$ semantic clusters using K-Means, and the cluster centres of the source-side and target-side sentence expressions constitute the memory module. For the current input source utterance expression $s_x$, its correlation with each source semantic cluster centre is computed as an attention weight, and the weighted sum over all clusters yields the memory module embedding expression $m_x$ for the current source semantics. Similarly, for the current input target utterance expression $s_y$, the correlation with each target semantic cluster centre is computed, and the weighted sum over all clusters serves as the memory module embedding expression $m_y$ for the current input target semantics. In particular, the memory module embedding expressions $m_x$ and $m_y$ constitute the external translation-memory knowledge that is passed to the decoding layer.
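A minimal numpy sketch of this retrieval step is given below, assuming dot-product similarity and random stand-ins for the cluster centres and the utterance expression; the model's actual similarity function and dimensions may differ.

```python
# Memory retrieval: the sentence expression queries the cluster centres,
# and the softmax-weighted sum of centres is the memory module embedding.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
K, d = 8, 16
centroids = rng.normal(size=(K, d))  # K semantic cluster centres from K-Means
s_x = rng.normal(size=d)             # expression of the current source utterance

scores = centroids @ s_x             # correlation of the utterance with each cluster
weights = softmax(scores)            # normalized attention weights
m_x = weights @ centroids            # memory module embedding (weighted sum)
print(weights.round(3), m_x.shape)
```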
The self-encoding layer consists mainly of the target utterance self-encoder, which is used to update all target utterance expressions in the corpus and thereby refresh the target semantic cluster expressions in the memory module (Eqs. (15) and (16)).
The loss of the memory self-encoding layer is given in Equation (17).
The decoding layer uses a unidirectional decoder, whose initial hidden state is derived from the encoding of the source utterance.
The decoding process uses an attention mechanism to capture contextual attention information from the source utterance. Specifically, suppose the decoder obtains the hidden state $d_t$ at moment $t$; attention weights over the source word expressions $h_i$ are computed with $d_t$ as the query, and their weighted sum gives the source context vector for the current moment.
For the decoding process that generates the translated utterance, a different target cluster attention is computed at each moment, since semantic shifts may occur from moment to moment and the target cluster semantics of interest are not always the same. Specifically, taking the decoder hidden state $d_t$ at moment $t$ as the query, attention weights over the target semantic cluster centres are recomputed, and their weighted sum gives the target cluster information for the current moment.
Note that the target cluster attention is computed once in the clustering layer to obtain the relevant target semantic cluster information, and again at every moment of the decoding process, forming a two-level attention mechanism. In the clustering layer, the sentence expression is used as the query vector, producing attention information at the level of overall semantics, which guides the whole translation process at the sentence level; in decoding, the current hidden state is used as the query vector, producing attention information conditioned on the word currently being decoded, which guides the current translation moment at the word level. The overall attention information ensures semantic accuracy and fluency of the translated utterance as a whole, while the attention information at individual decoding moments allows better alignment of the translation at the word level.
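The following numpy sketch illustrates the two-level mechanism under the assumption of dot-product attention, with random vectors standing in for the sentence expression, the decoder hidden states, and the target cluster centres.

```python
# Two-level attention over target semantic clusters: one sentence-level
# query, then a word-level re-query at every decoding moment.
import numpy as np

def attend(query, keys):
    """Dot-product attention: softmax-weighted sum of the key vectors."""
    scores = keys @ query
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ keys

rng = np.random.default_rng(1)
K, d, steps = 8, 16, 5
tgt_clusters = rng.normal(size=(K, d))  # target semantic cluster centres (mock)
s_y = rng.normal(size=d)                # sentence expression (mock)

# Level 1: computed once in the clustering layer, guiding the sentence level.
sentence_ctx = attend(s_y, tgt_clusters)

# Level 2: recomputed at each decoding moment t from the hidden state d_t,
# so the attended cluster semantics can shift as decoding proceeds.
for t in range(steps):
    d_t = rng.normal(size=d)            # decoder hidden state at moment t (mock)
    word_ctx = attend(d_t, tgt_clusters)

print(sentence_ctx.shape, word_ctx.shape)
```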
After fusing the three parts of information (the source context vector, the memory module embedding, and the target cluster information), the decoder generates the probability distribution of the current target word.
The loss function of the decoding layer is the negative log-likelihood of the target words, as shown in Equation (26): $$\mathcal{L}_{dec} = -\sum_{t=1}^{m} \log P(y_t \mid y_{<t}, x) \tag{26}$$
The total model loss consists of two parts, the decoding layer loss and the self-encoding layer loss, as shown in Equation (27): $$\mathcal{L} = \mathcal{L}_{dec} + \mathcal{L}_{ae} \tag{27}$$
In the training phase, within the current epoch, the encoding results of all source-target utterance pairs are first obtained and K-Means clustering is used to obtain the memory module, after which training is performed. At the end of the current epoch, all source-target utterance pair expressions are re-acquired, the memory module is updated, and the next round of training is performed.
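A minimal sketch of this per-epoch loop is shown below using scikit-learn's KMeans, with a mock encoder standing in for re-encoding all source-target pairs with the current model; the corpus size, dimensions, and cluster count are assumptions.

```python
# Per-epoch memory-module refresh: re-encode the corpus, re-cluster,
# then train with the refreshed cluster centres as the memory module.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
N, d, K = 1000, 16, 8

def encode_corpus():
    """Stand-in for re-encoding all utterance pairs with the current model."""
    return rng.normal(size=(N, d))

for epoch in range(3):
    reprs = encode_corpus()                       # re-encode at epoch start
    km = KMeans(n_clusters=K, n_init=10).fit(reprs)
    memory = km.cluster_centers_                  # memory module: cluster centres
    # ... one epoch of NMT training using `memory` for retrieval ...
    print(f"epoch {epoch}: memory module shape {memory.shape}")
```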
For testing, only the source utterances are used as input, and the similarity of the source utterance to the source semantic clusters replaces the similarity of the target utterance to the target semantic clusters.
To verify the superiority of the constructed neural machine translation model incorporating translation memory, it is compared with other models on several neural machine translation (NMT) tasks, and the experimental results of the different NMT systems are summarized in Table 1. Table 1 reports BLEU scores in six translation directions, where “Ours” denotes the NMT system built with the translation-memory fusion method proposed in this paper.
BLEU scores of different NMT systems on NMT tasks

| Compare | System | English-German | German-English | English-Vietnamese | Vietnamese-English | English-Russian | Russian-English |
|---|---|---|---|---|---|---|---|
| 1 | RNN | 21.58 | 26.91 | 30.03 | 29.33 | 15.52 | 17.83 |
| 1 | Ours | 24.49 | 28.32 | 30.85 | 30.54 | 15.99 | 19.48 |
| 2 | Transformer | 24.81 | 31.14 | 29.27 | 27.61 | 14.38 | 19.44 |
| 2 | Ours | 24.88 | 32.89 | 30.04 | 29.15 | 15.21 | 20.14 |
| 3 | mRASP | 30.82 | 37.61 | 35.41 | 38.19 | 19.04 | 25.21 |
| 3 | Ours | 31.85 | 38.02 | 36.47 | 38.78 | 20.15 | 25.44 |
| 4 | mBART | 29.81 | 37.74 | 35.13 | 37.64 | 19.18 | 25.53 |
| 4 | Ours | 30.92 | 38.07 | 35.44 | 38.53 | 19.82 | 25.99 |
In all six translation directions, compared with the baseline models, the proposed method of fusing translation memories effectively improves BLEU scores. The effect is most pronounced for the RNN model: relative to the RNN baseline, this paper's model gains 2.91 BLEU in the English-German direction and 1.65 BLEU in the Russian-English direction. For the Transformer model, the NMT system with this paper's method gains 1.75 BLEU and 1.54 BLEU in the German-English and Vietnamese-English directions, respectively. For the multilingual pretrained models, this paper's method raises the mRASP model's BLEU score in the English-German direction from 30.82 to 31.85, its largest gain, and likewise raises the mBART model's score in the same direction from 29.81 to 30.92. In summary, the proposed modeling method incorporating translation memory achieves better performance across multiple translation directions of multilingual neural machine translation.
To better understand the proposed NMT model construction method fusing translation memories, this paper further compares the BLEU scores of the mRASP baseline model and the improved model on the six translation tasks. The BLEU score curves on the validation set for the different translation directions are shown in Figure 4, where (a)~(f) correspond to English-German, English-Vietnamese, English-Russian, German-English, Vietnamese-English, and Russian-English, respectively.

BLEU score curves on the validation set for different translation directions
As can be seen from Figure 4, the NMT model with fused translation memory converges faster than the baseline model, indicating better behavior during training. One explanation for its stronger momentum in the later stage of training (i.e., after all training samples have been included) may be that, when retrieving relevant information through the memory module, it fuses both global and local attention information, guiding the translation process from both the sentence-level and word-level perspectives and thus producing higher-quality translated utterances.
Meanwhile, when the source language is English, the mRASP model with the fused translation-memory approach outperforms the baseline from the outset. Several factors can explain this observation. First, the mRASP model was pre-trained on a large-scale multilingual corpus and has learned information shared across languages; this prior knowledge, which bears on the comprehension and generation of English sentences, gives the model an initial advantage. Second, English sentences tend to have relatively regular syntax and structure, allowing the model to quickly capture and learn the language's patterns of expression. Finally, the mRASP model is exposed to a large number of English sentences in the early stages of training, which helps it learn English language features and expressions more quickly.
To investigate the mechanism by which an English corpus influences translation quality in multilingual website translation, this paper selected 80 third-year English majors from undergraduate colleges as experimental subjects and designed an experiment on English parallel corpus-assisted manual translation. The experimental group used a parallel corpus, while the control group used conventional reference resources such as dictionaries. To facilitate comparison between the two groups and control variables as far as possible, the experiment was a 90-minute time-limited translation, and apart from the difference in reference tools, the conditions of the two groups were identical in all other respects. Limited by the available English parallel corpus resources, only the German-English direction was chosen for this experiment.
Based on a preliminary analysis of the translations from the experimental data, three sets of statistics were collected to examine the two groups' translation efficiency from three perspectives: first, the number of participants in each group who completed all translation tasks, shown in Table 2; second, the total number of words in the translations completed by each group, shown in Table 3; and third, statistics on the two groups' completion of the translation of 25 terms in the original text, shown in Table 4.
Comparison of the number of people who completed all translation tasks
| Group | Total number of people | Number of task completers | Percentage/% |
|---|---|---|---|
| Experimental group | 40 | 3 | 7.5 |
| Control group | 40 | 21 | 52.5 |
Comparison of completed translations
| Project | Experimental group | Control group |
|---|---|---|
| Standard translation quantity | 7047 words | 7047 words |
| Total translations completed | 4358 words | 5706 words |
| Percentage of translations completed /% | 61.84 | 80.97 |
Terminology translation statistics
As can be seen from Table 2, in terms of submitting complete translations, the experimental group lagged behind the control group: only 3 of its members completed all the translation tasks within the stipulated 90 minutes. Meanwhile, Table 3 shows a significant difference between the experimental group and the control group in the total number of words translated (P<0.05).
Taken together, Tables 2~4 show that the experimental group not only had no advantage over the control group but lagged significantly behind it in translation efficiency. In theory, an English parallel corpus should greatly facilitate translation practice and improve translator efficiency. The reason the experimental group's actual efficiency fell short may be that the excessive search results returned by the English corpus increased the burden of selection and screening on the translators, especially for students with limited English proficiency and little translation experience. In post-experiment interviews, students in the experimental group emphasized that reading and browsing the large number of concordance lines in the parallel corpus was a very time-consuming part of the translation process: among the 25 sampled terms, the highest frequency of occurrence in the parallel corpus was 582, and the average frequency reached 61. By contrast, tools such as dictionaries return a single, unambiguous result, so the burden of selection is far smaller than with a parallel corpus. The NMT model incorporating translation memory constructed in this paper can address this problem, with machine translation compensating for the slower efficiency of human translation.
Given the current experimental conditions and for reasons of operability, this paper uses two parameters to evaluate translation quality: the accuracy of terminology translation and the completeness of the sentences in the translated text. In pragmatic translation, terminology translation is an important criterion for judging the quality of a translation. The two groups' data on terminology translation are shown in Table 5.
Comparison of term translation accuracy

| Project | Experimental group | Control group |
|---|---|---|
| Number of terms translated | 363 | 502 |
| Number of terms translated correctly | 307 | 325 |
| Accuracy of term translations /% | 84.57 | 64.74 |
The accuracy of term translation in the experimental group (84.57%) is higher than in the control group (64.74%), and a chi-square test further confirmed that the experimental group was significantly better than the control group in this respect (P<0.05).
Meanwhile, to compare the two groups' data, the total number of sentences in each group's completed translations and the number of well-formed sentences successfully constructed using techniques such as adding words were counted separately, as shown in Table 6. The experimental group has a significant advantage over the control group in constructing well-formed sentences by adding subjects (P<0.05).
Statistics on translated sentences constructed by adding words

| Project | Experimental group | Control group |
|---|---|---|
| Total number of translated sentences | 204 | 241 |
| Number of sentences adjusted by adding words | 124 | 113 |
In summary, although the experimental group was inferior to the control group in translation efficiency, it had clear advantages in the accuracy of terminology translation and in the construction of well-formed translated sentences. Judging from the experimental data, an English parallel corpus can improve translation quality to a certain extent, and by combining the NMT model that incorporates translation memory with the English corpus, high-quality and high-efficiency translation of multilingual websites can be achieved.
This study examines the relationship between the English corpus and translation quality in multilingual website translation using both a neural machine translation (NMT) model and an English corpus. The main findings are as follows:
The neural machine translation model construction method incorporating translation memory proposed in this paper effectively improves BLEU scores. Relative to the RNN model, it achieves a 2.91 BLEU improvement in the English-German direction and a 1.65 BLEU improvement in the Russian-English direction. For the Transformer model, it improves BLEU by 1.75 and 1.54 in the German-English and Vietnamese-English directions, respectively. It also raises the BLEU scores of the mRASP and mBART models in the English-German direction by 1.03 and 1.11, respectively. During training, the proposed model converges faster than the baseline models. The proposed method of fusing translation memories thus effectively improves the translation performance of NMT models across multiple translation directions.
In terms of translation efficiency, the experimental group using the English parallel corpus completed significantly fewer translations, both in number and in total words, than the control group (P<0.05). In terms of translation quality, the accuracy of term translation in the experimental group (84.57%) was higher than in the control group (64.74%), and the experimental group had a significant advantage in constructing well-formed sentences by adding subjects (P<0.05). An English parallel corpus can therefore effectively improve translation quality, and its deficiency in translation efficiency can be compensated for by machine translation.
