Computer Translation-Based Language Modeling Enables Multi-Scenario Applications of the English Language
Published online: 17 Mar 2025
Received: 26 Oct 2024
Accepted: 07 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0349
© 2025 Lei Li, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
In the past decade, as many modern cutting-edge technologies have developed, the solution of frontier problems in a number of disciplines has come to depend increasingly on advances in linguistics. Since language is the carrier of cultural and social information, the further development of information science depends to a large extent on the development of language science. For example, artificial intelligence, which must simulate the mechanisms of the human brain and its thinking processes, cannot avoid simulating the internal language processes of human beings to some extent, because human abstract thinking is realized through the expression of language [1-4]. In human-computer dialogue, the most important element is the modeling of natural language: without the study of language models, a high degree of formalization of natural language cannot be achieved. The “language model” can therefore be regarded as the “bridge” between linguistics and information science. From the perspective of natural language processing, a language model is a mathematical model that describes the inner laws of natural language. Constructing language models is one of the core research methods of computational linguistics and a core theory of corpus linguistics; such models can be divided into rule-based models and statistical models [5-8].
Translating one language into another with the help of computers, referred to as machine translation (MT), is becoming an increasingly important technology, since national defense, economic development, political stability, and social welfare all depend on the sharing of information. Never in the history of mankind has the need to break through language barriers been felt more urgently than today: new common markets and growing world trade have created a strong demand for language support, and the exchange among the nine official languages of the European Community alone means that a great deal of manpower is engaged every day in translating across seventy-two different language directions [9-12]. Because of this, it is hoped that with the help of computers, the most powerful tool of information processing, people can be freed from such heavy and tedious work as translation, or at least from a large part of its repetitive and monotonous portions, to engage in more creative labor. It is therefore meaningful to examine the multi-scenario application of computer translation-based language modeling in empowering the English language: through the practical use of language in communication, one can observe whether learners can use the learned language well in a specific language environment to carry out practical interactions and achieve the purpose of communication [13-15].
The study addresses the problem of catastrophic forgetting when incorporating the BERT pre-trained language model into a neural machine translation model, and introduces a masking matrix strategy to mitigate it. Then, through the internal fusion and dynamic weighting of multiple attention mechanisms, the model can make full use of the output information of the optimized BERT, realizing an English translation model based on improved Masking-BERT enhancement. The performance of the English translation model is analyzed in terms of English utterance compression effect, training loss, and BLEU value, and its practical application effect is explored through the model's translation accuracy, response time, and expert satisfaction score. After that, Transformer is used to model grammatical error correction, taking into account both local contextual information and long-distance dependencies in the text. The English grammar error correction model is trained and tested, and other grammar error correction methods are selected for comparison with this paper's model, analyzing precision, recall, and $F_{0.5}$ values to verify its effectiveness.
Today, when the development of many cutting-edge technologies increasingly relies on language development, it is worth exploring in depth how research results on language can be used precisely and effectively for multi-scenario applications of the English language. Literature [16] summarizes research related to ChatGPT, mainly analyzing the state-of-the-art large language models in the GPT family and their applications in different domains, and points out that the key innovations of large-scale pre-training, instruction fine-tuning, and reinforcement learning from human feedback are of great importance for the adaptability and performance of LLMs. Literature [17] points out that pre-trained language models have led to a paradigm shift in Natural Language Processing (NLP) from supervised learning to pre-training followed by fine-tuning, and examines future research directions in language modeling by analyzing the classification methods, characterization methods, and frameworks of pre-trained models. Literature [18] proposes an extractive-abstractive neural network document summarization method based on transformer language models and experimentally verifies its effectiveness and feasibility; the method generates more abstract summaries and also achieves higher ROUGE scores. Literature [19] presents a prompt-based large language model for machine translation, using GLM-130B as a testbed, and verifies that it has excellent performance and can enhance translation results. Literature [20] developed an English translation model based on intelligent recognition technology and a deep learning framework, and designed experiments to verify its effectiveness; the model has high accuracy and efficiency in speech recognition and translation, providing a feasible and efficient solution for practical applications. Literature [21] proposed a machine translation language model combining a feed-forward neural network decoder with an attention mechanism, and experimentally verified that the proposed model performs better and can be applied to different English scenarios; the research provides new ideas for the future multi-scenario application of machine translation language models.
With the rapid development of artificial intelligence technology, using computers to translate between different languages has become a major trend. To improve the efficiency and quality of English teaching, many educators apply AI translation models in the English classroom. In this context, an English translation model based on improved Masking-BERT enhancement is proposed.
To address the catastrophic forgetting problem in BERT training, the study proposes a masking-matrix-based BERT training strategy, as follows:
In BERT, a binary mask matrix $M^{(l)}$ is sampled for the parameter matrix $W^{(l)}$ of each layer $l$, with each element drawn independently from a Bernoulli distribution:

$$M_{ij}^{(l)} \sim \text{Bernoulli}(p) \tag{1}$$

where $p$ is the retention probability that controls what fraction of the pre-trained parameters may be updated during fine-tuning. Multiply this binary mask element-wise with the gradient of the corresponding parameter matrix, so that only the unmasked parameters are updated and the remaining pre-trained knowledge is preserved:

$$W^{(l)} \leftarrow W^{(l)} - \eta \left( M^{(l)} \odot \nabla_{W^{(l)}} \mathcal{L} \right) \tag{2}$$

where $\eta$ is the learning rate, $\mathcal{L}$ is the training loss, and $\odot$ denotes element-wise multiplication.
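The following PyTorch-style sketch illustrates the idea, assuming the mask is applied at the granularity of individual weights and fixed before fine-tuning; the retention probability `p` and the hook-based implementation are illustrative assumptions rather than the paper's exact configuration:

```python
import torch

def build_masks(model, p=0.9):
    """Sample a fixed binary mask M ~ Bernoulli(p) for every parameter (Eq. 1).

    Entries equal to 1 allow updates; entries equal to 0 freeze the
    corresponding pre-trained weight, limiting how far fine-tuning can
    drift from the original BERT and easing catastrophic forgetting.
    """
    return {
        name: torch.bernoulli(torch.full_like(param, p))
        for name, param in model.named_parameters()
    }

def register_masked_updates(model, masks):
    """Zero out gradients where the mask is 0, so the optimizer only
    updates the unmasked subset of parameters (Eq. 2)."""
    for name, param in model.named_parameters():
        param.register_hook(lambda grad, m=masks[name]: grad * m)
```

Because the masking happens inside the backward pass, any standard optimizer can then be used unchanged.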
To enable the neural machine translation model to better utilize the output information of the BERT trained with the masking matrix strategy, multiple attention mechanisms are fused inside the BERT model. Based on the output of the final hidden layer of Masking-BERT, the encoding and decoding attention results of the current layer of the model are computed as in Eqs. (3)~(5), respectively:
$$a_i^l = \text{attn}_S\left(h_i^{l-1}, H^{l-1}, H^{l-1}\right) \tag{3}$$

$$b_i^l = \text{attn}_B\left(h_i^{l-1}, H_B, H_B\right) \tag{4}$$

$$\tilde{h}_i^l = \alpha_l \, a_i^l + \left(1 - \alpha_l\right) b_i^l \tag{5}$$

Let the two connected layers in layer $l$ of the encoder be the attention sublayer and the position-wise feed-forward sublayer. Let the hidden layer representation of the $l$-th encoder layer be $h_i^l = \mathrm{FFN}\left(\tilde{h}_i^l\right)$, where $\text{attn}_S$ denotes the self-attention of the current layer, $\text{attn}_B$ denotes the attention over the final Masking-BERT hidden states $H_B$, and $\alpha_l \in [0,1]$ is a dynamic fusion weight learned during training. Let the hidden layer state of the $l$-th decoder layer be computed analogously, where the decoder additionally fuses the encoder-decoder attention, and the dynamic weights of the attention results are normalized so that they sum to one.
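A simplified sketch of the encoder-side fusion is given below, assuming one learnable scalar gate per layer and that the Masking-BERT hidden states have already been projected to the model dimension; the actual gating granularity may differ:

```python
import torch
import torch.nn as nn

class BertFusedEncoderLayer(nn.Module):
    """Encoder layer fusing self-attention with attention over Masking-BERT output."""

    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.bert_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Unconstrained scalar mapped through a sigmoid to a weight in (0, 1).
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, h, h_bert):
        a, _ = self.self_attn(h, h, h)             # Eq. (3): self-attention
        b, _ = self.bert_attn(h, h_bert, h_bert)   # Eq. (4): attention over H_B
        alpha = torch.sigmoid(self.gate)
        fused = alpha * a + (1 - alpha) * b        # Eq. (5): dynamic weighting
        return fused + self.ffn(fused)             # position-wise FFN with residual
```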
To validate the performance of the English translation model based on improved Masking-BERT enhancement, the study conducts experimental validation on the CoNLL 2020 dataset, comparing online translation and conventional machine translation with the method of this paper.
The compression rate and compression stability are used as comparison indexes. Figure 1 shows the comparison results for the three methods, where (a) gives the compression rate and (b) gives the compression stability. As can be seen from Figure 1(a), the average compression rates of online translation, machine translation, and the English translation model of this paper are 77.9%, 81.4%, and 88.4%, respectively. As can be seen from Figure 1(b), the average compression stability values of the three methods are 75.2%, 82.3%, and 86.8%, respectively. This indicates that the English translation model of this paper also performs well in English utterance compression.

The comparison results of the compression rate and compression stability of the three methods
To further validate the performance of the proposed approach, Figures 2 and 3 show the training loss and BLEU evaluation results of the three models, respectively. The average loss of the English translation method in this paper is lower than that of online translation and machine translation: 0.80 versus 0.85 and 0.82, respectively. The BLEU score of the English translation method in this paper is 0.86, while those of online translation and machine translation are 0.78 and 0.82, respectively, indicating that the method in this paper has a clear advantage over the translation methods commonly used at present.

The results of the training loss of the three models

The results of the BLEU value of the three models
To further validate the performance of the designed English translation model based on improved Masking-BERT enhancement, the study evaluates the outputs of the three methods; the evaluation results are shown in Figure 4. The compression ratio (0.66), grammaticality (5.67), and information content (4.32) of the English translation method in this paper are significantly better than those of the other two methods, while the difference in the heat ratio (0.48) is not significant. This shows that the design of this paper provides a higher-performance method for English translation.

The evaluation results of the three methods for translating English into Chinese
To test the application effect of the constructed translation models on real problems, the actual operational stability and translation satisfaction of the different translation models within the whole translation system are analyzed.
Figure 5 shows the translation accuracy and system response time of the different models on the English-to-Chinese and Chinese-to-English tasks. In English-to-Chinese translation, the response times of online translation, machine translation, and the English translation model of this paper are 8.16 s, 5.92 s, and 1.21 s, with translation accuracies of 0.74, 0.81, and 0.87, respectively. In the Chinese-to-English task, the response times are 10.39 s, 7.96 s, and 2.65 s, with translation accuracies of 0.72, 0.76, and 0.84, respectively.

Translation accuracy and system response time of different models
Figure 6 shows the satisfaction of 20 experts with the three translation models in an actual translation task, where (a), (b), and (c) give the satisfaction results for online translation, machine translation, and the English translation model of this paper, respectively. The experts' translation satisfaction and situational fitness scores for the English translation model of this paper almost all fall in the range of 3 to 5 points, with most experts concentrated in the first quadrant, showing that this model achieves the highest satisfaction. In contrast, the expert satisfaction scores of the other two models are either concentrated in the middle or widely dispersed, indicating poorer satisfaction.

Expert satisfaction test results
Automatic grammatical error correction is a typical task in natural language processing research, whose goal is to build an automated system that corrects possible grammatical errors in texts. Broadly speaking, research on grammatical error correction has undergone a methodological evolution from rule-based, to statistics-based, to machine translation-based approaches. Here, computer translation is applied to English grammatical error correction by constructing a neural machine translation error correction model based on Transformer, an encoder-decoder model with a self-attention mechanism.
Grammatical error correction is defined as the following process: input a sentence containing a grammatical error, and output a corrected sentence that contains no grammatical error and preserves the semantics of the original input sentence (for example, "He go to school yesterday." → "He went to school yesterday.").
In general, model parameters $\theta$ are estimated from a parallel corpus of erroneous-corrected sentence pairs $D=\left\{\left(x^{(i)}, y^{(i)}\right)\right\}_{i=1}^{N}$ by maximizing the likelihood of the corrected sentences given the erroneous inputs:

$$\theta^{*}=\underset{\theta}{\arg \max } \sum_{i=1}^{N} \log P\left(y^{(i)} \mid x^{(i)} ; \theta\right)$$
A recurrent neural network (RNN) is a classical model for modeling variable-length sequences, and its standard formalization is as follows:

$$h_t = \tanh\left(W x_t + U h_{t-1} + b\right)$$

where $x_t$ is the input at time step $t$ and $h_t$ is the hidden state. When modeling long sequences, the repeated multiplications in this recurrence make gradients prone to vanishing or exploding, so long-distance dependencies are difficult to capture. To solve the above problems, Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU) were introduced. The formal definition of LSTM is as follows:

$$\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}$$

where $i_t$, $f_t$ and $o_t$ are the input, forget and output gates, $c_t$ is the cell state, $\sigma$ is the sigmoid function, and $\odot$ denotes element-wise multiplication.
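A direct NumPy transcription of these gate equations for a single time step may make the recurrence concrete (the stacked weight layout is an illustrative choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W (4d, input_dim), U (4d, d), b (4d,) stack the
    input, forget, and output gates and the candidate cell state."""
    d = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:d])            # input gate i_t
    f = sigmoid(z[d:2 * d])       # forget gate f_t
    o = sigmoid(z[2 * d:3 * d])   # output gate o_t
    g = np.tanh(z[3 * d:])        # candidate cell state
    c = f * c_prev + i * g        # new cell state c_t
    h = o * np.tanh(c)            # new hidden state h_t
    return h, c
```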
In sequence-to-sequence learning, this is typically modeled with a neural network encoder-decoder model. The encoder first encodes the input sequence into a series of hidden state representations (vectors) in a continuous space; based on these hidden states output by the encoder, as well as the prefix of the symbol sequence already output at the current time step, the decoder predicts the next output symbol. Formally:

$$H = \mathrm{Enc}\left(x_1, \ldots, x_n\right), \qquad P(y \mid x) = \prod_{t=1}^{m} P\left(y_t \mid y_{<t}, H\right)$$
The attention mechanism is an important component of today's mainstream neural network encoder-decoder models. Specifically, at time step $t$, an attention score $e_{t,i}$ is computed between the current decoder state $s_t$ and each encoder hidden state $h_i$; the scores are normalized with a softmax, and the context vector is their weighted sum:

$$e_{t,i} = \mathrm{Atten}\left(s_t, h_i\right), \qquad \alpha_{t,i} = \frac{\exp\left(e_{t,i}\right)}{\sum_{j} \exp\left(e_{t,j}\right)}, \qquad c_t = \sum_{i} \alpha_{t,i} h_i$$

In the above equation, Atten is the function that calculates the attention score, commonly defined in additive (MLP) form as:

$$\mathrm{Atten}\left(s_t, h_i\right) = v^{\top} \tanh\left(W_s s_t + W_h h_i\right)$$

where $v$, $W_s$ and $W_h$ are learnable parameters.
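A NumPy sketch of one step of this additive attention (the parameter shapes are illustrative):

```python
import numpy as np

def additive_attention(s_t, H, W_s, W_h, v):
    """Context vector c_t for one decoder step.

    s_t: decoder state (d,); H: encoder states (n, d);
    W_s, W_h: (d, d) projections; v: (d,) scoring vector.
    """
    scores = np.tanh(s_t @ W_s.T + H @ W_h.T) @ v   # e_{t,i}
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax -> alpha_{t,i}
    return weights @ H                              # c_t = sum_i alpha_{t,i} h_i
```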
The Transformer model consists of an encoder and a decoder. Given a source-side error sentence $x = \left(x_1, \ldots, x_n\right)$, the encoder maps it to a sequence of hidden representations, from which the decoder generates the corrected target sentence $y = \left(y_1, \ldots, y_m\right)$ symbol by symbol.
The encoder and decoder in Transformer each contain six identical layers. Each encoder layer consists of a self-attention sublayer and a feed-forward network sublayer: the input passes through the self-attention sublayer, after which the same feed-forward network is applied independently to the output at each position of the self-attention sublayer.
Given query, key, and value vectors packed into matrices $Q$, $K$ and $V$, scaled dot-product attention is computed as:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

where $d_k$ is the dimension of the key vectors; the scaling factor $\sqrt{d_k}$ keeps the dot products from growing too large and saturating the softmax.
In order to allow the model to simultaneously attend to information from different representation subspaces when encoding different positions in the sequence, multi-head attention performs scaled dot-product attention in parallel over multiple heads:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}\left(Q W_i^{Q}, K W_i^{K}, V W_i^{V}\right)$$
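The two formulas translate almost line-for-line into NumPy; the sketch below omits masking and dropout and passes the per-head projections explicitly:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed row-wise."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def multi_head_attention(Q, K, V, heads, W_o):
    """heads: list of (W_q, W_k, W_v) projection triples, one per head.

    Each head attends in its own representation subspace; the concatenated
    head outputs are mixed by the output projection W_o.
    """
    outs = [scaled_dot_product_attention(Q @ W_q, K @ W_k, V @ W_v)
            for W_q, W_k, W_v in heads]
    return np.concatenate(outs, axis=-1) @ W_o
```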
Multi-head attention is applied in the self-attention sublayer, where the query, key, and value vectors are all derived from the output of the previous sublayer (or, in the first layer, directly from the embedding of the input).
This sublayer is similar to the attention layer in a typical recurrent neural network encoder-decoder model, with the query vectors coming from the output of the previous decoder sublayer, and the key and value vectors coming from the output of the encoder.
Since the Transformer model does not contain any recurrent structure, position encoding is incorporated into the input embedding in order to utilize the positional information of the symbols in the sequence. The dimension of the position encoding is the same as the hidden dimension $d_{model}$, so that the two can be summed; sine and cosine functions of different frequencies are used:

$$PE_{(pos,\, 2i)} = \sin\left(\frac{pos}{10000^{2i / d_{model}}}\right), \qquad PE_{(pos,\, 2i+1)} = \cos\left(\frac{pos}{10000^{2i / d_{model}}}\right)$$
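The encoding can be precomputed once for all positions; a NumPy sketch (assuming an even $d_{model}$):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal position encoding, same dimension as the input embedding
    so the two can be summed element-wise."""
    pos = np.arange(max_len)[:, None]              # positions (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # even dimension indices
    angle = pos / np.power(10000.0, i / d_model)   # (max_len, d_model / 2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                    # even indices: sine
    pe[:, 1::2] = np.cos(angle)                    # odd indices: cosine
    return pe
```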
As with typical sequence-to-sequence models, Transformer uses embedding layers at the bottom of the encoder and decoder to convert the symbols in the sequence into vectors. When decoding to generate the symbols at the target end, the output of the decoder is converted into a probability distribution over the symbol table by a linear transformation and a softmax function. Unlike in Transformer's original paper, when modeling grammatical error correction in this paper, the parameters of the embedding layers of the encoder and decoder, as well as of the linear transformation layer in front of the softmax layer, are not shared.
When training Transformer, maximum likelihood estimation is used, with the goal of maximizing the likelihood of the model on the training data $D=\left\{\left(x^{(i)}, y^{(i)}\right)\right\}_{i=1}^{N}$:

$$\theta^{*}=\underset{\theta}{\arg \max } \sum_{i=1}^{N} \log P\left(y^{(i)} \mid x^{(i)} ; \theta\right)$$
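In implementation terms this objective is the token-level cross-entropy under teacher forcing; a PyTorch-style sketch (the batch layout and padding id are assumptions):

```python
import torch.nn.functional as F

def mle_loss(logits, targets, pad_id=0):
    """Negative log-likelihood of the reference tokens under the model.

    logits: (batch, seq_len, vocab) decoder outputs under teacher forcing;
    targets: (batch, seq_len) reference token ids; padding is ignored.
    """
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),   # flatten to (batch * seq_len, vocab)
        targets.reshape(-1),
        ignore_index=pad_id,
    )
```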
Given an input error sentence $x$, decoding searches for the corrected output sentence with the highest model probability, $\hat{y}=\underset{y}{\arg \max }\, P(y \mid x ; \theta)$.
In addition to manually annotated training data, the experiments in this chapter incorporate forged error sentences to train the model. Two different types of training data are used, and their composition is described below:
1) The manually annotated corpora are sparse in the major grammatical error types. For example, the NUCLE dataset has 57,151 sentences, but only 3,716 sentences contain at least one grammatical error; less than 0.4% of the words in the whole dataset need to be corrected, with article errors accounting for 14.8%, noun singular-plural errors for 8.4%, and preposition errors for only 5.4%. This sparsity of grammatical errors greatly affects the effectiveness of model training, so in this chapter only sentences containing grammatical errors are extracted as training data.

2) Since corpus data with manual error annotations are scarce, in order to improve the performance of the error correction model, training data with forged errors are also introduced in addition to the training data described above. The WiKi-Text103 corpus is selected as the seed corpus for this experiment, and preposition errors, article errors, and noun singular-plural errors are created at random: when a preposition or article is recognized in a sentence, it is randomly replaced or deleted, and when a noun is detected it is randomly replaced with the other singular-plural form, thereby generating more training data for the corresponding error types (a sketch of this forging procedure is given below).
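The following sketch illustrates the forging procedure for preposition and article errors; the word inventories, the corruption probability, and the omission of the noun-number swap (which would require a POS tagger) are illustrative assumptions:

```python
import random

PREPOSITIONS = ["in", "on", "at", "for", "with", "to", "of", "by"]
ARTICLES = ["a", "an", "the"]

def forge_errors(tokens, p=0.3):
    """Randomly corrupt a clean sentence to create an (error, correct) pair."""
    corrupted = []
    for tok in tokens:
        pool = (PREPOSITIONS if tok.lower() in PREPOSITIONS
                else ARTICLES if tok.lower() in ARTICLES
                else None)
        if pool is not None and random.random() < p:
            if random.random() < 0.5:
                continue                           # delete the function word
            corrupted.append(random.choice(pool))  # or replace it
        else:
            corrupted.append(tok)
    return corrupted

# One corrupted variant of a clean seed sentence from the corpus
print(forge_errors("He sat on the chair in the room".split()))
```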
In this chapter, W&I-dev is chosen as the validation set, the test data provided by CoNLL-2014 are used as the test set, and P (precision), R (recall), and $F_{0.5}$ are used as the evaluation metrics, where $F_{0.5}=\frac{\left(1+0.5^{2}\right) P R}{0.5^{2} P+R}$ weights precision more heavily than recall.
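For reference, a small helper that computes these metrics from edit counts (extracting true/false positives from M2 files is assumed to be handled separately):

```python
def precision_recall_f(tp, fp, fn, beta=0.5):
    """P, R and F_beta from edit counts; beta = 0.5 favors precision,
    the standard setting for grammatical error correction evaluation."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    f = (1 + b2) * p * r / (b2 * p + r) if p + r else 0.0
    return p, r, f
```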
In this paper, the labeled data and the forged error data are merged to jointly train the English error correction model based on the self-attention mechanism. The training results of the model are shown in Figure 7; the final model achieves 93% accuracy on the test set, with the loss value stabilizing at 0.08.

Model training results
Experiments were conducted on the CoNLL-2014 test set with this paper's English error correction method based on the self-attention mechanism and with other grammar error correction methods (denoted Method 1~Method 4). The experimental comparison results are shown in Figure 8. On the CoNLL-2014 test set, this paper's model improves on all the other grammar error correction methods, with higher precision, recall, and $F_{0.5}$ values.

The experimental comparison of English grammar error correction methods
The ERRANT grammatical error annotation toolkit was used to explore the performance of this paper's model on different grammatical error types. Before the analysis, the ERRANT toolkit was first used to convert the M2 files of the CoNLL-2014 test set into M2 files adapted to the input format of the ERRANT toolkit. The prefixes "M", "R", and "U" represent missing, replacement, and unnecessary errors, respectively.
This paper focuses on the correction of common error types; the grammatical error correction results of the model are shown in Figure 9, where (a) and (b) give the test results for grammatical error correction and grammatical error detection, respectively. The model corrects noun, preposition, spelling, and verb form errors well, with a grammatical error correction accuracy above 76%; the accuracy for correcting spelling errors is the highest, at 79.1%. In terms of error type detection, the model identifies errors in the singular and plural forms of nouns and in verbs with high accuracy, at 78.4% and 79.6%, respectively. For the identification and detection of punctuation errors, the combined performance of the model's correction and detection reached 0.726 and 0.748 on the $F_{0.5}$ metric, respectively.

The grammatical error correction results of the model
For better English language learning and use, this paper proposes an English translation model based on improved Masking-BERT enhancement from the perspective of computer translation. The English grammar error correction task is treated as a sequence-to-sequence generation task, i.e., the error correction process is regarded as translating an incorrect sentence into a correct one, and the self-attention mechanism is used to construct the English grammar error correction model. Through analysis of the English translation and grammar error correction models, their multi-scenario application to the English language is explored.
Compared with the other models, the English translation model in this paper has better English utterance compression performance, with a compression rate of 88.4% and compression stability of 86.8%, and it also outperforms the comparison models in training loss and BLEU score. Meanwhile, the model's translation accuracy in English-to-Chinese and Chinese-to-English translation reaches 72%~87%, its response time is less than 3 s, and its satisfaction scores are concentrated in the range of 3~5 points, demonstrating its superiority in practical application.
The English grammar error correction model in this paper scores more than 78% in precision, recall, and $F_{0.5}$ on the CoNLL-2014 test set, outperforming the other grammar error correction methods, and it performs well on common error types such as noun, preposition, spelling, and verb form errors.
Under the trend of globalization, the increase in English usage scenarios such as work, study, and immigration has resulted in the emergence of a large number of English learning groups. This study constructs an English translation model and a grammar error correction model based on computer translation, which can assist the public in learning and daily application of the English language and promote the efficiency of English learning.
This research was supported by the Ministry of Education’s Supply and Demand Integration Project of Employment and Education: “Research on Enhancing the Innovation and Entrepreneurship Ability of Traditional Chinese Medicine International Communication Talents” (No. 2024041148735).
