A Study on the Effectiveness of Designing and Applying Computer-Assisted Grammar Error Correction System in College English Writing Teaching


Introduction

With the increasing maturity of computer intelligence technology, its application in the field of education has attracted widespread attention. In English writing teaching in particular, the emergence of computer-assisted writing tools provides students with new learning support and writing ideas [1-2]. Grammatical errors are a common problem in college English writing teaching [3]. Embedding a computer-assisted grammar error correction system into college English writing teaching creates opportunities for students to learn from and correct errors on their own with the help of intelligent teaching resources [4-6]. However, how effective computer intelligence technology is in assisting English writing, and how it affects students' writing ability, still need further exploration.

Research on college students' errors in English writing is extensive, and a large number of studies have shown that college students make grammatical errors in writing, most often in the choice of prepositions, articles, verbs and other vocabulary [7-9]. Traditional grammar teaching often relies on the explanation of rules and lacks rich contextual support, which makes it difficult for students to apply grammar knowledge in actual writing [10-12]. Computer intelligence technology, however, can provide a large number of examples that help students turn grammar knowledge into practical writing skills, master complex grammar rules, better understand grammatical structures, and avoid grammatical errors [13-16].

At the same time, research on how to correct writing errors is also developing. Corrective feedback in English writing instruction aims to use explanatory methods so that learners produce correct English grammatical forms in their subsequent learning, so error correction is beneficial for improving English writing skills [17-19]. However, most studies have explored how to give students appropriate feedback from the teacher's point of view, while others have examined how to help unskilled L2 writers reduce the time spent on error correction [20-22]. The reason is that the traditional classroom in China is teacher-centered and lecture-based, and inductive, independent learning by students is neglected to some extent, which also limits the application of computer intelligence technology-assisted writing pedagogy in domestic education [23-27]. In this context, designing a grammar error correction system applicable to English writing teaching can help students understand the use of language in different contexts and enhance their overall English writing ability.

In this paper, a computer-aided grammar error correction system for college English writing teaching is designed by integrating a pinyin detection algorithm, a feedback filtering algorithm and a data expansion method around a copy-mechanism Transformer model for English grammar error correction as the core. To demonstrate the effectiveness of the proposed English grammar error correction model, its performance is compared with the traditional CAMB grammar error correction model, as well as with the UIUC method and the Corpus GEC method, and it is applied to the task of correcting Chinese students' compositions. In addition, the designed automatic English grammar error correction system is used in college English teaching experiments to show that the system can enhance students' mastery of English grammar and improve their English writing ability.

English Grammar Error Correction Model Based on Machine Translation
Neural machine translation model and reordering strategy
End-to-End Neural Machine Translation Modeling

Machine translation is the task of automatically reading a sequence in one natural language and generating a sequence in another language that expresses the same meaning. Machine translation models can provide a variety of candidate translations. Due to differences between languages, many candidate translations have poor syntax and semantics and do not conform to everyday usage. However, expanding the training data by forming parallel sentence pairs of candidate and target results offers a useful idea for data augmentation methods.

An end-to-end neural machine translation model uses neural networks to achieve the mapping from source-language text to target-language text. The main idea is "encoding-decoding": given a source utterance $x = (x_1, x_2, \dots, x_T)$, an encoder is used to map it into a continuous, dense vector $c$, and a decoder is then used to transform the vector into a sentence $y = (y_1, y_2, \dots, y_T)$ in the target language. $p(y)$ represents the output probability of the target sentence, given by:

$$p(y) = \prod_{t=1}^{T} p(y_t \mid \{y_1, y_2, \dots, y_{t-1}\}, c)$$
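To make this factorization concrete, the following minimal Python sketch accumulates the target-sentence log-probability token by token. It is an illustration only: decoder_step is a hypothetical callable standing in for the trained decoder, not part of the paper's implementation.

```python
import math

def sequence_log_prob(decoder_step, context, target_tokens):
    """Accumulate log p(y) = sum over t of log p(y_t | y_1..y_{t-1}, c).

    decoder_step(prefix, context) is assumed to return a dict mapping
    candidate tokens to their conditional probability at the next step,
    given the decoded prefix and the encoder context vector c.
    """
    log_prob = 0.0
    prefix = []
    for token in target_tokens:
        dist = decoder_step(prefix, context)  # p(. | y_<t, c)
        log_prob += math.log(dist[token])     # add log p(y_t | y_<t, c)
        prefix.append(token)
    return log_prob
```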

Pre-training models

Transformer model

The function of an encoder is to transform a variable-length semantic sequence into a fixed-length context vector that encodes the information of the input sequence. The Transformer encoder is composed of N identical modules, where the output of one module is the input of the next. Each module has a multi-head attention layer and a feedforward network layer [28]. The multi-head attention layer concatenates several attention heads, each of which uses the attention mechanism. The attention mechanism in the Transformer allows the model to focus on the valuable information in the context: when encoding a word, not only the current word but also its surrounding context is considered, and the entire context is incorporated into the current word vector. The encoder thus consists of a number of layers, each of which includes two sub-layers, the self-attention layer and the feedforward neural network layer.

The decoder has the same number of layers as the encoder and consists of a self-attention layer, an encoder-decoder attention layer and a feedforward neural network layer. The encoder-decoder attention layer helps the decoder focus its attention on the relevant parts of the input sentence. Each sub-layer in the encoder and decoder uses residual connections and layer normalization; normalizing the output of each layer toward a standard normal distribution speeds up training and convergence. Multi-head attention is calculated as follows:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_n) W^O$$

$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$$

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d}}\right) V$$

Here, $Q$ denotes the query vector, $K$ denotes the relevance (key) vector, and $V$ denotes the vector of queried information, i.e., the real-valued matrix. $\sqrt{d}$ is the scaling factor, where $d$ is the hidden layer dimension.
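The attention computation above can be sketched in NumPy as follows; this is a minimal single-example illustration in which Wq, Wk, Wv are assumed to be lists of per-head projection matrices and Wo the output projection, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ V

def multi_head(Q, K, V, Wq, Wk, Wv, Wo):
    """MultiHead(Q,K,V) = Concat(head_1, ..., head_n) W^O,
    where head_i = Attention(Q Wq[i], K Wk[i], V Wv[i])."""
    heads = [attention(Q @ Wq[i], K @ Wk[i], V @ Wv[i]) for i in range(len(Wq))]
    return np.concatenate(heads, axis=-1) @ Wo
```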

C-Transformer model

The principle of the Copy-Augmented Transformer is to copy the unchanged words from the original utterance into the target utterance. In text correction, only a few words in each utterance need to be corrected, and the remaining text should be copied directly into the target utterance. A Transformer model based on the copying mechanism can determine whether the next predicted word is produced by copying or generated from the vocabulary space [29]. $p_t^{gen}$ represents the output of the decoder, i.e., the probability that the word at time step $t$ is generated from the vocabulary using the regular attention mechanism. $\alpha_t^{copy}$ is a balancing factor that trades off vocabulary generation against copying from the original text. $p_t^{copy}$ represents the probability that the decoded word at time step $t$ is copied from the original text. The final generation probability distribution of the target word is determined by the generation probability $p_t^{gen}$ and the copying probability $p_t^{copy}$. The target language decoding formulas with the copying mechanism are shown in Eqs. (5) to (8):

$$P_t(w) = (1 - \alpha_t^{copy}) \cdot p_t^{gen}(w) + \alpha_t^{copy} \cdot p_t^{copy}(w)$$

$$\alpha_t^{copy} = \mathrm{sigmoid}\left(W^T (A_t^T V)\right), \quad A_t^T = (q_t^T K)^T$$

$$P_t^{copy}(w) = \mathrm{softmax}(A_t)$$

$$q_t, K, V = h_t^{trg} W_q^T,\; H^{src} W_K^T,\; H^{src} W_v^T$$

where $q_t$ is the mapping of the decoder hidden state at time step $t$, $K$ and $V$ are different transformations of the encoder hidden states, and $W$ is a parameter matrix.

Reordering strategy

A machine translation system consists of multiple components, each of which provides multiple different candidate target sentences. Due to variability among languages, the quality of machine translations varies, and many translations are even ungrammatical. In addition, in text generation the output of the model at one time step affects the result at the next, and the final output is selected by conditional probability given the history, i.e., the model judges candidate corrections only by their probability values. The corrected output of the model may therefore not be the optimal correction. A reordering method based on multiple features is now used to select the candidate with the highest comprehensive score as the output, which optimizes the final correction result.

English Grammar Error Correction Model Based on C-Transformer
Problem definition

In this paper, we will regard the English grammar error correction task as a machine translation task. For the input source utterance $(x_1, \dots, x_N)$ containing grammatical errors and the corresponding corrected target sentence $(y_1, \dots, y_T)$, the sequence generation process of the corrected sentence can be represented by the following equations:

$$h_{1 \dots N}^{src} = \mathrm{encoder}(L^{src} x_{1 \dots N})$$

$$h_t = \mathrm{decoder}(L^{trg} y_{t-1 \dots 1}, h_{1 \dots N}^{src})$$

$$P_t(w) = \mathrm{softmax}(L^{trg} h_t)$$

where $L$ is the word embedding matrix, $h_{1 \dots N}^{src}$ is the hidden state of the source utterance after encoding, $h_t$ is the hidden state used to predict the target word, and $P_t(w)$ represents the probability distribution of the target word.

However, English grammar error correction models based on neural machine translation still face great challenges. A complex neural network has a large number of parameters, and a huge amount of data is often required to train the model adequately. The number of manually annotated sentence pairs for grammatical error correction is currently low, with only slightly more than one million relevant annotated sentence pairs, and even less annotated data of higher quality. This greatly restricts research progress on grammatical error correction algorithms based on neural machine translation. The scarcity of training data has become an important bottleneck in grammatical error correction research and is a problem that urgently needs to be solved.

In this paper, we will expand the training data by creating pseudo-parallel sentence pairs and study the effect of different pseudo-parallel sentence generation methods on model performance.

English Grammatical Error Generation Methods

The three different ways of generating a pseudo-parallel corpus used in this paper are as follows:

Noise Perturbation

The simplest way of generating grammatical errors is considered first, i.e., adding noise directly to a correct sentence. In order to avoid too many out-of-vocabulary words, the sentence is first segmented with BPE, and subwords are then deleted, replaced, masked, or randomly inserted with random probability. This method perturbs correct sentences to generate a large number of incorrect sentences, forming many parallel sentence pairs to augment the training data.

In practice, the subword to be manipulated in a sentence is first selected at random; the word is replaced with "<UNK>" with probability 0.1, i.e., the current word is masked; the original word is kept with probability 0.2, i.e., the sentence is not altered; the current word is replaced by a random word from the whole corpus with probability 0.35; and the current word is deleted with probability 0.35. Briefly, the probabilities of masking, keeping, inserting and deleting words in the sentence are denoted by $p_{mask}$, $p_{keep}$, $p_{insert}$ and $p_{delete}$ respectively, and the noise-addition probabilities selected in this paper are $[p_{mask}, p_{keep}, p_{insert}, p_{delete}] = [0.1, 0.2, 0.35, 0.35]$.
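A minimal sketch of this noising step might look as follows (illustrative Python; subwords is a BPE-segmented sentence and vocab a list of corpus subwords, both assumptions, as the paper gives no code):

```python
import random

PROBS = {"mask": 0.1, "keep": 0.2, "insert": 0.35, "delete": 0.35}

def perturb(subwords, vocab):
    """Corrupt one randomly chosen subword to build a pseudo-parallel pair:
    mask with <UNK>, keep unchanged, replace with a random corpus word,
    or delete, with the probabilities given above."""
    i = random.randrange(len(subwords))
    op = random.choices(list(PROBS), weights=list(PROBS.values()))[0]
    noisy = list(subwords)
    if op == "mask":
        noisy[i] = "<UNK>"
    elif op == "insert":              # replacement drawn from the whole corpus
        noisy[i] = random.choice(vocab)
    elif op == "delete":
        del noisy[i]
    return noisy                      # "keep" leaves the sentence unaltered

# Each (perturb(tokens, vocab), tokens) pair augments the training data.
```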

Noise Perturbation Model Closer to Real Errors

In order to produce more realistic grammatical errors in correct sentences, this paper introduces errors actually made by English learners to perturb correct sentences, adding common error forms such as lexical errors to generate English grammatical errors.

This experiment captures the content modifications made by human editors to build a dictionary of word modifications and applies them to grammatically correct sentences. The edits in the corpus are extracted, errors that occur more than 5 times are retained, and a corresponding perturbation dictionary is generated from them. When perturbing a corpus of high grammatical quality, with probability 0.4 the word currently selected for perturbation is replaced by the corresponding word in the perturbation dictionary; with probability 0.2 the current word is replaced by a word randomly selected from its WordNet synonym set; and with probability 0.3 the current word is morphologically perturbed, e.g., the singular form of a noun is changed to the plural, or the past tense of a verb is changed to the present tense.
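A sketch of this learner-style perturbation under the stated probabilities is given below; edit_dict, synonyms and inflect are hypothetical helpers standing in for the extracted edit dictionary, the WordNet synonym sets, and a morphological transformer.

```python
import random

def realistic_perturb(words, edit_dict, synonyms, inflect):
    """Perturb one word of a clean sentence with learner-style errors:
    p=0.4 edit-dictionary substitution, p=0.2 WordNet synonym,
    p=0.3 morphological change (e.g., singular -> plural)."""
    i = random.randrange(len(words))
    w, r = words[i], random.random()
    out = list(words)
    if r < 0.4 and w in edit_dict:
        out[i] = random.choice(edit_dict[w])  # error observed >5 times in real edits
    elif r < 0.6 and synonyms.get(w):
        out[i] = random.choice(synonyms[w])   # synonym-set substitution
    elif r < 0.9:
        out[i] = inflect(w)                   # morphology change, e.g. tense shift
    return out                                # remaining mass: leave unchanged
```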

Reverse Grammatical Error Generation Modeling

Grammatical error generation can be viewed as a reverse English grammatical error generation model, i.e., the process of grammatical error generation is viewed as translating a correct sentence into an incorrect one. In this paper, existing manually labeled data are used to train a grammatical error generation model, and the corresponding incorrect sentences it produces are used to augment the parallel corpus. For a correct sentence sequence $x = (x_1, x_2, \dots, x_n)$, the corresponding erroneous sentence to be generated $y = (y_1, y_2, \dots, y_m)$, and training parameters $\theta$ obtained by minimizing the loss function, the generation probability and the loss function of the model are:

$$p(x \mid y) = \prod_{t=1}^{m} p(x_t \mid y, x_{1:t-1}; \theta)$$

$$L(\theta) = -\sum_{t=1}^{m} \log p(x_t \mid y, x_{1:t-1}; \theta)$$

With the above method, a grammatical error generation model can be trained; by feeding English text without grammatical errors into the model, English text with grammatical errors is obtained, generating a large number of "wrong sentence, right sentence" parallel pairs. In this experiment, the BERT-fused model is used as the reverse grammatical error generation model [30].

Transformer model based on replication mechanism

In English grammar error correction, it is usually only a small portion of words that contain grammatical errors, and most of the rest do not. In order to prevent the error correction algorithm from perturbing the correct parts of a sentence, this paper constructs a Transformer model based on the copying mechanism (C-Transformer), i.e., words that do not contain grammatical errors are copied directly into the target sentence. The probability distribution of the words in the target sentence is a mixture of the probability distribution $p_t^{gen}$ generated by the error correction model and the probability distribution $p_t^{copy}$ copied from the source utterance, as shown in Equation (14):

$$P_t(w) = (1 - \alpha_t^{copy}) \cdot p_t^{gen}(w) + \alpha_t^{copy} \cdot p_t^{copy}(w)$$

where $\alpha_t^{copy} \in [0, 1]$ is the balance parameter used to control the generation probability and the copying probability at each time step $t$.

The structure of the Transformer model based on the copying mechanism is shown in Fig. 1. In this structure, the probability distribution of the target word is generated by the underlying Transformer model, and a copy score is computed from the hidden states $h_{1 \dots N}^{src}$ (written $H^{src}$) of the source utterance and the hidden state $h_t^{trg}$ of the target word. The copy attention score is computed in the same way as the underlying Transformer attention score, as shown in Eqs. (15) to (17):

$$q_t, K, V = h_t^{trg} W_q^T,\; H^{src} W_k^T,\; H^{src} W_v^T$$

$$A_t = q_t^T K$$

$$P_t^{copy}(w) = \mathrm{softmax}(A_t)$$

where $q_t$, $K$ and $V$ are the query vector, key vector and value vector used to compute the attention distribution and the copy hidden layer, respectively. The normalized attention distribution serves as the copy score, and the copy hidden layer is used to estimate the balance parameter $\alpha_t^{copy}$, as shown in Eq. (18); the loss function of the model is given in Eq. (19):

$$\alpha_t^{copy} = \mathrm{sigmoid}\left(W^T (A_t^T V)\right)$$

$$l_{ce} = -\sum_{t=1}^{T} \log\left(p_t(y_t)\right)$$

Figure 1.

Transformer model structure based on replication mechanism
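Eqs. (14) to (18) can be sketched as a single decoding step in NumPy. This is an illustration under stated assumptions, not the system's actual code: Wq, Wk, Wv and the balance vector w_bal are assumed model parameters, p_gen is the vocabulary distribution from the base Transformer, and src_token_ids maps each source position to its vocabulary id.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def copy_augmented_step(h_trg_t, H_src, Wq, Wk, Wv, w_bal,
                        p_gen, src_token_ids, vocab_size):
    """One copy-mechanism decoding step mixing generation and copying."""
    q = h_trg_t @ Wq.T                    # q_t
    K = H_src @ Wk.T
    V = H_src @ Wv.T
    A = q @ K.T                           # A_t: scores over source positions
    attn = softmax(A)                     # P_t^copy over source positions
    p_copy = np.zeros(vocab_size)         # scatter copy mass onto the vocabulary
    for pos, tok in enumerate(src_token_ids):
        p_copy[tok] += attn[pos]
    alpha = sigmoid(w_bal @ (attn @ V))   # balance factor alpha_t^copy, Eq. (18)
    return (1 - alpha) * p_gen + alpha * p_copy   # P_t(w), Eq. (14)
```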

Model Optimizer

Adam (Adaptive Moment Estimation) is a stochastic optimization algorithm developed from stochastic gradient descent (SGD) that can optimize neural network models very effectively [31]. Instead of updating all network parameters with the same learning rate as in SGD, it uses first-order and second-order moment estimates to generate an adaptive learning rate for each parameter. Adam combines the advantages of the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp): it handles sparse gradients, is computationally efficient, and copes well with non-stationary objectives.

For the objective function $f(\omega)$, the parameters to be optimized $\omega$ and the initial learning rate $\alpha$, Adam's iterative process for parameter $\omega_t$ at step $t$ is:

$$g_t = \nabla f(\omega_t)$$

$$m_t = \phi(g_1, g_2, \dots, g_t)$$

$$V_t = \psi(g_1, g_2, \dots, g_t)$$

$$\eta_t = \alpha \cdot \frac{m_t}{\sqrt{V_t}}$$

$$\omega_{t+1} = \omega_t - \eta_t$$

where $g_t$ denotes the gradient of the objective function with respect to the parameters at the current step, and $m_t$ and $V_t$ refer to the first-order and second-order momentum of the gradient. $\eta_t$ and $\omega_{t+1}$ refer to the update step and the updated parameters at step $t$, respectively. The first-order and second-order momentum are calculated as:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$$

$$V_t = \beta_2 V_{t-1} + (1 - \beta_2) g_t^2$$
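The update rule can be sketched as follows; note that standard Adam additionally applies the bias corrections and the small epsilon shown here, which the equations above omit, so this is a conventional reading rather than a literal transcription.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam parameter update."""
    m = beta1 * m + (1 - beta1) * grad            # first-order momentum m_t
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-order momentum V_t
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)   # eta_t = alpha * m_t / sqrt(V_t)
    return w, m, v
```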

Design of Computer-Assisted English Grammar Based Automatic Error Correction System

In this paper, an automatic English grammar error correction system is designed by combining the C-Transformer-based English grammar error correction model with the pinyin detection algorithm and the feedback filtering algorithm.

System Functional Requirements Analysis

This system mainly corrects grammatical errors in English writing and returns the correction results to the user. In order to improve the accuracy of the system's corrections, users are allowed to provide feedback on the results, which is reviewed by the feedback filtering algorithm and the administrator. Grammatical error samples accumulate continuously to expand the corpus, and the expanded corpus is used to train the model, so that the accuracy of the grammatical error correction model keeps improving. The system therefore defines two user roles: ordinary users and administrator users.

Ordinary users can check and correct grammatical errors after logging into the system: they input the text to be checked, the system splits the text and corrects grammatical errors sentence by sentence, and finally returns the spliced correction results to the user. Users can give feedback on the correction results and query historical correction results.

The administrator has the highest privileges in the system and can manage ordinary users, filter and audit feedback, manage the original corpus stored in the system, add new grammatical error texts, and train the grammatical error correction model.

Prototype system design
Overall system framework

The automatic English grammar error correction system can be divided into layers by function, with each layer refining the system's functions into sub-tasks; this reduces the interdependence of the system's parts, facilitates development and debugging, and improves scalability. The overall framework of the system is shown in Figure 2 and is divided into four layers: the data layer, the application layer, the representation layer and the feedback layer.

Data layer: this layer is used to receive the text to be detected sent by the user and preprocess the data, including removing html tags, removing irregular symbols and so on.

Application layer: this layer uses the automatic English grammar error correction model proposed in this paper, trained on the collected public corpora combined with the automatic grammatical-error-text generation algorithm proposed in this paper, retaining the optimal model. After receiving a request from the layer above, the optimal model computes the grammatically corrected result and combines it with the pinyin detection algorithm to remove erroneous judgments caused by pinyin.

Representation layer: this layer collects the processing results of the application layer, unifies them, and returns them to the user in a consistent format.

Feedback layer: after the representation layer displays the correction results, the user is allowed to give feedback on them. The sentence perplexity of the user's feedback and of the system's correction are calculated and compared; if the user's feedback has lower perplexity, it is submitted to the administrator for review, who manually confirms whether to adopt the suggestion and saves the correction result to the database. Saving such sample data expands the dataset for future model training. If the user's feedback has higher perplexity than the system's correction, the suggestion is not adopted; if the user disputes the system's filtering, the feedback can be resubmitted and is then audited directly by the administrator.

Figure 2.

Overall system framework

System module design

In this paper, the English grammar automatic error correction system is divided into three parts: grammar error correction module, user feedback module and visualization module.

Grammar Error Correction Module

The flow of the grammar error correction module is shown in Figure 3. It consists of five parts: text preprocessing, the pinyin detection algorithm, the grammar error correction algorithm, the result optimization module, and grammar error correction model training. Text preprocessing receives and preprocesses the user's request data; because manual writing may be non-standard, this module handles html tags and non-standard punctuation contained in the text. The user's text is then split at the sentence level. Grammatical error correction is performed sentence by sentence on the split text: first, pinyin detection finds the words in the sentence that conform to pinyin rules, then grammatical error checking and correction is performed on the sentence. To avoid error types caused by pinyin, the correction results are optimized by removing those that conform to pinyin rules from the result set; finally, the detection results are spliced, formatted and returned to the user so that the user can give feedback on them.

Feedback filtering module

The flow chart of the feedback module is shown in Figure 4. The user can give feedback on the system's correction results; a trigram language model is used to calculate the perplexity of the user's suggestion and of the system's correction, respectively. If the user's feedback sentence has lower perplexity, it is more likely to be a well-formed sentence, so the feedback is saved and manually reviewed to decide whether to keep the suggestion. If the system's corrected sentence has lower perplexity, it is more likely to be well-formed, so the user feedback is discarded; the optimal solution is finally saved in the database. The administrator account can manage the correction texts saved in the database, and the model can be retrained after the accumulated texts reach a set threshold. Because errors from manual writing better reflect human writing habits and are more valuable than a corpus expanded by rules, the model is continuously optimized by continuously expanding the dataset and retraining.

Visualization module

The visualization module is the only way for the system to interact with the user, and the quality of the UI plays an important role in the user experience. Users add the text to be corrected in the front end and send a correction request; the correction module completes the grammatical correction and returns the results, and the front end renders them to complete the display. The visualization module is divided into two main parts, grammatical error correction and result feedback; for administrators it also contains the system management interface.

Figure 3.

Syntax correction process

Figure 4.

User feedback process

Feedback Filtering Algorithm

According to Markov’s assumption, the probability of a word’s occurrence is only related to a fixed number of previous words, which is the basic idea of the ngram model. ngram Assuming that the occurrence of a word is only related to the previous n − 1 words, so there is no need to trace back to the previous too many words, greatly reducing the computational difficulty, and these n words constitute a gram. n is usually set to 1, 2, 3 in the ngram-model, usually not more than 3, once more than 3 is not only computationally difficult to increase the effectiveness of the program will also be reduced. In this paper, we set n to 3, i.e. trigram model, and the calculation formula is: p(w1,w2,,wT)=i=1Tp(wi|wi2wi1)

Perplexity is a metric used to evaluate a language model; it reflects how well a probability distribution or probability model predicts a sample. The lower the perplexity, the better the language model and the higher the probability it assigns to the sentences in the test set. Perplexity is calculated as shown in Equation (28):

$$PP(W) = P(w_1 w_2 \dots w_T)^{-\frac{1}{T}} = \sqrt[T]{\frac{1}{P(w_1 w_2 \dots w_T)}}$$
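A minimal sketch of the trigram perplexity computation used by the feedback filter is given below; the paper does not specify a smoothing scheme, so add-one smoothing is assumed here for illustration.

```python
import math
from collections import Counter

def train_trigram(sentences):
    """Count trigrams and their bigram contexts from tokenized sentences."""
    tri, bi = Counter(), Counter()
    for s in sentences:
        padded = ["<s>", "<s>"] + s + ["</s>"]
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            bi[tuple(padded[i - 2:i])] += 1
    return tri, bi

def perplexity(sentence, tri, bi, vocab_size):
    """PP(W) = P(w_1..w_T)^(-1/T) under the trigram model (add-one smoothed)."""
    padded = ["<s>", "<s>"] + sentence + ["</s>"]
    log_p, T = 0.0, len(padded) - 2
    for i in range(2, len(padded)):
        p = (tri[tuple(padded[i - 2:i + 1])] + 1) / (bi[tuple(padded[i - 2:i])] + vocab_size)
        log_p += math.log(p)
    return math.exp(-log_p / T)
```

In the feedback filter, the user's suggestion and the system's correction would each be scored in this way, and the lower-perplexity sentence kept.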

Experimental results and analysis

In order to verify the effectiveness of the proposed English grammar error correction model in the application of the computer-aided grammar error correction system, this paper designs a model application experiment, a composition correction experiment, and a teaching experiment applying the grammar error correction system.

Experiments on the Application of English Grammar Error Correction Modeling
Experimental data set and pre-processing

The experimental data were obtained from an English corpus; the dataset, named the CoNLL2023 English Writing Corpus, contains a total of 25,000 sentences covering 160 different kinds of grammatical errors in English writing, mainly including error types such as gerund tense errors, article usage errors, preposition errors, and acronym errors.

The corpus is preprocessed as follows: first, the English utterances are checked for spelling errors. Then the corpus is expanded, i.e., a mixture of a news corpus, an ESL corpus and a corrected English corpus is used. Finally, manual error generation is performed, i.e., manual errors are injected into the news corpus during model training to narrow the gap between the corpora. After preprocessing in this way, 28,000 English utterances containing grammatical errors are obtained. In order to better validate the performance of the proposed grammatical error correction model for English writing, the dataset is divided into a training set, a test set and a validation set at a ratio of 7:2:1.

Model Training and Evaluation Indicators

In order to improve the experimental results, the Adam optimizer is used for model training in this paper. The mini-batch size is set to 35, the initial learning rate is set to 0.0001, and the dropout probability p in the Dropout layer is set to 0.6.

The experiment chooses the F0.5 value, precision (P) and recall (R) as the evaluation indexes of English grammar error correction. The three evaluation indexes are defined as:

$$P = \frac{\sum_{i=1}^{n} |g_i \cap e_i|}{\sum_{i=1}^{n} |e_i|}$$

$$R = \frac{\sum_{i=1}^{n} |g_i \cap e_i|}{\sum_{i=1}^{n} |g_i|}$$

$$F_{0.5} = \frac{(1 + 0.5^2) \times R \times P}{R + 0.5^2 \times P}$$

In the above equations, n denotes the total number of sentences in the test. $g_i$ and $e_i$ denote the set of gold-standard edits and the set of system edits for sentence i, respectively.
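These metrics can be computed directly from per-sentence edit sets, as in the following sketch (illustrative Python; each edit is assumed to be a hashable tuple such as (span, replacement)):

```python
def grammar_metrics(gold_edits, system_edits):
    """Corpus-level P, R and F0.5 over per-sentence edit sets.

    gold_edits / system_edits: lists of sets, one set of edits per sentence.
    """
    hit = sum(len(g & e) for g, e in zip(gold_edits, system_edits))
    p = hit / max(sum(len(e) for e in system_edits), 1)
    r = hit / max(sum(len(g) for g in gold_edits), 1)
    f05 = (1 + 0.5 ** 2) * p * r / max(0.5 ** 2 * p + r, 1e-12)
    return p, r, f05
```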

Comparative analysis of model performance

In order to verify the performance advantages of the proposed C-Transformer-based English writing grammar error correction model, the CoNLL2023 English writing corpus is applied to this model and to the traditional CAMB English grammar error detection and correction model for comparison; the model performance comparison results are shown in Table 1.

Table 1. Test results of two grammar error correction models

Model Precision /% Recall /% F0.5 /%
CAMB 20.84 22.37 21.13
Ours 37.52 42.75 38.46

From the comparison results in Table 1, it can be seen that the precision, recall and F0.5 metrics of the present model reach 37.52%, 42.75% and 38.46%, respectively, which are 16.68, 20.38 and 17.33 percentage points higher than those of the traditional CAMB grammar error correction model. Comprehensive analysis shows that this model detects and corrects grammatical errors in English writing with higher precision, a better detection effect, and superior overall performance.

Verification of Grammatical Error Correction Effect in English Translation

In order to further validate the error correction effect of the proposed model, the experiment uses the test set in Table 2, which covers several different types of English writing grammar errors. The number of each error type and its percentage in the dataset are shown in Table 2. The test set contains 5600 English sentences in total, of which 2241 contain grammatical errors; the dataset can also be divided into Short and Long parts according to sentence length, and the validation of grammatical error correction is carried out on this basis.

Table 2. Test data set

Error type Number Proportion /%
Article 652 11.65
Preposition 516 9.21
Nouns 461 8.23
Subject-verb agreement 347 6.20
Verb form 265 4.73
Error total 2241 40.02
Total amount 5600 100

Based on the article-error and preposition-error corpus in the test set of Table 2, the experiment compares the present model with the traditional UIUC method and the Corpus GEC method for grammatical error correction; the comparison results of the three methods are shown in Table 3.

Table 3. Article and preposition error correction results

Error type Method comparison P R F0.5
Article error UIUC 0.5246 0.3922 0.4914
Corpus GEC 0.5873 0.4084 0.5400
Ours 0.6148 0.5982 0.6114
Preposition error UIUC 0.4072 0.3348 0.3903
Corpus GEC 0.2685 0.2752 0.2698
Ours 0.4214 0.4563 0.4280

From the comparison results in Table 3, it can be seen that in article error correction, the precision, recall and F0.5 of the present method reach 61.48%, 59.82% and 61.14%, respectively, all higher than the other two methods; its F0.5 value is 12.00 and 7.14 percentage points higher than the UIUC method and the Corpus GEC method, respectively. In preposition error correction, the precision, recall and F0.5 of the present method also exceed the corresponding values of the UIUC and Corpus GEC methods, reaching 42.14%, 45.63% and 42.80%, respectively. This shows that the method is effective in correcting article and preposition errors in English utterances, and that adding a copying mechanism to the decoding process of the Transformer model improves the efficiency and precision of correcting articles and prepositions.

In order to further verify the error correction effect of the proposed model, experiments are carried out on four typical error types, articles, prepositions, nouns and verbs, with long and short sentences compared as Long and Short respectively. The comparison of the correction results for the four grammatical error types is shown in Fig. 5.

Figure 5.

Error correction results of the method for four different syntax errors

As can be seen from the results in Fig. 5, the F0.5 correction value of this method for prepositions, nouns and verbs in long sentences is significantly higher than in short sentences, mainly because grammatical errors occur more frequently in long sentences, where dependencies exist between words that are farther apart and there is less error-interfering information. Comprehensive analysis shows that the F0.5 values of this method for the four grammatical error types all remain above 55%, indicating that its correction results satisfy the error correction needs of the English grammar error correction system.

Experimental Analysis of English Composition Critique for Chinese Students

In order to further validate the performance of this paper's model on actual grammar correction of Chinese students' English compositions, 600 CET-4 and CET-6 compositions by non-English-major college students were tested from the SET3 and SET4 pools of the CLEC corpus, and the grammatical errors were further subdivided into ArtOrDet (article or determiner), Nn (noun number), Npos (noun possessive), Pform (pronoun form), Pref (pronoun reference), Prep (preposition), Rloc- (word redundancy), Ssub (subordinate clause), SVA (subject-verb agreement), Trans (linking word or phrase), V0 (verb missing), Vform (verb form), Vm (modal verb), Vt (verb tense), Wci (fixed collocation), Wform (word form), WOadv (adjective or adverb order), WOinc (word order), and Others.

In this experiment, the number of labeled errors, the number of errors corrected by the model, and the number of correctly corrected errors are counted, and precision, recall and F1 are used as performance evaluation metrics. The results of grammatical error correction of Chinese students' English compositions by this paper's model are shown in Table 4. It can be seen that the C-Transformer model performs well on actual grammar correction of Chinese students' English compositions, with average precision, recall and F1 of 84.70%, 71.85% and 77.75%, respectively.

Table 4. Error correction results of Chinese students' compositions by this paper's model

Error type Labeled errors Errors corrected Correctly corrected P R F1
ArtOrDet 42 37 34 0.9189 0.8095 0.8608
Nn 69 64 59 0.9220 0.8551 0.8872
Npos 7 6 5 0.8333 0.7143 0.7692
Pform 18 13 10 0.7692 0.5556 0.6452
Pref 31 25 20 0.8000 0.6452 0.7143
Prep 32 28 24 0.8571 0.7500 0.8000
Rloc- 59 52 44 0.8462 0.7458 0.7928
Ssub 145 125 104 0.8320 0.7172 0.7704
SVA 63 58 47 0.8103 0.7460 0.7769
Trans 6 5 3 0.6000 0.5000 0.5455
V0 61 53 49 0.9245 0.8033 0.8597
Vform 46 30 26 0.8667 0.5652 0.6842
Vm 28 21 18 0.8571 0.6429 0.7347
Vt 35 32 27 0.8438 0.7714 0.8060
Wci 7 5 3 0.6000 0.4286 0.5000
Wform 6 4 3 0.7500 0.5000 0.6000
WOadv 14 8 6 0.7500 0.4286 0.5455
WOinc 20 15 9 0.6000 0.4500 0.5143
Others 89 79 68 0.8608 0.7640 0.8095
Total 778 660 559 0.8470 0.7185 0.7775

For a more intuitive view, the data in the above table are presented as statistical charts; arranged from low to high by precision and by recall, they are shown in Figures 6 and 7, respectively.

Figure 6.

Arranged by precision from lowest to highest

Figure 7.

Arranged by recall from lowest to highest

The precision of correcting the various types of grammatical errors in Figure 6 is divided into bands, and the precision statistics are shown in Table 5. It can be seen that the model corrects three error types, namely article or determiner, noun number, and verb missing, with a precision above 90%, and for all grammatical errors the precision is at least 60%. On the one hand, this is due to the design of the model: the copy-mechanism Transformer structure designed in this paper takes into account both grammatically incorrect and correct words, and through the copying mechanism the model can exclude the interference of grammatically correct words and phrases when correcting errors, increasing correction precision. On the other hand, the data expansion method designed in this paper follows the probability distribution of common grammatical errors made by Chinese students, so the expanded data give the trained C-Transformer model strong generalization when correcting common grammatical errors in Chinese students' English compositions.

Table 5. Segmented statistics of error correction precision for various grammatical errors

Precision of error correction Syntax error class
60~70% Trans, Wci, WOinc
70~80% Wform, WOadv, Pform, Pref
80~90% SVA, Ssub, Npos, Vt, Rloc-, Prep, Vm, Others, Vform
>90% ArtOrDet, Nn, V0

According to Figure 7, the segmented statistics of the recall of correcting various types of grammatical errors are shown in Table 6. It can be seen that the model's recall for verb missing, article or determiner, and noun number errors exceeds 80%, and it also has good recall on other common grammatical errors. This is mainly due to the data expansion method of creating pseudo-parallel sentence pairs, which reflects the distribution of the grammatical error categories commonly made by Chinese students, allowing the model to better recognize these errors and thus achieve good recall performance.

Table 6. Segmented statistics of error correction recall for various grammatical errors

Recall of error correction Syntax error class
<60% Wci, WOadv, WOinc, Trans, Wform, Pform, Vform
60~70% Vm, Pref
70~80% Npos, Ssub, Rloc-, SVA, Prep, Others, Vt
>80% V0, ArtOrDet, Nn

In addition, this paper further selects 1200 English compositions, which are manually corrected and scored by the English-major teachers on the team. With the total grammar score of a composition set at 15 points, the model's correction score is compared with the teachers' score: the teachers' average grammar correction score is 13.25 and the model's average is 12.86, a gap within 0.4 points. This shows that the automatic English grammar error correction model designed in this paper is practical and can, to a certain extent, replace the teacher's grammatical correction of English compositions.

Results and analysis of teaching experiments
Selection of Experimental Subjects and Design of Experimental Procedures

In order to investigate the practical effect of the designed computer-aided grammar error correction system in college English writing teaching, this paper designs a teaching experiment. The subjects were 95 freshman undergraduates not majoring in English at College A. The experiment ran from November 1, 2023 to May 31, 2024, six months in total excluding one month of winter vacation, and was divided into three stages: the first, second and third rounds of the action experiment.

The first round of the action experiment ran from November 1, 2023 to December 31, 2023; the second round from January 1, 2024 to March 31, 2024; and the third round from April 1, 2024 to May 31, 2024. Composition training was conducted once every two weeks in all three stages, with a total of four compositions per training, and the automatic English grammar error correction system designed in this paper was used to correct grammatical errors.

Pre-test results and analysis

By marking the pre-test paper, this paper obtained the statistics of the pre-test score rate shown in Table 7, according to the eight lexical aspects examined in grammar: preposition errors, pronoun errors, article errors, noun errors, adjective errors, adverb errors, verb errors (including predicate and non-predicate verbs), and conjunction errors (including coordinate and subordinate conjunctions). Among the 20 questions tested, the highest score rate was 88.02% on question 5, which examined adverbs; the lowest was question 20, examining conjunctions, at only 5.47%.

Table 7. Pre-test score rate statistics

Item number Part of speech examined Score rate Item number Part of speech examined Score rate
1 Nouns 83.52% 11 Pronoun 35.94%
2 Article 83.52% 12 Non-predicate verb 70.64%
3 Article 75.14% 13 Pronoun 79.35%
4 Nouns 27.15% 14 Predicate verb 22.86%
5 Adverb 88.02% 15 Predicate verb 70.64%
6 Adjective 66.34% 16 Coordinate conjunction 79.35%
7 Article 44.29% 17 Subordinating conjunctions 79.35%
8 Preposition 57.83% 18 Coordinate conjunction 57.43%
9 Preposition 62.26% 19 Predicate verb 70.64%
10 Non-predicate verb 62.26% 20 Subordinating conjunctions 5.47%
Results and analysis of the first round of action experiment post-tests

The test results before and after the first round of the action experiment are shown in Table 8. Across the first round, the score rates of five categories, namely prepositions, pronouns, nouns, predicate verbs and coordinate conjunctions, increased, while the score rates of the other five categories, articles, adjectives, adverbs, non-predicate verbs and subordinate conjunctions, decreased. The biggest increase was in the noun category, at 31.83%, while the biggest decrease was in the adjective category, at 31.32%. Taken together, although students' score rates in individual categories fluctuated greatly before and after the first round, the final average score changed little and was heavily influenced by randomness.

Table 8. Test results before and after the first round of the action experiment

Category Score rate before the first round Score rate after the first round Fluctuation value
Preposition 60.05% 67.74% 7.69%
Article 67.65% 60.24% -7.41%
Pronoun 57.65% 83.91% 26.26%
Adjective 66.34% 35.02% -31.32%
Adverb 88.02% 73.23% -14.79%
Nouns 55.34% 86.17% 31.83%
Predicate verb 54.71% 58.93% 4.22%
Non-predicate verb 66.45% 54.47% -11.98%
Coordinate conjunction 68.39% 72.05% 3.66%
Subordinating conjunctions 42.41% 42.39% -0.02%
Average score 6.24 6.41 0.17

The results of the paired samples t-test for the pre- and post-test scores of the first round are shown in Table 9. As can be seen, the significance value is 0.952, indicating that the change between pre- and post-test scores in the first round is not statistically significant. Combined with the fluctuations in the score rates of the various grammar question types, it can be seen that the first round of the action experiment did not have a great impact on the students' knowledge of grammar.

Table 9. Paired samples t-test of pre- and post-test scores for the first round of the action experiment

Pairing 1 (Posttest - Pretest): Mean value 0.74%, Standard deviation 19.502%, Standard error mean 5.974%, 95% confidence interval of the difference (-13.024%, 14.122%); t = 0.079, df = 10, Sig. (2-tailed) = 0.952
Results and analysis of the second round of action experiment post-tests

The test results before and after the second round of the action experiment are shown in Table 10. Across the second round, all types of grammar questions showed an increase in score rate except adverbs, which decreased slightly. The category with the largest increase was predicate verbs, at 32.58%. Overall, students' scores on grammar questions increased significantly before and after the second round, and the fluctuation in the average score was considerably larger than in the first round.

Table 10. Test results before and after the second round of the action experiment

Category Score rate before the second round Score rate after the second round Fluctuation value
Preposition 67.74% 77.52% 9.78%
Article 60.24% 88.45% 28.21%
Pronoun 83.91% 89.36% 5.45%
Adjective 35.02% 64.37% 29.35%
Adverb 73.23% 66.74% -6.49%
Nouns 86.17% 94.02% 7.85%
Predicate verb 58.93% 91.51% 32.58%
Non-predicate verb 54.47% 76.24% 21.77%
Coordinate conjunction 72.05% 73.13% 1.08%
Subordinating conjunctions 42.39% 45.52% 3.13%
Average score 6.41 6.85 0.44

The paired samples t-test for the pre- and post-test scores of the second round is shown in Table 11. This time, the t-value is 3.138 and the significance value is 0.012 < 0.05, indicating that the change between pre- and post-test scores is statistically significant. Combined with the fluctuation in the score rates of each grammar question type, it can be concluded that the second round of the action experiment significantly improved the students' mastery of each type of grammar.

Table 11. Paired samples t-test of pre- and post-test scores for the second round of the action experiment

Pairing 2 (Posttest - Pretest): Mean value 13.27%, Standard deviation 13.425%, Standard error mean 4.436%, 95% confidence interval of the difference (3.418%, 23.064%); t = 3.138, df = 9, Sig. (2-tailed) = 0.012
Post-test results and analysis of the third round of action experiments

After three rounds of action research, a post-test was administered using questions of the same type and difficulty as the pre-test. The comparative statistics of pre- and post-test score rates are shown in Table 12. Comparative analysis of the data shows that the pre-test correct rates of question 14 (predicate verb) and question 20 (subordinating conjunction) were 22.86% and 5.47% respectively, and the post-test correct rates were 40.35% and 27.28%; although these rates are still low compared with other questions, students' grammatical learning on these question types progressed markedly, and some students were able to analyze the errors better. Taken together, most students improved their understanding of all types of lexemes. Although the score on one article question decreased (-8.71%), scores on all other questions increased; this may simply be an isolated case, and overall the data show progress in students' grammatical understanding and application.

Table 12. Comparative statistics of pre-test and post-test score rates

Item number Part of speech examined Pre-test score rate Post-test score rate Fluctuation value
1 Nouns 83.52% 100% 16.48%
2 Article 83.52% 92.25% 8.73%
3 Article 75.14% 66.43% -8.71%
4 Nouns 27.15% 61.98% 34.83%
5 Adverb 88.02% 100% 11.98%
6 Adjective 66.34% 88.05% 21.71%
7 Article 44.29% 70.48% 26.19%
8 Preposition 57.83% 70.48% 12.65%
9 Preposition 62.26% 92.72% 30.46%
10 Non-predicate verb 62.26% 88.41% 26.15%
11 Pronoun 35.94% 70.74% 34.80%
12 Non-predicate verb 70.64% 92.48% 21.84%
13 Pronoun 79.35% 96.79% 17.44%
14 Predicate verb 22.86% 40.35% 17.49%
15 Predicate verb 70.64% 88.07% 17.43%
16 Coordinate conjunction 79.35% 92.43% 13.08%
17 Subordinating conjunctions 79.35% 100% 20.65%
18 Coordinate conjunction 57.43% 79.24% 21.81%
19 Predicate verb 70.64% 85.87% 15.23%
20 Subordinating conjunctions 5.47% 27.28% 21.81%

In addition, paired samples t-tests were conducted on students' pre- and post-test scores for single-sentence correction (A), short-text correction (B), single-sentence fill-in-the-blank (C), single-sentence translation (D), essay fill-in-the-blank (E), and written expression (F); the results for the different question types are shown in Table 13.

Table 13. Paired samples t-test for different question types

Item (Posttest - Pretest) Mean value Standard deviation Standard error mean 95% CI lower 95% CI upper t df Sig. (2-tailed)
A 5.164 3.364 0.526 4.115 6.172 9.864 46 0.000
B 3.285 4.128 0.612 1.954 4.376 5.201 46 0.000
C 3.447 3.902 0.579 2.283 4.519 5.945 46 0.000
D 1.841 3.017 0.474 0.814 2.754 3.822 46 0.000
E 3.423 4.725 0.923 1.532 4.308 3.487 46 0.000
F 1.586 2.413 0.361 0.341 0.797 4.378 46 0.000

As can be seen from Table 13, compared with the pre-test, scores for single-sentence correction, short-text correction, single-sentence fill-in-the-blank, single-sentence translation, essay fill-in-the-blank, and written expression increased by an average of 5.164, 3.285, 3.447, 1.841, 3.423, and 1.586 points, respectively. The increases varied by question type but show that the students made progress. In summary, the post-test outperformed the pre-test on all items; the t-values show that the improvement is statistically significant, and the significance value of 0.000 for each item confirms this. This means that all the teaching activities, whether single-sentence correction, short-text correction, single-sentence fill-in-the-blank, single-sentence translation, essay fill-in-the-blank, or written expression, achieved effective improvement, fully demonstrating that the C-Transformer-based automatic English grammar error correction system designed in this paper can be applied to English writing teaching and can significantly improve students' mastery of grammar.

Conclusion

In this paper, C-Transformer, an English grammar error correction model based on the copying mechanism, is constructed and integrated with a pinyin detection algorithm, a feedback filtering algorithm and a data expansion method to complete the design of a computer-aided grammar error correction system for college English writing teaching.

The precision, recall and F0.5 indicators of the C-Transformer model reached 37.52%, 42.75% and 38.46%, respectively, significantly higher than the traditional CAMB grammar error correction model. Applied to preposition error correction in English writing, the model achieves precision, recall and F0.5 values of 42.14%, 45.63% and 42.80%, respectively, better than the UIUC method and the Corpus GEC method, indicating that the model detects and corrects grammatical errors in English writing with higher precision, a better detection effect, and superior overall performance.

By further subdividing the grammatical errors and conducting experiments on the C-Transformer model for Chinese students’ composition correction, the mean values of precision rate, recall rate and F1 value of this paper’s model for error correction of each type of grammatical errors reached 84.70%, 71.85% and 77.75%, respectively. Meanwhile, the error between the model’s grammar correction result score and the teacher’s score is within 0.4 points, which verifies the practicality of the English grammar error correction model in this paper.

In addition, an English grammar teaching experiment was designed; after three rounds of action experiments, the students assisted in their learning by this paper's automatic English grammar error correction system achieved a significant improvement (p < 0.05) in the grammar score rate of various question types, especially writing questions, which verifies the feasibility of the designed system for assisting college English writing teaching.
