Research on English Writing Teaching Strategies for College Students with the Assistance of Artificial Intelligence

Under the background of the artificial intelligence era, new technologies emerge one after another, and the penetration in English teaching in colleges and universities has prompted greater changes in English writing teaching in colleges and universities [1]. Artificial intelligence technology brings new opportunities while bringing challenges to the reform of English writing teaching mode in colleges and universities. In order to improve the English writing ability of college students, it is necessary to make full use of the advantages brought by artificial intelligence technology, seize the opportunity of higher education reform, actively change the teaching concept, innovate the English teaching methods in colleges and universities, design the teaching program from multiple perspectives, ensure the professionalism and rigor of English writing, and improve the college students’ commitment to English learning [2-5].

English writing is a core course for English majors and students of other majors in Chinese colleges and universities [6]. Many college students have a weak foundation in English and often encounter difficulties in the writing process, which makes English writing harder to teach [7-9]. Under the traditional teaching mode, some higher vocational English teachers rely on indoctrination-style instruction, mechanically transferring writing knowledge and skills to students without engaging their thinking, so students’ writing ability is difficult to improve [10-13]. At the same time, some teachers, aiming for good test scores, guide students to memorize composition templates and imitate them. The resulting writing is formulaic, which constrains students’ thinking and does not serve their future development [14-17]. Therefore, colleges and universities need to innovate their writing teaching methods and build a more flexible classroom in which students can think independently and exercise their thinking ability and creativity [18-20]. Integrating artificial intelligence technology to reform college English writing teaching and comprehensively raise its level of modernization is the basic starting point for combining technology with practice in college English teaching at this stage [21-23].

Machine learning, one of the core technologies of artificial intelligence, can provide powerful technical support for personalized teaching of English writing. Li, X. proposed a personalized English teaching model based on machine learning algorithms, which extracts data from students’ first-hand materials in the process of language acquisition for organizing and analyzing, and paves the way for a personally tailored English teaching method [24]. Zhou, L. studied an automated English writing scoring system supported by machine learning technology and built a new English writing teaching model around it, which can effectively stimulate students’ interest in writing and thus improve their English writing level [25]. Fu, S. et al. analyzed the different needs of teachers’ teaching and students’ learning in the teaching of English writing in colleges and universities, and proposed an intelligent scoring model for English composition based on machine learning technology, which can not only score students’ writing texts accurately but also generate intelligent comments, providing a new direction for English writing teaching [26]. Intelligent writing systems built on students’ writing behavior and data, which evaluate and optimize English writing automatically, can thus meet the writing needs of different types of students.

Natural language processing (NLP) technology fits the way university English writing is taught and is one of the important technologies for teaching English writing in the era of artificial intelligence. Li, Y. et al. used NLP technology to extract and analyze the features of a large number of student writing samples to determine the types of errors that occur in students’ writing, and then put forward suggestions for corrections and strategies for improvement [27]. Chen, L. evaluated the effectiveness of NLP technology in improving students’ writing skills and teachers’ teaching methods, and found that the real-time feedback and writing suggestions it provides are important for improving students’ English writing skills [28]. Xu, P. et al. developed an automatic scoring system for English writing based on NLP technology, which provides both automatic scoring and error detection, helps students identify and correct deficiencies in their writing, and is of high practical value in English writing teaching [29]. Zhao, D. explored the personalized English writing learning experience brought by NLP technology in terms of linguistic accuracy, content summarization and writing creativity, which provides an informative practical path to improve the quality of English writing teaching [30]. Intelligent writing assistance systems based on natural language processing play an important role in analyzing and evaluating students’ writing, correcting grammatical errors, and improving writing skills, helping students to better organize and express their opinions and improving the quality of writing teaching.

Generative Artificial Intelligence, as an advanced AI technology, has been widely used in the field of education. Ibrahim, K. et al. explored the developmental potential that ChatGPT has in facilitating the process of teaching second language writing and demonstrated that it can enhance students’ English writing by increasing motivation, assigning instructional tasks, and providing personalized feedback [31]. Fitria, T. N. examined the application of ChatGPT in English writing, which focuses on the sequence of events and writing order in generative utterances and the use of active and passive voice in English writing [32]. Söğüt, S. elucidated the outstanding advantages of generative AI in teaching English writing for overcoming writing barriers and obtaining linguistic support, where students can promptly get personalized feedback from their own writing samples [33]. Alzubi, A. A. F. emphasized the effectiveness of generative AI in improving the writing skills of English language learners and conducted a study on AI literacy in the process of students’ English writing learning using this tool to ensure the effectiveness of the learning process [34]. It can be found that generative AI can help improve college students’ ability to analyze and research English writing and provide reliable data and case support for subsequent English writing skills learning.

This paper proposes an English text automatic error correction model based on BERT and builds an English writing teaching platform around it, which improves students’ English writing learning efficiency through the platform’s core function of automatic error correction. After Transformer and BERT are introduced, the parameters learned by BERT are first used to initialize the encoder, and data enhancement is then applied to the training corpus. In the construction of the error correction model, the encoder side adopts a dual-encoder structure consisting of a Bi-GRU-based syntactic encoder and a semantic encoder based on BERT and Bi-GRU, while the decoder side combines a hybrid attention mechanism with Bi-GRU so that the syntactic and semantic information extracted by the encoders can be used to improve decoding accuracy. The English writing teaching platform is then designed on the basis of this model: once the overall system structure is determined, the process by which model developers collect and annotate data is defined, and the system database is designed around users, roles, texts, errors and data sets. The page functions of the text error correction, data collection, role management and permission management modules are designed, and a background management system is provided for platform managers to facilitate data collection. The performance of the platform is examined in terms of loss function, accuracy and response time. Once it is clear that this performance is sufficient to meet daily teaching needs, the platform is applied in practice: it is used for English writing teaching in the experimental class while the traditional teaching method is maintained in the control class, and the practical utility of the platform in college English writing teaching is explored through comparative analysis.

2

Automatic Error Correction Model for English Text Based on BERT

In the era of big data and artificial intelligence, the amount of data of text information has seen explosive growth, and the application scene of English text error correction technology has been expanding, and its use in English teaching in colleges and universities has become increasingly widespread. Constructing an English text error correction platform has become the main way and strategy for teaching English writing to college students. For this reason, this chapter will propose an automatic error correction model for English text based on BERT, which provides the basis for the functional implementation of the construction of English text error correction platform in the following section.

2.1

BERT and Transformer

2.1.1

Transformer

In the field of natural language processing, Transformer is a deep learning model architecture based on the attention mechanism [35]. The Transformer model abandons the sequential structure of traditional RNN-based language models and realizes sequence-to-sequence modeling based entirely on the attention mechanism. With its powerful representation learning capability and parallel computing power, Transformer changed the way natural language processing (NLP) tasks are handled and achieved significant performance gains in tasks such as machine translation, text classification, sentiment analysis, and question answering. Compared to traditional RNNs and long short-term memory (LSTM) networks, Transformer uses the self-attention mechanism to parallelize computation, which significantly improves the speed and efficiency of model training.

Transformer’s inputs are first converted into word embedding vectors through an embedding layer, then combined with positional encodings and passed into a stack of encoder and decoder layers. Transformer’s encoder consists of multiple encoder layers of the same structure, each containing two sublayers: a multi-head self-attention layer and a feed-forward neural network layer. The self-attention mechanism allows the model to capture global information by attending to all positions in the sequence in a single step, while the feed-forward layer implements nonlinear transformations through fully connected layers and activation functions. Transformer’s decoder also consists of multiple decoder layers of the same structure, each of which contains the two sublayers found in the encoder plus an encoder-decoder attention layer. This layer helps the decoder focus on different positions of the input sequence in order to better generate the output sequence. In addition, Transformer’s residual connections and layer normalization help mitigate vanishing gradients and speed up training.

Transformer has two key innovations. One is the multi-head attention mechanism, which improves the model’s representational ability by dividing the self-attention mechanism into multiple heads for parallel computation, where each head can learn different representations that are finally concatenated. The computational process of multi-head attention is:

1) Map the input vectors into new query $(Q)$, key $(K)$ and value $(V)$ vectors by separate linear transformations.

2) Divide the mapped Q, K, V vectors into multiple heads, usually multiple subspaces that are processed in parallel.

3) Perform attention computation on the Q, K, and V vectors of each head to obtain the attention output of each head.

4) Concatenate the attention outputs of all the heads and integrate them through another linear transformation layer to get the final multi-head attention output.

The specific formula for multi-head attention is as follows: (1) $\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \dots, \mathrm{head}_{h}) W^{O}$, where $\mathrm{head}_{i} = \mathrm{Attention}(Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V})$

In Eq. (1), $W^{O}$, $W_{i}^{Q}$, $W_{i}^{K}$ and $W_{i}^{V}$ denote independent trainable matrices, and attention is computed as: (2) $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V$

where d_k denotes the dimension of the query vector, which serves to control the range of values of the dot product to ensure that the computation of the attentional weights is not biased by the dimension of the input vector.
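The attention computation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper’s implementation: it uses the standard scaled dot-product form softmax(QK^T / sqrt(d_k)) V, the helper names `softmax`, `attention` and `multi_head` are hypothetical, and the learned projection matrices of Eq. (1) are omitted, so each head simply works on a slice of the feature dimension.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention; Q, K, V are lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # One score per key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

def multi_head(Q, K, V, num_heads):
    """Split features into heads, attend per head, concatenate.

    Assumption: the trainable projections W^Q, W^K, W^V, W^O are left out
    and Q, K, V share the same feature dimension.
    """
    size = len(Q[0]) // num_heads
    heads = []
    for h in range(num_heads):
        sl = slice(h * size, (h + 1) * size)
        heads.append(attention([q[sl] for q in Q],
                               [k[sl] for k in K],
                               [v[sl] for v in V]))
    # Concatenate the per-head outputs for every query position.
    return [sum((head[i] for head in heads), []) for i in range(len(Q))]
```

Because the attention weights for each query sum to one, every output row is a convex combination of the value vectors.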

Another important concept is position encoding (PE), which is used to provide Transformer with information about the position of words in the input sequence so that the model can distinguish between words in different positions. Transformer represents a fixed position encoding by using sine and cosine functions with different frequencies, which are computed as follows: (3) $PE(pos, 2i) = \sin\left(pos / 10000^{2i / d_{\text{model}}}\right)$ (4) $PE(pos, 2i + 1) = \cos\left(pos / 10000^{2i / d_{\text{model}}}\right)$

where $PE(pos, 2i)$ and $PE(pos, 2i + 1)$ denote the position encoded values at position pos and dimensions 2i and 2i + 1, respectively, and $d_{\text{model}}$ denotes the dimension of the embedding vector.
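As a small worked example of Eqs. (3) and (4), the following sketch builds the sinusoidal encoding table in plain Python. The function name `positional_encoding` and its parameters are illustrative choices, not names from the paper.

```python
import math

def positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        # `even` plays the role of 2i in Eqs. (3) and (4).
        for even in range(0, d_model, 2):
            angle = pos / (10000 ** (even / d_model))
            pe[pos][even] = math.sin(angle)
            if even + 1 < d_model:
                pe[pos][even + 1] = math.cos(angle)
    return pe
```

At position 0 every sine entry is 0 and every cosine entry is 1, and the wavelengths grow geometrically with the dimension index.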

2.1.2

BERT

The Transformer architecture has achieved great success in the field of natural language processing with models such as BERT and GPT. Among them, BERT is a pre-trained language model mainly applied in the field of natural language understanding.

BERT is a model that is pre-trained in a self-supervised manner on large-scale datasets and is based on Transformer’s encoder architecture [36]. Compared to traditional unidirectional language models, the bidirectionality of BERT enables the model to use the context on both sides of a word, thus improving its understanding of textual context. The input of BERT contains three parts. The first part is the word embedding: BERT adopts WordPiece embedding, which splits words into subwords and transforms each subword into a vector representation. BERT also adds special tokens to the input, such as [SEP], which marks the end of a sentence or the boundary between two consecutive sentences, and [CLS], a placeholder whose output is used for classification tasks, for example to determine whether the second sentence follows the first. The second part is the segment embedding, which distinguishes the two sentences or text passages and is useful for tasks on pairs of text sequences (e.g., question answering). The third part is the positional embedding, which encodes the position of each token as a vector, allowing the model to take the order of words in a sequence into account. The input vector to BERT is the sum of the word, segment and positional embeddings. In this way, BERT is able to take into account both the positional relationships and the semantic information of words in a sentence, thus better capturing the meaning of the sentence.

BERT uses two pre-training tasks. The first is masked language modeling, in which 15% of the words in a sentence are randomly selected and the model is asked to predict them from the surrounding context. If every selected word were replaced with the [MASK] token, training and prediction would be inconsistent, because [MASK] appears in pre-training but rarely occurs in real scenarios. To mitigate this problem, of the selected 15% of words, 80% are replaced with [MASK], 10% are replaced with random words, and 10% are left unchanged. The second is next sentence prediction, which trains the model to predict whether two sentences are adjacent in the original text. These two training tasks, together with the multi-layer deep architecture, enable BERT to capture contextual information effectively, and its semantic understanding capability surpasses that of earlier word embedding models.
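The masking rule can be illustrated with a short sketch. It is a simplification under stated assumptions: each token is selected independently with probability 0.15 rather than by sampling exactly 15% of positions, and the function name `mask_tokens`, the toy vocabulary argument and the label convention (`None` for unselected positions) are all hypothetical.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Apply an 80/10/10 masking rule to a list of tokens.

    Returns (masked_tokens, labels); labels[i] holds the original token
    at selected positions and None elsewhere.
    """
    rng = rng or random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            masked[i] = "[MASK]"           # 80%: replace with the mask token
        elif r < 0.9:
            masked[i] = rng.choice(vocab)  # 10%: replace with a random word
        # remaining 10%: keep the original token unchanged
    return masked, labels
```

Positions that keep their original token still carry a label, so the model cannot assume that an unmasked token is always correct.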

After pre-training, BERT can be fine-tuned for application on various NLP tasks, including text categorization, named entity recognition, and Q&A systems. During the fine-tuning process, the pre-training parameters of the BERT model are used as initial parameters, which are then combined with task-specific data for training. This allows the model to refine and optimize its language representation capabilities according to the needs of a specific task. Through fine-tuning, BERT can be adapted to task-specific data and features to improve the model’s performance on specific tasks.

2.2

Parameter initialization and data enhancement

1)

Initialize Transformer’s Encoder with BERT’s parameters

The Transformer model is a sequence generation model based on a multi-head attention mechanism. The model encodes the input sequence with word embeddings and sums them with the positional encoding as input. The encoder encodes this input as a high-dimensional latent semantic vector containing the semantic information of the whole input, and the decoder decodes this vector and obtains the output by using the softmax function. The encoder consists of multiple identical layers, each containing two sub-layers, Multi-head Attention and Feed Forward, and the decoder also consists of multiple identical layers, each containing three sub-layers: Multi-head Attention, Feed Forward, and Masked Multi-head Attention.

Before starting to train the neural network, it is necessary to initialize the parameters, i.e., to assign values to each weight and bias in the network, and a reasonable initialization is conducive to model convergence. BERT adopts the encoder structure of Transformer (referred to here as BERT-Encoder) and is pre-trained on a large-scale corpus, mainly through masked language modeling. The weight parameters learned by the pre-trained model are extracted and used to initialize the Transformer’s encoder, which improves the performance of the model [37]. The pre-trained model selected is Chinese-RoBERTa-wwm-ext, which uses a whole-word masking strategy in the pre-training stage, drops the next sentence prediction (NSP) task, and improves on many NLP tasks compared to other pre-trained models. The decoder weight parameter W is randomly initialized using the Xavier method, so that W obeys a uniform distribution: $W \sim U\left[- \frac{\sqrt{6}}{\sqrt{n_{i} + n_{i + 1}}}, \frac{\sqrt{6}}{\sqrt{n_{i} + n_{i + 1}}}\right]$, where $n_{i}$ and $n_{i + 1}$ are the input and output dimensions of the layer.
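For illustration, Xavier uniform initialization of a single weight matrix can be sketched as follows. The function name `xavier_uniform` and the list-of-lists matrix layout are assumptions made for this example; deep learning frameworks provide their own equivalents.

```python
import math
import random

def xavier_uniform(n_in, n_out, rng=None):
    """Sample an n_out x n_in matrix from U[-limit, limit],
    with limit = sqrt(6) / sqrt(n_in + n_out)."""
    rng = rng or random.Random()
    limit = math.sqrt(6.0) / math.sqrt(n_in + n_out)
    return [[rng.uniform(-limit, limit) for _ in range(n_in)]
            for _ in range(n_out)]
```

The bound shrinks as the layer gets wider, which keeps the variance of activations roughly constant from layer to layer.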

2)

Parallel corpus data enhancement method based on dynamic masking

The size of the training corpus is one of the important factors in neural network modeling. To obtain more training data, for a given source sentence $X = (x_{1}, x_{2}, \dots, x_{i}, \dots, x_{n})$ with the corresponding target sentence $Y = (y_{1}, y_{2}, \dots, y_{i}, \dots, y_{n})$, each word $x_{i}^{(j)}$ in the source sentence $X^{(j)}$ of the j-th round of training is replaced, with a certain probability δ, by another token through the substitution function f, giving the masked input $\bar{X}^{(j)} = (\bar{x}_{1}^{(j)}, \bar{x}_{2}^{(j)}, \dots, \bar{x}_{i}^{(j)}, \dots, \bar{x}_{n}^{(j)})$, viz: (5) $\bar{x}_{i}^{(j)} = \begin{cases} x_{i}^{(j)}, & p > \delta \\ f(x_{i}^{(j)}), & p \leq \delta \end{cases}$

where p is a random number generated from a uniform distribution in the interval [0, 1] and f is the replacement function.

Pairing ${\bar{X}}^{(j)}$ and Y^(j) as a new training corpus so that the inputs are different for each round of training realizes the data enhancement effect without adding additional parallel corpus.
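A minimal sketch of this dynamic masking scheme is given below. The replacement function is passed in as a parameter because the paper does not specify f in this section; the names `dynamic_mask` and `augment_epoch` are hypothetical.

```python
import random

def dynamic_mask(source, delta, replace, rng):
    """Replace each source token with probability delta (Eq. 5).

    `replace` stands in for the substitution function f, whose exact
    form is an assumption of this sketch.
    """
    return [replace(x) if rng.random() <= delta else x for x in source]

def augment_epoch(pairs, delta, replace, rng):
    """Build the training pairs for one round: sources are re-masked,
    targets are left untouched."""
    return [(dynamic_mask(src, delta, replace, rng), tgt) for src, tgt in pairs]
```

Calling `augment_epoch` once per training round with the same random generator yields a different masked source for the same target in each round, so no additional parallel data is needed.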

2.3

English Text Automatic Error Correction Model Construction

2.3.1

Encoder module

Borrowing from the idea of an auxiliary encoder, this paper adopts a dual-encoder structure to extract the syntactic and semantic features of sentences separately. On the syntactic encoder side, Bi-GRU is used for syntactic feature extraction to obtain the syntactic representation of the sentence, and on the semantic encoder side, BERT and Bi-GRU are used for semantic feature extraction to obtain the semantic representation of the sentence [38]. The details of these two encoders are described below.

1)

Syntactic Encoder

The syntactic encoder of the design model in this paper contains a lexical embedding layer and a Bi-GRU layer.

The computations involved in the syntactic encoder are described by the following equations. The embedding layer is computed first. The lexical vector representation S_p of the sentence is fed into the lexical embedding layer, and after a feed-forward computation the new lexical vector representation $S_{p}^{'}$ is obtained, as shown in Eq. (6). This step converts the dimension of each word’s lexical vector to the input dimension of the Bi-GRU: (6) $S_{p}^{'} = \tanh (W_{p} \cdot S_{p})$

Then the Bi-GRU layer is computed. The sequence $S_{p}^{'}$ is fed into the Bi-GRU, and for each moment t, the hidden layer state h_t is the concatenation of the $h_{t}^{+}$ generated by the forward GRU unit and the $h_{t}^{-}$ generated by the reverse GRU unit, as in Eq. (7), where the concat function represents the vector concatenation operation: (7) $h_{t} = \mathrm{concat}(h_{t}^{+}, h_{t}^{-})$

In addition, the syntactic encoder produces an intermediate vector c_syn that incorporates the syntactic information of the whole sentence, computed as shown in Eq. (8). Concatenating the hidden layer states of the final forward time step and the final reverse time step of the uppermost layer k yields the syntactic intermediate vector c_syn, which is later used as part of the initialization vector of the hidden layer states at the decoding end: (8) $c_{syn} = \mathrm{concat}(h_{t}^{k +}, h_{1}^{k -})$

2)

Semantic encoder

The semantic encoder designed in this paper contains a BERT layer and a Bi-GRU layer, and the extraction of semantic information is mainly divided into two steps.

Suppose the input sentence is S and the BERT-represented sentence is S′. S′ consists of the sequence $\{x_{1}, x_{2}, \dots, x_{n}\}$, where each x_i is a 768-dimensional vector that embeds the semantic information of the context and carries the prior knowledge that the BERT pre-trained model has learned from a large-scale corpus. The computation is shown in Eq. (9); the sequence S′ is then used as the input of the Bi-GRU: (9) $S' = \mathrm{BERT}(S)$

2.3.2

Decoder Module

The decoder module of this model contains three main parts, which are the hybrid attention layer, the gated recurrent network layer, and the output layer.

1)

Hybrid Attention Layer

In order to better refer to the syntactic and semantic information of the context in the original sentence, this model designs a hybrid attention mechanism.

At the encoder side, the syntactic intermediate vector c_syn and the semantic intermediate vector c_sem have already been obtained, and the two are concatenated as the query vector q for the attention computation, as shown in Eq. (10): (10) $q = \mathrm{concat}(c_{syn}, c_{sem})$

The syntactic attention is computed first. The syntactic features of the sentence context output by the syntactic encoder are represented as the matrix $H_{syn} = (h_{syn,1}, h_{syn,2}, \dots, h_{syn,N})$. Each $h_{syn,i}$ is scored against q to find the corresponding attention weight, the scores are normalized with softmax, and the weighted sum gives the final syntactic attention vector. The specific operation is shown in Eq. (11): (11) $a_{syn} = \sum_{i = 1}^{N} \alpha_{i} \cdot h_{syn,i} = \sum_{i = 1}^{N} \mathrm{softmax}(s(h_{syn,i}, q)) \cdot h_{syn,i}$

where a_syn represents the final syntactic attention vector and α_i represents the attention weight for each position i. The s operation uses a scaled dot product method with the formula shown in (12): (12) $s (h, q) = \frac{h^{T} \cdot q}{\sqrt{D}}$

After the syntactic attention vector a_syn and semantic attention vector a_sem are computationally obtained, the two are spliced and fed into a feed-forward neural network layer for further feature extraction to obtain the final hybrid attention vector. The computational formula is shown in (13) below: (13) $a_{final} = \tanh (W_{att} \cdot \mathrm{concat}(a_{syn}, a_{sem}))$

Eventually, through the computation of attention, a hybrid attention vector a_final incorporating syntactic and semantic information of the original sentence is obtained, which will be used in the subsequent decoding process of the Bi-GRU layer.

2)
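The hybrid attention computation of Eqs. (10) to (13) can be sketched in plain Python as follows. This is an illustrative reading of the equations rather than the paper’s code: it assumes that the encoder outputs and the query q share the same dimension D, that the semantic attention vector is computed in the same way as the syntactic one, and that `W_att` is supplied as a list of rows; the function names are hypothetical.

```python
import math

def attend(H, q):
    """Scaled dot-product attention of one query q over the rows of H."""
    D = len(q)
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, q)) / math.sqrt(D) for h in H]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]          # attention weights alpha_i
    return [sum(a * h[j] for a, h in zip(alphas, H)) for j in range(len(H[0]))]

def hybrid_attention(H_syn, H_sem, c_syn, c_sem, W_att):
    """Combine syntactic and semantic attention into a_final."""
    q = c_syn + c_sem                  # Eq. (10): list concatenation
    a_syn = attend(H_syn, q)           # Eq. (11)
    a_sem = attend(H_sem, q)           # assumed analogous to Eq. (11)
    joint = a_syn + a_sem
    # Eq. (13): tanh of a linear map of the concatenated vectors.
    return [math.tanh(sum(w * x for w, x in zip(row, joint))) for row in W_att]
```

With an all-zero query the attention weights are uniform, so each attention vector reduces to the mean of the corresponding encoder states.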

Gated Recurrent Unit Layer

When the gated recurrent unit layer decodes the first word, there is no preceding reference word, so the [CLS] start token is used as the input for the first position and a_final serves as the initial state a₀. [SEP] is the termination token: decoding stops when the model outputs [SEP].

The computations inside each GRU unit can be written as shown in (14) to (17): (14) $r_{t} = \sigma (W_{r} \cdot [a_{t - 1}, x_{t}])$ (15) ${\tilde{h}}_{t} = \tanh (W_{h} \cdot [r_{t} \cdot a_{t - 1}, x_{t}])$ (16) $z_{t} = \sigma (W_{z} \cdot [a_{t - 1}, x_{t}])$ (17) $h_{t} = (1 - z_{t}) \cdot a_{t - 1} + z_{t} \cdot {\tilde{h}}_{t}$
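A single GRU step following Eqs. (14) to (17) can be written out directly. The sketch below is a plain-Python illustration with hypothetical helper names; bias terms are omitted because the equations do not include them, and the weight matrices are passed as lists of rows whose width equals the length of the concatenated vector.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_step(a_prev, x_t, W_r, W_z, W_h):
    """One GRU update: a_prev is the previous state, x_t the current input."""
    concat = a_prev + x_t
    r = [sigmoid(v) for v in matvec(W_r, concat)]           # reset gate, Eq. (14)
    z = [sigmoid(v) for v in matvec(W_z, concat)]           # update gate, Eq. (16)
    gated = [r_i * a_i for r_i, a_i in zip(r, a_prev)] + x_t
    h_tilde = [math.tanh(v) for v in matvec(W_h, gated)]    # candidate, Eq. (15)
    # Eq. (17): interpolate between the previous state and the candidate.
    return [(1 - z_i) * a_i + z_i * h_i
            for z_i, a_i, h_i in zip(z, a_prev, h_tilde)]
```

With all weights set to zero both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous state, which is a convenient sanity check.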

3)

Output Layer

The output layer at the decoder side contains a fully connected layer and a softmax classification layer. Assuming that the output of the gated recurrent unit at moment t is a vector h_t with dimension M, and the output of the fully connected layer is a vector o_t whose dimension V is the size of the vocabulary, o_t is computed as shown in Eq. (18), where $W_{h} \in ℝ^{V \times M}$ : (18) $o_{t} = \tanh (W_{h} \cdot h_{t})$

Finally, the softmax function is used to convert the vector o_t into a probability distribution vector ${\hat{y}}_{t}$, in which each entry represents the probability of selecting the corresponding word from the vocabulary. In this paper, the cross-entropy function is used to calculate the loss, and the formula is shown in (19) below: (19) $H (y_{t}, {\hat{y}}_{t}) = - \sum_{i = 1}^{n} y_{t} (x_{i}) \log ({\hat{y}}_{t} (x_{i}))$

where ${\hat{y}}_{t}$ is the probability distribution predicted by the model at moment t, y_t is the true label at moment t, given as the one-hot representation of a word, and x_i indexes the entries of the vector.
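The softmax and loss computation can be illustrated with a short sketch. It uses the standard cross-entropy between a one-hot target and the predicted distribution, i.e. the negative log-probability assigned to the correct word; the function names are illustrative only.

```python
import math

def softmax(o):
    """Convert a score vector into a probability distribution."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y_true, y_hat):
    """Cross-entropy of prediction y_hat against target distribution y_true.

    Entries where the target is zero contribute nothing and are skipped,
    which also avoids evaluating log on unused probabilities.
    """
    return -sum(t * math.log(p) for t, p in zip(y_true, y_hat) if t > 0)
```

For a one-hot target the loss reduces to minus the log of the probability the model gives to the correct word, so a uniform prediction over two words costs log 2.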

2.3.3

Filtering error correction results module

One of the most common strategies for the search problem is greedy search, in which the word with the highest probability is selected as the prediction at each position of the generated text sequence. The formulaic description of greedy search is shown in Equation (20), where ${\hat{Y}}_{<t}$ denotes the words already generated. The search stops when ${\hat{y}}_{t}$ is the ending marker [SEP] or when the maximum sentence length is reached. Since this strategy considers only the current best choice at each step, only one candidate sequence is retained, and it is not guaranteed to be the globally optimal solution: (20) ${\hat{y}}_{t} = \arg \max_{y} P (y | {\hat{Y}}_{<t}, X)$

In this paper, we use beam search, an improvement on greedy search. Suppose that at moment t − 1 there exists a set of B candidate sequences denoted as $Y_{t - 1} = \{Y_{t - 1}^{1}, \dots, Y_{t - 1}^{B}\}$, and the set $S_{t} = \{(Y_{t - 1}^{b}, y_{t}) | \forall (Y_{t - 1}^{b} \in Y_{t - 1}) \land (y_{t} \in V)\}$ is used to represent all combinations of sequences at moment t. Then the B candidate sequences $Y_{t}$ at moment t can be described as shown in (21): (21) $Y_{t} = \arg \max_{Y_{t}^{1}, \dots, Y_{t}^{B} \in S_{t}} \sum_{b = 1}^{B} \log P (Y_{t}^{b} | X)$

Because log-probabilities accumulate with every generated word, Eq. (21) tends to favor shorter sequences, so a length penalty is added. The modified formula is shown in (22), where α ∈ [0, 1]: when α is 0, no length penalty is applied, and when α is 1, the sentence length T is used directly for the penalty: (22) $Y_{t} = \arg \max_{Y_{t}^{1}, \dots, Y_{t}^{B} \in S_{t}} \sum_{b = 1}^{B} \frac{1}{T^{\alpha}} \log P (Y_{t}^{b} | X)$
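The difference between greedy search and beam search with a length penalty can be seen in a small sketch. The decoder is abstracted as a hypothetical `step_fn(prefix)` that returns a dictionary of next-token probabilities; this interface, the function name `beam_search` and the toy distribution in the usage note are assumptions for illustration, not the paper’s implementation. Setting `beam_width=1` gives greedy search.

```python
import math

def beam_search(step_fn, beam_width, max_len, alpha=0.0, end_token="[SEP]"):
    """Return the best token sequence under a length-penalized log-probability.

    Each hypothesis is (tokens, summed log-probability); its score is
    log P / T**alpha, where T is the current sequence length.
    """
    def score(hyp):
        tokens, logp = hyp
        return logp / (len(tokens) ** alpha)

    beams = [([], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            for tok, p in step_fn(tokens).items():
                if p > 0:
                    candidates.append((tokens + [tok], logp + math.log(p)))
        # Keep the B best expansions; stable sort keeps ties in insertion order.
        candidates.sort(key=score, reverse=True)
        beams = []
        for hyp in candidates[:beam_width]:
            (finished if hyp[0][-1] == end_token else beams).append(hyp)
        if not beams:
            break
    return max(finished + beams, key=score)[0]
```

As a usage example, consider a toy model in which the first token is "A" with probability 0.6 or "B" with 0.4, "A" is followed by "C" or "D" with probability 0.5 each, "B" is followed by "E" with 0.9 or "F" with 0.1, and every two-token prefix is followed by [SEP]. Greedy search commits to "A" and ends with a sequence of probability 0.3, while a beam of width 2 keeps "B" alive and finds the sequence "B E [SEP]" with probability 0.36.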

2.4

Experimentation and Analysis

In order to verify the effectiveness of the automatic error correction model for English text proposed in this chapter, experiments are run on the CoNLL-2014 and JFLEG test sets, whose evaluation metrics are M² (which includes precision P, recall R, and the F_0.5 score) and GLEU, respectively. The model is compared with BERT-fuse Mask, BERT-fuse GED, BERT (None), RoBERTa (None), and BERT+SMT+Bi-GRU, and the evaluation results are shown in Table 1. On the CoNLL-2014 test set, the F_0.5 of this paper’s model is 44.32%, higher than that of every compared model. Its recall of 20.68% is also the highest, while its precision of 62.2% is second only to RoBERTa (None) at 62.91%. On the JFLEG test set, the GLEU of this paper’s model is 58.49%, again the best among all models.

Table 1.

Comparison of prediction performance of the model

Model	CoNLL-2014 (test) P(%)	CoNLL-2014 (test) R(%)	CoNLL-2014 (test) F_0.5(%)	JFLEG GLEU(%)
BERT-fuse Mask	57.9	15.36	37.16	52.2
BERT-fuse GED	58.29	15.93	38.1	53.51
BERT(None)	61.61	16.36	39.52	55.67
RoBERTa(None)	62.91	19.54	43.5	57.87
BERT+SMT+Bi-GRU	60.2	20.12	42.96	58.37
Model of this article	62.2	20.68	44.32	58.49
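For readers unfamiliar with the F_0.5 score, the general F-beta definition can be written as a short function. This is the standard formula, shown only to illustrate how precision and recall are combined; the function name `f_beta` is hypothetical, and the official M² scorer computes the score from edit counts rather than from rounded percentages.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta = 0.5 precision counts more than recall, which suits error correction, where proposing a wrong correction is usually considered worse than missing an error.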

To further investigate how well this paper’s model corrects different categories of errors in English texts, this section analyzes five common grammatical error categories: verb tense, noun number, subject-verb agreement, articles, and prepositional collocation. Based on the CoNLL-2014 test set, the number of corrections made by this paper’s model and by the BERT+SMT+Bi-GRU model in each error category is compared in Figure 1. The comparison shows that this paper’s model performs better across these error categories, especially for verb tense and subject-verb agreement, where the number of successful corrections reaches 177 and 293, respectively. This suggests that this paper’s model captures the grammatical structure of the text better and can locate and correct errors that are strongly related to grammar more reliably.

3

English Writing Teaching Platform Based on English Automatic Error Correction Modeling

This chapter focuses on the system design and implementation of the English writing teaching platform based on the automatic English text error correction model proposed above. The main functional modules of the platform are the text error correction module, the data collection module, and the back-office management module. The back-office management module includes user management, role management, permission management, and dataset management.

3.1

System design

3.1.1

Overall system architecture design

The system adopts a layered overall structure, which reduces the coupling between modules. The data storage layer uses MySQL and Redis databases. The public service layer is the shared part of the text error correction platform and provides log analysis and file storage functions. The business service layer is the concrete implementation of the platform’s functional modules and consists of multiple services. The user service layer provides users with the Web page for text error correction, which they can use by pasting text or uploading files.

3.1.2

System process design

The text error correction process is the interaction between the user and the text error correction platform after text is entered. First, the user inputs the text to be checked and clicks the “Text Detection” button. The system then detects the location and type of each error and gives correction suggestions. For each suggestion, the user can choose to replace or ignore: choosing to replace adopts the correction given by the system, while choosing to ignore leaves the text at the error position unchanged. Finally, the user clicks the “Download File” button to save the corrected text as a local file.
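
The replace-or-ignore step can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform’s actual code: the `Suggestion` class, its fields, and the `apply_suggestions` function are hypothetical names, and errors are assumed to be reported as non-overlapping character offsets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Suggestion:
    start: int          # character offset where the error begins
    end: int            # character offset just past the error
    error_type: str     # e.g. "subject-verb agreement"
    replacement: str    # correction proposed by the system
    accepted: bool      # True = user chose "replace", False = "ignore"


def apply_suggestions(text: str, suggestions: List[Suggestion]) -> str:
    """Apply accepted suggestions; ignored ones leave the text unchanged."""
    # Work right to left so earlier offsets stay valid after each edit.
    for s in sorted(suggestions, key=lambda s: s.start, reverse=True):
        if s.accepted:
            text = text[:s.start] + s.replacement + text[s.end:]
    return text


draft = "He go to school with a umbrella."
choices = [
    Suggestion(3, 5, "subject-verb agreement", "goes", accepted=True),
    Suggestion(21, 22, "article", "an", accepted=False),
]
print(apply_suggestions(draft, choices))
```

Processing the suggestions from the end of the text backwards avoids having to shift offsets when a replacement is longer or shorter than the span it replaces.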

The data collection process lets model developers collect and annotate data. First, the model developer enters an erroneous sentence and the corresponding correct sentence, marks the location and type of each error in the erroneous sentence, and submits the pair. After the administrator approves the data, it can be used in the model’s dataset, and the approved dataset can then be viewed.

3.1.3

Database design

The main entities in the database include users, roles, text, errors and datasets. Roles are used to control user privileges, text is the raw data used by users for text error correction, and datasets are the data collected by model developers. Both text and dataset may contain errors. There is a many-to-one relationship between users and roles, a one-to-many relationship between users and text, a one-to-many relationship between text and errors, and a many-to-one relationship between errors and datasets.
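
The entity relationships described above can be sketched as a relational schema. The example below uses Python’s built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; all table and column names are assumptions chosen for illustration and are not taken from the platform.

```python
import sqlite3

SCHEMA = """
CREATE TABLE roles    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- many users to one role
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       role_id INTEGER NOT NULL REFERENCES roles(id));
-- one user to many texts
CREATE TABLE texts    (id INTEGER PRIMARY KEY, content TEXT NOT NULL,
                       user_id INTEGER NOT NULL REFERENCES users(id));
CREATE TABLE datasets (id INTEGER PRIMARY KEY, approved INTEGER NOT NULL DEFAULT 0);
-- an error belongs to a text or to a dataset, so both links are nullable
CREATE TABLE errors   (id INTEGER PRIMARY KEY, position INTEGER, error_type TEXT,
                       text_id INTEGER REFERENCES texts(id),
                       dataset_id INTEGER REFERENCES datasets(id));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO roles (id, name) VALUES (1, 'student')")
conn.execute("INSERT INTO users (id, name, role_id) VALUES (1, 'demo', 1)")
conn.execute("INSERT INTO texts (id, content, user_id) VALUES (1, 'He go home.', 1)")
conn.execute(
    "INSERT INTO errors (position, error_type, text_id) "
    "VALUES (3, 'subject-verb agreement', 1)"
)

# Count the errors recorded for each user's texts.
rows = conn.execute(
    "SELECT u.name, r.name, COUNT(e.id) "
    "FROM users u "
    "JOIN roles r ON r.id = u.role_id "
    "JOIN texts t ON t.user_id = u.id "
    "LEFT JOIN errors e ON e.text_id = t.id "
    "GROUP BY u.id"
).fetchall()
print(rows)
```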

3.2

Main functional modules and their realization

3.2.1

Text Error Correction Module

The main part of the Text Error Correction page is a text field. Users can enter the text to be checked directly in the text input box, or use the “Import File” button at the top of the page to import a file, which the system parses into the text field. On the left above the text field is a single-choice selector for the language of the text. On the right above the text field is a multiple-choice selector for the types of error to be detected. The user then clicks the “Text Error Correction” button at the bottom of the text field, and the system displays the detection results on the right side of the page. The results include the location and type of each error detected by the system, together with suggested corrections.

3.2.2

Data acquisition module

The data collection page provides data collection and annotation functions, and the main part of the page is a form. First, the model developer enters a sentence containing errors and the corresponding correct sentence into the text input boxes. Below the sentence pair is the data annotation function, which is used to mark the location and type of each error in the sentence. By default there is one error annotation per sentence, but the user can dynamically add or remove annotations.

3.2.3

Back-office management module

The back-office management module includes user management, role management, permission management, dataset management, and menu management. These functions can only be accessed by the super administrator, who holds the highest authority. Restricting them to this single role protects administrative operations, while role-based permission control lets the administrator adjust access flexibly.

The main body of the role management page displays all role information as a paginated list. The super administrator can add, modify, and delete roles through the buttons above the role list. The buttons within each row of the list can also be used to modify or delete that role and to enable or disable it.

Permission control is based on the RBAC model, that is, role-based access control. On the role management page, clicking the “Modify Role” button opens a dialog that includes the menu privilege settings. The super administrator can assign menu privileges to the role. Once the assignment is complete, users with that role can access only the menus assigned to it.
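
A minimal sketch of this role-based menu check is given below. The role names, menu names, user names, and the `can_access` function are hypothetical and chosen only for illustration; the platform’s actual data model is not specified in the text.

```python
from typing import Dict, Set

# Hypothetical role-to-menu assignments made by the super administrator.
ROLE_MENUS: Dict[str, Set[str]] = {
    "super_admin": {"text_correction", "data_collection", "user_management",
                    "role_management", "dataset_management"},
    "model_developer": {"text_correction", "data_collection"},
    "student": {"text_correction"},
}

# Roles can be enabled or disabled from the role list.
ENABLED_ROLES: Set[str] = {"super_admin", "model_developer", "student"}

# Each user holds exactly one role (many users to one role).
USER_ROLE: Dict[str, str] = {"alice": "super_admin", "bob": "student"}


def can_access(user: str, menu: str) -> bool:
    """A user reaches a menu only through the enabled role assigned to them."""
    role = USER_ROLE.get(user)
    if role is None or role not in ENABLED_ROLES:
        return False
    return menu in ROLE_MENUS.get(role, set())


print(can_access("alice", "role_management"))
print(can_access("bob", "role_management"))
```

Because permissions attach to roles rather than to individual users, changing what a whole group of users may see requires editing a single entry in the role-to-menu mapping.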

4

English Writing Teaching Platform Application Practice

This chapter will explore the effects of the English writing teaching platform on learners’ writing ability and writing self-efficacy by implementing an English writing teaching experiment based on the English writing teaching platform constructed in this paper in English classrooms in colleges and universities.

4.1

Performance testing

4.1.1

Analysis of loss function results

Before the application experiments were carried out, the performance of the English writing teaching platform was tested to verify its feasibility in practical use. The experimental data come from NUCLE, a parallel corpus for grammatical error correction (GEC). NUCLE contains more than 1,400 student compositions totaling more than 1 million words, annotated with grammatical errors and their corresponding corrections; the error types include article errors, collocation errors, and others. The experimental environment and model parameter settings are shown in Table 2.

Table 2.

Experimental environment

Environment and model	Parameter
Operating system	Windows 10
GPU	NVIDIA GeForce GTX 1070 Ti
TensorFlow version	tensorflow-gpu 1.12.0
Python version	Python 3.6
Memory	8 GB
Network number	6
Word vector dimension	256
Learning rate	1

Comparison experiments were conducted between the English writing teaching platform and the nested attention neural model, the deep context model, and the CNN-based Seq2Seq model; the loss curves of the models on the NUCLE dataset are shown in Figure 2. The loss of the platform proposed in this study decreases the fastest: after 80K training steps it falls below 60 and thereafter fluctuates within the range of 40~60. The curve of the CNN-based Seq2Seq model is broadly similar to that of the platform, with the loss falling below 70 after 100K steps and then settling between 50 and 70. The losses of the nested attention neural model and the deep context model decrease more slowly, falling below 90 only after 150K steps and then fluctuating between 70 and 90. The results show that the platform’s loss converges fastest and to the lowest level, and that its curve fluctuates less than those of the other three models.

4.1.2

Accuracy analysis

To verify the accuracy of the English writing teaching platform based on the automatic English error correction model constructed in this paper, the RNN-based Seq2Seq model and the LSTM-based Seq2Seq model are added in this section for comparative analysis. The text error correction results of the six models are shown in Table 3. The table shows that this paper’s platform achieves the highest scores, with an F_0.5 of 56.34, a precision (P) of 66.84, and a recall (R) of 35.11, outperforming the other error correction models on all three metrics.

Table 3.

Error correction effect of English text

Model	P	R	F_0.5
Seq2Seq model based on RNN	39.84	30.01	37.59
Seq2Seq model based on LSTM	48.96	34.02	42.42
Nested attention neural model	54.88	25.23	45.76
Deep context model	53.77	21.32	43.21
Seq2Seq model based on CNN	61.17	33.29	51.53
Grammar automatic error correction model	66.84	35.11	56.34

4.1.3

Response time analysis

This section analyzes the running response time of the English writing teaching platform built on the BERT-based automatic English text error correction model; the comparison results are shown in Figure 3. The figure shows that, compared with the other automatic English text error correction models, the platform in this paper has the shortest response time. As the amount of data to be checked increases, the platform’s response time first rises gradually and then stabilizes, finally staying at 1.35 s. The platform therefore responds quickly while continuing to run smoothly.

4.2

Application Practice Analysis

The subjects of this experiment were two classes of first-year business English majors at University W in city X, totaling 80 students. Each class had 40 students, the ratio of male to female students was essentially the same, and the two classes were at the same point in the English writing course and at a comparable level. These two closely matched classes were used for a two-group controlled experiment: the experimental class was taught with the English writing teaching platform constructed in this paper, while the control class continued with traditional teaching without any changes. All other conditions were the same for the two classes. After the fourteen-week experiment, the students’ writing ability was tested.

In this study, the subjects’ writing ability was analyzed along five dimensions: content expression, organizational structure, vocabulary use, grammar use, and standardized writing. To compare the English writing ability of the experimental class and the control class after the experiment, the researcher analyzed the post-test data on these five dimensions. The results are shown in Table 4. On all five dimensions, the mean score of the experimental class is higher than that of the control class, by 2.53, 0.91, 0.91, 1.18, and 0.52, respectively.

Table 4.

English writing proficiency

Dimension	Class	N	Mean	Standard deviation
Content expression	Experimental class	40	25.85	0.954
Content expression	Control class	40	23.32	0.915
Organizational structure	Experimental class	40	18.19	0.755
Organizational structure	Control class	40	17.28	0.668
Vocabulary use	Experimental class	40	18.51	0.595
Vocabulary use	Control class	40	17.6	0.621
The use of grammar	Experimental class	40	21.82	0.792
The use of grammar	Control class	40	20.64	0.947
Normative writing	Experimental class	40	4.46	0.528
Normative writing	Control class	40	3.94	0.597

An independent samples t-test was conducted on each dimension of English writing proficiency to determine whether the experimental class and the control class differ significantly. The specific results are shown in Table 5. The significance values of Levene’s test for homogeneity of variance for content expression, organizational structure, vocabulary use, grammar use, and standardized writing are 0.488, 0.548, 0.068, 0.297, and 0.065, respectively, all greater than 0.05, so the data satisfy the homogeneity-of-variance assumption. The t-test for equality of means gives Sig. (two-sided) values of 0.000, 0.000, 0.000, 0.000, and 0.001 for the five dimensions, all less than 0.05, indicating a significant difference between the experimental and control classes in every dimension of English writing ability. English writing instruction that applies the platform in this paper therefore has a positive effect on learners’ writing ability: it improves performance in content expression, organizational structure, vocabulary use, grammar use, and standardized writing, producing a significant gap relative to students in the conventional writing class.

Table 5.

Independent samples t-test

-	-	Levene’s test for equality of variances				t-test for equality of means			95% confidence interval of the difference
-	-	F	Sig.	t	df	Sig. (two-sided)	Mean difference	Std. error of difference	Lower limit	Upper limit
Content expression	Equal variances assumed	0.463	0.488	11.972	77	0.000	2.514	0.212	2.112	2.942
Content expression	Equal variances not assumed	-	-	11.986	76.72	0.000	2.514	0.211	2.113	2.942
Organizational structure	Equal variances assumed	0.346	0.548	6.648	77	0.000	1.012	0.152	0.705	1.305
Organizational structure	Equal variances not assumed	-	-	6.642	76.302	0.000	1.012	0.152	0.705	1.305
Vocabulary use	Equal variances assumed	3.262	0.068	7.989	77	0.000	1.027	0.14	0.77	1.296
Vocabulary use	Equal variances not assumed	-	-	8.013	73.789	0.000	1.027	0.14	0.77	1.296
The use of grammar	Equal variances assumed	1.098	0.297	5.88	77	0.000	1.08	0.183	0.709	1.433
The use of grammar	Equal variances not assumed	-	-	5.894	75.244	0.000	1.08	0.183	0.709	1.432
Normative writing	Equal variances assumed	3.588	0.065	3.585	77	0.001	0.424	0.122	0.195	0.676
Normative writing	Equal variances not assumed	-	-	3.592	75.82	0.001	0.424	0.122	0.195	0.676
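
For readers who want to see how an equal-variance t statistic is formed, the sketch below computes it from summary statistics alone. It is an illustration under stated assumptions rather than the analysis actually run for Table 5: the function name `pooled_t` is hypothetical, and the inputs are the rounded means and standard deviations from Table 4, so the output need not match Table 5 exactly.

```python
import math
from typing import Tuple


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Student's t statistic and degrees of freedom, equal variances assumed."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their own df.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means.
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


# Content expression row of Table 4: experimental class vs. control class.
t_stat, df = pooled_t(25.85, 0.954, 40, 23.32, 0.915, 40)
print(f"t = {t_stat:.2f}, df = {df}")
```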

5

Conclusion

Starting from the characteristics of English writing texts, this paper proposes a BERT-based automatic English text error correction model and, on the basis of this model, builds an English writing teaching platform with automatic error correction as its core function, offering a new strategic direction for English writing teaching in colleges and universities.

After the automatic English text error correction model was developed, its effect was tested and analyzed. On the CoNLL-2014 test set, the F_0.5 of this paper’s model is 44.32%, the highest among all models, with a precision of 62.2% and a recall of 20.68%. Its performance on the JFLEG test set is also the best, with a GLEU of 58.49%. Compared with the BERT+SMT+Bi-GRU model, this paper’s model makes more successful corrections in each error category.

Before the English writing teaching platform was applied in practice, its performance was tested. In the loss function analysis, the platform’s loss decreases the fastest, stays within the range of 40~60 after 80K training steps, and fluctuates less than that of the other comparison models. Among the six error correction models compared, including the RNN-based and LSTM-based Seq2Seq models, the platform in this paper has the highest detection accuracy, with an F_0.5 of 56.34 and P and R values of 66.84 and 35.11, respectively. The platform’s response time rises as the amount of data to be checked increases and then stabilizes at 1.35 s, which is also shorter than that of the other comparison models.

In the application experiment, students in the experimental class performed better on all five dimensions of English writing ability (content expression, organizational structure, vocabulary use, grammar use, and standardized writing), with dimension means 2.53, 0.91, 0.91, 1.18, and 0.52 higher than those of the control class, respectively. The independent samples t-test gives Sig. (two-sided) values of 0.000, 0.000, 0.000, 0.000, and 0.001 for the five dimensions, all less than 0.05. This indicates a significant difference between the experimental class and the control class in every dimension of English writing ability, and shows that the English writing teaching platform constructed in this paper can effectively improve students’ English writing ability.

Language: English

Publication frequency: once a year
Journal subjects: Biological sciences, Life sciences, other, Mathematics, Applied mathematics, General mathematics, Physics, Physics, other


Research on English Writing Teaching Strategies for College Students with the Assistance of Artificial Intelligence

Liyun Xu

Published online: 29 Sep 2025

Received: 01 Feb 2025

Accepted: 02 May 2025

DOI: https://doi.org/10.2478/amns-2025-1091

Keywords: Bi-GRU, BERT, English automatic error correction model, English writing

© 2025 Liyun Xu, published by Sciendo.

This work is licensed under the Creative Commons Attribution 4.0 International License.
