Optimization of Intelligent Corpus and Language Writing Teaching Based on Embedded Task Processing System
Published Online: Mar 24, 2025
Received: Oct 31, 2024
Accepted: Feb 16, 2025
DOI: https://doi.org/10.2478/amns-2025-0771
© 2025 Yan Li et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
In today’s information age, artificial intelligence technology is being developed and applied ever more widely. Composition education, an important part of cultivating students’ comprehensive literacy, has also begun to use AI technology to improve teaching effectiveness and students’ writing ability [1-4]. A key prerequisite for high-quality AI composition scoring and generation is a high-quality intelligent corpus [5-6].
A corpus in linguistics is a computerized language library containing a large amount of actual language material; it is an organized record of real human language use and a very important resource for linguistic research [7-8]. Corpora are categorized into “text-only corpora”, which store data purely in text form, and “multimedia corpora”, which contain text, audio, video and other forms of data [9-12]. Language writing teaching is an important part of cultivating students’ language expression ability and logical thinking ability. In language writing teaching, teachers should pay attention to cultivating students’ observation, thinking and expression abilities, so that they can express their views and thoughts in accurate and specific language [13-16]. An effective intelligent corpus plays an important role in improving language writing teaching, and an embedded task processing system can promote and optimize it [17-18].
An embedded task processing system is the product of combining advanced microelectronics, communication and computer technologies with a specific application field; it is a capital- and technology-intensive, highly integrated and innovative multi-task processing system [19-22]. An embedded task processing system is application-centered and based on computer technology; its hardware and software can be tailored to the application, and it is a special-purpose computer system with strict requirements on functionality, reliability, cost, volume and power consumption. It can therefore effectively optimize the intelligent corpus and language writing teaching [23-26].
The study chooses a neural network word embedding model as the text embedding model to construct word embedding vectors, and proposes a system design scheme and functional module design after analyzing the text processing requirements. The text processing system generates an intelligent corpus for language writing teaching, and both the text processing system based on the text embedding model and the intelligent corpus are then used in a teaching experiment. By comparing the changes and differences in the language writing levels of students in the experimental group and the control group before and after the experiment, we compare the proposed system and intelligent corpus with traditional language writing teaching methods and verify that the method of this paper has a positive impact on students’ language writing.
Functional requirements
User login authorization function: a user login authorization interface allows teachers and students to log in to the system in different roles, so that users in different roles have different menu privileges. FAQ knowledge base function: a complete FAQ knowledge base platform interface allows teachers and students to independently maintain the basic knowledge base of the intelligent platform, so that users can conveniently add, delete, modify and query FAQs. Text processing request function: a text processing platform interface allows students to obtain writing assistance; when the submitted writing text matches a question in the FAQ library, the platform directly returns the FAQ library answer; when the FAQ library has no answer, the semantic understanding function attempts to answer the question; and when the semantic understanding function also fails to give an answer, the platform returns an unknown-answer response. Text classification function: a text classification platform interface allows a user to input a single writing text, with the result displayed directly below the input box once intelligent classification completes, or to upload writing texts in batch as an Excel file for intelligent classification and then export the classification results.
Non-functional requirements
High performance: the text processing system needs to provide functional services to multiple users in real time and ensure normal operation under a variety of normal, peak and abnormal load conditions; user operations must be passed to the back end over a low-latency link to complete user authorization, FAQ knowledge base editing, text processing requests and text classification. Accuracy: the accuracy of the text processing platform directly affects the value of the entire system and has two aspects. The first is the accuracy of FAQ knowledge base matching, which requires a reasonably calculated matching threshold to ensure accurate results. The second is the accuracy of extracted answers, which requires a semantic understanding model to fully understand the candidate articles and text information and improve the accuracy of answer prediction.
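The paper does not specify the FAQ matching algorithm; a minimal sketch of one common approach, TF-IDF cosine similarity with a fixed threshold, is shown below (the questions, answers and threshold value are illustrative assumptions):

```python
# Hypothetical sketch: FAQ matching by TF-IDF cosine similarity with a threshold.
# The paper only states that a matching threshold is needed; the vectorizer,
# threshold value and sample data below are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

faq_questions = ["How do I structure an argumentative essay?",
                 "What makes an opening paragraph effective?"]
faq_answers = ["State a thesis, support it with evidence, then conclude.",
               "Hook the reader, give context, state the main idea."]

vectorizer = TfidfVectorizer()
faq_matrix = vectorizer.fit_transform(faq_questions)

def answer(query: str, threshold: float = 0.6):
    """Return the FAQ answer if similarity exceeds the threshold, else None."""
    sims = cosine_similarity(vectorizer.transform([query]), faq_matrix)[0]
    best = sims.argmax()
    # Below the threshold, fall through to the semantic understanding model.
    return faq_answers[best] if sims[best] >= threshold else None
```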
Overall Architecture
The overall structure of the text processing system based on semantic understanding is shown in Figure 1. It shows that users can perform three major operations after logging into the system. The first operation is oriented to the semantic understanding task: the user inputs a writing text and, according to its content, the question and the content text are each segmented into words and stop words are removed, after which the system judges whether the question matches the FAQ library. When there is still no answer, the corresponding feature vector representation is generated by extracting text features, and the semantic understanding model jointly processes the question and text information to obtain a solution set for writing assistance. Finally, the solution set is output to the front-end interaction page for the user to select from. The second operation is oriented towards the FAQ library: the basic dataset is constructed by editing the FAQ library and entering all saved information into the library. The third operation is oriented to the text classification task: the user either inputs a single text, with the classification result displayed directly on the page, or uploads an Excel table in which the first column of each row lists a content text to be classified. After the content is extracted, the data is cleaned, the BERT model generates feature vectors with contextual information, and a CNN serving as the output layer produces the classification result for each text (a code sketch of this classification path follows the system structure diagram). A new Excel table is then generated in which the first column of each row is still the original content text and the second column is the classification category, which the user can export.
Database Design
The database of the platform constructed in this thesis is mainly used to store user information, historical query records and platform access records. Storing the system’s historical queries helps collect the writing corpus and update and improve the system in time. According to the needs of the system, MySQL, a lightweight database with a small footprint and fast response, was selected as the platform database.
Data Interaction Design
To ensure the security and authority of interface invocation, this system requires that each data interaction carry the unique token of the current user in order to exchange information legally. The security framework used is Spring Security + JWT; access to a back-end interface requires the token to be carried in the request header.

System structure diagram
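A minimal sketch of the BERT-plus-CNN classification path described above, using the Hugging Face transformers library; the model name bert-base-chinese, the convolution width and the number of classes are illustrative assumptions, not the paper’s configuration:

```python
# Sketch: BERT features with a CNN output layer for text classification.
# bert-base-chinese, kernel size, channel count and num_classes are assumptions.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertCnnClassifier(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.conv = nn.Conv1d(self.bert.config.hidden_size, 128, kernel_size=3)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) -> (batch, hidden, seq_len) for Conv1d
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        features = torch.relu(self.conv(hidden.transpose(1, 2)))
        pooled = features.max(dim=2).values      # max-pool over the sequence
        return self.fc(pooled)                   # class logits

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
batch = tokenizer(["这是一个写作文本示例"], return_tensors="pt",
                  padding=True, truncation=True)
logits = BertCnnClassifier()(batch["input_ids"], batch["attention_mask"])
```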
Data preparation
In this paper, we construct a text processing system in the field of language writing teaching. The word segmentation dictionary, named entity dictionary, lexical lexicon and BERT dictionary among the knowledge resources use public data resources, while the text classification dataset, the text semantic comprehension dataset and the question library must be constructed manually for the system design.
Data Processing
After the writing text processing module receives a text, the data cleaning program first removes irrelevant characters from the text according to specific regular expressions, ensuring that the main semantics of the text are not disturbed by “noise”. The text is then segmented into words. The main steps of text preprocessing are as follows (a code sketch of these steps appears at the end of this section): Step 1: Get the content of the writing text. Step 2: Remove irrelevant characters from the text with specific regular expressions to obtain the cleaned writing text. Step 3: Load the word segmentation dictionary and the stop-word dictionary, and call the CAS word segmentation system NLPIR to generate segmentation results. Step 4: Compare each segmented word with the words in the stop-word dictionary to determine whether it is a stop word. Step 5: If the word is a stop word, delete it from the segmentation result; otherwise continue from Step 4 until all segmented words have been traversed. Step 6: Output the final result and end.
Writing Text Processing
The text processing module of the system is a multi-strategy module based on the semantic understanding task and the text classification task. It implements the text processing platform in the field of language writing teaching, analyzes texts with natural language semantics, and returns solutions to the user using different solution strategies.
Intelligent Classification
The text classification module intelligently categorizes language writing texts. Users can obtain the classification result for a single case by entering a single text into the input box and clicking the classification button, or batch-input unclassified texts in Excel format; after a successful upload, the back end carries out the batch text classification task and re-exports the classified texts in the same Excel format, so that the user obtains the classified document.
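A minimal sketch of Steps 1–6 above, using jieba as a stand-in for the NLPIR segmenter; the regular expression and stop-word list are illustrative assumptions:

```python
# Sketch of the preprocessing steps; jieba stands in for NLPIR,
# and the regex and stop-word list are illustrative assumptions.
import re
import jieba

STOPWORDS = {"的", "了", "是", "在"}          # Step 3: stop-word dictionary (sample)

def preprocess(text: str) -> list[str]:
    # Step 2: strip characters irrelevant to the main semantics.
    cleaned = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9]", "", text)
    # Step 3: segment the cleaned text into words.
    tokens = jieba.lcut(cleaned)
    # Steps 4-5: drop tokens that appear in the stop-word dictionary.
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("今天的作文写得很好!"))       # Step 6: output the final result
```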
The purpose of the text processing system is to improve students’ language writing for specific learning objectives, so before generating the word embedding vectors it is necessary to collect a corpus of writing texts corresponding to those learning objectives.
After collecting the corpus, it is also necessary to preprocess the data to ensure that it meets the input structure of the word embedding model. The steps of corpus preprocessing are shown in Figure 2.

Corpus data processing process
In data preprocessing, the collected corpus data is first merged into a single file, and unnecessary symbols such as commas, periods and numbers are removed. After that, Chinese stop words are removed. From the remaining corpus a vocabulary list is constructed, and three-word tuples are built following the N-gram idea (three consecutively occurring words are treated as a tuple, and the first two words are used to predict the last word). Finally, input features and output labels are constructed according to the input format required by the word embedding model.
After data preprocessing, a corpus that can be used for training has been constructed. The corpus data is then fed into the word embedding model to train the word embedding vectors.
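A minimal sketch of the trigram construction described above; the tokenized corpus and vocabulary indexing are illustrative:

```python
# Sketch: build (two-word context, target word) training pairs per the N-gram idea.
tokens = ["语言", "写作", "教学", "需要", "语料"]   # illustrative segmented corpus

vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}   # vocabulary list

# Three consecutive words form a tuple; the first two predict the third.
trigrams = [(tokens[i], tokens[i + 1], tokens[i + 2])
            for i in range(len(tokens) - 2)]
features = [(vocab[a], vocab[b]) for a, b, _ in trigrams]   # input features
labels = [vocab[c] for _, _, c in trigrams]                 # output labels
```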
In NLP tasks, since computers cannot understand textual language directly, text must be transformed into feature vector representations that carry semantic information and capture the semantic relationships between words and sentences. These vectors are usually designed as fixed-length dense vectors, enabling computers to process and understand textual data efficiently and simplifying text processing. Traditional text representation methods such as One-Hot or Bag-of-Words models are simple to apply, but they produce high-dimensional sparse vectors that hinder the capture of semantic relationships between words. In contrast, current text embedding models such as Word2vec, GloVe and BERT generate low-dimensional dense vectors, which reduces the feature dimensionality, lowers model complexity, cuts resource consumption and improves training efficiency.
Word2vec
Word2vec, as a text embedding technique [27], converts text into numerical vectors through a neural network model and thereby captures the semantic relationships between words. Its two commonly used structures are the continuous bag-of-words (CBOW) model and the skip-gram model, which are similar in principle but differ in what they predict. The CBOW model takes a set of context words as input and predicts the target word, and works better on small datasets [28]. The skip-gram model takes the target word as input and predicts the context words around it; since a prediction must be made for each word occurring in the context, it is usually slower [29]. Overall, Word2vec as a word embedding model only observes local information between words and their neighbors, and does not take into account connections with other words in the context. The structures of the two models are shown in Figure 3.
BERT
In the field of NLP, encoding long sequences has always been a challenge. Previous research relied widely on RNNs, LSTMs or GRUs to process sequence data, whose gradient vanishing and gradient explosion problems limit a model’s ability to understand complex semantics. The Transformer effectively solves this problem by introducing a multi-head attention mechanism that allows each sequence element to attend directly to every other element during encoding, effectively capturing global dependencies in the sequence. The model contains a multi-layer encoder-decoder structure, and each layer adopts a self-attention mechanism and a fully connected network, which improves computational efficiency, enables parallel processing of data, and strengthens the model’s ability to capture long-distance dependencies. The Transformer discards the earlier recurrent mechanism and injects the order of sequence tokens through positional encoding, further improving its understanding of sequence features. With the continuing advance of text embedding technology in NLP, the BERT pre-training model further improves the performance of NLP tasks by introducing a bidirectional representation of text [30], opening the era of pre-training models. The structure of the Transformer encoder is shown in Fig. 4, where E denotes the input sequence, Trm denotes the Transformer encoder, T denotes the word vectors of the output sequence, and n denotes the length of the sequence. The BERT model consists of 12 encoder modules and achieves bidirectional comprehension of the text through MLM.

Word2vec model structure
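To make the CBOW/skip-gram distinction described above concrete, a minimal gensim sketch follows; the corpus and hyperparameters are illustrative, and gensim is not the paper’s stated tooling:

```python
# Sketch: training CBOW vs. skip-gram Word2vec with gensim (illustrative corpus).
from gensim.models import Word2Vec

sentences = [["语言", "写作", "教学"], ["写作", "需要", "语料"]]

cbow = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=1)

print(cbow.wv["写作"][:5])    # 100-dimensional dense vector (first 5 dims)
```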

Transformer structure
The BERT model is divided into two main stages, pre-training and fine-tuning, and combines two mechanisms in the pre-training process: the Masked Language Model (MLM) and Next Sentence Prediction (NSP). The MLM follows the idea of a cloze (fill-in-the-blank) task: 15% of the words in the input sequence are randomly replaced with the [MASK] marker, and the model predicts the masked words from their context. By hiding words, the masking mechanism forces the model to use contextual information during training, bringing the predicted words closer to the original words. In addition, NSP is used to determine whether there is a logical relationship between two sentences; NSP is a binary prediction task. BERT combines the two prediction strategies, which significantly improves the generalization ability of the model despite the high training cost, yields better results for text- and sentence-level prediction, and makes the model applicable to more downstream tasks.
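A minimal sketch of the MLM masking step described above; note that full BERT additionally uses an 80/10/10 replacement scheme among the selected tokens, which this sketch omits:

```python
# Sketch of the MLM masking step: randomly replace ~15% of tokens with [MASK].
# Simplified: full BERT keeps or swaps some selected tokens (80/10/10 scheme).
import random

def mask_tokens(tokens: list[str], rate: float = 0.15) -> list[str]:
    masked = tokens[:]
    for i in range(len(masked)):
        if random.random() < rate:
            masked[i] = "[MASK]"     # the model must predict this from context
    return masked

print(mask_tokens(["the", "student", "wrote", "a", "clear", "essay"]))
```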
Although the BERT model shows good results in processing complex corpora, it performs poorly in the domain of language writing teaching, so in this paper we use a neural-network-based text embedding model pre-trained on a domain-specific corpus.
Neural network word embedding model
Dense word embedding vectors are usually generated by neural network training. In order to obtain word embedding vectors suited to the specific learning scenario, this paper uses the PyTorch framework to construct a neural-network-based word embedding model, whose structure is shown in Figure 5.

The word embedding model based on neural network
The system uses the N-gram idea to treat three consecutively occurring words as one sample, using the first two words to predict the third. The input therefore consists of two words: the embedding layer transforms these two words into the corresponding word embedding vectors, the output layer transforms them into an output vector whose dimension equals the size of the vocabulary list, and finally a softmax function converts the output into the predicted probability of each word in the vocabulary.
The loss function used in training the neural network is the cross-entropy loss, and the optimizer is stochastic gradient descent with a learning rate of 0.01.
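A minimal PyTorch sketch of this trigram model and training setup; the embedding dimension and vocabulary size are illustrative assumptions, while the cross-entropy loss and SGD with learning rate 0.01 follow the text:

```python
# Sketch: two-word-context trigram model; embedding/vocab sizes are assumptions,
# cross-entropy loss and SGD(lr=0.01) follow the paper.
import torch
import torch.nn as nn

class TrigramEmbeddingModel(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.out = nn.Linear(2 * embed_dim, vocab_size)   # vector per word pair

    def forward(self, pair):                 # pair: (batch, 2) word indices
        vecs = self.embed(pair).flatten(1)   # concatenate the two embeddings
        return self.out(vecs)                # logits over the vocabulary

model = TrigramEmbeddingModel(vocab_size=5000)
loss_fn = nn.CrossEntropyLoss()              # applies (log-)softmax internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

pairs = torch.tensor([[12, 47]])             # illustrative (word1, word2) input
targets = torch.tensor([301])                # illustrative third-word label
loss = loss_fn(model(pairs), targets)
loss.backward()
optimizer.step()
```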
Text classification is a classic problem in text mining, aiming at the automatic classification and labeling of massive text according to information such as topic. By text length, text classification can be divided into short text classification and long text classification. This section focuses on the performance of the neural-network-based text embedding method and other methods on long text classification. The experimental results of this paper’s method and each comparison method on the writing dataset are shown in Table 1.
The writing dataset text classification experiment results
| Text representation method | Accuracy | Precision | Recall | F1 | Rank |
|---|---|---|---|---|---|
| DTM | 0.683 | 0.687 | 0.737 | 0.870 | 5 |
| FBOW | 0.787 | 0.863 | 0.878 | 0.886 | 2 |
| LDA | 0.811 | 0.726 | 0.701 | 0.881 | 3 |
| Word2Vec-DTM | 0.749 | 0.882 | 0.880 | 0.635 | 12 |
| P-SIF | 0.847 | 0.794 | 0.822 | 0.733 | 7 |
| Doc2Vec | 0.699 | 0.731 | 0.651 | 0.674 | 11 |
| WME | 0.748 | 0.839 | 0.813 | 0.875 | 4 |
| TextGCN | 0.697 | 0.723 | 0.761 | 0.728 | 8 |
| Attention-BiLSTM | 0.716 | 0.793 | 0.700 | 0.726 | 9 |
| TextCNN | 0.870 | 0.776 | 0.723 | 0.691 | 10 |
| XLNet | 0.842 | 0.770 | 0.880 | 0.850 | 6 |
| Ours | 0.908 | 0.920 | 0.907 | 0.896 | 1 |
As can be seen from Table 1, the neural-network-based text embedding method of this paper leads the other methods on the writing text classification dataset, with clear gains on the four evaluation indexes and a rank ahead of all other text representation methods. Its Accuracy, Precision, Recall and F1 values on writing text classification are 0.908, 0.920, 0.907 and 0.896 respectively, the highest scores among all compared text representation methods.
The experimental results show that the neural-network-based text embedding method proposed in this paper is significantly better than the other methods on the text classification task. In this experiment the texts in the dataset are long, mostly paragraph- or chapter-level, and for such texts the method of this paper alleviates the semantic overestimation phenomenon and effectively enhances text representation capability.
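The four metrics in Table 1 can be computed as in the following scikit-learn sketch; macro averaging over classes is an assumption, since the paper does not state its averaging scheme, and the labels are illustrative:

```python
# Sketch: computing Accuracy/Precision/Recall/F1 with scikit-learn.
# Macro averaging over classes is an assumption; labels are illustrative.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 2, 0, 0, 2]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                   average="macro")
print(f"Accuracy={acc:.3f} Precision={prec:.3f} Recall={rec:.3f} F1={f1:.3f}")
```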
The text processing system and the intelligent corpus it generates were used in language writing teaching practice to investigate their effect on students’ language writing level. Two sophomore classes in secondary school A were chosen as the experimental subjects of this study. The scoring criteria for language writing covered nine aspects: accuracy of diction, fluency of discourse, chapter structure, logic of the essay, stylized narrative, writing specification, degree of innovation of the topic, aesthetics of language, and clarity of intention. One class was randomly selected as the experimental group, whose language writing teaching used the text processing system and intelligent corpus designed in this paper, and the other class served as the control group, which used the traditional language writing teaching method. The two groups underwent a nine-week teaching experiment to determine the actual effectiveness of the text processing system and intelligent corpus by measuring the differences in the two groups’ language writing levels before and after the experiment.
Differences between the two groups before the experiment
Before conducting the experiment, it is necessary to control the factors affecting the students’ language writing level so that the experiment proceeds according to plan. Before the experiment began, students in both the experimental and control groups were evaluated for their language writing level, and the results were analyzed with SPSS 22.0. To compare the nine dimensional indicators of the two classes’ language writing level before the experiment, the test data were analyzed with an independent samples t-test (a code sketch of this test follows Table 3); the results are shown in Table 2. According to Table 2, the mean total scores of the two groups’ language writing level before the experiment were 50.32 and 50.27 respectively, a difference of only 0.05 points with p>0.05, so there was no significant difference between the two groups’ total scores. The scores of the experimental group on the nine dimensions of language writing before the experiment were 6.01, 5.93, 5.15, 5.72, 5.25, 5.34, 6.06, 5.44 and 5.42 respectively, and those of the control group were 5.98, 5.95, 5.27, 5.88, 5.04, 5.22, 5.82, 5.46 and 5.65. The score difference between the two groups on each of the nine dimensions was no more than 0.5 points, and p>0.05 on all nine dimensions, so there was no significant difference between the two groups on any dimension of language writing level. The two groups of students are therefore comparable in language writing level, and this will not affect the results of the subsequent teaching experiment.
Differences between the two groups after the experiment
After the experiment, the nine dimensions of the two classes’ language writing level were examined, and the data were analyzed with an independent samples t-test; the results are shown in Table 3. According to Table 3, the p-values of the nine dimensions of language writing level between the experimental group and the control group after the experiment are all less than 0.01, and the total score gives p=0.000 (p<0.01), indicating a highly significant difference between the two groups on the dimensions and total score of language writing level after the experiment. On the nine sub-dimensions, the experimental group exceeded the control group by 2.83, 3.52, 3.44, 3.99, 4.30, 4.17, 3.22, 3.77 and 4.35 points respectively, and by 33.59 points in overall level. These data show a highly significant post-experiment difference between the two groups in both the sub-dimensions and the overall level of language writing, indicating that adding the text processing system and intelligent corpus to language writing teaching improves students’ writing level more than conventional language writing teaching.
Comparison of Chinese writing level of 2 groups before the experiment
| Dimension | Experimental group (M±SD) | Control group (M±SD) | t | p |
|---|---|---|---|---|
| Accuracy of words | 6.01±1.49 | 5.98±1.36 | 0.645 | 0.825 |
| Discourse fluency | 5.93±1.75 | 5.95±1.83 | -0.284 | 0.776 |
| Article structure | 5.15±1.45 | 5.27±1.63 | -0.438 | 0.794 |
| Textual logic | 5.72±1.97 | 5.88±1.55 | -0.305 | 0.771 |
| Stylized narrative | 5.25±1.31 | 5.04±1.38 | 0.296 | 0.868 |
| Writing standards | 5.34±1.33 | 5.22±1.82 | 0.542 | 0.872 |
| Subject innovation | 6.06±1.72 | 5.82±1.52 | 0.687 | 0.705 |
| Linguistic beauty | 5.44±1.60 | 5.46±1.85 | -0.152 | 0.844 |
| Explicit resolution | 5.42±1.33 | 5.65±1.68 | -0.723 | 0.914 |
| Total | 50.32±7.88 | 50.27±8.27 | 0.315 | 0.888 |
Comparison of Chinese writing level of 2 groups after the experiment
| Dimension | Experimental group (M±SD) | Control group (M±SD) | t | p |
|---|---|---|---|---|
| Accuracy of words | 9.78±5.13 | 6.95±1.34 | 3.984 | 0.004 |
| Discourse fluency | 9.76±5.62 | 6.24±1.62 | 4.956 | 0.003 |
| Article structure | 9.48±5.61 | 6.04±1.26 | 4.738 | 0.003 |
| Textual logic | 10.46±3.93 | 6.47±1.73 | 5.744 | 0.002 |
| Stylized narrative | 10.22±4.44 | 5.92±1.64 | 6.029 | 0.001 |
| Writing standards | 9.82±4.60 | 5.65±1.26 | 5.035 | 0.002 |
| Subject innovation | 9.24±5.53 | 6.02±1.46 | 4.356 | 0.003 |
| Linguistic beauty | 9.66±4.38 | 5.89±1.31 | 4.967 | 0.003 |
| Explicit resolution | 10.18±4.40 | 5.83±1.99 | 6.464 | 0.001 |
| Total | 88.60±13.52 | 55.01±7.26 | 14.979 | 0.000 |
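The paper uses SPSS 22.0; an equivalent independent samples t-test, as applied in Tables 2 and 3, can be sketched with SciPy (the score arrays are illustrative placeholders, not the study’s raw data):

```python
# Sketch: independent samples t-test, as used for Tables 2 and 3.
# SPSS 22.0 was used in the paper; scipy provides the equivalent test.
# The score lists are illustrative placeholders, not the study's raw data.
from scipy import stats

experimental = [9.1, 10.4, 8.7, 11.2, 9.8]   # hypothetical post-test scores
control = [6.5, 7.1, 6.8, 6.2, 7.4]

t, p = stats.ttest_ind(experimental, control)
print(f"t = {t:.3f}, p = {p:.3f}")            # p < 0.01 -> highly significant
```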
Changes in the experimental group before and after the experiment
To test whether the text processing system and intelligent corpus applied in the experimental group affected the students’ language writing level, the experimental group’s data before and after the experiment were subjected to a paired-samples t-test (a sketch of this test follows Table 5); the comparison results are shown in Table 4. As Table 4 shows, the experimental group’s language writing scores improved: the scores on the nine dimensions of language writing proficiency increased by 3.77, 3.83, 4.33, 4.74, 4.97, 4.48, 3.18, 4.22 and 4.76 respectively, the total score increased by 38.28 points, and the p-values of the dimensions and the total score are all less than 0.01, indicating that the experimental group’s language writing level improved very significantly after the teaching experiment. Therefore, the text processing system and intelligent corpus designed in this paper have a significant effect on improving students’ language writing skills.
Changes in the control group before and after the experiment
To test whether the traditional language writing teaching mode used in the control group affected the students’ language writing level, the control group’s data before and after the experiment were subjected to a paired-samples t-test; the results are shown in Table 5. As Table 5 shows, the control group’s language writing level also improved, but the improvement on each dimension was no more than 1 point, the total score increased by only 4.74 points, and the p-values of the dimensions and the total score are all greater than 0.05, indicating that although the control group’s language writing level improved after traditional language writing teaching, the improvement was too small to be significant. Therefore, the traditional teaching mode has no significant effect on improving students’ language writing level.
Comparison of Chinese writing level of experimental group before and after experiment
| Dimension | Before (M±SD) | After (M±SD) | t | p |
|---|---|---|---|---|
| Accuracy of words | 6.01±1.49 | 9.78±5.13 | 3.312 | 0.003 |
| Discourse fluency | 5.93±1.75 | 9.76±5.62 | 3.548 | 0.003 |
| Article structure | 5.15±1.45 | 9.48±5.61 | 4.152 | 0.002 |
| Textual logic | 5.72±1.97 | 10.46±3.93 | 4.928 | 0.002 |
| Stylized narrative | 5.25±1.31 | 10.22±4.44 | 5.516 | 0.001 |
| Writing standards | 5.34±1.33 | 9.82±4.60 | 4.625 | 0.002 |
| Subject innovation | 6.06±1.72 | 9.24±5.53 | 2.584 | 0.004 |
| Linguistic beauty | 5.44±1.60 | 9.66±4.38 | 3.954 | 0.003 |
| Explicit resolution | 5.42±1.33 | 10.18±4.40 | 5.035 | 0.001 |
| Total | 50.32±7.88 | 88.60±13.52 | 15.554 | 0.000 |
Comparison of Chinese writing level of control group before and after experiment
| Dimension | Before (M±SD) | After (M±SD) | t | p |
|---|---|---|---|---|
| Accuracy of words | 5.98±1.36 | 6.95±1.34 | 1.654 | 0.734 |
| Discourse fluency | 5.95±1.83 | 6.24±1.62 | 0.521 | 0.905 |
| Article structure | 5.27±1.63 | 6.04±1.26 | 0.985 | 0.594 |
| Textual logic | 5.88±1.55 | 6.47±1.73 | 0.745 | 0.703 |
| Stylized narrative | 5.04±1.38 | 5.92±1.64 | 1.035 | 0.685 |
| Writing standards | 5.22±1.82 | 5.65±1.26 | 0.634 | 0.763 |
| Subject innovation | 5.82±1.52 | 6.02±1.46 | 0.312 | 0.671 |
| Linguistic beauty | 5.46±1.85 | 5.89±1.31 | 0.642 | 0.721 |
| Explicit resolution | 5.65±1.68 | 5.83±1.99 | 0.242 | 0.584 |
| Total | 50.27±8.27 | 55.01±7.26 | 3.685 | 0.807 |
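Similarly, the paired-samples t-tests behind Tables 4 and 5 correspond to SciPy’s ttest_rel; the scores below are illustrative, not the study’s data:

```python
# Sketch: paired-samples t-test, as used for Tables 4 and 5.
# Each student's pre- and post-test scores are paired; data is illustrative.
from scipy import stats

pre = [6.0, 5.8, 6.3, 5.5, 6.1]               # hypothetical pre-test scores
post = [9.7, 9.9, 10.1, 9.2, 9.8]             # the same students' post-test

t, p = stats.ttest_rel(pre, post)
print(f"t = {t:.3f}, p = {p:.3f}")
```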
This article develops a text processing system based on a text embedding model, constructs word embedding vectors with a neural network word embedding model, and generates an intelligent corpus for language writing teaching; together, the text processing system and intelligent corpus assist the teaching of language writing.
The neural-network-based text embedding method in this paper achieves an F1 value of 0.896 on writing text classification, significantly higher than the other text representation methods, and effectively improves text representation ability. Before the experiment, the score difference between the experimental group and the control group on each dimension and on the total score of language writing level was less than 0.5 points, with p-values greater than 0.05, so the two groups’ language writing levels were highly homogeneous. After the experiment, the experimental group’s overall language writing level was 33.59 points higher than the control group’s, exceeding it by 2.83, 3.52, 3.44, 3.99, 4.30, 4.17, 3.22, 3.77 and 4.35 points on the nine dimensions, with all p-values less than 0.01. The experimental group’s scores on the nine dimensions each increased by more than 3 points and its total score increased by 38.28 points (p<0.01), while the control group’s improvement on every dimension was no more than 1 point and its total score rose by only 4.74 points (p>0.05). Compared with traditional language writing teaching methods, the text processing system and intelligent corpus designed in this paper therefore have a significant positive impact on students’ language writing level.
