
Intelligent Construction of Civic Teaching Resources for Ancient Literature Course Based on Natural Language Processing Technology

Mar 21, 2025


Introduction

The level of educational development is an important indicator of the overall development level and development potential of a country and a nation [1-2]. Strengthening students' ideological and political education is an important measure for comprehensively implementing the Party's education policy in the new era and for fulfilling the fundamental task of fostering virtue and cultivating talents for the country. Therefore, the construction of ideological and political education is of great significance in ancient Chinese literature courses.

First of all, ancient Chinese literature is an outstanding cultural heritage created by the Chinese nation over thousands of years, with high artistic value and deep cultural roots [3-4]. Studying ancient Chinese literature not only helps students understand and grasp the cultural tradition of the Chinese nation but, more importantly, cultivates their national pride and cultural self-confidence. Secondly, as a humanities subject, ancient Chinese literature contains rich intellectual content and profound wisdom about life [5-6]. Its study helps to cultivate students' humanistic qualities, aesthetic interests and moral character, and to improve their outlook on life and their abilities. Finally, ancient Chinese literature carries a rich social and historical background and information about its times [7-8].

In general, ancient Chinese literature, with its profound ideas, is of great help to students' ideological and political (Civics) literacy. However, today's teaching mostly relies on traditional lecturing, which suffers from a monotonous teaching format, low student interest and a lack of teaching resources for course Civics. For this reason, a platform or system is needed to analyze ancient Chinese literature and extract its central ideas, so as to provide reliable resources for the teaching of course Civics, and natural language processing technology can provide valuable help here. Natural language processing is a field of computer science that aims to enable computers to understand and process natural language texts [9]. It mainly includes modules such as language modeling, syntactic analysis, semantic analysis and machine translation. Language modeling is a natural language processing technique based on statistical learning theory whose purpose is to estimate the probability of the next word given the words that have already appeared; syntactic analysis is one of the key links in natural language processing, whose purpose is to analyze the grammatical structure of natural language text; semantic analysis is a technique for analyzing the meaning of natural language text and reasoning over it; machine translation uses computer technology to automatically translate sentences from one language to another and is an important application of natural language processing [10-13]. Therefore, by extracting the features of ancient literature, building models, analyzing its syntax and semantics, and rendering the language of the literature in terms compatible with modern ideological and political education, the results can be systematically processed and integrated into teaching resources, realizing the intelligent construction of teaching resources [14-15].

In this paper, on the basis of the construction of the keyword library of the elements of ideology and politics and text pre-processing, the cosine similarity and TF-IDF algorithms are used to calculate and count the similarity between the course content and the elements of ideology and politics in the text. The BERT model is introduced and improved, and the comprehensive scoring system of Civic and Political elements is constructed by combining the cosine similarity and TF-IDF algorithm to realize the automatic screening of Civic and Political elements. Based on the knowledge graph, the Civics Teaching Resources Recommendation Model for Ancient Literature Course is constructed, and suitable teaching resources are recommended to students precisely. Through relevant experimental analysis, we demonstrate the significance of this paper’s research in the construction of Civics teaching resources in ancient literature courses.

Development of resources for teaching Civics in the curriculum
Automatic Screening of Curriculum Civics Elements
Construction of a keyword library of Civics elements

It is of great significance to integrate ideological and political education elements into the curriculum of ancient literature to cultivate students’ comprehensive quality and sense of social responsibility. To this end, this paper constructs a comprehensive keyword database of ideological and political elements, which aims to serve as the basis for screening and integrating relevant educational resources. In terms of the selection of keywords, the research not only focuses on basic political theoretical terms, such as “socialism” and “core values”, but also focuses on modern issues closely related to ancient literature, including but not limited to “modern inheritance of traditional culture”, “film and television adaptation of literary classics”, and “use of ancient literary elements”. These keywords not only cover political theories, laws and regulations, moral norms, etc., but also include the ideological emotions and social responsibilities related to ancient literature. The construction of the keyword database is a dynamic process, which will be continuously updated and expanded with the development of society and the change in educational needs.

Text pre-processing

Chinese text differs from English text in that its basic constituent unit is the character: characters are combined into words, and words into sentences, and words are not separated from each other by spaces as in Western languages but appear as uninterrupted strings. Therefore, it is necessary to preprocess Chinese text. According to the linguistic characteristics of Chinese, in addition to data cleaning operations such as removing spaces and special symbols, the text also needs to be split into sentences and words, and finally a stop word list is loaded to remove useless words, phrases and special symbols from the text.

Text title and body screening

In the experimental data used in this paper, each document is separated by a document tag, and the internal id represents its location information. The summary tag contains the text title of the current document, while the short_text tag contains the main content of its text. In this paper, we use the BeautifulSoup extension library based on Python to parse the corresponding documents and extract the title and text content of each one.
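To make the parsing step concrete, the following is a minimal sketch of how the title and body could be pulled out with BeautifulSoup, assuming each document is wrapped in a doc tag carrying an id attribute and containing the summary and short_text tags described above; the tag names and file layout are illustrative assumptions, not the exact format of the experimental corpus.

```python
from bs4 import BeautifulSoup

def parse_corpus(xml_text):
    """Extract (id, title, body) triples from the tagged corpus described above.

    Assumes each document is wrapped in a <doc> tag with an id attribute,
    a <summary> tag holding the title and a <short_text> tag holding the body;
    adjust the tag names if the real corpus differs."""
    soup = BeautifulSoup(xml_text, "html.parser")
    records = []
    for doc in soup.find_all("doc"):
        doc_id = doc.get("id", "")
        title_tag = doc.find("summary")
        body_tag = doc.find("short_text")
        title = title_tag.get_text(strip=True) if title_tag else ""
        body = body_tag.get_text(strip=True) if body_tag else ""
        records.append((doc_id, title, body))
    return records
```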

After extracting the title and text content, the body of the document is divided into sentences and words, and the useless labels, special symbols and stop words are removed, which is convenient for the subsequent training of the BERT model.

Text Segmentation

After extracting the title and body text, the body needs to be segmented. According to the linguistic features of the text, the body is first split into sentences at sentence-final punctuation marks such as the full stop, question mark and exclamation mark.

After the article is broken down into individual sentences, they are then broken down into words. Chinese words are composed of characters, which is the biggest difference between the preprocessing stage of Chinese natural language processing and other languages. Segmentation refers to the process of converting a sequence of characters in a document composed of Chinese characters into multiple phrases using different segmentation algorithms, and the multiple phrases formed still maintain the same order of expression as the original text. There are two types of algorithms for segmenting sentences into words in the Chinese domain: one is to match words in manually edited dictionaries, and the other is to use a statistical approach.

The word segmentation method based on dictionary matching is also called mechanical segmentation. The algorithm requires a sufficiently large dictionary to be prepared in advance; the sentence to be processed is then matched against the words in the dictionary, and whenever a match is found the corresponding word is split out of the sentence. Depending on how strings are searched, the N-shortest-path method, the maximum-probability method and the maximum matching algorithm are derived. The maximum matching algorithm can be further divided into forward matching, reverse matching and bidirectional matching according to the direction in which the text is scanned. The forward maximum matching algorithm operates as follows:

Step 1: Assuming that the initial length of the longest phrase in the dictionary is L, cut the sentence to be split from left to right according to the length L to get the string c_str and match it with the phrase in the dictionary.

Step 2: If c_str can be searched in the dictionary, then c_str is segmented as a phrase. If it fails to match in the dictionary, delete the last text in c_str, search in the dictionary again, and repeat the above operation until a phrase is split or the length of c_str is zero. By analogy, repeat the above operation by shifting the window of length L to the right until the whole sentence is split.

The reverse maximum matching algorithm is similar to the forward one but scans from right to left. The bidirectional maximum matching algorithm combines the two and compares their results before further processing.
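As a concrete illustration of Steps 1-2, the sketch below implements forward maximum matching over a toy dictionary; the dictionary contents are hypothetical, and unmatched single characters are simply emitted as one-character tokens, a common fallback not spelled out in the text.

```python
def forward_max_match(sentence, dictionary, max_len=None):
    """Forward maximum matching segmentation (Steps 1-2 above).

    dictionary: a set of known words; max_len defaults to the length of
    the longest dictionary entry (L in the text)."""
    if max_len is None:
        max_len = max(len(w) for w in dictionary)
    words, start = [], 0
    while start < len(sentence):
        matched = None
        # Try the longest candidate first, shrinking one character at a time.
        for length in range(min(max_len, len(sentence) - start), 0, -1):
            c_str = sentence[start:start + length]
            if c_str in dictionary or length == 1:
                matched = c_str
                break
        words.append(matched)
        start += len(matched)
    return words

# Toy example with a hypothetical mini-dictionary.
demo_dict = {"古代", "文学", "课程", "思政", "元素"}
print(forward_max_match("古代文学课程思政元素", demo_dict))
# -> ['古代', '文学', '课程', '思政', '元素']
```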

Stop word removal

After segmenting the text, this paper removes stop words from the segmentation result. From a linguistic point of view, stop words are words that carry no special meaning in the text and are used only to organize the language and make the text more coherent. Removing stop words increases the accuracy of the algorithm, improves retrieval efficiency and saves memory.

The role of cosine similarity in text similarity calculation

Term Frequency-Inverse Document Frequency (TF-IDF) is a widely used weighting technique in information retrieval and text mining [16]. This statistical method emphasizes the rarity of words. Its core idea is that the importance of a word in a particular document is positively correlated with its number of occurrences in that document, but negatively correlated with its frequency of occurrence across the entire document collection. In short, TF-IDF takes into account both the frequency of a word in a single document and its prevalence in the whole document set.

TF, i.e., term frequency, is the frequency with which a keyword occurs in a particular document. The core idea is that if a word or phrase occurs frequently in one document but rarely in others, then it has strong category-discriminating power and is therefore well suited as a basis for categorization. TF is calculated by dividing the number of times a given keyword appears in a document by the total number of occurrences of all words in that document, yielding the word frequency of the keyword in the document, as shown in equation (1): $T_i=\frac{n_{ij}}{\sum_k n_{kj}}$

where $n_{ij}$ represents the number of times word $t_i$ appears in document $d_j$, and $\sum_k n_{kj}$ represents the total number of occurrences of all words in document $d_j$.

IDF, i.e., inverse document frequency, is based on the idea that a keyword appearing in fewer documents has a relatively high IDF value, which usually means that the keyword has strong category-discriminating power. Specifically, the IDF value of a keyword is obtained from the inverse relationship between the number of documents in the collection that contain the keyword and the total number of documents. This helps to assess the rarity and importance of the keyword in the entire document collection. The calculation formula is shown in equation (2): $I_{df,i}=\log\frac{|D|}{|\{j: t_i\in d_j\}|}$

where $|D|$ represents the total number of documents in the collection, and $|\{j: t_i\in d_j\}|$ represents the number of documents containing the given keyword.

To summarize, when the frequency of occurrence of a word within a particular document is high, while the frequency of occurrence of the word in the whole document collection is relatively low, the TF-IDF weight of the word is higher. This means that the TF-IDF algorithm tends to ignore words that are common and generic, while retaining those with specific meaning and importance. The TF-IDF value of a given keyword is calculated as shown in equation (3): $G_{TF\text{-}IDF}=T\times I_{df}$

Cosine similarity is a method of assessing the degree of similarity between two vectors by calculating the cosine of the angle between them. When two vectors are orthogonal, they are linearly unrelated and their cosine similarity is 0; when two vectors are parallel and point in the same direction, their cosine similarity is 1. For any two nonzero vectors $\alpha,\beta$ in a finite-dimensional linear space $V$, the cosine similarity is defined as shown in equation (4): $\cos(\alpha,\beta)=\frac{\alpha\cdot\beta}{|\alpha|\times|\beta|}$

Cosine similarity reflects the correlation between vectors: the larger the cosine value between two vectors, the more similar the information they contain.

In this study, the TfidfVectorizer tool library was used to extract features from all text data and transform the text into TF-IDF vectors, allowing the text to be quantitatively represented and used for subsequent similarity calculations. The cosine similarity between the course content and each of the Civics elements was calculated, and the largest value was taken as the underlying composite score, reflecting the overall relevance of the course content to the Civics elements.
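A minimal sketch of this base score follows, assuming the texts have already been segmented and space-joined, and reading "the largest value" as the maximum cosine similarity over all Civics elements; scikit-learn's TfidfVectorizer and cosine_similarity are used as stated above, but the inputs are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def base_relevance_score(course_text, civic_elements):
    """Return the largest cosine similarity between one course text and a list
    of Civics-element texts, using TF-IDF vectors (the underlying composite
    score described above). Texts are assumed to be pre-segmented and
    joined with spaces."""
    corpus = [course_text] + list(civic_elements)
    tfidf = TfidfVectorizer().fit_transform(corpus)
    sims = cosine_similarity(tfidf[0:1], tfidf[1:]).ravel()
    return float(sims.max()), sims

# Hypothetical, already-segmented inputs.
score, per_element = base_relevance_score(
    "路 漫漫 其 修远 兮 吾 将 上下 而 求索",
    ["上下 求索 追求 真理", "团结 友爱 互助"],
)
```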

Calculation of the match of key professional knowledge points

In order to accurately detect whether a text covers the key concepts of ancient literature, this paper incorporates a strategy for calculating the degree of knowledge matching, which organically integrates similarity and relatedness into a matching degree. The main steps for calculating the matching degree between knowledge items are as follows:

Step 1: Calculate the rating similarity between knowledge items. Construct the knowledge rating matrix $R_{m\times n}$; based on the ratings in $R_{m\times n}$, use cosine similarity to calculate the rating similarity between knowledge items, so as to obtain the inter-knowledge similarity matrix $Sim_{n\times n}$, as shown in Table 1.

Table 1. Knowledge similarity matrix

Knowledge item   K1      K2      ⋯   Kn
K1               Sim11   Sim12   ⋯   Sim1n
K2               Sim21   Sim22   ⋯   Sim2n
⋯                ⋯       ⋯       ⋯   ⋯
Kn               Simn1   Simn2   ⋯   Simnn

Step 2: Calculate the degree of correlation between knowledge items. The Apriori algorithm and its related principles are used to determine the degree of correlation between knowledge items, forming the correlation matrix $asso_{n\times n}$, as shown in Table 2.

Table 2. Knowledge relevance matrix

Knowledge item   K1       K2       ⋯   Kn
K1               asso11   asso12   ⋯   asso1n
K2               asso21   asso22   ⋯   asso2n
⋯                ⋯        ⋯        ⋯   ⋯
Kn               asson1   asson2   ⋯   assonn

Step 3: Calculate the knowledge matching degree, i.e., fuse the inter-knowledge correlation and the inter-knowledge rating similarity with weights. A linear fusion of the two is used, as shown in equation (5), where the weight coefficient is determined using a sigmoid function, as shown in equation (6): $M(i,j)=\alpha(i,j)\times Sim(i,j)+(1-\alpha(i,j))\times asso(i,j)$, $\alpha(i,j)=2\times\left(1-\frac{1}{1+e^{-|I_u|}}\right)$

where $\alpha$ is the weighting factor and $|I_u|$ is the number of times knowledge items $i$ and $j$ were jointly rated.

The matching degree matrix $M_{n\times n}$ is then calculated; $M_{n\times n}$ is a symmetric matrix, as shown in Table 3.

Table 3. Knowledge matching matrix

Knowledge item   K1    K2    ⋯   Kn
K1               M11   M12   ⋯   M1n
K2               M21   M22   ⋯   M2n
⋯                ⋯     ⋯     ⋯   ⋯
Kn               Mn1   Mn2   ⋯   Mnn

For sparse matrices, if the knowledge items have too few common ratings and the similarity calculation is performed only on the knowledge rating matrix, the obtained similarity results are likely to deviate from the actual similarity values due to the influence of multiple aspects. From equations (5) and (6), it can be concluded that if knowledge items i and j have a higher number of common ratings, the weight of the similarity of the knowledge item views in the matching degree is greater. If knowledge items i and j have a lower number of common ratings, the weight of knowledge item attribute relevance is greater in the matching degree.
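A minimal sketch of the fusion in equations (5)-(6) follows, assuming the similarity matrix Sim, the association matrix asso and the co-rating counts |I_u| are already available as NumPy arrays; it is an illustration of the weighted linear fusion, not the authors' implementation.

```python
import numpy as np

def matching_degree(sim, asso, co_ratings):
    """Linear fusion of rating similarity and association degree into the
    matching matrix M of equation (5); the weight alpha follows the
    sigmoid-based form of equation (6), driven by the number of common
    ratings |I_u| of each knowledge-item pair."""
    alpha = 2.0 * (1.0 - 1.0 / (1.0 + np.exp(-np.abs(co_ratings))))
    return alpha * sim + (1.0 - alpha) * asso
```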

Application of BERT model in text embedding vector computation

BERT is a pre-trained language representation model [17] that uses a bidirectional Transformer as its encoder to model context on both the left and the right, as opposed to a traditional unidirectional language model or the shallow concatenation of two unidirectional language models for pre-training. The Transformer eschews the recurrent (RNN) network structure and models text entirely with the attention mechanism, so it can make better use of contextual information to generate deeper bidirectional linguistic representations, which alleviates the problem of polysemy to a certain extent and improves the model's ability to extract features. The structure of the BERT model is shown in Fig. 1.

Figure 1.

BERT model structure

As shown in Fig. 1, BERT is composed of 12 layers of Transformer encoders, where each Transformer encoding unit mainly contains two parts: a multi-head attention mechanism and a feed-forward neural network. The relationship between the current word and the other words in the context is calculated by the multi-head attention mechanism, as shown in equation (7), where $Q$, $K$, $V$ are three different vectors derived from the word, $QK^T$ measures the similarity between word vectors and is scaled by $\sqrt{d_k}$, and the result is normalized by softmax to obtain a probability distribution whose weights are used to form a weighted sum of the sentence's word vectors. The result is then fed into the feed-forward neural network, and the output serves as the input to the next Transformer layer, until the final Transformer layer produces the final output of BERT: $\mathrm{Attention}(Q,K,V)=\mathrm{Softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$
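Equation (7) for a single attention head can be sketched in a few lines of NumPy; this is only the standard scaled dot-product attention, not the full multi-head implementation used inside BERT.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Equation (7): softmax(Q K^T / sqrt(d_k)) V for a single attention head.

    Q, K, V are (sequence_length, d_k) arrays; one head of the multi-head
    attention used inside each Transformer encoder layer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V
```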

In this paper, in order to improve the ability of BERT as an encoder to capture textual semantic information, the structure of the BERT model is improved by introducing an average pooling layer acting on the token embedding. The vector representation of the original text fed into the BERT encoder is changed from token embedding + position embedding + segment embedding to token embedding + average pooling layer + position embedding + segment embedding. However, the original vector dimension of BERT is 768, and the dimension decreases after the introduction of the average pooling layer, which can lead to a dimensionality mismatch. Therefore, for different pooling kernel sizes, the missing dimensions are filled with zeros in the token embedding layer. The improved BERT structure is shown in Fig. 2.

Figure 2.

BERT structure after improvement
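The modified input pipeline of Fig. 2 can be sketched as follows; the pooling kernel size is an assumed illustrative value, and this is a sketch of the described idea rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def pooled_token_embedding(token_emb, kernel_size=2, hidden_size=768):
    """Average-pool the token embeddings along the hidden dimension, then
    zero-pad back to 768 so the result can still be summed with the position
    and segment embeddings, as described above.

    token_emb: (batch, seq_len, hidden_size) tensor; kernel_size is an
    assumed pooling width, not a value given in the paper."""
    pooled = F.avg_pool1d(token_emb, kernel_size=kernel_size)  # pools the 768-dim axis
    pad = hidden_size - pooled.size(-1)
    return F.pad(pooled, (0, pad))                             # zero-fill missing dimensions

# The padded result is then added to position and segment embeddings and fed
# to the 12-layer Transformer encoder, as in Fig. 2.
emb = torch.randn(1, 16, 768)
print(pooled_token_embedding(emb).shape)                       # torch.Size([1, 16, 768])
```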

Screening mechanism and scoring system design

In order to ensure the close connection between the Civics elements and the ancient literature course content and the effective mining of its inherent Civics education connotation, this study proposes a comprehensive assessment system. The system assigns a composite score to each ancient literature course material by comprehensively analyzing the textual relevance of the civic and political elements and the content of the ancient literature course.

The system first uses the cosine similarity algorithm to quantify the degree of thematic and conceptual matching between the Civics elements and the ancient literature course content, and generates a "text relevance score". TfidfVectorizer converts the text into a vector space model and calculates the cosine similarity between the Civics element vectors and the course content vectors, accurately reflecting how similar the two are in content.

Next, the system utilizes the pre-constructed keyword library of Civic and Political elements to count the number of occurrences of keywords related to Civic and Political education in the ancient literature materials, so as to derive the “Civic and Political Element Score”. By calculating the size of the intersection of keywords between the material text and the keyword library, the system evaluates the value of the ancient literature course in disseminating the concept of Civic and political education.

In order to further improve the assessment accuracy, the system incorporates the deep semantic similarity between the Civics elements and the course content calculated by the BERT model (the "BERT similarity") to capture more complex semantic associations. In the end, the comprehensive scoring system aggregates the individual scores using the formulas shown in equations (8) and (9): $CombinedRelevance(C,N)=w_1\times KeywordOverlap(C,N)+w_2\times CosineSimilarity(C,N)$, $MatchScore(C,N)=w_3\times CombinedRelevance(C,N)+w_4\times BERTSimilarity(C,N)$

where $C$ denotes the course content, $N$ denotes the Civics elements, $KeywordOverlap(C,N)$ is the keyword intersection score, $CosineSimilarity(C,N)$ is the cosine similarity calculated from the TF-IDF vectors, and $BERTSimilarity(C,N)$ is the deep semantic similarity obtained from the BERT model. The weight coefficients $w_1, w_2, w_3$ and $w_4$ can be adjusted according to the actual application scenario and requirements to balance the contribution of each score item to the final matching score. Such a comprehensive calculation ensures a comprehensive and objective assessment of both the relevance of the Civics elements to the content of the ancient literature course and its value for Civics education. In this paper, considering the importance of each factor, we set $w_1 = 0.3$, $w_2 = 0.7$, $w_3 = 0.6$, $w_4 = 0.4$.
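A minimal sketch of equations (8)-(9) with the weights chosen above; the input scores passed to the function are hypothetical per-resource values.

```python
def match_score(keyword_overlap, cosine_sim, bert_sim,
                w1=0.3, w2=0.7, w3=0.6, w4=0.4):
    """Equations (8)-(9): combine the keyword-overlap score, the TF-IDF cosine
    similarity and the BERT semantic similarity into one match score, using
    the weights chosen in this paper."""
    combined_relevance = w1 * keyword_overlap + w2 * cosine_sim
    return w3 * combined_relevance + w4 * bert_sim

# Example with hypothetical per-resource scores.
print(match_score(keyword_overlap=0.5, cosine_sim=0.8, bert_sim=0.9))
# 0.6 * (0.3*0.5 + 0.7*0.8) + 0.4 * 0.9 = 0.786
```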

Recommendation of teaching resources for course Civics

Through the automatic scoring and screening of course Civics elements in the previous section, high-quality ancient literature teaching resources that are rich in Civics elements and well suited to Civics teaching can be obtained. In this section, a resource recommendation algorithm is designed based on the knowledge graph in natural language processing to automatically recommend high-quality ancient literature teaching resources to students. Students are thus encouraged to absorb the Civics connotations embedded in the curriculum while learning high-quality content, so as to give full play to the nurturing role of the construction of Civics teaching resources for the ancient literature course.

Construction of the Knowledge Graph

The process of knowledge graph construction involves several key steps, including the selection of domain knowledge, entity identification, relationship extraction, attribute extraction, knowledge fusion and continuous updating of the knowledge base [18]. The selection of domain knowledge requires a precise definition of the scope of the Civics teaching resources of the ancient literature course to ensure the specialization and applicability of the constructed knowledge graph. Entity identification identifies the key elements in the domain, laying the foundation for subsequent relationship and attribute extraction. Relationship extraction focuses on the logical associations between entities, and its accuracy directly affects the quality of the knowledge graph and the efficiency of the recommendation system. Attribute extraction extracts the feature descriptions of entities from the data to enhance the information richness of the knowledge graph. Knowledge fusion combines knowledge from different sources and resolves information redundancy and contradiction. Continuous updating of the knowledge base ensures that the recommender system reflects the latest teaching resources and knowledge advances, and self-optimization of the knowledge base is achieved through dynamic learning mechanisms and feedback loops.
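As a simple illustration, the entities, relations and attributes described above can be stored as (head, relation, tail) triples and indexed for later use; the triples shown are hypothetical examples, not entries from the actual knowledge base.

```python
from collections import defaultdict

# Hypothetical triples (head entity, relation, tail entity) for the
# Civics teaching-resource knowledge graph described above.
triples = [
    ("Lisao", "author", "Qu Yuan"),
    ("Lisao", "embodies", "pursuit of truth"),
    ("Lisao", "genre", "poetry"),
    ("Charcoal Seller", "embodies", "caring for the people's livelihood"),
]

# Simple adjacency index that can later support computing resource-resource
# association weights from shared entities and relations.
graph = defaultdict(list)
for head, relation, tail in triples:
    graph[head].append((relation, tail))

print(graph["Lisao"])
```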

Recommendation Algorithm Design

In this paper, the recommendation algorithm adopts a hybrid recommendation algorithm that incorporates content-based recommendation (CBR) and collaborative filtering recommendation (CF) mechanisms, and further enhances the accuracy and personalization of the recommendation through knowledge graph.

The set of users is $U=\{u_1,u_2,\cdots,u_n\}$, the set of resources is $R=\{r_1,r_2,\cdots,r_n\}$, and the set of user ratings of resources is $S=\{s_{ij}\}$. Content-based recommendation analyzes the feature vectors of the resources, defined as $V_{r_j}=\{v_1,v_2,v_3,\cdots,v_k\}$, while the user's preference vector is derived from historical behavior. The algorithm calculates the similarity between the user preference vector and the resource feature vector using cosine similarity.

The collaborative filtering component discovers similarity by analyzing rating patterns among users or among resources; the user-based collaborative filtering rating prediction formula is shown in equation (10): $s_{ij}=\bar{s}_i+\frac{\sum_{u_k\in U,k\neq i} sim(u_i,u_k)\,(s_{kj}-\bar{s}_k)}{\sum_{u_k\in U,k\neq i} |sim(u_i,u_k)|}$

where $s_{ij}$ is the predicted rating of user $u_i$ on resource $r_j$, $\bar{s}_i$ and $\bar{s}_k$ are the average ratings of users $u_i$ and $u_k$, respectively, and $sim(u_i,u_k)$ is the similarity between users $u_i$ and $u_k$.

The key to designing a recommendation algorithm combined with the knowledge graph is to use the semantic relationships of the knowledge graph to enhance the association analysis between resources. Let the association weight established through the knowledge graph between resource $r_j$ and resource $r_k$ be $w_{jk}$; the comprehensive recommendation score of a resource for a user is then shown in equation (11): $score(u_i,r_j)=\alpha\, sim(u_i,r_j)+\beta\, s_{ij}+\gamma\sum_{r_k\in R,k\neq j} w_{jk}\, s_{ik}$

where $\alpha,\beta,\gamma$ are regulation parameters used to balance the influence of content-based recommendation, collaborative filtering recommendation, and knowledge-graph-enhanced recommendation. By optimizing these three parameters, fine-grained regulation of the recommendation system can be achieved.
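A minimal sketch of equations (10)-(11) follows, assuming a dense user-resource rating matrix, a precomputed user-user similarity matrix and precomputed knowledge-graph association weights; the values of α, β, γ and all data used here are illustrative, not the tuned parameters of the system.

```python
import numpy as np

def predict_rating(S, sim_users, i, j):
    """Equation (10): user-based collaborative filtering prediction of user i's
    rating on resource j. S is the user-resource rating matrix, sim_users the
    user-user similarity matrix; a dense matrix is assumed for simplicity."""
    mean_i = S[i].mean()
    others = [k for k in range(S.shape[0]) if k != i]
    num = sum(sim_users[i, k] * (S[k, j] - S[k].mean()) for k in others)
    den = sum(abs(sim_users[i, k]) for k in others)
    return mean_i + num / den if den else mean_i

def hybrid_score(content_sim, S, sim_users, kg_weights, i, j,
                 alpha=0.5, beta=0.3, gamma=0.2):
    """Equation (11): content-based similarity + CF prediction + knowledge-graph
    enhanced term; alpha, beta, gamma are illustrative regulation parameters."""
    kg_term = sum(kg_weights[j, k] * S[i, k]
                  for k in range(S.shape[1]) if k != j)
    return alpha * content_sim + beta * predict_rating(S, sim_users, i, j) + gamma * kg_term

# Tiny example: 3 users x 4 resources, hypothetical data.
S = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 5, 4]], dtype=float)
sim_users = np.array([[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]])
kg_w = np.full((4, 4), 0.1)  # uniform illustrative association weights
print(round(hybrid_score(0.8, S, sim_users, kg_w, i=0, j=2), 3))
```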

Recommendations for teaching resources

Let the set of interaction behaviors of user $u$ on resource $r$ within time window $T$ be $B=\{b_1,b_2,\cdots,b_n\}$, where $b_i$ is a specific interaction behavior. The user's aggregate interest in the resource, $I(u,r,T)$, is defined as shown in equation (12): $I(u,r,T)=\sum_{i=1}^{n} f(b_i,t_i)\cdot decay(t-t_i)$

where $f(b_i,t_i)$ is a weight function based on the type of behavior and its time that measures the importance of individual behaviors, and $decay(t-t_i)$ is a time decay function that reduces the influence of past behaviors on the current interest level and ensures the timeliness of recommendations. The time decay function is shown in equation (13): $decay(t-t_i)=e^{-\lambda(t-t_i)}$

where $\lambda$ is the decay rate, $t$ is the current time, and $t_i$ is the time when the behavior occurred. Through this decay function, the algorithm is able to dynamically adjust the user's interest model to reflect immediate changes in the user's interest.

The recommendation list generation algorithm based on comprehensive interest considers the user's behavior in the recent time window, calculates the interest degree of every resource for the user, sorts the resources by interest degree, and selects the highest-scoring resources as recommendations. The formula for generating the recommendation list $L(u,T)$ is shown in equation (14): $L(u,T)=\mathrm{sort}_{r\in R}\, I(u,r,T)$

where sort is the operation of sorting in descending order of interest. R is a collection of resources. In this way, a personalized recommendation list can be generated in real time in response to changes in user interest.
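A minimal sketch of equations (12)-(14) follows, with hypothetical behavior types and weights for f(b_i, t_i) and a simple exponential decay; it illustrates the mechanism rather than the deployed recommender.

```python
import math

def interest(behaviors, now, weight_fn, decay_rate=0.1):
    """Equations (12)-(13): aggregate a user's interest in one resource from
    timestamped behaviors with exponential time decay. `behaviors` is a list
    of (behavior_type, timestamp); weight_fn maps a behavior and its time to
    a base weight f(b_i, t_i); decay_rate is the lambda of equation (13)."""
    return sum(weight_fn(b, t) * math.exp(-decay_rate * (now - t))
               for b, t in behaviors)

def recommend(user_behaviors, now, weight_fn, top_k=10, decay_rate=0.1):
    """Equation (14): score every resource and return them sorted by interest.
    `user_behaviors` maps resource id -> list of (behavior_type, timestamp)."""
    scored = {r: interest(b, now, weight_fn, decay_rate)
              for r, b in user_behaviors.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]

# Hypothetical behavior weights: viewing counts less than collecting.
weights = {"view": 1.0, "collect": 3.0}
demo = {"Lisao": [("view", 95.0), ("collect", 99.0)], "University": [("view", 60.0)]}
print(recommend(demo, now=100.0, weight_fn=lambda b, t: weights[b]))
```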

Application and analysis
Data acquisition

This paper implements a crawler program for the open Guda Literature Curriculum Resource Platform, sets keywords and screening rules such as "Ancient Literature", "Curriculum Ideology and Politics" and "Poetry Appreciation", and scrapes the required text data from the web pages. At the same time, in the academic database, the advanced search function is used to enter the name of an ancient literary work, its author and dynasty, and ideological-political subject terms such as "patriotism", "moral cultivation" and "humanistic spirit", so as to accurately locate the required literature and materials and obtain the original text or abstract information via the download methods provided by the database. Using the above methods, this paper finally obtained 124 poetry texts, including Lisao, and 18 ancient literary works, including The Dream of the Red Chamber. These collected and classified data have high academic value and reliability, and can be directly applied to the construction of teaching resources.

Screening and Analysis of Civics Elements
TF-IDF-based composite score

This section uses the TF-IDF method to calculate the scores of ideological and political elements based on the comprehensive scoring system constructed above, and automatically screens out the top 10 ancient literary resources and their prominent ideological and political elements, as shown in Figure 3, in which T1-T10 respectively represent “Lisao”, “The Book of Poetry: No Clothes”, “Difficult to Walk”, “Strange Tales from Liaozhai”, “University”, “Charcoal Seller”, “Zizhi Tongjian”, “Romance of the Three Kingdoms”, “Fisherman’s Proud Autumn Thoughts”, and “Summer Quatrain”. Analysis of the figure shows that the ideological and political element scores of these 10 ancient literary resources are 0.922, 0.914, 0.879, 0.763, 0.682, 0.667, 0.639, 0.618, 0.611 and 0.602, respectively. The ideological and political elements prominently contained in these 10 resources are: the pursuit of truth, unity and fraternity, positive optimism, daring to criticize, a sense of social responsibility, caring for the people’s livelihood, historical thinking, loyalty, patriotism, and perseverance. The results show that the comprehensive scoring system using the TF-IDF algorithm can preliminarily complete the score calculation and automatically screen the ideological and political elements in ancient literary resources.

Figure 3.

TF-IDF score

Combined score based on TF-IDF and BERT

To improve the accuracy of automatic screening of ideological and political elements and to test the effectiveness of the improved BERT model in the comprehensive scoring of these elements, 100 modern literary classics, including “Camel Xiangzi”, “The Scream” and “Midnight”, were obtained using the data collection method described above and mixed into the constructed ancient literature resource database before the experiment in this subsection. The scores of ideological and political elements of the literary resources were then calculated using the TF-IDF method alone and using TF-IDF combined with the improved BERT model, and the top 10 results were screened out, as shown in Figure 4, where (a) and (b) are the automatic screening results of TF-IDF and of TF-IDF + improved BERT, respectively. For ease of distinction, A1-A10 represent the TF-IDF screening results and B1-B10 the TF-IDF + improved BERT screening results. Analysis of Figure 4(a) shows that after adding modern literary works, the literary resources screened out by the TF-IDF method are “Camel Xiangzi”, “Scream”, “Lisao”, “Difficult to Walk”, “Dream of Red Mansions”, “Long Song Xing”, “Watching the Sea”, “Crossing the Zero Ding Yang”, “Lime Yin” and “Analects”, with ideological and political element scores of 0.874, 0.833, 0.796, 0.748, 0.716, 0.683, 0.662, 0.631, 0.619 and 0.600. The prominent ideological and political elements identified in these resources are fairness and justice, anti-feudalism, pursuit of truth, positive optimism, positive optimism, cherishing time, empathy, patriotism, honesty and dedication, and lofty ideals. However, appreciation of the works shows that the prominent ideological and political elements in Dream of Red Mansions, Watching the Sea and Analects are actually criticism, lofty ideals and empathy. This indicates that after adding modern literary works, the TF-IDF algorithm not only wrongly selects the two modern works “Camel Xiangzi” and “The Scream”, but also mismatches ideological and political elements with literary resources; the addition of modern literature thus has a significant impact on the performance of the TF-IDF method. From Fig. 4(b), it can be seen that the literary resources screened by TF-IDF + improved BERT are Lisao, Mencius, Shiji, Dream of Red Mansions, Shi’er, Romance of the Three Kingdoms, University, Book of Songs, Charcoal Seller, and Summer Quatrain, and the prominent ideological and political elements contained in them are: the pursuit of truth, noble character, taking history as a mirror, criticism, family and country feelings, loyalty, social responsibility, unity and friendship, caring for people’s livelihood, and perseverance. After poetry appreciation, the matching rate between literary resources and their prominent ideological and political elements was found to be 100%. The results show that the TF-IDF + improved BERT model can still effectively screen out ancient literary resources after modern literary works are added, and can reveal the prominent ideological and political elements they contain, thus providing good technical support for resource recommendation.

Figure 4.

Comparison of comprehensive scores

Analysis of resource recommendations

In order to verify the overall effectiveness of the knowledge-graph-based automatic recommendation method for teaching resources, the recommendation of Civics teaching resources for the ancient literature course based on the knowledge graph is tested.

Coverage

The coverage rate is used as a test metric for this paper's method, the collaborative filtering-based method (Method1) and the deep learning-based method (Method2); the coverage rate is calculated by the following formula: $Coverage=\frac{|R(U)|}{|I|}$

In the formula, $R(U)$ represents the set of Civics teaching resources of the ancient literature course recommended to all users, and $I$ represents the full set of teaching resources for the ancient literature course.
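Reading R(U) as the union of all users' recommendation lists and I as the full resource catalogue, coverage can be computed as follows; the data in the example are hypothetical.

```python
def coverage(recommended_lists, all_resources):
    """Coverage = |R(U)| / |I|: the fraction of the resource catalogue that
    appears in at least one user's recommendation list."""
    recommended = set()
    for user_list in recommended_lists:
        recommended.update(user_list)
    return len(recommended) / len(all_resources)

# Hypothetical example: 3 users, a catalogue of 4 resources.
print(coverage([["Lisao", "University"], ["Lisao"], ["Charcoal Seller"]],
               {"Lisao", "University", "Charcoal Seller", "Shiji"}))  # 0.75
```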

The coverage test results of this paper's method, the collaborative filtering-based method and the deep learning-based method are shown in Table 4. Analyzing the data in the table, the coverage rates achieved when using this paper's method to recommend Civics teaching resources of the ancient literature course to students are all above 98%, the coverage rate of the collaborative filtering method fluctuates around 75%, and the coverage rate of the deep learning method is around 60%. Comparing the three, this paper's knowledge-graph-based method achieves the highest coverage when recommending Civics teaching resources for ancient literature courses, because it combines a collaborative filtering recommendation algorithm on top of the knowledge graph and has already screened the ancient literature resources through Civics-element filtering before recommendation, which improves resource coverage.

Table 4. The coverage test results of different methods

Epoch Coverage rate/% Epoch Coverage rate/%
Ours Method1 Method2 Ours Method1 Method2
1 99.13 75.36 59.42 11 99.05 74.83 61.57
2 98.62 77.26 64.22 12 99.80 76.15 58.38
3 98.66 75.69 57.43 13 98.32 76.08 62.32
4 98.71 75.03 57.45 14 99.5 74.75 57.00
5 98.75 73.65 60.45 15 98.22 73.95 64.33
6 98.23 76.55 60.14 16 98.37 76.14 60.35
7 98.82 73.90 65.48 17 99.24 75.14 60.78
8 99.29 74.62 60.17 18 99.50 73.09 59.23
9 98.89 74.31 57.71 19 99.59 75.05 62.34
10 98.53 72.85 59.62 20 99.28 75.55 63.75
P/N measurement

The P/N measure is used to measure the proportion of more important resources when recommending Civics teaching resources for the ancient literature course. The test results of this paper's method, the collaborative filtering recommendation method and the deep learning recommendation method are shown in Figure 5. The P/N of this paper's method is always the highest across the 21 tests of ancient literature course Civics teaching resource recommendation, while the collaborative filtering and deep learning methods have lower P/N values. Comparing the test results of the three methods, this paper's method achieves a better effect in recommending Civics resources for the ancient literature course.

Figure 5.

P/N measure results

AUC Measurement

The effectiveness of this paper's method, the collaborative filtering recommendation method and the deep learning recommendation method is tested through the AUC, i.e., the area under the ROC curve, which usually lies in the interval [0.5, 1]; the higher the AUC value, the higher the students' satisfaction. The test results of the three methods are shown in Fig. 6. According to the figure, across multiple tests the AUC value of this paper's method is the highest, that of the collaborative filtering method is the second highest, and that of the deep learning method is the lowest. This indicates that this paper's method yields the highest user satisfaction when providing students with Civics teaching resources for the ancient literature course, because the proposed method uses the knowledge graph to recommend teaching resources accurately according to students' registration information and query information, which improves student satisfaction.

Figure 6.

AUC test results of different methods

Conclusion

The automatic screening method for Civics elements proposed in this paper can effectively complete the screening of Civics elements for different ancient literature resources, and can correctly identify Civics elements that match the ancient literature resources 100% even under the interference of added modern literary works, showing good performance. Compared with the collaborative filtering recommendation model and the deep learning recommendation model, the course Civics teaching resource recommendation model has higher resource coverage, P/N measure and AUC values, and is better suited to recommending high-quality Civics teaching resources for ancient literature courses to students. The results show that the method of this paper plays a significant role in intelligently constructing the Civics teaching resources of the ancient literature course and in promoting its positive reform.
