Construction of a Semantic Network for International Chinese Language Education Based on Knowledge Graph Technology and Optimization of Its Teaching Resources
Published: 23 Sep 2025
Received: 25 Jan 2024
Accepted: 30 Apr 2025
DOI: https://doi.org/10.2478/amns-2025-1112
© 2025 Xiaoyun Han et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Teaching and research on Chinese grammar have traditionally had two entry points: the formal approach and the semantic approach. The two share broadly the same teaching objectives but differ considerably in teaching effect. Theoretical grammar research and grammar teaching practice have made great progress in semantic description, semantic teaching, and contextualization. The pedagogical grammar system, however, still sits within the framework of “rule-based” structuralism [1]. It takes syntax as its outline and focuses on the meaning and usage of linguistic forms. It covers very little semantic content, such as special sentence patterns in Chinese, and pays too little attention to fine-grained semantic expression, such as mood, subjectivization, and semantic context, and to the corresponding means of expression [2]. The main reasons are that semantic descriptions are not readily applicable, semantic features are induced imprecisely, and semantic concepts are not refined rigorously. In other words, grammar research and teaching need to start from semantic categories, follow the question "what form is needed to express a given meaning", and build a scientific and applicable semantic teaching system that is distinct from the earlier "syntactic teaching system" based on formal categories [3]. Only in this way can the discipline meet the needs of discipline construction and personnel training in the new era of international Chinese language education, and help accelerate the construction of a more open, inclusive, and standardized modern system of international Chinese language education.
In the context of Education 4.0, the tension between large-scale educational coverage and the personalized cultivation of students has become increasingly prominent, and the new generation of digital technologies represented by artificial intelligence offers a way to resolve it. As a key cognitive intelligence technology, a knowledge graph is a structured semantic knowledge base that describes concepts in the physical world and their interrelationships in symbolic form [4]. Combining knowledge graphs with education yields a new type of teaching resource, the educational knowledge graph: a collection of knowledge items in the field of education, and the associations among them, described with knowledge graph methods. Learning platforms built on educational knowledge graphs can serve large numbers of learners while also planning personalized learning paths and recommending personalized learning resources for them, and thus play an important role in promoting tailored teaching at scale [5]. The construction of teaching resources for international Chinese language education is also showing a new trend. A resource construction mechanism has been initially established, and resource libraries are growing in scale and becoming multidimensional, structured, and serialized. Digital teaching resources in particular are rich in form and considerable in quantity, injecting new vitality into the connotative development of international Chinese language education. As the digital transformation of international Chinese language education advances, educational knowledge graphs are gradually being applied in the field, promoting large-scale and personalized Chinese language learning [6].
Understanding the underlying logic of Chinese semantics and building a Chinese semantic knowledge base both benefit Chinese learning, and scholars have studied the construction of such knowledge bases as well as Chinese semantic recognition, categorization, and tracking. Li, S et al. drew on related studies of Chinese vocabulary to identify 68 implicit morphological relations and 28 explicit semantic relations in Chinese, explored the effects of contextual features and corpus on analogical reasoning in Chinese, and corroborated that CA8 can serve as a benchmark for evaluating Chinese word embeddings [7]. Gui, T et al. affirmed the role of recurrent neural networks (RNNs) in Chinese named entity recognition (NER) for sequentially tracking character and word information, and introduced a lexicon-based graph neural network with global semantics to mitigate the susceptibility of RNN-based models to word ambiguity [8]. Chen, J et al. devised an annotation method with clustering as its core idea to deal efficiently with log entries that share the same semantics, confirmed its feasibility by demonstrating its performance on a corpus, and reported six benchmarks for Sentence Semantic Equivalence Identification (SSEI) [9]. Wu, S et al. proposed a cross-transformation algorithm with multivariate data embedding as its underlying architecture, incorporated Chinese character structural information to improve Chinese NER performance, and ran test and evaluation experiments that corroborated the strong performance of the modified method [10].
As China’s influence has grown, international Chinese language learning has attracted keen attention worldwide, and research on Chinese language education has multiplied. Its perspectives include the empowerment of teaching by information technology, Chinese teaching assessment systems, the qualities of Chinese language teachers, and the optimization of Chinese teaching methods. Cai, J examined the informatization of international Chinese language education through examples and concluded that it needs to highlight its own characteristics, while noting that teachers remain skeptical of it [11]. Lai, W et al. designed a deep-learning-based strategy for assessing online teaching quality in international Chinese language education, and comparative tests showed that the designed assessment model performed better than the alternatives [12]. Yu, S et al. examined the effect of international Chinese language teaching empowered by data mining technology through teaching experiments and theoretical analysis, contributing to the reform and innovation of international Chinese language teaching [13]. Sun, H discussed the qualities of an excellent Chinese language teacher, including cultural self-awareness, self-reflection, and the capacity for self-growth and continuous learning, which help promote students’ understanding of Chinese culture and their interest in learning Chinese [14]. Yu, A et al. discussed the theoretical basis and significance of Chinese language teaching and, on that basis, optimized the teaching of Chinese as a foreign language, contributing to the construction of its curriculum system [15].
To meet the multimodal needs of semantic analysis in Chinese language education, this study constructs a multimodal knowledge graph model based on multimodal knowledge graph embedding. To address the insufficient use of structural knowledge in the multimodal knowledge graph relation extraction task, it proposes an embedding model based on knowledge enhancement and prompt tuning. A text encoder and an image encoder extract text and image features respectively, and a knowledge enhancement prompt tuning module enhances the text embedding. The two modal features then interact and are enhanced in a cross-modal encoder through cross-attention and a similarity aggregator to yield the final multimodal representation, from which the multimodal knowledge semantic network is constructed. Finally, experiments verify the performance and quality of the multimodal knowledge semantic network model, which provides strong support for the optimization of Chinese educational resources.
Knowledge graphs are widely used in real-world scenarios such as recommendation systems and information retrieval. In international Chinese language education, however, textual knowledge is often accompanied by corresponding image data, which makes the knowledge graph multimodal. This study proposes a new multimodal knowledge graph embedding model based on knowledge enhancement and cue tuning, that is, prompt tuning (REKP), which uses both text and images to extract knowledge and to complete the missing facts in the knowledge graph. The REKP model enhances the text embedding of the multimodal knowledge graph through the Knowledge Enhancement Cue Tuning (KECT) module. In the cross-modal encoder, multi-layer cross-attention and a similarity aggregator let the input text embeddings and image embeddings interact to produce the final multimodal representation, thereby improving the accuracy of the multimodal knowledge graph relation extraction task.
Given a multimodal knowledge graph
Specifically, a representation of the special labeling [CLS] is obtained from the final output embedding of the Hybrid Transformer architecture and the probability distribution over the class set
The REKP model employs a hybrid Transformer architecture and, in addition, in this paper, the number of layers in the image encoder is defined as
Transformer has become a core component of many models in fields such as natural language processing and computer vision, and it consists of
Among them,
The image encoder uses the initial
The text encoder, REKP, uses the initial layer
In order to solve the problem of heterogeneity and irrelevance between different modalities, REKP uses a cross-attention module after the MHA layer to reduce the heterogeneity between modalities. In addition, a similarity aggregation module is used in the FFN layer to reduce the effect of image noise.
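As a rough illustration of the cross-attention idea, the following sketch lets text queries attend over a mixed set of text and image keys and values. It is a minimal single-head version under stated assumptions: there are no learned projections, the two modalities are mixed by simple concatenation, and the function name `mixed_attention` is hypothetical rather than taken from REKP.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mixed_attention(text_queries, text_kv, image_kv):
    """Single-head scaled dot-product attention over mixed keys and values.

    text_queries: list of query vectors, one per text token
    text_kv:      list of (key, value) pairs from the text side
    image_kv:     list of (key, value) pairs from the image side
    Assumption: the modalities are mixed by concatenating their pairs.
    """
    mixed = text_kv + image_kv
    dim_k = len(text_queries[0])
    dim_v = len(mixed[0][1])
    outputs = []
    for query in text_queries:
        scores = [dot(query, key) / math.sqrt(dim_k) for key, _ in mixed]
        weights = softmax(scores)
        # Weighted sum of values drawn from both modalities.
        outputs.append([
            sum(w * value[i] for w, (_, value) in zip(weights, mixed))
            for i in range(dim_v)
        ])
    return outputs
```

Because each text token draws on values from both modalities in the same attention step, image information is blended into the text representation layer by layer instead of being fused only once at the end.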
Cross-attention, the REKP model uses cross-attention to reduce modal heterogeneity by performing per-layer header-attention computation on mixed keys and values. Specifically, text header
Then, the variant formulas were further derived:
To mitigate the detrimental effects of noise, REKP uses a similarity aggregator component in the cross-modal encoder to enable interaction between the two modalities. Denote by
REKP then applies the Softmax function to the similarity matrix
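A similarity aggregation step of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not the REKP formulation: similarity is taken to be a plain dot product between text and image hidden states, the softmax is applied per text token, and the name `similarity_aggregate` is hypothetical.

```python
import math

def similarity_aggregate(text_hidden, image_hidden):
    """Summarise image features for each text token by softmax-weighted similarity.

    text_hidden:  list of text token vectors
    image_hidden: list of image patch vectors with the same dimension
    Returns one aggregated image vector per text token.
    """
    dim = len(image_hidden[0])
    aggregated = []
    for token in text_hidden:
        # One row of the text-image similarity matrix (dot product assumed).
        sims = [sum(a * b for a, b in zip(token, patch)) for patch in image_hidden]
        top = max(sims)
        exps = [math.exp(s - top) for s in sims]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Patches that resemble the token dominate; dissimilar (noisy) ones fade.
        aggregated.append([
            sum(w * patch[i] for w, patch in zip(weights, image_hidden))
            for i in range(dim)
        ])
    return aggregated
```

Weighting by similarity is what lets the aggregator suppress image regions unrelated to the text, which is the noise-reduction role the text assigns to this component.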
There should be a close interaction and connection between entity types and relational tags, so the REKP model introduces a knowledge enhancement cueing approach to structurally constrain the set of parameters
Structural constraints: in order to optimize the hints, REKP uses the structural constraints module. Specifically, the ternary (
The REKP model uses cross-entropy to compute a loss function that measures the difference between variable
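For reference, a generic cross-entropy loss over class scores can be sketched as follows. This is the textbook definition, shown under the assumption that the classifier emits one raw score (logit) per relation class; the function names are illustrative and the exact loss terms used by REKP are not reproduced here.

```python
import math

def cross_entropy(logits, gold_index):
    """Negative log-probability of the gold class under softmax(logits)."""
    top = max(logits)
    # log-sum-exp computed stably by factoring out the largest logit
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_norm - logits[gold_index]

def mean_cross_entropy(batch_logits, gold_indices):
    """Average cross-entropy over a batch of examples."""
    losses = [cross_entropy(l, g) for l, g in zip(batch_logits, gold_indices)]
    return sum(losses) / len(losses)
```

The loss is zero only when the model puts all probability mass on the gold class, and it grows as the predicted distribution moves away from it.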
Semantic networks are an inevitable direction for the organization of scientific knowledge in the Internet era, and an inevitable result of knowledge automation in future machine learning environments. A semantic network is a knowledge network built on the meaning of text. To obtain the semantic network of a text, the semantics of the text must be parsed and labeled. The semantic parse is determined by the various attributes of the text, and the work can be done automatically by computer, mainly with natural language processing techniques such as lexical analysis and latent semantic analysis. This study is oriented to international Chinese language education. To meet the practical needs of Chinese language education, it proposes a multimodal knowledge graph embedding model based on knowledge enhancement and cue tuning, and then constructs a multimodal knowledge semantic network for Chinese language education.
A knowledge graph is a kind of semantic network that uses a structured representation to describe real-world concepts and the relationships between them. By construction mode, knowledge graphs can be built manually, semi-automatically, or automatically. Entity recognition is a basic step in natural language processing tasks such as information extraction and text analysis; it aims to obtain named entities, such as the names of people and places and other proper nouns, from text data. Relation extraction is an important foundation for understanding text content: by extracting the semantic associations between entities in a text, it lifts text analysis from the analysis of language structure to the analysis of content. Relation extraction methods currently fall into two main groups, rule-based methods and methods based on deep learning models. Joint extraction uses a single model for entity recognition and relation classification and outputs entity-relation triples directly. This alleviates the error propagation of pipeline models, and because the entity recognition and relation extraction tasks share model parameters, it also reduces redundant information and improves extraction efficiency.
The knowledge graph for Chinese language education in this paper is a structured data model for representing and storing concepts, entities, relationships, and events in the field. It can help educators and researchers quickly access and analyze educational information and can support educational decision-making.
Knowledge semantic network construction methods are divided into top-down and bottom-up. Top-down construction extracts the knowledge system of the knowledge graph from high-quality data, with the help of encyclopedic data, expert knowledge, and other specialized knowledge, and adds it to the knowledge base. Bottom-up construction uses natural language processing to extract the architecture of the data from a specified data source, selects the knowledge architecture that has high confidence and fits the data, and adds it to the knowledge base after manual verification. Chinese education terminology is high-quality text data with distinctive characteristics. To provide a shared conceptual model of Chinese education terminology and a high-quality terminology ontology structure, this paper adopts the top-down approach to construct the knowledge semantic network.
The process of constructing the knowledge semantic network for Chinese education is shown in Figure 1. First, the Chinese education terminology dataset is obtained with a web crawler and manual proofreading, and the terminology data are processed and analyzed. Under the guidance of experts in Chinese education, the Chinese education terminology ontology is constructed following the ontology construction process (determining the domain and task, reusing existing systems, listing elements, determining the classification system, defining attributes and relationships, and defining constraints), and its consistency is verified. Because entities in Chinese education terminology data have salient syntactic features, a template-based approach is designed to construct and recognize terminology entities, synonymous terminology entities, and homonymous terminology entities. However, the terminology data exist as separate individuals, and the semantic associations between terms are weak, which makes the data hard to use in intelligent application scenarios for Chinese education. To strengthen these associations, a rule-based approach is used to extract the semantic relationships with salient features in Chinese education terms, and a preliminary semantic network of Chinese education knowledge is constructed. To extract the term triples whose semantic relationships are not obvious in the terminology text, a term relation extraction model is built on the preliminary terminology knowledge graph.
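To make the rule-based step concrete, the sketch below extracts relation triples from sentences with regular-expression templates. The two templates (a synonym pattern and an "is a kind of" pattern), the relation names, and the 2 to 8 character term length are illustrative assumptions; the actual templates used in the paper are not given.

```python
import re

# Assumption: a term is a run of 2 to 8 CJK characters.
TERM = r"[\u4e00-\u9fff]{2,8}"

# Hypothetical templates; real rules would be written with domain experts.
RULES = [
    ("synonym_of", re.compile(rf"(?P<head>{TERM})(?:又称|也叫|亦称)(?P<tail>{TERM})")),
    ("is_a", re.compile(rf"(?P<head>{TERM})是一种(?P<tail>{TERM})")),
]

def extract_triples(text):
    """Return (head, relation, tail) triples matched by any template."""
    triples = []
    for relation, pattern in RULES:
        for match in pattern.finditer(text):
            triples.append((match.group("head"), relation, match.group("tail")))
    return triples

if __name__ == "__main__":
    sample = "量词又称单位词。把字句是一种特殊句式。"
    for triple in extract_triples(sample):
        print(triple)
```

Templates like these are precise but only cover relations with obvious surface cues, which is why the text pairs them with a learned relation extraction model for the less explicit cases.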
Combining the two relation extraction methods makes it possible to extract the semantic associations between Chinese education terms efficiently. The resulting entities and relations are stored in and served from a Neo4j graph database. Finally, a Chinese education semantic knowledge service system is built, comprising a terminology knowledge management module, a terminology knowledge query module, a terminology knowledge graph visualization module, and a terminology text analysis module. The system separates the front end from the back end and provides a convenient and fast platform for managing and applying Chinese education terminology knowledge.
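One way the extracted triples might be written to Neo4j is to turn each triple into a parameterised Cypher `MERGE` statement, as in the sketch below. The node label `Term`, the `name` property, and the helper `triple_to_cypher` are assumptions made for illustration; the paper does not describe its graph schema.

```python
def triple_to_cypher(head, relation, tail):
    """Build a Cypher MERGE statement and parameters for one triple.

    Cypher cannot parameterise relationship types, so the relation name is
    validated and inlined, while the term names are passed as parameters.
    """
    if not relation.isidentifier():
        raise ValueError(f"unsafe relation name: {relation!r}")
    query = (
        "MERGE (h:Term {name: $head}) "
        "MERGE (t:Term {name: $tail}) "
        f"MERGE (h)-[:{relation.upper()}]->(t)"
    )
    return query, {"head": head, "tail": tail}

if __name__ == "__main__":
    query, params = triple_to_cypher("量词", "synonym_of", "单位词")
    print(query)
    print(params)
```

Using `MERGE` instead of `CREATE` keeps the load idempotent, so re-running extraction does not duplicate terms or relations. The returned pair is in the shape that a Neo4j client's query call typically expects (a query string plus a parameter mapping).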

Figure 1. Flow chart of the construction and application of the terminology knowledge graph
This paper uses two Chinese education datasets, CTec2018 and CTec2020, which consist mainly of Chinese education textbooks published on the web between 2018 and 2020. The ratio of the training set to the test set is 7:3. To demonstrate the effectiveness of the REKP model, several baseline models are selected for comparison. The representative text-based models are CNN-BiLSTM-CRF and BERT-CRF. The multimodal baselines are AdapCAN-Bert-CRF, VisualBERT, OCSGA, UMT, and UMGF. The evaluation metrics are precision, recall, and F1 value.
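The three metrics can be computed as in the following sketch, which assumes exact-match scoring over sets of predicted and gold items (for example entity spans or relation triples); the paper does not state its matching criterion, and the function names are illustrative.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 over two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Reported CNN-BiLSTM-CRF precision and recall on CTec2018, in percent.
    print(round(f1_score(66.17, 68.02), 2))
```

Since F1 is the harmonic mean of the other two metrics, it can be recomputed from any reported precision and recall pair as a quick consistency check.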
The comparison results are shown in Table 1. First, compared with the text-based models CNN-BiLSTM-CRF and BERT-CRF, the model in this paper clearly performs better on both datasets, with precision of 75.79% on CTec2018 and 87.42% on CTec2020. Second, among the multimodal methods, it still outperforms OCSGA, UMT, and UMGF, which indicates that knowledge enhancement helps the text entity extraction task more than using the complete image does. The pre-trained model VisualBERT reaches precision, recall, and F1 of 68.77%, 71.32%, and 70.02% on CTec2018, and all three values stay below 85% on CTec2020, which makes it a comparatively weak baseline. Finally, across all the compared approaches, the multimodal knowledge graph model proposed in this paper achieves the best results. Its results on CTec2020 are better than on CTec2018: precision, recall, and F1 on CTec2020 are 87.42%, 88.03%, and 87.23% respectively, which suggests that the model performs better in large-sample scenarios.
Table 1. Comparison results of different models (%)
Model | CTec2018 Precision | CTec2018 Recall | CTec2018 F1 | CTec2020 Precision | CTec2020 Recall | CTec2020 F1 |
---|---|---|---|---|---|---|
CNN-BiLSTM-CRF | 66.17 | 68.02 | 67.08 | 79.93 | 78.69 | 79.33 |
BERT-CRF | 69.07 | 74.52 | 71.74 | 83.25 | 83.50 | 83.37 |
AdapCAN-Bert-CRF | 69.82 | 74.52 | 72.08 | 85.06 | 83.13 | 84.03 |
VisualBERT | 68.77 | 71.32 | 70.02 | 83.99 | 84.32 | 84.65 |
OCSGA | 74.64 | 71.14 | 72.85 | -- | -- | -- |
UMT | 71.60 | 75.16 | 73.34 | 85.21 | 85.27 | 85.24 |
UMGF | 74.41 | 75.14 | 74.78 | 86.47 | 84.43 | 85.44 |
HVPNet | 73.80 | 76.75 | 75.25 | 85.77 | 87.86 | 86.82 |
Ours | 75.79 | 76.91 | 76.34 | 87.42 | 88.03 | 87.23 |
This paper further tests the model in a cross-domain scenario, comparing it against UMGF, one of the better-performing baselines above. CTec2018 → CTec2020 denotes testing on CTec2020 with the model trained on CTec2018, and CTec2020 → CTec2018 denotes testing on CTec2018 with the model trained on CTec2020. The results of the cross-task comparison are shown in Figure 2. The F1 value of this paper’s model is 78.8% and 76.61% on CTec2018 and CTec2020, respectively, and all of its evaluation metrics are higher than those of UMGF, which further supports the model’s strong performance and indicates some progress in model transfer. Although the model scores slightly lower on CTec2018 than on CTec2020, the model trained on CTec2020 still transfers better than the model trained on CTec2018. This suggests that training on a larger amount of data remains effective and improves the model’s understanding. The cross-domain transfer setting is of interest in its own right and can support the development of entity recognition tasks and the improvement of language models.

Figure 2. Cross-task model comparison results
To study how the knowledge semantic network affects educational resources in international Chinese language education, this paper analyzes the quality of the constructed network through experiments that examine both the construction process and its results. It also examines the positive effect of the knowledge semantic network on the optimization of Chinese educational resources.
International Chinese language education covers a wide range of content, including specialized vocabulary from multiple disciplines as well as everyday language. This paper therefore divides the terms and vocabulary in Chinese language education into five categories: Chinese language, mathematics and physics, chemistry and biology, history and geography, and politics. The initial entity set is then obtained and manually filtered to produce correct statistics. General-purpose word segmentation usually slices shorter words accurately but often splits long words ambiguously. Entity extraction is therefore evaluated on four indicators: the number of composite concepts extracted, the number of entities, the average word length, and the accuracy rate. The results of this paper’s extraction are compared with the traditional Ansj entity extraction method in Table 2. Overall, this paper’s method extracts more than 3,000 entities in every category, more than Ansj does, which indicates that it better reflects the domain of each category and surfaces its important knowledge points. The average word length is also longer in every category: the mean over the five categories is 2.994, compared with 2.480 for Ansj, an improvement of 0.514.
In terms of accuracy, entity extraction in this paper ranges from 91.05% to 97.72%, higher than the Ansj method in every category. In conclusion, the entity extraction in this paper shows clear advantages in the number of domain entities, average word length, and accuracy rate. It improves the accuracy, comprehensiveness, and domain specificity of entity extraction across the different domains of Chinese education and yields entities of better quality.
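The average word length figures can be reproduced from the per-category values in Table 2. The short check below uses only those reported numbers; the dictionary names are illustrative.

```python
from statistics import mean

# Mean word length per category, as reported in Table 2.
ansj_mean_length = {
    "Chinese": 2.35,
    "Mathematics and physics": 2.41,
    "Chemistry and biology": 2.55,
    "History and geography": 2.64,
    "Politics": 2.45,
}
ours_mean_length = {
    "Chinese": 2.56,
    "Mathematics and physics": 2.85,
    "Chemistry and biology": 3.93,
    "History and geography": 2.87,
    "Politics": 2.76,
}

ansj_overall = mean(ansj_mean_length.values())
ours_overall = mean(ours_mean_length.values())
improvement = ours_overall - ansj_overall

print(f"Ansj: {ansj_overall:.3f}  Ours: {ours_overall:.3f}  Gain: {improvement:.3f}")
```

Note that this is an unweighted mean over the five categories, which is what the reported 2.994 and 2.480 correspond to; weighting by entity count would give different values.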
Table 2. Knowledge semantic network entity extraction results
Category | Extraction method | Documents | Composite concepts | Entities | Mean word length | Accuracy (%) |
---|---|---|---|---|---|---|
Chinese | Ansj | 1352 | 2339 | 2177 | 2.35 | 91.58 |
Chinese | Ours | 1352 | 3382 | 3252 | 2.56 | 95.45 |
Mathematics and physics | Ansj | 3512 | 2991 | 2854 | 2.41 | 86.44 |
Mathematics and physics | Ours | 3512 | 3327 | 3178 | 2.85 | 91.05 |
Chemistry and biology | Ansj | 1315 | 2875 | 2280 | 2.55 | 90.63 |
Chemistry and biology | Ours | 1315 | 3941 | 3100 | 3.93 | 96.32 |
History and geography | Ansj | 1293 | 2286 | 1859 | 2.64 | 92.15 |
History and geography | Ours | 1293 | 3060 | 3255 | 2.87 | 97.66 |
Politics | Ansj | 846 | 1740 | 1381 | 2.45 | 94.33 |
Politics | Ours | 846 | 3342 | 3094 | 2.76 | 97.72 |
In a knowledge graph, degree centrality both measures the position of important knowledge points in the knowledge system and quantifies them so that they can be told apart. A knowledge graph in the field of basic education is essentially a graph, and the more neighbors a node has, the more important it is. The maximum node degree in the whole knowledge graph reflects how well the graph captures the associations in the source data. Node centrality is therefore computed and compared for the knowledge graphs built by two methods for each category of Chinese education. Two groups of construction experiments are designed: Experiment 1 builds the knowledge semantic graph with Ansj word segmentation and association rules, and Experiment 2 uses the knowledge semantic network constructed in this paper. The statistics are shown in Table 3. For the five corpus categories (Chinese language, mathematics and physics, chemistry and biology, history and geography, and politics), the maximum degree is highest in Experiment 2, at 98, 221, 132, 159, and 147 respectively, and the results of Experiment 2 are clearly better than those of Experiment 1. Across the corpus categories of Chinese language education, the knowledge semantic network constructed in this paper is of higher overall quality. Its entity extraction performs well and better addresses the problems of weak domain specificity and small entity coverage. For entity relation extraction, relation templates can be formulated according to the content of each discipline so that specific relations are extracted, which is a significant advance for entity relation extraction in basic Chinese education.
Experiment 2 also ranks higher in the number of edges, network density, and cohesive subgroups in every category, and in the number of nodes in four of the five categories. These indicators reflect the coverage and comprehensiveness of the knowledge graph, which further illustrates the quality of the network in this paper.
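The graph indicators discussed here can be computed from an edge list as in the sketch below. It assumes an undirected simple graph and the standard density definition 2E / (N(N - 1)); the paper does not say how its density figures were obtained, so this is illustrative only, and the function name is hypothetical.

```python
from collections import Counter

def graph_statistics(edges):
    """Node count, edge count, maximum degree, and density of an undirected graph.

    edges: iterable of (u, v) pairs. Self-loops and duplicate edges are
    ignored, and only nodes that appear in at least one edge are counted.
    """
    edge_set = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in edge_set:
        for node in edge:
            degree[node] += 1
    node_count = len(degree)
    edge_count = len(edge_set)
    density = 2 * edge_count / (node_count * (node_count - 1)) if node_count > 1 else 0.0
    hub, max_degree = degree.most_common(1)[0] if degree else (None, 0)
    return {
        "nodes": node_count,
        "edges": edge_count,
        "max_degree": max_degree,
        "hub": hub,
        "density": density,
    }

if __name__ == "__main__":
    toy_edges = [("量词", "名词"), ("量词", "数词"), ("量词", "动量词")]
    print(graph_statistics(toy_edges))
```

In this toy graph the term with the most neighbors is the hub, which matches the reading of maximum degree as a marker of the most central knowledge point.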
Table 3. Statistical results of the knowledge graph
Category | Construction method | Nodes | Edges | Triples | Maximum degree | Network density (%) | Cohesive subgroups |
---|---|---|---|---|---|---|---|
Chinese | Experiment 1 | 3894 | 2431 | 4445 | 65 | 0.17 | 418 |
Chinese | Experiment 2 | 6124 | 2946 | 6124 | 98 | 0.18 | 551 |
Mathematics and physics | Experiment 1 | 4308 | 1838 | 8431 | 205 | 0.31 | 244 |
Mathematics and physics | Experiment 2 | 8643 | 2679 | 8643 | 221 | 0.39 | 429 |
Chemistry and biology | Experiment 1 | 7612 | 2520 | 5349 | 52 | 0.25 | 267 |
Chemistry and biology | Experiment 2 | 6617 | 2637 | 5617 | 123 | 0.29 | 441 |
History and geography | Experiment 1 | 4981 | 1832 | 5208 | 142 | 0.36 | 154 |
History and geography | Experiment 2 | 5741 | 2269 | 5741 | 159 | 0.47 | 323 |
Politics | Experiment 1 | 6338 | 2698 | 4171 | 53 | 0.22 | 307 |
Politics | Experiment 2 | 6967 | 3020 | 4439 | 147 | 0.39 | 568 |
This study constructs a knowledge semantic network model for Chinese language education based on the multimodal knowledge graph embedding technique. The following conclusions are drawn through empirical analysis:
First, the precision of this paper’s model on the two datasets is 75.79% and 87.42%, respectively, and it achieves the best result on every metric among the compared models. Its precision, recall, and F1 value on CTec2020 are 87.42%, 88.03%, and 87.23%, respectively, better than on CTec2018; since CTec2020 has the larger sample, this suggests that the model performs better in large-sample scenarios. Second, in the cross-domain scenario the model again achieves the best results, which further supports the effectiveness of the method. Third, this paper’s method extracts more than 3,000 entities in every category of the Chinese education corpus, more than Ansj, which indicates that it better reflects the domain of each category. The average word length is also longer: the mean over the five categories is 2.994, compared with 2.480 for Ansj, an improvement of 0.514. The accuracy of entity extraction ranges from 91.05% to 97.72% and is higher than Ansj in every category. Fourth, for the five corpus categories (Chinese language, mathematics and physics, chemistry and biology, history and geography, and politics), the maximum degree is highest in the knowledge semantic network of this paper, at 98, 221, 132, 159, and 147, respectively. Overall, the knowledge semantic network constructed in this paper for Chinese language education is of high quality.
In conclusion, this paper constructs a knowledge semantic network for Chinese language education through a multimodal knowledge graph, which provides a basis and support for processing and optimizing the resources of international Chinese language education.