The construction of an autonomous knowledge system in communication based on knowledge mapping
Published Online: Feb 03, 2025
Received: Sep 21, 2024
Accepted: Jan 02, 2025
DOI: https://doi.org/10.2478/amns-2025-0032
© 2025 Huajin Li, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Journalism and communication is the study of journalistic activities, communication activities and various other communication phenomena. As a late-born knowledge category in the social sciences, the establishment and evolution of journalism and communication is closely related to news and communication practice [1–2]. In terms of disciplinary history, the two major genealogies of the field, journalism and communication, did not form at the same time: journalism came first and communication second [3–5]. Journalism came into being with the rise of the Western press in the 19th century, as a response to a new industry, the newspaper industry. This new industry was itself a product of the communication revolution triggered by printing, the telegraph and other technologies [6–9].
At present, Chinese journalism and communication occupies an unprecedentedly important historical position. The booming development of the Internet in China and the continuous evolution of a deeply mediatized society have made China the most vibrant social field for journalism and communication practice in the digital age. This offers a rare opportunity for Chinese journalism and communication to break through the old disciplinary knowledge system and construct an independent one [10–13]. The great practice of contemporary China will create a new knowledge system, and the autonomous knowledge system will in turn promote that practice on the new journey. A systematic, complete and theoretically persuasive autonomous knowledge system of Chinese journalism and communication [14–16] will provide contemporary, systematic and original knowledge support for solving the major theoretical and practical problems currently facing the field in China. It will also better serve the construction of Chinese-style modernization and contribute to building a community with a shared future for mankind and creating a new form of human civilization [17–20].
This study builds a knowledge graph of the autonomous knowledge system in communication science by following the overall knowledge graph construction architecture. After data related to communication are collected and pre-processed, a feedback mechanism is introduced to improve the seven-step method, and the ontology of the autonomous knowledge system of communication is constructed. A joint information extraction model is then designed for the information extraction task, with text embedding carried out by BERT. In the named entity recognition module, a BiLSTM neural network encodes contextual semantic and sequential information, and a CRF decoder produces the label sequence of the text. In the relation extraction module, an attention mechanism is introduced to calculate the probability of each relation category in the text. Word2vec is used to calculate entity similarity for knowledge graph fusion, and Neo4j is selected to store the knowledge graph. Finally, the running time, entity recognition performance and relation extraction performance of the joint model are analysed through comparative experiments with multiple models, to assess the knowledge extraction performance of the Bert-BiLSTM-CRF-Att model in constructing the communication science knowledge system.
A knowledge graph is a large-scale semantic network containing entities, attributes and the semantic relationships among them. It is at once a set of artificial intelligence technologies, a mode of knowledge organisation and expression, and a class of large-scale open knowledge base. Knowledge graphs have strong expressive ability and modeling flexibility, and they represent knowledge in a form that is both human-readable and machine-friendly. The following introduces knowledge graph technology from four aspects: architecture, construction, storage, and application.
The architecture of the knowledge graph contains its logical structure and the adopted technical architecture.
According to its logical structure, the knowledge graph can be divided into a schema layer and a data layer. The schema layer is the skeleton of the knowledge graph and is usually built through ontology construction, which supports a structured knowledge base and normalised data organisation. In computing, an ontology is an explicit representation of a conceptualisation; more specifically, it is an explicit formal specification of a shared conceptual model. Ontology construction methods generally fall into two categories, manual and automatic. Manual construction relies on domain experts and is accurate and reliable, but its labour and time costs are high. Automatic construction relies on machine learning methods and ontology construction tools to acquire ontology concepts automatically, but its accuracy is limited. The commonly used approach therefore builds the ontology automatically and then optimises it manually to improve accuracy; this is also called semi-automated construction. The data layer is the flesh and blood of the knowledge graph: it fills the skeleton of the schema layer with a large amount of instantiated data, generally storing relationships between entities as (Entity, Relationship, Entity) triples and entity attributes as (Entity, Attribute Category, Attribute Value) triples. These two triple representations constitute the metadata of the data layer, and importing a large amount of such metadata yields a semantic network over massive data, i.e., a complete knowledge graph.
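The split between schema layer and data layer can be illustrated with a small in-memory sketch. This is only an illustration under stated assumptions: the class name `DataLayer`, the relation name `is_type_of` and the attribute category `definition` are hypothetical and do not come from the paper's ontology.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One (Entity, Relationship, Entity) fact."""
    head: str
    relation: str
    tail: str


class DataLayer:
    """Toy data layer that only accepts facts permitted by a schema layer."""

    def __init__(self, relation_types, attribute_categories):
        # Schema layer: the relations and attribute categories that may be used.
        self.relation_types = set(relation_types)
        self.attribute_categories = set(attribute_categories)
        # Data layer: instantiated relation triples and attribute triples.
        self.relations = []
        self.attributes = defaultdict(dict)

    def add_relation(self, head, relation, tail):
        if relation not in self.relation_types:
            raise ValueError(f"relation {relation!r} is not defined in the schema")
        self.relations.append(Triple(head, relation, tail))

    def add_attribute(self, entity, category, value):
        if category not in self.attribute_categories:
            raise ValueError(f"attribute {category!r} is not defined in the schema")
        # Stored as (Entity, Attribute Category, Attribute Value).
        self.attributes[entity][category] = value

    def neighbours(self, entity):
        """Return (relation, tail) pairs for every triple starting at `entity`."""
        return [(t.relation, t.tail) for t in self.relations if t.head == entity]
```

Rejecting facts that the schema does not define is one simple way to make the schema layer act as the "skeleton" that constrains what the data layer may contain.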
The technical architecture of the knowledge graph identifies the specific technologies used in the construction process, which typically include knowledge extraction, knowledge fusion, knowledge processing, knowledge storage, and knowledge application. The generic knowledge graph construction process is shown in Figure 1. The left side shows the data sources, which comprise three types of data: structured, semi-structured and unstructured. Building and updating a knowledge graph involves three main steps: information acquisition, knowledge fusion, and knowledge processing. The processing steps differ by data type: structured data can go directly to knowledge fusion, whereas the other two types must first go through information acquisition and then knowledge fusion.

Figure 1. General knowledge graph construction process
Information acquisition is the extraction of information such as entities, attributes and relationships from the data source; it is also called knowledge extraction and generally includes named entity recognition and relationship extraction. Knowledge fusion processes and integrates the knowledge obtained through information acquisition to eliminate potential ambiguities. Knowledge processing assesses and calibrates the quality of the knowledge to determine whether it meets the required indicators. Only knowledge that passes this check is stored in the knowledge graph. Storage is achieved with graph databases; commonly used ones include Neo4j, DGraph, and JanusGraph.
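The routing described above, where structured data skips extraction while semi-structured and unstructured data go through it first, can be sketched as follows. The function `run_pipeline` and the three toy callables are hypothetical placeholders for the real extraction, fusion and quality-check components.

```python
def run_pipeline(records, source_type, extract, fuse, passes_quality_check):
    """Route records through acquisition, fusion and processing by source type."""
    if source_type == "structured":
        # Structured records are assumed to be triples already.
        candidates = list(records)
    elif source_type in ("semi-structured", "unstructured"):
        # Information acquisition: extract triples from each raw record.
        candidates = [triple for record in records for triple in extract(record)]
    else:
        raise ValueError(f"unknown source type: {source_type!r}")
    # Knowledge fusion: merge and disambiguate the candidate triples.
    fused = fuse(candidates)
    # Knowledge processing: keep only triples that pass the quality check.
    return [triple for triple in fused if passes_quality_check(triple)]


def toy_extract(text):
    """Toy extractor: parse 'head | relation | tail' strings."""
    head, relation, tail = (part.strip() for part in text.split("|"))
    return [(head, relation, tail)]


def toy_fuse(triples):
    """Toy fusion: drop exact duplicates while preserving order."""
    return list(dict.fromkeys(triples))


def toy_quality_check(triple):
    """Toy quality gate: every field must be non-empty."""
    return all(triple)
```

Only the triples returned by `run_pipeline` would be handed to the storage step.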
Drawing on the knowledge graph construction process, this chapter constructs an ontology oriented to the autonomous knowledge system of communication and uses that ontology to complete the information extraction and knowledge fusion of the knowledge graph. Finally, the storage of knowledge is designed.
To ensure the completeness of the knowledge graph of the autonomous knowledge system of communication studies, this paper draws on multiple types of data, including Baidu Encyclopedia, educational materials and related resource websites. Once the communication knowledge resources have been obtained, the data are cleaned, completed and integrated in pre-processing, so that high-quality data are available and the characteristics of data in the communication field can be understood.
Given the needs of this study, the seven-step method is selected to construct the ontology of the autonomous knowledge system of communication science. The seven-step method makes it difficult to measure the quality of the ontology, so a feedback mechanism is introduced into ontology construction to ensure quality. Figure 2 shows the ontology construction method with the added feedback mechanism. First, the domain and scope of the ontology are determined. This study is oriented to the autonomous knowledge system of communication; its purpose is to abstract and integrate communication expertise resources and standardise their representation, so as to provide structural-layer support for constructing the expertise knowledge graph. The scope of the ontology is therefore communication expertise. Next, the core concepts are analysed based on the characteristics of the existing data sources to build the ontology system. Finally, experts and scholars evaluate the constructed ontology to determine whether it can effectively integrate the current data sources of communication knowledge, and they adjust and modify it until it is fully applicable, which completes the construction of the ontology.

Figure 2. Ontology construction method with the feedback mechanism
The information extraction task can be broken down into two subtasks: named entity recognition (NER) and relation extraction (RE). In this project, a joint extraction model is used to complete the information extraction task; its specific design is shown in Figure 3. The model consists of three main parts: the text embedding layer, the named entity recognition module, and the relation extraction module.

Figure 3. Specific design of the joint extraction model
A neural network cannot process raw input text directly; the text must first be converted into vectors, a process called text embedding. The encoding layer needs to learn the connection between each character and its semantic information, and this study uses BERT to embed the text. The BERT layer represents the input text as vectors, and compared with Word2Vec, BERT learns richer linguistic information during pre-training.
Named entity recognition is the process of extracting entities from unstructured or semi-structured data, which can also be viewed as a sequence annotation task.
BiLSTM coding layer
In the sequence annotation task, rich sequential information is an important class of features with a great impact on the final annotation result, so a bidirectional long short-term memory network (BiLSTM) is used to extract the sequential features of sentences. To make better use of the subsequent context in the sequence, the model includes not only the forward computation of an ordinary LSTM but also a backward computation, and the values computed in both directions are passed to the output layer together.
Long Short-Term Memory Network is a special kind of recurrent neural network, which overcomes the problem of vanishing and exploding gradients of traditional RNN models. The model can selectively save contextual information through the specially designed gate structure of LSTM. The basic unit of the LSTM architecture is a memory block that includes a neural cell (denoted as
The bi-directional long short-term memory network input is still the output of the embedding layer, but an output sequence
CRF decoding layer
After the BiLSTM computes the locally most probable label for each position, the model still needs to take global constraints and dependencies between neighbouring labels into account. A better solution is to connect a CRF layer after the BiLSTM, since a trained CRF layer can capture such global constraint information. A conditional random field (CRF) is a probabilistic graphical model that scores the label sequence as a whole given the input sequence. Its basic workflow is as follows: first, the named entity labels B (start position of an entity), I (inside position) and O (non-entity part of the sentence) are defined; then each unit of the input observation sequence is matched to its label, with words as the basic unit. Since the input and output of the entity recognition task are linear sequences, the CRF used in entity recognition is generally a linear-chain model.
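The effect of decoding with global constraints, rather than picking the best label at each position independently, can be sketched with Viterbi decoding over B/I/O tags. This is a simplified illustration and not the trained CRF layer of the model: the transition scores here are hand-set hard constraints (an "I" tag may not follow "O" or start a sentence), which is an assumption, whereas a real CRF learns its transition scores from data. The emission scores stand in for the BiLSTM outputs.

```python
import math

TAGS = ["B", "I", "O"]
NEG_INF = -math.inf


def transition_score(prev_tag, cur_tag):
    """Hand-set constraint: an inside tag cannot follow a non-entity tag."""
    if cur_tag == "I" and prev_tag == "O":
        return NEG_INF
    return 0.0


def viterbi_decode(emissions):
    """Return the highest-scoring valid tag sequence.

    `emissions` is a list with one dict per token, mapping each tag to a score.
    """
    if not emissions:
        return []
    # An entity cannot start with "I", so that tag is ruled out at position 0.
    best = {tag: (NEG_INF if tag == "I" else emissions[0][tag]) for tag in TAGS}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[p] + transition_score(p, cur))
            new_best[cur] = best[prev] + transition_score(prev, cur) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(TAGS, key=lambda tag: best[tag])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Per-token argmax would give O, I, O here, which is not a valid tag sequence.
example_emissions = [
    {"B": 1.0, "I": 0.0, "O": 1.5},
    {"B": 0.5, "I": 2.0, "O": 0.0},
    {"B": 0.0, "I": 0.0, "O": 1.0},
]
decoded = viterbi_decode(example_emissions)
```

In the example, the decoder trades a slightly lower score at the first token for a sequence that respects the label constraints.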
Conditional Random Field (CRF) means: suppose
The essence of relation extraction is the automatic identification of semantic relationships between entities in a corpus. Its output takes the form of entity-relationship triples, i.e., (entity, relationship, entity). Like entity extraction, relation extraction methods are divided into rule-based and machine learning-based methods. Relation extraction relies on the structure of the corpus, and dependency parsing is an effective way to obtain that structure.
The attention mechanism is able to assign different weight coefficients depending on how much the words themselves contribute to the prediction of the relation category, allowing this model to capture the most important semantic information from each sentence.
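The idea of weighting words by their contribution can be sketched as attention pooling: each token vector receives a score, the scores are normalised with softmax, and the sentence representation is the weighted sum. This is a minimal sketch under assumptions; the `query` vector is a hypothetical stand-in for learned parameters, and the plain dot-product scoring is a common choice rather than the paper's exact formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight token vectors by dot-product relevance to `query` and sum them."""
    scores = [sum(h * q for h, q in zip(vector, query)) for vector in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * vector[d] for w, vector in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, pooled


# Three toy token vectors; the third aligns best with the query.
toy_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_weights, toy_sentence_vector = attention_pool(toy_states, [1.0, 1.0])
```

The pooled vector would then be passed to a classifier that outputs a probability for each relation category.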
In the attention mechanism, three vectors, Query(Q), Key(K) and Value(V), are mainly involved, and the work of the attention layer can be regarded as encoding a sequence of
In Eq. (8),
The self-attention mechanism can obtain global information through matrix operations, but because its parameters are fixed, it can only capture textual information in a single subspace. To capture richer feature information, a multi-head attention network is used. The idea is to map Q, K, and V through parameter matrices, pass them into the Attention function, and repeat the process
The output vector
In order to further construct the structure of the communication knowledge system, the knowledge map of the communication autonomous knowledge system can be fused to form a complete communication autonomous knowledge system, which uses semantic similarity to assist the manual alignment of entities.
Word vectors are feature vectors that represent the units produced by word segmentation, at the level of words or characters. Each semantic feature of a word corresponds to a dimension of the word vector. Word vectors can be obtained through one-hot encoding or distributed vector representation. The idea of distributed representation is as follows: after text preprocessing, the given corpus is split into individual word sequences, which are used to train dense, continuous vectors of fixed length. Through word2vec, each entity is given a word vector, and the similarity between entities is then calculated as the cosine similarity between their word vectors, that is, the cosine of the angle between the two vectors in the inner product space. The calculation formula is shown in equation (11):
In the above formula,
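As an illustration of how cosine similarity can support manual entity alignment, the sketch below computes the standard cosine similarity between two vectors and lists entity pairs above a threshold as candidates for review. The toy vectors, the entity names and the threshold of 0.9 are assumptions made for the example; they are not word2vec outputs or values from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # similarity with a zero vector is treated as 0 by convention
    return dot / (norm_u * norm_v)


def alignment_candidates(entity_vectors, threshold=0.9):
    """Return (entity_a, entity_b, similarity) for pairs at or above the threshold."""
    names = sorted(entity_vectors)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            similarity = cosine_similarity(entity_vectors[first], entity_vectors[second])
            if similarity >= threshold:
                pairs.append((first, second, similarity))
    return pairs


# Hypothetical two-dimensional vectors, used only to exercise the functions.
toy_vectors = {
    "mass communication": [0.90, 0.10],
    "mass media communication": [0.88, 0.12],
    "interpersonal communication": [0.10, 0.90],
}
candidates = alignment_candidates(toy_vectors)
```

The candidate list is meant to narrow down what a human reviewer checks; the final merge decision stays manual, as in the text.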
Considering software maturity and the amount of available learning material, this study selects Neo4j, which is mature, well documented and has a low learning cost, to store the knowledge graph. The Neo4j graph database is stable, full-featured and high-performing, and its query language, Cypher, is convenient for implementing the system functions later in the paper.
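One way a fused triple could be written to Neo4j is with a parameterised Cypher `MERGE` statement, sketched below as plain string construction. The node label `Entity`, the property `name` and the relationship type `IS_TYPE_OF` are hypothetical choices, not the schema used in the paper. Cypher does not accept a relationship type as a query parameter, so the sketch validates it before interpolating it into the query text.

```python
import re

# Relationship types are interpolated into the query, so restrict them to
# plain identifiers to avoid malformed or injected Cypher.
_RELATION_TYPE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def triple_to_cypher(head, relation, tail):
    """Build a Cypher MERGE query and its parameters for one triple."""
    if not _RELATION_TYPE.match(relation):
        raise ValueError(f"unsupported relationship type: {relation!r}")
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}


example_query, example_params = triple_to_cypher(
    "mass communication", "IS_TYPE_OF", "communication"
)
```

Using `MERGE` rather than `CREATE` means that re-importing the same triple does not duplicate nodes or relationships. The query and parameters can then be passed to a Neo4j client, for example the `run` method of a session in the official Python driver.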
In order to validate the recognition effectiveness of the Bert-BiLSTM-CRF-Att joint extraction model for each class of entities and the overall recognition effectiveness, a confusion matrix is introduced, and the precision rate (
A dedicated dataset for the autonomous knowledge system of communication was constructed by web crawling. The resulting experimental corpus was divided into training and test sets in a ratio of 8:2. Entity types such as communication types, communication functions, communication modes, communication research methods, and the formation and development of communication science, together with the correspondences among them, were defined according to the application needs of the knowledge graph of the autonomous knowledge system of communication science.
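An 8:2 split of the corpus can be sketched as a seeded shuffle followed by a cut. The function name, the fixed seed and the shuffling step are assumptions; the text does not say how the split was drawn.

```python
import random


def train_test_split(samples, train_ratio=0.8, seed=42):
    """Shuffle a copy of `samples` reproducibly and cut it at `train_ratio`."""
    items = list(samples)
    rng = random.Random(seed)  # local generator, so global random state is untouched
    rng.shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


train_set, test_set = train_test_split(range(10))
```

Fixing the seed keeps the split identical across runs, which matters when several models are compared on the same test set.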
During training, the effects of the learning rate (LR) and the batch size (BS) on the model's F1 value were examined. The parameters that affect model performance were adjusted through simulation experiments, and their effects are shown in Fig. 4, where Fig. 4(a) shows the model loss and Fig. 4(b) and Fig. 4(c) show the trend of the F1 value as the learning rate and the batch size were adjusted, respectively. The loss of the Bert-BiLSTM-CRF-Att joint extraction model tends to 0 as the number of iterations increases, and the final parameters are a learning rate of 0.002 and a batch size of 32.

Figure 4. Effect of experimental parameters on model performance
The BiLSTM-CRF and BERT-BiLSTM models are selected as baselines, and the per-epoch training time of each model is compared; the running times of the different models are shown in Fig. 5. The Bert-BiLSTM-CRF-Att joint extraction model trains markedly faster than both baselines. The average training times of the three models are 101.04, 81.70, and 54.75 s/epoch, respectively, indicating that the joint information extraction model for the autonomous knowledge system of communication is the most efficient of the three.

Figure 5. Running time comparison of different models
To verify the superiority of the named entity recognition model proposed in this paper for the autonomous knowledge system of communication, comparative experiments were conducted with BiLSTM-CRF, IDCNN-CRF, BERT-CRF and BERT-BiLSTM on the communication text corpus in the same experimental environment. The recognition results of each model are compared in Fig. 6. The Bert-BiLSTM-CRF-Att model performs best on the current dataset, with precision, recall and F1 values of 76.99%, 78.56% and 77.77%, respectively, all higher than those of the comparison models. The DA-BERT-BiLSTM-CRF model proposed for text can better understand the semantic information of text, which enhances its recognition of communication science knowledge entities.

Figure 6. Comparison of the recognition performance of each model
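The relationship between the reported metrics can be checked with a short sketch of the standard definitions, where F1 is the harmonic mean of precision and recall. The function names are illustrative; the only values taken from the text are the overall scores of 76.99% and 78.56%, which yield an F1 that rounds to the reported 77.77%.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall, f1_score(precision, recall)


# Overall scores reported for the Bert-BiLSTM-CRF-Att model in the text.
overall_f1 = f1_score(0.7699, 0.7856)
```

The same check applied to the rows of Table 1 shows that its three numeric columns are precision, recall and F1, in that order.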
In addition to the overall recognition performance, it is also necessary to examine how the Bert-BiLSTM-CRF-Att model performs on different entity categories. Table 1 compares recognition performance by category. The precision, recall and F1 values of the Bert-BiLSTM-CRF-Att model are the highest in all five categories (communication types, communication functions, communication modes, communication research methods, and the formation and development of communication), with overall values ranging from 0.85 to 0.90. Recognition is best for communication types, where all three indicators exceed 0.88. In sum, the model improves the recognition of communication knowledge materials.
Table 1. Recognition performance on different entity categories

| Entity category | Model | Precision | Recall | F1 |
|---|---|---|---|---|
| Communication type | BiLSTM-CRF | 0.674 | 0.697 | 0.685 |
| | IDCNN-CRF | 0.787 | 0.835 | 0.810 |
| | BERT-CRF | 0.809 | 0.824 | 0.816 |
| | BERT-BiLSTM | 0.662 | 0.671 | 0.666 |
| Communication function | BiLSTM-CRF | 0.769 | 0.773 | 0.771 |
| | IDCNN-CRF | 0.744 | 0.732 | 0.738 |
| | BERT-CRF | 0.666 | 0.682 | 0.674 |
| | BERT-BiLSTM | 0.787 | 0.756 | 0.771 |
| Communication mode | BiLSTM-CRF | 0.749 | 0.782 | 0.765 |
| | IDCNN-CRF | 0.698 | 0.814 | 0.752 |
| | BERT-CRF | 0.834 | 0.825 | 0.829 |
| | BERT-BiLSTM | 0.825 | 0.661 | 0.734 |
| Communication research method | BiLSTM-CRF | 0.804 | 0.734 | 0.767 |
| | IDCNN-CRF | 0.846 | 0.818 | 0.832 |
| | BERT-CRF | 0.668 | 0.826 | 0.739 |
| | BERT-BiLSTM | 0.791 | 0.685 | 0.734 |
| Formation and development of communication science | BiLSTM-CRF | 0.695 | 0.758 | 0.725 |
| | IDCNN-CRF | 0.769 | 0.704 | 0.735 |
| | BERT-CRF | 0.654 | 0.830 | 0.732 |
| | BERT-BiLSTM | 0.776 | 0.733 | 0.754 |
BiLSTM, BiLSTM-Attention and BERT-BiLSTM-Attention are selected as baselines for relation extraction, and the comparison of the relation extraction algorithms is shown in Fig. 7. The Bert-BiLSTM-CRF-Att model used in the article is higher than the comparative algorithms in terms of accuracy

Figure 7. Comparison of the results of the relation extraction algorithms
With the rapid development of the Internet and the rise of artificial intelligence, the vast body of subject resources has become highly repetitive and lacks systematic organisation. Most traditional methods of organising domain knowledge rely on manual work by experts and scholars, using tools such as mind maps, concept maps, knowledge trees, and problem trees. The network structure of a knowledge graph naturally matches the net-like knowledge structure of a subject domain. Using a knowledge graph to organise the professional knowledge system of communication studies can therefore effectively build a net-like knowledge structure, avoid homogenised and missing knowledge, and provide a new direction for reforming the professional training system. Constructing an autonomous knowledge system in communication on this basis not only standardises the representation of knowledge structure in communication but also brings new ideas for organising its professional knowledge system and promotes the efficient management and use of communication resources.
With the development of technology, the knowledge graph has become a powerful tool for organizing, managing, and applying knowledge. This paper introduces a knowledge graph into the construction of autonomous knowledge system of communication science, establishes the Bert-BiLSTM-CRF-Att joint information extraction model for entity recognition and relationship extraction in the process of construction of autonomous knowledge system of communication science, and carries out experiments to analyse the recognition effect of the model.
After collecting relevant information about communication knowledge, the study introduces a feedback mechanism into the ontology construction of the knowledge graph. Following this method, the corpus is subdivided, manually integrated, and evaluated to obtain a knowledge graph ontology oriented to communication knowledge.
The average training time of the constructed joint information extraction model is 54.75 s/epoch, which reflects better efficiency. The model also achieves better results in both entity recognition and relationship extraction and can recognise all types of entities in the text of the autonomous knowledge system for communication studies. The overall F1 value for entity recognition is 77.77%, and the F1 value for communication types is above 88%, the best result among the categories. The accuracy (89.27%), recall (87.69%) and
The research on constructing an autonomous knowledge system of communication studies based on a knowledge graph has achieved its expected goal, but there is still room for improvement, mainly in the limited channels and types of data acquisition and in the model's limited semantic comprehension of the knowledge. Future work can extract text from multimedia data and improve the accuracy of knowledge extraction.