Research on Big Data-driven Knowledge Graph Construction Technology for Intangible Cultural Heritage Digital Resources

Intangible cultural heritage (ICH) is an important factor in preserving cultural diversity and is in urgent need of creative transformation and innovative development. With the rapid development of science and technology, the ICH field has made extensive use of digital technology to preserve ICH resources. A digital resource is a resource reproduced through forms of expression such as image, text, animation or sound and preserved in digital form. Digitization not only effectively protects ICH from physical damage and helps safeguard cultural sovereignty, but also makes ICH more accessible and understandable to a wide range of people through the Internet [1-4]. As digitization has progressed, a large number of ICH digital resources have been created and accumulated. With the advantages of non-destructiveness, wide dissemination and easy storage, digital resources have become an important development trend for ICH [5-6]. Compared with traditional preservation methods, digital preservation not only keeps ICH resources longer and more safely, but also reproduces them, so that more people understand, pay attention to and make use of ICH resources. In recent years, governments have strongly supported the digital protection and inheritance of ICH, and cultural institutions around the world have carried out ICH digital resource projects, mainly in the form of databases, digital libraries, digital museums and digital teaching [7-10]. Examples include the Hong Kong Intangible Cultural Heritage Database, the China Intangible Cultural Heritage Digital Museum and the Canada Intangible Heritage Digital Museum. However, the heterogeneous, multi-source and living nature of digitized ICH poses new challenges to traditional resource sharing methods, resulting in incomplete and disconnected sharing of cultural heritage digital information and resources [11-14]. Therefore, the orderly management and in-depth development and utilization of ICH digital resources has become an urgent problem.

A knowledge graph is a structured semantic knowledge base, first formally proposed by Google in 2012 with the original intention of building a more intelligent search engine [15]. A knowledge graph organizes knowledge in a more orderly and organic way, so that users can access the knowledge they need more quickly and accurately and carry out knowledge mining and intelligent decision-making; it is widely used in search engines, intelligent customer service, recommender systems, etc. [16-19]. Its goal is to describe the concepts, entities and events of the objective world and the relationships between them [20]. In a knowledge graph, each node represents an entity and each edge represents a relationship between entities. Collecting, organizing and storing ICH relies on a large amount of professional knowledge. By applying a knowledge graph, the relevant information of ICH can be mapped into a structured graph and enriched through entity identification and linking, attribute extraction, sentiment analysis and semantic representation. This connects different knowledge points, improves the coupling and relevance of knowledge, and enables knowledge to be shared and communicated [21-24]. The knowledge graph thus provides a powerful tool and framework for the organic integration and deep understanding of knowledge.

Knowledge graphs are widely constructed and applied in the digitization of ICH. On the one hand, they help people quickly understand the culture behind an ICH item and thereby protect and disseminate it. On the other hand, an ICH knowledge graph can be dynamically updated to explore the connections between pieces of knowledge and to display and share that knowledge visually. Qiu et al. [25] constructed a knowledge graph of Langzhong ICH and combined it with a virtual reality system to realize the visual display of, and interaction with, Langzhong textile knowledge, patterns and processes. Lu et al. [26] used an ontology approach to construct a domain knowledge graph from video resources of Nanjing Yunjin, and used the Neo4j network topology and Cypher for relational analysis, semantic retrieval and visualization of the graph. Gu et al. [27] used web crawling and matching algorithms to implement an ICH knowledge acquisition store and an intelligent Q&A system respectively, thus creating an ICH knowledge graph. Han et al. [28] established an ICH knowledge graph based on multi-source data, represented the graph as vectors, used the BERT (Bidirectional Encoder Representations from Transformers) model for bidirectional information encoding, and constructed ontology-entity correlations through cross-view modeling to address the computational complexity and efficiency of the graph. Liang et al. [29] proposed an ontology-supported semantic description model of multimodal digital resources for embroidery ICH, which integrates multimodal knowledge such as text and images to create a multidimensional path for knowledge-graph-based ICH resource knowledge services. Changjue et al. [30] constructed a knowledge graph top-down by extracting, integrating and storing the knowledge of Yuanqu ICH, which visualizes Yuanqu ICH resources in multiple dimensions. Wu et al. [31] converted batik ICH into image data, constructed an image-driven domain-specific ontology, built a batik ICH product knowledge graph by identifying entities and relationships, and optimized the evolutionary relationships, automatic updating and visualization of the graph using image clustering, a channel gating control mechanism with low-rank tensor fusion, and the Gephi and Neo4j tools, respectively.

However, unstructured data processing and knowledge fragmentation in ICH digital resources have not yet been effectively solved. In the context of emphasizing the storage and inheritance of ICH culture, it is necessary to strengthen the standardized construction of databases and to propose scientific measures that ensure complete database functions, so as to provide a good supporting carrier for the inheritance and protection of ICH culture. Big data technology is a new technological carrier with distinctive characteristics, notably in data volume and modality; it is a new vehicle for information collection, storage and delivery, and is strongly information-based and intelligent [32-33].

Taking the ICH digital resources of news media and official media as research samples, this paper incorporates a CNN into the BERT-BiLSTM-CRF model to construct the BERT-CNN-BiLSTM-CRF basic information recognition model for ICH. BERT converts the text into rich word vector representations, and the added CNN identifies and emphasizes the key information near the target words. The reliability of the proposed model is examined by comparing the prediction performance of different models on each label of the ICH data. On this basis, the metadata of ICH digital resources are obtained. The metadata elements and qualifiers are mapped to create the knowledge ontology of ICH digital resources. The ontology is then mapped to the nodes, labels, relationships, etc. of the knowledge graph, completing the transformation of ICH digital resources from metadata to a semantic knowledge graph.

2 Characteristics of the distribution of types of digital resources of ICH

Cultural heritage is a valuable asset left to mankind by history, and can be divided into tangible and intangible cultural heritage in terms of the form in which it exists, one being tangible cultural relics of historical value, and the other being intangible historical and cultural customs that have been handed down. Both types of cultural heritage exist widely and are of great significance for research. In the practice of ICH transmission, the use of news media to publicize and report responds to the demand of ICH living heritage, leaving a huge amount of ICH digital media resources, and the study of the use of these resources for the protection and inheritance of ICH has a positive role in promoting. However, at present, there is no specific definition of ICH digital resources in the academic world, which leads to the lack of accurate understanding of such resources, and the development and utilization of such resources are also limited. Therefore, this section will focus on the conceptual interpretation of “ICH digital media resources” with the help of the previous conceptualization of news media and official media, in order to help further research on the organization of ICH digital resources.

2.1 Characteristics of the distribution of the types of digital resources of ICH in China

The “ten-point method” has a scientific and reasonable category structure, with high similarity within each category and low overlap between categories, which makes it specialized and systematic. This paper therefore adopts the “ten-point method” to compile statistics on the distribution of national-level ICH items in China. As an ancient Eastern civilization and one of the countries with the longest history in the world, China has 5,000 years of history and culture that provide rich soil for the generation and development of Chinese ICH digital resources. The total number of ICH items listed for key protection at the national level has reached 3610, and the distribution of their types is shown in Figure 1. The classification system comprises ten categories: 1. Folk literature. 2. Traditional music. 3. Traditional dance. 4. Traditional drama. 5. Quyi (traditional Chinese performing arts). 6. Traditional sports, amusement and acrobatics. 7. Traditional art. 8. Traditional skills. 9. Traditional medicine. 10. Folklore.

The distribution of China’s ICH items is clearly unbalanced: a large number of items are concentrated in a few head categories, while some tail categories contain far fewer. Traditional skills, folklore, traditional drama and traditional music account for 17.42%, 13.63%, 13.10% and 11.94% of national-level ICH items respectively, and together these four categories account for 56.09% of the total. Quyi, traditional medicine, and traditional sports, amusement and acrobatics account for 5.90%, 5.04% and 4.60% respectively, a combined share of only 15.54%.
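The combined shares quoted above can be checked with a few lines of arithmetic. The sketch below only re-adds the percentages reported in the text; the dictionary and function names are illustrative.

```python
# Category shares (percent of national-level ICH items) as reported in the text.
head_shares = {
    "Traditional skills": 17.42,
    "Folklore": 13.63,
    "Traditional drama": 13.10,
    "Traditional music": 11.94,
}
tail_shares = {
    "Quyi": 5.90,
    "Traditional medicine": 5.04,
    "Traditional sports, amusement and acrobatics": 4.60,
}


def combined_share(shares):
    """Sum percentage shares and round to two decimals."""
    return round(sum(shares.values()), 2)


head_total = combined_share(head_shares)
tail_total = combined_share(tail_shares)
print(head_total, tail_total)
```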

2.2 Characteristics of the spatial distribution of ICH resources

ArcGIS software was used to visualize the spatial distribution of the number of ICH items in China. The results are shown in Figure 2, where a darker color indicates more ICH items. As can be seen from the figure, the spatial distribution of national-level representative ICH items in China is dense in the east and west, moderate in the central region and sparse in the northeast. The cultural space of national-level ICH items is mainly distributed in the Xinjiang Uygur Autonomous Region (XUAR), the southeastern coastal provinces, and the central provinces of Sichuan, Guizhou and Yunnan. The Xinjiang Uygur Autonomous Region has 139 national-level representative ICH items, nearly 4% of the national total. The six eastern coastal provinces of Zhejiang, Hebei, Shandong, Jiangsu, Guangdong and Fujian have a total of 1048 national-level ICH items, nearly 30% of the national total. Among the other provinces, only Sichuan, Guizhou, Yunnan, Hubei, Hunan, Henan and Shanxi have more than 100 national-level representative ICH items each, while the remaining provinces have fewer than 100. The three northeastern provinces of Jilin, Liaoning and Heilongjiang together have only just over 160 national-level representative ICH items.

3 Ideas of constructing knowledge graph of ICH digital resources

3.1 Conceptual framework for knowledge graph construction

The knowledge graph mainly reflects the relationships between things. In terms of hierarchical architecture, it consists of a data layer and a schema layer. The data layer can be regarded as the data source of the knowledge graph: “entity-relation-entity” triples describing facts and “entity-attribute-value” triples describing concepts are its basic elements, and they map to the nodes and edges of the graph. The schema layer is the foundation for semantic research on the knowledge graph; it consists of a conceptual model and logical rules that constrain and standardize the data layer. The ontology, which serves as the “metadata” of the graph data, usually plays the role of concept definition in the schema layer, while the data layer holds the corresponding instances of the ontology; the schema layer is also the foundation of knowledge reasoning in the knowledge graph. A knowledge graph can have only a data layer without a schema layer, but such a graph has no semantic features and cannot be studied in depth. Corresponding to this hierarchical structure, there are two main methods of knowledge graph construction, top-down and bottom-up, which are used for different purposes [34]. The idea of knowledge graph construction for ICH digital resources is shown in Figure 3.
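As a minimal sketch of how the two kinds of data-layer triples map to nodes and edges, the following Python keeps node attributes in a dictionary and relations in an edge list. The entity names, relation names and the `build_graph` helper are illustrative assumptions, not data or code from this paper.

```python
# Illustrative triples only; the names below are assumptions for the example.
relation_triples = [  # "entity-relation-entity"
    ("Xuan paper making", "distributed_in", "Anhui"),
    ("Xuan paper making", "has_type", "Traditional skills"),
]
attribute_triples = [  # "entity-attribute-value"
    ("Xuan paper making", "level", "national"),
]


def build_graph(relation_triples, attribute_triples):
    """Map triples to nodes (with attribute dicts) and directed, labeled edges."""
    nodes = {}
    edges = []
    for head, relation, tail in relation_triples:
        # Both ends of a relation become nodes; the relation becomes an edge.
        nodes.setdefault(head, {})
        nodes.setdefault(tail, {})
        edges.append((head, relation, tail))
    for entity, attribute, value in attribute_triples:
        # Attribute triples are stored as properties on the entity node.
        nodes.setdefault(entity, {})[attribute] = value
    return nodes, edges


nodes, edges = build_graph(relation_triples, attribute_triples)
print(nodes)
print(edges)
```

Storing attribute values on the node rather than as separate nodes mirrors the property-graph model used by tools such as Neo4j; an RDF store would keep them as literal-valued triples instead.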

3.2 BERT-CNN-BiLSTM-CRF modeling

The BERT-CNN-BiLSTM-CRF model takes all the text blocks (Blocks) of an article as input. The data is processed block by block: BERT encodes the text within each block into word vectors, and convolutional learning then performs an initial feature extraction to obtain the text vector Vecmterit [35]. At the same time, the effective attribute features of each Block element are extracted and One-Hot encoded, and a fully connected network amplifies them to yield feature-enhanced Block attribute vectors. The feature vectors of the whole article are then passed through a BiLSTM network, whose bidirectional learning captures the sequence features between Blocks, i.e., the contextual relationships among the text blocks within the article. Finally, after the sequence information is computed by the CRF layer, SoftMax yields the final sequence of text block categories for the article, which classifies the blocks and provides the metadata required for constructing the knowledge graph of ICH resources.

The structure of the BERT-CNN-BiLSTM-CRF (BCBLC) model proposed in this paper is shown in Figure 4.

For information extraction, feature learning and utilization are important, so more, and more effective, features need to be discovered and used for model training. At the same time, with the same features, the choice of model and the way it is applied strongly affect the final classification results. BERT is one of the best-performing pre-trained models of recent years and has been widely used because of its powerful representation ability. Therefore, the text is embedded by the BERT pre-trained model to obtain a globally meaningful vector representation, so that the embedding vectors express the semantic relationships implied by their distances. The text embedding vectors are then used as input to the CNN layer. The CNN learns more morphological features of the text and adjusts the parameter weights so that the more effective features play a larger role, while also reducing the number of parameters and thus the computational cost of the model without sacrificing its effect.

The attribute features of each Block are quantified in four ways:

1) Direct quantization: features such as text length, total number of characters, page size and block serial number are represented directly by the corresponding numbers.

2) Indirect quantization: for features such as dates, email addresses and hyperlinks, the ratio of the number of characters of the given text type, obtained by regular expression matching, to the total number of characters is used:

(1) $\mathit{is\_email} = \frac{\mathit{email\_char\_num}}{\mathit{char\_length}}$

(2) $\mathit{is\_date} = \frac{\mathit{date\_char\_num}}{\mathit{char\_length}}$

(3) $\mathit{is\_link} = \frac{\mathit{link\_char\_num}}{\mathit{char\_length}}$

Here email_char_num, date_char_num and link_char_num denote the number of characters belonging to email addresses, dates and hyperlinks in the text respectively, and char_length denotes the total number of characters in the text.

3) Multi-dimensional quantization: information such as label path, color and coordinates is represented by transforming it into multi-dimensional vectors.

4) Transformational quantization: for features such as labels, the text needs to be encoded and transformed into vectors of appropriate dimensions.
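The ratio features of Eqs. (1)-(3) can be sketched with Python's standard `re` module. The three regular expressions below are simplified assumptions chosen for illustration (the paper does not list its patterns), and `char_ratio` and `ratio_features` are hypothetical helper names.

```python
import re

# Simplified, assumed patterns; real extraction rules would likely be stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
DATE_RE = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")
LINK_RE = re.compile(r"https?://[^\s]+")


def char_ratio(pattern, text):
    """Share of characters in `text` covered by matches of `pattern`."""
    if not text:
        return 0.0
    matched = sum(len(m.group(0)) for m in pattern.finditer(text))
    return matched / len(text)


def ratio_features(text):
    """Compute is_email, is_date and is_link for one text block."""
    return {
        "is_email": char_ratio(EMAIL_RE, text),
        "is_date": char_ratio(DATE_RE, text),
        "is_link": char_ratio(LINK_RE, text),
    }


date_block = ratio_features("2025-04-20")
mail_block = ratio_features("mail a@b.cn")
print(date_block)
print(mail_block)
```

A block that consists entirely of a date gets `is_date` equal to 1, while mixed text yields a fractional value, which is what lets the downstream network treat these attributes as soft indicators.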

The attribute features of the text block are then encoded separately and the encoded vectors of each part are spliced together. At this point the resulting vector expresses very little information, so a fully connected layer is used to enhance its features and expand its dimension so that it can represent implicit intrinsic connections.

Current machine learning-based information extraction algorithms mainly focus on distinguishing textual from non-textual information. The remaining algorithms, designed for multi-class tasks, use the text sequence features inside content blocks but pay little attention to the sequence features between content blocks. For dissertation web pages, however, the layout of information elements shares many common sequence features, such as the general top-to-bottom order of title, author, abstract and keywords. BiLSTM is a bidirectional LSTM model that learns strong sequence features by combining a forward and a backward LSTM. Using BiLSTM to extract the sequence features of each Block achieves the learning of contextual sequence features.

Next, through the conditional probability calculation of CRF layer, the connection of the distant parts of Block sequences can be learned.
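To illustrate how a CRF layer uses transition scores to link positions in a sequence, the sketch below decodes the best tag sequence with the Viterbi algorithm, the standard decoder for linear-chain CRFs. The paper does not state which decoding procedure it uses, so this is an assumption, and the tags and scores are made-up toy values.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence for one sequence.

    emissions[t][tag]  : score of assigning `tag` at position t
    transitions[a][b]  : score of moving from tag a to tag b
    """
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            # Pick the previous tag that maximizes the running path score.
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy scores (assumed): position 1 slightly prefers "O" on its own,
# but the transition scores make "NAME" the better choice in context.
tags = ["NAME", "O"]
emissions = [
    {"NAME": 2.0, "O": 0.5},
    {"NAME": 0.9, "O": 1.0},
    {"NAME": 0.1, "O": 2.0},
]
transitions = {
    "NAME": {"NAME": 1.0, "O": 0.0},
    "O": {"NAME": -1.0, "O": 0.5},
}
decoded = viterbi(emissions, transitions, tags)
print(decoded)
```

In this toy case a per-position argmax would label the middle position "O", whereas the sequence-level decoding keeps it as "NAME", which is the kind of context-dependent correction the CRF layer contributes.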

Finally, through the probability distribution statistics of SoftMax, the text block category sequence with the largest probability is obtained to realize the classification and obtain the metadata needed for the construction of the knowledge graph of ICH resources.

3.3 Two-stage knowledge graph construction

The term metadata occupies an extremely important position in the field of modern information science, and the discussion of its definition and role involves the intersection of several disciplinary branches or disciplinary fields, such as data management, information organization, and electronic records management [36]. Metadata can be defined as “data about data”, which is a kind of structured information describing information resources (e.g., documents, images, datasets, etc.) to support activities such as information retrieval, information organization, knowledge management, and knowledge discovery. Metadata is not only a simple “accessory” to data sets, but also constitutes the core of information resource management and utilization.

Based on the above metadata, this subsection begins the ontology construction work. The main process of ontology modeling is as follows. First, a set of data-specific terminology is constructed. Second, the class hierarchy is determined from the constructed glossary, and the classes and attributes are defined in turn [37]. This is the core step of ontology model design. After the classes, data attributes and relationship attributes have been defined, the ontology is enriched and revised through an ontology checking process to make the model more rigorous and better optimized, and formal coding is then used to complete the conversion of the ontology model. Finally, the ontology is instantiated and displayed.

The schema layer organization of the knowledge graph of ICH digital resources is strictly in accordance with the steps and procedures of the schema layer organization in the design of the knowledge graph framework, and it completes the resolution of the ontology of ICH digital resources, the mapping of the ontology of ICH digital resources to the rules of the graph database, and the further defining of the instance relationship hierarchy and the content in sequence. The schema layer of the knowledge graph is constructed with RDF triples, with RDF triples as the basic unit of construction. Two different forms of RDF triples, “entity-attribute-attribute value” and “entity-relationship-entity”, are finally combined to form a complex RDF graph.

4 Basic information recognition model for ICH items and its application

BERT-CNN-BiLSTM-CRF is a basic information recognition model for ICH items designed to facilitate the integration, management and exchange of cultural heritage information. Two sets of comparison experiments are set up to illustrate the improvement achieved by the model. In the first, the BERT-CNN-BiLSTM-CRF model is compared with RNN-CRF, BERT-RNN-CRF, BERT-LSTM-CRF and BERT-BiLSTM-CRF, including the performance of the different models in predicting the individual labels of ICH data. In the second, BERT-CNN-BiLSTM-CRF models trained with different learning rates are compared. The experiments were implemented with the PyTorch framework, and a Tesla T4 GPU was used for training the models.
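For reference, precision (P), recall (R) and F1 can be computed as in the sketch below. It assumes entity-level matching on (start, end, label) spans; the paper does not state whether its scores are entity-level or token-level, or how they are averaged, so this is only one plausible reading, and the spans are toy data.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level P, R and F1 over sets of (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positive = len(gold & predicted)
    precision = true_positive / len(predicted) if predicted else 0.0
    recall = true_positive / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy spans (assumed): one label error and one spurious prediction.
gold = {(0, 3, "NAME"), (5, 6, "POV"), (8, 9, "LEVEL")}
pred = {(0, 3, "NAME"), (5, 6, "LOC"), (8, 9, "LEVEL"), (10, 11, "TIME")}
p, r, f1 = precision_recall_f1(gold, pred)
print(round(p, 3), round(r, 3), round(f1, 3))
```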

The number of labeled tags used for training and testing is shown in Table 1. In this set of experiments, each model is used to predict the Chinese ICH data, and the models are compared on how well they extract the different types of basic ICH information. NAME is the name of the ICH item, TYPE is the type of the ICH item, POV is the provincial-level administrative area where the ICH item is located, LOC is the prefecture-level administrative area where the ICH item is located, TIME is the time when the ICH item was included in the list, BATCH is the batch in which the ICH item was included, LEVEL is the level at which the ICH item is listed, and Avg is the average performance over the above types of information extraction.

Table 1. Number of labeled tags for training and testing

Tag	Meaning	Train	Test	Total
NAME	Name of the ICH item	15682	5512	21194
TYPE	Type of the ICH item	9124	3201	12236
POV	Provincial-level administrative area	9854	4025	13748
LOC	Prefecture-level administrative area	8847	3022	11247
TIME	Time of inclusion	11795	3945	15784
BATCH	Batch of inclusion	7014	2713	9847
LEVEL	Level	12452	4346	17432
O	Other (non-entity) text	53219	8569	31128
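To show how tag sequences over these labels translate into extracted fields, the sketch below collects entity spans from a tagged token sequence. It assumes a BIO scheme (B- marks the first token of an entity, I- a continuation, O everything else); the paper only lists the label names, so the scheme, the example sentence and the `extract_entities` helper are assumptions.

```python
def extract_entities(tokens, tags):
    """Collect (label, text) pairs from a BIO-tagged token sequence."""
    entities = []
    current_tokens, current_label = [], None

    def flush():
        # Store the entity collected so far, if any.
        if current_tokens:
            entities.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B":
            flush()
            current_tokens, current_label = [token], label
        elif prefix == "I" and label == current_label:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag closes the current entity.
            flush()
            current_tokens, current_label = [], None
    flush()
    return entities


tokens = ["Xuan", "paper", "making", "is", "a", "national", "item", "in", "Anhui"]
tags = ["B-NAME", "I-NAME", "I-NAME", "O", "O", "B-LEVEL", "O", "O", "B-POV"]
entities = extract_entities(tokens, tags)
print(entities)
```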

A comparison of the overall performance of each model is shown in Table 2. The BERT-CNN-BiLSTM-CRF model improves the F1 value by 0.187 over the RNN-CRF model, by 0.077 over the BERT-RNN-CRF model, by 0.075 over the BERT-LSTM-CRF model, and by 0.028 over the BERT-BiLSTM-CRF model.

Table 2. Comparison of overall model performance

Model	P	R	F1
RNN-CRF	0.711	0.784	0.735
BERT-RNN-CRF	0.812	0.862	0.845
BERT-LSTM-CRF	0.864	0.884	0.847
BERT-BiLSTM-CRF	0.887	0.886	0.894
BERT-CNN-BiLSTM-CRF(This model)	0.901	0.925	0.922
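The F1 improvements quoted in the text follow directly from the F1 column of Table 2; the short sketch below recomputes them. Only the variable names are introduced here, the values are those reported in the table.

```python
# F1 values as reported in Table 2.
f1_scores = {
    "RNN-CRF": 0.735,
    "BERT-RNN-CRF": 0.845,
    "BERT-LSTM-CRF": 0.847,
    "BERT-BiLSTM-CRF": 0.894,
    "BERT-CNN-BiLSTM-CRF": 0.922,
}
proposed = "BERT-CNN-BiLSTM-CRF"

# Absolute F1 gain of the proposed model over each baseline.
gains = {
    model: round(f1_scores[proposed] - f1, 3)
    for model, f1 in f1_scores.items()
    if model != proposed
}
for model, gain in gains.items():
    print(f"{model}: +{gain:.3f}")
```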

The relationship between the number of training rounds and the predicted F1 value of each model is shown in Fig. 5, from which it can be seen that the F1 value of the BERT-CNN-BiLSTM-CRF model is higher than that of the other models in all rounds, and the F1 value of each model tends to stabilize in the second round of training, and the BERT-CNN-BiLSTM-CRF model is still the best among the models in the tenth round. In summary, the BERT-CNN-BiLSTM-CRF model has a better performance in the task of extracting basic information of Chinese ICH items, so the BERT-CNN-BiLSTM-CRF model is chosen as the model for extracting basic information of Chinese ICH items.

5 The Construction of Knowledge Graph of ICH Digital Resources

Metadata are obtained with the above basic information recognition model for ICH items. Taking ICH digital resources as the research theme, this paper aims to construct a semantic knowledge graph of ICH digital resources through a two-stage mapping, from ICH metadata to ontology and from ontology to knowledge graph, on the basis of the core elements and qualifiers of ICH metadata, so as to exploit ICH digital resources and uncover potential knowledge associations. The data properties of ICH digital resources are obtained by mapping metadata elements or qualifiers, and the metadata are divided into two parts, “intangible cultural heritage items” and “intangible cultural heritage digital resources”, as shown in Table 3.

Table 3. Partial mapping between metadata elements or qualifiers and data properties

Metadata element or qualifier	Data Properties	Domain	Range
Name	Name	ICH item	xsd:string
Aliases	Aliases	ICH item	xsd:string
Subject matter	Subject matter	ICH item	xsd:string
Key words	Key words	ICH item	xsd:string
Peoples	Peoples	ICH item	xsd:string
Historical origin	Historical origin	ICH item	xsd:string
Basic content	Basic content	ICH item	xsd:string
Lineage	Lineage	ICH item	xsd:string
Basic feature	Basic feature	ICH item	xsd:string
Main value	Main value	ICH item	xsd:string
Endangered condition	Endangered condition	ICH item	xsd:string
Protective content	Protective content	ICH item	xsd:string
Measures taken	Measures taken	ICH item	xsd:string
Grade	Grade	ICH item	xsd:string
Batch	Batch	ICH item	xsd:string
Marker	Marker	ICH digital resource	xsd:string
Identification	Identification	ICH digital resource	xsd:string
Coding rule	Coding rule	ICH digital resource	xsd:string
Coding pattern	Coding pattern	ICH digital resource	xsd:string
Language	Language	ICH digital resource	xsd:string
Copyright specification	Copyright specification	ICH digital resource	xsd:string
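The first mapping stage, from a metadata element to an ontology data property with a domain and an `xsd:string` range, can be sketched by emitting OWL declarations in Turtle syntax. The `ich:` prefix, the class local names `ICHItem` and `ICHDigitalResource`, and the helper functions are hypothetical; the paper builds its ontology in Protégé and does not publish identifiers.

```python
def camel(text):
    """Turn 'Historical origin' into a lowerCamelCase local name."""
    parts = text.split()
    return parts[0].lower() + "".join(p.capitalize() for p in parts[1:])


def to_turtle(element, domain_class, prefix="ich"):
    """Declare one metadata element as an OWL datatype property in Turtle."""
    return (
        f"{prefix}:{camel(element)} a owl:DatatypeProperty ;\n"
        f"    rdfs:label \"{element}\" ;\n"
        f"    rdfs:domain {prefix}:{domain_class} ;\n"
        f"    rdfs:range xsd:string ."
    )


# A few rows of the mapping table; class names are assumed identifiers.
mapping = [
    ("Name", "ICHItem"),
    ("Historical origin", "ICHItem"),
    ("Coding rule", "ICHDigitalResource"),
]
declarations = [to_turtle(element, domain) for element, domain in mapping]
print("\n\n".join(declarations))
```

A complete Turtle file would also need `@prefix` declarations for `ich`, `owl`, `rdfs` and `xsd`, which are omitted here for brevity.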

The object attributes of ICH digital resources are the relationships between the seven classes. The main relationships between the seven classes were preliminarily determined when the core classes of the ontology were defined, and they can be further subdivided and optimized. For example, the relationship between “intangible cultural heritage items” and “people” is that of related persons, whose second-level sub-categories are “inheritor”, “declarant”, “researcher”, “protector” and “creator”, so the relationship can be decomposed into five items: “inheritor”, “declarant”, “researcher”, “protector” and “creator”. By analogy, the relationship between “intangible cultural heritage items” and “organizational structure” is that of associated institutions, which can be decomposed into five items: “inheritance base”, “academic base”, “government management department”, “cultural institution” and “cultural ecological area”. The relationship between “intangible cultural heritage digital resources” and “time” can be decomposed into four items: “created in”, “reviewed by”, “updated in” and “modified in”. Table 4 shows the ontology object attributes of ICH digital resources; a total of 28 object attributes are defined.

Table 4. Object properties of the ICH digital resource ontology

Object Properties	Domain	Range
Related item	ICH item	ICH item
Distributed in	ICH item	Region
Derived from	ICH item	Source of origin
Inheritor	ICH item	Inheritor
Declarant	ICH item	Declarant
Researcher	ICH item	Researcher
Protector	ICH item	Protector
Literature resource	ICH item	Literature resource
Intuitive object	ICH item	Intuitive object
Network resource	ICH item	Network resource
Inheritance base	ICH item	Inheritance base
Academic base	ICH item	Academic base
Government management department	ICH item	Government management department
Cultural institution	ICH item	Cultural institution
Cultural ecological area	ICH item	Cultural ecological area
Created in	ICH digital resource	ICH digital resource time
Reviewed by	ICH digital resource	ICH digital resource time
Updated in	ICH digital resource	ICH digital resource time
Modified in	ICH digital resource	ICH digital resource time
Record	ICH digital resource	ICH digital resource
Creator	ICH digital resource	ICH digital resource creator
Master	Inheritor	Inheritor
Create	Things	Time, place
Responsible organization	Things	Organization
Responsible person	Things	People
Established in	Organization	Organization time
Holds office in	People	Organization
Native place	People	Place
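Domain and range declarations such as those in Table 4 can be used to screen candidate triples before they enter the graph. The sketch below treats them as simple type constraints, which is a simplification (in OWL, domain and range are inference axioms rather than validation rules). The property and class identifiers are hypothetical names for a few rows of the table.

```python
# A few object properties as (domain class, range class); names are assumed.
OBJECT_PROPERTIES = {
    "inheritor": ("ICHItem", "Inheritor"),
    "distributedIn": ("ICHItem", "Region"),
    "createdIn": ("ICHDigitalResource", "Time"),
    "master": ("Inheritor", "Inheritor"),
}


def check_triple(subject_class, prop, object_class):
    """Return True if the triple respects the property's domain and range."""
    if prop not in OBJECT_PROPERTIES:
        return False
    domain, range_ = OBJECT_PROPERTIES[prop]
    return subject_class == domain and object_class == range_


valid = check_triple("ICHItem", "inheritor", "Inheritor")
wrong_domain = check_triple("ICHDigitalResource", "inheritor", "Inheritor")
unknown_property = check_triple("ICHItem", "ownedBy", "Organization")
print(valid, wrong_domain, unknown_property)
```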

Through the above mapping analysis, a two-stage mapping from metadata to ontology and from ontology to knowledge graph is designed for constructing the knowledge graph of ICH digital resources. The ontology construction tool used in this paper is Protégé 5.6.1, the knowledge graph construction tool is Neo4j Community Edition (neo4j-community-5.15.0), and the JDK development environment is jdk-21.0.1. Visualization is one of the basic applications of a knowledge graph: directed graphs are used to intuitively understand and analyze the relationships, entities and attributes in the graph. By visualizing the triple data in the knowledge graph, users can clearly see how nodes are related, so that the isolated links of each ICH digital resource become systematic knowledge and users can grasp ICH digital resources as a whole. The constructed knowledge graph is shown in Figure 6, which presents information on the digital resources of Xuan paper making techniques. Seven types of entity nodes (ICH projects, ICH digital resources, organizations, things, people, places and time) are shown in different colors; users can view a node's attributes by clicking on it and can expand nodes on demand. For example, a user can retrieve the information related to the researcher Pan Jixing, including the ICH projects he has studied and the related literature, his working organization and his place of origin.
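The second mapping stage, from ontology instances to Neo4j nodes, labels and relationships, might look like the following sketch, which builds a parameterized Cypher `MERGE` statement for one triple. The labels `ICHItem` and `Person`, the relationship type `RESEARCHER` and the `cypher_for_triple` helper are illustrative assumptions; the paper does not list its Cypher statements. Labels and relationship types are interpolated into the query text because Cypher does not accept them as parameters, so they should come only from the trusted ontology, never from user input.

```python
def cypher_for_triple(head, head_label, relation, tail, tail_label):
    """Build a Cypher MERGE statement and its parameters for one triple."""
    query = (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{relation}]->(t)"
    )
    return query, {"head": head, "tail": tail}


# Example triple based on the researcher mentioned in the text.
query, params = cypher_for_triple(
    "Xuan paper making techniques", "ICHItem", "RESEARCHER", "Pan Jixing", "Person"
)
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` keeps the import idempotent: loading the same triple twice does not duplicate nodes or relationships.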

6 Conclusion

In this paper, the BERT-CNN-BiLSTM-CRF model, ontology, metadata, knowledge graph and other technologies are applied to the information recognition and knowledge organization of ICH digital resources. The specific research results are summarized as follows.

1) Analysis of the spatial distribution of China’s ICH items with ArcGIS software shows that they are dense in the east and west, moderate in the central region and sparse in the northeast. They are mainly distributed in the Xinjiang Uygur Autonomous Region, Guangdong Province, Fujian Province, and the central provinces of Sichuan, Guizhou and Yunnan.

2) Compared with the other models, the BERT-CNN-BiLSTM-CRF ICH information recognition model, which introduces a CNN, achieves higher precision, recall and F1 values and a better recognition effect.

3) The knowledge graph of ICH digital resources is constructed from existing metadata of ICH digital resources through a two-stage mapping, from metadata to ontology and from ontology to knowledge graph. This method effectively improves the efficiency of ontology and knowledge graph construction and realizes the integrated construction of the metadata, ontology and knowledge graph of ICH digital resources.

Xinxin Xu

Haoran Xu

Published Online: Sep 29, 2025

Received: Jan 15, 2025

Accepted: Apr 20, 2025

DOI: https://doi.org/10.2478/amns-2025-1123

Keywords: Metadata, knowledge graph, BERT-CNN-BiLSTM-CRF model, Intangible Cultural Heritage, Ontology

© 2025 Xinxin Xu and Haoran Xu, published by Sciendo.

This work is licensed under the Creative Commons Attribution 4.0 International License.
