Information Processing and Knowledge Mapping Construction of Art Literature Based on Big Data Analysis--Taking Ming and Qing Art History as an Example
Published: 21 Mar 2025
Received: 13 Oct 2024
Accepted: 10 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0650
© 2025 Ying Zhou et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Art library materials are one of the important curriculum resources in art teaching. Teaching resources for art majors include teaching equipment and devices, art galleries, libraries, museums, natural environment resources, information technology curriculum resources, and art library materials outside the school [1-2]. Art library materials have always been, and will remain, one of the important curriculum resources in art teaching. Among the material resources necessary for art teaching, off-campus libraries, museums, and art galleries supplement teaching rather than serve as regular teaching resources, and natural environment resources mostly serve students’ sketching [3-4]. Library materials, by contrast, including reference books for teachers and students, art magazines, art education magazines, slides, CD-ROMs and so on, are resources in long-term use: teachers draw on them to prepare and deliver lessons, and students collect and consult them. They have therefore long been an important support for the development of art teaching. The construction of art library and information materials, as one category of overall library and information resource construction, plays an important role in art teaching and scientific research in colleges and universities. In art colleges and universities, it mainly relies on the institution library, supplemented by the school library [5-6].
In the past, specialized art colleges and universities focused mainly on purchasing, collecting, and subscribing to paper publications, such as large numbers of picture books and art periodicals. With the arrival of the information age, the growing volume of electronic documents and the continuous development of networked information resources for art books inevitably raise new issues for the construction of library materials in art colleges and universities [7-9].
Fine art is a visual art that stimulates creative inspiration through display. With the arrival of the information society, modern art has been further elevated, and promoting the modernization, socialization, and quality of art education is increasingly becoming a major concern for higher art colleges and universities [10-11].
At present, art-related multimedia information resources on the Internet are very rich. They break through the limits that personnel, geography, time, and space impose on traditional art education resources and provide a large amount of comprehensive, free, and open information that can enrich students’ knowledge. However, these resources are in a “discrete” state, which makes them difficult to retrieve and utilize [12-13]. In the information age, how to build and manage the libraries of art colleges and universities so that they adapt to the development of the times and better serve teaching and research has become a problem worthy of serious consideration [14-15].
In this paper, knowledge mapping and analysis of research literature on the development history of Ming and Qing art published during 2019-2023 are carried out by means of literature co-occurrence analysis and co-occurrence clustering analysis. The overall situation of journal articles in this field is analyzed through scientometric visualization in terms of the number of articles published, their distribution over time, and the co-occurrence and clustering of research keywords. By discussing the development of Ming and Qing art history research, the paper provides a reference for further improving and promoting research on Ming and Qing art history.
Word frequency analysis is one of the most representative analysis methods in bibliometrics and is mainly applied to the study of hotspots and development trajectories of disciplines. It is based on Zipf’s law: changes in the frequency of keywords or subject words are used to determine research hotspots and their trends. Keywords, as the core content of a piece of literature, condense and summarize the main idea of a paper and the views of its authors, so the more frequently a keyword appears in the literature of a field, the better it represents that field’s research hotspots. The process of word frequency analysis includes data retrieval, cleaning and processing, vocabulary extraction, statistical analysis, and other stages. The main analysis software includes Bibexcel, CiteSpace, and SATI. These tools differ in how they process and analyze data, and each has advantages and disadvantages, so statistical analysis should combine several tools and cross-reference their results to reach sound conclusions.
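The counting stage of word frequency analysis can be illustrated with a minimal Python sketch. It is not the workflow of Bibexcel, CiteSpace, or SATI; the sample records and the function name `keyword_frequencies` are hypothetical, and the sketch assumes that each paper's keywords are already available as a list and that a keyword should count once per paper.

```python
from collections import Counter

# Hypothetical sample data: each inner list holds the keywords of one paper.
records = [
    ["Ming and Qing art", "landscape painting", "knowledge mapping"],
    ["Ming and Qing art", "folk art"],
    ["landscape painting", "Ming and Qing art", "big data"],
]

def keyword_frequencies(records):
    """Count the number of papers in which each keyword appears."""
    counts = Counter()
    for keywords in records:
        # Cleaning step: trim whitespace, lower-case, and de-duplicate,
        # so that a keyword is counted at most once per paper.
        counts.update({kw.strip().lower() for kw in keywords})
    return counts

freq = keyword_frequencies(records)
top = freq.most_common(2)  # the two most frequent keywords with their counts
```

Sorting the counts with `most_common` gives a simple ranking of candidate hotspot keywords; real studies would add synonym merging and stop-word removal during cleaning.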
Citation analysis is the basis of literature co-citation and coupling analysis and is mainly used to study the research frontiers of a discipline or field. Scientific research stands on the shoulders of predecessors, whose results are the basis of subsequent research. The same holds for the writing of papers: references indicate how an article draws on existing research, reflecting the continuity and inheritance of scientific knowledge. Citations are therefore of important value for finding correlations between disciplines and for summarizing the dynamics and evolution of disciplinary development. In scientometrics, citation analysis is a bibliometric method that quantitatively analyzes large amounts of citation data. It uses mathematical and statistical methods together with logical methods such as comparison, induction, abstraction, and generalization to analyze the citation behavior of journals, papers, authors, and other objects, in order to reveal quantitative characteristics and internal laws and to evaluate and predict the development trend of science.
Multivariate statistical analysis is mainly performed in SPSS software and includes principal component analysis, cluster analysis, and multidimensional scaling analysis. Factor analysis examines the correlation between variables by calculating their correlation coefficient matrix, in order to extract a few uncorrelated random variables that stand in for all the original variables and describe the correlations among them; to some extent it is therefore regarded as an extension and development of principal component analysis. Cluster analysis classifies a group of objects based on the characteristics of their elements, so that elements within a group have similar characteristics and differ from elements outside the group. It rests on data similarity: the classification criteria are determined from the similarities and differences between elements. Multidimensional scaling analysis is a technique that converts data in a high-dimensional space into data in a low-dimensional space through a nonlinear transformation, such that the transformed data still approximately preserve the geometric relationships of the original data.
Social networks, a concept that originated in anthropologists’ exploration of interpersonal relationships in complex communities, are collections of social actors and their relationships. Social network theory assumes that actors are situated within social networks and that characteristics such as the structure and relationships of the network have a significant impact on them. Social network analysis is a sociological research method that maps and measures the relationships and flows among people, groups, organizations, and other entities that process information and knowledge. A social network is formally represented by points and lines: scholars of social network research generally view society as a collection of points, which stand for social actors, and lines between the points, which stand for the relationships between those actors.
The co-word analysis method is a content analysis technique. It counts the number of times each pair of words in a group appears in the same document and performs cluster analysis on these words on that basis, thereby reflecting how closely the words are related. It then analyzes the structural changes of the disciplines and themes these words represent, identifies the research hotspots of the discipline, and examines, both horizontally and vertically, the dynamic development and static structure of the disciplinary field [16-17].
Knowledge graphs are generally constructed by combining co-occurrence analysis and word frequency analysis. Word frequency analysis is one of the traditional methods of bibliometrics and rests on the observation that the use and frequency of different words in the literature follow certain regularities.
Principle of co-word analysis: co-word analysis uses the co-occurrence of word pairs or noun phrases in a literature set as a means of determining the relationship between topics in the discipline represented by that literature set.
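The pairwise counting behind co-word analysis can be sketched in a few lines of Python. This is an illustrative sketch only: the sample records and the function name `cooccurrence_counts` are hypothetical, and it assumes that co-occurrence is counted once per document for each unordered keyword pair.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(records):
    """Count, for every unordered keyword pair, the documents containing both."""
    pair_counts = Counter()
    for keywords in records:
        # Sort the de-duplicated keywords so each pair has one canonical order.
        unique = sorted(set(keywords))
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Hypothetical sample data: keyword lists of three documents.
records = [
    ["folk art", "landscape painting", "ming and qing art"],
    ["folk art", "ming and qing art"],
    ["big data", "landscape painting", "ming and qing art"],
]
pairs = cooccurrence_counts(records)
```

The resulting counts are the off-diagonal entries of a symmetric co-occurrence matrix, which is the usual input for clustering and network visualization.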
Co-word clustering analysis is a commonly used method in co-word analysis. It clusters on the basis of co-word analysis, taking the frequency of co-word occurrence as the object of analysis, and uses statistical clustering to simplify the intricate co-word network among numerous objects into relationships among a relatively small number of clusters, which are then represented visually [18-19].
There are two conceptual bases on which class combinations are determined in cluster analysis: first, the distance between classes; second, the distance between points. The various inter-point and inter-class distances are selected through options provided by the statistical software during calculation. For distances between classes, the intergroup distance method is used, which merges the two classes whose average distance is smallest. There are many ways of defining the inter-point distance; the most commonly used is the Euclidean distance.
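In the Euclidean approach used with co-word clustering analysis, the objects to be compared are placed in a multi-dimensional space: the clustering of two subject terms is called 2-dimensional, and the clustering of three subject terms is called 3-dimensional. For the 2-dimensional case the distance is:

$$d(A, B) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

The coordinates of subject term $A$ are $(x_1, y_1)$ and those of subject term $B$ are $(x_2, y_2)$.
The 3-dimensional formula:

$$d(A, B) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$

Generalizing to $n$ dimensions:

$$d(A, B) = \sqrt{\sum_{k=1}^{n} (a_k - b_k)^2}$$

In the formula, $a_k$ and $b_k$ are the $k$-th coordinates of subject terms $A$ and $B$, respectively.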
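The Euclidean distance described above can be computed for any number of dimensions with a short Python function. This is a minimal sketch; the function name `euclidean_distance` and the sample coordinates are illustrative assumptions rather than values from the study.

```python
import math

def euclidean_distance(p, q):
    """Return the Euclidean distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical coordinates for a 2-dimensional and a 3-dimensional case.
d2 = euclidean_distance((0, 0), (3, 4))
d3 = euclidean_distance((1, 2, 3), (3, 4, 4))
```

The same function covers the 2-dimensional, 3-dimensional, and general cases because it iterates over however many coordinates the points have.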
Cluster analysis is a statistical method for grouping objects. It is used to categorize things whose categories are not clear in advance, or even whose total number of categories cannot be determined beforehand.
Hierarchical clustering is also known as systematic clustering. According to the direction of the clustering process, it can be divided into two categories: the decomposition (divisive) method and the cohesion (agglomerative) method.
Decomposition method: clustering starts by considering all individuals as belonging to a large class, and then decomposes layer by layer according to the distance and similarity until each individual participating in the clustering forms a class of its own.
Cohesion method: its procedure is the opposite of the decomposition method. First, the n elements are treated as n classes; then the two closest classes are merged into a new class, giving n-1 classes; the two closest classes are again merged, giving n-2 classes; and so on, until all the elements are in one class.
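The bottom-up merging procedure can be sketched in plain Python. The sketch below assumes average linkage between classes (the mean of all pairwise distances) and uses hypothetical one-dimensional sample values; the names `average_linkage` and `agglomerate` are illustrative and do not correspond to any statistical package.

```python
from itertools import combinations

def average_linkage(c1, c2, dist):
    """Mean pairwise distance between the members of two classes."""
    return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def agglomerate(points, dist, n_clusters=1):
    """Merge the two closest classes repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]  # start with every element as its own class
    merges = []                       # record of which classes were merged, in order
    while len(clusters) > n_clusters:
        # Find the pair of classes with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], dist),
        )
        merges.append((list(clusters[i]), list(clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

def absolute_difference(a, b):
    return abs(a - b)

# Hypothetical one-dimensional data, stopped at three classes.
clusters, merges = agglomerate([1.0, 2.0, 6.0, 6.5, 10.0], absolute_difference, n_clusters=3)
```

Calling `agglomerate` with `n_clusters=1` runs the procedure to the end, and the `merges` list then records the full merge order from which a dendrogram could be drawn.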
In order to further clarify the relationships within and between clusters, the following three indicators for cluster analysis should be established:
Adhesion: a measure of the extent to which each subject term within a class cluster contributes to the clustering into clusters, expressing the extent to which each subject plays a role in the aggregation process of the cluster. For a cluster of n subject terms, where subject term
Density: a measure of the strength of the links that allow words to aggregate into a class, that is, the internal strength of the class, which indicates the ability of the class to maintain and develop itself. The density of a class cluster can be calculated in a number of ways: first count the number of co-occurrences of each pair of subject words in the class within the same document, and then calculate the mean, median, or sum of squares of these internal links to arrive at the density of the class.
Centripetal degree: a measure of the extent to which a class cluster is linked to other class clusters in the discipline. The greater the number and strength of links between a disciplinary area and other disciplinary areas, the more central that disciplinary area tends to be in the overall research effort.
The three measures of adhesion, density, and centripetal degree quantify the clusters from the inside out, providing quantitative indicators for determining cluster names, cluster composition, and cluster roles.
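Two of these indicators can be illustrated with a short Python sketch. It assumes the mean-based variant of density and, as a simplifying assumption not stated in the text, takes the centripetal degree to be the sum of co-occurrence counts between a cluster's terms and terms outside it. The term labels, counts, and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def link(pair_counts, a, b):
    """Co-occurrence count of two terms, regardless of the order they were stored in."""
    return pair_counts.get((a, b), pair_counts.get((b, a), 0))

def density(cluster, pair_counts):
    """Mean co-occurrence count over all term pairs inside the cluster."""
    links = [link(pair_counts, a, b) for a, b in combinations(cluster, 2)]
    return mean(links) if links else 0

def centripetal_degree(cluster, others, pair_counts):
    """Assumed definition: total link strength between the cluster and outside terms."""
    return sum(link(pair_counts, a, b) for a in cluster for b in others)

# Hypothetical co-occurrence counts between five terms labelled a to e.
pair_counts = {
    ("a", "b"): 4, ("a", "c"): 2, ("b", "c"): 3,
    ("a", "d"): 1, ("c", "e"): 2, ("d", "e"): 5,
}
cluster, others = ["a", "b", "c"], ["d", "e"]
cluster_density = density(cluster, pair_counts)
cluster_centrality = centripetal_degree(cluster, others, pair_counts)
```

Swapping `mean` for `statistics.median` or a sum of squares gives the other density variants mentioned in the text.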
The matrix of observed co-occurrence frequencies of keyword pairs reflects only surface appearances, because the co-occurrence frequency of two keywords is directly affected by the size of each keyword’s own frequency. In bibliometrics and scientometrics, several statistical indices are commonly used to indicate the strength of association between keywords:
1) Ochiai coefficient method (also called the Salton index). The formula is:

$$O_{ij} = \frac{c_{ij}}{\sqrt{c_i}\,\sqrt{c_j}}$$

where $c_{ij}$ is the number of documents in which keywords $i$ and $j$ co-occur, and $c_i$ and $c_j$ are the frequencies of occurrence of keywords $i$ and $j$.
2) Jaccard index: the Jaccard index is calculated by the formula:

$$J_{ij} = \frac{c_{ij}}{c_i + c_j - c_{ij}}$$

3) In addition, the word-pair frequency can be processed by inclusion. Inclusion processing reflects how closely two words are connected, and there are three formulas for it.
Inclusion index: it is mainly used to measure the hierarchical level of subject areas, and the formula is:

$$I_{ij} = \frac{c_{ij}}{\min(c_i, c_j)}$$

where $c_{ij}$, $c_i$, and $c_j$ are as defined above.
Proximity index method: the proximity index is the opposite of the inclusion index and highlights keyword pairs with relatively low frequencies of occurrence. Among all the keywords in the literature, some occur infrequently but are still related to other minor keywords. The proximity index is calculated as:

$$P_{ij} = \frac{c_{ij}}{c_i\, c_j} \cdot N$$

where $N$ is the total number of documents in the collection.
Mutual inclusion factor method: first proposed by Callon et al. Its formula is:

$$E_{ij} = \frac{c_{ij}^2}{c_i\, c_j}$$
The correlation matrix is obtained in two steps. 1) Each number in the co-occurrence matrix is divided by the product of the square roots of the total frequencies of the two words it relates, i.e., Ochiai coefficient of words A and B = (number of times words A and B co-occur) ÷ (square root of the frequency of word A × square root of the frequency of word B). 2) The data on the diagonal indicate the degree of correlation between a word and itself; under the above equation these all equal 1. This yields the correlation matrix. Because the correlation matrix contains a large number of 0 values, it can easily introduce excessive statistical error, which may affect the analysis results.
The numbers in the correlation matrix are similarity data: the size of a number indicates how close the two corresponding keywords are. The larger the value, the closer the keywords and the greater their similarity.
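The two-step conversion from a co-occurrence matrix to a correlation matrix can be sketched in Python as follows. The function name `ochiai_matrix`, the keyword frequencies, and the co-occurrence counts are hypothetical values chosen for illustration, not data from the study.

```python
import math

def ochiai_matrix(terms, freq, pair_counts):
    """Build a symmetric correlation matrix of Ochiai coefficients."""
    matrix = []
    for a in terms:
        row = []
        for b in terms:
            if a == b:
                # Step 2: a word's correlation with itself is 1.
                row.append(1.0)
            else:
                # Step 1: co-occurrence count divided by the product of the
                # square roots of the two words' frequencies.
                c = pair_counts.get((a, b), pair_counts.get((b, a), 0))
                row.append(c / (math.sqrt(freq[a]) * math.sqrt(freq[b])))
        matrix.append(row)
    return matrix

# Hypothetical frequencies and co-occurrence counts.
terms = ["landscape painting", "folk art", "big data"]
freq = {"landscape painting": 16, "folk art": 9, "big data": 4}
pair_counts = {
    ("landscape painting", "folk art"): 6,
    ("landscape painting", "big data"): 2,
}
m = ochiai_matrix(terms, freq, pair_counts)
```

Pairs that never co-occur produce 0 entries, which illustrates how sparse such a matrix becomes when many keywords are included.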
This paper surveys the literature on Ming and Qing art and its development history published during 2019-2023. The results of the survey are shown in Figure 1. Overall, research on Ming and Qing fine arts has progressed to a certain extent under the impetus of various forces in Chinese society and the unremitting efforts of several generations of scholars. The survey shows that the research literature related to the development history of Ming and Qing art increased from 811 items in 2019 to 1,371 in 2023. However, the number of influential studies on the development history of Ming and Qing art is not very large, ranging from 579 to 756 over 2019-2023.

Figure 1. Survey of research literature on Ming and Qing art
Knowledge mapping is a product of information visualization. It is suitable for the visual representation of large-scale non-numerical information resources, enabling users to see, explore, and quickly understand large amounts of information. Fine Arts of China, Fine Arts, Fine Arts Appreciation, Fine Arts Research, Fine Arts Education, Fine Arts Art, and Fine Arts World are some of the major journals in the field of fine arts research in contemporary China. In this paper, the articles, reports, summaries, dialogues, features, etc. published in these seven journals from 2019 to 2023 were searched with “fine arts” as the subject term, yielding a total of 4,920 documents and 9,035 keywords. Using the “keyword co-occurrence” analysis in VOSviewer, with the threshold set to at least 20 co-occurrences per keyword, we obtained 14 keywords that meet the requirement, and thus a five-year knowledge map of keywords in the fine arts literature. The density map of the fine arts knowledge graph constructed from these keywords is shown in Figure 2.

Figure 2. Density map of the fine arts knowledge graph built from keywords, 2019-2023
Keyword emergence refers to a rapid increase in the frequency of certain keywords within a research topic over a short period of time. Because such keywords appear frequently and rise quickly, they can be used to show or describe the frontiers of a research field. Knowledge graphs are used here to analyze keyword emergence in the literature data. The emergent keywords of the Ming and Qing art research literature are shown in Figure 3. In this paper, the 11 keywords are sorted by the chronological order in which they emerge: traditional fine arts, inheritance, folk art, art design, portrait painting, development history, Ming and Qing dynasties, landscape painting, big data, folklore, and knowledge mapping. Among them, traditional fine arts, inheritance, folk art, and art design mainly correspond to the research theme of the current situation of traditional fine arts of the Ming and Qing dynasties. Portrait painting, landscape painting, folklore, Ming and Qing dynasties, and development history mainly concern the contents and subjects of art works at specific stages of the Ming and Qing dynasties, which together constitute their development process. Big data and knowledge mapping, on the other hand, mainly concern the use of big data to analyze all the literature related to the study of fine arts in the Ming and Qing dynasties and find its common features. At present, the keywords big data, development history, landscape painting, traditional fine arts, folklore, and knowledge mapping have longer emergence durations and are likely to persist for some time. The emergence periods of the other keywords have ended, but this does not mean that they will not appear again in the future.

Figure 3. Emergent keywords in the Ming and Qing art research literature
Cluster analysis is an effective method of data mining that makes it possible to discover the global distribution patterns of data and the interrelationships between data attributes. It groups objects of the same kind together and assigns different objects to different classes. Keywords can be analyzed using different categorization methods, which helps to grasp the research topic accurately and to extend the analysis of the resulting vocabulary clusters.
The results of keyword clustering of the Ming and Qing art research literature are shown in Figure 4. In terms of research content, based on the proportions of different contents, Ming and Qing art can be divided into five types: the Jiangnan method, the ink and bone method, the white drawing method, the Hercynian method, and Ren Bonian’s ink writing method, which combined Western realism with the Chinese ink and brush tradition at the end of the Qing Dynasty. These forms of artistic expression are diverse; although partly influenced by foreign cultural factors, they owe more to the inheritance of the fine traditions of Chinese painting itself. VOSviewer groups the keywords into five clusters: purple, brown, green, blue, and yellow. The main keywords in the purple cluster are art, art history, art style, and so on. It contains painting theories, painting styles, and some ancient artists, and thus groups art and styles. The main keywords of the brown cluster are art creation, artists, artists’ associations, and so on; this cluster mainly concerns the relationship between art creation and artists’ associations. The main keywords of the green cluster are contemporary art, monumentalism, and art criticism. The blue cluster covers portraits, landscapes, craft design, and various brochures, and focuses on the categorization of fine art. The yellow cluster is mainly about art education, in which art specialization and middle-school art education are the two main research directions. Through keyword clustering analysis, it is easy to identify the major directions and fields of Ming and Qing art research in the past five years.

Figure 4. Keyword clustering results for the Ming and Qing art research literature
Co-word analysis is an analytical technique that builds on word frequency analysis by further considering the associations between words. The method is based on the psychological law of proximity and on the principles of knowledge structure and mapping. In bibliometrics, co-occurrence usually refers to how often a word appears together with other words in titles, keywords, or abstracts, and it is obtained on the basis of word frequency analysis. The more often two terms occur together, the stronger the relationship between them. Keyword co-word analysis was carried out here using VOSviewer software.
The associated words linked in the 2019-2023 Ming and Qing art knowledge graph network are shown in Figure 5. The larger the font size of a node in the figure, the greater the degree centrality of that keyword in the network; degree centrality reflects how strongly a subject term is linked with other subject terms in the co-word network. Taking the keyword “Ming and Qing art” as an example, eight keywords are connected to it: development history, art, big data analysis, landscape painting, knowledge map, traditional art, folk art, and folklore. The keyword “Ming and Qing art” occupies a marginal position in the keyword co-occurrence network diagram, which means that it has not become a core research topic in art research over the past five years, and that research on it during this period has been related to each of the eight keywords mentioned above.
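Degree centrality in a co-word network can be computed directly from a list of keyword links. The sketch below uses the common normalization of dividing a node's number of neighbours by the number of other nodes; this normalization, the function name `degree_centrality`, and the toy edge list are assumptions for illustration and do not reproduce the network in Figure 5 or the internals of VOSviewer.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality: neighbours divided by (number of nodes - 1)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-links
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

# Hypothetical toy network of keyword co-occurrence links.
edges = [
    ("art history", "art style"),
    ("art history", "landscape painting"),
    ("art history", "art education"),
    ("art style", "landscape painting"),
]
centrality = degree_centrality(edges)
```

In this toy network "art history" is linked to every other keyword and would be drawn with the largest label, while "art education", with a single link, sits at the margin.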

Figure 5. Associated words linked in the Ming and Qing art knowledge graph network
This paper examines and critiques the literature on the study of Ming and Qing art from 2019-2023, and the main conclusions are as follows:
1) This paper searched 4,920 documents and 9,035 keywords from 2019-2023 with “fine arts” as the subject term and obtained 14 keywords that meet the requirement, yielding a five-year knowledge map of keywords in the fine arts literature.
2) Sorting the emergent words by the chronological order of their appearance gives an ordered list of 11 keywords. Among them, the keywords reflecting the current situation of traditional art of the Ming and Qing dynasties are traditional art, inheritance, folk art, and art design; the keywords reflecting the contents and subjects of art works at specific stages of the Ming and Qing dynasties are portrait painting, landscape painting, folklore, Ming and Qing dynasties, and development history; and the keywords concerned with analyzing all the literature related to the study of Ming and Qing art are big data and knowledge mapping. The emergent words with longer durations are big data, development history, landscape painting, traditional art, folklore, and knowledge map.
3) Keyword clustering analysis clearly reveals the general directions and fields of Ming and Qing art research in the past five years. Research on Ming and Qing art in this period is related to eight keywords: development history, fine art, big data analysis, landscape painting, knowledge map, traditional art, folk art, and folklore.