Data Mining Techniques for the Preservation and Inheritance of Classical Vocal Music in Modern Society
Published online: 19 Mar 2025
Received: 17 Nov 2024
Accepted: 20 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0353
© 2025 Xiaoqing Chi, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
In the development of Chinese folk music, borrowing from the West has produced promising results. Although the entry of Western culture into China has had a certain impact on Chinese culture, this does not mean that only China's own local culture should be developed; culture becomes more valuable and meaningful through mutual exchange and borrowing. To better protect and pass on classical vocal music in modern society, borrowing must lead to self-development, which is the real meaning of borrowing [1-4]. Unlike medicine or mathematics, music loses its vitality when it is practiced strictly by the book. The music and cultural resources of China's fifty-six ethnic groups are extremely rich, and they are the cornerstone on which classical vocal music survives. Chinese culture shows a kind of simplicity and innocence: in both academic and artistic fields, Chinese people strive to express as much as possible with the fewest words, and to create the most moving flavor with the least ink [5-6]. Chinese classical vocal music, built on the foundation of traditional culture, uses folk music as its material and develops continuously. Folk vocal works are created on the basis of traditional culture using Western composition techniques, through constant exploration and refinement, and modern folk vocal composition has made good progress in these respects. Chinese art is largely linear, emphasizing the horizontal beauty of melodic lines, whereas Western art emphasizes vertical, three-dimensional beauty [7-9]. Treating Western music as the only feasible idea and theory for the creation and performance of folk vocal music is therefore unscientific and leads to biased practice.
How to integrate Internet resources rationally to reduce the additional cost of protection and inheritance, and how to mine data as fully as possible and apply it effectively to the protection and inheritance of classical vocal music, are urgent problems for researchers [10-12]. Cloud computing can further optimize the allocation of resources: data stored in the cloud can be used directly, which minimizes the load on end users and supports the continued growth of cloud-stored data. Data mining generally refers to the process of using algorithms to search massive data for hidden information [13-14]. As data mining is applied ever more widely across industries, researchers classify their research objects, build a research model for each class, collect a series of user data, apply various algorithms, relational graph models or classifiers to music resources, and finally build a music expert system based on big data. Such systems support the protection and inheritance of vocal music and contribute to the continuation and development of vocal music culture [15-16].
The author collects data from a music platform, performs a big data analysis of the playback and dissemination of classical vocal music on that platform, uses the Apriori algorithm to analyze the relationship between users' collection behavior and classical vocal works, and analyzes data on users' feelings to understand their views of classical vocal works, thereby helping to promote the protection and inheritance of classical vocal music in modern society.
In recent years, the number of clicks on songs and videos on classical vocal music platforms and other online platforms has grown steadily. This has not only driven the rapid development of online music platforms but also made it possible to analyze listeners' feedback on their favorite songs. For major music platforms, classical vocal resources and user experience are important indicators of a platform's quality. Both can be analyzed and organized through data mining: on the one hand, a platform can identify the most popular songs and raise its overall click-through rate; on the other hand, by analyzing the types of songs and singers a listener plays most often, it can recommend songs and singers with similar musical styles. Applying data mining to classical vocal music therefore not only promotes the overall development of online music platforms but also brings a lucrative source of revenue to individuals and companies, promoting the development of classical vocal music more effectively.
Knowledge discovery is a term that emerged with the era of information technology and the knowledge explosion. It refers to distilling regular, practical and useful knowledge and information from large amounts of incomplete, fuzzy and noisy data, which people can use to improve workflows, enhance enterprise systems and increase enterprise efficiency. Data mining is the core part of knowledge discovery. It is more specialized than knowledge discovery as a whole and can be regarded as an advanced stage of turning accumulated data into knowledge; its main function is to use various algorithms to find the intrinsic laws and patterns in large amounts of data and to help managers make effective decisions. Data mining is also known as knowledge discovery in databases. With the help of advanced technology (mining software) and a large body of knowledge and methods (such as customer segmentation), it cleans and transforms shallow, rough and messy information, discovers the latent laws and connections in the organized data, and makes effective predictions about future developments, guiding managers to examine the current state of the enterprise and to evaluate it and make corresponding decisions in a timely manner. Data mining is an advanced technology for processing large amounts of data that is broad in scope and easy to use: even without explicit prior hypotheses, it can discover knowledge and extract previously unknown but valid and usable information, thereby saving resources, increasing revenue and giving the enterprise a more advantageous competitive position.
Association analysis, also known as "market basket" analysis, is an important data mining method used to determine the connections between different fields in the data and to find the dependencies among multiple fields. An association rule captures a regularity between the values of two or more data items; such a regularity is called an association. New strategies can be developed from the regularities discovered, and uncovering previously unknown association rules can promote business development. The purpose of association analysis is to uncover the hidden network of associations in a database. Large databases contain many types of association rules, such as simple, temporal and causal associations. To be practical, scientifically valid and usable, most association rules must satisfy certain parameters [17]; the parameters generally considered are the number of valid items, support, confidence, and so on. By constraining these parameters, miners can discover rules that better match their specific requirements.
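Association rule mining on a transaction database can be described as follows.
Let $I = \{i_1, i_2, \ldots, i_m\}$ be the set of all items, and let $D = \{T_1, T_2, \ldots, T_n\}$ be a transaction database in which each transaction $T_j \subseteq I$ has a unique identifier (TID). For an itemset $X \subseteq I$, the support of $X$ is the fraction of transactions that contain it: $\mathrm{Support}(X) = |\{T \in D : X \subseteq T\}| / |D|$. An association rule has the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$ and $X \cap Y = \emptyset$; its support is $\mathrm{Support}(X \Rightarrow Y) = \mathrm{Support}(X \cup Y)$ and its confidence is $\mathrm{Confidence}(X \Rightarrow Y) = \mathrm{Support}(X \cup Y) / \mathrm{Support}(X)$. An association rule that satisfies both the minimum support (Minsupport) and the minimum confidence (Minconfidence) is called a strong association rule.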
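The definitions above translate directly into code. The following Python sketch is illustrative only: the function names `support`, `confidence` and `is_strong` are our own, and the data are the five transactions of the sample database in Table 1, given later in this section. Support is computed as a fraction of transactions, so a support count of 2 out of 5 corresponds to a Minsupport of 0.4.

```python
from typing import FrozenSet, List

# Sample transaction database from Table 1 (one frozenset of items per TID)
DB: List[FrozenSet[str]] = [
    frozenset("ABCD"),  # TID 1
    frozenset("BCE"),   # TID 2
    frozenset("ABCE"),  # TID 3
    frozenset("BDE"),   # TID 4
    frozenset("ABCD"),  # TID 5
]


def support(itemset: FrozenSet[str], db: List[FrozenSet[str]]) -> float:
    """Support(X): fraction of transactions that contain every item of X."""
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(x: FrozenSet[str], y: FrozenSet[str], db: List[FrozenSet[str]]) -> float:
    """Confidence(X => Y) = Support(X ∪ Y) / Support(X)."""
    return support(x | y, db) / support(x, db)


def is_strong(x: FrozenSet[str], y: FrozenSet[str], db: List[FrozenSet[str]],
              min_sup: float, min_conf: float) -> bool:
    """A rule is strong if it meets both Minsupport and Minconfidence."""
    return support(x | y, db) >= min_sup and confidence(x, y, db) >= min_conf


print(support(frozenset("AB"), DB))                    # 0.6
print(confidence(frozenset("A"), frozenset("B"), DB))  # 1.0
```

Note that confidence is not symmetric: {A} ⇒ {B} has confidence 1.0, while {B} ⇒ {A} has only 0.6, because B appears in every transaction but A in only three.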
In general, given a transaction database, association rule mining is the process of finding strong association rules under user-specified minimum support and minimum confidence thresholds. The problem can be decomposed into two subproblems. The first is to find all frequent itemsets, i.e., all itemsets whose support is not less than the user-given Minsupport. The second is to generate, from each maximal frequent itemset, the association rules whose confidence is not less than the user-given Minconfidence.
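The Apriori algorithm discovers frequent itemsets level by level, increasing the number of items per itemset by one at each pass. First, the frequent 1-itemsets $L_1$ are found by scanning the database and counting each item; $L_1$ is then used to generate the candidate 2-itemsets $C_2$, from which the frequent 2-itemsets $L_2$ are obtained, and so on, until no further frequent $k$-itemsets can be found. Candidate generation relies on the Apriori property: every nonempty subset of a frequent itemset must also be frequent, so any candidate with an infrequent subset can be pruned without counting.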
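The join-and-prune step that produces $C_{k+1}$ from $L_k$ can be sketched as follows. This is a minimal illustrative implementation (the name `apriori_gen` is our own): it joins two frequent $k$-itemsets whose first $k-1$ items agree in sorted order, then discards any candidate that has an infrequent $k$-subset. The example input is the set of frequent 2-itemsets of the Table 1 database with a minimum support count of 2.

```python
from itertools import combinations
from typing import FrozenSet, Set


def apriori_gen(frequent_k: Set[FrozenSet[str]], k: int) -> Set[FrozenSet[str]]:
    """Generate candidate (k+1)-itemsets C_{k+1} from frequent k-itemsets L_k."""
    ordered = sorted(tuple(sorted(s)) for s in frequent_k)
    candidates: Set[FrozenSet[str]] = set()
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Join step: the two itemsets must agree on their first k-1 items
            if a[:k - 1] != b[:k - 1]:
                continue
            cand = frozenset(a) | frozenset(b)
            # Prune step (Apriori property): every k-subset must be frequent
            if all(frozenset(sub) in frequent_k for sub in combinations(sorted(cand), k)):
                candidates.add(cand)
    return candidates


# Frequent 2-itemsets of the Table 1 database with minsup_count = 2
L2 = {frozenset(s) for s in ["AB", "AC", "AD", "BC", "BD", "BE", "CD", "CE"]}
print(sorted("".join(sorted(c)) for c in apriori_gen(L2, 2)))
# ['ABC', 'ABD', 'ACD', 'BCD', 'BCE']
```

The joins {B, D} + {B, E} and {C, D} + {C, E} are discarded here because both results contain {D, E}, which is not frequent; no database scan is needed to reject them.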
A sample transaction database is given below, and the Apriori algorithm is applied to it. The sample transaction database is shown in Table 1.
Sample transaction database
| TID | Itemset |
|---|---|
| 1 | A, B, C, D |
| 2 | B, C, E |
| 3 | A, B, C, E |
| 4 | B, D, E |
| 5 | A, B, C, D |
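The execution of the Apriori algorithm on the transaction database in Table 1, with minsup_count = 2, proceeds as follows.
Scanning the database once gives the candidate 1-itemsets $C_1$ and their support counts: {A}: 3, {B}: 5, {C}: 4, {D}: 3, {E}: 3.
Selecting the itemsets with minsup_count ≥ 2 gives the frequent 1-itemsets $L_1 = \{\{A\}, \{B\}, \{C\}, \{D\}, \{E\}\}$.
The candidate 2-itemsets $C_2$ are generated by joining $L_1$ with itself; scanning the database gives {A, B}: 3, {A, C}: 3, {A, D}: 2, {A, E}: 1, {B, C}: 4, {B, D}: 3, {B, E}: 3, {C, D}: 2, {C, E}: 2, {D, E}: 1.
Selecting the itemsets with minsup_count ≥ 2 forms the frequent 2-itemsets $L_2 = \{\{A,B\}, \{A,C\}, \{A,D\}, \{B,C\}, \{B,D\}, \{B,E\}, \{C,D\}, \{C,E\}\}$.
Joining $L_2$ with itself and pruning the candidates that contain an infrequent 2-subset ({B, D, E} and {C, D, E} both contain {D, E}) gives $C_3$ with support counts {A, B, C}: 3, {A, B, D}: 2, {A, C, D}: 2, {B, C, D}: 2, {B, C, E}: 2.
Selecting the itemsets with minsup_count ≥ 2 forms the frequent 3-itemsets $L_3 = \{\{A,B,C\}, \{A,B,D\}, \{A,C,D\}, \{B,C,D\}, \{B,C,E\}\}$.
Joining $L_3$ with itself gives {A, B, C, D} and {B, C, D, E}; the latter is pruned because {B, D, E} is not frequent, so $C_4 = \{\{A,B,C,D\}\}$, with support count 2.
Selecting the itemsets with minsup_count ≥ 2 forms the frequent 4-itemsets $L_4 = \{\{A,B,C,D\}\}$.
Joining $L_4$ with itself produces no candidate 5-itemsets ($C_5 = \emptyset$), so the algorithm terminates.
The largest frequent itemset is therefore {A, B, C, D}; the maximal frequent itemsets (those with no frequent proper superset) are {A, B, C, D} and {B, C, E}.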
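The whole level-wise search on Table 1 can be reproduced with a short Python sketch, assuming the transactions fit in memory as Python sets. The function names are hypothetical, and for brevity the join step unions any two frequent $k$-itemsets whose union has $k+1$ items; after pruning this yields the same candidates as the prefix-based join. With a minimum support count of 2 it finds 5, 8, 5 and 1 frequent itemsets of sizes 1 to 4.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]

# Sample transaction database from Table 1
DB: List[Itemset] = [frozenset("ABCD"), frozenset("BCE"), frozenset("ABCE"),
                     frozenset("BDE"), frozenset("ABCD")]


def apriori(db: List[Itemset], min_count: int) -> Dict[Itemset, int]:
    """Level-wise Apriori: return every frequent itemset with its support count."""
    # Pass 1: count single items (C_1) and keep the frequent ones (L_1)
    counts: Dict[Itemset, int] = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 1
    while level:
        prev = set(level)
        # Join: union two frequent k-itemsets that differ in exactly one item;
        # prune: keep the union only if all of its k-subsets are frequent
        candidates = {
            a | b for a, b in combinations(prev, 2)
            if len(a | b) == k + 1
            and all(frozenset(s) in prev for s in combinations(a | b, k))
        }
        # One database scan per level to count the surviving candidates
        level = {}
        for cand in candidates:
            c = sum(1 for t in db if cand <= t)
            if c >= min_count:
                level[cand] = c
        frequent.update(level)
        k += 1
    return frequent


def maximal(frequent: Dict[Itemset, int]) -> List[Itemset]:
    """Frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]


freq = apriori(DB, min_count=2)
for s in sorted(freq, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(s)), freq[s])
print(sorted("".join(sorted(m)) for m in maximal(freq)))  # ['ABCD', 'BCE']
```

Each iteration of the `while` loop is one database scan, which is the source of the I/O cost discussed below.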
Finally, from each maximal frequent itemset, the association rules whose confidence is not less than the user-specified Minconfidence are generated.
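Rule generation from a maximal frequent itemset can be sketched as follows. Each nonempty proper subset $X$ of the itemset is tried as the antecedent, with confidence computed as count(itemset) / count($X$). The source does not state a Minconfidence for the Table 1 example, so the threshold of 0.8 below is an assumption for illustration, and the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]

# Sample transaction database from Table 1
DB: List[Itemset] = [frozenset("ABCD"), frozenset("BCE"), frozenset("ABCE"),
                     frozenset("BDE"), frozenset("ABCD")]


def count(itemset: Itemset, db: List[Itemset]) -> int:
    """Support count: number of transactions containing the itemset."""
    return sum(1 for t in db if itemset <= t)


def rules_from_itemset(itemset: Itemset, db: List[Itemset],
                       min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Rules X => (itemset - X) whose confidence is at least min_conf."""
    total = count(itemset, db)
    rules = []
    for size in range(1, len(itemset)):          # nonempty proper subsets only
        for ante in combinations(sorted(itemset), size):
            x = frozenset(ante)
            conf = total / count(x, db)          # Support(X ∪ Y) / Support(X)
            if conf >= min_conf:
                rules.append((x, itemset - x, conf))
    return rules


MIN_CONF = 0.8  # assumed threshold; not specified for this example in the source
for m in (frozenset("ABCD"), frozenset("BCE")):
    for x, y, c in rules_from_itemset(m, DB, MIN_CONF):
        print(sorted(x), "=>", sorted(y), round(c, 3))
```

Under this assumed threshold, {A, B, C, D} yields five rules ({A,B,D} ⇒ {C}, {A,C,D} ⇒ {B}, {B,C,D} ⇒ {A}, {A,D} ⇒ {B,C} and {C,D} ⇒ {A,B}) and {B, C, E} yields {C,E} ⇒ {B}, all with confidence 1.0.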
As the classic frequent itemset mining algorithm, Apriori has played a landmark role in database mining. However, further research has exposed its shortcomings, notably two serious performance bottlenecks. The first is that it scans the transaction database many times, which imposes a heavy input/output load: if the largest frequent itemset contains 10 items, the transaction database must be scanned at least 10 times. The second is that it may generate a huge number of candidate sets: $n$ frequent 1-itemsets alone yield $n(n-1)/2$ candidate 2-itemsets, so $10^4$ frequent 1-itemsets already produce about $5 \times 10^7$ candidates.
The experiments in this section first use a web crawler to collect data on twelve classical vocal works from a classical vocal music playback platform. Next, the tracks of each genre are sorted by playback volume, and the top ten tracks of each genre and their playback volumes are selected. The Internet relevance between genres is then calculated quantitatively and, finally, adjusted for Internet dissemination heat. If audio or video of a track $t$ is disseminated on the Internet under both genre A and genre B, the two genres are considered to have Internet relevance for track $t$, and their relevance for that track is set to 1; otherwise it is 0. The overall relevance is obtained by summing these values over the top ten tracks of each genre. The Internet popularity of each genre is measured by the cumulative relative number of views of its top ten tracks. The overall Internet relevance matrix of the classical vocal works is shown in Table 2.
Overall relevance matrix of the classical vocal works network
| | Symphony No. 9 | The Magic Flute | Aida | Turandot | Carmen | St Matthew Passion | Messiah | The Barber of Seville | Pelléas et Mélisande | The Rite of Spring | The Ring of the Nibelung | Porgy and Bess |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Symphony No. 9 | 12 | 8 | 7 | 10 | 13 | 4 | 8 | 3 | 5 | 1 | 8 | 0 |
| The Magic Flute | 8 | 12 | 2 | 2 | 8 | 3 | 9 | 2 | 0 | 0 | 2 | 3 |
| Aida | 1 | 7 | 8 | 12 | 3 | 5 | 13 | 4 | 4 | 3 | 2 | 4 |
| Turandot | 7 | 1 | 5 | 10 | 3 | 4 | 8 | 0 | 4 | 0 | 0 | 2 |
| Carmen | 5 | 5 | 0 | 2 | 8 | 7 | 4 | 3 | 0 | 0 | 0 | 1 |
| St Matthew Passion | 6 | 7 | 3 | 6 | 6 | 8 | 5 | 1 | 0 | 1 | 1 | 0 |
| Messiah | 3 | 8 | 1 | 4 | 7 | 2 | 11 | 10 | 2 | 1 | 0 | 1 |
| The Barber of Seville | 1 | 6 | 4 | 1 | 1 | 6 | 4 | 9 | 0 | 2 | 6 | 1 |
| Pelléas et Mélisande | 13 | 2 | 6 | 9 | 7 | 0 | 1 | 0 | 9 | 3 | 6 | 1 |
| The Rite of Spring | 4 | 11 | 3 | 1 | 0 | 1 | 6 | 7 | 0 | 7 | 2 | 1 |
| The Ring of the Nibelung | 4 | 0 | 1 | 1 | 1 | 2 | 0 | 2 | 1 | 0 | 11 | 0 |
| Porgy and Bess | 2 | 6 | 4 | 3 | 0 | 9 | 8 | 3 | 0 | 0 | 1 | 10 |
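The overall relevance computation described before Table 2 can be sketched as follows. This is an illustrative reading, not the author's code: it assumes the sum for the pair (A, B) runs over the top tracks of the row genre A, which would allow the matrix to be asymmetric as Table 2 is. The genre names, track IDs and top-3 lists (instead of top-10) are hypothetical toy data.

```python
from typing import Dict, List, Set


def relevance_matrix(top_tracks: Dict[str, List[str]],
                     presence: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Overall Internet relevance between genres (hypothetical reading).

    For each track t in genre a's top list, the pair (a, b) scores 1 if audio
    or video of t is disseminated online under both a and b, else 0; the
    scores are summed over a's top tracks.
    """
    genres = list(top_tracks)
    return {
        a: {b: sum(1 for t in top_tracks[a] if t in presence[a] and t in presence[b])
            for b in genres}
        for a in genres
    }


# Hypothetical toy data: three genres with top-3 lists
top = {"G1": ["t1", "t2", "t3"], "G2": ["t2", "t4", "t5"], "G3": ["t6", "t7", "t1"]}
presence = {"G1": {"t1", "t2", "t3", "t4"}, "G2": {"t1", "t2", "t4", "t5"},
            "G3": {"t1", "t6", "t7"}}
for genre, row in relevance_matrix(top, presence).items():
    print(genre, row)
```

In this toy example G1's row is {G1: 3, G2: 2, G3: 1}: all three of G1's top tracks are online under G1, t1 and t2 also appear under G2, and only t1 appears under G3.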
Because the top ten classical vocal tracks enjoy a very large advantage in Internet dissemination heat, they are excluded from the data fitting for the time being. The relationship between the overall relevance of the classical vocal works and their playback heat is shown in Figure 1.

Relationship between overall relevance and playback heat
To account for the effect of playback on relevance (tracks that are played more often correspond to higher relevance), a weighted relevance is introduced. The weighted relevance matrix of the classical vocal works network is shown in Table 3.
Weighted relevance matrix of the classical vocal works network
| | Symphony No. 9 | The Magic Flute | Aida | Turandot | Carmen | St Matthew Passion | Messiah | The Barber of Seville | Pelléas et Mélisande | The Rite of Spring | The Ring of the Nibelung | Porgy and Bess |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Symphony No. 9 | 0.047 | 0.043 | 0.031 | 0.051 | 0.048 | 0.03 | 0.023 | 0.002 | 0.009 | 0.008 | 0.032 | 0.009 |
| The Magic Flute | 0.101 | 0.345 | 0.115 | 0.142 | 0.161 | 0.201 | 0.153 | 0.134 | 0.006 | 0.063 | 0.079 | 0.004 |
| Aida | 0.072 | 0.087 | 0.074 | 0.097 | 0.053 | 0.042 | 0.036 | 0.022 | 0.03 | 0.017 | 0.03 | 0.018 |
| Turandot | 0.027 | 0.04 | 0.06 | 0.082 | 0.014 | 0.043 | 0.044 | 0.028 | 0.004 | 0 | 0.021 | 0.012 |
| Carmen | 0.073 | 0.075 | 0.029 | 0.027 | 0.074 | 0.012 | 0.034 | 0.016 | 0.006 | 0.029 | 0.023 | 0.008 |
| St Matthew Passion | 0.041 | 0.061 | 0.043 | 0.064 | 0.051 | 0.105 | 0.035 | 0.025 | 0.015 | 0.011 | 0.028 | 0.011 |
| Messiah | 0.016 | 0.087 | 0.02 | 0.051 | 0.033 | 0.038 | 0.106 | 0.068 | -0.001 | -0.004 | 0.014 | -0.004 |
| The Barber of Seville | 0.01 | 0.035 | 0.02 | 0.025 | 0.003 | 0.046 | 0.031 | 0.075 | 0.002 | 0.008 | 0.008 | 0.007 |
| Pelléas et Mélisande | 0.004 | 0.001 | 0.007 | 0.003 | 0.003 | 0.01 | 0.001 | 0.003 | 0.009 | 0.003 | 0.008 | 0 |
| The Rite of Spring | 0.019 | 0.033 | 0.028 | 0.009 | 0.02 | 0.009 | 0.023 | 0.007 | 0.014 | 0.038 | 0.016 | 0.02 |
| The Ring of the Nibelung | 0.008 | 0.008 | 0.003 | 0.004 | 0.002 | 0.008 | 0.003 | 0.009 | 0.016 | 0.004 | 0.001 | 0.01 |
| Porgy and Bess | 0.002 | 0.007 | 0.006 | 0.002 | 0.006 | 0.014 | 0.001 | 0.002 | 0.013 | 0.001 | 0.005 | 0.016 |
The relationship between the weighted Internet relevance of the classical vocal works and their playback heat is shown in Figure 2.

Relationship between weighted relevance and playback heat
A music platform is used to mine data on twelve of the world's most influential and representative classical vocal works. First, the 12 selected works are searched in the platform's song lists (data as of June 14, 2023); irrelevant results are then filtered out, and finally the statistics are compiled. The search frequency and related statistics of the classical vocal works are shown in Table 4. There is a large gap between the 12 works in comment heat, and "Carmen" has a clear advantage over the others: its most-commented track has 48,532 comments, whereas no track of any other work has more than 4,000 (the next highest is 3,512, for Symphony No. 9).
Search frequency and related statistics of the classical vocal works
| Work | Search Frequency | Highest Comment Count | Average Comment Count |
|---|---|---|---|
| Symphony No. 9 | 9945 | 3512 | 245 |
| The Magic Flute | 5690 | 1015 | 185 |
| Aida | 4671 | 2455 | 65 |
| Turandot | 6680 | 86 | 48 |
| Carmen | 3433 | 48532 | 1732 |
| St Matthew Passion | 3632 | 1226 | 182 |
| Messiah | 6607 | 988 | 89 |
| The Barber of Seville | 7611 | 745 | 1533 |
| Pelléas et Mélisande | 8297 | 3221 | 1205 |
| The Rite of Spring | 8582 | 2896 | 125 |
| The Ring of the Nibelung | 8255 | 1633 | 212 |
| Porgy and Bess | 4139 | 1874 | 91 |
Taking the data of a classical vocal music playback platform as an example, users who want to collect songs face a huge amount of information on many classical vocal works; they are often overwhelmed and cannot quickly find a satisfactory work, which costs them time and degrades their listening experience. Users' preferences and tastes in classical vocal works are generally stable, and their choices tend to follow certain patterns according to the type of work. Some classical vocal works are positively associated with one another, while others are in opposition or competition (negative correlation). These patterns are hidden in large amounts of historical data; if users' preferences can be discovered through data mining, the tastes of music users can be identified quickly. The collection data of classical vocal works were obtained from the platform's collectors and are shown in Table 5.
Collection data
| Collectors | Symphony No. 9 | The Magic Flute | Aida | Turandot | Carmen | St Matthew Passion | Messiah | The Barber of Seville | Pelléas et Mélisande | The Rite of Spring | The Ring of the Nibelung | Porgy and Bess |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Collectors 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| Collectors 2 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| Collectors 3 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Collectors 4 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| Collectors 5 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| Collectors 6 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| Collectors 7 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| Collectors 8 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| Collectors 9 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
| Collectors 10 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| Collectors 11 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| Collectors 12 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| Collectors 13 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| Collectors 14 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
| Collectors 15 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Collectors 16 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| Collectors 17 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 |
| Collectors 18 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |
| Collectors 19 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
| Collectors 20 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
The minimum support was set to 0.2 and the minimum confidence to 0.65, which yielded 7 association rules; the results are shown in Table 6.
Association rule results
| Antecedent | Consequent | Support (%) | Confidence (%) |
|---|---|---|---|
| Symphony No. 9 | Carmen | 63.122 | 86.544 |
| The Magic Flute | Aida | 53.592 | 76 |
| The Barber of Seville | The Ring of the Nibelung | 50 | 73.651 |
| Aida | Porgy and Bess | 53.641 | 65.951 |
| St Matthew Passion | The Magic Flute | 54.955 | 66.653 |
| Turandot | Symphony No. 9 | 54.875 | 62.152 |
| Carmen | St Matthew Passion | 60.253 | 72.541 |
Rule 1: 63.122% of collectors collected both Symphony No. 9 and Carmen (support), and 86.544% of those who collected Symphony No. 9 also collected Carmen (confidence).
Rule 2: 53.592% of collectors collected both The Magic Flute and Aida, and 76% of those who collected The Magic Flute also collected Aida.
Rule 3: 50.0% of collectors collected both The Barber of Seville and The Ring of the Nibelung, and 73.651% of those who collected The Barber of Seville also collected The Ring of the Nibelung.
Rule 4: 53.641% of collectors collected both Aida and Porgy and Bess, and 65.951% of those who collected Aida also collected Porgy and Bess.
Rule 5: 54.955% of collectors collected both St Matthew Passion and The Magic Flute, and 66.653% of those who collected St Matthew Passion also collected The Magic Flute.
Rule 6: 54.875% of collectors collected both Turandot and Symphony No. 9, and 62.152% of those who collected Turandot also collected Symphony No. 9.
Rule 7: 60.253% of collectors collected both Carmen and St Matthew Passion, and 72.541% of those who collected Carmen also collected St Matthew Passion.
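Because every rule in Table 6 has a single antecedent and a single consequent, mining such rules from a 0/1 collector-by-work matrix like Table 5 can be sketched with pairwise counting. The function below is a hypothetical illustration, not the author's implementation (a full Apriori run would also find rules with larger itemsets), and the three-work, five-collector matrix is invented toy data rather than Table 5.

```python
from itertools import permutations
from typing import List, Tuple


def pairwise_rules(works: List[str], matrix: List[List[int]], min_sup: float,
                   min_conf: float) -> List[Tuple[str, str, float, float]]:
    """Single-antecedent rules A => B mined from a 0/1 collector-by-work matrix."""
    n = len(matrix)
    # Each collector's row becomes a transaction: the set of works collected
    baskets = [{w for w, flag in zip(works, row) if flag} for row in matrix]
    rules = []
    for a, b in permutations(works, 2):
        n_a = sum(1 for s in baskets if a in s)
        n_ab = sum(1 for s in baskets if a in s and b in s)
        if n_a == 0:
            continue
        sup, conf = n_ab / n, n_ab / n_a   # Support(A ∪ B), Confidence(A => B)
        if sup >= min_sup and conf >= min_conf:
            rules.append((a, b, sup, conf))
    return rules


# Hypothetical toy matrix (not Table 5): 5 collectors x 3 works
works = ["W1", "W2", "W3"]
matrix = [[1, 1, 0], [1, 1, 1], [1, 0, 1], [0, 1, 0], [1, 1, 0]]
for a, b, sup, conf in pairwise_rules(works, matrix, min_sup=0.2, min_conf=0.65):
    print(f"{a} => {b}: support {sup:.0%}, confidence {conf:.0%}")
```

With the thresholds used in this study (0.2 and 0.65), the toy matrix yields W1 ⇒ W2 and W2 ⇒ W1 (support 60%, confidence 75%) and W3 ⇒ W1 (support 40%, confidence 100%).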
Taken together, these results suggest that a classical vocal music playback platform can use big data to recommend works to users who listen to and collect classical vocal works, for example by suggesting the consequent of a rule to a user who has already collected its antecedent, so that users discover more classical vocal works.
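A rule-based recommendation of this kind can be sketched as follows, using the antecedent, consequent and confidence values from Table 6. The function `recommend` and its ranking by highest rule confidence are illustrative assumptions, not a method described in the source.

```python
from typing import Dict, List, Set, Tuple

# (antecedent, consequent, confidence) for the rules in Table 6
RULES: List[Tuple[str, str, float]] = [
    ("Symphony No. 9", "Carmen", 0.86544),
    ("The Magic Flute", "Aida", 0.76),
    ("The Barber of Seville", "The Ring of the Nibelung", 0.73651),
    ("Aida", "Porgy and Bess", 0.65951),
    ("St Matthew Passion", "The Magic Flute", 0.66653),
    ("Turandot", "Symphony No. 9", 0.62152),
    ("Carmen", "St Matthew Passion", 0.72541),
]


def recommend(collected: Set[str], rules: List[Tuple[str, str, float]] = RULES,
              top_n: int = 3) -> List[str]:
    """Suggest works implied by rules whose antecedent the user has collected."""
    scores: Dict[str, float] = {}
    for ante, cons, conf in rules:
        if ante in collected and cons not in collected:
            # Keep the highest confidence if several rules point to the same work
            scores[cons] = max(scores.get(cons, 0.0), conf)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend({"Symphony No. 9", "The Magic Flute"}))  # ['Carmen', 'Aida']
```

Works the user already owns are never suggested, and suggestions are ordered by the confidence of the rule that produced them.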
Users' comments describing their feelings, collected from a classical vocal music playback platform, were segmented into words using Python, and the 50 most frequent words were selected; the five most frequent were learning, culture, vocal music, tradition and pass on (inheritance). The analysis indicates that most users show a strong interest in classical vocal music and believe that listening to it not only teaches them more about national culture but also improves their musical literacy and artistic expression. Some users have even started to compose their own works. The top 50 words are shown in Table 7.
Top 50 words
| Serial Number | Word | Frequency | Serial Number | Word | Frequency |
|---|---|---|---|---|---|
| 1 | Learning | 490 | 26 | Innovate | 70 |
| 2 | Culture | 475 | 27 | Deep | 68 |
| 3 | Vocal Music | 428 | 28 | Charm | 64 |
| 4 | Tradition | 399 | 29 | Peoples | 63 |
| 5 | Pass On | 385 | 30 | Language | 61 |
| 6 | China | 369 | 31 | Region | 58 |
| 7 | Peoples | 345 | 32 | Connotation | 58 |
| 8 | Art | 336 | 33 | Place | 57 |
| 9 | Develop | 236 | 34 | Tune | 54 |
| 10 | Native | 223 | 35 | Rhythm | 54 |
| 11 | Affections | 200 | 36 | Disseminate | 54 |
| 12 | History | 182 | 37 | Value | 54 |
| 13 | Platform | 136 | 38 | Country | 52 |
| 14 | Form | 132 | 39 | Cultural Heritage | 51 |
| 15 | Appreciation | 128 | 40 | Spirit | 51 |
| 16 | Uniqueness | 115 | 41 | Excellence | 50 |
| 17 | Region | 108 | 42 | Experience | 48 |
| 18 | Folk | 92 | 43 | Elegance | 48 |
| 19 | Feeling | 86 | 44 | National Music | 47 |
| 20 | Understand | 80 | 45 | Interest | 46 |
| 21 | Feature | 80 | 46 | Primitive Ecology | 44 |
| 22 | Practice | 78 | 47 | Chinese Nation | 43 |
| 23 | Lyrics | 75 | 48 | Love | 41 |
| 24 | To Create | 74 | 49 | Singing | 40 |
| 25 | Protect | 74 | 50 | Protection And Inheritance | 35 |
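The counting step behind Table 7 can be sketched with Python's standard `collections.Counter`. The sketch assumes the comments have already been segmented into words (Chinese text needs a word segmenter for that step, which is not shown); the stopword list and the comments below are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_words(segmented_comments: Iterable[List[str]], k: int = 50,
              stopwords: frozenset = frozenset()) -> List[Tuple[str, int]]:
    """Count words across already-segmented comments and return the k most frequent."""
    counts = Counter(w for words in segmented_comments for w in words
                     if w not in stopwords)
    return counts.most_common(k)


# Hypothetical, already-segmented comments (not the platform's data)
comments = [["learning", "culture", "tradition"],
            ["culture", "learning", "the"],
            ["learning", "inheritance"]]
print(top_words(comments, k=2, stopwords=frozenset({"the"})))
# [('learning', 3), ('culture', 2)]
```

Filtering stopwords before counting keeps function words from crowding out content words in the top-k list.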
This study uses association rule mining to analyze the protection and inheritance of classical vocal works and draws several conclusions that can help promote them. The big data analysis of the performance of classical vocal works on a music platform shows that "Carmen" has a significant advantage in comment heat: its most-commented track has more than 4,000 comments, while no track of any other classical vocal work exceeds 4,000. In the analysis of the effect of protection and inheritance, the 50 most frequent words in users' comments were extracted, the five most frequent being learning, culture, vocal music, tradition and pass on (inheritance). On this basis, it can be concluded that most users of this music platform have a strong interest in classical vocal music.
