Open Access

Exploring the Communication Path of Ancient Literary Works Based on Corpus Analysis in the Perspective of Digital Humanities

  
Sep 26, 2025


Introduction

Literature is a reflection of social life in people’s minds. In a class society, authors observe and depict life from a particular class position, spreading the ideas of that class and serving its interests [1-2]. In ancient and modern times alike, literature therefore disseminates thought and behavior through personal and inter-class relations [3]. Because literature builds its images out of language, its communication relies either on natural language as the information carrier (“linguistic communication”) or on non-linguistic carriers such as the body, objects, space, time and art (“non-verbal communication”). The constant interplay of these two modes produces the vivid images and integrated scenes of literary works [4-6]. Thus, although communication science as a discipline originated abroad, communication as a behavior is found in every age and place. It is nothing new; researchers have simply seldom examined the communication phenomena already present in literary works from this angle [7-8].

The Chinese nation has a history of five thousand years and has left a large number of excellent literary works to later generations. As an expression of traditional culture, ancient Chinese literature has high academic value and is a main artery of cultural inheritance [9-10]. However, surveys indicate that most people today are immersed in the cultural impact and entertainment atmosphere of the Internet, read less and less ancient literature, and pay less and less attention to the inheritance of traditional culture. Many ancient Chinese literary works have not been widely disseminated and simply sit in libraries [11-12].

Ancient Chinese literature is an important part of Chinese civilization, and traditional Chinese culture is a product of history that still strongly influences the development of today’s society [13-14]. Dissemination is an effective way to bring ancient Chinese literature to life: it helps society as a whole to improve its cultural literacy and strengthen its cultural confidence, and it prompts people to explore the value of ancient Chinese literature in the new era and to produce corresponding social benefits [15-16]. If ancient Chinese literature is to be inherited, it must draw on the power of the Internet and on its own innovation, and if it is to develop over the long term, modern media (i.e., new media) is the best cultural carrier [17].

The arrival of the new media era has changed the way people work, study and live. It brings both challenges and opportunities for sustainable development to the dissemination of ancient Chinese literature, and society is paying increasing attention to how that dissemination can be strengthened from the perspective of new media [18-19]. At the same time, everything has two sides, and disseminating ancient literature through new media raises problems of its own. On the one hand, because ancient works were shaped by the educational concepts of their time, it is questionable whether they fit the values of today’s new media era. On the other hand, the textual content, language style, symbolic references, target audiences and ideological connotations of each historical period are specific to that period, which inevitably makes works harder to interpret, and easier to misinterpret, when they are disseminated across periods. Moreover, the tendency toward fragmentation and entertainment in the new media era, together with algorithmic recommendation mechanisms, poses further obstacles to the inheritance and presentation of ancient literary works [20-22]. It is therefore of both practical and theoretical value to examine the problems that arise in disseminating ancient Chinese literature, to search for dissemination paths from the current digital humanities perspective, and to realize the best effect of ancient Chinese literature in modern social life [23-24].

In this paper, we design a model for analyzing the text of ancient literary works and their dissemination, based on text classification technology. The TF-IDF algorithm has two problems: it ties word weight directly to word frequency, and it ignores how differently words are distributed across documents. To address them, we apply information entropy and relative entropy to the keyword extraction algorithm, which reduces the influence of word frequency and inverse document frequency on word weights and improves the accuracy of keyword mining. We use the model to analyze word frequencies in the classical work The West Wing and the writing connotations they represent. We then construct a comprehensive sentiment dictionary and use deep learning methods to analyze the pop-up comments on the film and television adaptation of the classic Romance of the Three Kingdoms, exploring the emotional attitudes of the audience embedded in those comments, and on that basis propose a path for disseminating ancient Chinese literature in the context of new media.

Text categorization techniques
Text pre-processing

Text preprocessing mainly consists of de-duplication, text segmentation, and the removal of special symbols and stop words. Its purpose is to split the text into the words that carry its information and to remove symbols and stop words that contribute nothing to categorization.

Text Segmentation

Segmentation is the first step of a text classification task. It divides a continuous sentence into a set of words according to certain rules. Unlike English sentences, Chinese text has no spaces between words, so it cannot be split into words directly and the segmentation process is considerably more complicated.

In this paper, we use jieba, a word segmentation library for Python, to segment the text. jieba uses its built-in dictionary to build a directed acyclic graph (DAG) of the possible words in a sentence and then, according to the selected mode, searches for the optimal path through the graph to complete the segmentation. Some properties of the Chinese language itself cause problems for word segmentation:

Ambiguity problem: a Chinese string may have more than one reading, so a text may admit more than one segmentation. When this happens the computer cannot always determine the most suitable one, so segmentation ambiguity cannot be entirely avoided.

New word recognition problem: during segmentation, novel words or words absent from the segmentation dictionary appear, such as names of people, place names and Internet terms. These new words change quickly and are highly time-sensitive, so no dictionary can include them all.
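The DAG idea and the ambiguity problem can be illustrated with a toy sketch. This is not jieba's actual code: the dictionary `FREQ`, its frequencies and the helper names are all hypothetical, and unknown characters simply fall back to single-character words.

```python
import math

# Toy dictionary with made-up frequencies (hypothetical values, not jieba's lexicon).
FREQ = {"研究": 5, "研究生": 3, "生命": 4, "命": 1, "的": 10,
        "起源": 3, "生": 1, "起": 1, "源": 1, "研": 1, "究": 1}
TOTAL = sum(FREQ.values())

def build_dag(sentence):
    """Map each start index to every end index that closes a dictionary word."""
    dag = {}
    n = len(sentence)
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in FREQ]
        dag[i] = ends or [i]  # out-of-dictionary character: keep it as a single token
    return dag

def best_route(sentence, dag):
    """Dynamic programming from right to left: best log-probability path from each index."""
    n = len(sentence)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(FREQ.get(sentence[i:j + 1], 1)) - math.log(TOTAL) + route[j + 1][0], j)
            for j in dag[i]
        )
    return route

def segment(sentence):
    """Follow the best path through the DAG and return the list of words."""
    dag = build_dag(sentence)
    route = best_route(sentence, dag)
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words

# "研究生命的起源" is ambiguous: 研究/生命 competes with 研究生/命.
print(segment("研究生命的起源"))  # ['研究', '生命', '的', '起源']
```

With these made-up frequencies the path 研究/生命 wins over 研究生/命; changing the frequencies changes the result, which is exactly why ambiguity cannot be fully eliminated by a dictionary alone.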

Stop-word removal

Text often contains many stop words, which increase the cost of computation. Stop words are therefore usually removed from the text [25].

In general, stop words fall into two broad categories, function words and vocabulary words. Typical function words are “of”, “has”, “la”, “oops” and “yo”; typical vocabulary words are “want”, “think”, etc. These words are extremely common but generally carry no useful information for classifying and recognizing texts. Because they increase the workload and contribute little to classification, removing them improves the efficiency of the classification work.
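As a minimal sketch of this step, stop-word removal is a filter over the token list. The stop-word set below only reuses the examples quoted above; a real list would be much longer and is not given in the paper.

```python
# Hypothetical stop-word list built from the examples in the text.
STOP_WORDS = {"of", "has", "la", "oops", "yo", "want", "think"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words and empty or whitespace-only tokens, keeping the original order."""
    return [t for t in tokens if t.strip() and t not in stop_words]

print(remove_stop_words(["moon", "of", "the", "river", "oops", " "]))
# ['moon', 'the', 'river']
```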

Word Vectors and Text Representation
Word2vec word vector modeling

Word2vec is a neural network-based language learning model that is trained to generate text word vectors. It is a model in the field of natural language processing proposed by Google around 2013 and serves to vectorize words. The mathematical vectors can then be mined for connections and correlations between words and analyzed quantitatively and modeled.

Depending on the model framework and what is predicted, Word2vec word vector models fall mainly into two types, CBOW and Skip-Gram. The CBOW model uses the context of a target word to predict that word, while the Skip-Gram model does the opposite and uses a word to predict its context. After training, each word is mapped to a vector that represents it and can be used to calculate the correlation between words.

The CBOW model predicts a given word from its context [26]. Its mathematical representation is shown in (1):

$$P\left(W_t \mid \tau\left(W_{t-k}, W_{t-k+1}, \ldots, W_{t+k-1}, W_{t+k}\right)\right)$$

Skip-Gram, on the other hand, predicts the context of a word from the current word. Its mathematical representation is shown in (2):

$$P\left(\tau\left(W_{t-k}, W_{t-k+1}, \ldots, W_{t+k-1}, W_{t+k}\right) \mid W_t\right)$$
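The difference between the two objectives is easiest to see in the training pairs they generate. The sketch below only builds those pairs for a window of k words on each side; it does not train a network, and the function names are illustrative.

```python
def cbow_pairs(tokens, k):
    """(context words, target word): the context predicts the centre word W_t."""
    pairs = []
    for t, target in enumerate(tokens):
        context = tokens[max(0, t - k):t] + tokens[t + 1:t + 1 + k]
        if context:
            pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, k):
    """(centre word, context word): the centre word W_t predicts each neighbour."""
    pairs = []
    for context, target in cbow_pairs(tokens, k):
        pairs.extend((target, c) for c in context)
    return pairs

tokens = ["moon", "over", "river"]
print(cbow_pairs(tokens, 1))
# [(['over'], 'moon'), (['moon', 'river'], 'over'), (['over'], 'river')]
print(skipgram_pairs(tokens, 1))
# [('moon', 'over'), ('over', 'moon'), ('over', 'river'), ('river', 'over')]
```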

The TF-IDF model

TF-IDF is composed of two parts, word frequency (TF) and inverse document frequency (IDF), and is used to evaluate the importance of a feature word to a document in a document library. The word frequency is calculated as shown in Equation (3):

$$TF_{i,j} = \frac{n_{i,j}}{\sum_{l=1}^{k} n_{l,j}}$$

$TF_{i,j}$ represents the frequency of word $w_i$ in document $d_j$, $n_{i,j}$ is the number of occurrences of $w_i$ in $d_j$, the denominator sums the occurrences of all feature words in $d_j$, and $k$ is the number of different words in $d_j$. The other part of the model, IDF, is calculated as shown in Equation (4):

$$IDF_i = \log\frac{n_d}{df(d, w_i) + 1}$$

Here $IDF_i$ is the inverse document frequency of feature word $w_i$, $n_d$ is the total number of documents in the text base, $df(d, w_i)$ is the number of documents in the document base that contain $w_i$, and 1 is added to prevent the denominator from being zero.

The idea of TF is that if a feature word appears more often in a document, then this feature word can be used as the keyword of the document, and it is very relevant to the text containing this keyword, and can be used as the feature vocabulary for classification. The idea of IDF indicates that if only few texts contain a certain feature word, then this keyword can be used as a classification feature word, which is more conducive to distinguishing different categories of texts. Generally, TF and IDF are used together to calculate the TF-IDF value, so as to achieve the purpose of text vectorization.
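Equations (3) and (4) translate directly into a few lines of Python. The following is a minimal sketch on a made-up four-document corpus; documents are assumed to be already segmented into token lists, and the natural logarithm is an assumption because the paper does not state the base.

```python
import math
from collections import Counter

def tf(doc):
    """Eq. (3): occurrences of each word divided by all word occurrences in the document."""
    counts = Counter(doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def idf(docs):
    """Eq. (4): log(n_d / (df + 1)), where df counts the documents containing the word."""
    n_d = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return {w: math.log(n_d / (df[w] + 1)) for w in df}

def tfidf(docs):
    """One {word: weight} dictionary per document."""
    idf_values = idf(docs)
    return [{w: f * idf_values[w] for w, f in tf(doc).items()} for doc in docs]

docs = [["moon", "wine", "moon"], ["wine", "river"], ["wine", "frost"], ["wine"]]
weights = tfidf(docs)
# "moon" is frequent in the first document and rare elsewhere, so it outranks "wine".
print(max(weights[0], key=weights[0].get))  # moon
```

Note that with the +1 in the denominator a word that occurs in every document gets a slightly negative IDF, which is harmless for ranking but worth knowing when the weights are used as vector components.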

Improved TF-IDF word vector based on news text features

Since TF considers only word frequency, it ignores structural information about the text, such as its length. The classical TF calculation does not account for text length, so raw counts from texts that contain different numbers of words after segmentation are not on a comparable scale.

In this paper, we consider two ways to improve TF:

The first is “standardization”, i.e., normalizing the traditional TF, which to a certain extent removes the effect of differing text lengths on the scale.

The second addresses the fact that TF, as originally defined, has no upper bound. A keyword that occurs many times in a document is generally more relevant to that document, but there is little reason to believe that relevance grows linearly with the count. Taking the logarithm of TF damps this linear growth.
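The paper does not give explicit formulas for these two variants, so the sketch below uses two common choices as assumptions: dividing counts by document length for normalization, and the sublinear form 1 + ln(count) for the logarithmic variant.

```python
import math
from collections import Counter

def normalized_tf(doc):
    """Length-normalized TF (assumed form): count divided by the number of tokens."""
    counts = Counter(doc)
    return {w: c / len(doc) for w, c in counts.items()}

def log_tf(doc):
    """Log-damped TF (assumed form): 1 + ln(count), so growth is sublinear in the count."""
    counts = Counter(doc)
    return {w: 1.0 + math.log(c) for w, c in counts.items()}

short = ["moon", "wine"]
long_doc = short * 10  # same content repeated ten times
print(normalized_tf(short)["moon"], normalized_tf(long_doc)["moon"])  # 0.5 0.5
```

Normalization makes the short and the long document comparable, while the logarithm keeps a word that occurs ten times from counting ten times as much as a word that occurs once.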

Keyword extraction algorithm based on improved TF-IDF
Traditional TF-IDF Keyword Extraction Algorithm
Basic Algorithmic Ideas

The TF-IDF keyword extraction algorithm uses two statistical features, word frequency and inverse document frequency, to calculate the weights of words. It then ranks the words in the document by weight and selects the top-weighted words as the keywords of the document. In the traditional TFIDF, TF is the word frequency of the feature word in the document and IDF is the inverse document frequency, as shown in the following equation:

$$TFIDF = TF \times IDF = tf \times \log\left(\frac{N}{n_t} + 0.01\right)$$

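Aiming at the defects of the traditional TFIDF algorithm, an improved TFIDF-AG algorithm is proposed, which calculates the weight as follows:

$$TFIDF\text{-}AG = tf \times \log n_t \times \log\left(\frac{N}{n_t} + 0.01\right) \times \left[1 - \frac{\sum_{j=1}^{n}\left(tf - \hat{tf}\right)^2}{(k-1)\,\hat{tf}}\right]$$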

where $tf$ denotes the word frequency, $\hat{tf}$ the average number of occurrences of the feature word per document, $N$ the total number of documents, $n_t$ the number of documents associated with the keyword, and $k$ the total number of documents in the other categories.

Shortcomings of the TF-IDF algorithm

The TF-IDF calculation of keyword weights depends heavily on word frequency and inverse document frequency, which exposes several shortcomings, summarized in the following three points:

Simple linking of word frequency and weights

The TFIDF algorithm described above extracts document keywords mainly from word-frequency statistics: a word that appears frequently in a text is taken to be strongly related to that text, while the inverse document frequency keeps common words from being overweighted. The method has a defect, however: the inverse document frequency lowers the weights of all common words in the text, and the effect is most pronounced in time-sensitive texts.

Failure to consider the impact of the different distribution of words in different documents on their weights

Computing TF-IDF requires two parameters, word frequency and inverse document frequency. Different words with the same values of both parameters generally receive the same weight, i.e., they are treated as equally important. However, if one of two such words is concentrated in several documents of the same type while the other is scattered across documents of different types, the concentrated word should receive the higher weight.

Failure to consider the location of the word information

Keyword extraction should take word position fully into account. The first and last sentences of a text are often summary statements containing words that carry the important information of the document, so words in the first and last sentences should be given higher weights.

Improved TF-IDF algorithm based on information theory

In order to cope with the keyword extraction task of massive text, this paper combines the two concepts of information entropy and relative entropy in information theory and introduces the word position weight factor at the same time, and puts forward the optimization scheme for the algorithm, and the flowchart of the improved algorithm in this paper is shown in Fig. 1.

Figure 1.

TFIDF-BOIT algorithm flowchart

The process of the improved algorithm is as follows:

Step 1: Load the text document dataset, calculate the word frequency and average word frequency of each word in the document set to be processed and save it.

Step 2: Check whether the current word has already been seen; if it has, add 1 to its frequency count from the previous step.

Step 3: For each word, calculate the average word frequency, the probability distribution and the number of documents it belongs to, then calculate its information entropy, relative entropy and IDF value, and save the results as key-value pairs in a Map.

Step 4: Calculate the weight of each word based on the results obtained in the previous steps and sort each word based on the weight.

Information theory foundations

Entropy

In information theory, the greater the uncertainty of a piece of information, the greater its entropy; conversely, the clearer the information a thing contains, the smaller its entropy. The information entropy H(X) is given in (7):

$$H(X) = E[I(X)] = E[-\ln P(X)]$$

In Equation (7), H(X) is the information entropy of X, E denotes the mathematical expectation, and I(X) is the self-information of X [27]. When the number of samples is finite, the entropy can be computed with Equation (8):

$$H(X) = -\sum_{i} P(a_i) \log P(a_i)$$
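Equation (8) can be checked numerically with a short sketch. The function names and the choice of logarithm base are illustrative; the convention 0 · log 0 = 0 is applied by skipping zero probabilities.

```python
import math
from collections import Counter

def entropy(probs, base=math.e):
    """Eq. (8): H(X) = -sum_i P(a_i) log P(a_i), skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def empirical_entropy(samples, base=math.e):
    """Estimate the distribution from observed samples, then apply Eq. (8)."""
    counts = Counter(samples)
    n = len(samples)
    return entropy([c / n for c in counts.values()], base)

print(entropy([0.5, 0.5], base=2))            # 1.0 bit: a fair coin is maximally uncertain
print(empirical_entropy("aaaaaaab", base=2))  # about 0.54 bits: a skewed source is more predictable
```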

Conditional Entropy and Mutual Information

Entropy is used in information theory to measure information. If the uncertainty of a piece of information is high, related information must be supplied to reduce that uncertainty. The concepts of conditional entropy and mutual information are introduced on this basis.

The formula for conditional entropy is shown in (9):

$$H(A \mid B) = -\sum_{a \in A,\, b \in B} P(a,b) \log P(a \mid b)$$

Here $P(a,b)$ is the joint probability of events a and b, and $P(a \mid b)$ is the probability of event a given that event b has occurred. This leads to the following conclusion:

$$H(A \mid B) \le H(A)$$

Mutual information can be used to represent the correlation between two events. It is based on the same principle as relative entropy and is expressed as in equation (11):

$$I(A,B) = \sum_{a \in A,\, b \in B} P(a,b) \log \frac{P(a,b)}{P(a)\,P(b)}$$
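The relations among Equations (9) to (11) can be verified on a small joint distribution. The sketch below uses a made-up 2×2 joint table and checks both H(A|B) ≤ H(A) and the standard identity I(A,B) = H(A) − H(A|B).

```python
import math

def marginals(joint):
    """Marginal distributions P(a) and P(b) from a {(a, b): P(a, b)} table."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return pa, pb

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def conditional_entropy(joint):
    """Eq. (9): H(A|B) = -sum P(a,b) log P(a|b), with P(a|b) = P(a,b) / P(b)."""
    _, pb = marginals(joint)
    return -sum(p * math.log(p / pb[b]) for (a, b), p in joint.items() if p > 0)

def mutual_information(joint):
    """Eq. (11): I(A,B) = sum P(a,b) log( P(a,b) / (P(a) P(b)) )."""
    pa, pb = marginals(joint)
    return sum(p * math.log(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)

# Hypothetical joint distribution in which A and B are positively correlated.
joint = {("x", "u"): 0.4, ("x", "v"): 0.1, ("y", "u"): 0.1, ("y", "v"): 0.4}
pa, _ = marginals(joint)
print(conditional_entropy(joint) <= entropy(pa))  # True: knowing B never increases uncertainty
```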

Relative entropy

Relative entropy, also known as Kullback-Leibler (KL) divergence or information divergence, is an asymmetric measure of the difference between two probability distributions, indicating how closely two things are related. It is defined as shown in Equation (12):

$$K(\varphi(x), \mu(x)) = \sum_{x} \varphi(x) \log \frac{\varphi(x)}{\mu(x)}$$

The value K(φ(x), μ(x)) computed from φ(x) and μ(x) measures how far apart the two distributions are. If φ(x) and μ(x) are identical, then K(φ(x), μ(x)) = 0; the more the two distributions differ, the larger K(φ(x), μ(x)) becomes.
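A short numerical sketch of Equation (12) shows the two properties that matter for keyword weighting: the value is zero only for identical distributions, and it is asymmetric. The distributions are made up, and the function assumes μ(x) > 0 wherever φ(x) > 0.

```python
import math

def kl_divergence(phi, mu):
    """Eq. (12): K(phi, mu) = sum_x phi(x) log( phi(x) / mu(x) ); needs mu(x) > 0 where phi(x) > 0."""
    return sum(p * math.log(p / mu[x]) for x, p in phi.items() if p > 0)

uniform = {"a": 0.5, "b": 0.5}
skewed = {"a": 0.9, "b": 0.1}

print(kl_divergence(uniform, uniform))  # 0.0
print(kl_divergence(uniform, skewed))   # about 0.51
print(kl_divergence(skewed, uniform))   # about 0.37, so the measure is not symmetric
```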

Improved TF-IDF algorithm based on information theory

Let the total number of words in the texts be N, the number of texts in the dataset be D, and the number of words in each text be S = N/D. The user query sentence is T, and a query statement includes several keywords. The weight of each keyword t is W, the probability that the keyword appears in a single text is μ(t), the probability that it appears across all texts is φ(t), and its word frequency is tf(t). iv(t) is the information content of the keyword, which according to the concept of entropy is given by formula (13):

$$iv(t) = -\varphi(t) \log \varphi(t)$$

In the above equation φ(t) denotes the probability of the keyword in the dataset, so the equation can be further written as equation (14):

$$iv(t) = -\frac{tf(t)}{N} \log \frac{tf(t)}{N}$$

Let t1 and t2 be two keywords such that the number of times t1 appears in a single text equals the number of times t2 appears across several texts. Their IDF values cannot simply be treated as equal: from the standpoint of information theory, the distribution of t1 is more concentrated than that of t2, which means that t1 is more strongly related to the text it appears in. Based on this analysis, formula (15) can be derived from formula (14), where D(t) denotes the number of texts containing t:

$$iv(t) = -\frac{tf(t)}{N} \log \frac{tf(t)}{N} = tf(t) \log \frac{S \cdot D}{\mu(t) \cdot D(t)}$$

The TF-IDF algorithm computes TF*IDF, which is written as in Eq. (16):

$$TF \times IDF = tf(t) \log \frac{D}{D(t)}$$

Equation (17) can be derived from equations (15) and (16):

$$TFIDF(t) = iv(t) - tf(t) \log \frac{S}{\mu(t)}$$

Let the keyword relative entropy function be Eq. (18):

$$f(x,y) = K(F(x), F(y)) = F(x) \log \frac{F(x)}{F(y)} = F(x)\left(\log F(x) - \log F(y)\right)$$

Assuming that the total number of words in the text dataset is $L_{sum}$, the number of texts is $N$, and the length of the i-th text is $L_i$, the average length of the texts in the text set is expressed as Equation (19):

$$L_{ave} = \frac{L_{sum}}{N}$$

Setting the modifier of text length to $\delta_1$ and the modifier of word frequency to $\delta_2 \ge 0$, the word frequency control equation is defined as (20):

$$T_c = \frac{(\delta_2 + 1)\, tf_t}{\delta_2\, \delta_1\, \dfrac{L_i}{L_{ave}} + tf_t}$$

The two modifiers in Eq. (20) control the problem of inflated word frequencies in long documents: when the length of a document exceeds the average document length, the denominator grows and $T_c$ shrinks, which suppresses the inflated word frequencies of words in long documents.
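The damping behaviour described here can be checked numerically. The sketch below assumes that Eq. (20) has the BM25-like form T_c = (δ2 + 1)·tf_t / (δ2·δ1·L_i/L_ave + tf_t); the default values δ1 = 0.75 and δ2 = 1.2 are placeholders chosen for illustration, not values reported in the paper.

```python
def average_length(doc_lengths):
    """Eq. (19): L_ave = L_sum / N."""
    return sum(doc_lengths) / len(doc_lengths)

def tf_control(tf_t, doc_len, avg_len, delta1=0.75, delta2=1.2):
    """Assumed form of Eq. (20); delta1 and delta2 are hypothetical default values."""
    return (delta2 + 1) * tf_t / (delta2 * delta1 * doc_len / avg_len + tf_t)

lengths = [100, 200, 600]
avg = average_length(lengths)  # 300.0

# The same raw count of 5 is worth less in the long document than in the short one.
print(tf_control(5, 100, avg) > tf_control(5, 600, avg))  # True
# The controlled frequency saturates below delta2 + 1 however large the raw count is.
print(tf_control(10_000, 300, avg) < 2.2)                 # True
```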

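After taking the information content and relative entropy of the keywords into account, normalizing for document length, and introducing the word position weight factor, the keyword weight model in Eq. (6) is improved to Eq. (21):

$$TFIDF\text{-}BOIT(t) = (\delta_3 + 1)\left(tf_t - F(tf_t)\log\frac{F(tf_t)}{F(tf_{avr})}\right) \times iv(t) \times \frac{(\delta_2 + 1)\, tf_t}{\delta_2\, \delta_1\, \dfrac{L_i}{L_{ave}} + tf_t} \times \log n_t \times \log\left(\frac{N}{n_t} + 0.01\right) \times \left[1 - \frac{\sum_{j=1}^{n}\left(tf_t - \hat{tf}\right)^2}{(k-1)\,\hat{tf}}\right]$$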

Process of emotional analysis of ancient literary communication

Taking the preprocessed dataset of classic works as the corpus, a comprehensive sentiment dictionary is built according to the method proposed in the previous section to prepare for the sentiment analysis process.

Next, the specific process of judging the sentiment polarity of classic works is introduced, as shown in Figure 2.

Figure 2.

Flowchart for computing the sentiment value of classic works

There are three key steps involved in this process: calculating sentiment scores for sentiment words, calculating sentiment scores for clauses, and calculating sentiment scores for complex sentences.

Assuming that the classical work is W, the data-cleaned text is split into compound sentences at the three Chinese punctuation marks “.”, “?” and “!”, and the compound sentences of the text are defined as the set $W = \{W_1, \ldots, W_i, \ldots, W_n\}$, where $W_i$ denotes the i-th compound sentence. B denotes a single compound sentence, and its clauses are defined as the set $B = \{B_1, \ldots, B_i, \ldots, B_n\}$. $S_i$ denotes the inter-sentence rule that assigns sentiment weights to clauses, $T_i$ denotes the sentence-type rule that assigns sentiment weights to compound sentences, and Score denotes the sentiment score.

First, we match words against the comprehensive sentiment lexicon and combine them with contextual information using specific word combination rules, which gives the sentiment score of each sentiment word as modified by negation words and adverbs of degree, as shown in Equation (22):

$$Score(C_i) = score_{sentiment}$$

Sentiment analysis is then performed for each clause $B_i$: the sentiment scores of the sentiment words $C_i$ in the clause are summed and multiplied by the weight $S_i$ that the inter-sentence rule assigns to the clause, giving $Score(B_i)$. The formula is shown below:

$$Score(B_i) = \sum_{i=0}^{n} Score(C_i) \times S_i$$

The sentiment scores of the individual clauses are then summed and multiplied by the weight $T_i$ that the sentence-type rule assigns to the compound sentence, giving the sentiment score of the compound sentence:

$$Score(W_i) = \sum_{i=0}^{n} Score(B_i) \times T_i$$

Finally, summing the sentiment scores of all compound sentences gives the sentiment score of the classic work:

$$Score(W) = \sum_{i=0}^{n} Score(W_i)$$

Based on the obtained emotional scores of the classics, we can judge the emotional polarity of the classics, including positive, neutral and negative. The specific judgment rules are as follows:

If Score(W)>0 , the emotional polarity of the classic work is positive.

If Score(W)<0 , the emotional polarity of the classic work is negative.

If Score(W)=0 , the emotional polarity of the classic work is neutral.
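The three-level scoring scheme (word, clause, compound sentence) and the polarity rule can be sketched as follows. Everything concrete here is an assumption made for illustration: the tiny English lexicon, the negation and degree-adverb lists, the rule that a clause starting with "but" gets weight 1.5, and the sentence-type weights for "!" and "?". The paper's actual dictionary and rules are not reproduced.

```python
import re

# Hypothetical toy lexicons; a real comprehensive sentiment dictionary is far larger.
SENTIMENT = {"good": 1.0, "brave": 1.5, "bad": -1.0, "cruel": -1.5}
NEGATIONS = {"not", "never"}
DEGREE = {"very": 1.5, "slightly": 0.5}
SENTENCE_WEIGHT = {"!": 1.5, "?": 0.5}  # assumed sentence-type rule T

def word_score(tokens, i):
    """Score(C_i): lexicon score modified by negations and degree adverbs directly before it."""
    score = SENTIMENT[tokens[i]]
    j = i - 1
    while j >= 0 and (tokens[j] in NEGATIONS or tokens[j] in DEGREE):
        score *= -1.0 if tokens[j] in NEGATIONS else DEGREE[tokens[j]]
        j -= 1
    return score

def clause_score(clause):
    """Score(B_i): sum of word scores times the clause weight S (assumed 'but' rule)."""
    tokens = clause.lower().split()
    s_weight = 1.5 if tokens and tokens[0] == "but" else 1.0
    return s_weight * sum(word_score(tokens, i) for i, t in enumerate(tokens) if t in SENTIMENT)

def sentence_score(sentence, terminator):
    """Score(W_i): sum of clause scores times the sentence-type weight T."""
    clauses = [c for c in re.split(r"[,;]", sentence) if c.strip()]
    return SENTENCE_WEIGHT.get(terminator, 1.0) * sum(clause_score(c) for c in clauses)

def work_score(text):
    """Score(W): sum over all compound sentences, split at '.', '?' and '!'."""
    return sum(sentence_score(s, term) for s, term in re.findall(r"([^.?!]+)([.?!])", text))

def polarity(score):
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(work_score("The hero is very brave."))                  # 2.25
print(work_score("He is not good, but the ending is bad!"))   # -3.75
print(polarity(work_score("The river flows east.")))          # neutral
```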

Practical application of models
Word Frequency Analysis of the Ancient Literary Work “The West Wing”

After removing the parts of the ancient drama “The West Wing” that belong neither to the play itself nor to its staging, such as the table of contents, preface and character introductions, only the stage directions and the characters’ speech and actions were imported. The 100 most frequent words were then obtained. After further removing the speaker labels that precede each line of dialogue and the words that direct characters to enter or leave the stage, the top 20 words are as shown in Fig. 3.

Figure 3.

High frequency vocabulary

Modal words appear frequently: “oh” appears 164 times and “hum” 47 times. Many phrases also appear, such as “outside sound”, “nothing”, “sigh” and “I told you”. That phrases rank among the high-frequency words reflects characteristic features of the author’s language habits and of the way characters are presented. Different types of literary works also show different vocabulary characteristics: “nothing” is a typical colloquial word, while “sigh”, which describes a character’s action, is a high-frequency word in the drama “The West Wing”.

Next, we will analyze the distribution and characteristics of high-frequency words in the work. The most frequently used word is the word “no”, and Figure 4 shows the distribution of this word throughout the play.

Figure 4.

“No” in the full text

The word “no” runs through the whole of The West Wing, and its distribution is relatively even and dense. Screening out the statements that contain “no” shows that all 172 occurrences are used as independent words, none of them with a modifier. Through the repeated use of this negative word, the author shapes the oppressive atmosphere of the plot and its social environment, and reflects the characters’ struggle and outcry against the shackles of the world.

The second most frequent word is “oh”, which appears 164 times and is the most frequent modal word. Its distribution throughout the play is shown in Figure 5. The word is fairly evenly distributed in The West Wing, denser in the first part of the play and sparser in the later part. Screening out the statements that contain “oh” with the KWIC function shows that, like “no”, all 164 occurrences of “oh” appear as an independent particle rather than as a sentence-final particle.

Figure 5.

“Oh” in the full text

The third most frequently used word is “mom”, which appears 143 times and is the most frequent form of address, as shown in Figure 6. Its occurrences are dense in places but spread across the whole play. Screening the sentences that contain “mom” shows that the word always appears on its own. A single form of address is thus used to convey the complex relationships between the characters, which shows the meticulousness of the author’s language.

Figure 6.

“Mom” in the full text

Emotional Analysis of Film and Television Communication of Ancient Literary Works

Romance of the Three Kingdoms tells the story of the social unrest of the late Eastern Han Dynasty, the struggle among rival warlords for supremacy, and the eventual reunification of the three kingdoms under the Jin Dynasty. We chose the pop-up comments on “Romance of the Three Kingdoms” for sentiment analysis for three reasons. Firstly, it is a classic familiar to most people in China; even those who have not read it in full know some of its storylines, such as the Oath of the Peach Garden, the Burning of Red Cliffs, and the tearful execution of Ma Su. Secondly, it ranked second in total number of pop-up comments, after “Foreign Daughter-in-Law”, and second in average number of pop-up comments per episode among TV series, after “Shameful but Useful Escape”. Finally, the story is full of ups and downs and rich in emotion, which makes it well suited to sentiment analysis.

Pop-up Sentiment Analysis of the First Episode of Romance of the Three Kingdoms

In this section, the pop-up comments on the first episode of Romance of the Three Kingdoms are used as the data for sentiment analysis. Comments with no emotional tendency (46%) and with a positive tendency (38%) make up most of the data, while comments with a negative tendency account for only about 16%.

Figure 7 shows the per-minute distribution of pop-up sentiment in the episode. The beginning and the end of the episode are where the opening and closing theme songs are played, and many viewers skip these two parts, so the number of pop-up comments changes sharply there. Apart from the theme songs, the number of pop-up comments stays roughly flat over the rest of the episode, which suggests that the audience’s liking for this episode remained at a high level throughout.

Figure 7.

Per-minute distribution of pop-up comment sentiment
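A per-minute distribution of this kind can be produced by bucketing classified comments by their timestamp. The sketch below is illustrative only: the input format (timestamp in seconds plus a label from the classifier) and the "positive minus negative" liking index are assumptions, since the paper does not specify how its curves are computed.

```python
from collections import defaultdict

def per_minute_counts(comments):
    """comments: iterable of (timestamp_seconds, label), label in {'pos', 'neg', 'neu'}."""
    buckets = defaultdict(lambda: {"pos": 0, "neg": 0, "neu": 0})
    for ts, label in comments:
        buckets[int(ts // 60)][label] += 1
    return dict(buckets)

def liking_trend(buckets):
    """Assumed liking index: positive minus negative comments in each minute."""
    return {minute: c["pos"] - c["neg"] for minute, c in sorted(buckets.items())}

# Hypothetical classified comments: (seconds into the episode, predicted label).
comments = [(5, "pos"), (59.9, "neg"), (60, "pos"), (125, "neu")]
buckets = per_minute_counts(comments)
print(liking_trend(buckets))  # {0: 0, 1: 1, 2: 0}
```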

There are two watersheds in the chart, one around the 28-minute mark and the other around the 44-minute mark. Before the 28th minute, positive pop-up comments far outnumber negative ones: Liu Bei, Guan Yu and Zhang Fei go from meeting one another to swearing brotherhood and then fight the Yellow Turban Army together. Viewers are very satisfied with the plot and the characters in this stretch, so the number of positive comments stays high while the number of negative comments stays low. At the 28th minute, the number of positive comments drops sharply and the number of negative comments rises sharply, and this ratio persists until the 44th minute. This is the period in which the three brothers are bullied by the inspector (duyou), and the viewers are indignant. At the 44th minute, positive comments increase and negative comments fall sharply: the inspector is whipped, the audience finally vents its anger at him, and it approves of the brothers’ behavior.

The trend of the audience’s liking for the plot is shown in Fig. 8. It clearly shows a reversal between the 28th and 44th minutes. Apart from the beginning and the end of the episode, the audience’s liking peaks at around the 27th minute, which coincides with the familiar Oath of the Peach Garden.

Figure 8.

Trend in the audience’s liking for the plot

Comparative analysis with Water Margin

To further validate the accuracy of the trained sentiment analysis model, this section analyzes the audience’s liking for the plot of the first episode of Water Margin for comparison.

The sentiment trend of the audience for the first episode of Water Margin is shown in Figure 9. The comparison clearly shows that the audience enjoyed the first episode of Romance of the Three Kingdoms more than that of Water Margin, which is also one of the four great classical novels.

Figure 9.

Trend in the audience’s liking for the plot

The curve shows a trough at about 4 to 6 minutes, a peak at about 6 to 8 minutes, a relatively smooth transition from about 14 to 26 minutes, another trough from about 26 to 33 minutes, two troughs between 34 and 42 minutes, and finally a return to a high point.

The first trough coincides with the period when Gao Qiu is still a street ruffian, causing trouble in the streets with his disreputable friends. The first peak that follows is when Wang Jin, who cannot stand Gao Qiu’s antics in the street, steps in and beats Gao Qiu and his companions, a plot point the audience enjoys. The subsequent plot points follow the same pattern. From this analysis, it can be seen that the audience likes to see righteous characters fight for justice, dislikes villains, and does not like to see good people being bullied.
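The peaks and troughs read off the curve above could also be located programmatically. The following is a minimal sketch under stated assumptions: it takes a list with one liking score per minute (the score values and the function name `local_extrema` are hypothetical, and the paper does not state how it identified the extrema) and reports strict local maxima and minima.

```python
def local_extrema(scores):
    """Return (peaks, troughs) as lists of indices (minutes) in `scores`.

    A peak is strictly higher than both neighbours and a trough strictly
    lower; the first and last points and flat runs are ignored.
    """
    peaks, troughs = [], []
    for i in range(1, len(scores) - 1):
        left, here, right = scores[i - 1], scores[i], scores[i + 1]
        if here > left and here > right:
            peaks.append(i)
        elif here < left and here < right:
            troughs.append(i)
    return peaks, troughs

# Toy per-minute liking scores, invented for illustration only.
scores = [0.6, 0.5, 0.3, 0.4, 0.7, 0.8, 0.6, 0.6, 0.2, 0.5]
peaks, troughs = local_extrema(scores)
print(peaks, troughs)
```

Each index returned can then be matched against the plot at that minute, as is done by hand in the discussion above.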

Applying this paper’s model to the TV adaptations of two classical literary works shows that the model captures changes in the audience’s emotions fairly accurately.

Suggested paths for the dissemination of ancient Chinese literary works in the context of new media
Purify the communication environment and ensure high-quality communication content

In the context of new media, “everyone has a microphone” has become a reality. At the same time, however, not everyone is able to disseminate effective information and create high-quality content. Ancient literature, as a treasured part of the historical heritage, requires the participation of professionals. Strengthening supervision and cultivating professionals in the open new media environment is therefore of great significance for purifying and standardizing the communication space of ancient literature.

Self-discipline and external regulation

Combining “self-discipline” with “external regulation” is, to some extent, an effective measure for purifying the new media communication environment of ancient literature and for improving the quality of communication content about the excellent traditional Chinese culture that ancient literature represents. As far as “self-discipline” is concerned, secondary creation based on ancient Chinese literature should respect historical standards and facts. Content creators, publishers, audiences and relevant government departments should bear in mind their historical mission and firmly resist erroneous views.

Encouraging talent participation and strengthening professional teams

New media offer advantages for the dissemination of ancient literature, but they also place higher demands on it. Communicators in the new era need a certain understanding and appreciation of ancient literature in terms of content, and familiarity with the history and experience of traditional dissemination of the ancient literary classics. They also need keen audience awareness and foresight in the use of new media technology, and must master the editing, production and communication process, so as to strengthen the dissemination effect of ancient literature.

Developing cultural heritage awareness and strengthening mainstream communication channels
Strengthening reading awareness and popularization activities

In the new media environment, ancient literature is insufficiently disseminated, and fragmented reading limits the understanding of it. The state, society and individuals should all take up the responsibility for cultural inheritance, take the initiative to use new media forms for training in reading the original texts and in “deep reading”, and consciously strengthen the in-depth study of classical literature. Taking the National Library and the Himalaya app as examples, the National Library can use information technology to cooperate with university libraries and community centers in carrying out public welfare activities, so as to improve the understanding and study of the traditional cultural resources represented by ancient literature among students and other social groups.

Adherence to media campaigns and strengthening of platforms

Actively using different new media platforms to enhance the presentation of ancient literary resources is also a current focus of communication. Social platforms such as Weibo and WeChat and short-video platforms such as Douyin and Kuaishou are mainstream communication channels, and they are already effective ways for people to obtain information and for ancient literary resources to be promoted. Official organizations, cultural institutions and publishing houses can open accounts on these platforms to interpret and promote ancient literary works by writing articles, releasing short and long videos, and hosting live broadcasts, thereby attracting young readers and non-student groups to read, follow and share them.

Enhancing the utilization of new media and improving online communication methods
Enhance the sense of innovation and develop diversified art forms

Expressing different content according to the characteristics of each medium, and adopting a differentiated product strategy for different audiences so as to avoid fierce competition with similar products from other enterprises, is in effect an application of the long-tail effect. Using the integration of technology and art to differentiate the various types of ancient literature is an important way to strengthen the use of new media and enhance the communication effect. Developing diversified forms is the prerequisite for such differentiated communication.

Implementation of brand effect, online and offline dual-track communication

Ancient literary resources are a product of history and culture, and the experience of publishing them through traditional channels is more mature than that gained in the new media period. Publishing houses therefore have a natural advantage in ancient literary resources. In this context, publishers should actively carry out digital transformation and strengthen both their sense of cultural responsibility and their commercial operating mindset, so as to follow the laws of cultural and commercial promotion in the new media context and to maximize the cultural and economic benefits of ancient literary resources. At the same time, the original communication channels and methods should not be abandoned: paper books, periodicals and newspapers, radio and television remain major forms of communication for ancient literary resources in the new media era. Retaining them is of great significance for perfecting diversified communication paths and forming benign interaction among them.

Conclusion

Based on text classification technology, this paper designs an improved TF-IDF keyword extraction algorithm to carry out word frequency and sentiment analysis of ancient literary works and their transmission paths. Applying the model to the classic ancient dramatic work “The Story of the Western Chamber” shows that the frequency of tone words is high; for example, “oh” appears 164 times and “hmph” appears 47 times, which reflects the author’s language habits and careful design of the characters’ images. We then analyze the pop-up texts for the TV adaptation of the famous novel Romance of the Three Kingdoms and find that the proportion and timing of pop-ups in the three categories of positive, negative, and neutral are closely related to the plot, and that the online communication path can achieve emotional resonance between ancient literary works and the audience. Finally, we propose paths for the dissemination of ancient Chinese literature in the context of new media.
