Research on Red Cultural Inheritance and Application of SVM Support Vector Machine in Sentiment Analysis

Red culture is a form of culture created by the Chinese people during the revolution and construction periods, with distinctive characteristics of its times and lasting historical significance. It is a critical culture that emphasizes both the criticism and the transformation of reality. It has strong practical significance and has, to a certain extent, promoted China’s modernization [1-4]. Maintaining and inheriting red culture is an important task in China and an important part of building socialist core values, and it matters greatly for carrying forward those values and for strengthening the national spirit and national cohesion [5-8]. In the new era, the effective inheritance and promotion of red culture can be ensured by strengthening red education, organizing traditional red culture festival activities, and promoting the spirit of red culture [9-11].

Sentiment analysis refers to identifying the emotional tendency of text from a large amount of text data, where the tendency is generally categorized as positive, negative, or neutral. Sentiment analysis draws on natural language processing, machine learning, and data mining, and the most commonly used methods today are based on machine learning [12-15]. It has a wide range of application scenarios, such as monitoring users’ evaluations of branded products or services, analyzing consumers’ interests and purchase behavior, and predicting stock market sentiment [16-18]. SVM is a supervised learning method that can be used for classification and regression, and it handles both binary and multi-class classification problems. The core idea of SVM is to construct a hyperplane that separates data of different classes [19-21]. SVM is widely used in sentiment analysis, mainly in two ways: the first is the word-frequency-based SVM method, which applies text categorization algorithms to sentiment analysis; the second is the word2vec-based SVM method, which uses word vectors for sentiment analysis [22-24].

This paper first briefly describes the machine-learning-based text sentiment classification process, the support vector machine, and text sentiment analysis. It then introduces red cultural heritage and analyzes its development in the all-media era and the problems it faces. On this basis, we design an SVM-based prediction model for the tendency to inherit red culture: we collect and organize corpus information, select the NLPIR Chinese word segmentation system, construct feature vectors, and determine the tendency. A multilevel SVM classification model is designed according to a sentiment classification scheme. Online comment texts are used as training data to compare the performance of the SVM classifier before and after optimization. Finally, online comment texts about red culture festivals are used to obtain the probability of netizens’ tendency towards red culture inheritance.

2

Foundations of sentiment analysis

2.1

Machine Learning Based Text Sentiment Classification

Text sentiment tendency classification is a popular research direction in the field of text categorization, and the current mainstream approach is statistical machine learning. The objects of the sentiment classification research in this paper are all Chinese texts (on red culture), and the overall process of text sentiment classification is shown in Figure 1.

The process of text sentiment tendency classification is mainly divided into the following steps:

1)

Text pre-processing. After the text is captured, it is preprocessed, which includes merging and splitting document paragraphs, word segmentation, stop-word filtering, and so on.

2)

Feature extraction. In the process of classifying the emotional tendency of text based on machine learning, the feature extraction part plays a crucial role, and the part that best represents the emotional characteristics of the text should be extracted.

3)

Training phase. After the text features are extracted, the samples with category labels are used as training data. The computer automatically learns the classification rules and obtains a classifier model through learning, that is, a decision function that accepts new samples without category labels and outputs their categories.

4)

Testing phase. The testing phase is the last phase of the classification module, where the test is conducted on new unlabeled samples. That is, the classifier model generated during the training phase is used to automatically discriminate the category of unlabeled samples. By comparing the discriminative labels of the classifier model with the real sample labels, the advantages and disadvantages of the classification effect of the classifier can be derived, and then the performance of the classifier can be measured and evaluated.
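The comparison between predicted and real labels described in the testing phase can be sketched as follows. This is a minimal illustration for a binary task; the function name `binary_metrics` and the toy label lists are assumptions made here, not part of the experiments.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1, treating `positive` as the target class."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical): 1 = positive sentiment, 0 = negative sentiment.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

For a multi-class scheme, the same function could be called once per class with a different `positive` value and the results averaged.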

2.2

Support Vector Machines (SVM)

Support Vector Machine (SVM) has a wide range of applications. It has a solid theoretical foundation in statistics. It has played an important role in the fields of text classification, handwriting recognition, tampered image detection, etc. The main objective of SVM algorithms is to find an optimal classification hyperplane, which can be reduced to solving an optimization problem [25-26].

Optimization is a branch of applied mathematics that studies how to find the optimal value of an objective function, either without constraints or under a limited set of constraints. According to the objective function and constraints, optimization problems can be divided into the following three categories:

1)

Unconstrained optimization problems, as shown in Eq. (1):

$$\min f(x) \tag{1}$$

where f(x) is the objective function to be minimized. The general solution follows Fermat’s theorem: take the derivative of f(x), set it to 0 to find candidate points, and then verify them.

2)

Optimization problems with equality constraints. The objective function and constraints are shown in Eq. (2):

$$\begin{array}{c} \min f(x) \\ \text{subject to } h_i(x) = 0,\; i = 1,2, \ldots, n \end{array} \tag{2}$$

where f(x) is the objective function and h_i(x) = 0 are the equality constraints. This type of optimization problem can in most cases be solved with the Lagrange multiplier method. That is, the equality constraints and the objective function are combined into one expression through a Lagrange multiplier α, as shown in Eq. (3):

$$L(\alpha ,x) = f(x) + \alpha h(x) \tag{3}$$

The derivative of L(α, x) with respect to each parameter is then set to 0, and the resulting system of equations is solved simultaneously.

3)

Optimization problems with inequality constraints. The objective function and constraints are shown in Eq. (4):

$$\begin{array}{c} \min f(x) \\ \text{subject to } g_i(x) \leq 0,\; i = 1,2, \ldots, n \\ h_j(x) = 0,\; j = 1,2, \ldots, m \end{array} \tag{4}$$

where f(x) is the objective function, g_i(x) ≤ 0 are the inequality constraints, and h_j(x) = 0 are the equality constraints. Optimization problems containing inequalities are generally solved using the KKT conditions. The KKT method is a generalization of the Lagrange multiplier method: using two multipliers, it writes the inequality constraints, the equality constraints, and the objective function in a single expression, as shown in Eq. (5):

$$L(a,b,x) = f(x) + a \cdot g(x) + b \cdot h(x) \tag{5}$$

The KKT conditions require that an optimal point satisfy the following three conditions:

1)

The derivative of L(a, b, x) with respect to x is 0.

2)

h(x) = 0.

3)

a · g(x) = 0.

The candidate optimal values are obtained by solving the above three equations, keeping only the solutions with a ≥ 0 and g(x) ≤ 0.
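As a small worked illustration of these conditions (the example problem is chosen here for illustration and does not come from the experiments), consider minimizing f(x) = x² subject to g(x) = 1 − x ≤ 0, with no equality constraint. The combined expression and the conditions become:

$$L(a,x) = x^2 + a(1 - x)$$

$$\frac{\partial L}{\partial x} = 2x - a = 0, \qquad a(1 - x) = 0$$

If a = 0, the first equation gives x = 0, which violates 1 − x ≤ 0 and is discarded. Otherwise x = 1, which gives a = 2 ≥ 0, so the candidate optimum is x = 1 with f(x) = 1.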

2.3

Text Sentiment Analysis

2.3.1

Text model

Latent Semantic Analysis (LSA) is also known as Latent Semantic Indexing [27]. The essence of the method is to uncover the real semantics of the words in a document and then to mine the document’s word-independent topics, i.e., its latent semantic topics. It thereby addresses the inability of the vector space model (VSM) to consider latent semantics. Specifically, a large collection of documents is modeled in a space of reasonable dimension, and the documents are represented in this space by means of Singular Value Decomposition (SVD) and dimensionality reduction. Dimensionality reduction is the most important step in LSA. It removes the “noise”, i.e., irrelevant information in the documents, so that the semantic structure gradually emerges. Compared with the VSM, dimensionality reduction reduces the computational effort and makes the semantic relationships clearer.

With SVD, the document-word matrix X can be decomposed into three matrices:

$$X = U\Sigma V^T \tag{6}$$

where U and V are unitary (orthogonal) matrices and Σ is the diagonal matrix of the singular values of X.

Setting the smallest r − k singular values of Σ to 0, where r is the rank of X, gives Σ_k. The low-rank approximation of X is then obtained by Eq. (7):

$$C_k = U\Sigma_k V^T \tag{7}$$

The general steps of LSA can be summarized as follows:

Step 1: Analyze the document collection and establish the “document-feature word” matrix (TD).

Step 2: Perform singular value decomposition (SVD) on the TD matrix.

Step 3: Perform dimensionality reduction on the decomposed matrix, that is, the low-rank approximation mentioned earlier.

Step 4: Use the reduced matrix to construct the latent semantic space or reconstruct the TD matrix.

2.3.2

Automatic text categorization techniques

Text categorization is the process of establishing a mapping from texts to categories. It maps each text to be categorized onto the existing categories. The mapping can be one-to-one or one-to-many, because a text can relate to more than one topic. The mathematical description is as follows:

$$f:A \to B \tag{8}$$

where A is the collection of text data to be classified, B is the collection of categories in the classification system, and f is the mapping rule.

The mapping rule f of automatic text classification consists of the discriminant formulas and rules that the system establishes by summarizing the regularities of classification from a number of known samples in each category. When a new text is encountered, the category to which it belongs is determined according to these rules.

1)

Text Preprocessing

The primary task of text preprocessing is to handle noise and irregular information. Taking web page text as an example, the acquired text contains a large amount of HTML markup. This markup controls the layout and display of the page but has essentially no value for classification. After such interference is removed, the second preprocessing step is word segmentation. The difficulty of word segmentation lies in determining the smallest element of the vocabulary, that is, the most basic semantic unit.

For text written with an alphabet, converting all capital letters to lowercase and using non-alphabetic characters such as spaces and punctuation marks as separators easily turns the text into a list of semantic units (words).
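The preprocessing described above can be sketched for alphabetic text as follows. The regular-expression tag stripping is deliberately naive, and the stop-word set is illustrative only; both are assumptions made for this sketch. Chinese text would instead need a dedicated word segmenter.

```python
import re


def tokenize(text, stop_words=frozenset()):
    """Strip HTML tags, lowercase, split on non-alphabetic characters, drop stop words."""
    no_markup = re.sub(r"<[^>]+>", " ", text)        # naive removal of HTML tags
    tokens = re.split(r"[^a-z]+", no_markup.lower())  # separators: anything that is not a-z
    return [tok for tok in tokens if tok and tok not in stop_words]


STOP_WORDS = frozenset({"the", "a", "of", "is"})  # illustrative list, not from the paper
print(tokenize("<p>The Spirit of the Long March is ALIVE!</p>", STOP_WORDS))
```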

2)

Text Representation

The more widely used weight calculation methods in automatic text categorization include Boolean weights, word frequency weights, TF*IDF weights, TFC weights, LTC weights, and entropy weights. Among them, the TF*IDF weight is the most widely used numerical weighting method in text processing. It rests on two observations: ① the more times feature i appears in document j, the more important it is; ② the more documents in the document set that contain feature i, the less important it is. The TF*IDF method is statistically motivated, very simple to construct, and has shown good performance in practical applications. It has several variants, of which a commonly used one is Eq. (9):

$$a_{ij} = f_{ij} \times \log \left( \frac{N}{n_i} \right) \tag{9}$$

where f_ij denotes the frequency of occurrence of word i in document j, N is the total number of texts in the text collection, and n_i is the number of texts in the collection that contain word i.

3)

Text features

When text is represented as a feature vector, the original feature set consists of all words that appear in the text set. Feature dimensionality reduction serves two main purposes. First, training and classifying directly in such a high-dimensional feature space requires too much computation, and dimensionality reduction improves the execution efficiency and running speed of the program. Second, not all words matter equally for text categorization. Some general terms that are prevalent in all classes contribute little to classification, whereas terms that appear with a large proportion in one class and a small proportion in the others contribute a great deal. Removing the former through dimensionality reduction improves the generalization ability of the classifier.

Feature dimensionality reduction is the selection of a subset from the original feature set $T = \{t_1, t_2, \cdots, t_s\}$:

$$T' = \{t_{p_1}, t_{p_2}, \cdots, t_{p_s}\} \tag{10}$$

satisfying p_s ≤ s, where s is the size of the original feature set and p_s is the size of the feature set after dimensionality reduction. The criterion for selection is that the classification accuracy can be effectively improved after feature dimensionality reduction. Feature dimensionality reduction does not change the nature of the original feature space, but only selects a part of the important features from the original feature space to form a new low-dimensional space.

3

SVM-based prediction model of red cultural heritage

3.1

Red Cultural Heritage

1)

Connotation of Red Culture

Understanding the connotation of red culture presupposes a clear understanding of the imagery expressed by “red”. This imagery can be summarized in two aspects: first, the traditional meaning of red in the hearts of the Chinese people; second, the symbolism of red in the international communist movement.

In order to deeply understand the red culture, we should also excavate its core elements and realize its essence. First, the red culture embodies the lofty beliefs of the Communist Party.

Secondly, red culture embodies the Communist Party’s pursuit of its mission. Always concerned with the fundamental interests of the people and with the future of the Chinese nation, the Communist Party bravely shoulders the responsibility entrusted by the times and has become the people’s most reliable mainstay.

Finally, red culture manifests the fine traditions of the Communist Party. As the Party developed from weak to strong, the Communists forged the qualities of hard struggle, courageous sacrifice, and innovation. These fine traditions have planted the inner gene of red culture.

2)

Extension of red culture

On the basis of a correct understanding of the connotation of red culture, the extension of red culture is further clarified in terms of the phasing of time clues and the classification of different forms.

3.1.1

Red Cultural Inheritance in the All-Media Era

In the all-media era, technical conditions and changes in the social environment have raised new problems and brought new challenges to the inheritance of red culture. To do this work well, we should summarize past results on the basis of an in-depth study of the current outstanding problems, analyze the development opportunities contained in new trends and the new environment, and then use these favorable factors to improve the relevant work.

In recent years, the form of red culture carriers has been further expanded, gradually developing into various thematic educational activities that incorporate the characteristics of the new era. For example, in 2018, the Central Committee of the Communist Youth League organized and launched a series of learning activities for the youth, which included online knowledge contests, essay contests, speech contests, fun question and answer contests, and other forms, to further promote the normalization of red culture education, which is rich in connotations and has a remarkable effect.

Through TV programs, graphic news, webcasts, and other channels, red culture has entered grass-roots organizations such as enterprises, public institutions, rural areas, and communities. Spreading gradually from individual points to wider areas, it breaks audience limitations and creates a trend in society, so that more people come into contact with and enjoy red culture.

3.1.2

Problems facing the inheritance of red culture

In recent years, the Party and the State have attached great importance to the inheritance of red culture and given support in various aspects, but deficiencies in theoretical construction and social synergy still exist. People’s understanding of red culture is uneven, and their attitudes towards inheritance work differ widely. The main problems facing the inheritance of red culture fall into the following aspects.

1)

Changes in audience thinking

From the audience’s point of view, red culture inheritance work faces deconstruction by negative factors in modern ideas. Since the reform and opening up, politics, the economy, culture, society, and other fields have undergone great changes, and rapidly advancing information technology and high technology have profoundly changed all aspects of people’s food, clothing, housing, and transportation. To a certain extent, this has reshaped people’s values, ways of thinking, and modes of action, and has brought new challenges to the inheritance of red culture.

2)

Lack of Innovation in Narrative Mode

From the perspective of narrative mode, the inheritance of red culture faces the problems of outdated discourse content and stereotyped expression. If the narrative of a culture cannot keep up with the times, then however advanced and excellent the culture is, it cannot develop in the long term, and its inheritance and continuation are out of the question.

3)

The discourse system needs to be improved

From the perspective of discourse construction, the red cultural heritage is facing challenges due to multiple trends and the impact of Western ideology. As a Marxist ideology, the improvement of the discourse system of red culture is related to the flag and the soul.

The reality is that in the era of all-media, the developed media platform integrates the functions of information acquisition, processing, production, and dissemination, resulting in a fundamental change in the way of modern information dissemination. The immediacy, openness, and interactivity of media are prominent advantages over traditional media, providing people with more freedom of speech. This freedom has completely subverted the concept of “gatekeeper” in the process of information dissemination, so that different information and discourse expressions, such as mainstream and non-mainstream, elegant and vulgar, positive and negative, coexist in the society at the same time. The power of discourse has been slowly spreading from official institutions to the general public, and multiple voices have been exchanged and clashed. This has created a complex situation of mixed messages, weakened the dominant power of the red discourse, and made the inheritance of red culture difficult.

3.2

Hierarchical SVM prediction model of red cultural inheritance tendency

This study mainly includes data preprocessing and model training, as well as the corresponding parameter optimization strategy. The framework for the red cultural inheritance tendency prediction model based on SVM is shown in Figure 2.

The specific process of data preprocessing is shown in part A of the figure. Each webpage text S to be tested is used as model input, and the input webpage text is preprocessed. The output of the data preprocessing process is the vectorized representation of the webpage text to be tested.

The main part of the hierarchical SVM model is given in part B of the figure. The vector output from part A is the input to part B. After the prediction modeling, the probability of the user’s tendency to red cultural inheritance is output.

3.2.1

Corpus collection and pre-processing

A social media website is used as the source of experimental data and the object of analysis. The experiments use Python’s BeautifulSoup to crawl and parse page elements and adopt a parallel crawling strategy to obtain the Chinese text data.

The text of webpages within a certain period of time is collected as the original dataset. Because web page text comes in many types, the filtering applied during crawling is not sufficient to yield clean experimental data. Therefore, the crawled data needs to be manually filtered and labeled.

The experimental data contains the following information: user ID, homepage address, text content, release date (specific time period), and URL. It is important to state that the methods and data used in the text are privacy-protected.

Although the number of webpage texts is fixed, the complexity and diversity of Chinese expressions lead to a certain degree of redundancy in the collected data. Therefore, the data needs to be preprocessed so that sentiment information can be extracted from the unstructured data.

In order to more accurately detect whether the text content of web pages contains red cultural inheritance tendency, this paper focuses on the field of psychology, while taking into account the Internet terminology and Internet spoken expressions, etc., collects words and other expressions related to red cultural inheritance, and establishes a hierarchical classification scheme that includes different mental states.

Based on the sentiment classification scheme, each webpage text is manually labeled according to its content.

After each text is labeled for classification and initially preprocessed, the processed data is vectorized using Word2Vec, i.e., each preprocessed web page text is converted to a vector $V_i(x_1, x_2, x_3, x_4, \ldots, x_k)$ and the vectors are processed in matrix form. In this experiment, similarity between words is also taken into account to find similar words in different webpage texts. Similarity is calculated from the Euclidean distance, as in Eq. (11):

$$S = \sqrt{\sum\limits_{i = 1}^k \left( x_{1i} - x_{2i} \right)^2} \tag{11}$$

where x_{1i} and x_{2i} are the i-th components of the two vectors being compared, and the similarity is p = 1/(1 + S). Similar words are assigned the same weight and classified in the same subcategory; the number of such words, i.e., the value of k, can be customized.
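The similarity calculation can be sketched as follows, reading Eq. (11) as the standard Euclidean distance between two k-dimensional vectors and converting it with p = 1/(1 + S). The function names and the toy vectors are assumptions made for illustration; real inputs would be Word2Vec vectors.

```python
import math


def euclidean_distance(u, v):
    """S: Euclidean distance between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similarity(u, v):
    """p = 1 / (1 + S): equals 1 for identical vectors and tends to 0 as they move apart."""
    return 1.0 / (1.0 + euclidean_distance(u, v))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
print(similarity([0.0, 0.0], [3.0, 4.0]))
```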

3.2.2

Hierarchical SVM modeling

The basic SVM model finds the best hyperplane in a particular space in order to solve binary classification problems. With the introduction of kernel functions, the model can also be used to solve nonlinear problems. In this paper, a 3-layer hierarchical classification model is constructed based on SVM according to the sentiment classification scheme to predict the probability of web users’ tendency to inherit red culture. The model uses the default RBF kernel function. The classification target of each layer builds on the result of the neighboring layer. For the first layer of classification, the separating hyperplane is shown in Eq. (12):

$$f_1(x) = \omega_1^T\varphi(x) + b_1 \tag{12}$$

where ω_t is the normal vector of the hyperplane of layer t, which determines the direction of the separating plane, and b_t is the displacement term, which determines the distance between the separating plane and the origin. The goal of this first separating hyperplane is to determine whether the web page text content is emotionally relevant. Similarly, the separating hyperplane for the second level of classification is shown in Eq. (13):

$$f_2(x) = \omega_2^T\varphi(x) + b_2 \tag{13}$$

where the constant b₂ has the same meaning as the corresponding term in Eq. (12), and the determination of this hyperplane builds on the first level of classification. Its main goal is to separate web texts containing negative emotions from those containing non-negative emotions. The third layer of classification is the most important part of this experiment and operates on the data filtered through the first two layers. For each layer, finding the maximum-margin separating hyperplane can be transformed into solving its dual problem. In this experiment, the Lagrange multiplier method is used to obtain the dual problem, starting from the Lagrangian in Eq. (14):

$$L(\omega ,b,\alpha ) = \frac{1}{2}\left\| \omega \right\|^2 + \sum\limits_{i = 1}^m \alpha_i \left( 1 - y_i\left( \omega^T x_i + b \right) \right) \tag{14}$$

The heart of the method is to take the derivatives of L with respect to ω and b and set them to 0. The dual problem that results is Eq. (15):

$$\begin{array}{l} \max\limits_\alpha \sum\limits_{i = 1}^m \alpha_i - \frac{1}{2}\sum\limits_{i = 1}^m \sum\limits_{j = 1}^m \alpha_i \alpha_j y_i y_j x_i^T x_j \\ \text{s.t. } \sum\limits_{i = 1}^m \alpha_i y_i = 0,\; \alpha_i \geq 0,\; i = 1,2,3, \cdots, m \end{array} \tag{15}$$

Several experiments show that parameter optimization can have a significant impact on the data analysis results. For SVM models with the RBF kernel function, the parameters C and γ need to be optimized. C is the penalty coefficient that sets the tolerance for classification errors, and γ is the kernel function’s own parameter. A higher value of C imposes stricter requirements on model accuracy, and γ determines how the original data are distributed after being mapped into the feature space. With the use of the kernel function, the dual problem in Eq. (15) becomes Eq. (16):

$$\begin{array}{l} \max\limits_\alpha \sum\limits_{i = 1}^m \alpha_i - \frac{1}{2}\sum\limits_{i = 1}^m \sum\limits_{j = 1}^m \alpha_i \alpha_j y_i y_j \varphi(x_i)^T \varphi(x_j) \\ \text{s.t. } \sum\limits_{i = 1}^m \alpha_i y_i = 0,\; \alpha_i \geq 0,\; i = 1,2,3, \cdots, m \end{array} \tag{16}$$

where φ(·) is the feature mapping induced by the kernel function, so that the inner product φ(x_i)^T φ(x_j) is evaluated through the kernel function.
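To make the role of γ concrete, the sketch below evaluates the RBF kernel in its standard form exp(−γ‖x − z‖²) and the kernelized decision value f(x) = Σ α_i y_i κ(x_i, x) + b that follows from the dual problem. The support vectors, multipliers, and bias are hand-picked toy values (assumptions for illustration), not the result of solving Eq. (16).

```python
import math


def rbf_kernel(x, z, gamma):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, alphas, labels, bias, gamma):
    """f(x) = sum_i alpha_i * y_i * k(x_i, x) + b; the sign of f(x) gives the class."""
    total = sum(
        alpha * y * rbf_kernel(sv, x, gamma)
        for sv, alpha, y in zip(support_vectors, alphas, labels)
    )
    return total + bias


# Toy model (hypothetical values): one support vector per class, sum(alpha_i * y_i) = 0.
SUPPORT_VECTORS = [[0.0, 0.0], [2.0, 0.0]]
ALPHAS = [1.0, 1.0]
LABELS = [-1, 1]
BIAS = 0.0
GAMMA = 0.5

for point in ([2.0, 0.0], [0.0, 0.0], [1.0, 0.0]):
    print(point, decision_value(point, SUPPORT_VECTORS, ALPHAS, LABELS, BIAS, GAMMA))
```

A larger γ makes the kernel value fall off faster with distance, so each support vector influences a smaller neighborhood.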

3.3

Experimental validation and analysis of the sentiment prediction model

3.3.1

Preparation of experimental data

A dataset of webpage comment texts is used, and 1500 training samples are selected first. Positive and negative texts are balanced and used to train the support vector machine, which uses the radial basis function (RBF) kernel. Two classifiers are trained: one uses the default parameters of the experimental tool, while the other uses the optimal combination of parameters found by a genetic algorithm. Three test sets are created, each containing 200 test samples with identical categories, and the two classifier models are evaluated on them for their effectiveness in sentiment categorization.
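The text does not give the settings of the genetic algorithm, so the following is only a minimal sketch of how such a search over (C, γ) might look. Everything in it is an assumption: the search runs over log10(C) and log10(γ), uses truncation selection, uniform crossover, Gaussian mutation, and elitism, and the fitness is a toy stand-in. In practice the fitness would be something like the cross-validated accuracy of an SVM trained with the candidate parameters.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30, mutation_scale=0.3, seed=0):
    """Maximise fitness(log_c, log_gamma) with a tiny elitist genetic algorithm."""
    rng = random.Random(seed)

    def clip(value, low, high):
        return max(low, min(high, value))

    population = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda ind: fitness(*ind), reverse=True)
        parents = ranked[: pop_size // 2]   # selection: keep the fitter half
        children = [ranked[0]]              # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = []
            for (g1, g2), (lo, hi) in zip(zip(p1, p2), bounds):
                gene = g1 if rng.random() < 0.5 else g2   # uniform crossover
                gene += rng.gauss(0.0, mutation_scale)    # Gaussian mutation
                child.append(clip(gene, lo, hi))
            children.append(tuple(child))
        population = children
    return max(population, key=lambda ind: fitness(*ind))


def toy_fitness(log_c, log_gamma):
    """Stand-in for validation accuracy; peaks at C = 10, gamma = 0.01 by construction."""
    return -((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


BOUNDS = [(-2.0, 4.0), (-5.0, 1.0)]  # hypothetical search ranges for log10(C), log10(gamma)
best_log_c, best_log_gamma = genetic_search(toy_fitness, BOUNDS)
best_c, best_gamma = 10 ** best_log_c, 10 ** best_log_gamma
print(best_c, best_gamma)
```

Searching on a logarithmic scale is a common choice for these two parameters because useful values typically span several orders of magnitude.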

3.3.2

Model analysis

This section compares the performance of the SVM classifier before and after optimization. There are three sets of test data, and the comparison of the two classification results is shown in Figure 3.

The analysis of the experimental data shows that on the three evaluation metrics of Accuracy, Recall, and F-measure, the hierarchical support vector machine gives better classification results, with some performance improvement on all three metrics. The mean values of the three evaluation metrics of the hierarchical support vector machine are around 90, higher than the mean values of the SVM model before optimization. The results on all three sets of data are better than those of the support vector machine model with default parameters, indicating that the parameter optimization search can find a better combination of parameters for the RBF kernel function, which effectively improves the classification performance of the classifier.

3.4

Results of the prediction of the tendency to inherit red culture

3.4.1

Algorithm design

1)

NLPIR Chinese Word Segmentation System

This paper uses the NLPIR word segmentation system. Part-of-speech tagging, named entity recognition, and user dictionaries are all within the functional scope of this Chinese word segmentation system, which supports GBK, UTF8, and BIG5 encodings.

To adapt word segmentation automatically, new words are first discovered in longer text sentences by combining information cross-entropy with feature phrases. The segmentation probability model is then adapted to the language probabilities predicted on the test text. Together these provide the functions of sentiment new-word discovery and adaptive word segmentation.

2)

Constructing Feature Vector and Determining Propensity

STEP1: After the text of a social web page is preprocessed, feature vectors are constructed. They are used to train the SVM classifier, which finally yields a classification model.

The lexicon used in this paper is a university’s sentiment_ontology, which contains 25830 sentiment words. For each word, the lexicon gives the part of speech, number of meanings, sentiment category, intensity, and polarity value. An excerpt of the lexicon is shown in Table 1.

Table 1.

Excerpt from the sentiment lexicon

Word	Part of speech	Number of meanings	Class number	Sentiment category	Intensity	Polarity
Dingy	adj	1	1	NN	7	2
Premature failure	adj	1	1	NE	5	1
Reprove	verb	1	1	NN	5	2
Thief eye	noun	1	1	NN	4	2
War	noun	1	1	ND	3	2
Clear roughness	adj	1	1	PH	5	0
Limpid	adj	1	1	PH	5	1

STEP2: The new words recognized by the segmentation tool need to go to the dictionary to find the polarity and intensity.

STEP3: The construction of feature vectors need to transform the intensity and polarity, stipulating that the intensity are divided by ten to get the value of sentiment word intensity. For polarity greater than 1 is designated as -1 (i.e., negative), polarity equal to 0 is designated as 0 (i.e., neutral), and polarity equal to 1 is designated as 1 (i.e., positive).

STEP4: The SVM classifier is trained on the vectorized training data to construct the viewpoint sentence extraction model. The constructed model is then used to classify the test corpus.
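The lookup and transformation in STEP2 and STEP3 can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and returning an (intensity, polarity) pair is an assumption, since the text does not specify how the two values are arranged in the feature vector. The sample entries are taken from Table 1.

```python
# Illustrative sketch of the STEP2 lookup and STEP3 transformation.
# All names here are hypothetical; only the rules come from the text.

# word -> (intensity, polarity), copied from Table 1.
LEXICON = {
    "Dingy": (7, 2),
    "War": (3, 2),
    "Clear roughness": (5, 0),
    "Limpid": (5, 1),
}


def scale_intensity(intensity):
    """Divide the lexicon intensity by ten, as stipulated in STEP3."""
    return intensity / 10


def map_polarity(polarity):
    """Map a lexicon polarity code to -1 (negative), 0 (neutral), or 1 (positive)."""
    if polarity > 1:
        return -1
    if polarity == 1:
        return 1
    if polarity == 0:
        return 0
    raise ValueError(f"unexpected polarity code: {polarity!r}")


def word_features(word, lexicon=LEXICON):
    """Look a word up (STEP2) and return its transformed (intensity, polarity) pair.

    Returns None when the word is not in the lexicon. How such words are
    handled is not described in the text, so this behavior is an assumption.
    """
    entry = lexicon.get(word)
    if entry is None:
        return None
    intensity, polarity = entry
    return scale_intensity(intensity), map_polarity(polarity)


if __name__ == "__main__":
    for word in LEXICON:
        print(word, word_features(word))
```

For example, the Table 1 entry "Dingy" (intensity 7, polarity 2) becomes an intensity of 0.7 with polarity -1.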

3.4.2

Data analysis

The crawled daily comment statements are fed into classifier comment1 of the hierarchical support vector machine for initial evaluation. If comment1 outputs 1, the statement is passed to comment2; if it outputs 0, it is passed to comment3. If comment2 outputs 1, the statement is a positive comment; if it outputs 0, it is passed to comment4. If comment3 outputs 1, the statement is a negative comment; if it outputs 0, it is a negative review. If comment4 outputs 1, the statement is a positive comment; if it outputs 0, it is a neutral comment. Table 2 lists the detailed results of information collection. During the survey period from May 1 to May 9, the totals of crawled positive comments, neutral comments, and negative reviews were 4360, 1462, and 2953, respectively. The high proportion of positive comments may be because the survey period coincides with Labor Day and Youth Day, when online comments show a positive communication atmosphere. The daily accuracy of the hierarchical support vector machine proposed in this paper ranges from about 85% to 90% (Table 2), so the classification of daily comment statements is highly accurate.
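The routing through comment1 to comment4 can be sketched as a small decision cascade. This is an illustration under stated assumptions, not the authors' implementation: each classifier is represented by a hypothetical callable that returns 1 or 0, and the leaf labels follow the wording of the paragraph above, which does not spell out how the leaves map onto the five columns of Table 2.

```python
# Sketch of the hierarchical cascade; the classifier interface is hypothetical.

def classify_hierarchical(comment, classifiers):
    """Route one comment through comment1..comment4 and return (label, path).

    `classifiers` maps the names "comment1".."comment4" to callables that
    take the comment and return 1 or 0. `path` records which classifiers
    were consulted and what each one returned.
    """
    path = []

    def run(name):
        output = classifiers[name](comment)
        path.append((name, output))
        return output

    if run("comment1") == 1:
        if run("comment2") == 1:
            label = "positive"
        elif run("comment4") == 1:
            label = "positive"
        else:
            label = "neutral"
    else:
        # The text describes both outcomes of comment3 as negative, so the
        # output is only recorded in the path here.
        run("comment3")
        label = "negative"
    return label, path


if __name__ == "__main__":
    # Constant stand-ins for trained SVMs, for demonstration only.
    stubs = {
        "comment1": lambda c: 1,
        "comment2": lambda c: 0,
        "comment3": lambda c: 0,
        "comment4": lambda c: 0,
    }
    print(classify_hierarchical("example comment", stubs))
```

Returning the path alongside the label keeps the two comment3 outcomes, and the two routes to a positive label, distinguishable for later analysis.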

Table 2.

Information collection details

No.	Date	Positive	Slightly positive	Neutral	Slightly negative	Negative	Total	Daily accuracy (%)
1	5/1	495	554	215	365	456	2085	89.824
2	5/2	635	512	123	214	424	1908	87.073
3	5/3	726	531	116	135	359	1867	89.364
4	5/4	615	193	154	256	461	1679	87.221
5	5/5	232	472	168	413	547	1832	89.307
6	5/6	425	268	121	149	132	1095	85.226
7	5/7	546	409	137	335	191	1618	88.919
8	5/8	387	514	256	215	185	1557	85.423
9	5/9	299	327	172	412	198	1408	87.341
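The totals quoted in the text can be checked against Table 2 with a short script. The sketch below copies the table rows and sums the Positive, Neutral, and Negative columns; the variable and function names are illustrative only.

```python
# Rows copied from Table 2:
# (date, positive, slightly positive, neutral, slightly negative, negative,
#  total, daily accuracy in %)
TABLE2 = [
    ("5/1", 495, 554, 215, 365, 456, 2085, 89.824),
    ("5/2", 635, 512, 123, 214, 424, 1908, 87.073),
    ("5/3", 726, 531, 116, 135, 359, 1867, 89.364),
    ("5/4", 615, 193, 154, 256, 461, 1679, 87.221),
    ("5/5", 232, 472, 168, 413, 547, 1832, 89.307),
    ("5/6", 425, 268, 121, 149, 132, 1095, 85.226),
    ("5/7", 546, 409, 137, 335, 191, 1618, 88.919),
    ("5/8", 387, 514, 256, 215, 185, 1557, 85.423),
    ("5/9", 299, 327, 172, 412, 198, 1408, 87.341),
]


def column_sum(rows, index):
    """Sum one numeric column of the table over all days."""
    return sum(row[index] for row in rows)


def rows_with_inconsistent_total(rows):
    """Return the dates whose five category counts do not add up to Total."""
    return [row[0] for row in rows if sum(row[1:6]) != row[6]]


if __name__ == "__main__":
    print("positive:", column_sum(TABLE2, 1))   # 4360 in the text
    print("neutral:", column_sum(TABLE2, 3))    # 1462 in the text
    print("negative:", column_sum(TABLE2, 5))   # 2953 in the text
    print("inconsistent rows:", rows_with_inconsistent_total(TABLE2))
    accuracies = [row[7] for row in TABLE2]
    print("accuracy range:", min(accuracies), "to", max(accuracies))
```

The three quoted totals correspond to the Positive, Neutral, and Negative columns only; the two "slightly" columns are not included in them.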

An overall count of each sentiment type in the comment data shows that positive sentiment is the most prominent, mainly owing to the positive guidance of traditional festivals and the promotion of red cultural festivals in the official media. The trend in the number of daily comments by sentiment is shown in Figure 4. The vertical line graph of daily comment counts by sentiment shows that May 1 to May 4 is dominated by positive sentiment comments, and that discussion of Labor Day and Youth Day dominates this social media network.

To further understand what web users focus on in comments about May 4 Youth Day, a traditional festival of red culture, this paper counts the 20 most frequent keywords appearing in the comments. These high-frequency words are shown in Table 3.

Table 3.

High-frequency vocabulary in Internet comments

Rank	Keyword	Frequency	Rank	Keyword	Frequency
1	Youth talk	854	11	Youth dream	132
2	Holiday	792	12	Red culture	102
3	Youth	654	13	Cultural heritage	86
4	Youth festival	412	14	Youth education	81
5	May 4 commemorative activities	301	15	Traditional festival	75
6	May Fourth movement	256	16	Origin of festival	72
7	The origin of youth festival	221	17	Theme activity	69
8	A message of the youth day	217	18	Historical figure	64
9	Youth activity	185	19	Historical event	52
10	The history of the May 4 movement	164	20	Youth image	37
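A frequency ranking of this kind can be produced by counting tokens over the segmented comments. The sketch below is an assumption about the procedure, not the authors' code: it takes comments that have already been segmented into token lists, and the stopword filter and the toy data are hypothetical.

```python
from collections import Counter


def top_keywords(segmented_comments, k=20, stopwords=frozenset()):
    """Return the k most frequent tokens as (token, count) pairs.

    `segmented_comments` is an iterable of token lists, one per comment,
    as a word segmentation tool would produce.
    """
    counts = Counter(
        token
        for tokens in segmented_comments
        for token in tokens
        if token not in stopwords
    )
    return counts.most_common(k)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the crawled corpus.
    comments = [
        ["youth", "holiday", "youth"],
        ["holiday", "youth", "the"],
        ["red culture"],
    ]
    print(top_keywords(comments, k=2, stopwords={"the"}))
```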

The top 10 keywords are “Youth talk”, “Holiday”, “Youth”, “Youth festival”, “May 4 commemorative activities”, “May Fourth movement”, “The origin of youth festival”, “A message of the youth day”, “Youth activity”, and “The history of the May 4 movement”. These high-frequency words reflect that most netizens pay close attention to issues such as red cultural traditional festivals, and hope to take part in building these festivals and in passing on red culture.

May Fourth Youth Day is not only a commemorative day but also a significant cultural symbol. In different historical periods, the red cultural connotations of May Fourth Youth Day have guided the direction of the youth movement. May Fourth Youth Day is one of the symbols of the contemporary inheritance and development of red culture.

Although “Internet + red culture” has development potential, the preference of social network platforms for entertainment information currently tends to crowd out the space for disseminating red culture. At the same time, the fragmented, fast-food narrative logic of social network platforms can break up the integrity and depth of red culture.

In order to better utilize the Internet to spread red culture and realize the “Internet inheritance of red culture”, policymakers need to fully explore its economic value and promote the high-quality development of red culture industry.

Accelerate specialized legislation and judicial practice for the protection of red historical and cultural resources. At present, there is no direct legal basis for pursuing liability for malicious rumor-mongering and smearing of red culture on the Internet.

(c) Implementing regularized special operations to maintain a clear order in cyberspace. In response to harmful content such as “distortions of red history, denigration of the Party’s guidelines and policies, and achievements in social construction”, the public security authorities and the competent departments in charge of Internet and information technology have set up a special operation to combat false rumours about red culture and history. A mechanism for reviewing and guiding red cultural information has been set up, with increased supervision of platforms and dissemination channels, and a special area for reporting false information on red history and culture has been established.

Promote the development of red culture in cyberspace through “Internet+”: enhance the digital influence of red culture, educate and guide netizens to follow socialist core values, and create a new way of educating people through red culture on the Internet. Through “Internet+”, the placement of public service advertisements in cities has been strengthened, and urban public broadcasting systems such as bus mobile radio, television, electronic bulletin boards, giant-screen advertisements, and theaters are used to broadcast red stories and commemorative videos. Red cultural education is socialized through the network, providing citizens with extensive and innovative ideological and political education and building a good urban environment for red education. Meanwhile, mainstream media actively organize cross-media linkage and hold widely attended programs to promote knowledge of red culture. Central media TV placements, linked to mobile phone quizzes with prizes or promotion mechanisms, encourage the participation of the whole society, create viewing hot spots, and generate red topics.

4

Conclusion

This paper applies text sentiment classification technology based on machine learning to the inheritance of “Internet + Red Culture”. By analyzing the opportunities and challenges facing red culture in the all-media era, we design a model that predicts the inheritance tendency of red culture from Internet comments. The predictions of red culture inheritance and the high-frequency vocabulary of comment sentiment are used to assess the possibility of red culture inheritance.

1)

The pre-optimization SVM model and the optimized hierarchical SVM model are verified experimentally, and both classification models achieve more than 85% accuracy in webpage text sentiment classification. A comparison of the three evaluation indexes (accuracy, recall, and F-measure) shows that the optimized hierarchical SVM sentiment classification model has the classification advantage.

2)

Taking May 4th Youth Day as the main time node, the classification results of webpage text sentiment comments from May 1 to May 9 are analyzed. During the survey period, positive emotions dominate the webpage text sentiment comments, and most netizens pay close attention to the red culture inherited through May 4th Youth Day. At the same time, the inheritance of red culture in the Internet space needs further correct guidance to ensure that it is passed on positively.

Language: English

Publication frequency: 1 time per year
Journal subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Zheng Zhao

Shuai Yang

Published online: 24 Mar 2025

Received: 21 Oct 2024

Accepted: 11 Feb 2025

DOI: https://doi.org/10.2478/amns-2025-0766

Keywords: SVM, Text sentiment classification, Word2Vec, Red culture inheritance

© 2025 Zheng Zhao et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
