Research on Red Cultural Inheritance and Application of SVM Support Vector Machine in Sentiment Analysis
Published online: 24 Mar 2025
Received: 21 Oct 2024
Accepted: 11 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0766
© 2025 Zheng Zhao et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Red culture is a form of culture created by the Chinese people during the periods of revolution and construction, with distinctive characteristics of its times and lasting historical significance. It is a critical culture that emphasizes the criticism and transformation of reality. It has strong practical significance and has, to a certain extent, promoted China’s modernization [1-4]. Maintaining and inheriting red culture is an important task in China and an important part of building socialist core values, and it is of great significance for carrying forward those values and for strengthening national spirit and national cohesion [5-8]. In the new era, the effective inheritance and promotion of red culture can be ensured by strengthening red education, holding traditional red cultural festival activities, and promoting the spirit of red culture [9-11].
Sentiment analysis identifies the emotional tendency of text in large volumes of text data and categorizes it as positive, negative, or neutral. Sentiment analysis draws on natural language processing, machine learning, and data mining, and the most commonly used methods today are based on machine learning [12-15]. It has a wide range of application scenarios, such as monitoring users’ evaluations of brands, products, or services, analyzing consumers’ interests and purchase behavior, and predicting stock market sentiment [16-18]. SVM is a supervised learning method that can be used for classification and regression and can solve binary or multi-class classification problems. The core idea of SVM is to construct a hyperplane that separates data of different classes [19-21]. SVM is widely used in sentiment analysis, mainly in two ways: the first is the word-frequency-based SVM method, which applies text categorization algorithms to sentiment analysis; the second is the word2vec-based SVM method, which uses word vectors for sentiment analysis [22-24].
This paper first outlines the machine-learning process for text sentiment classification, the support vector machine, and text classification. It then introduces red cultural heritage and analyzes its development and the problems it faces in the all-media era. On this basis, we design an SVM-based prediction model for the inheritance tendency of red culture: we collect and organize corpus information, select the NLPIR Chinese word segmentation system, construct feature vectors, and determine the sentiment tendency. A multilevel SVM classification model is designed according to a sentiment classification scheme. Online comment texts are used as training data, and the performance of the SVM classifier before and after optimization is analyzed. Finally, online comment texts on red culture festivals are used to obtain the probability of netizens’ tendency towards red culture inheritance.
Text sentiment tendency classification is a popular research direction in text categorization, and the current mainstream approach is statistical machine learning. The objects of the sentiment classification research in this paper are all Chinese texts about red culture, and the overall process of text sentiment classification is shown in Figure 1.

Figure 1. The overall process of text sentiment classification
The process of text emotion tendency classification is mainly divided into the following steps:
Text pre-processing. After the text is captured, it is preprocessed, which includes merging and splitting document paragraphs, word segmentation, stop-word filtering, and so on.
Feature extraction. In machine-learning-based classification of text sentiment tendency, feature extraction plays a crucial role: the parts that best represent the emotional characteristics of the text should be extracted.
Training phase. After the text features are extracted, the samples with category labels are used as training data. The computer learns the classification rules automatically and produces a classifier model, that is, a decision function that accepts new samples without category labels and outputs their categories.
Testing phase. The testing phase is the last phase of the classification module, in which the classifier model generated during training is used to predict the category of new samples automatically. By comparing the labels predicted by the classifier with the true labels of the samples, the strengths and weaknesses of the classifier can be identified and its performance can be measured and evaluated.
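The comparison of predicted and true labels in the testing phase can be sketched as follows. This is a minimal illustration, not the evaluation code used in the paper: the function name `evaluate` and the toy label lists are hypothetical, and precision is computed only because the F-measure is defined in terms of it.

```python
def evaluate(true_labels, predicted_labels, positive=1):
    """Accuracy, precision, recall and F-measure for one target class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical labels: 1 = positive sentiment, 0 = negative sentiment.
true_labels = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
scores = evaluate(true_labels, predicted)
print(scores)
```

In a multi-class setting, the same function can be called once per class and the results averaged.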
Support Vector Machine (SVM) has a wide range of applications and a solid theoretical foundation in statistics. It has played an important role in text classification, handwriting recognition, tampered image detection, and other fields. The main objective of the SVM algorithm is to find an optimal classification hyperplane, which reduces to solving an optimization problem [25-26].
Optimization is a branch of applied mathematics that studies how to find the optimal value of an objective function either without constraints or under given constraints. According to the objective function and the constraints, optimization problems can be divided into the following three categories:
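Unconstrained optimization problems, as shown in Eq. (1):

$$\min_{x} f(x) \tag{1}$$

where $f(x)$ is the objective function.

Optimization problems with equality constraints, whose objective function and constraints are shown in Eq. (2):

$$\min_{x} f(x) \quad \text{s.t.} \quad h_i(x) = 0, \; i = 1, \dots, m \tag{2}$$

where $h_i(x)$ are the equality constraints. Such problems are commonly solved with the Lagrange multiplier method, which introduces a multiplier $\lambda_i$ for each constraint and forms the Lagrangian function in Eq. (3):

$$L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i h_i(x) \tag{3}$$

Then the derivative of $L$ with respect to each parameter is taken and set to zero, which yields the candidate optimal values.

Optimization problems with inequality constraints, whose objective function and constraints are shown in Eq. (4):

$$\min_{x} f(x) \quad \text{s.t.} \quad h_i(x) = 0, \; i = 1, \dots, m; \quad g_j(x) \le 0, \; j = 1, \dots, n \tag{4}$$

where $g_j(x)$ are the inequality constraints. With a multiplier $\mu_j$ for each inequality constraint, the generalized Lagrangian is $L(x, \lambda, \mu) = f(x) + \sum_{i} \lambda_i h_i(x) + \sum_{j} \mu_j g_j(x)$.

The KKT condition requires that the optimal value satisfy the following three conditions:

$$\nabla_x L(x, \lambda, \mu) = 0, \qquad h_i(x) = 0, \qquad \mu_j g_j(x) = 0 \;\; \text{with} \;\; \mu_j \ge 0, \; g_j(x) \le 0$$

The candidate optimal value is obtained by solving the above three equations.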
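As a small worked illustration of the equality-constrained case (a textbook example chosen here for clarity, not one taken from the paper), consider minimizing $f(x, y) = x^2 + y^2$ subject to $h(x, y) = x + y - 1 = 0$. The Lagrange multiplier method gives:

$$
\begin{aligned}
L(x, y, \lambda) &= x^2 + y^2 + \lambda (x + y - 1) \\
\frac{\partial L}{\partial x} &= 2x + \lambda = 0 \\
\frac{\partial L}{\partial y} &= 2y + \lambda = 0 \\
\frac{\partial L}{\partial \lambda} &= x + y - 1 = 0
\end{aligned}
$$

The first two equations give $x = y = -\lambda / 2$, and substituting into the constraint gives $\lambda = -1$, so the candidate optimum is $x = y = \tfrac{1}{2}$ with $f = \tfrac{1}{2}$. The SVM training problem is solved along the same lines, with inequality constraints handled through the KKT conditions.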
Latent Semantic Analysis (LSA) is also known as Latent Semantic Indexing [27]. The essence of the method is to find the real semantics of the words in a document and then to mine the document’s word-independent topics, i.e., its latent semantic topics. It thereby addresses the inability of the vector space model (VSM) to take latent semantics into account. Specifically, a large collection of documents is modeled in a space of reasonable dimension, and the documents are represented in this space through Singular Value Decomposition (SVD) and dimensionality reduction. Dimensionality reduction is the most important step in LSA: it removes the “noise”, i.e., irrelevant information in the documents, so that the semantic structure gradually emerges. Compared with the VSM, dimensionality reduction lowers the computational effort and clarifies the semantic relationships.
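With SVD, the document-word matrix $X$ described above can be decomposed as

$$X = U \Sigma V^{T}$$

where $U$ and $V$ have orthonormal columns and $\Sigma$ is the diagonal matrix of singular values arranged in descending order.
Keeping only the $k$ largest singular values in $\Sigma$ gives the low-order approximation of matrix $X$: $X_k = U_k \Sigma_k V_k^{T}$.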
The general steps of LSA can be summarized as follows:
Step 1: Analyze the document collection and establish the “document-feature word” matrix (TD).
Step 2: Perform singular value decomposition (SVD) on the TD matrix.
Step 3: Perform dimensionality reduction on the decomposed matrix, that is, the low-order approximation mentioned earlier.
Step 4: Use the reduced matrix to construct the latent semantic space or reconstruct the TD matrix.
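The four steps above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the toy matrix, the choice of rows as feature words and columns as documents, the function name `lsa`, and the value `k=2` are all hypothetical and are not taken from the paper.

```python
import numpy as np

# Step 1: a toy "document-feature word" (TD) matrix.
# Assumed layout: rows are feature words, columns are documents.
td = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [1, 0, 0, 1],
], dtype=float)


def lsa(td_matrix, k):
    """Truncated SVD: return document coordinates and the rank-k TD matrix."""
    # Step 2: singular value decomposition, td_matrix = U * diag(s) * Vt.
    u, s, vt = np.linalg.svd(td_matrix, full_matrices=False)
    # Step 3: keep only the k largest singular values.
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]
    # Step 4: documents in the k-dimensional latent semantic space,
    # and the reconstructed low-order approximation of the TD matrix.
    doc_coords = (np.diag(s_k) @ vt_k).T
    approx = u_k @ np.diag(s_k) @ vt_k
    return doc_coords, approx


doc_coords, td_approx = lsa(td, k=2)
print(doc_coords.shape)  # one k-dimensional row per document
```

Documents that share vocabulary end up close together in `doc_coords`, which is what makes the reduced space useful as classifier input.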
Categorizing text is a process of creating a mapping from texts to categories. It maps each text to be categorized into the existing categories, and the mapping can be one-to-one or one-to-many, because a text can relate to more than one topic. The mathematical description is as follows:

$$f: A \rightarrow B$$

where $A$ is the set of texts to be classified and $B$ is the set of categories. The mapping rule $f$ is the discriminant rule that the system derives from labeled samples of each category.
Text preprocessing. The primary work of text preprocessing is to deal with noisy and irregular information. Taking web page text as an example, a large amount of HTML markup remains after the page is acquired. This markup is used for the layout and display of the web page, but it has essentially no value for the text itself. After this meaningless interference is removed, the second step of text preprocessing is word segmentation. The difficulty of word segmentation is to determine the smallest element of the vocabulary, that is, the most basic semantic unit. In practice, by converting all capital letters in the text to lowercase and by using non-alphabetic characters such as spaces and punctuation marks as separators, the text can easily be converted into a list of semantic units (words).
Text representation. The more widely used weight calculation methods in automatic text categorization include Boolean weights and word-frequency weights, among others.
Text features. When text is represented as a feature vector, the original feature set consists of all the words that appear in the text collection. Feature dimensionality reduction serves two main purposes. First, training and classifying directly in such a high-dimensional feature space requires too much computation, and dimensionality reduction improves the efficiency and running speed of the program. Second, words differ in how much they matter for text categorization. Some general terms that are prevalent in all classes contribute little to classification, whereas terms that appear frequently in one class and rarely in the others contribute a great deal, so dimensionality reduction also improves the generalization ability of the classifier.
Feature dimensionality reduction is the selection of a proper subset $T'$ from the original feature set $T$, that is, $T' \subset T$, which satisfies $|T'| < |T|$.
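The separator-based tokenization and the Boolean and word-frequency weights described above can be sketched as follows. This is a minimal illustration for alphabetic text only; the function names and the two sample sentences are hypothetical, and Chinese text would first need a dedicated word segmenter such as the NLPIR system used later in this paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text, then split on runs of non-alphabetic characters."""
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def boolean_weights(tokens, vocabulary):
    """1 if the term occurs in the document, 0 otherwise."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def frequency_weights(tokens, vocabulary):
    """Raw number of occurrences of each vocabulary term in the document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


# Hypothetical sample documents.
docs = ["Red culture, RED spirit!", "Youth Day: the youth remember."]
tokenized = [tokenize(d) for d in docs]
# The original feature set: every word that appears in the text collection.
vocabulary = sorted({tok for toks in tokenized for tok in toks})

for toks in tokenized:
    print(boolean_weights(toks, vocabulary), frequency_weights(toks, vocabulary))
```

Feature dimensionality reduction would then keep only a subset of `vocabulary`, for example by dropping terms that occur in every class.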
Connotation of red culture. The connotation of red culture rests on a clear understanding of the imagery expressed by “red”. This imagery can be summarized in two aspects: first, the traditional meaning of red in the hearts of the Chinese people; second, the symbolism of red in the international communist movement. To understand red culture deeply, we should also identify its core elements and grasp its essence. First, red culture embodies the lofty beliefs of the Communist Party. Second, red culture embodies the Communist Party’s pursuit of its mission. Always concerned with the fundamental interests of the people and the future of the Chinese nation, the Communist Party bravely shoulders the responsibility entrusted by the times and has become the people’s most reliable mainstay. Finally, red culture manifests the fine tradition of the Communist Party. In the course of the Party’s development from weak to strong, the Communists have forged the qualities of hard struggle, courageous sacrifice, and innovation. This fine tradition has planted the inner gene of red culture.
Extension of red culture. On the basis of a correct understanding of its connotation, the extension of red culture is further clarified in terms of its phases over time and the classification of its different forms.
In the all-media era, new technical conditions and changes in the social environment have raised new problems and brought new challenges to the inheritance of red culture. To do the work of red cultural heritage well, we should summarize past results on the basis of an in-depth study of the current outstanding problems, analyze the development opportunities contained in the new trends and the new environment, and then use the favorable factors to improve the relevant work.
In recent years, the forms of red culture carriers have expanded further, gradually developing into various thematic educational activities that incorporate the characteristics of the new era. For example, in 2018 the Central Committee of the Communist Youth League organized and launched a series of learning activities for young people, including online knowledge contests, essay contests, speech contests, and quiz games, to further normalize red culture education. The activities were rich in content and had a remarkable effect.
Through TV programs, illustrated news, webcasts and other channels, red culture has entered grass-roots organizations in enterprises, public institutions, rural areas, and communities, gradually spreading from individual points to the wider public, breaking audience limitations, and creating a social trend, so that more people come into contact with and enjoy red culture.
In recent years, the Party and the State have attached great importance to the inheritance of red culture and have supported it in various ways, but deficiencies in theoretical construction and social synergy remain. People’s understanding of red culture is somewhat mixed, and their attitudes towards inheritance work also differ widely. The reasons lie mainly in the following four aspects.
Changes in audience thinking. From the audience’s point of view, the inheritance of red culture is being deconstructed by negative factors in modern ideological concepts. Since the reform and opening up, politics, the economy, culture, society and other fields have undergone great changes, and fast-changing information technology has profoundly altered every aspect of people’s daily lives, from food and clothing to housing and transportation. To a certain extent, this has reshaped people’s values, ways of thinking, and modes of action, and has brought new challenges to the inheritance of red culture.
Lack of innovation in narrative mode. From the perspective of narrative mode, the inheritance of red culture faces the problems of outdated discourse content and stereotyped expression. If the narrative of a culture cannot keep up with the times, then however advanced and excellent the culture is, it cannot develop over the long term, and its inheritance and continuation become impossible.
The discourse system needs to be improved. From the perspective of discourse construction, red cultural heritage is challenged by diverse intellectual trends and by the impact of Western ideology. Because red culture belongs to Marxist ideology, the improvement of its discourse system concerns both the banner and the soul. In the all-media era, highly developed media platforms integrate information acquisition, processing, production, and dissemination, which has fundamentally changed the way information is disseminated. The immediacy, openness, and interactivity of these media are prominent advantages over traditional media and give people more freedom of expression. This freedom has overturned the concept of the “gatekeeper” in information dissemination, so that different kinds of information and discourse, such as mainstream and non-mainstream, elegant and vulgar, positive and negative, coexist in society. The power of discourse has slowly spread from official institutions to the general public, and multiple voices are exchanged and clash. This has created a complex situation of mixed messages, weakened the dominance of the red discourse, and made the inheritance of red culture more difficult.
This study mainly includes data preprocessing and model training, as well as the corresponding parameter optimization strategy. The framework for the red cultural inheritance tendency prediction model based on SVM is shown in Figure 2.

Figure 2. SVM-based prediction model of red cultural inheritance tendency
The specific process of data preprocessing is shown in part A of the figure. Each webpage text S to be tested is used as model input, and the input webpage text is preprocessed. The output of the data preprocessing process is the vectorized representation of the webpage text to be tested.
The main part of the hierarchical SVM model is given in part B of the figure. The vector output from part A is the input to part B. After the prediction modeling, the probability of the user’s tendency to red cultural inheritance is output.
A social media website is used as the source of experimental data and the object of analysis. The experiments use Python’s BeautifulSoup to crawl and parse page elements and adopt a parallel crawling strategy to obtain the Chinese text data.
Webpage texts published within a certain period are collected as the original dataset. Because web page text is highly varied, filtering during the crawling stage alone is not sufficient to yield usable experimental data. The crawled data therefore needs to be filtered and labeled manually.
The experimental data contains the following information: user ID, homepage address, text content, release date (within the specified period), and URL. The methods and data used in this paper are privacy-protected.
Although the number of webpage texts is fixed, the complexity and diversity of Chinese expressions lead to a certain degree of redundancy in the collected data. The data therefore needs to be preprocessed so that sentiment information can be extracted from unstructured data.
To detect more accurately whether the text of a web page expresses a tendency towards red cultural inheritance, this paper draws on psychology, takes Internet terminology and colloquial online expressions into account, collects words and other expressions related to red cultural inheritance, and establishes a hierarchical classification scheme that covers different mental states.
Based on the sentiment classification scheme, each webpage text is manually labeled according to its content.
After each text has been labeled and initially preprocessed, the processed data is vectorized using Word2Vec, i.e., each preprocessed web page text is converted to a vector representation, and the similarity between texts is computed on these vectors.
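One common way to turn word vectors into a text vector and compare two texts is sketched below. This is an illustration under stated assumptions rather than the paper’s procedure: the three-dimensional `word_vectors` are made-up stand-ins for vectors from a trained Word2Vec model, averaging is only one possible pooling strategy, and cosine similarity is assumed as the similarity measure.

```python
import math

# Hypothetical 3-dimensional word vectors; a trained Word2Vec model
# would supply real ones with a few hundred dimensions.
word_vectors = {
    "youth": [0.9, 0.1, 0.0],
    "festival": [0.7, 0.3, 0.1],
    "red": [0.8, 0.2, 0.1],
    "culture": [0.6, 0.4, 0.2],
    "boring": [-0.5, 0.1, 0.8],
}


def text_vector(tokens, vectors):
    """Average the vectors of the known words; None if no word is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vec_a = text_vector(["youth", "festival"], word_vectors)
vec_b = text_vector(["red", "culture"], word_vectors)
vec_c = text_vector(["boring"], word_vectors)
print(cosine_similarity(vec_a, vec_b), cosine_similarity(vec_a, vec_c))
```

The resulting text vectors are what a classifier such as an SVM would receive as input.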
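The basic SVM model seeks the optimal separating hyperplane in a feature space in order to solve binary classification problems. With the introduction of a kernel function, the model can also be used to solve nonlinear problems. In this paper, a three-layer hierarchical classification model is built on SVM according to the sentiment classification scheme to predict the probability of web users’ tendency to inherit red culture. The model uses the default RBF kernel function, and the classification target of each layer depends on the result of the neighboring layer. For the first layer of classification, the separating hyperplane is given by Equation (12):

$$w^{T} x + b = 0 \tag{12}$$

where $w$ is the normal vector of the hyperplane and $b$ is the bias. The hyperplane is obtained by solving

$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^{2} + C \sum_{i} \xi_i \quad \text{s.t.} \quad y_i (w^{T} x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0$$

where the constant $C$ is the penalty factor that balances the margin width against the training errors.
The heart of the method lies in the derivation of the decision function $f(x) = \operatorname{sign}\left( \sum_{i} \alpha_i y_i K(x_i, x) + b \right)$, in which the RBF kernel is $K(x_i, x) = \exp\left( -\gamma \|x_i - x\|^{2} \right)$.
After several experiments, it is found that parameter optimization has a significant impact on the results. For SVM models with an RBF kernel function, the parameters $C$ and $\gamma$ largely determine the classification performance, where $\gamma$ controls the width of the kernel.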
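A parameter search of this kind can be sketched with a small genetic algorithm. The sketch below is illustrative only: it assumes the two tuned parameters are the penalty factor C and the RBF kernel parameter gamma (searched on a log10 scale), the function names are hypothetical, and `surrogate_fitness` is a made-up stand-in for the cross-validated accuracy of a trained SVM, which is what a real search would evaluate.

```python
import math
import random


def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def genetic_search(fitness, bounds, pop_size=20, generations=30, seed=0):
    """Maximise fitness(log_c, log_gamma) with elitism, blend crossover
    and Gaussian mutation. bounds is a list of (low, high) pairs."""
    rng = random.Random(seed)
    population = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda ind: fitness(*ind), reverse=True)
        parents = ranked[: pop_size // 2]   # keep the better half as parents
        children = [ranked[0]]              # elitism: best individual survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = tuple(
                max(lo, min(hi, w * ga + (1 - w) * gb + rng.gauss(0, 0.1)))
                for ga, gb, (lo, hi) in zip(a, b, bounds)
            )
            children.append(child)
        population = children
    return max(population, key=lambda ind: fitness(*ind))


def surrogate_fitness(log_c, log_gamma):
    """Hypothetical stand-in for validation accuracy, peaking at
    C = 10 (log10 = 1) and gamma = 0.01 (log10 = -2)."""
    return -((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2)


best_log_c, best_log_gamma = genetic_search(
    surrogate_fitness, bounds=[(-2, 3), (-4, 1)])
print(10 ** best_log_c, 10 ** best_log_gamma)
```

Searching on a log scale is a design choice: useful values of C and gamma typically span several orders of magnitude, so uniform sampling of the exponents covers the range more evenly.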
A dataset of webpage comment texts is used, from which 1500 training samples are selected first. Positive and negative samples are balanced and used to train the support vector machine, with a radial basis function (RBF) kernel. Two classifier models are trained: one uses the default parameters of the experimental tool, and the other uses the optimal parameter combination found by a genetic algorithm. Three test sets are created, each containing 200 test samples covering the same categories, and the two classifier models are evaluated on them for sentiment classification.
The performance of the SVM classifier before and after optimization is compared on the three test sets, and the two sets of classification results are shown in Figure 3.

Figure 3. Comparison of the classification results
The experimental data show that the hierarchical support vector machine gives better classification results on all three evaluation metrics: accuracy, recall, and F-measure. Its mean values on the three metrics are around 90%, which is higher than those of the SVM model before optimization. The results on all three test sets are also better than those of the support vector machine with default parameters, indicating that the parameter optimization search finds a better parameter combination for the RBF kernel and effectively improves the performance of the classifier.
NLPIR Chinese word segmentation system. This paper uses the NLPIR word segmentation system. Part-of-speech tagging, named entity recognition, and user dictionaries are all within the functional scope of this system, and it supports GBK, UTF-8, and BIG5 encodings. Its new-word discovery and adaptive segmentation functions automatically discover new words and feature phrases in longer text on the basis of information cross-entropy, and then adapt the segmentation to the linguistic probability distribution of the test corpus. These functions are used here for discovering new sentiment words and for adaptive word segmentation.
Constructing feature vectors and determining tendency. STEP1: After the text of a social web page is preprocessed, feature vectors are constructed and used to train the SVM classifiers, which finally produces a classification model. The lexicon used in this paper is a university’s sentiment_ontology, which contains 25830 sentiment words. For each word, the lexicon gives the part of speech, the number of senses, the sentiment category, the intensity, and the polarity. An excerpt from the dictionary is shown in Table 1.
Table 1. Excerpt from the sentiment dictionary
| Word | Part of speech | Number of senses | Sense index | Sentiment category | Intensity | Polarity |
|---|---|---|---|---|---|---|
| Dingy | adj | 1 | 1 | NN | 7 | 2 |
| Premature failure | adj | 1 | 1 | NE | 5 | 1 |
| Reprove | verb | 1 | 1 | NN | 5 | 2 |
| Thief eye | noun | 1 | 1 | NN | 4 | 2 |
| War | noun | 1 | 1 | ND | 3 | 2 |
| Clear roughness | adj | 1 | 1 | PH | 5 | 0 |
| Limpid | adj | 1 | 1 | PH | 5 | 1 |
STEP2: The new words recognized by the segmentation tool are looked up in the dictionary to find their polarity and intensity.
STEP3: To construct the feature vectors, the intensity and polarity are transformed. The intensity is divided by ten to give the intensity value of the sentiment word. A polarity greater than 1 is mapped to -1 (negative), a polarity equal to 0 is mapped to 0 (neutral), and a polarity equal to 1 is mapped to 1 (positive).
STEP4: The SVM classifier is trained on the vectorized training data to construct the opinion sentence extraction model. The constructed model is then used to classify the test corpus.
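The transformation in STEP3 can be sketched as follows. The function name `transform_entry` and the choice to reject polarity codes below 0 are assumptions made for illustration; the sample values are taken from the rows of Table 1.

```python
def transform_entry(strength, polarity):
    """Map a dictionary entry to (intensity value, sentiment sign).

    strength: raw intensity from the dictionary, divided by ten here.
    polarity: dictionary polarity code; > 1 is negative, 1 is positive,
              0 is neutral.
    """
    intensity = strength / 10
    if polarity > 1:
        sign = -1
    elif polarity == 1:
        sign = 1
    elif polarity == 0:
        sign = 0
    else:
        # Assumption: codes below 0 do not occur, so treat them as errors.
        raise ValueError(f"unexpected polarity code: {polarity}")
    return intensity, sign


# (strength, polarity) pairs from Table 1.
entries = {"Dingy": (7, 2), "Limpid": (5, 1), "Clear roughness": (5, 0)}
for word, (strength, polarity) in entries.items():
    print(word, transform_entry(strength, polarity))
```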
The crawled daily comments are first passed to classifier comment1 of the hierarchical support vector machine for initial evaluation. If the output of comment1 is 1, the comment is passed to comment2; if it is 0, the comment is passed to comment3. If the output of comment2 is 1, the comment is positive; if it is 0, the comment is passed to comment4. If the output of comment3 is 1, the comment is negative; if it is 0, the comment is slightly negative. If the output of comment4 is 1, the comment is slightly positive; if it is 0, the comment is neutral. Table 2 lists the detailed results of information collection. During the survey period from May 1 to May 9, the total numbers of crawled positive, neutral, and negative comments were 4360, 1462, and 2953, respectively. The high proportion of positive comments may be because the survey period coincides with Labor Day and Youth Day, when online comments show a positive atmosphere. The daily accuracy of the proposed hierarchical support vector machine on the daily comments ranges from about 85% to about 90%, so the accuracy of daily comment classification is high.
Table 2. Information collection details
| No. | Date | Positive | Slightly positive | Neutral | Slightly negative | Negative | Total | Daily accuracy (%) |
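The routing through the four classifiers can be sketched as a small decision cascade. This is a sketch under stated assumptions: the leaf labels "slightly positive" and "slightly negative" are inferred from the five columns of Table 2, and the threshold-based stand-ins for comment1 to comment4 are hypothetical; in the actual model each would be a trained SVM returning 1 or 0.

```python
def classify_comment(features, comment1, comment2, comment3, comment4):
    """Route one comment through the hierarchical classifiers.

    Each classifier is a callable that returns 1 or 0 for the features.
    """
    if comment1(features) == 1:
        if comment2(features) == 1:
            return "positive"
        # comment2 returned 0: let comment4 decide.
        return "slightly positive" if comment4(features) == 1 else "neutral"
    # comment1 returned 0: let comment3 decide.
    return "negative" if comment3(features) == 1 else "slightly negative"


# Hypothetical stand-ins that threshold a single sentiment score.
def comment1(score):
    return int(score >= 0.0)


def comment2(score):
    return int(score > 0.6)


def comment3(score):
    return int(score < -0.6)


def comment4(score):
    return int(score > 0.2)


for score in (0.9, 0.4, 0.0, -0.3, -0.8):
    print(score, classify_comment(score, comment1, comment2, comment3, comment4))
```

Because each comment visits at most three classifiers, the cascade matches the three-layer structure of the model while still producing five output classes.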
|---|---|---|---|---|---|---|---|---|
| 1 | 5/1 | 495 | 554 | 215 | 365 | 456 | 2085 | 89.824 |
| 2 | 5/2 | 635 | 512 | 123 | 214 | 424 | 1908 | 87.073 |
| 3 | 5/3 | 726 | 531 | 116 | 135 | 359 | 1867 | 89.364 |
| 4 | 5/4 | 615 | 193 | 154 | 256 | 461 | 1679 | 87.221 |
| 5 | 5/5 | 232 | 472 | 168 | 413 | 547 | 1832 | 89.307 |
| 6 | 5/6 | 425 | 268 | 121 | 149 | 132 | 1095 | 85.226 |
| 7 | 5/7 | 546 | 409 | 137 | 335 | 191 | 1618 | 88.919 |
| 8 | 5/8 | 387 | 514 | 256 | 215 | 185 | 1557 | 85.423 |
| 9 | 5/9 | 299 | 327 | 172 | 412 | 198 | 1408 | 87.341 |
An overall count of each sentiment type in the comment data shows that positive sentiment is more prominent, mainly because of the positive guidance of traditional festivals and the promotion of red cultural festivals in the official media. The overall sentiment tends to be positive. The daily trend in the number of comments for each sentiment is shown in Figure 4. The line chart shows that May 1 to May 4 is dominated by positive comments, and that discussion of Labor Day and Youth Day dominates this social media network.

Figure 4. Daily trend in the number of comments by sentiment
To further understand what web users focus on when commenting on May 4 Youth Day, a traditional red culture festival, this paper counts the 20 most frequent keywords in the comments. The high-frequency words in the online comments are shown in Table 3.
Table 3. High-frequency keywords in online comments
| No. | Keyword | Frequency | No. | Keyword | Frequency |
|---|---|---|---|---|---|
| 1 | Youth talk | 854 | 11 | Youth dream | 132 |
| 2 | Holiday | 792 | 12 | Red culture | 102 |
| 3 | Youth | 654 | 13 | Cultural heritage | 86 |
| 4 | Youth festival | 412 | 14 | Youth education | 81 |
| 5 | May 4 commemorative activities | 301 | 15 | Traditional festival | 75 |
| 6 | May Fourth movement | 256 | 16 | Origin of festival | 72 |
| 7 | The origin of youth festival | 221 | 17 | Theme activity | 69 |
| 8 | A message of the youth day | 217 | 18 | Historical figure | 64 |
| 9 | Youth activity | 185 | 19 | Historical event | 52 |
| 10 | The history of the May 4 movement | 164 | 20 | Youth image | 37 |
The top 10 keywords are “Youth talk”, “Holiday”, “Youth”, “Youth festival”, “May 4 commemorative activities”, “May Fourth movement”, “The origin of youth festival”, “A message of the youth day”, “Youth activity”, and “The history of the May 4 movement”. These high-frequency words show that most netizens pay close attention to traditional red culture festivals and hope to take part in building these festivals and in inheriting and passing on red culture.
May Fourth Youth Day is not only a commemorative day, but also a cultural symbol with symbolic significance. In different historical periods, the red cultural connotations covered by May Fourth Youth Day have guided the development direction of the youth movement. May 4 Youth Day is one of the symbols of the contemporary inheritance and development of red culture.
Although “Internet + red culture” has development potential, social network platforms currently favor the dissemination of entertainment information, which tends to crowd out the space for disseminating red culture. At the same time, the fragmented, fast-food narrative logic of social network platforms breaks up the wholeness and depth of red culture.
To make better use of the Internet to spread red culture and realize the “Internet inheritance of red culture”, policymakers need to fully explore its economic value and promote the high-quality development of the red culture industry.
Accelerate specialized legislation and judicial practice for the protection of red historical and cultural resources. At present there is no direct legal basis for pursuing legal responsibility for malicious rumor-mongering about, and smearing of, red culture on the Internet.
Implement regular special operations to maintain a clear order in cyberspace. In response to harmful content such as “distortions of red history, denigration of the Party’s guidelines and policies, and achievements in social construction”, the public security authorities and the departments in charge of the Internet and information technology have set up a special operation to combat false rumours about red culture and history. A mechanism for reviewing and guiding red cultural information has been set up, with increased supervision of platforms and dissemination channels, and a special area for reporting false information on red history and culture has been established.
Promote the development of red culture in cyberspace through “Internet+”, enhance the digital influence of red culture, educate and guide netizens to follow socialist core values, and create a new way of educating people through red culture on the Internet. Through “Internet+”, the placement of public service advertisements in cities has been strengthened, and urban public broadcasting systems, such as mobile radio and television on buses, electronic bulletin boards, giant-screen advertisements, and theaters, have been used to broadcast red stories and commemorative videos. Red cultural education is socialized through the network, providing citizens with extensive and innovative ideological and political education and building a good urban environment for red education. Meanwhile, mainstream media actively organize cross-media cooperation and hold widely attended programs to promote knowledge of red culture. Placements on central media television, linked to prize quizzes on mobile phones or to other promotion mechanisms, encourage the participation of the whole of society, create viewing hot spots, and generate red topics.
This paper applies machine-learning-based text sentiment classification to the inheritance of “Internet + Red Culture”. By analyzing the opportunities and challenges facing red culture in the all-media era, we design a prediction model of the inheritance tendency of red culture for Internet comments. The predicted inheritance tendency and the high-frequency vocabulary of the comments are used to assess the likelihood of red culture inheritance.
The SVM model before optimization and the optimized hierarchical SVM model are verified experimentally, and both classification models achieve more than 85% accuracy in webpage text sentiment classification. A comparison of the three evaluation metrics of accuracy, recall, and F-measure shows that the optimized hierarchical SVM sentiment classification model performs better. Taking May 4 Youth Day as the main time node, we analyze the classification results for webpage comments from May 1 to May 9. During the survey period, positive emotions dominate the webpage comments, and most netizens pay close attention to the red culture carried by May 4 Youth Day. At the same time, the inheritance of red culture in cyberspace needs stronger and more accurate guidance to ensure that red culture is passed on in a positive way.
