Open access

Modeling of Student Group Public Opinion Dissemination Mechanism Based on Graph Convolutional Networks in Ideological and Political Education in Colleges and Universities

25 Sep 2025

Introduction

With the rapid development of online media and the continuous growth in the number of internet users, online public opinion has surged. This brings considerable risk to the supervision of online public opinion in colleges and universities and places higher demands on how they respond to public opinion crises [1-4].

Online public opinion in colleges and universities is rich in content and wide in scope. By the nature of the event it can be roughly divided into social stability, campus life, campus safety, college management, and so on, and it can also be divided into positive and negative public opinion [5-6]. Although its content is complex and diverse, the university environment is relatively independent and its subjects are relatively fixed, so the flash points are mainly college students’ rights and interests, academic corruption, teachers’ morals and ethics, and school safety management, which gives contingency planning for university public opinion regular patterns to follow [7-10]. The main subjects in college public opinion events are the institutions themselves, their teachers, and their students. Contemporary college students are highly active: they readily participate in online public opinion events and express their views in the online arena, which accelerates the fermentation and dissemination of those events [11-14]. The age, openness to new things, cultural literacy, and active thinking of the college student group determine the particular way in which opinion tendencies evolve in the college online public opinion system [15-16]. Studying the characteristics of the student body in the online opinion field makes it possible to grasp the dynamics of student thinking more accurately, cultivate students’ rational thinking, make use of teams of student “opinion leaders”, and create a healthy campus public opinion environment [17-18].
As the main participants in college online public opinion, college students tend to be less mature, lack social experience, and have weaker online discernment. In the process of information dissemination their opinions are easily influenced, which can readily induce online “group polarization” [19-20].

As the main position for cultivating college students to establish correct ideology, colleges and universities will inevitably be influenced by network public opinion [21]. With the gradual deepening of college students’ dependence on the Internet, the connection between ideological and political education and the guidance of network public opinion is getting closer and closer, and the two are influencing each other in a synergistic way. A deep grasp of the relationship between ideological and political education in colleges and universities and network public opinion is conducive to promoting the integration and development of the two [22-23].

Literature [24] constructed a network public opinion monitoring model based on dynamic monitoring strategy, with multi-level and all-round guidance, which can effectively support efficient early warning of network public opinion. Literature [25] reveals that social media has a significant impact on the shaping and dissemination of public opinion among college students based on questionnaire survey method, and the monitoring and control of public opinion should focus on the dynamics of college students’ social media. Literature [26] analyzes the underlying logic and factors of online public opinion formation from macro, meso and micro dimensions, and puts forward the strategy of establishing a close link between online public opinion and optimizing and improving civic education. Literature [27] envisions an intersection mechanism of college students’ civic education with genetic algorithm as the core logic, and explores the intersection of campus culture and civic education around the theme of network public opinion, aiming to promote the improvement of the effect of civic education. Literature [28] analyzes the current situation, challenges and potential obstacles of Civic and Political Education based on the characteristics of the new media era, and helps to explore the path of reform and innovation of Civic and Political Education in the context of new media, which makes a positive contribution to the improvement of the informatization of Civic and Political Education. 
Literature [29] combines the methods of collecting data, extracting knowledge, integrating knowledge, and applying knowledge reasoning to build a knowledge map of Civic and Political Education, and based on empirical analysis, it confirms the effectiveness of the proposed Civic and Political Education Knowledge Map, which can be used for accurately describing the connection between Civic and Political terms as well as scientifically and rationally assessing the effectiveness of Civic and Political Education.

The study first introduces the common statistical properties of social networks, including degree distribution, clustering coefficient, and average path length. Second, it introduces the node similarity indices commonly used to measure the degree of similarity and connection between nodes in social networks, which provide a basis for opinion prediction, and then presents the SIR model underlying the propagation of public opinion on social networks. Third, drawing on the opinion topic attribute network and the user’s historical social content, an opinion propagation prediction method based on representation learning and graph convolutional networks is proposed to predict user behavior in the next time slice from the data of the current time slice. The competitive relationship between positive and negative opinions is also considered, and the influence of positive and negative opinions is integrated into the user’s feature representation. Finally, the CNN-GCN neural network model constructed in this paper is applied to student group public opinion for comparison, and the performance of the method is also tested on the Twitter15 and Twitter16 public datasets.

Complex network-based identification of propagation key nodes
Theoretical Foundations of Social Networks
Statistical properties of complex networks

Degree and degree distribution

In graph theory and complex network theory, the degree of a node is the number of edges connected to that node. Degree can be used to describe the importance and influence of a node in a network as well as the connection relationships between nodes. Let the degree of node $i$ be $k_i$; it is given by equation (1):

$$k_i = \sum_{j=1}^{N} A_{ij} \tag{1}$$

where $A_{ij}$ denotes the connection relationship between the $i$-th node and the $j$-th node in the network: $A_{ij}$ is 1 when there is a connection between node $i$ and node $j$, and 0 otherwise. The degree $k_i$ of node $i$ is therefore the total number of connections between it and the other nodes [30].

The degree distribution of a network can be expressed as a probability distribution function $P(k)$ that gives the proportion of nodes with degree $k$ among all nodes in the network. In practice, the cumulative distribution function $P_c(k)$ is often used to describe the degree distribution; $P_c(k)$ is the proportion of nodes with degree not less than $k$ among all nodes, as in equation (2):

$$P_c(k) = \sum_{k'=k}^{\infty} P(k') \tag{2}$$
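As an illustration of equations (1) and (2), the following is a minimal sketch that computes node degrees, the degree distribution P(k), and the cumulative distribution Pc(k) from an adjacency matrix. The toy graph and the function names are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def degrees(A):
    """Degree of each node: row sums of a symmetric 0/1 adjacency matrix (eq. 1)."""
    return np.asarray(A).sum(axis=1)

def degree_distribution(k):
    """P(k): fraction of nodes with exactly degree k, for k = 0..max degree."""
    counts = np.bincount(k)
    return counts / counts.sum()

def cumulative_degree_distribution(k):
    """Pc(k): fraction of nodes whose degree is at least k (eq. 2)."""
    P = degree_distribution(k)
    # Reverse cumulative sum: Pc(k) = sum of P(k') over all k' >= k
    return P[::-1].cumsum()[::-1]

# Hypothetical toy graph: node 1 is connected to nodes 0, 2 and 3
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
])
k = degrees(A)
P = degree_distribution(k)
Pc = cumulative_degree_distribution(k)
print(k, P, Pc)
```

The cumulative form is usually preferred on small or noisy samples because it smooths the fluctuations that P(k) shows at high degrees.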

Clustering coefficient

In complex networks, the clustering coefficient describes how closely the nodes are connected, i.e., how many of the $k$ neighbors of a node with degree $k$ are connected to each other. The clustering coefficient measures the interrelationships between nodes and is important for understanding the topology and function of complex networks.

For a node $i$ in the network, its clustering coefficient $C_i$ can be expressed as equation (3):

$$C_i = \frac{2T_i}{k_i(k_i - 1)} \tag{3}$$

where $T_i$ denotes the number of connections existing between the neighbors of node $i$ and $k_i$ denotes the degree of node $i$. For the whole network, the average clustering coefficient $C$ can be expressed as equation (4):

$$C = \frac{1}{N}\sum_{i=1}^{N} C_i \tag{4}$$
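The local and average clustering coefficients of equations (3) and (4) can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes an undirected, unweighted graph without self-loops, and it obtains the number of links among a node's neighbors from the diagonal of the cubed adjacency matrix. Nodes with fewer than two neighbors are assigned a coefficient of 0 by convention here.

```python
import numpy as np

def clustering_coefficients(A):
    """Local clustering coefficient C_i = 2*T_i / (k_i*(k_i - 1)) for every node."""
    A = np.asarray(A)
    k = A.sum(axis=1)
    # (A^3)_ii counts closed walks of length 3 starting at i; every triangle
    # through i is counted twice (once per direction), hence the division by 2.
    T = np.diag(A @ A @ A) / 2
    C = np.zeros(len(A), dtype=float)
    mask = k > 1  # C_i is undefined for k_i < 2; this sketch uses 0 in that case
    C[mask] = 2 * T[mask] / (k[mask] * (k[mask] - 1))
    return C

# Hypothetical toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
C_local = clustering_coefficients(A)
C_avg = C_local.mean()  # average clustering coefficient, eq. (4)
print(C_local, C_avg)
```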

Betweenness centrality

Betweenness centrality measures how often a node lies on the shortest paths in the network, i.e., how important a role the node plays in the shortest paths connecting other nodes. Specifically, let there be $N$ nodes in the network, let the number of shortest paths between node $k$ and node $l$ be $g_{kl}$, and let the number of those shortest paths that pass through node $i$ be $g_{kl}(i)$. The betweenness centrality $C_B(i)$ of node $i$ can then be expressed as equation (5):

$$C_B(i) = \sum_{k \neq i \neq l} \frac{g_{kl}(i)}{g_{kl}} \tag{5}$$

Characteristic Path Length

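The characteristic path length is the mean length of the shortest paths between pairs of nodes in a complex network, usually denoted by $L$. It is calculated as shown in equation (6):

$$L = \frac{2}{N(N+1)}\sum_{i \geq j} d_{ij} \tag{6}$$

where $d_{ij}$ denotes the length of the shortest path between node $i$ and node $j$.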
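A minimal sketch of the characteristic path length follows. It assumes a connected, unweighted, undirected graph stored as an adjacency dictionary, and it assumes that the sum in equation (6) runs over node pairs with i ≥ j (self-distances are zero), which is the convention that matches the N(N+1) normalization. Shortest-path lengths are obtained with breadth-first search; the helper names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def characteristic_path_length(adj):
    """L = 2 / (N*(N+1)) * sum of d_ij over pairs i >= j (graph assumed connected)."""
    nodes = sorted(adj)
    n = len(nodes)
    total = 0
    for i in nodes:
        dist = bfs_distances(adj, i)
        total += sum(dist[j] for j in nodes if j <= i)
    return 2 * total / (n * (n + 1))

# Hypothetical toy graph: the path 0 - 1 - 2
adj = {0: [1], 1: [0, 2], 2: [1]}
L = characteristic_path_length(adj)
print(L)
```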

Basic network model

In order to better understand and study complex networks, many classical network models have been proposed, including random networks, scale-free networks, regular networks, and small-world networks. Each of the four models is described below.

A regular network is the simplest model of a network in which each node is connected to a fixed number of neighboring nodes and the connections between these neighboring nodes form a regular structure. This structure is usually described by a parameter k, indicating that each node has degree k (i.e., the number of edges connected to that node).

Random networks are another simple network model in which nodes and edges are randomly generated with a certain probability distribution. In a random network, the degree of the nodes and the number of node connections have uncertainty, and the general case is portrayed by the degree distribution P(k), where P(k) denotes the probability that a node has degree k.

Small-world network is a kind of network model between regular network and random network, which adds some randomness on the basis of regular network. The construction process of small-world networks is usually to connect each node of a regular network to its neighboring k nodes, and then reconnect the edges with a certain probability p so that there are some long-distance edges in the network.

Scale-free networks are an important network model whose node degrees show a power law distribution, i.e., there are a few nodes with extremely high degrees, while the vast majority of nodes have low degrees. The structure formation process of scale-free networks usually involves continuously adding new nodes and connecting them to existing nodes; the number of nodes connected to each new node is a random value, but the probability of connection is proportional to the degree of the existing nodes.

Node similarity metrics

Node similarity metrics are a class of metrics used to measure the similarity between nodes in a complex network, usually based on node attributes, neighboring nodes, paths and other factors to calculate the similarity between nodes.

Common Neighbor (CN), the simplest similarity metric, takes the number of common neighbors of two nodes as their similarity, as in equation (7):

$$S_{CN}(u,v) = |N(u) \cap N(v)| \tag{7}$$

The Jaccard coefficient, which builds on CN, is the ratio of the number of common neighbors of two nodes to the number of all their neighbors, as in equation (8):

$$S_{Jaccard}(u,v) = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|} \tag{8}$$

The cosine similarity metric relates the number of neighbors that two nodes share to the sizes of their neighborhoods. It is based on the same assumption as CN, i.e., two nodes in a graph are similar if they have many common neighbors. It is given in equation (9):

$$Sim_S(u,v) = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}} \tag{9}$$

Local path metric, which considers the neighbors of a node and the connectivity between its neighbors. It takes the reciprocal of the length of the shortest path between two nodes as the similarity, as in equation (10):

$$S_{LP}(u,v) = \frac{1}{d(u,v)} \tag{10}$$

Katz coefficient, which uses an exponentially weighted function of the distance between nodes and the path weights as the similarity, as in equation (11):

$$S_{Katz}(u,v) = \sum_{i=1}^{\infty} \beta^{i}\, d_i(u,v) \tag{11}$$
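The three neighborhood-based metrics above (common neighbors, Jaccard, cosine) can be sketched in a few lines. This is an illustrative example under stated assumptions: the graph is an undirected adjacency dictionary of neighbor sets, the node names are hypothetical, and the cosine form is taken with a square root in the denominator.

```python
import math

def common_neighbors(adj, u, v):
    """S_CN(u, v) = |N(u) ∩ N(v)|."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """S_Jaccard(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def cosine(adj, u, v):
    """Cosine similarity = |N(u) ∩ N(v)| / sqrt(|N(u)| * |N(v)|)."""
    denom = math.sqrt(len(adj[u]) * len(adj[v]))
    return len(adj[u] & adj[v]) / denom if denom else 0.0

# Hypothetical toy graph: users a and b share neighbors c and d; b also follows e
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"b"},
}
s_cn = common_neighbors(adj, "a", "b")
s_jac = jaccard(adj, "a", "b")
s_cos = cosine(adj, "a", "b")
print(s_cn, s_jac, s_cos)
```

Jaccard and cosine both normalize the raw common-neighbor count, so they penalize high-degree nodes that share neighbors with almost everyone.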

SIR model

In the SIR model, all people in the network are classified into 3 categories, the susceptible population that has not yet been infected, the patient population that has already been infected and is transmissible, and the recovered population that has recovered from the infection and has acquired immunity.

The sizes of the three groups are denoted $S$, $I$, and $R$ after the initials of their English names. We introduce the infection rate $\beta$, which represents the probability that a susceptible person is infected after coming into contact with an infected person or, in the case of social networks, the probability of receiving the relevant information and opinions [31]. The number of all possible contacts between susceptible and infected people is $SI$, so the expected number of actual infection events in time $\Delta t$ is $SI \times \beta\Delta t$, and the number of susceptible people decreases accordingly, as in equation (12):

$$\Delta S = -SI \times \beta\Delta t \tag{12}$$

The chance of a patient switching to the immune state in time $\Delta t$ is $\gamma\Delta t$, so the expected number of recovered patients in this period is $I \times \gamma\Delta t$. The change in the size of the immune population is shown in equation (13):

$$\Delta R = I \times \gamma\Delta t \tag{13}$$

Overall, the number of patients varies according to equation (14):

$$\Delta I = SI \times \beta\Delta t - I \times \gamma\Delta t \tag{14}$$

In order to show the changing relationship between S, I and R more clearly, the SIR flowchart is shown in Figure 1.

Figure 1.

SIR Process diagram

When we take $\Delta t \to 0$, i.e., in the limit, the above equations can be rewritten as the differential equations shown in Eq. (15):

$$\frac{dS}{dt} = -\beta SI, \qquad \frac{dI}{dt} = \beta SI - \gamma I, \qquad \frac{dR}{dt} = \gamma I \tag{15}$$
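To make the dynamics concrete, the following is a minimal sketch that integrates the SIR equations with the forward Euler method, which is exactly the discrete update of equations (12) to (14). The parameter values, the use of population fractions instead of head counts, and the function name are assumptions chosen for illustration, not values from the paper.

```python
def simulate_sir(s0, i0, r0, beta, gamma, dt, steps):
    """Forward-Euler integration of the SIR model.

    s, i, r are population fractions; beta is the infection rate and
    gamma the recovery (immunity) rate. Returns the list of (s, i, r) states.
    """
    s, i, r = s0, i0, r0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # magnitude of Delta S in eq. (12)
        new_recoveries = gamma * i * dt      # Delta R in eq. (13)
        s -= new_infections
        i += new_infections - new_recoveries  # Delta I in eq. (14)
        r += new_recoveries
        history.append((s, i, r))
    return history

# Hypothetical setting: 1% of users initially spread the opinion
history = simulate_sir(0.99, 0.01, 0.0, beta=0.5, gamma=0.1, dt=0.1, steps=1000)
peak_step = max(range(len(history)), key=lambda t: history[t][1])
print("infected fraction peaks at t =", peak_step * 0.1)
print("final state:", history[-1])
```

Because every infection removes one unit from S and adds it to I, and every recovery moves one unit from I to R, the total S + I + R stays constant throughout the simulation.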

Prediction method of public opinion dissemination based on representation learning and graph convolutional network
Principle analysis

In this section, both the opinion topic attribute network as well as the user’s historical social content are considered, and the characterization strategy of attribute network representation learning combined with textual representation learning is adopted. Further considering the competition and symbiosis between positive and negative opinion, evolutionary game theory is introduced to measure this influence and incorporated into the network representation learning process to obtain a more accurate representation of the user’s features in the opinion topic network [32]. In addition, considering the sparseness of opinion topic data, which leads to poor model generalization, this chapter proposes a new data segmentation method to mitigate the adverse effects caused by it, and establishes a prediction model for user group behaviors in terms of opinion propagation.

Method realization process and key technologies
Method realization flow

The method in this section mainly consists of three parts: quantitative representation of positive and negative public opinion impacts, characterization of users, and propagation prediction model construction, and the method implementation flow is shown in Figure 2.

Figure 2.

Process of method implementation

Definitions

The main purpose of this section is to predict whether potential users under the opinion topic will forward the opinion message or the opposite opinion message by analyzing the users who participate in the positive and negative opinion messages, and also to predict the development trend of the opinion topic. The schematic overview of the problem is shown in Figure 3.

Figure 3.

Schematic diagram of the problem

Definition 1: Participating users $U^t \subset (R^t \cup A^t)$ and participating user network $G_U^t = (U^t, E_U^t)$. $U^t$ denotes the users participating in an opinion topic in time period $t$, where $R^t$ is the set of users who participate in disseminating the opinion and $A^t$ is the set of users who participate in disseminating the opposite opinion. $G_U^t = (U^t, E_U^t)$ represents the users participating in the opinion topic in time period $t$ and their network, and $E_U^t \subseteq U^t \times U^t$ is the set of edges among the users $U^t$ participating in the opinion topic in time period $t$.

Definition 2: Potential users $V^t$ and potential user network $G_V^t = (V^t, E_V^t)$. $V^t$ denotes the potential users who may participate in disseminating the original opinion or the opposite opinion in time period $t$. $G_V^t = (V^t, E_V^t)$ denotes the potential users and their network in time period $t$, and $E_V^t \subseteq V^t \times V^t$ is the set of edges among the potential users of the opinion topic in time period $t$.

Definition 3: Factors affecting user retweets, $UP = \{(u_i, p) \mid u_i \in (R \cup A \cup V)\}$. $p$ denotes the factors that drive each user $u_i$ in an opinion topic to participate in propagating the topic, including the user’s activity behavior $Act(u_i)$, the user’s historical retweeting rate $Ret(u_i)$, the user’s information perception rate $Pre(u_i)$, and the influence of the user’s friends on the user $Fri(u_i)$. Thus, $p = [Act(u_i), Ret(u_i), Pre(u_i), Fri(u_i)]$.

Definition 4: The user’s historical social content is set to $HS = \{(u_i, s) \mid u_i \in (R \cup A \cup V)\}$, where $s$ denotes all historical microblog text messages posted or retweeted by user $u_i$ in the opinion topic network.

Definition 5: Information prevalence $Pop(t)$. A propagation rate function $InfoNum(t) - InfoNum(t-1)$ is introduced to represent the prevalence of the opinion topic at time $t$ [33]. The impact of the opinion topic on information dissemination can therefore be defined as:

$$Pop(t) = \big(InfoNum(t) - InfoNum(t-1)\big) \times \left(\frac{1}{2}\right)^{t/t_w}$$

Problem description

Let $G_{UV}^t$ denote the mixed attribute network under the opinion topic within time period $t$. First, it is necessary to obtain the relationship network $G_U^t$ of the users who participate in disseminating the opinion or the opposite opinion and the relationship network $G_V^t$ of the users who may potentially participate in disseminating the opinion. Then, the social networks and social contents of the participating users and the potential users within time period $t$ are represented separately and combined to form the user feature representation. Finally, the user feature representations are input into the prediction model to predict whether the potential users will forward the opinion information in the next time period and, if they do, whether it is positive or negative opinion, denoted by $Y^{t+1}$. The problem can be expressed as:

$$\left.\begin{array}{l} G_{UV}^{t} \rightarrow G_U^{t},\ G_V^{t} \\ UP = \{(u_i, p) \mid u_i \in (R \cup A \cup V)\} \\ HS = \{(u_i, s) \mid u_i \in (R \cup A \cup V)\} \end{array}\right\} \Rightarrow G_U^{t+1} = Y^{t+1}$$

Quantitative Representation of the Impact of Positive and Negative Public Opinions

Information impact metrics

Information influence includes factors internal and external to users. Users with high activity are more likely to participate in forwarding opinion information. The activity level of user $u_i$ is:

$$Act(u_i) = \sigma \cdot Num[orig(u_i)] + Num[retw(u_i)]$$

The ratio of the number of hot topics retweeted by the user to the number of hot topics the user receives reflects the probability that the user retweets a new topic. Since the hot-topic information a user receives mainly comes from the accounts the user follows, the historical retweeting rate of the user is defined as:

$$Ret(u_i) = \frac{retwNum(u_i)}{getRetNum(u_i)}$$

Here, $getRetNum(u_i)$ denotes the total amount of content the user receives from the accounts the user follows.

The user’s information perception rate $Pre(u_i)$ reflects the probability that the user chooses the opinion topic, i.e.:

$$Pre(u_i) = \frac{Fol(u_i)}{Fol_{ave}(net)}$$

For a given opinion topic, the propagation behavior of users already participating in the topic may also affect the state of potential users, and the degree of influence varies from one user to another. This influence is represented by a multidimensional vector, i.e.:

$$w_{u_i}^{Fri} = [a_1^{Fri}, a_2^{Fri}, \ldots, a_n^{Fri}]$$

where $n$ is the number of users and potential users involved in the dissemination of the opinion topic, and $a_j^{Fri}$ represents the influence of user $u_j$ on user $u_i$, calculated as follows:

$$a_j^{Fri} = \exp\!\left(\frac{\overline{retwedNum(u_j)}}{\left(\sum_{k=0}^{n}\overline{retwedNum(u_k)}\right)/n}\right)$$

where $\overline{retwedNum(u_k)}$ denotes the average number of times user $u_i$ retweets the tweets of user $u_k$. If $\sum_{k=0}^{n}\overline{retwedNum(u_k)} = 0$ or user $u_j$ is not a friend of $u_i$, then $a_j^{Fri} = 0$.

The internal influence $f_{in}(u_i)$ is constructed from user activity, the user’s historical retweet rate, and the information perception rate. The external influence $f_{out}(u_i, u_j)$ is constructed from friend influence and information dissemination influence, i.e.:

$$f_{in}(u_i) = Act(u_i) \times Ret(u_i) \times Pre(u_i)$$

$$f_{out}(u_i, u_j) = a_j^{Fri} \times Pop(t)$$

Finally, the influence functions of positive and negative public opinion are constructed by integrating the internal and external factors of user behavior through multiple linear regression:

$$Inf_R(u_i, u_j) = \rho_0 + \rho_1 \times f_{in}(u_i) + \rho_2 \times f_{out1}(u_i, u_j)$$

$$Inf_A(u_i, u_j) = \rho_0 + \rho_1 \times f_{in}(u_i) + \rho_2 \times f_{out2}(u_i, u_j)$$

Positive and negative opinion influence measurement

In this section, evolutionary game theory is introduced to construct a positive and negative opinion influence model that quantifies the influence of positive and negative opinion information on users and yields the positive and negative opinion influence adjacency matrix $W_{UV}^{M}$. In the group of users adjacent to target user $u_i$, the proportions of users retweeting the opinion information and retweeting the opposite opinion information are denoted $P_1$ and $P_2$, respectively. Some users in the adjacent group do not participate in retweeting, but such users usually have little influence on the target user, so $P_1 + P_2 = 1$. The benefit functions of the two strategies are:

$$Pro_R(u_i, u_j) = P_1 \times Inf_R(u_i, u_j)$$

$$Pro_A(u_i, u_j) = P_2 \times Inf_A(u_i, u_j)$$

Evolutionary game theory is then used to measure the positive and negative opinion impacts:

$$Mut_R(u_i, u_j) = \frac{e^{(Pro_R(u_i, u_j) - Pro_A(u_i, u_j))}}{1 + e^{(Pro_R(u_i, u_j) - Pro_A(u_i, u_j))}}$$

$$Mut_A(u_i, u_j) = \frac{e^{(Pro_A(u_i, u_j) - Pro_R(u_i, u_j))}}{1 + e^{(Pro_A(u_i, u_j) - Pro_R(u_i, u_j))}}$$
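The competition between the two strategies can be sketched numerically. The example below assumes the logistic form Mut_R = e^d / (1 + e^d) with d = Pro_R − Pro_A (and the mirrored form for Mut_A), which is how the two expressions above are read here; the proportions and influence values are hypothetical numbers chosen only for illustration.

```python
import math

def benefit(proportion, influence):
    """Benefit of a strategy: share of neighbors using it times its influence."""
    return proportion * influence

def mutual_influence(pro_r, pro_a):
    """Return (Mut_R, Mut_A) as logistic functions of the benefit difference."""
    diff = pro_r - pro_a
    mut_r = 1.0 / (1.0 + math.exp(-diff))  # equals e^diff / (1 + e^diff)
    mut_a = 1.0 / (1.0 + math.exp(diff))   # equals e^-diff / (1 + e^-diff)
    return mut_r, mut_a

# Hypothetical values: 70% of neighbors retweet the opinion, 30% the opposite one
p1, p2 = 0.7, 0.3
inf_r, inf_a = 1.2, 1.5  # assumed outputs of the influence functions Inf_R, Inf_A
pro_r = benefit(p1, inf_r)
pro_a = benefit(p2, inf_a)
mut_r, mut_a = mutual_influence(pro_r, pro_a)
print(mut_r, mut_a)
```

Under this reading the two quantities always sum to 1, so a gain in the influence of one opinion is exactly a loss for the opposing one, which reflects the competitive relationship described in the text.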

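Finally, according to the competitive nature of positive and negative public opinion, the positive and negative public opinion influence adjacency matrix is constructed:

$$W_{UV}^{M} = \begin{bmatrix} m(u_1, u_1) & m(u_1, u_2) & \cdots & m(u_1, u_n) \\ m(u_2, u_1) & m(u_2, u_2) & \cdots & m(u_2, u_n) \\ \vdots & \vdots & \ddots & \vdots \\ m(u_n, u_1) & m(u_n, u_2) & \cdots & m(u_n, u_n) \end{bmatrix}$$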

where $m(u_i, u_j) = Mut_R(u_i, u_j) - Mut_A(u_i, u_j)$; if $i = j$, then $m(u_i, u_j) = 0$.

User Characterization Representation

This section obtains the user’s feature representation by combining the user’s opinion topic attribute network representation with the text representation of the user’s historical social content.

Opinion propagation prediction model

CNN-GCN model

First, the feature matrix is convolved using a CNN layer. Then, the output of the CNN layer and the preprocessed adjacency matrix $\hat{A}$ are input to a two-layer graph convolutional network with an intermediate layer. Finally, the softmax function converts the model output into class probabilities for each node, i.e.:

$$Z = \mathrm{softmax}\Big(\hat{A}\,\mathrm{ReLU}\big(\hat{A}\,(r_j^{0} \times \mathrm{cnnmodel}(H^{0}))\,W^{0}\big)\,W^{1}\Big)$$

Here, $\mathrm{ReLU}(x) = \max(0, x)$, $r_j^{i} \sim \mathrm{Bernoulli}(p)$, and $\mathrm{softmax}(x_i) = \dfrac{\exp(x_i)}{\sum_{j=1}^{n}\exp(x_j)}$.

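The output of the whole model is $Z = \{P(r \mid u_i), P(a \mid u_i), P(d \mid u_i)\}$. Because user forwarding behavior prediction is a three-class classification problem, the predicted label is defined as follows:

$$Y = \begin{cases} 1, & P(r \mid u_i) = \max\big(P(r \mid u_i), P(a \mid u_i), P(d \mid u_i)\big) \\ 0, & P(d \mid u_i) = \max\big(P(r \mid u_i), P(a \mid u_i), P(d \mid u_i)\big) \\ -1, & P(a \mid u_i) = \max\big(P(r \mid u_i), P(a \mid u_i), P(d \mid u_i)\big) \end{cases}$$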
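The graph-convolutional part of the forward pass and the mapping from class probabilities to labels can be sketched in NumPy as below. This is a simplified illustration, not the authors' implementation: the CNN stage is replaced by a random feature matrix standing in for its output, dropout is omitted, the weights are random rather than trained, and the symmetric normalization with self-loops is the common GCN preprocessing, assumed here because the paper does not spell out how the adjacency matrix is preprocessed. The column order (r, a, d) and all function names are likewise assumptions.

```python
import numpy as np

def normalize_adjacency(A):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}: common GCN preprocessing (assumed)."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(x):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(A_hat, H, W0, W1):
    """Two-layer GCN: softmax(A_hat · ReLU(A_hat · H · W0) · W1)."""
    hidden = np.maximum(0.0, A_hat @ H @ W0)
    return softmax(A_hat @ hidden @ W1)

def to_labels(Z):
    """Map the arg-max class to Y: column 0 (r) -> 1, column 1 (a) -> -1, column 2 (d) -> 0."""
    return np.array([1, -1, 0])[Z.argmax(axis=1)]

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # hypothetical 4-user network
H = rng.normal(size=(4, 5))    # stand-in for the CNN output cnnmodel(H0)
W0 = rng.normal(size=(5, 8))   # untrained first-layer weights
W1 = rng.normal(size=(8, 3))   # untrained second-layer weights, 3 classes
Z = gcn_forward(normalize_adjacency(A), H, W0, W1)
Y = to_labels(Z)
print(Z.round(3), Y)
```

Each row of Z is a probability distribution over the three classes for one user, and `to_labels` applies the arg-max rule of the piecewise definition of Y.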

Model training

First, the smaller of the two classes, users who participate in disseminating the opinion and users who participate in disseminating the opposite opinion, is identified, and 80% of the data of that class is assigned to the training set. Then, the training sets of the other two classes of users are divided according to the ratio 1:1:2, which effectively mitigates the negative impact of imbalanced sample labels. After model training is complete, another opinion topic is used as the test set and validation set to optimize and finalize the model.

Recognition of Public Opinion Communication of Student Groups in Civic and Political Education

In this section, we use Python 3.0 and PyCharm to apply the SPGNR model to real student group public opinion events on Windows 7 platform and compare the effect of the application with the SIR model.

Applying CNN-GCN model to student group public opinion events, the CNN-GCN model application is shown in Figure 4. From the figure, it can be seen that the susceptible person, S, gradually decreases from August 2, and rapidly decreases from August 8 to 10, and decreases to 0 people on August 10th.

Figure 4.

CNN-GCN Model application

The number of infected I’s increased gradually from August 2, with a rapid increase from August 2 to 4 and a peak on July 5th. On August 1, the number of infected people I was rapidly decreasing to about 50,000 people. Beginning August 5, it was reduced to approximately 0 and remained unchanged. Immunizers, R, began a gradual increase on July 27 and increased rapidly from July 28 to Aug. 1, peaking at about 380,000 on Aug. 5 and remaining constant. In order to improve the effectiveness of monitoring, intervention settings are added to the monitoring of student body public opinion so as to facilitate real-time public opinion guidance, truth disclosure, and countermeasure implementation. According to the magnitude of negative impacts that may be triggered by student group public opinion, it is categorized into three different impact levels: low impact, serious impact, and bad impact. Correspondingly, three different intervention levels of monitoring measures are designed: 0, 1, and 2, where level 0 is no intervention, level 1 is general intervention strength, and level 2 is strong intervention strength.

In the simulation experiments, the propagation curve of student group public opinion is observed while the intervention coefficients are adjusted repeatedly. The coefficient values at which the curve reaches the effect of general intervention strength are taken as the intervention coefficients for level 1, and the values at which the curve reaches the effect of strong intervention are taken as the intervention coefficients for level 2. The intervention coefficient values corresponding to each intervention level are listed in Table 1.

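Table 1. The intervention coefficient table

| Intervention coefficient | Corresponding parameter | Value at intervention level 1 | Value at intervention level 2 |
|---|---|---|---|
| Learning rate intervention coefficient | h | 0.9 | 0.8 |
| Infection rate intervention coefficient | m | 0.9 | 0.5 |
| Infection rate intervention coefficient | n | 0.9 | 0.5 |
| Infection rate intervention coefficient | j | 0.9 | 0.5 |
| Conversion factor | u | 1.4 | 1.6 |
| Conversion factor | v | 1.4 | 1.6 |
| Immune rate intervention coefficient | e | 1.4 | 1.6 |
| Immune rate intervention coefficient | f | 1.4 | 1.6 |
| Immune rate intervention coefficient | q | 1.4 | 1.6 |
| Immune rate intervention coefficient | c | 1.4 | 1.6 |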

Measures such as disclosing the truth of the incident and committing to announce how it will be handled, relevant industry associations declaring their involvement, announcing the handling results, and improving the public management system, or unofficial media following up on the truth of the incident and calling for a rational view of it, can be regarded as intervention measures of level 1. Technical interventions by network platform operators and the like can be regarded as intervention measures of level 2. The application of the model to student group public opinion events at intervention level 0 is shown in Figure 5, and the parameter values are listed in Table 2.

Figure 5.

The intervention level is level 0

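Table 2. Parameter values

| Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|
| U | 1.2 | S | 0.002 | P | 0.001 |
| G | 0.002 | N | 0.002 | R | 0.001 |
| h | 0.16 | m | 0.16 | n | 0.15 |
| j | 0.25 | e | 0.03 | f | 0.03 |
| q | 0.03 | c | 0.15 | x | 1.6 |
| y | 1.5 | z | 2.5 | u | 0.02 |
| v | 0.03 | | | | |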

The evolution of public opinion at level 0 is shown in Table 3. When the intervention coefficients are set to their level 1 values, the main differences from level 0 are as follows: the peak number of actively infected persons P is about 50% of that at level 0, and the corresponding peak number of posts X is about 65% of that at level 0 and falls to about 0 on August 6, 1 day earlier than at level 0.

Table 3. The evolution of public opinion at level 0

| Group | Peak (persons) | Peak time | Peak number of online posts | Peak time of online posts | Time at which the number falls to zero |
|---|---|---|---|---|---|
| Susceptible | About 300,000 | August 1st | | | |
| Active person | About 50,000 | August 2nd | About 70,000 | August 5th | August 10th |
| Neutral person | About 70,000 | August 5th | About 10,000 | August 5th | August 10th |
| Negative person | About 110,000 | August 5th | About 15,000 | August 5th | August 10th |
| Immune | About 350,000 | August 10th | | | |
| Total | | | About 40,000 | August 5th | August 10th |

The evolution of public opinion at level 1 is shown in Table 4. Setting the corresponding values of the intervention coefficients according to the table, the main differences from level 0 emerge: the number of actively infected people P peaks 1 day earlier than at level 0, with a peak of about 52% of that at level 0, and falls to 0 people 4 days earlier than at level 0. The corresponding number of posts X peaks 1 day earlier than at level 0, at about 60% of the level 0 peak, and falls to 0 four days earlier than at level 0.

Table 4. The evolution of public opinion at level 1

| Group | Peak (persons) | Peak time | Peak number of online posts | Peak time of online posts | Time at which the number falls to zero |
|---|---|---|---|---|---|
| Active person | About 30,000 | August 2nd | About 40,000 | August 5th | August 10th |
| Neutral person | About 50,000 | August 5th | About 90,000 | August 5th | August 10th |
| Negative person | About 60,000 | August 5th | About 13,000 | August 5th | August 10th |
| Immune | About 360,000 | August 10th | | | |
| Total | | | About 300,000 | August 5th | August 10th |

Intervention levels 1 and 2 are shown in Figures 6 and 7. Comparative analysis from the graphs reveals that the peak number of Internet posts at intervention level 2 is only nearly 40,000 less than that with intervention level 1, which is about 10% of that at level 0, and the effect is not too obvious. Considering the intervention costs required for different intervention levels and the corresponding intervention effects, the adoption of intervention level 1 for student group public opinion can meet the regulatory needs.

Figure 6.

The intervention level is level 1

Figure 7.

The intervention level is level 2

The comparison of the effect of applying the model of this paper and the SIR model with the actual public opinion is shown in Table 5. The model designed in this paper takes into account the problem of the difference in posting rate of different types of infected people and finds out its correspondence, which can show the number of net posts. The model designed in this paper can combine with the actual situation of student group public opinion, so that the number of web posts decreases rapidly after the peak, which is in line with the real situation of actual student group public opinion.

Table 5. Comparison with the actual public opinion

| Model | Shows the number of online posts | Public opinion | Public opinion peak | Peak time of public opinion | Whether it falls quickly after the peak | Time at which it reaches zero |
|---|---|---|---|---|---|---|
| SIR Model | × | Three days earlier | Clearly smaller | The same | × | Six days earlier |
| Ours | ✓ | Three days earlier | The same | The same | ✓ | Five days earlier |
Model performance comparison experiments
Data sets and evaluation criteria

This section evaluates the proposed method on two real datasets, Twitter15 and Twitter16, which contain 1450 and 800 ideological and political education messages, respectively. The label of each source text in Twitter15 and Twitter16 is annotated according to the veracity tag of the corresponding article on the source websites, and takes one of four values: non-rumor (NR), false rumor (FR), true rumor (TR), or unverified rumor (UR). Recognition results are evaluated with accuracy (Acc) and the per-class F1 score (F1).
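As a minimal illustration of the two metrics, the following sketch computes accuracy and the per-class F1 score from lists of true and predicted labels. The helper names `accuracy` and `f1_per_class` and the toy labels are assumptions made for this example; they are not taken from the experiments.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_per_class(y_true, y_pred, labels):
    """One-vs-rest F1 for each label: 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Toy labels for illustration only.
y_true = ["NR", "NR", "FR", "FR", "TR", "TR", "UR", "UR"]
y_pred = ["NR", "FR", "FR", "FR", "TR", "UR", "UR", "UR"]

print(accuracy(y_true, y_pred))  # 0.75
print(f1_per_class(y_true, y_pred, ["NR", "FR", "TR", "UR"]))
```

Reporting F1 per class alongside accuracy matters here because a model can reach high accuracy while still performing unevenly across the four classes.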

Experimental setup

The experiments use a 2-layer GCN with a training batch size of 64. The node feature vectors output by the GCN layer are 32-dimensional. The learning rate is initialized to 5e-4 and gradually reduced during training, and the model is trained for 30 epochs. Word embeddings are initialized with 300-dimensional word vectors, the number of self-attention heads K in the GAT is set to 8, and the hidden-layer feature size is 32. The loss-balancing parameters are chosen for the best experimental results: η is set to 0.4 and k is set to 0.1.
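To make the 2-layer GCN setting concrete, here is a minimal forward-pass sketch in plain Python. It assumes the widely used propagation rule with a symmetrically normalized adjacency matrix and self-loops, which may differ from the exact layer used in the experiments. The 4-node graph, the random weights, and the helper names are hypothetical; only the dimensions (300-dimensional inputs, 32-dimensional hidden and output features) follow the setup above.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def relu(matrix):
    return [[max(0.0, value) for value in row] for row in matrix]


def gcn_forward(adj, features, w0, w1):
    """Two propagation steps: ReLU(A_norm X W0), then A_norm H W1."""
    a_norm = normalized_adjacency(adj)
    hidden = relu(matmul(matmul(a_norm, features), w0))
    return matmul(matmul(a_norm, hidden), w1)


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
adjacency = [  # hypothetical 4-node propagation graph
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
features = random_matrix(4, 300, rng)  # 300-dimensional node inputs
w0 = random_matrix(300, 32, rng)       # first layer: 300 -> 32
w1 = random_matrix(32, 32, rng)        # second layer: 32 -> 32
embeddings = gcn_forward(adjacency, features, w0, w1)
print(len(embeddings), len(embeddings[0]))  # 4 32
```

With two layers, each node's output depends on its neighbors and on its neighbors' neighbors, which is how structural information from the propagation graph enters the node representation.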

Benchmarking model

In this chapter, a total of 11 baseline models are selected for experiments on the Twitter15 and Twitter16 public datasets as follows:

DTC: a rumor detection method that uses a decision tree classifier with manual features to assess information credibility.

SVM-TS: a linear SVM classifier model that considers the time series structure.

SVM-TK [40]: an SVM classifier with a propagation tree kernel based on the rumor propagation structure.

MVAE: a multimodal rumor detection model that combines a variational autoencoder and a classifier to explore text and image information.

RvNN: a rumor detection model based on the propagation tree structure that uses GRU units to learn rumor representations.

PPC: a detection model combining RNN and CNN neural networks for mining user feature sequences.

GCAN: a detection model based on source tweets and propagation-based user features that combines GCN with a dual co-attention mechanism.

VAE-GCN: a rumor detection model using GCN as an encoder and graph autoencoder (GAE) as a decoder to explore the structure of rumor propagation.

BI-GCN: a GCN-based rumor detection model that uses semantic bidirectional propagation structure to explore rumor propagation and diffusion.

GLAN: a detection model that jointly encodes global information between source tweets, retweets and users.

HGATRD: a metapath-based heterogeneous graph attention model for capturing textual semantic features and global propagation features.

Analysis of experimental results

Benchmark model analysis

The first set of experiments verifies the effectiveness of the model in this paper by comparing it with 11 current advanced rumor detection models. The results on Twitter15 and Twitter16 are shown in Table 6 and Table 7, respectively. The Acc values of the model in this paper are the highest on both Twitter15 and Twitter16, at 0.993 and 0.998, respectively. The baselines that use manual features (DTC, SVM-TS, SVM-TK) perform significantly worse than the deep-learning-based methods, which indicates that deep learning methods mine effective rumor features better, while manual-feature methods are less accurate and less efficient. The GCN-based detection models (GCAN, VAE-GCN, BI-GCN, GLAN, HGATRD) perform relatively better than the other deep learning models (RvNN, PPC), which suggests that GCN can learn more comprehensive information and better node representations from social networks. Because GRU, RNN, and CNN cannot process graph-structured data, they ignore important structural features in social information, which degrades their performance. VAE-GCN and HGATRD perform strongly among the baselines. However, these approaches ignore the differences between semantic features and propagation representations and do not make good use of global features; our approach achieves the best accuracy because it selectively captures more effective features. Although our method does not achieve the best score on every metric, the results demonstrate its effectiveness in the rumor detection task when the trade-offs between different performance metrics are considered.

Table 6. Comparison of the proposed model with the baseline models on Twitter15

Reference model | Acc. | NR F1 | FR F1 | TR F1 | UR F1
DTC 0.469 0.795 0.372 0.305 0.426
SVM-TS 0.513 0.784 0.453 0.372 0.443
SVM-TK 0.662 0.623 0.713 0.786 0.644
MVAE 0.598 0.544 0.676 0.728 0.397
RvNN 0.721 0.677 0.754 0.834 0.658
PPC 0.848 0.806 0.884 0.842 0.78
GCAN 0.855 0.844 0.847 0.904
VAE-GCN 0.868 0.771 0.767 0.916 0.846
BI-GCN 0.89 0.925 0.867 0.921 0.85
GLAN 0.936 0.943 0.906 0.887 0.815
HGATRD 0.933 0.922 0.915 0.892 0.886
Our method 0.993 0.984 0.9 0.915 0.882

Table 7. Comparison of the proposed model with the baseline models on Twitter16

Reference model | Acc. | NR F1 | FR F1 | TR F1 | UR F1
DTC 0.458 0.663 0.406 0.395 0.45
SVM-TS 0.55 0.75 0.413 0.585 0.547
SVM-TK 0.635 0.631 0.612 0.781 0.667
MVAE 0.649 0.551 0.682 0.711 0.58
RvNN 0.739 0.647 0.768 0.838 0.717
PPC 0.878 0.818 0.914 0.824 0.819
GCAN 0.789 0.748 0.744 0.942
VAE-GCN 0.862 0.798 0.805 0.981 0.894
BI-GCN 0.896 0.814 0.848 0.936 0.837
GLAN 0.911 0.918 0.853 0.843 0.953
HGATRD 0.934 0.972 0.915 0.989 0.894
Our method 0.993 0.937 0.904 0.849 0.935

Graph reconstruction visualization and analysis

For a more intuitive comparison, the model output is projected into two-dimensional space with the t-SNE algorithm. To explore the impact of global structural information on rumor detection, the graph reconstruction module is removed and only GAT is used to learn node features by modeling the text-word subgraph and the text-user subgraph separately; all other experimental settings are kept unchanged. The resulting visualizations of graph reconstruction importance for Twitter15 and Twitter16 are shown in Fig. 8 and Fig. 9, respectively (panel a uses GAT only; panel b is the model of this paper). The figures show that the different event types in the datasets (NR, FR, TR, UR) are well separated and that the model in this paper performs better. Specifically, the points produced with GAT only are more scattered and irregular, and some event categories even overlap. By contrast, the points produced by the model in this paper are regularly distributed, with smaller distances within a category and larger distances between categories. In summary, the model in this paper uses a variational graph autoencoder (VGAE) to learn the posterior distribution, which not only provides a more flexible model for graph generation but also captures structural information better.
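The visual impression of tight, well-separated clusters can also be checked numerically. The sketch below is a hypothetical check and not part of the reported experiments: given 2-D points (for example, t-SNE coordinates) and their class labels, it computes the ratio of the mean within-class distance to the mean between-class distance. A smaller ratio means the classes are more compact relative to their separation. The function name `separation_ratio` and the toy coordinates are assumptions.

```python
import math
from itertools import combinations


def separation_ratio(points, labels):
    """Mean within-class distance divided by mean between-class distance."""
    within, between = [], []
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        distance = math.dist(p, q)
        (within if label_p == label_q else between).append(distance)
    if not within or not between:
        raise ValueError("need at least one same-class pair and one cross-class pair")
    return (sum(within) / len(within)) / (sum(between) / len(between))


# Toy 2-D coordinates for illustration only.
labels = ["NR", "NR", "FR", "FR"]
compact = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]    # classes far apart
scattered = [(0.0, 0.0), (10.0, 1.0), (10.0, 0.0), (0.0, 1.0)]  # classes interleaved

print(separation_ratio(compact, labels) < separation_ratio(scattered, labels))  # True
```

Applied to the two embeddings being compared, a lower ratio for one of them would support the qualitative reading of the scatter plots.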

Figure 8.

Visualization of graph reconstruction importance (Twitter15)

Figure 9.

Visualization of graph reconstruction importance (Twitter16)

Conclusion

The article draws the following conclusions:

The CNN-GCN model is applied to the dissemination of student public opinion, and the analysis finds that the number of susceptible persons gradually decreases from August 2nd and reaches 0 on August 10th. The model designed in this paper also accounts for the differing posting rates of different types of infected people. In line with the actual situation of student group public opinion, the number of posts decreases rapidly after the peak, so the experimental results are consistent with real student group public opinion.

Extensive experiments are conducted on the public Twitter15 and Twitter16 datasets using the public opinion propagation prediction method based on representation learning and graph convolutional networks. The results show that the points produced with GAT alone are more scattered and irregular, and that some event categories overlap. The points produced by the model in this paper are regularly distributed, with smaller distances within a category and larger distances between categories. The model in this paper therefore performs better than the comparison model and provides a more flexible model for graph generation.
