An In-depth Analysis of Data Science Methods on the Path of Women’s Consciousness Awakening under the Cultural System of Marxist Chineseization 
Published online: 19 March 2025
Received: 01 Nov. 2024
Accepted: 19 Feb. 2025
DOI: https://doi.org/10.2478/amns-2025-0365
© 2025 Shaohong Li, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
As Chinese society modernizes, the modernization of women is attracting growing attention, and women’s consciousness has become a frequently discussed social phenomenon. Women, as subjects indispensable to the development of society and humanity, are naturally the protagonists of this topic [1-4], and the growth of Chinese women’s consciousness has far-reaching significance for China’s modernization. In reality, many factors hinder the awakening of Chinese women’s consciousness, including history, tradition, society, family, and education, and these factors in turn hinder the modernization of women in contemporary China [5-8].
Owing to the deep-rooted influence of feudal ethical codes such as the Three Cardinal Guides and Five Constant Virtues and the notion of male superiority and female inferiority, the social status of Chinese women was low, and they were exploited, oppressed, and enslaved for a long time [9-10]. The May Fourth period was an era of great ideological liberation and of exchange and mingling between Chinese and Western cultures. Driven by the New Culture Movement, Western ideas of freedom, democracy, equality, human rights, women’s rights, emancipation, and education flooded into China [11-14] and injected fresh blood into Chinese culture. From then on, Chinese women’s self-consciousness began to awaken, and the majority of women began to fight for independence and autonomy. With the Chineseization of Marxism and the modernization of China, Chinese women’s consciousness achieved a comprehensive awakening, supported by economic development, the popularization of education, and advances in science and technology [15-18].
This paper combines the Word2Vec tool with the LDA topic model and brings the EMD distance formula into the text mining domain to build the framework of the W2v_dist algorithm. A female consciousness corpus is prepared, segmented, and cleaned in sequence, and the word vectors of the model are then trained. Using the EMD distance formula together with the semantic information carried by the word vectors, the distance between topic words related to female consciousness is calculated and normalized. Applying the EMD distance formula a second time defines the text distance metric, which yields the formula W2v_dist based on the EMD distance and the female consciousness LDA topic model. Using the firefly algorithm, the semantic association between female consciousness texts is converted into similarity in a spatial model and represented by text clustering features. The improved firefly algorithm (AFA), combined with the idea of the K-medoids algorithm, is applied to text clustering and obtains a better solution for the topic words related to female consciousness. The model is then applied to the literature on the awakening of “women’s consciousness” to study its methods, target groups, and other information.
In the CBOW model, word 
The CBOW model is a three-layer network. The input layer takes the initial word vectors of the known context, the projection layer in the middle sums the vectors passed from the input layer, and the output layer maximizes the objective function, updating the model parameters and the initially input word vectors by back-propagation. When the objective function is maximized, the current word 
The CBOW-based Word2Vec algorithm trains word vectors as follows:
Input layer: a random vector of size 
Projection layer: the initial word vectors of context 
Output layer: To simplify the complex multi-class classification problem, the output layer is designed as a Huffman tree, which turns the multi-class problem into a sequence of binary classifications. Figure 1 shows the output layer structure of the CBOW model. At each non-leaf node of the Huffman tree a binary decision is made, to the left or to the right, and the corresponding output is encoded as 0 or 1. The leaf nodes of the Huffman tree are all the words in the corpus, and the weight of each leaf node is the number of times its word appears in the corpus. To solve the objective function of the CBOW model optimized with hierarchical softmax, the model walks from the root of the Huffman tree down to the leaf node of the target word and multiplies all the branch probabilities along that path.

Figure 1. Sample CBOW model
In the figure, the context of “female thoughts” is used to predict the probability of “female thoughts”. When the objective function is maximized, the model traverses back from the corresponding leaf node to the root node, updating the parameter values and the initial word vectors along the way.
At each node of the Huffman tree, the probability of going right (a positive instance) is calculated according to equation (2), and the probability of going left (a negative instance) according to equation (3):
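The left/right probabilities and their product along a root-to-leaf path can be sketched numerically. The snippet below assumes the logistic (sigmoid) form that is commonly used for hierarchical softmax, with "right" as the positive class as stated above. The toy vectors, node parameters, and function names are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(context_vectors):
    """Projection layer: element-wise sum of the context word vectors."""
    return [sum(column) for column in zip(*context_vectors)]


def path_probability(x, node_params, code):
    """Multiply the branch probabilities along one root-to-leaf path.

    code[j] == 1 means "go right" (positive class, assumed sigmoid),
    code[j] == 0 means "go left" (negative class, 1 - sigmoid).
    """
    prob = 1.0
    for theta, d in zip(node_params, code):
        p_right = sigmoid(dot(x, theta))
        prob *= p_right if d == 1 else 1.0 - p_right
    return prob


# Hypothetical toy values: three 2-dimensional context vectors
context = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4]]
x_w = project(context)

# Hypothetical parameter vectors of the two inner nodes on the path
thetas = [[0.5, -0.2], [0.1, 0.3]]
p = path_probability(x_w, thetas, code=[1, 0])
```

Because the two branch probabilities at each node sum to one, the probabilities of all leaves of the tree also sum to one, which is what lets the tree replace a full softmax over the vocabulary.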
Use the gradient ascent method to obtain a partial derivative of the above equation with respect to 
The updated expression for 
Similarly for Eq. (5) with respect to 
The updated expression for 
The idea of the skip-gram model is similar to that of the CBOW model. In the skip-gram model, the input word 
Input layer: the input intermediate word 
Projection layer: this layer has little significance of its own and is kept mainly to mirror the CBOW model structure.
Output layer: this layer also corresponds to a Huffman tree, analogous to Figure 1, through which the complex multi-class problem is transformed into binary classifications. The skip-gram model based on hierarchical softmax optimization transforms the probability of 
The update formula for the parameter vector 
The update formula for the parameter vector 
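To make the difference from CBOW concrete, the following minimal sketch enumerates the (center word, context word) training pairs that a skip-gram model learns from. The function name, the window size, and the toy sentence are assumptions chosen for illustration only.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: each center word predicts its neighbours."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical, already segmented sentence
tokens = ["women", "literacy", "class", "movement"]
pairs = skipgram_pairs(tokens, window=1)
```

CBOW would instead group the neighbours of each position into a single input and predict the center word from their sum.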
LDA is a probabilistic generative model that identifies latent topic-word information in a document collection. The topic of a document is uncertain and is hidden in the “topic-word” and “document-topic” probability distributions. The number of topics is also uncertain: a text may contain several topics, or the whole text may center on a single topic.
The LDA model is a three-layer Bayesian model, divided from top to bottom into a document layer, a topic layer, and a feature word layer [21]. Topics are features of documents, and each document can be regarded as a mixture distribution over topics. Words are features of topics, and each topic can be regarded as a multinomial distribution over words. In essence, the LDA model uses these shared features to mine the topics of a text. The formula is as follows:
As shown in Equation (17), 
Assuming that the number of documents is 
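The three-layer structure (document, topic, word) can be illustrated by simulating the generative process for a single document. This is a simplified sketch: the topic-word distributions are fixed by hand here, whereas full LDA also draws them from a Dirichlet prior. The vocabulary, the prior values, and all function names are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw one index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(n_words, alpha, topic_word, vocab, rng):
    # Document layer: draw the document-topic mixture
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        z = sample_index(theta, rng)          # topic layer: pick a topic
        w = sample_index(topic_word[z], rng)  # word layer: pick a word from it
        words.append(vocab[w])
    return theta, words


rng = random.Random(0)
vocab = ["education", "literacy", "employment", "marriage"]
# Hypothetical topic-word distributions (one row per topic)
topic_word = [[0.45, 0.45, 0.05, 0.05],
              [0.05, 0.05, 0.45, 0.45]]
theta, doc = generate_document(8, [0.5, 0.5], topic_word, vocab, rng)
```

Fitting LDA reverses this process: given only the words, it infers the document-topic and topic-word distributions.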
In this section, building on the Word2Vec tool and the LDA topic model, this paper introduces a distance metric, the EMD distance formula, which is widely used in image processing but rarely in text mining, to build the W2v_dist algorithm model. First, the feasibility of using the EMD formula to measure text distance is evaluated.
The EMD (Earth Mover’s Distance) is commonly used to find the optimal solution of transportation problems and to compute the similarity of images, and it is widely used in image processing and computer vision.
The EMD distance provides a good quantification of the minimum cost required to transform histogram 
From equation (20), the EMD gives the minimum cost of transforming histogram 
EMD distance utilizes the features of the images to represent the distance of the images. Suppose there exist two images 
A normalized expression of Eq. (21) yields the EMD distance equation:
The EMD distance can also express the distance between texts in terms of text features. As analyzed above, topics are features of a text, and words are features of a topic. This paper therefore uses word vectors and the EMD distance to compute the distance between topics, and then combines the topic distances with the EMD formula to obtain the distance between texts.
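The minimum-cost idea behind the EMD can be seen in its simplest special case. For two histograms of equal total mass over the same bins, with neighbouring bins one unit apart, the EMD reduces to the sum of absolute differences between the cumulative sums. The sketch below covers only this one-dimensional case. The general case with an arbitrary ground distance, as needed for word vectors, requires solving a transportation problem and is not shown. The function name and the histograms are illustrative.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two 1-D histograms with equal mass and unit bin spacing."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    # The running surplus after each bin must be carried one bin further
    carried = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(c) for c in carried)


p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
distance = emd_1d(p, q)  # 0.5 of the mass shifts by one bin, twice
```

Here the cheapest plan moves half of the mass one bin to the right at two positions, so the total cost is 1.0.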
The first step in the implementation of the W2v_dist algorithm is to train the word vectors. Training Word2Vec word vectors means that the researcher utilizes the Skip-gram model or the CBOW model in the Word2Vec tool to train a corpus in a particular domain. The ultimate goal is to obtain word vectors for each word in that corpus.
The size of the corpus is determined by the specific task. When the corpus size is small (total vocabulary less than 100 million words), it is more efficient to use the Skip-gram model to train word vectors. The research context of this paper is women’s consciousness, and the corpus used is a collection of related literature with a total vocabulary of less than 100 million words. Therefore, the skip-gram model is used to train word vectors in this paper.
In this paper, the Word2Vec tool that comes with Gensim is used to train word vectors. Since Gensim is developed in Python, this paper also uses Python to segment the corpus.
After the corpus has been segmented with the Jieba (“stuttering”) segmentation tool, the stop words in the corpus must also be filtered out. When loading the stop word list, both the system stop word list and the user-defined stop word list should be loaded.
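The stop-word step can be sketched as follows, assuming the text has already been segmented into tokens. The two word lists and the sample tokens are hypothetical stand-ins for the system and user-defined stop word files, and the function names are illustrative.

```python
def load_stop_words(*word_lists):
    """Merge several stop word lists (e.g. system and user-defined) into one set."""
    merged = set()
    for words in word_lists:
        merged.update(w.strip() for w in words if w.strip())
    return merged


def clean_tokens(tokens, stop_words):
    """Drop empty tokens and stop words from one segmented sentence."""
    return [t for t in tokens if t.strip() and t not in stop_words]


# Hypothetical stand-ins for the two stop word lists
system_stop_words = ["的", "了", "是"]
user_stop_words = ["本文", "研究"]
stop_words = load_stop_words(system_stop_words, user_stop_words)

# Hypothetical output of the segmentation step
segmented = ["本文", "研究", "女性", "意识", "的", "觉醒"]
cleaned = clean_tokens(segmented, stop_words)
```

The cleaned token lists would then serve as training sentences. In Gensim's `Word2Vec`, the `sg=1` argument selects the skip-gram model.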
As described above, the principle of LDA is that each topic can be viewed as a multinomial distribution over a number of words; that is, words are features of topics. Each topic vector can then be represented in the following form:
In order to better utilize the semantic information to calculate the distance between topics, this paper proposes the following formula in combination with EMD distance:
In Eq. (24), 
In the previous section, this paper pointed out that topics can be viewed as features of a text. Each text can be viewed as a low-dimensional vector in which each dimension represents a topic and the value of the dimension is the weight of that topic in the text. Therefore, this paper applies the EMD distance formula again and defines the text distance metric as follows:
In Eq. (25), 
This paper finds a strong similarity between the firefly algorithm (FA) and text clustering. The FA imitates how individual fireflies are attracted at night by the light intensity of their companions; the flight distance depends on the distance between fireflies and on their brightness, and the ultimate goal is to find the coordinates of the brightest firefly. In text clustering, after the text feature vector matrix has been preprocessed, each document corresponds to a weighted vector, which plays the role of a firefly’s spatial position in the FA. The smaller the angle between two vectors, the more similar the documents, so the brightness of a firefly can be set to the inverse of the document’s cosine distance: the greater the brightness, the smaller the cosine distance. Once the documents at the cluster centers have been obtained, one pass of text clustering is completed by comparing the cosine distance from every other document to each center and assigning the document to the cluster whose center is nearest. The FA can thus be used to cluster texts through the following key elements:
 1) Each firefly corresponds to one document.
 2) The position of the firefly in the spatial coordinate system corresponds to the feature vector weights of each document.
 3) The brightness of the firefly corresponds to the inverse of the mean distance from one document’s vector to the other documents in the cluster, i.e., the objective function.
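The three elements above can be sketched in a few lines. The snippet treats each document vector as a firefly position, computes brightness as the inverse of the mean cosine distance to the other documents in a cluster, and assigns documents to the nearest center. It is a minimal illustration under these assumptions, not the paper's K-AFA implementation. The vectors are hypothetical, and the small constant that avoids division by zero is an added safeguard.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def brightness(index, cluster):
    """Inverse of the mean distance from one document to the rest of its cluster."""
    others = [doc for k, doc in enumerate(cluster) if k != index]
    mean_dist = sum(cosine_distance(cluster[index], o) for o in others) / len(others)
    return 1.0 / (mean_dist + 1e-12)  # small constant guards against division by zero


def assign(documents, centers):
    """Give each document the index of the center with the smallest cosine distance."""
    return [
        min(range(len(centers)), key=lambda c: cosine_distance(doc, centers[c]))
        for doc in documents
    ]


# Hypothetical 2-dimensional document vectors
cluster = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
medoid = max(range(len(cluster)), key=lambda i: brightness(i, cluster))

labels = assign([[1.0, 0.0], [2.0, 1.0], [0.0, 1.0]],
                centers=[[1.0, 0.1], [0.1, 1.0]])
```

In the toy cluster the middle document is closest on average to the other two, so it is the brightest firefly and would serve as the medoid.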
Research on clustering with the firefly algorithm is still at an exploratory stage, and the firefly algorithm has not yet been applied to text clustering. Its use here is therefore exploratory and is significant both for extending the applications of the firefly algorithm and for improving text clustering.
In summary, applying the firefly algorithm to text clustering is fully feasible.
The firefly algorithm is simple in structure, robust, easy to implement, and strong in its search ability, so this paper applies it to text clustering. The text clustering model based on the firefly algorithm is described in detail below.
Text clustering first preprocesses the text to remove stop words. Then feature selection or feature extraction is used to choose the terms that best express the text features. Finally, the core step is carried out through cluster analysis. The firefly algorithm is used in this final clustering stage.
The representation most commonly used in current research is the vector space model (VSM), which converts semantic associations between texts into similarities in a spatial model that clustering algorithms can then operate on. After the feature words have been obtained from the text dataset through feature selection or extraction, each document can be represented in the following form:
In the text clustering process, each text is represented by a feature vector in which each dimension is the weight of the corresponding feature item in that text. Two 
 1) Euclidean distance:
 2) Manhattan distance:
 3) Minkowski distance:
 4) Cosine of the angle between vectors:
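The four measures listed above have standard textbook definitions, which the following sketch implements for two feature vectors of equal length. The function names and the sample vectors are illustrative.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, p):
    """Generalizes both: p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cosine_similarity(x, y):
    """Cosine of the angle between x and y; assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


# Hypothetical feature weight vectors of two documents
x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]

d_euclid = euclidean(x, y)     # differences are 3, 4, 0
d_manhattan = manhattan(x, y)
similarity = cosine_similarity(x, y)
```

The first three are distances (smaller means more similar), whereas the cosine is a similarity (larger means more similar), so its direction has to be inverted before it is used as a distance.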
From the above analysis, it can be learned that during the flight of the firefly, the step size of its movement has a direct impact on the performance of the algorithm. Therefore, it is very critical to set an appropriate position update strategy.
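For reference, the position update of the standard firefly algorithm can be sketched as below. This is the textbook FA rule, not the improved AFA update of this paper. The parameter names `beta0`, `gamma`, and `alpha` and their default values are conventional choices assumed here.

```python
import math
import random


def move_firefly(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.2, rng=random):
    """Move firefly i toward a brighter firefly j (standard FA update).

    Attractiveness decays with squared distance: beta = beta0 * exp(-gamma * r^2).
    alpha scales a small uniform random perturbation in [-0.5, 0.5).
    """
    r2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    beta = beta0 * math.exp(-gamma * r2)
    return [
        a + beta * (b - a) + alpha * (rng.random() - 0.5)
        for a, b in zip(x_i, x_j)
    ]


# With alpha = 0 the move is deterministic: a pure step toward the brighter firefly
new_pos = move_firefly([0.0, 0.0], [1.0, 0.0], alpha=0.0)
```

A large `gamma` makes distant fireflies almost invisible to each other, which shortens the steps. A large `alpha` adds more random exploration. This is why the step-size strategy has such a direct effect on performance.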
This paper exploits the optimum-seeking ability and fast convergence of the FA bionic intelligence algorithm while improving on the shortcomings of the traditional FA. The improved firefly algorithm (AFA) is combined with the idea of the K-medoids algorithm and applied to text clustering, giving a new firefly clustering algorithm (K-AFA). It is described in three parts: the selection of the objective function, the idea of the algorithm, and the procedure of the algorithm.
The FA searches for the optimal solution according to the brightness of each firefly, and the brightness is usually set to the value of the objective function, so the choice of objective function directly affects the final result of the algorithm. In the 
When the FA is applied to the selection of cluster centers, a mismatch arises: the FA in general searches for the individual with the greatest brightness, whereas cluster center selection searches for the points that minimize the objective function for a given number of clusters. For this reason, the brightness of the firefly 
The higher the value of 
Applying the firefly text topic clustering algorithm constructed in this paper to the publication years of 210 papers shows that research on women’s consciousness began to revive in the 1930s and that the number of research results has generally increased since then. The number of papers rose rapidly around 1945 and surged after 1949. The yearly values are shown in Figure 2.

Figure 2. Number of papers published per year
The two surges in the number of papers since 1937 are closely related to stronger external forces, such as the state’s policy orientation toward civic education. In 1941, on the occasion of Women’s Day on 8 March, the Women’s Federation Organization issued a call for a province-wide literacy campaign, which further promoted the literacy class movement; from then on, women’s consciousness began to awaken and develop rapidly.
From the second half of 1943 onwards the war situation changed. The Anti-Japanese War entered the stage of strategic counter-offensive, Japanese forces were heavily committed to the Pacific War, and the CPC persisted in guerrilla warfare, crushing the Japanese sweeps in China, so the war situation became favorable. As a result, the scale of the base areas gradually expanded and clustered into smaller areas. Coupled with the maturity and perfection of the Party’s leadership work, all the work in the base areas could be carried out systematically, and the female literacy class movement was no exception. At this stage the literacy class movement reached its largest scale, people’s education continued to deepen, and the movement reached its climax in 1945. Almost all the young women in the revolutionary bases at that time took part in the literacy class movement, and the bases saw a small cultural upsurge of literacy and learning under the slogan “village-run schools, household-reading, anti-Japanese and national salvation, everyone competing to be the first.” A total of 22 articles on women’s consciousness were published from 1945 to 1947.
After the victory in the War of Resistance Against Japanese Aggression, the Communist Party and the Kuomintang maintained peace for a short time, and the two parties reached the Double Tenth Agreement in Chongqing on China’s future development. However, the Kuomintang tore up the agreement in June 1946 and waged war against the Communist Party, which led to the War of Liberation. In 1947 the Kuomintang launched a focused offensive against Shandong, and the liberated areas of Shandong kept shrinking. The Communist Party went all out to break this offensive, so all undertakings in the revolutionary base areas, including the literacy class movement, came to a standstill for a time. It was not until September 1948, when the Communist Party of China ended Kuomintang rule in Shandong after hard fighting and the war eased somewhat, that the undertakings in the revolutionary base areas had time to recover and develop. The Women’s Relief Society (WRS) played an important role in this period: it actively restored and developed the literacy class movement in the liberated areas, and encouraging women to keep participating in the movement was one of its key tasks.
In the seven years after the founding of the People’s Republic of China, papers published in core journals under the title “Women’s Consciousness” accounted for about 81.71% of all such papers published since the War of Resistance Against Japanese Aggression.
Of these papers, 179 were selected, and their research themes and contents were studied.
The statistics of the themes and contents of women’s consciousness research are shown in Figure 3. Since the Anti-Japanese War, the themes of women’s consciousness research have been widely but unevenly distributed. Among the eight categories of themes and contents summarized, background research on the meaning, value, and practical foundation of cultivating women’s consciousness ranks first, accounting for 49.721% of the total. Research on the status quo, problems, and countermeasures of cultivating women’s consciousness ranks second, at about 14.525%, and research on social hotspots and phenomena caused by the lack of women’s consciousness accounts for 13.966%. Research on women and on the concept, connotation, and characteristics of women’s consciousness has not been given due attention, accounting for only 10.056% of the total. Historical studies of the development of women’s consciousness, comparative studies of foreign theories, ideologies, policies, and experiences of women’s consciousness and of trends in women’s education under globalization, the measurement and evaluation of women’s consciousness, and reviews of women’s consciousness research have received less attention from scholars.

Figure 3. Statistics of women’s consciousness research themes and contents
Figure 4 shows the distribution of topics, contents, and methods in women’s consciousness research. Research on women’s consciousness in China mainly adopts qualitative methods: among the 179 samples in this survey, 130 articles, or about 72.626%, used literature analysis and theoretical discursive research. In contrast, comparative studies, case studies, experimental studies, and multivariate studies are rarely used. Theoretical discursive research is indispensable in scientific research and is of great significance for building the basic research system of women’s consciousness. However, the cultivation of women’s consciousness is also an important field of practice: its effectiveness, the current state of women’s consciousness, and the important micro-level factors that influence it need to be supported by scientific research and empirical analysis, so that practical work on women’s consciousness can be properly targeted.

Figure 4. Distribution of women’s consciousness research topics, contents, and methods
Special groups such as farmers and migrant workers, primary and secondary school students, and university students are gradually receiving attention. Figure 5 shows the distribution of the groups studied in women’s consciousness research. Studies of the idealized general group are the most numerous, accounting for about 58.659% of the total. Special groups, including citizens, farmers and migrant workers, college students, primary and secondary school students, Party and government organs and the military, ethnic minorities, and enterprises, have all been covered, and among them college students have received the most attention. With the construction of the new socialist countryside and the growing prominence of the problems of rural migrant workers, the female consciousness of farmers and rural migrant workers has begun to attract academic attention. In contrast, women’s consciousness in primary and secondary schools, the main foundation of women’s education in China, has not received enough attention. Studies of women’s consciousness among citizens, Party, government, and military organizations, ethnic minority groups, and enterprises have been taken up by only a few scholars and need further attention.

Figure 5. Distribution of women’s consciousness research groups
As core concepts of gender theory, the era background and the social environment result from the long-term development of institutional arrangements and economic culture across historical periods and are important indicators of women’s stratification status; applying gender theory helps, to a certain extent, to identify inequalities in gender relations. With the rapid improvement of Chinese women’s social and economic status, testing from a gender perspective whether traditional gender concepts persist in the social stratification of modern society not only helps to broaden the theory and application of social stratification, but also helps to judge more objectively the current state of gender equality in China under women’s high labor participation rate. Based on this, hypothesis H1 is proposed: the lower the social status of women, the less favorable it is to the awakening of women’s consciousness.
Some studies in China have confirmed that cultural values characterized by education, occupation, and income have a certain influence on women’s identity, but there are differences in opinions about the intensity of the influence. People’s direct experience and cognition of objective cultural value differences are more likely to influence people’s class self-evaluation than objective factors such as education, occupation, income, etc., and such differences are due to people’s self-expectations and comparisons with other individuals and groups. Based on this, the hypothesis H2 is proposed: the higher the cultural values of women, the more favorable it is to the awakening of women’s consciousness.
The variables were divided into explained, explanatory, and control variables according to the purpose of the study. The assigned values and descriptive statistics of each variable are shown in Table 1.
1) Explained variable. The explained variable is female consciousness awakening, measured by women’s subjective evaluation of their class status. Its mean is 1.7365 with a standard deviation of 0.4693, indicating that female consciousness awakening is relatively weak.
2) Explanatory variables. The explanatory variables are the era background and social environment, and cultural values.
(1) Era background and social environment. Based on relevant academic studies, gender division of labor, gender competence perception, marriage, gender discrimination in employment, and distribution of household chores are selected as proxy variables for the era background and social environment. Higher scores on these five proxies represent a higher social status of women. In descending order, the mean values of gender division of labor, marriage, gender competence perception, gender discrimination in employment, and distribution of household chores are 3.3155, 3.1485, 2.9645, 2.1566, and 2.1056, respectively.
(2) Cultural values. The mean of the vertical comparison (2.2648) was higher than that of the horizontal comparison (1.7415), a difference of 0.5233.
3) Control variables. The control variables are individual characteristics that affect women’s consciousness awakening, comprising nine variables: age, political affiliation, marital status, years of education, work status, household registration type, geographical type, health status, and family economic status. Age is a continuous variable, taken as the respondent’s actual age at the time of the interview. Years of education is also a continuous variable. Political affiliation, marital status, work status, household registration type, geographical type, health status, and family economic status enter the model as categorical variables. The distribution of the sample was statistically significant. The mean age is 49.4885 years with a standard deviation of 16.4586, indicating a wide age range among the surveyed women. The mean of political affiliation is 0.1856, indicating that most of the surveyed women are members of the general public. The mean of marital status is 0.7985, indicating that most are married. The mean of years of education is 8.0655, indicating a relatively low level of education, with most respondents having a junior high school education. The mean of work status is 0.5186, indicating that the numbers of surveyed women with and without a job are roughly equal. The mean of household registration type is 0.3648, indicating that most have agricultural household registration. The mean of geographical type is 0.3856, indicating that most are located in inland areas. The mean of health status is 0.5591, indicating that most are in good health. The mean of family economic status is 0.6245, indicating that most have above-average household income.
Table 1. Assigned values and descriptive statistics of each variable
| Variable type | Dimension | Variable | Mean | SD |
|---|---|---|---|---|
| Explained variable | | Female consciousness awakening | 1.7365 | 0.4693 |
| Explanatory variable | Era background and social environment | Gender division of labor | 3.3155 | 1.2658 |
| | | Gender competence perception | 2.9645 | 1.2615 |
| | | Marriage | 3.1485 | 1.1596 |
| | | Gender discrimination in employment | 2.1566 | 1.0066 |
| | | Distribution of household chores | 2.1056 | 0.9655 |
| | Cultural values | Horizontal comparison | 1.7415 | 0.5236 |
| | | Vertical comparison | 2.2648 | 0.6185 |
| Control variable | | Age | 49.4885 | 16.4586 |
| | | Political affiliation | 0.1856 | 0.3153 |
| | | Marital status | 0.7985 | 0.4188 |
| | | Years of education | 8.0655 | 5.0652 |
| | | Work status | 0.5186 | 0.5269 |
| | | Household registration type | 0.3648 | 0.4866 |
| | | Geographical type | 0.3856 | 0.4856 |
| | | Health status | 0.5591 | 0.4969 |
| | | Family economic status | 0.6245 | 0.4826 |
Table 2 presents the main-effect analysis of the influence of the era background and social environment and of cultural values on the awakening of women’s consciousness. Models 1, 2, and 3 are regression results from fitting an ordered Logit model, and Model 4 is the result of fitting a multiple linear regression model. Model 1 contains only the control variables: age, political affiliation, marital status, years of education, household registration type, geographical type, health status, and family economic status all significantly and positively affect the awakening of women’s consciousness, while work status is not statistically significant.
1) Era background and social environment. Model 2 is the regression analysis result obtained after adding the era background and social environment on the basis of model 1. Among them, gender division of labor significantly and positively affects women’s class identity at the 1% level, and marriage significantly and negatively affects women’s consciousness awakening at the 5% level, i.e., the more women’s consciousness of marriage and marrying is inclined to the traditional, the more unfavorable it is to women’s consciousness awakening, which is mainly due to the harshness and modeling of the gender selection in the competitive talent market leading to the fact that women are facing a greater pressure of survival, and they put the value of their lives on the marriage. Marriage. Employment gender discrimination significantly and positively affects women’s class identity at the 1% level, i.e., the more employment gender discrimination exists among women, the higher the awakening of women’s consciousness. Combined with the above analysis, it can be seen that hypothesis H1 is partially verified.
2) Cultural values and the awakening of female consciousness. Model 3 adds cultural values to Model 2. Both operationalized indicators of cultural values significantly and positively affect the awakening of women’s consciousness at the 1% level, and the adjusted R² is 14.8%, an improvement in overall explanatory power over Model 2. There are two main reasons. First, under the influence of the rich-poor gap and relative poverty, people’s value pursuits lean toward material wealth, and family economic status shapes women’s cultural values to a great extent, which directly affects the awakening of female consciousness. Second, when individuals compare themselves horizontally with those around them or vertically with their own past, a sense of relative deprivation arises, and this subjective feeling directly affects the awakening of female consciousness. Therefore, hypothesis H2 is verified.
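Ordered Logit coefficients are on the log-odds scale, so a common way to read them is to exponentiate them into odds ratios. The sketch below does this for the three significant Model 2 coefficients quoted above; the helper name `odds_ratio` is illustrative, and the odds-ratio reading is a standard interpretation rather than one the paper itself reports.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Convert a Logit coefficient (log-odds) into an odds ratio."""
    return math.exp(coefficient)


# Model 2 coefficients as reported in the text and in the main-effect table.
model2 = {
    "gender division of labor": 0.094,
    "marriage": -0.076,
    "employment gender discrimination": 0.115,
}

for name, coef in model2.items():
    print(f"{name}: coefficient {coef:+.3f}, odds ratio {odds_ratio(coef):.3f}")
```

Under this reading, a one-unit increase in the gender division of labor score multiplies the odds of being in a higher awakening category by about 1.10, while an odds ratio below 1 (as for marriage) indicates lower odds.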
Main-effect analysis of women’s consciousness awakening
| Category | Explanatory variable | Model 1 (ordered Logit) Coef. | Model 1 S.E. | Model 2 (ordered Logit) Coef. | Model 2 S.E. | Model 3 (ordered Logit) Coef. | Model 3 S.E. | Model 4 (multiple linear regression) Coef. | Model 4 S.E. |
|---|---|---|---|---|---|---|---|---|---|
| Era background and social environment | Gender division of labor | - | - | 0.094*** | 0.034 | 0.105*** | 0.034 | 0.015*** | 0.004 |
| | Gender competence | - | - | -0.015 | 0.032 | -0.013 | 0.036 | -0.003 | 0.005 |
| | Marriage | - | - | -0.076** | 0.033 | -0.057 | 0.033 | -0.015* | 0.005 |
| | Employment gender discrimination | - | - | 0.115*** | 0.038 | 0.098** | 0.037 | 0.018** | 0.008 |
| | Household chores | - | - | -0.026 | 0.035 | -0.023 | 0.035 | -0.005 | 0.008 |
| Cultural values | Horizontal comparison | - | - | - | - | 0.915*** | 0.078 | 0.182*** | 0.015 |
| | Vertical comparison | - | - | - | - | 0.348*** | 0.054 | 0.067*** | 0.013 |
| Control variable | Age | 0.015*** | 0.005 | 0.007*** | 0.005 | 0.006** | 0.042 | 0.002* | 0.000 |
| | Political appearance | 0.248* | 0.154 | 0.265** | 0.125 | 0.185 | 0.128 | 0.025 | 0.015 |
| | Marital status | 0.284*** | 0.083 | 0.245*** | 0.082 | 0.248*** | 0.085 | 0.048*** | 0.016 |
| | Years of education | 0.048*** | 0.008 | 0.053*** | 0.008 | 0.048*** | 0.012 | 0.007*** | 0.003 |
| | Work status | -0.043 | 0.078 | -0.037 | 0.072 | -0.085 | 0.078 | -0.015 | 0.014 |
| | Household registration | 0.192** | 0.065 | 0.246*** | 0.075 | 0.226*** | 0.071 | 0.042*** | 0.015 |
| | Geographic type | 0.548 | 0.062 | 0.265*** | 0.062 | 0.348*** | 0.062 | 0.059*** | 0.015 |
| | Health status | 0.345*** | 0.075 | 0.315*** | 0.073 | 0.264*** | 0.087 | 0.045*** | 0.013 |
| | Family economy | 1.485*** | 0.073 | 1.465*** | 0.071 | 0.958*** | 0.072 | 0.196*** | 0.015 |
| | Constant term | - | - | - | - | - | - | 0.948*** | 0.054 |
| | Adjusted R² | 0.085 | | 0.135 | | 0.148 | | 0.175 | |
Note: 1) *, ** and *** indicate that a variable is significant at the 10%, 5% and 1% levels, respectively; 2) standard errors are robust standard errors; 3) “-” in Model 1 indicates that the ordered Logit regression does not use gender awareness and socioeconomic status, and “-” in Model 2 indicates that it does not use socioeconomic status. A “-” in the constant term indicates that the value does not exist.
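The star convention in the note can be illustrated with a short sketch that turns a coefficient and its standard error into a two-sided p-value and a star label. It assumes a large-sample normal approximation for the test statistic, which may differ from the exact procedure behind the reported stars; the function names are illustrative, and the four rows are Model 2 values from the table.

```python
import math


def two_sided_p(coefficient: float, std_error: float) -> float:
    """Two-sided p-value for H0: coefficient = 0, using a normal approximation."""
    z = coefficient / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def stars(p_value: float) -> str:
    """Map a p-value to the 10% / 5% / 1% star convention."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


# (variable, coefficient, robust standard error) from Model 2.
rows = [
    ("gender division of labor", 0.094, 0.034),
    ("gender competence", -0.015, 0.032),
    ("marriage", -0.076, 0.033),
    ("employment gender discrimination", 0.115, 0.038),
]

for name, coef, se in rows:
    p = two_sided_p(coef, se)
    print(f"{name}: z = {coef / se:+.2f}, p = {p:.4f} {stars(p)}")
```

Because the published coefficients and standard errors are rounded, a recomputation of this kind is only an approximate check on the reported significance levels.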
The household registration system is an important feature of the urban-rural dichotomy. The type of household registration creates differences in resource endowments, lifestyles, and social attitudes among groups, which significantly affects a group’s class identity. This study therefore ran a sub-sample regression of women’s consciousness awakening by household registration type; Table 3 shows the urban-rural differences in how era background and social environment, and cultural values, influence women’s consciousness awakening. The influence of era background and social environment differs significantly between urban and rural areas, whereas the influence of cultural values does not.
Urban-rural differences in women’s consciousness awakening
| Category | Explanatory variable | Rural (ordered Logit) Coef. | Rural S.E. | Urban (ordered Logit) Coef. | Urban S.E. |
|---|---|---|---|---|---|
| Era background and social environment | Gender division of labor | 0.055 | 0.048 | 0.215*** | 0.053 |
| | Gender competence | 0.026 | 0.043 | -0.156** | 0.061 |
| | Marriage | -0.034 | 0.034 | -0.086 | 0.057 |
| | Employment gender discrimination | 0.154*** | 0.045 | 0.034 | 0.072 |
| | Household chores | -0.019 | 0.044 | -0.015 | 0.065 |
| Cultural values | Horizontal comparison | 0.082*** | 0.082 | 1.065*** | 0.136 |
| | Vertical comparison | 0.254*** | 0.065 | 0.469*** | 0.105 |
| | Adjusted R² | 0.115 | | 0.189 | |
| | N | 3.485 | | 1.915 | |
Both the horizontal and the vertical socio-economic status comparisons significantly and positively affect the consciousness awakening of rural and urban women at the 1% level, further validating hypothesis H2. Meanwhile, the adjusted R² is 0.115 in rural areas and 0.189 in urban areas, and women’s consciousness is more awakened in urban areas than in rural areas.
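One common way to compare a coefficient across two independent sub-samples is a z-test on the difference between the two estimates. The sketch below applies it to the gender division of labor coefficients for the rural and urban sub-samples. This is an assumed illustration of how such a comparison could be made, not the procedure the paper reports, and it treats the two sub-samples as independent.

```python
import math


def coefficient_difference_z(b1: float, se1: float, b2: float, se2: float) -> float:
    """z statistic for H0: b1 == b2, assuming independent samples."""
    return (b2 - b1) / math.sqrt(se1 ** 2 + se2 ** 2)


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Gender division of labor: rural and urban sub-sample estimates.
rural_coef, rural_se = 0.055, 0.048
urban_coef, urban_se = 0.215, 0.053

z = coefficient_difference_z(rural_coef, rural_se, urban_coef, urban_se)
p = two_sided_p_from_z(z)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A small p-value here would suggest that the urban and rural coefficients for this variable differ by more than sampling noise alone would explain, under the stated independence and normality assumptions.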
Building on the word2vec tool and the LDA topic model, this paper introduces the EMD distance formula to construct the W2v_dist algorithm model, which processes the text set on women’s consciousness and trains word vectors to obtain a text distance metric. Firefly algorithm text clustering is then constructed to screen topic words in the text set and select the word items that best express text features. The constructed model is applied to 210 texts on the awakening of women’s consciousness.
From 1945 to 1947, a total of 22 texts on women’s consciousness were published, and the awakening of women’s consciousness entered a stage of rapid development and maturity. In the seven years after the founding of the People’s Republic of China, the number of publications in journals focusing on women’s consciousness reached 81.71% of the total number of publications since the war, and women’s consciousness developed further. The five proxy variables for era background and social environment, namely gender division of labor, marriage, gender competence, gender discrimination in employment, and the distribution of household chores, scored 3.3155, 3.1485, 2.9645, 2.1566, and 2.1056, respectively, and the vertical comparison of cultural values scored 0.5233 higher than the horizontal comparison.
