Open Access

Integrating Content Analysis and LDA Thematic Modeling to Analyze the Presentation of Youth Culture in Urban Cinema

  
Sep 26, 2025


Figure 1. LDA probability model
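Figure 1 is not reproduced in this text. For reference, the standard LDA generative model that such a probability diagram usually depicts is written out below. The symbols are the conventional ones and are assumptions, not necessarily the paper's own notation: $K$ topics, $D$ documents, $N_d$ tokens in document $d$, Dirichlet priors $\alpha$ and $\beta$, document-topic proportions $\theta_d$, and topic-word distributions $\varphi_k$.

```latex
\begin{aligned}
\varphi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \varphi &\sim \mathrm{Categorical}(\varphi_{z_{d,n}})
\end{aligned}

p(\mathbf{w}, \mathbf{z}, \theta, \varphi \mid \alpha, \beta)
  = \prod_{k=1}^{K} p(\varphi_k \mid \beta)
    \prod_{d=1}^{D} \Big[ p(\theta_d \mid \alpha)
    \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \varphi_{z_{d,n}}) \Big]
```

Each token first draws a topic $z_{d,n}$ from its document's mixture $\theta_d$, then draws the word itself from that topic's word distribution.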

Figure 2. Schematic diagram of LDA
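This section does not state how the LDA parameters were estimated. As a hedged illustration only, the sketch below fits LDA with collapsed Gibbs sampling in plain Python. The inference method, the function name `lda_gibbs`, and the default hyperparameters (`alpha`, `beta`, `iters`) are assumptions chosen for illustration, not the authors' settings.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic distributions and
    per-topic word distributions, each row summing to 1.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    ndk = [[0] * K for _ in docs]        # topic counts per document
    nkw = [[0] * V for _ in range(K)]    # word counts per topic
    nk = [0] * K                         # total tokens assigned to each topic
    z = []                               # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Posterior-mean estimates of theta and phi from the final counts.
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return theta, phi
```

For a toy corpus such as `[[0, 1, 0, 1], [2, 3, 2, 3]]`, `lda_gibbs(docs, num_topics=2, vocab_size=4)` returns a 2 x 2 `theta` and a 2 x 4 `phi`. Library implementations (for example variational inference) would give different, though comparable, estimates.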

Figure 3. Modular structure of the encoder

Figure 4. Experimental results of the model structure comparison

Figure 5. Comparison of feature fusion and pooling methods

Running time of each model on different types of topic keywords

| Model | General keywords (s) | Synonymous keywords (s) | Polysemous keywords (s) |
|---|---|---|---|
| LSA | 5.0944 | 5.0498 | 5.7439 |
| PLSA | 4.2886 | 3.7515 | 4.6910 |
| STM | 4.3783 | 4.7040 | 5.1960 |
| CNN | 5.0109 | 5.2280 | 6.2341 |
| ERNIE | 4.337 | 4.5253 | 4.8669 |
| LDA | 5.5752 | 6.1680 | 6.7025 |
| LSTM | 5.0323 | 5.4087 | 6.6815 |
| BERT-base | 3.3305 | 3.8339 | 4.6453 |
| LDA-Kmeans | 0.2744 | 0.3241 | 0.4543 |
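The tables report an LDA-Kmeans model, but this section does not define it. One common construction, assumed here, represents each document by its LDA topic distribution and then clusters those vectors with k-means. The sketch below is plain Lloyd's k-means in Python; the function name `kmeans`, the random initialisation, and the use of topic vectors as features are illustrative assumptions, not the authors' pipeline.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on equal-length float vectors (illustrative sketch).

    points: list of vectors, e.g. per-document LDA topic distributions.
    Returns (centroids, labels).
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return centroids, labels
```

With two clearly separated groups of topic vectors, for example `[[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8]]` and `k=2`, the first two documents end up in one cluster and the last two in the other.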

Topic division of the involution-themed text data

| Topic | Weight (%) | Core theme words | Topic description |
|---|---|---|---|
| Topic 5 | 34.26 | Society, industry, company, time, work, competition, market, enterprise, Internet, young people, graduation, opportunity | Severe involution |
| Topic 6 | 29.04 | Oneself, hard work, life, work, study, anxiety, lying flat, things, overtime, life, examination and investigation | Life stress |
| Topic 7 | 21.17 | Serious, after-work, anti-involution, evening, likes, colleagues, work, support, milk tea, mobile phone, game, star | Anti-involution |
| Topic 8 | 15.53 | Education, school, students, parents, training, teachers, institutions, winter and summer holidays, policies, college entrance exams, universities, supplementary courses | Educational involution |
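The weight column for Topics 5 to 8 sums to 100% (34.26 + 29.04 + 21.17 + 15.53 = 100.00), so it reads as each topic's share of the corpus. This section does not say how the shares were computed. One plausible method, assumed here, averages the per-document topic proportions over all documents and expresses the result as percentages. The function name `topic_weights` is hypothetical.

```python
def topic_weights(theta):
    """Corpus-level topic shares (%) from per-document topic proportions.

    theta: list of per-document topic distributions, each row summing to 1.
    Returns one percentage per topic, rounded to two decimals.
    """
    num_docs = len(theta)
    num_topics = len(theta[0])
    # Sum each topic's proportion across documents, then average.
    totals = [sum(row[k] for row in theta) for k in range(num_topics)]
    return [round(100.0 * t / num_docs, 2) for t in totals]
```

For example, `topic_weights([[0.6, 0.4], [0.2, 0.8]])` gives `[40.0, 60.0]`. Weighting each document by its token count instead would be an equally reasonable alternative.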

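Test results on three different datasets

| Dataset | Model | Accuracy | Recall | F1 |
|---|---|---|---|---|
| YCT | CNN | 0.9015 | 0.8996 | 0.9219 |
| YCT | LSTM | 0.8638 | 0.8605 | 0.8646 |
| YCT | BERT-base | 0.9164 | 0.9169 | 0.9131 |
| YCT | LDA-Kmeans | 0.9613 | 0.9844 | 0.9702 |
| YCT | LDA | 0.9248 | 0.9118 | 0.9206 |
| YCT | ERNIE | 0.9474 | 0.9465 | 0.9434 |
| Weibo1 | CNN | 0.8891 | 0.8858 | 0.9078 |
| Weibo1 | LSTM | 0.8489 | 0.8472 | 0.8526 |
| Weibo1 | BERT-base | 0.9031 | 0.9015 | 0.8977 |
| Weibo1 | LDA-Kmeans | 0.9789 | 0.9699 | 0.9545 |
| Weibo1 | LDA | 0.9063 | 0.8981 | 0.9056 |
| Weibo1 | ERNIE | 0.9341 | 0.9321 | 0.9295 |
| Online2 | CNN | 0.873 | 0.8694 | 0.9026 |
| Online2 | LSTM | 0.8381 | 0.8207 | 0.8458 |
| Online2 | BERT-base | 0.8942 | 0.8882 | 0.8861 |
| Online2 | LDA-Kmeans | 0.9679 | 0.9789 | 0.9625 |
| Online2 | LDA | 0.8954 | 0.8848 | 0.8948 |
| Online2 | ERNIE | 0.9228 | 0.908 | 0.9166 |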
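F1 is the harmonic mean of precision and recall, so it cannot be recomputed from the accuracy and recall columns alone. This section does not state whether the scores are macro- or micro-averaged. The sketch below assumes macro averaging over classes and computes all four quantities from true and predicted labels in plain Python. The function name `classification_scores` is hypothetical.

```python
def classification_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (sketch).

    y_true, y_pred: equal-length sequences of class labels.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For `y_true = [0, 0, 1, 1]` and `y_pred = [0, 1, 1, 1]`, accuracy is 0.75, macro recall is 0.75, macro precision is about 0.833, and macro F1 is about 0.733.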

Accuracy of each model on different types of topic keywords

| Model | General keywords (%) | Synonymous keywords (%) | Polysemous keywords (%) |
|---|---|---|---|
| LSA | 44.37 | 41.64 | 38.77 |
| PLSA | 56.88 | 53.37 | 42.34 |
| STM | 64.51 | 58.32 | 40.23 |
| CNN | 47.74 | 43.04 | 37.67 |
| ERNIE | 52.78 | 51.38 | 50.69 |
| LDA | 57.16 | 51.52 | 39.03 |
| LSTM | 65.75 | 60.85 | 54.08 |
| BERT-base | 55.85 | 49.45 | 49.21 |
| LDA-Kmeans | 93.88 | 90.12 | 88.54 |

Topic division of the "lying flat" text data

| Topic | Weight (%) | Core theme words | Topic description |
|---|---|---|---|
| Topic 1 | 35.78 | Choices, problems, young people, society, children, life, future, ability, lying flat, opportunity, education fund | Causes of lying flat |
| Topic 2 | 24.97 | Self, effort, work, no desire, things, anxiety, learning, salted fish, rejection, giving up, resting | Inner emotions |
| Topic 3 | 21.03 | Like, teacher, friend, forever, hope, lovely, good-looking, enter the pit, game, pit, thank you, stage | Anti-involution |
| Topic 4 | 18.22 | Happy, home, weekend, day, comfort, sleep, mobile phone, refueling, sports, summer holidays, happiness, air conditioning | Enjoying life |
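The "Core theme words" column lists roughly a dozen words per topic. In LDA these are conventionally the highest-probability words in each topic's word distribution. The sketch below assumes that convention and ranks words by probability. The function name `top_words` and the default `n=12` are illustrative assumptions.

```python
def top_words(phi, vocab, n=12):
    """Return the n highest-probability words for each topic.

    phi: per-topic word distributions (K rows, len(vocab) columns).
    vocab: list mapping word id -> word string.
    """
    result = []
    for row in phi:
        # Sort word ids by descending probability within this topic.
        ranked = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)
        result.append([vocab[w] for w in ranked[:n]])
    return result
```

For example, with `phi = [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]]` and `vocab = ["work", "anxiety", "home"]`, `top_words(phi, vocab, n=2)` returns `[["work", "anxiety"], ["home", "anxiety"]]`. A relevance-weighted ranking that penalises words common to all topics is a frequent alternative.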

High-frequency words in the youth "lying flat" text data

| No. | Word | Frequency | No. | Word | Frequency |
|---|---|---|---|---|---|
| 1 | Lying flat* | 76500 | 16 | Question* | 7014 |
| 2 | Self* | 71034 | 17 | Child* | 6757 |
| 3 | Life* | 50377 | 18 | Go home | 6549 |
| 4 | Effort* | 30549 | 19 | Learning* | 6321 |
| 5 | Work* | 22147 | 20 | Anxiety* | 6218 |
| 6 | Suffer* | 19053 | 21 | Young people | 6210 |
| 7 | Like* | 9987 | 22 | Fatigue | 6138 |
| 8 | Eat | 9654 | 23 | Friend | 6022 |
| 9 | Select | 9014 | 24 | World | 5317 |
| 10 | Time* | 8326 | 25 | At home | 5015 |
| 11 | Get up | 8059 | 26 | Society* | 5004 |
| 12 | Hope* | 8011 | 27 | Teacher* | 4932 |
| 13 | Happiness | 7877 | 28 | Go to work | 2714 |
| 14 | Joyfulness | 7656 | 29 | China* | 1999 |
| 15 | Thing* | 7325 | 30 | Tomorrow | 1934 |
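Frequency tables like this one are normally produced by segmenting each post into words, discarding stopwords, and counting the remaining tokens. The sketch below covers only the counting step. It assumes the texts have already been segmented by a Chinese word segmenter, which this section does not name. The function name `high_frequency_words` and its parameters are hypothetical.

```python
from collections import Counter


def high_frequency_words(segmented_docs, stopwords=frozenset(), top_n=30):
    """Count word frequencies over pre-segmented documents.

    segmented_docs: iterable of token lists (segmentation already done).
    stopwords: tokens to ignore.
    Returns the top_n (word, count) pairs, most frequent first.
    """
    counts = Counter()
    for tokens in segmented_docs:
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(top_n)
```

For example, counting `[["lying flat", "self", "life"], ["self", "life", "the"], ["self"]]` with `stopwords={"the"}` and `top_n=2` returns `[("self", 3), ("life", 2)]`.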

High-frequency words in the youth "involution" text data

| No. | Word | Frequency | No. | Word | Frequency |
|---|---|---|---|---|---|
| 1 | Involution | 54870 | 16 | Anxiety* | 6891 |
| 2 | Self* | 49423 | 17 | Question* | 6624 |
| 3 | Education* | 28762 | 18 | Hope* | 6434 |
| 4 | Child* | 18941 | 19 | Time* | 6194 |
| 5 | Work* | 11547 | 20 | China* | 6097 |
| 6 | Severity | 10562 | 21 | Company | 6082 |
| 7 | Effort* | 9848 | 22 | Stars | 6012 |
| 8 | Life* | 9534 | 23 | School | 5884 |
| 9 | Society* | 8889 | 24 | Student | 5190 |
| 10 | Teacher* | 8193 | 25 | Age | 4879 |
| 11 | Donation | 7925 | 26 | Overtime | 4873 |
| 12 | Lying flat* | 7884 | 27 | Competition | 4809 |
| 13 | Like* | 7754 | 28 | Parent | 2584 |
| 14 | Industry | 7526 | 29 | Thing* | 1877 |
| 15 | Learning* | 7206 | 30 | Money | 1813 |