Folk Tales from Diverse Cultures: Digital Analysis of Content using Natural Language Processing
Published online: 19 March 2025
Submitted: 07 Nov. 2024
Accepted: 10 Feb. 2025
DOI: https://doi.org/10.2478/amns-2025-0529
© 2025 Yaping Li, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Natural language processing (NLP) is currently one of the research hotspots in machine learning, and text categorization is an important branch of NLP. In this paper, folktales from different cultures are analyzed with NLP techniques. The texts are preprocessed using an N-gram language model and an SGM model. Word frequency statistics are then computed for the folktales of each culture and used to characterize and classify them. A data-driven comparison is made of the differences in key textual features among the folktales. Complex network characterization shows that, for famous works, the aggregation (clustering) coefficients of the linguistic rhythm complex networks are all above 0.35, the average distances are all below 2.5, and the product of the aggregation coefficient and the average distance stays around 1.
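The abstract reports word frequencies and three complex-network statistics: the aggregation (clustering) coefficient, the average distance, and their product. The paper's exact construction of the "linguistic rhythm" network is not described here, so the following is only a minimal sketch under stated assumptions. It uses a hypothetical word-adjacency network, where nodes are word tokens and an undirected edge joins each token to the next one. It also uses simple lowercase English tokenization. All function names are illustrative, not the author's implementation.

```python
import re
from collections import Counter, defaultdict, deque


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase English word tokens only; texts in languages
    # without spaces (e.g. Chinese) would need a proper word segmenter.
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens: list[str]) -> Counter:
    # Plain word frequency counts, as in a word frequency statistical analysis.
    return Counter(tokens)


def build_adjacency_network(tokens: list[str]) -> dict[str, set[str]]:
    # Hypothetical construction: link each token to the following token.
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops from repeated words
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def clustering_coefficient(graph: dict[str, set[str]]) -> float:
    # Average local clustering coefficient; nodes with degree < 2 count as 0.
    if not graph:
        return 0.0
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = list(nbrs)
        # Count edges that exist between pairs of this node's neighbours.
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbr_list[j] in graph[nbr_list[i]]
        )
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def average_distance(graph: dict[str, set[str]]) -> float:
    # Mean shortest-path length over all reachable ordered node pairs (BFS).
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    tale = "once upon a time a fox met a crow and the fox praised the crow"
    tokens = tokenize(tale)
    print(word_frequencies(tokens).most_common(3))
    g = build_adjacency_network(tokens)
    c, l = clustering_coefficient(g), average_distance(g)
    print(f"C = {c:.3f}, L = {l:.3f}, C*L = {c * l:.3f}")
```

With a real corpus, one would compute `C`, `L`, and `C * L` per work and compare them against the thresholds quoted above. A word-adjacency network is used here only because it is a common, simple choice. The paper's network may define nodes and edges differently.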