Folk Tales from Diverse Cultures: Digital Analysis of Content using Natural Language Processing
Published Online: Mar 19, 2025
Received: Nov 07, 2024
Accepted: Feb 10, 2025
DOI: https://doi.org/10.2478/amns-2025-0529
© 2025 Yaping Li, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Natural language processing is currently one of the most active research areas in machine learning, and text categorization is one of its important branches. This paper applies natural language processing techniques to folktales from different cultures. The texts are first preprocessed using an N-gram language model and an SGM model. Word frequencies are then computed for the folktales of each culture and used to characterize and classify them. Using a data-driven approach, the paper compares the differences in key text features across folktales. Complex network characterization shows that, for the linguistic rhythm networks of famous works, the aggregation coefficients are all above 0.35, the average distances are all below 2.5, and the product of aggregation coefficient and average distance stays around 1.
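The quantities named in the abstract (N-grams, word frequencies, aggregation coefficient, average distance) can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that the "aggregation coefficient" corresponds to the average clustering coefficient, and it builds a simple word-adjacency network, whereas the paper's linguistic rhythm network may be constructed differently. The function names and the sample sentence are hypothetical.

```python
import re
from collections import Counter, deque
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def adjacency_network(tokens):
    """Undirected graph linking words that occur next to each other.

    Assumption: this stands in for the paper's network construction,
    which the abstract does not specify.
    """
    graph = {}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops from repeated words
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def average_clustering(graph):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    if not graph:
        return 0.0
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_distance(graph):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Invented sample text, not taken from the paper's corpus.
    sample = "The fox asked the crow and the crow asked the fox to sing"
    tokens = tokenize(sample)
    print(Counter(tokens).most_common(3))  # word frequency statistics
    print(ngrams(tokens, 2)[:3])           # first few bigrams
    net = adjacency_network(tokens)
    c, d = average_clustering(net), average_distance(net)
    print(c, d, c * d)                     # coefficient, distance, product
```

On a real corpus the same three numbers could be compared against the thresholds reported above (coefficient above 0.35, distance below 2.5, product near 1), keeping in mind that a different network construction gives different values.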