Folk Tales from Diverse Cultures: Digital Analysis of Content using Natural Language Processing

Natural language processing is currently one of the most active research areas in machine learning, and text categorization is an important branch of it. This paper applies natural language processing to folktales from different cultures. The texts are preprocessed using an N-gram language model and an SGM model. Word frequency statistics are then used to characterize and classify the folktales, and a data-driven comparison identifies the differences in key text features between them. Complex network characterization shows that the aggregation coefficients of the linguistic rhythm complex networks of famous works are all above 0.35, the average distances are all below 2.5, and the products of aggregation coefficient and average distance all stay around 1.
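The quantities named in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the tokenizer is a simple regular expression, the network links each pair of adjacent words (the paper's own construction of the linguistic rhythm network is not described here), and "aggregation coefficient" is read as the average local clustering coefficient. All function names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def adjacency_network(tokens):
    """Undirected graph linking each pair of adjacent, distinct words.

    This word-adjacency construction is an assumption for illustration only.
    """
    graph = {}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def average_clustering(graph):
    """Mean local clustering coefficient; nodes of degree < 2 contribute 0."""
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph) if graph else 0.0


def average_distance(graph):
    """Mean shortest-path length over all connected ordered node pairs."""
    total = pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


sample = "The fox saw the hen; the fox ran."  # toy text, not from the paper
tokens = tokenize(sample)
network = adjacency_network(tokens)
print(Counter(tokens).most_common(3))    # word frequencies
print(ngram_counts(tokens, 2).most_common(2))  # bigram frequencies
print(average_clustering(network), average_distance(network))
```

On a real corpus, the last two values would be compared against the thresholds reported in the abstract (aggregation coefficient above 0.35, average distance below 2.5); the toy sentence here only shows how the numbers are obtained.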

Language: English

Publication timeframe: 1 time per year
Journal Subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Yaping Li

Published Online: Mar 19, 2025

Received: Nov 07, 2024

Accepted: Feb 10, 2025

DOI: https://doi.org/10.2478/amns-2025-0529

Keywords: Natural language processing, Complex networks, Folktales, Text analysis

© 2025 Yaping Li, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
