Folk Tales from Diverse Cultures: Digital Analysis of Content using Natural Language Processing

Natural language processing is currently one of the most active research areas in machine learning, and text categorization is an important branch of it. This paper applies natural language processing to folktales from different cultures. The texts are preprocessed using an N-gram language model and the SGM model. Word frequencies in folktales from different cultures are then counted by statistical word-frequency analysis in order to characterize and classify the tales, and a data-driven comparison identifies the differences in key text features among them. Complex network characterization shows that, for the linguistic rhythm networks of famous works, the aggregation coefficients are all above 0.35, the average distances are all below 2.5, and the product of aggregation coefficient and average distance stays around 1 in every case.
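
The quantities named in the abstract (N-gram counts, word frequencies, the aggregation coefficient and the average distance of a text network) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how the linguistic rhythm network is built, so the sketch assumes a simple word-adjacency network in which nodes are words and an edge links two words that occur next to each other. The function names `tokenize`, `ngram_counts`, `adjacency_graph`, `average_clustering` and `average_distance` are hypothetical, and the aggregation coefficient is taken to be the standard average local clustering coefficient.

```python
from collections import Counter, deque
from itertools import combinations


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only (a simplifying assumption)."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens; n=1 gives plain word frequencies."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def adjacency_graph(tokens):
    """Undirected graph linking words that appear side by side (assumed construction)."""
    graph = {w: set() for w in tokens}
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # skip self-loops from immediate repetitions
            graph[a].add(b)
            graph[b].add(a)
    return graph


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    if not graph:
        return 0.0
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_distance(graph):
    """Mean shortest-path length over all reachable ordered node pairs (BFS per node)."""
    total, pairs = 0, 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Toy input, not data from the paper.
    tokens = tokenize("Once upon a time a fox met a crow, and the crow met a wolf.")
    graph = adjacency_graph(tokens)
    c, d = average_clustering(graph), average_distance(graph)
    print(ngram_counts(tokens, 1).most_common(3))
    print(ngram_counts(tokens, 2).most_common(3))
    print(c, d, c * d)
```

Under these assumptions, the last printed value is the product of aggregation coefficient and average distance that the abstract reports as staying around 1 for famous works. A toy sentence like the one above is far too short to reproduce that figure.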

Language: English

Publication frequency: 1 issue per year
Journal subjects: Biology; Biology, other; Mathematics; Applied Mathematics; Mathematics, general; Physics; Physics, other

Yaping Li

Published online: 19 March 2025

Received: 07 Nov 2024

Accepted: 10 Feb 2025

DOI: https://doi.org/10.2478/amns-2025-0529

Keywords: Natural language processing, Complex networks, Folktales, Text analysis

© 2025 Yaping Li, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
