Lexical co-occurrence network and semantic relation mining based on English corpus

This paper analyzes the kinds of content that nodes can represent in a complex network model of text, and describes how to construct a text network with the words of the text as its nodes. It defines lexical co-occurrence relationships based on lexical semantics and, drawing on the implementation process of lexical co-occurrence analysis, proposes a keyword extraction method for lexical co-occurrence networks based on an improved TextRank algorithm. Using the features of complex networks, it applies the FWN short text clustering algorithm to reveal semantic associations between words. The advantages of the improved TextRank algorithm are then analyzed. Finally, the paper counts the distribution of node word classes in lexical co-occurrence networks built from English corpora in the fields of literature, journalism, and law, and calculates semantic relevance. In the combined network of the English corpus (comprising the word co-occurrence networks for news, literature, and law), nouns appear as nodes most often, followed by verbs; time words appear as nodes least often.
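To make the idea of ranking words in a co-occurrence network concrete, the following is a minimal sketch of the standard, unmodified TextRank scheme, not the improved algorithm proposed in the paper, whose details the abstract does not give. It assumes a pre-tokenized word list, a sliding co-occurrence window, and the commonly used damping factor of 0.85; the function names `build_cooccurrence` and `textrank` and all parameter values are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def build_cooccurrence(tokens, window=2):
    """Build an undirected weighted graph linking words that occur
    within `window` positions of each other (assumed window size)."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:  # skip self-loops
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Score nodes with the plain weighted TextRank update.

    Each node receives a share of every neighbour's score in
    proportion to the weight of the edge between them.
    """
    nodes = list(graph)
    scores = {node: 1.0 for node in nodes}
    # Total edge weight per node, used to normalise outgoing shares.
    strength = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            received = sum(
                graph[nbr][node] / strength[nbr] * scores[nbr]
                for nbr in graph[node]
            )
            updated[node] = (1 - damping) + damping * received
        scores = updated
    return scores
```

Under these assumptions, keywords would be read off by sorting the scores, for example `sorted(scores, key=scores.get, reverse=True)[:5]` after calling `textrank(build_cooccurrence(tokens))` on a lowercased, stopword-filtered token list.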

Language: English

Frequency: 1 issue per year
Journal subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Guimei Pan

Published online: 17 March 2025

Received: 25 Oct. 2024

Accepted: 09 Feb. 2025

DOI: https://doi.org/10.2478/amns-2025-0209

Keywords: TextRank algorithm, Text complex networks, Semantic relatedness, Co-occurrence networks, English corpus

© 2025 Guimei Pan, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
