Open Access

Lexical co-occurrence network and semantic relation mining based on English corpus

  
Mar 17, 2025

Figure 1.

Steps for discovering semantic relations with a complex network

Figure 2.

General operation flow of co-occurrence analysis

Figure 3.

Evaluation of clustering results

Figure 4.

Distribution statistics of words from different domains and of node words

Some examples of manually judged word relatedness in the corpus

Word A | Word B | Degree of correlation
Imperialism | Colonialism | 0.472
Shampoo | Conditioner | 0.399
Overseas Chinese | Settlers | 0.085
Lion tiger | Tiger lion | 0.551
Yongding River | Lugou Bridge | 0.364
First-order logic | Field of theory | 0.261
The Middle Ages | Castle | 0.277
Symphony | Movement | 0.211
Taiwan | NT | 0.316
Husband’s family | Wife’s family | 0.387
Hewlett-Packard | Printer | 0.428
Spring Festival | Dumplings | 0.395
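
The scores above are manual reference judgments of relatedness. As a rough illustration of how a lexical co-occurrence network can assign comparable scores, the sketch below counts within-window co-occurrences and computes a PMI-style value; the toy corpus, the window size, and the choice of PMI are assumptions for illustration, not the paper's actual measure.

```python
from collections import Counter
from math import log

def build_counts(sentences, window=5):
    """Count unigrams and within-window word co-occurrences over tokenized sentences."""
    unigrams = Counter()
    pairs = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + window, len(tokens))):
                if tokens[i] != tokens[j]:
                    pairs[frozenset((tokens[i], tokens[j]))] += 1
    return unigrams, pairs

def pmi_relatedness(a, b, unigrams, pairs):
    """Pointwise mutual information between words a and b (0.0 if they never co-occur)."""
    n = sum(unigrams.values())
    co = pairs.get(frozenset((a, b)), 0)
    if co == 0 or a not in unigrams or b not in unigrams:
        return 0.0
    return log((co / n) / ((unigrams[a] / n) * (unigrams[b] / n)))

# Toy usage with a hypothetical tokenized corpus (not the paper's data):
corpus = [
    ["spring", "festival", "families", "eat", "dumplings", "together"],
    ["dumplings", "are", "served", "during", "the", "spring", "festival"],
]
uni, pr = build_counts(corpus, window=5)
print(pmi_relatedness("spring", "dumplings", uni, pr))
```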

Multi-path semantic correlation calculation results

Characteristics of the correlation algorithm | Rank correlation coefficient
Classification graph used alone | 0.42
Reciprocal path | 0.23
Depth information weighting | 0.39
Information content weighting | 0.37
Document graph used alone | 0.36
Undirected graph of bidirectional links | 0.25
Link to link separately | 0.37
Link to the link | 0.31
Forward links with adjusted weight parameter | 0.41 (π = 0.85)
Use the first paragraph instead of English | 0.33
Combined document and classification graphs with adjusted parameter | 0.46 (λ = 0.32)
Open test (WS353) | 0.37
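
The coefficients above compare algorithm scores with human judgments by rank correlation, including an open test on WordSim-353 (WS353). Below is a minimal sketch of that evaluation step, assuming word pairs with gold scores and an arbitrary relatedness function; the tuple format and the use of scipy's spearmanr are assumptions, not the paper's code.

```python
from scipy.stats import spearmanr

def evaluate_relatedness(algorithm_score, pairs_with_gold):
    """Spearman rank correlation between algorithm scores and human judgments.

    pairs_with_gold: iterable of (word_a, word_b, human_score) tuples, e.g. rows
    of a WordSim-353-style file. algorithm_score(a, b) -> float is whatever
    relatedness measure is being tested.
    """
    gold, predicted = [], []
    for a, b, human in pairs_with_gold:
        gold.append(human)
        predicted.append(algorithm_score(a, b))
    rho, p_value = spearmanr(gold, predicted)
    return rho, p_value

# Toy usage with made-up scores (not the paper's data):
toy_pairs = [("tiger", "cat", 7.35), ("tiger", "tiger", 10.0), ("book", "paper", 7.46)]
rho, p = evaluate_relatedness(lambda a, b: 1.0 if a == b else 0.5, toy_pairs)
print(f"Spearman rho = {rho:.3f} (p = {p:.3f})")
```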

Experimental parameters

Parameter name | Parameter value
Maximum number of iterations | 400
Iteration termination threshold | 0.003
Number of keywords extracted per document, N | 10
Damping factor d | 0.61
Initial vertex score | 1.0
Sliding window size w | 2 / 6 / 10 / 15 / 20
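
These settings correspond to a TextRank-style keyword extractor (damping factor d, iteration limit and termination threshold, sliding window size w, top-N keywords per document). The sketch below shows how such a configuration could drive a co-occurrence graph ranked with PageRank; the tokenization, the graph construction, and the use of networkx are assumptions, not the paper's implementation.

```python
import networkx as nx

def textrank_keywords(tokens, window=10, d=0.61, max_iter=400, tol=0.003, top_n=10):
    """TextRank-style keyword extraction: build an undirected co-occurrence graph
    over a sliding window, then rank vertices with PageRank.

    Parameter values mirror the experimental settings above; initial vertex
    scores default to a uniform distribution inside pagerank (approximating
    the table's initialization of 1.0 per vertex)."""
    graph = nx.Graph()
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1:i + window]:
            if w1 != w2:
                graph.add_edge(w1, w2)
    scores = nx.pagerank(graph, alpha=d, max_iter=max_iter, tol=tol)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy usage on a pre-tokenized, stopword-filtered text (hypothetical input):
tokens = ["lexical", "network", "semantic", "relation", "mining",
          "corpus", "network", "semantic", "keyword", "extraction"]
print(textrank_keywords(tokens, window=6, top_n=5))
```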

Keyword extraction comparison

Algorithm | Index | w = 2 | w = 6 | w = 10 | w = 15 | w = 20 | Mean
TextRank | P | 0.3625 | 0.3212 | 0.3578 | 0.3755 | 0.3964 | 0.3627
TextRank | R | 0.4689 | 0.4968 | 0.4578 | 0.4772 | 0.4931 | 0.4788
TextRank | F1 | 0.3715 | 0.3251 | 0.3698 | 0.3745 | 0.3604 | 0.3603
TextRank | I | 53 | 56 | 59 | 52 | 51 | 54.2
Improved TextRank | P | 0.4596 | 0.4685 | 0.4725 | 0.4869 | 0.4911 | 0.4757
Improved TextRank | R | 0.5521 | 0.5637 | 0.5417 | 0.5698 | 0.5927 | 0.5640
Improved TextRank | F1 | 0.4122 | 0.4166 | 0.4867 | 0.4474 | 0.4516 | 0.4429
Improved TextRank | I | 42 | 41 | 38 | 35 | 45 | 40.2
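
The P, R, and F1 values above are reported per window size and averaged in the Mean column. A small sketch of the usual way such keyword-extraction metrics are obtained follows; the exact-string-matching rule and the macro-averaging are assumptions, not necessarily the paper's evaluation protocol.

```python
def keyword_prf(extracted, reference):
    """Precision, recall and F1 of one document's extracted keywords against a
    reference (gold) keyword set, using exact string matching."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    p = hits / len(extracted) if extracted else 0.0
    r = hits / len(reference) if reference else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

def macro_average(docs):
    """Average P/R/F1 over (extracted, reference) pairs, as in the Mean column."""
    triples = [keyword_prf(e, g) for e, g in docs]
    n = len(triples)
    return tuple(sum(t[i] for t in triples) / n for i in range(3))

# Toy usage with hypothetical keyword lists (not the paper's data):
docs = [(["network", "corpus", "mining"], ["corpus", "mining", "semantics"]),
        (["keyword", "textrank"], ["keyword", "graph", "textrank"])]
print(macro_average(docs))
```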

Number of texts in each data set

Data set | Number of texts
Data set 1 | 526
Data set 2 | 125
Data set 3 | 348
Data set 4 | 56
Data set 5 | 48
Data set 6 | 64
Data set 7 | 153
Data set 8 | 186
Data set 9 | 54