Research on big data visualization in the talent cultivation guarantee system of social sports instruction and management oriented to the concept of OBE

The outcome-based education (OBE) concept is an educational philosophy that is student-centered and focuses on the development of students’ learning outcomes and abilities. In the talent cultivation of social sports guidance and management majors, applying the OBE concept and putting students’ learning outcomes and ability cultivation first is of great significance for improving the quality and market competitiveness of professional talents [1-4]. With the rapid development of social sports, the demand for social sports guidance and management professionals is also increasing [5-6]. To cultivate highly professional talents who meet the needs of society, introducing the OBE concept has become an effective path for educational reform [7-9].

The major of social sports instruction and management is an important specialty for cultivating social sports workers and management talents. With the rapid development of China’s sports industry, the demand for professionals in this field is also increasing [10-13]. However, talent cultivation in this specialty still has problems, such as a lack of diversified teaching modes, insufficient cultivation of practical ability, and disconnection from social demand. Therefore, it is of great significance to visualize the big data in the talent cultivation guarantee system of social sports guidance and management under the OBE concept [14-17].

The cultivation of social sports guidance and management professional talents under the OBE concept helps to cultivate high-quality professionals needed for social sports [18-19]. By clarifying the learning outcomes, designing adaptive courses, and strengthening comprehensive evaluation, we can achieve the goals of cultivating practical ability, improving independent learning ability and enhancing comprehensive quality. However, the implementation of the OBE concept also faces challenges, and requires the joint efforts of schools and teachers to improve the training programs and teaching methods to meet the social needs and the requirements of professional talent training [20-23].

Based on talent portrait technology, this paper uses an improved clustering algorithm to visualize and analyze the index data of social sports instruction and management talents from five samples of a college since graduation. First, the Scrapy framework is used for data acquisition, and the HanLP perceptron segmentation tool is used for text segmentation. Next, a density parameter is introduced to judge whether each sample point in the dataset is suitable as an initial clustering center. The overall average distance of the sample dataset is calculated, and the neighborhoods of the sample dataset are constructed by selecting appropriate parameters according to the characteristics of the data. Finally, multiple principal component dimensions are used to present the portrait of social sports instruction and management talents, and the final comprehensive evaluation score is calculated.

2

Preparation of talent profiling algorithms

2.1

Talent Profiling Technology

Talent profiling can be seen as a branch of user profiling that targets highly educated and technical talents. Its core is to rely on the real data of individuals to build a comprehensive and accurate character model.

Combined with the current popular data analysis technology and modern information technology, talent profiling not only helps in-depth mining and rational allocation of talent, but also promotes the improvement of talent self-knowledge. By establishing a personalized talent management system and recommendation mechanism, enterprises can more accurately match talents to suitable positions or project teams, which not only saves manpower costs, but also helps to optimize the allocation of talents and improve work efficiency and comprehensive competitiveness.

2.2

Talent portrait construction

According to the system needs, the talent portrait construction in this paper can be divided into the following four stages: data collection, text analysis, talent portrait construction, and talent relevance analysis as shown in Figure 1 [24].

1)

Data Acquisition

The talent portrait system is built on the analysis of talent data, so it is crucial to obtain talent information accurately and comprehensively. In this paper, according to the needs of the model construction, the data sources are divided into four categories: platform basic data, data filled in by the talents, web-crawled data, and job data.

2)

Text analysis

The main goal of this stage is to realize the pre-processing of multi-source heterogeneous Chinese text data, so that it meets the system requirements and is available for algorithm recognition.

3)

Portrait Construction

Based on the labeled data obtained in the text analysis stage, we match them and then build a personal portrait of the talent in combination with the specific needs of the system.

4)

Relevance Analysis

After the first three steps of the process, we can get a full, three-dimensional portrait of each talent in the professional field.

2.3

Scrapy crawler framework

2.3.1

Introduction to the Scrapy Framework

Scrapy is a powerful Python-based open source web crawler framework designed to help users crawl data from web pages efficiently. It provides a suite of tools and features for building, managing, and extending web crawlers, and its flexibility and customizability have made it the tool of choice for many data acquisition projects. Its crawling functions and rich extension interfaces can meet the needs of the talent profiling system built in this paper for acquiring talent information such as papers and patents, and the framework can be customized and extended as required.

2.3.2

Scrapy Framework Architecture

The Scrapy framework consists of the following modules:

1)

Engine: the engine controls the data flow of the whole system and triggers events; it is the core of the framework.

2)

Item: Item defines the data structure of the crawling result, and the crawled data will be assigned to this object.

3)

Scheduler: the scheduler accepts requests sent by the engine, adds them to a queue, and hands them back when the engine asks for the next request. It can be pictured as a priority queue of URLs that decides which URL is crawled next and removes duplicate URLs.

4)

Downloader: the downloader fetches web page content and returns it to the engine. It is built on Twisted, an efficient asynchronous model.

5)

Spiders: Spiders are developer-defined classes that contain the crawling logic and web page parsing rules. They are mainly responsible for parsing responses and generating extraction results and new requests.

6)

Item Pipeline: the item pipeline processes items after they are extracted. Its main task is to clean, validate and store data.

7)

Downloader Middlewares: the downloader middlewares are a hook framework located between the engine and the downloader. They mainly handle the requests and responses passed between the engine and the downloader.

8)

Spider Middlewares: the spider middlewares are a hook framework located between the engine and the spiders. Their main task is to process the spider input (responses) and the spider output (items and new requests).
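The scheduler described in item 3 can be pictured with a small standard-library sketch. This is only a toy model of "priority queue plus duplicate filter"; the class name `ToyScheduler`, the example URLs, and the convention that a smaller number is crawled sooner are assumptions made for illustration and are not Scrapy's own API.

```python
import heapq
from itertools import count


class ToyScheduler:
    """Toy URL priority queue with duplicate filtering (not Scrapy's own class)."""

    def __init__(self):
        self._heap = []        # entries are (priority, insertion order, url)
        self._seen = set()     # every URL that was ever enqueued
        self._order = count()  # tie-breaker so equal priorities stay first-in, first-out

    def enqueue(self, url, priority=0):
        """Queue a URL unless it was seen before; return True if it was added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, (priority, next(self._order), url))
        return True

    def next_url(self):
        """Return the pending URL with the smallest priority number, or None."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


scheduler = ToyScheduler()
scheduler.enqueue("https://example.com/papers", priority=1)
scheduler.enqueue("https://example.com/patents", priority=0)
was_added = scheduler.enqueue("https://example.com/papers", priority=1)  # duplicate

crawl_order = []
url = scheduler.next_url()
while url is not None:
    crawl_order.append(url)
    url = scheduler.next_url()
```

The duplicate request is rejected by the `_seen` set, and the remaining URLs come out in priority order, which mirrors the two roles the text assigns to the scheduler.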

2.4

Text Analysis Based on Segmentation Techniques

Most of the data needed for the talent portrait system constructed in this paper are text data crawled from network platforms. They cannot be used directly as data samples and must first be transformed by text analysis into a numerical dataset that can be fed to the clustering algorithm. The general process is to segment long text into short Chinese words (tags), convert the tags that carry actual meaning into numerical form through feature engineering, and finally analyze the data with the matching and clustering algorithms to construct a personal talent model.

Word segmentation is an important technology in natural language processing that cuts a continuous text sequence into semantically meaningful words or phrases. For Chinese, which has a large vocabulary, rich word meanings, and no spaces or other obvious separators between words, the accuracy and efficiency of word segmentation are crucial to the quality of subsequent text processing tasks.

In this paper, we adopt the HanLP word segmentation tool, which is mainly based on the perceptron model, a statistical learning approach. HanLP is an open-source Chinese natural language processing toolkit developed by the Institute of Natural Language Processing and Social Humanities Computing of the Chinese Academy of Computer Science. Known for its high performance and accuracy, it provides a comprehensive solution for Chinese text processing. HanLP supports word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and other functions that cover the core tasks of Chinese natural language processing. With a simple API, users can integrate HanLP into their projects without tedious configuration. In addition, HanLP supports multiple language models, allowing users to choose the appropriate model according to their actual needs. HanLP is widely used in Chinese natural language processing tasks in academic research and industrial applications.

The segmentation model of HanLP is mainly based on the conditional random field (CRF) algorithm, which models the conditional probability distribution of an output sequence given an input sequence. Its basic structure is shown in Figure 2.

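The specific formula is as follows: (1) $P (y | x) = \frac{1}{Z (x)} \cdot \exp (\sum_{j = 1}^{n} \sum_{i = 1}^{m} φ_{i} f_{i} (y_{j - 1}, y_{j}, x, j))$

where Z(x) is the normalization factor, f_i are the feature functions, φ_i are their weights, n is the length of the sequence, and m is the number of feature functions.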
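To make Eq. (1) concrete, the following minimal sketch evaluates P(y | x) for a three-character input by enumerating every label sequence to obtain the normalization factor. The tag set (B, E, S), the two feature functions, their weights, and the placeholder tokens are all hypothetical choices for illustration; they are not HanLP's trained model, and real CRF implementations compute the normalization with dynamic programming rather than enumeration.

```python
import math
from itertools import product

LABELS = ("B", "E", "S")  # begin / end / single-character word (assumed tag set)


def f_transition(y_prev, y_cur, x, j):
    """Fires when a word opened by B is closed by E at position j."""
    return 1.0 if (y_prev, y_cur) == ("B", "E") else 0.0


def f_single(y_prev, y_cur, x, j):
    """Fires when the hypothetical function token 'de' is tagged S."""
    return 1.0 if x[j] == "de" and y_cur == "S" else 0.0


FEATURES = (f_transition, f_single)
WEIGHTS = (1.5, 2.0)  # illustrative values, not trained


def score(y, x):
    """Inner double sum of Eq. (1); the label before position 0 is a START symbol."""
    total = 0.0
    for j in range(len(x)):
        y_prev = y[j - 1] if j > 0 else "START"
        for weight, feature in zip(WEIGHTS, FEATURES):
            total += weight * feature(y_prev, y[j], x, j)
    return total


def crf_probability(y, x):
    """P(y | x), with the normalization obtained by brute-force enumeration."""
    z = sum(math.exp(score(cand, x)) for cand in product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / z


x = ("ti", "yu", "de")  # placeholder tokens standing in for three characters
probs = {y: crf_probability(y, x) for y in product(LABELS, repeat=len(x))}
best = max(probs, key=probs.get)
```

With these assumed features, the most probable labeling groups the first two tokens into one word and tags the third as a single-character word, and the probabilities over all label sequences sum to one.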

3

K-means algorithm using density parameters to select initial centers

3.1

K-means algorithm

The K-means algorithm is the most commonly used algorithm in cluster analysis. It uses the distance between samples as the measure of similarity within a class. It is a typical unsupervised clustering method, so no manual expert labeling is needed, which greatly reduces the cost of clustering. For these reasons the K-means method has better applicability than other clustering methods and offers great potential for improvement in many respects. Various improved K-means methods have further enhanced its accuracy and applicability.

3.1.1

K-means algorithm idea

The K-means clustering algorithm is very commonly used. Its algorithmic ideas have evolved over time, but the most basic ideas have not changed much [25]. The basic K-means method is described below through its computational process, which consists of the following steps:

Step 1: Select K samples in the data space as the initial centers, where each object represents a clustering center;

Step 2: For each sample in the dataset, calculate its distance to each clustering center and assign it to the class of the nearest center according to the closest-distance criterion;

Step 3: Update the clustering centers by taking the mean of all the objects in each class as the new center of that class, and calculate the value of the objective function;

Step 4: Repeat Steps 2 and 3, checking at each iteration whether the clustering centers and the value of the objective function have changed, until they no longer change, and then output the results.
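The four steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the implementation used in the paper: the function names, the seeded random initialization, the sum-of-squared-distances objective, and the toy data are assumptions.

```python
import math
import random


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(data, k, seed=0, max_iter=100):
    """Plain K-means following Steps 1-4; returns (centers, labels, objective)."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)  # Step 1: k random samples as initial centers
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest center
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in data]
        # Step 3: move each center to the mean of its members
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            new_centers.append(mean_point(members) if members else centers[c])
        # Step 4: stop once the centers no longer change
        if new_centers == centers:
            break
        centers = new_centers
    # Objective (assumed here): sum of squared distances to the assigned center
    objective = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(data, labels))
    return centers, labels, objective


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels, objective = kmeans(data, k=2)
```

Because the initial centers are drawn at random, which cluster receives label 0 depends on the seed; only the grouping itself is meaningful.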

3.1.2

Analysis of advantages and disadvantages of K-means algorithm

The K-means algorithm is widely used because it is simple to operate, its principle is easier to understand than that of other clustering methods, and it runs fast. However, it also has shortcomings. First, the original method selects the initial centers randomly, and the result depends heavily on that selection. When the initial clustering centers are not chosen appropriately, the algorithm may be unstable and may converge to a local optimum rather than the global optimum, so the final clustering results may be incorrect. Second, the value of K is often chosen based on the user’s personal experience, and different values of K for the same dataset can produce very different clustering results. Therefore, most improved K-means algorithms work in these two directions, choosing more appropriate initial centers and more appropriate K values to obtain better results.

3.2

K-means algorithm for density parameter optimization

3.2.1

Acquisition of density parameters

The K-means algorithm with improved initial clustering centers reduces the uncertainty caused by randomly selecting the initial centers. We first introduce the parameters used to calculate the initial clustering centers. For a dataset D = {x₁, x₂, x₃, …, x_n}, we first calculate the mean distance between all pairs of samples, denoted AveDist, where n is the number of samples [26]: (2) $AveDist = \frac{2}{n (n - 1)} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} dist (x_{i}, x_{j})$

where dist(x_i, x_j) is the Euclidean distance between two points, and both x_i and x_j are m-dimensional data. It is calculated as follows: (3) $dist (x_{i}, x_{j}) = \| x_{i} - x_{j} \| = \sqrt{{(x_{i 1} - x_{j 1})}^{2} + \dots + {(x_{i m} - x_{j m})}^{2}}$

where m is the dimension of the sample point in question.

The number of sample points within the neighborhood of each sample is defined as follows. Given a dataset D = {x₁, x₂, x₃, …, x_n} in m-dimensional space, the number of points within the ε-neighborhood of sample x_i is: (4) $ρ (x_{i}, ε) = \sum_{j = 1, j \neq i}^{n} sgn (ε - dist (x_{i}, x_{j}))$

where the sgn function is: (5) $sgn (x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases}$

The value ρ(x_i, ε) is the number of samples in the neighborhood of sample x_i and is called the density parameter. ε is the neighborhood radius, whose value is generally the mean distance between all samples. When using the mean distance directly does not yield enough clusters for the desired K, one can take ε = α × AveDist with 0 < α ≤ 1. When the classification boundaries are obvious, α usually takes the value 0.3, 0.4, or 0.5; the specific value is chosen experimentally according to the desired K.
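As a small worked example of Eqs. (2), (4) and (5), the sketch below computes AveDist and the density parameter of every sample for a four-point toy dataset. The function names, the data, and the choice α = 0.5 are assumptions made only for illustration.

```python
import math
from itertools import combinations


def ave_dist(data):
    """Eq. (2): mean Euclidean distance over all n(n-1)/2 sample pairs."""
    n = len(data)
    total = sum(math.dist(a, b) for a, b in combinations(data, 2))
    return 2.0 * total / (n * (n - 1))


def density(data, i, eps):
    """Eqs. (4)-(5): number of other samples within distance eps of data[i]."""
    return sum(
        1
        for j, other in enumerate(data)
        if j != i and eps - math.dist(data[i], other) >= 0
    )


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
alpha = 0.5                     # illustrative value with 0 < alpha <= 1
eps = alpha * ave_dist(data)    # neighborhood radius
rho = [density(data, i, eps) for i in range(len(data))]
```

Here the three points near the origin each have two neighbors within ε, while the isolated point has none, so it would be the last candidate considered as an initial center.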

3.2.2

Using Density Parameters to Obtain Initial Cluster Centers

The density parameters obtained above are used to select the initial clustering centers. We sort the ρ(x_i, ε) values of all samples from largest to smallest and select the x_i with the largest density as the first initial clustering center. We then remove x_i and all samples within its neighborhood, find the sample with the largest density parameter among the remaining samples as the next center, and remove all points within its neighborhood. This continues until the required number of clustering centers has been found. If the final number of clustering centers is less than the required number, we reduce the value of α appropriately and repeat the selection.

3.2.3

Description of K-means algorithm for optimizing initial centers based on density parameters

Based on the above description, the improved K-means method selects the initial clustering centers and performs clustering as follows:

Step 1: Calculate the distance dist(x_i, x_j) between every pair of samples in the dataset, and then calculate the average distance using the AveDist formula;

Step 2: After choosing an appropriate ε, calculate ρ(x_i, ε) for each sample, i = 1, 2, 3, …, n, and place the values in the set $S = \{ρ (x_{i}, ε), i = 1, 2, 3, ..., n\}$;

Step 3: Select the x_i corresponding to the largest ρ(x_i, ε) in S as the first initial clustering center, and eliminate all samples within the ε-neighborhood of x_i. Then select the sample with the largest ρ(x_i, ε) among the remaining sample points as the second clustering center;

Step 4: Repeat Step 3 until all elements in S are eliminated or the required number of initial clustering centers is reached;

Step 5: Denote the selected initial clustering centers as V = {v₁, v₂, v₃, …, v_K}, where K is the required number of clusters. Using V as the initial clustering centers, calculate the distance between each sample and each center, and assign each sample to the class of its nearest center;

Finally, update the clustering centers $v_{j} = \frac{1}{| C_{j} |} \sum_{x \in C_{j}} x$ and repeat Step 5 until no v_j changes. The stabilized result is the clustering result of the improved K-means method.
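The procedure above can be sketched as follows. This is a minimal reading of Steps 1 to 5 under stated assumptions: densities are computed once on the full dataset, ties in density are broken by the lower sample index, and the function names, toy data and α = 0.5 are illustrative rather than taken from the paper.

```python
import math
from itertools import combinations


def density_initial_centers(data, k, alpha=1.0):
    """Steps 1-4: pick up to k dense, mutually distant samples as initial centers."""
    n = len(data)
    ave = 2.0 * sum(math.dist(a, b) for a, b in combinations(data, 2)) / (n * (n - 1))
    eps = alpha * ave
    rho = [
        sum(1 for j in range(n) if j != i and math.dist(data[i], data[j]) <= eps)
        for i in range(n)
    ]
    remaining = set(range(n))
    centers = []
    while remaining and len(centers) < k:
        # densest remaining sample; ties go to the lower index (assumption)
        best = max(remaining, key=lambda i: (rho[i], -i))
        centers.append(data[best])
        # drop the chosen sample and everything inside its eps-neighborhood
        remaining = {i for i in remaining if math.dist(data[best], data[i]) > eps}
    return centers


def kmeans_from(data, centers, max_iter=100):
    """Step 5 onward: assignment/update loop started from the given centers."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in data
        ]
        updated = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(old)
        if updated == centers:  # every v_j unchanged: converged
            break
        centers = updated
    return centers, labels


data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
init = density_initial_centers(data, k=2, alpha=0.5)
final_centers, labels = kmeans_from(data, init)
```

If `density_initial_centers` returns fewer than k centers, the text's remedy is to lower α and select again; the sketch leaves that retry to the caller.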

4

Steps of principal component analysis

Assuming that the input data is a matrix X_n×p, where n is the number of samples and p is the number of indicator variables, the main steps of the Improved Principal Component Analysis (IPCA) are as follows:

Set the data matrix: (6) $X_{n \times p} = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1 p} \\ x_{21} & x_{22} & \dots & x_{2 p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n 1} & x_{n 2} & \dots & x_{n p} \end{bmatrix}$

where x_ij is the j-th value of the i-th sample, i = 1, ⋯, n, j = 1, ⋯, p.

1)

Min-Max normalization of the X_n×p matrix yields Y_n×p: (7) $Y_{n \times p} = \begin{bmatrix} y_{11} & y_{12} & \dots & y_{1 p} \\ y_{21} & y_{22} & \dots & y_{2 p} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n 1} & y_{n 2} & \dots & y_{n p} \end{bmatrix}$

where y_ij = (x_ij − min x_j)/(max x_j − min x_j), i = 1, 2, ⋯, n, j = 1, 2, ⋯, p.

2)

Correlation coefficient matrix R: (8) $r_{i j} = \frac{\sum_{k = 1}^{n} (y_{k i} - \bar{y_{i}}) (y_{k j} - \bar{y_{j}})}{\sqrt{\sum_{k = 1}^{n} {(y_{k i} - \bar{y_{i}})}^{2} \sum_{k = 1}^{n} {(y_{k j} - \bar{y_{j}})}^{2}}}$ (9) $R = \begin{bmatrix} r_{11} & r_{12} & \dots & r_{1 p} \\ r_{21} & r_{22} & \dots & r_{2 p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p 1} & r_{p 2} & \dots & r_{p p} \end{bmatrix}$

The eigenvalues of matrix R are calculated, and the p eigenvalues λ_i of matrix R are arranged in descending order.

3)

Cumulative contribution of principal components

Generally, the first m eigenvalues λ₁, λ₂, ⋯, λ_m (m ≤ p) whose cumulative contribution rate reaches 85% or more are retained, and they correspond to the m principal components [27].

4)

Principal component loading: (10) $l_{i j} = p (z_{i}, x_{j}) = \sqrt{λ_{i}} e_{i j} (i, j = 1, 2 \dots p)$

where e_ij denotes the j-th component of the eigenvector e_i corresponding to λ_i.

5)

Score of each principal component: (11) $Z = [\begin{matrix} z_{11} & z_{12} & \dots & z_{1 m} \\ z_{21} & z_{22} & \dots & z_{2 m} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ z_{n 1} & z_{n 2} & \dots & z_{n m} \end{matrix}]$

The principal component analysis was performed on the feature matrix and the final matrix was obtained as 2347 × 3.
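The steps of this section can be traced on a tiny example with the sketch below, which uses only the standard library. It is an illustration under assumptions, not the paper's implementation: the eigen-decomposition uses a textbook cyclic Jacobi rotation, the scores are taken to be projections of the Min-Max normalized rows onto the retained eigenvectors (the paper does not state the score formula), and the six input rows are the visible rows of Table 1 rather than the full dataset.

```python
import math


def min_max(X):
    """Eq. (7): rescale every column of X to the range [0, 1]."""
    lo = [min(col) for col in zip(*X)]
    hi = [max(col) for col in zip(*X)]
    return [[(v - a) / (b - a) for v, a, b in zip(row, lo, hi)] for row in X]


def correlation(Y):
    """Eqs. (8)-(9): p x p correlation matrix of the columns of Y."""
    n, p = len(Y), len(Y[0])
    mean = [sum(row[j] for row in Y) / n for j in range(p)]
    dev = [[row[j] - mean[j] for j in range(p)] for row in Y]
    ss = [sum(d[j] ** 2 for d in dev) for j in range(p)]
    return [[sum(d[i] * d[j] for d in dev) / math.sqrt(ss[i] * ss[j])
             for j in range(p)] for i in range(p)]


def jacobi_eigen(A, sweeps=50, tol=1e-18):
    """Eigenvalues (descending) and eigenvectors of a symmetric matrix."""
    p = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for a in range(p - 1):
            for b in range(a + 1, p):
                if abs(A[a][b]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2 * A[a][b], A[a][a] - A[b][b])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns a and b
                    A[k][a], A[k][b] = c * A[k][a] + s * A[k][b], -s * A[k][a] + c * A[k][b]
                for k in range(p):  # rotate rows a and b
                    A[a][k], A[b][k] = c * A[a][k] + s * A[b][k], -s * A[a][k] + c * A[b][k]
                for k in range(p):  # accumulate eigenvectors as columns of V
                    V[k][a], V[k][b] = c * V[k][a] + s * V[k][b], -s * V[k][a] + c * V[k][b]
    pairs = sorted(((A[i][i], [V[k][i] for k in range(p)]) for i in range(p)), reverse=True)
    return [lam for lam, _ in pairs], [vec for _, vec in pairs]


def pca(X, threshold=0.85):
    """Return eigenvalues, number of kept components m, loadings and scores."""
    Y = min_max(X)
    lams, vecs = jacobi_eigen(correlation(Y))
    p, total = len(lams), sum(lams)
    m, cum = 0, 0.0
    while m < p and cum / total < threshold:  # smallest m reaching the threshold
        cum += lams[m]
        m += 1
    # Eq. (10): l_ij = sqrt(lambda_i) * e_ij
    loadings = [[math.sqrt(max(lams[i], 0.0)) * vecs[i][j] for j in range(p)]
                for i in range(m)]
    # Assumption: scores are projections of the normalized rows onto e_1..e_m
    scores = [[sum(y[j] * vecs[i][j] for j in range(p)) for i in range(m)] for y in Y]
    return lams, m, loadings, scores


# The six rows that are visible in Table 1 (ID column omitted)
X = [
    (73.87, 75.43, 74.58, 74.67, 82.79),
    (73.56, 71.58, 76.82, 86.39, 80.72),
    (74.32, 74.02, 70.77, 84.68, 85.84),
    (73.38, 85.19, 75.70, 83.59, 82.60),
    (74.43, 87.30, 70.53, 81.20, 75.56),
    (77.10, 83.38, 72.96, 77.34, 77.30),
]
lams, m, loadings, scores = pca(X)
```

Because the eigenvalues of a correlation matrix sum to p, the cumulative contribution rate in the loop is simply the running sum of eigenvalues divided by 5 for this five-indicator example.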

5

Experimentation and data analysis

5.1

Cluster Analysis of Social Sports Instruction and Management Talents

5.1.1

Data sources

The data for this experiment came from the social sports college of a university and contain the index data of five samples of social sports guidance and management talents from graduation to date, as shown in Table 1. Cluster analysis was carried out using the parallelized K-Means algorithm according to the index weight table determined in Chapter II.

Table 1.

Experimental data table

N	ID	Moral quality	Professional knowledge	Physical and mental quality	Humanistic quality	Practice innovation quality
1	130105	73.87	75.43	74.58	74.67	82.79
2	130926	73.56	71.58	76.82	86.39	80.72
3	130477	74.32	74.02	70.77	84.68	85.84
…	…	…	…	…	…	…
294	131060	73.38	85.19	75.7	83.59	82.6
295	131246	74.43	87.3	70.53	81.2	75.56
296	131432	77.1	83.38	72.96	77.34	77.3
…	…	…	…	…	…	…

5.1.2

K-value selection

The first step of the K-Means algorithm is to select the number k of clusters, which strongly influences the subsequent clustering process and the final results; whether k is chosen appropriately directly determines whether the final clustering results have practical significance. In this experiment, the choice of k determines whether the results have real application value for developing social sports guidance and management talents, optimizing education and teaching, and improving the quality of talent training.

In current domestic and international research on K-Means clustering, no generally recognized, authoritative strategy for selecting k has been proposed. This experiment uses the following heuristic: when the assumed number of clusters is equal to or higher than the real number of clusters, the indicator rises slowly as k decreases, and once the assumed number falls below the real number of clusters, the indicator rises sharply. The k value at this critical point is taken as the optimal number of clusters. The chosen indicator is the sum of the centroid (center-of-mass) distances of the k clusters.

In this paper, the cases where k is selected for each value from 1 to 10 are compared separately, and the results are shown in Fig. 3.

It can be seen that as k takes the values 10, 9, 8, 7 and 6 in turn, the bar graph increases rather slowly and the slope of the curve changes little, so the suitable k for the experimental data should be less than 6. When k is 5 and 4, the increase and the slope in the two graphs change more noticeably. When k is 3, 2 and 1, the bar graph rises sharply and the slope of the curve changes significantly, so k should be greater than 3. It is therefore more appropriate to choose k as 4 or 5 in this experiment.
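The k-selection heuristic described above can be sketched as follows. The indicator is interpreted here as the sum of the distances from every sample to its cluster centroid, which is an assumption about the paper's "sum of the center-of-mass distances". A deterministic farthest-first initialization is used only to keep the sketch reproducible; it is not the paper's initialization, and the three-blob toy data are hypothetical.

```python
import math


def farthest_first(data, k):
    """Deterministic initial centers: start at data[0], then add the farthest point."""
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def clustering_cost(data, k, max_iter=100):
    """Run K-means for one k and return the sum of sample-to-centroid distances."""
    centers = farthest_first(data, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        updated = [
            tuple(sum(v) / len(g) for v in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return sum(min(math.dist(p, c) for c in centers) for p in data)


def elbow_curve(data, k_max):
    """Indicator value for every k from 1 to k_max."""
    return {k: clustering_cost(data, k) for k in range(1, k_max + 1)}


# Hypothetical data: three well-separated blobs of four points each
offsets = [(0.3, 0.0), (-0.3, 0.0), (0.0, 0.3), (0.0, -0.3)]
blobs = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
data = [(bx + dx, by + dy) for bx, by in blobs for dx, dy in offsets]
curve = elbow_curve(data, k_max=6)
```

Reading the curve from large k to small k, the indicator barely moves until k drops below the true number of blobs and then jumps, which is the critical point the text uses to choose k.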

5.1.3

Analysis of clustering results

First, in terms of the overall clustering results, the comprehensive quality scores of all the social sports guidance and management talents of the college range from 63 to 85 points, and relatively few score above 80. The overall quality level is therefore in the middle of the range, and under the new index system there are currently no social sports guidance and management talents with especially outstanding comprehensive ability. This is reasonable given the actual situation: most of the school’s professional courses are concentrated in the second semester of the sophomore year and the first semester of the junior year, and the professional specialties of these talents are only gradually highlighted after the three samples.

From the results of the four clusters a, b, c and d, among the five first-level quality indicators, practical innovation quality fluctuates the most and humanistic quality fluctuates the least. This shows that the abilities of social sports guidance and management talents in practice and innovation (social practice ability, scientific and technological innovation ability, academic research ability, community work ability) differ considerably: some talents almost reach the full weighted score in this aspect, while others do not reach the passing mark, so the gap is large. This suggests that the school pays attention to the practice and innovation quality of these talents, has carried out corresponding promotion work, and has achieved certain positive results, or that most of the talents have realized the importance of this quality and some have already strengthened it through specific practice, scientific and technological competitions, and similar activities, with good results. The effectiveness of cultivating this quality is gradually showing, and it is necessary to continue and improve the cultivation program for practical innovation quality.

The following describes the results of each category, showing the approximate score ranges and total scores of each quality for each type of social sports guidance and management talent, together with a specific analysis.

1)

Category a

Table 2 shows the clustering results of category a. Category a has the largest number of social sports guidance and management talents, with more than 90 people, accounting for about one-third of the total. The overall quality level is 60 to 84 points, the widest score range of the four categories. The scores basically meet the standard, with most indicator scores fluctuating between 60 and 100 points and no particularly outstanding performance, and a very small number of students are below the passing score. This group can basically serve as a generalization of the quality of the social sports guidance and management talents in the sample, reflecting the overall educational effect and characteristics of the sample.

2)

Category b

Table 3 shows the clustering results of category b. Category b contains nearly 80 social sports guidance and management talents, with overall quality in the range of 70 to 80 points. All of their indexes are above the passing score, and their overall physical and mental quality is the highest in the sample, all above 78 points. Moreover, the highest scores of all indicators except professional knowledge quality appear in this group.

3)

Category c

Table 4 shows the clustering results of category c. The quality scores of the social sports guidance and management talents in category c are concentrated in 75 to 88 points, and the overall level is “good”. Except for ideological and moral quality, where a very small number of talents are slightly below the passing score, all scores are above the passing score, and professional knowledge quality and practice and innovation quality are relatively outstanding. This group is relatively deficient in ideological performance, collective concept, and labor hygiene. Many of them are relatively individualistic, placing their own will above team spirit and collective interests and basing their development on their own ideas, but they are also aware of the need to improve their learning, practice, physical and mental, and humanistic qualities and do so consciously.

4)

Category d

Table 5 shows the clustering results of category d. The scores of all the indexes of the social sports guidance and management talents in category d are the lowest. A larger number of people in this category fail to pass in practical and innovative quality, with lower scores, and many do not reach the passing score in ideological and moral quality or physical and mental quality. The overall performance of this category is very poor in all qualities.

Table 2.

Cluster result a value representation

Population range	Score Form	Moral quality		Professional knowledge		Physical and mental quality		Humanistic quality		Practice innovation quality		Total score
Population range	Weight	0.23		0.28		0.14		0.12		0.23		1
96	Weighted(low-high)	14	18	24	28	8	13	4	7	10	18	60	84
96	Percent system(low-high)	60.9	78.3	85.7	100	57.1	92.9	33.3	58.3	43.5	78.3	60	84

Table 3.

Cluster result b value representation

Population range	Score Form	Moral quality		Professional knowledge		Physical and mental quality		Humanistic quality		Practice innovation quality		Total score
Population range	Weight	0.23		0.28		0.14		0.12		0.23		1
79	Weighted (low-high)	17	19	19	21	11	13	7	9	16	18	70	80
79	Percent system (low-high)	73.9	82.6	67.9	75	78.6	92.9	58.3	75	69.6	78.3	70	80

Table 4.

Cluster result c value representation

Population range	Score Form	Moral quality		Professional knowledge		Physical and mental quality		Humanistic quality		Practice innovation quality		Total score
Population range	Weight	0.23		0.28		0.14		0.12		0.23		1
67	Weighted(low-high)	16	19	22	25	11	13	9	11	17	20	75	88
67	Percent system(low-high)	69.6	82.6	78.6	89.3	78.6	92.9	75	91.7	73.9	87	75	88

Table 5.

Value representation of cluster d results

Population	Score form	Moral quality		Professional knowledge		Physical and mental quality		Humanistic quality		Practice innovation quality		Total score
	Weight	0.23		0.28		0.14		0.12		0.23		1
58	Weighted (low, high)	15	18	19	21	8	12	7	9	11	15	60	75
	Percentage (low, high)	65.2	78.3	67.9	75	57.1	85.7	58.3	75	47.8	65.2	60	75
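
The two score forms in Tables 2 to 5 appear to be linked through the index weights: dividing a weighted score by the full weighted mark of its index (weight × 100) gives the percentage score, and the weighted scores add up to the total score. The sketch below checks this reading against the category d row of Table 5. The function name `to_percent` and the 100-point full mark are assumptions made for illustration; the paper does not state the conversion explicitly.

```python
import math

# Index weights as listed in Tables 2-5.
WEIGHTS = {
    "moral": 0.23,
    "professional": 0.28,
    "physical_mental": 0.14,
    "humanistic": 0.12,
    "practice_innovation": 0.23,
}

# Weighted (low, high) scores for category d, copied from Table 5.
CATEGORY_D = {
    "moral": (15, 18),
    "professional": (19, 21),
    "physical_mental": (8, 12),
    "humanistic": (7, 9),
    "practice_innovation": (11, 15),
}


def to_percent(weighted, weight, full_mark=100):
    """Express a weighted score as a percentage of the index's full mark.

    Assumption (not stated in the paper): the full mark of an index
    is weight * full_mark, e.g. 0.23 * 100 = 23 points.
    """
    return round(weighted / (weight * full_mark) * 100, 1)


# Percentage (low, high) per index, derived from the weighted scores.
percent_d = {
    name: tuple(to_percent(score, WEIGHTS[name]) for score in scores)
    for name, scores in CATEGORY_D.items()
}

# The total score range is the sum of the weighted lows and highs.
total_low = sum(low for low, _ in CATEGORY_D.values())
total_high = sum(high for _, high in CATEGORY_D.values())

for name, (low, high) in percent_d.items():
    print(f"{name}: {low} to {high}")
print("total:", total_low, "to", total_high)
```

Under these assumptions the derived percentages and totals reproduce the category d row, which suggests the same conversion holds for the other three tables.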

5.2

Evaluation of Social Sports Instruction and Management Talents by Principal Component Analysis

5.2.1

Principal Component Analysis (PCA) process

Building on the clustering results for social sports guidance and management talents, this section carries out a comprehensive evaluation of the standardized sample data using principal component analysis. First, the correlation matrix is examined to test whether the data are suitable for principal component analysis, as shown in Figure 4.

The correlation matrix shows that the 5 variables are correlated with one another, so the data are suitable for principal component extraction.
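
As a minimal sketch of how such a correlation matrix can be obtained, the snippet below computes pairwise Pearson coefficients for the five quality indexes using only the standard library. The six rows of scores are hypothetical values invented for illustration, not the paper's data, and the helper names `pearson` and `correlation_matrix` are likewise assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict holding the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical weighted scores for six students (illustrative only).
scores = {
    "moral": [14, 16, 18, 15, 17, 16],
    "professional": [24, 26, 28, 25, 27, 25],
    "physical_mental": [8, 10, 13, 9, 12, 11],
    "humanistic": [4, 6, 7, 5, 6, 5],
    "practice_innovation": [10, 14, 18, 12, 16, 13],
}

corr = correlation_matrix(scores)
for name, row in corr.items():
    print(name, [round(value, 2) for value in row.values()])
```

Off-diagonal coefficients that are clearly non-zero are what makes principal component extraction worthwhile; if the indexes were uncorrelated, each would already be its own component.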

The feasibility of principal component analysis was further checked with the KMO and Bartlett's tests, as shown in Table 6:

Table 6.

KMO and Bartlett test results

KMO measure of sampling adequacy		0.861
Bartlett's test of sphericity	Approximate chi-square	11416.909
	Degrees of freedom	58
	Significance	<0.05

For the KMO test, a value above 0.8 indicates that the data are very suitable for analysis; a value between 0.7 and 0.8 indicates that they are fairly suitable; a value between 0.6 and 0.7 indicates that analysis is possible; and a value below 0.6 indicates that the data are not suitable for principal component analysis. For Bartlett's test of sphericity, a p-value below 0.05 means the test is passed, which likewise indicates that the data are suitable for principal component analysis.

In this case, the KMO value of 0.861 and the Bartlett's test p-value below 0.05 both indicate that the data are suitable for principal component analysis.
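
The decision rule described above can be sketched as a small lookup. The function names, the wording of the labels, and the handling of values that fall exactly on a band edge are assumptions made for illustration; the 0.05 significance level is the one used in the text.

```python
def kmo_suitability(kmo):
    """Map a KMO value to the suitability bands described in the text.

    Assumption: a value exactly on a band edge (0.6, 0.7, 0.8) is
    assigned to the band that starts or ends at that edge as coded here.
    """
    if kmo > 0.8:
        return "very suitable"
    if kmo >= 0.7:
        return "fairly suitable"
    if kmo >= 0.6:
        return "acceptable"
    return "not suitable"


def passes_bartlett(p_value, alpha=0.05):
    """Bartlett's test of sphericity is passed when p is below alpha."""
    return p_value < alpha


# Values reported in Table 6: KMO = 0.861, significance < 0.05.
# 0.049 is a stand-in for "some p-value below 0.05", not a reported figure.
print(kmo_suitability(0.861), passes_bartlett(0.049))
```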

Principal component analysis is computed from the correlation coefficient matrix or the covariance matrix, whose eigenvalues (also called eigenroots) are central to the method. From the eigenvalues, the variance contribution rate of each principal component (also called the variance explanation rate, the same below) can be calculated. The variance contribution rate is the proportion of the total variance that a principal component explains; the larger the value, the better the component summarizes the information in the original variables. The total variance explained for the social sports instruction and management talent data is shown in Table 7.

Table 7.

Total variance explained

Component	Initial eigenvalues			Extraction sums of squared loadings
	Total	Percentage of variance	Cumulative percentage	Total	Percentage of variance	Cumulative percentage
1	4.305	47.156	47.156	4.305	47.156	47.156
2	2.976	15.641	62.797	2.976	15.641	62.797
3	2.238	14.584	77.381	2.238	14.584	77.381
4	0.903	12.273	89.654
5	0.656	10.346	100.000

This paper retains the principal components whose eigenvalues are greater than 1. As Table 7 shows, 3 principal components are extracted, with eigenvalues of 4.305, 2.976 and 2.238 in order. Their variance explanation rates are 47.156%, 15.641% and 14.584%, respectively, giving a cumulative variance explanation rate of 77.381%. Because this is close to 80%, little of the information in the original indexes is lost, and the principal component analysis is satisfactory for the purposes of this study.
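
The selection rule can be sketched as follows, taking the eigenvalues and variance percentages directly from Table 7 rather than recomputing them. The function names and the `threshold` parameter are illustrative assumptions.

```python
# (eigenvalue, percentage of variance) per component, copied from Table 7.
TABLE_7 = [
    (4.305, 47.156),
    (2.976, 15.641),
    (2.238, 14.584),
    (0.903, 12.273),
    (0.656, 10.346),
]


def select_components(rows, threshold=1.0):
    """Return the 1-based indexes of components whose eigenvalue exceeds the threshold."""
    return [i for i, (eigenvalue, _) in enumerate(rows, start=1) if eigenvalue > threshold]


def cumulative_percent(rows, selected):
    """Sum the variance percentages of the selected components."""
    return round(sum(rows[i - 1][1] for i in selected), 3)


selected = select_components(TABLE_7)
cumulative = cumulative_percent(TABLE_7, selected)
print(selected, cumulative)  # -> [1, 2, 3] 77.381
```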

Based on the component score coefficient matrix, the information composition of the three extracted principal components can be analyzed, as shown in Table 8.

Table 8.

Component score coefficient matrix

Variable name	1	2	3
Moral quality	0.087	0.054	0.016
Professional knowledge	0.012	0.109	0.121
Physical and mental quality	0.025	0.033	0.065
Humanistic quality	0.166	0.137	0.092
Practice innovation quality	0.073	0.041	0.178
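
The paper does not spell out here how the composite score is formed from Table 8. A common approach, assumed in the sketch below, is to compute each component score as the coefficient-weighted sum of the standardized indexes and then combine the three component scores using their variance explanation rates from Table 7 as weights. The input vector `example_z` is hypothetical, and any rescaling of the result to the 0 to 1 range is left out.

```python
# Component score coefficients from Table 8 (columns: components 1, 2, 3).
COEFFICIENTS = [
    (0.087, 0.054, 0.016),  # moral quality
    (0.012, 0.109, 0.121),  # professional knowledge
    (0.025, 0.033, 0.065),  # physical and mental quality
    (0.166, 0.137, 0.092),  # humanistic quality
    (0.073, 0.041, 0.178),  # practice innovation quality
]

# Variance explanation rates (%) of the three components, from Table 7.
VARIANCE_RATES = (47.156, 15.641, 14.584)


def component_scores(z):
    """Score on each component: coefficient-weighted sum of standardized indexes."""
    return [
        sum(row[k] * value for row, value in zip(COEFFICIENTS, z))
        for k in range(len(VARIANCE_RATES))
    ]


def composite_score(z):
    """Assumed composite: component scores weighted by their share of explained variance."""
    total = sum(VARIANCE_RATES)
    return sum(
        rate / total * score
        for rate, score in zip(VARIANCE_RATES, component_scores(z))
    )


# Hypothetical standardized scores for one student (illustrative only).
example_z = [1.0, 0.5, 0.0, -0.5, 1.0]
print([round(s, 4) for s in component_scores(example_z)])
print(round(composite_score(example_z), 4))
```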

5.2.2

Classification evaluation results based on Principal Component Analysis (PCA)

Based on the composite scores, the frequency distribution of scores for social sports instruction and management talents was obtained, as shown in Table 9. The numbers of talents with comprehensive ratings of one to five stars are 11, 68, 121, 74 and 26, respectively.

Table 9.

Principal component composite score frequency distribution

Interval	Number	Grade
0~0.2	11	One star
0.2~0.4	68	Two star
0.4~0.6	121	Three star
0.6~0.8	74	Four star
0.8~1.0	26	Five star
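
The grading in Table 9 amounts to binning a composite score that lies between 0 and 1. A minimal sketch follows; it assumes that a score exactly on an interval edge belongs to the higher grade, which the table does not specify, and the function name `star_grade` is illustrative.

```python
from bisect import bisect_right

# Interior interval edges and counts from Table 9.
EDGES = [0.2, 0.4, 0.6, 0.8]
GRADES = ["One star", "Two star", "Three star", "Four star", "Five star"]
TABLE_9_COUNTS = [11, 68, 121, 74, 26]


def star_grade(score):
    """Return the star grade for a composite score in [0, 1].

    Assumption: intervals are closed on the left, so 0.2 counts as
    two stars; the top interval also includes 1.0.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("composite score must lie in [0, 1]")
    return GRADES[bisect_right(EDGES, score)]


print(star_grade(0.45))
print(sum(TABLE_9_COUNTS))
```

The counts in Table 9 add up to 300, which matches the combined population of the four clusters in Tables 2 to 5 (96 + 79 + 67 + 58).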

6

Conclusion

Using talent portrait technology, this paper applies an optimized clustering algorithm to help government and enterprise bodies analyze social sports guidance and management talents more efficiently. Graduates of the School of Social Sports of a university were selected for analysis. The total comprehensive-quality scores of all the social sports guidance and management talents in the school ranged from 63 to 85, and relatively few scored 80 or above, so the overall quality level of these talents is in the middle range. Category a contains the largest number of social sports instruction and management talents, with scores ranging from 60 to 84. The final numbers of talents in the college with comprehensive ratings of one to five stars were 11, 68, 121, 74 and 26, respectively.

Language: English

Publication frequency: once a year
Journal subjects: Biological sciences, Biological sciences (other), Mathematics, Applied mathematics, General mathematics, Physics, Physics (other)

Junling Liu

Publication date: 26 Sep 2025

Received: 10 Jan 2025

Accepted: 09 May 2025

DOI: https://doi.org/10.2478/amns-2025-1037

Keywords: Talent portrait, K-mean clustering, Density parameter, Principal component analysis

© 2025 Junling Liu, published by Sciendo.

This work is licensed under the Creative Commons Attribution 4.0 International License.
