Research on big data visualization in the talent cultivation guarantee system of social sports instruction and management oriented to the concept of OBE
Published online: 26 Sept. 2025
Submitted: 10 Jan. 2025
Accepted: 09 May 2025
DOI: https://doi.org/10.2478/amns-2025-1037
© 2025 Junling Liu, published by Sciendo.
This work is licensed under the Creative Commons Attribution 4.0 International License.
The outcome-based education (OBE) concept is a student-centered educational philosophy that focuses on developing students' learning outcomes and abilities. In the talent cultivation of social sports guidance and management majors, applying the OBE concept and giving priority to students' learning outcomes and ability development is of great significance for improving the quality and market competitiveness of professional talent [1-4]. With the rapid development of social sports, the demand for social sports guidance and management professionals keeps increasing [5-6]. To cultivate highly qualified professionals who meet the needs of society, introducing the OBE concept has become an effective path for educational reform [7-9].
The social sports instruction and management major is an important specialty for cultivating social sports workers and management talent. With the rapid development of China's sports industry, the demand for social sports guidance and management professionals is increasing [10-13]. However, talent cultivation in this specialty faces several problems, such as a lack of diversified teaching modes, insufficient cultivation of practical ability, and a disconnect from social demand. It is therefore of great significance to visualize the big data in the OBE-oriented talent cultivation guarantee system of social sports guidance and management [14-17].
Cultivating social sports guidance and management professionals under the OBE concept helps produce the high-quality professionals that social sports need [18-19]. By clarifying learning outcomes, designing adaptive courses, and strengthening comprehensive evaluation, programs can cultivate practical ability, improve independent learning ability, and enhance overall quality. However, implementing the OBE concept also faces challenges and requires joint efforts from schools and teachers to improve training programs and teaching methods so that they meet social needs and the requirements of professional talent training [20-23].
Based on talent portrait technology, this paper uses an improved clustering algorithm to visualize and analyze data, collected since graduation, on various indicators for five samples of social sports instruction and management talents from a college. First, the Scrapy framework is used for data acquisition, and the HanLP perceptron segmentation tool is used for text segmentation. Next, a density parameter is introduced to judge whether each sample point in the dataset is suitable as an initial clustering center. The overall average distance of the sample dataset is calculated, and the neighborhood used for the dataset is constructed by selecting parameters appropriate to the characteristics of the sample data. Finally, principal component analysis over multiple dimensions is used to present the portrait of social sports instruction and management talents, and the final comprehensive evaluation score is calculated.
Talent profiling can be seen as a branch of user profiling that targets highly educated and technically skilled talent. Its core idea is to build a comprehensive and accurate model of each person from that individual's real data.
Combined with current data analysis and modern information technologies, talent profiling not only supports in-depth mining and rational allocation of talent but also improves talents' self-knowledge. By establishing a personalized talent management system and recommendation mechanism, enterprises can match talents to suitable positions or project teams more accurately, which saves labor costs, optimizes talent allocation, and improves work efficiency and overall competitiveness.
According to the system requirements, talent portrait construction in this paper is divided into four stages: data collection, text analysis, talent portrait construction, and talent relevance analysis, as shown in Figure 1 [24].

Figure 1: Four stages of talent portrait construction
Data acquisition. The talent portrait system is built on the analysis of talent data, so obtaining talent information accurately and comprehensively is crucial. According to the needs of model construction, the data sources in this paper fall into four categories: platform basic data, data filled in by the talents themselves, web-crawled data, and job data.
Text analysis. The main goal of this stage is to preprocess multi-source heterogeneous Chinese text data so that it meets the system requirements and can be processed by the algorithms.
Portrait construction. The labeled data obtained in the text analysis stage are matched, and a personal portrait of each talent is then built according to the specific needs of the system.
Relevance analysis. After the first three steps, a complete, three-dimensional portrait of each talent in the professional field is obtained.
Scrapy is a powerful open-source web crawling framework written in Python, designed to help users crawl data from web pages efficiently. It provides a suite of tools that make it easy to build, manage, and extend crawlers, and its flexibility and customizability have made it a common choice for data acquisition projects. Its crawling functions and rich extension interfaces meet this paper's need to acquire information such as the talents' papers and patents when building the talent profiling system.
The Scrapy framework consists of the following modules:
Engine: handles the data flow of the whole system and triggers events; it is the core of the framework.
Item: defines the data structure of the crawl results; crawled data are assigned to Item objects.
Scheduler: accepts requests from the engine, adds them to a queue, and returns them when the engine asks for the next request. It can be thought of as a priority queue of URLs that decides which URL to crawl next while removing duplicate URLs.
Downloader: downloads web page content and returns it to the engine; it is built on Twisted, an efficient asynchronous networking framework.
Spiders: developer-defined classes containing the crawling logic and page-parsing rules; they parse responses and produce extracted items and new requests.
Item Pipeline: processes items after they are extracted; its main tasks are cleaning, validating, and storing the data.
Downloader Middlewares: a hook framework between the engine and the downloader that processes the requests and responses passing between them.
Spider Middlewares: a hook framework between the engine and the spiders that processes the responses fed to the spiders and the items and new requests they output.
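To make the Scheduler's role concrete, here is a minimal, dependency-free Python sketch of a URL priority queue with duplicate removal. It is a toy model under our own assumptions: the class `UrlScheduler`, its methods, and the choice to treat URLs that differ only in their `#fragment` as duplicates are hypothetical and are not Scrapy's API. In Scrapy itself, scheduling and duplicate filtering are handled by separate, configurable components.

```python
import heapq
from urllib.parse import urldefrag


class UrlScheduler:
    """Toy scheduler: a URL priority queue that drops duplicate requests.

    Hypothetical illustration only; not Scrapy's scheduler implementation.
    """

    def __init__(self):
        self._heap = []      # entries: (-priority, insertion order, url)
        self._seen = set()   # canonical URLs already enqueued
        self._counter = 0    # tie-breaker: FIFO order among equal priorities

    def enqueue(self, url, priority=0):
        """Add a URL unless an equivalent one was seen before; return True if added."""
        canonical, _fragment = urldefrag(url)  # '#section' does not change the page
        if canonical in self._seen:
            return False
        self._seen.add(canonical)
        heapq.heappush(self._heap, (-priority, self._counter, canonical))
        self._counter += 1
        return True

    def next_url(self):
        """Pop the highest-priority URL, or return None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

For example, enqueuing `https://example.org/a`, then `https://example.org/b` with priority 5, then `https://example.org/a#top` returns `b` first and `a` second, and the fragment-only variant is rejected as a duplicate.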
Most of the data needed for the talent portrait system constructed in this paper are text crawled from online platforms. These texts cannot be used directly as samples; text analysis is needed to turn them into a numeric dataset that the clustering algorithm can use. The general process is to segment long texts into short Chinese words (tags), convert the meaningful tags into numeric form through feature engineering, and finally analyze the data with matching and clustering algorithms to build a model of each talent.
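A minimal sketch of the last step of this pipeline, turning each person's tags into a numeric vector, is shown below. It assumes a simple multi-hot (bag-of-tags) encoding; the function names and the example tags are hypothetical, and the paper's actual feature engineering may differ.

```python
def build_vocabulary(tag_lists):
    """Assign each distinct tag a column index, in order of first appearance."""
    vocab = {}
    for tags in tag_lists:
        for tag in tags:
            if tag not in vocab:
                vocab[tag] = len(vocab)
    return vocab


def to_multi_hot(tags, vocab):
    """Encode one person's tags as a 0/1 vector over the vocabulary.

    Tags missing from the vocabulary are ignored.
    """
    vector = [0] * len(vocab)
    for tag in tags:
        index = vocab.get(tag)
        if index is not None:
            vector[index] = 1
    return vector
```

With the hypothetical tag lists `[["fitness coaching", "event management"], ["event management", "community sports"], ["fitness coaching"]]`, the vocabulary has three columns and the people are encoded as `[1, 1, 0]`, `[0, 1, 1]`, and `[1, 0, 0]`.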
Word segmentation is an important technique in natural language processing that splits a continuous text sequence into semantically meaningful words or phrases. For Chinese, with its large vocabulary, rich word senses, and no spaces or other explicit separators between words, the accuracy and efficiency of word segmentation are crucial to the quality of subsequent text processing tasks.
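Statistical segmenters such as perceptron and CRF models commonly treat segmentation as character tagging, labeling each character B (begin), M (middle), E (end), or S (single-character word). The sketch below shows only the decoding step that turns such tags back into words. The tags are written by hand here, whereas in practice a trained model would predict them, and `decode_bmes` is a hypothetical helper, not part of HanLP.

```python
def decode_bmes(chars, tags):
    """Join characters into words according to their B/M/E/S tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":            # single-character word
            if current:           # flush an unfinished word (tolerates ill-formed tags)
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":          # start a new word
            if current:
                words.append(current)
            current = ch
        elif tag == "M":          # continue the current word
            current += ch
        elif tag == "E":          # finish the current word
            words.append(current + ch)
            current = ""
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    if current:                   # sequence ended without an E tag
        words.append(current)
    return words
```

For example, `decode_bmes("社会体育指导与管理", "BEBEBESBE")` returns `["社会", "体育", "指导", "与", "管理"]`.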
This paper adopts the HanLP word segmentation tool, using its model based on the perceptron, a statistical learning method. HanLP is an open-source Chinese natural language processing toolkit known for its high performance and accuracy, and it provides a comprehensive solution for Chinese text processing. It supports word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and other functions that cover the core tasks of Chinese natural language processing. Its simple API lets users integrate HanLP into their projects and get started quickly without tedious configuration. HanLP uses machine learning algorithms such as the perceptron to process Chinese text effectively, with good accuracy and performance. It also supports multiple models, so users can choose the one that suits their needs. Because of its features and performance, HanLP is widely used in academic research and industrial applications and has become one of the important tools for Chinese text processing.
Besides the perceptron, HanLP also provides a segmentation model based on the conditional random field (CRF), a model of the conditional probability distribution of an output sequence given an input sequence. Its basic structure is shown in Figure 2.

Figure 2: CRF model
The specific formula is as follows:
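In the standard linear-chain form commonly used for sequence labeling (assumed here; the symbols are illustrative and may differ from the original notation), the CRF defines the probability of a tag sequence $y$ given an input sequence $x$ of length $n$ as

$$
P(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k \, t_k(y_{i-1}, y_i, x, i) + \sum_{i=1}^{n} \sum_{l} \mu_l \, s_l(y_i, x, i) \right),
$$

$$
Z(x) = \sum_{y'} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k \, t_k(y'_{i-1}, y'_i, x, i) + \sum_{i=1}^{n} \sum_{l} \mu_l \, s_l(y'_i, x, i) \right),
$$

where $t_k$ are transition features over adjacent tags, $s_l$ are state features of a single position, $\lambda_k$ and $\mu_l$ are their learned weights, and $Z(x)$ normalizes over all possible tag sequences $y'$. For word segmentation, $x$ is the character sequence and $y$ the sequence of B/M/E/S tags.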
The K-means algorithm is the most commonly used algorithm in cluster analysis. It uses the distance between samples as the measure of their similarity and is a typical unsupervised clustering method, so no manual expert labeling is needed, which greatly reduces the cost of clustering. For these reasons, K-means is more widely applicable than many other clustering methods, and it leaves considerable room for improvement in many respects. Various improved
Step 1: We select
Step 2: For each sample in the dataset, compute its distance to each clustering center and, following the nearest-distance criterion, assign it to the class of the closest clustering center;
Step 3: Update each clustering center to the mean of all objects in its class and compute the value of the objective function; then recompute the distances of the samples to the new centers, reassign them by the nearest-distance criterion, and recompute the clustering centers;
Step 4: Repeat from Step 2, checking after each iteration whether the clustering centers and the value of the objective function have changed; once they no longer change, output the results.
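Steps 1 to 4 can be sketched in Python with NumPy as follows. This is the standard (Lloyd's) K-means with random initial centers, not the paper's improved variant; the function names, the `seed` parameter, and the iteration cap are our own assumptions.

```python
import numpy as np


def assign_to_nearest(X, centers):
    """Index of the nearest center (Euclidean distance) for every sample."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, max_iter=100, seed=0):
    """Standard K-means (Lloyd's algorithm) with randomly chosen initial centers."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct samples at random as the initial clustering centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each sample to the class of its closest center.
        labels = assign_to_nearest(X, centers)
        # Step 3: move each center to the mean of its class (an empty class keeps its center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop once the centers no longer change.
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    labels = assign_to_nearest(X, centers)
    # Objective function: within-cluster sum of squared distances.
    sse = float(((X - centers[labels]) ** 2).sum())
    return labels, centers, sse
```

Because Step 1 is random, different seeds can give different partitions on data that is not well separated, which is the weakness of the original method.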
The K-means algorithm is widely used because it is simple to operate, easier to understand than other clustering methods, and fast. Alongside these advantages, however, it has shortcomings. In its original form, the initial centers are chosen largely at random, and the results depend heavily on that choice. If the initial clustering centers are chosen poorly, the results can be unstable and possibly incorrect, and the algorithm is likely to converge to a local optimum rather than the global optimum. Similarly, the choice of
The K-means algorithm with improved initial clustering centers refines the choice of initial centers to reduce the uncertainty caused by selecting them at random. Below, we first introduce the parameters used to calculate the initial clustering centers in the improved K-means method. For dataset
where
where
The number of sample points contained within each sample neighborhood is defined as follows: in a
where the sgn function is:
The result of
From the above calculation, we obtain a density parameter for each sample, which is used to improve the K-means method: the initial clustering centers are selected according to these density parameters. Here we sort the
Based on the above description, the improved K-means method selects the initial clustering centers as follows:
Step 1: Calculate the distance
Step 2: Calculate the
Step 3: Select the largest
Step 4: Keep repeating step (3) until all elements in
Step 5: The selected initial clustering center is noted as
Finally, update the clustering centers
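The selection procedure can be sketched in Python under one common formulation of density-based initialization, which is an assumption since the exact definitions may differ: the neighborhood radius is `alpha` times the mean pairwise distance, a sample's density is the number of other samples within that radius, and after each highest-density sample is chosen, its neighborhood is excluded from further selection. The function `density_initial_centers` and the parameter `alpha` are hypothetical.

```python
import numpy as np


def density_initial_centers(X, k, alpha=1.0):
    """Pick up to k initial K-means centers from high-density samples.

    Assumed formulation: radius R = alpha * mean pairwise distance;
    density of x_i = number of other samples within distance R of x_i.
    Fewer than k centers are returned if the candidates run out.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    mean_dist = dist[np.triu_indices(n, k=1)].mean()  # average over pairs i < j
    radius = alpha * mean_dist
    # Count neighbors within the radius, excluding the sample itself.
    density = (dist <= radius).sum(axis=1) - 1
    candidates = np.ones(n, dtype=bool)
    chosen = []
    while len(chosen) < k and candidates.any():
        # Highest-density remaining candidate (ties go to the lowest index).
        idx = int(np.argmax(np.where(candidates, density, -1)))
        chosen.append(idx)
        # Drop the chosen sample and its whole neighborhood from the candidates.
        candidates &= dist[idx] > radius
    return X[chosen], density
```

The returned centers would replace the random choice in Step 1 of standard K-means; the remaining iterations are unchanged.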
Assuming that the input data is a matrix of
Set the data matrix:
where
Min-Max normalization of the
where
Correlation coefficient matrix:
The eigenvalues of matrix
Cumulative contribution of principal components:
Generally, the eigenvalues whose cumulative contribution rate reaches 85% or more are retained.
Principal component loading:
where
Score of each principal component:
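The normalization, correlation matrix, eigen-decomposition, cumulative contribution, loading, and score steps can be sketched in Python with NumPy as follows. The function `pca_evaluate` is hypothetical and follows textbook definitions (loading = square root of the eigenvalue times the eigenvector; scores computed from standardized data), which may differ in detail from the paper's formulas. It supports both retention rules used in this paper: the 85% cumulative-contribution rule and the eigenvalue-greater-than-1 rule.

```python
import numpy as np


def pca_evaluate(X, threshold=0.85, kaiser=False):
    """PCA on the correlation matrix of min-max normalized data.

    Keeps components until the cumulative contribution reaches `threshold`,
    or, if kaiser=True, every component with eigenvalue > 1 (at least one).
    """
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    Z = (X - X.min(axis=0)) / np.where(span == 0, 1, span)  # min-max normalization
    R = np.corrcoef(Z, rowvar=False)                          # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)                      # ascending for symmetric R
    order = np.argsort(eigvals)[::-1]                         # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()                    # variance contribution rate
    cumulative = np.cumsum(contribution)                      # cumulative contribution
    if kaiser:
        m = int((eigvals > 1).sum())
    else:
        m = int(np.searchsorted(cumulative, threshold)) + 1
    m = max(1, min(m, len(eigvals)))
    loadings = eigvecs[:, :m] * np.sqrt(eigvals[:m])          # principal component loadings
    standardized = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    scores = standardized @ eigvecs[:, :m]                    # score of each component
    return eigvals, contribution, cumulative, loadings, scores
```

Correlations are unchanged by per-column min-max scaling, so the normalization does not alter R; it is kept only to mirror the steps described above.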
The principal component analysis was performed on the feature matrix and the final matrix was obtained as 2347 × 3.
The data for this experiment come from the School of Social Sports of a university and contain the values of the various indicators for five samples of social sports guidance and management talents from graduation to date, as shown in Table 1. Cluster analysis was carried out with the parallelized K-Means algorithm according to the index weight table determined in Chapter II.
Table 1: Experimental data
| N | ID | Moral quality | Professional knowledge | Physical and mental quality | Humanistic quality | Practice innovation quality |
|---|---|---|---|---|---|---|
| 1 | 130105 | 73.87 | 75.43 | 74.58 | 74.67 | 82.79 |
| 2 | 130926 | 73.56 | 71.58 | 76.82 | 86.39 | 80.72 |
| 3 | 130477 | 74.32 | 74.02 | 70.77 | 84.68 | 85.84 |
| … | … | … | … | … | … | … |
| 294 | 131060 | 73.38 | 85.19 | 75.7 | 83.59 | 82.6 |
| 295 | 131246 | 74.43 | 87.3 | 70.53 | 81.2 | 75.56 |
| 296 | 131432 | 77.1 | 83.38 | 72.96 | 77.34 | 77.3 |
| … | … | … | … | … | … | … |
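Tables 2 to 5 list the index weights (0.23, 0.28, 0.14, 0.12, 0.23) and give each weighted sub-score as weight × percentage score; for example, 14 / 0.23 ≈ 60.9. Assuming the total score is the sum of these weighted sub-scores, it can be computed as in this Python sketch; the dictionary keys and function names are hypothetical.

```python
# Index weights as listed in Tables 2 to 5 (they sum to 1).
WEIGHTS = {
    "moral": 0.23,
    "professional_knowledge": 0.28,
    "physical_mental": 0.14,
    "humanistic": 0.12,
    "practice_innovation": 0.23,
}


def weighted_total(scores, weights=WEIGHTS):
    """Total on a 100-point scale: sum of weight * percentage score per quality."""
    if abs(sum(weights.values()) - 1) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


def percent_from_weighted(weighted_score, weight):
    """Convert a weighted sub-score back to the percentage system."""
    return weighted_score / weight
```

For talent 1 in Table 1 (ID 130105), this gives 0.23 × 73.87 + 0.28 × 75.43 + 0.14 × 74.58 + 0.12 × 74.67 + 0.23 × 82.79 ≈ 76.55.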
The first step of the K-Means algorithm is to choose the number of clusters k, which strongly influences the clustering process and the final results; whether k is chosen appropriately directly determines whether the clustering results have practical significance. In this experiment, the choice of k determines whether the results have real value for developing social sports guidance and management talents, optimizing education and teaching, and improving the quality of talent training.
In current domestic and international research on K-Means clustering, no widely recognized, authoritative strategy for selecting k has been proposed. This experiment uses the following criterion: when the assumed number of clusters is equal to or greater than the true number, the metric rises only slowly as k decreases, but once k falls below the true number, the metric rises sharply. The k at this critical point is taken as the optimal number of clusters. The chosen metric is the sum of the distances between samples and their cluster centroids over the k clusters.
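The comparison over k = 1 to 10 can be reproduced in outline with the Python sketch below. It assumes that the metric is the sum of Euclidean distances from each sample to its assigned centroid, and it keeps the best of several random K-means runs for each k; the function names and the `n_init` parameter are hypothetical. The elbow is then read off the returned values, as in Fig. 3.

```python
import numpy as np


def centroid_distance_sum(X, k, n_init=10, max_iter=100, seed=0):
    """Smallest sum of sample-to-centroid distances over several K-means runs."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        # Metric: plain (unsquared) distances of samples to their centroids.
        total = np.linalg.norm(X - centers[labels], axis=1).sum()
        best = min(best, float(total))
    return best


def elbow_curve(X, k_max=10):
    """Metric value for k = 1..k_max, to be inspected for the elbow."""
    return {k: centroid_distance_sum(X, k) for k in range(1, k_max + 1)}
```

Lloyd's iterations minimize squared distances, so the reported metric is computed after convergence rather than being the quantity optimized; for an elbow plot this difference is usually minor.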
In this paper, each value of k from 1 to 10 is compared, and the results are shown in Fig. 3.

Figure 3: Comparison of k values
When k is taken as 10, 9, 8, 7, and 6 in turn, the bar graph increases only slowly and the slope of the curve changes little, so the k value truly suited to the experimental data should be less than 6. When k is 5 and 4, the increase and the slope in the two graphs change more noticeably. When k is 3, 2, and 1, the bar graph rises sharply and the slope of the curve changes significantly, so k should be greater than 3. It follows that choosing k as 4 or 5 is more appropriate for this experiment.
First, in terms of the overall clustering results, the comprehensive quality scores of all social sports guidance and management talents at the college range from 63 to 85 points, and relatively few talents score above 80 points. The overall quality level is therefore in the middle range, and under the new index system there are currently no talents with especially outstanding comprehensive ability. This is also reasonable given the actual situation: most of the school's professional courses are concentrated in the second semester of the sophomore year and the first semester of the junior year, so the professional strengths of typical social sports guidance and management talents only gradually become apparent after the three samples.
Across the four clusters a, b, c, and d, practice innovation quality shows the largest fluctuation among the five first-level quality indicators, and humanistic quality the smallest. This indicates that talents differ considerably in practice and innovation ability (social practice ability, scientific and technological innovation ability, academic research ability, and community work ability): some talents' weighted scores in this area nearly reach the maximum, while others fall below the passing mark, so the gap is large. This suggests that the school pays considerable attention to the practice and innovation quality of its social sports guidance and management talents, has carried out corresponding promotion work, and has achieved some positive results; alternatively, most of these talents have recognized the importance of this quality, and some have strengthened it through practice, science and technology competitions, and similar activities, with good results. As the effects of this quality are gradually showing, the cultivation program for practice innovation quality should be maintained and improved.
The results for each category are described below, showing the approximate ranges of each quality score and of the total score for each type of talent, followed by a specific analysis.
Category a: Table 2 shows the clustering results of category a. Category a contains the largest number of social sports guidance and management talents, more than 90 people, or about one third of the total. Its total scores range from 60 to 84 points, the widest range of any category, which shows that the scores basically meet the standard; the scores mostly fluctuate between 60 and 100 points without any particularly outstanding performance, and a very small number of students fall below the passing score. This group can basically serve as a summary of the quality of the social sports guidance and management talents in the sample, reflecting the overall educational effect and characteristics of the sample.
Category b: Table 3 shows the clustering results of category b. Category b contains nearly 80 talents, with total scores between 70 and 80 points. All of their indicators are above the passing score, and their physical and mental quality is the highest in the sample, all above 78 points. Moreover, except for professional knowledge quality, the highest score on every indicator appears in this group.
Category c: Table 4 shows the clustering results of category c.
The quality scores of the category c talents are concentrated between 75 and 88 points, and the overall level is “good”. Apart from ideological and moral quality, on which a very small number of these talents score slightly below the passing mark, all scores exceed the passing mark, and professional knowledge quality and practice innovation quality are relatively outstanding. These talents are relatively weak in ideological performance, collective awareness, and labor and hygiene habits. Many of them are relatively individualistic, setting their own will against team spirit and collective interests and planning their development around their own ideas, but they are also aware of and consciously improve their learning, practical, physical and mental, and humanistic qualities.
Category d: Table 5 shows the clustering results of category d. Category d has the lowest scores on all indicators. Many talents in this category fail to pass on practice innovation quality, with low scores, and many also fall below the passing score on ideological and moral quality and on physical and mental quality. Overall, the social sports instruction and management talents in this category perform poorly on all qualities.
Table 2: Clustering results for category a (cluster size: 96)
| Quality | Weight | Weighted score (low) | Weighted score (high) | Percentage score (low) | Percentage score (high) |
|---|---|---|---|---|---|
| Moral quality | 0.23 | 14 | 18 | 60.9 | 78.3 |
| Professional knowledge | 0.28 | 24 | 28 | 85.7 | 100 |
| Physical and mental quality | 0.14 | 8 | 13 | 57.1 | 92.9 |
| Humanistic quality | 0.12 | 4 | 7 | 33.3 | 58.3 |
| Practice innovation quality | 0.23 | 10 | 18 | 43.5 | 78.3 |
| Total score | 1 | 60 | 84 | | |
Table 3: Clustering results for category b (cluster size: 79)
| Quality | Weight | Weighted score (low) | Weighted score (high) | Percentage score (low) | Percentage score (high) |
|---|---|---|---|---|---|
| Moral quality | 0.23 | 17 | 19 | 73.9 | 82.6 |
| Professional knowledge | 0.28 | 19 | 21 | 67.9 | 75 |
| Physical and mental quality | 0.14 | 11 | 13 | 78.6 | 92.9 |
| Humanistic quality | 0.12 | 7 | 9 | 58.3 | 75 |
| Practice innovation quality | 0.23 | 16 | 18 | 69.6 | 78.3 |
| Total score | 1 | 70 | 80 | | |
Table 4: Clustering results for category c (cluster size: 67)
| Quality | Weight | Weighted score (low) | Weighted score (high) | Percentage score (low) | Percentage score (high) |
|---|---|---|---|---|---|
| Moral quality | 0.23 | 16 | 19 | 69.6 | 82.6 |
| Professional knowledge | 0.28 | 22 | 25 | 78.6 | 89.3 |
| Physical and mental quality | 0.14 | 11 | 13 | 78.6 | 92.9 |
| Humanistic quality | 0.12 | 9 | 11 | 75 | 91.7 |
| Practice innovation quality | 0.23 | 17 | 20 | 73.9 | 87 |
| Total score | 1 | 75 | 88 | | |
Table 5: Clustering results for category d (cluster size: 58)
| Quality | Weight | Weighted score (low) | Weighted score (high) | Percentage score (low) | Percentage score (high) |
|---|---|---|---|---|---|
| Moral quality | 0.23 | 15 | 18 | 65.2 | 78.3 |
| Professional knowledge | 0.28 | 19 | 21 | 67.9 | 75 |
| Physical and mental quality | 0.14 | 8 | 12 | 57.1 | 85.7 |
| Humanistic quality | 0.12 | 7 | 9 | 58.3 | 75 |
| Practice innovation quality | 0.23 | 11 | 15 | 47.8 | 65.2 |
| Total score | 1 | 60 | 75 | | |
Building on the clustering results, this section carries out a comprehensive evaluation of the standardized sample data of social sports guidance and management talents using principal component analysis. First, the correlation matrix is used to test whether the data are suitable for principal component analysis, as shown in Figure 4.

Figure 4: Variable correlation analysis
The correlation matrix shows that the five variables are correlated with one another, so the data are suitable for principal component extraction.
The feasibility of principal component analysis was further examined with the KMO and Bartlett's tests, as shown in Table 6:
Table 6: KMO and Bartlett's test results
| Test | Statistic | Value |
|---|---|---|
| Kaiser-Meyer-Olkin measure of sampling adequacy | KMO | 0.861 |
| Bartlett's test of sphericity | Approximate chi-square | 11416.909 |
| | Degrees of freedom | 58 |
| | Significance | <0.05 |
For the KMO test, a value above 0.8 indicates that the data are very suitable for analysis; a value between 0.7 and 0.8, suitable; between 0.6 and 0.7, acceptable; and below 0.6, unsuitable for principal component analysis. Bartlett's test of sphericity is passed when its p-value is below 0.05, which likewise indicates that the data are suitable for principal component analysis.
Here, the KMO value of 0.861 and the Bartlett's test p-value below 0.05 both indicate that the data are suitable for principal component analysis.
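Both statistics can be computed from the correlation matrix R of the p variables and the sample size n, as in the NumPy sketch below. It uses the standard definitions: KMO = sum(r_ij^2) / (sum(r_ij^2) + sum(a_ij^2)) over i != j, where a_ij are partial correlations obtained from the inverse of R, and Bartlett's chi-square = -[(n - 1) - (2p + 5)/6] * ln|R| with p(p - 1)/2 degrees of freedom. The function names are hypothetical; if SciPy is available, a p-value can be obtained with `scipy.stats.chi2.sf(chi2, df)`.

```python
import math

import numpy as np


def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy from a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)           # partial correlations a_ij
    off = ~np.eye(len(R), dtype=bool)       # mask selecting i != j
    r2 = (R[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    return r2 / (r2 + a2)


def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and degrees of freedom (p variables, n samples)."""
    R = np.asarray(R, dtype=float)
    p = len(R)
    _sign, logdet = np.linalg.slogdet(R)    # ln|R|, numerically stable
    chi2 = -((n - 1) - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi2, df


def kmo_verdict(value):
    """Apply the KMO thresholds quoted in the text."""
    if value > 0.8:
        return "very suitable"
    if value >= 0.7:
        return "suitable"
    if value >= 0.6:
        return "can be analyzed"
    return "not suitable"
```

For three variables with all pairwise correlations equal to 0.5, `kmo` returns 9/13 ≈ 0.692, and with n = 100 the Bartlett statistic is (99 - 11/6) ln 2 ≈ 67.35 with 3 degrees of freedom.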
Principal component analysis is computed from the correlation coefficient matrix or the covariance matrix, whose eigenvalues (eigenroots) are key quantities. From the eigenvalues, the variance contribution rate of each principal component (also called the variance explained rate) can be calculated: the proportion of the total variance explained by that component. The larger this value, the more of the original variables' information the component captures. The total variance explained for the social sports instruction and management talent data is shown in Table 7.
Table 7: Total variance explained
| Component | Initial eigenvalues: total | Initial eigenvalues: % of variance | Initial eigenvalues: cumulative % | Extraction sums of squared loadings: total | Extraction sums of squared loadings: % of variance | Extraction sums of squared loadings: cumulative % |
|---|---|---|---|---|---|---|
| 1 | 4.305 | 47.156 | 47.156 | 4.305 | 47.156 | 47.156 |
| 2 | 2.976 | 15.641 | 62.797 | 2.976 | 15.641 | 62.797 |
| 3 | 2.238 | 14.584 | 77.381 | 2.238 | 14.584 | 77.381 |
| 4 | 0.903 | 12.273 | 89.654 | | | |
| 5 | 0.656 | 10.346 | 100.000 | | | |
Principal components with eigenvalues greater than 1 are retained. As the table shows, three principal components are constructed, with eigenvalues of 4.305, 2.976, and 2.238, respectively. Their variance explained rates are 47.156%, 15.641%, and 14.584%, and the cumulative variance explained rate is 77.381%, close to 80%. Little information from the original indicators is lost, so the principal component analysis performs well and is meaningful for this study.
Based on the component score coefficient matrix, the information composition of the three retained principal components can be analyzed, as shown in Table 8.
Table 8: Component score coefficient matrix
| Variable name | 1 | 2 | 3 |
|---|---|---|---|
| Moral quality | 0.087 | 0.054 | 0.016 |
| Professional knowledge | 0.012 | 0.109 | 0.121 |
| Physical and mental quality | 0.025 | 0.033 | 0.065 |
| Humanistic quality | 0.166 | 0.137 | 0.092 |
| Practice innovation quality | 0.073 | 0.041 | 0.178 |
From the composite scores, the frequency distribution of the social sports instruction and management talents' scores was derived, as shown in Table 9. The numbers of talents with comprehensive ratings of one to five stars are 11, 68, 121, 74, and 26, respectively.
Table 9: Frequency distribution of principal component composite scores
| Interval | Number | Grade |
|---|---|---|
| 0~0.2 | 11 | One star |
| 0.2~0.4 | 68 | Two stars |
| 0.4~0.6 | 121 | Three stars |
| 0.6~0.8 | 74 | Four stars |
| 0.8~1.0 | 26 | Five stars |
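One common way to form such a composite score, assumed here rather than taken from the paper, is to multiply standardized quality scores by the component score coefficients of Table 8 to obtain component scores F1 to F3, weight them by each component's share of the retained variance in Table 7 (47.156, 15.641, and 14.584), rescale the result to [0, 1], and bin it into the intervals of Table 9. The Python functions below are hypothetical sketches of that approach.

```python
from bisect import bisect_right

import numpy as np

# Component score coefficients from Table 8 (rows: five qualities; columns: components 1-3).
COEF = np.array([
    [0.087, 0.054, 0.016],   # moral quality
    [0.012, 0.109, 0.121],   # professional knowledge
    [0.025, 0.033, 0.065],   # physical and mental quality
    [0.166, 0.137, 0.092],   # humanistic quality
    [0.073, 0.041, 0.178],   # practice innovation quality
])
# Variance explained (%) by components 1-3, from Table 7.
VAR_PCT = np.array([47.156, 15.641, 14.584])


def composite_scores(Z, coef=COEF, var_pct=VAR_PCT):
    """Composite score per person from standardized scores Z, rescaled to [0, 1]."""
    F = np.asarray(Z, dtype=float) @ coef   # component scores F1..F3
    weights = var_pct / var_pct.sum()       # share of the retained variance
    composite = F @ weights
    lo, hi = composite.min(), composite.max()
    if hi == lo:
        return np.zeros_like(composite)
    return (composite - lo) / (hi - lo)


def star_grade(score, bounds=(0.2, 0.4, 0.6, 0.8)):
    """Map a [0, 1] score to 1-5 stars; a boundary value goes to the higher interval."""
    return bisect_right(bounds, score) + 1
```

For example, `star_grade(0.5)` returns 3 and `star_grade(0.6)` returns 4.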
Based on talent portrait technology, this paper uses an optimized clustering algorithm to help government agencies and enterprises analyze social sports guidance and management talents more efficiently. Graduates of the School of Social Sports of a university were analyzed. The total comprehensive quality scores of all social sports guidance and management talents at the school range from 63 to 85, and relatively few score 80 or more, so their overall quality level is in the middle range. Category a contains the most social sports instruction and management talents, with scores ranging from 60 to 84. The numbers of talents at the college with comprehensive ratings of one to five stars are 11, 68, 121, 74, and 26, respectively.
