Research on the optimization system of athlete selection and training effect based on big data
Published online: 21 Mar 2025
Received: 11 Oct 2024
Accepted: 05 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0563
© 2025 Yongkang Guan, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
In competitive sports, athlete selection and training effects determine the level of athletic teams. The rapid development of big data technology provides new methods and ideas for optimizing both. Big data relies on the rapid classification and recording capabilities of computers for data collection. With continuing advances in search engine technology and with the help of cloud computing, data of all kinds on Internet platforms are connected, exchanged, and aggregated into huge, all-embracing databases [1–3].
Athlete selection is the foundation of competitive sports, and long-term training is the main means of improving athletes' quality and skills [4–5]. With the aid of big data, selection and training methods become evidence-based, and therefore more professional and reliable. For example, the U.S. professional basketball league has a powerful data statistics and mining system that quantifies game data in great detail, covering more than 90 technical indicators such as points scored, points conceded, and shooting percentage [6–7]. There are also player contribution statistics: a formula combines a player's positive indicators (total points, points by position, free-throw points and other scoring, assists, steals, and so on) and subtracts negative ones such as fouls and turnovers to produce a series of evaluation indicators [8–10]. Building on such evaluation indexes, athletes' basic information, physical indexes, in-competition performance, event results, and training-specific data are collected from professional medical records, training records kept at training bases, and records of the competition process. Big data analysis is then used to assess these indexes quantitatively and to predict athletes' potential athletic ability and room for development [11–13]. In team events, the tacit understanding and fit between players also affect results. More advanced data mining techniques can uncover correlations and regularities between athletes and their influence on the team's winning rate, providing a more scientific basis for selection [14–15].
Based on athletes' individual indicators and characteristics, combined with the results of big data analysis, personalized training plans can be developed that strengthen weak links while sustaining the development of the athlete's strong events, thus improving the training effect [16–18]. Big data therefore has important application value for optimizing athlete selection and training effect.
In this paper, an index system for evaluating athlete selection is constructed. The indexes are pre-processed and analyzed for significance and correlation, and the homogeneity among test items is verified as a basis for arranging training content. The median is introduced into the calculation of the silhouette coefficient, and an improved K-means clustering algorithm based on this coefficient is proposed. After the number of clusters is determined, the improved algorithm is used to cluster and analyze the athletes' scores on the selection indexes. Subsequently, athlete and sport models are constructed, and collaborative filtering and content-based recommendation algorithms are combined to recommend suitable training programs and to develop training intensity plans matched to each athlete's tolerance. Finally, an athlete selection and training optimization system is designed to optimize selection and training arrangements.
The athlete selection indexes were constructed through expert consultation, a review of the research literature on orienteering athlete selection, and the statistical results of a preliminary questionnaire survey, from which the questionnaire of this paper was compiled. The index system covers six aspects as comprehensively as possible: body morphology, physical function, physical fitness, sport-specific quality, mental quality, and coaches' evaluation. The final selection indexes are shown in Table 1.
Athlete selection indexes
Primary index | Secondary index |
---|---|
Physical form | Weight, height, sitting height, leg length, finger distance, Kole index, chest circumference, shoulder width, arm length, body fat, upper and lower limb length, hip width, waistline, arch |
Body function | Lung capacity, maximum oxygen uptake, hemoglobin, blood pressure, cardio index, blood testosterone level |
Physical quality | Pull-ups, 50m run, 1000(800)m run, standing long jump, seated forward bend (sit-ups) |
Special quality | Map reading, compass use, route selection, sense of direction, distance judgment, checkpoint capture, repositioning |
Mental quality | Cognitive psychology, emotional state, will quality, personality characteristics, temperament type, general intelligence, motion intelligence |
Coach evaluation | Sports ability, sports sense, acceptance ability, competition psychological quality, will quality |
According to the six aspects established in Table 1, including physical form, physical function, physical quality, special sports qualities, mental qualities, and coach evaluation, a cumulative total of 44 secondary athlete selection indicators exists.
Data source and pre-processing
In the previous section, a comprehensive qualitative exploration of athlete selection indicators was conducted to lay down the qualitative support for the subsequent research. In this section, the data from the athlete selection test will be analyzed to provide data support for the subsequent study.
Athlete fitness test data from the 2023-2024 academic year at the University of H was utilized for two quantitative analyses. The total amount of raw data was 7449 entries, of which the total amount of data for female athletes was 3435 entries and the total amount of data for male athletes was 4014 entries. Python's data analysis package Pandas was utilized to preprocess the raw data.
Significance analysis between some indicators
“Physical fitness” was selected as the representative aspect for correlation analysis, and the significance of the difference between each pair of indicators was tested before their correlation was examined. The test was a t-test carried out with the SciPy scientific computing package for Python. For convenience of presentation, the Chinese indicator names were replaced with English abbreviations: BMI (body mass index), CV (lung capacity), ST (seated forward bend), 50m (50-meter run), SLJ (standing long jump), SU (sit-ups), PU (pull-ups), and 1000m/800m (1000-meter run and 800-meter run).
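As a minimal sketch of the kind of pairwise significance test described above, the following pure-Python function computes a two-sample t statistic in Welch's form (an assumption; the variant used is not stated) together with a large-sample normal approximation of the two-sided p-value. The function name `welch_t` and the sample values are hypothetical. For an exact t-distribution p-value one would normally call SciPy's `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return (t statistic, approximate two-sided p-value) for two samples."""
    # Standard error of the difference in means, without assuming equal variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    # Assumption: samples are large enough to treat t as standard normal.
    # With small samples this understates the p-value.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical values for two indicators, for illustration only.
sprint_50m = [6.1, 6.3, 5.9, 6.4, 6.2, 6.0]
sit_reach = [15.2, 16.1, 14.8, 15.9, 15.5, 16.4]

t_stat, p_value = welch_t(sprint_50m, sit_reach)
print(f"t = {t_stat:.2f}, significant at 0.05: {p_value <= 0.05}")
```

With thousands of entries per indicator, as in the data set described here, the normal approximation and the t distribution give practically the same p-value.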
The significance heatmaps for the events of male and female athletes are shown in Figures 1 and 2, respectively. For both male and female athletes, the p-value between every pair of events is ≤ 0.05, which means that all events differ significantly from one another and supports the validity of the data.
Pearson correlation coefficient analysis between parts
In this paper, the Pearson correlation coefficient is used to analyze the correlation between indicators. Note that BMI is neither a "larger is better" nor a "smaller is better" indicator, so it needs to be considered separately. A Pearson correlation coefficient with absolute value greater than 0.2 is taken to indicate correlation, and one below 0.2 is regarded as essentially uncorrelated. The results of the Pearson correlation analysis of the physical fitness test data of male and female athletes at the University of H are shown in Fig. 3 and Fig. 4, respectively.
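The correlation rule above can be illustrated with a short sketch. The helper names `pearson` and `is_correlated` and the sample values are hypothetical; only the 0.2 threshold comes from the text, applied here to the absolute value of the coefficient.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def is_correlated(r, threshold=0.2):
    """Apply the 0.2 rule to the absolute value of the coefficient."""
    return abs(r) > threshold


# Hypothetical results: 50m time in seconds, standing long jump in cm.
time_50m = [5.9, 6.0, 6.2, 6.3, 6.5]
jump_cm = [245, 240, 236, 230, 228]

r = pearson(time_50m, jump_cm)
print(f"r = {r:.2f}, correlated: {is_correlated(r)}")
```

A negative coefficient is expected for this pair because a better sprint result is a smaller time while a better jump is a larger distance.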
The relationship between the 50m run and the standing long jump is consistent for male and female athletes: both show a negative correlation. Because a better 50m result is a shorter time, this means that the better the 50m result, the better the standing long jump result tends to be. For the other pairs that show weak positive or weak negative correlation, the two events are likewise connected to some degree. Speed is mainly determined by muscle strength, muscle speed, and neuromuscular sensitivity. The standing long jump mainly reflects the explosive power of the lower limbs and the muscle power of the upper limbs, and explosive power is itself a combined expression of muscle speed, muscle strength, and neuromuscular sensitivity. The 50m run and the standing long jump therefore share a certain homogeneity in their physical requirements. Such homogeneity between events is one of the bases for arranging exercise content and improving performance.
The similarities found among the weakly positively and weakly negatively correlated items also support, to a certain extent, the credibility of the source data. Apart from these, most items are uncorrelated. For example, the seated forward bend, which reflects flexibility, shows almost no correlation with any other item and is therefore independent of them. No pair of items reaches even a moderate level of correlation, which reflects the soundness of the athlete selection index system constructed in this paper: the indexes are largely non-redundant and together cover physical form, physical function, and physical quality.

Significance analysis between indexes (male)

Significance analysis between indexes (female)

Pearson correlation coefficient analysis between indexes (male)

Pearson correlation coefficient analysis between indexes (female)
The K-means clustering algorithm is an iterative relocation algorithm that alternates between two steps. The first step is assignment: the distance between each sample point and each cluster center is calculated, and each sample point is assigned to the closest cluster. The second step is relocation: the center of each cluster is recalculated as the mean of its members. These two steps are repeated until the cluster centers no longer change, so the clustering result is obtained through multiple iterations. The basic idea is: first, randomly select $k$ sample points as the initial cluster centers, then repeat the two steps above until convergence.
Denote the average sample distance of dataset
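The sum of squared errors for a data set partitioned into clusters $C_1, \dots, C_k$ with centers $c_1, \dots, c_k$ is:

$$SSE = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - c_j \rVert^2$$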
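The Euclidean distance is named after the ancient Greek mathematician Euclid. Also known as the Euclidean metric, it is a common distance measure that gives the true distance between two points in $n$-dimensional space.
The Euclidean distance in 2D and 3D space is the straight-line distance between two points. For points $(x_1, y_1)$ and $(x_2, y_2)$, the 2D Euclidean distance formula is:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

For points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$, the 3D Euclidean distance formula is:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$

Generalizing to $n$-dimensional space, for points $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$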
Euclidean distance is a measure of the distance between two sample points. The closer two sample points are, the more similar they are; conversely, the farther apart they are, the less similar they are [20].
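The basic K-means procedure with Euclidean distance described above can be sketched as follows. This is the standard (unimproved) algorithm with random initial centers, not the improved variant proposed in this paper. The function names, the seed, and the sample points are hypothetical.

```python
import random
from math import dist  # Euclidean distance between two points (Python 3.8+)


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means: returns (labels, centers) for a list of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial cluster centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable, so centers are stable too
            break
        labels = new_labels
        # Relocation step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


def sse(points, labels, centers):
    """Sum of squared Euclidean distances from each point to its cluster center."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical two-dimensional scores forming two well separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centers = kmeans(points, k=2)
print(labels, round(sse(points, labels, centers), 3))
```

Because the initial centers are random, different seeds can number the clusters differently or, on less cleanly separated data, settle in different local optima. That sensitivity is what methods for optimizing the initial centers aim to reduce.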
In this section, a K-means clustering algorithm based on optimized initial cluster centers and the silhouette coefficient is designed. For a better description of the algorithm, it is assumed that the dataset to be clustered is
The inter-cluster similarity is defined as:
Where,
In order to better show the characteristics of the sample points, this section introduces the median into the calculation of the silhouette coefficient. Assuming that sample point
Assuming that sample point
Assuming that sample point
The median-based average silhouette coefficient combines intra-cluster and inter-cluster distances to evaluate the overall quality of a clustering, and takes a value between -1 and 1. A value close to 1 means that the intra-cluster distance of the samples is much smaller than the minimum inter-cluster distance, which indicates that the clustering result for the data set is close to optimal.
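A median-based profile (silhouette) coefficient of this kind might be computed as in the sketch below. The exact definition used here is an assumption: for each sample, the intra-cluster distance is taken as the median distance to the other members of its own cluster, the inter-cluster distance as the smallest median distance to the members of any other cluster, and the overall score as the mean of the per-sample coefficients. The function name and data are hypothetical.

```python
from math import dist
from statistics import mean, median


def median_silhouette(points, labels):
    """Mean of per-sample silhouette values built from median distances (assumed form)."""
    n = len(points)
    cluster_ids = set(labels)
    scores = []
    for i in range(n):
        # Distances to the other members of the sample's own cluster.
        own = [dist(points[i], points[j]) for j in range(n)
               if j != i and labels[j] == labels[i]]
        # Median distance to each of the other clusters.
        others = [
            median(dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            for c in cluster_ids if c != labels[i]
        ]
        if not own or not others:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a, b = median(own), min(others)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


# Hypothetical points with one sensible and one poor labelling.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
good = median_silhouette(points, [0, 0, 0, 1, 1, 1])
bad = median_silhouette(points, [0, 1, 0, 1, 0, 1])
print(f"good labelling: {good:.2f}, poor labelling: {bad:.2f}")
```

Under this reading, the number of clusters can be chosen by clustering for several candidate values of k and keeping the one with the highest score.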
Figure 5 shows the fitting effect for different numbers of clusters for male athletes' selection test scores for the school sports teams in the 2023-24 academic year, and Figure 6 shows the same for female athletes' test scores. The curves show how the fit changes with k for male and female athletes respectively. As the number of clusters increases, the within-group sum of squares decreases and the between-group sum of squares increases. Based on the figures and the data background, the athletes can be clustered into 5 classes: beyond that point the rise of the curve slows and the fitted curve stabilizes, indicating that clustering into 5 classes fits the data well.

Number of clusters and fitting effect (male)

Number of clusters and fitting effect (female)
The five categories of male athletes and their physical qualities are summarized in Table 2. The first category had the largest number of male athletes (401). This category had moderate results in total score, body mass index, lung capacity, standing long jump, seated forward bend, and endurance run, poorer results in sprinting, and good results in pull-ups.
Clustering results for male athletes
Cluster | Total score | BMI | CV (mL) | 50m (s) | SLJ (cm) | ST (cm) | 1000m (s) | PU (reps) | Count |
---|---|---|---|---|---|---|---|---|---|
1 | 77.03 | 22.02 | 3524.06 | 6.28 | 236.62 | 15.52 | 240.14 | 8.19 | 401 |
2 | 67.31 | 23.53 | 3061.73 | 5.93 | 231.14 | 14.05 | 245.08 | 8.01 | 114 |
3 | 73.63 | 20.32 | 3226.11 | 5.92 | 234.86 | 15.14 | 242.22 | 8.57 | 295 |
4 | 77.95 | 20.82 | 3983.44 | 6.23 | 238.03 | 16.10 | 249.25 | 7.95 | 320 |
5 | 79.38 | 24.69 | 4598.46 | 5.90 | 239.67 | 17.05 | 253.55 | 6.90 | 135 |
The second category of male athletes had the lowest total score and poor results in lung capacity, body mass index, standing long jump, and seated forward bend, but performed better in the sprint and endurance running events. This category had the fewest male athletes, only 114.
The third category of male athletes had the lowest body mass index, good performance in sprints, and the best performance in pull-ups. There were 295 male athletes in this category.
The fourth category of male athletes had better performance in standing long jump, seated forward bend, and lung capacity, but poor sprint and endurance performance. There were 320 male athletes in this category.
The fifth category of male athletes had the highest total score and the highest values for standing long jump, seated forward bend, lung capacity, and body mass index, but the lowest scores in pull-ups and endurance running. Although it has the highest total score, this category is not well-rounded and lacks endurance and upper body strength.
Table 3 shows the five categories of female athletes and their physical qualities. The first category had the highest total score (78.61) and contained 91 athletes. It had the best results in standing long jump, seated forward bend, endurance run, and one-minute sit-ups, but the worst results in the sprint test. The best sprint results were obtained by categories 3 and 5. Category 3 had the lowest total score and was the worst in BMI, lung capacity, standing long jump, seated forward bend, endurance run, and one-minute sit-ups, yet it had the best sprint results; it was also the smallest category, with only 65 athletes. Category 5 excelled in sprinting and endurance running but was average in lung capacity, seated forward bend, standing long jump, and one-minute sit-ups; there were 222 athletes in this category. Category 2 performed at an average level in every event, with balanced scores and no outstanding or overly poor results. Category 4 had a higher total score, with good and balanced results across all test events.
Clustering results for female athletes
Cluster | Total score | BMI | CV (mL) | 50m (s) | SLJ (cm) | ST (cm) | 800m (s) | SU (reps) | Count |
---|---|---|---|---|---|---|---|---|---|
1 | 78.61 | 22.83 | 3993.82 | 7.46 | 167.79 | 21.41 | 231.64 | 33.47 | 91 |
2 | 74.48 | 21.24 | 3035.79 | 7.26 | 163.68 | 20.02 | 235.85 | 31.72 | 285 |
3 | 67.00 | 23.59 | 2037.57 | 7.1 | 161.40 | 19.26 | 238.08 | 30.30 | 65 |
4 | 76.67 | 21.99 | 3455.24 | 7.21 | 165.65 | 20.56 | 234.49 | 32.42 | 237 |
5 | 72.69 | 20.78 | 2613.45 | 7.18 | 162.41 | 19.34 | 232.45 | 31.21 | 222 |
Overall, the clustering of female athletes is shaped by a more complex set of variables. The main distinguishing variable is the sprint score, which behaves quite differently from the endurance running score. Female athletes with higher overall scores had poorer sprint results, while female athletes with lower overall scores had better sprint results.
In this paper, from the perspectives of both athletes and sports, the collaborative filtering (CF) algorithm is combined with the content-based (CB) recommendation algorithm, building on an athlete model and a sports model, to recommend training programs that match each athlete's characteristics. The principle of collaborative filtering is to predict what the current athlete may like based on the past behaviors or opinions of the registered athlete population. From the perspective of the athlete group, the athlete-based collaborative filtering algorithm (UB-CF) is used to build a model of the athletes. From the perspective of the sports, the CB recommendation algorithm is used to build a model of the recommended objects based on their characteristics. Finally, the UB-CF and CB algorithms are combined into a personalized sports recommendation algorithm that gives each athlete personalized recommendations.
The quality of the athlete model directly affects the recommendation performance of the whole system. The athlete model is built first from the basic information that the athlete fills in when registering in the system, and it is then updated to reflect short-term and long-term changes in the athlete's interests while using the system, so that recommendations remain accurate.
The athlete model consists of a model of the athlete's basic profile and a model of the athlete's rating matrix for the programs. The basic profile model includes the athlete's basic attribute information and interests. Keywords are extracted for each athlete, each keyword being one item of basic information such as age, gender, BMI, or sports category of interest. The athlete model is represented as shown in equation (11):
Where,
The athlete-to-program rating matrix model represents how much the athlete likes each sport, i.e., the rating level. Rating levels are derived from four kinds of behavior: access, like, share, and favorite. Ratings are therefore obtained implicitly, expressing the athlete's preference for a recommended sport through his or her actions in the interface.
Because recommender systems are applied in different domains, there is no unified modeling standard that fits every application, and the way items are modeled has a great impact on the recommender system. In this paper, sports are modeled from two angles. According to the part of the body a sport acts on, it is classified into upper limbs, trunk, and lower limbs. According to the physical quality it develops, it is classified into five major sports qualities: strength, speed, flexibility, endurance, and agility. If a sport exercises both the upper and lower limbs, those two keywords are set to 1 and the trunk keyword is set to 0. If a sport develops strength, speed, and flexibility, those three weights are set to 1 and the weights for endurance and agility are set to 0. The resulting sports model is shown in equation (12):
Where
The UB-CF algorithm scores how much athletes like items based on their historical behavior, and then calculates the similarity between athletes from their attitudes and preferences toward the same items.
In this paper, the Pearson correlation coefficient is used to calculate the correlation between two athletes. The result lies in [-1, +1]: -1 means that the two athletes' preferences are inversely related, +1 means that they are positively related, and 0 means that there is no correlation. The formula is as follows:
By using the above formula, the set of similar athletes or the nearest neighbor set of athletes of the target athlete to be recommended can be calculated, and by denoting the set by
Where,
Finally, based on the predicted ratings of the athlete for the items, the Top-N programs are selected for the target athlete from the list sorted in descending order.
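The UB-CF steps above (Pearson similarity between athletes, rating prediction from the nearest neighbors, Top-N selection) can be sketched as follows. The prediction rule used here is the common mean-centered weighted average, which is an assumption and may differ from the formula in this paper. All names and ratings are hypothetical.

```python
from math import sqrt


def pearson_sim(ra, rb):
    """Pearson correlation of two athletes over the sports both have rated."""
    common = [s for s in ra if s in rb]
    if len(common) < 2:
        return 0.0
    ma = sum(ra[s] for s in common) / len(common)
    mb = sum(rb[s] for s in common) / len(common)
    num = sum((ra[s] - ma) * (rb[s] - mb) for s in common)
    den = sqrt(sum((ra[s] - ma) ** 2 for s in common)) * \
        sqrt(sum((rb[s] - mb) ** 2 for s in common))
    return num / den if den else 0.0


def predict(ratings, target, sport, k=2):
    """Predict the target's rating of a sport from the k most similar raters."""
    mean_t = sum(ratings[target].values()) / len(ratings[target])
    neighbours = sorted(
        ((pearson_sim(ratings[target], r), u) for u, r in ratings.items()
         if u != target and sport in r),
        reverse=True)[:k]
    # Each neighbor contributes its deviation from its own mean rating.
    num = sum(s * (ratings[u][sport] - sum(ratings[u].values()) / len(ratings[u]))
              for s, u in neighbours)
    den = sum(abs(s) for s, _ in neighbours)
    return mean_t + num / den if den else mean_t


def top_n(ratings, target, sports, n=2):
    """Unrated sports for the target, sorted by predicted rating, best first."""
    unseen = [s for s in sports if s not in ratings[target]]
    return sorted(unseen, key=lambda s: predict(ratings, target, s), reverse=True)[:n]


# Hypothetical implicit ratings on a 1-5 scale.
ratings = {
    "A": {"badminton": 5, "swimming": 4, "football": 1},
    "B": {"badminton": 5, "swimming": 5, "football": 1, "basketball": 4, "tennis": 2},
    "C": {"badminton": 1, "swimming": 2, "football": 5, "basketball": 1, "tennis": 5},
}
sports = ["badminton", "swimming", "football", "basketball", "tennis"]
print(top_n(ratings, "A", sports, n=1))
```

In this toy data, athlete A resembles B and is the opposite of C, so the sport B rates highly and C rates poorly is ranked first for A.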
The CB recommendation algorithm relies on one key ingredient, namely labels: each sport is decomposed into a series of features sufficient to characterize it, and the relationship between a sport and a user is derived from the user's behavior in the system (viewing, sharing, rating, and favoriting) [21].
The cosine similarity formula for CB recommendation algorithm is as follows:
Where
Finally, the sports are sorted by similarity in descending order and the Top-N are recommended to the user.
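The content-based step can be illustrated with a small sketch in which each sport is a binary vector over the three body regions and five sports qualities described earlier, and sports are ranked by cosine similarity to an athlete's preference vector. The feature order, the encodings of the individual sports, and the athlete profile are all hypothetical.

```python
from math import sqrt

# Assumed feature order: three body regions, then five sports qualities.
FEATURES = ["upper_limbs", "trunk", "lower_limbs",
            "strength", "speed", "flexibility", "endurance", "agility"]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical encodings: 1 if the sport involves the feature, otherwise 0.
sports = {
    "badminton":     [1, 0, 1, 0, 1, 0, 0, 1],
    "swimming":      [1, 1, 1, 0, 0, 0, 1, 0],
    "weightlifting": [1, 1, 1, 1, 0, 0, 0, 0],
}

# Hypothetical athlete profile inferred from past behavior in the system.
athlete = [1, 0, 1, 0, 1, 0, 1, 1]

ranked = sorted(sports, key=lambda s: cosine(athlete, sports[s]), reverse=True)
print(ranked[:2])  # Top-2 sports for this athlete
```

Because the vectors are binary, the similarity depends only on how many features the athlete and the sport share relative to how many each has.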
Training intensity is one of the main factors determining training tolerance; it usually refers to the degree of exertion and strain placed on the body during training. Training intensity is usually measured by physiological indicators such as enzyme shedding, heart rate, and maximum oxygen consumption. It has been found that training intensity affects both the fuel the body uses and the adaptations the body makes. Reasonable training intensity not only improves cardiac and respiratory capacity, but also enhances muscle strength, bone density, and the body's responsiveness to the outside world, and reduces feelings of depression, thus improving physical fitness.
The exercise heart rate equation is as follows:
Where THR stands for Exercise Heart Rate,
The resting heart rate is obtained by measuring the pulse for one minute in the morning, while awake and before getting out of bed; alternatively, it can be measured on three mornings and the three measurements averaged. Training intensity for tolerance is now commonly categorized into three levels (minimum, optimal, and maximum intensity), which are calculated as in the formula below:
Eq.
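A heart-rate calculation of this kind is commonly done with the heart rate reserve (Karvonen) formula, THR = (HRmax − HRrest) × intensity + HRrest. The sketch below assumes that form, estimates HRmax as 220 minus age, and uses illustrative intensity fractions for the three levels; none of these specifics are confirmed by the text above, and all names and values are hypothetical.

```python
from statistics import mean


def resting_heart_rate(morning_readings):
    """Average of one-minute pulse counts taken on several mornings."""
    return mean(morning_readings)


def target_heart_rate(hr_max, hr_rest, intensity):
    """Heart rate reserve form: (HRmax - HRrest) * intensity + HRrest."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be a fraction between 0 and 1")
    return (hr_max - hr_rest) * intensity + hr_rest


age = 20
hr_max = 220 - age  # assumed age-based estimate of maximum heart rate
hr_rest = resting_heart_rate([58, 60, 62])  # three hypothetical morning readings

# Hypothetical intensity fractions for the three levels.
LEVELS = {"minimum": 0.50, "optimal": 0.70, "maximum": 0.85}

zones = {name: round(target_heart_rate(hr_max, hr_rest, frac))
         for name, frac in LEVELS.items()}
print(zones)
```

An athlete's training plan could then keep the exercise heart rate between the minimum and maximum values and aim for the optimal one.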
Eight preparatory athletes were selected as experimental subjects, and a questionnaire was designed to collect data on the effect of the training recommendations. The questionnaire used a Likert scale with options 1 to 6, where 6 represents the highest and 1 the lowest level of satisfaction with the training recommendation.
The athletes' data were automatically divided at random into a training set (70%) and a test set (30%). The training set was used to predict the rating matrix of the test set, and the root mean square error (RMSE) between the true and predicted values was calculated to evaluate prediction quality; the smaller the RMSE, the better the model. Because the training and test sets are divided randomly, the mean over 200 experiments is used as the prediction result, which is more convincing. Figure 7 displays the experimental results and the algorithm comparison.
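The evaluation protocol above (random 70/30 split, RMSE on the test set, mean over 200 runs) can be sketched as follows. The predictor here is a deliberately trivial placeholder (the training-set mean) standing in for any recommendation model. The function names, the seed, and the ratings are hypothetical.

```python
import random
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def split(data, train_frac=0.7, rng=random):
    """Shuffle a copy of the data and cut it into training and test parts."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def mean_rmse(ratings, runs=200, seed=0):
    """Average test RMSE of a placeholder predictor over repeated random splits."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        train, test = split(ratings, 0.7, rng)
        baseline = sum(train) / len(train)  # placeholder model: training mean
        total += rmse(test, [baseline] * len(test))
    return total / runs


# Hypothetical satisfaction ratings on the 1-6 scale.
ratings = [6, 5, 4, 6, 3, 5, 2, 4, 5, 6]
print(f"mean RMSE over 200 splits: {mean_rmse(ratings):.3f}")
```

Averaging over many splits reduces the dependence of the reported RMSE on any single random partition, which matters most when the data set is small.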

Comparison of algorithms
The personalized training recommendation algorithm proposed in this paper achieves the lowest RMSE among the four methods. Under the same similarity calculation model, the new recommendation method obtains lower RMSE values, which supports the rationality of the personalized training recommendation method proposed in this study.
Table 4 shows the true recommendation results of the test set, while Table 5 shows the recommendation results of the algorithm proposed in this paper. The actual top-2 recommended sports of the 5th athlete are badminton and swimming, while the algorithm recommended badminton and basketball. For all other athletes, the actual and predicted top-2 recommendations are consistent, which also demonstrates the effectiveness of the algorithm.
The true recommendation of the test set
Stud. | FB. | BB. | VB. | TT. | B. | T. | S. | Top2 recommendation |
---|---|---|---|---|---|---|---|---|
1 | 0.23 | 0.10 | 0.07 | 0.23 | 0.23 | 0.10 | 0.07 | Football/badminton/table tennis |
2 | 0.44 | 0.44 | 0.13 | 0.44 | 0.55 | 0.44 | 0.44 | Badminton, football/basketball/table tennis/tennis/swimming |
3 | 0.10 | 1.60 | 0.07 | 0.07 | 0.55 | 0.10 | 2.12 | Swimming, basketball |
4 | 0.18 | 0.81 | 0.10 | 0.26 | 1.91 | 0.13 | 1.60 | Badminton, swimming |
5 | 0.23 | 0.34 | 0.10 | 0.34 | 3.17 | 0.07 | 0.86 | Badminton, swimming |
6 | 0.18 | 1.60 | 0.18 | 0.18 | 0.18 | 0.18 | 1.91 | Swimming, basketball |
7 | 0.10 | 3.80 | 0.10 | 0.10 | 0.10 | 0.10 | 2.86 | Basketball, swimming |
8 | 0.23 | 0.10 | 0.07 | 0.23 | 0.23 | 0.10 | 0.07 | Football/badminton |
The recommendation results of the proposed algorithm
Stud. | FB. | BB. | VB. | TT. | B. | T. | S. | Top2 recommendation |
---|---|---|---|---|---|---|---|---|
1 | 0.50 | 0.27 | 0.09 | 0.20 | 0.64 | 0.09 | 0.19 | Badminton, football |
2 | 0.34 | 0.24 | 0.09 | 0.15 | 0.42 | 0.13 | 0.27 | Badminton, football |
3 | 0.37 | 0.84 | 0.10 | 0.26 | 0.34 | 0.10 | 1.64 | Swimming, basketball |
4 | 0.10 | 0.94 | 0.19 | 0.18 | 1.68 | 0.14 | 1.64 | Badminton, swimming |
5 | 0.19 | 1.20 | 0.13 | 0.19 | 2.22 | 0.13 | 0.61 | Badminton, basketball |
6 | 0.38 | 0.85 | 0.10 | 0.26 | 0.33 | 0.10 | 1.61 | Swimming, basketball |
7 | 0.35 | 1.27 | 0.09 | 0.20 | 0.20 | 0.09 | 0.94 | Basketball, swimming |
8 | 0.50 | 0.27 | 0.09 | 0.20 | 0.64 | 0.09 | 0.19 | Badminton, football |
It can be seen that the algorithm's top-2 sport recommendations for the 8 preparatory athletes are very close to the actual results: for 7 athletes the recommendations coincide with the actual situation, and for 1 athlete they differ by one sport.
The traditional athlete-based collaborative filtering recommendation algorithm only considers whether the athlete likes an item or not, which often does not match actual training scenarios. The experimental results show that the algorithm in this paper achieves a better recommendation effect.
The system is designed to meet the needs of selection at different levels and stages, simulating how the human brain selects the best among candidates. Its overall function is the comprehensive management of selection information, with athletes' test indexes, evaluation results, and analysis results saved to the libraries. On the one hand, the system supports dynamic evaluation of athletes. On the other hand, it supports quick decisions, both one-time decisions and preferential ranking of athletes across multi-year training stages. The system can also evaluate athletes at the same level and handle routine tasks such as printing reports.
The system structure designed in this paper is shown in Fig. 8. Users (coaches, researchers, managers, etc.) interact with the system through the multimedia human-computer intelligent interface. This is a human-computer interface based on multimedia technology, designed and realized with artificial intelligence methods, which provides a variety of functions such as multimedia information input and output, information storage and processing, and intelligent interaction.

Basic structure diagram of the system
The total system control is a software system based on the management system of each library. It carries out cooperative scheduling, mutual communication, overall control, resource sharing, and cooperative operations for each library.
Database management system (DBMS) is a software system for storing, querying, managing, and maintaining data. In line with the characteristics of athlete selection information, the system initially connects three databases and a data dictionary. The central database stores the selection test data and serves them to other parts of the system. The standard database stores the judging standards and test descriptions for selection, which guide user evaluation. The evaluation and analysis results database stores the results of the system's evaluation and prediction of athletes, for dynamic tracking and evaluation. The data dictionary includes the names of the databases and the help software for operation, management, and maintenance.
Model library management system (MBMS) is the software system that handles the storage, calling, management, assembly, and construction of models. The model library (MB) mainly stores the evaluation, prediction, and statistical analysis models and methods. It also includes mathematical algorithms, applications, system programs, and other tools used for modeling. Image library management system (IBMS) is a software system for the storage, calling, stitching, construction, and management of images. The image library stores technical images and statistical graphs used in athlete evaluation.
Knowledge base management system (KBMS) is a software system for storing, querying, managing, and maintaining knowledge. In addition to the system knowledge base, the system includes a neural network knowledge base. The system knowledge base stores experiences, facts, and reasoning rules used by coaches for evaluation and prediction. The neural network knowledge base mainly contains the self-learning model of the neural networks and rules used for network reasoning.
The sample library management system is a software system for storing, querying, managing, and modifying sample information. The sample library is used to store the samples for neural network training.
The main function of the inference machine is to utilize the knowledge in the system knowledge base, call various models in the model library for inference, or utilize the self-learning model of the neural network knowledge base for self-learning inference.
In this study, an improved K-means clustering algorithm was designed to analyze the athletes' selection test performance, and the athletes were clustered into five categories based on the fitting results. Among the male athletes, the category 2 group had the worst performance in the selection test with a total score of 67.31. While category 5 male athletes had the highest total score performance with a total score of 79.38, their physical fitness was not comprehensive enough. Among female athletes, category 1 group had the highest total score with a total of 78.61 points. The category 3 group had the worst performance with a total test score of 67. The personalized training algorithm designed in this paper achieved lower RMSE values than the other three compared methods. Among the eight experimental subjects selected, the recommendation results of this paper's algorithm for seven of the athletes matched the actual situation, proving the accuracy of the new method's recommendations. It shows that the work in this paper has practical results for both the analysis of selection data and training recommendation optimization of athletes, and the selection and training optimization system constructed in this paper has solid technical support.