An empirical study on the improvement of students’ physical fitness and health by college physical education programs based on the background of big data
Published: 17 Mar 2025
Received: 12 Oct 2024
Accepted: 07 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0343
© 2025 Qian Hou, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
True health is not merely the absence of disease but the maintenance of an optimal state of physical, mental, and social adaptation [1-2]. Under the strategy of education and health, health education has become an important part of realizing the “five educations”, and it closely resembles the physical education curriculum in its objectives and content. Integrating health education into the physical education classroom is conducive to students’ healthy growth, stimulates their interest in learning, and offers a new direction for the reform of physical education curricula in colleges and universities [3-5].
At present, physical health is an important criterion when employers select talent, and higher education institutions are a major source of applied, front-line talent for society. University physical education programs therefore need to consider further how to cultivate healthy living habits and raise the level of students’ physical fitness and health [6-7]. University physical education should gradually adapt to the requirements of the times, from educational concepts and ideas to teaching content, modes, means, and methods [8].
Moreover, as the future pillars of the country, college students and their health are a matter of deep concern to all sectors of society. When academic, family, emotional, and other pressures cannot be relieved in time, cases of students suffering from depression and autism continue to emerge [9-10]. At the same time, improved living standards have led students to neglect exercise, and obesity, myopia, hypoglycemia, and similar conditions have become common, hindering students’ healthy growth. Therefore, to reduce students’ risky behaviors and improve their physical and mental health, colleges and universities should commit to infusing health education into physical education teaching: not only teaching theoretical knowledge, but also combining theory with practice in the physical education curriculum, so as to ensure the unity of students’ awareness and behavior and promote the development of healthy habits [11-13].
Quality education emphasizes the all-round moral, intellectual, physical, and aesthetic development of students and holds that teachers should attend to students’ physical fitness while developing their intelligence, since a healthy physique is the prerequisite and foundation for everything else. Against this background, college physical education teachers must promptly adjust traditional ideas, orient curriculum reform toward enhancing young people’s physical health, and give full play to the nurturing value of physical education courses [14-16]. However, the current teaching of college physical education still falls far short of the requirements of quality education and plays only a limited role in the development of adolescents’ physical health, so research on reforming the physical education curriculum on the basis of adolescents’ physical health is very necessary [17-19].
The participation of students in sports has a positive significance for students’ physical health, but it is necessary to conduct relevant research on how sports affect students physically and mentally and what kind of impact they have. Literature [20] reveals that the construction of school sports facilities significantly affects students’ future participation in sports, based on a follow-up survey of students’ entry into the school learning process. Literature [21] combined health questionnaires, quality of life scales, and functional activity summary scales to reveal the associations between the health status of American adolescents during the cessation of physical activity in the context of the epidemic and factors such as gender, grade, and type of physical activity participation, which provide important references for health professionals in their health policy development. Literature [22] empirically examined the impact of college students’ sports participation and found that college students’ lifelong participation in sports effectively promoted their physical and mental health, and increased their self-esteem and well-being levels. Literature [23] investigated the demographic characteristics of sport participants online and combined independent t-tests, one-way ANOVA, and multifactorial ANOVA methods in a comparative study, noting that recreational sport participants engaged in many indoor sports for health prevention during the outbreak, and concluded that during the outbreak, joint efforts of individuals and organizations were needed to promote the maintenance of exercise habits and avoid social isolation. Literature [24] longitudinally assessed how physical activity affects bone health using a dual-energy x-ray absorptiometry (DXA) tool and found that physical activity promoted significant increases in leg BMD and BMC. 
Literature [25] systematically reviewed studies of high-intensity interval training (HIIT) to understand its effects on physical health, and the synthesized results indicate that HIIT has a positive effect on cardiorespiratory, blood glucose, and vascular function. Literature [26] searched journal papers on physical activity and the COVID-19 pandemic for an integrated analysis. Most of the studies explored the pandemic’s impact on physical activity levels, followed by its impact on mental health. The analysis noted that physical activity levels declined sharply during the pandemic, but that exercise was effective in mitigating mental health disorders in that context.
Physical education helps improve students’ physical fitness and motor skills, so optimizing physical education teaching is worthwhile; such optimization involves teaching content, teaching methods, and the introduction of intelligent information technology. Literature [27] ran a controlled experiment in physical education classes and found that after the Teams-Games-Tournament (TGT) strategy was applied to a twelve-week basketball class, students’ motivation to learn physical education increased significantly, but their athletic skills did not improve significantly. Literature [28] proposed a motion posture recognition model built around a quaternion algorithm to assist physical education teaching, and simulation experiments verified that the model helps teachers gauge how well students’ motor skills conform to the standard and effectively supports teaching. Literature [29] describes how the development of artificial intelligence and virtual reality technology promotes informatization and intelligence in education, and reviews the research literature on these technologies to deepen the understanding of their use in educational practice, especially in physical education. Literature [30] used a hierarchical evaluation strategy to identify the indicator elements of an evaluation system and assessed the role of multimedia teaching platforms in physical education; the evaluation showed that such platforms play an important role in promoting the effectiveness of physical education teaching in colleges and universities.
Literature [31] introduced a virtual reality-assisted convolutional neural network (VRA-CNN) in the Mobile Cloud Computing (MCC) e-learning platform, which strengthened the intensity of the sensor communication, and at the same time provided a real-time interactive environment for sports training for people with disabilities, and finally verified it using experimental methods, which pointed out that the introduction of VRA-CNN method effectively improved the athletes’ sports knowledge and sports confidence. Literature [32] used the Physical Activity Self-Efficacy Scale to explore the factors affecting college students’ sports participation, and suggested that university administrators need to work together with health professionals to promote college students’ sports habits and sustainable participation in lifelong sports.
This paper proposes a method for classifying college students’ physical fitness and health data based on unsupervised learning. The method takes as input college students’ test data for BMI, lung capacity, seated forward bend, standing long jump, 50-meter run, 1000-meter run (male) or 800-meter run (female), and pull-ups (male) or sit-ups (female). The K-medoids clustering algorithm groups students with similar physical fitness together, and the teacher then develops targeted, individualized teaching content for each group. An empirical study of the changes in students’ physical fitness and health was conducted through a pre-test and post-test comparison experiment.
Because the number of college students is large and the physical fitness test data show no obvious regularity, it is hard to find patterns in the data, and hard for a teacher to design a personalized exercise program for each student that improves physical fitness efficiently and scientifically. This paper uses the K-medoids clustering algorithm to identify students with similar physical fitness and to characterize each group. Based on these data, teachers can adjust both the overall teaching direction of the class and the personalized guidance given to individual students, consider in advance the teaching strategies and pathways suited to the students in the class, and accommodate the differences between students [33]. For example, badminton and volleyball are widely offered in colleges and universities and are popular among students: they do not demand high overall physical ability, are easy to start, and place no special restrictions on the venue, so they effectively mobilize students’ interest in learning and participation. In a badminton course, the teacher assigns tasks according to each student’s situation. Students with better physical fitness who have already mastered the swing, grip, serve, and other techniques can take on more high-exertion practice, such as running and jumping, blocking, and swinging. Students with average physical fitness, or who have not yet mastered badminton skills, can practice simple actions such as serving, turning the ball, and catching the ball. Such differentiated teaching can improve students’ physical fitness efficiently and scientifically.
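The task of cluster analysis is to divide a given data set into clusters such that the data objects within a cluster are as similar as possible and the data objects in different clusters are as dissimilar as possible. Euclidean distance can be used to measure the similarity between data objects. Suppose the data set $X=\{x_1,x_2,\dots,x_n\}$ contains $n$ samples, each described by $p$ attributes. The Euclidean distance between samples $x_i$ and $x_j$ is

$$d(x_i,x_j)=\sqrt{\sum_{w=1}^{p}\left(x_{iw}-x_{jw}\right)^{2}} \tag{1}$$

Where: $x_{iw}$ and $x_{jw}$ are the values of the $w$-th attribute of samples $x_i$ and $x_j$.

The clustering error sum of squares, standard deviation and standard deviation per sample formulas are defined below, respectively:

$$E=\sum_{j=1}^{k}\sum_{x\in C_j} d(x,o_j)^{2} \tag{2}$$

In the formula, $k$ is the number of clusters, $C_j$ is the $j$-th cluster, and $o_j$ is the center point of $C_j$.

The specific formula for the standard deviation of the data set is given below:

$$s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n} d(x_i,\bar{x})^{2}} \tag{3}$$

In the formula, $\bar{x}$ is the mean of all samples in $X$.

The formula for the standard deviation for each sample is as follows:

$$s_i=\sqrt{\frac{1}{n-1}\sum_{j=1}^{n} d(x_i,x_j)^{2}} \tag{4}$$

Where: $s_i$ is the standard deviation of sample $x_i$ with respect to all samples in $X$.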
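As an illustration of the four quantities named above (Euclidean distance, clustering error sum of squares, standard deviation of the data set, and standard deviation per sample), the following is a minimal standard-library sketch. The function names are hypothetical, and the `1/(n-1)` normalisation and the reading of the per-sample standard deviation as the root-mean-square distance from one sample to all others are assumptions rather than definitions confirmed by the text.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def clustering_sse(clusters, centers):
    """Clustering error sum of squares: squared distance of every sample
    to the center point of the cluster it belongs to."""
    return sum(euclidean(x, c) ** 2
               for members, c in zip(clusters, centers)
               for x in members)

def dataset_std(samples):
    """Spread of the whole data set around its mean vector
    (assumed 1/(n-1) normalisation)."""
    n = len(samples)
    mean = [sum(col) / n for col in zip(*samples)]
    return math.sqrt(sum(euclidean(x, mean) ** 2 for x in samples) / (n - 1))

def sample_std(samples, i):
    """Assumed per-sample spread: root-mean-square distance from
    sample i to all samples (its distance to itself is 0)."""
    n = len(samples)
    return math.sqrt(sum(euclidean(samples[i], x) ** 2 for x in samples) / (n - 1))

# Three points close together and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
spreads = [sample_std(points, i) for i in range(len(points))]
```

Under these assumptions the isolated point `(10.0, 10.0)` receives the largest per-sample value in `spreads`, which is the property the initial center point selection relies on.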
The choice of initial center points is a key factor in the quality of the final clusters produced by the K-medoids clustering algorithm. When the initial center points are close to the final cluster centers, the clustering results are relatively accurate and fewer iterations are needed to update the centers. When an initial center point is a sample that deviates seriously from the final cluster center region, or an isolated point, the clustering process easily falls into a local extremum, the results are less accurate, and more iterations are needed. Initial center point selection is therefore a key factor in improving the effectiveness of K-medoids cluster analysis [34].
Standard deviation is most often used in probability and statistics as a measure of the spread of a distribution. It is defined as the arithmetic square root of the variance and reflects the level of dispersion of the data: the smaller the standard deviation, the denser the data distribution; the larger the standard deviation, the more dispersed the distribution. Because the samples around a cluster center are dense, the standard deviation of a center point sample is relatively small. Conversely, isolated point samples lie in low-density regions, and their standard deviation is relatively large [35]. For the given data set X, the standard deviation
To avoid selecting isolated points or sparsely surrounded samples as initial center points, and to favor samples in dense regions, an initial center point candidate set is defined to improve the effectiveness and efficiency of clustering. When the standard deviation of a sample is less than the average standard deviation of all samples exceeding the standard deviation of the dataset, this sample point is likely to be the initial centroid, and the initial centroid candidate set
The
In K-medoids clustering analysis, an important step is the initial center point selection. In order to avoid selecting isolated points as initial centroids, and to select more intensive sample points as initial centroids, the initial centroids candidate set defined by Eq. (5) is used to select initial centroids and iteratively update the initial centroids, and the clustering process is realized by the following two steps.
Firstly, two initial centroids are selected in the initial centroid candidate set and the two initial centroids are iteratively updated. When selecting the first initial center point
The formula for the sum of the distances from sample
The second initial center point
The samples in the dataset are then clustered according to the proximity principle, with every sample assigned to its nearest center point. Each resulting cluster is then updated: the sample with the smallest sum of distances to the other samples in the cluster replaces the original center point, with the following formula:
Where:
If the clustering error sum of squares is the same as in the previous iteration, the center points are no longer updated; otherwise the center points continue to be updated.
Next, from the initial center point candidate set, select the remaining
The set of points selected from
The centroids are then updated for each cluster. In this manner the initial centroids are incrementally added until
Assuming that the set of data samples of students’ physical health is

Figure 1. Flowchart of the K-medoids synthesis algorithm
The realization steps are as follows:
1) Initialize the data samples related to students’ physical fitness and health, select
2) Where The clustering radius can be expressed using And whether Where: In summary, the process of determining the optimal path and clustering center between multiple data samples is as follows: Firstly, the distance
3) Reclustering and analyzing the historical optimal positions of the data samples based on the K-medoids algorithm. Using the historical optimal position of the ACO algorithm as the representative object in the K-medoids algorithm
4) For the new data sample set formed follow the method of step 2), calculate the optimal solution represented by each data sample and update the historical optimal position and global optimal solution of the data sample set.
5) Recalculate the weighted Euclidean distance
6) Set Where At this time to determine whether the clustering
7) Termination conditions are reached, the clustering ends, and the optimal clustering center is obtained.
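The assignment and center-update loop at the core of the procedure can be sketched as follows. This is a simplified standard-library illustration, not the paper's implementation. It omits the ACO-based stage and the weighted Euclidean distance, and it picks all `k` initial center points up front instead of adding them incrementally. The candidate rule (spread at or below the average spread) and the rule for choosing further centers (the candidate farthest from the centers already chosen) are assumptions, and all function names are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def candidate_indices(samples):
    """Assumed candidate rule: keep samples whose spread (RMS distance to
    all samples) is at or below the average spread, i.e. dense samples."""
    n = len(samples)
    spread = [math.sqrt(sum(dist(s, t) ** 2 for t in samples) / (n - 1))
              for s in samples]
    average = sum(spread) / n
    return [i for i in range(n) if spread[i] <= average]

def initial_medoids(samples, k):
    """First center: candidate with the smallest total distance to all
    samples. Further centers (assumption): the candidate farthest from
    the centers chosen so far."""
    cand = candidate_indices(samples)
    first = min(cand, key=lambda i: sum(dist(samples[i], t) for t in samples))
    medoids = [first]
    while len(medoids) < k:
        rest = [i for i in cand if i not in medoids]
        if not rest:  # candidates exhausted: fall back to all samples
            rest = [i for i in range(len(samples)) if i not in medoids]
        medoids.append(max(
            rest, key=lambda i: min(dist(samples[i], samples[m]) for m in medoids)))
    return medoids

def k_medoids(samples, k, max_iter=100):
    """Return (medoid indices, clusters as lists of sample indices)."""
    medoids = initial_medoids(samples, k)
    clusters, prev_sse = [], None
    for _ in range(max_iter):
        # Assignment step: every sample joins its nearest center point.
        clusters = [[] for _ in range(k)]
        for i, s in enumerate(samples):
            nearest = min(range(k), key=lambda c: dist(s, samples[medoids[c]]))
            clusters[nearest].append(i)
        # Update step: the member with the smallest total distance to the
        # other members becomes the new center (empty clusters keep theirs).
        medoids = [
            min(members, key=lambda i: sum(dist(samples[i], samples[j]) for j in members))
            if members else old
            for members, old in zip(clusters, medoids)]
        # Stop when the clustering error sum of squares no longer changes.
        sse = sum(dist(samples[i], samples[m]) ** 2
                  for members, m in zip(clusters, medoids) for i in members)
        if sse == prev_sse:
            break
        prev_sse = sse
    return medoids, clusters

# Two well-separated groups of four points each.
demo = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
demo_medoids, demo_clusters = k_medoids(demo, 2)
```

On the small `demo` set the two groups are recovered as separate clusters. For the real data each sample would be a student's seven normalized test values, with `k=10` and `max_iter=100` as in the experiment.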
This paper analyzes the physical fitness data of 600 junior students from one college of University A in the last semester of 2023. The data include the students’ personal information and physical measurements, of which BMI, lung capacity, seated forward bend, standing long jump, 50-meter run, 1,000-meter run (males) or 800-meter run (females), and pull-ups (males) or sit-ups (females) were used as the input data for this test.
Because the volume of college students’ physical test data is relatively large and may contain considerable noise, we first cleaned the data by removing students with missing test attributes, such as the sprint, and students with a recorded lung capacity of 0. Then, to reduce the influence of different grades and different evaluation indexes, we normalized the physical test data to the range [-1, 1], which also benefits the subsequent dimensionality reduction with PCA.
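The cleaning and normalization described above can be sketched as follows. The text does not say which normalization was used, so per-attribute min-max scaling to [-1, 1] is an assumption here, and the record field names (`bmi`, `lung_capacity`, `run_50m`) and function names are hypothetical.

```python
def clean(records, required):
    """Drop records with any missing required attribute or with a
    recorded lung capacity of 0."""
    return [r for r in records
            if all(r.get(key) is not None for key in required)
            and r["lung_capacity"] != 0]

def scale_columns(rows):
    """Min-max scale every column to [-1, 1] (assumed normalization).
    A constant column carries no information and is mapped to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [[0.0 if hi == lo else 2 * (v - lo) / (hi - lo) - 1
             for v, lo, hi in zip(row, lows, highs)]
            for row in rows]

# Illustrative records only; the values are made up for the example.
FIELDS = ("bmi", "lung_capacity", "run_50m")
raw = [
    {"bmi": 21.5, "lung_capacity": 3363, "run_50m": 5.8},
    {"bmi": 19.6, "lung_capacity": 0, "run_50m": 8.3},      # invalid lung capacity
    {"bmi": 19.1, "lung_capacity": 3178, "run_50m": None},  # missing sprint
    {"bmi": 17.6, "lung_capacity": 4623, "run_50m": 6.0},
]
kept = clean(raw, FIELDS)
matrix = scale_columns([[r[f] for f in FIELDS] for r in kept])
```

Scaling each attribute separately keeps attributes measured in large units, such as lung capacity in ml, from dominating the Euclidean distances used in clustering.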
The physical health test data of 320 male students were used as experimental data and analyzed with the K-medoids algorithm. BMI (A1), lung capacity (B1), seated forward bend (C1), standing long jump (D1), 50-meter run (E1), 1,000-meter run (F1), and pull-ups (G1) were passed in, and the number of clusters, k, was set to 10, so that the students would be divided into ten categories. The maximum number of iterations was set to 100: iteration stops when the set of center points no longer changes, and if the algorithm has not converged after the maximum number of iterations, the center points from the last iteration are used. The final center point vector and member count of each cluster are shown in Table 1. The clustering results show that set id8 has the most students (47) and set id4 the fewest (13).
Table 1. Final center point vector and member count of each cluster
| Set id | A1/kg·m⁻² | B1/ml | C1/cm | D1/cm | E1/s | F1/s | G1/reps | Cluster size |
|---|---|---|---|---|---|---|---|---|
| 0 | 21.5 | 3363.44 | 13 | 276.25 | 5.78 | 258.94 | 8 | 28 |
| 1 | 19.57 | 4289.79 | 13 | 213.95 | 8.34 | 220.18 | 12 | 19 |
| 2 | 19.14 | 3178.68 | 15.5 | 220.28 | 6.98 | 249.37 | 6 | 30 |
| 3 | 16.91 | 3222.17 | 12.6 | 217.49 | 8.41 | 247.09 | 11 | 32 |
| 4 | 16.26 | 3141.03 | 15.4 | 219.45 | 8.26 | 272.95 | 2 | 13 |
| 5 | 21.06 | 4501.99 | 15.4 | 240.52 | 8.22 | 295.38 | 4 | 44 |
| 6 | 18.53 | 4784.56 | 17.8 | 247.24 | 8.61 | 329.15 | 7 | 31 |
| 7 | 17.61 | 4623.67 | 16.5 | 222.20 | 6.03 | 250.20 | 12 | 21 |
| 8 | 20.27 | 3613.72 | 16.7 | 198.57 | 5.18 | 256.01 | 8 | 47 |
| 9 | 18.26 | 3638.07 | 16.9 | 200.39 | 11.43 | 212.27 | 10 | 35 |
The physical fitness test data of 280 female students were analyzed with the K-medoids algorithm in the same way: BMI (A2), lung capacity (B2), seated forward bend (C2), standing long jump (D2), 50-meter run (E2), 800-meter run (F2), and sit-ups (G2) were passed in and clustered according to the steps above. The final center point vector and member count of each cluster are shown in Table 2. Cluster id3 has the most students (37) and cluster id0 the fewest (13).
Table 2. Final center point vector and member count of each cluster
| Set id | A2/kg·m⁻² | B2/ml | C2/cm | D2/cm | E2/s | F2/s | G2/reps | Cluster size |
|---|---|---|---|---|---|---|---|---|
| 0 | 18.98 | 2538.11 | 16.7 | 193.58 | 8.68 | 218.92 | 40 | 13 |
| 1 | 16.03 | 2494.65 | 12.6 | 178.38 | 8.39 | 234.48 | 39 | 23 |
| 2 | 16.24 | 3278.83 | 14.9 | 180.40 | 9.25 | 241.87 | 31 | 29 |
| 3 | 19.16 | 3128.31 | 16.4 | 207.28 | 7.79 | 230.72 | 33 | 37 |
| 4 | 16.81 | 3166.11 | 14.2 | 210.02 | 8.84 | 238.89 | 33 | 31 |
| 5 | 17.43 | 2536.41 | 15.5 | 147.28 | 9.50 | 227.57 | 38 | 24 |
| 6 | 22.18 | 2805.64 | 17 | 140.18 | 8.66 | 238.10 | 33 | 35 |
| 7 | 16.74 | 2059.28 | 12.6 | 168.02 | 9.18 | 285.42 | 35 | 28 |
| 8 | 20.58 | 2939.92 | 17.5 | 176.07 | 8.12 | 248.71 | 38 | 26 |
| 9 | 19.96 | 2828.51 | 14.3 | 162.75 | 9.80 | 312.13 | 38 | 34 |
The clustering results of the physical health test data were reduced in dimensionality with PCA, and the distribution of the clusters is shown in Fig. 2, where (a) and (b) are the clustering results for boys and girls, respectively, and the 10 colors represent the 10 categories of physical health. The method of this paper divides students’ physical health evenly into 10 categories. Multiple test values in sets id4, id6, id7, id8, and id9 are close to each other, which indicates that the algorithm groups students with similar physical fitness together well and makes it convenient for teachers to design personalized exercise programs for students.

Figure 2. The distribution of various clusters
Based on the classification of students’ physical fitness and health, personalized exercise programs were developed for the 20 categories of students (10 male and 10 female) and used in physical education classes in the second semester of 2023. At the end of the semester, boys’ and girls’ physical indicators before and after the curriculum reform were compared, and the total physical health scores were tallied, to analyze whether the personalized curriculum based on the K-medoids algorithm can improve students’ physical health.
1) The comparison of the boys’ pre-test and post-test results for each physical indicator is shown in Table 3, where ** indicates a highly significant difference (P<0.01) and * indicates a significant difference (P<0.05); the same notation is used below. Boys’ BMI, lung capacity, 50-meter run, standing long jump, pull-ups, seated forward bend, and 1000-meter run all improved. The difference was highly significant for the standing long jump (P=0.003<0.01) and significant for BMI, lung capacity, the 50-meter run, the 1000-meter run, and pull-ups (P<0.05). There was no significant difference in the seated forward bend (P=0.069>0.05).
Table 3. Comparison of boys’ pre-test and post-test results for each indicator
| Indicator | N | Pretest | Posttest | P |
|---|---|---|---|---|
| A1/kg·m⁻² | 320 | 22.63±4.06 | 24.51±3.48 | 0.032* |
| B1/ml | 320 | 3899.71±654.03 | 4265.04±512.03 | 0.029* |
| C1/cm | 320 | 9.51±6.42 | 9.93±7.01 | 0.069 |
| D1/cm | 320 | 216.88±20.54 | 232.49±24.64 | 0.003** |
| E1/s | 320 | 7.69±1.55 | 7.21±0.96 | 0.026* |
| F1/s | 320 | 255.17±21.97 | 243.36±30.19 | 0.014* |
| G1/reps | 320 | 3.77±3.57 | 5.26±4.04 | 0.019* |
2) The comparison of the girls’ pre-test and post-test results for each physical indicator is shown in Table 4. The mean BMI decreased, and lung capacity, standing long jump, seated forward bend, and the 800-meter run improved, with significant differences (P<0.05). There was no significant difference in the other indicators (P>0.05).
Table 4. Comparison of girls’ pre-test and post-test results for each indicator
| Indicator | N | Pretest | Posttest | P |
|---|---|---|---|---|
| A2/kg·m⁻² | 280 | 21.69±3.12 | 21.11±4.80 | 0.062 |
| B2/ml | 280 | 2632.77±552.29 | 2831.07±582.54 | 0.002* |
| C2/cm | 280 | 13.59±7.53 | 16.87±7.04 | 0.066* |
| D2/cm | 280 | 167.95±20.13 | 174.50±9.58 | 0.003* |
| E2/s | 280 | 8.78±1.68 | 8.62±0.96 | 0.024 |
| F2/s | 280 | 246.17±29.98 | 236.36±30.17 | 0.017* |
| G2/reps | 280 | 42.11±10.69 | 44.25±6.05 | 0.025 |
Total physical fitness scores are graded according to the National Standard of Physical Fitness for Students: 90 to 100 points is excellent, 80 to 90 is good, 60 to 80 is passing, and below 60 is failing. The comparison of students’ total physical fitness scores in the pre-test and post-test is shown in Figure 3. For male students, the excellent and good rates in the post-test increased by 7.75% and 4.34%, respectively, and the passing and failing rates decreased by 7.73% and 4.36%, respectively. Girls’ excellent and good rates in the post-test increased by a total of 14.03%, which is 1.94% higher than the boys’ combined increase of 12.09%.

Figure 3. Comparison of overall physical health performance
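The grading bands and the pre-test and post-test rate comparison described above can be sketched as follows. The text leaves the band boundaries ambiguous (both "80 to 90" and "90 to 100" include 90), so assigning a boundary score to the higher band is an assumption, and the function names are hypothetical.

```python
from collections import Counter

BANDS = ("excellent", "good", "pass", "fail")

def rating(score):
    """Map a total score (0-100) to its band; a boundary score is
    assumed to fall in the higher band."""
    if score >= 90:
        return "excellent"
    if score >= 80:
        return "good"
    if score >= 60:
        return "pass"
    return "fail"

def rating_rates(scores):
    """Percentage of students in each band."""
    counts = Counter(rating(s) for s in scores)
    return {band: 100 * counts[band] / len(scores) for band in BANDS}

def rate_change(pre_scores, post_scores):
    """Change in each band's rate from pre-test to post-test,
    in percentage points."""
    pre, post = rating_rates(pre_scores), rating_rates(post_scores)
    return {band: post[band] - pre[band] for band in BANDS}

# Illustrative scores only; these are not the study's data.
change = rate_change([95, 85, 70, 50], [95, 92, 85, 70])
```

With this bookkeeping the changes across the four bands always sum to zero, which matches the reported figures for male students: the excellent and good rates rose by 7.75 + 4.34 = 12.09, and the passing and failing rates fell by 7.73 + 4.36 = 12.09.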
In summary, the physical fitness of both boys and girls improved after the reform of the physical education curriculum in the context of big data. Boys’ improvement was most pronounced in the standing long jump, and girls’ combined gain in the excellent and good rates exceeded that of boys.
This study proposes a method for identifying college students’ physical health categories based on the K-medoids algorithm: students are clustered by physical health, and highly targeted, personalized teaching is then delivered to each cluster. The physical health data of 600 juniors in one college of University A in the last semester of 2023 were clustered; male and female students were each divided into 10 physical health categories, and different teaching content was adopted for each category. After the curriculum reform, male students showed a highly significant difference in the standing long jump (P<0.01) and significant differences in BMI, lung capacity, the 50-meter run, the 1000-meter run, and pull-ups (P<0.05). Female students showed a decrease in mean BMI and significant differences (P<0.05) in lung capacity, standing long jump, seated forward bend, and the 800-meter run. Compared with before the reform, the combined excellent and good rates of total physical fitness scores rose by 12.09% for male students and 14.03% for female students. These empirical results indicate that students’ physical fitness can be significantly improved by reforming the college physical education curriculum with big data technology.
