An empirical study on the improvement of students’ physical fitness and health by college physical education programs based on the background of big data

True health is not merely the absence of disease but the maintenance of the best possible state of physical, mental and social adaptation [1-2]. Under the strategy of education and health, health education has become an important part of realizing the “five education” goals, and it closely resembles the physical education curriculum in its objectives and content. Integrating health education into the physical education classroom is not only conducive to students’ healthy growth and stimulates their interest in learning, but also provides a new line of development for the reform of the physical education curriculum in colleges and universities [3-5].

At present, physical health is an important criterion that employers use when selecting talent, and higher education institutions are an important channel for supplying applied, front-line talent to society. University physical education programs therefore need to consider further how to better cultivate healthy living habits and raise the level of physical fitness and health of these future professionals [6-7]. University physical education should gradually adapt to the requirements of the times in its educational concepts and ideas as well as its teaching content, modes, means and methods [8].

Moreover, as the future pillars of the country, college students and their health are a matter of deep concern to all sectors of society. Because pressure from academic, family, emotional and other sources cannot be released in time, cases of students suffering from depression, autism and similar conditions keep emerging [9-10]. At the same time, improved living standards lead students to neglect exercise, and obesity, myopia, hypoglycemia and similar conditions have become common, hindering students’ healthy growth. Therefore, to reduce students’ risky behaviors and improve their physical and mental health, colleges and universities should commit to embedding health education in physical education teaching: not only teaching theoretical knowledge but also combining theory and practice in the physical education curriculum, so as to align students’ awareness with their behavior and promote the development of healthy habits [11-13].

Quality education emphasizes the all-round moral, intellectual, physical and aesthetic development of students and advocates that teachers, while developing students’ intelligence, also pay attention to their physical fitness, since a healthy physique is the prerequisite and foundation of everything else. Against this background, college physical education teachers must adjust traditional ideas in a timely manner, orient curriculum reform toward enhancing young people’s physical health, and give full play to the educational value of physical education courses [14-16]. However, the current implementation of college physical education teaching still falls far short of the requirements of quality education and plays only a limited role in supporting adolescents’ physical health, so research on reforming the physical education curriculum with adolescents’ physical health as its basis is very necessary [17-19].

The participation of students in sports has a positive significance for students’ physical health, but it is necessary to conduct relevant research on how sports affect students physically and mentally and what kind of impact they have. Literature [20] reveals that the construction of school sports facilities significantly affects students’ future participation in sports, based on a follow-up survey of students’ entry into the school learning process. Literature [21] combined health questionnaires, quality of life scales, and functional activity summary scales to reveal the associations between the health status of American adolescents during the cessation of physical activity in the context of the epidemic and factors such as gender, grade, and type of physical activity participation, which provide important references for health professionals in their health policy development. Literature [22] empirically examined the impact of college students’ sports participation and found that college students’ lifelong participation in sports effectively promoted their physical and mental health, and increased their self-esteem and well-being levels. Literature [23] investigated the demographic characteristics of sport participants online and combined independent t-tests, one-way ANOVA, and multifactorial ANOVA methods in a comparative study, noting that recreational sport participants engaged in many indoor sports for health prevention during the outbreak, and concluded that during the outbreak, joint efforts of individuals and organizations were needed to promote the maintenance of exercise habits and avoid social isolation. Literature [24] longitudinally assessed how physical activity affects bone health using a dual-energy x-ray absorptiometry (DXA) tool and found that physical activity promoted significant increases in leg BMD and BMC. 
Literature [25] systematically reviewed studies on high-intensity interval training (HIIT) to understand its effects on physical health, and the synthesized results show that HIIT has a positive effect on cardiorespiratory, blood glucose, and vascular function. Literature [26] searched journal papers on physical activity and the COVID-19 epidemic for an integrated analysis; most of the studies explored the epidemic’s impact on physical activity levels, followed by its impact on mental health. The analysis noted that physical activity levels declined sharply during the epidemic, but that exercise was effective in mitigating mental health disorders in that context.

Physical education helps to improve students’ physical fitness and motor skills, so it is very meaningful to optimize the teaching around physical education, and the optimization of physical education involves teaching content, teaching methods and the introduction of intelligent information technology. Literature [27] conducted a controlled experimental class in physical education and found that after applying the team game tournament (tgt) strategy to a twelve-week basketball physical education class, students’ motivation to learn physical education increased significantly, but there was no significant improvement in the level of athletic skills. Literature [28] envisioned a motor posture recognition model with quaternion algorithm as the core logic to assist physical education teaching, and conducted simulations through experiments to verify that the proposed model can support classrooms to grasp the level of students’ motor skill specification and effectively ensure the effectiveness of teaching. Literature [29] describes the development of artificial intelligence and virtual reality technology to promote the construction of informationization and intelligence in education, and reviews the research literature on these technologies in the field of education to deepen the understanding of these information technologies in educational practice, especially in physical education subjects. Literature [30] used a hierarchical evaluation strategy to identify the elements of the indicators in the evaluation system and assessed the role of multimedia teaching platforms in physical education, and the results of the evaluation revealed that multimedia teaching platforms play an important role in promoting the effectiveness of physical education teaching in colleges and universities. 
Literature [31] introduced a virtual reality-assisted convolutional neural network (VRA-CNN) in the Mobile Cloud Computing (MCC) e-learning platform, which strengthened the intensity of the sensor communication, and at the same time provided a real-time interactive environment for sports training for people with disabilities, and finally verified it using experimental methods, which pointed out that the introduction of VRA-CNN method effectively improved the athletes’ sports knowledge and sports confidence. Literature [32] used the Physical Activity Self-Efficacy Scale to explore the factors affecting college students’ sports participation, and suggested that university administrators need to work together with health professionals to promote college students’ sports habits and sustainable participation in lifelong sports.

In this paper, a method for classifying college students’ physical fitness and health data based on unsupervised learning is proposed. The method takes as input college students’ test data for BMI, lung capacity, sitting forward bend, standing long jump, 50-meter run, 1000-meter run (male) or 800-meter run (female), and pull-ups (male) or sit-ups (female). Using the K-medoids clustering algorithm, students with similar physical fitness are grouped together, and the teacher develops targeted, individualized teaching content for each group. An empirical study of the changes in students’ physical fitness and health was then conducted through a before-and-after controlled experiment.

2

Physical education curriculum design for colleges and universities based on K-medoids clustering algorithm

Because the number of college students is large and the physical fitness test data show no obvious regularity, it is difficult to find patterns in the data, and therefore difficult for a teacher to design a personalized exercise program for each student that improves physical fitness efficiently and scientifically. In this paper, the K-medoids clustering algorithm is used to identify students with similar physical fitness and to characterize each group. Based on these data, teachers can refine both the overall teaching direction of the class and the personalized guidance given to individual students, consider in advance the teaching strategies and pathways suited to the students in the class, and adapt to the differences between students [33]. For example, badminton and volleyball, which are widely offered in colleges and universities and are highly popular among students, do not demand high all-round physical qualities, are easy to start, and place no special restrictions on venue, so they can effectively mobilize students’ interest in learning and participation. When teaching a badminton course, the teacher assigns tasks according to each student’s situation. Students with better physical fitness who have already mastered the swing, grip, serve and other techniques can be given more practice in higher-intensity movements such as running and jumping, blocking, and swinging. Students whose physical fitness is average or who have not yet mastered basic badminton skills can practice simple serves, returns, receiving the ball, and similar actions. Such differentiated teaching can improve students’ physical fitness efficiently and scientifically.

2.1

K-medoids clustering analysis

The task of cluster analysis is to divide a given data set into clusters such that data objects within a cluster are as similar as possible and data objects in different clusters are as dissimilar as possible. Euclidean distance can be used to measure the similarity between data objects. Suppose the data set is X = {x₁,x₂,…,x_n}, where n is the number of samples, p is the dimension of each sample, and x_ia denotes the value of the a-th attribute of the i-th sample. The Euclidean distance between samples x_i and x_j is:

\[d(x_i, x_j) = \sqrt{\sum_{a=1}^{p} (x_{ia} - x_{ja})^2} \tag{1}\]

Where: d(x_i,x_j) denotes the distance between sample x_i and x_j, and i and j take values ranging from 1 to n.
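As a minimal sketch, Eq. (1) can be written in a few lines of Python. The function name `euclidean` and the sample values are illustrative assumptions and are not taken from the paper.

```python
import math


def euclidean(x_i, x_j):
    """Euclidean distance of Eq. (1) between two samples of equal dimension p."""
    if len(x_i) != len(x_j):
        raise ValueError("samples must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


# Hypothetical two-attribute samples (not data from the study).
print(euclidean([0.0, 0.0], [3.0, 4.0]))  # 5.0
```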

The clustering error sum of squares, the standard deviation of the data set, and the standard deviation of each sample are defined below. The clustering error sum of squares is:

\[E = \sum_{i=1}^{k} \sum_{x \in c_i} d(o_i, x)^2 \tag{2}\]

In the formula, o_i represents the center point of the i-th cluster, c_i represents the i-th cluster, and x is a sample point assigned to the i-th cluster.

The standard deviation of the data set is:

\[v = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} d(x_i, \bar{x})^2} \tag{3}\]

In the formula, $\bar{x}$ is the mean of the full data sample and v is the standard deviation of the data set.

The standard deviation of each sample is:

\[v_i = \sqrt{\frac{1}{n-1} \sum_{j=1}^{n} d(x_i, x_j)^2} \tag{4}\]

Where: v_i is the standard deviation value for each sample.
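The two standard deviations can be computed directly from their definitions. The following is a small sketch of Eqs. (3) and (4), assuming samples are stored as lists of floats; the helper names `dataset_std` and `sample_std` and the one-dimensional sample values are hypothetical.

```python
import math


def euclidean(x_i, x_j):
    """Eq. (1): Euclidean distance between two samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def dataset_std(X):
    """Eq. (3): spread of all samples around the mean sample."""
    n, p = len(X), len(X[0])
    mean = [sum(x[a] for x in X) / n for a in range(p)]
    return math.sqrt(sum(euclidean(x, mean) ** 2 for x in X) / (n - 1))


def sample_std(X, i):
    """Eq. (4): spread of all samples around the single sample x_i."""
    n = len(X)
    return math.sqrt(sum(euclidean(X[i], x) ** 2 for x in X) / (n - 1))


# Hypothetical one-dimensional samples; the last one is an isolated point.
X = [[0.0], [1.0], [2.0], [10.0]]
v = dataset_std(X)
v_all = [sample_std(X, i) for i in range(len(X))]
# The isolated sample has the largest v_i.
print(v_all.index(max(v_all)))  # 3
```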

2.1.1

Initial Center Point Candidate Set

The selection of the initial centroids is a key factor in the quality of the final clusters produced by the K-medoids clustering algorithm. When the initial centroids are chosen close to the final cluster centroids, the clustering results are relatively accurate and fewer iterations are needed to update the centroids. When an initial center point is an isolated point or a sample that deviates seriously from the final cluster center region, the clustering process easily falls into a local extremum, the results are less accurate, and more iterative updates of the center point are required. Initial center point selection is therefore a key factor in improving the effectiveness of cluster analysis with the K-medoids algorithm [34].

Standard deviation is most often used in probability and statistics as a measure of the spread of a distribution. It is defined as the arithmetic square root of the variance and reflects the level of dispersion of the data: the smaller the standard deviation, the denser the data distribution, and the larger the standard deviation, the more dispersed the data. Samples near cluster centers lie in high-density regions, so their standard deviation is relatively small. In contrast, isolated points lie in low-density regions, so their standard deviation is relatively large [35]. For the given data set X, the standard deviation v_i of each sample x_i is compared with the standard deviation v of the overall data set according to Eqs. (3) and (4). When v_i is less than v, x_i lies in a region of comparatively high density and is therefore more likely to be a cluster center. When v_i is greater than v, x_i lies in a relatively sparse region and is therefore less likely to be an initial centroid. However, it cannot be ruled out that a sample x_i with v_i slightly larger than v is a center point, so the average standard deviation of all samples whose v_i is larger than v is used as the upper bound for the initial center point candidate set. In this way the candidate set contains the denser sample points while excluding sample points whose dispersion is too large.

To avoid selecting isolated or sparse sample points as initial center points, and to favor denser sample points, the initial center point candidate set is defined so as to improve the effectiveness and efficiency of clustering. When the standard deviation of a sample is no greater than the average standard deviation of all samples whose standard deviation exceeds that of the data set, the sample is likely to be an initial centroid, and the initial centroid candidate set s_m is defined as follows:

\[s_m = \{x_i \mid v_i \le v',\ i = 1, 2, \ldots, n\} \tag{5}\]

The v′ in the formula is the average of all v_i for which v_i is greater than v.
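A possible way to build the candidate set of Eq. (5) is sketched below. The function name `candidate_set`, the choice to return sample indices rather than the samples themselves, and the handling of the degenerate case in which no v_i exceeds v are assumptions made for illustration.

```python
import math


def euclidean(x_i, x_j):
    """Eq. (1): Euclidean distance between two samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def candidate_set(X):
    """Return the indices of the samples that belong to s_m (Eq. 5)."""
    n, p = len(X), len(X[0])
    mean = [sum(x[a] for x in X) / n for a in range(p)]
    # Eq. (3): standard deviation of the whole data set.
    v = math.sqrt(sum(euclidean(x, mean) ** 2 for x in X) / (n - 1))
    # Eq. (4): standard deviation of every sample.
    v_i = [math.sqrt(sum(euclidean(x, y) ** 2 for y in X) / (n - 1)) for x in X]
    above = [s for s in v_i if s > v]
    if not above:
        # Assumed fallback: all samples coincide, so keep every sample.
        return list(range(n))
    v_prime = sum(above) / len(above)  # v' = mean of the v_i that exceed v
    return [i for i, s in enumerate(v_i) if s <= v_prime]


# Hypothetical one-dimensional samples; index 3 is an isolated point.
X = [[0.0], [1.0], [2.0], [10.0]]
print(candidate_set(X))  # [0, 1, 2]
```

One property of Eqs. (3) and (4) is worth keeping in mind when reading the sketch: since Σ_j d(x_i,x_j)² = Σ_j d(x_j,x̄)² + n·d(x_i,x̄)², every v_i is at least v, with equality only when x_i coincides with the mean x̄. In practice v′ is therefore the mean of essentially all v_i, and s_m keeps the samples whose v_i does not exceed that mean.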

2.1.2

Algorithm description

In K-medoids clustering analysis, an important step is the selection of the initial center points. To avoid selecting isolated points as initial centroids and to favor denser sample points, the initial centroid candidate set defined by Eq. (5) is used to select the initial centroids, which are then iteratively updated. The clustering process is realized in the following two steps.

Firstly, two initial centroids are selected from the initial centroid candidate set and then iteratively updated. The first initial center point o₁ is chosen to lie in as dense a region as possible, as close as possible to a real cluster center:

\[o_1 = \arg\min_{x_i \in s_m} \{d_i \mid i = 1, 2, \ldots, n\} \tag{6}\]

Here d_i is the sum of the distances from sample x_i to all sample points:

\[d_i = \sum_{j=1}^{n} d(x_i, x_j) \tag{7}\]

The second initial center point o₂ is selected so that its density is also relatively high while it lies as far as possible from the initial center point already obtained, so that the initial center points fall in different clusters rather than in the same one:

\[o_2 = \arg\max_{x_i \in s_m} \{d(x_i, o_1) \mid i = 1, 2, \ldots, n\} \tag{8}\]
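The selection of the first two center points in Eqs. (6) to (8) can be sketched as follows. The function name `first_two_medoids`, the use of sample indices, and the example candidate list are hypothetical; ties are broken by taking the first index, which the paper does not specify.

```python
import math


def euclidean(x_i, x_j):
    """Eq. (1): Euclidean distance between two samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def first_two_medoids(X, candidates):
    """Pick o_1 (Eq. 6) and o_2 (Eq. 8) among the candidate indices."""
    # Eq. (7): summed distance from each candidate to every sample.
    d = {i: sum(euclidean(X[i], x) for x in X) for i in candidates}
    o1 = min(candidates, key=lambda i: d[i])
    # Eq. (8): the candidate farthest from o_1.
    o2 = max(candidates, key=lambda i: euclidean(X[i], X[o1]))
    return o1, o2


# Hypothetical one-dimensional samples forming two groups.
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
candidates = [1, 2, 3, 4, 5]  # assumed candidate set s_m, given as indices
print(first_two_medoids(X, candidates))  # (3, 5)
```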

The samples in the data set are then clustered according to the proximity principle: every sample point is assigned to its nearest centroid. Each resulting cluster is then updated by searching for the sample with the smallest sum of distances to the other samples in the cluster, which replaces the original centroid:

\[o_j'' = \arg\min_{x_i \in c_j} \Big\{ \sum_{x_l \in c_j} d(x_i, x_l) \Big\} \tag{9}\]

Where: c_j is the j-th cluster, and x_l must belong to c_j, as x_i does. The samples are then reassigned and the clustering error sum of squares of Eq. (2) is calculated.

If it is the same as the previous clustering error sum of squares, the centroids are no longer updated; otherwise the centroids continue to be updated.
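The assign-and-update loop described above can be sketched as below. All function names are hypothetical, the cap `max_iter=100` is an assumed safeguard, and keeping the old center for an empty cluster is an added assumption that the paper does not discuss.

```python
import math


def euclidean(x_i, x_j):
    """Eq. (1): Euclidean distance between two samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def assign(X, medoids):
    """Assign every sample index to its nearest center (proximity principle)."""
    clusters = {m: [] for m in medoids}
    for i, x in enumerate(X):
        nearest = min(medoids, key=lambda m: euclidean(x, X[m]))
        clusters[nearest].append(i)
    return clusters


def update_medoid(X, members):
    """Eq. (9): the member with the smallest summed distance to the cluster."""
    return min(members, key=lambda i: sum(euclidean(X[i], X[l]) for l in members))


def sse(X, clusters):
    """Eq. (2): sum of squared distances from samples to their center."""
    return sum(euclidean(X[m], X[i]) ** 2
               for m, members in clusters.items() for i in members)


def refine(X, medoids, max_iter=100):
    """Alternate assignment and center update until E stops changing."""
    clusters = assign(X, medoids)
    error = sse(X, clusters)
    for _ in range(max_iter):
        # An empty cluster keeps its old center (assumed behaviour).
        medoids = [update_medoid(X, mem) if mem else m
                   for m, mem in clusters.items()]
        clusters = assign(X, medoids)
        new_error = sse(X, clusters)
        if new_error == error:
            break
        error = new_error
    return medoids, clusters, error


# Hypothetical one-dimensional samples and starting centers (indices 3 and 5).
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
medoids, clusters, error = refine(X, [3, 5])
print(medoids, error)  # [1, 5] 8.0
```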

Next, the remaining k − 2 initial center points are selected from the initial center point candidate set. Assuming that g (2 ≤ g < k) centroids have been obtained, when selecting the (g + 1)-th initial centroid, the sample point farthest from the cluster centroid is selected in each cluster, subject to that sample point belonging to the initial centroid candidate set:

\[o_i' = \arg\max_{x_l \in c_i \cap s_m} \{d(x_l, o_i) \mid l = 1, 2, \ldots, n\} \tag{10}\]

The set of points selected from the g clusters is $o' = \{o_1', o_2', \ldots, o_g'\}$. From o′, the sample point farthest from its centroid is selected as the (g + 1)-th centroid, which makes it the point most likely to be the centroid of the added cluster:

\[o_{g+1} = \arg\max_{o_j' \in o'} \{d(o_j', o_j) \mid j = 1, 2, \ldots, g\} \tag{11}\]

The centroids are then updated for each cluster. In this manner initial centroids are added incrementally until k centroids have been identified and the final clustering results are obtained.
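Putting the two steps together, the incremental procedure can be sketched as follows. This is an illustrative reading of Eqs. (2) and (9) to (11), not the authors' implementation: the function names, the iteration cap, the tie-breaking, and the early exit when no candidate remains are all assumptions. The candidate set and the first two centers are passed in as indices.

```python
import math


def dist(a, b):
    """Eq. (1): Euclidean distance between two samples."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def assign(X, medoids):
    """Assign every sample index to its nearest center."""
    clusters = {m: [] for m in medoids}
    for i, x in enumerate(X):
        clusters[min(medoids, key=lambda m: dist(x, X[m]))].append(i)
    return clusters


def refine(X, medoids, max_iter=100):
    """Reassign and update centers (Eq. 9) until E (Eq. 2) is unchanged."""
    prev = None
    for _ in range(max_iter):
        clusters = assign(X, medoids)
        error = sum(dist(X[m], X[i]) ** 2
                    for m, mem in clusters.items() for i in mem)
        if error == prev:
            break
        prev = error
        medoids = [min(mem, key=lambda i: sum(dist(X[i], X[l]) for l in mem))
                   if mem else m
                   for m, mem in clusters.items()]
    return medoids


def next_medoid(X, medoids, candidates):
    """Eqs. (10)-(11): the candidate farthest from its own cluster center."""
    best, best_d = None, -1.0
    for m, mem in assign(X, medoids).items():
        for i in mem:
            d = dist(X[i], X[m])
            if i in candidates and i not in medoids and d > best_d:
                best, best_d = i, d
    return best


def incremental_k_medoids(X, k, candidates, o1, o2):
    """Start from two centers and add one at a time until k are found."""
    candidates = set(candidates)
    medoids = refine(X, [o1, o2])
    while len(medoids) < k:
        new = next_medoid(X, medoids, candidates)
        if new is None:  # assumed exit: no candidate left to add
            break
        medoids = refine(X, medoids + [new])
    return medoids, assign(X, medoids)


# Hypothetical one-dimensional samples forming three groups.
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0],
     [20.0], [21.0], [22.0]]
medoids, clusters = incremental_k_medoids(X, 3, range(10), 4, 9)
print(sorted(medoids))  # [1, 5, 8]
```

In this toy run the two starting centers first split the data into a large and a small cluster; the third center is then taken from the large cluster, and the refinement step separates the three groups.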

2.2

Classification of students’ physical health based on K-medoids

Assuming that the set of data samples of students’ physical health is X = {X_i = (x_i1,x_i2,⋯,x_im),i = 1,2,⋯,N}, the flowchart of K-medoids synthesis algorithm is shown in Fig. 1.

The realization steps are as follows:

1) Initialize the data samples related to students’ physical fitness and health, set the maximum number of iterations, and select m data samples from the data sample set as the initial centroids, which can be expressed as (M₁,M₂,⋯,M_m).

2) Let X_i and X_j be two data samples in the data sample set, and let d_ij be the weighted Euclidean distance between them, which can be expressed as:

\[d_{ij} = \|p(X_i - X_j)\|_2 = \sqrt{\sum_{k=1}^{m} p\,(x_{ik} - x_{jk})^2} \tag{12}\]

Where p represents the weighting factor, whose value is determined by the differing contribution of each component to the clustering.

Let r denote the clustering radius. Since there is a certain distance between X_i and X_j, the amount of missing information PQ on the straight-line path from data sample X_i to data sample X_j at time t is denoted τ_ij(t). Setting τ_ij(0) = 0 means that the amount of missing information PQ on every straight-line path is the same at the initial moment, namely 0. The amount of missing information PQ on the straight-line path between X_i and X_j can then be expressed as:

\[\tau_{ij}(t) = \begin{cases} 1, & d_{ij} \le r \\ 0, & d_{ij} > r \end{cases} \tag{13}\]

Whether X_i is subsumed into the domain of X_j is determined by Eq. (14):

\[p_{ij}(t) = \frac{\tau_{ij}^{\alpha}(t)\,\eta_{ij}^{\beta}(t)}{\sum_{s \in S} \tau_{sj}^{\alpha}(t)\,\eta_{sj}^{\beta}(t)} \tag{14}\]

Where: p_ij(t) represents the probability that X_i is subsumed into X_j; when p_ij(t) ≥ p₀ (0 ≤ p₀ ≤ 1), X_i is subsumed into the domain of X_j, where p₀ is a probability constant. Let S = {X_s | d_sj ≤ r, s = 1,2,⋯,j,j+1,⋯,N}, and let η represent the local heuristic function, i.e., the expected degree to which data sample X_i moves to data sample X_j. α and β represent, respectively, the role of the characteristic information accumulated along the sample’s movement path and the role of the heuristic function in the movement path.

In summary, the optimal path and clustering center among multiple data samples are determined as follows: first, the distance d_ij between each pair of data samples is calculated using the weighted Euclidean distance; then the amount of missing information PQ and the probability p_ij(t) that one data sample is subsumed into another are derived from Eqs. (13) and (14). This determines the optimal paths and clustering centers among the data samples, which constitute the historical optimal positions of the data samples.

3) Recluster and analyze the historical optimal positions of the data samples using the K-medoids algorithm. Using the historical optimal position from the ACO algorithm as the representative object O_j in the K-medoids algorithm, determine the cluster in which each data sample is located as well as the centroids of the clusters.

4) For the newly formed data sample set, follow the method of step 2), calculate the optimal solution represented by each data sample, and update the historical optimal position and global optimal solution of the data sample set.

5) Recalculate the weighted Euclidean distance d_ij between any two data samples to determine the new clustering center and find the optimal path.

6) Let D_j be the deviation error of the j-th cluster and ε the overall clustering error; the errors are computed as:

\[D_j = \sum_{l=1}^{J} \sqrt{\sum_{k=1}^{m} (x_{lk} - c_{jk})^2} \tag{15}\]

\[\varepsilon = \sum_{j=1}^{k} D_j \tag{16}\]

Where c_jk denotes the k-th component of the j-th clustering center.

Then determine whether the clustering error ε is within the specified range. If it is, the clustering stops; if not, return to step 3) and continue the iteration.

7) When the termination conditions are reached, the clustering ends and the optimal clustering centers are obtained.
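Some of the quantities used in the steps above can be sketched in Python: the weighted distance of Eq. (12), the indicator of Eq. (13), and the error terms of Eqs. (15) and (16). The sketch rests on several assumptions that the text does not settle: the weight p is taken as one value per attribute, J in Eq. (15) is read as the number of samples in the j-th cluster, and the function names and the `tolerance` value are hypothetical.

```python
import math


def weighted_distance(x_i, x_j, p):
    """Eq. (12), assuming p is a list holding one weight per attribute."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(p, x_i, x_j)))


def tau(d_ij, r):
    """Eq. (13): 1 if the two samples lie within the clustering radius r."""
    return 1 if d_ij <= r else 0


def cluster_deviation(members, center):
    """Eq. (15): summed Euclidean distance from a cluster's samples to its center."""
    return sum(math.sqrt(sum((a - c) ** 2 for a, c in zip(x, center)))
               for x in members)


def overall_error(clusters):
    """Eq. (16): clusters is a list of (members, center) pairs."""
    return sum(cluster_deviation(members, center) for members, center in clusters)


# Hypothetical values, not data from the study.
d = weighted_distance([0.0, 0.0], [3.0, 4.0], [1.0, 1.0])
print(d, tau(d, 5.0), tau(d, 4.0))  # 5.0 1 0

clusters = [([[0.0, 0.0], [3.0, 4.0]], [0.0, 0.0]),
            ([[1.0, 1.0]], [1.0, 1.0])]
eps = overall_error(clusters)
tolerance = 6.0  # hypothetical "specified range" for the stopping test in step 6)
print(eps, eps <= tolerance)  # 5.0 True
```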

3

Empirical analysis

3.1

Classification of students’ physical health

3.1.1

Data collection

This paper analyzes the physical fitness data of 600 junior students from a college of University A in the last semester of 2023. The data include students’ personal information as well as physical measurement data, of which BMI, lung capacity, sitting forward bend, standing long jump, 50-meter run, 1,000-meter run (for males) or 800-meter run (for females), and pull-ups (for males) or sit-ups (for females) were used as the input data for this test.

3.1.2

Data pre-processing

Since the volume of college students’ physical test data is relatively large and may contain a lot of noisy data, we first cleaned the data by removing students with missing test attributes (for example, a missing sprint result) and students with a recorded lung capacity of 0. Then, to reduce the influence of different grades and different evaluation indexes, we normalized the physical test data so that all values lie in [-1, 1], which also benefits the later use of PCA for dimensionality reduction.
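The cleaning and normalization step might look like the sketch below. The paper does not state which normalization was used, so min-max scaling to [-1, 1] is an assumption, as are the function names, the record layout, and the field names `lung_capacity` and `run_50m`.

```python
def clean(records, required):
    """Drop records with a missing required attribute or a lung capacity of 0."""
    kept = []
    for rec in records:
        if any(rec.get(key) is None for key in required):
            continue
        if rec.get("lung_capacity") == 0:
            continue
        kept.append(rec)
    return kept


def scale_to_unit_range(values):
    """Min-max scale one column to [-1, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Hypothetical records, not data from the study.
records = [
    {"lung_capacity": 3500, "run_50m": 7.8},
    {"lung_capacity": 0, "run_50m": 8.1},      # removed: lung capacity of 0
    {"lung_capacity": 4000, "run_50m": None},  # removed: missing sprint result
]
kept = clean(records, required=["lung_capacity", "run_50m"])
print(len(kept))  # 1
print(scale_to_unit_range([2000.0, 3000.0, 4000.0]))  # [-1.0, 0.0, 1.0]
```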

3.1.3

Analysis of experimental results

The physical health test data of 320 male students were used as experimental data and analyzed using the K-medoids algorithm. The inputs were BMI (A1), lung capacity (B1), sitting forward bend (C1), standing long jump (D1), 50-meter run (E1), 1,000-meter run (F1), and pull-ups (G1), and the number of clusters k was set to 10, so that the students are divided into ten categories. The maximum number of iterations was set to 100: the algorithm stops when the set of centroids no longer changes, and if it has not converged when the maximum is reached, the last set of centroids is used. The final centroid vectors and the number of students in each cluster are shown in Table 1. The clustering results show that set id8 has the most students (47) and set id4 the fewest (13).

Table 1.

Final center point vector and number of students in each cluster (male students)

Set id	A1/kg·m^-2	B1/ml	C1/cm	D1/cm	E1/s	F1/s	G1/reps	Cluster size
0	21.5	3363.44	13	276.25	5.78	258.94	8	28
1	19.57	4289.79	13	213.95	8.34	220.18	12	19
2	19.14	3178.68	15.5	220.28	6.98	249.37	6	30
3	16.91	3222.17	12.6	217.49	8.41	247.09	11	32
4	16.26	3141.03	15.4	219.45	8.26	272.95	2	13
5	21.06	4501.99	15.4	240.52	8.22	295.38	4	44
6	18.53	4784.56	17.8	247.24	8.61	329.15	7	31
7	17.61	4623.67	16.5	222.20	6.03	250.20	12	21
8	20.27	3613.72	16.7	198.57	5.18	256.01	8	47
9	18.26	3638.07	16.9	200.39	11.43	212.27	10	35

The physical fitness test data of 280 female students were analyzed using the K-medoids algorithm in the same way. The inputs were BMI (A2), lung capacity (B2), sitting forward bend (C2), standing long jump (D2), 50-meter run (E2), 800-meter run (F2), and sit-ups (G2), and the clustering followed the steps above. The final centroid vectors and the number of students in each cluster are shown in Table 2. Cluster id3 has the most students (37) and cluster id0 the fewest (13).

Table 2.

Final center point vector and number of students in each cluster (female students)

Set id	A2/kg·m^-2	B2/ml	C2/cm	D2/cm	E2/s	F2/s	G2/reps	Cluster size
0	18.98	2538.11	16.7	193.58	8.68	218.92	40	13
1	16.03	2494.65	12.6	178.38	8.39	234.48	39	23
2	16.24	3278.83	14.9	180.40	9.25	241.87	31	29
3	19.16	3128.31	16.4	207.28	7.79	230.72	33	37
4	16.81	3166.11	14.2	210.02	8.84	238.89	33	31
5	17.43	2536.41	15.5	147.28	9.50	227.57	38	24
6	22.18	2805.64	17	140.18	8.66	238.10	33	35
7	16.74	2059.28	12.6	168.02	9.18	285.42	35	28
8	20.58	2939.92	17.5	176.07	8.12	248.71	38	26
9	19.96	2828.51	14.3	162.75	9.80	312.13	38	34

The clustered physical health test data were reduced in dimensionality by PCA, and the distribution of the clusters is shown in Fig. 2, where (a) and (b) show the clustering results for boys’ and girls’ physical health, respectively, and 10 different colors represent the 10 categories of physical health. The method of this paper divides students’ physical health evenly into 10 categories. The test data within sets id4, id6, id7, id8 and id9 lie close to each other, which indicates that the algorithm groups students with similar physical fitness together well and makes it convenient for teachers to design personalized exercise programs for students.

3.2

Effectiveness of Students’ Physical Fitness Improvement Based on Personalized Instruction

Based on the classification of students’ physical fitness and health, personalized exercise programs were developed for the 20 categories of students (10 for male and 10 for female students) and applied in physical education classes in the second semester of 2023. At the end of the semester, boys’ and girls’ body indicators before and after the curriculum reform were compared, and total physical health scores were tallied, to analyze whether the personalized curriculum based on the K-medoids algorithm can improve students’ physical health.

3.2.1

Comparative analysis of the results of the pre- and post-tests on various body indicators

1) The comparative analysis of the boys’ pre-test and post-test results for each physical indicator is shown in Table 3. ** indicates P<0.01, a highly significant difference, and * indicates P<0.05, a significant difference (the same below). Boys’ mean lung capacity, standing long jump, pull-ups and sitting forward bend increased, their 50-meter and 1000-meter times decreased, and their mean BMI rose from 22.63 to 24.51. Among these, standing long jump showed a highly significant difference (P=0.003<0.01), and BMI, lung capacity, 50 meters, 1000 meters, and pull-ups showed significant differences (P<0.05). There was no significant difference in sitting forward bend (P=0.069>0.05).

Table 3.

Comparison analysis of each indicator results

Indicator	N	Pretest	Posttest	P
A1/kg·m^-2	320	22.63±4.06	24.51±3.48	0.032*
B1/ml	320	3899.71±654.03	4265.04±512.03	0.029*
C1/cm	320	9.51±6.42	9.93±7.01	0.069
D1/cm	320	216.88±20.54	232.49±24.64	0.003**
E1/s	320	7.69±1.55	7.21±0.96	0.026*
F1/s	320	255.17±21.97	243.36±30.19	0.014*
G1/per	320	3.77±3.57	5.26±4.04	0.019*
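To make the size of the changes in Table 3 easier to read, the following sketch computes the absolute and relative change in the mean for each indicator. The means are copied from Table 3 and keyed by the table's indicator codes; the helper name `mean_change` is hypothetical. Summary means alone cannot reproduce the reported P values, which depend on the per-student data.

```python
# Pre-test and post-test means copied from Table 3 (boys),
# keyed by the indicator codes used in the table.
TABLE3_MEANS = {
    "A1": (22.63, 24.51),
    "B1": (3899.71, 4265.04),
    "C1": (9.51, 9.93),
    "D1": (216.88, 232.49),
    "E1": (7.69, 7.21),
    "F1": (255.17, 243.36),
    "G1": (3.77, 5.26),
}

def mean_change(pre, post):
    """Return (absolute change, percent change relative to the pre-test mean)."""
    diff = post - pre
    return diff, 100.0 * diff / pre

changes = {code: mean_change(pre, post) for code, (pre, post) in TABLE3_MEANS.items()}

for code, (diff, pct) in changes.items():
    print(f"{code}: {diff:+.2f} ({pct:+.2f}%)")
```

For timed indicators a negative change means a faster time, so the sign has to be interpreted per indicator rather than uniformly.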

2) The comparison of girls’ pre-test and post-test results for each physical indicator is shown in Table 4. The mean BMI decreased, while lung capacity, standing long jump, sitting forward bend, and 800-meter run performance improved, with significant differences (P<0.05). No significant difference was found for the other indicators (P>0.05).

Table 4. Comparison of girls’ pre-test and post-test results for each indicator

Indicator	N	Pretest	Posttest	P
A2/kg·m^-2	280	21.69±3.12	21.11±4.80	0.062
B2/ml	280	2632.77±552.29	2831.07±582.54	0.002*
C2/cm	280	13.59±7.53	16.87±7.04	0.066*
D2/cm	280	167.95±20.13	174.50±9.58	0.003*
E2/s	280	8.78±1.68	8.62±0.96	0.024
F2/s	280	246.17±29.98	236.36±30.17	0.017*
G2/per	280	42.11±10.69	44.25±6.05	0.025

3.2.2

Comparative analysis of pre- and post-test total physical fitness scores

Total physical fitness scores are graded according to the National Standard of Physical Fitness for Students: 90 to 100 points is excellent, 80 to 90 is good, 60 to 80 is passing, and below 60 is failing. The comparison of students’ total physical fitness scores in the pre- and post-tests is shown in Figure 3. For male students, the excellent rate and good rate in the post-test increased by 7.75% and 4.34%, respectively (12.09% combined), and the passing rate and failing rate decreased by 7.73% and 4.36%, respectively. For girls, the excellent and good rates in the post-test increased by a total of 14.03%, which is 1.94% higher than the combined increase for boys.
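The grading bands and the rate arithmetic above can be sketched as follows. The function names and the demo scores are hypothetical, and because the stated bands share their endpoints (for example, 90 appears in both "excellent" and "good"), the sketch assumes that a boundary score belongs to the higher band.

```python
def grade(score):
    """Map a total score (0-100) to a grade band.

    Assumption: a score exactly on a boundary (e.g. 90) goes to the higher band.
    """
    if score >= 90:
        return "excellent"
    if score >= 80:
        return "good"
    if score >= 60:
        return "passing"
    return "failing"

def band_rates(scores):
    """Return the percentage of students falling in each grade band."""
    bands = ["excellent", "good", "passing", "failing"]
    counts = {band: 0 for band in bands}
    for s in scores:
        counts[grade(s)] += 1
    return {band: 100.0 * counts[band] / len(scores) for band in bands}

# Reported changes in rates between pre-test and post-test.
boys_gain = round(7.75 + 4.34, 2)        # excellent + good, boys
girls_gain = 14.03                       # excellent + good, girls (reported total)
gap = round(girls_gain - boys_gain, 2)   # girls' combined gain minus boys'

# Hypothetical scores, used only to show how band rates are computed.
demo = band_rates([95, 85, 85, 70, 55])
print(boys_gain, gap, demo)
```

The boys' increases in the excellent and good rates (12.09 combined) match the decreases in the passing and failing rates (7.73 + 4.36), as expected when the four rates sum to 100%.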

In summary, the physical fitness of both boys and girls improved after the physical education curriculum reform in the context of big data. Boys’ improvement was most pronounced in the standing long jump, and girls’ combined gain in the excellent and good rates was larger than that of boys.

4

Conclusion

This study proposes a method for identifying college students’ physical health categories based on the K-medoids algorithm; the resulting clusters are then used to deliver targeted, personalized course teaching. The physical health data of 600 juniors in a college of University A in the last semester of 2023 were clustered, with male and female students each grouped into 10 physical health categories, and different teaching content was adopted for each category. After the curriculum reform, male students showed a highly significant difference in standing long jump (P<0.01) and significant differences in BMI, lung capacity, 50-meter sprint, 1000-meter run, and pull-ups (P<0.05). Female students showed a decrease in mean BMI and significant differences (P<0.05) in lung capacity, standing long jump, sitting forward bend, and 800-meter run. Compared with before the curriculum reform, the combined excellent and good rates of total physical fitness scores increased by 12.09% for male students and 14.03% for female students. These empirical results indicate that students’ physical fitness can be significantly improved by reforming the college physical education curriculum using big data technology.

Qian Hou

Published online: 17 Mar 2025

Received: 12 Oct 2024

Accepted: 07 Feb 2025

DOI: https://doi.org/10.2478/amns-2025-0343

Keywords: Big data, K-medoids clustering, Cluster centroids, Students’ physical health

© 2025 Qian Hou, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
