The Development Path of Physical Education Teaching Informatization in Colleges and Universities in the Context of Modern Teaching Theory

Along with the development of science and technology, smart products have proliferated, and people are increasingly absorbed in the virtual world of the Internet. Students' awareness of exercise is gradually weakening: many would rather stay in the dormitory and play with their cell phones than take part in outdoor activities, and they neglect their physical health to save time for games or study [1-2]. This not only reinforces students' habitual physical inactivity but also reduces their physical function, which is detrimental to healthy physical and mental development [3]. To change this situation, the Ministry of Education has proposed developing physical education informatization as a way to improve students' health and overall quality.

The modernization of education requires a shift from exam-oriented education to quality education [4]. For physical education, this means moving away from the traditional model centered on sports skills training toward one that cultivates students' lifelong awareness of sport, so that they form good sports thinking and exercise habits that benefit them for life [5-7]. However, owing to the limitations of the traditional model, the thinking and methods of physical education have not been deeply transformed during the modernization of education, and traditional approaches are still in use [8-10]. In the information age, by contrast, closely integrating physical education with information technologies such as the Internet, cloud computing and the Internet of Things can help make up for the shortcomings of the traditional teaching mode and achieve a “1+1>2” effect [11-14]. By actively building physical education informatization, a “smart sports” model of education can be constructed that breaks through the time and space limitations of traditional physical education and, by cultivating students' sports thinking, allows physical education to better fulfill its service function [15-18]. For this reason, colleges and universities should explore a scientific path for developing physical education informatization in the context of education modernization, so as to genuinely advance the reform of physical education in higher education.

This study first gives an overview of education informatization under modern pedagogy, covering the practical significance and the basic model of informatized teaching. It then introduces the process and principle of the K-means clustering algorithm, followed by the basic concepts of decision trees and the ID3 learning algorithm. Finally, physical fitness and health test data from students at a university are analyzed by clustering and by decision tree modeling. For the cluster analysis, the data of male and female students are clustered separately into four categories, A, B, C and D, and the mean values of the same variable in each category are used as weights to visualize the trends across variables. For the decision analysis, an early warning model for physical fitness and health is built on the CART classification tree. A final total physical fitness score of 60 points or more is recorded as “passing” and a score below 60 as “failing”; these two outcomes form the leaf nodes that correspond to the decision results. Each test item is treated as a separate feature or attribute, and the decision trees for boys and girls are constructed according to the principle of minimizing the Gini index.

2

Proposed pedagogical model and technological basis

2.1

Overview of Education Informatization under Modern Pedagogy

2.1.1

The Relevance of Informatized Teaching and Learning

Informatized teaching is an inevitable choice for promoting the reform and modernization of education. Since China began deepening the reform of its education system, experts and scholars have become fully aware of the drawbacks of the traditional cramming (“spoon-feeding”) mode of teaching, and they also recognize the constraints that exam-oriented education places on cultivating students' comprehensive quality. However, the system cannot be reformed overnight. Because of long-standing habits of thought and cognition, many innovations and reform measures have struggled to change the thinking of front-line teachers or the education models of universities and regions. These new educational models often only superficially affect the teaching style of teachers, or help teachers create all kinds of teaching aids and models, without really realizing their educational value. With the spread of Internet technology, people's material and spiritual lives have been enriched as never before, but the traditional education model has also been strongly affected. Exploring the informatized teaching mode of “Internet + Education” has therefore become the main path, and an inevitable choice, for the modernization of education.

2.1.2

Basic Models of Informatized Teaching and Learning

Under modern pedagogy, this paper focuses on the cooperative mode of informatized teaching. In this mode, teachers use an Internet platform and several informatized teaching technologies to give students learning content in different sequences, and they organize relaxed group learning in which students communicate in depth with one another during thematic study. Through this interaction, students collaborate to gain a deeper understanding of the content and ultimately internalize it into their own knowledge structure. The cooperative mode also opens new development opportunities for related teaching formats such as partner learning, role-play cooperation and collaborative teaching.

2.2

K-means clustering algorithm

2.2.1

Process of K-means clustering algorithm

K-means achieves clustering by partitioning. Partitioning here means dividing the p-dimensional space arbitrarily into K regions. Given the K classes, the centroid (center of mass) of each class is found; the observations are then assigned to the nearest class according to their distance to the K centroids, which yields the initial clustering result. Because this initial result is generated from an arbitrary partition of the space, it is not certain that the resulting K classes are “natural” subgroups, so the classes must be refined repeatedly. Following this idea, the detailed flow of the K-means clustering algorithm is shown in Figure 1.

The first step is to determine the number of clusters K. Choosing an appropriate number of subclasses in K-means modeling is not straightforward: it depends not only on the clustering results but also on the problem to be solved. Too many or too few classes make the results meaningless.

In the second step, the centers of the K initial classes are found. A “class center” is the point that best represents the typical characteristics of its class. After the number of subclasses K has been fixed, an initial centroid is chosen for each of the K classes.

In the third step, observations are assigned by the nearest-distance principle. The distance from each observation to each of the K centroids is calculated in turn, and every observation is assigned to the class whose centroid is closest, producing K subclasses.

In the fourth step, the centroids of the K classes are recomputed. For each class, the mean of all its observations is calculated on every variable, and this mean point becomes the new class centroid.

In the fifth step, check whether the termination conditions of the clustering algorithm have been met; if not, return to the third step and repeat until they are.

Clustering algorithms generally use two termination conditions. The first is the number of iterations: the algorithm ends when the current iteration count reaches the specified maximum. The second is the movement of the centroids: the algorithm stops when the distance the class centers move between two successive iterations falls below a specified threshold. Appropriately increasing the number of iterations or reasonably adjusting the centroid-movement threshold can help correct a poorly placed initial centroid. K-means clustering is thus an iterative process in which the membership of the observations in each subclass is adjusted repeatedly until it finally stabilizes.
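The five steps and two termination conditions above can be sketched in plain Python as follows. This is a minimal illustration under assumptions, not the software used in this study: the function names, the choice of K random observations as initial centroids, and the defaults `max_iter=100` and `tol=1e-6` are hypothetical.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest(p, centroids):
    """Index of the centroid closest to observation p."""
    return min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))


def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Plain K-means; step 1 (choosing k) is left to the caller."""
    rng = random.Random(seed)
    # Step 2 (assumption): use k distinct random observations as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):                       # end condition 1: iteration cap
        # Step 3: assign every observation to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Step 4: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its old centroid
        shift = max(euclidean(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        # Step 5 / end condition 2: stop once no centroid moves more than tol.
        if shift <= tol:
            break
    labels = [nearest(p, centroids) for p in points]  # labels for the final centroids
    return centroids, labels


# Hypothetical toy data with two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

Because the toy points form two well-separated groups, the sketch should recover them from any starting centroids; on real test data the result can depend on the initial centroids, which is one reason to rerun K-means from several starts.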

2.2.2

Principles of K-means clustering algorithm

The distance between data objects is measured by some metric, usually the Euclidean distance. The Euclidean distance between two observations X and Y is the square root of the sum of the squared differences between their values on the p variables.

Definition of Euclidean distance:

$$\mathrm{EUCLD}(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}$$

The goal of partitioning the dataset is to assign every point in the sample to one of K clusters. Points within a cluster should be as close to each other as possible, while points in different clusters should be as far apart as possible. Assuming the clusters are $(C_1, C_2, \ldots, C_K)$, the objective is to minimize the squared error E:

$$E = \sum_{i=1}^{K} \sum_{x \in C_i} d(x, c_i)^2$$

where $c_i$ is the centroid of $C_i$ and $d$ is the Euclidean distance defined above. The centroid is computed as:

$$c_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$$

The centroid positions are then updated iteratively: each iteration reassigns the data points to their nearest centers and computes new cluster centers from the reassigned points. After enough iterations the cluster centers no longer change, and these final positions are the required cluster centers.
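To make the objective concrete, the sketch below evaluates E and the centroid formula on one toy cluster (hypothetical data). It shows that replacing an arbitrary center with the mean $c_i$ lowers the squared error, which is why the fourth step of the algorithm recomputes centroids as means. The function names are illustrative only.

```python
def squared_error(points, labels, centroids):
    """E: sum of squared Euclidean distances from each point to its assigned centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


def centroid(members):
    """c_i = (1/|C_i|) * (sum of the points in C_i), coordinate by coordinate."""
    n = len(members)
    return [sum(coord) / n for coord in zip(*members)]


# One toy cluster of four points (hypothetical data).
cluster = [(0, 0), (2, 0), (0, 2), (2, 2)]
labels = [0, 0, 0, 0]
c = centroid(cluster)                                 # [1.0, 1.0]
e_mean = squared_error(cluster, labels, [c])          # 8.0
e_corner = squared_error(cluster, labels, [(0, 0)])   # 16
```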

2.3

Decision tree methodology

2.3.1

Basic concepts

A decision tree is first of all a tree in the sense defined in data structures textbooks; the graphical representation of a tree is shown in Fig. 2.

The decision tree is a common classification method used for prediction and decision analysis, hence it is also called a classification tree. Given the known probabilities of various situations, a decision tree is built according to a series of rules to obtain the expected value for project risk prediction and to judge feasibility. A decision tree represents a data classification method graphically and can be read as a set of IF...THEN rules, which makes it very intuitive and easy to understand.

A decision tree represents the entire database: each non-leaf (internal) node is a test on a data attribute, each edge represents an outcome of that test, and each leaf node holds a class or class distribution. The root node is the top node of the tree. A graphical representation of a decision tree is shown in Figure 3, with rectangles representing internal nodes and ellipses representing leaf nodes.

How is a decision tree used for classification? Suppose a tuple has an unknown class label. Starting at the root node, its attribute values are tested against the tree, tracing a path to a leaf node, and that leaf holds the class prediction for the tuple.
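The root-to-leaf lookup just described can be expressed as a short loop. The sketch below assumes a hypothetical nested-dictionary encoding (internal nodes hold an `attribute` and a `branches` map, leaves are plain class labels) and a toy tree whose attribute names and values are invented for illustration.

```python
def classify(tree, sample):
    """Follow the path from the root to a leaf and return the leaf's class label.

    Internal node: {"attribute": name, "branches": {value: subtree}}
    Leaf: any non-dict value (the predicted class).
    """
    node = tree
    while isinstance(node, dict):
        value = sample[node["attribute"]]   # test this node's attribute
        node = node["branches"][value]      # follow the edge for that outcome
    return node


# Hypothetical toy tree; attribute names and values are illustrative only.
toy_tree = {
    "attribute": "endurance",
    "branches": {
        "good": "pass",
        "poor": {"attribute": "strength",
                 "branches": {"good": "pass", "poor": "fail"}},
    },
}
print(classify(toy_tree, {"endurance": "poor", "strength": "good"}))  # pass
```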

2.3.2

ID3 learning algorithm

Information entropy, known in information theory as the average amount of information, measures the average information conveyed by a source. The source consists of a finite number of mutually exclusive and jointly exhaustive events, each occurring with a certain probability. Mathematically, for events $X_1, \ldots, X_r$ occurring with probabilities $p(X_1), \ldots, p(X_r)$, the information entropy H(X) is the mathematical expectation of the self-information $I(X_i) = -\log_2 p(X_i)$ of each event:

$$H(X) = \sum_{i=1}^{r} p(X_i) I(X_i) = -\sum_{i=1}^{r} p(X_i) \log_2 p(X_i) \tag{1}$$
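As a quick numerical check of Eq. (1), the sketch below estimates each $p(X_i)$ from class frequencies in a list of labels and uses base-2 logarithms. The function name and the toy labels are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Eq. (1): H(X) = -sum p(X_i) * log2 p(X_i), with p estimated from class frequencies."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


balanced = entropy(["pass", "fail"])              # 1.0: maximally mixed
pure = entropy(["pass"] * 4)                      # 0.0: a single class
mixed = entropy(["pass"] * 9 + ["fail"] * 5)      # about 0.940
```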

In the traditional ID3 algorithm, information entropy is the basis for selecting attributes. The entropy is computed from the data for each candidate attribute, the values are compared, and the attribute with the largest information gain (equivalently, the smallest expected entropy after splitting) is chosen as the root node of the decision tree. Splitting the example set into subsets on this attribute minimizes the entropy of the system, so the average path from this non-leaf node to its descendant leaves is expected to be the shortest, and the generated tree has a smaller average depth. The more mixed and disordered the target classes in a training set, the higher its entropy; the clearer and more ordered they are, the lower its entropy. ID3 follows the principle that “the greater an attribute's information gain, the more useful it is for classifying the training examples”, and at each step it selects the attribute in the attribute list that best classifies the set of training examples. The information gain of an attribute is the reduction in system entropy achieved by using that attribute to split the samples. Computing and comparing the information gain of each attribute is therefore the key operation of ID3.
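Information gain, as described above, is the entropy of the whole set minus the size-weighted entropy of the subsets produced by an attribute. The following sketch computes it for categorical attributes and picks the attribute with the largest gain, as ID3 does; the helper names and the four-row dataset are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Entropy of a list of class labels, base 2."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Gain(A) = H(S) - sum_v |S_v|/|S| * H(S_v)."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attribute]].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3's choice: the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical data: attribute "a" separates the classes perfectly, "b" does not.
rows = [{"a": "x", "b": "p"}, {"a": "x", "b": "q"},
        {"a": "y", "b": "p"}, {"a": "y", "b": "q"}]
labels = ["1", "1", "0", "0"]
chosen = best_attribute(rows, labels, ["a", "b"])   # "a"
```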

The ID3 algorithm was introduced above; the following describes its basic strategy, which provides the basis for the data mining implementation in the later sections.

1)

The tree starts as a single node representing all the training samples.

2)

If all the samples belong to the same class, the node becomes a leaf of the tree and is labeled with that class.

3)

If the samples are not all in the same class, an entropy-based measure called information gain is used as heuristic information to select the attribute that best separates the samples, and this attribute becomes the test or decision attribute of the node.

4)

A branch is created for each value of the new test attribute, and the samples are partitioned accordingly.

5)

The same process is applied recursively to form a decision tree for the samples in each partition. Once an attribute has been used at a node, it is not considered again in any of that node's descendants.

6)

The recursive process stops when any one of the following conditions is met:

(1)

All samples at the given node belong to the same class.

(2)

No remaining attributes can be used to further partition the samples; in this case the node becomes a leaf of the decision tree, labeled with the majority class of its samples.

(3)

A branch contains no samples; in this case a leaf is created and labeled with the largest (majority) class among the training samples.
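Putting steps 1) to 6) together, a compact recursive sketch of ID3 for categorical attributes might look like this. It redefines entropy and information gain so that the block runs on its own, represents the tree as nested dictionaries, and labels an empty branch with the majority class of its parent node, which is one common reading of condition (3). The `domains` argument (the list of possible values for each attribute) and all names here are assumptions, not part of the original method.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Eq. (1) with class frequencies as probabilities."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition(rows, labels, attribute):
    """Group (rows, labels) by the value each row takes on `attribute`."""
    groups = defaultdict(lambda: ([], []))
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = groups[row[attribute]]
        sub_rows.append(row)
        sub_labels.append(label)
    return groups


def gain(rows, labels, attribute):
    """Entropy before the split minus the size-weighted entropy after it."""
    n = len(labels)
    after = sum(len(ls) / n * entropy(ls)
                for _, ls in partition(rows, labels, attribute).values())
    return entropy(labels) - after


def id3(rows, labels, attributes, domains, parent_majority=None):
    """Return a nested-dict tree: {"attribute": a, "branches": {value: subtree}}."""
    if not labels:                        # stop (3): empty branch
        return parent_majority
    if len(set(labels)) == 1:             # step 2 / stop (1): a single class
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attributes:                    # stop (2): nothing left to split on
        return majority
    best = max(attributes, key=lambda a: gain(rows, labels, a))   # step 3
    remaining = [a for a in attributes if a != best]              # step 5
    groups = partition(rows, labels, best)
    branches = {}
    for value in domains[best]:           # step 4: one branch per attribute value
        sub_rows, sub_labels = groups.get(value, ([], []))
        branches[value] = id3(sub_rows, sub_labels, remaining, domains, majority)
    return {"attribute": best, "branches": branches}


# Hypothetical toy data; attribute names and values are illustrative only.
rows = [
    {"endurance": "good", "strength": "good"},
    {"endurance": "good", "strength": "poor"},
    {"endurance": "poor", "strength": "good"},
    {"endurance": "poor", "strength": "poor"},
]
labels = ["pass", "pass", "pass", "fail"]
domains = {"endurance": ["good", "poor", "average"], "strength": ["good", "poor"]}
tree = id3(rows, labels, ["endurance", "strength"], domains)
```

Ties in information gain are broken by the order of the attribute list, because Python's `max` returns the first maximal element; the unseen value `"average"` produces an empty branch that falls back to the parent's majority class.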

2.3.3

Improvements to the ID3 algorithm

In this section, ID3 is improved to increase its computation speed.

1)

Calculation of information entropy

From the ID3 learning algorithm in the previous section, computing the information gain of an attribute relies mainly on the information entropy formula, which requires many calls to the logarithm function and is therefore computationally expensive and time-consuming. This paper therefore improves the entropy formula using a power series expansion, transforming it into an approximate formula that involves only addition, subtraction, multiplication and division. The derivation is as follows:

Let $p_i$ be written as:

$$p_i = \frac{1 - s_i}{1 + s_i}, \qquad 0 < p_i \le 1,\; 0 \le s_i < 1 \tag{2}$$

Solving Eq. (2) for $s_i$ gives:

$$s_i = \frac{1 - p_i}{1 + p_i} \tag{3}$$

Substituting Eq. (2) into Eq. (1), writing $p_i = p(X_i)$ and n for the number of classes, yields:

$$-\sum_{i=1}^{n} \frac{1 - s_i}{1 + s_i} \log_2 \frac{1 - s_i}{1 + s_i} = -\sum_{i=1}^{n} \frac{1 - s_i}{1 + s_i} \cdot \frac{\ln \frac{1 - s_i}{1 + s_i}}{\ln 2} \tag{4}$$

Using the power series expansion $\ln \frac{1 - s}{1 + s} = -2\left(s + \frac{s^3}{3} + \frac{s^5}{5} + \frac{s^7}{7} + \cdots\right)$, valid for $|s| < 1$, Eq. (4) can be written as:

$$\rho = \frac{2}{\ln 2} \sum_{i=1}^{n} \frac{1 - s_i}{1 + s_i} \left(s_i + \frac{1}{3} s_i^3 + \frac{1}{5} s_i^5 + \frac{1}{7} s_i^7 + \cdots\right) \tag{5}$$

The more terms of the power series are kept, the more accurate the formula. Since the computation is only used to compare magnitudes, taking the first two terms is sufficient. The approximate formula is thus:

$$\rho = \frac{2}{\ln 2} \sum_{i=1}^{n} \frac{1 - s_i}{1 + s_i} \left(s_i + \frac{1}{3} s_i^3\right) = \frac{2}{3 \ln 2} \sum_{i=1}^{n} \frac{s_i (1 - s_i)(3 + s_i^2)}{1 + s_i} \tag{6}$$

Substituting Eq. (3) into Eq. (6) yields:

$$\rho = \frac{8}{3 \ln 2} \sum_{i=1}^{n} \frac{p_i (1 - p_i^3)}{(1 + p_i)^3} \tag{7}$$

Substituting $p_i = \frac{x_i}{x}$, where $x_i$ is the number of samples in class i and x is the total number of samples, into Eq. (7) yields:

$$\rho = \frac{8}{3 x \ln 2} \sum_{i=1}^{n} \frac{x_i (x^3 - x_i^3)}{(x + x_i)^3} \tag{8}$$

The factor $\frac{8}{3 \ln 2}$ in the above equation is a constant and can be omitted when only comparing magnitudes, so the final formula for computing the information entropy is:

$$\rho = \frac{1}{x} \sum_{i=1}^{n} \frac{x_i (x^3 - x_i^3)}{(x + x_i)^3} \tag{9}$$
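To see how closely the approximation tracks the exact entropy, the sketch below evaluates Eq. (1), Eq. (9), and Eq. (9) rescaled by the omitted constant (that is, Eq. (8)) on a few hypothetical class-count vectors. It is only a sanity check on toy counts: it does not measure speed, and because the truncated series always underestimates $-\ln p_i$, increasingly so as $p_i$ approaches 0, agreement in ranking is not guaranteed for every pair of nodes.

```python
import math


def exact_entropy(counts):
    """Eq. (1): H = -sum p_i log2 p_i with p_i = x_i / x (zero counts contribute 0)."""
    x = sum(counts)
    return sum(-(c / x) * math.log2(c / x) for c in counts if c > 0)


def approx_entropy(counts):
    """Eq. (9): rho = (1/x) * sum x_i (x^3 - x_i^3) / (x + x_i)^3, no logarithms."""
    x = sum(counts)
    return sum(c * (x**3 - c**3) / (x + c) ** 3 for c in counts) / x


def approx_entropy_scaled(counts):
    """Eq. (8): Eq. (9) times the dropped constant 8 / (3 ln 2), on the scale of Eq. (1)."""
    return 8 / (3 * math.log(2)) * approx_entropy(counts)


# Hypothetical class counts at three candidate nodes.
nodes = {"balanced": [5, 5], "skewed": [9, 5], "pure": [14, 0]}
for name, counts in nodes.items():
    print(name, round(exact_entropy(counts), 4), round(approx_entropy_scaled(counts), 4))
```

On these counts the rescaled approximation stays within about 0.01 of the exact entropy and preserves the ordering balanced > skewed > pure.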

Experiments show that Eq. (9) is faster to evaluate than Eq. (1), and the larger the dataset, the larger the time difference. Because information entropy is computed many times while generating a decision tree, using Eq. (9) saves time and improves the efficiency of tree generation.

2)

Create decision tree

When creating a decision tree, the attribute with the largest information gain is selected as the splitting node. Since the information content $I(x_1, x_2, \ldots, x_n)$ of the sample set is fixed, an attribute's information gain depends mainly on its information entropy: the larger the information gain, the smaller the corresponding entropy. Therefore, when choosing a split node, the attribute with the smallest information entropy is selected as the root node, and the same procedure is applied downward to select all branch nodes and produce the decision tree.
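A sketch of this selection rule, using Eq. (9) in place of the exact entropy, is shown below. Weighting each subset's value by its share of the samples follows the usual ID3 definition of expected entropy and is an assumption here; the attribute names and class counts are hypothetical.

```python
def approx_entropy(counts):
    """Eq. (9) for one node: counts[i] is the number of samples in class i."""
    x = sum(counts)
    return sum(c * (x**3 - c**3) / (x + c) ** 3 for c in counts) / x


def split_score(partitions):
    """Size-weighted Eq. (9) over the subsets produced by one attribute (smaller is better)."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * approx_entropy(p) for p in partitions if sum(p) > 0)


def choose_split(candidates):
    """Pick the attribute whose split has the smallest approximate entropy."""
    return min(candidates, key=lambda a: split_score(candidates[a]))


# Hypothetical [pass, fail] counts in each subset produced by two attributes.
candidates = {
    "endurance": [[2, 0], [1, 1]],   # one pure subset, one mixed subset
    "noise": [[1, 1], [1, 1]],       # both subsets mixed
}
chosen = choose_split(candidates)    # "endurance"
```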

3

Analysis and research on the performance of university students in physical fitness tests

3.1

Descriptive analysis of college students’ physical fitness test

3.1.1

Subject of the study

This paper takes 15637 first- to fourth-year students at a university as the research object and uses their test results under the Standard at the end of 2023 as the data source; of the students tested, 8406 were male and 7231 female. For each student, 10 items of basic information, 8 items of test environment information and 12 items of physical fitness test data were collected, of which only the physical function indicators, physical fitness indicators and overall evaluation were used for data mining. The 2023 test data were stored in SAP format and in text format.

3.1.2

Sample data

This paper focuses on the students' physical fitness test data for 2023: 15637 records in total, 8406 for boys and 7231 for girls. Frequency and descriptive analyses were performed in SPSS 22.0, and histograms of the total score distribution were plotted for male students (Figure 4) and female students (Figure 5).

Figures 4 and 5 show that the distribution of total scores is approximately normal for boys but uneven for girls.

Table 1 lists the statistics for each test item. As the mean values in Table 1 show, the mean physical fitness score is 73.84 for girls and 70.46 for boys. Most girls' results fall between passing and good, while some groups of boys' results fall in the failing range. The boys' lower overall performance is mainly due to their low mean scores in the pull-up (29.61) and the standing long jump (54.28).

Table 1.

Score table for each item

Item	Male N	Male Min	Male Max	Male Mean	Male SD	Female N	Female Min	Female Max	Female Mean	Female SD
Vital capacity	8406	0	100	89.32	16.77	7231	0	100	82.02	14.33
50m run	8406	0	100	73.46	9.80	7231	0	100	71.52	9.51
Standing long jump	8406	0	100	54.28	25.16	7231	0	100	69.86	17.25
Sit and reach	8406	0	100	62.03	17.35	7231	0	100	84.57	12.06
Pull-up (M) / sit-up (F)	8406	0	100	29.61	35.27	7231	0	100	71.35	14.38
1000m (M) / 800m (F) run	8406	0	100	63.68	19.24	7231	0	100	60.49	20.66
Health grade (total score)	8406	30.26	112.43	70.46	12.82	7231	23.58	111.62	73.84	10.59

3.2

Analysis of physical fitness clustering results

The clustering results fall into four categories, A, B, C and D. For each variable, the mean value of the variable in each category is used as that category's weight, and the four weights of a variable sum to 1.0000. A larger value indicates better performance of that category of students on the item, and these values are used to compare and analyze the four categories variable by variable.
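In Tables 2 and 3 each row sums to 1, which is consistent with dividing each category's mean score on an item by the sum of the four category means. The sketch below shows that reading of the weighting; it is an assumption about the procedure, and the scores and cluster labels are hypothetical.

```python
def cluster_proportions(scores, labels, clusters=("A", "B", "C", "D")):
    """For one test item: each cluster's mean score divided by the sum of the cluster means."""
    means = []
    for c in clusters:
        members = [s for s, lab in zip(scores, labels) if lab == c]
        means.append(sum(members) / len(members))
    total = sum(means)
    return {c: m / total for c, m in zip(clusters, means)}


# Hypothetical scores on one item for eight students already assigned to clusters.
scores = [80, 90, 60, 70, 50, 50, 40, 60]
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]
row = cluster_proportions(scores, labels)   # A 0.34, B 0.26, C 0.20, D 0.20
```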

3.2.1

Post-clustering performance of male data

Table 2 summarizes the proportions of each physical fitness item across the four clusters of boys, A, B, C and D. Boys in category D have the smallest value on every variable. Category A performs best in the 1000m run, 50m run, standing long jump and pull-up, while category D performs worst in all four. Category B performs best in sit and reach and vital capacity, again with category D worst. In the 1000m and 50m runs, category B performs well (second only to A), while categories C and D perform poorly. In the pull-up, 1000m and 50m events, the order is A first, B second, C third and D last. In sit and reach and vital capacity, category B performs best, category C second, and category D worst.

Table 2.

Proportions of physical fitness items for male students

Item	A	B	C	D
Vital capacity	0.227	0.286	0.273	0.214
50m run	0.338	0.271	0.228	0.163
Standing long jump	0.281	0.258	0.263	0.198
Sit and reach	0.240	0.277	0.248	0.235
Pull-up	0.293	0.270	0.221	0.216
1000m run	0.304	0.286	0.226	0.184

3.2.2

Post-clustering performance of girls’ data

Table 3 summarizes the proportions of each physical fitness item across the four clusters of girls, A, B, C and D. As Table 3 shows, the distribution of values across categories is more complex for girls than for boys. In the 50m and 800m runs, category D performs best and category B worst. Categories C and D, which perform better in sit and reach and vital capacity, score clearly lower than categories A and B in the sit-up, where A and B are stronger. The standing long jump follows the same pattern as the sit-up, with category A best and category D worst. Overall, among girls, those who perform well in sit and reach also tend to perform well in the 800m run.

Table 3.

Proportions of physical fitness items for female students

Item	A	B	C	D
Vital capacity	0.246	0.201	0.288	0.265
50m run	0.233	0.211	0.267	0.289
Standing long jump	0.295	0.263	0.229	0.213
Sit and reach	0.236	0.192	0.317	0.255
Sit-up	0.306	0.245	0.232	0.217
800m run	0.231	0.223	0.254	0.292

3.2.3

Data clustering conclusions

In this paper, the K-means method is used to cluster college students' physical fitness test data, and the mean values of each variable in the different categories are used as weights to visualize the trends of the variables across categories; the aim is to use K-means to uncover intrinsic connections between the variables. Studying the categories of male and female students separately leads to the following conclusions:

1)

Strength and flexibility show a trade-off, with one rising as the other falls. To address this, physical education should strengthen combined strength and flexibility training while correcting students' attitudes toward training.

2)

The test indicators of male and female students vary very differently across clusters. Physical education should therefore differentiate training methods for male and female students.

3.3

Early warning analysis for decision-making

The physical fitness test scores of the 15,637 students were further preprocessed by gender. For boys, the input variables are the weight, lung capacity, 50-meter run, standing long jump, sit and reach, 1,000-meter run and pull-up scores; for girls, they are the weight, lung capacity, 50-meter run, standing long jump, sit and reach, 800-meter run and one-minute sit-up scores. The target variable is whether the total score is passing, and the physical fitness warning has two levels: passing is recorded as 1 and failing as 0. There are 8,406 male and 7,231 female students, the data are complete, each item is scored on a 100-point scale, and 60 is the passing line. The fields are explained below:

“Weight Score” is expressed as “TZ”, “Spirometry Score” as “FHL”, “50m Run Score” as “WSM”, “Standing Long Jump Score” as “LDTY”, “Sit-and-Reach Score” as “ZWT”, “800m Run Score” as “BBM”, “1000m Run Score” as “YQM”, “Sit-up Score” as “YWQZ”, and “Pull-Up Score” as “YTXS”.

3.3.1

ID3 Classification Tree Physical Fitness Early Warning Modeling for Boys

The boys' dataset has 8406 records. The decision tree is pruned by tuning its parameters: cross-validation is used to calculate the Gini index of the candidate subtrees on each test fold, and the subtree with the smallest Gini index is selected as the optimal subtree.
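The Gini index mentioned above is $1 - \sum_k p_k^2$ over the classes at a node. The sketch below shows how a single threshold test of the kind used in the rules that follow (for example FHL ≤ 70) could be chosen by minimizing the size-weighted Gini index of the two child nodes. It illustrates CART-style split selection only, not the cross-validated pruning or the software used in this study, and the scores are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index 1 - sum_k p_k^2 of a set of labels (here 1 = pass, 0 = fail)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(scores, labels):
    """Best binary split 'score <= t' on one item score, by size-weighted Gini index."""
    n = len(labels)
    best = None
    for t in sorted(set(scores))[:-1]:      # the largest value cannot split anything
        left = [y for s, y in zip(scores, labels) if s <= t]
        right = [y for s, y in zip(scores, labels) if s > t]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical item scores and pass (1) / fail (0) outcomes for six students.
scores = [50, 60, 65, 70, 80, 90]
outcome = [0, 0, 0, 1, 1, 1]
split = best_threshold(scores, outcome)     # (65, 0.0)
```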

Analysis of the relationship between tree depth and error rate shows that the deeper the tree, the lower the error rate, that is, the higher the classification accuracy. Because of overfitting, however, the depth cannot be increased indefinitely. Weighing both factors, a maximum depth of 3 was chosen, at which the accuracy on the dataset is 95.6%, indicating that the ID3 classification tree model for boys' physical fitness and health warning has good applicability. The final ID3 classification tree is shown in Figure 6.

The above ID3 classification tree is converted into a set of if-then physical fitness warning rules with corresponding explanations (following the left-to-right order of the leaf nodes):

Rule 1: if FHL ≤ 70 and TZ ≤ 90 and YQM ≤ 65, then the leaf node is 0.

Explanation: A student whose lung capacity score is at or below 70, weight score at or below 90 and 1000-meter run score at or below 65 fails overall. This means that a fairly high weight score alone does not guarantee a passing total.

Rule 2: if FHL ≤ 70 and TZ ≤ 90 and YQM > 65, then the leaf node is 0.

Explanation: A student whose spirometry score is at or below 70, weight score at or below 90 and 1000-meter run score above 65 also fails overall. Combined with Rule 1, this shows that when lung capacity is low, the total score may still fail even if the weight and 1000-meter run scores are relatively high.

Rule 3: if FHL ≤ 70 and TZ > 90 and LDTY ≤ 50, then the leaf node is 0.

Explanation: A student whose lung capacity score is at or below 70, weight score above 90 and standing long jump score at or below 50 fails overall. Compared with Rule 1, this shows that even with a weight score above 90, a student whose spirometry score is at or below 70 still fails when the standing long jump score is also low.

Rule 4: if FHL ≤ 70 and TZ > 90 and LDTY > 50, then the leaf node is 1.

Explanation: A student whose spirometry score is at or below 70, weight score above 90 and standing long jump score above 50 passes overall. The rule shows that a student can fail an individual test item and still pass overall.

Rule 5: if FHL > 70 and LDTY ≤ 40 and YQM ≤ 70, then the leaf node is 0.

Explanation: A student whose spirometry score is above 70, standing long jump score at or below 40 and 1000-meter run score at or below 70 fails overall. Despite the importance of lung capacity, a very low standing long jump score together with a modest 1000-meter run score still produces a failing total.

Rule 6: if FHL > 70 and LDTY > 40 and TZ ≤ 75, then the leaf node is 1.

Explanation: A student whose lung capacity score is above 70, standing long jump score above 40 and weight score at or below 75 passes overall. Comparing this with Rule 1 further suggests that the weight score matters less than the other two items.
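The six boys' rules can be encoded directly as nested conditions. This sketch copies the thresholds exactly as listed; the function name and the dict-of-field-codes input are hypothetical. Because Rules 1 to 6 do not cover every region (for example FHL > 70, LDTY > 40 and TZ > 75), the function returns `None` where no listed rule applies, and the full tree in Figure 6 would be needed there. Only the items on the matched path are read, so a student missing an item that the path does not test can still be classified.

```python
def boys_warning(s):
    """Apply boys' Rules 1-6; return 1 (pass), 0 (fail) or None (no listed rule).

    s: dict of item scores keyed by field code (FHL, TZ, YQM, LDTY).
    """
    if s["FHL"] <= 70:
        if s["TZ"] <= 90:
            # Rules 1 and 2 split on YQM at 65 but both end in a fail leaf,
            # so the 1000 m score is not needed on this path.
            return 0
        return 0 if s["LDTY"] <= 50 else 1  # Rules 3 and 4
    if s["LDTY"] <= 40:
        return 0 if s["YQM"] <= 70 else None  # Rule 5; YQM > 70 is not listed
    return 1 if s["TZ"] <= 75 else None  # Rule 6; TZ > 75 is not listed
```

For instance, `boys_warning({"FHL": 60, "TZ": 95, "LDTY": 70})` returns 1 even though no 1000-meter run score is supplied.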

3.3.2

ID3 Classification Tree Physical Fitness Early Warning Modeling for Female Students

The female dataset contains 7,231 records. As for boys, the decision tree is pruned by adjusting its parameters: cross-validation is used to compute the Gini index of the subtrees on each test set, and the subtree with the smallest Gini index is selected as the optimal subtree.

Analysis of the relationship between tree depth and error rate again shows that, because of overfitting, the depth cannot be increased indefinitely, so a maximum depth of 4 was chosen. At this depth the accuracy is 98.5% on the training set and 99.0% on the test set, indicating that the ID3 classification tree model is also well suited to girls' physical health warning. The final ID3 classification tree is shown in Figure 7.

The above ID3 classification tree is converted into a set of if-then physical fitness warning rules with corresponding explanations (following the left-to-right order of the leaf nodes):

Rule 1: if BBM ≤ 40 and WSM ≤ 65 and YWQZ ≤ 80, then the leaf node is 0.

Explanation: A student whose 800-meter run score is at or below 40, 50-meter run score at or below 65 and sit-up score at or below 80 fails overall. When both the 800-meter and 50-meter run scores are low, the likelihood of failing the total score is very high.

Rule 2: if BBM ≤ 40 and WSM ≤ 65 and YWQZ ≤ 80 and WSM > 50, then the leaf node is 0.

Explanation: A student whose 800-meter run score is at or below 40, 50-meter run score above 50 but at or below 65 and sit-up score at or below 80 fails overall. The same conclusion as for Rule 1 applies.

Rule 3: if BBM ≤ 40 and WSM ≤ 65 and YWQZ > 80, then the leaf node is 1.

Explanation: A student whose 800-meter run score is at or below 40, 50-meter run score at or below 65 and sit-up score above 80 passes overall. When the 800-meter and 50-meter run scores are not high, the sit-up score must reach the good level or above for the total score to pass.

Rule 4: if BBM ≤ 20 and WSM > 65 and YWQZ ≤ 50, then the leaf node is 0.

Explanation: A student whose 800-meter run score is at or below 20, 50-meter run score above 65 and sit-up score at or below 50 fails overall. With a very low 800-meter run score and a failing sit-up score, the total score is almost certainly a fail.

Rule 5: if BBM ≤ 40 and WSM > 65 and YWQZ ≤ 50 and BBM > 20, then the leaf node is 0.

Explanation: A student whose 800-meter run score is above 20 but at or below 40, 50-meter run score above 65 and sit-up score at or below 50 fails overall. As in Rule 4, the 800-meter run score is still low, so the total score fails.

Rule 6: if BBM ≤ 40 and WSM > 65 and YWQZ > 50 and LDTY > 60, then the leaf node is 1.

Explanation: A student whose 800-meter run score is at or below 40, 50-meter run score above 65, sit-up score above 50 and standing long jump score above 60 passes overall. The comparison suggests that 60 points in the standing long jump is a critical point: above it, the total score passes.

Rule 7: if BBM > 40 and WSM ≤ 25 and LDTY ≤ 60 and BBM ≤ 67, then the leaf node is 0.

Explanation: A student whose 800-meter run score is above 40 but at or below 67, 50-meter run score at or below 25 and standing long jump score at or below 60 fails overall. Even when the 800-meter run score is near or within the passing range, low scores on the other items lead to a failing total.

The rules above show that more individual items affect girls' total scores than boys', and that a girl can score low on some items and still pass overall. Among the 15 rules, failing rules outnumber passing rules, and in the passing rules the split-point values of the features are relatively low, which is not apparent from the scoring formula itself. With this early warning model, girls can enter their routine training results for the 800-meter run, 50-meter run, sit-ups, standing long jump and other items; the model matches them against the classification rules and outputs a prediction. Girls predicted to fail receive an early warning and should step up their regular exercise.
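The girls' rules and the early-warning step described above can be sketched as an ordered rule list. The thresholds are copied from Rules 1 to 7; the names `GIRLS_RULES`, `girls_warning` and `warnings_for` and the input format are hypothetical, and each score dict is assumed to contain BBM, WSM, YWQZ and LDTY. Rules are checked in the listed order and the first match wins. As written, Rule 2 is a special case of Rule 1 and both give 0, so in this ordering Rule 1 always fires first; students matching no listed rule get `None`.

```python
# Each entry: (rule name, condition on a dict of scores, leaf label), in listed order.
GIRLS_RULES = [
    ("Rule 1", lambda s: s["BBM"] <= 40 and s["WSM"] <= 65 and s["YWQZ"] <= 80, 0),
    ("Rule 2", lambda s: s["BBM"] <= 40 and 50 < s["WSM"] <= 65 and s["YWQZ"] <= 80, 0),
    ("Rule 3", lambda s: s["BBM"] <= 40 and s["WSM"] <= 65 and s["YWQZ"] > 80, 1),
    ("Rule 4", lambda s: s["BBM"] <= 20 and s["WSM"] > 65 and s["YWQZ"] <= 50, 0),
    ("Rule 5", lambda s: 20 < s["BBM"] <= 40 and s["WSM"] > 65 and s["YWQZ"] <= 50, 0),
    ("Rule 6", lambda s: s["BBM"] <= 40 and s["WSM"] > 65 and s["YWQZ"] > 50
     and s["LDTY"] > 60, 1),
    ("Rule 7", lambda s: 40 < s["BBM"] <= 67 and s["WSM"] <= 25 and s["LDTY"] <= 60, 0),
]


def girls_warning(scores):
    """Return (rule name, leaf) for the first matching rule, else (None, None)."""
    for name, cond, leaf in GIRLS_RULES:
        if cond(scores):
            return name, leaf
    return None, None


def warnings_for(students):
    """Return the IDs of students whose matched rule predicts a failing total."""
    return [sid for sid, s in students.items() if girls_warning(s)[1] == 0]
```

Feeding each girl's latest training scores through `warnings_for` yields the list of students who would receive an early warning.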

4

Conclusion

This study analyzes the physical fitness and health test results of college students at one school, using a physical fitness clustering method and constructing ID3 classification tree early warning models for male and female students. A descriptive analysis of the test results shows that the mean physical fitness score is 73.84 for female students and 70.46 for male students. The total scores of male students are approximately normally distributed, whereas those of female students are unevenly distributed.

The physical fitness clustering results show that among boys, groups A and B perform better and groups C and D perform worse, whereas among girls, groups C and D perform better on the test items and groups A and B perform worse. This indicates significant differences in how the test indicators vary between the male and female groups. In both groups, strength and flexibility show a trade-off in which one quality tends to rise as the other falls.

Analysis of the decision tree warning model shows that lung capacity is the most important physical fitness test item for male students, and that the weight, standing long jump and 1000-meter run scores are directly related to whether the final total score passes. For female students, the most important item is the 800-meter run, and the scores of the 50-meter run, lung capacity, standing long jump, 800-meter run and sit-ups are directly related to whether the final total score passes. The ID3 classification tree early warning model shows that more individual items affect girls' total scores than boys', and that a student can score low on some items yet still pass overall. Because the model does not need the results of every test item to judge whether the total score passes, it can also be applied to students who missed an item and therefore lack a score for it.

In conclusion, this paper explores the development path of physical education teaching informatization in colleges and universities under modern teaching theory. The proposed model is well suited to this informatization effort and offers some guidance for improving the level of teaching.

Language: English

Publication timeframe: once per year
Journal Subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other


The Development Path of Physical Education Teaching Informatization in Colleges and Universities in the Context of Modern Teaching Theory

Dapeng Sun

Published Online: Mar 24, 2025

Received: Oct 26, 2024

Accepted: Feb 15, 2025

DOI: https://doi.org/10.2478/amns-2025-0747

Keywords: Modern pedagogy, Physical education teaching informatization, Physical fitness clustering method, Decision tree, ID3 algorithm

© 2025 Dapeng Sun, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
