A Study on the Optimal Design of Reinforcement Learning-Driven Personalized Physical Training Strategies in Physical Education Instruction
Published: 21 Mar 2025
Received: 30 Oct 2024
Accepted: 07 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0601
© 2025 Yinghui Jiang et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Physical education has always been an indispensable part of school education, and with the development of society and people’s deepening concern for health, its importance has become increasingly prominent. As an important form of physical exercise, the application of physical fitness training in physical education has also received growing attention [1-4]. With increasingly fierce social competition, the pressure faced by students keeps rising, and their physical quality and fitness level are drawing more and more concern [5-7]. To improve students’ physical quality and fitness level, personalized physical training has emerged; it focuses not only on improving each student’s individual physical quality but also on cultivating students’ comprehensive abilities and teamwork spirit [8-11].
Individualized physical training refers to planned, organized and targeted sports training aimed at improving body functions, enhancing physical fitness, improving athletic ability and adapting to a given sports load [12-14]. Personalized physical training is an important form of physical exercise: by scientifically selecting sports, mastering training methods and following a reasonable training program, it can effectively improve the function of the body’s organs, strengthen the body’s capabilities, improve sports performance and reduce the risk of sports injuries [15-18]. In physical education, personalized physical training is one of the most important means of cultivating students’ comprehensive quality. Through physical training, students can comprehensively improve their physical quality, enhance their fitness level, and improve their motor skills and their ability to adapt to various sports programs, so that they can better participate in sports, enjoy them and improve their quality of life [19-22].
Existing recommender systems cannot perform dynamic modeling and their recommendations lack timeliness. Based on this, this paper proposes a personalized recommendation model based on reinforcement learning for recommending physical training strategies. The proposed model can better capture users’ real-time dynamic interests within a reinforcement learning process. Softmax is used to convert each action into a selection probability, and a pseudo-twin component is designed to compute rewards and remove noisy interaction records so as to achieve more accurate recommendations. After the rewards are calculated, they are ranked and the top-ranked items are selected as the final recommendation. Finally, the effectiveness of the recommendation algorithm is verified on the mL-1m and mL-100k datasets.
The SOM neural network [23] is an unsupervised, competitive-learning network whose basic idea is as follows:
When a certain type of data is input, one neuron node in the output layer receives the maximum stimulus and wins, and the connection weight vectors of the winning node and its surrounding nodes are adjusted in the direction of the input vector. When the input data changes, the winning position shifts from the original winning node to other nodes. From the output state of the SOM neural network, the distribution characteristics of all the sample data can be obtained, and each output neuron node comes to represent a certain type of pattern, so that the class to which an input vector belongs can be judged.
The SOM neural network is a two-layer network consisting of an input layer and an output layer (also called the competition layer), and the neurons of the two layers are fully interconnected. The input layer receives input information from the outside world, and its number of neurons equals the dimension of the input data. The output layer processes the input information, and each of its neurons has a weight vector. After receiving input data, the network determines the winning neuron in the output layer according to the winner-takes-all (WTA) rule; once the neuron weights have stabilized through continued training on the input data, the position of the input data in the low-dimensional space is determined. Usually, each neuron represents one clustered category.
The neurons in the output layer have a topological relationship with each other, and different network topologies can be utilized to meet different needs. In this regard, Fig. 1 shows the topology of SOM neural network, Fig. 1(a) shows the one-dimensional line array of SOM neural network, and Fig. 1(b) shows the two-dimensional planar array of SOM neural network.

Structure of the SOM neural network
The training goal of the SOM neural network is to find appropriate weight vectors for each output layer neuron in order to maintain the topology. The three phases of SOM training are:
In the SOM neural network, the similarity between the input vector and a neuron’s weight vector is the basis on which the network classifies or clusters the input vectors, and it is also the key to finding the winning neuron. Similarity is usually measured by the distance between the two vectors: the larger the distance, the lower the similarity. After the input layer receives an input vector, the network calculates the Euclidean distance between that vector and each neuron’s weight vector, and the neuron with the smallest distance (i.e., the largest similarity) is chosen as the winning neuron.
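As a concrete illustration, the following minimal NumPy sketch implements this competition phase; the function name and array shapes are assumptions for illustration, not the paper's code.

```python
import numpy as np

def find_winner(x, weights):
    """Competition phase: return the index of the output-layer neuron whose
    weight vector is closest to the input vector x (smallest Euclidean
    distance, i.e. highest similarity).

    x       : (d,)   input vector
    weights : (m, d) one weight vector per output-layer neuron
    """
    distances = np.linalg.norm(weights - x, axis=1)  # Euclidean distance to every neuron
    return int(np.argmin(distances))                 # winning neuron
```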
Assume that there is
In the cooperation phase, the winning neuron cooperates with its neighboring neurons and becomes the adjustment center, forming the superior (winning) neighborhood. The strength of this cooperation should decrease as the lateral distance from the winning neuron increases, so a Gaussian function is usually chosen to compute it.
Let
The adaptive phase updates the weight vectors of all neurons in the superior neighborhood, centered on the winning neuron; the degree of adjustment is determined by the lateral distance to the winning neuron, with closer neurons adjusted more strongly. When the learning of the entire network is completed, the competition-layer neurons can recognize similar input patterns in the sample data set, so this stage is the key to classifying and clustering the sample data. The adjustment formula for a neuron’s weight vector is:
The specific implementation steps of the SOM neural network algorithm are described as follows:
Step 1: Initialize the network parameters: set the number of nodes in the input and output layers and the initial learning rate.
Step 2: Randomly select an input vector from the sample data set and normalize it.
Step 3: Determine the winning neuron in the output layer by calculating the Euclidean distance between the sample and each neuron’s weight vector.
Step 4: Define the winning neighborhood around the winning neuron.
Step 5: Update the learning rate and the neighborhood radius for the next iteration and renormalize the learned weight vectors.
Step 6: Continue iterating until the learning rate decays to 0 or the maximum number of iterations is reached.
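The six steps can be tied together in a short training loop. The sketch below is a minimal, illustrative implementation under assumed parameter values and exponential decay schedules for the learning rate and neighbourhood radius; it is not the exact configuration used in the study.

```python
import numpy as np

def train_som(data, grid_shape=(5, 5), eta0=0.5, sigma0=2.0, n_iter=1000, seed=0):
    """Minimal SOM training loop following Steps 1-6 above (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_neurons, d = grid_shape[0] * grid_shape[1], data.shape[1]
    # Step 1: initialise and normalise the weight vectors
    weights = rng.random((n_neurons, d))
    weights /= np.linalg.norm(weights, axis=1, keepdims=True)
    # grid coordinates of each output neuron (two-dimensional planar array, Fig. 1(b))
    coords = np.array([(i, j) for i in range(grid_shape[0])
                       for j in range(grid_shape[1])], dtype=float)

    for t in range(n_iter):
        eta = eta0 * np.exp(-t / n_iter)       # Step 5: decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)   # shrinking neighbourhood radius
        # Step 2: randomly select and normalise an input vector
        x = data[rng.integers(len(data))]
        x = x / np.linalg.norm(x)
        # Step 3: winning neuron = smallest Euclidean distance
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Step 4: Gaussian winning neighbourhood based on lateral grid distance
        lateral = np.linalg.norm(coords - coords[winner], axis=1)
        h = np.exp(-lateral ** 2 / (2 * sigma ** 2))
        # Adaptive phase: pull weights in the superior neighbourhood towards x, then renormalise
        weights += eta * h[:, None] * (x - weights)
        weights /= np.linalg.norm(weights, axis=1, keepdims=True)
    return weights  # Step 6: stop after n_iter iterations (or once the learning rate has decayed)
```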
The flow of the SOM neural network algorithm is shown in Figure 2.

Flow diagram of SOM neural network
The SOM algorithm utilizes self-organizing features to group data by mapping high-dimensional complex samples onto low-dimensional neuron arrays, but there are some shortcomings in the SOM neural network algorithm.
SOM lacks a specific objective function and does not guarantee convergence. Almost all learning algorithms are ultimately reduced to optimization problems: to reach the final goal, an objective function is constructed and the model is solved. The SOM algorithm, however, lacks a reasonable objective function, which makes its learning process somewhat blind; parameters such as the learning rate, the neighborhood function and the network structure must be preset, and overall convergence is difficult to guarantee.
The number of output neurons may not match the number of clustering categories. In clustering, the number of categories is difficult to predict in advance, and an SOM output neuron does not necessarily correspond to exactly one category; a single category may be merged with or split across several neurons.
SOM neural network measures the similarity between data in terms of Euclidean distance. Therefore, to use the SOM algorithm for data clustering, the following three problems need to be solved: first, set the objective function and optimize the convergence conditions to speed up the convergence of the network; second, organize and optimize the training results of SOM, design a self-clustering method, determine the number of clusters and divide the categories; third, explore whether the combination of SOM neural network and dimensionality reduction algorithm is feasible.
This section details a personalized recommendation model for sports training strategies based on LSTM [24] and reinforcement learning [25]. The LSTM long- and short-term interest layer and reinforcement learning are used to remove recommendation noise and to acquire the user’s time-varying points of interest. The proposed algorithm consists of a long- and short-term interest point acquisition layer, a reinforcement learning decision layer, and a reward output layer; its structure is shown in Figure 3. First, based on the recommendation interaction records, the algorithm selects a number of items that a particular user has interacted with as input to the model. Second, the embedding vectors of these items are fed into the LSTM as 3D vectors to obtain the feature information of each sequence; the agent in reinforcement learning is cascaded with an MLP layer and a Softmax layer, and the sequence dimension is converted into the total number of items to obtain the selection probability of each action. Next, the first

Model architecture of DPRM-LRL
The pseudo-twin component is used to fully capture the interaction between items in both subsequences and output delayed rewards. Finally, a gradient strategy is designed to perform robust gradient estimation.
Long Short-Term Memory Network (LSTM) is a deep learning model commonly used to process sequence data, especially for sequence data with long-term dependencies. It is an extension of recurrent neural network (RNN), which effectively captures and remembers long-term dependencies in sequence data by introducing a gating mechanism, while avoiding the gradient vanishing problem in traditional RNN.
First, the input of the long- and short-term network is constructed: the user’s interaction records are obtained and the resulting item features are grouped into inputs, so that each group can be fed to the LSTM to produce a new prediction. Second, the embedding vector of each item is obtained from the interaction record. Finally, after passing through the LSTM, MLP and Softmax layers, the sequence features predicted for each sequence are generated. At this point the task of the long- and short-term time-varying interest acquisition layer is finished, and the two-dimensional feature vectors of the original sequences are obtained.
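A compact PyTorch sketch of this pipeline is given below. The class name, embedding size, hidden size and layer structure are assumptions for illustration; only the overall flow (item embeddings → LSTM → MLP → Softmax over all items) follows the description above.

```python
import torch
import torch.nn as nn

class InterestLayer(nn.Module):
    """Sketch of the long/short-term interest acquisition layer (illustrative sizes)."""
    def __init__(self, n_items, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, emb_dim)            # item embedding vectors
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, n_items))  # map to the total number of items

    def forward(self, item_seq):
        # item_seq: (batch, seq_len) interacted item ids, in timestamp order
        emb = self.item_emb(item_seq)          # (batch, seq_len, emb_dim): 3D input to the LSTM
        out, _ = self.lstm(emb)                # sequence features for every position
        logits = self.mlp(out[:, -1, :])       # feature of the last step of each sequence
        return torch.softmax(logits, dim=-1)   # selection probability of every action/item
```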
The strategy component uses a multilink level agent module for decision making, where the sequence features are first fed into this module, all actions in each state are sampled, and the top
The actions are defined as all candidate items output after the long- and short-term interest point acquisition layer and the MLP and Softmax layers: each item is defined as an action, also called an arm, and all the actions are represented by the set
Given the definitions of the actions and states of the strategy component, the strategy scheme is as follows. First, starting from the state, the two-dimensional sequence features are fed into the MLP layer, which converts the vector of
After the corresponding item scores are obtained, the probability of each item is computed using Softmax, which converts the feature values into probabilities as follows:
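Assuming the standard Softmax transformation (the original equation is not reproduced here), the score (feature value) $z_i$ of each candidate item $i$ is converted into a selection probability

$$ p_i = \mathrm{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{|A|} e^{z_j}}, \qquad i = 1, \dots, |A|, $$

where $|A|$ denotes the total number of candidate items (actions).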
Given an interaction sequence
Based on the two subsequences generated, they are then aggregated using the pseudo-twin component. Specifically, given subsequences
After obtaining the embeddings of the two subsequences, a multilayer perceptron (MLP) is used to compute their delay rewards. The functions are shown in Eq. (10) and Eq. (11):
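Since the exact functional forms of Eqs. (10) and (11) are not reproduced here, the following PyTorch sketch only illustrates the general idea under assumed dimensions: the two subsequence embeddings share one MLP (hence "pseudo-twin"), which maps each of them to a scalar delayed reward.

```python
import torch
import torch.nn as nn

class PseudoTwinReward(nn.Module):
    """Hedged sketch of the pseudo-twin reward component; layer sizes are
    assumptions, not Eq. (10)/(11) themselves."""
    def __init__(self, emb_dim=128, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(emb_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, 1))  # shared ("twin") reward head

    def forward(self, emb_a, emb_b):
        # emb_a, emb_b: (batch, emb_dim) embeddings of the two subsequences
        r_a = self.mlp(emb_a).squeeze(-1)   # delayed reward of subsequence A
        r_b = self.mlp(emb_b).squeeze(-1)   # delayed reward of subsequence B
        return r_a, r_b
```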
In this model, sequences are selected using a reinforcement learning approach. The goal is to learn a stochastic strategy that maximizes the expected cumulative reward
Specifically, given a list of sampled actions (
Depending on the level of sampling
A pairwise comparison is formed as an additional constraint to fully exploit the information provided by these two subsequences and improve the learning process. Note that binary actions are used in the model and the probability is equal to 1. Therefore, the probability of generating
Based on the probabilities and rewards of the generated subsequences, the traditional policy learning procedure is transformed into a pairwise learning process, and the final gradient of
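As a point of reference, the standard REINFORCE estimator on which such a pairwise scheme is typically built is

$$ \nabla_{\theta} J(\theta) \approx \frac{1}{N} \sum_{n=1}^{N} R_n \, \nabla_{\theta} \log \pi_{\theta}(a_n \mid s_n), $$

where $\pi_{\theta}$ is the Softmax policy over candidate items, $a_n$ a sampled action and $R_n$ the delayed reward returned by the pseudo-twin component; the pairwise scheme described above additionally contrasts the probabilities and rewards of the two sampled subsequences. This is a generic, assumed form rather than the paper's final gradient expression.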
Unlike previous sequence recommendation methods, the advantage of this method is that it provides a new reinforcement learning perspective for sequence recommendation.
Items are fed into a reinforcement learning component, and a pseudo-twin component is designed to perform the computation of rewards as well as the removal of noise for personalized recommendation of sports training strategies.
Calculate their rewards
The strategy network and pseudo-twin network are intertwined, so they need to be trained together. In the entire training process, the parts to be trained are the LSTM part, MLP part, and the neural network layer in the pseudo-twin module.
The standard preprocessing procedure is replicated. For all datasets, the presence of a comment or rating is treated as implicit feedback (i.e., the user interacted with the item), and timestamps are used to determine the order of operations. For splitting, each user’s history sequence is divided using the leave-one-out protocol: (1) the most recent interaction, which is used for testing, and (2) the preceding interaction sequence, which is used for validation.
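A minimal sketch of this splitting protocol, assuming each user's interactions are already sorted by timestamp and that the held-out positions follow the usual leave-one-out convention (latest interaction for testing, the one before it for validation); the function name is hypothetical.

```python
def leave_one_out_split(interactions):
    """Split one user's timestamp-ordered item list into train / validation / test.
    Assumes at least three interactions per user (illustrative only)."""
    train = interactions[:-2]   # everything before the two held-out interactions
    valid = interactions[-2]    # second most recent interaction -> validation
    test = interactions[-1]     # most recent interaction -> test
    return train, valid, test
```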
This paper takes 12,503 students in years one to four of a sports college as the research object and uses their test results under the Standard at the end of 2022 as the data source. Of the students who took the test, 5,518 were male and 6,985 were female. For each student, 10 items of basic information, 9 items of testing-environment information, and 8 items of physical fitness test data were collected; only the physical function indicators, the physical fitness-related indicators, and the overall evaluation were finally selected for data mining.
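As an illustration of how the selected indicators can be turned into the cluster labels reported below, the following sketch assigns each student's normalized test-score vector to the winning neuron of a trained SOM (see the `train_som` sketch above). The function name is hypothetical and the actual clustering pipeline and parameters used in the study may differ; note that the result tables label the output as k-means clusters.

```python
import numpy as np

def assign_clusters(scores, weights):
    """Map each student's test-score vector to a cluster label, defined here
    as the index of the winning (closest) SOM neuron. Illustrative only."""
    scores = scores / np.linalg.norm(scores, axis=1, keepdims=True)        # normalise each row
    dists = np.linalg.norm(scores[:, None, :] - weights[None, :, :], axis=2)
    return dists.argmin(axis=1)                                            # winning neuron per student
```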
After analyzing the clustering results, the girls’ group is divided into 5 categories of students, and the boys’ group is divided into 3 categories of students, each of which presents different characteristics of its physical test indicators. The specific results are analyzed as follows, and the final clustering results of the girls’ and boys’ groups are shown in Table 1 and Table 2, respectively. The overall trends of the clustering results are shown in Figures 4 and 5, respectively.
Table 1. Female group k-means clustering results

| Indicator | Cluster1 | Cluster2 | Cluster3 | Cluster4 | Cluster5 |
|---|---|---|---|---|---|
| Case number | 650 | 710 | 4520 | 895 | 210 |
| Lung capacity score | 85 | 84 | 85 | 78 | 83 |
| 50 m run score | 62 | 68 | 73 | 64 | 37 |
| Standing long jump score | 64 | 65 | 73 | 35 | 27 |
| Sit-and-reach score | 76 | 82 | 84 | 75 | 75 |
| Sit-ups score | 65 | 18 | 71 | 68 | 28 |
| 1000 m / 800 m run score | 34 | 67 | 75 | 65 | 24 |
| Total score | 65.25 | 69.26 | 79.83 | 68.56 | 52.13 |
Table 2. Male group k-means clustering results

| Indicator | Cluster1 | Cluster2 | Cluster3 |
|---|---|---|---|
| Case number | 1630 | 2468 | 1420 |
| Lung capacity score | 84 | 84 | 83 |
| 50 m run score | 81 | 78 | 68 |
| Standing long jump score | 64 | 65 | 16 |
| Sit-and-reach score | 75 | 72 | 63 |
| Pull-ups score | 72 | 7 | 5 |
| 1000 m / 800 m run score | 72 | 63 | 49 |
| Total score | 78.46 | 68.21 | 57.44 |

Mean scores of the female group after clustering

Mean scores of the male group after clustering
The girls in Cluster 3 achieved the highest results on all measures; these students have better overall physical fitness and account for 64.71% of the female students. The students in Cluster 5 had the lowest mean total score, 52.13, which is below the passing line, and lower means on every item except lung capacity and sit-and-reach, indicating that this category is weak in endurance, lower-limb explosive strength, and waist and abdominal strength. In addition, the girls in Clusters 1, 2 and 4 achieved poor results in long-distance running, sit-ups and the standing long jump, respectively, so each of these three cluster types has its own weak item.
The male students in Cluster 1 had the highest mean scores on the measured items, showing better physical fitness in this category, while the male students in Cluster 3 had the lowest total test scores. Clusters 2 and 3 scored poorly on pull-ups; these two categories also contain the most students, 74.27% of the total, and Cluster 3 had the lowest mean pull-up score of only 5. Cluster 3 also had a low mean score on the standing long jump. In addition, the Cluster 3 students failed on the mean total score, with male students mainly failing because of weak performance on the standing long jump and pull-up test items.
The results of the cluster analysis show a significant difference in physical fitness factors between the male and female student groups. The polarization within the boys’ group is more apparent. The boys in Cluster 1 were more balanced across all test indicators and had better overall fitness. Clusters 2 and 3 contain the larger numbers of students, and the weak points of the boys’ group are concentrated in the upper-body strength item, with mean pull-up scores only in the single digits. Cluster 3 had the worst overall performance of the three categories, with a failing standing long jump score.
The factors affecting the female group are more complex: apart from Cluster 3, each cluster category has its own key weakness. Cluster 1 is weak in the endurance event, Cluster 2 in sit-ups, and Cluster 4 in the standing long jump. Cluster 5 scored poorly on essentially every item. This suggests that different exercise methods should be chosen for different students to make up for their shortcomings, and it also demonstrates the utility of the SOM-based algorithm in optimizing physical education.
To better demonstrate the effectiveness of the proposed algorithm, four baseline algorithms (KNNBasic, KNNWithMeans, KNNBaseline and SVD) and four module-combination algorithms (DDPG+RLSTM, DDPG+LSTM, DDPG+T_self_attention and DDPG+self_attention) are compared with the proposed algorithm.
Table 3 shows the experimental comparison: on both the mL-1m and mL-100k datasets, the algorithm proposed in this paper outperforms the other algorithms on Ave_RMSE and Ave_MAE. In addition, all of the module-combination algorithms outperform the baseline algorithms, indicating that both this paper’s algorithm and the module-combination algorithms are superior to the baselines.
Table 3. Comparison of experimental results

| Algorithm | mL-1m Ave_RMSE | mL-1m Ave_MAE | mL-100k Ave_RMSE | mL-100k Ave_MAE |
|---|---|---|---|---|
| This algorithm | 0.402 | 0.231 | 0.243 | 0.106 |
| DDPG+RLSTM | 0.446 | 0.258 | 0.527 | 0.407 |
| DDPG+LSTM | 0.603 | 0.467 | 0.564 | 0.436 |
| DDPG+T_self_attention | 0.576 | 0.398 | 0.451 | 0.267 |
| DDPG+self_attention | 0.595 | 0.425 | 0.535 | 0.368 |
| KNNBasic | 0.922 | 0.726 | 0.975 | 0.774 |
| KNNWithMeans | 0.928 | 0.738 | 0.951 | 0.75 |
| KNNBaseline | 0.897 | 0.703 | 0.923 | 0.727 |
| SVD | 0.875 | 0.683 | 0.938 | 0.74 |
Tables 4 and 5 show the percentage improvement in Ave_RMSE and Ave_MAE of each combination algorithm relative to the baseline algorithms on the two datasets. All values in the tables are positive. On the mL-1m dataset, this paper’s algorithm improves Ave_RMSE by 56.68% and Ave_MAE by 68.70% over the worst-performing baseline, KNNWithMeans. On the mL-100k dataset, it improves Ave_RMSE by 75.08% and Ave_MAE by 86.30% over the baseline KNNBasic. Meanwhile, the results of the module-combination algorithms are all greatly improved over the baselines. Therefore, both this paper’s algorithm and the combination algorithms are superior to the baseline algorithms.
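The improvement percentages in Tables 4 and 5 are consistent with relative reductions of the error metrics, i.e. (baseline − ours) / baseline × 100, as the following quick check against Table 3 shows (the helper function name is illustrative):

```python
def improvement(baseline, ours):
    """Relative reduction of an error metric, in percent."""
    return (baseline - ours) / baseline * 100

# mL-1m, against the worst-performing baseline KNNWithMeans (values from Table 3)
print(f"{improvement(0.928, 0.402):.2f}")  # Ave_RMSE -> 56.68
print(f"{improvement(0.738, 0.231):.2f}")  # Ave_MAE  -> 68.70
# mL-100k, against KNNBasic
print(f"{improvement(0.975, 0.243):.2f}")  # Ave_RMSE -> 75.08
print(f"{improvement(0.774, 0.106):.2f}")  # Ave_MAE  -> 86.30
```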
Table 4. Ave_RMSE improvement (%) of this algorithm and the module combinations over the baselines

| Dataset | Algorithm | KNNBasic | KNNWithMeans | KNNBaseline | SVD |
|---|---|---|---|---|---|
| mL-1m | This algorithm | 56.40 | 56.68 | 55.18 | 54.01 |
| mL-1m | DDPG+RLSTM | 51.63 | 51.94 | 50.28 | 49.03 |
| mL-1m | DDPG+LSTM | 34.60 | 35.02 | 33.11 | 31.09 |
| mL-1m | DDPG+T_self_attention | 37.53 | 37.93 | 35.79 | 34.17 |
| mL-1m | DDPG+self_attention | 35.47 | 35.88 | 33.67 | 32.00 |
| mL-100k | This algorithm | 75.08 | 74.45 | 73.67 | 74.09 |
| mL-100k | DDPG+RLSTM | 45.95 | 44.58 | 42.90 | 43.82 |
| mL-100k | DDPG+LSTM | 42.15 | 40.69 | 38.89 | 39.87 |
| mL-100k | DDPG+T_self_attention | 53.74 | 52.58 | 51.14 | 51.92 |
| mL-100k | DDPG+self_attention | 45.13 | 43.74 | 42.04 | 42.96 |
Table 5. Ave_MAE improvement (%) of this algorithm and the module combinations over the baselines

| Dataset | Algorithm | KNNBasic | KNNWithMeans | KNNBaseline | SVD |
|---|---|---|---|---|---|
| mL-1m | This algorithm | 68.18 | 68.70 | 67.14 | 66.19 |
| mL-1m | DDPG+RLSTM | 64.46 | 65.04 | 63.30 | 62.24 |
| mL-1m | DDPG+LSTM | 35.67 | 36.72 | 33.57 | 31.63 |
| mL-1m | DDPG+T_self_attention | 45.18 | 46.07 | 43.39 | 41.73 |
| mL-1m | DDPG+self_attention | 41.46 | 42.41 | 39.54 | 37.77 |
| mL-100k | This algorithm | 86.30 | 85.90 | 85.42 | 85.68 |
| mL-100k | DDPG+RLSTM | 47.42 | 45.73 | 44.02 | 45.00 |
| mL-100k | DDPG+LSTM | 43.67 | 41.87 | 40.03 | 41.08 |
| mL-100k | DDPG+T_self_attention | 66.50 | 64.40 | 63.27 | 63.92 |
| mL-100k | DDPG+self_attention | 52.45 | 50.93 | 49.38 | 50.27 |
This part of the experiment collects the reward value (Reward) obtained by the agent after each prediction during testing, as well as the RMSE and MAE values of each round of prediction, and analyzes the convergence of the algorithm by observing the trend of these evaluation indexes. To evaluate convergence as a whole, this paper also observes the trend of the average return Ave_Reward in each round. Figures 6 and 7 show how the Ave_Reward values of this paper’s algorithm and of each module-combination algorithm change with the number of training iterations on the mL-1m and mL-100k datasets, respectively. The overall results of all algorithms on mL-1m are better than those on mL-100k; the reason was analyzed in the overall performance section. In both figures this paper’s algorithm performs best, but some issues remain:

mL-1m ave_reward trend chart

mL-100k ave_reward trend chart
On mL-1m, the algorithm eventually converges to a higher level than the other algorithms, although this advantage is not very pronounced, while on mL-100k the convergence level of this paper’s algorithm is much higher than that of the other algorithms. This indicates that, when training on the mL-1m dataset, the algorithm can better learn users’ long-term physical training interests, which dominate a student’s overall training interests, and that the state-enhancement mechanism used in modeling long-term training interests makes the algorithm more capable of capturing students’ long-term interests.
Taking student A of the sports college as the experimental subject, the recommendation algorithm produces the different physical training strategies and their recommendation degrees shown in Table 6. Ranking by the set-pair recommendation degree identifies the physical training strategies most compatible with student A. Analysis of the original physical test data shows that the recommended strategies account for both the similarity and the dissimilarity between the strategies and the experimental subject, which is consistent with the recommendation principle proposed in this study. In practice, real-world factors must also be considered: among the highly recommended training strategies, users can filter subjectively according to intensity, mode, relevance and other factors, so as to obtain the strategy that best matches their own situation.
Table 6. Physical training strategy set and recommendation degrees for student A

| Number | Degree of recommendation | Rank |
|---|---|---|
| 1 | 0.668 | 3 |
| 2 | 0.526 | 5 |
| 3 | 0.459 | 6 |
| 4 | 0.745 | 2 |
| 5 | 0.324 | 8 |
| 6 | 0.159 | 10 |
| 7 | 0.569 | 4 |
| 8 | 0.951 | 1 |
| 9 | 0.437 | 7 |
| 10 | 0.266 | 9 |
This paper proposes a personalized recommendation model for physical training strategies based on LSTM and reinforcement learning, building on the SOM neural network algorithm used to cluster students by physical fitness, and recommends personalized physical training strategies for students.
The SOM neural network algorithm can quickly classify students by physical fitness category, assign each student to a category, and also reveal the weak items of each category. In the girls’ group, Cluster 3 achieved better results on the mean values of all indicators, and these students account for 64.71% of all female students. In the boys’ group, Cluster 3 had the lowest mean pull-up score, only 5 points. Based on the characteristics obtained for each category, students can therefore be helped to select appropriate physical training strategies.
The personalized recommendation model for physical training strategies based on LSTM and reinforcement learning achieves advanced performance: the proposed algorithm outperforms the other algorithms on Ave_RMSE and Ave_MAE on both the mL-1m and mL-100k datasets. On the mL-1m dataset, its Ave_RMSE and Ave_MAE are 56.68% and 68.70% better than those of KNNWithMeans, respectively, and on the mL-100k dataset its Ave_RMSE and Ave_MAE are also better than those of the best-performing comparison algorithms.
In terms of the set-pair recommendation degree, the model delivers reliable physical training strategy recommendations, and it is able to make scientific recommendations while capturing both long- and short-term interests.