Open Access

Construction of a neural network-based model for training data analysis and performance prediction of athletes

  
Mar 21, 2025


Introduction

The sports industry is highly competitive and variable, and sports such as basketball, soccer, and track and field require athletes to bring their strengths and skills into full play in order to achieve excellent results [1-3]. However, beyond traditional training methods, data analysis and athlete performance prediction are gradually playing an important role in improving athletes’ performance [4-6]. Data analysis is a method of collecting, organizing and analyzing a large amount of information on athletes’ physical indicators, training data and competition results to reveal athletes’ strengths and weaknesses and provide a scientific basis for decision makers [7-9].

In the sports industry, data analysis can be used to monitor athletes’ physical indicators, training data and game performance in real time, revealing whether an athlete’s training status and physical condition have reached their optimal state [10-13]. When problems arise during competition or training, decision makers can make reasonable adjustments and improvements based on the results of data analysis [14-15]. Athlete performance prediction is a method of forecasting future performance from historical data and trends [16-17]. For the sports world, its importance is self-evident: it guides the development of training programs and improvement strategies for individual athletes as well as a team’s strategic adjustments and decision-making, and building such models with neural networks is of great significance for athlete data analysis and performance prediction [18-21].

In this paper, a filtering method combining jitter removal and bi-exponential smoothing is adopted to filter the skeletal data output by KinectV2, providing stable input for accurate recognition of athletes’ movements. An automatic coding and decoding (auto-codec) network model is then used to extract the relevant features of the athlete’s skeletal data, and the extracted real-time motion data is matched against template motion data by a dynamic time warping algorithm to complete action recognition. Finally, a self-organizing mapping network is proposed to improve the traditional clustering integration method, and performance and function tests of the system as well as clustering experiments on athletes’ training status are carried out to verify the rationality of the system design.

Neural network-based recognition and prediction of athletes’ training movements
Skeletal data filtering

Before using an athlete’s joint data, a very important step is noise-reduction filtering: the data must be smoothed using suitable filtering methods.

This system uses a filter that combines jitter removal and bi-exponential smoothing. Jitter removal attempts to suppress input spikes by limiting the range of variation allowed in the output of each frame. Equation (1) shows a jitter-filter variant that uses an exponential filter to suppress large variations in the input, where t is the threshold.

$$\hat{X}_n=\begin{cases}X_n, & |X_n-\hat{X}_{n-1}|<t\\ \alpha X_n+(1-\alpha)\hat{X}_{n-1}, & |X_n-\hat{X}_{n-1}|\ge t\end{cases}$$

Alternatively, a simple average filter can be used instead of the exponential filter. Since median filters usually perform better at eliminating peaks, the variation of the input may instead be limited by the median value, as shown in equation (2) for a median-based jitter filter, where $X_{med}$ represents the median of the last n inputs and t is the threshold.

$$\hat{X}_n=\begin{cases}X_n, & |X_n-X_{med}|<t\\ X_{med}, & |X_n-X_{med}|\ge t\end{cases}$$

In median filtering, the output of the filter is the median of the last N inputs. The median filter helps eliminate impulsive peak noise; ideally, the filter size N should be chosen larger than the duration of the peak noise. However, the delay of the filter depends directly on N, so a larger N increases the delay.

The median filter does not exploit the statistical distribution of the data or the noise. The jitter-removal filter essentially passes through unfiltered any input whose change does not exceed the threshold, and skeletal tracking states can be used to adaptively select that threshold. Data that has passed through the jitter-removal filter is then reprocessed with a bi-exponential smoothing filter.
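As an illustration, the two jitter-removal variants above can be sketched in Python. This is a minimal sketch: the function names, default threshold, and window size are illustrative, not taken from the original system.

```python
import numpy as np

def jitter_removal(samples, t=0.05, alpha=0.5):
    """Jitter removal with exponential suppression (Eq. 1):
    pass small changes through unchanged; damp large jumps."""
    out = []
    prev = None
    for x in samples:
        if prev is None or abs(x - prev) < t:
            y = x                               # small change: keep raw input
        else:
            y = alpha * x + (1 - alpha) * prev  # large jump: damp it
        out.append(y)
        prev = y
    return out

def jitter_removal_median(samples, t=0.05, window=5):
    """Median variant (Eq. 2): clamp spikes to the median of the
    last `window` inputs."""
    out = []
    hist = []
    for x in samples:
        hist.append(x)
        hist = hist[-window:]
        med = float(np.median(hist))
        out.append(x if abs(x - med) < t else med)
    return out
```

Note how the median variant removes an isolated spike completely, while the exponential variant only attenuates it, consistent with the observation above that median filters perform better at eliminating peaks.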

The bi-exponential (double exponential) smoothing filter is a commonly used smoothing filter. It applies two exponential filters: one smooths the output and the other tracks the trend of the input data. It is described by the following two equations, where T represents the trend, F represents the filtered output, and α and γ are parameters.

$$T:\; b_n=\gamma(\hat{X}_n-\hat{X}_{n-1})+(1-\gamma)b_{n-1}$$
$$F:\; \hat{X}_n=\alpha X_n+(1-\alpha)(\hat{X}_{n-1}+b_{n-1})$$

The trend $b_n$ is computed by exponentially filtering the difference between the last two filter outputs, and the current trend together with the previous filter output, $\hat{X}_{n-1}+b_{n-1}$, is used to compute the new output. γ controls how sensitive the trend is to recent changes in the input; with a large γ the bi-exponential smoothing filter closely follows the data, so that the predicted future data is a straight line with slope equal to the trend value $b_n$, as in equation (5): $$\hat{X}_{n+k|n}=\hat{X}_n+kb_n$$
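Equations (3)-(5) can be sketched as follows. This is a hedged Python sketch; the parameter values are illustrative.

```python
def double_exp_smooth(samples, alpha=0.5, gamma=0.5):
    """Double exponential smoothing (Eqs. 3-4): xhat tracks the
    level (F), b tracks the trend (T). Returns filtered outputs
    and the final trend."""
    xhat = None  # filtered output F
    b = 0.0      # trend T
    out = []
    for x in samples:
        if xhat is None:
            xhat = x                # initialize on the first sample
        else:
            prev = xhat
            xhat = alpha * x + (1 - alpha) * (prev + b)
            b = gamma * (xhat - prev) + (1 - gamma) * b
        out.append(xhat)
    return out, b

def predict(xhat_n, b_n, k):
    """k-step-ahead prediction (Eq. 5): a line with slope b_n."""
    return xhat_n + k * b_n
```

On a constant input the trend stays at zero and the output equals the input; on a ramp the trend converges to the slope, which is what makes the k-step prediction a straight line.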

In general, this filtering method performs well in prediction. However, signal overshoot is evident in the predicted output, and a simple but useful improvement is to adapt the α and γ parameters to joint velocity: when a joint is not moving fast, smaller α and γ parameters yield more accurate filtering and a smoother output; when a joint is moving fast, larger α and γ parameters give better responsiveness to input changes, which reduces latency.

In this paper, two preset pairs of α and γ parameters are used: one for the low-speed case, $\alpha_{low}$ and $\gamma_{low}$, and the other for the high-speed case, $\alpha_{high}$ and $\gamma_{high}$. Two speed thresholds, $v_{low}$ and $v_{high}$, are also used. Then for each input $X_n$ with speed estimate $v_n$, the filtering parameters α and γ are set by linear interpolation between their low and high values according to the current speed. For example, the α parameter at time n is given by equation (6).

$$\alpha_n=\begin{cases}\alpha_{low}, & v_n\le v_{low}\\[4pt] \alpha_{high}+\dfrac{v_n-v_{high}}{v_{low}-v_{high}}(\alpha_{low}-\alpha_{high}), & v_{low}\le v_n\le v_{high}\\[4pt] \alpha_{high}, & v_n\ge v_{high}\end{cases}$$
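The speed-based linear interpolation of equation (6) might look like this in Python (names and threshold values are illustrative):

```python
def adaptive_alpha(v, a_low, a_high, v_low, v_high):
    """Linear interpolation of the smoothing parameter by joint
    speed, following Eq. (6)."""
    if v <= v_low:
        return a_low
    if v >= v_high:
        return a_high
    # Interpolate between the two presets for intermediate speeds
    return a_high + (v - v_high) / (v_low - v_high) * (a_low - a_high)
```

The same interpolation applies to γ; at the midpoint of the speed range the parameter lies exactly halfway between its low and high presets.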

In this paper, we adopt a joint filtering method combining jitter removal and bi-exponential smoothing to filter the output data of KinectV2, providing stable data input for athlete action recognition.

Dynamic time regularization methods

DTW (Dynamic Time Warping) [22] was first used mainly in isolated-word speech recognition. The algorithm is based on the idea of dynamic programming: it measures the degree of similarity between two discrete time sequences and at the same time finds the corresponding minimum-cost matching path. Given two temporal feature sequences, a reference template X and a test template Y:

$$X=\{X(1),X(2),\dots,X(n),\dots,X(N)\}$$
$$Y=\{Y(1),Y(2),\dots,Y(m),\dots,Y(M)\}$$

The meanings of the parameters contained in the above equations are as follows:

1) X is the corresponding feature sequence of the reference template, and Y is the corresponding feature sequence of the test template.

2) n denotes the timing label of the reference template, ranging from [1, N]; m denotes the timing label of the test template, ranging from [1, M].

3) N, M denote the total number of frames of the timing feature sequences in the reference and test templates, respectively.

4) X(n) denotes the feature vector of frame n of the reference template and Y(m) the feature vector of frame m of the test template. Simple linear scaling ignores the fact that actions do not stretch uniformly along the time axis, which leads to poor recognition; therefore a dynamic programming (DP) approach is taken.

To align the two sequences X and Y, an N×M matrix grid is constructed. The element at (i, j) is the distance d between the points $X_i$ and $Y_j$, i.e., the similarity between each point of sequence X and the corresponding point of sequence Y; a smaller distance represents higher similarity, and vice versa. Usually the Euclidean distance is used, as shown below: $$d(i,j)=(X_i-Y_j)^2$$

The dynamic programming approach then boils down to finding a nonlinear warping path in the grid described above, along which the distance between the test template’s feature sequence and the reference template’s feature sequence is minimized and the similarity is maximized. This path is called the regularized (warping) path W, and the k-th element of W is defined as $w_k=(i,j)_k$: $$W=w_1,w_2,\dots,w_k,\dots,w_K,\qquad \max(m,n)\le K<m+n-1$$

Regularized path W needs to satisfy the following condition constraints:

Boundary conditions:
$$w_1=(1,1)\qquad w_K=(m,n)$$

The regularized path is required to start at the lower-left corner $w_1=(1,1)$ and end at the upper-right corner $w_K=(m,n)$, which ensures that the time order is preserved.

Monotonicity:

If element $w_{k-1}=(a',b')$, then the next point $w_k=(a,b)$ of the path must satisfy: $$a-a'\ge 0,\qquad b-b'\ge 0$$

This restricts the points along the regularized path W to be monotonic in time.

Continuity:

If element $w_{k-1}=(a',b')$, the next point $w_k=(a,b)$ of the path must satisfy: $$a-a'\le 1,\qquad b-b'\le 1$$

This means each point can only be aligned with its own neighboring points and cannot match across a point. It restricts the step size of the regularized path so that every coordinate of X and Y is guaranteed to appear in W.

Combining the above constraints, there are only three possible path directions from each grid point. For example, if the path has just passed grid point $(n_i,m_j)$, the next grid point can only be one of the following three: $$(n,m)=(n_i,m_j+1)\qquad (n,m)=(n_i+1,m_j)\qquad (n,m)=(n_i+1,m_j+1)$$

Many paths satisfy the constraints above; usually only the path with the smallest regularization cost is taken, which is given by: $$DTW(X,Y)=\min\left\{\frac{\sum_{k=1}^{K}w_k}{K}\right\}$$

The denominator K compensates for regularized paths of different lengths. Starting the search from $(n,m)=(1,1)$, the accumulated distance is saved for each $(n,m)$, which can be understood as the similarity to the reference template; the similarity of the current node is accumulated with that of its best-matching successor, yielding the final regularized path with maximum similarity: $$D(n_i,m_j)=d[X(i),Y(j)]+\min\big(D(n_i,m_j+1),\,D(n_i+1,m_j),\,D(n_i+1,m_j+1)\big)$$
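A minimal Python sketch of the DTW recurrence for scalar sequences follows. It uses the squared-difference local cost d(i, j) and the three admissible steps described above, written in the equivalent form that accumulates forward from (1,1); function and variable names are illustrative.

```python
import numpy as np

def dtw(X, Y):
    """Classic DTW: accumulate the minimal warping cost over the
    N x M grid with diagonal, horizontal, and vertical steps."""
    N, M = len(X), len(Y)
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            d = (X[i - 1] - Y[j - 1]) ** 2    # local distance d(i, j)
            # monotonic, continuous steps from the three predecessors
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[N, M]
```

Because DTW warps the time axis, a sequence that merely repeats a frame still matches its template with zero cost, which is exactly the property that linear scaling lacks.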

Motion Information Feature Recognition and Evaluation

In joint-data action recognition and analysis, several latent features with actual physical meaning and mutual independence are decoupled from an RGB image or an athlete’s skeleton sequence by an automatic coding and decoding (auto-codec) network framework, and an action recognition or analysis task is then accomplished based on these latent features.

Overall architecture of the auto-codec network

The core structure of the auto-codec network proposed in this chapter consists of three encoders (an action information encoder, a skeleton structure information encoder, and a camera viewpoint information encoder) and one decoder. The three encoders decouple the input 2D athlete skeletal keypoint sequence into three independent latent feature vectors: (1) a time-dependent action information feature vector; (2) a time-independent skeleton structure feature, representing the structural information of the athlete’s skeleton; and (3) a time-independent camera viewpoint feature, representing the camera angle at which the action video was taken.

Codec Core Structure and Parameters

The camera viewpoint encoder and the skeleton information encoder have similar network structures, differing only in the number of convolution channels and the pooling strategy. The action information encoder uses one-dimensional convolutions along the temporal dimension to capture temporal information in the athlete’s skeletal keypoint sequence, while the skeleton and viewpoint encoders add a global max-pooling layer to collapse the time axis and produce a fixed-size output. The two fixed-length static features obtained from encoding are copied and tiled along the time axis, and the tiled features are then concatenated along the channel axis. The decoder first applies nearest-neighbor upsampling and then a stride-1 convolution to recover the temporal information, with a regularization layer added to prevent overfitting and enhance generalization.
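The copy-tile-and-concatenate step that prepares the decoder input can be illustrated with array shapes. This is a NumPy sketch; the dimensions are illustrative assumptions, not the network's actual sizes.

```python
import numpy as np

# Hypothetical shapes: T frames, C_m motion channels per frame,
# fixed-length static codes for skeleton (C_s) and viewpoint (C_v).
T, C_m, C_s, C_v = 16, 32, 8, 8

motion = np.random.randn(T, C_m)   # time-varying motion code
skeleton = np.random.randn(C_s)    # static skeleton code (after global max pool)
view = np.random.randn(C_v)        # static viewpoint code

# Copy and tile the static codes along the time axis...
skeleton_tiled = np.tile(skeleton, (T, 1))   # shape (T, C_s)
view_tiled = np.tile(view, (T, 1))           # shape (T, C_v)

# ...then concatenate along the channel axis to form the decoder input.
decoder_in = np.concatenate([motion, skeleton_tiled, view_tiled], axis=1)
assert decoder_in.shape == (T, C_m + C_s + C_v)
```

Tiling repeats each static code at every time step so that the decoder sees the same skeleton and viewpoint information alongside the motion code for every frame.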

Loss function

The core idea of the method in this chapter is to encode mutually independent action information features, skeleton structure features, and camera viewpoint features from 2D athlete skeletal keypoint sequences of arbitrary length by means of an auto-codec network model [23], and to be able to recombine the encoded static and dynamic feature parameters into a given real sample. That is, for two given 2D skeletal keypoint sequence samples $p_{i_1,j_1,k_1}, p_{i_2,j_2,k_2}\in\mathbb{R}^{T\times 2C}$, where $i\in M$, $j\in S$, $k\in V$, we wish to recombine the action and skeleton features separated from $p_{i_1,j_1,k_1}$ with the camera viewpoint features separated from $p_{i_2,j_2,k_2}$ into $p_{i_1,j_1,k_2}$, satisfying equation (22): $$\forall i_1,i_2\in M,\; j_1,j_2\in S,\; k_1,k_2\in V:\quad p_{i_1,j_1,k_2}\approx D\big(E_M(p_{i_1,j_1,k_1}),E_S(p_{i_1,j_1,k_1}),E_V(p_{i_2,j_2,k_2})\big)$$

To achieve this goal, this paper designs a loss function with three components: a cross-reconstruction loss, a reconstruction loss, and a triplet loss, to constrain the separation of the three latent features as well as the reconstruction of the corresponding 2D athlete skeletal keypoint sequences.

Cross-reconstruction loss

The purpose of the cross-reconstruction loss is to minimize the difference between the input and the output; the loss function is shown in equation (23):
$$\begin{aligned}L_{cross}=&\;\|D(E_M(p_{i_1,j_1,k_1}),E_S(p_{i_1,j_1,k_1}),E_V(p_{i_2,j_2,k_2}))-p_{i_1,j_1,k_2}\|^2\\ +&\;\|D(E_M(p_{i_1,j_1,k_1}),E_S(p_{i_2,j_2,k_2}),E_V(p_{i_2,j_2,k_2}))-p_{i_1,j_2,k_2}\|^2\\ +&\;\|D(E_M(p_{i_1,j_1,k_1}),E_S(p_{i_2,j_2,k_2}),E_V(p_{i_1,j_1,k_1}))-p_{i_1,j_2,k_1}\|^2\\ +&\;\|D(E_M(p_{i_2,j_2,k_2}),E_S(p_{i_1,j_1,k_1}),E_V(p_{i_2,j_2,k_2}))-p_{i_2,j_1,k_2}\|^2\\ +&\;\|D(E_M(p_{i_2,j_2,k_2}),E_S(p_{i_2,j_2,k_2}),E_V(p_{i_1,j_1,k_1}))-p_{i_2,j_2,k_1}\|^2\\ +&\;\|D(E_M(p_{i_2,j_2,k_2}),E_S(p_{i_1,j_1,k_1}),E_V(p_{i_1,j_1,k_1}))-p_{i_2,j_1,k_1}\|^2\end{aligned}$$

Reconstruction loss

In addition to cross-reconstruction, the network is required to reconstruct the original input samples at each training iteration, with the loss function shown in equation (24): $$L_{rec}=\|D(E_M(p_{i_1,j_1,k_1}),E_S(p_{i_1,j_1,k_1}),E_V(p_{i_1,j_1,k_1}))-p_{i_1,j_1,k_1}\|^2+\|D(E_M(p_{i_2,j_2,k_2}),E_S(p_{i_2,j_2,k_2}),E_V(p_{i_2,j_2,k_2}))-p_{i_2,j_2,k_2}\|^2$$

Therefore, the total reconstruction loss function is: $$L_{rec\_cross}=L_{rec}+L_{cross}$$

Triplet loss

The triplet loss from deep metric learning is introduced to increase the between-class distance and decrease the within-class distance. Equation (26) gives the triplet loss for the action information encoder: $$L_{triplet\_M}=\max\big(0,\|E_M(p_{i_1,j_2,k_2})-E_M(p_{i_1,j_1,k_1})\|-\|E_M(p_{i_1,j_2,k_2})-E_M(p_{i_2,j_2,k_2})\|+m\big)+\max\big(0,\|E_M(p_{i_2,j_1,k_1})-E_M(p_{i_2,j_2,k_2})\|-\|E_M(p_{i_2,j_1,k_1})-E_M(p_{i_1,j_1,k_1})\|+m\big)$$ where $p_{i_1,j_2,k_2}$ and $p_{i_2,j_1,k_1}$ are the anchor samples and m is the margin between positive and negative samples. For the skeleton and camera viewpoint encoders, the same triplet loss is used to explicitly separate the features, as shown in equations (27) and (28), respectively.

$$L_{triplet\_S}=\max\big(0,\|E_S(p_{i_2,j_1,k_1})-E_S(p_{i_1,j_1,k_1})\|-\|E_S(p_{i_2,j_1,k_1})-E_S(p_{i_2,j_2,k_2})\|+m\big)+\max\big(0,\|E_S(p_{i_1,j_2,k_2})-E_S(p_{i_2,j_2,k_2})\|-\|E_S(p_{i_1,j_2,k_2})-E_S(p_{i_1,j_1,k_1})\|+m\big)$$
$$L_{triplet\_V}=\max\big(0,\|E_V(p_{i_2,j_1,k_1})-E_V(p_{i_1,j_1,k_1})\|-\|E_V(p_{i_2,j_1,k_1})-E_V(p_{i_2,j_2,k_2})\|+m\big)+\max\big(0,\|E_V(p_{i_1,j_2,k_2})-E_V(p_{i_2,j_2,k_2})\|-\|E_V(p_{i_1,j_2,k_2})-E_V(p_{i_1,j_1,k_1})\|+m\big)$$

Thus, the total triplet loss function is: $$L_{triplet}=L_{triplet\_M}+L_{triplet\_S}+L_{triplet\_V}$$

Summing the above two loss functions gives the total loss: $$L=L_{rec\_cross}+L_{triplet}$$
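A single-sample version of the margin-based triplet term used above can be sketched in NumPy (an illustrative sketch; `m` is the margin from the equations, and the inputs stand for encoder outputs):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, m=0.2):
    """Margin-based triplet loss: pull the positive toward the
    anchor, push the negative at least margin m further away."""
    d_pos = np.linalg.norm(anchor - positive)   # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)   # anchor-negative distance
    return max(0.0, d_pos - d_neg + m)
```

The loss is zero whenever the negative is already more than m farther from the anchor than the positive, which is exactly the hinge behavior of the max(0, ·) terms in equations (26)-(28).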

Athlete performance prediction based on self-organizing mapping networks

The Self-Organizing Map (SOM) network [24] is the core algorithm used in this paper to model the recognition of athletes’ movements. The SOM network is an unsupervised learning algorithm based on the Hebb learning rule, which updates the connection weights only of activated neurons, thereby partitioning the input space into patterns according to the similarity between input samples. The topology of the neurons in the model’s output layer preserves the distribution of the original data in the input space and clusters the data. SOM networks are distinguished from other widely used clustering algorithms by their ability to map structure from high-dimensional to low-dimensional spaces, giving them unique advantages in visualizing nonlinear relationships in data, topology-based cluster analysis, vector quantization, and clustering of multidimensional data. Since their proposal, SOM networks have been applied in pattern recognition, image and text processing, data mining, and medical diagnosis.

The SOM network is a feedforward neural network consisting of an input layer and an output layer. The number of neurons in the input layer equals the dimension of the input samples. The output layer, also called the competition layer, usually arranges its neurons in a two-dimensional plane, and each output-layer neuron can be treated as a prototype of a pattern in the input space. The input layer and the output layer are fully connected with variable weights.

The output-layer neuron set V contains k neurons, i.e., V = {v1,…,vk}, where each output-layer neuron has a corresponding weight vector w with $w\in\mathbb{R}^d$, and d is the number of input-layer neurons, i.e., the dimension of the input space. Two quantities determine how the weight vectors are updated during training: the distance between a sample and each neuron’s weight vector, and the topological distance between neurons. Overall, the SOM network is trained so that topologically close neurons are more likely to represent the same pattern.

The training steps of the SOM network can be broadly divided into 3 steps:

Initialize the network

Before training, the target number of iterations T, the size of the training set n, and the number of neurons k in the SOM network need to be set in advance. At the start of training, the algorithm randomly initializes the weight vectors w of all neurons in the network.

w(0)=RandomVector(d)
Input samples and compute winning neurons

Input sample x from the training set and then compute the distance between the input sample and the weight vector of each neuron, which is typically computed using the Euclidean distance.

$$D(x,w)=\|x-w\|_2$$

Among all neurons, the neuron whose weight vector is closest to the sample is called the winning neuron, and it is activated.

$$winner=\arg\min_j D(x,w_j)$$
Weight vector update

The weight vectors of all neurons are updated in the direction of the current input sample, where t denotes the current iteration round and η(t, N(winner, j)) denotes the learning rate, which depends on the current iteration t and on the topological distance N(winner, j) between neuron j and the winning neuron. If each neuron in the SOM network is assigned a rectangular grid coordinate $(x_x,x_y)$, the topological distance between any two neurons is as follows.

$$w_j(t+1)=w_j(t)+\eta(t,N(winner,j))\cdot(x-w_j(t))$$
$$\eta(t,N(winner,j))=\frac{e^{-N(winner,j)}}{t+2}$$
$$N(winner,j)=|winner_x-j_x|+|winner_y-j_y|$$

Each round of training moves the weight vectors of the winning neuron and its neighboring neurons toward the input sample $x_i$; neurons topologically far from the winner are updated less than neurons close to it. As a result, the weight vectors of neurons representing the same pattern in input space move close together while those representing different patterns move apart, so that the neurons cluster into distinct groups.
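The three training steps above can be sketched end-to-end in NumPy. This is a minimal sketch under stated assumptions: the grid size, iteration count, and the learning-rate schedule η = e^(−N)/(t+2) follow the equations above but are not the paper's tuned settings.

```python
import numpy as np

def train_som(X, grid=(4, 4), T=200, seed=0):
    """Minimal SOM training loop: random init, winner by Euclidean
    distance, update scaled by iteration count and Manhattan grid
    distance from the winner."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    d = X.shape[1]
    W = rng.standard_normal((rows * cols, d))            # step 1: random init
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    for t in range(T):
        x = X[rng.integers(len(X))]                      # step 2: sample input
        winner = np.argmin(np.linalg.norm(W - x, axis=1))
        # Manhattan (topological) distance from the winner on the grid
        N = np.abs(coords - coords[winner]).sum(axis=1)
        eta = np.exp(-N) / (t + 2)                       # assumed decay schedule
        W += eta[:, None] * (x - W)                      # step 3: update weights
    return W, coords
```

After training on two well-separated clusters, different neurons win for inputs from different clusters, which is the clustering behavior the text describes.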

Motion prediction system design and implementation
Needs analysis

The users of the system are mainly divided into two roles: sports athletes and system administrators. This paper analyzes the functional requirements of the system as follows:

1) User login and registration function: provides an entrance for athletes and system administrators to log in to the system.

2) Action segmentation function: Because of the complexity of the sports action, it is necessary to segment the complete sports action into multiple sub-movements, and then analyze and evaluate the sub-movements obtained after segmentation.

3) Skeletal key point extraction and preprocessing function: use the automatic coding and decoding network model to extract the skeleton key point sequences of the athletes in the sports action video, and preprocess the skeleton key point sequences of the athletes.

4) Standard template action entry function: the system administrator user is responsible for uploading the sports standard template action video into the system and performing preprocessing on it, including athlete action segmentation, key point correction and other operations. Then it is subjected to feature extraction and key action frame calibration in order to establish a normative action database, which provides a reference standard for the subsequent realization of the sports action normative assessment function and the non-normative limb marking function.

5) Irregular limb labeling function: the motion action video to be evaluated is matched and aligned with the standard template action video by the action posture matching algorithm proposed in Chapter III. Then the corresponding key action frames are extracted from the to-be-evaluated sports actions with reference to the key action frames in the canonical action database, and the irregular limbs present in the key action frames are labeled.

6) Athlete kinematic data display and export function: analyze and calculate the key point data of the athlete’s skeleton. In addition, this function also supports exporting athlete kinematic data for further analysis and application.

7) Action normative evaluation function: calculate the similarity between the action to be evaluated and the standard template action, and use this as the normative evaluation score of the action to be evaluated to feedback to the user.

Overall system design program

The overall scheme of the sports action evaluation system is shown in Fig. 1; it can be broadly divided into a visualization interface module and a GPU auxiliary module. The data storage module is primarily responsible for storing user information and standard movement data. A filter combining jitter removal and bi-exponential smoothing is used to denoise the joint data, and the GPU analysis and processing module is mainly responsible for extracting the coordinate sequences of the athlete’s skeletal key points from video sequences through the auto-codec network model; motion template matching of the athlete’s movements is then performed by the DTW algorithm combined with the Euclidean distance. The data preprocessing module corrects outliers and interpolates missing values in the skeletal keypoint sequences extracted by the auto-codec network model, matches and aligns the motion feature sequences via the DTW algorithm, and marks erroneous limbs in the extracted key motion frames. Finally, the extracted feature data are fed into the cascaded self-organizing mapping network, which improves the performance of the clustering model on the dataset and of the system in the athlete action recognition task.

Figure 1.

Action assessment system overall plan

Analysis of athlete training data

To verify the effectiveness of the method in this chapter, three movements were selected for the experiment: “open arms”, “vertically raise arms”, and “45-degree kick”. To verify the reliability of the algorithm, each action was repeated 100 times in a standard manner and scored, and the accuracy of the algorithm’s scores was observed; Figure 2 shows the test performance. There was no scoring error for the “open arms” and “vertically raise arms” actions, and the “45-degree kick” was scored low 3 times (the 23rd, 39th, and 68th repetitions) out of 100, i.e., the scoring reliability of the algorithm in this test exceeded 97%.

Figure 2.

Reliability test

Feature vectors are extracted from the elements of the test action stream and the standard action stream, the angle between feature vectors and their Euclidean distance are used as distance metric parameters, and evaluation is then performed with the DTW algorithm. This paper compares using the Euclidean distance as the distance parameter against the “root node divergence” method of extracting feature vectors to obtain the distance parameter across 10 groups of actions, and finds that the Euclidean distance takes less computation time; in the 10th group of actions in particular, the computation time of this paper’s method is 83.38% less than that of the “root node divergence” method. The processing-time comparison is shown in Table 1.

Distance metric parameter processing time comparison

Action number “Root node divergence” method time (ms) This paper’s method time (ms)
1 2.55 1.23
2 2.56 1.26
3 2.78 2.11
4 2.95 1.95
5 2.81 1.26
6 3.02 1.28
7 3.06 1.54
8 3.45 2.64
9 2.89 1.05
10 3.55 0.59
Analysis of system test results

Before the system is put into use, the recognition rate and stability of the motion recognition algorithm must be analyzed. 20 male and 20 female athletes were invited to a five-day test in which each athlete used the system for one training session every morning and afternoon. At the end of the test, 400 data samples were obtained; summarizing them gives the system’s motion recognition rate and result accuracy shown in Table 2, in which a long-jump distance measurement is considered accurate within 2 cm.

Multifunctional body motion test results

Sports category Number of repetitions Motion recognition rate/% Accuracy rate/%
Sit-ups 400 98.9 98.2
Pull-ups 400 98.8 98.5
Squats 400 99.3 98.9
Standing long jump 1200 94.3 87.5

Analysis of the test data shows that this sports training system achieves a very high recognition rate for a variety of functional movements and that its recognition function is stable and reliable. The accuracy rates for sit-ups, pull-ups, and squats meet market requirements, while the recognition accuracy for the standing long jump is comparatively low. In further testing, standing long jumps were examined within the field of view from near to far relative to the jump starting point; the measured long-jump distances are compared with the true distances in Table 3, and the difference between the two is plotted as the curves in Figure 3.

Standing long jump distance test

Test distance (m) True distance (m) Test distance (m) True distance (m)
0.63 0.58 1.72 1.73
0.85 0.75 1.78 1.79
1.12 1.02 1.85 1.84
1.16 1.09 1.93 1.94
1.23 1.15 2.05 2.06
1.32 1.21 2.12 2.13
1.36 1.45 2.2 2.21
1.45 1.46 2.31 2.33
1.51 1.53 2.35 2.36
1.72 1.71 2.45 2.53
Figure 3.

The curve of the difference

Analysis of the real measurement data shows that the accuracy of the measurements outside the range of 1.45 m to 2.35 m is not satisfactory. Considering the KinectV2 ranging principle, the following causes are possible:

1) Because KinectV2 is a camera based on the TOF (time-of-flight) principle, which measures distance via the transmission delay of light pulses, the ambient light in the measurement environment may affect the accuracy of KinectV2 sensing.

2) Because the KinectV2 camera collects information from a single fixed camera position, there is some distortion at the image edges; together with the installation position of the KinectV2 in the sports training equipment, this may lead to ranging errors.

3) Due to physical differences between testers in the standing long jump (different heights, foot lengths, and landing postures), the pre-estimation of the foot position may carry some error, and distances that are too close or too far may reduce the sensitivity of KinectV2 distance measurement.

Predictive analysis of athletes’ training performance

The simulation experiment results and network test results of the self-organizing mapping network applied to the training state clustering analysis of shooting athletes are shown in Table 4.

Clustering results

Sample numbers Category Model test
1, 4, 10, 13, 14 A 15
2, 5, 7, 12 B
3, 6, 8, 9, 11 C

According to the clustering results, the data of the 11 indicators of the 15 groups were categorized as shown in Table 5, in which the serial numbers 1-4 in the first row represent the four indicators of motor level, brain function state, central tension and central fatigue, respectively; the data of the follow-up survey were categorized as shown in Table 6. The scoring criteria in Table 6 are as follows. Self-perceived fatigue symptoms: extreme fatigue scored as 1. Self-perceived training intensity: extreme intensity scored as 1. Self-perceived engagement: 1 disengaged, 2 less engaged, 3 normal, 4 more engaged, 5 very engaged. Training modes: I, slow target report; II, fast target report; III, fast spread practice; IV, slow spread practice; V, standard-speed war-speed practice; males fire 30 rounds per group and females 20 rounds per group. The classification results were analyzed in conjunction with the evaluations of coaches and athletes as well as the results of the longitudinal follow-up survey.

Indicators after clustering

N Entropy difference Main frequency 8 Hz 9 Hz 10 Hz 1 2 3 4 Average score Best score
A
1 0.0874 33.2512 -0.0105 0.2287 5.4123 8 5 27 -9 9.494 9.626
4 -0.0602 47.1526 1.5236 -14.2154 4.4107 8.2 4 28 4 9.485 9.578
10 0.0431 50.1245 1.6254 -1.1245 4.2157 8.3 -5 23 14 9.579 9.658
13 0.0912 40.1254 16.2546 -5.1245 15.4153 7.8 -13 16.7 8 9.647 9.824
14 0.0411 38.1245 13.1245 1.7005 14.5789 8.6 7 21.6 -2 9.558 9.684
15 0.0725 63.2564 -5.8454 6.0531 9.1456 8.4 -6 23.6 -2 9.525 9.618
B
2 -0.0985 54.9878 -4.9987 -1.2045 -10.9456 8 -3 22 -13 9.36 9.754
5 -0.0985 46.4512 -0.3907 1.3995 -17.1056 8.3 -7 31 -7 9.475 9.754
7 0.0328 49.321 1.7564 20.6145 -5.5874 9.2 -5 31.8 -15 9.615 9.836
12 0.0145 45.4512 -0.9102 5.4512 -3.8415 9.5 -14 26 -14 9.584 9.802
C
3 -0.1215 27.4516 -18.7548 5.5829 -9.3215 7.8 7 25.6 3 9.632 9.856
6 -0.0185 38.1243 -0.1234 5.1109 4.4251 8.3 6 22.3 6 9.689 9.782
8 -0.0045 24.9325 -2.1945 -1.9859 -2.0945 7.5 12 28.7 -12 9.635 9.805
9 -0.0759 24.1235 -13.4561 8.3815 -22.1393 8.2 1 16.4 -8 9.407 9.541
11 0.0086 33.7045 -0.3256 -1.8446 -3.9315 8.4 3 30.6 -5 9.659 9.761
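As a small worked check of the interpretation below (category A trends toward decreased entropy while category C trends toward elevated entropy), the per-category mean of the entropy-difference column of Table 5 can be computed directly. This is only a reading aid, assuming the column's sign convention is such that opposite means mirror the opposite entropy trends of the two clusters; it is not part of the paper's method.

```python
from statistics import mean

# Entropy-difference column of Table 5, grouped by cluster.
entropy_diff = {
    "A": [0.0874, -0.0602, 0.0431, 0.0912, 0.0411, 0.0725],
    "B": [-0.0985, -0.0985, 0.0328, 0.0145],
    "C": [-0.1215, -0.0185, -0.0045, -0.0759, 0.0086],
}

cluster_means = {k: mean(v) for k, v in entropy_diff.items()}
# Category A averages positive (about 0.046) and category C negative
# (about -0.042), consistent with the opposite entropy trends described
# for the two clusters in the text.
```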

Longitudinal tracking results after clustering

Serial number | Fatigue self-perceived symptoms | Self-perceived training intensity | Self-perceived engagement | Training mode
Category A
1 | 0.25 | 0.72 | 4 | I*2, II*2, III*3
4 | 0.27 | 0.6 | 4 | I*2, II*2, III*5
10 | 0.26 | 0.66 | 3 | II*2, III*7, V*1
13 | 0.01 | 0.77 | 4 | II*3, III*7, V*1
14 | 0.21 | 0.69 | 3 | II*2, III*2, V*1
15 | 0.17 | 0.71 | 2 | II*6, III*12, V*1
Category B
2 | 0.46 | 0.62 | 3 | I*2, II*2, III*7
5 | 0.68 | 0.61 | 3 | I*2, II*2, III*5
7 | 0.45 | 0.67 | 3 | I*2, III*7, IV*4, V*1
12 | 0.87 | 0.69 | 3 | II*3, III*6, V*1
Category C
3 | 0.19 | 0.58 | 4 | I*2, II*2, III*4
6 | 0.11 | 0.65 | 4 | II*2, III*8, IV*1, V*1
8 | 0.03 | 0.61 | 3 | II*2, III*7, V*1
9 | 0.39 | 0.69 | 4 | II*2, III*7, V*1
11 | 0.27 | 0.6 | 4 | II*3, III*7, V*1
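The training-mode strings in Table 6 compactly encode practice types (I-V) and session counts. A hypothetical helper could expand them for further analysis; the function name and dictionary representation are assumptions for illustration, not from the paper.

```python
def parse_training_mode(spec: str) -> dict:
    """Expand a Table 6 training-mode string such as 'I*2, II*2, III*3'
    into a mapping from practice type (I-V) to session count."""
    result = {}
    for part in spec.split(","):
        kind, _, count = part.strip().partition("*")
        result[kind] = int(count)
    return result

# Sample 1's training mode from Table 6: two slow-target-report sessions,
# two fast-target-report sessions, three fast spread practices.
modes = parse_training_mode("I*2, II*2, III*3")
```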

The salient feature of category A is that most brain-adapted information-entropy levels decreased, suggesting that information became more concentrated. The proportion of 8 Hz decreased, suggesting fewer instability factors in the motor-technical information structure. The proportion of 10 Hz (the main frequency) generally increased, indicating that these athletes' information is concentrating toward a highly competitive level. The relatively poor brain-function state suggests that the technique is not yet automated. The arousal level is normal, the fatigue index is low, and the self-perceived fatigue symptom scores are low. Athletes in this category are on an upward trend, suggesting that the current training methods are reasonable and should continue to be reinforced.

The salient feature of category B is the higher main-frequency competition rate; the higher proportion of the brain's main-sequence covariates indicates that the more concentrated the information, the higher the athletic state. The proportion of 9 Hz, a motor-skill covariate reflecting the optimization of motor skills, increased; its elevation indicates improved stability of athletic ability. The decrease of 10 Hz (the main frequency) indicates a weakening of the athlete's tendency toward information concentration. From another point of view, however, if the proportion of the brain's cognitive and thought-control parameter (10 Hz) rises, the athlete's autonomous goal orientation, in its synergy with the competition goal, competes to some extent with the main-sequence parameter of motor skills, which is not conducive to expressing motor skill; hence the decrease of 10 Hz still indicates an improving trend in motor-skill level. The higher arousal level suggests greater training commitment, and the exercise level and fatigue index are both higher. Self-perceived fatigue symptoms are high while self-perceived engagement is not especially high (probably because the athletes raised their own standards). This suggests that the training adaptation state of these athletes is being optimized, but coaches should appropriately adjust the training intensity, reduce the exercise volume, and apply targeted nutritional regulation.

The prominent feature of category C is generally elevated brain information entropy, indicating that athletic ability and technical stability are low or declining. The main-frequency competition rate is low and the proportion of 8 Hz rises, indicating that the athletes' motor-skill tuning is unreasonable and that their technique shows a certain bias. 10 Hz (the main frequency) declines, the movement level is not high, and the brain-function state is comparatively good. Technique is unstable, information is not concentrated, self-perceived fatigue symptoms are low (a feeling of having strength in reserve), and self-perceived training intensity is not high; the training intensity and mode should therefore be adjusted in a timely manner.

C, A, and B can therefore be regarded as three gradients of the training adaptation state: C is the non-adaptation period, A is the entering-state period, and B is the optimization period of the adapted state (in which care must be taken to prevent overfatigue). The network test results show that the network performs well. According to the self-report of the athlete with test number 1, in the two months before mid-September he had been unable to find his form, which is consistent with the pattern classification result. The athlete with test number 2 reported that their condition and performance had been good, but that they felt tired, had lowered immunity, and frequently caught colds in the later period; as their fatigue index had remained high, they ultimately lost at the 10th Games in October, perhaps illustrating that the higher the degree of order, the more unstable it is. The real-time tracking observations of the athletes are consistent with the model analysis in this paper, confirming that the self-organizing mapping network is applicable to the analysis of athletes' training states.

Conclusion

In this paper, training data analysis and sports performance prediction are studied on the basis of athletes' skeletal information, and a training data prediction system based on a self-organizing mapping network is designed.

In the reliability experiment, the scoring accuracy for the "opening the arms" and "raising the arms vertically" movements remained consistently high, while the "45-degree kick" scored lower in 3 of 100 repeated experiments, indicating that the scoring reliability of the proposed algorithm exceeds 97%.

The prediction accuracy of the movement prediction system for sit-ups, pull-ups, and squats meets the requirements for marketable application, but its accuracy for the standing long jump needs improvement: the system's predictions are credible only when the jump distance falls within the range of 1.45 m to 2.35 m.

The algorithm in this paper divides the training states of shooting athletes into three categories: C is the non-adaptation period, A is the entering-state period, and B is the optimization period of the adapted state. The verbal description of the motor state by the athlete with test number 1 matches the pattern classification result of belonging to the entering-state period, and that of the athlete with test number 2 likewise matches the classification result of belonging to the state-optimization period. These clustering results demonstrate that the self-organizing mapping network proposed in this paper is suitable for analyzing athletes' training states.
