Research on Sports Dance Training and Teaching in Modern Colleges and Universities Combined with Deep Learning
Published Online: Mar 21, 2025
Received: Oct 13, 2024
Accepted: Feb 10, 2025
DOI: https://doi.org/10.2478/amns-2025-0602
© 2025 Jie Jiao, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
With rising living standards, more and more people are paying attention to their health, and sport has become a popular way to keep fit. In colleges and universities, sports dance combines sport, art, and physical beauty; as both an art form and an important sports program, it has received wide attention and support [1-4]. Sports dance has a low entry threshold and rich forms of expression, and it has become a popular sport in colleges and universities, so the quality of dance teaching should also be raised to a new level [5-6]. However, at present there is a serious disconnect between classroom teaching and extracurricular training in college sports dance, which directly hinders improvement in sports dance performance and is not conducive to the long-term development of the sports dance specialty. To improve students’ competitive level and physical and mental health, combining the teaching and training of college sports dance is increasingly important [7-10].
Training content and training method are two important aspects of specialized physical training in college sports dance. In terms of content, a scientific and reasonable training program should be developed according to the training objectives and the individual differences of students, with a focus on details and skill training [11-13]. In terms of method, diversified approaches such as single training, combination training and cycle training can be used, and competitive training and technical aids can be added to improve both the effect of training and interest in it [14-16]. Specialized physical training for sports dance should be comprehensive, meaning that the training program needs to cover all aspects of the athlete’s physical ability, including endurance, explosive force, flexibility, and balance. Corresponding training programs should be set up for each of these abilities in order to improve the physical quality of athletes comprehensively [17-20].
In this paper, the OpenPose algorithm, the TAR-DL method, and an improved DTW algorithm are combined to assess the accuracy of sports dance training movements and to provide a teaching aid for dance instruction. First, the key points of the human skeleton in sports dance movement images are extracted with the OpenPose algorithm; human posture is estimated by matching the key points to each sports dance trainee in the image and aggregating the different types of key points into human bodies. Next, the TAR-DL method, which embeds a (3D time + 3D channel + 3D space) attention module and the BCEF loss function in the SlowFast network, is proposed to recognize sports dance movements accurately. Then, an improvement strategy for the DTW algorithm is proposed, and sports dance movements are evaluated with the improved DTW algorithm. Finally, the model is applied and evaluated.
To support sports dance training and teaching in modern colleges and universities, this paper first extracts the key point information of the human skeleton using the OpenPose algorithm, then constructs a sports dance movement recognition model based on the TAR-DL method, and finally applies the DTW algorithm to evaluate how well the recognized dance movements conform to the standard.
Since a sequence of human skeletal keypoints can effectively express the posture of a human action in video, more and more computer vision and machine learning tasks, including image recognition and action video analysis, are built on human pose estimation algorithms. OpenPose is the first open-source library for real-time 2D human pose estimation developed on the basis of convolutional neural networks and supervised learning, and it achieves good results in both single-person and multi-person pose detection [21].
The OpenPose algorithm implements a bottom-up human pose estimator: it first predicts all the keypoints in the image as candidate points, and then matches the keypoints to each person in the image, aggregating the different types of keypoints into human bodies. Therefore, no matter how many people are in the image, the poses of all of them can be estimated with only one inference, which gives near-real-time processing performance. The OpenPose algorithm also performs well when estimating the poses of blurred or overlapping people.
The OpenPose algorithm adopts a two-branch multi-stage convolutional neural network structure, and the network structure and overall processing flow are shown in Fig. 1. First, the original RGB image of size

Two-branch multi-stage convolutional neural network
Corresponding loss functions are applied at the end of each stage of the two network branches to guide the network iterations in generating the confidence map ensemble and PAF, where a standard
During the OpenPose model training process, a 2D confidence map
Partial Affinity Vector Field (PAF) for Human Skeletal Keypoints
PAF is a two-dimensional vector field that encodes the orientation of a human limb from one skeletal keypoint to another, with each type of human limb having a corresponding partial affinity vector field connecting the two associated body part maps. The greatest advantage of this feature representation is that it preserves both positional and directional information about the limb support region.
The single limb keypoint partial affinity vector field is schematically shown in Fig. 2, and let the left elbow bone keypoint

Schematic diagram of affinity vector field of key points of a single limb
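As an illustration of how a PAF can be used to decide whether two candidate keypoints belong to the same limb, the following minimal sketch samples the field along the segment between the two points and averages its projection onto the limb direction. This follows the general idea of the OpenPose association step rather than reproducing the equations of this paper; the function name `paf_score`, the array layout `(H, W, 2)`, and the number of samples are assumptions made for the example.

```python
import numpy as np

def paf_score(paf, p1, p2, n_samples=10):
    """Average alignment between a PAF and the segment p1 -> p2.

    paf: array of shape (H, W, 2) holding an (x, y) vector per pixel (assumed layout).
    p1, p2: candidate keypoints given as (x, y) pixel coordinates.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    limb = p2 - p1
    norm = np.linalg.norm(limb)
    if norm == 0:
        return 0.0
    unit = limb / norm                       # unit vector along the candidate limb
    ts = np.linspace(0.0, 1.0, n_samples)
    pts = p1 + ts[:, None] * limb            # sample points on the segment
    xs = np.clip(np.rint(pts[:, 0]).astype(int), 0, paf.shape[1] - 1)
    ys = np.clip(np.rint(pts[:, 1]).astype(int), 0, paf.shape[0] - 1)
    vecs = paf[ys, xs]                       # field vectors at the sample points
    return float(np.mean(vecs @ unit))       # mean projection onto the limb direction

# Toy field in which every pixel points in the +x direction.
toy_paf = np.zeros((8, 8, 2))
toy_paf[..., 0] = 1.0
aligned = paf_score(toy_paf, (1, 1), (5, 1))
reversed_ = paf_score(toy_paf, (5, 1), (1, 1))
perpendicular = paf_score(toy_paf, (1, 1), (1, 5))
```

A score close to 1 means the field agrees with the candidate limb direction, while a score near 0 or below suggests the two keypoints should not be connected.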
In this paper, we propose the TAR-DL method, which embeds the (3D temporal + 3D channel + 3D spatial) attention module with the BCEF loss function in the SlowFast network to improve the average accuracy of the model for sports dance movement recognition.
The structure of the TAR-DL network in this paper is schematically shown in Fig. 3. First, multiple consecutive video frames carrying temporal information are input into the TAR-DL network, and the spatial and motion information in the video is captured by the SlowFast network; the resulting spatial and motion features are then fused through lateral connections to generate a feature map with five dimensions [23]. These five dimensions are the number of video frames, the number of channels, the time dimension, the width and the height of a single input model, i.e.,

Network structure of the TAR-DL
The computation of the (3D time + 3D channel + 3D space) attention convolution module is shown in Eqs. (14) to (16):
In the (3D time + 3D channel + 3D space) attention convolution module, Global Maximum Pooling (GMP) and Global Average Pooling (GAP) are used to focus on the information of time, channel and space, and then Squeeze and Excitation operations are performed to obtain the corresponding weight information.
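The pooling-then-reweighting idea described above can be sketched with NumPy. This is only an illustration under stated assumptions, not the paper's Eqs. (14) to (16): the feature map layout `(N, C, T, H, W)`, the shared two-layer bottleneck with weights `w1` and `w2`, and the function name `axis_attention` are all hypothetical choices in the style of CBAM and Squeeze-and-Excitation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def axis_attention(x, keep_axis, w1, w2):
    """Reweight a 5D feature map along one axis (1 = channel, 2 = time).

    x: feature map of shape (N, C, T, H, W) (assumed layout).
    w1, w2: weights of a shared bottleneck MLP, shapes (K, K_r) and (K_r, K),
            where K is the size of the kept axis (hypothetical parameters).
    """
    reduce_axes = tuple(a for a in range(1, x.ndim) if a != keep_axis)
    gmp = x.max(axis=reduce_axes)     # global max pooling     -> (N, K)
    gap = x.mean(axis=reduce_axes)    # global average pooling -> (N, K)

    def mlp(v):
        # Squeeze to K_r units, apply ReLU, then excite back to K units.
        return np.maximum(v @ w1, 0.0) @ w2

    weights = sigmoid(mlp(gmp) + mlp(gap))        # attention weights in (0, 1)
    shape = [x.shape[0]] + [1] * (x.ndim - 1)
    shape[keep_axis] = x.shape[keep_axis]
    return x * weights.reshape(shape), weights    # broadcast over the pooled axes

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 4, 6, 6))          # N=2, C=8, T=4, H=W=6
w1_t, w2_t = rng.standard_normal((4, 2)), rng.standard_normal((2, 4))
w1_c, w2_c = rng.standard_normal((8, 4)), rng.standard_normal((4, 8))

out_t, time_weights = axis_attention(x, 2, w1_t, w2_t)          # "when"
out_c, channel_weights = axis_attention(out_t, 1, w1_c, w2_c)   # "what"
```

Applying the temporal step before the channel step is an arbitrary ordering chosen for the example; spatial attention would pool over the channel and time axes in the same spirit.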
Attention Modules
A common way to improve the average recognition accuracy of an existing network model is to embed an attention module, and many networks have benefited from doing so. In this paper, a lightweight attention module, CBAM, is therefore improved and embedded into the existing SlowFast network to raise its average recognition accuracy [24].
CBAM consists of two sub-modules: the channel attention module (CAM) and the spatial attention module (SAM). CAM applies channel attention to the feature map, i.e., it retains channel information, compresses spatial information, and focuses on “what” is in the input. SAM applies spatial attention to the feature map, i.e., it retains spatial information, compresses channel information, and focuses on “where” in the input.
CBAM attends only to channel and spatial information and ignores temporal information. For this reason, this paper proposes a (3D time + 3D channel + 3D space) attention convolution module that improves the average recognition accuracy of the existing SlowFast network while preserving the temporal information between consecutive video frames.
3D temporal attention part
The 3D temporal attention part uses the temporal information of the Feature map obtained from SlowFast network to generate a 3D temporal attention map, which focuses on the “when” information in the input video. The specific process is as follows:
First, the 3D temporal attention module uses 3D GMP and 3D GAP to aggregate the channel information and spatial information in the input feature map, generating two different feature maps:
The formulas of 3D temporal attention module are shown in Eqs. (19) to (20):
3D channel attention part
The 3D channel attention module uses the feature map obtained from SlowFast network to generate the 3D channel attention map, which focuses on the “what” information in the input video. The specific process is as follows:
First, the 3D channel attention module uses 3D GMP and 3D GAP to aggregate the temporal and spatial information in the input feature map, generating two different 3-dimensional feature maps:
The formulas of the 3D channel attention module are shown in Eqs. (21) to (22):
3D Spatial Attention
The 3D spatial attention part uses the feature map obtained from SlowFast network to generate the 3D spatial attention figure
The computation of the 3D spatial attention module is shown in Eqs. (23) to (24):
Loss function
In multi-label action recognition tasks with spatio-temporal information, the loss function used by conventional deep learning network models is the binary cross-entropy loss (BCE Loss). The formula for BCE Loss is as follows:
Compared with BCE Loss, Focal Loss increases the weight value of (1 –
In order to increase the loss weight of the sample categories of the tail class data and reduce the influence of the long-tail effect on the model training, this paper fuses the BCE Loss with the Focal Loss and proposes the BCEF Loss to improve the average identification accuracy of the model. The formula of the BCEF Loss is shown in equations (27) to (29):
The core idea of this paper is to improve the overall average correct recognition rate of the model by adjusting the focus parameter
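For reference, the two standard ingredients named above can be written in a few lines of NumPy. The BCE and Focal Loss functions below follow their usual textbook definitions; the function `blended_loss` is only a hypothetical convex combination used to show how the two terms could be fused, and it should not be read as the BCEF Loss of equations (27) to (29). The parameter names `gamma` and `lam` are assumptions.

```python
import numpy as np

def bce_loss(p, y, eps=1e-7):
    """Element-wise binary cross-entropy for probabilities p and labels y in {0, 1}."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Element-wise focal loss: BCE scaled by (1 - p_t) ** gamma."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)      # probability given to the true label
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

def blended_loss(p, y, gamma=2.0, lam=0.5):
    """Hypothetical blend of BCE and focal loss (NOT the paper's BCEF formula)."""
    return float(np.mean(lam * bce_loss(p, y) + (1.0 - lam) * focal_loss(p, y, gamma)))

probs = np.array([0.9, 0.6, 0.2, 0.05])     # made-up predictions
labels = np.array([1, 1, 0, 1])             # made-up multi-label targets
per_label_bce = bce_loss(probs, labels)
per_label_focal = focal_loss(probs, labels, gamma=2.0)
```

With `gamma = 0` the focal term reduces to BCE, and for larger `gamma` well-classified labels contribute less, which is why the focus parameter controls how strongly hard or tail-class samples dominate training.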
To further assess how well prescribed movements in sports dance are completed, this paper develops a movement assessment strategy. The training set is used to build the prescribed movement templates and to determine the qualified ranges of movement quality, referred to collectively as the ranges of the movement similarity index; these ranges determine the grade to which a prescribed movement belongs, which is how movement quality is assessed. The data in the test set are used as the actual action sequences, and the optimized DTW algorithm is used as the evaluation model: it computes the action similarity index by comparing each actual action sequence with the corresponding prescribed action template sequence, looks up the action quality grade in the action evaluation grade table, and then evaluates the actual action and provides targeted guidance according to that grade [26].
The DTW algorithm uses dynamic programming to search for the minimum-cost alignment path between two time series that may differ in length. By aligning the two sequences, the algorithm measures the similarity between them more precisely than a point-by-point comparison. DTW therefore has strong potential for sports dance movement quality assessment: it can match the actual movement sequence against the template movement sequence, calculate the similarity between them, and provide a basis for assessing how standard and precise a movement is.
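A minimal sketch of the classic DTW recurrence is given here for orientation. It assumes Euclidean distance between frames and the standard step pattern in which each cell is reached from its left, lower, or diagonal neighbour; it does not include the improvements proposed later in this paper, and the function name `dtw_distance` is chosen for the example.

```python
import numpy as np

def dtw_distance(c, q):
    """Classic DTW distance between sequences c and q.

    Each sequence is a list of scalars or of equal-length feature vectors
    (for example, joint angles per frame).
    """
    c = np.asarray(c, dtype=float).reshape(len(c), -1)
    q = np.asarray(q, dtype=float).reshape(len(q), -1)
    n, m = len(c), len(q)

    # cost[i, j] = minimal accumulated distance aligning c[:i] with q[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0                      # boundary condition: paths start at (1, 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(c[i - 1] - q[j - 1])
            # monotonic and continuous: only adjacent predecessors are allowed
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])              # boundary condition: paths end at (n, m)

same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])   # same movement, slower tempo
shifted = dtw_distance([0, 0], [1, 1])
```

The first call returns zero because the repeated frame in the second sequence is absorbed by the warping, which is exactly the tolerance to tempo differences that makes DTW suitable for comparing dancers. The double loop also makes the quadratic time and memory cost of the basic algorithm visible.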
The main idea is: Let there be two different time sequences of the same sport dance movement, sequence
The correspondence of sequence points between Sequence

Mapping between sequence points before and after dynamic time warping
The process of utilizing the DTW algorithm for sports dance movement evaluation is as follows:
First, in order to align sequence
Let
There exists a shortest path in
Boundary conditions: the first and last points of the two sequences must correspond one-to-one, which ensures that the alignment is complete and coherent and that no sequence fragment is left out.
Monotonicity: because individuals differ, each person performs the action at a different speed, but the order of the action sequence must stay the same, so the alignment can only advance with time. Therefore, if
Continuity: the next point after the current point can only be a point directly adjacent to it. That is, the next point of the current point (
The above conditions show that if the coordinate of the current point is (
In this way, the DTW distances of sequences C and Q can be obtained
0.6 is selected as the critical value, when 0.6 ≤ Ψ(
For the task of sports dance movement evaluation, the DTW algorithm does show high accuracy. However, at the same time, its complexity should not be neglected. The DTW algorithm calculates the Euclidean distance, which leads to both time and space complexity of
In order to improve the performance of the DTW algorithm in the task of sports dance movement evaluation, the following improvement strategies are adopted for the DTW algorithm:
Relax the global path restriction: expand the slope range from 0.5~2 to 0.2~3, which reduces the constraints of the algorithm in finding the optimal path, and thus reduces the amount of computation. Meanwhile, due to the continuity and fluidity of sports dance movements, relaxing the slope restriction can better capture the similarity between movements and improve the accuracy of evaluation.
Coarse-graining search for shortest path. First, the full-resolution matrix
In order to verify the effectiveness of this paper’s method for sports dance basic movement recognition, the TAR-DL network model was trained using the training set of sports dance basic movement dataset, and the accuracy of the model was tested using the test set. Compared to the action recognition method based on video data, the recognition method based on skeletal coordinate information has higher recognition efficiency. Therefore, in the comparison experiments, this paper first compares different methods based on 3D action recognition on the NTU RGB-D dataset, and the comparison results of different methods on the NTU RGB-D dataset are shown in Table 1.
Comparison of 3D-based recognition methods in NTU RGB-D
Methods | CV/% | CS/% |
---|---|---|
ST-LSTM | 78.94 | 70.31 |
TSRJI | 81.41 | 74.25 |
Clips+CNN+MTLN | 85.67 | 80.46 |
ST-GCN | 89.44 | 82.17 |
DPRL | 90.75 | 84.38 |
SGN | 94.38 | 87.52 |
2S-AGCN | 96.24 | 89.43 |
2S-NLGCN | 96.24 | 89.43 |
MS-AAGCN | 97.16 | 90.27 |
Sym-GNN | 97.58 | 90.49 |
MS-AAGCN+TEM | 97.69 | 91.84 |
TAR-DL | 99.72 | 97.41 |
The results show that 3D-based action recognition, with its richer joint relationships, helps capture more useful patterns, and that additional motion prediction and completion based on skeleton features improves recognition on the NTU RGB-D dataset. To deal with noise and occlusion in 3D skeleton data, ST-LSTM introduces a gating mechanism into the LSTM that learns the reliability of the sequential input data and adjusts its effect on the long-term contextual information stored in the memory cells accordingly; its cross-view (CV) recognition rate in Table 1 is 78.94%. ST-GCN, which combines a GCN model with a TCN model into a two-stream spatio-temporal dynamic skeleton model configured with three-dimensional convolutional filters, reaches 89.44% and outperforms network structures such as ST-LSTM, TSRJI, and Clips+CNN+MTLN. However, although the GCN model is hierarchical and the data in action recognition tasks are diverse, the topology of its graph is set heuristically and kept fixed across all model layers and input data, which limits how well it handles data with different graph structures. The recognition accuracy of MS-AAGCN is 97.16%, and adding the TEM (Time Extension Module) to MS-AAGCN raises it to 97.69%. The TAR-DL method used in this paper, which embeds the (3D time + 3D channel + 3D space) attention module and the BCEF loss function in the SlowFast network, reaches a recognition rate of 99.72%, the highest among the compared methods.
Given that sports dance is divided into 10 dance categories, this paper conducts recognition experiments on the basic movements of one of them, the Tango. To examine the TAR-DL classification results on the self-constructed sports dance video dataset, a confusion matrix is used to evaluate the performance of TAR-DL. The resulting confusion matrix for sports dance movement recognition is shown in Fig. 5, where each diagonal element gives the percentage of samples of that class that are recognized correctly.

Confusion matrix of ballroom dancing basic movement recognition
The confusion matrix shows that TAR-DL copes well with shape changes and skeleton noise in large-scale data. Because sports dance is performed in pairs, the interaction between the dancers causes occlusion, and how consistently each dancer moves can affect the experimental results. As can be seen from Figure 5, the recognition rates of the six basic movements “Progressive Side Step”, “Closed Promenade”, “Lock Turn”, “R.F&L.F Lock Turn”, “Progressive Link” and “Walk” all exceed 90%, indicating that the TAR-DL model in this paper recognizes sports dance movements well and is feasible. Specifically, “Progressive Side Step”, “Closed Promenade”, and “Progressive Link” are confused with one another: 1% of “Progressive Side Step” samples are mistaken for “Closed Promenade” and 1% for “Progressive Link”; 1% of “Closed Promenade” samples are mistaken for “Progressive Side Step” and 2% for “Progressive Link”; and 1% of “Progressive Link” samples are mistaken for “Progressive Side Step” and 2% for “Closed Promenade”. “Lock Turn” is also confused with “R.F&L.F Lock Turn”, with 4% of “Lock Turn” samples mistaken for “R.F&L.F Lock Turn”. The confusion arises mainly because the two partners play different active and passive roles when their movements are extracted, information about the relationships between skeleton joints is lacking, and some movements are similar to one another, which makes them difficult to capture and distinguish.
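The percentages discussed above are row-normalized confusion matrix entries. The sketch below shows how such a matrix can be computed from true and predicted class indices; the labels are made-up toy data, not the results of Fig. 5, and the function name `confusion_matrix_percent` is chosen for the example.

```python
import numpy as np

def confusion_matrix_percent(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix in percent.

    Entry [i, j] is the share of samples of true class i predicted as class j,
    so each diagonal entry is the recognition rate of that class.
    """
    counts = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows without samples stay at zero instead of dividing by zero.
    return 100.0 * np.divide(counts, row_sums,
                             out=np.zeros_like(counts), where=row_sums > 0)

# Toy labels for two classes (illustrative only).
toy_true = [0, 0, 0, 0, 1, 1]
toy_pred = [0, 0, 0, 1, 1, 1]
toy_cm = confusion_matrix_percent(toy_true, toy_pred, n_classes=2)
```

Reading along a row shows where the samples of one movement end up, which is how statements such as "4% of one class are mistaken for another" are obtained.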
In order to verify the effectiveness of the DTW algorithm improvement strategy designed in this paper, a sports dance athlete competition video in a standard dataset is randomly selected as the prediction data, and the distance of the three basic movements of “Progressive Side Step”, “Closed Promenade” and “Progressive Link” is analyzed by the Improved-DTW algorithm proposed in this paper. As shown in Figure 6, (a)~(c) are the comparison of the three dance movements of “Progressive Side Step”, “Closed Promenade” and “Progressive Link”, respectively. The shaded portion depicts the distinction between the standard sequence and the matched sequence of the Improved-DTW algorithm.

Comparison of foot angle change of athletes with standard sequence
From Figure 6, it can be observed that the two trajectories stay close to each other during each movement, implying that the athletes’ foot movement trajectories are relatively standard and highly consistent with the ideal dance movements. If, however, a significant distance between the matched line and the standard line appears during a movement, it indicates that the athlete’s posture is deficient and fails to fully meet the standard requirements of the movement. Additionally, the data collected during this experiment revealed the dynamic changes in the athlete’s front and back swing during every move.
To verify the performance of the Improved-DTW algorithm, it is compared experimentally with the DTW, FastDTW and HMM algorithms in this section, and the recognition results of each algorithm are shown in Table 2.
Experimental comparison
Algorithms | Accuracy rate/% | Time consumption/ms |
---|---|---|
DTW | 91.25 | 2.04 |
FastDTW | 86.73 | 1.74 |
HMM | 90.68 | 2.18 |
Improved-DTW | 94.39 | 1.67 |
The comparison shows that the Improved-DTW algorithm proposed in this paper achieves the highest accuracy among the compared algorithms. Because Improved-DTW relaxes the global path restriction and uses a coarse-grained search for the shortest path, it outperforms the DTW algorithm in both time consumption and accuracy: accuracy improves by 3.14 percentage points and time consumption falls by 0.37 ms, which supports the effectiveness of the improvement strategy designed in this paper. These experimental comparisons demonstrate the effectiveness of the Improved-DTW algorithm in the task of sports dance movement recognition.
In this paper, the postures of each dancer practicing the same piece of sports dance are selected, and the similarity scores between each dancer’s posture and the standard posture are calculated separately. In the calculation, the weights of all body parts are set to 1, and the weights of each category of scores are also set to 1. The similarity scores calculated with this paper’s sports dance movement assessment method based on the improved DTW algorithm are shown in Table 3, where A1~A7 denote the different dancers: A1 and A2 are professional dancers, and A3~A7 are amateur dancers.
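The two differences quoted above can be checked directly against Table 2. The short script below uses only the values from that table; the variable names are chosen for the example.

```python
# (accuracy in %, time consumption in ms) taken from Table 2
table2 = {
    "DTW": (91.25, 2.04),
    "FastDTW": (86.73, 1.74),
    "HMM": (90.68, 2.18),
    "Improved-DTW": (94.39, 1.67),
}

acc_gain = table2["Improved-DTW"][0] - table2["DTW"][0]      # percentage points
time_saved = table2["DTW"][1] - table2["Improved-DTW"][1]    # milliseconds

most_accurate = max(table2, key=lambda name: table2[name][0])
fastest = min(table2, key=lambda name: table2[name][1])

print(f"accuracy gain over DTW: {acc_gain:.2f} percentage points")
print(f"time saved over DTW: {time_saved:.2f} ms")
print(f"most accurate: {most_accurate}, fastest: {fastest}")
```

Note that the accuracy difference is an absolute difference between two percentages, so it is best read as percentage points rather than as a relative improvement.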
Ballroom dancing movement similarity score using the method in this paper
Dancer | | | | | Overall score |
---|---|---|---|---|---|
A1 | 91.9 | 93.0 | 90.6 | 84.6 | 90.0 |
A2 | 89.8 | 90.8 | 91.0 | 89.7 | 90.3 |
A3 | 70.1 | 72.2 | 69.2 | 72.1 | 70.9 |
A4 | 79.5 | 80.9 | 63.7 | 55.2 | 69.8 |
A5 | 77.9 | 81.4 | 65.9 | 60.8 | 71.5 |
A6 | 65.8 | 58.9 | 61.1 | 60.5 | 61.6 |
A7 | 56.0 | 60.3 | 49.7 | 56.8 | 55.7 |
As seen in Table 3, the scoring results of professional dancers A1 and A2’s sport dance postures were 90.0 and 90.3 respectively, which were much higher than those of other dancers. The dance scores of amateur dancers, such as A3~A7, were lower than those of the professional dancers, with scores all lower than 72. This reflects the validity of the method of this paper to some extent.
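Since all category weights are set to 1, the final column of Table 3 appears to be the plain average of the four preceding score columns. The sketch below recomputes it under that assumption; the function name `overall_score` and the optional `weights` argument are illustrative, and the category names are not given in the table, so the scores are kept as unnamed lists.

```python
import numpy as np

# Four category scores per dancer, copied from Table 3.
category_scores = {
    "A1": [91.9, 93.0, 90.6, 84.6],
    "A2": [89.8, 90.8, 91.0, 89.7],
    "A3": [70.1, 72.2, 69.2, 72.1],
    "A4": [79.5, 80.9, 63.7, 55.2],
    "A5": [77.9, 81.4, 65.9, 60.8],
    "A6": [65.8, 58.9, 61.1, 60.5],
    "A7": [56.0, 60.3, 49.7, 56.8],
}
# Final column of Table 3.
reported = {"A1": 90.0, "A2": 90.3, "A3": 70.9, "A4": 69.8,
            "A5": 71.5, "A6": 61.6, "A7": 55.7}

def overall_score(scores, weights=None):
    """Weighted mean of category scores; equal weights when none are given."""
    s = np.asarray(scores, dtype=float)
    w = np.ones_like(s) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * s) / np.sum(w))

recomputed = {dancer: overall_score(s) for dancer, s in category_scores.items()}
max_gap = max(abs(recomputed[d] - reported[d]) for d in reported)
```

The recomputed means agree with the reported column to within rounding to one decimal place, and passing a different `weights` vector shows how emphasizing particular categories would shift the overall score.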
In order to further prove the feasibility of this paper’s method, the results obtained from this paper’s method are compared with the scoring results of professional sports dance teachers, and the comparison of scoring results is shown in Table 4. As can be seen from Table 4, the results of the sports dance movement evaluation method based on the improved DTW algorithm described in this paper are relatively close to the scoring results of professional sports dance teachers. It shows that the method of this paper can be applied to the movement scoring task of sports dance training and teaching in modern colleges and universities.
Comparison of scoring results
Dancer | Upper-body fluidity | Lower-body fluidity | Musical timing | Body balance | Choreography | Score from the method in this paper |
---|---|---|---|---|---|---|
A1 | 4 | 4 | 5 | 5 | 4 | 90.0 |
A2 | 4 | 5 | 5 | 5 | 5 | 90.3 |
A3 | 4 | 3 | 5 | 3 | 4 | 70.9 |
A4 | 4 | 5 | 3 | 2 | 5 | 69.8 |
A5 | 4 | 3 | 4 | 4 | 4 | 71.5 |
A6 | 4 | 2 | 4 | 2 | 3 | 61.6 |
A7 | 2 | 3 | 3 | 2 | 3 | 55.7 |
In addition, this paper also compares the scoring results without and with movement sequence alignment as shown in Table 5, which shows that the scoring effect is closer to the scoring results of professional sport dance teachers after using movement sequence alignment.
Evaluation results with and without alignment processing
Dancer | Teacher score (normalized) | With alignment | Without alignment |
---|---|---|---|
A1 | 93 | 90.0 | 67.4 |
A2 | 93 | 90.3 | 38.5 |
A3 | 82 | 70.9 | 60.2 |
A4 | 78 | 69.8 | 44.8 |
A5 | 82 | 71.5 | 52.9 |
A6 | 58 | 61.6 | 30.6 |
A7 | 50 | 55.7 | 41.5 |
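One simple way to quantify how close each variant is to the teachers' scores is the mean absolute error (MAE) over the seven dancers. The paper does not state which closeness measure it uses, so MAE is an assumption made for this sketch; the values are those of Table 5.

```python
import numpy as np

# Columns of Table 5: normalized teacher score, score with alignment,
# score without alignment, for dancers A1..A7.
teacher = np.array([93, 93, 82, 78, 82, 58, 50], dtype=float)
aligned = np.array([90.0, 90.3, 70.9, 69.8, 71.5, 61.6, 55.7])
unaligned = np.array([67.4, 38.5, 60.2, 44.8, 52.9, 30.6, 41.5])

def mean_absolute_error(a, b):
    """Average absolute difference between two score vectors."""
    return float(np.mean(np.abs(a - b)))

mae_aligned = mean_absolute_error(teacher, aligned)
mae_unaligned = mean_absolute_error(teacher, unaligned)

print(f"MAE with alignment:    {mae_aligned:.1f}")
print(f"MAE without alignment: {mae_unaligned:.1f}")
```

Under this measure the scores obtained with movement sequence alignment deviate from the teachers' scores by about 6.4 points on average, against about 28.6 points without alignment.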
Based on OpenPose algorithm, TAR-DL method and improved DTW algorithm, this paper explores the application of deep learning technology in the training and teaching of sports dance in modern colleges and universities, and provides data references for dance teaching through accurate recognition of sports dance movements.
The 3D-based motion recognition has richer joint relationships, which helps to capture more useful patterns, and the recognition effect is better than that of traditional 2D recognition methods. The TAR-DL method used in this paper, embedding (3D time + 3D channel + 3D space) attention module and BCEF loss function in the SlowFast network, has a recognition rate of 99.72% for dance sports training movements, which is significantly better than other 3D recognition methods. At the same time, in the TAR-DL classification experiment of sports dance movements, the recognition rate of six basic sports dance movements of “Progressive Side Step”, “Closed Promenade”, “Lock Turn”, “R.F&L.F Lock Turn”, “Progressive Link” and “Walk” reached more than 90%. The results show that the TAR-DL model in this paper has a good recognition effect on sports dance movements and is feasible.
When the Improved-DTW algorithm proposed in this paper is applied to sports dance movements, the trajectory of the athlete’s feet during each movement is relatively consistent with the ideal dance movement. Compared with the DTW algorithm, the Improved-DTW algorithm reduces time consumption by 0.37 ms and increases accuracy by 3.14 percentage points, which supports the effectiveness of the improvement strategy designed in this paper. In addition, when the Improved-DTW algorithm is used in the actual dancer movement evaluation task, the evaluation results are relatively close to those of professional sports dance teachers, which supports the applicability of the Improved-DTW algorithm to sports dance movement evaluation.