A Multidimensional Mining and Pattern Recognition Approach for Piano Teaching Behavior Data in Music Education

Piano is one of the important basic courses and a compulsory course for music education majors in colleges and universities. The cultivation and improvement of students’ piano playing ability and performance level are especially important for the music teaching majors in colleges and universities. For a long time, the research and discussion on the reform of piano teaching for college music teaching majors has been very active, highlighting the importance of piano teaching for talent cultivation. With the arrival of the “Internet+” era represented by big data, Internet of Things and cloud computing, traditional piano teaching faces unprecedented opportunities and challenges.

The traditional teaching method for piano playing courses is relatively uniform. It does not consider the actual learning situation and needs of each student, and the resulting “one-size-fits-all” teaching mode both weakens the teaching effect of playing and singing classes and restricts students’ initiative in learning to play. Big data technology can identify and analyze the knowledge mastery and learning needs of all students in a playing and singing class, so that different teaching programs can be developed for different students. This ensures that the knowledge taught is better suited to students’ learning needs and fully stimulates their initiative in learning [1-2]. After class, teachers can again focus on explaining the knowledge that students have not mastered and assign targeted training tasks, thereby consolidating the theoretical basis students have learned and raising their level of playing and singing technique [3-4]. In addition, through the flexible application of big data technology, teachers can identify their own teaching deficiencies, which better guides their subsequent “playing and singing” teaching work and helps to improve the quality and efficiency of “playing and singing” teaching as a whole [5].

Cheng, M. et al. constructed a piano playing gesture recognition model based on the Extreme Learning Machine algorithm. The model obtains dynamic information about the hand joints of piano learners by recognizing and analyzing changes in their hand gestures, and a data analysis model then uses this information to recommend personalized piano learning resources [6]. Johnson, D. et al. designed an automatic evaluation system that recognizes piano hand poses by applying machine learning to depth maps for hand segmentation and hand pose detection. Experiments showed that the best performance was achieved by the model that used deep contextual features for hand image segmentation and normal vector histograms for hand image detection [7]. Hsiao, C. P. et al. proposed a glove that simulates the haptics of a piano teacher’s playing. Embedded vibration sensors capture the sound signals during the teacher’s playing, and machine learning algorithms recognize them as tapping behaviors, which complements the way students learn during piano teaching [8]. Chen, Y. C. et al. developed a human posture image recognition system for piano playing based on edge computing technology. It captures side-sitting, front-sitting, and hand images of a player, which helps to standardize students’ postures during piano playing [9].

Dai, L., addressing the low recognition accuracy of two-piano playing sequences in noisy and reverberant environments, proposed a neural network-based assisted training and analysis system that provides more scientific training means for two-piano players and significantly improves their training efficiency and performance quality [10]. Huang, N. et al. improved the BP neural network-based note recognition method used in traditional music teaching by fusing the endpoint detection algorithm with the radial frequency extraction algorithm, which strengthened the model’s performance in note timing and base-note recognition, and implemented a piano performance evaluation model on the improved BP neural network [11]. Chen, Q. divided the piano performance scoring system into single-note and multi-note recognition tasks. For real-time single-note recognition, algorithms such as local energy endpoint detection were used to improve the real-time performance and robustness of recognition, improve the stability of the scoring system, and give students the correct feedback needed for their performance [12]. Yu, Z. et al. emphasized the importance of automatically recognizing and evaluating playing intensity for music teaching assistance and proposed a piano playing intensity evaluation system whose performance initially meets expectations and which can accurately assess the piano playing effect under interference [13]. Xue, X. et al. examined the use of AI wireless networks in music teaching, using MIDI and audio editing to capture and record piano performances labeled with notes and audio waveforms, so that students could both train with the support of AI multiple signal classification algorithms and select teachers accordingly [14]. Asahi, S. et al. used a piano practice support system based on a long short-term memory network to extract information about learners’ performances during practice, enabling students to evaluate the rhythm and melody of their performances through systematic analysis of this information and to practice the piano independently [15].

This study is oriented towards multidimensional data mining of teaching behaviors. Based on the characteristic attributes of teaching behaviors, information gain (IG) is adopted to measure the role of features, and the teaching behavior data are then clustered. To reduce the interference present in teaching videos and better identify teaching behavior patterns, this paper proposes the Teacher-Set IE algorithm to identify and extract teaching behavior patterns. By bilinearly aggregating the last layer of a 2D convolutional neural network and a 3D convolutional neural network across layers, a teacher behavior pattern recognition model based on 3D bilinear pooling (3D BP-TBR) is proposed. Finally, the practical effectiveness of the proposed teacher behavior pattern recognition method is tested through experiments.

2 Multi-dimensional Data Mining and Pattern Recognition for Teaching Behavior

2.1 Multidimensional Mining for Instructional Behavior Data

2.1.1 Data mining

Data mining, also known as knowledge discovery in databases, is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random data drawn from practical applications. After more than 20 years of development, data mining technology has become more mature theoretically, and a considerable number of data mining products and application systems have emerged and been applied successfully.

Data mining is a multidisciplinary cross-cutting information technology, which contains theories and techniques from a number of subject areas, such as databases, machine learning, artificial intelligence, and statistics. Databases, artificial intelligence, and mathematical statistics are strong technical pillars of data mining research. Methods and mathematical tools for data mining include statistics, decision trees, neural networks, fuzzy logic, linear programming, etc.

The basic steps of data mining are shown in Fig. 1. Data mining is a complete, iterative process of human-computer interaction that proceeds through several steps. Generally speaking, the process consists of five main stages: data preparation, data selection, data preprocessing, data mining, and transforming models and patterns.

Data preparation involves identifying the research object and setting predictable objectives for the project. Data selection involves collecting and parsing the data needed for the study. Data preprocessing includes operations such as cleaning, outlier removal, filtering, denoising, and normalization. Model evaluation assesses the effectiveness of the constructed model and fully explains and justifies the discovered knowledge so that it can aid practical problems.

2.1.2 Multidimensional data processing of teaching behavior

In feature extraction for multidimensional data mining, the feature dimension grows as new features are added, and without restriction this easily leads to the curse of dimensionality. When the dimension of the extracted features is high, some features will be correlated or redundant. An overly large feature space may increase computational complexity, harm model training accuracy, and degrade classification. High dimensionality is therefore a key and difficult problem in data mining research, and features must be managed carefully during behavior recognition so that the number of features is as small as possible while the information they carry remains as complete as possible.

When it is uncertain which feature attributes of teaching behaviors should be included in class characterization or class comparison, feature filtering methods are used to identify irrelevant or weakly relevant attributes, so that a subset of features can be selected to represent the teaching behavior data in classification.

Information gain (IG) is an effective measure of the role of features. The IG value of a feature characterizes the average contribution that the feature makes to classification. The larger the IG value of a feature, the greater its role in classification on that corpus set. If the same feature has significantly different IG values on two different corpus sets, the role it plays in the two corpus sets differs significantly.

The calculation of information gain is described below.

Let S be a set of s data samples, and assume that the class label attribute has m different values C_i (i = 1, 2, …, m).

Let s_i be the number of samples of class C_i. The entropy, or expected information, of the sample set is: (1) $I(s_{1}, s_{2}, \dots, s_{m}) = -\sum_{i=1}^{m} p_{i} \log_{2}(p_{i})$

where p_i is the probability that an arbitrary sample belongs to C_i, estimated as s_i/s.

Let attribute A have v different values $\{a_{1}, a_{2}, \dots, a_{v}\}$. Attribute A partitions S into v subsets $\{S_{1}, S_{2}, \dots, S_{v}\}$, where S_j is the subset of samples in S for which attribute A takes the value a_j, and s_ij is the number of samples in subset S_j whose class is C_i. The entropy, or expected information, of the partition induced by attribute A is: (2) $E(A) = \sum_{j=1}^{v} \frac{s_{1j} + s_{2j} + \dots + s_{mj}}{s} I(s_{1j}, s_{2j}, \dots, s_{mj})$

where the term $\frac{s_{1j} + s_{2j} + \dots + s_{mj}}{s}$ is the weight of subset S_j, equal to the number of samples in S_j divided by the total number of samples in S. The smaller the entropy value, the higher the purity of the partition. For a given subset S_j: (3) $I(s_{1j}, s_{2j}, \dots, s_{mj}) = -\sum_{i=1}^{m} p_{ij} \log_{2}(p_{ij})$

where $p_{ij} = \frac{s_{ij}}{|S_{j}|}$ is the probability that a sample in S_j belongs to class C_i. The information gain of partitioning by attribute A is then: (4) $\mathrm{Gain}(A) = I(s_{1}, s_{2}, \dots, s_{m}) - E(A)$

The information gain is calculated for each attribute of the samples in S. The attribute with the highest information gain is the most discriminative attribute in the given set.

The specific steps of feature screening are to calculate the information gain IG of each behavioral feature for a given training set, and to remove from the feature space those feature attributes whose IG is lower than a set threshold, and the calculation process includes the calculation of probability and entropy for each feature attribute.
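The screening step can be illustrated with a small Python sketch of Eqs. (1) to (4). The attribute names, class labels, and threshold below are hypothetical toy values chosen only for illustration, and the function names are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Expected information I(s_1, ..., s_m) of a list of class labels, Eq. (1)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(A) = I(S) - E(A) for one attribute, Eqs. (2)-(4)."""
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)          # S_j: samples where A takes value a_j
    total = len(labels)
    expected = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - expected

def screen_features(table, labels, threshold):
    """Drop attributes whose information gain is below `threshold`."""
    gains = {name: information_gain(col, labels) for name, col in table.items()}
    return {name: g for name, g in gains.items() if g >= threshold}

# Hypothetical toy data: two attributes observed on four labelled samples.
labels = ["S1", "S1", "S4", "S4"]
table = {
    "demonstrates": ["no", "no", "yes", "yes"],  # separates the classes perfectly
    "room": ["A", "B", "A", "B"],                # carries no class information
}
kept = screen_features(table, labels, threshold=0.5)
```

On this toy data the attribute that separates the two classes has a gain of 1 bit and is kept, while the uninformative attribute has a gain of 0 and is filtered out.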

Teaching behavior features are of various types, and their data representation must be normalized to facilitate fused computation. After data collection, data mining consists of three stages: data preprocessing, pattern discovery, and pattern analysis. Because the preprocessed data is the source for pattern discovery, the quality of preprocessing directly affects the final result. A good data source not only yields high-quality patterns but also improves mining performance. Data preprocessing is therefore the foundation of the whole data mining process and the key to assuring its quality. The data processing procedure is as follows:

1) Data cleaning: filling in missing values, smoothing noisy data, and identifying and removing outliers. Dirty data can throw the mining process into chaos and lead to unreliable output. In the multidimensional data mining in this paper, the filter coefficient part determines the teaching behavior filtering in real time, without waiting for the whole teaching process to finish before a conclusion is given, so some fields in the teaching behavior logs will be missing. These missing fields are filled with default values, whereas records missing key information are removed directly during preprocessing.

2) Data integration: combining data from multiple sources into a consistent data store. For this research project, the data sources include teaching behavior data and teacher data, and during preprocessing the required feature attributes are extracted from both to form a new data set.

3) Data transformation: transforming data into a form suitable for mining by summarizing and aggregating it and by using concept hierarchies. Low-level “raw” data is generalized into higher-level concepts, attribute data is normalized to fall within specific intervals, and new attributes are constructed and added to the attribute set to aid the mining process.
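As a rough illustration of the cleaning and normalization steps, the following Python sketch drops records that lack key information, fills other missing fields with defaults, and min-max scales a numeric attribute to [0, 1]. The field names (`teacher_id`, `duration`, `behavior`), the default values, and the choice of min-max scaling are assumptions made for this example; the paper does not specify them.

```python
def preprocess(records, key_fields, defaults, numeric_fields):
    """Clean behaviour-log records and min-max normalise numeric attributes."""
    cleaned = []
    for rec in records:
        # Records missing key information are removed outright.
        if any(rec.get(k) is None for k in key_fields):
            continue
        # Remaining missing fields fall back to the supplied default values.
        present = {k: v for k, v in rec.items() if v is not None}
        cleaned.append({**defaults, **present})
    if not cleaned:
        return cleaned
    for field in numeric_fields:
        vals = [r[field] for r in cleaned]
        lo, span = min(vals), max(vals) - min(vals)
        for r in cleaned:
            # Scale to [0, 1]; a constant column maps to 0.0.
            r[field] = (r[field] - lo) / span if span else 0.0
    return cleaned

# Hypothetical log entries with missing fields.
logs = [
    {"teacher_id": "t1", "duration": 10.0, "behavior": "demo"},
    {"teacher_id": "t2", "duration": None, "behavior": "explain"},
    {"teacher_id": None, "duration": 30.0, "behavior": "demo"},
    {"teacher_id": "t3", "duration": 20.0, "behavior": None},
]
result = preprocess(
    logs,
    key_fields=["teacher_id"],
    defaults={"duration": 0.0, "behavior": "unknown"},
    numeric_fields=["duration"],
)
```

Here the record without a `teacher_id` is discarded, the missing duration and behavior are filled with defaults, and the durations are then rescaled.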

2.2 Teaching behavior clustering based on multidimensional data mining

Research on clustering in data mining has focused on finding appropriate and effective clustering methods for large databases. Popular research themes include the scalability of clustering methods, high-dimensional cluster analysis techniques, the effectiveness of methods for clustering complex data types, and clustering methods for mixed numerical and categorical data in large databases.

As an important data analysis method, clustering has received growing attention in large-scale data applications. Based on the similarity between data points, a mathematical model divides the data set into classes such that similarity between classes is minimized and similarity within classes is maximized. A large number of clustering algorithms have emerged since clustering was first introduced. In this section, clustering algorithms are classified into six categories: partition-based, hierarchical, spectral, grid-based, density-based, and fuzzy clustering algorithms.

The process of the k-means algorithm is as follows:

1) Randomly select k points to represent the initial centers of the k classes to be divided.

2) Compute the distance from each remaining data point to the centers of the k classes and assign each point to the closest class.

3) For the newly formed k classes, calculate the mean of each class as its new center.

4) Repeat 2) and 3) until the class centers no longer change.

The initial centers of the k-means clustering algorithm are selected at random, and different initial points can lead to different clustering results.
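The four steps can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in the paper: the `seed` and `max_iter` parameters and the handling of empty clusters are assumptions added so that the example is reproducible and always terminates.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # step 1: random initial centers
    labels = None
    for _ in range(max_iter):
        # step 2: assign every point to its closest center
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        # step 3: recompute each center as the mean of its members
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # assumption: keep an empty cluster's center
        labels = new_labels
        # step 4: stop once the centers no longer change
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Hypothetical 2D data with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
```

Changing `seed` changes the initial centers, which is the source of the run-to-run variation in clustering results noted above, although on data this well separated the final grouping is the same.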

The k-medoids clustering algorithm represents each class by the data point closest to the center of the class. The k-medoids algorithm is used in many applications because of its fast convergence, local search capability, and simplicity.

The k-medoids clustering algorithm is formulated as follows. Given a set $X = (x_{1}, x_{2}, \dots, x_{n})$ of n data points, where $x_{i} = (x_{i1}, x_{i2}, \dots, x_{ip})$ denotes a p-dimensional data point, the algorithm seeks a partition $p_{k} = (c_{1}, c_{2}, \dots, c_{k})$ of X that minimizes the objective function J: (5) $J = \sum_{i=1}^{k} \sum_{x_{j} \in c_{i}} (x_{j} - o_{i})^{2}$

where o_i denotes the center point of class c_i.

Intra-class similarity in the k-medoids algorithm is usually measured with the Euclidean distance, defined as: (6) $d(x, y) = \sqrt{(x_{1} - y_{1})^{2} + (x_{2} - y_{2})^{2} + (x_{3} - y_{3})^{2} + \dots + (x_{p} - y_{p})^{2}}$

where $x = (x_{1}, x_{2}, \dots, x_{p})$ and $y = (y_{1}, y_{2}, \dots, y_{p})$ are two p-dimensional data points in the set of data points.

The exact procedure of the k-medoids clustering algorithm is as follows:

1) Select k different data points from the set X of n data points as the initial class centers o_i.

2) Calculate the distance from each remaining data point to every class center according to Eq. (6) and assign each point to the class whose center is closest.

3) Calculate the objective function E_i according to Eq. (5).

4) Select each non-center point o_nc in each class in turn and calculate the corresponding objective function E_nc.

5) Update the class centers to form k new classes: if E_nc < E_i, then set E_i = E_nc and change the class center from o_i to o_nc.

6) Repeat 2) to 5) until the objective function no longer changes.
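The procedure can be sketched in plain Python as a swap-based local search. This is an illustrative reading of the steps, not the authors' code: the objective is taken to be the summed squared Euclidean distance of Eq. (5) with distances from Eq. (6), and the `seed` parameter and function names are assumptions.

```python
import math
import random

def assign(points, medoids):
    """Step 2: index of the closest medoid for every point (Euclidean distance)."""
    return [min(range(len(medoids)), key=lambda c: math.dist(p, medoids[c]))
            for p in points]

def cost(points, medoids, labels):
    """Step 3: summed squared distance of each point to its class medoid."""
    return sum(math.dist(p, medoids[lab]) ** 2 for p, lab in zip(points, labels))

def kmedoids(points, k, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(points, k)                 # step 1: initial class centers
    labels = assign(points, medoids)
    best = cost(points, medoids, labels)
    improved = True
    while improved:                                 # step 6: repeat until no change
        improved = False
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            for candidate in members:               # step 4: try each non-center point
                if candidate == medoids[c]:
                    continue
                trial = medoids[:c] + [candidate] + medoids[c + 1:]
                trial_labels = assign(points, trial)
                trial_cost = cost(points, trial, trial_labels)
                if trial_cost < best:               # step 5: accept an improving swap
                    medoids, labels, best = trial, trial_labels, trial_cost
                    improved = True
    return medoids, labels, best

# Hypothetical 2D data with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels, best = kmedoids(points, k=2)
```

Unlike k-means, every returned center is an actual data point, which is what makes the method usable when a mean of the features is not meaningful.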

2.3 Pattern Recognition Methods for Teaching Behavior Data

2.3.1 Behavior Recognition and Extraction for Instructional Videos

After studying a large number of actual piano teaching videos, we found that teacher behavior in teaching-scene videos follows a certain pattern: the few people whose movement distance and body posture change the most over the whole video include the teacher with high probability. We name this regularity the “teacher set”, that is, the spatial area in which the teacher’s behavior may occur throughout the classroom teaching video.

To reduce the interfering factors in teaching videos and better identify teaching behavior patterns, this paper proposes the Teacher-Set IE algorithm, which identifies and extracts teaching behavior patterns. The algorithm consists of three steps applied to videos of actual teaching scenes: tracking of teacher and student human body key points, teacher set recognition, and teacher set extraction.

In the piano teaching video, let j ∈ {1, 2, …, m} denote the Person ID and k ∈ {1, 2, …, 17} denote the human body key points of the corresponding person. In each teaching video, the 17 human body key points detected for each target entity are matched against the bounding box information. If all 17 key points of a target entity lie inside the bounding box of a person, and the bounding box and the 17 key points in each frame belong to the same moving entity throughout the continuous video frames, then the 17 key points are judged to belong to that Person ID. The judgment formula is as follows: (7) $\{\mathrm{Keypoint}_{(j,k)} \mid k = 1, 2, \dots, 17\} \in \mathrm{ROI}_{j}$

where $\mathrm{Keypoint}_{(j,k)}$ is the kth human body key point of the jth moving entity and ROI_j is the bounding box of the jth moving entity.

Next, the movement distances and body posture change distances of the teacher and the students in the actual teaching scene are calculated. Let $i \in \{1, 2, \dots, n\}$ index the frames of the untrimmed classroom teaching video, and let PNum denote the number of top-ranked people considered in the teaching scene video (in this paper PNum is set to 5; the choice of this value is described in detail in the subsequent experiments). The PNum people whose bounding boxes move the farthest and whose human body key points change the most are judged to be the teacher set (the calculation of the movement distance MD is also described in detail in the subsequent experiments). The movement distance of the teacher set, TeacherSDis, can therefore be expressed as: (8) $\mathrm{TeacherSDis} \in \mathrm{Max}\left\{\left[\sum_{i=1, j=1}^{n, m} \left(fr_{i+1}^{\mathrm{Person}_{j}} - fr_{i}^{\mathrm{Person}_{j}}\right) + \sum_{i=1, j=1}^{n, m} \sum_{k=1}^{17} \left(fr_{i+1}^{\mathrm{Person}_{jk}} - fr_{i}^{\mathrm{Person}_{jk}}\right)\right], \mathrm{PNum}\right\}$

where $fr_{i+1}^{\mathrm{Person}_{j}} - fr_{i}^{\mathrm{Person}_{j}}$ denotes the Euclidean distance between the bounding box center points of Person ID j in the (i + 1)th and ith frames of the teaching scene video, and $fr_{i+1}^{\mathrm{Person}_{jk}} - fr_{i}^{\mathrm{Person}_{jk}}$ denotes the Euclidean distance between the kth human body key point of Person ID j in the (i + 1)th and ith frames.
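A minimal Python sketch of the teacher set selection is given below. It assumes, for illustration only, that tracking has already produced for each Person ID a list of per-frame records with a bounding box center and 17 key points; the dictionary layout, the helper `make_track`, and the function names are hypothetical and not part of the Teacher-Set IE algorithm as published.

```python
import math

def movement_distance(track):
    """Summed frame-to-frame displacement of one person's box center and key points."""
    total = 0.0
    for prev, cur in zip(track, track[1:]):
        total += math.dist(prev["center"], cur["center"])
        total += sum(math.dist(a, b)
                     for a, b in zip(prev["keypoints"], cur["keypoints"]))
    return total

def teacher_set(tracks, p_num=5):
    """Person IDs with the p_num largest movement distances, largest first."""
    scores = {pid: movement_distance(track) for pid, track in tracks.items()}
    return sorted(scores, key=scores.get, reverse=True)[:p_num]

def make_track(step, frames=4, n_keypoints=17):
    """Hypothetical track: box center and all key points shift by `step` pixels per frame."""
    return [{"center": (i * step, 0.0),
             "keypoints": [(i * step, float(k)) for k in range(n_keypoints)]}
            for i in range(frames)]

# Three tracked people: one static, one moving a lot, one moving a little.
tracks = {1: make_track(0.0), 2: make_track(5.0), 3: make_track(1.0)}
top_two = teacher_set(tracks, p_num=2)
```

With these toy tracks the static person is excluded and the two people who move are returned in order of movement distance.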

After the spatial region of the teacher set is determined, the behavioral information of the teacher set is acquired and recorded, mainly including: frames, Person ID, μ, v, γ, h, and the 17 human key points. On this basis, the start and end times of the teacher’s behavior sequence in the untrimmed classroom video and the teacher’s behavior information in the corresponding period are extracted. The motion region of the teacher can be extracted as follows:

(9) $A_{Region}: \left(\mathrm{Min}\left(\mu_{i} - \gamma_{\max} \frac{h_{\max}}{2}\right), \mathrm{Max}\left(v_{j} + \frac{h_{\max}}{2}\right)\right)$

(10) $B_{Region}: \left(\mathrm{Max}\left(\mu_{m} + \gamma_{\max} \frac{h_{\max}}{2}\right), \mathrm{Min}\left(v_{n} - \frac{h_{\max}}{2}\right)\right)$

(11) $Width_{Region} = \left| \mathrm{Max}\left(\mu_{m} + \gamma_{\max} \frac{h_{\max}}{2}\right) - \mathrm{Min}\left(\mu_{i} - \gamma_{\max} \frac{h_{\max}}{2}\right) \right|$

(12) $High_{Region} = \left| \mathrm{Max}\left(v_{j} + \frac{h_{\max}}{2}\right) - \mathrm{Min}\left(v_{n} - \frac{h_{\max}}{2}\right) \right|$

where A_Region denotes the coordinates of the upper left corner of the movement area of the teacher set, B_Region denotes the coordinates of the lower right corner, Width_Region denotes the width of the movement area, and High_Region denotes its height. μ_i, μ_m, v_j, and v_n denote the X-axis and Y-axis coordinates of the center of the bounding box of the teacher set, h denotes the height of the bounding box, and γ denotes the ratio of the width to the height of the bounding box. The height of the bounding box in each frame depends on the characteristics of the target, so to minimize the error in the motion region, the largest height h_max among the human detection bounding boxes is used as the bounding box height, and the largest width-to-height ratio γ_max is used as the width-to-height ratio. Note that the person must be a member of the teacher set.
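The region computation can be sketched in Python as follows. The sketch assumes each bounding box is given as a tuple (μ, v, γ, h) and takes the lower right corner as the largest x extent and smallest y extent, which is the reading consistent with the width and height in Eqs. (11) and (12); the function name and sample boxes are hypothetical.

```python
def motion_region(boxes):
    """Motion region of the teacher set from (mu, v, gamma, h) boxes across frames."""
    boxes = list(boxes)
    h_max = max(h for _, _, _, h in boxes)          # tallest detected box
    gamma_max = max(g for _, _, g, _ in boxes)      # largest width-to-height ratio
    half_w, half_h = gamma_max * h_max / 2, h_max / 2
    x_min = min(mu - half_w for mu, _, _, _ in boxes)
    x_max = max(mu + half_w for mu, _, _, _ in boxes)
    y_min = min(v - half_h for _, v, _, _ in boxes)
    y_max = max(v + half_h for _, v, _, _ in boxes)
    return {
        "A": (x_min, y_max),                        # upper left corner, Eq. (9)
        "B": (x_max, y_min),                        # lower right corner, Eq. (10)
        "width": abs(x_max - x_min),                # Eq. (11)
        "height": abs(y_max - y_min),               # Eq. (12)
    }

# Two hypothetical detections of the same teacher in different frames.
region = motion_region([(100, 200, 0.5, 80), (140, 210, 0.4, 100)])
```

Using h_max and γ_max for every frame makes the region slightly larger than the union of the raw boxes, which trades a little extra background for not clipping the teacher.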

2.3.2 Teacher Behavior Recognition Model Based on 3D Bilinear Pooling

After the teacher set and its motion region have been identified and extracted, the teacher’s behaviors in the teaching scene must be recognized automatically. To obtain the types of teacher behavior in the teaching video at a fine-grained level, this paper builds on fine-grained recognition methods for two-dimensional images and proposes a bilinear aggregation method that combines the last layer of a two-dimensional convolutional neural network with cross-layer features of a three-dimensional convolutional neural network (3D CLBP). Based on 3D CLBP, which can incorporate more features from the three-dimensional convolutional layers, a teacher behavior pattern recognition model based on three-dimensional bilinear pooling (3D BP-TBR) is proposed.

The number of operations in a 3D convolution is K × T times that of a 2D convolution (K is the size of the filter and T is the time dimension), so using too many 3D convolutions is computationally expensive. For this reason, only the last layer of the 3D convolutional neural network is associated with the cross-layer features in this paper.

Assume $f_{i}, f_{j}, f_{l} \in \mathbb{R}^{C \times L \times H \times W}$, where C, L, H, and W are the number of channels, duration, height, and width of the 3D convolutional layer. Six forms of the final layer of the 3D convolutional neural network are designed here. The bilinear feature F_l of the final layer is defined as:

(13) $F_{l} = f_{b}\{\mathrm{AP3D}^{4 \times 7 \times 7}(f_{l})\}$

(14) $F_{l} = f_{b}\{\mathrm{AP3D}^{4 \times 7 \times 7}(\mathrm{Con}^{1 \times 1 \times 1}(f_{l}))\}$

(15) $F_{l} = f_{b}\{N(\mathrm{AP3D}^{4 \times 7 \times 7}(\mathrm{Con}^{1 \times 1 \times 1}(f_{l})))\}$

(16) $F_{l} = f_{b}\{\mathrm{AP3D}^{4 \times 7 \times 7}(f_{l} f_{l})\}$

(17) $F_{l} = f_{b}\{N(\mathrm{AP3D}^{4 \times 7 \times 7}(f_{l} f_{l}))\}$

(18) $F_{l} = f_{b}\{N(\mathrm{AP3D}^{4 \times 7 \times 7}(\mathrm{Con}^{1 \times 1 \times 1}(f_{l}) \, \mathrm{Con}^{1 \times 1 \times 1}(f_{l})))\}$

where f_b denotes the recombination operation and N denotes the normalization operation. AP3D denotes 3D average pooling with filter size 4 × 7 × 7, and Con denotes 3D convolution with a small 1 × 1 × 1 kernel. The 3D cross-layer bilinear feature F_c is defined as: (19) $F_{c} = f_{b}\{N(\mathrm{AP3D}^{4 \times 7 \times 7}(\mathrm{Con}^{1 \times 1 \times 1}(f_{i}) \, \mathrm{Con}^{1 \times 1 \times 1}(f_{j})))\}$

Similarly, the aggregated 3D convolutional feature F_3D−CLBP is defined as: (20) $F_{3D\text{-}CLBP} = \mathrm{concat}\left(\sum_{la=1}^{La} F_{c}^{la}, F_{l}\right)$

where La denotes the number of layers of the 3D convolutional neural network and concat denotes the concatenation operation. Based on the bilinear aggregation of the last layer and the cross-layer features of the 3D convolutional neural network, this paper proposes a new fine-grained teacher behavior recognition algorithm (3D BP-TBR), whose architecture is shown in Fig. 2. The video sequence of the teacher’s teaching behavior pattern is divided into N sub-segments of equal size. Inception is then used to extract features from one randomly selected frame of each segment of the video sequence of the teacher’s motion region, and the extracted features are spliced together. The spliced feature set is fed into a 3D convolutional neural network, and the 3D last-layer and 3D cross-layer bilinear feature aggregation algorithms enhance the fused representation of the 3D features, so that the 3D features are correlated with and reinforce one another. Finally, the recognition results for the teacher’s behavior patterns are obtained, achieving fine-grained judgment and efficient recognition of teaching behaviors in massive teaching videos.
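To make the shape of Eqs. (19) and (20) concrete, the following pure-Python sketch works on tiny feature maps. Several details are assumptions made only for this illustration, since the text does not fix them: each feature map is stored as C channels flattened over its L·H·W positions, the pooling window is taken to cover the whole map, the product of the two projected maps is taken element-wise, the normalization N is a signed square root followed by L2 normalization, and the recombination f_b is treated as returning the flat vector. The function names are hypothetical.

```python
import math

def conv1x1x1(feature, weight):
    """1x1x1 convolution: the same channel-mixing matrix applied at every position.

    feature: C lists of P values; weight: D rows of C values; returns D lists of P values.
    """
    positions = range(len(feature[0]))
    return [[sum(w_c * feature[c][p] for c, w_c in enumerate(row)) for p in positions]
            for row in weight]

def bilinear_feature(f_i, f_j, w_i, w_j):
    """Eq. (19)-style feature: project, multiply element-wise, average-pool, normalise."""
    a, b = conv1x1x1(f_i, w_i), conv1x1x1(f_j, w_j)
    pooled = [sum(x * y for x, y in zip(ra, rb)) / len(ra) for ra, rb in zip(a, b)]
    signed_sqrt = [math.copysign(math.sqrt(abs(x)), x) for x in pooled]
    norm = math.sqrt(sum(x * x for x in signed_sqrt)) or 1.0   # avoid division by zero
    return [x / norm for x in signed_sqrt]

def clbp(cross_features, last_feature):
    """Eq. (20): sum the cross-layer features, then concatenate the last-layer feature."""
    summed = [sum(vals) for vals in zip(*cross_features)]
    return summed + last_feature

# Toy feature map with C = 2 channels and P = 2 positions, identity projections.
f = [[1.0, 3.0], [2.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
r = bilinear_feature(f, f, identity, identity)
agg = clbp([r, r], r)
```

In a real network the projections would be learned and the maps would come from different 3D convolutional layers; passing the same map twice here corresponds to the last-layer form in Eq. (18).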

3 Piano Teaching Behavior Mining and Pattern Recognition Effects

3.1 Regular Mining Analysis of Piano Teaching Behavior

In this paper, piano teaching is classified into four teaching modes based on the characteristics of piano teaching behaviors: stagnant (S1), focused (S2), rushed (S3), and rhythmic (S4). Multidimensional data mining is then performed on the real course data set. In teaching, learners can choose when to participate in the course according to their needs. The learning data for each teaching behavior mode are counted in weekly units, and the weekly payoff, learning harvest, and learning efficiency are calculated. The payoff formula is: (21) $effort^{w} = \frac{\sum_{i=1}^{n} a_{i} \times ef_{i}^{w}}{\sum_{i=1}^{n} a_{i}}$

where effort^w denotes the payoff of the teaching behavior in week w, a_i denotes the weighting factor of learning behavior i, $ef_{i}^{w}$ denotes the payoff on learning behavior i in week w, and n denotes the number of learning behaviors.

The harvest formula is: (22) $effect^{w} = \frac{effect^{w} - effect_{\min}^{w}}{effect_{\max}^{w} - effect_{\min}^{w}}$

where effect^w on the right-hand side is the raw harvest in week w, and $effect_{\max}^{w}$ and $effect_{\min}^{w}$ are its maximum and minimum values.

The formula for learning efficiency is: (23) $ratio^{w} = \frac{effect^{w}}{effort^{w}}$

where effort^w denotes the weekly payoff in the course of study during week w of the teaching behavior. In this paper, the learner’s activity data in the piano teaching behavior are divided into payoff-type features and harvest-type features. First, the weekly payoff is derived with the payoff formula from the payoff-type features and their occurrence times. Then, based on the learners’ test scores, i.e., the harvest features, the weekly harvest is derived with the harvest formula for the different teaching behavior patterns. Finally, the weekly learning efficiency is calculated with the learning efficiency formula from the weekly payoff and weekly harvest, and the weekly values are combined into learning efficiency sequences. The weekly learning efficiency of each learner is calculated for the four piano teaching behavior patterns, and its dynamic changes are then observed.
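The three formulas can be sketched in Python as below. The sketch assumes that Eq. (21) is a positive weighted average giving the weekly payoff, that Eq. (22) is ordinary min-max scaling of the raw harvest (quiz scores), and that Eq. (23) divides harvest by payoff; the weights, behavior payoffs, and scores are hypothetical values for three learners in a single week.

```python
def weekly_effort(weights, efforts):
    """Eq. (21): weighted mean of the per-behavior payoffs for one week."""
    return sum(a * e for a, e in zip(weights, efforts)) / sum(weights)

def normalise_effects(raw_effects):
    """Eq. (22): min-max scale raw harvests (e.g. quiz scores) to [0, 1]."""
    lo, hi = min(raw_effects), max(raw_effects)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in raw_effects]

def efficiency(effects, efforts):
    """Eq. (23): harvest divided by payoff; zero payoff maps to 0.0 by convention here."""
    return [eff / eft if eft else 0.0 for eff, eft in zip(effects, efforts)]

# Hypothetical week: two learning behaviors with weights 1 and 3, three learners.
weights = [1.0, 3.0]
behavior_payoffs = [[2.0, 4.0], [4.0, 4.0], [1.0, 1.0]]   # one row per learner
quiz_scores = [60.0, 80.0, 100.0]                          # raw harvest per learner

efforts = [weekly_effort(weights, row) for row in behavior_payoffs]
effects = normalise_effects(quiz_scores)
ratios = efficiency(effects, efforts)
```

Repeating this calculation for every week and concatenating the results yields the learning efficiency sequence whose changes are analyzed for each teaching behavior pattern.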

3.1.1 Patterns of Learning Efficiency in Different Modes of Teaching Behavior

Over the entire course, learning efficiency under the rhythmic teaching behavior mode fluctuates less than under the other modes and is relatively higher, remaining stable above 2.0 after the second week. Learning efficiency is closely related to the teaching behavior mode, which directly affects learners’ effectiveness in course learning and reflects the teaching situation and the characteristics of the teaching behavior. That learners maintained their learning efficiency under rhythmic teaching behavior in the piano course indicates that this mode lets learners enjoy exploring knowledge during teaching, independently choose their learning space and explore their own values, have stronger motivation to learn, and obtain better learning results.

3.1.2 Patterns of Learning Gains in Different Modes of Teaching Behavior

This paper studies the dynamic evolution of quiz scores under the various piano teaching behavior patterns, where the scores indicate learners’ mastery of the unit’s course content and demonstrate their learning progress. The evolution of learning gains is shown in Figure 4. Under stagnant teaching behavior, the instructor makes only a small payoff toward the quiz near the end of the lesson, and correspondingly learners’ quiz scores show only a small gain near the end of the lesson and are almost zero at other times. Focused teaching behavior yielded good gains in quiz performance in the first four weeks; however, as the course progressed, quiz scores showed a phenomenon similar to that of learning efficiency. Rushed teaching behavior maintained relatively high engagement payoffs throughout the course, so learners also achieved better quiz scores, with learning gains ranging from 0.4 to 0.9. Rhythmic teaching behavior involved relatively little payoff in the first two weeks, but educators of this type recognized the problem immediately and adjusted their teaching in time, so learners were corrected promptly; subsequent quiz scores trended upward, and learning gains remained at 0.8 or above after the second week.

3.2

Empirical Analysis of Piano Teaching Behavior Oriented Pattern Recognition

The ucf101 dataset is used as the experimental dataset in this paper. It is a piano teaching course video dataset containing 13,320 video clips with a total duration of 27 hours and clip lengths ranging from 4 to 10 seconds. We first divide the ucf101 dataset at random into a training set and a validation set, with the training set accounting for 80% of the videos and the validation set for 20%; the training set and validation set contain 9537 and 3783 video clips, respectively. We select four teaching behavior patterns as the research object of this paper: stagnant (S1), focus (S2), catch-up (S3), and rhythmic (S4). Accuracy, loss, recall, precision, and F1 score are used to evaluate the recognition results of the model.
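The random division described above can be sketched as follows. This is a minimal illustration, not the paper's own code: the function name `random_split`, the seed, and the clip identifiers are hypothetical. The split ratio is kept as a parameter because an exact 80/20 division of 13,320 clips gives 10,656 and 2,664 clips, whereas the reported counts of 9,537 and 3,783 correspond to about 71.6% and 28.4%.

```python
import random


def random_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and cut it into (train, validation) lists."""
    shuffled = list(items)
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical clip identifiers standing in for the 13,320 video clips.
clip_ids = [f"clip_{i:05d}" for i in range(13320)]

train, val = random_split(clip_ids, train_fraction=0.8)
print(len(train), len(val))
```

Passing `train_fraction=9537 / 13320` instead would reproduce the clip counts reported in the text.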

This experiment verifies the recognition performance of the model on a randomly divided dataset, a setup commonly known as a pattern recognition experiment. Randomly dividing data into training and validation sets in a fixed proportion is the most common way of processing it, and such a division gives a fair evaluation of the model’s recognition ability. In the experiments, the recognition model of this paper (3D BP-TBR), the traditional BP recognition model, and the TSN model are each trained on the training set and validated on the validation set, and the recognition performance of the three models is compared. The experimental results for the randomly divided dataset are shown in Fig. 5. The Train Loss and Val Loss of the recognition model proposed in this paper are 0.089% and 0.047%, respectively, and its Train Acc (97.11%) and Val Acc (99.03%) are higher than those of the traditional BP recognition model and the TSN model. The results show that the method proposed in this paper effectively improves recognition accuracy.
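Accuracy and loss figures of this kind can be computed along the following lines. The text does not state which loss function is used, so this sketch assumes mean cross-entropy over per-class probabilities; the function names and the toy probabilities and labels are made up for illustration.

```python
import math


def accuracy(predicted, actual):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def cross_entropy(probabilities, actual, eps=1e-12):
    """Mean negative log-probability assigned to the true class (assumed loss)."""
    total = 0.0
    for probs, label in zip(probabilities, actual):
        # Clamp to eps so a zero probability does not produce log(0).
        total -= math.log(max(probs[label], eps))
    return total / len(actual)


# Toy model outputs for four clips over the classes S1..S4 (indices 0..3).
probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.70, 0.10, 0.10],
    [0.20, 0.20, 0.50, 0.10],
    [0.40, 0.30, 0.20, 0.10],
]
labels = [0, 1, 2, 3]

# The predicted class is the index of the largest probability in each row.
preds = [max(range(len(row)), key=row.__getitem__) for row in probs]

acc = accuracy(preds, labels)
loss = cross_entropy(probs, labels)
print(f"accuracy={acc:.4f} loss={loss:.4f}")
```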

The model of this paper is then used to recognize the four teaching behavior patterns, and the recognition results are shown in Table 1. The precision rates for the stagnation type (S1), focus type (S2), catch-up type (S3), and rhythmic type (S4) are all above 93%, and the average precision, recall, and F1 value are 97.31%, 96.96%, and 97.34%, respectively. This indicates that the pattern recognition method of this paper is effective and can adequately identify piano teaching behavior patterns.

Table 1.

Identification of the pattern of teaching behavior

Serial number	Behavior pattern category	Recall rate	Precision rate	F1
S1	Stagnation type	0.9735	0.9895	0.9866
S2	Focus type	0.9984	0.9387	0.9682
S3	Catch-up type	0.9085	0.9748	0.9435
S4	Rhythm type	0.9978	0.9893	0.9954
Mean		0.9696	0.9731	0.9734
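As a rough illustration of how such per-class figures and the Mean row are obtained, the sketch below derives precision, recall, and F1 from a confusion matrix and then macro-averages the columns of Table 1. The confusion matrix counts are invented for the example; only the three lists of table values come from the paper (the second list is the column that the text refers to as the precision rate).

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of clips of true class i predicted as class j.

    Returns one (precision, recall, f1) tuple per class.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                          # missed clips of class k
        fp = sum(confusion[i][k] for i in range(n)) - tp     # other clips labelled k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((precision, recall, f1))
    return results


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


# Invented confusion matrix for S1..S4, for illustration only.
toy_confusion = [
    [48, 1, 1, 0],
    [0, 50, 0, 0],
    [2, 3, 45, 0],
    [0, 0, 0, 50],
]
toy_metrics = per_class_metrics(toy_confusion)

# Columns of Table 1 as reported in the paper.
recall_col = [0.9735, 0.9984, 0.9085, 0.9978]
precision_col = [0.9895, 0.9387, 0.9748, 0.9893]
f1_col = [0.9866, 0.9682, 0.9435, 0.9954]

mean_recall = macro_average(recall_col)
mean_precision = macro_average(precision_col)
mean_f1 = macro_average(f1_col)
print(mean_recall, mean_precision, mean_f1)
```

The macro-averages of the three columns agree with the Mean row (0.9696, 0.9731, 0.9734) to within rounding.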

4

Conclusion

Mining and identifying teaching behavior data can provide scientific data and a basis for optimizing teaching processes, teaching results, and teaching environments. This paper therefore establishes a model for recognizing teacher behavior patterns based on 3D bilinear pooling (3D BP-TBR) using multidimensional data mining. The research results of this paper are as follows:

1) Mining the regularities of the different teaching behavior patterns shows that, in piano teaching, the learning efficiency of the stagnation-type and focus-type teaching behaviors is relatively low over the whole course and approaches 0 by the tenth week. The learning efficiency of the catch-up-type teaching behavior pattern lies between 1.00 and 2.75. Under the rhythmic teaching behavior pattern, learning efficiency remains stable above 2.0 after the second week. In terms of learning gains, the trends of the four teaching behavior patterns are similar to their learning efficiency trends. The rhythmic teaching behavior pattern has better learning efficiency and learning gains than the other teaching behavior patterns. This shows that different teaching behaviors affect learning efficiency and learning gains differently, and that the rhythmic teaching behavior pattern is associated with better learning gains in piano teaching.

2) 3D BP-TBR achieved the best results in the experiment, with Train Loss and Val Loss of 0.089% and 0.047% and Train Acc and Val Acc of 97.11% and 99.03%, respectively; its recognition performance is higher than that of the traditional BP recognition model and the TSN model. In addition, the precision rates for the stagnation, focus, catch-up, and rhythmic teaching behavior patterns were 98.95%, 93.87%, 97.48%, and 98.93%, respectively, and the average precision, recall, and F1 value were all higher than 96%. This verifies that the pattern recognition model in this paper performs well and can accurately recognize piano teaching behavior patterns.

Cheng Lyu

Published Online: Mar 24, 2025

Received: Nov 01, 2024

Accepted: Feb 14, 2025

DOI: https://doi.org/10.2478/amns-2025-0705

Keywords: Data mining, Teaching behavior, 3D bilinear pooling, 3D BP-TBR, Pattern recognition

© 2025 Cheng Lyu, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
