Published Online: Mar 21, 2025
Received: Oct 22, 2024
Accepted: Feb 01, 2025
DOI: https://doi.org/10.2478/amns-2025-0667
© 2025 Yujing Wang, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Piano teaching has a complete teaching model. In addition to literacy development, performance skills, basic training, and the selection of teaching aids, students need to understand music theory [1-3]. Students should learn to read music, understand basic music notation and terminology, and have some knowledge of music theory. Teaching music theory can help students better understand the structure and style of music and enhance their musical literacy and creativity.
Music theory is an important part of music fundamentals. It refers to the study and analysis of musical structures, principles, and forms [4-6]. Music theory mainly covers rhythm, harmony, melody, pitch, timbre, and related aspects, which are the basis of musical expression and creation [7-8]. In the basic training stage of piano teaching, scales, tone, and sense of rhythm are important basic skills in addition to finger exercises [9-11]. The scale is the basis of melody; its full name is the tonal scale. It applies equally to piano teaching, where twelve-tone equal temperament divides the piano scale into 12 groups [12-14]. In music theory, rhythm and melody are important means of expressing feelings and emotions. The performance technique stage of piano teaching mainly covers changes in finger weight, legato, chords, tone shifts, and musical expression techniques [15-16]. In music theory, harmony can be divided into chords, the composition of harmony, and the function of harmony [17-18]. Timbre refers to the characteristics that distinguish different instruments or voices in a musical work, and can be described as clear, thick, bright, soft, and so on [19-20]. Rhythm, harmony, melody, pitch, and timbre affect and complement each other. Their combination is one of the important means of music creation and expression, and together they constitute a complete musical work.
Music theory is a crucial component of music creation and expression, as well as music education and performance. Therefore, in music education, the combination of music theory and piano teaching can help students establish correct piano music concepts and knowledge systems, and improve their piano playing quality and skill level.
This paper focuses on a piano music evaluation method based on music theory knowledge, which is combined with piano teaching, and proposes an intelligent model for piano education. The intelligent piano evaluation method, which incorporates music theory knowledge, first recognizes tone name and time value information from a piano audio signal by using Fast Fourier Transform, cepstrum analysis, and so on. Then the piano notes are estimated using harmonic reset and octave correction methods, and the real piano notes are filtered out from the audio mixed with multiple sound sources. By intelligently scoring students’ piano music performance in terms of pitch, intensity, rhythm, and time value, it is possible to understand each student’s learning situation in a targeted manner.
Music is an important way to convey emotion, and understanding music theory is the foundation of music composition. Literature [21] also notes that although music creation is free, it requires basic knowledge of music theory. Creation is an unavoidable topic in any kind of music teaching. Piano teaching brings together music theory, finger technique, knowledge of the piano, and other aspects, and the piano, like most musical instruments, promotes the combination of theory and practice [22]. The study by Xian [23] shows that most teachers in piano teaching focus first on their own piano playing ability, then on the demonstration method, and finally on the practical method. The study in [24] found that music students recognize the value of music theory in music courses.
Nowadays, combining music theory and piano teaching with intelligent technology fits the intelligent education model of the current era on the one hand, and accelerates the development of students' piano skills on the other. Literature [25] uses the Tonnetz from music theory to encode the relationships between pitches in two-dimensional form, and combines this representation with deep networks to model polyphonic music and generate more stable tones. The results of this research can help students in piano teaching adjust their playing techniques in time during performance practice and record audio of creative inspiration during composition for subsequent review. In addition, literature [26] reports that, from a music theory perspective, generating polyphonic music from chord embeddings and predicted chords is successful. Literature [27] converts polyphonic piano music to notation with a reinforcement learning (RL) transcriber, which can reduce the effect of noise in recorded audio. This research has greatly helped teachers and students, both in teaching and in independent learning. Literature [28] used a toolkit of smart technology, including audio, electronic piano, synthesizer, and other instruments, for composition, and the results showed that students' musical thinking and creativity were enhanced. It can be seen that smart technology combined with music theory has changed and renewed the piano teaching model. Literature [29] analyzed the effectiveness of smart devices for personal development in piano training and music theory knowledge. Although music can be learned quickly in this way, there are limitations, just as piano teaching is incomplete without music theory. In addition, literature [30] mentions the use of visualization in learning music to help students create efficiently.
Indeed, once visualization is available, the combination of music theory and piano teaching helps students learn theoretical knowledge on the one hand, and on the other hand allows piano fingering, strength, rhythm, and so on to be refined and consolidated, laying a solid foundation for students learning the piano.
Music features fall into three types: overall features, basic features, and complex features. The basic features underlie the complex features, which in turn express the overall features; together they reveal the structure of the composition, express its artistic style, and convey its emotional connotation.
Duration, pitch, timbre, and intensity are the four basic characteristics of sound, and the corresponding physical quantities are vibration duration, frequency, spectral distribution, and amplitude. The length of a tone heard by the human ear depends on how long the sounding part of the instrument vibrates. On a piano, for example, the longer a key is held down, the longer the tone, and conversely, the shorter the tone. Pitch depends on the number of vibrations per unit time of the sounding part of the instrument: the higher the frequency, the higher the tone, and the lower the frequency, the lower the tone. Timbre is the perceptual quality of a sound. Different instruments produce overtones at different frequencies, and it is these complex and varied overtones that allow listeners to distinguish the timbres of different instruments. The piano has a rich bass, a natural and smooth midrange, and a bright treble, all produced by more than 200 strings together with the soundboard. The intensity of a sound is determined by its amplitude: the greater the amplitude, the stronger the sound.
The modern piano has one of the widest ranges of any instrument and can play 88 notes of different pitch. It has 52 white keys, which correspond to the basic note names. Any two adjacent keys on a piano are a semitone apart. Considering only the white keys on the keyboard, each group of white keys forms a scale that runs from a given tone to the tone an octave higher, and successive groups rise or fall in pitch step by step. A temperament (tuning system) specifies the exact pitch of each tone in a musical system and how the tones relate to each other. The most familiar temperaments are “just intonation”, “Pythagorean (circle-of-fifths) tuning”, and “twelve-tone equal temperament”. Twelve-tone equal temperament divides the interval of an octave into twelve semitones with equal frequency ratios, and is used by most plucked and symphonic instruments to set their tones.
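The equal-ratio division described above can be made concrete with a short sketch. It assumes the common convention that keys are numbered 1 to 88 and that key 49 (A4) is tuned to 440 Hz; both the numbering and the reference pitch are assumptions for illustration and are not stated in this paper.

```python
# Twelve-tone equal temperament on an 88-key piano (illustrative sketch).
A4_KEY = 49       # assumed key number of A4 when keys are numbered 1..88
A4_FREQ = 440.0   # assumed reference pitch in Hz

# Every semitone step multiplies the frequency by the same ratio.
SEMITONE_RATIO = 2.0 ** (1.0 / 12.0)


def key_frequency(key: int) -> float:
    """Return the equal-tempered frequency in Hz of piano key `key` (1..88)."""
    if not 1 <= key <= 88:
        raise ValueError("key must be in 1..88")
    return A4_FREQ * 2.0 ** ((key - A4_KEY) / 12.0)


if __name__ == "__main__":
    for key in (1, 40, 49, 88):
        print(key, round(key_frequency(key), 2))
```

Because twelve equal steps make up one octave, keys that are twelve apart differ in frequency by exactly a factor of two.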
According to vibration theory, the sound produced by the movement of piano strings should contain a fundamental tone with the lowest frequency and the highest amplitude, as well as a number of higher frequency overtones that are integral multiples of the fundamental frequency. The number and intensity of overtones depends on how and where the strings are excited to vibrate. The piano is a large stringed instrument with a complex structure, with over a hundred components in the percussion system alone. Due to many influencing factors, the sound of the piano does not strictly follow the above simple law. There are the following peculiarities:
(1) Loss of the fundamental: In the harmonic structure of the bass region, the amplitude of the fundamental is much smaller than that of the other harmonics (this also occurs occasionally in the mid-range), and for the lowest several notes the amplitude of the fundamental is zero, so the fundamental is absent altogether. (2) Loss of harmonics: In the treble region, the spectrum contains essentially only the fundamental component, and the amplitudes of the other harmonics are very low or even zero. (3) Inharmonicity: the frequencies of the harmonics are not exact integer multiples of the fundamental frequency. In the spectrum, the frequency interval between adjacent harmonics gradually increases as the harmonic number increases.
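The third peculiarity can be illustrated numerically. The sketch below uses the widely cited stiff-string model, in which partial n lies at n·f0·sqrt(1 + B·n²). This model and the value of the inharmonicity coefficient B are assumptions introduced here for illustration; the paper does not give a formula.

```python
import math


def partial_frequencies(f0: float, b: float, count: int) -> list:
    """Partial frequencies of a stiff string (assumed model).

    f0    : fundamental of the ideal, perfectly flexible string in Hz
    b     : inharmonicity coefficient (hypothetical value supplied by the caller)
    count : number of partials to return
    """
    return [n * f0 * math.sqrt(1.0 + b * n * n) for n in range(1, count + 1)]


if __name__ == "__main__":
    partials = partial_frequencies(f0=110.0, b=4e-4, count=8)
    gaps = [hi - lo for lo, hi in zip(partials, partials[1:])]
    # Each partial is sharper than the ideal multiple n * f0,
    # and the spacing between neighbouring partials keeps growing.
    for n, (freq, gap) in enumerate(zip(partials, gaps), start=1):
        print(n, round(freq, 2), round(gap, 2))
```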
The evaluation function of the smart piano provides intelligent scoring of students' pitch, strength, rhythm, and timing, so that students can understand their progress and shortcomings in each lesson. The intelligent evaluation of students is divided into three phases. First, when students use the smart piano for the first time, their piano level, interests, and expectations should be evaluated. Students without a piano foundation can start from the very beginning, while students with a foundation can choose a test difficulty in the smart piano and be scored on pitch, strength, rhythm, and timing for the test repertoire; an appropriate learning program for the current stage is then chosen according to the student's performance. This intelligent assessment of piano level helps the teacher tailor a learning program to each student's needs. Teaching methods can also be matched to each student: in smart piano teaching, beginners can be taught in one-to-many groups, while intermediate and advanced students should be taught one-to-one. Second, at each stage of learning, students' progress should be intelligently assessed. Group PK (head-to-head competition) can be used so that competition among students drives their enthusiasm for learning, and the results can be sent to parents through WeChat so that parents can follow students' progress at any time. Finally, students can assess their own satisfaction with their learning and comment on the teacher. In this way teachers can understand how students perceive their own learning and can improve the teaching process based on students' opinions.
Traditional piano teaching generally uses a one-on-one method that focuses on training students' playing skills. The teaching methods are monotonous: students need to do a great deal of practice every day and have little communication with peers. The piano is extremely difficult to learn; success does not come in a day or two but requires long-term persistence and a strong, sustained interest. Over a long period, learning the piano can feel very boring, and without interest it is difficult to persevere. The development of the smart piano has brought innovative teaching methods to traditional piano education. Smart piano teaching adopts an Internet-connected education mode, which is mainly divided into the collective class teaching mode and the combined “online” and “offline” teaching mode.
Spectral subtraction, one of the earliest unsupervised audio denoising algorithms, rests on simple assumptions. It assumes that the noisy audio signal contains only additive noise and that this noise is stationary. It then exploits the relative independence of the noise signal and the clean audio signal.
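Under these assumptions, a basic form of spectral subtraction can be sketched as follows. This is a simplified illustration rather than the paper's implementation: the function name, the use of a separate noise-only reference segment, the non-overlapping rectangular frames, and the magnitude floor are all assumptions made here for brevity.

```python
import numpy as np


def spectral_subtract(noisy, noise_ref, frame_len=256, floor=0.0):
    """Magnitude spectral subtraction on non-overlapping frames (sketch).

    noisy     : 1-D array, signal containing additive stationary noise
    noise_ref : 1-D array, noise-only segment used to estimate the noise
    Returns the denoised signal, truncated to a whole number of frames.
    """
    noisy = np.asarray(noisy, dtype=float)
    noise_ref = np.asarray(noise_ref, dtype=float)

    # Average magnitude spectrum of the noise (stationarity assumption).
    n_noise = len(noise_ref) // frame_len
    noise_frames = noise_ref[: n_noise * frame_len].reshape(n_noise, frame_len)
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)

    n_frames = len(noisy) // frame_len
    frames = noisy[: n_frames * frame_len].reshape(n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=1)

    # Subtract the noise magnitude, clip at the floor, keep the noisy phase.
    mag = np.maximum(np.abs(spec) - noise_mag, floor)
    clean_spec = mag * np.exp(1j * np.angle(spec))
    return np.fft.irfft(clean_spec, n=frame_len, axis=1).reshape(-1)
```

Practical systems usually add overlapping windowed frames and over-subtraction factors to limit audible artifacts; those refinements are left out here.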
The main task of piano music recognition is to recognize the tone name and timing information of individual notes from an audio signal that contains multiple notes.
Preprocessing. Preprocessing includes pre-emphasis, framing, windowing, filtering, and other steps. Pre-emphasis compensates for the loss of high-frequency components, framing enables a smooth transition between frames, and windowing reduces leakage in the frequency domain. The pre-emphasis filter is:
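The filter equation is not reproduced in this text. As a hedged illustration of the preprocessing chain, the sketch below assumes the commonly used first-order pre-emphasis y[n] = x[n] - a·x[n-1] with a = 0.97, overlapping frames, and a Hamming window. The coefficient, frame length, hop size, and window type are all assumptions, not values taken from the paper.

```python
import numpy as np


def pre_emphasis(x, alpha=0.97):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (assumed form)."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])


def frame_signal(x, frame_len, hop):
    """Split x into frames of frame_len samples, advancing by hop samples."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def window_frames(frames):
    """Apply a Hamming window to every frame to reduce spectral leakage."""
    return frames * np.hamming(frames.shape[1])


if __name__ == "__main__":
    signal = np.sin(2 * np.pi * 440.0 * np.arange(2048) / 11025.0)
    frames = window_frames(frame_signal(pre_emphasis(signal), 512, 256))
    print(frames.shape)
```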
Fast Fourier Transform (FFT). The original music signal is transformed to the frequency domain by the FFT. The energy of the spectral lines is then calculated. The resulting spectrum is passed through a Mel filter bank. The Mel frequency cepstrum coefficient C(n) is obtained by cepstrum analysis on the Mel spectrum.
The number of Mel filters in speech recognition is generally 24, and the sampling frequency is 8000 Hz. In this paper, the experiments target piano music, and the sampling frequency is taken as 11.025 kHz.
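The chain from FFT to Mel cepstrum described above can be sketched as below. The sampling rate of 11.025 kHz comes from the text and the default of 24 filters follows the figure quoted for speech recognition; the FFT size, the number of retained coefficients, the Mel-scale formula, and the triangular filter shape are standard choices assumed here, since the paper's own equations are not reproduced.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the Mel scale (assumed design)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):          # rising edge
            bank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):         # falling edge
            bank[m - 1, k] = (right - k) / (right - center)
    return bank


def mfcc(frame, sample_rate=11025, n_filters=24, n_coeffs=12, n_fft=512):
    """Mel frequency cepstrum coefficients C(1..n_coeffs) of one frame."""
    spectrum = np.fft.rfft(frame, n=n_fft)              # FFT
    energy = np.abs(spectrum) ** 2                      # spectral line energy
    mel_energy = mel_filterbank(n_filters, n_fft, sample_rate) @ energy
    log_mel = np.log(mel_energy + 1e-10)                # avoid log(0)
    m = np.arange(1, n_filters + 1)
    n = np.arange(1, n_coeffs + 1)[:, None]
    dct_basis = np.cos(np.pi * n * (m - 0.5) / n_filters)
    return dct_basis @ log_mel                          # cepstrum via DCT
```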
This method exploits the auditory properties of the human ear. It calculates the cepstral distance between each note frame and the background noise segment, and determines the start and end points of each note by the single-threshold method, based on the difference in cepstral distance between the onset and release segments of a note. The specific steps are as follows:
1) The signal in the waiting time before the music starts consists of ambient noise frames. Take the mean of the MFCC cepstrum coefficients of the first 10 frames as the estimate of the ambient noise, denoted me(n). 2) Calculate the cepstral distance between each frame of the music signal and the MFCC cepstrum mean of the ambient noise. 3) From the result of step 2), calculate the maximum value dth of the cepstral distance over the frames of the waiting time, set this value as the threshold, and use the single-threshold method to determine the start and end points of the notes.
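The three steps can be sketched as follows. The distance formula is not reproduced in this text, so the Euclidean distance between cepstral vectors is assumed; the function names and the representation of segments as frame index pairs are likewise hypothetical.

```python
import numpy as np


def detect_note_frames(cepstra, n_noise_frames=10):
    """Flag frames whose cepstral distance to the noise exceeds the threshold.

    cepstra : array of shape (n_frames, n_coeffs), one MFCC vector per frame
    Returns (active, threshold), where active is a boolean array per frame.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    # Step 1: noise estimate = mean cepstrum of the leading noise frames.
    noise_mean = cepstra[:n_noise_frames].mean(axis=0)
    # Step 2: distance of every frame to the noise estimate (Euclidean, assumed).
    dist = np.linalg.norm(cepstra - noise_mean, axis=1)
    # Step 3: single threshold = largest distance seen within the noise frames.
    threshold = dist[:n_noise_frames].max()
    return dist > threshold, threshold


def active_segments(active):
    """Convert per-frame flags into (start, end) frame pairs, end exclusive."""
    segments, start = [], None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(active)))
    return segments
```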
The Harmonic Product Spectrum (HPS) calculation assumes exact integer frequency ratios between the harmonics and takes into account only the first few harmonics. The HPS method is therefore still used here to filter the true piano notes.
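For reference, the textbook single-pitch form of HPS is sketched below: the magnitude spectrum is multiplied by copies of itself decimated by 2, 3, and so on, so that the harmonics line up at the fundamental. The number of harmonics, the Hann window, and the lower frequency bound are assumptions, and the sketch does not include the harmonic reset or octave correction steps of this paper.

```python
import numpy as np


def hps_pitch(signal, sample_rate, n_harmonics=4, n_fft=None, f_min=50.0):
    """Estimate one fundamental frequency with the Harmonic Product Spectrum."""
    signal = np.asarray(signal, dtype=float)
    n_fft = n_fft or len(signal)
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal)), n=n_fft))

    # Multiply the spectrum by its decimated copies; bin k of the copy
    # decimated by h holds the magnitude at h times the frequency of bin k.
    limit = len(spectrum) // n_harmonics
    product = spectrum[:limit].copy()
    for h in range(2, n_harmonics + 1):
        product *= spectrum[::h][:limit]

    # Ignore bins below f_min, then pick the strongest remaining bin.
    k_min = int(np.ceil(f_min * n_fft / sample_rate))
    k = k_min + int(np.argmax(product[k_min:]))
    return k * sample_rate / n_fft


if __name__ == "__main__":
    sr, n = 8000, 4000
    t = np.arange(n) / sr
    tone = sum(a * np.sin(2 * np.pi * 220.0 * h * t)
               for h, a in [(1, 1.0), (2, 0.6), (3, 0.4), (4, 0.3)])
    print(hps_pitch(tone, sr))
```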
Harmonic overlap is the main difficulty in multiple fundamental frequency estimation, because harmonic overlap makes it impossible for the algorithm to categorize the fundamental frequency and its corresponding harmonics, and in particular, it is impossible to separate the overlapping harmonics accurately. In this section, we propose to separate the overlapped harmonics using a harmonic reset algorithm in order to achieve reliable computation of fundamental frequency saliency.
For each candidate fundamental frequency of set
where
Most of the harmonic envelopes of musical signals are relatively smooth, so the interpolated harmonic amplitudes of individual sources at overlapping harmonics can be obtained by cubic spline interpolation of adjacent reliable peaks. Non-overlapping harmonics play an important role in the harmonic reset process because of their uniqueness and reliability. The overlapping harmonics may come from two sources or multiple sources, so interpolation can be used to separate the harmonics one by one in order of frequency value from low to high. Since the harmonic vectors of candidate fundamental frequencies with small frequency values have more reliable non-overlapping harmonics, it is reasonable and reliable to separate the harmonics in the above order. The harmonic reset is divided into two cases, the simple case of two sources mixing at the same frequency and the complex case of multiple sources mixing.
When multiple sources are overlapped, the HHM of the source with the smallest frequency value is also taken as the primary object of analysis, and the set
where
Half-frequency and double-frequency errors in multiple fundamental frequency estimation are collectively referred to as octave errors. In most cases, the smaller the integers in the frequency ratio between two notes, the more consonant the combination sounds. However, smaller integer ratios also produce more pronounced overlap between harmonics and a greater probability of octave errors in the fundamental frequency. In addition, owing to the characteristics of HPS itself, using it to find candidate fundamental frequencies increases the probability of octave errors to some extent.
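The link between small integer ratios and harmonic overlap can be checked with a small enumeration. The sketch assumes ideal harmonics at exact integer multiples and uses a hypothetical matching tolerance of 1 Hz; the function name is illustrative.

```python
def overlapping_harmonics(f_low, f_high, n_harmonics=10, tol_hz=1.0):
    """List (m, n) where harmonic m of f_low coincides with harmonic n of f_high."""
    pairs = []
    for m in range(1, n_harmonics + 1):
        for n in range(1, n_harmonics + 1):
            if abs(m * f_low - n * f_high) <= tol_hz:
                pairs.append((m, n))
    return pairs


if __name__ == "__main__":
    # Frequency ratios 2:1 (octave), 3:2 (fifth) and 9:8 (major second).
    for name, f_high in (("octave", 440.0), ("fifth", 330.0), ("second", 247.5)):
        print(name, overlapping_harmonics(220.0, f_high))
```

With ten harmonics per note, the octave shares the most harmonics and the 9:8 interval the fewest, which is why notes an octave apart are the hardest to tell apart from a single note.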
On the premise of ensuring that the candidate fundamental frequencies in set
For a candidate fundamental frequency
The significance of a candidate fundamental frequency is calculated by its energy value, denoted as
The process of exclusion of base frequencies for a subset is shown in equation (10):
where
where
In the left-hand note estimation stage, the low harmonic product spectrum is used to select the candidate notes, and then the selected notes are further filtered based on the “harmonic matching ratio” to obtain the estimated left-hand notes. The right-hand notes are estimated from the harmonic amplitude spectra of the right-hand candidates after spectral subtraction using the HPS method.
To verify the effect of the note estimation method based on the harmonic product spectrum proposed in this paper, two sets of experiments are designed. The first verifies the effect of each step of the algorithm, and the second compares this paper's algorithm with other algorithms. In this section, to verify each step of the algorithm, the experimental data are 5 demonstration audios selected from each of levels 1 to 5 of the EPSA database's collection of music scores, 25 demonstration audios in total. After removing the note fragments that contain only left-hand notes, the total number of valid note fragments is 600, and each fragment contains 1 to 5 notes pressed by the left and right hands. The note fragments are divided according to the audio labeling information, the right-hand notes of each fragment are estimated using the algorithm of this paper, and the results of each step are recorded as described below.
Preprocessing. Calculate the “multi-frame mean spectrum” of a note fragment. A pre-processed note fragment is shown in Fig. 1, where a) is the time-domain waveform of the note fragment and b) is the mean spectrum, obtained by averaging the spectra at different moments to smooth out changes in amplitude over time.

Note estimation. The 600 note fragments were processed, and the main melody note estimation results, grouped by the number of notes in each fragment, are shown in Table 1. The two numbers in parentheses in the first row of the table give the numbers of left-hand and right-hand notes in the fragment, and the remaining rows are the corresponding indicators. When there is only one right-hand note and no left-hand note, the metrics of right-hand note screening using HPS combined with a priori note ranges are close to 100%, indicating the effectiveness of the HPS method. Comparing the columns from left to right, the precision and recall of right-hand note estimation tend to decrease as the number of left-hand notes increases, which is consistent with the design of the algorithm. However, columns 4 and 5 and columns 6 and 7 show that, for the same number of left-hand notes, the precision of right-hand note estimation increases slightly as the number of right-hand notes increases. This is because the recall R of right-hand notes obtained with the HPS method after spectral subtraction is high, so a larger total number of right-hand notes reduces the share of misdetected notes and raises the precision.
Overall, the recall, precision, and F-value of this method for right-hand (main melody) note estimation are 0.947, 0.936, and 0.948, respectively, all of which are high. This is because, in order to extract the played notes accurately and reduce evaluation errors caused by systematic detection errors, this section makes full use of the a priori music score to determine the range of candidate notes, which improves note detection.

Fig. 1. Preprocessing stage rendering

Table 1. Estimation results for right-hand (main melody) notes

| Test result / note combination (left, right) | (0,1) | (1,1) | (2,1) | (2,2) | (3,1) | (3,2) | Total |
|---|---|---|---|---|---|---|---|
| Total right-hand notes | 220 | 157 | 84 | 87 | 84 | 56 | 688 |
| Correct detections | 214 | 145 | 77 | 79 | 72 | 49 | 636 |
| Missed detections | 6 | 12 | 7 | 8 | 12 | 7 | 52 |
| False detections | 4 | 10 | 3 | 10 | 10 | 8 | 45 |
| Recall | 0.985 | 0.955 | 0.932 | 0.914 | 0.896 | 0.859 | 0.947 |
| Precision (accuracy) | 0.996 | 0.947 | 0.897 | 0.927 | 0.859 | 0.874 | 0.936 |
| F value | 0.978 | 0.957 | 0.915 | 0.913 | 0.872 | 0.839 | 0.948 |
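For readers who want to work with the counts in Table 1, the sketch below applies the standard definitions of recall, precision, and F-value to counts of correct, missed, and falsely detected notes. The paper does not state its exact definitions, so these formulas are an assumption, and values computed this way need not coincide with the rates reported in the table.

```python
def precision_recall_f(correct, missed, false_detections):
    """Standard detection metrics from raw counts (assumed definitions).

    recall    = correct / (correct + missed)
    precision = correct / (correct + false_detections)
    F         = harmonic mean of precision and recall
    """
    recall = correct / (correct + missed) if correct + missed else 0.0
    precision = (correct / (correct + false_detections)
                 if correct + false_detections else 0.0)
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_value = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_value


if __name__ == "__main__":
    # Counts taken from the "Total" column of Table 1.
    p, r, f = precision_recall_f(correct=636, missed=52, false_detections=45)
    print(round(p, 3), round(r, 3), round(f, 3))
```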
The note detection results of the comparison experiments are shown in Fig. 2. When no a priori knowledge of the audio is available, the note estimation of this paper's method, based on harmonic reset and octave correction, falls short of the neural network (Algorithm 1) and bidirectional neural network (Algorithm 2) methods to varying degrees. The reason is that the notes in the audio of the experimental dataset are set randomly, and the ranges between notes may not conform to the music theory of piano music. This makes the left-hand and right-hand note ranges obtained by this paper's algorithm inaccurate and leads to missed and false note detections. However, the differences in all evaluation indexes are within 5%, which indicates that this paper achieves note estimation based on the characteristics of note overlap and inharmonicity combined with the principle of spectral subtraction. When a priori information about the audio file is known, the method described in this paper is about 4% and 1%, 2% and 1%, and 2% and 1% higher than the other two methods in recall, precision, and F-value, respectively. This indicates that, when the system has a priori information about the piano sight-reading piece, the method of this paper can estimate the notes effectively.

Fig. 2. Comparison experiment results
After the evaluation method was completed, an experiment was conducted to verify its effectiveness. Three players were selected to perform the same piece of music: one is a piano teacher, student 1 is at piano grade 7, and student 2 is at piano grade 8. Their scores are shown in Table 2. The table shows that, for both single performances and the average over multiple performances, the scores from high to low are the teacher (0.962), student 1 (0.878), and student 2 (0.764). Compared with the real piano level of the three performers, this ranking can objectively reflect their performance level. The system is therefore of some significance in alleviating the lack of resources in vocal music teaching and in improving students' independent learning.
Table 2. Scores of the three players
| Group | Score 1 | Score 2 | Score 3 | Score 4 | Score 5 | Mean |
|---|---|---|---|---|---|---|
| Teacher | 0.98 | 0.98 | 0.97 | 0.95 | 0.93 | 0.962 |
| Student 1 | 0.87 | 0.89 | 0.87 | 0.88 | 0.88 | 0.878 |
| Student 2 | 0.79 | 0.78 | 0.74 | 0.76 | 0.75 | 0.764 |
The survey targeted undergraduate and graduate students from two colleges of a conservatory. For the questionnaire, the author randomly chose 75 students from each of the two colleges who had taken courses related to the intelligent piano. A total of 150 questionnaires were distributed and 150 were returned, all of them valid, giving a recovery rate of 100% and a validity rate of 100%.
The “Questionnaire on the Status of Intelligent Piano Teaching” consists of mainly closed-ended single-choice questions, with a total of 32 questions, using a 5-level scoring system (completely agree, relatively agree, generally agree, relatively disagree, and strongly disagree), and the degree of agreement is positively correlated with the score. The results of the survey are shown in Table 3.
Table 3. Descriptive analysis of each dimension
| Dimension | Item | N | Minimum value | Maximum value | Mean | Standard deviation |
|---|---|---|---|---|---|---|
| Student cognition | Knowledge | 154 | 1 | 5 | 4.46 | 0.754 |
| | Skill cognition | 154 | 2 | 5 | 4.50 | 0.732 |
| | Attitude cognition | 154 | 1 | 5 | 4.46 | 0.794 |
| Conduct | Student cognition | 154 | 1 | 5 | 4.47 | 0.695 |
| | Course opening | 154 | 1 | 5 | 4.53 | 0.774 |
| | Teaching material | 154 | 1 | 5 | 4.54 | 0.753 |
| | Quality of teaching | 154 | 2 | 5 | 4.58 | 0.720 |
| Teaching content | Teaching equipment condition | 154 | 2 | 5 | 4.50 | 0.778 |
| | The course of the course | 154 | 1 | 5 | 4.52 | 0.698 |
| | Skill training | 154 | 1 | 5 | 3.71 | 0.772 |
| | Mental training | 154 | 2 | 5 | 3.73 | 0.835 |
| | Performance training | 154 | 1 | 5 | 3.72 | 0.998 |
| | Specific content of teaching | 154 | 2 | 5 | 3.97 | 0.735 |
In this section, the questionnaire was first administered to the students, and the process and results were then analyzed to reveal the current status of smart pianos in supporting actual teaching and learning. Means were calculated for the individual items and for the three broad dimensions. The mean score for attitude cognition was 4.46, which is a good level. Quality of teaching had the highest mean of any single item, 4.58, while among the three broad dimensions the highest mean was student cognition, at 4.47. In the standard deviation analysis, the most stable broad dimension was student cognition, while the least stable was the role of the smart piano in basic teaching. The results indicate that smart piano education has made considerable progress, but there is still room for improvement.
Most learners are not under any external pressure to learn and can enjoy learning the piano. However, precisely because no external pressure pushes them past the initial novelty period, they are often discouraged by the difficulties or frustrations of practice and gradually lose their willingness to learn. The learners' willingness to continue after 10 hours of teaching practice is shown in Table 4. The share of learners who were very willing to continue was 15.1 percentage points higher with smart piano teaching (33.4%) than with traditional one-to-one piano teaching (18.3%). In smart piano teaching, the course content can be greatly enriched by multimedia and other modern teaching tools, and some abstract and difficult music theory concepts become easier to visualize. For adults, teaching tools and the organization of teaching activities still play an important role in learning effectiveness. In the intelligent teaching classroom, the system combines piano playing with games, competitions, and other PK methods, which makes piano learning more fun. Learners show higher enthusiasm and concentration during learning and a stronger willingness to continue learning the piano.
Table 4. Learners' willingness to continue learning

| Group | Very willing | Fairly willing | Neutral | Unwilling |
|---|---|---|---|---|
| Experimental group | 33.4% | 50.2% | 16.4% | 0% |
| Control group | 18.3% | 28.2% | 43.5% | 10% |
This paper proposes an intelligent piano assessment method based on music theory knowledge such as tone duration, pitch, and twelve-tone equal temperament. Using the Fourier transform, harmonic reset, and octave correction algorithms, it recognizes the characteristic information and notes in piano music in order to assess students' piano abilities intelligently. The following conclusions can be drawn from verifying the effect of the algorithm:
(1) The recall, precision, and F-value of the main melody note estimation of this method all exceed 0.9, which improves note detection. (2) Three performers were selected to evaluate their piano level, and the scores ranked Teacher > Student 1 > Student 2, which reflects the performers' actual performance level. (3) Intelligent piano teaching has a good application effect on students' cognition, skills, and mental training. (4) Students are more willing to continue with intelligent piano teaching than with traditional one-to-one piano teaching.
