A Study on the Application of Music Information Technology in Enhancing the Interactivity of Teaching and Learning in Music Education
Published online: 24 March 2025
Received: 4 November 2024
Accepted: 16 February 2025
DOI: https://doi.org/10.2478/amns-2025-0786
© 2025 Yanxiu Chen, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
As the reform of basic education continues to deepen, music education also faces unprecedented challenges. How to cultivate a new generation of qualified, innovative music teachers for basic education has become the focus of teaching reform in schools across the country. Reforming music education first requires abandoning traditional educational thinking and establishing a student-centered educational concept [1]. The means of education should likewise be reformed, adopting advanced teaching modes so that the quality of the talents cultivated is guaranteed. Social progress requires teachers to correct their educational ideology, change their concept of education, face all students, strengthen students' ideological and moral education, and emphasize cultivating students' innovative spirit, practical ability, and broadened horizons [2-5]. It has been proposed to replace teacher-only lectures with teaching activities in which teachers and students participate together, and to shift from teaching methods that focus on results to methods that make students enjoy the learning process [6-7]. These requirements pose a strong challenge to music education that still relies on traditional concepts and methods to supply talent for basic music education.
Interactive music teaching maximizes the exchange of information between students and teachers, makes their communication more equal, and at the same time improves students' music literacy [8-9]. This teaching method not only improves students' interest, enthusiasm, initiative, and thinking ability, making knowledge easier to understand and master, but also allows teachers to better understand students' learning situation and personalize teaching according to their needs and characteristics [10-12]. For music education, such interactivity is particularly important: music is a sensory art of emotional resonance, and only through interactive experience, communication, and emotional perception of music can its content and skills be better mastered [13-16]. In today's digital era, information technology has penetrated widely into many fields. As an art discipline, music can likewise be enriched through innovative applications of information technology that promote teacher-student interaction, teamwork, music creation, and teaching reform, thereby improving learning outcomes [17-19].
Within music information technology, this paper selects audio spectrum recognition as its research subject, proposes a CQT spectrum recognition algorithm, and proposes a wavelet denoising algorithm to address the noise introduced when audio signals are recorded, played back, and recognized. The CQT parameters are optimized: the center frequencies of the music signal in the transformed spectrum are calculated and the frequency resolution is derived to ensure accuracy. The FFT is invoked to reduce the computation of the CQT in the frequency domain, the CQT definition is converted into a coefficient expression, and the CQT coefficients are further represented in matrix form. A discrete wavelet transform is applied to the noisy signal, and a new threshold function is proposed to solve the edge distortion problem of the traditional wavelet threshold denoising algorithm, with its asymptote equation determined. Music spectrum recognition and audio denoising experiments then test the performance of the proposed spectrum recognition technology; combined with music education, corresponding teaching strategies based on this technology are proposed and put into teaching practice.
The wide application of information technology creates opportunities for music education, replaces dull teaching methods, inspires students' musical thinking, and promotes efficient classrooms. This paper offers new thinking on the deep integration of music teaching and information technology, selecting audio spectrum recognition as the main content of the research, in order to realize audio spectrum recognition in music teaching and support the interactive development of the music classroom.
In essence, music is a physical wave phenomenon: when air molecules are set in motion by some vibration, sound is produced and perceived by the hearing organs, and the resulting change of air pressure over time forms a continuous signal. In real life, therefore, music exists as an analog signal whose waveform varies continuously, while computers can only handle discrete signals. To represent this continuous process in limited memory, the analog signal must be sampled at a fixed rate and then quantized into a discrete signal. According to the Nyquist-Shannon sampling theorem, the sampling frequency must be at least twice the maximum frequency of the analog signal to prevent loss of information. Finally, the music signal is encoded into a form that the computer can process.
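As a minimal sketch of sampling and quantization, the snippet below samples a pure tone and quantizes it to 16 bits; the 440 Hz tone, 44.1 kHz rate, and bit depth are illustrative assumptions, not values from this paper.

```python
import numpy as np

fs = 44100                               # sampling frequency (Hz)
f0 = 440.0                               # tone frequency (Hz)
assert fs >= 2 * f0                      # Nyquist condition: no information lost
t = np.arange(441) / fs                  # 10 ms of sample instants
samples = np.sin(2 * np.pi * f0 * t)     # the "continuous" tone, sampled
quantized = np.round(samples * 32767).astype(np.int16)  # 16-bit quantization
print(quantized.dtype, quantized.max())
```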
Before analyzing an audio signal, it must first be preprocessed to improve data quality so that it can be analyzed more efficiently. Pre-emphasis raises the high-frequency portion to flatten the signal's spectrum. The signal is then split into short frames, and each frame is multiplied by a window function to reduce spectral leakage at the frame boundaries; the Hamming window is commonly used.

The formula for the Hamming window is:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$

where $N$ is the frame length.
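A numpy sketch of frame splitting and Hamming windowing follows; the frame length and hop size are illustrative assumptions, and the window formula appears as a comment in the code.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split x into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    # Hamming window: w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1))
    n = np.arange(frame_len)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (frame_len - 1))
    return frames * w

x = np.random.default_rng(0).standard_normal(16000)
frames = frame_signal(x)
print(frames.shape)  # (98, 400): 98 windowed frames of 400 samples
```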
Short-time energy characteristics: musical signals change over time, and so does their energy. The onset of a note is usually accompanied by a sharp increase in energy, and the energy of a piece of music is usually related to its arousal value. To obtain the energy characteristics, the signal is framed and windowed to yield frames $x_n(m)$ of length $N$; the short-time energy of the $n$-th frame is

$$E_n = \sum_{m=0}^{N-1} x_n(m)^2$$

Instead of short-time energy, the short-time amplitude can also be calculated:

$$M_n = \sum_{m=0}^{N-1} |x_n(m)|$$
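These two short-time features can be sketched as follows; the frame parameters and the synthetic signal with a note onset halfway through are illustrative assumptions.

```python
import numpy as np

def short_time_features(x, frame_len=400, hop=160):
    """Per-frame energy E_n = sum(x_n^2) and amplitude M_n = sum(|x_n|)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.empty(n_frames)
    amplitude = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len]
        energy[i] = np.sum(frame ** 2)
        amplitude[i] = np.sum(np.abs(frame))
    return energy, amplitude

# A note onset shows up as a jump in short-time energy:
rng = np.random.default_rng(0)
x = np.concatenate([0.01 * rng.standard_normal(8000),
                    np.sin(2 * np.pi * 440 * np.arange(8000) / 16000)])
e, m = short_time_features(x)
print(e[0], e[-1])  # early frames near zero, late frames much larger
```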
Zero crossing rate: the zero crossing rate (ZCR) is an intuitive way to estimate the fundamental frequency. For music, the fundamental is the lowest-frequency, highest-intensity pure tone, which determines the pitch of the whole piece, so pitch can be detected with the ZCR. The ZCR of a frame counts how often the signal crosses the zero axis, normalized by the frame length:

$$Z_n = \frac{1}{2N}\sum_{m=1}^{N-1} \left|\operatorname{sgn}(x_n(m)) - \operatorname{sgn}(x_n(m-1))\right|$$

and the mean and standard deviation of the per-frame counts are used as features.
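A sketch of ZCR-based pitch estimation: a sine wave crosses zero twice per cycle, so the crossing count over one second, divided by two, approximates the fundamental. The sampling rate and tone below are assumed values.

```python
import numpy as np

fs = 16000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # 1 second of a 440 Hz tone

signs = np.sign(x)
crossings = 0.5 * np.sum(np.abs(np.diff(signs)))   # each crossing contributes 1
f0_est = crossings / 2.0                           # two crossings per cycle
print(f0_est)  # close to 440 Hz
```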
Non-periodic signals are difficult to analyze in the time domain, so the signal is transformed into the frequency domain by the Fourier transform. However, the theoretical Fourier transform applies to continuous signals and cannot be used directly on a computer; in practice, the discrete-time Fourier transform (DTFT) is applied to the sampled data. Sampling the DTFT at $N$ uniformly spaced frequencies yields the discrete Fourier transform (DFT):

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi kn}{N}}, \quad k = 0, 1, \ldots, N-1$$

where $x(n)$ is the sampled signal of length $N$ and $X(k)$ is its spectrum at the $k$-th frequency bin.
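The DFT can be evaluated directly from its definition and checked against numpy's FFT; this naive O(N²) sketch is for illustration only.

```python
import numpy as np

def dft(x):
    """Direct DFT: X(k) = sum_n x(n) * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    W = np.exp(-2j * np.pi * k * n / N)   # DFT matrix
    return W @ x

x = np.random.default_rng(0).standard_normal(64)
assert np.allclose(dft(x), np.fft.fft(x))   # matches the FFT result
```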
Time-frequency analysis studies the audio signal simultaneously in the time and frequency directions, better presenting the transient behavior of the signal's frequency content over time and thus improving music recognition.
Short-time Fourier transform: the DFT is a global transformation, reflecting average frequency characteristics over a period of time, and increasing the frequency resolution decreases the time resolution and vice versa. To address this shortcoming, the short-time Fourier transform (STFT) is introduced. The STFT slides a window of a given size and step over the time-domain signal, performs a Fourier transform within each window, and stitches the results together, forming a two-dimensional joint distribution of time and frequency:

$$\mathrm{STFT}(m, k) = \sum_{n=0}^{N-1} x(n + mH)\, w(n)\, e^{-j\frac{2\pi kn}{N}}$$

where $w(n)$ is the window function of length $N$ and $H$ is the hop size.
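A minimal STFT sketch: slide a Hamming window over the signal and take the DFT of each frame. The window length, hop, and test tone are illustrative assumptions; the 1 kHz tone at an 8 kHz rate lands exactly on bin 1000·256/8000 = 32.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Windowed frames stacked into a time-frequency matrix (one-sided DFT)."""
    w = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([np.fft.rfft(w * x[i * hop : i * hop + frame_len])
                     for i in range(n_frames)])

x = np.sin(2 * np.pi * 1000 * np.arange(8000) / 8000.0)  # 1 kHz tone at 8 kHz
S = np.abs(stft(x))
print(S.shape, S[0].argmax())  # (61, 129), dominant bin 32 -> 1000 Hz
```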
Mel frequency cepstrum: the human ear perceives frequencies from 20 Hz to 20,000 Hz; signals outside this range are automatically filtered out and cannot be perceived. Subjective perception in the frequency domain is therefore nonlinear, whereas frequency in the STFT spectrum is linear. To address this, the Mel filter bank was invented, which takes the perceived intensity of the response to sound pressure, i.e. loudness, into account. The Mel frequency expresses the pitch heard at a specific frequency and intensity, forming a nonlinear correspondence with Hz:

$$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$

From this relation it can be seen that the resolution of the Mel scale decreases as frequency increases, which matches the high sensitivity of the human ear to low-frequency tones.
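The standard Hz-to-Mel mapping Mel(f) = 2595·log10(1 + f/700) and its inverse can be sketched directly; the comparison shows that equal Hz spans cover fewer Mels at high frequencies, i.e. resolution drops.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# The same 500 Hz span covers far more Mels at low frequencies than high ones:
low = hz_to_mel(500) - hz_to_mel(0)        # ~607 Mel
high = hz_to_mel(8000) - hz_to_mel(7500)   # ~67 Mel
print(round(low), round(high))
```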
Constant Q transform: the constant Q transform (CQT) is similar to the MFCC in that it also maps a linear spectrum to a nonlinear one. The difference is that the CQT is a time-frequency transform whose bins follow an exponential (base-2 logarithmic) distribution law. From musical acoustics, the frequency of a tone one octave higher is twice that of the lower octave; that is, the pitches in music are exponentially distributed. The CQT matches this property, and the amplitude of the music signal at each note frequency can be obtained directly from the CQT, so it is often used in music research. Compared with the DFT, the CQT has higher frequency resolution and lower time resolution at low frequencies. The CQT computes frequency coefficients similarly to the DFT, but whereas the DFT spaces its center frequencies at constant intervals, the CQT distributes them according to the exponential law:

$$f_k = f_{\min} \cdot 2^{k/B}$$

where $B$ is the number of frequency bins per octave and $f_{\min}$ is the lowest analyzed frequency. For a finite-length sequence $x(n)$, the CQT of the $k$-th bin is

$$X^{CQ}(k) = \frac{1}{N_k} \sum_{n=0}^{N_k-1} x(n)\, w_{N_k}(n)\, e^{-j\frac{2\pi Q n}{N_k}}$$

where $w_{N_k}$ is a window of length $N_k = \left\lceil Q f_s / f_k \right\rceil$ and the quality factor

$$Q = \frac{f_k}{\Delta f_k} = \frac{1}{2^{1/B} - 1}$$

is the same for every bin. Since the value of the constant $Q$ does not depend on $k$, the ratio of center frequency to bandwidth is fixed, which is what gives the transform its name.
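The exponential center-frequency spacing and the constant Q can be computed as follows; f_min and B (bins per octave) are assumed, illustrative values.

```python
import numpy as np

f_min = 55.0   # lowest analyzed frequency (A1), an assumed value
B = 12         # 12 bins per octave -> one bin per semitone
k = np.arange(4 * B)                     # four octaves of bins
f_k = f_min * 2.0 ** (k / B)             # exponential center frequencies
Q = 1.0 / (2.0 ** (1.0 / B) - 1.0)       # quality factor, same for every bin

# The ratio of center frequency to bandwidth is identical for all bins:
bandwidths = f_k * (2.0 ** (1.0 / B) - 1.0)
print(round(Q, 2), f_k[B])  # Q ~ 16.82; f_k[12] = 110.0, one octave above f_min
```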
Since the spectral lines of the CQT are exponentially distributed, computing them directly is more expensive than for linear spectral lines, so the FFT is called to reduce the computational effort [21]. Although the CQT can be computed directly in the time domain by initializing its parameters and evaluating the definition above, it is more efficient to move the computation into the frequency domain: each CQT coefficient can be written as an inner product between the FFT of the signal and a precomputed spectral kernel, and these coefficient expressions can further be collected into matrix form, with one row of the kernel matrix per CQT bin. Because the kernels are sparse in the frequency domain, most matrix entries are near zero and can be discarded, which greatly reduces the computation.
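Before the FFT speed-up, the definition can be evaluated directly in the time domain; the sketch below computes one CQT bin per note and locates a 220 Hz tone. The sampling rate, f_min, and B are assumed values, and the Hamming window stands in for the unspecified window function.

```python
import numpy as np

def cqt_bin(x, fs, f_k, Q):
    """One CQT coefficient: (1/N_k) * sum_n w(n) x(n) exp(-j*2*pi*Q*n/N_k)."""
    N_k = int(round(Q * fs / f_k))     # window length shrinks as f_k grows
    n = np.arange(N_k)
    w = np.hamming(N_k)
    return np.sum(w * x[:N_k] * np.exp(-2j * np.pi * Q * n / N_k)) / N_k

fs = 8000
f_min, B = 110.0, 12
Q = 1.0 / (2.0 ** (1.0 / B) - 1.0)
x = np.sin(2 * np.pi * 220.0 * np.arange(fs) / fs)   # an A3 tone
mags = [abs(cqt_bin(x, fs, f_min * 2 ** (k / B), Q)) for k in range(2 * B)]
print(int(np.argmax(mags)))  # bin 12: one octave above f_min, i.e. 220 Hz
```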
Audio signals acquire noise during recording and playback, and further noise is inevitably introduced in conversion, which degrades the accuracy of audio spectrum recognition.
The noise in an audio file is generally Gaussian white noise, randomly distributed in the time domain; superimposed on the original audio signal, it disturbs the judgment of rhythm points, confusing the rhythm of music and lighting. Considering that the popular LMS adaptive denoising algorithm is computationally expensive, and that the energy of the signal is concentrated in a small number of large wavelet coefficients while the noise is spread uniformly over the wavelet domain in small coefficients, we choose the wavelet transform as the denoising tool. Wavelet denoising algorithms can be categorized into masked denoising, modulus maxima denoising, and wavelet threshold denoising [22].
After comparing the performance, advantages, and disadvantages of the three wavelet denoising algorithms, and taking into account the low signal-to-noise ratio of music signals, this paper adopts wavelet threshold denoising. The main work is to compare traditional soft and hard threshold denoising and to propose a new threshold function, which to a certain extent overcomes the defect of the hard threshold function destroying the smoothness of the signal and the tendency of the soft threshold function to cause distortion, while suppressing the influence of noise.
We take Gaussian white noise as the noise source and model the noisy signal as

$$y(n) = s(n) + e(n)$$

where $s(n)$ is the clean signal and $e(n)$ is Gaussian white noise. A discrete wavelet transform is applied to $y(n)$, and each wavelet coefficient $w$ is shrunk by a threshold function with threshold $\lambda$. The soft threshold function is

$$\hat{w} = \begin{cases} \operatorname{sgn}(w)\left(|w| - \lambda\right), & |w| \ge \lambda \\ 0, & |w| < \lambda \end{cases}$$

and the hard threshold function is

$$\hat{w} = \begin{cases} w, & |w| \ge \lambda \\ 0, & |w| < \lambda \end{cases}$$

The hard threshold function is discontinuous at $\pm\lambda$, which introduces oscillation into the reconstructed signal, while the soft threshold function shrinks all large coefficients by $\lambda$, causing a constant bias and distortion.
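The classical soft and hard threshold rules can be sketched as follows; the sample coefficients and threshold are illustrative.

```python
import numpy as np

def soft_threshold(w, lam):
    """Shrink coefficients toward zero by lam; zero out those below lam."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def hard_threshold(w, lam):
    """Keep coefficients at or above lam unchanged; zero out the rest."""
    return np.where(np.abs(w) >= lam, w, 0.0)

w = np.array([-2.0, -0.5, 0.3, 1.5])
print(soft_threshold(w, 1.0))  # soft: [-1, 0, 0, 0.5]
print(hard_threshold(w, 1.0))  # hard: [-2, 0, 0, 1.5]
```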
To address the above defects of the traditional wavelet threshold denoising algorithms, we propose a new threshold function, given in Eq. (21), where $\lambda$ is the threshold and an adjustable parameter controls the transition between soft and hard threshold behavior. Taking the limit of the new function as the wavelet coefficient $w \to +\infty$ and as $w \to -\infty$ shows that it approaches the hard threshold line $\hat{w} = w$, from which the asymptote equation of the function is determined.
This threshold function possesses the following properties. Continuity at the threshold: the function is continuous at $w = \pm\lambda$, avoiding the jump of the hard threshold function there and thus the oscillation it introduces into the reconstructed signal.
Besides the threshold function, the size of the threshold itself is an important factor in denoising performance: if the threshold is too large, some of the signal's wavelet coefficients are filtered out and signal energy is lost; if it is too small, the reconstructed signal still contains noise components and the denoising effect is greatly reduced. Donoho proposed the classical universal threshold

$$\lambda = \sigma \sqrt{2 \ln N}$$

where $\sigma$ is the noise standard deviation and $N$ is the signal length.
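A pure-numpy sketch of threshold denoising with the universal threshold, using a one-level Haar transform as a stand-in for a full wavelet decomposition; the signal, noise level, and noise estimate via the median absolute deviation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096
clean = np.sin(2 * np.pi * np.arange(N) / 256.0)
noisy = clean + 0.1 * rng.standard_normal(N)

# One-level Haar DWT: approximation and detail coefficients.
a = (noisy[0::2] + noisy[1::2]) / np.sqrt(2.0)
d = (noisy[0::2] - noisy[1::2]) / np.sqrt(2.0)

sigma = np.median(np.abs(d)) / 0.6745          # robust noise estimate
lam = sigma * np.sqrt(2.0 * np.log(N))         # Donoho's universal threshold
d = np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)  # soft thresholding

# Inverse Haar transform.
rec = np.empty(N)
rec[0::2] = (a + d) / np.sqrt(2.0)
rec[1::2] = (a - d) / np.sqrt(2.0)

def snr(ref, est):
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))

print(round(snr(clean, noisy), 1), round(snr(clean, rec), 1))  # SNR improves
```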
In this chapter, the performance of the spectrum recognition music information technology proposed above will be examined to explore the feasibility of its application in music education teaching. The experiment mainly includes two parts: audio spectrum recognition and audio denoising.
In this section, two students majoring in vocal music at the School of Music of R Comprehensive University are selected for the experiment, referred to as Student A and Student B. Student A sings the country-ballad-style song "Hairy Rain", while Student B sings the jazz-style song "Night Shanghai".
First, the audio spectrum of Student A's singing voice is recognized; the data are shown in Table 1. The sound pressure of the 1st and 2nd overtones is higher than that of the fundamental, the 1st overtone being the highest at 48 dB. The sound pressure of the whole harmonic series ranges from about -0.5 dB to 48 dB, with 8 partials above 10 dB, indicating that the singer sings with considerable strength and belongs to the strong-attack singing style. The frequencies of the harmonic series stand in the integer ratios 1:2:3:4:5:6:7:8:9:10:11, and the successive intervals are an ascending pure octave, pure fifth, pure fourth, major third, minor third, minor third, major second, major second, major second, and minor second, a standard harmonic series arrangement. This indicates that the voice has a clear musical effect and a bright timbre. However, of the 10 overtones only 6 (the 1st to 6th) are prominent, which is relatively few, indicating that the voice is dry and straightforward and lacks mellowness and a magnetic quality.
Student A's spectrum identification data
| Harmonic sequence | Frequency(Hz) | Pitch | Sound pressure(dB) |
|---|---|---|---|
| Fundamental | 584.57 | d2-17 | 34.97 |
| 1st overtone | 1,165.91 | d3-21 | 48.06 |
| 2nd overtone | 1,747.43 | a3-21 | 36.65 |
| 3rd overtone | 2,388.14 | d4-17 | 30.46 |
| 4th overtone | 2,907.21 | #f4-39 | 21.25 |
| 5th overtone | 3,491.75 | a4-22 | 28.01 |
| 6th overtone | 4,088.45 | c5-49 | 18.6 |
| 7th overtone | 4,672.93 | d5-18 | 10.39 |
| 8th overtone | 5,254.53 | e5-15 | 6.63 |
| 9th overtone | 5,909.71 | #f5-11 | -0.51 |
| 10th overtone | 6,408.02 | g5+29 | 9.63 |
The spectral identification results for Student B are shown in Table 2. The fundamental of Student B is much stronger than the overtones, and five of the first seven overtones have negative sound pressure values, below 0 dB, indicating that the singer sang with less intensity. There are 11 overtones in total, 7 of which are clearly visible; the frequencies of the harmonic series stand in the ratios 1:2:3:4:5:6:7:8, and the successive intervals are an ascending pure octave, pure fifth, pure fourth, major third, minor third, minor third, and major second, a standard harmonic series. This indicates that the singer's voice has an obvious musical effect and a round, bright tone. The low number of prominent overtones indicates that Student B has little resonance.
Student B's spectrum identification data
| Harmonic sequence | Frequency(Hz) | Pitch | Sound pressure(dB) |
|---|---|---|---|
| Fundamental | 529.1 | c2+11 | 18.38 |
| 1st overtone | 1,058.25 | c3+11 | -9.31 |
| 2nd overtone | 1,584.44 | g3+10 | -3.35 |
| 3rd overtone | 2,113.49 | c4+9 | -12.92 |
| 4th overtone | 2,639.44 | e4-7 | 1.58 |
| 5th overtone | 3,168.59 | g4+10 | 1.31 |
| 6th overtone | 3,697.84 | bb4-23 | -2.82 |
| 7th overtone | 4,226.89 | c5+9 | -3.63 |
Clearly, the audio spectrum recognition algorithm proposed in this paper can effectively recognize the spectra of songs of different styles and types sung by students in music teaching, summarize students' vocal singing characteristics and voice features, and provide effective assistance for music teaching.
The original audio sample for the denoising experiment in this section is Student A's singing clip from the previous section. The wavelet denoising algorithm proposed in this paper is applied to it, as shown in Figure 1: figure (a) is the original, un-denoised audio signal, and figure (b) is the signal processed by our wavelet denoising algorithm. The noise in the audio signal is clearly and greatly reduced, and the signal-to-noise ratio (SNR) improves from 12.634 to 23.163. The proposed wavelet denoising algorithm thus has a significant denoising effect on the audio signal.

The effect of signal denoising
The purpose of this chapter is to explore and analyze the application of the music spectrum recognition technology proposed above in vocal music teaching and, correspondingly, to propose effective music teaching strategies: effective breathing support and vocal relaxation techniques, resonance tuning and timbre shaping, and vocal exercises and skills training.
Effective breath support and vocal relaxation techniques are interdependent and complementary in vocal teaching. Effective breath support provides a steady supply of air for vocal relaxation, which makes the vocal process more comfortable and stable. At the same time, vocal relaxation techniques can help vocal performers to realize better resonance, adjust their vocal position, and relieve physical and mental stress, so as to enhance the quality and expressiveness of their voice. In vocal music teaching, teachers need to analyze the students’ spectral recognition results and guide them to master the correct breathing support techniques and vocal relaxation methods, and through repeated instruction and practice, help them to use these techniques in practice and form good breathing and relaxation habits.
Resonance tuning and timbre shaping are inextricably linked in vocal music teaching, influencing and promoting each other. Resonance tuning provides the basis for tone shaping. Through reasonable resonance adjustments, vocal performers can improve their voice’s resonance effect and expressiveness, provide more rich sound texture, and change space for tone shaping. The technique and application of tone shaping also further enrich and optimize the effect of resonance tuning, so that the resonance space and resonance area can be more accurately controlled and adjusted. In vocal music teaching, teachers need to guide students to master the correct resonance tuning techniques and the principles of timbre shaping, analyze the students’ timbre shaping problems with the help of the students’ spectral recognition results, and help them achieve stable resonance and rich timbre through repeated guidance and practice.
Vocal music teaching involves both vocal exercises and skills training, which are interdependent and complementary. Vocal exercises provide voice learners with the opportunity to acquire a good vocal foundation, and through systematic practice and feedback, they can master correct vocal techniques and strengthen the stability of their voices. Skill training, on the other hand, further improves learners’ performance ability and musicality through various specific vocal technique exercises, so that they can effectively convey emotions and shape musical images through their voice. Teachers should understand the differences in students’ vocal skills and training needs based on spectrum recognition results, and choose appropriate teaching strategies and methods according to their characteristics and levels.
In the previous chapter, music teaching strategies such as resonance tuning and timbre shaping, and vocal exercises and skills training, were proposed in conjunction with the spectrum recognition-based music information technology developed in this paper. In this chapter, music teaching practice is used to explore the effect of this technology on enhancing the interactivity of music education. The subjects of the teaching experiment were students of the 2022 cohort in the vocal music major of the School of Music of R Comprehensive University, and the experimental period was from September to October 2024.
Before the formal analysis, the music teaching behaviors were numbered in order, as shown in Table 3. They comprise 12 categories of teaching behavior, such as teacher's prompts, teacher's instructions, and classroom organization and management.
Teaching behavior
| Number | Teaching behavior |
|---|---|
| 1 | Teacher's prompts |
| 2 | Teacher’s instructions |
| 3 | Classroom organization and management |
| 4 | The teacher’s question |
| 5 | Teacher’s acceptance |
| 6 | Teacher's negation |
| 7 | Questions from students to teachers |
| 8 | Students’ reflection to other students |
| 9 | Students' speeches to teachers |
| 10 | Students’ speeches to other students |
| 11 | Students’ thinking and connection |
| 12 | Media tool information presentation |
Because the types of teaching interaction are complex, it is difficult to define teaching interaction precisely in terms of various "groups of teaching behaviors". In this paper, the collected behaviors are first plotted as points on a time axis and then connected in sequence, so as to reflect the classroom teaching process in the field. The "time-behavior diagram" of the 5th music class in this teaching practice is shown in Figure 2. In the diagram, the horizontal direction is time and the vertical direction arranges the behaviors, with teacher behaviors above the horizontal axis and student behaviors below it. Each time the curve crosses the horizontal axis can essentially be regarded as one teacher-student interaction. As the graph shows, teacher-student interactions during the experiment were frequent, reaching as many as 18 in one lesson, and the interactivity of music teaching was significantly improved.

Time-Behavior Diagram
Judging the teaching mode is mainly based on analyzing student (S) and teacher (T) behaviors: the T-behavior occupancy rate, the S-behavior occupancy rate, and the behavior conversion rate are computed, and the teaching mode analysis chart is drawn, as shown in Figure 3. The T-behavior occupancy rate is 0.58 and the behavior conversion rate is 0.41, so the teaching mode of this music course is still the common "dialogue type". Subsequent improvements to music teaching can further attempt to strengthen the guidance of students' behavior in the classroom.
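Under an assumed coding of Table 3 (behaviors 1-6 as teacher, 7-11 as student, 12 as media), the occupancy and conversion rates can be computed from a time-ordered behavior log; the sequence below is illustrative, not the experiment's data.

```python
# Illustrative classroom log of behavior codes from Table 3 (assumed data).
codes = [2, 4, 7, 5, 1, 9, 11, 3, 4, 7, 5, 10]

is_teacher = [1 <= c <= 6 for c in codes]          # teacher vs student behavior
t_rate = sum(is_teacher) / len(codes)              # T-behavior occupancy rate
switches = sum(a != b for a, b in zip(is_teacher, is_teacher[1:]))
conv_rate = switches / (len(codes) - 1)            # teacher<->student switch rate
print(round(t_rate, 2), round(conv_rate, 2))
```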

Teaching mode analysis chart
The frequencies of the various teaching behaviors in this teaching practice were counted, as shown in Figure 4. The distribution of behaviors is relatively balanced, with Type 7 behaviors (students' questions to the teacher) the most frequent at 24 occurrences, followed by Type 11 behaviors (students' thinking and connection) at 20. This suggests that this teaching practice emphasizes student participation in the classroom, reflecting the "student-centered" curriculum concept, and that the interactivity of music teaching has significantly improved.

Frequency of teaching behavior
This paper takes music spectrum recognition as the core music information technology, combines it with music education and teaching, puts forward corresponding music teaching strategies, carries out music teaching practice, and explores the effect of music information technology in enhancing the interactivity of music education. First, the effectiveness of the proposed spectrum recognition and denoising methods was examined. In the audio spectrum recognition experiment, effective spectrum recognition was achieved for students singing songs of different styles: Student A, who sang the country ballad "Hairy Rain", has a bright timbre but lacks mellowness and a magnetic quality, while Student B, who sang the jazz song "Night Shanghai", has a round, bright tone but little resonance. In the audio denoising experiment, the wavelet denoising algorithm improved the SNR of the audio signal from 12.634 to 23.163 and greatly reduced the noise and clutter. Combining the spectrum recognition technology with the proposed teaching strategies, music teaching practice was carried out. Taking the 5th music class as an example, teacher-student interactions were frequent, with as many as 18 interactions in one class, significantly improving the interactivity of music teaching. The teaching mode during the practice was the dialogue type, with a T-behavior occupancy rate of 0.58 and a behavior conversion rate of 0.41.
Statistics on the frequency of the various teaching behaviors during the practice show that the most frequent were Type 7 behaviors (students' questions to the teacher) and Type 11 behaviors (students' thinking and connection), at 24 and 20 occurrences respectively; students' classroom participation increased significantly, and the interactivity of music teaching is outstanding.
