
A Study on the Application of Music Information Technology in Enhancing the Interactivity of Teaching and Learning in Music Education

  
Mar 24, 2025


Introduction

As the reform of basic education deepens, music education faces unprecedented challenges. How to cultivate a new generation of qualified, innovative music teachers for basic education has become the center of teaching reform in schools across the country. Music education reform must first move beyond traditional educational thinking and establish a student-centered educational concept [1]. Teaching methods should be reformed and advanced teaching modes adopted so that the quality of the talents cultivated is guaranteed. As society progresses, teachers are required to correct their educational ideology, change their concept of education, face all students, strengthen students' ideological and moral education, and emphasize cultivating students' innovative spirit, practical ability, and broadened horizons [2-5]. It has been proposed to replace teacher-only lecturing with teaching activities in which teachers and students participate together, and to shift from teaching methods focused on results to methods that make students enjoy the learning process [6-7]. These requirements pose a strong challenge to music education programs that still rely on traditional educational concepts and teaching methods to supply talent for basic music education.

Interactive music teaching maximizes the exchange of information between students and teachers, makes their communication more equal, and improves students' music literacy [8-9]. This teaching method not only raises students' interest, enthusiasm, initiative, and thinking ability, making it easier for them to understand and master knowledge, but also allows teachers to better understand students' learning situation and personalize teaching according to students' needs and characteristics [10-12]. For music education, such interactivity is particularly important: music is a perceptual art of emotional resonance, and only through interactive experience, communication, and emotional perception of music can students master its content and skills [13-16]. In today's digital era, information technology has penetrated widely into many fields. As an art discipline, music can likewise be enriched through the innovative application of information technology, which promotes teacher-student interaction, teamwork, music creation, and teaching reform, and improves learning outcomes [17-19].

Within music information technology, this paper takes audio spectrum recognition as its subject, proposes a CQT-based spectrum recognition algorithm, and proposes a wavelet denoising algorithm to address the noise introduced when audio signals are recorded, played back, and recognized. The CQT parameters are optimized: the center frequencies of the music signal in the transformed spectrum are calculated and the frequency resolution is derived to ensure accuracy. The FFT is invoked to reduce the computation of the CQT in the frequency domain, the CQT definition is converted into a coefficient expression, and the CQT coefficients are further represented in matrix form. A discrete wavelet transform is performed on the noisy signal, and a new threshold function with a known asymptote is proposed to solve the edge distortion problem of the traditional wavelet threshold denoising algorithm. Music spectrum recognition and audio signal denoising experiments test the performance of the proposed spectrum recognition technology, and, combined with music education practice, corresponding music teaching strategies based on music information technology are proposed and put into teaching practice.

Music Information Technology Based on Spectrum Recognition

The wide application of information technology creates opportunities for music education: it replaces dull teaching methods, inspires students' musical thinking, and promotes efficient classrooms. This paper develops a new perspective on the deep integration of music teaching and information technology, selecting audio spectrum recognition within music information technology as the main content of the research, in order to realize audio spectrum recognition in music teaching and support the interactive development of the music classroom.

Principles of audio signal visualization

In essence, music is a physical wave phenomenon: vibration sets air molecules in motion, producing sound that people perceive through their hearing organs, and the resulting change of air pressure over time forms a continuous signal. In real life, music therefore exists as an analog signal with a continuously varying waveform, while computers can only process discrete signals. To represent this continuous process in limited memory, the analog signal must be sampled at a fixed frequency and then converted into a discrete signal through quantization. According to the Nyquist-Shannon sampling theorem, the sampling frequency must be at least twice the maximum frequency of the analog signal to prevent information loss. Finally, the music signal is encoded and compressed into a form the computer can process.
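A minimal sketch of this sampling-and-quantization pipeline (the 440 Hz test tone, the 44.1 kHz rate, and the 16-bit word length are illustrative assumptions, not values taken from this paper):

```python
import numpy as np

# "Analog" source: a 440 Hz sine. Per the sampling theorem, fs must be at
# least 2 * 440 Hz; 44.1 kHz (CD quality) leaves ample margin.
fs = 44100                                 # sampling rate, Hz
t = np.arange(0, 1.0, 1 / fs)              # one second of sampling instants
x = 0.8 * np.sin(2 * np.pi * 440 * t)      # time-discretized signal

# Quantization: map continuous amplitudes onto 16-bit PCM integers.
x_q = np.round(x * 32767).astype(np.int16)
```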

Preprocessing of audio signals

Before analyzing an audio signal, it must first be preprocessed to improve data quality so that it can be analyzed more efficiently. Pre-emphasis raises the high-frequency portion and flattens the spectrum of the signal. Framing then gathers N sample points into one observation unit, obtaining relatively stationary short-time sequences from a long non-stationary sequence; the length of one frame is usually 10~30 ms. However, the discontinuities at the beginning and end of each frame cause the error to grow as the number of frames increases, and truncating non-periodic signals causes spectral leakage. It is therefore usually necessary to multiply the original signal by a window function to obtain a locally periodic signal and increase the continuity between frames. Commonly used window functions are the rectangular window and the Hamming window, where the rectangular window is given by: $w(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$

Its frequency response is $W(e^{j\omega}) = e^{-j\omega\frac{N-1}{2}}\,\frac{\sin(\omega N/2)}{\sin(\omega/2)}$, while the Hamming window is defined in the time domain as $w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \; 0 \le n \le N-1$.
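A sketch of this preprocessing chain (pre-emphasis, framing, Hamming windowing); the 25 ms frame, 10 ms hop, pre-emphasis coefficient 0.97, and the function name are illustrative assumptions within the ranges described above:

```python
import numpy as np

def preprocess(x, fs, frame_ms=25, hop_ms=10, alpha=0.97):
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1] raises the high frequencies.
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    N = int(fs * frame_ms / 1000)          # samples per frame
    H = int(fs * hop_ms / 1000)            # hop between frame starts
    n_frames = 1 + (len(y) - N) // H
    w = np.hamming(N)                      # 0.54 - 0.46*cos(2*pi*n/(N-1))
    return np.stack([y[i * H:i * H + N] * w for i in range(n_frames)])
```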

Time domain analysis

Short-time energy characteristics

Musical signals change over time, and so does their energy. The onset of a note is usually accompanied by a sharp increase in the energy of the music, and the energy of a piece is usually related to its arousal value. To obtain the energy characteristics of the signal, it is first framed and windowed into audio frames of length L, and the short-time energy is then computed as: $E_n = \sum_{m=n}^{n+L-1} x^2(m)$

Instead of short-time energy, the short-time amplitude can also be calculated: $M_n = \sum_{m=n}^{n+L-1} |x[m]|$
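Both features reduce to per-frame sums over the framed, windowed signal; a minimal sketch reusing the frames produced by the preprocessing sketch above:

```python
import numpy as np

def short_time_features(frames):
    energy = np.sum(frames ** 2, axis=1)        # E_n: sum of squared samples
    amplitude = np.sum(np.abs(frames), axis=1)  # M_n: sum of absolute values
    return energy, amplitude
```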

Zero Crossing Rate

Zero Crossing Rate (ZCR) is an intuitive way to estimate the fundamental frequency. In music, the fundamental is the pure tone with the lowest frequency and the highest intensity, which determines the pitch of the whole piece, so pitch can be detected with a ZCR algorithm. The zero crossing rate counts the number of times the signal crosses the zero axis in each frame and is calculated as: $Z_{\hat{n}} = \frac{1}{2L}\sum_{m=\hat{n}-L+1}^{\hat{n}} \left|\operatorname{sgn}(x[m]) - \operatorname{sgn}(x[m-1])\right|\,\tilde{w}[\hat{n}-m]$
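With a rectangular window, the formula reduces to a normalized count of sign changes per frame; a minimal sketch:

```python
import numpy as np

def zero_crossing_rate(frames):
    signs = np.sign(frames)
    # |sgn(x[m]) - sgn(x[m-1])| equals 2 at each crossing, hence the 1/(2L).
    return np.sum(np.abs(np.diff(signs, axis=1)), axis=1) / (2 * frames.shape[1])
```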

Frequency domain analysis

It is difficult to extract relevant feature information from a non-periodic signal f(t) in the time domain, so the original audio signal is usually transformed to compress the data and extract higher-level information. The most basic such transform is the Fourier Transform (FT), by which the frequency spectrum of a piece of audio can be obtained.

However, the theoretical Fourier transform analyzes continuous signals and cannot be used directly on a computer. In practice, the discrete-time Fourier transform (DTFT) is used to analyze sampled data. A continuous audio signal x(t) is discretized into x[n] by sampling, and the DTFT of x[n] is: $X_f(\theta) = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j\theta n}$

where n is the index of the sampling point and θ is the sampled digital angular frequency in radians. Although the result of this transform is discrete in the time domain, it is still continuous in the frequency domain, so it must be sampled uniformly on the θ-axis over [0, 2π) to obtain a discrete spectrum: $\theta[k] = \frac{2k\pi}{N}, \quad 0 \le k \le N-1$

where N is the time window length, so that the sampled digital frequency θ can be expressed as the discrete digital frequency θ[k].

In turn, the formula for the discrete Fourier transform (DFT) is obtained: $X_d[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi kn}{N}}, \quad 0 \le k \le N-1$

where Xd[k] is the result of the DFT for N discrete points on x[n] in the time domain.
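The DFT above is exactly what a standard FFT routine computes; a direct O(N²) evaluation of the formula, checked against NumPy's FFT (the 64-point random input is only for the check):

```python
import numpy as np

def dft_direct(x):
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    # X_d[k] = sum_n x[n] * exp(-2j*pi*k*n/N)
    return np.sum(x * np.exp(-2j * np.pi * k * n / N), axis=1)

x = np.random.randn(64)
assert np.allclose(dft_direct(x), np.fft.fft(x))  # FFT: same result in O(N log N)
```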

Time-frequency analysis

Time-frequency analysis studies the audio signal along the time and frequency directions simultaneously, better presenting the transient behavior of the signal's frequency content over time and thus improving music identification.

Short-time Fourier Transform (STFT)

The DFT is a global transform: it averages the frequency characteristics over a whole period of time, and increasing the frequency-domain resolution decreases the time-domain resolution, and vice versa. To overcome this shortcoming, the Short-Time Fourier Transform (STFT) is introduced.

The STFT slides a window of a given size and step over the time-domain signal, performs a Fourier transform within each window separately, and stitches the results together, forming frequency-over-time data that constitutes a two-dimensional joint distribution of time and frequency. The formula for the STFT is: $S(m,k) = \sum_{n=0}^{N-1} x(n+mH)\, w(n)\, e^{-i\frac{2\pi kn}{N}}$ where H is the hop size between successive windows.
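A sketch using SciPy's stft, in which nperseg plays the role of the window length N and the hop is H = nperseg − noverlap; the test tone and parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.signal import stft

fs = 44100
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 440 * t)

# S[k, m] is the complex spectrum of frame m; |S| gives the spectrogram.
f, tau, S = stft(x, fs=fs, window='hamming', nperseg=1024, noverlap=768)
```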

Mel frequency cepstral coefficients

The human ear perceives frequencies from 20 Hz to 20,000 Hz; signals outside this range are automatically filtered out and cannot be perceived. Subjective perception of frequency is therefore nonlinear, whereas frequency in the STFT spectrum lies on a linear scale. The Mel filter bank was devised to address this problem: it accounts for the perceived intensity of a tone, i.e., the loudness response to sound pressure, without changing the frequency. The Mel frequency expresses the pitch heard at a specific frequency and intensity, forming a nonlinear correspondence with Hz: $\mathrm{Mel}(f) = 2595 \times \lg\left(1 + \frac{f}{700}\right)$

From Eq. (10) it can be seen that the resolution of the Mel scale decreases as frequency increases, which corresponds to the human ear's higher sensitivity to low-frequency tones.
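The mapping and its inverse as small helper functions (the names are illustrative); for instance, 1,000 Hz maps to roughly 1,000 Mel while 8,000 Hz maps to only about 2,840 Mel, showing the compression of high frequencies:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)    # Mel(f) = 2595 * lg(1 + f/700)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)  # inverse mapping
```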

Constant Q transform

The Constant Q Transform (CQT) is similar to the MFCC in that it also maps a linear spectrum to a nonlinear one. The difference is that the CQT is a time-frequency transform whose frequency axis follows an exponential (base-2 logarithmic) distribution. From the earlier analysis of musical acoustics, the frequency of a tone one octave higher is twice that of the lower tone; that is, the pitches in music are distributed exponentially. The CQT matches this characteristic, and the amplitude of the music signal at each note frequency can be obtained directly from the CQT, so it is often used in music research. This also means that, at low frequencies, the CQT has higher frequency resolution and lower time resolution than the DFT. The CQT computes frequency coefficients similarly to the DFT, but whereas the interval between center frequencies in the DFT is constant, the center frequencies in the CQT are distributed exponentially and are calculated as: $f_k = f_{\min} \cdot 2^{\frac{k}{B}}, \quad k = 0, 1, 2, \ldots, K-1$

where B is the number of frequency bands per octave, and the CQT of a finite-length sequence x(n) is: $X_{CQT}(k) = \frac{1}{N_k}\sum_{n=0}^{N_k-1} x(n)\, w_{N_k}(n)\, e^{-j\frac{2\pi Q n}{N_k}}$

Audio Spectrum Recognition Algorithm

Optimizing CQT parameters

$f_k$ is the kth frequency component of the music signal within the transform spectrum, called the center frequency, and $\Delta f_k$ is the frequency interval between neighboring semitones, called the bandwidth or frequency resolution. The center frequency is given by: $f_k = f_0 \cdot 2^{\frac{k}{b}}$

Derived further: $\Delta f_k = f_{k+1} - f_k = f_0 \cdot 2^{\frac{k}{b}}\left(2^{\frac{1}{b}} - 1\right) = f_k\left(2^{\frac{1}{b}} - 1\right)$

where f0 is the lowest frequency within the signal, and b is the number of frequency points within one octave, which can also be regarded as the number of spectral lines.

As can be seen from equation (14), the constant Q takes the value $Q = \frac{f_k}{\Delta f_k} = \left(2^{\frac{1}{b}} - 1\right)^{-1}$. Since this study analyzes tone levels, b is set according to the subject of study. The analysis of tones in Chapter 2 divided each octave into twelve semitones, so the frequency ratio of two adjacent semitones is $2^{\frac{1}{12}}$, hence the name twelve-tone equal temperament [20]. To conform to this frequency distribution and make the spacing of the spectral lines correspond to the frequency ratio of two semitones, b is set to 12. This ensures that every semitone range is crossed by a spectral line so that each tone is recognized, guaranteeing accuracy without excessive computational load.
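A sketch of these parameter choices; the lowest frequency 27.5 Hz (A0) and the 88-line range are illustrative assumptions:

```python
import numpy as np

b = 12                                 # spectral lines per octave (semitones)
Q = 1.0 / (2 ** (1.0 / b) - 1)         # constant Q, roughly 16.82
f0 = 27.5                              # lowest analyzed frequency (A0)
k = np.arange(88)                      # e.g. one line per piano key
f_k = f0 * 2 ** (k / b)                # exponentially spaced center frequencies
delta_f = f_k * (2 ** (1.0 / b) - 1)   # bandwidth grows with f_k; f_k/delta_f == Q
```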

Realization of CQT by calling FFT

Since the spectral lines of the CQT are exponentially distributed and thus more expensive to compute than linear spectral lines, the FFT is called to reduce the computational effort [21]. The CQT can be computed directly in the time domain by initializing the parameters fmax, f0 and b according to its definition; calling the FFT in the frequency domain of the CQT converts the definition into a coefficient expression: $X(f_k) = \frac{1}{N(f_k)}\sum_{n=0}^{N(f_k)-1} x(n)\, W(f_k, n)\, e^{-2j\pi \frac{Q n}{N(f_k)}}$

where: W(fk, n) is the window function of frequency fk.

Further, the CQT coefficients can be expressed in matrix form as: $X_{CQT}(k) = T \cdot x_{matrix}$, with $T(f_k, n) = \begin{cases} \frac{1}{N(f_k)}\, W(f_k, n)\, e^{-2j\pi \frac{Q n}{N(f_k)}}, & n < N(f_k) \\ 0, & n \ge N(f_k) \end{cases}$

where X is the vector of CQT coefficients, the transform kernel T is an n × k-dimensional matrix, n is the number of frequency points, and k is the maximum window length. When N(fk) < k, T is a sparse matrix, because all elements of a given row after its N(fk)th entry are zero. xmatrix is the matrix form of the signal xn under the window function W(fk, n), with k rows. Since the transform kernel T does not depend on the input signal, it is generated once, while the transform parameters are kept constant, and then reused.
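A minimal sketch of this matrix formulation and its FFT-based evaluation, under the definitions above (the Hamming window, the A2 starting frequency, and the four-octave range are illustrative assumptions, not the paper's reference implementation):

```python
import numpy as np

fs = 44100
b = 12
Q = 1.0 / (2 ** (1.0 / b) - 1)
f_k = 110.0 * 2 ** (np.arange(48) / b)       # four octaves upward from A2

# Each row of T is a windowed complex exponential at f_k, zero-padded to the
# longest window N(f_k[0]); rows become sparser as f_k grows.
N_max = int(np.ceil(Q * fs / f_k[0]))
T = np.zeros((len(f_k), N_max), dtype=complex)
for i, fk in enumerate(f_k):
    Nk = int(np.ceil(Q * fs / fk))           # window length shrinks with frequency
    n = np.arange(Nk)
    T[i, :Nk] = (np.hamming(Nk) / Nk) * np.exp(-2j * np.pi * Q * n / Nk)

x = np.random.randn(N_max)                   # stand-in signal segment
X_direct = T @ x                             # X_CQT = T . x_matrix

# The same product evaluated through the FFT: the spectral-kernel rows are
# concentrated near each f_k, which is where the computational savings arise.
X_fft = np.fft.ifft(T, axis=1) @ np.fft.fft(x)
assert np.allclose(X_direct, X_fft)
```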

Wavelet denoising algorithm

Audio signals pick up noise during recording and playback, and further noise is inevitably introduced during conversion, which affects the recognition accuracy of the audio spectrum.

The noise in an audio file is generally Gaussian white noise, which is randomly distributed in the time domain; superimposed on the original audio signal, it disturbs the judgment of rhythm points and confuses the rhythm of music and lighting. To address this problem, and considering that the popular LMS adaptive denoising algorithm is computationally expensive, we chose the wavelet transform as the denoising tool: the energy of the signal is concentrated in a small number of large wavelet coefficients, while the noise is uniformly distributed in the wavelet domain and corresponds to the smaller coefficients. Wavelet-transform denoising algorithms can be categorized into masked denoising, modulus maxima denoising, and wavelet threshold denoising [22].

After comparing the performance, advantages, and disadvantages of the three wavelet denoising algorithms, and taking into account the low signal-to-noise ratio of music signals, this paper adopts wavelet threshold denoising. The main work is to compare traditional soft and hard threshold denoising and to propose a new threshold function that, to a certain extent, overcomes the defects of the hard threshold function (destroying the smoothness of the signal) and the soft threshold function (easily causing distortion), and suppresses the influence of noise.

We use Gaussian white noise as the noise source and assume a noisy signal f(t) = s(t) + o(t), where s(t) is the source signal and o(t) is Gaussian white noise with variance σ² [23]. A discrete wavelet transform of the noisy signal yields Eq. (18): $w_f(j,k) = w_s(j,k) + w_o(j,k), \quad k = 0, 1, 2, 3, \ldots, N$

where j denotes the number of decomposition levels and N denotes the signal length. According to the distributions of the original signal and the noise in the wavelet domain, a threshold λ is set: when |wf(j, k)| ≥ λ, the coefficient is considered to be dominated by the original signal's coefficient ws(j, k), so wo(j, k) can be removed, i.e. wf(j, k) ≈ ws(j, k); the denoised wavelet coefficient is written $\tilde{w}_f(j,k) = w_s(j,k)$. Wavelet threshold denoising comes mainly in soft and hard forms, whose threshold functions are shown in Eqs. (19) and (20): $\tilde{W}(j,k) = \begin{cases} \operatorname{sgn}(w_f(j,k))\left(|w_f(j,k)| - \lambda\right), & |w_f(j,k)| \ge \lambda \\ 0, & |w_f(j,k)| < \lambda \end{cases}$ $\tilde{W}(j,k) = \begin{cases} w_f(j,k), & |w_f(j,k)| \ge \lambda \\ 0, & |w_f(j,k)| < \lambda \end{cases}$

Eq. (19) is the soft threshold function, Eq. (20) the hard threshold function, and λ the threshold value. Hard threshold denoising sets all wavelet coefficients below λ to 0 and retains the rest, but this causes signal oscillation and destroys the smoothness of the original signal. Soft threshold denoising differs from the hard threshold's "one-size-fits-all" approach: coefficients smaller than the threshold are set to 0, while the threshold is subtracted from coefficients larger than it to obtain the new coefficients. The resulting waveform is smoother than with hard thresholding, but there is a fixed bias between the denoised wavelet coefficients and those of the original signal, which affects the accuracy of the reconstructed signal and distorts its edges.

To solve these problems of the traditional wavelet threshold denoising algorithms, we propose a new threshold function, Eq. (21): $\tilde{w}(j,k) = \begin{cases} \operatorname{sgn}(w_f(j,k))\left[|w_f(j,k)| - \dfrac{a\lambda}{a + |w_f(j,k)| - \lambda}\right], & |w_f(j,k)| \ge \lambda \\ 0, & |w_f(j,k)| < \lambda \end{cases}$

where a is the moderating factor. The asymptote of the function $f(x) = \operatorname{sgn}(x)\left(|x| - \dfrac{a\lambda}{a + |x| - \lambda}\right)$ can be found as follows.

If $x < 0$: $\lim_{x \to -\infty} \dfrac{f(x)}{x} = 1$ and $\lim_{x \to -\infty} \left[f(x) - x\right] = 0$.

If $x > 0$: $\lim_{x \to +\infty} \dfrac{f(x)}{x} = 1$ and $\lim_{x \to +\infty} \left[f(x) - x\right] = 0$.

Therefore, the asymptote of the function f(x) is y = x, from which it follows for Eq. (21) that as the wavelet coefficient wf(j, k) grows without bound, the gap between $\tilde{w}(j,k)$ and wf(j, k) shrinks to zero, so the function gradually approximates the hard threshold function, removing the fixed deviation between the original and denoised signals that afflicts the soft thresholding method.

This threshold function possesses several properties as follows.

Continuity at the threshold λ effectively avoids the reconstructed-signal oscillations caused by the discontinuity of the hard threshold function.

As $|w_f(j,k)| \to +\infty$, $\tilde{w}(j,k) \to w_f(j,k)$ and $\lim_{|w_f(j,k)| \to +\infty} \frac{|\tilde{w}(j,k)|}{|w_f(j,k)|} = 1$. This means that as |wf(j, k)| increases, $\tilde{w}(j,k)$ gradually approaches the hard threshold function, and the speed of approximation is governed by the adjustment factor a.

If a = 0, the new threshold function reduces to the hard threshold function; as a → +∞, it reduces to the soft threshold function. A better denoising effect can therefore be obtained by choosing the value of a reasonably.

Besides the threshold function, the size of the threshold is also an important factor in denoising performance: if the threshold is too large, some signal wavelet coefficients are filtered out and signal energy is lost; if it is too small, the reconstructed signal still contains noise components and the denoising effect is greatly reduced. Donoho proposed the classical universal threshold $\lambda = \sigma\sqrt{2\ln N}$ (N is the signal length and σ the standard deviation of the noise), but this approach uses the same threshold at every decomposition scale; when N is large, λ becomes too large and easily zeroes the detail wavelet coefficients during denoising, destroying signal details. We therefore choose the scale-dependent threshold $\lambda_j = \lambda / \ln(j+1)$.
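A sketch of the complete denoising chain under these definitions, assuming the PyWavelets package (pywt); the 'db4' wavelet, the decomposition depth, and a = 2 are illustrative choices, and the noise σ is estimated from the finest detail band by Donoho's median rule:

```python
import numpy as np
import pywt

def new_threshold(w, lam, a=2.0):
    # Eq. (21): continuous at lam; the shrinkage term a*lam/(a + |w| - lam)
    # vanishes as |w| grows, so the function tends to hard thresholding.
    out = np.zeros_like(w)
    keep = np.abs(w) >= lam
    shrink = a * lam / (a + np.abs(w[keep]) - lam)
    out[keep] = np.sign(w[keep]) * (np.abs(w[keep]) - shrink)
    return out

def wavelet_denoise(x, wavelet='db4', level=4, a=2.0):
    coeffs = pywt.wavedec(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745  # noise std estimate
    lam = sigma * np.sqrt(2 * np.log(len(x)))       # universal threshold
    for j in range(1, len(coeffs)):                 # threshold detail bands only
        coeffs[j] = new_threshold(coeffs[j], lam / np.log(j + 1), a)
    return pywt.waverec(coeffs, wavelet)
```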

Music spectrum recognition and audio signal denoising experiments

In this chapter, the performance of the spectrum recognition music information technology proposed above will be examined to explore the feasibility of its application in music education teaching. The experiment mainly includes two parts: audio spectrum recognition and audio denoising.

Audio Spectrum Recognition Experiment

In this section, two students majoring in vocal music at the School of Music of R Comprehensive University are selected for the experiment, referred to as Student A and Student B. Student A sings the country-ballad-style song "Hairy Rain", and Student B sings the jazz-style song "Night Shanghai".

First, audio spectrum recognition is performed on Student A's singing voice; the recognition data are shown in Table 1. The sound pressure of the 1st and 2nd overtones is higher than that of the fundamental, the 1st overtone being the highest at 48 dB. The sound pressure of the whole harmonic series ranges from 0.1 to 48 dB, with 8 components above 10 dB, indicating that the singer sings forcefully, in a strong-attack style. The frequencies of the whole harmonic series stand in the integer ratios 1:2:3:4:5:6:7:8:9:10:11, and the successive intervals are an ascending perfect octave, perfect fifth, perfect fourth, major third, minor third, minor third, major second, major second, major second, and minor second, a standard harmonic series structure. This indicates a sound with a clear musical effect and a bright timbre. However, of the 10 overtones, only 6 (the 1st to 6th) are prominent, which is relatively few, indicating a dry, straightforward sound that lacks mellowness and a magnetic quality.

A student’s spectrum identification data

Harmonic sequence Frequency(Hz) Pitch Sound pressure(dB)
Fundamental 584.57 d2-17 34.97
1st overtone 1,165.91 d3-21 48.06
2nd overtone 1,747.43 a3-21 36.65
3rd overtone 2,388.14 d4-17 30.46
4th overtone 2,907.21 #f4-39 21.25
5th overtone 3,491.75 a4-22 28.01
6th overtone 4,088.45 c5-49 18.6
7th overtone 4,672.93 d5-18 10.39
8th overtone 5,254.53 e5-15 6.63
9th overtone 5,909.71 #f5-11 -0.51
10th overtone 6,408.02 g5+29 9.63

The spectral identification results for Student B are shown in Table 2. Student B's fundamental is much stronger than the overtones, and five of the first seven overtones have negative sound pressure values, below 0 dB, indicating that the singer sang with less intensity. There are 11 overtones in total, 7 of which are clear; the frequencies of the harmonic series stand in the ratios 1:2:3:4:5:6:7:8, with successive intervals of an ascending perfect octave, perfect fifth, perfect fourth, major third, minor third, minor third, and major second, a standard harmonic series. This indicates that the singer's voice has an obvious musical effect and that the tone is round and bright. The low number of prominent overtones indicates that Student B has little resonance.

B student’s spectrum identification data

Harmonic sequence Frequency(Hz) Pitch Sound pressure(dB)
Fundamental 529.1 c2+11 18.38
1st overtone 1,058.25 c3+11 -9.31
2nd overtone 1,584.44 g3+10 -3.35
3rd overtone 2,113.49 c4+9 -12.92
4th overtone 2,639.44 e4-7 1.58
5th overtone 3,168.59 g4+10 1.31
6th overtone 3,697.84 bb4-23 -2.82
7th overtone 4,226.89 c5+9 -3.63

Clearly, the audio spectrum recognition algorithm proposed in this paper can recognize the spectra of songs of different styles and types sung by students in music teaching, summarize students' vocal and timbral characteristics, and provide effective assistance for music teaching.

Audio signal denoising experiment

The original audio sample used in this denoising experiment is Student A's singing clip from the previous section. The wavelet denoising algorithm proposed in this paper is applied to the original audio signal, as shown in Figure 1: panel (a) is the original, undenoised audio signal, and panel (b) is the signal processed by this paper's wavelet denoising algorithm. The noise in the audio signal is clearly and greatly reduced, and the signal-to-noise ratio (SNR) improves from 12.634 to 23.163. The proposed wavelet denoising algorithm thus has a significant denoising effect on audio signals.

Figure 1.

The effect of signal denoising

Music teaching strategies based on music information technology

The purpose of this chapter is to explore and analyze the application of the music spectrum recognition information technology in vocal music teaching proposed above, and correspondingly to propose effective music teaching strategies, including effective breathing support and vocal relaxation techniques, resonance tuning and timbre shaping, vocal exercises and skills training.

Effective breathing support and vocal relaxation techniques

Effective breath support and vocal relaxation techniques are interdependent and complementary in vocal teaching. Effective breath support provides a steady supply of air for vocal relaxation, which makes the vocal process more comfortable and stable. At the same time, vocal relaxation techniques can help vocal performers to realize better resonance, adjust their vocal position, and relieve physical and mental stress, so as to enhance the quality and expressiveness of their voice. In vocal music teaching, teachers need to analyze the students’ spectral recognition results and guide them to master the correct breathing support techniques and vocal relaxation methods, and through repeated instruction and practice, help them to use these techniques in practice and form good breathing and relaxation habits.

Resonance tuning and tone shaping

Resonance tuning and timbre shaping are inextricably linked in vocal music teaching, influencing and promoting each other. Resonance tuning provides the basis for timbre shaping: through reasonable resonance adjustment, vocal performers can improve their voice's resonance and expressiveness, providing a richer sound texture and room for timbre shaping. In turn, the techniques of timbre shaping further enrich and optimize the effect of resonance tuning, so that the resonance space and resonance areas can be controlled and adjusted more precisely. In vocal music teaching, teachers need to guide students to master correct resonance tuning techniques and the principles of timbre shaping, analyze students' timbre problems with the help of their spectral recognition results, and help them achieve stable resonance and rich timbre through repeated guidance and practice.

Vocalization exercises and skills training

Vocal music teaching involves both vocal exercises and skills training, which are interdependent and complementary. Vocal exercises provide voice learners with the opportunity to acquire a good vocal foundation, and through systematic practice and feedback, they can master correct vocal techniques and strengthen the stability of their voices. Skill training, on the other hand, further improves learners’ performance ability and musicality through various specific vocal technique exercises, so that they can effectively convey emotions and shape musical images through their voice. Teachers should understand the differences in students’ vocal skills and training needs based on spectrum recognition results, and choose appropriate teaching strategies and methods according to their characteristics and levels.

Music Teaching Practices Based on Music Information Technology

In the previous chapter, music teaching strategies such as resonance tuning and timbre shaping, vocal exercises, and skills training were proposed in conjunction with the spectrum-recognition-based music information technology put forward in this paper. In this chapter, music teaching practice is used to explore the effect of music information technology on enhancing the interactivity and results of music education. The subjects of this teaching experiment were students of the 2022 cohort in the vocal music major at the School of Music of R Comprehensive University, and the experimental period ran from September to October 2024.

Before formally carrying out the analysis, the music teaching behaviors were numbered sequentially, as shown in Table 3. They comprise 12 categories, such as teacher's prompts, teacher's instructions, and classroom organization and management.

Table 3. Teaching behaviors

Number Teaching behavior
1 Teacher’s tips
2 Teacher’s instructions
3 Classroom organization and management
4 The teacher’s question
5 Teacher’s acceptance
6 Teacher's negation
7 Questions from students to teachers
8 Students’ reflection to other students
9 Students’ Speeches to Teachers
10 Students’ speeches to other students
11 Students’ thinking and connection
12 Media tool information presentation

Analysis of teaching interactions

Because the types of teaching interaction are complex, it is difficult to define teaching interaction precisely in terms of various "groups of teaching behaviors". In this paper, the collected behaviors are first plotted as points on a time axis and then connected sequentially to reflect the classroom teaching process in the field. The time-behavior diagram of the 5th music class in this teaching practice is shown in Figure 2. In the diagram, the horizontal axis is time and the vertical axis arranges the various behaviors, with teacher behaviors above the horizontal axis and student behaviors below it. Each time the curve crosses the horizontal axis can essentially be regarded as one teacher-student interaction. As the graph shows, teacher-student interactions during the experiment were frequent, reaching as many as 18 in one lesson, so the interactivity of music teaching was significantly improved.

Figure 2.

Time-Behavior Diagram

Instructional model judgment

Teaching mode judgment is mainly based on the analysis of teacher (T) and student (S) behaviors: the T behavior occupancy rate, the S behavior occupancy rate, and the behavior conversion rate are calculated and plotted in a teaching mode analysis chart, shown in Figure 3. The T behavior occupancy rate is 0.58 and the behavior conversion rate is 0.41, so the teaching mode of the music course in this practice is still the common "dialogue type". Subsequent improvements to music teaching can further attempt to strengthen the guidance of students' behavior in the classroom.

Figure 3.

Teaching mode analysis chart

Analysis of Relative Frequency Distribution of Teaching Behaviors

The frequencies of the various teaching behaviors in this music teaching practice were counted, as shown in Figure 4. The distribution of behaviors was relatively balanced, with Type 7 behaviors (students' responses to the teacher) the most frequent at 24 occurrences, followed by Type 11 behaviors (students' thinking and connecting) at 20. This suggests that this teaching practice emphasizes student participation in the classroom, reflecting the "student-centered" curriculum concept, and that the interactivity of music teaching has improved significantly.

Figure 4.

Frequency of teaching behavior

Conclusion

This paper takes music spectrum recognition as the core music information technology, combines it with music education and teaching, puts forward corresponding music teaching strategies, carries out music teaching practice, and explores the effect of music information technology in enhancing the interactivity of music education. First, the effectiveness of the proposed spectrum recognition and denoising methods is examined. In the audio spectrum recognition experiment, effective spectrum recognition is achieved for students singing songs of different styles: based on the recognition results, Student A, who sings the country-ballad-style song "Hairy Rain", has a bright timbre but lacks mellowness and a magnetic quality, while Student B, who sings the jazz-style song "Night Shanghai", has a round, bright tone but little resonance when singing. In the audio denoising experiment, the proposed wavelet denoising algorithm improves the SNR of the audio signal from 12.634 to 23.163 and greatly reduces the noise and clutter in the signal. Combining this spectrum-recognition-based music information technology with the corresponding teaching strategies, music teaching practice was carried out. Taking the 5th music class as an example, teacher-student interaction was frequent, reaching as many as 18 interactions in one class, a significant improvement in the interactivity of music teaching. The teaching mode during the practice was conversational, with a T-behavior occupancy rate of 0.58 and a behavior conversion rate of 0.41. Counting the frequencies of the various teaching behaviors shows that the most frequent were Type 7 (students' responses to the teacher) and Type 11 (students' thinking and connecting), at 24 and 20 occurrences respectively; students' classroom participation increased significantly, and the interactivity of music teaching stands out.
