Open Access

A Study of Artistic Expression and Spectral Data Analysis in Yangqin Performance Techniques

  
Sep 24, 2025


Introduction

The yangqin is a musical instrument with an important position in the history of Chinese music. Since ancient times it has been an instrument favored by the literati; it not only embodies the essence of ancient Chinese music culture but is also one of the most widely used instruments in Chinese classical music [1-4]. In terms of performance technique and artistic expression, continuous innovation is the fundamental reason why the yangqin has stood out in the world. Yangqin techniques include pizzicato, glissando, overtones, vibrato, transcription and so on, of which the most basic is pizzicato. Usually three fingers are used: the first plucks the string, the second loosens the string, and the third plays the string [5-8].

Becoming a good yangqin player requires a great deal of technical work, especially on a stable finger-picking technique. Finger-picking training is the cornerstone of yangqin technique, and both stability and speed can be improved through diligent practice [9-10]. When practicing finger-picking, the player should always maintain good posture and learn to observe carefully the distance between the fingertips and the strings; only by mastering the finger-picking technique can they play beautiful music. In addition, some yangqin techniques require constant training to master. Yangqin players also need to pay attention to the overtones and tuning of the strings and be able to adjust string tension flexibly [11-14] to meet different tonal requirements. The study of yangqin technique and artistic expression is an important branch of research on the history of Chinese music. By continuing to explore the techniques and artistic expression of the yangqin, we can not only inherit and carry forward the cultural connotations of the yangqin but also promote the international exchange of Chinese classical music [15-17].

To enable the measurement and analysis of spectral data in yangqin performance, this study proposes a model for analyzing the spectral data of yangqin performance, with the processing and recognition of spectral features and the quantization of spectral data as its core components. The importance of fine time-frequency features is calculated and ranked using feature selection based on recursive feature elimination and random forest, and these features are fused with MFCC and NMFCC to form two new features before a random forest model is trained, completing the processing and recognition of spectral features. The spectrum quantization process follows the nearest-neighbor principle and the center-of-mass principle; the splitting method is used to design the initial codebook, and the codebook and iterative training parameters are set to complete the spectrum quantization. This paper selects the yangqin concerto “Smoke Gesture”, composed by the young composer Liu Chang, as the object of research and analysis, and examines its yangqin playing technique in depth from the perspectives of spectral data analysis and artistic expression evaluation.

Spectrum data analysis model of yangqin performance

Traditional evaluation of yangqin performance skills tends to focus on generalized description, and research on frequency-domain data remains limited. With the rapid development of information technology, analyzing the spectral data of yangqin performance has become increasingly feasible. In this paper, we build a model for analyzing the spectral data of yangqin performance, propose a spectral feature processing and recognition algorithm to characterize the spectral features of the resonating body more accurately, and realize spectrum quantization on this basis.

Spectral Feature Processing and Recognition Algorithm
Mel frequency cepstrum coefficients based on non-pitch components

Yangqin Audio Generation Model

The sound generation process of the yangqin can generally be viewed as a process in which the audio signal emitted by a vibrating excitation source is acted on by the resonating body. This process can be modeled by the excitation source-filter model, whose time-domain expression takes convolution form: y(t) = x(t) * h(t)

Where x(t) is the vibration excitation signal of the sound source, h(t) is the unit impulse response of the resonator filter, and y(t) is the final output sound signal of the instrument. According to the correspondence between time-domain convolution and frequency-domain multiplication, the frequency-domain expression of the excitation source-filter model is: Y(ω) = X(ω)H(ω)

Where X(ω) is the frequency-domain representation of the vibration excitation signal, H(ω) is the frequency-domain representation of the unit impulse response of the resonator filter, and Y(ω) is the frequency-domain representation of the instrument's sound signal. It can be seen that the timbre of the musical signal is determined mainly by two parts: the vibration excitation signal and the resonator filter. The frequency, material and shape of the excitation source affect the properties of the excitation signal, and the size, shape and material of the resonator are likewise important factors affecting the timbre of the yangqin. However, the excitation source of a given instrument often varies; for example, guitar strings may be steel or nylon, and strings of the same material on one guitar differ in thickness, whereas the resonating body of an instrument is relatively fixed. Therefore, if the resonator component of the music signal, i.e., the non-pitch component, can be extracted, it can characterize the instrument's timbre more stably and effectively.
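As a quick numerical illustration of this equivalence, the following sketch (with synthetic placeholder signals rather than yangqin recordings) checks that time-domain convolution and frequency-domain multiplication of the excitation and the resonator impulse response give the same output:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)          # stand-in excitation signal x(t)
h = np.exp(-np.arange(64) / 8.0)      # stand-in resonator impulse response h(t)

# Time domain: linear convolution of excitation and impulse response.
y_time = np.convolve(x, h)

# Frequency domain: zero-pad both to the full output length, multiply the
# spectra, and transform back; this reproduces the time-domain result.
n = len(x) + len(h) - 1
Y = np.fft.fft(x, n) * np.fft.fft(h, n)
y_freq = np.fft.ifft(Y).real

print(np.allclose(y_time, y_freq))    # True: the two formulations agree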

NMFCC extraction process

First, the effective audio segment is extracted by an endpoint detection algorithm, and the fundamental frequency of the segment is estimated to obtain f1. The effective audio segment is then resampled based on f1 so that the sampling rate becomes a power-of-two multiple of f1, which reduces spectral leakage [18]. Next, pre-processing operations such as pre-emphasis, framing and windowing are performed. After the windowed frame sequence of the valid audio is obtained, an FFT is applied to each frame to obtain its spectrum [19]. The spectrum is then harmonically marked: the index of each harmonic peak maximum is obtained, a band one harmonic spacing wide centered on each peak index is selected, the harmonic components within that band are set to zero, cubic spline interpolation is used to fill the resulting gaps, and smoothing is applied to obtain the non-pitch component. The squared FFT magnitudes of the non-pitch component are used to compute its energy spectrum, the energy output of each filter in a Mel filter bank is then computed, and finally the logarithm of the filter outputs is taken and a discrete cosine transform is applied to obtain the final NMFCC features.
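The following is a condensed sketch of how the non-pitch component and the NMFCC of a single frame might be computed, assuming the fundamental frequency f0 has already been estimated and the frame has been pre-emphasized and windowed; the function name, band width and parameter defaults are illustrative choices, not taken from a published implementation:

import numpy as np
from scipy.interpolate import CubicSpline
from scipy.fft import dct
import librosa

def nmfcc_frame(frame, sr, f0, n_mels=26, n_coeffs=13):
    # Magnitude spectrum of the pre-emphasized, windowed frame.
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

    # Harmonic marking: zero out a band one harmonic spacing wide (+- f0/2)
    # around every integer multiple of the estimated fundamental f0.
    half_width = int(round((f0 / 2) * n_fft / sr))
    removed = np.zeros(len(spectrum), dtype=bool)
    for k in range(1, int((sr / 2) // f0) + 1):
        centre = int(round(k * f0 * n_fft / sr))
        lo = max(centre - half_width, 0)
        hi = min(centre + half_width + 1, len(spectrum))
        removed[lo:hi] = True
    non_pitch = spectrum.copy()
    non_pitch[removed] = 0.0

    # Fill the gaps left by the removed harmonics with cubic-spline interpolation.
    spline = CubicSpline(freqs[~removed], non_pitch[~removed])
    non_pitch[removed] = np.clip(spline(freqs[removed]), 0.0, None)

    # Energy spectrum of the non-pitch component -> Mel filter bank -> log -> DCT.
    energy = non_pitch ** 2
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    mel_energy = mel_fb @ energy
    return dct(np.log(mel_energy + 1e-10), type=2, norm="ortho")[:n_coeffs]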

Time-frequency fine characterization

Random Forest [20].

Random forest is a form of ensemble learning that improves accuracy by training multiple decision-tree units, having them vote on the classification, counting the votes, and taking the label with the most votes as the final classification result. For each decision tree, the random forest draws samples from the training set randomly and with replacement, drawing the same number of samples as the size of the original training set. Thus the training set of each tree is different from, yet overlaps with, the others. Because a random forest both randomly draws samples and randomly selects a subset of features from all the features, it is relatively resistant to overfitting.

Constructing a decision tree involves selecting samples, selecting a subset of features, and choosing a feature from that subset for node splitting. If there are x samples in the original sample pool, then when constructing a decision tree we first draw x samples from the pool at random with replacement to train that tree. Because each draw is with replacement, roughly one-third of the samples in the original pool are not drawn for a given tree (the out-of-bag samples). After obtaining the x training samples, let the number of features in the training set be Y. Each time the decision tree needs to split, y features are randomly selected from the Y features, where y << Y, and one feature is selected from the y as the splitting attribute according to some strategy, such as minimum Gini impurity or maximum information gain. After the split is completed, the next best feature is selected from the remaining features as the new splitting attribute, until no features remain or the best classification performance is reached. This yields one decision tree, and repeating the cycle to generate multiple decision trees forms a random forest.

Random forest can not only handle datasets with high-dimensional features but also generate an importance score for each feature dimension, providing a feature ranking as a reference for feature selection while maintaining high classification performance. It also resists overfitting and noise well, so the random forest model is chosen as the classification model in this study.
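As a minimal sketch of this classification stage, the snippet below trains a scikit-learn random forest on a placeholder feature matrix (standing in for the fused spectral features) and reads off the per-dimension importance scores used for ranking; the data shapes and hyperparameters are illustrative assumptions:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data: 500 frames x 26 feature dimensions, 3 playing-technique labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 26))
y = rng.integers(0, 3, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Bootstrap sampling and per-split feature subsampling are the library defaults
# (bootstrap=True, max_features="sqrt"), matching the description above.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print("feature importance ranking:", np.argsort(clf.feature_importances_)[::-1])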

Feature Selection

In feature engineering, we want to filter out invalid or even counterproductive features through effective feature selection algorithms and select the best-performing feature subset from the existing feature set. This not only improves the classification performance of the model but also reduces the feature dimension and thereby improves computation speed.

Feature selection is mainly based on filter, wrapper and embedded approaches. The filter approach sets criteria for the features: when the indicators computed from a feature do not meet the criteria, the feature is filtered out. The wrapper approach evaluates candidate feature subsets by training a model on them and using its classification performance as the selection criterion.

The embedded approach mainly trains a machine-learning model on the feature set to obtain a weight for each feature dimension and then performs feature selection according to these weights. This relies on setting a threshold, and as a hyperparameter the threshold is often difficult to determine.

Recursive feature elimination and random forest based feature selection

Among these approaches, the filter approach is computationally efficient but gives low classification accuracy; the embedded approach gives high classification accuracy but its threshold is difficult to set; the wrapper approach gives high classification accuracy but consumes more resources on larger datasets. Considering that the dataset in this paper is not large, the wrapper approach is chosen. Within the wrapper approach, feature selection based on recursive feature elimination and random forest is more thorough than forward or backward search and achieves higher classification accuracy, so this study uses an algorithm combining recursive feature elimination with random forest to rank feature importance and screen features. The importance score adopts the F1-weighted criterion, in which the weight of each class's F1-score is determined by the frequency of that class and the weighted F1-scores are then combined. The F1-score is computed as: F1-score = 2 × (TP/(TP+FP)) × (TP/(TP+FN)) / (TP/(TP+FP) + TP/(TP+FN))

Where TP refers to the number of actual positive samples that were classified correctly, FP refers to the number of actual negative samples that were classified as positive by the model, and FN refers to the number of actual positive samples that were misclassified as negative.
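A possible realization of this selection step, sketched with scikit-learn's RFECV wrapping a random forest and scored with the frequency-weighted F1 criterion, is shown below; the feature matrix, labels and hyperparameters are placeholders:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 40))        # placeholder fused feature matrix
y = rng.integers(0, 3, size=300)          # placeholder playing-technique labels

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    step=1,                               # drop one feature per elimination round
    cv=StratifiedKFold(n_splits=5),
    scoring="f1_weighted",                # frequency-weighted F1, as in the text
)
selector.fit(X, y)

print("selected feature count:", selector.n_features_)
print("feature ranking (1 = kept):", selector.ranking_)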

Spectrum quantization algorithm

Scalar quantization alone cannot achieve high-performance coding. When multiple source symbols are combined into a multidimensional vector and the vector is quantized as a whole, there are more degrees of freedom: for the same distortion, the number of quantization levels can be further reduced and the code rate further compressed.

Vector quantization is an efficient data compression and coding technique. Its basic idea is to form vectors from scalar data and divide the vector space into a number of small regions, each with a representative vector; during quantization, any vector falling into a region is replaced by that region's representative vector. In other words, each vector in the space is quantized to the representative vector of the region into which it falls.

Basics of vector quantization: a speech signal consists of many frames, and one frame of the speech signal can be treated as a vector. Suppose K vocal-tract parameters are extracted from a particular frame, X_i = {a_i1, a_i2, …, a_iK}; then X_i is a K-dimensional vector. N speech frames, each with K vocal-tract parameters, form N K-dimensional vectors. The basic principle of vector quantization is to map such a K-dimensional input vector X into another K-dimensional quantized vector. The set formed by the quantized vectors is called the codebook, and each vector in the codebook is called a code word or code vector: Y = {Y_1, Y_2, …, Y_N | Y_i ∈ R^K}. To quantize a vector X, a suitable distortion measure is first chosen; then, following the principle of minimum distortion, the distortion introduced by replacing X with each quantized vector Y_i is calculated in turn, and the quantized vector with the minimum distortion is taken as the reconstruction (or recovery) vector of X.
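The quantization mapping itself can be sketched as follows: each K-dimensional feature vector is replaced by the nearest code word under squared Euclidean distortion. The codebook here is a random placeholder; in the full system it would be produced by the LBG training procedure described below:

import numpy as np

def quantize(vectors, codebook):
    # Map each row of `vectors` to the index of its nearest code word.
    # Pairwise squared distances, shape (num_vectors, codebook_size).
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 12))       # N frames, K = 12 parameters per frame
codebook = rng.standard_normal((16, 12))  # placeholder codebook of 16 code words

idx = quantize(X, codebook)
reconstructed = codebook[idx]             # each vector replaced by its code word
print("mean distortion:", ((X - reconstructed) ** 2).sum(axis=1).mean())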

The criterion of vector quantization is to minimize the distortion caused by quantization for a given codebook size.

Design of vector quantization: in a speaker recognition system, the process is divided into training and recognition. In training, the codebook is generated by the LBG algorithm and continuously trained and optimized to obtain a well-designed codebook [21]. In recognition, feature vectors are first extracted from the unknown speech, and the resulting feature-vector sequence is then quantized with each codebook in turn. In codebook design, obtaining a well-designed codebook, that is, minimizing the statistical mean of the quantization error, follows the two principles below.

Nearest Neighbor Rule (NNR) [22]. This criterion should be followed when selecting the code word corresponding to X. Its mathematical expression is: d(X, Y_I) = min_i d(X, Y_i)

Center of Mass (centroid) Principle. Let S_I be the set of all input vectors X for which the code word Y_I is selected; Y_I should minimize the total error between the vectors in this set and Y_I. If the Euclidean distance is chosen as the distortion measure, then Y_I should be the center of mass of all vectors in S_I, with the expression: Y_I = (1/N) Σ_{X∈S_I} X, where N is the number of vectors contained in S_I.

When a signal is quantized, the quantization intervals are generally made fine in the value ranges where signal values occur frequently (high probability) and coarse where they occur rarely (low probability); in this way the average quantization distortion can be reduced. In the LBG algorithm, the design and selection of the initial codebook has a great influence on the design of the optimal codebook. Methods for generating the initial codebook include random selection, the splitting method and the chain mapping method. The random selection method is simple and requires no initialization, but it may select atypical vectors as code words, converges slowly during codebook training, and does not make full use of the code words in the trained codebook. The splitting method and the chain mapping method make up for these shortcomings. Therefore, the initial codebook in this experiment is designed with the splitting method; the specific steps for generating the codebook are as follows.

Step 1: set the codebook and iterative training parameters. Let the set of training vectors X of all input music signals be S, the codebook size be M, the maximum number of iterations be L, the number of quantization levels be set, the distortion-improvement threshold be δ, the initial iteration count be m = 1, the initial distortion be D(0) = ∞, and the splitting step be ε, generally ε ∈ [0.01, 0.05].

Step 2: all training sequences of the recorded music are first treated as a single class, and their center of mass is calculated as the code word of the initial codebook: Y_1^(0) = (1/N) Σ_{X_i∈S} X_i, where N is the number of training vectors in S.

Step 3: set the initial values Y_1^(0), Y_2^(0), Y_3^(0), …, Y_M^(0) for the M code words.

Step 4: divide S into M different subsets S_1^(m), S_2^(m), ⋯, S_M^(m) according to the nearest-neighbor rule.

That is, X ∈ S_l^(m) when: d(X, Y_l^(m-1)) ≤ d(X, Y_i^(m-1)) for all i ≠ l

Step 5: calculate the total distortion D^(m): D^(m) = Σ_{l=1}^{M} Σ_{X∈S_l^(m)} d(X, Y_l^(m-1))

Step 6: calculate the relative value δ^(m) of the distortion improvement ΔD^(m): δ^(m) = ΔD^(m)/D^(m) = |D^(m-1) − D^(m)| / D^(m)

Step 7: recalculate the code words of the new codebook Y_1^(m), Y_2^(m), Y_3^(m), …, Y_M^(m): Y_i^(m) = (1/N_i) Σ_{X∈S_i^(m)} X, where N_i is the number of vectors in S_i^(m).

Step 8: check whether δ^(m) < δ. If so, go to Step 10; otherwise, go to Step 9.

Step 9: check whether m < L. If so, set m = m + 1 and return to Step 4; otherwise, go to Step 10.

Step 10: the iteration ends; take Y_1^(m), Y_2^(m), Y_3^(m), …, Y_M^(m) as the code words of the trained codebook and output the total distortion D^(m).

The quality of the codebook is improved to some extent by using the splitting method to generate the codebook.
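A compact sketch of this splitting-based LBG training, following Steps 1-10 above, is given below; the parameter names mirror the text (epsilon for the splitting step, delta for the distortion-improvement threshold, L for the iteration cap), while the training vectors and target codebook size are placeholder assumptions:

import numpy as np

def lbg_split(S, target_size=16, epsilon=0.02, delta=1e-3, L=50):
    # target_size is assumed to be a power of two, as required by pure splitting.
    # Step 2: start from the centroid of the whole training set (one code word).
    codebook = S.mean(axis=0, keepdims=True)

    while codebook.shape[0] < target_size:
        # Splitting: perturb every code word up and down by the factor epsilon.
        codebook = np.vstack([codebook * (1 + epsilon), codebook * (1 - epsilon)])

        prev_D = np.inf
        for m in range(1, L + 1):
            # Step 4: nearest-neighbour partition of S among the current code words.
            d = ((S[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
            nearest = d.argmin(axis=1)

            # Step 5: total distortion under the current partition.
            D = d[np.arange(len(S)), nearest].sum()

            # Steps 6 and 8: stop refining when the relative improvement is small.
            if abs(prev_D - D) / D < delta:
                break
            prev_D = D

            # Step 7 (centroid principle): move each code word to the mean of its cell.
            for i in range(codebook.shape[0]):
                members = S[nearest == i]
                if len(members) > 0:
                    codebook[i] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
S = rng.standard_normal((2000, 12))       # placeholder training vectors
cb = lbg_split(S)
print("trained codebook shape:", cb.shape)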

Spectral data analysis of the yangqin concerto “Smoke Gesture”

The yangqin concerto “Smoke Gesture” was composed by the young composer Liu Chang in 2016 and is one of the representative new yangqin works of recent years. The motif of the piece originates from a folk song of Sangzhi in western Hunan province, and the composer adopts the Western compositional approach of “symphonic thinking”, giving the music conflict and contrast. The whole piece is divided into six parts: introduction, slow movement, middle movement, fast movement, climax and coda, with a clear structure and exquisite conception, and the moods advance continuously through the music.

In this paper, we take “Smoke Gesture” as the research object and, using the spectral data analysis model of yangqin performance constructed above, apply the spectral feature processing and recognition algorithm to identify the different playing techniques used in the work, namely striking, scraping and anti-bamboo playing, and use the spectrum quantization algorithm to analyze the spectral data.

Percussion

The spectrograms of the bass, middle and treble registers in the performance of the yangqin concerto “Smoke Gesture” are shown in Figure 1. As can be seen from the figure, the number of obvious peaks in the spectrogram of the bass register is around 20, while the numbers of peaks in the spectrograms of the middle and treble registers are around 15 and 10 respectively. The number of peaks thus decreases from the bass to the treble register, which is consistent with the characteristics of the yangqin as a pitched instrument. Moreover, the peaks in the bass register are relatively dispersed, while those in the middle and treble registers are relatively concentrated, and the fundamental-frequency energy gradually strengthens as the pitch rises.

Figure 1. Pitch
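For reference, the count of “obvious peaks” reported above can be obtained with a simple prominence-based peak search on the magnitude spectrum; the sketch below uses a synthetic harmonic tone in place of the actual recording, and the prominence threshold is an assumed, tunable value:

import numpy as np
from scipy.signal import find_peaks

def count_spectral_peaks(audio, sr, prominence_db=20.0):
    spectrum = np.abs(np.fft.rfft(audio * np.hanning(len(audio))))
    spectrum_db = 20 * np.log10(spectrum + 1e-10)
    # A peak counts as "obvious" if it stands out from its surroundings by the
    # chosen prominence (in dB); this threshold is a tunable assumption.
    peaks, _ = find_peaks(spectrum_db, prominence=prominence_db)
    return len(peaks), np.fft.rfftfreq(len(audio), 1.0 / sr)[peaks]

# Example with a synthetic tone stack standing in for a bass-register note.
sr = 44100
t = np.arange(sr) / sr
audio = sum(np.sin(2 * np.pi * 110 * k * t) / k for k in range(1, 21))
n_peaks, peak_freqs = count_spectral_peaks(audio, sr)
print("obvious peaks:", n_peaks)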

Scraping playing

When the yangqin concerto “Smoke Gesture” is played with the scraping method, the maximum amplitude difference of each register is shown in Figure 2. It can be seen that the energy of the scraped performance is concentrated between 394 and 2442 Hz. The maximum amplitude differences of the bass and alto registers are 12.85 and 14.06, while those of the tenor and treble registers reach 25.07 and 32.5; the maximum amplitude difference in the bass register is the smallest.

Figure 2. Maximum amplitude difference

Anti-Bamboo Play

This section analyzes the melodic pitch frequencies of the yangqin concerto “Smoke Gesture” when it is performed with the anti-bamboo method. A total of 24 groups of pitches are counted and labeled 1~24; the corresponding melodic pitch frequencies are shown in Figure 3. Panel (a) shows the objective reference frequency values, while panel (b) shows the melodic pitch frequencies of the concerto played with the anti-bamboo method. It can be seen that the energy of the performance is mainly concentrated in the range of 440-740 Hz, a span of 300 Hz. Compared with the reference values, the melodic pitch frequencies are generally slightly higher: on average each group of pitch frequencies is 6.65 Hz above the reference value, and the 19th group of treble notes exceeds the reference value by as much as 30.26 Hz. This is consistent with the brighter character of the tone produced by the anti-bamboo method.

Figure 3. Frequency
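The deviation figures quoted above amount to comparing measured melodic pitch frequencies with equal-temperament reference values (A4 = 440 Hz). The sketch below shows that calculation on hypothetical measured values, not the data behind Figure 3:

import numpy as np

def reference_freq(midi_note, a4=440.0):
    # Equal-temperament reference frequency for a MIDI note number.
    return a4 * 2.0 ** ((midi_note - 69) / 12.0)

# Hypothetical pitch groups: MIDI notes of the melody and the measured frequencies in Hz.
midi_notes = np.array([69, 71, 73, 74, 76])            # A4, B4, C#5, D5, E5
measured_hz = np.array([446.2, 497.1, 556.8, 590.3, 664.9])

reference_hz = reference_freq(midi_notes)
deviation_hz = measured_hz - reference_hz
print("per-note deviation (Hz):", np.round(deviation_hz, 2))
print("average deviation (Hz):", round(deviation_hz.mean(), 2))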

Evaluation of the artistic expression of the yangqin concerto “Smoke Gesture”

Music is an art of hearing. Objective evaluation of sound is based on analysis at the physical level, but the final evaluation of an instrument's sound quality must still take human subjective perception as an important basis. In this chapter, the performance of the yangqin concerto “Smoke Gesture” is evaluated from the perspective of artistic expression. Ten participants took part in this evaluation: 2 yangqin teachers from College of Arts A, 5 yangqin undergraduates from College of Arts A, and 3 yangqin artists.

The listening venue was the Recording Technology Laboratory of College A. The acoustic parameters of the venue were controlled as follows: reverberation time 0.3 s~0.65 s with deviation less than 25%, background noise less than 35 dB, ambient temperature 20°C~25°C, and ambient humidity 50%~70%, in line with the requirements of the “Chinese National Musical Instruments Sound Standard Library”.

In this paper, a multi-dimensional evaluation method is used to evaluate the artistic expression of the yangqin concerto “Smoke Gesture”. The dimensions of the evaluation scheme were compiled with reference to Baldeman's analysis of the constituent elements of music, proceeding from the idea of the whole. Four general dimensions are used: “the overall appearance of the performed work”, “the accuracy of the performance technique”, “the degree of mastery of certain elements” during the performance, and “the ability to interpret the performed work”. The sub-dimensions are: difficulty of the piece, accuracy of score reading, independence, dexterity and evenness of the fingers, key touch, use and relaxation of the arms, use of the pedals, mastery of tempo, mastery of rhythm, mastery of dynamics, mastery of timbre, mastery of musical breathing and syntax, mastery of the melodic line, harmony, completeness and fluency of the piece, clarity of the compositional structure, embodiment of the style of the piece, and expressiveness of the performance, totaling 17 sub-dimensions.

Overall dimensional analysis

The evaluation of the general dimensions of the artistic expression of the yangqin concerto “Smoke Gesture” is shown in Table 1, using a percentage (0-100) scoring method. Among the general dimensions, the highest average rating was 81.1 for “degree of mastery of certain elements” and the lowest was 72 for “accuracy of performance technique”. The average ratings for “overall appearance of the work” and “ability to interpret the work” were 74.9 and 80.3 respectively. The general dimension with the largest spread between the highest and lowest ratings was “accuracy of performance technique”, which showed a high degree of dispersion, with a minimum rating of 60 from Expert 1 and a maximum rating of 95 from Expert 3, a difference of 35.

Table 1. Evaluation of the general dimensions

Number | Overall appearance of the work | Accuracy of performance technique | Degree of mastery of certain elements | Ability to interpret the work
Expert 1 | 69 | 60 | 82 | 69
Expert 2 | 68 | 64 | 94 | 92
Expert 3 | 95 | 95 | 80 | 73
Expert 4 | 69 | 64 | 72 | 85
Expert 5 | 66 | 75 | 86 | 78
Expert 6 | 68 | 90 | 81 | 73
Expert 7 | 76 | 67 | 77 | 86
Expert 8 | 68 | 68 | 81 | 77
Expert 9 | 94 | 73 | 84 | 89
Expert 10 | 76 | 64 | 74 | 81
Average | 74.9 | 72 | 81.1 | 80.3
Sub-dimension analysis

The sub-dimensions were divided into 17 items, mainly covering the difficulty of the piece, accuracy of score reading, finger ability, key touch and so on, labeled D1~D17 as shown in Table 2.

Table 2. Sub-dimensions

Number | Sub-dimension
D1 | Difficulty of the piece
D2 | Accuracy of score reading
D3 | Finger ability (independence, dexterity and evenness)
D4 | Key touch
D5 | Use and relaxation of the arms
D6 | Use of the pedals
D7 | Mastery of tempo
D8 | Mastery of rhythm
D9 | Mastery of dynamics
D10 | Mastery of timbre
D11 | Mastery of musical breathing and syntax
D12 | Mastery of the melodic line
D13 | Harmony
D14 | Completeness and fluency of the piece
D15 | Clarity of the compositional structure
D16 | Embodiment of the style of the piece
D17 | Expressiveness of the performance

The ratings of the sub-dimensions of artistic expression are shown in Figure 4, using a five-point scale. It can be seen clearly from the figure that the highest-rated sub-dimensions are “difficulty of the piece” and “accuracy of score reading”, with average ratings of 4.4 and 4.6, both of which received ratings of 5 from experts. Among the 17 sub-dimensions, “finger ability” and “mastery of dynamics” are the only two with average ratings below 3, at 2.4 and 2.8 respectively. This suggests that in the artistic performance of the yangqin concerto “Smoke Gesture”, attention should be paid to improving finger playing ability and the control of playing dynamics.

Figure 4. Evaluation of the sub-dimensions

Conclusion

This study establishes a spectral data analysis model of yangqin performance, focusing on the processing and recognition of spectral features and the quantization of spectral data. The yangqin concerto “Smoke Gesture”, composed by the young composer Liu Chang, was chosen as the research object for analyzing the corresponding spectral data and evaluating the artistic expression.

The spectral data of the yangqin concerto “Smoke Gesture” are analyzed for the three playing methods of striking, scraping and anti-bamboo. With the conventional percussion (striking) method, the number of spectral peaks gradually decreases from the bass to the treble register, with about 20 peaks in the bass register and about 15 and 10 peaks in the middle and treble registers. With the scraping method, the energy of the performance is concentrated between 394 and 2442 Hz; the maximum amplitude differences of the tenor and treble registers are as high as 25.07 and 32.5, while those of the bass and alto registers are relatively low at 12.85 and 14.06. With the anti-bamboo method, the melodic pitch frequencies are generally slightly higher than the objective reference values, with the average frequency of each pitch group 6.65 Hz above the reference value.

A multidimensional evaluation method was used to evaluate the artistic expression of the yangqin concerto “Smoke Gesture”. In the evaluation of the general dimensions, the average ratings of the four general dimensions, “overall appearance of the performed work”, “accuracy of performance technique”, “degree of mastery of certain elements” and “ability to interpret the performed work”, were 74.9, 72, 81.1 and 80.3 respectively. Among the sub-dimensions, only “finger ability” and “mastery of dynamics” had average ratings below 3; all other sub-dimensions were rated above 3, with “difficulty of the piece” and “accuracy of score reading” the highest, at 4.4 and 4.6.
