Construction of Western Music Theory Teaching Model Based on Machine Learning
Published Online: Mar 19, 2025
Received: Oct 16, 2024
Accepted: Feb 10, 2025
DOI: https://doi.org/10.2478/amns-2025-0408
© 2025 Ruijie Liao, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
With economic globalization, the world has become increasingly interconnected, and Western music has gradually gained recognition and popularity in China. Many Chinese musicians have begun to study Western music theory, hoping to find reference points for the development of Chinese music and to promote Chinese music to the world [1–3]. Views differ on the theoretical basis of early Western music performance, and it is through these competing arguments that the theory of early Western music performance has gradually taken hold among musicians and continued to influence the development of world music [4–6].
Western music, like other types of music, is an important part of human life: it supports emotional expression and can stimulate creativity. With the development of technology, machine learning has been applied ever more widely in the field of music and has brought considerable innovation [7–9]. These applications include the analysis of music and the recognition of its emotional characteristics, as well as the automatic creation and performance of music [10–12]. Some problems and difficulties remain, such as the selection of training samples and parameters. However, as the technology continues to progress, its application in the field of music is expected to become more mature and widespread [13–16].
Chu, X. introduced a method for constructing an AI music teaching evaluation model based on deep learning, which can evaluate the quality of AI music teaching after network training; the experimental results indicated that the model’s prediction accuracy is quite high and that it has practical value [17]. Zhang, L. proposed the EFDfO framework, which combines data fusion, feature extraction, and optimization techniques to tailor personalized teaching strategies to students; experiments showed that the EFDfO model helps improve student learning [18]. Wang, D. et al. emphasized the wide application of machine learning algorithms and, by comparing multiple models for accurately evaluating a machine-learning-based music education information system, found that the GBDT model performed best [19]. Yuan, L. created an online music teaching model based on deep learning and neural network algorithms and proposed an improved orthogonal moment subpixel localization algorithm; the results showed that the model is effective and has practical value [20]. Liao, S. discussed the construction, methods, and application of music teaching resources based on recursive neural networks, and developed a mobile music teaching platform around the needs of users such as teachers and students and the characteristics of music teaching. Experiments showed that the system is stable and is of significance for reforming the music education model and developing music education [21]. Chen, Y. et al. introduced a music generation system based on the Transformer model, which combines an adaptive music feature encoder with a music-emotion-driven multi-task learning framework. The model achieves a large performance improvement on LMD and helps improve students’ musical skills and emotional expression, giving it potential applications in music education and automatic composition [22]. Zhou, W. built a deep-learning-based system for developing local music teaching materials, and used surveys and interviews to analyze the current state of TM teaching material development in each school and the factors influencing it. The results show that support and opposition each account for a certain proportion in the development of TMs, which indicates that a systematic discussion with the two main subjects of music teaching is needed before traditional music teaching is developed [23].
This study uses machine learning to mine and process data on Western music theory teaching and student performance, and constructs behavioral features for students from the results. A total of 594 music majors at one university are selected as research subjects. Their academic-year results, number of library books borrowed, campus one-card consumption records, and time spent on the campus network are mined and analyzed to explore the correlations among these data. Students are divided into specialist and undergraduate categories, and logistic regression models are used to estimate feature coefficients and make predictions, so as to identify the behavioral features most strongly associated with Western music theory learning. Finally, drawing on these results, a model for teaching Western music theory is constructed on the basis of blended learning theory.
Data mining is the process of extracting knowledge from large amounts of data, and it has an established, mature workflow. For the Western music theory teaching data, the main steps, shown in Figure 1, are data acquisition, data preprocessing, feature extraction, feature selection, data mining, and model evaluation.

Figure 1. Data mining process
Data cleaning. In the information age, data are generated continuously via the Internet, and data gathered from different sources may be dirty. Cleaning is therefore unavoidable if high-quality data are to be obtained [24]. Its purpose is to resolve anomalies in the raw data, such as inconsistencies, missing values, and outliers. Data cleaning is a necessary step in data mining analysis and the most important task in data preprocessing, with different cleaning tasks targeting different types of errors.
Data integration. To make data mining more effective and to draw on as many data sources as possible, data with different attributes, dimensions, and structures are consolidated from multiple sources and then stored and managed in a unified way. Machine learning is also driving the automation of data integration, which lowers its overall cost and improves experimental accuracy.
Data reduction. Cleaning and integration yield an integrated dataset of good quality. However, a large dataset still contains many redundant attributes that are irrelevant to the mining task. The main strategy of data reduction is to compress the original data effectively, for example by principal component analysis or singular value decomposition, or to extract attributes by feature extraction [25].
Data transformation. After preprocessing, different features may still differ in magnitude, and the gap between values can be large enough to distort the analysis if left untreated. For example, feature values across the managed datasets may range from a maximum of 10000 to a minimum of 0.0001. Attribute values therefore need to be normalized, and the most commonly used methods are as follows:
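Two normalizations widely used for this purpose are min-max scaling and z-score standardization. The sketch below illustrates both in plain Python; that these are the exact methods the study applied is an assumption, and the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to [0, 1]: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values: x' = (x - mean) / std, using the population std."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical feature spanning the range mentioned in the text.
raw = [0.0001, 5.0, 2500.0, 10000.0]
scaled = min_max_scale(raw)      # smallest value maps to 0, largest to 1
standardized = z_score(raw)      # zero mean, unit variance
```

Min-max scaling keeps the relative spacing of values but is sensitive to extreme values; z-score standardization is usually preferred when features have very different spreads.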
Feature selection is one of the most important steps in data mining. It is defined as the process of selecting, from the feature space, a subset of features that provides more comprehensive and relevant information for constructing the model. By using technical means to turn the original data into more meaningful features while preserving the characteristics of the data, it makes the classification and prediction model more effective on practical problems in the target domain and supports the accuracy of later predictions on unseen data.
Data mining is the process of extracting useful patterns and models from large datasets. It is the most critical step in information acquisition and is technically demanding. First, the tasks and objectives of the business problem must be made clear; second, suitable data mining algorithms must be chosen according to those objectives and the characteristics of the dataset. Algorithm selection is complex and iterative, and it is often necessary to combine several algorithms rather than rely on one.
The final step of data mining is to evaluate model performance, which determines whether the model can be used in practice. The model is checked against the tasks and objectives of the business problem to verify that it is suitable for the actual problem. If it cannot solve the problem effectively, the model must be re-tuned, or new data features extracted and the model rebuilt. Practice is the only criterion for testing truth.
A total of 594 music majors at one university were selected, and data on their historical Western music scores, one-card spending, and library card-swipe records were collected. Because the data came from different tables, their structures differed. The acquired data were also incomplete, with missing values and duplicate samples, so preprocessing was required.
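As an illustration of the preprocessing described here, the sketch below removes duplicate rows, fills a missing score with the mean of the observed scores, and joins two source tables on a student identifier. The field names, the sample rows, the mean-imputation rule, and the default of 0 for students absent from the library table are all assumptions made for the example, not details reported by the study.

```python
from statistics import mean

# Hypothetical rows from two source tables, keyed by student_id.
scores = [
    {"student_id": "s1", "score": 82.0},
    {"student_id": "s2", "score": None},   # missing value
    {"student_id": "s2", "score": None},   # duplicate sample
    {"student_id": "s3", "score": 58.0},
]
library = [
    {"student_id": "s1", "borrow_count": 4},
    {"student_id": "s3", "borrow_count": 0},
]


def drop_duplicates(rows, key):
    """Keep the first row seen for each value of `key`."""
    seen, out = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            out.append(dict(row))
    return out


def fill_missing_with_mean(rows, field):
    """Replace None in `field` with the mean of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def left_join(left, right, key, defaults):
    """Attach right-table fields to each left row; use defaults when absent."""
    index = {r[key]: r for r in right}
    return [{**defaults, **index.get(row[key], {}), **row} for row in left]


clean = fill_missing_with_mean(drop_duplicates(scores, "student_id"), "score")
merged = left_join(clean, library, "student_id", {"borrow_count": 0})
```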
Data preprocessing is usually divided into several parts, including data cleaning, data integration, data conversion, data normalization, and attribute creation. After processing the behavioral characteristics of students learning Western music, the behavioral characteristics are obtained as shown in Table 1.
Table 1. Characteristics of student behavior
| Feature | Meaning |
|---|---|
| X1 | Learning duration |
| X2 | Student music cognitive level |
| X3 | Music learning needs |
| X4 | Students’ interest in music learning |
| X5 | Student music learning skills |
| X6 | Pre-class music preview |
| X7 | Professional skill level |
| X8 | Enthusiasm for classroom teaching |
| X9 | Student interaction |
| X10 | Musical learning skills |
| X11 | Product quality evaluation |
| X12 | Enthusiasm for music learning |
| X13 | Self-learning music |
Figure 2 relates the number of times students borrowed books to their theory grades in the course, and shows that the two are related. The largest group is students who never borrowed a book: 323 students, or 54.38% of the total. They are followed by students who borrowed books 4 and 5 times, with 57 and 20 students respectively. The number of students who borrowed books more than 9 and 10 times is on the high side, while only one student borrowed books more than 15 times. This may be because the school’s loan period is six months with a maximum of four books at a time, which limits how many times a student can borrow.

Figure 2. Number of books borrowed and average course grade
In addition, students with course averages below 60 had essentially no checkouts, indicating that they rarely went to the library to borrow books and study. In contrast, students with many checkouts generally averaged above 60, visited the library often, and spent more time studying. A correlation analysis between the number of books borrowed and the average course theory grade gives a Pearson correlation coefficient of 0.435 with a significance-test p-value of 1.74e-43, indicating a significant positive correlation between the two.
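The Pearson coefficient used in these analyses can be computed directly from paired observations. The sketch below is a minimal plain-Python version with hypothetical sample data; it also returns the t statistic on which the usual significance test is based. The p-value itself comes from Student’s t distribution with n - 2 degrees of freedom and is not computed here; in practice a statistics library such as SciPy (`scipy.stats.pearsonr`) returns both the coefficient and the p-value.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: no correlation; t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical data: times books were borrowed vs. average course grade.
borrow_counts = [0, 0, 1, 2, 4, 5, 9]
grades = [55, 62, 60, 70, 74, 72, 88]

r = pearson_r(borrow_counts, grades)
t = t_statistic(r, len(grades))
```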
A correlation analysis was conducted between students’ monthly and total consumption and their average course grades over the semester; the results are shown in Table 2. The absolute values of the Pearson correlation coefficients are all below 0.1, and the p-values of the significance test are all greater than 0.05. Because students spend in many forms and through many channels, one-card consumption data alone cannot establish a relationship with the average course grade, and the amount of student consumption can therefore be regarded as essentially uncorrelated with the average course grade.
Table 2. Correlation between consumption amount and average course grade
| Consumption amount | Pearson correlation coefficient | P value |
|---|---|---|
| September consumption | -0.057 | 0.095 |
| October consumption | -0.025 | 0.428 |
| November consumption | -0.019 | 0.535 |
| December consumption | -0.009 | 0.779 |
| January consumption | 0.035 | 0.299 |
| Total consumption | -0.027 | 0.407 |
A correlation analysis was carried out between the number of times students rose early each month and their average course grades; the results are shown in Table 3. The Pearson correlation coefficients of the monthly and total numbers of early rises with the average course grade are all greater than 0.5, and the p-values of the significance test are far below 0.05; the total number of early rises is the most significant, with a p-value of 3.27e-73. The number of early rises reflects students’ learning attitude: students who rise early have better study habits and more study time each day. It can therefore be concluded that the number of early rises is significantly correlated with students’ average course grade.
Table 3. Correlation between number of early rises and average course grade
| Early rising | Pearson correlation coefficient | P value |
|---|---|---|
| Early rises in September | 0.522 | 8.97e-54 |
| Early rises in October | 0.557 | 7.34e-54 |
| Early rises in November | 0.548 | 1.09e-58 |
| Early rises in December | 0.547 | 1.65e-58 |
| Early rises in January | 0.587 | 1.11e-67 |
| Total early rises | 0.601 | 3.27e-73 |
A correlation analysis was carried out between students’ monthly Internet hours and their average course grades; the results are shown in Table 4. The Pearson correlation coefficients of monthly and total Internet hours with the average course grade are all below -0.1, and those for January and for the total are below -0.2. The p-values listed are below 0.05 for every row except September (1.48e-01). Because Internet hours are negatively related to study time, so that more time online leaves less time for study each day, it can be concluded that students’ online hours are significantly negatively correlated with their average course grade.
Table 4. Correlation between Internet duration and average course grade
| Internet duration | Pearson correlation coefficient | P value |
|---|---|---|
| September Internet duration | -0.153 | 1.48e-01 |
| October Internet duration | -0.195 | 3.85e-04 |
| November Internet duration | -0.167 | 2.33e-03 |
| December Internet duration | -0.124 | 0.00045 |
| January Internet duration | -0.218 | 8.29e-09 |
| Total Internet duration | -0.237 | 2.41e-10 |
A correlation analysis was carried out between students’ monthly Internet traffic and their average course grades; the results are shown in Table 5. The Pearson correlation coefficients of monthly and total Internet traffic with the average course grade are all below -0.1, and the coefficient for January traffic is below -0.3. The p-values of the significance test are all below 0.05. Because heavier Internet traffic implies that a larger share of time is spent on entertainment, it can be concluded that students’ Internet traffic is significantly negatively correlated with their average course grade.
Table 5. Correlation between Internet traffic and average course grade
| Internet traffic | Pearson correlation coefficient | P value |
|---|---|---|
| September traffic | -0.102 | 0.002 |
| October traffic | -0.169 | 1.65e-04 |
| November traffic | -0.185 | 1.75e-05 |
| December traffic | -0.171 | 1.79e-02 |
| January traffic | -0.319 | 9.27e-16 |
| Total traffic | -0.235 | 2.45e-09 |
The logistic regression model is derived from linear regression, in which the given data points are fitted with a single straight line [26].
In linear regression, each feature of a training sample is multiplied by its corresponding parameter and the products are summed; the model form is shown in Equation (3).
The vector form is given in Equation (4).
In logistic regression, the sum obtained from the linear model is passed through the sigmoid function to map it to a value in (0,1). When this value is greater than the set threshold, the sample is judged to belong to the positive class; otherwise it is judged to belong to the negative class.
Assuming that there are currently
The likelihood function over the samples is the product of the posterior probabilities of the individual samples, as shown in Equation (6).
The log-likelihood function is shown in Equation (7).
Solve for it and derive for
When the derivative is 0, it is not possible to derive
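For reference, the standard textbook form of the derivation described in this section is sketched below. The symbols (θ for the parameters, x for the feature vector with x_0 = 1, y for the class label, m for the number of samples, α for the step size) are assumptions, since the notation of Equations (3) to (7) is not shown in this text.

```latex
% Linear combination of features and parameters, scalar and vector form
z = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^{T} x

% Sigmoid function mapping the sum into (0, 1)
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}

% Posterior probability of one sample with label y \in \{0, 1\}
P(y \mid x; \theta) = h_\theta(x)^{y} \, \bigl(1 - h_\theta(x)\bigr)^{1 - y}

% Likelihood over m independent samples
L(\theta) = \prod_{i=1}^{m} h_\theta\bigl(x^{(i)}\bigr)^{y^{(i)}}
            \bigl(1 - h_\theta\bigl(x^{(i)}\bigr)\bigr)^{1 - y^{(i)}}

% Log-likelihood
\ell(\theta) = \sum_{i=1}^{m} \Bigl[ y^{(i)} \log h_\theta\bigl(x^{(i)}\bigr)
             + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta\bigl(x^{(i)}\bigr)\bigr) \Bigr]

% Partial derivative with respect to \theta_j
\frac{\partial \ell(\theta)}{\partial \theta_j}
  = \sum_{i=1}^{m} \bigl( y^{(i)} - h_\theta\bigl(x^{(i)}\bigr) \bigr) x_j^{(i)}

% Setting the derivative to zero has no closed-form solution,
% so \theta is found iteratively, for example by gradient ascent
\theta_j := \theta_j + \alpha \sum_{i=1}^{m}
  \bigl( y^{(i)} - h_\theta\bigl(x^{(i)}\bigr) \bigr) x_j^{(i)}
```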
While the previous chapters introduced the concepts of learner behavioral characteristic attributes, predictive modeling principles and methods, and model evaluation, the focus of this chapter is on the experimental steps and analysis of the results. The learners were categorized into two groups: the specialist category and the undergraduate category. Each category of learners has the behavioral characteristics attributes mentioned above, and the behavioral data of each category of learners for each week was modeled and analyzed using logistic regression models.
Using the Logistic regression model, the experimental steps are as follows:
1. Using 10-fold cross-validation, divide the dataset evenly into 10 parts; each time, use 9 parts as the training set and the remaining part as the test set for validation.
2. Model all the training datasets, applying the logistic regression model to each training set in turn until all training is complete.
3. Apply the fitted model to the test dataset, then apply the decision rule to obtain the ROC curve and calculate the AUC value.
4. Based on the ROC curves and AUC values obtained from the test datasets in step 3, compute the mean value and evaluate the model performance.
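The cross-validation procedure above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the study’s implementation: the data are synthetic, the model is fitted by batch gradient ascent with hypothetical settings (`lr`, `epochs`), AUC is computed by pairwise ranking rather than from an explicit ROC curve, and folds are dealt out class by class so that every test fold contains both classes.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, xi):
    """Probability of the positive class; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient ascent on the mean log-likelihood."""
    m, n = len(X), len(X[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / m for wj, gj in zip(w, grad)]
    return w


def auc(y_true, scores):
    """AUC as the probability that a positive sample outranks a negative one."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the test fold")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_auc(X, y, k=10, seed=0):
    """Mean test AUC over k folds (steps 1 to 4 above)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    idx.sort(key=lambda i: y[i])            # stable sort groups indices by class
    folds = [idx[f::k] for f in range(k)]   # round-robin deal keeps classes mixed
    aucs = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores = [predict_proba(w, X[i]) for i in test_idx]
        aucs.append(auc([y[i] for i in test_idx], scores))
    return sum(aucs) / k


# Synthetic data: one informative feature, one noise feature.
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]
mean_auc = k_fold_auc(X, y, k=10)
```

Dealing class-sorted indices round-robin guarantees both classes in each fold as long as each class has at least k members, which avoids an undefined AUC on a single-class fold.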
Logistic regression models were fitted to the different behavioral activity data of Western music learners to obtain ROC curves and AUC values. The results are shown in Figures 3 and 4. In each grayscale plot, the horizontal axis gives the number of weeks of learner behavioral data used to build the model, and the vertical axis gives the week for which the model makes its prediction.

Figure 3. Modeling prediction results for specialist learners

Figure 4. Modeling prediction results for undergraduate learners
The AUC values obtained in each experiment show that the predictive accuracy of logistic regression on learner behavioral data meets expectations. For example, in Figure 3, when learners in the specialist category are modeled with 9 weeks of behavioral data and the 10th week is predicted, the prediction accuracy is as high as 0.91, which can be considered very high for predictive modeling. Even for undergraduate learners, for whom there is relatively little data, using 9 weeks of behavioral data to predict the 10th week gives a prediction accuracy of up to 0.85. As the number of weeks of behavioral data used increases, prediction accuracy tends to rise; for a fixed number of weeks of behavioral data, prediction accuracy tends to fall the further ahead the predicted week lies.
Another use of logistic regression is to assess feature importance. This paper extracts 13 behavioral features that affect whether learners discontinue their music learning partway, but each feature influences learners to a different degree. Logistic regression can assess the correlation of each feature with learners’ discontinuation of learning, that is, the feature weights.
In order to obtain a better prediction model, the weights of each feature need to be optimized. The experiments above considered only minimizing the cost function on the training data to improve prediction accuracy, and not the generalization performance of the model, that is, whether it can be applied to predicting other behavioral data. A regularization term is therefore introduced into the original model to optimize the feature weights, so that the influence of different features on the churn rate can be analyzed more accurately. Each group of experiments then generates a set of coefficients, i.e., weight values, one per feature. Figures 5 and 6 show the average weight values of the 13 features for the two types of learners, respectively.
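A minimal sketch of this idea is given below, assuming an L2 (ridge) penalty; the text does not state which regularization term was used, and the penalty strength `lam`, the learning rate, and the toy data are hypothetical. The helper `average_weights` shows how the coefficients from several experiments could be averaged feature by feature.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_l2(X, y, lam=1.0, lr=0.1, epochs=500):
    """Gradient descent on the L2-regularized cost
    J(w) = -(1/m) * log-likelihood + (lam / (2m)) * sum_j w_j^2,
    where the bias w[0] is not penalized."""
    m, n = len(X), len(X[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi
            grad[0] += err / m
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj / m
        for j in range(1, n + 1):
            grad[j] += lam * w[j] / m       # shrinkage from the penalty term
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def average_weights(weight_sets):
    """Average each coefficient across several fitted models."""
    count = len(weight_sets)
    width = len(weight_sets[0])
    return [sum(ws[j] for ws in weight_sets) / count for j in range(width)]


# Toy one-feature data with overlapping classes.
X = [[-2.0], [-1.0], [-0.5], [0.5], [-0.2], [0.3], [1.0], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w_plain = fit_logistic_l2(X, y, lam=0.0)   # no penalty
w_reg = fit_logistic_l2(X, y, lam=5.0)     # penalty shrinks the feature weight
```

Because the penalty pulls weights toward zero, weights fitted with a penalty are only comparable across features when the features have first been put on a common scale.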

Figure 5. Feature importance for specialist learners

Figure 6. Feature importance for undergraduate learners
Figure 5 shows the average weights of the 13 behavioral characteristics of learners in the specialist category, from which it can be seen that the characteristics of learners in the specialist category with an average weight of 0.3 or more are
Figure 6 shows the average weights of the 13 characteristics in the undergraduate category learners, from which it can be seen that the characteristics with an average weight of 0.3 or more in the undergraduate category learners are
The smart classroom teaching model based on blended learning theory is an organic combination of teachers, students, and smart technology (hardware equipment and software systems). In terms of specific course practice, a smart classroom must meet two basic conditions: a smart teaching method and a smart system technology. The year 2017 is known as the first year of the industrialization of artificial intelligence, and smart technology lays the cornerstone for the development of smart education. With the support of artificial intelligence, emerging rich media technology, and intelligent sensing technology, the smart classroom gives learners the opportunity to create a smart environment that supports smart learning. The extended dimensions of the smart classroom are shown in Figure 7.

Figure 7. Extended dimensions of the smart classroom
The cultivation goals of smart teaching include training students’ software operation ability, case analysis ability, and graphic design expression ability, as well as cultivating innovation awareness and planning thinking. Teaching practice in smart classrooms is an important way to cultivate higher-order thinking and innovative, entrepreneurial talent. Smart teaching makes full use of online Internet platforms and the statistical data of the smart system to establish a smart space environment. It uses technologies such as big data, cloud computing, and artificial intelligence to form personalized and ubiquitous teaching activities, and uses multiple evaluations, data-based evaluations, and smart scoring systems to attend to students’ development, and its teaching effect should be reflected in the emergence of smart talent empowered by information technology. Based on the above requirements for the effect of smart teaching, a teaching model for a computer graphic design course is constructed on the basis of blended learning theory, as shown in Figure 8.

Figure 8. Teaching model design
In the pre-class session, teachers formulate thematic study plans and learning contents, make short videos of course previews, and push learning materials to students’ smart mobile terminals. At the same time, the cloud data system understands the students’ pre-study situation and analyzes the key points and difficulties of teaching, accordingly fine-tuning the learning content and learning strategies, and formulating the content of classroom discussion. Students download the materials and pre-study the course content before class and solve the learning difficulties independently through the Internet and offline platforms, and spontaneously form learning groups and set personalized learning goals while transmitting the pre-study results to the cloud-based intelligent platform.
The in-class session is divided into two parts: blended teaching and intelligent teaching. In blended teaching, the teacher divides the class into learning groups of four students and balances the overall learning ability across groups. Specific tasks are formulated according to the research objectives discussed by the students before class, the teaching plan in the teacher’s syllabus, and the intelligent assignments of the learning system. During the interaction, teachers and the intelligent tutor system work together to formulate the learning content, adjust the learning strategies, and review and grade students’ work. At the end of each lesson, teachers and students together summarize and give feedback on the whole lesson.
At the end of each lesson, teachers and students summarize the knowledge map of the lesson and select outstanding works for online display and offline communication, so as to prepare for participating in disciplinary competitions and innovation and entrepreneurship projects. Finally, the teacher will reflect on and summarize the entire teaching process.
The construction of the model is a new exploration in using blended learning theory to break away from the traditional one-way, face-to-face classroom and to establish a hybrid teaching strategy based on Internet-based online learning and the smart classroom. As Prof. He Kexiang put it, the aim is not only to give play to the leading role of teachers in guiding, inspiring, and monitoring the teaching process, but also to fully realize the initiative, enthusiasm, and creativity of students as the main body of the learning process.
This study uses machine learning algorithms to mine learners’ Western music teaching data and applies logistic regression to predict students’ Western music theory learning performance from their learning behavioral characteristics. On this basis, a Western music theory teaching model is constructed. The main findings are summarized as follows. The Pearson correlation coefficient and the significance p-value were used to measure the degree of correlation between each indicator and achievement. For the number of books borrowed and the average course theory grade, the Pearson correlation coefficient is 0.435 and the significance-test p-value is 1.74e-43, indicating a significant correlation between the two. In addition, the frequency of early rising, the duration of Internet access, and the volume of Internet traffic are significantly correlated with students’ average course grade. The learners were divided into two categories, specialist and undergraduate, and an analysis of influencing factors based on learners’ behavioral activity characteristics identified 13 behavioral attributes associated with learners’ Western music learning activities. Logistic regression was used to model and predict both types of learners. The results showed that the characteristics common to specialist and undergraduate learners were professional skill level, music learning skills, and independent music learning ability, indicating that these three behaviors have a strong influence on learners’ Western music learning ability. In summary, the Western music theory teaching model was constructed by combining the predictions of students’ performance with the influential characteristics.
