A Study of the Evolution of Compositional Techniques Applying Time Series Analysis
Published online: 21 March 2025
Received: 4 November 2024
Accepted: 7 February 2025
DOI: https://doi.org/10.2478/amns-2025-0604
© 2025 Han Li, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Time series analysis is a statistical method for studying how data change over time. A time series usually consists of three main components: trend, seasonality, and randomness. The trend is the long-term direction of the series; seasonality refers to cyclical changes that repeat over shorter periods [1-4]; randomness refers to unexplained random fluctuations in the series. Time series analysis examines a series of data points arranged in chronological order in order to reveal trend, seasonal, cyclical, and random characteristics in the data, and is widely used in fields such as economics, finance, meteorology, and engineering [5-8].
Composition is a very ancient and immensely subtle art that has had a profound influence throughout history. Traditional compositional techniques are those methods of composition that have been handed down through history [9-11]. Compositional technique covers a wide range of elements, such as melody, harmony, tonality, rhythm, meter, texture, orchestration, and polyphony; indeed, every element in music is a technique that composers may draw on when composing [12-15]. These methods embody the wisdom of artists, and their substance can be observed across different cultures. In contemporary music, the study of how song composition techniques are refined and innovated is a task that cannot be ignored. With the continuous development of music technology and the gradual expansion of the music market, compositional technique has become an important part of songwriting [16-19].
This article first studies and explains the establishment of the twelve-tone system. Second, it analyzes the evolution of Schoenberg's twelve-tone system and the inevitability of twelve-tone composition techniques. Two LSTM neural network models are then constructed: a 1-layer LSTM network structure and a 2-layer LSTM network structure. During model training and tuning, the model parameters epoch, time_step, units, and batch_size are determined. The article further elaborates the principle and application of the linear-prediction-based ARIMA model. Finally, music playback data for different compositional techniques are used to experimentally test and verify the effectiveness of the two time series prediction models proposed in this article.
After a period of exploration and accumulation, Schoenberg established the twelve-tone technique in 1921. The technique draws on and modifies the chromaticism of the twelve tones of equal temperament. Composition begins by designing an initial sequence, a prototype twelve-tone row. The prototype is then transformed by mirroring, yielding forms such as inversion and retrograde. Schoenberg called his own twelve-tone row the 'basic set' of musical material, and only one basic set may be used in a work; when used, the tones should appear in the order given by the prototype. The principles of this system cannot be separated from Schoenberg's long study of the twelve tones. A work very close to the twelve-tone structure is Jacob's Ladder, in which Schoenberg chose two six-note sets to present the full chromatic scale, one for vertical harmonization and the other for fixed patterns. This format was Schoenberg's attempt at a system very close to a twelve-tone structure, but it still fell short of eliminating tonal centers. Over time, Schoenberg arrived at the structural principle of the twelve-tone set: repeat each note as late as possible. His final steps toward the twelve-tone row were the Serenade Op. 24 and the Five Piano Pieces Op. 23, and his twelve-tone era did not truly arrive until the release of the Suite for Piano Op. 25. Consisting of six pieces, the Suite for Piano Op. 25 is based on the basic structures of various 18th-century dance forms. Schoenberg ultimately established the twelve-tone technique using four operations on the row: retrograde, transposition, retrograde inversion, and inversion. In the first piece of the Suite for Piano Op. 25 (Example 1), two sets in a secondary relationship precede the row prototype (P), followed by a transposition of three whole tones. From the prototype (P), Schoenberg derived the eight basic row forms through inversion (I), retrograde (R), and retrograde inversion (RI) operations and the three-whole-tone transposition. His compositions incorporate the dynamics, timbre, and rhythm of traditional music while strictly adhering to the established twelve-tone row, resulting in a deeper emotional expression of the melody.
The influence of late Romanticism on Schoenberg
Traditional tonality gradually reached its limits at the end of the 19th century; tonal thinking and the traditional tonal system could no longer meet composers' needs, and Arnold Schoenberg was among those composers. In his early works Schoenberg continued to write in the musical language of late Romanticism, and at its premiere in 1905 the symphonic poem Pelléas et Mélisande was not accepted by the public; six years later, however, with every part of the work colorfully retouched, audiences sought the piece out. The chords were veiled, as were the functional effects of triadic progressions, yet the work was not yet completely atonal and the relationships between tones remained close. The Chamber Symphony No. 1 of the same period is likewise the product of combined chromatic parts. Furthermore, Schoenberg replaced the traditional superimposition of thirds in chord structures with chords built on stacked fourths, so that the tonal character of the work is not evident, allowing a mixture of traditional tonality and set thinking, with a hidden timbre developing in the pitch relationships between chords. Both works use exaggeration to intensify the expressive intent of the music, and both continue to use a tonal musical language.
The Transition to Atonal Musical Thinking
By the end of the 19th century, Schoenberg had pushed various chromatic techniques to the pinnacle of creative practice. Around 1909 he began to shift toward atonal composition, gradually moving to a creative mode of thinking that negated tonality. So-called atonal music frees each pitch of a work to stand independently, breaking the tonality-centered musical structure. At the time, this non-traditional musical concept had a huge impact. Schoenberg pursued atonal music through a variety of musical means during this period; however, he was unwilling to accept the term 'atonal' and refused to recognize his music as such. The basis of Schoenberg's approach in this period is the development of music from a limited number of pitches: pitch intervals are taken as elements, then broken down and expanded, allowing more ways of presenting the music. In the song cycle The Book of the Hanging Gardens, Op. 15, the basic tone-pattern theme is a set of four tones, with major and minor thirds as the pitch relationships. The prototype changes in descending major seconds, the melody varies according to this four-tone set, and the music develops accordingly. The sixth piece of the cycle has a core group of three chords, with major and minor seconds as the pitch relations. This pitch-relation group runs through the whole piece; under such a pitch framework, the horizontal development of the melodic texture is accomplished and extended through the technique of variation. The vertical accompaniment expresses the major- and minor-second pitch relations in different ways, with the pitch-relation group hidden in the melodic texture. Schoenberg's important compositional technique of this period is the use of fixed intervals, expressed throughout the work; the interval relationships of the pitch groups run through the whole piece, thus replacing the central role that tonality had played, in this era of atonal music.
Fully connected neural network
A fully connected neural network is an artificial neural network model with a simple structure that is easy to build. It consists of an input layer, an output layer, and several hidden layers. Each layer contains several neurons; neurons in adjacent layers are all connected to each other, each connection carries a weight, and neurons within the same layer are not connected. During training, the output of each hidden layer is used as the input of the next layer, up to the output layer.
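As a minimal illustration, the following Keras sketch builds such a fully connected network; the layer sizes (8-16-16-1) are arbitrary placeholders, not values used elsewhere in this paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(8,)),              # input layer: 8 features
    layers.Dense(16, activation="relu"),  # hidden layer: all-to-all connections
    layers.Dense(16, activation="relu"),  # second hidden layer
    layers.Dense(1),                      # output layer: 1 neuron
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```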
RNN Neural Network
An RNN is a neural network model with a memory function. Unlike fully connected neural networks, an RNN introduces temporal information into the network, making it capable of handling time series data. In a traditional neural network, computation proceeds from the input layer to the hidden layers and then to the output layer; adjacent layers are fully connected while nodes within a layer are unconnected, meaning that the elements are treated as independent of one another, as are the inputs and outputs.
LSTM Neural Network
LSTM is a special kind of recurrent neural network that mainly addresses the gradient-vanishing and gradient-explosion problems that easily arise in traditional RNNs when training on long sequences, enabling the model to learn long-sequence data better.
LSTM network structure
A traditional recurrent neural network has only one state s in its recurrent layer, which is very sensitive to short-term inputs but unable to memorize long-term inputs. LSTM therefore adds a memory cell to preserve the long-term state.
The network structure of LSTM consists of an input layer, a hidden layer, and an output layer. LSTM improves the ability to learn longer sequences thanks to its modified special structure; the key difference lies in the hidden layer. In the original RNN there is only one computational module in the hidden layer, while LSTM adds a state module to it, called the cell state, which characterizes the process of change within the neurons, i.e., the process of change in the hidden layer. Together, the computational module and the cell state constitute the memory module of LSTM.
Computation of LSTM network
LSTM is a neural network model widely used in sequential data processing and consists of multiple LSTM cells, each of which contains a core cell state running through the LSTM's chained structure. The LSTM can add or remove information to or from the cell state by means of gate structures. A gate is a mechanism that selectively allows information to pass, consisting of a sigmoid neural network layer and a pointwise multiplication operation. The sigmoid layer outputs a number between 0 and 1, controlling the extent to which information in each component is allowed to pass: an output of 0 means that no information passes, while an output of 1 means all information passes. The LSTM has three gates, the forget gate, the input gate, and the output gate, which protect and control the cell state.
The forget gate controls which information is discarded from the memory cell. Its calculation is shown in equation (1):

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}$$

where $\sigma$ is the sigmoid activation function, $W_f$ and $b_f$ are the weight matrix and bias of the forget gate, $h_{t-1}$ is the output at the previous moment, and $x_t$ is the input at the current moment.

The input gate decides which information is allowed to enter the memory cell, as shown in equation (2):

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2}$$

where $W_i$ and $b_i$ are the weight matrix and bias of the input gate.

The output gate is used to control the output degree of the current information. The calculation of the output gate is shown in equation (3):

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{3}$$

The candidate state of the memory cell at the current moment is calculated as in equation (4):

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{4}$$

Here $\tilde{C}_t$ denotes the cell state formed from the input at the current moment, while the long-term memory $C_t$ is obtained by combining it with the previous cell state $C_{t-1}$ under the control of the forget and input gates, as in equation (5):

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{5}$$

where $\odot$ denotes element-wise multiplication. The current moment output $h_t$ is then given by equation (6):

$$h_t = o_t \odot \tanh(C_t) \tag{6}$$
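To make equations (1)-(6) concrete, the following NumPy sketch steps one LSTM cell through a single time step. The weight shapes and toy dimensions are illustrative assumptions, not parameters from this paper's models.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step implementing equations (1)-(6).

    W and b hold the weights/biases of the forget ("f"), input ("i"),
    output ("o") and candidate-cell ("c") transforms, each acting on
    the concatenation [h_{t-1}, x_t].
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])         # (1) forget gate
    i = sigmoid(W["i"] @ z + b["i"])         # (2) input gate
    o = sigmoid(W["o"] @ z + b["o"])         # (3) output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # (4) candidate cell state
    c = f * c_prev + i * c_tilde             # (5) cell-state update
    h = o * np.tanh(c)                       # (6) hidden-state output
    return h, c

# Toy usage with random weights (hidden size 4, input size 3).
rng = np.random.default_rng(0)
H, D = 4, 3
W = {k: rng.normal(size=(H, H + D)) * 0.1 for k in "fioc"}
b = {k: np.zeros(H) for k in "fioc"}
h, c = lstm_cell(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)
```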
LSTM network structure design
The network structure of the LSTM model generally includes an input layer, an LSTM layer, a fully connected layer, and an output layer, where important parameters include the number of neurons in the input layer, the number of LSTM layers, the number of neurons in the LSTM layer, the number of neurons in the fully connected layer, and the number of unit nodes in the output layer [20]. In this paper, two network structures are designed which differ in the number of LSTM layers. For the LSTM model network structure built for time series, the LSTM network structure is shown in Figure 1.

LSTM network structure
LSTM model modeling process
In this paper, the LSTM modeling process is designed based on the training processes of traditional machine learning models and the LSTM model. The LSTM modeling process is shown in Figure 2.

LSTM modeling process
The modeling process includes data preprocessing, training the LSTM model, and determining the best LSTM model. In the data preprocessing stage, in order to eliminate differences in scale between the data, the original data are normalized to the [0, 1] interval using max-min global normalization, calculated as shown in equation (7):

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{7}$$

where $x$ is an original value, $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the series, and $x'$ is the normalized value.
The data were then divided into training and test sets in a consistent 8:2 ratio. Finally, multiple models are trained separately, and the optimal value of each parameter is selected according to the MAPE values to determine the hyperparameters of the LSTM model.
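A minimal sketch of this preprocessing, assuming a one-dimensional playback series; the synthetic input below stands in for the real dataset.

```python
import numpy as np

def minmax_normalize(x):
    """Equation (7): map the series into the [0, 1] interval."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def chronological_split(x, ratio=0.8):
    """8:2 split that preserves time order (no shuffling)."""
    cut = int(len(x) * ratio)
    return x[:cut], x[cut:]

# Synthetic playback series standing in for the real data.
series = minmax_normalize(6000 + np.cumsum(np.random.normal(0, 50, 180)))
train, test = chronological_split(series)
```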
In addition to hyperparameters such as the network structure, the training parameters are important factors affecting the final prediction accuracy, so the optimal training parameter values are determined by conducting training-parameter experiments and analyzing the relationship between the loss function and each parameter during training of the LSTM model [21]. The training parameters include the number of training rounds (epoch) and the number of samples fed into the LSTM model per computation (batch_size). Too large an epoch value causes the model to overfit the training data and lengthens training time, while too small an epoch results in underfitting. Therefore, instead of fixing the size of epoch directly, epoch is set to a range of values, between 1 and 100, which effectively avoids both extremes.
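One common way to realize such an epoch budget of 1 to 100 without fixing the value in advance is early stopping on the validation loss. The self-contained Keras sketch below illustrates this interpretation; the toy data, model, and batch_size are placeholders, not the authors' exact procedure.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping

# Toy data standing in for the windowed playback series (hypothetical).
X = np.random.rand(200, 3, 3)   # 200 samples, time_step = 3, 3 features
y = np.random.rand(200, 3)

model = keras.Sequential([
    keras.Input(shape=(3, 3)),
    layers.LSTM(16),
    layers.Dense(3),
])
model.compile(optimizer="adam", loss="mse")

# Cap training at 100 epochs and stop once validation loss stalls -- one way
# to realize "epoch between 1 and 100" without fixing the value in advance.
early = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=100, batch_size=32,
          callbacks=[early], verbose=0)
```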
A time series is a collection of measurements of a particular phenomenon arranged in chronological order. Each measurement in the series is affected by a variety of different factors, which together give the series an inherent pattern. Time series forecasting studies a series to infer possible future changes, trends, and regularities, and models and forecasts the future evolution of the series accordingly.
The ARMA model is an autoregressive moving average model. For a stationary series $\{x_t\}$, the ARMA(p, q) model takes the form

$$x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + \cdots + \varphi_p x_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q}$$

where $\varphi_i$ are the autoregressive coefficients, $\theta_j$ the moving-average coefficients, and $\varepsilon_t$ a white-noise error term. When $q = 0$ the model reduces to an AR(p) model, and when $p = 0$ it reduces to an MA(q) model.

The ARMA model can only deal with stationary data, which requires that the mean, variance, and autocovariance of $\{x_t\}$ do not change over time. A non-stationary series must first be differenced $d$ times to become stationary, which leads to the ARIMA(p, d, q) model.
The method of time series forecasting using ARIMA proceeds in the following steps:

STEP 1 Stationarity processing: the time series is first tested for stationarity; if it is non-stationary, it is differenced repeatedly until it becomes stationary, and the number of differences is recorded as d.

STEP 2 Model identification: the candidate model type (AR, MA, or ARMA) is identified from the behavior of the autocorrelation and partial autocorrelation functions.

STEP 3 Model order determination: the orders p and q of the model are determined.

STEP 4 Parameter estimation: after the preliminary determination of the model order, the coefficients of each candidate model are estimated under different accuracy requirements.

STEP 5 Model diagnosis: model diagnosis is used to judge the appropriateness of the model; the main point is to test the independence of the residuals, i.e., to test whether the residual series $\{\varepsilon_t\}$ is white noise.

STEP 6 Model prediction: after the model passes the significance test, it is used to forecast future values of the series.
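The following statsmodels sketch walks through these steps on a synthetic series standing in for the playback data; the orders, thresholds, and horizon are illustrative assumptions, not this paper's exact configuration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical daily playback series standing in for the real dataset.
rng = np.random.default_rng(1)
y = pd.Series(6000 + np.cumsum(rng.normal(0, 50, 300)))

# STEP 1: stationarity test (ADF); difference until stationary, recording d.
d = 0
z = y.copy()
while adfuller(z.dropna())[1] > 0.05:
    z = z.diff()
    d += 1

# STEPs 2-4: identify and fit the model (orders here follow the paper's
# later choice of p = q = 6, read off the ACF/PACF plots).
fit = ARIMA(y, order=(6, d, 6)).fit()

# STEP 5: diagnosis -- the residuals should be white noise (Ljung-Box test).
print(acorr_ljungbox(fit.resid, lags=[10]))

# STEP 6: prediction.
print(fit.forecast(steps=30))
```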
In this section of the experiment, the playback volume of different compositional techniques on a music platform is used as a dataset, and the evolutionary trend of compositional techniques is explored from the experimental results by applying the model proposed in this paper for predictive analysis.
A time series is a collection of data collected at constant intervals at different points in time; these collections are analyzed to understand long-term trends and predict future values. Time series forecasting problems have three basic characteristics: (1) it is assumed that trends in the development of things will extend into the future; (2) the data on which the predictions are based exhibit irregularity; (3) causal relationships in the development of things are not taken into account. A series may contain trend, cyclical, and random components. Depending on whether the trend is linear or not, the problem is transformed into fitting a straight line or fitting a curve. Music datasets have distinct time series characteristics and provide a rich history of user actions, so the problem of predicting music popularity trends can be analyzed and studied from a time series perspective.
From the data it can be seen that the study of the evolution of compositional techniques in this paper involves a time series problem; therefore, this paper constructs a prediction model from a time series perspective. The Long Short-Term Memory (LSTM) network is an important recurrent neural network structure, used mainly to process and predict sequence data. LSTM suits time series prediction because it creates paths along which gradients can flow continuously over long times; it is explicitly designed to solve the long-term dependency problem, has the advantage of remembering information over long periods, and its cumulative time scale can change dynamically. Based on these advantages, this paper carries out a study of LSTM time series forecasting for music popularity trend prediction.
In addition, ARIMA is a popular and widely used statistical method for time series forecasting that explicitly models a standard set of structures in time series data and provides a simple yet powerful approach to effective forecasting. Unlike the long short-term memory network, the autoregressive integrated moving average method is suited to short-term forecasting; to make the study of compositional technique evolution more comprehensive, this paper also constructs an ARIMA time series prediction model to study the evolving popularity of compositional techniques.
Among recurrent neural networks, LSTM has attracted much attention because of its powerful ability to handle time series prediction problems. LSTM is implemented in various open-source deep learning frameworks, such as TensorFlow, PyTorch, and MXNet; Keras, as a high-level wrapper, provides a very developer-friendly interface. Accordingly, this paper uses TensorFlow as the backend with Keras as the high-level interface.
Data Preprocessing
Taking the first dataset as an example, the input to the model is the processed data in the user_actions and songs_artists tables. Because the task is to predict the evolution of compositional techniques, this section directly counts the number-of-plays attribute for each compositional technique. We plot and observe the trend of music playback for different compositional techniques, and "slide" over the training set of the LSTM network by taking the daily playback, the mean of playback, and the variance of playback over three consecutive days as one sample at each time point.
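One plausible reading of this sampling scheme is sketched below: each sample stacks three consecutive days of playback together with the window's mean and variance, with the next day's playback as the target. The function name and the stand-in data are assumptions.

```python
import numpy as np

def make_windows(plays, window=3):
    """Slide a 3-day window over the daily playback series; each sample
    pairs the window's (plays, rolling mean, rolling variance) features
    with the next day's playback as the target."""
    X, y = [], []
    for t in range(window, len(plays)):
        seg = plays[t - window:t]
        feats = np.stack([seg,
                          np.full(window, seg.mean()),
                          np.full(window, seg.var())], axis=1)
        X.append(feats)
        y.append(plays[t])
    return np.array(X), np.array(y)

plays = np.random.rand(180)    # stand-in for 180 days of playback
X, y = make_windows(plays)     # X: (177, 3, 3), y: (177,)
```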
Structure of the model
The structure of the model is as follows:
The input layer contains three neurons, representing the melody, harmony, and tonality features of the compositional technique. The first hidden layer is an LSTM layer with 35 LSTM units. The second hidden layer is an LSTM layer with 10 LSTM units. The output layer contains 3 neurons, representing playback volume, playback mean, and playback variance, respectively.
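A Keras sketch of this structure, under the assumption that the input window follows the 3-day sliding scheme described above (time_step = 3); the optimizer and loss are illustrative choices.

```python
from tensorflow import keras
from tensorflow.keras import layers

# 3 input features, two stacked LSTM layers (35 and 10 units), 3 outputs.
model = keras.Sequential([
    keras.Input(shape=(3, 3)),              # (time_step, features)
    layers.LSTM(35, return_sequences=True), # first hidden layer: 35 LSTM units
    layers.LSTM(10),                        # second hidden layer: 10 LSTM units
    layers.Dense(3),                        # playback volume, mean, variance
])
model.compile(optimizer="adam", loss="mse")
```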
Taking the first dataset as an example, Figure 3 shows the curves of play volume, download volume, and collection volume for some randomly selected composition techniques over these 180 days. The purple curve represents play volume, the green curve download volume, and the red curve collection volume.

The graph of playback, downloads, and collection
Prediction results and analysis
The model predicts the playback volume of the following two months based on the playback volume of the first 180 days; the comparison between the predicted curve and the real playback volume is shown in Figure 4, where the purple curve represents the actual playback volume and the green curve the predicted playback volume. From the comparison it can be seen that for most compositional techniques the model predicts playback volume well. For data with particularly large fluctuations, such as the sixth prediction, the model underperforms; the others do not differ much from the actual playback volume. In conclusion, the LSTM-based time series model predicts the evolutionary trend of compositional techniques well.

The prediction curve is compared to the curve of the actual playback
The ARMA model applies to stationary time series; for a non-stationary series, the original series must be differenced one or more times to form a new series before the ARMA(p, q) model can be used. Accordingly, the series X1, X2, …, Xn was differenced once. The scatter plot of the data after one differencing is shown in Fig. 5, where the horizontal coordinate is the index of the time series values and the vertical coordinate is the value of the series. The figure shows that the data have stabilized after one differencing. Therefore, when using the ARIMA(p, d, q) model, the parameter d should be 1, meaning that the time series becomes stationary after one differencing. To determine the values of p and q in the ARIMA(p, d, q) model, one examines the tailing-off or cut-off behavior of the autocorrelation and partial autocorrelation plots.

Scatter plot of the data after one differencing
The autocorrelation and partial autocorrelation plots of the differenced data are shown in Fig. 6 and Fig. 7. From Fig. 6 it can be seen that the autocorrelation plot cuts off after lag 6, so the moving-average order q should be 6; from Fig. 7 it can be seen that the partial autocorrelation plot cuts off after lag 6, so the autoregressive order p should be 6.

Autocorrelation sequence diagram

Partial autocorrelation sequence diagram
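A hedged sketch of how such ACF/PACF plots can be produced with statsmodels, using a synthetic differenced series in place of the real data:

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Hypothetical differenced playback series (stand-in for the real data).
diff = np.diff(6000 + np.cumsum(np.random.normal(0, 50, 300)))

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(diff, lags=20, ax=ax1)    # read q from where the ACF cuts off
plot_pacf(diff, lags=20, ax=ax2)   # read p from where the PACF cuts off
plt.tight_layout()
plt.show()
```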
The ARIMA(6, 1, 6) model was used to predict the evolution of compositional techniques. The predicted and actual outputs are shown in Figure 8, where the horizontal coordinate is the prediction date and the vertical coordinate is the audition volume of the compositional techniques. Series 1 is the actual audition volume from September 1, 2023 to September 30, 2023, and Series 2 is the value predicted by the autoregressive moving average model.

Forecast output and actual output
The predicted output is compared with the actual output in Table 1. According to the table, the average relative error of the predicted auditions from September 1, 2023 to September 30, 2023 is 11.75%.
Comparison of forecast output with actual output

| Time | Actual value | Predicted value | Mean relative error |
|---|---|---|---|
| 2023.9.1~2023.9.5 | 5951, 6051, 6771, 6447, 6728 | 5836, 5661, 5699, 5773, 5798 | 9.822% |
| 2023.9.6~2023.9.10 | 6335, 6820, 5899, 6317, 7149 | 5875, 5898, 5770, 5769, 5744 | 10.132% |
| 2023.9.11~2023.9.15 | 6758, 6878, 6480, 6632, 6021 | 5758, 5773, 5790, 5809, 5754 | 12.633% |
| 2023.9.16~2023.9.20 | 5933, 7103, 6874, 6686, 7065 | 5741, 5765, 5726, 5739, 5727 | 15.624% |
| 2023.9.21~2023.9.25 | 7101, 6096, 6798, 6555, 6797 | 5719, 5697, 5715, 5683, 5715 | 12.326% |
| 2023.9.26~2023.9.30 | 6902, 6338, 6291, 6070, 5962 | 5687, 5684, 5609, 5657, 5688 | 9.952% |
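As a quick arithmetic check, the six five-day groups in Table 1 are equal-sized, so the 30-day average relative error equals the mean of the six group errors:

```python
# Mean of the six five-day group errors from Table 1.
errors = [9.822, 10.132, 12.633, 15.624, 12.326, 9.952]
print(sum(errors) / len(errors))   # -> 11.748... ≈ 11.75%
```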
In this paper, methods for predicting the evolution of compositional techniques are studied based on the traditional ARIMA model and on the deep-learning LSTM neural network model; prediction experiments are carried out on time series of music playback for different actual types of compositional technique, and the experimental results are analyzed to verify the effectiveness and applicability of the methods.
The LSTM neural network model is used to predict the evolution of compositional techniques, and it is found that the model in this paper is able to predict the playback volume of most of the compositional techniques well. Using ARIMA to predict the music playback of different compositional techniques, the experimental results show that the average relative error of the predicted auditions from September 1, 2023 to September 30, 2023 is 11.75%.
Existing models find it difficult to effectively analyze the relationships in the evolution of compositional techniques, and the results of deeper mining of the related data are unsatisfactory. The time series models proposed in this paper, by contrast, achieve more satisfactory prediction results for the evolution of compositional techniques.