
Drama plot development law and its prediction model based on time series analysis


Introduction

With the development of society and people's deepening understanding of culture and art, theatrical performance has become an increasingly popular art form. A drama performance is not only a cultural activity but also a medium for expressing human emotions and thoughts: through the actors' performances, the audience can resonate with and feel the emotions of the characters, and thereby better understand and experience their life stories [1-4]. In a theatrical performance, the plot is a very important component, and the sequence of plot development directly determines the overall effect of the performance [5-6]. A well-ordered plot allows the audience to follow the story more easily, enter the dramatic situation more deeply, and grasp the theme of the drama more clearly; it also lets the audience anticipate later developments from what has already unfolded, deepening their understanding of the theme [7-10].

In the development of a story's plot, logical deduction plays a crucial role: plot development requires sound logical deduction, and logical deduction is the cornerstone of plot development. Whether in a film, a novel, or a play, logical deduction is needed to keep the story coherent and credible [11-14]. Plot development and logical deduction are therefore essential parts of storytelling. In creative writing, writers must consider the motivations, behaviors, and emotions of the characters as well as their relationships with one another, and logical deduction helps screenwriters clarify their ideas, determine the direction of the story, and make it more realistic and believable [15-17].

In this paper, an ARIMA-LSTM combination model is constructed to predict the development pattern of drama plots through time series analysis. To address the insufficient extraction of effective information in univariate time series prediction, the ARIMA model is introduced to predict the long-term trend of the univariate sequence, while the LSTM model extracts and fits its nonlinear information, so that the effective information of the sequence is fully exploited. To assess the prediction performance of the constructed model, the ARIMA model, the DA-LSTM model, and the ARIMA-LSTM combination model are each fitted and used for prediction, and the proposed model is then compared with the other models in prediction performance experiments.

ARIMA-LSTM based time series combination forecasting model

A time series is a set of observations arranged in order of occurrence. Time series prediction models fall mainly into two categories: linear and nonlinear. The ARIMA model is better at predicting linear time series, while the LSTM neural network is suited to nonlinear time series. Therefore, this paper analyzes the development pattern of drama plots by constructing an ARIMA-LSTM combination model, so as to predict how the plot develops.

ARIMA prediction model
Analysis and Definition of ARIMA Forecasting Models

The ARIMA model consists of three components: the autoregressive (AR) part, the differencing (I) part, and the moving average (MA) part [18].

AR model

The AR model assumes that the current value of the time series is determined only by its past values. Its mathematical expression is:
$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \cdots + \alpha_p y_{t-p} + \varepsilon_t$$

where $\alpha_0$ denotes the constant term, $\alpha_1, \alpha_2, \cdots, \alpha_p$ denote the autoregressive coefficients of each order, $\varepsilon_t$ denotes mutually independent white noise, $y_t$ denotes the current predicted value, and $y_{t-1}, y_{t-2}, \cdots, y_{t-p}$ denote the values of the series at past moments.

MA model

The MA model does not depend on previous values of the series itself; it is a linear model that predicts the current value as a weighted linear combination of the current disturbance $\varepsilon_t$ and the random errors of the past $q$ moments $\{\varepsilon_t\}$, with the following mathematical expression:
$$y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$

where $q$ is the order of the moving average model, denoted MA($q$), $\theta_1, \theta_2, \cdots, \theta_q$ are the moving average coefficients, and $\{\varepsilon_t\}$ is the sequence of random disturbance terms in different periods. When the constant term $\theta_0 = 0$, the model is referred to as a centered MA($q$) model.

ARMA model

The ARMA model is obtained by combining the AR model and the MA model. It is a linear model that combines previous values of the series with the residual series. Its mathematical expression is:
$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \cdots + \alpha_p y_{t-p} + \varepsilon_t - \beta_1 \varepsilon_{t-1} - \beta_2 \varepsilon_{t-2} - \cdots - \beta_q \varepsilon_{t-q}$$

Here $p$ is the autoregressive order and $q$ is the moving average order, so the model is abbreviated as ARMA($p$, $q$), where $\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_p$ and $\beta_1, \beta_2, \ldots, \beta_q$ are the model parameters.
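To make the ARMA formulation concrete, the following minimal Python sketch simulates an ARMA(2, 1) series with statsmodels; the coefficient values and series length are arbitrary and chosen only for illustration, not taken from this paper.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# ARMA(2,1) example: y_t = 0.6 y_{t-1} - 0.2 y_{t-2} + e_t + 0.4 e_{t-1}
# ArmaProcess expects lag-polynomial coefficients, so the AR signs are negated.
ar = np.array([1, -0.6, 0.2])   # phi(B) = 1 - 0.6B + 0.2B^2
ma = np.array([1, 0.4])         # theta(B) = 1 + 0.4B
process = ArmaProcess(ar, ma)

np.random.seed(0)
y = process.generate_sample(nsample=500)  # simulated ARMA(2,1) series
print(y[:5])
```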

ARIMA model

The AR, MA, and ARMA models are all applicable to stationary stochastic processes, but most time series obtained in practice are non-stationary. The ARIMA model was developed to handle non-stationary series; it can be regarded as an ARMA model applied after differencing, and its mathematical expression is:
$$\Phi(B)\nabla^{d} y_t = \Theta(B)\varepsilon_t$$

where $d$ is the differencing order ($\nabla^{d} = (1-B)^{d}$), $\Phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ is the autoregressive coefficient polynomial, and $\Theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$ is the moving average coefficient polynomial.
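As a minimal sketch of how such a model can be fitted in practice, the statsmodels ARIMA class can be used as follows; the placeholder series and the order (1, 1, 1) are assumptions for the example, standing in for the drama data and the order selected later in the paper.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Placeholder series standing in for the normalized plot-development values;
# in practice y would come from the drama dataset described in the paper.
rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=200).cumsum())

model = ARIMA(y, order=(1, 1, 1))     # (p, d, q); the order here is illustrative only
result = model.fit()

forecast = result.forecast(steps=10)  # predict the next 10 time units
print(result.aic, result.bic)
print(forecast.head())
```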

Construction of the ARIMA model

The construction process of the ARIMA model is shown in Figure 1. It can be divided into the following steps: stationarity testing of the time series, processing of non-stationary series, pattern recognition, model optimization, and model testing.

Stationarity testing of the time series

The stationarity test determines whether the current time series will continue its existing behavior in subsequent periods. The main methods are the graphical test and the ADF unit root test. The graphical test examines the time series plot: if the series fluctuates randomly around a certain mean value, it is a stationary series; otherwise it is non-stationary. The ADF unit root test first assumes that the series contains a unit root; if the test statistic obtained from the ADF test is well below the critical value at the 1% significance level, the time series is stationary.
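A minimal sketch of the ADF unit root test with statsmodels is shown below; the random-walk placeholder series is an assumption for the example.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Placeholder series; in practice this is the drama-plot value series.
rng = np.random.default_rng(1)
y = rng.normal(size=300).cumsum()

adf_stat, p_value, used_lag, n_obs, critical_values, icbest = adfuller(y)
print("ADF statistic:", adf_stat)
print("p-value:", p_value)
print("critical values:", critical_values)

# If adf_stat is below the 1% critical value (and the p-value is small),
# the unit-root hypothesis is rejected and the series is treated as stationary.
```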

Non-stationary series processing

If the stationarity test shows that the time series is non-stationary, a differencing operation should be applied.

Pattern recognition

Pattern recognition selects, from the AR, MA, and ARMA models, the model that best fits the behavior of the time series, using the properties of the autocorrelation function (ACF) and the partial autocorrelation function (PACF). If the partial autocorrelation function cuts off after lag p and the autocorrelation function tails off toward zero, the AR(p) model should be selected. If the autocorrelation function cuts off after lag q and the partial autocorrelation function tails off toward zero, the moving average MA(q) model should be selected. If neither function cuts off and both tail off toward zero, the ARMA(p, q) model should be chosen. However, this method may yield several candidate combinations of p and q, and it is generally necessary to test several of them to obtain the optimal combination and hence the optimal model.
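A minimal sketch of inspecting the ACF and PACF with statsmodels follows; the stationary placeholder series is an assumption for the example.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(2)
y = rng.normal(size=300)  # placeholder stationary series

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=40, ax=axes[0])    # autocorrelation function
plot_pacf(y, lags=40, ax=axes[1])   # partial autocorrelation function
plt.tight_layout()
plt.show()
```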

Model optimization

Model optimization is generally carried out by comparing the AIC and BIC values of candidate models and selecting the optimal one.

In general, the AIC is defined as:
$$AIC = 2k - 2\ln(L)$$

Figure 1.

Flowchart of ARIMA model construction

where k is the number of model parameters and L is the likelihood function.

In general, the BIC is defined as:
$$BIC = k\ln(n) - 2\ln(L)$$

where $k$ and $L$ have the same meanings as in the AIC and $n$ is the number of samples.

The AIC criterion obtains the optimal model parameters by balancing model complexity against the likelihood function, while the BIC additionally takes the sample size into account: when the sample is large, it prevents the model from becoming overly complex merely to improve the fit. The BIC criterion is therefore more suitable for large samples.
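The sketch below illustrates order selection by comparing BIC values over a small grid of candidate (p, q) orders; the grid ranges, the fixed d = 1, and the placeholder series are assumptions for the example.

```python
import itertools
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
y = pd.Series(rng.normal(size=300).cumsum())    # placeholder non-stationary series

best_order, best_bic = None, np.inf
for p, q in itertools.product(range(4), range(4)):  # candidate orders p, q in 0..3
    try:
        res = ARIMA(y, order=(p, 1, q)).fit()        # d fixed to 1 for the example
    except Exception:
        continue
    if res.bic < best_bic:
        best_order, best_bic = (p, 1, q), res.bic

print("selected order:", best_order, "BIC:", best_bic)
```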

Model Testing

Model testing consists of a significance test of the model and tests of its parameters. The significance test checks whether the model fully explains the correlation structure of the time series: if the residual series is a white noise series, the model passes the test; otherwise the model must be re-fitted and verified.
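A common way to check whether the residuals are white noise is the Ljung-Box test; the minimal sketch below uses statsmodels, with a placeholder series and order standing in for the fitted model.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(4)
y = pd.Series(rng.normal(size=300).cumsum())   # placeholder series
res = ARIMA(y, order=(1, 1, 1)).fit()          # placeholder order

# Ljung-Box test on the residuals: large p-values at all tested lags
# indicate that the residuals behave like white noise.
lb = acorr_ljungbox(res.resid, lags=[6, 12, 18])
print(lb)
```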

Model Prediction

Specify the time interval to be forecast and carry out the prediction analysis.

Long Short-Term Memory (LSTM) Neural Network Model

The Long Short-Term Memory (LSTM) neural network is an enhanced form of the Recurrent Neural Network (RNN). It introduces the concept of a cell state and adopts a gating mechanism; by designing gate structures that strengthen or remove information in the cell state, it effectively overcomes the shortcomings of the RNN [19]. The structure of the LSTM neural network is shown in Fig. 2.

Figure 2.

LSTM neural network structure diagram

The LSTM neural network is designed with three gate mechanisms, the output gate, the input gate, and the forget gate, to control the cell state.

The role of the forget gate is to control whether the information in the current memory cell is passed on to the next memory cell. Concretely, the output value $h_{t-1}$ of the memory cell at the previous moment and the input value $x_t$ at the current moment are combined through their respective weights, and the result is passed through a sigmoid function to obtain a value in $[0, 1]$ that represents how much of the previous cell state is forgotten; the smaller the value, the greater the degree of forgetting. The forget gate is calculated as:
$$f_t = \sigma(U_f h_{t-1} + W_f x_t + b_f)$$

where the subscripts $f$, $i$, and $o$ denote the forget gate, the input gate, and the output gate, respectively; correspondingly, $U$, $W$, and $b$ are different parameters in different gates.

The input gate mainly screens the input information, i.e., it selectively discards certain information at input time and generates the information to be updated at the current moment. Concretely, the output $h_{t-1}$ of the memory cell at the previous moment and the current input $x_t$ are passed through a sigmoid function to decide which information is used for updating; a candidate vector $\tilde{C}_t$ is then obtained through a tanh function, and the previous cell state is fused with $\tilde{C}_t$ to obtain the latest cell state. The equations involved are:
$$i_t = \sigma(U_i h_{t-1} + W_i x_t + b_i)$$
$$\tilde{C}_t = \tanh(U_x h_{t-1} + W_x x_t + b_x)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

The output gate determines the output of the model. Its output is obtained from a combination of the sigmoid and tanh functions: first the gate value $o_t$ is obtained through the sigmoid function, and then the cell state is compressed into $[-1, 1]$ through the tanh function, selecting which information the current memory cell passes on to the next memory cell. The formulas involved are:
$$o_t = \sigma(U_o h_{t-1} + W_o x_t + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$
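To make the gate equations concrete, the following minimal NumPy sketch computes a single LSTM step following the equations above; the dimensions and random parameters are illustrative only and do not correspond to the network trained in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following the forget/input/output gate equations above."""
    U_f, W_f, b_f = params["f"]
    U_i, W_i, b_i = params["i"]
    U_x, W_x, b_x = params["c"]   # candidate-state parameters
    U_o, W_o, b_o = params["o"]

    f_t = sigmoid(U_f @ h_prev + W_f @ x_t + b_f)        # forget gate
    i_t = sigmoid(U_i @ h_prev + W_i @ x_t + b_i)        # input gate
    c_tilde = np.tanh(U_x @ h_prev + W_x @ x_t + b_x)    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                   # new cell state
    o_t = sigmoid(U_o @ h_prev + W_o @ x_t + b_o)        # output gate
    h_t = o_t * np.tanh(c_t)                             # new hidden state
    return h_t, c_t

# Toy example with hidden size 4 and input size 3.
rng = np.random.default_rng(0)
n_h, n_x = 4, 3
params = {k: (rng.normal(size=(n_h, n_h)), rng.normal(size=(n_h, n_x)), np.zeros(n_h))
          for k in ("f", "i", "c", "o")}
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_x), h, c, params)
print(h)
```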

ARIMA-LSTM combined model construction
Predictive model combination approach

When different prediction models are used to predict the same sequence, each model obtains effective information from a different perspective, so their prediction results contain different effective information. Using multiple models to mine more effective information from multiple perspectives therefore avoids the limitations of a single model. Combinations of prediction models fall broadly into two types: series combinations and parallel combinations.

In the series combination of prediction models, the input sequence of the current prediction model is the output sequence of the previous prediction model, and the last prediction model completes the output of the final prediction result.

In a parallel combination of forecasting models, the forecasts of the different models do not interfere with one another; the results of the individual models are superimposed using a weighted combination, and the weighted combination of the sequences is the forecast of the combined model. Assuming that n single forecasting models are used for time series forecasting, the forecast of the combined model satisfies:
$$Y = w_1 y_1 + w_2 y_2 + \cdots + w_n y_n$$
$$Y = f(y_1, y_2, \cdots, y_n)$$

where $w_i\,(i = 1, 2, \cdots, n)$ is the weighting coefficient of the $i$th prediction model with $\sum_{i=1}^{n} w_i = 1$, $y_i\,(i = 1, 2, \cdots, n)$ is the predicted value of the $i$th prediction model, and $f$ is a nonlinear function. Eq. (13) and Eq. (14) are the mathematical expressions for the linear combination and the nonlinear combination of the prediction results, respectively. The combination of prediction models used in this paper is a linear combination.

Generally speaking, the weighting coefficient of each component in the combined prediction model is negatively correlated with its prediction error: a model with a larger error is given a smaller weighting coefficient. According to how the weighting coefficients are calculated, combined prediction models can be divided into optimal and non-optimal combination models. The optimal combination model determines the weighting coefficient of each prediction model according to the principle of minimizing the prediction error of the combined model. In this paper, the weighted least squares method is used to calculate the weighting coefficients of the different models, and the calculation process is described mathematically as follows:

Assuming that the n forecasting models are combined in parallel to forecast the same time series, the forecasting results of the combination model can be expressed as:
$$Y_t = \sum_{i=1}^{n} w_i \hat{y}_{it}$$

where $Y_t$ is the predicted value of the combined model at moment $t$, $w_i$ is the weighting coefficient of the $i$th prediction model and satisfies $\sum_{i=1}^{n} w_i = 1$, and $\hat{y}_{it}$ is the predicted value of the $i$th model at moment $t$.

The prediction error of the combined model at moment $t$ can be expressed as:
$$e_t = y_t - Y_t = \sum_{i=1}^{n} w_i \hat{e}_{it} = [w_1, w_2, \cdots, w_n][\hat{e}_{1t}, \hat{e}_{2t}, \cdots, \hat{e}_{nt}]^T$$

where $y_t$ is the observed value of the original sequence at moment $t$, and $\hat{e}_{it}$ is the prediction error of the $i$th model at moment $t$.

The squared prediction error of the combined model at moment $t$ can be expressed as:
$$R = \left(\sum_{i=1}^{n} w_i \hat{e}_{it}\right)^2 = \left([w_1, w_2, \cdots, w_n][\hat{e}_{1t}, \hat{e}_{2t}, \cdots, \hat{e}_{nt}]^T\right)^2 = [w_1, w_2, \cdots, w_n]\begin{bmatrix} \hat{e}_{1t}^2 & \hat{e}_{1t}\hat{e}_{2t} & \cdots & \hat{e}_{1t}\hat{e}_{nt} \\ \hat{e}_{2t}\hat{e}_{1t} & \hat{e}_{2t}^2 & \cdots & \hat{e}_{2t}\hat{e}_{nt} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{e}_{nt}\hat{e}_{1t} & \hat{e}_{nt}\hat{e}_{2t} & \cdots & \hat{e}_{nt}^2 \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}$$

where $[w_1, w_2, \cdots, w_n]^T$ is the vector of weighting coefficients, denoted $w$. The sum of squares of the prediction errors of the combined model over the $N$ moments is then:
$$S = \sum_{t=1}^{N} R = w^T \begin{bmatrix} \sum_{t=1}^{N}\hat{e}_{1t}^2 & \sum_{t=1}^{N}\hat{e}_{1t}\hat{e}_{2t} & \cdots & \sum_{t=1}^{N}\hat{e}_{1t}\hat{e}_{nt} \\ \sum_{t=1}^{N}\hat{e}_{2t}\hat{e}_{1t} & \sum_{t=1}^{N}\hat{e}_{2t}^2 & \cdots & \sum_{t=1}^{N}\hat{e}_{2t}\hat{e}_{nt} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{t=1}^{N}\hat{e}_{nt}\hat{e}_{1t} & \sum_{t=1}^{N}\hat{e}_{nt}\hat{e}_{2t} & \cdots & \sum_{t=1}^{N}\hat{e}_{nt}^2 \end{bmatrix} w$$

Let the error information matrix of the combined model be $E$; then:
$$E = \begin{bmatrix} E_{11} & E_{12} & \cdots & E_{1n} \\ E_{21} & E_{22} & \cdots & E_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ E_{n1} & E_{n2} & \cdots & E_{nn} \end{bmatrix}$$

Thus the sum of squares of the prediction errors of the combined model can be expressed as:
$$S = w^T E w$$

Because the combined prediction model in this paper adopts a parallel combination, the modeling and prediction processes of the individual models do not interfere with each other. Therefore, under the constraint $\sum_{i=1}^{n} w_i = 1$, there exists a vector of weighting coefficients $\bar{w}$ such that the sum of squares of the prediction errors $S$ of the combined model reaches its minimum value $S_{\min}$; the combined prediction model is then the optimal weighted prediction model.

ARIMA-LSTM Combined Modeling

In this paper, the ARIMA model and the DA-LSTM model [20], which introduces a two-stage attention mechanism, are combined in a parallel weighted combination, and the weighting coefficients of their predicted values are calculated by the weighted least squares method as follows:

Let the error vector of the ARIMA model be $E_1 = [\hat{e}_{11}, \hat{e}_{12}, \cdots, \hat{e}_{1N}]^T$ and the error vector of the DA-LSTM model be $E_2 = [\hat{e}_{21}, \hat{e}_{22}, \cdots, \hat{e}_{2N}]^T$. From equation (19), the error information matrix $E$ of the combined forecasting model can be expressed as:
$$E = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix}$$

There exist a weighting coefficient $w_1$ for the predicted values of the ARIMA model and a weighting coefficient $w_2$ for the predicted values of the DA-LSTM model which, under the constraint $w_1 + w_2 = 1$, minimize the sum of squares of the prediction errors of the combined model. From Eq. (20) and Eq. (21), the minimum value of the sum of squares of the prediction errors of the combined model is:
$$S_{\min} = w^T E w = [w_1, w_2]\begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$$

Simplifying Eq. (22), the weighting coefficients and the minimum sum of squares of the prediction errors of the combined model are:
$$S_{\min} = \frac{E_{11}E_{22} - E_{12}^2}{E_{11} + E_{22} - 2E_{12}}, \qquad w_1 = \frac{E_{22} - E_{12}}{E_{11} + E_{22} - 2E_{12}}, \qquad w_2 = \frac{E_{11} - E_{12}}{E_{11} + E_{22} - 2E_{12}}$$

The weighted combination of the two forecasts using the weighting coefficients yields the forecasts of the combined model. The flow of time series forecasting using ARIMA-LSTM combination model is shown in Figure 3 [21].

Figure 3.

Flowchart of combined model prediction
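A minimal sketch of computing these two weights from the models' error vectors is given below; it assumes the errors of both models on a common validation window are available as NumPy arrays, and the toy error vectors are made up for the example.

```python
import numpy as np

def combination_weights(err_arima, err_lstm):
    """Optimal weights for a two-model parallel combination (w1 + w2 = 1)."""
    e1 = np.asarray(err_arima)
    e2 = np.asarray(err_lstm)
    E11, E22, E12 = np.sum(e1 * e1), np.sum(e2 * e2), np.sum(e1 * e2)
    denom = E11 + E22 - 2.0 * E12
    w1 = (E22 - E12) / denom   # weight of the ARIMA forecasts
    w2 = (E11 - E12) / denom   # weight of the DA-LSTM forecasts
    return w1, w2

# Toy example with made-up error vectors.
rng = np.random.default_rng(0)
err_arima = rng.normal(scale=0.02, size=100)
err_lstm = rng.normal(scale=0.01, size=100)
w1, w2 = combination_weights(err_arima, err_lstm)

# Combined forecast: y_hat = w1 * y_hat_arima + w2 * y_hat_lstm
print(round(w1, 3), round(w2, 3))
```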

Prediction of drama plot development pattern based on ARIMA-LSTM modeling
Prediction of Drama Plot Development Patterns Based on ARIMA Modeling
Stationarity and randomness tests

ARIMA modeling requires the series to be stationary and not purely random; therefore, a stationarity test and a pure randomness (white noise) test need to be performed on the series before modeling.

The dataset used in this paper consists of theater plays obtained by web crawling and is divided into a training set and a test set in a ratio of 8:2. The time series plot of the training data is shown in Fig. 4, where the horizontal axis indicates time in unit lengths and the vertical axis is the value of the drama plot, normalized to [0,1], as it moves from beginning through development and climax to ending. It can be seen that there is no significant seasonality in the series.

Figure 4.

Time series plot of the training set

The ADF unit root test is applied to this series using the adfuller() function of the statsmodels package in Python; the statistic is D-F=8.1765 with p=1.4058e-12<0.05, so the series passes the stationarity test and is already stationary. A white noise test on the series then gives p=0.00000175<0.05, so the null hypothesis is rejected: the series is not purely random, i.e., it contains extractable information and meets the modeling requirements of the ARIMA model.

Model identification and ordering

As can be seen from the time series plot above, there is no obvious seasonal periodicity, so the main task is to determine the three parameters p, d, and q. The order of the model can be determined from the autocorrelation (ACF) and partial autocorrelation (PACF) plots, or it can be fixed by an automatic model identification function.

Considering the large amount of drama plot data used, this paper orders the model with the BIC criterion, using the auto_arima function of the pmdarima library in Python to perform the ordering automatically. The minimum BIC value of 96643.215 is obtained at p=3, d=1, q=4, so the model is finally determined as ARIMA(3,1,4).
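A minimal sketch of such automatic ordering with pmdarima is shown below; the placeholder training series and the search ranges are assumptions for the example.

```python
import numpy as np
import pmdarima as pm

rng = np.random.default_rng(5)
y_train = rng.normal(size=500).cumsum()  # placeholder for the drama training series

model = pm.auto_arima(
    y_train,
    seasonal=False,                 # no obvious seasonality in the series
    information_criterion="bic",    # order selection by BIC
    start_p=0, max_p=5,
    start_q=0, max_q=5,
    d=None,                         # let the differencing order be estimated
    stepwise=True,
    trace=False,
)
print(model.order)                  # selected (p, d, q)
forecast = model.predict(n_periods=10)
```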

Residual white noise test

After model identification and parameter estimation, the reasonableness of the fitted model needs to be tested. The residual white noise test is used to determine whether the model fit is adequate and whether valuable information remains to be extracted. White noise has the property that $E(\varepsilon_t) = 0$ and that the variables at different moments are uncorrelated. The white noise test is performed on the fitted residual series by observing the Q-Q plot and the autocorrelation plot of the residuals: one checks whether most of the points on the Q-Q plot fall around the straight line and whether most of the ACF values in the autocorrelation plot are close to zero; a KS test on the residual series can also be used to check whether the residuals follow a normal distribution. If not, the residual series is a non-white-noise series, which indicates that useful information remains in the original series and the model parameters need to be adjusted accordingly. If the residuals pass the white noise test, the valuable information in the original series has been fully extracted and the model is validated.
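A minimal sketch of these residual diagnostics with SciPy and statsmodels follows; the placeholder series and order stand in for the fitted model from the earlier snippets.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(6)
y = pd.Series(rng.normal(size=300).cumsum())   # placeholder series
res = ARIMA(y, order=(1, 1, 1)).fit()          # placeholder order
resid = res.resid.iloc[1:]                     # drop the initial differencing residual

# Q-Q plot of the residuals against the normal distribution.
stats.probplot(resid, dist="norm", plot=plt)
plt.show()

# ACF plot of the residuals: most values should lie close to zero.
plot_acf(resid, lags=40)
plt.show()

# KS test on the standardized residuals against N(0, 1).
z = (resid - resid.mean()) / resid.std()
ks_stat, p_value = stats.kstest(z, "norm")
print("KS p-value:", p_value)
```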

The Q-Q plot of the model test is shown in Figure 5. As can be seen from Fig. 5, most of the points lie around a straight line, indicating that the residuals approximately follow a normal distribution. A KS test on the residual series yields p=0.20536106>0.05, so the null hypothesis is not rejected, indicating that the residual series follows a normal distribution.

Figure 5.

Model check Q-Q chart

The autocorrelation plot of the residual series is shown in Fig. 6. According to Fig. 6, most of the ACF values are very small, indicating that the autocorrelation coefficients are very close to zero.

Figure 6.

Model check autocorrelation graph

Therefore, these two plots show no significant correlation in the residual series, i.e., the residual series is purely random, indicating that the model passes the diagnostic test.

Model predictions

The ARIMA(3,1,4) model is used to predict the data in the test set, and the predicted values are compared with the true values in Figure 7. The overall trends of the real values and the predicted values of the ARIMA(3,1,4) model in Fig. 7 are basically the same, but some prediction error remains.

Figure 7.

The prediction results of ARIMA model

Prediction of DA-LSTM model with introduction of attention mechanism

The predictions of the DA-LSTM model with the two-stage attention mechanism on the test set are shown in Fig. 8. The DA-LSTM network with the two-stage attention mechanism gives better prediction results, but, as with the ARIMA(3,1,4) model, a certain amount of error remains.

Figure 8.

DA-LSTM model prediction result

Prediction experiments with ARIMA-LSTM
ARIMA-LSTM experimental results

The crawled drama script dataset is divided into a training set and a test set in a ratio of 8:2; the ARIMA-LSTM model is trained on the training set, and the learned model is then applied to the test set to predict the drama plot.

The prediction results of the combined ARIMA-LSTM model for the development of the drama plot are shown in Figure 9, where the horizontal axis represents time and the vertical axis represents the normalized plot value, which takes values in the interval [0,1]. The dark cyan solid line represents the true values and the orange dashed line represents the ARIMA-LSTM predictions of the drama plot. As can be seen in Figure 9, the fluctuations of the real and predicted values are basically the same, with only small differences, indicating that ARIMA-LSTM fits the development of the real drama plot well.

Figure 9.

Prediction result of ARIMA-LSTM

Comparative Experimental Analysis

To prove that ARIMA-LSTM is suitable for predicting drama plot development, four groups of comparative experiments were conducted under the same computing environment, comparing RNN, classical LSTM, DA-LSTM, and ARIMA with ARIMA-LSTM, the model chosen in this paper. For ease of presentation, the four comparison models and the ARIMA-LSTM model are renamed Model1~Model5.

The comparison between some of the real drama plot development values and the Model1 predictions is shown in Figure 10. As can be seen in Figure 10, the predictions of Model1 do not match the trend of the real values, and the difference between the two is large. It can therefore be concluded that, due to the gradient vanishing and gradient explosion problems of Model1, its fit to the real values is poor and its prediction performance is poor.

Figure 10.

Comparison between the true value and predicted value of Model1

The comparisons between some of the drama plot development values and the predictions of Model2 and Model3 are shown in Fig. 11 and Fig. 12, respectively. As can be seen from the figures, the gap between the predicted trend of Model2 and the real values is relatively large, while Model3 follows the overall fluctuation of the real values closely and fits better, although some gaps remain in certain intervals.

Figure 11.

Comparison between the true value and predicted value of Model2

Figure 12.

Comparison between the true value and predicted value of Model3

The comparisons between some real drama plot development values and the predictions of Model4 and Model5 are shown in Fig. 13 and Fig. 14, respectively. As can be seen from the figures, Model4 and Model5 are basically consistent with the trend of the real values and the differences are small, but Model4 is not as good as Model5 at individual positions, so the prediction performance of Model5 is better than that of Model4.

Figure 13.

Comparison between the true value and predicted value of Model4

Figure 14.

Comparison between the true value and predicted value of Model5

From Fig. 10 to Fig. 14 it can be seen that, compared with the other models, Model5's predictions fit the true values best, followed by Model4, Model3, and Model2, with Model1 having the lowest fit. To evaluate the fitting effect more intuitively, the experimental results were assessed with three metrics: mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE). The evaluation metrics are shown in Table 1.

Table 1.

Experimental results

Model MAE RMSE MAPE
Model1 0.002459 0.003268 0.002043
Model2 0.000735 0.000987 0.000614
Model3 0.000638 0.000906 0.000547
Model4 0.000624 0.000879 0.000521
Model5 0.000601 0.000852 0.000504

As can be seen from Table 1, Model5 has the smallest MAE, RMSE, and MAPE, at 0.000601, 0.000852, and 0.000504, respectively, while Model1 has the largest, at 0.002459, 0.003268, and 0.002043. Ordered from high to low on MAE, RMSE, and MAPE, the models are Model1, Model2, Model3, Model4, and Model5. Due to the gradient vanishing and gradient explosion problems of the RNN, Model1 has the largest errors and the worst prediction performance of all the models. The MAE, RMSE, and MAPE of Model2 are 0.000735, 0.000987, and 0.000614, respectively. Model3 is an improved LSTM model that raises the prediction performance of the neural network by introducing a two-stage attention mechanism, reducing the MAE, RMSE, and MAPE by 13.20%, 8.21%, and 10.91%, respectively, relative to Model2. Model4 further reduces the MAE, RMSE, and MAPE compared with Model3, indicating that ARIMA also has certain advantages for time series. Model5 outperforms Model4 on MAE, RMSE, and MAPE, showing that combining the ARIMA model with the DA-LSTM model improves prediction accuracy. Thus, whether measured by MAE, RMSE, or MAPE, the prediction performance of ARIMA-LSTM is better than that of the other models, so the ARIMA-LSTM model is superior in this experiment.

Conclusion

In this paper, a combined ARIMA-LSTM prediction model based on time series analysis is constructed to realize the prediction of the development pattern of drama plot, and the prediction performance of the model is evaluated.

The stationarity test is performed on the training set data, and there is no obvious seasonality in its distribution. After ordering, the model is finally determined as ARIMA(3,1,4). In the residual white noise test, most of the points in the Q-Q plot of this model lie around a straight line, and a KS test on the residual series gives p=0.20536106>0.05, so the null hypothesis is not rejected and the residual series follows a normal distribution. The overall trends of the true values and the ARIMA(3,1,4) predictions are basically the same, though some error remains. Similarly, the DA-LSTM predictions largely fit the true values, again with a small error. In the prediction experiments on drama plot development with the ARIMA-LSTM combination model, the predicted values fit the true values almost completely, indicating that the time series prediction performance of the constructed combination model is better.

In addition, comparing the RNN, classical LSTM, DA-LSTM, and ARIMA models, the ARIMA-LSTM combined prediction model has the best fitting effect, with the minimum MAE, RMSE, and MAPE of 0.000601, 0.000852, and 0.000504, respectively, which proves the validity of this paper's model for predicting the development pattern of drama plots.