Open Access

Data-driven Multiple Regression Analysis of Teaching Mode Innovation and Teaching Quality of English Education in Colleges and Universities Based on Data

  
Sep 26, 2025


Introduction

An innovative teaching mode is key to improving teaching quality and cultivating students’ core competitiveness, so exploring and applying data-driven teaching mode innovation becomes crucial. Through data analysis, teachers can gain a deep understanding of students’ learning needs, progress and styles, and can design and adjust teaching content and methods accordingly. Personalized teaching thus becomes possible, helping each student achieve better learning outcomes [1-2]. However, realizing teaching mode innovation requires overcoming various challenges: continuously improving the technical infrastructure, ensuring data security and privacy protection, and raising the professionalism and training level of teachers. Only in this way can colleges and universities meet the challenges in the field of education and create a more valuable educational environment for students’ learning and development [3-4].

Data analysis is of great significance in English teaching in colleges and universities. By collecting, organizing and analyzing large amounts of student learning data, educational institutions and teachers can gain a deeper understanding of students’ learning needs and behavioral patterns, and so carry out more precise and personalized teaching [5-6]. First, data analysis can reveal students’ bottlenecks in English learning. Data such as study time, use of learning materials and question answering help teachers accurately grasp students’ learning needs and design targeted teaching content and activities [7-9]. Second, data analysis can reveal students’ learning behavior patterns. By analyzing students’ click and browsing records on online learning platforms, teachers can understand students’ preferences for different types of resources and then optimize the selection and presentation of teaching resources. Analyzing learning behaviors also reveals students’ capacity for independent learning and their motivation to learn, which gives teachers a basis for targeted guidance and motivation strategies [10-12]. Therefore, by deeply exploring students’ learning needs and behavioral patterns, teachers can implement personalized teaching and provide accurate learning guidance, thus improving students’ learning outcomes and English proficiency. Data analysis also provides teachers with a basis for teaching evaluation and adjustment, promoting the continuous improvement of education quality [13-14].

In recent years, with the rapid development of computer and network technology, digital education has gradually spread across the country. In 2020, a sudden epidemic hampered normal offline teaching and learning activities nationwide [15]. In this context, “online teaching” developed rapidly. “Online teaching” is a new type of teaching method that uses the Internet, multimedia and a variety of interactive means to teach and interact systematically, and “online” means that all of the learners’ teaching and learning activities are carried out on the platform, that is, over the network [16-17]. During the online teaching period, teachers’ lack of online teaching experience and insufficient information technology skills meant that they could assess students’ learning status only through online interaction and assignments. Such evaluation falls far short of its proper value and role, which is bound to seriously affect the quality of online teaching, because that quality depends not only on teachers’ online teaching design but also on the quality of online teaching evaluation [18-19]. Teaching evaluation, which runs through the whole process of online teaching, can be said to play a decisive role in the quality of university English online teaching as a whole. Therefore, it is of great significance to use big data to track, monitor and analyze the quality of university English online teaching, to effectively improve that quality, and to strive for “substantial equivalence” between the quality of online teaching and offline classroom teaching [20].

Scholars have first analyzed the innovation of English teaching from a technological perspective. Literature [21] introduces principal component analysis to reduce the dimensionality of the evaluation indexes in an English online teaching quality assessment model, which effectively improves the model’s performance in English teaching evaluation. Literature [22] systematically reviews the research literature on English digital teaching and affirms the positive role of digital information technology in improving English teaching. Literature [23] constructs a computerized English teaching system based on C++ and Windows technology tools, which to a certain extent improves the effect of English teaching. Literature [24] proposes an English translation model with a neural machine algorithm as its underlying structure, which produces higher-quality English translation and performs well in business English translation teaching practice. Literature [25] attempts to integrate a convolutional neural network (CNN)-recurrent neural network into immersive situated English teaching scenarios, which promotes the interactivity, sense of immersion and sense of cognition of students’ English learning.

Secondly, English teaching innovation is discussed from the perspective of English teaching methods. Literature [26] envisions an English teaching ecosystem based on big data technology, and its comparative teaching experiments confirm that the proposed teaching methods help to reform and innovate English teaching concepts, methods and content. Based on an empirical investigation, literature [27] reveals that the English flipped classroom improves the effect and quality of teacher-student interaction, and at the same time provides an important reference for English teachers’ flipped classroom practice.

In order to use big data to promote the innovation of English teaching mode, the article analyzes the current trend of English education and teaching mode innovation in colleges and universities. Subsequently, it takes the learning data of English majors in a college in J province as an example, and carries out preliminary correlation analysis after data preprocessing of the samples. The improved K-means++ algorithm is used to calculate the Euclidean distance of the clustering centers and sum them up, and the clustering centers are constantly updated based on the probability formula. The English teaching data were clustered by the above method, and the law between the clusters of teaching needs was derived. On this basis, a SPOC college English informatization teaching model containing online teaching, classroom guidance, classroom research, and after-class practice is constructed. A teaching quality evaluation path using multiple regression analysis is explored for this model to realize the teaching quality assessment of the new English teaching model.

Innovative trends in the teaching model of English language education in higher education institutions

English informatization teaching in colleges and universities has greatly promoted and enriched the integration and utilization of high-quality educational resources, and, with the support of new media technology, a large number of high-quality digital resources have made it possible to meet learners’ personalized needs. The current trend of data-driven innovation in the college English teaching mode is mainly reflected in the following aspects.

Utilization of digital resources

At present, the focus and research hotspot of China’s English education informatization resource construction has shifted from the early high-quality courses, high-quality resource sharing courses and open video courses to the construction of microcourses and MOOCs. The earlier construction of digital resources such as high-quality courses and online courses has nevertheless laid a very good foundation for English informatization classroom teaching. Teachers can obtain the digital resources for English informatization teaching on a network platform in several ways: by introducing high-quality MOOC course resources; by building online courses and microcourse video resources with localized characteristics; by transforming and upgrading the existing high-quality resource sharing courses on campus into the digital resources needed for English informatization classrooms; and by forming, through continuous development, the resources needed for a “flipped classroom” with Chinese characteristics. It can be said that the classroom cannot be “flipped” without rich digital resources.

Relying on new media technologies

With the arrival of new media, which reorganize and integrate different communication technologies on the basis of digital network technology, single media have been transformed into all-media, which makes the English informatization classroom far more feasible. In the English informatized classroom, teachers deliver teaching information to students through online teaching platforms, mobile terminals and other new media technologies, and students can communicate with teachers in real time through the same technologies. In addition, learners can collaborate and communicate with each other through new media such as microblogging, WeChat, QQ and online forums to jointly complete the construction of knowledge. Moreover, when educators or learners use various media and devices to learn and work, the new media and devices themselves also play an interactive role, and information is disseminated between people and machines to form interactions. It can be seen that an informatized classroom is difficult to realize without the support of new media technology.

Educational methods to support personalized learning

English informatization classroom learning is not limited by time, space or location. Before class, teachers use educational technology to record short, concise microclass videos so that students can study independently outside class and ahead of the new lesson, realizing flipped classroom teaching. During class, teachers analyze the student learning data fed back by the online platform, provide personalized guidance to students, and conduct targeted discussions and explanations. After class, students collaborate to complete assignments and interact with teachers and classmates online, and teachers provide online counseling. Human supervision and automated procedures monitor the learning process and learning tasks and issue reminders. Learners can freely choose the learning content and control the learning process, moving from passive acceptance of knowledge to active learning while accessing the course resources, so that blended teaching focused on personalized learning becomes possible.

Acquisition and analysis of data underlying innovations in teaching models

From the above analysis, it is not difficult to find that the collection and analysis of rich English teaching big data is the basis for realizing the innovation of new media teaching, personalized education and other teaching modes. Therefore, this study first collects and analyzes the sample data of English education in colleges and universities, refines the basic laws of English education and teaching, and carries out the construction of new teaching models on this basis.

Data acquisition

This paper is based on the historical operation data of the academic affairs system of a university in J province, including English results across majors and courses, evaluation results, attendance rate, listening rate and preview status. The survey data were preprocessed to form five factor attributes that may affect teaching quality, namely “preview”, “evaluation”, “achievement”, “attendance” and “listening”, and the sample set is as follows:

$$U_i = [u_{i1}, u_{i2}, \ldots, u_{ik}] \tag{1}$$

where $i \in [1, M]$, $M$ is the size of the set, and $u_{ik}$ is the k-th attribute value of the i-th sample $U_i$ in the set.

Data pre-processing

One-hot coding is chosen to convert categorical data such as “Preview”, “Course Type” and “Analysis Type” into numerical values. The conversion of some attribute codes is shown in Table 1.

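Table 1. Conversion table of attribute codes

| Text type | Data attribute | Value |
|---|---|---|
| Preview | Yes | 0 |
| Preview | No | 1 |
| Course | Basic English | 0 |
| Course | Professional English | 1 |
| Course | Business English | 2 |
| Course | English listening | 3 |
| Requirement | Self-directed type | 0 |
| Requirement | Self-driven type | 1 |
| Requirement | Friendly type | 2 |
| Requirement | Passive type | 3 |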
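The attribute conversion can be sketched in a few lines of Python. This is a minimal illustration, not the authors' preprocessing code: the dictionary `CODES` copies the integer values from Table 1, while the names `encode` and `one_hot` are hypothetical helpers introduced here. Table 1 lists integer codes, so the `one_hot` helper shows, as an assumption, how such a code could be expanded into a one-hot vector.

```python
# Integer codes copied from Table 1; names of the helpers are illustrative.
CODES = {
    "preview": {"Yes": 0, "No": 1},
    "course": {
        "Basic English": 0,
        "Professional English": 1,
        "Business English": 2,
        "English listening": 3,
    },
    "requirement": {
        "Self-directed type": 0,
        "Self-driven type": 1,
        "Friendly type": 2,
        "Passive type": 3,
    },
}


def encode(attribute, label):
    """Return the Table 1 integer code for a text label."""
    try:
        return CODES[attribute][label]
    except KeyError:
        raise ValueError(f"unknown {attribute} label: {label!r}") from None


def one_hot(attribute, label):
    """Expand the integer code into a one-hot vector (assumed layout)."""
    vector = [0] * len(CODES[attribute])
    vector[encode(attribute, label)] = 1
    return vector


print(encode("course", "Business English"))     # 2
print(one_hot("requirement", "Friendly type"))  # [0, 0, 1, 0]
```

Integer codes impose an artificial ordering on unordered categories, which distance-based methods such as K-means can be sensitive to; the one-hot form avoids that at the cost of more columns.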

A manual calibration method is used to calibrate the demand types to form a calibrated dataset to better evaluate the analytical model, which is then divided into a training set and a test set. The training set data attributes and example data are shown in Table 2.

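Table 2. Data attributes and examples of training set

| $u_{i1}$ | $u_{i2}$ | $u_{i3}$ | $u_{i4}$ | $u_{i5}$ | $u_{i6}$ | $u_{i7}$ |
|---|---|---|---|---|---|---|
| 1 | 88 | 91 | 90 | 85 | 2 | 3 |
| 0 | 89 | 98 | 84 | 97 | 1 | 0 |
| 1 | 82 | 93 | 87 | 90 | 0 | 1 |
| 0 | 89 | 94 | 98 | 91 | 0 | 2 |
| 1 | 83 | 86 | 94 | 81 | 0 | 0 |
| 1 | 81 | 84 | 87 | 89 | 0 | 2 |
| 0 | 88 | 80 | 80 | 92 | 2 | 2 |
| 1 | 78 | 79 | 88 | 90 | 0 | 1 |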

In order to understand the distribution of courses in the dataset, statistics were made according to four types of needs: “autonomous”, “friendly”, “self-driven” and “passive”. The statistical results are shown in Figure 1.

Figure 1.

Overview statistics of data set

Data attribute correlation analysis
Pearson’s correlation coefficient

Correlation analysis examines two or more related variables in order to measure how closely they are associated. It is used to investigate whether some kind of dependence exists between phenomena and to determine the direction and degree of that dependence, and it is an important statistical method for studying the relationship between random variables. The correlation measure used in this paper is the Pearson correlation coefficient.

Correlation analysis based on classical statistics measures the strength of the linear correlation between variables, which is generally described quantitatively using the Pearson correlation coefficient. The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient, is a linear correlation coefficient and is the most commonly used type of correlation coefficient [28]. Denoted as r, it reflects the degree of linear correlation between two variables, the dependent variable Y and the independent variable X. The value of r ranges from -1 to 1, and a larger absolute value indicates a stronger correlation. A value of r = 0 indicates that the two variables are not linearly correlated, although they may still be related in other ways (e.g., in a curvilinear way). If r < 0, there is a negative correlation between the two variables, i.e., the larger the value of one variable, the smaller the value of the other. If r > 0, there is a positive correlation between the two variables, i.e., the larger the value of one variable, the larger the value of the other. When r = 1 or r = -1, the dependent variable Y and the independent variable X are described exactly by a linear equation and all sample points fall on a straight line. The formula is shown in equation (2):

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{2}$$

where $r$ is the Pearson correlation coefficient of the two variables, $x_i$ and $y_i$ are the sample observations of the two variables, $n$ is the number of samples, and $\bar{x}$ and $\bar{y}$ denote the mean values of the two variables, respectively.
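As an illustration of the Pearson formula, the following sketch computes r directly from its definition using only the Python standard library. The function name `pearson_r` and the sample values are assumptions made for this example and are not taken from the paper's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Points on a rising straight line give r = 1.
print(pearson_r([1, 2, 3], [2, 4, 6]))         # 1.0
# A symmetric curvilinear relation gives r = 0 even though
# y is fully determined by x.
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1]))  # 0.0
```

The second call shows why r = 0 rules out only a linear relationship: here y is an exact quadratic function of x, yet the coefficient is zero.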

Correlation analysis of EFL data

In this section, attribute correlation analysis is performed on the calibrated training set to remove data attributes with weak correlation to ensure the stability of the subsequent clustering algorithm. In order to reduce the amount of computation, the correlation analysis is performed only on the four analysis types of basic English courses, and the Pearson correlation coefficient is used to statistically analyze the degree of closeness of relationship between the variables.

Substituting the values of the data attributes in the training set into the Pearson formula yields the correlation coefficients shown in Table 3. Here $(S_m)_{u_{ik}}$ and $(S_n)_{u_{ij}}$ denote the standard deviations, $(S_mR_n)_{u_{ij}u_{ik}}$ denotes the covariance, and $S_mR_n$ denotes the correlation coefficient for the elements of the analysis type set R and the relationship grouping set S. R0 to R3 denote the four demand types (autonomous, self-driven, friendly and passive), while S1 to S6 denote the attribute pairs (assessment/grades), (assessment/listening rate), (assessment/attendance), (grades/listening rate), (grades/attendance) and (listening rate/attendance), respectively.

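Table 3. Analysis of correlation coefficients

| Group | Type | n | $(S_m)_{u_{ik}}$ | $(S_n)_{u_{ij}}$ | $(S_mR_n)_{u_{ij}u_{ik}}$ | $S_mR_n$ | r |
|---|---|---|---|---|---|---|---|
| S1 | S1R0 | 88 | 8.37 | 6.01 | 7.29 | 0.15 | 0.23 |
| | S1R1 | 68 | 8.91 | 6.29 | 11.02 | 0.22 | |
| | S1R2 | 92 | 8.74 | 7.31 | 18.10 | 0.26 | |
| | S1R3 | 75 | 10.14 | 9.13 | 22.72 | 0.26 | |
| S2 | S2R0 | 88 | 8.37 | 8.38 | 11.75 | 0.18 | 0.2 |
| | S2R1 | 68 | 8.91 | 8.91 | 4.31 | 0.08 | |
| | S2R2 | 92 | 8.72 | 8.74 | 21.90 | 0.30 | |
| | S2R3 | 75 | 10.14 | 10.12 | 16.57 | 0.21 | |
| S3 | S3R0 | 88 | 8.37 | 5.04 | 2.71 | 0.07 | 0.1 |
| | S3R1 | 68 | 8.89 | 5.42 | 7.40 | 0.16 | |
| | S3R2 | 92 | 8.74 | 7.32 | 2.61 | 0.05 | |
| | S3R3 | 75 | 10.12 | 7.47 | 14.43 | 0.20 | |
| S4 | S4R0 | 88 | 5.97 | 8.37 | 9.15 | 0.21 | 0.25 |
| | S4R1 | 68 | 6.29 | 8.91 | 2.98 | 0.05 | |
| | S4R2 | 92 | 7.29 | 8.72 | 22.34 | 0.35 | |
| | S4R3 | 75 | 9.13 | 10.14 | 25.04 | 0.34 | |
| S5 | S5R0 | 88 | 6.00 | 5.06 | 1.47 | 0.04 | 0.07 |
| | S5R1 | 68 | 6.27 | 5.42 | 4.31 | 0.11 | |
| | S5R2 | 92 | 7.28 | 7.31 | -0.11 | 0.00 | |
| | S5R3 | 75 | 9.16 | 7.45 | 9.52 | 0.15 | |
| S6 | S6R0 | 88 | 8.35 | 5.04 | 0.42 | 0.02 | 0.08 |
| | S6R1 | 68 | 8.83 | 5.42 | 1.67 | 0.05 | |
| | S6R2 | 92 | 8.72 | 7.32 | 5.44 | 0.08 | |
| | S6R3 | 75 | 10.14 | 7.45 | 14.88 | 0.23 | |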

It can be found that relationship group S4 (grade/listening rate) has the highest correlation (r = 0.25), followed by S1 (evaluation/grade) (r = 0.23). This shows that, in the training set, the attribute pairs “grade” and “listening rate” and “evaluation” and “grade” are the most strongly correlated across the course demand types, so the cluster analysis should mainly be carried out on these relationship groups.

Cluster analysis of English language teaching data
K-means++ clustering algorithm

K-means is a clustering algorithm with good performance and low complexity, which makes it applicable to large amounts of data. The algorithm first randomly selects k points in the dataset as the initial cluster centers, assigns each data point to a cluster according to its Euclidean distance to the centers, and then iteratively updates the cluster centers until k clusters that meet the requirements are formed.

As a basic algorithm, K-means is robust and can be used for many types of data sets. However, it also has shortcomings: it is sensitive to the choice of k and of the initial centers and to the distribution of the data points, and it converges slowly on large datasets [29]. The dataset to which the algorithm is applied in this paper is large, so the K-means algorithm needs to be improved.

The steps to execute the improved K-means++ algorithm of the paper are as follows:

Step 1: An initial center point $C$ is selected from the data set.

Step 2: For each data point $x_i$ in the data set, calculate its Euclidean distance from the nearest center point $C$ with the following formula:

$$D_i(x) = \sqrt{(a_i - a_j)^2 + (b_i - b_j)^2} \tag{3}$$

where $(a_i, b_i)$ are the coordinates of $x_i$ and $(a_j, b_j)$ are the coordinates of $C$. The cumulative value of the squared Euclidean distances is:

$$S_j = \sum_{x_i \in X} D_i(x)^2 \tag{4}$$

Step 3: Select the new center point $C_j$ based on the probability formula in equation (5), under which points farther from the existing centers are more likely to be chosen:

$$C_j = \frac{D_i(x)^2}{S_j} \tag{5}$$

Step 4: Repeat steps 2 and 3 until k center points have been generated.

It can be seen that the K-means++ algorithm improves the selection of the initial centroids and achieves better clustering performance [30].
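The four seeding steps can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the authors' implementation: the first center is drawn uniformly at random (the paper does not say how it is chosen), points are tuples of coordinates, and the names `sq_dist` and `kmeanspp_centers` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_centers(points, k, rng=None):
    """Choose k initial centers by distance-weighted sampling."""
    rng = rng or random.Random(0)
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Step 1: pick the first center (assumed uniform at random).
    centers = [rng.choice(points)]
    # Step 4: repeat until k centers exist.
    while len(centers) < k:
        # Step 2: squared distance from each point to its nearest center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)  # cumulative value S_j
        if total == 0:
            # Every point coincides with a center; fall back to uniform.
            centers.append(rng.choice(points))
            continue
        # Step 3: sample a point with probability D_i(x)^2 / S_j.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


points = [(0, 0), (0, 0), (5, 5), (5, 5)]
print(sorted(set(kmeanspp_centers(points, 2))))  # [(0, 0), (5, 5)]
```

In the usage example the points already chosen as a center have zero weight, so the second center always comes from the other location. This is the property that spreads the initial centers apart.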

Clustering test for ELT data

The improved K-means++ algorithm was used to test the cluster analysis of the instructional data in order to provide an algorithmic basis for constructing the new instructional model. Based on the data preprocessing in the previous section, the K-means++ algorithm was used to test clustering on 30% of the raw data. Random centroids O (k = 4) were constructed, each point was assigned to its nearest centroid, the centroids were then recalculated, and the process was repeated until the cluster assignments of the data points no longer changed. Following this process, the preprocessed dataset was processed in Python and the graphical results are shown in Figure 2. Figure 2 (a) and (b) show the clustering results for the “grade-listening” and “grade-assessment” relationships, respectively.

Figure 2.

Comparison diagrams of clustering relationship
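The assign-and-recompute loop described for the clustering test can be sketched as below. The code is a generic K-means iteration written for illustration, with hypothetical names (`assign`, `kmeans`) and toy two-dimensional points; it is not the program used to produce Figure 2, and keeping the old centroid when a cluster becomes empty is an assumption.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        for p in points
    ]


def kmeans(points, initial_centers, max_iter=100):
    """Iterate assignment and centroid update until labels stop changing."""
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centers


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels, centers = kmeans(points, [(0, 0), (10, 0)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(0.0, 1.0), (10.0, 1.0)]
```

In practice the initial centers would come from the K-means++ seeding steps rather than being supplied by hand as in this toy example.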

The clustering results are tested separately for the two relationship groupings. Let the calibrated number of samples in a particular class be $N_c$, and the number obtained for that class through the correlation analysis and clustering in this paper be $N_c'$; the accuracy for this class is then defined as $p = N_c'/N_c$. In this paper, the validity of the test is measured by the post-clustering accuracy $p$.

Figure 3 shows the evaluation results for the two relationship groupings, “lecture-grade” and “grade-evaluation”. The average classification accuracy for “grade” and “listening rate” is $\bar{p} = 91.7\%$, and that for “evaluation” and “grade” is $\bar{p} = 82.7\%$. The evaluation accuracy of the “lecture-grade” relationship is therefore the higher of the two, which is consistent with the result of the correlation analysis.
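One way to read the accuracy measure p is sketched below. The interpretation is an assumption: for each class, the denominator counts the samples manually calibrated as that class and the numerator counts those among them that the clustering also places in that class, with cluster indices taken to be already matched to class labels. The function names and label lists are hypothetical and do not reproduce the paper's figures.

```python
def class_accuracy(calibrated, clustered):
    """Per-class accuracy: matched count divided by calibrated count."""
    if len(calibrated) != len(clustered):
        raise ValueError("label lists must have equal length")
    accuracy = {}
    for c in sorted(set(calibrated)):
        n_c = sum(1 for t in calibrated if t == c)
        n_match = sum(
            1 for t, p in zip(calibrated, clustered) if t == c and p == c
        )
        accuracy[c] = n_match / n_c
    return accuracy


def mean_accuracy(accuracy):
    """Average of the per-class accuracies."""
    return sum(accuracy.values()) / len(accuracy)


calibrated = [0, 0, 1, 1, 1, 2]
clustered = [0, 1, 1, 1, 0, 2]
acc = class_accuracy(calibrated, clustered)
print(acc[0], acc[2])  # 0.5 1.0
print(round(mean_accuracy(acc), 4))  # 0.7222
```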

Figure 3.

Comparison of relationship grouping evaluation

Data-driven multiple regression analysis of teaching model construction and teaching quality

Based on the previous analysis of the innovative trend of English education in colleges and universities and the preliminary analysis of English teaching data, this paper proposes a data-driven informatization education model for English in colleges and universities and a teaching quality evaluation method based on multiple linear regression.

Constructing Informatization Teaching Mode of English in Colleges and Universities

Most traditional English informatization teaching models in colleges and universities take the MOOC as the course format, whose advantages lie mainly in the openness of its content and form. However, the shortcomings of MOOCs are also obvious: there are no prerequisites for learners, no limit on class size, a low completion rate and no formal credit certification, and open online exams are prone to academic integrity problems.

SPOC, as a form of online course derived from MOOC, adheres to the teaching concept and teaching design of MOOC, utilizes the resources of high-quality MOOCs, improves the teaching methods and processes in schools, and improves the teaching effect. Therefore, this paper proposes a college English informatization teaching model based on SPOC flipped classroom.

The teaching activities of SPOC can be divided into pre-course orientation, classroom research and post-course practice, forming a complete flipped classroom teaching process that organically combines the physical classroom with network teaching. Drawing on the work of many scholars and on the concept of SPOC, this paper uses an existing network platform and the informatization teaching framework designed above, combined with the characteristics of the flipped classroom and blended learning, to construct a SPOC-based informatization classroom teaching process model. The process model of the SPOC-based English informatization teaching mode is shown in Figure 4.

Figure 4.

Process model of English information based teaching model based on SPOC

E-learning platforms

The effective development of the blended teaching mode cannot be separated from the support of a network platform. The platform can take an existing stable and easy-to-operate online course platform or a high-quality resource sharing course website as its carrier, which the teacher redesigns and regularly maintains for learners. Digital resources are the core of the network platform, and each course is independent. The functions of different network teaching platforms generally include a resource area, a communication area, a management area and a learning area. The network teaching platform provides the environment in which SPOC teaching activities are realized. Teachers use the platform to build resources, release learning tasks, manage students and interact with students; students use the platform to acquire the learning materials they need, carry out independent learning, complete online tests and discussions, and come to understand and apply the knowledge.

Pre-course orientation

Before the class, the teacher is the creator and integrator of the course resources, and the designer of the content and progress of the “SPOC flipped classroom”, and the students are the implementers of the “flipped classroom”. Teachers release “learning tasks” and push learning resources, and students independently choose a variety of high-quality resources for online learning through the online platform, watch micro-videos to understand the main content of the classroom in advance, and identify problems.

Classroom research

In the classroom, the teacher is the guide and facilitator of teaching activities, providing individualized guidance to students, organizing group seminars, carrying out project training, jointly solving problems encountered, and providing feedback on classroom problems. The classroom teaching methodology has shifted from monolithic theory teaching to diversified collaboration, inquiry, discussion and interaction. Teachers are the leaders of “SPOC Flipped Classroom”, summarizing the knowledge points and giving new tasks according to the students’ situation, and students understand and internalize the knowledge through practical operation.

Practice learning after school

After class, the teacher is a supporter, assigning after-class homework and administering after-class tests to help students assess what they have learned. Based on the completed homework, outstanding student work is selected for display and sharing on the network platform, and the teacher provides targeted counseling and evaluation. With the teacher’s assistance and through intra-group and inter-group collaboration, students consolidate and deepen what they have learned and prepare for the next stage of learning.

Teaching quality evaluation method of SPOC teaching model based on multiple regression
Multiple linear regression algorithms

Simple (univariate) linear regression is the simplest linear regression model. It analyzes the linear relationship between one independent variable and the dependent variable, and its basic idea is to predict the value of the dependent variable by modeling that relationship. In real problems, however, the dependent variable is often affected by several variables acting together. In that case univariate linear regression cannot adequately predict the dependent variable, and two or more variables that act together must be introduced to explain its changes, which is multiple regression. When the relationship between the multiple independent variables and the dependent variable is linear, this paper refers to it as multiple linear regression. Multiple linear regression is a statistical method that uses multiple independent variables to predict a dependent variable; it can analyze the relationship between the independent variables and the dependent variable and estimate the functional form between them. The modeling and parameter calculation process is as follows:

When $y$ is the dependent variable, $x_1, x_2, \cdots, x_i$ are the independent variables, and there is a linear relationship between the independent variables and the dependent variable, the general form of the multiple linear regression model is as follows:

$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_i x_i + e \tag{6}$$

where $a_0$ is the constant term, $a_1, a_2, \cdots, a_i$ are the regression coefficients, and $e$ is the error term.

If two independent variables $x_1$, $x_2$ and the dependent variable $y$ are linearly correlated, the multiple linear regression model is:

$$y = a_0 + a_1 x_1 + a_2 x_2 + e \tag{7}$$

Parameter estimation for the multiple regression model, as for the simple linear regression equation, uses least squares: the parameters are chosen so that the sum of squared errors ($\sum e^2$) is minimized.

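For the two-variable linear regression model, the standard set of equations for solving the regression parameters is shown in equation (8):

$$\begin{cases} \sum y = n a_0 + a_1 \sum x_1 + a_2 \sum x_2 \\ \sum x_1 y = a_0 \sum x_1 + a_1 \sum x_1^2 + a_2 \sum x_1 x_2 \\ \sum x_2 y = a_0 \sum x_2 + a_1 \sum x_1 x_2 + a_2 \sum x_2^2 \end{cases} \tag{8}$$

The values of $a_0$, $a_1$ and $a_2$ can be found by solving this system, which can also be solved in matrix form:

$$a = (x'x)^{-1}(x'y) \tag{9}$$

$$\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} n & \sum x_1 & \sum x_2 \\ \sum x_1 & \sum x_1^2 & \sum x_1 x_2 \\ \sum x_2 & \sum x_1 x_2 & \sum x_2^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum y \\ \sum x_1 y \\ \sum x_2 y \end{bmatrix} \tag{10}$$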
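The normal equations for two independent variables can be solved numerically as in the following sketch. It builds the 3 x 3 system from the sums in the equations above and solves it by Gaussian elimination rather than by forming an explicit inverse, which is a design choice of this example. The function names and the sample data are hypothetical; the data are constructed so that y = 1 + 2·x1 + 3·x2 holds exactly.

```python
def solve(A, b):
    """Solve a small dense linear system by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_two_predictors(x1, x2, y):
    """Least-squares estimates [a0, a1, a2] from the normal equations."""
    n = len(y)
    A = [
        [n, sum(x1), sum(x2)],
        [sum(x1), dot(x1, x1), dot(x1, x2)],
        [sum(x2), dot(x1, x2), dot(x2, x2)],
    ]
    b = [sum(y), dot(x1, y), dot(x2, y)]
    return solve(A, b)


x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [4, 3, 11, 10, 18]  # exactly 1 + 2*x1 + 3*x2
print([round(a, 6) for a in fit_two_predictors(x1, x2, y)])  # [1.0, 2.0, 3.0]
```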

It should be noted that the correlation between independent variables needs to be considered when modeling with multiple linear regression models. If there is a high degree of correlation between the independent variables, it may lead to a decrease in the accuracy of the multiple linear regression model, and at this time, the use of feature selection, principal component analysis and other methods can be considered to reduce the correlation between the independent variables, so as to improve the accuracy of the model.

Multicollinearity test

Multicollinearity is a situation in which the independent variables of a multiple regression model are highly correlated with one another. It can make the estimated regression coefficients inaccurate, make statistical inference difficult, and even cause the model to fail. The variance inflation factor (VIF) is used to describe the severity of multicollinearity among multiple variables. It is the ratio of the variance of an estimated regression coefficient to the variance that coefficient would have if the independent variables were not linearly related. When 0 < VIF < 5, there is no multicollinearity. If 5 < VIF < 10, there is weak multicollinearity. When 10 < VIF < 100, the multicollinearity is moderate. Severe multicollinearity occurs when VIF is greater than 100. The calculation formula is shown in equation (11):

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2} \tag{11}$$

where $\mathrm{VIF}_j$ is the variance inflation factor of the j-th explanatory variable and $R_j^2$ is the coefficient of determination of the auxiliary regression of that variable on the other explanatory variables.

In this paper, we choose to utilize the variance inflation factor (VIF) to test for the presence of multicollinearity in the independent variables.
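The VIF calculation can be sketched with the standard library alone. The example regresses one predictor on the others (with an intercept) to obtain the auxiliary coefficient of determination and then applies the VIF formula. All names and data here are hypothetical, and the treatment of the boundary values 5, 10 and 100 in `severity` is an assumption, since the thresholds in the text are stated with strict inequalities.

```python
def _solve(A, b):
    """Gaussian elimination with partial pivoting for a small system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def r_squared(predictors, y):
    """R^2 of the least-squares fit of y on the predictors plus intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in predictors]
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(a * v for a, v in zip(ci, y)) for ci in cols]
    beta = _solve(A, b)
    fitted = [sum(bk * col[i] for bk, col in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant variable")
    return 1 - ss_res / ss_tot


def vif(columns, j):
    """Variance inflation factor of column j given the other columns."""
    others = [c for i, c in enumerate(columns) if i != j]
    r2 = r_squared(others, columns[j])
    return float("inf") if r2 >= 1 else 1 / (1 - r2)


def severity(value):
    """Map a VIF value to the four levels used in the text."""
    if value < 5:
        return "none"
    if value < 10:
        return "weak"
    if value < 100:
        return "moderate"
    return "severe"


cols = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 6]]
v = vif(cols, 0)
print(round(v, 6), severity(v))  # 37.0 moderate
```

In the usage example the two columns are nearly proportional, so the auxiliary R² is 144/148 and the VIF is 37, which falls in the moderate band.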

Analysis of multiple regression results

Single-factor linear regression analysis

In order to study the influence of pre-study rate, attendance rate, listening rate, course grade and evaluation grade on the final exam grade, this paper first studies the influence of each factor on the final exam grade. A linear regression of the dependent variable on each independent variable is established separately, and the model is as follows:

$$Y = \beta_{0i} + \beta_{1i} X_i + \varepsilon_i, \quad i = 1, 2, \ldots, 5 \tag{12}$$

where $\beta_{0i}$ and $\beta_{1i}$ are the regression parameters to be estimated and $\varepsilon_i$ is the random error. The regression parameter estimates and their confidence intervals, together with the test statistics $R^2$, $F$, $p$ and $s^2$ obtained using statistical software, are shown in Table 4.

Table 4. Calculation results of the single-factor linear regression models

Parameter Estimate Confidence interval
β01 0.4843 [0.4064, 0.5631]
β11 0.3185 [0.2182, 0.4192]
R2 = 0.2043, F = 39.7541, p = 0.0000, s2 = 0.0374
β02 -0.9687 [-1.945, 0.0082]
β12 1.7012 [0.7135, 2.6789]
R2 = 0.0729, F = 11.5871, p = 0.0005, s2 = 0.0421
β03 0.3712 [0.2538, 0.4957]
β13 0.5876 [0.3872, 0.7831]
R2 = 0.1879, F = 36.3412, p = 0.0000, s2 = 0.0384
β04 0.2395 [0.1728, 0.3116]
β14 0.7221 [0.6234, 0.8123]
R2 = 0.5871, F = 207.3967, p = 0.0000, s2 = 0.0198
β05 0.3387 [0.0184, 0.6621]
β15 0.3871 [0.0553, 0.7213]
R2 = 0.0354, F = 5.3687, p = 0.0213, s2 = 0.0443

Table 4 shows that the p value of each model is less than 0.05 and each F value exceeds its critical value, indicating that the pre-study rate, attendance rate, listening rate, course grade, and evaluation grade each have a significant effect on the final exam grade. Measured by $R^2$, the course grade ($X_4$) contributes the most, explaining 58.71% of the variation in the final exam grade, followed by the pre-study rate ($X_1$, 20.43%), the listening rate ($X_3$, 18.79%), the attendance rate ($X_2$, 7.29%), and the evaluation grade ($X_5$, 3.54%). The classroom teaching data thus indicate that the course grade, the pre-study rate, and the listening rate have the greatest influence on the English final exam results, so management and supervision of pre-class study, classroom answering, and after-class homework should be strengthened in the teaching process.
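The single-factor procedure can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not the statistical software used in the paper: the function name `simple_ols` and the synthetic data are hypothetical, and $s^2$ is taken here to be the residual variance estimate SSE/(n − 2), which the text does not define explicitly.

```python
import numpy as np

def simple_ols(x, y):
    """Fit Y = b0 + b1 * X by least squares and return summary statistics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = float(((x - x.mean()) ** 2).sum())
    b1 = float(((x - x.mean()) * (y - y.mean())).sum()) / sxx
    b0 = float(y.mean()) - b1 * float(x.mean())
    resid = y - (b0 + b1 * x)
    ss_res = float((resid ** 2).sum())           # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())  # total sum of squares
    s2 = ss_res / (n - 2)                        # residual variance (assumed meaning of s^2)
    return {
        "b0": b0,
        "b1": b1,
        "r2": 1.0 - ss_res / ss_tot,
        "f": (ss_tot - ss_res) / s2,             # F statistic with (1, n - 2) degrees of freedom
        "s2": s2,
    }

# Synthetic stand-in for the five indicators X1..X5 (hypothetical values in [0, 1]).
rng = np.random.default_rng(0)
n = 150
X = rng.uniform(0.0, 1.0, size=(n, 5))
y = 0.2 + 0.1 * X[:, 0] + 0.7 * X[:, 3] + rng.normal(scale=0.1, size=n)

# One separate regression per indicator, as in the single-factor analysis.
results = {i + 1: simple_ols(X[:, i], y) for i in range(5)}
for i, res in results.items():
    print(f"X{i}: b0={res['b0']:.4f} b1={res['b1']:.4f} "
          f"R2={res['r2']:.4f} F={res['f']:.2f} s2={res['s2']:.4f}")
```

Ranking the indicators by the `r2` entry reproduces the kind of comparison made from Table 4, where each single-factor $R^2$ is read as that factor's contribution.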

Multifactor linear regression analysis

In order to study the joint impact of pre-study rate, attendance rate, listening rate, course grade, and evaluation grade on the final exam grade, this paper establishes the multiple linear regression model (1): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \varepsilon$

where $\beta_0, \beta_1, \ldots, \beta_5$ are the regression parameters to be estimated and $\varepsilon$ is the random error. Table 5 shows the calculation results of the multifactor linear regression model (1).

Table 5. Calculation results of the multifactor linear regression model (1)

Parameter Estimate Confidence interval
β0 0.8213 [0.0645, 1.5578]
β1 0.1073 [0.0213, 0.1984]
β2 -0.8531 [-1.7682, 0.0536]
β3 0.0573 [-0.1328, 0.2368]
β4 0.6742 [0.5574, 0.7921]
β5 0.1983 [-0.0651, 0.4622]
R2 = 0.6151, F = 46.1386, p = 0.0000, s2 = 0.0175

Table 5 shows that the p value is less than 0.05 and the F value exceeds its critical value, indicating that the model is valid overall. However, the confidence intervals of the parameters $\beta_2$, $\beta_3$, and $\beta_5$ contain zero, indicating that the effects of attendance rate, listening rate, and evaluation grade on the final exam grade are not significant.

Removing the non-significant factors gives the linear regression model (2): $Y = k_0 + k_1 X_1 + k_4 X_4 + \varepsilon$

The first set of calculation results (I) for the multifactor linear regression model (2) is shown in Table 6. Since p < 0.05 and the F value is much larger than the critical value, the model is usable overall, and 60.31% of the variation in the final exam grade can be explained by the model.
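The step from model (1) to model (2) can be illustrated with a short Python sketch: fit the full model, keep only the predictors whose confidence interval excludes zero, and refit. This is a sketch under assumptions, not the paper's procedure as implemented: the function names `fit_ols` and `predictors_excluding_zero` and the synthetic data are hypothetical, and the intervals use a large-sample normal approximation in place of the exact t quantile.

```python
import numpy as np
from statistics import NormalDist

def fit_ols(X, y, level=0.95):
    """Least-squares fit of y on X (intercept added) with approximate intervals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])       # design matrix with intercept
    k = A.shape[1]                              # number of estimated parameters
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)
    s2 = sse / (n - k)                          # residual variance estimate
    se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    z = NormalDist().inv_cdf(0.5 + level / 2)   # normal approximation (assumption)
    sst = float(((y - y.mean()) ** 2).sum())
    return {"beta": beta, "lower": beta - z * se, "upper": beta + z * se,
            "r2": 1.0 - sse / sst, "s2": s2}

def predictors_excluding_zero(fit):
    """1-based indices of predictors whose interval does not contain zero."""
    lo, hi = fit["lower"], fit["upper"]
    return [j for j in range(1, len(lo)) if lo[j] > 0 or hi[j] < 0]

# Synthetic data (hypothetical): only X1 and X4 truly affect y.
rng = np.random.default_rng(1)
n = 150
X = rng.uniform(0.0, 1.0, size=(n, 5))
y = 0.2 + 0.5 * X[:, 0] + 0.7 * X[:, 3] + rng.normal(scale=0.1, size=n)

full = fit_ols(X, y)                            # analogue of model (1)
kept = predictors_excluding_zero(full)
reduced = fit_ols(X[:, [j - 1 for j in kept]], y)  # analogue of model (2)

print("kept predictors:", kept)
print("R^2 full / reduced:", round(full["r2"], 4), round(reduced["r2"], 4))
```

Because the reduced model is nested in the full one, its $R^2$ can only be equal or slightly lower, which matches the small drop from 0.6151 to 0.6031 reported in Tables 5 and 6.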

Table 6. Calculation results (Ⅰ) of the multifactor linear regression model (2)

Parameter Estimate Confidence interval
k0 0.2011 [0.1276, 0.2732]
k1 0.1153 [0.0364, 0.1935]
k4 0.6531 [0.5384, 0.7633]
R2 = 0.6031, F = 112.9138, p = 0.0000, s2 = 0.0184

Figure 5 shows the distribution of the residuals. For 16 data points, the residual confidence interval does not contain zero, so these points are treated as outliers.

Figure 5.

Distribution of the residuals

After these outliers are eliminated and the model is refitted, the second set of calculation results (II) for the multifactor linear regression model (2) is obtained, as shown in Table 7. The regression parameters $k_0$, $k_1$, and $k_4$ change little, the confidence intervals of the parameters become narrower, the values of $R^2$ and $F$ become larger, and the residual variance $s^2$ becomes smaller. This indicates that the modified model fits the data better, and 78.43% of the variation in the final exam grade can be explained by the model.

Table 7. Calculation results (Ⅱ) of the multifactor linear regression model (2)

Parameter Estimate Confidence interval
k0 0.1864 [0.1324, 0.2379]
k1 0.1012 [0.0443, 0.1566]
k4 0.7036 [0.6311, 0.7792]
R2 = 0.7843, F = 244.1613, p = 0.0000, s2 = 0.0081
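The outlier-removal step can be sketched as follows. The paper does not name the software or the exact rule behind the residual confidence intervals, so this example substitutes a common stand-in: a point is flagged when its externally studentized residual exceeds a normal-approximation critical value. The function names `fit` and `flag_outliers` and the synthetic data with planted outliers are assumptions made for illustration only.

```python
import numpy as np
from statistics import NormalDist

def fit(X, y):
    """Least-squares fit with intercept; returns design matrix, residuals, R^2, s^2."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    return A, resid, 1.0 - sse / sst, sse / (n - A.shape[1])

def flag_outliers(X, y, level=0.95):
    """Indices whose externally studentized residual is large (assumed rule)."""
    A, resid, _, _ = fit(X, y)
    n, k = A.shape
    # Leverage h_i = a_i^T (A^T A)^{-1} a_i for every observation.
    h = np.einsum("ij,jk,ik->i", A, np.linalg.inv(A.T @ A), A)
    sse = float(resid @ resid)
    # Residual variance estimated with observation i left out.
    s2_loo = (sse - resid ** 2 / (1.0 - h)) / (n - k - 1)
    t = resid / np.sqrt(s2_loo * (1.0 - h))
    z = NormalDist().inv_cdf(0.5 + level / 2)   # normal approximation (assumption)
    return np.flatnonzero(np.abs(t) > z)

# Synthetic data (hypothetical) with five planted outliers.
rng = np.random.default_rng(2)
n = 150
X = rng.uniform(0.0, 1.0, size=(n, 2))
y = 0.2 + 0.1 * X[:, 0] + 0.7 * X[:, 1] + rng.normal(scale=0.1, size=n)
planted = rng.choice(n, size=5, replace=False)
y[planted] += 1.0

_, _, r2_before, s2_before = fit(X, y)
flagged = flag_outliers(X, y)
keep = np.ones(n, dtype=bool)
keep[flagged] = False
_, _, r2_after, s2_after = fit(X[keep], y[keep])

print("flagged points:", flagged.size)
print("R^2 before / after:", round(r2_before, 4), round(r2_after, 4))
print("s^2 before / after:", round(s2_before, 4), round(s2_after, 4))
```

As in the comparison between Tables 6 and 7, removing the flagged points raises $R^2$ and lowers $s^2$ in this synthetic example. Note that any such rule also flags some ordinary points by chance, so removed observations deserve inspection before they are discarded.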

In the single-factor linear regression analysis, the effects of pre-study rate, attendance rate, listening rate, course grade, and evaluation grade on the final exam grade are all significant. In the multifactor linear regression analysis, however, only the pre-study rate and the course grade have a significant effect on the final exam grade, and the course grade has the largest coefficient and the highest contribution. The regression results suggest that there is multicollinearity among attendance rate, listening rate, course grade, and evaluation grade, and that the course grade largely reflects the attendance rate, listening rate, and evaluation grade.

Based on the above multiple regression procedure, the quality of English teaching after applying the SPOC college English informatization teaching model can be assessed and analyzed, so that the English teaching model constructed in this paper can be continuously optimized and the quality of English teaching in colleges and universities improved.

Conclusion

This paper focuses on the correlation analysis and clustering of basic data on English teaching, and proposes a data-driven English education model. The correlation analysis shows that English grades have the highest correlations (r=0.25, 0.23) with listening rate and assessment. Combined with the higher average classification accuracy of the “listening-grade” relationship (91.7%>82.7%), it is concluded that the listening rate has the highest correlation with English grades. Based on this teaching rule, a SPOC informationized teaching model aimed at improving classroom attention was constructed, and a teaching quality assessment method using multiple linear regression was then proposed. After the non-significant factors are removed, the results of the multifactor linear regression analysis show that the pre-study rate and course grade explain most (60.31%) of the variation in English final exam scores. Therefore, in order to better assess the teaching quality achieved by the SPOC informationized teaching model proposed in this paper, more attention should be paid to students’ pre-study and course grades.
