A Study of Online Behavioural Data Analysis and Teaching Intervention Strategies for College English Learners
Published online: 03 Feb 2025
Received: 01 Oct 2024
Accepted: 06 Jan 2025
DOI: https://doi.org/10.2478/amns-2025-0017
© 2025 Hui Zhang, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
A learner behavioural portrait is a learning analytics technique that uses big data and data mining to collect, clean, and analyse the learning data generated by users in a multimodal learning environment. It generates labels, such as learning patterns and style preferences, that represent the characteristics of a learner’s behaviour, which helps to provide accurate real-time feedback and support personalised learning [1–4].
Online English teaching gives students space for independent learning, allowing them to study according to their own needs and goals [5–6]. Targeted teaching methods help students identify and remedy their weaknesses and so improve their English proficiency. During teaching, teachers can also use the online platform to communicate with students, help them find gaps in their learning, and provide guided instruction [7–8]. Because independent online learning has no classroom constraints, it requires strong self-control from students. Teachers can monitor students’ learning behaviours and adjust teaching objectives according to feedback from the self-study process [9–10]. Online teaching resources are rich and diverse. Teachers should therefore combine effective online teaching content with the characteristics of the curriculum, consider the effectiveness of online courses comprehensively, continually stimulate students’ learning motivation, and adjust teaching content and curriculum to the learning objectives [11–13]. Before teaching begins, teachers can build a profile of each student, understand individual needs, and tailor instruction accordingly. College students also often adjust their learning goals during their studies, for example to prepare for the College English Test Band 4 or Band 6 (CET-4/CET-6) or for the postgraduate entrance examination in English. Teachers should guide students in a timely manner according to their different learning behaviours, provide useful English learning materials, and help them reach their goals [14–15].
With the continuous development of Internet technology, online English teaching has become an important trend in modern English teaching, and the traditional English teaching mode increasingly fails to meet the needs of the times. Online English teaching not only removes the time and space limitations of teaching but also gives students and teachers a more efficient and convenient way to learn. Liu, Y et al. developed an intelligent online English teaching module oriented to students’ English learning needs, covering homework management, course notification, and visitor management, and put it into practice after integrating it into an online English teaching system, contributing to the optimisation of web-based intelligent English teaching platforms [16]. Liu, Z. Y et al. investigated the attitudes and perceptions of Chinese and Russian students towards distance online English learning. They found that students’ attitudes were positive overall but that there was still much room for improvement, and they put forward targeted suggestions for optimising distance online English teaching based on the students’ feedback [17]. Sari, F. M used a questionnaire to examine students’ perceptions of online language learning classrooms. Students mostly showed positive attitudes, but overloaded online coursework, unstable course content, and problems with Internet devices hindered their participation [18].
Gao, W, aiming to improve the quality of English teaching, designed an interactive English teaching paradigm that combines Internet of Things (IoT) technology with a large number of sensor devices, and teaching practice confirmed the feasibility and efficiency of the proposed scheme [19]. Wang, Y et al. conducted a modelling analysis using Partial Least Squares Structural Equation Modelling (PLS-SEM). The analysis showed that students were very receptive to Internet-based online English teaching and indicated that teachers could improve students’ academic self-efficacy by providing access to language e-learning and implementing student-centred instructional strategies, which helps to enhance students’ competence and academic confidence [20]. Agung, A. S. N et al. recorded students’ perceptions of online English learning in the English Language Education Learning Programme at the Palmantarino College of Education, together with other related data. They found that the factors affecting the quality of online English teaching and learning were the availability and sustainability of the Internet connection, the accessibility of the instructional media, and the compatibility between media tools, with the accessibility of the instructional media being the most critical [21]. Ningsih, P. E. A et al. evaluated the use of learning media in online English classrooms from the perspective of teachers’ and students’ perceptions. They found that WhatsApp Group was a very common and effective learning medium in online learning, while unstable power supply and network connections seriously hampered normal teaching and learning [22].
Online education, however, faces dilemmas such as low course completion rates, low engagement, and low learning autonomy, which have led stakeholders to reflect on the quality of online learning and seek solutions. It is therefore necessary to provide personalised intervention strategies for different groups to improve the effectiveness of online learning. Kizilcec, R. F et al. explored students’ self-regulated learning (SRL) performance in six MOOCs. They found that students with strong SRL habitually review past course material and pointed out that learners’ behavioural characteristics can be used as predictors of SRL competence [23]. Aguilera-Hermida, A. P., combining qualitative and quantitative methods, examined students’ self-efficacy during online learning in special situations. Technology applications promoted students’ commitment to online learning, but students preferred face-to-face classroom learning to online learning [24]. Dhawan, S discussed the forced shift from offline education to online teaching and learning during the COVID-19 pandemic and analysed the advantages and challenges of online learning, providing advice to educational institutions on how to carry out online teaching and learning [25]. Singh, V et al. examined studies and literature on the definition of online learning, conducted a content analysis, and concluded that the evolution of the definition of online learning mirrors the technological development of the past three decades [26]. Rodrigues, H et al. reported on cutting-edge research and the latest practices in e-learning. Based on an in-depth reading and analysis of the e-learning research literature from 2010 to 2018, they identified the themes of e-learning research, namely educational system construction and issues related to online learning [27].
This study uses social network analysis and the DBSCAN clustering method to construct a behaviour analysis model for the online behaviour of English learners. First, data on English learners’ learning behaviours are collected from online learning platforms, quantified, and preprocessed. The article then explores interactions among learners and between teachers and learners in learning forums, analyses learners’ behavioural characteristics in terms of learning frequency and learning time, and finally uses the DBSCAN clustering algorithm to divide English learners into three types based on 14 behavioural characteristics, proposing intervention strategies for each type.
According to behavioural science, English learners use online learning platforms according to their own needs, and behavioural science focuses on observable and measurable overt behaviour. During online learning, the operations of English learners can be observed and quantified, so the analysis of online learning behaviours takes online operations as its entry point and studies them by observing, describing, and refining the learning operations of English learners [28–29].
English learners produce a series of different behaviours when they study on the learning platform, and these lead to different learning outcomes. Of the six elements of behaviour, the subject, the object, the environment, and the result are common to all learners, whereas the tools and means adopted and the behavioural process are unique to each learner. Therefore, according to these two elements, online learning behaviours are divided into five categories: course access, video viewing, homework performance, usual performance, and interaction.
A total of 19 features were extracted based on theories such as behavioural science and behaviourist psychology, and the feature thresholds were chosen from the three commonly used measures of central tendency (mean, median, and mode), as shown in Table 1.
Table 1. Online learning behaviour characteristics
Feature category | Behavior characteristics | Code |
---|---|---|
Course access | Online completion learning | OCD |
 | Online submission task | OCN |
 | After class | PT |
 | After class | PMN |
Video viewing | Online view | ST |
 | Online learning | LWT |
Job performance | Whether homework is repeated | SVN |
 | Job submission moment | OCPN |
 | Is the assignment submitted | OCRN |
 | Finish your homework online | OTPN |
Interaction situation | Reply online | OTRN |
 | Interactive online learning | LIN |
 | After class | PPN |
 | After-school reply | PRN |
The behavioural features listed in Table 1 are quantified one by one and the basis for the selection of feature thresholds is given.
In the case of course visits, the sign-in rate can reflect the most basic learning situation of English learners, and the formula is as in (1):
Where
If there are multiple modes, the median can be chosen as an alternative threshold, or the multiple modes can be discussed separately in terms of threshold selection and categorisation, as below. The actual number of active days for ELLs is recorded as
If
In the video viewing situation, the viewing frequency can reflect the learning input of ELLs, in general, the total number of videos is taken as the minimum number of times ELLs should watch them, which is set as
If
The length of viewing can reflect the degree of English learners’ mastery of the course, set the total length of the video as
If
Assignment submission time can reflect the self-efficacy of English learners, set the assignment submission period for
If
Assignment results can indicate the learner’s attitude towards the course, set the total number of assignments for
In the interactive case, the total post content length threshold is the median of the post content contribution rate, and the post content contribution rate is calculated as in (10):
Where
The post depth threshold is the median of the post count contribution rate, and the post count contribution rate is calculated as in (11):
Where
The quantification of the remaining behavioural characteristics is broadly similar to the process used to quantify the above characteristics and will not be repeated.
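The threshold rule described above (use a measure of central tendency, and fall back to the median when there are several modes) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_threshold`, the sample `active_days` values, and the use of `>=` when flagging a learner are all assumptions made for the example.

```python
from statistics import median, multimode

def select_threshold(values):
    """Return the mode as threshold if it is unique, otherwise the median."""
    modes = multimode(values)  # all most-frequent values, in first-seen order
    if len(modes) == 1:
        return modes[0]
    # Several modes (or all values distinct): fall back to the median.
    return median(values)

# Hypothetical numbers of active days for seven learners.
active_days = [3, 5, 5, 7, 9, 9, 12]
threshold = select_threshold(active_days)

# Assumed convention: a learner meets the feature if the value reaches the threshold.
flags = [1 if d >= threshold else 0 for d in active_days]
```

Here the sample has two modes (5 and 9), so the median is used as the threshold.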
This study collected behavioural data from students at a university who studied course X on a MOOC platform in the first semester of the 2020-2021 academic year, together with their assignment grades on the platform and their grades in the final offline exam. A dataset containing 14 behavioural features was collected. However, the amount of data that completely covered all 14 features was limited, and in the end only 115 records were extracted, all of them valid. The records include students’ personal information, learning behaviour data, and final exam results, the last of which is taken as the learning effect of English learners.
Data discretisation
In the original course data, final grades are percentages. To make it easier to predict the final learning effect from the training data, the grades are discretised: grades in the interval [90,100] are classified as grade A, [80,89] as grade B, [60,79] as grade C, and [0,59] as grade D. The behavioural data were preprocessed according to the behavioural features listed in Table 1 and their respective thresholds. This corresponds to the quantitative processing of behavioural features in the previous subsection.
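The grade discretisation above can be written as a small mapping function. This is only a sketch: the name `grade_level` is hypothetical, and treating each interval's lower bound as the cut-off (so that a non-integer score such as 89.5 falls into grade B) is an assumption, since the source gives integer intervals only.

```python
def grade_level(score):
    """Map a percentage score to the grade bands A-D used in the text."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    if score >= 90:
        return "A"  # [90, 100]
    if score >= 80:
        return "B"  # [80, 89]
    if score >= 60:
        return "C"  # [60, 79]
    return "D"      # [0, 59]

# Hypothetical final exam scores.
scores = [95, 80, 79, 59]
levels = [grade_level(s) for s in scores]
```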
Data Normalisation
Because scales and attributes differ across categories of data, the data need to be standardised. The two commonly used methods are min-max standardisation and Z-score standardisation.
Min-max standardisation: also called range standardisation, this is a linear transformation that maps the original data linearly onto the interval [0, 1].
Z-score standardisation: this is based on the mean and standard deviation of the original data, so that the standardised variable has a mean of 0 and a standard deviation of 1.
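The two standardisation methods can be illustrated with a short sketch. The function names `min_max` and `z_score` and the sample values are hypothetical, and the use of the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption, since the source does not say which was used.

```python
from statistics import mean, pstdev

def min_max(values):
    """Linearly map values onto [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Centre on the mean and scale by the standard deviation: (x - mu) / sigma."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical viewing durations in minutes.
durations = [2, 4, 6]
scaled = min_max(durations)
standardised = z_score(durations)
```

Note that Z-scores are not bounded: values more than one standard deviation from the mean fall outside [-1, 1].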
The density of a graph refers to the closeness of the relationships between nodes in a community network, and it is an important and frequently used variable in social network research. A network may be closely or loosely connected. In general, a closely connected community shows more cooperative behaviour, easier information exchange, and better group performance, whereas a loosely connected community suffers from poor information flow and little contact. The density of the overall network is calculated as follows: if there are
Centrality is an important index in social network research. It is used to evaluate the position of a node in the network, that is, what kind of power or central position the node holds in the network structure, and it reflects the importance of the node. Network centrality can be divided into three indicators: degree centrality, proximity (closeness) centrality, and mediation (betweenness) centrality.
Degree centrality: degree centrality is the total number of relationships an actor has, and it is a common indicator in social network analysis of who occupies the centre of a group. Equation (15) gives the absolute value, i.e., the sum of the number of relationships an actor has. Equation (16) gives the normalised value, i.e., that sum divided by the maximum number of relationships possible in the network, which is mostly used for comparisons between different networks. Equation (17) measures the degree centrality of a group: it represents the difference between the degree centrality of the most central actor in a network and the degree centrality of the other actors. The greater the difference, the higher the degree centrality of the group.
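The three degree-based measures described above can be sketched for a small undirected, unweighted network. This is an illustration under stated assumptions, not the source's equations (15) to (17), which are not reproduced here: the normalised value divides by n - 1 (the largest possible degree), and the group measure uses Freeman's degree centralisation, which is one common formalisation of "the difference between the most central actor and the others". The function names and the star network are hypothetical.

```python
def degree_centrality(adj):
    """Return (absolute, normalised) degree lists for a symmetric 0/1 matrix."""
    n = len(adj)
    absolute = [sum(1 for j in range(n) if j != i and adj[i][j]) for i in range(n)]
    normalised = [d / (n - 1) for d in absolute]  # n - 1 is the maximum degree
    return absolute, normalised

def group_degree_centralisation(adj):
    """Freeman's degree centralisation (assumed form); requires n >= 3."""
    n = len(adj)
    absolute, _ = degree_centrality(adj)
    c_max = max(absolute)
    # Sum of gaps to the most central actor, divided by the largest
    # possible sum, which is attained by a star network.
    return sum(c_max - d for d in absolute) / ((n - 1) * (n - 2))

# Hypothetical star network: node 0 is tied to everyone else.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
absolute, normalised = degree_centrality(star)
centralisation = group_degree_centralisation(star)
```

A star network gives the maximum centralisation of 1, while a network in which every actor has the same degree gives 0.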
Proximity (closeness) centrality: the formula for measuring proximity centrality is as follows,
Mediation (betweenness) centrality: the formula for measuring mediation centrality is as follows,
Group mediation centrality is also an indicator of the overall structure of a network. The higher the value of group mediation centrality, the higher the likelihood that the information resources in the network are monopolised by a few actors, i.e., that a few actors exert a high degree of control over the information resources, and the worse the organisation of the network in which they are located. Equation (20) measures group mediation centrality: it represents the gap between the mediation centrality of the actor with the highest mediation centrality and the mediation centrality of the other actors in a network. The greater the gap, the higher the value of group mediation centrality.
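As an illustration of the two path-based measures named above, the following sketch computes closeness and betweenness for an undirected, unweighted network given as an adjacency dictionary. The definitions used are the standard ones (closeness as (n - 1) divided by the sum of shortest-path distances, betweenness via Brandes' algorithm) and are assumed rather than taken from the source's equations; the function names and the three-node example are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths in hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_centrality(adj):
    """Normalised closeness: (n - 1) / sum of distances to all other nodes."""
    n = len(adj)
    result = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        farness = sum(dist.values())
        # In this sketch, nodes that cannot reach every other node get 0.
        result[node] = (n - 1) / farness if len(dist) == n and farness else 0.0
    return result

def betweenness_centrality(adj):
    """Unnormalised betweenness using Brandes' algorithm (undirected graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, dist = [], {s: 0}
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical chain a - b - c: b sits on the only path between a and c.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
closeness = closeness_centrality(chain)
betweenness = betweenness_centrality(chain)
```

In the chain, the middle node has the highest closeness and the only non-zero betweenness, which matches the reading of betweenness as control over information flow.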
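The basic idea of the DBSCAN algorithm is to select a data point as a starting point and count the points in its neighbourhood. If this density is greater than or equal to a predetermined threshold, the point is a core point; otherwise it is provisionally treated as a noise point. For each core point, the points within its neighbourhood radius are assigned to the same cluster, and the cluster is expanded through any further core points among them.
DBSCAN algorithm related definitions, with ε denoting the neighbourhood radius and MinPts the minimum number of points:
Core and boundary points: for any data object p, if the ε-neighbourhood of p contains at least MinPts objects, p is a core point; an object that is not a core point but lies within the ε-neighbourhood of a core point is a boundary point.
Direct density reachability: an object q is directly density-reachable from an object p if q lies within the ε-neighbourhood of p and p is a core point.
Density reachable: given a dataset D, an object q is density-reachable from p if there is a chain of objects in D from p to q in which each object is directly density-reachable from the previous one.
Density connected: if there exists an object o from which both p and q are density-reachable, then p and q are density-connected.
Clusters and noise: starting from any core point object, all objects that are density-reachable from that object form a cluster, and objects that do not belong to any cluster are noise.
The DBSCAN algorithm needs two manually set parameters, the neighbourhood radius threshold ε and the minimum number of points MinPts. The steps are as follows:
1) Input the spatio-temporal dataset and the two parameters.
2) Starting from any one of the unprocessed data objects, find all objects within its neighbourhood radius.
3) If a data object has at least MinPts objects in its neighbourhood, mark it as a core object and start a cluster from it; otherwise, label it provisionally as a noise point.
4) Sequentially access the other unprocessed data objects to form a number of core objects, noise points, and clusters, and merge and expand the data objects and clusters that are directly density-reachable or density-reachable.
5) Repeat steps 2) to 4) until no new data objects are merged into any cluster or labelled as noise points, and end the algorithm.
6) Output the clustering results.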
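The procedure above can be sketched in a few lines of plain Python. This is a minimal textbook-style DBSCAN, not the implementation used in the study: the parameter names `eps` and `min_pts`, the convention that a point counts as its own neighbour, the Euclidean distance, and the sample points are all assumptions for illustration.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks noise points."""
    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already processed
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a boundary point
            continue
        cluster_id += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster_id
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # boundary point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: expand
    return labels

# Hypothetical 2-D feature vectors: two dense groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The number of clusters is not an input here; it follows from the two parameters, which is why their choice matters in practice.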
In this paper, based on a comparative analysis of various analysis tools, UCINET is used to analyse the social behaviours of English learners on learning forums. It integrates NetDraw, which implements visualisation functions, Mage, a three-dimensional display and analysis program, and Pajek, a free application for large-scale network analysis, and is therefore capable of analysing social network relationships comprehensively.
The main functions of the forum are posting and replying. In this paper, the records of these two behaviours are extracted from the log data with the keywords “Create-Post” and “Reply-Post” respectively. After cleaning and preprocessing, the records are organised into a two-dimensional matrix of teacher-student interactions, as shown in Table 2, with Si denoting an English learner and T denoting the teacher. Due to the large amount of data, only a portion of it is displayed.
Table 2. Two-dimensional matrix of teacher-student interactions
 | S1 | S2 | S3 | S4 | S5 | S6 | …… | T |
---|---|---|---|---|---|---|---|---|
S1 | 0 | 1 | 0 | 0 | 1 | 0 | …… | 2 |
S2 | 0 | 0 | 2 | 1 | 0 | 0 | …… | 4 |
S3 | 2 | 0 | 0 | 0 | 1 | 0 | …… | 6 |
S4 | 1 | 2 | 0 | 0 | 0 | 0 | …… | 4 |
S5 | 0 | 0 | 1 | 0 | 1 | 0 | …… | 2 |
S6 | 1 | 1 | 0 | 0 | 0 | 1 | …… | 3 |
…… | …… | …… | …… | …… | …… | …… | …… | …… |
T | 2 | 3 | 3 | 4 | 3 | 2 | …… | 0 |
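A matrix like the one above can be assembled from reply records with a short routine. The sketch below is hypothetical: the log format (tuples of action, actor, target), the function name `build_interaction_matrix`, and the convention that a row replies to a column are assumptions, because the source does not describe the log schema. Only the keywords "Create-Post" and "Reply-Post" come from the text.

```python
def build_interaction_matrix(records, members):
    """Count replies from each member (row) to each member (column)."""
    index = {member: k for k, member in enumerate(members)}
    matrix = [[0] * len(members) for _ in members]
    for action, actor, target in records:
        if action != "Reply-Post":
            continue  # a new post has no reply target
        if actor in index and target in index:
            matrix[index[actor]][index[target]] += 1
    return matrix

# Hypothetical cleaned log records: (action, actor, author of the post replied to).
records = [
    ("Create-Post", "T", None),
    ("Reply-Post", "S1", "T"),
    ("Reply-Post", "S1", "T"),
    ("Reply-Post", "S2", "S1"),
    ("Reply-Post", "T", "S2"),
]
members = ["S1", "S2", "T"]
matrix = build_interaction_matrix(records, members)
```

The resulting list of lists can then be exported in whatever format the network analysis tool expects.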
With the UCINET analysis tool, the two-dimensional matrix can be converted into a visualised network structure diagram of teacher-student interaction, as shown in Figure 1. The figure depicts the interactions between the teacher and learners in the course’s online forum on the cloud classroom platform. Each node represents a learner or a teacher. The larger the node, the more posts and replies the represented student or teacher has made in the forum and the more active they are; a connecting line between two nodes indicates a reply relationship between them.

Figure 1. Teacher-student interaction network diagram
From the teacher-student interaction network diagram, it can be seen that all members (including the teacher) belong to one network and no member is isolated. Node T, representing the teacher, is clearly larger than the other nodes, which directly reflects the interaction between learners and the teacher and indirectly indicates the teacher’s dominant role in the forum. English learners are more willing to reply to the teacher’s posts, whether because of the teacher’s authority in the classroom or because the post topics are interesting and invite discussion, so the teacher’s role in the forum is evident.
The network density, shown in Table 3, is 0.207 with a standard deviation of 0.658, which indicates that the English learners in the course are not very closely connected and that their interaction in the forum is not particularly frequent. Combined with the network diagram of teacher-student interactions, it can be seen that the members at the edge of the network interact relatively little with other members, so the teacher should adopt appropriate strategies to guide these members to interact with others.
Table 3. Density analysis
Avg Value | Std Dev | Avg Wtd Degree |
---|---|---|
0.207 | 0.658 | 12.393 |
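For a valued interaction matrix, one common way to obtain an average value and standard deviation like those reported above is to take the mean and spread of all off-diagonal entries. The sketch below assumes that reading; the source's own density formula is not reproduced here, and the use of the population standard deviation, the function name, and the sample matrix are assumptions.

```python
from statistics import mean, pstdev

def network_density(matrix):
    """Mean and standard deviation of the off-diagonal tie values."""
    n = len(matrix)
    ties = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return mean(ties), pstdev(ties)  # assumption: population standard deviation

# Hypothetical 3 x 3 valued interaction matrix (diagonal ignored).
sample = [
    [0, 1, 0],
    [2, 0, 1],
    [0, 0, 0],
]
density, spread = network_density(sample)
```

For a binary matrix this reduces to the share of possible directed ties that are actually present.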
The centrality of English learners can be analysed with UCINET. Using the social network centrality measures described above, the relative degree centrality, betweenness (intermediate) centrality, and closeness (proximity) centrality were calculated; the results are shown in Table 4, Table 5, and Table 6, respectively.
Table 4. Relative degree centrality analysis
English learner | Out-degree | In-degree | Degree centrality | English learner | Out-degree | In-degree | Degree centrality |
---|---|---|---|---|---|---|---|
2 | 20 | 49 | 0.57 | 56 | 10 | 9 | 0.15 |
14 | 19 | 31 | 0.41 | 6 | 7 | 11 | 0.14 |
26 | 27 | 14 | 0.34 | 39 | 12 | 6 | 0.14 |
4 | 26 | 11 | 0.3 | 45 | 13 | 5 | 0.14 |
11 | 17 | 20 | 29.01 | 52 | 9 | 9 | 0.14 |
8 | 20 | 11 | 0.25 | 61 | 10 | 8 | 0.14 |
13 | 14 | 17 | 0.25 | 26 | 10 | 7 | 0.14 |
18 | 14 | 17 | 0.25 | 29 | 9 | 8 | 0.14 |
24 | 25 | 16 | 0.25 | 37 | 8 | 9 | 0.14 |
22 | 18 | 12 | 23.01 | 42 | 8 | 9 | 0.14 |
33 | 15 | 15 | 0.24 | 47 | 6 | 11 | 0.14 |
5 | 19 | 10 | 0.24 | 24 | 5 | 11 | 0.13 |
30 | 13 | 15 | 0.23 | 40 | 7 | 8 | 0.12 |
10 | 15 | 12 | 0.23 | 54 | 9 | 6 | 0.12 |
15 | 13 | 14 | 0.22 | 36 | 6 | 8 | 0.11 |
20 | 20 | 7 | 0.22 | 46 | 7 | 7 | 0.11 |
32 | 13 | 20 | 0.22 | 48 | 8 | 6 | 0.11 |
21 | 16 | 10 | 0.21 | 49 | 8 | 6 | 0.11 |
27 | 13 | 14 | 0.21 | 57 | 5 | 9 | 0.11 |
1 | 20 | 5 | 0.2 | 59 | 8 | 6 | 0.11 |
12 | 13 | 13 | 0.2 | 38 | 6 | 7 | 0.1 |
19 | 13 | 2 | 0.2 | 41 | 6 | 7 | 0.1 |
16 | 16 | 10 | 0.19 | 50 | 6 | 7 | 0.1 |
3 | 12 | 11 | 0.19 | 55 | 9 | 4 | 0.1 |
17 | 11 | 11 | 0.18 | 60 | 6 | 7 | 0.1 |
7 | 12 | 9 | 0.17 | 51 | 7 | 5 | 0.09 |
9 | 10 | 11 | 0.17 | 43 | 4 | 5 | 0.07 |
31 | 8 | 13 | 0.27 | 44 | 3 | 4 | 0.05 |
34 | 12 | 9 | 0.17 | 53 | 4 | 3 | 0.05 |
Table 5. Relative betweenness (intermediate) centrality analysis
English learner | Betweenness | Relative betweenness centrality | English learner | Betweenness | Relative betweenness centrality |
---|---|---|---|---|---|
2 | 50.226 | 0.029 | 31 | 9.327 | 0.001 |
4 | 29.1 | 0.018 | 55 | 8.853 | 0.006 |
24 | 29.257 | 0.016 | 20 | 8.46 | 0.006 |
14 | 24.383 | 0.015 | 21 | 7.71 | 0.005 |
50 | 22.433 | 0.014 | 29 | 7.26 | 0.005 |
18 | 20.583 | 0.013 | 51 | 7.21 | 0.005 |
10 | 20.3 | 0.012 | 39 | 5.677 | 0.004 |
26 | 20.133 | 0.012 | 23 | 5.51 | 0.004 |
13 | 19.85 | 0.012 | 25 | 5.427 | 0.004 |
8 | 18.526 | 0.011 | 48 | 5.343 | 0.004 |
33 | 16.38 | 0.01 | 35 | 5.26 | 0.004 |
22 | 16.217 | 0.01 | 53 | 5.177 | 0.004 |
5 | 15.6 | 0.01 | 44 | 4.627 | 0.004 |
17 | 15.183 | 0.01 | 45 | 4.593 | 0.004 |
52 | 14.993 | 0.009 | 93 | 4.51 | 0.004 |
11 | 14.693 | 0.009 | 47 | 4.127 | 0.003 |
16 | 14.267 | 0.009 | 40 | 2.927 | 0.003 |
34 | 14.217 | 0.009 | 56 | 2.01 | 0.002 |
12 | 14.1 | 0.009 | 58 | 2.427 | 0.002 |
60 | 13.68 | 0.009 | 49 | 2.377 | 0.002 |
36 | 12.68 | 0.008 | 59 | 2.32 | 0.002 |
15 | 12.183 | 0.008 | 37 | 2.26 | 0.002 |
1 | 11.017 | 0.007 | 54 | 2.01 | 0.002 |
27 | 10.633 | 0.007 | 28 | 1.577 | 0.002 |
41 | 10.183 | 0.007 | 50 | 1.653 | 0.002 |
19 | 9.767 | 0.006 | 46 | 1.593 | 0.002 |
6 | 9.66 | 0.006 | 42 | 0.76 | 0.001 |
9 | 9.6 | 0.006 | 43 | 0.01 | 0.001 |
7 | 9.6 | 0.006 | 52 | 0.01 | 0.001 |
Table 6. Relative closeness (proximity) centrality analysis
English learner | Sum of shortest-path distances | Closeness centrality | English learner | Sum of shortest-path distances | Closeness centrality |
---|---|---|---|---|---|
18 | 224 | 0.29 | 23 | 231 | 0.272 |
30 | 224 | 0.29 | 39 | 231 | 0.272 |
10 | 224 | 0.29 | 47 | 231 | 0.272 |
33 | 224 | 0.29 | 44 | 291 | 0.262 |
13 | 225 | 0.289 | 25 | 231 | 0.272 |
22 | 225 | 0.289 | 38 | 231 | 0.272 |
5 | 225 | 0.289 | 45 | 232 | 0.271 |
1 | 225 | 0.289 | 53 | 232 | 0.271 |
32 | 226 | 0.288 | 48 | 232 | 0.271 |
17 | 227 | 0.286 | 58 | 232 | 0.271 |
12 | 226 | 0.286 | 56 | 233 | 0.27 |
15 | 227 | 0.286 | 46 | 253 | 0.27 |
60 | 227 | 0.286 | 37 | 233 | 0.27 |
16 | 227 | 0.286 | 40 | 233 | 0.27 |
34 | 227 | 0.286 | 49 | 2353 | 0.27 |
20 | 227 | 0.286 | 28 | 293 | 0.27 |
29 | 228 | 0.285 | 59 | 234 | 0.269 |
31 | 228 | 0.285 | 54 | 234 | 0.269 |
36 | 228 | 0.285 | 50 | 235 | 0.267 |
55 | 228 | 0.285 | 42 | 236 | 0.266 |
19 | 228 | 0.285 | 43 | 238 | 0.264 |
3 | 228 | 0.285 | 52 | 239 | 0.263 |
27 | 228 | 0.285 | 57 | 299 | 0.263 |
Degree centrality indicates the direct connections between English learners: the higher the degree centrality, the closer the direct connections and the more attention learners pay to one another. As can be seen in Table 4, the relative degree centrality of ELLs 2 and 14 is very high, indicating that they are very willing to connect with other students and are highly active in the forum. Their in-degree is significantly greater than their out-degree, which indicates that they have high prestige in the forum. The degree centrality of ELLs 42, 43, 52, and 57 is very low, and both their in-degree and out-degree are very low: they rarely pay attention to others, rarely receive attention from others, and interact very little in the forum.
From Table 5, we can see that the betweenness (intermediate) centrality of ELLs #2, 4, 24, 14, 30, and 18 is high, which indicates that they have more control over the resources in the forum, can control the dissemination and flow of information, and act as a bridge for communication among many members. The betweenness centrality of ELLs #43, 52, and 57 is 0, which indicates that they are on the periphery of the forum and are seldom able to facilitate communication among others.
Proximity (closeness) centrality indicates the extent to which English learners depend on others. As can be seen from Table 6, the learners’ proximity centrality values are close to one another and generally low, which indicates that they depend to some degree on other members when posting and replying in the forum. In combination with the diagram of the teacher-student social network structure, this dependence may originate from dependence on the teacher.
The online learning platform records log data of English learners’ online learning behaviours: each operation a learner performs during online learning is recorded. The frequency of operation refers to the number of times a learner operates on the course resources while studying the course, and the log count increases by one for each operation. In line with the teaching plan for online learning, data from the autumn 2019 semester were selected. The log data generated over 153 days (5 months) of course teaching were analysed, and 5,163 students were sampled for analysis.
SPSS Statistics was used for the analysis, yielding the number of course page views of each English learner in a semester, as shown in Figure 2. The horizontal axis represents the unique identifier of each English learner, and the vertical axis records the number of times that learner viewed course pages on the online learning platform in a semester. The figure shows that learners differ markedly in how often they study the course: the smallest number of course page views in a semester is 711, and the largest reaches 13,312.

Figure 2. Course page browsing frequency
The numbers of course page views are then sorted in descending order, as shown in Figure 3. To analyse the learners’ study of the course more clearly, a threshold of 1,000 course page views is used to divide them into two groups: learners with fewer than 1,000 course page views in a semester and learners with more than 1,000.

Figure 3. Course page views sorted in descending order
The statistics for English learners with fewer than 1,000 course page views are shown in Figure 4. This group accounts for 97.7% of all learners, with a total of 24,276 page views in a semester and an average of 48 course page views. About 94.1% of this group viewed course pages no more than 200 times in a semester.

Figure 4. Learners with fewer than 1,000 course page views
The statistics for English learners with more than 1,000 course page views are shown in Figure 5. This group accounts for only 2.6% of all learners, with a total of 41,764 course page views in a semester and an average of 3,400 course page views. About 73.5% of this group viewed course pages no more than 4,000 times in a semester.
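The two-group summary used above (share of learners, total views, and mean views on each side of the 1,000-view threshold) can be reproduced with a few lines. This is a sketch on made-up numbers, not the study's data: the function name `summarise_page_views` and the sample `views` are hypothetical, and counting a learner with exactly 1,000 views in the upper group is an assumption, since the text leaves that case open.

```python
from statistics import mean

def summarise_page_views(page_views, cutoff=1000):
    """Split learners at `cutoff` views and summarise each group."""
    groups = {
        "low": [v for v in page_views if v < cutoff],
        "high": [v for v in page_views if v >= cutoff],  # assumption: ties go high
    }
    summary = {}
    for name, views in groups.items():
        summary[name] = {
            "share": len(views) / len(page_views),
            "total": sum(views),
            "mean": mean(views) if views else 0,
        }
    return summary

# Hypothetical per-learner page view counts for one semester.
views = [12, 48, 150, 980, 1200, 4000]
summary = summarise_page_views(views)
```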

Figure 5. Learners with more than 1,000 course page views
Following the teaching plan for online learning, the online learning behaviour data of English learners in autumn 2019 were again sampled and analysed. The behavioural data generated by the online learning platform yielded a total of 607,618 log entries recording English learners’ interactions with the course during online study, from which the overall time distribution of students’ participation in online learning can be obtained, as shown in Figure 6. The figure shows that the learning time of English learners follows a clear pattern: learners visit the online learning platform more at the beginning and at the end of the semester, and at the end of the semester in particular their visits increase significantly and reach a peak.

Overall time distribution of course study
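A time distribution of this kind can be obtained by bucketing log timestamps into semester weeks. The sketch below is only an illustration under stated assumptions: the semester start date, the sample timestamps, and the names `weekly_activity`, `per_week`, and `peak_week` are hypothetical, and the log format of the actual platform is not described in the text.

```python
from collections import Counter
from datetime import date, datetime

def weekly_activity(timestamps, semester_start):
    """Count log records per semester week (week 1 begins on semester_start)."""
    counts = Counter()
    for ts in timestamps:
        week = (ts.date() - semester_start).days // 7 + 1
        counts[week] += 1
    return dict(sorted(counts.items()))

# Hypothetical semester start and log timestamps, for illustration only.
semester_start = date(2019, 9, 2)
log_times = [
    datetime(2019, 9, 3, 10, 15),
    datetime(2019, 9, 5, 20, 40),
    datetime(2019, 9, 12, 9, 0),
    datetime(2019, 12, 30, 22, 5),
    datetime(2019, 12, 31, 8, 30),
    datetime(2020, 1, 2, 14, 45),
]

per_week = weekly_activity(log_times, semester_start)
peak_week = max(per_week, key=per_week.get)
print(per_week, "peak week:", peak_week)
```

The same function can be applied to discussion records instead of page-view records to compare the two distributions week by week.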
In addition to course browsing, the course discussions on the online learning platform were also analysed. They follow a pattern similar to that of course browsing, and the overall situation is shown in Figure 7. The figure shows that English learners' course discussions and evaluations are concentrated in the first and last months of the semester, and the number of discussions at other times is almost zero.

Overall statistics on course discussions
Overall, in terms of study time, the frequency of online learning by English learners rises rapidly at the start of the semester, declines and levels off, and then rises sharply again at the end of the semester. This shows that English learners allocate their time poorly during online learning: participation and motivation are low in the middle of the semester, and learners invest more time and energy only at the end of the semester in order to complete their exams.
The DBSCAN clustering algorithm was used for unsupervised clustering of the dataset, with the 14 learning behavioural characteristics of English learners' online learning as the clustering features. The scores obtained for different numbers of categories were: C=2, score 0.55; C=3, score 0.59; C=4, score 0.51; C=5, score 0.49. The DBSCAN clustering therefore performs best when n_clusters=3, and the segmentation result for 3 categories is shown in Fig. 8.

Clustering of English learners' behavior
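The comparison of scores across category counts can be illustrated with a small, self-contained sketch. Several points here are assumptions rather than facts from the study: the text does not name the score, so the mean silhouette coefficient is used; DBSCAN does not take a number of clusters as input, so the sketch varies the neighbourhood radius `eps` (with a fixed `min_samples`) and records how many clusters result; and the 14-dimensional data are synthetic. The functions `dbscan` and `silhouette` are plain-Python illustrations, not the implementation used by the author.

```python
import math
import random

def dbscan(points, eps, min_samples):
    """Plain DBSCAN: returns one label per point; -1 marks noise."""
    n = len(points)
    # eps-neighbourhoods; every point counts as its own neighbour.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: expand
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient over non-noise points (NaN if < 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return float("nan")
    scores = []
    for lab, members in clusters.items():
        for p in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster.
            a = sum(math.dist(p, q) for q in members) / (len(members) - 1)
            # b: mean distance to the nearest other cluster.
            b = min(
                sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Synthetic data: three groups of 40 learners, 14 scaled indicators each.
random.seed(0)
centres = [0.2, 0.5, 0.8]
points = [
    [random.gauss(c, 0.04) for _ in range(14)]
    for c in centres for _ in range(40)
]

results = {}
for eps in (0.1, 0.4, 2.0):
    labels = dbscan(points, eps=eps, min_samples=5)
    n_clusters = len(set(labels) - {-1})
    results[eps] = (n_clusters, silhouette(points, labels))
    print(f"eps={eps}: clusters={n_clusters}, silhouette={results[eps][1]:.2f}")
```

On real data the candidate `eps` values would be chosen from the distribution of nearest-neighbour distances, and the setting whose cluster count and score are most satisfactory would be retained.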
To characterize the different clusters of English learners, the differences in the online learning behaviors of the 3 types of English learners were analyzed, and the mean values of the 14 online learning behavioral indicators of each cluster are compared in Figure 9. The figure shows that the trajectories of the online learning behaviors of the 3 classes of English learners are roughly similar, but that there are very obvious differences in some behavioral features. The typical characteristics are as follows:
CLUSTER 0: The online learning activities of this category of English learners are relatively balanced among the three categories. Of the 14 indicators characterizing online learning behaviors in SPOC environments, only 2 are lower than 0.1, and the remaining 12 behavioral characteristics are all above 0.1, which shows a certain degree of activity.
CLUSTER 1: The online learning activities of this category of English learners fluctuate the most among the three categories. Of the 14 indicators that characterize online learning behaviors in SPOC environments, 2 are lower than 0.1 and 3 are equal to 0.1, while the remaining 7 behavioral features are above 0.1, which shows that the online learning behaviors of this category of English learners are not active enough.
CLUSTER 2: The online learning activities of this category of English learners are the most stable among the three categories. All 14 indicators characterizing online learning behaviors are above 0.1, which shows a certain degree of activity for most learning behaviors. Accordingly, the English learners in this cluster are named "highly active English learners".

English learner online behavior data clustering results
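The cluster comparison above rests on two simple computations: the mean of each indicator within a cluster, and the number of indicators whose mean falls below 0.1. The following sketch shows one way to compute both. It assumes the indicators have already been scaled to a common range (the text does not state the scaling method), uses three made-up indicators instead of 14, and the names `cluster_profiles` and `count_low` are hypothetical.

```python
def cluster_profiles(rows, labels):
    """Mean of each indicator per cluster; rows are equal-length lists."""
    sums, sizes = {}, {}
    for row, lab in zip(rows, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        sizes[lab] = sizes.get(lab, 0) + 1
    return {lab: [s / sizes[lab] for s in acc] for lab, acc in sums.items()}

def count_low(profile, cutoff=0.1):
    """Number of indicators whose cluster mean is below `cutoff`."""
    return sum(1 for m in profile if m < cutoff)

# Made-up scaled indicator values for four learners in two clusters.
rows = [
    [0.2, 0.05, 0.4],
    [0.4, 0.05, 0.2],
    [0.05, 0.02, 0.3],
    [0.07, 0.04, 0.1],
]
labels = [0, 0, 1, 1]

profiles = cluster_profiles(rows, labels)
low_counts = {lab: count_low(profile) for lab, profile in profiles.items()}
print(low_counts)
```

Counting the indicators on each side of the cutoff for every cluster also provides a quick check that the counts add up to the total number of indicators.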
Guide students to accept new forms of learning, such as teacher-student and student-student interaction that moves from face-to-face contact to long-distance interaction with the help of network devices. Emotional expression through text pop-ups, emoticon pop-ups, symbols and so on has become a new form of interaction in the teaching process. Students are guided to accept and adapt to these forms so that the whole teaching process becomes smoother and more natural, and the learning process more active and engaged.
Encourage students to engage in new teaching activities, such as interactive sessions (voting, scoring, quizzing, feedback, socialising) and incentive sessions (speaking the most, getting the most votes, snatching the star, getting the most likes, good thinking, etc.). Only when students are encouraged to participate in well-designed sessions can English learners achieve the predetermined teaching goals.
In the online learning environment, the learning behaviour data generated by English learners are recorded and collected in real time, and value judgements about their learning are made on the basis of these data. Such judgements vary with changes in teaching objectives, teaching content, and teaching environments, but in essence they all require appropriate evaluation dimensions, refined evaluation indicators, and evaluation criteria that give students clear expectations. The aim is to present evaluation results that are authentic and objective and that comprehensively reflect the students' situation, so that ELLs can check their own progress and teachers can launch interventions.
The behaviour of ELLs is monitored in order to strengthen the assessment of their expected learning outcomes, to give timely warnings about ELLs found to be at risk in their learning, to improve the intervention process, and to reinforce feedback on learning. This process involves differentiated intervention for different types of ELLs. Higher-level teaching assistants (TAs) and excellent course specialists at the same level can be introduced into the learning process. The frequency and duration of low-performing ELLs' behaviours in each session are tallied and reported to the TAs on a weekly basis. The TAs then convert the summary into weekly behavioural assessments based on quantitative evaluation criteria, monitor the behaviour of the low-performing learners according to the assessment results, and provide them with targeted, real-time behavioural feedback.
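The weekly tallying step can be sketched as follows. This is an illustrative sketch only: the event format, the roster, the thresholds `min_sessions` and `min_minutes`, and the function names are hypothetical, since the text does not specify the quantitative evaluation criteria.

```python
from collections import defaultdict

def weekly_summary(events):
    """events: (learner_id, week, minutes) tuples.
    Returns {(learner_id, week): (session_count, total_minutes)}."""
    summary = defaultdict(lambda: [0, 0])
    for learner, week, minutes in events:
        entry = summary[(learner, week)]
        entry[0] += 1          # frequency of behaviours
        entry[1] += minutes    # total duration
    return {key: tuple(value) for key, value in summary.items()}

def flag_low_activity(summary, roster, week, min_sessions=3, min_minutes=60):
    """Learners on the roster who fall below either hypothetical threshold."""
    flagged = []
    for learner in roster:
        sessions, minutes = summary.get((learner, week), (0, 0))
        if sessions < min_sessions or minutes < min_minutes:
            flagged.append(learner)
    return flagged

# Made-up behaviour records; "s04" has no activity at all in week 1.
roster = ["s01", "s02", "s03", "s04"]
events = [
    ("s01", 1, 30), ("s01", 1, 45), ("s01", 1, 20),
    ("s02", 1, 10), ("s02", 1, 15),
    ("s03", 1, 90),
    ("s01", 2, 25),
]

summary = weekly_summary(events)
flagged_week1 = flag_low_activity(summary, roster, week=1)
print(flagged_week1)
```

Iterating over the roster rather than over the recorded events ensures that learners with no activity in a given week are also flagged.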
The study comprehensively analysed the online learning behaviours of English learners by collecting and pre-processing the data generated during their online learning and by clustering these behavioural data with the DBSCAN method. The results were used to quantify pedagogical interventions and to design improvement strategies for different types of learners. The specific findings are as follows:
On the basis of the collected data, the social behaviours of English learners in online learning were analysed. Learners interacted with the teacher to a higher degree, with the teacher playing a dominant role, and interacted with other learners to a lower degree, but no isolated behaviour was found. From the online learning behaviour data recorded by the online learning platform, the data of 5,163 students were extracted for analysis. Within one semester of learning, the minimum number of course page views was 711 and the maximum was 13,312, which is a large difference.