Open Access

A Study of Online Behavioural Data Analysis and Teaching Intervention Strategies for College English Learners

  
03 Feb. 2025

Introduction

Learner behavioural portrait refers to a learning analytics technique that uses big data and data mining to collect, clean and analyse the learning data generated by users in a multimodal learning environment. It generates labels, such as learning patterns and style preferences, that represent the characteristics of a learner’s behaviour, which helps to provide accurate real-time feedback and to support personalised learning [1-4].

Online English teaching gives students space for independent learning, so that they can study according to their own needs and personal goals [5-6]. Targeted teaching methods also help students identify and remedy their weaknesses and thereby improve their English proficiency. During teaching, teachers can use the online platform to communicate with students, help them find gaps in their learning, and provide guided instruction [7-8]. Because independent online learning is free of classroom constraints, it requires strong self-control from students. Teachers can supervise students’ learning behaviours and adjust the teaching objectives according to feedback from the students’ self-study process [9-10]. Online teaching resources are rich and diverse. Teachers should therefore combine effective online teaching content with the characteristics of the curriculum, think comprehensively about the effectiveness of online courses, continually stimulate students’ learning motivation, and adjust the teaching content and curriculum according to the learning objectives [11-13]. Before teaching begins, teachers can build a profile of each student, understand individual needs, and tailor the teaching accordingly. College students also often adjust their learning goals during their studies, for example towards the College English Test (CET) Band 4 or Band 6 exams or the English exam for postgraduate admission. Teachers should guide students promptly according to their different learning behaviours, provide useful English learning materials, and help them reach their goals [14-15].

With the continuous development of Internet technology at this stage, Internet English teaching has become an important development trend of modern English teaching, and the traditional English teaching mode has gradually failed to meet the needs of the times. Internet English teaching can not only break the time and space limitations of teaching but also give students and teachers a more efficient and convenient learning mode. Liu, Y et al. developed an intelligent online English teaching functional module oriented to meet students’ English learning needs, which covers the areas of homework management, course notification, and visitor management, and put the module into practice after integrating it into an online English teaching system, which made a positive contribution to the optimisation of the web-based intelligent English online teaching platform [16]. Liu, Z. Y et al. investigated the attitudes and perceptions of Chinese and Russian students towards distance online English learning modes, and based on the findings of the study, were informed that the students had an overall positive attitude but indicated that there was still much room for improvement in the distance online English learning modes, and based on the feedback from the students they put forward targeted suggestions for optimising the distance online English teaching [17]. Sari, F. M used a questionnaire to examine students’ perceptions of online language learning classrooms, and the study noted that students mostly showed positive attitudes but also revealed that overload of online coursework, unstable course content, and problems with Internet devices hindered students’ participation in online language learning classrooms [18]. 
Gao, W. sought to improve the quality of English teaching by designing an interactive English teaching paradigm that combines Internet of Things (IoT) technology with a large number of sensor devices; teaching practice confirmed the feasibility and efficiency of the proposed scheme [19]. Wang, Y et al. conducted a modelling analysis using Partial Least Squares Structural Equation Modelling (PLS-SEM), which revealed that students were very receptive to Internet-based online English teaching and indicated that teachers could improve students’ academic self-efficacy by providing access to language e-learning and implementing student-centred instructional strategies, thereby enhancing students’ competence and academic confidence [20]. Agung, A. S. N et al. recorded students’ perceptions of online English learning in the English Language Education Learning Programme at the Palmantarino College of Education, together with other related data, and found that the quality of online English teaching and learning was affected by the availability and sustainability of the Internet connection, the accessibility of the instructional media, and the compatibility between media tools, with the accessibility of the instructional media being the most critical factor [21]. Ningsih, P. E. A et al. evaluated the use of learning media in online English classrooms from the perspective of teachers’ and students’ perceptions and found that WhatsApp Group was a very common and effective learning medium in online learning, while unstable power supply and network connections seriously hampered normal teaching and learning [22].

Online education, however, faces dilemmas such as low course completion rates, low engagement, and low learning autonomy, which have led stakeholders to reflect on the quality of online learning and seek solutions. It is therefore necessary to provide personalised intervention strategies for different groups to improve the effectiveness of online learning. Kizilcec, R. F et al. explored students’ self-regulated learning (SRL) performance in six MOOCs, found that students with strong SRL habitually review past course material, and pointed out that learners’ behavioural characteristics can be used as a predictor of SRL competence [23]. Aguilera-Hermida, A. P., combining qualitative and quantitative methods, examined students’ self-efficacy during online learning in special situations and found that technology applications promote students’ commitment to online learning, but that students prefer face-to-face classroom learning to online modes [24]. Dhawan, S. discusses the forced shift from offline education to online teaching and learning during the COVID-19 pandemic and analyses the advantages and challenges of online learning, providing advice to educational institutions on how to carry out online teaching and learning [25]. Singh, V et al. reviewed studies on the definition of online learning, conducted a content analysis of the relevant literature, and concluded that the evolution of the definition reflects the technological development of the past three decades [26]. Rodrigues, H et al. reported on cutting-edge research and the latest practices in e-learning and, based on an in-depth analysis of the e-learning literature from 2010 to 2018, identified the themes of e-learning research, namely educational system construction and online-learning-related issues [27].

The study uses social networks and the DBSCAN clustering method to construct an English learner behavior analysis model to analyze the online behavior of English learners. Firstly, data on English learners’ learning behaviors are collected from relevant online learning platforms, and the data are quantified and preprocessed. The article explores the data in terms of interactions between learners and online interactions between teachers and learners in learning forums, then analyses the learners’ learning behavioural characteristics in terms of learning frequency and learning time, and finally divides the 14 behavioural characteristics of English language learners into three types by using the DBSCAN clustering algorithm, and proposes intervention strategies for the three types.

Characteristics of Online Behaviour of University English Learners
Extraction of online learning behavioural features

According to behavioural science, English learners use online learning platforms according to their own needs, and behavioural science focuses on observable and measurable overt behaviours. In online learning, learners’ operations can be observed and quantified, so the analysis of online learning behaviours takes online operations as its entry point and proceeds by observing, describing and refining the learning operations of English learners [28-29].

English learners learn online on the learning platform and produce a series of different behaviours, which lead to different learning effects, in which the subject of the behaviour, the object of the behaviour, the behavioural environment, and the result of the behaviour have universality, but the behavioural tools and means adopted by each English learner and the behavioural process have uniqueness. Therefore, according to the behavioural tools and means and the behavioural process in the six elements, online learning behaviors are divided into five categories: course access, video viewing, homework performance, usual performance, and interaction.

A total of 19 features were extracted based on theories such as behavioural science and behaviourist psychology, and the feature values were taken from the three commonly used measures of central tendency (mean, median, and mode), as shown in Table 1.

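Table 1. Online learning behavior characteristics

| Feature category | Behavior characteristics | Code |
|---|---|---|
| Course access | Online completion learning | OCD |
| Course access | Online submission task | OCN |
| Course access | After class | PT |
| Course access | After class | PMN |
| Video viewing | Online view | ST |
| Video viewing | Online learning | LWT |
| Job performance | Whether homework is repeated | SVN |
| Job performance | Job submission moment | OCPN |
| Job performance | Is the assignment submitted | OCRN |
| Job performance | Finish your homework online | OTPN |
| Interaction situation | Reply online | OTRN |
| Interaction situation | Interactive online learning | LIN |
| Interaction situation | After class | PPN |
| Interaction situation | After-school reply | PRN |
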
Quantification of behavioural characteristics

The behavioural features listed in Table 1 are quantified one by one and the basis for the selection of feature thresholds is given.

Course visits

In the case of course visits, the sign-in rate reflects the most basic learning situation of English learners and is calculated as in (1):

$$A_i = \frac{\sum_{j=1}^{k} m_{ij}}{\sum_{j=1}^{k} n_{ij}} \quad (i = 1, 2, \ldots, s;\ j = 1, 2, \ldots, k) \tag{1}$$

Where $A_i$ is the attendance rate of the $i$-th student, $m_{ij}$ is the number of times the $i$-th student attended at the $j$-th attendance check, $n_{ij}$ is the number of times the $i$-th student was not absent at the $j$-th attendance check, $k$ is the total number of times the teacher took attendance, and $s$ is the total number of students. Taking chance and contingency into account, and following expert advice, ELLs with a sign-in rate of at least 80% are recorded as 1, and the rest are recorded as 0.

If there are multiple modes, the median can first be chosen as an alternative threshold; alternatively, the multiple modes can each be considered in threshold selection and categorisation, as below. The actual number of active days of an ELL is recorded as $A$. If $a_0 \ge a$, the expression is:

$$\text{Active days} = \begin{cases} \text{High activity}, & A > a_0 \\ \text{Moderately active}, & a < A \le a_0 \\ \text{Low activity}, & A < a \end{cases} \tag{2}$$

If $a_0 < a$, the expression is:

$$\text{Active days} = \begin{cases} \text{High activity}, & A \ge a \\ \text{Moderately active}, & a_0 \le A < a \\ \text{Low activity}, & A < a_0 \end{cases} \tag{3}$$

Video viewing

In the video viewing situation, the viewing frequency reflects the learning input of ELLs. In general, the total number of videos is taken as the minimum number of times ELLs should watch them, denoted $t$. The mode of the viewing frequency of all ELLs is taken as $t_0$; if there is more than one mode (excluding extreme cases such as a viewing frequency of 0), the median is used as the alternative threshold $t_0'$. The actual viewing frequency of an ELL is recorded as $T$. If $t_0 \ge t$, the expression is:

$$\text{Viewing frequency} = \begin{cases} \text{more}, & T > t_0 \\ \text{normal}, & t < T \le t_0 \\ \text{less}, & T < t \end{cases} \tag{4}$$

If $t_0 < t$, the expression is:

$$\text{Viewing frequency} = \begin{cases} \text{more}, & T \ge t \\ \text{normal}, & t_0 \le T < t \\ \text{less}, & T < t_0 \end{cases} \tag{5}$$
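The two-case threshold rule for viewing frequency can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `mode_or_median` and `classify_level` are hypothetical, the boundary value T = t in the first case is assigned to "normal" as an assumption, and the exclusion of extreme values such as 0 is not handled.

```python
from statistics import median, multimode


def mode_or_median(values):
    """Return the mode, or the median when several modes exist (t0 or t0')."""
    modes = multimode(values)
    return modes[0] if len(modes) == 1 else median(values)


def classify_level(actual, expected, mode):
    """Three-level label for one learner.

    actual   -- the learner's observed value (T in the text)
    expected -- the minimum expected value (t in the text)
    mode     -- the mode of all learners' values (t0 in the text)
    """
    if mode >= expected:
        # Case t0 >= t: more if T > t0, less if T < t, otherwise normal.
        # Assumption: T == t is treated as "normal".
        if actual > mode:
            return "more"
        if actual < expected:
            return "less"
        return "normal"
    # Case t0 < t: more if T >= t, less if T < t0, otherwise normal.
    if actual >= expected:
        return "more"
    if actual < mode:
        return "less"
    return "normal"
```

The same function applies to active days and viewing duration if the labels are renamed, since those rules share the same two-case structure.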

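The viewing duration reflects the degree to which English learners have mastered the course. Let the total length of the videos be $l$, that is, the length of time that English learners should watch; let $l_0$ be the mode of the viewing durations; and let $L$ be the actual length of time that a learner watches. If $l_0 \ge l$ (repeated viewing occurs), the expression is:

$$\text{Viewing duration} = \begin{cases} \text{long}, & L > l_0 \\ \text{normal}, & l \le L \le l_0 \\ \text{short}, & L < l \end{cases} \tag{6}$$

If $l_0 < l$, the expression is:

$$\text{Viewing duration} = \begin{cases} \text{long}, & L \ge l \\ \text{normal}, & l_0 \le L < l \\ \text{short}, & L < l_0 \end{cases} \tag{7}$$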

Operational performance

Assignment submission time can reflect the self-efficacy of English learners. Let the assignment submission period be $d$ days and take its midpoint $d'$ (for example, if $d = 10$ then $d' = 5$, rounding up when there is a remainder). Submission times are sorted in ascending order, and a learner’s submission time is recorded as $Job$. Because learners study on the online platform at different times, the median of all submission times, $Job_{sub}$, is also taken. If $Job_{sub} < d'$, the expression is:

$$\text{Job submit time} = \begin{cases} \text{Vigorous}, & Job \le Job_{sub} \\ \text{Normal}, & Job_{sub} < Job \le d' \\ \text{Not active}, & Job > d' \end{cases} \tag{8}$$

If $Job_{sub} > d'$, the expression is:

$$\text{Job submit time} = \begin{cases} \text{Vigorous}, & Job \le d' \\ \text{Normal}, & d' < Job \le Job_{sub} \\ \text{Not active}, & Job > Job_{sub} \end{cases} \tag{9}$$

Usual performance and interactions

Assignment results indicate the learner’s attitude towards the course. Let the total number of assignments be $y$; the teacher assesses each assignment the learner submits. The assessment grade is divided into excellent, good, and poor: on a percentage scale, a score of 80 or above is excellent, a score in the interval [60, 79] is good, and a score below 60 is poor. The mode of all of a learner’s assignment grades is taken as the learner’s overall performance; if the grades are evenly split or there are multiple modes, the overall assignment grade is recorded as good.

In the interactive case, the total post content length threshold is the median of the post content contribution rate, and the post content contribution rate is calculated as in (10):

$$C_{ij} = \frac{f_{ij}}{\sum_{j=1}^{n} f_{ij}} \quad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n) \tag{10}$$

Where $C_{ij}$ is the contribution rate of the words of the $j$-th post of the $i$-th student to that student’s total posts, and $f_{ij}$ is the number of words of the $j$-th post of the $i$-th student.

The post depth threshold is the median of the post count contribution rate, and the post count contribution rate is calculated as in (11):

$$C_i = \frac{\sum_{j=1}^{n} f_{ij}}{\sum_{i=1}^{m} \left( \sum_{j=1}^{n} f_{ij} \right)} \quad (i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n) \tag{11}$$

Where $C_i$ is the contribution of the number of posts or replies of the $i$-th student to the total number of posts or replies of all students, $f_{ij}$ is the post count for the $j$-th post of the $i$-th student, $m$ is the number of students, and $n$ is the number of posts.
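The two contribution rates can be illustrated with a short Python sketch. The function names and the nested-list input layout are assumptions made for this example, and the word counts in the usage note are made-up toy values rather than data from the study.

```python
from statistics import median


def post_length_contribution(word_counts):
    """Equation (10): share of each post in one student's total word count."""
    total = sum(word_counts)
    if total == 0:
        return [0.0] * len(word_counts)
    return [w / total for w in word_counts]


def post_count_contribution(counts_per_student):
    """Equation (11): share of each student in the class-wide total.

    counts_per_student[i] holds the values f_i1 ... f_in for student i.
    """
    per_student = [sum(row) for row in counts_per_student]
    total = sum(per_student)
    if total == 0:
        return [0.0] * len(per_student)
    return [s / total for s in per_student]


def median_threshold(rates):
    """The thresholds in the text are medians of the contribution rates."""
    return median(rates)
```

For example, a student with posts of 50, 150 and 300 words has contribution rates of 0.1, 0.3 and 0.6, and the median of those rates, 0.3, would serve as the threshold.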

The quantification of the remaining behavioural characteristics is broadly similar to the process used to quantify the above characteristics and will not be repeated.

Data collection and pre-processing
Data sets

This study collected behavioural data from students of a university who studied course X on a MOOC platform in the first semester of the 2020-2021 academic year, together with their assignment grades on the platform and their grades for the final offline exam. The resulting dataset contains 14 behavioural features. However, the amount of data that completely covers all 14 features is limited: in the end, only 115 records were extracted, all of which were valid. These records include students’ personal information, learning behaviour data, and final exam results, which are taken as the learning effect of the English learners.

Data pre-processing

Data discretisation

In the original course data, the final grades are percentages. To facilitate predicting the final learning effect from the subsequent training data, they are discretised: grades in the interval [90, 100] are classified as grade A, grades in [80, 89] as grade B, grades in [60, 79] as grade C, and grades in [0, 59] as grade D. The behavioural data were preprocessed according to the behavioural features and their respective thresholds shown in Table 1, which corresponds to the quantification of behavioural features described in the previous subsection.
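The grade discretisation can be sketched as a small Python function. The name `grade_band` is hypothetical, and mapping fractional scores such as 89.5 to the lower band is an assumption, since the text only gives integer intervals.

```python
def grade_band(score):
    """Map a percentage grade to the four bands A-D described in the text."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 60:
        return "C"
    return "D"
```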

Data Normalisation

Due to the non-uniformity of scale and attributes among different categories of data, data need to be standardised, and the two commonly used methods for data standardisation are Min-max standardisation and Z-score standardisation.

Min-max standardisation: also called range standardisation, this is a linear transformation that maps the original data linearly onto the interval [0, 1]:

$$x' = \frac{x - \min}{\max - \min} \quad (x \text{ is a positive indicator}) \tag{12}$$

$$x' = \frac{\max - x}{\max - \min} \quad (x \text{ is a negative indicator}) \tag{13}$$

Z-score standardisation: this is based on the mean $\mu$ and standard deviation $\sigma$ of the original data, so that the standardised variable has a mean of 0 and a standard deviation of 1:

$$x' = \frac{x - \mu}{\sigma} \tag{14}$$
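Both standardisation methods can be illustrated with a minimal Python sketch. The function names are hypothetical, the population standard deviation is an assumption (the text does not say whether the population or the sample form is used), and constant columns are mapped to zeros by convention.

```python
from statistics import mean, pstdev


def min_max(values, positive=True):
    """Min-max standardisation onto [0, 1].

    positive=True  -> (x - min) / (max - min)
    positive=False -> (max - x) / (max - min), for negative indicators
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0] * len(values)  # constant column: nothing to scale
    if positive:
        return [(v - lo) / span for v in values]
    return [(hi - v) / span for v in values]


def z_score(values):
    """Z-score standardisation using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]
```

For the toy column `[2, 4, 6]`, `min_max` gives `[0.0, 0.5, 1.0]` and `z_score` gives values symmetric around zero.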

A model for analysing data on the online behaviour of university English learners
Learning social network analysis
Network density and interpretation

The density of a graph refers to the closeness of the relationships between nodes in a community network, and it is an important and frequently used variable in social network research. A network may be closely knit or loosely knit. In general, a closely knit community shows more cooperative behaviour, easier communication of information, and better group performance, whereas a loosely knit community suffers from poor information flow and little contact. The density of the overall network is calculated as follows. Suppose the network has n actors and actually contains m relations. When the overall network is undirected, the theoretical maximum number of relations is n(n - 1)/2, and the density of the network is m/(n(n - 1)/2). When the overall network is directed, the theoretical maximum number of relations is n(n - 1), and the density of the network is m/(n(n - 1)).
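The density calculation for the undirected and directed cases can be written as a short Python sketch. The function name and argument names are hypothetical.

```python
def network_density(n_actors, n_ties, directed=False):
    """Ratio of actual ties to the theoretical maximum number of ties."""
    if n_actors < 2:
        return 0.0  # no pair of actors exists, so no tie is possible
    possible = n_actors * (n_actors - 1)  # ordered pairs (directed case)
    if not directed:
        possible /= 2  # unordered pairs (undirected case)
    return n_ties / possible
```

With 4 actors and 3 ties, the density is 0.5 for an undirected network and 0.25 for a directed one.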

Centrality and interpretation

Centrality is an important index in social network research. It is used to analyse and evaluate the position of a node in the network, that is, what kind of power a node has in the network structure or how central a position it holds, and it reflects the importance of the node in the network. Network centrality can be divided into three indicators: degree centrality, proximity (closeness) centrality, and mediation (betweenness) centrality.

Degree centrality: degree centrality is the total number of relationships an actor has. It is an important indicator of who is at the centre of a group and is widely used in social network analysis. Equation (15) gives the absolute value, i.e., the sum of an actor’s relationships. Equation (16) gives the normalised value, i.e., the actor’s number of relationships divided by the maximum possible number of relationships in the network, and is mostly used for comparisons between different networks. In the formulas, $C_D(n_i)$ is the degree of node $n_i$; $X_{ij}$ takes the value 0 or 1, indicating whether actor $j$ acknowledges a relationship with actor $i$; and $g$ is the number of actors in the network. Normalisation divides by the maximum number of relationships possible for a given node, i.e., $g - 1$.

$$C_D(n_i) = d(n_i) = \sum_{j} X_{ij} = \sum_{j} X_{ji} \tag{15}$$

$$C'_D(n_i) = \frac{d(n_i)}{g - 1} \tag{16}$$
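Degree centrality can be computed directly from an adjacency matrix, as in the Python sketch below. It assumes a directed network and therefore reports out-degree and in-degree separately, treats any non-zero cell as a tie (weights are binarised), and ignores the diagonal; the function name and output layout are hypothetical.

```python
def degree_centrality(adj):
    """Absolute and normalised degree for every node of a square matrix.

    adj[i][j] is non-zero when node i has a tie directed to node j.
    """
    g = len(adj)
    if g < 2:
        raise ValueError("need at least two nodes to normalise by g - 1")
    result = []
    for i in range(g):
        out_deg = sum(1 for j in range(g) if j != i and adj[i][j])
        in_deg = sum(1 for j in range(g) if j != i and adj[j][i])
        result.append({
            "out": out_deg,
            "in": in_deg,
            "out_norm": out_deg / (g - 1),  # divide by max possible ties
            "in_norm": in_deg / (g - 1),
        })
    return result
```

For an undirected (symmetric) matrix the out-degree and in-degree coincide, which corresponds to the equality of the two sums in equation (15).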

The formula for the degree centrality of a group is given in equation (17), which measures the difference between the degree centrality of the most central actor in the network, $C_D(n^*)$, and the degree centrality of the other actors. The greater the difference, the higher the degree centrality of the group.

$$C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n^*) - C_D(n_i) \right]}{\max \sum_{i=1}^{g} \left[ C_D(n^*) - C_D(n_i) \right]} \tag{17}$$

Proximity centrality: the formula for proximity (closeness) centrality is given in equation (18), where $d(n_i, n_j)$ is the distance between $n_i$ and $n_j$, and $C_C(n_i)$ is the inverse of the sum of the distances from node $n_i$ to the other nodes. The smaller the value, the greater the distance between node $n_i$ and the other nodes, the more peripheral the actor, and the lower its importance, and vice versa. This metric is also highly correlated with degree centrality, as actors with high degree centrality tend to have high proximity centrality.

$$C_C(n_i) = \frac{1}{\sum_{j=1}^{g} d(n_i, n_j)} \tag{18}$$
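Proximity centrality can be sketched with a breadth-first search over an adjacency list. The sketch assumes an undirected, unweighted and connected graph, so every tie has length 1; the function names and the dictionary input format are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path length (in ties) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closeness_centrality(graph, node):
    """Inverse of the summed distances from node to all other nodes."""
    dist = bfs_distances(graph, node)
    if len(dist) < len(graph):
        raise ValueError("graph is not connected; distances are undefined")
    total = sum(dist.values())
    return 1 / total if total else 0.0
```

On the path a-b-c, the middle node b has a distance sum of 2 and a closeness of 0.5, while each end node has a distance sum of 3 and a closeness of 1/3.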

Mediator centrality: the formula for mediator (betweenness) centrality is given in equation (19), where $g_{jk}$ is the number of shortest paths from actor $j$ to actor $k$, $g_{jk}(n_i)$ is the number of those shortest paths that pass through actor $i$, and $g$ is the number of actors in the network.

$$C_B(n_i) = \frac{\sum_{j<k} g_{jk}(n_i) / g_{jk}}{(g - 1)(g - 2)} \tag{19}$$
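Mediator centrality for one node can be sketched by counting shortest paths with breadth-first search. This is a brute-force illustration for small networks rather than an efficient algorithm; it assumes an undirected, unweighted graph with at least three nodes, divides by (g - 1)(g - 2) as the text does, and uses hypothetical function names.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(graph, source):
    """BFS returning distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                sigma[neighbour] = 0
                queue.append(neighbour)
            if dist[neighbour] == dist[node] + 1:
                # every shortest path to node extends to neighbour
                sigma[neighbour] += sigma[node]
    return dist, sigma


def betweenness_centrality(graph, node):
    """Return (raw, normalised) betweenness of node."""
    info = {s: shortest_path_counts(graph, s) for s in graph}
    dist_n, sigma_n = info[node]
    others = [v for v in graph if v != node]
    raw = 0.0
    for j, k in combinations(others, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j or node not in dist_j or k not in dist_n:
            continue  # j and k (or node) are not connected
        if dist_j[node] + dist_n[k] == dist_j[k]:
            # paths j -> node -> k that are shortest, over all shortest paths
            raw += sigma_j[node] * sigma_n[k] / sigma_j[k]
    g = len(graph)
    return raw, raw / ((g - 1) * (g - 2))
```

In a star with one hub and three leaves, all three leaf pairs communicate through the hub, so the hub has a raw value of 3 and every leaf has a raw value of 0.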

Group mediation centrality is also an indicator of the overall structure of a network. The higher its value, the more likely it is that the information resources in the network are monopolised by a few actors, i.e., that a few actors exert a high degree of control over information resources, and the worse the organisation of the network. The formula is given in equation (20), which measures the gap between the mediation centrality of the actor with the highest mediation centrality, $C_B(n^*)$, and that of the other actors in the network. The greater the gap, the higher the value of group mediation centrality.

$$C_B = \frac{2 \sum_{i=1}^{g} \left[ C_B(n^*) - C_B(n_i) \right]}{(g - 1)^2 (g - 2)} \tag{20}$$

Cluster analysis of online learning behaviour data
DBSCAN algorithm

The basic idea of the DBSCAN algorithm is to first select a data point as a starting point and calculate the density of all points in its neighbourhood. If the density is greater than or equal to a predetermined threshold, then the point is a core point. If the density is less than that threshold, then the point is a noise point. For each core point, a neighbourhood radius eps is used to calculate the density of all points within the neighbourhood [30]. If the density is greater than or equal to this threshold, then these points can form a cluster. For each boundary point, if it is within the neighbourhood of a core point, then it can be grouped in the cluster where that core point is located. If it is not within the neighbourhood of any of the core points, then it is a noise point. Ultimately, all the data points that are classified as clusters are the clustering results and all the noise points are the noise results [31].

DBSCAN algorithm related definitions:

Eps neighbourhood: for any data object p, the data objects lying within the region centred on p with radius Eps are called the neighbourhood objects of p:

$$N_{Eps}(p) = \{ q \in D \mid Dist(p, q) \le Eps \}$$

where Dist(p, q) is the distance from data object p to data object q.

Core and boundary points: for any data object p and a given threshold MinPts, if $|N_{Eps}(p)| \ge MinPts$, i.e., the number of data objects in the neighbourhood of p is greater than or equal to MinPts, then p is called a core point. If the number of data objects in the neighbourhood of p is less than MinPts but p lies in the neighbourhood of a core point, then p is called a boundary point.

Direct density reachability: an object p is said to be directly density-reachable from an object q if p is in the Eps-neighbourhood of q, i.e., $p \in N_{Eps}(q)$, and q is a core point, i.e., $|N_{Eps}(q)| \ge MinPts$.

Density reachable: given a dataset D, an object $p_n$ is said to be density-reachable from an object $p_1$ if there exists a chain of objects $p_1, p_2, \ldots, p_n \in D$ such that $p_{i+1}$ is directly density-reachable from $p_i$ for every $0 < i < n$. Density reachability is asymmetric.

Density connected: if there exists an object $r \in D$ such that objects p and q are both density-reachable from r, then p and q are said to be density-connected. Density connectedness is symmetric.

Clusters and Noise: Starting from any core point object, all objects that are density-reachable from that object form a cluster, and objects that do not belong to any cluster are noise.

DBSCAN algorithm flow

The DBSCAN algorithm needs to manually set two parameters, neighbourhood radius threshold Eps and neighbourhood density threshold MinPts , before the clustering starts, after which the data objects in the spatio-temporal dataset can be automatically classified into different clusters, and the implementation steps are as follows:

1) Input the spatio-temporal dataset D, the neighbourhood radius threshold Eps, and the neighbourhood density threshold MinPts.

2) Starting from any unprocessed data object q in D, traverse all the data objects within the search range of radius Eps. If the number of data objects covered within that range is not less than MinPts, these data objects are grouped into a cluster C with q as the core object.

3) If the number of data objects covered by the neighbourhood of q does not reach MinPts, q is temporarily treated as a noise point.

4) Access the other unprocessed data objects in turn to form core objects, noise points, and clusters, and merge and expand the data objects and clusters that are directly density-reachable or density-reachable.

5) Repeat steps 2) to 4) until no new data objects are merged into any cluster or labelled as noise points, then end the algorithm.

6) Output the clustering results.
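The steps above can be sketched in pure Python as follows. This is a minimal illustration, not the implementation used in the study: it assumes Euclidean distance, counts a point as a member of its own neighbourhood, uses a brute-force neighbourhood search, and labels noise as -1; the names `region_query` and `dbscan` are hypothetical.

```python
import math

NOISE = -1  # label used for noise points (an assumption of this sketch)


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx]."""
    return [j for j, q in enumerate(points)
            if math.dist(points[idx], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ... or NOISE)."""
    labels = [None] * len(points)  # None means "not yet processed"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # provisional: may become a boundary point
            continue
        cluster_id += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster_id
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # noise reached from a core point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is core: keep expanding
    return labels
```

With two tight groups of three points and one distant point, `eps=1.5` and `min_pts=3` yield two clusters and one noise point. In practice the results depend strongly on the choice of Eps and MinPts and on standardising the features beforehand.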

Visual analysis of online learning behaviour data and intervention strategies
Learning Social Analytics

In this paper, after a comparative analysis of various analysis tools, UCINET is used to analyse the social behaviours of English learners on learning forums. UCINET integrates NetDraw, which provides visualisation functions; Mage, a three-dimensional display and analysis program; and Pajek, a free application for large-scale network analysis. It can therefore analyse social network relationships more comprehensively.

Analysis of the structure of the teacher-student network

The main functions of the forum are posting and replying. In this paper, the records of these two behaviours are extracted from the log data with the keywords “Create-Post” and “Reply-Post” respectively. After cleaning and preprocessing, the records are organised into a two-dimensional matrix of teacher-student interactions, as shown in Table 2, with Si denoting an English learner and T denoting the teacher. Due to the large amount of data, only a portion is displayed.

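Table 2. Teacher-student interaction two-dimensional matrix

|  | S1 | S2 | S3 | S4 | S5 | S6 | …… | T |
|---|---|---|---|---|---|---|---|---|
| S1 | 0 | 1 | 0 | 0 | 1 | 0 | …… | 2 |
| S2 | 0 | 0 | 2 | 1 | 0 | 0 | …… | 4 |
| S3 | 2 | 0 | 0 | 0 | 1 | 0 | …… | 6 |
| S4 | 1 | 2 | 0 | 0 | 0 | 0 | …… | 4 |
| S5 | 0 | 0 | 1 | 0 | 1 | 0 | …… | 2 |
| S6 | 1 | 1 | 0 | 0 | 0 | 1 | …… | 3 |
| …… | …… | …… | …… | …… | …… | …… | …… | …… |
| T | 2 | 3 | 3 | 4 | 3 | 2 | …… | 0 |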

With the UCINET analysis tool, the two-dimensional matrix can be converted into a visualised network structure diagram of teacher-student interaction, as shown in Figure 1. The figure depicts the interactions between the teacher and learners of the course on the online forum of the cloud classroom platform. Each node represents a learner or the teacher. The larger the node, the more posts and replies that student or teacher has made in the forum and the more active they are; a connecting line between two nodes indicates a reply relationship between them.

Figure 1. Student interaction network diagram

From the teacher-student interaction network diagram, it can be seen that all members (including the teacher) belong to one network and there is no isolated member. Node T, representing the teacher, is clearly larger than the other nodes. This directly reflects the interaction between ELLs and the teacher and indirectly indicates the teacher’s dominant role in the forum: ELLs are more willing to reply to the teacher’s posts, whether because of the teacher’s authority in the classroom or because the post topics are interesting and invite discussion.

Density analysis

After calculating the network density as shown in Table 3, the network density of the course is 0.207, with a standard deviation of 0.658, which indicates that the English learners in the course are not very closely connected, and the interaction of the English learners in the forum is not particularly frequent. Combined with the network diagram of teacher-student interactions, it can be seen that the members on the edge of it have a relatively low level of interaction with other members, so the teacher should adopt appropriate strategies to guide the interaction of this part of the members with other members.

Table 3. Density analysis

| Avg Value | Std Dev | Avg Wtd Degree |
|---|---|---|
| 0.207 | 0.658 | 12.393 |

Centrality analysis

The centrality of English learners can be analysed through UCINET, and combined with the algorithms related to the analysis of social network centrality, the relative point degree centrality, intermediate centrality, and proximity centrality were calculated, and the results were obtained as shown in Table 4, Table 5, and Table 6 respectively.

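Table 4. Relative point (degree) centrality analysis

| English learner | Out-degree | In-degree | Degree centrality | English learner | Out-degree | In-degree | Degree centrality |
|---|---|---|---|---|---|---|---|
| 2 | 20 | 49 | 0.57 | 56 | 10 | 9 | 0.15 |
| 14 | 19 | 31 | 0.41 | 6 | 7 | 11 | 0.14 |
| 26 | 27 | 14 | 0.34 | 39 | 12 | 6 | 0.14 |
| 4 | 26 | 11 | 0.3 | 45 | 13 | 5 | 0.14 |
| 11 | 17 | 20 | 29.01 | 52 | 9 | 9 | 0.14 |
| 8 | 20 | 11 | 0.25 | 61 | 10 | 8 | 0.14 |
| 13 | 14 | 17 | 0.25 | 26 | 10 | 7 | 0.14 |
| 18 | 14 | 17 | 0.25 | 29 | 9 | 8 | 0.14 |
| 24 | 25 | 16 | 0.25 | 37 | 8 | 9 | 0.14 |
| 22 | 18 | 12 | 23.01 | 42 | 8 | 9 | 0.14 |
| 33 | 15 | 15 | 0.24 | 47 | 6 | 11 | 0.14 |
| 5 | 19 | 10 | 0.24 | 24 | 5 | 11 | 0.13 |
| 30 | 13 | 15 | 0.23 | 40 | 7 | 8 | 0.12 |
| 10 | 15 | 12 | 0.23 | 54 | 9 | 6 | 0.12 |
| 15 | 13 | 14 | 0.22 | 36 | 6 | 8 | 0.11 |
| 20 | 20 | 7 | 0.22 | 46 | 7 | 7 | 0.11 |
| 32 | 13 | 20 | 0.22 | 48 | 8 | 6 | 0.11 |
| 21 | 16 | 10 | 0.21 | 49 | 8 | 6 | 0.11 |
| 27 | 13 | 14 | 0.21 | 57 | 5 | 9 | 0.11 |
| 1 | 20 | 5 | 0.2 | 59 | 8 | 6 | 0.11 |
| 12 | 13 | 13 | 0.2 | 38 | 6 | 7 | 0.1 |
| 19 | 13 | 2 | 0.2 | 41 | 6 | 7 | 0.1 |
| 16 | 16 | 10 | 0.19 | 50 | 6 | 7 | 0.1 |
| 3 | 12 | 11 | 0.19 | 55 | 9 | 4 | 0.1 |
| 17 | 11 | 11 | 0.18 | 60 | 6 | 7 | 0.1 |
| 7 | 12 | 9 | 0.17 | 51 | 7 | 5 | 0.09 |
| 9 | 10 | 11 | 0.17 | 43 | 4 | 5 | 0.07 |
| 31 | 8 | 13 | 0.27 | 44 | 3 | 4 | 0.05 |
| 34 | 12 | 9 | 0.17 | 53 | 4 | 3 | 0.05 |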

Table 5. Relative intermediate (betweenness) centrality analysis

| English learner | Betweenness | Relative centrality | English learner | Betweenness | Relative centrality |
|---|---|---|---|---|---|
| 2 | 50.226 | 0.029 | 31 | 9.327 | 0.001 |
| 4 | 29.1 | 0.018 | 55 | 8.853 | 0.006 |
| 24 | 29.257 | 0.016 | 20 | 8.46 | 0.006 |
| 14 | 24.383 | 0.015 | 21 | 7.71 | 0.005 |
| 50 | 22.433 | 0.014 | 29 | 7.26 | 0.005 |
| 18 | 20.583 | 0.013 | 51 | 7.21 | 0.005 |
| 10 | 20.3 | 0.012 | 39 | 5.677 | 0.004 |
| 26 | 20.133 | 0.012 | 23 | 5.51 | 0.004 |
| 13 | 19.85 | 0.012 | 25 | 5.427 | 0.004 |
| 8 | 18.526 | 0.011 | 48 | 5.343 | 0.004 |
| 33 | 16.38 | 0.01 | 35 | 5.26 | 0.004 |
| 22 | 16.217 | 0.01 | 53 | 5.177 | 0.004 |
| 5 | 15.6 | 0.01 | 44 | 4.627 | 0.004 |
| 17 | 15.183 | 0.01 | 45 | 4.593 | 0.004 |
| 52 | 14.993 | 0.009 | 93 | 4.51 | 0.004 |
| 11 | 14.693 | 0.009 | 47 | 4.127 | 0.003 |
| 16 | 14.267 | 0.009 | 40 | 2.927 | 0.003 |
| 34 | 14.217 | 0.009 | 56 | 2.01 | 0.002 |
| 12 | 14.1 | 0.009 | 58 | 2.427 | 0.002 |
| 60 | 13.68 | 0.009 | 49 | 2.377 | 0.002 |
| 36 | 12.68 | 0.008 | 59 | 2.32 | 0.002 |
| 15 | 12.183 | 0.008 | 37 | 2.26 | 0.002 |
| 1 | 11.017 | 0.007 | 54 | 2.01 | 0.002 |
| 27 | 10.633 | 0.007 | 28 | 1.577 | 0.002 |
| 41 | 10.183 | 0.007 | 50 | 1.653 | 0.002 |
| 19 | 9.767 | 0.006 | 46 | 1.593 | 0.002 |
| 6 | 9.66 | 0.006 | 42 | 0.76 | 0.001 |
| 9 | 9.6 | 0.006 | 43 | 0.01 | 0.001 |
| 7 | 9.6 | 0.006 | 52 | 0.01 | 0.001 |

Relative closeness centrality analysis

English learner Farness Closeness centrality English learner Farness Closeness centrality
18 224 0.29 23 231 0.272
30 224 0.29 39 231 0.272
10 224 0.29 47 231 0.272
33 224 0.29 44 291 0.262
13 225 0.289 25 231 0.272
22 225 0.289 38 231 0.272
5 225 0.289 45 232 0.271
1 225 0.289 53 232 0.271
32 226 0.288 48 232 0.271
17 227 0.286 58 232 0.271
12 226 0.286 56 233 0.27
15 227 0.286 46 253 0.27
60 227 0.286 37 233 0.27
16 227 0.286 40 233 0.27
34 227 0.286 49 2353 0.27
20 227 0.286 28 293 0.27
29 228 0.285 59 234 0.269
31 228 0.285 54 234 0.269
36 228 0.285 50 235 0.267
55 228 0.285 42 236 0.266
19 228 0.285 43 238 0.264
3 228 0.285 52 239 0.263
27 228 0.285 57 299 0.263

Degree centrality indicates the direct connections between English learners: the higher the degree centrality, the closer a learner's direct connections with others and the more attention they exchange. As can be seen in Table 4, the relative degree centrality of ELLs 2 and 14 is very high, indicating that they are very willing to connect with other students and remain highly active in the forum. Their in-degree is significantly greater than their out-degree, which indicates that they have high prestige in the forum. The degree centrality of ELLs 42, 43, 52, and 57 is very low, which indicates that they seldom connect with other students. Both their in-degree and out-degree are very low: they rarely pay attention to others, rarely receive attention from others, and interact very little in the forum.
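The relative degree centrality discussed above can be sketched as the sum of a member's out-degree and in-degree divided by the maximum possible number of directed ties, 2(n-1). The network size is not stated in the text; the value of 62 members used below is an assumption, chosen only because it reproduces the tabulated values for learners 2, 14 and 26 to two decimals (0.57, 0.41 and 0.34).

```python
def relative_degree_centrality(out_degree, in_degree, n_members):
    """Share of the possible directed ties that a member actually has."""
    return (out_degree + in_degree) / (2 * (n_members - 1))


# (out-degree, in-degree) pairs for learners 2, 14 and 26, copied from Table 4.
table4_rows = {2: (20, 49), 14: (19, 31), 26: (27, 14)}

# Assumption: 62 network members (not stated in the text).
N_MEMBERS = 62

for learner, (out_deg, in_deg) in table4_rows.items():
    value = relative_degree_centrality(out_deg, in_deg, N_MEMBERS)
    print(learner, round(value, 2))
```

Comparing the two inputs separately is what supports the prestige reading: learner 2 receives 49 ties but sends only 20.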

From Table 5, we can see that the betweenness centrality of ELLs 2, 4, 24, 14, 30, and 18 is high, which indicates that they have more control over the resources in the forum: they can control the dissemination and flow of information and act as bridges for communication among many members. The betweenness centrality of ELLs 43, 52, and 57 is 0, which indicates that they are on the periphery of the forum and seldom facilitate communication among others.
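To make the notion of betweenness concrete, here is a minimal sketch of Brandes' algorithm for an unweighted directed network. It counts how many shortest paths between other pairs of members pass through each member and then divides by (n-1)(n-2), the number of ordered pairs of other members. The toy network is hypothetical, and UCINET's normalisation may differ from the one assumed here.

```python
from collections import deque


def betweenness(graph):
    """Raw betweenness for an unweighted directed graph.

    graph maps every node to the list of nodes it links to
    (each linked node must also appear as a key).
    """
    scores = {v: 0.0 for v in graph}
    for source in graph:
        stack = []
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = {v: 0 for v in graph}    # number of shortest paths from source
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                     # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                     # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    return scores


def relative_betweenness(graph):
    """Betweenness divided by the number of ordered pairs of other nodes."""
    n = len(graph)
    max_pairs = (n - 1) * (n - 2)
    raw = betweenness(graph)
    return {v: (b / max_pairs if max_pairs else 0.0) for v, b in raw.items()}


# Hypothetical forum in which learners A, B and C only exchange posts with T.
forum = {"T": ["A", "B", "C"], "A": ["T"], "B": ["T"], "C": ["T"]}
print(relative_betweenness(forum))
```

In this toy network every path between two learners runs through T, so T has the maximum relative betweenness and the learners have none, which mirrors the bridging role described above.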

Closeness centrality indicates the extent to which English learners depend on others. As can be seen from Table 6, the learners' closeness centrality values are close to one another and generally low, which indicates that learners depend on other members to some degree when posting and replying in the forum. Combined with the diagram of the teacher-student social network structure, this dependence may stem from dependence on the teacher.
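The two columns of Table 6 can be read as farness (the sum of shortest-path distances from a member to the others) and a closeness score derived from it. The sketch below computes both with a breadth-first search and assumes the common definition of closeness as the number of reachable members divided by farness; the exact normalisation behind the tabulated values is not given in the text, and the toy network is hypothetical.

```python
from collections import deque


def farness_and_closeness(graph, node):
    """Return (farness, closeness) of `node` in an unweighted graph.

    farness   = sum of shortest-path distances to every reachable node
    closeness = number of reachable nodes / farness
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    farness = sum(dist.values())
    reachable = len(dist) - 1
    closeness = reachable / farness if farness else 0.0
    return farness, closeness


# Hypothetical forum in which learners A, B and C only exchange posts with T.
forum = {"T": ["A", "B", "C"], "A": ["T"], "B": ["T"], "C": ["T"]}
for member in forum:
    print(member, farness_and_closeness(forum, member))
```

Learners who can reach others only through the teacher have a larger farness and a lower closeness than the teacher, which is one way to picture the dependence on the teacher suggested above.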

Descriptive Analysis of Online Learning Behavioural Characteristics
Analysing the Frequency of Course Learning for English Learners

The online learning platform records log data of English learners' online learning behaviours: every operation a learner performs during online learning is logged. Operation frequency refers to the number of operations a learner performs on the course resources while studying a course, and each operation adds one entry to the log. In line with the online teaching plan, data from the autumn 2019 semester were selected. The log data cover 153 days across 5 months of course teaching, and 5,163 students were sampled for analysis.

SPSS Statistics was used to obtain the number of page views of each English learner taking the course during the semester, as shown in Figure 2. The horizontal axis gives the unique identifier of each English learner, and the vertical axis gives the number of times that learner viewed course pages on the online learning platform during the semester. The figure shows clear differences in how often learners studied the course: the smallest number of course page views in the semester is 711, and the largest is 13,312.

Figure 2.

Course page browsing frequency

The number of course page views per learner is then sorted in descending order, as shown in Figure 3. To analyse the learners' study of the course more clearly, a threshold of 1,000 course page views is used to divide them into two groups: learners with fewer than 1,000 course page views in the semester and learners with more than 1,000.

Figure 3.

Course page views sorted in descending order

The statistics for English learners with fewer than 1,000 course page views are shown in Figure 4. This group accounts for 97.7% of all learners; its total number of course page views in the semester is 24,276, an average of 48 per learner. About 94.1% of this group viewed course pages no more than 200 times in the semester.

Figure 4.

Learners with fewer than 1,000 course page views

The statistics for English learners with more than 1,000 course page views are shown in Figure 5. This group accounts for only 2.6% of all learners; its total number of course page views in the semester is 41,764, an average of 3,400 per learner. About 73.5% of this group viewed course pages no more than 4,000 times in the semester.

Figure 5.

Learners with more than 1,000 course page views
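The two-group comparison above amounts to splitting learners at a page-view threshold and summarising each group by its share of learners, total views and mean views. A minimal sketch follows; the learner identifiers and counts are invented for illustration, and placing a learner with exactly 1,000 views in the upper group is an assumption, since the text does not say how that case is handled.

```python
from statistics import mean


def summarise_groups(page_views, threshold=1000):
    """Split learners at `threshold` page views and summarise each group.

    page_views maps a learner identifier to that learner's semester page views.
    """
    low = [v for v in page_views.values() if v < threshold]
    high = [v for v in page_views.values() if v >= threshold]
    total_learners = len(page_views)

    def describe(values):
        return {
            "learners": len(values),
            "share": len(values) / total_learners if total_learners else 0.0,
            "total_views": sum(values),
            "mean_views": mean(values) if values else 0.0,
        }

    return {"below": describe(low), "at_or_above": describe(high)}


# Hypothetical page-view counts for five learners.
toy_views = {"s1": 40, "s2": 120, "s3": 711, "s4": 3400, "s5": 13312}
for group, stats in summarise_groups(toy_views).items():
    print(group, stats)
```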

Analysis of English Learners’ Learning Time Preferences

In line with the online teaching plan, the online learning behavior data of English learners from autumn 2019 were again sampled and analysed. The platform recorded a total of 607,618 log entries of English learners interacting with the course while studying online, from which the overall time distribution of students' participation in online learning was obtained, as shown in Figure 6. The figure shows a certain regularity in learners' study time: they visit the online learning platform more at the beginning and at the end of the semester, and visits increase markedly and peak at the end of the semester.

Figure 6.

Overall time distribution of course study

In addition to course browsing, the course discussions on the online learning platform were analysed. They follow a pattern similar to that of course browsing, as shown in Figure 7. The figure shows that learners' discussion and evaluation of the courses are concentrated in the first and last months of the semester, and the number of discussions at other times is close to zero.

Figure 7.

Overall statistics of course discussions

Overall, the frequency of online learning rises rapidly at the start of the semester, declines and levels off, and then rises sharply again at the end of the semester. This suggests that English learners allocate their online learning time poorly: participation and motivation are low in the middle of the semester, and learners invest more time and energy only at the end of the semester in order to complete their exams.
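A time distribution of this kind can be obtained by counting log entries per calendar month and then looking for months that fall well below the busiest one. The sketch below assumes ISO-8601 timestamp strings; the sample timestamps and the 50% cut-off for a "quiet" month are hypothetical and not taken from the platform.

```python
from collections import Counter
from datetime import datetime


def monthly_activity(timestamps):
    """Count log entries per (year, month) from ISO-8601 timestamp strings."""
    counts = Counter()
    for ts in timestamps:
        moment = datetime.fromisoformat(ts)
        counts[(moment.year, moment.month)] += 1
    return dict(sorted(counts.items()))


def quiet_months(counts, fraction=0.5):
    """Months whose activity is below `fraction` of the busiest month."""
    if not counts:
        return []
    peak = max(counts.values())
    return [month for month, c in counts.items() if c < fraction * peak]


# Hypothetical log timestamps spread over one semester.
sample_logs = [
    "2019-09-02T08:15:00", "2019-09-03T20:01:00", "2019-09-10T09:00:00",
    "2019-10-14T10:30:00",
    "2019-11-20T21:45:00",
    "2020-01-05T19:00:00", "2020-01-06T19:30:00",
    "2020-01-07T22:10:00", "2020-01-08T07:50:00",
]
activity = monthly_activity(sample_logs)
print(activity)
print(quiet_months(activity))
```

Months flagged as quiet are natural candidates for the mid-semester interventions discussed later in the paper.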

Classification of Online English Learner Types

The DBSCAN clustering algorithm was used for unsupervised clustering of the dataset, with the 14 behavioural characteristics of English learners' online learning as the clustering features. The clustering scores for C = 2, 3, 4, and 5 categories were 0.55, 0.59, 0.51, and 0.49 respectively. The clustering therefore performs best with three categories (n_clusters=3), and the resulting segmentation is shown in Figure 8.

Figure 8.

Clustering of English learners' behavior
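For readers unfamiliar with the procedure, the following is a minimal pure-Python sketch of DBSCAN together with a mean silhouette coefficient. Two assumptions should be noted. First, the text does not name the score it reports; the silhouette coefficient is used here only as a plausible stand-in. Second, DBSCAN has no parameter for the number of clusters: the count follows from the neighbourhood radius `eps` and the minimum neighbourhood size `min_pts`, so different cluster counts would presumably come from varying those settings. The 2-D points are toy data, not the 14 behavioural features.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def neighbours(i):
        # Includes the point itself, as in the usual min_pts convention.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is a core point: keep expanding
                queue.extend(more)
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over non-noise points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            a = sum(math.dist(points[i], points[j])
                    for j in members if j != i) / (len(members) - 1)
            b = min(
                sum(math.dist(points[i], points[j]) for j in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab
            )
            scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)


# Toy data: three tight groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
          (2.0, 0.0), (2.1, 0.0), (2.0, 0.1)]
labels = dbscan(points, eps=0.2, min_pts=3)
print(labels)
print(round(silhouette(points, labels), 2))
```

Repeating the last three lines for several `eps` and `min_pts` settings and keeping the setting with the highest score is one way to arrive at a comparison like the one reported above.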

To characterise the different clusters of English learners, the differences in online learning behaviors among the 3 clusters were analyzed by comparing the mean values of the 14 online learning behavioral indicators of each cluster, as shown in Figure 9. The figure shows that the trajectories of the online learning behaviors of the 3 clusters are roughly similar, but there are clear differences in some behavioral features. The typical characteristics are as follows:

CLUSTER 0: The online learning activities of this category of English learners are relatively balanced compared with the other categories. Of the 14 indicators characterizing online learning behaviors in SPOC environments, only 2 are lower than 0.1 and the remaining 12 are above 0.1, which shows a certain degree of activity.

CLUSTER 1: The online learning activities of this category of English learners fluctuate the most among the three categories. Of the 14 indicators that characterize online learning behaviors in SPOC environments, 2 indicators are lower than 0.1, 3 are equal to 0.1, and the remaining 7 behavioral features are above 0.1, which shows that the online learning behaviors of this category of English learners are not active enough.

CLUSTER 2: The online learning activities of this category of English learners are the most stable among the three categories. All 14 indicators characterizing online learning behaviors are above 0.1, which shows a certain degree of activity for most learning behaviors. Accordingly, the English learners in this cluster are named “highly active English learners”.

Figure 9.

English learner online behavior data clustering results
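The cluster comparison above rests on two simple operations: averaging each behavioural indicator within a cluster, and counting how many of those means fall below the 0.1 level used in the descriptions. The sketch below shows both steps on invented data with three indicators per learner rather than the 14 used in the study; the feature values and labels are hypothetical.

```python
def cluster_profiles(features, labels):
    """Mean of each behavioural indicator per cluster.

    features: one row per learner, each a list of indicator values in [0, 1]
    labels:   the cluster label of each learner, in the same order
    """
    sums, counts = {}, {}
    for row, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def count_below(profile, cutoff=0.1):
    """Number of indicator means that are below `cutoff`."""
    return sum(1 for m in profile if m < cutoff)


# Hypothetical data: four learners, three indicators, two clusters.
features = [
    [0.20, 0.05, 0.30],
    [0.40, 0.07, 0.50],
    [0.02, 0.03, 0.20],
    [0.04, 0.05, 0.40],
]
labels = [0, 0, 1, 1]

profiles = cluster_profiles(features, labels)
for lab, profile in profiles.items():
    print(lab, [round(m, 2) for m in profile], count_below(profile))
```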

Design of Instructional Intervention Strategies
Conceptual readiness and learning adaptation

Students should be guided to accept new forms of learning, such as the shift of teacher-student and student-student interaction from face-to-face to remote interaction via network devices. Emotional expression, text pop-ups, emoticon pop-ups, symbols and similar interactive methods become new forms in the teaching process. Guiding students to accept and adapt to them makes the teaching process smoother and more natural and the learning process more active and engaged.

Students should be encouraged to take part in new elements of teaching. These include interactive sessions (voting, scoring, quizzing, feedback, socialising) and incentive sessions (speaking the most, getting the most votes, snatching the star, getting the most likes, good thinking, etc.). Encouraging students to participate in well-designed sessions is the only way for English language learners to achieve the predetermined teaching goals.

Quantifying evaluation criteria to activate learning motivation

In the online learning environment, the learning behaviour data generated by English learners are recorded and collected in real time, and value judgements about their learning are made from these data. Such judgements vary with teaching objectives, teaching content, and teaching environments, but they all require appropriate evaluation dimensions, refined evaluation indicators, and evaluation criteria that give students clear expectations. Only then can the evaluation results be authentic, objective, and a comprehensive reflection of the students' situation, so that ELLs can check their own progress and teachers can launch interventions.

Behavioural monitoring for early warning and enhanced feedback on learning

The behaviour of ELLs is monitored in order to strengthen the assessment of their expected learning outcomes, give timely warnings about learners found to be at risk, strengthen the intervention process, and reinforce feedback on learning. This process involves differentiated intervention for different types of ELLs. Higher-level teaching assistants (TAs) and excellent course specialists at the same level can be brought into the learning process. The frequency and duration of low-performing ELLs' behaviours in each session are tallied and summarised for the TAs on a weekly basis. The TAs convert the summaries into weekly behavioural assessments using the quantitative evaluation criteria, monitor the low-performing learners' behaviour on the basis of these assessments, and give them targeted, real-time behavioural feedback.
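The weekly summary step described above could be supported by a small script that tallies each learner's session count and total study time and lists those who fall below agreed thresholds. The sketch below is only illustrative: the thresholds of 3 sessions and 60 minutes per week, the learner identifiers and the durations are all hypothetical and would need to be replaced by the quantitative evaluation criteria set for the course.

```python
def weekly_warnings(weekly_logs, min_sessions=3, min_minutes=60):
    """Return {learner_id: [reasons]} for learners below the weekly thresholds.

    weekly_logs maps a learner identifier to the list of that learner's
    session durations (in minutes) for one week.
    """
    warnings = {}
    for learner, durations in weekly_logs.items():
        reasons = []
        if len(durations) < min_sessions:
            reasons.append(f"only {len(durations)} session(s)")
        if sum(durations) < min_minutes:
            reasons.append(f"only {sum(durations)} minute(s) of study")
        if reasons:
            warnings[learner] = reasons
    return warnings


# Hypothetical week of session durations for three learners.
week = {"L01": [30, 45, 20], "L02": [10], "L03": [25, 40]}
for learner, reasons in weekly_warnings(week).items():
    print(learner, "; ".join(reasons))
```

Listing the reasons next to each flagged learner gives the TAs a starting point for targeted feedback instead of a bare at-risk label.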

Conclusion

The study comprehensively analysed the online learning behaviours of English learners by collecting and pre-processing the data generated during their online learning and clustering the behavioural data with the DBSCAN method. The results were used to quantify pedagogical interventions and to design improvement strategies for different types of learners. The specific results are as follows:

Analysis of the collected data on English learners' online social behaviours showed that learners interacted with the teacher to a high degree, with the teacher playing the dominant role, and interacted with other learners to a lower degree, although no learner was isolated.

The online learning platform recorded English learners' online learning behaviour data, and the data of 5,163 students were extracted for analysis. Within one semester, the minimum number of course page views was 711 and the maximum was 13,312, a large difference.