Open Access

Research on the Integration of Student Behavior Analysis and Curriculum Education Strategies in Colleges and Universities under Deep Learning Framework

24 Mar 2025

Introduction

In recent years, educational data mining has attracted widespread attention, driven by applications in education informatization, distance education, smart education and the flipped classroom. With the continuous accumulation of student behavioral data, analyzing learners’ behavioral characteristics is essential for intelligently adjusting educational and teaching strategies in higher education and improving learning quality [1-3]. However, traditional analysis and management of student behavior mostly relies on the personal experience of administrators and decision makers, and therefore lacks awareness of learners’ personalized development [4-5]. It also cannot deeply guide students’ learning behaviors, provide personalized learning contexts, or promote learning optimization [6-7]. With today’s increasingly advanced information technology, machine learning and artificial intelligence techniques can mine students’ in-school and out-of-school life and learning behaviors and build models of students’ behavioral characteristics to predict their future development trends [8-10]. The significance lies in being able to monitor students’ daily behavioral safety in a timely manner and to provide timely learning guidance and psychological intervention for problematic students [11].

The application of deep learning in education has attracted much attention, especially in the analysis of students’ classroom behavior. Deep learning algorithms have the advantages of processing complex data, effective feature extraction and adaptability, and student classroom behavior analysis plays an important role in teaching reform [12-13]. By analyzing student behavior data, teachers can gain a comprehensive understanding of student learning and provide effective guidance [14-15]. In addition, student classroom behavior analysis can monitor student engagement, track attention levels, identify emotional states, and even assess the effectiveness of teaching and learning, thus providing valuable information for education [16-18].

Fu, R. et al. constructed a classroom learning behavior analysis system based on a convolutional neural network that identifies students’ listening, fatigue, hand-raising, lying down, and reading/writing behaviors using a human detection algorithm and a skeletal key point extraction algorithm, supporting the analysis of students’ classroom behavior under real conditions [19]. Ma, C. et al. showed that the analysis of students’ classroom behavior has become an important index for teaching quality evaluation, and proposed a deep learning-based face recognition system that determines the effectiveness of classroom teaching by analyzing students’ side-face, head-down, and eye-focus states, facilitating the development and implementation of teaching [20]. Zhou, J. et al. proposed a method for detecting students’ typical classroom behaviors based on extracting key information from the human skeleton, which effectively avoids the low recognition accuracy caused by students’ physique, clothing cover-up and background interference; it reflects students’ learning status and serves as a guide to the implementation of classroom teaching [21]. Jia, Q. et al. examined a YOLOv5 behavior detection model that, on one hand, incorporates a contextual attention mechanism to enhance the model’s feature extraction capability and, on the other hand, replaces the feature maps generated by VGG-19 in OpenPose, improving recognition accuracy; it is highly practical for smart classroom behavior analysis and faculty management optimization [22]. Liu, S. et al. investigated a deep learning-based method for evaluating learners’ classroom behavior, using the MTCNN detection method with an image enhancement algorithm to identify students’ classroom behaviors, combined with a quantitative evaluation method, CFIndex, to evaluate students’ real classroom performance [23]. Gupta, S. K. et al. designed a method for analyzing students’ emotional content based on maximum-margin face detection, which recognizes students’ different emotional states in the classroom to form instructional feedback, allowing teachers to adjust their teaching strategies in time, which is conducive to improving students’ learning efficiency [24]. Gong, B. et al. analyzed the value of deep learning technology for intelligent classroom behavior recognition by applying the random forest algorithm and a correction matrix to classroom teaching behavior data in colleges and universities; the results showed that analyzing teaching activities under a deep learning framework optimizes teachers’ teaching and management of students and helps improve teaching and learning efficiency [25]. Trabelsi, Z. et al. introduced artificial intelligence behavior recognition technology to establish a student-centered intelligent real-time visual classroom, which detects students’ mood, attendance and attention level in multiple dimensions and provides teachers with visual analysis results, benefiting teachers’ teaching management and students’ learning efficiency [26]. Akila, D. et al. developed a student behavior analysis framework for offline education using a Deep Learning-Student Attention Recognition Model (DL-SARF) that extracts feature information from students’ side-face, head-down, and eye states to assess classroom performance, making instructional programs easier to manage and implement [27]. Liu, H. et al. proposed a student behavior recognition method based on the Rs-YOLOv3 network to address the complexity and slowness of traditional human behavior recognition, replacing the resn module with the SE-Res2net module and using DIoU_Loss as the bounding-box loss function, which significantly improves the model’s recognition of small targets [28]. Su, X. et al. evaluated the accuracy and effectiveness of the YOLO v5s algorithm in recognizing students’ classroom behaviors by feeding image data annotated with the LabelImg tool into a deep learning detection model, providing technical support for improving teaching quality and strengthening classroom management [29].

In this paper, we propose a student behavior recognition method based on improved OpenPose to be applied to a student behavior analysis system in universities. The method first performs a series of preprocessing on the extracted video frame images, and then detects the location of the target student in the images using a target detection algorithm incorporating an attention mechanism. Next, the detected images are used to extract the coordinates of the student’s skeletal joint points through an improved OpenPose human 2D pose estimation model. Finally, the learning behavior is classified by training a support vector machine. Based on the recognition results of student behaviors, educators can gain a deeper understanding of students’ behavioral habits, preferences, and needs in order to design effective educational strategies for their courses.

Behavioral analysis of college students based on deep learning
Deep Learning Based Student Behavior Analysis System Architecture for Colleges and Universities
Deep Learning Theory

Deep learning is a kind of deep constructive learning style, which requires learners to be able to critically learn new knowledge on the basis of understanding, relate it to the existing cognitive structure, effectively internalize, transfer and apply the knowledge, relate the knowledge to new real-life situations and solve practical problems, so as to obtain the development of higher-order abilities. Deep learning focuses on the accumulation of knowledge, as well as the emotional experience, value recognition, and ability development of learners during the process of knowledge exploration.

Teaching methods such as problem-based learning and task-driven learning are considered to be effective ways to promote deep learning, and these teaching methods emphasize the initiative and participation of learners, encouraging them to keep exploring and reflecting in the process of solving practical problems. At the same time, deep learning also puts forward high requirements for the context of knowledge learning, suggesting that teaching should be closely linked with practice and real situations, so that learners can experience the power and value of knowledge in real or simulated situations. At present, deep learning has become an important trend in the development of higher education, and is also the goal pursued by blended teaching in the context of informationization. In course construction and teaching, it is necessary to effectively connect teaching time and space, strengthen teaching and learning interaction, purposefully design and carry out learning tasks based on problems, tasks and real situations, provide students with rich learning resources, stimulate their interest and motivation to learn, and guide them to actively explore and actively practice, so as to achieve deep learning.

Student Behavior Analysis Framework

In this paper, we design a deep learning-based behavior analysis system for students in colleges and universities. For the subsequent development of the system, the system architecture should follow the principles of modularization, high cohesion, and low coupling when designing, in order to meet the needs of scalability, stability, and easy maintenance of the system. The overall architecture of the system can be divided into four layers from bottom to top: data storage layer, data processing layer, business logic layer and user interface layer.

Data Storage Layer: selects the database (relational, non-relational, or a distributed storage system) according to the structure and access patterns of the data, while ensuring data security, integrity and efficient access.

Data Processing Layer: deep learning models run in this layer to analyze and process video frames and accurately identify different student behaviors.

Business Logic Layer: The user management module is responsible for user registration, login, and permission assignment. The data acquisition module is responsible for collecting students’ classroom behavior data from cameras, learning management systems, etc., and performing data preprocessing on the raw data. The behavior recognition algorithm management module is responsible for invoking and managing deep learning models to perform behavior recognition and analysis on the collected data. The analysis report module generates corresponding analysis reports based on recognition results and in response to user needs, providing references for teachers, students, and administrators.

User interface layer: human-computer interaction interface.
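The layered responsibilities above can be sketched as plain module boundaries. The following Python sketch is purely illustrative; the class and method names (`DataStorageLayer`, `analyze_frame`, the toy `recognize` rule, etc.) are hypothetical and not taken from the system described in this paper:

```python
from dataclasses import dataclass, field

@dataclass
class DataStorageLayer:
    """Data storage layer: persists analysis records (a list stands in for a database)."""
    records: list = field(default_factory=list)

    def save(self, record):
        self.records.append(record)

class DataProcessingLayer:
    """Data processing layer: would host the deep learning recognizer; a toy rule here."""
    def recognize(self, frame):
        return "listening" if frame.get("head_up") else "sleeping"

class BusinessLogicLayer:
    """Business logic layer: wires data acquisition, recognition, and storage together."""
    def __init__(self, storage, processor):
        self.storage, self.processor = storage, processor

    def analyze_frame(self, frame):
        behavior = self.processor.recognize(frame)
        self.storage.save({"frame": frame, "behavior": behavior})
        return behavior

# The user interface layer would call into the business logic layer:
logic = BusinessLogicLayer(DataStorageLayer(), DataProcessingLayer())
print(logic.analyze_frame({"head_up": True}))  # -> listening
```

Keeping each layer behind a small interface like this is what makes the modularization, high-cohesion, low-coupling goals above concrete: a layer can be swapped (e.g. the toy rule for a real model) without touching its neighbors.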

Student Behavior Recognition Based on Improved OpenPose

Regarding the behavior recognition module in the above system, this paper proposes a method for recognizing students’ classroom behavior based on human skeleton.

Image Preprocessing

In order to better meet the needs of student behavior recognition, the extracted video images are preprocessed as follows:

Taking the center point as the benchmark, each image is uniformly scaled and cropped to 432 × 368.

Image denoising: noise is prevalent in images, with Gaussian noise the most common kind. To effectively suppress Gaussian noise, this paper applies a Gaussian filter. The one-dimensional and two-dimensional Gaussian distributions are shown in equations (1) and (2): $$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}$$ $$G(x,y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}}$$
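Equation (2) can be turned directly into a discrete smoothing kernel. The sketch below builds a normalized 2D Gaussian kernel with NumPy; in practice a library routine such as OpenCV's `cv2.GaussianBlur` would be used, and the kernel size and σ here are illustrative:

```python
import numpy as np

def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Build a 2D Gaussian kernel from equation (2), then normalize it
    so the weights sum to 1, as a smoothing filter requires."""
    half = size // 2
    x, y = np.meshgrid(np.arange(-half, half + 1), np.arange(-half, half + 1))
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()

kernel = gaussian_kernel(5, 1.0)
# The center weight is the largest and weights fall off with distance,
# so convolving with this kernel averages each pixel with its neighbors.
```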

Target detection

For detecting student targets in video frame images, this paper uses the Tiny_YOLOv3 target detection algorithm. Because the Tiny_YOLOv3 network is shallow, detection is relatively fast but accuracy is limited. To improve detection accuracy, this paper incorporates the SENet attention mechanism into the original network structure [30]. SENet is a typical channel attention mechanism that enhances or suppresses channels for different tasks by analyzing the importance of each feature channel.
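The squeeze-and-excitation idea can be illustrated without a full network: globally pool each channel, pass the pooled vector through two small fully connected layers, and rescale each channel by the resulting sigmoid weight. The weights below are random stand-ins for learned parameters; this is a sketch of the mechanism, not the paper's implementation:

```python
import numpy as np

def se_block(feature_maps: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-Excitation over a (C, H, W) feature tensor.
    Squeeze: global average pooling per channel.
    Excitation: two small fully connected layers (ReLU then sigmoid)
    produce one weight per channel, which rescales that channel."""
    squeeze = feature_maps.mean(axis=(1, 2))        # (C,)  global average pool
    hidden = np.maximum(0, w1 @ squeeze)            # ReLU, reduced to C // r units
    weights = 1 / (1 + np.exp(-(w2 @ hidden)))      # sigmoid, back to (C,)
    return feature_maps * weights[:, None, None]    # channel-wise rescaling

rng = np.random.default_rng(0)
C, r = 8, 2                                         # channels and reduction ratio
x = rng.standard_normal((C, 6, 6))
w1 = rng.standard_normal((C // r, C))               # stand-ins for learned weights
w2 = rng.standard_normal((C, C // r))
y = se_block(x, w1, w2)
```

Because every sigmoid weight lies in (0, 1), the block can only attenuate channels, which is how it suppresses less informative feature channels in Tiny_YOLOv3.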

Target bone point acquisition

Human pose estimation based on the OpenPose algorithm is divided into two kinds, 2D pose estimation and 3D pose estimation, of which 2D pose estimation is the most studied; 2D pose estimation is further divided into single-person and multi-person pose estimation. Single-person pose estimation methods mainly comprise coordinate regression-based methods, heat map detection-based methods, and hybrid models combining coordinate regression and heat map detection. Multi-person pose estimation is mainly categorized into two types:

Top-down methods: detect each person first, then perform pose estimation for each detected person; typical models are RMPE and Mask R-CNN.

Bottom-up methods: detect all joints first, then determine which individual each joint belongs to; the most typical model is OpenPose.

OpenPose network structure

The OpenPose model structure is shown in Fig. 1, where (a) and (b) represent the external structure of the OpenPose model and the structure of the VGG19 network, respectively. The OpenPose model has two stages. Stage 1: the input image passes through the first 10 layers of the VGG19 network to obtain the feature map F. Stage 2: the feature map F is fed into a two-branch multi-stage convolutional neural network, where the upper branch (S( · ) in Fig. 1(a)) predicts a set of 2D confidence maps of body part locations, while the lower branch (L( · ) in Fig. 1(a)) predicts a set of 2D vector fields, the part affinity fields (PAFs), encoding the affinity between joints.

Figure 1.

OpenPose model structure

The input to the first stage of the OpenPose network is the feature map F, which is processed by a series of CNNs to obtain the joint 2D confidence map S1 and the part affinity field L1 [31]. From the second stage onwards, the input to the network contains three parts, F, St − 1 and Lt − 1, as shown in equation (3): $$S^t = \rho^t(F, S^{t-1}, L^{t-1}), \quad t \ge 2$$ $$L^t = \varphi^t(F, S^{t-1}, L^{t-1}), \quad t \ge 2$$

The multi-stage convolutional neural network is iterated repeatedly until the network converges. Finally, at prediction time, whether two joints belong to the same person is measured by the affinity (PAF) between the joint pair dj1 and dj2, as shown in equation (4): $$E = \int_{u=0}^{u=1} L_c(p(u)) \cdot \frac{d_{j2} - d_{j1}}{\|d_{j2} - d_{j1}\|_2}\, du$$

where p(u) denotes the pixel point interpolated between the joint points dj1 and dj2, as shown in equation (5): $$p(u) = (1-u)\, d_{j1} + u\, d_{j2}$$
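Equations (4) and (5) amount to sampling points along the candidate limb and averaging the alignment of the PAF vectors with the limb direction. A minimal numeric approximation, with a toy constant field standing in for a real predicted PAF:

```python
import numpy as np

def paf_score(d_j1, d_j2, paf_field, samples: int = 10) -> float:
    """Approximate the line integral of equation (4): sample points p(u)
    on the segment from d_j1 to d_j2 (equation (5)), look up the PAF
    vector at each point, and average its dot product with the unit
    limb direction."""
    d_j1, d_j2 = np.asarray(d_j1, float), np.asarray(d_j2, float)
    direction = (d_j2 - d_j1) / np.linalg.norm(d_j2 - d_j1)
    total = 0.0
    for u in np.linspace(0, 1, samples):
        p = (1 - u) * d_j1 + u * d_j2          # equation (5)
        total += paf_field(p) @ direction
    return total / samples

# Toy field: every pixel's PAF points along +x, so a horizontal
# candidate limb is perfectly aligned and scores about 1.
field = lambda p: np.array([1.0, 0.0])
score = paf_score((0, 0), (5, 0), field)       # approx. 1.0
```

A high score means the sampled PAF vectors point along the candidate limb, so the joint pair is assigned to the same person; a perpendicular limb scores near zero.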

OpenPose network improvement

The feature extraction network of the original OpenPose model adopts the VGG19 convolutional neural network, but research shows that once a convolutional neural network reaches a certain depth, adding layers not only fails to improve performance but also slows convergence and degrades detection performance. This paper therefore adopts MobileNet, a lightweight network designed for mobile devices with depthwise separable convolution at its core, which has fewer parameters and lower computational cost than a standard convolutional neural network. A depthwise separable convolution is composed of a depthwise convolution (DW) and a pointwise convolution (PW). The structures of standard convolution and depthwise separable convolution are shown in Fig. 2, where (a), (b), and (c) denote standard convolution, depthwise convolution, and pointwise convolution, respectively.

Assuming the input feature map size is Di × Di × M, the convolution kernel size is DK × DK × M, and the output feature map size is Dθ × Dθ × N, the number of parameters of the standard convolution in Fig. 2(a) is: $$W_{stand} = (D_K \times D_K \times M) \times N$$

Figure 2.

Standard convolution and depth-separable convolution structures

Whereas the depthwise convolution kernel in Fig. 2(b) has size (DK, DK, 1) with M kernels, and the pointwise convolution kernel in Fig. 2(c) has size (1, 1, M) with N kernels, so the parameter counts of the depthwise and pointwise convolutions are: $$\left\{ \begin{array}{l} W_{depthwise} = (D_K \times D_K \times 1) \times M \\ W_{pointwise} = (1 \times 1 \times M) \times N \end{array} \right.$$

Therefore, the number of depthwise separable convolution parameters is: $$W_D = W_{depthwise} + W_{pointwise} = (D_K \times D_K \times 1) \times M + (1 \times 1 \times M) \times N$$

Therefore, the ratio of the number of depthwise separable convolution parameters to the number of standard convolution parameters is: $$\eta = \frac{W_D}{W_{stand}} = \frac{1}{N} + \frac{1}{D_K^2}$$

The convolution kernels of VGG19 are all of size 3 × 3, so replacing VGG19 with a MobileNet network built on depthwise separable convolution reduces the number of parameters to roughly 1/9 of the original.
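The parameter counts in equations (6) through (9) are easy to verify numerically. A small sketch, using illustrative channel counts M = 64 and N = 128:

```python
def standard_params(dk: int, m: int, n: int) -> int:
    # Equation (6): one (DK x DK x M) kernel per output channel.
    return dk * dk * m * n

def separable_params(dk: int, m: int, n: int) -> int:
    # Equations (7)-(8): depthwise (DK x DK x 1) x M plus pointwise (1 x 1 x M) x N.
    return dk * dk * m + m * n

dk, m, n = 3, 64, 128
ratio = separable_params(dk, m, n) / standard_params(dk, m, n)
# Equation (9): ratio == 1/n + 1/dk**2, roughly 1/9 for 3 x 3 kernels
# since the 1/n term is small for typical channel counts.
```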

Based on the above analysis, this paper starts from improving the feature extraction network and compares the impact of four feature extraction networks, VGG19, MobileNet, MobileNetV3-small, and MobileNetV3-large, on the final results. The original model uses a large number of 7 × 7 convolution kernels, which increases computation; this paper adopts a residual structure of three 3 × 3 convolutions to replace each 7 × 7 convolution, and further improves each 3 × 3 convolution by using the depthwise separable form.

Classification of objectives

Target classification is mainly used to classify the extracted human joint point data to determine the current learning behavior of students.

Skeletal key point direct coordinate method

In this paper, the two learning behaviors of raising a hand and stretching are classified and identified by the direct skeletal key point coordinate method; the identification is shown in Fig. 3, where (a) and (b) indicate the skeletal key points of the raised-left-hand and stretching postures, respectively. The method determines which learning behavior is present by calculating and comparing the geometric relationship between the detected coordinate positions of skeletal key points in different parts of the body.

As shown in Fig. 3(a), when the key point of the left hand is higher than the key point of the nose, the action can be determined to be raising the left hand, and raising the right hand is determined in the same way. The specific calculation is as follows: taking the upper-left corner of the picture as the coordinate origin, when a left-hand key point is present in the picture, the vertical coordinate of the hand key point H is less than that of the nose point N, and the vertical coordinate of the right-hand key point is greater than that of N, the action is determined to be raising the left hand.

As shown in Fig. 3(b), when the key points of both the left and right hands are higher than the key point of the nose, the action can be determined to be stretching. The specific calculation is as follows: taking the upper-left corner of the picture as the coordinate origin, when both left- and right-hand key points are present in the picture and the vertical coordinates of the hand key points H1 and H2 are both less than that of N1, the action is determined to be stretching.

Figure 3.

Skeletal key points of left hand raised and extended posture

The skeletal key point direct coordinate method for determining learning behaviors with obvious geometric relationships not only has a high recognition accuracy, but also has a simple recognition method and a fast calculation speed.
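The coordinate rules described above reduce to simple comparisons of vertical coordinates, remembering that the image origin is the top-left corner, so a smaller y means physically higher. A hedged sketch with hypothetical function and variable names:

```python
def classify_by_keypoints(nose_y: float, left_hand_y: float, right_hand_y: float) -> str:
    """Rule-based classification following the direct coordinate method.
    The image origin is the top-left corner, so a smaller y coordinate
    means a physically higher point. Names are illustrative, not the
    paper's code."""
    if left_hand_y < nose_y and right_hand_y < nose_y:
        return "stretching"              # both hands above the nose
    if left_hand_y < nose_y:
        return "raising left hand"       # only the left hand above the nose
    if right_hand_y < nose_y:
        return "raising right hand"      # only the right hand above the nose
    return "other"

print(classify_by_keypoints(nose_y=100, left_hand_y=60, right_hand_y=150))
# -> raising left hand
```

Rules like these need no training, which is why the direct coordinate method is fast and accurate for behaviors with such clear geometric signatures.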

Skeletal key point relationship feature extraction method

The process of extracting students’ actions through the skeletal key point relationship feature extraction method is shown in Figure 4. (a), (b) and (c) represent the skeletal keypoint connection diagrams of the three actions of lying down, playing cell phone and writing respectively. It can be seen that the differences in the key point relationship features of these three actions are mainly reflected in the arms and head, so we will extract the skeletal key point relationship features for the head and arms respectively, so as to categorize and recognize these three actions.

In the classroom application scenario, the learning behavior of every student in the classroom must be identified, yet each student has a different body size and height and a different distance from the camera. Therefore, the key point feature extraction process for the three actions must not only ensure that the extracted feature vectors can distinguish the three learning behaviors, so that a support vector machine can classify them effectively, but also ensure that the feature vectors apply to every student, guaranteeing the invariance of learning behavior detection.

Figure 4.

Key bone points of lying, playing mobile phone and writing posture

Based on the above analysis, this paper extracts feature vectors by representing the three key points on the arm as two vectors and transforming the positional relationship between key points into the relationship between vectors. The specific calculation is shown in Eq. (10): the three key points S1, E1, and H1 of the left arm in Fig. 4(a) define the vectors $$\overrightarrow{E1H1}$$ and $$\overrightarrow{E1S1}$$. Taking the dot product of the two vectors and dividing by the modulus of $$\overrightarrow{E1S1}$$ gives the projection length of $$\overrightarrow{E1H1}$$ in the $$\overrightarrow{E1S1}$$ direction; dividing this projection length by the modulus of $$\overrightarrow{E1S1}$$ again yields the ratio of that projection length to the length of E1S1, denoted W1. W1 is the feature vector of the left arm; because it is a ratio, it avoids the influence of students’ body size and distance from the camera in real application scenarios, achieving invariance of learning behavior detection: $$W1 = \frac{\overrightarrow{E1H1} \cdot \overrightarrow{E1S1}}{|\overrightarrow{E1S1}|^2}$$

The calculation of the head feature is shown in Eq. (11), where the vector $$\overrightarrow{C1N1}$$ runs from the neck key point C1 to the head key point N1, and the vector $$\overrightarrow{S1S2}$$ runs from the left-shoulder key point to the right-shoulder key point. Taking the dot product of the two vectors and dividing by the modulus of $$\overrightarrow{S1S2}$$ gives the projection length of $$\overrightarrow{C1N1}$$ in the $$\overrightarrow{S1S2}$$ direction; dividing this projection length by the modulus of $$\overrightarrow{S1S2}$$ again yields the ratio of that projection length to the length of S1S2, denoted Y1. Since vectors are directional, Y1 is negative when N1 is below C1 and positive when N1 is above C1, so the two action features of head down and head up can be distinguished: $$Y1 = \frac{\overrightarrow{C1N1} \cdot \overrightarrow{S1S2}}{|\overrightarrow{S1S2}|^2}$$
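Both W1 in Eq. (10) and Y1 in Eq. (11) are instances of the same scale-invariant projection ratio. A small sketch; the key point coordinates below are made up for the example:

```python
import numpy as np

def projection_ratio(a, b, c) -> float:
    """Ratio of the projection of vector a->c onto a->b to the length
    of a->b, i.e. (AC . AB) / |AB|^2 as in equations (10) and (11).
    Being a ratio of lengths, it is invariant to uniform scaling, so
    body size and camera distance cancel out."""
    a, b, c = (np.asarray(p, float) for p in (a, b, c))
    ab, ac = b - a, c - a
    return float(ac @ ab / (ab @ ab))

# Arm feature W1: elbow E1 at the origin, shoulder S1, hand H1
# (illustrative coordinates; the hand sits halfway along the arm axis).
W1 = projection_ratio(a=(0, 0), b=(0, 2), c=(0, 1))   # -> 0.5
```

Doubling every coordinate leaves the ratio unchanged, which is exactly the detection invariance the text requires; the sign of the ratio carries direction, which is what lets Y1 separate head-down from head-up.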

Learning behavior classification based on support vector machine

In this paper, a support vector machine is used as the classification algorithm for learning behavior. The feature vectors extracted from students’ learning behavior based on human skeletal key points are taken as inputs, the support vector machine is trained to classify the learning behaviors, and a learning behavior classification model is finally obtained [32]. Specifically, the double-arm feature vector W and the head-neck feature vector Y extracted in the previous section serve as the input vectors of the support vector machine, and the three learning behaviors to be classified are labeled: lying down is labeled 1, playing with a cell phone is labeled 2, and writing is labeled 3.

To avoid overfitting in classification, the image data of the three actions were acquired under the criteria of multiple angles, multiple gestures and multiple subjects. Image data of the authors of this paper performing the three actions were automatically captured by a camera, with 10,000 images acquired for each action, resulting in a total of 30,000 images. The obtained images are then categorized into the three learning behaviors; during classification training, the images of each category are randomly shuffled, 70% of each category is assigned to the training set, and the remaining 30% is used as the test set. Finally, the feature vectors of these images are used to train the support vector machine, producing the classification model.
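The training procedure described above can be sketched with scikit-learn, assuming it is available. The (W, Y) feature values below are synthetic stand-ins for the real skeletal features, and the cluster centers are invented for illustration only:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic (W, Y) feature pairs standing in for the skeletal features;
# labels follow the text: 1 = lying down, 2 = playing with phone, 3 = writing.
rng = np.random.default_rng(0)
centers = {1: (0.9, -0.8), 2: (0.5, -0.3), 3: (0.2, 0.4)}   # invented clusters
X = np.vstack([rng.normal(c, 0.05, size=(50, 2)) for c in centers.values()])
y = np.repeat(list(centers.keys()), 50)

# 70/30 train/test split per the text, after random shuffling.
idx = rng.permutation(len(X))
cut = int(0.7 * len(X))
train, test = idx[:cut], idx[cut:]

clf = SVC(kernel="rbf").fit(X[train], y[train])
accuracy = clf.score(X[test], y[test])
```

With only two input features per sample, even this small SVM separates well-clustered behaviors cleanly; the real model's accuracy depends on how separable the W and Y features actually are.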

Curricular educational strategies incorporating student behavioral analysis

The analysis of students’ classroom behavior can provide powerful support for teaching reform. By analyzing students’ classroom behavior through deep learning methods, teachers can obtain valuable information in order to adjust teaching strategies and optimize classroom teaching.

Student engagement monitoring

By monitoring and analyzing students’ behavior in the classroom in real-time, teachers can better understand their participation in the classroom and make timely adjustments to their teaching strategies to improve students’ engagement. For example, if a student is found to be playing with his cell phone with his head down all the time, the teacher can try to communicate with him or guide him to participate in classroom discussions.

Student Attention Tracking

By analyzing students’ head posture, combined with facial expressions and other behaviors, such as detecting the direction and tilt angle of students’ heads, as well as the position and orientation of their eyes, the direction of students’ attention can be inferred, helping teachers to understand whether students are concentrating on listening to lectures. This helps teachers to adjust the teaching content or approach in time to attract students’ attention.

Student Affect Recognition

By analyzing the information of students’ facial expressions and body language, it is possible to determine the students’ emotional state, for example, the crossing of arms, the tension of the shoulders, and the stiffness of the body may suggest an emotional state, such as anxiety, stress, and so on. This is very helpful for teachers to develop individualized teaching plans.

Assessment of student learning outcomes

By analyzing data on students’ classroom performance and homework completion, we can assess students’ learning effectiveness and provide timely feedback and guidance. This helps teachers better grasp the learning status of students, so as to improve teaching methods in a targeted manner.

Analysis of student behavior and the effects of educational reforms in the curriculum
Effectiveness of Student Behavior Analysis

In order to verify the effectiveness of the student classroom behavior recognition method, this study conducted comparative experiments on the student classroom behavior dataset using CNN-10 and the method in this paper, respectively.

Student Classroom Behavior Dataset Construction

Since there is no open dataset of students’ classroom behavior in China, this study used a SONY FDR-AX30 digital 4K camcorder to collect image data of students’ classroom behavior, using single shots and single frames intercepted from classroom-recorded videos, to construct a student classroom behavior dataset covering 600 students from a university in city A.

In this study, the collected images were cropped and saved according to the upper-body region constructed from the key points of the human skeleton, and then uniformly scaled to 112 × 112 using the long side as the base (padding blank areas with zeros). By tagging and categorizing the 4,600 collected images of students’ classroom behavior, this study identified seven typical classroom behaviors that appear most frequently: raising hands (514 images), listening (825), looking around (806), reading (503), writing (522), standing up (847), and sleeping (583), which together constitute the Student Classroom Behavior Dataset (SCBID).

Experimental steps

The student classroom behavior dataset is shuffled; four-fifths of each classroom behavior category is randomly selected as training samples, and the remaining one-fifth is used as testing samples.

Balancing the performance of the experimental machine against training efficiency, the mini-batch size is set to 160 and the number of epochs to 30.

CNN-10 and the student classroom behavior recognition method designed in this paper are used for training and testing, respectively; to reduce random error, 10 random experiments are carried out with each method, after which the average recognition accuracy of the two methods is computed.
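The per-class four-fifths/one-fifth split in step 1 can be sketched as follows, using two of the class sizes reported for the dataset; the helper name is illustrative:

```python
import random

def split_per_class(samples_by_class: dict, train_fraction: float = 0.8, seed: int = 0):
    """Shuffle each class separately, then take four-fifths of each class
    for training and the remaining one-fifth for testing, so every
    behavior keeps the same proportion in both sets."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train += [(s, label) for s in shuffled[:cut]]
        test += [(s, label) for s in shuffled[cut:]]
    return train, test

# Class sizes taken from the dataset section above (two classes shown).
data = {"raising hands": list(range(514)), "sleeping": list(range(583))}
train, test = split_per_class(data)
```

Splitting per class rather than over the pooled dataset prevents a rare behavior from landing almost entirely in one of the two sets.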

Analysis of experimental results

Under the same experimental conditions and after 10 randomized experiments, this study compares the accuracy of CNN-10 and the student classroom behavior recognition method designed in this paper in recognizing multiple classroom behaviors on the SCBID dataset; the accuracy comparison results are shown in Fig. 5. By calculation, the average recognition accuracy of CNN-10 is 92.15%, while that of the method designed in this paper is 98.04%, which is 5.89 percentage points higher.

Figure 5.

Accuracy comparison results

The confusion matrix of CNN-10 on the student classroom behavior dataset is shown in Fig. 6. With CNN-10, there is a large gap in recognition accuracy across classroom behaviors: for example, the recognition accuracy of raising hands and reading is 83.22% and 83.26%, respectively. 13.96% of hand-raising instances are misrecognized as listening, while 8.77% of reading instances are misrecognized as listening and 5.43% as sleeping. The main reasons are: (1) Recognition of hand-raising relies mainly on local information such as the hands, which is easily disturbed by factors such as students' physique, clothing, and classroom background; reading involves a limited range of movement, and when that movement is small it is easily mistaken for listening; and some reading and sleeping postures are so similar that the distinction is subtle, which also leads to misjudgments. (2) Owing to the limitations of the student classroom behavior dataset, it is difficult to classify all classroom behaviors using only a convolutional neural network that extracts behavioral features directly from the raw image data.
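Per-class rates of the kind quoted above (recognition accuracy and misrecognition shares) are obtained by row-normalising the confusion matrix. The 3 × 3 matrix below is a made-up illustration, not the paper's data:

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows = true, cols = predicted)
cm = np.array([[83, 14, 3],
               [ 5, 90, 5],
               [ 2,  8, 90]])

def per_class_rates(cm):
    """Row-normalise: the diagonal gives per-class recognition accuracy,
    off-diagonal entries give the share misrecognised as another class."""
    return cm / cm.sum(axis=1, keepdims=True)

rates = per_class_rates(cm)
print(rates[0, 0])  # recognition accuracy of class 0: 0.83
overall = np.trace(cm) / cm.sum()  # overall accuracy across all classes
```

Reading the matrix row by row in this way is exactly how statements such as "13.96% of hand-raising is misrecognized as listening" are derived.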

Figure 6.

Confusion matrix of CNN-10 on the student classroom behavior dataset

The confusion matrix of the proposed method, which recognizes student classroom behavior from human skeleton information with deep learning, on the student classroom behavior dataset is shown in Fig. 7. The gap in recognition accuracy across classroom behaviors narrows under this method: the recognition accuracy of writing reaches 100%, and reading, the behavior with the lowest accuracy, still reaches 94.28%, with 3.75% of reading instances misidentified as standing up.

Figure 7.

Confusion matrix of the method proposed in this paper on the student classroom behavior dataset

Effectiveness of educational reforms in the curriculum

This chapter designs a controlled experiment to verify the effectiveness of implementing a curriculum education strategy that incorporates student behavior analysis. The experiment was carried out on first-year undergraduate students studying educational technology at a university in S province: 80 students in total, 22 male and 58 female. An investigation of the subjects' enrollment requirements, academic performance, and other preconditions showed that their baseline levels were uneven. To ensure a relative balance of learning levels across groups, the students were manually divided into 10 heterogeneous study groups of 8 each. Five of these 10 groups were randomly selected as the experimental group, and the other five served as the control group. After grouping, the experimental and control groups each contained 40 students (11 male, 29 female), which largely ensured consistency in pre-learning level between the two groups. During the experiment, the experimental group was taught according to the curriculum education strategy designed in this study, which integrates student behavior analysis, while the control group was taught with traditional methods and general teaching strategies; otherwise, the two groups did not differ in learning content, learning resources, instructors, course schedule, or teaching progress.
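One common way to realize heterogeneous grouping followed by random group assignment, as described above, can be sketched as follows. The pre-test scores, the round-robin dealing, and the seed are all hypothetical stand-ins for the manual intervention the study actually used:

```python
import random

rng = random.Random(42)  # hypothetical seed for reproducibility

# Rank the 80 students by a (placeholder) pre-test score, then deal
# them round-robin into 10 groups of 8 so each group mixes strong
# and weak students (heterogeneous grouping).
scores = [rng.gauss(70, 10) for _ in range(80)]
ranked = sorted(range(80), key=lambda i: scores[i], reverse=True)
groups = [ranked[g::10] for g in range(10)]  # round-robin deal

# Randomly pick 5 groups as experimental; the rest form the control.
experimental = rng.sample(range(10), 5)
control = [g for g in range(10) if g not in experimental]
```

Dealing ranked students round-robin is one simple way to keep average ability comparable across groups before random assignment.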

The experimental questionnaire evaluated the experimental and control groups before and after the experiment on four dimensions: learning motivation, learning engagement, learning strategy, and learning outcome, with a maximum score of 30 points per dimension.

Questionnaire pre-test analysis

To exclude any influence on the experimental results from pre-existing differences in the two groups' levels of deep learning, a test questionnaire was distributed to both groups before the experiment began, and the resulting data were analyzed with SPSS 20.0.

Descriptive statistical analysis

First, this study analyzed the pre-test data with descriptive statistics; the results are shown in Table 1. Before the formal experiment began, the two groups were essentially identical on the learning engagement dimension, while on the learning motivation, learning strategy, and learning outcome dimensions the experimental group's mean scores exceeded the control group's by 0.79, 0.61, and 0.36, respectively. The two groups were thus closely matched.

Test of Difference

To determine whether these differences are statistically significant, this study further conducted an independent samples t-test on the two groups' pre-test data across the four dimensions; the results are shown in Figure 8. By the criteria of the independent samples t-test, a p-value greater than the significance level (0.05) indicates no significant difference. The p-values for the learning motivation, learning engagement, learning strategy, and learning outcome dimensions are 0.8365, 0.5366, 0.7159, and 0.6625, respectively, all greater than 0.05. It can therefore be concluded that the overall levels of the two groups did not differ significantly before the experiment.
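The statistic behind an independent samples t-test can be computed directly from each group's mean, standard deviation, and size. The sketch below uses the Welch (unequal-variance) form with illustrative inputs; it is a generic formula, not a reproduction of the paper's SPSS output:

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic from two groups' means, SDs and sizes."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of the difference
    return (m1 - m2) / se

# Identical groups give t = 0: no difference to detect.
print(welch_t(19.36, 1.02, 40, 19.36, 1.02, 40))  # 0.0
```

The p-value is then obtained by comparing |t| against the t-distribution with the Welch degrees of freedom (roughly 78 for two groups of 40); statistical packages such as SPSS perform that lookup internally.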

Pretest descriptive statistics

Dimension             Group               N    M      SD
Learning motivation   Experimental group  40   19.56  1.23
                      Control group       40   18.77  1.36
Learning engagement   Experimental group  40   19.36  1.02
                      Control group       40   19.33  1.17
Learning strategy     Experimental group  40   18.63  1.69
                      Control group       40   18.02  1.98
Learning outcome      Experimental group  40   19.22  1.06
                      Control group       40   18.86  1.88
Figure 8.

Independent sample t test results

Questionnaire post-test analysis

Descriptive statistical analysis

After the formal experiment ended, both groups of students completed the same learning questionnaire again, yielding the post-test data; the descriptive statistics are shown in Table 2. The experimental group's scores on the four dimensions of deep learning motivation, engagement, strategy, and outcome exceed those of the control group by 4.05, 3.68, 3.48, and 3.11, respectively, and rose markedly compared with the pre-test. This study therefore concludes that the curriculum education strategy incorporating student behavior analysis has a positive effect on students' learning level. The next step is an independent samples t-test to determine whether this effect reaches significance.

Test of Difference

As in the pre-test analysis, the post-test data were tested for differences on the four dimensions using independent samples t-tests; the results are shown in Figure 9. The p-values for the learning motivation, learning engagement, learning strategy, and learning outcome dimensions are all less than 0.01, so the post-test shows a significant difference between the experimental and control groups on all four sub-dimensions. This indicates that the curriculum education strategy designed in this paper, which integrates student behavior analysis, can effectively enhance student learning.

Posttest descriptive statistics

Dimension             Group               N    M      SD
Learning motivation   Experimental group  40   23.11  0.52
                      Control group       40   19.06  1.22
Learning engagement   Experimental group  40   23.64  0.15
                      Control group       40   19.96  1.03
Learning strategy     Experimental group  40   22.54  1.06
                      Control group       40   19.06  1.18
Learning outcome      Experimental group  40   22.96  1.15
                      Control group       40   19.85  1.03
Figure 9.

Independent sample t test results

Conclusion

In this paper, the OpenPose model is used to extract features from large volumes of image, video, and sensor data, enabling accurate analysis and understanding of student behavior and, in turn, the design of personalized teaching strategies that enhance student learning.

Compared with the CNN-10-based student classroom behavior recognition method, the method designed in this paper achieves an average recognition accuracy of 98.04% on the seven learning behaviors of raising hands, listening, looking around, reading, writing, standing up, and sleeping. It effectively excludes irrelevant factors such as students' physique, clothing, and classroom background, highlights the key effective information, and offers stronger generalization ability and higher recognition accuracy.

Under the curriculum education strategy integrating student behavior analysis, students' evaluations of learning motivation, learning engagement, learning strategy, and learning outcome improve significantly, with post-test p-values below 0.01 on all four dimensions. This shows that the strategy designed in this paper is highly recognized by students.
