Pathways to Improve the Effectiveness of Multidimensional Data Analysis in Decision Support for Instructional Management

Schools already store large amounts of data about students, teachers and employment, including students’ basic information, course grades, in-school performance and employment outcomes, as well as teachers’ basic information, teaching arrangements and teaching effectiveness, and a great deal of valuable information is hidden in these data [1-3]. Making full use of these data resources to support a school’s teaching management decisions can help reduce the high rates of delayed graduation or grade repetition and improve the poor employment quality seen in higher vocational colleges and universities [4-5]. Data visualisation and data mining techniques can uncover the information implicit in the data and present the results in a friendly, intuitive form. The resulting analyses can inform adjustments to teaching policy or professional training plans, give students more comprehensive and flexible channels for querying graduates’ employment information, and recommend occupations that closely match their own characteristics, thereby assisting teaching management [6-8].

Research on intelligent education management has focused mainly on concepts, principles and structural arguments, and has rarely considered rationality and reliability from an engineering and technical perspective. The lack of engineering support weakens the feasibility of some reference schemes, and there is still much room for improvement. Pedagogical decision-making is, in essence, still decision-making: it must be based on facts, and the value of those facts depends on data. The completeness, robustness and validity of the data reflect data management and business execution capabilities, which need to be supported by sound technical and logical solutions [9-12]. The transformation from data and business to decision-making involves data collection and mining, validity statistics and the tracking of change patterns. Data-driven teaching management achieved in this way requires platform services with self-adjustment capability, so that dynamic data conforming to the teaching process can be generated and intelligent management can be realised through decision prediction and analysis [13-16].

Traditional teaching management systems focus on managing the teaching system rather than on analysing and tracking the current state of students, so teaching managers cannot take timely and effective measures to ensure the quality and effectiveness of teaching and learning. Gumus, S. et al. reviewed the research literature on educational management models and found that studies of distributed leadership, instructional leadership, teacher leadership and transformational leadership are the most prominent and prevalent, and that these studies focus on the impact of educational management behaviours on student achievement and on how that impact operates [17]. Johari, J. et al. examined how teachers’ job autonomy, workload and work-life balance affect teacher performance, and noted that job autonomy and work-life balance have a significant impact on teacher performance [18]. Lim, W. M. et al. critically examined the potential of AI technology in education and the ways of applying it, and offered an in-depth reflection from an educational management perspective, which provides an important reference for future educational administrators setting rules for AI in educational practice [19]. Abad-Segura, E. et al. argued that the digital transformation of education facilitates the sustainable management of education, and analysed in detail the literature on digital transformation and sustainable management in education indexed in the Scopus database [20]. González-Pérez, L. I. studied the framework and model of Education 4.0, showing that Education 4.0 is oriented towards the personalised development of students and takes their individual characteristics into account to promote self-directed learning aimed at the multidimensional cultivation of knowledge and competences [21].

The education sector has applied multidimensional data analysis to students’ online learning behaviours, using related technologies to uncover deficiencies in teaching and learning environments and thereby improve teaching content and the teaching process. Romero, C. et al. systematically reviewed the use of data mining techniques in educational management and related cutting-edge technologies, covering data-driven educational decision-making, data-based teaching and learning analytics, and other topics [22]. Aldowah, H. et al. described the positive role of data mining in teaching management and teaching practice and reviewed the relevant publications from 2000 to 2017 in four areas: teaching analysis, teaching prediction, student behaviour analysis and visual analysis of teaching management. They pointed out that educational data mining (EDM) and learning analytics (LA) can help solve teaching problems scientifically [23]. Turnbull, D. et al. discussed how advances in information technology and equipment have created favourable conditions for distance online education, meeting the need to access knowledge at any time and in any place [24]. Iqbal, A. et al. used least squares structural equation modelling to examine the drivers of knowledge management (KM) processes with a sample of students from Pakistani research universities, and revealed that knowledge management influences organisational performance through intellectual capital and innovation [25].

This study constructs a decision support system for teaching management and improves the traditional C4.5 algorithm. Using data provided by a university, association rule mining is applied to the text of students’ evaluations of teachers to analyse the relationship between teachers and teaching quality. The performance of the improved C4.5 algorithm is verified experimentally, and a grade prediction model is built to predict students’ grades. Finally, the k-means algorithm is used to cluster students according to their in-school records and to analyse the correlation between the variables of in-school performance and employment information.

2

Intelligent teaching management decision support system

The decision-making requirements span the three dimensions of “data management - business design - platform architecture” and are subject to the multi-level, multi-authority business topology of intelligent teaching management, which is expressed in four layers. The first layer of use cases describes the interaction between teachers and learners as the key participants. It involves use cases such as attending classes, selecting classes, answering questions, tutoring, preparing classes, assignments, reviewing, previewing and taking exams, and teachers and learners participate in the relevant use cases according to their needs. The second layer of decision-making is mainly used to optimise teaching and learning behaviours, identify potential teaching needs and intervention requirements in a timely manner, and improve teaching effectiveness. The third layer is an operational mechanism for evaluating the effectiveness of the statistical and management results of the second layer. It is the key layer for instructional decision-making demands and also forms a ring-like data correlation structure, in which educational decision-makers evaluate the data, the evaluation results are fed back to the second layer, and the prediction results and decision-making needs are passed to the fourth layer. The fourth layer is the architectural adjustment layer, which connects to the top-level decision-making demands through the business integration standards provided by the platform at all levels. The layers are connected to realise cross-platform data integration, business integration and architecture integration.

2.1

Decision-making design for teaching and learning management

2.1.1

Relational data model

The relational data model describes data along two dimensions, “type” and “value”. “Type” represents the basic descriptive structure of the data, and its design follows the normal forms of the relational data model; “value” represents the specific data traces generated while the system is accessed. Together, “type” and “value” form a relational data structure: the relationships between data are described as tables that satisfy the appropriate relational schema, and the tables are associated with each other through “foreign keys” defined on attributes. Relational structures are well suited to descriptive and definitional data, and the model can be used for teachers, learners and other participants and for the relationships between them.

2.1.2

Resource Description Framework

The Resource Description Framework (RDF) represents data as graphs and is suitable for representing the decisions and rules of an online platform’s Web business data. When a decision-making demand of intelligent teaching management is initiated, the platform should construct the business process promptly and autonomously, and construct the RDF corresponding to the decisions and rules in due time. Take the decision-making feedback of the learning process as an example. The whole process involves three types of participants, namely the learner, the teaching management personnel and the teaching decision-maker, and the decision generates a process that is associated with the first, second and third layers. The process contains a number of attributes of the relevant use cases and the guidance and participation links between resources, so it can be described as a complete RDF graph, and expressing a process in RDF is usually relatively simple.
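To make the graph view concrete, the sketch below stores a learning-process feedback loop as subject-predicate-object triples, the basic unit of RDF. It is only an illustration in plain Python: the identifiers (`learner:001`, `reportsTo`, and so on) are hypothetical and not taken from the platform described here, and a real deployment would use proper IRIs and an RDF library.

```python
# Minimal in-memory triple store; all identifiers are hypothetical examples.
Triple = tuple[str, str, str]

triples: list[Triple] = [
    ("learner:001", "submits", "assignment:42"),
    ("assignment:42", "belongsTo", "course:db101"),
    ("manager:07", "reviews", "assignment:42"),
    ("manager:07", "reportsTo", "decisionMaker:02"),
    ("decisionMaker:02", "issues", "feedback:9"),
    ("feedback:9", "addressedTo", "learner:001"),
]

def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def follow(start, *predicates):
    """Follow a chain of predicates from a start node and return the nodes reached."""
    frontier = {start}
    for p in predicates:
        frontier = {o for node in frontier for (_, _, o) in match(s=node, p=p)}
    return frontier

# Who ends up receiving feedback that originates from manager:07's report?
recipients = follow("manager:07", "reportsTo", "issues", "addressedTo")
```

Here `follow` walks the path manager, decision-maker, feedback, learner, which mirrors the three participant types mentioned above.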

2.1.3

Knowledge attribute maps

The Knowledge Attribute Graph (KAG) is a data model for representing autonomous relationships, such as social network relationships. Compared with RDF, it can embed built-in services for the attributes of data and the attributes of relationships, and it offers a richer way of describing the intrinsic mutual constraints and influences among data. KAG can describe the activity sequences of use cases and their interaction and collaboration relationships, and with the help of concepts such as labels and association attributes it can attach richer information to graph vertices and roles. The objects involved in an activity are its roles, the meaning of a relationship is expressed with labels, and the cause and effect of involvement in an activity are pinpointed with association attributes.

2.2

Construction of decision support system for teaching management

2.2.1

Teaching management system

Traditional university teaching management mainly covers a series of teaching activities, from the professional teaching plan to the semester course plan, from class scheduling to examination arrangements, and from the generation of student grades to the statistical analysis of course grades. It therefore mainly comprises the modules shown in Figure 1.

The traditional academic affairs management system consists of a university-level academic affairs management subsystem and a faculty-level academic affairs management subsystem. The university-level subsystem serves the academic affairs department of the school and focuses on the management and maintenance of academic affairs data, and on data analysis and statistics from an overall perspective. The faculty-level subsystem serves faculty teaching staff and focuses on the entry, querying and printing of faculty teaching data, and on data analysis and statistics from a local perspective.

Poor data integration and data inconsistency between traditional university teaching management systems are particularly serious problems. The original database systems struggle to adapt to the different data granularity requirements of various types of objects. For example, students’ course learning is progressive, and courses are correlated and ordered: a more advanced course has prerequisite courses, and if an earlier course is not learned well, the learning of subsequent courses will inevitably suffer. Higher vocational colleges and universities should also keep their programme offerings and curriculum systems in step with the needs of society, which requires schools to revise their teaching plans in a timely manner and replace courses that no longer suit the market. Traditional teaching management systems find it difficult to connect these demands and to analyse the reasonableness of the curriculum.

2.2.2

Decision support system for teaching management

Building on the database of the traditional university teaching management system, the higher education teaching management decision support system uses data warehousing, OLAP and data mining technologies to analyse the following aspects of teaching: grades, faculty, teaching quality assessment, course selection and scheduling, teaching workload, professional course settings, enrolment, student employment, and so on. Data processing within the current teaching management system falls roughly into two categories: operational processing and analytical processing. Operational processing, also called transaction processing, refers to the day-to-day online operation of the database, usually querying and modifying a record or a group of records. It mainly serves specific applications in teaching delivery, and the main concerns are response time, data security and integrity. Analytical processing, by contrast, is used by administrators for decision analysis; it often accesses large amounts of historical data and requires fast, near-instantaneous processing. These conflicting needs lead to different requirements for data management and storage. The architecture of this solution as applied in the education sector is shown in Figure 2.

2.3

Design of a multidimensional data warehouse

2.3.1

Multidimensional data collection

ETL is the most important data collection tool in a data warehouse. It covers data extraction, transformation, cleaning and loading, which together build the data warehouse; mainstream ETL tools on the market include PowerCenter and DataStage. For a university teaching management data warehouse, it is difficult to establish a fully featured ETL tool system because university data are highly mobile. Building small-scale, time-node-based ETL tools and establishing data marts should therefore be the key to developing a teaching management decision support system in colleges and universities.

2.3.2

Data analysis models

According to the business needs of the teaching decision-making system and the actual data in the existing teaching information systems, the decision-making system currently has many analysis themes. Three of these themes and the design of their data analysis models are introduced here:

1)

School registration information data analysis model

The school registration information data analysis model consists of a regional dimension table, a teaching unit dimension table, a national dimension table, a professional dimension table and a school registration information fact table, which together form a star schema.

2)

Teaching quality assessment data analysis model

The teaching quality evaluation data analysis model consists of an academic dimension table, a regional dimension table, a course dimension table, a teacher dimension table and a teaching quality evaluation fact table, which together form a snowflake schema. The fact table contains attributes such as the teacher’s employee number, the number of the course taught and the rating. The model can produce statistics on teachers’ teaching evaluations, analyse the influence of teachers’ age and education on teaching, and provide a basis for improving teaching quality.

3)

College English Test Band 4 and Band 6 (CET-4/CET-6) results data analysis model

The CET-4/CET-6 results data analysis model consists of a student dimension table, a regional dimension table, an ethnicity dimension table, a teaching unit dimension table and a CET-4/CET-6 examination results fact table, which together form a snowflake schema. The fact table contains examination results, examination categories, examination times, etc. The model can analyse students’ CET-4/CET-6 examination performance, such as the first-attempt pass rate, the second-attempt pass rate and score statistics.
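As an illustration of how a fact table and its dimension tables support this kind of analysis, the following sketch builds a cut-down version of the teaching quality evaluation model in an in-memory SQLite database. The table names, column names and sample rows are assumptions made for the example (the paper does not give a schema), and the layout is simplified to a plain star rather than a snowflake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_teacher (teacher_id TEXT PRIMARY KEY, age INTEGER, education TEXT);
CREATE TABLE dim_course  (course_id  TEXT PRIMARY KEY, course_nature TEXT);
-- Fact table: one row per (teacher, course) evaluation, keyed to the dimensions.
CREATE TABLE fact_evaluation (
    teacher_id TEXT REFERENCES dim_teacher(teacher_id),
    course_id  TEXT REFERENCES dim_course(course_id),
    rating     REAL
);
""")
# Hypothetical sample data, not taken from the study.
conn.executemany("INSERT INTO dim_teacher VALUES (?, ?, ?)",
                 [("T1", 35, "PhD"), ("T2", 52, "Master"), ("T3", 41, "PhD")])
conn.executemany("INSERT INTO dim_course VALUES (?, ?)",
                 [("C1", "compulsory"), ("C2", "elective")])
conn.executemany("INSERT INTO fact_evaluation VALUES (?, ?, ?)",
                 [("T1", "C1", 90.0), ("T1", "C2", 86.0),
                  ("T2", "C1", 80.0), ("T3", "C2", 88.0)])

def mean_rating_by(dimension_column: str) -> dict:
    """Roll the fact table up along one dimension attribute."""
    if dimension_column not in {"education", "course_nature"}:
        raise ValueError(f"unsupported dimension: {dimension_column}")
    sql = f"""
        SELECT {dimension_column}, AVG(f.rating)
        FROM fact_evaluation f
        JOIN dim_teacher t ON t.teacher_id = f.teacher_id
        JOIN dim_course  c ON c.course_id  = f.course_id
        GROUP BY {dimension_column}
    """
    return dict(conn.execute(sql).fetchall())

by_education = mean_rating_by("education")
by_nature = mean_rating_by("course_nature")
```

Grouping by a different dimension attribute only changes the `GROUP BY` column, which is the main practical benefit of keeping measures in a fact table and descriptive attributes in dimension tables.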

2.3.3

Data collection tools

Data collection tools for a data warehouse need to be applicable, extensible and flexible, and data extraction and conversion can be implemented with hand-written code. It is therefore necessary to design ETL tools for university teaching around the data warehouse. For example, an ETL tool module for managing academic information could provide the following procedures and functions:

1)

Procedure SET_SDblink() // Sets the source database link.

2)

Procedure Extract_Data() // Extracts data on demand.

3)

Procedure Set_TDblink() // Sets the target database link.

4)

Function Get_Exist(ID: string): boolean // Quickly checks, by primary key ID, whether a record has already been loaded.

5)

Procedure Data_Load() // Loads the data.
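The five routines above can be sketched in Python as follows. This is a hypothetical counterpart, not the authors’ implementation: it assumes SQLite databases on both sides and an invented `student(id, name, major)` table, and the cleaning step (trimming and title-casing) is only a placeholder for real transformation rules.

```python
import sqlite3

class AcademicEtl:
    """Hypothetical Python counterpart of the five ETL routines listed above."""

    def __init__(self):
        self.source = None
        self.target = None
        self.rows = []

    def set_source_link(self, path):            # SET_SDblink
        self.source = sqlite3.connect(path)

    def set_target_link(self, path):            # Set_TDblink
        self.target = sqlite3.connect(path)
        self.target.execute(
            "CREATE TABLE IF NOT EXISTS student "
            "(id TEXT PRIMARY KEY, name TEXT, major TEXT)")

    def extract_data(self):                     # Extract_Data
        self.rows = self.source.execute(
            "SELECT id, name, major FROM student").fetchall()
        return len(self.rows)

    def get_exist(self, record_id) -> bool:     # Get_Exist
        cur = self.target.execute(
            "SELECT 1 FROM student WHERE id = ?", (record_id,))
        return cur.fetchone() is not None

    def data_load(self):                        # Data_Load
        loaded = 0
        for record_id, name, major in self.rows:
            if self.get_exist(record_id):       # skip records already loaded
                continue
            # Placeholder cleaning: trim whitespace, normalise capitalisation.
            self.target.execute(
                "INSERT INTO student VALUES (?, ?, ?)",
                (record_id, name.strip(), major.strip().title()))
            loaded += 1
        self.target.commit()
        return loaded

# Demo with an in-memory source that contains one duplicate record.
etl = AcademicEtl()
etl.set_source_link(":memory:")
etl.source.executescript("""
    CREATE TABLE student (id TEXT, name TEXT, major TEXT);
    INSERT INTO student VALUES ('S1', ' Ann ', 'software engineering');
    INSERT INTO student VALUES ('S2', 'Bo', 'accounting');
    INSERT INTO student VALUES ('S1', 'Ann', 'software engineering');
""")
etl.set_target_link(":memory:")
etl.extract_data()
first_load = etl.data_load()    # loads S1 and S2, skips the duplicate S1
second_load = etl.data_load()   # nothing new to load
```

Checking `get_exist` before each insert makes the load step safe to re-run, which suits the small, time-node-based ETL jobs discussed earlier.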

3

Data mining-based analytical model for decision-making in teaching and learning management

3.1

Apriori based rule mining for teaching evaluation

The Apriori algorithm accomplishes frequent itemset discovery step-by-step by growing the number of itemset elements [26]. The algorithm is generally divided into two steps:

The first step is to iteratively identify all frequent itemsets, i.e. itemsets whose support is not lower than the minimum support set by the user. Discovering all frequent itemsets is the core of association rule mining and also its most computationally intensive part. The frequent 1-itemsets L₁ are generated first, then the frequent 2-itemsets L₂, and so on until the frequent itemsets can no longer be extended. In the k-th iteration, the set of candidate k-itemsets C_k is generated, the support of each candidate is counted by scanning the database, and the candidates that meet the minimum support form the set of frequent k-itemsets L_k.

The second step is to construct rules from the frequent itemsets, keeping those whose confidence is not lower than the minimum confidence set by the user.
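The two steps can be sketched compactly in Python. This is a minimal teaching version, assuming the transactions fit in memory and using made-up keyword sets as transactions; it is not the implementation used in the experiments.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Step 1: return {itemset: support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # C_1: every single item is a candidate.
    current = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while current:
        # Scan the database and keep candidates meeting min_support (L_k).
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        # Join step: unions of L_k members that have exactly k + 1 items.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k - 1)-subset of a candidate must be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def rules(frequent, min_confidence):
    """Step 2: derive rules lhs => rhs with confidence >= min_confidence."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]   # lhs is frequent by downward closure
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), supp, conf))
    return out

# Hypothetical pre-processed comments: keywords plus a sentiment label.
comments = [
    {"humor", "pos"}, {"humor", "patience", "pos"}, {"patience", "pos"},
    {"boring", "neg"}, {"humor", "pos"},
]
freq = apriori(comments, min_support=0.4)
found = rules(freq, min_confidence=0.9)
```

With these five toy comments the frequent itemsets are {humor}, {pos}, {patience}, {humor, pos} and {patience, pos}, and the rules that survive the confidence threshold are humor => pos and patience => pos.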

3.2

Decision Tree Based Achievement Early Warning

3.2.1

Decision tree algorithm

1)

ID3 algorithm

The advantages of the ID3 algorithm are: the theory of the algorithm is clear, the method is simple, and the learning ability is strong. The disadvantage is that it is only effective for smaller data sets and is more sensitive to noise, and when the training data set is increased, the decision tree may change accordingly.

2)

C4.5 algorithm

Among the decision tree algorithms, C4.5 is regarded as a classic classification algorithm. It addresses several shortcomings of the ID3 algorithm. First, C4.5 can discretise continuous attribute values when building the decision tree. Second, C4.5 avoids the bias of information gain towards attributes with many values: instead of information gain, it uses the information gain ratio as the criterion for selecting the splitting attribute. Third, C4.5 can generate a set of “If-Then” rules from the constructed decision tree, each recording a top-down path through the tree’s nodes [27].

3)

SPRINT algorithm

The SPRINT algorithm comes in serial and parallel versions and removes the restriction that the data must reside in memory. It merges the memory-resident class list into each attribute list, which makes it simple to traverse the lists to find the optimal splitting criterion.

3.2.2

C4.5 algorithm

The C4.5 algorithm was developed by Quinlan as an improvement on the ID3 algorithm. It introduces the concept of the information gain ratio and can handle continuous attributes as well as missing attribute values. The concepts underlying C4.5 are described below:

Concept 1: Suppose a source emits n messages, where message i occurs with probability P_i. The quantity $I(P) = -\sum_{i=1}^{n} P_{i} \log_{2} P_{i}$ is called the amount of information conveyed. When all n messages are equally probable, P_i = 1/n and this reduces to $\log_{2} n$.

Concept 2: Given that an event y_j has occurred, the probability that event x_i occurs is the conditional probability p(x_i ∣ y_j). The conditional self-information is the negative logarithm of this conditional probability; it represents a kind of uncertainty and is the basis of the entropy used in the decision tree algorithm, as shown in equation (1): (1) $I(x_{i} | y_{j}) = -\log_{2} p(x_{i} | y_{j})$

Concept 3: Conditional entropy is the uncertainty of a random variable under a given condition and follows from equation (1). For the outcome variable X, the conditional entropy under condition y_j is shown in equation (2): (2) $H(X | y_{j}) = \sum_{i} p(x_{i} | y_{j}) \, I(x_{i} | y_{j})$

Concept 4: Let X denote the outcome variable and A one of the decision attributes, with values y_j. Averaging equation (2) over the values of A gives $H(X | A) = \sum_{j} p(y_{j}) H(X | y_{j})$, and the information gain is then calculated as in equation (3): (3) $\mathrm{Gain}(X, A) = H(X) - H(X | A)$
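The four concepts can be checked numerically with a few lines of Python. The sketch below assumes categorical attributes held in plain lists, and the attribute names and sample values are invented for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(X): entropy of the class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def conditional_entropy(values, labels):
    """Weighted average of the label entropy within each attribute value."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(values, labels):
    """Gain(X, A) = H(X) minus the conditional entropy of X given A."""
    return entropy(labels) - conditional_entropy(values, labels)

# Hypothetical sample: attendance separates pass/fail perfectly, seat row does not.
result     = ["pass", "pass", "fail", "fail"]
attendance = ["high", "high", "low", "low"]
seat_row   = ["front", "back", "front", "back"]

gain_attendance = information_gain(attendance, result)
gain_seat_row = information_gain(seat_row, result)
```

In this toy sample the labels carry one bit of entropy; attendance removes all of it (gain 1) while seat row removes none (gain 0), so a tree built on information gain would split on attendance first.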

3.2.3

Improved C4.5 algorithm

In the improved C4.5 algorithm, suppose the class attribute contains m positive examples and n negative examples. The information content is then: (4) $I(m, n) = -\frac{m}{m + n} \log_{2} \frac{m}{m + n} - \frac{n}{m + n} \log_{2} \frac{n}{m + n}$

If attribute A is used as the test attribute and A has p different values, where the i-th value covers m_i positive and n_i negative examples, then: (5) $\begin{array}{rcl} E(A) & = & \sum_{i = 1}^{p} \frac{m_{i} + n_{i}}{m + n} \left( -\frac{m_{i}}{m_{i} + n_{i}} \log_{2} \frac{m_{i}}{m_{i} + n_{i}} - \frac{n_{i}}{m_{i} + n_{i}} \log_{2} \frac{n_{i}}{m_{i} + n_{i}} \right) \\ & = & \frac{1}{(n + m) \ln 2} \sum_{i = 1}^{p} \left( -m_{i} \ln \frac{m_{i}}{n_{i} + m_{i}} - n_{i} \ln \frac{n_{i}}{n_{i} + m_{i}} \right) \end{array}$

For simplicity of calculation, the constant factor $\frac{1}{(n + m) \ln 2}$, which appears at every step, is omitted.

Then: (6) $E(A) = \sum_{i = 1}^{p} \left( -m_{i} \ln \frac{m_{i}}{n_{i} + m_{i}} - n_{i} \ln \frac{n_{i}}{n_{i} + m_{i}} \right)$

Using the first-order approximation $\ln(1 + x) \approx x$, which holds when x is small, we obtain: (7) $\ln \frac{m_{i}}{n_{i} + m_{i}} = \ln \left( 1 - \frac{n_{i}}{n_{i} + m_{i}} \right) \approx -\frac{n_{i}}{n_{i} + m_{i}}$ (8) $\ln \frac{n_{i}}{n_{i} + m_{i}} = \ln \left( 1 - \frac{m_{i}}{n_{i} + m_{i}} \right) \approx -\frac{m_{i}}{n_{i} + m_{i}}$

Substituting Eqs. (7) and (8) into Eq. (6) gives: (9) $E(A) = \sum_{i = 1}^{p} \left[ -m_{i} \cdot \left( -\frac{n_{i}}{n_{i} + m_{i}} \right) - n_{i} \cdot \left( -\frac{m_{i}}{n_{i} + m_{i}} \right) \right] = 2 \sum_{i = 1}^{p} \frac{n_{i} m_{i}}{n_{i} + m_{i}}$

Omitting the constant factor 2 gives: (10) $E(A) = \sum_{i = 1}^{p} \frac{n_{i} m_{i}}{n_{i} + m_{i}}$

The split information is approximated in the same way: (11) $\mathrm{SplitInfo}(A, D) = \sum_{i = 1}^{p} \left( -\frac{m_{i}}{m_{i} + n_{i}} \log_{2} \frac{m_{i}}{m_{i} + n_{i}} - \frac{n_{i}}{m_{i} + n_{i}} \log_{2} \frac{n_{i}}{m_{i} + n_{i}} \right) \approx \sum_{i = 1}^{p} \frac{m_{i}}{n_{i} + m_{i}}$

The information gain ratio is then derived as follows: (12) $\mathrm{GainRatio}(A, D) = \frac{\mathrm{SplitInfo}(A, D) - E(A)}{\mathrm{SplitInfo}(A, D)}$
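To see what the simplification in Eqs. (5) to (10) buys, the sketch below evaluates the exact expected information of Eq. (5) and the logarithm-free form of Eq. (10) on two candidate attributes. The partition counts are hypothetical, and the comparison only illustrates that both forms can rank attributes the same way; it is not a general guarantee, since the approximation ln(1 + x) ≈ x is coarse when x is not small.

```python
from math import log2

def exact_expected_info(partitions):
    """Eq. (5): weighted entropy over partitions [(m_i, n_i), ...]."""
    total = sum(m + n for m, n in partitions)
    e = 0.0
    for m, n in partitions:
        size = m + n
        for count in (m, n):
            if count:  # the term 0 * log 0 is taken as 0
                e -= (size / total) * (count / size) * log2(count / size)
    return e

def simplified_expected_info(partitions):
    """Eq. (10): sum of m_i * n_i / (m_i + n_i); no logarithms needed."""
    return sum(m * n / (m + n) for m, n in partitions if m + n)

# Hypothetical attributes: A separates the classes fairly well, B not at all.
attr_a = [(8, 2), (2, 8)]
attr_b = [(5, 5), (5, 5)]

exact_a, exact_b = exact_expected_info(attr_a), exact_expected_info(attr_b)
simple_a, simple_b = simplified_expected_info(attr_a), simplified_expected_info(attr_b)
```

Both measures are lower for attribute A than for attribute B, so both would prefer A as the split. Note that m_i n_i / (m_i + n_i) equals the partition size times p(1 - p), where p is the positive fraction, so Eq. (10) behaves like a size-weighted Gini-style impurity.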

3.3

Career guidance based on K-means algorithm

The k-means algorithm is a partition-based clustering algorithm and an unsupervised machine learning method that automatically groups unlabelled data. The algorithm first fixes the value of k, the number of clusters to be formed, and then randomly selects the positions of the k initial cluster centroids [28]. A centroid is the vector of per-dimension means of the data points in a cluster. There are various measures of the distance between data points, and the following are common.

The Euclidean distance is one of the most commonly used metrics. In n-dimensional space it is given by: (13) $d(x, y) = \sqrt{(x_{1} - y_{1})^{2} + (x_{2} - y_{2})^{2} + \cdots + (x_{n} - y_{n})^{2}}$

d(x, y) is the n-dimensional Euclidean distance between points (x₁, x₂, …, x_n) and (y₁, y₂, …, y_n).

Minkowski distance: (14) $d (x, y) = {(\sum_{i = 1}^{n} | x_{i} - y_{i} |^{p})}^{\frac{1}{p}}$

When p = 1, (14) reduces to the Manhattan distance: (15) $d(x, y) = \sum_{i = 1}^{n} | x_{i} - y_{i} |$

When p → ∞, (14) becomes the Chebyshev distance: (16) $d(x, y) = \lim_{p \to \infty} \left( \sum_{i = 1}^{n} | x_{i} - y_{i} |^{p} \right)^{\frac{1}{p}} = \max_{1 \le i \le n} | x_{i} - y_{i} |$

The cosine similarity between points x and y, from which the cosine distance 1 − cos θ is obtained, is: (17) $\cos \theta = \frac{\vec{x} \cdot \vec{y}}{| \vec{x} | \cdot | \vec{y} |}$

One of equations (13)-(17) is then chosen as the distance measure according to the characteristics of the dataset. The distance from each data point to every initial centroid is calculated, and each point is assigned to the cluster of its nearest centroid, which gives k subsets, i.e. the initial k clusters. The centroid of each subset is then recalculated as the per-dimension mean of its members, and the assignment and update steps are repeated until the distance between the new centroids and the previous centroids is less than a set threshold or the centroids no longer move. The algorithm then ends, with the original dataset divided into k classes.
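The procedure can be sketched in plain Python with a pluggable distance function. This is a minimal illustration under simple assumptions (points as tuples, an optional explicit `init` list so that runs are reproducible, empty clusters keeping their old centroid); the function and parameter names are chosen for the example and the sample points are invented.

```python
import math
import random

def minkowski(x, y, p):
    """Eq. (14); p = 2 gives Eq. (13), p = 1 gives Eq. (15)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def euclidean(x, y):
    return minkowski(x, y, 2)

def manhattan(x, y):
    return minkowski(x, y, 1)

def chebyshev(x, y):
    """Eq. (16): the limit of Eq. (14) as p grows without bound."""
    return max(abs(a - b) for a, b in zip(x, y))

def cosine_distance(x, y):
    """1 - cos(theta), with cos(theta) as in Eq. (17)."""
    dot = sum(a * b for a, b in zip(x, y))
    return 1 - dot / (math.hypot(*x) * math.hypot(*y))

def kmeans(points, k, distance=euclidean, init=None, seed=0, tol=1e-9, max_iter=100):
    """Return (labels, centroids) after alternating assignment and update steps."""
    centroids = list(init) if init is not None else random.Random(seed).sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: distance(p, centroids[j])) for p in points]
        # Update step: each centroid becomes the per-dimension mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        shift = max(distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # centroids have stopped moving
            break
    return labels, centroids

# Hypothetical two-feature student records forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(points, k=2, init=[(1, 1), (8, 8)])
```

Passing `distance=manhattan` or `distance=chebyshev` swaps the metric without touching the loop, which matches the idea of choosing the measure according to the dataset.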

4

Multidimensional data analysis for teaching and learning management

4.1

Analysis of teaching quality assessment data

Teaching evaluation data from the two semesters of 2017 were collected from several colleges of a university for statistical analysis and deep mining. The data cover 350 courses, with a total of 30,100 students taking part in the teaching evaluation and 21,402 teaching evaluation texts. The courses fall into 12 categories by nature: compulsory public foundation courses, practical sessions (auxiliary), compulsory practical sessions, optional practical sessions, elective general education courses, compulsory subject foundation courses, compulsory subject foundation courses (auxiliary), elective subject foundation courses, compulsory professional courses (auxiliary), professional optional courses, professional limited elective courses, and university elective courses.

4.1.1

Structured scoring statistics

The evaluation scores were divided into five intervals and their distribution was computed: 18% of the courses scored 90 and above, 65% scored 85-90, 15% scored 80-85, 1% scored 75-80, and 1% scored 75 or below. This indicates that the participating students rate the courses offered highly overall, and that the ratings follow an approximately normal distribution.

The evaluation scores of courses of different natures were also analysed. SPSS was used to compute the mean and standard deviation of the ratings for each course nature over the two semesters, and the results are shown in Table 1.

Table 1.

Statistics of evaluation scores by course nature

Course nature	Courses offered	Number of evaluators	Mean	Median	Standard deviation
Elective course	5	44	85.89	88.69	2.69
Professional restriction	63	3180	85.01	85.935	2.765
Subject required courses	120	9184	85.22	84.585	2.713
Subject required courses	13	2834	84.91	85.82	1.986
Practice link	6	1078	84.79	86.09	0.951
Major courses (auxiliary)	20	2164	84.83	85.505	1.257
Optional course of practice	5	1165	84.65	84.8	3.426
Subject course	17	1048	84.2	85.17	4.593
Professional optional class	26	1290	84.13	85.275	2.306
Subject basis required courses (supplementary)	26	2100	83.97	86.26	1.254
General education elective course	44	5527	83.2	84.23	2.102
Public basic course	5	486	80.19	81.96	8.246
Total	350	30100	84.25	-	-

Table 1 shows two things. First, the standard deviation for most course types is small, between 1 and 3 points, with little difference between them. The course types with large standard deviations are the compulsory public foundation courses, the elective subject foundation courses and the optional practical sessions, indicating obvious variability in students’ ratings for these three types. Second, the two course types with the lowest mean ratings are the compulsory public foundation courses (which also have the largest variance) and the general education elective courses. The compulsory public foundation courses include English, advanced mathematics, probability, linear algebra and other basic courses. These courses play an important role in supporting students’ professional course learning and further study, so students’ expectations are higher; at the same time, they are relatively difficult to pass, and a certain number of students fail. The lower ratings therefore suggest that students’ expectations have a considerable impact on the ratings and that the ratings for these courses are more polarised, so teaching methods tailored to students’ needs should be explored.

4.1.2

Subjective assessment of teaching correlation analysis mining

Each comment in the students’ evaluations of teachers is processed to obtain its sentiment polarity, which is then appended to the pre-processed comment. With the minimum support and the minimum confidence both set to 0.01, about 896 association rules are obtained. After careful screening in light of the actual situation, the association rules shown in Table 2 remain. The table shows that the vast majority of the sentiment in the teaching evaluations is positive (confidence close to 1), indicating that most teachers at this school are highly skilled, teach well, use appropriate teaching methods and are liked by most students. The teaching styles that students like are lecturing that is humorous, vivid and serious, delivered in a patient and careful manner. However, individual association rules also show that some lecturing styles are not liked by students. It is therefore recommended that teachers, in addition to strengthening their teaching ability and subject knowledge, find ways to improve the atmosphere of learning and discussion in each lesson, draw on their years of teaching experience to refine their teaching methods, and stimulate students’ motivation to learn, which benefits both teachers and students.

Table 2.

Association rule analysis results after screening

Serial number	Association rule	Support	Confidence	Lift
1	Energize =>pos	0.01	0.97	1.11
2	Interesting =>pos	0.02	0.96	1.12
3	Interest =>Study	0.01	0.41	8.51
4	Teacher =>Great	0.01	0.01	1.28
5	Tenderness =>pos	0.01	0.97	1.12
6	Interest=>pos	0.02	0.92	1.04
7	Excitation=>Interest	0.01	0.77	28.32
8	Loveliness =>pos	0.01	0.97	1.12
9	Interesting =>pos	0.04	0.97	1.13
10	Lecturing=>Patience	0.01	0.07	1.27
11	Lecturing=>Meticulous	0.01	0.06	2.35
12	Practice =>pos	0.01	0.90	1.06
13	Humor =>pos	0.05	0.98	1.11
14	Wit =>pos	0.04	0.98	1.14
15	Detail =>pos	0.02	0.96	1.2
16	Detail =>pos	0.03	0.90	1.06
17	No=>neg	0.01	0.28	4.8
18	Meticulous =>pos	0.03	0.98	1.15
19	In charge of =>pos	0.05	0.95	1.08
20	Rigor =>pos	0.02	1.00	1.13
21	Clarity =>pos	0.02	0.93	1.08
22	English =>pos	0.02	0.92	1.07
23	Patience =>pos	0.06	0.97	1.12
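
The support, confidence and lift columns of Table 2 follow the usual association rule definitions. The sketch below is a minimal illustration of how the three values could be computed for a single rule. The function name `rule_metrics` and the toy comment sets are hypothetical and are not taken from the evaluation data used in this study.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent => consequent.

    `transactions` is a list of sets; here each set holds the keywords of one
    pre-processed comment plus its sentiment label ("pos" or "neg").
    """
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)         # comments containing the antecedent
    count_c = sum(1 for t in transactions if c <= t)         # comments containing the consequent
    count_ac = sum(1 for t in transactions if (a | c) <= t)  # comments containing both

    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical toy data, for illustration only.
comments = [
    {"humor", "pos"},
    {"humor", "patience", "pos"},
    {"patience", "pos"},
    {"no", "neg"},
    {"humor", "pos"},
]

support, confidence, lift = rule_metrics(comments, {"humor"}, {"pos"})
print(f"humor => pos: support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 means the antecedent and the consequent occur together more often than they would if they were independent, which is why rules with a confidence near 1 can still have a lift only slightly above 1 when positive comments dominate the data.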

4.2

Early warning analysis of student performance

4.2.1

Performance of C4.5 algorithm before and after improvement

To verify the performance gap between the C4.5 algorithm before and after improvement, the confusion matrices of the two versions are shown in Table 3 and Table 4. Comparing the two tables shows that the number of correct classifications of the C4.5 algorithm increases by 5 after improvement. Four evaluation metrics, namely accuracy, precision, recall and F1-Measure, are calculated from the confusion matrices to assess the performance of the C4.5 algorithm before and after improvement, and the comparison is shown in Fig. 3. The figure shows that, relative to the original C4.5 algorithm, the improved C4.5 algorithm raises the accuracy by 1.32%, the precision by 0.91%, the recall by 3.07%, and the F1-Measure by 1.81%. In summary, the final stage grade prediction model constructed with the improved C4.5 algorithm performs better than the one constructed with the original C4.5 algorithm.

Table 3.

The confusion matrix of the original C4.5 algorithm

True value	Predicted: Pass	Predicted: Flunk
Pass	137	13
Flunk	23	4

Table 4.

The confusion matrix of the improved C4.5 algorithm

True value	Predicted: Pass	Predicted: Flunk
Pass	140	10
Flunk	22	5
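
As a rough illustration of how the four metrics are derived from a confusion matrix, the following sketch uses the counts printed in Table 3 and Table 4. It assumes that "Pass" is treated as the positive class, which the text does not state, and the helper name `binary_metrics` is hypothetical, so the values it yields need not coincide with those plotted in Fig. 3.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1 from the four cells of a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Assumption: "Pass" is the positive class.
# Rows are true values and columns are predicted values, as in Tables 3 and 4.
original = binary_metrics(tp=137, fn=13, fp=23, tn=4)   # Table 3
improved = binary_metrics(tp=140, fn=10, fp=22, tn=5)   # Table 4

for name in ("accuracy", "precision", "recall", "f1"):
    print(f"{name}: original={original[name]:.4f}, improved={improved[name]:.4f}")
```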

4.2.2

Applying the C4.5 algorithm to generate a staged decision tree

Training on the learning data collected before the deadline of the students’ weekend achievement test generates the decision tree for the third stage, shown in Fig. 4.

Training on the learning data collected before the deadline of the students’ end-of-month test generates the decision tree for the fifth stage, shown in Fig. 5. Since the tree constructed in the fifth stage is relatively large, only the left branch is shown.

For the generated stage 3 and stage 5 decision trees, each path from the root node to a leaf node is a classification rule, and all the classification rules are integrated to form the stage performance prediction model.
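
The idea that every root-to-leaf path is one classification rule can be sketched as follows. The tree below is a hypothetical toy example: the feature names (`task_submission_rate`, `video_watch_time`), the branch values and the grade intervals are assumptions for illustration and do not reproduce the trees in Fig. 4 or Fig. 5. Branches are shown as discrete values for simplicity, whereas C4.5 can also split continuous attributes on thresholds.

```python
def extract_rules(node, conditions=()):
    """Enumerate every root-to-leaf path of a tree as (conditions, label)."""
    if not isinstance(node, dict):            # a leaf stores the predicted class
        return [(list(conditions), node)]
    rules = []
    feature = node["feature"]
    for value, child in node["branches"].items():
        rules.extend(extract_rules(child, conditions + ((feature, value),)))
    return rules


def predict(rules, sample):
    """Return the label of the first rule whose conditions all hold for the sample."""
    for conditions, label in rules:
        if all(sample.get(feature) == value for feature, value in conditions):
            return label
    return None


# Hypothetical tree: internal nodes are dicts, leaves are grade-interval labels.
tree = {
    "feature": "task_submission_rate",
    "branches": {
        "high": {
            "feature": "video_watch_time",
            "branches": {"long": "80-100", "short": "60-79"},
        },
        "low": "0-59",
    },
}

rules = extract_rules(tree)
for conditions, label in rules:
    clause = " AND ".join(f"{f} = {v}" for f, v in conditions)
    print(f"IF {clause} THEN grade interval = {label}")
```

Collecting the rules of all stage trees into one list and applying `predict` to a student record is one simple way to use the integrated rule set as a prediction model.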

4.2.3

Evaluation of the stage performance prediction model

A multiclass model is not evaluated with the precision, recall, F1-Measure, and accuracy of a specific class. Instead, the stage performance prediction model is evaluated by its overall accuracy, calculated as:

(18) $Accuracy = \frac{correctClassification(testSet)}{total(testSet)}$

where total(testSet) denotes the total number of samples in the test set and correctClassification(testSet) denotes the number of correctly classified samples in the test set.
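
A minimal sketch of Equation (18) for a multiclass setting is given below. The function name `accuracy` and the grade-interval labels "A" to "D" are hypothetical placeholders rather than the intervals used in this study.

```python
def accuracy(y_true, y_pred):
    """Overall accuracy: correctly classified samples divided by all test samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("the test set must not be empty")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical grade intervals for five test samples.
actual = ["A", "B", "C", "B", "D"]
predicted = ["A", "B", "B", "B", "C"]

acc = accuracy(actual, predicted)
print(f"accuracy = {acc:.2f}")
```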

The accuracy of each stage performance prediction model is calculated according to Equation (18), and the results are shown in Fig. 6.

The figure shows that, as the number of learning behaviour features increases from stage to stage, the accuracy of the stage performance prediction model roughly rises, but most of the overall accuracy values are around 52%. There are two main reasons for this low accuracy: (1) the stage performance prediction model predicts the final grade interval rather than whether the final grade is a pass or a fail, and this finer-grained target lowers the accuracy; (2) the features describing the stage-based learning process are incomplete.

4.3

Analysis of employment data

Students’ performance in school includes their major, ranking of major grades, English level, professional skill level, awards, violations and penalties, participation in clubs, and participation in competitions, while the employment situation mainly includes the industry category, job type, monthly salary level, grade level, and professional matching degree. Based on experience, we believe that there is a certain correlation between students’ majors and the industries and positions they work in, and between the level of professional skills and the professional matching degree of their positions. We therefore form pairwise combinations of the variables in the school performance dimension and the employment dimension, and analyse their association with the chi-square test.

We performed the chi-square test on each pairwise combination of variables and collated the asymptotic significance values of the Pearson chi-square statistic in Table 5. Empty cells indicate that the expected frequencies of the corresponding contingency table do not meet the conditions for applying the test, and values less than 0.05 are marked in bold.

Table 5.

Chi-square test results for each pair of variables

	Industry category	Job type	Monthly salary	Rank	Professional matching
Major	0.000	0.000	0.070	0.113	0.062
Professional ranking	0.212	0.223	0.091	0.145	0.032
English level	0.088	0.044	0.056	0.114	0.105
Professional skill level	0.134	0.138	0.037	0.023	0.008
Awards for scholarship	0.155	0.178	0.041	0.028	0.062
Violation of punishment	0.125	0.108	0.077	0.023	0.126
Community participation	0.132	0.047	0.038	0.015	0.114
Competition participation	0.151	0.087	0.047	0.075	0.028

Taking p < 0.05 as the threshold, Table 5 shows that major is significantly associated with industry category and job type; major grade ranking with professional matching; English level with job type; professional skill level with monthly salary level, grade level (the Rank column), and professional matching; scholarship awards with monthly salary level and grade level; violations and penalties with grade level; club participation with job type, monthly salary level, and grade level; and competition participation with monthly salary level and professional matching.
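
The Pearson chi-square test behind each cell of Table 5 can be sketched in plain Python as follows. This is an illustrative sketch only: the contingency table is hypothetical, the function names are assumptions, and the applicability check uses Cochran's rule (no expected frequency below 1 and at most 20% of cells below 5), which may differ from the condition applied in this study.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of the chi-square distribution (dof >= 1)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1, y) for even dof or Q(1/2, y) for odd dof, then apply
    # the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi_square_test(table):
    """Return (statistic, dof, p_value, applicable) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one observation")
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    cells = [e for row in expected for e in row]
    # Assumed applicability condition (Cochran's rule).
    applicable = min(cells) >= 1 and sum(e < 5 for e in cells) <= 0.2 * len(cells)
    return statistic, dof, chi2_sf(statistic, dof), applicable


# Hypothetical counts: skill level (rows: High, Normal) by
# professional matching (columns: matched, not matched).
observed = [[30, 10], [20, 40]]
statistic, dof, p_value, applicable = chi_square_test(observed)
print(f"chi2={statistic:.3f}, dof={dof}, p={p_value:.5f}, applicable={applicable}")
```

A pair of variables would be reported as significantly associated when `applicable` is true and `p_value` is below 0.05, and left as an empty cell otherwise.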

Because jobs may change over time and the criteria for evaluating students’ school performance may also change, the clustering uses the school performance information of students who graduated within the last 5 years together with that of students who are about to graduate. For example, if the students about to graduate are the class of 2016, the school performance information of the 2011~2016 classes of the same major is selected for clustering. In this paper, we take the 2016 class of the computer network technology major as an example to demonstrate the method of providing students with career guidance.

The school performance information of a total of 550 students in the 2011~2016 classes of the computer network technology major is selected. The csv file is opened, the algorithm “SimpleKMeans” is selected, and “numClusters” (the K value) is set, with the other parameters left at their defaults. The output of the algorithm includes the clustering evaluation index “Within cluster sum of squared errors”: the smaller its value, the smaller the distances between instances in the same cluster and the better the clustering. The value of this index decreases as K increases. K represents the number of clusters, and a K value that is too large makes the clustering results hard to understand and may render them meaningless. Based on experience, “numClusters” is varied from 5 to 8, the seed value is randomly adjusted 11 times for each K value, and “numClusters” is finally set to 7. The results are shown in Table 6. Evaluating the clustering results with “Within cluster sum of squared errors” gives an error rate of 20.8214%, i.e. a clustering correctness rate of about 80%.

Table 6.

K-means clustering results

	Clusters
	Full Data	0	1	2	3	4	5
	(518.0)	(55.0)	(52.0)	(264.0)	(94.0)	(54.0)	(26.0)
Ranking	else	TOP70%	else	else	TOP70%	TOP70%	TOP30%
Test score	78.9312	81.61	68.65	78.45	79.22	88.35	87.41
Task-subrate	93.233	92.782	64.454	93.227	84.634	93.322	93.337
English-level	Normal	Normal	Low	Normal	Normal	High	High
Skill-level	Normal	Normal	Normal	Normal	High	High	High
Scholarship-num	1.4999	2.7	0.06	1	2	3.3	3.2
Highest-scholarship	Campus-level	Campus-level	Campus-level	Campus-level	Provincial	Provincial	Provincial
Punishment-num	0.1254	0	1.4	0	0	0	0
Holdpost	No	Yes	No	Yes	No	No	Yes
Skill-competition	Yes	No	No	Yes	Yes	Yes	Yes
Skill-award	Else	Null	Null	Else	First prize	Second prize	First prize
Innovation-level	Else	Provincial	Null	Else	Provincial	Else	Provincial
Innovation-award	Else	Else	Null	Else	Second prize	Second prize	Else
Expression-competition	No	Yes	No	No	No	No	Yes
Expression-award	Null	First prize	Null	Null	Null	Null	Second prize
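
The procedure of trying several K values and several seeds, and comparing runs by the within-cluster sum of squared errors, can be sketched as below. This is a plain-Python approximation and not the SimpleKMeans implementation itself: it handles numeric features only, assumes they are already scaled to [0, 1], and uses hypothetical toy points (for example a scaled test score and task submission rate) rather than the student records of this study.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, max_iter=100):
    """Basic k-means; returns (centroids, within-cluster sum of squared errors)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # random initial centroids
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if empty).
        updated = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:               # converged
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Hypothetical, already normalised two-feature points.
points = [
    (0.10, 0.10), (0.15, 0.12), (0.12, 0.08),
    (0.50, 0.50), (0.52, 0.48), (0.49, 0.53),
    (0.90, 0.90), (0.88, 0.92), (0.91, 0.87),
]

# For each K, keep the lowest SSE found over 11 seeds.
sse_by_k = {
    k: min(kmeans(points, k, seed)[1] for seed in range(11))
    for k in (2, 3, 4)
}
for k, sse in sse_by_k.items():
    print(f"K={k}: best within-cluster SSE = {sse:.4f}")
```

Because the SSE tends to keep falling as K grows, the final K is usually a compromise between a low SSE and clusters that remain easy to interpret, which matches the reasoning given above for limiting K to a small range.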

5

Conclusion

This study applies association rule mining and a decision support approach based on the improved C4.5 algorithm to teaching management. First, by analysing the data of students’ evaluations of teachers, the relationship between teachers’ basic conditions and teaching quality is derived, and guiding suggestions are made for improving teaching management. Then, the final stage grade prediction model is constructed with the C4.5 algorithm, and experiments assess the performance of the algorithm before and after improvement; the results show that the prediction accuracy of the improved C4.5 algorithm can reach 93.43%. The improved C4.5 algorithm is used to obtain a visualised classification tree and to provide early warnings to students according to their predicted “pass/fail” results. Finally, the performance data of graduates and soon-to-be-graduated students are collected, K-means is used to cluster the students, and different career recommendations are provided for each type of student.

Language: English

Publication timeframe: 1 time per year
Journal Subjects: Life Sciences, Life Sciences, other, Mathematics, Applied Mathematics, General Mathematics, Physics, Physics, other

Pathways to Improve the Effectiveness of Multidimensional Data Analysis in Decision Support for Instructional Management

Bing Yu

Haiying Qiao

Yumei Shan

Published Online: Sep 24, 2025

Received: Jan 15, 2025

Accepted: May 11, 2025

DOI: https://doi.org/10.2478/amns-2025-1003

Keywords: Teaching management, C4.5 algorithm, Grade prediction model, Decision support

© 2025 Bing Yu, Haiying Qiao and Yumei Shan, published by Sciendo.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
