Application of an Improved Sequence Pattern Association Rule Algorithm-based Data Management System for Continuing Education Teaching Data in Universities

School education has always been a pressing topic. As the search for innovative educational concepts continues, developing a more rational and scientific approach to education calls for educational data mining that keeps pace with advances in technology. The rapid development of education informatization has led to a continuous increase in educational data, and educational data mining has garnered significant attention, emerging as a key area within data mining research (Agrawal et al., 1995; Fayyad et al., 1996; Srikant & Agrawal, 1996) [1] [2] [3]. Among the various techniques, association rule analysis stands out as a widely used method for uncovering relationships between items in datasets, and it has found extensive application in educational data mining.

Data Mining (DM) is a technique designed to aid decision-making by extracting valuable knowledge from complex, large, noisy, and incomplete datasets. The process involves extracting, transforming, analyzing, and modeling data from databases or data warehouses to derive critical insights for decision support (Fayyad, Piatetsky-Shapiro, & Smyth, 1996; Agrawal & Srikant, 1995; Srikant & Agrawal, 1996) [2] [4] [5]. This enables decision-makers to uncover patterns, identify overlooked factors, and predict trends to inform their decisions. Association Rule (AR) analysis, a key area in data mining, focuses on uncovering relationships between data domains and identifying dependencies that meet specified support and confidence thresholds. The goal of association rule mining is to extract rules such as “the occurrence of certain events leads to the occurrence of others.” For example, in student performance management, it may reveal that “students scoring ≥80 in discrete mathematics have a 60% likelihood of also scoring ≥80 in data structures,” providing valuable insights for educational decision-making.

Although data mining is a relatively new computer technology, it has demonstrated significant potential across various fields. Commonly used methods include association rule analysis, sequence pattern analysis, classification analysis, and clustering analysis (Chen & Zhang, 2010; Wei & Zhang, 2015; Zhang & Xu, 2018) [6] [7] [8]. Traditional association rule mining algorithms typically focus on identifying frequent itemsets. However, relying solely on frequent itemsets often fails to deliver high-value results: domain-specific knowledge is frequently overlooked, so numerous redundant rules are generated. These rules, already well known to experts, lack practical value. To address this issue, we propose a Frequent Utility and Interestingness Mining algorithm based on domain knowledge (FUI_DK). This algorithm aims to discover association rules that are both practical and interesting. Recognizing that users are generally more interested in a few high-value rules, the algorithm ranks the results in descending order of rule value. This approach enhances the quality of association rules and provides more effective decision-making support for university continuing education data management systems.

2

Association Rule Theory

Association rule mining seeks to identify relationships between itemsets in data. Traditional association rules measure the strength and reliability of these relationships using two key metrics: support and confidence (Agrawal & Srikant, 1995; Han & Kamber, 2006; Srikant & Agrawal, 1996) [1] [9] [6]. Support is the probability that itemsets A and B occur together in a transaction, calculated as Support(A→B) = P(A∪B), where A∪B denotes a transaction containing every item of both A and B. Confidence is the probability that itemset B occurs given the presence of itemset A, expressed as Confidence(A→B) = P(B|A).

1) Minimum Support and Confidence: The minimum support threshold, defined by users or experts, is the minimum frequency with which an itemset must appear in the data to be considered significant. Similarly, the minimum confidence threshold is the lowest acceptable reliability for an association rule. A rule meeting both the minimum support and minimum confidence criteria is termed a Strong Association Rule (SAR).

2) Itemsets: An itemset is a collection of items, expressed as Itemset = {item1, item2, …, itemk}. An itemset containing k items is called a k-itemset; for example, C = {c₁, c₂, c₃} is a 3-itemset. Itemsets that satisfy the support threshold are called frequent itemsets.

3) Support count: The support count of itemset A is the number of transactions in the transaction dataset that contain A, also known as the frequency or count of the itemset.

The typical association rule mining process includes:

(1) Identifying all frequent itemsets;

(2) Generating rules from the frequent itemsets;

(3) Filtering and optimising the rules using criteria such as confidence level.
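The metrics defined in this section can be illustrated with a minimal Python sketch. The toy transactions and the function names (`support_count`, `support`, `confidence`) are hypothetical and chosen only for illustration; they are not taken from the system described in this paper.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]

def support_count(itemset: Iterable[str], transactions: List[Transaction]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)

def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = frozenset(antecedent) | frozenset(consequent)
    denom = support_count(antecedent, transactions)
    return support_count(both, transactions) / denom if denom else 0.0

# Hypothetical data: each transaction lists the courses a student scored >= 80 in.
transactions = [
    frozenset({"discrete_math", "data_structures"}),
    frozenset({"discrete_math", "data_structures", "english"}),
    frozenset({"discrete_math"}),
    frozenset({"english"}),
    frozenset({"data_structures", "english"}),
]

rule_support = support({"discrete_math", "data_structures"}, transactions)
rule_confidence = confidence({"discrete_math"}, {"data_structures"}, transactions)
```

In this toy data, two of the five transactions contain both courses and three contain discrete mathematics, so the rule has support 2/5 and confidence 2/3.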

2.1

Related Theories

The problem of frequent itemset mining was first introduced by Agrawal et al. (1995) [1], who developed the Apriori algorithm. This algorithm generates candidate sets and then tests and filters them, progressively uncovering all frequent itemsets in the database, starting with the frequent 1-itemsets [1]. Later, in 2000, Hai et al. [10] proposed the FP-Growth algorithm, which uses data compression and the FP-tree structure to generate frequent itemsets without creating candidate sets. Yao et al. (2003) [11] were the first to address high-utility data mining, recognizing that itemsets have varying utility values; this insight led to the development of several related algorithms. Tseng et al. (2006) [12] introduced the UP-Growth algorithm, which employs the UP-Tree structure to extract high-utility itemsets from transactional data. Building on this, Wu et al. (2007) [13] proposed the TKU algorithm, derived from UP-Growth, which mines high-utility itemsets without requiring a minimum utility threshold. In 2010, Hui Li et al. [14] introduced the FHIMA algorithm, which is based on the PrefixSpan approach and mines frequent high-utility itemsets by pruning through relative utility bounds and quality values. In 2011, Qian Wu et al. [15] proposed the TKHUP algorithm, which uses projection to store transaction data in a projection table and mines high-utility patterns from this structure. In 1996, Fayyad et al. [2] introduced the concept of domain knowledge, which has since been widely used to enhance association rule algorithms. Pan Haiwei et al. (2012) [16] applied domain knowledge-guided association rules to pattern mining in medical images, while Zhang Jing et al. (2014) [17] combined data characteristics with domain knowledge, proposing the DKARM algorithm, which removes redundant itemsets from the frequent 2-itemsets to enhance the interestingness of association rules.

3

Association Rule-based Data Mining Analysis Methods

3.1

Theory of Association Rules

Definition 1: Let I = {i₁, i₂, …, i_m} be a set of m distinct items. Given a transaction database D, each transaction T is a subset of the items in I, i.e., T ⊆ I, and has a unique identifier TID. An association rule is an implication of the form X ⇒ Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅.

Definition 2: Support of an itemset.

$D_{supp}(X) = \frac{|X|}{|T|}$ (1)

where |T| is the number of records in the data file T and |X| is the number of records in T that contain X.

Definition 3: Support of a rule.

$D_{supp}(X \Rightarrow Y) = D_{supp}(XY) = \frac{|XY|}{|T|}$ (2)

where |T| is the number of records in the data file T and |XY| is the number of records in T that contain both X and Y.

Definition 4: Confidence of a rule.

$D_{conf}(X \Rightarrow Y) = \frac{|XY|}{|X|}$ (3)

Based on Eq. (1) and Eq. (2), we have the following conclusion.

$D_{conf}(X \Rightarrow Y) = \frac{|XY|}{|X|} = \frac{\frac{|XY|}{|T|}}{\frac{|X|}{|T|}} = \frac{D_{supp}(X \Rightarrow Y)}{D_{supp}(X)}$ (4)

Definition 5: An association rule meeting both the minimum support threshold (min_supp) and the minimum confidence threshold (min_conf) is termed a strong association rule. An itemset is considered frequent if it satisfies min_supp. The collection of frequent k-itemsets is typically denoted L_k.

Finding all frequent itemsets is the key to data mining with association rules. The problem of association rule mining is to find all the association rules in D whose support and confidence are no lower than min_supp and min_conf respectively, i.e., to find every rule X ⇒ Y that satisfies D_supp(X ⇒ Y) ≥ min_supp and D_conf(X ⇒ Y) ≥ min_conf.
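As a quick numerical check of the definitions above, consider a hypothetical file of 20 records (the counts are illustrative and not taken from the paper's data) in which 8 records contain X and 6 records contain both X and Y:

$D_{supp}(X) = \frac{8}{20} = 0.40, \quad D_{supp}(X \Rightarrow Y) = \frac{6}{20} = 0.30$

$D_{conf}(X \Rightarrow Y) = \frac{6}{8} = \frac{0.30}{0.40} = 0.75$

With an illustrative minimum support of 0.25 and minimum confidence of 0.70, the rule X ⇒ Y would count as a strong association rule, since 0.30 ≥ 0.25 and 0.75 ≥ 0.70.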

3.2

Steps of Association Rule Mining to Solve the Problem

Steps to solve an association rule problem:

1) Data Preprocessing: Prepare the data relevant to the extraction task. Manipulate the database as required to create a standardized database D.

2) Identifying Frequent Itemsets: Find the itemsets in D that meet the minimum support threshold, known as frequent itemsets. Because databases are typically large, this step is the core component of the algorithm.

3) Rule Generation: Generate rules that satisfy the minimum confidence threshold to form the rule set R, which is then interpreted and output.
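The rule generation step can be sketched in Python as follows. The sketch assumes that the supports of all frequent itemsets are already available in a dictionary; the function name `generate_rules` and the support values are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# (antecedent X, consequent Y, support, confidence)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]

def generate_rules(freq: Dict[FrozenSet[str], float], min_conf: float) -> List[Rule]:
    """Split each frequent itemset Z into X => (Z minus X); keep confident rules."""
    rules: List[Rule] = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                # Every subset of a frequent itemset is frequent, so its
                # support is expected to be present in `freq`.
                conf = supp / freq[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, supp, conf))
    return rules

# Hypothetical supports, as they might come out of the frequent-itemset step.
freq = {
    frozenset({"A"}): 0.6,
    frozenset({"B"}): 0.5,
    frozenset({"A", "B"}): 0.4,
}
rules = generate_rules(freq, min_conf=0.7)
```

Here only B ⇒ A survives: its confidence is 0.4/0.5 = 0.8, whereas A ⇒ B reaches only 0.4/0.6 ≈ 0.67.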

3.2.1

Association Rule Algorithms

3.2.2

Conventional Algorithms for Association Rules

The Apriori algorithm, introduced by Agrawal et al. in 1993, is one of the most prominent methods for mining frequent itemsets for Boolean association rules (Agrawal & Srikant, 1993) [18]. Its name stems from its use of prior knowledge about the properties of frequent itemsets, and it operates through an iterative, layer-by-layer search in which frequent k-itemsets are used to find frequent (k+1)-itemsets. Initially, the frequent 1-itemsets (denoted L₁) are determined. Then L₁ is used to find the frequent 2-itemsets L₂, L₂ is used to find L₃, and the process continues until no further frequent k-itemsets can be identified. A key limitation is that finding each L_k requires a full scan of the database, resulting in heavy I/O that can severely affect performance on large datasets.
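The layer-by-layer search can be sketched in Python as below. This is a compact illustration under simplifying assumptions (in-memory data, an absolute support count `min_count`), not the implementation used in this paper; the toy transactions are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def apriori(transactions: List[FrozenSet[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose support count is at least `min_count`."""

    def count(cands: Set[FrozenSet[str]]) -> Dict[FrozenSet[str], int]:
        # One pass over the data per level: the repeated database scan.
        counts = {c: 0 for c in cands}
        for t in transactions:
            for c in cands:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    items = {frozenset({i}) for t in transactions for i in t}
    level = count(items)  # L1
    frequent = dict(level)
    k = 1
    while level:
        prev = set(level)
        # Join step: unite pairs of frequent k-itemsets into (k+1)-itemsets.
        cands = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        cands = {c for c in cands
                 if all(frozenset(s) in prev for s in combinations(c, k))}
        level = count(cands)  # L(k+1)
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical toy data.
transactions = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
                frozenset("bc"), frozenset("abc")]
frequent = apriori(transactions, min_count=3)
```

With `min_count=3`, all single items and all pairs are frequent, while {a, b, c} appears in only two transactions and is dropped.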

To enhance the efficiency of the Apriori algorithm, several variations have been proposed, including hash-based counting, transaction compression, and partitioning. While the Apriori candidate generation-and-test approach significantly reduces the candidate set size and improves performance, it still faces challenges: it may generate many candidate sets, it scans the database repeatedly, and it must match large candidate sets against transactions, all of which lead to considerable overhead. The FP-growth algorithm addresses these issues by eliminating candidate generation. Instead, it uses a partitioning approach in which the database is compressed into an FP-tree that retains itemset association information. The compressed database is then divided into conditional databases, each associated with a frequent item, and each is mined separately. FP-growth thus transforms the problem of finding long frequent patterns into a recursive task of discovering shorter patterns and concatenating suffixes; using the least frequent items as suffixes offers better selectivity and improves efficiency.

Performance studies of the FP-tree method demonstrate its efficiency and scalability in mining both long and short frequent patterns, with speeds approximately an order of magnitude faster than the Apriori algorithm. However, for large databases, constructing memory-based FP-trees becomes impractical.
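To make the compression idea concrete, the following Python sketch builds an FP-tree with two scans of an in-memory transaction list. It covers only tree construction, not the recursive mining of conditional databases; the class and function names (`FPNode`, `build_fp_tree`) and the data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

class FPNode:
    """One node of the FP-tree: an item, its count, and its child nodes."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"]):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}

def build_fp_tree(transactions: List[List[str]],
                  min_count: int) -> Tuple[FPNode, Dict[str, List[FPNode]]]:
    """Return (root, header), where header maps each frequent item to its nodes."""
    # Scan 1: count the items and keep the frequent ones.
    totals = Counter(i for t in transactions for i in set(t))
    freq = {i: n for i, n in totals.items() if n >= min_count}
    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = {i: [] for i in freq}
    # Scan 2: insert each transaction with its items in descending frequency
    # order, so that transactions sharing frequent items share a path prefix.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

# Hypothetical toy data.
transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
root, header = build_fp_tree(transactions, min_count=2)
```

In this example all four transactions share the single top-level node for `b`, which is where the compression comes from; item `c` ends up in two different branches, linked through `header["c"]`.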

3.2.3

Improved Association Rule Algorithm

Building on the strengths of the algorithms discussed above, the FP-growth method is enhanced by incorporating features specific to academic affairs management databases. The improvement works as follows. First, the database D is scanned, as in the Apriori algorithm, to determine the frequency of each item, and the 1-itemsets that meet the minimum support threshold are identified. The database is then divided by transaction into n non-overlapping parts, and the minimum support for each part is calculated. The FP-growth algorithm is applied to mine local frequent itemsets from each part, and these serve as candidates. In the second database scan, the actual support of each candidate is evaluated, and the Apriori approach is used to determine the global frequent itemsets. In addition, because the transaction database usually grows year by year, much of the data has already been mined and has already produced valid results. This paper therefore also proposes a time-sharing mining method: the results of previous mining runs are retained, and later runs process only the new data, which greatly improves mining speed. This assumes that the support threshold remains unchanged; if the threshold changes, the method is invalid. The steps of the improved algorithm are described below.

Algorithm: Improved Algorithm

Inputs: transaction database D; itemset minimum support min_suppl; rule minimum support min_suppR

Output: the set L of global frequent itemsets in D.

Method:

1) Check the retained mining history. If the history is empty, scan the database D for the first time; otherwise, select the data items in D that have not yet been mined to form a temporary database D*, and scan D* for the first time;

2) Count the occurrences of each data item and, if the count is greater than or equal to min_suppl, add the data item to L;

3) Divide the transactions containing items of L into N non-overlapping parts;

4) Use the FP-growth algorithm to derive the set of locally frequent itemsets in each part separately;

5) Scan the database D or D* a second time to evaluate the actual support of each candidate; the algorithm ends with the set of global frequent itemsets determined by min_suppR.
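The partition-then-verify idea behind this algorithm can be sketched in Python as follows. For brevity the sketch replaces FP-growth with brute-force enumeration inside `local_frequent`, omits the history check of step 1, and uses a single relative threshold `min_ratio` for both local and global support; all names and data are hypothetical.

```python
from itertools import combinations
from math import ceil
from typing import Dict, FrozenSet, List, Set

def local_frequent(part: List[FrozenSet[str]], min_ratio: float) -> Set[FrozenSet[str]]:
    """Stand-in for FP-growth: brute-force enumeration, suitable only for tiny data."""
    need = ceil(min_ratio * len(part))
    counts: Dict[FrozenSet[str], int] = {}
    for t in part:
        for r in range(1, len(t) + 1):
            for combo in combinations(sorted(t), r):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, n in counts.items() if n >= need}

def partitioned_mining(db: List[FrozenSet[str]], n_parts: int,
                       min_ratio: float) -> Dict[FrozenSet[str], int]:
    """Mine each partition locally, then verify the candidates on the whole data."""
    size = ceil(len(db) / n_parts)
    parts = [db[i:i + size] for i in range(0, len(db), size)]
    # A globally frequent itemset is locally frequent in at least one part,
    # so the union of the local results is a complete candidate set.
    candidates = set().union(*(local_frequent(p, min_ratio) for p in parts))
    # Second scan: count every candidate over the whole database.
    need = ceil(min_ratio * len(db))
    result: Dict[FrozenSet[str], int] = {}
    for c in candidates:
        n = sum(1 for t in db if c <= t)
        if n >= need:
            result[c] = n
    return result

# Hypothetical toy data.
db = [frozenset("ab"), frozenset("ab"), frozenset("ac"),
      frozenset("b"), frozenset("abc"), frozenset("c")]
result = partitioned_mining(db, n_parts=2, min_ratio=0.5)
```

Because each part is mined independently, the local step could run in parallel or, in the time-sharing setting described above, be limited to the newly added part.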

3.3

Description of the FUI_DK Algorithm

First, following the idea of sequential pattern mining, the algorithm finds all frequent, high-utility itemsets: it projects the sequence dataset and, in each recursive step, calculates the support and utility of the transactions in the prefix sequence transaction table. The itemsets that meet the minimum support and utility thresholds are retained. Next, association rules are generated from these itemsets and filtered to retain only those that satisfy the minimum confidence and interestingness thresholds. Finally, the association rules that meet all specified conditions are output in descending order of value.

3.3.1

FUI_DK Algorithm Steps

The algorithm proceeds with the following steps.

Step 1: Scan the sequence dataset S once and generate all prefixes of length 1.

Step 2: Calculate the support count and utility value of each prefix, and remove from the sequence transaction table the prefixes that do not satisfy the minimum support threshold and the minimum utility threshold. This yields all frequent, high-utility prefixes of length 1; set k = 1.

Step 3: Perform the following recursive computation for each prefix of length k that satisfies the conditions.

1) Generate the projection database of the prefix. If the projection database is empty, return from the recursion.

2) Count each item in the projection database to determine its support and utility. If all items fall below the minimum support or utility threshold, return from the recursion.

3) Merge each individual item that satisfies the thresholds with the current prefix to obtain new prefixes.

4) Let k = k + 1 and perform Step 3 recursively for each new prefix obtained in 3).

Step 4: Generate association rules from the set of all frequent, high-utility itemsets obtained in Step 3. If an association rule fails to meet the interestingness criteria and its confidence is below the minimum threshold, it is discarded.

Step 5: Sort the association rules by their combined value, computed according to the weights that the user assigns to the indicator parameters.
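Steps 1 to 3 can be approximated by the following Python sketch of prefix-projection mining with a support and a utility threshold. It rests on several assumptions that the text does not fix: every sequence element is a single item, the utility of a pattern is taken as the sum of its item weights multiplied by its support count, and a pattern is kept only if it meets both thresholds. The function name `mine_patterns` and the data are hypothetical.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]

def mine_patterns(sequences: List[List[str]], utility: Dict[str, float],
                  min_sup: int, min_util: float) -> Dict[Pattern, Tuple[int, float]]:
    """Return {pattern: (support count, utility)} for the patterns that are kept."""
    results: Dict[Pattern, Tuple[int, float]] = {}

    def project(db: List[List[str]], item: str) -> List[List[str]]:
        # Keep the suffix after the first occurrence of `item` in each sequence.
        return [seq[seq.index(item) + 1:] for seq in db if item in seq]

    def grow(prefix: Pattern, db: List[List[str]]) -> None:
        # Support of prefix + item = number of projected sequences containing item.
        counts: Dict[str, int] = {}
        for seq in db:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, sup in sorted(counts.items()):
            pattern = prefix + (item,)
            # Assumed utility definition: summed item weights times support count.
            util = sum(utility[i] for i in pattern) * sup
            if sup >= min_sup and util >= min_util:
                results[pattern] = (sup, util)
                grow(pattern, project(db, item))

    grow((), sequences)
    return results

# Hypothetical toy data: course sequences and per-course utility weights.
sequences = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
weights = {"a": 1.0, "b": 2.0, "c": 3.0}
patterns = mine_patterns(sequences, weights, min_sup=2, min_util=3.0)
```

On this data the sketch keeps the three single items and the patterns (a, b), (a, c) and (b, c); the pattern (a, b, c) occurs in only one sequence and is pruned by the support threshold.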

The pseudo-code for the FUI_DK algorithm is presented in Algorithm 1.

Algorithm 1 FUI_DK algorithm

Input: Sequential transaction set S; minimum thresholds α, β, γ; combined value parameter weights λ₁, λ₂, λ₃

Output: Frequent, high-utility, interesting association rules.

begin

  Scan S, generate each prefix with length(prefix) = 1

  k = 1, Count(prefix)

  Get Support(prefix) and Utility(prefix)

  if Support(prefix) < α and Utility(prefix) < β

    delete prefix from seq

  for k = 1, k = k + 1 do

    Generate the prefix_projected database

    if database = null, return

    Count(database_item)

    Get Support(database_item)

    Get Utility(database_item)

    if Support(database_item) < α

      if Utility(database_item) < β, return

    prefix ← current prefix + database_item

  Generate association rules

  if Confidence(rule) < γ and Interest = 0

    delete rule

  Sort Value(rule)

end
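The final filtering and sorting of rules can be sketched in Python as follows. The text does not specify which indicators the weights λ₁, λ₂, λ₃ apply to; the sketch assumes support, confidence and a utility normalised to [0, 1], combined as a weighted sum, and it keeps a rule only if it is both confident enough and interesting. The `Rule` class, `rank_rules` and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Rule:
    name: str
    support: float
    confidence: float
    utility: float     # assumed to be normalised to [0, 1]
    interesting: bool  # False if domain knowledge marks the rule as already known

def combined_value(rule: Rule, weights: Tuple[float, float, float]) -> float:
    """Assumed combined value: weighted sum of the three indicators."""
    l1, l2, l3 = weights
    return l1 * rule.support + l2 * rule.confidence + l3 * rule.utility

def rank_rules(rules: List[Rule], min_conf: float,
               weights: Tuple[float, float, float]) -> List[Rule]:
    """Drop rules that fail a filter, then sort by combined value, descending."""
    kept = [r for r in rules if r.confidence >= min_conf and r.interesting]
    return sorted(kept, key=lambda r: combined_value(r, weights), reverse=True)

# Hypothetical rules and weights.
rules = [
    Rule("r1", support=0.2, confidence=0.6, utility=0.5, interesting=True),
    Rule("r2", support=0.3, confidence=0.9, utility=0.1, interesting=True),
    Rule("r3", support=0.5, confidence=0.9, utility=0.9, interesting=False),
    Rule("r4", support=0.4, confidence=0.3, utility=0.9, interesting=True),
]
ranked = rank_rules(rules, min_conf=0.5, weights=(0.2, 0.3, 0.5))
```

With these weights r1 scores 0.47 and r2 scores 0.38, so r1 is listed first; r3 is removed as uninteresting and r4 for low confidence.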

4

Application of Association Rules in Academic Data Management System

4.1

Application of Association Rules in Faculty Teaching Evaluation

Data mining techniques have been widely used in many fields. In education informatisation, they are mainly used to analyse learners' characteristics, evaluate teaching, plan curricula, and provide personalised and intelligent web services. This paper examines the use of association rules in teaching management systems, with a particular emphasis on teaching evaluation and curriculum planning.

Teaching evaluation plays a crucial role in teaching management: it not only regulates, controls and guides the teaching process but also has a powerful orientation function, making it a key component of the teaching management system. The school's teaching management database records information about students' and teachers' learning, work, rewards and penalties, etc. [19]. There are intrinsic relationships among these data, which contain latent patterns. Applying association rules to the analysis of teaching evaluation data can uncover these patterns and help decision makers formulate forward-looking strategies.

The following is a case study: 300 records of teachers' teaching quality assessment at our university were randomly selected, and four indicators (course category, teacher's age, title and assessment score) were chosen, ignoring other data items. The relationships among these four variables were analysed through data mining. Table 1 shows the preliminary association rules relating the characteristics of teachers and course categories to assessment scores ≥90, with the minimum support set to 3% and the minimum confidence set to 15%.

Table 1.

Association Rules

Rule	Course Type	Title	Age	Confidence%	Support %
A	Public Foundation			15	35
B	professional foundation			41	10.5
C	Professional			40	8
D		Intermediate		36	9
E		Associate High		40	10
F		Senior		20	5
G			31-35	34	9
H			36-49	38	10
I			50-60	15	4.5

From the rules in Table 1, it can be seen that young and middle-aged teachers have gradually matured and have certain teaching abilities, and that teachers with intermediate titles or above have higher teaching standards, whereas teachers with junior titles still need further training. The teaching effect of public foundation courses is poor, and schools should analyse the reasons in depth (e.g., public course teachers are mostly young teachers, or students are not interested in public courses) in order to formulate more scientific and reasonable countermeasures.

4.2

Application of Association Rules in Curriculum Planning

Students' course learning is a step-by-step process, and different courses are correlated and follow a certain sequence. The widespread use of network-based information systems generates vast amounts of historical data, offering strong support for decision-making. In campus network-based academic affairs systems, the advancement of applications has created the conditions for integrating relevant academic information into a data warehouse. However, as the scale of teaching expands, it becomes difficult for academic administrators and teachers to identify the relationships between earlier and later courses directly from student performance data and to base teaching schedule decisions on them. It is therefore both necessary and feasible to use association rule mining to reveal the latent patterns between courses and so provide a basis for decision-making. Table 2 shows some of the correlations between pairs of courses (i.e., frequent 2-itemsets) obtained by mining the student achievement database with the association rule analysis method described above.

Table 2.

Curriculum Relevance Rules

Rule	Course Title 1	Course Title 2	Number of Outstanding Grades	Confidence %	Support %
A	English	Higher Mathematics	200	66.7	25
B	English	Computer Basics	220	73	27.2
C	……
D	Discrete Mathematics	Data Structures	150	50	15
E	……

As can be seen from the rules in Table 2, when English grades are excellent, grades in Higher Mathematics and Computer Basics also tend to be excellent, with high levels of confidence and support. However, common sense suggests that there is no direct relationship between English and either of these courses. It is therefore still necessary to further analyse and process the mined association rules, which is one of the focuses of future research. On the other hand, the rule that a good performance in Discrete Mathematics is accompanied by a good performance in Data Structures has 15% support and 50% confidence. This association rule is noteworthy: it suggests that strengthening the teaching of discrete mathematics may help students perform well in the data structures course, and that the discrete mathematics course should be offered before the data structures course.
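The antecedent support implied by rule D can be recovered from the two reported percentages by rearranging Eq. (4). This is only a consistency check on the figures in Table 2, with X denoting an excellent grade in Discrete Mathematics and Y an excellent grade in Data Structures:

$D_{supp}(X) = \frac{D_{supp}(X \Rightarrow Y)}{D_{conf}(X \Rightarrow Y)} = \frac{0.15}{0.50} = 0.30$

That is, about 30% of the students in the dataset would have an excellent grade in Discrete Mathematics, and half of them also achieve an excellent grade in Data Structures.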

5

Experimental Evaluation and Analysis of Results

5.1

Experimental Parameter Settings

Data preprocessing yields the set of student data sequence items. For example, {0001, 〈School of Public Administration (Basic English 0, Basic Maths 1, …) (Advanced Maths 0, Advanced English 3, …) …〉} means that the student number is 0001, the student's faculty is 'School of Public Administration', and the student studied Basic Maths and Basic English within the same period, followed by Advanced Maths and Advanced English. An item such as 'Basic English 0' means that the student's grade in Basic English is 'excellent'. Since two different dimensions of student data are involved, there are two types of redundant itemsets: 1) intra-dimensional redundant itemsets; 2) inter-dimensional redundant itemsets. Because this experiment involves many specific redundant itemsets, only their abstract forms are given here: 1) intra-dimensional redundant itemsets are abstractly represented as X4→Y4; for example, a student without the English Level 4 certificate certainly has no English Level 6 certificate; 2) inter-dimensional redundant itemsets are abstractly represented as nature of the major → mandatory courses for the major; for example, a foreign language major requires that students' performance in the basic language courses be above 'average'. Because most of the objects in the student academic data are course-related, and a course's credits reflect its importance to the student to some extent, the experiment uses course credits (credit) as the main criterion for the utility value of course-related items, and the number of students as the main criterion for non-course-related items. The utility value is expressed as U = f · Count, where for a course-related item f_c = credit/∑credits.
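The credit-based utility can be illustrated with a short Python sketch. It assumes that the utility formula is read as the product of the weight f and the count, and that f for a course is its share of the total credits; the credit values, the count and the function names are hypothetical.

```python
from typing import Dict

def credit_weights(credits: Dict[str, float]) -> Dict[str, float]:
    """f_c = credit / sum of credits, computed for every course."""
    total = sum(credits.values())
    return {course: c / total for course, c in credits.items()}

def utility(weight: float, count: int) -> float:
    """Assumed reading of the utility formula: weight times occurrence count."""
    return weight * count

# Hypothetical credit values.
credits = {"Basic English": 4, "Basic Maths": 4, "Advanced Maths": 2}
weights = credit_weights(credits)
# Utility of an item that occurs in the records of 50 students (hypothetical count).
u_advanced_maths = utility(weights["Advanced Maths"], count=50)
```

With these numbers Advanced Maths carries a weight of 2/10 = 0.2, giving a utility of 10 for 50 occurrences.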

5.2

Analysis of Experimental Results

To evaluate the performance of the FUI_DK algorithm, comparison experiments are carried out against the TKU algorithm, which is also a high-utility pattern mining algorithm. The TKU algorithm is based on the UP-Growth algorithm, which prunes itemsets with low utility values during a second scan of the database and then constructs the UP-Tree. The experiments are carried out on the student dataset described above, and the time performance of the FUI_DK algorithm is evaluated from the experimental results. In addition, experiments are conducted to compare the number of rules before and after redundant rule elimination. The experimental equipment is a PC running Windows 7 with an Intel Core i5-6600 CPU at 3.3 GHz and 8 GB of RAM.

Figure 1 compares the time taken by the FUI_DK algorithm and the TKU algorithm as the minimum support varies. The experimental results indicate that the FUI_DK algorithm outperforms the TKU algorithm in time efficiency as the minimum support increases. On average, the computation time of the FUI_DK algorithm is 125 s lower than that of the TKU algorithm when redundant rules are removed and 215 s lower when they are not. Although removing the redundant rules increases the computation time of the FUI_DK algorithm, its time performance remains better than that of the TKU algorithm on this problem.

Figure 2 gives the number of rules generated by the FUI_DK algorithm before and after eliminating redundant rules. To keep the conditions consistent, the minimum confidence and minimum utility thresholds are both set to 0 in this experiment. The experimental results show that the number of rules decreases continuously as the minimum support increases, and that the algorithm eliminates up to 43% of the rules as redundant. In practical analysis, users often need to spend considerable effort eliminating such redundant rules; using this algorithm not only spares users from wasting time on invalid rules but also greatly reduces the difficulty of analysing the results.
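The elimination of redundant rules by domain knowledge can be sketched in Python as a simple filter. The sketch assumes that the known redundant rules are available as a set of (antecedent, consequent) pairs; the rules listed are illustrative and do not reproduce the experimental data.

```python
from typing import List, Set, Tuple

RulePair = Tuple[str, str]  # (antecedent, consequent)

def drop_redundant(rules: List[RulePair],
                   known: Set[RulePair]) -> Tuple[List[RulePair], float]:
    """Remove rules that domain knowledge already implies; return the kept
    rules and the fraction of rules that was eliminated."""
    kept = [r for r in rules if r not in known]
    ratio = 1 - len(kept) / len(rules) if rules else 0.0
    return kept, ratio

# Illustrative rules; the first and third restate known facts.
mined = [
    ("no English Level 4 certificate", "no English Level 6 certificate"),
    ("Discrete Mathematics excellent", "Data Structures excellent"),
    ("foreign language major", "basic language above average"),
    ("English excellent", "Computer Basics excellent"),
]
known_redundant = {mined[0], mined[2]}
kept, eliminated = drop_redundant(mined, known_redundant)
```

In this illustration two of the four mined rules restate known facts, so half of the rules are eliminated before the results reach the user.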

6

Conclusion

This paper proposes an association rule-based mining application for university faculty data management systems, aimed at mining the association relationships among students' course grades in faculty data. The core principle of the FUI_DK algorithm is described in detail, and the performance of the algorithm is verified through experiments. Based on students' academic affairs data, association rule mining is carried out across multiple dimensions, such as college, nature of courses and course grades, revealing the effective association relationships among colleges, courses and grades. For students, the resulting rules can support targeted learning according to personal needs and help avoid unsatisfactory grades or insufficient skills in subsequent compulsory courses caused by an unclear learning direction in the early stage. For administrators, the mining results can help analyse the reasonableness of the course arrangement and, according to the influence of individual courses, optimize teaching methods or strengthen teaching input.



Hua Peng

Chun Yi

Published Online: Mar 31, 2025

Received: Nov 10, 2024

Accepted: Feb 20, 2025

DOI: https://doi.org/10.2478/amns-2025-0826

Keywords: sequential pattern mining, association rules, further education data, algorithms

© 2025 Hua Peng et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
