A Data-Driven Approach to Optimizing Teaching Effectiveness in the Field of Chinese Language Chinese Education 
Published online: 19 Mar 2025
Received: 17 Oct 2024
Accepted: 29 Jan 2025
DOI: https://doi.org/10.2478/amns-2025-0381
© 2025 Xiaoyu Yang, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
A new round of scientific, technological, and industrial revolution is reshaping the entire social system, and digital technology, as a basic element integrated into the social structure, is changing every aspect of human production and life. In the digital era, the view that “data becomes a factor of production” has gradually become a consensus, and the skilled use of language and technology is a core quality of talent in the 21st century [1–2]. A data-driven approach can therefore be used to improve the effectiveness of teaching practice in Chinese language and literature education. By cultivating applied talent with rich subject knowledge, humanistic qualities, and cross-cultural communication skills, such an approach makes a positive contribution to strengthening international communication capabilities [3–5].
Traditional teaching of Chinese language and literature has certain deficiencies. To further improve its effect, bring its practical utility into play, and raise students’ Chinese language ability, the traditional teaching methods need to be renewed. First, digital technology can enrich the teaching methods of Chinese language and literature education. Teachers can promote students’ interaction and participation through online discussions, blogs, social media, and similar channels, while students can share their own views, comment on the insights of others, and take part in interpreting and discussing the course content [6–9]. Second, digital technology can help teachers understand students’ interests and levels and provide personalized learning guidance, and students can choose the literature or topics they are interested in to better meet their individual learning needs [10–12]. Online platforms can provide real-time feedback and assessment to help students improve their writing and analytical skills, while teachers can more easily track students’ progress and provide guidance [13–14]. Finally, digital technology gives students access to literary resources worldwide, including classic literature, research materials, and online courses, which expands their horizons and exposes them to literature from different cultures [15–18]. Students can also use the Internet for independent literary research and academic writing: finding literature, writing papers, and participating in online discussions in academic communities [19–21]. In short, the strong data integration capability of digital technology enables students to acquire and use knowledge more effectively, and a data-driven teaching approach can enhance the teaching effectiveness of Chinese language and literature courses.
This paper uses data mining technology to extract and analyze recruitment information from BOSS Direct Recruitment, Wisdom Union Recruitment, and other websites. Word frequency analysis and semantic network visualization are combined to show the industry status and market demand of Chinese language education, which provides theoretical support for optimizing the teaching program. The paper then analyzes the course grades of undergraduate students majoring in Chinese linguistics at X university. Using correlation analysis, it obtains the overall correlation between different classes of courses, the variables that reflect each class of courses, and the correlations among the individual courses. These results support teachers in developing appropriate course programs.
The concept of “data-driven” has attracted attention and come into frequent use with the rise of big data, and in the big data era a data-driven approach has become a general trend. The continuous collection and accumulation of data on the teaching process and its results has produced rich teaching big data. In-depth mining and multivariate analysis of these data can reveal the significance and value of teaching and learning, and thus provide strong data support for teachers’ teaching and students’ learning.
Data mining originated in knowledge discovery in databases, a term that first appeared at the 11th International Joint Conference on Artificial Intelligence, held in Detroit in 1989. It developed out of advances in artificial intelligence and database technology. Data mining is the process of extracting, from large amounts of incomplete, noisy, fuzzy, and random data from practical applications, the information and knowledge that is implicitly contained in the data, not known beforehand, and potentially useful. Through data mining, valuable knowledge, rules, or high-level information can be extracted from the relevant data sets of a database and displayed from different perspectives, so that a large database can serve as a rich and reliable resource for knowledge extraction.
Research with students, teachers, and teaching managers shows that the current curriculum is set up mainly on the basis of personal experience, by learning from other schools, or through surveys in enterprises. It does not take the learning situation of students into account, so some arrangements may be unreasonable. The system described here is mainly used for the school’s internal curriculum management. Surveys at all departmental levels and exchanges with users showed that its main users fall into four types: the Registrar’s Office, departments, teachers, and students. After examining the users against the project requirements, six user types were identified: academic affairs, departments, teachers, students, faculty leaders, and administrators. The survey and communication with the school’s teachers and students yielded the following requirements: the system should be able to perform performance analysis for each student, for each course, and for each grade level of students, and it should be able to perform correlation analysis across multiple courses. Summarizing these requirements, the analysis points fall into the following three categories:
Students. Besides individual students, analysis can be carried out by class, major, grade, and college, which form a hierarchy. Courses. Courses can be analyzed as single courses or by course nature. Time. Time is necessarily a dimension, and it has an inherent hierarchy; the levels required here are semester and academic year.
The data warehouse architecture used in this system is shown in Figure 1.
Data source. The data sources are the foundation of the data warehouse. In this paper they are mainly the database of the student management system and the database of the academic affairs management system, together with some files. Data cleaning. Data cleaning removes dirty data or noise from the data entering the data warehouse. Where data from multiple sources are irregular, ambiguous, duplicated, or incomplete, the problematic data undergo the corresponding cleaning operations, which include handling null values, noisy data, and inconsistent data. Because student information in the teaching management system is imported by teachers or administrators, and the files are entered by the teachers themselves, some fields contain null or irregular values; these are located with the screening function and corrected manually. Data conversion. Because the data warehouse integrates data scattered across different database systems on one common platform, data in different formats must be made consistent and standardized. Data conversion transforms the data in the data sources into data in the data warehouse according to conversion rules. Data loading. Data loading places the cleaned and converted source data into the data warehouse. It is an essential part of background processing and can be carried out with software tools. Some final conversion can be performed during loading, but it should be completed before the final loading operation so that all inconsistencies are eliminated.
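The cleaning, conversion, and loading steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the system's actual implementation: the field names (`student_id`, `course`, `semester`, `grade`) and the rule of setting aside incomplete rows for manual correction are assumptions made for the example.

```python
def clean_records(raw_records,
                  required_fields=("student_id", "course", "semester", "grade")):
    """Clean and standardize grade rows before loading them into the warehouse.

    Returns (cleaned, rejected). Rejected rows are the ones a person would
    need to correct manually. Field names are hypothetical.
    """
    cleaned, rejected, seen = [], [], set()
    for raw in raw_records:
        # Conversion: trim whitespace and represent empty strings as None.
        row = {}
        for field_name, value in raw.items():
            if isinstance(value, str):
                value = value.strip()
            row[field_name] = None if value == "" else value

        # Cleaning: rows with null values in required fields are set aside.
        if any(row.get(field) is None for field in required_fields):
            rejected.append(raw)
            continue

        # Conversion: store grades as numbers; irregular values are set aside.
        try:
            row["grade"] = float(row["grade"])
        except (TypeError, ValueError):
            rejected.append(raw)
            continue

        # Cleaning: drop exact duplicates of the same student/course/semester.
        key = (row["student_id"], row["course"], row["semester"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned, rejected


if __name__ == "__main__":
    sample = [
        {"student_id": " 20231102", "course": "Lexicology ", "semester": "1", "grade": "74"},
        {"student_id": "20231102", "course": "Lexicology", "semester": "1", "grade": "74"},
        {"student_id": "20231215", "course": "Chinese philosophy", "semester": "1", "grade": ""},
    ]
    ok, bad = clean_records(sample)
    print(len(ok), "rows ready to load;", len(bad), "rows need manual correction")
```

Returning the rejected rows separately, rather than silently dropping them, mirrors the manual correction step mentioned in the text.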

Figure 1. Data warehouse architecture
A semantic network is a network diagram that expresses knowledge through concepts and their semantic relations; in other words, it is a knowledge representation tool. Using semantic networks to describe human knowledge of things is in effect a simulation of how the human brain works, and the hope is that such networks can be used for knowledge derivation, for example in natural language understanding systems. Because the knowledge expressed is mainly relational, the biggest advantage of characterizing knowledge with semantic networks is that they reflect human knowledge of the nature of objective things more accurately. As a major form of knowledge representation based on network structure, semantic networks have been widely used in many fields because of their expressive power and flexibility, which allow them to express concepts, rules, and the relations among them through various mechanisms.
Correlation analysis can be understood in a broad and a narrow sense. In the broad sense, it is the analysis of the connections between things, and its methods include association rules, social network analysis, time series analysis, cluster analysis, and so on. In the narrow sense, it refers to statistical correlation, that is, analyzing two or more correlated variables in order to measure the degree of correlation between them; its methods include Pearson correlation analysis and Spearman correlation analysis. The process of choosing a suitable subset of variables for a target variable from a set of related variables is referred to as data-driven feature selection. Feature selection strategies are usually divided into three types, commonly called filter, wrapper, and embedded methods. The main feature of the wrapper (“encapsulation”) strategy is that it forms multiple candidate subsets from the initial set of variables related to the target variable, trains on each subset, evaluates the resulting error, and selects the optimal subset.
Data-driven correlation analysis involves a large number of feature variables, and correlation is a very important relationship among them. Correlation refers to a relationship between two variables that is not strictly deterministic. For example, sunlight and soil play a decisive role in plant growth, whereas the chlorophyll content of the leaves and the nutrient content of the soil may only be correlated. This kind of relationship, which is difficult to prove, is the focus of the variable analysis in this paper.
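The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient, is used to measure the correlation between two variables $X$ and $Y$. It is defined as the covariance of the two variables divided by the product of their standard deviations:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E\left[(X-\mu_X)(Y-\mu_Y)\right]}{\sigma_X \sigma_Y}$$

For the covariance and standard deviation of the sample, the Pearson correlation coefficient can be expressed as:

$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$$

The Pearson correlation coefficient for the sample can also be estimated from the mean of the standardized scores for the $(X_i, Y_i)$ sample points:

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{s_X}\right)\left(\frac{Y_i-\bar{Y}}{s_Y}\right)$$

where $\bar{X}$ and $\bar{Y}$ are the sample means and $s_X$ and $s_Y$ are the sample standard deviations.

The Pearson correlation coefficient has the following characteristics:
The minimum value of the Pearson correlation coefficient is $-1$. The maximum value of the Pearson correlation coefficient is $1$. The Pearson correlation coefficient can be positive or negative, with a positive coefficient meaning that $Y$ tends to increase as $X$ increases and a negative coefficient meaning that $Y$ tends to decrease as $X$ increases.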
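As a worked illustration of the definitions above, the following Python sketch computes the sample Pearson coefficient in two ways, from the sums of deviations and from the standardized scores, and checks that they agree. The function names and the example numbers are hypothetical and are not taken from the study's data.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson coefficient from sums of deviations about the means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def pearson_r_from_z_scores(xs, ys):
    """Same coefficient, computed as the (n - 1)-scaled sum of z-score products."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sample standard deviations use the n - 1 denominator.
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


if __name__ == "__main__":
    grades_a = [74, 66, 78, 76, 85]   # hypothetical grades in one course
    grades_b = [70, 61, 80, 79, 88]   # hypothetical grades in another course
    print(pearson_r(grades_a, grades_b))
    print(pearson_r_from_z_scores(grades_a, grades_b))
```

Both functions return values in the interval from -1 to 1, and they differ only by floating-point rounding.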
Traditional data collection relies mostly on manual entry, which is inefficient and cannot meet the data requirements of this paper. This paper therefore uses Octopus Collector as its data collection tool. It collects data efficiently and in sufficient volume to support the analysis, which improves the reliability of the results. Taking BOSS Direct Recruitment as an example, the collection process is as follows. First, open the recruitment website, search for keywords such as “Chinese language teacher” and “international Chinese teacher”, and copy the URL. Second, open Octopus Collector, create a new custom task, paste the URL, let the tool identify the web page data automatically, and configure pagination so that more complete data are collected. Third, set the tool to use the specified cookies and modify the field information so that the collected data are presented more intuitively. Finally, save all the settings, start the collection, and export the result to an Excel table. Part of the collected data is shown in Table 1.
Table 1. Partial collected data
| Job title | Posting date | Wages | Experience requirement (years) | Benefits |
|---|---|---|---|---|
| Foreign Chinese teacher | 2024/10/26 | 8000~10000 | 1 | Traffic subsidy |
| Chinese teachers can speak Japanese | 2024/10/26 | 6000~8000 | 2 | Performance bonus |
| Foreign Chinese teacher | 2024/10/25 | 6000~8000 | 1 | Elastic work |
| Foreign Chinese teacher (English medium) | 2024/10/25 | 75/h | 2 | One gold |
| Foreign Chinese teacher | 2024/10/18 | 8000~12000 | 3 | Traffic subsidy |
| Foreign Chinese teacher National trade/bright horse bridge campus | 2024/10/26 | 5500~8200 | 1 | Performance bonus |
| Foreign Chinese teacher Spanish | 2024/10/26 | 3000~4500 | 1 | Elastic work |
| Foreign Chinese teacher spring river garden | 2024/10/26 | 4500~6000 | 1 | One gold |
| Foreign Chinese teacher Jiuxian bridge | 2024/10/26 | 7500~9300 | 2 | Elastic work |
| Foreign Chinese teacher | 2024/10/25 | 3500~4200 | 1 | One gold |
| Foreign Chinese teacher | 2024/10/18 | 5000~7200 | 1 | Elastic work |
| English and Chinese teachers Japanese | 2024/10/26 | 5200~8200 | 2 | One gold |
| German and Chinese teachers | 2024/10/26 | 8000~10000 | 3 | Elastic work |
| English and Chinese teachers | 2024/10/26 | 7000~9000 | 2 | Elastic work |
| Shunyi district | 2024/10/26 | 3000~4200 | 1 | Elastic work |
| English and Chinese teachers | 2024/10/23 | 4000~5000 | 1 | Elastic work |
Octopus Collector was used to collect data from four major recruitment websites: BOSS Direct Recruitment, Wisdom Union Recruitment, 58 Tongcheng, and MileagePlus. A total of 4,390 relevant recruitment records had been collected as of November 12, 2023, and 2,746 usable records remained after preliminary screening, deduplication, and other preprocessing. The collection yields 12 fields: job title, posting time, benefits, salary, experience, education, work location, company nature, company size, professional restriction, number of recruits, and job responsibilities.
The results of the word frequency analysis are shown in Table 2. Education and language majors such as Chinese International Education (1142), Chinese Language and Literature (1095), and Applied Linguistics (857) best match the market demand for these positions, which indicates that the positions call for candidates with knowledge of Chinese language and literature. Some companies recruit education majors such as elementary education and preschool education, who may have specialized ability in teaching younger students. Some enterprises restrict recruitment to graduates of language majors such as Korean (814), Spanish (320), and Portuguese (267). Requiring international Chinese language teachers to be able to teach in Korean, Spanish, or other less commonly taught languages may reduce the difficulty of learning and raise learning efficiency for second-language learners who have no foundation in Chinese. Most of the enterprises in the market are for-profit private enterprises, which also place certain requirements on teachers’ operational and sales abilities. International Chinese language teachers who also have marketing skills are therefore more likely to help such enterprises develop sustainably.
Table 2. Partial word frequency statistics for professional restrictions
| Order | Key words | Word frequency | Order | Key words | Word frequency | 
|---|---|---|---|---|---|
| 1 | Foreign language | 1184 | 15 | applied | 647 | 
| 2 | Chinese international education | 1142 | 16 | Chinese | 552 | 
| 3 | Chinese language and literature | 1095 | 17 | Japanese | 543 | 
| 4 | Philology | 1054 | 18 | Small language | 542 | 
| 5 | Pedagogy | 984 | 19 | Preschool education | 511 | 
| 6 | Priority | 963 | 20 | Psychology | 421 | 
| 7 | Chinese | 952 | 21 | Management | 410 | 
| 8 | English | 884 | 22 | Baby education | 334 | 
| 9 | Applied linguistics | 857 | 23 | Spanish | 320 | 
| 10 | Korean | 814 | 24 | Translate | 311 | 
| 11 | Liberal arts | 786 | 25 | Portuguese | 267 | 
| 12 | Literature class | 694 | 26 | Marketing | 238 | 
| 13 | Foreign language | 682 | 27 | Market | 214 | 
| 14 | Normal | 655 | | | |
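The kind of count reported in Table 2 can be illustrated with a short Python sketch. It assumes that the "professional restriction" field of each posting has already been segmented into terms (for Chinese text this would require a word segmentation step, which is not shown), and the sample postings and stopword list are invented for the example rather than taken from the collected data.

```python
from collections import Counter


def word_frequencies(tokenized_postings, stopwords=frozenset()):
    """Count how often each term occurs across all postings.

    tokenized_postings: iterable of lists of terms, one list per posting.
    Returns (term, count) pairs sorted from most to least frequent.
    """
    counts = Counter()
    for tokens in tokenized_postings:
        counts.update(token for token in tokens if token not in stopwords)
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical, already segmented "professional restriction" fields.
    postings = [
        ["Chinese international education", "priority"],
        ["Chinese language and literature", "Chinese international education"],
        ["Korean", "priority"],
    ]
    for rank, (term, freq) in enumerate(word_frequencies(postings), start=1):
        print(rank, term, freq)
```

A stopword set can be passed to exclude terms that carry no information about the major, depending on the purpose of the analysis.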
A separate analysis of the job responsibilities field yields the semantic network diagram shown in Figure 2, which has a “core-edge” structure. The core word is “teaching”, indicating that the market generally regards the teaching ability of international Chinese teachers as a basic requirement. The marginal words include “Mandarin”, “team”, “communication”, “teaching materials”, “courseware”, “sales”, and “small language”, among others.

Figure 2. Semantic network of job responsibilities
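One common way to build such a "core-edge" network is to count how often pairs of terms co-occur in the same posting and then rank terms by their total edge weight. The sketch below illustrates that idea only; it is not necessarily how the tool behind Figure 2 works, and the tokenized job descriptions are invented for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(tokenized_docs, min_weight=1):
    """Build weighted edges between terms that appear in the same document."""
    edges = Counter()
    for tokens in tokenized_docs:
        # Each unordered pair of distinct terms counts once per document.
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return {pair: w for pair, w in edges.items() if w >= min_weight}


def weighted_degree(edges):
    """Sum the edge weights attached to each term; core terms score highest."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


if __name__ == "__main__":
    # Hypothetical, already segmented job responsibility texts.
    docs = [
        ["teaching", "Mandarin"],
        ["teaching", "courseware"],
        ["teaching", "Mandarin", "sales"],
    ]
    edges = cooccurrence_edges(docs)
    for term, score in weighted_degree(edges).most_common():
        print(term, score)
```

Terms with the highest weighted degree sit at the core of the diagram, and low-degree terms form the edge.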
On the basis of the data mining results, and taking into account the characteristics of Chinese language disciplines in different colleges and universities, their development strategies, and the types of talent they aim to cultivate, the following strategies for improving teaching effectiveness are proposed.
First, formulate a scientific and reasonable talent introduction plan. The plan should consider the characteristics of the school’s Chinese language disciplines, the school’s development goals, and the state of its teaching staff, and it should be operable in practice. Second, formulate and implement measures that promote, incentivize, and guarantee teaching reform. Explore various talent cultivation modes in the form of pilot classes, break the single pattern of talent cultivation, and broaden its channels. The current form of experimental classes is relatively simple, and more in-depth forms can be explored. Third, deepen the joint cultivation of talent by schools and enterprises. Mine and analyze students’ employment data, carry out school-enterprise cooperation in education, explore various joint cultivation modes, and move from university-based cultivation to joint cultivation in which schools and enterprises formulate cultivation plans together. The aim is a “zero-adaptation period” employment mode that improves graduates’ career development ability. Fourth, implement full data tracking and optimize the “practice and innovation” education mode. Select outstanding students to form experimental classes for outstanding and innovative talent. The curriculum system should strengthen professional and innovative education by emphasizing practice and innovation, exploring the integration of theoretical and practical courses, and relying on university research teams. Full data tracking and mining analysis are then applied, and the practice and innovation education mode is optimized according to the data mining results of the experimental classes, so as to improve the teaching effect for top innovative talent.
Traditional course relevance analysis is only a simple analysis of course content, course objectives, and course names. It lacks the valuable information reflected in the data generated after a course is actually offered. Using such data to reveal the intrinsic relevance among courses, and studying it in depth, not only provides a basis for decisions on optimizing the curriculum but also allows corresponding supplementary recommendations to be put forward.
The data for the course correlation analysis in this study were obtained from the course grades of undergraduate students majoring in Chinese Linguistics from the class of 2019 to the class of 2023 at X university. The grade pool contains the following three main types of information:
Students’ basic information: student number, name, grade, college, major, and class. Relevant attributes of the course: the code and name of the college offering the course and the semester in which it is offered; the code and name of the course category; the course source, code, name, and credits; and the name of the teacher and the class number. Students’ study status: study status (make-up, retake, deferred), grade, credits earned, and GPA earned. The specific data collected are shown in Table 3. The data exported directly from the Academic Affairs Office of the college could not be used for analysis as they were, because of data redundancy, multiple data sources, and failure to meet the algorithm’s requirements on data. It is therefore necessary to convert the vertical structure of the Academic Affairs Office’s grade database into the horizontal structure required by the relevant algorithms, and to go through the preprocessing stages of selecting, cleaning, and converting the data. Data integration is the process of merging disparate data sources into a single entity for analysis. Since the grade database holds grades for different cohorts and different semesters, the data sources need to be merged after cleaning.
Table 3. Student data
| Semester | Course source | Course type | Course name | Student number | Name | Grade |
|---|---|---|---|---|---|---|
| 12134 | Teaching plan | General education | Lexicology | 20231102 | Zhang ** | 74 |
| 12134 | General education | General education | Chinese philosophy | 20231215 | Li ** | 66 |
| 12134 | Teaching plan | Subject course | Ancient Chinese | 20231327 | Gu ** | 78 |
| 12134 | Retake | Subject course | Chinese character evolution | 20231228 | Wang ** | 76 |
| …… | …… | …… | …… | …… | …… | …… | 
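The conversion from the vertical (one row per student and course) structure to the horizontal (one row per student, one column per course) structure can be sketched as follows. The dictionary keys are hypothetical, and the rule of keeping the highest grade when a student has several attempts at a course (for example a retake) is an assumption made for the example, not a rule stated in the study.

```python
def pivot_grades(long_rows):
    """Convert long-format grade rows into {student_id: {course: grade}}.

    Assumption: when a student has several attempts at the same course,
    the highest grade is kept.
    """
    wide = {}
    for row in long_rows:
        courses = wide.setdefault(row["student_id"], {})
        previous = courses.get(row["course"])
        if previous is None or row["grade"] > previous:
            courses[row["course"]] = row["grade"]
    return wide


if __name__ == "__main__":
    # Hypothetical rows in the vertical structure of the grade database.
    rows = [
        {"student_id": "S1", "course": "Ancient Chinese", "grade": 58},
        {"student_id": "S1", "course": "Ancient Chinese", "grade": 71},  # retake
        {"student_id": "S1", "course": "Lexicology", "grade": 74},
        {"student_id": "S2", "course": "Lexicology", "grade": 66},
    ]
    for student, grades in pivot_grades(rows).items():
        print(student, grades)
```

In the resulting structure, each course becomes a variable and each student an observation, which is the shape that correlation analysis expects.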
After the grade data of the discipline-based courses were selected, they were analyzed with the simple (bivariate) correlation analysis method in SPSS; the results are shown in Figure 3. The labels A–F denote Chinese History, Chinese Public Speaking, Advanced Chinese Translation, China and International Relations, Chinese Art, and Conversation Class. Among them, the correlation between China and International Relations and Advanced Chinese Translation is the highest, with a correlation coefficient of 0.611.

Figure 3. Simple correlation analysis of the discipline-based courses
The results are then organized further to identify, for each course, the courses with which it shows a statistically significant correlation, a low correlation, or no correlation. This study focuses on the significantly correlated courses; the significant correlations among the discipline-based courses are shown in Figure 4:

Figure 4. Significantly correlated discipline-based courses
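The study performs this step in SPSS. As a rough illustration of the underlying idea, the Python sketch below computes pairwise Pearson coefficients across courses from horizontally structured grades and lists the strongest pairs. The grades are invented, and the cut-off of 0.5 is a hypothetical threshold chosen for the example; it does not replace the significance tests that SPSS reports.

```python
import math
from itertools import combinations


def _pearson(xs, ys):
    """Sample Pearson coefficient, or None if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def course_correlations(wide, min_students=3):
    """Pairwise correlations between courses.

    wide: {student_id: {course: grade}}. Only students who took both
    courses of a pair contribute to that pair's coefficient.
    """
    courses = sorted({c for grades in wide.values() for c in grades})
    result = {}
    for a, b in combinations(courses, 2):
        pairs = [(g[a], g[b]) for g in wide.values() if a in g and b in g]
        if len(pairs) < min_students:
            continue
        xs, ys = zip(*pairs)
        r = _pearson(xs, ys)
        if r is not None:
            result[(a, b)] = r
    return result


def strongest_pairs(correlations, threshold=0.5):
    """Course pairs with |r| >= threshold (hypothetical cut-off), strongest first."""
    kept = [(pair, r) for pair, r in correlations.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda item: -abs(item[1]))


if __name__ == "__main__":
    # Hypothetical grades for courses labelled A, B, and C.
    wide = {
        "S1": {"A": 60, "B": 62, "C": 90},
        "S2": {"A": 70, "B": 75, "C": 70},
        "S3": {"A": 80, "B": 79, "C": 80},
        "S4": {"A": 90, "B": 91, "C": 60},
    }
    for pair, r in strongest_pairs(course_correlations(wide)):
        print(pair, round(r, 3))
```

Restricting each pair to students who took both courses avoids treating a missing grade as a zero.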
The significantly correlated courses were then organized and analyzed further against the syllabus in the field of Chinese language and literature education, and the following course correlations were obtained.
Course correlations that match expectations. The correlation coefficient between Advanced Chinese Translation and China and International Relations is 0.611. The correlation reflects the fact that both are practical skills courses that can help students in their future work. China and International Relations and Chinese Art are cultural courses; they expand students’ cultural knowledge, and parts of their content build on each other step by step, so the two courses are significantly correlated. The correlation coefficient between Chinese History and Conversation Class is 0.53. Chinese History is a prerequisite course for Conversation Class, and students have formed a certain foundation of knowledge and understanding after studying Chinese History systematically, so the two courses are significantly correlated.
Newly discovered course correlations. Chinese Art is significantly correlated with Advanced Chinese Translation and with China and International Relations. On the surface, the learning content of Chinese Art differs greatly from that of the practical skills courses, but in terms of course purpose both pay attention to cultivating students’ logical thinking. In terms of teaching content, Chinese Art also draws on some basic knowledge from Chinese language teaching, so the courses show a significant correlation in the experiment.
Course correlations that require further study. The correlation coefficient between Conversation Class and Advanced Chinese Translation is 0.51. According to teaching experience, Conversation Class is usually related to practical skills courses. Its significant correlation with the translation course may be due to the evaluation criteria, and more data need to be collected and analyzed to identify the specific reasons.
Reasons for the weak correlation of some related courses. Comparison of the syllabi shows that Chinese Public Speaking, as a prerequisite course for Chinese Art, should in theory be closely correlated with it, but the correlation coefficient of the two courses is only 0.157. This may be because students’ familiarity with and interest in Chinese Public Speaking varied across semesters; the specific reasons need to be verified by collecting and analyzing more data.
The results of this paper can better guide students in selecting courses and help teachers optimize their teaching plans. They guide teachers in adjusting the learning intensity of each subject to suit different students. They also help teachers understand students’ learning situation in time and adjust the course content accordingly, which encourages students to study and improves the quality of teaching. To optimize the curriculum system further, students’ awards at school and their development 5 to 10 years after graduation could also be included in further mining and analysis, helping the country cultivate better builders and successors.
This paper mines data from mainstream recruitment websites and from the course grades of undergraduate students majoring in Chinese linguistics. Using word frequency analysis, semantic network analysis, and correlation analysis, it discusses strategies for optimizing the teaching mode and curriculum in order to improve the teaching effect. The results show that the majors of Chinese international education, Chinese language and literature, and applied linguistics best match the needs of the job market, and that the job market values the teaching ability of Chinese language majors most. Optimized teaching strategies, including deepening joint cultivation between schools and enterprises, are therefore proposed. The correlation coefficient between the two practical skills courses, Advanced Chinese Translation and China and International Relations, is 0.611. Paying more attention to practical skills courses in the curriculum and in the teaching process can therefore improve the quality of teaching.
