
Mining and Teaching Design of Civics Topics in English Courses Based on LDA Topic Modeling

  
March 21, 2025


Introduction

Colleges and universities have always been an important front for the ideological and political work of the Party and the State. Building ideological and political education into college and university courses is an important means of implementing the spirit of “three-pronged education for all”, and it answers the question of how to carry the spirit of the speech into the talent cultivation curriculum system and improve the quality of talent cultivation in all respects. Curriculum reform from the perspective of “big ideology and politics” aims to infuse ideological and political education into the entire talent training curriculum, so that course modules other than the ideological and political theory courses also play their full role in educating people [1-3]. Curriculum ideological and political construction (“curriculum Civics”) means thoroughly uncovering the ideological and political education content embedded in the other course modules of each major, and aligning the three kinds of teaching content (professional knowledge, comprehensive literacy, and ideological and political theory) so that they work in the same direction and educate synergistically, extending ideological and political education across the whole process and full scope of the talent training curriculum [4-6].

College English is a compulsory comprehensive literacy course for college and university students. Its teaching goal is not only to impart knowledge of the English language; it also bears the responsibility of cultivating students’ humanistic literacy [7-9]. Guiding students’ values and cultivating their ideals and beliefs is one of the course’s humanistic literacy objectives, which coincides with the current national goal of “curriculum Civics” construction. Today’s college students have mostly grown up in the network era. To suit their behavioral preferences, big data analysis models can be used to mine educational topics and design teaching activities so as to make Civics teaching in the course more effective [10-14]. This not only makes the course more vivid but also stimulates students’ enthusiasm for learning, which gives it good practical value [15-17].

To mine Civics topics and thereby optimize teaching design, this study reviews the techniques used in topic mining: text modeling representation, preprocessing methods, and the LDA topic model. For the teaching and comment texts of English courses, the NLPIR segmentation system is used to segment words, and TF-IDF weights are calculated from word frequencies. Because variational inference in the LDA model is complex, this paper uses collapsed Gibbs sampling to estimate the LDA parameters. Hot words related to Civics topics in English courses are extracted from the topic-word and document-topic probability distributions, and the popularity (heat) of each Civics topic is calculated. The mining results are visualized to explore how the heat of Civics topics changes over time. Sentiment analysis of high-frequency words and LDA mining are then applied to teaching evaluation texts to summarize the factors that affect students’ experience of teaching and to provide a reference and basis for optimizing teaching design.

Topic mining-related technologies
Text modeling representation

Text is composed of words and is unstructured data, whereas a machine can only process structured data. Before text can be clustered or classified, it must therefore be converted into a form that the machine can “read”. The vector space model (VSM) is an effective way to perform this conversion.

The VSM works like a function mapping in mathematics: each text is mapped to a vector in space, which in essence transforms unstructured data into structured data. The cosine similarity between vectors is then used to express the similarity between texts. The model is simple in structure and easy to compute [18].
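
As a minimal sketch of this idea, the following Python snippet maps two token lists to term-count vectors and compares them by cosine similarity. The function names and the two toy documents are illustrative assumptions, not material from the study.

```python
import math
from collections import Counter

def to_vector(tokens, vocabulary):
    """Map a token list to a term-count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy documents, already segmented into words.
doc_a = ["teacher", "explains", "grammar", "clearly"]
doc_b = ["teacher", "explains", "culture"]
vocabulary = sorted(set(doc_a) | set(doc_b))
vec_a = to_vector(doc_a, vocabulary)
vec_b = to_vector(doc_b, vocabulary)
similarity = cosine_similarity(vec_a, vec_b)
```

Raw counts are used as weights here only to keep the sketch short; any other word weight, such as TF-IDF, could fill the vectors instead.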

The word weight w in these vectors can be obtained in two ways: by simple statistics, i.e., counting the number of occurrences of each word, or by an algorithm such as TF-IDF. Besides vectorization based on word weights, there are also methods based on one-hot encoding and on word embedding. One-hot encoding numbers each word in the vocabulary and records a 1 or a 0 according to whether the word is present in the text; it is simple, but it cannot express the intrinsic connections between words. Word embedding replaces the 0s and 1s with continuous real values, which can reflect the associations between words and also reduce the dimensionality of the representation.

Text pre-processing
Text Acquisition and Segmentation

Natural language processing is, in essence, the use of computers to mine valuable information from text. Text preprocessing is one of the important steps in this process, because it determines how well the subsequent text modeling and analysis perform. It mainly involves word segmentation, removal of invalid words (stop words), part-of-speech tagging, and so on [19].

Text is usually acquired by extracting it from a specified database or by using crawlers to fetch the desired data: SQL statements or crawler scripts are written so that the machine obtains large amounts of data automatically.

Commonly used segmentation rules can be categorized into matching rules, semantic rules, and statistical rules.

Matching rules. Also known as lookup rules. The main idea is to compare the text to be analyzed against the entries of a large lexicon: if a character string is found in the lexicon, the match succeeds. Matching rules can be categorized into forward and reverse matching, as well as longest and shortest matching.
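
To illustrate the matching idea, here is a minimal sketch of forward longest matching in Python. The tiny lexicon, the `max_len` window, and the fallback to single characters are assumptions made for the example; a real system would use a far larger lexicon.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Segment text left to right, always taking the longest lexicon entry that matches."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon entry matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so that the scan can advance.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical mini lexicon for illustration only.
lexicon = {"爱国", "爱国主义", "教育", "主义"}
tokens = forward_max_match("爱国主义教育", lexicon)
# tokens == ["爱国主义", "教育"]
```

Reverse matching follows the same pattern but scans from the end of the string, and shortest matching tries the smallest window first.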

Semantic rules. In plain terms, semantic rules have the machine imitate human understanding of the text in order to divide it into words. Different ways of dividing a sentence produce different meanings. To avoid semantic ambiguity, semantic rules therefore use semantic and syntactic information about the words, under the direction of a master control component that plays the role of the “human”, to resolve ambiguous segmentations.

Statistical rules. Words are combinations of characters, and different characters combine into different words. If two characters appear together in a sentence or text, then the more often they co-occur, the more likely it is that they form a word, and this can be measured precisely by co-occurrence. Based on this principle, statistical methods count the co-occurrence frequency of characters in the corpus and calculate their degree of association (mutual information).
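
As a rough sketch of this measure, the snippet below estimates pointwise mutual information for a pair of adjacent characters from raw counts. The four-sentence corpus is invented for illustration, and the probabilities are simple relative frequencies, which is only one of several possible estimators.

```python
import math
from collections import Counter

def pmi(corpus, left, right):
    """Pointwise mutual information (base 2) of the adjacent pair (left, right)."""
    chars = Counter()
    pairs = Counter()
    for sentence in corpus:
        chars.update(sentence)
        pairs.update(zip(sentence, sentence[1:]))
    if pairs[(left, right)] == 0:
        return float("-inf")  # the pair never co-occurs
    p_pair = pairs[(left, right)] / sum(pairs.values())
    p_left = chars[left] / sum(chars.values())
    p_right = chars[right] / sum(chars.values())
    return math.log2(p_pair / (p_left * p_right))

# Hypothetical toy corpus.
corpus = ["思政教育", "教育改革", "课程思政", "改进教育"]
score_bound = pmi(corpus, "教", "育")   # characters that form a word
score_loose = pmi(corpus, "育", "改")   # characters that merely sit side by side
```

In this toy corpus the pair that forms a word scores higher than the pair that only happens to be adjacent, which is the signal a statistical segmenter relies on.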

Removal of invalid words

After segmentation, the text contains a large number of invalid words (stop words), such as function words, pronouns, and verbs or nouns that carry no useful information. Since these words do not help text mining and analysis, they need to be removed. Stop-word removal does not require a dedicated module: each segmented word (or phrase) is simply checked, and it is discarded if it is an invalid word and kept otherwise. Deciding whether a word is invalid requires an invalid-word list, which is mainly built empirically. Typical invalid words are auxiliaries, interjections, and personal pronouns.
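
The check described above can be sketched in a few lines of Python. The stop-word list shown here is a made-up English example; in practice the list would be built empirically for the corpus at hand.

```python
# Hypothetical, hand-built invalid-word list (auxiliaries, interjections, pronouns).
STOP_WORDS = {"the", "a", "of", "is", "oh", "he", "she", "it"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep a token only if it is not in the invalid-word list."""
    return [token for token in tokens if token.lower() not in stop_words]

filtered = remove_stop_words(["Oh", "the", "teacher", "is", "humorous"])
# filtered == ["teacher", "humorous"]
```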

TF-IDF algorithm

The TF-IDF algorithm is widely used in text processing, including information retrieval and text mining. In essence it is a probability-based statistical method that quantifies the importance of a word in a text, based on the relationship between words, texts, and the text set. Specifically, if a word appears repeatedly in one text but rarely in other texts, it is considered to have good discriminating power and can be used to distinguish between texts. The weight of a word therefore increases with the number of times it appears in the document and decreases with the number of documents in the text set that contain it.

LDA Topic Modeling

The LDA model is a generative topic model and a Bayesian probabilistic model. Under this generative view, each word of an article is produced by a two-step process: a topic is selected with a certain probability, and a word is then selected from that topic with a certain probability. The document-to-topic distribution is a multinomial distribution, and the topic-to-word distribution is also a multinomial distribution. Each document is thus a probability distribution over topics, and each topic is a probability distribution over words, so the model has a three-layer structure of words, topics, and documents [20]. As an unsupervised machine learning algorithm, LDA represents each text in the text set by a probability distribution over feature words, and the feature words with higher probability indicate the latent topics of the text. Figure 1 shows the generative diagram of the LDA topic model.

Figure 1.

LDA topic model generation diagram

It can be seen that the process of generating a text in a text set D is as follows:

Input the text set, assumed to contain m documents. The text set is preprocessed to form a document-word matrix, where $w_{ij}$ represents the jth word of the ith document.

The number of word items in the text is N.

The topic distribution of the text is θ.

For each of the m documents in the text set, the LDA generation process is as follows:

Select a topic $z_n$ from the multinomial topic distribution θ of the document.

Draw a word item $w_n$ from the selected topic according to the conditional distribution $p(w_n \mid z_n, \beta)$.

Repeat the above process until m documents have been processed.
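
The generation steps above can be sketched as follows. This is an illustrative simulation under stated assumptions, not the authors' code: θ is drawn from a Dirichlet prior (the standard LDA assumption), the topic-word table `topic_word` and the vocabulary are invented, and only the Python standard library is used.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, alpha, topic_word, vocabulary, rng):
    """Generate one document: draw theta, then a topic z_n and a word w_n for each position."""
    theta = sample_dirichlet(alpha, rng)
    topic_ids = list(range(len(alpha)))
    assignments, words = [], []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]           # z_n ~ Multinomial(theta)
        w = rng.choices(vocabulary, weights=topic_word[z])[0]  # w_n ~ p(w | z_n)
        assignments.append(z)
        words.append(w)
    return theta, assignments, words

# Hypothetical two-topic model over a four-word vocabulary.
vocabulary = ["patriotism", "culture", "grammar", "vocabulary"]
topic_word = [
    [0.45, 0.45, 0.05, 0.05],  # topic 0 favors civic words
    [0.05, 0.05, 0.45, 0.45],  # topic 1 favors language words
]
rng = random.Random(0)
theta, assignments, words = generate_document(8, [0.5, 0.5], topic_word, vocabulary, rng)
```

Topic mining runs this process in reverse: given only the words, it infers θ and the topic-word probabilities.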

LDA model needs to estimate the parameters in the process of modeling, and the commonly used estimation methods are Laplace approximation, variational approximation and Gibbs sampling method. When the size of the text set is large, Gibbs sampling method is easy and efficient in extracting topics, and this method is used in this paper to extract text clustering features.

An LDA Topic Model-based Approach to Mining Civic and Political Topics in English Courses
Text pre-processing
Segmentation methods

The LDA model regards a document as a word vector, and the word is the most basic linguistic unit, so the original text needs to be processed by word segmentation, and Chinese word segmentation is to cut the language string into words.

This paper uses the NLPIR lexical analysis system for word segmentation. NLPIR was developed from the ICTCLAS lexical analysis (word segmentation) system, which is based on the hidden Markov model (HMM). The system supports mixed Chinese-English segmentation, keyword extraction, new word recognition, adaptive segmentation, and user-defined dictionaries, and it segments with high precision and speed. It also provides C++ and Java interfaces that can be called for secondary development.

TF-IDF weights

The word frequency (tf) refers to how often a word item appears in a document. The frequency of word item i in document j is calculated as follows:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$

The inverse document frequency (idf) is a measure of the universal importance of a word, and the inverse document frequency of word item i is calculated as follows:
$$idf_i = \lg \frac{|D|}{|\{j : t_i \in d_j\}|}$$

where $|D|$ is the total number of documents and $|\{j : t_i \in d_j\}|$ denotes the number of documents containing word item i.

TF-IDF is the product of the two, i.e.:
$$tfidf_{i,j} = tf_{i,j} \times idf_i$$

Therefore, the higher the frequency of a lexical item in a document and the smaller the number of documents containing it, the higher the weight of the lexical item in that document. As a result, TF-IDF tends to filter out the common words and keep the important ones.
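
A minimal Python sketch of these three formulas is given below, with lg taken as the base-10 logarithm. The function name and the three toy documents are assumptions for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, with tf = n_ij / sum_k n_kj
    and idf = lg(|D| / number of documents containing the term)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log10(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical segmented comments.
documents = [
    ["teacher", "explains", "case"],
    ["teacher", "speaks", "fast"],
    ["teacher", "case", "case"],
]
weights = tf_idf(documents)
```

Because "teacher" occurs in every document, its idf is lg(1) = 0 and its weight vanishes, which is the filtering effect described above.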

LDA model parameter estimation

The LDA model is relatively complex and difficult to solve exactly, so approximate inference is generally used. Commonly used approximate inference algorithms are expectation maximization, variational inference, and Gibbs sampling. The original LDA work uses variational inference, but its derivation is complicated and its computational complexity is high, so this paper uses collapsed Gibbs sampling to estimate the parameters of the LDA model. First, the parameters Θ and Φ are integrated out and a topic is sampled for each word; after sampling has converged, Θ and Φ are estimated from the co-occurrence counts of $z_{m,n}$ and $w_{m,n}$.

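The sampling formula is deduced as follows [21]:
$$
\begin{aligned}
p(z_i = k \mid z_{\neg i}, w) &= \frac{p(w, z)}{p(w, z_{\neg i})} = \frac{p(w \mid z)}{p(w_{\neg i} \mid z_{\neg i})\, p(w_i)} \cdot \frac{p(z)}{p(z_{\neg i})} \\
&\propto \frac{\Delta(n_z + \beta)}{\Delta(n_{z,\neg i} + \beta)} \cdot \frac{\Delta(n_m + \alpha)}{\Delta(n_{m,\neg i} + \alpha)} \\
&= \frac{\Gamma(n_k^{(t)} + \beta_t)\, \Gamma\left(\sum_{t=1}^{V} (n_{k,\neg i}^{(t)} + \beta_t)\right)}{\Gamma(n_{k,\neg i}^{(t)} + \beta_t)\, \Gamma\left(\sum_{t=1}^{V} (n_k^{(t)} + \beta_t)\right)} \cdot \frac{\Gamma(n_m^{(k)} + \alpha_k)\, \Gamma\left(\sum_{k=1}^{K} (n_{m,\neg i}^{(k)} + \alpha_k)\right)}{\Gamma(n_{m,\neg i}^{(k)} + \alpha_k)\, \Gamma\left(\sum_{k=1}^{K} (n_m^{(k)} + \alpha_k)\right)} \\
&\propto \frac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t=1}^{V} (n_{k,\neg i}^{(t)} + \beta_t)} \cdot \frac{n_{m,\neg i}^{(k)} + \alpha_k}{\left[\sum_{k=1}^{K} (n_m^{(k)} + \alpha_k)\right] - 1}
\end{aligned}
$$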

where $n_k^{(t)}$ denotes the number of times word item t is assigned to the kth topic in the corpus, $n_m^{(k)}$ denotes the number of words in the mth document assigned to the kth topic, and $\neg i$ denotes that the word currently being sampled is excluded.

The parameter sets Φ (the probability of a topic generating a word) and Θ (the probability of a document generating a topic) are obtained from the state of the Markov chain after sampling has converged. The estimates of $\varphi_k$ and $\theta_m$ are:
$$\varphi_{k,t} = \frac{n_k^{(t)} + \beta_t}{\sum_{t=1}^{V} (n_k^{(t)} + \beta_t)}, \qquad \theta_{m,k} = \frac{n_m^{(k)} + \alpha_k}{\sum_{k=1}^{K} (n_m^{(k)} + \alpha_k)}$$

The sampling process is as follows. First, a random initial topic is assigned to each word in each document. Then, following the Gibbs sampling formula, the topic of each word is resampled and the corresponding counts are updated; one pass over all the words in the textbook documents is one iteration. After 1,000 iterations the topic assignments have approximately converged, and the probability of each document generating a topic and the probability of each topic generating a word are estimated from the number of words assigned to each topic in each document and the number of times each word is assigned to each topic. Taking the mean over a number of samples gives more accurate parameter estimates.
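
The procedure can be sketched in pure Python as below. This is a simplified illustration, not the authors' implementation: it assumes symmetric priors (a single `alpha` and `beta`), takes documents as lists of integer word ids, drops the document-length denominator of the sampling formula because it does not depend on the topic, and estimates Θ and Φ from the final sample only rather than from an average over several samples.

```python
import random

def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA. docs is a list of lists of word ids.
    Returns (theta, phi) estimated from the last sample."""
    rng = random.Random(seed)
    topic_ids = list(range(n_topics))
    n_mk = [[0] * n_topics for _ in docs]                # words in doc m assigned to topic k
    n_kt = [[0] * vocab_size for _ in range(n_topics)]   # times word t is assigned to topic k
    n_k = [0] * n_topics                                 # total words assigned to topic k
    z = []
    # Random initial topic for every word.
    for m, doc in enumerate(docs):
        z_m = []
        for t in doc:
            k = rng.randrange(n_topics)
            z_m.append(k)
            n_mk[m][k] += 1
            n_kt[k][t] += 1
            n_k[k] += 1
        z.append(z_m)
    for _ in range(n_iter):
        for m, doc in enumerate(docs):
            for i, t in enumerate(doc):
                k = z[m][i]
                # Remove the current word from the counts (the "not i" counts).
                n_mk[m][k] -= 1
                n_kt[k][t] -= 1
                n_k[k] -= 1
                # Full conditional, up to a factor that is constant in the topic.
                weights = [
                    (n_kt[j][t] + beta) / (n_k[j] + vocab_size * beta) * (n_mk[m][j] + alpha)
                    for j in topic_ids
                ]
                k = rng.choices(topic_ids, weights=weights)[0]
                z[m][i] = k
                n_mk[m][k] += 1
                n_kt[k][t] += 1
                n_k[k] += 1
    theta = [[(n_mk[m][k] + alpha) / (len(doc) + n_topics * alpha) for k in topic_ids]
             for m, doc in enumerate(docs)]
    phi = [[(n_kt[k][t] + beta) / (n_k[k] + vocab_size * beta) for t in range(vocab_size)]
           for k in topic_ids]
    return theta, phi

# Hypothetical toy corpus with word ids 0..3.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
theta, phi = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=4, n_iter=50)
```

Each row of `theta` is the topic distribution of one document and each row of `phi` is the word distribution of one topic, matching the two estimates given above.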

Civics Topic Extraction

Because the textbooks on the web are not labeled with topic categories, this paper uses the unsupervised LDA model to extract topics from the crawled textbook set. The two most important sets of parameters in the LDA model are the topic-word probability distribution and the document-topic probability distribution. Their values can be estimated with the Gibbs sampling algorithm, and from them the hot words of each topic and the textbooks related to each topic can be extracted.

Topical Hot Words

The parameter Φ in the LDA model gives the probability distribution of words within each topic. These distributions differ greatly between topics, and within a topic the words with higher probability are the ones most closely related to its meaning. Such words summarize the topic and can be regarded as its hot words. This paper therefore sorts the words of each topic by probability in descending order and selects the 20 words with the highest probability to represent the topic.
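
A short sketch of this selection step follows, assuming one row of Φ is available as a list aligned with the vocabulary. The probabilities and words are invented for the example.

```python
def top_words(phi_k, vocabulary, n=20):
    """Return the n words with the highest probability in one topic's word distribution."""
    ranked = sorted(zip(vocabulary, phi_k), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:n]]

# Hypothetical topic-word probabilities for a four-word vocabulary.
vocabulary = ["grammar", "patriotism", "culture", "speed"]
phi_k = [0.05, 0.55, 0.30, 0.10]
hot_words = top_words(phi_k, vocabulary, n=2)
# hot_words == ["patriotism", "culture"]
```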

Topic-related teaching materials

In traditional clustering algorithms, each data object can be assigned to only one cluster. LDA is different: it is a probabilistic generative model that uses the parameter Θ to represent the probability distribution of topics in a document. If each topic is treated as a class, the probability of a document generating a topic can be regarded as the strength of the document’s affiliation with that topic. This feature is especially suitable for Civics topic mining, because a textbook may cover multiple topics, and the probabilities indicate how relevant the document is to each of them.

The method used in this paper is to set a threshold. If the probability of a topic in a document exceeds the threshold, the document is attributed to that topic, so a textbook can belong to multiple topics. If no topic probability of a document exceeds the threshold, the document does not belong to any topic and is filtered out.
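
The thresholding rule might be sketched as follows. The threshold of 0.4 and the document-topic values are hypothetical; the paper does not state the threshold it used.

```python
def assign_topics(theta, threshold):
    """Map each document index to the topics whose probability exceeds the threshold.
    Documents with no topic above the threshold are left out (filtered)."""
    assignment = {}
    for m, theta_m in enumerate(theta):
        topics = [k for k, p in enumerate(theta_m) if p > threshold]
        if topics:
            assignment[m] = topics
    return assignment

# Hypothetical document-topic distributions.
theta = [
    [0.70, 0.25, 0.05],  # clearly about topic 0
    [0.45, 0.45, 0.10],  # belongs to topics 0 and 1
    [0.34, 0.33, 0.33],  # no dominant topic, filtered out
]
assignment = assign_topics(theta, threshold=0.4)
# assignment == {0: [0], 1: [0, 1]}
```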

Topic heat

The heat of a Civics topic in a given period is related to both the number of textbook reports on the topic and user participation in that period; intuitively, the more textbooks there are on the topic, the higher its heat. In the LDA model, each document is a probability distribution over topics. For a given topic, this paper sums the weights of the topic over all textbook texts in a day and takes the average as the heat value of the topic on that day, and topics are ranked by heat value. The heat of topic k over time period t is calculated as follows, where the sum runs over the documents d whose time $t_d$ falls in period t:
$$\delta_k = \frac{1}{D_t} \sum_{d:\, t_d \in t} \theta_{dk}$$

Dt denotes the number of textbooks in time period t, and θdk denotes the probability of topic k in the dth document.
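
As an illustration of the heat formula, the sketch below averages the probability of one topic over the documents of each day. The dates and probabilities are invented, and days are represented as plain strings for simplicity.

```python
from collections import defaultdict

def topic_heat(doc_dates, theta, topic):
    """Average theta[d][topic] over the documents of each day."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, theta_d in zip(doc_dates, theta):
        sums[day] += theta_d[topic]
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sums}

# Hypothetical data: three documents over two days, two topics.
doc_dates = ["2024-03-01", "2024-03-01", "2024-03-02"]
theta = [[0.8, 0.2], [0.4, 0.6], [0.1, 0.9]]
heat = topic_heat(doc_dates, theta, topic=0)
```

Here topic 0 has a heat of about 0.6 on the first day and 0.1 on the second, so its heat falls over time.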

Process of topic extraction

In this paper, the crawled textbook collection is divided by day, the LDA model is applied to the textbooks of each day, and Gibbs sampling is used to estimate the model parameters, yielding the daily Civics topics and their related textbooks.

The specific steps are as follows:

Select the textbooks of the same day in the database.

Perform Chinese word segmentation on the original text.

Filter out the stop words, and keep only nouns and verbs.

Use TF-IDF weights for feature word extraction.

Model the textbook corpus with the LDA model, setting the hyperparameters and the number of topics.

Solve for the probability parameters of the LDA model using the Gibbs sampling algorithm.

Obtain the hot words of the topics according to the solved parameters, and categorize the textbook into the corresponding topics according to the probability of the topics appearing in the document.

Topic prediction for new documents

Conjugacy allows Bayesian methods to be computed incrementally, so the topic probability distribution of a new document can be predicted with an already trained LDA model. The topic-word probability distribution is treated as fixed and is supplied by the model obtained from the training corpus, so only the topic distribution of the unknown document needs to be estimated. A slight modification of the sampling formula gives the sampling formula for a new document, in which quantities marked with a tilde refer to the new document $m_{new}$ and $M$ denotes the trained model:
$$p(\tilde{z}_i = k \mid \tilde{w}_i = t, \tilde{z}_{\neg i}, \tilde{w}_{\neg i}; M) = \frac{n_k^{(t)} + \tilde{n}_{k,\neg i}^{(t)} + \beta_t}{\sum_{t=1}^{V} (n_k^{(t)} + \tilde{n}_{k,\neg i}^{(t)} + \beta_t)} \cdot \frac{n_{m_{new},\neg i}^{(k)} + \alpha_k}{\left[\sum_{k=1}^{K} (n_{m_{new}}^{(k)} + \alpha_k)\right] - 1}$$

The topic probability distribution of the unknown document is calculated as:
$$\theta_{m_{new},k} = \frac{n_{m_{new}}^{(k)} + \alpha_k}{\sum_{k=1}^{K} (n_{m_{new}}^{(k)} + \alpha_k)}$$
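
A simplified sketch of this prediction step is shown below. It makes two assumptions beyond the text: the trained topic-word probabilities `phi` are used directly as the fixed first factor (instead of adding the new document's counts to the trained counts), and the prior `alpha` is symmetric. The two-topic `phi` table is invented for illustration.

```python
import random

def infer_topics(doc, phi, alpha=0.1, n_iter=100, seed=0):
    """Estimate the topic distribution of a new document with phi held fixed.
    doc is a list of word ids; phi[k][t] is the trained probability of word t in topic k."""
    rng = random.Random(seed)
    n_topics = len(phi)
    topic_ids = list(range(n_topics))
    z = [rng.randrange(n_topics) for _ in doc]  # random initial topics
    n_k = [0] * n_topics                        # words of the new document per topic
    for k in z:
        n_k[k] += 1
    for _ in range(n_iter):
        for i, t in enumerate(doc):
            n_k[z[i]] -= 1                      # exclude the current word
            weights = [phi[k][t] * (n_k[k] + alpha) for k in topic_ids]
            z[i] = rng.choices(topic_ids, weights=weights)[0]
            n_k[z[i]] += 1
    total = len(doc) + n_topics * alpha
    return [(n_k[k] + alpha) / total for k in topic_ids]

# Hypothetical trained model: topic 0 favors words 0 and 1, topic 1 favors words 2 and 3.
phi = [[0.49, 0.49, 0.01, 0.01], [0.01, 0.01, 0.49, 0.49]]
theta_new = infer_topics([0, 1, 0, 1], phi)
```

The returned list follows the formula for the topic distribution of the unknown document, with the counts taken from the last sampling pass.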

Visualization and sentiment analysis of topic mining results
Visualization of Civic Topic Mining Results Based on LDA

Applying the mining method described above to the Civic and Political topics of an English course on a MOOC platform produces a large amount of course text data that needs to be analyzed and presented. Real-time, interactive feedback supports learners and educators of English courses in participating in and organizing their courses. Therefore, this section visualizes the results of topic mining.

Visualization of Civic Topics Problem Description

The mainstream Chinese MOOC platforms include “China University MOOC”, “Xue Tang Online”, “Good University Online”, and “Wisdom Tree”. These platforms deliver online education mainly through audio and video teaching resources, but most of the data generated alongside the courses is text. The course discussion forum is where this text data is concentrated, and it is the most effective reference for the interactive status of a course. Over the years, MOOC platforms have gradually enriched their learning analytics, and many support viewing course statistics in the course back end, such as the number of enrolled students, browsing records for teaching resources, the number of discussions, and course grades. However, these functions do not analyze the semantic content of the discussion text, and the analysis and presentation of discussion topics are lacking. Therefore, this section visualizes ideological and political topics based on the mining results of the LDA topic model.

Visualization of Civic Topic Mining Results

The data in this part comes from the course comment areas of the China University MOOC platform. The comment texts of the 50 courses with the most comments were selected as samples for the experiment. The course with the most comments has 23,854 text evaluations, and the course with the fewest has 2,043. In total, 242,173 text evaluations were collected with a web crawler.

A heatmap is a data visualization tool that displays areas of high activity through highlighting and color intensity. Common heatmaps include attention, click, comparison, analysis, history, floating, and sharing heatmaps. Heat data usually consists of discrete points. A heatmap places a superimposable grayscale or color band around each discrete point, so that where several hotspots overlap, the grayscale or color changes smoothly between them. A heatmap is not a precise representation of the data; its aim is to show intuitively where the data is dense or sparse, or where probabilities are high or low.

Text data combined with time information can also be displayed as a heatmap to show how text probabilities rise and fall along the time dimension. Because the selected text data and time granularity form a two-dimensional matrix, the heatmap takes the form of a two-dimensional matrix of color blocks.

In Python, a heatmap can be created by calling the heatmap function in the matrix module of the seaborn library. Arranging the keyword heatmap in time-topic form shows the change of topic keywords over time more intuitively. Based on the results of the Civic topic mining method described above, the keyword probability data of each Civic topic are extracted, and example heatmaps for the 4 types of Civic topics are shown in Fig. 2 to Fig. 5.
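
A minimal sketch of such a plot is given below. The helper names, the keyword probabilities, and the period labels are all invented for illustration and are not the study's data; seaborn and matplotlib are imported inside the plotting function, so the matrix helper can be used without them.

```python
def build_time_keyword_matrix(records, keywords, periods):
    """Arrange {(period, keyword): probability} records as rows of keywords
    and columns of periods; missing cells become 0.0."""
    return [[records.get((period, kw), 0.0) for period in periods] for kw in keywords]

def plot_keyword_heatmap(matrix, keywords, periods, title):
    """Draw the matrix with seaborn.heatmap; darker cells mean higher probability."""
    import matplotlib.pyplot as plt
    import seaborn as sns
    fig, ax = plt.subplots(figsize=(8, 4))
    sns.heatmap(matrix, xticklabels=periods, yticklabels=keywords, cmap="Blues", ax=ax)
    ax.set_title(title)
    ax.set_xlabel("Time period")
    ax.set_ylabel("Keyword")
    return fig

# Hypothetical keyword probabilities for one topic over three periods.
periods = ["week 1", "week 2", "week 3"]
keywords = ["patriotic education", "civic consciousness"]
records = {
    ("week 1", "patriotic education"): 0.10,
    ("week 2", "patriotic education"): 0.18,
    ("week 3", "patriotic education"): 0.25,
    ("week 1", "civic consciousness"): 0.05,
    ("week 3", "civic consciousness"): 0.09,
}
matrix = build_time_keyword_matrix(records, keywords, periods)
```

Calling `plot_keyword_heatmap(matrix, keywords, periods, "Topic 1")` would then render one figure per topic, with time on the horizontal axis and keywords on the vertical axis.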

Figure 2.

Heatmap of key word probability change over time from topic 1

Figure 3.

Heatmap of key word probability change over time from topic 2

Figure 4.

Heatmap of key word probability change over time from topic 3

Figure 5.

Heatmap of key word probability change over time from topic 4

In the keyword heatmaps of the four types of topics, a darker color means a higher keyword probability and a lighter color means a lower one. The figures show that in topics 1 to 4 the probability of the core keywords is very high, while the probabilities of the other keywords are low and similar to one another. This indicates that these topics have dominant core words, which learners tend to use when participating in discussions. The probability of keywords such as “patriotic education”, “civic consciousness”, “advanced socialist culture”, and “cultural industry” shows an obvious upward trend, while the probability of keywords such as “ideological and political teacher” and “governing the country according to law” shows an obvious downward trend.

Teaching Evaluation Text Mining and Analysis

LDA topic modeling was applied to automatically mine the topic distribution structure and semantic content implied in the teaching evaluation sets of the two subgroups (positive and negative evaluations of teachers), providing a basis for improving the design of English Civics teaching.

Sentiment analysis of high-frequency words

Based on the results of TF-IDF keyword extraction, high-frequency words were counted for the two sets of text data to give an initial interpretation of the teaching characteristics that each set of comments focuses on.
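
The counting step can be sketched with the standard library as follows. The three sample reviews are invented, and the reviews are assumed to be segmented and filtered already.

```python
from collections import Counter

def top_keywords(reviews, n=20):
    """Count word occurrences over all segmented reviews and return the n most frequent."""
    counts = Counter(token for review in reviews for token in review)
    return counts.most_common(n)

# Hypothetical segmented positive reviews.
reviews = [
    ["explain", "clear"],
    ["explain", "interesting"],
    ["explain", "clear"],
]
ranking = top_keywords(reviews, n=2)
# ranking == [("explain", 3), ("clear", 2)]
```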

Positive Evaluation Group High Frequency Words

After emotional polarity was classified and the comments were matched against the teacher teaching dictionary, a total of 66,982 positive reviews were obtained. Word frequency analysis was used to extract the most discussed content of the positive comments, and the top 20 high-frequency keywords after screening and sorting are shown in Figure 6. The most frequent word was “explain” (8,100 times), indicating that positive evaluations mainly revolve around the English teacher’s explanations: learners pay close attention to them and affirm and recognize them. It was followed by “interesting” (4,351 times) and “case” (3,741 times). Words such as “clear”, “teaching”, “knowledge”, “in-depth explanation”, and “comprehension” all appear more than 2,000 times. Of the remaining words, “problem-driven” and “knowledge points” appear fewer than 1,000 times, and the rest all fall in the range of 1,000 to 2,000 times.

Negative evaluation group high frequency words

After emotional polarity was classified and the comments were matched against the teacher teaching dictionary, a total of 987 negative comments were obtained. Word frequency analysis was used to extract the top 20 high-frequency keywords, and the results are shown in Figure 7. The most frequent word was “explain” (74 times), indicating that learners in the negative evaluation group were also primarily concerned about the flexibility and innovation of the teacher’s teaching style. It was followed by “hope” (73 times) and “teaching” (72 times). Words such as “too fast”, “example”, “mechanical”, “no”, “understood”, and “excessively” all appear more than 60 times. Of the remaining words, “too slow”, “speech speed”, “knowledge points”, and “unclear” appear more than 50 times, and all others appear fewer than 50 times. Words such as “explain”, “teach”, “too fast”, “too slow”, “speech speed”, and “can’t keep up” show that learners pay close attention to the rhythm of the teacher’s teaching or explanation. Words such as “comprehension”, “case”, “knowledge point”, “example”, and “incomprehension” reflect learners’ attention to the teaching content. Together, these negative words reflect learners’ desire for teachers to further improve their teaching design in terms of teaching methods, teaching rhythm, and teaching content.

Figure 6.

Positive evaluation group high frequency vocabulary sorting

Figure 7.

Negative evaluation group high frequency vocabulary sorting

Evaluation Text LDA Mining

The number of topics in the LDA model depends on the size of the dataset, and the setting of the prior parameters α and β depends in turn on the total number of topics k. In this experiment the positive evaluation set contains more data, and the priors are set to α = 50/k and β = 0.01. The 66,982 texts of the positive evaluation set were imported into the topic mining program, and testing showed that the results were best with k = 5 topics. According to the probability distribution of the topics that different groups of learners pay attention to, five topics with significant probability values were extracted, and in each topic the top 10 words by probability were taken to obtain the fine-grained semantic content of the topic. The 987 texts of the negative evaluation set were imported in the same way, and the results were best with k = 3 topics. Three topics with significant probability values were extracted, again with the top 10 words by probability in each.

Positive Evaluation Group Topics

Table 1 shows the topic word matrix for the positive evaluation group. Topic 1 (0.315) and Topic 2 (0.303) carry the most information and together account for 61.8% of all topics; they reflect the content that learners who gave positive evaluations are most concerned about and most eager to discuss. The meaning of each topic can be inferred from the probability ranking of its words, as follows:

Topic 1 focuses on teaching strategies. Learners attend to the content and the logic of the teacher’s lectures. They consider the teaching content rich in “cases”, “problem-driven”, and rigorously structured, with a grasp of the “key points” and “logical” explanations. They feel inspired, find that the course helps improve their level and ability in certain respects, and regard it as commendable.

Topic 2 focuses on teaching methods, that is, the teaching style and language style of the instructor. Learners not only agree with the teacher’s teaching style and methods but also find the lectures easy to understand and the language “humorous”, vivid, and “interesting”. These characteristics attract learners to “learn” and help them gain a lot of knowledge, which improves not only learners’ efficiency, quality, enthusiasm, and ability in certain respects but also the teacher’s teaching efficiency.

Topic 3 focuses on the practicality of the content. Drawing on their direct experience, learners report that the teacher’s teaching content, explanations, or teaching methods help them understand and learn the knowledge and course content. The teaching process is linked to “reality”, the “time” arrangement is reasonable, and the course has “practicality”; it is worth studying and recommending, and the learning “harvest” is rich.

Topic 4 focuses on language organization. Learners believe that the teacher’s language in the course is “organized” and reasonable, and that the teacher can explain in “simple terms” with “clear ideas”, so that learners understand the knowledge “thoroughly” and find the course “easy to understand” and easy to learn.

Topic 5 focuses on speech speed. Learners think that the teacher’s “speed of speech” during explanation is appropriate, that the “explanation” of “knowledge points” and “theories” is “in place”, and that there is interaction to help learners understand.

Negative Evaluation Group Topics

The topic-word matrix for the negative evaluation group is shown in Table 2. The comments of this group fall into three topics. Learners paid most attention to Topic 1 (0.431) and Topic 2 (0.371), which together account for 80.2% of all topics, while Topic 3 accounts for the remaining 19.8%.

Topic 1 focuses on language intelligibility. The keyword with the highest probability in this topic is “hope”, which indicates that online learners have high expectations for course improvement and that their comments take the form of reasoned encouragement rather than stubborn criticism. Given words such as “fail to grasp”, “inefficient” and “without”, together with keywords such as “voice” and “understand”, this topic can be classified as learners' evaluation of the clarity of the teacher's language. Learners attribute learning difficulties such as not understanding or not following the material mainly to the teacher's lack of clarity in explanation. They hope that the teacher will use examples, connect the content with reality, and explain it in depth and in an accessible way.

Topic 2 focuses on the similarity of teaching language. The topic contains keywords such as “mechanical”, “explain” and “read”. The main complaint is that the teacher's teaching and explanation methods are too formulaic, with problems such as mechanical delivery and reading directly from PPT slides in online teaching. Mechanical teaching ignores the active role of teachers and students in “creating” course events. It is a common failing of traditional classroom teaching and easily causes dissatisfaction among learners in English ideological and political courses.

Topic 3 focuses on speech speed. The main concern is the “speed” of the teacher's speech and the “progress” of the teaching. Learners believe that the teaching pace is unreasonable (too fast or too slow), so that they “fail to catch up” with the “pace” of the course. They also mention problems such as non-standard English and reading directly from the courseware. These are all aspects that teachers need to improve.

An examination of these three topics shows that learners in the negative evaluation group were trying to explain the possible reasons for their disapproval of the teacher's teaching. For example, unclear language in the teacher's explanations, mechanical reading of the lesson, and an inappropriate lecture speed may all be reasons for their negative evaluation. A check against the course comments posted by learners also showed that the content of these comments was highly consistent with the results mined automatically by the LDA model.

Table 1. Positive evaluation group topic-word matrix

| Topic 1 (0.315) | Topic 2 (0.303) | Topic 3 (0.128) | Topic 4 (0.128) | Topic 5 (0.128) |
| --- | --- | --- | --- | --- |
| case 0.109 | lecture 0.159 | obtain 0.223 | teach 0.351 | explain 0.342 |
| logic 0.069 | elevate 0.100 | know 0.177 | knowledge 0.087 | comprehend 0.088 |
| enhance 0.068 | interesting 0.099 | help 0.102 | understand 0.058 | in place 0.071 |
| means 0.060 | appealing 0.073 | professor 0.074 | organized 0.058 | teaching means 0.043 |
| fit 0.051 | means 0.065 | remarkable 0.062 | language 0.048 | speed 0.043 |
| question driven 0.049 | learned 0.063 | deserve 0.049 | meaning 0.043 | knowledge point 0.042 |
| inspiring 0.048 | form 0.057 | actual 0.047 | logical 0.038 | theory 0.037 |
| key point 0.047 | benefit 0.057 | time 0.042 | profound explanation 0.034 | interaction 0.036 |
| provide 0.042 | instruct 0.052 | practical 0.023 | explicitly 0.032 | combine 0.031 |
| structure 0.040 | humor 0.034 | rich 0.021 | clear 0.022 | give 0.031 |

Table 2. Negative evaluation group topic-word matrix

| Topic 1 (0.431) | Topic 2 (0.371) | Topic 3 (0.198) |
| --- | --- | --- |
| hope 0.047 | mechanical 0.027 | lecture 0.031 |
| example 0.025 | explain 0.025 | speed 0.026 |
| voice 0.017 | recite 0.022 | waiting 0.022 |
| explain 0.017 | means 0.021 | material 0.019 |
| fail to grasp 0.017 | without 0.020 | incapable 0.019 |
| dig in 0.017 | read 0.015 | place 0.018 |
| understand 0.016 | speak 0.015 | progress 0.017 |
| inefficient 0.015 | part 0.012 | enhance 0.016 |
| abstract 0.014 | lecture 0.012 | fail to catch up 0.015 |
| without 0.011 | question 0.011 | pace 0.012 |

Conclusion

In this paper, the topic text of an English course is preprocessed and the LDA topic model is then used to mine its ideological and political topics. In the mining results, ideological and political topics such as “patriotic education”, “civic consciousness”, “advanced socialist culture” and “cultural industry” show a clear upward trend, indicating that students pay increasing attention to ideological and political content related to culture and education. In the sentiment analysis of high-frequency vocabulary, the word “explain” appears most frequently in both the positive and the negative evaluation groups, 8,100 times and 74 times respectively, which suggests that the key to designing English ideological and political teaching lies in optimizing the explanation strategy. The LDA mining of the evaluation texts also shows, through the negative topic words, that students are dissatisfied with the clarity of language, the similarity of teaching language and the teaching rhythm, so the instructional design should be improved with a focus on these aspects.

Language:
English
Publication frequency:
1 issue per year
Journal subjects:
Biology, Biology (other), Mathematics, Applied Mathematics, Mathematics (general), Physics, Physics (other)