
Recognizing Metaphorical Expressions in Chinese Speech and Their Natural Language Processing Strategies

  
19 March 2025


Introduction

As one of the most widely spoken languages in the world, Chinese carries a profound cultural background and rich historical connotations. Compared with English, Chinese has unique and complex linguistic characteristics: high inter-lexical dependency, flexible word order, and rich contextual expression. Coupled with the polysemy and multi-level grammatical structure of Chinese, these properties make natural language processing research on Chinese challenging [1].

In natural language processing research, metaphor, as a universal form of human expression, is an important topic that cannot be avoided and must be addressed. Metaphor is an indispensable part of human language; it is not only a rhetorical device but also a means by which people understand and think. Its essence is to understand one thing with the help of another, making it a powerful tool for human cognition of the world [2,3]. The application of metaphor in natural language is mainly reflected in machine translation, information retrieval, and affective computing. There is an obvious difference between the metaphorical and literal meanings of words in translation, so adding a metaphor module to machine translation enables a certain degree of free, sense-for-sense translation, which can greatly improve translation quality [4,5]. Information retrieval is the technology of finding relevant information, according to the searcher's needs, from organized information resources; introducing a metaphor module allows large-scale search engines to better process and understand the search content, significantly improving search accuracy and user satisfaction [6,7]. Emotional metaphor is a kind of metaphorical expression targeting human emotions. People often use metaphors to express emotional states and behaviors in daily life, and computing the emotional tendencies contained in emotional metaphors plays an important auxiliary role in sentiment analysis [8,9].

Many metaphorical expressions contain specific linguistic markers, and these markers can serve as the most direct clues for identifying metaphorical expressions. Zhang, D. et al. proposed MultiMET, a multimodal metaphor dataset containing text-image pairs together with metaphor-related multimodal annotations, which provides multimodal clues and their interactions for automatic metaphor comprehension in natural language processing [10]. Reimann, S. et al. presented the first fine-grained metaphor-annotated dataset of online religious communication texts. Since religious communication tends to extend metaphorical comparisons, adding a small amount of genre data to the training set for cross-genre transfer metaphor detection can improve the recognition performance of the metaphor detection model [11]. Ovando-Becerril, E. et al. designed a metaphor recognition classifier for corpus text, introduced LSTM units into three different pre-trained language models, and compared the classification results they produced to assess the metaphor recognition performance of the proposed models [12]. Chen, G. et al. proposed a metaphorical relation extraction model that can use text units more flexibly to obtain the target segment and source segment in language, removing the need for lexical classification in metaphor recognition; metaphor relation extraction also facilitates connecting linguistic and conceptual metaphors [13].

Semantic knowledge-based metaphor recognition methods consider that metaphors arise when the collocation of phrases in a sentence, taken literally, violates cognitive laws; that is, an expression that produces a semantic conflict can be judged to be metaphorical. Reijnierse, W. G. et al. found that deliberate and non-deliberate metaphors differ at the semiotic level in whether the source domain has a distinctive referent, and on this basis developed a semantics-based deliberate metaphor identification procedure (DMIP), which has good reliability as a tool for deliberate metaphor analysis [14]. Steen, G. elaborated the main principles of deliberate metaphor theory and further explored the distinction between deliberate and non-deliberate metaphors in semantic context, building on the core concepts of linguistic metaphor research [15]. Katz, A. N. emphasized that the position of a sentence within its context helps deepen understanding of the terminological concepts in the sentence, suggesting that contextual drivers can serve as heuristics that constrain the interpretation of metaphorical or ironic utterances [16]. Su, C. et al. automated noun metaphor recognition using distributed semantic word embeddings and a semantic relevance computation method, accomplishing the recognition task by computing the relevance of word vectors in the source and target domains, and proposed a metaphor interpretation method based on dynamic attribute transfer [17].

With the arrival and development of the big data era, introducing statistical machine learning methods into metaphor recognition has become a new research direction. Mao, R. et al. used a multi-task learning framework with tower-bridging gating mechanisms for the sentiment analysis and sequential metaphor recognition tasks; since information exchange between the task-specific towers yields additional benefits, the proposed method obtains better performance on both tasks [18]. Shou, X. et al. constructed an adversarial generative model, GMAI, guided by conceptual metaphor theory: its metaphor generation module identifies or modifies the information labels in language to generate domain-specific metaphorical expressions, while its metaphor interpretation module is trained on unlabeled, state-enriched text and achieves high interpretation accuracy [19]. Yang, Q. et al. established a deep learning model that fuses hierarchical feature representations and semantic interactions, introducing a co-attention mechanism to integrate word-level and sentence-level hierarchical features so that the fused vectors are more complementary and complete, providing a more natural feature extraction approach for metaphor detection [20]. Mao, R. et al. combined linguistic theories of metaphor recognition with a standard sequence labeling model, training end-to-end sequential metaphor recognition with deep neural networks to obtain excellent metaphor recognition performance [21].

In this study, the main steps of natural language processing are analyzed. The text is preprocessed using techniques such as word segmentation and de-duplication before extracting Chinese textual features with the TF-IDF algorithm. The Bi-LSTM network is used as the foundation for both a grammatical-structure metaphor recognition model and a word-meaning metaphor recognition model; combining the two yields the recognition model designed in this paper. The model's performance in metaphor recognition in Chinese speech is assessed using recognition accuracy and other indicators, and the natural language processing strategy for recognizing metaphors in Chinese speech is explored.

Key techniques for recognizing metaphors in Chinese speech
Natural Language Processing

The process of natural language processing [22] is to first study a language model applicable to natural language, then build a framework on the computer to realize this model, then propose improvements so that the model is continuously refined, and finally apply the model to a variety of practical systems while exploring evaluation techniques for those systems. Computer analysis and understanding of language is usually a hierarchical process, carried out at several levels, such as phonological analysis, semantic analysis, and pragmatic analysis.

Figure 1 depicts the fundamental model of natural language understanding, and the primary steps involved in natural language processing comprise:

Lexical processing: separating the text into words and phrases.

Lexical analysis: after word segmentation, mark the part of speech of each word, that is, determine its word class, including nouns, verbs, adjectives, adverbs, prepositions, and so on.

Grammatical analysis: analyze the grammatical components of the sentence.

Semantic analysis: enabling the computer to understand the meaning of natural language. A brief segmentation and part-of-speech tagging sketch for the first two steps is given after Figure 1.

Figure 1.

Basic model diagram of natural language understanding
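To make the lexical processing and lexical analysis steps concrete, the brief sketch below segments a Chinese sentence and tags each word's part of speech. The jieba toolkit used here is an illustrative assumption; the paper does not prescribe a particular segmenter or tagger.

```python
# Minimal sketch of lexical processing (word segmentation) and lexical analysis
# (part-of-speech tagging) for Chinese text; jieba is assumed only for illustration.
import jieba.posseg as pseg

sentence = "时间就是金钱。"            # "Time is money." - a simple metaphorical sentence
for word, pos in pseg.lcut(sentence):
    print(word, pos)                  # prints each word with its POS tag (n, v, d, ...)
```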

Text Processing and Feature Extraction
Text pre-processing

Before natural language processing analysis, the text needs to be converted from its natural written form into a specified canonical format in order to reduce the number of steps the algorithm needs to process the information. The main elements of preprocessing are as follows:

Segmentation. The segmentation operation splits the textual data stream into separate words, phrases, marks, or other basic lexical units that possess complete semantics and serve as inputs.

Stop-word removal. Stop words are commonly handled by constructing a stop-word list; if a word at the current position matches an entry in the list while the text is being scanned, the word is deleted and is not used as input to the algorithm or model.

Noise removal. Natural text contains many characters with special meanings; these symbols need to be removed during preprocessing to prevent interference with the analysis.

Spelling check and stemming. The text is passed through the nltk toolkit to correct spelling errors and reduce words to their stems; a brief sketch of the noise-removal and stop-word steps is given below.
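As a minimal illustration of the noise-removal and stop-word steps above, the following sketch filters an already segmented token sequence; the stop-word list and noise pattern are assumptions chosen for the example.

```python
# Illustrative preprocessing sketch: strip special symbols, then drop stop words.
import re

STOPWORDS = {"的", "了", "是", "在", "和"}                  # illustrative stop-word list
NOISE = re.compile(r"[^\u4e00-\u9fa5A-Za-z0-9]")            # keep Chinese, letters, digits

def preprocess(tokens: list[str]) -> list[str]:
    cleaned = [NOISE.sub("", t) for t in tokens]             # remove noisy characters
    return [t for t in cleaned if t and t not in STOPWORDS]  # drop empties and stop words

print(preprocess(["他", "的", "心", "是", "一", "块", "冰", "。"]))
# -> ['他', '心', '一', '块', '冰']
```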

TF-IDF feature extraction algorithm

Inverse Document Frequency (IDF) measures how rarely a word appears across the documents of a corpus: the more unique a word is in the corpus, the higher its IDF value. It is often combined with Term Frequency (TF) to form the TF-IDF technique [23], which is used to calculate a score for each word in a document. The formula is as follows: $$W(d,t) = TF(d,t) \cdot \log\!\left(\frac{N}{df(t)}\right) \tag{1}$$

Where $d$ denotes the document, $t$ the target word, $TF(d,t)$ the frequency of occurrence of $t$ in document $d$, $N$ the number of documents in the corpus, and $df(t)$ the number of documents containing $t$; the logarithmic factor constitutes the IDF of the word. In practical use, a pre-trained IDF is directly utilized to compute the TF-IDF score of each word in the text.
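As a minimal sketch, the TF-IDF score of Eq. (1) can be computed directly from its definition; the tokenized example documents below are illustrative.

```python
import math
from collections import Counter

def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Compute W(d, t) = TF(d, t) * log(N / df(t)) for every word of every document."""
    N = len(documents)
    df = Counter()                                  # df(t): documents containing word t
    for doc in documents:
        df.update(set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)                           # TF(d, t): raw term frequency
        scores.append({t: tf[t] * math.log(N / df[t]) for t in tf})
    return scores

docs = [["时间", "就是", "金钱"], ["时间", "是", "海绵", "里", "的", "水"]]
print(tf_idf(docs))
```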

Bi-LSTM network modeling

RNN (recurrent neural network) is a type of neural network for processing time-series data. It takes as input a sequence of vectors $X = (x_1, x_2, \ldots, x_n)$, where $x_t$ is the feature representation of time step $t$ in the sequence, and in turn the RNN returns another sequence of vectors $h = (h_1, h_2, \ldots, h_n)$, where $h_t$ incorporates information from time step $t-1$ and earlier inputs, as shown in equation (2): $$h_t = \sigma(W_{xh} x_t + W_{hh} h_{t-1} + b) \tag{2}$$ where $\sigma(\cdot)$ is the activation function, $W_{xh}$ and $W_{hh}$ are the parameter matrices that apply linear transformations to $x_t$ and $h_{t-1}$, respectively, and $b$ is the bias vector.

This structure causes problems such as vanishing or exploding gradients, which make it difficult for an ordinary RNN to convey information across long distances. In contrast, the LSTM uses memory cells capable of capturing long-distance dependencies and uses several gates to control the flow of information. The three gate structures are formulated as follows: $$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \tag{3}$$ $$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \tag{4}$$ $$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \tag{5}$$

Where $W_{xi}$ and $W_{hi}$ are the parameter matrices for linear transformations of $x_t$ and $h_{t-1}$, respectively, and $b_i$ is the bias vector (the other gates are parameterized analogously). Here $i_t$ is called the input gate, $f_t$ the forget gate, and $o_t$ the output gate. Their roles can be seen from the following formula: $$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{6}$$ where $\tanh(\cdot)$ is an activation function, $W_{xc}$ and $W_{hc}$ are the parameter matrices for linear transformations of $x_t$ and $h_{t-1}$, respectively, $b_c$ is the bias vector, and $\odot$ denotes elementwise multiplication.

Equation (6) calculates $c_t$, the memory cell at the core of the LSTM design. Its information consists of two parts: one part comes from the memory cell $c_{t-1}$ of the previous time step, with the forget gate $f_t$ controlling how much of this "memory" is retained; the other part comes from new information, with the input gate $i_t$ controlling how much new information is admitted. Finally, as shown in equation (7), the output $h_t$ of the current time step is calculated, with the output gate $o_t$ controlling how much information from the memory cell is retained as the output at time step $t$: $$h_t = o_t \odot \tanh(c_t) \tag{7}$$
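The following numpy sketch carries out one LSTM time step exactly as written in Eqs. (3)-(7); the parameter dictionary layout is an assumption made for readability.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Eqs. (3)-(7); p holds the W_x*, W_h*, b_* parameters."""
    i = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])      # Eq. (3): input gate
    f = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])      # Eq. (4): forget gate
    o = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["bo"])      # Eq. (5): output gate
    c = f * c_prev + i * np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])  # Eq. (6)
    h = o * np.tanh(c)                                             # Eq. (7)
    return h, c
```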

For the input vector sequence $X$, which represents a complete input sentence in the metaphor recognition task, $x_t \in \mathbb{R}^d$ denotes the $d$-dimensional embedding vector of the word (or punctuation mark) at position $t$, and $n$ is the length of the sentence. From the above formulation it can be seen that, because of the recursive computation, the model's output $h_t$ at each time step covers only the information of the sentence to the left of the current word.

Therefore, BiLSTM denotes the output produced by reading the input vector sequence in the forward direction as $\overrightarrow{h_t}$ and adds an additional output $\overleftarrow{h_t}$ produced by reading the input vector sequence in the reverse direction. By using two LSTMs with the same network structure and changing only the order in which they read the input vector sequence, a forward LSTM and a backward LSTM are obtained, and their outputs are finally combined by concatenation to form the BiLSTM [24,25], as shown in Eq. (8): $$h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}] \tag{8}$$

As a result, the model's output representation $h_t$ for time step $t$ covers both the preceding and following context. Further, the vector $p_t = (p_{t1}, p_{t2}, \ldots, p_{tK})$ of predicted probability values for each label of the $t$-th word can be obtained from $h_t$, giving mutually independent predictions for each word: $$p_t = \mathrm{softmax}(W_{clf} h_t + b_{clf}) \tag{9}$$

Where $K$ is the number of labels, $W_{clf}$ and $b_{clf}$ are the parameter matrix and bias vector of the fully connected classifier, respectively, and the $\mathrm{softmax}(\cdot)$ activation function is employed for multi-class classification. Further, the cross-entropy loss function is calculated as follows: $$J = -\sum_{t=1}^{n} \sum_{k=1}^{K} \mathrm{label}_{tk} \log(p_{tk}) \tag{10}$$ where $\mathrm{label}_{tk} = 1$ when the true label of the $t$-th word is $k$, and $\mathrm{label}_{tk} = 0$ otherwise. Finally, the model can be optimized by gradient backpropagation.
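A compact PyTorch sketch of the BiLSTM sequence labeller of Eqs. (8)-(10) is given below. The dimensions and random inputs are illustrative, and nn.CrossEntropyLoss is used because it applies the softmax of Eq. (9) and the cross-entropy of Eq. (10) internally.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Bidirectional LSTM with a per-token classifier (Eqs. (8)-(9))."""
    def __init__(self, embed_dim: int, hidden_dim: int, num_labels: int):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.clf = nn.Linear(2 * hidden_dim, num_labels)   # W_clf, b_clf

    def forward(self, x):                  # x: (batch, seq_len, embed_dim)
        h, _ = self.bilstm(x)              # h_t = [forward h_t ; backward h_t]
        return self.clf(h)                 # unnormalized label scores per token

# Training step with the cross-entropy loss of Eq. (10); shapes and data are illustrative.
model = BiLSTMTagger(embed_dim=128, hidden_dim=64, num_labels=2)
x = torch.randn(4, 20, 128)                # 4 sentences of 20 token embeddings
labels = torch.randint(0, 2, (4, 20))
loss = nn.CrossEntropyLoss()(model(x).reshape(-1, 2), labels.reshape(-1))
loss.backward()                            # gradient backpropagation
```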

Metaphor Recognition Strategies Based on Grammatical Structure and Word Meanings
Metaphor Recognition Methods Based on Grammatical Structures
Semantic representation of syntactic structures

In this paper, we improve the basic cosine similarity formula and propose a non-fixed-weight cosine similarity for the semantic computation of the word vectors constituting a syntactic relation. The calculation is given by the following equations: $$z_t = \tanh(W_z v_{k_t} + b_z), \quad t \in \{i, j\} \tag{11}$$ $$\mathrm{norm}_t = \mathrm{normalize}(z_t) \tag{12}$$ $$m = \mathrm{norm}_i * \mathrm{norm}_j \tag{13}$$ $$d = \tanh(W_d m) \tag{14}$$

In Eq. (11), the word vectors $v_{k_i}$ and $v_{k_j}$ are fed into a one-layer multilayer perceptron (MLP) to obtain the vectors $z_i$ and $z_j$. In Eq. (12), $z_i$ and $z_j$ are normalized to unit length to obtain $\mathrm{norm}_i$ and $\mathrm{norm}_j$. In Eq. (13), the corresponding positions of $\mathrm{norm}_i$ and $\mathrm{norm}_j$ are multiplied elementwise to obtain $m$. In Eq. (14), $W_d$ serves as a non-fixed weight matrix applied to $m$, and the resulting vector $d$ is used as the vector representation of the syntactic structure $(w_i, w_j)$. Here $W_z$, $b_z$, and $W_d$ are randomly initialized and updated during training.
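A possible PyTorch rendering of Eqs. (11)-(14) is sketched below; the layer sizes and class name are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonFixedWeightCosine(nn.Module):
    """Semantic representation of a syntactic pair (w_i, w_j), following Eqs. (11)-(14)."""
    def __init__(self, embed_dim: int, out_dim: int):
        super().__init__()
        self.mlp = nn.Linear(embed_dim, out_dim)           # W_z, b_z
        self.Wd = nn.Linear(out_dim, out_dim, bias=False)  # non-fixed weight matrix W_d

    def forward(self, v_i, v_j):
        z_i = torch.tanh(self.mlp(v_i))                    # Eq. (11)
        z_j = torch.tanh(self.mlp(v_j))
        norm_i = F.normalize(z_i, dim=-1)                  # Eq. (12): unit-length vectors
        norm_j = F.normalize(z_j, dim=-1)
        m = norm_i * norm_j                                # Eq. (13): elementwise product
        return torch.tanh(self.Wd(m))                      # Eq. (14): structure vector d
```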

Layer Attention Mechanisms

The network model structure of the hierarchical attention network (HAN) is shown in Figure 2:

Figure 2.

Hierarchical attention network model

The document contains a total of $L$ sentences, $\{s_1, s_2, \ldots, s_L\}$, and sentence $s_2$, for example, contains a total of $T$ words $\{w_{21}, w_{22}, \ldots, w_{2T}\}$, where $\alpha_{2i}$ $(i \in \{1, 2, \ldots, T\})$ denotes the attention weight of each word vector in sentence 2, and $\alpha_j$ $(j \in \{1, 2, \ldots, L\})$ denotes the attention weight of each sentence vector in the document.

A model for recognizing grammatical structure metaphors

The metaphor recognition model based on syntactic structure proposed in this section is shown in Fig. 3.

Input layer: the text is processed by word segmentation and syntactic analysis to obtain the word sequence $\{w_1, w_2, \ldots, w_i, \ldots, w_j, \ldots, w_n\}$ and the set of syntactic structures $\{(w_1, w_2), \ldots, (w_i, w_j), \ldots, (w_l, w_n)\}$.

Word embedding layer: obtain word embeddings of the words from the syntactic structure and the sentence context, and splice them to form the word vectors.

Semantic representation layer: semantic computation of the words constituting the syntactic relation by cosine similarity with non-fixed weights, so as to obtain the semantic representation of the syntactic structure.

Syntactic structure-level attention layer: obtain the attention weights of syntactic structures. A context vector $u_p$ is introduced for syntactic structures and used to measure the importance of different syntactic structures in sentence metaphor recognition, yielding the syntactic structure-level attention weights $\beta_k$.

Sentence vector representation layer: obtain the sentence vector. Based on the weights $\beta_k$, the syntactic structure vectors $p_k$ are weighted and summed to obtain the sentence vector representation $v_p$.

Output layer: classify the metaphorical nature of the sentence. The sentence vector $v_p$ is input into the classification function, and the output of the classification function is the probability that the sentence is metaphorical: $$y = \sigma(W v_p) \tag{15}$$

Figure 3.

MI_SS network model diagram

In Eq. (15), $W$ is the weight matrix of the fully connected layer, $\sigma$ is the activation function of the fully connected layer, i.e., the classification function, and $y$ is the probability that the sentence is metaphorical, a value between 0 and 1.
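The sketch below illustrates the syntactic structure-level attention, the weighted sum that produces $v_p$, and the sigmoid output of Eq. (15). The tanh-projection scoring follows the HAN-style attention described above; the exact parameterization and class name are assumptions.

```python
import torch
import torch.nn as nn

class StructureAttentionClassifier(nn.Module):
    """Attention over syntactic-structure vectors p_k followed by the output of Eq. (15)."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.u_p = nn.Parameter(torch.randn(dim))   # context vector u_p
        self.out = nn.Linear(dim, 1)                # W in Eq. (15)

    def forward(self, p):                           # p: (num_structures, dim)
        scores = torch.tanh(self.proj(p)) @ self.u_p
        beta = torch.softmax(scores, dim=0)         # structure-level weights beta_k
        v_p = (beta.unsqueeze(-1) * p).sum(dim=0)   # weighted sum -> sentence vector v_p
        return torch.sigmoid(self.out(v_p))         # probability y that the sentence is metaphorical
```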

Metaphor Recognition Model Based on Word Meaning
Network model diagram

The metaphor recognition model MI_WS based on word meaning is shown in Fig. 4.

Input layer: the text is subjected to word segmentation to obtain the word sequence $\{w_1, w_2, \ldots, w_n\}$.

Word embedding layer: obtain the word embeddings $o_i$ and $b_i$ of the basic and contextual semantics of each word and splice them into $v_i$.

Semantic fusion layer: use the Bi-LSTM sequence model to further obtain the contextual semantics of each word in both directions, and splice them with the word's basic semantic vector $o_i$ to obtain the vector $ho_i$, which contains the contextual semantics centered on the word together with the word's basic semantics.

Word-level attention layer: introduce the word-level context vector $u_w$ to measure the importance of different words and obtain the word attention weights $\alpha_i$.

Sentence vector representation layer: obtain the sentence vector. Weight and sum the word vectors $ho_i$ based on the weights $\alpha_i$ to obtain the sentence vector representation $v_w$.

Output layer: classify the metaphorical nature of the sentence and output the result of metaphor recognition.

Figure 4.

MI_WS network model diagram

Word Embedding Layer

The word embedding layer is responsible for obtaining the basic semantics and contextual semantics of words, and in this paper, Word2Vec and BERT models are used to obtain the basic and contextual meanings of words, respectively.
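The sketch below illustrates one way to obtain and splice the two kinds of embeddings, using gensim for pre-trained Word2Vec vectors and the bert-base-chinese model from the transformers library. The vector file name is hypothetical, and the word/word-piece alignment is deliberately simplified (it assumes one BERT piece per segmented word), so this is only an illustrative approximation of the layer described here.

```python
import numpy as np
import torch
from gensim.models import KeyedVectors
from transformers import BertTokenizer, BertModel

w2v = KeyedVectors.load("word2vec_zh.kv")                   # hypothetical pre-trained vectors
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def embed(words: list[str]) -> torch.Tensor:
    # basic semantics o_i from Word2Vec (zero vector for out-of-vocabulary words)
    basic = np.stack([w2v[w] if w in w2v else np.zeros(w2v.vector_size) for w in words])
    # contextual semantics b_i from BERT; skip the [CLS] token, crude 1:1 alignment
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        ctx = bert(**enc).last_hidden_state[0, 1:len(words) + 1]
    return torch.cat([torch.tensor(basic, dtype=torch.float32), ctx], dim=-1)  # v_i = [o_i; b_i]
```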

Semantic Fusion Layer

The semantic fusion layer further fuses the basic semantics of words and contextual semantics. In MIP-guided metaphor recognition, it is necessary to obtain the semantic information of the current word in both forward and backward directions, and in this paper, Bi-LSTM is used in the semantic fusion layer to further obtain the contextual semantics of the word and splice it with the basic semantics obtained from the Word2Vec model used in the word embedding layer.

Word-level Attention Layers

In this paper, we introduce the word-level context vector $u_w$, which is equivalent to a fixed query asking "which word is most likely to express a metaphor" in the attention mechanism and is used to measure the importance of different words. The formulas of the word-level attention layer are as follows: $$u_i = \tanh(W_w ho_i + b_w) \tag{16}$$ $$\alpha_i = \mathrm{softmax}(u_i^{T} u_w) \tag{17}$$ $$v_w = \sum_{i=1}^{n} \alpha_i * ho_i \tag{18}$$

In Eq. (16), the vector $ho_i$ is input to a one-layer MLP network to obtain $u_i$. In Eq. (17), the similarity between $u_i$ and the word-level context vector $u_w$ is computed to measure the importance of the word and normalized to obtain the word attention weight $\alpha_i$. In Eq. (18), the weighted sum of the word vectors $ho_i$ is computed based on the weights $\alpha_i$ to obtain the sentence vector representation $v_w$. Here $W_w$, $b_w$, and $u_w$ are learned adaptively during training.
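A direct PyTorch rendering of Eqs. (16)-(18) might look as follows; parameter shapes and the class name are assumptions.

```python
import torch
import torch.nn as nn

class WordAttention(nn.Module):
    """Word-level attention of Eqs. (16)-(18): u_i, alpha_i and the sentence vector v_w."""
    def __init__(self, dim: int):
        super().__init__()
        self.Ww = nn.Linear(dim, dim)               # W_w, b_w
        self.u_w = nn.Parameter(torch.randn(dim))   # word-level context vector u_w

    def forward(self, ho):                          # ho: (seq_len, dim), fused word vectors
        u = torch.tanh(self.Ww(ho))                 # Eq. (16)
        alpha = torch.softmax(u @ self.u_w, dim=0)  # Eq. (17): attention weights alpha_i
        return (alpha.unsqueeze(-1) * ho).sum(0)    # Eq. (18): sentence vector v_w
```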

Metaphor Recognition Model Based on Grammatical Structure and Word Meaning

By comprehensively examining sentence information in both tree-structured and sequential form, a multilevel metaphor recognition model based on syntactic structure and word semantics is proposed. It consists of two modules, a word module and a syntactic structure module: the sentence vector $v_w$ obtained from word semantics and the sentence vector $v_p$ obtained from syntactic structure are spliced together to yield the multilevel sentence vector $v$ for the sentence. The metaphor recognition model based on syntactic structure and word semantics is shown in Fig. 5.

Syntactic structure module: based on SPV theory, the cosine similarity between the words constituting each syntactic structure of the sentence, under the tree structure, is used to measure the metaphoricity of that syntactic structure, and the attention module is combined with it to obtain the importance of different syntactic structures.

Word module: based on the MIP idea, the basic semantics and contextual semantics of words are considered comprehensively at the word level, and the Bi-LSTM model is combined with them to further obtain the positional information of the sentence's words in sequential form, so as to obtain the metaphorical information of the words.

Figure 5.

MI_SS+MI_WS metaphor recognition model
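The fusion step itself reduces to concatenating the two sentence vectors and classifying the result, as in the following sketch (dimensions and the class name are assumptions):

```python
import torch
import torch.nn as nn

class CombinedMetaphorClassifier(nn.Module):
    """Sketch of the fusion step: splice v_w (word module) and v_p (syntactic structure
    module) into the multilevel sentence vector v and classify it."""
    def __init__(self, word_dim: int, struct_dim: int):
        super().__init__()
        self.out = nn.Linear(word_dim + struct_dim, 1)

    def forward(self, v_w, v_p):
        v = torch.cat([v_w, v_p], dim=-1)        # multilevel sentence vector v
        return torch.sigmoid(self.out(v))        # probability that the sentence is metaphorical
```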

Experimental setup and analysis of results
Description of the experiment

Experimental environment

The experiments are conducted under Windows using the Java language. WordNet is accessed using the open-source tool jwnl14-rc2.

Evaluation index

Precision is the proportion of samples predicted to be positive that are actually positive.

The formula for calculating precision can be expressed as follows: $$P = \frac{TP}{TP + FP}$$

Accuracy is the ratio of the number of samples whose predicted value agrees with the actual value to the total number of samples. The formula for calculating accuracy is shown below: $$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$

Recall is the proportion of actual positive samples that are correctly predicted as positive. The formula for calculating recall is shown below: $$R = \frac{TP}{TP + FN}$$

The F-value is the weighted harmonic mean of precision and recall, and its formula is shown below: $$F = \frac{2PR}{P + R}$$
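For reference, the four indicators can be computed directly from the confusion-matrix counts; the counts in the usage example are made up for illustration.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Precision, accuracy, recall and F-value as defined above."""
    p = tp / (tp + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r)
    return {"P": p, "Acc": acc, "R": r, "F": f}

print(classification_metrics(tp=80, fp=10, tn=95, fn=15))
```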

Test corpus

Three knowledge bases were used for the experiments: the 2000 version of the HowNet thesaurus, the 2005 version of the synonym thesaurus, and WordNet 3.0. Some corrections were made to each of the three knowledge bases to fix spelling errors.

Evaluation of metaphor recognition effect of MI_SS+MI_WS models
Layer Attention Mechanism Assessment Results

In this section, we compare the recognition performance under different numbers of attention layers and evaluate the effect of the attention mechanism on the recognition results.

In the experiments of evaluating the number of attention layers, we set the number of layers of the attention mechanism to be 1, 2, 3, 4, 5 and 6, respectively, and keep the other parameters of the model unchanged to explore the changes of the recognition results under different layers. Figure 6 demonstrates the difference in recognition performance that this paper’s model has under different numbers of layers of attention mechanism.

Figure 6.

Recognition performance under different numbers of attention layers

According to the data in the figure, the metaphor recognition performance of this paper's model is best when the multi-layer attention mechanism uses 4 layers, with Precision, Recall and F1 of 94.32%, 95.03% and 93.36%, respectively. If only one attention layer is used, some key information far from the attended words may not be effectively extracted. However, recognition performance is not proportional to the number of layers. When the number of attention layers is 5 or 6, the recognition performance is lower than with 4 layers, and the model with 5 layers still outperforms the one with 6 layers. We believe that too many attention computations lead to over-abstraction of the hidden-layer features of the last layer, excessive compression of some features of the original sentence, and bias in the final recognition results.

Word Visualization and Recognition Accuracy Measurement

In order to clearly understand the influence of different parts of speech on metaphorical sentences, the distribution of the four parts of speech is displayed and analyzed visually; the distribution of the four parts of speech in the test corpus is shown in Figure 7.

Figure 7.

Distribution of the four parts of speech in the test corpus

The data in the figure show that verbs and nouns occur most frequently, together accounting for more than 85% of the four parts of speech, while adverbs and adjectives occur much less frequently, accounting for less than 15%. This is because verbs and nouns are the main constituents of sentences, while adverbs and adjectives modify verbs and nouns, and the results are in line with conventional expectations. As a result, verbs and nouns have a greater share in both literal and metaphorical sentences.

In addition, adverbs and adjectives accounted for about 9.21% and 4.28% of the four parts of speech in metaphorical sentences and about 8.83% and 3.74% in literal sentences; the probability of occurrence of adverbs and adjectives is thus greater in metaphorical sentences than in literal sentences, which suggests that adjectives and adverbs are used more often to modify nouns and verbs in metaphorical sentences.

In order to verify how well the TF-IDF feature extraction algorithm fits the metaphor recognition model of this paper, this experiment uses the accuracy, recall, and F1 value of metaphor recognition as the evaluation criteria and analyzes the effect of extracting features with the TF-IDF technique. Five each of verbs, nouns, adverbs, and adjectives with metaphorical meanings are selected from the corpus, and classification and recognition experiments are conducted on them. The results of metaphorical word classification and recognition based on TF-IDF feature extraction are shown in Figure 8.

Figure 8.

Metaphorical classification recognition results

As can be seen from the figure, with the TF-IDF feature extraction algorithm the model in this paper achieves an accuracy between 78% and 85% in classifying and recognizing metaphorical words. This may be because the TF-IDF algorithm performs simple statistical counting of how often Chinese words occur in a sentence and cannot directly represent the semantic relationships between words. However, the algorithm shows better results in recall and F-value, with measured values above 90%. This confirms that the TF-IDF feature extraction algorithm is well matched to the metaphor recognition model proposed in this paper.

Evaluation of the Effectiveness of Metaphor Recognition Based on Emotion Recognition

This section compares and analyzes the recognition results of the metaphor recognition model based on grammatical structure and word semantics for each of the emotion categories "happiness", "sadness", "fear", "anger", "surprise" and "disgust", evaluated using the indicators above. Figure 9 shows the recognition results for the individual emotion categories.

Figure 9.

Recognition results for individual emotion categories

As can be seen from Figure 9, the model in this paper has a relatively poor recognition effect on the "surprise" emotion category, with the three evaluation indicators ranging from 46.02% to 61.29%, which may be due to the imbalanced distribution of emotion categories in the test corpus. However, the model shows better recognition performance on the remaining five emotions. Among them, the "anger" emotion category is recognized best, with F1, Recall, and Precision of 88.73%, 82.13%, and 83.29%, respectively, which demonstrates the effectiveness of the metaphor recognition model based on grammatical structures and word meanings for the task of recognizing emotions in Chinese speech.

Comparative Analysis of Chinese Speech Metaphor Recognition Performance

This section compares and analyzes the recognition performance of different methods for recognizing metaphorical expressions in the test corpus. The comparison methods include textual cue-based metaphor recognition methods, semantic-based metaphor recognition methods, syntactic structure-based metaphor recognition methods, and statistical-based metaphor recognition methods, totaling four groups of comparison methods. The commonly used Accuracy (Acc), Precision (P), Recall (R), F-value, and AUC value, which are denoted as Indicators 1-5, are used as evaluation indexes of the metaphor recognition model in this paper. The performance test results of multiple methods in the test corpus are shown in Figure 10.

Figure 10.

Performance test results of various methods in the test corpus

Comparing the experimental data in the figure, it can be concluded that this paper's metaphor recognition model based on grammatical structure and word meaning improves on the current mainstream metaphor recognition models to varying degrees in Acc, P, R, F, and AUC, with observed values of 0.958, 0.923, 0.964, 0.935, and 0.94, respectively. In particular, this paper's model improves the F1 value by 12.92%-26.18% over the other comparison methods on the Chinese speech metaphor recognition task. This effectively demonstrates the effectiveness of the metaphor recognition model based on grammatical structure and word meaning constructed in this study compared with other models.

Conclusion

In this study, a model for metaphor recognition based on grammatical structure and word meaning is constructed using a variety of key techniques for metaphor recognition. The model incorporates a layer attention mechanism to enhance metaphor recognition in Chinese speech. Several experiments are conducted to demonstrate the metaphor recognition performance of the proposed model on Chinese speech.

The MI_SS+MI_WS metaphor recognition model works best when the number of layers of the attention mechanism is taken as 4.

The recognition accuracy of this paper’s model for lexical categorization of metaphorical words is 78% to 85%, while the measured values on recall and F-value are above 90%.

The F1, Recall, and Precision of the MI_SS+MI_WS metaphor recognition model in recognizing the metaphorical expression of “anger” are 88.73%, 82.13%, and 83.29%, respectively, which proves that the model can effectively accomplish the task of emotion recognition in Chinese speech.

The Acc, P, R, F and AUC values of the proposed method in this paper are 0.958, 0.923, 0.964, 0.935 and 0.94 on the task of metaphor recognition in Chinese speech, which are significantly higher than those of the comparison methods.