Recognizing Metaphorical Expressions in Chinese Speech and Their Natural Language Processing Strategies 
Published online: 19 Mar 2025
Received: 09 Nov 2024
Accepted: 10 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0377
© 2025 Zhuo Wang, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
As one of the most widely spoken languages in the world, Chinese carries a profound cultural background and rich historical connotations. Compared with English, Chinese has distinctive and complex linguistic characteristics: strong dependencies between words, flexible word order, rich context-dependent expressions, widespread polysemy, and a multi-level grammatical structure. Together these make natural language processing for Chinese challenging [1].
Metaphor, as a universal form of human expression, is a topic that natural language processing research cannot avoid. Metaphor is an indispensable part of human language; it is not only a rhetorical device but also a means by which people understand and think. Its essence is to understand one thing in terms of another, which makes it a powerful tool for human cognition of the world [2–3]. In natural language processing, metaphor is mainly applied in machine translation, information retrieval, and affective computing. The metaphorical and literal meanings of a word differ markedly in translation, so adding a metaphor module to a machine translation system allows a degree of free (sense-for-sense) translation, which can greatly improve translation quality [4–5]. Information retrieval is the technology of finding relevant information in organized information resources according to the searcher’s needs; introducing a metaphor module lets large-scale search engines process and understand search content better, which improves search accuracy and user satisfaction [6–7]. Emotional metaphors are metaphorical expressions that target human emotions. People often use metaphors to express emotional states and behaviors, and computing the emotional tendency contained in such metaphors plays an important auxiliary role in sentiment analysis [8–9].
Many metaphorical expressions contain specific linguistic markers, and these markers are the most direct clues for identifying them. Zhang, D. et al. proposed MultiMET, a multimodal metaphor dataset that contains text-image pairs with multimodal metaphor annotations and provides multimodal cues, and the interactions among them, for automatic metaphor understanding in natural language processing [10]. Reimann, S. et al. presented the first dataset of online religious communication texts with fine-grained metaphor annotation. Because religious communication tends to extend metaphorical comparisons, adding a small amount of in-genre data to the training set for cross-genre transfer improves the performance of the metaphor detection model [11]. Ovando-Becerril, E. et al. designed a linguistic metaphor classifier for corpus text, introduced LSTM units to build three models based on different pre-trained language models, and compared their classification results to assess metaphor recognition performance [12]. Chen, G. et al. proposed a metaphorical relation extraction model that uses text spans flexibly to obtain the target and source segments in the language, removing the need for word-level classification in metaphor recognition; metaphor relation extraction also helps connect linguistic and conceptual metaphors [13].
Semantic knowledge-based metaphor recognition holds that a metaphor arises when the literal collocation of phrases in a sentence violates cognitive norms; that is, an expression that produces a semantic conflict can be judged metaphorical. Reijnierse, W. G. et al. found that deliberate and non-deliberate metaphors differ at the semiotic level in whether the source domain has a distinct referent, and on this basis developed the semantics-based Deliberate Metaphor Identification Procedure (DMIP), which is a reliable tool for analyzing deliberate metaphor [14]. Steen, G. elaborated the main principles of deliberate metaphor theory and, by setting out the core concepts in the study of linguistic metaphor, further explored the distinction between deliberate and non-deliberate metaphors in semantic context [15]. Katz, A. N. emphasized that the position of a sentence in its context supports a deeper understanding of the concepts in that sentence, suggesting that contextual cues can serve as heuristics that constrain the interpretation of metaphorical or ironic utterances [16]. Su, C. et al. automated the processing of noun metaphors using distributed word embeddings and semantic relatedness computation: they recognized metaphors by calculating the relatedness of word vectors in the source and target domains and proposed a metaphor interpretation method based on dynamic attribute transfer [17].
With the arrival of the big data era, statistical machine learning has become a new research direction in metaphor recognition. Mao, R. et al. applied a multi-task learning architecture with a gated bridging mechanism between task towers to sentiment analysis and sequential metaphor recognition; because information exchange between the task-specific towers brings additional benefit, the method performs better on both tasks [18]. Shou, X. et al. constructed GMAI, a generative adversarial model guided by conceptual metaphor theory, whose metaphor generation module identifies or modifies information labels in the language to generate domain-specific metaphorical expressions, and whose metaphor interpretation module is trained on unlabeled state-enriched text and achieves high interpretation accuracy [19]. Yang, Q. et al. established a deep learning model that fuses hierarchical feature representations with semantic interactions and introduced a co-attention mechanism to integrate word-level and sentence-level features, making the fused vectors more complementary and complete and providing a more natural approach to feature extraction for metaphor detection [20]. Mao, R. et al. combined linguistic theories of metaphor identification with a standard sequence labeling model and trained end-to-end sequential metaphor recognition with deep neural networks, obtaining excellent recognition performance [21].
This study analyzes the main steps of natural language processing. The text is preprocessed with techniques such as word segmentation and de-duplication, and the textual features of Chinese are then extracted with the TF-IDF algorithm. A Bi-LSTM network serves as the foundation for both the grammatical structure metaphor recognition model and the word meaning metaphor recognition model, and combining the two yields the recognition model designed in this paper. The model’s performance on metaphor recognition in Chinese speech is evaluated with recognition accuracy and other indicators, and natural language processing strategies for recognizing metaphors in Chinese speech are explored.
Natural language processing [22] proceeds as follows: first study a language model suited to natural language, then build a framework on the computer that realizes this model, then propose methods to improve the model continuously, and finally apply it to a variety of practical systems and explore techniques for evaluating those systems. Computer analysis and understanding of language is usually a hierarchical process whose levels include phonological analysis, semantic analysis, and pragmatic analysis.
Figure 1 depicts the basic model of natural language understanding. The primary steps of natural language processing are:
Lexical processing: segmenting the text into words and phrases.
Lexical analysis: tagging each segmented word with its part of speech, such as noun, verb, adjective, adverb, or preposition.
Grammatical analysis: analyzing the grammatical components of the sentence.
Semantic analysis: enabling the computer to understand the meaning of natural language.

Figure 1. Basic model of natural language understanding
Before natural language processing, the text must be converted from its natural written form into a specified canonical format, which reduces the work the algorithm has to do. The main preprocessing steps are as follows:
Segmentation. The segmentation operation splits the text stream into separate words, phrases, symbols, or other basic lexical units that carry complete meaning.
Stop-word removal. Stop words are usually handled by building a stop-word list; if a word matched at the current position during scanning is in the list, it is deleted and not passed to the algorithm or model.
Noise removal. Natural text contains many characters and symbols with special meanings, which are removed during preprocessing so that they do not interfere with the analysis.
Spelling check and stemming. The text is passed through the nltk toolkit, and spelling errors in the words are corrected.
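The first three preprocessing steps can be sketched as follows. This is a minimal illustration, not the pipeline used in the experiments: the dictionary and stop-word list are hypothetical toy resources, forward maximum matching stands in for whatever segmenter was actually used, noise removal is applied before segmentation for simplicity, and the spelling check and stemming step is omitted.

```python
import re

# Hypothetical toy resources; a real system would load a full
# segmentation dictionary and a full stop-word list.
DICTIONARY = {"时间", "是", "金钱", "就是", "我们", "的"}
STOP_WORDS = {"的", "是", "就是"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def remove_noise(text):
    """Drop everything except CJK ideographs, ASCII letters, and digits."""
    return re.sub(r"[^\u4e00-\u9fffA-Za-z0-9]", "", text)


def segment(text):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in DICTIONARY:
                tokens.append(candidate)
                i += size
                break
    return tokens


def remove_stop_words(tokens):
    """Filter out tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def preprocess(text):
    """Noise removal, then segmentation, then stop-word removal."""
    return remove_stop_words(segment(remove_noise(text)))
```

Calling `preprocess` on a short sentence returns the list of content tokens that would be passed on to feature extraction.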
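Inverse document frequency (IDF) measures how rare a word is across a corpus: the fewer documents a word appears in, the more distinctive it is and the higher its IDF value. It is usually combined with term frequency (TF) as the TF-IDF technique [23], which is used to calculate the score of each word in a document. In its standard form, the formula is as follows:
$$\mathrm{TFIDF}(w, d) = \mathrm{TF}(w, d) \times \log\frac{N}{\mathrm{DF}(w)}$$
Where $\mathrm{TF}(w, d)$ is the frequency of word $w$ in document $d$, $N$ is the total number of documents in the corpus, and $\mathrm{DF}(w)$ is the number of documents that contain $w$.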
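As a concrete illustration of TF-IDF scoring, the sketch below assumes the common textbook variant (length-normalized term frequency multiplied by an unsmoothed logarithmic IDF); the exact variant used in the experiments may differ. The function name `tf_idf` and the input format (one token list per document) are assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: the number of documents that contain each word.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            # TF is the relative frequency in this document;
            # IDF is log(N / DF), which is 0 for words found in every document.
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores
```

A word that occurs in every document receives a score of zero, while a word confined to a single document is weighted most heavily, which is the "uniqueness" property described above.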
A recurrent neural network (RNN) is a type of neural network for processing sequential data such as time series. It takes a sequence of vectors as input.
This structure suffers from vanishing or exploding gradients, which make it difficult for an ordinary RNN to carry information across long distances. In contrast, LSTM uses memory cells that can capture long-distance dependencies and several gates that control the flow of information. The three gate structures are formulated as follows:
Where 
Equation (6) calculates 
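For the input vector sequence, a forward LSTM reads the sequence in order and produces a hidden state at each position.
Therefore, BiLSTM takes the outputs produced by reading the input vector sequence in order and in reverse order as the forward and backward hidden states.
As a result, the model output representation at each position is the concatenation of the forward and backward hidden states.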
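For reference, the standard LSTM and BiLSTM formulation that this description builds on can be written as below. The notation is an assumption made here and may not match the paper's own symbols or equation numbering: $x_t$ is the input vector at step $t$, $h_t$ the hidden state, $c_t$ the memory cell, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and $W_\ast$, $b_\ast$ learned parameters.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i [h_{t-1}; x_t] + b_i\right) && \text{(input gate)} \\
f_t &= \sigma\left(W_f [h_{t-1}; x_t] + b_f\right) && \text{(forget gate)} \\
o_t &= \sigma\left(W_o [h_{t-1}; x_t] + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}; x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)} \\
h_t^{\mathrm{Bi}} &= [\overrightarrow{h}_t ; \overleftarrow{h}_t] && \text{(BiLSTM output)}
\end{aligned}
```

The last line expresses the bidirectional step: one LSTM reads the sequence forward, another reads it backward, and the two hidden states at each position are concatenated.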
In this paper, we improve the basic cosine similarity formula and propose a non-fixed weight cosine similarity formula for semantic computation of word vectors constituting syntactic relations. The calculation formula is given in the following equation:
Where, in Eq. (11), word vectors 
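As a point of reference for the basic formula being improved, the sketch below computes plain cosine similarity and one generic per-dimension weighted generalization. The weighted variant is only an illustration with hypothetical, non-negative weights; it is not the non-fixed-weight formula of Eq. (11), whose weighting scheme is not assumed here.

```python
import math


def cosine_similarity(u, v):
    """Basic cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def weighted_cosine_similarity(u, v, weights):
    """Generic weighted cosine: sum(w*a*b) / (sqrt(sum(w*a*a)) * sqrt(sum(w*b*b))).

    `weights` is a hypothetical list of non-negative per-dimension weights.
    Scaling each coordinate by sqrt(w) reduces the weighted form to the basic one.
    """
    scaled_u = [math.sqrt(w) * a for w, a in zip(weights, u)]
    scaled_v = [math.sqrt(w) * b for w, b in zip(weights, v)]
    return cosine_similarity(scaled_u, scaled_v)
```

With all weights equal to one, the weighted form reduces to the basic formula; unequal weights let some embedding dimensions count more than others when two words in a syntactic relation are compared.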
The structure of the hierarchical attention network (HAN) model is shown in Figure 2:

Figure 2. Hierarchical attention network model
The document contains a total of 
The metaphor recognition model based on syntactic structure proposed in this section, MI_SS, is shown in Fig. 3. Its layers are as follows:
Input layer: the text is processed by word segmentation and syntactic analysis to obtain the word sequence.
Word embedding layer: obtain the word embeddings of the words in the syntactic structure and in the sentence context, and concatenate them as the word vectors.
Semantic representation layer: compute the semantics of the words that form each syntactic relation with the non-fixed-weight cosine similarity, obtaining the semantic representation of the syntactic structure.
Syntactic structure-level attention layer: introduce a context vector to obtain the attention weight of each syntactic structure.
Sentence vector representation layer: obtain the sentence vector from the attention weights.
Output layer: take the sentence vector as input and classify whether the sentence is metaphorical.

Figure 3. MI_SS network model diagram
In Eq. (15), 
The metaphor recognition model MI_WS based on word meaning is shown in Fig. 4. Its layers are as follows:
Input layer: the text is segmented to obtain the word sequence.
Word embedding layer: obtain the word embeddings.
Semantic fusion layer: use the Bi-LSTM sequence model to further obtain the contextual semantics of each word in both directions, and concatenate them with the word’s basic semantic vector.
Word-level attention layer: introduce word-level context vectors to obtain the attention weight of each word.
Sentence vector representation layer: obtain the sentence vector as the weighted sum of the word vectors.
Output layer: classify whether the sentence is metaphorical and output the metaphor recognition result.

Figure 4. MI_WS network model diagram
The word embedding layer is responsible for obtaining the basic semantics and contextual semantics of words, and in this paper, Word2Vec and BERT models are used to obtain the basic and contextual meanings of words, respectively.
The semantic fusion layer further fuses the basic semantics and the contextual semantics of words. Metaphor recognition guided by the Metaphor Identification Procedure (MIP) requires the semantic information of the current word in both the forward and backward directions, so this paper uses Bi-LSTM in the semantic fusion layer to further obtain the contextual semantics of each word and concatenates it with the basic semantics obtained from the Word2Vec model in the word embedding layer.
In this paper, we introduce the word-level context vector 
In Eq. (16), vector 
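The role of a word-level context vector can be illustrated with the attention pooling used in hierarchical attention networks. The sketch below assumes that standard formulation (a tanh projection of each hidden state, a dot product with the context vector, a softmax over positions, and a weighted sum); the parameter names `W`, `b`, and `context` are hypothetical, and the paper's Eq. (16) may differ in detail.

```python
import math


def attention_pool(hidden_states, W, b, context):
    """HAN-style attention: return (attention weights, pooled sentence vector).

    hidden_states: list of word vectors (e.g. Bi-LSTM outputs)
    W, b:          hypothetical projection matrix (list of rows) and bias
    context:       hypothetical word-level context vector
    """
    scores = []
    for h in hidden_states:
        # u_i = tanh(W h_i + b)
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
        # Relevance of word i is the dot product of u_i with the context vector.
        scores.append(sum(a * c for a, c in zip(u, context)))
    # Softmax over positions, shifted by the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Sentence vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    sentence = [sum(a * h[k] for a, h in zip(alphas, hidden_states))
                for k in range(dim)]
    return alphas, sentence
```

Words whose projected representation aligns with the context vector receive larger weights and therefore contribute more to the sentence vector that is passed to the output layer.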
By examining sentences in both tree-structure and sequence form, a multilevel metaphor recognition model based on syntactic structure and word semantics is proposed. It consists of two modules, a word module and a syntactic structure module, each of which produces a sentence vector:
Syntactic structure module: based on the Selectional Preference Violation (SPV) theory, the cosine similarity between the words that form each syntactic structure of the sentence in the tree structure is used to measure how metaphorical that structure is, and an attention module is used to obtain the importance of the different syntactic structures.
Word module: based on the MIP idea, the basic and contextual semantics of words are considered together at the word level, and the Bi-LSTM model is used to further capture the positional information of the words in the sentence’s sequence form, yielding the metaphorical information of the words.

Figure 5. MI_SS+MI_WS metaphor recognition model
Experimental environment
The experiments were run on Windows and implemented in Java. WordNet is accessed with the open source tool jwnl14-rc2.
Evaluation index
Let TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively.
Precision is the percentage of samples predicted as positive that are actually positive. In its standard form, the formula for precision is:
$$P = \frac{TP}{TP + FP}$$
Accuracy is the ratio of the number of samples whose predicted value agrees with the actual value to the total number of samples. The formula for accuracy is shown below:
$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$
Recall is the percentage of actually positive samples that are predicted as positive. The formula for recall is shown below:
$$R = \frac{TP}{TP + FN}$$
The F-value is a weighted harmonic mean of precision and recall; with equal weights it is the F1 value, and the formula is shown below:
$$F_1 = \frac{2 \times P \times R}{P + R}$$
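The four indicators can be computed from predicted and actual labels as in the following sketch. It assumes binary labels with `1` marking a metaphorical sentence, uses the standard definitions of accuracy, precision, recall, and F1, and returns 0.0 where a denominator would be zero; the function name and label encoding are choices made for this example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall (equal weights).
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Because F1 is a harmonic mean, it always lies between precision and recall, which is a quick sanity check when reading reported results.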
Test corpus
Three knowledge bases were used in the experiment: the 2000 version of the HowNet thesaurus, the 2005 version of the synonym thesaurus, and WordNet 3.0. Some spelling errors in each of the three knowledge bases were corrected beforehand.
This section compares recognition performance under different numbers of attention layers and evaluates the effect of the attention mechanism on the recognition results.
To evaluate the number of attention layers, we set the number of layers of the attention mechanism to 1, 2, 3, 4, 5, and 6 in turn, keeping the other model parameters unchanged, and observe how the recognition results change. Figure 6 shows the recognition performance of this paper’s model under the different numbers of attention layers.

Figure 6. Recognition performance under different numbers of attention layers
The data in the figure show that the model achieves its best metaphor recognition performance with 4 attention layers, where Precision, Recall, and F1 are 94.32%, 95.03%, and 93.36%, respectively. With only one attention layer, some key information far from the attended words may not be extracted effectively. However, recognition performance does not grow in proportion to the number of layers: with 5 and 6 attention layers the model performs worse than with 4, and the 5-layer model performs better than the 6-layer model. We believe that too many attention computations over-abstract the hidden features produced by the last layer and heavily compress some features of the original sentence, which biases the final recognition results.
To understand clearly how different parts of speech influence metaphorical sentences, the distribution of the four parts of speech in the test corpus is visualized and analyzed; the results are shown in Figure 7.

Figure 7. Distribution of the four parts of speech in the test corpus
The data in the figure show that verbs and nouns occur most frequently, accounting for more than 85% of all content words among the four parts of speech, while adverbs and adjectives are much less frequent, accounting for less than 15%. This is because verbs and nouns are the main constituents of sentences, whereas adverbs and adjectives modify them, so the results agree with conventional expectations. As a result, verbs and nouns have the larger share in both literal and metaphorical sentences.
In addition, adverbs and adjectives account for about 9.21% and 4.28% of the four parts of speech in metaphorical sentences and about 8.83% and 3.74% in literal sentences. Adverbs and adjectives are therefore more likely to occur in metaphorical sentences than in literal ones, which suggests that metaphorical sentences more often use adjectives or adverbs to modify nouns and verbs.
To verify how well the TF-IDF feature extraction algorithm fits the metaphor recognition model of this paper, this experiment uses the accuracy, recall, and F1 value of metaphor recognition as evaluation criteria and analyzes the effect of extracting features with TF-IDF. Five verbs, five nouns, five adverbs, and five adjectives with metaphorical meanings are selected from the corpus for classification and recognition experiments. The results of metaphorical part-of-speech classification and recognition based on TF-IDF feature extraction are shown in Figure 8.

Figure 8. Metaphorical classification recognition results
As the figure shows, with TF-IDF feature extraction the model’s accuracy in metaphorical part-of-speech classification lies between 78% and 85%. This may be because TF-IDF only computes simple statistics on how often words occur in a sentence and cannot directly represent the semantic relationships between them. However, the algorithm performs better on recall and F-value, both of which are above 90%. This indicates that the TF-IDF feature extraction algorithm is compatible with the metaphor recognition model proposed in this paper.
This section presents a comparative analysis of how the metaphor recognition model based on grammatical structure and word semantics recognizes each of the emotion categories “happiness”, “sadness”, “fear”, “anger”, “surprise”, and “disgust”, evaluated with the indicators above. Figure 9 shows the recognition results for each individual emotion category.

Figure 9. Recognition results for individual emotion categories
As Figure 9 shows, the model recognizes the “surprise” category relatively poorly, with the three evaluation indexes ranging from 46.02% to 61.29%, which may be due to the imbalance of emotion categories in the test corpus. The model performs better on the remaining five emotions. The “anger” category is recognized best, with F1, Recall, and Precision of 88.73%, 82.13%, and 83.29%, respectively, which supports the effectiveness of the metaphor recognition model based on grammatical structure and word meaning for recognizing emotions in Chinese speech.
This section compares the recognition performance of different methods for recognizing metaphorical expressions in the test corpus. The comparison methods are textual cue-based, semantic-based, syntactic structure-based, and statistical metaphor recognition methods, four groups in total. The commonly used Accuracy (Acc), Precision (P), Recall (R), F-value, and AUC, denoted as Indicators 1-5, are used as the evaluation indexes of the metaphor recognition model in this paper. The performance test results of the methods on the test corpus are shown in Figure 10.

Figure 10. Performance test results of various methods in the test corpus
Comparing the experimental data in the figure shows that this paper’s metaphor recognition model based on grammatical structure and word meaning improves on the current mainstream metaphor recognition models to varying degrees in Acc, P, R, F-value, and AUC, with observed values of 0.958, 0.923, 0.964, 0.935, and 0.94, respectively. In particular, its F1 value on the metaphor recognition task in Chinese speech is 12.92%-26.18% higher than that of the comparison methods. This demonstrates the effectiveness of the metaphor recognition model constructed in this study relative to the other models.
This study constructs a metaphor recognition model based on grammatical structure and word meaning, drawing on a variety of key techniques for metaphor recognition. The model incorporates a multi-layer attention mechanism to enhance metaphor recognition in Chinese speech. Several experiments demonstrate the model’s metaphor recognition performance on Chinese speech.
The MI_SS+MI_WS metaphor recognition model works best when the number of layers of the attention mechanism is taken as 4.
The recognition accuracy of this paper’s model for lexical categorization of metaphorical words is 78% to 85%, while the measured values on recall and F-value are above 90%.
The F1, Recall, and Precision of the MI_SS+MI_WS metaphor recognition model in recognizing the metaphorical expression of “anger” are 88.73%, 82.13%, and 83.29%, respectively, which proves that the model can effectively accomplish the task of emotion recognition in Chinese speech.
The Acc, P, R, F and AUC values of the proposed method in this paper are 0.958, 0.923, 0.964, 0.935 and 0.94 on the task of metaphor recognition in Chinese speech, which are significantly higher than those of the comparison methods.
