Emotional analysis and semantic understanding of multimodal network language data 
Published online: 31 Mar 2025
Received: 05 Nov 2024
Accepted: 13 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0818
© 2025 Chen Weimiao, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Network language data takes diverse forms, including text, images, and audio. It is rich in information and also carries users' emotional attitudes and semantic intentions. This paper presents a model that incorporates the distinctive features of text, image, audio, and other modal data. The model comprises a multimodal emotion analysis module and a multimodal semantic understanding module, both of which use deep learning methods to improve the accuracy and robustness of natural language processing (NLP). For emotion analysis, the model combines the feature vectors of the different modalities through feature-level fusion and classifies emotions with a Transformer-based attention model. For semantic understanding, the model integrates deep text features extracted by BERT with deep image features extracted by ResNet, and performs cross-modal semantic reasoning with a Transformer model. Experimental results show that the multimodal fusion model outperforms the single-modal models in emotion analysis, achieving 85.2% accuracy, 86.7% recall, and an F1 score of 85.0%, which demonstrates the potential of multimodal data fusion to improve emotion analysis. In semantic tag recognition, the model attains a precision of 79.3%, a recall of 76.8%, and an F1 score of 78.0%, indicating that it both avoids misclassification and reliably identifies relevant semantic information. The study supports more intelligent and accurate services in social media monitoring, customer feedback analysis, intelligent customer service, and other applications, and offers insights for future work on multimodal information processing.
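To make the idea of feature-level fusion followed by attention more concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation: the function names (`concat_fusion`, `attention_pool`), the choice of concatenation as the fusion operator, and the single-query attention pooling are all assumptions made for illustration, and a real Transformer adds learned projections, multiple heads, and stacked layers on top of this.

```python
import math


def concat_fusion(text_vec, image_vec, audio_vec):
    """Feature-level fusion (assumed here to be plain concatenation):
    join the per-modality feature vectors into one fused vector."""
    return list(text_vec) + list(image_vec) + list(audio_vec)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query, modality_vecs):
    """Scaled dot-product attention of one query over modality vectors.

    All vectors must share the same dimension. Returns the attention
    weights (one per modality) and the weighted sum of the vectors.
    This is a hypothetical stand-in for one attention step, without
    the learned query/key/value projections a Transformer would use.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, vec)) / math.sqrt(dim)
        for vec in modality_vecs
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, modality_vecs))
        for i in range(dim)
    ]
    return weights, pooled


# Toy feature vectors (made-up numbers, not data from the paper).
fused = concat_fusion([1.0, 2.0], [3.0], [4.0, 5.0])
weights, pooled = attention_pool([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy run, `fused` is the five-element vector `[1.0, 2.0, 3.0, 4.0, 5.0]`, and the query aligned with the first modality vector gives that modality the larger attention weight. In a full model, the fused or pooled representation would then be passed to a classifier that predicts the emotion label.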
