Neural Computing & Applications
A multimodal dialogue system for improving user satisfaction via knowledge-enriched response and image recommendation

Abstract

Task-oriented multimodal dialogue systems have significant application value and development prospects. Existing methods have made substantial progress, but the following challenges remain: (1) Most existing methods focus on improving the accuracy of dialogue state tracking and dialogue act prediction, but ignore the need to leverage knowledge from the knowledge base to supplement textual responses in multi-turn dialogues. (2) A feature that distinguishes multimodal dialogue from plain-text dialogue is the use of visual information, yet existing methods overlook the importance of accurately providing visual information to improve user satisfaction. (3) Most existing multimodal dialogue methods ignore the classification of response types, which is needed to automatically assign the appropriate response generator. To address these issues, we present a user-satisfactory multimodal dialogue system, USMD for short. USMD consists of four modules. The general response generator, based on generative pre-training 2.0 (GPT-2), produces dialogue acts and general textual responses. The knowledge-enriched response generator leverages a structured knowledge base under the guidance of a knowledge graph. The image recommender attends to both latent and explicit visual cues, using a deep multimodal fusion model to obtain informative image representations. Finally, the response classifier automatically selects the appropriate generator to answer the user based on user and agent actions. Extensive experiments on benchmark multimodal dialogue datasets show that the proposed USMD model achieves state-of-the-art performance.
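The four-module pipeline the abstract describes can be pictured as a classifier-first routing architecture. The sketch below is a minimal, hypothetical illustration of that flow, not the authors' implementation: every class name, the `respond` helper, and act labels such as `ask_attribute` and `request_image` are assumed for exposition.

```python
# Hypothetical sketch of the USMD routing flow described in the abstract.
# Each stub stands in for one of the paper's four modules.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DialogueTurn:
    user_utterance: str
    dialogue_acts: List[str] = field(default_factory=list)

class GeneralResponseGenerator:
    """Stands in for the GPT-2-based module that predicts dialogue
    acts and produces a general textual response."""
    def generate(self, history: List[DialogueTurn]) -> str:
        return "general response conditioned on dialogue history"

class KnowledgeEnrichedGenerator:
    """Stands in for the module that consults a structured knowledge
    base under knowledge-graph guidance."""
    def generate(self, history: List[DialogueTurn], kb: dict) -> str:
        entity = history[-1].user_utterance  # naive entity lookup
        facts = kb.get(entity, [])
        return f"response enriched with facts: {facts}"

class ImageRecommender:
    """Stands in for the deep multimodal fusion model that fuses
    latent and explicit visual cues into image representations."""
    def recommend(self, history: List[DialogueTurn]) -> Optional[str]:
        return "image_042.jpg"  # placeholder top-ranked image

class ResponseClassifier:
    """Routes each turn to the appropriate generator based on the
    predicted user/agent actions."""
    def classify(self, acts: List[str]) -> str:
        if "request_image" in acts:
            return "image"
        if "ask_attribute" in acts:
            return "knowledge"
        return "general"

def respond(history: List[DialogueTurn], kb: dict):
    # Classify the turn first, then dispatch to one generator.
    route = ResponseClassifier().classify(history[-1].dialogue_acts)
    if route == "image":
        return ImageRecommender().recommend(history)
    if route == "knowledge":
        return KnowledgeEnrichedGenerator().generate(history, kb)
    return GeneralResponseGenerator().generate(history)

if __name__ == "__main__":
    history = [DialogueTurn("leather jacket", ["ask_attribute"])]
    kb = {"leather jacket": ["material: leather", "style: biker"]}
    print(respond(history, kb))
```

Routing before generation mirrors the abstract's response classifier: deciding the response type up front lets each specialized generator (textual, knowledge-enriched, or visual) handle only the turns it is suited for.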
