Hierarchical multimodal attention for end-to-end audio-visual scene-aware dialogue response generation

Hung Le; Doyen Sahoo; Nancy F. Chen; Steven C.H. Hoi

Abstract

This work extends our participation in the Audio Visual Scene-aware Dialogue System (AVSD) track of the 7th Dialogue System Technology Challenge (DSTC7). The AVSD track evaluates how well dialogue systems understand video scenes and respond to users about the visual and audio content of a video. We propose a hierarchical attention approach over user queries, video captions, and audio and visual features, which contributes to improved evaluation results. We also apply a nonlinear feature fusion approach to combine the visual and audio features into a better knowledge representation. Our proposed model outperforms the baselines in both objective evaluation and human rating. In this extended work, we also provide a more extensive review of related work, conduct additional experiments with word-level and context-level pretrained embeddings, and investigate different qualitative aspects of the generated responses.
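The abstract names two mechanisms without detailing them: attention applied hierarchically over several inputs, and a nonlinear fusion of visual and audio features. The following is a minimal pure-Python sketch of one generic way such components can be built, not the authors' architecture. It assumes dot-product attention at both levels, assumes the fusion is a tanh over a linear projection of the concatenated visual and audio vectors, and assumes every input has already been projected to the query's dimension. The function names, the `query` vector (a stand-in for an encoded user query), and all numbers are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Dot-product attention: weight each key by softmax(query . key), return the weighted sum."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]


def nonlinear_fusion(visual: Vector, audio: Vector,
                     weight: List[Vector], bias: Vector) -> Vector:
    """Hypothetical fusion for one time step: tanh(W [visual; audio] + b)."""
    concat = visual + audio  # list concatenation gives [visual; audio]
    return [math.tanh(dot(row, concat) + b) for row, b in zip(weight, bias)]


def hierarchical_attention(query: Vector, modalities: Dict[str, List[Vector]]) -> Vector:
    """Two-level attention (an assumed, generic formulation).

    Level 1: the query attends over the time steps or tokens of each modality,
    producing one summary vector per modality.
    Level 2: the query attends over those per-modality summaries.
    All vectors are assumed to share the query's dimension.
    """
    summaries = [attend(query, seq) for seq in modalities.values()]
    return attend(query, summaries)


if __name__ == "__main__":
    # Toy 2-d example; every number here is made up for illustration.
    query = [1.0, 0.0]  # stand-in for an encoded user query
    W = [[0.5, 0.0, 0.5, 0.0],
         [0.0, 0.5, 0.0, 0.5]]  # maps 2-d visual + 2-d audio to 2-d
    b = [0.0, 0.0]
    frames = [([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [0.2, 0.8])]
    audio_visual = [nonlinear_fusion(v, a, W, b) for v, a in frames]
    caption = [[0.9, 0.1], [0.1, 0.9]]  # stand-in caption token embeddings
    context = hierarchical_attention(query, {"audio_visual": audio_visual,
                                             "caption": caption})
    print(context)  # a 2-d context vector that a response decoder could consume
```

In this sketch, attending within each modality first and then across the modality summaries lets the query decide how much to rely on the caption versus the fused audio-visual stream; a trained model would learn the projection weights rather than fix them by hand.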

Bibliographic record

Source: Computer speech and language | 2020, Issue 9 | pp. 101095.1-101095.13 | 13 pages
Authors: Hung Le; Doyen Sahoo; Nancy F. Chen; Steven C.H. Hoi
Affiliations

Singapore Management University, 81 Victoria St, Singapore 188065; Institute for Infocomm Research, 1 Fusionopolis Way, Singapore 138632

Salesforce Research Asia, 5 Temasek Boulevard, Suntec Tower Five, Singapore 038985

Institute for Infocomm Research, 1 Fusionopolis Way, Singapore 138632

Singapore Management University, 81 Victoria St, Singapore 188065; Salesforce Research Asia, 5 Temasek Boulevard, Suntec Tower Five, Singapore 038985

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: Dialogue system; Audio-visual scene-aware dialogue; Neural network; Multimodal attention; Response generation

Similar documents

1. Aspect-Aware Response Generation for Multimodal Dialogue System [J]. Firdaus Mauajama, Thakur Nidhi, Ekbal Asif. ACM transactions on intelligent systems and technology, 2021, Issue 2.

2. Generation and evaluation of user tailored responses in multimodal dialogue [J]. M.A. Walker, S.J. Whittaker, A. Stent. Cognitive Science, 2004, Issue 5.

3. Neural Dialogue Model with Retrieval Attention for Personalized Response Generation [J]. Computers, Materials & Continua, 2020, Issue 1.

4. End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features [C]. Chiori Hori, Huda Alamri, Jue Wang. IEEE International Conference on Acoustics, Speech and Signal Processing, 2019.

5. Scalable and Accurate Dialogue State Tracking via Hierarchical Sequence Generation [D]. Ren, Liliang. 2020.

6. Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment [O]. Yue Gu, Kangning Yang, Shiyu Fu.

7. End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features [O]. Chiori Hori, Huda Alamri, Jue Wang. 2019.

8. Multimodal Interfaces: Literature Review of Ecological Interface Design, Multimodal Perception and Attention, and Intelligent Adaptive Multimodal Interfaces [R]. Giang, W., Santhakumaran, S., Masnavi, E. 2010.
