Journal: Computer Speech and Language
Hierarchical multimodal attention for end-to-end audio-visual scene-aware dialogue response generation

Abstract

This work extends our participation in the 7th Dialogue System Technology Challenge (DSTC7), where we took part in the Audio Visual Scene-aware Dialogue System (AVSD) track. The AVSD track evaluates how well dialogue systems understand video scenes and respond to users about a video's visual and audio content. We propose a hierarchical attention approach over user queries, video captions, and audio and visual features, which contributes to improved evaluation results. We also apply a nonlinear feature fusion approach to combine the visual and audio features for better knowledge representation. Our proposed model outperforms the baselines in both objective evaluation and human rating. In this extended work, we also provide a more extensive review of related work, conduct additional experiments with word-level and context-level pretrained embeddings, and investigate different qualitative aspects of the generated responses.
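
The two ideas named in the abstract, attention applied at more than one level and a nonlinear fusion of audio and visual features, can be illustrated with a small sketch. The abstract does not specify the scoring function, the fusion form, or the layer layout, so everything below is an assumption made for illustration: dot-product scoring, a `tanh` over a fixed weighted sum (in a trained model the weights would be learned), and the hypothetical names `attend`, `fuse_nonlinear`, and `hierarchical_attention`. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    """Dot-product attention (assumed): weighted average of `items`,
    with weights given by softmax of each item's score against `query`."""
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    summary = [sum(w * item[d] for w, item in zip(weights, items))
               for d in range(dim)]
    return summary, weights


def fuse_nonlinear(audio, visual, w_audio=0.5, w_visual=0.5):
    """Hypothetical nonlinear fusion: tanh of a weighted sum.
    The fixed weights stand in for parameters that would be learned."""
    return [math.tanh(w_audio * a + w_visual * v)
            for a, v in zip(audio, visual)]


def hierarchical_attention(query, caption, audio, visual):
    # Level 1: the query attends within each source separately.
    caption_summary, _ = attend(query, caption)
    audio_summary, _ = attend(query, audio)
    visual_summary, _ = attend(query, visual)
    # Combine the audio and visual summaries nonlinearly.
    av_summary = fuse_nonlinear(audio_summary, visual_summary)
    # Level 2: the query attends across the per-source summaries.
    return attend(query, [caption_summary, av_summary])


# Toy 2-dimensional features (made-up values, for illustration only).
query = [1.0, 0.0]
caption = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.5]]
visual = [[0.2, 0.8], [0.9, 0.1]]
context, source_weights = hierarchical_attention(query, caption, audio, visual)
```

Here `context` is the final context vector and `source_weights` shows how much the second level relies on the caption summary versus the fused audio-visual summary.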
