Computer Speech and Language

Investigating topics, audio representations and attention for multimodal scene-aware dialog


Abstract

With the recent advancements in Artificial Intelligence (AI), Intelligent Virtual Assistants (IVAs) such as Alexa and Google Home have become a ubiquitous part of every home. Currently, such IVAs are mostly audio-based, but going forward we are witnessing a confluence of vision, speech, and dialog system technologies that enables IVAs to learn audiovisual groundings of utterances. This will enable agents to converse with users about the objects, activities, and events surrounding them. As part of the Audio Visual Scene-Aware Dialog (AVSD) track of the 7th Dialog System Technology Challenges (DSTC7), we explore three main techniques for multimodal dialog: 1) using the 'topics' of the dialog as an important contextual feature for scene-aware conversations, 2) investigating several multimodal attention mechanisms during response generation, and 3) incorporating an end-to-end audio classification subnetwork (AclNet) into our architecture. We present a detailed analysis of our experiments and show that our model variants outperform the baseline system presented for this task.
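The second technique, multimodal attention during response generation, can be illustrated with a minimal sketch: at each generation step, a decoder query attends separately over each modality's feature sequence (video, audio, dialog history), and the per-modality context vectors are fused by concatenation. The modality names, dimensions, and fusion-by-concatenation are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, features):
    """Scaled dot-product attention of one decoder query over one modality.

    features: (T, d) feature sequence; query: (d,) decoder state.
    Returns a (d,) attended context vector for this modality.
    """
    d = query.shape[-1]
    scores = features @ query / np.sqrt(d)   # (T,) similarity scores
    weights = softmax(scores)                # (T,) attention weights
    return weights @ features                # (d,) weighted sum of features

# Hypothetical per-modality feature sequences (T_m timesteps x d dims).
rng = np.random.default_rng(0)
d = 8
modalities = {
    "video": rng.normal(size=(10, d)),
    "audio": rng.normal(size=(6, d)),
    "dialog_history": rng.normal(size=(4, d)),
}
query = rng.normal(size=(d,))  # decoder state at one generation step

# One attended context per modality, concatenated before feeding the decoder.
context = np.concatenate([attend(query, f) for f in modalities.values()])
print(context.shape)  # (24,)
```

Concatenation is only one fusion choice; the abstract's comparison of "several multimodal attention mechanisms" suggests alternatives such as weighted sums over modalities or hierarchical attention could be swapped in at the fusion step.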
