Computer Speech and Language

Investigating topics, audio representations and attention for multimodal scene-aware dialog


Abstract

With the recent advancements in Artificial Intelligence (AI), Intelligent Virtual Assistants (IVAs) such as Alexa and Google Home have become a ubiquitous part of every home. Currently, such IVAs are mostly audio-based, but going forward we are witnessing a confluence of vision, speech, and dialog system technologies that enables IVAs to learn audiovisual groundings of utterances. This will enable agents to converse with users about the objects, activities, and events surrounding them. As part of the Audio Visual Scene-Aware Dialog (AVSD) track of the 7th Dialog System Technology Challenges (DSTC7), we explore three main techniques for multimodal dialog: 1) using the 'topics' of the dialog as an important contextual feature for scene-aware conversations, 2) investigating several multimodal attention mechanisms during response generation, and 3) incorporating an end-to-end audio classification subnetwork (AclNet) into our architecture. We present a detailed analysis of our experiments and show that our model variants outperform the baseline system presented for this task.
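The second technique, multimodal attention during response generation, can be illustrated with a minimal sketch: at each generation step, a decoder query attends separately over each modality's feature sequence (video, audio, dialog history), and the per-modality context vectors are fused by concatenation. The modality names, dimensions, and fusion-by-concatenation are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, features):
    """Scaled dot-product attention of one decoder query over one modality.

    features: (T, d) feature sequence; query: (d,) decoder state.
    Returns a (d,) attended context vector for this modality.
    """
    d = query.shape[-1]
    scores = features @ query / np.sqrt(d)   # (T,) similarity scores
    weights = softmax(scores)                # (T,) attention weights
    return weights @ features                # (d,) weighted sum of features

# Hypothetical per-modality feature sequences (T_m timesteps x d dims).
rng = np.random.default_rng(0)
d = 8
modalities = {
    "video": rng.normal(size=(10, d)),
    "audio": rng.normal(size=(6, d)),
    "dialog_history": rng.normal(size=(4, d)),
}
query = rng.normal(size=(d,))  # decoder state at one generation step

# One attended context per modality, concatenated before feeding the decoder.
context = np.concatenate([attend(query, f) for f in modalities.values()])
print(context.shape)  # (24,)
```

Concatenation is only one fusion choice; the abstract's comparison of "several multimodal attention mechanisms" suggests alternatives such as weighted sums over modalities or hierarchical attention could be swapped in at the fusion step.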
