Conference: International Conference on Computer Vision

Vision-Infused Deep Audio Inpainting


Abstract

Multi-modality perception is essential to developing interactive intelligence. In this work, we consider a new task of visual information-infused audio inpainting, i.e., synthesizing missing audio segments that correspond to their accompanying videos. We identify two key aspects of a successful inpainter: (1) It is desirable to operate on spectrograms instead of raw audio, so that recent advances in deep semantic image inpainting can be leveraged to go beyond the limitations of traditional audio inpainting. (2) To synthesize visually indicated audio, a joint visual-audio feature space needs to be learned with the audio and video synchronized. To facilitate a large-scale study, we collect a new multi-modality instrument-playing dataset called MUSIC-Extra-Solo (MUSICES) by enriching the MUSIC dataset. Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts. More importantly, our synthesized audio segments are coherent with their video counterparts, showing the effectiveness of our proposed Vision-Infused Audio Inpainter (VIAI).
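
To make aspect (1) concrete, the following is a minimal sketch of how an inpainting input could be prepared on a spectrogram rather than on the raw waveform: a magnitude spectrogram is computed, and a span of time frames is zeroed out to stand in for the missing audio segment. This is not the VIAI implementation. The function names `stft_magnitude` and `mask_time_span`, the frame length, the hop size, and the 440 Hz test tone are all assumptions chosen for illustration.

```python
import numpy as np


def stft_magnitude(signal, frame_len=256, hop=128):
    """Return a magnitude spectrogram of shape (freq_bins, frames).

    Hypothetical helper: Hann-windowed frames followed by a real FFT.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft over each frame gives frame_len // 2 + 1 frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T


def mask_time_span(spec, start_frame, end_frame):
    """Zero out frames [start_frame, end_frame).

    Returns (masked_spec, mask), where mask is 1.0 on the missing region.
    """
    mask = np.zeros_like(spec)
    mask[:, start_frame:end_frame] = 1.0
    return spec * (1.0 - mask), mask


# Illustrative input: one second of a 440 Hz tone at an assumed 8 kHz rate.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = stft_magnitude(tone)
masked, mask = mask_time_span(spec, 20, 30)
```

With the spectrogram treated as a 2-D array, the masked region plays the same role as a hole in an image, which is what allows image-inpainting models to be applied; the model that fills the hole, and any visual conditioning, are outside the scope of this sketch.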