European Conference on Computer Vision

Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation

Abstract

Stereophonic audio is an indispensable ingredient for enhancing the human auditory experience. Recent research has explored the use of visual information as guidance to generate binaural or ambisonic audio from mono recordings under stereo supervision. However, this fully supervised paradigm suffers from an inherent drawback: recording stereophonic audio usually requires delicate devices that are too expensive for wide accessibility. To overcome this challenge, we propose to leverage the vastly available mono data to facilitate the generation of stereophonic audio. Our key observation is that the task of visually indicated audio separation also maps independent audios to their corresponding visual positions, and thus shares a similar objective with stereophonic audio generation. We integrate both stereo generation and source separation into a unified framework, Sep-Stereo, by considering source separation as a particular type of audio spatialization. Specifically, a novel associative pyramid network architecture is carefully designed for audio-visual feature fusion. Extensive experiments demonstrate that our framework can improve stereophonic audio generation results while performing accurate sound separation with a shared backbone.
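To make the shared-backbone idea concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' released Sep-Stereo code or their associative pyramid architecture: a shared audio encoder over the mono spectrogram is modulated by a visual feature, and two lightweight heads reuse the fused representation, one predicting left/right masks for stereo generation and one predicting a per-source mask for separation. All module names, shapes, and the channel-wise fusion scheme are illustrative assumptions.

```python
# Hypothetical sketch of audio-visual fusion with a shared backbone serving both
# stereo generation and source separation. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedAudioVisualFusion(nn.Module):
    def __init__(self, audio_ch=64, visual_dim=512):
        super().__init__()
        # Shared audio encoder over mono mixture spectrograms (1 x F x T).
        self.audio_enc = nn.Sequential(
            nn.Conv2d(1, audio_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(audio_ch, audio_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Project a pooled visual feature (e.g. from a frame CNN) to the audio
        # channel dimension; fusion is a simple channel-wise modulation here.
        self.vis_proj = nn.Linear(visual_dim, audio_ch)
        # Two lightweight heads sharing the fused features.
        self.stereo_head = nn.Conv2d(audio_ch, 2, 3, padding=1)  # left/right masks
        self.sep_head = nn.Conv2d(audio_ch, 1, 3, padding=1)     # one source mask

    def fuse(self, mono_spec, visual_feat):
        a = self.audio_enc(mono_spec)              # (B, C, F', T')
        v = self.vis_proj(visual_feat)             # (B, C)
        return a * v.unsqueeze(-1).unsqueeze(-1)   # associate audio with vision

    def forward(self, mono_spec, visual_feat, mode="stereo"):
        fused = self.fuse(mono_spec, visual_feat)
        head = self.stereo_head if mode == "stereo" else self.sep_head
        masks = torch.sigmoid(head(fused))
        # Upsample masks back to spectrogram resolution and apply to the mixture.
        masks = F.interpolate(masks, size=mono_spec.shape[-2:], mode="bilinear",
                              align_corners=False)
        return masks * mono_spec


if __name__ == "__main__":
    model = SharedAudioVisualFusion()
    spec = torch.randn(2, 1, 256, 64)   # batch of mono magnitude spectrograms
    vis = torch.randn(2, 512)           # batch of pooled visual features
    print(model(spec, vis, mode="stereo").shape)    # (2, 2, 256, 64): L/R estimates
    print(model(spec, vis, mode="separate").shape)  # (2, 1, 256, 64): one source
```

The point of the sketch is only the shared backbone: the same fused audio-visual features drive both heads, which is the sense in which source separation can be treated as a particular type of audio spatialization.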
