European Conference on Computer Vision

Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation

Abstract

Stereophonic audio is an indispensable ingredient for enhancing the human auditory experience. Recent research has explored the use of visual information as guidance to generate binaural or ambisonic audio from mono recordings under stereo supervision. However, this fully supervised paradigm suffers from an inherent drawback: recording stereophonic audio usually requires delicate devices that are too expensive for wide accessibility. To overcome this challenge, we propose to leverage the vastly available mono data to facilitate the generation of stereophonic audio. Our key observation is that the task of visually indicated audio separation also maps independent audios to their corresponding visual positions, and thus shares a similar objective with stereophonic audio generation. We integrate both stereo generation and source separation into a unified framework, Sep-Stereo, by considering source separation as a particular type of audio spatialization. Specifically, a novel associative pyramid network architecture is carefully designed for audio-visual feature fusion. Extensive experiments demonstrate that our framework can improve stereophonic audio generation results while performing accurate sound separation with a shared backbone.
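To make the shared-backbone idea concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' released Sep-Stereo code or their associative pyramid architecture: a shared audio encoder over the mono spectrogram is modulated by a visual feature, and two lightweight heads reuse the fused representation, one predicting left/right masks for stereo generation and one predicting a per-source mask for separation. All module names, shapes, and the channel-wise fusion scheme are illustrative assumptions.

```python
# Hypothetical sketch of audio-visual fusion with a shared backbone serving both
# stereo generation and source separation. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedAudioVisualFusion(nn.Module):
    def __init__(self, audio_ch=64, visual_dim=512):
        super().__init__()
        # Shared audio encoder over mono mixture spectrograms (1 x F x T).
        self.audio_enc = nn.Sequential(
            nn.Conv2d(1, audio_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(audio_ch, audio_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Project a pooled visual feature (e.g. from a frame CNN) to the audio
        # channel dimension; fusion is a simple channel-wise modulation here.
        self.vis_proj = nn.Linear(visual_dim, audio_ch)
        # Two lightweight heads sharing the fused features.
        self.stereo_head = nn.Conv2d(audio_ch, 2, 3, padding=1)  # left/right masks
        self.sep_head = nn.Conv2d(audio_ch, 1, 3, padding=1)     # one source mask

    def fuse(self, mono_spec, visual_feat):
        a = self.audio_enc(mono_spec)              # (B, C, F', T')
        v = self.vis_proj(visual_feat)             # (B, C)
        return a * v.unsqueeze(-1).unsqueeze(-1)   # associate audio with vision

    def forward(self, mono_spec, visual_feat, mode="stereo"):
        fused = self.fuse(mono_spec, visual_feat)
        head = self.stereo_head if mode == "stereo" else self.sep_head
        masks = torch.sigmoid(head(fused))
        # Upsample masks back to spectrogram resolution and apply to the mixture.
        masks = F.interpolate(masks, size=mono_spec.shape[-2:], mode="bilinear",
                              align_corners=False)
        return masks * mono_spec


if __name__ == "__main__":
    model = SharedAudioVisualFusion()
    spec = torch.randn(2, 1, 256, 64)   # batch of mono magnitude spectrograms
    vis = torch.randn(2, 512)           # batch of pooled visual features
    print(model(spec, vis, mode="stereo").shape)    # (2, 2, 256, 64): L/R estimates
    print(model(spec, vis, mode="separate").shape)  # (2, 1, 256, 64): one source
```

The point of the sketch is only the shared backbone: the same fused audio-visual features drive both heads, which is the sense in which source separation can be treated as a particular type of audio spatialization.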
