IEEE Transactions on Audio, Speech, and Language Processing

Video-Aided Model-Based Source Separation in Real Reverberant Rooms


Abstract

Source separation algorithms that utilize only audio data can perform poorly if multiple sources or reverberation are present. In this paper, we therefore propose a video-aided model-based source separation algorithm for two-channel reverberant recordings in which the sources are assumed to be static. By exploiting cues from video, we first localize the individual speech sources in the enclosure and then estimate their directions. The interaural spatial cues, namely the interaural phase difference and the interaural level difference, as well as the mixing vectors, are modeled probabilistically. The models make use of the source direction information and are evaluated at discrete time-frequency points. The model parameters are refined with the well-known expectation-maximization (EM) algorithm. The algorithm outputs time-frequency masks that are used to reconstruct the individual sources. Simulation results show that, by utilizing the visual modality, the proposed algorithm produces better time-frequency masks and thereby improved source estimates. We report experimental results for the proposed algorithm in different scenarios, compare it with other audio-only and audio-visual algorithms, and achieve improved performance on both synthetic and real data. We also include dereverberation-based pre-processing in our algorithm to suppress the late reverberant components in the observed stereo mixture and further enhance its overall output. This makes our algorithm a suitable candidate for under-determined, highly reverberant settings where the performance of other audio-only and audio-visual methods is limited.
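To make the masking step concrete, the following is a minimal sketch, not the authors' implementation: it extracts the interaural phase difference (IPD) from a stereo STFT and runs a simplified EM over a Gaussian mixture on the wrapped phase residual to obtain soft time-frequency masks. The function name, parameters (n_fft, n_sources, n_iter), and the initialization are illustrative; in the paper, the source directions estimated from video would initialize the model, and the full probabilistic model also incorporates the interaural level difference and the mixing vectors, with frequency-dependent cue models.

```python
import numpy as np
from scipy.signal import stft, istft


def separate_two_sources(left, right, fs, n_fft=1024, n_sources=2, n_iter=30):
    """Soft time-frequency masking from interaural phase differences via EM.

    A simplified, illustrative stand-in for the model-based approach
    described in the abstract; clusters on the IPD alone for brevity.
    """
    # Stereo STFTs: one complex spectrogram per channel.
    _, _, L = stft(left, fs, nperseg=n_fft)
    _, _, R = stft(right, fs, nperseg=n_fft)

    # Interaural phase difference at each time-frequency point.
    ipd = np.angle(L * np.conj(R))
    x = ipd.ravel()

    # Initial per-source means: in the paper these would come from the
    # video-based localization of each speaker; here they are placeholders.
    mu = np.linspace(-1.0, 1.0, n_sources)
    var = np.ones(n_sources)
    pi = np.full(n_sources, 1.0 / n_sources)
    eps = 1e-12

    for _ in range(n_iter):
        # E-step: posterior responsibility of each source at each T-F point,
        # using a Gaussian on the wrapped phase residual.
        diff = np.angle(np.exp(1j * (x[None, :] - mu[:, None])))
        logp = (np.log(pi[:, None])
                - 0.5 * diff ** 2 / var[:, None]
                - 0.5 * np.log(2 * np.pi * var[:, None]))
        logp -= logp.max(axis=0, keepdims=True)  # numerical stability
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=0, keepdims=True)

        # M-step: re-estimate mixture weights, circular means, and variances.
        nk = gamma.sum(axis=1) + eps
        pi = nk / nk.sum()
        mu = np.angle(np.exp(1j * (mu + (gamma * diff).sum(axis=1) / nk)))
        var = (gamma * diff ** 2).sum(axis=1) / nk + eps

    # The responsibilities, reshaped to the spectrogram grid, act as soft
    # time-frequency masks; apply them to one channel and invert.
    masks = gamma.reshape(n_sources, *ipd.shape)
    return [istft(masks[k] * L, fs, nperseg=n_fft)[1] for k in range(n_sources)]
```

A full implementation along the lines of the abstract would additionally model the frequency dependence of the IPD, add ILD and mixing-vector likelihoods to the E-step, and apply dereverberation pre-processing to the stereo mixture before computing the spatial cues.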