Multimodal (audiovisual) source separation exploiting multi-speaker tracking, robust beamforming and time-frequency masking

Naqvi S.M.; Wang W.; Khan M.S.; Barnard M.; Chambers J.A.


Abstract

A novel multimodal source separation approach is proposed for physically moving and stationary sources; it exploits a circular microphone array, multiple video cameras, robust spatial beamforming and time-frequency masking. The challenge in separating moving sources (and, at higher reverberation time (RT), even physically stationary sources) is that the mixing filters are time varying; the unmixing filters should therefore also be time varying, but these are difficult to determine from audio measurements alone. In the proposed approach, the visual modality is therefore used to facilitate the separation of both stationary and moving sources. The movement of the sources is detected by a three-dimensional (3D) tracker based on a Markov chain Monte Carlo particle filter. The audio separation is performed by a robust least squares frequency invariant data-independent beamformer. The uncertainties in the source localisation and direction of arrival information obtained from the 3D video-based tracker are controlled by using a convex optimisation approach in the beamformer design. In the final stage, the separated audio sources are further enhanced by applying a binary time-frequency masking technique in the cepstral domain. Experimental results show that, using the visual modality, the proposed algorithm can not only achieve better performance than conventional frequency-domain source separation algorithms, but also provide acceptable separation performance for moving sources.
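The abstract describes passing direction of arrival information from the video tracker to the beamformer. As a rough illustration of that geometric step only, the sketch below converts a tracked 3D source position into an azimuth and elevation, then into far-field steering delays for a uniform circular array. It is not the paper's robust least squares frequency invariant beamformer design and does not model the localisation uncertainty that the paper handles by convex optimisation. The function names, the speed of sound, the microphone count and the array radius are all assumptions made for illustration; none of them are given in the abstract.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def doa_from_position(source_xyz, array_centre_xyz):
    """Return (azimuth, elevation) in radians of a source seen from the array centre.

    Hypothetical helper: the position would come from a video-based tracker.
    """
    dx = source_xyz[0] - array_centre_xyz[0]
    dy = source_xyz[1] - array_centre_xyz[1]
    dz = source_xyz[2] - array_centre_xyz[2]
    azimuth = math.atan2(dy, dx)
    elevation = math.atan2(dz, math.hypot(dx, dy))
    return azimuth, elevation


def circular_array_delays(azimuth, elevation, num_mics, radius):
    """Far-field arrival delays (seconds) at each microphone, relative to the array centre.

    Microphone m is assumed to sit at angle 2*pi*m/num_mics on a horizontal
    circle of the given radius. A microphone nearer the source receives the
    wavefront earlier, so its delay is negative.
    """
    delays = []
    for m in range(num_mics):
        mic_angle = 2.0 * math.pi * m / num_mics
        projection = radius * math.cos(elevation) * math.cos(azimuth - mic_angle)
        delays.append(-projection / SPEED_OF_SOUND)
    return delays


if __name__ == "__main__":
    # Assumed example: speaker 1 m from the centre of a 4-microphone, 10 cm radius array.
    az, el = doa_from_position((1.0, 0.0, 0.0), (0.0, 0.0, 0.0))
    for m, tau in enumerate(circular_array_delays(az, el, num_mics=4, radius=0.1)):
        print(f"mic {m}: delay {tau * 1e6:.1f} microseconds")
```

For a moving source, these delays would have to be recomputed as the tracked position changes, which is one way to see why the unmixing filters in the abstract are time varying.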


Bibliographic record

Source
Signal Processing, IET | 2012, Issue 5 | pp. 466-477 | 12 pages
Authors
Naqvi S.M.; Wang W.; Khan M.S.; Barnard M.; Chambers J.A.
Author affiliation

Advanced Signal Processing Group, School of Electronic, Electrical and Systems Engineering, Loughborough University, Leicestershire LE11 3TU, UK

Original format: PDF
Language: English

Similar literature

1. Online blind speech separation using multiple acoustic speaker tracking and time-frequency masking [J]. P. Pertila. Computer Speech and Language. 2013, Issue 3
2. Robust localization and tracking of simultaneous moving sound sources using beamforming and particle filtering [J]. Jean-Marc Valin, Francois Michaud, Jean Rouat. Robotics and Autonomous Systems. 2007, Issue 3
3. Robust acoustic source localization based on modal beamforming and time-frequency processing using circular microphone arrays [J]. Torres A.M., Cobos M., Pueo B. The Journal of the Acoustical Society of America. 2012, Issue 3aPta1
4. Sound source separation by using matched beamforming and time-frequency masking [C]. IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems. 2010
5. Separation of Agile Waveform Time-Frequency Signatures from Coexisting Multimodal Systems [D]. Gattani, Vineet Sunil. 2018
6. Towards Robust Multiple Blind Source Localization Using Source Separation and Beamforming [O]. Henglin Pu, Chao Cai, Menglan Hu. 2021
7. Sound Source Separation by Using Matched Beamforming and Time-Frequency Masking [O]. Jounghoon Beh, Taekjin Lee, David Han. 2013
