Journal: Signal Processing, IET

Multimodal (audiovisual) source separation exploiting multi-speaker tracking, robust beamforming and time-frequency masking



Abstract

A novel multimodal source separation approach is proposed for physically moving and stationary sources. It exploits a circular microphone array, multiple video cameras, robust spatial beamforming and time-frequency masking. The challenge of separating moving sources, and of separating physically stationary sources when the reverberation time (RT) is high, is that the mixing filters are time varying; the unmixing filters should therefore also be time varying, but these are difficult to determine from audio measurements alone. In the proposed approach, the visual modality is therefore used to facilitate separation for both stationary and moving sources. The movement of the sources is detected by a three-dimensional (3D) tracker based on a Markov chain Monte Carlo particle filter. The audio separation is performed by a robust least squares frequency-invariant data-independent beamformer. The uncertainties in the source localisation and direction of arrival information obtained from the 3D video-based tracker are controlled by using a convex optimisation approach in the beamformer design. In the final stage, the separated audio sources are further enhanced by applying a binary time-frequency masking technique in the cepstral domain. Experimental results show that, by using the visual modality, the proposed algorithm can not only achieve better performance than conventional frequency-domain source separation algorithms, but also provide acceptable separation performance for moving sources.
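To make the final masking stage more concrete, the following is a minimal sketch of plain binary time-frequency masking in Python with NumPy. It is not the authors' implementation: the abstract states that their mask is applied in the cepstral domain, whereas this sketch simply assigns each time-frequency bin to whichever separated output has the largest magnitude there. The function names `binary_masks` and `apply_masks` and the array layout `(n_sources, n_freq, n_frames)` are assumptions made for illustration.

```python
import numpy as np


def binary_masks(magnitudes):
    """Return one boolean mask per source.

    magnitudes: non-negative array of shape (n_sources, n_freq, n_frames),
    e.g. magnitude spectrograms of the beamformer outputs (assumed layout).
    Each time-frequency bin is True for exactly one source: the one with
    the largest magnitude in that bin (ties go to the lowest source index).
    """
    magnitudes = np.asarray(magnitudes, dtype=float)
    # Index of the dominant source in every bin, shape (n_freq, n_frames).
    dominant = np.argmax(magnitudes, axis=0)
    # Compare against each source index via broadcasting.
    source_ids = np.arange(magnitudes.shape[0])[:, None, None]
    return dominant[None, :, :] == source_ids


def apply_masks(spectrograms, masks):
    """Zero every bin of each (possibly complex) spectrogram outside its mask."""
    return np.where(masks, spectrograms, 0.0)


# Toy example: two sources, two frequency bins, two frames.
mags = np.array([
    [[3.0, 1.0], [0.5, 2.0]],   # source 0
    [[1.0, 4.0], [2.0, 1.0]],   # source 1
])
masks = binary_masks(mags)
enhanced = apply_masks(mags, masks)
```

In this toy example source 0 keeps the bins where its magnitude is 3.0 and 2.0, and source 1 keeps the bins where its magnitude is 4.0 and 2.0; all other bins are set to zero. A hard mask like this can introduce audible artefacts, which is one reason masks are often smoothed in practice.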
