Published in: Speech Communication

Visual voice activity detection as a help for speech source separation from convolutive mixtures

Abstract

Audio-visual speech source separation consists of combining visual speech processing techniques (e.g., lip-parameter tracking) with source separation methods to improve the extraction of a speech source of interest from a mixture of acoustic signals. In this paper, we present a new approach that combines visual information with separation methods based on the sparseness of speech: the visual information is used as a voice activity detector (VAD), which is combined with a new geometric separation method. The proposed audio-visual method is shown to be effective at extracting a real spontaneous speech utterance in the difficult case of convolutive mixtures, even when the competing sources are highly non-stationary. Typical gains of 18-20 dB in signal-to-interference ratio (SIR) are obtained for a wide range of (2 × 2) and (3 × 3) mixtures. Moreover, the overall process is computationally much simpler than previously proposed audio-visual separation schemes.
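The abstract only outlines the pipeline, so the following is a minimal, hedged sketch of why a VAD can help a geometric separation method. It is not the authors' algorithm. It assumes that a convolutive mixture is handled per frequency bin as an instantaneous complex (2 × 2) mixture, and it looks at one such bin. In frames that the VAD flags as "target silent," the observations lie along the interferer's mixing direction. Estimating that direction from those frames and projecting it out leaves only the target. The function names, the mixing coefficients `a1`/`a2`, the noise level, and the synthetic silence mask (standing in for a visual VAD) are all hypothetical. SIR is computed as 10·log10 of the target-to-interference power ratio.

```python
import math
import random


def estimate_ratio(x0, x1, silent):
    """Least-squares estimate of r = a2[1] / a2[0] from target-silent frames.

    When only the interferer is active, x1 ~= r * x0, so r = sum(x1 x0*) / sum(|x0|^2).
    """
    num = sum(b * a.conjugate() for a, b, s in zip(x0, x1, silent) if s)
    den = sum(abs(a) ** 2 for a, s in zip(x0, silent) if s)
    return num / den


def extract(x0, x1, r):
    """Geometric extraction: r*x0 - x1 cancels any component along (1, r)."""
    return [r * a - b for a, b in zip(x0, x1)]


def power(v):
    return sum(abs(z) ** 2 for z in v)


def sir_db(target, interference):
    """Signal-to-interference ratio in dB."""
    return 10 * math.log10(power(target) / power(interference))


def cgauss(rng, std=1.0):
    """Circular complex Gaussian sample."""
    return complex(rng.gauss(0, std), rng.gauss(0, std))


def demo(n_frames=500, n_silent=200, noise_std=0.01, seed=0):
    rng = random.Random(seed)
    a1 = (1.0 + 0j, 0.5 + 0.3j)  # hypothetical mixing column of the target
    a2 = (0.7 - 0.2j, 1.0 + 0j)  # hypothetical mixing column of the interferer
    # Hypothetical VAD output: target silent in the first n_silent frames.
    silent = [t < n_silent for t in range(n_frames)]
    s1 = [0j if silent[t] else cgauss(rng) for t in range(n_frames)]
    s2 = [cgauss(rng) for _ in range(n_frames)]
    # Source images at each of the two microphones (this frequency bin only).
    img1 = ([a1[0] * v for v in s1], [a1[1] * v for v in s1])
    img2 = ([a2[0] * v for v in s2], [a2[1] * v for v in s2])
    # Observed mixtures with a little sensor noise.
    x0 = [p + q + cgauss(rng, noise_std) for p, q in zip(img1[0], img2[0])]
    x1 = [p + q + cgauss(rng, noise_std) for p, q in zip(img1[1], img2[1])]
    r = estimate_ratio(x0, x1, silent)
    y = extract(x0, x1, r)
    # The extractor is linear, so it can be applied to each source image separately.
    sir_in = sir_db(img1[0], img2[0])
    sir_out = sir_db(extract(img1[0], img1[1], r), extract(img2[0], img2[1], r))
    return y, sir_in, sir_out


if __name__ == "__main__":
    _, sir_in, sir_out = demo()
    print(f"SIR at mic 0: {sir_in:.1f} dB, after extraction: {sir_out:.1f} dB")
```

This toy bin is noise-light and perfectly instantaneous, so its SIR gain is not comparable to the 18-20 dB reported for real convolutive mixtures. A full system would also have to resolve the per-bin scale ambiguity before resynthesis. The (3 × 3) case generalizes the same idea: the interferers span a subspace during target silence, and the extractor must be orthogonal to that subspace.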