Visual voice activity detection as a help for speech source separation from convolutive mixtures

Bertrand Rivet; Laurent Girin; Christian Jutten

Abstract

Audio-visual speech source separation combines visual speech processing techniques (e.g., lip parameter tracking) with source separation methods to improve the extraction of a speech source of interest from a mixture of acoustic signals. In this paper, we present a new approach that combines visual information with separation methods based on the sparseness of speech: visual information is used as a voice activity detector (VAD), which is combined with a new geometric separation method. The proposed audio-visual method is shown to extract a real spontaneous speech utterance efficiently in the difficult case of convolutive mixtures, even when the competing sources are highly non-stationary. Typical gains of 18-20 dB in signal-to-interference ratio are obtained for a wide range of (2 x 2) and (3 x 3) mixtures. Moreover, the overall process is computationally considerably simpler than previously proposed audio-visual separation schemes.
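
The terms used in the abstract can be made concrete with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the visual detector or the geometric separation is computed. The function names (`convolutive_mix`, `sir_db`, `visual_vad`), the lip-motion measure, and the threshold are all assumptions chosen for illustration. The sketch shows a convolutive mixture (each sensor receives FIR-filtered copies of every source), the signal-to-interference ratio in dB, and one plausible form of a visual VAD that flags frames in which the lip parameters move.

```python
import numpy as np


def convolutive_mix(sources, filters):
    """Filter every source toward every sensor with an FIR filter.

    sources: array of shape (n_sources, n_samples)
    filters: array of shape (n_sensors, n_sources, filter_len)
    Returns the source images, shape (n_sensors, n_sources, n_samples).
    Summing the result over axis 1 gives the observed mixtures.
    """
    n_sensors, n_sources, _ = filters.shape
    n_samples = sources.shape[1]
    images = np.zeros((n_sensors, n_sources, n_samples))
    for i in range(n_sensors):
        for j in range(n_sources):
            # Full convolution, truncated to the original signal length.
            images[i, j] = np.convolve(sources[j], filters[i, j])[:n_samples]
    return images


def sir_db(target, interference):
    """Signal-to-interference ratio in dB: 10 * log10 of the energy ratio."""
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))


def visual_vad(lip_width, lip_height, threshold):
    """Hypothetical visual VAD: a frame is 'active' when the lips move.

    Assumption: activity is the sum of the absolute frame-to-frame changes
    of two lip parameters, compared against a fixed threshold.
    """
    lip_width = np.asarray(lip_width, dtype=float)
    lip_height = np.asarray(lip_height, dtype=float)
    motion = (np.abs(np.diff(lip_width, prepend=lip_width[0]))
              + np.abs(np.diff(lip_height, prepend=lip_height[0])))
    return motion > threshold
```

Under this definition, a gain of 20 dB means the interference energy is reduced by a factor of 100 relative to the target. Frames that `visual_vad` marks as inactive are the ones in which only the competing sources are present, which is the kind of information a sparseness-based separation method can exploit.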

Bibliographic record

Source: Speech Communication, 2007, No. 8, pp. 667-677 (11 pages)
Authors: Bertrand Rivet; Laurent Girin; Christian Jutten
Affiliation: Institut de la Communication Parlée (ICP), CNRS UMR 5009, INPG, Université Stendhal, Grenoble, France
Format: PDF
Language: English
Chinese Library Classification: Language and writing
Keywords: speech source separation; convolutive mixtures; voice activity detector; visual speech processing; speech enhancement; highly non-stationary environments

Similar literature

1. Mixing Audiovisual Speech Processing and Blind Source Separation for the Extraction of Speech Signals From Convolutive Mixtures [J]. Bertrand Rivet, Laurent Girin, Christian Jutten. IEEE Transactions on Audio, Speech and Language Processing, 2007, No. 1.
2. Interference Reduction in Reverberant Speech Separation With Visual Voice Activity Detection [J]. Liu Q., Aubrey A.J., Wang W. IEEE Transactions on Multimedia, 2014, No. 6.
3. A hybrid algorithm for blind source separation of a convolutive mixture of three speech sources [J]. Shahab Faiz Minhas, Patrick Gaydecki. EURASIP Journal on Advances in Signal Processing, 2014, No. 1.
4. Using a Visual Voice Activity Detector to Regularize the Permutations in Blind Separation of Convolutive Speech Mixtures [C]. Rivet, Bertrand, Girin, . 2007.
5. Advances in Audiovisual Speech Processing for Robust Voice Activity Detection and Automatic Speech Recognition [D]. Tao, Fei. 2018.
6. A Convolutional Neural Network Smartphone App for Real-Time Voice Activity Detection [O]. Abhishek Sehgal, Nasser Kehtarnavaz.
7. Visual voice activity detection as a help for speech source separation from convolutive mixtures [O]. Bertrand Rivet, Laurent Girin, Christian Jutten. 2007.
8. Speech Spectral Moment Convergence. Voiced-Voiceless Consonant Contrasts in Whispered Speech [R]. Golomb, S. W. 1967.
