International Conference on Digital Signal Processing

Two-stage audio-visual speech dereverberation and separation based on models of the interaural spatial cues and spatial covariance

Abstract

This work presents a two-stage speech source separation algorithm based on combined models of interaural cues and spatial covariance, which utilize knowledge of the source locations estimated through video. In the first, pre-processing stage, the late reverberant speech components are suppressed with a spectral subtraction rule to dereverberate the observed mixture. In the second stage, the binaural spatial parameters, namely the interaural phase difference and the interaural level difference, together with the spatial covariance, are modeled in the short-time Fourier transform (STFT) domain to classify individual time-frequency (TF) units to each source. The parameters of these probabilistic models and the TF regions assigned to each source are updated with the expectation-maximization (EM) algorithm. The algorithm generates TF masks that are used to reconstruct the individual speech sources. Objective results, in terms of the signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ), confirm that the proposed multimodal method with pre-processing is a promising approach to source separation in highly reverberant rooms.
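
To make the two stages concrete, the following is a minimal NumPy sketch under simplifying assumptions, not the authors' implementation: the late-reverberation power is estimated with a simple exponential-decay model driven by an assumed reverberation time, and the EM stage clusters only the interaural phase and level differences with diagonal Gaussians, omitting the spatial-covariance model and the video-derived location priors described in the abstract. The function names and parameters here are hypothetical.

```python
# Hypothetical sketch of the two-stage pipeline: spectral-subtraction
# dereverberation followed by EM clustering of interaural cues into TF masks.
import numpy as np

def dereverb_spectral_subtraction(X, t60, hop_s, floor=0.1):
    """Stage 1 (simplified): suppress late reverberation in an STFT X
    (freq x frames) by spectral subtraction, using an exponential-decay
    estimate of the late-reverberant power (an assumption, not the paper's
    exact estimator)."""
    delay = max(1, int(0.05 / hop_s))           # late reverb assumed to start ~50 ms after direct sound
    decay = np.exp(-2 * 6.9 * hop_s * delay / t60)
    P = np.abs(X) ** 2
    P_late = np.zeros_like(P)
    P_late[:, delay:] = decay * P[:, :-delay]   # delayed, attenuated power as the late-reverb estimate
    gain = np.maximum(1.0 - P_late / np.maximum(P, 1e-12), floor)
    return np.sqrt(gain) * X

def em_interaural_masks(XL, XR, n_src=2, n_iter=20):
    """Stage 2 (simplified): EM over per-TF interaural phase and level
    differences, returning one soft TF mask per source."""
    ipd = np.angle(XL * np.conj(XR))
    ild = 20 * np.log10((np.abs(XL) + 1e-12) / (np.abs(XR) + 1e-12))
    # cos/sin of the IPD avoid phase-wrapping issues in the Gaussian model
    feats = np.stack([np.cos(ipd), np.sin(ipd), ild], axis=-1)   # freq x frames x 3
    F, T, D = feats.shape
    x = feats.reshape(-1, D)
    rng = np.random.default_rng(0)
    mu = x[rng.choice(len(x), n_src, replace=False)]             # random initial means
    var = np.var(x, axis=0, keepdims=True).repeat(n_src, axis=0) + 1e-3
    pi = np.full(n_src, 1.0 / n_src)
    for _ in range(n_iter):
        # E-step: posterior probability of each source for every TF unit
        logp = -0.5 * (((x[:, None, :] - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(-1)
        logp += np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means and variances
        nk = post.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (post.T @ x) / nk[:, None]
        var = (post.T @ (x ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return post.T.reshape(n_src, F, T)                            # soft TF masks
```

In this sketch, applying `masks[k]` to the dereverberated left-channel STFT and taking an inverse STFT would give an estimate of source k; the paper's full method additionally conditions the model on the video-estimated source directions and the spatial covariance.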