International Conference on Digital Signal Processing

Two-stage audio-visual speech dereverberation and separation based on models of the interaural spatial cues and spatial covariance

Abstract

This work presents a two-stage speech source separation algorithm based on combined models of interaural cues and spatial covariance, which utilize knowledge of the source locations estimated through video. In the first, pre-processing stage, the late reverberant speech components are suppressed with a spectral subtraction rule to dereverberate the observed mixture. In the second stage, the binaural spatial parameters, namely the interaural phase difference and the interaural level difference, together with the spatial covariance, are modeled in the short-time Fourier transform (STFT) domain to classify individual time-frequency (TF) units to each source. The parameters of these probabilistic models and the TF regions assigned to each source are updated with the expectation-maximization (EM) algorithm. The algorithm generates TF masks that are used to reconstruct the individual speech sources. Objective results, in terms of the signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ), confirm that the proposed multimodal method with pre-processing is a promising approach to source separation in highly reverberant rooms.
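
To make the two stages concrete, the following is a minimal NumPy sketch under simplifying assumptions, not the authors' implementation: the late-reverberation power is estimated with a simple exponential-decay model driven by an assumed reverberation time, and the EM stage clusters only the interaural phase and level differences with diagonal Gaussians, omitting the spatial-covariance model and the video-derived location priors described in the abstract. The function names and parameters here are hypothetical.

```python
# Hypothetical sketch of the two-stage pipeline: spectral-subtraction
# dereverberation followed by EM clustering of interaural cues into TF masks.
import numpy as np

def dereverb_spectral_subtraction(X, t60, hop_s, floor=0.1):
    """Stage 1 (simplified): suppress late reverberation in an STFT X
    (freq x frames) by spectral subtraction, using an exponential-decay
    estimate of the late-reverberant power (an assumption, not the paper's
    exact estimator)."""
    delay = max(1, int(0.05 / hop_s))           # late reverb assumed to start ~50 ms after direct sound
    decay = np.exp(-2 * 6.9 * hop_s * delay / t60)
    P = np.abs(X) ** 2
    P_late = np.zeros_like(P)
    P_late[:, delay:] = decay * P[:, :-delay]   # delayed, attenuated power as the late-reverb estimate
    gain = np.maximum(1.0 - P_late / np.maximum(P, 1e-12), floor)
    return np.sqrt(gain) * X

def em_interaural_masks(XL, XR, n_src=2, n_iter=20):
    """Stage 2 (simplified): EM over per-TF interaural phase and level
    differences, returning one soft TF mask per source."""
    ipd = np.angle(XL * np.conj(XR))
    ild = 20 * np.log10((np.abs(XL) + 1e-12) / (np.abs(XR) + 1e-12))
    # cos/sin of the IPD avoid phase-wrapping issues in the Gaussian model
    feats = np.stack([np.cos(ipd), np.sin(ipd), ild], axis=-1)   # freq x frames x 3
    F, T, D = feats.shape
    x = feats.reshape(-1, D)
    rng = np.random.default_rng(0)
    mu = x[rng.choice(len(x), n_src, replace=False)]             # random initial means
    var = np.var(x, axis=0, keepdims=True).repeat(n_src, axis=0) + 1e-3
    pi = np.full(n_src, 1.0 / n_src)
    for _ in range(n_iter):
        # E-step: posterior probability of each source for every TF unit
        logp = -0.5 * (((x[:, None, :] - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(-1)
        logp += np.log(pi)
        logp -= logp.max(axis=1, keepdims=True)
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means and variances
        nk = post.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (post.T @ x) / nk[:, None]
        var = (post.T @ (x ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return post.T.reshape(n_src, F, T)                            # soft TF masks
```

In this sketch, applying `masks[k]` to the dereverberated left-channel STFT and taking an inverse STFT would give an estimate of source k; the paper's full method additionally conditions the model on the video-estimated source directions and the spatial covariance.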