Complex Ratio Masking for Monaural Speech Separation

Williamson Donald S.; Wang Yuxuan; Wang DeLiang

Abstract

Speech separation systems usually operate on the short-time Fourier transform (STFT) of noisy speech and enhance only the magnitude spectrum, leaving the phase spectrum unchanged. This was done because the phase spectrum was long believed to be unimportant for speech enhancement. Recent studies, however, suggest that phase is important for perceptual quality, leading some researchers to consider enhancing both the magnitude and phase spectra. We present a supervised monaural speech separation approach that simultaneously enhances the magnitude and phase spectra by operating in the complex domain. Our approach uses a deep neural network to estimate the real and imaginary components of the ideal ratio mask defined in the complex domain. We report separation results for the proposed method and compare them to related systems. The proposed approach improves over other methods when evaluated with several objective metrics, including the perceptual evaluation of speech quality (PESQ), and in a listening test in which subjects prefer the proposed approach at a rate of at least 69%.
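
The idea of a ratio mask defined in the complex domain can be illustrated with a minimal sketch. It assumes the mask is the complex ratio of the clean STFT to the noisy STFT, so that multiplying the noisy STFT by the mask recovers both magnitude and phase. The function names, array shapes, and random stand-in spectra are hypothetical and not taken from the paper, and the deep neural network that would estimate the mask is not shown.

```python
import numpy as np

def complex_ideal_ratio_mask(clean, noisy, eps=1e-12):
    """Real and imaginary parts of M, where M * noisy is approximately clean.

    Assumption: M = clean / noisy, expanded into real-valued terms.
    """
    yr, yi = noisy.real, noisy.imag
    sr, si = clean.real, clean.imag
    denom = yr**2 + yi**2 + eps  # eps guards against division by zero
    mask_real = (yr * sr + yi * si) / denom
    mask_imag = (yr * si - yi * sr) / denom
    return mask_real, mask_imag

def apply_complex_mask(mask_real, mask_imag, noisy):
    """Complex multiplication of the mask with the noisy spectrum."""
    return (mask_real + 1j * mask_imag) * noisy

# Random complex arrays stand in for STFTs (hypothetical sizes).
rng = np.random.default_rng(0)
shape = (5, 8)  # (frequency bins, time frames)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

mask_real, mask_imag = complex_ideal_ratio_mask(clean, noisy)
estimate = apply_complex_mask(mask_real, mask_imag, noisy)

# Magnitude-only baseline: ideal clean magnitude combined with the noisy phase.
magnitude_only = np.abs(clean) * np.exp(1j * np.angle(noisy))

complex_error = np.max(np.abs(estimate - clean))
magnitude_only_error = np.max(np.abs(magnitude_only - clean))
print(f"complex mask error:   {complex_error:.2e}")
print(f"magnitude-only error: {magnitude_only_error:.2e}")
```

With an ideal (oracle) mask, the complex product reproduces the clean spectrum up to numerical error, whereas the magnitude-only baseline keeps the noisy phase and therefore leaves a residual error even when its magnitude is perfect. In practice the mask would be estimated rather than computed from the clean signal.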


Bibliographic details

Source: Audio, Speech, and Language Processing, IEEE Transactions on, 2016, issue 3, pp. 483-492 (10 pages)
Authors: Williamson Donald S.; Wang Yuxuan; Wang DeLiang
Author affiliation: Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
Original format: PDF
Language: English
Keywords: Complex ideal ratio mask; deep neural networks; speech quality; speech separation

Similar literature

1. Optimum Soft Mask for Monaural Speech Separation System [J]. M. Dharmalingam, M. C. John Wiselin. International journal of computing & information technology, 2019, issue 2.
2. Features for Masking-Based Monaural Speech Separation in Reverberant Conditions [J]. Masood Delfarah, DeLiang Wang. Audio, Speech, and Language Processing, IEEE/ACM Transactions on, 2017, issue 5.
3. Monaural speech/music source separation using discrete energy separation algorithm [J]. Yevgeni Litvin, Israel Cohen, Dan Chazan. Signal processing, 2010, issue 12.
4. Review of Ideal Binary and Ratio Mask Estimation Techniques for Monaural Speech Separation [C]. T. M. Minipriya, R. Rajavel. International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics, 2018.
5. The effect of spatial separation on informational and energetic masking of speech in normal-hearing and hearing-impaired listeners [D]. Arbogast, Tanya Lee. 2003.
6. Complex Ratio Masking for Monaural Speech Separation [O]. Donald S. Williamson, Yuxuan Wang, DeLiang Wang.
7. NMF based speech and music separation in monaural speech recordings with sparseness and temporal continuity constraints [O]. Tu Ming, Xie Xiang, Jiao Yishan. 2013.
8. Deep Ensemble Learning for Monaural Speech Separation [R]. Wang, D. 2015.
