On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement

Morten Kolbæk; Zheng-Hua Tan; Søren Holdt Jensen; Jesper Jensen



Abstract

Many deep learning-based speech enhancement algorithms are designed to minimize the mean-square error (MSE) in some transform domain between a predicted and a target speech signal. However, optimizing for MSE does not necessarily guarantee high speech quality or intelligibility, which is the ultimate goal of many speech enhancement algorithms. Additionally, little is known about the impact of the loss function on the emerging class of time-domain deep learning-based speech enhancement systems. We study how popular loss functions influence the performance of these systems. First, we demonstrate that perceptually inspired loss functions might be advantageous over classical loss functions like MSE. Furthermore, we show that the learning rate is a crucial design parameter even for adaptive gradient-based optimizers, a point that has generally been overlooked in the literature. Also, we find that waveform-matching performance metrics must be used with caution, as they can fail completely in certain situations. Finally, we show that a loss function based on scale-invariant signal-to-distortion ratio (SI-SDR) achieves good general performance across a range of popular speech enhancement evaluation metrics, which suggests that SI-SDR is a good candidate as a general-purpose loss function for speech enhancement systems.
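The abstract names SI-SDR and MSE as loss functions but does not give their formulas. The sketch below is a minimal illustration, assuming the commonly used definition SI-SDR = 10 log10(||a s||^2 / ||a s - s_hat||^2) with a = <s_hat, s> / ||s||^2, where s is the clean target and s_hat the estimate. The function names, the `eps` stabilizer, and the use of NumPy are choices made for this example and are not taken from the paper.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB for two 1-D waveforms (assumed definition)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Optimal scaling of the target: projection of the estimate onto the target.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target
    # Everything in the estimate that the scaled target does not explain.
    residual = estimate - projection
    ratio = (np.sum(projection ** 2) + eps) / (np.sum(residual ** 2) + eps)
    return 10.0 * np.log10(ratio)


def si_sdr_loss(estimate, target):
    """Negative SI-SDR, so that minimizing the loss maximizes SI-SDR."""
    return -si_sdr(estimate, target)


def mse_loss(estimate, target):
    """Plain time-domain mean-square error, for comparison."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return np.mean((estimate - target) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)                   # hypothetical clean signal
    noisy = clean + 0.1 * rng.standard_normal(16000)     # hypothetical estimate
    print("SI-SDR (dB):", si_sdr(noisy, clean))
    print("SI-SDR of rescaled estimate (dB):", si_sdr(2.0 * noisy, clean))
    print("MSE:", mse_loss(noisy, clean))
    print("MSE of rescaled estimate:", mse_loss(2.0 * noisy, clean))
```

Rescaling the estimate leaves SI-SDR essentially unchanged while MSE grows, which is one way to see why a waveform-matching measure such as MSE can disagree with a scale-invariant one on the same output.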


Bibliographic record

Source
Audio, Speech, and Language Processing, IEEE/ACM Transactions on | 2020, issue 2020 | pp. 825-838 | 14 pages
Authors
Morten Kolbæk; Zheng-Hua Tan; Søren Holdt Jensen; Jesper Jensen
Affiliations

Department of Electronic Systems, Aalborg University, Aalborg, Denmark (listed for each of the four authors)
Original format: PDF
Text language: English (eng)
Keywords
Speech enhancement; Time-domain analysis; Noise measurement; Mean square error methods; Training

Similar literature

1. Supervised monaural speech enhancement using two-level complementary joint sparse representations [J]. Fu Jiafei, Zhang Long, Ye Zhongfu. Applied Acoustics, 2018, issue MARa.
2. FLGCNN: A novel fully convolutional neural network for end-to-end monaural speech enhancement with utterance-based objective functions [J]. Zhu Yuanyuan, Xu Xu, Ye Zhongfu. Applied Acoustics, 2020, issue Deca.
3. A Supervised Learning Approach to Monaural Segregation of Reverberant Speech [J]. Zhaozhang Jin, DeLiang Wang. Audio, Speech, and Language Processing, IEEE Transactions on, 2009, issue 4.
4. Loss Functions for Deep Monaural Speech Enhancement [C]. Jan Freiwald, Lea Schönherr, Christopher Schymura. International Joint Conference on Neural Networks, 2020.
5. Design of loss functions and feature transformation for minimum classification error based automatic speech recognition [D]. Ratnagiri, Madhavi Vedula. 2011.
6. Executive functions predict weight loss in a medically supervised weight loss programme [O]. R. Galioto, D. Bond, J. Gunstad. 2016.
7. Supervised and Semi-Supervised Suppression of Background Music in Monaural Speech Recordings [O]. Felix Weninger, Jordi Feliu, Björn Schuller. 2013.
8. Supervised Learning Approach to Monaural Segregation of Reverberant Speech [R]. Jin, Z., Wang, D. 2008.
