A consolidated view of loss functions for supervised deep learning-based speech enhancement


Abstract

Deep learning-based speech enhancement for real-time applications has recently made large advances. Due to the lack of a tractable perceptual optimization target, many myths around training losses have emerged, while in many cases the contribution of the loss function to success has not been investigated in isolation from other factors such as network architecture, features, or training procedures. In this work, we investigate a wide variety of spectral loss functions for a recurrent neural network architecture suitable for online frame-by-frame processing. We relate magnitude-only losses to phase-aware losses, ratios, correlation metrics, and compressed metrics. Our results reveal that combining magnitude-only with phase-aware objectives always leads to improvements, even when the phase is not enhanced. Furthermore, using compressed spectral values also yields a significant improvement. On the other hand, phase-sensitive improvement is best achieved by linear-domain losses such as mean absolute error.
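
The abstract names the loss families but gives no formulas, so the following is only a minimal sketch of how a magnitude-only term, a phase-aware term, and spectral compression might be blended, next to a linear-domain mean absolute error. The function names, the compression exponent `c`, and the weight `lam` are illustrative assumptions and are not taken from this record.

```python
import numpy as np


def compressed_spectral_loss(s_hat, s, c=0.3, lam=0.5):
    """Blend a magnitude-only and a phase-aware squared error on compressed spectra.

    s_hat, s : complex STFT arrays of identical shape (estimate, clean target).
    c        : magnitude compression exponent (assumed value, for illustration).
    lam      : weight of the phase-aware term in [0, 1] (assumed value).
    """
    # Power-law compression applied to the magnitudes only.
    mag_hat = np.abs(s_hat) ** c
    mag = np.abs(s) ** c

    # Magnitude-only term: ignores phase entirely.
    mag_term = np.mean((mag_hat - mag) ** 2)

    # Phase-aware term: compressed magnitude recombined with the original phase,
    # then compared in the complex plane.
    comp_hat = mag_hat * np.exp(1j * np.angle(s_hat))
    comp = mag * np.exp(1j * np.angle(s))
    phase_term = np.mean(np.abs(comp_hat - comp) ** 2)

    return (1.0 - lam) * mag_term + lam * phase_term


def complex_mae(s_hat, s):
    """Linear-domain mean absolute error between complex spectra."""
    return np.mean(np.abs(s_hat - s))
```

With `lam=0.0` the sketch reduces to a magnitude-only loss and with `lam=1.0` to a purely phase-aware one; intermediate values mix the two objectives in the way the abstract describes qualitatively.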


Bibliographic details

Source
International Conference on Telecommunications and Signal Processing | 2021 | pp. 72-76 | 5 pages
Authors
Sebastian Braun; Ivan Tashev
Original format: PDF
Keywords
Training; Measurement; Recurrent neural networks; Frequency-domain analysis; Speech enhancement; Signal processing; Network architecture


Similar literature


1. On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement [J]. Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020.
2. Contextual deep learning-based audio-visual switching for speech enhancement in real-world environments [J]. Information Fusion, 2020.
3. Using Generalized Gaussian Distributions to Improve Regression Error Modeling for Deep Learning-Based Speech Enhancement [J]. Chai Li, Du Jun, Liu Qing-Feng. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, no. 12.
4. Loss Functions for Deep Monaural Speech Enhancement [C]. Jan Freiwald, Lea Schönherr, Christopher Schymura. International Joint Conference on Neural Networks, 2020.
5. Deep Learning-Based Reconstruction of Volumetric CT Images of Vertebrae from a Single View X-Ray Image [D]. Xiang, Mingren. 2020.
6. Biosignal Sensors and Deep Learning-Based Speech Recognition: A Review [O]. Wookey Lee, Jessica Jiwon Seong, Busra Ozlu. 2021.
7. Progressive loss functions for speech enhancement with deep neural networks [O]. Jorge Llombart, Dayana Ribas, Antonio Miguel. 2021.
