IEEE/ACM Transactions on Audio, Speech, and Language Processing

A Joint Learning Algorithm for Complex-Valued T-F Masks in Deep Learning-Based Single-Channel Speech Enhancement Systems

Abstract

This paper presents a joint learning algorithm for complex-valued time-frequency (T-F) masks in single-channel speech enhancement systems. Most speech enhancement algorithms that operate with a single microphone aim to enhance the magnitude component in the T-F domain, while the noisy input phase is reused without any processing. Consequently, the mismatch between the processed magnitude and the unprocessed phase degrades sound quality. To address this issue, a learning method that targets a T-F mask defined in the complex domain has recently been proposed. However, because the complex-valued T-F mask has a wide dynamic range and an irregular spectrogram pattern, it is difficult to learn even with a large-scale deep learning network. Moreover, a learning process that targets the T-F mask itself does not directly minimize distortion in the spectral or time domain. To address these concerns, we focus on three issues: 1) effective estimation of complex numbers with a wide dynamic range; 2) a learning method that is directly related to speech enhancement performance; and 3) a way to resolve the mismatch between the estimated magnitude and phase spectra. In this study, we propose objective functions that address each of these issues and train the network by minimizing them within a joint learning framework. Evaluation results demonstrate that the proposed learning algorithm achieves significant performance improvements on various objective measures and in a subjective preference listening test.
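The abstract does not spell out the paper's objective functions, so the following is only a rough NumPy sketch of the quantities it discusses, under stated assumptions. It computes an oracle complex ratio mask M = S / Y on synthetic random "spectrograms" (not real speech), bounds the mask's wide dynamic range with the tanh-style compression commonly used for complex ideal ratio masks (the constants K = 10 and C = 0.1 are assumed), contrasts complex masking with magnitude-only enhancement that reuses the noisy phase, and forms a toy loss that adds a spectral-domain term to a mask-domain term. All function names and the weight `alpha` are hypothetical; this is not the authors' algorithm.

```python
import numpy as np

# Assumed compression constants (values common in cIRM work, not from this paper):
# K bounds the compressed mask to [-K, K], C sets how quickly it saturates.
K, C = 10.0, 0.1


def complex_ratio_mask(S, Y, eps=1e-12):
    """Oracle complex ratio mask M = S / Y, written as S * conj(Y) / |Y|^2."""
    return S * np.conj(Y) / (np.abs(Y) ** 2 + eps)


def compress(x):
    """Map a real array into [-K, K]: K * (1 - e^{-Cx}) / (1 + e^{-Cx}) = K * tanh(Cx / 2)."""
    return K * np.tanh(C * x / 2.0)


def decompress(y):
    """Approximate inverse of compress(); clipping keeps arctanh finite at saturation."""
    y = np.clip(y, -K + 1e-6, K - 1e-6)
    return (2.0 / C) * np.arctanh(y / K)


def compress_mask(M):
    """Compress the real and imaginary parts separately."""
    return compress(M.real) + 1j * compress(M.imag)


def decompress_mask(Mc):
    """Undo compress_mask() part by part."""
    return decompress(Mc.real) + 1j * decompress(Mc.imag)


def joint_loss(Mc_hat, S, Y, alpha=0.5):
    """Toy joint objective: mask-domain MSE plus spectral-domain MSE.

    alpha is an arbitrary illustrative weight; the paper's objectives differ.
    """
    Mc_true = compress_mask(complex_ratio_mask(S, Y))
    mask_loss = np.mean(np.abs(Mc_hat - Mc_true) ** 2)
    S_hat = decompress_mask(Mc_hat) * Y  # apply the estimated complex mask
    spec_loss = np.mean(np.abs(S_hat - S) ** 2)
    return alpha * mask_loss + (1.0 - alpha) * spec_loss


# Synthetic stand-ins for STFTs (frequency bins x frames); not real speech.
rng = np.random.default_rng(0)
shape = (257, 100)
S = rng.normal(size=shape) + 1j * rng.normal(size=shape)  # "clean" spectrum
N = rng.normal(size=shape) + 1j * rng.normal(size=shape)  # noise spectrum
Y = S + N                                                 # noisy mixture

M = complex_ratio_mask(S, Y)
print("largest |Re M| (wide dynamic range):", np.abs(M.real).max())
print("largest |Re M| after compression:", np.abs(compress_mask(M).real).max())

# Magnitude-only enhancement: the exact clean magnitude, but paired with the noisy phase.
S_mag_only = np.abs(S) * np.exp(1j * np.angle(Y))
# Complex masking with the oracle mask recovers S (up to eps).
S_cplx = M * Y
err_mag = np.mean(np.abs(S_mag_only - S) ** 2)
err_cplx = np.mean(np.abs(S_cplx - S) ** 2)
print("MSE, magnitude + noisy phase:", err_mag)
print("MSE, complex mask:", err_cplx)
```

Even with a perfect magnitude, reusing the noisy phase leaves a clearly nonzero error, which is the magnitude/phase mismatch the abstract describes. The clipping in `decompress()` also means that, with these constants, mask components beyond about ±170 cannot be recovered; that is one concrete cost of compressing a wide-range target.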