Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing

A New Framework for CNN-Based Speech Enhancement in the Time Domain



Abstract

This paper proposes a new learning mechanism for a fully convolutional neural network (CNN) to address speech enhancement in the time domain. The CNN takes as input the time frames of a noisy utterance and outputs the time frames of the enhanced utterance. At training time, we add an extra operation that converts the time domain to the frequency domain. This conversion corresponds to a simple matrix multiplication and is hence differentiable, implying that a frequency-domain loss can be used to train a model operating in the time domain. We train the CNN with a mean absolute error loss between the enhanced short-time Fourier transform (STFT) magnitude and the clean STFT magnitude. This way, the model can exploit the domain knowledge of converting a signal to the frequency domain for analysis. Moreover, this approach avoids the well-known invalid STFT problem, since the proposed CNN operates in the time domain. Experimental results demonstrate that the proposed method substantially outperforms the other speech enhancement methods evaluated. The proposed method is easy to implement and applicable to related speech processing tasks that require time-frequency masking or spectral mapping.
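The key idea in the abstract, that the STFT is a plain matrix multiplication and therefore differentiable, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it builds the DFT basis as two real matrices (real and imaginary parts), applies them to framed time-domain signals, and computes the mean-absolute-error magnitude loss described above. Function names, the rectangular window, and the small epsilon inside the square root are illustrative assumptions; the actual paper may use a windowed STFT and a deep-learning framework's autograd.

```python
import numpy as np

def stft_matrices(frame_len):
    # DFT basis split into real matrices so the transform is a
    # real-valued matrix multiplication (hence differentiable).
    n = np.arange(frame_len)
    angle = -2.0 * np.pi * n[:, None] * n[None, :] / frame_len
    return np.cos(angle), np.sin(angle)

def stft_magnitude(frames, cos_m, sin_m):
    # frames: (num_frames, frame_len), rectangular window assumed.
    # Each row of the result is the magnitude spectrum of one frame.
    real = frames @ cos_m.T
    imag = frames @ sin_m.T
    # Epsilon keeps the gradient of sqrt finite at zero magnitude.
    return np.sqrt(real ** 2 + imag ** 2 + 1e-12)

def mae_spectral_loss(enhanced_frames, clean_frames, cos_m, sin_m):
    # Mean absolute error between enhanced and clean STFT magnitudes,
    # the training loss described in the abstract.
    mag_enh = stft_magnitude(enhanced_frames, cos_m, sin_m)
    mag_clean = stft_magnitude(clean_frames, cos_m, sin_m)
    return np.mean(np.abs(mag_enh - mag_clean))
```

Because every step is a matrix product or an elementwise operation, the gradient of this loss with respect to the time-domain frames exists everywhere, so a time-domain CNN can be trained against it while its final output remains a waveform, sidestepping the invalid-STFT problem that arises when a network predicts magnitudes directly.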


