Journal: Applied Acoustics
Combining adaptive sparse NMF feature extraction and soft mask to optimize DNN for speech enhancement

Abstract

In masking-based deep neural network (DNN) speech enhancement, the time-frequency mask cannot be estimated accurately because the underlying structural information of speech is ignored. This paper proposes a speech enhancement method that combines adaptive sparse non-negative matrix factorization (NMF) feature extraction with a soft mask to optimize the DNN, exploiting the advantage of the sparse matrix in capturing the salient structure of speech together with optimized mask-based prediction. First, considering that either speech or noise may dominate in different noisy speech signals, a new method for estimating the soft mask is proposed, in which the initial soft mask value is estimated from the speech cochleagram and the noise cochleagram. Then, the speech cochleagram and the noise cochleagram are learned separately by sparse NMF (SNMF) to obtain a joint dictionary. The noisy speech is sparsely represented on the joint dictionary, and an adaptive adjustment factor related to changes in the speech and noise dictionaries is added to obtain the sparse coefficients. The sparse coefficients are used as the input of the DNN model, and the initial soft mask value is used as the training label to estimate the final soft mask value. Finally, the estimated soft mask value is combined with the noisy speech cochleagram to obtain the enhanced speech. Compared with other methods, the proposed method increases the average signal-to-noise ratio (SNR) by 1.6039 dB, the average perceptual evaluation of speech quality (PESQ) score by 0.1994, and the average short-time objective intelligibility (STOI) score by 0.0271, which illustrates the advantage of the proposed algorithm. (C) 2020 Elsevier Ltd. All rights reserved.
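
The dictionary-learning and sparse-coding steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the new soft mask formula nor the adaptive adjustment factor, so the sketch substitutes a plain ratio mask S / (S + N) and omits the adjustment factor and the DNN. The function names (`learn_dictionary`, `sparse_code`, `ratio_mask`), the Euclidean cost with an L1 penalty, and the random matrices standing in for cochleagrams are all hypothetical choices made for the example.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def learn_dictionary(V, rank, sparsity=0.1, n_iter=200, seed=0):
    """Sparse NMF (Euclidean cost + L1 penalty on H), multiplicative updates.

    V is a non-negative (channels x frames) matrix. Returns W (channels x rank)
    with unit-norm columns and the activations H (rank x frames).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
        # Normalise atoms and rescale H so that the product W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0, keepdims=True) + EPS
        W /= norms
        H *= norms.T
    return W, H


def sparse_code(V, W, sparsity=0.1, n_iter=200, seed=0):
    """Sparse non-negative activations of V on a fixed dictionary W."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + EPS)
    return H


def ratio_mask(S, N):
    """Assumed stand-in soft mask: speech energy over total energy."""
    return S / (S + N + EPS)


# Synthetic stand-ins for cochleagrams: 64 channels, 200 frames.
rng = np.random.default_rng(0)
F, T, K = 64, 200, 8
speech = rng.random((F, 4)) @ rng.random((4, T))
noise = rng.random((F, 4)) @ rng.random((4, T))
noisy = speech + noise  # assumes speech and noise add in this domain

W_speech, _ = learn_dictionary(speech, K, seed=0)
W_noise, _ = learn_dictionary(noise, K, seed=1)
W_joint = np.hstack([W_speech, W_noise])  # joint dictionary, shape (F, 2K)

H = sparse_code(noisy, W_joint)     # sparse coefficients: the DNN input
label = ratio_mask(speech, noise)   # initial soft mask: the DNN training label
enhanced = label * noisy            # mask applied to the noisy cochleagram

print(H.shape, label.shape)
```

In the method described above, a DNN trained on pairs of sparse coefficients and initial mask values would predict the final mask for unseen noisy speech; the sketch applies the oracle label directly only to show where each quantity enters the pipeline.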