IEEE International Conference on Systems, Man, and Cybernetics

AMRConvNet: AMR-Coded Speech Enhancement Using Convolutional Neural Networks



Abstract

Speech is converted to digital signals using speech coding for efficient transmission. However, this often lowers the quality and bandwidth of speech. This paper explores the application of convolutional neural networks to Artificial Bandwidth Expansion (ABE) and speech enhancement on coded speech, particularly Adaptive Multi-Rate (AMR) coding as used in 2G cellular phone calls. We introduce AMRConvNet: a convolutional neural network that performs ABE and speech enhancement on speech encoded with AMR. The model operates directly in the time domain for both input and output speech, but is optimized using a combined time-domain reconstruction loss and frequency-domain perceptual loss. AMRConvNet yields an average improvement of 0.425 Mean Opinion Score - Listening Quality Objective (MOS-LQO) points for an AMR bitrate of 4.75 kbps, and 0.073 MOS-LQO points for an AMR bitrate of 12.2 kbps. AMRConvNet also shows robustness across different AMR bitrate inputs. Finally, an ablation test shows that the combined time-domain and frequency-domain loss leads to slightly higher MOS-LQO and faster training convergence than using either loss alone.
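The abstract does not spell out the exact loss formulation, so the following is only a minimal PyTorch sketch of how a combined time-domain and frequency-domain objective of this kind is commonly assembled: an L1 waveform reconstruction term plus an STFT log-magnitude term. The function name combined_loss, the choice of L1, the STFT parameters, and the mixing weight alpha are all illustrative assumptions, not details taken from the paper.

import torch
import torch.nn.functional as F

def combined_loss(enhanced, target, alpha=0.5, n_fft=512, hop=128):
    """Hypothetical combined objective: time-domain L1 reconstruction plus an
    STFT log-magnitude term standing in for the frequency-domain perceptual
    loss described in the abstract. Inputs are (batch, time) waveforms."""
    # Time-domain reconstruction loss on the raw waveforms.
    time_loss = F.l1_loss(enhanced, target)

    # Frequency-domain term: compare log-magnitude spectrograms.
    window = torch.hann_window(n_fft, device=enhanced.device)
    spec_e = torch.stft(enhanced, n_fft, hop, window=window, return_complex=True).abs()
    spec_t = torch.stft(target, n_fft, hop, window=window, return_complex=True).abs()
    freq_loss = F.l1_loss(torch.log1p(spec_e), torch.log1p(spec_t))

    # Weighted sum; alpha is an assumed mixing weight, not from the paper.
    return alpha * time_loss + (1.0 - alpha) * freq_loss

In a sketch like this, the frequency-domain term is what pushes the model toward perceptually relevant spectral structure, which is consistent with the abstract's observation that the combined loss converges faster and scores slightly higher MOS-LQO than either term alone.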
