IEEE International Conference on Systems, Man, and Cybernetics

AMRConvNet: AMR-Coded Speech Enhancement Using Convolutional Neural Networks



Abstract

Speech is converted to digital signals using speech coding for efficient transmission. However, this often lowers the quality and bandwidth of speech. This paper explores the application of convolutional neural networks to Artificial Bandwidth Expansion (ABE) and speech enhancement on coded speech, particularly Adaptive Multi-Rate (AMR) coding as used in 2G cellular phone calls. We introduce AMRConvNet: a convolutional neural network that performs ABE and speech enhancement on speech encoded with AMR. The model operates directly in the time domain for both input and output speech, but is optimized using a combined time-domain reconstruction loss and frequency-domain perceptual loss. AMRConvNet yields an average improvement of 0.425 Mean Opinion Score - Listening Quality Objective (MOS-LQO) points for an AMR bitrate of 4.75 kbps, and 0.073 MOS-LQO points for an AMR bitrate of 12.2 kbps. AMRConvNet also shows robustness across different AMR bitrate inputs. Finally, an ablation test shows that the combined time-domain and frequency-domain loss leads to slightly higher MOS-LQO and faster training convergence than using either loss alone.
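The abstract does not spell out the exact loss formulation, so the following is only a minimal PyTorch sketch of how a combined time-domain and frequency-domain objective of this kind is commonly assembled: an L1 waveform reconstruction term plus an STFT log-magnitude term. The function name combined_loss, the choice of L1, the STFT parameters, and the mixing weight alpha are all illustrative assumptions, not details taken from the paper.

import torch
import torch.nn.functional as F

def combined_loss(enhanced, target, alpha=0.5, n_fft=512, hop=128):
    """Hypothetical combined objective: time-domain L1 reconstruction plus an
    STFT log-magnitude term standing in for the frequency-domain perceptual
    loss described in the abstract. Inputs are (batch, time) waveforms."""
    # Time-domain reconstruction loss on the raw waveforms.
    time_loss = F.l1_loss(enhanced, target)

    # Frequency-domain term: compare log-magnitude spectrograms.
    window = torch.hann_window(n_fft, device=enhanced.device)
    spec_e = torch.stft(enhanced, n_fft, hop, window=window, return_complex=True).abs()
    spec_t = torch.stft(target, n_fft, hop, window=window, return_complex=True).abs()
    freq_loss = F.l1_loss(torch.log1p(spec_e), torch.log1p(spec_t))

    # Weighted sum; alpha is an assumed mixing weight, not from the paper.
    return alpha * time_loss + (1.0 - alpha) * freq_loss

In a sketch like this, the frequency-domain term is what pushes the model toward perceptually relevant spectral structure, which is consistent with the abstract's observation that the combined loss converges faster and scores slightly higher MOS-LQO than either term alone.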
