2017 IEEE International Joint Conference on Biometrics

Extracting sub-glottal and Supra-glottal features from MFCC using convolutional neural networks for speaker identification in degraded audio signals



Abstract

We present a deep-learning-based algorithm for speaker recognition from degraded audio signals. We use the commonly employed Mel-Frequency Cepstral Coefficients (MFCC) to represent the audio signals. A convolutional neural network (CNN) based on 1D filters, rather than 2D filters, is then designed. The filters in the CNN are designed to learn the inter-dependency between cepstral coefficients extracted from audio frames of fixed temporal expanse. Our approach aims at extracting speaker-dependent features of the human speech production apparatus, such as sub-glottal and supra-glottal features, for identifying speakers from degraded audio signals. The performance of the proposed method is compared against existing baseline schemes on both synthetically and naturally corrupted speech data. Experiments demonstrate the efficacy of the proposed architecture for speaker recognition.
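To illustrate the core idea of 1D (rather than 2D) filters over MFCC features, the following is a minimal sketch in pure Python. It is not the paper's implementation: the frame count, coefficient dimension, kernel length, and all values are hypothetical, and a real system would use a deep-learning framework with many learned filters, nonlinearities, and pooling. The sketch only shows how a single 1D filter spans the full coefficient dimension of each frame and slides along the time axis.

```python
# Hypothetical sketch: one 1D convolutional filter applied to an MFCC
# matrix (frames x coefficients). Unlike a 2D filter, the kernel covers
# ALL cepstral coefficients of each frame and slides only in time, so it
# can model inter-dependencies between coefficients within a window.

def conv1d(frames, kernel, stride=1):
    """Slide a 1D kernel along the time axis of an MFCC matrix.

    frames: list of frames, each a list of cepstral coefficients
    kernel: list of per-frame weight vectors (same coefficient dimension)
    Returns one activation per valid window position.
    """
    k = len(kernel)
    out = []
    for t in range(0, len(frames) - k + 1, stride):
        acc = 0.0
        for i in range(k):  # kernel position along time
            for c in range(len(frames[0])):  # every coefficient
                acc += frames[t + i][c] * kernel[i][c]
        out.append(acc)
    return out

# Toy example: 5 frames x 3 coefficients, kernel spanning 2 frames.
mfcc = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        [1.0, 1.0, 0.0],
        [0.0, 1.0, 1.0]]
kernel = [[1.0, 1.0, 1.0],
          [1.0, 1.0, 1.0]]

acts = conv1d(mfcc, kernel)
print(acts)  # one activation per window: 4 windows for 5 frames, kernel length 2
```

In a trained network, many such kernels would be learned jointly, and their activations passed through further layers before the final speaker classification.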


