2017 IEEE International Joint Conference on Biometrics

Extracting sub-glottal and Supra-glottal features from MFCC using convolutional neural networks for speaker identification in degraded audio signals



Abstract

We present a deep-learning-based algorithm for speaker recognition from degraded audio signals. We use the commonly employed Mel-Frequency Cepstral Coefficients (MFCC) to represent the audio signals. A convolutional neural network (CNN) based on 1D filters, rather than 2D filters, is then designed. The filters in the CNN are designed to learn the inter-dependency between cepstral coefficients extracted from audio frames of fixed temporal expanse. Our approach aims at extracting speaker-dependent features of the human speech production apparatus, such as sub-glottal and supra-glottal features, for identifying speakers from degraded audio signals. The performance of the proposed method is compared against existing baseline schemes on both synthetically and naturally corrupted speech data. Experiments demonstrate the efficacy of the proposed architecture for speaker recognition.
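To illustrate the core idea of 1D (rather than 2D) filters over MFCC features, the following is a minimal sketch in pure Python. It is not the paper's implementation: the frame count, coefficient dimension, kernel length, and all values are hypothetical, and a real system would use a deep-learning framework with many learned filters, nonlinearities, and pooling. The sketch only shows how a single 1D filter spans the full coefficient dimension of each frame and slides along the time axis.

```python
# Hypothetical sketch: one 1D convolutional filter applied to an MFCC
# matrix (frames x coefficients). Unlike a 2D filter, the kernel covers
# ALL cepstral coefficients of each frame and slides only in time, so it
# can model inter-dependencies between coefficients within a window.

def conv1d(frames, kernel, stride=1):
    """Slide a 1D kernel along the time axis of an MFCC matrix.

    frames: list of frames, each a list of cepstral coefficients
    kernel: list of per-frame weight vectors (same coefficient dimension)
    Returns one activation per valid window position.
    """
    k = len(kernel)
    out = []
    for t in range(0, len(frames) - k + 1, stride):
        acc = 0.0
        for i in range(k):  # kernel position along time
            for c in range(len(frames[0])):  # every coefficient
                acc += frames[t + i][c] * kernel[i][c]
        out.append(acc)
    return out

# Toy example: 5 frames x 3 coefficients, kernel spanning 2 frames.
mfcc = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        [1.0, 1.0, 0.0],
        [0.0, 1.0, 1.0]]
kernel = [[1.0, 1.0, 1.0],
          [1.0, 1.0, 1.0]]

acts = conv1d(mfcc, kernel)
print(acts)  # one activation per window: 4 windows for 5 frames, kernel length 2
```

In a trained network, many such kernels would be learned jointly, and their activations passed through further layers before the final speaker classification.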


