Journal: Journal of Telecommunications and Information Technology

Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks

Abstract

The field of speech and speaker recognition has grown rapidly as artificial intelligence algorithms have been applied to it. Speech conveys not only a message in the spoken language but also the speaker's emotions, gender and identity. Many real healthcare applications are built on speech and speaker recognition; for example, a voice-controlled wheelchair lets the user control the chair by voice. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coefficient (MFCC) speech features, with classification performed by a Deep Neural Network (DNN). In the first phase, features are extracted using MFCC and then optimized using the GA. In the second phase, the DNN is trained. The proposed model is evaluated and validated in a real environment, and its efficiency is measured in terms of accuracy, precision, recall, sensitivity and specificity. The paper also evaluates several feature extraction methods, namely linear predictive coding coefficients (LPCC), perceptual linear prediction (PLP), MFCC and relative spectra filtering (RASTA), all of them used in combined speaker and speech recognition systems. Different methods based on existing techniques are also compared for both clean and noisy environments.
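
The abstract does not say how the GA encodes or scores candidate feature sets, so the following is only a minimal sketch of one common reading of "feature optimization using GA": each individual is a binary mask over the feature columns, and its fitness is the accuracy of a classifier trained on the selected columns. Everything here is an assumption made for illustration: the names `fitness` and `ga_select`, the population size, mutation rate and tournament selection, the nearest-centroid classifier (a cheap stand-in for the DNN), and the synthetic 13-column data (standing in for MFCC vectors).

```python
import numpy as np

def fitness(mask, X, y):
    """Training accuracy of a nearest-centroid classifier on the selected columns.
    Assumption: this cheap classifier stands in for the paper's DNN."""
    if not mask.any():
        return 0.0  # an empty feature set cannot classify anything
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y).mean())

def ga_select(X, y, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Search for a boolean feature mask with a simple genetic algorithm.
    All hyperparameters are illustrative, not taken from the paper."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    pop = rng.random((pop_size, n_features)) < 0.5  # random initial masks
    best_mask, best_fit = pop[0].copy(), -1.0
    for _ in range(generations):
        fits = np.array([fitness(ind, X, y) for ind in pop])
        i = int(fits.argmax())
        if fits[i] > best_fit:  # remember the best mask seen so far
            best_fit, best_mask = float(fits[i]), pop[i].copy()
        # Tournament selection: the fitter of two random individuals becomes a parent.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        # Single-point crossover between consecutive pairs of parents.
        children = parents.copy()
        for j in range(0, pop_size - 1, 2):
            cut = int(rng.integers(1, n_features))
            children[j, cut:] = parents[j + 1, cut:]
            children[j + 1, cut:] = parents[j, cut:]
        # Bit-flip mutation.
        flips = rng.random(children.shape) < mutation_rate
        pop = children ^ flips
    return best_mask, best_fit

# Synthetic stand-in for MFCC vectors: 200 frames, 13 coefficients,
# with only the first three columns carrying class information.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 13))
X[:, :3] += 2.0 * y[:, None]

mask, score = ga_select(X, y)
print(mask.astype(int), round(score, 3))
```

In a real pipeline the rows of `X` would come from an MFCC extractor and `fitness` would use held-out accuracy of the trained network rather than training accuracy of a centroid classifier; the sketch only shows the selection, crossover and mutation loop.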