Source: Journal of computer sciences

A FRAMEWORK FOR MULTILINGUAL TEXT-INDEPENDENT SPEAKER IDENTIFICATION SYSTEM


Abstract

This article evaluates the performance of the Extreme Learning Machine (ELM) and the Gaussian Mixture Model (GMM) for text-independent multilingual speaker identification on recorded and synthesized speech. It analyzes three factors that play a vital role in identification accuracy: the type and number of filters in the filter bank, the number of samples in each frame of the speech signal, and the fusion of model scores. The ELM uses a single-hidden-layer feedforward neural network for multilingual speaker identification. The individual Gaussian components of the GMM represent speaker-dependent spectral shapes that are effective for modeling speaker identity. Both modeling techniques use Linear Predictive Residual Cepstral Coefficient (LPRCC), Mel Frequency Cepstral Coefficient (MFCC), Modified Mel Frequency Cepstral Coefficient (MMFCC) and Bark Frequency Cepstral Coefficient (BFCC) features to represent the speaker-specific attributes of speech signals. Experimental results show that the GMM outperforms the ELM, reaching a speaker identification accuracy of 97.5% with a frame size of 256 samples, a frame shift of half the frame size, and a filter bank of 40 filters.
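To make two of the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows (a) splitting a signal into frames of 256 samples with a shift of half the frame size, and (b) a generic single-hidden-layer ELM classifier. The names `frame_signal` and `ELM`, the tanh activation, the Gaussian random weights, and the pseudoinverse solution for the output weights are all assumptions made for illustration; the abstract does not specify them. Feature extraction (LPRCC, MFCC, MMFCC, BFCC) is not shown, so the classifier simply takes an arbitrary feature matrix.

```python
import numpy as np


def frame_signal(signal, frame_size=256, frame_shift=128):
    """Split a 1-D signal into overlapping frames.

    A trailing partial frame is dropped. The defaults mirror the best
    setting reported in the abstract (256 samples, shift of half a frame).
    """
    n_frames = 1 + (len(signal) - frame_size) // frame_shift
    starts = frame_shift * np.arange(n_frames)[:, None]
    offsets = np.arange(frame_size)[None, :]
    return signal[starts + offsets]


class ELM:
    """Hypothetical minimal Extreme Learning Machine classifier.

    Hidden-layer weights are random and never trained; only the output
    weights are fitted, here by least squares via the pseudoinverse
    (an assumed, commonly used formulation).
    """

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer activations (tanh is an assumed choice).
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # y holds integer class labels 0..K-1 (for example, speaker indices).
        n_classes = int(y.max()) + 1
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(n_classes)[y]  # one-hot target matrix
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)
```

As a usage illustration, `frame_signal(np.arange(1024.0))` yields 7 frames of 256 samples each, and `ELM(n_hidden=20).fit(X, y).predict(X)` returns one predicted label per row of a feature matrix `X`.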
