Multi-dimensional I-vector closed set speaker identification based on an extreme learning machine with and without fusion technologies


Abstract

In this article, I-vector Speaker Identification (SID) is exploited as a compact, low-dimensional, fixed-length, state-of-the-art system. The main structures of this study consist of four feature combinations based on Power Normalized Cepstral Coefficient (PNCC) and Mel Frequency Cepstral Coefficient (MFCC) features, together with two different compensation approaches that have been proposed previously. The main system is modelled with low-dimensional I-vectors, and fusion strategies over higher I-vector dimensions are also proposed to improve the recognition rate. In addition, cumulative, concatenated, and interleaved fusion techniques are investigated to improve on the conventional late fusion presented in our previous work. Moreover, the proposed system employs an Extreme Learning Machine (ELM) for classification, which is efficient, less complex, and less time-consuming than traditional neural-network-based approaches. The system is evaluated on the TIMIT database in clean and AWGN environments, achieving recognition rates of 96.67% and 80.83%, respectively. It outperforms the Gaussian Mixture Model-Universal Background Model (GMM-UBM) of our previously proposed scheme, with an improvement of 1.76% for clean speech and 2.1% at 30 dB AWGN, and the largest improvement, 43.81%, at 10 dB.
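As an illustration of the fusion strategies named in the abstract, the sketch below combines two i-vectors (e.g., one from an MFCC front end and one from a PNCC front end) by concatenation and by interleaving. This is a minimal sketch, not the paper's implementation: the function names and dimensions are hypothetical, and the abstract does not give the exact definitions of the paper's fusion schemes (including the cumulative variant), so only the two self-explanatory ones are shown.

```python
import numpy as np

def concatenated_fusion(iv_mfcc, iv_pncc):
    """Stack the two i-vectors end to end (output dim = d1 + d2)."""
    return np.concatenate([iv_mfcc, iv_pncc])

def interleaved_fusion(iv_mfcc, iv_pncc):
    """Alternate elements from each i-vector (assumes equal dims)."""
    fused = np.empty(iv_mfcc.size + iv_pncc.size)
    fused[0::2] = iv_mfcc
    fused[1::2] = iv_pncc
    return fused

# Example: two hypothetical 100-dimensional i-vectors.
iv_a = np.random.randn(100)
iv_b = np.random.randn(100)
print(concatenated_fusion(iv_a, iv_b).shape)  # (200,)
print(interleaved_fusion(iv_a, iv_b).shape)   # (200,)
```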
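The classification stage can be pictured with a minimal single-hidden-layer ELM: the input weights and biases are drawn at random and never trained, and only the output weights are solved, in closed form, with the Moore-Penrose pseudoinverse. This is a generic textbook ELM sketch in NumPy, not the authors' configuration; the class name, activation choice, and layer sizes are assumptions.

```python
import numpy as np

class ELM:
    """Single-hidden-layer extreme learning machine (generic sketch).

    Hidden-layer weights are random and fixed; the output weights
    are computed in one shot via the Moore-Penrose pseudoinverse.
    """
    def __init__(self, n_inputs, n_hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_inputs, n_hidden))  # fixed random input weights
        self.b = rng.standard_normal(n_hidden)              # fixed random biases
        self.n_classes = n_classes
        self.beta = None                                    # learned output weights

    def _hidden(self, X):
        # Nonlinear random projection of the input (tanh assumed here).
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        T = np.eye(self.n_classes)[y]      # one-hot targets, y holds integer labels
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T  # closed-form least-squares solution
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)
```

Because training reduces to one pseudoinverse instead of iterative backpropagation, this construction is what makes the ELM faster and less complex than a conventionally trained neural network, as the abstract claims.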

Bibliographic details

  • Source: 2017 Intelligent Systems Conference | 2017 | pp. 1141-1146 | 6 pages
  • Venue: London (GB)
  • Author affiliation

    Communications, Sensors, Signal and Information Processing (ComSIP) Group, School of Electrical and Electronics Engineering, Newcastle University, NE1 7RU, UK

  • Format: PDF
  • Language: eng
  • Keywords

    Training; Mel frequency cepstral coefficient; Testing; Databases; Speech; Feature extraction; Hidden Markov models;

