Journal: IEEE Transactions on Speech and Audio Processing

Rapid discriminative acoustic model based on eigenspace mapping for fast speaker adaptation



Abstract

It is widely believed that strong correlations exist across an utterance as a consequence of time-invariant characteristics of speaker and acoustic environments. It is verified in this paper that the first primary eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a novel family of fast speaker adaptation algorithms entitled Eigenspace Mapping (EigMap) is proposed. The proposed algorithms are applied to continuous density Hidden Markov Model (HMM) based speech recognition. The EigMap algorithm rapidly constructs discriminative acoustic models in the test speaker's eigenspace by preserving discriminative information learned from baseline models in the directions of the test speaker's eigenspace. Moreover, the adapted models are compressed by discarding model parameters that are assumed to contain no discrimination information. The core idea of EigMap can be extended in many ways, and a family of algorithms based on EigMap is described in this paper. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data with superior performance to conventional adaptation techniques such as MLLR and block diagonal MLLR. A relative improvement of 18.4% over a baseline recognizer is achieved using EigMap with only about 4.5 s of adaptation data. Furthermore, it is also demonstrated that EigMap is additive to MLLR by encompassing important speaker dependent discriminative information. A significant relative improvement of 24.6% over baseline is observed using 4.5 s of adaptation data by combining MLLR and EigMap techniques.
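
The abstract's starting point, the leading eigendirections of an utterance covariance matrix, can be illustrated with a minimal sketch. It assumes NumPy, synthetic feature frames, and two hypothetical helper names, `leading_eigendirections` and `project_means`, none of which come from the paper. It shows only the eigendecomposition and a projection that discards the remaining directions. It is not the EigMap algorithm itself, whose details the abstract does not give.

```python
import numpy as np


def leading_eigendirections(frames, k):
    """Return the k leading eigenvectors (as columns) and their eigenvalues
    for the covariance matrix of `frames`, shape (num_frames, dim).

    Hypothetical helper for illustration; not taken from the paper.
    """
    frames = np.asarray(frames, dtype=float)
    cov = np.cov(frames, rowvar=False)        # (dim, dim) sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    return eigvecs[:, order], eigvals[order]


def project_means(means, directions):
    """Express mean vectors (num_gaussians, dim) in the coordinates of the
    given eigendirections (dim, k), dropping all other directions.

    Assumption: this stands in for "discarding parameters" only loosely.
    """
    return np.asarray(means, dtype=float) @ directions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "utterance": 450 frames of 3-dim features,
    # with most of the variance placed on axis 0.
    frames = rng.normal(size=(450, 3)) * np.array([5.0, 1.0, 0.2])
    directions, variances = leading_eigendirections(frames, k=2)
    means = rng.normal(size=(4, 3))           # hypothetical Gaussian means
    print(directions.shape, project_means(means, directions).shape)
```

`np.linalg.eigh` is used because a covariance matrix is symmetric. Real acoustic features and HMM mean vectors would replace the synthetic arrays.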
