Journal: 電子情報通信学会技術研究報告. 言語理解とコミュニケーション. Natural Language Understanding and Models of Communication

Unsupervised Speaker Adaptation Based on HMM Sufficient Statistics Using Multiple Acoustic Models Under Noisy Environment

Abstract

Speaker adaptation in speech recognition is necessary to achieve high accuracy across a wide variety of speakers. On the other hand, using a class-dependent (CD) acoustic model for a specific gender/age class can result in better accuracy than a single speaker-independent (SI) model. In this research, we extend unsupervised speaker adaptation based on HMM Sufficient Statistics (HMM-SS) to multiple databases and multiple initial models, given a wide variety of speech databases. As opposed to the conventional approach, which uses only a single SI model as the base model, the proposed method makes use of multiple CD models to raise the performance of the initial model before adaptation. A speaker's class is estimated from the N-best neighbor speakers, found by Gaussian Mixture Models (GMMs) during speaker selection, and the corresponding CD model is adopted as the base model. Then, unsupervised speaker adaptation is performed by constructing an HMM from the HMM-SS of the selected speakers. Experiments were carried out on two databases, namely adult and senior speakers from JNAS, and testing was performed under noisy conditions (office, crowd, booth, and car noise) at 20 dB SNR. Recognition results show that the proposed method based on multiple models outperforms the conventional approach. Moreover, a comparison with Maximum Likelihood Linear Regression (MLLR) adaptation using 10 supervised utterances confirms that our method performs better with only a single input utterance.
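
The pipeline described in the abstract (GMM-based speaker selection, class estimation from the N-best neighbors, then model construction from pooled sufficient statistics) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The data layout (`gmm`, `cls`, `stats`), the function names, the majority vote used for class estimation, and the use of plain per-Gaussian mean/variance re-estimation are all assumptions made for illustration. The abstract does not specify how the chosen CD base model enters the accumulation, so that step is left out.

```python
import math
from collections import Counter

def gmm_loglik(frames, gmm):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.
    `gmm` is a list of (weight, means, variances) tuples (assumed layout)."""
    total = 0.0
    for x in frames:
        comp = []
        for w, mu, var in gmm:
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comp.append(ll)
        mx = max(comp)  # log-sum-exp over mixture components
        total += mx + math.log(sum(math.exp(c - mx) for c in comp))
    return total / len(frames)

def select_speakers(frames, speakers, n_best):
    """Rank training speakers by GMM likelihood of the input utterance
    and keep the N best (the 'neighbor' speakers)."""
    ranked = sorted(speakers,
                    key=lambda s: gmm_loglik(frames, s["gmm"]),
                    reverse=True)
    return ranked[:n_best]

def estimate_class(selected):
    """Assumed rule: majority vote over the class labels of the N-best."""
    return Counter(s["cls"] for s in selected).most_common(1)[0][0]

def build_from_stats(selected):
    """Pool per-Gaussian statistics (occupancy, sum of x, sum of x^2)
    over the selected speakers and re-estimate means and variances."""
    model = []
    for g in range(len(selected[0]["stats"])):
        occ = sum(s["stats"][g][0] for s in selected)
        dim = len(selected[0]["stats"][g][1])
        mean = [sum(s["stats"][g][1][d] for s in selected) / occ
                for d in range(dim)]
        var = [sum(s["stats"][g][2][d] for s in selected) / occ - mean[d] ** 2
               for d in range(dim)]
        model.append((mean, var))
    return model
```

Because only precomputed statistics are pooled, the sketch never revisits the training audio at adaptation time, and a single input utterance is enough to drive the selection step.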