
Toward more effective acoustic model clustering by more efficient use of data in speech recognition.



Abstract

In current large vocabulary continuous speech recognition systems, multivariate Gaussian mixture distributions and context-dependent phones, typically triphones, are used to build high-accuracy acoustic models. A central problem is how to estimate an extremely large number of model parameters from a limited amount of training data. The traditional approach reduces the number of free parameters with context clustering based on phonetic decision trees. However, this approach has several problems that can degrade system performance, all of which stem from the fact that it does not use the limited training data efficiently and therefore fails to obtain effective acoustic models. Specifically, three problems are identified and addressed. The first is that all states clustered in a leaf node must share the same set of Gaussian components and mixture weights; no distinction is made among those states. The second is that triphones rarely seen in the training data may be poorly estimated, which adversely affects decision-tree clustering, and the traditional approach lacks an effective mechanism to handle this. The third is that only single-Gaussian distributions are used to build the decision trees, whereas multiple-Gaussian mixture distributions are used in the final model set.

In this thesis, we propose to improve the quality of acoustic models by using the training data more efficiently. We present several ways to address the problems of the traditional approach: a two-level decision tree approach for the first problem; a two-stage decision tree based approach and a MAP-based approach for the second problem; and an approach using a new criterion and an effective clustering algorithm for the third problem. Each of these approaches reduced the word error rate (WER) of the traditional approach by a statistically significant margin. Finally, the system combining all the new approaches achieved the best performance, reducing the WER of the baseline system by 14% to 17% relative while using acoustic models 8% to 11% smaller than the baseline models.
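
The abstract does not state the clustering criterion used by the traditional approach. As an illustration only, the sketch below assumes the widely used single-Gaussian log-likelihood gain for evaluating a phonetic question at a tree node, which also shows why the third problem arises: each candidate split is scored with one shared diagonal Gaussian, not with a mixture. The names `StateStats`, `pooled_log_likelihood`, and `split_gain` are hypothetical and do not come from the thesis.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StateStats:
    """Sufficient statistics of one triphone state (hypothetical container)."""
    occupancy: float        # soft count of frames aligned to the state
    mean: Sequence[float]   # per-dimension mean
    var: Sequence[float]    # per-dimension (diagonal) variance


def pooled_log_likelihood(states: List[StateStats]) -> float:
    """Approximate log likelihood of all frames in `states` when they
    share a single diagonal-covariance Gaussian (assumed criterion)."""
    total = sum(s.occupancy for s in states)
    if total <= 0.0:
        return 0.0
    dim = len(states[0].mean)
    log_det = 0.0
    for d in range(dim):
        # Occupancy-weighted first and second moments of the pooled data.
        first = sum(s.occupancy * s.mean[d] for s in states) / total
        second = sum(s.occupancy * (s.var[d] + s.mean[d] ** 2)
                     for s in states) / total
        pooled_var = max(second - first ** 2, 1e-8)  # floor for stability
        log_det += math.log(pooled_var)
    return -0.5 * total * (dim * (1.0 + math.log(2.0 * math.pi)) + log_det)


def split_gain(yes: List[StateStats], no: List[StateStats]) -> float:
    """Log-likelihood gain from splitting a node into `yes` and `no` children."""
    return (pooled_log_likelihood(yes) + pooled_log_likelihood(no)
            - pooled_log_likelihood(yes + no))


# Toy, made-up statistics: two states whose means differ strongly.
a = StateStats(occupancy=100.0, mean=[0.0], var=[1.0])
b = StateStats(occupancy=100.0, mean=[4.0], var=[1.0])
print(split_gain([a], [b]))  # large positive gain: the split is worthwhile
print(split_gain([a], [a]))  # identical states: no gain from splitting
```

Under this assumed criterion, a state with a very small occupancy contributes noisy moments, which is one way to see the second problem the abstract describes: rarely seen triphones can distort the gains that drive the tree.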
