IEEE Transactions on Audio, Speech, and Language Processing

Hidden Markov Acoustic Modeling With Bootstrap and Restructuring for Low-Resourced Languages

Abstract

This paper proposes an acoustic modeling approach based on bootstrap and restructuring to deal with data sparsity for low-resourced languages. The goal of the approach is to improve the statistical reliability of acoustic modeling for automatic speech recognition (ASR) while meeting the speed, memory, and response-latency requirements of real-world applications. In this approach, randomized hidden Markov models (HMMs) estimated from bootstrapped training data are aggregated for reliable sequence prediction. The aggregation yields an HMM with superior prediction capability, at the cost of a substantially larger model size. For practical use, the aggregated HMM is restructured by Gaussian clustering followed by model refinement. The restructuring aims to reduce the aggregated HMM to a desirable model size while keeping its performance close to that of the original aggregated HMM. To that end, various Gaussian clustering criteria and model refinement algorithms are investigated in the full-covariance model space before conversion to the diagonal-covariance model space in the last stage of the restructuring. Large vocabulary continuous speech recognition (LVCSR) experiments on Pashto and Dari show that acoustic models obtained with the proposed approach yield superior performance to the conventional training procedure with almost the same run-time memory consumption and decoding speed.
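
As a rough illustration of the bootstrap-and-restructure idea for a single HMM state's output density, the sketch below pools full-covariance Gaussians from GMMs fitted to bootstrap resamples of the state's features, then shrinks the pooled mixture by clustering and moment matching before a final diagonal-covariance conversion. This is a minimal sketch only: the k-means-on-means clustering criterion, the moment-matching merge, and all function names and parameters are illustrative assumptions, not the specific clustering criteria or refinement algorithms studied in the paper.

```python
# Illustrative sketch: bootstrap-aggregate a state's Gaussian mixture, then restructure it.
# Uses scikit-learn GMMs as a stand-in for the per-state output densities of an HMM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture


def bootstrap_aggregate(features, n_resamples=10, n_components=4, seed=0):
    """Fit one full-covariance GMM per bootstrap resample and pool all Gaussians."""
    rng = np.random.default_rng(seed)
    weights, means, covs = [], [], []
    for _ in range(n_resamples):
        idx = rng.integers(0, len(features), size=len(features))  # sample with replacement
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="full",
                              random_state=seed).fit(features[idx])
        # Each resample contributes equally to the aggregated mixture.
        weights.append(gmm.weights_ / n_resamples)
        means.append(gmm.means_)
        covs.append(gmm.covariances_)
    return np.concatenate(weights), np.concatenate(means), np.concatenate(covs)


def restructure(weights, means, covs, target_components):
    """Cluster the pooled Gaussians and merge each cluster by moment matching."""
    labels = KMeans(n_clusters=target_components, n_init=10,
                    random_state=0).fit_predict(means)
    new_w, new_mu, new_cov = [], [], []
    for k in range(target_components):
        sel = labels == k
        w, mu, cov = weights[sel], means[sel], covs[sel]
        w_sum = w.sum()
        mu_k = (w[:, None] * mu).sum(axis=0) / w_sum
        # Moment-matched covariance: within-component covariance plus spread of the means.
        diff = mu - mu_k
        cov_k = ((w[:, None, None] * (cov + diff[:, :, None] * diff[:, None, :]))
                 .sum(axis=0) / w_sum)
        new_w.append(w_sum)
        new_mu.append(mu_k)
        new_cov.append(cov_k)
    return np.array(new_w), np.array(new_mu), np.array(new_cov)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy "acoustic features" for one state: two overlapping clusters in 2-D.
    feats = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
                       rng.normal(3.0, 1.5, (200, 2))])
    w, mu, cov = bootstrap_aggregate(feats)            # large aggregated mixture
    w, mu, cov = restructure(w, mu, cov, target_components=4)
    diag_cov = np.array([np.diag(np.diag(c)) for c in cov])  # final diagonal-covariance step
    print("restructured components:", len(w), "total weight:", round(w.sum(), 3))
```

Moment matching preserves each cluster's total weight, mean, and covariance mass, which is one common way to merge Gaussians when compressing a mixture; the paper's actual clustering criteria and refinement stages may differ from this simplified stand-in.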
