
A Parallel Training Algorithm for Hierarchical Pitman-Yor Process Language Models


Abstract

The Hierarchical Pitman-Yor Process Language Model (HPYLM) is a Bayesian language model based on a non-parametric prior, the Pitman-Yor process. It has been demonstrated, both theoretically and empirically, that the HPYLM provides better smoothing for language modeling than state-of-the-art approaches such as interpolated and modified Kneser-Ney smoothing. However, estimating Bayesian language models is expensive in both computation time and memory: the inference is approximate and requires a number of iterations to converge. In this paper, we present a parallel training algorithm for the HPYLM which enables the approach to be applied to automatic speech recognition, using large training corpora with large vocabularies. We demonstrate the effectiveness of the proposed algorithm by estimating language models from meeting-transcription corpora containing over 200 million words, and observe significant reductions in perplexity and word error rate.
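For reference, the predictive probability under the HPYLM (following Teh, 2006) makes the connection to Kneser-Ney smoothing explicit. With c_{uw} the number of tokens of word w observed after context u, t_{uw} the number of tables serving w in the Chinese-restaurant representation of that context, c_{u\cdot} and t_{u\cdot} the corresponding totals, d_{|u|} and \theta_{|u|} the discount and strength parameters shared by contexts of length |u|, and \pi(u) the context u with its earliest word dropped:

P(w \mid u) = \frac{c_{uw} - d_{|u|}\, t_{uw}}{\theta_{|u|} + c_{u\cdot}} + \frac{\theta_{|u|} + d_{|u|}\, t_{u\cdot}}{\theta_{|u|} + c_{u\cdot}}\, P(w \mid \pi(u))

Fixing t_{uw} = \min(1, c_{uw}) recovers interpolated Kneser-Ney smoothing, which is why the HPYLM can be read as a Bayesian generalization of Kneser-Ney.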
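The approximate, iterative inference the abstract refers to is Gibbs sampling over Chinese-restaurant seating arrangements. The minimal Python sketch below illustrates that core step for a single restaurant (one context) with a fixed uniform base distribution; it follows the standard sampler of Teh (2006) rather than the parallel algorithm this paper contributes, and the class and parameter names (PYRestaurant, discount, strength) are our own illustrative choices.

import random
from collections import defaultdict

class PYRestaurant:
    """A single Pitman-Yor Chinese restaurant (one n-gram context).

    Tables are grouped by the word they serve; each entry in
    tables[word] is the number of customers (tokens) at one table.
    """

    def __init__(self, discount=0.75, strength=1.0):
        self.d = discount          # discount parameter
        self.theta = strength      # strength parameter
        self.tables = defaultdict(list)
        self.num_customers = 0
        self.num_tables = 0

    def add_customer(self, word, p_base):
        # Join an existing table serving `word` with probability
        # proportional to (count - d); open a new table with probability
        # proportional to (theta + d * total_tables) * p_base.
        counts = self.tables[word]
        weights = [c - self.d for c in counts]
        weights.append((self.theta + self.d * self.num_tables) * p_base)
        k = random.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)       # a new table is opened
            self.num_tables += 1
        else:
            counts[k] += 1
        self.num_customers += 1

    def remove_customer(self, word):
        # Remove one token from a table chosen proportionally to its size.
        counts = self.tables[word]
        k = random.choices(range(len(counts)), weights=counts)[0]
        counts[k] -= 1
        if counts[k] == 0:
            del counts[k]          # an emptied table is closed
            self.num_tables -= 1
        self.num_customers -= 1

    def prob(self, word, p_base):
        # Predictive probability: discounted counts interpolated with
        # p_base, the same form as interpolated Kneser-Ney smoothing.
        c_w, t_w = sum(self.tables[word]), len(self.tables[word])
        denom = self.theta + self.num_customers
        return (c_w - self.d * t_w) / denom + \
               (self.theta + self.d * self.num_tables) * p_base / denom

random.seed(0)
vocab = ["a", "b", "c"]
p0 = 1.0 / len(vocab)              # fixed uniform base distribution
data = ["a", "a", "b", "a", "c", "b"]
rest = PYRestaurant()
for w in data:
    rest.add_customer(w, p0)
for _ in range(10):                # Gibbs sweeps: re-seat every token
    for w in data:
        rest.remove_customer(w)
        rest.add_customer(w, p0)
print({w: round(rest.prob(w, p0), 3) for w in vocab})

In the full hierarchical model, p_base is not a constant but the parent restaurant's predictive probability for the shortened context, and opening or closing a table sends or removes a customer there; these dependencies couple every context in the tree during sampling, which is what makes parallelizing the training non-trivial.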

Bibliographic information

  • Authors

    Huang Songfang; Renals Steve;

  • Affiliation University of Edinburgh
  • Year 2009
  • Pages
  • Format PDF
  • Language English
  • CLC classification
