Conference proceedings: Advances in natural language processing

Large-Scale Language Modeling with Random Forests for Mandarin Chinese Speech-to-Text



Abstract

In this work, the random forest language modeling approach is applied with the aim of improving the performance of LIMSI's highly competitive Mandarin Chinese speech-to-text system. The experimental setup is that of the GALE Phase 4 evaluation, which is characterized by a large amount of available language model training data (over 3.2 billion segmented words). A conventional unpruned 4-gram language model with a vocabulary of 56K words serves as a baseline that is challenging to improve upon. Nevertheless, moderate perplexity and character error rate (CER) improvements over this model were obtained with a random forest language model. Different random forest training strategies were explored so as to attain the maximal gain in performance, and a Forest of Random Forests language modeling scheme is introduced.
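The random forest language models described in the abstract are ensembles of randomized decision trees over n-gram histories whose member probabilities are averaged. As a rough illustration of that ensemble-averaging idea only, here is a minimal Python sketch in which each "tree" is replaced by a smoothed bigram model trained on a bootstrap resample of the corpus; all function names, the add-alpha smoothing, and the bootstrap scheme are illustrative assumptions, not details from the paper.

```python
import math
import random
from collections import defaultdict

def train_bigram_lm(sentences, vocab, alpha=0.1):
    """Add-alpha smoothed bigram model; returns a function P(w | prev)."""
    unigram = defaultdict(int)   # counts of prev-word positions
    bigram = defaultdict(int)    # counts of (prev, w) pairs
    for sent in sentences:
        tokens = ["<s>"] + sent
        for prev, w in zip(tokens, tokens[1:]):
            unigram[prev] += 1
            bigram[(prev, w)] += 1
    V = len(vocab)
    def prob(prev, w):
        return (bigram[(prev, w)] + alpha) / (unigram[prev] + alpha * V)
    return prob

def train_forest_lm(sentences, vocab, n_members=10, seed=0):
    """Ensemble LM: each member is trained on a bootstrap resample of the
    corpus (a stand-in for the randomized trees of a real forest LM), and
    their probabilities are averaged."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        sample = [rng.choice(sentences) for _ in sentences]
        members.append(train_bigram_lm(sample, vocab))
    def prob(prev, w):
        return sum(m(prev, w) for m in members) / len(members)
    return prob

def perplexity(prob, sentences):
    """Per-token perplexity of a corpus under the given model."""
    logp, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent
        for prev, w in zip(tokens, tokens[1:]):
            logp += math.log(prob(prev, w))
            n += 1
    return math.exp(-logp / n)
```

Because every member distribution normalizes over the vocabulary, the averaged ensemble does too, so it can be evaluated with standard perplexity exactly as the abstract's comparison against the 4-gram baseline suggests.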


