Annual Meeting of the Association for Computational Linguistics (ACL 2012)

Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining



Abstract

Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. However, these language models can be difficult to use in practice because of the time required to generate features for rescoring a large hypothesis set. In this work, we propose substructure sharing, which saves duplicate work in processing hypothesis sets with redundant hypothesis structures. We apply substructure sharing to a dependency parser and part-of-speech tagger to obtain significant speedups, and further improve the accuracy of these tools through up-training. When using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill-climbing rescoring, and show that up-training leads to WER reduction.
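The core idea of substructure sharing described in the abstract can be sketched, very loosely, as memoizing the analysis of word spans that recur across hypotheses in an N-best list, so that shared material is processed once rather than per hypothesis. The fixed-size chunking scheme, the toy `tag_word` heuristic, and all function names below are illustrative assumptions, not the authors' implementation:

```python
from functools import lru_cache

def tag_word(word):
    # Toy stand-in for a real POS tagger decision; the expensive work in
    # practice is running a tagger or dependency parser per hypothesis.
    return "NOUN" if word[0].isupper() else "OTHER"

@lru_cache(maxsize=None)
def tag_span(span):
    # Substructure sharing (illustrative): N-best hypotheses share long
    # common word spans, so cache each span's analysis and reuse it
    # instead of re-analyzing every hypothesis from scratch.
    return tuple(tag_word(w) for w in span)

def tag_hypothesis(words, chunk=2):
    # Split a hypothesis into fixed-size spans and tag each span,
    # hitting the cache whenever a span was already seen.
    tags = []
    for i in range(0, len(words), chunk):
        tags.extend(tag_span(tuple(words[i:i + chunk])))
    return tags
```

On two hypotheses differing only in the final word, e.g. `["the", "Cat", "sat"]` and `["the", "Cat", "sits"]`, the shared leading span is analyzed once and served from the cache the second time, which is the source of the speedups the abstract reports at the scale of large hypothesis sets.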

