Journal: Computational Linguistics

A Scalable Distributed Syntactic, Semantic, and Lexical Language Model



Abstract

This paper presents an attempt at building a large-scale distributed composite language model. The model is formed by seamlessly integrating an n-gram model, a structured language model, and probabilistic latent semantic analysis under a directed Markov random field paradigm, so that it simultaneously accounts for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content. The composite language model is trained with a convergent N-best list approximate EM algorithm, followed by a further EM algorithm to improve word prediction power, on corpora of up to a billion tokens stored on a supercomputer. The large-scale distributed composite language model gives a drastic perplexity reduction over n-grams and, when applied to re-ranking the N-best list from a state-of-the-art parsing-based machine translation system, achieves significantly better translation quality as measured by the BLEU score and the "readability" of translations.
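The two evaluation ideas named in the abstract, perplexity and N-best re-ranking, can be illustrated with a minimal Python sketch. This is not the paper's composite model: a toy add-one-smoothed bigram model stands in for it, and every name here (`train_bigram`, `log_prob`, `perplexity`, `rerank`, `lm_weight`) is a hypothetical choice made only for illustration.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences with boundary markers."""
    histories, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        histories.update(tokens[:-1])            # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))
    # Vocabulary of predictable symbols; one extra slot is reserved for unseen words.
    vocab_size = len({w for s in sentences for w in s} | {"</s>"}) + 1
    return histories, bigrams, vocab_size

def log_prob(sentence, model):
    """Add-one-smoothed bigram log-probability (natural log) of one sentence."""
    histories, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )

def perplexity(sentences, model):
    """exp of the negative average log-probability per predicted token."""
    n_tokens = sum(len(s) + 1 for s in sentences)   # +1 for the </s> prediction
    total = sum(log_prob(s, model) for s in sentences)
    return math.exp(-total / n_tokens)

def rerank(nbest, model, lm_weight=1.0):
    """Sort (tokens, base_score) hypotheses by base score plus weighted LM score."""
    return sorted(
        nbest,
        key=lambda hyp: hyp[1] + lm_weight * log_prob(hyp[0], model),
        reverse=True,
    )

if __name__ == "__main__":
    model = train_bigram([["the", "cat", "sat"], ["the", "dog", "sat"]])
    # Two hypotheses with equal base scores; the language model breaks the tie.
    nbest = [(["sat", "the", "cat"], 0.0), (["the", "cat", "sat"], 0.0)]
    print(rerank(nbest, model)[0][0])
    print(perplexity([["the", "cat", "sat"]], model))
```

In a real system the base score would come from the translation decoder and the weight would be tuned on held-out data; both are assumptions here rather than details taken from the abstract.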
