Maximum entropy language modeling with non-local dependencies.

Abstract

Stochastic language models are an important component of many natural language processing applications, such as automatic speech recognition and machine translation. A language model is a probability measure on word sequences in a language. The most widely used models are N-gram models, which treat a word sequence as a Markov process and predict the next word from the preceding N-1 words. For reasons of data sparseness, N is typically 2-4. N-gram models successfully "learn" local lexical dependencies, but they fail to capture syntactic well-formedness in sentences and semantic coherence within and across sentences.

To improve the performance of language models, two critical problems must be solved: first, deciding what kinds of long-range dependence should be used in language models, and second, determining how dependencies from different sources can be incorporated into a sound model. This dissertation presents a new language model that overcomes some of the shortcomings of N-gram models by combining collocational dependencies with two important sources of long-range dependence: the syntactic structure of a sentence and the topic of a discourse. Maximum entropy techniques, which are particularly well suited to modeling diverse sources of statistical dependence, are used.

Previously known parameter estimation procedures for maximum entropy models have a computational cost that makes them impractical for large-scale applications, including the two language modeling tasks examined in this dissertation. Some fundamental algorithmic improvements to the parameter estimation procedure for maximum entropy models are presented, reducing the computational complexity of parameter estimation by 2-3 orders of magnitude.

Significant improvements due to the new language model over a trigram model are demonstrated in perplexity and in word error rate on the Switchboard and Broadcast News tasks. Experimental results show that topic information is most helpful for content-bearing words, and that syntactic structure is most useful when the words that matter for prediction lie beyond the reach of an N-gram window. The results also show that the topic dependence and the syntactic dependence are complementary, and that the gains from modeling them are nearly additive. A comparison of maximum entropy models with other models proposed in the literature is provided throughout the dissertation.
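For reference, a conditional maximum entropy model of this kind takes the standard exponential form below; the notation is the textbook formulation, not copied from the dissertation, with the features f_i understood as indicators drawn from the N-gram, syntactic, and topic sources described above:

    P(w \mid h) = \frac{1}{Z(h)} \exp\!\Big(\sum_i \lambda_i f_i(w, h)\Big),
    \qquad
    Z(h) = \sum_{w' \in V} \exp\!\Big(\sum_i \lambda_i f_i(w', h)\Big),

where h is the conditioning history (preceding words, syntactic heads, discourse topic) and the weights \lambda_i are chosen so that each feature's expectation under the model matches its empirical expectation, E_P[f_i] = E_{\tilde{P}}[f_i]. The per-history normalizer Z(h), a sum over the whole vocabulary V, is the dominant cost in training such models, which is why faster parameter estimation matters for large-scale tasks.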
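To make the framework concrete, here is a minimal, self-contained Python sketch of a conditional maximum entropy next-word model combining a local bigram feature with topic and syntactic-head features. The toy vocabulary, feature templates, and plain gradient-ascent trainer are illustrative assumptions only; they are not the dissertation's actual feature sets or its accelerated estimation procedure (classical training would use GIS/IIS).

    import math
    from collections import defaultdict

    VOCAB = ["the", "fed", "raised", "rates", "growth", "."]

    def features(history, topic, head, word):
        """Indicator features: one local (bigram) and two non-local
        (topic-word, syntactic head-word) dependencies."""
        return [
            ("bigram", history[-1], word),  # local N-gram dependency
            ("topic", topic, word),         # non-local topic dependency
            ("head", head, word),           # non-local syntactic dependency
        ]

    class MaxEntLM:
        def __init__(self):
            self.weights = defaultdict(float)  # one lambda_i per feature

        def logscore(self, ctx, word):
            return sum(self.weights[f] for f in features(*ctx, word))

        def prob(self, ctx, word):
            # P(w | h) = exp(sum_i lambda_i f_i(h, w)) / Z(h);
            # Z(h) sums over the whole vocabulary.
            z = sum(math.exp(self.logscore(ctx, w)) for w in VOCAB)
            return math.exp(self.logscore(ctx, word)) / z

        def train(self, data, iters=200, lr=0.5):
            # Gradient ascent on conditional log-likelihood:
            # d/d(lambda_i) = empirical count - model expectation.
            for _ in range(iters):
                grad = defaultdict(float)
                for ctx, word in data:
                    for f in features(*ctx, word):
                        grad[f] += 1.0              # empirical count
                    for w in VOCAB:
                        p = self.prob(ctx, w)
                        for f in features(*ctx, w):
                            grad[f] -= p            # model expectation
                for f, g in grad.items():
                    self.weights[f] += lr * g / len(data)

    # Toy usage: each context is (preceding words, topic label, syntactic head).
    data = [
        ((["the"], "ECONOMY", "raised"), "fed"),
        ((["fed"], "ECONOMY", "fed"), "raised"),
        ((["raised"], "ECONOMY", "raised"), "rates"),
    ]
    lm = MaxEntLM()
    lm.train(data)
    print(round(lm.prob((["raised"], "ECONOMY", "raised"), "rates"), 3))

The sketch makes the scaling problem visible: every gradient step recomputes Z(h) over the vocabulary for every training context, which is exactly the cost that the algorithmic improvements summarized in the abstract reduce by orders of magnitude.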
