
Combining Statistical Language Models via the Latent Maximum Entropy Principle

Abstract

We present a unified probabilistic framework for statistical language modeling which can simultaneously incorporate various aspects of natural language, such as local word interaction, syntactic structure and semantic document information. Our approach is based on a recent statistical inference principle we have proposed, the latent maximum entropy principle, which allows relationships over hidden features to be effectively captured in a unified model. Our work extends previous research on maximum entropy methods for language modeling, which only allow observed features to be modeled. The ability to conveniently incorporate hidden variables allows us to extend the expressiveness of language models while alleviating the necessity of pre-processing the data to obtain explicitly observed features. We describe efficient algorithms for marginalization, inference and normalization in our extended models. We then use these techniques to combine two standard forms of language models: local lexical models (Markov N-gram models) and global document-level semantic models (probabilistic latent semantic analysis). Our experimental results on the Wall Street Journal corpus show that we obtain an 18.5% reduction in perplexity compared to the baseline tri-gram model with Good-Turing smoothing.
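For context, a minimal sketch of the constrained optimization that the latent maximum entropy (LME) principle refers to; the notation (x for observed data such as words, y for hidden variables such as topics, \tilde{p} for the empirical distribution, f_i for feature functions) is assumed here rather than taken from this abstract:

\[
p^{*} \;=\; \arg\max_{p}\; -\sum_{x,y} p(x,y)\,\log p(x,y)
\quad\text{subject to}\quad
\sum_{x,y} p(x,y)\, f_i(x,y) \;=\; \sum_{x} \tilde{p}(x) \sum_{y} p(y \mid x)\, f_i(x,y),
\qquad i = 1,\dots,N.
\]

When there are no hidden variables, the right-hand side reduces to the ordinary empirical feature expectation and the standard maximum entropy principle is recovered. With hidden variables the constraints depend on p(y | x) and are therefore nonlinear in p, so solutions are sought among log-linear models p_\lambda(x,y) \propto \exp(\sum_i \lambda_i f_i(x,y)) by an EM-style iteration; in the application described above, one would expect the features to range over N-gram events and PLSA-style topic-word co-occurrences.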
