Whole-Sentence Exponential Language Models: A Vehicle for Linguistic-Statistical Integration

Abstract

We introduce an exponential language model that models a whole sentence or utterance as a single unit. By avoiding the chain rule, the model treats each sentence as a “bag of features”, where features are arbitrary computable properties of the sentence. The new model is computationally more efficient than the conditional exponential (e.g., Maximum Entropy) models proposed to date, and it is more naturally suited to modeling global sentential phenomena. Using the model is straightforward. Training the model requires sampling from an exponential-family distribution over sentences. We describe the challenge of applying Markov chain Monte Carlo (MCMC) and other sampling techniques to natural language, and discuss smoothing and step-size selection. We then present a novel feature-selection procedure that exploits discrepancies between the existing model and the training corpus. We demonstrate our ideas by constructing and analyzing competitive models in the Switchboard domain that incorporate lexical and syntactic information.
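The abstract describes the model only in prose. The sketch below assumes the form usually written for whole-sentence exponential models, P(s) = P0(s) · exp(Σ_i λ_i f_i(s)) / Z, where P0 is a baseline sentence distribution, each f_i is a feature of the whole sentence, and Z is a normalizer (the abstract itself gives no formula). It illustrates the ideas named above under clearly stated assumptions: a hypothetical unigram baseline over a toy vocabulary, two made-up features, an independence Metropolis-Hastings sampler as one possible MCMC choice, a plain gradient step standing in for whatever update rule and step-size schedule the authors use, and a ranking of candidate features by the gap between corpus and model expectations as a simplified reading of the feature-selection idea. None of the names (sample_baseline, mcmc_samples, train, rank_candidates) come from the paper.

```python
import math
import random

# HYPOTHETICAL toy baseline P0: a unigram model that emits words until "</s>".
# It stands in for a real baseline (e.g. an n-gram model) to keep the sketch small.
VOCAB = ["the", "dog", "barks", "cat", "</s>"]
WEIGHTS = [0.3, 0.2, 0.2, 0.1, 0.2]

def sample_baseline(rng):
    """Draw one whole sentence (a tuple of words) from the baseline P0."""
    words = []
    while True:
        w = rng.choices(VOCAB, weights=WEIGHTS)[0]
        if w == "</s>":
            return tuple(words)
        words.append(w)

# Features: arbitrary computable properties of the WHOLE sentence.
# These two are illustrative only, not taken from the paper.
FEATURES = {
    "len_is_3": lambda s: 1.0 if len(s) == 3 else 0.0,
    "dog_and_barks": lambda s: 1.0 if "dog" in s and "barks" in s else 0.0,
}

def log_weight(s, lambdas):
    """sum_i lambda_i * f_i(s); P(s) is proportional to P0(s) * exp(this value)."""
    return sum(lambdas[name] * f(s) for name, f in FEATURES.items())

def mcmc_samples(lambdas, n, rng):
    """Independence Metropolis-Hastings chain proposing whole sentences from P0.

    Because the proposal is P0 itself, the P0 factors cancel and the acceptance
    probability is min(1, exp(log_weight(proposal) - log_weight(current))),
    so Z is never needed.
    """
    current = sample_baseline(rng)
    out = []
    for _ in range(n):
        proposal = sample_baseline(rng)
        delta = log_weight(proposal, lambdas) - log_weight(current, lambdas)
        if delta >= 0 or rng.random() < math.exp(delta):
            current = proposal
        out.append(current)
    return out

def expectations(sentences, features):
    """Average value of each feature over a list of sentences."""
    return {name: sum(f(s) for s in sentences) / len(sentences)
            for name, f in features.items()}

def train(corpus, steps=30, step_size=1.0, n_samples=2000, seed=0):
    """ASSUMED update rule: lambda_i += step_size * (E_data[f_i] - E_model[f_i]),
    with E_model estimated from MCMC samples (log-likelihood gradient ascent)."""
    rng = random.Random(seed)
    lambdas = {name: 0.0 for name in FEATURES}
    target = expectations(corpus, FEATURES)
    for _ in range(steps):
        model = expectations(mcmc_samples(lambdas, n_samples, rng), FEATURES)
        for name in FEATURES:
            lambdas[name] += step_size * (target[name] - model[name])
    return lambdas, rng

def rank_candidates(candidates, corpus, lambdas, rng, n_samples=2000):
    """Order candidate features by |E_data - E_model|, largest discrepancy first."""
    target = expectations(corpus, candidates)
    model = expectations(mcmc_samples(lambdas, n_samples, rng), candidates)
    return sorted(candidates, key=lambda k: abs(target[k] - model[k]), reverse=True)

if __name__ == "__main__":
    corpus = [("the", "dog", "barks")] * 50
    lambdas, rng = train(corpus)
    candidates = {
        "starts_with_the": lambda s: 1.0 if s[:1] == ("the",) else 0.0,
        "has_cat": lambda s: 1.0 if "cat" in s else 0.0,
    }
    print(lambdas)
    print(rank_candidates(candidates, corpus, lambdas, rng))
```

Proposing from P0 is what keeps each sampling step cheap in this sketch: the baseline probabilities cancel in the acceptance ratio, so only the feature weights of the two sentences are compared and Z is never computed. On the toy corpus of repeated "the dog barks", both weights become positive, and "starts_with_the" should rank ahead of "has_cat" because the trained model still assigns sentence-initial "the" far less probability than the corpus does.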
