Journal of Automata, Languages and Combinatorics
A CONSISTENT AND EFFICIENT ESTIMATOR FOR DATA-ORIENTED PARSING

Abstract

Given a sequence of samples from an unknown probability distribution, a statistical estimator aims to approximate that distribution using statistics gathered from the samples. One crucial property of a 'good' estimator is that its estimate approaches the unknown distribution as the sample sequence grows large. This property is called consistency. This paper concerns estimators for natural language parsing under the Data-Oriented Parsing (DOP) model. The DOP model specifies how a probabilistic grammar is acquired from statistics over a given training treebank, a corpus of sentence-parse pairs. Recently, Johnson [15] showed that the DOP estimator (called DOP1) is biased and inconsistent. A second problem with DOP1 is its severe computational inefficiency. This paper presents the first (nontrivial) consistent estimator for the DOP model. The new estimator combines held-out estimation with a bias toward parsing with shorter derivations. To justify the need for a biased estimator in the case of DOP, we prove that every non-overfitting DOP estimator is statistically biased. Our choice of a bias toward shorter derivations is justified by empirical experience, mathematical convenience, and efficiency considerations. In support of our theoretical results on consistency and computational efficiency, we also report experimental results with the new estimator.
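
The notion of consistency described in the abstract can be illustrated with a toy example. The sketch below is not the DOP estimator from the paper. It assumes a hypothetical three-outcome distribution (`TRUE_DIST`) and uses the plain relative-frequency estimator, measuring how far the estimate is from the true distribution (in total variation distance) as the sample grows. All names here are illustrative assumptions.

```python
import random
from collections import Counter

# Hypothetical "unknown" distribution over three outcomes.
# This is an assumption for illustration, not data from the paper.
TRUE_DIST = {"a": 0.5, "b": 0.3, "c": 0.2}


def draw_samples(n, rng):
    """Draw n independent samples from TRUE_DIST."""
    outcomes = list(TRUE_DIST)
    weights = [TRUE_DIST[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=n)


def relative_frequency(samples):
    """Estimate the distribution by relative frequency of each outcome."""
    counts = Counter(samples)
    n = len(samples)
    return {o: counts[o] / n for o in TRUE_DIST}


def total_variation(p, q):
    """Total variation distance between two distributions on the same outcomes."""
    return 0.5 * sum(abs(p[o] - q[o]) for o in p)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for n in (10, 100, 1000, 10000, 100000):
        estimate = relative_frequency(draw_samples(n, rng))
        print(n, round(total_variation(TRUE_DIST, estimate), 4))
```

For a consistent estimator such as this one, the printed distance should tend toward zero as `n` increases, although individual runs can fluctuate at small sample sizes.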