Given a sequence of samples from an unknown probability distribution, a statistical estimator aims to provide an approximate guess of the distribution based on statistics computed from the samples. One crucial property of a 'good' estimator is that its guess approaches the unknown distribution as the sample sequence grows; this property is called consistency. This paper concerns estimators for natural language parsing under the Data-Oriented Parsing (DOP) model. The DOP model specifies how a probabilistic grammar is acquired from statistics over a given training treebank, a corpus of sentence-parse pairs. Recently, Johnson [15] showed that the DOP estimator (called DOP1) is biased and inconsistent. A second problem with DOP1 is its overwhelming computational inefficiency. This paper presents the first (nontrivial) consistent estimator for the DOP model. The new estimator combines held-out estimation with a bias toward parses generated by shorter derivations. To justify the need for a biased estimator in the case of DOP, we prove that every non-overfitting DOP estimator is statistically biased. Our choice of the bias toward shorter derivations is justified by empirical experience, mathematical convenience, and efficiency considerations. In support of our theoretical results on consistency and computational efficiency, we also report experimental results with the new estimator.
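For concreteness, consistency can be phrased in the standard statistical way (this formalization is a generic sketch, not quoted from the paper): writing $\hat{P}_n$ for the estimate obtained from $n$ samples and $P$ for the true distribution, the estimator is consistent if for every $\varepsilon > 0$
$$\lim_{n \to \infty} \Pr\!\bigl(d(\hat{P}_n, P) > \varepsilon\bigr) = 0,$$
where $d$ is some fixed measure of divergence between distributions (e.g., total variation distance).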