Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models

Stephen Clark; James R. Curran


Abstract

This article describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (up to 25 GB), which is satisfied using a parallel implementation of the BFGS optimization algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours.

A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser. Surprisingly, given CCG's "spurious ambiguity," the parsing speeds are significantly higher than those reported for comparable parsers in the literature. We also extend the existing parsing techniques for CCG by developing a new model and efficient parsing algorithm which exploits all derivations, including CCG's nonstandard derivations. This model and parsing algorithm, when combined with normal-form constraints, give state-of-the-art accuracy for the recovery of predicate-argument dependencies from CCGbank. The parser is also evaluated on DepBank and compared against the RASP parser, outperforming RASP overall and on the majority of relation types. The evaluation on DepBank raises a number of issues regarding parser evaluation.

This article provides a comprehensive blueprint for building a wide-coverage CCG parser. We demonstrate that both accurate and highly efficient parsing is possible with CCG.
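
As a rough illustration of what a log-linear model over complete parses with discriminative training involves, the following minimal sketch may help. It is not the authors' implementation: it lists candidate parses explicitly instead of using dynamic programming over a packed chart, it swaps in plain gradient ascent for BFGS, and the feature names and toy data are hypothetical. The gradient of the conditional log-likelihood is the observed feature counts of the correct parse minus the feature counts expected under the model.

```python
import math


def parse_probabilities(weights, candidate_features):
    """Return P(parse | sentence) for each candidate under a log-linear model."""
    scores = [
        sum(weights.get(name, 0.0) * value for name, value in feats.items())
        for feats in candidate_features
    ]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood_gradient(weights, candidate_features, gold_index):
    """Observed feature counts of the correct parse minus expected counts."""
    probs = parse_probabilities(weights, candidate_features)
    grad = dict(candidate_features[gold_index])
    for prob, feats in zip(probs, candidate_features):
        for name, value in feats.items():
            grad[name] = grad.get(name, 0.0) - prob * value
    return grad


def train(data, steps=200, learning_rate=0.5):
    """Plain gradient ascent (an assumption; the article uses BFGS)."""
    weights = {}
    for _ in range(steps):
        total = {}
        for candidate_features, gold_index in data:
            grad = log_likelihood_gradient(weights, candidate_features, gold_index)
            for name, value in grad.items():
                total[name] = total.get(name, 0.0) + value
        for name, value in total.items():
            weights[name] = weights.get(name, 0.0) + learning_rate * value
    return weights


# Hypothetical toy data: one sentence, two candidate parses, parse 0 is correct.
toy_data = [
    (
        [
            {"rule:S->NP_VP": 1.0, "dep:saw-with": 1.0},  # correct parse
            {"rule:S->NP_VP": 1.0, "dep:man-with": 1.0},  # incorrect parse
        ],
        0,
    )
]

weights = train(toy_data)
probs = parse_probabilities(weights, toy_data[0][0])
print(probs)
```

The feature shared by both parses receives a zero gradient, so training only moves the weights of the features that distinguish the correct parse from the incorrect one. This is why discriminative training needs the incorrect parses as well as the correct one.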


Bibliographic details

Source: Computational Linguistics, 2007, No. 4, pp. 493-552
Authors: Stephen Clark; James R. Curran
Original format: PDF
Language: English

Similar literature

1. Wide-coverage efficient statistical parsing with CCG and log-linear models [J]. Clark S, Curran JR. Computational Linguistics. 2007, No. 4
2. Wide-coverage deep statistical parsing using automatic dependency structure annotation [J]. Cahill A, Burke M, O'Donovan R, Computational Linguistics. 2008, No. 1
3. Wide-Coverage Deep Statistical Parsing Using Automatic Dependency Structure Annotation [J]. Aoife Cahill, Michael Burke, Ruth O'Donovan, Computational Linguistics. 2008, No. 1
4. Log-Linear Models for Wide-Coverage CCG Parsing [C]. Stephen Clark, James R. Curran. Conference on Empirical Methods in Natural Language Processing; July 11-12, 2003; Sapporo (JP). 2003
5. Towards efficient statistical parsing using lexicalized grammatical information [D]. Chen, John. 2002
6. Efficient calculation of heterogeneous non-equilibrium statistics in coupled firing-rate models [O]. Cheng Ly, Woodrow L. Shew, Andrea K. Barreiro. 2019
7. Log-Linear Models for Wide-Coverage CCG Parsing [O]. Stephen Clark, James R. Curran. 2003
8. Application of Log-Linear Models to Statistical Record Linkage [R]. Odoroff, C. L. 1980
