Journal: Computational Linguistics

Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models


Abstract

This article describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (up to 25 GB), which is satisfied using a parallel implementation of the BFGS optimization algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours.

A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser. Surprisingly, given CCG's "spurious ambiguity," the parsing speeds are significantly higher than those reported for comparable parsers in the literature. We also extend the existing parsing techniques for CCG by developing a new model and efficient parsing algorithm which exploits all derivations, including CCG's nonstandard derivations. This model and parsing algorithm, when combined with normal-form constraints, give state-of-the-art accuracy for the recovery of predicate-argument dependencies from CCGbank. The parser is also evaluated on DepBank and compared against the RASP parser, outperforming RASP overall and on the majority of relation types. The evaluation on DepBank raises a number of issues regarding parser evaluation.

This article provides a comprehensive blueprint for building a wide-coverage CCG parser. We demonstrate that both accurate and highly efficient parsing is possible with CCG.
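The central idea in the abstract, a log-linear model that assigns a probability to each complete parse and is trained discriminatively against incorrect parses, can be illustrated with a minimal sketch. This is not the article's implementation. It assumes the candidate parses of a sentence are enumerated explicitly (the article instead sums over a packed chart with dynamic programming), it uses plain gradient ascent as a stand-in for BFGS, and the feature names, function names, and toy data are all hypothetical.

```python
import math


def parse_probabilities(weights, candidate_features):
    """Return P(parse | sentence) for each candidate under a log-linear model.

    Each candidate is a dict mapping feature name -> count; its score is the
    dot product of the weights with those counts.
    """
    scores = [
        sum(weights.get(name, 0.0) * count for name, count in feats.items())
        for feats in candidate_features
    ]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    normalizer = sum(exps)
    return [e / normalizer for e in exps]


def log_likelihood_gradient(weights, candidate_features, gold_index):
    """Gradient of log P(gold parse | sentence).

    This is the gold parse's feature counts minus the expected feature
    counts under the current model, which is why the incorrect parses
    are needed during training.
    """
    probs = parse_probabilities(weights, candidate_features)
    grad = dict(candidate_features[gold_index])  # start from gold counts
    for prob, feats in zip(probs, candidate_features):
        for name, count in feats.items():
            grad[name] = grad.get(name, 0.0) - prob * count
    return grad


def train(data, steps=200, learning_rate=0.5):
    """Plain gradient ascent on the conditional log-likelihood.

    Assumption: a simple stand-in for the BFGS optimizer named in the
    abstract, with no regularization term.
    """
    weights = {}
    for _ in range(steps):
        for candidate_features, gold_index in data:
            grad = log_likelihood_gradient(weights, candidate_features, gold_index)
            for name, value in grad.items():
                weights[name] = weights.get(name, 0.0) + learning_rate * value
    return weights


# Hypothetical toy data: one sentence, three candidate parses, parse 0 is gold.
toy_data = [
    (
        [
            {"rule:forward-application": 2, "dep:saw-man": 1},
            {"rule:forward-application": 1, "rule:type-raising": 1,
             "dep:saw-telescope": 1},
            {"rule:backward-composition": 1},
        ],
        0,
    )
]

trained_weights = train(toy_data)
trained_probs = parse_probabilities(trained_weights, toy_data[0][0])
```

With all weights at zero the three candidates are equally likely. After training, most of the probability mass moves to the gold parse. In a real parser the set of candidates is far too large to list, which is where the packed chart and the supertagger's pruning of lexical categories come in.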
