Journal: Natural Language Engineering

A scaffolding approach to coreference resolution integrating statistical and rule-based models

Abstract

We describe a scaffolding approach to the task of coreference resolution that incrementally combines statistical classifiers, each designed for a particular mention type, with rule-based models (for sub-tasks well-matched to determinism). We motivate our design by an oracle-based analysis of errors in a rule-based coreference resolution system, showing that rule-based approaches are poorly suited to tasks that require a large lexical feature space, such as resolving pronominal and common-noun mentions. Our approach combines many advantages: it incrementally builds clusters integrating joint information about entities, uses rules for deterministic phenomena, and integrates rich lexical, syntactic, and semantic features with random forest classifiers well-suited to modeling the complex feature interactions that are known to characterize the coreference task. We demonstrate that all these decisions are important. The resulting system achieves 63.2 F1 on the CoNLL-2012 shared task dataset, outperforming the rule-based starting point by over seven F1 points. Similarly, our system outperforms an equivalent sieve-based approach that relies on logistic regression classifiers instead of random forests by over four F1 points. Lastly, we show that by changing the coreference resolution system from relying on constituent-based syntax to using dependency syntax, which can be generated in linear time, we achieve a runtime speedup of 550 per cent without considerable loss of accuracy.
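The general shape of such a pipeline can be illustrated with a minimal sketch. This is not the authors' implementation: the `Mention` type, the mention-kind labels, the `exact_match_rule` sieve, and the `toy_score` function are all hypothetical stand-ins. In particular, `toy_score` takes the place of a trained classifier (the abstract uses random forests), and the sieve ordering is only assumed. The sketch shows one deterministic sieve followed by one statistical sieve restricted to a single mention type, with clusters shared between passes so that later sieves build on earlier merges.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Mention:
    idx: int    # position in document order
    text: str
    kind: str   # hypothetical labels: "proper", "common", "pronoun"


class Clusters:
    """Union-find over mention indices; merges persist across sieves."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def merge(self, i: int, j: int) -> None:
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[max(ri, rj)] = min(ri, rj)


# A sieve decides whether a mention should link to a candidate antecedent.
# It receives the current clusters so it could use entity-level information;
# the toy sieves below ignore that argument.
Sieve = Callable[[Mention, Mention, Clusters], bool]


def exact_match_rule(m: Mention, ante: Mention, clusters: Clusters) -> bool:
    """Deterministic sieve: identical non-pronominal strings corefer."""
    if m.kind == "pronoun" or ante.kind == "pronoun":
        return False
    return m.text.lower() == ante.text.lower()


def make_statistical_sieve(
    kind: str,
    score: Callable[[Mention, Mention], float],
    threshold: float = 0.5,
) -> Sieve:
    """Wrap a scoring function as a sieve for one mention type only."""

    def sieve(m: Mention, ante: Mention, clusters: Clusters) -> bool:
        return m.kind == kind and score(m, ante) >= threshold

    return sieve


def toy_score(m: Mention, ante: Mention) -> float:
    """Stand-in for a trained classifier's probability (assumption)."""
    return 0.9 if ante.kind == "proper" else 0.1


def resolve(mentions: List[Mention], sieves: List[Sieve]) -> Clusters:
    clusters = Clusters(len(mentions))
    for sieve in sieves:                      # one pass per sieve, in order
        for m in mentions:
            # Scan candidate antecedents from closest to farthest.
            for ante in reversed(mentions[: m.idx]):
                if clusters.find(ante.idx) == clusters.find(m.idx):
                    continue                  # already in the same cluster
                if sieve(m, ante, clusters):
                    clusters.merge(m.idx, ante.idx)
                    break
    return clusters


mentions = [
    Mention(0, "Marie Curie", "proper"),
    Mention(1, "the prize", "common"),
    Mention(2, "she", "pronoun"),
    Mention(3, "Marie Curie", "proper"),
]
clusters = resolve(
    mentions,
    [exact_match_rule, make_statistical_sieve("pronoun", toy_score)],
)
```

In this toy run the rule-based pass links the two "Marie Curie" mentions, and the pronoun pass then attaches "she" to that cluster while leaving "the prize" on its own. A real system would replace `toy_score` with a classifier trained on lexical, syntactic, and semantic features, and would typically add further sieves for other mention types.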
