A PDTB-styled end-to-end discourse parser

Abstract

Since the release of the large discourse-level annotation of the Penn Discourse Treebank (PDTB), research work has been carried out on certain subtasks of this annotation, such as disambiguating discourse connectives and classifying Explicit or Implicit relations. We see a need to construct a full parser on top of these subtasks and propose a way to evaluate the parser. In this work, we have designed and developed an end-to-end discourse parser to parse free texts in the PDTB style in a fully data-driven approach. The parser consists of multiple components joined in a sequential pipeline architecture, which includes a connective classifier, an argument labeler, an explicit classifier, a non-explicit classifier, and an attribution span labeler. Our trained parser first identifies all discourse and non-discourse relations, locates and labels their arguments, and then classifies the sense of the relation between each pair of arguments. For the identified relations, the parser also determines the attribution spans, if any, associated with them. We introduce novel approaches to locate and label arguments, and to identify attribution spans. We also significantly improve on the current state-of-the-art connective classifier. We present a comprehensive evaluation from both component-wise and error-cascading perspectives, in which we illustrate how each component performs in isolation, as well as how the pipeline performs with errors propagated forward. The parser gives an overall system F1 score of 46.80 percent for partial matching utilizing gold standard parses, and 38.18 percent with full automation.
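The sequential pipeline named in the abstract can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `Relation`, `ParseState`, and `parse`, the tiny connective lexicon, and the rule-based stand-ins for each stage are hypothetical and are not the authors' trained, data-driven components. The sketch only shows how five stages chained in order pass their output forward, which is also why an error made early (for example a missed connective) carries into every later stage.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical toy lexicon; the real parser learns connective usage from PDTB data.
TOY_CONNECTIVES = {"but": "Comparison", "however": "Comparison", "because": "Contingency"}


@dataclass
class Relation:
    kind: str                        # "Explicit" or "NonExplicit"
    sent_index: int                  # index of the sentence holding Arg2
    connective: Optional[str] = None
    arg1: Optional[str] = None
    arg2: Optional[str] = None
    sense: Optional[str] = None
    attribution: Optional[str] = None


@dataclass
class ParseState:
    sentences: List[str]
    relations: List[Relation] = field(default_factory=list)


def connective_classifier(state: ParseState) -> ParseState:
    # Toy rule: a sentence-initial word from the lexicon counts as a discourse connective.
    for i, sentence in enumerate(state.sentences):
        words = sentence.split()
        if not words:
            continue
        first = words[0].strip(",.").lower()
        if first in TOY_CONNECTIVES:
            state.relations.append(Relation("Explicit", i, connective=first))
    return state


def argument_labeler(state: ParseState) -> ParseState:
    # Toy rule: Arg2 is the connective's sentence, Arg1 the preceding sentence (if any).
    for rel in state.relations:
        rel.arg2 = state.sentences[rel.sent_index]
        if rel.sent_index > 0:
            rel.arg1 = state.sentences[rel.sent_index - 1]
    return state


def explicit_classifier(state: ParseState) -> ParseState:
    # Toy rule: look the sense up in the lexicon.
    for rel in state.relations:
        if rel.kind == "Explicit":
            rel.sense = TOY_CONNECTIVES[rel.connective]
    return state


def non_explicit_classifier(state: ParseState) -> ParseState:
    # Toy rule: adjacent sentence pairs without an explicit relation get a placeholder sense.
    covered = {rel.sent_index for rel in state.relations if rel.kind == "Explicit"}
    for i in range(1, len(state.sentences)):
        if i not in covered:
            state.relations.append(
                Relation("NonExplicit", i, arg1=state.sentences[i - 1],
                         arg2=state.sentences[i], sense="Expansion"))
    return state


def attribution_span_labeler(state: ParseState) -> ParseState:
    # Toy rule: a sentence containing "said" is treated as carrying an attribution span.
    for rel in state.relations:
        if rel.arg2 and "said" in rel.arg2:
            rel.attribution = rel.arg2
    return state


PIPELINE: List[Callable[[ParseState], ParseState]] = [
    connective_classifier, argument_labeler, explicit_classifier,
    non_explicit_classifier, attribution_span_labeler,
]


def parse(sentences: List[str]) -> ParseState:
    state = ParseState(list(sentences))
    for stage in PIPELINE:           # each stage consumes the previous stage's output
        state = stage(state)
    return state
```

Calling `parse(["The market fell.", "However, tech stocks rose.", "Analysts were surprised."])` runs the five stages in the order the abstract lists them. Evaluating one stage in isolation would mean feeding it gold-standard input instead of the preceding stage's output.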

Bibliographic information

  • Source
    Natural Language Engineering | 2014, Issue 2 | pp. 151-184 | 34 pages
  • Author affiliations

    Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417; SAP Research, SAP Asia Pte Ltd, 30 Pasir Panjang Road, Singapore 117440

    Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417

    Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417

  • Original format: PDF
  • Language of text: English
