Journal: Computer Speech and Language

Automatic cohesive summarization with pronominal anaphora resolution


Abstract

Automatic text summarization is the process of creating a compressed representation of one or more related documents, keeping only the most valuable information. The extractive approach is the most studied; it aims to generate a compressed version of a document by identifying, ranking, and selecting the most relevant sentences or phrases from the text. The selected sentences go verbatim into the summary. However, this strategy may yield incoherent summaries, because pronominal coreferences may be left unbound. To alleviate this problem, this paper proposes a method that automatically resolves unbound pronominal anaphoric expressions, improving the cohesiveness of extractive summaries. The proposed method can be applied in two distinct scenarios. In the first, unbound anaphoric expressions are found and fixed in the generated summaries at a post-processing stage. In the second, the method runs at the preprocessing stage of the proposed pipeline and generates an intermediate version of the input document in which the unbound pronominal coreferences are resolved. The proposed solution was evaluated on the CNN news corpus using the seventeen summarization techniques most widely acknowledged in the literature and four state-of-the-art summarization systems. The paper also provides a comparative evaluation of two distinct assessment scenarios, each compared against a baseline. The experiments achieved very encouraging quantitative and qualitative results.
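The two scenarios described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say how antecedents are found, so the `antecedents` mapping below is a hand-written stand-in for the output of a coreference resolver, and the function names (`resolve`, `postprocess`, `preprocess`) and the example sentences are hypothetical. The preprocessing variant here simply replaces every pronoun that has a known antecedent, which may be broader than what the paper does.

```python
import re

# Toy set of third-person pronouns; a real system would cover many more forms.
PRONOUNS = re.compile(r"\b(he|she|it|they)\b", re.IGNORECASE)


def resolve(sentence, idx, antecedents, context=""):
    """Replace pronouns in `sentence` whose antecedent is absent from `context`.

    `antecedents` maps (sentence index, lowercase pronoun) -> antecedent text.
    It is assumed to come from some external coreference resolver.
    """
    def swap(match):
        antecedent = antecedents.get((idx, match.group(0).lower()))
        # Keep the pronoun if it has no known antecedent, or if the
        # antecedent already appears in the preceding context (it is bound).
        if antecedent is None or antecedent.lower() in context.lower():
            return match.group(0)
        if match.start() == 0:  # keep sentence-initial capitalization
            antecedent = antecedent[0].upper() + antecedent[1:]
        return antecedent

    return PRONOUNS.sub(swap, sentence)


def postprocess(sentences, selected, antecedents):
    """Scenario 1: fix unbound pronouns in an already extracted summary."""
    summary = []
    for idx in selected:
        context = " ".join(summary)
        summary.append(resolve(sentences[idx], idx, antecedents, context))
    return summary


def preprocess(sentences, antecedents):
    """Scenario 2: build an intermediate document before summarization."""
    return [resolve(s, i, antecedents) for i, s in enumerate(sentences)]


# Hypothetical document and resolver output, for illustration only.
sentences = [
    "Maria Silva founded the company in 2010.",
    "The weather was mild that year.",
    "She sold it in 2015.",
]
antecedents = {(2, "she"): "Maria Silva", (2, "it"): "the company"}

# The extractor picked only sentence 2, so both pronouns are unbound.
print(postprocess(sentences, [2], antecedents))
# The extractor picked sentences 0 and 2, so the pronouns stay as they are.
print(postprocess(sentences, [0, 2], antecedents))
# Intermediate document that any extractive summarizer could then consume.
print(preprocess(sentences, antecedents))
```

In the post-processing variant a pronoun is rewritten only when its antecedent is missing from the summary built so far, which leaves sentences untouched where the summary is already cohesive. The preprocessing variant trades that selectivity for independence from the summarizer, since the sentences are rewritten before any ranking takes place.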
