Published in: 9th International Conference on Language Resources and Evaluation

Priberam Compressive Summarization Corpus: A New Multi-Document Summarization Corpus for European Portuguese


Abstract

In this paper, we introduce the Priberam Compressive Summarization Corpus, a new multi-document summarization corpus for European Portuguese. The corpus follows the format of the English summarization corpora used in recent DUC and TAC conferences. It contains 80 manually chosen topics referring to events that occurred between 2010 and 2013. Each topic contains 10 news stories from major Portuguese newspapers, radio and TV stations, along with two human-generated summaries of up to 100 words each. Apart from the language, one important difference from the DUC/TAC setup is that the human summaries in our corpus are compressive: the annotators performed only sentence and word deletion operations, as opposed to generating summaries from scratch. We use this corpus to train and evaluate learning-based extractive and compressive summarization systems, providing an empirical comparison between these two approaches. The corpus is made freely available in order to facilitate research on automatic summarization.
