Venue: 9th International Conference on Language Resources and Evaluation

Priberam Compressive Summarization Corpus: A New Multi-Document Summarization Corpus for European Portuguese

Abstract

In this paper, we introduce the Priberam Compressive Summarization Corpus, a new multi-document summarization corpus for European Portuguese. The corpus follows the format of the English summarization corpora used in recent DUC and TAC conferences. It contains 80 manually chosen topics referring to events that occurred between 2010 and 2013. Each topic contains 10 news stories from major Portuguese newspapers, radio, and TV stations, along with two human-generated summaries of up to 100 words each. Apart from the language, one important difference from the DUC/TAC setup is that the human summaries in our corpus are compressive: the annotators performed only sentence and word deletion operations, as opposed to writing summaries from scratch. We use this corpus to train and evaluate learning-based extractive and compressive summarization systems, providing an empirical comparison between these two approaches. The corpus is made freely available in order to facilitate research on automatic summarization.
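
The compressive constraint described in the abstract (sentence and word deletion only, up to 100 words) can be illustrated with a minimal sketch. It is not the authors' tooling and does not reflect the corpus file format: the function names, whitespace tokenization, and the Portuguese example sentence are all assumptions made for illustration. Under those assumptions, a summary sentence is compressive if its tokens appear in the same order within some source sentence.

```python
def is_subsequence(short_tokens, long_tokens):
    """True if short_tokens can be obtained from long_tokens by deletions only."""
    remaining = iter(long_tokens)
    # `token in remaining` advances the iterator, so order is enforced.
    return all(token in remaining for token in short_tokens)


def is_compressive_summary(summary_sentences, source_sentences, max_words=100):
    """Check the word budget and the deletion-only constraint.

    Assumption: tokens are whitespace-separated and each summary sentence
    derives from a single source sentence.
    """
    total_words = sum(len(s.split()) for s in summary_sentences)
    if total_words > max_words:
        return False
    return all(
        any(is_subsequence(s.split(), src.split()) for src in source_sentences)
        for s in summary_sentences
    )


# Hypothetical example sentence, not taken from the corpus.
source = ["O governo anunciou ontem novas medidas de austeridade para 2013"]
compressed = ["O governo anunciou novas medidas"]
reordered = ["medidas novas anunciou o governo"]

print(is_compressive_summary(compressed, source))
print(is_compressive_summary(reordered, source))
```

The first call accepts the summary because it only drops words; the second rejects it because the words are reordered, which deletion alone cannot produce.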