Venue: International Natural Language Generation Conference

KPTimes: A Large-Scale Dataset for Keyphrase Generation on News Documents

Abstract

Keyphrase generation is the task of predicting a set of lexical units that conveys the main content of a source text. Existing datasets for keyphrase generation are only readily available for the scholarly domain and include nonexpert annotations. In this paper we present KPTimes, a large-scale dataset of news texts paired with editor-curated keyphrases. Exploring the dataset, we show how editors tag documents, and how their annotations differ from those found in existing datasets. We also train and evaluate state-of-the-art neural keyphrase generation models on KPTimes to gain insights on how well they perform on the news domain. The dataset is available online at https://github.com/ygorg/KPTimes.
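The abstract treats keyphrase generation as predicting a set of phrases and mentions evaluating models on KPTimes. The sketch below illustrates two ingredients commonly used in such evaluations. The first splits reference keyphrases into "present" ones, which occur in the text, and "absent" ones, which a generation model must produce without copying. The second scores predictions with exact-match F1@k. The function names, the normalization (lowercasing and word-character tokenization, no stemming), and the F1@k variant are all assumptions for illustration. They are not the paper's exact evaluation protocol, and the example sentence is made up rather than taken from the dataset.

```python
import re
from typing import List, Sequence, Tuple


def normalize(phrase: str) -> str:
    """Lowercase and keep only word-character tokens (assumption: no stemming)."""
    return " ".join(re.findall(r"\w+", phrase.lower()))


def split_present_absent(
    text: str, keyphrases: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split reference keyphrases into those found in the
    normalized text and those that never appear in it."""
    padded_text = f" {normalize(text)} "
    present: List[str] = []
    absent: List[str] = []
    for kp in keyphrases:
        # Pad with spaces so "bill" does not match inside "billion".
        if f" {normalize(kp)} " in padded_text:
            present.append(kp)
        else:
            absent.append(kp)
    return present, absent


def f1_at_k(predicted: Sequence[str], reference: Sequence[str], k: int = 10) -> float:
    """Hypothetical helper: exact-match F1 between the top-k distinct
    normalized predictions and the normalized reference keyphrases."""
    top_k: List[str] = []
    for p in predicted:
        norm = normalize(p)
        if norm and norm not in top_k:  # drop empty strings and duplicates
            top_k.append(norm)
        if len(top_k) == k:
            break
    refs = {normalize(r) for r in reference} - {""}
    matches = len(set(top_k) & refs)
    if matches == 0:
        return 0.0
    precision = matches / len(top_k)  # one common variant divides by k instead
    recall = matches / len(refs)
    return 2 * precision * recall / (precision + recall)


# Usage on a made-up news sentence (not taken from KPTimes).
text = "The Senate passed the budget bill on Tuesday after a long debate."
refs = ["Senate", "budget bill", "United States politics"]
present, absent = split_present_absent(text, refs)
print(present, absent)  # ['Senate', 'budget bill'] ['United States politics']
print(round(f1_at_k(["budget bill", "Senate", "debate"], refs), 3))  # 0.667
```

Deduplicating predictions before truncating to k keeps a model from earning matches for repeated phrases. Stemming, which many keyphrase evaluations add, would make matching more lenient than the plain lowercasing used here.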