
QBSUM: A large-scale query-based document summarization dataset from real-world applications

Abstract

Query-based document summarization aims to extract or generate a summary of a document that directly answers, or is relevant to, a search query. It is an important technique that benefits a variety of applications, such as search engines, document-level machine reading comprehension, and chatbots. Currently, datasets designed for query-based summarization are few in number, and the existing ones are limited in both scale and quality. Moreover, to the best of our knowledge, there is no publicly available dataset for Chinese query-based document summarization. In this paper, we present QBSUM, a high-quality, large-scale dataset consisting of 49,000+ data samples for the task of Chinese query-based document summarization. We also propose multiple unsupervised and supervised solutions to the task and demonstrate their high-speed inference and superior performance via both offline experiments and online A/B tests. The QBSUM dataset is released to facilitate future advancement of this research field.

Bibliographic details

  • Source
    Computer speech and language | 2021, Issue 3 | 101166.1-101166.12 | 12 pages
  • Author affiliations

    University of Alberta 116 St. and 85 Ave. Edmonton AB T6G 2R3 Canada;

    Platform and Content Group Tencent 10000 Shennan Ave Shenzhen 518057 China;

    University of Montreal Apartment 1209 4998 Boul De Maisonneuve O Westmount QC H3Z 1N2 Canada;

    Platform and Content Group Tencent 10000 Shennan Ave Shenzhen 518057 China;

    Platform and Content Group Tencent 10000 Shennan Ave Shenzhen 518057 China;

    Platform and Content Group Tencent 10000 Shennan Ave Shenzhen 518057 China;

    University of Alberta 116 St. and 85 Ave. Edmonton AB T6G 2R3 Canada;

    Platform and Content Group Tencent 10000 Shennan Ave Shenzhen 518057 China;

    Platform and Content Group Tencent 10000 Shennan Ave Shenzhen 518057 China;

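  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Full-text language: English
  • Chinese Library Classification
  • Keywords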

    Query-based summarization; Natural language generation; Information retrieval;
