Venue: Annual Meeting of the Association for Computational Linguistics

DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

Abstract

This paper introduces DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset designed to address real-world MRC. DuReader has three advantages over previous MRC datasets: (1) data sources: questions and documents are based on Baidu Search and Baidu Zhidao, and answers are manually generated; (2) question types: it provides rich annotations for more question types, especially yes-no and opinion questions, which leaves more opportunities for the research community; (3) scale: it contains 200K questions, 420K answers and 1M documents, making it the largest Chinese MRC dataset so far. Experiments show that human performance is well above that of current state-of-the-art baseline systems, leaving plenty of room for the community to make improvements. To help the community make these improvements, both DuReader and the baseline systems have been released online. We also organize a shared competition to encourage the exploration of more models. Since the task was released, there have been significant improvements over the baselines.