Workshop on Scholarly Document Processing

DeepPaperComposer: A Simple Solution for Training Data Preparation for Parsing Research Papers

Abstract

We present DeepPaperComposer, a simple solution for preparing highly accurate (100%) training data, without manual labeling, for convolutional neural networks (CNNs) that extract content from scholarly articles. We used our approach to generate data and trained CNNs to extract eight categories of content, both textual (titles, abstracts, authors, headers, figure and table captions, and body text) and nontextual (figures and tables), from 2,916 IEEE VIS conference papers spanning 30 years, a third of which were scanned bitmap PDFs. We curated this dataset and named it VISpaper-3K. We then report initial benchmark performance of YOLOv3 and Faster-RCNN on VISpaper-3K in comparison with CS-150. We have open-sourced DeepPaperComposer for training data generation and released the resulting annotation data, VISpaper-3K, to promote reproducible research.
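Detectors such as YOLOv3 learn from page images paired with per-page box annotations. As a minimal sketch, the code below shows how boxes for the eight categories listed above could be written in the common YOLO text label format (one line per box: class ID, then box center, width, and height normalized to the page size). The class names and their order, the `Box` type, the pixel-coordinate input, and the helper functions are hypothetical illustrations. They are not DeepPaperComposer's API or the released VISpaper-3K annotation format. Figure and table captions are assumed to share one category, which makes the total eight.

```python
from dataclasses import dataclass

# Hypothetical class order for the eight categories named in the abstract.
# Assumption: figure and table captions share a single "caption" class.
CATEGORIES = [
    "title", "abstract", "author", "header",
    "caption", "body_text", "figure", "table",
]
CLASS_ID = {name: i for i, name in enumerate(CATEGORIES)}


@dataclass
class Box:
    """Axis-aligned box in page pixel coordinates (origin at top-left)."""
    category: str
    x0: float
    y0: float
    x1: float
    y1: float


def to_yolo_line(box: Box, page_w: int, page_h: int) -> str:
    """Format one box as a YOLO label line: 'class cx cy w h', normalized to [0, 1]."""
    if not (0 <= box.x0 < box.x1 <= page_w and 0 <= box.y0 < box.y1 <= page_h):
        raise ValueError(f"box outside page or degenerate: {box}")
    cx = (box.x0 + box.x1) / 2 / page_w
    cy = (box.y0 + box.y1) / 2 / page_h
    w = (box.x1 - box.x0) / page_w
    h = (box.y1 - box.y0) / page_h
    return f"{CLASS_ID[box.category]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def page_labels(boxes: list[Box], page_w: int, page_h: int) -> str:
    """Join all boxes on one page into the contents of a single label file."""
    return "\n".join(to_yolo_line(b, page_w, page_h) for b in boxes)


if __name__ == "__main__":
    # Hypothetical 1000 x 1000 pixel page with a title and one figure.
    boxes = [
        Box("title", 100, 50, 900, 150),
        Box("figure", 500, 400, 900, 700),
    ]
    print(page_labels(boxes, 1000, 1000))
```

For the example page, the title box becomes `0 0.500000 0.100000 0.800000 0.100000` and the figure box becomes `6 0.700000 0.550000 0.400000 0.300000`. Normalizing by page size keeps the labels valid when page images are rendered or scanned at different resolutions.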