DeepPaperComposer: A Simple Solution for Training Data Preparation for Parsing Research Papers

Abstract

We present DeepPaperComposer, a simple solution for preparing highly accurate (100%) training data for extracting content from scholarly articles with convolutional neural networks (CNNs). It requires no manual labeling. We used our approach to generate data and then trained CNNs to extract eight categories of content from 2,916 IEEE VIS conference papers spanning 30 years, a third of which are scanned bitmap PDFs. Six categories are textual (titles, abstracts, authors, headers, figure and table captions, and body text) and two are non-textual (figures and tables). We curated this dataset and named it VISpaper-3K. We then report initial benchmark performance of YOLOv3 and Faster R-CNN on VISpaper-3K, compared with CS-150. We have open-sourced DeepPaperComposer for training data generation and released the resulting annotation data, VISpaper-3K, to promote reproducible research.
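The abstract says the training labels are produced without manual labeling, but it does not describe DeepPaperComposer's internals. One general way to get exact labels for free is to lay out pages programmatically, so that every element's bounding box is known by construction. The sketch below illustrates that idea under stated assumptions and is not the tool's actual code. It stacks hypothetical component sizes on a page and writes YOLO-style labels for the eight categories listed above. Each label line holds the class index, then the box center x, center y, width and height, each normalized by page size. The class-ID order, page size, margins and the function names `compose_page` and `to_yolo_lines` are illustrative assumptions.

```python
from dataclasses import dataclass

# The eight content categories named in the abstract. Using list order as the
# class ID is a hypothetical choice, not VISpaper-3K's actual label mapping.
CATEGORIES = [
    "title", "abstract", "author", "header",
    "caption", "body_text", "figure", "table",
]


@dataclass
class Box:
    """An axis-aligned bounding box in page pixels (origin at top-left)."""
    category: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def compose_page(components, page_w, page_h, margin=50, gap=10):
    """Stack (category, width, height) components top to bottom, centered.

    Each element is placed by the program, so its bounding box is known
    exactly and no manual labeling is needed. Placement stops at the first
    component that does not fit vertically; a fuller version would start a
    new page instead.
    """
    boxes = []
    y = margin
    usable_w = page_w - 2 * margin
    for category, w, h in components:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        if w > usable_w:
            raise ValueError(f"{category} is wider than the text area")
        if y + h > page_h - margin:
            break  # the page is full
        x = margin + (usable_w - w) / 2  # center within the text area
        boxes.append(Box(category, x, y, w, h))
        y += h + gap
    return boxes


def to_yolo_lines(boxes, page_w, page_h):
    """Format boxes as YOLO label lines: class cx cy w h, normalized to [0, 1]."""
    lines = []
    for b in boxes:
        cls = CATEGORIES.index(b.category)
        cx = (b.x + b.w / 2) / page_w
        cy = (b.y + b.h / 2) / page_h
        lines.append(
            f"{cls} {cx:.6f} {cy:.6f} {b.w / page_w:.6f} {b.h / page_h:.6f}"
        )
    return lines


if __name__ == "__main__":
    # Hypothetical component sizes (pixels) on a 1275 x 1650 page,
    # i.e. US Letter rendered at 150 dpi.
    page_w, page_h = 1275, 1650
    components = [
        ("title", 900, 60), ("author", 600, 30), ("abstract", 1000, 200),
        ("header", 300, 30), ("body_text", 1100, 400),
        ("figure", 800, 400), ("caption", 800, 40),
    ]
    boxes = compose_page(components, page_w, page_h)
    for line in to_yolo_lines(boxes, page_w, page_h):
        print(line)
```

A real pipeline would also render the page image from the same coordinates, for example by drawing text and pasting figure crops. Only the label writer would need to change for Faster R-CNN, since its implementations commonly read other annotation formats such as COCO-style JSON.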

Bibliographic Information

Source
Workshop on Scholarly Document Processing | 2020 | pp. 91-96 | 6 pages
Authors
Meng Ling; Jian Chen
Original format
PDF

Similar Literature

1. A simple and fast method for the determination of endo- and exo-cellulase activity in cellulase preparations using filter paper [J]. Marcos Henrique Luciano Silveira, Martinho Rau, Elba Pinto da Silva Bon. Enzyme and Microbial Technology, 2012, No. 5.

2. Excemplify: A Flexible Template Based Solution, Parsing and Managing Data in Spreadsheets for Experimentalists [J]. Lei Shi, Lenneke Jong, Ulrike Wittig. Journal of Integrative Bioinformatics, 2013, No. 2.

3. A Simple yet Effective Joint Training Method for Cross-Lingual Universal Dependency Parsing [C]. Danlu Chen, Mengxiao Lin, Zhifeng Hu. SIGNLL Conference on Computational Natural Language Learning, 2018.

4. Leveraging Training Data from High-Resource Languages to Improve Dependency Parsing for Low-Resource Languages [D]. Claire Jaja. 2014.

5. Shetti: a simple tool to parse, manipulate and search large dataset of sequences [O]. Haitham Sobhy. 2015.

6. Certain Simple, Unsolvable Problems of Group Theory. V [O]. William W. Boone. 1957.