Evaluation of Header Metadata Extraction Approaches and Tools for Scientific PDF Documents

Abstract

This paper evaluates the performance of tools for the extraction of metadata from scientific articles. Accurate metadata extraction is an important task for automating the management of digital libraries. This comparative study is a guide for developers looking to integrate the most suitable and effective metadata extraction tool into their software. We shed light on the strengths and weaknesses of seven tools in common use. In our evaluation using papers from the arXiv collection, GROBID delivered the best results, followed by Mendeley Desktop. SciPlore Xtract, PDFMeat, and SVMHeaderParse also delivered good results depending on the metadata type to be extracted.

Bibliographic details

Source: ACM/IEEE-CS Joint Conference on Digital Libraries, 2013, pp. 385-386 (2 pages)
Conference location: Indianapolis, IN (US)
Authors: Mario Lipinski; Kevin Yao; Corinna Breitinger; Joeran Beel; Bela Gipp
Author affiliation: University of California, Berkeley, CA, USA
Original format: PDF
Keywords: Information Retrieval; Metadata Extraction; Evaluation; PDF

Similar documents

1. Metadata Extraction Approach of PDF Documents Based on Measurement Fusion [J]. Junmin Zhao, Huazhong Liu. Journal of Multimedia, 2013, No. 6.
2. On methods and tools of table detection, extraction and annotation in PDF documents [J]. Shah Khusro, Asima Latif, Irfan Ullah. Journal of Information Science, 2015, No. 1.
3. Evaluation of the Implementation of Indonesian Electronic Journals Citation System Using Regex Technique and PDF Extraction Tool [J]. Riri Fitri Sari, Agung Kurniawan. Asian Journal of Information Technology, 2011, No. 7.
4. Evaluation of Header Metadata Extraction Approaches and Tools for Scientific PDF Documents [C]. Mario Lipinski, Kevin Yao, Corinna Breitinger, ACM/IEEE-CS joint conference on digital libraries, 2013.
5. Automatic semantic header generator for PDF documents [D]. Xue, Furong. 2004.
6. Embedding and Publishing Interactive 3-Dimensional Scientific Figures in Portable Document Format (PDF) Files [O]. David G. Barnes, Michail Vidiassov, Bernhard Ruthensteiner,
7. Evaluation of header metadata extraction approaches and tools for scientific pdf documents [O]. Mario Lipinski, Kevin Yao, Corinna Breitinger, 2013.
