Conference: IAPR International Workshop on Document Analysis Systems

A Modular Metadata Extraction System for Born-Digital Articles

Abstract

We present a comprehensive system for extracting metadata from scholarly articles. Our approach inspects the entire document, including the headers and footers of all pages and the bibliographic references. The system is built on a modular workflow that allows individual components to be evaluated, unit tested, and replaced. The workflow is optimized for processing born-digital documents but may also accept scanned document images. The machine-learning approaches chosen for the individual tasks make it easier to adapt to new document layouts and formats. In our evaluation tests, both the individual implementations and the entire metadata extraction process showed good results.
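
The modular workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Document` container, the `Stage` type, the two toy stages, and `run_pipeline` are all hypothetical names, and simple text heuristics stand in for the machine-learning components. The sketch only shows how a common stage interface lets each component be tested or replaced on its own.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical container passed from stage to stage."""
    pages: List[str]                                   # plain text of each page
    metadata: Dict[str, str] = field(default_factory=dict)


# Assumption: every stage takes a Document and returns a Document.
Stage = Callable[[Document], Document]


def extract_title(doc: Document) -> Document:
    """Toy stage: use the first non-empty line of the first page as the title."""
    if not doc.pages:
        return doc
    for line in doc.pages[0].splitlines():
        if line.strip():
            doc.metadata["title"] = line.strip()
            break
    return doc


def extract_footer(doc: Document) -> Document:
    """Toy stage: inspect all pages and keep the most frequent last line."""
    last_lines = []
    for page in doc.pages:
        lines = [ln.strip() for ln in page.splitlines() if ln.strip()]
        if lines:
            last_lines.append(lines[-1])
    if last_lines:
        doc.metadata["footer"] = Counter(last_lines).most_common(1)[0][0]
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply the stages in order; any stage can be swapped for another."""
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    sample = Document(pages=[
        "A Sample Title\nSome body text\nSample Journal, vol. 1",
        "More body text\nSample Journal, vol. 1",
    ])
    result = run_pipeline(sample, [extract_title, extract_footer])
    print(result.metadata)
```

Because each stage is an ordinary function with the same signature, it can be unit tested with a hand-built `Document`, and a heuristic stage can be exchanged for a trained classifier without touching the rest of the workflow.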
