Conference proceedings: IAPR International Conference on Document Analysis and Recognition

A Data Driven Approach for Compound Figure Separation Using Convolutional Neural Networks


Abstract

A key problem in the automatic analysis and understanding of scientific papers is extracting semantic information from non-textual components such as figures, diagrams, and tables. Much of this work requires an initial preprocessing step: decomposing compound, multi-part figures into individual sub-figures. Previous work on compound figure separation has relied on manually designed features and separation rules, which often fail on less common figure types and layouts. Moreover, few implementations of compound figure decomposition are publicly available. This paper proposes a data-driven approach that separates compound figures using modern deep Convolutional Neural Networks (CNNs), training the separator in an end-to-end manner. CNNs eliminate the need to design features and separation rules by hand, but they require a large amount of annotated training data. We overcome this challenge using transfer learning and automatically synthesized training exemplars. We evaluate our technique on the ImageCLEF Medical dataset, achieving 85.9% accuracy and outperforming previous techniques. We have released our implementation as an easy-to-use Python library, aiming to promote further research in scientific figure mining.
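The abstract does not say how the training exemplars are synthesized. As a rough illustration only, and not the authors' procedure, the sketch below tiles individual panels into a grid and records each panel's bounding box, so that the annotation comes for free. The function name `synthesize_compound_figure`, the grid layout, the `gap` parameter, and the random stand-in panels are all assumptions made for this example.

```python
import numpy as np


def synthesize_compound_figure(panels, rows, cols, gap=10, background=255):
    """Tile grayscale panels (2-D uint8 arrays) into a rows x cols grid.

    Hypothetical helper: returns the composite image and a list of
    (x, y, width, height) boxes, one per panel, usable as ground truth.
    """
    if len(panels) != rows * cols:
        raise ValueError("need exactly rows * cols panels")

    # Each grid column is as wide as its widest panel,
    # each grid row as tall as its tallest panel.
    col_widths = [max(panels[r * cols + c].shape[1] for r in range(rows))
                  for c in range(cols)]
    row_heights = [max(panels[r * cols + c].shape[0] for c in range(cols))
                   for r in range(rows)]

    height = sum(row_heights) + gap * (rows + 1)
    width = sum(col_widths) + gap * (cols + 1)
    canvas = np.full((height, width), background, dtype=np.uint8)

    boxes = []
    y = gap
    for r in range(rows):
        x = gap
        for c in range(cols):
            panel = panels[r * cols + c]
            h, w = panel.shape
            canvas[y:y + h, x:x + w] = panel   # paste the panel
            boxes.append((x, y, w, h))         # record its annotation
            x += col_widths[c] + gap
        y += row_heights[r] + gap
    return canvas, boxes


# Stand-in panels: random noise instead of real sub-figures.
rng = np.random.default_rng(0)
panels = [rng.integers(0, 200, size=(40, 60), dtype=np.uint8) for _ in range(4)]
image, boxes = synthesize_compound_figure(panels, rows=2, cols=2)
```

In a real pipeline the panels would be actual single-panel figures, and the layout, spacing, and panel sizes would be varied so that the synthetic data covers less common arrangements.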
