Published in: International Journal of Data Mining and Bioinformatics

A novel figure panel classification and extraction method for document image understanding



Abstract

With full-text documents now available in many online databases, the paradigm of biomedical literature mining and document understanding has shifted to the analysis of both text and figures, in order to derive implicit messages that text mining alone cannot reveal. To enable automatic, large-scale processing, a key step is to extract and parse the figures embedded in papers. In this paper, we present a novel model-driven, hierarchical method to classify and extract panels from figures in scientific papers. Our method consists of two integrated components: figure (or panel) classification and panel segmentation. Figure classification evaluates each panel and decides whether it contains photographs and drawings. Mixtures of photographs and non-photographs are divided into subfigures. The splitting process repeats until no further panel collage can be identified. Detection of highlighted views is addressed with Hough space analysis. Using reconstruction from Hough peaks, enclosed panels are retrieved and saved into separate files. Experiments were conducted with a total of 360 figures extracted from two sets of papers that were retrieved with different sets of keywords. Experimental results demonstrated that our method successfully segmented figures and extracted photographs and non-photographs with high accuracy and robustness. In addition, our method was able to identify zoom-in views that are superimposed on the original photographs. The efficiency of our method allows online implementation.
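The abstract describes splitting a figure repeatedly until no further panel collage can be identified. The sketch below illustrates that recursive idea only. It is not the authors' model-driven classifier: it assumes panels are separated by uniform white gutters and uses a simple recursive XY-cut. The names `split_panels`, `_runs`, and `BACKGROUND` are hypothetical and chosen for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # grayscale pixel rows, values 0..255
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive

BACKGROUND = 255  # assumption: gutters between panels are pure white


def _runs(is_blank: List[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) spans of consecutive non-blank indices."""
    spans, start = [], None
    for i, blank in enumerate(is_blank):
        if not blank and start is None:
            start = i
        elif blank and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(is_blank)))
    return spans


def split_panels(img: Image, box: Optional[Box] = None) -> List[Box]:
    """Recursively cut a figure along blank rows, then blank columns."""
    if box is None:
        box = (0, 0, len(img), len(img[0]))
    top, left, bottom, right = box

    # A row or column is "blank" if every pixel inside the box is background.
    row_blank = [all(img[r][c] == BACKGROUND for c in range(left, right))
                 for r in range(top, bottom)]
    col_blank = [all(img[r][c] == BACKGROUND for r in range(top, bottom))
                 for c in range(left, right)]
    row_spans = _runs(row_blank)
    col_spans = _runs(col_blank)

    if len(row_spans) > 1:
        # Horizontal gutters found: one child box per band of rows.
        children = [(top + s, left, top + e, right) for s, e in row_spans]
    elif len(col_spans) > 1:
        # Vertical gutters found: one child box per band of columns.
        children = [(top, left + s, bottom, left + e) for s, e in col_spans]
    else:
        # No gutter left: return the trimmed panel, or nothing if the box is empty.
        if not row_spans or not col_spans:
            return []
        (rs, re_), (cs, ce) = row_spans[0], col_spans[0]
        return [(top + rs, left + cs, top + re_, left + ce)]

    panels: List[Box] = []
    for child in children:
        panels.extend(split_panels(img, child))
    return panels
```

Each returned box could then be cropped and passed to a photograph/non-photograph classifier. Real figures rarely have perfectly uniform gutters, so a practical version would need a tolerance on the blank test, and superimposed zoom-in views would still require a separate step such as the Hough-based detection the abstract mentions.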
