IAPR International Conference on Document Analysis and Recognition

A Document Straight Line Based Segmentation for Complex Layout Extraction



Abstract

Document layout extraction is a difficult step in the image interpretation process because documents can be highly complex. The main challenge lies in the large gap between the physical and the logical structures of document images. To lose as little information as possible, most existing methods work at the pixel level. In this paper, we present a new framework for complex layout extraction based on high-level features obtained from a straight-line-based segmentation of the document. We propose to capture straight line segments with a new transform that integrates the local spatial organization of the segments contained in the document content. Such a transform can be applied either to the foreground pixels (related to the document content) or to the background pixels, in order to take advantage of the duality of information present in both parts of the document. Experimental results on the PRImA Layout Analysis dataset illustrate the robustness of our framework for extracting specific document components, including text areas, images, and separators.
