Conference: International Conference on Document Analysis and Recognition

A document image segmentation system using analysis of connected components

Abstract

Page segmentation into text and non-text elements is an essential preprocessing step before optical character recognition (OCR). When segmentation is poor, an OCR classification engine produces garbage characters because non-text elements are present. This paper presents a method for separating the textual and non-textual components of document images using graph-based modeling and structural analysis. The method is fast and efficient, and it adequately separates the graphical and textual parts of a document. We evaluated it on two well-known datasets: the UW-III dataset and the ICDAR 2009 page segmentation competition dataset. We compare it against two state-of-the-art methods, and the results show that our method performs better on this task.
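The abstract does not describe the graph construction or the structural features the authors use, so the following is only a generic sketch of connected-component analysis, not the paper's method. It labels 8-connected foreground regions in a small binary image and then splits them with a simple bounding-box-area rule. The function names, the `max_text_area` threshold, and the toy `page` grid are all hypothetical and chosen purely for illustration.

```python
from collections import deque


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected
    foreground regions (pixels equal to 1), in raster-scan order."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def split_text_nontext(boxes, max_text_area=6):
    """Hypothetical rule: small boxes are treated as text, large ones as non-text.
    The threshold is an arbitrary assumption, not a value from the paper."""
    text, non_text = [], []
    for box in boxes:
        top, left, bottom, right = box
        area = (bottom - top + 1) * (right - left + 1)
        (text if area <= max_text_area else non_text).append(box)
    return text, non_text


# Toy binary page: two small glyph-like blobs and one larger frame.
page = [
    [1, 1, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 1, 1],
]

text, non_text = split_text_nontext(connected_components(page))
print("text:", text)
print("non-text:", non_text)
```

A real system would work on a binarized page scan and would likely use richer features than box area (for example, neighborhood relations between components, which is presumably where a graph model comes in), but those details are not given in the abstract.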
