International Conference on Advances in Pattern Recognition

Separation of Foreground Text from Complex Background in Color Document Images

Abstract

Foreground text is difficult to read in documents with multicolored, complex backgrounds. Automatic separation of the foreground text in such document images is essential for reading the document contents smoothly. In this paper we propose a hybrid approach that combines connected component analysis with unsupervised thresholding to separate text from a complex background. The approach identifies candidate text regions by edge detection followed by connected component analysis. Because the background is complex, a non-text region may be mistakenly identified as a text region. To overcome this problem, we extract texture features of the connected components and analyze the feature values. Finally, the threshold value for each detected text region is derived automatically from the data of the corresponding image region to perform foreground separation. The proposed approach can handle document images with varying backgrounds of multiple colors, as well as foreground text of any color, font, and size. Experimental results show that the proposed algorithm detects, on average, 97.8% of the text regions in the source document. The readability of the extracted foreground text is illustrated through OCR.
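
The pipeline described in the abstract (edge detection, connected component analysis, then a threshold derived per region) can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not name the edge detector, the texture features, or the thresholding rule, so the code below assumes a simple neighbor-difference edge test, skips the texture-based filtering step entirely, and uses Otsu's method as a stand-in for the unsupervised threshold. It also assumes a grayscale image with text darker than its background. All function names and the `min_diff` parameter are hypothetical.

```python
from collections import deque


def edge_mask(gray, min_diff):
    """Mark pixels whose right or lower neighbor differs by at least min_diff."""
    h, w = len(gray), len(gray[0])
    mask = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dx = abs(gray[r][c] - gray[r][c + 1]) if c + 1 < w else 0
            dy = abs(gray[r][c] - gray[r + 1][c]) if r + 1 < h else 0
            mask[r][c] = max(dx, dy) >= min_diff
    return mask


def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def otsu_threshold(values):
    """Otsu's method on 0..255 integers: pick t maximizing between-class variance."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]          # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0        # pixels with value > t
        if w0 == 0 or w1 == 0:
            continue
        between = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def separate_text(gray, min_diff=50):
    """Return a 0/1 image; 1 marks pixels kept as foreground (assumed dark text)."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for top, left, bottom, right in connected_components(edge_mask(gray, min_diff)):
        # The threshold is computed from this region's own pixels only.
        region = [gray[r][c] for r in range(top, bottom + 1)
                  for c in range(left, right + 1)]
        t = otsu_threshold(region)
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                if gray[r][c] <= t:
                    out[r][c] = 1
    return out
```

Because each threshold comes from the pixels of one candidate region rather than from the whole page, a dark block on a light patch and a mid-gray block on a darker patch each get their own cut-off. A fuller version would add the texture-based rejection of non-text components and handle light text on dark backgrounds, neither of which is specified in the abstract.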
