Text Binarization in Color Documents

Abstract

This article presents a new method for the binarization of color document images. First, the colors of the document image are reduced to a small number using a new color reduction technique. Specifically, this technique estimates the dominant colors and then assigns the original image colors to them so that the background and text components become uniform. Each dominant color defines a color plane in which the connected components (CCs) are extracted. Next, a CC filtering procedure is applied in each color plane, followed by a grouping procedure. At the end of this stage, blocks of CCs are constructed, which are then redefined by obtaining the direction of connection (DOC) property for each CC. Using the DOC property, the blocks of CCs are classified as text or nontext. The identified text blocks are binarized using suitable binarization techniques, with the remaining pixels treated as background. The final result is a binary image that always contains black characters on a white background, independently of the original colors of each text block. The proposed approach can also be used to binarize noisy color (or grayscale) document images. Several experiments that confirm the effectiveness of the proposed technique are presented.
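The first stages described in the abstract (color reduction to dominant colors, one color plane per dominant color, CC extraction per plane) can be illustrated with a minimal pure-Python sketch. It is not the article's method: the dominant-color estimator here is a simple coarse-histogram stand-in, the CC filtering, grouping, and DOC-based classification steps are omitted, and the choice of which plane counts as text is supplied by hand. All function names, the `step` parameter, and the toy image are assumptions made for illustration.

```python
from collections import Counter, deque

def dominant_colors(pixels, k=2, step=64):
    """Stand-in estimator (assumption): the k most frequent colors
    after coarse quantization of each RGB channel into `step`-wide bins."""
    def bucket(color):
        return tuple((v // step) * step + step // 2 for v in color)
    counts = Counter(bucket(c) for row in pixels for c in row)
    return [color for color, _ in counts.most_common(k)]

def reduce_colors(pixels, palette):
    """Assign each pixel the index of its nearest palette color
    (squared Euclidean distance in RGB)."""
    def nearest(color):
        return min(range(len(palette)),
                   key=lambda i: sum((a - b) ** 2
                                     for a, b in zip(color, palette[i])))
    return [[nearest(c) for c in row] for row in pixels]

def connected_components(labels, plane):
    """Extract the 4-connected components of one color plane,
    i.e. of the pixels whose label equals `plane`."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if labels[y][x] != plane or seen[y][x]:
                continue
            seen[y][x] = True
            queue, comp = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx]
                            and labels[ny][nx] == plane):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(comp)
    return comps

def render_binary(labels, text_planes):
    """Black (0) for pixels in planes taken to be text, white (255) elsewhere.
    Which planes are text is given by the caller here; the article
    decides this with its DOC-based block classification."""
    return [[0 if lab in text_planes else 255 for lab in row]
            for row in labels]

# Toy image (assumption): light background and dark blue strokes,
# each with slight color variation.
BG, BG2 = (250, 240, 200), (245, 235, 210)
FG, FG2 = (20, 30, 120), (25, 35, 110)
image = [
    [BG,  BG2, BG, BG, BG2, BG],
    [BG,  FG,  BG, BG, FG2, BG],
    [BG,  FG2, BG, BG, FG,  BG],
    [BG2, BG,  BG, BG, BG,  BG2],
]

palette = dominant_colors(image, k=2)
labels = reduce_colors(image, palette)
planes = {i: connected_components(labels, i) for i in range(len(palette))}
binary = render_binary(labels, text_planes={1})
```

In this toy case the two near-identical background shades collapse into one plane and the two stroke shades into another, so the stroke plane yields two components of two pixels each. A real document would need the filtering and classification stages the abstract describes before any plane could be labeled as text.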
