Gabor Filter Based Block Energy Analysis for Text Extraction from Digital Document Images

机译:基于Gabor滤波器的块能量分析,用于从数字文档图像中提取文本

Abstract

Extracting text areas is a necessary first step when a complex document image is passed to a character recognition task. In digital libraries, the resulting OCR'ed text gives access to the image of a document page through keyword search. Gabor filters, which are known to simulate certain characteristics of the Human Visual System (HVS), have been used for this task by many researchers on scanned document images. Adapting such a scheme to camera-based document images is a relatively new approach. Moreover, designing appropriate filters to separate text areas, which are assumed to be rich in high-frequency components, from non-text areas is difficult. The difficulty increases when the clutter is also rich in high-frequency components. Other reported works on separating text from non-text areas have used geometrical/structural information, such as the shape and size of regions in binarized document images.

In this work, we combine these two approaches. We use connected component analysis (CCA) on binarized images to segment non-text areas based on the size of the connected regions. A Gabor-function-based filter bank is used to separate text from non-text areas of comparable size. The technique is shown to work efficiently on different kinds of scanned document images and camera-captured document images, and in some cases on scene images.

Key Words: Gabor filter, connected component analysis, document image, multi-channel filtering.
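The two-stage idea summarized above, size-based CCA followed by Gabor block energy, can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the connectivity, the size rule, the filter parameters, or how block energy is thresholded. The following are therefore hypothetical choices:

- the 8-connectivity;
- the bounding-box limits `max_h`/`max_w`;
- the kernel `size` and `sigma`;
- the single frequency of 0.25 cycles/pixel and the four orientations;
- the zero-mean kernel and the block size;
- every function name.

```python
import math
from collections import deque

# Hypothetical sketch: all names and parameter values are illustrative assumptions.


def connected_components(binary):
    """Group 8-connected foreground pixels (value 1); return one pixel list per component."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y0 in range(h):
        for x0 in range(w):
            if not binary[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def split_by_size(binary, max_h, max_w):
    """Stage 1 (assumed rule): components whose bounding box exceeds max_h x max_w are non-text."""
    candidates, non_text = [], []
    for pixels in connected_components(binary):
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        too_big = max(ys) - min(ys) + 1 > max_h or max(xs) - min(xs) + 1 > max_w
        (non_text if too_big else candidates).append(pixels)
    return candidates, non_text


def gabor_kernel(size, sigma, theta, freq):
    """Even (cosine) Gabor kernel of odd width `size`, shifted to zero mean so flat areas give no response."""
    half = size // 2
    k = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * freq * xr))
        k.append(row)
    mean = sum(map(sum, k)) / (len(k) * len(k))
    return [[v - mean for v in row] for row in k]


def filter_response(img, kernel):
    """Same-size 2-D correlation with zero padding outside the image."""
    h, w, half = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(-half, half + 1):
                for i in range(-half, half + 1):
                    if 0 <= y + j < h and 0 <= x + i < w:
                        acc += img[y + j][x + i] * kernel[j + half][i + half]
            out[y][x] = acc
    return out


def block_energies(img, block, freqs=(0.25,), n_orient=4, size=7, sigma=2.0):
    """Stage 2: per non-overlapping block, sum of squared responses over the whole filter bank."""
    rows, cols = len(img) // block, len(img[0]) // block
    energy = [[0.0] * cols for _ in range(rows)]
    for f in freqs:
        for o in range(n_orient):
            resp = filter_response(img, gabor_kernel(size, sigma, o * math.pi / n_orient, f))
            for by in range(rows):
                for bx in range(cols):
                    energy[by][bx] += sum(
                        resp[y][x] ** 2
                        for y in range(by * block, (by + 1) * block)
                        for x in range(bx * block, (bx + 1) * block)
                    )
    return energy
```

For example, `split_by_size(binary, 40, 40)` would set aside components larger than 40 × 40 pixels as non-text. `block_energies(gray, 16)` returns one energy value per 16 × 16 block. Blocks whose energy exceeds a threshold would then be kept as text; the abstract gives no threshold. Summing squared responses across orientations and frequencies is one simple form of multi-channel filtering. It relies on the abstract's assumption that text areas carry more high-frequency energy than smooth regions.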