Journal: Electronics and Electrical Engineering

Approach to the Improvement of the Text Line Segmentation by Oriented Anisotropic Gaussian Kernel



Abstract

Text line segmentation is a major step in document analysis. It is a prerequisite for a valid optical character recognition (OCR) process. In addition, text line segmentation and character recognition are interdependent tasks [1]. There are a few successful techniques for segmenting printed text lines. However, the processing of handwritten documents has remained a key problem in OCR [2, 3]. Most text line segmentation methods assume that the distance between neighboring text lines is sufficiently large and that text lines are reasonably straight. These assumptions are not always valid for handwritten documents. Hence, text line segmentation is a leading challenge in OCR. Related work on text line segmentation can be grouped into several directions [1]: projection-based methods, Hough transform methods, smearing methods, grouping methods, methods for processing overlapping and touching components, stochastic methods, and others. Conventionally, text is written along the horizontal axis. Smearing methods exploit this property: they smear consecutive black pixels representing text along the horizontal direction. If the white space between two black pixels is within a predefined threshold, it is filled with black pixels. The smeared image serves as a control image, and the bounding boxes of its connected components are taken as text lines.
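The smearing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and it does not involve the oriented anisotropic Gaussian kernel of the title; it only shows the generic horizontal smearing idea on a tiny binary image. The function names `smear_rows` and `bounding_boxes`, the parameter `max_gap` (standing in for the "predefined threshold"), the use of 4-connectivity, and the sample `page` are all assumptions made for illustration.

```python
from collections import deque


def smear_rows(image, max_gap):
    """Fill white (0) runs of at most max_gap pixels that lie between
    two black (1) pixels on the same row. Returns a new image."""
    smeared = [row[:] for row in image]
    for row in smeared:
        last_black = None  # column of the most recent black pixel
        for x, value in enumerate(row):
            if value == 1:
                gap = 0 if last_black is None else x - last_black - 1
                if 0 < gap <= max_gap:
                    for k in range(last_black + 1, x):
                        row[k] = 1
                last_black = x
    return smeared


def bounding_boxes(image):
    """Return (top, left, bottom, right) boxes, inclusive, for every
    4-connected component of black pixels, in row-major discovery order."""
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 4x8 binary "page": 1 = black (ink), 0 = white.
page = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0, 0],
]

lines = bounding_boxes(smear_rows(page, max_gap=2))
print(lines)  # [(0, 0, 1, 7), (3, 0, 3, 4)]
```

In this toy input the unsmeared image has five separate components, while the smeared control image has two, one per text line. The sketch also hints at why the abstract's assumptions matter: if `max_gap` is too large, or lines touch or curve, components from different lines merge into one box.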
