Text line segmentation is a major step in document analysis. It is a prerequisite for reliable optical character recognition (OCR), and text line segmentation and character recognition are themselves dependent tasks [1]. Several techniques segment printed text lines successfully, but the processing of handwritten documents has remained a key problem in OCR [2, 3]. Most text line segmentation methods assume that the distance between neighboring text lines is sufficiently large and that the text lines are reasonably straight. These assumptions do not always hold for handwritten documents, which makes text line segmentation a leading challenge in OCR. Related work on text line segmentation can be grouped into several categories [1]: projection-based methods, Hough transform methods, smearing methods, grouping methods, methods for processing overlapping and touching components, stochastic methods, and others. Conventionally, text is written along the horizontal axis, and smearing methods exploit this property by smearing consecutive black pixels that represent text along the horizontal direction. If the white space between two black pixels is no wider than a predefined threshold, it is filled with black pixels. The bounding boxes of the connected components in the smeared image, which serves as the control image, are then taken as the text lines.
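
The smearing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`smear_row`, `smear_image`, `bounding_boxes`), the binary image encoding (1 for black, 0 for white), the reading of "within the threshold" as "gap length less than or equal to the threshold", and the use of 4-connectivity are all assumptions made for illustration.

```python
from collections import deque


def smear_row(row, threshold):
    """Fill white gaps (0s) between two black pixels (1s) in one row.

    Assumption: a gap is filled when its length is <= threshold.
    White runs touching the row borders are left untouched.
    """
    out = list(row)
    last_black = None
    for x, value in enumerate(row):
        if value == 1:
            if last_black is not None:
                gap = x - last_black - 1
                if 0 < gap <= threshold:
                    for k in range(last_black + 1, x):
                        out[k] = 1
            last_black = x
    return out


def smear_image(image, threshold):
    """Apply horizontal smearing to every row of a binary image."""
    return [smear_row(row, threshold) for row in image]


def bounding_boxes(image):
    """Return (top, left, bottom, right) for each 4-connected black component."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy page: two "text lines" separated by an empty row (hypothetical data).
page = [
    [1, 0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 0],
]
text_lines = bounding_boxes(smear_image(page, threshold=2))
print(text_lines)  # [(0, 0, 1, 5), (3, 2, 3, 6)]
```

On this toy page the scattered pixels of the first two rows merge into one component and the last row forms a second one, so each bounding box corresponds to one candidate text line. The threshold governs the trade-off: too small a value leaves words of the same line unmerged, while too large a value can bridge gaps that should stay open.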