Journal: Pattern Analysis and Applications

Piece-wise painting technique for line segmentation of unconstrained handwritten text: a specific study with Persian text documents

Abstract

The most important and difficult task in text document analysis is accurate line segmentation, particularly when the document consists of unconstrained handwritten text. To accomplish this objective, this work proposes a painting scheme. Because handwritten Persian text poses some of the most critical challenges for text-line segmentation, the new method was devised through an extensive study of cursive Persian scripts; nevertheless, the proposed line segmentation algorithm is applicable in general to handwritten text in any language or script. The text block is decomposed vertically into parallel pipe-like structures called strips. Each row of each strip is painted with a single gray intensity: the average of the gray values of all pixels in that row of the strip. The painted strips are then converted into a two-tone painting, which is subsequently smoothed. The white and black spaces in each strip of the smoothed image are analyzed to obtain a short separating line, termed a Piece-wise Potential Separating Line (PPSL), between each pair of consecutive black spaces. The PPSLs are concatenated to produce the segmentation of the text lines, and additional procedures handle certain anomalies that may occur. The scheme is validated by extensive experimentation. We tested the proposed algorithm on 52 pages of Persian text documents containing 823 lines in total and achieved 92.35% correct line segmentation. The algorithm was also tested on two further datasets of 152 and 200 handwritten text pages in different languages. Comparisons with various approaches from recent literature demonstrate the efficiency and script independence of the proposed algorithm.
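The strip-painting and PPSL-extraction steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the image is a grayscale list of rows (0 = ink, 255 = paper), the strip width and the fixed two-tone threshold are illustrative parameters, the smoothing step is assumed to be a simple gap-filling rule on short white runs, and each PPSL is assumed to sit at the middle row of its white run. Concatenating PPSLs across strips and the anomaly-handling procedures are omitted.

```python
from statistics import mean


def split_into_strips(image, strip_width):
    """Vertically decompose the text block into parallel strips (column ranges)."""
    width = len(image[0])
    return [(start, min(start + strip_width, width))
            for start in range(0, width, strip_width)]


def paint_strip(image, start, end):
    """Paint each row of a strip with the mean gray value of its pixels."""
    return [mean(row[start:end]) for row in image]


def two_tone(painted, threshold):
    """Binarize a painted strip: True = black (ink), False = white (gap).
    The fixed threshold is an assumption, not the paper's rule."""
    return [value < threshold for value in painted]


def runs(tones):
    """Run-length encode booleans as (value, start_row, length) triples."""
    result, start = [], 0
    for i in range(1, len(tones) + 1):
        if i == len(tones) or tones[i] != tones[start]:
            result.append((tones[start], start, i - start))
            start = i
    return result


def smooth(tones, min_gap):
    """Fill white runs shorter than min_gap that lie between two black runs.
    This gap-filling rule is an assumed smoothing step."""
    out = list(tones)
    # Interior runs are bounded on both sides by runs of the opposite colour.
    for value, start, length in runs(tones)[1:-1]:
        if not value and length < min_gap:
            out[start:start + length] = [True] * length
    return out


def ppsl_rows(tones):
    """One separator row per white run between two black runs, placed
    (by assumption) at the middle of that white run."""
    return [start + length // 2
            for value, start, length in runs(tones)[1:-1] if not value]


def piecewise_separators(image, strip_width=4, threshold=200, min_gap=2):
    """Map each strip index to its PPSL rows (hypothetical helper)."""
    result = {}
    for index, (start, end) in enumerate(split_into_strips(image, strip_width)):
        tones = smooth(two_tone(paint_strip(image, start, end), threshold), min_gap)
        result[index] = ppsl_rows(tones)
    return result


# Toy 12 x 8 image: an upper line with a one-row gap, and a shorter lower line.
W, B = 255, 0
image = [[W] * 8 for _ in range(12)]
for r in (2, 4):
    image[r] = [B] * 8
for r in (8, 9):
    image[r] = [B] * 4 + [W] * 4
print(piecewise_separators(image))  # → {0: [6], 1: []}
```

On this toy image, strip 0 yields one PPSL at row 6, between the two text lines, while strip 1 yields none because only the upper line reaches it. Skipping the smoothing step in strip 1 would place a spurious separator at row 3, inside the upper line, which illustrates why some form of smoothing helps before the white/black spaces are analyzed.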
