Journal of Intelligent Systems

Extraction of Text Lines from Handwritten Documents Using Piecewise Water Flow Technique


Abstract

A novel piecewise water flow technique is presented for extracting text lines from multi-skewed document images of handwritten text in different scripts. The basic water flow technique assumes that hypothetical water flows from both the left and right sides of the image frame. The water fills the gaps between consecutive objects (text) but is obstructed wherever an object lies in its path. All unwetted regions in the document image are then labeled distinctly to extract the text lines. However, the technique fails when two neighboring text lines touch each other, because the water is obstructed by the touching segment(s). To overcome this difficulty, we modify the basic water flow technique by applying it iteratively over vertically segmented document images. The main purpose of this vertical segmentation is to localize the text line segment(s) where two text lines are joined. These segments are then fragmented horizontally, and each fragment is assigned to the text line to which it actually belongs. In this way, the probable data loss during isolation of the touching text line segments is minimized. Both techniques (the proposed one and the basic one) have been tested on three different databases, viz., CMATERdb1.1.1, CMATERdb1.1.2, and the ICDAR2009 handwritten segmentation contest pages. The test results show that the proposed technique outperforms the basic one on all three databases.
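
The basic idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a binary image stored as a list of rows (1 for ink, 0 for background), and it simplifies the flow so that water moves strictly horizontally in each row, whereas the abstract does not specify the flow geometry. The function names `wet_mask` and `label_unwetted` are hypothetical, and the piecewise variant is not reproduced because the abstract gives too little detail about it.

```python
from collections import deque


def wet_mask(image):
    """Mark the pixels reached by water (True = wetted).

    Simplifying assumption: water enters every row from the left and
    right borders and stops at the first ink pixel it meets.
    """
    height, width = len(image), len(image[0])
    wet = [[False] * width for _ in range(height)]
    for y, row in enumerate(image):
        # Flow from the left border until an ink pixel blocks it.
        for x in range(width):
            if row[x]:
                break
            wet[y][x] = True
        # Flow from the right border until an ink pixel blocks it.
        for x in range(width - 1, -1, -1):
            if row[x]:
                break
            wet[y][x] = True
    return wet


def label_unwetted(image):
    """Label 4-connected unwetted regions; label 0 means wetted."""
    wet = wet_mask(image)
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if wet[y][x] or labels[y][x]:
                continue
            # Start a new region and flood-fill it breadth-first.
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not wet[ny][nx] and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Two toy "text lines" separated by an empty row.
separate = [
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
]
# The same lines joined by a single touching pixel in the middle row.
touching = [
    [0, 1, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
]

print(label_unwetted(separate)[1])  # 2
print(label_unwetted(touching)[1])  # 1
```

In the first toy image, water crosses the empty row and the two unwetted regions receive distinct labels. In the second, the touching pixel blocks the flow and both lines merge into one region, which is the failure case that the piecewise technique is meant to address.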
