Journal: International Journal of Digital Library Systems

Efficient Word Segmentation and Baseline Localization in Handwritten Documents Using Isothetic Covers

Abstract

Analysis of handwritten documents is a challenging task in the modern era of document digitization. It requires efficient preprocessing, which includes word segmentation and baseline detection. This paper proposes a novel approach to word segmentation and baseline detection in a handwritten document. It is based on certain structural properties of isothetic covers that tightly enclose the words in a handwritten document. For an appropriate grid size, the isothetic covers successfully segregate the words so that each cover corresponds to a particular word. The grid size is selected by an adaptive technique that classifies the inter-cover distances into two classes in an unsupervised manner. Finally, the corresponding baselines are extracted by applying a geometric heuristic to the horizontal chords of these covers. Because it traverses the word boundaries in a combinatorial manner and uses a limited set of operations strictly in the integer domain, the method is found to be fast, efficient, and robust, as demonstrated by experimental results on datasets of both Bengali and English handwriting.
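
The abstract does not say which unsupervised classifier separates the inter-cover distances. As a rough illustration only, the sketch below assumes plain one-dimensional two-means clustering, which splits a list of gaps into a "small" class (plausibly gaps within a word) and a "large" class (plausibly gaps between words). The function name `split_two_classes` and the sample gap values are hypothetical, and the sketch uses floating-point centroids, whereas the paper states that its own operations stay in the integer domain.

```python
def split_two_classes(distances):
    """Return a threshold t separating 1-D gaps into two classes.

    Gaps <= t fall in the small class, gaps > t in the large class.
    Assumption: ordinary two-means clustering, not the paper's own method.
    """
    values = sorted(distances)
    if len(values) < 2 or values[0] == values[-1]:
        raise ValueError("need at least two distinct distances")

    # Start the two centroids at the extremes so neither class is ever empty.
    lo, hi = float(values[0]), float(values[-1])
    while True:
        mid = (lo + hi) / 2
        small = [v for v in values if v <= mid]
        large = [v for v in values if v > mid]
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large)
        if new_lo == lo and new_hi == hi:
            return mid  # assignment is stable, so the split has converged
        lo, hi = new_lo, new_hi


if __name__ == "__main__":
    # Made-up gap widths in pixels, for illustration only.
    gaps = [2, 3, 2, 4, 3, 18, 20, 3, 2, 19]
    t = split_two_classes(gaps)
    print("threshold:", t)
    print("large gaps:", [g for g in gaps if g > t])
```

In a pipeline like the one the abstract describes, a threshold of this kind could inform the choice of grid size, but how the paper maps the two classes to a grid size is not stated here.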