Conference: International Doctoral Symposium on Applied Computation and Security Systems

Line, Word, and Character Segmentation from Bangla Handwritten Text - A Precursor Toward Bangla HOCR

Abstract

The basic function of optical character recognition (OCR) is to recognize text in document images and convert it into digitally editable text. Beyond this, OCR has other potential applications in document image processing, such as automatic document sorting and writer identification/verification. At present, commercially available OCR systems exist mostly for Roman script. Developing an unconstrained offline handwritten character recognition system remains one of the most challenging tasks for the research community. The problem becomes harder for Indic scripts such as Bangla, which contains more than 280 modified and compound characters in addition to isolated characters. For recognition of a handwritten document, the most convenient approach is to segment the text into characters or character parts. Line-, word-, and character-level segmentation therefore plays a vital role in the development of such a system. In this paper, a scheme for tri-level segmentation (line, word, and character) is presented. Encouraging segmentation results are achieved on a set of 50 handwritten text documents.
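The abstract does not say how the three segmentation levels are computed, so the following is only a minimal sketch of one common baseline, projection-profile segmentation, and should not be read as the paper's scheme. It assumes a binarized image given as a list of rows with 1 for ink and 0 for background. The function names and the `word_gap` threshold are hypothetical. Note that plain column projection is a weak fit for Bangla at the character level, because the headline stroke (matra) usually joins the characters of a word; a practical system would need to handle that stroke first.

```python
from typing import List, Tuple

Image = List[List[int]]          # 1 = ink pixel, 0 = background (assumed input format)
Span = Tuple[int, int]           # half-open index range (start, end)


def ink_runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Return spans where profile > 0, merging spans separated by
    fewer than min_gap empty positions."""
    runs: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Span] = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)   # gap too small: extend previous span
        else:
            merged.append((s, e))
    return merged


def segment_lines(img: Image) -> List[Span]:
    """Text lines = runs of rows that contain ink (horizontal projection)."""
    return ink_runs([sum(row) for row in img])


def segment_columns(img: Image, top: int, bottom: int, min_gap: int) -> List[Span]:
    """Column spans containing ink within rows [top, bottom) (vertical projection)."""
    width = len(img[0]) if img else 0
    profile = [sum(img[r][c] for r in range(top, bottom)) for c in range(width)]
    return ink_runs(profile, min_gap)


def tri_level_segment(img: Image, word_gap: int = 3):
    """Return [(line_span, [(word_span, [char_span, ...]), ...]), ...].

    word_gap is a hypothetical threshold: column gaps of at least this
    many pixels separate words; any smaller gap separates characters.
    """
    result = []
    for top, bottom in segment_lines(img):
        pieces = segment_columns(img, top, bottom, min_gap=1)
        words = segment_columns(img, top, bottom, min_gap=word_gap)
        line = []
        for ws, we in words:
            chars = [(s, e) for s, e in pieces if ws <= s and e <= we]
            line.append(((ws, we), chars))
        result.append(((top, bottom), line))
    return result
```

Calling `tri_level_segment(img)` on a binarized page returns row spans for lines, and within each line the column spans for words and their character candidates. A fixed `word_gap` is the simplest possible choice; handwriting with variable spacing would likely need a threshold estimated per line.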
