Language Engineering Conference

A multi-font OCR system for printed Telugu text

Abstract

This work describes the design and development of a Telugu Optical Character Recognition system for printed text (TOSP). The preprocessing tasks considered in this paper are: conversion of a grey-scale image to a binary image, image rectification, skew detection and removal, and segmentation of text into lines, words, and basic symbols. Basic symbols are the fundamental unit of segmentation in this paper, and they are what the recognizer recognizes. The combinations of basic symbols that together form the characters and compound characters of Telugu are then determined to complete the recognition process. The special feature of TOSP is that it is designed to handle multiple sizes and multiple fonts. Further, the output produced by TOSP can be opened and edited directly in any Indian language software that supports transliteration into Telugu script. Several such software packages are popular and available.
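
The abstract names the preprocessing steps but does not say how each is carried out. As a rough illustration only, the sketch below shows one common way to do two of them: binarization with a fixed global threshold, and line segmentation with a horizontal projection profile. Both techniques, the threshold value of 128, and the function names `binarize` and `segment_lines` are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

# Assumed representation: rows of grey levels, 0 (black) to 255 (white).
Image = List[List[int]]


def binarize(grey: Image, threshold: int = 128) -> Image:
    """Map each grey-level pixel to 1 (ink) or 0 (background).

    A fixed global threshold is an assumption for this sketch.
    """
    return [[1 if px < threshold else 0 for px in row] for row in grey]


def segment_lines(binary: Image) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows with ink.

    A row's projection is its ink-pixel count; rows with a zero count
    are treated as the gaps between text lines.
    """
    lines: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(binary):
        has_ink = sum(row) > 0
        if has_ink and start is None:
            start = y                    # a text line begins here
        elif not has_ink and start is not None:
            lines.append((start, y))     # the text line ended on the previous row
            start = None
    if start is not None:                # ink runs to the bottom edge
        lines.append((start, len(binary)))
    return lines
```

The same profile idea, applied to columns inside each line, is a typical starting point for word segmentation. It assumes skew has already been removed, since a tilted page blurs the gaps between lines.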
