Journal: Pattern Analysis and Applications

An optical character recognition system for printed Telugu text

Abstract

Telugu is one of the oldest and most popular languages of India, spoken by more than 66 million people, especially in South India. Little work has been reported on the development of optical character recognition (OCR) systems for Telugu text, so it remains an area of active research. Some characters in Telugu are made up of more than one connected symbol. Compound characters are written by attaching modifiers to consonants, resulting in a huge number of possible combinations, running into hundreds of thousands. A compound character may contain one or more connected symbols. Therefore, systems developed for documents in other scripts, such as Roman, cannot be applied directly to Telugu. In this paper, the individual connected portions of a character or a compound character are defined as basic symbols and treated as the unit of recognition. The algorithms exploit special characteristics of the Telugu script to process document images efficiently. They have been implemented to create a Telugu OCR system for printed text (TOSP). The output of TOSP is in phonetic English, which can be transliterated to generate editable Telugu text. A special feature of TOSP is that it is designed to handle a wide variety of text sizes and multiple fonts while still providing a raw OCR accuracy of nearly 98%. The phonetic English representation can also be used to develop a Telugu text-to-speech system; work in this regard is in progress.
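The abstract treats each connected portion of a character as a "basic symbol". The paper's actual segmentation algorithm is not described here, so the following is only a minimal sketch of the general idea, assuming the page has already been binarized (1 = ink, 0 = background) and that 8-connectivity defines a connected symbol. The function name `extract_basic_symbols` and the left-to-right ordering rule are hypothetical choices for illustration.

```python
from collections import deque
from typing import List, Tuple

# Bounding box as (top, left, bottom, right), inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def extract_basic_symbols(image: List[List[int]]) -> List[Box]:
    """Hypothetical helper: bounding boxes of 8-connected ink regions.

    Each connected region stands in for a "basic symbol" as the abstract
    defines it. `image` is assumed to be binarized (1 = ink, 0 = background).
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []

    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit all 8 neighbours (8-connectivity).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))

    # Crude reading order (an assumption): sort by left edge, then top edge.
    boxes.sort(key=lambda b: (b[1], b[0]))
    return boxes


page = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
print(extract_basic_symbols(page))  # → [(0, 0, 1, 1), (3, 2, 3, 2), (0, 4, 1, 4)]
```

Under this sketch, a modifier printed apart from its consonant becomes its own component, matching the abstract's point that one compound character may yield several basic symbols. A practical system would also need line segmentation and rules for regrouping symbols placed above or below a consonant; the simple left-edge sort here does not attempt that.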