International Journal of Pattern Recognition and Artificial Intelligence

Optical Character Recognition System for Nastalique Urdu-Like Script Languages Using Supervised Learning

Abstract

There are two main techniques for converting written or printed text into digital format. The first is to create an image of the written or printed text, but images are large and require substantial storage space, and text in image form cannot be processed further (edited, searched, copied, etc.). The second is to use an Optical Character Recognition (OCR) system. OCR systems read printed or handwritten documents and convert them into digital text, which can then be processed to extract knowledge. A large amount of Urdu-language data exists in handwritten or printed form and needs to be converted into digital format for knowledge acquisition. The script's highly cursive and complex structure, its bidirectionality, and its compound nature make accurate OCR of Urdu very difficult. In this study, a supervised learning-based OCR system is proposed for the Nastalique Urdu script. Evaluated under a variety of experimental settings, the proposed system achieves a recognition rate of 98.4% on training data and 97.3% on test data, the highest recognition rate achieved by any Urdu OCR system to date. The proposed system is simple to implement, especially on the software side of an OCR system. The technique applies to both printed and handwritten text and should help in developing more accurate Urdu OCR software in the future.
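The abstract reports recognition rates separately for training and test data. As a minimal illustration of how such rates are typically measured in a supervised setting, the sketch below fits a nearest-centroid classifier on labelled feature vectors and scores it on both the training samples and held-out samples. This is not the authors' method, which the abstract does not describe: the classifier, the two-dimensional features, and the class labels `alif` and `bay` are hypothetical stand-ins for real Nastalique glyph or ligature features.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical labelled samples: (feature_vector, class_label).
# Real features would be extracted from segmented Nastalique glyphs/ligatures.
TRAIN = [
    ([0.9, 0.1], "alif"), ([0.8, 0.2], "alif"),
    ([0.1, 0.9], "bay"), ([0.2, 0.8], "bay"),
]
TEST = [
    ([0.7, 0.3], "alif"), ([0.3, 0.7], "bay"), ([0.6, 0.4], "bay"),
]


def train_centroids(samples):
    """Supervised training step: average the feature vectors of each class."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Assign the class whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


def recognition_rate(centroids, samples):
    """Percentage of samples whose predicted label matches the true label."""
    correct = sum(predict(centroids, vec) == label for vec, label in samples)
    return 100.0 * correct / len(samples)


centroids = train_centroids(TRAIN)
print(f"train: {recognition_rate(centroids, TRAIN):.1f}%")  # train: 100.0%
print(f"test:  {recognition_rate(centroids, TEST):.1f}%")   # test:  66.7%
```

On this toy data the test sample `[0.6, 0.4]` labelled `bay` lies closer to the `alif` centroid, so the test rate falls below the training rate. The same train/test gap is what the abstract's 98.4% versus 97.3% figures measure, although the real system and data are far larger.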