Optical Character Recognition System for Nastalique Urdu-Like Script Languages Using Supervised Learning

Rizvi S. S. R.; Sagheer A.; Adnan K.; Muhammad A.


Abstract

There are two main techniques for converting written or printed text into digital format. The first is to create an image of the written or printed text, but images are large and require considerable storage space, and text in image form cannot undergo further processing such as editing, searching, or copying. The second is to use an Optical Character Recognition (OCR) system. OCR systems can read documents and convert non-digital text documents into digital text, which can then be processed to extract knowledge. A large amount of Urdu-language data is available in handwritten or printed form and needs to be converted into digital format for knowledge acquisition. Its highly cursive script, complex structure, bidirectionality, and compound nature make Urdu difficult to recognize accurately with OCR. In this study, a supervised learning-based OCR system is proposed for the Nastalique Urdu language. Evaluated under a variety of experimental settings, the proposed system achieves a recognition rate of 98.4% in training and 97.3% in testing, which is the highest recognition rate achieved to date by any Urdu-language OCR system. The proposed system is simple to implement, especially on the software side of an OCR system. The proposed technique is useful for printed as well as handwritten text, and it will help in developing more accurate Urdu OCR software systems in the future.


Bibliographic record

Source
International Journal of Pattern Recognition and Artificial Intelligence | 2019, Issue 10 | 1953004.1-1953004.32 | 32 pages
Authors
Rizvi S. S. R.; Sagheer A.; Adnan K.; Muhammad A.
Affiliations

Univ Lahore Dept Comp Sci & IT Lahore 54000 Pakistan | NCBA&E Sch Comp Sci Lahore 54000 Pakistan;

NCBA&E Sch Comp Sci Lahore 54000 Pakistan

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language of text: English
Keywords
Optical Character Recognition (OCR); Urdu Nastalique; Image Processing; Pattern Recognition; Supervised Learning

Added to database: 2022-08-18 04:35:51

