Gujarati Handwritten Character Recognition from Text Images

Jyoti Pareek(Ph.D); Suchit Purohit(Ph.D)

Abstract

This is the era of the paperless office and paperless governance, which bring numerous advantages such as increased productivity and efficiency, pervasiveness, storage optimization, robustness and eco-friendliness. Hence there is a need to convert paper documents into machine-editable form, which has led to the development of OCR (Optical Character Recognition). OCR is a technique for converting, mechanically or electronically, an image, photo or scanned document of handwritten text (HCR, Handwritten Character Recognition) or printed text (PCR, Printed Character Recognition) into digital text. HCR is the form of OCR designed specifically to recognize handwritten text, whereas PCR focuses on printed text. HCR is more challenging than PCR because of the diversity in human writing styles and in the size, curves, strokes and thickness of characters. Based on the data acquisition mode, OCR can be either online or offline. Offline recognition covers both handwritten and printed text [1]. In offline mode, the characters are on paper and are captured using a scanner or high-resolution camera, whereas in online mode the pixel values of the characters are captured from the movement of a cursor, pen or stylus. HCR systems are readily available for foreign languages and for many Indian languages such as Bangla, Devanagari and Gurumukhi, but for the Gujarati language HCR development is still in its infancy. This study focuses on the development of an artificial-intelligence-based offline HCR system for the Gujarati language. An important contribution of this study is the collection of a dataset of 10,000 images from 250 people of different age groups and professions. The paper describes a supervised classification approach based on a CNN (Convolutional Neural Network) and an MLP (Multi-Layer Perceptron) for recognizing handwritten Gujarati characters. A success rate of 97.21% is obtained using the CNN and 64.48% using the MLP.
A lot of work has been done at the character level, but very little at the level of word recognition. A major focus of this study was on creating a continuous workflow for image-to-text conversion at the word level.
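
The keywords for this paper list histogram projection profiles among the segmentation techniques, but the abstract gives no implementation details. The following is a minimal sketch of how projection-profile segmentation is commonly done, not the authors' method. It assumes a binarized image stored as nested lists (1 = ink, 0 = background); the names projection_profile, runs_of_ink and segment are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed binary image: 1 = ink, 0 = background


def projection_profile(image: Image, axis: str) -> List[int]:
    """Count ink pixels per row (axis='rows') or per column (axis='cols')."""
    if axis == "rows":
        return [sum(row) for row in image]
    if axis == "cols":
        return [sum(col) for col in zip(*image)]
    raise ValueError("axis must be 'rows' or 'cols'")


def runs_of_ink(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the profile exceeds threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i  # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))  # a gap ends the current run
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the image edge
    return runs


def segment(image: Image) -> List[Tuple[int, int, int, int]]:
    """Split the page into text lines by rows, then each line by columns.

    Returns half-open boxes as (top, bottom, left, right).
    """
    boxes = []
    for top, bottom in runs_of_ink(projection_profile(image, "rows")):
        line = image[top:bottom]
        for left, right in runs_of_ink(projection_profile(line, "cols")):
            boxes.append((top, bottom, left, right))
    return boxes


# Toy page: two blobs on the first text line, one blob on the second.
toy_page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
boxes = segment(toy_page)
print(boxes)  # [(1, 3, 1, 3), (1, 3, 5, 6), (4, 5, 2, 5)]
```

Each box could then be cropped and passed to a classifier. In real handwriting, strokes often touch or overlap, so empty rows and columns do not always separate characters; connected-component labelling, which is also listed in the keywords, is a common complement in such cases, though how the paper combines the two is not stated in the abstract.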

Bibliographic details

Source: Procedia Computer Science, 2020, No. 5, 10 pages
Authors: Jyoti Pareek (Ph.D); Suchit Purohit (Ph.D)
Original format: PDF
Keywords: Histogram projection profile; Connected Components Labelling; Segmentation; CNN; MLP; Optical Character Recognition

Similar literature

1. Handwritten Gujarati Character Recognition Using Structural Decomposition Technique [J]. Ankit K. Sharma, Priyank Thakkar, Dipak M. Adhyaru. Pattern recognition and image analysis: advances in mathematical theory and applications in the USSR, 2019, No. 2
2. Comprehensive Study on Gujarati Handwritten Character Recognition [J]. Jitendra B. Upadhyay, Kalpesh B. Lad. National Journal of System and Information Technology, 2017, No. 1
3. Text-Line and Character Segmentation for Off-line Recognition of Handwritten Japanese Text [J]. Kha Cong Nguyen, Nakagawa Masaki. 電子情報通信学会技術研究報告. パターン認識·メディア理解 (Pattern Recognition and Media Understanding), 2015, No. 517
4. Augmentation based Convolutional Neural Network for recognition of Handwritten Gujarati Characters [C]. Pritesh Borad, Parth Dethaliya, Anand Mehta. IEEE International Conference for Innovation in Technology, 2020
5. Hierarchical character recognition and its use in handwritten word/phrase recognition [D]. Park, Jaehwa. 2000
6. Handwritten Bangla Character Recognition Using the State-of-the-Art Deep Convolutional Neural Networks [O]. Md Zahangir Alom, Paheding Sidike, Mahmudul Hasan, 2018
7. Survey on Offline Character Recognition for Handwritten Gujarati Text [O]. Bhumika B., Hinaxi M. 2017
