Journal: Procedia Computer Science

Gujarati Handwritten Character Recognition from Text Images


Abstract

Today is the era of paperless offices and governance, which bring numerous advantages such as increased productivity and efficiency, pervasiveness, storage optimization, robustness and eco-friendliness. Hence there is a need to convert paper documents into machine-editable form, which has led to the development of OCR (Optical Character Recognition). OCR is a technique for converting, mechanically or electronically, an image, photo or scanned document of handwritten text (HCR, Handwritten Character Recognition) or printed text (PCR, Printed Character Recognition) into digital text. HCR is a form of OCR specifically designed to recognize handwritten text, whereas PCR focuses on recognizing printed text. HCR is more challenging than PCR because of the diversity in human writing styles and in the size, curves, strokes and thickness of characters. Based on the data acquisition mode, OCR can be either online or offline. Offline recognition is performed in two ways: handwritten and printed [1]. In offline mode, the characters are on paper and are captured using a scanner or a high-resolution camera, whereas in online mode the pixel values of characters are captured from the movement of a cursor, pen or stylus. HCR systems are readily available for foreign languages and for many Indian languages such as Bangla, Devanagari and Gurumukhi, but for the Gujarati language HCR development is still in its infancy. This study focuses on the development of an artificial-intelligence-based offline HCR system for Gujarati. An important contribution of this study is a collected dataset of 10,000 images from 250 people of different age groups and professions. This paper describes a supervised classifier approach based on a CNN (Convolutional Neural Network) and an MLP (Multi-Layer Perceptron) for the recognition of handwritten Gujarati characters. A success rate of 97.21% is obtained using the CNN and 64.48% using the MLP.
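To make the CNN/MLP distinction more concrete, the following is a minimal pure-Python sketch of the operations a single convolutional stage performs on a character image: a 2D convolution (stride 1, no padding), ReLU, and 2x2 max pooling. The abstract does not give the paper's architecture (layer count, kernel sizes, input resolution), so the function names, the 5x5 toy image and the vertical-stroke kernel below are illustrative assumptions only. An MLP, by contrast, would receive the image as one flat vector (as produced by `flatten`) and so does not exploit the 2D neighbourhood of each pixel directly; whether that accounts for the accuracy gap reported above is not something this sketch establishes.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like most deep-learning libraries, this computes cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap: Matrix) -> Matrix:
    """Element-wise ReLU: negative responses are clamped to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap: Matrix) -> Matrix:
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


def flatten(fmap: Matrix) -> List[float]:
    """Row-major flattening, i.e. the kind of input vector an MLP would see."""
    return [v for row in fmap for v in row]


# Hypothetical toy input: a 5x5 binary image containing one vertical stroke.
image = [[0, 0, 1, 0, 0] for _ in range(5)]
# Hypothetical 3x3 kernel that responds to thin vertical strokes.
kernel = [[-1, 2, -1] for _ in range(3)]

features = max_pool2x2(relu(conv2d_valid(image, kernel)))
print(features)
```

Running this prints `[[6.0]]`: the convolution responds strongly only where the kernel is centred on the stroke, and pooling keeps that peak response. A real classifier would learn many such kernels and stack several stages before a final dense layer.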
A lot of work has been done on character-level recognition, but very little has been done on word-level recognition. The major focus of this study was on creating a continuous workflow for image-to-text conversion at the word level.
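The abstract does not say how the word-level workflow splits a word image into characters. One common baseline is vertical projection-profile segmentation: columns containing no ink are treated as gaps between characters, each remaining segment is passed to a character classifier, and the predicted labels are joined into a word. The sketch below assumes exactly that. The function names and the `classify` callback (standing in for a trained model such as the CNN above) are hypothetical, and this naive split will fail on touching characters or on vowel signs and conjuncts that overlap neighbouring glyphs.

```python
from typing import Callable, List, Optional, Tuple

Binary = List[List[int]]  # binarized word image: 1 = ink, 0 = background


def column_segments(word: Binary) -> List[Tuple[int, int]]:
    """Return [start, end) column ranges that contain ink, split on blank columns."""
    width = len(word[0])
    ink = [any(row[c] for row in word) for c in range(width)]
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for c, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = c                      # entering an inked run
        elif not has_ink and start is not None:
            segments.append((start, c))    # a blank column closes the run
            start = None
    if start is not None:                  # run touching the right edge
        segments.append((start, width))
    return segments


def crop(word: Binary, seg: Tuple[int, int]) -> Binary:
    """Cut the columns of one segment out of the word image."""
    s, e = seg
    return [row[s:e] for row in word]


def recognize_word(word: Binary, classify: Callable[[Binary], str]) -> str:
    """Segment the word, classify each glyph, and join the labels left to right."""
    return "".join(classify(crop(word, seg)) for seg in column_segments(word))


# Hypothetical 2x5 word image with two glyphs separated by a blank column.
word = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
]
print(column_segments(word))
# Stand-in classifier for illustration: labels a glyph by its ink-pixel count.
print(recognize_word(word, lambda glyph: str(sum(map(sum, glyph)))))
```

This prints `[(0, 2), (3, 5)]` and then `33`. In a real pipeline, the lambda would be replaced by the trained character classifier, and each crop would be resized to that model's expected input size.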