End-to-End Optical Character Recognition for Bengali Handwritten Words



Abstract

Optical character recognition (OCR) converts analogue documents into digital text from document images. Many commercial and non-commercial OCR systems exist for handwritten and printed documents in different languages. Despite this, very few works address the recognition of Bengali words, and most of those focus on OCR of printed Bengali characters. This paper introduces an end-to-end OCR system for the Bengali language. The proposed architecture recognises handwritten Bengali words directly from handwritten word images. We experiment with popular convolutional neural network (CNN) architectures, including DenseNet, Xception, NASNet, and MobileNet, to build the OCR architecture. We also experiment with two recurrent neural network (RNN) variants, LSTM and GRU. We evaluate the proposed architecture on the BanglaWriting dataset, a peer-reviewed dataset of Bengali handwritten images. Using the DenseNet121 model with a GRU recurrent layer, the proposed method achieves a character error rate of 0.091 and a word error rate of 0.273.
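The two metrics reported in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions used in the paper, so this assumes the common ones: total Levenshtein edit distance divided by total reference length, counted over characters for the character error rate and over whitespace-separated words for the word error rate. The function names are hypothetical, and characters are counted as Unicode code points; counting Bengali grapheme clusters instead would give different values.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed definition: total character edits / total reference characters."""
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / total_chars


def word_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed definition: total word edits / total reference words."""
    total_words = sum(len(r.split()) for r in refs)
    if total_words == 0:
        raise ValueError("references contain no words")
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return edits / total_words


if __name__ == "__main__":
    # Illustrative strings only, not taken from the paper's dataset.
    references = ["abcd", "xy"]
    predictions = ["abed", "xy"]
    print(character_error_rate(references, predictions))
    print(word_error_rate(references, predictions))
```

When every sample is a single word, as with isolated word images, this word error rate reduces to the fraction of words that are not predicted exactly, which is why it is typically much higher than the character error rate for the same model.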


Bibliographic details

Source
IEEE National Computing Colleges Conference, 2021, pp. 1-7 (7 pages)
Authors
Farisa Benta Safir; Abu Quwsar Ohi; M.F. Mridha; Muhammad Mostafa Monowar; Md. Abdul Hamid
Original format: PDF
Keywords
Handwriting recognition; Recurrent neural networks; Image recognition; Error analysis; Computer architecture; Optical computing; Optical imaging


