An Adaptive Thresholding Algorithm-Based Optical Character Recognition System for Information Extraction in Complex Images

Daniel Akinbade; Adewale Opeoluwa Ogunde; Mba Obasi Odim; Bosede Oyenike Oguntunde

Abstract

Extracting text from images with complex backgrounds remains a major challenge. Many existing Optical Character Recognition (OCR) systems cannot handle this problem. As reported in the literature, the methods that can handle it still have major difficulty extracting text from images with sharply varying contours, and with touching or skewed words in scanned documents and images with complex backgrounds. New methods that extract text from such images easily and efficiently are therefore needed, which is the primary motivation for this work. This study collected image data and investigated the processes involved in image processing and the techniques applied for segmentation. It applied an adaptive thresholding algorithm to the selected images to segment text characters from each image's complex background, then used Tesseract, a machine-learning-based OCR engine, to extract the text from the image file. The images were colour images sourced from the internet in different formats (jpg, png, webp) and resolutions. A custom adaptive algorithm was applied to the images to unify their complex backgrounds. This algorithm builds on Gaussian thresholding but differs from the conventional Gaussian algorithm in that it dynamically generates the block size used to apply thresholding to the image. As a result, unlike conventional image segmentation, images were processed area-wise (in pixels) as specified by the algorithm at each instance. The system was implemented in Python 3.6. Experiments involved fifty different images with complex backgrounds. The results showed that the system extracted English character-based text from images with complex backgrounds with 69.7% word-level accuracy and 81.9% character-level accuracy.
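The dynamic block-size idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not say how the block size is generated, so `choose_block_size` and its `divisor` parameter are hypothetical, as are the `offset` constant, the sigma heuristic, and the replicated-border handling. The thresholding step itself is ordinary Gaussian-weighted local thresholding on a grayscale image stored as a list of rows.

```python
import math


def choose_block_size(width, height, divisor=10, minimum=3):
    """Hypothetical rule: scale the block with the shorter image side.

    The abstract does not give the actual rule. This stand-in only
    guarantees an odd size of at least `minimum`, so that every block
    has a centre pixel.
    """
    size = max(minimum, min(width, height) // divisor)
    return size if size % 2 == 1 else size + 1


def gaussian_weights(size):
    """Return `size` one-dimensional Gaussian weights that sum to 1."""
    sigma = 0.3 * ((size - 1) * 0.5 - 1) + 0.8  # assumed heuristic for sigma
    half = size // 2
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-half, half + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def adaptive_gaussian_threshold(gray, block_size, offset=2):
    """Binarise `gray` (rows of 0..255 ints) against a local Gaussian mean.

    A pixel becomes 255 when it is brighter than the Gaussian-weighted
    mean of its block_size x block_size neighbourhood minus `offset`,
    and 0 otherwise.
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd integer >= 3")
    height, width = len(gray), len(gray[0])
    half = block_size // 2
    weights = gaussian_weights(block_size)

    def clamp(index, limit):
        # Replicate the border pixel for neighbours outside the image.
        return min(max(index, 0), limit - 1)

    # The Gaussian is separable: blur along rows, then along columns.
    blurred_rows = [
        [sum(w * row[clamp(x + k - half, width)]
             for k, w in enumerate(weights))
         for x in range(width)]
        for row in gray
    ]
    local_mean = [
        [sum(w * blurred_rows[clamp(y + k - half, height)][x]
             for k, w in enumerate(weights))
         for x in range(width)]
        for y in range(height)
    ]
    return [
        [255 if gray[y][x] > local_mean[y][x] - offset else 0
         for x in range(width)]
        for y in range(height)
    ]
```

For an image of a given width and height, one would call `choose_block_size(width, height)` and pass the result to `adaptive_gaussian_threshold`. In practice the fixed-block step is available in OpenCV as `cv2.adaptiveThreshold` with `cv2.ADAPTIVE_THRESH_GAUSSIAN_C`, and the binarised image could then be handed to Tesseract, for example through `pytesseract.image_to_string`.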
The proposed method outperformed the existing methods in terms of character-level accuracy.
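The abstract reports word-level and character-level accuracy without defining either. One common convention, assumed here purely for illustration, scores the OCR output by edit distance against a reference transcription: accuracy is one minus the number of insertions, deletions, and substitutions divided by the reference length. The function names below are hypothetical and the paper's exact metric may differ.

```python
def edit_distance(reference, recognised):
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(recognised) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, rec_item in enumerate(recognised, start=1):
            cost = 0 if ref_item == rec_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def sequence_accuracy(reference, recognised):
    """Assumed metric: 1 - edit_distance / reference length, floored at 0."""
    if not reference:
        return 1.0 if not recognised else 0.0
    errors = edit_distance(reference, recognised)
    return max(0.0, 1.0 - errors / len(reference))


def character_accuracy(reference_text, recognised_text):
    """Score the OCR output character by character."""
    return sequence_accuracy(reference_text, recognised_text)


def word_accuracy(reference_text, recognised_text):
    """Score the OCR output on whitespace-separated words."""
    return sequence_accuracy(reference_text.split(), recognised_text.split())
```

Under this definition a single misread character costs one full word but only one character, so word-level accuracy is usually the lower of the two figures, which matches the ordering of the reported 69.7% and 81.9%.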

Bibliographic record

Source: Journal of computer sciences, 2020, Issue 6, 18 pages
Original format: PDF
Keywords: Adaptive Threshold Algorithm; Complex Backgrounds; Images; Optical Character Recognition; Pattern Recognition