Conference: International Conference on Computing for Sustainable Global Development

Comparative Analysis of Text Extraction from Color Images using Tesseract and OpenCV

Abstract

Demand for image-based text extraction is growing. Students, doctors, and engineers generate a large number of images every day, and it is important to extract the text in these images in a simple yet effective manner, since processing them can yield useful information. Our aim is to summarize the visual information and retrieve its content. Optical character recognition (OCR) systems combine several algorithms that serve this purpose. Text extraction involves a sequence of stages: text detection, localization, segmentation, and recognition. Text detection involves finding text in the input images, and text localization involves identifying the position of that text within the image. Tesseract is a highly optimized OCR engine originally built at HP Labs and later developed under Google's sponsorship. Tesseract works well on light-colored backgrounds but is unable to recognize text on darker shades. We apply various image processing techniques so that text can be recognized on most types of background, and we propose methods for easy text extraction. A trackbar lets the user adjust various parameters to extract the required text from an image. For automation, a set of image processing techniques such as edge detection, filtering, and blurring can be used for better results; applied in series, these steps enable efficient extraction of text from images. This approach is likely to become increasingly important in the coming years. This experiment compares the optimized results of the two methods for efficient text extraction.
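The abstract does not list the exact preprocessing steps, so the following is only an illustrative sketch of one common way to handle dark backgrounds before OCR: invert the image when it is mostly dark, then binarize it with Otsu's threshold so the engine sees dark text on a light background. All function names here (`otsu_threshold`, `preprocess`) are hypothetical, the "mean below 128 means dark background" rule is an assumption, and the code uses plain Python lists rather than OpenCV so that it runs without dependencies. In practice one would more likely use OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag and pass the result to Tesseract.

```python
from typing import List

Gray = List[List[int]]  # rows of 8-bit grayscale pixel values (0..255)


def otsu_threshold(pixels: List[int]) -> int:
    """Return the threshold t that maximizes between-class variance.

    Pixels <= t are treated as the dark class, pixels > t as the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of intensities in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def preprocess(image: Gray) -> Gray:
    """Return a binary image with dark text (0) on a light background (255).

    Assumption (not from the paper): the background covers most of the image,
    so a mean intensity below 128 indicates a dark background.
    """
    flat = [p for row in image for p in row]
    if not flat:
        return []
    if sum(flat) / len(flat) < 128:
        # Dark background: invert so the text becomes dark on light.
        image = [[255 - p for p in row] for row in image]
        flat = [255 - p for p in flat]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Light "text" pixels (230) on a dark background (20).
    dark_bg = [
        [20, 20, 20, 20],
        [20, 230, 230, 20],
        [20, 20, 20, 20],
    ]
    for row in preprocess(dark_bg):
        print(row)
```

The inversion step is the part aimed at the dark-background weakness described above; the threshold could instead be exposed as a user-adjustable parameter, which is the role the trackbar plays in the interactive method.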
