Published in: International Symposium on Visual Computing

OCR as a Service: An Experimental Evaluation of Google Docs OCR, Tesseract, ABBYY FineReader, and Transym

Abstract

Optical character recognition (OCR), a classic machine learning challenge, has long been used in healthcare, education, insurance, and legal applications to convert electronic documents, such as scanned documents, digital images, and PDF files, into fully editable and searchable text data. The rapid daily generation of digital images makes OCR an imperative and foundational tool for data analysis. With the help of OCR systems, we have been able to save a reasonable amount of effort in creating, processing, and saving electronic documents and in adapting them to different purposes. A range of OCR platforms is now available which, aside from lending theoretical contributions to other practical fields, have demonstrated successful applications to real-world problems. In this work, several qualitative and quantitative experimental evaluations are performed using four well-known OCR services: Google Docs OCR, Tesseract, ABBYY FineReader, and Transym. We analyze the accuracy and reliability of the OCR packages on a dataset of 1227 images from 15 different categories. Furthermore, we review state-of-the-art OCR applications in healthcare informatics. The present evaluation is expected to advance OCR research, providing new insights and considerations for the research area, and to help researchers determine which service is best suited to accurate and efficient optical character recognition.
