Conference: International Electrical Engineering Congress

Using OCR Framework and Information Extraction for Thai Documents Digitization

Abstract

At present, digital transformation is a priority for many enterprises and businesses. Data are stored and structured in usable formats for analytics purposes. However, some data are still kept as hard copies, scanned documents, images, and PDFs, which need to be transformed into digital form for future use. The objective of this paper is to propose a technique for converting the text of a physical document into digital format using Optical Character Recognition (OCR), which attempts to extract all the text from photocopies into a database structure. The experimental studies showed that the proposed technique makes the digitized documents completely searchable and editable, with an average accuracy of around 75.38% for extracting attributes and 66.92% for extracting values from printed documents. This technology provides significant benefits to all businesses. Using OCR helps businesses easily find useful information throughout a document, and it also reduces the amount of paper taking up space in the office.
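The abstract reports separate accuracies for extracting attributes and for extracting values, but it does not give the document layout, the OCR engine, or the exact accuracy definition. The following is only a minimal sketch of one possible reading: it assumes the OCR step has already produced plain text with one "attribute: value" pair per line, and it scores a field as correct when the attribute name (or the value) matches a hand-made ground truth exactly. The sample text, the field names, and the helper functions `extract_pairs` and `field_accuracy` are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical OCR output (toy data, not from the paper).
# Assumption: each field appears on its own line as "attribute: value".
SAMPLE_OCR_TEXT = """\
Name: Somchai Jaidee
Date : 2021-03-01
Amount 1,250.00
Invoice No: INV-0042
"""

# Hypothetical ground truth for the same document.
EXPECTED_FIELDS = {
    "Name": "Somchai Jaidee",
    "Date": "2021-03-01",
    "Amount": "1,250.00",
    "Invoice No": "INV-0043",
}

# Attribute = text before the first colon, value = text after it.
PAIR_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def extract_pairs(ocr_text):
    """Return {attribute: value} for every line shaped like 'attribute: value'."""
    pairs = {}
    for line in ocr_text.splitlines():
        match = PAIR_PATTERN.match(line)
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs


def field_accuracy(predicted, expected):
    """Return (attribute_accuracy, value_accuracy) over the expected fields.

    Assumed definition: an attribute counts as correct if its name was
    extracted at all; a value counts as correct only on an exact match.
    """
    if not expected:
        raise ValueError("expected must contain at least one field")
    total = len(expected)
    attribute_hits = sum(1 for name in expected if name in predicted)
    value_hits = sum(
        1 for name, value in expected.items() if predicted.get(name) == value
    )
    return attribute_hits / total, value_hits / total


if __name__ == "__main__":
    extracted = extract_pairs(SAMPLE_OCR_TEXT)
    attribute_acc, value_acc = field_accuracy(extracted, EXPECTED_FIELDS)
    print(f"attributes: {attribute_acc:.2%}, values: {value_acc:.2%}")
```

In this toy case the line without a colon is lost entirely and one value is misread, so the value score ends up lower than the attribute score. That is one plausible way such a gap could arise, not an explanation given by the paper.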
