Using OCR Framework and Information Extraction for Thai Documents Digitization


Abstract

Digital transformation is now a priority for many enterprises and businesses. Much of their data is already stored and structured in usable formats for analytics purposes. However, some data still exists only as hard copies, scanned documents, images, and PDFs, which need to be converted into digital form for future use. The objective of this paper is to propose a technique for converting the text of a physical document into digital format using Optical Character Recognition (OCR), which attempts to extract all the text from photocopies into a database structure. The experimental studies showed that the proposed technique makes the digitized documents completely searchable and editable, with an average accuracy of around 75.38% for extracting attributes and 66.92% for extracting values from printed documents. This technology provides significant benefits to all businesses. OCR helps businesses find useful information throughout a document more easily, and it reduces the amount of paper taking up office space.
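The abstract does not say how attributes and values are separated or how the database is laid out, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the OCR step has already produced plain text, that each field appears on its own line as "attribute: value", and that an exact-match ratio is an acceptable accuracy measure. The function names, the `fields` table, and the sample text are all hypothetical and are not taken from the paper.

```python
import sqlite3


def parse_fields(ocr_text):
    """Split OCR output lines of the assumed form 'attribute: value' into pairs."""
    pairs = []
    for line in ocr_text.splitlines():
        if ":" not in line:
            continue  # no separator found, so this line yields no pair
        attribute, value = line.split(":", 1)
        attribute, value = attribute.strip(), value.strip()
        if attribute:
            pairs.append((attribute, value))
    return pairs


def store_fields(conn, doc_id, pairs):
    """Store the extracted pairs in a simple (hypothetical) three-column table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fields (doc_id TEXT, attribute TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO fields VALUES (?, ?, ?)",
        [(doc_id, attribute, value) for attribute, value in pairs],
    )
    conn.commit()


def accuracy(extracted, expected):
    """Fraction of expected items that appear exactly in the extracted items."""
    if not expected:
        return 0.0
    hits = sum(1 for item in expected if item in extracted)
    return hits / len(expected)


# Hypothetical OCR output; the third line has lost its separator.
sample_text = (
    "Name: Somchai Jaidee\n"
    "Invoice No: 12345\n"
    "Total 1,500.00\n"
    "Date: 2021-03-10"
)

conn = sqlite3.connect(":memory:")
pairs = parse_fields(sample_text)
store_fields(conn, "doc-001", pairs)

# Once stored, the document's fields can be searched with ordinary SQL.
rows = conn.execute(
    "SELECT attribute, value FROM fields WHERE doc_id = ? ORDER BY rowid",
    ("doc-001",),
).fetchall()

attribute_accuracy = accuracy(
    [attribute for attribute, _ in pairs],
    ["Name", "Invoice No", "Total", "Date"],
)
```

In this sketch the line with the missing separator is skipped, so only three of the four expected attributes are recovered. Scoring attributes and values separately, as the abstract reports, would mean calling `accuracy` once on the attribute names and once on the values.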


Bibliographic details

Source
International Electrical Engineering Congress | 2021 | pp. 440-443 | 4 pages
Authors
Todsanai Chumwatana; Waramporn Rattana-umnuaychai
Original format: PDF
Keywords
Image recognition; Text recognition; Databases; Optical imaging; Information retrieval; Optical character recognition software; Character recognition


