Machine Learning of Generalized Document Templates for Data Extraction

Abstract

The purpose of this research is to reverse engineer the process of encoding data in structured documents and then to automate its extraction. We assume a broad category of structured documents that goes beyond form processing: the documents may have flexible layouts and may span multiple pages, with the number of pages varying from document to document. The data extraction method (DataX) employs general templates generated by the Inductive Template Generator (InTeGen). InTeGen learns inductively from example documents in which the data elements have been identified. Both methods achieve high automation with minimal user input.

Bibliographic record

Source
International Workshop on Document Analysis Systems, 2002, 12 pages
Author
Janusz Wnek
Original format: PDF
Chinese Library Classification: TP3-53
