Machine Learning for digital document processing: from layout analysis to metadata extraction
Abstract
In recent years, the spread of computers and the Internet has made a large number of documents available in digital format. Collecting them in digital repositories raises problems that go beyond simple acquisition: the documents must also be organized and classified to improve the effectiveness and efficiency of retrieval. The success of this process depends closely on the ability to understand the semantics of the document components and content. Since the obvious solution of manually creating and maintaining an up-to-date index is clearly infeasible given the huge amount of data involved, there is strong interest in methods that can acquire such knowledge automatically. This work presents a framework that makes intensive use of intelligent techniques to support the different tasks of automatic document processing, from acquisition to indexing and from categorization to storage and retrieval. A prototype of the system DOMINUS is presented. Its main characteristic is the use of a Machine Learning Server, a suite of different inductive learning methods and systems from which the most suitable one is chosen and applied for each specific document processing phase. The core of the system is INTHELEX, an incremental first-order logic learner. Thanks to its incrementality, INTHELEX can continuously update and refine the learned theories, dynamically extending its knowledge to handle even completely new classes of documents. Since DOMINUS is general and flexible, it can be embedded as a document management engine into many different Digital Library systems. Experiments in a real-world domain scenario, scientific conference management, confirmed the good performance of the proposed prototype.
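The idea of an incremental learner that refines its theories as new labelled documents arrive, and that can take on a completely new class without retraining from scratch, can be illustrated with a much simpler propositional analogue. The sketch below is a hypothetical toy, not INTHELEX: INTHELEX learns first-order logic clauses with its own refinement operators, whereas here each document is just a set of hypothetical layout features and each class has a theory made of conjunctive clauses (sets of required features). A positive example the theory misses triggers generalization (intersecting an existing clause with the example, if that stays consistent) or a new clause. An example of another class that a theory wrongly covers triggers specialization (dropping the offending clauses and re-covering that class's positives). The class name `IncrementalTheoryLearner` and all feature and label names are illustrative assumptions.

```python
class IncrementalTheoryLearner:
    """Toy propositional analogue of incremental theory refinement (hypothetical).

    Each class label maps to a theory: a list of clauses, where a clause is a
    frozenset of features that must all be present in a document.
    """

    def __init__(self):
        self.theories = {}  # label -> list of clauses (frozensets)
        self.examples = []  # every (features, label) pair seen so far

    @staticmethod
    def _covers(clauses, doc):
        # A theory covers a document if at least one clause is a subset of it.
        return any(clause <= doc for clause in clauses)

    def _consistent(self, label, clause):
        # A clause is acceptable only if it covers no stored example of another class.
        return not any(lab != label and clause <= doc for doc, lab in self.examples)

    def _cover(self, label, doc):
        clauses = self.theories[label]
        # Generalization: relax an existing clause to the features it shares
        # with the new positive example, if the result stays consistent.
        for i, clause in enumerate(clauses):
            candidate = clause & doc
            if candidate and self._consistent(label, candidate):
                clauses[i] = candidate
                return
        # Otherwise add the example itself as a maximally specific clause,
        # unless even that would cover a negative (then it stays uncovered).
        if self._consistent(label, doc):
            clauses.append(doc)

    def _specialize(self, label, neg_doc):
        clauses = self.theories[label]
        # Specialization: drop every clause that wrongly covers the negative...
        clauses[:] = [c for c in clauses if not c <= neg_doc]
        # ...then re-cover positives of this class that lost their coverage.
        for doc, lab in self.examples:
            if lab == label and not self._covers(clauses, doc):
                self._cover(label, doc)

    def learn(self, features, label):
        doc = frozenset(features)
        self.examples.append((doc, label))
        # A label never seen before simply starts with an empty theory.
        self.theories.setdefault(label, [])
        if not self._covers(self.theories[label], doc):
            self._cover(label, doc)
        # The new document is a negative example for every other class.
        for other in list(self.theories):
            if other != label and self._covers(self.theories[other], doc):
                self._specialize(other, doc)

    def classify(self, features):
        doc = frozenset(features)
        return sorted(lab for lab, clauses in self.theories.items()
                      if self._covers(clauses, doc))


if __name__ == "__main__":
    learner = IncrementalTheoryLearner()
    learner.learn({"two_columns", "abstract_block", "title_top"}, "paper")
    learner.learn({"single_column", "title_top", "signature"}, "letter")
    learner.learn({"two_columns", "abstract_block", "references"}, "paper")
    # A brand-new class arrives; the "paper" theory is specialized to exclude it.
    learner.learn({"two_columns", "abstract_block", "call_for_papers"}, "cfp")
    print(learner.classify({"two_columns", "abstract_block", "call_for_papers"}))
```

Because every candidate clause is checked against all stored examples, each theory stays consistent with everything seen so far, provided no positive example's feature set is contained in a negative one; in that non-separable case the sketch just leaves the positive uncovered. Storing every example keeps the toy short, but a real incremental system would avoid rescanning the full history on each update.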