
Deep Reader: Information Extraction from Document Images via Relation Extraction and Natural Language

Abstract

Recent advancements in the area of Computer Vision with state-of-the-art neural networks have given a boost to Optical Character Recognition (OCR) accuracies. However, extracting characters/text alone is often insufficient for relevant information extraction, as documents also have a visual structure that is not captured by OCR. Extracting information from tables, charts, footnotes, boxes and headings, and retrieving the corresponding structured representation for the document, remains a challenge and finds application in a large number of real-world use cases. In this paper, we propose a novel enterprise-based end-to-end framework called DeepReader, which facilitates information extraction from document images via identification of visual entities and the population of a meta relational model across the different entities in the document image. The model schema allows for an easy-to-understand abstraction of the entities detected by the deep vision models and the relationships between them. DeepReader has a suite of state-of-the-art vision algorithms which are applied to recognize handwritten and printed text, eliminate noisy effects, identify the type of document, and detect visual entities like tables, lines and boxes. DeepReader maps the extracted entities into a rich relational schema so as to capture all the relevant relationships between the entities (words, text boxes, lines, etc.) detected in the document. Relevant information and fields can then be extracted from the document by writing SQL queries on top of the relationship tables. A natural-language-based interface is added on top of the relational schema so that a non-technical user, specifying queries in natural language, can fetch the information with minimal effort. In this paper, we also demonstrate many different capabilities of DeepReader and report results on a real-world use case.
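As a rough illustration of the relational-schema idea summarized above, the following minimal Python/SQLite sketch shows how extracted visual entities (words and lines with bounding boxes) could be stored as rows and how a field value could then be fetched with a spatial SQL join. The table names, columns and query here are illustrative assumptions for this sketch, not DeepReader's actual schema.

```python
import sqlite3

# Toy relational schema for OCR output: each detected line and word becomes
# a row; words carry their bounding-box coordinates. (Illustrative only.)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE lines (line_id INTEGER PRIMARY KEY, page INTEGER, text TEXT);
CREATE TABLE words (word_id INTEGER PRIMARY KEY, line_id INTEGER,
                    text TEXT, x0 REAL, y0 REAL, x1 REAL, y1 REAL,
                    FOREIGN KEY (line_id) REFERENCES lines(line_id));
""")

# Toy OCR output for a single invoice-like line.
cur.execute("INSERT INTO lines VALUES (1, 1, 'Invoice Number: INV-1042')")
cur.executemany(
    "INSERT INTO words VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "Invoice", 10.0, 20.0, 60.0, 32.0),
        (2, 1, "Number:", 65.0, 20.0, 120.0, 32.0),
        (3, 1, "INV-1042", 125.0, 20.0, 190.0, 32.0),
    ],
)

# Example field query: the word immediately to the right of "Number:"
# on the same line, i.e. the field value.
cur.execute("""
SELECT w2.text
FROM words AS w1
JOIN words AS w2 ON w1.line_id = w2.line_id AND w2.x0 > w1.x1
WHERE w1.text = 'Number:'
ORDER BY w2.x0
LIMIT 1
""")
print(cur.fetchone()[0])  # -> INV-1042
```

A natural-language front end of the kind the abstract describes would translate a request such as "what is the invoice number?" into a query like the one above before executing it against the relationship tables.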
