Deep Reader: Information Extraction from Document Images via Relation Extraction and Natural Language

Abstract

Recent advances in computer vision driven by state-of-the-art neural networks have boosted Optical Character Recognition (OCR) accuracy. However, extracting characters/text alone is often insufficient for relevant information extraction, as documents also have a visual structure that is not captured by OCR. Extracting information from tables, charts, footnotes, boxes, and headings, and retrieving the corresponding structured representation of the document, remains a challenge and finds application in a large number of real-world use cases. In this paper, we propose a novel enterprise-oriented, end-to-end framework called DeepReader, which facilitates information extraction from document images by identifying visual entities and populating a meta relational model across the different entities in the document image. The model schema provides an easy-to-understand abstraction of the entities detected by the deep vision models and of the relationships between them. DeepReader employs a suite of state-of-the-art vision algorithms to recognize handwritten and printed text, eliminate noisy effects, identify the type of document, and detect visual entities such as tables, lines, and boxes. DeepReader maps the extracted entities into a rich relational schema so as to capture all the relevant relationships between entities (words, text boxes, lines, etc.) detected in the document. Relevant information and fields can then be extracted from the document by writing SQL queries on top of the relationship tables. A natural-language interface is added on top of the relational schema so that a non-technical user, specifying queries in natural language, can fetch the information with minimal effort. In this paper, we also demonstrate many different capabilities of DeepReader and report results on a real-world use case.
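To make the "SQL queries on top of the relationship tables" idea concrete, the sketch below builds a toy version of such a schema in SQLite and extracts a field via a spatial relation. The table names, columns, and the `right_of` relation are illustrative assumptions, not the paper's exact schema.

```python
import sqlite3

# Hypothetical DeepReader-style relationship tables (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words (
    id INTEGER PRIMARY KEY,
    text TEXT,          -- OCR'd token
    line_id INTEGER,    -- visual line the word belongs to
    x0 REAL, y0 REAL    -- top-left coordinate on the page
);
-- Spatial relation: word b lies immediately to the right of word a.
CREATE TABLE right_of (a INTEGER, b INTEGER);
""")

# A toy invoice line: "Invoice No: INV-1042"
conn.executemany("INSERT INTO words VALUES (?,?,?,?,?)", [
    (1, "Invoice", 1, 10, 5),
    (2, "No:", 1, 60, 5),
    (3, "INV-1042", 1, 95, 5),
])
conn.executemany("INSERT INTO right_of VALUES (?,?)", [(1, 2), (2, 3)])

# Field extraction as a query over the relation:
# fetch the word to the right of the label "No:".
row = conn.execute("""
    SELECT w2.text
    FROM words w1
    JOIN right_of r ON r.a = w1.id
    JOIN words w2 ON w2.id = r.b
    WHERE w1.text = 'No:'
""").fetchone()
print(row[0])  # INV-1042
```

A natural-language front end, as described in the abstract, would map a request like "get the invoice number" to a query of this shape, so the end user never writes SQL directly.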
机译:先进的神经网络在计算机视觉领域的最新进展促进了光学字符识别(OCR)的准确性。但是,仅提取字符/文本通常不足以进行相关信息提取,因为文档还具有OCR无法捕获的视觉结构。从表格,图表,脚注,方框,标题中提取信息并检索文档的相应结构化表示仍然是一项挑战,并在大量实际使用案例中找到了应用。在本文中,我们提出了一种新颖的基于企业的端到端框架,称为DeepReader,该框架可通过识别可视实体并在文档图像中的不同实体之间填充元关系模型来促进从文档图像中提取信息。该模型架构允许轻松理解由深度视觉模型检测到的实体及其之间的关系的抽象。 DeepReader拥有一套最先进的视觉算法,可用于识别手写和打印的文本,消除噪声影响,识别文档的类型以及检测诸如表格,线条和盒子之类的视觉实体。深度阅读器将提取的实体映射到丰富的关系模式中,以捕获在文档中检测到的实体(单词,文本框,行等)之间的所有相关关系。然后,可以通过在关系表之上编写SQL查询来从文档中提取相关信息和字段。在关系模式的顶部添加了基于自然语言的界面,以便非技术用户使用自然语言指定查询时,可以以最小的努力获取信息。在本文中,我们还演示了Deep Reader的许多不同功能,并在实际用例中报告结果。
