Journal: ACM Computing Surveys

Document Layout Analysis: A Comprehensive Survey

Abstract

Document layout analysis (DLA) is a preprocessing step of document understanding systems. It is responsible for detecting and annotating the physical structure of documents. DLA has several important applications, such as document retrieval, content categorization, and text recognition. The objective of DLA is to ease the subsequent analysis and recognition phases by identifying the homogeneous blocks of a document and determining their relationships. The DLA pipeline consists of several phases that can vary among DLA methods, depending on the documents' layouts and the final analysis objectives. A universal DLA algorithm that fits all types of document layouts or satisfies all analysis objectives has not yet been developed. In this survey paper, we present a critical study of different document layout analysis techniques. The study highlights the motivations for pursuing DLA and comprehensively discusses the different phases of DLA algorithms, based on a general framework formed as an outcome of reviewing the research in the field. The DLA framework consists of preprocessing, layout analysis strategies, post-processing, and performance evaluation phases. Overall, the article delivers an essential baseline for pursuing further research in document layout analysis.
