Venue: International Conference on Pattern Recognition

The PAGE (Page Analysis and Ground-Truth Elements) Format Framework

Abstract

There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series.
