Venue: IAPR International Workshop on Document Analysis Systems

Ground-Truth and Performance Evaluation for Page Layout Analysis of Born-Digital Documents

Abstract

This paper proposes a new dataset for page layout analysis of born-digital documents. Document contents are extracted uniformly, and an XML-based data format comprising raw data and structure data is designed to hold them. Using a self-developed ground-truthing tool, a public dataset is constructed from document resources in diverse styles. Automatic performance evaluation methods are adapted to cover both physical segmentation and logical labeling in different scenarios. Applications of the proposed dataset show that it is suitable for evaluating various layout analysis tasks.