International Workshop on Document Analysis Systems
XCDF: A Canonical and Structured Document Format

Abstract

Accessing the structured content of PDF documents is a difficult task that requires pre-processing and reverse-engineering techniques. In this paper, we first present different methods for accomplishing this task, based either on document image analysis or on electronic content extraction. We then propose XCDF, a canonical format with well-defined properties, as a suitable solution for representing structured electronic documents and as an entry point for further research. The system and methods used to reverse engineer PDF documents into this canonical format are also presented. Finally, we present current applications of this work in various domains, ranging from data mining to multimedia navigation, all of which benefit from the canonical format for accessing PDF document content and structure.