Towards a canonical and structured representation of PDF documents through reverse engineering

Abstract

This article presents Xed, a reverse engineering tool for PDF documents that extracts the original document layout structure. Xed combines electronic extraction methods with state-of-the-art document analysis techniques and outputs the layout structure in a hierarchical canonical form, i.e., one that is universal and independent of the document type. The article first reviews the major traps and tricks of the PDF format. It then introduces the architecture of Xed along with its main modules, in particular the algorithm that extracts the document's physical structure. Next, a canonical format is proposed and discussed with an example. Finally, the results of a practical evaluation are presented, followed by an outline of future work on logical structure extraction.

Bibliographic Record

Source: VLSI Multilevel Interconnection Conference, 1990 | 1990 | pp. 1050-1054 | 5 pages
Authors: Rigamonti M.; Bloechle J.L.; Hadjar K.; Lalanne D.; Ingold R.
Affiliation: Fribourg Univ., Switzerland
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology

Similar Documents

外文文献
中文文献
专利

1. Reengineering PDF-based documents targeting complex software specifications [J]. Mehrdad Nojoumian, Timothy C. Lethbridge. International Journal of Knowledge and Web Intelligence, 2011, no. 4.
2. A Canonical Piecewise-Linear Representation Theorem: Geometrical Structures Determine Representation Capability [J]. Wen C., Ma X. IEEE Transactions on Circuits and Systems II: Express Briefs, 2011, no. 12.
3. Document engineering approaches toward scalable and structured multimedia, web and printable documents [J]. Maria da Graca Pimentel, Dick C. A. Bulterman, Luiz Fernando Gomes Soares. Multimedia Tools and Applications, 2009, no. 3.
4. Document Engineering for a Digital Library: PDF recompression using JBIG2 and other optimization of PDF documents [C]. Petr Sojka, Radim Hatlapatka. 10th ACM Symposium on Document Engineering 2010, 2010.
5. Reverse engineering of data structures from binary [D]. Lin, Zhiqiang. 2011.
6. Local Crystal Structure of Antiferroelectric Bi2Mn4/3Ni2/3O6 in Commensurate and Incommensurate Phases Described by Pair Distribution Function (PDF) and Reverse Monte Carlo (RMC) Modeling [O]. Robert J. Szczecinski, Samantha Y. Chong, Philip A. Chater.
7. Towards a Canonical and Structured Representation of PDF Documents through Reverse Engineering [O]. Maurizio Rigamonti, Jean-Luc Bloechle, Karim Hadjar. 2005.
8. GRASP/Ada: Graphical Representations of Algorithms, Structures, and Processes for Ada. The development of a program analysis environment for Ada: Reverse engineering tools for Ada, task 2, phase 3 [R]. Cross, James H., II. 1991.
