Conference: Document Recognition and Retrieval XIII; Electronic Imaging Science and Technology

Graphic Design Principles for Automated Document Segmentation and Understanding

Abstract

When designers develop a document layout, their objective is to convey a specific message and provoke a specific response from the audience. Design principles provide the foundation for identifying document components and the relations among them, so that implicit knowledge can be extracted from the layout. Variable Data Printing enables the production of personalized print jobs for which traditional proofing of every job instance may be infeasible. This paper explains a rule-based system that uses design principles to segment and understand document context. The system uses the design principles of repetition, proximity, alignment, similarity, and contrast as the foundation for its strategy in document segmentation and understanding, which is closely related to the recognition of artifacts produced by the infringement of the constraints articulated in the document layout. The tool has two main modules: the geometric analysis module and the design rule engine. The geometric analysis module extracts explicit knowledge from the data provided in the document. The design rule engine uses the information provided by the geometric analysis to establish logical units inside the document. We used a subset of XSL-FO sufficient for designing documents with an adequate amount of complexity. The system identifies components such as headers, paragraphs, lists, and images, and determines the relations between them, such as header-paragraph and header-list. The system provides accurate information about the geometric properties of the components, detects the elements of the documents, and identifies corresponding components between a proofed instance and the rest of the instances in a Variable Data Printing job.
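
The two-stage idea in the abstract (geometric facts first, design rules second) can be illustrated with a small sketch. This is not the authors' implementation: the `Block` type, the function names, and the thresholds `max_gap`, `min_contrast`, and `tol` are all hypothetical, and only three of the five principles (proximity, alignment, contrast) are used. It assumes blocks are axis-aligned rectangles with `y` growing downward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A layout block with hypothetical fields; y grows downward."""
    name: str
    x: float          # left edge
    y: float          # top edge
    width: float
    height: float
    font_size: float

    @property
    def bottom(self) -> float:
        return self.y + self.height


# --- geometric analysis: explicit facts computed from the layout data ---

def vertical_gap(upper: Block, lower: Block) -> float:
    """Distance between the bottom of `upper` and the top of `lower`."""
    return lower.y - upper.bottom


def left_aligned(a: Block, b: Block, tol: float = 1.0) -> bool:
    """True when the left edges agree within `tol` (an assumed tolerance)."""
    return abs(a.x - b.x) <= tol


# --- design rule: combine the facts into a logical relation ---

def header_paragraph_pairs(blocks, max_gap=12.0, min_contrast=1.2):
    """Pair vertically adjacent blocks as (header, paragraph) when they are
    close (proximity), share a left edge (alignment), and the upper block
    has a clearly larger font (contrast). Thresholds are illustrative."""
    ordered = sorted(blocks, key=lambda b: b.y)
    pairs = []
    for upper, lower in zip(ordered, ordered[1:]):
        gap = vertical_gap(upper, lower)
        if (0 <= gap <= max_gap
                and left_aligned(upper, lower)
                and upper.font_size >= min_contrast * lower.font_size):
            pairs.append((upper.name, lower.name))
    return pairs


blocks = [
    Block("title", 50, 40, 300, 20, 18),
    Block("body", 50, 66, 300, 80, 10),
    Block("caption", 200, 300, 150, 12, 10),
]
print(header_paragraph_pairs(blocks))  # [('title', 'body')]
```

Keeping the geometric helpers separate from the rule mirrors the module split described above: the rule only consumes facts, so further relations (for example header-list) could be added without touching the geometry code.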