...
Journal: IEEE Transactions on Knowledge and Data Engineering

A Fuzzy Logic Approach to Wrapping PDF Documents

Abstract

The PDF format represents the de facto standard for print-oriented documents. In this paper, we address the problem of wrapping PDF documents, which raises new challenges in several contexts of text data management. Our proposal is based on a novel bottom-up hierarchical wrapping approach that exploits fuzzy logic to handle the "uncertainty" that is intrinsic to the structure and presentation of PDF documents. A PDF wrapper is defined by specifying a set of group type definitions that impose a target structure on groups of tokens containing the required information. Constraints on token groupings are formulated as fuzzy conditions, which are defined on spatial and content predicates of tokens. We define a formal semantics for PDF wrappers and propose an algorithm for wrapper evaluation that runs in polynomial time with respect to the size of a PDF document. The proposed approach has been implemented in a wrapper generation system that offers visual capabilities to assist the designer in specifying and evaluating a PDF wrapper. Experimental results have shown good accuracy and applicability of our system to PDF documents from various domains.
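To make the idea of a fuzzy condition over spatial and content predicates more concrete, here is a minimal Python sketch. It is not the paper's formalism: the `Token` fields, the `same_line` and `looks_numeric` predicates, the linear membership functions, and the use of `min` as the fuzzy conjunction are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical token: a text fragment with a position on the page."""
    text: str
    x: float       # left edge
    y: float       # baseline
    height: float  # glyph height


def same_line(a: Token, b: Token) -> float:
    """Assumed spatial predicate: 1.0 when baselines coincide, falling
    linearly to 0.0 once they differ by a full token height."""
    tolerance = max(a.height, b.height)
    if tolerance <= 0:
        return 1.0 if a.y == b.y else 0.0
    return max(0.0, 1.0 - abs(a.y - b.y) / tolerance)


def looks_numeric(t: Token) -> float:
    """Assumed content predicate: fraction of characters that are digits."""
    if not t.text:
        return 0.0
    return sum(ch.isdigit() for ch in t.text) / len(t.text)


def fuzzy_and(*degrees: float) -> float:
    """Fuzzy conjunction using the minimum t-norm (one common choice)."""
    return min(degrees)


def label_value_degree(label: Token, value: Token) -> float:
    """Degree to which (label, value) forms a 'label followed by a number
    on the same line' group, under the assumptions above."""
    return fuzzy_and(same_line(label, value), looks_numeric(value))


label = Token("Total", x=10, y=100, height=10)
close_value = Token("1250", x=200, y=102, height=10)  # slightly misaligned
far_value = Token("1250", x=200, y=120, height=10)    # two lines away

print(label_value_degree(label, close_value))
print(label_value_degree(label, far_value))
```

A crisp rule would reject the slightly misaligned value outright, whereas a graded membership lets the grouping tolerate small layout deviations while still ruling out tokens that are clearly on another line.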
