Document Page Structure Learning for Fixed-layout E-books Using Conditional Random Fields

Abstract

This paper proposes a model that learns the logical structure of fixed-layout document pages by combining a support vector machine (SVM) with conditional random fields (CRF). Features related to each logical label and to the dependencies between labels are extracted from various original Portable Document Format (PDF) attributes. The proposed model integrates both local evidence and contextual dependencies to achieve better logical labeling performance. By combining the strengths of the SVM as a local discriminative classifier with the CRF's modeling of contextual correlations between adjacent fragments, the model can resolve ambiguities among semantic labels. The experimental results show that CRF-based models with both tree and chain graph structures outperform the SVM model, increasing macro-averaged F_1 by about 10%.
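The abstract does not give the model's potentials or training procedure, so the following is only a minimal sketch of the general idea it describes: local classifier scores (here standing in for SVM outputs) are combined with pairwise scores between adjacent fragments, and the best joint labeling of a chain is found by Viterbi decoding. The function name `viterbi_decode`, the labels, and all numeric scores are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Tuple

Label = str


def viterbi_decode(
    unary: List[Dict[Label, float]],
    pairwise: Dict[Tuple[Label, Label], float],
) -> List[Label]:
    """Return the highest-scoring label sequence on a chain of fragments.

    unary[i][y]       : local score for giving fragment i the label y
                        (assumed here to come from a local classifier).
    pairwise[(p, y)]  : score for label p being followed by label y
                        (missing pairs are treated as 0.0).
    """
    if not unary:
        return []
    labels = list(unary[0])
    # best[i][y]: best total score of any labeling of fragments 0..i ending in y
    best = [dict(unary[0])]
    back: List[Dict[Label, Label]] = []
    for scores in unary[1:]:
        cur: Dict[Label, float] = {}
        ptr: Dict[Label, Label] = {}
        for y in labels:
            prev = max(
                labels,
                key=lambda p: best[-1][p] + pairwise.get((p, y), 0.0),
            )
            cur[y] = best[-1][prev] + pairwise.get((prev, y), 0.0) + scores[y]
            ptr[y] = prev
        best.append(cur)
        back.append(ptr)
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical scores: the middle fragment is locally ambiguous and
# slightly favors "title" when considered on its own.
unary_scores = [
    {"title": 2.0, "body": 0.0},
    {"title": 0.6, "body": 0.5},
    {"title": 0.0, "body": 2.0},
]
transition_scores = {
    ("title", "title"): -1.0,
    ("title", "body"): 0.5,
    ("body", "body"): 0.5,
    ("body", "title"): -0.5,
}
decoded = viterbi_decode(unary_scores, transition_scores)
```

With these illustrative numbers, labeling each fragment independently would assign "title" to the middle fragment, while joint decoding assigns it "body" because two consecutive titles are penalized. This is the kind of ambiguity that contextual dependencies are meant to resolve; a tree-structured graph would need a different inference routine than the chain case shown here.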

Bibliographic details

  • Source
    Document recognition and retrieval XXI | 2014 | 90210I.1-90210I.9 | 9 pages
  • Conference location: San Francisco, CA (US)
  • Authors

    Xin Tao; Zhi Tang; Canhui Xu;

  • Author affiliations

    Institute of Computer Science and Technology, Peking University, Beijing, China;

    Institute of Computer Science and Technology, Peking University, Beijing, China, and State Key Laboratory of Digital Publishing Technology, Beijing, China;

    Institute of Computer Science and Technology, Peking University, Beijing, China, and State Key Laboratory of Digital Publishing Technology, Beijing, China;

  • Original format: PDF
  • Language of text: English
  • Keywords

    Page structure; Logical labeling; Conditional Random Fields; Fixed-layout document;
