
Method and apparatus for detecting pagination constructs including a header and a footer in legacy documents


Abstract

A method for identifying the header and footer content of a document so that text fragments, made up of recognizable text blocks derived from the document, can be sequenced. The textual variability of lines composed of text blocks is analyzed, taking into account the different kinds of text blocks within each line. Header and footer zones are defined by textual content with low textual variability. An alternative embodiment identifies pagination constructs by comparing selected text boxes for similarity and proximity and clustering the text boxes that satisfy a predetermined similarity value; the clustered text boxes are deemed to comprise pagination constructs.
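The low-variability idea in the abstract can be illustrated with a minimal sketch. This is not the patented method: the abstract does not give a scoring formula, so the variability score used here (distinct texts at a line position divided by the number of pages that have that position), the digit normalization, the `threshold` of 0.5, and the `max_lines` limit are all assumptions made for illustration. The function names `normalize`, `variability`, and `zone_size` are hypothetical.

```python
import re
from typing import List


def normalize(line: str) -> str:
    """Collapse digit runs so 'Page 3' and 'Page 12' compare equal (an assumption)."""
    return re.sub(r"\d+", "#", line.strip())


def variability(pages: List[List[str]], position: int) -> float:
    """Distinct normalized texts at a line position, divided by pages that have it.

    position >= 0 counts from the top of the page; position < 0 from the bottom.
    Returns 1.0 (maximally variable) when no page has a line at that position.
    """
    texts = []
    for lines in pages:
        if -len(lines) <= position < len(lines):
            texts.append(normalize(lines[position]))
    if not texts:
        return 1.0
    return len(set(texts)) / len(texts)


def zone_size(pages: List[List[str]], from_top: bool,
              threshold: float = 0.5, max_lines: int = 5) -> int:
    """Count consecutive low-variability line positions, starting at the page edge."""
    size = 0
    for k in range(max_lines):
        position = k if from_top else -(k + 1)
        if variability(pages, position) > threshold:
            break  # first "body-like" line ends the zone
        size += 1
    return size


# Hypothetical three-page document, each page given as lines from top to bottom.
pages = [
    ["Annual Report 2010", "Intro text alpha", "More body one", "Page 1"],
    ["Annual Report 2010", "Different body beta", "Other content two", "Page 2"],
    ["Annual Report 2010", "Yet another gamma", "Final words three", "Page 3"],
]
print(zone_size(pages, from_top=True), zone_size(pages, from_top=False))  # 1 1
```

With this score a single-page document never yields a zone, because its only value at each position gives a variability of 1.0. The alternative embodiment (clustering text boxes by similarity and proximity) would need box coordinates and is not sketched here.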

Bibliographic details

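  • Publication number: US9218326B2
  • Patent type:
  • Publication date: 2015-12-22
  • Original format: PDF
  • Applicant/patentee: HERVE DEJEAN; JEAN-LUC MEUNIER
  • Application number: US201113032996
  • Inventors: HERVE DEJEAN; JEAN-LUC MEUNIER
  • Filing date: 2011-02-23
  • Classification: G06F17; G06F17/21; G06F17/27
  • Country: US
  • Date indexed: 2022-08-21 14:29:29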
