EXTRACTION DEVICE FOR COMPOSITE GRAPH IN FIXED LAYOUT DOCUMENT AND EXTRACTION METHOD THEREOF

Abstract

An extraction device for composite graphs in a fixed layout document, comprising: a document parsing unit for parsing the fixed layout document and determining its primitives and their types; a layer generation unit for extracting the text primitives to form a text layer and using the remaining non-text primitives to form a non-text layer; a page analysis unit for performing page analysis on the text layer and the non-text layer separately; a block generation unit for generating text blocks in the text layer and graph blocks in the non-text layer; a correlation block determination unit for determining the text blocks correlated with each graph block and merging each graph block with its correlated text blocks into a composite graph block; and an identifier storage unit for storing the identifiers of all primitives contained in the composite graph block.
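
The flow of units described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how blocks are generated or how correlation between a text block and a graph block is decided, so the sketch assumes bounding-box proximity for both. All names (`Primitive`, `Block`, `make_blocks`, `composite_blocks`) and the `gap` and `margin` thresholds are hypothetical, and the page analysis step is omitted.

```python
from dataclasses import dataclass


@dataclass
class Primitive:
    pid: int      # identifier assigned when the document is parsed
    kind: str     # primitive type, e.g. "text", "path", "image"
    bbox: tuple   # (x0, y0, x1, y1) in page coordinates


@dataclass
class Block:
    ids: list     # identifiers of the primitives in this block
    bbox: tuple   # bounding box enclosing all of them


def expand(b, m):
    return (b[0] - m, b[1] - m, b[2] + m, b[3] + m)


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def split_layers(primitives):
    """Layer generation: text primitives vs. all remaining primitives."""
    text = [p for p in primitives if p.kind == "text"]
    non_text = [p for p in primitives if p.kind != "text"]
    return text, non_text


def make_blocks(prims, gap):
    """Block generation (assumed rule): merge boxes closer than `gap`."""
    blocks = [Block([p.pid], p.bbox) for p in prims]
    changed = True
    while changed:
        changed = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if intersects(expand(blocks[i].bbox, gap), blocks[j].bbox):
                    a, b = blocks[i], blocks.pop(j)
                    blocks[i] = Block(a.ids + b.ids, union(a.bbox, b.bbox))
                    changed = True
                    break
            if changed:
                break
    return blocks


def composite_blocks(text_blocks, graph_blocks, margin):
    """Correlation (assumed rule): a text block belongs to a graph block
    if it lies within `margin` of the graph block's bounding box."""
    result = []
    for g in graph_blocks:
        ids, box = list(g.ids), g.bbox
        for t in text_blocks:
            if intersects(expand(g.bbox, margin), t.bbox):
                ids += t.ids
                box = union(box, t.bbox)
        result.append(Block(sorted(ids), box))
    return result


# Made-up page: two overlapping paths, a label, a caption, a body paragraph.
page = [
    Primitive(1, "path", (0, 0, 100, 100)),
    Primitive(2, "path", (90, 90, 150, 150)),
    Primitive(3, "text", (20, 20, 40, 30)),      # label inside the figure
    Primitive(4, "text", (0, 105, 100, 115)),    # caption next to the figure
    Primitive(5, "text", (0, 300, 200, 320)),    # unrelated body text
]
text_layer, non_text_layer = split_layers(page)
text_blocks = make_blocks(text_layer, gap=2)
graph_blocks = make_blocks(non_text_layer, gap=5)
composites = composite_blocks(text_blocks, graph_blocks, margin=10)

# Identifier storage: keep only the primitive identifiers per composite block.
stored_ids = [c.ids for c in composites]
print(stored_ids)  # [[1, 2, 3, 4]]
```

Storing only identifiers, as the abstract describes, means a composite graph is recorded as a list of references to primitives rather than a copy of them. In this made-up page the body paragraph (primitive 5) is left out because it lies outside the assumed margin.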
