APPARATUS AND METHOD FOR RECOGNIZING IMAGE-BASED CONTENT PRESENTED IN A STRUCTURED LAYOUT

Abstract

A method for extracting information from a table includes the following steps. Characters of the table are extracted. The characters are merged into n-gram characters. The n-gram characters are merged into words and text lines through a two-stage GNN mode. The two-stage GNN mode comprises the following sub-steps: spatial features, semantic features, and CNN image features are extracted from a target source; a first GNN stage is processed to output graph embedding spatial features from the spatial features; and a second GNN stage is processed to output graph embedding semantic features and graph embedding CNN image features from the semantic features and the CNN image features, respectively. The text lines are merged into cells. The cells are grouped into rows, columns, and key-value pairs, yielding row, column, and key-value relationships among the cells. Adjacency matrices are generated from these row, column, and key-value relationships.
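
The final step of the abstract, generating adjacency matrices from the row, column, and key-value relationships, can be illustrated with a minimal Python sketch. The abstract does not specify how the matrices are encoded, so everything here is an assumption made for illustration: the function names `adjacency_from_groups` and `adjacency_from_pairs` are hypothetical, cells are identified by integer indices, row and column matrices are taken to be symmetric 0/1 matrices, and the key-value matrix is taken to be directed from key to value.

```python
from typing import List, Tuple


def adjacency_from_groups(num_cells: int, groups: List[List[int]]) -> List[List[int]]:
    """Build a symmetric 0/1 matrix: cells i and j are linked if they share a group.

    Assumption: a "group" is one row or one column, given as a list of cell indices.
    """
    matrix = [[0] * num_cells for _ in range(num_cells)]
    for group in groups:
        for i in group:
            for j in group:
                if i != j:  # no self-links on the diagonal
                    matrix[i][j] = 1
    return matrix


def adjacency_from_pairs(num_cells: int, pairs: List[Tuple[int, int]]) -> List[List[int]]:
    """Build a directed 0/1 matrix with an entry at [key][value] for each pair.

    Assumption: the direction key -> value is an illustrative choice, not from the source.
    """
    matrix = [[0] * num_cells for _ in range(num_cells)]
    for key, value in pairs:
        matrix[key][value] = 1
    return matrix


# Hypothetical 2x2 table: cells 0 and 1 form the first row, cells 2 and 3 the second.
rows = [[0, 1], [2, 3]]
columns = [[0, 2], [1, 3]]
key_values = [(0, 1), (2, 3)]  # cell 0 is the key of cell 1, cell 2 the key of cell 3

row_adj = adjacency_from_groups(4, rows)
col_adj = adjacency_from_groups(4, columns)
kv_adj = adjacency_from_pairs(4, key_values)

for name, matrix in (("row", row_adj), ("column", col_adj), ("key-value", kv_adj)):
    print(name)
    for line in matrix:
        print(" ", line)
```

Keeping one matrix per relationship type, rather than a single combined matrix, lets a downstream consumer query row, column, and key-value structure independently; whether the described method stores them this way is not stated in the abstract.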
