International Conference on Document Analysis and Recognition

Semantic Label and Structure Model based Approach for Entity Recognition in Database Context



Abstract

This paper proposes an approach for recognizing entities in scanned documents by referring to their descriptions in database records. First, the document fields that correspond to database record values are labeled. Second, entities are identified from their labels and ranked using a TF-IDF-based score. For each entity, the local labels are grouped into a graph. This graph is matched, using a specific cost function, against a graph model (the structure model) that represents the geometric structures of local entity labels. The model is trained on a set of carefully selected entities that are annotated semi-automatically. Finally, a correction step uses the geometric relationships between labels to repair any remaining entity mislabeling. In an evaluation on 200 business documents containing 500 entities, the approach reaches about 93% recall and 97% precision.
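The first two steps of the abstract (labeling document fields from database values, then ranking candidate entities with a TF-IDF-style score) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the sample records, exact case-insensitive token matching, and the scoring formula (occurrence count times a smoothed inverse document frequency). The abstract does not give the paper's actual formula, and the graph matching and correction steps are not covered here.

```python
import math
from collections import Counter

def idf_weights(records):
    """Hypothetical IDF per field value: values rare in the database weigh more."""
    n = len(records)
    df = Counter(v for r in records for v in {x.lower() for x in r.values()})
    return {v: math.log(n / c) + 1.0 for v, c in df.items()}

def label_document(tokens, record):
    """Return {field: [token positions]} for record values found in the document.

    Exact single-token matching is a simplification; OCR output would
    normally call for approximate matching.
    """
    labels = {}
    for field, value in record.items():
        hits = [i for i, t in enumerate(tokens) if t.lower() == value.lower()]
        if hits:
            labels[field] = hits
    return labels

def rank_entities(tokens, records):
    """Score each database record by its labels in the document, best first."""
    idf = idf_weights(records)
    scored = []
    for idx, rec in enumerate(records):
        labels = label_document(tokens, rec)
        # Assumed score: term frequency (number of hits) times IDF of the value.
        score = sum(len(pos) * idf[rec[f].lower()] for f, pos in labels.items())
        scored.append((score, idx, labels))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored

# Illustrative data only, not taken from the paper.
records = [
    {"name": "Acme", "city": "Paris", "zip": "75001"},
    {"name": "Globex", "city": "Paris", "zip": "69002"},
    {"name": "Initech", "city": "Lyon", "zip": "69002"},
]
tokens = "Invoice Acme 12 rue X 75001 Paris".split()
ranked = rank_entities(tokens, records)
```

In this toy example the first record ranks highest because three of its values appear in the document, two of them unique in the database; the shared value "Paris" contributes less to the score. The positions stored in each label set are the kind of local information that a later geometric (graph-based) step could build on.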
