
EXTRACTING NAMED ENTITIES BASED USING DOCUMENT STRUCTURE


Abstract

The invention relates to a method for extracting at least one entity in at least one document (100), comprising the steps of: receiving the at least one document (100) as input data (S1); identifying at least one block (10) in the at least one document (100) based on the structure or layout of the at least one document (S2); determining at least one feature (12) associated with the identified at least one block (10), wherein the at least one feature relates to the content of the at least one block (10), the structure of the at least one block (10) and/or other information related to the block (10) (S3); and determining at least one score for the at least one block (10) based on the at least one block (10) and the associated at least one feature (12) using machine learning, wherein the at least one score is the likelihood that the at least one block (10) contains the at least one entity. Further, the invention relates to a corresponding computer program product and system.
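The abstract states the method only at claim level. As a rough illustration, here is a minimal Python sketch of the four steps under assumptions of our own, none of which come from the patent: blocks are paragraphs separated by blank lines (a stand-in for real layout analysis), the features (digit ratio, uppercase ratio, line count, and whether the preceding block ends in a colon) are hypothetical, and a hand-rolled logistic regression stands in for whatever machine-learning model the inventors actually use. All function names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Block:
    index: int  # position of the block in reading order
    text: str


def identify_blocks(document: str) -> list[Block]:
    """Step S2 (assumed heuristic): blank-line-separated paragraphs stand in for layout blocks."""
    texts = [part.strip() for part in document.split("\n\n") if part.strip()]
    return [Block(i, t) for i, t in enumerate(texts)]


def block_features(block: Block, blocks: list[Block]) -> list[float]:
    """Step S3: hypothetical content, structure and other-block features."""
    text = block.text
    n_chars = max(len(text), 1)
    digit_ratio = sum(c.isdigit() for c in text) / n_chars   # content
    upper_ratio = sum(c.isupper() for c in text) / n_chars   # content
    n_lines = float(text.count("\n") + 1)                    # structure
    # Other-block information: does the preceding block look like a field label?
    prev_is_label = float(block.index > 0 and blocks[block.index - 1].text.endswith(":"))
    return [1.0, digit_ratio, upper_ratio, n_lines, prev_is_label]  # leading 1.0 is the bias term


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: list[list[float]], y: list[int],
                   lr: float = 0.5, epochs: int = 500) -> list[float]:
    """Plain stochastic gradient descent on the log-loss (stand-in for any ML model)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            w = [wj + lr * (yi - p) * xj for wj, xj in zip(w, xi)]
    return w


def score_blocks(document: str, w: list[float]) -> list[tuple[str, float]]:
    """Final step: likelihood that each block contains the target entity."""
    blocks = identify_blocks(document)
    return [
        (b.text, sigmoid(sum(wj * xj for wj, xj in zip(w, block_features(b, blocks)))))
        for b in blocks
    ]


if __name__ == "__main__":
    train_doc = ("ACME GmbH\nMain Street 1\n\nInvoice number:\n\n"
                 "INV-2019-0042\n\nThank you for your business.")
    train_blocks = identify_blocks(train_doc)
    X = [block_features(b, train_blocks) for b in train_blocks]
    y = [0, 0, 1, 0]  # hypothetical labels: only the invoice-number block holds the entity
    weights = train_logistic(X, y)

    new_doc = "Globex Corp\n\nInvoice number:\n\nINV-2020-0007\n\nPayment due in 30 days."
    for text, score in score_blocks(new_doc, weights):
        print(f"{score:.3f}  {text!r}")
```

Each printed score is read as the likelihood that the block contains the entity, matching the abstract's definition. The toy model is trained on a single labelled document only to keep the sketch short; a practical system would learn from many annotated documents and richer layout features.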

Bibliographic data

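  • Publication number: EP3716104A1

  • Publication date: 2020-09-30

  • Original format: PDF

  • Applicant/patentee: SIEMENS AKTIENGESELLSCHAFT

  • Application number: EP20190165469

  • Inventors: BUCKLEY MARK; LANGER STEFAN

  • Filing date: 2019-03-27

  • Classification: G06F17/27; G06F17/22

  • Country: EP

  • Added to database: 2022-08-21 11:39:27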
