Graph based re-composition of document fragments for name entity recognition under exploitation of enterprise databases

Abstract

Methods and systems are described that involve recognizing complex entities from text documents with the help of structured data and Natural Language Processing (NLP) techniques. In one embodiment, the method includes receiving a document as input from a set of documents, wherein the document contains text or unstructured data. The method also includes identifying a plurality of text segments from the document via a set of tagging techniques. Further, the method includes matching the identified plurality of text segments against attributes of a set of predefined entities. Lastly, a best matching predefined entity is selected for each text segment from the plurality of text segments.

In one embodiment, the system includes a set of documents, each document containing text or unstructured data. The system also includes a database storage unit that stores a set of predefined entities, wherein each entity contains a set of attributes. Further, the system includes a processor to identify a plurality of text segments from a document via a set of tagging techniques and to match the identified plurality of text segments against the set of attributes.
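
The method steps in the abstract (tag text segments, match them against entity attributes, select the best match) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `ENTITIES` records, the capitalized-word tagger, the `difflib` string similarity, and the `threshold` parameter are all hypothetical stand-ins. The sketch also does not attempt the graph-based re-composition named in the title, which the abstract does not detail.

```python
import re
from difflib import SequenceMatcher

# Hypothetical predefined entities, standing in for rows of an enterprise database.
ENTITIES = [
    {"id": "C001", "name": "Acme Corporation", "city": "Berlin"},
    {"id": "C002", "name": "Globex Industries", "city": "Munich"},
]


def tag_segments(text):
    """Toy tagger (assumption): runs of capitalized words become candidate segments."""
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text)


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] based on difflib's matching blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_entity(segment, entities):
    """Return (entity, score) for the entity whose attribute best matches the segment."""
    best, best_score = None, 0.0
    for entity in entities:
        for key, value in entity.items():
            if key == "id":  # the identifier is not a matchable attribute
                continue
            score = similarity(segment, value)
            if score > best_score:
                best, best_score = entity, score
    return best, best_score


def recognize(text, entities, threshold=0.6):
    """Map each tagged segment to the id of its best-matching entity, if good enough."""
    results = {}
    for segment in tag_segments(text):
        entity, score = best_entity(segment, entities)
        if entity is not None and score >= threshold:
            results[segment] = entity["id"]
    return results


if __name__ == "__main__":
    print(recognize("We shipped the order to Acme Corp in Berlin last week.", ENTITIES))
```

The threshold keeps weak matches (for example, an ordinary capitalized word at the start of a sentence) from being assigned to an entity. A real system would likely replace the regex tagger with proper NLP tagging and the in-memory list with database queries.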
