
METHOD FOR AUTOMATICALLY CONSTRUCTING CORPUS, METHOD AND APPARATUS FOR RECOGNIZING NAMED ENTITY USING THE SAME


Abstract

Proposed are a method for automatically constructing a corpus, and a method and an apparatus for recognizing named entities using that corpus. The approach specifies the portion in which unstructured personal information is detected in order to achieve high detection accuracy, avoids heavyweight morphological analysis in order to stay fast, and constructs a minimal dictionary that is extended via web search when necessary so that it reflects the latest information. The proposed named entity recognition method comprises the steps of: constructing a named entity dictionary that records unstructured personal information; running a search in which one or more of the dictionary entries and user-supplied words are the search targets, checking the search results, and extracting one or more snippets for each data characteristic; tagging the extracted snippets with the corresponding named entities to obtain named entity training data; determining a learning model for recognizing the named entities, i.e. the unstructured personal information, based on the obtained training data; and, when a target document is received, automatically detecting the named entities (the unstructured personal information) present in that document using the named entity dictionary and the determined learning model, and outputting the tagged document.
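The snippet-tagging step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the dictionary contents, the entity labels (`PER`, `ORG`), the BIO tag scheme, and the function names `tokenize`, `tag_snippet`, and `build_training_data` are all assumptions made for illustration. The web search step is replaced by a hard-coded list of snippets, and tokenization is a simple regular expression, in keeping with the abstract's stated goal of avoiding heavyweight morphological analysis.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical entity dictionary: surface form -> entity label.
ENTITY_DICT: Dict[str, str] = {
    "Seoul National University": "ORG",
    "Kim Minsu": "PER",
}


def tokenize(text: str) -> List[str]:
    """Split into word tokens and single punctuation marks (no morphological analysis)."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag_snippet(snippet: str, entity_dict: Dict[str, str]) -> List[Tuple[str, str]]:
    """Assign BIO tags to the tokens of a snippet by longest-match dictionary lookup."""
    tokens = tokenize(snippet)
    # Try longer dictionary entries first so they win over shorter overlapping ones.
    entries = sorted(
        ((tokenize(name), label) for name, label in entity_dict.items()),
        key=lambda entry: len(entry[0]),
        reverse=True,
    )
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for entry_tokens, label in entries:
            n = len(entry_tokens)
            if n and tokens[i:i + n] == entry_tokens:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                break
        else:
            # No dictionary entry starts at this token; leave it tagged "O".
            i += 1
    return list(zip(tokens, tags))


def build_training_data(
    snippets: List[str], entity_dict: Dict[str, str]
) -> List[List[Tuple[str, str]]]:
    """Turn raw snippets into token/tag sequences usable as training data."""
    return [tag_snippet(s, entity_dict) for s in snippets]


if __name__ == "__main__":
    # Stand-in for snippets that a web search on dictionary entries might return.
    snippets = ["Kim Minsu studies at Seoul National University."]
    for sentence in build_training_data(snippets, ENTITY_DICT):
        for token, tag in sentence:
            print(f"{token}\t{tag}")
```

The resulting token/tag sequences are in the shape that a sequence-labeling model would typically consume as training data; the choice of model is left open here, as the abstract does not name one.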

