Conference: Asia-Pacific Services Computing Conference

An Unsupervised Method for Linking Entity Mentions in Chinese Text

Abstract

Entity linking is the process of linking entity mentions in text to unambiguous entity objects in a knowledge base. It is a key step in expanding a knowledge base, and it can improve the information filtering ability of online recommendation systems, search engines, and other practical applications. However, the large number of entities and the diversity and ambiguity of entity names pose major challenges for entity linking research. In addition, the scarcity of Chinese knowledge bases and the complex syntax of Chinese text hinder research on Chinese entity linking. To meet the processing requirements of Chinese text, we propose an unsupervised Chinese entity linking method, un-CEML. The method uses Baidu encyclopedia as its knowledge base, applies a similarity algorithm to retrieve entries from Baidu encyclopedia, and combines the characteristics of this encyclopedia to obtain candidate entities. This handles abbreviated and wrongly segmented entity mentions while controlling the size of the candidate set and the probability that it contains the target entity. In the candidate ranking stage, we use the dependencies between the components of a sentence to extract the information most strongly related to an entity mention as its context, which reduces noise when computing similarity with candidate entities. Because nominal mentions are mostly common words with little correlation to the knowledge in the document, we handle them separately. We conduct experiments on real data sets and compare our method with several standard methods. The experimental results show that our method can resolve the ambiguity of Chinese entity mentions and achieves high linking accuracy.
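
The two stages described in the abstract, candidate generation followed by context-based ranking, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity algorithm, so a character-overlap (Dice) score and a cosine over word counts are used here as stand-ins. The toy knowledge base `KB`, the function names, and the `threshold` parameter are all hypothetical. The context words are assumed to have been selected already (the paper derives them from sentence dependencies, which this sketch does not attempt).

```python
from collections import Counter
from math import sqrt

# Hypothetical toy knowledge base: entry title -> pre-segmented description.
# The paper uses Baidu encyclopedia; these entries are invented for illustration.
KB = {
    "苹果公司": "美国 科技 公司 生产 手机 电脑",
    "苹果": "蔷薇科 植物 果实 水果",
    "北京大学": "中国 北京 综合性 大学",
}


def is_subsequence(short, long):
    """True if the characters of `short` occur in `long` in the same order.

    Used as a simple abbreviation check, e.g. 北大 against 北京大学.
    """
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def char_overlap(a, b):
    """Dice coefficient over character sets (assumed stand-in similarity)."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 0.0
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def candidates(mention, kb, threshold=0.5):
    """Return entry titles that match the mention as an abbreviation or by overlap."""
    return [
        title
        for title in kb
        if is_subsequence(mention, title) or char_overlap(mention, title) >= threshold
    ]


def cosine(words_a, words_b):
    """Cosine similarity between two bags of words."""
    count_a, count_b = Counter(words_a), Counter(words_b)
    dot = sum(count_a[w] * count_b[w] for w in count_a)
    norm = sqrt(sum(v * v for v in count_a.values())) * sqrt(
        sum(v * v for v in count_b.values())
    )
    return dot / norm if norm else 0.0


def link(mention, context_words, kb):
    """Pick the candidate whose description is most similar to the context."""
    cands = candidates(mention, kb)
    if not cands:
        return None
    return max(cands, key=lambda title: cosine(context_words, kb[title].split()))
```

For example, `link("苹果", ["发布", "新", "手机"], KB)` selects the company entry because only its description shares a word with the context, while `link("苹果", ["吃", "水果"], KB)` selects the fruit entry. A real system would need a much larger knowledge base and a proper word segmenter.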
