Journal: Intelligent Data Analysis

Entity resolution in disjoint graphs: An application on genealogical data


Abstract

Entity resolution (ER) is the process of identifying references that refer to the same entity across one or more data sources. Most existing approaches either exploit only the content of references (content-based ER) or additionally consider linkage information among references (context-based ER). However, in newer applications of ER, such as the genealogical domain, linkage information among references is very limited. The result is a disjoint graph, to which existing content- and context-based ER techniques have very limited applicability. In this paper we therefore propose, first, to augment the original input graph using the homophily principle, connecting potentially similar references, and second, to use a random-walk-based approach that takes into account the contextual information available for each reference in the augmented graph. We evaluate the proposed method on a large genealogical dataset, where it predicts 420,000 reference matches with 92% precision and reveals six novel and informative patterns among them that cannot be detected in the original disjoint graph.
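The two steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the threshold, or the random-walk variant, so the string similarity from `difflib`, the `threshold` of 0.8, the random walk with restart, the `restart` probability of 0.15, and the toy names and certificate links below are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Content similarity of two names in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_adjacency(references, edges):
    """Undirected adjacency sets for the original linkage graph."""
    adjacency = {ref_id: set() for ref_id in references}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def augment_graph(references, edges, threshold=0.8):
    """Step 1 (homophily): also link references whose names are similar."""
    adjacency = build_adjacency(references, edges)
    for u, v in combinations(references, 2):
        if name_similarity(references[u], references[v]) >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def random_walk_with_restart(adjacency, start, restart=0.15, iterations=50):
    """Step 2: proximity of every node to `start` via power iteration."""
    scores = {node: 0.0 for node in adjacency}
    scores[start] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in adjacency}
        for node, mass in scores.items():
            neighbours = adjacency[node]
            if not neighbours:
                # Isolated node: send its walking mass back to the start.
                updated[start] += (1 - restart) * mass
                continue
            share = (1 - restart) * mass / len(neighbours)
            for neighbour in neighbours:
                updated[neighbour] += share
        # Total mass is 1, so the restart step returns `restart` to the start.
        updated[start] += restart
        scores = updated
    return scores


# Hypothetical toy data: two certificates with no links between them.
references = {
    "a1": "Johannes Smit",   # groom on certificate A
    "a2": "Maria Jansen",    # bride on certificate A
    "b1": "Johannes Smidt",  # father on certificate B
    "b2": "Pieter Smidt",    # child on certificate B
}
edges = [("a1", "a2"), ("b1", "b2")]

original = build_adjacency(references, edges)
augmented = augment_graph(references, edges)

before = random_walk_with_restart(original, "a1")
after = random_walk_with_restart(augmented, "a1")

print("b2 proximity to a1, original graph: ", before["b2"])
print("b2 proximity to a1, augmented graph:", after["b2"])
```

In the original graph the walk started at `a1` never leaves certificate A, so references on certificate B receive no score. After augmentation, the similar-name edge between `a1` and `b1` lets contextual information flow between the two components.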
