ISWC 2011; International Semantic Web Conference

Automatically Generating Data Linkages Using a Domain-Independent Candidate Selection Approach

Abstract

One challenge for Linked Data is scalably establishing high-quality owl:sameAs links between instances (e.g., people, geographical locations, and publications) in different data sources. Traditional approaches to this entity coreference problem do not scale because they exhaustively compare every pair of instances. In this paper, we propose a candidate selection algorithm for pruning the search space for entity coreference. We select candidate instance pairs by computing a character-level similarity on discriminating literal values that are chosen using domain-independent unsupervised learning. We index the instances on the chosen predicates' literal values to efficiently look up similar instances. We evaluate our approach on two RDF and three structured datasets. We show that the traditional metrics do not always accurately reflect the relative benefits of candidate selection, and we propose additional metrics. We show that our algorithm frequently outperforms alternatives and can process 1 million instances in under one hour on a single Sun Workstation. Furthermore, on the RDF datasets, we show that the entire entity coreference process scales well when our technique is applied. Surprisingly, this high-recall, low-precision filtering mechanism frequently leads to higher F-scores in the overall system.
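The pipeline outlined in the abstract (choose discriminating predicates, index instances on their literal values, keep pairs whose literals are similar at the character level) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the exact formulas, so the discriminability measure (distinct values divided by number of triples), the use of Jaccard similarity over character trigrams, the thresholds, and all function names here are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def discriminability(triples, predicate):
    """Assumed measure: share of distinct literal values among a predicate's triples."""
    values = [o for (_, p, o) in triples if p == predicate]
    return len(set(values)) / len(values) if values else 0.0


def char_ngrams(text, n=3):
    """Set of character n-grams of a lowercased literal value."""
    s = text.lower()
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def select_candidates(triples, min_discriminability=0.5, min_jaccard=0.4, n=3):
    """Return instance pairs worth passing to a full coreference check."""
    # Step 1: keep predicates whose values tell instances apart (assumed threshold).
    predicates = {p for (_, p, _) in triples}
    chosen = {p for p in predicates
              if discriminability(triples, p) >= min_discriminability}

    # Step 2: collect each instance's n-grams over the chosen predicates.
    grams = defaultdict(set)
    for s, p, o in triples:
        if p in chosen:
            grams[s] |= char_ngrams(o, n)

    # Step 3: inverted index from n-gram to the instances containing it.
    index = defaultdict(set)
    for instance, instance_grams in grams.items():
        for g in instance_grams:
            index[g].add(instance)

    # Step 4: only instances sharing at least one n-gram are ever compared.
    candidates = set()
    for instances in index.values():
        for a, b in combinations(sorted(instances), 2):
            if (a, b) in candidates:
                continue
            jaccard = len(grams[a] & grams[b]) / len(grams[a] | grams[b])
            if jaccard >= min_jaccard:
                candidates.add((a, b))
    return candidates


# Hypothetical (subject, predicate, literal) triples from two sources, "a" and "b".
triples = [
    ("a:1", "name", "John Smith"),
    ("b:1", "name", "Jon Smith"),
    ("a:2", "name", "Maria Garcia"),
    ("a:1", "type", "Person"),
    ("b:1", "type", "Person"),
    ("a:2", "type", "Person"),
]

print(select_candidates(triples))
```

In this toy data the "type" predicate has a single value shared by every instance, so it is dropped, and only the two "Smith" instances share enough trigrams to be paired. The inverted index is what avoids the exhaustive pairwise comparison: instances with no n-gram in common never meet.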
