Journal: Semantic Web

Network metrics for assessing the quality of entity resolution between multiple datasets

Abstract

Matching entities between datasets is a crucial step for combining multiple datasets on the semantic web. A rich literature exists on different approaches to this entity resolution problem. However, much less work has been done on how to assess the quality of such entity links once they have been generated. Evaluation methods for link quality are typically limited to either comparison with a ground truth dataset (which is often not available), manual work (which is cumbersome and prone to error), or crowd sourcing (which is not always feasible, especially if expert knowledge is required). Furthermore, the problem of link evaluation is greatly exacerbated for links between more than two datasets, because the number of possible links grows rapidly with the number of datasets. In this paper, we propose a method to estimate the quality of entity links between multiple datasets. We exploit the fact that the links between entities from multiple datasets form a network, and we show how simple metrics on this network can reliably predict their quality. We verify our results in a large experimental study using six datasets from the domain of science, technology and innovation studies, for which we created a gold standard. This gold standard, available online, is an additional contribution of this paper. In addition, we evaluate our metric on a recently published gold standard to confirm our findings.
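The idea that entity links across several datasets form a network can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which network metrics the authors use, so the density measure below is a stand-in chosen for simplicity, and the entity identifiers, the `LINKS` data, and the function names `components` and `density` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical links: each entity is a (dataset, local_id) pair.
LINKS = [
    (("A", 1), ("B", 1)), (("B", 1), ("C", 1)), (("A", 1), ("C", 1)),
    (("A", 2), ("B", 2)), (("B", 2), ("C", 2)), (("C", 2), ("D", 2)),
]

def components(links):
    """Group entities into connected components of the link network."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        result.append(members)
    return result

def density(members, links):
    """Fraction of possible entity pairs in a component that are linked.

    Assumed stand-in metric, not necessarily one used in the paper.
    """
    n = len(members)
    possible = n * (n - 1) // 2
    present = {
        frozenset((a, b))
        for a, b in links
        if a != b and a in members and b in members
    }
    return len(present) / possible if possible else 0.0

for group in components(LINKS):
    print(sorted(group), density(group, LINKS))
```

In this toy data, the three entities with id 1 are all linked to one another, while the four entities with id 2 form a chain. Under the assumed metric, the fully connected group scores higher than the chain, which is the kind of structural signal a network-based quality estimate could use without a ground truth.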
