Venue: International Workshop on Resource Discovery

LiQuate-Estimating the Quality of Links in the Linking Open Data Cloud


Abstract

In recent years, RDF datasets from almost every knowledge domain have been published in the Linking Open Data (LOD) cloud. The Linked Open Data guidelines establish the conditions that resources must satisfy in order to be included in the LOD cloud and connected to previously published data. Publishing and linking resources in the LOD cloud relies on: i) data cleaning and transformation into existing RDF formats, ii) storage of the data in RDF storage systems, and iii) data interlinking. Because data sources are heterogeneous, the generated RDF data may be ambiguous, and links may be incomplete with respect to this data. Users of the Web of Data require linked data to meet high quality standards in order to develop applications that produce trustworthy results, but data in the LOD cloud has not been curated; thus, tools are needed to detect data quality problems. For example, researchers who study Life Sciences datasets to explain phenomena or identify anomalies demand that their findings correspond to current discoveries, and not to the effects of low data quality in terms of completeness or redundancy. In this paper we propose LiQuate, a system that uses Bayesian networks to study the incompleteness of links, as well as ambiguities between labels and between links in the LOD cloud, and that can be applied to any domain. Additionally, a probabilistic rule-based system is used to infer new links that associate equivalent resources and make it possible to resolve the ambiguities and incompleteness identified during the exploration of the Bayesian network. As a proof of concept, we applied LiQuate to existing Life Sciences linked datasets and detected ambiguities in the data that may compromise confidence in the results of applications such as link prediction or pattern discovery. We illustrate a variety of identified problems and propose a set of enriched intra- and inter-links that may improve the quality of data items and links of specific datasets of the LOD cloud.
