
LiQuate-Estimating the Quality of Links in the Linking Open Data Cloud


Abstract

In recent years, RDF datasets from almost every knowledge domain have been published in the Linking Open Data (LOD) cloud. The Linked Open Data guidelines establish the conditions that resources must satisfy in order to be included in the LOD cloud and connected to previously published data. The process of publishing and linking resources in the LOD cloud relies on: i) data cleaning and transformation into existing RDF formats, ii) storage of the data in RDF storage systems, and iii) data interlinking. Because data sources are heterogeneous, the generated RDF data may be ambiguous, and links may be incomplete with respect to this data. Users of the Web of Data require linked data to meet high quality standards in order to develop applications that produce trustworthy results, but data in the LOD cloud has not been curated; thus, tools are needed to detect data quality problems. For example, researchers who study Life Sciences datasets to explain phenomena or identify anomalies demand that their findings correspond to current discoveries, and not to the effects of low data quality in terms of completeness or redundancy. In this paper we propose LiQuate, a system that uses Bayesian networks to study the incompleteness of links, and the ambiguities between labels and between links, in the LOD cloud; it can be applied to any domain. Additionally, a probabilistic rule-based system is used to infer new links that associate equivalent resources, making it possible to resolve the ambiguities and incompleteness identified during the exploration of the Bayesian network. As a proof of concept, we applied LiQuate to existing Life Sciences linked datasets and detected ambiguities in the data that may compromise confidence in the results of applications such as link prediction or pattern discovery. We illustrate a variety of identified problems and propose a set of enriched intra- and inter-links that may improve the quality of data items and links in specific datasets of the LOD cloud.
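To make the notion of link incompleteness concrete, the following is a minimal sketch, not LiQuate's actual method, which the abstract does not specify. It estimates an empirical conditional probability of the kind that could populate one entry of a Bayesian network: the probability that a back-link exists given that a forward link exists. The predicates `ex:treats` and `ex:treatedBy`, the resource names, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# A plain in-memory stand-in for RDF triples (hypothetical data, no RDF library).
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


def link_pairs(triples, forward, backward):
    """Return (forward pairs, back-link pairs), both oriented as (source, target)."""
    forward_pairs = {(t.subject, t.obj) for t in triples if t.predicate == forward}
    # Flip back-links so they can be compared directly with forward pairs.
    backward_pairs = {(t.obj, t.subject) for t in triples if t.predicate == backward}
    return forward_pairs, backward_pairs


def back_link_probability(triples, forward, backward):
    """Estimate P(back-link exists | forward link exists) by simple counting."""
    forward_pairs, backward_pairs = link_pairs(triples, forward, backward)
    if not forward_pairs:
        return None  # Undefined when no forward links are observed.
    return len(forward_pairs & backward_pairs) / len(forward_pairs)


def missing_back_links(triples, forward, backward):
    """List forward links that have no corresponding back-link."""
    forward_pairs, backward_pairs = link_pairs(triples, forward, backward)
    return sorted(forward_pairs - backward_pairs)


# Hypothetical Life Sciences style example.
triples = [
    Triple("drug:A", "ex:treats", "disease:X"),
    Triple("drug:B", "ex:treats", "disease:X"),
    Triple("drug:C", "ex:treats", "disease:Y"),
    Triple("disease:X", "ex:treatedBy", "drug:A"),
    Triple("disease:Y", "ex:treatedBy", "drug:C"),
]

probability = back_link_probability(triples, "ex:treats", "ex:treatedBy")
missing = missing_back_links(triples, "ex:treats", "ex:treatedBy")
```

A probability well below 1 would suggest that the back-links are incomplete with respect to the forward links, and the pairs in `missing` are candidates for new links. A real system would read triples from an RDF store and model dependencies among several such variables rather than a single ratio.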

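Bibliographic details

  • Source
    Resource Discovery | 2012 | pp. 56-82 | 27 pages
  • Conference location: Heraklion (GR)
  • Author affiliations

    Universidad Simon Bolivar, Caracas, Venezuela;

    Universidad Simon Bolivar, Caracas, Venezuela;

  • Conference organizer
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification
  • Keywords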
