Venue: International Semantic Web Conference

A Scalable Framework for Quality Assessment of RDF Datasets

Abstract

Over the past years, Linked Data has grown continuously. Today, more than 10,000 datasets are available online following Linked Data standards. These standards allow data to be machine-readable and interoperable. Nevertheless, many applications, such as data integration, search, and interlinking, cannot take full advantage of Linked Data if it is of low quality. A few approaches exist for the quality assessment of Linked Data, but their performance degrades as data size increases, and the workload quickly grows beyond the capabilities of a single machine. In this paper, we present DistQualityAssessment, an open source implementation of quality assessment of large RDF datasets that can scale out to a cluster of machines. This is the first distributed, in-memory approach for computing different quality metrics for large RDF datasets using Apache Spark. We also provide a quality assessment pattern that can be used to generate new scalable metrics that can be applied to big data. The work presented here is integrated with the SANSA framework and has been applied to at least three use cases beyond the SANSA community. The results show that our approach is more generic, efficient, and scalable than previously proposed approaches.
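
The abstract does not spell out the quality assessment pattern, so the following is only an illustrative sketch of how a ratio-style metric might be computed in a partition-then-combine fashion. It uses plain Python lists in place of a Spark cluster, and every name in it (`ratio_metric`, `partition`, `is_literal`, the example triples) is hypothetical and is not part of DistQualityAssessment or SANSA.

```python
from functools import reduce
from typing import Callable, List, Tuple

# A triple is (subject, predicate, object), written N-Triples style.
Triple = Tuple[str, str, str]


def is_literal(term: str) -> bool:
    """Assumption: a term is a literal if it starts with a double quote."""
    return term.startswith('"')


def partition(triples: List[Triple], n: int) -> List[List[Triple]]:
    """Split the dataset into n chunks, standing in for cluster partitions."""
    return [triples[i::n] for i in range(n)]


def count_matching(
    chunk: List[Triple], rule: Callable[[Triple], bool]
) -> Tuple[int, int]:
    """Per-partition step: return (triples matching the rule, all triples)."""
    matched = sum(1 for t in chunk if rule(t))
    return matched, len(chunk)


def ratio_metric(
    triples: List[Triple], rule: Callable[[Triple], bool], n_partitions: int = 2
) -> float:
    """Combine the per-partition counts and return matched / total."""
    partials = [count_matching(c, rule) for c in partition(triples, n_partitions)]
    matched, total = reduce(
        lambda a, b: (a[0] + b[0], a[1] + b[1]), partials, (0, 0)
    )
    return matched / total if total else 0.0


# Hypothetical example data: three of the four objects are literals.
triples = [
    ("<http://example.org/a>", "<http://example.org/name>", '"Alice"'),
    ("<http://example.org/a>", "<http://example.org/knows>", "<http://example.org/b>"),
    ("<http://example.org/b>", "<http://example.org/name>", '"Bob"'),
    ("<http://example.org/b>", "<http://example.org/age>", '"42"'),
]

literal_ratio = ratio_metric(triples, lambda t: is_literal(t[2]))
print(literal_ratio)
```

Because each partition only reports a pair of counts, the combine step is associative and does not depend on how the data was split, which is the property that would let such a metric run across many machines.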