2018 IEEE 19th International Conference on Information Reuse and Integration for Data Science

RDF Data Storage Techniques for Efficient SPARQL Query Processing Using Distributed Computation Engines



Abstract

The rapidly growing amount of linked open data demands semantic RDF services that are efficient, scalable, and distributed, with high availability and fault tolerance. To address this need, the Big Data processing infrastructure Hadoop has been adopted for RDF data management systems. In this paper, we introduce two distributed RDF data stores, VPExp and 3CStore, based on the existing vertical partitioning (VP) approach. In the VPExp approach, we propose splitting predicates based on the explicit type information of an object. The 3CStore scheme is designed as a 3-column store comprising a subset of triples from the VP table, selected according to different join correlations, to reduce the number of join operations when SPARQL queries are executed as SQL in a distributed system. We evaluate these two RDF data storage approaches by comparing them with the vertical partitioning approach and the state-of-the-art RDF management system S2RDF. We also present an evaluation of the query performance of these systems built upon two popular distributed computation engines, namely Spark and Drill.
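As a rough illustration of the storage layouts the abstract refers to, the following minimal Python sketch builds plain vertical partitioning tables (one two-column subject/object table per predicate) and then a type-based split of those tables. The abstract does not give the exact VPExp rules, so the split shown here is an assumption: object types are taken from `rdf:type` triples, and each predicate table is divided by the type of its object. The function names and the toy triples are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # abbreviated predicate name used only in this toy example


def vertical_partition(triples):
    """Group (s, p, o) triples into one (subject, object) table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        tables[p].append((s, o))
    return dict(tables)


def split_by_object_type(triples):
    """Assumed VPExp-style split: key each non-type triple by
    (predicate, explicit type of its object). Objects without an
    rdf:type triple fall into the (predicate, None) table."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    tables = defaultdict(list)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        object_types = sorted(types[o]) if o in types else [None]
        for t in object_types:
            tables[(p, t)].append((s, o))
    return dict(tables)


# Hypothetical toy data set.
triples = [
    ("alice", RDF_TYPE, "Person"),
    ("bob", RDF_TYPE, "Person"),
    ("acme", RDF_TYPE, "Company"),
    ("alice", "likes", "bob"),
    ("alice", "likes", "acme"),
    ("bob", "worksFor", "acme"),
]

vp_tables = vertical_partition(triples)
typed_tables = split_by_object_type(triples)
```

Under these assumptions, a query pattern such as `?x likes ?y . ?y rdf:type Person` could be answered by scanning only the `("likes", "Person")` table instead of joining the full `likes` table with the `rdf:type` table, which is the kind of join reduction the abstract describes.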
