Published in: IEEE International Conference on Parallel and Distributed Systems

ScalaRDF: A Distributed, Elastic and Scalable In-Memory RDF Triple Store

Abstract

The Resource Description Framework (RDF) and the SPARQL query language are gaining increasing popularity and acceptance. The ever-increasing volume of RDF data has reached the scale of billions of triples, resulting in the proliferation of distributed RDF store systems within the Semantic Web community. However, elasticity and performance issues are still far from settled in the face of data volume explosion and workload spikes. In addition, providers face great pressure to provision uninterrupted, reliable storage service while reducing the operational costs caused by a variety of system failures. Therefore, how to efficiently realize system fault tolerance remains an intractable problem. In this paper, we introduce ScalaRDF, a distributed and elastic in-memory RDF triple store that provisions a fault-tolerant and scalable RDF store and query mechanism. Specifically, we describe a consistent hashing protocol that optimizes RDF data placement and data operations (especially online RDF triple update operations) and achieves autonomously elastic data redistribution when a cluster node joins or departs, avoiding holistic oscillation of data storage. In addition, the data store is able to realize rapid and transparent failover through a replication mechanism that stores an in-memory data replica in the next hash hop. The experiments demonstrate that query time and update time are reduced by 87% and 90%, respectively, compared to other approaches. For an 18G source dataset, data redistribution takes at most 60 seconds when the system scales out, and recovery takes at most 100 seconds when nodes undergo crash-stop failures.
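The placement and failover idea in the abstract can be illustrated with a minimal consistent-hashing sketch. This is not ScalaRDF's actual protocol, which the abstract does not specify in detail. The sketch assumes that triples are partitioned by subject, that MD5 supplies ring positions, and that each node occupies a single point on the ring. The names `HashRing`, `ring_hash`, and `owners` are hypothetical.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a 64-bit ring position (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with one replica on the next hop."""

    def __init__(self, nodes=()):
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        bisect.insort(self._points, (ring_hash(node), node))

    def remove_node(self, node: str) -> None:
        self._points.remove((ring_hash(node), node))

    def owners(self, key: str):
        """Return (primary, replica) for a key, e.g. a triple's subject.

        The primary is the first node clockwise from the key's position;
        the replica is the node one hop further along the ring.
        """
        if not self._points:
            raise ValueError("ring has no nodes")
        n = len(self._points)
        i = bisect.bisect_left(self._points, (ring_hash(key), "")) % n
        primary = self._points[i][1]
        replica = self._points[(i + 1) % n][1]
        return primary, replica


# Usage: place a triple by its subject (an assumed partitioning key).
ring = HashRing(["node-a", "node-b", "node-c"])
primary, replica = ring.owners("http://example.org/alice")
```

In this sketch, adding a node reassigns only the keys that fall between the new node and its predecessor on the ring, which is the property behind avoiding a full reshuffle. When a primary is removed, the node that held the replica becomes the new primary for the affected keys.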
