Conference: Joint International Conference on Pervasive Computing and the Networked World
Scalable SAPRQL Querying Processing on Large RDF Data in Cloud Computing Environment

Abstract

The flexibility of the RDF data model has led a growing number of organizations and communities to publish their data in RDF format, and there is a corresponding need to query these data in a scalable and efficient way. MapReduce (MR) is a parallel processing framework for large data-intensive workloads, but it does not directly support join-intensive workloads. In this paper, we present a schema-based hybrid partitioning technique that places RDF triples according to the relationships between them, which reduces the number of MR cycles required by each SPARQL query job. We then propose a lightweight sideways information passing technique that passes join information across MR jobs to reduce the intermediate results involved in join operations. Experimental results on the LUBM benchmark show that our approaches achieve a substantial performance improvement and outperform the previous system by a factor of 2 to 20.
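
The idea of passing join information across MR jobs can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the paper's implementation: the abstract does not say how the join information is encoded, so a plain Python set of join keys is assumed here, and the toy triples, the function names reduce_side_join and run_query, and the two-job query plan are all hypothetical.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object); all data is made up for illustration.
TRIPLES = [
    ("s1", "type", "GraduateStudent"),
    ("s2", "type", "UndergraduateStudent"),
    ("s1", "takesCourse", "c1"),
    ("s2", "takesCourse", "c2"),
    ("p1", "teacherOf", "c1"),
    ("p2", "teacherOf", "c2"),
    ("p3", "teacherOf", "c3"),
]


def reduce_side_join(left, right):
    """Simulate one MR cycle: group both inputs by key, then pair them per key."""
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)
    return [(k, l, r) for k, (ls, rs) in groups.items() for l in ls for r in rs]


def run_query(use_sip):
    """Find (student, course, teacher) for graduate students using two join jobs."""
    # Job 1: ?x type GraduateStudent  JOIN  ?x takesCourse ?c  (join key: ?x)
    grads = [(s, None) for s, p, o in TRIPLES
             if p == "type" and o == "GraduateStudent"]
    takes = [(s, o) for s, p, o in TRIPLES if p == "takesCourse"]
    job1 = reduce_side_join(grads, takes)
    students_by_course = [(c, x) for x, _, c in job1]

    # Sideways information (assumed form): the ?c values that survived job 1.
    course_keys = {c for c, _ in students_by_course}

    # Job 2: previous result  JOIN  ?t teacherOf ?c  (join key: ?c)
    teaches = [(o, s) for s, p, o in TRIPLES if p == "teacherOf"]
    if use_sip:
        # Drop triples that cannot join before they reach the shuffle.
        teaches = [(c, t) for c, t in teaches if c in course_keys]

    shuffled = len(students_by_course) + len(teaches)
    joined = reduce_side_join(students_by_course, teaches)
    result = sorted((x, c, t) for c, x, t in joined)
    return result, shuffled


if __name__ == "__main__":
    for flag in (False, True):
        rows, shuffled = run_query(flag)
        print(f"use_sip={flag}: rows={rows}, records shuffled in job 2={shuffled}")
```

Both runs return the same answer; the filtered run simply sends fewer records into the second join. In a real cluster the key set would have to be shipped to every mapper, so a compact summary of the keys would likely be needed, which is again an assumption rather than something the abstract states.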