RDF partitioning for scalable SPARQL query processing

Abstract

The volume of RDF data has increased dramatically in recent years, and cloud computing platforms such as Hadoop are considered a good choice for processing queries over huge data sets because of their scalability. Previous work on evaluating SPARQL queries with Hadoop has mainly focused on reducing the number of joins through careful splitting of HDFS files and algorithms for generating Map/Reduce jobs. However, the way RDF data is partitioned also affects system performance. Specifically, a good partitioning scheme can greatly reduce or even completely avoid cross-node joins, and thereby significantly cut the cost of query evaluation. Based on HadoopDB, this work processes SPARQL queries in a hybrid architecture, where Map/Reduce takes charge of the computing tasks, and RDF query engines such as RDF-3X store the data and execute join operations. Drawing on an analysis of query workloads, this work proposes a novel algorithm for automatically partitioning RDF data and an approximate solution for physically placing the partitions so as to reduce data redundancy. It also discusses how to make a good trade-off between query evaluation efficiency and data redundancy. All of the proposed approaches have been evaluated through extensive experiments over large RDF data sets.
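To make the point about cross-node joins concrete, the following is a minimal Python sketch. It is not the workload-driven algorithm proposed in the paper, which the abstract does not detail; it uses plain subject-hash partitioning as a stand-in. The function names (`node_for`, `partition_by_subject`, `is_star_query`) and the sample triples are hypothetical. Under this assumed scheme, all triples with the same subject land on one node, so a query whose patterns share a single subject can be joined locally, while a query that chains through different subjects may need a cross-node join.

```python
import zlib
from collections import defaultdict


def node_for(subject, num_nodes):
    """Map a subject to a node with a deterministic hash (assumed scheme)."""
    return zlib.crc32(subject.encode("utf-8")) % num_nodes


def partition_by_subject(triples, num_nodes):
    """Place every (subject, predicate, object) triple on its subject's node."""
    parts = defaultdict(list)
    for s, p, o in triples:
        parts[node_for(s, num_nodes)].append((s, p, o))
    return parts


def is_star_query(patterns):
    """True if all triple patterns share the same subject term.

    Under subject-hash partitioning, such a query can be evaluated
    on each node independently, without cross-node joins.
    """
    return len({s for s, _, _ in patterns}) == 1


# Hypothetical sample data.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
]
parts = partition_by_subject(triples, num_nodes=4)

# Star-shaped: both patterns use ?x as subject, so joins stay local.
star = [("?x", "knows", "?y"), ("?x", "name", "?n")]
# Chain-shaped: joins ?y as object and as subject, may cross nodes.
chain = [("?x", "knows", "?y"), ("?y", "name", "?n")]
```

A partitioner informed by the query workload, as the abstract describes, would instead choose placements (possibly with some replication) so that the joins that actually occur in the workload stay local, at the cost of some data redundancy.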

Bibliographic details

  • Source
    Frontiers of Computer Science in China | 2015, Issue 6 | pp. 919-933 | 15 pages
  • Author affiliations

    School of Information, Renmin University of China, Beijing 100872, China, Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University, Beijing 100872, China, Information Center, Supreme People's Court, Beijing 100745, China;

    School of Information, Renmin University of China, Beijing 100872, China;

    Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University, Beijing 100872, China;

    School of Information, Renmin University of China, Beijing 100872, China;

    School of Information, Renmin University of China, Beijing 100872, China, Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University, Beijing 100872, China, State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China;

  • Indexing information
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    RDF data; data partitioning; SPARQL query;
