RDF partitioning for scalable SPARQL query processing

Xiaoyan WANG; Tao YANG; Jinchuan CHEN; Long HE; Xiaoyong DU

Abstract

The volume of RDF data has increased dramatically in recent years, and cloud computing platforms such as Hadoop are considered a good choice for processing queries over huge data sets because of their excellent scalability. Previous work on evaluating SPARQL queries with Hadoop mainly focuses on reducing the number of joins through careful splitting of HDFS files and through algorithms for generating Map/Reduce jobs. However, the way RDF data is partitioned also affects system performance. Specifically, a good partitioning solution can greatly reduce, or even entirely avoid, cross-node joins and thereby significantly cut the cost of query evaluation. Based on HadoopDB, this work processes SPARQL queries in a hybrid architecture in which Map/Reduce takes charge of the computing tasks, while RDF query engines such as RDF-3X store the data and execute join operations. Guided by an analysis of query workloads, this work proposes a novel algorithm for automatically partitioning RDF data, together with an approximate solution for physically placing the partitions so as to reduce data redundancy. It also discusses how to strike a good trade-off between query evaluation efficiency and data redundancy. All of the proposed approaches have been evaluated through extensive experiments over large RDF data sets.
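The abstract does not spell out the partitioning algorithm itself. To illustrate only why data placement decides whether a join must cross nodes, here is a minimal Python sketch that deliberately swaps in two simpler, well-known schemes: hash partitioning by subject, and the same partitioning extended with one-hop replication. It is not the workload-driven algorithm, the placement solution, or the HadoopDB/RDF-3X setup described in the paper; all function names and the toy data are hypothetical. The sketch evaluates the path query ?x knows ?y . ?y worksAt ?z on each node independently and compares the union with the global answer, and it reports redundancy as stored copies per distinct triple.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical model: a triple is (subject, predicate, object) as plain strings.
Triple = tuple[str, str, str]


def node_of(term: str, num_nodes: int) -> int:
    # crc32 is deterministic across runs, unlike Python's salted str hash().
    return crc32(term.encode("utf-8")) % num_nodes


def partition_by_subject(triples: set[Triple], num_nodes: int) -> dict[int, set[Triple]]:
    # Baseline scheme (not the paper's): each triple is stored once, on the node of its subject.
    parts: dict[int, set[Triple]] = defaultdict(set)
    for t in triples:
        parts[node_of(t[0], num_nodes)].add(t)
    return dict(parts)


def add_one_hop(triples: set[Triple], parts: dict[int, set[Triple]]) -> dict[int, set[Triple]]:
    # Replicate onto each node every triple whose subject appears as an object there.
    by_subject: dict[str, set[Triple]] = defaultdict(set)
    for t in triples:
        by_subject[t[0]].add(t)
    expanded = {}
    for node, local in parts.items():
        extra: set[Triple] = set()
        for _, _, o in local:
            extra |= by_subject.get(o, set())
        expanded[node] = local | extra
    return expanded


def redundancy(triples: set[Triple], parts: dict[int, set[Triple]]) -> float:
    # Stored copies divided by distinct triples; 1.0 means no replication at all.
    return sum(len(local) for local in parts.values()) / len(triples)


def path_matches(triples, p1: str, p2: str) -> set[tuple[str, str, str]]:
    # Evaluates  ?x p1 ?y . ?y p2 ?z  as a hash join on ?y, returning (x, y, z).
    objects_by_subject = defaultdict(list)
    for s, p, o in triples:
        if p == p2:
            objects_by_subject[s].append(o)
    return {(s, o, z) for s, p, o in triples if p == p1 for z in objects_by_subject.get(o, [])}


def local_only(parts: dict[int, set[Triple]], p1: str, p2: str) -> set[tuple[str, str, str]]:
    # Union of per-node answers computed with no data exchanged between nodes.
    results: set[tuple[str, str, str]] = set()
    for local in parts.values():
        results |= path_matches(local, p1, p2)
    return results


# Toy data set (hypothetical).
data = {
    ("alice", "knows", "bob"), ("bob", "knows", "carol"),
    ("carol", "knows", "dave"), ("bob", "worksAt", "ruc"),
    ("carol", "worksAt", "buaa"), ("dave", "worksAt", "ruc"),
}
plain = partition_by_subject(data, 4)
replicated = add_one_hop(data, plain)
everything = path_matches(data, "knows", "worksAt")  # three matches in total

print(len(local_only(plain, "knows", "worksAt")), "of", len(everything), "found without cross-node joins")
print(len(local_only(replicated, "knows", "worksAt")), "of", len(everything), "found with one-hop replication")
print("redundancy:", redundancy(data, plain), "vs", redundancy(data, replicated))
```

With plain subject hashing, a path match is found locally only when ?x and ?y happen to hash to the same node, so the remaining matches would need a cross-node join. One-hop replication makes every match of this query shape local, at the price of storing some triples more than once, which is the efficiency-versus-redundancy tension the abstract refers to.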

Bibliographic record

Source
Frontiers of Computer Science in China, 2015, no. 6, pp. 919-933 (15 pages)
Authors
Xiaoyan WANG; Tao YANG; Jinchuan CHEN; Long HE; Xiaoyong DU
Affiliations

School of Information, Renmin University of China, Beijing 100872, China; Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University, Beijing 100872, China; Information Center, Supreme People's Court, Beijing 100745, China

School of Information, Renmin University of China, Beijing 100872, China

Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University, Beijing 100872, China

School of Information, Renmin University of China, Beijing 100872, China

School of Information, Renmin University of China, Beijing 100872, China; Key Laboratory of Data Engineering and Knowledge Engineering of Ministry of Education, Renmin University, Beijing 100872, China; State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China

Original format: PDF
Language: English
Keywords
RDF data; data partitioning; SPARQL query


Similar documents

1. Map-Side Join Processing of SPARQL Queries Based on Abstract RDF Data Filtering [J]. Song Minjae, Oh Hyunsuk, Seo Seungmin. Journal of Database Management, 2019, no. 1.

2. Processing SPARQL Queries over Distributed RDF Graphs [J]. Peng Peng, Zou Lei, Ozsu M. Tamer. The VLDB Journal, 2016, no. 2.

3. S3QLRDF: Property Table Partitioning Scheme for Distributed SPARQL Querying of Large-Scale RDF Data [C]. Mahmudul Hassan, Srividya K. Bansal. IEEE International Conference on Smart Data Services, 2020.

4. A New Approach for Fast Processing of SPARQL Queries on RDF Quadruples [D]. Slavov, Vasil Georgiev. 2015.

5. Processing SPARQL Queries with Regular Expressions in RDF Databases [O]. Jinsoo Lee, Minh-Duc Pham, Jihwan Lee. 2011.

6. Processing SPARQL Queries Over Distributed RDF Graphs [O]. Peng, Peng; Zou, Lei; Özsu, M. Tamer. 2016.