
Distributed SPARQL over Big RDF Data, A Comparative Analysis Using Presto and MapReduce.



Abstract

Processing large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. Initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew exponentially, the limitations of these systems became apparent, and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks that evaluate these tools for RDF data processing have been published. In the past two and a half years, however, heavy users of big data systems, such as Facebook, noted limitations in the query performance of these systems and began to develop new distributed query engines for big data that do not rely on map-reduce. Facebook's Presto is one such example.

This thesis evaluates the performance of Presto in processing big RDF data against Apache Hive. A comparative analysis was also conducted against 4store, a native RDF store. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand. The evaluation was done on four- and eight-node Linux clusters on the Microsoft Windows Azure platform, with RDF datasets of 10, 20, and 30 million triples. The results of the experiment show that Presto offers much higher performance than Hive and can be used to process big RDF data. The thesis also proposes a Presto-based architecture, Presto-RDF, that can be used to process big RDF data.
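
As an illustration of the loading step, the sketch below shows how triples already loaded into HDFS could be exposed to Hive and Presto as a single external table. The table name, column layout, field delimiter, and HDFS path are illustrative assumptions only, not the storage scheme defined in the thesis:

    -- Hypothetical HiveQL: map a tab-delimited triples file already
    -- loaded into HDFS to a queryable three-column table.
    CREATE EXTERNAL TABLE triples (
      subject   STRING,
      predicate STRING,
      object    STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/user/rdf/triples';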
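
To make the translation step concrete, the sketch below shows how a simple SPARQL basic graph pattern could be rewritten over such a triples table: each triple pattern becomes a scan of the table, and a variable shared between patterns becomes an equi-join on subject. The FOAF predicates and the plain-string encoding of predicate values are assumptions for illustration; the actual rewrite rules are those of the thesis's Flex/Bison-based compiler.

    # A simple SPARQL query: two triple patterns sharing ?person.
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?mbox
    WHERE {
      ?person foaf:name ?name .
      ?person foaf:mbox ?mbox .
    }

    -- One possible translation into the SQL subset that both Presto
    -- and Hive accept: a self-join of the triples table on subject.
    SELECT t1.object AS name,
           t2.object AS mbox
    FROM triples t1
    JOIN triples t2
      ON t1.subject = t2.subject
    WHERE t1.predicate = 'foaf:name'
      AND t2.predicate = 'foaf:mbox';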

Bibliographic Details

  • Author: Mammo, Mulugeta.
  • Institution: Arizona State University.
  • Degree grantor: Arizona State University.
  • Subject: Computer science.
  • Degree: M.S.
  • Year: 2014
  • Pagination: 148 p.
  • Total pages: 148
  • Source format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:
