
Distributed SPARQL over Big RDF Data, A Comparative Analysis Using Presto and MapReduce.



Abstract

Processing large volumes of RDF data requires an efficient storage and query processing engine that can scale well with the volume of data. Initial attempts to address this issue focused on optimizing native RDF stores as well as conventional relational database management systems. But as the volume of RDF data grew exponentially, the limitations of these systems became apparent, and researchers began to focus on using big data analysis tools, most notably Hadoop, to process RDF data. Various studies and benchmarks that evaluate these tools for RDF data processing have been published. In the past two and a half years, however, heavy users of big data systems, such as Facebook, noted limitations in the query performance of these systems and began to develop new distributed query engines for big data that do not rely on map-reduce. Facebook's Presto is one such example.

This thesis evaluates the performance of Presto in processing big RDF data against Apache Hive. A comparative analysis was also conducted against 4store, a native RDF store. To evaluate the performance of Presto for big RDF data processing, a map-reduce program and a compiler, based on Flex and Bison, were implemented. The map-reduce program loads RDF data into HDFS, while the compiler translates SPARQL queries into a subset of SQL that Presto (and Hive) can understand. The evaluation was done on four- and eight-node Linux clusters on the Microsoft Windows Azure platform, with RDF datasets of 10, 20, and 30 million triples. The results of the experiment show that Presto offers much higher performance than Hive and can be used to process big RDF data. The thesis also proposes a Presto-based architecture, Presto-RDF, that can be used to process big RDF data.
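
As an illustration of the loading step, the sketch below shows how triples already loaded into HDFS could be exposed to Hive and Presto as a single external table. The table name, column layout, field delimiter, and HDFS path are illustrative assumptions only, not the storage scheme defined in the thesis:

    -- Hypothetical HiveQL: map a tab-delimited triples file already
    -- loaded into HDFS to a queryable three-column table.
    CREATE EXTERNAL TABLE triples (
      subject   STRING,
      predicate STRING,
      object    STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/user/rdf/triples';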
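
To make the translation step concrete, the sketch below shows how a simple SPARQL basic graph pattern could be rewritten over such a triples table: each triple pattern becomes a scan of the table, and a variable shared between patterns becomes an equi-join on subject. The FOAF predicates and the plain-string encoding of predicate values are assumptions for illustration; the actual rewrite rules are those of the thesis's Flex/Bison-based compiler.

    # A simple SPARQL query: two triple patterns sharing ?person.
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?mbox
    WHERE {
      ?person foaf:name ?name .
      ?person foaf:mbox ?mbox .
    }

    -- One possible translation into the SQL subset that both Presto
    -- and Hive accept: a self-join of the triples table on subject.
    SELECT t1.object AS name,
           t2.object AS mbox
    FROM triples t1
    JOIN triples t2
      ON t1.subject = t2.subject
    WHERE t1.predicate = 'foaf:name'
      AND t2.predicate = 'foaf:mbox';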

Bibliographic Details

  • Author: Mammo, Mulugeta.
  • Institution: Arizona State University.
  • Degree grantor: Arizona State University.
  • Subject: Computer science.
  • Degree: M.S.
  • Year: 2014
  • Pagination: 148 p.
  • Total pages: 148
  • Source format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:
