Conference: International Conference on Innovative Computing Technology

Performance analysis of Shared-nothing SQL-on-Hadoop Frameworks based on Columnar Database Systems

Abstract

Hadoop is a Java-based programming framework that enterprises use to manage and analyze large-scale data originating from heterogeneous sources. To support this analysis, a range of SQL-on-Hadoop systems are in use because they are easy to adopt for people already familiar with SQL. This study presents a comparative analysis of SQL-on-Hadoop systems, measuring their performance under various hardware and software parameters. The performance of three SQL-on-Hadoop systems, namely Hive, Impala, and Tajo, is analyzed using the TPC-H benchmark. The experiments use two major, widely used file formats for columnar databases, ORC and Parquet. This work also investigates the performance of the ORC and Parquet file formats, analyzing their characteristics and their performance impact on Hive, Impala, and Tajo. Finally, the results show that Impala outperforms Hive and Tajo by 5x to 10x when the workload dataset fits in its memory.