Conference: International Joint Conference on Computer Science and Software Engineering

A Performance Comparison of Apache Tez and MapReduce with Data Compression on Hadoop Cluster


Abstract

Big data is a popular topic in cloud computing research. Its main characteristics are volume, velocity, and variety, which are difficult to handle with traditional software and methods. Hadoop is an open-source software framework developed to provide solutions for several domains of big data problems. For big data analytics, MapReduce is the main engine of a Hadoop cluster and is widely used today; it uses batch-oriented processing. Apache also developed an alternative engine called "Tez", which supports interactive queries and does not write temporary data to HDFS. In this paper, we focus on the performance comparison between MapReduce and Tez. We also investigate the performance of the two engines when input files and map output files are compressed: Bzip is used to compress input files and Snappy is used to compress map output files. Word-count and terasort benchmarks are used in our experiments. For the word-count benchmark, the results show that the Tez engine always has a lower execution time than the MapReduce engine, for both compressed and non-compressed data; it reduces execution time by up to 39% compared with the MapReduce engine. In contrast, for the terasort benchmark, the Tez engine usually has a higher execution time than the MapReduce engine, by up to 13%. The results also show that compressing map output files with Snappy improves execution time for both benchmarks.
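As a reading aid for the percentages quoted above, the following minimal sketch shows one common way to express the execution time of one engine relative to another. The function name and the timings are hypothetical and are not taken from the paper; the abstract does not state the exact formula the authors used, so this is an assumption.

```python
def relative_change_percent(baseline_s: float, candidate_s: float) -> float:
    """Percentage change in execution time of the candidate engine
    relative to the baseline engine. Negative means the candidate is faster."""
    if baseline_s <= 0:
        raise ValueError("baseline execution time must be positive")
    return (candidate_s - baseline_s) * 100.0 / baseline_s


# Hypothetical timings in seconds, chosen only to illustrate the arithmetic.
mapreduce_wordcount_s = 100.0
tez_wordcount_s = 61.0
mapreduce_terasort_s = 100.0
tez_terasort_s = 113.0

wordcount_change = relative_change_percent(mapreduce_wordcount_s, tez_wordcount_s)
terasort_change = relative_change_percent(mapreduce_terasort_s, tez_terasort_s)

print(f"word-count: {wordcount_change:+.1f}% vs. MapReduce")
print(f"terasort:   {terasort_change:+.1f}% vs. MapReduce")
```

Under this convention, a 39% reduction appears as a change of -39% and a 13% increase as +13%.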
