Published in: International Conference on Very Large Data Bases

Only Aggressive Elephants are Fast Elephants

Abstract

Yellow elephants are slow. A major reason is that they consume their inputs entirely before responding to an elephant rider's orders. Some clever riders have trained their yellow elephants to only consume parts of the inputs before responding. However, the teaching time to make an elephant do that is high. So high that the teaching lessons often do not pay off. We take a different approach. We make elephants aggressive; only this will make them very fast. We propose HAIL (Hadoop Aggressive Indexing Library), an enhancement of HDFS and Hadoop MapReduce that dramatically improves runtimes of several classes of MapReduce jobs. HAIL changes the upload pipeline of HDFS in order to create different clustered indexes on each data block replica. An interesting feature of HAIL is that we typically create a win-win situation: we improve both data upload to HDFS and the runtime of the actual Hadoop MapReduce job. In terms of data upload, HAIL improves over HDFS by up to 60% with the default replication factor of three. In terms of query execution, we demonstrate that HAIL runs up to 68x faster than Hadoop. In our experiments, we use six clusters including physical and EC2 clusters of up to 100 nodes. A series of scalability experiments also demonstrates the superiority of HAIL.
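
The idea of keeping a different clustered index on each replica of a data block can be illustrated with a toy model. The following is a minimal Python sketch under stated assumptions, not HAIL's actual implementation: the in-memory block, the column names, and the helpers `build_replicas` and `select_equal` are all hypothetical, and no HDFS or Hadoop API is involved. Each replica stores the same rows sorted on a different column, so a selection on that column can use binary search instead of scanning the whole block.

```python
from bisect import bisect_left, bisect_right

# Hypothetical toy data block: rows of (id, age, city).
BLOCK = [
    (3, 41, "Oslo"),
    (1, 29, "Rome"),
    (4, 35, "Bern"),
    (2, 29, "Lima"),
]
COLUMNS = {"id": 0, "age": 1, "city": 2}


def build_replicas(block, sort_columns):
    """Keep one copy of the block per replica, each sorted on a different column."""
    replicas = {}
    for name in sort_columns:
        col = COLUMNS[name]
        rows = sorted(block, key=lambda row: row[col])
        keys = [row[col] for row in rows]  # sorted key list plays the role of the index
        replicas[name] = (keys, rows)
    return replicas


def select_equal(replicas, block, column, value):
    """Return rows where column == value, preferring a replica sorted on that column."""
    if column in replicas:
        keys, rows = replicas[column]
        lo = bisect_left(keys, value)
        hi = bisect_right(keys, value)
        return rows[lo:hi]  # binary search on the matching replica
    col = COLUMNS[column]
    return [row for row in block if row[col] == value]  # fallback: full scan


# Three replicas, mirroring the default replication factor of three.
replicas = build_replicas(BLOCK, ["id", "age", "city"])
by_age = select_equal(replicas, BLOCK, "age", 29)
by_city = select_equal(replicas, BLOCK, "city", "Bern")
```

In this sketch the sorting happens once, when the replicas are built, which loosely corresponds to doing the indexing work at upload time; queries then only choose the replica whose sort column matches their filter.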