Conference: IEEE International Conference on Cloud Computing

Chisel: A Resource Savvy Approach for Handling Skew in MapReduce Applications

Abstract

Skew mitigation has been a major concern in distributed programming frameworks such as MapReduce, and it is becoming more prominent as user requirements and the computation involved grow more complex. We present Chisel, a self-regulating skew detection and mitigation policy for MapReduce applications. The novelty of the approach is that it does not scan or sample input data to detect skew; it therefore incurs low overhead, provides better resource utilization, and maintains output order and file structure. It is also transparent to users and can be used as a plugin whenever required. We implement our skew handling policies in Hadoop. Chisel implements two policies for mitigating skew. For map operators, it performs late skew detection, i.e., at the last wave of map execution, where skewed maps are selected on the basis of their remaining time to complete; more maps are then created dynamically over the remaining data of each block. For the reduce operator, it performs early skew detection, i.e., before the shuffle phase starts. This prevents the expensive shuffle and sort phases from delaying skew detection and job completion. Multiple reducers are created per skewed partition; each shuffles data from a subset of the maps and starts processing it as soon as its portion of maps has finished, without waiting for all maps to complete. Therefore, the barrier between the map and reduce phases is no longer a constraint on effective resource utilization. Chisel additionally implements an online job profiler to determine the start point of reduce tasks, and modifies the capacity scheduler to distribute reduce tasks evenly across the cluster. Chisel significantly decreases the overall execution time of jobs and increases resource utilization. The improvement depends directly on the availability of resources in the cluster and the skewness of the job.
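
The map-side policy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `MapTask` record, the constant-rate extrapolation of remaining time, the "multiple of the median" cutoff, and the even split of unprocessed bytes are all assumptions made for illustration, since the abstract states only that skewed maps are selected by remaining time and that extra maps are created over the remaining data of a block.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class MapTask:
    """Hypothetical progress record for one running map task."""
    task_id: str
    block_bytes: int   # size of the input block assigned to this map
    bytes_done: int    # bytes processed so far
    elapsed_s: float   # seconds since the task started


def remaining_time(task):
    """Estimate seconds left, assuming a constant processing rate."""
    if task.bytes_done == 0 or task.elapsed_s <= 0:
        return float("inf")  # no progress information yet
    rate = task.bytes_done / task.elapsed_s
    return (task.block_bytes - task.bytes_done) / rate


def select_skewed(tasks, factor=2.0):
    """Flag tasks whose estimate exceeds factor x the median estimate.

    The cutoff rule is an assumption; the abstract does not give one.
    """
    estimates = {t.task_id: remaining_time(t) for t in tasks}
    cutoff = factor * median(estimates.values())
    return [t for t in tasks if estimates[t.task_id] > cutoff]


def split_remaining(task, parts):
    """Divide the unprocessed byte range into (offset, length) pieces."""
    start = task.bytes_done
    size, extra = divmod(task.block_bytes - task.bytes_done, parts)
    ranges = []
    for i in range(parts):
        length = size + (1 if i < extra else 0)
        ranges.append((start, length))
        start += length
    return ranges


last_wave = [
    MapTask("m1", block_bytes=100, bytes_done=90, elapsed_s=9.0),
    MapTask("m2", block_bytes=100, bytes_done=80, elapsed_s=8.0),
    MapTask("m3", block_bytes=100, bytes_done=20, elapsed_s=10.0),
]
skewed = select_skewed(last_wave)
new_splits = {t.task_id: split_remaining(t, 3) for t in skewed}
```

In this toy input, `m3` is the only task flagged, and its unprocessed bytes are divided into three contiguous ranges that additional maps could process in parallel.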
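
The reduce-side idea, in which several reducers serve one skewed partition and each waits only for its own subset of maps, can be sketched in a similar way. The function names, the contiguous grouping of map IDs, and the readiness check are hypothetical; the abstract does not say how maps are assigned to reducers or how their outputs are combined.

```python
def assign_maps(map_ids, n_reducers):
    """Split map IDs into n contiguous groups of near-equal size.

    Contiguous grouping is an assumption made for this sketch.
    """
    size, extra = divmod(len(map_ids), n_reducers)
    groups, start = [], 0
    for i in range(n_reducers):
        end = start + size + (1 if i < extra else 0)
        groups.append(list(map_ids[start:end]))
        start = end
    return groups


def ready_reducers(groups, completed_maps):
    """Return indices of reducers whose whole map subset has finished."""
    done = set(completed_maps)
    return [i for i, g in enumerate(groups) if g and done.issuperset(g)]


# Seven maps feed one skewed partition that is served by three reducers.
groups = assign_maps(list(range(7)), 3)
early = ready_reducers(groups, {0, 1, 2, 5})
later = ready_reducers(groups, {0, 1, 2, 5, 6})
```

Here the first reducer can start once maps 0 to 2 are done, even though maps 3, 4, and 6 are still running, which is the sense in which the map/reduce barrier stops limiting utilization.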

