Optimizing parallel job performance in data-intensive clusters.

Abstract

Extensive data analysis has become the enabler for diagnostics and decision making in many modern systems. These analyses have both competitive and social benefits. To cope with a deluge of data that is growing faster than Moore's law, computation frameworks have resorted to massively parallelizing analytics jobs into many fine-grained tasks. These frameworks promise efficient and fault-tolerant execution of these tasks. However, meeting this promise in clusters spanning hundreds of thousands of machines is challenging and a key departure from earlier work on parallel computing and distributed systems. A simple but key aspect of parallel jobs is the all-or-nothing property: unless all tasks of a job receive equal improvement, there is no speedup in the completion of the job. The all-or-nothing property is critical for the promise of efficient and fault-tolerant parallel computations on large clusters. This work examines the execution of a job from first principles and proposes techniques spanning the software stack of data analytics systems such that a job's tasks achieve homogeneous performance while overcoming the various heterogeneities. To that end, we propose techniques for (i) caching and cache replacement for parallel jobs, which outperform even Belady's MIN (which uses an oracle), (ii) data locality, and (iii) straggler mitigation. Our analyses and evaluation are performed using workloads from Facebook and Bing production datacenters. Along the way, we also describe how we broke the myth of disk-locality's importance in datacenter computing.
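
The all-or-nothing property described in the abstract can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the dissertation: a job runs all of its tasks in parallel in a single wave, so its completion time is the duration of its slowest task. The function names and the task durations are hypothetical and chosen only for illustration.

```python
def job_completion_time(task_durations):
    """Completion time of a single-wave parallel job (assumed model):
    the job finishes only when its slowest task finishes."""
    return max(task_durations)


def speed_up(task_durations, indices, factor):
    """Return a copy of the durations in which the tasks at `indices`
    run `factor` times faster; all other tasks are unchanged."""
    chosen = set(indices)
    return [d / factor if i in chosen else d
            for i, d in enumerate(task_durations)]


# Hypothetical job with four tasks of equal duration (arbitrary time units).
baseline = [10.0, 10.0, 10.0, 10.0]

# Speed up three of the four tasks: the remaining task still gates the job.
partial = speed_up(baseline, [0, 1, 2], 2.0)

# Speed up every task: only now does the job itself finish sooner.
full = speed_up(baseline, range(len(baseline)), 2.0)

print(job_completion_time(baseline))  # 10.0
print(job_completion_time(partial))   # 10.0
print(job_completion_time(full))      # 5.0
```

Under this assumed model, improving three of the four tasks leaves the job's completion time unchanged, and only improving all of them shortens it.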

Bibliographic record

  • Author

    Ananthanarayanan, Ganesh

  • Author affiliation

    University of California, Berkeley

  • Degree-granting institution: University of California, Berkeley
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 124 p.
  • Total pages: 124
  • Original format: PDF
  • Language of text: English
