Conference: IEEE International Parallel and Distributed Processing Symposium

DAG-Aware Joint Task Scheduling and Cache Management in Spark Clusters

Abstract

Data dependency, often represented as a directed acyclic graph (DAG), is a crucial piece of application semantics for the performance of data analytics platforms such as Spark. Spark comes with two built-in schedulers, FIFO and Fair, neither of which takes advantage of data dependency structures. Recently proposed DAG-aware task scheduling approaches, notably GRAPHENE, have achieved significant performance improvements but pay little attention to cache management. The resulting data access patterns interact poorly with the built-in LRU caching, leading to significant cache misses and performance degradation. On the other hand, DAG-aware caching schemes, such as Most Reference Distance (MRD), are designed for the FIFO scheduler rather than for DAG-aware task schedulers.

In this paper, we propose and develop a middleware, Dagon, which leverages the complexity and heterogeneity of DAGs to jointly execute task scheduling and cache management. Dagon relies on three key mechanisms: DAG-aware task assignment, which considers dependency structure and heterogeneous resource demands to reduce potential resource fragmentation; sensitivity-aware delay scheduling, which prevents executors from waiting a long time for tasks that are insensitive to locality; and priority-aware caching, which makes cache eviction and prefetching decisions based on the stage priority determined by DAG-aware task assignment. We have implemented Dagon in Apache Spark. Evaluation on a testbed shows that, compared to GRAPHENE plus MRD, Dagon improves job completion time by up to 42% and CPU utilization by up to 46%.
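To make the contrast between LRU eviction and priority-aware eviction concrete, the following is a minimal illustrative sketch in Python. It is not Dagon's implementation: the class name `PriorityAwareCache`, the block identifiers, the convention that a larger number means a higher stage priority, and the rule of breaking ties by recency are all assumptions made for this example.

```python
from collections import OrderedDict


class PriorityAwareCache:
    """Toy block cache that evicts the block with the lowest stage priority.

    Hypothetical illustration only; larger numbers mean higher priority.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # block_id -> stage priority; iteration order doubles as recency order
        # (least recently used first).
        self.blocks = OrderedDict()

    def get(self, block_id):
        """Return True on a hit (and refresh recency), False on a miss."""
        if block_id not in self.blocks:
            return False
        self.blocks.move_to_end(block_id)
        return True

    def put(self, block_id, priority):
        """Cache a block; return the evicted block id, or None."""
        if block_id in self.blocks:
            self.blocks[block_id] = priority
            self.blocks.move_to_end(block_id)
            return None
        evicted = None
        if len(self.blocks) >= self.capacity:
            # min() keeps the first minimal item it sees, so priority ties
            # fall back to the least recently used block.
            evicted = min(self.blocks, key=self.blocks.get)
            del self.blocks[evicted]
        self.blocks[block_id] = priority
        return evicted


cache = PriorityAwareCache(capacity=2)
cache.put("block_a", priority=1)
cache.put("block_b", priority=5)
cache.get("block_a")  # block_a is now the most recently used block
evicted = cache.put("block_c", priority=3)
```

In this usage, plain LRU would evict `block_b` because it was touched least recently, whereas the priority-based rule evicts `block_a` because its stage priority is lowest. In the abstract's design the priorities come from DAG-aware task assignment; here they are simply passed in by hand.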
