Journal: IEEE Transactions on Computers

Cross-Platform Resource Scheduling for Spark and MapReduce on YARN

Abstract

While MapReduce is inherently designed for batch and high-throughput processing workloads, there is an increasing demand for non-batch processing of big data, e.g., interactive jobs, real-time queries, and stream computations. The emerging Apache Spark fills this gap: it can run on an established Hadoop cluster and take advantage of the existing HDFS. As a result, the Spark-on-YARN deployment model is widely adopted by many industry leaders. However, we identify three key challenges in deploying Spark on YARN: inflexible reservation-based resource management, inter-task-dependency-blind scheduling, and locality interference between Spark and MapReduce applications. These three challenges cause inefficient resource utilization and significant performance deterioration. We propose and develop a cross-platform resource scheduling middleware, iKayak, which aims to improve resource utilization and application performance in multi-tenant Spark-on-YARN clusters. iKayak relies on three key mechanisms: reservation-aware executor placement to avoid long waits for resource reservation, dependency-aware resource adjustment to exploit under-utilized resources occupied by reduce tasks, and cross-platform locality-aware task assignment to coordinate locality competition between Spark and MapReduce applications. We implement iKayak in YARN. Experimental results on a testbed show that iKayak can achieve a 50 percent performance improvement for Spark applications and a 19 percent performance improvement for MapReduce applications, compared to two popular Spark-on-YARN deployment models, i.e., the YARN-client model and the YARN-cluster model.