Journal of Parallel and Distributed Computing

Dynamic memory-aware scheduling in Spark computing environment

Abstract

Scheduling plays an important role in improving the performance of big-data parallel processing. Spark is an in-memory parallel computing framework that uses a multi-threaded model for task scheduling. Most Spark task scheduling processes do not take memory into account; instead, the number of concurrent task threads is set by the user, which is a potential limitation on performance. To overcome this limitation in the Spark-core source code, this paper proposes a dynamic Spark memory-aware task scheduler (DMATS), which not only treats memory and network I/O as computational resources but also dynamically adjusts concurrency when scheduling tasks. Specifically, we first analyze the RDD-based Spark execution engine to obtain the amount of data processed by tasks, and propose an algorithm that estimates the initial adaptive task concurrency from the known task input information and the executor memory. Then, a dynamic adjustment algorithm changes the concurrency at run time through feedback information so that the limited memory resources are used as fully as possible. We implement DMATS in Spark 2.3.4 and evaluate its performance with two typical benchmarks, shuffle-light and shuffle-heavy. The results show that the algorithm not only reduces the execution time by 43.64% but also significantly improves resource utilization. Experiments also show that the proposed method has advantages over similar work such as WASP.
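The two steps named in the abstract (an initial concurrency estimate from task input size and executor memory, followed by feedback-driven adjustment) can be illustrated with a minimal sketch. This is not the DMATS algorithm from the paper, whose details are not given here: the function names, the `expansion_factor` multiplier, and the `high_water` / `low_water` thresholds are all hypothetical values chosen only for illustration.

```python
def initial_concurrency(executor_memory_mb, task_input_mb, max_cores,
                        expansion_factor=2.0):
    """Estimate how many tasks fit in executor memory at once.

    expansion_factor is a hypothetical guess at how much larger a task's
    in-memory footprint is than its input; it is not taken from the paper.
    """
    if task_input_mb <= 0:
        # No input-size information: fall back to the core count.
        return max_cores
    per_task_mb = task_input_mb * expansion_factor
    fit = int(executor_memory_mb // per_task_mb)
    # Always run at least one task, never more than the available cores.
    return max(1, min(max_cores, fit))


def adjust_concurrency(current, memory_used_fraction, max_cores,
                       high_water=0.85, low_water=0.60):
    """One feedback step driven by observed memory usage.

    The thresholds are assumed values: back off by one task under memory
    pressure, add one task when there is headroom, otherwise hold steady.
    """
    if memory_used_fraction > high_water:
        return max(1, current - 1)
    if memory_used_fraction < low_water:
        return min(max_cores, current + 1)
    return current


if __name__ == "__main__":
    # Hypothetical executor: 4096 MB of memory, 8 cores, 256 MB per task input.
    concurrency = initial_concurrency(4096, 256, 8)
    for usage in (0.9, 0.9, 0.7, 0.5):
        concurrency = adjust_concurrency(concurrency, usage, 8)
        print(usage, concurrency)
```

A step size of one task per feedback round keeps the sketch simple; a real scheduler would also need to decide how often to sample memory usage and how to treat tasks that are already running.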

Bibliographic details

  • Author affiliations

    College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan 410073, China;

    College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China;

    College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China;

    College of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, Hunan 410076, China;

    College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China;

  • Original format: PDF
  • Language: English
  • Keywords

    Concurrency; Dynamic adjustment; Memory resource; Spark; Task scheduling;
