IEEE International Parallel and Distributed Processing Symposium

Characterizing Scheduling Delay for Low-Latency Data Analytics Workloads

Abstract

Data analytics workloads are shifting toward shorter task execution times, higher degrees of parallelism, and execution on faster hardware. As a result, job scheduling is becoming a bottleneck: it must offer extremely low latency, massive throughput, and high scalability. However, few efforts have focused on systematically understanding scheduling delay. In this paper, we propose a method and develop a tool, SDchecker, that decomposes job scheduling delay into multiple components and characterizes each one through extensive experiments. SDchecker extracts event messages by mining both cluster scheduler logs and application logs, and constructs a scheduling order graph to ease analysis. We use SDchecker to evaluate Spark-SQL on YARN, a popular cluster scheduler. Results show that scheduling delay may account for 60% of the job runtime of small data analytics workloads. After decomposing the total scheduling delay, we find that Spark itself contributes 70% of the delay. From this evaluation and analysis, we conclude that (1) scheduling delay is determined by many factors, and (2) job scheduling is not yet well optimized and is far from ideal for low-latency data analytics workloads.
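The abstract does not specify SDchecker's log formats, event names, or graph construction, so the following is only a minimal sketch of the general idea: extract timestamped event messages from mixed scheduler and application logs, order them per job, and split the total delay into components attributed to the scheduler or the application. The one-line-per-event log format, the milestone names (`APP_SUBMITTED`, `AM_CONTAINER_ALLOCATED`, and so on), the owner tags, and the helpers `parse` and `decompose` are all hypothetical. The sketch also replaces the paper's scheduling order graph with a single linear chain of milestones; it is not the authors' implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical log line format (an assumption, not the real YARN/Spark format):
#   "<epoch_ms> <source> <EVENT_NAME> <app_id>"
LINE_RE = re.compile(r"^(\d+)\s+(\w+)\s+(\w+)\s+(\S+)$")

# Hypothetical milestones in their expected order for one job. Each edge
# (previous milestone -> next milestone) is one delay component, tagged with
# the party assumed to be responsible for it.
ORDER = [
    ("APP_SUBMITTED", None),
    ("AM_CONTAINER_ALLOCATED", "scheduler"),
    ("DRIVER_STARTED", "application"),
    ("EXECUTORS_REGISTERED", "application"),
    ("FIRST_TASK_STARTED", "application"),
]


@dataclass
class Event:
    ts_ms: int
    source: str
    name: str
    app_id: str


def parse(lines):
    """Extract event messages from mixed scheduler and application log lines."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # lines that are not event messages are skipped
            ts, source, name, app_id = m.groups()
            events.append(Event(int(ts), source, name, app_id))
    return events


def decompose(events, app_id):
    """Return (edge, owner, delay_ms) for each consecutive pair of milestones."""
    first_seen = {}
    for e in sorted(events, key=lambda ev: ev.ts_ms):
        if e.app_id == app_id:
            first_seen.setdefault(e.name, e.ts_ms)  # keep earliest occurrence
    components = []
    for (prev, _), (cur, owner) in zip(ORDER, ORDER[1:]):
        if prev in first_seen and cur in first_seen:
            delay = first_seen[cur] - first_seen[prev]
            components.append((f"{prev}->{cur}", owner, delay))
    return components


# Usage with synthetic, made-up log lines (not measurements from the paper).
logs = [
    "1000 yarn APP_SUBMITTED app_1",
    "1400 yarn AM_CONTAINER_ALLOCATED app_1",
    "2600 spark DRIVER_STARTED app_1",
    "3100 spark EXECUTORS_REGISTERED app_1",
    "3300 spark FIRST_TASK_STARTED app_1",
    "not an event line",
]
parts = decompose(parse(logs), "app_1")
total = sum(delay for _, _, delay in parts)
by_owner = {}
for _, owner, delay in parts:
    by_owner[owner] = by_owner.get(owner, 0) + delay
```

Grouping components by owner is what makes a statement such as "Spark itself contributes most of the delay" checkable; a real tool would need the actual log formats and a graph that captures concurrent events (for example, many executors starting in parallel) rather than a single chain.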