Conference: International Conference for High Performance Computing, Networking, Storage and Analysis

A Multi-faceted Approach to Job Placement for Improved Performance on Extreme-Scale Systems


Abstract

Job placement plays a pivotal role in application performance on supercomputers. We present a multi-faceted exploration of ways to influence placement in extreme-scale systems, with the goals of improving network performance and decreasing variability. In our first exploration, Scores, we developed a machine learning model that extracts features from a job's node allocation and grades its performance. This identified several important node metrics, which led to Dual-Ended scheduling, a means of reducing network contention without impacting utilization. In evaluations on the Titan supercomputer, we observed reductions in average hop count of up to 50%. We also developed an improved node-layout strategy that targets a better balance between network latency and bandwidth; replacing the default ALPS layout on Titan with this strategy yielded an average runtime improvement of 10%. Both of these efforts underscore the importance of a job placement strategy that is cognizant of workload mix and network topology.
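
To make the "average hop count" metric concrete, the following is a minimal sketch of how it could be computed for a job's node allocation. It assumes a 3D torus network with wraparound links and shortest-path routing; the torus dimensions, the node coordinates, and the function names `torus_hops` and `average_hop_count` are hypothetical and are not taken from the paper, which does not specify its feature definitions in the abstract.

```python
from itertools import combinations


def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus.

    Each dimension has wraparound links, so the distance along a
    dimension of size d is min(|x - y|, d - |x - y|).
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def average_hop_count(nodes, dims):
    """Mean hop count over all unordered pairs of nodes in an allocation."""
    pairs = list(combinations(nodes, 2))
    if not pairs:
        return 0.0  # a single-node job has no inter-node traffic
    return sum(torus_hops(a, b, dims) for a, b in pairs) / len(pairs)


# Hypothetical 8 x 8 x 8 torus and two hypothetical 4-node allocations.
dims = (8, 8, 8)
compact = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
scattered = [(0, 0, 0), (4, 4, 4), (0, 4, 0), (4, 0, 4)]

compact_avg = average_hop_count(compact, dims)
scattered_avg = average_hop_count(scattered, dims)
print(f"compact:   {compact_avg:.2f} hops")
print(f"scattered: {scattered_avg:.2f} hops")
```

In this sketch a tightly packed allocation scores a much lower average than one spread across the torus, which is the kind of difference a placement-aware scheduler would try to exploit. A real feature extractor would likely weight pairs by the application's communication pattern rather than treating all pairs equally.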
