Published in: International Journal of Parallel Programming

Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime

Abstract

The Uintah computational framework is used for the parallel solution of partial differential equations on adaptive mesh refinement grids using modern supercomputers. Uintah is structured with an application layer and a separate runtime system. Uintah is based on a distributed directed acyclic graph of computational tasks, with a task scheduler that efficiently schedules and executes these tasks on both CPU cores and on-node accelerators. The runtime system identifies task dependencies, creates a task graph prior to the execution of these tasks, automatically generates MPI message tags, and automatically performs halo transfers for simulation variables. Automating halo transfers in a heterogeneous environment poses significant challenges in two cases: when tasks complete within a few milliseconds, because runtime overhead then affects wall-clock execution time, and when simulation variables require large halos spanning most or all of the computational domain, because task dependencies then become expensive to process. These challenges are magnified at production scale, when application developers require that each compute node perform thousands of different halo transfers among thousands of simulation variables. The principal contributions of this work are to (1) identify and address inefficiencies that arise when mapping tasks onto the GPU in the presence of automated halo transfers, (2) implement new schemes to reduce runtime system overhead, (3) minimize application developer involvement with the runtime, and (4) show overhead reduction results from these improvements.
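
For readers unfamiliar with the term, a halo transfer copies the boundary cells of one grid patch into the ghost (halo) cells of its neighbors, so that each patch can apply a stencil without reaching into remote memory. The following is a minimal one-dimensional sketch of that idea only. It is not Uintah code, and the names `make_patches` and `exchange_halos` are hypothetical helpers introduced here for illustration. It assumes every patch has at least as many interior cells as the halo width.

```python
def make_patches(domain, num_patches, halo):
    """Split a 1-D domain into equal patches, each padded with ghost cells.

    Layout of every patch: [left ghost | interior | right ghost].
    Hypothetical helper for illustration; assumes len(domain) is
    divisible by num_patches.
    """
    size = len(domain) // num_patches
    patches = []
    for p in range(num_patches):
        interior = list(domain[p * size:(p + 1) * size])
        patches.append([0.0] * halo + interior + [0.0] * halo)
    return patches


def exchange_halos(patches, halo):
    """Fill each patch's ghost cells from its neighbours' interior cells.

    Only ghost cells are written and only interior cells are read, so the
    in-place update does not depend on the order in which patches are visited.
    Ghost cells on the outer domain boundary are left untouched.
    """
    for i, patch in enumerate(patches):
        if i > 0:
            left = patches[i - 1]
            # Rightmost `halo` interior cells of the left neighbour.
            patch[:halo] = left[-2 * halo:-halo]
        if i < len(patches) - 1:
            right = patches[i + 1]
            # Leftmost `halo` interior cells of the right neighbour.
            patch[-halo:] = right[halo:2 * halo]
    return patches


patches = make_patches(list(range(8)), num_patches=2, halo=1)
exchange_halos(patches, halo=1)
```

In a distributed runtime the two slice copies would instead be messages between nodes or copies between host and device memory. The abstract's point is that the runtime, not the application developer, works out which of these copies are needed.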